Complex Networks

Home Group Curriculum Vitae Recent Papers Talks Programs Photos

Complex Networks Econophysics Low-D Magnetism SOC CiteRank

Biomolecular networks

Complex biomolecular networks guide the biochemistry of a living cell on multiple levels: its metabolic and signaling pathways are shaped by the network of interacting proteins, whose production, in turn, is controlled by the genetic regulatory network.  More generally speaking, almost any complex system has a backbone network connecting its interacting modules to each other. 

Binding interactions between proteins are naturally described in terms of a weighted network in which individual edges are characterized by their binding strength (inverse dissociation constant) and individual protein-nodes – by their concentrations and subcellular localizations. However, the state-of-the-art high-throughput experimental techniques generate just a binary (yes or no) information about individual interactions. Therefore, most of the previous research concentrated just on topology of these networks, uncovering such general properties as a broad distribution of the number of binding partners of individual proteins, globally interconnected network architecture (the “small world” effect), a scarcity of direct interactions between highly connected hub-proteins, etc. We recently developed  a set of computational tools and analytical methods which allows one to go beyond purely topological studies of binding networks and efficiently calculate the mass-action equilibrium of protein concentrations and its response to systematic perturbations. 

Three views of a subset of 312 highly abundant nodes in yeast protein-binding network. (left panel) All binding links between these nodes. (middle panel) Binding links characterized by high concentration of heterodimers (>1,000 molecules per cell). (right panel) Concentration-coupled proteins A -> B with the property that a 2-fold increase in the total concentration of A reduces  the free concentration of its immediate binding partner B by 20% or more. Note that links roughly coincide with highly abundant dimers shown in the middle panel. Arrows reveal the preferential direction of propagation of perturbations (down the concentration gradient).

In our future works on this subject we plan to advance our understanding of biological effects of the mass-action equilibrium in protein binding networks by investigating 

Noise and fluctuations in equilibrium concentrations of proteins and their complexes. 
Genetic interactions and knockout lethality caused by disruption of binding equilibrium 
Competition between specific and non-specific binding interactions. 
Kinetics of relaxation and spatial inhomogeneity of concentrations 

Understanding the equilibrium and dynamical properties of protein binding networks, their response to large systematic perturbations and noise constitutes an important part of the system-level knowledge about an organism which could be further utilized in 

Quantitative simulation of complex cellular processes and pathways mediated by irreversible, catalytic interactions, that are not described by the mass-action equilibrium yet involve binding of reactants and depend on availability of participating proteins. 
Apprehension of undesirable cross-talk between different functional pathways. This is particularly important for prediction and minimization of side effects and interactions of drugs.

For any complex network it is important to know which properties of such a network were designed (or evolved in case of living systems), and which arose by pure chance. To answer this question in our 2002 Science paper Kim Sneppen and myself proposed an algorithm measuring the correlation profile of a given complex network. This profile compares the number of edges between pairs of nodes with degrees (numbers of neighbors) K0 and K1 in a given complex network in question and its properly randomized version, and detects statistically significant deviations of one from another. 

 We applied this algorithm to protein interaction and genetic regulatory networks in baker's yeast Saccharomyces cerevisiae.  It was found that links between highly connected proteins are systematically suppressed in favor of links between highly connected nodes and those of low connectivity. Such correlation pattern is beneficial for the cell since it decreases the likelihood of biologically undesirable cross talk and enhances the overall robustness of a network by localizing effects of deleterious perturbations.

 

 The correlation profile of the protein interaction network in yeast a part of which is shown in the left panel. Plotted is the correlation ratio R(K0,K1)=N(K0,K1)/Nr(K0,K1) between numbers of edges connecting nodes with degrees  K0 and K in the actual complex network [N(K0,K1)] and its randomly rewired null-model [Nr(K0,K1) ].  Randomized network is rewired in such a way that  degrees of all nodes are strictly preserved.  Red color specifies an enhanced number of connections, while blue and green – a suppressed one.

The network of physical interactions among nuclear proteins in yeast as measured  in the high-throughput two-hybrid experiment  (T. Ito, et. al., Proc. Natl. Acad. Sci. USA 98, 4569--4574 (2001)). Nodes are color coded according to  the effect of their deletion in null-mutant on viability of the yeast under laboratory conditions. Red nodes denote for essential proteins, whose null-mutants are not viable, while green ones denote non-essential proteins with viable null-mutants. 

 

 

Network connecting routers in the Internet

We also measured the correlation profile of the Internet. The data for connectivity of the Internet on the level of the so-called Autonomous Systems (AS) is available at the web site of the National Laboratory for Applied Network Research. Each AS is a large domain of IP addresses usually that usually belong to the same organization such as e.g. a university, a business enterprise, or an Internet Service Provider (ISP). Autonomous Systems connected by an edge in this network directly exchange information with each other.  The correlation profile of the Internet shown below is very different from that of the protein interaction network in yeast, even though their power-law degree distributions are deceivingly similar to each other.

 

Ranking in information networks (WWW and citations)

Due to their rapid growth and large size, many information networks have become untenable to navigate without some sort of ranking scheme. This is particularly evident in the example of the World Wide Web, a network of pages connected by hyperlinks. A successful solution to the problem of ranking the Web is Google’s PageRank algorithm. Another class of information networks that could benefit from such a ranking method are citation networks. These networks are comprised of scientific publications connected by citation links. Current methods of ranking publications based on the total number of citations received are rather crude. They are too ‘democratic’ in treating all citations as equal and ignoring differences in importance of citing papers. One of the advantages of Google’s PageRank algorithm is that it implicitly accounts for the importance of the citing article in a self-consistent fashion. However, there exist significant differences between the World Wide Web and citation networks that suggest a modification of the original PageRank algorithm. The most important difference is that, unlike hyperlinks, citations cannot be updated after publication. This makes ageing effects in citation networks much more pronounced than in the WWW. 

The success of the PageRank algorithm can be attributed, in part, to its ability to capture the behavior of people randomly browsing the network of web pages. Indeed, the PageRank of a given web page can be interpreted as the predicted traffic (quantified, for example, by the rate of downloads) for that page if every WWW user follows a random path starting from a randomly selected webpage. The assumption that a typical web surfer starts at a randomly selected webpage might be not completely unreasonable for the WWW, but it needs to be modified for citation networks. As all of us know, researchers typically start "surfing" scientific publications from a rather recent publication that caught their attention on a daily update of a preprint archive, a recent volume of a journal or, perhaps, was featured in a news article in the popular media. Thus a more realistic model for the traffic along the citation network should take into account that researchers ‘surfing’ the citation network preferentially start their quests from recent papers and progressively get to older and older papers with every step.

To account for strong ageing characteristics of citation networks, we modified the PageRank algorithm by initially distributing random surfers exponentially with age, in favor of more recent publications. The output of this algorithm, which we call CiteRank, is interpreted as approximate traffic to individual publications in a simple model of how researchers find new information. We optimize parameters of our algorithm to achieve the best performance. The results are compared for two rather different citation networks: all American Physical Society publications between 1893 and 2003 and the set of high-energy physics theory (hep-th) preprints. Despite major differences between these two networks, we find that their optimal parameters for the CiteRank algorithm are remarkably similar. The advantages and performance of CiteRank over more conventional methods of ranking publications are discussed.

 

Knowledge networks of opinions of customers on products

In our 2001 PRL paper with Yi-Cheng Zhang we introduced a concept of KNOWLEDGE NETWORKS. A simple example of a "knowledge network" is a collection of positive and negative opinions  that a large group of customers have about a number of products (e.g. books). Such networks of opinions are used by businesses (e.g. by the amazon.com) to recommend  their customers other books that they haven't read yet but which were highly rated by other customers with similar tastes. In the computer science lingo this process of reconstructing peoples' tastes based on their opinions on products is known as "collaborative filtering".

 

The difference between "plain" and "knowledge" networks is illustrated below:

Nodes in knowledge networks are characterized by M internal degrees of freedom (M-dimensional vectors), while their edges carry scalar products of vectors on two nodes they connect.

This rather abstract mathematical scenario can be realized in at least two real-life situations:

The above mentioned network of opinions that a group of customers ("readers") has on a number of products ("books"). Vectors on nodes correspond to customer's tastes and product features, while scalar products on edges  ("overlaps" ) are opinions of a given customer on a given product.

 

In biology this scenario can be realized in a network of "key-and-lock" interactions in a large ensemble of biomolecules. Vectors in this case specify the set of keys and locks present on the surface of a given molecule, while scalar products ("overlaps" ) stand for the interaction strength (binding energy).

Our algorithm allows one to predict "opinions" that are not yet known based on the opinions, which are already known. In case of customers and products it corresponds to predicting opinions of customers on products they never tried, while in bioinformatics application - to predicting binding energies of two molecules in silico. It turns out that the loop correlations described below allow to reconstruct missing opinions exactly provided that the density of known opinions in a knowledge network exceeds a certain value p2 ~ 2M/N reminiscent of the rigidity percolation threshold in statistical physics.

Below I describe a practical recursive algorithm on how such opinions (or binding energies in case of interacting biomolecules) can be reconstructed:



As can be seen below this algorithm exponentially converges above p2:

Home Next