Institute for Theory of Strongly Correlated and Complex Systems

Department of Physics, Condensed Matter Theory

International Workshop:

Complex biomolecular networks:
structure, evolution, and function

September 6-9, 2005

Montauk Yacht Club, 
Long Island, New York

Abstracts

Stefan Bornholdt, University of Bremen

Reliable gene regulation: Reproducible dynamics from noisy circuits

Why is life so immensely robust? In large part this is due to the high reliability of the gene regulation machinery that controls the processes of living cells and their coordination in multicellular organisms. But how does the cell operate stable genetic circuits, despite the noisy molecular basis of genetic switches and the lack of central clock-like synchronisation of their many constituents? I will talk about the simplest model in which one can ask these questions: networks of noisy switches. In large networks of such elements, severe stability problems occur, such as propagating noise or desynchronized system dynamics. I sketch how these problems can be cured by suitable circuit architecture. Observed structural signatures of biological gene regulation networks support the hypothesis that gene network structures have been selected for stability against noise.
[1] K. Klemm and S. Bornholdt, q-bio/0309013
[2] K. Klemm and S. Bornholdt, q-bio/0409022

Mark Gerstein, Yale University

Computational Proteins: Understanding Protein Function on a Genome-scale using Networks

My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks -- both of protein-protein interactions and transcription-factor-target relationships. I will describe how these networks can be determined through Bayesian integration of many genomic features and how they can be analyzed in terms of various simple topological statistics.
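
As a rough illustration of what Bayesian integration of genomic features can look like, the sketch below combines several evidence sources in naive-Bayes fashion: posterior odds equal prior odds times the product of per-feature likelihood ratios. The feature names, likelihood ratios, and prior odds are made-up placeholders chosen for easy arithmetic, and the conditional-independence assumption is a simplification; none of them are taken from the talk.

```python
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive-Bayes combination; assumes the features are conditionally independent."""
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios, P(feature | interacting) / P(feature | not interacting),
# for one candidate protein pair. Placeholder values, not measured ones.
example_lrs = {"coexpression": 8.0, "shared_function": 5.0, "essentiality": 2.5}
PRIOR_ODDS = 1.0 / 100.0  # hypothetical prior odds that a random pair interacts

odds = posterior_odds(PRIOR_ODDS, example_lrs.values())
prob = odds_to_probability(odds)
print(f"posterior odds = {odds:.3f}, probability = {prob:.3f}")
```

In this toy setting, three moderately informative features are enough to lift a pair from long prior odds to even odds, which is the basic appeal of integrating many weak genomic signals.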

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

A Bayesian networks approach for predicting protein-protein interactions from genomic data.
R Jansen, H Yu, D Greenbaum, Y Kluger, NJ Krogan, S Chung, A Emili, M Snyder, JF Greenblatt, M Gerstein (2003) Science 302: 449-53.

ExpressYourself: A modular platform for processing and visualizing microarray data.
NM Luscombe, TE Royce, P Bertone, N Echols, CE Horak, JT Chang, M Snyder, M Gerstein (2003) Nucleic Acids Res 31: 3477-82.

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and protein-DNA regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Kristin Gunsalus, New York University

Predictive models of molecular machines involved in early C. elegans embryogenesis

While numerous fundamental aspects of development have been uncovered through the discovery of individual genes and proteins, systems-level models are still missing for most developmental processes. The first two cell divisions of C. elegans constitute an ideal testbed for a systems-level approach. Early embryogenesis, including processes such as cell division and establishment of cellular polarity, is readily amenable to large-scale functional analysis. A first step toward a systems-level understanding is to provide “first-draft” models both of the molecular assemblies involved and of the functional connections between them. We show that such models can be derived from an integrated gene/protein network generated from three different types of functional relationships: protein interaction, expression profiling similarity, and phenotypic profiling similarity, as estimated from detailed early embryonic RNAi phenotypes systematically recorded for hundreds of genes. The topology of the integrated network suggests that early embryogenesis is achieved through coordination of a limited set of molecular machines. We have assayed the overall predictive value of such molecular machine models by dynamic localization of ten previously uncharacterized proteins within the living embryo.

Nicholas Ingolia, Harvard University

Topology and Robustness in Drosophila Segment Polarity

Previous work by von Dassow et al. demonstrated the robustness of a mathematical model of the genetic interactions that define the polarity of Drosophila embryo segments. I showed that this robustness is due to the positive feedback of gene products on their own expression. This topological feature of the network allows individual cells in the model segment to adopt different stable expression states (bistability) corresponding to different cell types in the segment polarity pattern. A positive feedback loop will only yield multiple stable states when the parameters that describe it satisfy a particular inequality. By testing which random parameter sets satisfy these inequalities, I show that bistability is necessary to form the segment polarity pattern and serves as a strong predictor of which parameter sets will succeed in forming the pattern.
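
To make the idea of a bistability inequality concrete, here is a minimal sketch for a toy one-gene positive feedback loop, dx/dt = a*x^2/(K^2 + x^2) - x. This equation, its parameters a and K, and the sampling ranges are assumptions chosen for illustration; they are not the segment polarity model discussed in the talk. For this toy loop the nonzero steady states solve x^2 - a*x + K^2 = 0, so a second stable state exists alongside x = 0 exactly when a > 2K.

```python
import math
import random

def positive_fixed_points(a, K):
    """Nonzero steady states of dx/dt = a*x**2/(K**2 + x**2) - x.

    Setting the rate to zero with x > 0 gives x**2 - a*x + K**2 = 0.
    The lower root is unstable, the upper root is stable.
    """
    disc = a * a - 4.0 * K * K
    if disc <= 0:
        return []
    root = math.sqrt(disc)
    return [(a - root) / 2.0, (a + root) / 2.0]

def is_bistable(a, K):
    """Toy inequality: x = 0 and the upper root are both stable iff a > 2*K."""
    return a > 2.0 * K

def fraction_bistable(n_samples, seed=0):
    """Sample a and K log-uniformly over two decades (hypothetical ranges)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        a = 10 ** rng.uniform(-1, 1)
        K = 10 ** rng.uniform(-1, 1)
        hits += is_bistable(a, K)
    return hits / n_samples

print(positive_fixed_points(3.0, 1.0))
print(fraction_bistable(10_000))
```

Screening random parameter sets against such an inequality is much cheaper than simulating each set, which is why an analytic criterion is useful as a predictor.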

Iaroslav Ispolatov, Ariadne Genomics

Dimers in evolution and topology of protein-protein interactions network

Protein-Protein Interaction (PPI) networks contain significantly more self-interacting proteins than expected if such homodimers appeared randomly in the course of evolution. On average, homodimers in the PPI networks of several eukaryotic organisms have twice as many interaction partners as non-self-interacting proteins. Duplication of such a self-interacting protein often creates a pair of paralogous proteins that interact with each other. We show that such pairs also occur more frequently than could be explained by chance alone. Similar to homodimers, proteins involved in heterodimers with their paralogs have about twice as many interaction partners as the rest of the network. The likelihood that a pair of paralogous proteins interact with each other was also shown to decrease as their sequences diverge. This all points to the conclusion that most interactions between paralogs are inherited from ancestral homodimeric proteins, rather than established after the duplication. We finally discuss the role of heterodimer links in creating such tightly linked subgraphs as triangles and higher cliques.

Wen-Hsiung Li, University of Chicago

Protein Function, Connectivity, Duplicability and Dispensability

Protein-protein interaction networks have evolved mainly through connectivity rewiring and gene duplication. However, how protein function affects these processes and how a network grows over time have not been well studied. Using protein-protein interaction data and genomic data from the budding yeast (Saccharomyces cerevisiae), we first examined whether there is a correlation between the age and connectivity of yeast proteins. A steady increase in connectivity with protein age is observed for yeast proteins, except for those that can be traced back to eubacteria. Second, we investigated whether protein connectivity and duplicability vary with gene function. We found a higher average duplicability for proteins interacting with the external environment than for proteins localized within intracellular compartments. For example, proteins that function in the cell periphery (mainly transporters) show high duplicability but low connectivity. Conversely, proteins that function within the nucleus (e.g., in transcription, RNA and DNA metabolism, and ribosome biogenesis and assembly) are highly connected but have low duplicability. Third, we found a negative correlation between protein connectivity and duplicability. Finally, we studied the effect of protein complexity on gene dispensability and duplicability.

Ilya Mazo, Ariadne Genomics

Molecular Networks in Mammals: Extraction from Literature and Pathway Analysis

Using the proprietary high-content linguistics tool MedScan, we compiled a comprehensive database of molecular networks by extracting information from the scientific literature. MedScan extracts functional associations between proteins, small molecules, and pathways, and recognizes the types of regulatory mechanisms involved, the effects of regulation, and the experimental conditions.

The resulting database stores 700,000 relationships between mammalian proteins and chemicals including facts about protein interactions, promoter binding, molecular biosynthesis and trafficking, and cell process regulation. Different approaches towards reconstructing individual pathways or cascades from this database and assigning functional categories to proteins will be described.

Our visualization software can systematically mine this database for small network motifs that are robust with regard to the effects induced at the gene expression level. We have also developed a Bayesian framework for integration of microarray data and binary gene-to-gene regulatory relationships. The approach allows the reduction of expression pattern complexity and finds the minimal set of regulatory proteins that are responsible for differential expression of other genes.

Leonid Mirny, MIT

What can structure of the metabolic network tell us about function and evolution?

Understanding relationships between the structure (topology) and function of biological networks is a central question of systems biology. The idea that topology is a major determinant of system function has become an attractive and highly disputed hypothesis. While the structural analysis of interaction networks demonstrates a correlation between the topological properties of a node (protein, gene) in the network and its functional essentiality, the analysis of metabolic networks fails to find such correlations. By contrast, approaches utilizing both the topology and biochemical parameters of metabolic networks, e.g. flux balance analysis (FBA), are more successful in predicting phenotypes of knock-out strains. Here we reconcile these seemingly conflicting results by showing that the topology of a metabolic network is, in fact, sufficient to predict the phenotypes of knock-out strains with the same accuracy as FBA on a large, unbiased dataset of mutants. This surprising result is obtained by introducing a novel topology-based measure of network transport: synthetic accessibility.
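
The abstract does not define synthetic accessibility, so the following is only a simplified stand-in for a topology-only knockout prediction: a reachability check that asks whether all required outputs can still be produced from the available inputs after one reaction is removed. The toy network, the metabolite names, and the viability rule are hypothetical and should not be read as the measure introduced in the talk.

```python
def producible(reactions, inputs):
    """Metabolites reachable from `inputs` by repeatedly firing every reaction
    whose substrates are all already available."""
    available = set(inputs)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available

def knockout_viable(reactions, inputs, outputs, knocked_out):
    """Predict viability using topology only: can every output still be made?"""
    remaining = {name: rxn for name, rxn in reactions.items() if name != knocked_out}
    return set(outputs) <= producible(remaining, inputs)

# Hypothetical toy network: reaction name -> (substrates, products).
TOY = {
    "r1": (["glc"], ["g6p"]),
    "r2": (["g6p"], ["pyr"]),
    "r3": (["glc"], ["pyr"]),      # alternative route that bypasses r1 and r2
    "r4": (["pyr"], ["biomass"]),
}

for name in TOY:
    print(name, knockout_viable(TOY, ["glc"], ["biomass"], name))
```

No kinetic or stoichiometric parameters enter the calculation, which is the point of contrast with flux balance analysis.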

To investigate the structure of evolutionary modules and their relationship to functional ones, we integrated the metabolic network with evolutionary associations between genes inferred from comparative genomics. The resulting metabolic-genomic network places metabolic pathways into an evolutionary and genomic context, thereby revealing previously unknown components and modules. Comparison with traditional metabolic pathways shows that while in some cases there is almost exact correspondence, several pathways are split into independent modules. This study shows that evolutionary modules, rather than pathways, may be thought of as regulatory and functional units in bacterial genomes.

Fritz Roth, Harvard University 

Analysis of I) S. cerevisiae synthetic-lethal interactions and II) a high-throughput experimental map of protein interaction in humans

A talk in two parts: I) yeast synthetic-lethal interactions and II) a high-throughput experimental map of human protein interactions. Part I: Two genes have a synthetic lethal interaction if mutants in each gene alone are viable, but mutation of both causes cell death. Such interactions provide an organism with robustness to mutation. We examined synthetic sick or lethal (SSL) genetic interactions from a systematic assay of ~500,000 gene pairs in S. cerevisiae. Here we describe: a) the value of SSL interactions in characterizing gene function; b) relationships to other biological networks; and c) exploitation of these relationships to predict synthetic genetic interactions. Part II: In collaboration with M. Vidal and others, we used a stringent high-throughput yeast two-hybrid system to test for interactions amongst the protein products of ~8,100 currently available Gateway-cloned open reading frames and detected ~2,800 interactions, with a >80% verification rate by independent co-affinity purification assay. We describe topological and evolutionary properties of this network, and find connections to >100 disease-associated proteins.

Ron Shamir, Tel Aviv University

Modeling, inference and evolution in bionetworks

I will describe ongoing efforts in my lab on two projects: (1) how to adequately model complex biological networks in a way that accommodates prior knowledge and admits inference and expansion; (2) tracing the evolution of cis-regulation among related species. The methods will be demonstrated by results on yeast species.

Joint work with Amos Tanay, Irit Gat-Viks, Daniela Raijman (TAU) and Aviv Regev (Harvard)

Sarah A. Teichmann, MRC Laboratory of Molecular Biology

Evolution of Protein Interactions in Complexes and Networks

There is an abundance of data on protein interactions and protein complexes, both from conventional small-scale experiments over the decades, and more recently from large-scale functional genomics experiments. Much less is known about the details of affinities and kinetics of these interactions. We can now draw on the information available about protein interactions in order to study the evolution of interactions. We show that interactions, just like individual proteins, frequently emerge by duplication and divergence. We have studied the role of different duplication scenarios in the evolution of interactions in the protein-protein interaction network and in sets of protein complexes. The duplication of a protein that engages in protein-protein interactions raises issues about the stoichiometry and equilibrium of protein complexes when the quantity of one component increases. Simultaneous duplication of all components involved in an interaction or a protein complex is predicted by the gene dosage balance hypothesis. In contrast, our results indicate that most interactions and complexes have evolved by step-wise partial duplications. We show that duplicated complexes retain the same overall function, but have different binding specificities and regulation, revealing that duplication is associated with functional specialization. We distinguish between duplications that result in a new, alternative protein complex and duplications that result in additional components of an existing complex, and quantify events of both types. The evolutionary analyses described above provide insight into affinities and specificities of interactions, and indicate ways in which prediction of these properties may be possible.

Denis Vitkup, NCBI/NIH

Context-based correlation in the context of cellular networks

Different context-based genomic correlations (phylogenetic profiles, co-expression, chromosomal gene distances, and gene fusions) show a notable agreement in the context of cellular networks. This demonstrates that design principles of cellular networks are conserved in evolution and are directly reflected in the structures of bacterial chromosomes. We investigate the behavior of these context-based correlations and demonstrate how they can be used to annotate orphan metabolic activities.

Yuri Wolf, NCBI/NIH

Unifying measures of gene function and evolution 

Recent genome analyses revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level, connectivity of genetic and protein-protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g., products of highly expressed genes also tend to have many protein-protein interactions, whereas variables between classes are negatively correlated, e.g., highly expressed genes tend to evolve slowly. Here we describe principal component (PC) analysis of 7 genome-related variables and propose biological interpretations for the first three principal components. The first two PCs together reflect the intuitive notion of a gene's "importance", or the "status" of a gene in the genomic community, with positive contributions from knockout lethality, expression level and the number of paralogs, and negative contributions from sequence evolution rate and gene loss propensity. The third PC may be interpreted as a gene's "adaptability" whereby genes with high adaptability evolve fast, are relatively often lost during evolution, readily duplicate and are highly expressed, but only under certain conditions. Functional classes of genes substantially vary in status and adaptability, with the highest status characteristic of the translation system and cytoskeletal proteins, and highest adaptability seen in metabolic enzymes and transporters.
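
As a minimal sketch of the kind of principal component analysis described above, the code below standardizes seven variables, builds their correlation matrix, and extracts the leading component by power iteration. The data are synthetic: one hypothetical shared factor drives all seven columns with made-up signs, so the numbers say nothing about real genes. Only the first component is computed here; a full analysis would normally use a linear algebra library to obtain all components.

```python
import math
import random
from statistics import mean, pstdev

def standardize(rows):
    """Scale every column (variable) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]

def leading_component(matrix, iterations=500, seed=0):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() for _ in matrix]
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(m * x for m, x in zip(row, v))
                     for vi, row in zip(v, matrix))
    return eigenvalue, v

# Synthetic stand-in data: 300 "genes", 7 unnamed variables, hypothetical signs.
SIGNS = [1, 1, 1, 1, -1, -1, 1]
rng = random.Random(1)
genes = []
for _ in range(300):
    shared = rng.gauss(0, 1)  # hypothetical latent factor
    genes.append([s * shared + rng.gauss(0, 1) for s in SIGNS])

corr = correlation_matrix(standardize(genes))
eigenvalue, loadings = leading_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained = eigenvalue / len(SIGNS)
print(f"PC1 explains {explained:.0%} of the variance")
print([round(x, 2) for x in loadings])
```

The signs of the loadings indicate which variables contribute positively or negatively to a component, which is the basis for the biological interpretations proposed in the abstract.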

Itai Yanai, Harvard University

Integrating space and time to understand how a genetic network determines development

We are working to understand how genetic networks regulate the patterning of the different cell types across development. We are focusing on the C blastomere lineage of the C. elegans embryo which invariantly gives rise to body wall muscle, hypodermis, two neurons, and one cell death. Through an investigation of wild-type and mutant temporal and spatial expression data, we have proposed a network model for the patterned specification of cell fates within this lineage. Our approach towards validating this network involves interplay between computational and experimental methods. We have constructed a perturbation matrix for a set of key transcription factors by systematically disrupting the function of each while measuring the expression of the others and employing computational methods to efficiently extract regulatory relationships among the genes. Individual interactions are then verified using a suite of experimental techniques such as reporter analysis, RNAi, and yeast one-hybrid. We have also investigated genetic buffering within this network by assembling a synthetic lethal matrix for the genes comprising the network. We believe that the integration of diverse data promises to unravel the genetic networks that underlie developmental processes.
Work done in the lab of Prof. Craig Hunter, Harvard University
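
The abstract does not say which computational methods are used to read regulatory relationships out of the perturbation matrix. As a deliberately simple stand-in, the sketch below thresholds each entry: a strong drop in a gene's expression after a factor is disrupted is read as activation, a strong rise as repression. The gene names, fold changes, and threshold are hypothetical, and this rule cannot distinguish direct from indirect effects.

```python
def infer_edges(perturbation, threshold=1.0):
    """perturbation[tf][gene] = log2 fold change of `gene` when `tf` is disrupted.

    Returns (tf, gene, effect) triples for changes at least `threshold` in size.
    """
    edges = []
    for tf, responses in perturbation.items():
        for gene, change in responses.items():
            if gene == tf:
                continue  # the disrupted factor itself is not informative
            if change <= -threshold:
                edges.append((tf, gene, "activates"))
            elif change >= threshold:
                edges.append((tf, gene, "represses"))
    return edges

# Hypothetical perturbation matrix for three placeholder factors.
MATRIX = {
    "tfA": {"tfA": -5.0, "tfB": -2.1, "tfC": 0.2},
    "tfB": {"tfA": 0.1, "tfB": -4.8, "tfC": 1.7},
    "tfC": {"tfA": -0.3, "tfB": 0.4, "tfC": -5.2},
}

for edge in infer_edges(MATRIX):
    print(edge)
```

Edges proposed this way are only candidates, which is why the abstract follows the computational step with experimental verification of individual interactions.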