The lab's research is driven by a conviction that protein and DNA sequences encode a significant core of information about the ultimate structure and function of genetic material and its gene products. The lab uses protein and DNA sequences, along with evolutionary information, to predict a protein's overall function, interaction partners, secondary structure, disordered regions, subcellular localization, membrane-spanning structure, intra-chain residue contacts, role in cell cycle control, and domain boundaries. Another significant research focus is to improve the effectiveness and efficiency with which structural genomics projects determine protein structures on a large scale.


Our main goal is to predict important aspects of protein structure and function using sequence information, evolutionary information, and results from other predictions. We apply whichever type of algorithm a problem requires, from modern machine learning (neural networks, SVMs, tree-based algorithms, Bayesian classifiers) to established statistical methods.

Protein Prediction

The lab uses protein and DNA sequences, together with evolutionary information, to predict aspects of proteins that are relevant to advancing biomedical research. Examples include the prediction of coarse-grained aspects of protein function, such as the type of enzymatic activity (ECGO); of interaction partners (Interaction Sites, DISIS, PiNAT); of subcellular localization (LOCtree, LOCnet, PredictNLS); and of the functional effects of point mutations/SNPs (SNAP). Further examples are the prediction of disordered regions (NORSp, Ucon, IUcon), membrane-spanning segments (PROF/PHDhtm), aspects of protein secondary structure (PROF/PHD, DSSPcont), solvent accessibility (PROF/PHD), and internal residue-residue contacts (PROFcon), as well as the identification of domain-like functional and structural subunits (CHOP, CHOPnet) and the clustering of proteins into families (CHOP).

Gagneur Lab

Genomics allows the identification and quantification of all major molecular constituents in cells from DNA to RNA, proteins and metabolites. The challenge now lies in identifying mechanistic and causal relationships across these multiple levels.

Our research follows two axes: advancing our understanding of gene regulation through quantitative modeling, and developing methods to study the mechanisms by which genetic variants condition phenotypes (systems genetics). To this end, we combine mathematical modeling with genome-wide experimental assays from either our own yeast lab or our collaborators.

Quantitative modeling of gene regulation

Our past work in yeast includes the first genome-wide report of allele- and strand-specific expression, the finding that promoters are typically bidirectional, and the finding that expression of non-coding RNAs antisense to genes (a class that represents the majority of stable uncharacterized non-coding RNAs in yeast and affects more than a quarter of genes in humans) induces ultrasensitivity, or threshold behavior, in gene regulation. We have also developed predictive models of spatio-temporal enhancer expression patterns in the developing fly embryo based on genome-wide transcription factor binding patterns. The predictions, confirmed experimentally, demonstrated a surprising plurality of transcription factor binding patterns on enhancers with similar expression profiles.

Current projects include quantitative modeling of large-scale ChIP datasets, with a focus on genome-wide occupancy maps of factors of the core transcription machinery, and on RNA processing kinetics.

In a recent study, we developed bidirectional hidden Markov models to improve the annotation of DNA-associated processes from genomics data. Applying these models revealed variations in the yeast Pol II transcription cycle and identified directed chromatin state patterns at transcribed regions in the human genome.
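For readers unfamiliar with hidden Markov models, the following is a minimal sketch of how a standard (unidirectional) HMM annotates a signal along the genome with hidden states via Viterbi decoding. It is not the bidirectional model from the study: the two states, the "low"/"high" signal levels, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path of a standard HMM (log space)."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at position t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a table of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}

# Hypothetical toy model: two states emitting a discretized signal.
states = ["untranscribed", "transcribed"]
log_start = logs({"untranscribed": 0.5, "transcribed": 0.5})
log_trans = {
    "untranscribed": logs({"untranscribed": 0.9, "transcribed": 0.1}),
    "transcribed": logs({"untranscribed": 0.1, "transcribed": 0.9}),
}
log_emit = {
    "untranscribed": logs({"low": 0.9, "high": 0.1}),
    "transcribed": logs({"low": 0.1, "high": 0.9}),
}

signal = ["low", "low", "low", "high", "high", "high"]
path = viterbi(signal, states, log_start, log_trans, log_emit)
print(path)
```

In this toy example the decoded path switches from "untranscribed" to "transcribed" where the signal rises. A real annotation would use continuous, multi-track genomics data and states learned from the data rather than fixed by hand.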

One ultimate goal of genome biology is to understand how gene regulation is genetically encoded. To this end, we are developing systematic approaches that combine 1) in vivo quantification of RNA metabolism rates, 2) identification of DNA elements predictive of these kinetic rates, and 3) testing for the causal role of these DNA elements by drawing on expression profiles of genetically distinct individuals. We have recently demonstrated the power of this approach using fission yeast as a model system. The approach recovered known DNA and RNA motifs involved in RNA synthesis and degradation, quantified the contributions of individual nucleotides to RNA synthesis, splicing, and degradation, and uncovered novel motifs that regulate RNA lifetime.
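As a rough illustration of step 2 only, the sketch below scores each short sequence element (k-mer) by how much the mean log rate differs between sequences that contain it and sequences that do not. This is a deliberately simple stand-in under stated assumptions, not the method used in the study: the function name `kmer_effects`, the toy sequences, and the rate values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def kmer_effects(sequences, log_rates, k=3):
    """Map each k-mer to (mean log rate with it) minus (mean log rate without it)."""
    # Record which sequences contain each k-mer at least once.
    present = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            present[seq[start:start + k]].add(idx)

    n = len(sequences)
    effects = {}
    for kmer, with_idx in present.items():
        if len(with_idx) == n:
            continue  # k-mer occurs everywhere, so there is no contrast group
        with_vals = [log_rates[i] for i in with_idx]
        without_vals = [log_rates[i] for i in range(n) if i not in with_idx]
        effects[kmer] = mean(with_vals) - mean(without_vals)
    return effects

# Hypothetical toy data: four sequences with made-up log degradation rates.
sequences = ["ACGTACGT", "TTTTACGA", "GGGGCCCC", "CCCCGGGG"]
log_rates = [2.0, 2.0, 0.0, 0.0]

effects = kmer_effects(sequences, log_rates, k=3)
for kmer, effect in sorted(effects.items(), key=lambda item: -item[1])[:3]:
    print(kmer, round(effect, 2))
```

A score like this captures association only. Establishing a causal role, as in step 3, requires additional evidence such as expression profiles from individuals that differ genetically at the candidate element.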

Systems genetics

Quantitative genetics is entering a new era. With low-cost, high-throughput sequencing, genotypes have become easy to collect at large scale, and many significant associations between DNA variants and phenotypes such as disease are being found. However, knowledge of sequence variation alone offers little opportunity for treatment. The challenge lies in understanding more globally the biological processes and molecular pathways that are affected by these genetic variants and lead to disease. To this end, large-scale molecular profiling, such as transcription profiling, must be combined with data analysis methods that distinguish causal links from mere correlations.

In this context, we recently showed that exploiting environmental perturbations greatly helps in distinguishing causation from correlative associations. This study in yeast was not only the first to use genotype-environment interactions for causal inference, but also the first to systematically assess causal inference predictions genome-wide. Our study has implications for the design and analysis of clinical molecular profiling efforts aimed at understanding how genetic variation causes disease, suggesting that multiple contexts (e.g., cell types) can be informative even if they are not affected by the disease.

We are continuing our work in yeast to gain further insights into the fundamental principles that govern the effect of genetic variation on molecular systems. Current projects along this line include the study of the effects of genetic variation on gene regulatory networks and causal inference from temporal gene expression responses. Moreover, we are now moving to human genetic applications with new collaboration partners (Prof. Klein, auto-immune genetic diseases, and Dr. Prokisch, mitochondrial disorders). In the context of mitochondrial disorders, we have recently been awarded a 2 million Euro grant for a junior group consortium, mitOmics, together with Fabiana Perocchi, Gene Center, and Tobias Haack, TUM-med.

We are also a member of a 3-year EU-funded research network called SOUND, whose objective is to create the bioinformatic tools for statistically informed use of personal genomic and other ‘omic data in medicine, including cancers and rare metabolic diseases.