Software

All our software is free and open source.

Scooby

Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings.

The GitHub repository contains the model training and data loading code: https://github.com/gagneurlab/scooby

Flashzoi

Flashzoi is an enhanced version of Borzoi that runs up to 3x faster. It is part of the ‘borzoi-pytorch’ package, which can be installed via pip (‘pip install borzoi-pytorch’) and is also available on GitHub: https://github.com/johahi/borzoi-pytorch.

SpeciesLMs

SpeciesLMs are genomic language models trained across different parts of the tree of life (Karollus et al., 2024; Tomaz da Silva et al., 2025). We offer models trained on the fungal kingdom and a metazoan model. Both flavors are available on the Hugging Face Hub: https://huggingface.co/collections/johahi/specieslms

Spectralis

Spectralis is a method for de novo peptide sequencing that builds on the task of bin reclassification, which assigns ion series to discretized m/z values even in the absence of a peak. This is implemented with a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses. Based on bin reclassification, Spectralis predicts scores to assess the quality of peptide-spectrum matches using Levenshtein distance estimates. Furthermore, Spectralis includes an evolutionary algorithm to fine-tune peptide-spectrum matches.

It is available on GitHub: https://github.com/gagneurlab/spectralis
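
The Levenshtein distance mentioned in the Spectralis description counts the minimum number of single-residue insertions, deletions, and substitutions needed to turn one peptide sequence into another. The following is a minimal sketch of that distance only. It is not taken from the Spectralis code base, it does not show how Spectralis estimates the distance from spectra, and the function name and peptide strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue edits turning sequence a into b."""
    # previous[j] is the distance between the residues of a processed so far
    # (before the current one) and the first j residues of b.
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, residue_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (residue_a != residue_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


# Hypothetical candidate peptides compared against a hypothetical reference.
reference = "PEPTIDE"
candidates = ["PEPTIDE", "PEPTLDE", "PEPTIE", "EPPTIDE"]
distances = {peptide: levenshtein(peptide, reference) for peptide in candidates}
print(distances)
```

A distance of 0 means the candidate matches the reference exactly; larger values mean more residues would have to change.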

AbExp

AbExp is a tool to predict aberrant gene expression in 49 human tissues based on DNA sequence variants. It was trained on aberrant gene expression calls from the GTEx dataset.

It is available on GitHub: https://github.com/gagneurlab/AbExp

AbSplice & AbSplice 2

AbSplice predicts aberrant splicing across tissues. It combines sequence-based machine learning models for variant effect prediction in splicing (MMSplice and SpliceAI) with SpliceMap, a tissue-specific splice site annotation. If available, RNA-seq from accessible tissues such as blood or skin can be integrated for improved predictions.

AbSplice 2 is a method that predicts aberrant splicing across human tissues and developmental stages.

Both are available on GitHub:

DROP

DROP is an integrative workflow that helps researchers use RNA-seq data to detect genes with aberrant expression (using OUTRIDER), aberrant splicing (using FRASER), and mono-allelic expression. It consists of three independent modules, one for each of those strategies.

It is available on GitHub: https://github.com/gagneurlab/drop

FRASER

FRASER identifies aberrant splicing events from an RNA-seq dataset. It is applied in clinical research to identify candidate disease-causing genes for patients affected with a rare disorder of unknown cause. It implements a denoising autoencoder for count fraction data.

It is available on GitHub: https://github.com/gagneurlab/FRASER

MMSplice

MMSplice is a machine learning model that predicts effects of genetic variants on splicing. It won the CAGI 5 exon skipping challenge (2018). It implements a modular modeling approach where modules are neural networks modeling individual gene regions.

It is available in Kipoi: https://github.com/kipoi/models/tree/master/MMSplice

PROTRIDER

PROTRIDER is an autoencoder-based method to call protein outliers from mass spectrometry-based proteomics datasets.

PROTRIDER is available on GitHub: https://github.com/gagneurlab/PROTRIDER

OUTRIDER

OUTRIDER identifies gene expression outliers from an RNA-seq dataset. It is applied in clinical research to identify candidate disease-causing genes for patients affected with a rare disorder of unknown cause. It implements a denoising autoencoder for count data.

It is available on Bioconductor: https://bioconductor.org/packages/release/bioc/html/OUTRIDER.html

Kipoi

Kipoi (pronounced: kípi; from the Greek κήποι: gardens) is an API and a repository of ready-to-use trained models for regulatory genomics. It contains >2,000 different models, covering canonical predictive tasks in transcriptional and post-transcriptional gene regulation. Kipoi's API is implemented as a Python package (github.com/kipoi/kipoi) and is also accessible from the command line or R.

Main web page: https://kipoi.org

wBuild

"workflow Build" (or maybe Wachutka build?). Data analysis and reporting workflow management.

All R Markdown scripts of a project are compiled and rendered into a navigable web page. Data and script dependencies are handled by Snakemake: the programmer enters Snakemake rules in the header of the R Markdown scripts.

https://pypi.python.org/pypi/wbuild/1.0

mgsa

MGSA is an effective alternative to classical gene set enrichment analysis. Classical methods analyze each set in isolation. Because sets such as biological pathways often share genes with each other, the returned list of enriched sets is usually long and redundant. In contrast, MGSA takes set overlap into account by working on all sets simultaneously and substantially reduces the number of redundant sets.

It is available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/mgsa.html
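
To illustrate the redundancy problem described above, here is a minimal sketch of a classical per-set analysis: a one-sided hypergeometric over-representation test applied to each gene set in isolation. This is the kind of method MGSA is contrasted with, not MGSA itself, and it does not use the mgsa package. The function names, gene identifiers, and set names are hypothetical toy data.

```python
from math import comb


def hypergeom_pvalue(population_size, set_size, study_size, overlap):
    """P(X >= overlap) when study_size genes are drawn without replacement."""
    upper = min(set_size, study_size)
    tail = sum(
        comb(set_size, k) * comb(population_size - set_size, study_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population_size, study_size)


def classical_enrichment(gene_sets, study_genes, population):
    """Test every set on its own, ignoring that sets may share genes."""
    population = set(population)
    study = set(study_genes) & population
    pvalues = {}
    for name, members in gene_sets.items():
        members = set(members) & population
        pvalues[name] = hypergeom_pvalue(
            len(population), len(members), len(study), len(members & study)
        )
    return pvalues


# Toy data (hypothetical): set_B contains all of set_A plus one extra gene.
population = [f"g{i}" for i in range(20)]
gene_sets = {
    "set_A": ["g0", "g1", "g2", "g3", "g4"],
    "set_B": ["g0", "g1", "g2", "g3", "g4", "g5"],
    "set_C": ["g10", "g11", "g12", "g13", "g14"],
}
study_genes = ["g0", "g1", "g2", "g3", "g4"]

pvalues = classical_enrichment(gene_sets, study_genes, population)
for name, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(f"{name}: p = {p:.2e}")
```

In this toy example both set_A and set_B receive very small p-values although they are nearly identical, which is the kind of redundant output that a method modelling all sets simultaneously aims to reduce.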

genomeIntervals

An intuitive R package to perform operations on genomic intervals such as merging, detecting overlap, or computing distances between intervals.

It is available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/genomeIntervals.html
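
The three operations named above (merging, detecting overlap, computing distances) can be sketched in a few lines. This is a plain Python illustration, not the genomeIntervals R API. It assumes closed integer intervals given as (start, end) tuples on a single sequence, and all function names are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def distance(a, b):
    """Gap between two closed intervals; 0 if they overlap."""
    if overlaps(a, b):
        return 0
    return max(a[0], b[0]) - min(a[1], b[1])


def merge_intervals(intervals):
    """Merge overlapping intervals into a sorted list of disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical example intervals.
intervals = [(10, 12), (1, 5), (4, 8)]
merged = merge_intervals(intervals)
print(merged)
print(overlaps((1, 5), (5, 7)), distance((1, 5), (8, 10)))
```

Real genomic data would additionally need to track the sequence name and strand of each interval, and whether endpoints are open or closed, which this sketch leaves out.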

Informatics 29 Computational Molecular Medicine

Technische Universität München

Prof. Dr. Julien Gagneur