All our software is free and open source.
Scooby
Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. To do so, it uses the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings.
The GitHub repository contains the model training and data loading code: https://github.com/gagneurlab/scooby
Flashzoi
Flashzoi is an enhanced version of Borzoi that runs up to 3x faster and is part of the ‘borzoi-pytorch’ package. It can be installed via pip (‘pip install borzoi-pytorch’) or obtained from GitHub: https://github.com/johahi/borzoi-pytorch.
SpeciesLMs
SpeciesLMs are genomic language models trained across different parts of the tree of life (Karollus et al., 2024; Tomaz da Silva et al., 2025). We offer models trained across the fungal kingdom as well as a metazoan model. All of them are available on the Hugging Face Hub: https://huggingface.co/collections/johahi/specieslms
Spectralis
Spectralis is a method for de novo peptide sequencing that builds on the task of bin reclassification, which assigns ion series to discretized m/z values even in the absence of a peak. Bin reclassification is implemented with a convolutional neural network layer that connects peaks in spectra spaced by amino acid masses. Based on bin reclassification, Spectralis predicts scores that assess the quality of peptide-spectrum matches using Levenshtein distance estimates. Furthermore, Spectralis includes an evolutionary algorithm to fine-tune peptide-spectrum matches.
It is available on GitHub: https://github.com/gagneurlab/spectralis
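The Levenshtein distance mentioned above is the minimum number of single-residue substitutions, insertions and deletions needed to turn one peptide sequence into another. Spectralis predicts estimates of this quantity. The minimal Python sketch below only computes the exact distance between two known sequences with the classic dynamic-programming recurrence, to illustrate what such a score measures. It is not taken from the Spectralis code base, and the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("PEPTIDE", "PEPTLDE"))  # 1 (one substitution)
    print(levenshtein("PEPTIDE", "PEPIDE"))   # 1 (one deletion)
```

Only two rows of the dynamic-programming table are kept in memory, so the cost is quadratic in time but linear in space.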
AbExp
AbExp is a tool to predict aberrant gene expression in 49 human tissues based on DNA sequence variants. It was trained on aberrant gene expression calls from the GTEx dataset.
It is available on GitHub: https://github.com/gagneurlab/AbExp
AbSplice & AbSplice 2
AbSplice predicts aberrant splicing across tissues. It combines sequence-based machine learning models for predicting variant effects on splicing (namely MMSplice and SpliceAI) with SpliceMap, a tissue-specific splice site annotation. If available, RNA-seq data from accessible tissues such as blood or skin can be integrated to improve predictions.
AbSplice 2 is a method that predicts aberrant splicing across human tissues and developmental stages.
Both are available on GitHub:
DROP
DROP is an integrative workflow that helps researchers use RNA-seq data to detect genes with aberrant expression (using OUTRIDER), aberrant splicing (using FRASER), and mono-allelic expression. It consists of three independent modules, one for each of these analyses.
It is available on GitHub: https://github.com/gagneurlab/drop
FRASER
FRASER identifies aberrant splicing events from an RNA-seq dataset. It is applied in clinical research to identify candidate disease-causing genes for patients affected with a rare disorder of unknown cause. It implements a denoising autoencoder for count fraction data.
It is available on GitHub: https://github.com/gagneurlab/FRASER
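As an illustration of the kind of count fractions FRASER models, the minimal sketch below computes percent-spliced-in ratios from the split-read counts of a single sample: ψ5 divides a junction's reads by all reads sharing its donor site, and ψ3 divides them by all reads sharing its acceptor site. It assumes plus-strand junctions keyed by hypothetical `(donor, acceptor)` coordinates and ignores strand and non-split reads. Please refer to the FRASER documentation for the exact metrics computed by the current version.

```python
from collections import defaultdict
from typing import Dict, Tuple

Junction = Tuple[int, int]  # hypothetical (donor, acceptor) coordinates, plus strand


def psi5_psi3(junction_counts: Dict[Junction, int]) -> Dict[Junction, Tuple[float, float]]:
    """Return (psi5, psi3) for each junction of one sample."""
    donor_totals: Dict[int, int] = defaultdict(int)
    acceptor_totals: Dict[int, int] = defaultdict(int)
    for (donor, acceptor), n in junction_counts.items():
        donor_totals[donor] += n        # all reads leaving this donor
        acceptor_totals[acceptor] += n  # all reads entering this acceptor
    result = {}
    for (donor, acceptor), n in junction_counts.items():
        d, a = donor_totals[donor], acceptor_totals[acceptor]
        result[(donor, acceptor)] = (
            n / d if d else float("nan"),  # psi5: share among junctions with this donor
            n / a if a else float("nan"),  # psi3: share among junctions with this acceptor
        )
    return result


if __name__ == "__main__":
    counts = {(100, 200): 30, (100, 300): 10, (150, 300): 10}
    for junction, (psi5, psi3) in psi5_psi3(counts).items():
        print(junction, psi5, psi3)
```

In this toy example, the donor at position 100 splices to two acceptors, so its two junctions split ψ5 between them, while the acceptor at 300 receives reads from two donors and splits ψ3.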
MMSplice
MMSplice is a machine learning model that predicts the effects of genetic variants on splicing. It won the CAGI 5 exon skipping challenge (2018). It takes a modular approach in which each module is a neural network that models an individual gene region.
It is available in Kipoi: https://github.com/kipoi/models/tree/master/MMSplice
PROTRIDER
PROTRIDER is an autoencoder-based method to call protein outliers from mass spectrometry-based proteomics datasets.
PROTRIDER is available on GitHub: https://github.com/gagneurlab/PROTRIDER
OUTRIDER
OUTRIDER identifies gene expression outliers from an RNA-seq dataset. It is applied in clinical research to identify candidate disease-causing genes for patients affected with a rare disorder of unknown cause. It implements a denoising autoencoder for count data.
It is available on Bioconductor: https://bioconductor.org/packages/release/bioc/html/OUTRIDER.html
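After the autoencoder has produced an expected count for a gene in a sample, the observed count can be tested against a negative binomial distribution. The minimal Python sketch below computes a two-sided p-value, 2 · min(1/2, P(X ≤ k), P(X ≥ k)), for an observed count k. It assumes the expected mean μ and the dispersion θ are already given. Fitting the autoencoder and the dispersions, as well as multiple-testing correction, are left out, and the function names are hypothetical. The OUTRIDER Bioconductor package remains the reference implementation.

```python
import math


def nb_pmf(k: int, mu: float, theta: float) -> float:
    """P(X = k) for a negative binomial with mean mu > 0 and size (dispersion) theta > 0."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def nb_cdf(k: int, mu: float, theta: float) -> float:
    """P(X <= k); equals 0 for k < 0."""
    return sum(nb_pmf(i, mu, theta) for i in range(k + 1))


def nb_two_sided_pvalue(k: int, mu: float, theta: float) -> float:
    """Two-sided p-value 2 * min(1/2, P(X <= k), P(X >= k)) for an observed count k."""
    p_low = nb_cdf(k, mu, theta)                         # P(X <= k)
    p_high = max(0.0, 1.0 - nb_cdf(k - 1, mu, theta))    # P(X >= k), clipped for rounding
    return 2 * min(0.5, p_low, p_high)


if __name__ == "__main__":
    # expected count 100 and theta = 25 are made-up values for illustration
    for observed in (5, 100, 250):
        print(observed, nb_two_sided_pvalue(observed, mu=100.0, theta=25.0))
```

Summing the probability mass function directly keeps the sketch dependency-free; for large counts a library implementation of the negative binomial distribution function would be faster and numerically safer.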
Kipoi
Kipoi (pronounced kípi; from the Greek κήποι, gardens) is an API and a repository of ready-to-use trained models for regulatory genomics. It contains more than 2,000 models covering canonical predictive tasks in transcriptional and post-transcriptional gene regulation. Kipoi's API is implemented as a Python package (github.com/kipoi/kipoi) and is also accessible from the command line and from R.
Main web page: https://kipoi.org
wBuild
"workflow Build" (or maybe Wachutka build?). Data analysis and reporting workflow management.
All R Markdown scripts of a project are compiled and rendered into a navigable web page. Data and script dependencies are handled by Snakemake: the programmer writes Snakemake rules in the header of each R Markdown script.
https://pypi.python.org/pypi/wbuild/1.0
mgsa
MGSA is an effective alternative to classical gene set enrichment analysis. Classical methods analyze each set in isolation. Because sets such as biological pathways often share genes with each other, the returned list of enriched sets is usually long and redundant. In contrast, MGSA takes set overlap into account by working on all sets simultaneously and substantially reduces the number of redundant sets.
It is available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/mgsa.html
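To show how working on all sets simultaneously deals with overlap, the minimal sketch below uses a Bayesian network in the spirit of MGSA. Each set is active with prior probability p. A gene is "on" if it belongs to at least one active set, and observed gene states are noisy, with false-positive rate α and false-negative rate β. The sketch computes exact posterior probabilities by enumerating all activation configurations, which is only feasible for a handful of sets. The mgsa package instead relies on MCMC sampling, and here α, β and p are fixed by hand rather than inferred. Function and parameter names are our own.

```python
from itertools import product
from typing import Dict, Set


def mgsa_posterior(sets: Dict[str, Set[str]], observed: Set[str], universe: Set[str],
                   alpha: float, beta: float, p: float) -> Dict[str, float]:
    """Exact posterior probability that each set is active, by full enumeration.

    alpha: false-positive rate, P(gene observed on | gene off)
    beta:  false-negative rate, P(gene observed off | gene on)
    p:     prior probability that any given set is active
    """
    names = list(sets)
    active_mass = {name: 0.0 for name in names}
    total = 0.0
    for config in product([False, True], repeat=len(names)):
        active = [name for name, on in zip(names, config) if on]
        # a gene is "on" if it belongs to at least one active set
        on_genes = set().union(*(sets[name] for name in active))
        weight = p ** len(active) * (1 - p) ** (len(names) - len(active))
        for gene in universe:
            if gene in on_genes:
                weight *= (1 - beta) if gene in observed else beta
            else:
                weight *= alpha if gene in observed else (1 - alpha)
        total += weight
        for name in active:
            active_mass[name] += weight
    return {name: active_mass[name] / total for name in names}


if __name__ == "__main__":
    genes = {f"g{i}" for i in range(1, 11)}
    sets = {
        "A": {"g1", "g2", "g3", "g4"},
        "B": {"g1", "g2", "g3", "g4", "g5"},  # contains A plus one more gene
        "C": {"g6", "g7", "g8"},
    }
    observed = {"g1", "g2", "g3", "g4"}
    post = mgsa_posterior(sets, observed, genes, alpha=0.05, beta=0.2, p=0.2)
    for name, prob in post.items():
        print(name, round(prob, 3))
```

Set B contains all genes of set A plus one unobserved gene. A test of each set in isolation would report both, whereas the joint model attributes the signal mostly to A and gives B a much lower posterior probability.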
genomeIntervals
genomeIntervals is an intuitive R package for operations on genomic intervals, such as merging, detecting overlaps, or computing distances between intervals.
It is available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/genomeIntervals.html
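The three operations listed above can be sketched in a few lines of Python for intervals on a single chromosome. This minimal sketch assumes closed integer intervals [start, end]. Merging joins intervals that share at least one base, so book-ended intervals stay separate, and the distance is the number of bases strictly between two intervals. These conventions and the function names are our assumptions, not necessarily those of genomeIntervals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval [start, end] with start <= end


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals that share at least one base; returns sorted, disjoint intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # overlaps the last merged interval: extend it if needed
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def gap_between(a: Interval, b: Interval) -> int:
    """Number of bases strictly between a and b; 0 if they overlap or are book-ended."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


if __name__ == "__main__":
    print(merge_intervals([(10, 20), (1, 5), (4, 8), (21, 25)]))  # [(1, 8), (10, 20), (21, 25)]
    print(overlaps((1, 5), (5, 9)), overlaps((1, 5), (6, 9)))     # True False
    print(gap_between((1, 5), (8, 10)))                           # 2
```

Sorting once and sweeping from left to right merges n intervals in O(n log n) time.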