PredictProtein
Rostlab's PredictProtein service has been an online resource for sequence analysis since 1992. As technologies, models, and tools evolve, so does our service. Here you can find out how to access the latest version, learn how to use it, and read about its research history.
Service
Access the latest version of PredictProtein.
Take a tour to learn how to get the most out of the service. The documentation covers the service in full detail.
History
The PredictProtein service started out in 1992 as one of the first online resources for protein analysis. Initially, it ran as an automated email-based prediction service.
Since then, it has been continuously improved by applying new methodologies and technologies. The abstract of the 2014 publication describes the state of the project at that time as follows:
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.
Since then, methods have gradually shifted from analysis based on multiple sequence alignments (MSAs) toward protein language models (pLMs). The abstract of the 2021 PredictProtein paper describes the growth of the resource and its incorporation of the latest embedding-based methods:
Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource for protein sequence analysis with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein's infrastructure has moved to the LCSB increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering performance of prediction methods); user interface elements improved usability, and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
The code of this now-legacy PredictProtein version is available on GitHub.
We now see a change of focus: a move away from MSA-based methods, which run primarily on CPUs, toward embedding-based methods, which mostly require GPUs. This shift led to a redesign of the service to meet the new requirements at both the software and the hardware level. bio_embeddings was a first step toward making these embedding-based methods more accessible to the whole bioinformatics community. Building on this, biocentral is now designated as the latest version of PredictProtein. It allows users to embed protein sequences, visualize the embeddings, obtain feature predictions from embedding-based predictors, and even train their own models. Besides a programmatic Python API aimed especially at bioinformaticians, a web-based frontend is still available. We will continue to improve our services and to make the latest protein analysis methods available to everyone.