Machine Learning for Regulatory Genomics

Module: IN2393

Credit: 6 ECTS

Room (lecture and exercise): online

Lecturers: Julien Gagneur, Matthias Heinig, Maria Colomé-Tatché, Annalisa Marsico

Lecture: Tuesdays, 14:00 - 15:30, starting on 13th April 2021 

Exercise: Tuesdays, 15:30 - 17:00, starting on 13th April 2021 

Lecture Language: English

Prerequisites (recommended):

  • One introductory lecture on machine learning (e.g. IN2064 or MA4802)
  • Strong interest in biological and biomedical research questions
  • Basic Python programming skills

 

Who can attend

The module is geared toward students of bioinformatics and computer science, as well as students with other quantitative training (physics, applied mathematics) and an interest in molecular biology. Students of biology or medicine are welcome provided they have some background in machine learning (see above) and are comfortable with basic programming.

The module is an elective in the catalogues of:

 

  • MSc Bioinformatics
  • MSc Informatics
  • MSc Information Systems
  • MSc Informatics: Games Engineering
  • MSc Data Engineering and Analytics
  • MSc Physics

Intended Learning Outcomes:

At the end of the module students are able to:

  • Describe major steps of gene expression from accessing DNA to determining protein abundance.
  • Describe genome-wide assays employed to assess various steps of gene expression
  • Describe the concept of massively parallel reporter assays
  • Describe and apply deep learning methods to perform sequence-based predictions
  • Describe and apply the concept of model interpretation
  • Describe and apply the concept of convolutional neural networks
  • Describe and apply the concept of transformers
  • Apply deep learning for sequence-based modeling of a genome-wide assay. Evaluate model performance and provide biological interpretation of its application to real data.

 

Content:

Gene expression refers to how cells read the information encoded in genomes. This lecture introduces biological and computational concepts to study gene expression. It consists of two parts:

(1) Six lectures that introduce biological mechanisms, experimental assays, and computational models for regulatory genomics, supported by modeling exercises in Python.

(2) A hands-on project lasting 7 to 8 weeks.
 
The lectures are organized around steps of gene expression:

  • Introduction to gene regulation and sequence-based computational models of gene regulation
  • Transcriptional regulation
  • Chromatin-mediated regulation
  • RNA splicing
  • RNA modification and degradation
  • Translation

Over the course of these lectures, the following computational methods are introduced:

  • Fitting procedures for deep neural networks
  • Convolutional Neural Networks
  • LSTMs and transformers
  • Embeddings for sequence data
  • Multi-task learning and transfer learning
  • End-to-end learning
  • Analytical and visualisation techniques for model interpretation
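
To give a flavour of the sequence-based modeling listed above, here is a minimal sketch in plain Python of the idea behind a convolutional filter on DNA: the sequence is one-hot encoded and a small kernel is slid along it, producing one score per position. This is an illustration only and not taken from the course material; the function names (`one_hot`, `scan`), the toy "TATA" kernel, and the example sequence are all assumptions made for this example.

```python
# Illustrative sketch only: names, kernel, and sequence are hypothetical.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside the alphabet (e.g. N) become an all-zero row.
    """
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in seq.upper()
    ]


def scan(encoded, kernel):
    """Slide the kernel along the sequence (a 1D convolution without padding)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(len(ALPHABET))
        )
        scores.append(score)
    return scores


# Toy kernel: scores 1 for every position that matches the motif "TATA".
kernel = one_hot("TATA")
scores = scan(one_hot("GGTATAGC"), kernel)

# Index of the best-scoring window in the sequence.
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In a trained convolutional network the kernel weights are learned from data rather than fixed by hand, and many such filters are applied in parallel; deep learning libraries provide optimized versions of this operation.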