Open Topics

We offer multiple Bachelor/Master theses, Guided Research projects and IDPs in the area of data mining/machine learning. A non-exhaustive list of open topics is given below.

If you are interested in a thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email, and we will arrange a meeting to discuss potential topics.

Robustness of Large Language Models

Type: Master's Thesis

Prerequisites:

Strong knowledge in machine learning
Very good coding skills
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)
Knowledge about NLP and LLMs

Description:

The success of Large Language Models (LLMs) has precipitated their deployment across a diverse range of applications. With the integration of plugins enhancing their capabilities, it becomes imperative to ensure that the governing rules of these LLMs are foolproof and immune to circumvention. Recent studies have exposed significant vulnerabilities inherent to these models, underlining an urgent need for more rigorous research to fortify their resilience and reliability. A focus of this work will be to understand the mechanisms behind attacks that exploit these vulnerabilities.

We are currently seeking students for the upcoming Summer Semester of 2024, so we welcome prompt applications. This project is in collaboration with Google Research.

Contact: Tom Wollschläger

Generative Models for Drug Discovery

Type: Master's Thesis / Guided Research

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
Knowledge of graph neural networks (e.g. GCN, MPNN)
No formal education in chemistry, physics or biology needed!

Description:

Effectively designing molecular geometries is essential to advancing pharmaceutical innovation, a domain that has received considerable attention owing to the success of generative models. By leveraging their learned representations, these models promise a more efficient exploration of the vast chemical space and the generation of novel compounds with specific properties, potentially leading to the discovery of molecules that would otherwise go undiscovered. Our topics lie at the intersection of generative models, such as diffusion and flow matching models, and graph representation learning, e.g., graph neural networks. Projects can focus on model development with an emphasis on downstream tasks (e.g., diffusion guidance at inference time) and on better understanding the limitations of existing models.
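As background for the diffusion models mentioned above, the following is a minimal pure-Python sketch of the closed-form forward (noising) process used by DDPM-style diffusion models, treating a molecule as a flat list of 3D coordinates. It rests on assumptions: the linear beta schedule and its endpoints (1e-4 to 0.02) are a common default rather than anything prescribed here, and the learned denoising network and the equivariance constraints of the referenced models are omitted.

```python
import math
import random
from typing import List


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_t increasing linearly over the steps (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1) for t in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s): fraction of signal variance kept at step t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) directly from x0."""
    a = alpha_bars[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]


# Usage: noise the flattened coordinates of a toy two-atom "molecule".
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
coords = [0.0, 0.0, 0.0, 1.1, 0.0, 0.0]  # two atoms, (x, y, z) each
rng = random.Random(0)
x_early = q_sample(coords, 0, alpha_bars, rng)   # almost unchanged
x_late = q_sample(coords, 999, alpha_bars, rng)  # almost pure Gaussian noise
```

A generative model is then trained to reverse this process step by step; guidance at inference time modifies that reverse process, which is not shown here.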

Contact: Johanna Sommer, Leon Hetzel

References:

Equivariant Diffusion for Molecule Generation in 3D

Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation

Structure-based Drug Design with Equivariant Diffusion Models

Efficient Machine Learning: Pruning, Quantization, Distillation, and More - DAML x Pruna AI

Type: Master's Thesis / Guided Research / Hiwi

Prerequisites:

Strong knowledge in machine learning
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

The efficiency of machine learning algorithms is commonly evaluated in terms of target performance, speed, and memory footprint. Reducing the costs associated with these metrics is of primary importance for real-world applications with limited resources (e.g., embedded systems, real-time predictions). In this project, you will work in collaboration with the DAML research group and the Pruna AI startup to investigate solutions that improve the efficiency of machine learning models using multiple techniques such as pruning, quantization, distillation, and more.
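To make two of these techniques concrete, here is a minimal pure-Python sketch of unstructured magnitude pruning and symmetric per-tensor int8 quantization applied to a flat list of weights. It is an illustration only, not DAML's or Pruna AI's tooling; real pipelines operate on framework tensors and typically fine-tune or calibrate afterwards.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w is stored as round(w / scale) in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map the 8-bit integers back to approximate float weights."""
    return [v * scale for v in q]


# Usage
weights = [0.9, -0.05, 0.3, -0.6, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning removes the three smallest-magnitude weights here; quantization then stores each weight as an 8-bit integer plus one shared float scale, so the per-weight reconstruction error is at most half the scale.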

Contact: Bertrand Charpentier

Deep Generative Models

Type: Master's Thesis / Guided Research

Prerequisites:

Strong machine learning and probability theory knowledge
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)
Knowledge of generative models and their basics (e.g., Normalizing Flows, Diffusion Models, VAE)
Optional: Neural ODEs/SDEs, Optimal Transport, Measure Theory

Description:

With recent advances such as Diffusion Models, Transformers, Normalizing Flows, and Flow Matching, the field of generative models has gained significant attention in the machine learning and artificial intelligence research community. However, many problems and questions remain open, and the application to complex data domains such as graphs, time series, point processes, and sets is often non-trivial. We are interested in supervising motivated students to explore and extend the capabilities of state-of-the-art generative models for various data domains.

Contact: Marcel Kollovieh, David Lüdke

Graph Structure Learning

Type: Guided Research / Hiwi

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
Knowledge of graph neural networks (e.g. GCN, MPNN)
Optional: Knowledge of graph theory and mathematical optimization

Description:

Graph deep learning is a powerful ML concept that enables the generalisation of successful deep neural architectures to non-Euclidean structured data. Such methods have shown promising results in a vast range of applications spanning the social sciences, biomedicine, particle physics, computer vision, graphics and chemistry. One of the major limitations of most current graph neural network architectures is that they often rely on the assumption that the underlying graph is known and fixed. However, this assumption does not always hold, as the graph may be noisy, or partially or even completely unknown. In the case of noisy or partially available graphs, it would be useful to jointly learn an optimised graph structure and the corresponding graph representations for the downstream task. On the other hand, when the graph is completely absent, it would be useful to infer it directly from the data. This is particularly interesting in inductive settings where some of the nodes were not present at training time. Furthermore, learning a graph can become an end in itself, as the inferred structure can provide complementary insights with respect to the downstream task. In this project, we aim to investigate solutions and devise new methods to construct an optimal graph structure based on the available (unstructured) data.
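For the case where the graph is completely absent, a common non-learned baseline is to connect each point to its k nearest neighbours in feature space. The sketch below assumes Euclidean distance and a fixed, illustrative k; methods developed in this project would instead learn the structure, ideally jointly with the downstream task.

```python
import math
from typing import List, Set, Tuple


def knn_graph(features: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected edge set linking each node to its k nearest neighbours (Euclidean)."""
    n = len(features)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Distances from node i to every other node, closest first.
        ranked = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges


# Usage: two well-separated pairs of points and no given graph.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
edges = knn_graph(points, k=1)
```

Such a fixed heuristic graph is exactly what structure-learning methods aim to improve on, since k and the distance metric are chosen by hand rather than optimised for the task.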

Contact: Filippo Guerranti

A Machine Learning Perspective on Corner Cases in Autonomous Driving Perception

Type: Master's Thesis

Industrial partner: BMW

Prerequisites:

Strong knowledge in machine learning
Knowledge of Semantic Segmentation
Good programming skills
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks such as semantic segmentation. While the environment captured in datasets is controlled, novel classes or unknown disturbances can occur in real-world applications. To enable safe autonomous driving, these cases must be identified.

The objective is to explore novel-class segmentation and out-of-distribution (OOD) detection approaches for semantic segmentation in the context of corner cases for autonomous driving.
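A simple and widely used baseline for flagging unknown objects is the maximum softmax probability (MSP): pixels where even the most likely class has low probability are treated as potentially out-of-distribution. The sketch below computes this per pixel from a hypothetical `[height][width][num_classes]` logit map; the threshold is an assumed parameter, and stronger OOD and novel-class methods are what this thesis would explore.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_anomaly_map(logit_map: List[List[List[float]]]) -> List[List[float]]:
    """Per-pixel anomaly score 1 - max softmax probability (higher = more likely OOD).

    logit_map has shape [height][width][num_classes].
    """
    return [[1.0 - max(softmax(px)) for px in row] for row in logit_map]


def flag_ood(scores: List[List[float]], threshold: float) -> List[List[bool]]:
    """Binary mask of pixels whose anomaly score exceeds the threshold."""
    return [[s > threshold for s in row] for row in scores]


# Usage: a 1 x 2 image with 3 classes; the second pixel has no confident class.
logits = [[[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
scores = msp_anomaly_map(logits)
mask = flag_ood(scores, threshold=0.5)
```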

Contact: Sebastian Schmidt

Active Learning for Multi Agent 3D Object Detection

Type: Master's Thesis

Industrial partner: BMW

Prerequisites:

Strong knowledge in machine learning
Knowledge in Object Detection
Excellent programming skills
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks such as 3D object detection. To achieve good results, these networks often require large amounts of complex annotated training data. Such annotations are costly and often redundant. Active learning selects the most informative samples for annotation in order to cover a dataset with as little annotated data as possible.

The objective is to explore active learning approaches for 3D object detection using methods that combine uncertainty-based and diversity-based selection.
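A minimal sketch of combining the two criteria, assuming predictive entropy as the uncertainty measure and distance in some feature space as the diversity measure, is greedy batch selection: each pick maximizes entropy plus a bonus for being far from the samples already selected. The scoring rule and `diversity_weight` are illustrative assumptions; for 3D detection, per-box uncertainties would first have to be aggregated into a per-scene score.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Predictive entropy; higher means the model is less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(
    probs: List[List[float]],
    features: List[List[float]],
    budget: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily choose `budget` unlabeled samples to send for annotation.

    Score of candidate i = entropy(probs[i])
        + diversity_weight * (distance from i to the closest sample already selected),
    so near-duplicates of earlier picks are penalized.
    """
    selected: List[int] = []
    candidates = list(range(len(probs)))

    def score(i: int) -> float:
        diversity = min((math.dist(features[i], features[j]) for j in selected), default=0.0)
        return entropy(probs[i]) + diversity_weight * diversity

    while candidates and len(selected) < budget:
        best = max(candidates, key=score)  # ties resolve to the lowest index
        selected.append(best)
        candidates.remove(best)
    return selected


# Usage: samples 0 and 1 are uncertain near-duplicates, sample 2 is uncertain
# and distinct, sample 3 is confidently predicted.
probs = [[0.5, 0.5], [0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
features = [[0.0, 0.0], [0.0, 0.01], [3.0, 0.0], [0.0, 1.0]]
batch = select_batch(probs, features, budget=2)
```

With uncertainty alone, samples 0 and 1 would both be chosen even though annotating one of them adds little; the diversity term selects sample 2 instead.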

Contact: Sebastian Schmidt

Graph Neural Networks

Type: Master's Thesis / Bachelor's Thesis / Guided Research

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)
Knowledge of graph neural networks (e.g. GCN, MPNN)
Knowledge of graph/network theory

Description:

Graph neural networks (GNNs) have recently achieved great success in a wide variety of applications, such as chemistry, reinforcement learning, knowledge graphs, traffic networks, or computer vision. These models leverage graph data by updating node representations based on messages passed between nodes connected by edges, or by transforming node representations using spectral graph properties. These approaches are very effective, but many theoretical aspects of these models remain unclear, and there are many possible extensions to improve GNNs and go beyond the nodes' direct neighbors and simple message aggregation.
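The message-passing update described above can be sketched in a few lines of pure Python. This minimal example assumes mean aggregation over neighbours and a ReLU nonlinearity, and uses two scalar weights as stand-ins for the learned weight matrices of a real GNN layer (e.g., GCN or MPNN), so it illustrates the update rule only.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    h: List[List[float]],
    edges: List[Tuple[int, int]],
    weight_self: float = 1.0,
    weight_neigh: float = 1.0,
) -> List[List[float]]:
    """One layer: h_i' = relu(w_self * h_i + w_neigh * mean_{j in N(i)} h_j)."""
    n, d = len(h), len(h[0])
    neighbors: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:  # undirected: each edge carries a message in both directions
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = []
    for i in range(n):
        agg = [0.0] * d
        for j in neighbors[i]:  # sum the incoming messages (neighbour features)
            for k in range(d):
                agg[k] += h[j][k]
        if neighbors[i]:  # mean aggregation; isolated nodes keep a zero message
            agg = [a / len(neighbors[i]) for a in agg]
        out.append([max(0.0, weight_self * h[i][k] + weight_neigh * agg[k]) for k in range(d)])
    return out


# Usage: a path graph 0 - 1 - 2 with one scalar feature per node.
h = [[1.0], [2.0], [3.0]]
path_edges = [(0, 1), (1, 2)]
h_next = message_passing_step(h, path_edges)
```

Stacking k such layers lets information travel k hops; the extensions mentioned above, such as reaching beyond direct neighbours or richer aggregation, change exactly the `neighbors` and aggregation parts of this loop.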

Contact: Simon Geisler

Physics-aware Graph Neural Networks

Type: Master's Thesis / Guided Research

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (JAX or PyTorch)
Knowledge of graph neural networks (e.g. GCN, MPNN, SchNet)
Optional: Knowledge of machine learning on molecules and quantum chemistry

Description:

Deep learning models, especially graph neural networks (GNNs), have recently achieved great success in predicting quantum mechanical properties of molecules. These models have a vast range of applications, such as finding the best method of chemical synthesis or selecting candidates for drugs, construction materials, batteries, or solar cells. However, GNNs have only been proposed in recent years, and many open questions remain about how best to represent and leverage quantum mechanical properties and methods.
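Since representation is the central question here, a sketch of one widely used choice may help: SchNet-style models (named in the prerequisites) feed interatomic distances, which do not change when a molecule is rotated or translated, into the network after expanding each distance in Gaussian radial basis functions. The cutoff, number of basis functions, and width `gamma` below are assumed illustrative values, not settings from this group's work.

```python
import math
from typing import List, Sequence


def pairwise_distances(positions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distances between all atom pairs; invariant to rotations and translations."""
    return [[math.dist(a, b) for b in positions] for a in positions]


def rbf_expand(d: float, cutoff: float = 5.0, num_basis: int = 50, gamma: float = 10.0) -> List[float]:
    """Gaussian basis features exp(-gamma * (d - mu_k)^2), centers mu_k evenly spaced on [0, cutoff]."""
    step = cutoff / (num_basis - 1)
    return [math.exp(-gamma * (d - k * step) ** 2) for k in range(num_basis)]


def rotate_z(p: Sequence[float], angle: float) -> List[float]:
    """Rotate a 3D point about the z-axis (used only to check invariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1], p[2]]


# Usage: a toy three-atom geometry (coordinates in arbitrary units).
positions = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
dists = pairwise_distances(positions)
rotated = [rotate_z(p, 0.7) for p in positions]
edge_features = rbf_expand(dists[0][1])  # feature vector for the edge between atoms 0 and 1
```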

Contact: Nicholas Gao

Robustness Verification for Deep Classifiers

Type: Master's Thesis / Guided Research

Prerequisites:

Strong machine learning knowledge (at least equivalent to IN2064 plus an advanced course on deep learning)
Strong background in mathematical optimization (preferably in a machine learning setting)
Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
(Preferred) Knowledge of training techniques to obtain classifiers that are robust against small perturbations in data

Description:

Recent work shows that deep classifiers suffer from adversarial examples: misclassified points that are very close to the training samples or even visually indistinguishable from them. This undesired behaviour limits the deployment of otherwise promising neural-network-based classifiers in safety-critical scenarios. Therefore, new training methods are needed that promote (or, preferably, ensure) robust behaviour of the classifier around training samples.
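To illustrate what an adversarial example is, the sketch below applies the Fast Gradient Sign Method (FGSM), a standard attack not named in this description, to a toy logistic-regression classifier in pure Python. The weights and input are made-up values. Note that the project targets deep classifiers and verification: an attack like this can only show that a classifier is not robust at a point, while failing to find an adversarial example proves nothing, which is why certified methods are needed.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """P(y = 1 | x) under a logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g: float) -> float:
    return float((g > 0) - (g < 0))


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast Gradient Sign Method: x_adv = x + eps * sign(grad_x loss).

    For binary cross-entropy with logistic regression, grad_x loss = (p - y) * w,
    and every coordinate moves by at most eps (an L-infinity perturbation).
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Usage: a point correctly classified as y = 1 is flipped by a small perturbation.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.2)
```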

Contact: Aleksei Kuvshinov

Uncertainty Estimation in Deep Learning

Type: Master's Thesis / Guided Research

Prerequisites:

Strong knowledge in machine learning
Strong knowledge in probability theory
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

Safe prediction is a key feature of many intelligent systems. Classically, machine learning models compute output predictions regardless of the underlying uncertainty of the situations they encounter. In contrast, aleatoric and epistemic uncertainty estimates provide information about inherently ambiguous and unfamiliar situations, respectively. Taking uncertainty into account can substantially help to detect and explain unsafe predictions and therefore make ML systems more robust. The goal of this project is to improve uncertainty estimation in ML models across various types of tasks.
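One common way to separate the two kinds of uncertainty, assuming an ensemble (or Monte Carlo dropout samples) of probabilistic classifiers, is the entropy-based decomposition sketched below: the entropy of the averaged prediction (total) equals the average entropy of the members (aleatoric) plus their disagreement, measured by the mutual information (epistemic). This is a standard decomposition used here for illustration, not necessarily the method pursued in this project.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def decompose_uncertainty(member_probs: List[List[float]]) -> Tuple[float, float, float]:
    """Split an ensemble's predictive uncertainty for one input.

    total     = H[mean_m p_m]      (entropy of the averaged prediction)
    aleatoric = mean_m H[p_m]      (expected entropy of the individual members)
    epistemic = total - aleatoric  (mutual information: disagreement between members)
    """
    m = len(member_probs)
    mean_p = [sum(col) / m for col in zip(*member_probs)]
    total = entropy(mean_p)
    aleatoric = sum(entropy(p) for p in member_probs) / m
    return total, aleatoric, total - aleatoric


# Usage: members agree that the input is ambiguous vs. members confidently disagree.
ambiguous = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])   # purely aleatoric
unfamiliar = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])  # purely epistemic
```

Both inputs have the same total uncertainty (ln 2), but only the decomposition reveals that the second one stems from model disagreement, the typical signature of an uncommon input.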

Contact: Tom Wollschläger, Dominik Fuchsgruber, Bertrand Charpentier

Hierarchies in Deep Learning

Type: Master's Thesis / Guided Research

Prerequisites:

Strong machine learning knowledge
Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

Description:

Multi-scale structures are ubiquitous in real-life datasets. For example, phylogenetic nomenclature naturally reveals a hierarchical classification of species based on their evolutionary history. Learning multi-scale structures can help to reveal natural and meaningful organization in the data and to obtain compact data representations. The goal of this project is to leverage multi-scale structures to improve the speed, performance, and understanding of deep learning models.

Contact: Marcel Kollovieh, Bertrand Charpentier
