# Open Topics

We offer multiple Bachelor's/Master's theses, Guided Research projects, and IDPs in the area of data mining/machine learning. A *non-exhaustive* list of open topics can be found below.

If you are interested in a thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email and we will arrange a meeting to talk about the potential topics.

### Machine Learning for Process Mining

**Type**: Master's Thesis

**Industrial partner**: Celonis

**Prerequisites**:

- Strong knowledge of Bayesian inference
- Knowledge of graphical models
- Good programming skills
- Interest in developing industry-relevant solutions

**Description**:

Celonis aims to provide a Digital Twin of business organizations based on event logs from IT systems, which can be used for what-if analyses and predictions. Technically, such a Digital Twin can be represented in the form of a graphical model whose structure (e.g. tasks, teams, routing rules) and parameters (e.g. processing times, number of employees) are to be estimated from data. In practice, however, the data is often inaccurate or incomplete, which makes this estimation highly challenging. The objective of this project is to employ and develop novel ML/inference techniques to accurately learn the characteristics of such a Digital Twin.
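To give the flavour of the problem, here is a minimal, hypothetical sketch of one ingredient: Bayesian estimation of a single task's processing rate from observed durations, using a conjugate Gamma prior on the rate of an exponential service-time model. The function name and prior choice are illustrative; a real Digital Twin would combine many such estimates within one graphical model.

```python
def posterior_rate(durations, prior_shape=1.0, prior_rate=1.0):
    """Gamma posterior over the rate of an exponential service-time model.

    With a Gamma(shape, rate) prior and exponentially distributed durations,
    the posterior is Gamma(shape + n, rate + sum of durations); we return
    its mean as a point estimate of the processing rate.
    """
    shape = prior_shape + len(durations)
    rate = prior_rate + sum(durations)
    return shape / rate

# Observed processing times (e.g. in hours) for one task in the event log.
mean_rate = posterior_rate([0.5, 1.5, 1.0, 1.0])
```

The conjugate update keeps inference in closed form; with incomplete or noisy logs, the same model would instead require approximate inference over the missing entries.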

**Contact**: Stephan Günnemann

**References**:

- Bayesian inference for queueing networks and modeling of internet services
- Automated discovery of business process simulation models from event logs

### Learning with Differential Equations

**Type:** Master's Thesis / Guided Research

**Prerequisites:**

- Graph neural networks & general machine learning
- Comfortable with at least one common deep learning library, such as PyTorch, JAX, or Flux.jl
- Preferred: Familiarity with ODEs and numerical methods for PDEs such as FDM, FVM, or FEM

**Description:** Differential equations are the natural language for describing many natural phenomena. Scientists use them to capture the behavior of fluids, populations, the electromagnetic field, and many other systems. Yet, until recently, the fields of deep learning and differential equations were largely separate. Bringing the two together promises significant advances in the data-driven modeling of the physical world. We want to develop new methods to improve simulations of physical systems as well as to infer the underlying dynamics from observational data.
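As a toy illustration of the neural-ODE idea from the references, the snippet below integrates known dynamics with an explicit Euler solver; in an actual model, the hand-written `f` would be a neural network and one would backpropagate through (or around) the solver. All names here are illustrative.

```python
def euler_integrate(f, x0, t0, t1, steps):
    """Explicit Euler solver for dx/dt = f(x): crude, but differentiable."""
    x = x0
    h = (t1 - t0) / steps
    for _ in range(steps):
        x = x + h * f(x)  # one Euler step
    return x

# Example dynamics: exponential decay dx/dt = -x; exact solution is x0 * e^(-t),
# so integrating to t = 1 should give roughly 0.3679.
x1 = euler_integrate(lambda x: -x, x0=1.0, t0=0.0, t1=1.0, steps=1000)
```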

**Contact:** Marten Lienen

**References:**

- Neural Ordinary Differential Equations
- Learning the Dynamics of Physical Systems from Sparse Observations with Finite Element Networks
- Fourier Neural Operator for Parametric Partial Differential Equations

### Diversity Active Learning with Deep Encoded Environment Models

**Type:** Master's Thesis

**Industrial partner:** BMW

**Prerequisites:**

- Strong knowledge in machine learning
- Knowledge of Autoencoders
- Good programming skills
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

**Description:**

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks. To deliver good results, these networks often require large amounts of complex annotated data for training. These annotations are often costly and redundant. Active learning is used to select the most informative samples for annotation. Diversity-based active learning aims to cover a dataset as well as possible with as little annotated data as possible. The effectiveness of selecting diverse driving scenarios for labeling has also been shown in the literature.

The objective of this thesis is to explore diversity-based active learning approaches that cover diverse driving scenarios.
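One common diversity-based selection strategy is k-center greedy (core-set selection): repeatedly pick the unlabeled point farthest from the already selected set in some embedding space. A minimal, illustrative sketch on toy 1-D "embeddings" (all names hypothetical):

```python
def k_center_greedy(points, initial, budget, dist):
    """Select `budget` extra indices that best cover `points` (max-min distance)."""
    selected = list(initial)
    # distance of every point to its nearest already-selected point
    min_d = [min(dist(p, points[s]) for s in selected) for p in points]
    for _ in range(budget):
        i = max(range(len(points)), key=lambda j: min_d[j])  # farthest point
        selected.append(i)
        # update nearest-selected distances with the newly picked point
        min_d = [min(min_d[j], dist(points[j], points[i]))
                 for j in range(len(points))]
    return selected

# Toy 1-D embeddings forming two clusters; starting from index 0 (value 0.0),
# the greedy step jumps to the far cluster instead of a redundant neighbor.
pts = [0.0, 0.1, 0.2, 5.0, 5.1]
chosen = k_center_greedy(pts, initial=[0], budget=1,
                         dist=lambda a, b: abs(a - b))
```

In the autonomous-driving setting, `points` would be learned scenario embeddings (e.g. from an autoencoder) rather than scalars.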

**Contact:** Sebastian Schmidt

### Machine Learning on Temporal Graphs

**Type**: Master's Thesis

**Industrial partner**: Huawei's Munich Research Center

**Prerequisites**:

- Strong knowledge in machine learning
- Interest in machine learning research on temporal graph-structured data
- Proficiency in Python and deep learning frameworks (TensorFlow or PyTorch)
- Fluency in English (written and spoken)

**Description**:

Huawei’s Munich Research Center is responsible for advanced technology research beyond Huawei’s product lines. The AI4Sec Team focuses on machine learning research for cybersecurity. Topics include large-scale data mining; un-, semi- and self-supervised machine learning; graph analytics and geometric deep learning; temporal sequence models; continual and contrastive learning; out-of-distribution detection; reinforcement learning; and NLP. The team addresses open problems in cybersecurity such as network analytics for advanced threat detection, malware detection, and information extraction from unstructured data. The focus of the thesis will be on machine learning on temporal graphs. Depending on the student’s strengths and interests, topics include temporal graph neural networks, robust community detection, out-of-distribution detection, and temporal point processes on graphs.

**Contact**: Stephan Günnemann

**References**:

- Robust Dynamic Clustering for Temporal Networks
- Anomaly Detection in Dynamic Graphs via Transformer
- Detecting Anomalous Event Sequences with Temporal Point Processes
- A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning
- A Comprehensive Survey on Graph Anomaly Detection with Deep Learning

### Graph Neural Networks for Combinatorial Optimization

**Type:** Master's thesis / guided research

**Prerequisites:**

- Strong machine learning knowledge, specifically graph neural networks
- Proficiency with Python and deep learning frameworks, preferably PyTorch
- Basic knowledge of combinatorial problems like graph colouring, TSP, Boolean satisfiability, etc.

**Description:**

Combinatorial optimization problems are notoriously hard to solve, yet are ubiquitous in numerous applications. Recently, there have been efforts to solve combinatorial problems such as the *Travelling Salesperson Problem* or the *Boolean Satisfiability Problem* with deep learning. As many of these problems can be represented as a graph, GNNs are most often used to leverage the graph structure of the problem and introduce useful invariances to the problem modelling. While there have been first successes, current models still struggle with generalization to large problem instances and, generally, competitiveness with traditional solvers. Potential topics could include constructing novel neural solvers or investigating the limitations of existing ones.
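For concreteness, here is a trivial heuristic for one of the problems above, graph colouring; neural solvers are typically compared against such greedy baselines as well as against exact solvers. Purely illustrative:

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest colour not used by its neighbours."""
    colors = {}
    for node in adjacency:
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

# A triangle is fully connected, so the greedy heuristic needs three colours.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
coloring = greedy_coloring(triangle)
```

A GNN-based solver would instead embed the same graph structure and predict colour assignments (or branching decisions) from node representations.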

**Contact:** Johanna Sommer

**References:**

- Attention, Learn to Solve Routing Problems!
- Learning TSP Requires Rethinking Generalization
- Learning a SAT Solver from Single-Bit Supervision
- Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon
- Generalization of Neural Combinatorial Solvers through the Lens of Adversarial Robustness

### Graph Neural Networks

**Type:** Master's thesis / Bachelor's thesis / guided research

**Prerequisites:**

- Strong machine learning knowledge
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)
- Knowledge of graph neural networks (e.g. GCN, MPNN)
- Knowledge of graph/network theory

**Description:**

Graph neural networks (GNNs) have recently achieved great successes in a wide variety of applications, such as chemistry, reinforcement learning, knowledge graphs, traffic networks, or computer vision. These models leverage graph data by updating node representations based on messages passed between nodes connected by edges, or by transforming node representation using spectral graph properties. These approaches are very effective, but many theoretical aspects of these models remain unclear and there are many possible extensions to improve GNNs and go beyond the nodes' direct neighbors and simple message aggregation.
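The message-passing update described above can be sketched in a few lines; this toy version sums neighbour features and adds them to the node's own state, omitting the learned weights and nonlinearities of a real GNN layer:

```python
def message_passing_step(features, adjacency):
    """One toy message-passing update.

    features:  {node: feature value}
    adjacency: {node: list of neighbour nodes}
    """
    new_features = {}
    for node, neighbors in adjacency.items():
        # sum is a permutation-invariant aggregation over neighbour messages
        aggregated = sum(features[n] for n in neighbors)
        # combine aggregated messages with the node's own representation
        new_features[node] = features[node] + aggregated
    return new_features

# Triangle graph: every node receives messages from the other two.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
feat = {0: 1.0, 1: 2.0, 2: 3.0}
out = message_passing_step(feat, adj)
```

Real layers replace the scalar features with vectors, apply learned transformations to messages, and stack several such steps to reach beyond direct neighbours.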

**Contact:** Simon Geisler

**References:**

- Semi-supervised classification with graph convolutional networks
- Relational inductive biases, deep learning, and graph networks
- Diffusion Improves Graph Learning
- Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks
- Reliable Graph Neural Networks via Robust Aggregation

### Physics-aware Graph Neural Networks

**Type:** Master's thesis / guided research

**Prerequisites:**

- Strong machine learning knowledge
- Proficiency with Python and deep learning frameworks (JAX or PyTorch)
- Knowledge of graph neural networks (e.g. GCN, MPNN, SchNet)
- Optional: Knowledge of machine learning on molecules and quantum chemistry

**Description:**

Deep learning models, especially graph neural networks (GNNs), have recently achieved great successes in predicting quantum mechanical properties of molecules. There is a vast amount of applications for these models, such as finding the best method of chemical synthesis or selecting candidates for drugs, construction materials, batteries, or solar cells. However, GNNs have only been proposed in recent years and there remain many open questions about how to best represent and leverage quantum mechanical properties and methods.
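One recurring design principle in this area (see, e.g., the directional message passing reference) is to build model inputs from interatomic distances, so that predictions are invariant to rotations and translations of the molecule. A toy, parameter-free illustration:

```python
import math

def pairwise_distances(positions):
    """Sorted pairwise distances: a rotation- and translation-invariant
    description of a point cloud (here standing in for atom positions)."""
    n = len(positions)
    return sorted(
        math.dist(positions[i], positions[j])
        for i in range(n) for j in range(i + 1, n)
    )

# The same triangular "molecule" before and after a 90-degree rotation:
mol = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rotated = [(0.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
```

Because both point sets yield identical distance lists, any model consuming such features predicts the same property for both poses, which is exactly the physical symmetry one wants to respect.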

**Contact:** Nicholas Gao

**References:**

- Directional Message Passing for Molecular Graphs
- Neural message passing for quantum chemistry
- Learning to Simulate Complex Physics with Graph Networks
- Ab initio solution of the many-electron Schrödinger equation with deep neural networks
- Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions
- Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

### Robustness Verification for Deep Classifiers

**Type:** Master's thesis / Guided research

**Prerequisites:**

- Strong machine learning knowledge (at least equivalent to IN2064 plus an advanced course on deep learning)
- Strong background in mathematical optimization (preferably in a machine learning setting)
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)
- Preferred: Knowledge of training techniques that make classifiers robust against small perturbations of the data

**Description**: Recent work shows that deep classifiers suffer from adversarial examples: misclassified points that are very close to the training samples or even visually indistinguishable from them. This undesired behaviour constrains the deployment of promising neural-network-based classifiers in safety-critical scenarios. Therefore, new training methods are needed that promote (or, preferably, provably ensure) robust behaviour of the classifier around training samples.
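As one example of a certification technique from the references, randomized smoothing classifies many Gaussian-perturbed copies of the input and takes the majority vote; the vote margin then yields a certifiable robustness radius. A toy sketch with a hypothetical 1-D base classifier:

```python
import random
from collections import Counter

def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of the base classifier under Gaussian input noise."""
    votes = Counter(base_classifier(x + rng.gauss(0.0, sigma))
                    for _ in range(n_samples))
    return votes.most_common(1)[0][0]

base = lambda x: int(x > 0.0)  # toy base classifier: the sign of x
rng = random.Random(0)
# x = 1.0 is four noise standard deviations from the decision boundary,
# so the smoothed classifier votes for class 1 with overwhelming margin.
label = smoothed_predict(base, x=1.0, sigma=0.25, n_samples=200, rng=rng)
```

The certified-radius computation (a function of the vote probabilities and `sigma`) is omitted here; see the randomized smoothing reference for the full procedure.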

**Contact:** Aleksei Kuvshinov

**References:**

- Certified Adversarial Robustness via Randomized Smoothing
- Formal guarantees on the robustness of a classifier against adversarial manipulation
- Towards deep learning models resistant to adversarial attacks
- Provable defenses against adversarial examples via the convex outer adversarial polytope
- Certified defenses against adversarial examples
- Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks
- Provable robustness of relu networks via maximization of linear regions

### Efficient Machine Learning Models

**Type:** Master's Thesis / Guided Research / Hiwi

**Prerequisites:**

- Strong knowledge in machine learning
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

**Description:**

The efficiency of machine learning algorithms is commonly evaluated in terms of target performance, speed, and memory footprint. Reducing the costs associated with these metrics is of primary importance for real-world applications with limited resources (e.g. embedded systems, real-time predictions). In this project, we aim to investigate solutions that improve the efficiency of machine learning models by combining multiple techniques.
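As a concrete instance of one technique from the references, the sketch below performs toy post-training weight quantization: float weights are mapped to signed 8-bit integers with a uniform scale, then dequantized. Real frameworks add per-channel scales, zero points, and calibration; this is only illustrative:

```python
def quantize(weights, num_bits=8):
    """Uniform symmetric quantization of floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]  # integer codes
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25]
q, s = quantize(w)
w_hat = dequantize(q, s)  # close to w, but stored in 8 bits per weight
```

The memory footprint drops roughly 4x versus 32-bit floats, at the cost of a small, bounded reconstruction error per weight.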

**Contact:** Bertrand Charpentier

**References:**

- The Efficiency Misnomer
- A Gradient Flow Framework for Analyzing Network Pruning
- Distilling the Knowledge in a Neural Network
- A Survey of Quantization Methods for Efficient Neural Network Inference

### Uncertainty Estimation in Deep Learning

**Type:** Master's Thesis / Guided Research

**Prerequisites:**

- Strong knowledge in machine learning
- Strong knowledge in probability theory
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

**Description:**

Safe prediction is a key feature in many intelligent systems. Classically, machine learning models compute output predictions regardless of the underlying uncertainty of the encountered situations. In contrast, aleatoric and epistemic uncertainty bring knowledge about ambiguous and uncommon situations. The uncertainty view can substantially help to detect and explain unsafe predictions, and therefore make ML systems more robust. The goal of this project is to improve uncertainty estimation in ML models across various types of tasks.
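A minimal illustration of an uncertainty score: the predictive entropy of a classifier's output distribution, which is high for ambiguous inputs. This is one basic ingredient, not a full aleatoric/epistemic decomposition as studied in the references:

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predictive class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A confident prediction has low entropy; a uniform one attains the
# maximum, log(K) for K classes.
confident = predictive_entropy([0.98, 0.01, 0.01])
uncertain = predictive_entropy([1 / 3, 1 / 3, 1 / 3])
```

Thresholding such a score is a simple way to flag inputs for abstention or human review; the referenced approaches go further and model the distribution over the predicted probabilities themselves.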

**Contact:** Bertrand Charpentier

**References:**

- Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
- Predictive Uncertainty Estimation via Prior Networks
- Posterior Network: Uncertainty Estimation without OOD samples via Density-based Pseudo-Counts
- Evidential Deep Learning to Quantify Classification Uncertainty
- Weight Uncertainty in Neural Networks

### Hierarchies in Deep Learning

**Type:** Master's Thesis / Guided Research

**Prerequisites:**

- Strong machine learning knowledge
- Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

**Description:**

Multi-scale structures are ubiquitous in real-life datasets. For example, phylogenetic nomenclature naturally reveals a hierarchical classification of species based on their evolutionary history. Learning multi-scale structures can help to reveal natural and meaningful organization in the data and to obtain compact data representations. The goal of this project is to leverage multi-scale structures to improve the speed, performance, and understanding of deep learning models.
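For intuition, a classical bottom-up way to build a hierarchy is agglomerative clustering: repeatedly merge the two closest clusters and record the merge order. The references learn such hierarchies with gradient-based methods instead; this toy single-linkage version on 1-D points is only illustrative:

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merge history: a list of (left cluster, right cluster)
    pairs, from the first (closest) merge to the final one.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # pick the pair of clusters with the smallest minimum distance
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: min(abs(x - y)
                               for x in clusters[ab[0]]
                               for y in clusters[ab[1]]),
        )
        merges.append((clusters[i][:], clusters[j][:]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Two nearby points merge first; the distant point joins last.
history = agglomerate([0.0, 0.1, 5.0])
```

The merge history is exactly the binary tree that hierarchy-learning methods parameterize and optimize directly.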

**Contact:** Bertrand Charpentier, Daniel Zuegner

**References:**

- Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering
- Hierarchical Graph Representation Learning with Differentiable Pooling
- Gradient-based Hierarchical Clustering
- Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space