Our group will present six papers at NeurIPS 2025, including one spotlight presentation. Congratulations to all authors!
What Expressivity Theory Misses: Message Passing Complexity for GNNs (Spotlight)
(Niklas Kemper, Tom Wollschläger, Stephan Günnemann)
Graph Neural Networks (GNNs) are typically analyzed through expressivity theory by characterizing their ability to distinguish non-isomorphic graphs. This framework has driven extensive research toward developing more expressive architectures, under the assumption that higher expressivity translates to better empirical performance. But what if this focus is misguided? We challenge this prevailing paradigm by showing that expressivity theory relies on unrealistic assumptions and provides only binary characterizations that miss practical limitations like over-squashing and under-reaching. To narrow this theory-practice gap, we introduce Message Passing Complexity (MPC): a continuous measure that quantifies how difficult it is for information to propagate through graph structures to solve specific tasks. Using fundamental graph tasks as testbeds, we demonstrate that MPC's predictions correlate strongly with empirical performance, revealing that success often depends not on maximizing expressivity but on minimizing task-specific complexity through appropriate architectural choices.
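The paper's exact MPC definition is not reproduced here, but the core intuition, that propagation difficulty is continuous rather than binary, can be illustrated with a toy proxy. The sketch below uses a hypothetical scoring function of our own (not the paper's measure) that penalizes long paths (under-reaching) and high-degree intermediaries (a crude stand-in for over-squashing):

```python
# Toy proxy for how hard it is for information to travel between two
# nodes. NOT the paper's MPC definition, just an illustration of the
# idea that propagation difficulty is a continuous quantity.
import math
import networkx as nx

def propagation_difficulty(G: nx.Graph, src: int, dst: int) -> float:
    """Path length (under-reaching) plus log-degree dilution along the
    path (a crude over-squashing signal)."""
    path = nx.shortest_path(G, src, dst)
    # Messages routed through high-degree nodes compete with many others.
    dilution = sum(math.log(max(G.degree(v), 1)) for v in path[1:-1])
    return (len(path) - 1) + dilution

# Two cliques joined by a short path: a classic bottleneck graph.
G = nx.barbell_graph(5, 2)
print(propagation_difficulty(G, 0, 11))  # across the bottleneck: high
print(propagation_difficulty(G, 0, 1))   # within one clique: low
```

On such a graph, a task that only needs clique-local information scores low, while one that must route information across the bottleneck scores high, regardless of how expressive the architecture is.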
Joint Relational Database Generation via Graph-Conditional Diffusion Models
(Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann)
More than 70% of the world's structured data is stored in relational databases (RDBs). However, access to this data is often restricted because it is private and sensitive. To overcome this problem, synthetic data generation has recently emerged as a promising way to share data safely. Prior methods for RDB generation often rely on rigid sequential generation, limiting flexibility and accuracy. In this paper, we introduce the first non-autoregressive generative model for relational databases that jointly models all tables in an RDB without imposing any specific table order. Viewing RDBs as large heterogeneous graphs, we propose the Graph-Conditional Relational Diffusion Model (GRDM), which leverages graph neural networks to jointly denoise row attributes and capture complex inter-table dependencies. GRDM overcomes the limitations of prior autoregressive approaches and sets a new state of the art for relational database generation.
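To make the "RDBs as heterogeneous graphs" view concrete, here is a minimal sketch using torch_geometric's HeteroData: each row becomes a node of its table's type, and foreign keys become typed edges. The table names, sizes, and random features are illustrative placeholders, not GRDM's actual encoding:

```python
# Minimal sketch: an RDB with two tables as a heterogeneous graph.
# Table/column names and featurization are made up for illustration.
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
# One node type per table; each row is a node with its encoded attributes.
data["customer"].x = torch.randn(100, 8)  # 100 customers, 8 encoded columns
data["order"].x = torch.randn(500, 4)     # 500 orders, 4 encoded columns

# Foreign keys become typed edges: order.customer_id -> customer.id
customer_id = torch.randint(0, 100, (500,))
data["order", "placed_by", "customer"].edge_index = torch.stack(
    [torch.arange(500), customer_id]
)

# A heterogeneous GNN denoiser can now pass messages across tables, so
# reconstructing a row's attributes can use inter-table context.
print(data)
```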
TreeGen: A Bayesian Generative Model for Hierarchies with Application to Jet Clustering
(Marcel Kollovieh, Nils Fleischmann, Filippo Guerranti, Bertrand Charpentier, Stephan Günnemann)
Hierarchical structures are central to science, from phylogenies to jet-clustering in high-energy physics. We present TreeGen, a generative framework that models distributions over hierarchies and transitions smoothly from probabilistic to discrete trees. Built on Bayesian Flow Networks and Bayesian Sample Inference, TreeGen evolves a belief encoding a distribution of trees rather than manipulating single instances. Trained on jets, TreeGen generates samples that closely match ground-truth log-likelihoods and adhere to physical constraints, outperforming traditional clustering baselines.
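To give a flavor of what "evolving a belief over trees" means, the toy sketch below represents a hierarchy as a categorical distribution over each node's parent and sharpens it with noisy evidence via Bayesian updates. This mirrors the spirit of Bayesian Flow Networks, not TreeGen itself; the parent-assignment parameterization and the Dirichlet "sender" are our own simplifications:

```python
# Toy belief over trees: each node i+1 holds a categorical distribution
# over its possible parents (earlier nodes). Repeated noisy observations
# sharpen the belief from uniform toward a single discrete tree.
import numpy as np

rng = np.random.default_rng(0)
true_parent = [0, 0, 1, 1, 3]  # node i+1 attaches to true_parent[i]

# Uniform initial belief over each node's valid parents.
belief = [np.ones(i + 1) / (i + 1) for i in range(len(true_parent))]

for step in range(20):
    for i, p in enumerate(true_parent):
        # Noisy observation favoring the true parent (Dirichlet sample).
        alpha = np.where(np.arange(i + 1) == p, 5.0, 0.5)
        obs = rng.dirichlet(alpha)
        # Bayesian update: posterior proportional to prior * likelihood.
        belief[i] = belief[i] * obs
        belief[i] /= belief[i].sum()

print([int(b.argmax()) for b in belief])  # concentrates on the true parents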
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
(Kristiyan Sakalyan*, Alessandro Palma*, Filippo Guerranti*, Fabian J Theis, Stephan Günnemann)
Understanding the evolution of cellular microenvironments is essential for deciphering tissue development and disease progression. While spatial transcriptomics now enables high-resolution mapping of tissue organization across space and time, current techniques that analyze cellular evolution operate at the single-cell level, overlooking critical spatial relationships. We introduce NicheFlow, a flow-based generative model that infers the temporal trajectory of cellular microenvironments across sequential spatial slides. By representing local cell neighborhoods as point clouds, NicheFlow jointly models the evolution of cell states and coordinates using optimal transport and Variational Flow Matching. Our approach successfully recovers both global spatial architecture and local microenvironment composition across diverse spatio-temporal datasets, from embryonic development to brain development.
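Below is a generic conditional flow matching training step with an optimal-transport pairing between two point clouds, as a simplified stand-in for NicheFlow's training objective (which is variational and additionally models cell states). The tiny MLP and all names are placeholders:

```python
# Generic flow matching with OT pairing between two point clouds:
# a simplified stand-in, not NicheFlow's actual objective.
import torch
from scipy.optimize import linear_sum_assignment

def fm_loss(model, x0, x1):
    """x0: cells at the earlier slide, x1: at the later slide, both (n, d)."""
    # OT pairing: the Hungarian algorithm is exact OT for two equal-size
    # clouds with uniform weights.
    cost = torch.cdist(x0, x1)
    _, col = linear_sum_assignment(cost.detach().numpy())
    x1 = x1[col]
    t = torch.rand(x0.shape[0], 1)
    xt = (1 - t) * x0 + t * x1   # straight-line interpolant
    target_v = x1 - x0           # its constant velocity
    pred_v = model(xt, t)
    return ((pred_v - target_v) ** 2).mean()

net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 2))
x0, x1 = torch.randn(128, 2), torch.randn(128, 2) + 2.0
loss = fm_loss(lambda x, t: net(torch.cat([x, t], -1)), x0, x1)
loss.backward()
```

The OT pairing matters: without it, randomly paired cells produce crossing trajectories and a high-variance regression target; with it, the learned flow transports neighborhoods along short, coherent paths.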
Consistent Sampling and Simulation: Molecular Dynamics with Energy-Based Diffusion Models
(Michael Plainer, Hao Wu, Leon Klein, Stephan Günnemann, Frank Noé)
Diffusion models have recently shown strong performance in scientific domains, including biochemistry. When trained on molecular data, such models can be used in two ways: (i) to generate independent molecular states through classical diffusion sampling, and (ii) to extract the forces acting on the molecules, enabling molecular dynamics simulations. This dual capability makes it possible to study both global equilibrium behavior and dynamical properties using the same model. However, while sampling works well, simulating dynamics directly with diffusion models is (as we show) an ill-posed problem and often leads to inconsistent results. In our work, we introduce a Fokker-Planck-based regularization that enforces consistency between sampling and simulation. With this, our approach is capable of sampling and simulating larger proteins such as BBA with diffusion models, without requiring data labels or physical priors. Moreover, we demonstrate that our method can generalize across molecules, enabling transferable models that can be applied beyond the training system.
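The dual use of a diffusion model rests on a standard identity: near t = 0 the learned score approximates ∇ log p(x), and for a Boltzmann density p ∝ exp(-U/kT) the force is F = -∇U = kT ∇ log p(x). The sketch below drives overdamped Langevin dynamics from a score model; the score_model here is a stand-in, and the paper's Fokker-Planck regularization, which is what makes this consistent in practice, is not shown:

```python
# Driving molecular dynamics from a diffusion model's score.
# For Boltzmann p(x) ~ exp(-U(x)/kT): force = -grad U = kT * grad log p.
import torch

def langevin_step(x, score_model, kT=1.0, gamma=1.0, dt=1e-3):
    with torch.no_grad():
        t = torch.zeros(x.shape[0])      # evaluate the score at t ~ 0
        force = kT * score_model(x, t)   # F = kT * grad log p(x)
        noise = torch.randn_like(x)
        # Overdamped Langevin (Euler-Maruyama) update.
        return x + dt / gamma * force + (2 * kT * dt / gamma) ** 0.5 * noise

# Toy stand-in score model: standard Gaussian => score(x) = -x.
score_model = lambda x, t: -x
x = torch.randn(10, 3)
for _ in range(1000):
    x = langevin_step(x, score_model)
```

The catch the paper addresses: a score that is accurate enough for sampling can still have errors that accumulate step after step in such a simulation, which is why an additional consistency constraint is needed.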
Cached Token Similarity Is a Strong Prior for Fine-grained Visual Question Answering
(Liangyu Zhong*, Fabio Rosenthal*, Joachim Sicking, Fabian Hüger, Thorsten Bagdonat, Hanno Gottschalk, Leo Schwinn)
While Multimodal Large Language Models (MLLMs) offer strong perception and reasoning capabilities for image-text input, fine-grained Visual Question Answering (VQA) focusing on small details remains a challenge. Although visual cropping techniques seem promising, recent approaches have several limitations: the need for task-specific fine-tuning, low efficiency due to uninformed exhaustive search, or incompatibility with efficient attention implementations. We address these shortcomings by proposing a training-free visual cropping method, dubbed FOCUS, that leverages MLLM-internal representations to guide the search for the most relevant image region. This is accomplished in four steps: first, we identify the target object(s) in the prompt; second, we compute an object relevance map using the key-value (KV) cache; third, we propose and rank relevant image regions based on the map; and finally, we perform the fine-grained VQA task using the top-ranked region. As a result of this informed search strategy, our method achieves strong performance across four fine-grained VQA datasets and two types of MLLMs. It outperforms three existing visual cropping methods in both accuracy and efficiency, and matches the best-performing baseline, ZoomEye, with 3–6.5× higher efficiency. Finally, we perform an ablation study to assess the impact of key design choices.
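The relevance-map step can be sketched with plain tensor operations: score every cached visual token against the prompt's object tokens using vectors already sitting in the KV cache, then crop around the hottest window. The shapes, the cosine-similarity choice, and the pooling-based region search below are illustrative assumptions, not FOCUS's exact procedure:

```python
# Sketch of cached-token-similarity cropping. Shapes and the scoring
# details are assumptions for illustration, not FOCUS's exact pipeline.
import torch
import torch.nn.functional as F

def relevance_map(vis_keys, obj_keys, grid=(24, 24)):
    """vis_keys: (n_vis, d) cached keys of image tokens;
    obj_keys: (n_obj, d) cached keys of target-object prompt tokens."""
    sim = F.normalize(vis_keys, dim=-1) @ F.normalize(obj_keys, dim=-1).T
    return sim.max(dim=-1).values.reshape(grid)  # best match per patch

def top_crop(rmap, win=8):
    """Slide a win x win window over the map, return the hottest region."""
    scores = F.avg_pool2d(rmap[None, None], win, stride=1)[0, 0]
    idx = scores.argmax()
    r, c = divmod(idx.item(), scores.shape[1])
    return r, c, win  # top-left patch coordinates + window size

rmap = relevance_map(torch.randn(576, 64), torch.randn(3, 64))
print(top_crop(rmap))
```

Because the key vectors are computed during the normal prefill pass anyway, this kind of search adds little overhead compared to exhaustively re-running the MLLM on candidate crops.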