Our group will present seven papers at ICML 2025, two of them as spotlight presentations. In addition, we will present one paper at CVPR 2025. Congratulations!
ICML 2025:
Privacy Amplification by Structured Subsampling for Deep Differentially Private Time Series Forecasting (Spotlight)
(Jan Schuchardt, Mina Dalirrooyfard, Jed Guzelkabaagac, Anderson Schneider, Yuriy Nevmyvaka, Stephan Günnemann)
Differentially Private Stochastic Gradient Descent (DP-SGD) is the standard method for training machine learning models on sensitive data with strong formal privacy guarantees. The core principle underlying these strong privacy guarantees is amplification by subsampling: Training on randomly sampled batches is much more private than training on an entire set of input–label pairs. But what if our training data is not simply an unstructured set, but composed of sequentially structured data like natural language or time series? What if there are no explicit labels, and we are instead training our model to predict the next sentence or the next 24 hours? We answer this question by deriving formal privacy guarantees for self-supervised training of sequence models. In particular, we analyze the interplay of sampling sequences from a dataset, sampling shorter subsequences from these sequences, and splitting them into context and ground-truth for training. Using time series forecasting as a testbed, we experimentally demonstrate that our tight privacy guarantees enable private training on sequential data while retaining high model utility.
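To make the analyzed setting concrete, the sketch below shows one structured subsampling pipeline of the kind the guarantees cover: series are Poisson-subsampled into a batch, a random subsequence is drawn from each selected series, and each subsequence is split into context and forecasting target. The function name, sampling rate, window length, and horizon are illustrative placeholders, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(series, batch_rate=0.1, window=48, horizon=24):
    """Illustrative structured subsampling for self-supervised sequence training.

    1) Poisson-subsample which series enter the batch (first amplification source),
    2) draw a random subsequence from each selected series (second source),
    3) split the subsequence into context and forecasting target (no explicit labels).
    """
    batch = []
    for x in series:                                   # x: 1-D array, one series
        if rng.random() >= batch_rate:                 # Poisson subsampling of series
            continue
        start = rng.integers(0, len(x) - window - horizon + 1)
        chunk = x[start:start + window + horizon]      # random subsequence
        context, target = chunk[:window], chunk[window:]
        batch.append((context, target))
    return batch

# toy dataset: 100 series of length 500
data = [rng.standard_normal(500) for _ in range(100)]
print(len(sample_batch(data)), "context/target pairs in this batch")
```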
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
(Simon Geisler, Tom Wollschläger, M. H. I. Abdalla, Vincent Cohen-Addad, Johannes Gasteiger, Stephan Günnemann)
Existing optimization-based LLM attacks often fail due to non-adaptive, single-target objectives inconsistent with actual harmful behavior ("affirmative response"). We reformulate the attack using reinforcement learning, introducing an adaptive, distributional, and semantic objective derived via REINFORCE policy gradients. This approach optimizes the probability of generating harmful content based on the model's actual output distribution, rather than just a fixed target. Applying this to state-of-the-art GCG and PGD attacks substantially increases their effectiveness, for instance, boosting the attack success rate (ASR) from 2% to 50% against Llama 3 with a circuit-breaker defense. Our work demonstrates a more potent method for uncovering LLM weaknesses, highlighting the need for adaptive objectives in rigorous safety assessments.
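The core idea can be sketched in a few lines of PyTorch: instead of maximizing the likelihood of one fixed affirmative target string, sample completions, score them with a harmfulness judge, and use a REINFORCE estimator to differentiate the expected reward with respect to the attack parameters. Everything below is a toy illustration rather than the paper's implementation: the random logits stand in for the attacked model's conditional distribution under the adversarial prompt, and `judge` is a hypothetical placeholder for a harmfulness classifier.

```python
import torch

torch.manual_seed(0)
vocab, steps, n_samples = 50, 8, 16

# Toy stand-in for the attack's differentiable parameters; in the real setting
# the distribution over completions comes from the victim model conditioned on
# the (optimized) adversarial prompt.
logits = torch.randn(steps, vocab, requires_grad=True)

def judge(seq):
    # Hypothetical reward: 1.0 if the sampled completion is judged harmful.
    # A real attack would call a harmfulness judge / classifier here.
    return float(seq.sum() % 2 == 0)

dist = torch.distributions.Categorical(logits=logits)      # one distribution per step
samples = dist.sample((n_samples,))                        # (n_samples, steps)
log_probs = dist.log_prob(samples).sum(dim=-1)             # log p(y | x) per sample
rewards = torch.tensor([judge(s) for s in samples])

# REINFORCE estimate of the expected-reward gradient: weight each sample's
# log-likelihood by its (baseline-subtracted) reward instead of forcing one
# fixed affirmative target.
loss = -((rewards - rewards.mean()) * log_probs).mean()
loss.backward()
print("gradient norm w.r.t. attack parameters:", logits.grad.norm().item())
```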
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
(Tom Wollschläger, Jannes Elstner, Simon Geisler, Vincent Cohen-Addad, Stephan Günnemann, Johannes Gasteiger)
Prior work suggests that Large Language Model (LLM) refusal behavior is governed by a single linear direction in activation space, a view that fails to fully explain how safety alignment is bypassed. We challenge this view by introducing a novel gradient-based representation engineering approach, Refusal Direction Optimization (RDO), to precisely identify refusal-mediating directions, overcoming limitations of prompt-based methods such as Difference-in-Means (DIM). Our method reveals that refusal is mediated not by single directions but by multi-dimensional concept cones. Furthermore, we introduce representational independence, a stricter criterion than orthogonality, to dissect how interventions interact across layers and identify mechanistically distinct refusal pathways.
Applying RDO, we successfully identify multi-dimensional refusal cones and multiple representationally independent refusal directions, demonstrating that refusal mechanisms are geometrically complex and driven by several distinct factors. RDO directions achieve higher attack success rates with fewer side effects compared to DIM.
Our work provides a more nuanced understanding of LLM refusal geometry and offers a powerful gradient-based tool for analyzing and manipulating model behavior, crucial for advancing LLM safety and interpretability.
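As a rough illustration of the kind of interventions such directions are evaluated with, the sketch below ablates a refusal direction from hidden activations and builds a direction inside a concept cone as a non-negative combination of basis directions. The shapes, the basis, and the helper names are hypothetical, and the gradient-based RDO objective itself is not shown.

```python
import torch

def ablate_direction(h, d):
    """Remove the component of hidden states h along a unit-normalized refusal
    direction d -- the basic intervention used to test whether a direction
    mediates refusal."""
    d = d / d.norm()
    return h - (h @ d)[..., None] * d

def cone_direction(basis, weights):
    """A point inside a refusal concept cone: a non-negative combination of
    basis directions, illustrating that many distinct directions can mediate refusal."""
    w = torch.clamp(weights, min=0.0)      # cone = non-negative combinations
    v = basis.T @ w
    return v / v.norm()

torch.manual_seed(0)
hidden = torch.randn(4, 16, 64)            # (batch, tokens, d_model) toy activations
basis = torch.randn(3, 64)                 # 3 hypothetical cone basis directions
d = cone_direction(basis, torch.rand(3))
out = ablate_direction(hidden, d)
print("residual component along d:", (out @ d).abs().max().item())
```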
Uncertainty Estimation for Heterophilic Graphs Through the Lens of Information Theory
(Dominik Fuchsgruber*, Tom Wollschläger*, Johannes Bordne, Stephan Günnemann)
Existing uncertainty estimation methods for graphs often rely on homophily assumptions and consequently deteriorate in heterophilic settings where nodes connect to dissimilar neighbors.
We analyze Message Passing Neural Networks (MPNNs) from an information-theoretic perspective, deriving a graph-specific analog to the Data Processing Inequality. This reveals that, unlike for i.i.d. data or homophilic graphs, target-relevant information can increase with network depth in heterophilic settings, as different layers capture distinct semantic information from neighbors. Our analysis establishes a key design principle: robust uncertainty estimation on graphs, particularly under heterophily, requires jointly considering all latent node representations throughout the MPNN. We implement this principle with Joint Latent Density Estimation (JLDE), a simple post-hoc density estimator applied to the concatenated embeddings across layers. JLDE achieves state-of-the-art epistemic uncertainty on various heterophilic graph benchmarks and distribution shifts, while matching prior methods on homophilic graphs without explicit homophily exploitation. Our work offers a theoretically grounded approach for reliable uncertainty estimation beyond the homophily constraint.
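A minimal post-hoc sketch of this design principle is shown below, with random arrays standing in for the MPNN's per-layer node embeddings and a Gaussian mixture as one illustrative choice of density model (the concrete estimator and dimensions are assumptions for the example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy stand-in for an MPNN: per-layer node embeddings h^(0), ..., h^(L).
# In the actual method these come from the trained message-passing network.
n_nodes, dims = 500, [16, 32, 32]
layer_embeddings = [rng.standard_normal((n_nodes, d)) for d in dims]

# Design principle: use ALL latent representations jointly, since under
# heterophily different layers carry different task-relevant signal.
joint = np.concatenate(layer_embeddings, axis=1)       # (n_nodes, sum(dims))

# Post-hoc density estimator on the joint embedding space;
# low log-density = high epistemic uncertainty.
density = GaussianMixture(n_components=5, covariance_type="diag", random_state=0)
density.fit(joint)
epistemic_uncertainty = -density.score_samples(joint)  # per-node uncertainty scores
print(epistemic_uncertainty[:5])
```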
Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
(Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn)
Despite recent advances in subquadratic attention mechanisms and state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we present the first investigation of token merging in time series analysis, covering both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme, enabling token merging in transformer decoders. Further, we identify spectral properties of the input data that reliably predict the potential benefits of local merging without requiring evaluation on downstream tasks. Our comprehensive empirical evaluation demonstrates that local merging offers substantial efficiency gains with minimal impact on accuracy, achieving up to 5400% acceleration on the recently proposed Chronos foundation model.
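The sketch below conveys the basic mechanic in strongly simplified form: tokens are averaged within small non-overlapping local neighborhoods, so the sequence length (and thus downstream cost) shrinks while only local context is mixed. The window size and the plain averaging are illustrative simplifications, not the actual merging algorithm from the paper.

```python
import torch

def local_merge(tokens, window=2):
    """Simplified illustration of local token merging: average the tokens in each
    non-overlapping neighborhood of `window` consecutive positions, reducing the
    sequence length while mixing only local context (the locality is what makes
    such a scheme compatible with causal decoders)."""
    b, n, d = tokens.shape
    n = n - n % window                                   # drop a ragged tail for simplicity
    groups = tokens[:, :n].reshape(b, n // window, window, d)
    merged = groups.mean(dim=2)                          # one token per local neighborhood
    return merged

x = torch.randn(4, 512, 64)                              # (batch, sequence, channels)
print(x.shape, "->", local_merge(x, window=2).shape)     # 512 tokens -> 256 tokens
```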
UnHiPPO: Uncertainty-aware Initialization for State Space Models
(Marten Lienen, Abdullah Saydemir, Stephan Günnemann)
State Space Models (SSMs) are quickly becoming a go-to architecture for handling sequential data, thanks in part to their ability to efficiently capture long-range dependencies. At the heart of their success lies the HiPPO framework, which enables expressive and efficient signal representations, but comes with a critical limitation: it fundamentally assumes noise-free data. Our latest work, UnHiPPO, tackles this head-on by introducing a principled way to account for measurement uncertainty in the initialization of SSMs. By reinterpreting the problem through the lens of linear stochastic control theory, we derive an initialization scheme that naturally filters out noise, enabling SSMs to retain their impressive memory capabilities even in realistic, noisy environments. We demonstrate that UnHiPPO fortifies state space models against noise both during training and at inference time, paving the way for more reliable sequence modeling in scientific and real-world applications.
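For context, the snippet below constructs the standard noise-free HiPPO-LegS matrices commonly used to initialize SSM dynamics; UnHiPPO replaces this derivation with an uncertainty-aware one, whose resulting matrices are not reproduced here.

```python
import numpy as np

def hippo_legs(n_state):
    """Standard (noise-free) HiPPO-LegS matrices, the usual SSM initialization.
    UnHiPPO derives a noise-aware alternative; those matrices are not shown here."""
    A = np.zeros((n_state, n_state))
    for n in range(n_state):
        for k in range(n_state):
            if n > k:
                A[n, k] = np.sqrt(2 * n + 1) * np.sqrt(2 * k + 1)
            elif n == k:
                A[n, k] = n + 1
    B = np.sqrt(2 * np.arange(n_state) + 1)
    return -A, B          # dynamics dx/dt = -A x + B u (time scaling omitted)

A, B = hippo_legs(8)
print(A.shape, B.shape)
```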
Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation (Spotlight)
(Alessandro Palma, Sergei Rybakov, Leon Hetzel, Stephan Günnemann, Fabian J Theis)
Latent space interpolations are a powerful tool for navigating deep generative models, especially in applied contexts where they capture system trajectories on the manifolds of high-dimensional data. An example is single-cell RNA sequencing (scRNA-seq), where existing methods model cellular state transitions as latent space interpolations with variational autoencoders, often assuming linear shifts and Euclidean geometry. However, unless explicitly enforced, linear interpolations in the latent space may not correspond to geodesic paths on the data manifold, limiting methods that assume Euclidean geometry in the representation space. We introduce FlatVI, a novel training framework that enforces Euclidean geometry in the latent manifold of discrete-likelihood variational autoencoders, specifically tailored for modelling single-cell count data. By regularising straight lines in the latent space to approximate geodesic interpolations on the decoded single-cell manifold, FlatVI enhances compatibility with downstream analyses that assume Euclidean latent geometry. Results on synthetic data validate our approach theoretically. At the same time, applications to temporally resolved scRNA-seq datasets demonstrate improved reconstruction of cellular trajectories and more accurate inference of biologically meaningful velocity fields.
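One way such a constraint can be phrased in code (an illustrative regulariser, not necessarily FlatVI's exact formulation) is to push the decoder's pullback metric J(z)^T J(z) towards a scaled identity, so that straight latent lines decode to approximately geodesic paths; the decoder architecture and dimensions below are placeholders.

```python
import torch

torch.manual_seed(0)
latent_dim, data_dim = 8, 50
decoder = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, data_dim)
)

def flatness_penalty(z, scale=1.0):
    """Penalize deviation of the decoder's pullback metric J(z)^T J(z) from a
    scaled identity at latent point z (pass create_graph=True to the jacobian
    call to backpropagate through this penalty during training)."""
    J = torch.autograd.functional.jacobian(decoder, z)   # (data_dim, latent_dim)
    metric = J.T @ J
    eye = scale * torch.eye(latent_dim)
    return ((metric - eye) ** 2).mean()

z = torch.randn(latent_dim)
print("flatness penalty at a random latent point:", flatness_penalty(z).item())
```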
CVPR 2025:
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
(Sebastian Schmidt, Leonard Schenk, Leo Schwinn, Stephan Günnemann)
As the data demand for deep learning models increases, active learning (AL) becomes essential to strategically select samples for labeling, maximizing data efficiency and reducing training costs. Real-world scenarios necessitate the consideration of incomplete data knowledge within AL: prior works address handling out-of-distribution (OOD) data, while another research direction has focused on category discovery. However, a combined treatment of AL with both OOD data and category discovery remains unexplored. Our work addresses this gap by proposing Joint Out-of-distribution filtering and data Discovery Active learning (Joda), which tackles both challenges simultaneously by filtering out OOD data before selecting candidates for labeling. In contrast to previous methods, we deeply entangle the training procedure with filtering and selection to construct a common feature space that aligns known and novel categories while separating OOD samples. The resulting method is highly efficient and effective, and it requires neither auxiliary models nor training access to the unlabeled pool for filtering or selection.
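A schematic filter-then-select round in the spirit of Joda is sketched below; the OOD score (distance to the labeled centroid) and the acquisition score (predictive entropy) are simple placeholders for the jointly learned feature space and selection criterion used in the paper, and all names and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_then_select(features, probs, labeled_mask, budget=32, ood_quantile=0.2):
    """Illustrative active learning round: filter likely-OOD samples first,
    then select the most informative of the remaining unlabeled samples."""
    unlabeled = np.where(~labeled_mask)[0]

    # 1) placeholder OOD score: distance to the centroid of labeled features
    centroid = features[labeled_mask].mean(axis=0)
    ood_score = np.linalg.norm(features[unlabeled] - centroid, axis=1)
    keep = unlabeled[ood_score <= np.quantile(ood_score, 1 - ood_quantile)]

    # 2) placeholder acquisition score: predictive entropy of the kept samples
    entropy = -(probs[keep] * np.log(probs[keep] + 1e-9)).sum(axis=1)
    return keep[np.argsort(entropy)[-budget:]]           # indices sent for labeling

n, d, c = 1000, 32, 10
features = rng.standard_normal((n, d))
probs = rng.dirichlet(np.ones(c), size=n)
labeled = np.zeros(n, dtype=bool)
labeled[:50] = True
print(filter_then_select(features, probs, labeled).shape)
```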