Our group will present two papers at ICCV 2025. Congratulations!
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
(Sebastian Schmidt*, Julius Koerner*, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan Günnemann)
In panoptic segmentation, individual instances must be separated within semantic classes. As state-of-the-art methods rely on a pre-defined set of classes, they struggle with novel categories and out-of-distribution (OOD) data. This is particularly problematic in safety-critical applications, such as autonomous driving, where reliability in unseen scenarios is essential. We address the gap between outstanding benchmark performance and reliability by proposing Prior2Former (P2F), the first approach for segmentation vision transformers rooted in evidential learning. P2F extends the mask vision transformer architecture by incorporating a Beta prior for computing model uncertainty in pixel-wise binary mask assignments. This design enables high-quality uncertainty estimation that effectively detects novel and OOD objects, enabling state-of-the-art anomaly instance segmentation and open-world panoptic segmentation. Unlike most segmentation models addressing unknown classes, P2F operates without access to OOD data samples or contrastive training on void (i.e., unlabeled) classes, making it highly applicable in real-world scenarios where such prior information is unavailable. Additionally, P2F can be flexibly applied to both anomaly instance segmentation and panoptic segmentation.
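To give a flavor of the Beta-prior idea, here is a minimal, hypothetical sketch (not the authors' implementation): we assume the network outputs non-negative evidence for "pixel belongs to mask" and "pixel does not", which parameterize a per-pixel Beta distribution. The Beta mean gives the mask probability, while low total evidence signals uncertainty, e.g. on OOD objects.

```python
import numpy as np

def beta_mask_uncertainty(e_pos, e_neg):
    """Illustrative evidential head for a binary mask assignment.

    e_pos, e_neg: non-negative evidence maps (assumed network outputs).
    Beta concentration parameters: alpha = e_pos + 1, beta = e_neg + 1.
    """
    alpha = e_pos + 1.0
    beta = e_neg + 1.0
    total = alpha + beta
    p_mask = alpha / total       # Beta mean: expected mask probability
    uncertainty = 2.0 / total    # vacuity: high when evidence is scarce
    return p_mask, uncertainty

# A pixel with strong positive evidence vs. one with almost none (OOD-like):
p_conf, u_conf = beta_mask_uncertainty(np.array(50.0), np.array(1.0))
p_ood, u_ood = beta_mask_uncertainty(np.array(0.2), np.array(0.1))
```

In this toy setup, the confident pixel gets a mask probability near 1 with low uncertainty, while the evidence-poor pixel yields an uncertainty close to its maximum, which is the kind of signal used to flag novel objects.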
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
(Phillip Mueller*, Talip Ünlü*, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Günnemann, Lars Mikelsons)
Precise geometric control in image generation is essential for fields like engineering, product design, and the creative industries, which require accurate manipulation of 3D object features in 2D image space. Traditional 3D editing approaches are time-consuming and demand specialized skills, while current image-based generative methods lack accuracy in geometric conditioning. To address these challenges, we propose GeoDiffusion, a training-free framework for accurate and efficient geometric conditioning of 3D features in image generation, ensuring viewpoint consistency. With GeoDrag, a novel dragging algorithm within our framework, we improve the accuracy and speed of drag-based image editing, both on geometry guidance tasks and on general instructions from DragBench.