Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning

This page links to additional material for our paper:

Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
Yan Scholten, Stephan Günnemann
International Conference on Learning Representations, ICLR 2025 (Spotlight)

Links

[PDF | Code | Poster]

Abstract

Conformal prediction provides model-agnostic and distribution-free uncertainty quantification through prediction sets that are guaranteed to include the ground truth with any user-specified probability. Yet, conformal prediction is not reliable under poisoning attacks where adversaries manipulate both training and calibration data, which can significantly alter prediction sets in practice. As a solution, we propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning. To ensure reliability under training poisoning, we introduce smoothed score functions that reliably aggregate predictions of classifiers trained on distinct partitions of the training data. To ensure reliability under calibration poisoning, we construct multiple prediction sets, each calibrated on distinct subsets of the calibration data. We then aggregate them into a majority prediction set, which includes a class only if it appears in a majority of the individual sets. Both proposed aggregations mitigate the influence of datapoints in the training and calibration data on the final prediction set. We experimentally validate our approach on image classification tasks, achieving strong reliability while maintaining utility and preserving coverage on clean data. Overall, our approach represents an important step towards more trustworthy uncertainty quantification in the presence of data poisoning.
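The calibration-side aggregation described in the abstract can be illustrated with a minimal sketch. It is not the released implementation. It assumes nonconformity scores (lower means the class fits better), a simple contiguous split of the calibration scores via `np.array_split`, and the standard split-conformal quantile. The function names `conformal_threshold` and `majority_prediction_set` are hypothetical, and the sketch makes no claim about the resulting coverage level or about the paper's reliability certificates.

```python
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Standard split-conformal threshold for nonconformity scores.

    Returns the k-th smallest calibration score with
    k = ceil((n + 1) * (1 - alpha)), or +inf if k exceeds n.
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return np.sort(cal_scores)[k - 1]


def majority_prediction_set(test_scores, cal_scores, n_parts, alpha):
    """Majority vote over prediction sets calibrated on disjoint parts.

    test_scores: shape (n_classes,), nonconformity score of each class
                 for a single test input.
    cal_scores:  shape (n_cal,), nonconformity scores of the true labels
                 on the calibration data.
    """
    # Assumption: contiguous disjoint parts; any fixed disjoint split works here.
    parts = np.array_split(cal_scores, n_parts)
    votes = np.zeros(test_scores.shape[0], dtype=int)
    for part in parts:
        tau = conformal_threshold(part, alpha)
        votes += (test_scores <= tau)  # classes in this part's prediction set
    # Keep a class only if it appears in a strict majority of the sets.
    return np.flatnonzero(votes > n_parts / 2)
```

Because each calibration point belongs to exactly one part, changing a single point can alter at most one of the individual sets, and therefore at most one vote per class. This is the intuition behind why the aggregation limits the influence of individual calibration datapoints.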

Cite

@inproceedings{scholten2025provably,
     title={Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning},
     author={Yan Scholten and Stephan G{\"u}nnemann},
     booktitle={The Thirteenth International Conference on Learning Representations},
     year={2025},
     url={https://openreview.net/forum?id=ofuLWn8DFZ}
}

