Master's Thesis Evan Christopher
Aligning Language Models for Differentially Private Text Generation
Differential Privacy (DP) has long been the standard for protecting tabular data, providing privacy guarantees at the mathematical level [1]. With the rapid advancement of Natural Language Processing (NLP), researchers have sought to extend these guarantees to unstructured text. However, despite this theoretical progress, the practical acceptance of DP in NLP (especially in text generation) remains low. This is driven by the inherent privacy-utility tradeoff: users consistently reject privatized text that looks "unnatural", opting instead for coherent text at the expense of privacy [2]. Consequently, recent literature emphasizes that for DP-NLP to gain widespread acceptance, text privatization mechanisms must produce reasonable, readable, and coherent output [2].
Current generative DP NLP approaches, including standard DP and dX-privacy, typically rely on the following pipeline: (1) calculating a latent representation, (2) applying noise, and (3) decoding downstream [3]. We identify a potential flaw in this approach, which undermines acceptability: the output often lacks natural flow, as pre-trained models are neither exposed to noisy text during training nor aligned with the objective of reconstructing fluid language from a corrupted encoding.
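The noise-injection step of this pipeline can be sketched as follows. This is a minimal, illustrative sketch rather than the DP-BART implementation: the function name, the L1 clipping bound, and the use of Laplace noise on a plain NumPy vector (standing in for the encoder's latent representation) are our assumptions for exposition.

```python
import numpy as np

def privatize_latent(z, epsilon, clip_norm=1.0, rng=None):
    """Illustrative local-DP step: clip a latent vector, then add
    Laplace noise calibrated to the resulting L1 sensitivity."""
    rng = np.random.default_rng(rng)
    # Bound the mechanism's sensitivity by clipping the latent's L1 norm.
    l1 = np.abs(z).sum()
    if l1 > clip_norm:
        z = z * (clip_norm / l1)
    # Worst-case L1 distance between any two clipped latents is 2 * clip_norm.
    sensitivity = 2.0 * clip_norm
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=z.shape)
    return z + noise
```

The decoder then reconstructs text from the noisy vector; it is precisely this reconstruction step that, without further alignment, produces the unnatural output described above.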
Therefore, to bridge this gap between privacy and acceptability, we first operationalize the concept of acceptability by conducting a scoping literature survey [4] of relevant evaluation metrics. Second, building on the DP-BART [5] architecture, we employ preference-based learning inspired by RLHF [6], treating DP-perturbed latent representations as an out-of-distribution generalization challenge. To this end, we conduct a user study in the form of a survey to gather human preference data on privatized outputs, using these judgments to align the decoder toward the "better end" of the random DP output distribution. Finally, we evaluate our method against standard generative DP baselines to measure improvements.
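One simple way to turn such pairwise human judgments into a training signal can be sketched as below. Note the hedge: [6] describes a PPO-based RLHF pipeline, whereas this sketch uses a DPO-style direct preference loss on a single preference pair; the function name and scalar log-probability inputs are hypothetical stand-ins for per-sequence decoder log-likelihoods.

```python
import math

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss on one preference pair: pushes the decoder to
    favor the human-preferred privatized output over the rejected one,
    relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the chosen output is already
    # more likely under the trained decoder than under the reference.
    return math.log1p(math.exp(-margin))
```

Averaged over many annotated pairs, minimizing this loss nudges the decoder toward the outputs humans judged more acceptable, without touching the DP mechanism itself (and hence without affecting the privacy guarantee).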
Research Questions:
RQ1 How do DP mechanisms affect the semantic utility and human acceptability of text generated by large language models, and which evaluation metrics best capture human-likeness and task relevance?
RQ2 Can human preferences be leveraged to recover linguistic fidelity lost through DP in text generation?
RQ3 To what extent does integrating human preference alignment into differentially private text generation improve performance on downstream semantic tasks?
[1] Dwork, Cynthia, and Aaron Roth. "The Algorithmic Foundations of Differential Privacy."
[2] Meisenbacher, Stephen, et al. "Investigating User Perspectives on Differentially Private Text Privatization."
[3] Klymenko, Oleksandra, et al. "Differential Privacy in Natural Language Processing: The Story So Far."
[4] Munn, Zachary, et al. "Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach."
[5] Igamberdiev, Timour, and Ivan Habernal. "DP-BART for Privatized Text Rewriting under Local Differential Privacy."
[6] Ouyang, Long, et al. "Training language models to follow instructions with human feedback."
| Attribute | Value |
|---|---|
| Title (de) | Alignment von Sprachmodellen für Textgenerierung unter Differential Privacy |
| Title (en) | Aligning Language Models for Differentially Private Text Generation |
| Project | |
| Type | Master's Thesis |
| Status | started |
| Student | Evan Christopher |
| Advisor | Stephen Meisenbacher |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 01.12.2025 |
| Sebis Contributor Agreement signed on | 27.11.2025 |
| Checklist filled | Yes |
| Submission date | 01.06.2026 |