Investigating Fact-Checking Approaches for Faithful Text Generation based on Structured Knowledge Bases
Some of the most impressive recent advancements in the field of Natural Language Processing (NLP) are centered around generative language models that can produce complex and human-sounding text. Despite being highly structured and grammatically correct, generated text can often contain factual errors and made-up claims not supported by evidence. This phenomenon is known as model hallucination and is a common problem in most Natural Language Generation (NLG) applications such as dialogue systems, machine translation, and text summarization.
Abstractive text summarization deals with models that generate shorter versions of large source documents. Generating summaries that are faithful to the original text and factually consistent is an open research problem [1]. Some approaches to tackle this problem include introducing new training objectives and factuality metrics to the pre-training process of models. Another promising direction lies in post-editing with fact correction – methods that fact-check the claims in candidate summaries and try to edit the detected errors by grounding them in background knowledge.
The aim of this master's thesis would be to develop and test approaches for post-editing and fact correction in generated summaries to make them more faithful to the original text and factually correct. Some approaches include retrieving evidence from external knowledge bases [2], using iterative editing with in-filling from masked language models [3], or some combination thereof. The developed methods would be evaluated using factuality metrics on common summarization datasets like X-SUM or CNN/DM, as well as domain-specific biomedical datasets.
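To make the mask-and-infill idea concrete, the following is a minimal toy sketch, not the method developed in the thesis. It assumes that "fact-like" tokens are simply numbers and capitalized words, and it replaces the masked language model with a crude context-overlap heuristic over the source document. The names `is_fact_like`, `context`, and `correct_summary` are hypothetical and chosen only for illustration.

```python
import re

WORD = re.compile(r"\w+")


def is_fact_like(token: str) -> bool:
    # Assumption: numbers and capitalized words stand in for real entity detection.
    return token.isdigit() or token[:1].isupper()


def context(tokens: list[str], i: int, width: int = 2) -> set[str]:
    # Lowercased tokens within `width` positions to the left and right of index i.
    left = tokens[max(0, i - width):i]
    right = tokens[i + 1:i + 1 + width]
    return {t.lower() for t in left + right}


def correct_summary(summary: str, source: str) -> str:
    src = WORD.findall(source)
    src_vocab = set(src)
    summ = WORD.findall(summary)
    replacements: dict[int, str] = {}

    for i, tok in enumerate(summ):
        if not is_fact_like(tok) or tok in src_vocab:
            continue  # token is not checked, or it is supported by the source
        # "Mask" the unsupported token and rank source tokens of the same type
        # (number vs. name) by how much of the surrounding context they share.
        # A real system would score candidates with a masked language model.
        masked_ctx = context(summ, i)
        candidates = [
            (len(masked_ctx & context(src, j)), -j, cand)
            for j, cand in enumerate(src)
            if is_fact_like(cand) and cand.isdigit() == tok.isdigit()
        ]
        if candidates:
            score, _, best = max(candidates)
            if score > 0:
                replacements[i] = best

    # Rebuild the summary in place so punctuation and spacing are preserved.
    counter = iter(range(len(summ)))
    return WORD.sub(lambda m: replacements.get(next(counter), m.group()), summary)


source = (
    "The company Acme reported revenue of 12 million dollars in 2023. "
    "The chief executive Maria Lopez announced the results in Berlin."
)
print(correct_summary("Acme reported revenue of 15 million dollars in 2023.", source))
```

In this sketch the unsupported number `15` is masked and replaced by the source number whose neighbouring words match best. The heuristic only handles single-token substitutions and would need an actual entity recognizer, evidence retrieval, and a language model to be useful on real summaries.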
Title (de) | Untersuchung von Faktenprüfungsansätzen für die getreue Texterstellung auf der Grundlage strukturierter Wissensdatenbanken |
Title (en) | Investigating Fact-Checking Approaches for Faithful Text Generation based on Structured Knowledge Bases |
Project | Scientific Claim Verification with Evidence from Text and Structured Knowledge (VeriSci) |
Type | Master's Thesis |
Status | completed |
Student | Andrei Staradubets |
Advisor | Juraj Vladika |
Supervisor | Prof. Dr. Florian Matthes |
Start Date | 15.07.2023 |
Sebis Contributor Agreement signed on | 05.07.2023 |
Checklist filled | Yes |
Submission date | 15.01.2024 |