Master's Thesis Andreas Probst
Improving German Legal Information Retrieval with Contextual Word Embeddings for Word Sense Disambiguation
Motivation
Legal research is an important task for lawyers. Word Sense Disambiguation (WSD) has been shown to have an important impact on general information retrieval (IR), because ambiguity in natural language can degrade the performance of text-based IR systems. On the one hand, precision in IR can be improved if only documents containing the word sense relevant to a search query are retrieved. On the other hand, users can potentially benefit from being shown the different senses of their search query terms directly. Current transformer-based models such as BERT and its derivatives dominate existing benchmarks due to their ability to capture context-sensitive information, which means they intrinsically encode information useful for WSD. How transformer-based WSD models perform on German legal text corpora has not yet been investigated. The main goal of this thesis is to evaluate the performance of such models on German court rulings, both qualitatively and quantitatively.
Research Questions
- What algorithms already exist to automatically classify word senses (knowledge-based, unsupervised, and supervised)?
- What possibilities are there to compensate for the lack of German sense-annotated (legal) data?
- How do WSD algorithms perform on German (legal) text?
- How do legal experts judge the usefulness of our word sense filter?
| Attribute | Value |
|---|---|
| Title (de) | Verbesserung der juristischen Informationsbeschaffung mit Hilfe von kontextuellen Word Embeddings für Word Sense Disambiguation |
| Title (en) | Improving German Legal Information Retrieval with Contextual Word Embeddings for Word Sense Disambiguation |
| Project | Semantic Analysis of Court Rulings |
| Type | Master's Thesis |
| Status | completed |
| Student | Andreas Probst |
| Advisor | Ingo Glaser |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 15.01.2021 |
| Sebis Contributor Agreement signed on | 01.12.2020 |
| Checklist filled | Yes |
| Submission date | 15.07.2021 |