Master's Thesis by Markus Müller
Label Propagation for Tax Law Thesaurus Extension
Abstract
With the rise of digitalization, information retrieval has to cope with increasing amounts of digitized content. Legal content providers invest heavily in building domain-specific ontologies such as thesauri, which help retrieve significantly more relevant documents. Label propagation is a family of graph-based semi-supervised machine learning algorithms. Since 2002, many label propagation methods have been developed, for example to identify groups of similar nodes in graphs. In this thesis, we test the suitability of label propagation methods for extending a thesaurus from the tax law domain. The graph on which label propagation operates is a similarity graph constructed from word embeddings. We cover the process from end to end and conduct several parameter studies to understand the impact of certain hyperparameters on the overall performance. The results are then evaluated in manual studies and compared with a baseline approach.
This thesis was carried out in cooperation with Prof. Dr. Günnemann, who holds the Professorship of Data Mining and Analytics at the Chair for Database Systems (Datenbanksysteme) at TUM.
Keywords: Thesaurus Extension, Legal Tech, Information Retrieval, Label Propagation, Word Embeddings, Data Science, Machine Learning
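To make the idea in the abstract concrete, the following is a minimal sketch of label propagation on a similarity graph built from word embeddings. It is not the implementation from the thesis: the toy terms, the 2-D vectors, the choice of a cosine k-nearest-neighbour graph, the iterative scheme with clamped seed labels, and all function and variable names are assumptions made purely for illustration.

```python
import numpy as np


def cosine_knn_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric k-nearest-neighbour graph weighted by cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbour
    n = sim.shape[0]
    weights = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(sim[i])[-k:]  # indices of the k most similar nodes
        # Clip negative similarities so that edge weights stay non-negative.
        weights[i, neighbours] = np.clip(sim[i, neighbours], 0.0, None)
    return np.maximum(weights, weights.T)  # keep an edge if either side selected it


def propagate_labels(weights: np.ndarray, seed_labels: np.ndarray,
                     n_classes: int, n_iter: int = 100) -> np.ndarray:
    """Spread seed labels over the graph; seed_labels uses -1 for unlabeled nodes."""
    row_sums = weights.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated nodes keep all-zero scores
    transition = weights / row_sums  # row-normalised transition matrix

    seed_idx = np.flatnonzero(seed_labels >= 0)
    clamp = np.zeros((weights.shape[0], n_classes))
    clamp[seed_idx, seed_labels[seed_idx]] = 1.0

    scores = clamp.copy()
    for _ in range(n_iter):
        scores = transition @ scores          # each node averages its neighbours
        scores[seed_idx] = clamp[seed_idx]    # reset the known seed labels
    return scores.argmax(axis=1)


# Hypothetical toy data: six terms with made-up 2-D "embeddings".
terms = ["income_tax", "wage_tax", "payroll_tax", "lease", "rent", "tenancy"]
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],   # first group of similar terms
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],   # second group of similar terms
])
# One seed term per hypothetical thesaurus concept; -1 marks terms to be assigned.
seed_labels = np.array([0, -1, -1, 1, -1, -1])

graph = cosine_knn_graph(embeddings, k=2)
predicted = propagate_labels(graph, seed_labels, n_classes=2)
for term, label in zip(terms, predicted):
    print(term, int(label))
```

In this sketch, the number of neighbours `k` and the number of iterations `n_iter` are examples of the kind of hyperparameters whose influence a parameter study would examine.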
Code Repository
GitHub: sebischair/ThesaurusLabelPropagation
| Attribute | Value |
|---|---|
| Title (de) | Label Propagation zur Erweiterung von Steuerrechtsthesauri |
| Title (en) | Label Propagation for Tax Law Thesaurus Extension |
| Project | |
| Type | Master's Thesis |
| Status | completed |
| Student | Markus Müller |
| Advisor | Prof. Dr. Stephan Günnemann, Dr. Jörg Landthaler, Elena Scepankova |
| Supervisor | Prof. Dr. Florian Matthes |
| Start Date | 15.05.2018 |
| Sebis Contributor Agreement signed on | 03.05.2018 |
| Checklist filled | Yes |
| Submission date | 07.11.2018 |