Natural Language Processing - Methods and Applications (IN2107, IN4816)
Lecturer (assistant) | |
---|---|
Number | 0000002186 |
Type | seminar |
Duration | 2 SWS |
Term | Summer semester 2025 |
Language of instruction | English |
Position within curricula | See TUMonline |
Dates:
- 10.02.2025 11:00-12:00 01.07.023, Seminarraum
- 25.04.2025 10:00-12:00 01.07.023, Seminarraum
- 02.05.2025 10:00-12:00 01.07.023, Seminarraum
- 09.05.2025 10:00-12:00 01.07.023, Seminarraum
- 16.05.2025 10:00-12:00 01.07.023, Seminarraum
- 23.05.2025 10:00-12:00 01.07.023, Seminarraum
- 30.05.2025 10:00-12:00 01.07.023, Seminarraum
- 06.06.2025 10:00-12:00 01.07.023, Seminarraum
- 13.06.2025 10:00-12:00 01.07.023, Seminarraum
- 20.06.2025 10:00-12:00 01.07.023, Seminarraum
- 27.06.2025 10:00-12:00 01.07.023, Seminarraum
- 04.07.2025 10:00-12:00 01.07.023, Seminarraum
- 11.07.2025 10:00-12:00 01.07.023, Seminarraum
- 18.07.2025 10:00-12:00 01.07.023, Seminarraum
- 25.07.2025 10:00-12:00 01.07.023, Seminarraum
Introduction:
In today's world, organizations, societies, and institutions rely heavily on natural language for communication. Vast amounts of unstructured information are stored in text documents, posing a significant challenge for machines to swiftly query relevant content and extract structured data.
Natural language processing (NLP), text mining, and natural language generation collectively encompass machine-driven solutions for text analysis, indexing, and creation. Over the past decades, a wide array of methods and solutions have emerged, propelled by the diverse challenges and rapid advancements in technologies like machine learning. This evolution highlights the vast potential of these tools.
This seminar is designed to explore the core technological components of NLP and their real-world applications. Participants will independently research a scientific topic using existing literature, then present and discuss their findings through presentations and seminar papers, fostering an enriching learning experience.
Seminar Topics (Summer 2025):
Topics may change from one year to the next.
Foundations of NLP:
- Word Embeddings: From Bag-of-words to Transformers: Machines cannot process raw text, only numbers, so representing text numerically is a prerequisite for any NLP task. This topic gives an overview of approaches to word embeddings, from simple word counting, to embedding words as vectors in high-dimensional vector spaces, to modern transformer embeddings (see the sketch after this list).
- From N-grams to Large Language Models: This topic provides an overview of NLP models over the years, from early classification models built with the Naive Bayes approach and Hidden Markov Models, to Recurrent Neural Networks, to Transformer-based Language Models.
- Large Language Models: Building Blocks: This topic covers the distinction between Language Models and Large Language Models, building blocks and training strategies, and different parameter scales of the same model.
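As a taste of the word-embeddings topic above, here is a minimal bag-of-words sketch in plain Python; the toy sentences and the function name are invented for illustration, not taken from the seminar materials.

```python
# Minimal bag-of-words sketch: each document becomes a vector of word
# counts over a shared vocabulary (the toy corpus is invented).
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log"]

# Shared vocabulary across all documents, in a fixed order.
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc: str) -> list[int]:
    """Represent a document as word counts over `vocab`."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

for doc in docs:
    print(doc, "->", bag_of_words(doc))
```

Transformer embeddings replace these sparse, order-insensitive count vectors with dense, context-dependent vectors learned by a neural network.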
Techniques in NLP:
- From Binary to Extreme – An Overview of Text Classification Methods and their Challenges: Text classification is a fundamental task in Natural Language Processing, yet there are many ways to approach it. This topic will explore the spectrum of text classification methods and discuss existing challenges and limitations.
- Natural Language Processing and Computer-Human Interaction
- Information Retrieval: (Domain-Specific) Improvement Strategies: This topic covers principles of dense and sparse Information Retrieval (IR). Since IR is the foundation of approaches such as RAG systems, ensuring good retrieval quality is important, especially in specialized domains such as medicine. This topic therefore investigates techniques to make IR more domain-aware (see the retrieval sketch after this list).
- (Generative) Information Extraction (including NER): Information Extraction, especially Named Entity Recognition, is a commonly used technique for structuring textual data. This topic explores traditional as well as generative approaches.
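To make the sparse side of the IR topic above concrete, here is a minimal sketch of TF-IDF retrieval; it assumes scikit-learn is available, and the toy corpus and query are invented for illustration.

```python
# Sparse retrieval sketch: rank documents by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Aspirin is used to reduce fever and relieve mild pain.",
    "The patient was prescribed antibiotics for the infection.",
    "Stock prices rose sharply after the earnings report.",
]
query = "medication for pain and fever"

# Embed corpus and query in the same sparse TF-IDF space.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)
query_vector = vectorizer.transform([query])

# Cosine similarity between the query and every document.
scores = cosine_similarity(query_vector, doc_vectors)[0]
for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {doc}")
```

Dense retrieval follows the same query-scoring pattern but replaces the TF-IDF vectors with neural embeddings; domain-aware IR then adapts those embeddings (or the vocabulary weighting) to, e.g., medical text.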
Large Language Models:
- Domain Adaptations: In-context Learning, RAG and Fine-tuning: Language Models are often trained on general-purpose data and do not perform well in specialized domains. Additionally, they have a knowledge cut-off depending on when they were trained. This topic covers techniques for injecting new data into LLMs, either through updated weights (fine-tuning) or through external techniques like in-context learning and Retrieval Augmented Generation (a minimal RAG sketch follows this list).
- Prompt-Engineering vs. PEFT Approaches: Since training and fine-tuning LLMs is expensive due to their hardware requirements, several approaches have been developed to reduce the resources needed to adapt LLMs to specific tasks. This topic will focus on various parameter-efficient fine-tuning (PEFT) and prompt-engineering techniques with respect to different use cases.
- Text Summarization: Approaches, Challenges and Evaluation: Automatic text summarization condenses enormous amounts of text into concise summaries. This topic covers various summarization techniques, types of summaries, and the challenges involved in evaluating them.
- Question Answering Systems: Challenges and Approaches: One of the oldest tasks in NLP is question answering, i.e., building systems that can provide an answer to a posed user question. This includes multiple-choice QA, short-span QA, or long-form QA. The topic will investigate historical and recent approaches to QA systems and the challenges that come with integrating evidence, evaluating the answers, and more.
- Model Hallucination: Generative language models tend to produce coherent text that sometimes contains hallucinations – text that is factually inconsistent and contradicts established knowledge. This topic will investigate the reasons hallucinations emerge, their characteristics in different NLP tasks, and methods to detect and mitigate them.
- Common-sense & Logical Reasoning with LLMs: LLMs mainly work with unstructured textual input. Their reasoning capabilities can be improved by introducing logical predicates, thinking about problems step-by-step, or other techniques that improve their common-sense reasoning skills. The topic will investigate recent advancements in improving the reasoning process in LLMs.
- Multimodal LLMs: SOTA Models, Techniques and Benchmarks/Usability (and Future Developments): Recent LLMs already perform quite well on numerous text-only tasks. However, many real-world use cases involve, for instance, text, tables, charts, and images. Under this topic we will look into techniques, evaluation, and the usability of multimodal models.
- Explainability in Natural Language Processing: This topic explores the explainability aspect of NLP, covering various methods and approaches to interpret model outputs and decisions. It spans explainability techniques for both pre-trained and large language models while also highlighting the challenges and limitations of these methods.
- Explainable Fact-Checking with LLMs: This topic covers research on the ability of LLMs to verify claims and generate high-quality explanations and justifications for their assessments. It covers methods, datasets, and evaluation metrics, as well as various approaches for generating justifications and evaluating explanation quality in claim verification.
- Tiny LLMs: Quantization, Pruning and Distillation: Modern LLMs are often scaled to data-center size to achieve their results, yet many applications require them to run on edge devices. This topic covers techniques to reduce model size while retaining as much quality as possible. You will explore quantization for compressing general models, and pruning and distillation for task-specific size reduction.
- Agent-based Systems using LLMs: Explore agentic architectures and frameworks for LLMs, and separate hype from actual potential. Such systems aim to change how we interact with computers, enable computers to take actions, and push AI technology further.
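The domain-adaptation topic above mentions Retrieval Augmented Generation; here is a minimal, hedged RAG sketch. It reuses the TF-IDF retriever idea from the previous sketch, and `call_llm` is a hypothetical placeholder for whatever LLM API one actually uses.

```python
# Minimal RAG sketch: retrieve relevant passages, inject them into the
# prompt, and let the model answer from that context instead of its weights.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "The seminar takes place on Fridays from 10:00 to 12:00.",
    "Seminar papers must follow the provided LaTeX template.",
    "The optional project grants a 0.3 grade bonus.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (sparse retrieval)."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(knowledge_base)
    scores = cosine_similarity(vectorizer.transform([query]), doc_vecs)[0]
    top = sorted(range(len(knowledge_base)), key=lambda i: scores[i], reverse=True)
    return [knowledge_base[i] for i in top[:k]]

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: substitute a real LLM API call here.
    return f"[answer generated from a {len(prompt)}-character prompt]"

def answer(question: str) -> str:
    # New knowledge enters through the prompt at inference time.
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("When does the seminar meet?"))
```

The key design contrast: RAG and in-context learning supply knowledge through the prompt at inference time, whereas fine-tuning bakes it into the model's weights.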
Conversational AI:
- Task-based, Social Conversational Agents & Dialogue Management (Dialogue State Tracking & Policy): With new capabilities stemming from LLMs, dialogue systems need improved mechanisms for state tracking that cannot always rely purely on context windows. This topic will explore how conversations can be engineered to be empathetic and natural while also achieving the user's goals. As the lines between chitchat and task-oriented dialogue often blur, mechanisms that can handle both are needed.
- Knowledge Graphs for Dialogue State Tracking
Privacy & Security in Natural Language Processing:
- Ethical, Societal, and Legal Aspects of Large Language Models: Despite the rapid advances in AI and NLP powered by the impressive capabilities of LLMs, a number of ethical, societal, and legal concerns regarding the proliferation of such tools have come to light. This topic will systematically explore and introduce these concerns, highlighting important considerations in modern NLP.
- Differential Privacy in Natural Language Processing: Among privacy-preserving Natural Language Processing methods, the field of Differential Privacy (DP) in NLP has gained considerable traction in the research community, bringing about numerous innovative technical solutions. Nevertheless, a number of challenges and open research directions remain. This topic will dive into DP in NLP, with a focus on providing a comprehensive yet approachable introduction to the field (see the sketch after this list).
- Adversarial Attacks on (Large) Language Models and their Mitigations: The extremely rapid growth of LLM usage has sparked countless productive and useful applications. Unfortunately, not all interactions with modern models are well-intentioned, and research has uncovered many vulnerabilities in LLMs, along with adversarial attacks that exploit them. This topic will provide a cursory overview of such attacks and survey existing and proposed mitigation strategies to defend against malicious adversaries.
- Large Language Model Alignment: The notion of alignment has become extremely important in modern LLMs, yet the discussions surrounding the topic are either divisive or not well-defined. This topic will provide clarity on LLM alignment: what it means, what the predominant strategies and current thinking are, and which research areas are most exciting moving forward.
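As a taste of the Differential Privacy topic above, here is a minimal sketch of the Laplace mechanism, a basic DP building block: a count query is released with noise calibrated to its sensitivity. It assumes NumPy; the example count and epsilon are illustrative.

```python
# Laplace-mechanism sketch: noisy release of a count query (sensitivity 1).
import numpy as np

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise of scale sensitivity/epsilon."""
    sensitivity = 1.0  # adding/removing one person changes a count by <= 1
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

# E.g., how many documents in a corpus mention a sensitive term.
print(dp_count(true_count=42, epsilon=1.0))  # smaller epsilon -> more noise
```

DP-in-NLP methods build on this idea at the level of tokens, embeddings, or gradients (as in DP-SGD) rather than simple counts.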
Deliverables
Deliverable | Details |
---|---|
Prerequisites | |
Presentations | |
Project/Demo | Optional; 0.3 grade bonus |
Seminar Paper | |
Peer Review | |
Submissions
Deliverable | Deadline | Format |
---|---|---|
Final presentation slides | Before your talk | PowerPoint, Keynote, or PDF |
Code for the project | After your talk | .zip |
Seminar paper for peer review | 28.07.25 | PDF based on the provided LaTeX template |
Peer review | 04.08.25 | Plain-text (.txt) file |
Revised seminar paper | 11.08.25 | PDF based on the provided LaTeX template |