Seminar Course: Large Language Models: Building Blocks, Training, and Control (IN2107, IN4816)
| Lecturer (assistant) | |
|---|---|
| Number | 0000002186 |
| Type | seminar |
| Duration | 2 SWS |
| Term | Summer semester 2026 |
| Language of instruction | English |
| Position within curricula | See TUMonline |
- 04.02.2026 14:00-15:00 01.12.035, Seminarraum, https://teams.microsoft.com/meet/34707632941688?p=360VVgc6Z3QSbflNaT
- 15.04.2026 14:00-18:00 01.10.011, Seminarraum (Inf. 18/19)
- 16.04.2026 08:00-12:00 01.10.011, Seminarraum (Inf. 18/19)
- 17.04.2026 12:00-18:00 01.10.011, Seminarraum (Inf. 18/19)
Preliminary Session:
The preliminary session will take place online on Wednesday, 04.02.2026, at 2:00 pm. Join the session online using the meeting details below:
Microsoft Teams meeting
Join: https://teams.microsoft.com/meet/34707632941688?p=360VVgc6Z3QSbflNaT
Meeting ID: 347 076 329 416 88
Passcode: Gm7H3Rf2
Introduction:
Large Language Models (LLMs) have rapidly become the cornerstone of modern artificial intelligence, driving advances in language understanding, reasoning, and creativity across research and industry. Their development reflects a broader shift toward general-purpose systems that learn from massive datasets and exhibit increasingly complex capabilities. Understanding how these models are built, trained, and optimized is essential for anyone seeking to engage with today’s AI ecosystem—whether as a researcher, engineer, or critical observer of emerging technologies.
This seminar provides a deep, structured exploration of LLMs, from foundational principles and model architectures to training methodologies and computational challenges. Students will learn how innovations in attention mechanisms, large-scale optimization, and efficiency techniques shape the performance and scalability of current systems. By connecting theory with practice, the course equips participants with the knowledge to analyze, evaluate, and contribute to ongoing developments in large-scale AI, preparing them to navigate one of the most transformative areas of contemporary computing.
Seminar Topics (Summer 2026):
Topics may change from one year to the next.
1. Foundations of Large Language Models
1.1 Defining an LLM
Large Language Models (LLMs) have revolutionized natural language processing and artificial intelligence at large. This topic introduces their conceptual foundations: how they function as conditional generative models, the evolution from traditional NLP to large-scale language modeling, and the business and research motivations that have shaped their rise. Students will gain a clear understanding of what distinguishes LLMs from earlier approaches and how they fit into the broader deep learning landscape. We will also survey trends in LLM development and explain how recent innovations fit into the LLM ecosystem.
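As a point of reference for the "conditional generative model" view, the standard autoregressive factorization (a textbook identity, not something specific to this seminar) writes the probability of a token sequence as a product of next-token conditionals:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1})
```

Maximizing the log of this product over a large corpus is exactly the next-token prediction objective revisited in Topic 3.2.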
2. LLM Building Blocks
2.1 The Standard Transformer
This topic provides an in-depth overview of the transformer architecture that underlies nearly all modern LLMs. It covers the essential components (tokenization, embeddings, positional encodings, attention, normalization, and feed-forward layers), explaining how they interact to model dependencies between tokens.
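To make the attention component concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the shapes, the causal mask, and the toy inputs are illustrative assumptions, not a reference implementation used in the seminar.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Minimal single-head scaled dot-product attention.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled by sqrt(d_k)
    if causal:
        # Mask future positions so each token attends only to its past.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                        # 5 tokens, 16-dim vectors
print(scaled_dot_product_attention(x, x, x).shape)  # (5, 16)
```

In a full transformer block this runs once per head, with learned projections producing Q, K, and V from the token embeddings.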
2.2 Advanced Building Blocks
Modern LLMs incorporate numerous refinements to improve scalability, efficiency, and expressiveness. This topic explores advanced attention mechanisms, including head- and query-sharing variants (MHA, MQA, GQA), linear and sparse attention methods (FlashAttention, Native Sparse Attention), and long-context strategies such as chunked or windowed attention. Students will also study conditional computation through Mixture-of-Experts (MoE) and routing architectures, highlighting how these techniques push model capacity and efficiency boundaries.
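As one illustration of the head-sharing idea, the sketch below shows how grouped-query attention (GQA) lets several query heads share a single key/value head; all dimensions are made up for readability, and causal masking is omitted.

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads=8, n_kv_heads=2):
    """GQA sketch: n_q_heads query heads share n_kv_heads key/value heads."""
    seq_len, _ = x.shape
    d_head = Wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads                   # query heads per KV head
    Q = (x @ Wq).reshape(seq_len, n_q_heads, d_head)
    K = (x @ Wk).reshape(seq_len, n_kv_heads, d_head)
    V = (x @ Wv).reshape(seq_len, n_kv_heads, d_head)
    outs = []
    for h in range(n_q_heads):
        kv = h // group                               # KV head used by this query head
        scores = Q[:, h] @ K[:, kv].T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        outs.append(w @ V[:, kv])
    return np.concatenate(outs, axis=-1)              # (seq_len, n_q_heads * d_head)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 32))
Wq = rng.normal(size=(32, 64))                        # 8 query heads of size 8
Wk, Wv = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))  # only 2 KV heads
print(grouped_query_attention(x, Wq, Wk, Wv).shape)   # (4, 64)
```

Setting n_kv_heads = 1 recovers MQA, while n_kv_heads = n_q_heads is standard MHA; the smaller key/value projection is what shrinks the KV cache at inference time.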
3. LLM Training
3.1 Training Setup
Before training begins, significant effort goes into data collection, curation, and system design. This topic introduces the full training pipeline, from dataset assembly and preprocessing to benchmarking, training frameworks, and compute planning. Students will examine how large-scale training is orchestrated in practice, including practical challenges around evaluation, data quality, and reproducibility.
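As a toy illustration of the curation step, the snippet below applies two common (here heavily simplified) filters, a minimum-length heuristic and exact deduplication via content hashing; production pipelines add many more quality, toxicity, and near-duplicate filters.

```python
import hashlib

def curate(documents, min_chars=200):
    """Toy curation pass: length filter followed by exact deduplication."""
    seen, kept = set(), []
    for doc in documents:
        text = doc.strip()
        if len(text) < min_chars:  # crude quality heuristic: drop very short docs
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:         # drop exact duplicates of something already kept
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```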
3.2 Pre-Training
Pre-training constitutes the foundational learning stage for any LLM. This topic explores the optimization processes, objectives, and techniques that enable models to acquire broad linguistic and world knowledge. Students will learn about initialization strategies, loss functions, optimizers, regularization methods, and scheduling.
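A minimal PyTorch-style sketch of a single pre-training step under the usual next-token objective follows; the tiny embedding-plus-linear "model", the hyperparameters, and the random batch are placeholders, not a prescribed setup.

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 1000, 64

# Stand-in "model": embedding + linear head. A real run uses a Transformer here.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

def pretrain_step(tokens):
    """One optimizer step of next-token prediction (token-level cross-entropy)."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift targets by one position
    logits = model(inputs)                           # (B, T-1, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # common stabilizer
    optimizer.step()
    return loss.item()

batch = torch.randint(0, vocab_size, (8, 128))  # random token ids as a dummy batch
print(pretrain_step(batch))                     # starts near ln(1000) ≈ 6.9
```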
3.3 Post-Training and Alignment
This topic examines how pretrained models are aligned with human intent and optimized for helpfulness, safety, and consistency. Students will analyze the goals of post-training, including supervised methods (SFT), preference optimization (DPO, ORPO, APO), and reinforcement learning techniques (RLHF, RLVR; PPO, GRPO). The discussion highlights trade-offs between efficiency, controllability, and alignment, as well as the role of auxiliary objectives in refining LLM behavior.
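As one concrete example from the preference-optimization family, here is a minimal sketch of the DPO objective from Rafailov et al. (2023); it assumes the caller has already computed summed sequence log-probabilities under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is a (B,) tensor of log pi(y | x) summed over the tokens
    of the chosen or rejected response, under the policy or reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Push the implicit reward of the chosen response above the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Unlike RLHF with PPO, this needs no separate reward model or on-policy sampling, which is the efficiency trade-off the topic discusses.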
4. Computational Efficiency
4.1 Hardware Fundamentals
Large-scale model inference is constrained by hardware efficiency. This topic introduces the hardware foundations of LLM computation, including GPU/TPU architectures, memory hierarchies, communication bottlenecks, and performance measurement. Students will gain practical insight into how hardware design impacts model throughput, scalability, and cost.
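To see why single-stream decoding is typically memory-bandwidth-bound rather than compute-bound, consider the following back-of-the-envelope estimate; the hardware numbers are assumptions loosely resembling an A100, not measurements.

```python
# One decode step of a 7B-parameter model in fp16, batch size 1.
params = 7e9
bytes_per_param = 2                  # fp16
flops_per_token = 2 * params         # ~2 FLOPs per parameter per generated token

peak_flops = 312e12                  # assumed dense fp16 throughput, FLOP/s
mem_bandwidth = 2.0e12               # assumed HBM bandwidth, bytes/s

t_compute = flops_per_token / peak_flops             # ~0.045 ms of math
t_memory = params * bytes_per_param / mem_bandwidth  # ~7 ms to stream the weights

print(f"compute-bound time: {t_compute * 1e3:.3f} ms")
print(f"memory-bound time : {t_memory * 1e3:.3f} ms")
# Weight traffic dominates by two orders of magnitude at batch size 1, which
# is why batching, quantization, and KV-cache management matter so much.
```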
4.2 Computationally Efficient LLMs
Even after training, deploying and running LLMs efficiently remains a major challenge. This topic covers methods for improving inference speed and reducing memory use, such as parameter-efficient fine-tuning (LoRA, adapters), quantization, pruning, distillation, and speculative decoding. Students will understand how these techniques enable practical use of large models across varying resource budgets.
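As one example from this list, a minimal LoRA-style forward pass: the frozen base weight W is augmented by a trainable low-rank update scaled by alpha/r, so only a small fraction of parameters is updated during fine-tuning; all dimensions below are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                    # frozen pretrained weight
        self.A = rng.normal(0, 0.02, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                 # trainable, zero init, so the
        self.scale = alpha / r                        # update starts as a no-op

    def __call__(self, x):
        # x: (..., d_in). Only A and B would receive gradients when fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.default_rng(1).normal(size=(64, 128))   # pretend pretrained weight
layer = LoRALinear(W)
print(layer(np.ones(128)).shape)  # (64,)
```

After fine-tuning, the update B @ A can be merged into W, so inference pays no extra latency.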
Deliverables
| Component | Details |
|---|---|
| Prerequisites | |
| Presentations | |
| Project/Demo | optional; 0.3 grade bonus |
| Seminar Paper | |
| Peer Review | |
Submissions
| Deliverable | Deadline | Format |
|---|---|---|
| Final presentation slides | Before your talk | PowerPoint, Keynote, or PDF |
| Code for the project | After your talk | .zip |
| Seminar paper for peer review | tbd | PDF based on the provided LaTeX template |
| Peer review | tbd | .txt file |
| Revised seminar paper | tbd | PDF based on the provided LaTeX template |