Seminar Course: Large Language Models: Building Blocks, Training, and Control (IN2107, IN4816)
| Lecturer (assistant) | |
|---|---|
| Number | 0000002186 |
| Type | seminar |
| Duration | 2 SWS |
| Term | Summer semester 2026 |
| Language of instruction | English |
| Position within curricula | See TUMonline |
- 04.02.2026 14:00-15:00 01.12.035, Seminarraum, https://teams.microsoft.com/meet/34707632941688?p=360VVgc6Z3QSbflNaT
- 15.04.2026 14:00-18:00 01.10.011, Seminarraum (Inf. 18/19)
- 16.04.2026 08:00-12:00 01.10.011, Seminarraum (Inf. 18/19)
- 17.04.2026 12:00-18:00 01.10.011, Seminarraum (Inf. 18/19)
Preliminary Session:
The preliminary session will take place online on Wednesday, 04.02.2026, at 2:00 pm. Join the session online using the meeting details below:
Microsoft Teams meeting
Join: https://teams.microsoft.com/meet/34707632941688?p=360VVgc6Z3QSbflNaT
Meeting ID: 347 076 329 416 88
Passcode: Gm7H3Rf2
Introduction:
Large Language Models (LLMs) have rapidly become the cornerstone of modern artificial intelligence, driving advances in language understanding, reasoning, and creativity across research and industry. Their development reflects a broader shift toward general-purpose systems that learn from massive datasets and exhibit increasingly complex capabilities. Understanding how these models are built, trained, and optimized is essential for anyone seeking to engage with today’s AI ecosystem—whether as a researcher, engineer, or critical observer of emerging technologies.
This seminar provides a deep, structured exploration of LLMs, from foundational principles and model architectures to training methodologies and computational challenges. Students will learn how innovations in attention mechanisms, large-scale optimization, and efficiency techniques shape the performance and scalability of current systems. By connecting theory with practice, the course equips participants with the knowledge to analyze, evaluate, and contribute to ongoing developments in large-scale AI, preparing them to navigate one of the most transformative areas of contemporary computing.
Seminar Topics (Summer 2026):
Topics may change from one year to the next.
1. Foundations of Large Language Models
1.1 Defining an LLM
Large Language Models (LLMs) have revolutionized natural language processing and artificial intelligence at large. This topic introduces their conceptual foundations: how they function as conditional generative models, the evolution from traditional NLP to large-scale language modeling, and the business and research motivations that have shaped their rise. Students will gain a clear understanding of what distinguishes LLMs from earlier approaches and how they fit into the broader deep learning landscape. We will also survey trends in LLM development and explain how recent innovations fit into the LLM ecosystem.
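As a point of reference for the "conditional generative model" view, the standard autoregressive factorization (a textbook identity, not something specific to this seminar) writes the probability of a token sequence as a product of next-token conditionals:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \dots, x_{t-1})
```

Maximizing the log of this product over a large corpus is exactly the next-token prediction objective revisited in Topic 3.2.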
2. LLM Building Blocks
2.1 The Standard Transformer
This topic provides an in-depth overview of the transformer architecture that underlies nearly all modern LLMs. It covers the essential components (tokenization, embeddings, positional encodings, attention, normalization, and feed-forward layers), explaining how they interact to model dependencies between tokens.
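To make the attention component concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the shapes, the causal mask, and the toy inputs are illustrative assumptions, not a reference implementation used in the seminar.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Minimal single-head scaled dot-product attention.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity, scaled by sqrt(d_k)
    if causal:
        # Mask future positions so each token attends only to its past.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                        # 5 tokens, 16-dim vectors
print(scaled_dot_product_attention(x, x, x).shape)  # (5, 16)
```

In a full transformer block this runs once per head, with learned projections producing Q, K, and V from the token embeddings.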
2.2 Advanced Building Blocks
Modern LLMs incorporate numerous refinements to improve scalability, efficiency, and expressiveness. This topic explores advanced attention mechanisms, including head- and query-sharing variants (MHA, MQA, GQA), linear and sparse attention methods (FlashAttention, Native Sparse Attention), and long-context strategies such as chunked or windowed attention. Students will also study conditional computation through Mixture-of-Experts (MoE) and routing architectures, highlighting how these techniques push model capacity and efficiency boundaries.
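As one illustration of the head-sharing idea, the sketch below shows how grouped-query attention (GQA) lets several query heads share a single key/value head; all dimensions are made up for readability, and causal masking is omitted.

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads=8, n_kv_heads=2):
    """GQA sketch: n_q_heads query heads share n_kv_heads key/value heads."""
    seq_len, _ = x.shape
    d_head = Wq.shape[1] // n_q_heads
    group = n_q_heads // n_kv_heads                   # query heads per KV head
    Q = (x @ Wq).reshape(seq_len, n_q_heads, d_head)
    K = (x @ Wk).reshape(seq_len, n_kv_heads, d_head)
    V = (x @ Wv).reshape(seq_len, n_kv_heads, d_head)
    outs = []
    for h in range(n_q_heads):
        kv = h // group                               # KV head used by this query head
        scores = Q[:, h] @ K[:, kv].T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        outs.append(w @ V[:, kv])
    return np.concatenate(outs, axis=-1)              # (seq_len, n_q_heads * d_head)

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 32))
Wq = rng.normal(size=(32, 64))                        # 8 query heads of size 8
Wk, Wv = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))  # only 2 KV heads
print(grouped_query_attention(x, Wq, Wk, Wv).shape)   # (4, 64)
```

Setting n_kv_heads = 1 recovers MQA, while n_kv_heads = n_q_heads is standard MHA; the smaller key/value projection is what shrinks the KV cache at inference time.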
3. LLM Training
3.1 Training Setup
Before training begins, significant effort goes into data collection, curation, and system design. This topic introduces the full training pipeline, from dataset assembly and preprocessing to benchmarking, training frameworks, and compute planning. Students will examine how large-scale training is orchestrated in practice, including practical challenges around evaluation, data quality, and reproducibility.
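As a toy illustration of the curation step, the snippet below applies two common (here heavily simplified) filters, a minimum-length heuristic and exact deduplication via content hashing; production pipelines add many more quality, toxicity, and near-duplicate filters.

```python
import hashlib

def curate(documents, min_chars=200):
    """Toy curation pass: length filter followed by exact deduplication."""
    seen, kept = set(), []
    for doc in documents:
        text = doc.strip()
        if len(text) < min_chars:  # crude quality heuristic: drop very short docs
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:         # drop exact duplicates of something already kept
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```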
3.2 Pre-Training
Pre-training constitutes the foundational learning stage for any LLM. This topic explores the optimization processes, objectives, and techniques that enable models to acquire broad linguistic and world knowledge. Students will learn about initialization strategies, loss functions, optimizers, regularization methods, and scheduling.
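A minimal PyTorch-style sketch of a single pre-training step under the usual next-token objective follows; the tiny embedding-plus-linear "model", the hyperparameters, and the random batch are placeholders, not a prescribed setup.

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 1000, 64

# Stand-in "model": embedding + linear head. A real run uses a Transformer here.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

def pretrain_step(tokens):
    """One optimizer step of next-token prediction (token-level cross-entropy)."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift targets by one position
    logits = model(inputs)                           # (B, T-1, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # common stabilizer
    optimizer.step()
    return loss.item()

batch = torch.randint(0, vocab_size, (8, 128))  # random token ids as a dummy batch
print(pretrain_step(batch))                     # starts near ln(1000) ≈ 6.9
```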
3.3 Post-Training and Alignment
This topic examines how pretrained models are aligned with human intent and optimized for helpfulness, safety, and consistency. Students will analyze the goals of post-training, including supervised methods (SFT), preference optimization (DPO, ORPO, APO), and reinforcement learning techniques (RLHF, RLVR; PPO, GRPO). The discussion highlights trade-offs between efficiency, controllability, and alignment, as well as the role of auxiliary objectives in refining LLM behavior.
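As one concrete example from the preference-optimization family, here is a minimal sketch of the DPO objective from Rafailov et al. (2023); it assumes the caller has already computed summed sequence log-probabilities under the policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is a (B,) tensor of log pi(y | x) summed over the tokens
    of the chosen or rejected response, under the policy or reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Push the implicit reward of the chosen response above the rejected one.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```

Unlike RLHF with PPO, this needs no separate reward model or on-policy sampling, which is the efficiency trade-off the topic discusses.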
4. Computational Efficiency
4.1 Hardware Fundamentals
Large-scale model inference is constrained by hardware efficiency. This topic introduces the hardware foundations of LLM computation, including GPU/TPU architectures, memory hierarchies, communication bottlenecks, and performance measurement. Students will gain practical insight into how hardware design impacts model throughput, scalability, and cost.
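To see why single-stream decoding is typically memory-bandwidth-bound rather than compute-bound, consider the following back-of-the-envelope estimate; the hardware numbers are assumptions loosely resembling an A100, not measurements.

```python
# One decode step of a 7B-parameter model in fp16, batch size 1.
params = 7e9
bytes_per_param = 2                  # fp16
flops_per_token = 2 * params         # ~2 FLOPs per parameter per generated token

peak_flops = 312e12                  # assumed dense fp16 throughput, FLOP/s
mem_bandwidth = 2.0e12               # assumed HBM bandwidth, bytes/s

t_compute = flops_per_token / peak_flops             # ~0.045 ms of math
t_memory = params * bytes_per_param / mem_bandwidth  # ~7 ms to stream the weights

print(f"compute-bound time: {t_compute * 1e3:.3f} ms")
print(f"memory-bound time : {t_memory * 1e3:.3f} ms")
# Weight traffic dominates by two orders of magnitude at batch size 1, which
# is why batching, quantization, and KV-cache management matter so much.
```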
4.2 Computationally Efficient LLMs
Even after training, deploying and running LLMs efficiently remains a major challenge. This topic covers methods for improving inference speed and reducing memory use, such as parameter-efficient fine-tuning (LoRA, adapters), quantization, pruning, distillation, and speculative decoding. Students will understand how these techniques enable practical use of large models across varying resource budgets.
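As one example from this list, a minimal LoRA-style forward pass: the frozen base weight W is augmented by a trainable low-rank update scaled by alpha/r, so only a small fraction of parameters is updated during fine-tuning; all dimensions below are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                    # frozen pretrained weight
        self.A = rng.normal(0, 0.02, size=(r, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, r))                 # trainable, zero init, so the
        self.scale = alpha / r                        # update starts as a no-op

    def __call__(self, x):
        # x: (..., d_in). Only A and B would receive gradients when fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.default_rng(1).normal(size=(64, 128))   # pretend pretrained weight
layer = LoRALinear(W)
print(layer(np.ones(128)).shape)  # (64,)
```

After fine-tuning, the update B @ A can be merged into W, so inference pays no extra latency.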
Deliverables
| Component | Details |
|---|---|
| Prerequisites | |
| Presentations | |
| Project/Demo | optional; 0.3 grade bonus |
| Seminar Paper | |
| Peer Review | |
Submissions
| Deliverable | Deadline | Format |
|---|---|---|
| Final presentation slides | Before your talk | PowerPoint, Keynote, or PDF |
| Code for the project | After your talk | .zip |
| Seminar paper for peer review | tbd | PDF based on the provided LaTeX template |
| Peer review | tbd | .txt file |
| Revised seminar paper | tbd | PDF based on the provided LaTeX template |