Teaching Offered Summer Semester 2022

Master's Practical Course: "Legal Data Analysis Lab" (IN2106)

Instructors: Shanshan Xu, Santosh Tokala

6 SWS, 10 ECTS

Session Times: TBA

Information Session

Monday, February 7, 2022, 16:00 [Session Recording] [Slides]

Content Outline

Analyzing legal data and text, and designing and developing systems that provide valuable functionality to legal practitioners, pose various challenges. These include noisy raw data that must be carefully preprocessed; ill-defined tasks for which only small datasets exist and for which supervision signals and evaluation data are difficult to obtain; and domain-specific information of various kinds that must be taken into account at many stages of the process.

This lab course gives students the opportunity to gain practical experience working with legal data in small teams. The instructors will offer projects, each centered on a research question or hypothesis. A project typically involves one or more datasets from a legal domain, one or more formal tasks, and one or more methods to be tried. Over the course of the semester, each team will develop and evaluate an experimental system or prototype, thereby producing new insight about the project's hypothesis.

After an initial introduction to legal informatics, students will be matched into teams and assigned projects. Teams will meet regularly with their project mentors to present work updates, discuss progress, and define action items. At the end of each of three milestone intervals, teams will present their progress to the whole cohort and discuss all projects with their peers.

Learning Outcomes

After completing this module, students will have gained practice in planning, implementing, and evaluating a legal data science/informatics project. In particular, they will have gained experience in:

  • formulating an experimental hypothesis
  • identifying characteristics of data from the legal domain and explaining how they influence technical aspects of project work
  • conducting a targeted survey of prior work in the legal informatics literature for a given project context
  • designing an experimental system towards producing insight from data and/or developing new functionality of interest
  • conducting model evaluation and behavior analysis

Prerequisites

Students must have experience in machine learning and, ideally, natural language processing. They should have taken the following courses or be sufficiently proficient in the topics and methods they cover:

  • IN2332: Statistical Modeling and Machine Learning
  • IN2062: Grundlagen der künstlichen Intelligenz / Foundations of Artificial Intelligence
  • IN2361: Natural Language Processing
  • IN2395: Legal Data Science & Informatics

Students who have not taken IN2395 are expected to familiarize themselves with the background material relevant to their project.


Lecture Course: "Legal Data Science and Informatics" (IN2395)

Master's Level Elective Module

4 SWS / 6 credits

Instructor: Matthias Grabmair (matthias.grabmair@tum.de)

Session Times: Tuesdays and Thursdays, 14:00-16:00 (starting on time)

Media: In Summer Semester 2022, the course will take place online via Zoom. The exam will take place in person.

Content Outline

The way lawyers, judges, corporate legal counsel, government agencies, and businesses engage with legal systems, requirements, and processes is increasingly influenced by technology. Prominent areas of practical interest are the intelligent search and analysis of legal documents, the role of machine learning in supporting legal decision making, and the modeling of legal processes using expertise encoded in formal rule systems. This module provides an overview of, and a practical introduction to, research and the state of the art in applying data science and artificial intelligence methods to tasks and problems arising in and around the public and private practice of law.

Legal decision making, legal data, and legal documents in particular challenge many mainstream modeling and analysis techniques. Hence, the module is intended for (1) students from technical majors who are interested in challenging interdisciplinary work, and (2) political science, business, and law students seeking to enhance their understanding of how new technologies can shape their field.

The module consists of a mix of lectures, discussion sessions, and small practical workshops following a thematic progression:

  • Introduction to legal systems, legal reasoning, and the impact of AI on legal practice
  • Basics of machine learning and natural language processing (NLP) (intended as a primer/refresher for nontechnical students; largely tailored to specific legal application contexts)
  • Case- and rule-based formalisms of legal reasoning
  • Legal data analytics, including case outcome prediction and empirical legal studies
  • Equal treatment imperatives and fair machine learning
  • Applications of NLP on legal text

Module sessions will cover concepts in an example-driven way through a mix of lectures, guided programming workshops, and discussion of topical research publications that students are expected to read before class.

The course belongs to the subject area (Fachgebiet) MLA (Machine Learning & Analytics).

Learning Outcomes

After completing this module, students will be able to:

  • explain knowledge representation and argumentation formalisms used in AI & Law
  • explain the application of techniques from statistics, applied machine learning, and natural language processing to legal data
  • examine and critique experimental work and systems in legal data science/informatics
  • explain the planning, implementation, and evaluation of legal data science/informatics research work

Grading

Grading will be based on a written literature survey and discussion paper on a given topic (40% of the final grade) and a written examination (60% of the final grade), both of which will take place at the end of the semester.

Students are encouraged to submit questions about the weekly reading assignments (i.e., topical publications) ahead of each session; these questions will be taken up in the in-class discussions. Submitting a minimum number of quality reading questions will earn a grade bonus of 0.3 at the end of the semester.

Enrollment & Prerequisites

  • IN0002: Fundamentals of Programming
  • IN8026: Einführung in die Programmierung mit Python / Introduction to Programming with Python (or equivalent; students must be able to autonomously work with Jupyter notebooks in the Python ecosystem)
  • IN0018: Diskrete Wahrscheinlichkeitstheorie / Discrete Probability Theory (or equivalent; students must be able to work with basic concepts from probability and statistics)
  • IN2332: Statistical Modeling and Machine Learning
  • IN2062: Grundlagen der künstlichen Intelligenz / Foundations of Artificial Intelligence
  • Willingness/ability to work intensively across disciplines (reading legal text, drafting specifications, programming, and domain-specific data analysis)

A formal sign-up procedure will be announced.

Literature Sample

Students who wish to learn more can consult the following sample publications, most of which will be discussed in the course.

  • Lola v. Skadden, Arps, Slate, Meagher & Flom LLP, U.S. Court of Appeals for the Second Circuit, 2015.
  • Dynamo Holdings et al. v. Commissioner of Internal Revenue, Docket Nos. 2685-11, 8393-12, July 13, 2016.
  • Sergot, Marek J., Fariba Sadri, Robert A. Kowalski, Frank Kriwaczek, Peter Hammond, and H. Terese Cory. "The British Nationality Act as a logic program." Communications of the ACM 29, no. 5 (1986): 370-386. 
  • Surdeanu, Mihai, Ramesh Nallapati, George Gregory, Joshua Walker, and Christopher D. Manning. "Risk analysis for intellectual property litigation." In Proceedings of the 13th International Conference on Artificial Intelligence and Law, pp. 116-120. ACM, 2011. 
  • Voelter, Markus, Sergej Koscejev, Marcel Riedel, Anna Deitsch, and Andreas Hinkelmann. "A Domain-Specific Language for Payroll Calculations: a Case Study at DATEV." 
  • Merigoux, Denis, Nicolas Chataing, and Jonathan Protzenko. "Catala: A Programming Language for the Law." arXiv preprint arXiv:2103.03198 (2021). 
  • Oltramari, Alessandro, Dhivya Piraviperumal, Florian Schaub, Shomir Wilson, Sushain Cherivirala, Thomas B. Norton, N. Cameron Russell, Peter Story, Joel Reidenberg, and Norman Sadeh. "PrivOnto: A semantic framework for the analysis of privacy policies." Semantic Web 9, no. 2 (2018): 185-203. 
  • Katz, Daniel Martin, Michael J. Bommarito II, and Josh Blackman. "A general approach for predicting the behavior of the Supreme Court of the United States." PLOS ONE (April 12, 2017).
  • Shulayeva, Olga, Advaith Siddharthan, and Adam Wyner. "Recognizing cited facts and principles in legal judgements." Artificial Intelligence and Law 25, no. 1 (2017): 107-126. 
  • Cardellino, Cristian, Milagro Teruel, Laura Alonso Alemany, and Serena Villata. "A low-cost, high-coverage legal named entity recognizer, classifier and linker." In Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law, pp. 9-18. ACM, 2017.
  • Zheng, Lucia, Neel Guha, Brandon R. Anderson, Peter Henderson, and Daniel E. Ho. "When does pretraining help? Assessing self-supervised learning for law and the CaseHOLD dataset of 53,000+ legal holdings." In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pp. 159-168. 2021.
  • Chalkidis, Ilias, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, and Prodromos Malakasiotis. "Paragraph-level rationale extraction through regularization: A case study on European Court of Human Rights cases." arXiv preprint arXiv:2103.13084 (2021).
  • Mochales Palau, Raquel, and Marie-Francine Moens. "Argumentation mining: the detection, classification and structure of arguments in text." In Proceedings of the 12th International Conference on Artificial Intelligence and Law (ICAIL), 2009.
  • Holzenberger, Nils, Andrew Blair-Stanek, and Benjamin Van Durme. "A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering." In Proceedings of the 2020 Natural Legal Language Processing Workshop (NLLP), 2020.
  • Bennett, Zachary, Tony Russell-Rose, and Kate Farmer. "A scalable approach to legal question answering." In Proceedings of the 16th edition of the International Conference on Artificial Intelligence and Law, pp. 269-270. ACM, 2017. 
  • Suresh, Harini, and John V. Guttag. "A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle." arXiv preprint arXiv:1901.10002 (2019). 
  • Engel, Christoph, and Keren Weinshall. "Manna from Heaven for Judges – Judges' Reaction to a Quasi-Random Reduction in Caseload." No. 2020_01. Max Planck Institute for Research on Collective Goods, 2020.
  • Albright, Alex. "If you give a judge a risk score: evidence from Kentucky bail decisions." Harvard John M. Olin Fellow's Discussion Paper 85 (2019).