Business Analytics and Machine Learning (IN2028), WS 23/24

Prof. Dr. Martin Bichler

Julius Durmann  ·  Johannes Knörr  ·  Markus Ewert  ·  Yutong Chao

Description

This is an introductory course in data analysis with a focus on methods relevant to management and economics. Participants will learn widely used methods for numerical prediction, classification, clustering, and dimensionality reduction.

The course comprises weekly lectures, exercise sheets, homework sheets, and tutorials in smaller groups. The exercises consist of theoretical considerations, applications, and programming exercises in Python. Additionally, students can participate in the "Analytics Cup", which gives them the opportunity to apply their knowledge to realistic data sets.

The course is an elective for students in the BSc Mathematics. Students from IN, GE, and DE&A can choose only one of the following classes:

  • Data Mining, IN2023, 2V, WS, Prof. Runkler
  • Business Analytics, IN2028, 2V+2Ü, WS, Prof. Bichler
  • Data Analysis and Visualization in R, IN2339, 2V+4Ü, WS, Prof. Gagneur

Prerequisites

This is intended as a Bachelor course. Master students are welcome unless they have already taken a machine learning or data mining course.

This course has a number of prerequisites:

  • For the initial classes, we expect students to know basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
  • For later classes you will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).

We will provide some review, but we cannot revisit entire courses in statistics, linear algebra, and calculus. If you are not comfortable with the mathematical basics mentioned above, this course might not be the right one for you. Besides, the course focuses on methods for classification and regression as they are widely used in business applications. A few exemplary applications will be discussed to motivate certain methods, but the focus is not on applications!

Organization

Introduction: Please attend our first lecture on October 23, 2023 for organizational details.

Important links (registration and information):

Lecture: Monday, 2 pm - 4 pm, Audimax Galileo

Tutorials: On-site groups meet in Room 01.10.011, MI building, Garching.

Group 1 Tuesday 10 am - 12 pm online
Group 2 Tuesday 2 pm - 4 pm online
Group 3 Wednesday 10 am - 12 pm onsite
Group 4 Wednesday 12 pm - 2 pm onsite
Group 5 Thursday 10 am - 12 pm onsite
Group 6 Thursday 2 pm - 4 pm onsite
Group 7 Thursday 4 pm - 6 pm onsite
Group 8 Friday 10 am - 12 pm onsite
Group 9 Friday 12 pm - 2 pm onsite
Group 10 Friday 2 pm - 4 pm onsite

Note: For the exact dates of lectures and tutorials, please check the schedule in TUMonline. Some dates may change due to holidays or university events.

Exam: There will be two exam opportunities (endterm and retake) in early 2024. Both exams are planned as on-site exams. There will be no online exam option.

Syllabus*

  1. Regression Analysis (estimators, test theory, OLS)
  2. Regression Diagnostics (Gauss-Markov theorem, GM assumptions, omitted variable bias, panel data analysis)
  3. Logistic and Poisson Regression (GLMs, logit, probit, Poisson regression)
  4. Naïve Bayes and Bayes Nets (Bayes' rule, learning Bayes nets, d-separation)
  5. Decision Tree Classifiers (entropy, C4.5, CART, tree pruning)
  6. Data Preparation and Causal Inference (practical data preparation, causal inference, IV, PSM, multiple imputation, etc.)
  7. Model Selection (gain curves, lift, ROC, bias-variance tradeoff)
  8. Ensemble Methods and Clustering (bagging, random forests, boosting, hierarchical clustering, k-means, expectation maximization)
  9. High-Dimensional Problems (PCA, SVD, PCA regression, PLS, ridge regression, LASSO)
  10. Convex Optimization (gradient descent, Newton's method)
  11. Neural Networks (feed-forward networks, backpropagation, gradient descent)
  12. Reinforcement Learning
  13. Presentation of Practical Project (Analytics Cup) and Guest Lecture

*may be subject to change

Literature

The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures can be found in chapters from the following textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning, Springer, 2016.
  • Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2016.
  • James H. Stock and Mark W. Watson: Introduction to Econometrics, Pearson Education.
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014.

Contacts

Please use the Moodle forum for general questions!
The email address below is intended only for personal questions.

Mail: ba@dss.cit.tum.de

Julius Durmann, M.Sc.
Room 01.10.054 (Garching)

Markus Ewert, M.Sc.
Room 01.10.055 (Garching)

Johannes Knörr, M.Sc.
Room 01.10.056 (Garching)

Yutong Chao, M.Sc.
Room 01.10.036 (Garching)