Business Analytics and Machine Learning (IN2028), WS 22/23

Prof. Dr. Martin Bichler

Nils Kohring M.Sc.  ·  Johannes Knörr M.Sc.  ·  Markus Ewert M.Sc.  ·  Dmitrij Boschko M.Sc.

Note

This page is under construction and will be updated in the coming weeks.

We are looking for motivated tutors; see our job ads for tutor positions and for analytics tutor positions.

All information on this page is preliminary due to the COVID-19 pandemic. We currently plan to conduct most lectures and tutorials in person, with some sessions offered online.

Prerequisites

This course has a number of prerequisites:

  • For the initial classes, we expect students to have knowledge of basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
  • For later classes, students will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).

We will provide some review but cannot repeat entire courses in statistics, linear algebra, and calculus. If you are not comfortable with the mathematical basics mentioned above, this course might not be the right one for you. Moreover, the course focuses on methods for classification and regression, as they are widely used in business applications. A few exemplary applications will be discussed to motivate certain methods, but the focus is not on applications!

Organization

  • Time and place: Lectures are on Mondays, 14:00-16:00 (see TUMonline for changes). There are multiple tutorial groups that take place at various times of the week (see TUMonline).
  • Description: Module description of IN2028.
  • Requirements: This is intended as a Bachelor course requiring introductory classes in statistics, calculus, and algorithms. Master's students are welcome unless they have already taken a machine learning or data mining course.
  • News and materials: Will be uploaded to Moodle.
  • Registration: To participate, please register for the lecture via TUMonline (late registrations are possible). Tutorial registration will open at a later point.
  • Format: On-site lectures with a mix of on-site and online tutorials.
  • Introduction: Please attend the first class (TBD) for organizational details.    
  • Exam: There will be two exam opportunities (endterm and retake) in early 2023. Both exams are planned as on-site exams. There will be no online exam option.
  • Questions: For any administrative issues, please contact Nils Kohring.

Description

This is an introductory course in data analysis with a focus on methods for causal inference and applications in business and economics. Participants will learn widespread methods for numerical prediction, classification, clustering, and dimensionality reduction. The analysis of human choice behavior is particularly challenging and differs from other applications of data analysis and machine learning; this aspect is central to the class. During tutorials, students will compute examples by hand and analyze data in R. Participants can apply their knowledge in the Analytics Cup, an optional graded project in which they analyze realistic data sets. If the project grade is better than the exam grade, the final grade is weighted 33% project and 67% exam; otherwise, the exam grade alone counts. Participating students can therefore only improve their grades.
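The grading rule above can be sketched as a small function. This is a hypothetical helper for illustration only, not an official grade calculator. It assumes the German grading scale (1.0 best, 5.0 worst), so "better" means numerically lower, and it does not model rounding to official grade steps or any passing requirement for the exam.

```python
from typing import Optional


def final_grade(exam: float, project: Optional[float] = None) -> float:
    """Hypothetical helper combining the exam and the optional Analytics Cup grade.

    Assumes the German scale, where a lower number is a better grade.
    """
    if project is None or project >= exam:
        # No project submitted, or project grade not better: exam grade alone counts
        return exam
    # Project grade is better: weight it 33% and the exam 67%
    # (rounded to two decimals only to hide floating-point noise)
    return round(0.33 * project + 0.67 * exam, 2)


print(final_grade(2.0, 1.0))  # project better than exam
print(final_grade(2.0, 3.0))  # project worse, exam grade kept
print(final_grade(2.0))       # no project
```

Because the weighted average of the exam grade and a strictly better project grade always lies between the two, the combined result can never be worse than the exam grade, which is why participation can only improve a student's grade.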

Students from IN, GE, and DE&A can choose only one of the following classes:

  • Data Mining, IN2023, 2V, WS, Prof. Runkler
  • Business Analytics, IN2028, 2V+2Ü, WS, Prof. Bichler
  • Data Analysis and Visualization in R, IN2339, 2V+4Ü, SS, Prof. Gagneur

Syllabus

  1. Regression Analysis (estimators, test theory, OLS)
  2. Regression Diagnostics (Gauss-Markov theorem, GM assumptions, omitted variable bias, panel data analysis)
  3. Logistic and Poisson Regression (GLMs, logit, probit, Poisson regression)
  4. Naïve Bayes and Bayes Nets (Bayes rule, learning Bayes nets, d-separation)
  5. Decision Tree Classifiers (entropy, C4.5, CART, tree pruning)
  6. Data Preparation and Causal Inference (practical data preparation, causal inference, IV, PSM, multiple imputation, etc.)
  7. Model Selection (gain curves, lift, ROC, bias-variance tradeoff) and Introduction to the Analytics Cup (R tutorial)
  8. Ensemble Methods and Clustering (bagging, random forests, boosting, hierarchical clustering, k-means, expectation maximization)
  9. High-Dimensional Problems (PCA, SVD, PCA regression, PLS, ridge regression, LASSO)
  10. Association Rules and Recommenders (APRIORI, collaborative filtering: SVD-based and nearest neighbor), Neural Networks Intro
  11. Neural Networks (feed-forward networks, backpropagation, gradient descent)
  12. Convex Optimization (gradient descent, Newton's method)
  13. Analytics Cup Presentations

Literature

The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures can be found in chapters from the following textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning, Springer, 2016 (E-Book)
  • Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann, 2016 (E-Book)
  • James H. Stock and Mark W. Watson: Introduction to Econometrics, Pearson Education.
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014 (E-Book)
  • Hadley Wickham, Garrett Grolemund: R for Data Science, O'Reilly, 2017 (E-Book)

Contacts

Nils Kohring, M.Sc.
Room 01.10.055 (Garching) 
Phone: 289-17506
E-Mail: nils(.)kohring(at)in(.)tum(.)de

Johannes Knörr, M.Sc. 
Room 01.10.056 (Garching) 
E-Mail: knorr(at)tum(.)de

Markus Ewert, M.Sc.
Room 01.10.055 (Garching) 
E-Mail: markus(.)ewert(at)tum(.)de

Dmitrij Boschko, M.Sc.
Room 01.10.056 (Garching) 
E-Mail: boschko(at)in(.)tum(.)de