Business Analytics and Machine Learning (IN2028), WS 21/22

Prof. Dr. Martin Bichler

Nils Kohring M.Sc.  ·  Stefan Heidekrüger M.Sc.  ·  Johannes Knörr M.Sc.  ·  Markus Ewert M.Sc.

Note: Module Renamed

Starting in the winter term 2021/22, the former module "IN2028 Business Analytics" is being replaced by the new module "IN2028 Business Analytics and Machine Learning". It is primarily intended for Bachelor students after introductory classes in statistics and algorithms.

It has been brought to our attention that the "curriculum" views in TUMonline have not yet been updated for some non-informatics study programs where IN2028 is an elective.

If you expect to find IN2028 in your personal curriculum tree in TUMonline but can no longer find it, select "Display --> show inact. nodes" on the top left. You should then be able to see and sign up for the module. If this happens to you, we strongly advise that you additionally contact your degree program coordinator to confirm that the new module will be eligible in your degree program. 

Note: Information Preliminary due to Corona

This is a very large class. Due to the precautionary measures resulting from the Covid pandemic, we will provide prerecorded lectures every week. In addition, many students have already informed us that they cannot attend in person, and some cannot even make it to Munich for reasons beyond their control. This website is for preliminary informational purposes only. Final information on organization and format will be announced to registered participants via the course's Moodle page at the beginning of the semester.

Note: Prerequisites

This course has a number of prerequisites:

  • For the initial classes, we expect students to have knowledge of basic inferential statistics (statistical estimation, statistical testing, and simple linear regression).
  • For later classes you will need linear algebra (basis transformations) and calculus (convex functions, gradients, Hessian matrix).

We will provide some review, but cannot revisit entire courses in statistics, linear algebra, and calculus. If you are not comfortable with the mathematical basics mentioned above, this might not be the right course for you. In addition, the course focuses on methods for classification and regression as they are widely used in business applications. A few exemplary applications will be discussed to motivate certain methods, but the focus is not on applications!

Organization

  • Time and place: Lectures are on Thursday, 08:00-10:00 (prerecorded online)
  • Description: Module description of IN2028
  • Requirements: This is intended as a Bachelor course requiring introductory classes on statistics, calculus, and algorithms. Master students are invited unless they have already taken a machine learning or data mining course.
  • News and materials: Will be uploaded to Moodle
  • Registration: To participate please register via TUMonline (late registrations are possible)
  • Format: Hybrid: video lectures plus online and in-person tutorials.
  • Introduction: Please attend the first class (webinar password: ba) on Oct. 21, 2021 at 8am for organizational details.
  • Exam: There will be two exam opportunities (endterm and retake) in early 2022. Both exams are conducted as remote exercises via TUMexam. There will be no on-site exam option.
  • Questions: For any administrative issues, please contact Nils Kohring

Description

This is an introductory course in data analysis with a focus on various methods for causal inference and applications in business and economics. The participants will learn widespread methods for numerical prediction, classification, clustering, and dimensionality reduction. The analysis of human choice behavior is particularly challenging and differs from other applications of data analysis and machine learning. This aspect will be central to this class. During tutorials, students will compute examples by hand and analyze data with the R language. The participants will be able to apply their knowledge during the Analytics Cup, a graded optional project in which they analyze realistic data sets. If the grade in this project is better than the exam grade, the project grade will be weighted at 33% and the exam grade at 67%. Therefore, participating students can only improve their grades.
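The weighting rule can be illustrated with a short calculation. The following is a minimal sketch in Python, not an official grading tool. It assumes the usual German grading scale, on which a lower number is the better grade, and it does not round to official grade steps, since the rounding rule is not stated here. The function name `final_grade` is hypothetical.

```python
from typing import Optional


def final_grade(exam: float, cup: Optional[float] = None) -> float:
    """Combine the exam grade with an optional Analytics Cup grade.

    Assumption: lower numbers are better grades (German scale).
    Assumption: no rounding to official grade steps is applied.
    """
    # The project only counts if the student took part and did better
    # in the project than in the exam.
    if cup is not None and cup < exam:
        return 0.67 * exam + 0.33 * cup
    # Otherwise the exam grade stands on its own.
    return exam


if __name__ == "__main__":
    # Exam 2.3, project 1.0: 0.67 * 2.3 + 0.33 * 1.0
    print(round(final_grade(2.3, 1.0), 3))
    # Exam 2.0, project 3.0: the project is worse, so the exam grade stands
    print(final_grade(2.0, 3.0))
```

Under these assumptions, a project grade that is worse than the exam grade never changes the result, which is why participation can only improve the final grade.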

Students from IN, GE, and DE&A can choose only one of the following classes:

  • Data Mining, IN2023, 2V, WS, Prof. Runkler
  • Business Analytics and Machine Learning, IN2028, 2V+2Ü, WS, Prof. Bichler
  • Data Analysis and Visualization in R, IN2339, 2V+4Ü, SS, Prof. Gagneur

Syllabus

  1. Regression Analysis (estimators, test theory, OLS)
  2. Regression Diagnostics (Gauss-Markov theorem, GM assumptions, omitted variable bias, panel data analysis)
  3. Logistic and Poisson Regression (GLMs, logit, probit, Poisson regression)
  4. Naïve Bayes and Bayes Nets (Bayes rule, learning Bayes nets, d-separation)
  5. Decision Tree Classifiers (entropy, C4.5, CART, tree pruning)
  6. Data Preparation and Causal Inference (practical data preparation, causal inference, IV, PSM, multiple imputation, etc.)
  7. Model Selection (gain curves, lift, ROC, bias-variance tradeoff) and Introduction to the Analytics Cup (R tutorial)
  8. Ensemble Methods and Clustering (bagging, random forests, boosting, hierarchical clustering, k-means, expectation maximization)
  9. High-Dimensional Problems (PCA, SVD, PCA regression, PLS, ridge regression, LASSO)
  10. Association Rules and Recommenders (APRIORI, collaborative filtering: SVD-based and nearest neighbor), Neural Networks Intro
  11. Neural Networks (feed-forward networks, backpropagation, gradient descent)
  12. Convex Optimization (gradient descent, Newton's method)
  13. Presentation Analytics Cup

Literature

The presentation slides for the lectures and tutorials are accessible via Moodle. The contents of the lectures can be found in chapters from the following textbooks:

  • Trevor Hastie, Robert Tibshirani, Jerome Friedman: The Elements of Statistical Learning, Springer, 2016 (E-Book)
  • Ian Witten, Eibe Frank, Mark Hall, Christopher Pal: Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann, 2016 (E-Book)
  • James H. Stock and Mark W. Watson: Introduction to Econometrics, Pearson Education.
  • Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani: An Introduction to Statistical Learning, Springer, 2014 (E-Book)
  • Hadley Wickham, Garrett Grolemund: R for Data Science, 2017 (E-Book)

Contacts

Nils Kohring, M.Sc.
Room 01.10.055 (Garching) 
Phone: 289-17506
E-Mail: nils(.)kohring(at)in(.)tum(.)de

Stefan Heidekrüger, M.Sc.
Room 01.10.056 (Garching) 
E-Mail: stefan(.)heidekrueger(at)in(.)tum(.)de

Johannes Knörr, M.Sc. 
Room 01.10.056 (Garching) 
E-Mail: knorr(at)tum(.)de

Markus Ewert, M.Sc.
Room 01.10.05 (Garching) 
E-Mail: markus(.)ewert(at)tum(.)de

Prof. Dr. Martin Bichler
Room 01.10.061 (Garching) 
Phone: 289-17534 
E-Mail: bichler@in.tum.de