Statistical Foundations of Learning (CIT4230004)

This course replaces the module IN2378, which will no longer be offered.

This course introduces students to a statistical perspective on machine learning and provides the mathematical tools to analyze the performance of machine learning algorithms.

The first part of the course introduces concepts from statistical learning theory and other foundational results for machine learning. In particular, the following topics are covered:

  • Risk minimization, Bayes risk, and empirical risk (see the definitions sketched after this list)
  • Statistical consistency and universal consistency
  • Vapnik-Chervonenkis (VC) theory of generalization
  • PAC learning and the no-free-lunch theorem
  • Algorithmic stability
  • Universal approximation theorem
  • Boosting
  • The above techniques will be used to study the generalization of the nearest-neighbor rule, support vector machines, and neural networks
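
To fix notation for the first bullet, the standard definitions read roughly as follows (the notation here is ours, not necessarily the course's). Given a loss \ell and data (X, Y) \sim P, the risk of a predictor f, the Bayes risk, and the empirical risk on a sample (X_1, Y_1), \dots, (X_n, Y_n) are

  R(f) = \mathbb{E}_{(X,Y) \sim P}[\ell(f(X), Y)], \quad R^{*} = \inf_{f} R(f), \quad \hat{R}_{n}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(X_i), Y_i).

Empirical risk minimization selects \hat{f}_n minimizing \hat{R}_n over a hypothesis class, and much of the first part concerns how close R(\hat{f}_n) gets to R^{*}.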

The second part of the course covers some advanced and recent topics in the theory of learning:

  • Approximation bounds for clustering
  • Theory of over-parametrized models
  • Training dynamics of neural networks, including the neural tangent kernel (sketched after this list)
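
As a minimal sketch of the last item (again in our own notation): for a network f(x; \theta) with parameters \theta, the neural tangent kernel is

  \Theta(x, x') = \langle \nabla_{\theta} f(x; \theta), \nabla_{\theta} f(x'; \theta) \rangle,

and in the infinite-width limit this kernel remains essentially constant during gradient-descent training, so the dynamics reduce to kernel regression with \Theta (the setting of the Arora et al. paper listed below).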

A concluding lecture will present some current challenges and recent work in the statistical foundations of learning.

Previous Knowledge Expected

Machine learning (IN2064 or equivalent); Discrete probability theory (IN0018); Analysis for informatics (MA0902)

Some background in statistics (MA2402 or equivalent) could be helpful.

Languages of Instruction

English

Recommended Reading

The first part of the course will mainly follow parts of the following textbooks:

  • Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
  • Luc Devroye, László Györfi, and Gábor Lugosi. A probabilistic theory of pattern recognition. Springer Science & Business Media, 2013.

The second part will be based on recent papers, such as:

  • David Arthur and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.
  • Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn, and Claire Mathieu. "Hierarchical clustering: Objective functions and algorithms." Journal of the ACM 66(4):1-42, 2019.
  • Peter L. Bartlett, Andrea Montanari, and Alexander Rakhlin. "Deep learning: A statistical viewpoint." Acta Numerica 30:87-201, 2021.
  • Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J. Tibshirani. "Surprises in high-dimensional ridgeless least squares interpolation." The Annals of Statistics 50(2):949-986, 2022.
  • Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Russ R. Salakhutdinov, and Ruosong Wang. "On exact computation with an infinitely wide neural net." Advances in Neural Information Processing Systems 32, 2019.

Additional references, if needed, will be provided during the lectures.