Practical - Analysis of new phenomena in machine/deep learning SS2023

News

Pre-course meeting: Feb 7, 2023, 14:00. You can find the slides here and the sample paper we discussed here.

Please fill out this survey if you are interested in participating: https://forms.gle/tCh7THb2u963e1FG  (THIS DOES NOT REPLACE THE MATCHING SYSTEM)

(There seems to be a problem with the original survey link. If it does not work, try: https://forms.gle/enE1SfyS5H8AgADE7)

All further details will be discussed in the pre-course meeting and the first meeting of the practical.

 

CONTENT

Deep neural networks produce state-of-the-art results on a wide range of machine learning problems. While deep learning still eludes rigorous theoretical analysis, its phenomenal performance has shaken the mathematical foundations of machine learning, contradicting many conventional beliefs of classical learning theory and the fundamental understanding of how algorithms can successfully learn patterns. However, the past few years have seen an exciting combination of mathematical and empirical research, uncovering some of the mysteries of modern machine/deep learning and developing formal theories to explain them, such as:

  • Generalization: Why do classical theories fail to explain generalization in deep networks? This setting can be analyzed, for example, under specific distributional assumptions to obtain more expressive bounds. (see e.g. [1])
  • Over-parameterization: When can overfitting be good for learning? Here we will analyze papers that focus on the double-descent phenomenon: over-parameterized NNs deviate from the traditional bias-variance trade-off, as they may perform best in a zero-training-loss / interpolating regime (a toy sketch follows this list). (see e.g. [2])
  • Kernel behavior: When do neural networks behave identically to kernel methods? Under a small learning rate, (S)GD training is equivalent to kernel regression with the Neural Tangent Kernel (NTK), a dot-product kernel in the gradient space of the NN parameters (a short code sketch also follows this list). (see e.g. [3])
  • Robustness: Why are some neural networks more robust? Why can specific changes in data trick the predictions of neural networks? (see e.g. [4])
  • Unsupervised learning: Analyze the performance of models trained on unlabeled data. The main research focus here is on finding good latent representations and on recovery rates. (see e.g. [5])

Note that this is not the final list of topics and papers; it may still change for the practical.
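
To illustrate the double-descent phenomenon from the over-parameterization topic, here is a minimal toy sketch (not the setup of [2], which studies early stopping in deep networks): min-norm least squares on random ReLU features typically shows a test-error peak near the interpolation threshold (number of features roughly equal to the number of training samples) before the error decreases again as the model grows further. All sizes and the feature map below are arbitrary illustrative choices.

# Toy double-descent sketch: min-norm least squares on random ReLU features.
# Typical behavior: test MSE peaks near n_feat ≈ n_train, then decreases again.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, noise = 20, 100, 1000, 0.5
w_true = rng.normal(size=d) / np.sqrt(d)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + noise * rng.normal(size=n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for n_feat in [10, 50, 90, 100, 110, 200, 500, 2000]:
    errs = []
    for _ in range(10):  # average over random feature draws
        V = rng.normal(size=(d, n_feat)) / np.sqrt(d)  # random projection
        phi_tr = np.maximum(X_tr @ V, 0.0)             # ReLU features
        phi_te = np.maximum(X_te @ V, 0.0)
        beta = np.linalg.pinv(phi_tr) @ y_tr           # min-norm least-squares fit
        errs.append(np.mean((phi_te @ beta - y_te) ** 2))
    print(f"features = {n_feat:5d}   mean test MSE = {np.mean(errs):.3f}")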
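
For the kernel-behavior topic, the empirical (finite-width) NTK is simply the dot product of parameter gradients, k(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>. The sketch below is a minimal PyTorch illustration of this quantity for a small, arbitrary network; it is not the exact infinite-width computation of [3], and the helper name param_grad is only a placeholder.

# Empirical NTK of a tiny scalar-output MLP (illustrative architecture).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(5, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def param_grad(x):
    # Flattened gradient of the scalar output f(x) w.r.t. all network parameters.
    out = net(x).squeeze()
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

x1, x2 = torch.randn(5), torch.randn(5)
ntk_value = param_grad(x1) @ param_grad(x2)  # k(x1, x2) = <grad f(x1), grad f(x2)>
print(f"empirical NTK k(x1, x2) = {ntk_value.item():.4f}")

In the infinite-width limit (with a suitable parameterization and a small learning rate), this kernel remains essentially constant during training, which is what allows (S)GD on the network to be described as kernel regression with the NTK.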

This course will familiarize the students with empirical practices used to explain surprising phenomena in deep learning. In particular, the students will be asked to reproduce empirical findings of recent papers from top ML conferences (NeurIPS, ICML, ICLR and AISTATS) and empirically extend these observations to more complex problems/models.

 

[1] Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy. In Defense of Uniform Convergence: Generalization via Derandomization with an Application to Interpolating Predictors [ICML 2020]
[2] Reinhard Heckel, Fatih Furkan Yilmaz. Early Stopping in Deep Networks: Double Descent and How to Eliminate it [ICLR 2021]
[3] Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang. On Exact Computation with an Infinitely Wide Neural Net [NeurIPS 2019]
[4] Jeremy Cohen, Elan Rosenfeld, J. Zico Kolter. Certified Adversarial Robustness via Randomized Smoothing [ICML 2019]
[5] Xuchan Bao, James Lucas, Sushant Sachdeva, Roger Grosse. Regularized linear autoencoders recover the principal components, eventually [NeurIPS 2020]
 

 

STRUCTURE

Students work in groups of two on similar papers, with one research paper per student. For each paper, the student will have to complete two main parts:

  1. understand the main theoretical ideas of the paper and reproduce its empirical findings.
  2. extend the empirical observations with further experiments.

Each student will have to submit a mid-semester report on their paper covering the main ideas and the reproducibility of its findings, as well as a final report on the experimental extensions together with executable code (further details will be provided in the initial meeting).

The final grade will depend on the report on reproducibility (40%), the report on extensions (20%), and the final group presentation (40%). Each student is graded individually based on their report / presentation.
 

 

PREVIOUS KNOWLEDGE EXPECTED

  • Machine learning (IN2064)
  • Introduction to deep learning (IN2346)
  • Statistical foundations of learning (IN2378) - optional

 

LANGUAGES OF INSTRUCTION

English