Theoretical advances in deep learning (SS2020)

Theoretical advances in deep learning (Master seminar)

Final Presentations

Each paper gets a 30-minute presentation followed by a 10-minute Q&A.

You can join the final presentations using this link: https://bbb.in.tum.de/deb-aqm-mkm

24.06.2020

10:00-10:40: Nearly-tight VC-dimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian. COLT 2017
10:40-11:20: Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. NeurIPS 2018
11:20-12:00: Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations. Yuanzhi Li, Tengyu Ma, Hongyang Zhang. COLT 2018
12:00-12:40: Can SGD Learn Recurrent Neural Networks with Provable Generalization? Zeyuan Allen-Zhu, Yuanzhi Li. NeurIPS 2019

25.06.2020

13:00-13:40: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. Chulhee Yun, Suvrit Sra, Ali Jadbabaie. NeurIPS 2019
13:40-14:20: Are ResNets Provably Better than Linear Predictors? Ohad Shamir. NeurIPS 2018
14:20-15:00: On the Power and Limitations of Random Features for Understanding Neural Networks. Gilad Yehudai, Ohad Shamir. NeurIPS 2019
15:00-15:40: A convergence analysis of gradient descent for deep linear neural networks. Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu. ICLR 2019

26.06.2020

10:00-10:40: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang. ICML 2019
10:40-11:20: Convergence of Adversarial Training in Overparametrized Neural Networks. Ruiqi Gao et al. NeurIPS 2019
11:20-12:00: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei and Tengyu Ma. NeurIPS 2019

News

Due to the current situation, we will have to adjust the structure of the seminar as follows:

Instead of holding the first meeting on April 30th, we will record the lecture, and you will receive a detailed overview and instructions for the seminar.
Depending on how the situation develops, we will decide later whether the final presentations can be held in person or whether a remote solution is necessary.
From now on, we will post all relevant information on this website.
All further details will be addressed in the first 'meeting'.

Time plan

January 28: Pre-course meeting, 17:30-18:30 in Hörsaal 2. Slides for the meeting are here.
February 22-29: Submit paper preferences (forms will be sent; select at least 3 papers)
March 6: Assignment of papers
April 30: First meeting (assignments, reports and organisation)
May 3: Deadline for de-registration
June 1: Submit report and first version of slides (both as PDF)
June 24-26: Final presentation (block seminar, date to be finalised)
Office hours: 1 hour every week (date to be fixed)

Focus of seminar

Neural networks, particularly deep networks, have become enormously popular over the past decade. Alongside their empirical success, one of the major developments in recent years has been a body of new theoretical work on the statistical performance of neural networks.
This seminar will look at the following important topics on neural networks from a mathematical perspective:

Generalization error for neural networks and related concepts from learning theory
Optimization and convergence rates for neural networks
Sample complexity and hardness results
Connections between deep learning and other learning approaches (kernel methods, etc.)
Robustness of neural networks

Several recent papers from top machine learning conferences will be discussed during the seminar.

Pre-requisites

Prior knowledge of Machine Learning (IN2064 or equivalent) is mandatory.
Experience with Statistical Foundations of Learning (IN2378) and Introduction to Deep Learning (IN2346) is preferred.

Paper List (tentative)

Generalisation for neural networks

VC-dimension: How complex are the trained models?

Sample complexity: How much training data is needed to learn an accurate model?

Nearly-tight VC-dimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian. COLT 2017
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei and Tengyu Ma. NeurIPS 2019

Optimisation in neural networks

Convergence in deep linear networks

A convergence analysis of gradient descent for deep linear neural networks. Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu. ICLR 2019

Converged solution better with non-linearity

Are ResNets Provably Better than Linear Predictors? Ohad Shamir. NeurIPS 2018

Optimisation in Over-parameterised NNs

Generalization bound independent of network size

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang. ICML 2019

Gradient descent performs regularisation

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations. Yuanzhi Li, Tengyu Ma, Hongyang Zhang. COLT 2018

Analysis of Stochastic Gradient Descent

With vanilla SGD, RNNs can learn some concept classes

Can SGD Learn Recurrent Neural Networks with Provable Generalization? Zeyuan Allen-Zhu, Yuanzhi Li. NeurIPS 2019

Exponential convergence rates for SGD

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning. Siyuan Ma, Raef Bassily, Mikhail Belkin. ICML 2018

Adversarial ML / Robustness

Convergence of robust loss minimisation

Convergence of Adversarial Training in Overparametrized Neural Networks. Ruiqi Gao et al. NeurIPS 2019

Broader theory of risk bounds in the presence of adversaries

Theoretical Analysis of Adversarial Learning: A Minimax Approach. Zhuozhuo Tu, Jingwei Zhang, Dacheng Tao. NeurIPS 2019