Theoretical advances in deep learning (Master seminar)

Final Presentations

Each paper is allotted a 30-minute presentation followed by a 10-minute Q&A.

You can join the final presentations using this link:


  • 10:00-10:40: Nearly-tight VC-dimension bounds for piecewise linear neural networks. Peter L. Bartlett, Nick Harvey, Chris Liaw, Abbas Mehrabian, COLT 2017
  • 10:40-11:20: Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. NeurIPS 2018
  • 11:20-12:00: Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations. Yuanzhi Li, Tengyu Ma, Hongyang Zhang. COLT 2018
  • 12:00-12:40: Can SGD Learn Recurrent Neural Networks with Provable Generalization? Zeyuan Allen-Zhu, Yuanzhi Li. NeurIPS 2019


  • 13:00-13:40: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. Chulhee Yun, Suvrit Sra, Ali Jadbabaie. NeurIPS 2019
  • 13:40-14:20: Are ResNets Provably Better than Linear Predictors? Ohad Shamir. NeurIPS 2018
  • 14:20-15:00: On the Power and Limitations of Random Features for Understanding Neural Networks. Gilad Yehudai, Ohad Shamir. NeurIPS 2019
  • 15:00-15:40: A convergence analysis of gradient descent for deep linear neural networks. Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu. ICLR 2019


  • 10:00-10:40: Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang, ICML 2019
  • 10:40-11:20: Convergence of Adversarial Training in Overparametrized Neural Networks. Ruiqi Gao et al. NeurIPS 2019
  • 11:20-12:00: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei and Tengyu Ma. NeurIPS 2019


Due to the current situation, we will have to adjust the structure of the seminar as follows: 

  • Instead of the first meeting on April 30th, we will record this lecture, and you will receive a detailed overview and instructions for the seminar.
  • Depending on the development of the situation, we will decide later if the final presentation can be done in person or if a remote solution is necessary.
  • From now on, we will post all relevant information on this website.
  • All further details will be addressed in the first 'meeting.'

Time plan

  • January 28: Pre-course meeting, 17:30-18:30 in Hörsaal 2. Slides for the meeting are here.
  • February 22-29: Provide preference for papers (forms will be sent; select 3+ papers)
  • March 6: Assignment of papers
  • April 30: First meeting (assignments, reports and organisation)
  • May 3: Deadline for de-registration
  • June 1: Submit report and first version of slides (both as PDF)
  • June 24-26: Final presentation (block seminar, date to be finalised)
  • Office hours: 1 hour every week (date to be fixed)

Focus of seminar

Neural networks, particularly deep networks, have achieved unprecedented popularity over the past decade. While the empirical success of neural networks has reached new heights, one of the major achievements in recent years has been the emergence of new theoretical studies on the statistical performance of neural networks.
This seminar will look at the following important topics on neural networks from a mathematical perspective:

  • Generalization error for neural networks and related concepts from learning theory
  • Optimization and convergence rates for neural networks
  • Sample complexity and hardness results
  • Connection of deep learning to other learning approaches (kernel methods, etc.)
  • Robustness of neural networks

Several recent papers from top machine learning conferences will be discussed during the seminar.

Prerequisites


  • Prior knowledge of Machine Learning (IN2064 or equivalent) is mandatory.
  • Experience with Statistical Foundations of Learning (IN2378) and Introduction to Deep Learning (IN2346) is preferred.

Paper List (tentative)

Generalisation for neural networks

VC-dimension: How complex are the trained models?

Sample complexity: How much training data to learn an accurate model?

Optimisation in neural networks

Convergence in deep linear networks

Converged solution is better with non-linearity

Optimisation in Over-parameterised NNs

Generalization bound independent of network size

Gradient descent does regularisation

Analysis of Stochastic Gradient Descent

With vanilla SGD, RNNs can learn some concept classes

Exponential convergence rates for SGD

Adversarial ML / Robustness

Convergence of robust loss minimisation

Broader theory of risk bounds in presence of adversaries

Other topics

Kernel behaviour of NNs

NNs and random features