Lecture: Mining Massive Datasets

This course builds upon the knowledge you gained in the lecture Machine Learning (IN2064). It covers advanced learning principles and more complex data domains. Put simply, this course is "Machine Learning 2".

Information: The number of course participants is limited this year, both to ensure high-quality grading of the project tasks and because of limited staff capacity. Participants will be selected after the registration period closes. That is, we will not follow a "first come, first served" principle.


In this course, you will learn advanced machine learning and data mining techniques for processing complex and large-scale data. We will focus specifically on learning techniques for (i) graphs/network data and (ii) temporal data/sequences. Since the data in many of today's applications is also very large, we discuss how scalable mining and learning can be achieved. The practical relevance of these methods will be highlighted by several important applications, such as time series segmentation, ranking, and community detection.

The preliminary syllabus of the course is as follows:

  • Introduction
    • Machine Learning, Data Mining Process
    • Basic Terminology
  • Scalability
    • Similarity Estimation
    • Filter-Refine Paradigm
    • Hashing & Sketches
      • Min-Hashing
      • Locality Sensitive Hashing
    • Membership Test / Bloom Filter
    • Large-Scale Optimization
  • Temporal Data & Sequences
    • Autoregressive Models
    • HMMs
    • Embeddings (e.g., Word2Vec)
    • Neural Networks (e.g., RNN, LSTM)
  • Graphs & Networks
    • Laws, Patterns
    • (Deep) Generative Models
      • VAE, Implicit Models
      • Generative Models for Graphs
    • Spectral Methods
      • Ranking (e.g., PageRank, HITS)
      • Community Detection
    • Representation Learning for Graphs
      • Graph Neural Networks
      • (Unsupervised) Node Embeddings


  • Lecture/Exercise: Wednesdays, 2:15pm, Interims Hörsaal 1
  • Lecture/Exercise: Thursdays, 2:15pm, Interims Hörsaal 1
  • All course material will be made available via Piazza
  • Required knowledge: Content of our Machine Learning lecture