Data Analysis and Visualization in R

Module IN2339

Credit: 6 ECTS.

Moodle: https://www.moodle.tum.de/course/view.php?id=68693

Lecture Script: https://gagneurlab.github.io/dataviz/

Contact:  teaching-gagneurlab@in.tum.de

When and where?

This lecture is given in the winter term.

On Tuesday, Oct 18th, 14:00-16:00 (Interims II, Hörsaal 1, 5416 Interimshörsäle II, Lichtenbergstr. 2b - Chemie), we will hold a first in-person session covering the course organization and introduction. This session will not be recorded.

Recorded lectures can be viewed on Moodle at any time during the semester. Apart from the first session, there will be no in-person lectures this year.

Exercises:

The exercises consist of two sessions per week.

- Tutorials are held in person only: 1.5-hour sessions in which students solve exercises with interactive support from tutors. (Multiple sessions are offered at different times throughout the week; each student attends one.)

- The central exercise session is held in person: in this plenary session on Tuesday, 14:00-16:00 (Interims II, Hörsaal 1, 5416 Interimshörsäle II, Lichtenbergstr. 2b - Chemie), solutions to the homework are presented and discussed.

Please make sure that you register twice in TUMonline: once for the lecture and once for the separate exercise.

Teaching material will be shared via Moodle. To access Moodle, you must be registered for the course in TUMonline.

Description

This module teaches methodologies and good practices of data science using R. The lecture is structured into three main parts, covering the major steps of data analysis:

1. Get the data: how to fetch and manipulate real-world datasets, and how to structure them ("tidy data") so that they are most convenient to work with.

2. Look at the data: basic and advanced visualization techniques (grammar of graphics, unsupervised learning) will allow students to navigate large and complex datasets, identify interesting signals in them, and formulate hypotheses.

3. Conclude: concepts of statistical testing will allow students to draw conclusions about the hypotheses raised. Methods from supervised learning will also allow them to model data and build accurate predictors.

Each week, the lecture is accompanied by exercises. In the exercises, the concepts seen in the lecture are combined to perform more involved data analysis tasks. Students generate reports that embed code and analysis. Many examples stem from applications in genomics, but no prior knowledge of this domain is necessary.

Required background and Computer setup

Programming experience in any language is expected. The theoretical aspects of data analysis are kept to a minimum in this module. However, basic knowledge of probability is required.

Chapters 13-15 ("Introduction to Statistics with R", "Probability" and "Random variables") of the book "Introduction to Data Science" https://rafalab.github.io/dsbook/ make a good refresher. Make sure all concepts are familiar to you. Check your knowledge by trying the exercises.

Bring a laptop with RStudio installed. RStudio is a free programming interface for the R language.

Who can attend

The module is an elective module for many study programs. Among others, it is in the catalogue of:

  • BSc and MSc Bioinformatics
  • MSc Informatics
  • MSc Information Systems
  • MSc Data Engineering and Analytics
  • Medicine students
  • BSc and MSc 'Management and technology'

Recommended reading

Lecture Script: https://gagneurlab.github.io/dataviz/

R for Data Science, by Garrett Grolemund and Hadley Wickham

Introduction to Data Science, by Rafael A. Irizarry. 

 

Topics

  • R programming basics, report generation with R markdown
  • Importing, cleaning and organizing data (tidy data)
  • Plotting and Grammar of graphics
  • Unsupervised learning (hierarchical clustering, k-means, PCA)
  • Drawing robust interpretations (empirical testing by sampling, classical statistical tests)
  • Supervised learning (regression, classification, cross-validation)

 

Evaluation

The final exam is a 2-hour written exam. The grade for the module is the grade of the final exam. The exact exam date has not yet been set; we will announce it as soon as we know.

We plan to hold the exam in person on campus. This is subject to change depending on COVID-19 regulations.

 

Teaching team

This lecture is given by a team of scientists with extensive experience in high-dimensional data analysis in the field of genomics: Prof. Julien Gagneur and members of his lab.

If you have any questions, please contact us by email at teaching-gagneurlab@in.tum.de

If you cannot register for the exam, please contact the secretary of your study program. We cannot register students.