This is an introductory course in the theory of statistics, inference, and machine learning, with an emphasis on theoretical understanding and practical exercises. The course combines, and alternates between, mathematical theoretical foundations and practical computational aspects in Python.
Professor: Florent Krzakala
Teaching Assistants: Davide Ghio, Ortiz Jimenez Guillermo, Dimitriadis Nikolaos, Luca Pesce
Content
The topics will be chosen from the following basic outline:
- Statistical inference: Estimators, Bias-Variance, Consistency, Efficiency, Maximum likelihood, Fisher Information.
- Bayesian inference, Priors, A posteriori estimation, Expectation-Maximization.
- Supervised learning: Linear Regression, Ridge, Lasso, Sparse problems, High-dimensional data, Kernel methods, Boosting, Bagging, K-NN, Support Vector Machines, Logistic regression, Optimal Margin Classifier
- Statistical learning theory: VC Bounds and Uniform convergence, Implicit regularisation, Double-descent
- Unsupervised learning: Mixture Models, PCA & Kernel PCA, k-means
- Deep learning: multi-layer nets, convnets, auto-encoders, gradient-descent algorithms
- Basics of Generative models & Reinforcement learning
For students: Moodle Link & videos of the course on SwitchTube
Discussions: You can discuss the course and ask questions on Slack, which is a great platform for this. Here is the invitation to join the forum on Slack, valid until the end of October.
Lecture List:
Short video on introduction and course information
- 21/9: (i) All of probability video part-a,video part-b video part-c; lecture notes
This first class is a recap on probability theory that will serve us well in this class. A good reference, and an absolutely recommended reading, for this lecture is Chap. 1-5 in All of Statistics by Wasserman.
- 28/9: (ii) All of statistics video part-a,video part-b; lecture notes
This second class is focused on the theory of maximum likelihood estimation. There are many good references on the topic, including for instance Chap. 9 in All of Statistics, or, for the Bayesian point of view, MacKay Chaps. 2 and 3.
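As a toy illustration of the maximum likelihood principle, here is a minimal NumPy sketch (with synthetic data, not part of the course material) computing the Gaussian MLE; note the 1/n factor in the variance estimator, which makes it biased.

```python
import numpy as np

# Minimal sketch: MLE for i.i.d. Gaussian samples (synthetic data).
# The MLE of the mean is the sample mean, and the MLE of the variance
# is the sample variance with a 1/n (not 1/(n-1)) normalization.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # true mu = 2, sigma = 1.5

mu_hat = x.mean()                         # maximizes the log-likelihood in mu
sigma2_hat = ((x - mu_hat) ** 2).mean()   # biased 1/n estimator

print(f"MLE of mu: {mu_hat:.3f}, MLE of sigma^2: {sigma2_hat:.3f}")
```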
- 5/10: (iii) Supervised learning and KNN video part-a,video part-b. video part-c; lecture notes
A good read on supervised statistical learning is chapter 2 in An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. They also discuss in detail K-nearest neighbors.
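For intuition, here is a minimal scikit-learn sketch (synthetic data, arbitrary choices of k) showing how the number of neighbors affects test accuracy.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class dataset; the number of neighbors k controls the
# flexibility of the classifier (small k = low bias, high variance).
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k:2d}  test accuracy: {knn.score(X_test, y_test):.2f}")
```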
- 12/10: (iv) Gradient descents video part-a,video part-b; lecture notes
Gradient descent is the workhorse of all modern machine learning methods. There are many resources on gradient descent, from pedagogical to technical ones. Proximal operators are very powerful and are well described in this set of lectures Tibshirani1, Tibshirani2, Tibshirani3.
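Below is a minimal NumPy sketch (synthetic data, arbitrary step size and regularization strength) of plain gradient descent on least squares, and of its proximal variant ISTA, where soft-thresholding is the proximal operator of the l1 penalty.

```python
import numpy as np

# Sketch: gradient descent on 0.5*||Xw - y||^2, and the proximal (ISTA) variant
# for the lasso objective 0.5*||Xw - y||^2 + lam*||w||_1. Synthetic data only.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d); w_true[:5] = 1.0            # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=n)

step = 1.0 / np.linalg.norm(X, 2) ** 2            # 1 / Lipschitz constant of the gradient
lam = 0.1                                         # arbitrary regularization strength

def soft_threshold(v, t):
    # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

w_gd, w_ista = np.zeros(d), np.zeros(d)
for _ in range(500):
    w_gd -= step * X.T @ (X @ w_gd - y)                                            # plain gradient step
    w_ista = soft_threshold(w_ista - step * X.T @ (X @ w_ista - y), step * lam)    # gradient step + prox

print("GD residual:  ", round(float(np.linalg.norm(X @ w_gd - y)), 3))
print("ISTA nonzeros:", int((np.abs(w_ista) > 1e-3).sum()))
```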
- 19/10 and 26/10 (v) Linear methods video part-a,video part-b;video part-c,video part-d; video part-e lecture notes
Linear methods are the simplest among all parametric methods, but are still extremely useful! A good discussion of OLS, Ridge and LASSO can be found in Chap. 6, Section 2 of An Introduction to Statistical Learning. Another good reference is this one. Linear classification methods are also at the center of machine learning techniques, and are discussed in detail in Chapter 4 of An Introduction to Statistical Learning.
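As a small illustration (synthetic sparse data, arbitrary regularization strengths), here is how OLS, Ridge and LASSO compare in scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic data with a sparse signal; alpha is the regularization strength
# (chosen arbitrarily here, in practice tune it by cross-validation).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
w = np.zeros(30); w[:3] = 2.0
y = X @ w + 0.5 * rng.normal(size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("LASSO", Lasso(alpha=0.1))]:
    model.fit(X_train, y_train)
    print(f"{name:5s}  test R^2: {model.score(X_test, y_test):.2f}")
```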
- 2/11 (vi) Features maps and Kernels video part-a, video part-b , video part-c lecture notes
Feature maps richer than linear ones, and kernel methods, are one of the most important aspects of supervised machine learning. Michael Jordan’s notes on kernels are a good reference. The review by Hofmann, Scholkopf and Smola is also very complete. Scikit-learn has a detailed and very efficient implementation.
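A quick sketch of why richer feature maps matter (synthetic concentric-circles data): the same SVM with a linear kernel versus an RBF kernel, using scikit-learn.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles are not linearly separable; the RBF kernel corresponds to
# an implicit, very rich feature map and separates them easily.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f"{kernel:6s} kernel  test accuracy: {clf.score(X_test, y_test):.2f}")
```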
- 9/11, 16/11 and 23/11 (vii) Neural networks & deep learning video part-a;video part-b;video part-c;video part-d; slides;lecture notes
Over the last decades, neural networks have made quite an impact; one might even say that they are at the origin of a revolution in machine learning and artificial intelligence. This simple website lets you build intuition on how they actually work on simple datasets: Tensorflow playground. The universal approximation theorem is discussed in many references (see for instance here). Even though backpropagation is a rather straightforward application of the chain rule of derivatives going back to Newton and Leibniz, it is the cornerstone of training neural networks. A good summary of gradient descent algorithms is here. Convnets have made quite an impact and have revolutionized computer vision; see the nice introduction by Yann LeCun.
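For a concrete, minimal sketch of backpropagation at work (a tiny two-layer network on a synthetic regression task, with arbitrary hyperparameters), here is a PyTorch training loop; autograd applies the chain rule for us.

```python
import torch

# Minimal sketch: a two-layer network trained by gradient descent on a
# synthetic regression task; loss.backward() performs backpropagation.
torch.manual_seed(0)
X = torch.randn(256, 2)
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(256, 1)   # noisy nonlinear target

model = torch.nn.Sequential(
    torch.nn.Linear(2, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()          # backpropagation: gradients via the chain rule
    optimizer.step()         # gradient-descent update
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")
```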
- 23/11 (viii) A short course on Statistical Learning Theory video part-a; video part-b;video part-c, lecture notes
There are many resources on the topic online, and many books, as the subject would deserve a course of its own. Nevertheless, it is good to have a basic understanding of where we stand theoretically and to have a grasp of the notion of VC dimension.
- 30/11 (ix) Unsupervised learning and dimensionality reduction video part-a; video part-b; video part-c; slides, lecture notes
Principal Component Analysis is (still) one of the most fundamental tools of machine learning. This post has great visual examples that you can play with to get an intuition.
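A minimal scikit-learn sketch (using the classic iris dataset as a stand-in) of projecting data onto its first two principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the features, then project onto the top two principal components
# and check how much of the variance they capture.
X = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)

print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
print("projected shape:", X_2d.shape)
```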
- 7/12 and 14/12 (x) Generative model and clustering video part-a, lecture notes
Scikit-learn has a good implementation of k-means. Generative models are a fundamental part of machine learning. The connection between Mixtures of Gaussians and k-means clustering is well explained in David MacKay’s book, page 300. The book is a very useful reference on this topic and on probability in general (for instance, Monte-Carlo methods are discussed page 357). Boltzmann machines are discussed in many places, for instance here and there. Generative Adversarial Networks are very fashionable these days (check out This Person Does Not Exist!). An introduction in PyTorch is available here.
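A short sketch (synthetic blobs, arbitrary number of clusters) contrasting k-means with a Gaussian mixture fitted by EM, which can be seen as its "soft" counterpart, in scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Fit k-means and a Gaussian mixture (EM algorithm) on the same synthetic blobs;
# the recovered centers/means should be close to each other.
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.2, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print("k-means centers:\n", kmeans.cluster_centers_.round(2))
print("GMM means:\n", gmm.means_.round(2))
```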
- 21/12 (xi) Everything else Slides; video part-a, video part-b, lecture notes
Ensembling methods are really an efficient approach to machine learning. Here is a good reference for AdaBoost. The book Introduction au Machine Learning has a great chapter on bagging and boosting.
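A minimal scikit-learn sketch (synthetic data, arbitrary choice of 100 estimators) comparing bagging and AdaBoost built on decision stumps:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Ensemble many weak learners (depth-1 decision stumps) by bagging and boosting.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)

for name, model in [("bagging", BaggingClassifier(stump, n_estimators=100, random_state=0)),
                    ("AdaBoost", AdaBoostClassifier(stump, n_estimators=100, random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:8s}  mean CV accuracy: {scores.mean():.2f}")
```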
RNNs are still very useful (even though these days transformers are taking the lead!). We used extensively the following introduction. A simple RNN implementation for learning to add numbers in Keras is given here.
Reinforcement learning is certainly one of the most interesting directions these days. You can find a simple implementation of Q-learning here for Frozen Lake, and of policy gradient for CartPole. The Nature paper on AlphaGo is a fascinating read on the new era of reinforcement learning.
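As a self-contained toy example (a hypothetical 5-state chain MDP, not the Frozen Lake implementation linked above), here is tabular Q-learning in a few lines of NumPy:

```python
import numpy as np

# Toy chain MDP: states 0..4, action 1 moves right, action 0 moves left,
# and only reaching the rightmost state gives a reward of 1.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1      # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    for _ in range(20):
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

print("greedy policy (0=left, 1=right):", Q.argmax(axis=1))
```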
Lab classes:
- Before the classes: A short intro to Python and to visualization and making plots with Matplotlib.
Homeworks:
Projects (due before February 5!): List of projects
A list of references
- A good book for probability and statistics, accessible to students, is Larry A. Wasserman’s All of Statistics.
- An accessible introduction to statistical learning is given in Elements of Statistical Learning by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie.
- Another great reference is Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MacKay’s Information Theory, Inference and Learning Algorithms is also a very useful resource.
- Modern Deep learning is well covered in this recent book: Dive into Deep Learning by A. Zhang, Z. Lipton, M. Li, A.J. Smola.
- A recent, and excellent, reference book in French: Introduction au Machine Learning by Chloé-Agathe Azencott.
Course Policies
- Homeworks: There will be three homework assignments, each worth 15% of the final grade.
- Scribing: Each student is expected to provide scribing for at least one lecture. Scribing will also count for 15% of the final grade. Students must write their names on the list of scribes and claim lectures on the scribing list, together with ready-to-use Overleaf links (see the LaTeX section below).
- Projects: Projects will account for 40% of the final grade. You may work in teams of 2-4 people. There will be a limited number of projects to choose from, and you will not be able to choose other projects. Each team member’s contribution should be highlighted. You should use the project as an opportunity to “learn by doing”.
- Exam: There will be no written exam.
- Videos: Videos of the lectures will be posted on the SwitchTube channel of the course.
- Academic Integrity: Collaboration among students is allowed, and encouraged, but is intended to help you learn. In other words, you may work on solving assignments together, but you should always write up your solutions separately. You should always implement code alone as well. Whenever collaboration happens, it should be reported by all parties involved in the relevant homework problem.
FAQ
- How can I use python on my computer?
Two good options to run Python online are EPFL Noto & Google Colab. Noto is EPFL’s centralized JupyterLab platform. It allows teachers and students to use notebooks without having to install Python on their computer. Google Colab provides a similar solution, with the added advantage that it gives access to GPUs. For instance, you can open the Jupyter notebook corresponding to the first exercise by a) opening Google Colab in your browser, b) selecting GitHub, and c) entering the path https://github.com/IdePHICS/FundamentalLearningEPFL2021/blob/main/TP1/FoIL_ex1_public.ipynb
- I do not know any python! What should I do?
TP0 provides a short introduction. If you need more and really want to study Python, here is a good Python and NumPy Tutorial.
- What is overleaf?
If you cannot compile LaTeX on your own computer (and even if you can, this is often a good strategy anyway), EPFL is providing Overleaf Professional accounts for all students: Overleaf EPFL . With Overleaf you can write and compile LaTeX directly from your web browser.