This is an introductory course in the theory of statistics, inference, and machine learning, with an emphasis on theoretical understanding and practical exercises. The course will alternate between mathematical foundations and practical computational aspects in Python.
Professor: Florent Krzakala
Teaching Assistants: Davide Ghio, Guillermo Ortiz Jimenez, Nikolaos Dimitriadis, Luca Pesce
The topics will be chosen from the following basic outline:
- Statistical inference: Estimators, Bias-Variance, Consistency, Efficiency, Maximum likelihood, Fisher Information.
- Bayesian inference: Priors, A posteriori estimation, Expectation-Maximization.
- Supervised learning: Linear Regression, Ridge, Lasso, Sparse problems, High-dimensional data, Kernel methods, Boosting, Bagging, K-NN, Support Vector Machines, Logistic regression, Optimal Margin Classifier
- Statistical learning theory: VC Bounds and Uniform convergence, Implicit regularisation, Double-descent
- Unsupervised learning: Mixture Models, PCA & Kernel PCA, k-means
- Deep learning: multi-layer nets, convnets, auto-encoder, Gradient-descent algorithms
- Basics of Generative models & Reinforcement learning
Discussions: You can discuss and ask questions about the course on Slack, which is a great platform for this. Here is the invitation to join the forum on Slack, which is valid until the end of October.
Short video on introduction and course information
This first class is a recap of probability theory that will serve us well in this course. A good reference, and absolutely recommended reading, for this lecture is Chap. 1-5 in All of Statistics by Wasserman.
This second class focuses on the theory of maximum likelihood estimation. There are many good references on the topic, including for instance Chap. 9 in All of Statistics or, for the Bayesian point of view, MacKay Chap. 2 and 3.
A good read on supervised statistical learning is Chapter 2 in An Introduction to Statistical Learning by James, Witten, Hastie and Tibshirani. They also discuss K-nearest neighbors in detail.
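As a small illustration of maximum likelihood estimation, here is a minimal sketch for i.i.d. Gaussian data, where the estimates have a closed form (the sample mean, and the average squared deviation from it). The function names, the simulated data, and the chosen true parameters (mean 2.0, standard deviation 1.5) are assumptions made for this example and are not taken from the course material.

```python
import math
import random


def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates for i.i.d. Gaussian samples.

    Returns (mu_hat, sigma2_hat). Note that sigma2_hat divides by n,
    not n - 1, so it is the (biased) maximum likelihood estimate.
    """
    n = len(samples)
    mu_hat = sum(samples) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, sigma2_hat


def gaussian_log_likelihood(samples, mu, sigma2):
    """Log-likelihood of the samples under a Gaussian N(mu, sigma2)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)


# Simulated data: the true parameters are an assumption of this sketch.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(5000)]

mu_hat, sigma2_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma2_hat)
shifted = gaussian_log_likelihood(data, mu_hat + 0.1, sigma2_hat)
print(f"mu_hat={mu_hat:.3f}, sigma2_hat={sigma2_hat:.3f}")
print(f"log-likelihood at the MLE: {best:.1f}, at a shifted mean: {shifted:.1f}")
```

Moving the mean away from the estimate lowers the log-likelihood, which is what it means for the estimate to be a maximizer.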
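To make the K-nearest neighbors idea concrete, here is a minimal sketch of a classifier that predicts by majority vote among the k closest training points in Euclidean distance. The function name `knn_predict`, the toy points, and the labels are hypothetical and chosen only for illustration. A practical implementation would also need to decide how to break ties and how to scale features.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, then keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data set with two well-separated clusters (an assumption of this sketch).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

The choice of k controls the bias-variance trade-off: a small k follows the training data closely, while a large k averages over more neighbors.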
Week 4 (13/10): Gradient descent EXO4
- HW1 (due October 19, on Moodle): Maximum likelihood and probability
A list of references
- A good book for probability and statistics, accessible to students, is Larry A. Wasserman’s All of Statistics.
- An accessible introduction to statistical learning is given in The Elements of Statistical Learning by Jerome H. Friedman, Robert Tibshirani, and Trevor Hastie.
- Another great reference is Machine Learning: A Probabilistic Perspective by Kevin P. Murphy. MacKay’s Information Theory, Inference and Learning Algorithms is also a very useful resource.
- Modern Deep learning is well covered in this recent book: Dive into Deep Learning by A. Zhang, Z. Lipton, M. Li, A.J. Smola.
- A recent and excellent reference book in French: Introduction au Machine Learning by Chloé-Agathe Azencott.
- Homeworks: There will be three homework assignments, each worth 15% of the final grade.
- Scribing: Each student is expected to provide scribing for at least one lecture. Scribing will also count for 15% of the final grade. Students must write their name on the list of scribes and claim lectures on the scribing list together with ready-to-use Overleaf links (see LaTeX section below).
- Projects: The project will account for 40% of the final grade. You may work in teams of 2-4 people. There will be a limited number of projects to choose from, and you will not be able to choose other projects. Each team member’s contribution should be highlighted. You should use the project as an opportunity to “learn by doing”.
- Exam: There will be no written exam.
- Videos: Videos of the lectures will be posted on the SwitchTube channel of the course.
- Academic Integrity: Collaboration among students is allowed, and encouraged, but is intended to help you learn. In other words, you may work on solving assignments together, but you should always write up your solutions separately. You should always implement code alone as well. Whenever collaboration happens, it should be reported by all parties involved in the relevant homework problem.
- How can I use python on my computer?
Two good options to run Python online are EPFL Noto and Google Colab. Noto is EPFL’s centralized JupyterLab platform. It allows teachers and students to use notebooks without having to install Python on their computer. Google Colab provides a similar solution, with the added advantage that it gives access to GPUs. For instance, you can open the Jupyter notebook corresponding to the first exercise by a) opening Google Colab in your browser, b) selecting GitHub, and c) writing the path
- I do not know any python! What should I do?
- What is overleaf?
If you cannot compile LaTeX on your own computer (and even if you can, this is often a good strategy anyway), EPFL provides Overleaf Professional accounts for all students: Overleaf EPFL. With Overleaf you can write and compile LaTeX directly from your web browser.