Probability and Machine Learning

Probability theory is one of the more recently established fields of mathematics, and it remains a very active area of research. It has found numerous applications in mathematical modeling for computer science, finance, statistical mechanics, dynamical systems, and bioinformatics. Topics in probability theory include martingale theory, stochastic processes, stochastic calculus, ergodicity, and others. Probability theory has drawn on many other areas of mathematics, such as measure theory, integration theory, representation theory, and real and complex analysis, and in return has provided interesting insights into algebraic structures, the geometry of graphs, differential geometry, ergodic theory, partial differential equation theory, and sampling algorithms.

Probability courses

ORF/MAT 309: Probability and Stochastic Systems
The course surveys many topics in probability theory without going into much depth on any particular one. It begins with the basic properties of mathematical expectation and variance, with an emphasis on standard discrete and continuous distributions. In the second half of the semester the course discusses Poisson processes, Gaussian processes, Brownian motion, and Markov chains.
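To give a flavor of the first half of the course (this is our own illustration, not an excerpt from the course material): for a Poisson random variable $X \sim \mathrm{Poisson}(\lambda)$, with $P(X = k) = e^{-\lambda}\lambda^k/k!$ for $k = 0, 1, 2, \dots$, one computes directly from the distribution that
$$E[X] = \sum_{k \geq 0} k \, e^{-\lambda}\frac{\lambda^k}{k!} = \lambda, \qquad \mathrm{Var}(X) = E[X^2] - (E[X])^2 = \lambda,$$
so the mean and variance of a Poisson distribution coincide.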
The course provides a firm understanding of the basics of probability theory and is intended primarily for students who have not had much experience with the subject or with rigorous mathematical proofs. It is the basic mathematical course taken by engineers, specifically ORFE majors, who intend to work with stochastic modeling and simulation. The course is extremely helpful for anyone planning to take courses in mathematical finance, artificial intelligence, stochastic calculus, asset/commodity pricing, etc. ORF/MAT 309 compensates for its lack of rigor with a great deal of intuition, so that results do not simply appear out of thin air but are logically motivated and well supported. Many students find the course especially challenging because of the sheer amount of material covered. ORF/MAT 309 focuses on developing ideas and methods for solving problems rather than on carefully building theory, and as such is an exceptionally fun course to take. The course is extremely well organized, following for the most part Professor van Handel’s notes. Problem sets focus on improving understanding of the material. As one of the larger courses in the ORFE department, it has many TAs who regularly host problem sessions.

MAT 385: Probability Theory
The course covers several topics in probability theory in somewhat greater depth. Students are first introduced to the basic properties of mathematical expectation and variance, distribution functions, and characteristic functions. The course then moves on to random walks on the integers, the reflection principle, laws of large numbers, central limit theorems, and the theory of large deviations.
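As a rough guide to the convergence results covered (stated here informally for orientation, not quoted from the course): for i.i.d. random variables $X_1, X_2, \dots$ with mean $\mu$ and finite variance $\sigma^2$,
$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mu \quad \text{(strong law of large numbers)}, \qquad \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{(central limit theorem)},$$
and large deviation theory quantifies how quickly the probability of the sample mean straying far from $\mu$ decays with $n$.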
The course provides a good introduction to various topics in probability theory, with a special treatment of convergence results. It is intended for math majors who are interested in learning more about probability theory at a deeper level. It is part of a two-course sequence, the second course focusing on stochastic processes. MAT 385 encourages students to think about probability theory in a more precise manner, giving a fairly careful treatment of the material.
Homework sets aim to improve the student’s understanding of the theory and rarely go beyond the application of a couple of results. There are weekly problem sets that account for fifty percent of the grade, and there are problem sessions a couple of days before each problem set is due, where students go over the basic ideas behind each question.
What distinguishes this course from most others in the mathematics department is the format of the final. Several problems are assigned during the last lecture, and students have two weeks to solve them and write up their solutions. Students then present their solutions to the examiners, who ask further questions about the problems as well as about any material covered in the course. This oral/written examination is certainly an interesting trait of the course and makes it a bit challenging near the end. In addition, Professor Sinai assigns a couple of problems which, if solved, exempt the student from taking the final exam.
The course is quite unorthodox and interesting. During the semester it is not exceptionally challenging as problem sets are manageable and one can usually ask Professor Sinai or go to the problem sessions if stuck on a problem.

MAT 486: Random Processes
This course is the second part of the math department’s two-semester probability sequence. It is similar in organization and presentation to the first part, MAT 385: Probability Theory, as it was developed and is usually taught by Professor Yakov Sinai. MAT 486 covers a larger number of topics than 385, most of which are deeper and harder than those in 385. You should consider taking it if you liked 385 and are interested in probability or a closely related field.
The course centers on the notion of a random process, a collection of random variables indexed by some set, which includes, for instance, random sequences and random functions. The course covers stationarity, for which representation theorems (the spectral representation) and ergodic theorems (Birkhoff, von Neumann) are proved. Random fields and an example of a phase transition are described. Extending the notions from 385, continuous-time Markov chains are introduced. The Wiener measure and Brownian motion are constructed as the weak limit of random walks, and these are applied to the solution of the Dirichlet problem, which you may have encountered in Fourier analysis. Finally, stochastic integrals are constructed, and some famous formulas (Itô, Feynman-Kac) are proved. The course ends with a discussion of stochastic differential equations and diffusion processes.
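For a taste of the stochastic-calculus material (a standard statement included for orientation, not taken from the course notes): if $W_t$ is Brownian motion and $f$ is twice continuously differentiable, Itô’s formula reads
$$f(W_t) = f(W_0) + \int_0^t f'(W_s)\, dW_s + \frac{1}{2}\int_0^t f''(W_s)\, ds;$$
the extra second-derivative term is what distinguishes stochastic calculus from the ordinary chain rule.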
Generally speaking, the course covers much more material, and in greater depth, than 385. Many notions and theorems from analysis come up and are usually taken for granted: results from functional analysis, Hilbert spaces, and Fourier analysis are used quite liberally. Probability has very strong ties to analysis, and it shows. It is usually possible to understand the probability results without extensive knowledge of those fields, but for a full appreciation it helps to know them.
The organization of the class parallels that of 385, and the same pieces of advice apply. Go to the ‘office hour’, where the homework is discussed. If you have time, try the ‘hard problems’ that Professor Sinai assigns for an automatic A. There are usually a few students each year who solve them. But be prepared that the first hard part is understanding what the problems really ask, since they can be less than clear. The exam is oral, as in 385: you get 5 problems and a month to think about them. You are also allowed to use your notes and book. If you have solved most of the HW problems by yourself, you should be able to do well. Also, be prepared to be asked a large number of additional questions at the actual examination. These are usually routine computations, definitions, or proofs of theorems. However, in some cases it might be a little hard to understand what the questions are.
To sum up, MAT 486 is a more challenging continuation of 385. If you liked the latter, and are seriously interested in probability or a field where it’s applied, you should consider taking 486.

ORF 526: Probability Theory
The course covers most of the basics of probability theory from a measure-theoretic point of view. It begins with the development of measure theory and Lebesgue integration. Most of the basic results from real analysis are covered, including the construction of the integral from scratch, Fatou’s lemma, the monotone and dominated convergence theorems, etc. It then goes on to general results in measure theory such as the Carathéodory extension theorem, the Radon-Nikodym theorem, Fubini’s theorem, etc. It especially focuses on the different modes of convergence and the connections between them. The course then turns to Gaussian processes and discrete-time martingale theory, including Doob’s inequalities and convergence results. Finally, the course quickly goes over the main facts about Markov chains, including the ergodic theorem and stationarity.
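As one example of the measure-theoretic results proved early in the course (a standard statement, included here for orientation): the dominated convergence theorem asserts that if $f_n \to f$ pointwise and $|f_n| \leq g$ for some integrable function $g$, then
$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu,$$
a tool used constantly throughout the rest of the course.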
The course is intended for first year graduate students in the ORFE department although it is an excellent course for people interested in probability theory in general. The treatment of the material is extremely careful and well developed. For the most part the course follows the graduate text by Shiryaev, which is an excellent but challenging read. The theory on Markov chains follows mostly the introductory book by Norris.
There are weekly problem sets that account for about half of the grade in the course. Most of them aim to improve the student’s understanding of the theory and can be quite challenging. Lectures are usually spent proving results, with very little discussion of applications. As such the course is extremely theoretical in nature, and its aim is to provide students with a firm basis for upper-level probability courses such as ORF 527 (Stochastic Calculus) and ORF 551/4 (Lévy processes / Markov processes).
The course is exceptionally challenging for people who have not taken real analysis before, especially in the first few weeks. A good understanding of the material and constructions is necessary for completing problem sets as well as the final exam.
This course is of interest to many mathematics majors because it can be used to fulfill the Real Analysis requirement in lieu of one of the other courses listed on the Analysis page.

Machine Learning

Machine learning is a sub-field of artificial intelligence that lies at the intersection of computer science, statistics, and probability theory. In broad terms, it concerns the design, implementation, analysis, and application of algorithms that can “learn from experience.” In the last few decades, it has advanced extremely rapidly and found applications in a wide range of problems, including the analysis of language (“natural language processing”) and linguistics, modeling of genetic variation, quantitative modeling in the social sciences, spam filtering, automated detection of identity theft, and many more.

While ML is a complex and wide field, it can be roughly split into two parts: (1) prediction (“supervised” ML) and (2) modeling (“unsupervised” ML). Prediction problems require that one predict the value of some variable in the future given many observations of it in the past; forecasting the value of a stock given its value (and market conditions) over the course of a year is an example. One way to think about modeling problems is as prediction problems in which the variable to be predicted has not been observed at all; forecasting the value of a stock just from market conditions, without ever seeing its previous values, would be an example of a modeling problem. In general, unsupervised problems concern the discovery and exploitation of hidden patterns in data: the canonical example is the problem of grouping together similar images in a large database (“clustering”). A small illustrative sketch of both settings appears below.
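The following is a minimal, self-contained sketch (entirely our own, with synthetic data and invented variable names, not tied to any course) contrasting the two settings: a supervised least-squares fit that learns to predict y from x, and an unsupervised toy k-means that groups unlabeled points into clusters.

import numpy as np

rng = np.random.default_rng(0)

# Supervised ("prediction"): learn to predict y from x using past (x, y) pairs.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)         # hidden "true" relationship plus noise
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit
print("learned model: y = %.2f * x + %.2f" % (slope, intercept))
print("prediction at x = 5:", slope * 5 + intercept)

# Unsupervised ("modeling"): group unlabeled points into two clusters (toy k-means).
points = np.vstack([rng.normal(0, 0.5, (50, 2)),       # points scattered near (0, 0)
                    rng.normal(5, 0.5, (50, 2))])      # points scattered near (5, 5)
centers = points[rng.choice(len(points), 2, replace=False)]
for _ in range(10):                                    # alternate assignment and update steps
    labels = np.argmin(((points[:, None] - centers) ** 2).sum(axis=2), axis=1)
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])
print("estimated cluster centers:")
print(centers)

The contrast to notice is that the supervised half is judged by how well it predicts the target variable, while the unsupervised half never sees labels at all and is judged by whether the structure it recovers (here, the two cluster centers) is meaningful.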

From a math major’s perspective, either or both of the theoretical and applications-oriented sides of ML might be very interesting. Theoretical problems can involve the analysis of specific prediction algorithms, either from a computer science or statistical perspective; the analysis and design of methods of “inference,” the key algorithmic step in modeling that estimates important unobserved quantities from observed data; or connection of machine learning problems with ideas from information theory, probability theory, or other theoretical—and sometimes application—domains. Applications-oriented problems, a catch-all category that includes any problem that ultimately concerns the analysis of real data, show a similar degree of variety. They can involve the deployment of existing methods to interesting data analysis problems; the design of novel methodologies that allow for new kinds of analyses; the use of unsupervised methods to solve traditionally supervised problems; and a range of other research activities. It should be noted that ML research enjoys a rich and fruitful interplay between theory and practice, so that theoretical projects can often include an empirical component and applied projects often require understanding relevant theory—and sometimes extending it!

At Princeton, ML research occurs principally in the Departments of Computer Science (COS) and of Operations Research and Financial Engineering (ORFE), though Professor E (MAT) is also heavily involved in ML research. The main faculty members associated with theoretical ML are Professors Sanjeev Arora, Elad Hazan, and Yoram Singer, whose research group maintains its own webpage. More broadly than theory, there is the AI/ML group, which also has a website with information about its faculty and research.

Below are a handful of ML courses that might be of particular interest to mathematics students.

MAT 490: Mathematical Introduction to Machine Learning

COS 511: Theoretical Machine Learning

Contacts

Evgeni Dimitrov ’13 (edimitro) for probability theory
Edgar Dobriban ’12 (dobribanedgar) for probability theory
Max Rabinovich ’13 (mrabinov) for machine learning
