Probability theory is one of the more recently established fields of mathematics and currently a very active area of research. It has found numerous applications in mathematical modeling for computer science, finance, statistical mechanics, dynamical systems, bioinformatics, and other fields. Topics in probability theory include martingale theory, stochastic processes, stochastic calculus, ergodicity, and others. Probability theory has drawn on many other areas of mathematics, such as measure theory, integration theory, representation theory, and real and complex analysis, and in return has provided interesting insights into algebraic structures, the geometry of graphs, differential geometry, ergodic theory, PDE theory, and sampling algorithms.
Probability Theory Courses
The following are MAT and ORF courses covering aspects of probability theory.
ORF/MAT 309: Probability and Stochastic Systems
The course surveys many topics in probability theory without going into great depth on any particular one. It begins with the basic properties of mathematical expectation and variance, with an emphasis on the standard discrete and continuous distributions. The second half of the semester covers Poisson processes, Gaussian processes, Brownian motion, and Markov chains.
The course provides a firm understanding of the basics of probability theory and is intended primarily for students who have not had much experience with the subject or rigorous mathematical proofs. It is the basic mathematical course taken by engineers, specifically ORFE majors, who intend to work with stochastic modeling and simulations. The course is extremely helpful if one is to take courses in mathematical finance, artificial intelligence, stochastic calculus, asset/commodity pricing etc.
ORF/MAT 309 compensates for its lack of rigor with a great deal of intuition, so that results do not simply appear out of thin air but are logically motivated and well supported. Many students find the course especially challenging because of the sheer amount of material covered. ORF/MAT 309 focuses on developing ideas and methods for solving problems, as opposed to carefully building theory, and as such is an exceptionally fun course to take.
About eighty percent of the course grade is based on two midterms and a final, all taken in class. In addition, there are ten weekly problem sets and a large “Big Homework”, completed over winter break, which is about three times as long as the regular problem sets.
The course is extremely well organized, following for the most part Professor Cinlar’s notes. Problem sets focus on improving understanding of the material. As one of the larger courses in the ORFE department, it has many TAs, who regularly hold problem sessions and precepts that one can take advantage of.
MAT 385: Probability Theory
The course covers several topics in probability theory in reasonable depth. Students are first introduced to the basic properties of mathematical expectation and variance, distribution functions, and characteristic functions. The course then moves on to random walks on the integers, the reflection principle, laws of large numbers, central limit theorems, and the theory of large deviations.
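The laws of large numbers mentioned above are easy to see empirically. The following sketch (the coin-flip model and sample sizes are arbitrary illustrative choices, not course material) estimates the mean of fair coin flips and watches the deviation from the true mean shrink as the sample grows:

```python
import random
import statistics

def empirical_mean(n, seed=0):
    # Average of n fair coin flips, each 0 or 1.
    rng = random.Random(seed)
    return statistics.fmean(rng.randint(0, 1) for _ in range(n))

# By the law of large numbers, the deviation from the
# true mean 0.5 shrinks as the sample size n grows.
for n in (100, 10_000, 1_000_000):
    print(n, abs(empirical_mean(n) - 0.5))
```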
The course provides a good introduction to various topics in probability theory, with a special treatment of convergence results. It is intended for math majors who are interested in learning more about probability theory at a deeper level. It is part of a two-course sequence, the second course focusing on stochastic processes. MAT 385 encourages students to think about probability theory in a more precise manner, giving a fairly careful treatment of the material.
Homework sets aim to improve the student’s understanding of the theory and rarely go beyond the application of a couple of results. Weekly problem sets account for fifty percent of the grade, and there are problem sessions a couple of days before each problem set is due, where students go over the basic ideas behind each question.
What distinguishes this course from most others in the mathematics department is the format of the final. Several problems are assigned during the last lecture, and students have two weeks to solve them and write up their solutions. They present their solutions to the examiners, who then ask further questions about the problems as well as any material covered in the course. This oral/written examination is certainly an interesting trait of the course and makes it a bit challenging near the end. In addition, Professor Sinai assigns a couple of problems which, if solved, exempt the student from taking the final exam.
The course is quite unorthodox and interesting. During the semester it is not exceptionally challenging as problem sets are manageable and one can usually ask Professor Sinai or go to the problem sessions if stuck on a problem.
MAT 486: Random Processes
This course is the second part of the math department’s two-semester probability sequence. It is similar in organization and presentation to the first part, MAT 385: Probability Theory, as it was developed and is usually taught by Professor Yakov Sinai. MAT 486 covers a larger number of topics than 385, most of which are deeper and harder than those in 385. You should consider taking it if you liked 385 and are interested in probability or a closely related field.
The contents of the course are centered around the notion of a random process: a collection of random variables indexed by some set, which thus includes, for instance, random sequences, random functions, etc. The course covers stationarity, for which representation theorems (the spectral representation) and ergodic theorems (Birkhoff, von Neumann) are proved. Random fields and an example of a phase transition are described. Extending the notions from 385, continuous-time Markov chains are introduced. The Wiener measure and Brownian motion are constructed as the weak limit of random walks. These are applied to the solution of the Dirichlet problem, which you might have encountered in Fourier analysis. Finally, stochastic integrals are constructed, and some famous formulas (Itô, Feynman-Kac) are proved. The course ends with a discussion of stochastic differential equations and diffusion processes.
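The construction of Brownian motion as a weak limit of rescaled random walks can be glimpsed numerically. In this sketch (the step count and number of samples are arbitrary illustrative choices), the empirical variance of the rescaled walk at time 1 should be close to Var B(1) = 1:

```python
import random

def scaled_walk_at_one(n, seed=0):
    # Endpoint of a simple +/-1 random walk with n steps,
    # rescaled by 1/sqrt(n); its distribution approaches N(0, 1).
    rng = random.Random(seed)
    s = sum(rng.choice((-1, 1)) for _ in range(n))
    return s / n ** 0.5

# Empirical variance over many independent walks.
samples = [scaled_walk_at_one(1000, seed=k) for k in range(2000)]
variance = sum(x * x for x in samples) / len(samples)
print(variance)
```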
Generally speaking, the course covers much more material, and in greater depth, than 385. Many notions and theorems from analysis come up and are usually taken for granted: for instance, results from functional analysis, Hilbert spaces, and Fourier analysis are used quite liberally. Probability has very strong ties to analysis, and it shows. It is usually possible to understand the probability results without extensive knowledge of those fields, but for a good appreciation it helps to know them.
The organization of the class parallels that of 385, and the same advice applies. Go to the office hour, where the homework is discussed. If you have time, try the “hard problems” that Professor Sinai assigns for an automatic A; there are usually a few students each year who solve them. But be prepared for the first hard part: understanding what the problems really ask, since they can be less than clear. The exam is oral, as in 385: you get five problems and a month to think about them, and you are allowed to use your notes and book. If you have solved most of the homework problems by yourself, you should be able to do well. Also, be prepared to be asked a large number of additional questions at the actual examination. These are usually routine computations, definitions, or proofs of theorems, though in some cases it might be a little hard to understand what the questions are.
To sum up, MAT 486 is a more challenging continuation of 385. If you liked the latter, and are seriously interested in probability or a field where it’s applied, you should consider taking 486.
ORF 526: Probability Theory
The course covers most of the basics of probability theory from a measure-theoretic point of view. It begins with the development of measure theory and Lebesgue integration. Most of the basic results from real analysis are covered, including the development of integration from scratch, Fatou’s lemma, the monotone and dominated convergence theorems, etc. It then moves on to general results in measure theory such as the Carathéodory extension theorem, the Radon-Nikodym theorem, Fubini’s theorem, etc. It especially focuses on different types of convergence and the connections between them. The course then goes on to Gaussian processes and the analysis of discrete martingale theory, including Doob’s inequalities and convergence results. Finally, the course quickly goes over the main facts behind Markov chains, including the ergodic theorem and stationarity.
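The stationarity of Markov chains mentioned at the end can be illustrated with a toy computation; the two-state transition matrix below is made up for illustration, and the function is a minimal power-iteration sketch, not course material:

```python
def stationary(P, iters=200):
    # Approximate the stationary distribution of a finite Markov
    # chain by repeatedly applying the transition matrix P.
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Two-state chain; the stationary distribution solves pi = pi * P.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary(P))  # approaches [5/6, 1/6]
```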
The course is intended for first year graduate students in the ORFE department although it is an excellent course for people interested in probability theory in general. The treatment of the material is extremely careful and well developed. For the most part the course follows the graduate text by Shiryaev, which is an excellent but challenging read. The theory on Markov chains follows mostly the introductory book by Norris.
There are weekly problem sets that account for about half the grade in the course. Most of them aim to improve student understanding of the theory and can be quite challenging. Lectures are usually spent proving results, with very little discussion of applications. As such, the course is extremely theoretical in nature, and its aim is to provide students with a firm basis for upper-level probability courses such as ORF 527 (stochastic calculus) and ORF 551/4 (Lévy processes / Markov processes).
The course is exceptionally challenging for people who have not taken real analysis before, especially in the first few weeks. A good understanding of the material and constructions is necessary for completing problem sets as well as the final exam.
Machine learning (ML) is a sub-field of artificial intelligence that lies at the intersection of computer science, statistics, and probability theory. In broad terms, it concerns the design, implementation, analysis, and application of algorithms that can ‘learn from experience.’ In the last few decades, it has advanced extremely rapidly and found applications to a wide range of problems including analysis of language (“natural language processing,” or NLP) and linguistics, modeling of genetic variation, quantitative modeling in the social sciences, spam filtering, automated discovery of identity theft, and many more.
While ML is a complex and wide field, it can be roughly split into two parts: (1) prediction (or, more correctly, “supervised” ML); and (2) modeling (or, more correctly, “unsupervised” ML). Prediction problems require that one predict the value of some variable in the future given many observations of it in the past; forecasting the value of a stock given its value (and market conditions) over the course of a year is an example. One way to think about modeling problems is as prediction problems in which the variable to be predicted has not been observed at all; forecasting the value of a stock just from market conditions, without ever seeing its previous values, would be an example of a modeling problem. In general, unsupervised problems concern the discovery and exploitation of hidden patterns in data: the canonical example is the problem of collecting together similar images in a large database (“clustering”).
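The clustering example can be made concrete with a tiny, self-contained sketch. This one-dimensional k-means implementation and its data are invented for illustration; real work would use a library such as scikit-learn:

```python
import random

def kmeans_1d(points, k=2, iters=20, seed=0):
    # Minimal 1-D k-means: alternate between assigning each point
    # to its nearest center and moving centers to cluster means.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two well-separated groups: the centers recover the group means.
data = [0.1, 0.2, 0.15, 0.05, 5.0, 5.2, 4.9, 5.1]
print(kmeans_1d(data))  # roughly [0.125, 5.05]
```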
From a math major’s perspective, either or both of the theoretical and applications-oriented sides of ML might be very interesting. Theoretical problems can involve the analysis of specific prediction algorithms, either from a computer science or statistical perspective; the analysis and design of methods of “inference,” the key algorithmic step in modeling that estimates important unobserved quantities from observed data; or connection of machine learning problems with ideas from information theory, probability theory, or other theoretical—and sometimes application—domains. Applications-oriented problems, a catch-all category that includes any problem that ultimately concerns the analysis of real data, show a similar degree of variety. They can involve the deployment of existing methods to interesting data analysis problems; the design of novel methodologies that allow for new kinds of analyses; the use of unsupervised methods to solve traditionally supervised problems; and a range of other research activities. It should be noted that ML research enjoys a rich and fruitful interplay between theory and practice, so that theoretical projects can often include an empirical component and applied projects often require understanding relevant theory—and sometimes extending it!
Machine Learning at Princeton
At Princeton, ML research occurs principally in the Departments of Computer Science (COS) and of Operations Research and Financial Engineering (ORF). In COS, the two main faculty members associated with ML are: Professor David Blei, whose research focuses on unsupervised problems and encompasses both theoretical work, particularly in inference, and applied work, particularly in text analysis problems; and Professor Rob Schapire, whose research focuses on algorithms for supervised problems, particularly “boosting methods,” an important class of algorithms he co-invented. While reading these brief summaries of their research areas, keep in mind that they are only approximations: both have wider interests and have worked on a wide range of problems. There is also increasing interest in ML in the theoretical computer science community; at Princeton, this line of research is being led by Professor Sanjeev Arora, whose research generally focuses on complexity theory and the theory of algorithms. In ORF, there are several faculty members whose research can be classified as ML. For details, you should consult the ORF website. A good starting list of faculty is: Professors Sébastien Bubeck, Jianqing Fan, Warren Powell, Philippe Rigollet, and Ramon van Handel.
Machine Learning Courses
Course offerings in ML generally parallel research, with a mix of relevant COS and ORF classes.
The relevant COS courses are:
COS 402: Artificial Intelligence
This is Professor Rob Schapire’s survey course on modern artificial intelligence. ML is a major component of the course. Two units in the course examine in some detail algorithms for “probabilistic inference,” a key part of ML that allows quantities that are never directly observed to be estimated from observed data. This is only done for a few fundamental models, but the material presented is foundational and useful. Additionally, a long unit at the end of the course focuses on the prediction problems and introduces numerous prediction algorithms, including Professor Schapire’s own AdaBoost.
COS 424: Interacting with Data
This is Professor David Blei’s survey course on statistical machine learning. The course builds up the edifice of ML from the ground up, starting with a brief review of basic probability theory and ultimately analyzing several important classes of modeling problems, as well as their applications to text, image, and other data analysis. The course culminates with a month-long group data analysis project that gives students hands-on experience with ML.
COS 513: Foundations of Probabilistic Modeling
This is Professor David Blei’s advanced undergraduate– or graduate-level survey course on probabilistic modeling, an approach to ML problems that articulates precise assumptions about the probabilistic relationships between variables of interest and exploits these (via inference) to discover and make predictions about structure in data. The course covers the foundational theory of probabilistic models, the most important algorithms for inference, and the properties of several representative classes of probabilistic models, along with their applications.
COS 597X: Advanced Topics in Computer Science
In some years, this family of graduate-level courses includes one or more on ML. Recent examples include Professor David Blei’s course on Nonparametric Bayesian Methods, Professor Rob Schapire’s course on Boosting, and Professor Sanjeev Arora’s course on provable methods in ML (Is Learning Easy?).
Some of the relevant ORF courses are:
ORF 418: Optimal Learning
ORF 505/FIN 505: Modern Regression and Time Series
ORF 524: Statistical Theory and Methods
ORF 525: Generalized Regression Models
ORF 525: Statistical Learning and Nonparametric Estimation
Evgeni Dimitrov ’13 (edimitro[at]princeton[dot]edu) for probability theory
Edgar Dobriban ’12 (dobribanedgar[at]gmail[dot]com) for probability theory
Max Rabinovich ’13 (mrabinov[at]princeton[dot]edu) for machine learning