A couple of months ago I taught an introduction to statistical learning theory. I took inspiration from two very good introductory books on SLT: “Foundations of ML” and “Understanding Machine Learning: From Theory to Algorithms”. I also covered some classical results about nearest neighbors from the seminal book “A Probabilistic Theory of Pattern Recognition”. The lectures were recorded and I put the videos on the YouTube channel. Sadly I still have not learned the basic lesson to stick to a black pen when teaching on a whiteboard, so at (many) times it is hard to read what I write. I also got sick after the first lecture, so my voice is quite weak in the following lectures, making it hard to hear too… Despite these fatal flaws some people still found the material useful, so check it out if you are interested. Below I give a rough syllabus for the lectures, and I also give some notes on “algorithmic generalization”, which was in my opinion the highlight of the lectures (you can also check my earlier notes on SLT here and here).

**Syllabus**

Lecture 1: Introduction to the statistical learning theory framework, its basic question (sample complexity) and its canonical settings (linear classification, linear regression, logistic regression, SVM, neural networks). Two basic methods for learning: (i) Empirical Risk Minimization, (ii) Nearest neighbor classification.

Lecture 2: Uniform law of large numbers approach to control the sample complexity of ERM (includes a brief reminder of concentration inequalities). Application: analysis of bounded regression (includes the non-standard topic of type/cotype and how it relates to different regularizations such as in LASSO).

Lecture 3: Reminder of the first two lectures and relation with the famous VC dimension. How to generalize beyond uniform law of large numbers: stability and robustness approaches (see below).

Lecture 4: How to generalize beyond uniform law of large numbers: information theoretic perspective (see below), PAC-Bayes, and online learning. Brief discussion of margin theory, and an introduction to modern questions in robust machine learning.

**Some notes on algorithmic generalization**

Let $\mathcal{X}$ and $\mathcal{Y}$ be input/output spaces. Let $\ell : \mathcal{Y} \times \mathcal{Y} \rightarrow \mathbb{R}$ be a loss function, $\mu$ a probability measure supported on $\mathcal{X} \times \mathcal{Y}$, and $A : (\mathcal{X} \times \mathcal{Y})^m \rightarrow \mathcal{Y}^{\mathcal{X}}$ a learning rule (in words $A$ takes as input a dataset of $m$ examples, and outputs a mapping from $\mathcal{X}$-inputs to $\mathcal{Y}$-outputs). With a slight abuse of notation, for $h \in \mathcal{Y}^{\mathcal{X}}$ and $z = (x, y) \in \mathcal{X} \times \mathcal{Y}$, we write $\ell(h, z) = \ell(h(x), y)$. We define the generalization of $A$ on $\mu$ by:

$$\mathrm{gen}(A, \mu, m) = \mathbb{E}_{S \sim \mu^{\otimes m},\, z \sim \mu} \left[ \ell(A(S), z) - \frac{1}{m} \sum_{i=1}^{m} \ell(A(S), z_i) \right], \quad \text{where } S = (z_1, \ldots, z_m).$$

In words, if $\mathrm{gen}(A, \mu, m) \leq \epsilon$ then we expect the empirical performance of the learned classifier to be representative of its performance on a fresh out-of-sample data point, up to an additive $\epsilon$. The whole difficulty of course is that the empirical evaluation is done with the *same* dataset that is used for training, leading to non-trivial dependencies. We should also note that in many situations one might be interested in the two-sided version of the generalization, as well as in high probability bounds instead of bounds in expectation. For simplicity we focus here on the one-sided quantity in expectation.
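As a toy numerical illustration of this definition (my own sketch, not from the lectures — the distribution $\mu$, the sample size, and the choice of the 1-nearest-neighbor rule are all arbitrary): 1-NN memorizes its dataset, so its empirical risk is zero and the generalization gap is essentially its out-of-sample risk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, rng):
    """Toy mu on [0,1] x {0,1}: label 1{x > 1/2}, flipped with probability 0.1."""
    x = rng.random(n)
    y = (x > 0.5).astype(int)
    flip = rng.random(n) < 0.1
    return x, np.where(flip, 1 - y, y)

def nearest_neighbor_rule(Sx, Sy):
    """The 1-NN learning rule A: maps a dataset S to a classifier h."""
    return lambda x: Sy[np.argmin(np.abs(Sx - x))]

def zero_one(y_pred, y):
    return float(y_pred != y)

# Monte Carlo estimate of gen(A, mu, m): out-of-sample risk minus empirical risk
m, n_trials = 50, 100
gaps = []
for _ in range(n_trials):
    Sx, Sy = sample(m, rng)
    h = nearest_neighbor_rule(Sx, Sy)
    emp = np.mean([zero_one(h(x), y) for x, y in zip(Sx, Sy)])  # 0 here: 1-NN memorizes
    Tx, Ty = sample(500, rng)
    out = np.mean([zero_one(h(x), y) for x, y in zip(Tx, Ty)])
    gaps.append(out - emp)
gen_estimate = float(np.mean(gaps))
print(gen_estimate)  # noticeably positive: memorization yields a large gap
```

The point of the sketch is only to make the quantity $\mathrm{gen}(A, \mu, m)$ tangible; for this noisy problem the gap hovers around the asymptotic 1-NN excess risk.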

The most classical approach to controlling generalization, which we covered in detail in previous notes, is via a uniform law of large numbers. More precisely, assuming that the range of the learning rule is some hypothesis class $\mathcal{H} \subset \mathcal{Y}^{\mathcal{X}}$, one trivially has

$$\mathrm{gen}(A, \mu, m) \leq \mathbb{E}_{S \sim \mu^{\otimes m}} \sup_{h \in \mathcal{H}} \left( \mathbb{E}_{z \sim \mu}\, \ell(h, z) - \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i) \right).$$

However this approach might be too coarse when the learning rule is searching through a potentially huge space of hypotheses (such as in the case of neural networks). Certainly such a uniform bound has no chance of explaining why neural networks with billions of parameters would generalize with a dataset of merely millions of examples. For this one has to use *algorithm-based* arguments.

**Stability**

The classical example of algorithmic generalization is due to Bousquet and Elisseeff 2002. It is a simple rewriting of the generalization as a *stability* notion:

$$\mathrm{gen}(A, \mu, m) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{E} \left[ \ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \right],$$

where $S^{(i)} = (z_1, \ldots, z_{i-1}, z, z_{i+1}, \ldots, z_m)$ is the dataset $S$ with its $i$-th example replaced by the fresh sample $z \sim \mu$ (so that $A(S^{(i)})$ is trained without ever seeing $z_i$). This viewpoint can be quite enlightening. For example in the uniform law of large numbers view, regularization enforces small capacity, while in the stability view we see that regularization ensures that the output hypothesis is not too brittle (this was covered in some detail in the previous notes).
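To make the brittleness point concrete, here is a minimal numerical sketch (my own, not from Bousquet and Elisseeff) of replace-one stability for ridge regression; the data-generating model, sample size, and regularization values are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge_fit(X, y, lam):
    """Ridge regression: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Replace-one stability: how much does the learned predictor move when a
# single training example is swapped for a fresh sample from the same model?
m, d, noise = 30, 5, 1.0
w_true = rng.normal(size=d)
X = rng.normal(size=(m, d))
y = X @ w_true + noise * rng.normal(size=m)

X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)                       # fresh example replacing z_1
y2[0] = X2[0] @ w_true + noise * rng.normal()

def sensitivity(lam):
    """Distance between the hypotheses learned on S and on S^{(1)}."""
    return float(np.linalg.norm(ridge_fit(X, y, lam) - ridge_fit(X2, y2, lam)))

# Stronger regularization -> a less brittle output hypothesis:
print(sensitivity(0.01), sensitivity(1000.0))
```

The heavily regularized fit barely moves when one example is swapped, while the nearly unregularized fit moves much more — the stability view of why regularization helps generalization.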

**Robustness**

The next approach I would like to discuss is related to deep questions about current machine learning methods. One of the outstanding problems in machine learning is that current algorithms are not robust to even a mild shift of distribution at test time. Intuitively this lack of robustness seems to indicate a lack of generalization. Can we formalize this intuition? I will now give one such formal link between robustness and generalization, due to Xu and Mannor 2010, which shows the reverse direction (robustness implies generalization). At some level robustness can be viewed as “stability at test time” (while in Bousquet and Elisseeff we care about “stability at training time”).

Xu and Mannor define $(K, \epsilon)$-robustness as follows: assume that $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ can be partitioned into $K$ sets $C_1, \ldots, C_K$ such that if $z_i \in S$ and $z \in \mathcal{Z}$ are in the same set $C_k$ then

$$\left| \ell(A(S), z_i) - \ell(A(S), z) \right| \leq \epsilon.$$

A good example to have in mind would be a binary classifier with large margin, in which case $K$ corresponds to the covering number of $\mathcal{X}$ at the scale given by the margin. Another (related) example would be regression with a Lipschitz function. In both cases $K$ would typically be exponential in the dimension of $\mathcal{X}$. The key result of Xu and Mannor that we prove next is a generalization bound of order $\epsilon + \sqrt{K/m}$. In any situation of interest this seems to me to be a pretty weak bound, yet on the other hand I find the framework to be very pretty, and it is of topical interest. I would be surprised if this was the end of the road in the space of “generalization and robustness”.

Theorem (Xu and Mannor 2010):

A $(K, \epsilon)$-robust learning rule $A$ with a loss taking values in $[0, B]$ satisfies

$$\mathrm{gen}(A, \mu, m) \leq \epsilon + B \sqrt{\frac{K}{m}}.$$

**Proof:** Let $N_k = |\{ i \in [m] : z_i \in C_k \}|$ and note that $(N_1, \ldots, N_K)$ follows a multinomial distribution with $\mathbb{E}[N_k] = m\, \mu(C_k)$. Now one has for a robust $A$:

$$\mathbb{E}_{z \sim \mu}\, \ell(A(S), z) - \frac{1}{m} \sum_{i=1}^{m} \ell(A(S), z_i) = \sum_{k=1}^{K} \left( \mu(C_k)\, \mathbb{E}\big[\ell(A(S), z) \,\big|\, z \in C_k\big] - \frac{1}{m} \sum_{i : z_i \in C_k} \ell(A(S), z_i) \right) \leq \epsilon + B \sum_{k=1}^{K} \left| \mu(C_k) - \frac{N_k}{m} \right|.$$

It only remains to observe that

$$\mathbb{E} \sum_{k=1}^{K} \left| \mu(C_k) - \frac{N_k}{m} \right| \leq \sum_{k=1}^{K} \sqrt{\frac{\mu(C_k)(1 - \mu(C_k))}{m}} \leq \sqrt{\frac{K}{m}},$$

where the first inequality uses $\mathbb{E}|X| \leq \sqrt{\mathbb{E} X^2}$ together with $\mathrm{Var}(N_k) = m\, \mu(C_k)(1 - \mu(C_k))$, and the second one is Cauchy–Schwarz. $\Box$
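Here is a quick Monte Carlo sanity check (my own, not from Xu and Mannor) of the multinomial deviation inequality used at the end of the proof; the partition size, sample size, and measure are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical check of the deviation bound from the proof:
#   E sum_k |mu(C_k) - N_k/m|  <=  sqrt(K/m),
# where (N_1,...,N_K) ~ Multinomial(m, (mu(C_1),...,mu(C_K))).
K, m, n_trials = 20, 100, 2000
mu = rng.dirichlet(np.ones(K))   # an arbitrary partition measure

devs = [np.abs(rng.multinomial(m, mu) / m - mu).sum() for _ in range(n_trials)]
lhs = float(np.mean(devs))       # Monte Carlo estimate of the expectation
rhs = float(np.sqrt(K / m))      # = sqrt(20/100), about 0.447
print(lhs, "<=", rhs)
```

The empirical average sits comfortably below $\sqrt{K/m}$, as the bound is loose by roughly a $\sqrt{2/\pi}$ factor plus the $(1-\mu(C_k))$ terms.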

**Information theoretic perspective**

Why do we think that a lack of robustness indicates a lack of generalization? Well, it seems to me that a basic issue could simply be that the dataset was *memorized* by the neural network (which would be a *very* non-robust way to learn). If true then one could basically find all the information about the data in the weights of the neural network. Again, can we prove at least the opposite direction, that is, if the output hypothesis does not retain much information from the dataset then it must generalize? This is exactly what Russo and Zou 2016 did, where they use the mutual information $I(S; A(S))$ as a measure of the “information” retained by the trained hypothesis about the dataset. More precisely they show the following result:

Theorem (Russo and Zou 2016):

Assume that $\ell(h, z)$ is $\sigma$-subgaussian for $z \sim \mu$, for every fixed $h$. Then

$$\left| \mathrm{gen}(A, \mu, m) \right| \leq \sqrt{\frac{2 \sigma^2\, I(S; A(S))}{m}}.$$

Note that here we have assumed that the codomain of the learning rule consists of deterministic maps from inputs to outputs, in which case the mutual information $I(S; A(S))$ is simply the entropy $H(A(S))$. However the proof below also applies to the case where the codomain of the learning rule consists of probability measures, see e.g., Xu and Raginsky 2017. Let us now conclude this (long) post with the proof of the above theorem.

The key point is very simple: one can view generalization as a decoupling property by writing:

$$\mathrm{gen}(A, \mu, m) = \mathbb{E}\big[ F(\bar{S}, A(S)) \big] - \mathbb{E}\big[ F(S, A(S)) \big],$$

where $F(S', h) = \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i')$ for a dataset $S' = (z_1', \ldots, z_m')$, and $\bar{S} \sim \mu^{\otimes m}$ is an independent copy of $S$.

Now the theorem follows straightforwardly from an application of the following beautiful lemma (if one knows Hoeffding’s lemma: $F(\bar{S}, h)$ is an average of $m$ i.i.d. $\sigma$-subgaussian terms, and hence is $\sigma/\sqrt{m}$-subgaussian, whence the $1/m$ in the theorem):

Lemma: Let $\phi : \mathcal{A} \times \mathcal{B} \rightarrow \mathbb{R}$. Let $X$, $Y$ be random variables in $\mathcal{A}$ and $\mathcal{B}$, and let $\bar{X}$, $\bar{Y}$ be mutually independent copies of $X$ and $Y$. Assume that $\phi(\bar{X}, \bar{Y})$ is $\sigma$-subgaussian (i.e., $\mathbb{E} \exp\big( \lambda (\phi(\bar{X}, \bar{Y}) - \mathbb{E}\, \phi(\bar{X}, \bar{Y})) \big) \leq \exp(\lambda^2 \sigma^2 / 2)$ for all $\lambda \in \mathbb{R}$). Then

$$\big| \mathbb{E}\, \phi(X, Y) - \mathbb{E}\, \phi(\bar{X}, \bar{Y}) \big| \leq \sqrt{2 \sigma^2\, I(X; Y)}.$$

**Proof:** The mutual information $I(X; Y)$ is equal to the relative entropy between the distribution of $(X, Y)$ and the distribution of $(\bar{X}, \bar{Y})$. Recall also the variational representation of the relative entropy, namely that the map $\nu \mapsto \mathrm{KL}(\nu, \rho)$ is the convex conjugate of the log-partition function $g \mapsto \log \mathbb{E}_{\rho}[\exp(g)]$. In particular one has the lower bound $\mathrm{KL}(\nu, \rho) \geq \mathbb{E}_{\nu}[g] - \log \mathbb{E}_{\rho}[\exp(g)]$ for any such $g$, which here means:

$$I(X; Y) \geq \mathbb{E}\big[ g(X, Y) \big] - \log \mathbb{E}\big[ \exp\big(g(\bar{X}, \bar{Y})\big) \big].$$

Now it only remains to use the definition of subgaussianity, that is to take $g = \lambda \big( \phi - \mathbb{E}\, \phi(\bar{X}, \bar{Y}) \big)$, and to optimize over $\lambda \in \mathbb{R}$. $\Box$
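Spelling out that final optimization (my own worked version, with $\phi$ the function in the lemma, $\bar{X}, \bar{Y}$ the independent copies, and writing $\Delta$ for the gap $\mathbb{E}\, \phi(X, Y) - \mathbb{E}\, \phi(\bar{X}, \bar{Y})$):

```latex
% the variational lower bound with g = \lambda(\phi - \mathbb{E}\,\phi(\bar X,\bar Y)),
% followed by the subgaussianity of \phi(\bar X,\bar Y):
I(X;Y) \;\geq\; \lambda \Delta
  - \log \mathbb{E} \exp\!\Big( \lambda \big( \phi(\bar X,\bar Y)
  - \mathbb{E}\,\phi(\bar X,\bar Y) \big) \Big)
  \;\geq\; \lambda \Delta - \frac{\lambda^2 \sigma^2}{2},
  \qquad \forall\, \lambda \in \mathbb{R}.
% optimizing the right-hand side at \lambda = \Delta / \sigma^2:
I(X;Y) \;\geq\; \frac{\Delta^2}{2\sigma^2}
  \quad \Longleftrightarrow \quad
  |\Delta| \;\leq\; \sqrt{2 \sigma^2\, I(X;Y)}.
```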
