I (n.b., Julien Mairal) have been interested in drawing links between neural networks and kernel methods for some time, and I am grateful to Sebastien for giving me the opportunity to say a few words about it on his blog. My initial motivation was not to provide another “why deep learning works” theory, but simply to encode into kernel methods a few successful principles from convolutional neural networks (CNNs), such as the ability to model the local stationarity of natural images at multiple scales—we may call that modeling receptive fields—along with feature compositions and invariant representations. There was also something challenging in trying to reconcile end-to-end deep neural networks and non-parametric methods based on kernels that typically decouple data representation from the learning task.
The main goal of this blog post is then to discuss the construction of a particular multilayer kernel for images that encodes the previous principles, derive some invariance and stability properties for CNNs, and also present a simple mechanism to perform feature learning in reproducing kernel Hilbert spaces. In other words, we should not see any intrinsic contradiction between kernels and representation learning.
Preliminaries on kernel methods
Given data living in a set $\mathcal{X}$, a positive definite kernel $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ implicitly defines a Hilbert space $\mathcal{H}$ of functions from $\mathcal{X}$ to $\mathbb{R}$, called reproducing kernel Hilbert space (RKHS), along with a mapping function $\varphi: \mathcal{X} \to \mathcal{H}$.
A predictive model $f$ in $\mathcal{H}$ associates to every point $z$ a label in $\mathbb{R}$, and admits a simple form $f(z) = \langle f, \varphi(z) \rangle_{\mathcal{H}}$. Then, the Cauchy-Schwarz inequality gives us a first basic stability property:

$$|f(z) - f(z')| \leq \|f\|_{\mathcal{H}} \, \|\varphi(z) - \varphi(z')\|_{\mathcal{H}}.$$
This relation exhibits a discrepancy between neural networks and kernel methods. Whereas neural networks optimize the data representation for a specific task, the term on the right involves the product of two quantities where data representation and learning are decoupled: $\|\varphi(z) - \varphi(z')\|_{\mathcal{H}}$ is a distance between two data representations $\varphi(z), \varphi(z')$, which are independent of the learning process, and $\|f\|_{\mathcal{H}}$ is a norm on the model $f$ (typically optimized over data) that acts as a measure of complexity.
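To make this decoupling concrete, here is a minimal, purely illustrative numerical sketch (not part of the original post) that checks the bound above for a Gaussian kernel and a model of the form $f = \sum_i \alpha_i K(x_i, \cdot)$; the data, coefficients, and bandwidth below are arbitrary choices:

```python
import numpy as np

# Purely illustrative check of the stability bound
#   |f(z) - f(z')| <= ||f||_H * ||phi(z) - phi(z')||_H
# for a Gaussian kernel and a model f = sum_i alpha_i K(x_i, .).

def gaussian_kernel(A, B, sigma=1.0):
    # pairwise Gaussian kernel matrix between the rows of A and the rows of B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))              # points defining the model
alpha = rng.normal(size=50)               # coefficients (could be learned)
z, zp = rng.normal(size=(1, 5)), rng.normal(size=(1, 5))

K = gaussian_kernel(X, X)
f_norm = np.sqrt(alpha @ K @ alpha)                      # ||f||_H

lhs = abs(alpha @ gaussian_kernel(X, z)[:, 0]
          - alpha @ gaussian_kernel(X, zp)[:, 0])        # |f(z) - f(z')|

# ||phi(z) - phi(z')||_H computed with the kernel trick
dist = np.sqrt(gaussian_kernel(z, z) + gaussian_kernel(zp, zp)
               - 2 * gaussian_kernel(z, zp))[0, 0]

print(f"{lhs:.4f} <= {f_norm * dist:.4f}")               # the bound holds
```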
Thinking about neural networks in terms of kernel methods then requires defining the underlying representation $\varphi(z)$, which can only depend on the network architecture, and the model $f$, which will be parametrized by the network's (learned) weights.
Building a convolutional kernel for convolutional neural networks
Following Alberto Bietti’s paper, we now consider the direct construction of a multilayer convolutional kernel for images. Given a two-dimensional image $x_0$, the main idea is to build a sequence of “feature maps” $x_1, \ldots, x_n$ that are two-dimensional spatial maps carrying information about image neighborhoods (a.k.a. receptive fields) at every location. As we proceed in this sequence, the goal is to model larger neighborhoods with more “invariance”.
Formally, an input image $x_0$ is represented as a square-integrable function in $L^2(\Omega, \mathcal{H}_0)$, where $\Omega$ is a set of pixel coordinates and $\mathcal{H}_0$ is a Hilbert space. $\Omega$ may be a discrete grid or a continuous domain such as $\mathbb{R}^2$, and $\mathcal{H}_0$ may simply be $\mathbb{R}^3$ for RGB images. Then, a feature map $x_k$ in $L^2(\Omega, \mathcal{H}_k)$ is obtained from a previous layer $x_{k-1}$ as follows:
- modeling larger neighborhoods than in the previous layer: we map neighborhoods (patches) from $x_{k-1}$ to a new Hilbert space $\mathcal{H}_k$. Concretely, we define a homogeneous dot-product kernel between patches $z, z'$ from $x_{k-1}$:

$$K_k(z, z') = \|z\| \, \|z'\| \, \kappa_k\!\left( \frac{\langle z, z' \rangle}{\|z\| \, \|z'\|} \right),$$

where $\langle \cdot, \cdot \rangle$ is an inner-product derived from $\mathcal{H}_{k-1}$, and $\kappa_k$ is a non-linear function that ensures positive definiteness, e.g., $\kappa_k(\langle z, z' \rangle) = e^{\langle z, z' \rangle - 1}$ for vectors $z, z'$ with unit norm, see this paper. By doing so, we implicitly define a kernel mapping $\varphi_k$ that maps patches from $x_{k-1}$ to the new Hilbert space $\mathcal{H}_k$. This mechanism is illustrated in the picture at the beginning of the post, and produces a spatial map that carries these patch representations (a small numerical sketch of this kernel is given after this list).
- increasing invariance: to gain invariance to small deformations, we smooth $x_k$ with a linear filter, as shown in the picture at the beginning of the post, which may be interpreted as anti-aliasing (in terms of signal processing) or linear pooling (in terms of neural networks).
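As a quick illustration of the first step, here is a minimal sketch (illustrative code, not taken from the papers) of the homogeneous dot-product kernel above, for two patches flattened into vectors and with $\kappa_k(u) = e^{u-1}$:

```python
import numpy as np

# Illustrative sketch of the homogeneous dot-product kernel between two patches
# z and z' (flattened into vectors):
#   K_k(z, z') = ||z|| ||z'|| * kappa_k( <z, z'> / (||z|| ||z'||) ),
# with kappa_k(u) = exp(u - 1) as an example of admissible non-linearity.

def homogeneous_dot_product_kernel(z, zp, kappa=lambda u: np.exp(u - 1.0)):
    nz, nzp = np.linalg.norm(z), np.linalg.norm(zp)
    if nz == 0.0 or nzp == 0.0:          # convention: zero patch => kernel value 0
        return 0.0
    cosine = np.dot(z, zp) / (nz * nzp)  # normalized inner-product between patches
    return nz * nzp * kappa(cosine)

rng = np.random.default_rng(0)
z, zp = rng.normal(size=27), rng.normal(size=27)   # e.g., two 3x3 patches with 3 channels
print(homogeneous_dot_product_kernel(z, zp))
```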
Formally, the previous construction amounts to applying operators $P_k$ (patch extraction), $M_k$ (kernel mapping), and $A_k$ (smoothing/pooling operator) to $x_{k-1}$, such that the $k$-th layer representation can be written as

$$x_k = A_k M_k P_k \, x_{k-1}.$$
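To fix ideas, here is a hypothetical and much-simplified sketch of one such layer on a discrete grid. The exact kernel mapping $M_k$ is infinite-dimensional; in this sketch it is replaced by a Nyström-type projection onto a few anchor patches, which is one standard way to approximate it. The patch size, number of anchors, and pooling bandwidth below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical, much-simplified sketch of one layer x_k = A_k M_k P_k x_{k-1}
# on a discrete grid, with a Nystrom-type approximation of the kernel mapping.

def kappa(u):
    # non-linearity of the homogeneous dot-product kernel, kappa(u) = exp(u - 1)
    return np.exp(u - 1.0)

def extract_patches(x, s=3):
    # P_k: extract the s x s neighborhood around every location (zero padding)
    H, W, C = x.shape
    xp = np.pad(x, ((s // 2, s // 2), (s // 2, s // 2), (0, 0)))
    out = np.empty((H, W, s * s * C))
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + s, j:j + s, :].ravel()
    return out

def inv_sqrtm(A, eps=1e-6):
    # inverse square root of a symmetric PSD matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def kernel_mapping(patches, anchors):
    # M_k (approximated): psi(z) = ||z|| kappa(Z Z^T)^{-1/2} kappa(Z z / ||z||),
    # where the rows of `anchors` (Z) are unit-norm anchor patches.
    norms = np.linalg.norm(patches, axis=-1, keepdims=True) + 1e-8
    cosines = (patches / norms) @ anchors.T
    return norms * (kappa(cosines) @ inv_sqrtm(kappa(anchors @ anchors.T)))

def pooling(x, sigma=1.0):
    # A_k: Gaussian smoothing (linear pooling) of each channel over locations
    return np.stack([gaussian_filter(x[:, :, c], sigma)
                     for c in range(x.shape[-1])], axis=-1)

rng = np.random.default_rng(0)
x0 = rng.random((16, 16, 3))                       # toy "RGB" input image
Z = rng.normal(size=(8, 3 * 3 * 3))                # 8 anchor patches of size 3x3x3
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
x1 = pooling(kernel_mapping(extract_patches(x0), Z))
print(x1.shape)                                    # (16, 16, 8)
```

With exact kernel mappings the feature maps would be infinite-dimensional; a finite approximation of this kind is what makes the construction computable in practice.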
We may finally define a kernel for images as $\mathcal{K}(x_0, x_0') = \langle x_n, x_n' \rangle_{L^2(\Omega, \mathcal{H}_n)}$, whose RKHS contains the functions $f_w(x_0) = \langle w, x_n \rangle$ for $w$ in $L^2(\Omega, \mathcal{H}_n)$. Note now that we have introduced a concept of image representation $x_n$, which only depends on some network architecture (amounts of pooling, patch size), and a predictive model $f_w$ parametrized by $w$.
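Continuing the same toy sketch (reusing the functions and variables defined above), the image kernel and a predictive model in its RKHS then reduce to plain inner products between last-layer feature maps:

```python
# Continuing the toy layer sketch above (x1, Z, and the helper functions are
# assumed to be defined there). With x_n and x_n' the last-layer feature maps
# of two images, the kernel and the model are inner products in L^2(Omega, H_n).
x1p = pooling(kernel_mapping(extract_patches(rng.random((16, 16, 3))), Z))
K_images = np.sum(x1 * x1p)          # K(x_0, x_0') = <x_n, x_n'>
w = rng.normal(size=x1.shape)        # a (random, not learned) model parameter
print(K_images, np.sum(w * x1))      # the second value is f_w(x_0) = <w, x_n>
```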
From such a construction, we will now derive stability results for classical convolutional neural networks (CNNs) and then derive non-standard CNNs based on kernel approximations that we call convolutional kernel networks (CKNs).
Next week, we will see how to perform feature (end-to-end) learning with the previous kernel representation, and also discuss other classical links between neural networks and kernel methods.