I (n.b., Julien Mairal) have been interested in drawing links between neural networks and kernel methods for some time, and I am grateful to Sebastien for giving me the opportunity to say a few words about it on his blog. My initial motivation was not to provide another “why deep learning works” theory, but simply to encode into kernel methods a few successful principles from convolutional neural networks (CNNs), such as the ability to model the local stationarity of natural images at multiple scales—we may call that modeling receptive fields—along with feature compositions and invariant representations. There was also something challenging in trying to reconcile end-to-end deep neural networks and non-parametric methods based on kernels that typically decouple data representation from the learning task.
The main goal of this blog post is then to discuss the construction of a particular multilayer kernel for images that encodes the previous principles, derive some invariance and stability properties for CNNs, and also present a simple mechanism to perform feature learning in reproducing kernel Hilbert spaces. In other words, we should not see any intrinsic contradiction between kernels and representation learning.
Preliminaries on kernel methods
Given data living in a set $\mathcal{X}$, a positive definite kernel $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ implicitly defines a Hilbert space $\mathcal{H}$ of functions from $\mathcal{X}$ to $\mathbb{R}$, called reproducing kernel Hilbert space (RKHS), along with a mapping function $\varphi: \mathcal{X} \to \mathcal{H}$.
A predictive model $f$ in $\mathcal{H}$ associates to every point $z$ a label in $\mathbb{R}$, and admits a simple form $f(z) = \langle f, \varphi(z) \rangle_{\mathcal{H}}$. Then, the Cauchy-Schwarz inequality gives us a first basic stability property:

$$|f(z) - f(z')| \leq \|f\|_{\mathcal{H}} \, \|\varphi(z) - \varphi(z')\|_{\mathcal{H}}.$$
This relation exhibits a discrepancy between neural networks and kernel methods. Whereas neural networks optimize the data representation for a specific task, the term on the right involves the product of two quantities where data representation and learning are decoupled: $\|\varphi(z) - \varphi(z')\|_{\mathcal{H}}$ is a distance between two data representations $\varphi(z), \varphi(z')$, which are independent of the learning process, and $\|f\|_{\mathcal{H}}$ is a norm on the model $f$ (typically optimized over data) that acts as a measure of complexity.
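To make this decoupling concrete, here is a minimal, purely illustrative numerical sketch (not part of the original post) that checks the bound above for a Gaussian kernel and a model of the form $f = \sum_i \alpha_i K(x_i, \cdot)$; the data, coefficients, and bandwidth below are arbitrary choices:

```python
import numpy as np

# Purely illustrative check of the stability bound
#   |f(z) - f(z')| <= ||f||_H * ||phi(z) - phi(z')||_H
# for a Gaussian kernel and a model f = sum_i alpha_i K(x_i, .).

def gaussian_kernel(A, B, sigma=1.0):
    # pairwise Gaussian kernel matrix between the rows of A and the rows of B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))              # points defining the model
alpha = rng.normal(size=50)               # coefficients (could be learned)
z, zp = rng.normal(size=(1, 5)), rng.normal(size=(1, 5))

K = gaussian_kernel(X, X)
f_norm = np.sqrt(alpha @ K @ alpha)                      # ||f||_H

lhs = abs(alpha @ gaussian_kernel(X, z)[:, 0]
          - alpha @ gaussian_kernel(X, zp)[:, 0])        # |f(z) - f(z')|

# ||phi(z) - phi(z')||_H computed with the kernel trick
dist = np.sqrt(gaussian_kernel(z, z) + gaussian_kernel(zp, zp)
               - 2 * gaussian_kernel(z, zp))[0, 0]

print(f"{lhs:.4f} <= {f_norm * dist:.4f}")               # the bound holds
```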
Thinking about neural networks in terms of kernel methods then requires defining the underlying representation $\varphi(z)$, which can only depend on the network architecture, and the model $f$, which will be parametrized by the network's (learned) weights.
Building a convolutional kernel for convolutional neural networks
Following Alberto Bietti’s paper, we now consider the direct construction of a multilayer convolutional kernel for images. Given a two-dimensional image $x_0$, the main idea is to build a sequence of “feature maps” $x_1, \ldots, x_n$ that are two-dimensional spatial maps carrying information about image neighborhoods (a.k.a. receptive fields) at every location. As we proceed in this sequence, the goal is to model larger neighborhoods with more “invariance”.
Formally, an input image $x_0$ is represented as a square-integrable function in $L^2(\Omega, \mathcal{H}_0)$, where $\Omega$ is a set of pixel coordinates and $\mathcal{H}_0$ is a Hilbert space. $\Omega$ may be a discrete grid or a continuous domain such as $\mathbb{R}^2$, and $\mathcal{H}_0$ may simply be $\mathbb{R}^3$ for RGB images. Then, a feature map $x_k$ in $L^2(\Omega, \mathcal{H}_k)$ is obtained from a previous layer $x_{k-1}$ as follows:
- modeling larger neighborhoods than in the previous layer: we map neighborhoods (patches) from $x_{k-1}$ to a new Hilbert space $\mathcal{H}_k$. Concretely, we define a homogeneous dot-product kernel between patches $z, z'$ from $x_{k-1}$:

$$K_k(z, z') = \|z\| \, \|z'\| \, \kappa_k\!\left( \frac{\langle z, z' \rangle}{\|z\| \, \|z'\|} \right),$$

where $\langle \cdot, \cdot \rangle$ is an inner-product derived from $\mathcal{H}_{k-1}$, and $\kappa_k$ is a non-linear function that ensures positive definiteness, e.g., $\kappa_k(\langle z, z' \rangle) = e^{\langle z, z' \rangle - 1}$ for vectors $z, z'$ with unit norm, see this paper. By doing so, we implicitly define a kernel mapping $\varphi_k$ that maps patches from $x_{k-1}$ to the new Hilbert space $\mathcal{H}_k$. This mechanism is illustrated in the picture at the beginning of the post, and produces a spatial map that carries these patch representations (a small numerical sketch of this kernel is given after this list).
- increasing invariance: to gain invariance to small deformations, we smooth $x_k$ with a linear filter, as shown in the picture at the beginning of the post, which may be interpreted as anti-aliasing (in terms of signal processing) or linear pooling (in terms of neural networks).
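As a quick illustration of the first step, here is a minimal sketch (illustrative code, not taken from the papers) of the homogeneous dot-product kernel above, for two patches flattened into vectors and with $\kappa_k(u) = e^{u-1}$:

```python
import numpy as np

# Illustrative sketch of the homogeneous dot-product kernel between two patches
# z and z' (flattened into vectors):
#   K_k(z, z') = ||z|| ||z'|| * kappa_k( <z, z'> / (||z|| ||z'||) ),
# with kappa_k(u) = exp(u - 1) as an example of admissible non-linearity.

def homogeneous_dot_product_kernel(z, zp, kappa=lambda u: np.exp(u - 1.0)):
    nz, nzp = np.linalg.norm(z), np.linalg.norm(zp)
    if nz == 0.0 or nzp == 0.0:          # convention: zero patch => kernel value 0
        return 0.0
    cosine = np.dot(z, zp) / (nz * nzp)  # normalized inner-product between patches
    return nz * nzp * kappa(cosine)

rng = np.random.default_rng(0)
z, zp = rng.normal(size=27), rng.normal(size=27)   # e.g., two 3x3 patches with 3 channels
print(homogeneous_dot_product_kernel(z, zp))
```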
Formally, the previous construction amounts to applying operators $P_k$ (patch extraction), $M_k$ (kernel mapping), and $A_k$ (smoothing/pooling operator) to $x_{k-1}$, such that the $k$-th layer representation can be written as

$$x_k = A_k M_k P_k \, x_{k-1}.$$
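To fix ideas, here is a hypothetical and much-simplified sketch of one such layer on a discrete grid. The exact kernel mapping $M_k$ is infinite-dimensional; in this sketch it is replaced by a Nyström-type projection onto a few anchor patches, which is one standard way to approximate it. The patch size, number of anchors, and pooling bandwidth below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Hypothetical, much-simplified sketch of one layer x_k = A_k M_k P_k x_{k-1}
# on a discrete grid, with a Nystrom-type approximation of the kernel mapping.

def kappa(u):
    # non-linearity of the homogeneous dot-product kernel, kappa(u) = exp(u - 1)
    return np.exp(u - 1.0)

def extract_patches(x, s=3):
    # P_k: extract the s x s neighborhood around every location (zero padding)
    H, W, C = x.shape
    xp = np.pad(x, ((s // 2, s // 2), (s // 2, s // 2), (0, 0)))
    out = np.empty((H, W, s * s * C))
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + s, j:j + s, :].ravel()
    return out

def inv_sqrtm(A, eps=1e-6):
    # inverse square root of a symmetric PSD matrix via eigendecomposition
    w, V = np.linalg.eigh(A)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def kernel_mapping(patches, anchors):
    # M_k (approximated): psi(z) = ||z|| kappa(Z Z^T)^{-1/2} kappa(Z z / ||z||),
    # where the rows of `anchors` (Z) are unit-norm anchor patches.
    norms = np.linalg.norm(patches, axis=-1, keepdims=True) + 1e-8
    cosines = (patches / norms) @ anchors.T
    return norms * (kappa(cosines) @ inv_sqrtm(kappa(anchors @ anchors.T)))

def pooling(x, sigma=1.0):
    # A_k: Gaussian smoothing (linear pooling) of each channel over locations
    return np.stack([gaussian_filter(x[:, :, c], sigma)
                     for c in range(x.shape[-1])], axis=-1)

rng = np.random.default_rng(0)
x0 = rng.random((16, 16, 3))                       # toy "RGB" input image
Z = rng.normal(size=(8, 3 * 3 * 3))                # 8 anchor patches of size 3x3x3
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
x1 = pooling(kernel_mapping(extract_patches(x0), Z))
print(x1.shape)                                    # (16, 16, 8)
```

With exact kernel mappings the feature maps would be infinite-dimensional; a finite approximation of this kind is what makes the construction computable in practice.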
We may finally define a kernel for images as $\mathcal{K}(x_0, x_0') = \langle x_n, x_n' \rangle_{L^2(\Omega, \mathcal{H}_n)}$, whose RKHS contains the functions $f_w(x_0) = \langle w, x_n \rangle$ for $w$ in $L^2(\Omega, \mathcal{H}_n)$. Note now that we have introduced a concept of image representation $x_n$, which only depends on some network architecture (amounts of pooling, patch size), and a predictive model $f_w$ parametrized by $w$.
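Continuing the same toy sketch (reusing the functions and variables defined above), the image kernel and a predictive model in its RKHS then reduce to plain inner products between last-layer feature maps:

```python
# Continuing the toy layer sketch above (x1, Z, and the helper functions are
# assumed to be defined there). With x_n and x_n' the last-layer feature maps
# of two images, the kernel and the model are inner products in L^2(Omega, H_n).
x1p = pooling(kernel_mapping(extract_patches(rng.random((16, 16, 3))), Z))
K_images = np.sum(x1 * x1p)          # K(x_0, x_0') = <x_n, x_n'>
w = rng.normal(size=x1.shape)        # a (random, not learned) model parameter
print(K_images, np.sum(w * x1))      # the second value is f_w(x_0) = <w, x_n>
```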
From such a construction, we will now derive stability results for classical convolutional neural networks (CNNs) and then derive non-standard CNNs based on kernel approximations that we call convolutional kernel networks (CKNs).
Next week, we will see how to perform feature (end-to-end) learning with the previous kernel representation, and also discuss other classical links between neural networks and kernel methods.