I will describe here the very first (to my knowledge) acceleration algorithm for smooth convex optimization, which is due to Arkadi Nemirovski (dating back to the end of the 70's). The algorithm relies on a 2-dimensional plane-search subroutine (which, in theory, can be implemented in $O(\log(1/\epsilon))$ calls to a first-order oracle). He later improved it to only require a 1-dimensional line-search in 1981, but of course the breakthrough that everyone knows about came a year after with the famous 1982 paper by Nesterov that gets rid of this extraneous logarithmic term altogether (and in addition is based on the deep insight of modifying Polyak's momentum).
Let $f$ be a $1$-smooth convex function and let $x^*$ be a minimizer of $f$. Denote $x^+ := x - \nabla f(x)$ for the point obtained by one step of gradient descent from $x$. Fix a sequence $(\lambda_t)_{t \geq 1}$ of positive weights, to be optimized later. Starting from a point $x_1$, we consider the "conjugate" point $v_t := x_1 - \sum_{s=1}^{t} \lambda_s \nabla f(x_s)$. The algorithm simply returns the optimal combination of the conjugate point and the gradient descent point, that is:

$$x_{t+1} = \operatorname{argmin}_{x \in P_t} f(x), \qquad \text{where } P_t := \operatorname{span}(x_t^+, v_t).$$
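To make the scheme concrete, here is a minimal sketch in Python (my addition, not from Nemirovski's paper). It assumes value and gradient oracles f and grad_f for a 1-smooth convex function, uses the weight choice $\lambda_t = t/4$ discussed below, and approximates the exact plane-search over $P_t$ with a generic two-variable numerical minimizer (scipy.optimize.minimize), which is only a stand-in for the $O(\log(1/\epsilon))$-oracle-call subroutine.

```python
import numpy as np
from scipy.optimize import minimize

def nemirovski_acceleration(f, grad_f, x1, T):
    """Sketch of the plane-search acceleration described above, for a 1-smooth convex f.

    f, grad_f : value and gradient oracles.
    x1        : starting point (1-d numpy array).
    T         : number of iterations.
    """
    x = np.asarray(x1, dtype=float).copy()
    v = x.copy()            # conjugate point v_t = x_1 - sum_{s<=t} lambda_s * g_s
    for t in range(1, T + 1):
        g = grad_f(x)
        lam = t / 4.0       # satisfies 2 * (lam_t^2 - lam_{t-1}^2) <= lam_t
        v = v - lam * g     # update the conjugate point
        x_plus = x - g      # gradient descent step (step size 1 since f is 1-smooth)
        # Plane-search over P_t = span(x_t^+, v_t): minimize f(a * x_plus + b * v)
        # over the two coefficients (a, b); a generic solver stands in for the
        # exact 2-dimensional search.
        obj = lambda c: f(c[0] * x_plus + c[1] * v)
        res = minimize(obj, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
        x = res.x[0] * x_plus + res.x[1] * v
    return x
```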
Let us denote $\delta_t := f(x_t) - f(x^*)$ and $g_t := \nabla f(x_t)$ for shorthand. The key point is that $\nabla f(x_{t+1})$ is orthogonal to $P_t$ (by first-order optimality of the plane-search), and in particular $g_{t+1} \cdot (v_t - x_{t+1}) = 0$. Now recognize that $\frac{1}{2} \|g_t\|^2$ is a lower bound on the improvement $\delta_t - \delta_{t+1}$ (here we use that $x_{t+1}$ is better than $x_t^+$, which belongs to $P_t$). Thus we get:

$$\|v_t - x^*\|^2 \;\leq\; \|v_{t-1} - x^*\|^2 - 2\lambda_t\, \delta_t + 2\lambda_t^2\, (\delta_t - \delta_{t+1}) \,.$$
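For completeness, here is the algebra behind this display (my reconstruction from the ingredients above: the update $v_t = v_{t-1} - \lambda_t g_t$, the orthogonality from the plane-search at the previous step, convexity, and $1$-smoothness):

\begin{align*}
\|v_t - x^*\|^2 &= \|v_{t-1} - x^*\|^2 - 2\lambda_t\, g_t \cdot (v_{t-1} - x^*) + \lambda_t^2 \|g_t\|^2 \\
&= \|v_{t-1} - x^*\|^2 - 2\lambda_t\, g_t \cdot (x_t - x^*) + \lambda_t^2 \|g_t\|^2 && \text{(orthogonality: } g_t \cdot (v_{t-1} - x_t) = 0\text{)} \\
&\leq \|v_{t-1} - x^*\|^2 - 2\lambda_t\, \delta_t + \lambda_t^2 \|g_t\|^2 && \text{(convexity: } g_t \cdot (x_t - x^*) \geq \delta_t\text{)} \\
&\leq \|v_{t-1} - x^*\|^2 - 2\lambda_t\, \delta_t + 2\lambda_t^2\, (\delta_t - \delta_{t+1}) && \text{(}\tfrac{1}{2}\|g_t\|^2 \leq \delta_t - \delta_{t+1}\text{).}
\end{align*}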
In other words, if the sequence is chosen such that $2(\lambda_t^2 - \lambda_{t-1}^2) \leq \lambda_t$ (with $\lambda_0 := 0$), then summing the above inequality from $t=1$ to $T$ and performing an Abel summation on the last term (using also that the $\delta_t$ are nonnegative), we get:

$$\sum_{t=1}^{T} \lambda_t\, \delta_t \;\leq\; \|x_1 - x^*\|^2 \,.$$
This is good because roughly the reverse inequality also holds true (using the fact that the algorithm only makes progress, $f(x_{t+1}) \leq f(x_t^+) \leq f(x_t)$, so the $\delta_t$ are non-increasing):

$$\sum_{t=1}^{T} \lambda_t\, \delta_t \;\geq\; \Big(\sum_{t=1}^{T} \lambda_t\Big)\, \delta_T \,.$$
So finally we get $\big(\sum_{s=1}^{T} \lambda_s\big)\, \delta_T \leq \sum_{s=1}^{T} \lambda_s\, \delta_s \leq \|x_1 - x^*\|^2$, and it just remains to realize that $\sum_{s=1}^{T} \lambda_s$ is of order $T^2$ (e.g., $\lambda_s = s/4$ satisfies the condition above) so that $\delta_T = O\!\big(\|x_1 - x^*\|^2 / T^2\big)$.
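As a quick numerical sanity check (my addition, not part of the original argument), one can run the sketch given earlier on a simple $1$-smooth convex quadratic and watch the gap decay; this assumes the hypothetical nemirovski_acceleration function defined above.

```python
import numpy as np

# f(x) = 0.5 * (x - b)' A (x - b) with eigenvalues of A in [0, 1]:
# f is 1-smooth, convex, and its minimum value is 0.
rng = np.random.default_rng(0)
d = 200
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(np.linspace(0.0, 1.0, d)) @ Q.T
b = rng.standard_normal(d)

f = lambda x: 0.5 * (x - b) @ A @ (x - b)
grad_f = lambda x: A @ (x - b)
x1 = np.zeros(d)

for T in (10, 20, 40, 80):
    xT = nemirovski_acceleration(f, grad_f, x1, T)
    print(T, f(xT))   # expect at least the ~1/T^2 decay predicted above
```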
By N June 19, 2019 - 10:32 pm
Hi Sebastien,
Do you have a source for Nemirovski’s results? I was not able to find either the late 70’s result, or the 1981 improvement you mention, on his webpages.
By Sebastien Bubeck June 20, 2019 - 12:23 pm
Here is the Russian paper (or part of it at least): https://blogs.princeton.edu/imabandit/wp-content/uploads/sites/122/2019/06/Nemirovski81_Russian.pdf and a discussion of it in English (by Nemirovski): https://blogs.princeton.edu/imabandit/wp-content/uploads/sites/122/2019/06/Nemirovski81_EnglishDiscussion.pdf
By Yichi June 18, 2019 - 12:19 pm
Hi Sebastien, thanks for sharing that!
It seems that Nemirovski's acceleration does not need to know in advance how smooth the function f is. But we do need to know it as a hyperparameter if we are using Nesterov's acceleration. So the question is whether it is possible to get an accelerated gradient descent algorithm without line search when we are not given knowledge of the smoothness parameter?
By Anonymous June 20, 2019 - 12:27 pm
With a line search, there are many variants of Nesterov's method that also do this, such as this paper by Seb: https://arxiv.org/pdf/1506.08187.pdf
Without a line search, I suspect it is not possible?
By boojum January 11, 2019 - 3:09 pm
In the statement, where you write $x_{t+1} =_{x \in P_t} f(x)$ where $P_t$ is the span, is there a missing \min?
By Sebastien Bubeck January 11, 2019 - 3:56 pm
Yes “argmin” was missing, thanks!
By Sohail Bahmani January 9, 2019 - 2:39 pm
Hi Sebastien,
Just a typo: in the very last sentence the $g_s$ in the sum should be $\delta_s$.
By Sebastien Bubeck January 9, 2019 - 5:42 pm
Thanks Sohail, fixed!