**Professor Richard Ehrenborg**

**A theorem by Baxter**

**Wednesday, May 1st at 4:30pm**

**Fine Hall 214**

**Your choice of Large-Sized Boba (we’re going all out this week) or Regular-Sized Boba Fett**

**We discuss a theorem by Baxter from 1966 on planar graphs. By reformulating the theorem, we make an incursion into topology and prove analogous results in higher dimensions. A connection with Sperner's Lemma is also highlighted.**

Joint work with Gábor Hetyei.
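The one-dimensional case of Sperner's Lemma mentioned in the abstract is easy to verify exhaustively: label the vertices of a path with 0s and 1s so that the two endpoints get different labels; then the number of "fully labeled" edges (edges whose endpoints differ) is always odd. A brute-force check of this toy case, offered here only as an illustration and not as part of the talk:

```python
import itertools

# 1-D Sperner: path with endpoints labeled 0 and 1, interior labels arbitrary.
# Claim: the number of edges whose two endpoints carry different labels is odd.
for interior in itertools.product([0, 1], repeat=5):
    labels = [0, *interior, 1]
    crossings = sum(labels[i] != labels[i + 1] for i in range(len(labels) - 1))
    assert crossings % 2 == 1  # holds for all 2**5 interior labelings
```

The parity argument is the same one that drives the higher-dimensional lemma: walking along the path, the label flips an odd number of times exactly because the endpoints disagree.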



**Dr. Nadav Cohen**

**Analyzing Optimization in Deep Learning via Trajectories**

**Friday, May 3rd at 4:30pm**

**Fine Hall 214**

**Your choice of Large-Sized Boba (we weren’t kidding above about going all out this week) or Post-Sarlacc Pit Boba Fett**

**The prominent approach for analyzing optimization in deep learning is based on the geometry of loss landscapes. While this approach has led to successful treatments of shallow (two-layer) networks, it suffers from inherent limitations when facing deep (three or more layer) models. In this talk I will argue that a more refined perspective is in order, one that accounts for the specific trajectories taken by the optimizer. I will then demonstrate a manifestation of the latter approach by analyzing the trajectories of gradient descent over arbitrarily deep linear neural networks. We will derive what is, to the best of my knowledge, the most general guarantee to date for efficient convergence to a global minimum of a gradient-based algorithm training a deep network. Moreover, in stark contrast to conventional wisdom, we will see that sometimes gradient descent can train a deep linear network faster than a classic linear model. In other words, depth can accelerate optimization, even without any gain in expressiveness, and despite introducing non-convexity to a formerly convex problem.**
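The setting in the abstract can be sketched in a few lines: replace a single linear parameter with a product of factors (a deep linear network) and run gradient descent on the factors. The sketch below is my own toy illustration of that setup, not the speaker's construction or analysis; the initialization, step size, and scalar target are arbitrary choices, and it only demonstrates that both parameterizations converge, not the acceleration phenomenon itself.

```python
import numpy as np

# Toy sketch: fit y = 2x by gradient descent, comparing a plain linear
# model w with an overparameterized depth-3 linear "network" w1*w2*w3.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x

def loss(w_eff):
    # mean squared error of the end-to-end linear map w_eff
    return np.mean((w_eff * x - y) ** 2)

lr, steps = 0.01, 500

# Shallow: one parameter, convex objective.
w = 0.1
for _ in range(steps):
    w -= lr * np.mean(2 * (w * x - y) * x)

# Deep: gradient descent on the three factors; the objective is now
# non-convex in (w1, w2, w3), yet it still reaches a global minimum here.
ws = [0.5, 0.5, 0.5]
for _ in range(steps):
    p = ws[0] * ws[1] * ws[2]              # end-to-end linear map
    e = np.mean(2 * (p * x - y) * x)       # dL/dp
    grads = [e * ws[1] * ws[2], e * ws[0] * ws[2], e * ws[0] * ws[1]]
    ws = [wi - lr * g for wi, g in zip(ws, grads)]

shallow_loss = loss(w)
deep_loss = loss(ws[0] * ws[1] * ws[2])
```

Note that the deep parameterization changes the trajectory the optimizer follows without changing the set of functions it can express, which is exactly the distinction the trajectory-based perspective is after.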