5_uncounstrined_min - Obsidian Publish

# Unconstrained Minimization Boyd. Chapter 9. [Slides](https://cseweb.ucsd.edu/classes/wi23/cse203B-a/slides/lec9_optalgor.pdf) ## 1 Intro ## 2 Taylor's Expansion ## 3 Descent Methods Pro: Each step is simple Con: Accuracy degrades if consider only lower-oder terms Taylor expansion $\begin{align*} f(x)&= f(x_0)+\nabla f(x_0)^T(x-x_0)+ \frac{1}{2}(x-x_0)^T \nabla^2f(x_0)(x-x_0)\\ x_{k+1}&= x_k-t \nabla f(x_k) \end{align*}$ Limitations: - Unit doesn't match - Direction change with affine transformation - $f(x)$ keeps dropping as long as $\nabla f(x)\neq0$ and $t$ is small enough $\begin{align*} f(x_{k+1})=f(x_k)+\nabla f() \end{align*}$ - Convergence rate: $\begin{align*} \epsilon &= f(x_{k+1})-p^*\leq(1- \frac{m}{n})^k(f(x_0-p^*))\\ k&\propto \frac{\log((f(x_0)-p^*)/\epsilon)}{\log(1/(1-\frac{m}{M}))}\approx \frac{M}{m}\log\left((f(x_0)-p^*)/\epsilon\right) \end{align*}$ ## 4 Newton Methods Recall that descent method only consider up to the 1st order of Taylor's expansion. Now, we consider **Newton method**, which uses the approximation of 2nd order Taylor's expansion: $\begin{align*} f(x+v) &\approx f(x)+\nabla f(x)^T v+\frac{1}{2} v^T \nabla^2 f(x) v\\ \nabla_vf(x+v) &\approx \nabla f(x)+\nabla^2 f(x)v=0\\ \Longleftrightarrow v&= -\nabla^2 f(x)^{-1} \nabla f(x)\\ \\ f(x+v)&= f(x)+(-1) \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)+ \\ &\quad\frac{1}{2} \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\\ &= f(x) \underset{}{\underbrace{ -\frac{1}{2} \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x) }} \end{align*}$ The procedure: Input: $x\in\textbf{dom}f, \epsilon>0$ Repeat: 1. $\Delta x_{n t}:=-\nabla^2 f(x)^{-1} \nabla f(x),\; \lambda^2(x)=\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)$ 2. Quit if $\lambda^2/2\leq\epsilon$ 3. Line search $t$ 4. $x\coloneqq x+t\Delta x_{nt}$ #note Newton's approach finds a root of an equation, as opposed to finding the minimum of a function like in descent method. ### Line search $\begin{align*} v=-t \nabla^2f(x)^{-1} \nabla \end{align*}$ ### Convergence analysis Assumptions: - $S=\left\{x \in \operatorname{dom} f \mid f(x) \leq f\left(x_o\right)\right\}$. - $f$ is strongly convex on $S$ with constant $m$, s.t. $\nabla^2 f(x)\geq mI, \;\forall x\in S$ - $\nabla^2f$ is Lipschitz continuous on $S$ with constant $L$, i.e. $\begin{align*} \left\|\nabla^2 f(x)-\nabla^2 f(y)\right\|_2 \leq L\|x-y\|_2 \end{align*}$ Outlines: $\exists \eta \in\left(0, m^2 / L\right)$, there are 2 cases: 1. Damped Newton phase ($t<1$) $\begin{align*} &\Vert \nabla f(x) \Vert_2\geq \eta\\ \Longleftrightarrow & f\left(x^{k+1}\right)-f\left(x^k\right) \leq-\alpha \beta \eta^2 m / M^2 \end{align*}$ 2. Pure Newton phase ($t=1$), quadratically convergent $\begin{align*} &\Vert \nabla f(x) \Vert_2< \eta \\ \Longleftrightarrow &\frac{L}{m^2} \left\|\nabla f\left(x^{k+1}\right)\right\|_2\leq\left(\frac{L}{2 m^2}\left\|\nabla f\left(x^k\right)\right\|_2\right)^2 \\ & \leq\left(\frac{L}{2 m^2}\left\|\nabla f\left(x^l\right)\right\|_2\right)^{2^{k+1-l}} \leq\left(\frac{1}{2}\right)^{2^{k+1-l}} \quad k+1 \geq l \end{align*}$ #note (Lipschitz constant, Smallest eigenvalue is "how far we go") $\begin{align*} \Vert \nabla f(x+\Delta x_{nt}) \Vert_2&= \Vert \nabla f(x+\Delta x_{nt})- \nabla f(x ) - \nabla^2 f(x)\Delta x_{nt} \Vert_2\\ &= \Vert \int_0^1 \left(\nabla^2 f(x+t\Delta x_{nt}) - \nabla^2f(x)\right)\Delta x_{nt} dt \Vert_2\\ &\leq \Vert \int_0^1 Lt \Delta x_{nt}^T \Delta x_{nt}dt \Vert_2\\ &= \frac{1}{2}L \Vert x_{nt} \Vert_2^2 =\frac{L}{2}\Vert \nabla^2 f(x)^{-1}\nabla f(x) \Vert_2^2\\ &\leq \frac{L}{2 m^2}\Vert \nabla f(x) \Vert_2^2 \end{align*}$ #note In descent method, error drop is linear, in Newton method, error drop is quadratic #note Pros: - Unit matches in Newton's method: Gradient is cost / distance, Hessian is cost / distance^2. Hessian / Gradient = distance - (Gradient direction) Invariant to affine transformation - $f(x)$ keeps dropping ($\nabla^2 f(x)\succcurlyeq0$) - Convergence rate $(\frac{1}{2})^{2^{k+1-l}}$ - Complexity dominated by 2nd-order expansion ($\nabla ^2f(x)^{-1}\nabla f(x)$) ![[Pasted image 20230307132235.png|400]] #note $f(x+v) \approx f(x)+\nabla f(x)^T v+\frac{1}{2} v^T \nabla^2 f(x+tv) v$ is **exact** if we know $t\in[0,1]$ ### Affine invariance ![[Pasted image 20230307131753.png|400]] ## 5 Summary