# Unconstrained Minimization
Boyd. Chapter 9. [Slides](https://cseweb.ucsd.edu/classes/wi23/cse203B-a/slides/lec9_optalgor.pdf)
## 1 Intro
## 2 Taylor's Expansion
## 3 Descent Methods
Pro: Each step is simple
Con: Accuracy degrades if consider only lower-oder terms
Taylor expansion
$\begin{align*}
f(x)&= f(x_0)+\nabla f(x_0)^T(x-x_0)+ \frac{1}{2}(x-x_0)^T \nabla^2f(x_0)(x-x_0)\\
x_{k+1}&= x_k-t \nabla f(x_k)
\end{align*}$
Limitations:
- Unit doesn't match
- Direction change with affine transformation
- $f(x)$ keeps dropping as long as $\nabla f(x)\neq0$ and $t$ is small enough
$\begin{align*}
f(x_{k+1})=f(x_k)+\nabla f()
\end{align*}$
- Convergence rate:
$\begin{align*}
\epsilon &= f(x_{k+1})-p^*\leq(1- \frac{m}{n})^k(f(x_0-p^*))\\
k&\propto \frac{\log((f(x_0)-p^*)/\epsilon)}{\log(1/(1-\frac{m}{M}))}\approx
\frac{M}{m}\log\left((f(x_0)-p^*)/\epsilon\right)
\end{align*}$
## 4 Newton Methods
Recall that descent method only consider up to the 1st order of Taylor's expansion.
Now, we consider **Newton method**, which uses the approximation of 2nd order Taylor's expansion:
$\begin{align*}
f(x+v) &\approx f(x)+\nabla f(x)^T v+\frac{1}{2} v^T \nabla^2 f(x) v\\
\nabla_vf(x+v) &\approx \nabla f(x)+\nabla^2 f(x)v=0\\
\Longleftrightarrow v&= -\nabla^2 f(x)^{-1} \nabla f(x)\\
\\
f(x+v)&= f(x)+(-1) \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)+ \\
&\quad\frac{1}{2} \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)\\
&= f(x) \underset{}{\underbrace{ -\frac{1}{2} \nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x) }}
\end{align*}$
The procedure:
Input: $x\in\textbf{dom}f, \epsilon>0$
Repeat:
1. $\Delta x_{n t}:=-\nabla^2 f(x)^{-1} \nabla f(x),\; \lambda^2(x)=\nabla f(x)^T \nabla^2 f(x)^{-1} \nabla f(x)$
2. Quit if $\lambda^2/2\leq\epsilon$
3. Line search $t$
4. $x\coloneqq x+t\Delta x_{nt}$
#note Newton's approach finds a root of an equation, as opposed to finding the minimum of a function like in descent method.
### Line search
$\begin{align*}
v=-t \nabla^2f(x)^{-1} \nabla
\end{align*}$
### Convergence analysis
Assumptions:
- $S=\left\{x \in \operatorname{dom} f \mid f(x) \leq f\left(x_o\right)\right\}$.
- $f$ is strongly convex on $S$ with constant $m$, s.t. $\nabla^2 f(x)\geq mI, \;\forall x\in S$
- $\nabla^2f$ is Lipschitz continuous on $S$ with constant $L$, i.e.
$\begin{align*}
\left\|\nabla^2 f(x)-\nabla^2 f(y)\right\|_2 \leq L\|x-y\|_2
\end{align*}$
Outlines: $\exists \eta \in\left(0, m^2 / L\right)$, there are 2 cases:
1. Damped Newton phase ($t<1$)
$\begin{align*}
&\Vert \nabla f(x) \Vert_2\geq \eta\\
\Longleftrightarrow & f\left(x^{k+1}\right)-f\left(x^k\right) \leq-\alpha \beta \eta^2 m / M^2
\end{align*}$
2. Pure Newton phase ($t=1$), quadratically convergent
$\begin{align*}
&\Vert \nabla f(x) \Vert_2< \eta \\
\Longleftrightarrow
&\frac{L}{m^2} \left\|\nabla f\left(x^{k+1}\right)\right\|_2\leq\left(\frac{L}{2 m^2}\left\|\nabla f\left(x^k\right)\right\|_2\right)^2 \\
& \leq\left(\frac{L}{2 m^2}\left\|\nabla f\left(x^l\right)\right\|_2\right)^{2^{k+1-l}}
\leq\left(\frac{1}{2}\right)^{2^{k+1-l}} \quad k+1 \geq l
\end{align*}$
#note (Lipschitz constant, Smallest eigenvalue is "how far we go")
$\begin{align*}
\Vert \nabla f(x+\Delta x_{nt}) \Vert_2&= \Vert \nabla f(x+\Delta x_{nt})- \nabla f(x ) - \nabla^2 f(x)\Delta x_{nt} \Vert_2\\
&= \Vert \int_0^1 \left(\nabla^2 f(x+t\Delta x_{nt}) - \nabla^2f(x)\right)\Delta x_{nt} dt \Vert_2\\
&\leq \Vert \int_0^1 Lt \Delta x_{nt}^T \Delta x_{nt}dt \Vert_2\\
&= \frac{1}{2}L \Vert x_{nt} \Vert_2^2 =\frac{L}{2}\Vert \nabla^2 f(x)^{-1}\nabla f(x) \Vert_2^2\\
&\leq \frac{L}{2 m^2}\Vert \nabla f(x) \Vert_2^2
\end{align*}$
#note In descent method, error drop is linear, in Newton method, error drop is quadratic
#note Pros:
- Unit matches in Newton's method: Gradient is cost / distance, Hessian is cost / distance^2. Hessian / Gradient = distance
- (Gradient direction) Invariant to affine transformation
- $f(x)$ keeps dropping ($\nabla^2 f(x)\succcurlyeq0$)
- Convergence rate $(\frac{1}{2})^{2^{k+1-l}}$
- Complexity dominated by 2nd-order expansion ($\nabla ^2f(x)^{-1}\nabla f(x)$)
![[Pasted image 20230307132235.png|400]]
#note $f(x+v) \approx f(x)+\nabla f(x)^T v+\frac{1}{2} v^T \nabla^2 f(x+tv) v$ is **exact** if we know $t\in[0,1]$
### Affine invariance
![[Pasted image 20230307131753.png|400]]
## 5 Summary