# Viewing, Perspective Transformation
---
## 1. The Viewing Pipeline
![[Pasted image 20221221102848.png]]
This note focuses on **perspective transformation**. Recall that previously the last row of the transformation matrix was always $M[3,:]=[0,0,0,1]$. This will soon change once we take perspective transformation into account.
A brief introduction to the 2 types of projection: ![[Pasted image 20221215111430.png|120]]
- ***Orthographic***: Simply remove the $z$-coordinate: $\text{Ortho}([x,y,z]^T)=[x,y]^T$. Parallel lines stay parallel.
- ***Perspective***: Models the projection that happens in human eyes and cameras.
---
## 2. Orthographic Projection: `glOrtho()`
### `glOrtho(l,r, b,t, n,f)`
Takes 6 params (left/right, bottom/top, near/far) that specify the dimensions of a **cuboid**, then ***normalizes*** said **cuboid** such that it is 1) centered at 0 (**shifting**) and 2) spans $[-1,+1]$ along every axis (**scaling**).
$\begin{align*}
M=\begin{bmatrix}\frac{2}{r-l}&0&0&0\\0&\frac{2}{t-b}&0&0\\0&0&\frac{2}{f-n}&0\\0&0&0&1\end{bmatrix}\begin{bmatrix}1&0&0&-\frac{l+r}{2}\\0&1&0&-\frac{t+b}{2}\\0&0&1&-\frac{f+n}{2}\\0&0&0&1\end{bmatrix} = \begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{2}{f-n} & -\frac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
\end{align*}$
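Note that we are not quite done yet. According to the OpenGL convention, the [[0_background#Coordinate Frames|viewing direction]] is along $-z$, i.e. the $z$-axis (counterintuitively) points **away from** the object. With $n,f$ given as positive distances, the near and far planes sit at $z=-n$ and $z=-f$, meaning that the near plane has the larger $z$-coordinate ($-n>-f$). Therefore the $z$-scale of $M$ needs its sign flipped, so that $z=-n\mapsto-1$ and $z=-f\mapsto+1$: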
$M=\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \mathbf{-\frac{2}{f-n}} & -\frac{f+n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}$
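As a quick sanity check, the adjusted matrix can be sketched in plain Python. This is only an illustration, not OpenGL's implementation: the helper names `ortho` and `apply` are hypothetical, the argument order follows the `glOrtho` call (left, right, bottom, top, near, far), and `n`, `f` are assumed to be positive distances so that the near and far planes sit at $z=-n$ and $z=-f$. Under those assumptions the corner $(l,b,-n)$ should land on $(-1,-1,-1)$ and the corner $(r,t,-f)$ on $(1,1,1)$.

```python
def ortho(l, r, b, t, n, f):
    """4x4 orthographic matrix; n and f are positive distances along -z."""
    return [
        [2 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, -2 / (f - n), -(f + n) / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def apply(M, p):
    """Multiply a 4x4 matrix by a homogeneous point [x, y, z, w]."""
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]


# Arbitrary example cuboid (values chosen only for illustration).
M = ortho(l=-2, r=4, b=-1, t=3, n=1, f=11)
near_corner = apply(M, [-2, -1, -1, 1])   # the corner (l, b, -n)
far_corner = apply(M, [4, 3, -11, 1])     # the corner (r, t, -f)
```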
---
## 3. Perspective Projection: `gluPerspective()`
![[2_viewing 2023-01-26 10.21.31.excalidraw.png]]
Perspective projection models how the eye/camera sees by following 2 rules: 1) farther objects appear smaller, i.e. "***inverse distance***"; 2) parallel lines in 3D converge at a single point when projected to 2D (parallel planes converge at a single line).
- An interesting and important observation is that perspective projection maps the ***viewing frustum*** to a viewing **cuboid**, since all lines that go through the eye before projection become parallel to one another after projection.
### Terms and Definitions
- $d$: ***Focal length***. Distance from eye (i.e. ***center of projection***) to projection plane.
- $z$: Distance from eye to object
- $l,r;\;t,b;\;n,f$: Left, right, top, bottom, near, far coordinates of the viewing ~~frustum~~ cuboid (after perspective projection).
- Note that $\boxed{d=n}$ (focal length is z-coordinate of the near plane), but $d=-1$ is not guaranteed before perspective projection.
- Note that $\boxed{[t,b]=[1,-1]}, \boxed{[n,f]=[-1,1]}$
- Note the lack of constraint on $l,r$. This is because we want customizable aspect ratio
- **Magnification factor**: $d/z$. Notice how $\uparrow z$ leads to $\downarrow$ magnification factor, which is exactly the desired property of inverse distance. By similar triangles:
$x'/d = x/z, y'/d=y/z \; \rightarrow x'=x\boxed{\frac{d}{z}}, y'=y\boxed{\frac{d}{z}}$
- ***Aspect ratio***: $\text{width}/\text{height}$. This comes in handy for deriving $P$.
$\text{aspect}=\frac{\text{width}}{\text{height}}=\frac{r-l}{t-b}=\frac{r-l}{2}$
- ***Field of view*** (***fovy***): The angle $2\theta$ such that $\tan\theta=\frac{t}{\vert n\vert}$. Notice that the 'y' in fov***y*** means/emphasizes "vertical" field of view (not left-to-right, not diagonal).
$\theta=\frac{\text{fovy}}{2}, \; d=\cot\theta=\cot\frac{\text{fovy}}{2}$
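
The definitions above can be tried out numerically. The following is a minimal sketch, assuming `fovy` is given in degrees and $z$ is the positive distance from the eye to the object; the helper names `focal_length` and `project` are made up for this note. It computes $d=\cot\frac{\text{fovy}}{2}$ and applies the magnification factor $d/z$ to the same point at two depths.

```python
import math


def focal_length(fovy_deg):
    """d = cot(fovy / 2), assuming the top of the image plane is at t = 1."""
    return 1.0 / math.tan(math.radians(fovy_deg) / 2)


def project(x, y, z, d):
    """Similar triangles: scale x and y by the magnification factor d / z."""
    m = d / z
    return x * m, y * m


d = focal_length(90.0)              # cot(45 degrees) = 1
near = project(1.0, 1.0, 2.0, d)    # object at distance 2: magnification 1/2
far = project(1.0, 1.0, 4.0, d)     # object at distance 4: magnification 1/4
```

Doubling the distance halves the projected size, which is the inverse-distance behaviour the magnification factor is meant to capture.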
### The Projection Matrix $P$
We want a $P\in\mathbb{R}^{4\times4}$ to transform a homogeneous coordinate $p=[x,y,z,1]^T$ (resulting in $p'=Pp$) such that the following 3 goals are achieved:
1. It encodes the **magnification factor** $\frac{d}{z}$ (remember $z$ is the distance from object to eye). Note that this is equivalent to encoding `fovy`, since $d=\cot\frac{\text{fovy}}{2}$.
We start with $P=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&-\frac{1}{d}&0\end{bmatrix}$, the goal being that $p'=Pp=P\begin{bmatrix}x\\y\\z\\1\end{bmatrix}=\begin{bmatrix}x\\y\\z\\-\frac{z}{d}\end{bmatrix}=\begin{bmatrix}x\\y\\z\\w\end{bmatrix}$. **Dehomogenizing** $p'$ (dividing by $w=-\frac{z}{d}$), we get $p'\sim [-x\frac{d}{z},-y\frac{d}{z},-d,1]^T$, showing that goal 1) is accomplished. The minus signs appear because the object sits at negative $z$, so $-\frac{d}{z}$ is the positive magnification $\frac{d}{\vert z\vert}$.
2. It encodes **aspect ratio**. This is equivalent to encoding `aspect` in `gluPerspective()`
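We change $P[0,0]$ to encode the aspect ratio, then scale the whole matrix by $d$ (homogeneous matrices are equivalent up to a nonzero scale) and leave the third row as unknowns $A,B$ to be determined in goal 3, i.e. now we have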
$P=\begin{bmatrix}\frac{1}{\text{aspect}}&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&-\frac{1}{d}&0\end{bmatrix} \sim \begin{bmatrix}\frac{d}{\text{aspect}}&0&0&0\\0&d&0&0\\0&0&A&B\\0&0&-1&0\end{bmatrix}$
3. It needs to perform ***z-mapping***, i.e. mapping $[n,f]$ to [-1,1]. Note that this is equivalent to encoding `zNear, zFar` in `gluPerspective()`.
Now the only thing left to do is to work out $A,B \in \mathbb{R}$ for z-mapping. We start by considering only the lower-right sub-matrix of $P$, i.e. $P[2:,2:]$ (numpy array slicing syntax), since we only care about how $z$ (and the trailing 1 in homogeneous vector $p$ for dehomogenizing) is transformed.
$\begin{align*}P[2:,2:]p[2:]=\begin{bmatrix}A&B\\-1&0\end{bmatrix}\begin{bmatrix}z\\1\end{bmatrix}= \begin{bmatrix}Az+B\\-z\end{bmatrix} \underset{\text{dehomo.}}{\sim}\begin{bmatrix}\frac{Az+B}{-z}\\1\end{bmatrix}=\begin{bmatrix}-A-\frac{B}{z}\\1\end{bmatrix}\end{align*}$
The above equation establishes that the transformed $z$-coordinate is $-A-\frac{B}{z}$. What's left is straightforward: substitute $z$ with $-n$ and $-f$ (by the convention that $z$ points away from the object, the near and far planes sit at $z=-n$ and $z=-f$), require them to map to $-1$ and $+1$ respectively, then solve the resulting system of 2 equations in 2 unknowns for $A,B$:
$\begin{align*}
\begin{Bmatrix}-A\mathbf{+}\frac{B}{n}=-1\\-A\mathbf{+}\frac{B}{f}=1\end{Bmatrix} &\rightarrow \begin{cases}A=-\frac{f+n}{f-n}\\B=-\frac{2fn}{f-n}\end{cases};\;\;
\text{finally, } P=\begin{bmatrix}\frac{d}{\text{aspect}}&0&0&0\\0&d&0&0\\0&0&-\frac{f+n}{f-n}&-\frac{2fn}{f-n}\\0&0&-1&0\end{bmatrix}
\end{align*}$
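The solution for $A,B$ can be checked numerically. The sketch below is illustrative only (the names `z_map_coeffs` and `mapped_depth` are hypothetical, and $n,f$ are assumed to be positive distances): it evaluates the dehomogenized depth $-A-\frac{B}{z}$ at the near plane, the far plane, and the point halfway between them.

```python
def z_map_coeffs(n, f):
    """Closed-form solution of -A + B/n = -1 and -A + B/f = 1."""
    A = -(f + n) / (f - n)
    B = -2 * f * n / (f - n)
    return A, B


def mapped_depth(z, n, f):
    """Depth after dehomogenizing [A*z + B, -z], i.e. z' = -A - B/z."""
    A, B = z_map_coeffs(n, f)
    return (A * z + B) / (-z)


n, f = 1.0, 10.0
z_near = mapped_depth(-n, n, f)             # near plane at z = -n
z_far = mapped_depth(-f, n, f)              # far plane at z = -f
z_mid = mapped_depth(-(n + f) / 2, n, f)    # halfway between the two planes
```

With these values the near and far planes map to $-1$ and $+1$ as required, while the midpoint maps to $9/11\approx0.82$ rather than $0$: because of the $\frac{B}{z}$ term, z-mapping is not linear in $z$.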
### `gluPerspective(fovy, aspect, zNear, zFar)`
With the previous discussion on $P$, this function is easy to understand, since all it does is specify $P$.
- `fovy`, `aspect`: the vertical field of view and the aspect ratio (width/height)
- `zNear > 0`: $n$, i.e. the distance from the eye to the "near plane", which becomes the near face of the final projected viewing ~~frustum~~ cuboid
- `zFar > 0`: $f$, i.e. the distance from the eye to the "far plane", which becomes the far face of the final projected viewing ~~frustum~~ cuboid
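
To tie the four parameters back to $P$, here is a rough Python sketch that assembles the matrix derived in this note and applies it to a point. It is not the GLU implementation: `perspective` and `transform` are hypothetical names, `fovy` is assumed to be in degrees, and the result is dehomogenized explicitly by dividing by $w$.

```python
import math


def perspective(fovy_deg, aspect, z_near, z_far):
    """Build P from the same four parameters that gluPerspective takes."""
    d = 1.0 / math.tan(math.radians(fovy_deg) / 2)   # d = cot(fovy / 2)
    n, f = z_near, z_far
    return [
        [d / aspect, 0.0, 0.0, 0.0],
        [0.0, d, 0.0, 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def transform(P, p):
    """Apply P to a homogeneous point, then dehomogenize (divide by w)."""
    x, y, z, w = (sum(P[i][j] * p[j] for j in range(4)) for i in range(4))
    return [x / w, y / w, z / w]


P = perspective(fovy_deg=90.0, aspect=2.0, z_near=1.0, z_far=10.0)
# With fovy = 90 degrees and aspect = 2, the top-right corner of the near
# plane is at (2, 1, -1); it should map to the cuboid corner (1, 1, -1).
top_right_near = transform(P, [2.0, 1.0, -1.0, 1.0])
```

Note that the last row of `P` is no longer $[0,0,0,1]$, which is why the explicit division by $w$ is needed.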
### On $P