# Viewing, Perspective Transformation

---

## 1. The Viewing Pipeline

![[Pasted image 20221221102848.png]]

This note will focus on **perspective transformation**. Recall that previously $M[3,:]=[0,0,0,1]$; this will soon change when we take perspective transformation into account.

A brief introduction to the 2 types of projections:

![[Pasted image 20221215111430.png|120]]

- ***Orthographic***: Simply drop the z-coordinate: $\text{Ortho}([x,y,z]^T)=[x,y]^T$. Parallel lines stay parallel.
- ***Perspective***: Models the projection that happens in human eyes and cameras.

---

## 2. Orthographic Projection: `glOrtho()`

### `glOrtho(l, r, b, t, n, f)`

Takes 6 params (left/right, bottom/top, near/far) that specify the dimensions of a **cuboid**, then ***normalizes*** said **cuboid** such that it is 1) centered at 0 (**shifting**) and 2) spans [-1,+1] in all axes (**scaling**).

$\begin{align*} M=\begin{bmatrix}\frac{2}{r-l}&0&0&0\\0&\frac{2}{t-b}&0&0\\0&0&\frac{2}{f-n}&0\\0&0&0&1\end{bmatrix}\begin{bmatrix}1&0&0&-\frac{l+r}{2}\\0&1&0&-\frac{t+b}{2}\\0&0&1&-\frac{f+n}{2}\\0&0&0&1\end{bmatrix} = \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{align*}$

Note that we are not quite done yet. According to the OpenGL convention, the [[0_background#Coordinate Frames|viewing direction]] is (counterintuitively) **away from** the object (the camera looks down the $-z$ axis, because the aperture points away from the object), so the near plane has a larger z-coordinate than the far plane. Therefore the z-row of $M$ needs to be negated accordingly:

$M=\begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \mathbf{-\frac{2}{f-n}} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix}$

---

## 3. Perspective Projection: `gluPerspective()`

![[2_viewing 2023-01-26 10.21.31.excalidraw.png]]

Perspective projection models how the eye/camera sees by following 2 rules: 1) further objects appear smaller, i.e. ***inverse distance***; 2) parallel lines in 3D converge at a single point when projected to 2D (parallel planes converge at a single line).

- An interesting and important observation is that perspective projection maps the ***viewing frustum*** to a viewing **cuboid**, since all lines that go through the eye before projection become parallel to one another after projection.

### Terms and Definitions

- $d$: ***Focal length***. Distance from the eye (i.e. the ***center of projection***) to the projection plane.
- $z$: Distance from the eye to the object.
- $l,r;\;t,b;\;n,f$: Left, right, top, bottom, near, far coordinates of the viewing ~~frustum~~ cuboid (after perspective projection).
    - Note that $\boxed{d=n}$ (the projection plane coincides with the near plane), but the near plane sitting at $z=-1$ is only guaranteed *after* the projection, not before.
    - Note that $\boxed{[t,b]=[1,-1]}$, $\boxed{[n,f]=[-1,1]}$.
    - Note the lack of constraint on $l,r$. This is because we want a customizable aspect ratio.
- **Magnification factor**: $d/z$. Notice how $\uparrow z$ leads to $\downarrow$ magnification factor, which constitutes the desired property of inverse distance.

$x'/d = x/z,\; y'/d=y/z \;\rightarrow\; x'=x\boxed{\frac{d}{z}},\; y'=y\boxed{\frac{d}{z}}$

- ***Aspect ratio***: $\text{width}/\text{height}$. This comes in handy for deriving $P$.

$\text{aspect}=\frac{\text{width}}{\text{height}}=\frac{r-l}{t-b}=\frac{r-l}{2}$

- ***Field of view*** (***fovy***): The angle $2\theta$ such that $\tan\theta=\frac{t}{\vert n\vert}$. Notice that the 'y' in fov***y*** emphasizes the **vertical** field of view (not left-to-right, not diagonal).

$\theta=\frac{\text{fovy}}{2}, \; d=\cot\theta=\cot\frac{\text{fovy}}{2} \;\;(\text{using } t=1)$

### The Projection Matrix $P$

We want a $P\in\mathbb{R}^{4\times4}$ to transform a homogeneous coordinate $p=[x,y,z,1]^T$ (resulting in $p'=Pp$) such that the following 3 goals are achieved:

1. It encodes the **magnification factor** $\frac{d}{z}$ (remember $z$ is the distance from the object to the eye).
Note that this is equivalent to encoding `fovy`, since $d=\cot\frac{\text{fovy}}{2}$. We start with $P=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&-\frac{1}{d}&0\end{bmatrix}$, the goal being that $p'=Pp=P\begin{bmatrix}x\\y\\z\\1\end{bmatrix}=\begin{bmatrix}x\\y\\z\\-\frac{z}{d}\end{bmatrix}=\begin{bmatrix}x\\y\\z\\w\end{bmatrix}$. **Dehomogenizing** $p'$, we get $p'\sim [-x\frac{d}{z},-y\frac{d}{z},-d,1]^T$, showing that goal 1) is accomplished.
2. It encodes the **aspect ratio**. This is equivalent to encoding `aspect` in `gluPerspective()`. We change $P[0,0]$ to encode the aspect ratio, i.e. now we have
$P=\begin{bmatrix}\frac{1}{\text{aspect}}&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&-\frac{1}{d}&0\end{bmatrix} \sim \begin{bmatrix}\frac{d}{\text{aspect}}&0&0&0\\0&d&0&0\\0&0&A&B\\0&0&-1&0\end{bmatrix}$
(scaling the whole matrix by $d$ changes nothing after dehomogenization; the third row is left as unknowns $A,B$, to be worked out in goal 3).
3. It needs to perform ***z-mapping***, i.e. mapping $[n,f]$ to $[-1,1]$. Note that this is equivalent to encoding `zNear, zFar` in `gluPerspective()`.

Now the only thing left to do is to work out $A,B \in \mathbb{R}$ for z-mapping. We start by considering only the lower-right sub-matrix of $P$, i.e. $P[2:,2:]$ (numpy array slicing syntax), since we only care about how $z$ (and the trailing 1 in the homogeneous vector $p$, for dehomogenizing) is transformed.

$\begin{align*}P[2:,2:]p[2:]=\begin{bmatrix}A&B\\-1&0\end{bmatrix}\begin{bmatrix}z\\1\end{bmatrix}= \begin{bmatrix}Az+B\\-z\end{bmatrix} \underset{\text{dehomo.}}{\sim}\begin{bmatrix}\frac{Az+B}{-z}\\1\end{bmatrix}=\begin{bmatrix}-A-\frac{B}{z}\\1\end{bmatrix}\end{align*}$

The above equation establishes that the transformed $z$-coordinate is $-A-\frac{B}{z}$.
What's left to do is straightforward: substitute $z$ with $-n,-f$ (since, by the convention of $z$ pointing away from the object, the near/far planes sit at $z=-n,-f$), then solve the system of 2 equations and 2 unknowns for $A,B$:

$\begin{align*} \begin{Bmatrix}-A\mathbf{+}\frac{B}{n}=-1\\-A\mathbf{+}\frac{B}{f}=1\end{Bmatrix} &\rightarrow \begin{cases}A=-\frac{f+n}{f-n}\\B=-\frac{2fn}{f-n}\end{cases};\;\; \text{finally, } P=\begin{bmatrix}\frac{d}{\text{aspect}}&0&0&0\\0&d&0&0\\0&0&-\frac{f+n}{f-n}&-\frac{2fn}{f-n}\\0&0&-1&0\end{bmatrix} \end{align*}$

### `gluPerspective(fovy, aspect, zNear, zFar)`

With the previous discussion of $P$, this function is easy to understand, since all it does is specify $P$.

- `fovy`, `aspect`: the field of view and aspect ratio.
- `zNear > 0`: $n$, i.e. the distance from the eye to the near plane of the viewing frustum.
- `zFar > 0`: $f$, i.e. the distance from the eye to the far plane of the viewing frustum.

### On $P$'s Nonlinearity

$P$ is well-suited to handle a large range of depths (e.g. 10cm-100m); however, a disadvantage is that the depth resolution is not uniform: notice how $z\in[n,f]$ gets mapped to $[-1,1]$ by $P$, and when $z$ is close to $n$, an infinitesimal change in $z$ leads to a big change in $-A-\frac{B}{z}$, compared to when $z \gg n$.

- Depth resolution is higher close to the near plane and lower farther away.
- Don't set `near = 0`. This leads to a degenerate case where depth resolution is lost completely (with $n=0$ we get $B=0$, so the mapped depth $-A-\frac{B}{z}$ collapses to the constant $-A$).
- It is good practice to set `near` to where the object to be viewed is.
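The matrices derived in this note are easy to sanity-check numerically. Below is a minimal numpy sketch; `ortho`, `perspective`, and `project` are hypothetical helper names (not OpenGL calls) that mirror `glOrtho()` and the `gluPerspective()` matrix $P$ as derived above:

```python
import numpy as np

def ortho(l, r, b, t, n, f):
    # glOrtho-style matrix: maps the cuboid [l,r]x[b,t]x[-f,-n] to [-1,1]^3
    # (note the negated z-row, per the OpenGL viewing convention).
    return np.array([
        [2/(r-l), 0,       0,        -(r+l)/(r-l)],
        [0,       2/(t-b), 0,        -(t+b)/(t-b)],
        [0,       0,       -2/(f-n), -(f+n)/(f-n)],
        [0,       0,       0,         1          ],
    ])

def perspective(fovy, aspect, n, f):
    # gluPerspective-style matrix P, exactly as derived above.
    d = 1 / np.tan(fovy / 2)          # d = cot(fovy/2)
    A = -(f + n) / (f - n)
    B = -2 * f * n / (f - n)
    return np.array([
        [d/aspect, 0,  0,  0],
        [0,        d,  0,  0],
        [0,        0,  A,  B],
        [0,        0, -1,  0],
    ])

def project(M, p):
    # Apply M to a 3D point and dehomogenize.
    q = M @ np.append(p, 1.0)
    return q[:3] / q[3]

# Orthographic: a cuboid corner lands on the matching NDC corner, ~[-1,-1,-1].
print(project(ortho(-2, 2, -1, 1, 1, 10), [-2, -1, -1]))

# Perspective: with fovy = 90 deg, the near plane has t = n * tan(45 deg).
fovy, aspect, n, f = np.deg2rad(90), 2.0, 1.0, 10.0
P = perspective(fovy, aspect, n, f)
t = n * np.tan(fovy / 2)   # top of the near plane
r = t * aspect             # right of the near plane
# A near-plane frustum corner maps to ~(1, 1, -1), and the matching
# far-plane corner (scaled by f/n) to ~(1, 1, 1): frustum -> cuboid.
print(project(P, [r, t, -n]))
print(project(P, [r*f/n, t*f/n, -f]))
```

Since the frustum corners land exactly on the corners of the $[-1,1]$ cuboid, this also illustrates the frustum-to-cuboid observation from section 3.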
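The nonlinearity of the depth mapping is easy to see with a few numbers. A small standalone sketch of just the z-mapping $-A-\frac{B}{z}$, assuming $n=1$, $f=100$ (`z_ndc` is a hypothetical helper name):

```python
import numpy as np

# z-mapping of the derived P: a point at distance z in front of the eye
# sits at z_cam = -z, and its mapped depth is -A - B/z_cam.
n, f = 1.0, 100.0
A = -(f + n) / (f - n)
B = -2 * f * n / (f - n)

def z_ndc(z):
    # Mapped depth for a point at distance z in front of the eye.
    return -A - B / (-z)

# Equal 1-unit steps in depth consume wildly different slices of [-1, 1]:
for z in [1.0, 2.0, 10.0, 11.0, 90.0, 91.0]:
    print(f"z = {z:5.1f}  ->  z' = {z_ndc(z):+.4f}")
```

With these numbers, the step from $z=1$ to $z=2$ consumes roughly half of the whole $[-1,1]$ range, while the step from $z=90$ to $z=91$ consumes only about $0.00025$ of it, which is exactly why `near` should sit close to the object being viewed.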