The Camera Matrix

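[[Matrices|Matrix]] encoding camera properties, usually split into two parts:

- Extrinsic: the camera's [[Robot Kinematics - Frames and Twists|pose]] in a chosen world frame. Used to transform points from the world frame into the camera frame.
- Intrinsic: transforms (projects) points from the camera frame onto the image plane.

Together, these give a matrix that maps homogeneous world coordinates $[x_w \, y_w \, z_w \, 1]^\top$ to homogeneous image coordinates $[u \, v \, 1]^\top$, usually written as

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{P} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}, \quad \lambda \neq 0, \quad \mathbf{P} \in \mathbb{R}^{3\times 4}$$

Here $\lambda$ is the projective scale factor; dividing the right-hand side by its third component recovers the pixel coordinates $(u, v)$.

---

The extrinsic part is encoded in a homogeneous transformation

$$^{c}\mathbf{T}_{w} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \in \mathbb{R}^{4\times 4}$$

where $\mathbf{R} \in \mathbb{R}^{3\times 3}$ is a rotation and $\mathbf{t} \in \mathbb{R}^{3}$ a translation, taking points from the world frame to the camera frame.

The intrinsic part consists of the focal lengths $f_x, f_y$ (distance between the pinhole and the image plane, expressed in pixels), the principal point offset $(c_x, c_y)$ (pixel coordinates of the point where the optical axis meets the image plane, measured from the image origin), and the axis skew $s$ (shear between the pixel axes, which only arises in digital sensors and is usually close to zero). These are collected in the intrinsic matrix

$$\mathbf{K} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.$$

The full camera matrix is the product of both parts, using only the top three rows of $^{c}\mathbf{T}_{w}$:

$$\mathbf{P} = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix}$$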
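
As a minimal sketch of how the two parts combine, the following NumPy snippet builds $\mathbf{P} = \mathbf{K} [\mathbf{R} \mid \mathbf{t}]$ and projects one world point to pixel coordinates. The helper names `camera_matrix` and `project`, as well as all numeric values (an 800 px focal length, a 640x480 image with the principal point at its center, zero skew, identity extrinsics), are assumptions chosen for illustration and do not come from the note itself.

```python
import numpy as np


def camera_matrix(K, R, t):
    """Compose the 3x4 camera matrix P = K [R | t] (hypothetical helper)."""
    Rt = np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])  # world -> camera, 3x4
    return K @ Rt


def project(P, X_w):
    """Project a 3D world point to pixel coordinates (u, v) (hypothetical helper)."""
    X_h = np.append(np.asarray(X_w, dtype=float), 1.0)  # homogeneous [x_w, y_w, z_w, 1]
    x_h = P @ X_h                                        # equals lambda * [u, v, 1]
    lam = x_h[2]
    if np.isclose(lam, 0.0):
        raise ValueError("lambda is zero: the point has no finite image")
    return x_h[:2] / lam                                 # divide out the scale lambda


# Illustrative intrinsics (assumed values): f_x = f_y = 800 px, s = 0,
# principal point (c_x, c_y) = (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative extrinsics (assumed): camera frame coincides with the world frame.
R = np.eye(3)
t = np.zeros(3)

P = camera_matrix(K, R, t)
uv = project(P, [0.1, -0.05, 2.0])
print(P.shape)  # (3, 4)
print(uv)       # approximately [360. 220.]
```

With identity extrinsics, the result reduces to $u = f_x x / z + c_x$ and $v = f_y y / z + c_y$, which makes the example easy to check by hand: $800 \cdot 0.1 / 2 + 320 = 360$ and $800 \cdot (-0.05) / 2 + 240 = 220$.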