A perspective camera is just an idealized camera model used for our understanding of computer vision. For this, we will be using the frontal projection model, which is a bit different from what you may have learned before: the projection plane sits on the same side of the pinhole as the object, so the projected image is not inverted.

TLDR
Given we have points $P_w = \begin{bmatrix} X_w & Y_w & Z_w \end{bmatrix}^T$ in world frame and the transform between the camera and the world frame is given by
$$T_{cw} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$
The pinhole camera model is defined as
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
Which expanded looks like
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
- $K$ is our camera intrinsic matrix
- $f_x$ is our horizontal focal length accounting for pixel size
- $f_y$ is our vertical focal length accounting for pixel size
- $s$ is our skew coefficient (usually 0)
- $c_x$ is the horizontal offset to center the image plane with the camera frame
- $c_y$ is the vertical offset to center the image plane with the camera frame
- $\begin{bmatrix} R & t \end{bmatrix}$ is our camera extrinsic matrix (chopped transformation matrix)
- $R$ is a valid rotation matrix ($R \in SO(3)$)
- $t$ is our translation vector
- $Z_c$ is the depth of the point in camera frame (must be divided out at the end to get our $u$ and $v$). The act of dividing out the depth is known as perspective projection. A numerical walkthrough of this pipeline is sketched below.
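As a sanity check, here is a minimal numpy sketch of the pipeline above. All of the values (intrinsics, pose, and point) are made up for illustration:

```python
import numpy as np

# Made-up intrinsics: 500 px focal lengths, 640x480 image, no skew.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Made-up extrinsics: identity rotation, 2 m translation along Z.
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])

# A point in the world frame (homogeneous).
P_w = np.array([[0.5], [0.25], [3.0], [1.0]])

# Extrinsics: world frame -> camera frame.
P_c = np.hstack([R, t]) @ P_w      # [X_c, Y_c, Z_c]^T

# Perspective projection: divide out the depth Z_c ...
p = P_c / P_c[2]                   # normalized image coordinates

# ... then apply the intrinsics to land in pixel coordinates.
uv = (K @ p)[:2].ravel()
print(uv)                          # pixel coordinates (u, v)
```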
Accounting for Distortion
Distortion is a non-linear function, and there are a number of different models for it.
Brown-Conrady Distortion Model
AKA Plumb Bob Model
$$x_{\text{dist}} = x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2p_1 x y + p_2\left(r^2 + 2x^2\right)$$
$$y_{\text{dist}} = y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2y^2\right) + 2p_2 x y$$
where $r^2 = x^2 + y^2$, $(x, y)$ are normalized image coordinates, the $k_i$ are radial distortion coefficients, and the $p_i$ are tangential distortion coefficients.
This is the distortion model used by OpenCV. The full model supports additional terms, but most of the time OpenCV uses the standard simplified 5-parameter set $(k_1, k_2, p_1, p_2, k_3)$ shown here for calibrations.
Adding Distortion into the model
We add in the distortion after the extrinsic and perspective transform, before passing through the camera intrinsics:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} x_{\text{dist}} \\ y_{\text{dist}} \\ 1 \end{bmatrix}, \qquad (x_{\text{dist}}, y_{\text{dist}}) = \operatorname{dist}(x, y)$$
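Here is a rough sketch of where the distortion step slots into the pipeline, with made-up intrinsics, pose, and coefficients. This mirrors what `cv2.projectPoints` does when handed distortion coefficients:

```python
import numpy as np

# Made-up intrinsics, pose, and distortion coefficients (k1, k2, p1, p2, k3).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
k1, k2, p1, p2, k3 = -0.2, 0.05, 0.001, 0.001, 0.0

def distort(x, y):
    # Brown-Conrady model applied to normalized image coordinates.
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

P_w = np.array([0.5, 0.25, 3.0])
X, Y, Z = R @ P_w + t                     # extrinsics: world -> camera frame
x, y = X / Z, Y / Z                       # perspective projection
x_d, y_d = distort(x, y)                  # distortion on normalized coords
u, v, _ = K @ np.array([x_d, y_d, 1.0])   # intrinsics come last
print(u, v)
```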
Explanation
We have our pinhole and a point in 3D space, $P$. The vector to the point from the pinhole is
$$\vec{P} = \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$
Important to note that the $Z_c$ axis is normal to the image plane.
The projected point, which will end up on the plane $Z_c = 1$, is given by the vector
$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \frac{1}{Z_c} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$
It is kept homogeneous to work with matrix calculations. This is called a normalized image coordinate.
Essential Matrix
If the same point is collected by the same camera after a transformation $(R, t)$, the two observations $x_1$ and $x_2$ (in normalized image coordinates) of the same point are related by
$$x_2^T E x_1 = 0$$
Where $E$ is called the essential matrix.
And it's related to the pose change of the camera:
$$E = [t]_\times R$$
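A quick numerical check of the constraint, using a made-up pose and the convention $P_2 = R P_1 + t$:

```python
import numpy as np

def skew(v):
    # Skew-symmetric matrix so that skew(v) @ u == np.cross(v, u).
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Made-up relative pose: 10 degree yaw plus a sideways translation.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
t = np.array([0.5, 0.0, 0.0])

E = skew(t) @ R

# Observe one 3D point from both poses in normalized image coordinates.
P1 = np.array([0.3, -0.2, 4.0])
P2 = R @ P1 + t
x1, x2 = P1 / P1[2], P2 / P2[2]

print(x2 @ E @ x1)   # ~0: the epipolar constraint holds
```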
Lens Distortion
It exists, and we need to characterize it and deal with it. It affects how close a real camera is to our idealized model, but once it is characterized, we can run an undistortion procedure to get the camera image to something we can actually use.
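In OpenCV the undistortion procedure is a one-liner once $K$ and the distortion coefficients are known; the filename and coefficient values below are placeholders:

```python
import cv2
import numpy as np

# K and dist would come from a prior calibration; these are placeholders.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.2, 0.05, 0.001, 0.001, 0.0])   # k1, k2, p1, p2, k3

img = cv2.imread("frame.png")
undistorted = cv2.undistort(img, K, dist)
cv2.imwrite("frame_undistorted.png", undistorted)
```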
Intrinsic Parameters
We were assuming right before this that the focal length is 1, so we actually need to deal with that (as well as map to pixel coordinates, which start from the top left of the image). The intrinsic matrix $K$ handles both, taking normalized image coordinates $(x, y)$ to pixels:
$$u = f_x x + s y + c_x, \qquad v = f_y y + c_y$$
These have to be determined through camera calibration
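A rough sketch of how that calibration usually goes with OpenCV and a chessboard target; the board dimensions and image paths are placeholders:

```python
import glob
import cv2
import numpy as np

cols, rows = 9, 6                     # inner corners of the chessboard
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)   # board-frame grid

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solves for K and the distortion coefficients (plus per-view extrinsics).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(K, dist)
```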
Fundamental Matrix
Similar to the Essential Matrix, but defined between two different cameras. Let's say we have two different cameras with intrinsic matrices $K_1$ and $K_2$, observing the same point at pixel coordinates $p_1 = K_1 x_1$ and $p_2 = K_2 x_2$.
The fundamental matrix $F$ exists such that
$$p_2^T F p_1 = 0$$
Reasoning
Substituting $x_i = K_i^{-1} p_i$ into the essential matrix constraint $x_2^T E x_1 = 0$ gives
$$p_2^T K_2^{-T} E K_1^{-1} p_1 = 0 \implies F = K_2^{-T} E K_1^{-1}$$
The constraint associated with the fundamental matrix is also called the epipolar constraint
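Extending the essential matrix check above into pixel coordinates, with two made-up intrinsic matrices:

```python
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

# Made-up pose and point, observed from both cameras.
R = np.eye(3)
t = np.array([0.5, 0.0, 0.0])
P1 = np.array([0.3, -0.2, 4.0])
P2 = R @ P1 + t
x1, x2 = P1 / P1[2], P2 / P2[2]

# Two made-up intrinsic matrices for two different cameras.
K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[600.0, 0.0, 400.0], [0.0, 600.0, 300.0], [0.0, 0.0, 1.0]])

E = skew(t) @ R
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)   # F = K2^-T E K1^-1

p1, p2 = K1 @ x1, K2 @ x2    # the same point in each camera's pixels
print(p2 @ F @ p1)           # ~0: epipolar constraint in pixel space
```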

Homography
If an observed point is on a plane of known geometry, it is possible to work out what the point will look like from another camera at a known pose change. This is called homography.


Like before, we have two cameras with intrinsics $K_1$ and $K_2$, related by a pose change $(R, t)$ so that $P_2 = R P_1 + t$.
Say we know the equation of the plane expressed in both camera frames to be
$$n_i^T P_i = d_i, \quad i \in \{1, 2\}$$
This implies that
$$\frac{n_i^T P_i}{d_i} = 1$$
Substituting this into our equation for the pose change,
$$P_2 = R P_1 + t \cdot \frac{n_1^T P_1}{d_1}$$
This implies that we can write the coordinates of $P$ in one camera frame with respect to the other as
$$P_2 = \left(R + \frac{t\,n_1^T}{d_1}\right) P_1$$
Extending this to our coordinates in the image plane, we get that
$$p_2 \sim K_2 \left(R + \frac{t\,n_1^T}{d_1}\right) K_1^{-1} p_1 = H\,p_1$$
The Homography Matrix $H$ lets us determine how a point in one image plane will appear in another image plane, given that we know the geometry of the plane the point lies on
The Homography matrix is invertible
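A small numpy sketch to confirm the claim: build $H$ from a made-up plane, pose, and pair of intrinsic matrices, then check that it maps the pixel coordinates of a point on the plane from one camera to the other:

```python
import numpy as np

# Made-up plane n^T P = d in camera-1 frame, pose change, and intrinsics.
n = np.array([0.0, 0.0, 1.0])    # plane normal in camera-1 frame
d = 4.0                          # the plane Z_c = 4 in camera 1
R = np.eye(3)
t = np.array([0.3, 0.0, 0.0])
K1 = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[600.0, 0.0, 400.0], [0.0, 600.0, 300.0], [0.0, 0.0, 1.0]])

H = K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)

# Take a point on the plane, project it into both cameras, and compare.
P1 = np.array([0.5, -0.3, 4.0])          # satisfies n @ P1 == d
P2 = R @ P1 + t
p1 = K1 @ (P1 / P1[2])
p2 = K2 @ (P2 / P2[2])

p2_from_H = H @ p1
print(p2_from_H / p2_from_H[2], p2)      # match after normalizing
```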
