In the midpoint model, we express everything in a stereo camera with respect to the coordinate frame at the midpoint between the two camera centres.
The models for the left and right cameras are as follows, assuming that the two cameras have the same intrinsic parameters.
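As a rough sketch of what these projections look like (the notation here is my own assumption: pinhole cameras with focal lengths $f_u, f_v$, principal point $(c_u, c_v)$, baseline $b$, and the left and right camera centres at $-b/2$ and $+b/2$ along the x-axis of the midpoint frame), a point $p = (x, y, z)$ in the midpoint frame projects to

$$
\begin{aligned}
u_l &= f_u \frac{x + b/2}{z} + c_u, & v_l &= f_v \frac{y}{z} + c_v,\\
u_r &= f_u \frac{x - b/2}{z} + c_u, & v_r &= f_v \frac{y}{z} + c_v.
\end{aligned}
$$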
Stacking the two, we get:
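One way to write the stacked model, under the same assumptions as above:

$$
\begin{bmatrix} u_l \\ v_l \\ u_r \\ v_r \end{bmatrix}
= \frac{1}{z}
\begin{bmatrix}
f_u & 0 & c_u & f_u b/2\\
0 & f_v & c_v & 0\\
f_u & 0 & c_u & -f_u b/2\\
0 & f_v & c_v & 0
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$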
You can also model the stereo camera with respect to the left or right camera frame.
Left Model
The camera model becomes
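A sketch under the same assumptions, now with $p = (x, y, z)$ expressed in the left camera frame and the right camera offset by $b$ along the x-axis:

$$
\begin{aligned}
u_l &= f_u \frac{x}{z} + c_u, & v_l &= f_v \frac{y}{z} + c_v,\\
u_r &= f_u \frac{x - b}{z} + c_u, & v_r &= f_v \frac{y}{z} + c_v.
\end{aligned}
$$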
What makes stereo cameras useful is that we know the distance between the two cameras (the baseline b) and the intrinsics of both. Because the two cameras lie in the same plane (offset only by b), we can formulate a relationship between a point's z value and its disparity.
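For instance, subtracting the horizontal pixel coordinates in the model above gives a sketch of that relationship:

$$
d = u_l - u_r = \frac{f_u b}{z}
\quad\Longleftrightarrow\quad
z = \frac{f_u b}{d}
$$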
Keep in mind that we do not know z; we usually have to estimate it by finding correspondences between the left and right images.
This sensor model simply tells us how disparity relates to the position of the point.
(left: the disparity if we know the geometry we are looking at)
(right: the disparity we estimate from some form of correspondence algorithm)
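To make the two directions concrete, here is a minimal Python sketch of the left-frame sensor model; the parameter values and function names are illustrative assumptions, not from the notes.

```python
import numpy as np

# Hypothetical stereo parameters (left-camera model); values are illustrative.
fu, fv = 700.0, 700.0   # focal lengths [px]
cu, cv = 320.0, 240.0   # principal point [px]
b = 0.12                # baseline [m]

def project_stereo(p):
    """Forward model: 3D point in the left camera frame -> (ul, vl, ur, vr)."""
    x, y, z = p
    ul = fu * x / z + cu
    vl = fv * y / z + cv
    ur = fu * (x - b) / z + cu   # right camera offset by b along the x-axis
    vr = fv * y / z + cv
    return np.array([ul, vl, ur, vr])

def depth_from_disparity(d):
    """Inverse relation: d = ul - ur = fu * b / z, so z = fu * b / d."""
    return fu * b / d

# Example: disparity predicted from known geometry recovers the point's depth.
p = np.array([0.3, -0.1, 2.0])
ul, vl, ur, vr = project_stereo(p)
d = ul - ur
print(d, depth_from_disparity(d))   # recovers z = 2.0
```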
