From Maximum A Posteriori estimation we have the loss function

$$L(\mathbf{x}) = \frac{1}{2} \sum_i \mathbf{e}_i(\mathbf{x})^\top \boldsymbol{\Omega}_i \, \mathbf{e}_i(\mathbf{x})$$

where $\mathbf{e}_i(\mathbf{x})$ is the $i$-th residual and $\boldsymbol{\Omega}_i$ is its information matrix.
The gradient of this loss, which is used in the Gauss-Newton method, is given by

$$\nabla_{\mathbf{x}} L = \sum_i \mathbf{J}_i^\top \boldsymbol{\Omega}_i \, \mathbf{e}_i(\mathbf{x}), \qquad \mathbf{J}_i = \frac{\partial \mathbf{e}_i}{\partial \mathbf{x}}$$
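As a concrete illustration, here is a minimal NumPy sketch of the quadratic loss and its Gauss-Newton gradient for a single residual term; the names `e`, `J`, and `Omega` just follow the notation above and are not from any particular library:

```python
import numpy as np

def squared_loss_and_gradient(e, J, Omega):
    """Quadratic loss 0.5 * e^T Omega e and its Gauss-Newton gradient J^T Omega e."""
    loss = 0.5 * e @ Omega @ e
    grad = J.T @ Omega @ e
    return loss, grad

# Toy example: a 2-D residual that depends on a 3-D state x.
e = np.array([4.0, -1.0])               # residual e_i(x)
J = np.array([[1.0, 0.0, 2.0],          # Jacobian de_i/dx
              [0.0, 1.0, -1.0]])
Omega = np.eye(2)                       # information matrix
loss, grad = squared_loss_and_gradient(e, J, Omega)
print(loss)  # 8.5
print(grad)  # [ 4. -1.  9.]
```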
Because the cost function is quadratic, our cost explodes when outliers cause large errors. To handle this, there is a family of wrappers, called robust loss functions, that we can use to limit the effect of outliers:

$$L(\mathbf{x}) = \sum_i w_i \, \rho\big(\|\mathbf{e}_i(\mathbf{x})\|_{\boldsymbol{\Omega}_i}\big), \qquad \|\mathbf{e}\|_{\boldsymbol{\Omega}} = \sqrt{\mathbf{e}^\top \boldsymbol{\Omega} \, \mathbf{e}}$$

where $w_i$ is a scalar weight you can define and $\rho(\cdot)$ is some non-linear cost function (the wrapper). Choosing $\rho(e) = \frac{1}{2}e^2$ with $w_i = 1$ recovers the quadratic loss above.
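As a minimal sketch of how this evaluates (the name `robust_total_loss` and the convention of applying $\rho$ to the weighted residual norm are assumptions for illustration, matching the form above):

```python
import numpy as np

def robust_total_loss(residuals, Omegas, weights, rho):
    """Wrapped loss: sum_i w_i * rho(||e_i||_Omega_i)."""
    return sum(w * rho(np.sqrt(e @ Om @ e))
               for e, Om, w in zip(residuals, Omegas, weights))

# With rho(e) = 0.5 * e^2 and unit weights this reduces to the
# plain quadratic loss from the start of the section.
residuals = [np.array([4.0, -1.0]), np.array([0.5, 0.5])]
Omegas = [np.eye(2), np.eye(2)]
weights = [1.0, 1.0]
print(robust_total_loss(residuals, Omegas, weights,
                        rho=lambda e: 0.5 * e**2))  # 8.75
```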

Some possible cost functions are the Huber loss

$$\rho_H(e) = \begin{cases} \frac{1}{2}e^2 & \text{if } |e| \le k \\[4pt] k\left(|e| - \frac{k}{2}\right) & \text{if } |e| > k \end{cases}$$

and the Cauchy loss

$$\rho_C(e) = \frac{k^2}{2}\log\left(1 + \frac{e^2}{k^2}\right)$$

where $k$ is a specifiable parameter.
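Here is a NumPy sketch of these two kernels, following the formulas above; `huber` and `cauchy` are just illustrative names, and the default value of `k` is arbitrary:

```python
import numpy as np

def huber(e, k=1.0):
    """Huber kernel: quadratic near zero, linear in the tails."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= k,
                    0.5 * e**2,
                    k * (np.abs(e) - 0.5 * k))

def cauchy(e, k=1.0):
    """Cauchy kernel: grows only logarithmically in the tails."""
    e = np.asarray(e, dtype=float)
    return 0.5 * k**2 * np.log1p((e / k)**2)
```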

These cost functions don't explode as much as the squared loss, and are thus more robust to outliers. The downside is that we end up with slower convergence (but keep in mind that this trade-off only hurts when our data has very few outliers to begin with).
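To see that concretely, we can plug a single large outlier residual into the `huber` and `cauchy` sketches above and compare against the squared loss:

```python
e = 10.0  # one large outlier residual
print(0.5 * e**2)        # squared loss: 50.0
print(huber(e, k=1.0))   # Huber:         9.5  (linear tail)
print(cauchy(e, k=1.0))  # Cauchy:       ~2.31 (logarithmic tail)
```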

In the case of nice, outlier-free data, you don't need robust loss functions like these.