This follows [[Maximum A Posteriori in Linear-Gaussian Estimation]], but here we treat the nonlinear case.
We previously set the objective function to be the squared [[Mahalanobis Distance]]. Here we define the prior (motion) and measurement errors as
$$
\mathbf{e}_{v,k}(\mathbf{x})=\begin{cases} \hat{\mathbf{x}}_{0}-\mathbf{x}_{0}, & k=0 \\ \mathbf{f}(\mathbf{x}_{k-1},\mathbf{v}_{k},\mathbf{0})-\mathbf{x}_{k}, & k=1\dots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x})=\mathbf{y}_{k}-\mathbf{g}(\mathbf{x}_{k},\mathbf{0}),\quad k=0\dots K
$$
and the corresponding objective terms are
$$
J_{v,k}(\mathbf{x})=\frac{1}{2}\mathbf{e}_{v,k}(\mathbf{x})^{T}\mathbf{W}_{v,k}^{-1}\mathbf{e}_{v,k}(\mathbf{x})
$$
$$
J_{y,k}(\mathbf{x})=\frac{1}{2}\mathbf{e}_{y,k}(\mathbf{x})^{T}\mathbf{W}_{y,k}^{-1}\mathbf{e}_{y,k}(\mathbf{x})
$$
>[!error] $\mathbf{W}_{v,k}$ and $\mathbf{W}_{y,k}$ can be thought of as positive-definite symmetric matrix weights **that are often set to the process noise and measurement noise covariances of the system**

And the overall objective function is thus
$$
J(\mathbf{x})=\sum_{k=0}^{K}\left(J_{v,k}(\mathbf{x})+J_{y,k}(\mathbf{x})\right)
$$
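As a concrete illustration, here is a minimal NumPy sketch that evaluates this objective for a hypothetical scalar system; the models $f$, $g$ and all numbers are toy assumptions, not from the text:

```python
import numpy as np

# Hypothetical toy system (my own choice, not from the text): scalar state,
# motion model f(x_{k-1}, v_k, 0) = x_{k-1} + v_k*cos(x_{k-1}),
# measurement model g(x_k, 0) = sin(x_k).
def f(x_prev, v):
    return x_prev + v * np.cos(x_prev)

def g(x):
    return np.sin(x)

def objective(x, x0_prior, v, y, W_v, W_y):
    """J(x) = sum_k ( J_{v,k}(x) + J_{y,k}(x) ) for a trajectory x = [x_0, ..., x_K]."""
    K = len(x) - 1
    J = 0.0
    for k in range(K + 1):
        # prior/motion error e_{v,k}(x)
        e_v = (x0_prior - x[0]) if k == 0 else (f(x[k - 1], v[k]) - x[k])
        # measurement error e_{y,k}(x)
        e_y = y[k] - g(x[k])
        # scalar state, so applying W^{-1} is just a division
        J += 0.5 * e_v**2 / W_v[k] + 0.5 * e_y**2 / W_y[k]
    return J

# Toy usage: 4 timesteps, noise-free data, evaluated at the true trajectory.
x = np.array([0.1, 0.5, 0.9, 1.2])
v = np.array([0.0, 0.45, 0.46, 0.49])   # chosen so f roughly reproduces x; v[0] unused
y = np.sin(x)
print(objective(x, 0.1, v, y, np.full(4, 0.01), np.full(4, 0.04)))
```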
We stack the errors and weights as
$$
\mathbf{e}(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_v(\mathbf{x}) \\ \mathbf{e}_y(\mathbf{x}) \end{bmatrix}, \quad \mathbf{e}_v(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_{v,0}(\mathbf{x}) \\ \vdots \\ \mathbf{e}_{v,K}(\mathbf{x}) \end{bmatrix}, \quad \mathbf{e}_y(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_{y,0}(\mathbf{x}) \\ \vdots \\ \mathbf{e}_{y,K}(\mathbf{x}) \end{bmatrix}
$$
$$
\mathbf{W} = \text{diag}(\mathbf{W}_v, \mathbf{W}_y), \quad \mathbf{W}_v = \text{diag}(\mathbf{W}_{v,0}, \ldots, \mathbf{W}_{v,K}), \quad \mathbf{W}_y = \text{diag}(\mathbf{W}_{y,0}, \ldots, \mathbf{W}_{y,K})
$$
so that
$$
J(\mathbf{x}) = \frac{1}{2}\mathbf{e}(\mathbf{x})^T \mathbf{W}^{-1} \mathbf{e}(\mathbf{x})
$$
We further define
$$
\mathbf{u}(\mathbf{x}) = \mathbf{L}\mathbf{e}(\mathbf{x})
$$
where $\mathbf{L}^T\mathbf{L} = \mathbf{W}^{-1}$ (i.e., $\mathbf{L}$ can come from a Cholesky decomposition, since $\mathbf{W}$ is symmetric positive-definite). Using these definitions, we can write the objective function simply as
$$
J(\mathbf{x}) = \frac{1}{2}\mathbf{u}(\mathbf{x})^T \mathbf{u}(\mathbf{x})
$$
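A quick numerical sanity check of this whitening identity, using an arbitrary symmetric positive-definite $\mathbf{W}$ and a random stacked error (building $\mathbf{L}$ from a Cholesky factor of $\mathbf{W}^{-1}$ is one valid choice):

```python
import numpy as np

# Check that J = (1/2) e^T W^{-1} e equals (1/2) u^T u with u = L e, where L^T L = W^{-1}.
# One way to get such an L: Cholesky-factor W^{-1} = C C^T (C lower-triangular), take L = C^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
W = A @ A.T + 6 * np.eye(6)        # symmetric positive-definite weight (stand-in for diag(W_v, W_y))
e = rng.standard_normal(6)         # stand-in for the stacked error e(x)

C = np.linalg.cholesky(np.linalg.inv(W))   # C C^T = W^{-1}
L = C.T                                    # L^T L = C C^T = W^{-1}
u = L @ e

J_direct   = 0.5 * e @ np.linalg.solve(W, e)
J_whitened = 0.5 * u @ u
print(np.isclose(J_direct, J_whitened))    # True
```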
The estimate is then
$$
\hat{\mathbf{x}}=\underset{\mathbf{x}}{\text{argmin}}\;J(\mathbf{x})
$$
There are many ways to solve this optimization problem, including [[Newton's Method]] and [[Gauss-Newton Method]].
# [[Gauss-Newton Method]] in Terms of Errors
From the example in [[Gauss-Newton Method]], we get
$$
\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right) \delta\mathbf{x}^* = -\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \mathbf{u}(\mathbf{x}_{\text{op}})
$$
Recall that $\mathbf{u}(\mathbf{x})$ is related to the error by
$$
\mathbf{u}(\mathbf{x}) = \mathbf{L}\mathbf{e}(\mathbf{x})
$$
Substituting this in and collecting $\mathbf{L}^T\mathbf{L}=\mathbf{W}^{-1}$, the update becomes
$$
(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})\,\delta \mathbf{x}^{*}=\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}_{\text{op}}), \quad \mathbf{H}=-\frac{ \partial \mathbf{e}(\mathbf{x}) }{ \partial \mathbf{x} } \bigg|_{\mathbf{x}_{\text{op}}}
$$
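In code, once $\mathbf{H}$, $\mathbf{W}$, and $\mathbf{e}(\mathbf{x}_{\text{op}})$ have been assembled (their construction is shown below), a single update reduces to one linear solve. A minimal NumPy sketch with hypothetical stand-in inputs:

```python
import numpy as np

# One Gauss-Newton step in terms of errors: solve
#   (H^T W^{-1} H) dx* = H^T W^{-1} e(x_op)
# for the optimal perturbation dx*. All arguments are assumed given.
def gauss_newton_step(H, W, e_op):
    A = H.T @ np.linalg.solve(W, H)        # H^T W^{-1} H
    b = H.T @ np.linalg.solve(W, e_op)     # H^T W^{-1} e(x_op)
    return np.linalg.solve(A, b)           # dx*

# Tiny usage with random stand-ins (shapes only, not a real system):
rng = np.random.default_rng(0)
H = rng.standard_normal((8, 4))
W = np.diag(rng.uniform(0.1, 1.0, 8))
e_op = rng.standard_normal(8)
print(gauss_newton_step(H, W, e_op).shape)   # (4,)
```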
Here, we need to **linearize $\mathbf{e}(\mathbf{x})$** instead:
$$
\mathbf{e}_{v,k}(\mathbf{x}_{\text{op}} + \delta\mathbf{x}) \approx \begin{cases} \mathbf{e}_{v,0}(\mathbf{x}_{\text{op}}) - \delta\mathbf{x}_0, & k = 0 \\ \mathbf{e}_{v,k}(\mathbf{x}_{\text{op}}) + \mathbf{F}_{k-1}\delta\mathbf{x}_{k-1} - \delta\mathbf{x}_k, & k = 1 \ldots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x}_{\text{op}} + \delta\mathbf{x}) \approx \mathbf{e}_{y,k}(\mathbf{x}_{\text{op}}) - \mathbf{G}_k\,\delta\mathbf{x}_k, \quad k = 0 \ldots K
$$
where
$$
\mathbf{e}_{v,k}(\mathbf{x}_{\text{op}}) = \begin{cases} \hat{\mathbf{x}}_0 - \mathbf{x}_{\text{op},0}, & k = 0 \\ \mathbf{f}(\mathbf{x}_{\text{op},k-1}, \mathbf{v}_k, \mathbf{0}) - \mathbf{x}_{\text{op},k}, & k = 1 \ldots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x}_{\text{op}}) = \mathbf{y}_k - \mathbf{g}(\mathbf{x}_{\text{op},k}, \mathbf{0}), \quad k = 0 \ldots K
$$
and the Jacobians are
$$
\mathbf{F}_{k-1} = \frac{\partial \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{v}_k, \mathbf{w}_k)}{\partial \mathbf{x}_{k-1}}\bigg|_{\mathbf{x}_{\text{op},k-1}, \mathbf{v}_k, \mathbf{0}}, \quad \mathbf{G}_k = \frac{\partial \mathbf{g}(\mathbf{x}_k, \mathbf{n}_k)}{\partial \mathbf{x}_k}\bigg|_{\mathbf{x}_{\text{op},k}, \mathbf{0}}
$$
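As a quick check of these Jacobians for a hypothetical scalar $f$ and $g$ (my own toy models, not from the text), the analytic derivatives can be compared against central finite differences:

```python
import numpy as np

# Toy scalar models: f(x, v, w) = x + v*cos(x) + w,  g(x, n) = sin(x) + n.
def f(x_prev, v, w=0.0):
    return x_prev + v * np.cos(x_prev) + w

def g(x, n=0.0):
    return np.sin(x) + n

x_op, v_k = 0.7, 0.5
F_analytic = 1.0 - v_k * np.sin(x_op)   # d f / d x_{k-1} evaluated at (x_op, v_k, 0)
G_analytic = np.cos(x_op)               # d g / d x_k   evaluated at (x_op, 0)

h = 1e-6                                # central finite differences as a sanity check
F_numeric = (f(x_op + h, v_k) - f(x_op - h, v_k)) / (2 * h)
G_numeric = (g(x_op + h) - g(x_op - h)) / (2 * h)
print(np.allclose([F_analytic, G_analytic], [F_numeric, G_numeric]))
```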
Setting the weights to the noise covariances of the system,
$$
\mathbf{W}_{v,k}=\mathbf{Q}_{k}',\quad\mathbf{W}_{y,k}=\mathbf{R}_{k}'
$$
the stacked quantities become
$$
\delta\mathbf{x} = \begin{bmatrix} \delta\mathbf{x}_0 \\ \delta\mathbf{x}_1 \\ \delta\mathbf{x}_2 \\ \vdots \\ \delta\mathbf{x}_K \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \mathbf{1} & & & & \\ -\mathbf{F}_0 & \mathbf{1} & & & \\ & -\mathbf{F}_1 & \ddots & & \\ & & \ddots & \mathbf{1} & \\ & & & -\mathbf{F}_{K-1} & \mathbf{1} \\ \hline \mathbf{G}_0 & & & & \\ & \mathbf{G}_1 & & & \\ & & \mathbf{G}_2 & & \\ & & & \ddots & \\ & & & & \mathbf{G}_K \end{bmatrix}
$$
$$
\mathbf{e}(\mathbf{x}_{\text{op}}) = \begin{bmatrix} \mathbf{e}_{v,0}(\mathbf{x}_{\text{op}}) \\ \mathbf{e}_{v,1}(\mathbf{x}_{\text{op}}) \\ \vdots \\ \mathbf{e}_{v,K}(\mathbf{x}_{\text{op}}) \\ \hline \mathbf{e}_{y,0}(\mathbf{x}_{\text{op}}) \\ \mathbf{e}_{y,1}(\mathbf{x}_{\text{op}}) \\ \vdots \\ \mathbf{e}_{y,K}(\mathbf{x}_{\text{op}}) \end{bmatrix}, \quad \mathbf{W} = \text{diag}\left(\mathbf{P}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K, \mathbf{R}_0, \mathbf{R}_1, \ldots, \mathbf{R}_K\right)
$$
The Gauss-Newton update is then obtained by solving
$$
(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})\,\delta \mathbf{x}^{*}=\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}_{\text{op}})
$$
for $\delta\mathbf{x}^{*}$, updating $\mathbf{x}_{\text{op}} \leftarrow \mathbf{x}_{\text{op}} + \delta\mathbf{x}^{*}$, and iterating until convergence.
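Putting everything together, the following is a minimal end-to-end sketch of the batch Gauss-Newton iteration on a made-up scalar system; the models $f$, $g$, their Jacobians, and all numbers are illustrative assumptions, not anything from the text:

```python
import numpy as np

# Made-up scalar system:  x_k = x_{k-1} + v_k*cos(x_{k-1}) + w_k,   y_k = sin(x_k) + n_k
def f(x_prev, v): return x_prev + v * np.cos(x_prev)
def g(x):         return np.sin(x)
def F(x_prev, v): return 1.0 - v * np.sin(x_prev)   # df/dx_{k-1}
def G(x):         return np.cos(x)                  # dg/dx_k

K = 10
rng = np.random.default_rng(1)
P0, Q, R = 0.1, 0.01, 0.05
v = np.full(K + 1, 0.3)                             # inputs (v[0] unused)

# Simulate ground truth, measurements, and a prior on x_0
x_true = np.empty(K + 1); x_true[0] = 0.2
for k in range(1, K + 1):
    x_true[k] = f(x_true[k - 1], v[k]) + np.sqrt(Q) * rng.standard_normal()
y = g(x_true) + np.sqrt(R) * rng.standard_normal(K + 1)
x0_prior = x_true[0] + np.sqrt(P0) * rng.standard_normal()

W = np.diag([P0] + [Q] * K + [R] * (K + 1))         # diag(P_0, Q_1..Q_K, R_0..R_K)
x_op = np.full(K + 1, x0_prior)                     # crude initial operating point

for _ in range(20):
    H = np.zeros((2 * (K + 1), K + 1))
    e = np.zeros(2 * (K + 1))
    # prior/motion block of e(x_op) and H
    e[0] = x0_prior - x_op[0]
    H[0, 0] = 1.0
    for k in range(1, K + 1):
        e[k] = f(x_op[k - 1], v[k]) - x_op[k]
        H[k, k - 1] = -F(x_op[k - 1], v[k])
        H[k, k] = 1.0
    # measurement block of e(x_op) and H
    for k in range(K + 1):
        e[K + 1 + k] = y[k] - g(x_op[k])
        H[K + 1 + k, k] = G(x_op[k])
    # solve (H^T W^{-1} H) dx = H^T W^{-1} e(x_op) and update the operating point
    dx = np.linalg.solve(H.T @ np.linalg.solve(W, H), H.T @ np.linalg.solve(W, e))
    x_op = x_op + dx
    if np.max(np.abs(dx)) < 1e-9:
        break

print("RMS estimation error:", np.sqrt(np.mean((x_op - x_true) ** 2)))
```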
# Laplace Approximation
We sometimes want a covariance matrix to go along with our estimate once the optimization is done. The Laplace approximation takes the mean to be the MAP solution and the covariance to be the inverse of the approximate Hessian of $J$ at the point where we stopped iterating:
$$
\hat{\mathbf{x}}=\underset{\mathbf{x}}{\text{argmin}}\;J(\mathbf{x})
$$
$$
\check{\mathbf{P}}=\left(\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)\right)^{-1}=(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})^{-1}
$$
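A minimal sketch of this step, assuming $\mathbf{H}$ and $\mathbf{W}$ are the ones assembled at the final Gauss-Newton iteration (e.g., from the sketch above):

```python
import numpy as np

# Laplace approximation of the covariance: invert the approximate Hessian
# H^T W^{-1} H evaluated at the converged operating point.
def laplace_covariance(H, W):
    hessian = H.T @ np.linalg.solve(W, H)   # approximate Hessian of J at x_op
    return np.linalg.inv(hessian)           # block (k, k) approximates the covariance of x_k
```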
