This follows [[Maximum A Posteriori in Linear-Gaussian Estimation]], but here we treat the nonlinear case.
We previously set the objective function to be the squared [[Mahalanobis Distance]]. Here we define the prior (motion) and measurement errors as
$$
\mathbf{e}_{v,k}(\mathbf{x})=\begin{cases} \hat{\mathbf{x}}_{0}-\mathbf{x}_{0}, & k=0 \\ \mathbf{f}(\mathbf{x}_{k-1},\mathbf{v}_{k},\mathbf{0})-\mathbf{x}_{k}, & k=1\dots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x})=\mathbf{y}_{k}-\mathbf{g}(\mathbf{x}_{k},\mathbf{0}),\quad k=0\dots K
$$
and the corresponding objective terms are
$$
J_{v,k}(\mathbf{x})=\frac{1}{2}\mathbf{e}_{v,k}(\mathbf{x})^{T}\mathbf{W}_{v,k}^{-1}\mathbf{e}_{v,k}(\mathbf{x})
$$
$$
J_{y,k}(\mathbf{x})=\frac{1}{2}\mathbf{e}_{y,k}(\mathbf{x})^{T}\mathbf{W}_{y,k}^{-1}\mathbf{e}_{y,k}(\mathbf{x})
$$
>[!error] $\mathbf{W}_{v,k}$ and $\mathbf{W}_{y,k}$ can be thought of as positive-definite symmetric matrix weights **that are often set to the process noise and measurement noise covariances of the system**

And the overall objective function is thus
$$
J(\mathbf{x})=\sum_{k=0}^{K}\left(J_{v,k}(\mathbf{x})+J_{y,k}(\mathbf{x})\right)
$$
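As a concrete illustration, here is a minimal NumPy sketch that evaluates this objective for a hypothetical scalar system; the models $f$, $g$ and all numbers are toy assumptions, not from the text:

```python
import numpy as np

# Hypothetical toy system (my own choice, not from the text): scalar state,
# motion model f(x_{k-1}, v_k, 0) = x_{k-1} + v_k*cos(x_{k-1}),
# measurement model g(x_k, 0) = sin(x_k).
def f(x_prev, v):
    return x_prev + v * np.cos(x_prev)

def g(x):
    return np.sin(x)

def objective(x, x0_prior, v, y, W_v, W_y):
    """J(x) = sum_k ( J_{v,k}(x) + J_{y,k}(x) ) for a trajectory x = [x_0, ..., x_K]."""
    K = len(x) - 1
    J = 0.0
    for k in range(K + 1):
        # prior/motion error e_{v,k}(x)
        e_v = (x0_prior - x[0]) if k == 0 else (f(x[k - 1], v[k]) - x[k])
        # measurement error e_{y,k}(x)
        e_y = y[k] - g(x[k])
        # scalar state, so applying W^{-1} is just a division
        J += 0.5 * e_v**2 / W_v[k] + 0.5 * e_y**2 / W_y[k]
    return J

# Toy usage: 4 timesteps, noise-free data, evaluated at the true trajectory.
x = np.array([0.1, 0.5, 0.9, 1.2])
v = np.array([0.0, 0.45, 0.46, 0.49])   # chosen so f roughly reproduces x; v[0] unused
y = np.sin(x)
print(objective(x, 0.1, v, y, np.full(4, 0.01), np.full(4, 0.04)))
```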
We stack the errors and weights as
$$
\mathbf{e}(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_v(\mathbf{x}) \\ \mathbf{e}_y(\mathbf{x}) \end{bmatrix}, \quad \mathbf{e}_v(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_{v,0}(\mathbf{x}) \\ \vdots \\ \mathbf{e}_{v,K}(\mathbf{x}) \end{bmatrix}, \quad \mathbf{e}_y(\mathbf{x}) = \begin{bmatrix} \mathbf{e}_{y,0}(\mathbf{x}) \\ \vdots \\ \mathbf{e}_{y,K}(\mathbf{x}) \end{bmatrix}
$$
$$
\mathbf{W} = \text{diag}(\mathbf{W}_v, \mathbf{W}_y), \quad \mathbf{W}_v = \text{diag}(\mathbf{W}_{v,0}, \ldots, \mathbf{W}_{v,K}), \quad \mathbf{W}_y = \text{diag}(\mathbf{W}_{y,0}, \ldots, \mathbf{W}_{y,K})
$$
so that
$$
J(\mathbf{x}) = \frac{1}{2}\mathbf{e}(\mathbf{x})^T \mathbf{W}^{-1} \mathbf{e}(\mathbf{x})
$$
We further define
$$
\mathbf{u}(\mathbf{x}) = \mathbf{L}\mathbf{e}(\mathbf{x})
$$
where $\mathbf{L}^T\mathbf{L} = \mathbf{W}^{-1}$ (i.e., $\mathbf{L}$ can come from a Cholesky decomposition, since $\mathbf{W}$ is symmetric positive-definite). Using these definitions, we can write the objective function simply as
$$
J(\mathbf{x}) = \frac{1}{2}\mathbf{u}(\mathbf{x})^T \mathbf{u}(\mathbf{x})
$$
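A quick numerical sanity check of this whitening identity, using an arbitrary symmetric positive-definite $\mathbf{W}$ and a random stacked error (building $\mathbf{L}$ from a Cholesky factor of $\mathbf{W}^{-1}$ is one valid choice):

```python
import numpy as np

# Check that J = (1/2) e^T W^{-1} e equals (1/2) u^T u with u = L e, where L^T L = W^{-1}.
# One way to get such an L: Cholesky-factor W^{-1} = C C^T (C lower-triangular), take L = C^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
W = A @ A.T + 6 * np.eye(6)        # symmetric positive-definite weight (stand-in for diag(W_v, W_y))
e = rng.standard_normal(6)         # stand-in for the stacked error e(x)

C = np.linalg.cholesky(np.linalg.inv(W))   # C C^T = W^{-1}
L = C.T                                    # L^T L = C C^T = W^{-1}
u = L @ e

J_direct   = 0.5 * e @ np.linalg.solve(W, e)
J_whitened = 0.5 * u @ u
print(np.isclose(J_direct, J_whitened))    # True
```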
The estimate is then
$$
\hat{\mathbf{x}}=\underset{\mathbf{x}}{\text{argmin}}\;J(\mathbf{x})
$$
There are many ways to solve this optimization problem, including [[Newton's Method]] and [[Gauss-Newton Method]].
# [[Gauss-Newton Method]] in Terms of Errors
From the example in [[Gauss-Newton Method]], we get
$$
\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right) \delta\mathbf{x}^* = -\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \mathbf{u}(\mathbf{x}_{\text{op}})
$$
Recall that $\mathbf{u}(\mathbf{x})$ is related to the error by
$$
\mathbf{u}(\mathbf{x}) = \mathbf{L}\mathbf{e}(\mathbf{x})
$$
Substituting this in and collecting $\mathbf{L}^T\mathbf{L}=\mathbf{W}^{-1}$, the update becomes
$$
(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})\,\delta \mathbf{x}^{*}=\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}_{\text{op}}), \quad \mathbf{H}=-\frac{ \partial \mathbf{e}(\mathbf{x}) }{ \partial \mathbf{x} } \bigg|_{\mathbf{x}_{\text{op}}}
$$
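In code, once $\mathbf{H}$, $\mathbf{W}$, and $\mathbf{e}(\mathbf{x}_{\text{op}})$ have been assembled (their construction is shown below), a single update reduces to one linear solve. A minimal NumPy sketch with hypothetical stand-in inputs:

```python
import numpy as np

# One Gauss-Newton step in terms of errors: solve
#   (H^T W^{-1} H) dx* = H^T W^{-1} e(x_op)
# for the optimal perturbation dx*. All arguments are assumed given.
def gauss_newton_step(H, W, e_op):
    A = H.T @ np.linalg.solve(W, H)        # H^T W^{-1} H
    b = H.T @ np.linalg.solve(W, e_op)     # H^T W^{-1} e(x_op)
    return np.linalg.solve(A, b)           # dx*

# Tiny usage with random stand-ins (shapes only, not a real system):
rng = np.random.default_rng(0)
H = rng.standard_normal((8, 4))
W = np.diag(rng.uniform(0.1, 1.0, 8))
e_op = rng.standard_normal(8)
print(gauss_newton_step(H, W, e_op).shape)   # (4,)
```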
Here, we need to **linearize $\mathbf{e}(\mathbf{x})$** instead:
$$
\mathbf{e}_{v,k}(\mathbf{x}_{\text{op}} + \delta\mathbf{x}) \approx \begin{cases} \mathbf{e}_{v,0}(\mathbf{x}_{\text{op}}) - \delta\mathbf{x}_0, & k = 0 \\ \mathbf{e}_{v,k}(\mathbf{x}_{\text{op}}) + \mathbf{F}_{k-1}\delta\mathbf{x}_{k-1} - \delta\mathbf{x}_k, & k = 1 \ldots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x}_{\text{op}} + \delta\mathbf{x}) \approx \mathbf{e}_{y,k}(\mathbf{x}_{\text{op}}) - \mathbf{G}_k\,\delta\mathbf{x}_k, \quad k = 0 \ldots K
$$
where
$$
\mathbf{e}_{v,k}(\mathbf{x}_{\text{op}}) = \begin{cases} \hat{\mathbf{x}}_0 - \mathbf{x}_{\text{op},0}, & k = 0 \\ \mathbf{f}(\mathbf{x}_{\text{op},k-1}, \mathbf{v}_k, \mathbf{0}) - \mathbf{x}_{\text{op},k}, & k = 1 \ldots K \end{cases}
$$
$$
\mathbf{e}_{y,k}(\mathbf{x}_{\text{op}}) = \mathbf{y}_k - \mathbf{g}(\mathbf{x}_{\text{op},k}, \mathbf{0}), \quad k = 0 \ldots K
$$
and the Jacobians are
$$
\mathbf{F}_{k-1} = \frac{\partial \mathbf{f}(\mathbf{x}_{k-1}, \mathbf{v}_k, \mathbf{w}_k)}{\partial \mathbf{x}_{k-1}}\bigg|_{\mathbf{x}_{\text{op},k-1}, \mathbf{v}_k, \mathbf{0}}, \quad \mathbf{G}_k = \frac{\partial \mathbf{g}(\mathbf{x}_k, \mathbf{n}_k)}{\partial \mathbf{x}_k}\bigg|_{\mathbf{x}_{\text{op},k}, \mathbf{0}}
$$
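As a quick check of these Jacobians for a hypothetical scalar $f$ and $g$ (my own toy models, not from the text), the analytic derivatives can be compared against central finite differences:

```python
import numpy as np

# Toy scalar models: f(x, v, w) = x + v*cos(x) + w,  g(x, n) = sin(x) + n.
def f(x_prev, v, w=0.0):
    return x_prev + v * np.cos(x_prev) + w

def g(x, n=0.0):
    return np.sin(x) + n

x_op, v_k = 0.7, 0.5
F_analytic = 1.0 - v_k * np.sin(x_op)   # d f / d x_{k-1} evaluated at (x_op, v_k, 0)
G_analytic = np.cos(x_op)               # d g / d x_k   evaluated at (x_op, 0)

h = 1e-6                                # central finite differences as a sanity check
F_numeric = (f(x_op + h, v_k) - f(x_op - h, v_k)) / (2 * h)
G_numeric = (g(x_op + h) - g(x_op - h)) / (2 * h)
print(np.allclose([F_analytic, G_analytic], [F_numeric, G_numeric]))
```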
Setting the weights to the noise covariances of the system,
$$
\mathbf{W}_{v,k}=\mathbf{Q}_{k}',\quad\mathbf{W}_{y,k}=\mathbf{R}_{k}'
$$
the stacked quantities become
$$
\delta\mathbf{x} = \begin{bmatrix} \delta\mathbf{x}_0 \\ \delta\mathbf{x}_1 \\ \delta\mathbf{x}_2 \\ \vdots \\ \delta\mathbf{x}_K \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \mathbf{1} & & & & \\ -\mathbf{F}_0 & \mathbf{1} & & & \\ & -\mathbf{F}_1 & \ddots & & \\ & & \ddots & \mathbf{1} & \\ & & & -\mathbf{F}_{K-1} & \mathbf{1} \\ \hline \mathbf{G}_0 & & & & \\ & \mathbf{G}_1 & & & \\ & & \mathbf{G}_2 & & \\ & & & \ddots & \\ & & & & \mathbf{G}_K \end{bmatrix}
$$
$$
\mathbf{e}(\mathbf{x}_{\text{op}}) = \begin{bmatrix} \mathbf{e}_{v,0}(\mathbf{x}_{\text{op}}) \\ \mathbf{e}_{v,1}(\mathbf{x}_{\text{op}}) \\ \vdots \\ \mathbf{e}_{v,K}(\mathbf{x}_{\text{op}}) \\ \hline \mathbf{e}_{y,0}(\mathbf{x}_{\text{op}}) \\ \mathbf{e}_{y,1}(\mathbf{x}_{\text{op}}) \\ \vdots \\ \mathbf{e}_{y,K}(\mathbf{x}_{\text{op}}) \end{bmatrix}, \quad \mathbf{W} = \text{diag}\left(\mathbf{P}_0, \mathbf{Q}_1, \ldots, \mathbf{Q}_K, \mathbf{R}_0, \mathbf{R}_1, \ldots, \mathbf{R}_K\right)
$$
The Gauss-Newton update is then obtained by solving
$$
(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})\,\delta \mathbf{x}^{*}=\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{e}(\mathbf{x}_{\text{op}})
$$
for $\delta\mathbf{x}^{*}$, updating $\mathbf{x}_{\text{op}} \leftarrow \mathbf{x}_{\text{op}} + \delta\mathbf{x}^{*}$, and iterating until convergence.
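Putting everything together, the following is a minimal end-to-end sketch of the batch Gauss-Newton iteration on a made-up scalar system; the models $f$, $g$, their Jacobians, and all numbers are illustrative assumptions, not anything from the text:

```python
import numpy as np

# Made-up scalar system:  x_k = x_{k-1} + v_k*cos(x_{k-1}) + w_k,   y_k = sin(x_k) + n_k
def f(x_prev, v): return x_prev + v * np.cos(x_prev)
def g(x):         return np.sin(x)
def F(x_prev, v): return 1.0 - v * np.sin(x_prev)   # df/dx_{k-1}
def G(x):         return np.cos(x)                  # dg/dx_k

K = 10
rng = np.random.default_rng(1)
P0, Q, R = 0.1, 0.01, 0.05
v = np.full(K + 1, 0.3)                             # inputs (v[0] unused)

# Simulate ground truth, measurements, and a prior on x_0
x_true = np.empty(K + 1); x_true[0] = 0.2
for k in range(1, K + 1):
    x_true[k] = f(x_true[k - 1], v[k]) + np.sqrt(Q) * rng.standard_normal()
y = g(x_true) + np.sqrt(R) * rng.standard_normal(K + 1)
x0_prior = x_true[0] + np.sqrt(P0) * rng.standard_normal()

W = np.diag([P0] + [Q] * K + [R] * (K + 1))         # diag(P_0, Q_1..Q_K, R_0..R_K)
x_op = np.full(K + 1, x0_prior)                     # crude initial operating point

for _ in range(20):
    H = np.zeros((2 * (K + 1), K + 1))
    e = np.zeros(2 * (K + 1))
    # prior/motion block of e(x_op) and H
    e[0] = x0_prior - x_op[0]
    H[0, 0] = 1.0
    for k in range(1, K + 1):
        e[k] = f(x_op[k - 1], v[k]) - x_op[k]
        H[k, k - 1] = -F(x_op[k - 1], v[k])
        H[k, k] = 1.0
    # measurement block of e(x_op) and H
    for k in range(K + 1):
        e[K + 1 + k] = y[k] - g(x_op[k])
        H[K + 1 + k, k] = G(x_op[k])
    # solve (H^T W^{-1} H) dx = H^T W^{-1} e(x_op) and update the operating point
    dx = np.linalg.solve(H.T @ np.linalg.solve(W, H), H.T @ np.linalg.solve(W, e))
    x_op = x_op + dx
    if np.max(np.abs(dx)) < 1e-9:
        break

print("RMS estimation error:", np.sqrt(np.mean((x_op - x_true) ** 2)))
```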
# Laplace Approximation
We sometimes want a covariance matrix to go along with our estimate once the optimization is done. The Laplace approximation takes the mean to be the MAP solution and the covariance to be the inverse of the approximate Hessian of $J$ at the point where we stopped iterating:
$$
\hat{\mathbf{x}}=\underset{\mathbf{x}}{\text{argmin}}\;J(\mathbf{x})
$$
$$
\check{\mathbf{P}}=\left(\left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)^T \left(\frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}}\bigg|_{\mathbf{x}_{\text{op}}}\right)\right)^{-1}=(\mathbf{H}^{T}\mathbf{W}^{-1}\mathbf{H})^{-1}
$$
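A minimal sketch of this step, assuming $\mathbf{H}$ and $\mathbf{W}$ are the ones assembled at the final Gauss-Newton iteration (e.g., from the sketch above):

```python
import numpy as np

# Laplace approximation of the covariance: invert the approximate Hessian
# H^T W^{-1} H evaluated at the converged operating point.
def laplace_covariance(H, W):
    hessian = H.T @ np.linalg.solve(W, H)   # approximate Hessian of J at x_op
    return np.linalg.inv(hessian)           # block (k, k) approximates the covariance of x_k
```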
