In the Gauss-Newton method, we assume that we are already near a local minimum. This is because the Gauss-Newton Hessian approximation neglects the second-order residual terms, which is only accurate when the residuals are small, i.e., when we are already near the optimum.
To cope with this, we can damp the Gauss-Newton system:

$$\left(\mathbf{J}^\top \mathbf{J} + \lambda \mathbf{D}\right)\boldsymbol{\delta} = -\mathbf{J}^\top \mathbf{r},$$

where $\mathbf{D}$ is a positive diagonal matrix. When $\mathbf{D} = \mathbf{I}$, we can see that as $\lambda$ becomes very big, the Hessian term $\mathbf{J}^\top \mathbf{J}$ is relatively small next to $\lambda \mathbf{I}$, and we have

$$\boldsymbol{\delta} \approx -\frac{1}{\lambda}\,\mathbf{J}^\top \mathbf{r},$$

which corresponds to a very small step in the direction of steepest descent. When $\lambda \to 0$, we recover the Gauss-Newton method:

$$\left(\mathbf{J}^\top \mathbf{J}\right)\boldsymbol{\delta} = -\mathbf{J}^\top \mathbf{r}.$$
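To make the update concrete, here is a minimal NumPy sketch of a single damped step; the function name, the default $\mathbf{D} = \mathbf{I}$, and the variable names are my own choices for illustration, not from these notes.

```python
import numpy as np

def damped_gauss_newton_step(J, r, lam, D=None):
    """Solve (J^T J + lam * D) delta = -J^T r for the step delta.

    J   : (m, n) Jacobian of the residuals at the current estimate
    r   : (m,)   residual vector at the current estimate
    lam : damping parameter lambda >= 0
    D   : (n, n) positive diagonal matrix; defaults to the identity
    """
    JTJ = J.T @ J
    if D is None:
        D = np.eye(JTJ.shape[0])
    # Large lam: the system is dominated by lam * D, so the step is roughly
    # -(1/lam) D^{-1} J^T r, a short move along the steepest-descent direction.
    # lam -> 0: the system tends to J^T J, recovering the pure Gauss-Newton step.
    return np.linalg.solve(JTJ + lam * D, -J.T @ r)
```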
By controlling $\lambda$, we can decide when to take cautious gradient-descent steps and when to take rapid Gauss-Newton steps.
When our initial estimate is very far from the optimum, we can slow the optimizer down with a large $\lambda$ and take gradient-descent steps until we get near an optimum, then shrink $\lambda$ and rapidly converge to it with Gauss-Newton.
In short: a large $\lambda$ means smaller, gradient-descent-like steps, while a small $\lambda$ means bigger, more Gauss-Newton-like steps.
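In practice, $\lambda$ is usually adapted on the fly rather than fixed. Below is a minimal sketch of a classic accept/reject schedule, assuming $\mathbf{D} = \mathbf{I}$ and simple halving/doubling factors; the function name, the factors of 2, and the iteration cap are illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def damped_least_squares(residuals, jacobian, x0, lam=1e-2, n_iters=50):
    """Minimize ||r(x)||^2 by damped Gauss-Newton with an adaptive lambda.

    residuals : callable, x -> (m,) residual vector r(x)
    jacobian  : callable, x -> (m, n) Jacobian of r at x
    x0        : (n,) initial estimate
    """
    x = np.asarray(x0, dtype=float)
    cost = np.sum(residuals(x) ** 2)
    for _ in range(n_iters):
        J, r = jacobian(x), residuals(x)
        # Damped step: interpolates between gradient descent and Gauss-Newton.
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        x_new = x + step
        new_cost = np.sum(residuals(x_new) ** 2)
        if new_cost < cost:
            # Step reduced the cost: accept it and shrink lambda
            # (more Gauss-Newton-like near the optimum).
            x, cost, lam = x_new, new_cost, lam * 0.5
        else:
            # Step increased the cost: reject it and grow lambda
            # (fall back toward small gradient-descent steps).
            lam *= 2.0
    return x
```

As a hypothetical usage, fitting $y \approx a e^{bt}$ to data arrays `t`, `y` would pass `residuals = lambda x: x[0] * np.exp(x[1] * t) - y` together with the corresponding Jacobian.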
