Progressively blurs an image through Gaussian Blur and then utilizes difference of gaussians to determine points of interest

SIFT determines local maxima and minima. It does this by doing the following For each point

Compare to 8 neighbors in the same scale
Compare to 9 neighbors in the scale above
Compare to 9 neighbors in the scale below

If it is the max or min, it is considered a keypoint candidate $(x, y, σ)$ . where $σ$ is the variance of the gaussian blur that we used at that specific layer. (A sub-pixel refinement scheme is done afterwards)

To Filter, theres a threshold you can set as the user.

We also want to describe our point further. We can do this with an Orientation Assignment.

Around the point, we organize the gradients of the pixels into a histogram (buckets of angles). The dominant orientation is added tot he keypoint ^ finds the dominant orientation v aligns on that dominant orientation and then constructs final feature vector

Final L2 Normalization step is done to make the descriptor invariant to linear illumination changes

Feature Matching

Can use Euclidean Distance (on the entire feature vector). Because they are linear, and normalized:

d(f₁, f₂)² = ||f₁||² + ||f₂||² - 2·f₁ᵀ·f₂
           = 2 - 2·f₁ᵀ·f₂  (since ||f₁|| = ||f₂|| = 1)

For ORB, they use hamming distance, which is the number of differing bits of the vector

Optimizations

Build a KD-tree on the descriptor space to search in O(logm)
Approximate Nearest Neighbor

Explorer

SIFT

Feature Matching

Optimizations

Graph View

Table of Contents