Week 4
CS/ECE 181B SIFT
Scale Invariant Feature Transform
Lowe, David G. “Distinctive Image Features from Scale-Invariant Keypoints”, International
Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110.
Good software reference: http://www.vlfeat.org/index.html
Tuesday, January 26, 2010

Descriptor
• descriptor = key point + description around that key point.
• useful attributes: should be invariant to scale, rotation, location, irradiance (brightness) and viewpoint.
• easy to compute.
• should be distinct enough for matching purposes.

Basic Steps
1. Scale-space extrema detection: Identify interest points invariant to scale and orientation.
2. Keypoint localization: Keypoints selected by model fitting and stability.
3. Dominant orientation: One or more orientations assigned to the keypoint, and data normalized w.r.t. orientation.
4. Keypoint descriptor: a representation of the local region around the detected keypoints based on histogram of oriented edges.

Scale-space extrema detection
• Consider a Gaussian blurred image
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
• DoG (difference of Gaussians):
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$$

DoG Images (Lowe 2004)

Local maxima (scale and space)
• Compare the marked pixel X to its 26 neighbors in 3x3 regions at the current and adjacent scales.

Scale space sampling
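To connect the number of scales sampled per octave with the DoG construction and the 26-neighbor test described above, here is a minimal sketch in plain Python. The function names are hypothetical. The base blur `sigma0 = 1.6` and the `s + 3` blur levels per octave are taken from Lowe (2004), not from these slides, and the DoG stack is assumed to be a nested list indexed as `dog[scale][row][col]`.

```python
def octave_sigmas(sigma0=1.6, scales_per_octave=3):
    """Blur levels for one octave: sigma0 * k**i with k = 2**(1/s).

    Assumption (Lowe 2004): s + 3 blurred images per octave, so that
    s DoG layers each have a neighbor above and below.
    """
    k = 2.0 ** (1.0 / scales_per_octave)
    return [sigma0 * k ** i for i in range(scales_per_octave + 3)]


def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than
    all 26 neighbors in the 3x3x3 block centered on it.

    The caller must keep (s, y, x) at least one sample away from every
    border of the stack.
    """
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)
```

With s = 3, the blur doubles after three steps, so the fourth level of one octave has the same blur as the first level of the next.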
• 32 real images (human faces, outdoor scenes, aerial images, etc.)
• transformations: rotation, scaling, brightness/contrast changes, image noise.
• Highest repeatability when s = 3 scales per octave.
[Figure 3 (Lowe 2004). First graph: Repeatability (%) vs. number of scales sampled per octave, with curves "Matching location and scale" and "Nearest descriptor in database". Second graph: Number of keypoints per image vs. number of scales sampled per octave, with curves "Total number of keypoints" and "Nearest descriptor in database".]
Figure 3: The top line of the first graph shows the percent of keypoints that are repeatably detected at the same location and scale in a transformed image as a function of the number of scales sampled per octave. The lower line shows the percent of keypoints that have their descriptors correctly matched to a large database. The second graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.
Therefore, we must settle for a solution that trades off efficiency with completeness. In fact, as might be expected and is confirmed by our experiments, extrema that are close together are quite unstable to small perturbations of the image. We can determine the best choices experimentally by studying a range of sampling frequencies and using those that ...

4. Scale space extrema detection
• simple implementation: locate keypoints at the location and scale of the central sample point.
• you can use interpolation to more accurately locate these points in scale and space.
• remove low contrast keypoints.
• eliminate edge responses (recall, points along the edge are not unique for matching.)
• use Harris detector type response function to prune.
• determine keypoint orientation(s).

[Figure 5 (Lowe 2004): stages of keypoint selection.
(a) Original (233 x 189 pixels)
(b) 832 detected keypoints (the initial keypoint locations at maxima ...)
(c) .... applying a threshold on contrast
(d) 536 points remain after a threshold on principal curvature (corner like points)]

Eliminating edge responses
• At an edge, the DoG filtered image will have a large principal curvature across the edge but a small one in the perpendicular direction.
• The principal curvatures can be computed from a 2x2 Hessian matrix, H, at the location and scale of the keypoint:
$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$
• Eigenvalues of H are proportional to principal curvatures of D.

A bit more linear algebra
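Let α be the eigenvalue with the largest magnitude and β be the smaller one. Then:
$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta$$
$$\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta$$
Let r = α/β be the ratio of the eigenvalues, so that α = rβ. Then:
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r}$$
The quantity on the RHS depends only on the ratio of the eigenvalues. It is minimum when the two eigenvalues are equal (r = 1), and increases with r. A keypoint is kept only if
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r}$$
Lowe suggests using a value r = 10.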
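The ratio test above can be sketched in a few lines. This is a minimal illustration, assuming the second derivatives Dxx, Dyy and Dxy have already been estimated at the keypoint (for example by finite differences on the DoG image). The function name is hypothetical, and rejecting points with a non-positive determinant is an added assumption: it corresponds to curvatures of opposite sign, where the ratio is not meaningful.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Return True if the keypoint is NOT edge-like.

    Implements Tr(H)^2 / Det(H) < (r + 1)^2 / r for the 2x2 Hessian
    H = [[dxx, dxy], [dxy, dyy]], with r = 10 as suggested by Lowe.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Assumption: curvatures of opposite sign (or a degenerate
        # Hessian) are rejected outright.
        return False
    return trace * trace / det < (r + 1) ** 2 / r


# A blob-like point (equal curvatures) passes; an edge-like point
# (one curvature 20x the other) is rejected.
print(passes_edge_test(1.0, 1.0, 0.0))
print(passes_edge_test(20.0, 1.0, 0.0))
```

Working with the trace and determinant avoids computing the eigenvalues explicitly.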
Orientation Assignment
• A gradient orientation histogram is computed in the neighborhood of the keypoint using the Gaussian image L at the closest scale to the scale of the keypoint, so that all computations are performed in a scale-invariant manner. For each image sample, L(x, y), the gradient magnitude, m(x, y), and orientation, θ(x, y), are computed using pixel differences:
$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
• The orientation histogram has 36 bins covering the 360 degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with standard deviation that is 1.5 times that of the scale of the keypoint.
• Peaks in the orientation histogram correspond to dominant orientations. If there are other peaks at more than 80% of the maximum, they are also used to create keypoints at that orientation. Locations with multiple peaks of similar magnitude therefore yield multiple keypoints at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations.

Steps so far...
1. Create the Gaussian pyramid and the DoG pyramid at multiple scales.
2. Find the keypoint locations through a scale-space analysis. Prune the keypoints based on contrast and corner strength.
3. Compute the dominant edge directions in the neighborhood of the keypoints.
4. So, we have the location, scale and orientation of the keypoints at this point in computation.
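Step 3 above (the 36-bin orientation histogram with the 80% peak rule) can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the function name is hypothetical, `L` is a blurred image stored as a list of rows (`L[y][x]`), the window radius of three standard deviations is an assumed cutoff, `atan2` is used so that the angle covers the full 360 degrees, and peaks are reported at bin centers without the finer interpolation a full implementation would use.

```python
import math


def dominant_orientations(L, x, y, scale, num_bins=36, peak_ratio=0.8):
    """Return orientations (degrees, bin centers) of histogram peaks
    that reach at least peak_ratio of the highest peak."""
    sigma_w = 1.5 * scale               # Gaussian window, as on the slide
    radius = int(round(3 * sigma_w))    # assumption: cut off at 3 sigma
    height, width = len(L), len(L[0])
    hist = [0.0] * num_bins

    for j in range(max(1, y - radius), min(height - 1, y + radius + 1)):
        for i in range(max(1, x - radius), min(width - 1, x + radius + 1)):
            dx = L[j][i + 1] - L[j][i - 1]
            dy = L[j + 1][i] - L[j - 1][i]
            magnitude = math.hypot(dx, dy)
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            weight = math.exp(
                -((i - x) ** 2 + (j - y) ** 2) / (2 * sigma_w ** 2)
            )
            b = int(theta * num_bins / 360.0) % num_bins
            hist[b] += weight * magnitude

    top = max(hist)
    if top == 0:
        return []                       # flat patch: no orientation

    peaks = []
    for b in range(num_bins):
        left = hist[b - 1]              # index -1 wraps around the circle
        right = hist[(b + 1) % num_bins]
        if hist[b] >= peak_ratio * top and hist[b] > left and hist[b] > right:
            peaks.append((b + 0.5) * 360.0 / num_bins)
    return peaks
```

Each returned angle would produce its own keypoint at the same location and scale.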
We now need a representation of the image information at the keypoint location, i.e., a description.

6. An invariant descriptor.
• A weighted orientation histogram is now computed, relative to the keypoint orientation, on 4x4 pixel neighborhoods.
• 4 x 4 windows x 8 orientation bins = 128-dimensional descriptors
[Figure 7 (Lowe 2004). Left panel: Image gradients. Right panel: Keypoint descriptor.]
Figure 7: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region.
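The 4 x 4 x 8 = 128 layout described above can be sketched as follows. This is a deliberately simplified version under stated assumptions, not the full SIFT descriptor: the function name is hypothetical, the 16x16 sample grid is axis-aligned (only the gradient angles are made relative to the keypoint orientation; the grid itself is not rotated or scaled), the Gaussian window width of 8 samples is an assumed value, each sample votes into a single bin with no interpolation between bins, and the final step is plain unit-length normalization.

```python
import math


def toy_descriptor(L, x, y, orientation_deg=0.0):
    """128-D vector from the 16x16 samples around (x, y).

    L is a blurred image as a list of rows (L[row][col]). The keypoint
    must lie at least 9 samples away from every image border.
    """
    desc = [0.0] * 128
    sigma_w = 8.0                         # assumption: half the window width
    for row in range(16):
        for col in range(16):
            i, j = x - 8 + col, y - 8 + row
            dx = L[j][i + 1] - L[j][i - 1]
            dy = L[j + 1][i] - L[j - 1][i]
            magnitude = math.hypot(dx, dy)
            # Angle measured relative to the keypoint orientation.
            theta = (math.degrees(math.atan2(dy, dx)) - orientation_deg) % 360.0
            weight = math.exp(
                -((col - 7.5) ** 2 + (row - 7.5) ** 2) / (2 * sigma_w ** 2)
            )
            cell = (row // 4) * 4 + (col // 4)    # which of the 4x4 windows
            obin = int(theta / 45.0) % 8          # which of the 8 orientations
            desc[cell * 8 + obin] += weight * magnitude

    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

Subtracting the keypoint orientation is what makes the histogram entries insensitive to in-plane rotation, and the normalization reduces sensitivity to contrast changes.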

A recognition task
Figure 12: The training images for two objects are shown on the left. These can be recognized in a cluttered image with extensive occlusion, shown in the middle. The results of recognition are shown on the right. A parallelogram is drawn around each recognized object showing the boundaries of the original training image under the affine transformation solved for during recognition. Smaller squares indicate the keypoints that were used for recognition.
The least-squares solution for the parameters x can be determined by solving the correspond...

http://www.vlfeat.org/overview/sift.html#tut.sift.extract

Project #2
• Implement a SIFT descriptor.
• Evaluation of the descriptor: Take one image and transform it with slight rotation, scaling and by adding noise.
• Compute the descriptors on both original and modified images.
• What percentage of the descriptors match in terms of location and nearest neighbors?
• Reports: similar to project 1, email.
• Due on or before Feb 1, 5PM
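One possible way to compute the matching percentage asked for above is sketched here. It is only an illustration under assumptions that the assignment does not specify: the function name, the pixel tolerance `tol`, and the `transform` callback (which maps a keypoint location in the original image to its expected location in the modified image) are all hypothetical.

```python
import math


def match_percentage(kps_a, descs_a, kps_b, descs_b, transform, tol=2.0):
    """Percent of descriptors in image A whose nearest descriptor in
    image B lies within tol pixels of the expected location.

    kps_*: list of (x, y) keypoint locations.
    descs_*: list of equal-length descriptor vectors, parallel to kps_*.
    transform: function mapping an (x, y) in image A to (x, y) in image B.
    """
    if not descs_a or not descs_b:
        return 0.0

    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    hits = 0
    for point_a, desc_a in zip(kps_a, descs_a):
        # Nearest neighbor in descriptor space (Euclidean distance).
        nearest = min(
            range(len(descs_b)), key=lambda n: sq_dist(desc_a, descs_b[n])
        )
        expected_x, expected_y = transform(point_a)
        found_x, found_y = kps_b[nearest]
        if math.hypot(found_x - expected_x, found_y - expected_y) <= tol:
            hits += 1
    return 100.0 * hits / len(descs_a)
```

For a pure rotation or scaling, `transform` would apply the same known transformation that was used to create the modified image.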