06 sift
Week 4 CS/ECE 181B
SIFT: Scale Invariant Feature Transform

Lowe, David G., "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, Vol. 60, No. 2, 2004, pp. 91-110.

Good software reference: http://www.vlfeat.org/index.html

Tuesday, January 26, 2010

Descriptor
• A descriptor = a key point + a description of the region around that key point.
• Useful attributes: it should be invariant to scale, rotation, location, irradiance (brightness), and viewpoint.
• It should be distinct enough for matching purposes.
• It should be easy to compute.

Basic Steps
1. Scale-space extrema detection: identify interest points invariant to scale and orientation.
2. Keypoint localization: keypoints are selected by model fitting and stability.
3. Dominant orientation: one or more orientations are assigned to each keypoint, and the data are normalized w.r.t. orientation.
4. Keypoint descriptor: a representation of the local region around each detected keypoint, based on histograms of oriented gradients.

Scale-space extrema detection
• Consider a Gaussian-blurred image

    L(x, y, σ) = G(x, y, σ) * I(x, y)

  where

    G(x, y, σ) = (1 / (2πσ^2)) exp(-(x^2 + y^2) / (2σ^2))

• Difference of Gaussians (DoG):

    D(x, y, σ) = L(x, y, kσ) - L(x, y, σ)

[Figure: DoG images (Lowe 2004)]

Local maxima (scale and space)
• Compare the marked pixel X to its 26 neighbors in the 3x3 regions at the current and adjacent scales.
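The DoG construction and the 26-neighbor extremum test above can be sketched in a few lines. This is a minimal single-octave illustration assuming NumPy and SciPy; the constants (base sigma 1.6, s = 3 intervals per octave, k = 2^(1/s)) follow Lowe's paper, but the full multi-octave pyramid with downsampling is omitted.

```python
# Sketch of DoG scale-space construction and the 26-neighbor extremum test.
# Single octave only; sigma0 = 1.6 and s = 3 are Lowe's suggested constants.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma0=1.6, s=3):
    """Blur at successive scales sigma0 * k**i and difference adjacent levels.
    Returns s + 2 DoG images, enough to test extrema at s scales per octave."""
    k = 2.0 ** (1.0 / s)
    blurred = [gaussian_filter(image.astype(float), sigma0 * k**i)
               for i in range(s + 3)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

def is_extremum(dogs, i, y, x):
    """True if dogs[i][y, x] is >= or <= all 26 neighbors in the 3x3x3
    cube spanning the current and the two adjacent DoG levels."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[i - 1:i + 2]])
    center = dogs[i][y, x]
    return center == cube.max() or center == cube.min()
```

In a full implementation each octave would be built on a downsampled copy of the previous octave's most-blurred image; here one octave suffices to illustrate the data flow.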
Scale-space sampling

[Figure 3, Lowe 2004: the top graph shows the percent of keypoints repeatably detected at the same location and scale in a transformed image, and the percent of keypoints whose descriptors are correctly matched to a large database, as a function of the number of scales sampled per octave. The bottom graph shows the total number of keypoints detected in a typical image as a function of the number of scale samples.]

• 32 real images (human faces, outdoor scenes, aerial images, etc.)
• Transformations: rotation, scaling, brightness/contrast changes, image noise.
• Highest repeatability when s = 3 scales are sampled per octave. As Lowe notes, we must settle for a solution that trades off efficiency with completeness: extrema that are close together are quite unstable to small perturbations of the image, and the best sampling frequencies are determined experimentally.

Keypoint localization
• Simple implementation: locate keypoints at the location and scale of the central sample point.
• You can use interpolation to locate these points more accurately in scale and space.
• Remove low-contrast keypoints.
• Eliminate edge responses (recall: points along an edge are not unique for matching). Use a Harris-detector-type response function to prune.
• Determine keypoint orientation(s).
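The first two localization refinements above can be sketched as follows. This is an illustration assuming NumPy; the contrast cutoff 0.03 (for image values in [0, 1]) is the value used in Lowe's paper, and the parabola fit is a 1-D simplification of his 3-D quadratic interpolation.

```python
# Sketch of two pruning/refinement steps: (1) discard low-contrast extrema,
# (2) refine the extremum location by fitting a parabola through three
# equally spaced samples (1-D version of Lowe's 3-D quadratic fit).
import numpy as np

def passes_contrast(dog_value, threshold=0.03):
    # Lowe's paper discards extrema with |D| < 0.03 (pixel values in [0, 1]).
    return abs(dog_value) >= threshold

def parabola_offset(v_minus, v_center, v_plus):
    """Sub-sample offset of the extremum of a parabola through samples at
    x = -1, 0, +1; for a true interior extremum the result is in (-0.5, 0.5)."""
    denom = v_minus - 2.0 * v_center + v_plus
    if denom == 0.0:
        return 0.0
    return 0.5 * (v_minus - v_plus) / denom
```

The same offset formula is applied independently along x, y, and scale in this simplified view; Lowe solves the three axes jointly via the Hessian of the DoG function.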
[Figure 5, Lowe 2004: stages of keypoint selection. (a) The original image (233 x 189 pixels). (b) The initial 832 keypoint locations at maxima and minima of the DoG function. (c) Keypoints remaining after applying a threshold on contrast. (d) 536 points remain after a threshold on the ratio of principal curvatures (corner-like points).]

Eliminating edge responses
• The DoG-filtered image will have a large principal curvature across an edge but a small one in the perpendicular direction.
• The principal curvatures can be computed from a 2x2 Hessian matrix H at the location and scale of the keypoint:

    H = [ Dxx  Dxy ]
        [ Dxy  Dyy ]

• The eigenvalues of H are proportional to the principal curvatures of D.

A bit more linear algebra
Let α be the eigenvalue with the larger magnitude, β the smaller one, and r = α/β. Then:

    Tr(H) = Dxx + Dyy = α + β
    Det(H) = Dxx Dyy - (Dxy)^2 = αβ
    Tr(H)^2 / Det(H) = (α + β)^2 / (αβ) = (r + 1)^2 / r

This quantity depends only on the ratio of the eigenvalues; it is minimized when the two eigenvalues are equal (r = 1) and increases with r. A keypoint is kept only if

    Tr(H)^2 / Det(H) < (r + 1)^2 / r

Lowe suggests using r = 10.

Following experimentation with a number of approaches to assigning a local orientation, the following approach was found to give the most stable results. The scale of the keypoint is used to select the Gaussian-smoothed image L with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample L(x, y) at this scale, the gradient magnitude m(x, y) and orientation θ(x, y) are precomputed using pixel differences.
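The edge-response test above is easy to implement once the Hessian entries are approximated by finite differences. A minimal sketch, assuming NumPy and r = 10 as Lowe suggests:

```python
# Sketch of the edge-response test: keep a keypoint only when
# Tr(H)^2 / Det(H) < (r + 1)^2 / r, with H the 2x2 Hessian of the
# DoG image D at the keypoint, estimated by finite differences.
import numpy as np

def passes_edge_test(D, y, x, r=10.0):
    Dxx = D[y, x + 1] - 2.0 * D[y, x] + D[y, x - 1]
    Dyy = D[y + 1, x] - 2.0 * D[y, x] + D[y - 1, x]
    Dxy = 0.25 * (D[y + 1, x + 1] - D[y + 1, x - 1]
                  - D[y - 1, x + 1] + D[y - 1, x - 1])
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy ** 2
    if det <= 0.0:
        # Curvatures have different signs: the point is not an extremum
        # in both directions, so reject it outright.
        return False
    return tr * tr / det < (r + 1.0) ** 2 / r
```

An isotropic blob (equal curvatures, ratio at its minimum of 4 for r = 1) passes the test, while a straight edge (one curvature near zero) fails it.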
Orientation Assignment
• Gradient magnitude and orientation from pixel differences:

    m(x, y) = sqrt( (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 )
    θ(x, y) = tan^-1( (L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)) )

• The orientation histogram has 36 bins covering the 360-degree range of orientations. It is formed from the gradient orientations of sample points within a region around the keypoint. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a standard deviation 1.5 times the scale of the keypoint.
• Peaks in the orientation histogram correspond to dominant directions of the local gradients. The highest peak is detected, and any other local peak within 80% of the highest peak is also used to create a keypoint with that orientation. Therefore, at locations with multiple peaks of similar magnitude there will be multiple keypoints created at the same location and scale but with different orientations. Only about 15% of points are assigned multiple orientations.

Steps so far...
1. Create the Gaussian pyramid and the DoG pyramid at multiple scales.
2. Find the keypoint locations through a scale-space analysis. Prune the keypoints based on contrast and corner strength.
3. Compute the dominant gradient directions in the neighborhood of each keypoint.
4. We now have the location, scale, and orientation of each keypoint. What remains is a representation of the image information at the keypoint location, i.e., a descriptor.

An invariant descriptor
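The 36-bin orientation histogram described above can be sketched directly from the gradient formulas. This is an illustration assuming NumPy; the square sampling window of the hypothetical `radius` parameter is a simplification (the Gaussian weighting, with sigma 1.5x the keypoint scale as the slide states, does the real localization).

```python
# Sketch of the 36-bin orientation histogram around one keypoint (cy, cx).
# Gradients use the pixel differences given above; each sample is weighted
# by its gradient magnitude and a Gaussian of sigma = 1.5 * scale.
import numpy as np

def orientation_histogram(L, cy, cx, scale, radius=8):
    hist = np.zeros(36)
    sigma = 1.5 * scale
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            if not (0 < y < L.shape[0] - 1 and 0 < x < L.shape[1] - 1):
                continue
            dx = L[y, x + 1] - L[y, x - 1]
            dy = L[y + 1, x] - L[y - 1, x]
            m = np.hypot(dx, dy)
            theta = np.arctan2(dy, dx) % (2.0 * np.pi)   # [0, 2*pi)
            w = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            hist[int(theta / (2.0 * np.pi) * 36) % 36] += w * m
    return hist
```

Peak detection then reduces to finding the maximum bin and any other local peak above 80% of it; a production version would also interpolate the peak position between bins.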
• A weighted orientation histogram is now computed, relative to the keypoint orientation, over 4x4 sample neighborhoods.
• 4 x 4 windows x 8 orientation bins = a 128-dimensional descriptor.

[Figure 7, Lowe 2004: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. The samples are then accumulated into orientation histograms summarizing the contents of 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region.]

A recognition task
[Figure 12, Lowe 2004: The training images for two objects are shown on the left. These can be recognized in a cluttered image with extensive occlusion, shown in the middle. The results of recognition are shown on the right: a parallelogram is drawn around each recognized object showing the boundaries of the original training image under the affine transformation solved for during recognition, and smaller squares indicate the keypoints that were used for recognition.]

Tutorial: http://www.vlfeat.org/overview/sift.html#tut.sift.extract

Project #2
• Implement a SIFT descriptor.
• Evaluation of the descriptor: take one image and transform it with a slight rotation, scaling, and added noise.
• Compute the descriptors on both the original and modified images.
• What percentage of the descriptors match in terms of location and nearest neighbors?
• Due on or before Feb 1, 5 PM.
• Reports: similar to Project 1; submit by email.
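For the evaluation step of the project, nearest-neighbor matching between the two descriptor sets can be sketched as below. This is an assumption-laden illustration using NumPy: brute-force Euclidean distance, plus the nearest/second-nearest distance-ratio test with threshold 0.8 from Lowe's paper (the slide itself only asks for nearest neighbors, so the ratio test is optional).

```python
# Sketch of descriptor matching for the project evaluation: for each
# descriptor in the transformed image, find its nearest neighbor in the
# original image and apply Lowe's distance-ratio test (0.8).
import numpy as np

def match_fraction(desc_a, desc_b, ratio=0.8):
    """Fraction of rows of desc_a whose nearest neighbor in desc_b passes
    the ratio test. Inputs are (n, d) arrays, e.g. d = 128 for SIFT."""
    matched = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if len(dists) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matched += 1
    return matched / len(desc_a)
```

A match would additionally be checked for location consistency (the keypoint should map to roughly the same place under the known transformation) before counting it as correct.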