11 recognition



Image Recognition
Local or Global?
Tuesday, March 2, 2010

Project
• Eigenfaces for Face Recognition
• Bag of Features for Object Classification

Bag-of-features models
from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Alternative: a holistic view
... combination of the best K eigenvectors:

$$\hat{\Phi}_i - \text{mean} = \sum_{j=1}^{K} w_j u_j, \qquad (w_j = u_j^T \Phi_i)$$

(we call the $u_j$'s eigenfaces)
- Each normalized training face $\Phi_i$ is represented in this basis by a vector:

$$\Omega_i = \begin{bmatrix} w_1^i \\ w_2^i \\ \vdots \\ w_K^i \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$

Use these coefficients as the feature vector.

Project Logistics
Goals
Each group will implement the two methods: PCA (Principal Component Analysis) and BVW (Bag of Words). You can implement them in MATLAB or any other language of your choice. I will go over the project details in class tomorrow. The two primary references are listed at the end of this handout.

Face Images
• There are 34 students in this class. Each student is responsible for collecting three pictures of himself/herself: all frontal images, exactly 64x64 pixels, JPEG compressed.
• Faces are approximately centered.
• Faces are approximately aligned as well. Use the two eyes and the corners of the lips as anchor points to normalize the scale.
• Distribute only these normalized 64x64 images.

Image Categories (256 x 256)
1. face images
2. persons (full body, different viewpoints)
3. bicycles (side view)
4. cars (side view)
5. backgrounds (not containing any of the above objects)
The object should be the dominant object in the scene.
Minimum of five images per class.

Timeline
March 1: All student-generated datasets should be made available online.
March 2, 3, 4: Further discussion of implementation details (class lectures and discussion times).
March 9, 11: Brief 15-minute presentation in class by the different groups.
March 19: Last date to submit your final project report.
Good news: you have almost a month for your final report, and at least two weeks to prepare for your presentation.

Scalable Recognition with a Vocabulary Tree
• David Nister and Henrik Stewenius, CVPR 2006
University of Kentucky, http://www.vis.uky.edu/∼stewe/

Vocabulary Tree
With each node in the vocabulary tree there is an associated inverted file with references to the images containing an instance of that node.

Building the vocabulary tree
hierarchical quantization
initial k-means clustering
second-level k-means
Figure 2. An illustration of the process of building the vocabulary tree.

Online phase
Each descriptor vector is propagated down the tree by comparing it, at each level, to the k candidate cluster centers (represented by the k children of the current node) and choosing the closest one. This requires computing k dot products at each level, for a total of kL dot products (L is the number of levels in the tree). The path down the tree can be encoded as a single integer, which can be used for scoring.

Scoring
The relevance of a database image to the query image is based on the similarity of their paths down the vocabulary tree (VT).
Let $w_i$ be the weight associated with node i in the VT.
Let $q_i = n_i w_i$, where $n_i$ is the number of descriptor vectors of the query with a path through node i.
Similarly, $d_i = m_i w_i$, where $m_i$ is the number of descriptor vectors of the database image with a path through node i.
Relevance score:

$$s(q, d) = \left\| \frac{q}{\|q\|} - \frac{d}{\|d\|} \right\|$$

Weights
Simple case: keep them constant.
Entropy weighting:

$$w_i = \ln \frac{N}{N_i}$$

where N is the number of images in the database, and $N_i$ is the number of images in the database with at least one descriptor vector path through node i.
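The online phase, the entropy weights, and the relevance score described above can be tied together in a small sketch. Everything named here is an assumption made for illustration: the nested-dictionary tree layout, the helper names, the toy 2-D descriptors, and the hand-built two-level tree (a real tree would come from hierarchical k-means). The sketch picks the closest child by squared Euclidean distance rather than by dot products, and it defaults to the L1 norm, which the slides report as the better-performing choice. A lower score means a more similar image.

```python
import math
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def descend(root, descriptor):
    """Ids of the nodes visited while propagating a descriptor to a leaf."""
    path, node = [], root
    while node["children"]:
        # Compare against the k children and keep the closest center.
        node = min(node["children"],
                   key=lambda child: sq_dist(child["center"], descriptor))
        path.append(node["id"])
    return path

def node_counts(root, descriptors):
    """n_i (query) or m_i (database): descriptors with a path through node i."""
    counts = Counter()
    for descriptor in descriptors:
        counts.update(descend(root, descriptor))
    return counts

def entropy_weights(all_counts):
    """w_i = ln(N / N_i), with N_i the number of images that reach node i."""
    n_images = len(all_counts)
    doc_freq = Counter()
    for counts in all_counts:
        doc_freq.update(counts.keys())
    return {i: math.log(n_images / n_i) for i, n_i in doc_freq.items()}

def weighted(counts, weights):
    """q_i = n_i * w_i (nodes never seen in the database get weight 0)."""
    return {i: c * weights.get(i, 0.0) for i, c in counts.items()}

def score(q, d, p=1):
    """s(q, d) = || q/||q|| - d/||d|| ||_p for sparse vectors stored as dicts."""
    def norm(v):
        return sum(abs(x) ** p for x in v.values()) ** (1.0 / p)
    nq, nd = norm(q) or 1.0, norm(d) or 1.0  # guard against all-zero vectors
    keys = set(q) | set(d)
    return sum(abs(q.get(i, 0.0) / nq - d.get(i, 0.0) / nd) ** p
               for i in keys) ** (1.0 / p)

def leaf(node_id, center):
    return {"id": node_id, "center": center, "children": []}

# Hypothetical tree: branch factor k = 2, L = 2 levels below the root.
TOY_TREE = {"id": 0, "center": None, "children": [
    {"id": 1, "center": [0.0, 0.0],
     "children": [leaf(3, [-1.0, 0.0]), leaf(4, [1.0, 0.0])]},
    {"id": 2, "center": [10.0, 10.0],
     "children": [leaf(5, [9.0, 10.0]), leaf(6, [11.0, 10.0])]},
]}

# Hypothetical database: three images with two descriptors each.
DATABASE = {
    "A": [[0.9, 0.1], [-1.2, 0.0]],
    "B": [[9.1, 10.0], [10.9, 9.8]],
    "C": [[1.1, 0.0], [9.2, 10.1]],
}

db_counts = {name: node_counts(TOY_TREE, descs) for name, descs in DATABASE.items()}
weights = entropy_weights(list(db_counts.values()))
query = weighted(node_counts(TOY_TREE, [[1.0, 0.2]]), weights)
ranking = sorted(DATABASE,
                 key=lambda name: score(query, weighted(db_counts[name], weights)))
```

With this toy data the query descriptor follows the same path as descriptors in images A and C, so those two rank ahead of B, which shares no node with the query and therefore gets the maximum L1 score of 2.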
Observations (authors)
A large vocabulary tree is better than a small one.
Do not assign strong weights to the inner nodes of the tree.
Retrieval performance (accuracy) increases with the number of leaf nodes.

Scoring - implementation
Every node in the vocabulary tree is associated with an inverted file.
The inverted files store the ID numbers of the images in which the particular node occurs, as well as, for each image, the term frequency $m_i$.
Only leaf nodes are explicitly represented in the implementation.

Excerpt from the paper:
5. Implementation of Scoring
To score efficiently with large databases we use inverted files. Every node in the vocabulary tree is associated with an inverted file. The inverted files store the id-numbers of the images in which a particular node occurs, as well as for each image the term frequency $m_i$. Forward files can also be used as a complement, in order to look up which visual words are present in a particular image. Only the leaf nodes are explicitly represented in our implementation, while the inverted files of inner nodes simply are the concatenation of the inverted files of the leaf nodes, see Figure 4. The length of the inverted file is stored in each node of the vocabulary tree. This length is essentially the document frequency with which the entropy of the node is determined. As discussed above, inverted files above a certain length are blocked from scoring. While it is very straightforward to implement scoring with fully expanded forward files, it takes some more thought to score efficiently using inverted files. Assume that the entropy of each node is fixed and known, which can be accomplished with a pre-computation for a particular database, or by using a large representative database to

[Figure: tree diagram whose nodes are labeled "List" and "Virtual"]

• Inverted files (IFs) of the inner nodes are concatenations of the IFs of the leaf nodes.
• The length of the IF is stored at each node.
• Length is the document frequency with which the node entropy is determined.
• IFs above a certain length may be blocked from scoring (no weight given to such nodes).

Figure 4. The database structure shown with two levels and a branch factor of two. The leaf nodes have explicit inverted files and the inner nodes have virtual inverted files that are computed as the concatenation of the inverted files of the leaf nodes.

... which can easily be partitioned since the scalar product is linear in $d_i$. For other norms, the situation is more ...

Scoring - implementation details
Some more details on how to implement the distance function efficiently: see the paper.
Feature extraction: a 640 x 480 image takes about 0.2 sec.
Query: takes about 25 ms on a 50K-image database.

Results

Figure 6. Curves showing percentage (y-axis) of the ground truth query images that make it into the top x percent (x-axis) frames of the query for a 1400 image database. The curves are shown up to 5% of the database size. As discussed in the text, it is crucial for scalable retrieval that the correct images from the database make it to the very top of the query, since verification is feasible only for a tiny fraction of the database when the database grows large. Hence, we are mainly interested in where the curves meet the y-axis. To avoid clutter, this number is given in Table 1 for a larger number of settings. A number of conclusions can be drawn from these results: A larger vocabulary improves retrieval performance. L1-norm gives better retrieval performance than L2-norm. Entropy weighting is important, at least for smaller vocabularies. Our best setting is method A, which gives much better performance than the setting used by [17], which is setting T.

Figure 5. The retrieval performance is evaluated using a large ground truth database (6376 images) with groups of four images known to be taken of the same object, but under different conditions.
Each image in turn is used as query image, and the three remaining images from its group should ideally be at the top of the query result. In order to compare against less efficient non-hierarchical schemes we also use a subset of the database consisting of around 1400 images.

... settings with a 1400 image subset of the test images. The curves show the distribution of how far the wanted images drop in the query rankings. The points where a larger number of methods meet the y-axis are given in Table 1. Note especially that the use of a larger vocabulary and also L1-norm gives performance improvements over the settings used by [17].

The performance with various settings was also tested on the full 6376 image database. It is important to note that the scores decrease with increasing database size as there are more images to confuse with. The effect of the shape of the vocabulary tree is shown in Figure 7. The effects of defining the vocabulary tree with varying amounts of data and training cycles are investigated in Figure 8.

Figure 10 shows a snapshot of a demonstration of the method, running real-time on a 40000 image database of CD covers, some connected to music. We have so far tested the method with a database size as high as 1 million images, more than one order of magnitude larger than any other work we are aware of, at least in this category of method. The results are shown in Figure 9. As we could not obtain ground truth for that size of database, the 6376 image ground truth set was embedded in a database that also contains several movies: The Bourne Identity, The Matrix, Braveheart, Collateral, Resident Evil, Almost Famous and Monsters Inc. Note that all frames from the movies are in ...

Slide notes on the results:
~6400 images (groups of four, different imaging conditions).
L1 norm seems to do better.
A larger vocabulary helps.

Performance Analysis

Discussion Items?
Back to your project.
Presentation logistics?
Volunteers to go on Tuesday next week?
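For the project, the inverted-file scoring from the "Scoring - implementation" slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `max_length` blocking parameter, and the toy database are all hypothetical, and only leaf nodes are scored. It relies on the fact that, for L1-normalized vectors with nonnegative entries, $\|q - d\|_1 = 2 + \sum_{i:\, q_i \neq 0,\, d_i \neq 0} (|q_i - d_i| - q_i - d_i)$, so only the inverted files of the leaves visited by the query need to be read.

```python
import math

def build_inverted_files(database):
    """database: {image_id: {leaf_id: m_i}} -> {leaf_id: [(image_id, m_i), ...]}"""
    inverted = {}
    for image_id, counts in database.items():
        for leaf_id, m in counts.items():
            inverted.setdefault(leaf_id, []).append((image_id, m))
    return inverted

def entropy_weights(inverted, n_images, max_length=None):
    """w_i = ln(N / N_i); the inverted-file length is the document frequency N_i.

    max_length is a hypothetical knob: longer inverted files get weight 0,
    which blocks them from scoring.
    """
    weights = {}
    for leaf_id, entries in inverted.items():
        blocked = max_length is not None and len(entries) > max_length
        weights[leaf_id] = 0.0 if blocked else math.log(n_images / len(entries))
    return weights

def l1_norms(database, weights):
    """Precomputed L1 norm of every database vector d (entries d_i = m_i * w_i)."""
    return {image_id: sum(m * weights[leaf_id] for leaf_id, m in counts.items())
            for image_id, counts in database.items()}

def score_query(query_counts, inverted, weights, db_norms):
    """L1 scores for the images that share at least one weighted leaf with the query.

    Images absent from the result share nothing with the query; their score
    is the maximum value, 2.
    """
    q = {leaf_id: n * weights.get(leaf_id, 0.0)
         for leaf_id, n in query_counts.items()}
    q_norm = sum(q.values())
    scores = {}
    if q_norm == 0.0:
        return scores
    for leaf_id, q_raw in q.items():
        if q_raw == 0.0:
            continue  # unseen or blocked leaf: contributes nothing
        q_i = q_raw / q_norm
        for image_id, m in inverted[leaf_id]:
            d_i = m * weights[leaf_id] / db_norms[image_id]
            # Start from 2 and add the correction term for this shared leaf.
            scores[image_id] = (scores.get(image_id, 2.0)
                                + abs(q_i - d_i) - q_i - d_i)
    return scores

# Hypothetical leaf counts m_i for three database images.
DATABASE = {
    "A": {3: 1, 4: 1},
    "B": {5: 1, 6: 1},
    "C": {4: 1, 5: 1},
}

inverted = build_inverted_files(DATABASE)
weights = entropy_weights(inverted, n_images=len(DATABASE))
db_norms = l1_norms(DATABASE, weights)
scores = score_query({4: 1}, inverted, weights, db_norms)
```

Only images A and C appear in `scores`, because they are the only entries in the inverted file of leaf 4. Image B is never touched, which is what makes this scheme cheaper than comparing the query against every database vector.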

This note was uploaded on 12/29/2011 for the course ECE 181b taught by Professor Staff during the Fall '08 term at UCSB.
