Image Recognition
Local or Global?
Tuesday, March 2, 2010

Project
• Eigenfaces for Face Recognition
• Bag of Features for Object Classification

Bag-of-features models from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Approximate each normalized face as a linear combination of the best K eigenvectors:

    Φ_i ≈ Σ_{j=1..K} w_j u_j,   with   w_j = u_j^T Φ_i

(we call the u_j's "eigenfaces"). Each normalized training face Φ_i is represented in this basis by a vector:

    Ω_i = [w_1^i, w_2^i, ..., w_K^i]^T,   i = 1, 2, ..., M

Use these coefficients as the feature vector.
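As a toy illustration of the projection above (not the full PCA pipeline: the orthonormal eigenfaces u_j are assumed to be precomputed, and all names and data here are my own):

```python
# Toy sketch of the eigenface projection: given precomputed orthonormal
# eigenfaces u_1..u_K and a mean-subtracted face phi, the feature vector
# is Omega = [w_1, ..., w_K] with w_j = u_j . phi.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(phi, eigenfaces):
    """Feature vector Omega: one coefficient per eigenface."""
    return [dot(u, phi) for u in eigenfaces]

def reconstruct(omega, eigenfaces):
    """Approximate the face as sum_j w_j u_j."""
    rec = [0.0] * len(eigenfaces[0])
    for w, u in zip(omega, eigenfaces):
        for i, ui in enumerate(u):
            rec[i] += w * ui
    return rec

# 4-pixel "faces" and a hand-picked orthonormal basis, purely illustrative
U = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
phi = [0.5, -0.25, 0.1, 0.0]
omega = project(phi, U)   # the feature vector [0.5, -0.25]
```

Recognition then compares these K-dimensional feature vectors instead of the raw 64x64 pixel arrays.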
Project Logistics

Each group will implement the two methods: PCA (Principal Component Analysis) and BVW (Bag of Visual Words). You can implement this in MATLAB or any other language of your choice. I will go over the project details in class tomorrow. The two primary references are listed at the end of this handout.

Face Images
• There are 34 students in this class. Each student is responsible for collecting 3 pictures of himself or herself, all frontal images, exactly 64x64 pixels.
• Faces are approximately centered.
• Faces are approximately aligned as well. Use the two eyes and the corners of the lips as anchor points to normalize the scale.
• Distribute only these normalized, 64x64 images.

Image Categories (256 x 256)
2. persons (full body, different view points)
3. bicycles (side view)
4. cars (side view)
5. backgrounds (not containing any of the above objects)

The object should be the dominant object in the scene; a minimum of five images per class.

Timeline
March 1: All student-generated datasets should be made available.
March 2, 3, 4: Further discussion of implementation details (class lectures and discussion times).
March 9, 11: Brief 15-minute presentations in class by the groups.
March 19: Last date to submit your final project report.

Good news: you have almost a month for your final report, and you have at least two weeks to prepare for your presentation.
Scalable Recognition with a Vocabulary Tree

• David Nister and Henrik Stewenius, CVPR 2006
• Computer Science, University of Kentucky
• http://www.vis.uky.edu/~stewe/

Vocabulary Tree
With each node in the vocabulary tree there is an associated inverted file with references to images containing an instance of that node.

Building the vocabulary tree
Hierarchical quantization: an initial k-means clustering, then k-means again at each subsequent level.

[Figure 2. An illustration of the process of building the vocabulary tree.]

Online phase
Each descriptor vector is propagated down the tree by comparing it, at each level, to the k candidate cluster centers (represented by the k children in the tree) and choosing the closest. This requires computing k dot products at each level, for a total of kL dot products (L is the number of levels in the tree). The path down the tree can be encoded as a single integer which can be used for scoring.

Scoring
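Before scoring, each descriptor is pushed down the tree as just described. A toy sketch of that descent (the tree layout, centers, and function names are my own, not the authors' code):

```python
# Propagate a descriptor down a k-way vocabulary tree: k dot products per
# level (kL total), choosing the closest center; encode the path as one
# integer whose base-k digits are the chosen branches.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def propagate(desc, tree, k, L):
    """tree[level][node] is the list of k child cluster centers.
    With (near) unit-length centers, maximizing the dot product is
    equivalent to picking the nearest center."""
    path, node = 0, 0
    for level in range(L):
        centers = tree[level][node]
        c = max(range(k), key=lambda i: dot(desc, centers[i]))
        node = node * k + c   # index of the chosen child at the next level
        path = path * k + c   # accumulate one base-k digit of the path
    return path

# tiny k=2, L=2 tree with hand-picked centers (illustrative only)
tree = [
    [[[1.0, 0.0], [0.0, 1.0]]],        # level 0: the root's two children
    [[[1.0, 0.0], [0.7, 0.7]],         # level 1: children of node 0
     [[0.0, 1.0], [0.7, 0.7]]],        # level 1: children of node 1
]
```

The returned integer is exactly the "path encoded as a single integer" used for scoring: descriptors that land in the same leaf produce the same integer.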
Relevance of a database image to the query image is based on the similarity of the paths down the vocabulary tree (VT).

Let w_i be the weight associated with node i in the tree.
Let q_i = n_i w_i, where n_i is the number of descriptor vectors of the query with a path through node i.
Similarly, d_i = m_i w_i, where m_i is the number of descriptor vectors of the database image with a path through node i.

The relevance score is the normalized difference between the query vector q and the database vector d:

    s(q, d) = || q/||q|| − d/||d|| ||
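A minimal sketch of this score on sparse vectors (the dict layout and names are my own; p=1 gives the L1 version the authors report works best):

```python
# Normalized-difference relevance score s(q, d) = || q/||q|| - d/||d|| ||_p
# with q_i = n_i * w_i and d_i = m_i * w_i, stored sparsely as dicts
# keyed by node id.

def score(n, m, w, p=1):
    """n, m: node -> descriptor counts for the query / database image;
    w: node -> node weight. Smaller score = more relevant."""
    q = {i: n[i] * w[i] for i in n}              # q_i = n_i w_i
    d = {i: m[i] * w[i] for i in m}              # d_i = m_i w_i
    qn = sum(abs(v) ** p for v in q.values()) ** (1.0 / p)
    dn = sum(abs(v) ** p for v in d.values()) ** (1.0 / p)
    return sum(abs(q.get(i, 0.0) / qn - d.get(i, 0.0) / dn) ** p
               for i in set(q) | set(d)) ** (1.0 / p)
```

An identical path distribution scores 0; completely disjoint paths score 2 under the L1 norm.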
Weights
Simple case: keep them constant. Otherwise, use entropy weighting:

    w_i = ln(N / N_i)

where N is the number of images in the database, and N_i is the number of images in the database with at least one descriptor vector path through node i.

Observations (authors)
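These weights can be computed directly from the inverted files; a minimal sketch (the `inverted` layout, node -> list of image ids, is my own assumption):

```python
import math

# w_i = ln(N / N_i): N images in the database, N_i images with at least
# one descriptor path through node i (the length of node i's inverted file).

def entropy_weights(inverted, N):
    return {node: math.log(N / len(images))
            for node, images in inverted.items()}
```

A node that occurs in every image gets weight 0 (no discriminative power); rarer nodes get larger weights.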
• A large vocabulary tree is better than a small one.
• Do not assign strong weights to the inner nodes of the tree.
• Retrieval performance (accuracy) increases with the number of leaf nodes.

Scoring - implementation
Every node in the vocabulary tree is associated with an inverted file. The inverted files store the ID numbers of the images in which the particular node occurs, as well as, for each image, the term frequency m_i. Only leaf nodes are explicitly represented in the implementation.
5. Implementation of Scoring

To score efficiently with large databases we use inverted files. Every node in the vocabulary tree is associated with an inverted file. The inverted files store the id-numbers of the images in which a particular node occurs, as well as for each image the term frequency m_i. Forward files can also be used as a complement, in order to look up which visual words are present in a particular image.

Only the leaf nodes are explicitly represented in our implementation, while the inverted files of inner nodes simply are the concatenation of the inverted files of the leaf nodes, see Figure 4. The length of the inverted file is stored in each node of the vocabulary tree. This length is essentially the document frequency with which the entropy of the node is determined. As discussed above, inverted files above a certain length are blocked from scoring.

While it is very straightforward to implement scoring with fully expanded forward files, it takes some more thought to score efficiently using inverted files. Assume that the entropy of each node is fixed and known, which can be accomplished with a pre-computation for a particular database, or by using a large representative database to determine the entropies.

• Inverted files of the inner nodes are
concatenations of the IFs of the leaf nodes.
• The length of the IF is stored at each node.
• This length is the document frequency with which the node entropy is determined.
• IFs with a large number of entries may be blocked from scoring (no weight is given to those nodes).

Figure 4. The database structure shown with two levels and a branch factor of two. The leaf nodes have explicit inverted files and the inner nodes have virtual inverted files that are computed as the concatenation of the inverted files of the leaf nodes.

The scalar-product form of the score can easily be partitioned, since it is linear in d_i. For other norms, the situation is more complicated.

Scoring - implementation details
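A minimal sketch of scoring through inverted files (the data layout and normalization conventions are my own assumptions; it uses the L2 identity ||q − d||^2 = 2 − 2·(q·d) for unit-normalized vectors):

```python
# Score every database image against a query using inverted files: only
# images appearing in the query's nonzero nodes are ever touched, instead
# of materializing a full vector per database image.

def score_all(query, inverted):
    """query: node -> q_i entries of the unit-normalized, weighted query
    vector; inverted: node -> {image: d_i}, likewise pre-normalized.
    Returns image -> ||q - d||_2^2 computed as 2 - 2*(q . d);
    smaller is better."""
    dots = {}
    for node, q_i in query.items():
        for img, d_i in inverted.get(node, {}).items():
            dots[img] = dots.get(img, 0.0) + q_i * d_i   # accumulate q . d
    return {img: 2.0 - 2.0 * s for img, s in dots.items()}
```

Images sharing no node with the query never appear in the result; they implicitly have the worst possible score of 2.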
Some more details on how to implement the distance function efficiently -- see the paper.

Feature extraction: a 640 x 480 image takes about 0.2 sec.
A query takes about 25 ms on a 50K image database.

Results

Figure 6. Curves showing percentage (y-axis) of the ground truth
query images that make it into the top x percent (x-axis) frames of the query for a 1400 image database. The curves are shown up to 5% of the database size. As discussed in the text, it is crucial for scalable retrieval that the correct images from the database make it to the very top of the query, since verification is feasible only for a tiny fraction of the database when the database grows large. Hence, we are mainly interested in where the curves meet the y-axis. To avoid clutter, this number is given in Table 1 for a larger number of settings. A number of conclusions can be drawn from these results: A larger vocabulary improves retrieval performance. L1-norm gives better retrieval performance than L2-norm. Entropy weighting is important, at least for smaller vocabularies. Our best setting is method A, which gives much better performance than setting T.

Figure 5. The retrieval performance is evaluated using a large ground truth database (6376 images) with groups of four images known to be taken of the same object, but under different conditions. Each image in turn is used as query image, and the three remaining images from its group should ideally be at the top of the query result. In order to compare against less efficient non-hierarchical schemes we also use a subset of the database consisting of around 1400 images. The curves show the distribution of how far the wanted images drop in the query rankings. The points where a larger number of methods meet the y-axis are given in Table 1. Note especially that the use of a larger vocabulary and also the L1-norm give performance improvements.
The performance with various settings was also tested
on the full 6376 image database. It is important to note that
the scores decrease with increasing database size as there
are more images to confuse with. The effect of the shape
of the vocabulary tree is shown in Figure 7. The effects of
defining the vocabulary tree with varying amounts of data
and training cycles are investigated in Figure 8.
Figure 10 shows a snapshot of a demonstration of the
method, running real-time on a 40000 image database of
CD covers, some connected to music. We have so far
tested the method with a database size as high as 1 million
images, more than one order of magnitude larger than any
other work we are aware of, at least in this category of
method. The results are shown in Figure 9. As we could
not obtain ground truth for that size of database, the 6376
image ground truth set was embedded in a database that also
contains several movies: The Bourne Identity, The Matrix,
Braveheart, Collateral, Resident Evil, Almost Famous and
Monsters Inc. Note that all frames from the movies are in the database.

Performance Analysis

• ~6400 images (groups of four, different imaging conditions).
• L1 norm seems to do better.
• A larger vocabulary helps.

Discussion Items?
Back to your project.

Volunteers to go on Tuesday next week?