# nn-1: Near-Neighbor Search

Topics: Applications, Matrix Formulation, Minhashing

## Example Application: Face Recognition

- We have a database of (say) 1 million face images.
- We want to find the most similar images in the database.
- Represent faces by (relatively) invariant values, e.g., ratio of nose width to eye width.

## Face Recognition (2)

- Each image is represented by a large number (say 1000) of numerical features.
- Problem: given a face, find those in the DB that are close in at least ¾ (say) of the features.

## Face Recognition (3)

- Many-one problem: given a new face, see if it is close to any of the 1 million old faces.
- Many-many problem: which pairs of the 1 million faces are similar.

## Simple Solution

- Represent each face by a vector of 1000 values and score the comparisons.
- Sort-of OK for the many-one problem.
- Out of the question for the many-many problem ($10^6 \times 10^6 \times 1000 / 2$ numerical comparisons).
- We can do better!

## Multidimensional Indexes Don't Work

[Figure: an index on dimension 1 with buckets 0-4, 5-9, 10-14, ...; the new face is [6,14,…].]

- Surely we'd better look here.
- Maybe look here too, in case of a slight error.
- But the first dimension could be one of those that is not close.
- So we'd better look everywhere!

## Another Problem: Entity Resolution

- Two sets of 1 million name-address-phone records.
- Some pairs, one from each set, represent the same person.
- Errors of many kinds:
  - Typos, missing middle initial, area-code changes, St./Street, Bob/Robert, etc.

## Entity Resolution (2)

- Choose a scoring system for how close names are.
  - Deduct so much for edit distance > 0; so much for a missing middle initial, etc.
- Similarly score differences in addresses and phone numbers.
- Sufficiently high total score -> records represent the same entity.

## Simple Solution

- Compare each pair of records, one from each set.
- Score the pair.
- Call them the same if the score is sufficiently high.
- Infeasible for 1 million records.
- We can do better!
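The score-and-threshold approach to entity resolution described above can be sketched in a few lines. This is only an illustration, not the scoring system the slides have in mind: the starting score, the per-edit penalty, the threshold, the record fields (`name`, `address`, `phone`), and the function names are all assumptions made for this example.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score(r1: dict, r2: dict, start: int = 100, per_edit: int = 5) -> int:
    """Start from a perfect score and deduct per edit in each field (assumed weights)."""
    total = start
    for field in ("name", "address", "phone"):
        total -= per_edit * edit_distance(r1[field].lower(), r2[field].lower())
    return total


def match_all_pairs(set_a: list, set_b: list, threshold: int = 80) -> list:
    """Naive solution: score every pair, one record from each set."""
    return [(i, j)
            for (i, ra), (j, rb) in product(enumerate(set_a), enumerate(set_b))
            if score(ra, rb) >= threshold]
```

With two sets of 1 million records each, `match_all_pairs` would call `score` $10^6 \times 10^6 = 10^{12}$ times, which is why the slides call the simple solution infeasible.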
## Example: Similar Customers

- Common pattern: looking for sets with a relatively large intersection.
- Represent a customer, e.g., of Netflix, by the set of movies they rented.
- Similar customers have a relatively large fraction of their choices in common.

## Example: Similar Products

- Dual view of product-customer relationship....
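The similar-customers example can be made concrete by treating each customer as a set of movie IDs. The sketch below uses Jaccard similarity (intersection size divided by union size) to stand in for "a relatively large fraction of their choices in common". That choice of measure is an assumption, since the visible slides do not name one. The function names and the sample data are likewise hypothetical, and `invert` is only one plausible reading of the "dual view" in which a product is represented by the set of customers who chose it.

```python
def jaccard(a: set, b: set) -> float:
    """Fraction of the combined items that both sets share."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def similar_pairs(sets_by_key: dict, threshold: float) -> list:
    """Naive all-pairs search for keys whose sets are at least `threshold` similar."""
    keys = sorted(sets_by_key)
    return [(x, y)
            for i, x in enumerate(keys)
            for y in keys[i + 1:]
            if jaccard(sets_by_key[x], sets_by_key[y]) >= threshold]


def invert(sets_by_key: dict) -> dict:
    """Swap roles: map each item to the set of keys whose set contains it."""
    inverted = {}
    for key, items in sets_by_key.items():
        for item in items:
            inverted.setdefault(item, set()).add(key)
    return inverted


# Hypothetical rental data: customer -> set of movie IDs.
rentals = {"ann": {1, 2, 3}, "bob": {2, 3, 4}, "cat": {9}}
print(similar_pairs(rentals, 0.5))          # customers with similar tastes
print(similar_pairs(invert(rentals), 0.5))  # movies rented by similar customers
```

Like the other simple solutions in these slides, `similar_pairs` compares every pair, so it only illustrates the similarity measure and does not scale to millions of customers.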