# Near-Neighbor Search: Applications, Matrix Formulation...



Near-Neighbor Search: applications, matrix formulation, minhashing.

## Example Problem: Face Recognition

- We have a database of (say) 1 million face images.
- We are given a new image and want to find the most similar images in the database.
- Represent faces by (relatively) invariant values, e.g., the ratio of nose width to eye width.

## Face Recognition (2)

- Each image is represented by a large number (say 1000) of numerical features.
- Problem: given the features of a new face, find those in the DB that are close in at least ¾ (say) of the features.

## Face Recognition (3)

- Many-one problem: given a new face, see if it is close to any of the 1 million old faces.
- Many-many problem: which pairs of the 1 million faces are similar?

## Simple Solution

- Represent each face by a vector of 1000 values and score the comparisons.
- Sort of OK for the many-one problem.
- Out of the question for the many-many problem (10^6 × 10^6 × 1000 = 10^15 numerical comparisons).
- We can do better!

## Multidimensional Indexes Don't Work

[Figure: a new face with features [6, 14, …] looked up in an index on dimension 1, whose buckets hold values 0-4, 5-9, 10-14, …]

- Surely we'd better look in the bucket that holds the new face's value for dimension 1.
- Maybe look in a neighboring bucket too, in case of a slight error.
- But the first dimension could be one of those that is not close (a match only has to agree on ¾ of the features), so we'd better look everywhere!

## Another Problem: Entity Resolution

- Two sets of 1 million name-address-phone records.
- Some pairs, one from each set, represent the same person.
- Errors of many kinds:
  - Typos, missing middle initial, area-code changes, St./Street, Bob/Robert, etc.

## Entity Resolution (2)

- Choose a scoring system for how close names are.
  - Deduct so much for edit distance > 0, so much for a missing middle initial, etc.
- Similarly score differences in addresses and phone numbers.
- A sufficiently high total score means the records represent the same entity.

## Simple Solution

- Compare each pair of records, one from each set.
- Score the pair.
- Call them the same if the score is sufficiently high.
- Infeasible for 1 million records: two sets of 10^6 records give 10^6 × 10^6 = 10^12 pairs to score.
- We can do better!

## Yet Another Problem: Finding Similar Documents

- Given a body of documents, e.g., the Web, find pairs of docs that have a lot of text in common....
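The "simple solution" for face recognition can be sketched as a brute-force scan. This is a minimal illustration, assuming each face is a list of numbers and that a single feature counts as "close" when the two values differ by at most a tolerance `tol` (a hypothetical parameter; the slides do not define per-feature closeness). `many_one` compares the query against every stored face once; `many_many` compares every pair, which is what makes the many-many problem quadratic.

```python
from itertools import combinations
from typing import Sequence


def close_in_fraction(a: Sequence[float], b: Sequence[float], tol: float) -> float:
    """Fraction of features on which a and b differ by at most tol."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and of equal length")
    close = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return close / len(a)


def many_one(query, database, tol=0.05, min_fraction=0.75):
    """Indices of database faces close to query on >= min_fraction of features.

    Brute force: one full vector comparison per stored face.
    """
    return [i for i, face in enumerate(database)
            if close_in_fraction(query, face, tol) >= min_fraction]


def many_many(database, tol=0.05, min_fraction=0.75):
    """All index pairs (i, j) with i < j whose faces are similar.

    Brute force over n * (n - 1) / 2 pairs, hopeless for n = 10**6.
    """
    return [(i, j) for i, j in combinations(range(len(database)), 2)
            if close_in_fraction(database[i], database[j], tol) >= min_fraction]


db = [[1.0, 2.0, 3.0, 4.0],
      [1.0, 2.0, 3.0, 9.0],
      [5.0, 6.0, 7.0, 8.0]]
print(many_one([1.0, 2.0, 3.0, 4.02], db))  # [0, 1]
print(many_many(db))                        # [(0, 1)]
```

Face 1 still matches the query because it agrees on 3 of 4 features, exactly the ¾ threshold; this is also why an index on any single dimension cannot safely prune candidates.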
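The entity-resolution scoring system could look like the sketch below. The record fields (`name`, `address`, `phone`), the point deductions, the caps, and the threshold of 70 are all hypothetical choices for illustration; the slides only say to deduct "so much" for each kind of difference. Edit distance here is the standard Levenshtein distance (insertions, deletions, substitutions), and `drop_initials` is a hypothetical helper for detecting a missing middle initial.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def drop_initials(name: str) -> str:
    """Remove tokens that look like middle initials, e.g. 'J.'."""
    return " ".join(w for w in name.split() if not (len(w) == 2 and w.endswith(".")))


def name_penalty(n1: str, n2: str) -> int:
    if n1 == n2:
        return 0
    if drop_initials(n1) == drop_initials(n2):
        return 5  # only a middle initial is missing or different
    return 10 * min(edit_distance(n1, n2), 3)  # capped at 30


def score(r1: dict, r2: dict) -> int:
    """Hypothetical score: start at 100 and deduct points for each difference."""
    total = 100 - name_penalty(r1["name"], r2["name"])
    total -= 5 * min(edit_distance(r1["address"], r2["address"]), 4)  # capped at 20
    if r1["phone"] != r2["phone"]:
        total -= 20
    return total


def same_entity(r1: dict, r2: dict, threshold: int = 70) -> bool:
    return score(r1, r2) >= threshold


a = {"name": "Robert J. Smith", "address": "12 Oak St.", "phone": "555-0101"}
b = {"name": "Robert Smith", "address": "12 Oak Street", "phone": "555-0101"}
c = {"name": "Alice Brown", "address": "9 Elm Ave.", "phone": "555-0199"}
print(score(a, b), same_entity(a, b))  # 75 True
print(score(a, c), same_entity(a, c))  # 30 False
```

This sketch does not handle nickname pairs such as Bob/Robert, which would need a lookup table. Calling `same_entity` on every pair from two sets of 1 million records is the 10^12-comparison brute force the slides rule out.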

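The preview ends before the slides say how to measure "a lot of text in common." One standard choice, assumed here, is the Jaccard similarity of the documents' sets of character k-shingles (all length-k substrings). The shingle length `k` and the `threshold` below are hypothetical parameters, and the all-pairs loop is a brute-force sketch, not the method the full lecture develops.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Set of all length-k substrings (character k-shingles) of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pairs(docs: list[str], k: int = 5,
                  threshold: float = 0.5) -> list[tuple[int, int]]:
    """Brute-force all-pairs comparison; quadratic in the number of documents."""
    sets = [shingles(d, k) for d in docs]
    return [(i, j) for i, j in combinations(range(len(docs)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]


docs = ["abcde", "abcdf", "xyzuv"]
print(jaccard(shingles(docs[0], 3), shingles(docs[1], 3)))  # 0.5
print(similar_pairs(docs, k=3))                             # [(0, 1)]
```

The first two documents share the shingles `abc` and `bcd` out of four distinct shingles in total, giving 2/4 = 0.5. This loop has the same quadratic blowup as the face and record examples; minhashing, named in the lecture title, compresses each shingle set into a short signature whose agreement estimates Jaccard similarity.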
## This document was uploaded on 03/04/2012.
