similarity2



1 Applications of LSH

Entity Resolution
Fingerprints
Similar News Articles

2 Desiderata

- Whatever form we use for LSH, we want:
  1. The time spent performing the LSH should be linear in the number of objects.
  2. The number of candidate pairs should be proportional to the number of truly similar pairs.
- Bucketizing guarantees (1).

3 Entity Resolution

- The entity-resolution problem is to examine a collection of records and determine which refer to the same entity.
  - Entities could be people, events, etc.
- Typically, we want to merge records if their values in corresponding fields are similar.

4 Matching Customer Records

- I once took a consulting job solving the following problem:
  - Company A agreed to solicit customers for Company B, for a fee.
  - They then argued over how many customers.
  - Neither recorded exactly which customers were involved.

5 Customer Records (2)

- Company B had about 1 million records of all its customers.
- Company A had about 1 million records describing customers, some of whom it had signed up for B.
- Records had name, address, and phone, but for various reasons, they could be different for the same person.

6 Customer Records (3)

- Step 1: Design a measure (score) of how similar records are:
  - E.g., deduct points for small misspellings (Jeffrey vs. Jeffery) or same phone with different area code.
- Step 2: Score all pairs of records; report high scores as matches.

7 Customer Records (4)

- Problem: (1 million)^2 is too many pairs of records to score....
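To make Steps 1 and 2 concrete, here is a minimal Python sketch of a record score and the naive all-pairs comparison. Everything specific in it is an assumption for illustration, not the method used in the consulting job: the field names (name, address, phone), the 100-points-per-field scale, the 10-point deduction for a differing area code, and the use of difflib's SequenceMatcher as the misspelling measure.

```python
from difflib import SequenceMatcher

# Hypothetical scale: each of the three fields contributes up to 100 points.
MAX_SCORE = 300


def text_points(a: str, b: str) -> int:
    """100 points for identical strings, fewer for small misspellings."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return round(100 * ratio)


def phone_points(a: str, b: str) -> int:
    """100 for the same number, 90 if only the area code differs, else 0."""
    da = "".join(ch for ch in a if ch.isdigit())
    db = "".join(ch for ch in b if ch.isdigit())
    if da == db:
        return 100
    # Assumes 7-digit local numbers preceded by an area code.
    if len(da) >= 7 and len(db) >= 7 and da[-7:] == db[-7:]:
        return 90
    return 0


def score(rec_a: dict, rec_b: dict) -> int:
    """Step 1: similarity score for two records."""
    return (
        text_points(rec_a["name"], rec_b["name"])
        + text_points(rec_a["address"], rec_b["address"])
        + phone_points(rec_a["phone"], rec_b["phone"])
    )


def score_all_pairs(a_records, b_records, threshold):
    """Step 2, done naively: score every (A, B) pair and keep high scores."""
    matches = []
    for i, rec_a in enumerate(a_records):
        for j, rec_b in enumerate(b_records):
            s = score(rec_a, rec_b)
            if s >= threshold:
                matches.append((i, j, s))
    return matches


def pair_count(n_a: int, n_b: int) -> int:
    """Number of score() calls the naive approach makes."""
    return n_a * n_b
```

With about 1 million records on each side, pair_count(1_000_000, 1_000_000) is 10^12 calls to score, which is the problem the last slide above points to. The threshold passed to score_all_pairs is likewise a free parameter in this sketch.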