
# similarity2 - Applications of LSH Entity Resolution...

1 Applications of LSH
- Entity Resolution
- Fingerprints
- Similar News Articles

2 Desiderata
- Whatever form we use for LSH, we want:
  1. The time spent performing the LSH should be linear in the number of objects.
  2. The number of candidate pairs should be proportional to the number of truly similar pairs.
- Bucketizing guarantees (1).

3 Entity Resolution
- The entity-resolution problem is to examine a collection of records and determine which refer to the same entity.
  - Entities could be people, events, etc.
- Typically, we want to merge records if their values in corresponding fields are similar.
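
To make desideratum (1) concrete for records like those in the entity-resolution problem, here is a minimal sketch of bucketizing. The record layout, the sample data, and the choice of the phone field as the bucket key are all assumptions made for illustration; they are not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations


def bucketize(objects, key):
    """Place each object in exactly one bucket using a single pass over the input."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[key(obj)].append(obj)
    return buckets


def candidate_pairs(buckets):
    """Yield every pair of objects that landed in the same bucket."""
    for members in buckets.values():
        yield from combinations(members, 2)


# Hypothetical records: (name, address, phone).
records = [
    ("Jeffrey Smith", "12 Oak St", "555-0101"),
    ("Jeffery Smith", "12 Oak Street", "555-0101"),
    ("Ann Lee", "9 Elm Ave", "555-0199"),
]

# Hypothetical bucket key: the phone field (index 2).
buckets = bucketize(records, key=lambda r: r[2])
pairs = list(candidate_pairs(buckets))
```

Each record is hashed once, so the bucketizing pass is linear in the number of records. Whether the number of candidate pairs stays proportional to the number of truly similar pairs, as desideratum (2) asks, depends on the choice of key, which this sketch does not address.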

4 Matching Customer Records
- I once took a consulting job solving the following problem:
  - Company A agreed to solicit customers for Company B, for a fee.
  - They then argued over how many customers.
  - Neither recorded exactly which customers were involved.

5 Customer Records – (2)
- Company B had about 1 million records of all its customers.
- Company A had about 1 million records describing customers, some of whom it had signed up for B.
- Records had name, address, and phone, but for various reasons, they could be different for the same person.

6 Customer Records – (3)
- Step 1: Design a measure (“score”) of how similar records are:
  - E.g., deduct points for small misspellings (“Jeffrey” vs. “Jeffery”) or same phone with different area code.
- Step 2: Score all pairs of records; report high scores as matches.

7 Customer Records – (4)
- Problem: (1 million)² is too many pairs of records to score.
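
The two steps and the pair-count problem can be sketched as follows. This is only an illustration: the slides do not give the actual scoring rules, so the per-field scoring with `difflib.SequenceMatcher`, the 100-points-per-field scale, and the sample records are assumptions.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(rec_a, rec_b, points_per_field=100):
    """Give each field up to points_per_field, losing points as the fields differ."""
    return sum(
        points_per_field * field_similarity(fa, fb)
        for fa, fb in zip(rec_a, rec_b)
    )


# Hypothetical records: (name, address, phone).
a = ("Jeffrey Smith", "12 Oak St", "650-555-0101")
b = ("Jeffery Smith", "12 Oak St", "415-555-0101")  # misspelling, different area code
c = ("Ann Lee", "9 Elm Ave", "650-555-0199")

same_person_score = score(a, b)
different_person_score = score(a, c)

# Step 2 on the full data: every record of A against every record of B.
n_a = 1_000_000
n_b = 1_000_000
total_pairs = n_a * n_b  # 10**12 calls to score()
```

A small misspelling or a changed area code costs only a few points, so `a` and `b` score well above `a` and `c`. The difficulty is the last line: scoring all pairs means 10^12 evaluations of the scoring function.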

