cis6930fa11_IterativeBlocking

cis6930fa11_IterativeBlocking - Entity resolution with...

Entity Resolution with Iterative Blocking
Presented by Shuang Lin
slin@cise.ufl.edu

Outline
  Background
  Iterative Blocking Algorithm
  The Lego algorithm
  Disk-Based Iterative Blocking
  Experimental Evaluation

Background
  Entity Resolution (ER): identifies records in a database that refer to the same real-world entity.
  Motivation: an exhaustive ER process computes the similarity between every pair of records, which can be very expensive for large databases.
  Most blocking techniques compare records within the same block only and do not exploit results from other blocks.

Motivation example

  CRITERION  PARTITIONS BY          b_{-,1}  b_{-,2}  b_{-,3}
  SC1        Zip code               r        s,t      u,v
  SC2        1st char of last name  r,s      t        u,v

  RECORD  NAME          ADDRESS (ZIP)  EMAIL
  r       John Doe      02139          jdoe@yahoo
  s       John Doe      94305
  t       J. Foe        94305          jdoe@yahoo
  u       Bobbie Brown  12345          bob@google
  v       Bobbie Brown  12345          bob@google

Motivation example
  Blocking result: {<r,s>, t, <u,v>}

  CRITERION  b_{-,1}  b_{-,2}  b_{-,3}
  SC1        r        s,t      u,v
  SC2        r,s      t        u,v

  CRITERION  b_{-,1}  b_{-,2}  b_{-,3}
  SC1        r        s,t      <u,v>
  SC2        r,s      t        u,v

  CRITERION  b_{-,1}  b_{-,2}  b_{-,3}
  SC1        r        s,t      <u,v>
  SC2        <r,s>    t        <u,v>

Iterative Blocking Algorithm
  ER algorithm
    Input: a partition of records, P_i
    Output: a partition of records, P_o
    P_o dominates P_i
  Single blocking criterion: SC_j maps a record r to one or more blocks b_{j,k}
  Multiple blocking criterion: MC(r) = \bigcup_{j=1}^{n} SC_j(r)

Iterative Blocking Algorithm
  The iterative blocking process identifies matching records by running a core ER algorithm on each block and reflecting the resolution...
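The idea in the motivation example can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm from the slides: the preview does not give the match rule, so `match` (two merged records match if they share a name or an email) is an assumption, and the repeat-until-nothing-changes loop is a simplification. The names `build_blocks`, `resolve_block`, `plain_blocking`, and `iterative_blocking` are hypothetical.

```python
from itertools import combinations

# Records from the motivation example; s has no email in the table.
RECORDS = {
    "r": {"name": "John Doe", "zip": "02139", "email": "jdoe@yahoo"},
    "s": {"name": "John Doe", "zip": "94305", "email": None},
    "t": {"name": "J. Foe", "zip": "94305", "email": "jdoe@yahoo"},
    "u": {"name": "Bobbie Brown", "zip": "12345", "email": "bob@google"},
    "v": {"name": "Bobbie Brown", "zip": "12345", "email": "bob@google"},
}

def sc1(rec):
    """SC1: block by zip code."""
    return ("SC1", rec["zip"])

def sc2(rec):
    """SC2: block by first character of the last name."""
    return ("SC2", rec["name"].split()[-1][0])

def build_blocks(records, criteria):
    """Place every record in one block per criterion (the role of MC(r))."""
    blocks = {}
    for rid, rec in records.items():
        for criterion in criteria:
            blocks.setdefault(criterion(rec), set()).add(rid)
    return blocks

def values(cluster, field):
    """All non-missing values of a field across a merged record."""
    return {RECORDS[rid][field] for rid in cluster} - {None}

def match(a, b):
    # ASSUMED match rule (not given in the slides): shared name or shared email.
    return bool(values(a, "name") & values(b, "name")
                or values(a, "email") & values(b, "email"))

def resolve_block(members, cluster_of):
    """Core ER on one block. Merges are written into cluster_of.
    Returns True if at least one merge happened."""
    changed = False
    merged = True
    while merged:
        merged = False
        present = {cluster_of[rid] for rid in members}
        for a, b in combinations(sorted(present, key=sorted), 2):
            if match(a, b):
                union = a | b
                for rid in union:
                    cluster_of[rid] = union
                merged = changed = True
                break
    return changed

def plain_blocking(records, criteria):
    """Resolve each block in isolation, then combine the per-block merges."""
    cluster_of = {rid: frozenset([rid]) for rid in records}
    for members in build_blocks(records, criteria).values():
        local = {rid: frozenset([rid]) for rid in members}  # sees no other block
        resolve_block(members, local)
        for cluster in set(local.values()):
            union = frozenset().union(*(cluster_of[rid] for rid in cluster))
            for rid in union:
                cluster_of[rid] = union
    return set(cluster_of.values())

def iterative_blocking(records, criteria):
    """Share merges across blocks and revisit blocks until nothing changes."""
    blocks = build_blocks(records, criteria)
    cluster_of = {rid: frozenset([rid]) for rid in records}
    changed = True
    while changed:
        changed = False
        for members in blocks.values():
            if resolve_block(members, cluster_of):
                changed = True
    return set(cluster_of.values())

if __name__ == "__main__":
    print(plain_blocking(RECORDS, [sc1, sc2]))
    print(iterative_blocking(RECORDS, [sc1, sc2]))
```

Under the assumed match rule, `plain_blocking` yields the partition {<r,s>, t, <u,v>}, which agrees with the blocking result listed in the example. `iterative_blocking` additionally merges t into <r,s>: once r and s are merged in the SC2 block, the merged record carries r's email into the zip-code block that holds s and t.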

This note was uploaded on 11/09/2011 for the course CIS 6930 taught by Professor Staff during the Fall '08 term at University of Florida.
