Sorting phase requires approximately O(n log n) comparisons



...orary table with original table
  – If w < log n, the comparisons are dominated by the sorting phase
• Sorting phase requires approximately O(n log n) comparisons.

Sangmi Lee Pallickara, CS480 Principles of Data Management, Spring 2013 (3/4/13)

Comparison

                        Blocking             Windowing          Full enum.
Number of comparisons   (n^2/b - n)/2        (w-1)(n - w/2)     (n^2 - n)/2
Key generation          O(n)                 O(n)               n/a
Sorting                 O(n log n)           O(n log n)         n/a
Detection               O(n^2/b)             O(wn)              O(n^2)
Overall                 O(n(n/b + log n))    O(n(w + log n))    O(n^2)

Duplicate Detection Algorithms: For Complex Relationships

• Pairwise comparison algorithms
• Algorithms for data with complex relationships
• Clustering algorithms

Hierarchical relationships

[Figure: example location hierarchy with the nodes North America; USA, Canada, Mexico; Colorado, Arizona, Nevada; Fort Collins, Denver, Boulder]

• Two candidates on level l_{i+1} may only be duplicates if their parents on level l_i are duplicates (or they share the same parent).
  – e.g. the cities under different states do not need to be compared.
• We can prune comparisons based on duplicate classifications previously performed on ancestors.
• Traverse the tree in a top-down fashion
  – Candidates at the topmost level l_1 are compared before we proceed to level l_2, and so on.
  – 1:N relationships between parent and child elements

SXNM algorithm

• Does not assume a 1:N relationship between parent and child elements
• Traverses the hierarchy from bottom to top
  – Detect duplicate authors that are nested under non-duplicate books.
• Uses the output of level l_i for comparing items of level l_{i-1}

Relationships forming a graph

• In general, relationships between candidates can form a graph
• Duplicate detection on relationship graphs as graph algorithms

...
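The "Number of comparisons" row of the Comparison table can be checked by brute-force enumeration. The sketch below is an illustration, not part of the slides. It assumes n is the number of records, b is the number of equal-size blocks (with b dividing n), and w is the window size, and that windowing compares each record with the next w-1 records in sorted order. The function names are hypothetical.

```python
from itertools import combinations


def full_enumeration_pairs(n):
    """All unordered pairs of n records."""
    return list(combinations(range(n), 2))


def blocking_pairs(n, b):
    """Pairs compared when n records are split into b equal-size blocks.

    Assumption: b divides n and only records in the same block are compared.
    """
    size = n // b
    pairs = []
    for start in range(0, n, size):
        pairs.extend(combinations(range(start, start + size), 2))
    return pairs


def windowing_pairs(n, w):
    """Pairs compared by a window of size w sliding over n sorted records.

    Assumption: record i is compared with records i+1 .. i+w-1.
    """
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + w, n))]


if __name__ == "__main__":
    n, b, w = 12, 3, 4
    print("full enumeration:", len(full_enumeration_pairs(n)), (n * n - n) // 2)
    print("blocking:        ", len(blocking_pairs(n, b)), (n * n // b - n) // 2)
    print("windowing:       ", len(windowing_pairs(n, w)), (w - 1) * (n - w / 2))
```

For n = 12, b = 3, w = 4, each enumerated count agrees with its closed form: 66 pairs for full enumeration, 18 for blocking, and 30 for windowing.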
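The top-down pruning rule for hierarchical relationships can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the slides. Candidates are plain strings, each child has exactly one parent (1:N), and `is_duplicate` is a hypothetical pairwise classifier that ignores case and periods. The spelling variants in the sample data are invented for the example.

```python
from itertools import combinations


def is_duplicate(a, b):
    """Hypothetical classifier: equal after dropping periods and case."""
    def normalize(s):
        return s.replace(".", "").lower()
    return normalize(a) == normalize(b)


def top_down_duplicates(levels, parent):
    """Compare candidates level by level, topmost level first.

    levels: list of candidate lists, index 0 is the topmost level.
    parent: dict mapping each non-top candidate to its single parent.
    A pair below the top level is compared only if the two parents are
    the same element or were already classified as duplicates.
    Returns (set of duplicate pairs, number of comparisons performed).
    """
    duplicates = set()
    comparisons = 0
    for depth, candidates in enumerate(levels):
        for a, b in combinations(candidates, 2):
            if depth > 0:
                pa, pb = parent[a], parent[b]
                if pa != pb and frozenset((pa, pb)) not in duplicates:
                    continue  # pruned: ancestors are not duplicates
            comparisons += 1
            if is_duplicate(a, b):
                duplicates.add(frozenset((a, b)))
    return duplicates, comparisons


# Invented sample data: three levels (countries, states, cities).
levels = [
    ["USA", "U.S.A.", "Canada"],
    ["Colorado", "colorado", "Ontario"],
    ["Denver", "DENVER", "Toronto"],
]
parent = {
    "Colorado": "USA", "colorado": "U.S.A.", "Ontario": "Canada",
    "Denver": "Colorado", "DENVER": "colorado", "Toronto": "Ontario",
}

if __name__ == "__main__":
    dups, n_comparisons = top_down_duplicates(levels, parent)
    print(sorted(tuple(sorted(p)) for p in dups), n_comparisons)
```

On this sample, 5 comparisons are performed instead of the 9 needed without pruning, because cities and states under non-duplicate parents are never compared. The sketch does not compute transitive closure of duplicate pairs, which a fuller treatment would likely need.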

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.
