

Sangmi Lee Pallickara, CS480, Spring 2012; CS480 Principles of Data Management, Spring 2013

Example

MID  TITLE           YEAR  REVIEW
m1   Big Fish        2003  8.1/10
m2   Public Enemies  2009  7.4/10
m3   Public Enemies  09    null
m4   Big Fish        2003  7.5/10

2. Decide the relevant attributes
   –  Movie title is more relevant than Review
   –  Object Description OD(c): given a candidate c, the object description of c
   –  e.g.
          SELECT Title, Year
          FROM Movie
          WHERE MID = 'm1'
      will return the object description OD(c) of movie m1.
      OD(m1) = {(Title, Big Fish), (Year, 2003)}

3. Decide the output
   –  Partition the set of candidates based on information provided by their object descriptions
   –  Each partition contains candidates that represent the same real-world object
   –  PMovie = {{m1, m4}, {m2, m3}}

Errors in the Duplication detection
•  Duplication detection generates errors as well.
   –  Possible false negative
      •  The similarity is not big enough
   –  Possible false positive
      •  The similarity was high but they were not duplicates

Similarity Functions
•  Obtaining partitions...
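The movie example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the slides. The slides do not give a similarity function, so `same_year_suffix`, `is_duplicate`, and the greedy `partition` helper are hypothetical, and years are stored as strings so that the value "09" survives as written.

```python
# Movie candidates from the example table (REVIEW is left out: it was
# judged less relevant than the title).
MOVIES = {
    "m1": {"Title": "Big Fish", "Year": "2003"},
    "m2": {"Title": "Public Enemies", "Year": "2009"},
    "m3": {"Title": "Public Enemies", "Year": "09"},
    "m4": {"Title": "Big Fish", "Year": "2003"},
}


def object_description(mid):
    """OD(c): the (attribute, value) pairs chosen to describe candidate c."""
    return {(attr, MOVIES[mid][attr]) for attr in ("Title", "Year")}


def same_year_suffix(a, b):
    """Hypothetical rule: two years match if their last two digits agree."""
    return a[-2:] == b[-2:]


def is_duplicate(c1, c2, year_match):
    """Hypothetical similarity test: equal titles and matching years."""
    od1 = dict(object_description(c1))
    od2 = dict(object_description(c2))
    same_title = od1["Title"].lower() == od2["Title"].lower()
    return same_title and year_match(od1["Year"], od2["Year"])


def partition(candidates, year_match):
    """Greedily place each candidate in the first partition it matches."""
    parts = []
    for c in candidates:
        for p in parts:
            # Compare against the first member of each existing partition.
            if is_duplicate(c, p[0], year_match):
                p.append(c)
                break
        else:
            parts.append([c])
    return parts


tolerant = partition(sorted(MOVIES), same_year_suffix)
strict = partition(sorted(MOVIES), lambda a, b: a == b)
print(tolerant)  # [['m1', 'm4'], ['m2', 'm3']]
print(strict)    # [['m1', 'm4'], ['m2'], ['m3']]
```

With the tolerant year rule the result matches PMovie = {{m1, m4}, {m2, m3}}. With exact year comparison, m3 ends up alone, which is a false negative: "09" and "2009" are not similar enough under that rule. The tolerant rule has the opposite risk, since it would also treat 1909 and 2009 as the same year and could produce a false positive.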