Using a similarity-based duplicate classification



CS480 Principles of Data Management, Spring 2013 (Sangmi Lee Pallickara, 3/1/13)

Example: Media database

•  Media database that stores music tracks and an artist
•  Using a similarity-based duplicate classification
   –  {1,2},{3,4},{3,5},{5,6},{6,7},{5,8},{9,10},{9,11},{10,11},{9,12},{10,12}

id   artist                  track
1    Tori Amos               Beekeeper
2    Amos, Tori              Beekeeper
3    Beethoven               Symphony Nr.5
4    Ludwig van Beethoven    5th Symphony
5    Beethoven               Symphony Nr.1
6    Beethoven               Symphony Nr.2
7    Beethoven               Symphony Nr.3
8    Shubert                 Symphony Nr.1
9    AC DC                   Are you ready
10   AC/DC                   Are you ready
11   AC/DC                   Are U ready
12   Bob Dylan               Are you Ready
13   Michael Jackson         Thriller

•  Duplicates are {1,2}, {3,4}, and {9,10,11}
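The gap between the pair list and the stated duplicates can be checked with a short script. This is a minimal sketch under two assumptions that the preview does not confirm: that the pair list is the output of the similarity-based classifier, and that flagged pairs are merged into groups by transitive closure. The names `flagged`, `true_groups`, and `clusters` are made up for illustration.

```python
from itertools import combinations

# Pairs from the slide, assumed to be what the classifier flagged.
flagged = {(1, 2), (3, 4), (3, 5), (5, 6), (6, 7), (5, 8),
           (9, 10), (9, 11), (10, 11), (9, 12), (10, 12)}

# True duplicate groups as stated on the slide.
true_groups = [{1, 2}, {3, 4}, {9, 10, 11}]
true_pairs = {p for g in true_groups for p in combinations(sorted(g), 2)}

# Pairwise quality of the flagged pairs against the true pairs.
correct = flagged & true_pairs
precision = len(correct) / len(flagged)
recall = len(correct) / len(true_pairs)


def clusters(pairs, ids):
    """Group ids by transitive closure of the pairs (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    # Report only groups with more than one record.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


merged = clusters(flagged, range(1, 14))
print(f"precision={precision:.2f} recall={recall:.2f}")
print(merged)
```

Under these assumptions every true duplicate pair is flagged, but only 5 of the 11 flagged pairs are correct, and transitive closure merges records 3 to 8 into one group and pulls record 12 into the AC/DC group.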