Using a similarity-based duplicate classification


CS480 Principles of Data Management, Spring 2013
Sangmi Lee Pallickara, 3/1/13

Example: Media database

•  Media database that stores music tracks and an ...
•  Duplicates are {1,2}, {3,4}, and {9,10,11}
•  Using a similarity-based duplicate classification
   –  {1,2}, {3,4}, {3,5}, {5,6}, {6,7}, {5,8}, {9,10}, {9,11}, {10,11}, {9,12}, {10,12}

id   artist                 track
1    Tori Amos              Beekeeper
2    Amos, Tori             Beekeeper
3    Beethoven              Symphony Nr.5
4    Ludwig van Beethoven   5th Symphony
5    Beethoven              Symphony Nr.1
6    Beethoven              Symphony Nr.2
7    Beethoven              Symphony Nr.3
8    Shubert                Symphony Nr.1
9    AC DC                  Are you ready
10   AC/DC                  Are you ready
11   AC/DC                  Are U ready
12   Bob Dylan              Are you Ready
13   Michael Jackson        Thriller

0.9 ...
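The gap between the true duplicates and the pairs reported by the similarity-based classification can be checked mechanically. The following is a minimal sketch, not taken from the slides: it copies the pair list and the true duplicate groups from the example, counts which reported pairs are correct, and merges the reported pairs transitively with a small union-find. The names `classified_pairs`, `true_groups`, and `connected_groups` are hypothetical, and treating reported pairs as transitive is an assumption made for illustration.

```python
from itertools import combinations

# Pairs reported by the similarity-based classification (copied from the example).
classified_pairs = [(1, 2), (3, 4), (3, 5), (5, 6), (6, 7), (5, 8),
                    (9, 10), (9, 11), (10, 11), (9, 12), (10, 12)]

# True duplicate groups stated in the example.
true_groups = [{1, 2}, {3, 4}, {9, 10, 11}]


def connected_groups(pairs):
    """Merge pairs transitively (union-find) and return sorted groups of ids."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Every pair of ids that belongs to the same true duplicate group.
true_pairs = {p for g in true_groups for p in combinations(sorted(g), 2)}

correct = [p for p in classified_pairs if p in true_pairs]
spurious = [p for p in classified_pairs if p not in true_pairs]
clusters = connected_groups(classified_pairs)

print("correct pairs: ", correct)
print("spurious pairs:", spurious)
print("clusters after transitive merging:", clusters)
```

Under these assumptions, all five true duplicate pairs are among the eleven reported pairs, and the remaining six are false matches. Merging transitively pulls every Beethoven and Shubert symphony (ids 3 to 8) into one group and adds Bob Dylan (id 12) to the AC/DC group.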

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.
