lecture5-1

# String and candidate comparison with the Jaccard coefficient (JC)

CS480 Principles of Data Management, Spring 2013 (2/19/13)
Sangmi Lee Pallickara, CS480, Spring 2012

[…] the union of these two sets.

• What are P and Q?
  - Set of tokens from strings
  - Complete description about data (candidates)

## String comparison with JC

• Given a tokenization function tokenize(s) that tokenizes a string s into a set of string tokens {s1, s2, … , sn}
• We compute the Jaccard coefficient of two strings s1 and s2 as the size of the intersection of their token sets divided by the size of the union:

$$\mathrm{StringJaccard}(s_1, s_2) = \frac{|\,\mathrm{tokenize}(s_1) \cap \mathrm{tokenize}(s_2)\,|}{|\,\mathrm{tokenize}(s_1) \cup \mathrm{tokenize}(s_2)\,|}$$

## Candidate comparison with JC

• Given two candidates, the Jaccard coefficient of two candidates c1 and c2 is given by

$$\mathrm{DescriptionJaccard}(c_1, c_2) = \frac{|\,\mathrm{OD}(c_1) \cap \mathrm{OD}(c_2)\,|}{|\,\mathrm{OD}(c_1) \cup \mathrm{OD}(c_2)\,|}$$

## Example

• A Person is a candidate type with a description attribute Name.

Tomas Sean Connery
Sir Sean Connery

Tokenize(Tomas Sean Connery) = {Tomas, Sean, Connery}
Tokenize(Sir Sean Connery) = {Sir, Sean, Connery}

## Example continued

Tokenize(Tomas Sean Connery) = {Tomas, Sean, Connery}
Tokenize(Sir Sean Connery) = {Sir, S...
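
The two Jaccard definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation. It assumes whitespace tokenization (the slides do not say how tokenize(s) splits a string), it represents OD(c) as a plain set of (attribute, token) pairs, and it returns 0.0 when both sets are empty. The helper `person_od` is a hypothetical name introduced only for this example.

```python
def tokenize(s):
    """Split a string into a set of tokens.

    Assumption: tokens are whitespace-separated; the slides do not
    specify the tokenization rule.
    """
    return set(s.split())


def jaccard(a, b):
    """Jaccard coefficient of two sets: |a intersect b| / |a union b|."""
    union = a | b
    if not union:
        # Assumption: two empty sets are given similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


def string_jaccard(s1, s2):
    """StringJaccard(s1, s2) over the token sets of the two strings."""
    return jaccard(tokenize(s1), tokenize(s2))


def person_od(name):
    """Hypothetical OD(c) for a Person candidate.

    Assumption: the description is the set of (attribute, token)
    pairs built from the Name attribute.
    """
    return {("Name", token) for token in tokenize(name)}


def description_jaccard(od1, od2):
    """DescriptionJaccard(c1, c2) over two candidate descriptions."""
    return jaccard(od1, od2)


if __name__ == "__main__":
    a = "Tomas Sean Connery"
    b = "Sir Sean Connery"
    # Shared tokens: Sean, Connery (2); all tokens: Tomas, Sir, Sean, Connery (4)
    print(string_jaccard(a, b))                             # 0.5
    print(description_jaccard(person_od(a), person_od(b)))  # 0.5
```

Under these assumptions the two names share two of four distinct tokens, so both functions give 0.5 for the Sean Connery example.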