

2/19/13
CS480 Principles of Data Management, Spring 2013
Sangmi Lee Pallickara, CS480, Spring 2012

[...] the union of these two sets.
•  What are P and Q?
   –  Set of tokens from strings
   –  Complete description about data (candidates)

String comparison with JC
•  Given a tokenization function tokenize(s) that tokenizes a string s into a set of string tokens {s1, s2, … , sn}
•  We compute the Jaccard coefficient of two strings s1 and s2:

$$\mathrm{StringJaccard}(s_1, s_2) = \frac{|\mathrm{tokenize}(s_1) \cap \mathrm{tokenize}(s_2)|}{|\mathrm{tokenize}(s_1) \cup \mathrm{tokenize}(s_2)|}$$

Candidate comparison with JC
•  Given two candidates c1 and c2, the Jaccard coefficient of the two candidates is given by:

$$\mathrm{DescriptionJaccard}(c_1, c_2) = \frac{|OD(c_1) \cap OD(c_2)|}{|OD(c_1) \cup OD(c_2)|}$$

Example
•  A Person is a candidate type with a description attribute Name.
   Tomas Sean Connery
   Sir Sean Connery
   Tokenize(Tomas Sean Connery) = {Tomas, Sean, Connery}
   Tokenize(Sir Sean Connery) = {Sir, Sean, Connery}

Example continued
   Tokenize(Tomas Sean Connery) = {Tomas, Sean, Connery}
   Tokenize(Sir Sean Connery) = {Sir, S...
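The string Jaccard coefficient defined above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes tokenize splits on whitespace (the slides do not say how tokens are produced), and the names tokenize and string_jaccard as well as the 0.0 result for two empty strings are choices made here for the sketch.

```python
def tokenize(s: str) -> set[str]:
    """Split a string into a set of tokens.

    Assumption: tokens are whitespace-separated words; the slides only
    say that tokenize(s) returns a set of string tokens.
    """
    return set(s.split())


def string_jaccard(s1: str, s2: str) -> float:
    """Size of the token intersection divided by size of the token union."""
    t1, t2 = tokenize(s1), tokenize(s2)
    union = t1 | t2
    if not union:
        # Assumption: two empty token sets are given similarity 0.0
        # to avoid dividing by zero.
        return 0.0
    return len(t1 & t2) / len(union)


# The two Person names from the example share {Sean, Connery} (2 tokens)
# out of {Tomas, Sir, Sean, Connery} (4 tokens).
print(string_jaccard("Tomas Sean Connery", "Sir Sean Connery"))  # 0.5
```

With whitespace tokenization, the two names share 2 of 4 distinct tokens, so the coefficient is 2/4 = 0.5. A candidate-level version would look the same, with the token sets replaced by the description sets OD(c1) and OD(c2).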

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.
