What are p and q set of tokens from strings complete

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: strings into a set of tokens •  Typographical errors within tokens are penalized. –  The similarity measures of Sean Connery and Shawn Conery will be zero –  tokenize() –  “Sean Connery” results in a set of tokens {Sean, Connery} •  Similarity is less sensi/ve to word swaps compared to similarity measures that consider a string as a whole –  Sean Connery and Connery Sean will yield a maximum similarity score –  Both strings contain the exact same tokens Sangmi Lee Pallickara, CS480, Spring 2012 CS480 Principles of Data Management 27 Spring 2013 Sangmi Lee Pallickara, CS480, Spring 2012 CS480 Principles of Data Management 28 Spring 2013 Jaccard Coefficient •  Compare two sets P and Q with the following formula: Jaccard( P, Q) = Jaccard Similarity | P Q | | P Q | •  Measures the frac/on of the data that is shared between P and Q •  € Compared to all data available in th...
View Full Document

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.

Ask a homework question - tutors are online