lecture5-3

# Jaro Similarity: CRATE vs. TRACE (m = 3), DIXON vs. DICKSONX (m = 4), Sangmi Lee Pallickara


CS480 Principles of Data Management, Spring 2013 (Sangmi Lee Pallickara, CS480, Spring 2012)

## Jaro Similarity: Formula

$$
\mathrm{JaroSim}(s_1, s_2) =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3} \times \left( \dfrac{|m|}{|s_1|} + \dfrac{|m|}{|s_2|} + \dfrac{|m| - 0.5\,t}{|m|} \right) & \text{otherwise}
\end{cases}
$$

- m: number of matching characters
- t: number of transpositions

## Finding m

- Keep track of their positions within the two strings
  - i and j
  - Do not differ by more than half of the length of the shorter string: $|i - j| \le 0.5 \times \min(|s_1|, |s_2|)$
- CRATE vs. TRACE (m = 3)
- DIXON vs. DICKSONX (m = 4)

## Finding t

- Once all common characters have been identified, both strings are traversed sequentially
- Determine the number t of transpositions of common characters
  - A transposition occurs when the i-th common character of s1 is not equal to the i-th common character of s2.
- Example
  - CRATE vs. TRACE (t =...

## Jaro Similarity: example

- s1 = Prof. John Doe (Prof._John_Doe)
- s2 = Dr. John Doe (Dr._John_Doe)
- The set of common characters σ is
  - Within the maximum difference (0.5 x min(|s1|,|s2|))
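The formula and the two counting steps for m and t can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the function names `find_matches` and `jaro_sim` are hypothetical, and the greedy left-to-right pairing of characters is an assumption, since the slides do not say how ties between candidate matches are resolved. The window is the one stated on the slides, 0.5 × min(|s1|, |s2|), which is not the same window used by every published Jaro variant.

```python
def find_matches(s1, s2):
    """Return the common characters of s1 and s2, each list in its own string's order."""
    # Window from the slides: positions i and j may differ by at most
    # half the length of the shorter string.
    max_diff = 0.5 * min(len(s1), len(s2))
    used = [False] * len(s2)  # positions of s2 that are already paired
    matched1 = []             # common characters in s1 order
    for i, ch in enumerate(s1):
        # Assumption: pair each character with the first free candidate in s2.
        for j, other in enumerate(s2):
            if not used[j] and ch == other and abs(i - j) <= max_diff:
                used[j] = True
                matched1.append(ch)
                break
    matched2 = [c for j, c in enumerate(s2) if used[j]]  # common characters in s2 order
    return matched1, matched2


def jaro_sim(s1, s2):
    """Jaro similarity as written on the slides, with the 0.5 * t term."""
    matched1, matched2 = find_matches(s1, s2)
    m = len(matched1)
    if m == 0:
        return 0.0
    # t: positions where the i-th common character of s1 differs
    # from the i-th common character of s2.
    t = sum(a != b for a, b in zip(matched1, matched2))
    return (m / len(s1) + m / len(s2) + (m - 0.5 * t) / m) / 3


if __name__ == "__main__":
    print(len(find_matches("CRATE", "TRACE")[0]))     # 3, the m given on the slide
    print(len(find_matches("DIXON", "DICKSONX")[0]))  # 4, the m given on the slide
```

Under these assumptions the sketch reproduces the two values of m quoted on the slides. Keeping the matched characters as two ordered lists makes the count of t a simple position-by-position comparison.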

## This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.
