

CS480 Principles of Data Management, Spring 2013
Sangmi Lee Pallickara, CS480, Spring 2012

Jaro Similarity: Formula

$$
\mathrm{JaroSim}(s_1, s_2) =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3} \times \left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - 0.5\,t}{m} \right) & \text{otherwise}
\end{cases}
$$

–  m: number of matching characters
–  t: number of transpositions

Finding m

•  Keep track of their positions within the two strings
–  i and j
–  Do not differ by more than half of the length of the shorter string: $|i - j| \le 0.5 \times \min(|s_1|, |s_2|)$
–  CRATE vs. TRACE (m = 3)
–  DIXON vs. DICKSONX (m = 4)

Finding t

•  Once all common characters have been identified, both strings are traversed sequentially
•  Determine the number t of transpositions of common characters
–  A transposition occurs when the i-th common character of s1 is not equal to the i-th common character of s2.
–  Within the maximum difference (0.5 × min(|s1|, |s2|))
•  Example
–  CRATE vs. TRACE (t =...

Jaro Similarity: example

•  s1 = Prof. John Doe (Prof._John_Doe)
•  s2 = Dr. John Doe (Dr._John_Doe)
•  The set of common characters σ is
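The formula and the two counting steps from the slides can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation. The function names `count_matches_and_transpositions` and `jaro_similarity` are hypothetical, and the greedy rule "pair each character of s1 with the first unused equal character of s2 inside the window" is an assumption, since the slides do not say how ties are resolved. The window follows the slides' definition, |i − j| ≤ 0.5 × min(|s1|, |s2|), which differs from the window used by some library implementations.

```python
from typing import Tuple


def count_matches_and_transpositions(s1: str, s2: str) -> Tuple[int, int]:
    """Return (m, t) as defined in the slides.

    Assumption: matches are assigned greedily, left to right.
    """
    shorter = min(len(s1), len(s2))
    used = [False] * len(s2)   # positions of s2 already paired
    common1 = []               # common characters in s1 order

    for i, ch in enumerate(s1):
        for j, other in enumerate(s2):
            # |i - j| <= 0.5 * shorter, written without floats
            if not used[j] and ch == other and 2 * abs(i - j) <= shorter:
                used[j] = True
                common1.append(ch)
                break

    # Common characters in s2 order
    common2 = [c for j, c in enumerate(s2) if used[j]]

    # t counts positions where the i-th common characters differ
    t = sum(a != b for a, b in zip(common1, common2))
    return len(common1), t


def jaro_similarity(s1: str, s2: str) -> float:
    m, t = count_matches_and_transpositions(s1, s2)
    if m == 0:
        return 0.0
    return (m / len(s1) + m / len(s2) + (m - 0.5 * t) / m) / 3


# The match counts agree with the slide examples:
print(count_matches_and_transpositions("CRATE", "TRACE")[0])     # 3
print(count_matches_and_transpositions("DIXON", "DICKSONX")[0])  # 4
```

Because t counts every mismatched position, the formula halves it (0.5 t); implementations that define t as half the number of mismatches use m − t in the last term instead.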