Of the two strings at most the length of the longer


CS480 Principles of Data Management, Spring 2013, Sangmi Lee Pallickara

Levenshtein Distance
•  A string metric for measuring distance between two sequences
•  Minimum edit distance

Example (continued)
•  LevDist(Sean, Shawn) = M4,5 = 2

            S   h   a   w   n
        0   1   2   3   4   5
    S   1   0   1   2   3   4
    e   2   1   1   2   3   4
    a   3   2   2   1   2   3
    n   4   3   3   2   2   2

Upper/lower bounds
•  At least the difference of the size of the two strings
•  At most the length of the longer string
•  If the strings are identical
   –  0
•  If the strings are the same size
   –  At most the Hamming distance
      •  Number of positions at which the corresponding symbols are different

Edit-based Similarity: Jaro and Jaro-Winkler Similarity

Jaro Similarity
•  Compares two strings by first identifying characters "common" to both strings
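The Levenshtein matrix and the upper/lower bounds described in the slides can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function names (`levenshtein_matrix`, `lev_dist`, `hamming`, `within_bounds`) are hypothetical, and unit cost is assumed for insertion, deletion, and substitution.

```python
def levenshtein_matrix(s, t):
    """Build the (len(s)+1) x (len(t)+1) edit-distance matrix M.

    M[i][j] is the minimum number of single-character edits needed to
    turn the first i characters of s into the first j characters of t.
    Assumption: every insertion, deletion, and substitution costs 1.
    """
    rows, cols = len(s) + 1, len(t) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = i          # delete all i characters of s
    for j in range(cols):
        m[0][j] = j          # insert all j characters of t
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            m[i][j] = min(
                m[i - 1][j] + 1,         # deletion
                m[i][j - 1] + 1,         # insertion
                m[i - 1][j - 1] + cost,  # match or substitution
            )
    return m


def lev_dist(s, t):
    """Levenshtein distance: the bottom-right cell of the matrix."""
    return levenshtein_matrix(s, t)[len(s)][len(t)]


def hamming(s, t):
    """Number of positions at which the corresponding symbols differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


def within_bounds(s, t):
    """Check the lower and upper bounds listed in the slides."""
    d = lev_dist(s, t)
    ok = abs(len(s) - len(t)) <= d <= max(len(s), len(t))
    if len(s) == len(t):
        ok = ok and d <= hamming(s, t)
    return ok


if __name__ == "__main__":
    for label, row in zip(" Sean", levenshtein_matrix("Sean", "Shawn")):
        print(label, row)
    print("LevDist(Sean, Shawn) =", lev_dist("Sean", "Shawn"))
```

Running the script prints the matrix for "Sean" (rows) against "Shawn" (columns); its bottom-right cell, M4,5, is the distance 2 from the example.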

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.
