The individual edit operations simple variant of the


Sangmi Lee Pallickara, CS480 Principles of Data Management, Spring 2013 (2/22/13)

Edit-based Similarity: Levenshtein Distance
•  Cost functions
   –  Cost of the transformation
•  Sum of the costs of the individual edit operations
•  Simple variant of the edit distance
   –  the Levenshtein distance

Levenshtein Distance
s1 = Sean, s2 = Shawn
•  Levenshtein distance between s1 and s2 is 2.
   –  Replace the e in s1 by h
   –  Insert a w to s1
•  Infinite number of transforms are possible
   –  e.g. delete all characters and add Shawn
•  Requires minimal number of operations

Compute the Levenshtein distance
•  Initialize a (|s1|+1) × (|s2|+1) matrix M
   –  |s| denotes the length of a string s.
•  Fill the matrix M with values computed using the equation:

$$M_{i,0} = i, \qquad M_{0,j} = j$$

$$M_{i,j} = \begin{cases} M_{i-1,j-1} & \text{if } s_{1,i} = s_{2,j} \\ 1 + \min(M_{i-1,j},\ M_{i,j-1},\ M_{i-1,j-1}) & \text{otherwise} \end{cases}$$

Example
•  Two strings s1 = Sean and s2 = Shawn
•  |s1| = 4
•  |s2| = 5
•  Build a (4+1) × (5+1) matrix

Example-continued
•  $M_{i,0} = i$ fills the first column: 0 1 2 3 4
•  $M_{0,j} = j$ fills the first row: 0 1 2 3 4 5
•  $M_{i,j}$ fills the remaining cells using the equation above

          S  h  a  w  n
       0  1  2  3  4  5
    S  1
    e  2
    a  3
    n  4

...
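The matrix-filling steps above can be sketched in Python. This is a minimal illustration of the recurrence as stated on the slides, not code from the course; the function names `levenshtein_matrix` and `levenshtein` are assumptions chosen for this example.

```python
def levenshtein_matrix(s1: str, s2: str) -> list[list[int]]:
    """Build the (|s1|+1) x (|s2|+1) matrix M from the slide recurrence."""
    rows, cols = len(s1) + 1, len(s2) + 1
    M = [[0] * cols for _ in range(rows)]

    # Boundary conditions: M[i][0] = i and M[0][j] = j.
    for i in range(rows):
        M[i][0] = i
    for j in range(cols):
        M[0][j] = j

    # Fill the remaining cells row by row.
    for i in range(1, rows):
        for j in range(1, cols):
            if s1[i - 1] == s2[j - 1]:
                # Characters match: copy the diagonal value.
                M[i][j] = M[i - 1][j - 1]
            else:
                # Cheapest of the three neighbouring cells, plus one edit.
                M[i][j] = 1 + min(M[i - 1][j], M[i][j - 1], M[i - 1][j - 1])
    return M


def levenshtein(s1: str, s2: str) -> int:
    """The distance is the bottom-right cell of the matrix."""
    return levenshtein_matrix(s1, s2)[-1][-1]


if __name__ == "__main__":
    for row in levenshtein_matrix("Sean", "Shawn"):
        print(row)
    print(levenshtein("Sean", "Shawn"))  # prints 2
```

For s1 = Sean and s2 = Shawn the bottom-right cell is 2, which agrees with the two edits listed on the slide (replace e by h, insert w).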