Introduction to Algorithms
November 18, 2005
Massachusetts Institute of Technology
6.046J/18.410J
Professors Erik D. Demaine and Charles E. Leiserson
Handout 25

Problem Set 7 Solutions

Problem 7-1. Edit distance

In this problem you will write a program to compute edit distance. This problem is mandatory. Failure to turn in a solution will result in a serious and negative impact on your term grade! We advise you to start this programming assignment as soon as possible, because getting all the details right in a program can take longer than you think.

Many word processors and keyword search engines have a spelling correction feature. If you type in a misspelled word x, the word processor or search engine can suggest a correction y. The correction y should be a word that is close to x. One way to measure the similarity in spelling between two text strings is by “edit distance.” The notion of edit distance is useful in other fields as well. For example, biologists use edit distance to characterize the similarity of DNA or protein sequences.

The edit distance d(x, y) of two strings of text, x[1..m] and y[1..n], is defined to be the minimum possible cost of a sequence of “transformation operations” (defined below) that transforms string x[1..m] into string y[1..n]. [1]

To define the effect of the transformation operations, we use an auxiliary string z[1..s] that holds the intermediate results. At the beginning of the transformation sequence, s = m and z[1..s] = x[1..m] (i.e., we start with string x[1..m]). At the end of the transformation sequence, we should have s = n and z[1..s] = y[1..n] (i.e., our goal is to transform z into string y[1..n]). Throughout the transformation, we maintain the current length s of string z, as well as a cursor position i, i.e., an index into string z. The invariant 1 ≤ i ≤ s + 1 holds at all times during the transformation.
(Notice that the cursor can move one space beyond the end of the string z in order to allow insertions at the end of the string.) Each transformation operation may alter the string z, the size s, and the cursor position i. Each transformation operation also has an associated cost. The cost of a sequence of transformation operations is the sum of the costs of the individual operations in the sequence. The goal of the edit-distance problem is to find a sequence of transformation operations of minimum cost that transforms x[1..m] into y[1..n].

There are five transformation operations:

Operation   Cost   Effect
left        0      If i = 1 then do nothing. Otherwise, set i ← i − 1.
....

[1] Here we view a text string as an array of characters. Individual characters can be manipulated in constant time.
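
The setup described above (the intermediate string z, its length s, the cursor i, and the invariant 1 ≤ i ≤ s + 1) can be sketched in Python as follows. This is only an illustrative sketch: the class name EditState, its field names, and the choice to pass the starting cursor position explicitly are assumptions made here, not part of the handout. Only the left operation is modeled, because it is the only one whose cost and effect appear in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class EditState:
    """Hypothetical container for the intermediate string z and cursor i."""
    z: list        # characters of z[1..s], stored 0-based in a Python list
    i: int = 1     # 1-based cursor; the default of 1 is an assumption
    cost: int = 0  # running sum of the costs of the operations applied

    @property
    def s(self) -> int:
        # Current length s of the string z.
        return len(self.z)

    def check_invariant(self) -> None:
        # The handout requires 1 <= i <= s + 1 at all times.
        assert 1 <= self.i <= self.s + 1, "cursor out of range"

    def left(self) -> None:
        # left, cost 0: if i = 1 do nothing; otherwise set i <- i - 1.
        if self.i > 1:
            self.i -= 1
        self.cost += 0
        self.check_invariant()


state = EditState(z=list("algorithm"), i=3)
state.left()  # cursor moves from 3 to 2
state.left()  # cursor moves from 2 to 1
state.left()  # already at 1, so nothing changes
print(state.i, state.s, state.cost)  # prints: 1 9 0
```

Keeping the running cost inside the state makes the cost of a sequence the sum of its operations' costs, matching the definition above. The remaining four operations would be added as further methods once their costs and effects are known.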
- Fall '05
- Dynamic Programming, cursor position, edit distance