Introduction to Algorithms
October 10, 2005
Massachusetts Institute of Technology
6.046J/18.410J
Professors Erik D. Demaine and Charles E. Leiserson
Handout 13

Problem Set 3 Solutions

Problem 3-1. Pattern Matching

Principal Skinner has a problem: he is absolutely sure that Bart Simpson has plagiarized some text on a recent book report. One of Bart's sentences sounds oddly familiar, but Skinner can't quite figure out where it came from. Skinner decides to see if some smart-alec MIT student can help him out.

Skinner gives you a DVD containing the full text of the Springfield public library. The data is stored in a binary string T[1], T[2], ..., T[n], which we view as an array T[1..n], where each T[i] is either 0 or 1. Skinner also gives you the quote from Bart Simpson's book report, a shorter binary string P[1..m], again where each P[i] is either 0 or 1, and where m < n.

For a binary string A[1..k] and for integers i, j with 1 ≤ i ≤ j ≤ k, we use the notation A[i..j] to refer to the binary string A[i], A[i+1], ..., A[j], called a substring of A. The goal of this problem is to determine whether P is a substring of T, i.e., whether P = T[i..j] for some i, j with 1 ≤ i ≤ j ≤ n.

For the purpose of this problem, assume that you can manipulate O(log n)-bit integers in constant time. For example, if x ≤ n^7 and y ≤ n^5, then you can calculate x + y in constant time. On the other hand, you may not assume that m-bit integers can be manipulated in constant time, because m may be too large. For example, if m is on the order of log^2 n and x and y are each m-bit integers, you cannot calculate x + y in constant time. (In general, it is reasonable to assume that you can manipulate integers of length logarithmic in the input size in constant time, but larger integers require proportionally more time.)

(a) Assume that you have a hash function h(x) that computes a hash value of the m-bit binary string x = A[i..(i+m-1)], for some binary string A[1..k] and some 1 ≤ i ≤ k-m+1. Moreover, assume that the hash function is perfect: if x ≠ y, then h(x) ≠ h(y). Assume that you can calculate the hash function in O(m) time. Show how to determine whether P is a substring of T in O(mn) time.

Solution: We compute the hash of the pattern string and compare it to the hash of every length-m substring of T, i.e., we compare h(P) to h(T[i..(i+m-1)]) for 1 ≤ i ≤ n-m+1. Since the hash function is perfect, h(P) = h(T[i..(i+m-1)]) if and only if P = T[i..(i+m-1)]. There are O(n) hash values to compute and O(n) comparisons of hash values, and each computation and each comparison requires O(m) time, for a total running time of O(mn). ...
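The part (a) procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the handout does not specify h, so the function perfect_hash below is a hypothetical stand-in (the identity map on the bit sequence, which is trivially perfect and takes O(m) time), and the name is_substring is likewise invented for this sketch. Python indexes from 0, so start positions 0..n-m here correspond to i = 1..n-m+1 in the text.

```python
def perfect_hash(bits):
    """Hypothetical stand-in for the perfect hash h (an assumption, not from
    the handout): the identity map on the bit sequence. Distinct inputs give
    distinct outputs, and building the tuple costs O(m) for m bits."""
    return tuple(bits)


def is_substring(P, T, h=perfect_hash):
    """Return True if the bit list P occurs as a contiguous substring of T."""
    m, n = len(P), len(T)
    if m > n:
        return False
    target = h(P)  # hash the pattern once: O(m)
    # 0-based starts 0..n-m correspond to i = 1..n-m+1 in the text.
    for i in range(n - m + 1):
        # O(m) to hash the window plus O(m) to compare two hash values.
        if h(T[i:i + m]) == target:
            return True
    return False
```

The loop runs at most n-m+1 times and does O(m) work per iteration, which matches the O(mn) bound. The conclusion that equal hashes imply equal strings relies entirely on h being perfect; with an ordinary hash function, a matching hash value would still need to be confirmed by comparing the strings directly.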
This note was uploaded on 02/01/2010 for the course COMPUTERSC 6.046J/18. taught by Professors Erik D. Demaine and Charles E. Leiserson during the Fall '05 term at MIT.