Part V: String Matching
Lecture 15: String Matching

Objective and Outline

Objective: Discuss some basic string algorithms.
Reference: Chapter 34 of CLRS.

Outline:
    String Matching Problem and Terminology
    Brute Force Algorithm
    The Knuth-Morris-Pratt (KMP) Algorithm
    The Boyer-Moore (BM) Algorithm

String Matching Problem and Terminology

Given
    an alphabet Σ,
    a text array T[1 . . n] of characters from Σ,
    a pattern array P[1 . . m] of characters from Σ.

String Matching Problem: Find all occurrences of P in T.

A pattern P occurs in T with shift s if P[1 . . m] = T[s + 1 . . s + m].

[Figure: pattern P aligned against text T with shift s = 2.]

Equivalently, the String Matching Problem is to find all values of s. Since P must fit inside T, we must have 0 ≤ s ≤ n - m.

Brute Force Algorithm

[Figure: pattern P aligned against text T at shifts s = 0, s = 1 and s = 2.]

Initially, P is aligned with T at the first index position. P is then compared with T from left to right. If a mismatch occurs, "slide" P to the right by 1 position and start the comparison again.

BF_StringMatcher(T, P)
    n = length(T); m = length(P);
    // s increments by 1 in each iteration => slide P to the right by 1
    for (s = 0; s ≤ n - m; s++) do
        // start the comparison of P and T again
        i = 1;
        // compare P and T from left to right
        while (i ≤ m && T[s+i] = P[i]) do
            i++;
        end
        if i = m+1 then
            print "Pattern occurs with shift=", s;
        end
    end

Complexity: O(mn)

The Knuth-Morris-Pratt (KMP) Algorithm

In the brute-force algorithm, if a mismatch occurs while comparing P with a segment of T during the left-to-right scan, we slide P to the right by only 1 position.
Sometimes, we can slide P by more than one position:

    i   1 2 3 4 5 6 7 8 9
    T   A B C A B A B A D
    P   A B A D
    q   1 2 3 4

The mismatch first occurs at T[3] and P[3]. Sliding P by one position, so that P[1] is aligned with T[2], will not result in a match:
    We know that T[2] = T[3 - 1] = P[3 - 1] = P[2] (information from the comparisons made so far).
    We also know that P[1] ≠ P[2] (a property of P).
    Hence P[1] ≠ T[2].
So, we can slide P[1] to T[3] right away.

    i   1 2 3 4 5 6 7 8 9
    T   A B C A B A B A D
    P       A B A D
    q       1 2 3 4

Here P[1] = A and T[3] = C mismatch immediately, so slide P[1] to T[4].

    i   1 2 3 4 5 6 7 8 9
    T   A B C A B A B A D
    P         A B A D
    q         1 2 3 4

Mismatch occurs at T[7] and P[4]....
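To make the contrast between sliding by one position and sliding further concrete, here is a minimal Python sketch. It is not taken from the lecture: brute_force_matcher is a direct 0-indexed translation of the brute-force pseudocode, while prefix_function and kmp_matcher follow the standard textbook formulation of KMP, which is assumed here and may differ in notation from the lecture's own presentation. All function names are illustrative.

```python
def brute_force_matcher(text, pattern):
    """Return every shift s (0 <= s <= n - m) at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # slide the pattern right by 1 each time
        i = 0
        while i < m and text[s + i] == pattern[i]:
            i += 1                      # compare left to right
        if i == m:                      # all m characters matched
            shifts.append(s)
    return shifts


def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of pattern[:q + 1] (standard KMP table)."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]               # fall back to a shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """Return the same shifts as brute_force_matcher, but never move
    backwards in text: on a mismatch, reuse what is known about pattern."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    shifts = []
    q = 0                               # number of pattern characters matched
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]               # slide the pattern by more than 1 if possible
        if pattern[q] == ch:
            q += 1
        if q == m:                      # full match ending at text[i]
            shifts.append(i - m + 1)
            q = pi[q - 1]
    return shifts


if __name__ == "__main__":
    T, P = "ABCABABAD", "ABAD"          # the example used in the slides
    print(brute_force_matcher(T, P))    # [5]
    print(kmp_matcher(T, P))            # [5]
```

On the example above both functions report the single shift s = 5, that is, P[1 . . 4] = T[6 . . 9] in the 1-based indexing of the slides. The difference lies only in how many character comparisons are needed to get there.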
This note was uploaded on 10/18/2009 for the course COMP 271 taught by Professor Arya during the Spring '07 term at HKUST.

