Pattern Matching

Outline and Reading
- Strings (§11.1)
- Pattern matching algorithms
  - Brute-force algorithm (§11.2.1)
  - Boyer-Moore algorithm (§11.2.2)
  - Knuth-Morris-Pratt algorithm (§11.2.3)

Strings
- A string is a sequence of characters.
- Examples of strings:
  - C++ program
  - HTML document
  - DNA sequence
  - Digitized image
- An alphabet Σ is the set of possible characters for a family of strings.
- Examples of alphabets:
  - ASCII (used by C and C++)
  - Unicode (used by Java)
  - {0, 1}
  - {A, C, G, T}
- Let P be a string of size m:
  - A substring P[i..j] of P is the contiguous sequence of characters of P with ranks between i and j, inclusive.
  - A prefix of P is a substring of the form P[0..i].
  - A suffix of P is a substring of the form P[i..m − 1].
- Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P.
- Applications:
  - Text editors
  - Search engines
  - Biological research

Brute-Force Algorithm
- The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either
  - a match is found, or
  - all placements of the pattern have been tried.
- Brute-force pattern matching runs in O(nm) time, where n is the size of T and m the size of P: there are n − m + 1 possible shifts, and each may require up to m character comparisons.
- Example of the worst case:
  - T = aaa…ah
  - P = aaah
  - may occur in images and DNA sequences
  - unlikely in English text

Algorithm BruteForceMatch(T, P)
    Input: text T of size n and pattern P of size m
    Output: starting index of a substring of T equal to P, or −1

