Algorithms Non-Lecture H: More String Matching

    Philosophers gathered from far and near
    To sit at his feat and hear and hear,
    Though he never was heard
    To utter a word
    But Abracadabra, abracadab,
    Abracada, abracad,
    Abraca, abrac, abra, ab!
    'Twas all he had,
    'Twas all they wanted to hear, and each
    Made copious notes of the mystical speech,
    Which they published next
    A trickle of text
    In the meadow of commentary.
    Mighty big books were these,
    In a number, as leaves of trees;
    In learning, remarkably very!

    Jamrach Holobom, quoted by Ambrose Bierce, The Devil's Dictionary (1911)

H More String Matching

H.1 Redundant Comparisons

Let's go back to the character-by-character method for string matching. Suppose we are looking for the pattern ABRACADABRA in some longer text using the (almost) brute force algorithm described in the previous lecture. Suppose also that when s = 11, the substring comparison fails at the fifth position; the corresponding character in the text (just after the vertical line below) is not a C. At this point, our algorithm would increment s and start the substring comparison from scratch. (In the diagrams, a lowercase letter marks a pattern character that does not match the text.)

    HOCUSPOCUSABRA|BRACADABRA...
              ABRA|cADABRA
               ABR|ACADABRA

If we look carefully at the text and the pattern, however, we should notice right away that there's no point in looking at s = 12. We already know that the next character is a B (after all, it matched P[2] during the previous comparison), so why bother even looking there? Likewise, we already know that the next shift s = 13 will also fail, because T[13] matched P[3] = R, so why bother looking there?

    HOCUSPOCUSABRA|BRACADABRA...
              ABRA|cADABRA
               aBR|ACADABRA
                aB|RACADABRA
                 A|BRACADABRA

Finally, when we get to s = 14, we can't immediately rule out a match based on earlier comparisons. However, for precisely the same reason, we shouldn't start the substring comparison over from scratch, because we already know that T[14] = P[4] = A.
Instead, we should start the substring comparison at the second character of the pattern, since we don't yet know whether or not it matches the corresponding text character.

If you play with this idea long enough, you'll notice that the character comparisons should always advance through the text. Once we've found a match for a text character, we never need to do another comparison with that text character again. In other words, we should be able to optimize the brute-force algorithm so that it always advances through the text.

You'll also eventually notice a good rule for finding the next reasonable shift s. A prefix of a string is a substring that includes the first character; a suffix is a substring that includes the last character. A prefix or suffix is proper if it is not the entire string. Suppose we have just discovered that T[i] ≠ P[j] ....
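
To see the redundant comparisons concretely, here is a minimal Python sketch of a brute-force matcher that records how many pattern characters matched at each shift before the comparison stopped. The function names (brute_force_trace, proper_prefixes, proper_suffixes) are hypothetical helpers introduced only for illustration, and reporting shifts 1-based is an assumption about the indexing used in the previous lecture. The two small helpers simply spell out the definitions of proper prefix and proper suffix given above; this is not the optimized algorithm itself.

```python
def brute_force_trace(text, pattern):
    """Try every shift from left to right, restarting from scratch each time.

    Returns a list of (shift, matched) pairs, where shift is 1-based
    (an assumed convention) and matched is the number of pattern
    characters that matched before the comparison stopped.
    The scan ends at the first complete match.
    """
    n, m = len(text), len(pattern)
    trace = []
    for s in range(n - m + 1):  # s is 0-based inside the loop
        i = 0
        while i < m and text[s + i] == pattern[i]:
            i += 1
        trace.append((s + 1, i))  # record the 1-based shift
        if i == m:  # every pattern character matched
            break
    return trace


def proper_prefixes(string):
    """Nonempty prefixes of string, excluding the whole string."""
    return [string[:k] for k in range(1, len(string))]


def proper_suffixes(string):
    """Nonempty suffixes of string, excluding the whole string."""
    return [string[k:] for k in range(1, len(string))]


TEXT = "HOCUSPOCUSABRABRACADABRA"
PATTERN = "ABRACADABRA"

trace = brute_force_trace(TEXT, PATTERN)
for shift, matched in trace:
    print(f"shift {shift:2d}: {matched} character(s) matched")

# Strings that are both a proper prefix and a proper suffix of the
# part of the pattern that matched before the mismatch ("ABRA").
both = set(proper_prefixes("ABRA")) & set(proper_suffixes("ABRA"))
print(sorted(both))
```

Running the trace on the example shows shifts where the brute-force method re-examines text characters whose values were already revealed by the earlier partial match of ABRA. The last two lines compute which strings are both a proper prefix and a proper suffix of ABRA; for this example the only one is A, which lines up with the single overlapping alignment that could not be ruled out in advance.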
This note was uploaded on 12/15/2009 for the course 942 cs taught by Professor A during the Spring '09 term at University of Illinois at Urbana–Champaign.