MIT6_851S10_lec07

MIT6_851S10_lec07 - 6.851: Advanced Data Structures Spring...


6.851: Advanced Data Structures, Spring 2010
Lecture 7, February 26, 2010
Prof. Andre Schulz. Scribe: Mark Chen

1 Overview

In this lecture, we consider the string matching problem: finding all places in a text where some query string occurs. Treated as a one-shot problem, string matching can be solved in O(|T|) time, where |T| is the size of our text. This purely algorithmic approach has been studied extensively in the papers by Knuth-Morris-Pratt [6], Boyer-Moore [1], and Rabin-Karp [4].

However, we entertain the possibility that multiple queries will be made to the same text. This motivates the development of data structures that preprocess the text to allow for more efficient queries. We will show how to construct, use, and analyze these string data structures.

2 Storing Strings and String Matching

First, we introduce some notation. Throughout these notes, Σ will denote a finite alphabet. An example of a finite alphabet is the standard set of English letters Σ = {a, b, c, ..., z}. A fixed string of characters T ∈ Σ* will comprise what we call a text. Another string of characters P ∈ Σ* will be called a search pattern.

For integers i and j, define T[i : j] as the substring of T starting from the ith character and ending with the jth character, inclusive. We will often omit j and write T[i :] to denote the suffix of T starting at the ith character. Finally, we let the symbol ∘ denote concatenation. As a simple illustration of our notation, (abcde[0 : 2]) ∘ (cde[1 :]) = abcde.

Now we can formally state the string matching problem: given an input text T ∈ Σ* and a pattern P ∈ Σ*, we want to find all occurrences of P in T. Closely related variants of the string matching problem ask for the first, the first k, or some occurrences, rather than for all occurrences.
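The notation and the problem statement above can be made concrete with a small Python sketch. It assumes 0-based positions with an inclusive right endpoint, which is what the worked example abcde[0 : 2] ∘ cde[1 :] = abcde suggests. The helper names `substring` and `find_all` are hypothetical, and the matcher is a naive O(|T| · |P|) baseline, not one of the linear-time algorithms cited in the overview.

```python
from typing import List, Optional


def substring(T: str, i: int, j: Optional[int] = None) -> str:
    """T[i : j] in the notes' notation: 0-based, j inclusive.

    Omitting j gives the suffix T[i :]. (Python slices exclude the right
    endpoint, hence the j + 1.)
    """
    return T[i:] if j is None else T[i:j + 1]


def find_all(T: str, P: str) -> List[int]:
    """Naive baseline: every start position i such that T[i : i+|P|-1] == P."""
    m = len(P)
    return [
        i
        for i in range(len(T) - m + 1)
        if substring(T, i, i + m - 1) == P
    ]


# The worked example from the notes: abcde[0 : 2] concatenated with cde[1 :]
print(substring("abcde", 0, 2) + substring("cde", 1))
# All occurrences of a pattern, including overlapping ones
print(find_all("banana", "ana"))
```

Each query rescans the whole text, which is exactly the cost that preprocessing the text into a data structure is meant to avoid when many patterns are searched in the same T.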
2.1 Tries and Compressed Tries

A commonly used string data structure is called a trie, a tree where each edge stores a letter, each node stores a string, and the root stores the empty string. The recursive relationship between the values stored on the edges and the values stored in the nodes is as follows: given a path of increasing depth p = r, v_1, v_2, ..., v from the root r to a node v, the string stored at node v_i is the concatenation of the string stored in v_{i-1} with the letter stored on the edge v_{i-1} v_i. We will refer to the strings stored in the leaves of the trie as words, and the strings stored in all other nodes as prefixes.

If there is a natural lexicographical ordering on the elements of Σ, we order the edges of every node's fan-out alphabetically, from left to right. With respect to this ordering, an in-order traversal of the leaves gives us every word stored in the trie in alphabetical order. In particular, it is easy to see that the fan-out of any node must be bounded above by the size of the alphabet |Σ| ...
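A minimal Python sketch of an (uncompressed) trie may help illustrate the definition. It rests on assumptions that do not come from the text above: each word is terminated with a sentinel letter `$` so that every word ends at a leaf even when it is a prefix of another word, children are kept in a dictionary, and the names `TrieNode`, `insert`, and `words` are hypothetical.

```python
from typing import Dict, Iterator

END = "$"  # assumed terminator; in ASCII it sorts before 'a'..'z'


class TrieNode:
    """A node; the string it 'stores' is implied by the path from the root."""

    def __init__(self) -> None:
        # One outgoing edge per letter, so the fan-out is at most |Σ| + 1
        # (the + 1 accounts for the assumed terminator).
        self.children: Dict[str, "TrieNode"] = {}


def insert(root: TrieNode, word: str) -> None:
    """Walk down from the root, creating edges letter by letter."""
    node = root
    for letter in word + END:
        node = node.children.setdefault(letter, TrieNode())


def words(node: TrieNode, prefix: str = "") -> Iterator[str]:
    """Visit children in alphabetical edge order and yield the leaf words."""
    if prefix.endswith(END):  # reached a leaf: strip the terminator
        yield prefix[:-1]
        return
    for letter in sorted(node.children):
        yield from words(node.children[letter], prefix + letter)


root = TrieNode()
for w in ["bear", "ant", "an", "bee"]:
    insert(root, w)
print(list(words(root)))
```

Sorting the edges at every node is what makes the traversal emit the words in alphabetical order; with the terminator sorting before every letter, a word that is a prefix of another word comes out first.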