
# lec07 - 6.851 Advanced Data Structures Spring 2010 Lecture...


6.851: Advanced Data Structures, Spring 2010

Lecture 7 – February 26, 2010

Prof. Andre Schulz. Scribe: Mark Chen

## 1 Overview

In this lecture, we consider the string matching problem: finding all places in a text where some query string occurs. As a one-shot problem, string matching can be solved in O(|T|) time, where |T| is the size of the text. This purely algorithmic approach has been studied extensively in the papers by Knuth-Morris-Pratt [6], Boyer-Moore [1], and Rabin-Karp [4].

However, we entertain the possibility that multiple queries will be made to the same text. This motivates the development of data structures that preprocess the text to allow for more efficient queries. We will show how to construct, use, and analyze these string data structures.

## 2 Storing Strings and String Matching

First, we introduce some notation. Throughout these notes, Σ will denote a finite alphabet. An example of a finite alphabet is the standard set of English letters Σ = {a, b, c, ..., z}. A fixed string of characters T ∈ Σ* will comprise what we call a text. Another string of characters P ∈ Σ* will be called a search pattern.

For integers i and j, define T[i : j] as the substring of T starting from the i-th character and ending with the j-th character, inclusive. We will often omit j and write T[i :] to denote the suffix of T starting at the i-th character. Finally, we let the symbol ∘ denote concatenation. As a simple illustration of our notation, (abcde[0 : 2]) ∘ (cde[1 :]) = abcde.

Now we can formally state the string matching problem: given an input text T ∈ Σ* and a pattern P ∈ Σ*, we want to find all occurrences of P in T. Closely related variants of the string matching problem ask for the first, first k, or some occurrences, rather than for all occurrences.
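The notation and the problem statement above can be made concrete with a short Python sketch. The helper names `sub` and `occurrences` are hypothetical and not from the notes. The scan shown is the naive method, which takes O(|T| · |P|) time in the worst case; it is not one of the linear-time algorithms cited in the overview.

```python
def sub(T, i, j=None):
    """T[i : j] in the notes' notation: 0-indexed, both ends inclusive.

    Omitting j gives the suffix T[i :].
    """
    return T[i:] if j is None else T[i:j + 1]


def occurrences(T, P):
    """All start positions i such that T[i : i + |P| - 1] equals P.

    Naive scan: try every possible start position and compare.
    """
    m = len(P)
    return [i for i in range(len(T) - m + 1) if sub(T, i, i + m - 1) == P]


# The illustration from the notes: (abcde[0 : 2]) concatenated with (cde[1 :])
print(sub("abcde", 0, 2) + sub("cde", 1))   # abcde
print(occurrences("abracadabra", "abra"))   # [0, 7]
```

The variants mentioned above (first occurrence, first k occurrences) would correspond to stopping the scan early rather than collecting every position.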
### 2.1 Tries and Compressed Tries

A commonly used string data structure is the trie, a tree where each edge stores a letter, each node stores a string, and the root stores the empty string. The recursive relationship between the values stored on the edges and the values stored in the nodes is as follows: given a path of increasing depth p = r, v_1, v_2, ..., v from the root r to a node v, the string stored at node v_i is the concatenation of the string stored in v_{i-1} with the letter stored on the edge v_{i-1} v_i (taking v_0 = r). We will refer to the strings stored in the leaves of the trie as words, and the strings stored in all other nodes as prefixes.

If there is a natural lexicographical ordering on the elements in Σ, we order the edges of every node's fan-out alphabetically, from left to right. With respect to this ordering, an in-order traversal of the leaves gives us every word stored in the trie in alphabetical order. In particular, it is easy to see that the fan-out of any node is bounded above by the size of the alphabet |Σ|.
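The trie definition above can be sketched in Python as follows. This is a minimal illustration, not the notes' own construction: the names `TrieNode`, `insert`, and `words_in_order` are hypothetical, and the sketch assumes a terminator character `$` (sorting before every letter) appended to each word, so that every word ends at a leaf even when it is a prefix of another word. With the terminator, the fan-out bound becomes |Σ| + 1.

```python
END = "$"  # assumed terminator; sorts before 'a'..'z' in ASCII


class TrieNode:
    def __init__(self, string=""):
        self.string = string   # string spelled by the path from the root
        self.children = {}     # letter on outgoing edge -> child node


def insert(root, word):
    """Walk down from the root, creating missing edges letter by letter."""
    node = root
    for letter in word + END:
        if letter not in node.children:
            # Child string = parent string concatenated with the edge letter.
            node.children[letter] = TrieNode(node.string + letter)
        node = node.children[letter]


def words_in_order(node):
    """Visit leaves left to right, following edges in alphabetical order."""
    if not node.children:
        if node.string.endswith(END):   # skips the root of an empty trie
            yield node.string[:-1]      # strip the terminator
        return
    for letter in sorted(node.children):
        yield from words_in_order(node.children[letter])


root = TrieNode()
for w in ["bear", "ant", "an", "bee"]:
    insert(root, w)
print(list(words_in_order(root)))   # ['an', 'ant', 'bear', 'bee']
```

Storing children in a dictionary and sorting the keys at traversal time keeps the sketch short; an implementation that keeps the edges in sorted order would avoid the repeated sorting.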