# lec23 - 6.851 Advanced Data Structures Spring 2010 Lecture...

6.851: Advanced Data Structures, Spring 2010

Lecture 23, May 4, 2010

Prof. Erik Demaine

Scribe: Martí Bolívar, heavily edited by Sarah Eisenstat

## 1 Overview

In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and gave examples for succinct binary tries, as well as showing the equivalence of binary tries, rooted ordered trees, and balanced parenthesis expressions. Succinct data structures were introduced which solve the rank and select problems.

In this lecture we introduce compact data structures for suffix arrays and suffix trees. Recall the problem that we are trying to solve. Given a text $T$ over the alphabet $\Sigma$, we wish to preprocess $T$ to create a data structure. We then want to be able to use this data structure to search for a pattern $P$, also over $\Sigma$.

A suffix array is an array containing all of the suffixes of $T$ in lexicographic order. In the interests of space, each entry in the suffix array stores an index in $T$, the start of the suffix in question. To find a pattern $P$ in the suffix array, we perform binary search on all suffixes, which gives us all of the positions of $P$ in $T$.

## 2 Survey

In this section, we give a brief survey of results for compact suffix arrays. Recall that a compact data structure uses $O(\mathrm{OPT})$ bits, where $\mathrm{OPT}$ is the information-theoretic optimum. For a suffix array, we need $|T| \lg |\Sigma|$ bits just to store the text $T$.

**Grossi and Vitter 2000 [3]** Suffix array in $\left(\frac{1}{\varepsilon} + O(1)\right) |T| \lg |\Sigma|$ bits, with query time

$$O\left(\frac{|P|}{\log_{|\Sigma|}^{\varepsilon} |T|} + |\text{output}| \cdot \log_{|\Sigma|}^{\varepsilon} |T|\right).$$

We will follow this paper fairly closely in our discussion today.

**Ferragina and Manzini 2000 [1]** The space required is

$$5 H_k(T)\,|T| + o(|T|) + O\left(|T|^{\varepsilon} \cdot |\Sigma|^{O(|\Sigma|)}\right)$$

bits, for all fixed values of $k$. $H_k(T)$ is the $k$th-order empirical entropy, or the regular entropy conditioned on knowing the previous $k$ characters. More formally:

$$H_k(T) = \sum_{|w| = k} \Pr\{w \text{ occurs}\} \cdot H_0(\text{characters following an occurrence of } w \text{ in } T).$$
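To make the definition of $H_k(T)$ concrete, here is a minimal Python sketch that evaluates the sum directly. The function names `h0` and `empirical_entropy` are ours, not from the lecture. We also assume that a context $w$ is weighted by the number of its occurrences that are actually followed by a character, so an occurrence at the very end of $T$ contributes nothing.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts):
    """Zeroth-order empirical entropy, in bits per character, of a character multiset."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def empirical_entropy(text, k):
    """k-th order empirical entropy H_k(text), in bits per character.

    Assumption: each length-k context w is weighted by the number of its
    occurrences that have a following character, divided by |text|.
    """
    if k == 0:
        return h0(Counter(text))
    # followers[w] counts the characters that appear right after context w.
    followers = defaultdict(Counter)
    for i in range(len(text) - k):
        followers[text[i:i + k]][text[i + k]] += 1
    n = len(text)
    return sum(sum(c.values()) / n * h0(c) for c in followers.values())
```

For example, the text `"abababab"` has $H_0 = 1$ bit per character, but $H_1 = 0$, because each character is fully determined by the one before it. This is the sense in which conditioning on the previous $k$ characters can only help compression.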

Note that because we're calculating this in the empirical case,

$$\Pr\{w \text{ occurs}\} = \frac{\#\text{ of occurrences of } w}{|T|}.$$

For the Ferragina and Manzini data structure, query time is $O(|P| + |\text{output}| \cdot \lg^{\varepsilon} |T|)$.

**Sadakane 2003 [5]** Space in bits is $\frac{1}{\varepsilon} H_0(T)\,|T| + O(|T| \lg \lg |\Sigma| + |\Sigma| \lg |\Sigma|)$, and query time is $O(|P| \lg |T| + |\text{output}| \lg^{\varepsilon} |T|)$. Note that this bound is more like a suffix array, due to the multiplicative log factor.

**Grossi, Gupta, Vitter 2003 [2]** This is the only known succinct result. Space in bits is

$$H_k(T) \cdot |T| + O\left(|T| \lg |\Sigma| \cdot \frac{\lg \lg |T|}{\lg |T|}\right),$$

and query time is $O(|P| \lg |\Sigma| + \lg^{o(1)} |T|)$.

## 3 Compressed suffix arrays

For the rest of these notes, we will assume that the alphabet is binary (in other words, that $|\Sigma| = 2$). In this section, we will cover a simplified (and less space-efficient) data structure, which we will adapt in the next section for the compact data structure.

### 3.1 Top-Down

The data structure uses ideas similar to those in the DC3 algorithm presented in Lecture 7. For this data structure, however, we will group the characters in our string into pairs rather than triples. If we were starting from the original suffix array, the definitions would be as follows:

**start** The initial text $T_0 = T$, the initial size $n_0 = n$
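As a point of reference for the uncompressed structure that this construction starts from, the plain suffix array and its binary search (as described in the overview) can be sketched in Python as follows. This is a naive illustration, not the compact structure: the names `build_suffix_array` and `find_occurrences` are ours, the construction simply sorts the suffixes, and the array stores full integer indices.

```python
def build_suffix_array(text):
    """Start indices of all suffixes of text, in lexicographic order.

    Naive construction by direct sorting; fine for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Sorted start positions of pattern in text, via two binary searches on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])
```

Each comparison inspects at most $|P|$ characters, so a search costs $O(|P| \lg |T|)$ character comparisons plus the size of the output. The array itself takes $|T| \lg |T|$ bits when stored as plain indices, which is the cost that the compressed structures in this lecture aim to reduce.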