6.851: Advanced Data Structures, Spring 2010
Lecture 23, May 4, 2010
Prof. Erik Demaine

1 Overview

In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and gave examples for succinct binary tries, as well as showing the equivalence of binary tries, rooted ordered trees, and balanced parenthesis expressions. We also introduced succinct data structures that solve the rank and select problems.

In this lecture we introduce compact data structures for suffix arrays and suffix trees. Recall the problem that we are trying to solve. Given a text $T$ over the alphabet $\Sigma$, we wish to preprocess $T$ to create a data structure. We then want to be able to use this data structure to search for a pattern $P$, also over $\Sigma$.

A suffix array is an array containing all of the suffixes of $T$ in lexicographic order. In the interests of space, each entry in the suffix array stores an index in $T$, the start of the suffix in question. To find a pattern $P$ in the suffix array, we perform binary search on all suffixes, which gives us all of the positions of $P$ in $T$.

2 Survey

In this section, we give a brief survey of results for compact suffix arrays. Recall that a compact data structure uses $O(\mathrm{OPT})$ bits, where $\mathrm{OPT}$ is the information-theoretic optimum. For a suffix array, we need $|T| \lg |\Sigma|$ bits just to store the text $T$.

Grossi and Vitter 2000 [3]: Suffix array in
$$\left(\frac{1}{\epsilon} + O(1)\right) |T| \lg |\Sigma|$$
bits, with query time
$$O\left(\frac{|P|}{\log_{|\Sigma|} |T|} + |\mathrm{output}| \cdot \log^{\epsilon}_{|\Sigma|} |T|\right),$$
where $\epsilon$ is a constant with $0 < \epsilon \le 1$. We will follow this paper fairly closely in our discussion today.

Ferragina and Manzini 2000 [1]: The space required is
$$5 H_k(T) |T| + o(|T|) + O\left(|T|^{\epsilon} |\Sigma|^{O(|\Sigma|)}\right)$$
bits, for all fixed values of $k$. $H_k(T)$ is the $k$th-order empirical entropy, or the regular entropy conditioned on knowing the previous $k$ characters. More formally:
$$H_k(T) = \sum_{|w| = k} \Pr\{w \text{ occurs}\} \cdot H_0(\text{characters following an occurrence of } w \text{ in } T).$$
Note that because we are calculating this in the empirical case,
$$\Pr\{w \text{ occurs}\} = \frac{\#\text{ of occurrences of } w}{|T|}.$$
For this data structure, query time is $O(|P| + |\mathrm{output}| \cdot \lg^{\epsilon} |T|)$.

Sadakane 2003 [5]: Space in bits is
$$\frac{1}{\epsilon} H_0(T) |T| + O(|T| \lg \lg |\Sigma| + |\Sigma| \lg |\Sigma|),$$
and query time is $O(|P| \lg |T| + |\mathrm{output}| \cdot \lg^{\epsilon} |T|)$. Note that this bound is more like a suffix array, due to the multiplicative log factor.

Grossi, Gupta, Vitter 2003 [2]: This is the only known succinct result. Space in bits is
$$H_k(T) |T| + O\left(|T| \lg |\Sigma| \cdot \frac{\lg \lg |T|}{\lg |T|}\right),$$
and query time is $O(|P| \lg |\Sigma| + \lg^{o(1)} |T|)$.

3 Compressed suffix arrays

For the rest of these notes, we will assume that the alphabet is binary (in other words, that $|\Sigma| = 2$)....
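
As a point of reference for the compact structures surveyed above, the plain (uncompressed) suffix array search described in the Overview can be sketched as follows. This is a minimal illustration, not the construction from any of the cited papers: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort of all suffixes, and the text is assumed to fit in memory as a Python string.

```python
def build_suffix_array(text):
    """Return the start indices of all suffixes of text, in lexicographic order.

    Naive construction (sorts the suffixes directly); chosen for clarity,
    not efficiency. Each entry is an index into text, as in the notes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Uses two binary searches over the suffix array sa to find the
    contiguous block of suffixes that begin with pattern.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] starts with pattern.
    return sorted(sa[start:lo])
```

For example, with `text = "banana"` the suffix array is `[5, 3, 1, 0, 4, 2]`, and `find_occurrences(text, sa, "ana")` returns `[1, 3]`. Each binary search makes $O(\lg |T|)$ comparisons of up to $|P|$ characters, which is the multiplicative log factor mentioned in the survey.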