6.851: Advanced Data Structures, Spring 2010
Lecture 23, May 4, 2010
Prof. Erik Demaine

1 Overview

In the last lecture we introduced the concepts of implicit, succinct, and compact data structures, gave examples for succinct binary tries, and showed the equivalence of binary tries, rooted ordered trees, and balanced parenthesis expressions. We also introduced succinct data structures that solve the rank and select problems. In this lecture we introduce compact data structures for suffix arrays and suffix trees.

Recall the problem that we are trying to solve. Given a text $T$ over the alphabet $\Sigma$, we wish to preprocess $T$ to create a data structure. We then want to be able to use this data structure to search for a pattern $P$, also over $\Sigma$.

A suffix array is an array containing all of the suffixes of $T$ in lexicographic order. In the interests of space, each entry in the suffix array stores an index in $T$, the start of the suffix in question. To find a pattern $P$ in the suffix array, we perform binary search on all suffixes, which gives us all of the positions of $P$ in $T$.

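The suffix array search described above can be sketched as follows. This is a minimal illustration, not the construction used later in these notes: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the array is built by naive sorting, and each comparison copies up to $|P|$ characters.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start indices of all suffixes of text, in lexicographic order.

    Naive construction by sorting; chosen for clarity, not efficiency.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every position where pattern occurs in text, via two binary searches."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[first:lo] start with pattern.
    return sorted(sa[first:lo])
```

For the text `banana`, this sketch yields the suffix array `[5, 3, 1, 0, 4, 2]`, and searching for `ana` returns the positions `[1, 3]`. The matching suffixes always form a contiguous range of the array, which is why two binary searches suffice.
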
2 Survey

In this section, we give a brief survey of results for compact suffix arrays. Recall that a compact data structure uses $O(\mathrm{OPT})$ bits, where $\mathrm{OPT}$ is the information-theoretic optimum. For a suffix array, we need $|T| \lg |\Sigma|$ bits just to store the text $T$.

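To give a sense of scale, here is a small worked example with hypothetical numbers (they are not from the lecture). It compares the $|T| \lg |\Sigma|$ bits needed for the text with the $|T| \lg |T|$ bits used by a plain suffix array that stores one index into $T$ per entry:

```latex
|T| = 2^{20},\ |\Sigma| = 4:
\qquad
|T| \lg |\Sigma| = 2^{20} \cdot 2 = 2{,}097{,}152 \text{ bits},
\qquad
|T| \lg |T| = 2^{20} \cdot 20 = 20{,}971{,}520 \text{ bits}.
```

Under these assumptions the plain suffix array is ten times larger than the text itself, which is the gap a compact representation aims to close.
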
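Grossi and Vitter 2000 [3]: Suffix array in $\left(\frac{1}{\varepsilon} + O(1)\right) |T| \lg |\Sigma|$ bits, with query time

$$O\left(\frac{|P|}{\log_{|\Sigma|}^{\varepsilon} |T|} + |\text{output}| \cdot \log_{|\Sigma|}^{\varepsilon} |T|\right).$$

We will follow this paper fairly closely in our discussion today.
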
Ferragina and Manzini 2000 [1]: The space required is

$$5 H_k(T) |T| + o(|T|) + O\left(|T|^{\varepsilon} \cdot |\Sigma|^{O(|\Sigma|)}\right)$$

bits, for all fixed values of $k$. $H_k(T)$ is the $k$th-order empirical entropy, or the regular entropy conditioned on knowing the previous $k$ characters. More formally:

$$H_k(T) = \sum_{|w| = k} \Pr\{w \text{ occurs}\} \cdot H_0(\text{characters following an occurrence of } w \text{ in } T).$$

Note that because we're calculating this in the empirical case,

$$\Pr\{w \text{ occurs}\} = \frac{\#\text{ of occurrences of } w}{|T|}.$$

For this data structure, query time is $O(|P| + |\text{output}| \cdot \lg^{\varepsilon} |T|)$.

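The definition of $H_k(T)$ above can be computed directly for small inputs. The following is a minimal sketch under stated assumptions: the function names `h0` and `hk` are hypothetical, and an occurrence of $w$ at the very end of $T$ (with no following character) is simply not counted, which is one possible convention.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def h0(symbols: Iterable[str]) -> float:
    """Zeroth-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def hk(text: str, k: int) -> float:
    """k-th order empirical entropy of text, in bits per symbol.

    Groups each character by the k characters preceding it (its context w),
    then weights H_0 of each group by (# occurrences of w) / |T|.
    Assumption: an occurrence of w with no following character is skipped.
    """
    followers: Dict[str, List[str]] = defaultdict(list)
    for i in range(len(text) - k):
        followers[text[i:i + k]].append(text[i + k])
    n = len(text)
    return sum(len(f) / n * h0(f) for f in followers.values())
```

For example, `hk("abab", 0)` is 1 bit per symbol, while `hk("abababab", 1)` is 0, because each character is fully determined by the one before it. This is the sense in which higher-order entropy can be much smaller than $H_0$.
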
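Sadakane 2003 [5]: Space in bits is

$$\frac{1}{\varepsilon} H_0(T) |T| + O(|T| \lg \lg |\Sigma| + |\Sigma| \lg |\Sigma|),$$

and query time is $O(|P| \lg |T| + |\text{output}| \lg^{\varepsilon} |T|)$. Note that this query bound is more like that of a plain suffix array, due to the multiplicative $\lg |T|$ factor on $|P|$.
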
Grossi, Gupta, Vitter 2003 [2]: This is the only known succinct result. Space in bits is

$$H_k(T) \cdot |T| + O\left(|T| \lg |\Sigma| \cdot \frac{\lg \lg |T|}{\lg |T|}\right),$$

and query time is $O(|P| \lg |\Sigma| + \lg^{o(1)} |T|)$.

3 Compressed suffix arrays

For the rest of these notes, we will assume that the alphabet is binary (in other words, that $|\Sigma| = 2$). In this section, we will cover a simplified (and less space-efficient) data structure, which we will adapt in the next section for the compact data structure.

3.1 Top-Down

The data structure uses ideas similar to those in the DC3 algorithm presented in Lecture 7. For this data structure, however, we will group the characters in our string into pairs rather than triples. If we were starting from the original suffix array, the definitions would be as follows:

start: The initial text $T_0 = T$, the initial size $n_0 = n$, and the initial suffix array $SA_0 =$