Introduction to Algorithms
October 10, 2005
Massachusetts Institute of Technology
6.046J/18.410J
Professors Erik D. Demaine and Charles E. Leiserson
Handout 13
Problem Set 3 Solutions
Problem 3-1. Pattern Matching
Principal Skinner has a problem: he is absolutely sure that Bart Simpson has plagiarized some text on a recent book report. One of Bart’s sentences sounds oddly familiar, but Skinner can’t quite figure out where it came from. Skinner decides to see if some smart-alec MIT student can help him out.
Skinner gives you a DVD containing the full text of the Springfield public library. The data is stored in a binary string $T[1], T[2], \ldots, T[n]$, which we view as an array $T[1\,..\,n]$, where each $T[i]$ is either 0 or 1. Skinner also gives you the quote from Bart Simpson’s book report, a shorter binary string $P[1\,..\,m]$, again where each $P[i]$ is either 0 or 1, and where $m < n$. For a binary string $A[1\,..\,k]$ and for integers $i, j$ with $1 \le i \le j \le k$, we use the notation $A[i\,..\,j]$ to refer to the binary string $A[i], A[i+1], \ldots, A[j]$, called a substring of $A$. The goal of this problem is to determine whether $P$ is a substring of $T$, i.e., whether $P = T[i\,..\,j]$ for some $i, j$ with $1 \le i \le j \le n$.
For the purpose of this problem, assume that you can manipulate $O(\log n)$-bit integers in constant time. For example, if $x \le n^7$ and $y \le n^5$, then you can calculate $x + y$ in constant time. On the other hand, you may not assume that $m$-bit integers can be manipulated in constant time, because $m$ may be too large. For example, if $m = \Theta(\log^2 n)$ and $x$ and $y$ are each $m$-bit integers, you cannot calculate $x + y$ in constant time. (In general, it is reasonable to assume that you can manipulate integers of length logarithmic in the input size in constant time, but larger integers require proportionally more time.)
(a) Assume that you have a hash function $h(x)$ that computes a hash value of the $m$-bit binary string $x = A[i\,..\,(i+m-1)]$, for some binary string $A[1\,..\,k]$ and some $1 \le i \le k-m+1$. Moreover, assume that the hash function is perfect: if $x \ne y$, then $h(x) \ne h(y)$. Assume that you can calculate the hash function in $O(m)$ time. Show how to determine whether $P$ is a substring of $T$ in $O(mn)$ time.
Solution: We compute the hash of the pattern string and compare it to the hash of every length-$m$ substring of $T$, i.e., we compare $h(P)$ to $h(T[i\,..\,(i+m-1)])$ for $1 \le i \le n-m+1$. Since the hash function is perfect, $h(P) = h(T[i\,..\,(i+m-1)])$ if and only if $P = T[i\,..\,(i+m-1)]$. There are $O(n)$ hash values to compute and $O(n)$ comparisons of hash values, and each computation and comparison requires $O(m)$ time, for a total running time of $O(mn)$.
Note that because calculation of the hash function takes $O(m)$ time, this algorithm is not asymptotically any better than simply comparing the substrings directly. This part is designed as motivation for the rest of the problem.
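The scan described in the solution to part (a) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `find_with_perfect_hash` and `identity_hash` are hypothetical and do not appear in the handout, the strings are represented as Python strings of '0' and '1' characters, and the identity function stands in for the perfect hash $h$ (it is trivially collision-free, and comparing two of its values takes $O(m)$ time).

```python
def identity_hash(x):
    # Assumed stand-in for the perfect hash h: distinct strings map to
    # distinct values, and computing or comparing a value costs O(m).
    return x


def find_with_perfect_hash(T, P, h):
    """Return the smallest 1-indexed i with P == T[i .. i+m-1], or None."""
    n, m = len(T), len(P)
    target = h(P)                      # one O(m) hash of the pattern
    for i in range(n - m + 1):         # the n - m + 1 windows of length m
        if h(T[i:i + m]) == target:    # O(m) hash plus O(m) comparison
            return i + 1               # convert to the handout's 1-indexing
    return None


if __name__ == "__main__":
    print(find_with_perfect_hash("0110100", "101", identity_hash))
    print(find_with_perfect_hash("0000", "11", identity_hash))
```

With the identity as the hash, the loop is exactly the direct substring comparison mentioned in the note above, which is one way to see why the $O(mn)$ bound cannot improve until the hash itself becomes cheaper to evaluate.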
(b) Consider the following family of hash functions $h_p$, parameterized by a prime number $p$ in the range $[2, cn^4]$ for some constant $c > 0$:

$$h_p(x) = x \pmod{p}.$$
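As a rough illustration of this family, the following Python sketch evaluates $h_p$ on an $m$-bit string one bit at a time (Horner's rule), so that every intermediate value stays below $2p$ and therefore fits in $O(\log n)$ bits when $p \le cn^4$. The function name `hash_mod_p` is hypothetical, and reading the string with its most significant bit first is an assumption; the handout text shown here does not fix either.

```python
def hash_mod_p(bits, p):
    """Compute h_p(x) = x mod p for x given as a string of '0'/'1' characters.

    Assumption: bits[0] is the most significant bit of x.
    """
    value = 0
    for b in bits:
        # Shift in the next bit, then reduce. Before the reduction the value
        # is at most 2*(p - 1) + 1 < 2p, so it never grows to m bits.
        value = (2 * value + int(b)) % p
    return value


if __name__ == "__main__":
    print(hash_mod_p("1101", 5))  # 13 mod 5
```

The loop performs $m$ constant-time steps on small integers, which matches the $O(m)$ hash-evaluation cost assumed in part (a).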