STRING ALGORITHMS
(Cormen, Leiserson, Rivest, and Stein, 2001, ISBN: 0070131511 (McGraw Hill), Chapter 32, p. 906)
String processing problem
Input: Two strings T and P.
Problem: Find if P is a substring of T.
Example (1):
Input: T = gtgatcagatcact, P = tca
Output: Yes. gtga tca ga tca ct (P occurs at shift = 4, 9)
Example (2):
Input: T = 189342670893, P = 1673
Output: No.
Naïve Algorithm (T, P)
suppose n = length(T), m = length(P);
for shift s = 0 through n-m do
    if (P[1..m] == T[s+1..s+m]) then    // actually a for-loop runs here
        print shift s;
End algorithm.
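The naïve algorithm above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name naive_match is hypothetical, shifts are 0-based as in the examples, and the slice comparison stands in for the inner for-loop over the m characters of P.

```python
def naive_match(T, P):
    """Return every shift s (0-based) at which P occurs in T."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):      # s = 0 through n-m
        # The slice comparison hides the inner loop over m characters.
        if T[s:s + m] == P:
            shifts.append(s)
    return shifts


print(naive_match("gtgatcagatcact", "tca"))   # [4, 9]
print(naive_match("189342670893", "1673"))    # []
```

The two calls reproduce Example (1) and Example (2); an empty list corresponds to the answer "No".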
Complexity: O((n-m+1)m), or O(max{nm, m^2})
A special note: we allow O(k+1)-type notation so that the bound does not collapse to O(0) when k = 0 (here, when n = m);
in such a boundary situation we want O(1) (constant time) instead.
Note: The naïve algorithm repeats many character comparisons.
Rabin-Karp scheme
Consider each character as a digit in a radix system, e.g., the English alphabet as digits in radix-26.
Pick up each m-length "number" starting from shift = 0 through (n-m).
So, T = gtgatcagatcact, in radix-4 (a/0, t/1, g/2, c/3) with m = 3, becomes
gtg = '212' in base-4 = 32+4+2 in decimal,
tga = '120' in base-4 = 16+8+0 in decimal,
….
Then compare each of these with the number for P.
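A small Python sketch of this encoding, using the radix-4 mapping given above (a/0, t/1, g/2, c/3). The names DIGIT and to_number are hypothetical, chosen only for illustration; every window is converted from scratch here, without reusing earlier results.

```python
DIGIT = {"a": 0, "t": 1, "g": 2, "c": 3}   # mapping from the notes
D = len(DIGIT)                              # radix 4


def to_number(s):
    """Interpret string s as a radix-4 number, most significant digit first."""
    value = 0
    for ch in s:
        value = value * D + DIGIT[ch]
    return value


T, P = "gtgatcagatcact", "tca"
m = len(P)
windows = [to_number(T[s:s + m]) for s in range(len(T) - m + 1)]
p = to_number(P)

print(windows[:2])                                     # [38, 24]
print(p)                                               # 28
print([s for s, t in enumerate(windows) if t == p])    # [4, 9]
```

The first two values match the hand computation (32+4+2 and 16+8+0), and the shifts whose number equals that of P are the same as in Example (1).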
Advantage: The number for each shift can be computed from the number for the previous shift, reusing old results.
Consider decimals: 4359 and 3592
3592 = (4359 - 4*1000)*10 + 2
General formula: t_{s+1} = d (t_s - d^{m-1} T[s+1]) + T[s+m+1], in radix-d, where t_s is the corresponding number for the substring T[s+1..s+m]. Note, m is the size of P.
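The update rule can be checked numerically with a short Python sketch. The function name roll and its parameter names are hypothetical. The code uses 0-based string indexing, so the digit leaving the window is T[s] and the digit entering it is T[s+m].

```python
def roll(t_s, out_digit, in_digit, m, d):
    """Compute t_{s+1} = d * (t_s - d^(m-1) * out_digit) + in_digit."""
    return d * (t_s - d ** (m - 1) * out_digit) + in_digit


# Decimal example from the notes: 4359 -> 3592 (drop the 4, append the 2).
print(roll(4359, 4, 2, m=4, d=10))      # 3592

# Radix-4 example: slide a window of length 3 over T.
DIGIT = {"a": 0, "t": 1, "g": 2, "c": 3}
T, m, d = "gtgatcagatcact", 3, 4
t = 38                                   # 'gtg' = 212 in base 4
values = [t]
for s in range(len(T) - m):
    # T[s] leaves the window and T[s+m] enters it.
    t = roll(t, DIGIT[T[s]], DIGIT[T[s + m]], m, d)
    values.append(t)

print(values[:3])                        # [38, 24, 33]
```

Each update costs a constant number of arithmetic operations, instead of the m operations needed to convert a window from scratch.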
The first-pass scheme: (1) preprocess to compute the (n-m+1) numbers on T (one per shift 0 through n-m) and 1 for P,
(2) compare the number for P with those computed on T.
Problem: each number may be too large to compare directly.
Solution: Hash, i.e., use modular arithmetic with respect to a prime q.
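One way the hashed scheme might look in Python is sketched below. Several details are assumptions rather than content of these notes: the function name rabin_karp, the default prime q = 13, and the treatment of an empty pattern. Because different windows can share the same value mod q, the sketch confirms every hash match with a direct string comparison, as in the standard Rabin-Karp algorithm.

```python
def rabin_karp(T, P, digit, q=13):
    """Return all 0-based shifts where P occurs in T, using hashes mod q."""
    n, m, d = len(T), len(P), len(digit)
    if m == 0 or m > n:
        return []                    # degenerate cases are skipped in this sketch
    h = pow(d, m - 1, q)             # d^(m-1) mod q, weight of the leading digit
    p = t = 0
    for i in range(m):               # hash of P and of the first window of T
        p = (d * p + digit[P[i]]) % q
        t = (d * t + digit[T[i]]) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; confirm character by character.
        if p == t and T[s:s + m] == P:
            shifts.append(s)
        if s < n - m:
            # Rolling update of the window hash, reduced mod q.
            t = (d * (t - h * digit[T[s]]) + digit[T[s + m]]) % q
    return shifts


DIGIT = {"a": 0, "t": 1, "g": 2, "c": 3}
print(rabin_karp("gtgatcagatcact", "tca", DIGIT))   # [4, 9]
```

A larger prime q reduces the number of spurious hash matches that need confirmation, while keeping each number small enough for fast comparison.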
 Spring '12
 Dmitra
 Algorithms, String searching algorithm, automaton Algorithm ComputeTransitionFunction, Pqx
