STRING ALGORITHMS
(Cormen, Leiserson, Rivest, and Stein, 2001, ISBN: 0070131511 (McGraw Hill), Chapter 32, p. 906)
String processing problem
Input: Two strings T and P.
Problem: Find if P is a substring of T.
Example (1):
Input: T = gtgatcagatcact, P = tca
Output: Yes. gtga[tca]ga[tca]ct, shift = 4, 9
Example (2):
Input: T = 189342670893, P = 1673
Output: No.
Naïve Algorithm (T, P)
suppose n = length(T), m = length(P);
for shift s = 0 through n-m do
    if (P[1..m] == T[s+1..s+m]) then    // actually a for-loop runs here
        print shift s;
End algorithm.
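A minimal Python sketch of the naïve algorithm, for illustration only. The function name naive_match is an assumption of this sketch, and shifts are 0-based as in the examples above.

```python
def naive_match(T: str, P: str) -> list[int]:
    """Return every shift s (0-based) at which P occurs in T."""
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):            # s = 0 through n-m
        # The slice comparison hides the inner for-loop over up to m characters.
        if T[s:s + m] == P:
            shifts.append(s)
    return shifts


print(naive_match("gtgatcagatcact", "tca"))   # [4, 9]
print(naive_match("189342670893", "1673"))    # []
```

An empty list corresponds to the answer "No" in Example (2).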
Complexity: O((n-m+1)m), or O(max{nm, m^2})
A special note: we allow O(k+1) type notation so that the bound does not collapse to O(0) when k = 0;
in such a boundary situation we want O(1) (constant time).
Note: Too many character comparisons are repeated.
Rabin-Karp scheme
Consider a character as a number in a radix system, e.g., the English alphabet as
digits in radix-26.
Pick up each m-length "number" starting from shift = 0 through (n-m).
So, T = gtgatcagatcact, in radix-4 (a/0, t/1, g/2, c/3) becomes
gtg = '212' in base-4 = 32+4+2 = 38 in decimal,
tga = '120' in base-4 = 16+8+0 = 24 in decimal,
….
Then do the comparison with P number-wise.
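The radix-4 encoding above can be sketched in Python as follows. The names DIGIT, to_number, and window_numbers are assumptions of this sketch, and each window is recomputed from scratch here, with no reuse of earlier results.

```python
# Digit assignment taken from the notes: a/0, t/1, g/2, c/3 (radix 4).
DIGIT = {"a": 0, "t": 1, "g": 2, "c": 3}


def to_number(s: str, d: int = 4) -> int:
    """Read s as a base-d number, most significant character first."""
    value = 0
    for ch in s:
        value = value * d + DIGIT[ch]
    return value


def window_numbers(T: str, m: int, d: int = 4) -> list[int]:
    """Number of every m-length window of T, for shifts 0 through n-m."""
    return [to_number(T[s:s + m], d) for s in range(len(T) - m + 1)]


T, P = "gtgatcagatcact", "tca"
p = to_number(P)                       # '130' in base-4 = 16+12+0 = 28
nums = window_numbers(T, len(P))
print(nums[:2])                                      # [38, 24]
print([s for s, t in enumerate(nums) if t == p])     # [4, 9]
```

Because every window has the same length m, two windows have equal numbers exactly when they are the same string, so comparing numbers is enough in this unhashed version.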
Advantage: Computing the number for the next shift can reuse the previous result.
Consider decimals: 4359 and 3592
3592 = (4359 - 4*1000)*10 + 2
General formula: t_{s+1} = d (t_s - d^{m-1} T[s+1]) + T[s+m+1], in radix-d, where t_s is the corresponding number for the substring T[s+1..s+m]. Note, m is the size of P.
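A small sketch of the update formula in Python, under the assumption that the text is already given as a list of digits in radix d. The name rolling_numbers is hypothetical, and Python lists are 0-based, so digits[s] plays the role of T[s+1] in the formula.

```python
def rolling_numbers(digits: list[int], m: int, d: int) -> list[int]:
    """Numbers of all m-length windows, each derived from the previous one."""
    n = len(digits)
    if m < 1 or m > n:
        return []
    high = d ** (m - 1)                 # weight of the leading digit, d^(m-1)
    t = 0
    for x in digits[:m]:                # first window computed directly
        t = t * d + x
    out = [t]
    for s in range(n - m):
        # Drop the leading digit, shift left by one place, append the new digit.
        t = d * (t - high * digits[s]) + digits[s + m]
        out.append(t)
    return out


print(rolling_numbers([4, 3, 5, 9, 2], 4, 10))      # [4359, 3592]
```

Each update costs a constant number of arithmetic operations, instead of the m operations needed to recompute a window from scratch.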
The first-pass scheme: (1) preprocess for (n-m+1) numbers on T (one per shift 0 through n-m) and 1 for P,
(2) compare the number for P with those computed on T.
Problem: each number may be too large to compare directly.
Solution: Hash, i.e., use modular arithmetic with respect to a prime q.
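A hedged Python sketch of the hashing idea. It uses the standard textbook form of the window update taken mod q, which is an assumption here and may differ in detail from the formula in these notes. The values d = 256 and q = 101, the use of ord() for digits, and the explicit string check on a hash hit are all choices of this sketch.

```python
def rabin_karp(T: str, P: str, d: int = 256, q: int = 101) -> list[int]:
    """Return all 0-based shifts where P occurs in T, using hashes mod q."""
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                # d^(m-1) mod q, weight of leading digit
    p = t = 0
    for i in range(m):                  # hashes of P and of the first window
        p = (d * p + ord(P[i])) % q
        t = (d * t + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes do not guarantee equal strings, so verify each hit.
        if p == t and T[s:s + m] == P:
            shifts.append(s)
        if s < n - m:                   # roll the window one position right
            t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q
    return shifts


print(rabin_karp("gtgatcagatcact", "tca"))    # [4, 9]
print(rabin_karp("189342670893", "1673"))     # []
```

All intermediate values stay below d*q, so the comparisons are on small numbers; the price is that two different windows may share a hash, hence the verification step.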
New recurrence formula:
This note was uploaded on 02/10/2012 for the course CSE 5211 taught by Professor Dmitra during the Spring '12 term at FIT.