Strings
6. Strings
String. Sequence of characters. Ex. Natural languages, Java programs, genomic sequences, ...
The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G's, A's, T's and C's
Geometric Algorithms
Reference: Chapters 26-27, Algorithms in C, 2nd Edition, Robert Sedgewick
Robert Sedgewick and Kevin Wayne. Copyright 2006. http://www.Princeton.EDU/~cos226
Geometric search: Overview
Types of data. Points
Optimize Judiciously
4.2 Hashing
More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason including blind stupidity. - William A. Wulf
We should forget about small efficiencies, say ab
Geometric search: Overview
Geometric Algorithms
Types of data. Points, lines, planes, polygons, circles, ...
This lecture. Sets of N objects.
Geometric problems extend to higher dimensions.
Good algorithms also extend to high
Linear Programming
see ORF 307
Linear Programming
What is it? Quintessential tool for optimal allocation of scarce resources, among a number of competing activities. Powerful and general problem-solving method that encompasses: shortest path, network flow
Basic Terms
3.1 Elementary Sorts
Ex: student record in a University.
Sort: rearrange sequence of objects into ascending order.
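A minimal sketch of what "rearrange into ascending order" can look like in Java, using insertion sort as one elementary method. The class name `InsertionSortSketch` and the sample names are assumptions made for illustration; they are not taken from the slides.

```java
import java.util.Arrays;

final class InsertionSortSketch {
    // Rearranges a[] into ascending order using the natural ordering of its keys.
    static <T extends Comparable<T>> void sort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            // Swap a[i] leftward until it is no smaller than its left neighbor.
            for (int j = i; j > 0 && a[j].compareTo(a[j - 1]) < 0; j--) {
                T tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        String[] names = {"Wayne", "Sedgewick", "Knuth"}; // illustrative data
        sort(names);
        System.out.println(Arrays.toString(names));
    }
}
```

Any type that implements `Comparable` can be sorted this way, so a student record would only need a `compareTo` method on its key.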
Reference: Chapter 6, Algorithms in Java, 3rd Edition, Robert Sedgewick.
Symbol Table
Elementary Symbol Tables
Symbol table: key-value pair abstraction.
Insert a value with specified key.
Given a key, search for the corresponding value.
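The two operations above can be sketched with an unordered linked list. This is only one possible elementary implementation; the names `ListSymbolTable`, `put`, and `get` are assumptions, and the host name and address in `main` are reserved documentation values, not real lookup data.

```java
final class ListSymbolTable<K, V> {
    private Node first; // head of an unordered linked list of key-value pairs

    private final class Node {
        final K key;
        V val;
        final Node next;
        Node(K key, V val, Node next) { this.key = key; this.val = val; this.next = next; }
    }

    // Insert a value with the specified key, overwriting any earlier value.
    void put(K key, V val) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }
        }
        first = new Node(key, val, first);
    }

    // Given a key, return the corresponding value, or null if the key is absent.
    V get(K key) {
        for (Node x = first; x != null; x = x.next) {
            if (key.equals(x.key)) return x.val;
        }
        return null;
    }

    public static void main(String[] args) {
        ListSymbolTable<String, String> dns = new ListSymbolTable<>();
        dns.put("www.example.com", "192.0.2.1"); // hypothetical entry
        System.out.println(dns.get("www.example.com"));
    }
}
```

Both operations scan the list, so each takes time proportional to the number of keys in the worst case.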
DNS lookup.
Insert URL with specified IP address.
Given URL, f
Min Cost Flow
Contents
Min cost flow
Transportation problem
Assignment problem
Mail carrier problem
Klein's cycle-canceling algorithm
Network simplex
Princeton University, COS 226, Algorithms and Data Structures
Minimum Cost Flow
Minimum cost flow problem
Set ADT
4.5 Symbol Table Applications
Set. Unordered collection of distinct keys.
API for SET.
insert the key into the set
contains(key): is the given key in the set?
remove(key): remove the key from the set
return iterator over a
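A hedged sketch of a set type along these lines, built on `java.util.HashSet`. Only `contains(key)` and `remove(key)` are named in the text above; the method name `add` and the class name `SetSketch` are assumptions.

```java
import java.util.HashSet;
import java.util.Iterator;

final class SetSketch<K> implements Iterable<K> {
    private final HashSet<K> keys = new HashSet<>();

    // Insert the key into the set (method name assumed; duplicates are ignored).
    void add(K key) { keys.add(key); }

    // Is the given key in the set?
    boolean contains(K key) { return keys.contains(key); }

    // Remove the key from the set (no effect if it is absent).
    void remove(K key) { keys.remove(key); }

    // Iterator over the keys, in no particular order.
    @Override
    public Iterator<K> iterator() { return keys.iterator(); }
}
```

Because the class implements `Iterable`, a client can write `for (String s : set)` to visit every key.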
Minimum Spanning Tree
Minimum Spanning Tree
MST. Given connected graph G with positive edge weights, find a min weight set of edges that connects all of the vertices.
[Figure: graph G with edge weights 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 18, 21, 23, 24]
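One way to compute an MST is Kruskal's algorithm: consider edges in ascending order of weight and keep each edge that does not close a cycle. The sketch below is an illustration under assumptions, not necessarily the method this lecture develops; the class name, the `{v, w, weight}` array encoding, and integer weights are all choices made here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KruskalSketch {
    // Union-find root lookup with path halving.
    private static int find(int[] parent, int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    // Vertices are 0..n-1; each edge is {v, w, weight}. Returns the chosen edges.
    static List<int[]> mst(int n, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));
        int[] parent = new int[n];
        for (int v = 0; v < n; v++) parent[v] = v;
        List<int[]> tree = new ArrayList<>();
        for (int[] e : sorted) {
            int rv = find(parent, e[0]);
            int rw = find(parent, e[1]);
            if (rv != rw) {      // endpoints lie in different components
                parent[rv] = rw; // merge the two components
                tree.add(e);
            }
        }
        return tree;
    }
}
```

For a connected graph on n vertices the returned list holds n - 1 edges.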
Reference: Chapter 20, Algorithms in Java, 3rd Edi
String Search
String Searching
String search. Given a pattern string, find first match in text.
Model. Can't afford to preprocess the text.
Parameters. N = length of text, M = length of pattern.
typically N > M
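A minimal brute-force sketch of this problem: try each text position in turn and compare the pattern character by character, with no preprocessing of the text. The class and method names are assumptions, and the lecture may go on to faster methods.

```java
final class BruteForceSearch {
    // Returns the index of the first match of pattern in text, or -1 if none.
    static int search(String pattern, String text) {
        int m = pattern.length();
        int n = text.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            // Advance j while the pattern agrees with the text at offset i.
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) j++;
            if (j == m) return i; // all M characters matched
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("needle", "in a haystack a needle in a"));
    }
}
```

In the worst case this performs about M * N character comparisons.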
[Figure: pattern "needle" aligned against a text beginning "in a ha"]
Overview
Combinatorial Search
Exhaustive search. Iterate through all elements of a search space.
Backtracking. Systematic method for generating all solutions
to a problem, by successively augmenting partial solutions.
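As an illustration of augmenting partial solutions, the sketch below counts placements of n non-attacking queens on an n-by-n board, one row at a time. The choice of this puzzle and the names `QueensSketch`, `count`, and `isConsistent` are assumptions made for the example.

```java
final class QueensSketch {
    // q[r] is the column of the queen in row r; rows 0..row-1 are already placed.
    static int count(int[] q, int row) {
        int n = q.length;
        if (row == n) return 1; // every row filled: one complete solution
        int solutions = 0;
        for (int col = 0; col < n; col++) {
            if (isConsistent(q, row, col)) {
                q[row] = col;                   // augment the partial solution
                solutions += count(q, row + 1); // recurse; the next loop pass backtracks
            }
        }
        return solutions;
    }

    // Can a queen at (row, col) coexist with the queens in rows 0..row-1?
    static boolean isConsistent(int[] q, int row, int col) {
        for (int r = 0; r < row; r++) {
            if (q[r] == col || Math.abs(q[r] - col) == row - r) return false;
        }
        return true;
    }
}
```

Calling `QueensSketch.count(new int[8], 0)` explores only the partial placements that are still consistent, which is far fewer than all 8^8 arrangements.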
Applicability. Huge rang
Desiderata
Reductions
Desiderata. Classify problems according to their computational requirements. Frustrating news. Huge number of fundamental problems have defied classification for decades.
Pattern Matching
Regular Expressions
String search. Search for given string in a large text file. Regular expression. Natural and compact way to express multiple text patterns. Quintessential programmer's tool.
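A short sketch of how one regular expression can stand for many text patterns, using the standard `java.util.regex` package. The helper `firstMatch` and the sample pattern are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSketch {
    // Returns the first substring of text that matches regex, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }
}
```

For instance, the hypothetical pattern `GCG(CGG|AGG)*CTG` describes `GCG`, then any number of `CGG` or `AGG` triplets, then `CTG`, so a single call to `firstMatch` covers infinitely many concrete strings.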
Ex. Fragile X syndrome is a common cause
Mergesort and Quicksort
Mergesort and Quicksort
Two great sorting algorithms. Full scientific understanding of their properties has enabled us to hammer them into practical system sorts. They occupy a prominent place in the world's computational infrastructure.
Sorting Applications
3.5 Applications
Applications.
Obvious applications: Sort a list of names. Organize an MP3 library. Display Google PageRank results.
List RSS news items in reverse chronological order. Find the median. Find the closest pair. Bi
Sorting Summary
[Table: key comparisons. Sorts: Bubble Sort, Selection Sort, Insertion Sort, Shell, Quick, Merge, Heap. Attributes: In-Place, Stable, Worst, Average, Best, Remarks.]
Symbol Table Review
6.2 String Sets
Symbol table. Associate a value with a key. Search for value given key. Balanced trees use O(log N) key comparisons. Hashing uses O(1) probes, but the cost of each probe is proportional to key length.
Q. Are key comparisons necessa
Running Time
Analysis of Algorithms
As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise - By what course of calculation can these results be ar
Symbol Table Review
4.4 Balanced Trees
Symbol table: key-value pair abstraction. Insert a value with specified key. Search for value given key. Delete value with given key.
Randomized BST. O(log N) time per op. [unless you get ridiculously unlucky]
Data Compression
Data Compression
Compression reduces the size of a file: To save space when storing it. To save time when transmitting it. Most files have lots of redundancy.
Who needs compression? Moore's law: # transistors on a chip doubles every
Directed Graphs
Directed Graphs
Digraph. Set of objects with oriented pairwise connections. Ex. One-way street, hyperlink.
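A minimal adjacency-list sketch of a digraph, assuming vertices are numbered 0 to V-1. The names `DigraphSketch`, `addEdge`, and `adj` are illustrative. The point is that an oriented connection is stored in one direction only.

```java
import java.util.ArrayList;
import java.util.List;

final class DigraphSketch {
    private final List<List<Integer>> out = new ArrayList<>(); // out.get(v) = heads of edges leaving v

    DigraphSketch(int vertices) {
        for (int v = 0; v < vertices; v++) out.add(new ArrayList<>());
    }

    // Add the oriented edge v -> w; nothing is recorded for w -> v.
    void addEdge(int v, int w) { out.get(v).add(w); }

    // Vertices reachable from v by following a single edge.
    List<Integer> adj(int v) { return out.get(v); }
}
```

A one-way street from intersection 0 to intersection 1 would be `addEdge(0, 1)`; a two-way street needs a second call with the arguments swapped.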
Reference: Chapter 19, Algorithms in Java, 3rd Edition, Robert Sedgewick
Symbol Table Challenges
4.3 Binary Search Trees
Symbol table. Key-value pair abstraction.
Insert a value with specified key.
Search for value given key.
Delete value with given key.
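A hedged sketch of insert and search on an unbalanced binary search tree, the structure this section is named for. Deletion is left out to keep the example short, and the names `BstSketch`, `put`, and `get` are assumptions.

```java
final class BstSketch<K extends Comparable<K>, V> {
    private Node root;

    private final class Node {
        final K key;
        V val;
        Node left, right; // smaller keys go left, larger keys go right
        Node(K key, V val) { this.key = key; this.val = val; }
    }

    // Search for the value given a key; returns null if the key is absent.
    V get(K key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.val;
            x = (cmp < 0) ? x.left : x.right;
        }
        return null;
    }

    // Insert a value with the specified key, replacing any existing value.
    void put(K key, V val) { root = put(root, key, val); }

    private Node put(Node x, K key, V val) {
        if (x == null) return new Node(key, val);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;
        return x;
    }
}
```

The cost of each operation is the depth of the path followed, which depends on the order in which keys were inserted.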
Binary search trees
Chal
Undirected Graphs
Undirected Graphs
Graph. Set of objects with pairwise connections.
Why study graph algorithms?
Interesting and broadly useful abstraction.
Challenging branch of computer science and discrete math.
Hundreds of graph algor