Quad Trees
Region data vs. point data.
Roads and rivers in a country/state.
Which rivers flow through Florida?
Which roads cross a river?
Network firewalls.
(source prefix, destination prefix, action)
(01*, 110*, drop packet)
[Figure: plot with axes source and dest.]
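A minimal sketch of how a rule such as (01*, 110*, drop packet) could be applied to a packet. Addresses are written as bit strings. The function names, the first-match policy, and the default action are assumptions made for illustration; they are not taken from the slides.

```python
def prefix_matches(prefix: str, address: str) -> bool:
    """True if the bit string `address` starts with the bits before the '*'."""
    return address.startswith(prefix.rstrip("*"))


def classify(rules, source, dest, default="forward packet"):
    """Return the action of the first rule whose two prefixes both match.

    First-match order and the default action are assumptions.
    """
    for src_prefix, dst_prefix, action in rules:
        if prefix_matches(src_prefix, source) and prefix_matches(dst_prefix, dest):
            return action
    return default


rules = [("01*", "110*", "drop packet")]
print(classify(rules, "0110", "1101"))  # drop packet
print(classify(rules, "1110", "1101"))  # forward packet
```

Each rule covers a rectangle in the (source, dest) plane, which is why rule lookup can be treated as a region-data problem.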
Suffix Trees
Suffix trees
Linearized suffix trees
Virtual suffix trees
Suffix arrays
Enhanced suffix arrays
Suffix cactus, suffix vectors,
Suffix Trees
String: any sequence of characters.
Substring of string S: string composed of
characters i through
Digital Search Trees & Binary Tries
Analog of radix sort to searching.
Keys are binary bit strings.
Fixed length: 0110, 0010, 1010, 1011.
Variable length: 01, 00, 101, 1011.
Application: IP routing, packet classification,
firewalls.
IPv4: 32-bit IP addr
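A simplified sketch of a binary trie over variable-length bit-string keys, with one node per bit and a flag marking where a key ends. This layout and the names TrieNode, BinaryTrie, and longest_prefix are assumptions for illustration; the slides may use a different node organization.

```python
class TrieNode:
    def __init__(self):
        self.child = [None, None]  # child[0] for bit 0, child[1] for bit 1
        self.is_key = False        # True if a stored key ends at this node


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key: str) -> None:
        node = self.root
        for bit in key:
            b = int(bit)
            if node.child[b] is None:
                node.child[b] = TrieNode()
            node = node.child[b]
        node.is_key = True

    def search(self, key: str) -> bool:
        node = self.root
        for bit in key:
            node = node.child[int(bit)]
            if node is None:
                return False
        return node.is_key

    def longest_prefix(self, address: str):
        """Longest stored key that is a prefix of `address`, or None."""
        node, best = self.root, None
        for i, bit in enumerate(address):
            node = node.child[int(bit)]
            if node is None:
                break
            if node.is_key:
                best = address[:i + 1]
        return best


trie = BinaryTrie()
for key in ["01", "00", "101", "1011"]:
    trie.insert(key)
print(trie.search("101"))            # True
print(trie.longest_prefix("10110"))  # 1011
```

The longest-prefix query is the kind of lookup IP routing needs: follow the address bits and remember the last key seen on the way down.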
Bottom-Up Splay Trees: Analysis
Actual and amortized complexity of join is
O(1).
Amortized complexity of search, insert, delete,
and split is O(log n).
Actual complexity of each splay tree operation
is the same as that of the associated splay.
Sufficien
B+-Trees
Same structure as B-trees.
Dictionary pairs are in leaves only. Leaves form a
doubly-linked list.
Remaining nodes have following structure:
j a0 k1 a1 k2 a2 ... kj aj
j = number of keys in node.
ai is a pointer to a subtree.
ki <= smallest key
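A small sketch of how a search might pick the pointer to follow in a non-leaf node. It assumes the truncated condition above reads "ki <= smallest key in subtree ai", so keys smaller than k1 are under a0. The function name child_index is hypothetical.

```python
from bisect import bisect_right


def child_index(keys, x):
    """Index i of the pointer a_i to follow when searching for key x.

    `keys` is the sorted list [k1, ..., kj] of one non-leaf node.
    Assumes k_i <= every key in subtree a_i (assumption, see lead-in).
    """
    # bisect_right counts the keys that are <= x, which is exactly i.
    return bisect_right(keys, x)


keys = [10, 20, 30]
print(child_index(keys, 5))   # 0
print(child_index(keys, 10))  # 1
print(child_index(keys, 25))  # 2
```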
Red Black Trees
Colored Nodes Definition
Binary search tree.
Each node is colored red or black.
Root and all external nodes are black.
No root-to-external-node path has two
consecutive red nodes.
All root-to-external-node paths have the
same number o
Red-Black Trees: Again
rank(x) = # black pointers on path from x to an
external node.
Same as #black nodes (excluding x) from x to an
external node.
rank(external node) = 0.
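The rank definition can be sketched as follows. External nodes are represented by None and treated as black; the Node class and the small tree are made up for illustration and are not the example from the slides.

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def is_black(node):
    """External nodes (None) are black."""
    return node is None or node.color == "black"


def rank(node):
    """Number of black nodes, excluding `node`, on a path to an external node."""
    if node is None:
        return 0  # rank(external node) = 0
    child = node.left  # every path gives the same count in a valid tree
    return rank(child) + (1 if is_black(child) else 0)


# Hypothetical tree: black root 10, black child 7, red child 40
# with black children 30 and 45.
root = Node(10, "black",
            Node(7, "black"),
            Node(40, "red", Node(30, "black"), Node(45, "black")))
print(rank(root))        # 2
print(rank(root.right))  # 2
print(rank(root.left))   # 1
```

A red child has the same rank as its parent, while a black child has rank one less, which matches counting black pointers.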
An Example
[Figure: example red-black tree with keys and ranks.]
B-Trees (continued)
Analysis of worst-case and average number
of disk accesses for an insert.
Delete and analysis.
Structure for B-tree node
Worst-Case Disk Accesses
[Figure: example B-tree used for the insertions listed here.]
Insert 14.
Insert 2.
Insert 18.
Worst-Case
Binary Tries (continued)
split(k).
Similar to split algorithm for unbalanced binary
search trees.
Construct S and B on way down the trie.
Follow with a backward cleanup pass over the
constructed S and B.
Forward Pass
Suppose you are at node x, which
Internet Routers
http://www.windowsecurity.com/whitepapers/Excerpts_from_The_Encyclopedia_of_Networking_.html
Sample Routers
Router Functionality
[Figure: router with input ports and output ports.]
Rule Table
Used to decide where to send a packet next
(next hop).
Bloom Filters
Differential Files
Simple large database.
Collection/file of records residing on disk.
Single key.
Index to records.
Operations.
Retrieve.
Update.
Insert a new record.
Make changes to an existing record.
Delete a record.
Naive Mode
R-Trees
Extension of B+-trees.
Collection of d-dimensional rectangles.
A point in d-dimensions is a trivial rectangle.
Non-rectangular Data
Non-rectangular data may be represented by
minimum bounding rectangles (MBRs).
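A minimal sketch of computing an MBR for an object given as a list of d-dimensional points. The function name mbr and the (lows, highs) return shape are choices made for this example.

```python
def mbr(points):
    """Return (lows, highs) of the smallest axis-aligned rectangle
    that contains every point in `points`."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs


triangle = [(1, 1), (4, 2), (2, 5)]
print(mbr(triangle))  # ((1, 1), (4, 5))
print(mbr([(3, 7)]))  # ((3, 7), (3, 7))
```

The second call shows a point stored as a trivial rectangle whose low and high corners coincide.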
Operations
Insert
Delete
Find
BSP Trees
Binary space partitioning trees.
Used to store a collection of objects in n-dimensional space.
Tree recursively divides n-dimensional
space using (n-1)-dimensional hyperplanes.
Space Partitioning
n-dimensional space
splitting hyperplane
(n-1)-
Multidimensional Range Search
Static collection of records.
No inserts, deletes, changes.
Only queries.
Each record has k key fields.
Multidimensional query.
Given k ranges [li, ui], 1 <= i <= k.
Report all records in collection such that
li <= ki
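A brute-force sketch of the query, useful as a baseline to compare a range-search structure against. It assumes the truncated condition above is li <= ki <= ui for every key field i; the function names are hypothetical.

```python
def in_range(record, ranges):
    """True if every key field k_i of `record` lies in [l_i, u_i].

    Assumes the condition l_i <= k_i <= u_i (assumption, see lead-in).
    """
    return all(lo <= key <= hi for key, (lo, hi) in zip(record, ranges))


def range_query(records, ranges):
    """Report all records that satisfy all k ranges, by scanning every record."""
    return [r for r in records if in_range(r, ranges)]


records = [(1, 5), (3, 3), (6, 2), (4, 9), (5, 4)]
ranges = [(2, 5), (1, 4)]  # 2 <= k1 <= 5 and 1 <= k2 <= 4
print(range_query(records, ranges))  # [(3, 3), (5, 4)]
```

The scan examines every record on each query.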
Segment Trees
Basic data structure in computational geometry.
Computational geometry.
Computations with geometric objects.
Points in 1-, 2-, 3-, d-space.
Closest pair of points.
Nearest neighbor of given point.
Lines in 1-, 2-, 3-, d-space.
Machin
Priority Search Trees
Keys are distinct ordered pairs (xi, yi).
Basic operations.
get(x,y): return element whose key is (x,y).
delete(x,y): delete and return element whose key
is (x,y).
insert(x,y,e): insert element e, whose key is (x,y).
Rectangle ope
Priority Search Trees
Keys are pairs (x,y).
Basic (search, insert, delete) and rectangle
operations.
Two varieties.
Based on a balanced binary search tree such as a
red-black tree.
Red-black Priority Search Tree (RBPST)
Based on a radix search tree.
Interval Trees
Store intervals of the form [li,ri], li <= ri.
An interval is stored in exactly 1 node.
So, O(n) nodes.
3 versions.
Differing capability.
Version 1
Store intervals of the form [li,ri], li <= ri.
At least 1 interval per node.
Static inte
Interval Trees
Store intervals of the form [li,ri], li <= ri.
Insert and delete intervals.
Version 1
Answer queries of the form: which intervals
intersect/overlap a given interval [l,r].
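The overlap query can be stated precisely with a brute-force sketch: two closed intervals overlap when each one starts no later than the other ends. The function names are hypothetical, and the linear scan is only a reference for what the query returns, not the interval-tree algorithm.

```python
def overlaps(a, b):
    """True if closed intervals a = (l1, r1) and b = (l2, r2) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlapping(intervals, query):
    """All stored intervals that overlap `query`, found by a linear scan."""
    return [iv for iv in intervals if overlaps(iv, query)]


intervals = [(1, 3), (5, 8), (4, 6), (10, 12)]
print(overlapping(intervals, (3, 5)))  # [(1, 3), (5, 8), (4, 6)]
```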
Version 2: Variant
Report just 1 overlapping interval.
Definitio
B-Trees
Large degree B-trees used to represent very large
dictionaries that reside on disk.
Smaller degree B-trees used for internal-memory
dictionaries to overcome cache-miss penalties.
AVL Trees
n = 2^30 = 10^9 (approx).
30 <= height <= 43.
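A hedged sketch that checks these bounds numerically. It assumes height counts levels (a one-node tree has height 1) and uses the standard minimum node count of an AVL tree, N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0 and N(1) = 1. The function names are hypothetical.

```python
def min_height(n):
    """Smallest height (in levels) of any binary tree with n nodes."""
    h = 0
    while (1 << h) - 1 < n:  # a tree of height h holds at most 2^h - 1 nodes
        h += 1
    return h


def max_avl_height(n):
    """Largest height h (n >= 1) whose minimum AVL node count N(h) is <= n."""
    prev, curr, h = 0, 1, 1  # prev = N(h-1), curr = N(h)
    while curr + prev + 1 <= n:  # N(h+1) still fits in n nodes
        prev, curr = curr, curr + prev + 1
        h += 1
    return h


n = 10**9
print(min_height(n), max_avl_height(n))  # 30 42
```

Under these assumptions the exact worst case for n = 10^9 is 42, which lies inside the approximate range 30 <= height <= 43.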
When the
Splay Trees
Binary search trees.
Search, insert, delete, and split have amortized
complexity O(log n) & actual complexity O(n).
Actual and amortized complexity of join is O(1).
Priority queue and double-ended priority queue
versions outperform heaps,
Tournament Trees
Winner trees.
Loser Trees.
Winner Tree Definition
Complete binary tree with n external
nodes and n - 1 internal nodes.
External nodes represent tournament
players.
Each internal node represents a match
played between its two children;
the w
Improve Run Merging
Reduce number of merge passes.
Use higher order merge.
Number of passes
= ceil(log_k(number of initial runs))
where k is the merge order.
More generally, a higher-order merge
reduces the cost of the optimal merge tree.
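A short sketch of the pass-count formula, computed with integer arithmetic to avoid floating-point logarithms. It assumes k >= 2 and that each pass merges groups of up to k runs into one; the function name merge_passes is hypothetical.

```python
def merge_passes(initial_runs: int, k: int) -> int:
    """ceil(log_k(initial_runs)) for merge order k >= 2."""
    passes, runs = 0, initial_runs
    while runs > 1:
        runs = -(-runs // k)  # ceil(runs / k): k runs become one per pass
        passes += 1
    return passes


print(merge_passes(16, 2))     # 4
print(merge_passes(16, 4))     # 2
print(merge_passes(1000, 10))  # 3
```

Going from a 2-way to a 4-way merge halves the number of passes over 16 initial runs.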
Improve Run Me
Advanced Data Structures
Sartaj Sahni
Clip Art Sources
www.barrysclipart.com
www.livinggraphics.com
www.rad.kumc.edu
What The Course Is About
Study data structures for:
External sorting
Single- and double-ended priority queues
Di
External Sorting
Sort n records/elements that reside on a disk.
Space needed by the n records is very large.
n is very large, and each record may be large or
small.
n is small, but each record is very large.
So, not feasible to input the n records, s