Introduction
Tree structures support various basic dynamic set operations, including Search,
Predecessor, Successor, Minimum, Maximum, Insert, and Delete, in time proportional
to the height of the tree. Ideally, a tree will be balanced and the height will be
log n, where n is the number of nodes in the tree. To ensure that the height of the
tree is as small as possible, and therefore provide the best running time, a balanced
tree structure like a red-black tree, AVL tree, or B-tree must be used.
When working with large sets of data, it is often not possible or desirable to maintain
the entire structure in primary storage (RAM). Instead, a relatively small portion of
the data structure is maintained in primary storage, and additional data is read from
secondary storage as needed. Unfortunately, a magnetic disk, the most common form
of secondary storage, is significantly slower than random access memory (RAM). In
fact, the system often spends more time retrieving data than actually processing data.
B-trees are balanced trees that are optimized for situations when part or all of the tree
must be maintained in secondary storage such as a magnetic disk. Since disk accesses
are expensive (time-consuming) operations, a B-tree tries to minimize the number of
disk accesses. For example, a B-tree with a height of 2 and a branching factor of 1001
can store over one billion keys but, provided the root node is kept in primary storage,
requires at most two disk accesses to search for any node (Cormen 384).
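The "over one billion keys" figure can be checked with a few lines of arithmetic. The sketch below assumes that height counts edges from the root (so a tree of height 2 has three levels) and that every node is full, holding one key fewer than its number of children. The function name max_keys is illustrative and does not come from the text.

```python
def max_keys(height: int, branching_factor: int) -> int:
    """Keys held by a completely full tree with the given height and fan-out.

    Assumption: height counts edges from the root, so height 2 means
    three levels of nodes (root, internal nodes, leaves).
    """
    keys_per_node = branching_factor - 1
    # Level d of a full tree holds branching_factor ** d nodes.
    node_count = sum(branching_factor ** depth for depth in range(height + 1))
    return node_count * keys_per_node


print(max_keys(2, 1001))  # 1003003000
```

With a branching factor of 1001 there are 1 + 1001 + 1,002,001 nodes of 1000 keys each, which gives 1,003,003,000 keys.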
The Structure of B-Trees
Unlike a binary tree, each node of a B-tree may have a variable number of keys and
children. The keys are stored in nondecreasing order. Each key has an associated
child that is the root of a subtree containing all nodes with keys less than or equal to
the key but greater than the preceding key. A node also has an additional rightmost
child that is the root for a subtree containing all keys greater than any keys in the
node.
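The ordering rule above is what makes search possible: within a node, find the first key that is not smaller than the target, then either stop or descend into the child at that position. The following is a minimal in-memory sketch of that idea, not a disk-backed implementation. The Node class and the search function are hypothetical names introduced for illustration, and integer keys are assumed.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int]                                        # nondecreasing order
    children: List["Node"] = field(default_factory=list)   # empty for a leaf


def search(node: Node, key: int) -> Optional[Tuple[Node, int]]:
    """Return (node, index) where the key is stored, or None if absent."""
    while True:
        # Index of the first key >= target; len(keys) selects the rightmost child.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:       # reached a leaf without finding the key
            return None
        node = node.children[i]     # in a real B-tree this step is a disk read


root = Node([10, 20], [Node([2, 5]), Node([12, 17]), Node([25, 30])])
found = search(root, 17)
print(found is not None and found[0].keys)  # [12, 17]
print(search(root, 11))                     # None
```

Each pass through the loop visits one node, so the number of nodes read is bounded by the height of the tree plus one.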
A B-tree has a minimum number of allowable children for each node known as the
minimization factor. If t is this minimization factor, every node must have at least
t - 1 keys. Under certain circumstances, the root node is allowed to violate this
property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or,
equivalently, 2t children.
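These bounds are easy to express as a check on a node's key count. The sketch below is illustrative only: the function names are hypothetical, and it assumes t is at least 2 and that the root of a non-empty tree holds at least one key, two conditions that follow the usual textbook definition rather than anything stated above.

```python
def key_count_ok(num_keys: int, t: int, is_root: bool = False) -> bool:
    """Check a node's key count against the minimization factor t."""
    if t < 2:
        raise ValueError("assumed here: the minimization factor is at least 2")
    # Assumption: the root of a non-empty tree may drop as low as one key.
    lower = 1 if is_root else t - 1
    upper = 2 * t - 1
    return lower <= num_keys <= upper


def max_children(t: int) -> int:
    """A node with 2t - 1 keys has one more child than keys, that is 2t."""
    return (2 * t - 1) + 1


print(key_count_ok(1, t=3))                # False: below t - 1 = 2
print(key_count_ok(1, t=3, is_root=True))  # True: the root is exempt
print(key_count_ok(5, t=3))                # True: exactly 2t - 1
print(max_children(3))                     # 6
```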
Since each node tends to have a large branching factor (a large number of children), it
is typically necessary to traverse relatively few nodes before locating the desired key.
If access to each node requires a disk access, then a B-tree will minimize the number
of disk accesses required. The minimization factor is usually chosen so that the total
size of each node corresponds to a multiple of the block size of the underlying storage
device. This choice simplifies and optimizes disk access. Consequently, a B-tree is an
ideal data structure for situations where all data cannot reside in primary storage and
accesses to secondary storage are comparatively expensive (or time-consuming).
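As a rough illustration of how the minimization factor might be derived from the block size, the sketch below picks the largest t for which a full node fits in a single block. The node layout is an assumption made for this example: 2t - 1 fixed-size keys, 2t fixed-size child pointers, and an optional header. The function names, parameter names, and sizes are hypothetical, and real storage formats will differ.

```python
def node_size(t: int, key_size: int, pointer_size: int, header_size: int = 0) -> int:
    """Bytes used by a full node: 2t - 1 keys plus 2t child pointers."""
    return header_size + (2 * t - 1) * key_size + 2 * t * pointer_size


def largest_t(block_size: int, key_size: int, pointer_size: int,
              header_size: int = 0) -> int:
    """Largest minimization factor whose full node still fits in one block."""
    # Solve header + (2t - 1) * key + 2t * pointer <= block_size for t.
    t = (block_size - header_size + key_size) // (2 * (key_size + pointer_size))
    if t < 2:
        raise ValueError("block too small for the assumed node layout")
    return t


t = largest_t(block_size=4096, key_size=8, pointer_size=8)
print(t)                   # 128
print(node_size(t, 8, 8))  # 4088, which fits within the 4096-byte block
```

With 4096-byte blocks and 8-byte keys and pointers, this layout yields t = 128, so each node holds up to 255 keys and 256 children.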
 Spring '10
 MITIN
 Tree structure, Btree
