Introduction
Tree structures support various basic dynamic set operations, including Search,
Predecessor, Successor, Minimum, Maximum, Insert, and Delete, in time proportional to
the height of the tree. Ideally, a tree will be balanced and the height will be
proportional to log n, where n is the number of nodes in the tree. To keep the height
of the tree as small as possible, and therefore provide the best running time, a
balanced tree structure such as a red-black tree, AVL tree, or B-tree must be used.
When working with large sets of data, it is often not possible or desirable to maintain
the entire structure in primary storage (RAM). Instead, a relatively small portion of
the data structure is maintained in primary storage, and additional data is read from
secondary storage as needed. Unfortunately, a magnetic disk, the most common form
of secondary storage, is significantly slower than random access memory (RAM). In
fact, the system often spends more time retrieving data than actually processing data.
B-trees are balanced trees that are optimized for situations when part or all of the tree
must be maintained in secondary storage such as a magnetic disk. Since disk accesses
are expensive (time-consuming) operations, a B-tree tries to minimize the number of
disk accesses. For example, a B-tree with a height of 2 and a branching factor of 1001
can store over one billion keys but, provided the root node is kept in primary storage,
requires at most two disk accesses to search for any node (Cormen 384).
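
The "over one billion keys" figure can be checked with a little arithmetic. The sketch below assumes a completely full tree in which every node has 1001 children and therefore 1000 keys; the function name max_keys is made up for this illustration and does not come from the cited text.

```python
def max_keys(branching_factor: int, height: int) -> int:
    """Keys held by a completely full tree of the given height.

    Assumes every node has `branching_factor` children and therefore
    `branching_factor - 1` keys; the root sits at depth 0.
    """
    keys_per_node = branching_factor - 1
    # Depth d holds branching_factor ** d nodes.
    total_nodes = sum(branching_factor ** depth for depth in range(height + 1))
    return keys_per_node * total_nodes


if __name__ == "__main__":
    # Height 2: 1 root + 1001 nodes at depth 1 + 1001**2 nodes at depth 2.
    print(max_keys(1001, 2))
```

With the root held in primary storage, reaching a key at depth 2 reads only the two nodes below the root, which is where the figure of at most two disk accesses comes from.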
The Structure of B-Trees
Unlike a binary tree, each node of a B-tree may have a variable number of keys and
children. The keys are stored in non-decreasing order. Each key has an associated
child that is the root of a subtree containing all nodes with keys less than or equal to
the key but greater than the preceding key. A node also has an additional rightmost
child that is the root of a subtree containing all keys greater than any keys in the
node.
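
The ordering rule above determines which child to follow when looking for a key. A minimal sketch, assuming a node's keys are held in a sorted Python list and its children are numbered left to right starting at 0 (the helper name child_index is hypothetical):

```python
from bisect import bisect_left


def child_index(keys: list, target) -> int:
    """Index of the child whose subtree may contain `target`.

    Child i covers keys greater than keys[i - 1] and less than or equal
    to keys[i]; the extra rightmost child, at index len(keys), covers
    everything greater than the last key in the node.
    """
    # bisect_left returns the first position whose key is >= target,
    # or len(keys) when target is greater than every key.
    return bisect_left(keys, target)


if __name__ == "__main__":
    node_keys = [10, 20, 30]
    for probe in (5, 10, 15, 30, 35):
        print(probe, "-> child", child_index(node_keys, probe))
```

A binary search is used here only because the keys are already sorted; a linear scan over the keys would select the same child.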
A B-tree has a minimum number of allowable children for each node, known as the
minimization factor. If t is this minimization factor, every node must have at least
t - 1 keys. Under certain circumstances, the root node is allowed to violate this
property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or,
equivalently, 2t children.
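
These bounds can be written down as a small validity check. The sketch below is illustrative only: the BTreeNode class and the satisfies_bounds function are hypothetical names, and allowing the root to hold as few as zero keys (an empty tree) is an assumption made here, since the text only says the root may fall below t - 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """Hypothetical in-memory node: sorted keys plus child pointers."""
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def satisfies_bounds(node: BTreeNode, t: int, is_root: bool = False) -> bool:
    """Check one node against the key and child limits for factor t."""
    # Assumption: the root may hold any number of keys below t - 1.
    min_keys = 0 if is_root else t - 1
    max_keys = 2 * t - 1
    if not (min_keys <= len(node.keys) <= max_keys):
        return False
    # Keys must appear in non-decreasing order.
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        return False
    # An internal node has one child per key plus the rightmost child,
    # so 2t - 1 keys corresponds to at most 2t children.
    if not node.is_leaf and len(node.children) != len(node.keys) + 1:
        return False
    return True


if __name__ == "__main__":
    t = 3  # each non-root node holds between 2 and 5 keys
    print(satisfies_bounds(BTreeNode(keys=[1, 2]), t))
    print(satisfies_bounds(BTreeNode(keys=[1]), t))
    print(satisfies_bounds(BTreeNode(keys=[1]), t, is_root=True))
```

The check looks at a single node only; validating a whole tree would also require visiting every child and confirming that all leaves sit at the same depth.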
Since each node tends to have a large branching factor (a large number of children), it
is typically necessary to traverse relatively few nodes before locating the desired key.
If access to each node requires a disk access, then a B-tree will minimize the number
of disk accesses required. The minimization factor is usually chosen so that the total