Insertion is handled by some form of overflow file that is merged periodically with
the data file. The index is re-created during file reorganization. IBM's ISAM organization
incorporates a two-level index that is closely related to the organization of
the disk. The first level is a cylinder index, which has the key value of an anchor
record for each cylinder of a disk pack and a pointer to the track index for the cylinder.
The track index has the key value of an anchor record for each track in the cylinder and a pointer to the track. The track can then be searched sequentially for
the desired record or block.

Algorithm 14.1 outlines the search procedure for a record in a data file that uses a
nondense multilevel primary index with t levels. We refer to entry i at level j of the
index as <Kj(i), Pj(i)>, and we search for a record whose primary key value is K. We
assume that any overflow records are ignored. If the record is in the file, there must
be some entry at level 1 with K1(i) ≤ K < K1(i + 1) and the record will be in the block of the data file whose address is P1(i). Exercise 14.19 discusses modifying the search
algorithm for other types of indexes.

Algorithm 14.1. Searching a Nondense Multilevel Primary Index
with t Levels.

p ← address of top level block of index;
for j ← t step -1 to 1 do
begin
    read the index block (at jth index level) whose address is p;
    search block p for entry i such that Kj(i) ≤ K < Kj(i + 1) (if Kj(i)
        is the last entry in the block, it is sufficient to satisfy Kj(i) ≤ K);
    p ← Pj(i) (* picks appropriate pointer at jth index level *)
end;
read the data file block whose address is p;
search block p for record with key = K;

As we have seen, a multilevel index reduces the number of blocks accessed when
searching for a record, given its indexing field value. We are still faced with the problems
of dealing with index insertions and deletions, because all index levels are
physically ordered files. To retain the benefits of using multilevel indexing while
reducing index insertion and deletion problems, designers adopted a multilevel
index called a dynamic multilevel index that leaves some space in each of its blocks for inserting new entries. It is often implemented by using data structures called
B-trees and B+-trees, which we describe in the next section.

14.3 Dynamic Multilevel Indexes Using
B-Trees and B+-Trees

B-trees and B+-trees are special cases of the well-known tree data structure. We
briefly introduce the terminology used in discussing tree data structures. A tree is
formed of nodes. Each node in the tree, except for a special node called the root, has ...
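
To make the search procedure of Algorithm 14.1 concrete, the following Python sketch mirrors its loop. It rests on simplifying assumptions that are ours and not part of the text: blocks are in-memory lists, a block address is simply a list position, every index entry is an (anchor key, pointer) pair, and the names build_levels and search_multilevel_index are hypothetical. As in the algorithm, overflow records are ignored.

```python
from bisect import bisect_right


def build_levels(data_blocks, fan_out):
    """Build index levels bottom-up; levels[0] is level 1, levels[-1] is level t.

    Assumes data_blocks is non-empty and every block holds records sorted by key.
    """
    if not data_blocks:
        raise ValueError("need at least one data block")
    # First-level entries: (anchor key of each data block, data block address).
    entries = [(block[0][0], addr) for addr, block in enumerate(data_blocks)]
    levels = []
    while True:
        # Pack the entries into index blocks holding at most fan_out entries.
        blocks = [entries[i:i + fan_out] for i in range(0, len(entries), fan_out)]
        levels.append(blocks)
        if len(blocks) == 1:  # a single block: this is the top level
            return levels
        # Next level up: one entry per index block of the level just built.
        entries = [(block[0][0], addr) for addr, block in enumerate(blocks)]


def search_multilevel_index(levels, data_blocks, key):
    """Return the record whose key equals `key`, or None if it is not in the file."""
    p = 0  # address of the single top-level index block
    for level in reversed(levels):  # j = t, t-1, ..., 1
        block = level[p]  # "read the index block whose address is p"
        anchors = [k for k, _ in block]
        # Last entry i with Kj(i) <= key; it also satisfies key < Kj(i + 1)
        # unless it is the last entry in the block.
        i = bisect_right(anchors, key) - 1
        if i < 0:
            return None  # key is smaller than every anchor in this block
        p = block[i][1]  # follow Pj(i) to the next lower level
    for record in data_blocks[p]:  # search the data block for key = K
        if record[0] == key:
            return record
    return None


# Hypothetical file: 4 data blocks of 2 records each, index fan-out of 2.
data_blocks = [
    [(2, "a"), (5, "b")],
    [(8, "c"), (12, "d")],
    [(15, "e"), (21, "f")],
    [(24, "g"), (29, "h")],
]
levels = build_levels(data_blocks, fan_out=2)
print(len(levels))                                       # 2
print(search_multilevel_index(levels, data_blocks, 21))  # (21, 'f')
print(search_multilevel_index(levels, data_blocks, 13))  # None
```

In this sketch a search touches one index block per level and then one data block, so a file indexed with t levels costs t + 1 block reads per lookup.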
... number of blocks required by the multilevel index and the blocks used in the
extra level of indirection; and (vi) the approximate number of block
accesses needed to search for and retrieve all records in the file that have
a specific Department_code value, using the index.

f. Suppose that the file is ordered by the nonkey field Department_code and
we want to construct a clustering index on Department_code that uses
block anchors (every new value of Department_code starts at the beginning
of a new block). Assume there are 1000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed
among these values. Calculate (i) the index blocking factor bfri (which is
also the index fan-out fo); (ii) the number of first-level index entries and
the number of first-level index blocks; (iii) the number of levels needed if
we make it into a multilevel index; (iv) the total number of blocks
required by the multilevel index; and (v) the number of block accesses
needed to search for and retrieve all records in the file that have a specific
Department_code value, using the clustering index (assume that multiple
blocks in a cluster are contiguous).

g. Suppose that the file is not ordered by the key field Ssn and we want to
construct a B+-tree access structure (index) on Ssn. Calculate (i) the
orders p and pleaf of the B+-tree; (ii) the number of leaf-level blocks
needed if blocks are approximately 69 percent full (rounded up for convenience);
(iii) the number of levels needed if internal nodes are also 69
percent full (rounded up for convenience); (iv) the total number of
blocks required by the B+-tree; and (v) the number of block accesses
needed to search for and retrieve a record from the file, given its Ssn
value, using the B+-tree.

h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your
results for the B-tree and for the B+-tree.

14.15. A PARTS file with Part# as the key field includes records with the following
Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20,
24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values
are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show
how the tree will expand and what the final tree will look like.

14.16. Repeat Exercise 14.15, but use a B-tree of order p = 4 instead of a B+-tree.

14.17. Suppose that the following search field values are deleted, in the given order,
from the B+-tree of Exercise 14.15; show how the tree will shrink and show
the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.

14.18. Repeat Exercise 14.17, but for the B-tree of Exercise 14.16.

14.19. Algorithm 14.1 outlines the procedure for searching a nondense multilevel
primary index to retrieve a file record. Adapt the algorithm for each of the
following cases:

a. A multilevel secondary index on a nonkey nonordering field of a file.
Assume that option 3 of Section 14.1.3 is used, where an extra level of indirection stores pointers to the individual records with the corresponding
index field value.

b. A multilevel secondary index on a nonordering key field of a file.

c. A multilevel clustering index on a nonkey ordering field of a file.

14.20. Suppose that several secondary indexes exist on nonkey fields of a file,
implemented using option 3 of Section 14.1.3; for example, we could have
secondary indexes on the fields Department_code, Job_code, and Salary of the
EMPLOYEE file of Exercise 14.14. Describe an efficient way to search for and
retrieve records satisfying a complex selection condition on these fields, such
as (Department_code = 5 AND Job_code = 12 AND Salary = 50,000), using the
record pointers in the indirection level.

14.21. Adapt Algorithms 14.2 and 14.3, which outline search and insertion procedures
for a B+-tree, to a B-tree.

14.22. It is possible to modify the B+-tree insertion algorithm to delay the case
where a new level is produced by checking for a possible redistribution of values
among the leaf nodes. Figure 14.15 (next page) illustrates how this could
be done for our example in Figure 14.12; rather than splitting the leftmost
leaf node when 12 is inserted, we do a left redistribution by moving 7 to the
leaf node to its left (if there is space in this node). Figure 14.15 shows how
the tree would look when redistribution is considered. It is also possible to consider right redistribution. Try to modify the B+-tree insertion algorithm
to take redistribution into account.

14.23. Outline an algorithm for deletion from a B+-tree.

14.24. Repeat Exercise 14.23 for a B-tree.

Selected Bibliography

Bayer and McCreight (1972) introduced B-trees and associated algorithms. Comer
(1979) provides an excellent survey of B-trees and their history, and variations of B-trees.
Knuth (1973) provides detailed analysis of many search techniques, including
B-trees and some of their variations. Nievergelt (1974) discusses the use of binary
search trees for file organization. Textbooks on file structures including Wirth
(1972), Claybrook (1983), Smith and Barnes (1987), Miller (1987), and Salzberg
(1988) discuss indexing in detail and may be consulted for search, insertion, and
deletion algorithms for B-trees and B+-trees. Larson (1981) analyzes index-sequential
files, and Held and Stonebraker (1978) compare static multilevel indexes with
B-tree dynamic indexes. Lehman and Yao (1981) and Srinivasan and Carey (1991)
did further analysis of concurrent access to B-trees. The books by Wiederhold
(1983), Smith and Barnes (1987), and Salzberg (1988), among others, discuss many
of the search techniques described in this chapter. Grid files are introduced in
Nievergelt (1984). Partial-match retrieval, which uses partitioned hashing, is discussed
in Burkhard (1976, 1979). New techniques and applications of indexes and B+-trees are discussed in Lanka and
Mays (1991), Zobel et al. (1992), and Faloutsos and Jagadish (1992). Mohan and
Narang (1992) discuss index creation. The performance of various B-tree and B+-tree
algorithms is assessed in Baeza-Yates and Larson (1989) and Johnson and
Shasha (1993). Buffer management for indexes is discussed in Chan et al. (1992). ...

Chapter 14 Index Structures for Files

Figure 14.15
B+-tree insertion with left redistribution. [Figure not reproduced. Panel labels: Insert 12: overflow (left redistribution); Insert 9: overflow (new level); Insert 6: overflow (split).]
This note was uploaded on 06/28/2009 for the course CS 440 taught by Professor Staff during the Spring '08 term at Oregon State.