hw3-problems



Insertion is handled by some form of overflow file that is merged periodically with the data file. The index is re-created during file reorganization. IBM's ISAM organization incorporates a two-level index that is closely related to the organization of the disk. The first level is a cylinder index, which has the key value of an anchor record for each cylinder of a disk pack and a pointer to the track index for the cylinder. The track index has the key value of an anchor record for each track in the cylinder and a pointer to the track. The track can then be searched sequentially for the desired record or block.

Algorithm 14.1 outlines the search procedure for a record in a data file that uses a nondense multilevel primary index with t levels. We refer to entry i at level j of the index as <K_j(i), P_j(i)>, and we search for a record whose primary key value is K. We assume that any overflow records are ignored. If the record is in the file, there must be some entry at level 1 with K_1(i) ≤ K < K_1(i + 1), and the record will be in the block of the data file whose address is P_1(i). Exercise 14.19 discusses modifying the search algorithm for other types of indexes.

Algorithm 14.1. Searching a Nondense Multilevel Primary Index with t Levels

    p ← address of top-level block of index;
    for j ← t step -1 to 1 do
    begin
        read the index block (at jth index level) whose address is p;
        search block p for entry i such that K_j(i) ≤ K < K_j(i + 1)
            (if K_j(i) is the last entry in the block, it is sufficient to satisfy K_j(i) ≤ K);
        p ← P_j(i)   (* picks appropriate pointer at jth index level *)
    end;
    read the data file block whose address is p;
    search block p for record with key = K;

As we have seen, a multilevel index reduces the number of blocks accessed when searching for a record, given its indexing field value. We are still faced with the problems of dealing with index insertions and deletions, because all index levels are physically ordered files.
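The Python sketch below mirrors Algorithm 14.1 so the level-by-level descent can be traced on a small example. It is an illustration under stated assumptions, not the textbook's implementation: the "disk" is modeled as a dictionary from hypothetical block addresses to blocks, every index block is a sorted list of (K_j(i), P_j(i)) pairs, every data block is a sorted list of (key, record) pairs, and the unspecified "search block p" step inside an index block is done with a binary search (`bisect_right`). The function name `search_multilevel_index` and all sample addresses are made up for illustration, and overflow records are ignored, as in the algorithm.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Assumed layout: every block is a sorted list of (key, pointer-or-payload) pairs.
Block = List[Tuple[int, str]]


def search_multilevel_index(
    disk: Dict[str, Block], top_address: str, t: int, key: int
) -> Optional[str]:
    """Hypothetical sketch of Algorithm 14.1 for a nondense multilevel
    primary index with t levels; overflow records are ignored."""
    p = top_address  # p <- address of top-level block of index
    for j in range(t, 0, -1):  # for j <- t step -1 to 1
        block = disk[p]  # read the index block at level j whose address is p
        keys = [k for k, _ in block]
        # Largest i with K_j(i) <= key, i.e. K_j(i) <= key < K_j(i + 1);
        # for the last entry this reduces to K_j(i) <= key.
        i = bisect_right(keys, key) - 1
        if i < 0:
            return None  # key precedes every anchor, so it is not in the file
        p = block[i][1]  # p <- P_j(i), the pointer chosen at level j
    # p now addresses a data file block; scan it for the record with key K.
    for record_key, payload in disk[p]:
        if record_key == key:
            return payload
    return None


# Hypothetical two-level index: "TOP" is level 2, "L1a"/"L1b" are level 1,
# and D1..D4 are data blocks whose first keys are the anchors.
disk: Dict[str, Block] = {
    "TOP": [(1, "L1a"), (20, "L1b")],
    "L1a": [(1, "D1"), (10, "D2")],
    "L1b": [(20, "D3"), (30, "D4")],
    "D1": [(1, "r1"), (5, "r5")],
    "D2": [(10, "r10"), (12, "r12")],
    "D3": [(20, "r20"), (25, "r25")],
    "D4": [(30, "r30"), (31, "r31")],
}

print(search_multilevel_index(disk, "TOP", 2, 25))  # r25
print(search_multilevel_index(disk, "TOP", 2, 11))  # None
```

In this sample the two index levels play roles loosely analogous to ISAM's cylinder index (level 2) and track index (level 1), with each data block standing in for a track. Taking `bisect_right(keys, key) - 1` finds the largest i with K_j(i) ≤ K, which covers both the K_j(i) ≤ K < K_j(i + 1) case and the last-entry case in a single step.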
To retain the benefits of using multilevel indexing while reducing index insertion and deletion problems, designers adopted a multilevel index called a dynamic multilevel index that leaves some space in each of its blocks for inserting new entries. It is often implemented by using data structures called B-trees and B+-trees, which we describe in the next section.

14.3 Dynamic Multilevel Indexes Using B-Trees and B+-Trees

B-trees and B+-trees are special cases of the well-known tree data structure. We briefly introduce the terminology used in discussing tree data structures. A tree is formed of nodes. Each node in the tree, except for a special node called the root, has …

… number of blocks required by the multilevel index and the blocks used in the extra level of indirection; and (vi) the approximate number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the index.

f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct a clustering index on Department_code that uses block anchors (every new value of Department_code starts at the beginning of a new block). Assume there are 1000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfr_i (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the clustering index (assume that multiple blocks in a cluster are contiguous).

g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree access structure (index) on Ssn. Calculate (i) the orders p and p_leaf of the B+-tree; (ii) the number of leaf-level blocks needed if blocks are approximately 69 percent full (rounded up for convenience); (iii) the number of levels needed if internal nodes are also 69 percent full (rounded up for convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its Ssn value, using the B+-tree.

h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree and for the B+-tree.

14.15. A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and p_leaf = 3; show how the tree will expand and what the final tree will look like.

14.16. Repeat Exercise 14.15, but use a B-tree of order p = 4 instead of a B+-tree.

14.17. Suppose that the following search field values are deleted, in the given order, from the B+-tree of Exercise 14.15; show how the tree will shrink and show the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.

14.18. Repeat Exercise 14.17, but for the B-tree of Exercise 14.16.

14.19. Algorithm 14.1 outlines the procedure for searching a nondense multilevel primary index to retrieve a file record. Adapt the algorithm for each of the following cases:
a. A multilevel secondary index on a nonkey nonordering field of a file. Assume that option 3 of Section 14.1.3 is used, where an extra level of indirection stores pointers to the individual records with the corresponding index field value.
b. A multilevel secondary index on a nonordering key field of a file.
c. A multilevel clustering index on a nonkey ordering field of a file.

14.20. Suppose that several secondary indexes exist on nonkey fields of a file, implemented using option 3 of Section 14.1.3; for example, we could have secondary indexes on the fields Department_code, Job_code, and Salary of the EMPLOYEE file of Exercise 14.14. Describe an efficient way to search for and retrieve records satisfying a complex selection condition on these fields, such as (Department_code = 5 AND Job_code = 12 AND Salary ≥ 50,000), using the record pointers in the indirection level.

14.21. Adapt Algorithms 14.2 and 14.3, which outline search and insertion procedures for a B+-tree, to a B-tree.

14.22. It is possible to modify the B+-tree insertion algorithm to delay the case where a new level is produced by checking for a possible redistribution of values among the leaf nodes. Figure 14.15 (next page) illustrates how this could be done for our example in Figure 14.12; rather than splitting the leftmost leaf node when 12 is inserted, we do a left redistribution by moving 7 to the leaf node to its left (if there is space in this node). Figure 14.15 shows how the tree would look when redistribution is considered. It is also possible to consider right redistribution. Try to modify the B+-tree insertion algorithm to take redistribution into account.

14.23. Outline an algorithm for deletion from a B+-tree.

14.24. Repeat Exercise 14.23 for a B-tree.

Selected Bibliography

Bayer and McCreight (1972) introduced B-trees and associated algorithms. Comer (1979) provides an excellent survey of B-trees and their history, and variations of B-trees. Knuth (1973) provides detailed analysis of many search techniques, including B-trees and some of their variations. Nievergelt (1974) discusses the use of binary search trees for file organization.
Textbooks on file structures, including Wirth (1972), Claybrook (1983), Smith and Barnes (1987), Miller (1987), and Salzberg (1988), discuss indexing in detail and may be consulted for search, insertion, and deletion algorithms for B-trees and B+-trees. Larson (1981) analyzes index-sequential files, and Held and Stonebraker (1978) compare static multilevel indexes with B-tree dynamic indexes. Lehman and Yao (1981) and Srinivasan and Carey (1991) did further analysis of concurrent access to B-trees. The books by Wiederhold (1983), Smith and Barnes (1987), and Salzberg (1988), among others, discuss many of the search techniques described in this chapter. Grid files are introduced in Nievergelt (1984). Partial-match retrieval, which uses partitioned hashing, is discussed in Burkhard (1976, 1979).

[Figure 14.15: B+-tree insertion with left redistribution. Panels: Insert 12: overflow (left redistribution); Insert 9: overflow (new level); Insert 6: overflow (split).]

New techniques and applications of indexes and B+-trees are discussed in Lanka and Mays (1991), Zobel et al. (1992), and Faloutsos and Jagadish (1992). Mohan and Narang (1992) discuss index creation. The performance of various B-tree and B+-tree algorithms is assessed in Baeza-Yates and Larson (1989) and Johnson and Shasha (1993). Buffer management for indexes is discussed in Chan et al. (1992). ...

This note was uploaded on 06/28/2009 for the course CS 440 taught by Professor Staff during the Spring '08 term at Oregon State.
