# hw7 - CPS216 Data-Intensive Computing Systems Fall 2011...

This preview shows pages 1–3. Sign up to view the full content.

CPS216 Data-Intensive Computing Systems, Fall 2011, Assignment 7 Due date: Monday, Nov. 21, 2011, 5.00 PM. Late submissions will not be accepted. Do not forget to indicate your name on your submission. State all assumptions. For questions where descriptive solutions are required, you will be graded both on the correctness and clarity of your reasoning. Total points = 100. Question 1 Points 20 The following information is available about relations R and S: Relation R is clustered and the blocks of R are laid out contiguously on disk. B(R) = 1000 and T(R) = 10,000. Relation S is clustered and the blocks of S are laid out contiguously on disk. B(S) = 500 and T(S) = 5000. M = 101 blocks. For simplicity, we will assume that a random access can be done on average in time t r = 20 ms, and a sequential access can be done on average in time t s = 1 ms. For example, scanning ±ve contiguous blocks on disk, assuming the ±rst access is random, incurs a cost t r + 4 t s . 1. [ 6 Points ] How will you extend the “E²cient” Sort-Merge Join algorithm (that we learned in class) to minimize cost when our cost model distinguishes between random accesses and sequential accesses? (Note that in class we did not distinguish between random and sequential accesses.) Compute the cost of your algorithm. 2. [ 6 Points ] Design an algorithm for the block nested-loop join of relations R and S which has minimum cost when we distinguish between random and sequential disk accesses. Compute this minimum cost using the parameter values speci±ed above. 3. [ 8 Points ] How does your answer to (2) change if blocks of R are not laid out contiguously on disk? All other assumptions and parameters remain the same as speci±ed above. Compute the minimum cost possible for block nested-loop join in this case. Question 2 Points 20 = 5 + 3 * 5 Suppose you have two clustered relations R(A,X,Y) and S(B,C,Z). You have the following indexes on S. A non-clustering B-tree index on attribute B for S. A clustering B-tree index on attribute C for S. 1

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Assume that both indexes are kept entirely in memory always (i.e., you do not need to read them from disk). Also, assume that all of the tuples of S that have the same value of attribute C are stored in sequentially adjacent (i.e., contiguous) blocks on disk. That is, if more than one block is needed to store all of the tuples with some value of C, then these blocks will be located sequentially on the disk. You have the following information about R and S: 100 tuples of R are stored per block on disk. Assume that blocks of R are laid out contiguously on disk. T(R) = 360,000 (number of tuples of R). The values of attribute A in R range from 1 to 360,000.
This is the end of the preview. Sign up to access the rest of the document.

{[ snackBarMessage ]}

### Page1 / 6

hw7 - CPS216 Data-Intensive Computing Systems Fall 2011...

This preview shows document pages 1 - 3. Sign up to view the full document.

View Full Document
Ask a homework question - tutors are online