CPS216 DataIntensive Computing Systems, Fall 2011,
Assignment 7
•
Due date: Monday, Nov. 21, 2011, 5.00 PM. Late submissions will not be accepted.
•
Do not forget to indicate your name on your submission.
•
State all assumptions. For questions where descriptive solutions are required, you will be
graded both on the correctness and clarity of your reasoning.
•
Total points = 100.
Question 1
Points 20
The following information is available about relations R and S:
•
Relation R is clustered and the blocks of R are laid out contiguously on disk. B(R) = 1000
and T(R) = 10,000.
•
Relation S is clustered and the blocks of S are laid out contiguously on disk. B(S) = 500 and
T(S) = 5000.
•
M = 101 blocks.
•
For simplicity, we will assume that a random access can be done on average in time
t
r
= 20
ms, and a sequential access can be done on average in time
t
s
= 1 ms. For example, scanning
±ve contiguous blocks on disk, assuming the ±rst access is random, incurs a cost
t
r
+ 4
t
s
.
1. [
6 Points
] How will you extend the “E²cient” SortMerge Join algorithm (that we learned
in class) to minimize cost when our cost model distinguishes between random accesses and
sequential accesses? (Note that in class we did not distinguish between random and sequential
accesses.) Compute the cost of your algorithm.
2. [
6 Points
] Design an algorithm for the block nestedloop join of relations R and S which has
minimum cost when we distinguish between random and sequential disk accesses. Compute
this minimum cost using the parameter values speci±ed above.
3. [
8 Points
] How does your answer to (2) change if blocks of R are not laid out contiguously
on disk? All other assumptions and parameters remain the same as speci±ed above. Compute
the minimum cost possible for block nestedloop join in this case.
Question 2
Points 20 = 5 + 3 * 5
Suppose you have two clustered relations R(A,X,Y) and S(B,C,Z). You have the following indexes
on S.
•
A nonclustering Btree index on attribute B for S.
•
A clustering Btree index on attribute C for S.
1
This preview has intentionally blurred sections. Sign up to view the full version.
View Full DocumentAssume that both indexes are kept entirely in memory always (i.e., you do not need to read them
from disk). Also, assume that all of the tuples of S that have the same value of attribute C are
stored in sequentially adjacent (i.e., contiguous) blocks on disk. That is, if more than one block is
needed to store all of the tuples with some value of C, then these blocks will be located sequentially
on the disk.
You have the following information about R and S:
•
100 tuples of R are stored per block on disk. Assume that blocks of R are laid out contiguously
on disk.
•
T(R) = 360,000 (number of tuples of R). The values of attribute A in R range from 1 to 360,000.
This is the end of the preview.
Sign up
to
access the rest of the document.
 Summer '09
 Relational model, Seek time, Hash join, Sequential access, Btree

Click to edit the document details