COMPUTER ENGINEERING DEPARTMENT
Bilkent University
CS 351: Data Organization and Management
Dec. 23, 2005 – 12:15 – 14:15
NAME/Section:
GOOD
LUCK!
Notes: 1. There are 100 points, 9 questions on 8 pages.
2. Please first READ the questions and then provide short answers.
3. Show your work.
4. You are not allowed to use your cell phone or PDA for any purpose.
5. Before proceeding with the questions please read the last page, Appendix,
for IBM 3380 parameters and further instructions.
1. (10 pts.) In this question consider an inverted file-based search engine environment.
No. of Query Terms    1     2     3     4     5
Probability           .2    .4    .2    .1    .1
The above table indicates that 20% of the queries contain 1 term (i.e., their probability is .2);
the other entries of the table are interpreted similarly.
In the same environment, the posting lists have the following characteristics in terms of their
length (in thousands of disk blocks).
Posting list Length (in 1000 blocks)    1     2     3     4
Probability                             .4    .4    .1    .1
The above table indicates that, for example, 40% of the posting lists occupy 1000 disk blocks.
a. Determine the expected number of terms in an average query.
Answer:
Expected number of terms = 1 x 0.2 + 2 x 0.4 + 3 x 0.2 + 4 x 0.1 + 5 x 0.1 = 2.5
Thus, an average query has about 3 terms.
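The expectation above can be checked with a short sketch. The probabilities are copied from the query-term table; the names query_term_probs and expected_value are illustrative choices, not part of the exam.

```python
# Distribution of the number of terms per query, copied from the table
# in the question (key = number of terms, value = probability).
query_term_probs = {1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1, 5: 0.1}


def expected_value(dist):
    """Return the sum of value * probability over a discrete distribution."""
    return sum(value * prob for value, prob in dist.items())


expected_terms = expected_value(query_term_probs)

# The answer then rounds this figure up to 3 terms per query.
print(f"Expected number of terms: {expected_terms:.2f}")
```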
b.
What is the expected number of posting disk blocks to be accessed for an average query?
Answer:
For each term, the expected number of posting disk blocks to be accessed is:
1000 x (1 x 0.4 + 2 x 0.4 + 3 x 0.1 + 4 x 0.1) = 1900
CS 351: Data Organization and Management, Fall 2005, Final Exam
Since an average query has about 3 terms, the expected number of posting disk blocks to be
accessed is 3 x 1900 = 5700.
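As a check on the arithmetic in part b, the following sketch recomputes both figures. It assumes, as the answer does, that the query length is rounded to 3 terms; the variable names are illustrative only.

```python
# Posting list lengths in thousands of blocks, copied from the table
# in the question (key = length, value = probability).
posting_length_probs = {1: 0.4, 2: 0.4, 3: 0.1, 4: 0.1}

BLOCKS_PER_UNIT = 1000   # table lengths are given in units of 1000 blocks
TERMS_PER_QUERY = 3      # assumption taken from the answer: 2.5 rounded to 3

# Expected number of blocks in the posting list of a single term.
blocks_per_term = BLOCKS_PER_UNIT * sum(
    length * prob for length, prob in posting_length_probs.items()
)

# Each query term requires reading one posting list.
blocks_per_query = TERMS_PER_QUERY * blocks_per_term

print(f"Blocks per term:  {blocks_per_term:.0f}")
print(f"Blocks per query: {blocks_per_query:.0f}")
```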
c.
What is the expected query processing time if we assume that in a posting list 80% of the disk
blocks are allocated sequentially and the rest randomly?
Assume an IBM 3380 environment.
Answer:
Let the number of sequentially allocated blocks be b_s and randomly allocated blocks be b_r.
Then the expected query processing time is calculated with the formula
b_s x ebt + b_r x (s + r + btt)
Using the average number of blocks to be accessed (calculated in b) and the IBM 3380
specifications we get
query processing time = 5700 x 0.8 x 0.84 + 5700 x 0.2 x (16 + 8.3 + 0.8) = 32444 ms, or about 32 sec.
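The timing calculation in part c can be reproduced with the sketch below. The numeric values are the ones used in the answer (all in milliseconds); the constant names and the readings of ebt, s, r, and btt given in the comments are assumptions made for illustration.

```python
# Values used in the answer, all in milliseconds.
EBT_MS = 0.84   # ebt: read here as effective block transfer time
S_MS = 16.0     # s: read here as average seek time
R_MS = 8.3      # r: read here as average rotational latency
BTT_MS = 0.8    # btt: read here as block transfer time

TOTAL_BLOCKS = 5700   # expected blocks per query, from part b
SEQ_FRACTION = 0.8    # 80% of the blocks are allocated sequentially

b_s = TOTAL_BLOCKS * SEQ_FRACTION          # sequentially allocated blocks
b_r = TOTAL_BLOCKS * (1 - SEQ_FRACTION)    # randomly allocated blocks

# Sequential blocks cost ebt each; every random block pays a full
# seek, rotational delay, and transfer.
time_ms = b_s * EBT_MS + b_r * (S_MS + R_MS + BTT_MS)

print(f"Query processing time: {time_ms:.0f} ms (about {time_ms / 1000:.0f} sec)")
```

The random 20% of the blocks account for most of the total, since each of them costs 25.1 ms against 0.84 ms for a sequential block.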
2. (10 pts.) In a university database we have information about professors (with SSN) and courses
(with courseid).
Professors teach courses.
Between these two items, there is a Teaches relationship.
(This question is tricky, and some of the drawings may not match the
specifications.)
a. Professors can teach the same course in several semesters, and each offering must be
recorded.
Draw an ER diagram for this specification.