[...] n “Alternative 3” B tree index on the term column. The data entries in the leaves point to RecordIds of the InvertedFile heap file in the database. You explain to them that their DBMS will not ensure that the “Alternative 3” entries are sorted by RecordId. So the optimizer will not be able to choose the “standard” query plan from class using merge join. They don’t see any problem with that. To demonstrate, you show the Boolean query “Miley AND antidisestablishmentarianism”. The data entry (postings list) for “Miley” takes 350MB (42.9 million results on Google), and the one for “antidisestablishmentarianism” takes 5MB (81,200 results on Google). They have 10MB of buffer space to run this query. Assume the optimizer does a good job choosing among the various join algorithms and access methods we learned in class. Draw the query plan it would choose, and write down the total I/O cost including index access and join costs (but not the cost of writing out the answer).
c) [2 points] What would the I/O cost have been using the scheme described in class: i.e. postings lists guaranteed to be sorted by docID, and simple merge join?
SID: ____________________________
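For readers who want a concrete picture of the scheme named in part (c), the following is a minimal Python sketch of a merge-style intersection of two postings lists that are already sorted by docID. The function name and the sample docIDs are hypothetical and chosen only for illustration; the sketch shows the merge logic and does not model pages, buffers, or I/O cost.

```python
def intersect_sorted(postings_a, postings_b):
    """Intersect two docID lists that are each sorted in ascending order.

    Each list is walked once from front to back, which is why the
    sorted-by-docID guarantee matters for this approach.
    """
    i, j = 0, 0
    result = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            # docID appears in both lists: it satisfies the AND query.
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the list with the smaller docID
        else:
            j += 1
    return result


# Hypothetical docIDs, for illustration only.
common_term = [2, 5, 8, 13, 21, 34]
rare_term = [5, 13, 40]
print(intersect_sorted(common_term, rare_term))
```

If either input is not sorted, this loop can skip matching docIDs, which is the situation the question describes for unsorted “Alternative 3” entries.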
Q6: A Little SQL [4 points] The questions on this page refer to the relation defined by this statement:
CREATE TABLE Students(id integer, gpa float, name text,
address text, gender char,
PRIMARY KEY (id));
a. [1 point] Are the two queries below equivalent? That is, do they return the same answer on any database instance? Answer True or False; no explanation required.

SELECT MAX(S.id) FROM Students S;

SELECT S.id FROM Students S
WHERE S.id >= ALL (SELECT S2.id FROM Students S2);

b. [1 point] Among the 3 queries below, some or all are equivalent. Circle the ones that are equivalent.

SELECT MAX(S.gpa) FROM Students S;

SELECT S.gpa FROM Students S
WHERE S.gpa >= ALL (SELECT S2.gpa FROM Students S2);

SELECT S.gpa FROM Students S
GROUP BY S.gpa
HAVING S.gpa >= ALL (SELECT S2.gpa FROM Students S2
WHERE S2.gpa > S.gpa);

c. [1 point] Consider the following query and the table of data below:

SELECT S.id FROM Students S
WHERE S.gpa > 3.3
AND S.id > 120;

id   gpa   name    address   gender
123  null  Joe     38 Maple  M
124  3.2   Hui     64 Vine   F
127  3.9   Celia   21 Elm    F
111  3.2   Hector  11 Oak    M

How many rows should be in the output?

d. [1 point] Using the same data from the table in part (c), how many rows should be in the output of the following query?

SELECT S.id FROM Students S
WHERE S.gpa > 3.3 OR S.gender = 'M';
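One way to check an answer to parts (c) and (d) is to load the four rows into a scratch database and run both queries. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the exam's unnamed DBMS; the variable names are hypothetical. The null gpa is inserted as SQL NULL, so comparisons against it evaluate to unknown rather than true or false.

```python
import sqlite3

# In-memory scratch database; SQLite is an assumption, not the exam's DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students(id integer, gpa float, name text, "
    "address text, gender char, PRIMARY KEY (id))"
)

# The four rows from the table in part (c); None becomes SQL NULL.
rows = [
    (123, None, "Joe", "38 Maple", "M"),
    (124, 3.2, "Hui", "64 Vine", "F"),
    (127, 3.9, "Celia", "21 Elm", "F"),
    (111, 3.2, "Hector", "11 Oak", "M"),
]
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)

# Query from part (c).
part_c = conn.execute(
    "SELECT S.id FROM Students S WHERE S.gpa > 3.3 AND S.id > 120"
).fetchall()

# Query from part (d).
part_d = conn.execute(
    "SELECT S.id FROM Students S WHERE S.gpa > 3.3 OR S.gender = 'M'"
).fetchall()

print("part (c) rows:", len(part_c))
print("part (d) rows:", len(part_d))
```

The row for Joe is the interesting case: with a NULL gpa, the predicate S.gpa > 3.3 is unknown, so whether the row qualifies depends on how AND and OR combine unknown with the other condition.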
Q7: More SQL [8 points] Consider this old chestnut: the Stable Marriage Problem, described on its Wikipedia page as follows. Given n men and n women, where each person has ranked all members of the opposite sex with a unique number
between 1 and n in order of pre...
 Fall '08
 Staff