a 4 points the company has prototyped basic boolean



this update to page 5 successfully written to disk? The log record at LSN 70 says that transaction 1 updated page 2. Was this update to page 2 successfully written to disk? Explain briefly in both cases.

(b) [4 points] At the end of the Analysis phase, what transactions will be in the transaction table, and with what lastLSN and Status values? What pages will be in the dirty page table, and with what recLSN values?

Transaction ID | lastLSN | Status

Page ID | recLSN

(c) [4 points] At which LSN in the log should redo begin? Which log records will be redone (list their LSNs)? All other log records will be skipped.

SID: ____________________________

Q5: Search and Query Processing [10 points]

You are consulting on the design of a new search engine. The company building it wants to use SQL on top of a DBMS. (You tell them that using a DBMS is not the best approach for high-performance text search. They tell you it is a non-negotiable design decision. You nod reasonably; this is not your first time working with an irrational customer!)

a) [4 points] The company has prototyped basic Boolean search on a small test data set. They are storing the files in a single table of the form Files(docID integer, content text, PRIMARY KEY (docID)). And they have a table of StopWords as well. Here’s their query template for a 2-keyword search ($1 and $2 are replaced with keywords at runtime):

    SELECT DISTINCT A.docID
    FROM Files A, Files B, StopWords S
    WHERE A.docID = B.docID
      AND A.content LIKE %$1%
      AND B.content LIKE %$2%
      AND $1 <> S.word
      AND $2 <> S.word;

For each of the following comments, answer True or False, and explain your answer in the space provided (DO NOT use more space!):

i. This query is exponential in the number of File tuples, so it will get exponentially slower as they add files to their corpus.

ii. The self-join in this query is useless.

iii. This query will produce no output for the keyword $1 = “the” as long as it was inserted into the StopWords table.

iv. The query optimizer may produce ridiculously bad join orders.

============================================================
If your answer continues below here it is TOO LONG.

b) [4 points] The company likes your idea of using inverted indexes. They propose to use the scheme we described in class: build an InvertedFile relation in the DBMS with a...
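For readers who want to experiment with the 2-keyword query template from Q5 a), the following is a minimal sketch using Python's built-in sqlite3 module. Several things here are assumptions and not part of the exam: the StopWords schema (a single text column named word, inferred from S.word in the query), the sample documents, the helper names build_db and search, and the use of bound parameters with '%' || :k1 || '%' in place of the textual $1/$2 substitution. SQLite's LIKE is also case-insensitive for ASCII by default, which other DBMSs may not share.

```python
import sqlite3

# The exam's query template, with $1/$2 replaced by named parameters.
QUERY = """
SELECT DISTINCT A.docID
FROM Files A, Files B, StopWords S
WHERE A.docID = B.docID
  AND A.content LIKE '%' || :k1 || '%'
  AND B.content LIKE '%' || :k2 || '%'
  AND :k1 <> S.word
  AND :k2 <> S.word
"""


def build_db(docs, stop_words):
    """Create an in-memory database holding Files and StopWords."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Files (docID integer, content text, PRIMARY KEY (docID))"
    )
    # Assumed schema: the exam only shows that StopWords has a 'word' column.
    conn.execute("CREATE TABLE StopWords (word text)")
    conn.executemany("INSERT INTO Files VALUES (?, ?)", docs)
    conn.executemany(
        "INSERT INTO StopWords VALUES (?)", [(w,) for w in stop_words]
    )
    return conn


def search(conn, k1, k2):
    """Run the 2-keyword template and return the matching docIDs, sorted."""
    rows = conn.execute(QUERY, {"k1": k1, "k2": k2}).fetchall()
    return sorted(row[0] for row in rows)


# Hypothetical toy corpus and stop-word list, for illustration only.
docs = [
    (1, "the quick brown fox"),
    (2, "the lazy dog"),
    (3, "quick dog tricks"),
]
conn = build_db(docs, ["the", "a"])

print(search(conn, "quick", "fox"))
print(search(conn, "the", "dog"))
```

Varying the keywords and the contents of StopWords (including leaving it empty) is one way to check the comments in part a) against actual query results.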

This note was uploaded on 01/24/2014 for the course CS 186 taught by Professor Staff during the Fall '08 term at University of California, Berkeley.
