4 Query Flocks

Goal: apply the a-priori trick and other association-rule tricks to a more general class of complex queries.

4.1 Query Flock Notation

A query flock is a generate-and-test system consisting of:

1. A query with parameters; we write the query in Datalog to simplify certain optimizations later.

2. A filter condition that says when the values of the parameters yield a query result that we accept.

Note that the query flock is really a single query about its parameters; the parameterized-query component is not the real query.

Example 4.1: Frequent item pairs in a relation Baskets(BID, item) can be written as the query flock:

    Answer(b) :- Baskets(b,$1) AND Baskets(b,$2)
    COUNT(Answer) >= s

If we replace the parameters $1 and $2 by values, e.g., "diapers" and "beer," respectively, then the query asks for the set of basket IDs such that the basket contains both diapers and beer. The condition on the answer says that there must be at least s such baskets, where s is the support threshold. Thus, this query flock asks the usual question about the parameters $1 and $2: "which pairs of items appear in at least s baskets?" □

Example 4.2: Here is a less usual example. It supposes the relations:

1. Cust(name, attr, value). Tuple (n, a, v) means the customer with name n has value v for attribute a. For instance, (Sue, age, 45) means that Sue is of age 45.

2. Buys(name, prod) tells what products each customer buys.

3. Type(prod, type) tells the type of each product, e.g., product "Coke" is of type "soft drink."

Here is the query flock that asks for values of some attribute that occur at least s times among buyers of a certain type of product:

    Answer(n) :- Cust(n,$a,$v) AND Buys(n,p) AND Type(p,$t)
    COUNT(Answer) >= s

□

4.2 Execution Strategies

The analog of a-priori is the observation that if we delete one or more subgoals from a Datalog query, the set of answers can only grow (or stay the same); it can never shrink.
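Before turning to filter steps, the generate-and-test semantics of Example 4.1 and the observation just made can be checked by brute force. The following is a minimal Python sketch; the toy Baskets data and the function names (answer, naive_flock, dropping_subgoal_only_grows) are hypothetical and only for illustration. It tries every assignment of items to $1 and $2, which is exactly the cost that the filtering strategies of this section aim to avoid.

```python
from itertools import product

# Hypothetical toy instance of Baskets(BID, item), stored as (basket_id, item) tuples.
BASKETS = {
    (1, "diapers"), (1, "beer"), (1, "milk"),
    (2, "diapers"), (2, "beer"),
    (3, "diapers"), (3, "bread"),
    (4, "beer"), (4, "milk"),
}


def answer(baskets, p1, p2):
    """Answer(b) :- Baskets(b,$1) AND Baskets(b,$2), with $1 = p1 and $2 = p2."""
    with_p1 = {b for (b, item) in baskets if item == p1}
    with_p2 = {b for (b, item) in baskets if item == p2}
    return with_p1 & with_p2


def naive_flock(baskets, s):
    """Generate every ($1, $2) assignment; keep those with COUNT(Answer) >= s.

    Pairs with $1 = $2 are included, since the flock as written does not exclude them.
    """
    items = sorted({item for (_, item) in baskets})
    return {
        (p1, p2)
        for p1, p2 in product(items, repeat=2)
        if len(answer(baskets, p1, p2)) >= s
    }


def dropping_subgoal_only_grows(baskets):
    """Check that deleting the second subgoal never shrinks the answer set."""
    items = {item for (_, item) in baskets}
    for p1, p2 in product(items, repeat=2):
        # Answer(b) :- Baskets(b,$1), i.e., the query with one subgoal deleted.
        one_subgoal = {b for (b, item) in baskets if item == p1}
        if not answer(baskets, p1, p2) <= one_subgoal:
            return False
    return True


print(sorted(naive_flock(BASKETS, s=2)))
print(dropping_subgoal_only_grows(BASKETS))
```

With s = 2, the pair (diapers, milk) is rejected because only basket 1 contains both, and no pair involving bread survives because bread appears in a single basket.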
Our hope is that by computing some temporary relations using a subset of the subgoals, we can filter the sets of values for one or more parameters, using computations that are much less expensive than computing the entire query for the full set of parameters. We can describe the intermediate steps, as well as the final computation of the parameter values that pass the test, by a sequence of steps of the form

    Relation := FILTER(parameters, query, condition)

The query is the flock query with zero or more subgoals eliminated. A requirement is that this query be safe; i.e., every variable appearing in the head appears in a nonnegated subgoal involving a relation (i.e., not a subgoal involving an arithmetic comparison like a < b).

The parameters are those appearing in the query.

The condition is the same as the condition of the flock itself.

Example 4.3: The flock of Example 4.1 might be solved by using the first subgoal to filter $1 and the second subgoal to filter $2:

    OK1($1) := FILTER({$1}, Answer(b) :- Baskets(b,$1), COUNT(Answer) >= s)
    OK2($2) := FILTER({$2}, Answer(b) :- Baskets(b,$2), COUNT(Answer) >= s)
    OK($1,$2) := FILTER({$1,$2},
        Answer(b) :- Baskets(b,$1) AND Baskets(b,$2) AND OK1($1) AND OK2($2),
        COUNT(Answer) >= s)

Of course, a clever flocks compiler recognizes that these two filtering steps are really the same and computes only one of OK1 and OK2.

The reason a-priori often saves a lot of time is that the join of four relations in the last step (the computation of OK($1,$2)) can be carried out in an order that reduces the size of intermediate relations, compared with just joining Baskets with itself. Joining each copy of Baskets with OK1 or OK2 first discards the tuples for infrequent items before the expensive join on b. Fig. 6 suggests such an ordering.

    Figure 6: Preferred order for join in market-basket flock
        JOIN(JOIN(Baskets, OK1), JOIN(Baskets, OK2))

Notice that the ordering in Fig. 6 is not a left-deep ordering, which suggests that the typical commercial DBMS would not find this order. A query-flocks compiler therefore needs to feed simpler queries to the DBMS so that the DBMS uses the right join order. □
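The plan of Example 4.3 can be sketched in Python on a small hypothetical Baskets instance. The function names are illustrative assumptions, not part of any real flocks compiler. The sketch computes OK1 once and lets it serve as OK2, as the clever compiler would, restricts Baskets to surviving items (the Baskets JOIN OK1 subtrees of Fig. 6), and only then pairs items within each basket, which plays the role of the final join on b.

```python
from collections import defaultdict

# Hypothetical toy instance of Baskets(BID, item), stored as (basket_id, item) tuples.
BASKETS = {
    (1, "diapers"), (1, "beer"), (1, "milk"),
    (2, "diapers"), (2, "beer"),
    (3, "diapers"), (3, "bread"),
    (4, "beer"), (4, "milk"),
}


def filter_single_item(baskets, s):
    """OK1($1) := FILTER({$1}, Answer(b) :- Baskets(b,$1), COUNT(Answer) >= s).

    OK2 is the same relation with $2 in place of $1, so it is computed only once.
    """
    baskets_with = defaultdict(set)
    for b, item in baskets:
        baskets_with[item].add(b)
    return {item for item, bids in baskets_with.items() if len(bids) >= s}


def apriori_flock(baskets, s):
    """Final FILTER step for OK($1,$2), using OK1 (= OK2) to shrink Baskets first."""
    ok1 = filter_single_item(baskets, s)
    # Baskets JOIN OK1 on item: keep only tuples whose item passed the filter.
    surviving = defaultdict(set)
    for b, item in baskets:
        if item in ok1:
            surviving[b].add(item)
    # Join the two filtered copies on b: each basket contributes its ID to
    # Answer(b) for every ($1, $2) pair of surviving items it contains.
    support = defaultdict(int)
    for items in surviving.values():
        for p1 in items:
            for p2 in items:
                support[(p1, p2)] += 1
    # Apply the flock condition COUNT(Answer) >= s.
    return {pair for pair, count in support.items() if count >= s}


print(sorted(apriori_flock(BASKETS, s=2)))
```

On this data, bread is pruned by the OK1 step, so no pair involving it is ever formed in the final join; the result is the same set of pairs that brute-force evaluation of the flock would give. The savings grow when many items fail the single-item filter.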
Example 4.4: Now let us consider how we might use filter steps to improve the running time of the final join in Example 4.2. Using just the Cust subgoal gives a filter on {$a, $v}, but there is no useful filter for just one of these parameters. We cannot use

    Answer(n) :- Type(p,$t)

to filter $t, because that query is not safe: n appears in the head but not in the body. However,

    Answer(n) :- Buys(n,p) AND Type(p,$t)

is safe and may be used. A possible plan for optimizing this query flock is in Fig. 7, and Fig. 8 shows the preferred join order for the final step. □

    OK1($a,$v) := FILTER({$a,$v}, Answer(n) :- Cust(n,$a,$v), COUNT(Answer) >= s)
    OK2($t) := FILTER({$t}, Answer(n) :- Buys(n,p) AND Type(p,$t), COUNT(Answer) >= s)
    OK($a,$v,$t) := FILTER({$a,$v,$t},
        Answer(n) :- Cust(n,$a,$v) AND Buys(n,p) AND Type(p,$t)
                     AND OK1($a,$v) AND OK2($t),
        COUNT(Answer) >= s)

    Figure 7: Query-flock plan for Example 4.2

    Figure 8: Join order for final step in Fig. 7 (a join tree over Cust, OK1, Buys, Type, and OK2)
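The plan of Fig. 7 can be sketched in Python on small hypothetical Cust, Buys, and Type relations. The data and function names are assumptions for illustration only, and the join order used in final_filter is chosen for readability; it is not necessarily the order drawn in Fig. 8. Because Answer(n) is a set, each count below is a count of distinct customer names.

```python
from collections import defaultdict

# Hypothetical toy instances of Cust(name, attr, value), Buys(name, prod), Type(prod, type).
CUST = {
    ("Sue", "age", 45), ("Joe", "age", 45), ("Ann", "age", 30),
    ("Sue", "city", "Davis"), ("Joe", "city", "Davis"), ("Ann", "city", "Davis"),
}
BUYS = {("Sue", "Coke"), ("Joe", "Pepsi"), ("Joe", "Brie"), ("Ann", "Coke")}
TYPE = {("Coke", "soft drink"), ("Pepsi", "soft drink"), ("Brie", "cheese")}


def filter_ok1(cust, s):
    """OK1($a,$v) := FILTER({$a,$v}, Answer(n) :- Cust(n,$a,$v), COUNT(Answer) >= s)."""
    names = defaultdict(set)
    for n, a, v in cust:
        names[(a, v)].add(n)
    return {av for av, ns in names.items() if len(ns) >= s}


def filter_ok2(buys, types, s):
    """OK2($t) := FILTER({$t}, Answer(n) :- Buys(n,p) AND Type(p,$t), COUNT(Answer) >= s)."""
    types_of = defaultdict(set)
    for p, t in types:
        types_of[p].add(t)
    buyers = defaultdict(set)
    for n, p in buys:
        for t in types_of.get(p, ()):
            buyers[t].add(n)
    return {t for t, ns in buyers.items() if len(ns) >= s}


def final_filter(cust, buys, types, s):
    """OK($a,$v,$t): the full flock query with OK1 and OK2 added as subgoals."""
    ok1 = filter_ok1(cust, s)
    ok2 = filter_ok2(buys, types, s)
    # Cust JOIN OK1 and Type JOIN OK2: shrink the base relations first.
    cust_f = [(n, a, v) for (n, a, v) in cust if (a, v) in ok1]
    surviving_types = defaultdict(set)
    for p, t in types:
        if t in ok2:
            surviving_types[p].add(t)
    # Buys JOIN (filtered Type) on p: the surviving types each customer buys.
    types_bought = defaultdict(set)
    for n, p in buys:
        types_bought[n] |= surviving_types.get(p, set())
    # Join with filtered Cust on n, then apply COUNT(Answer) >= s.
    answer = defaultdict(set)
    for n, a, v in cust_f:
        for t in types_bought.get(n, ()):
            answer[(a, v, t)].add(n)
    return {key for key, ns in answer.items() if len(ns) >= s}


print(final_filter(CUST, BUYS, TYPE, s=2))
```

With s = 2, the OK1 step drops (age, 30) and the OK2 step drops cheese (only Joe buys Brie), so those values never enter the final join.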

This note was uploaded on 01/31/2011 for the course CS 345 taught by Professor Dunbar,a during the Fall '07 term at UC Davis.