slides01-11 - Using CQ Theory in Information Integration...


Using CQ Theory in Information Integration

Yes; this stuff really does get used in systems. We shall talk about three somewhat different systems that use the theory in various ways:

1. Information Manifold, developed by Alon Levy at AT&T Research Labs (Levy is now at U. Washington).
2. Infomaster, developed at Stanford by Mike Genesereth and his group.
3. Tsimmis, developed in the Stanford DB group.

Two Broad Approaches

1. View-Centric: There is a set of global predicates. Information sources are described by what they produce, in terms of the global predicates.
   - View = query describing what a source produces.
   - Global predicates behave like EDB, even though they are not stored and don't really exist.
   - Queries in terms of the global predicates are answered by piecing together views.
2. Query-Centric: A mediator exports global predicates.
   - Queries about these global predicates are translated by the mediator into queries at the sources, and the answer is pieced together from the source responses.
   - Source predicates play the role of EDB.
   - Predicates exported by the mediator are defined by "views" of the source predicates.

Building Queries From Views

Information Manifold (IM) is built on the principle that there is a global set of predicates, and information sources are described in terms of what they can say about those predicates.

We describe each information source by a set of views that they can provide.
   - Views are expressed as CQ's whose subgoals use the global predicates.

Queries are also CQ's about the global predicates.

Fundamental Question: Given a query and a collection of views, how do we find an expression using the views only that is equivalent to the query?
   - Remember: equivalence = containment in both directions.
   - Sometimes equivalence is not possible; we need to find a query about the views that is maximally contained in the query.
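Since equivalence is containment in both directions, everything rests on a containment test for CQ's. The following is a minimal Python sketch of the standard containment-mapping test, not code from any of the systems named above. The tuple representation of a CQ, the helper names (is_var, extend, contained_in, equivalent), and the convention that variables are capitalized while constants are lowercase are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]        # (predicate, arguments)
CQ = Tuple[Tuple[str, ...], List[Atom]]   # (head arguments, body subgoals)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend mapping so each term of src is sent to the matching term of dst."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if new.setdefault(s, d) != d:   # variable already mapped elsewhere
                return None
        elif s != d:                        # constants must map to themselves
            return None
    return new


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True if q1 is contained in q2: find a containment mapping from q2 to q1."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)        # heads must correspond
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:          # try every subgoal of q1 as a target
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


def equivalent(q1: CQ, q2: CQ) -> bool:
    return contained_in(q1, q2) and contained_in(q2, q1)


# ans(X) :- par(X,Y) & par(Y,Z)   versus   ans(X) :- par(X,Y)
qa: CQ = (("X",), [("par", ("X", "Y")), ("par", ("Y", "Z"))])
qb: CQ = (("X",), [("par", ("X", "Y"))])
print(contained_in(qa, qb))   # True
print(contained_in(qb, qa))   # False
```

The backtracking search is exponential in the worst case, which is expected for CQ containment; for queries with a handful of subgoals it is quick.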
In IM, we really want all CQ's whose subgoals are views and that are contained in the query, since each expression may contribute answers to the query.
   - Exception: if one CQ is contained in another, then we don't need the contained CQ.

Example

Let us consider an integrated information system about employees of a company. Global predicates:

   emp(E)      = E is an employee
   phone(E,P)  = P is E's phone
   office(E,O) = O is E's office
   mgr(E,M)    = M is E's manager
   dept(E,D)   = D is E's department

We suppose three sources, each providing one view:

   v1(E,P,M) :- emp(E) & phone(E,P) & mgr(E,M)
   v2(E,O,D) :- emp(E) & office(E,O) & dept(E,D)
   v3(E,P)   :- emp(E) & phone(E,P) & dept(E,toy)

1. View v1 gives information about employees, their phones and managers.
2. View v2 gives information about the offices and departments of employees.
3. View v3 provides the phones of employees, but only for employees in the Toy Department.

Interpretation of View Definitions

A view definition gives properties that the tuples produced by the view must have.

The view definition is not a guarantee that all such tuples are provided by the view.

There is not even a guarantee that results produced by two views are consistent.
   - E.g., there is no reason to believe the phone information provided by v1 and v3 is consistent.

Example

The constraint "department = Toy" is enforced by the subgoal dept(E,toy) in the definition of v3.

This constraint would be important if we asked a query about employees known not to be in the Toy Department; we would not include v3 in any solution.

Consider the query: "what are Sally's phone and office?" In terms of the global predicates:

   q1(P,O) :- phone(sally,P) & office(sally,O)

There are two minimal solutions to this query.
   - "Minimal" = not contained in any other solution that is also contained in the query.
   a1(P,O) :- v1(sally,P,M) & v2(sally,O,D)
   a2(P,O) :- v3(sally,P) & v2(sally,O,D)

If we expand the views in the rules for the answer, we get:

   a1(P,O) :- emp(sally) & phone(sally,P) & mgr(sally,M) & emp(sally) & office(sally,O) & dept(sally,D)
   a2(P,O) :- emp(sally) & phone(sally,P) & dept(sally,toy) & emp(sally) & office(sally,O) & dept(sally,D)

Note these CQ's are not equivalent to q1; they are the CQ's that come closest to q1 while still being contained in q1 and constructable from the views.

Selecting Solutions to a Query

The search for solutions by IM is based on a theorem that limits the set of CQ's that can possibly be useful.
   - The search is exponential in principle but appears manageable in practice.

The Query-Expansion Process

   Query Q:      answer(...) :- p_i1(...) & ... & p_in(...)
   Solution S:   answer(...) :- v_j1(...) & ... & v_jr(...)
   Expansion E:  answer(...) :- p_j1,1(...) & ... & p_j1,k1(...) & ... & p_jr,1(...) & ... & p_jr,kr(...)

Explanation of Expansion Diagram

A query Q is given; solutions S are proposed, and each solution is expanded to a CQ E = E(S) by replacing the view-subgoals in S by their definitions in terms of the global predicates.
   - As always, when replacing a subgoal by the body of a rule, be sure to use unique variables for the local variables in the rule body.

A solution S is valid for Q if E(S) ⊆ Q.

In principle, there can be an infinite number of valid solutions for a query Q.
   - Just add irrelevant subgoals to S; they may make the solution smaller, but it will still be contained in Q.

Thus, we want only minimal solutions, those not contained in any other solution.

Important Reminder

Minimality is at the level of solutions, not expansions.
   - Since views may provide different subsets of the global predicates, comparing expansions for containment might lead to false conclusions based on the false assumption that two views provided the same data.
Example

Views:
   v1(X,Y) :- par(X,Y)
   v2(X,Y) :- par(X,Y)

Query:
   ans(X,Y) :- par(X,Y)

Solutions:
   ans(X,Y) :- v1(X,Y)
   ans(X,Y) :- v2(X,Y)

The expansions of the solutions are each contained in the query, so they are valid solutions, and should be included.
   - They are in fact equivalent to the query, but that is irrelevant, since the ":-" in the view definitions is a misnomer; the views need not have every par fact.

The solutions themselves (without expansion) are not contained in one another. Thus, neither can eliminate the other in the set of solutions.

Theorem

If S is a solution for query Q, and S has more subgoals than Q, then S is not minimal.

Proof

Look at the containment mapping from Q to E(S).

If S has more subgoals than Q, then there must be some subgoal g of S such that no subgoal of Q is mapped to any subgoal of E(S) that comes from the expansion of g.

If we delete g from S to make a new solution S', then E(S') ⊆ Q.
   - Proof: The containment mapping from Q to E(S) is also a containment mapping from Q to E(S').

Moreover, S ⊆ S'.
   - Proof: The identity mapping on subgoals gives us the containment mapping.
   - Note this test must be carried out without expansion.

Thus, S' is a valid solution that contains S in raw form (without expansion).

Example

Continuing the "employees" example, query q1:

   q1(P,O) :- phone(sally,P) & office(sally,O)

has two subgoals. Answers a1 and a2 each have two subgoals, so they might be minimal (they are!). However, the following answer:

   a3(P,O) :- v1(sally,P,M) & v2(sally,O,D) & v3(E,P)

cannot be minimal, because it has three subgoals, more than q1 does.
   - Note that a3 is a1 with the additional condition that Sally's phone must be the phone of somebody in the Toy Dept.
   - Thus, a3 ⊆ a1 (without expansion), and a3 cannot be minimal.

The expansion of a3 is:

   a3(P,O) :- emp(sally) & phone(sally,P) & mgr(sally,M) & emp(sally) & office(sally,O) & dept(sally,D) & emp(E) & phone(E,P) & dept(E,toy)

   - Thus, E(a3) ⊆ q1, and a3 is valid, although not minimal.

...
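The employees example can be checked mechanically. The sketch below is an illustration only, not IM's actual search procedure: it expands each candidate solution by unfolding view definitions, tests validity with a small containment-mapping search, and applies the subgoal-count bound from the theorem. The data representation and the names is_var, extend, contained_in, expand and VIEWS are assumptions, as is the convention that variables are capitalized. The expand helper also assumes every view head consists of distinct variables, and that the suffixed names it generates for local variables do not already occur in the solution.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]        # (predicate, arguments)
CQ = Tuple[Tuple[str, ...], List[Atom]]   # (head arguments, body subgoals)


def is_var(term: str) -> bool:
    return term[:1].isupper()             # assumed: variables are capitalized


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            return None
    return new


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True if q1 is contained in q2 (containment mapping from q2 to q1)."""
    start = extend({}, q2[0], q1[0])
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(q2[1]):
            return True
        pred, args = q2[1][i]
        for pred1, args1 in q1[1]:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# View definitions from the employees example.
VIEWS: Dict[str, CQ] = {
    "v1": (("E", "P", "M"), [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "v2": (("E", "O", "D"), [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "v3": (("E", "P"), [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}


def expand(solution: CQ) -> CQ:
    """Replace each view subgoal by the view's body, renaming local variables."""
    head, body = solution
    out: List[Atom] = []
    for n, (view, args) in enumerate(body, start=1):
        vhead, vbody = VIEWS[view]
        subst = dict(zip(vhead, args))    # head variable -> argument in the solution
        for pred, vargs in vbody:
            out.append((pred, tuple(
                subst.get(t, f"{t}_{n}") if is_var(t) else t   # unique local names
                for t in vargs)))
    return head, out


q1: CQ = (("P", "O"), [("phone", ("sally", "P")), ("office", ("sally", "O"))])
a1: CQ = (("P", "O"), [("v1", ("sally", "P", "M")), ("v2", ("sally", "O", "D"))])
a2: CQ = (("P", "O"), [("v3", ("sally", "P")), ("v2", ("sally", "O", "D"))])
a3: CQ = (("P", "O"), a1[1] + [("v3", ("E", "P"))])

for name, s in [("a1", a1), ("a2", a2), ("a3", a3)]:
    valid = contained_in(expand(s), q1)          # E(S) contained in Q
    may_be_minimal = len(s[1]) <= len(q1[1])     # bound from the theorem
    print(name, valid, may_be_minimal)
# a1 True True
# a2 True True
# a3 True False

print(contained_in(a3, a1))   # True: a3 is contained in a1 without expansion
```

Note that the final comparison is made on the unexpanded solutions, as the Important Reminder requires; the subgoal-count test only rules candidates out and does not by itself prove that a1 and a2 are minimal.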

This document was uploaded on 01/06/2012.
