Using CQ Theory in Information Integration

Yes; this stuff really does get used in systems. We
shall talk about three somewhat different systems
that use the theory in various ways:
1. Information Manifold, developed by Alon
Levy at AT&T Research Labs (Levy is now at
U. Washington).
2. Infomaster, developed at Stanford by Mike
Genesereth and his group.
3. Tsimmis, developed in the Stanford DB
group.

Two Broad Approaches

1. View-Centric: There is a set of global
predicates. Information sources are described
by what they produce, in terms of the global
predicates.
   - View = query describing what a source
     produces.
   - Global predicates behave like EDB, even
     though they are not stored and don't
     really exist.
   - Queries in terms of the global predicates
     are answered by piecing together views.
2. Query-Centric: A mediator exports global
predicates.
   - Queries about these global predicates are
     translated by the mediator into queries
     at the sources, and the answer is pieced
     together from the source responses.
   - Source predicates play the role of EDB.
   - Predicates exported by the mediator
     are defined by "views" of the source
     predicates.

Building Queries From Views

Information Manifold (IM) is built on the principle
that there is a global set of predicates, and
information sources are described in terms of what
they can say about those predicates.
We describe each information source by a set
of views that it can provide.
   - Views are expressed as CQ's whose
     subgoals use the global predicates.
Queries are also CQ's about the global
predicates.

Fundamental Question: Given a query and a collection of views, how do
we find an expression, using the views only, that is
equivalent to the query?
Remember: equivalence = containment in
both directions.
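
A minimal Python sketch of the containment test behind this reminder. It assumes a CQ is stored as a tuple of head arguments plus a list of (predicate, arguments) subgoals, and that variables are the terms starting with an uppercase letter. The names `containment_mapping`, `contained_in` and `equivalent` are illustrative and not taken from IM; the search is plain backtracking for a containment mapping.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]        # (predicate, arguments)
CQ = Tuple[Tuple[str, ...], List[Atom]]   # (head arguments, body subgoals)


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def unify(src, dst, mapping) -> Optional[Dict[str, str]]:
    """Extend mapping so that the terms of src map onto dst, or return None."""
    if len(src) != len(dst):
        return None
    mapping = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if mapping.setdefault(s, d) != d:   # variable already mapped elsewhere
                return None
        elif s != d:                            # constants must map to themselves
            return None
    return mapping


def containment_mapping(q_from: CQ, q_to: CQ) -> Optional[Dict[str, str]]:
    """Find a mapping that sends head to head and every subgoal of q_from
    to some subgoal of q_to."""
    head_from, body_from = q_from
    head_to, body_to = q_to

    def search(i, mapping):
        if i == len(body_from):
            return mapping
        pred, args = body_from[i]
        for pred2, args2 in body_to:
            if pred2 == pred:
                extended = unify(args, args2, mapping)
                if extended is not None:
                    found = search(i + 1, extended)
                    if found is not None:
                        return found
        return None

    start = unify(head_from, head_to, {})
    return None if start is None else search(0, start)


def contained_in(q1: CQ, q2: CQ) -> bool:
    """q1 is contained in q2 iff a containment mapping from q2 to q1 exists."""
    return containment_mapping(q2, q1) is not None


def equivalent(q1: CQ, q2: CQ) -> bool:
    """Equivalence = containment in both directions."""
    return contained_in(q1, q2) and contained_in(q2, q1)


# ans(X) :- p(X,Y)
wide: CQ = (("X",), [("p", ("X", "Y"))])
# ans(X) :- p(X,Y) & p(Y,Z)
narrow: CQ = (("X",), [("p", ("X", "Y")), ("p", ("Y", "Z"))])
# ans(X) :- p(X,Y) & p(X,Z)   (second subgoal is redundant)
redundant: CQ = (("X",), [("p", ("X", "Y")), ("p", ("X", "Z"))])
```

With these three hypothetical queries, `narrow` is contained in `wide` but not conversely, while `redundant` and `wide` are equivalent.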
Sometimes equivalence is not possible; we
need to find a query about the views that is
maximally contained in the query.
In IM, we really want all CQ's whose subgoals
are views and that are contained in the query,
since each expression may contribute answers
to the query.
   - Exception: if one CQ is contained in
     another, then we don't need the contained
     CQ.

Example

Let us consider an integrated information system
about employees of a company.
Global predicates:
   emp(E) = E is an employee
   phone(E,P) = P is E's phone
   office(E,O) = O is E's office
   mgr(E,M) = M is E's manager
   dept(E,D) = D is E's department
We suppose three sources, each providing one view:
   v1(E,P,M) :- emp(E) & phone(E,P)
                & mgr(E,M)
   v2(E,O,D) :- emp(E) & office(E,O)
                & dept(E,D)
   v3(E,P) :- emp(E) & phone(E,P)
              & dept(E,toy)
1. View v1 gives information about employees,
their phones and managers.
2. View v2 gives information about the
offices and departments of employees.
3. View v3 provides the phones of employees, but
only for employees in the Toy Department.

Interpretation of View Definitions

A view definition gives properties that the
tuples produced by the view must have.
The view definition is not a guarantee that all
such tuples are provided by the view.
There is not even a guarantee that results
produced by two views are consistent.
   - E.g., there is no reason to believe the
     phone information provided by v1 and
     v3 is consistent.

Example

The constraint "department = Toy" is enforced by
the subgoal dept(E,toy) in the definition of v3.
This constraint would be important if we
asked a query about employees known not
to be in the Toy Department; we would not
include v3 in any solution.
Consider the query: "what are Sally's phone and
office?" In terms of the global predicates:
   q1(P,O) :- phone(sally,P) &
              office(sally,O)
There are two minimal solutions to this query.
   - "Minimal" = not contained in any other
     solution that is also contained in the
     query.
   a1(P,O) :- v1(sally,P,M) & v2(sally,O,D)
   a2(P,O) :- v3(sally,P) & v2(sally,O,D)
If we expand the views in the rules for the answer,
we get:
   a1(P,O) :- emp(sally) & phone(sally,P)
              & mgr(sally,M) & emp(sally)
              & office(sally,O) & dept(sally,D)
   a2(P,O) :- emp(sally) & phone(sally,P)
              & dept(sally,toy) & emp(sally)
              & office(sally,O) & dept(sally,D)
Note these CQ's are not equivalent to q1; they
are the CQ's that come closest to q1 while still
being contained in q1 and constructible from
the views.

Selecting Solutions to a Query

The search for solutions by IM is based on a
theorem that limits the set of CQ's that can
possibly be useful.
   - The search is exponential in principle but
     appears manageable in practice.

The Query-Expansion Process

   Query Q:      answer :- p_1 & ... & p_n
   Solution S:   answer :- v_1 & ... & v_r
   Expansion E:  answer :- p_{1,1} & ... & p_{1,k_1} & ... & p_{r,1} & ... & p_{r,k_r}

   (each view subgoal v_j expands to p_{j,1} & ... & p_{j,k_j})

Explanation of Expansion Diagram

A query Q is given; solutions S are proposed,
and each solution is expanded to a CQ E =
E(S) by replacing the view subgoals in S
by their definitions in terms of the global
predicates.
   - As always, when replacing a subgoal by
     the body of a rule, be sure to use unique
     variables for the local variables in the rule
     body.
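
The expansion step can be sketched in Python as follows. This is only an illustration under stated assumptions: atoms are (name, arguments) pairs, variables start with an uppercase letter, and the function name `expand` is made up for this sketch. The three employee views have no local variables, so a hypothetical fourth view `v4(E) :- emp(E) & mgr(E,M)` is added purely to show the renaming of local variables; it is not part of the notes.

```python
from itertools import count
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate or view name, arguments)
View = Tuple[Tuple[str, ...], List[Atom]]   # (head variables, body subgoals)

VIEWS: Dict[str, View] = {
    "v1": (("E", "P", "M"),
           [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "v2": (("E", "O", "D"),
           [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "v3": (("E", "P"),
           [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
    # Hypothetical view (not in the notes): M is local to the body.
    "v4": (("E",),
           [("emp", ("E",)), ("mgr", ("E", "M"))]),
}


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def expand(solution_body: List[Atom], views: Dict[str, View] = VIEWS) -> List[Atom]:
    """Replace each view subgoal by its definition over the global predicates."""
    fresh = count(1)
    expansion: List[Atom] = []
    for name, args in solution_body:
        head_vars, body = views[name]
        subst = dict(zip(head_vars, args))      # head variable -> actual argument
        for pred, terms in body:
            for t in terms:
                if is_var(t) and t not in subst:
                    # Local variable: give it a unique name for this use of the view.
                    # Assumes the solution does not already use names like "M_1".
                    subst[t] = f"{t}_{next(fresh)}"
            expansion.append((pred, tuple(subst.get(t, t) for t in terms)))
    return expansion


# a1(P,O) :- v1(sally,P,M) & v2(sally,O,D)
a1_body = [("v1", ("sally", "P", "M")), ("v2", ("sally", "O", "D"))]
a1_expansion = expand(a1_body)

# Two uses of the hypothetical v4 get distinct local variables.
twice = expand([("v4", ("sally",)), ("v4", ("joe",))])
```

Here `a1_expansion` has the six subgoals of the expanded a1 from the employee example, and `twice` shows the two uses of `v4` receiving the distinct local variables `M_1` and `M_2`.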
A solution S is valid for Q if E(S) ⊆ Q.
In principle, there can be an infinite number
of valid solutions for a query Q.
   - Just add irrelevant subgoals to S; they
     may make the solution smaller, but it will
     still be contained in Q.
Thus, we want only minimal solutions, those
not contained in any other solution.

Important Reminder

Minimality is at the level of solutions, not
expansions.
Since views may provide different subsets of
the global predicates, comparing expansions
for containment might lead to false conclusions
based on the false assumption that two
views provided the same data.

Example

Views:
   v1(X,Y) :- par(X,Y)
   v2(X,Y) :- par(X,Y)
Query:
   ans(X,Y) :- par(X,Y)
Solutions:
   ans(X,Y) :- v1(X,Y)
   ans(X,Y) :- v2(X,Y)
The expansions of the solutions are each
contained in the query, so they are valid
solutions, and should be included.
   - They are in fact equivalent to the query,
     but that is irrelevant, since the ":-" in
     the view definitions is a misnomer; the
     views need not have every par fact.
The solutions themselves (without expansion)
are not contained in one another. Thus,
neither can eliminate the other in the set of
solutions.

Theorem

If S is a solution for query Q, and S has more
subgoals than Q, then S is not minimal.

Proof

Look at the containment mapping from Q to E(S).
If S has more subgoals than Q, then there
must be some subgoal g of S such that no
subgoal of Q is mapped to any subgoal of
E(S) that comes from the expansion of g.
If we delete g from S to make a new solution
S', then E(S') ⊆ Q.
   - Proof: The containment mapping from Q
     to E(S) is also a containment mapping
     from Q to E(S').
Moreover, S ⊆ S'.
   - Proof: The identity mapping on subgoals
     gives us the containment mapping.
   - Note this test must be carried out
     without expansion.
Thus, S' is a valid solution that contains S in
raw form (without expansion).
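
One consequence of the theorem is that a search for minimal solutions never needs candidates with more view subgoals than the query has subgoals. The Python sketch below illustrates only that bound: it enumerates which views a candidate may use, not the arguments given to them, and it performs no validity test. The function name `candidate_view_multisets` is hypothetical, and this is not claimed to be how IM organizes its search.

```python
from itertools import combinations_with_replacement
from typing import Iterator, List, Tuple


def candidate_view_multisets(view_names: List[str],
                             query_subgoals: int) -> Iterator[Tuple[str, ...]]:
    """Yield every multiset of views with at most `query_subgoals` members.

    Larger candidates are skipped because, by the theorem, a solution with
    more subgoals than the query cannot be minimal.
    """
    for size in range(1, query_subgoals + 1):
        yield from combinations_with_replacement(sorted(view_names), size)


# Three views and a query with two subgoals.
candidates = list(candidate_view_multisets(["v1", "v2", "v3"], 2))
```

For three views and a two-subgoal query this yields 3 single-view candidates and 6 two-view candidates; a full search would still have to choose arguments for each view subgoal and test validity.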

Example

Continuing the "employees" example, query q1:
   q1(P,O) :- phone(sally,P) &
              office(sally,O)
has two subgoals. Answers a1 and a2 each have
two subgoals, so they might be minimal (they
are!).
However, the following answer:
   a3(P,O) :- v1(sally,P,M)
              & v2(sally,O,D) & v3(E,P)
cannot be minimal, because it has three
subgoals, more than q1 does.
   - Note that a3 is a1 with the additional
     condition that Sally's phone must be the
     phone of somebody in the Toy Dept.
   - Thus, a3 ⊆ a1 (without expansion), and a3
     cannot be minimal.
The expansion of a3 is:
   a3(P,O) :- emp(sally) & phone(sally,P)
              & mgr(sally,M) & emp(sally)
              & office(sally,O) & dept(sally,D)
              & emp(E) & phone(E,P)
              & dept(E,toy)
   - Thus, E(a3) ⊆ q1, and a3 is valid,
     although not minimal.
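
The construction used in the proof (drop every view subgoal whose expansion is not hit by the containment mapping) can be tried on a3 with the Python sketch below. It assumes the same tuple encoding of atoms as in the rules above, treats terms starting with an uppercase letter as variables, and skips the renaming of local variables because the three employee views have none. The names `expand`, `hit_views` and `prune` are made up for this sketch. The result contains the original solution and is valid, but since only one containment mapping is examined it is not guaranteed to be minimal in general.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]

VIEWS = {
    "v1": (("E", "P", "M"),
           [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))]),
    "v2": (("E", "O", "D"),
           [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))]),
    "v3": (("E", "P"),
           [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))]),
}


def is_var(term: str) -> bool:
    return term[:1].isupper()


def expand(body: List[Atom]) -> List[Tuple[Atom, int]]:
    """Expand view subgoals, tagging each atom with the index of its view subgoal.
    No fresh variables are needed: these views have no local variables."""
    out = []
    for i, (name, args) in enumerate(body):
        head_vars, view_body = VIEWS[name]
        subst = dict(zip(head_vars, args))
        for pred, terms in view_body:
            out.append(((pred, tuple(subst.get(t, t) for t in terms)), i))
    return out


def unify(src, dst, mapping) -> Optional[Dict[str, str]]:
    mapping = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if mapping.setdefault(s, d) != d:
                return None
        elif s != d:
            return None
    return mapping if len(src) == len(dst) else None


def hit_views(query, head, expansion) -> Optional[FrozenSet[int]]:
    """Find a containment mapping from the query to the expansion and return
    the indices of the view subgoals whose expansions it uses (None if invalid)."""
    q_head, q_body = query

    def search(i, mapping, used):
        if i == len(q_body):
            return used
        pred, args = q_body[i]
        for (pred2, args2), origin in expansion:
            if pred2 == pred:
                ext = unify(args, args2, mapping)
                if ext is not None:
                    found = search(i + 1, ext, used | {origin})
                    if found is not None:
                        return found
        return None

    start = unify(q_head, head, {})
    return None if start is None else search(0, start, frozenset())


def prune(query, head, body) -> Optional[List[Atom]]:
    """Delete the view subgoals that the containment mapping never touches."""
    hit = hit_views(query, head, expand(body))
    if hit is None:
        return None                      # the proposed solution is not valid
    return [g for i, g in enumerate(body) if i in hit]


q1 = (("P", "O"), [("phone", ("sally", "P")), ("office", ("sally", "O"))])
a3_body = [("v1", ("sally", "P", "M")), ("v2", ("sally", "O", "D")), ("v3", ("E", "P"))]
pruned = prune(q1, ("P", "O"), a3_body)
```

On this input the mapping uses only the expansions of the v1 and v2 subgoals, so `pruned` is the body of a1.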
This document was uploaded on 01/06/2012.
 Spring '09
