Conjunctive Queries (CQ's) = safe Datalog rules:
H :- G1 & ... & Gn
Most common form of query; equivalent to
select-project-join queries.
Useful for optimization of active elements,
e.g., checking distributed constraints,
maintaining materialized views.
Useful for information integration.

Applying a CQ to a Database

If Q is a CQ, and D is a database of EDB facts,
then Q(D) is the set of heads of Q that we get
when we:
  - Substitute constants for variables in the body
    of Q in all possible ways.
  - Require all subgoals to become true.

Example
p(X,Y) :- q(X,Z) & q(Z,Y)
EDB = {q(1,2), q(2,3), q(3,4)}.
The only substitutions that make both subgoals
true:
1. X → 1; Y → 3; Z → 2.
2. X → 2; Y → 4; Z → 3.
These yield the heads p(1,3) and p(2,4).

Containment
Q1 ⊆ Q2 iff for every database D, Q1(D) ⊆ Q2(D).
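The definition is phrased in terms of Q(D), so it helps to make the evaluation of a CQ concrete. The following is a minimal Python sketch of the substitute-and-check procedure described earlier, not an efficient evaluator. The encoding of atoms as (predicate, argument-tuple) pairs, the convention that uppercase strings are variables, and the names `is_var` and `evaluate` are assumptions made for illustration.

```python
from itertools import product

def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def evaluate(head, body, facts):
    """Return Q(D): the set of ground heads of the CQ over the facts."""
    constants = {c for (_, args) in facts for c in args}
    variables = sorted({t for (_, args) in body for t in args if is_var(t)})
    results = set()
    # Substitute constants for variables in all possible ways.
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))

        def ground(atom):
            pred, args = atom
            return (pred, tuple(sub.get(t, t) for t in args))

        # Keep the head only if every subgoal became a fact of D.
        if all(ground(g) in facts for g in body):
            results.add(ground(head))
    return results

# The example query p(X,Y) :- q(X,Z) & q(Z,Y) over the example EDB.
edb = {("q", (1, 2)), ("q", (2, 3)), ("q", (3, 4))}
head = ("p", ("X", "Y"))
body = [("q", ("X", "Z")), ("q", ("Z", "Y"))]
answers = evaluate(head, body, edb)
print(sorted(answers))
```

Trying every substitution is exponential in the number of variables; the sketch mirrors the definition rather than a practical join strategy.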
Containment problem is NP-complete, but
not a "hard" problem in practical situations
(short queries, few pairs of subgoals with the same
predicate).
Function symbols do not make problems more
difficult.
Adding negated subgoals and/or arithmetic
subgoals, e.g., X < Y, makes things more
complex, but there are important special cases.

Example
A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)
Claim: B ⊆ A.
In proof, suppose p(x,y) is in B(D). Then
there is some w such that r(x,w), b(w,w), and
r(w,y) are in D.
In A, make the substitution X → x, Y → y,
W → w, Z → w.
Thus, the head of A becomes p(x,y), and all
subgoals of A are in D.
Thus, p(x,y) is also in A(D), proving B ⊆ A.

Testing Containment of CQ's

1. Containment mappings.
2. Canonical databases.
Similar for basic CQ case, but (2) is useful for
more general cases like negated subgoals.

Containment Mappings

Mapping from variables of CQ Q2 to variables of
CQ Q1 such that
1. Head of Q2 becomes head of Q1.
2. Each subgoal of Q2 becomes some subgoal of
   Q1.
   - It is not necessary that every subgoal of
     Q1 is the target of some subgoal of Q2.

Example

A, B as above:
A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)
Containment mapping from A to B: X → X,
Y → Y, W → W, Z → W.
No containment mapping from B to A.
Subgoal b(W,W) in B can only go to b(W,Z)
in A. That would require both W → W and
W → Z.

Example
C1: p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
C2: p(X) :- a(X,Y) & a(Y,X)
Containment mapping from C1 to C2: X → X,
Y → Y, Z → X, W → Y.
No containment mapping from C2 to C1.
Proof:
(a) X → X required for head.
(b) Thus, first subgoal of C2 must map to
    first subgoal of C1; Y must map to Y.
(c) Similarly, 2nd subgoal of C2 must map to
    2nd subgoal of C1, so X must map to Z.
(d) But we already found X maps to X.

Containment Mapping Theorem
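The searches done by hand in the two examples above are finite backtracking searches over the possible targets of each subgoal. A minimal Python sketch follows; it is an illustration, not the notes' own algorithm. Atoms are assumed to be (predicate, argument-tuple) pairs, variables are assumed to be uppercase strings, and `extend` and `containment_mapping` are hypothetical names.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def extend(mapping, source, target):
    """Extend mapping so that atom `source` becomes atom `target`, or return None."""
    if source[0] != target[0] or len(source[1]) != len(target[1]):
        return None
    new = dict(mapping)
    for s, t in zip(source[1], target[1]):
        if is_var(s):
            if new.setdefault(s, t) != t:   # variable already sent elsewhere
                return None
        elif s != t:                        # a constant must map to itself
            return None
    return new

def containment_mapping(q_from, q_to):
    """Return a containment mapping from q_from to q_to as a dict, or None."""
    (head_from, body_from), (head_to, body_to) = q_from, q_to
    start = extend({}, head_from, head_to)  # condition 1: head becomes head
    if start is None:
        return None

    def search(i, mapping):
        if i == len(body_from):
            return mapping
        for target in body_to:              # condition 2: subgoal becomes some subgoal
            new = extend(mapping, body_from[i], target)
            if new is not None:
                found = search(i + 1, new)
                if found is not None:
                    return found
        return None

    return search(0, start)

A = (("p", ("X", "Y")), [("r", ("X", "W")), ("b", ("W", "Z")), ("r", ("Z", "Y"))])
B = (("p", ("X", "Y")), [("r", ("X", "W")), ("b", ("W", "W")), ("r", ("W", "Y"))])
C1 = (("p", ("X",)), [("a", ("X", "Y")), ("a", ("Y", "Z")), ("a", ("Z", "W"))])
C2 = (("p", ("X",)), [("a", ("X", "Y")), ("a", ("Y", "X"))])

print(containment_mapping(A, B))    # a mapping from A to B
print(containment_mapping(B, A))    # None
print(containment_mapping(C1, C2))  # a mapping from C1 to C2
print(containment_mapping(C2, C1))  # None
```

The search is exponential in the worst case, which is consistent with the NP-completeness noted earlier; for short queries it finishes immediately.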
Q1 ⊆ Q2 iff there exists a containment mapping
from Q2 to Q1.

Proof (If)
Let μ: Q2 → Q1 be a containment mapping. Let D
be any DB.
Every tuple t in Q1(D) is produced by some
substitution σ on the variables of Q1 that
makes Q1's subgoals all become facts in D.
Claim: σ∘μ (apply μ, then σ) is a substitution
for variables of Q2 that produces t.
(Here Fi is a subgoal of Q2, Gj a subgoal of Q1,
and H1, H2 are the heads of Q1 and Q2.)
1. σ(μ(Fi)) = σ(some Gj). Therefore, it is
   in D.
2. σ(μ(H2)) = σ(H1) = t.
Thus, every t in Q1(D) is also in Q2(D); i.e.,
Q1 ⊆ Q2.

Proof (Only If)

Key idea: frozen CQ.
1. Create a unique constant for each variable of
   the CQ Q.
2. Frozen Q is a database consisting of all the
   subgoals of Q, with the chosen constants
   substituted for variables.

Example
p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
Let x be the constant for X, etc. The relation
for predicate a consists of the three tuples (x,y),
(y,z), and (z,w).
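Freezing is a purely syntactic step, which a few lines of Python can illustrate. This sketch assumes atoms are (predicate, argument-tuple) pairs, variables are uppercase strings, and the lowercase form of each variable name is a constant that occurs nowhere else in the query; `freeze` is a hypothetical helper name.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def freeze(head, body):
    """Return (frozen head, frozen body as a set of facts) for a CQ."""
    def frozen(atom):
        pred, args = atom
        # Assumption: the lowercase name is a fresh constant for the variable.
        return (pred, tuple(t.lower() if is_var(t) else t for t in args))
    return frozen(head), {frozen(g) for g in body}

# p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
frozen_head, frozen_db = freeze(
    ("p", ("X",)),
    [("a", ("X", "Y")), ("a", ("Y", "Z")), ("a", ("Z", "W"))],
)
print(frozen_head)
print(sorted(frozen_db))
```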
Proof (Only If), Continued

Let Q1 ⊆ Q2. Let database D be the frozen Q1.
Q1(D) contains t, the "frozen" head of Q1.
  - Sounds gruesome, but the reason is that
    we can use the substitution in which
    each variable of Q1 is replaced by its
    corresponding constant.
Since Q1 ⊆ Q2, Q2(D) must also contain t.
Let τ be the substitution of constants from
D for the variables of Q2 that makes each
subgoal of Q2 a tuple of D and yields t as the
head.
Let ρ be the substitution that maps constants
of D to their unique, corresponding variable of
Q1.

[Figure: Q2: E :- F1 & ... & Fm (with variables X, Y) is mapped by τ to the frozen head t and to tuples such as (a,b) of D, which ρ maps back to Q1: H :- G1 & ... & Gi(A,B) & ...]

ρ∘τ (apply τ, then ρ) is a containment mapping
from Q2 to Q1 because:
(a) The head of Q2 is mapped by τ to t, and
    t is the frozen head of Q1, so ρ∘τ maps
    the head of Q2 to the "unfrozen" t, that
    is, the head of Q1.
(b) Each subgoal Fi of Q2 is mapped by τ to
    some tuple of D, which is a frozen version
    of some subgoal Gj of Q1. Then ρ∘τ
    maps Fi to the unfrozen tuple, that is, to
    Gj itself.

Dual View of Containment Mappings

A containment mapping, defined as a mapping on
variables, induces a mapping on subgoals.
Therefore, we can alternatively define a
containment mapping as a function on
subgoals, thus inducing a mapping on
variables.
The containment mapping condition becomes:
the subgoal mapping does not cause a variable
to be mapped to two different variables or
constants, nor cause a constant to be mapped
to a variable or a constant other than itself.

Example
Again consider
A: p(X,Y) :- r(X,W) & b(W,Z) & r(Z,Y)
B: p(X,Y) :- r(X,W) & b(W,W) & r(W,Y)
Previously, we found the containment
mapping X → X, Y → Y, W → W, Z → W
from A to B.
We could as well describe this mapping as
r(X,W) → r(X,W), b(W,Z) → b(W,W),
and r(Z,Y) → r(W,Y).

Method of Canonical Databases
Instead of looking for a containment mapping from
Q2 to Q1 in order to test Q1 ⊆ Q2, we can apply
the following test:
1. Create a canonical database D that is the
   frozen body of Q1.
2. Compute Q2(D).
3. If Q2(D) contains the frozen head of Q1, then
   Q1 ⊆ Q2; else not.
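The three steps above can be sketched directly in Python. This is a minimal illustration under stated assumptions: atoms are (predicate, argument-tuple) pairs, variables are uppercase strings, lowercase variable names serve as the frozen constants, and `freeze`, `matches`, and `contained_in` are hypothetical names.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def freeze(atom):
    pred, args = atom
    return (pred, tuple(t.lower() if is_var(t) else t for t in args))

def unify(args, fact_args, sub):
    """Return sub extended so that args equals fact_args, or None on a clash."""
    new = dict(sub)
    for a, v in zip(args, fact_args):
        if is_var(a):
            if new.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return new

def matches(body, facts, sub=None):
    """Yield each substitution that turns every atom of body into a fact."""
    sub = {} if sub is None else sub
    if not body:
        yield sub
        return
    pred, args = body[0]
    for fact_pred, fact_args in facts:
        if fact_pred == pred and len(fact_args) == len(args):
            new = unify(args, fact_args, sub)
            if new is not None:
                yield from matches(body[1:], facts, new)

def contained_in(q1, q2):
    """Test Q1 ⊆ Q2 by the canonical-database method."""
    (head1, body1), (head2, body2) = q1, q2
    db = {freeze(g) for g in body1}       # step 1: frozen body of Q1
    target = freeze(head1)
    pred, args = head2
    for sub in matches(body2, db):        # step 2: compute Q2(D)
        if (pred, tuple(sub.get(t, t) for t in args)) == target:
            return True                   # step 3: frozen head of Q1 found
    return False

C1 = (("p", ("X",)), [("a", ("X", "Y")), ("a", ("Y", "Z")), ("a", ("Z", "W"))])
C2 = (("p", ("X",)), [("a", ("X", "Y")), ("a", ("Y", "X"))])
print(contained_in(C2, C1), contained_in(C1, C2))
```

Each substitution that produces the frozen head corresponds to a containment mapping from Q2 to Q1, so this test and the mapping search give the same answers on basic CQ's.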
The proof that this method works is
essentially the same as the argument for
containment mappings:
  - The only way the frozen head of Q1
    can be in Q2(D) is for there to be a
    containment mapping Q2 → Q1.

Example
C1: p(X) :- a(X,Y) & a(Y,Z) & a(Z,W)
C2: p(X) :- a(X,Y) & a(Y,X)
Here is the test for C2 ⊆ C1:
Choose constants X → 0, Y → 1.
Canonical DB from C2 is
D = {a(0,1), a(1,0)}
C1(D) = {p(0), p(1)}.
Since the frozen head of C2 is p(0), which is in
C1(D), we conclude C2 ⊆ C1.
  - Note that the instantiation of C1 that
    shows p(0) is in C1(D) is X → 0, Y → 1,
    Z → 0, and W → 1.
  - If we replace 0 and 1 by the variables
    X and Y they stand for, we have the
    containment mapping from C1 to C2.

Saraiya's Containment Test
Containment of CQ's is NP-complete in
general.
Saraiya's algorithm is a polynomial-time test
of Q1 ⊆ Q2 for the common case that no
predicate appears more than twice among the
subgoals of Q1.
  - They can appear any number of times in
    Q2.
The algorithm is a reduction to 2SAT and
yields a linear-time algorithm.
Our algorithm is more direct, but quadratic.

The Algorithm

Pick a subgoal of Q2, and consider the
consequences of mapping it to the two possible
subgoals of Q1.
Follow all consequences of this choice:
subgoals that must map to subgoals, and
variables that must map to variables.
  - If we know p(X1,...,Xn) must map to
    p(Y1,...,Yn), then infer that each Xi
    must map to Yi.
  - If p(X1,...,Xn) is a subgoal of Q2, and
    we know Xi maps to some variable Z,
    and exactly one of the p-subgoals of
    Q1 has Z in the ith component, then
    conclude p(X1,...,Xn) maps to this
    subgoal.
One of two things must happen:
1. We derive a contradiction: a subgoal or
   variable that must map to two different
   things.
   - If so, try the other choice if there is one;
     fail if there is no other choice.
2. We close the set of inferences we must make.
   - Then we can forever forget about the
     question of how to map the determined
     subgoals and variables.
   - We have found one mapping that works
     and that can't interfere with the mapping
     of any other subgoals or variables, so we
     make another arbitrary choice if there are
     any unmapped subgoals.

Example

Let us test C1 ⊆ C2, where:
C1: p(B) :- a(A,B) & a(B,A) & b(A,C) & b(C,B)
C2: p(X) :- a(X,Y) & b(Y,Z) & b(Z,W) & a(W,X)
Note this simple example omits some options:
C1 could have a predicate appearing only once
in the body, and C2 could have 3 or more
occurrences of some predicates.
Here is a description of inferences that might
be made:
(1)  Suppose a(X,Y) → a(A,B)
(2)    Then X → A, Y → B
(3)    Now, b(Y,Z) → b(B,?)
(4)    Since there is no b(B,?), fail
(5)  Thus, we must map a(X,Y) → a(B,A)
(6)    Then X → B and Y → A,
(7)    b(Y,Z) → b(A,C), Z → C,
(8)    b(Z,W) → b(C,B), W → B
(9)    Now, a(W,X) must map to a(B,B)
(10)   Since a(B,B) does not exist, fail
Note, however, that if the last subgoal of C1
were b(C,A), we would have W → A at
line (8) and a(W,X) → a(A,B) at line (9).
  - That completes the containment mapping
    successfully, with X → B, Y → A, Z → C,
    and W → A.

Generalization to Unions of CQ's
P1 ∪ P2 ∪ ... ∪ Pk ⊆ Q1 ∪ Q2 ∪ ... ∪ Qn iff for all
Pi there exists some Qj such that Pi ⊆ Qj.
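For CQ's without arithmetic or negated subgoals, this statement turns union containment into a double loop over pairwise CQ tests. The Python sketch below illustrates that reduction. It assumes atoms are (predicate, argument-tuple) pairs and variables are uppercase strings; the names `extendable`, `cq_contained`, and `union_contained`, and the query R, are made up for the illustration.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def extendable(pairs, mapping):
    """pairs is a list of (atom, candidate targets); True if all can be placed."""
    if not pairs:
        return True
    ((pred, args), candidates), rest = pairs[0], pairs[1:]
    for c_pred, c_args in candidates:
        if c_pred != pred or len(c_args) != len(args):
            continue
        new = dict(mapping)
        consistent = all(
            (new.setdefault(a, b) == b) if is_var(a) else (a == b)
            for a, b in zip(args, c_args)
        )
        if consistent and extendable(rest, new):
            return True
    return False

def cq_contained(p, q):
    """P ⊆ Q iff there is a containment mapping from Q to P."""
    (p_head, p_body), (q_head, q_body) = p, q
    # The head of Q may only go to the head of P; subgoals may go to any subgoal.
    pairs = [(q_head, [p_head])] + [(g, p_body) for g in q_body]
    return extendable(pairs, {})

def union_contained(ps, qs):
    """P1 ∪ ... ∪ Pk ⊆ Q1 ∪ ... ∪ Qn, for CQ's without arithmetic or negation."""
    return all(any(cq_contained(p, q) for q in qs) for p in ps)

A = (("p", ("X", "Y")), [("r", ("X", "W")), ("b", ("W", "Z")), ("r", ("Z", "Y"))])
B = (("p", ("X", "Y")), [("r", ("X", "W")), ("b", ("W", "W")), ("r", ("W", "Y"))])
R = (("p", ("X", "Y")), [("r", ("X", "Y"))])  # hypothetical extra query

print(union_contained([B], [R, A]))  # B ⊆ A, so the union contains B
print(union_contained([A], [B, R]))  # A is in neither B nor R
```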
Proof (If)

Obvious.

Proof (Only If)

Assume the containment holds.
Let D be the canonical (frozen) database from
CQ Pi.
Since the containment holds, and Pi(D) surely
includes the frozen head of Pi, there must be
some Qj such that Qj(D) includes the frozen
head of Pi.
Thus, Pi ⊆ Qj.

Union Theorem Just Misses Being False

Consider generalized CQ's allowing arithmetic-comparison subgoals.
P1: p(X) :- e(X) & 10 ≤ X & X ≤ 20
Q1: p(X) :- e(X) & 10 ≤ X & X ≤ 15
Q2: p(X) :- e(X) & 15 ≤ X & X ≤ 20
P1 ⊆ Q1 ∪ Q2, but P1 ⊆ Q1 and P1 ⊆ Q2 are
both false.

CQ Contained in Recursive Datalog

Test relies on method of canonical DB's;
the containment mapping approach doesn't work (it's
meaningless).
Make DB D from frozen body of CQ.
Apply program to D. If frozen head of CQ
appears in result, then yes (contained), else
no.

Example

CQ Q1 is:
Q1: path(X,Y) :- arc(X,Z) & arc(Z,W) & arc(W,Y)
Q2 is the value of path in the following
recursive Datalog program:
r1: path(X,Y) :- arc(X,Y)
r2: path(X,Y) :- path(X,Z) & path(Z,Y)
Intuitively, Q1 = paths of length 3; Q2 =
paths of length 1 or more.
Freeze Q1, say with 0, 1, 2, 3 as constants for
X, Z, W, Y, respectively.
D = {arc(0,1), arc(1,2), arc(2,3)}
Frozen head is path(0,3).
Easy to infer that path(0,3) is in Q2(D):
use r1 three times to infer path(0,1),
path(1,2), path(2,3), then use r2 to infer
path(0,2), path(0,3).

Harder Cases
Datalog program ⊆ CQ: doubly exponential
complexity. Reference: Chaudhuri, S. and
M. Y. Vardi (1992). "On the equivalence of
datalog programs," Proc. Eleventh ACM
Symposium on Principles of Database
Systems, pp. 55-66.
Datalog program ⊆ Datalog program:
undecidable.
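In contrast to these harder cases, the earlier test for a CQ contained in a recursive Datalog program is easy to run mechanically. The sketch below uses naive bottom-up evaluation (apply every rule until nothing new appears) on the frozen body from the path example. Naive evaluation is a choice made here for brevity, not something the notes prescribe; the tuple encoding of atoms and the names `derive` and `fixpoint` are likewise assumptions.

```python
def is_var(term):
    # Assumption: variables are strings that start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def derive(head, body, facts, sub=None):
    """Yield each ground head obtained by matching the rule body against facts."""
    sub = {} if sub is None else sub
    if not body:
        pred, args = head
        yield (pred, tuple(sub.get(t, t) for t in args))
        return
    (pred, args), rest = body[0], body[1:]
    for f_pred, f_args in facts:
        if f_pred != pred or len(f_args) != len(args):
            continue
        new = dict(sub)
        if all((new.setdefault(a, v) == v) if is_var(a) else (a == v)
               for a, v in zip(args, f_args)):
            yield from derive(head, rest, facts, new)

def fixpoint(rules, edb):
    """Naive evaluation: repeat all rules until no new fact is derived."""
    facts = set(edb)
    while True:
        new = {f for head, body in rules for f in derive(head, body, facts)}
        if new <= facts:
            return facts
        facts |= new

program = [
    (("path", ("X", "Y")), [("arc", ("X", "Y"))]),                       # r1
    (("path", ("X", "Y")), [("path", ("X", "Z")), ("path", ("Z", "Y"))]),  # r2
]
# Frozen body of Q1 with constants 0, 1, 2, 3 for X, Z, W, Y.
D = {("arc", (0, 1)), ("arc", (1, 2)), ("arc", (2, 3))}
result = fixpoint(program, D)
print(("path", (0, 3)) in result)  # frozen head of Q1
```

The loop terminates because a Datalog program without function symbols can only derive finitely many facts over the constants of D.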
This document was uploaded on 01/06/2012.
 Spring '09
