Universal Relations
The idea keeps getting reinvented about 3
times a year somewhere in the world.
So let's see it once and for all, and then if
you ever need it you can just implement
it rather than reinventing it.
Example of Problem that the UR Solves
Join Sizes
Sometimes, the size of a join result can be
exponential in the size of the input relations, even if
the join is acyclic.
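The blow-up can be seen in a minimal sketch. The instance used here is an assumption chosen for illustration and is not necessarily the one in the example on the slide: each binary relation holds the eight same-parity pairs over {1, 2, 3, 4}. The names parity_relation, chain_join, and join_size are hypothetical.

```python
from itertools import product


def parity_relation():
    """Assumed instance: the 8 pairs (a, b) over {1,2,3,4} with equal parity."""
    domain = [1, 2, 3, 4]
    return {(a, b) for a, b in product(domain, repeat=2) if a % 2 == b % 2}


def chain_join(relations):
    """Natural join of binary relations R1(A1,A2), R2(A2,A3), ... along a chain."""
    result = set(relations[0])
    for rel in relations[1:]:
        # Extend each partial tuple by every pair whose first
        # component matches the tuple's last component.
        result = {t + (b,) for t in result for (a, b) in rel if t[-1] == a}
    return result


def join_size(n):
    """Size of the chain join over attributes A1..An (n - 1 relations)."""
    relations = [parity_relation() for _ in range(n - 1)]
    return len(chain_join(relations))
```

Under this assumed instance the input holds 8(n-1) tuples in total, while the join holds 2^(n+1) tuples; for n = 5 that is 32 input tuples against 64 result tuples.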
Example 1
Consider $A_1A_2 \bowtie A_2A_3 \bowtie \cdots \bowtie A_{n-1}A_n$.
Let each $A_i$ have domain $\{1, 2, 3, 4\}$.
Let each relation consist of the eigh
Hypergraphs
Hypergraph = nodes plus hyperedges that are
sets of any number of nodes.
Applications include optimizing queries
that are joins and representing "universal
relations" (a useful data-modeling concept).
Typically, nodes represent attributes and
hyp
Multivalued Dependencies
In relation R, we say the MVD $X \twoheadrightarrow Y$ holds if,
whenever there are tuples s and t in R such that
$\pi_X(s) = \pi_X(t)$, then there is a tuple r in R such
that:
1. $\pi_{XY}(r) = \pi_{XY}(s)$.
2. $\pi_{(R-Y)X}(r) = \pi_{(R-Y)X}(t)$.
I.e., r agrees with s on the attributes
mentioned
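The MVD definition can be tested directly on a small instance. The following is a brute-force sketch, assuming a relation is stored as a set of tuples with a separate list of attribute names; the function name mvd_holds and this representation are hypothetical choices for illustration.

```python
def mvd_holds(relation, schema, X, Y):
    """Check whether the MVD X ->> Y holds in `relation`.

    relation: set of tuples; schema: list of attribute names, one per column;
    X, Y: iterables of attribute names.
    """
    index = {attr: i for i, attr in enumerate(schema)}
    X, Y = set(X), set(Y)
    x_attrs = sorted(X, key=index.get)
    xy_attrs = sorted(X | Y, key=index.get)
    # (R - Y) union X: everything outside Y, plus X itself.
    rest_attrs = sorted((set(schema) - Y) | X, key=index.get)

    def project(t, attrs):
        return tuple(t[index[a]] for a in attrs)

    for s in relation:
        for t in relation:
            if project(s, x_attrs) != project(t, x_attrs):
                continue
            # Need a tuple r that agrees with s on XY and with t elsewhere.
            if not any(
                project(r, xy_attrs) == project(s, xy_attrs)
                and project(r, rest_attrs) == project(t, rest_attrs)
                for r in relation
            ):
                return False
    return True
```

For instance, with schema ["name", "phone", "course"], the two tuples ("a", 1, "x") and ("a", 2, "y") alone violate name ->> phone, because the "swapped" tuple ("a", 1, "y") is missing; adding ("a", 1, "y") and ("a", 2, "x") makes the MVD hold.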
Background: Functional Dependencies
We are always talking about a relation R,
with a fixed schema (set of attributes) and a
varying instance (set of tuples).
Conventions: A, B, ... are attributes; ..., Y, Z
are sets of attributes. Concatenation means
union.
More Clustering
CURE Algorithm
Non-Euclidean Approaches
The CURE Algorithm
Problem with BFR/k-means:
Assumes clusters are normally distributed
in each dimension.
And axes are fixed - ellipses at an angle
are not OK.
CURE:
Assumes a Euclidean distance.
Conjunctive Queries
= safe, Datalog rules:
H :- G1 & ... & Gn
Most common form of query; equivalent to
select-project-join queries.
Useful for optimization of active elements,
e.g., checking distributed constraints,
maintaining materialized views.
Useful for
CQ's With Negation
General form of conjunctive query with negation
(CQN):
H :- G1 & ... & Gn &
NOT F1 & ... & NOT Fm
G's are positive subgoals; F's are negative
subgoals.
Apply CQN Q to DB D by considering all
possible substitutions of constants for the
variab
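Substitution-based evaluation of a CQN can be sketched as follows. This is a naive illustration under stated assumptions, not an efficient evaluator: variables are strings starting with an uppercase letter, the database is a dict from predicate name to a set of constant tuples, and variables range only over constants that occur in the database. The names is_var and eval_cqn are hypothetical.

```python
from itertools import product


def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def eval_cqn(head, positives, negatives, db):
    """Apply a CQN to a database by trying every substitution.

    head: (predicate, args); positives, negatives: lists of (predicate, args);
    db: dict mapping predicate name -> set of tuples of constants.
    """
    constants = sorted(
        {c for tuples in db.values() for t in tuples for c in t}, key=repr
    )
    atoms = [head] + positives + negatives
    variables = sorted({a for _, args in atoms for a in args if is_var(a)})

    answers = set()
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))

        def ground(args):
            return tuple(sub[a] if is_var(a) else a for a in args)

        # Every positive subgoal must become a fact of the database ...
        pos_ok = all(ground(args) in db.get(p, set()) for p, args in positives)
        # ... and no negative subgoal may become one.
        neg_ok = all(ground(args) not in db.get(p, set()) for p, args in negatives)
        if pos_ok and neg_ok:
            answers.add(ground(head[1]))
    return answers
```

With db = {"parent": {("ann", "bob"), ("bob", "cal")}, "rich": {("bob",)}}, the query ans(X) :- parent(X,Y) & NOT rich(X) yields {("ann",)}.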
Five Groups of Rules for Magic Construction
Let r be a typical rule
H :- G1 & G2 & ... & Gn
Group I
Supplementary magic for next subgoal. If Gi
has IDB predicate p:
m_p(bound args of Gi) :- sup_{i-1}(variables)
Group II
Magic 0th supplementary. If head has predicate
q:
Rule Goal Trees
Nodes correspond to rules and to subgoals of rules.
Rule node: children = subgoals of the rule.
Goal node: children = rules whose heads unify
with the goal. Unifying substitution must be
made in the rule.
Be careful that local variables
Magic Sets
Optimization technique for recursive Datalog.
Also a win on some nonrecursive SQL
(Mumick, Finkelstein, Pirahesh, and
Ramakrishnan, 1990 SIGMOD, pp. 247-258).
Combines benefits of both top-down
(backward chaining, recursive tree search)
and bottom-u
Getting All You Can Out of Views
The situation is that we are given a collection
of views and a query (possibly recursive).
We want to find all the answers to the
query that we can using the views.
This technology, due to Oliver Duschka, comes
from Infomast
The Bucket Algorithm
We can "answer queries using views" by trying
all CQ's with no more view subgoals than the
query has subgoals.
However, a more organized exploration of the
possibilities, called the bucket algorithm, looks
at how views can "cover" each
Using CQ Theory in Information Integration
Yes; this stuff really does get used in systems. We
shall talk about three somewhat different systems
that use the theory in various ways:
1. Information Manifold, developed by Alon
Levy at AT&T Research Labs (Levy is
Relationships Among Semantics
If a program + EDB has a stratified or perfect
(locally stratified) model, then that is the unique
stable model.
A program + EDB can have a unique stable
model even if there is no perfect model.
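Stability of a candidate model can be checked mechanically for small ground programs. The sketch below uses the standard reduct construction (delete rules with a negated atom in the candidate, drop the remaining negations, and compare the least model with the candidate). The rule encoding as (head, positive body, negative body) triples and the function names are hypothetical, and EDB facts are assumed to be written as rules with empty bodies.

```python
from itertools import chain, combinations


def least_model(definite_rules):
    """Least model of negation-free ground rules given as (head, body) pairs."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in definite_rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model


def is_stable(rules, candidate):
    """rules: list of (head, positive_body, negative_body) over ground atoms."""
    # Reduct: drop rules blocked by the candidate, then drop the negations.
    reduct = [
        (head, pos)
        for head, pos, neg in rules
        if not any(n in candidate for n in neg)
    ]
    return least_model(reduct) == set(candidate)


def stable_models(rules):
    """Enumerate all stable models by brute force over subsets of atoms."""
    atoms = sorted(
        {h for h, _, _ in rules} | {a for _, p, n in rules for a in chain(p, n)}
    )
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)
    )
    return [set(s) for s in subsets if is_stable(rules, set(s))]
```

As an illustration separate from the slide's example, the one-rule program a :- NOT b has the single stable model {a}, while the pair a :- NOT b, b :- NOT a has two stable models, {a} and {b}. The enumeration is exponential in the number of atoms, so it is only meant for hand-sized programs.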
Example
p :- NOT q
q :- NOT p
p :-
Local Stratification
Instantiate rules; i.e., substitute all
possible constants for variables, but reject
instantiations that cause some EDB subgoal
to be false.
Ground atom = atom with no variables.
Build dependency graph at the level of ground
atoms by
Review of Logic as a Query Language
Datalog programs are collections of rules, which are
Horn clauses or if-then expressions.
Example
The following rules express what is needed to
"make" a file. It assumes these relations, or EDB
(extensional database) predicat
Using Views to Implement
Datalog Programs
Inverse Rules
Duschka's Algorithm
Inverting Rules
Idea: invert the view definitions to give
the global predicates definitions in terms
of views and function symbols.
Plug the globals' definitions into the body
of
Association Rules
Market Baskets
Frequent Itemsets
Apriori Algorithm
The Market Basket Model
A large set of items, e.g., things sold in
a supermarket.
A large set of baskets, each of which is
a small set of the items, e.g., the things
one customer buys on one day.
Support
Simplest q
CS345 Data Mining
Introductions
What Is It?
Cultures of Data Mining
More Stream Mining
Counting How Many Elements
Computing Moments
Counting Distinct Elements
Problem: a data stream consists of
elements chosen from a set of size n.
Maintain a count of the number of
distinct elements seen so far.
Obvious approach: maintain the set of
elements seen.
A