Universal Relations
The idea keeps getting reinvented about 3
times a year somewhere in the world.
So let's see it once and for all, and then if
you ever need it you can just implement
it rather than reinventing it.
Example of Problem that the UR Solves
Join Sizes
Sometimes, the size of a join result can be
exponential in the size of the input relations, even if
the join is acyclic.
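To make the claim concrete, here is a minimal sketch that counts the tuples of a chain join without materializing it. The instance is an assumption chosen for illustration (every relation holds the eight pairs over {1, 2, 3, 4} whose components have the same parity), and the names `chain_join_size` and `same_parity` are hypothetical.

```python
from itertools import product

def chain_join_size(n, pairs):
    """Count tuples in R1(A1,A2) JOIN R2(A2,A3) JOIN ... JOIN R(n-1)(A(n-1),An),
    assuming every relation holds the same set of (left, right) pairs."""
    # counts[v] = number of partial join tuples whose last attribute equals v
    counts = {}
    for _, right in pairs:
        counts[right] = counts.get(right, 0) + 1
    # Extend the chain by one more relation, n - 2 more times.
    for _ in range(n - 2):
        extended = {}
        for left, right in pairs:
            if left in counts:
                extended[right] = extended.get(right, 0) + counts[left]
        counts = extended
    return sum(counts.values())

# Assumed instance: pairs (a, b) over {1, 2, 3, 4} with a and b of equal parity.
domain = [1, 2, 3, 4]
same_parity = [(a, b) for a, b in product(domain, repeat=2) if a % 2 == b % 2]

# Join sizes for chains over n = 2, ..., 6 attributes.
sizes = [chain_join_size(n, same_parity) for n in range(2, 7)]
```

With this instance the input holds 8(n - 1) tuples in total, while the join size doubles each time an attribute is added, so the output grows exponentially in n even though the chain join is acyclic.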
Example 1
Consider $A_1A_2 \bowtie A_2A_3 \bowtie \cdots \bowtie A_{n-1}A_n$.
Let each $A_i$ have domain $\{1, 2, 3, 4\}$.
Let each relation consist of the eigh
Hypergraphs
Hypergraph = nodes plus hyperedges that are
sets of any number of nodes.
Applications include optimizing queries
that are joins and representing "universal
relations" (a useful data-modeling concept).
Typically, nodes represent attributes and
hyp
Multivalued Dependencies
In relation R, we say MVD $X \twoheadrightarrow Y$ holds if
whenever there are tuples s and t in R such that
$\pi_X(s) = \pi_X(t)$, then there is a tuple r in R such
that:
1. $\pi_{XY}(r) = \pi_{XY}(s)$.
2. $\pi_{(R-Y)X}(r) = \pi_{(R-Y)X}(t)$.
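The definition can be checked by brute force on a small instance. The sketch below is one way to do it, assuming tuples are stored as dictionaries from attribute name to value; `mvd_holds`, `project`, and the course/teacher/book data are hypothetical names introduced for illustration.

```python
def project(row, attrs):
    """Project a tuple (a dict) onto an ordered list of attributes."""
    return tuple(row[a] for a in attrs)

def mvd_holds(rows, schema, X, Y):
    """Return True if the MVD X ->> Y holds in the given instance."""
    xs = sorted(set(X))
    xy = sorted(set(X) | set(Y))
    rest = sorted((set(schema) - set(Y)) | set(X))
    # XY together with (R - Y)X covers every attribute, so this pair of
    # projections identifies a tuple of the instance.
    present = {(project(r, xy), project(r, rest)) for r in rows}
    for s in rows:
        for t in rows:
            if project(s, xs) == project(t, xs):
                # Need r agreeing with s on XY and with t on (R - Y)X.
                if (project(s, xy), project(t, rest)) not in present:
                    return False
    return True

# Assumed example data: every teacher of a course uses every book.
schema = ["course", "teacher", "book"]
full = [{"course": "db", "teacher": t, "book": b}
        for t in ("T1", "T2") for b in ("B1", "B2")]
holds_full = mvd_holds(full, schema, ["course"], ["teacher"])
# Dropping one tuple breaks the required combination (T2, B2).
holds_partial = mvd_holds(full[:3], schema, ["course"], ["teacher"])
```

The check is quadratic in the number of tuples, which is fine for examples but not meant as an efficient test.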
I.e., r agrees with s on the attributes
mentioned
Background: Functional Dependencies
We are always talking about a relation R,
with a fixed schema (set of attributes) and a
varying instance (set of tuples).
Conventions: A, B, ... are attributes; ..., Y, Z
are sets of attributes. Concatenation means
union.
CS345 Midterm Examination
Wednesday, May 14, 2003, 9:30-11:30AM
Directions
The exam is open book; any written materials may be used.
Answer all 9 questions on the exam paper itself.
The total number of points is 120, i.e., 1 point per minute.
Do not forget
More Clustering
CURE Algorithm
Non-Euclidean Approaches
The CURE Algorithm
Problem with BFR/k-means:
Assumes clusters are normally distributed
in each dimension.
And axes are fixed - ellipses at an angle
are not OK.
CURE:
Assumes a Euclidean distance.
Conjunctive Queries
= safe, Datalog rules:
H :- G1 & ... & Gn
Most common form of query; equivalent to
select-project-join queries.
Useful for optimization of active elements,
e.g., checking distributed constraints,
maintaining materialized views.
Useful for
CQ's With Negation
General form of conjunctive query with negation
(CQN):
H :- G1 & ... & Gn &
NOT F1 & ... & NOT Fm
G's are positive subgoals; F's are negative
subgoals.
Apply CQN Q to DB D by considering all
possible substitutions of constants for the
variab
Five Groups of Rules for Magic Construction
Let r be a typical rule
H :- G1 & G2 & ... & Gn
Group I
Supplementary magic for next subgoal. If Gi
has IDB predicate p:
m_p(bound args of Gi) :- sup_{i-1}(variables)
Group II
Magic 0th supplementary. If head has predicate
q:
Rule Goal Trees
Nodes correspond to rules and to subgoals of rules.
Rule node: children = subgoals of the rule.
Goal node: children = rules whose heads unify
with the goal. Unifying substitution must be
made in the rule.
Be careful that local variables
Magic Sets
Optimization technique for recursive Datalog.
Also a win on some nonrecursive SQL
(Mumick, Finkelstein, Pirahesh, and
Ramakrishnan, 1990 SIGMOD, pp. 247-258).
Combines benefits of both top-down
(backward chaining, recursive tree search)
and bottom-u
Getting All You Can Out of Views
The situation is that we are given a collection
of views and a query (possibly recursive).
We want to find all the answers to the
query that we can using the views.
This technology, due to Oliver Duschka, comes
from Infomast
The Bucket Algorithm
We can "answer queries using views" by trying
all CQ's with no more view subgoals than the
query has subgoals.
However, a more organized exploration of the
possibilities, called the bucket algorithm, looks
at how views can "cover" each
Using CQ Theory in Information Integration
Yes; this stuff really does get used in systems. We
shall talk about three somewhat different systems
that use the theory in various ways:
1. Information Manifold, developed by Alon
Levy at AT&T Research Labs (Levy is
Relationships Among Semantics
If a program + EDB has a stratified or perfect
(locally stratified) model, then that is the unique
stable model.
A program + EDB can have a unique stable
model even if there is no perfect model.
Example
p :- NOT q
q :- NOT p
p :-
Local Stratification
Instantiate rules; i.e., substitute all
possible constants for variables, but reject
instantiations that cause some EDB subgoal
to be false.
Ground atom = atom with no variables.
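The instantiation step can be sketched as follows. This is an illustration under assumed conventions, not a definitive implementation: a rule is a tuple `(head_pred, head_args, body)`, each body subgoal is `(negated, pred, args)`, variables are strings starting with an uppercase letter, and the EDB is a dictionary from predicate name to a set of tuples. The `win`/`move` rule and all function names are hypothetical.

```python
from itertools import product

def is_var(term):
    """Assumed convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def ground(args, binding):
    """Replace each variable in args by its bound constant."""
    return tuple(binding.get(a, a) for a in args)

def instantiate(rule, constants, edb):
    """Return all ground instances of rule in which every EDB subgoal is true."""
    head_pred, head_args, body = rule
    all_args = list(head_args) + [a for _, _, args in body for a in args]
    variables = sorted({a for a in all_args if is_var(a)})
    instances = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        g_body = [(neg, pred, ground(args, binding)) for neg, pred, args in body]
        # An EDB subgoal is true if its tuple is present (or absent, when negated).
        edb_ok = all((args in edb[pred]) != neg
                     for neg, pred, args in g_body if pred in edb)
        if edb_ok:
            instances.append(((head_pred, ground(head_args, binding)), g_body))
    return instances

# Assumed example: win(X) :- move(X,Y) & NOT win(Y), with EDB relation move.
rule = ("win", ("X",), [(False, "move", ("X", "Y")), (True, "win", ("Y",))])
edb = {"move": {(1, 2), (2, 3)}}
ground_rules = instantiate(rule, [1, 2, 3], edb)
```

Of the nine possible substitutions for X and Y, only the two that make the `move` subgoal true survive; IDB subgoals such as `NOT win(Y)` are kept as ground atoms for later analysis.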
Build dependency graph at the level of ground
atoms by
Review of Logic as a Query Language
Datalog programs are collections of rules, which are
Horn clauses or if-then expressions.
Example
The following rules express what is needed to
"make" a file. It assumes these relations or EDB
(extensional database) predicat
Using Views to Implement
Datalog Programs
Inverse Rules
Duschka's Algorithm
Inverting Rules
Idea: invert the view definitions to give
the global predicates definitions in terms
of views and function symbols.
Plug the globals' definitions into the body
of
Association Rules
Market Baskets
Frequent Itemsets
Apriori Algorithm
The Market Basket Model
- A large set of items, e.g., things sold in
  a supermarket.
- A large set of baskets, each of which is
  a small set of the items, e.g., the things
  one customer buys on one day.
Support
- Simplest q
CS345 Data Mining
Introductions
What Is It?
Cultures of Data Mining
Course Staff
- Instructors:
  Anand Rajaraman
  Jeff Ullman
- TA:
  Robbie Yan
Requirements
- Homework (Gradiance and other) 20%
  Gradiance class code BB8F698B
- Project 40%
- Final Exam 40%
Project
- Soft
More Stream Mining
Counting How Many Elements
Computing Moments
Counting Distinct Elements
- Problem: a data stream consists of
  elements chosen from a set of size n.
  Maintain a count of the number of
  distinct elements seen so far.
- Obvious approach: maintain the set of
  elements seen.
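The obvious approach fits in a few lines. This is a minimal sketch, assuming the stream is any Python iterable of hashable elements; the name `count_distinct` is hypothetical.

```python
def count_distinct(stream):
    """Yield the number of distinct elements seen after each arrival."""
    seen = set()
    for element in stream:
        seen.add(element)  # no effect if the element was already seen
        yield len(seen)

# Running counts for a small example stream.
running_counts = list(count_distinct("abcab"))
```

The set can grow to hold up to n elements, so the memory used is proportional to the number of distinct elements seen so far.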
A