notes99-1 - What is Database Theory? A collection of...


What is Database Theory?

A collection of studies, often connected to the relational model of data.
- Restricted forms of logic, between SQL and full first-order.
- Dependency theory: generalizing functional dependencies.
- Conjunctive queries (CQ's): useful, decidable special case.
- "Universal relations": fitting a database schema into a single virtual relation.

Why Care?

A lot of this work was, quite frankly, done "for the fun of it." However, it turns out to have unexpected applications, as natural ideas often do:
- Information integration:
  - Logic, CQ's, etc., used for expressing how information sources fit together.
  - Recent work using universal-relation too; eliminates requirement that user understand a lot about the integrated schema.
- More powerful query languages.
  - Recursion needed in repositories, other applications.
  - Database logic provided some important ideas used in SQL3 standard: seminaive evaluation, stratified negation.
- Potential application: constraints and triggers are inherently recursive. When do they converge?

Outline of Topics

1. Logic intro, especially logical rules (if-then), dealing with negation.
   - In database logic there is a special semantics frowned upon by mathematicians, but it works.
2. Logic processing: optimizing collections of rules that constitute a query.
   - "Magic-sets" technique for recursive queries.
3. Conjunctive queries: decidability of containment, special cases.
4. Information-integration architectures: rule expansion vs. systems that piece together solutions to queries from logical definitions of sources.
   - Important CQ application.
5. Universal relation data model: answering queries without knowing the schema.
6. Other stuff if I have time for it and/or there is class interest:
   a) Data mining of databases.
   b) Materialized views, warehouses, data cubes.

Course Requirements

1. The usual stuff: midterm, final, problem sets.
2. A project:
   - Each student should attempt to implement an algorithm for one of the problems discussed in the class.
   - Your choice, but you should pick something that is combinatorially hard, i.e., the problem is dealing efficiently with large cases.
   - I'll suggest some problems as we go, and keep a list on the Web page.

Review of Logic as a Query Language

Datalog programs are collections of rules, which are Horn clauses or if-then expressions.

Example

The following rules express what is needed to "make" a file. It assumes these relations or EDB (extensional database) predicates are available:
1. source(F): F is a source file, i.e., stored in the file system.
2. includes(F,G): file F includes file G.
3. create(F,P,G): we create file F by applying process P to file G.

    req(F,F) :- source(F)
    req(F,G) :- includes(F,G)
    req(F,G) :- create(F,P,G)
    req(F,G) :- req(F,H) & req(H,G)

Rules

    Head :- Body

- :- is read "if".
- Atom = predicate applied to arguments.
- Head is atom.
- Body is logical AND of zero or more atoms.
- Atoms of body are called subgoals.
- Head predicate is IDB (intensional database) = predicate defined by rules.
- Body subgoals may have IDB or EDB predicates.
- Datalog program = collection of rules.
- One IDB predicate is distinguished and represents result of program.

Meaning of Rules

The head is true for its arguments whenever there exist values for any local variables (those that appear in the body, but not the head) that make all the subgoals true.

Extensions

1. Negated subgoals. Example:
       cycle(F) :- req(F,F) & NOT source(F)
2. Constants as arguments. Example:
       req(F,"stdio.h") :- type(F,"cCode")
3. Arithmetic subgoals. Example:
       composite(A) :- divides(B,A) & B > 1 & B != A
   - Opposite of an arithmetic atom is a relational atom.

Applying Rules ("Naive Evaluation")

Given an EDB:
1. Start with all IDB relations empty.
2. Instantiate (with constants) variables of all rules in all possible ways. If all subgoals become true, then infer that the head is true.
3. Repeat (2) in "rounds," as long as new IDB facts can be inferred.
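The three naive-evaluation steps above can be sketched in Python for the four req rules. This is a minimal illustration, not the course's implementation: the file names in the EDB are invented, and the recursive rule is evaluated by matching pairs of req facts rather than by literally enumerating every instantiation of the variables, which yields the same inferred facts.

```python
from itertools import product

# Hypothetical EDB facts, invented for illustration only.
source = {("main.c",), ("util.h",)}
includes = {("main.c", "util.h")}
create = {("main.o", "cc", "main.c"), ("prog", "ld", "main.o")}


def apply_rules(req):
    """One round: every req fact derivable from the EDB and the current req."""
    derived = set()
    derived |= {(f, f) for (f,) in source}          # req(F,F) :- source(F)
    derived |= set(includes)                        # req(F,G) :- includes(F,G)
    derived |= {(f, g) for (f, _p, g) in create}    # req(F,G) :- create(F,P,G)
    # req(F,G) :- req(F,H) & req(H,G)
    derived |= {(f, g) for (f, h), (h2, g) in product(req, req) if h == h2}
    return derived


def naive_eval():
    """Return the req relation and the number of rounds performed."""
    req = set()                              # step 1: IDB relation starts empty
    rounds = 0
    while True:
        new_req = req | apply_rules(req)     # step 2: infer heads
        rounds += 1
        if new_req == req:                   # step 3: stop when nothing is new
            return req, rounds
        req = new_req


if __name__ == "__main__":
    facts, rounds = naive_eval()
    for fact in sorted(facts):
        print("req%s" % (fact,))
    print("rounds:", rounds)
```

Every round re-derives all facts found so far, which is the redundancy that a more careful evaluation strategy avoids.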
- Step (2) makes sense and is finite, as long as rules are safe = each variable that appears anywhere in the rule appears in some nonnegated, nonarithmetic subgoal of the body.
- Limit of (1)-(3) = least fixed point of the rules and EDB.

Seminaive Evaluation

- More efficient approach to evaluating rules.
- Based on principle that if at round i a fact is inferred for the first time, then we must have used a rule in which one or more subgoals were instantiated to facts that were inferred on round i - 1.
- Thus, for each IDB predicate p, keep both relation $P$ and relation $\Delta P$; the latter represents the new facts for p inferred on the most recent round.

Outline of SNE Algorithm

1. Initialize IDB relations by using only those rules without IDB subgoals.
2. Initialize the $\Delta$-IDB relations to be equal to the corresponding IDB relations.
3. In one round, for each IDB predicate p:
   a) Compute new $\Delta P$ by applying each rule for p, but with one subgoal treated as a $\Delta$-IDB relation and the others treated as the correct IDB or EDB relation. Do for all possible choices of the $\Delta$-subgoal.
   b) Remove from new $\Delta P$ all facts that are already in $P$.
   c) $P := P \cup \Delta P$.
4. Repeat (3) until no changes to any IDB relation.

Example

    (1) req(F,F) :- source(F)
    (2) req(F,G) :- includes(F,G)
    (3) req(F,G) :- create(F,P,G)
    (4) req(F,G) :- req(F,H) & req(H,G)

Assume EDB relations S, I, C and IDB relation R, with obvious correspondence to predicates.

Initialize: $R := \Delta R := \sigma_{1=2}(S \times S) \cup I \cup \pi_{1,3}(C)$.

Iterate until $\Delta R = \emptyset$:
1. $\Delta R := \pi_{1,3}(R \bowtie \Delta R \cup \Delta R \bowtie R)$
2. $\Delta R := \Delta R - R$
3. $R := R \cup \Delta R$

Models

Model of rules + EDB facts = set of atoms selected to be true such that
1. An EDB fact is selected true iff it is in the given EDB relation.
2. All rules become true under any instantiation of the variables.
   - Facts not stated true in the model are assumed false.
   - Only way to falsify a rule is to make each subgoal true and the head false.

Minimal model = model + no proper subset is a model.
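Returning to the seminaive outline and its relational-algebra example for req, the loop can be sketched in Python as follows. This is an illustrative sketch under assumptions: the contents of S, I, and C are invented, and `join_13` is a hypothetical helper that joins column 2 of its left argument with column 1 of its right argument and keeps columns 1 and 3, which is one reading of the joins in the example.

```python
# Hypothetical EDB relations S, I, C (invented data for illustration).
S = {("main.c",), ("util.h",)}
I = {("main.c", "util.h")}
C = {("main.o", "cc", "main.c"), ("prog", "ld", "main.o")}


def join_13(left, right):
    """Project columns 1 and 3 of the join of left(F,H) with right(H,G)."""
    by_first = {}
    for h, g in right:
        by_first.setdefault(h, set()).add(g)
    return {(f, g) for f, h in left for g in by_first.get(h, ())}


def seminaive_eval():
    """Return the req relation R and the number of iterations of the loop."""
    # Initialize R and delta from the rules that have no IDB subgoals.
    R = {(f, f) for (f,) in S} | set(I) | {(f, g) for f, _p, g in C}
    delta = set(R)
    rounds = 0
    while delta:                                     # iterate until delta is empty
        # Rule (4), once with each of its two subgoals taken from delta.
        new = join_13(R, delta) | join_13(delta, R)
        delta = new - R                              # keep only new facts
        R |= delta                                   # R := R union delta
        rounds += 1
    return R, rounds


if __name__ == "__main__":
    R, rounds = seminaive_eval()
    print(sorted(R))
    print("rounds:", rounds)
```

Each iteration only joins against the facts discovered in the previous one, so facts already known are not used on both sides of the join again.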
- For a Datalog program with only nonnegated, relational atoms in the bodies, the unique minimal model is what naive or seminaive evaluation produces, i.e., the IDB facts we are forced to deduce.
- Moreover, this LFP is reached after a finite number of rounds, if the EDB is finite.

Function Symbols

Terms built from
1. Constants.
2. Variables.
3. Function symbols applied to terms as arguments.
   - Example: addr(street(maple), number(101))

Example

Binary trees defined by

    isTree(null)
    isTree(node(L,T1,T2)) :- label(L) & isTree(T1) & isTree(T2)

If label(a) and label(b) are true, infers facts like

    isTree(node(a, null, null))
    isTree(node(b, null, node(a, null, null)))

- Application of rules as for Datalog: make all possible instantiations of variables and infer head if all subgoals are true.
- LFP is still unique minimal model, as long as subgoals are relational, nonnegated.
- But LFP may be reached only after an infinite number of rounds.

...
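To see why the isTree rules never finish, the sketch below applies them for a bounded number of rounds and records how many facts exist after each round. It is only an illustration under assumptions: terms are encoded as nested Python tuples (an encoding chosen here, not one from the notes), and the cutoff `max_rounds` is artificial, since the real fixed point is infinite.

```python
# Hypothetical label facts: label(a) and label(b).
labels = {"a", "b"}
NULL = "null"


def one_round(trees):
    """Apply both isTree rules once to the facts known so far."""
    new = {NULL}                                   # isTree(null)
    # isTree(node(L,T1,T2)) :- label(L) & isTree(T1) & isTree(T2)
    new |= {("node", l, t1, t2)
            for l in labels for t1 in trees for t2 in trees}
    return trees | new


def tree_rounds(max_rounds):
    """Return the facts after max_rounds rounds and the fact count per round."""
    trees, sizes = set(), []
    for _ in range(max_rounds):
        trees = one_round(trees)
        sizes.append(len(trees))
    return trees, sizes


if __name__ == "__main__":
    trees, sizes = tree_rounds(4)
    print(sizes)
```

With n facts known, the next round yields 2n^2 + 1 facts for two labels, so the count keeps growing and no round ever produces nothing new.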

This document was uploaded on 03/04/2012.
