Proc. of 1992 ACM-SIGMOD Conference, pages 59–68

Behavior of Database Production Rules: Termination, Confluence, and Observable Determinism

Alexander Aiken, Jennifer Widom, Joseph M. Hellerstein
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120
{aiken, widom}@almaden.ibm.com, joey@postgres.berkeley.edu

Abstract. Static analysis methods are given for determining whether arbitrary sets of database production rules are (1) guaranteed to terminate; (2) guaranteed to produce a unique final database state; (3) guaranteed to produce a unique stream of observable actions. When the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System; they will form the basis of an interactive development environment for Starburst rule programmers.

1 Introduction

Production rules in database systems allow specification of data manipulation operations that are executed automatically whenever certain events occur or conditions are met, e.g. [GJ91, Han89, MD89, SJGP90, WF90]. Database production rules provide a general and powerful mechanism for integrity constraint enforcement, derived data maintenance, triggers and alerters, authorization checking, and versioning, as well as providing a platform for large and efficient knowledge-bases and expert systems. However, it can be very difficult in general to predict how a set of database production rules will behave. Rule processing occurs as a result of arbitrary database changes; certain rules are triggered initially, and their execution can trigger additional rules or trigger the same rules additional times. The unstructured, unpredictable, and often nondeterministic behavior of rule processing can be a nightmare for the database rule programmer.
A significant step in aiding the database rule programmer is to provide information about the following three properties of rule behavior:

Termination: Is rule processing guaranteed to terminate after any set of changes to the database in any state?

Confluence: Can the execution order of non-prioritized rules make any difference in the final database state? That is, if multiple rules are triggered at the same time during rule processing, can the final database state at termination of rule processing depend on which is considered first? If not, the rule set is confluent.

Observable Determinism: If a rule action is visible to the environment (e.g., if it performs data retrieval or a rollback statement), then we say it is observable. Can the execution order of non-prioritized rules make any difference in the order or appearance of observable actions? If not, the rule set is observably deterministic.

These properties can be very difficult or impossible to decide in the general case. We have developed conservative static analysis algorithms that: guarantee that a set of rules will terminate or say that it may not terminate; guarantee that a set of rules is confluent or say that it may not be confluent; guarantee that a set of rules is observably deterministic or say that it may not be observably deterministic. Furthermore, when the answer is "may not" for any of these properties, the analysis algorithms isolate the rules responsible for the problem and determine criteria that, if satisfied, guarantee the property. Hence the analysis can form the basis of an interactive environment where the rule programmer invokes the analyzer to obtain information about rule behavior. If termination, confluence, or observable determinism is desired but not guaranteed, then the user may verify that the necessary criteria are satisfied or may modify the rule set and try again.

* Current address: CS Division, Department of EECS, University of California, Berkeley, CA 94720.
Our analysis methods have been developed and are presented in the context of the Starburst Rule System [WCL91], a fully functional production rules facility integrated into the Starburst extensible relational DBMS prototype at the IBM Almaden Research Center [H+90]. Although some aspects of the analysis are dependent on Starburst rules, we have tried to remain as general as possible and our methods certainly can be adapted to other database rule languages.

1.1 Related Work

Most previous work in static analysis of production rules [HH91, Ras90, ZH90] differs from ours in two ways. First, it considers simplified versions of the OPS5 production rule language [BFKM85]. OPS5 has a quite different model of rule processing than most database production rule systems, including the Starburst Rule System. Second, the goal of previous work is to impose restrictions and/or orderings on OPS5 rule sets such that unique fixed points are guaranteed. Our goal, on the other hand, is to permit arbitrary rule sets and provide useful information about their behavior in the database setting. In Section 9 we make some additional, more technical, comparisons, and explain how our analysis techniques subsume results in [HH91, Ras90, ZH90].

In [KU91], the issue of rule set termination is discussed, along with the issue of conflicting updates: determining when one rule may undo changes made by a previous rule. Although models and a problem-solving architecture for rule analysis are proposed, no algorithms are given. In [AS91], issues of termination and unique fixed points are considered in the context of various extensions to Datalog. In addition to the very different semantics of Datalog (logic) and production rules, [AS91] does not address the issue of determining whether a given rule set exhibits certain properties (as we do), but rather states results about whether all rule sets in a given language are guaranteed to exhibit the properties.
In [CW90] we presented initial methods for analyzing termination in the context of deriving production rules for integrity constraint maintenance; these methods form the basis of our approach to termination in this paper.

1.2 Outline of Paper

As an introduction to database production rule languages and to establish a basis for our analysis techniques, in Section 2 we give the syntax and semantics of Starburst production rules. In Section 3 we introduce initial notation and definitions, and we describe some straightforward preliminary analysis of rule sets. In Section 4 we present a model of rule processing to be used as the formal basis for our analysis algorithms. Termination analysis is covered in Section 5 and confluence in Section 6. In Section 7 we give methods for analyzing partial confluence, which specifies that a rule set is confluent with respect to a portion of the database. Observable determinism is covered in Section 8. Finally, in Section 9 we draw conclusions and discuss future work.

2 The Starburst Rule System

We provide a brief overview of the set-oriented, SQL-based Starburst production rule language. Further details and numerous examples appear in [WCL91, WF90].

Starburst production rules are based on the notion of transitions. A transition is a database state change resulting from execution of a sequence of data manipulation operations. Rules consider only the net effect of transitions, meaning that: (1) if a tuple is updated several times, only the composite update is considered; (2) if a tuple is updated then deleted, only the deletion is considered; (3) if a tuple is inserted then updated, this is considered as inserting the updated tuple; (4) if a tuple is inserted then deleted, this is not considered at all. A formal theory of transitions and their net effects appears in [WF90].
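The four net-effect rules can be illustrated for a single tuple. The Python sketch below is not Starburst's implementation; it assumes each tuple is tracked by a stable identifier, that the operations applied to it within one transition arrive in order as hypothetical `("I", value)`, `("U", new_value)` and `("D", None)` events, and that `old_value=None` means the tuple did not exist before the transition.

```python
from typing import Any, List, Optional, Tuple

Event = Tuple[str, Any]  # ("I", value), ("U", new_value) or ("D", None)


def net_effect(events: List[Event], old_value: Optional[Any] = None):
    """Collapse the operations applied to ONE tuple within a transition.

    Hypothetical helper: old_value is the tuple's value before the transition,
    or None if the tuple did not exist. Returns None (no net change),
    ("inserted", value), ("deleted", old_value) or ("updated", old, new).
    """
    exists = old_value is not None
    inserted = updated = False
    value = old_value
    for op, v in events:
        if op == "I" and old_value is None and not inserted:
            inserted, exists, value = True, True, v
        elif op == "U" and exists:
            updated, value = True, v
        elif op == "D" and exists:
            exists = False
        else:
            raise ValueError(f"operation {op!r} is not valid for this tuple here")
    if inserted:
        # (3) insert then update = insert of the updated tuple; (4) insert then delete = nothing
        return ("inserted", value) if exists else None
    if not exists and old_value is not None:
        return ("deleted", old_value)  # (2) a deleted pre-existing tuple reports only the deletion
    if updated:
        return ("updated", old_value, value)  # (1) several updates = one composite update
    return None
```

Each tuple is collapsed independently; under these assumptions the transition tables inserted, deleted, new-updated and old-updated could then be populated from the non-None results.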
The syntax for defining a rule is:

  create rule name on table
  when transition predicate
  [ if condition ]
  then action
  [ precedes rule-list ]
  [ follows rule-list ]

The transition predicate specifies one or more triggering operations on the rule's table: inserted, deleted, or updated($c_1, \ldots, c_n$), where $c_1, \ldots, c_n$ are column names. The rule is triggered by a given transition if at least one of the specified operations occurred in the net effect of the transition. The optional condition specifies an SQL predicate. The action specifies an arbitrary sequence of SQL data manipulation operations to be executed when the rule is triggered and its condition is true. The optional precedes and follows clauses are used to induce a partial ordering on the set of defined rules. If a rule $r_1$ specifies a rule $r_2$ in its precedes list, or if $r_2$ specifies $r_1$ in its follows list, then $r_1$ is higher than $r_2$ in the ordering. (We also say that $r_1$ has precedence or priority over $r_2$.) When no direct or transitive ordering is specified between two rules, their order is arbitrary.

A rule's condition and action may refer to the current state of the database through top-level or nested SQL select operations. In addition, rule conditions and actions may refer to transition tables, which are logical tables reflecting the changes to the rule's table that have occurred during the triggering transition. At the end of a given transition, transition table inserted in a rule refers to those tuples of the rule's table that were inserted by the transition, transition table deleted refers to those tuples that were deleted, and transition tables new-updated and old-updated refer to the new and old values (respectively) of the updated tuples. A rule may refer only to transition tables corresponding to its triggering operations.

Rules are activated at rule assertion points. There is an assertion point at the end of each transaction, and there may be additional user-specified assertion points within a transaction.
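Because precedes and follows clauses state only direct orderings, an analyzer needs their transitive closure. A minimal Python sketch, assuming rule names are strings and the clauses are supplied as hypothetical `precedes`/`follows` dictionaries; since the clauses are meant to induce a partial ordering, the sketch rejects cyclic declarations.

```python
from typing import Dict, Iterable, Set, Tuple


def priority_order(precedes: Dict[str, Iterable[str]],
                   follows: Dict[str, Iterable[str]]) -> Set[Tuple[str, str]]:
    """Return all pairs (hi, lo) such that rule hi has precedence over rule lo."""
    order: Set[Tuple[str, str]] = set()
    for rule, lower in precedes.items():      # rule precedes each rule in `lower`
        order |= {(rule, lo) for lo in lower}
    for rule, higher in follows.items():      # rule follows each rule in `higher`
        order |= {(hi, rule) for hi in higher}
    changed = True
    while changed:                            # naive transitive closure
        changed = False
        for a, b in list(order):
            for c, d in list(order):
                if b == c and (a, d) not in order:
                    order.add((a, d))
                    changed = True
    if any(a == b for a, b in order):
        raise ValueError("precedes/follows clauses do not form a partial order")
    return order
```

Two rules whose names appear in neither orientation of a returned pair have no direct or transitive ordering, so their relative order is arbitrary.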
We describe the semantics of rule processing at an arbitrary assertion point. The state change resulting from the user-generated database operations executed since the last assertion point (or start of the transaction) creates the first relevant transition, and some set of rules are triggered by this transition. A triggered rule $r$ is chosen from this set for consideration. Rule $r$ must be chosen so that no other triggered rule has precedence over $r$. If $r$ has a condition, then it is checked. If $r$'s condition is false, then another triggered rule is chosen for consideration. Otherwise, if $r$ has no condition or its condition is true, then $r$'s action is executed. After execution of $r$'s action, all rules not yet considered are triggered only if their transition predicates hold with respect to the composite transition created by the initial transition and subsequent execution of $r$'s action. That is, these rules see $r$'s action as if it were executed as part of the initial transition. Rules already considered (including $r$) have already "processed" the initial transition; thus, they are triggered again only if their transition predicate holds with respect to the transition created by $r$'s action. From the new set of triggered rules, a rule $r'$ is chosen for consideration such that no other triggered rule has precedence over $r'$. Rule processing continues in this fashion.

At an arbitrary time in rule processing, a given rule is triggered if its transition predicate holds with respect to the (composite) transition since the last time it was considered. If it has not yet been considered, it is triggered if its transition predicate holds with respect to the transition since the last rule assertion point or start of the transaction. The values of transition tables in rule conditions and actions always reflect the rule's triggering transition. Rule processing terminates when there are no triggered rules.
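The processing loop just described can be sketched in Python under simplifying assumptions: each transition is abstracted to the set of operation kinds it contains (such as `("I", "emp")`), net-effect cancellation (and hence untriggering) is ignored, conditions and actions are hypothetical Python callables over an in-memory `db`, and ties among eligible rules are broken by name. The key point is that a rule's pending transition is cleared when it is considered, after which every rule, including the one just considered, accumulates the operations its action performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Op = Tuple[str, str]  # ("I", "t"), ("D", "t") or ("U", "t.c")


@dataclass(frozen=True)
class Rule:
    name: str
    triggered_by: FrozenSet[Op]
    condition: Callable[[dict], bool]
    action: Callable[[dict], Set[Op]]  # mutates db, returns the operations performed


def process_rules(rules: List[Rule], priorities: Set[Tuple[str, str]],
                  db: dict, initial_ops: Set[Op], max_steps: int = 1000) -> List[str]:
    """Run rule processing at one assertion point; return the names considered, in order."""
    # Operations seen by each rule since it was last considered (or since the assertion point).
    pending: Dict[str, Set[Op]] = {r.name: set(initial_ops) for r in rules}
    trace: List[str] = []
    while True:
        triggered = [r for r in rules if pending[r.name] & r.triggered_by]
        if not triggered:
            return trace  # no triggered rules: rule processing terminates
        if len(trace) >= max_steps:
            raise RuntimeError("rule processing did not terminate within max_steps")
        names = {r.name for r in triggered}
        eligible = [r for r in triggered
                    if not any((other, r.name) in priorities for other in names)]
        rule = min(eligible, key=lambda x: x.name)  # arbitrary but deterministic choice
        trace.append(rule.name)
        pending[rule.name] = set()  # the rule has now processed its triggering transition
        if rule.condition(db):
            performed = rule.action(db)
            for r in rules:  # composite transitions grow for every rule
                pending[r.name] |= performed
```

The `max_steps` bound only guards the sketch against nontermination; it is not part of the rule-processing semantics.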
The analysis techniques we present are based on this language and rule processing semantics, but with modifications they also could apply to other similar languages; see Section 9.

3 Definitions and Preliminary Analysis

Let $R = \{r_1, r_2, \ldots, r_n\}$ denote an arbitrary set of Starburst production rules to be analyzed. Analysis is performed on a fixed set of rules: when the rule set is changed, analysis must be repeated. (Incremental methods are certainly possible; see Section 9.) Let $P$ denote the set of user-defined priority orderings on rules in $R$ (as specified by their precedes and follows clauses), including those implied by transitivity: $P = \{r_i > r_j, r_k > r_l, \ldots\}$, where $r_i > r_j$ denotes that rule $r_i$ has precedence over $r_j$. Let $T = \{t_1, t_2, \ldots, t_m\}$ denote the tables in the database schema, and let $C = \{t_i.c_j, t_k.c_l, \ldots\}$ denote the columns of tables in $T$. Finally, let $O$ denote the set of database modification operations:

$$O = \{\langle I, t\rangle \mid t \in T\} \cup \{\langle D, t\rangle \mid t \in T\} \cup \{\langle U, t.c\rangle \mid t.c \in C\}$$

$\langle I, t\rangle$ denotes insertions into table $t$, $\langle D, t\rangle$ denotes deletions from table $t$, and $\langle U, t.c\rangle$ denotes updates to column $c$ of table $t$.

The following definitions are computed using straightforward preliminary analysis of the rules in $R$:

Triggered-By takes a rule $r$ and produces the set of operations in $O$ that trigger $r$. Triggered-By is trivial to compute based on rule syntax.

Performs takes a rule $r$ and produces the set of operations in $O$ that may be performed by $r$'s action. Performs is trivial to compute based on rule syntax.

Triggers takes a rule $r$ and produces all rules $r'$ that can become triggered as a result of $r$'s action (possibly including $r$ itself):

$$\text{Triggers}(r) = \{r' \in R \mid \text{Performs}(r) \cap \text{Triggered-By}(r') \neq \emptyset\}$$

Reads takes a rule $r$ and produces all columns in $C$ that may be read by $r$ in its condition or action.¹ $\text{Reads}(r)$ contains every $t.c$ referenced in a select or where clause in $r$'s condition or action.
In addition, for every $\langle trans \rangle$ referenced, where $\langle trans \rangle$ is one of inserted, deleted, new-updated, or old-updated, $t.c$ is in $\text{Reads}(r)$ for the columns $c$ of $r$'s triggering table $t$. (Recall from Section 2 that inserted, deleted, new-updated, and old-updated are transition tables based on changes to $t$.)

Can-Untrigger takes a set of operations $O' \subseteq O$ and produces all rules that can be "untriggered" as a result of operations in $O'$. A rule is untriggered if it is triggered at some point during rule processing but not chosen for consideration, then subsequently no longer triggered because all triggering changes were undone by other rules.²

$$\text{Can-Untrigger}(O') = \{r \in R \mid \langle D, t\rangle \in O' \text{ and } \langle I, t\rangle \text{ or } \langle U, t.c\rangle \in \text{Triggered-By}(r) \text{ for some } t \in T,\ t.c \in C\}$$

Choose takes a set of triggered rules $R' \subseteq R$ and produces a subset of $R'$ indicating those rules eligible for consideration (based on priorities):

$$\text{Choose}(R') = \{r_i \mid r_i \in R' \text{ and there is no } r_j \in R' \text{ such that } r_j > r_i \in P\}$$

Observable takes a rule $r$ and indicates whether $r$'s action may be observable. In Starburst, a rule's action may be observable iff it includes a select or rollback statement.

4 Execution Model

We now define a formal model of execution-time rule processing. The model is based on execution graphs and accurately captures the semantics of rule processing described in Section 2. Note that execution graphs are used to discuss and to prove the correctness of our analysis techniques, but they are not part of the analysis itself.

A directed execution graph has a distinguished initial state representing the start of rule processing (at any rule assertion point) and zero or more final states representing termination of rule processing. The paths in the graph represent all possible execution sequences during rule processing; branches in the graph result from choosing different rules to consider when more than one is eligible. (Hence any graph for a totally ordered rule set has no branches.)
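For very small examples an execution graph can be enumerated explicitly, which is a convenient way to sanity-check the analysis on toy rule sets. The Python sketch below is an illustration only, not part of the analysis: it assumes hashable database states, abstracts transitions to operation sets, assumes untriggering never happens, and uses hypothetical `Rule`, `successors` and `final_states` names. Successors follow the properties formalized in Lemma 4.1 below: only rules in Choose(TR) label outgoing edges, the considered rule is removed, and every rule whose Triggered-By set meets the performed operations is added.

```python
from collections import deque
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterator, List, Set, Tuple

Op = Tuple[str, str]                       # ("I", "t"), ("D", "t") or ("U", "t.c")
TR = FrozenSet[Tuple[str, FrozenSet[Op]]]  # triggered rules with their pending operations
State = Tuple[Hashable, TR]                # (database state D, triggered rules TR)


@dataclass(frozen=True)
class Rule:
    name: str
    triggered_by: FrozenSet[Op]
    condition: Callable[[Hashable], bool]
    action: Callable[[Hashable], Tuple[Hashable, FrozenSet[Op]]]  # returns (new D, ops done)


def successors(state: State, rules: List[Rule],
               priorities: Set[Tuple[str, str]]) -> Iterator[Tuple[str, State]]:
    db, tr = state
    triggered = {name for name, _ in tr}
    by_name = {r.name: r for r in rules}
    for name in sorted(triggered):
        if any((other, name) in priorities for other in triggered):
            continue                       # not in Choose(TR): no outgoing edge for it
        rule = by_name[name]
        new_db, ops = rule.action(db) if rule.condition(db) else (db, frozenset())
        pending = {n: set(p) for n, p in tr if n != name}  # remove the considered rule
        for r in rules:                    # add rules triggered by the operations performed
            hit = ops & r.triggered_by
            if hit:
                pending.setdefault(r.name, set()).update(hit)
        yield name, (new_db, frozenset((n, frozenset(p)) for n, p in pending.items()))


def final_states(rules: List[Rule], priorities: Set[Tuple[str, str]], db0: Hashable,
                 initial_ops: FrozenSet[Op], max_states: int = 10_000) -> Set[Hashable]:
    """Breadth-first enumeration of one execution graph; returns its final database states."""
    tr0 = frozenset((r.name, frozenset(initial_ops & r.triggered_by))
                    for r in rules if initial_ops & r.triggered_by)
    start: State = (db0, tr0)
    seen, queue, finals = {start}, deque([start]), set()
    while queue:
        state = queue.popleft()
        edges = list(successors(state, rules, priorities))
        if not edges:
            finals.add(state[0])           # empty TR: a final state (D_F, {})
        for _, nxt in edges:
            if nxt not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("execution graph too large (or infinite) to enumerate")
                seen.add(nxt)
                queue.append(nxt)
    return finals
```

With two unordered rules that write different values to the same column, this enumeration finds two final states; adding a priority between them leaves one, matching the intuition that a totally ordered rule set has no branches.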
The graph may have infinitely long paths, possibly due to cycles, and these represent nontermination of rule processing.

More formally, a state (node) $S$ in an execution graph has two components: (1) a database state $D$; (2) a set $TR$ containing each triggered rule and its associated transition tables. We denote this state as $S = (D, TR)$. The initial state $I$ is created by an initial transition, which results from a sequence of user-generated database operations. Hence, $I = (D_I, TR_I)$, where $D_I$ is a database state and there is some (possibly empty) set of operations $O' \subseteq O$ such that:

$$TR_I = \{r \in R \mid O' \cap \text{Triggered-By}(r) \neq \emptyset\}$$

$O'$ are the operations producing the initial transition, and $TR_I$ contains the rules triggered by those operations. A final state $F$ is some $(D_F, \emptyset)$, since no rules are triggered when rule processing terminates.

¹ Note that, unlike in OPS5, there is no distinction between reading values "positively" and "negatively" in this rule language.
² As an example, a rule $r_1$ might be triggered by insertions, but another rule $r_2$ might delete all inserted tuples before $r_1$ is chosen for consideration. Untriggering is rare in practice.

Each directed edge in an execution graph is labeled with a rule $r$ and represents the consideration of $r$ during rule processing. (This includes determining whether $r$'s condition is true and, if so, executing $r$'s action.) Using definitions from Section 3, the following lemma states certain properties that hold for all execution graphs. The lemma is stated without proof; it follows directly from the semantics of rule processing described in Section 2.

Lemma 4.1 (Properties of Execution Graphs) Consider any execution graph edge from a state $(D_1, TR_1)$ to a state $(D_2, TR_2)$ labeled with a rule $r$. Then:
• $r \in \text{Choose}(TR_1)$
• There is some (possibly empty) set of operations $O' \subseteq \text{Performs}(r)$ such that the triggered rules in $TR_2$ can be derived from the triggered rules in $TR_1$ by:
  1. removing rule $r$
  2. removing some subset of the rules in $\text{Can-Untrigger}(O')$
  3. adding all rules $r' \in R$ such that $O' \cap \text{Triggered-By}(r') \neq \emptyset$ □

The operations in $O'$ are those executed by $r$'s action. If $r$'s condition is false then $O'$ is empty. If $r$'s condition is true then $O'$ still may be a proper subset of $\text{Performs}(r)$ since, by the semantics of SQL, for most operations there are certain database states on which they have no effect. Finally, note that although rule $r$ is removed in step 1, $r$ may be added again in step 3 if $O' \cap \text{Triggered-By}(r) \neq \emptyset$.

The properties in Lemma 4.1 are guaranteed for all execution graphs. By performing more complex analysis on rule conditions and actions, by incorporating properties of database states, and by considering a variety of special cases, we probably can identify additional properties of execution graphs. Since our analysis techniques are based on execution graph properties, more accurate properties may result in more accurate rule analysis. We believe that the properties used here, although somewhat conservative, are sufficiently accurate to yield strong analysis techniques.

5 Termination

We want to determine whether the rules in $R$ are guaranteed to terminate. That is, we want to determine if for all user-generated operations and initial database states, rule processing always reaches a point at which there are no triggered rules to consider. We take as an assumption that individual rule actions terminate. Hence, in terms of execution graphs, the rules in $R$ are guaranteed to terminate iff all paths in every execution graph for $R$ are finite.

As suggested in [CW90], termination is analyzed by constructing a directed triggering graph for the rules in $R$, denoted $TG_R$. The nodes in $TG_R$ represent the rules in $R$ and the edges represent the Triggers relationship. That is, there is an edge from $r_i$ to $r_j$ in $TG_R$ iff $r_j \in \text{Triggers}(r_i)$.

Theorem 5.1 (Termination) If there are no cycles in $TG_R$ then the rules in $R$ are guaranteed to terminate.
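Applying Theorem 5.1 amounts to building $TG_R$ from the Performs and Triggered-By sets and looking for cycles. Since the interactive process described in this section reports cycles as strongly connected components, the Python sketch below uses Tarjan's algorithm (a standard choice, not one prescribed by the paper) and keeps only components that contain a cycle: those with more than one rule, or a single rule that can trigger itself. Operations are encoded as hypothetical tuples such as `("I", "t")` and `("U", "t.c")`.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str]  # ("I", "t"), ("D", "t") or ("U", "t.c")


def triggering_graph(performs: Dict[str, Set[Op]],
                     triggered_by: Dict[str, Set[Op]]) -> Dict[str, Set[str]]:
    """Edge ri -> rj iff rj is in Triggers(ri), i.e. Performs(ri) meets Triggered-By(rj)."""
    return {ri: {rj for rj, ops in triggered_by.items() if performs[ri] & ops}
            for ri in performs}


def cyclic_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Tarjan's algorithm; returns the strongly connected components that contain a cycle."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[Set[str]] = []

    def visit(v: str) -> None:  # recursive; fine for rule sets of modest size
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strongly connected component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            if len(component) > 1 or v in graph[v]:
                result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result
```

An empty result means the theorem applies directly; otherwise each returned component is what the user would be asked to examine.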
Proof: Omitted due to space constraints; see [AWH92].

Hence, to determine whether the rules in $R$ are guaranteed to terminate, triggering graph $TG_R$ is constructed and checked for cycles. Although this may appear to be a very conservative approach, by considering only the known properties of our execution graph model (Lemma 4.1), we see that whenever there is a cycle in the triggering graph, our analysis cannot rule out the possibility that there is an execution graph with an infinite path. Clearly, however, there are a number of special cases in which there is a cycle in the triggering graph but other properties (not captured in Lemma 4.1) guarantee termination. Examples are:
• The action of some rule $r$ on the cycle only deletes from a table $t$, and no other rules on the cycle insert into $t$. Eventually $r$'s action has no effect.
• The action of some rule $r$ on the cycle only performs a "monotonic" update (e.g. increments values), guaranteeing that the condition of some rule $r'$ on the cycle eventually becomes false (e.g. some value is less than 10).

Although some such cases may be detected automatically, for now we assume that they are discovered by the user through the interactive analysis process: once the analyzer has built the triggering graph for the rules in $R$, the user is notified of all cycles (or strong components). If the user is able to verify that, on each cycle, there is some rule $r$ such that repeated consideration of the rules on the cycle guarantees that $r$'s condition eventually becomes false or $r$'s action eventually has no effect, then the rules in $R$ are guaranteed to terminate. As part of a case study, we used this approach to establish termination for a set of rules in a power network design application [CW90].

6 Confluence

Next we want to determine whether the rules in $R$ are confluent.
That is, we want to determine if the final database state at termination of rule processing can depend on which rule is chosen for consideration when multiple non-prioritized rules are triggered. In terms of execution graphs, the rules in $R$ are confluent if every execution graph for $R$ has at most one final state. (Recall that all final states in an execution graph have an empty set of triggered rules, so two different final states cannot represent the same database state.)

Confluence for production rules is a particularly difficult problem because, in addition to the standard problems associated with confluence [Hue80], we must take into account the interactions between rule triggering and rule priorities. For example, it is not sufficient to simply consider the combined effects of two rule actions; it also is necessary to consider all rules that can become triggered, directly or indirectly, by those actions, and the relative ordering of these triggered rules. These issues are discussed as we develop our requirements for confluence in Section 6.3. As preliminaries, we first introduce the notion of rule commutativity, and we make a useful observation about execution graphs.

[Figure 1: Commutative rules]

6.1 Rule Commutativity

We say that two rules $r_i$ and $r_j$ are commutative (or $r_i$ and $r_j$ commute) if, given any state $S$ in any execution graph, considering rule $r_i$ and then rule $r_j$ from state $S$ produces the same execution graph state $S'$ as considering rule $r_j$ and then rule $r_i$; this is depicted in Figure 1. If this equivalence does not always hold, then $r_i$ and $r_j$ are noncommutative (or $r_i$ and $r_j$ do not commute). Each rule clearly commutes with itself. Based on the definitions of Section 3, we give a set of conditions for analyzing whether pairs of distinct rules commute.

Lemma 6.1 For distinct rules $r_i$ and $r_j$, if any of the following conditions hold then $r_i$ and $r_j$ may be noncommutative; otherwise they are commutative:
1. $r_j \in \text{Triggers}(r_i)$, i.e. $r_i$ can cause $r_j$ to become triggered
2. $r_j \in \text{Can-Untrigger}(\text{Performs}(r_i))$, i.e. $r_i$ can untrigger $r_j$
3. $\langle I, t\rangle$, $\langle D, t\rangle$, or $\langle U, t.c\rangle$ is in $\text{Performs}(r_i)$ and $t.c$ is in $\text{Reads}(r_j)$ for some $t.c \in C$, i.e. $r_i$'s operations can affect what $r_j$ reads
4. $\langle I, t\rangle$ is in $\text{Performs}(r_i)$ and $\langle D, t\rangle$ or $\langle U, t.c\rangle$ is in $\text{Performs}(r_j)$ for some $t \in T$ or $t.c \in C$, i.e. $r_i$'s insertions can affect what $r_j$ updates or deletes³
5. $\langle U, t.c\rangle$ is in both $\text{Performs}(r_i)$ and $\text{Performs}(r_j)$, i.e. $r_i$'s updates can affect $r_j$'s updates
6. any of 1–5 with $r_i$ and $r_j$ reversed □

We leave it to the reader to verify that if a pair of rules does not satisfy any of 1–6 then the rules are guaranteed to commute. The conditions in Lemma 6.1 are somewhat conservative and probably could be refined by performing more complex analysis on rule conditions and actions and by considering a variety of special cases. As two examples of this, consider rules $r_i$ and $r_j$ such that:
1. $r_i$ inserts into a table $t$ and $r_j$ deletes from $t$, but the tuples inserted by $r_i$ never satisfy the delete condition of $r_j$, or
2. $r_i$ and $r_j$ update the same table but never the same tuples.

In the first example, $r_i$ and $r_j$ appear noncommutative according to condition 4 of Lemma 6.1, but they do actually commute. In the second example, $r_i$ and $r_j$ appear noncommutative according to condition 5 but do commute. Although some such cases may be detected automatically, for now we assume that they are specified by the user during the interactive analysis process: we allow the user to declare that pairs of rules that appear noncommutative according to Lemma 6.1 actually do commute. The analysis algorithms then treat these rules as commutative.

³ In SQL it is possible to delete from or update a table without reading the table, which is why cases 4 and 5 are distinct from case 3.

6.2 Observation

We say that two rules $r_i$ and $r_j$ are unordered if neither $r_i > r_j$ nor $r_j > r_i$ is in $P$.
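Both the conditions of Lemma 6.1 and the unordered relation are purely syntactic checks over the sets of Section 3. A possible Python encoding follows, assuming hypothetical `RuleInfo` records holding Triggered-By, Performs and Reads (operations as `("I", "t")`, `("D", "t")`, `("U", "t.c")`; columns as `"t.c"`) and a set of user-certified commuting pairs as described in Section 6.1. Here `may_not_commute` returns True when any of conditions 1–6 holds, so False means the two rules are guaranteed to commute.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Op = Tuple[str, str]  # ("I", "t"), ("D", "t") or ("U", "t.c")


@dataclass(frozen=True)
class RuleInfo:
    name: str
    triggered_by: FrozenSet[Op]
    performs: FrozenSet[Op]
    reads: FrozenSet[str]  # columns written as "t.c"


def table(target: str) -> str:
    return target.split(".")[0]  # "t.c" -> "t", "t" -> "t"


def one_way(ri: RuleInfo, rj: RuleInfo) -> bool:
    """Conditions 1-5 of Lemma 6.1 with ri acting on rj."""
    if ri.performs & rj.triggered_by:                          # 1: ri can trigger rj
        return True
    deleted = {x for k, x in ri.performs if k == "D"}
    if any(k in ("I", "U") and table(x) in deleted for k, x in rj.triggered_by):
        return True                                            # 2: ri can untrigger rj
    read_tables = {table(c) for c in rj.reads}
    for k, x in ri.performs:                                   # 3: ri can affect what rj reads
        if (k in ("I", "D") and x in read_tables) or (k == "U" and x in rj.reads):
            return True
    inserted = {x for k, x in ri.performs if k == "I"}
    if any(k in ("D", "U") and table(x) in inserted for k, x in rj.performs):
        return True                                            # 4: inserts vs. deletes/updates
    return any(k == "U" and (k, x) in ri.performs for k, x in rj.performs)  # 5: same column


def may_not_commute(ri: RuleInfo, rj: RuleInfo,
                    certified: Set[FrozenSet[str]] = frozenset()) -> bool:
    if ri.name == rj.name or frozenset({ri.name, rj.name}) in certified:
        return False                                           # same rule, or user-certified
    return one_way(ri, rj) or one_way(rj, ri)                  # 6: check both directions


def unordered_pairs(names: List[str], P: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pairs with neither (a, b) nor (b, a) in the transitively closed priority set P."""
    return [(a, b) for a, b in combinations(sorted(names), 2)
            if (a, b) not in P and (b, a) not in P]
```

For instance, a rule inserting into `t` and another deleting from `t` are flagged by condition 4 (the first example in Section 6.1) until the user certifies the pair.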
Similarly, we say two rules $r_i$ and $r_j$ are ordered if $r_i > r_j$ or $r_j > r_i$ is in $P$. Based on our execution graph model, we make the following observation about possible states, which is used in the next section to develop our criteria for confluence.

Observation 6.2 Consider any two unordered rules $r_i$ and $r_j$ in $R$. It is very likely that there is an execution graph with a state that has (at least) two outgoing edges, one labeled $r_i$ and one labeled $r_j$. (Informally, there is very likely a scenario in which both $r_i$ and $r_j$ are triggered and eligible for consideration. Recall that a triggered rule $r$ is eligible for consideration iff there is no other triggered rule with precedence over $r$.)

Justification: Let $O' = \text{Triggered-By}(r_i) \cup \text{Triggered-By}(r_j)$. Consider an execution graph for which the operations in $O'$ are the initial user-generated operations, so that $r_i$ and $r_j$ are both triggered in the initial state. Consider any path of length 0 or more from the initial state to a state $S = (D, TR)$ in which there are no rules $r \in TR$ such that $r > r_i$ or $r > r_j$ is in $P$, i.e. there are no triggered rules with precedence over $r_i$ or $r_j$.⁴ State $S$ has at least two outgoing edges, one labeled $r_i$ and one labeled $r_j$. □

⁴ Such a path does not exist if $r_i$ or $r_j$ is untriggered along all potential paths, or if rules with precedence over $r_i$ or $r_j$ are considered indefinitely along all potential paths. These are highly unlikely (and probably undesirable) circumstances, but are why this is an observation rather than a theorem.

6.3 Analyzing Confluence

We now return to the question of confluence. We want to determine if every execution graph for $R$ is guaranteed to have at most one final state. For two execution graph states $S_i$ and $S_j$, let $S_i \rightarrow S_j$ denote that there is an edge in the execution graph from state $S_i$ to state $S_j$, and let $S_i \rightarrow^* S_j$ denote that there is a path of length 0 or more from $S_i$ to $S_j$. ($\rightarrow^*$ is the reflexive-transitive closure of $\rightarrow$.) Our first Lemma establishes conditions for confluence based on $\rightarrow^*$:

Lemma 6.3 (Path Confluence) Consider an arbitrary execution graph $EG$ and suppose that for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \rightarrow^* S_i$ and $S \rightarrow^* S_j$, there is a fourth state $S'$ such that $S_i \rightarrow^* S'$ and $S_j \rightarrow^* S'$ (Figure 2a). Then $EG$ has at most one final state.⁵

Proof: Suppose, for the sake of a contradiction, that $EG$ has two distinct final states, $F_1$ and $F_2$. Let $I$ be the initial state, so $I \rightarrow^* F_1$ and $I \rightarrow^* F_2$. Then, by assumption, there must be a fourth state $S'$ such that $F_1 \rightarrow^* S'$ and $F_2 \rightarrow^* S'$. Since $F_1$ and $F_2$ are both final states, $S' = F_1$ and $S' = F_2$, contradicting $F_1 \neq F_2$. □

⁵ Sometimes the term confluence is used to denote the supposition of this Lemma [Hue80], which then implies confluence in the sense that we've defined it.

[Figure 2: Conditions for confluence. (a) Based on paths; (b) Based on edges.]

It is quite difficult in general to determine when the supposition of Lemma 6.3 holds, since it is based entirely on arbitrarily long paths. The following Lemma gives a somewhat weaker condition that is easier to verify and implies the supposition of Lemma 6.3; it does, however, add the requirement that rule processing is guaranteed to terminate:

Lemma 6.4 (Edge Confluence) Consider an arbitrary execution graph $EG$ with no infinite paths. Suppose that for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \rightarrow S_i$ and $S \rightarrow S_j$, there is a fourth state $S'$ such that $S_i \rightarrow^* S'$ and $S_j \rightarrow^* S'$ (Figure 2b). Then for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \rightarrow^* S_i$ and $S \rightarrow^* S_j$, there is a fourth state $S'$ such that $S_i \rightarrow^* S'$ and $S_j \rightarrow^* S'$.

Proof: Classic result; see e.g. [Hue80].

We use Lemma 6.4 as the basis for our analysis techniques. Based on this Lemma (along with Lemma 6.3), we can guarantee confluence for the rules in $R$ if we know:
1. there are no infinite paths in any execution graph for $R$ (i.e., the rules in $R$ are guaranteed to terminate), and
2. in any execution graph for $R$, for any three states $S$, $S_i$, and $S_j$ such that $S \rightarrow S_i$ and $S \rightarrow S_j$, there is a fourth state $S'$ such that $S_i \rightarrow^* S'$ and $S_j \rightarrow^* S'$.

We assume that the first condition has been established through the analysis techniques of Section 5; we focus our attention on analysis techniques for establishing the second condition.

Consider any execution graph for $R$ and any three states $S$, $S_i$, and $S_j$ such that $S \rightarrow S_i$ and $S \rightarrow S_j$. This configuration is produced by every state $S$ that has at least two unordered triggered rules that are eligible for consideration. Let $r_i$ be the rule labeling edge $S \rightarrow S_i$ and $r_j$ be the rule labeling edge $S \rightarrow S_j$, as in Figure 2b. We want to prove that there is a fourth state $S'$ such that $S_i \rightarrow^* S'$ and $S_j \rightarrow^* S'$.

It is tempting to assume that if $r_i$ and $r_j$ are commutative, then $r_j$ can be considered from state $S_i$ and $r_i$ from $S_j$, producing a common state $S'$ as in Figure 1. Unfortunately, this is not always possible: if $r_i$ causes a rule $r$ with precedence over $r_j$ to become triggered, then $r_j$ is not eligible for consideration in state $S_i$ (similarly for $r_i$ in state $S_j$). Since the new triggered rule $r$ must be considered before rule $r_j$, $r$ must commute with $r_j$. Furthermore, $r$ may cause additional rules with precedence over $r_j$ to become triggered.

With this in mind, we motivate the requirements for the existence of a common state $S'$ that is reachable from both $S_i$ and $S_j$. We do this by attempting to "build" valid paths from $S_i$ and $S_j$ towards $S'$; call these paths $p_1$ and $p_2$, respectively. From state $S_i$, triggered rules with precedence over $r_j$ are considered until $r_j$ is eligible; call these rules $R_1$. Similarly, from $S_j$ triggered rules with precedence over $r_i$ are considered until $r_i$ is eligible; call these rules $R_2$.

[Figure 3: Paths towards common state $S'$]
After this, $r_j$ can be considered on path $p_1$ and $r_i$ can be considered on path $p_2$. Paths $p_1$ and $p_2$ up to this point are depicted in Figure 3. Now suppose that from state $S_i$ we can continue path $p_1$ by considering the rules in $R_2$ (in the same order), i.e. suppose the rules in $R_2$ are appropriately triggered and eligible. Similarly, suppose that from $S_j$ we can consider the rules in $R_1$. Then the same rules are considered along both paths. Consequently, if each rule in $\{r_i\} \cup R_1$ commutes with each rule in $\{r_j\} \cup R_2$, then the two paths are equivalent and reach a common state $S'$; this is depicted in Figure 4.

[Figure 4: Paths reaching common state $S'$.]

Unfortunately, even this scenario is not necessarily valid: there is no guarantee that the rules in $R_2$ are triggered and eligible from state $S_i$; similarly for $R_1$ and $S_j$. (For example, a rule in $R_2$ may not be eligible from state $S_i$ because $r_j$ triggered a rule with higher priority.) We can guarantee this, however, if we extend the rules originally considered in $R_1$ to include all eligible rules with precedence over rules in $R_2$, and extend the rules in $R_2$ similarly. Using this mutually recursive definition of $R_1$ and $R_2$, the pairwise commutativity of rules in $\{r_i\} \cup R_1$ with rules in $\{r_j\} \cup R_2$ guarantees the existence of state $S'$, and consequently guarantees confluence.

To establish confluence for the rules in $R$, then, we must consider in this fashion every pair of rules $r_i$ and $r_j$ such that some state in some execution graph for $R$ may have two outgoing edges, one labeled with $r_i$ and one with $r_j$. Recall Observation 6.2: for any two unordered rules $r_i$ and $r_j$, it is very likely that there is an execution graph with a state that has two outgoing edges, one labeled $r_i$ and one labeled $r_j$. Consequently, we consider every pair of unordered rules, and our analysis requirement for confluence is stated as follows.

Definition 6.5 (Confluence Requirement) Consider any pair of unordered rules $r_i$ and $r_j$ in $R$. Let $R_1 \subseteq R$ and $R_2 \subseteq R$ be constructed by the following algorithm:

  $R_1 \leftarrow \{r_i\}$
  $R_2 \leftarrow \{r_j\}$
  repeat until unchanged:
    $R_1 \leftarrow R_1 \cup \{\, r \in R \mid r \in Triggers(r_1) \text{ for some } r_1 \in R_1,\ r > r_2 \in P \text{ for some } r_2 \in R_2,\ r \neq r_j \,\}$
    $R_2 \leftarrow R_2 \cup \{\, r \in R \mid r \in Triggers(r_2) \text{ for some } r_2 \in R_2,\ r > r_1 \in P \text{ for some } r_1 \in R_1,\ r \neq r_i \,\}$

For every pair of rules $r_1 \in R_1$ and $r_2 \in R_2$, $r_1$ and $r_2$ must commute. □

The following lemma and theorem formally prove that the requirement of Definition 6.5 indeed guarantees confluence.

Lemma 6.6 (Confluence Lemma) Suppose the Confluence Requirement (Definition 6.5) holds for $R$. Then in any execution graph $EG$ for $R$, for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to S_i$ and $S \to S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$.

Proof: Omitted due to space constraints; see [AWH92]. (The formal proof parallels the motivation shown in Figure 4, although the full construction is slightly more complex.)

Theorem 6.7 (Confluence Theorem) Suppose the Confluence Requirement holds for $R$ and there are no infinite paths in any execution graph for $R$. Then any execution graph for $R$ has exactly one final state, i.e. the rules in $R$ are confluent.

Proof: Let $EG$ be any execution graph for $R$. By Confluence Lemma 6.6, for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to S_i$ and $S \to S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$. Therefore, by Edge Confluence Lemma 6.4, for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to^* S_i$ and $S \to^* S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$. By Path Confluence Lemma 6.3, $EG$ has at most one final state; hence, since there are no infinite paths, $EG$ has exactly one final state. □
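The construction in Definition 6.5 is a pair of mutually recursive fixpoint computations, so it is straightforward to mechanize. The Python sketch below is illustrative only and assumes a hypothetical encoding that does not come from Starburst: `triggers` maps each rule name to the rules it may trigger ($Triggers(r)$), `prec` is a set of pairs `(a, b)` meaning $a > b$ is in $P$ (assumed to be already transitively closed), and `commute` is a caller-supplied conservative commutativity test standing in for the conditions of Lemma 6.1.

```python
from itertools import product
from typing import Callable, Dict, Set, Tuple

# Hypothetical encoding (not the Starburst API):
#   triggers: rule -> rules it may trigger, i.e. Triggers(r)
#   prec:     pairs (a, b) meaning "a > b is in P" (assumed transitively closed)
#   commute:  conservative commutativity test supplied by the caller
Triggers = Dict[str, Set[str]]
Prec = Set[Tuple[str, str]]


def build_r1_r2(ri: str, rj: str, rules: Set[str],
                triggers: Triggers, prec: Prec) -> Tuple[Set[str], Set[str]]:
    """Compute R1 and R2 for the unordered pair (ri, rj) as in Definition 6.5."""
    r1, r2 = {ri}, {rj}
    while True:
        # Rules triggered by R1 that have precedence over some rule in R2.
        new_r1 = r1 | {r for r in rules if r != rj
                       and any(r in triggers.get(x, ()) for x in r1)
                       and any((r, y) in prec for y in r2)}
        # Symmetric step for R2, using the freshly updated R1.
        new_r2 = r2 | {r for r in rules if r != ri
                       and any(r in triggers.get(y, ()) for y in r2)
                       and any((r, x) in prec for x in new_r1)}
        if new_r1 == r1 and new_r2 == r2:
            return r1, r2
        r1, r2 = new_r1, new_r2


def confluence_violations(rules: Set[str], triggers: Triggers, prec: Prec,
                          commute: Callable[[str, str], bool]
                          ) -> Set[Tuple[str, str, str, str]]:
    """Return (ri, rj, r1, r2) tuples where r1 in R1 and r2 in R2 may not commute."""
    found = set()
    names = sorted(rules)
    for k, ri in enumerate(names):
        for rj in names[k + 1:]:
            if (ri, rj) in prec or (rj, ri) in prec:
                continue  # ordered pairs are not subject to the requirement
            r1_set, r2_set = build_r1_r2(ri, rj, rules, triggers, prec)
            for a, b in product(r1_set, r2_set):
                # A rule trivially commutes with itself.
                if a != b and not commute(a, b):
                    found.add((ri, rj, a, b))
    return found
```

For example, with rules `a`, `b`, `c`, where `a` may trigger `c`, $c > b$ is in $P$, and the commutativity test flags the pairs (b, c) and (a, c), the unordered pair (a, b) yields $R_1 = \{a, c\}$ and $R_2 = \{b\}$, and the sketch reports `("a", "b", "c", "b")` as well as `("a", "c", "a", "c")`. The second report mirrors Corollary 6.10: a rule that may trigger another has to be ordered with it.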
Thus, analyzing whether the rules in $R$ are confluent requires considering each pair of unordered rules $r_i$ and $r_j$ in $R$: sets $R_1$ and $R_2$ are built from $r_i$ and $r_j$ according to Definition 6.5, and the rules in $R_1$ and $R_2$ are checked pairwise for commutativity.

6.4 Using Confluence Analysis

If our analysis determines that the rules in $R$ are not confluent, the failure can be attributed to pairs of unordered rules $r_i$ and $r_j$ that generate sets $R_1$ and $R_2$ containing rules $r_1 \in R_1$ and $r_2 \in R_2$ that do not commute. (In the most common case, $r_1$ and $r_2$ are $r_i$ and $r_j$ themselves; see Corollary 6.8 below.) With this information, the user appears to have three possible courses of action towards confluence (short of modifying the rules themselves):

1. Certify that rules $r_1$ and $r_2$ actually do commute.
2. Specify a user-defined priority between rules $r_i$ and $r_j$ so they no longer must satisfy the Confluence Requirement.
3. Remove user-defined priorities so that $r_1$ or $r_2$ is no longer part of $R_1$ or $R_2$.

Approach 1 is clearly the best when it is valid. Approach 3 is non-intuitive and in fact useless: removing orderings to eliminate $r_1$ or $r_2$ from $R_1$ or $R_2$ simply produces a corresponding violation of the Confluence Requirement elsewhere. Hence, if Approach 1 is not applicable (i.e. rules $r_1$ and $r_2$ do not commute), then Approach 2 should be used. Note, however, that adding an ordering between rules $r_i$ and $r_j$ does not immediately guarantee confluence: sets $R_1$ or $R_2$ may grow for other pairs of rules and indicate that the rule set is still not confluent.6

6 Intuitively, a source of non-confluence can appear to "move around", requiring an iterative process of adding orderings (or certifying commutativity) until the rule set is made confluent. This happens because our analysis techniques simply detect that confluence requires two rules to be ordered; the user chooses an ordering, and this choice affects which additional rules must be ordered.

As guidelines for developing confluent rule sets, the following corollaries indicate simple properties that are satisfied by the rules in $R$ if they are found to be confluent using our methods.

Corollary 6.8 If $R$ is found to be confluent and $r_i$ and $r_j$ are unordered rules in $R$, then $r_i$ and $r_j$ commute.

Proof: Unordered rules $r_i$ and $r_j$ generate sets $R_1$ and $R_2$ such that $r_i \in R_1$ and $r_j \in R_2$. Hence, by the Confluence Requirement, $r_i$ and $r_j$ must commute. □

Corollary 6.9 If $R$ is found to be confluent and $P = \emptyset$ (i.e. there are no user-defined priorities between any rules in $R$), then every pair of rules in $R$ commutes.

Proof: With $P = \emptyset$, every pair of rules in $R$ is unordered, so the result follows directly from Corollary 6.8. □

Corollary 6.10 If $R$ is found to be confluent and $r_i$ and $r_j$ in $R$ are such that $r_i$ may trigger $r_j$ (or vice versa), then $r_i$ and $r_j$ are ordered.

Proof: Since $r_j \in Triggers(r_i)$, by our conditions for noncommutativity (Lemma 6.1), $r_i$ and $r_j$ do not commute. Suppose, for the sake of a contradiction, that $r_i$ and $r_j$ are unordered. Then by Corollary 6.8 they must commute, a contradiction. □

Additional similar corollaries certainly exist and provide useful initial tools for the rule programmer. We used our approach (by hand) to analyze confluence for several medium-sized rule applications. In most cases the rule sets were initially found to be non-confluent. However, for those rule sets that actually were confluent, user specification of rule commutativity eventually allowed confluence to be verified. Furthermore, for some rule sets the analysis uncovered previously undetected sources of non-confluence.

7 Partial Confluence

Confluence may be too strong a requirement for some applications. It is sometimes useful to allow rule set $R$ to be non-confluent for certain "unimportant" (e.g. scratch) tables in the database, but to ensure that $R$ is confluent for other "important" (e.g. data) tables. We call this partial confluence, or confluence with respect to $T'$, where $T'$ is a subset of the set of tables $T$ in the database schema. In terms of execution graphs, the rules in $R$ are confluent with respect to $T'$ if, given any execution graph $EG$ for $R$ and any two final states $F_1 = (D_1, \emptyset)$ and $F_2 = (D_2, \emptyset)$ in $EG$, the tables in $T'$ are identical in database states $D_1$ and $D_2$. (Partial confluence is clearly implied by confluence, since confluence guarantees at most one final state.) Partial confluence is analyzed by analyzing confluence for a subset of the rules in $R$: those rules that can directly or indirectly affect the final value of tables in $T'$.

Definition 7.1 (Significant Rules) Let $T' \subseteq T$ be a set of tables. The set of rules that are significant with respect to $T'$, denoted $Sig(T')$, is computed by the following algorithm:

  $Sig(T') \leftarrow \{\, r \in R \mid \langle I, t\rangle, \langle D, t\rangle, \text{ or } \langle U, t.c\rangle \text{ is in } Performs(r) \text{ for some } t \in T' \,\}$
  repeat until unchanged:
    $Sig(T') \leftarrow Sig(T') \cup \{\, r \in R \mid \text{there is an } r' \in Sig(T') \text{ such that } r \text{ and } r' \text{ do not commute} \,\}$

□

That is, $Sig(T')$ contains all rules that modify any table in $T'$, along with (recursively) all rules that do not commute with rules in $Sig(T')$. This algorithm determines whether rules commute using our conservative conditions for noncommutativity from Lemma 6.1. Hence, the user can influence the computation of $Sig(T')$ by specifying that pairs of rules that appear noncommutative according to Lemma 6.1 actually do commute.

As in Confluence Theorem 6.7, partial confluence requires that rules are guaranteed to terminate. In this case, however, the rule set under consideration is $Sig(T')$. Thus, before analyzing partial confluence, termination of the rules in $Sig(T')$ must be established using the techniques of Section 5.7

7 That is, even though the rules in $Sig(T')$ are never processed on their own, it must be established that if they were processed on their own they would terminate. As in Section 6.3, this is necessary for Definition 6.5 to guarantee confluence.

Theorem 7.2 (Partial Confluence) Let $T' \subseteq T$ be a set of tables. Suppose the Confluence Requirement (Definition 6.5) holds for the rules in $Sig(T')$ and there are no infinite paths in any execution graph for $Sig(T')$. Then, given any two final states $F_1$ and $F_2$ in any execution graph for $R$, the tables in $T'$ are identical in $F_1$ and $F_2$, i.e. the rules in $R$ are confluent with respect to $T'$.

Proof: Omitted due to space constraints; see [AWH92].

Hence, analyzing whether the rules in $R$ are confluent with respect to $T'$ requires first computing $Sig(T')$, then considering each pair of unordered rules $r_i$ and $r_j$ in $Sig(T')$: sets $R_1$ and $R_2$ are built according to Definition 6.5 and checked pairwise for commutativity. If the analysis determines that the rules in $R$ are not partially confluent, then the same interactive approach as that described in Section 6.4 for confluence can be used here to establish partial confluence.

8 Observable Determinism

In some database production rule languages, such as Starburst, the final database state may not be the only effect of rule processing: some rule actions may be visible to the environment (observable) while rules are being processed. When this is the case, the user may want to determine whether a rule set is observably deterministic, i.e. whether the order and appearance of observable rule actions is the same regardless of which rule is chosen for consideration when multiple non-prioritized rules are triggered. Note that observable determinism and confluence are orthogonal properties: a rule set may be confluent but not observably deterministic, or vice versa.

We analyze observable determinism using our techniques for partial confluence. Intuitively, we add a fictional table Obs to the database, and we pretend that those rules with observable actions also "timestamp and log" their observable actions in table Obs.
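The "timestamp and log" intuition can be made concrete with a toy simulation. The Python sketch below is a hypothetical model, not Starburst's rule processor: it assumes a fixed set of initially triggered rules that neither trigger nor untrigger each other, treats a rule as eligible when no other pending rule has precedence over it, and lets each observable rule append a (timestamp, description) tuple to a simulated Obs table. It enumerates every consideration order and collects the distinct final Obs contents; more than one result means the observable behavior depends on which unordered rule is chosen.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Entry = Tuple[int, str]  # (timestamp, log description)


def final_obs_tables(triggered: Set[str], prec: Set[Tuple[str, str]],
                     log: Dict[str, str]) -> Set[Tuple[Entry, ...]]:
    """Enumerate the distinct final Obs tables over all consideration orders.

    triggered: rules triggered at the start (toy model: no further triggering)
    prec:      pairs (a, b) meaning a has precedence over b
    log:       observable rule -> description of its observable action;
               rules absent from `log` are not observable
    """
    results: Set[Tuple[Entry, ...]] = set()

    def explore(pending: FrozenSet[str], obs: List[Entry]) -> None:
        if not pending:
            results.add(tuple(obs))
            return
        for r in pending:
            # r is eligible only if no other pending rule has precedence over it.
            if any((other, r) in prec for other in pending if other != r):
                continue
            # Observable rules log (current number of tuples in Obs, description).
            new_obs = obs + [(len(obs), log[r])] if r in log else obs
            explore(pending - {r}, new_obs)

    explore(frozenset(triggered), [])
    return results
```

Two unordered observable rules yield two different Obs tables, whereas adding a priority between them, or pairing an observable rule with a non-observable one, leaves a single Obs table. Exhaustive enumeration is exponential and serves only to illustrate the property; the analysis in this section is meant to establish it statically.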
We analyze the resulting rule set for confluence with respect to table Obs; if partial confluence holds, then the rule set is observably deterministic. More formally, recall the definitions of Section 3. Let $T^{obs} = T \cup \{Obs\}$ be an extended set of tables, let $C^{obs} = C \cup \{Obs.c\}$ be an extended set of columns, and let $O^{obs}$ be the corresponding extended set of operations. Let $Reads^{obs}$ and $Performs^{obs}$ extend the definitions of $Reads$ and $Performs$ as follows: for every $r \in R$ such that $Observable(r)$, add $Obs.c$ to $Reads(r)$ and $\langle I, Obs\rangle$ to $Performs(r)$. For convenience, we say that a rule $r$ is observable if $Observable(r)$.

Theorem 8.1 (Observable Determinism) Suppose, using the extended definitions $T^{obs}$, $C^{obs}$, $O^{obs}$, $Reads^{obs}$, and $Performs^{obs}$, that our analysis methods for partial confluence determine that rule set $R$ is confluent with respect to Obs. That is, suppose (from Theorem 7.2) that the Confluence Requirement of Definition 6.5 holds for the rules in $Sig(Obs)$ and there are no infinite paths in any execution graph for $R$. Then the rules in $R$ are observably deterministic.

Proof: By supposition, any hypothetical behavior of the rules in $R$ that is consistent with the definitions of $Reads^{obs}$ and $Performs^{obs}$ is confluent with respect to Obs. Consider the following such behavior. Suppose each observable rule $r$, in addition to its existing actions, inserts a new tuple into Obs that contains the current number of tuples in Obs (the "timestamp") and a complete description of $r$'s observable actions (the "log"). Since there is a unique final value for Obs, the hypothetical tuples written to Obs must be identical on all execution paths. Consequently, there is only one possible order and appearance of observable actions, and the rules in $R$ are observably deterministic. □
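The extension of $Reads$ and $Performs$ with table Obs, followed by the computation of $Sig(Obs)$ from Definition 7.1, can be sketched as below. This is a hypothetical encoding: $Reads$ is a set of `"table.column"` strings and $Performs$ a set of `(op, target)` pairs with `op` one of `I`, `D`, `U`. Lemma 6.1 is replaced by a deliberately simplified stand-in, `may_not_commute`, which flags two rules only when one may modify a table the other reads; Lemma 6.1 also treats, for instance, a rule that may trigger another as noncommuting (as used in Corollary 6.10), which this helper ignores. The rule names and tables in the example are made up.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Set, Tuple

Op = Tuple[str, str]  # ("I", "t"), ("D", "t") or ("U", "t.c")


@dataclass(frozen=True)
class Rule:
    name: str
    reads: FrozenSet[str]      # columns "t.c" the rule may read
    performs: FrozenSet[Op]    # operations the rule may perform
    observable: bool = False


def table_of(target: str) -> str:
    """Map 't.c' to 't'; a bare table name maps to itself."""
    return target.split(".")[0]


def with_obs(rule: Rule) -> Rule:
    """Reads^obs / Performs^obs: observable rules read Obs.c and insert into Obs."""
    if not rule.observable:
        return rule
    return replace(rule, reads=rule.reads | {"Obs.c"},
                   performs=rule.performs | {("I", "Obs")})


def may_not_commute(a: Rule, b: Rule) -> bool:
    """Simplified, hypothetical stand-in for Lemma 6.1 (see lead-in)."""
    def modifies_what_other_reads(x: Rule, y: Rule) -> bool:
        read_tables = {table_of(col) for col in y.reads}
        return any(table_of(target) in read_tables for _, target in x.performs)
    return a != b and (modifies_what_other_reads(a, b)
                       or modifies_what_other_reads(b, a))


def sig(tables: Set[str], rules: Iterable[Rule]) -> Set[Rule]:
    """Definition 7.1: rules modifying T', closed under possible noncommutativity."""
    pool = list(rules)
    result = {r for r in pool
              if any(table_of(target) in tables for _, target in r.performs)}
    while True:
        extra = {r for r in pool if r not in result
                 and any(may_not_commute(r, s) for s in result)}
        if not extra:
            return result
        result |= extra


# Hypothetical example rules.
R1 = Rule("r1", frozenset({"emp.salary"}), frozenset({("U", "emp.bonus")}), observable=True)
R2 = Rule("r2", frozenset({"dept.name"}), frozenset({("I", "audit")}), observable=True)
R3 = Rule("r3", frozenset({"proj.id"}), frozenset({("D", "proj")}))
```

On the unextended rules, `sig({"Obs"}, [R1, R2, R3])` is empty because no rule modifies Obs. After applying `with_obs`, both observable rules land in $Sig(Obs)$ and are flagged as possibly noncommuting, which is the situation exploited in the proof of Corollary 8.2.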
If, using the analysis methods indicated by this theorem, the rules in $R$ are not found to be observably deterministic, then the same interactive approach as that described in Section 6.4 can be used to establish confluence with respect to Obs, and consequently observable determinism. Although this requires the user to be aware of the fictional table Obs, the use of Obs in the analysis techniques is quite intuitive and may actually guide the user in establishing observable determinism.

The following corollary gives a simple property that is satisfied by the observable rules in $R$ if they are found to be deterministic using our methods. Additional useful corollaries certainly exist.

Corollary 8.2 If $R$ is found to be observably deterministic and $r_i$ and $r_j$ are distinct observable rules in $R$, then $r_i$ and $r_j$ are ordered.8

8 Note that this is not an if-and-only-if condition: orderings between all pairs of observable rules do not necessarily guarantee observable determinism.

Proof: Since $r_i$ is observable, $Obs.c \in Reads(r_i)$ and $\langle I, Obs\rangle \in Performs(r_i)$; similarly for $r_j$. Therefore, by Definition 7.1, $r_i$ and $r_j$ are both in $Sig(Obs)$. In addition, by Lemma 6.1, $r_i$ and $r_j$ satisfy our conditions for noncommutativity. Suppose, for the sake of a contradiction, that $r_i$ and $r_j$ are unordered. Then $r_i$ and $r_j$ generate sets $R_1$ and $R_2$ (from Definition 6.5) such that $r_i \in R_1$ and $r_j \in R_2$. Hence, by the Confluence Requirement, $r_i$ and $r_j$ must commute, a contradiction. □

9 Conclusions and Future Work

We have given static analysis methods that determine whether arbitrary sets of database production rules are guaranteed to terminate, are confluent, are partially confluent with respect to a set of tables, or are observably deterministic. Our algorithms are conservative: they may not always detect when a rule set satisfies these properties. However, they isolate the responsible rules when a property is not satisfied, and they determine simple criteria that, if satisfied, guarantee the property. Furthermore, for the cases when these criteria are not satisfied, our methods often can suggest modifications to the rule set that are likely to make the property hold. Consequently, our methods can form the basis of a powerful interactive development environment for database rule programmers.

Although our methods have been designed for the Starburst Rule System, we expect that they can be adapted to accommodate the syntax and semantics of other database rule languages. In particular, the fundamental definitions of Section 3 (Triggers, Performs, Choose, etc.) can simply be redefined for an alternative rule language. Alternative rule processing semantics will probably require modifying the execution graph model, which in turn will require modifying the algorithms (and proofs). However, our fundamental "building blocks" of rule analysis techniques can remain the same: the triggering graph for analyzing termination, the Edge and Path Lemmas for analyzing confluence, the notion of partial confluence, and the use of partial confluence in analyzing observable determinism.

Some technical comparisons can be drawn between this work and the results in [HH91, Ras90, ZH90]. In [HH91], a version of the OPS5 production rule language is considered, and a class of rule sets is identified that (conservatively) guarantees the unique fixed point property, which essentially corresponds to our notion of confluence. By defining a mapping between our language and the language in [HH91], we have shown that our confluence requirements properly subsume their fixed point requirements: if a rule set has the unique fixed point property according to [HH91], then our methods determine that the corresponding rule set is confluent, but not always vice versa. The methods in [HH91] have previously been shown to subsume those in [Ras90, ZH90]; hence our approach, although still conservative, appears quite accurate when compared with previous work.

Finally, we plan a number of improvements and extensions to this work:

Incremental methods: In our current approach, complete analysis is performed after any change to the rule set. In many cases it is clear that most results of previous analysis are still valid and only incremental additional analysis needs to be performed. We plan to modify our methods to incorporate incremental analysis. At the coarsest level, most rule applications can be partitioned into groups of rules such that, across partitions, rules reference different sets of tables and have no priority ordering. Although rules from different partitions are processed at the same time and their execution may be interleaved, they have no effect on each other. Hence, analysis can be applied separately to each partition, and it needs to be repeated for a partition only when rules in that partition change.

Less conservative methods: As discussed throughout the paper, many of our assumptions, definitions, and algorithms are conservative, and there is room for refinement. This may include more complex analysis of SQL, more accurate properties of our execution model, and a suite of special cases.

Restricted user operations: Our analysis assumes that the user-generated operations that initiate rule processing are arbitrary. However, in some cases it may be known that these will be of a particular type, i.e. users will only perform certain operations on certain tables. This may reduce the possible execution paths during rule processing, and consequently may guarantee properties that otherwise do not hold. We plan to extend our methods so that termination, confluence, and observable determinism can be analyzed in the context of limited user-generated operations.

Implementation and experimentation: We plan to implement our algorithms as part of an interactive development environment for the Starburst Rule System.
Although we have verified by hand that our methods are indeed useful, implementation will allow practical experimentation with large and realistic rule applications.

Acknowledgements

Thanks to Stefano Ceri and Guy Lohman for helpful comments on an initial draft.

References

[AS91] S. Abiteboul and E. Simon. Fundamental properties of deterministic and nondeterministic extensions of datalog. Theoretical Computer Science, 78:137–158, 1991.
[AWH92] A. Aiken, J. Widom, and J.M. Hellerstein. Behavior of database production rules: Termination, confluence, and observable determinism. IBM Research Report RJ 8562, IBM Almaden Research Center, San Jose, California, January 1992.
[BFKM85] L. Brownston, R. Farrell, E. Kant, and N. Martin. Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Addison-Wesley, Reading, Massachusetts, 1985.
[CW90] S. Ceri and J. Widom. Deriving production rules for constraint maintenance. In Proceedings of the Sixteenth International Conference on Very Large Data Bases, pages 566–577, Brisbane, Australia, August 1990.
[GJ91] N. Gehani and H.V. Jagadish. Ode as an active database: Constraints and triggers. In Proceedings of the Seventeenth International Conference on Very Large Data Bases, pages 327–336, Barcelona, Spain, September 1991.
[H+90] L.M. Haas et al. Starburst mid-flight: As the dust clears. IEEE Transactions on Knowledge and Data Engineering, 2(1):143–160, March 1990.
[Han89] E.N. Hanson. An initial report on the design of Ariel: A DBMS with an integrated production rule system. SIGMOD Record, Special Issue on Rule Management and Processing in Expert Database Systems, 18(3):12–19, September 1989.
[HH91] J.M. Hellerstein and M. Hsu. Determinism in partially ordered production systems. IBM Research Report RJ 8009, IBM Almaden Research Center, San Jose, California, March 1991.
[Hue80] G. Huet. Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the ACM, 27(4):797–821, October 1980.
[KU91] A.P. Karadimce and S.D. Urban. Diagnosing anomalous rule behavior in databases with integrity maintenance production rules. In Third Workshop on Foundations of Models and Languages for Data and Objects, Aigen, Austria, September 1991.
[MD89] D.R. McCarthy and U. Dayal. The architecture of an active database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 215–224, Portland, Oregon, May 1989.
[Ras90] L. Raschid. Maintaining consistency in a stratified production system. In Proceedings of the AAAI National Conference on Artificial Intelligence, 1990.
[SJGP90] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules, procedures, caching and views in data base systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 281–290, Atlantic City, New Jersey, May 1990.
[WCL91] J. Widom, R.J. Cochrane, and B.G. Lindsay. Implementing set-oriented production rules as an extension to Starburst. In Proceedings of the Seventeenth International Conference on Very Large Data Bases, pages 275–285, Barcelona, Spain, September 1991.
[WF90] J. Widom and S.J. Finkelstein. Set-oriented production rules in relational database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 259–270, Atlantic City, New Jersey, May 1990.
[ZH90] Y. Zhou and M. Hsu. A theory for rule triggering systems. In Advances in Database Technology – EDBT '90, Lecture Notes in Computer Science 416, pages 407–421. Springer-Verlag, Berlin, March 1990.