Proc. of 1992 ACM SIGMOD Conference, pages 59-68

Behavior of Database Production Rules:
Termination, Confluence, and Observable Determinism
Alexander Aiken
Jennifer Widom
Joseph M. Hellerstein
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120
{aiken, widom}@almaden.ibm.com, joey@postgres.berkeley.edu

Abstract. Static analysis methods are given for determining whether arbitrary sets of database production rules are (1) guaranteed to terminate; (2) guaranteed to produce a unique final database state; (3) guaranteed to produce a unique stream of observable actions. When the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System; they will form the basis of an interactive development environment for Starburst rule programmers.

1 Introduction

Production rules in database systems allow specification of data manipulation operations that are executed automatically whenever certain events occur or conditions are met, e.g. [GJ91, Han89, MD89, SJGP90, WF90]. Database production rules provide a general and powerful mechanism for integrity constraint enforcement, derived data maintenance, triggers and alerters, authorization checking, and versioning, as well as providing a platform for large and efficient knowledge bases and expert systems. However, it can be very difficult in general to predict how a set of database production rules will behave. Rule processing occurs as a result of arbitrary database changes; certain rules are triggered initially, and their execution can trigger additional rules or trigger the same rules additional times. The unstructured, unpredictable, and often nondeterministic behavior of rule processing can be a nightmare for the database rule programmer.
A significant step in aiding the database rule programmer is to provide information about the following three properties of rule behavior:
• Termination: Is rule processing guaranteed to terminate after any set of changes to the database in any state?
• Confluence: Can the execution order of nonprioritized rules make any difference in the final database state? That is, if multiple rules are triggered at the same time during rule processing, can the final database state at termination of rule processing depend on which is considered first? If not, the rule set is confluent.
• Observable Determinism: If a rule action is visible to the environment (e.g., if it performs data retrieval or a rollback statement), then we say it is observable. Can the execution order of nonprioritized rules make any difference in the order or appearance of observable actions? If not, the rule set is observably deterministic.

Current address: CS Division, Department of EECS, University of California, Berkeley, CA 94720

These properties can be very difficult or impossible to decide in the general case. We have developed conservative static analysis algorithms that:
• guarantee that a set of rules will terminate or say that it may not terminate;
• guarantee that a set of rules is confluent or say that it may not be confluent;
• guarantee that a set of rules is observably deterministic or say that it may not be observably deterministic.
Furthermore, when the answer is "may not" for any of these properties, the analysis algorithms isolate the rules responsible for the problem and determine criteria that, if satisfied, guarantee the property. Hence the analysis can form the basis of an interactive environment where the rule programmer invokes the analyzer to obtain information about rule behavior. If termination, confluence, or observable determinism is desired but not guaranteed, then the user may verify that the necessary criteria are satisfied or may modify the rule set and try again.
Our analysis methods have been developed and are presented in the context of the Starburst Rule System [WCL91], a fully functional production rules facility integrated into the Starburst extensible relational DBMS prototype at the IBM Almaden Research Center [H+90]. Although some aspects of the analysis are dependent on Starburst rules, we have tried to remain as general as possible and our methods certainly can be adapted to other database rule languages.

1.1 Related Work

Most previous work in static analysis of production rules [HH91, Ras90, ZH90] differs from ours in two ways. First, it considers simplified versions of the OPS5 production rule language [BFKM85]. OPS5 has a quite different model of rule processing than most database production rule systems, including the Starburst Rule System. Second, the goal of previous work is to impose restrictions and/or orderings on OPS5 rule sets such that unique fixed points are guaranteed. Our goal, on the other hand, is to permit arbitrary rule sets and provide useful information about their behavior in the database setting. In Section 9 we make some additional, more technical, comparisons, and explain how our analysis techniques subsume results in [HH91, Ras90, ZH90].
In [KU91], the issue of rule set termination is discussed, along with the issue of conflicting updates: determining when one rule may undo changes made by a previous rule. Although models and a problem-solving architecture for rule analysis are proposed, no algorithms are given. In [AS91], issues of termination and unique fixed points are considered in the context of various extensions to Datalog. In addition to the very different semantics of Datalog (logic) and production rules, [AS91] does not address the issue of determining whether a given rule set exhibits certain properties (as we do), but rather states results about whether all rule sets in a given language are guaranteed to exhibit the properties. In [CW90] we presented initial methods for analyzing termination in the context of deriving production rules for integrity constraint maintenance; these methods form the basis of our approach to termination in this paper.

1.2 Outline of Paper

As an introduction to database production rule languages and to establish a basis for our analysis techniques, in Section 2 we give the syntax and semantics of Starburst production rules. In Section 3 we introduce initial notation and definitions, and we describe some straightforward preliminary analysis of rule sets. In Section 4 we present a model of rule processing to be used as the formal basis for our analysis algorithms. Termination analysis is covered in Section 5 and confluence in Section 6. In Section 7 we give methods for analyzing partial confluence, which specifies that a rule set is confluent with respect to a portion of the database. Observable determinism is covered in Section 8. Finally, in Section 9 we draw conclusions and discuss future work.

2 The Starburst Rule System

We provide a brief overview of the set-oriented, SQL-based Starburst production rule language. Further details and numerous examples appear in [WCL91, WF90].
Starburst production rules are based on the notion of transitions. A transition is a database state change resulting from execution of a sequence of data manipulation operations. Rules consider only the net effect of transitions, meaning that: (1) if a tuple is updated several times, only the composite update is considered; (2) if a tuple is updated then deleted, only the deletion is considered; (3) if a tuple is inserted then updated, this is considered as inserting the updated tuple; (4) if a tuple is inserted then deleted, this is not considered at all. A formal theory of transitions and their net effects appears in [WF90].
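As an informal illustration of the four net-effect cases just listed, the following Python sketch folds a sequence of operations on identified tuples into their net effect. The function name `net_effect`, the `(kind, tuple_id, value)` encoding, and the assumption that a tuple identifier is never reused after deletion are ours for illustration only; they are not part of Starburst or of the formal theory in [WF90].

```python
def net_effect(ops):
    """Fold (kind, tuple_id, value) operations into a net effect per tuple.

    kind is "insert", "update", or "delete". Assumption (ours): a tuple id
    is inserted at most once and is never touched again after deletion.
    """
    net = {}  # tuple_id -> (net kind, latest value)
    for kind, tid, value in ops:
        if kind not in ("insert", "update", "delete"):
            raise ValueError(f"unknown operation kind: {kind}")
        prev = net.get(tid)
        if prev is None:
            net[tid] = (kind, value)          # first change to this tuple
        elif prev[0] == "delete" or kind == "insert":
            raise ValueError("sequence falls outside the four listed cases")
        elif kind == "update":
            # cases (1) and (3): keep the earlier kind, take the newest value
            net[tid] = (prev[0], value)
        elif prev[0] == "insert":
            del net[tid]                      # case (4): insert then delete
        else:
            net[tid] = ("delete", None)       # case (2): update then delete
    return net


ops = [
    ("update", 1, "a"), ("update", 1, "b"),   # case (1)
    ("update", 2, "x"), ("delete", 2, None),  # case (2)
    ("insert", 3, "p"), ("update", 3, "q"),   # case (3)
    ("insert", 4, "z"), ("delete", 4, None),  # case (4)
]
print(net_effect(ops))
```

Tuple 4 does not appear in the result at all, matching case (4).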
The syntax for defining a rule is:
    create rule name on table
    when transition-predicate
    [if condition]
    then action
    [precedes rule-list]
    [follows rule-list]
The transition predicate specifies one or more triggering operations on the rule's table: inserted, deleted, or updated($c_1, \ldots, c_n$), where $c_1, \ldots, c_n$ are column names. The rule is triggered by a given transition if at least one of the specified operations occurred in the net effect of the transition. The optional condition specifies an SQL predicate. The action specifies an arbitrary sequence of SQL data manipulation operations to be executed when the rule is triggered and its condition is true. The optional precedes and follows clauses are used to induce a partial ordering on the set of defined rules. If a rule $r_1$ specifies a rule $r_2$ in its precedes list, or if $r_2$ specifies $r_1$ in its follows list, then $r_1$ is higher than $r_2$ in the ordering. (We also say that $r_1$ has precedence or priority over $r_2$.) When no direct or transitive ordering is specified between two rules, their order is arbitrary.
A rule's condition and action may refer to the current state of the database through top-level or nested SQL select operations. In addition, rule conditions and actions may refer to transition tables, which are logical tables reflecting the changes to the rule's table that have occurred during the triggering transition. At the end of a given transition, transition table inserted in a rule refers to those tuples of the rule's table that were inserted by the transition, transition table deleted refers to those tuples that were deleted, and transition tables new-updated and old-updated refer to the new and old values (respectively) of the updated tuples. A rule may refer only to transition tables corresponding to its triggering operations.
Rules are activated at rule assertion points. There is an assertion point at the end of each transaction, and there may be additional user-specified assertion points within a transaction. We describe the semantics of rule processing at an arbitrary assertion point. The state change resulting from the user-generated database operations executed since the last assertion point (or start of the transaction) creates the first relevant transition, and some set of rules are triggered by this transition. A triggered rule $r$ is chosen from this set for consideration. Rule $r$ must be chosen so that no other triggered rule has precedence over $r$. If $r$ has a condition, then it is checked. If $r$'s condition is false, then another triggered rule is chosen for consideration. Otherwise, if $r$ has no condition or its condition is true, then $r$'s action is executed. After execution of $r$'s action, all rules not yet considered are triggered only if their transition predicates hold with respect to the composite transition created by the initial transition and subsequent execution of $r$'s action. That is, these rules see $r$'s action as if it were executed as part of the initial transition. Rules already considered (including $r$) have already "processed" the initial transition; thus, they are triggered again only if their transition predicate holds with respect to the transition created by $r$'s action. From the new set of triggered rules, a rule $r'$ is chosen for consideration such that no other triggered rule has precedence over $r'$. Rule processing continues in this fashion.
At an arbitrary time in rule processing, a given rule is triggered if its transition predicate holds with respect to the (composite) transition since the last time it was considered. If it has not yet been considered, it is triggered if its transition predicate holds with respect to the transition since the last rule assertion point or start of the transaction. The values of transition tables in rule conditions and actions always reflect the rule's triggering transition. Rule processing terminates when there are no triggered rules.
The analysis techniques we present are based on this language and rule processing semantics, but with modifications they also could apply to other similar languages; see Section 9.

3 Definitions and Preliminary Analysis

Let $R = \{r_1, r_2, \ldots, r_n\}$ denote an arbitrary set of Starburst production rules to be analyzed. Analysis is performed on a fixed set of rules; when the rule set is changed, analysis must be repeated. (Incremental methods are certainly possible; see Section 9.) Let $P$ denote the set of user-defined priority orderings on rules in $R$ (as specified by their precedes and follows clauses), including those implied by transitivity. $P = \{r_i > r_j, r_k > r_l, \ldots\}$, where $r_i > r_j$ denotes that rule $r_i$ has precedence over $r_j$. Let $T = \{t_1, t_2, \ldots, t_m\}$ denote the tables in the database schema, and let $C = \{t_i.c_j, t_k.c_l, \ldots\}$ denote the columns of tables in $T$. Finally, let $O$ denote the set of database modification operations:
$$O = \{\langle I, t\rangle \mid t \in T\} \cup \{\langle D, t\rangle \mid t \in T\} \cup \{\langle U, t.c\rangle \mid t.c \in C\}$$
$\langle I, t\rangle$ denotes insertions into table $t$, $\langle D, t\rangle$ denotes deletions from table $t$, and $\langle U, t.c\rangle$ denotes updates to column $c$ of table $t$.
The following definitions are computed using straightforward preliminary analysis of the rules in $R$:
• TriggeredBy takes a rule $r$ and produces the set of operations in $O$ that trigger $r$. TriggeredBy is trivial to compute based on rule syntax.
• Performs takes a rule $r$ and produces the set of operations in $O$ that may be performed by $r$'s action. Performs is trivial to compute based on rule syntax.
• Triggers takes a rule $r$ and produces all rules $r'$ that can become triggered as a result of $r$'s action (possibly including $r$ itself). $\mathit{Triggers}(r) = \{r' \in R \mid \mathit{Performs}(r) \cap \mathit{TriggeredBy}(r') \neq \emptyset\}$.
• Reads takes a rule $r$ and produces all columns in $C$ that may be read by $r$ in its condition or action.$^1$ $\mathit{Reads}(r)$ contains every $t.c$ referenced in a select or where clause in $r$'s condition or action. In addition, for every $\langle trans\rangle.c$ referenced, where $\langle trans\rangle$ is one of inserted, deleted, new-updated, or old-updated, $t.c$ is in $\mathit{Reads}(r)$ for $r$'s triggering table $t$. (Recall from Section 2 that inserted, deleted, new-updated, and old-updated are transition tables based on changes to $t$.)
• CanUntrigger takes a set of operations $O' \subseteq O$ and produces all rules that can be "untriggered" as a result of operations in $O'$. A rule is untriggered if it is triggered at some point during rule processing but not chosen for consideration, then subsequently no longer triggered because all triggering changes were undone by other rules.$^2$ $\mathit{CanUntrigger}(O') = \{r \in R \mid \langle D, t\rangle \in O'$ and $\langle I, t\rangle$ or $\langle U, t.c\rangle \in \mathit{TriggeredBy}(r)$ for some $t \in T$, $t.c \in C\}$.
• Choose takes a set of triggered rules $R' \subseteq R$ and produces a subset of $R'$ indicating those rules eligible for consideration (based on priorities). $\mathit{Choose}(R') = \{r_i \mid r_i \in R'$ and there is no $r_j \in R'$ such that $r_j > r_i \in P\}$.
• Observable takes a rule $r$ and indicates whether $r$'s action may be observable. In Starburst, a rule's action may be observable iff it includes a select or rollback statement.

4 Execution Model

We now define a formal model of execution-time rule processing. The model is based on execution graphs and accurately captures the semantics of rule processing described in Section 2. Note that execution graphs are used to discuss and to prove the correctness of our analysis techniques, but they are not part of the analysis itself.
A directed execution graph has a distinguished initial state representing the start of rule processing (at any rule assertion point) and zero or more final states representing termination of rule processing. The paths in the graph represent all possible execution sequences during rule processing; branches in the graph result from choosing different rules to consider when more than one is eligible. (Hence any graph for a totally ordered rule set has no branches.) The graph may have infinitely long paths, possibly due to cycles, and these represent nontermination of rule processing.
More formally, a state (node) $S$ in an execution graph has two components: (1) a database state $D$; (2) a set $TR$ containing each triggered rule and its associated transition tables. We denote this state as $S = (D, TR)$. The initial state $I$ is created by an initial transition, which results from a sequence of user-generated database operations. Hence, $I = (D_I, TR_I)$ where $D_I$ is a database state and there is some (possibly empty) set of operations $O' \subseteq O$ such that:
$$TR_I = \{r \in R \mid O' \cap \mathit{TriggeredBy}(r) \neq \emptyset\}$$
$O'$ are the operations producing the initial transition, and $TR_I$ contains the rules triggered by those operations. A final state $F$ is some $(D_F, \emptyset)$, since no rules are triggered when rule processing terminates.

$^1$ Note that, unlike in OPS5, there is no distinction between reading values "positively" and "negatively" in this rule language.
$^2$ As an example, a rule $r_1$ might be triggered by insertions, but another rule $r_2$ might delete all inserted tuples before $r_1$ is chosen for consideration. Untriggering is rare in practice.

Each directed edge in an execution graph is labeled with a rule $r$ and represents the consideration of $r$ during rule processing. (This includes determining whether $r$'s condition is true and, if so, executing $r$'s action.) Using definitions from Section 3, the following lemma states certain properties that hold for all execution graphs. The lemma is stated without proof; it follows directly from the semantics of rule processing described in Section 2.
Lemma 4.1 (Properties of Execution Graphs) Consider any execution graph edge from a state $(D_1, TR_1)$ to a state $(D_2, TR_2)$ labeled with a rule $r$. Then:
• $r \in \mathit{Choose}(TR_1)$
• There is some (possibly empty) set of operations $O' \subseteq \mathit{Performs}(r)$ such that the triggered rules in $TR_2$ can be derived from the triggered rules in $TR_1$ by:
1. removing rule $r$
2. removing some subset of the rules in $\mathit{CanUntrigger}(O')$
3. adding all rules $r' \in R$ such that $O' \cap \mathit{TriggeredBy}(r') \neq \emptyset$ $\Box$
The operations in $O'$ are those executed by $r$'s action. If $r$'s condition is false then $O'$ is empty. If $r$'s condition is true then $O'$ still may be a proper subset of $\mathit{Performs}(r)$ since, by the semantics of SQL, for most operations there are certain database states on which they have no effect. Finally, note that although rule $r$ is removed in step 1, $r$ may be added again in step 3 if $O' \cap \mathit{TriggeredBy}(r) \neq \emptyset$.
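Because step 2 of Lemma 4.1 removes only "some subset" of the untriggerable rules, the lemma pins down the successor's triggered set only between a lower and an upper bound. The following Python sketch computes those two bounds for a given edge. The function name `successor_bounds` and the dictionary-based encoding are illustrative assumptions of ours; the value of $\mathit{CanUntrigger}(O')$ is assumed to be supplied by the caller.

```python
def successor_bounds(tr1, r, ops, triggered_by, untriggerable):
    """Bounds on the triggered-rule set TR2 after considering rule r.

    tr1           -- names of the rules triggered before the edge (TR1)
    r             -- name of the rule labeling the edge
    ops           -- operations O' actually executed by r's action
    triggered_by  -- dict: rule name -> set of triggering operations
    untriggerable -- CanUntrigger(O'), assumed precomputed by the caller
    Returns (lower, upper): every rule in lower is certainly in TR2,
    and TR2 is contained in upper.
    """
    added = {name for name, trig in triggered_by.items() if ops & trig}  # step 3
    remaining = set(tr1) - {r}                                           # step 1
    lower = (remaining - set(untriggerable)) | added  # step 2 removes everything it may
    upper = remaining | added                         # step 2 removes nothing
    return lower, upper


# Hypothetical example: r1 deletes from emp while r2 waits on inserts into emp.
triggered_by = {
    "r1": {("I", "emp")},
    "r2": {("I", "emp")},
    "r3": {("D", "emp")},
}
lower, upper = successor_bounds(
    {"r1", "r2"}, "r1", {("D", "emp")}, triggered_by, {"r1", "r2"}
)
print(lower, upper)
```

When `ops` is empty (the condition of `r` was false) and nothing can be untriggered, both bounds reduce to `tr1` without `r`.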
The properties in Lemma 4.1 are guaranteed for all execution graphs. By performing more complex analysis on rule conditions and actions, by incorporating properties of database states, and by considering a variety of special cases, we probably can identify additional properties of execution graphs. Since our analysis techniques are based on execution graph properties, more accurate properties may result in more accurate rule analysis. We believe that the properties used here, although somewhat conservative, are sufficiently accurate to yield strong analysis techniques.

5 Termination

We want to determine whether the rules in $R$ are guaranteed to terminate. That is, we want to determine if for all user-generated operations and initial database states, rule processing always reaches a point at which there are no triggered rules to consider. We take as an assumption that individual rule actions terminate. Hence, in terms of execution graphs, the rules in $R$ are guaranteed to terminate iff all paths in every execution graph for $R$ are finite.
As suggested in [CW90], termination is analyzed by constructing a directed triggering graph for the rules in $R$, denoted $TG_R$. The nodes in $TG_R$ represent the rules in $R$ and the edges represent the Triggers relationship. That is, there is an edge from $r_i$ to $r_j$ in $TG_R$ iff $r_j \in \mathit{Triggers}(r_i)$.
Theorem 5.1 (Termination) If there are no cycles in $TG_R$ then the rules in $R$ are guaranteed to terminate.
Proof: Omitted due to space constraints; see [AWH92].
Hence, to determine whether the rules in $R$ are guaranteed to terminate, triggering graph $TG_R$ is constructed and checked for cycles. Although this may appear to be a very conservative approach, by considering only the known properties of our execution graph model (Lemma 4.1), we see that whenever there is a cycle in the triggering graph, our analysis cannot rule out the possibility that there is an execution graph with an infinite path. Clearly, however, there are a number of special cases in which there is a cycle in the triggering graph but other properties (not captured in Lemma 4.1) guarantee termination. Examples are:
• The action of some rule $r$ on the cycle only deletes from a table $t$, and no other rules on the cycle insert into $t$. Eventually $r$'s action has no effect.
• The action of some rule $r$ on the cycle only performs a "monotonic" update (e.g. increments values), guaranteeing that the condition of some rule $r'$ on the cycle eventually becomes false (e.g. some value is less than 10).
Although some such cases may be detected automatically, for now we assume that they are discovered by the user through the interactive analysis process: Once the analyzer has built the triggering graph for the rules in $R$, the user is notified of all cycles (or strong components). If the user is able to verify that, on each cycle, there is some rule $r$ such that repeated consideration of the rules on the cycle guarantees that $r$'s condition eventually becomes false or $r$'s action eventually has no effect, then the rules in $R$ are guaranteed to terminate.
As part of a case study, we used this approach to establish termination for a set of rules in a power network design application [CW90].
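The check behind Theorem 5.1 can be sketched in a few lines of Python: build the triggering graph from Performs and TriggeredBy, then report every strong component that contains a cycle, as the interactive analyzer described above would. The function names and the dictionary encoding are illustrative assumptions of ours, and the simple reachability-based component search is chosen for clarity rather than efficiency; it is not necessarily how the Starburst analyzer is implemented.

```python
def triggering_graph(performs, triggered_by):
    """Edge ri -> rj iff Performs(ri) intersects TriggeredBy(rj).

    Both arguments map rule names to sets of operations and are assumed
    to have the same keys.
    """
    return {
        ri: {rj for rj in triggered_by if performs[ri] & triggered_by[rj]}
        for ri in performs
    }


def reachable(graph, start):
    """Nodes reachable from start by a path of at least one edge."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def cyclic_components(graph):
    """Strong components of the graph that contain at least one cycle."""
    reach = {n: reachable(graph, n) for n in graph}
    components, assigned = [], set()
    for n in graph:
        if n in reach[n] and n not in assigned:   # n lies on some cycle
            comp = {m for m in reach[n] if n in reach[m]}
            components.append(comp)
            assigned |= comp
    return components


# Hypothetical rule set: r1 and r2 trigger each other; r3 is a dead end.
performs = {
    "r1": {("U", "dept.budget")},
    "r2": {("I", "emp")},
    "r3": {("D", "log")},
}
triggered_by = {
    "r1": {("I", "emp")},
    "r2": {("U", "dept.budget")},
    "r3": {("I", "emp")},
}
graph = triggering_graph(performs, triggered_by)
print(cyclic_components(graph))
```

An empty result means the triggering graph is acyclic, so termination is guaranteed; a nonempty result lists the rule groups the user would need to inspect by hand.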

6 Confluence

Next we want to determine whether the rules in $R$ are confluent. That is, we want to determine if the final database state at termination of rule processing can depend on which rule is chosen for consideration when multiple nonprioritized rules are triggered. In terms of execution graphs, the rules in $R$ are confluent if every execution graph for $R$ has at most one final state. (Recall that all final states in an execution graph have an empty set of triggered rules, so two different final states cannot represent the same database state.)
Confluence for production rules is a particularly difficult problem because, in addition to the standard problems associated with confluence [Hue80], we must take into account the interactions between rule triggering and rule priorities. For example, it is not sufficient to simply consider the combined effects of two rule actions; it also is necessary to consider all rules that can become triggered, directly or indirectly, by those actions, and the relative ordering of these triggered rules. These issues are discussed as we develop our requirements for confluence in Section 6.3. As preliminaries, we first introduce the notion of rule commutativity, and we make a useful observation about execution graphs.

Figure 1: Commutative rules

6.1 Rule Commutativity

We say that two rules $r_i$ and $r_j$ are commutative (or $r_i$ and $r_j$ commute) if, given any state $S$ in any execution graph, considering rule $r_i$ and then rule $r_j$ from state $S$ produces the same execution graph state $S'$ as considering rule $r_j$ and then rule $r_i$; this is depicted in Figure 1. If this equivalence does not always hold, then $r_i$ and $r_j$ are noncommutative (or $r_i$ and $r_j$ do not commute). Each rule clearly commutes with itself. Based on the definitions of Section 3, we give a set of conditions for analyzing whether pairs of distinct rules commute.
Lemma 6.1 For distinct rules $r_i$ and $r_j$, if any of the following conditions hold then $r_i$ and $r_j$ may be noncommutative; otherwise they are commutative:
1. $r_j \in \mathit{Triggers}(r_i)$, i.e. $r_i$ can cause $r_j$ to become triggered
2. $r_j \in \mathit{CanUntrigger}(\mathit{Performs}(r_i))$, i.e. $r_i$ can untrigger $r_j$
3. $\langle I, t\rangle$, $\langle D, t\rangle$, or $\langle U, t.c\rangle$ is in $\mathit{Performs}(r_i)$ and $t.c$ is in $\mathit{Reads}(r_j)$ for some $t.c \in C$, i.e. $r_i$'s operations can affect what $r_j$ reads
4. $\langle I, t\rangle$ is in $\mathit{Performs}(r_i)$ and $\langle D, t\rangle$ or $\langle U, t.c\rangle$ is in $\mathit{Performs}(r_j)$ for some $t \in T$ or $t.c \in C$, i.e. $r_i$'s insertions can affect what $r_j$ updates or deletes$^3$
5. $\langle U, t.c\rangle$ is in both $\mathit{Performs}(r_i)$ and $\mathit{Performs}(r_j)$, i.e. $r_i$'s updates can affect $r_j$'s updates
6. any of 1-5 with $r_i$ and $r_j$ reversed $\Box$
We leave it to the reader to verify that if a pair of rules does not satisfy any of 1-6 then the rules are guaranteed to commute.
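The six conditions of Lemma 6.1 are purely syntactic, so they can be checked mechanically. The Python sketch below returns the numbers of the conditions (1 through 5, in either direction) that a pair of rules satisfies; an empty set means the pair is guaranteed to commute. The `Rule` class, the pair encoding of operations, the helper names, and the example rules are illustrative assumptions of ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    triggered_by: frozenset         # operations, e.g. ("I", "emp"), ("U", "emp.salary")
    performs: frozenset             # operations
    reads: frozenset = frozenset()  # columns, e.g. "emp.salary"


def table_of(target):
    """Table part of a table or column name (assumes no dots in table names)."""
    return target.split(".")[0]


def conditions_one_way(ri, rj):
    """Conditions 1-5 of Lemma 6.1 that hold from ri towards rj."""
    hit = set()
    for kind, target in ri.performs:
        t = table_of(target)
        if (kind, target) in rj.triggered_by:
            hit.add(1)  # ri can trigger rj
        if kind == "D" and any(k in ("I", "U") and table_of(x) == t
                               for k, x in rj.triggered_by):
            hit.add(2)  # ri can untrigger rj
        if kind == "U":
            if target in rj.reads:
                hit.add(3)  # ri updates a column that rj reads
            if (kind, target) in rj.performs:
                hit.add(5)  # both rules update the same column
        elif any(table_of(c) == t for c in rj.reads):
            hit.add(3)      # ri inserts into or deletes from a table rj reads
        if kind == "I" and any(k in ("D", "U") and table_of(x) == t
                               for k, x in rj.performs):
            hit.add(4)      # ri inserts where rj deletes or updates
    return hit


def may_be_noncommutative(ri, rj):
    """Conditions violated in either direction (condition 6); empty set = commute."""
    if ri.name == rj.name:
        return set()  # each rule commutes with itself
    return conditions_one_way(ri, rj) | conditions_one_way(rj, ri)


# Hypothetical rules for illustration.
ri = Rule("ri", frozenset({("U", "dept.budget")}), frozenset({("I", "emp")}))
rj = Rule("rj", frozenset({("I", "audit")}), frozenset({("D", "emp")}),
          frozenset({"audit.id"}))
rk = Rule("rk", frozenset({("I", "dept")}), frozenset({("U", "dept.budget")}),
          frozenset({"dept.budget"}))
print(may_be_noncommutative(ri, rj), may_be_noncommutative(rj, rk))
```

Returning the condition numbers rather than a boolean mirrors the paper's aim of isolating the reason a pair is flagged, which a user could then override by declaring the pair commutative.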
The conditions in Lemma 6.1 are somewhat conservative and probably could be refined by performing more complex analysis on rule conditions and actions and by considering a variety of special cases. As two examples of this, consider rules $r_i$ and $r_j$ such that:
1. $r_i$ inserts into a table $t$ and $r_j$ deletes from $t$, but the tuples inserted by $r_i$ never satisfy the delete condition of $r_j$, or
2. $r_i$ and $r_j$ update the same table but never the same tuples.
In the first example, $r_i$ and $r_j$ are noncommutative according to condition 4 of Lemma 6.1, but they do actually commute. In the second example, $r_i$ and $r_j$ are noncommutative according to condition 5 but do commute. Although some such cases may be detected automatically, for now we assume that they are specified by the user during the interactive analysis process: We allow the user to declare that pairs of rules that appear noncommutative according to Lemma 6.1 actually do commute. The analysis algorithms then treat these rules as commutative.

$^3$ In SQL it is possible to delete from or update a table without reading the table, which is why cases 4 and 5 are distinct from case 3.

6.2 Observation

We say that two rules $r_i$ and $r_j$ are unordered if neither $r_i > r_j$ nor $r_j > r_i$ is in $P$. (Similarly, we say two rules $r_i$ and $r_j$ are ordered if $r_i > r_j$ or $r_j > r_i$ is in $P$.) Based on our execution graph model, we make the following observation about possible states, which is used in the next section to develop our criteria for confluence.
Observation 6.2 Consider any two unordered rules $r_i$ and $r_j$ in $R$. It is very likely that there is an execution graph with a state that has (at least) two outgoing edges, one labeled $r_i$ and one labeled $r_j$. (Informally, there is very likely a scenario in which both $r_i$ and $r_j$ are triggered and eligible for consideration. Recall that a triggered rule $r$ is eligible for consideration iff there is no other triggered rule with precedence over $r$.)
Justification: Let $O' = \mathit{TriggeredBy}(r_i) \cup \mathit{TriggeredBy}(r_j)$. Consider an execution graph for which the operations in $O'$ are the initial user-generated operations, so that $r_i$ and $r_j$ are both triggered in the initial state. Consider any path of length 0 or more from the initial state to a state $S = (D, TR)$ in which there are no rules $r \in TR$ such that $r > r_i$ or $r > r_j$ is in $P$, i.e. there are no triggered rules with precedence over $r_i$ or $r_j$.$^4$ State $S$ has at least two outgoing edges, one labeled $r_i$ and one labeled $r_j$. $\Box$

6.3 Analyzing Confluence

We now return to the question of confluence. We want to determine if every execution graph for $R$ is guaranteed to have at most one final state. For two execution graph states $S_i$ and $S_j$, let $S_i \to S_j$ denote that there is an edge in the execution graph from state $S_i$ to state $S_j$ and let $S_i \to^* S_j$ denote that there is a path of length 0 or more from $S_i$ to $S_j$. ($\to^*$ is the reflexive-transitive closure of $\to$.) Our first Lemma establishes conditions for confluence based on $\to^*$:

$^4$ Such a path does not exist if $r_i$ or $r_j$ is untriggered along all potential paths, or if rules with precedence over $r_i$ or $r_j$ are considered indefinitely along all potential paths. These are highly unlikely (and probably undesirable) circumstances, but are why this is an observation rather than a theorem.

Lemma 6.3 (Path Confluence) Consider an arbitrary execution graph $EG$ and suppose that for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to^* S_i$ and $S \to^* S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$ (Figure 2a). Then $EG$ has at most one final state.$^5$
Proof: Suppose, for the sake of a contradiction, that $EG$ has two distinct final states, $F_1$ and $F_2$. Let $I$ be the initial state, so $I \to^* F_1$ and $I \to^* F_2$. Then, by assumption, there must be a fourth state $S'$ such that $F_1 \to^* S'$ and $F_2 \to^* S'$. Since $F_1$ and $F_2$ are both final states (and hence have no outgoing edges), $S' = F_1$ and $S' = F_2$, contradicting $F_1 \neq F_2$. $\Box$
It is quite difficult in general to determine when the supposition of Lemma 6.3 holds, since it is based entirely on arbitrarily long paths. The following Lemma gives a somewhat weaker condition that is easier to verify and implies the supposition of Lemma 6.3; it does, however, add the requirement that rule processing is guaranteed to terminate:
Lemma 6.4 (Edge Confluence) Consider an arbitrary execution graph $EG$ with no infinite paths. Suppose that for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to S_i$ and $S \to S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$ (Figure 2b). Then for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to^* S_i$ and $S \to^* S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$.
Proof: Classic result; see e.g. [Hue80].
We use Lemma 6.4 as the basis for our analysis techniques. Based on this Lemma (along with Lemma 6.3), we can guarantee confluence for the rules in $R$ if we know
1. there are no infinite paths in any execution graph for $R$ (i.e., the rules in $R$ are guaranteed to terminate), and
2. in any execution graph for $R$, for any three states $S$, $S_i$, and $S_j$ such that $S \to S_i$ and $S \to S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$.
We assume that the first condition has been established through the analysis techniques of Section 5; we focus our attention on analysis techniques for establishing the second condition.

$^5$ Sometimes the term confluence is used to denote the supposition of this Lemma [Hue80], which then implies confluence in the sense that we've defined it.

Figure 2: Conditions for confluence. (a) Based on paths. (b) Based on edges.

Figure 3: Paths towards common state $S'$

Consider any execution graph for $R$ and any three states $S$, $S_i$, and $S_j$ such that $S \to S_i$ and $S \to S_j$. This configuration is produced by every state $S$ that has at least two unordered triggered rules that are eligible for consideration. Let $r_i$ be the rule labeling edge $S \to S_i$ and $r_j$ be the rule labeling edge $S \to S_j$, as in Figure 2b. We want to prove that there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$. It is tempting to assume that if $r_i$ and $r_j$ are commutative, then $r_j$ can be considered from state $S_i$ and $r_i$ from $S_j$, producing a common state $S'$ as in Figure 1. Unfortunately, this is not always possible: If $r_i$ causes a rule $r$ with precedence over $r_j$ to become triggered, then $r_j$ is not eligible for consideration in state $S_i$ (similarly for $r_i$ in state $S_j$). Since the new triggered rule $r$ must be considered before rule $r_j$, $r$ must commute with $r_j$. Furthermore, $r$ may cause additional rules with precedence over $r_j$ to become triggered.
With this in mind, we motivate the requirements for the existence of a common state $S'$ that is reachable from both $S_i$ and $S_j$. We do this by attempting to "build" valid paths from $S_i$ and $S_j$ towards $S'$; call these paths $p_1$ and $p_2$, respectively. From state $S_i$, triggered rules with precedence over $r_j$ are considered until $r_j$ is eligible; call these rules $R_1$. Similarly, from $S_j$ triggered rules with precedence over $r_i$ are considered until $r_i$ is eligible; call these rules $R_2$. After this, $r_j$ can be considered on path $p_1$ and $r_i$ can be considered on path $p_2$. Paths $p_1$ and $p_2$ up to this point are depicted in Figure 3.
Now suppose that from state $S_i$ we can continue path $p_1$ by considering the rules in $R_2$ (in the same order), i.e. suppose the rules in $R_2$ are appropriately triggered and eligible. Similarly, suppose that from $S_j$ we can consider the rules in $R_1$. Then the same rules are considered along both paths. Consequently, if each rule in $\{r_i\} \cup R_1$ commutes with each rule in $\{r_j\} \cup R_2$, then the two paths are equivalent and reach a common state $S'$; this is depicted in Figure 4.
Unfortunately, even this scenario is not necessarily valid: There is no guarantee that the rules in $R_2$ are triggered and eligible from state $S_i$; similarly for $R_1$ and $S_j$. (For example, a rule in $R_2$ may not be eligible from state $S_i$ because $r_j$ triggered a rule with higher priority.) We can guarantee this, however, if we extend the rules originally considered in $R_1$ to include all eligible rules with precedence over rules in $R_2$, and extend the rules in $R_2$ similarly. Using this mutually recursive definition of $R_1$ and $R_2$, the pairwise commutativity of rules in $\{r_i\} \cup R_1$ with rules in $\{r_j\} \cup R_2$ guarantees the existence of state $S'$, and consequently guarantees confluence.

Figure 4: Paths reaching common state $S'$

To establish confluence for the rules in $R$, then, we must consider in this fashion every pair of rules $r_i$ and $r_j$ such that some state in some execution graph for $R$ may have two outgoing edges, one labeled with $r_i$ and one with $r_j$. Recall Observation 6.2: For any two unordered rules $r_i$ and $r_j$, it is very likely that there is an execution graph with a state that has two outgoing edges, one labeled $r_i$ and one labeled $r_j$. Consequently, we consider every pair of unordered rules, and our analysis requirement for confluence is stated as follows.
Definition 6.5 (Confluence Requirement) Consider any pair of unordered rules $r_i$ and $r_j$ in $R$. Let $R_1 \subseteq R$ and $R_2 \subseteq R$ be constructed by the following algorithm:
    $R_1 \leftarrow \{r_i\}$
    $R_2 \leftarrow \{r_j\}$
    repeat until unchanged:
        $R_1 \leftarrow R_1 \cup \{r \in R \mid r \in \mathit{Triggers}(r_1)$ for some $r_1 \in R_1$ and $r > r_2 \in P$ for some $r_2 \in R_2$ and $r \neq r_j\}$
        $R_2 \leftarrow R_2 \cup \{r \in R \mid r \in \mathit{Triggers}(r_2)$ for some $r_2 \in R_2$ and $r > r_1 \in P$ for some $r_1 \in R_1$ and $r \neq r_i\}$
For every pair of rules $r_1 \in R_1$ and $r_2 \in R_2$, $r_1$ and $r_2$ must commute. $\Box$
The following lemma and theorem formally prove that the requirement of Definition 6.5 indeed guarantees confluence.
Lemma 6.6 (Confluence Lemma) Suppose the Confluence Requirement (Definition 6.5) holds for $R$. Then in any execution graph $EG$ for $R$, for any three states $S$, $S_i$, and $S_j$ in $EG$ such that $S \to S_i$ and $S \to S_j$, there is a fourth state $S'$ such that $S_i \to^* S'$ and $S_j \to^* S'$.
Proof: Omitted due to space constraints; see [AWH92]. (The formal proof parallels the motivation shown in Figure 4, although the full construction is slightly more complex.)
Theorem 6.7 (Confluence Theorem) Suppose the
Confluence Requirement holds for R and there are no
infinite paths in any execution graph for R. Then any
execution graph for R has exactly one final state, i.e. the
rules in R are confluent.
Proof: Let EG be any execution graph for R. By Confluence
Lemma 6.6, for any three states S, Si, and Sj
in EG such that S → Si and S → Sj, there is a fourth
state S' such that Si →* S' and Sj →* S'. Therefore, by
Edge Confluence Lemma 6.4, for any three states S, Si,
and Sj in EG such that S →* Si and S →* Sj, there is a
fourth state S' such that Si →* S' and Sj →* S'. By Path
Confluence Lemma 6.3, EG has at most one final state,
hence (since there are no infinite paths) EG has exactly
one final state. □
Thus, analyzing whether the rules in R are confluent requires
considering each pair of unordered rules ri and rj
in R: Sets R1 and R2 are built from ri and rj according
to Definition 6.5, and the rules in R1 and R2 are checked
pairwise for commutativity.
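The procedure just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: rules are plain strings, `triggers[r]` stands for Triggers(r), `higher` is a set of pairs `(a, b)` meaning a > b is in P (assumed to already contain every ordering implied by P), and `commute` is a caller-supplied predicate standing in for the conservative conditions of Lemma 6.1, which are not reproduced here. All of these names are hypothetical.

```python
from itertools import combinations

def build_sets(ri, rj, rules, triggers, higher):
    """Construct (R1, R2) for the unordered pair (ri, rj) as in Definition 6.5."""
    r1, r2 = {ri}, {rj}
    while True:
        new1 = r1 | {r for r in rules
                     if r != rj
                     and any(r in triggers.get(x, ()) for x in r1)
                     and any((r, y) in higher for y in r2)}
        # R2 is extended using the already-updated R1, as in the definition.
        new2 = r2 | {r for r in rules
                     if r != ri
                     and any(r in triggers.get(y, ()) for y in r2)
                     and any((r, x) in higher for x in new1)}
        if new1 == r1 and new2 == r2:       # "repeat until unchanged"
            return r1, r2
        r1, r2 = new1, new2

def confluence_violations(rules, triggers, higher, commute):
    """Return (ri, rj, r1, r2) tuples that break the Confluence Requirement."""
    found = []
    for ri, rj in combinations(sorted(rules), 2):
        if (ri, rj) in higher or (rj, ri) in higher:
            continue                        # ordered pairs are not analyzed
        r1_set, r2_set = build_sets(ri, rj, rules, triggers, higher)
        found += [(ri, rj, a, b)
                  for a in sorted(r1_set) for b in sorted(r2_set)
                  if not commute(a, b)]
    return found

# Hypothetical three-rule example: a may trigger c, and c has priority over b.
rules = {"a", "b", "c"}
triggers = {"a": {"c"}, "b": set(), "c": set()}
higher = {("c", "b")}
noncommuting = {frozenset("ac"), frozenset("bc")}

def commute(x, y):
    return frozenset((x, y)) not in noncommuting

for violation in confluence_violations(rules, triggers, higher, commute):
    print(violation)
```

An empty result means the Confluence Requirement holds for the given inputs; termination still has to be established separately before Theorem 6.7 applies. Each reported tuple names the unordered pair (ri, rj) together with the noncommuting pair (r1, r2) found in its sets.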

6.4 Using Confluence Analysis

If our analysis determines that the rules in R are not
confluent, it can be attributed to pairs of unordered rules
ri and rj that generate sets R1 and R2 such that rules
r1 ∈ R1 and r2 ∈ R2 do not commute. (In the most
common case, r1 and r2 are ri and rj themselves; see
Corollary 6.8 below.) With this information, it appears
that the user has three possible courses of action towards
confluence (short of modifying the rules themselves):
1. Certify that rules r1 and r2 actually do commute
2. Specify a user-defined priority between rules ri and
rj so they no longer must satisfy the Confluence Requirement
3. Remove user-defined priorities so r1 or r2 is no longer
part of R1 or R2
Approach 1 is clearly the best when it is valid. Approach
3 is nonintuitive and in fact useless: removing orderings
to eliminate r1 or r2 from R1 or R2 simply produces a
corresponding violation to the Confluence Requirement
elsewhere. Hence, if Approach 1 is not applicable (i.e.
rules r1 and r2 do not commute) then Approach 2 should
be used. Note, however, that adding an ordering between
rules ri and rj does not immediately guarantee
confluence: sets R1 or R2 may increase for other pairs of
rules and indicate that the rule set is still not confluent.[6]

[6] Intuitively, a source of nonconfluence can appear to
"move around", requiring an iterative process of adding
orderings (or certifying commutativity) until the rule set is
made confluent. This happens because our analysis techniques
simply detect that confluence requires two rules to be
ordered: the user chooses an ordering, and this choice affects
which additional rules must be ordered.

As guidelines for developing confluent rule sets, the
following corollaries indicate simple properties that are
satisfied by the rules in R if they are found to be confluent
using our methods.
Corollary 6.8 If R is found to be confluent and ri and
rj are unordered rules in R, then ri and rj commute.
Proof: Unordered rules ri and rj generate sets R1 and
R2 such that ri ∈ R1 and rj ∈ R2. Hence, by the Confluence
Requirement, ri and rj must commute. □
Corollary 6.9 If R is found to be confluent and P = ∅
(i.e. there are no user-defined priorities between any rules
in R), then every pair of rules in R commutes.
Proof: Follows directly from Corollary 6.8. □
Corollary 6.10 If R is found to be confluent and ri and
rj in R are such that ri may trigger rj (or vice versa),
then ri and rj are ordered.
Proof: Since rj ∈ Triggers(ri), by our conditions for
noncommutativity (Lemma 6.1), ri and rj do not commute.
Suppose, for the sake of a contradiction, that ri
and rj are unordered. Then by Corollary 6.8 they must
commute. □
Additional similar corollaries certainly exist and provide
useful initial tools for the rule programmer.
We used our approach (by hand) to analyze confluence
for several medium-sized rule applications. In most cases
the rule sets were initially found to be nonconfluent.
However, for those rule sets that actually were confluent,
user specification of rule commutativity eventually
allowed confluence to be verified. Furthermore, for some
rule sets the analysis uncovered previously undetected
sources of nonconfluence.

7 Partial Confluence

Confluence may be too strong a requirement for some
applications. It sometimes is useful to allow rule set R to
be nonconfluent for certain "unimportant" (e.g. scratch)
tables in the database, but to ensure that R is confluent
for other "important" (e.g. data) tables. We call this partial
confluence, or confluence with respect to T', where T'
is a subset of the set of tables T in the database schema.
In terms of execution graphs, the rules in R are confluent
with respect to T' if, given any execution graph EG for
R and any two final states F1 = (D1, ∅) and F2 = (D2, ∅)
in EG, the tables in T' are identical in database states
D1 and D2. (Partial confluence obviously is implied by
confluence, since confluence guarantees at most one final
state.)
Partial confluence is analyzed by analyzing confluence
for a subset of the rules in R: those rules that can directly
or indirectly affect the final value of tables in T'.
Definition 7.1 (Significant Rules) Let T' ⊆ T be a
set of tables. The set of rules that are significant with
respect to T', denoted Sig(T'), is computed by the following
algorithm:
    Sig(T') ← {r ∈ R | ⟨I, t⟩, ⟨D, t⟩, or ⟨U, t.c⟩
               is in Performs(r) for some t ∈ T'}
    repeat until unchanged:
        Sig(T') ← Sig(T') ∪
                  {r ∈ R | there is an r' ∈ Sig(T') such that
                   r and r' do not commute}
□
That is, Sig(T') contains all rules that modify any table
in T', along with (recursively) all rules that do not commute
with rules in Sig(T'). This algorithm determines
whether rules commute using our conservative conditions
for noncommutativity from Lemma 6.1. Hence, the user
can influence the computation of Sig(T') by specifying
that pairs of rules that appear noncommutative according
to Lemma 6.1 actually do commute.
As in Confluence Theorem 6.7, partial confluence requires
that rules are guaranteed to terminate. In this
case, however, the rule set under consideration is Sig(T').
Thus, before analyzing partial confluence, termination of
the rules in Sig(T') must be established using the techniques
of Section 5.[7]

[7] That is, even though the rules in Sig(T') are never
processed on their own, it must be established that if they were
processed on their own they would terminate. As in Section 6.3,
this is necessary for Definition 6.5 to guarantee confluence.

Theorem 7.2 (Partial Confluence) Let T' ⊆ T be
a set of tables. Suppose the Confluence Requirement
(Definition 6.5) holds for the rules in Sig(T') and there
are no infinite paths in any execution graph for Sig(T').
Then given any two final states F1 and F2 in any execution
graph for R, the tables in T' are identical in F1 and
F2, i.e. the rules in R are confluent with respect to T'.
Proof: Omitted due to space constraints; see [AWH92].
Hence, analyzing whether the rules in R are confluent
with respect to T' requires first computing Sig(T'),
then considering each pair of unordered rules ri and rj
in Sig(T'): Sets R1 and R2 are built according to Definition 6.5
and checked pairwise for commutativity. If
the analysis determines that the rules in R are not partially
confluent, then the same interactive approach as
that described in Section 6.4 for confluence can be used
here to establish partial confluence.

8 Observable Determinism

In some database production rule languages, such as
Starburst, the final database state may not be the only
effect of rule processing: some rule actions may be visible
to the environment (observable) while rules are being
processed. When this is the case, the user may want to
determine whether a rule set is observably deterministic,
i.e. whether the order and appearance of observable rule
actions is the same regardless of which rule is chosen
for consideration when multiple nonprioritized rules are
triggered. Note that observable determinism and confluence
are orthogonal properties: a rule set may be confluent
but not observably deterministic or vice versa.
We analyze observable determinism using our techniques
for partial confluence. Intuitively, we add a fictional
table Obs to the database, and we pretend that
those rules with observable actions also "timestamp and
log" their observable actions in table Obs. We analyze
the resulting rule set for confluence with respect to table
Obs; if partial confluence holds, then the rule set is
observably deterministic.
More formally, recall the definitions of Section 3. Let
T_obs = T ∪ {Obs} be an extended set of tables, let
C_obs = C ∪ {Obs.c} be an extended set of columns, and
let O_obs be the corresponding extended set of operations.
Let Reads_obs and Performs_obs extend the definitions of
Reads and Performs as follows. For every r ∈ R such
that Observable(r), add Obs.c to Reads(r) and ⟨I, Obs⟩
to Performs(r). For convenience, we say that a rule r is
observable if Observable(r).
Theorem 8.1 (Observable Determinism) Suppose,
using extended definitions T_obs, C_obs, O_obs,
Reads_obs, and Performs_obs, that our analysis methods
for partial confluence determine that rule set R is confluent
with respect to Obs. That is, suppose (from Theorem 7.2)
that the Confluence Requirement of Definition 6.5
holds for the rules in Sig(Obs) and there are no
infinite paths in any execution graph for R. Then the
rules in R are observably deterministic.
Proof: By supposition, any hypothetical behavior of
the rules in R that is consistent with the definitions of
Reads_obs and Performs_obs is confluent with respect to
Obs. Consider the following such behavior. Suppose
each observable rule r, in addition to its existing actions,
inserts a new tuple into Obs that contains the current
number of tuples in Obs (the "timestamp") and a complete
description of r's observable actions (the "log").
Since there is a unique final value for Obs, the hypothetical
tuples written to Obs must be identical on all
execution paths. Consequently, there is only one possible
order and appearance of observable actions, and the
rules in R are observably deterministic. □
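The reduction to partial confluence can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: operations are encoded as hypothetical tuples `("I", t)`, `("D", t)`, and `("U", "t.c")`, rules are plain strings, and `commute` is a caller-supplied predicate standing in for the conservative conditions of Lemma 6.1 (in the paper's setting those conditions would be evaluated against the extended definitions). The sketch extends Reads and Performs with the fictional table Obs and then computes Sig({Obs}) as in Definition 7.1.

```python
def extend_with_obs(reads, performs, observable):
    """Return copies of Reads/Performs extended with the fictional table Obs."""
    reads_obs = {r: set(cols) for r, cols in reads.items()}
    performs_obs = {r: set(ops) for r, ops in performs.items()}
    for r in observable:
        reads_obs.setdefault(r, set()).add("Obs.c")
        performs_obs.setdefault(r, set()).add(("I", "Obs"))
    return reads_obs, performs_obs

def table_of(op):
    """Table named by an operation ("I", t), ("D", t) or ("U", "t.c")."""
    _kind, target = op
    return target.split(".")[0]

def significant_rules(rules, performs, tables, commute):
    """Compute Sig(T') as in Definition 7.1 for the table set `tables`."""
    sig = {r for r in rules
           if any(table_of(op) in tables for op in performs.get(r, ()))}
    changed = True
    while changed:                          # "repeat until unchanged"
        changed = False
        for r in sorted(rules - sig):
            if any(not commute(r, s) for s in sig):
                sig.add(r)
                changed = True
    return sig

# Hypothetical example: only "notify" has an observable action.
rules = {"audit", "cleanup", "notify"}
reads = {"audit": {"emp.sal"}, "cleanup": set(), "notify": set()}
performs = {"audit": {("U", "emp.sal")}, "cleanup": {("D", "scratch")},
            "notify": set()}
noncommuting = {frozenset({"audit", "notify"})}

def commute(x, y):
    return frozenset((x, y)) not in noncommuting

reads_obs, performs_obs = extend_with_obs(reads, performs, {"notify"})
sig_obs = significant_rules(rules, performs_obs, {"Obs"}, commute)
print(sorted(sig_obs))
```

The resulting set would then be checked against the Confluence Requirement of Definition 6.5, with termination established separately, before concluding anything about observable determinism.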
If, using the analysis methods indicated by this theorem,
the rules in R are not found to be observably deterministic,
then the same interactive approach as that
described in Section 6.4 can be used to establish confluence
with respect to Obs, and consequently observable
determinism. Although this requires the user to be aware
of fictional table Obs, the use of Obs in the analysis techniques
is quite intuitive and may actually guide the user
in establishing observable determinism.
The following corollary gives a simple property that is
satisfied by the observable rules in R if they are found
to be deterministic using our methods. Additional useful
corollaries certainly exist.
Corollary 8.2 If R is found to be observably deterministic
and ri and rj are distinct observable rules in R,
then ri and rj are ordered.[8]
Proof: Since ri is observable, Obs.c ∈ Reads(ri) and
⟨I, Obs⟩ ∈ Performs(ri); similarly for rj. Therefore, by
Definition 7.1, ri and rj are both in Sig(Obs). In addition,
by Lemma 6.1, ri and rj satisfy our conditions
for noncommutativity. Suppose, for the sake of a contradiction,
that ri and rj are unordered. Then ri and
rj generate sets R1 and R2 (from Definition 6.5) such
that ri ∈ R1 and rj ∈ R2. Hence, by the Confluence
Requirement, ri and rj must commute. □

[8] Note that this is not an if and only if condition: orderings
between all pairs of observable rules do not necessarily
guarantee observable determinism.

9 Conclusions and Future Work

We have given static analysis methods that determine
whether arbitrary sets of database production rules are
guaranteed to terminate, are confluent, are partially confluent
with respect to a set of tables, or are observably
deterministic. Our algorithms are conservative: they
may not always detect when a rule set satisfies these
properties. However, they isolate the responsible rules
when a property is not satisfied, and they determine
simple criteria that, if satisfied, guarantee the property.
Furthermore, for the cases when these criteria are not
satisfied, our methods often can suggest modifications
to the rule set that are likely to make the property hold.
Consequently, our methods can form the basis of a powerful
interactive development environment for database
rule programmers.
Although our methods have been designed for the
Starburst Rule System, we expect that they can be
adapted to accommodate the syntax and semantics of
other database rule languages. In particular, the fundamental
definitions of Section 3 (Triggers, Performs,
Choose, etc.) can simply be redefined for an alternative
rule language. Alternative rule processing semantics
will probably require that the execution graph model
is modified, which consequently will cause algorithms
(and proofs) to be modified. However, our fundamental
"building blocks" of rule analysis techniques can remain
the same: the triggering graph for analyzing termination,
the Edge and Path Lemmas for analyzing confluence,
the notion of partial confluence, and the use of
partial confluence in analyzing observable determinism.
Some technical comparisons can be drawn between
this work and the results in [HH91, Ras90, ZH90]. In
[HH91], a version of the OPS5 production rule language
is considered, and a class of rule sets is identified that
(conservatively) guarantees the unique fixed point property,
which essentially corresponds to our notion of confluence.
By defining a mapping between our language
and the language in [HH91], we have shown that our confluence
requirements properly subsume their fixed point
requirements: if a rule set has the unique fixed point
property according to [HH91], then our methods determine
that the corresponding rule set is confluent, but
not always vice versa. The methods in [HH91] have previously
been shown to subsume those in [Ras90, ZH90],
hence our approach, although still conservative, appears
quite accurate when compared with previous work.
Finally, we plan a number of improvements and extensions
to this work:
• Incremental methods: In our current approach,
  complete analysis is performed after any change to
  the rule set. In many cases it is clear that most results
  of previous analysis are still valid and only incremental
  additional analysis needs to be performed.
  We plan to modify our methods to incorporate incremental
  analysis. At the coarsest level, most rule applications
  can be partitioned into groups of rules such
  that, across partitions, rules reference different sets
  of tables and have no priority ordering. Although
  rules from different partitions are processed at the
  same time and their execution may be interleaved,
  they have no effect on each other. Hence, analysis
  can be applied separately to each partition, and it
  needs to be repeated for a partition only when rules
  in that partition change.
• Less conservative methods: As discussed
  throughout the paper, many of our assumptions, definitions,
  and algorithms are conservative, and there
  is room for refinement. This may include more complex
  analysis of SQL, more accurate properties of our
  execution model, and a suite of special cases.
• Restricted user operations: Our analysis assumes
  that the user-generated operations that initiate rule
  processing are arbitrary. However, in some cases it
  may be known that these will be of a particular type,
  i.e. users will only perform certain operations on certain
  tables. This may reduce possible execution paths
  during rule processing, and consequently may guarantee
  properties that otherwise do not hold. We plan
  to extend our methods so that termination, confluence,
  and observable determinism can be analyzed in
  the context of limited user-generated operations.
• Implementation and experimentation: We plan
  to implement our algorithms as part of an interactive
  development environment for the Starburst Rule
  System. Although we have verified by hand that our
  methods are indeed useful, implementation will allow
  practical experimentation with large and realistic
  rule applications.

Acknowledgements

Thanks to Stefano Ceri and Guy Lohman for helpful
comments on an initial draft.

References

[AS91] S. Abiteboul and E. Simon. Fundamental properties of deterministic and nondeterministic extensions of datalog. Theoretical Computer Science, 78:137-158, 1991.
[AWH92] A. Aiken, J. Widom, and J.M. Hellerstein. Behavior of database production rules: Termination, confluence, and observable determinism. IBM Research Report RJ 8562, IBM Almaden Research Center, San Jose, California, January 1992.
[BFKM85] L. Brownston, R. Farrell, E. Kant, and N. Martin. Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Addison-Wesley, Reading, Massachusetts, 1985.
[CW90] S. Ceri and J. Widom. Deriving production rules for constraint maintenance. In Proceedings of the Sixteenth International Conference on Very Large Data Bases, pages 566-577, Brisbane, Australia, August 1990.
[GJ91] N. Gehani and H.V. Jagadish. Ode as an active database: Constraints and triggers. In Proceedings of the Seventeenth International Conference on Very Large Data Bases, pages 327-336, Barcelona, Spain, September 1991.
[H+90] L.M. Haas et al. Starburst mid-flight: As the dust clears. IEEE Transactions on Knowledge and Data Engineering, 2(1):143-160, March 1990.
[Han89] E.N. Hanson. An initial report on the design of Ariel: A DBMS with an integrated production rule system. SIGMOD Record, Special Issue on Rule Management and Processing in Expert Database Systems, 18(3):12-19, September 1989.
[HH91] J.M. Hellerstein and M. Hsu. Determinism in partially ordered production systems. IBM Research Report RJ 8009, IBM Almaden Research Center, San Jose, California, March 1991.
[Hue80] G. Huet. Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the ACM, 27(4):797-821, October 1980.
[KU91] A.P. Karadimce and S.D. Urban. Diagnosing anomalous rule behavior in databases with integrity maintenance production rules. In Third Workshop on Foundations of Models and Languages for Data and Objects, Aigen, Austria, September 1991.
[MD89] D.R. McCarthy and U. Dayal. The architecture of an active database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 215-224, Portland, Oregon, May 1989.
[Ras90] L. Raschid. Maintaining consistency in a stratified production system. In Proceedings of the AAAI National Conference on Artificial Intelligence, 1990.
[SJGP90] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules, procedures, caching and views in data base systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 281-290, Atlantic City, New Jersey, May 1990.
[WCL91] J. Widom, R.J. Cochrane, and B.G. Lindsay. Implementing set-oriented production rules as an extension to Starburst. In Proceedings of the Seventeenth International Conference on Very Large Data Bases, pages 275-285, Barcelona, Spain, September 1991.
[WF90] J. Widom and S.J. Finkelstein. Set-oriented production rules in relational database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 259-270, Atlantic City, New Jersey, May 1990.
[ZH90] Y. Zhou and M. Hsu. A theory for rule triggering systems. In Advances in Database Technology - EDBT '90, Lecture Notes in Computer Science 416, pages 407-421. Springer-Verlag, Berlin, March 1990.