Interestingness Measures
• Simplicity for human comprehension
• Rule length: may be calculated as the number of conjuncts in the rule.
(Note: each conjunct adds a further restriction to the rule.)
Certainty:
Each discovered pattern should have a measure of certainty associated with it that assesses the
validity or “trustworthiness” of the pattern.
A certainty measure for association rules of the form “A => B”, where A and B are sets of items,
is confidence. Given a set of task-relevant data tuples (or transactions in a transaction database),
the confidence of “A => B” is defined as
\[
\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\#\ \text{tuples containing both } A \text{ and } B}{\#\ \text{tuples containing } A}
\]
A confidence value of 100%, or 1, indicates that the rule is always correct on the data analyzed. Such rules
are called exact.
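As a minimal sketch of the confidence definition, the snippet below counts tuples directly. The toy transactions and the helper name `confidence` are assumptions made for illustration and are not part of the original notes.

```python
# Hypothetical toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def confidence(antecedent, consequent, transactions):
    """Return #(tuples containing both A and B) / #(tuples containing A)."""
    a = set(antecedent)
    b = set(consequent)
    # Tuples that contain every item of A.
    containing_a = [t for t in transactions if a <= t]
    if not containing_a:
        raise ValueError("no transaction contains the antecedent")
    # Among those, tuples that also contain every item of B.
    containing_both = [t for t in containing_a if b <= t]
    return len(containing_both) / len(containing_a)


# 4 transactions contain bread; 3 of them also contain milk.
print(confidence({"bread"}, {"milk"}, transactions))
# The only transaction with eggs also contains bread, so this rule is exact.
print(confidence({"eggs"}, {"bread"}, transactions))
```

On this toy data the rule “eggs => bread” has confidence 1, so it would be called exact, although it rests on a single transaction.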
For classification rules the equation can easily be adapted to act as a measure for certainty referred to as
reliability or accuracy. Classification rules propose a model for distinguishing objects, or tuples, of a
target class from objects of contrasting classes. A low reliability value indicates that the rule in question
incorrectly classifies a large number of contrasting class objects as target class objects. Rule reliability is
also known as rule strength, rule quality, certainty factor, and discriminating weight.
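The notes do not spell out the adapted equation. One plausible reading, assumed here, is that reliability is the fraction of tuples covered by the rule's condition that actually belong to the target class. The labelled data, the rule `young_student`, and the helper `reliability` in this sketch are all hypothetical.

```python
# Hypothetical labelled tuples: (attributes, class label).
data = [
    ({"age": 25, "student": True}, "buys"),
    ({"age": 22, "student": True}, "buys"),
    ({"age": 28, "student": True}, "no"),
    ({"age": 45, "student": False}, "no"),
    ({"age": 35, "student": True}, "buys"),
]


def reliability(condition, target_class, data):
    """Assumed definition: #(covered tuples in target class) / #(covered tuples)."""
    # Class labels of the tuples that satisfy the rule's condition.
    covered = [label for attrs, label in data if condition(attrs)]
    if not covered:
        raise ValueError("the rule covers no tuples")
    return covered.count(target_class) / len(covered)


def young_student(attrs):
    """Hypothetical rule condition with two conjuncts (rule length 2)."""
    return attrs["age"] < 30 and attrs["student"]


# The rule covers three tuples; two belong to the target class "buys".
print(reliability(young_student, "buys", data))
```

Under this reading, the covered tuple labelled "no" is a contrasting-class object that the rule wrongly assigns to the target class, which is what lowers the value below 1.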
Utility:
The potential usefulness of a pattern is a factor defining its interestingness. It can be estimated by
a utility function, such as support. The support of an association pattern refers to the percentage of
task-relevant data tuples or transactions for which the pattern is true. For association rules of the
form “A => B”, where A and B are sets of items, it is defined as
\[
\text{support}(A \Rightarrow B) = P(A \cap B) = \frac{\#\ \text{tuples containing both } A \text{ and } B}{\text{total } \#\ \text{of tuples}}
\]
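The support definition can be sketched the same way, by counting the tuples that contain every item of both A and B and dividing by the total number of tuples. The toy transactions and the helper name `support` are assumptions made for illustration.

```python
# Hypothetical toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(antecedent, consequent, transactions):
    """Return #(tuples containing both A and B) / (total # of tuples)."""
    if not transactions:
        raise ValueError("empty transaction database")
    # A tuple contains both A and B when it contains every item of either set.
    items = set(antecedent) | set(consequent)
    containing_both = sum(1 for t in transactions if items <= t)
    return containing_both / len(transactions)


# 3 of the 5 transactions contain both bread and milk.
print(support({"bread"}, {"milk"}, transactions))
```

Unlike confidence, the denominator is the whole database, so swapping A and B leaves the support unchanged.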
 Fall '09
 Merz
 Statistics, Association rules, Chisquare distribution

