eville and Nikiforov 1993).

The Components of Data-Mining Algorithms
The next step is to construct specific algorithms to implement the general methods we outlined. One can identify three primary components in any data-mining algorithm: (1) model representation, (2) model evaluation, and (3) search. This reductionist view is not necessarily complete or fully encompassing; rather, it is a convenient way to express the key concepts of data-mining algorithms in a relatively unified and compact manner. Cheeseman (1990) outlines a similar structure.
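The three components can be made concrete with a minimal sketch. It assumes a hypothetical toy loan data set of (income, debt, label) records and a deliberately simple model family: "predict loan if one chosen variable exceeds a threshold". Representation, evaluation (here, classification accuracy), and search are kept as separate functions, so any one can be swapped without touching the others. The search is split into a parameter search (choosing the threshold) nested inside a model search (choosing which variable to threshold). All names and data are illustrative and are not taken from any particular system.

```python
from typing import Callable, List, Tuple

# Hypothetical toy records: (income, debt, label); label 1 = loan, 0 = no loan.
Record = Tuple[float, float, int]
DATA: List[Record] = [
    (20.0, 15.0, 0), (25.0, 30.0, 0), (30.0, 10.0, 0), (35.0, 25.0, 0),
    (55.0, 20.0, 1), (60.0, 35.0, 1), (70.0, 12.0, 1), (80.0, 28.0, 1),
]
VARIABLES = (0, 1)  # column indices: 0 = income, 1 = debt

# (1) Model representation: "predict 1 if record[var] > t, else 0".
def make_threshold_model(var: int, t: float) -> Callable[[Record], int]:
    return lambda r: 1 if r[var] > t else 0

# (2) Model evaluation: fraction of records whose label is predicted correctly.
def accuracy(model: Callable[[Record], int], data: List[Record]) -> float:
    return sum(model(r) == r[2] for r in data) / len(data)

# (3a) Parameter search: with the variable fixed, try thresholds midway
# between consecutive distinct values and keep the best-scoring one.
def parameter_search(var: int, data: List[Record]) -> Tuple[float, float]:
    values = sorted({r[var] for r in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t = max(candidates,
                 key=lambda t: accuracy(make_threshold_model(var, t), data))
    return best_t, accuracy(make_threshold_model(var, best_t), data)

# (3b) Model search: an outer loop over parameter search that changes the
# representation (which variable is thresholded) and keeps the best model.
def model_search(data: List[Record]) -> Tuple[int, float, float]:
    results = [(var, *parameter_search(var, data)) for var in VARIABLES]
    return max(results, key=lambda res: res[2])

if __name__ == "__main__":
    print(model_search(DATA))
```

On this toy data, `model_search(DATA)` returns `(0, 45.0, 1.0)`: a threshold of 45 on income classifies all eight records correctly, while no single threshold on debt does.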
Model representation is the language used to describe discoverable patterns. If the representation is too limited, then no amount of training time or examples can produce an accurate model for the data. It is important that a data analyst fully comprehend the representational assumptions that might be inherent in a particular method. It is equally important that an algorithm designer clearly state which representational assumptions are being made by a particular algorithm. Note that increased representational power for models increases the danger of overfitting the training data, resulting in reduced prediction accuracy on unseen data.
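The overfitting risk can be seen in a small experiment. The sketch below is an illustration, not a method from the article. It assumes hypothetical one-dimensional data whose true rule is a single threshold at 0.5, with 20% of labels flipped as noise. It then compares a restricted representation (one threshold) with a very expressive one (memorize the training set and answer with the nearest training point's label, i.e., 1-nearest-neighbour). The memorizer fits the training data perfectly, but because it also memorizes the noise, it should score lower than the threshold rule on freshly generated test data.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]
NOISE = 0.2  # assumed probability that a label is flipped

def make_data(n: int, rng: random.Random) -> List[Example]:
    """Hypothetical data: true label is 1 iff x > 0.5, then flipped w.p. NOISE."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < NOISE:
            y = 1 - y
        data.append((x, y))
    return data

def accuracy(model: Callable[[float], int], data: List[Example]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)

def fit_threshold(train: List[Example]) -> Callable[[float], int]:
    """Restricted representation: one threshold t, chosen to maximize training accuracy."""
    def rule(t: float) -> Callable[[float], int]:
        return lambda x: 1 if x > t else 0
    best_t = max((x for x, _ in train), key=lambda t: accuracy(rule(t), train))
    return rule(best_t)

def fit_memorizer(train: List[Example]) -> Callable[[float], int]:
    """Expressive representation: return the label of the nearest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(0)
train, test = make_data(200, rng), make_data(2000, rng)
results = {}
for name, fit in [("threshold", fit_threshold), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (accuracy(model, train), accuracy(model, test))
    print(f"{name}: train={results[name][0]:.2f} test={results[name][1]:.2f}")
```

With 20% label noise, no model can expect much more than 0.8 test accuracy. A nearest-neighbour rule that copies noisy labels is expected to land roughly at 0.8 × 0.8 + 0.2 × 0.2 = 0.68, because it is right only when the neighbour's label and the test label are both clean or both flipped.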
FALL 1996 45 Articles

[Figure 6 (accompanies Decision Trees and Rules). Using a Single Threshold on the Income Variable to Try to Classify the Loan Data Set. Axes: Income (horizontal) and Debt (vertical); points are marked x and o, and a threshold t on Income divides the data into a "No Loan" region and a "Loan" region.]

Model-evaluation criteria are quantitative statements (or fit functions) of how well a particular pattern (a model and its parameters) meets the goals of the KDD process. For example, predictive models are often judged by the empirical prediction accuracy on some test set. Descriptive models can be evaluated along the dimensions of predictive accuracy, novelty, utility, and understandability of the fitted model.
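Judging a predictive model by its empirical accuracy on a test set requires keeping the test records out of the fitting step. A minimal holdout sketch follows. The function name `holdout_accuracy`, the 30% test fraction, and the fixed shuffle seed are illustrative assumptions, and `fit` can be any procedure that maps training examples to a predictor.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]               # hypothetical (feature, label) pair
Predictor = Callable[[float], int]

def holdout_accuracy(data: Sequence[Example],
                     fit: Callable[[List[Example]], Predictor],
                     test_fraction: float = 0.3,
                     seed: int = 0) -> float:
    """Fit on a random training split; return accuracy on the held-out split."""
    rows = list(data)
    random.Random(seed).shuffle(rows)     # fixed seed -> reproducible split
    n_test = max(1, int(len(rows) * test_fraction))
    test, train = rows[:n_test], rows[n_test:]
    model = fit(train)                    # the model never sees the test rows
    return sum(model(x) == y for x, y in test) / len(test)

def fit_majority(train: List[Example]) -> Predictor:
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = 1 if 2 * ones >= len(train) else 0
    return lambda x: label

if __name__ == "__main__":
    data = [(float(i), 1 if i >= 50 else 0) for i in range(100)]
    print(holdout_accuracy(data, fit_majority))
```

A majority-class baseline such as `fit_majority` is a useful reference point. A model whose holdout accuracy does not beat it has learned little from the data.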
The search method consists of two components: (1) parameter search and (2) model search. Once the model representation (or family of representations) and the model-evaluation criteria are fixed, the data-mining problem has been reduced to a pure optimization task: Find the parameters and models from the selected family that optimize the evaluation criteria. In parameter search, the algorithm must search for the parameters that optimize the model-evaluation criteria given observed data and a fixed model representation. Model search occurs as a loop over the parameter-search method: The model representation is changed so that a family of models is considered.

Some Data-Mining Methods
A wide variety of data-mining methods exist, but here, we only focus on a subset of popular techniques. Each method is discussed in the context of model representation, model evaluation, and search.

46 AI MAGAZINE

Decision Trees and Rules

Decision trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend. However, the restriction to a particular tree or rule representation can significantly restrict the functional form (and, thus, the approximat...