42 AI MAGAZINE
on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables
under consideration can be reduced, or invariant representations for the data can be found.
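As one illustration of reducing the effective number of variables, the sketch below projects two correlated variables onto a single derived variable. It uses principal component analysis computed by power iteration. The choice of PCA, the function names, and the sample rows are assumptions made for this example; the article does not prescribe a particular method.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, direction) for the leading principal component of rows."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: apply the covariance matrix and renormalize.
    # Assumption: the all-ones start is not orthogonal to the leading direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all variables are constant; nothing to project onto
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Map each row to its coordinate along the given direction."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Hypothetical data: the second variable is roughly twice the first.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
mean, direction = first_principal_component(rows)
scores = project(rows, mean, direction)  # one derived variable per row
```

Here two original variables are replaced by a single score per row, which keeps most of the variation because the variables are strongly correlated.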
Fifth is matching the goals of the KDD process (step 1) to a particular data-mining
method. Examples of such methods, including summarization, classification, regression, and clustering,
are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).
Sixth is exploratory analysis and model
and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s)
to be used for searching for data patterns.
This process includes deciding which models
and parameters might be appropriate (for example, models of categorical data differ from models of vectors over the reals) and
matching a particular data-mining method
with the overall criteria of the KDD process
(for example, the end user might be more interested in understanding the model than its
Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations,
including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly
performing the preceding steps.
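To make one of these representational forms concrete, the following is a minimal sketch of clustering using Lloyd's k-means procedure. The algorithm choice, the function names, the sample points, and the explicit initial centers are all assumptions for illustration; the text names clustering only as a category of method.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centers, max_iterations=50):
    """Return (centers, labels) after alternating assignment and update steps."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, labels


# Hypothetical data with two visible groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
```

Passing the initial centers explicitly keeps the run deterministic. In practice the result depends on that choice, which is one reason the earlier selection and preprocessing steps matter.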
Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for
further iteration. This step can also involve
visualization of the extracted patterns and
models or visualization of the data given the
Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for
further action, or simply documenting it and
reporting it to interested parties. This process
also includes checking for and resolving potential conflicts with previously believed (or
The KDD process can involve significant
iteration and can contain loops between
any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1.
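The idea of looping between steps can be sketched in code. In the toy example below, the interpretation step evaluates a mined summary and, if the error is too high, sends control back to the model-selection step with a more detailed model. Everything here is hypothetical: the stage functions, the binned-mean "model", the tolerance, and the sample data are assumptions chosen to keep the loop visible, not a rendering of the article's figure.

```python
def summarize(values, bins):
    """Data mining (step 7): describe values in [0, 1] by per-bin means."""
    groups = [[] for _ in range(bins)]
    for v in values:
        index = min(int(v * bins), bins - 1)  # v == 1.0 goes in the last bin
        groups[index].append(v)
    return [sum(g) / len(g) if g else None for g in groups]


def mean_error(values, bin_means):
    """Interpretation (step 8): average distance of a value to its bin mean."""
    bins = len(bin_means)
    total = 0.0
    for v in values:
        index = min(int(v * bins), bins - 1)
        total += abs(v - bin_means[index])
    return total / len(values)


def run_process(raw, tolerance=0.05, max_rounds=10):
    """Run the toy process, revisiting model selection until the error is small."""
    cleaned = [x for x in raw if x is not None]           # cleaning: drop missing
    low, high = min(cleaned), max(cleaned)
    # Transformation: rescale to [0, 1]. Assumes the values are not all equal.
    scaled = [(x - low) / (high - low) for x in cleaned]
    history = []
    model = None
    for bins in range(1, max_rounds + 1):                 # step 6: pick complexity
        model = summarize(scaled, bins)
        error = mean_error(scaled, model)
        history.append((bins, error))
        if error <= tolerance:
            break                                         # accept the model
        # Otherwise loop back to model selection with one more bin.
    return model, history


raw = [2.0, None, 2.2, 1.9, 8.0, 8.3, None, 7.9]
model, history = run_process(raw)
```

The `history` list records each pass through the loop, which is a simple way to document how many iterations were needed before the result was accepted.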
Most previous work on KDD has focused on
step 7, the data mining. However, the other
steps are as important (and probably more
so) for the successful application of KDD in
practice. Having defined the basic notions
and introduced the KDD process, we now
focus on the data-mining component,
which has, by far, received the most attention in the literature.

The Data-Mining Step of the KDD Process
The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This
section presents an overview of the primary
goals of data mining, a description of the
methods used to address these goals, and a
brief description of the data-mining algorithms that incorporate these methods.
The knowledge discovery goals are defined
by the intended use of the system. We can
distinguish two types of goals: (1) verification
and (2) discovery. With verification, the system is limited to verifying the user’s hypothesis. With discovery, the system autonomously