such as deciding
which models and parameters may be appropriate and matching a particular data
mining method with the overall criteria of the KDD process.
5. Data Interpretation/Evaluation. This step involves interpreting the discovered
patterns, as well as possible visualization of the extracted patterns. In this step,
redundant or irrelevant patterns are removed, and powerful visualization tools are
used to translate the useful patterns into the form most understandable by the
users. The user then incorporates this knowledge into the performance system,
takes actions based on it, or simply documents it and reports it
to interested parties.
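The removal of redundant or irrelevant patterns described in this step can be sketched as follows. This is a minimal illustration, not a method prescribed by the text: it assumes the patterns are association rules carrying support and confidence scores, and the `Pattern` class, the `filter_patterns` function, the threshold values, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A hypothetical association rule: antecedent -> consequent."""
    antecedent: frozenset
    consequent: frozenset
    support: float      # fraction of records containing both sides
    confidence: float   # fraction of antecedent records that also match the consequent


def filter_patterns(patterns, min_support=0.1, min_confidence=0.6):
    """Drop irrelevant (weak) patterns and redundant (repeated) rules."""
    best = {}
    for p in patterns:
        # Irrelevant: the rule is too rare or too unreliable (assumed thresholds).
        if p.support < min_support or p.confidence < min_confidence:
            continue
        # Redundant: the same rule was already seen; keep the more confident copy.
        key = (p.antecedent, p.consequent)
        if key not in best or p.confidence > best[key].confidence:
            best[key] = p
    # Strongest rules first, so the most useful patterns are presented first.
    return sorted(best.values(), key=lambda p: p.confidence, reverse=True)


# Made-up sample rules, for illustration only.
patterns = [
    Pattern(frozenset({"bread"}), frozenset({"butter"}), 0.30, 0.80),
    Pattern(frozenset({"bread"}), frozenset({"butter"}), 0.30, 0.75),  # repeated rule
    Pattern(frozenset({"milk"}), frozenset({"eggs"}), 0.02, 0.90),     # too rare
    Pattern(frozenset({"tea"}), frozenset({"sugar"}), 0.20, 0.65),
]

kept = filter_patterns(patterns)
for p in kept:
    print(sorted(p.antecedent), "->", sorted(p.consequent), p.support, p.confidence)
```

In practice the thresholds would be chosen by the analyst, and the surviving patterns would then be passed to whatever visualization tool the users prefer.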
It may be noted from Figure 16.14 that a realistic KDD process is not simple and
linear, but thoroughly iterative and interactive. That is, the results of analysis are
fed back into the modeling and hypothesis derivation process to produce improved
results on subsequent iterations.
People often use the terms data mining and KDD interchangeably because data
mining is key to the KDD process. However, as shown in the figure, the data
mining activity (often considered the core activity of the KDD process)
accounts for only a small part (estimated at 15% to 25%) of the effort of
the overall KDD process. The activities from data selection to data
transformation in the entire KDD process are popularly known as data
warehousing. As data warehousing and data mining have gained wide popularity,
they are separately covered below.
DATA WAREHOUSING
Data warehousing is the process of creating a repository of integrated data for the
purpose of decision support and analysis. A data warehouse is the resulting
repository of integrated data. It is a subject-oriented, integrated, time-variant, and
non-volatile collection of data in support of management's decision-making
process. The meaning of the key terms used in this definition is as follows.
1. Subject-oriented. This means that all data in a dat...
This document was uploaded on 04/07/2014.
- Spring '14