Data collected in manufacturing databases, if properly analyzed, can provide
useful information about performance and optimization opportunities, as well as
the keys to improving processes and troubleshooting problems.
The traditional method of turning data into knowledge relied on manual analysis
and interpretation. However, manual analysis of data is becoming impractical in
many domains as data volumes grow exponentially. Furthermore, manual analysis
of a dataset is slow, expensive, and highly subjective. When the scale of data
analysis grew beyond human capabilities, people turned to computer technology
to automate the process. This initially led to the use of computerized statistical
techniques for automatic data analysis. Although statistical techniques, together
with file management tools, once sufficed for analyzing data in databases, the
large size of modern databases, the mission-critical nature of the data, and the
speed with which analyses must be made motivated researchers to look for new
approaches to data analysis. This ultimately resulted in a new generation of
techniques and tools for the automatic analysis of huge volumes of data.
These new techniques and tools for extracting knowledge from large datasets
are the subject of the field called knowledge discovery in databases (KDD).
The KDD Process
KDD is an umbrella term describing a variety of activities for making sense of
data. It refers to the overall process of finding useful patterns in data, which
includes not only the data mining step, in which specific knowledge discovery
algorithms are run, but also the pre- and post-processing of data and a host of
other important activities. The KDD process, shown in Figure 16.14, typically
consists of the following steps:
Data Selection. This step involves selecting a dataset, or focusing on a subset
of variables or data samples, on which knowledge discovery is to be performed.
This step is more significant than it may appear, especially when the data must
be pulled from multiple data sources. For data selecti...
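As a rough illustration of the data selection step, the Python sketch below merges records pulled from two sources on a shared key, then keeps only a chosen subset of variables for the samples of interest. The helper `select_data`, the field names, and the manufacturing records are all hypothetical, and the sketch assumes each record is a dict that carries the key field; it is not a procedure taken from this text.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def select_data(
    sources: Iterable[Iterable[Record]],
    key: str,
    variables: List[str],
    predicate: Callable[[Record], bool] = lambda row: True,
) -> List[Record]:
    """Hypothetical helper: merge records from several sources on `key`,
    then keep only `variables` for the samples that satisfy `predicate`."""
    # Pull records from every source and merge the fields that share a key value.
    merged: Dict[Any, Record] = {}
    for source in sources:
        for record in source:
            merged.setdefault(record[key], {}).update(record)

    selected: List[Record] = []
    for row in merged.values():
        # Skip samples that lack any requested variable (e.g. not yet inspected).
        if not all(v in row for v in variables):
            continue
        # Focus on the samples of interest.
        if predicate(row):
            # Project each kept sample onto the chosen subset of variables.
            selected.append({v: row[v] for v in variables})
    return selected


# Hypothetical records from two sources, keyed by batch ID.
sensor_log = [
    {"batch_id": 1, "temperature": 210.5, "pressure": 3.1},
    {"batch_id": 2, "temperature": 198.0, "pressure": 2.9},
    {"batch_id": 3, "temperature": 215.2, "pressure": 3.4},
]
inspections = [
    {"batch_id": 1, "defects": 0},
    {"batch_id": 2, "defects": 4},  # batch 3 has no inspection record yet
]

subset = select_data(
    [sensor_log, inspections],
    key="batch_id",
    variables=["batch_id", "temperature", "defects"],
    predicate=lambda row: row["temperature"] > 200,
)
print(subset)  # [{'batch_id': 1, 'temperature': 210.5, 'defects': 0}]
```

Samples missing any requested variable are dropped, so the merge behaves like an inner join; a real selection step might instead keep such samples and flag or impute the gaps during preprocessing.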
This document was uploaded on 04/07/2014.
- Spring '14