3 data transformation after the selection and



Data collected in manufacturing databases, if properly analyzed, can provide useful information about performance and optimization opportunities, as well as the keys to improving processes and troubleshooting problems. The traditional method of turning data into knowledge relied on manual analysis and interpretation. However, manual analysis of data is becoming impractical in many domains as data volumes grow exponentially. Furthermore, manual analysis of a dataset is slow, expensive, and highly subjective.

When the scale of data analysis grew beyond human capabilities, people started looking to computer technology to automate the process. This initially resulted in the use of statistical techniques on computers for automatic data analysis. Although statistical techniques combined with file management tools once sufficed for analyzing data in databases, the large size of modern databases, the mission-critical nature of the data, and the speed with which analyses need to be made motivated researchers to look into new approaches to data analysis. This ultimately resulted in a new generation of techniques and tools for automatic analysis of huge volumes of data. These new techniques and tools, meant for extracting knowledge from large datasets, are the subject of the field called knowledge discovery in databases (KDD).

The KDD Process

KDD is an umbrella term describing a variety of activities for making sense of data. It is used to describe the overall process of finding useful patterns in data, including not only the data mining step of running specific knowledge discovery algorithms but also pre- and post-processing of data and a host of other important activities. The KDD process, shown in Figure 16.14, typically consists of the following steps:

1. Data Selection. This step involves selecting a dataset, or focusing on a subset of variables or data samples, on which knowledge discovery is to be performed. This step is more significant than it may appear to be, especially when the data is to be pulled from multiple data sources. For data selecti...
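
The data selection step described above can be illustrated with a minimal Python sketch. It assumes two hypothetical manufacturing data sources that share a `batch_id` key; every field name, value, and the `select_data` helper are invented for illustration and do not come from the text. The sketch combines the sources, keeps a subset of variables, and keeps only the samples that satisfy a condition.

```python
# Hypothetical records from two manufacturing data sources.
# All field names and values are illustrative assumptions.
sensor_rows = [
    {"batch_id": 1, "temperature_c": 71.5, "pressure_kpa": 101.2, "operator": "A"},
    {"batch_id": 2, "temperature_c": 88.0, "pressure_kpa": 99.8, "operator": "B"},
    {"batch_id": 3, "temperature_c": 90.3, "pressure_kpa": 100.4, "operator": "A"},
]
quality_rows = [
    {"batch_id": 1, "defect_rate": 0.02},
    {"batch_id": 2, "defect_rate": 0.07},
    {"batch_id": 3, "defect_rate": 0.09},
]


def select_data(sensors, quality, variables, keep_sample):
    """Join two sources on batch_id, then keep the chosen variables and samples."""
    quality_by_batch = {row["batch_id"]: row for row in quality}
    selected = []
    for row in sensors:
        match = quality_by_batch.get(row["batch_id"])
        if match is None:
            continue  # skip batches that are missing from the second source
        merged = {**row, **match}
        if keep_sample(merged):
            # Keep only the variables of interest for knowledge discovery.
            selected.append({name: merged[name] for name in variables})
    return selected


subset = select_data(
    sensor_rows,
    quality_rows,
    variables=["batch_id", "temperature_c", "defect_rate"],
    keep_sample=lambda r: r["temperature_c"] > 80.0,
)
print(subset)
```

Joining on a shared key before filtering is one simple way to handle the multiple-source case; real manufacturing databases would more likely need a query language or a data-frame library, and may have no clean shared key at all.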

This document was uploaded on 04/07/2014.
