Due to this feature the data in a data warehouse


such as deciding which models and parameters may be appropriate and matching a particular data mining method with the overall criteria of the KDD process.

5. Data Interpretation/Evaluation. This step involves interpreting the discovered patterns and possibly visualizing the extracted patterns. In this step, redundant or irrelevant patterns are removed, and powerful visualization tools are used to translate the useful patterns into the form most understandable to the users. The user then incorporates this knowledge into the performance system, takes actions based on the knowledge, or simply documents it and reports it to interested parties.

It may be noted from Figure 16.14 that a realistic KDD process is not simple and linear, but thoroughly iterative and interactive. That is, the results of analysis are fed back into the modeling and hypothesis derivation process to produce improved results on subsequent iterations.

It may also be noted here that people often use the terms data mining and KDD interchangeably because data mining is key to the KDD process. However, as shown in the figure, the data mining activity (which is often considered the core activity of the KDD process) accounts for only a small part (estimated at 15% to 25%) of the effort of the overall KDD process. The activities from data selection to data transformation in the entire KDD process are popularly known as data warehousing. As data warehousing and data mining have gained wide popularity, they are covered separately below.

DATA WAREHOUSING

Data warehousing is the process of creating a repository of integrated data for the purpose of decision support and analysis. A data warehouse is the resulting repository of integrated data. It is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process. The meaning of the key terms used in this definition is as follows.

1. Subject-oriented. This means that all data in a dat...
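
The four properties named in the definition above can be illustrated with a minimal Python sketch. It is not taken from the source text: the Fact and Warehouse names, the field names, and the sample values are all hypothetical, and the sketch assumes the usual reading of the terms (rows grouped by subject area, one shared key format across sources, a snapshot date on every row, and loaded rows that are never updated or deleted).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: a loaded row is never modified (non-volatile)
class Fact:
    subject: str   # subject area such as "customer" (subject-oriented)
    key: str       # one key format shared by all source systems (integrated)
    value: float
    as_of: date    # snapshot date stored with every row (time-variant)


class Warehouse:
    """Hypothetical append-only store, for illustration only."""

    def __init__(self) -> None:
        self._facts: Dict[str, List[Fact]] = {}

    def load(self, fact: Fact) -> None:
        # New snapshots are appended; existing rows are never updated or deleted.
        self._facts.setdefault(fact.subject, []).append(fact)

    def as_of(self, subject: str, key: str, when: date) -> Optional[Fact]:
        # Return the latest snapshot for the key dated on or before `when`.
        rows = [f for f in self._facts.get(subject, [])
                if f.key == key and f.as_of <= when]
        return max(rows, key=lambda f: f.as_of, default=None)


# Usage with made-up sample data: two monthly snapshots of the same customer.
wh = Warehouse()
wh.load(Fact("customer", "C-1", 100.0, date(2013, 1, 31)))
wh.load(Fact("customer", "C-1", 140.0, date(2013, 2, 28)))
mid_february = wh.as_of("customer", "C-1", date(2013, 2, 15))
```

Because the older snapshot is kept rather than overwritten, a query for mid-February still sees the January value, which is the kind of historical analysis that a time-variant, non-volatile store is meant to support.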

This document was uploaded on 04/07/2014.
