From Data Mining to Knowledge Discovery in Databases

...eville and Nikiforov 1993).

The Components of Data-Mining Algorithms

The next step is to construct specific algorithms to implement the general methods we outlined. One can identify three primary components in any data-mining algorithm: (1) model representation, (2) model evaluation, and (3) search. This reductionist view is not necessarily complete or fully encompassing; rather, it is a convenient way to express the key concepts of data-mining algorithms in a relatively unified and compact manner. Cheeseman (1990) outlines a similar structure.

Model representation is the language used to describe discoverable patterns. If the representation is too limited, then no amount of training time or examples can produce an accurate model for the data. It is important that a data analyst fully comprehend the representational assumptions that might be inherent in a particular method. It is equally important that an algorithm designer clearly state which representational assumptions are being made by a particular algorithm. Note that increased representational power for models increases the danger of overfitting the training data, resulting in reduced prediction accuracy on unseen data.

[Fall 1996, p. 45]

[Figure 6. Using a Single Threshold on the Income Variable to Try to Classify the Loan Data Set. Axes: Debt and Income; threshold t on Income; regions labeled No Loan and Loan; data points marked x and o.]

Model-evaluation criteria are quantitative statements (or fit functions) of how well a particular pattern (a model and its parameters) meets the goals of the KDD process. For example, predictive models are often judged by the empirical prediction accuracy on some test set. Descriptive models can be evaluated along the dimensions of predictive accuracy, novelty, utility, and understandability of the fitted model.

Search method consists of two components: (1) parameter search and (2) model search.
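As a rough illustration of the first two components, the following Python sketch pairs a single-threshold representation, in the spirit of figure 6, with test-set accuracy as the evaluation criterion. The class IncomeThreshold, the function accuracy, and all data values are hypothetical and were invented for this illustration; they are not taken from the article's loan data set.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Each example is (income, debt, label); label True stands for "Loan".
Example = Tuple[float, float, bool]


@dataclass(frozen=True)
class IncomeThreshold:
    """Model representation: a single univariate split on income."""
    t: float

    def predict(self, income: float, debt: float) -> bool:
        # Debt is ignored: this representation cannot express any rule that uses it.
        return income > self.t


def accuracy(model: IncomeThreshold, data: List[Example]) -> float:
    """Model evaluation: fraction of examples classified correctly."""
    correct = sum(model.predict(inc, debt) == label for inc, debt, label in data)
    return correct / len(data)


# Hypothetical held-out test set (made-up numbers).
test_set: List[Example] = [
    (20.0, 5.0, False),
    (35.0, 30.0, False),
    (50.0, 10.0, True),
    (60.0, 55.0, False),  # high income but high debt: a threshold on income alone misses this
    (80.0, 20.0, True),
]

print(accuracy(IncomeThreshold(t=40.0), test_set))
```

The misclassified high-debt example shows the point made about limited representations: no choice of t can fix an error that depends on a variable the model cannot see.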
Once the model representation (or family of representations) and the model-evaluation criteria are fixed, then the data-mining problem has been reduced to purely an optimization task: Find the parameters and models from the selected family that optimize the evaluation criteria. In parameter search, the algorithm must search for the parameters that optimize the model-evaluation criteria given observed data and a fixed model representation. Model search occurs as a loop over the parameter-search method: The model representation is changed so that a family of models is considered.

Some Data-Mining Methods

A wide variety of data-mining methods exist, but here, we only focus on a subset of popular techniques. Each method is discussed in the context of model representation, model evaluation, and search.

[AI Magazine, p. 46]

Decision Trees and Rules

Decision trees and rules that use univariate splits have a simple representational form, making the inferred model relatively easy for the user to comprehend. However, the restriction to a particular tree or rule representation can significantly restrict the functional form (and, thus, the approximat...
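The two levels of search described earlier (parameter search inside a loop of model search) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the family of representations is taken to be "one univariate threshold split per candidate variable", the evaluation criterion is accuracy on the observed data, and the function names (accuracy, parameter_search, model_search) and all data values are hypothetical.

```python
from typing import Dict, List, Tuple

# Each example: (feature values keyed by name, label). Made-up data for illustration.
Example = Tuple[Dict[str, float], bool]

train: List[Example] = [
    ({"income": 20.0, "debt": 5.0}, False),
    ({"income": 35.0, "debt": 30.0}, False),
    ({"income": 50.0, "debt": 10.0}, True),
    ({"income": 60.0, "debt": 55.0}, False),
    ({"income": 80.0, "debt": 20.0}, True),
]


def accuracy(feature: str, t: float, data: List[Example]) -> float:
    """Evaluation criterion: fraction of examples where (x[feature] > t) equals the label."""
    return sum((x[feature] > t) == y for x, y in data) / len(data)


def parameter_search(feature: str, data: List[Example]) -> Tuple[float, float]:
    """Fixed representation (a split on `feature`): find the best threshold t."""
    values = sorted({x[feature] for x, _ in data})
    # Candidates: one threshold below the minimum, then midpoints between adjacent values.
    candidates = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t = max(candidates, key=lambda t: accuracy(feature, t, data))
    return best_t, accuracy(feature, best_t, data)


def model_search(features: List[str], data: List[Example]) -> Tuple[str, float, float]:
    """Outer loop over representations; parameter search runs once for each."""
    results = [(f, *parameter_search(f, data)) for f in features]
    # Keep the (feature, threshold, score) triple with the highest score.
    return max(results, key=lambda r: r[2])


print(model_search(["income", "debt"], train))
```

Scoring on the same data used for the search is a simplification; as the family of models grows richer, a held-out test set is the safer criterion, for the overfitting reasons the article gives.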