From Data Mining to Knowledge Discovery in Databases

…(and, thus, the approximation power) of the model. For example, figure 6 illustrates the effect of a threshold split applied to the income variable for a loan data set: It is clear that using such simple threshold splits (parallel to the feature axes) severely limits the type of classification boundaries that can be induced. If one enlarges the model space to allow more general expressions (such as multivariate hyperplanes at arbitrary angles), then the model is more powerful for prediction but can be much more difficult to comprehend (a sketch below contrasts the two).

A large number of decision-tree and rule-induction algorithms are described in the machine-learning and applied-statistics literature (Quinlan 1992; Breiman et al. 1984). To a large extent, they depend on likelihood-based model-evaluation methods, with varying degrees of sophistication in terms of penalizing model complexity. Greedy search methods, which involve growing and pruning rule and tree structures, are typically used to explore the superexponential space of possible models (a minimal sketch of the greedy grow step also appears below). Trees and rules are primarily used for predictive modeling, both for classification (Apte and Hong 1996; Fayyad, Djorgovski, and Weir 1996) and regression, although they can also be applied to summary descriptive modeling (Agrawal et al. 1996).

Nonlinear Regression and Classification Methods

These methods consist of a family of techniques for prediction that fit linear and nonlinear combinations of basis functions (sigmoids, splines, polynomials) to combinations of the input variables. Examples include feedforward neural networks, adaptive spline methods, and projection pursuit regression (see Elder and Pregibon [1996], Cheng and Titterington [1994], and Friedman [1989] for more detailed discussions).

Consider neural networks, for example. Figure 7 illustrates the type of nonlinear decision boundary that a neural network might find for the loan data set. In terms of model evaluation, although networks of the appropriate size can universally approximate any smooth function to any desired degree of accuracy, relatively little is known about the representation properties of fixed-size networks estimated from finite data sets. Also, the standard squared-error and cross-entropy loss functions used to train neural networks can be viewed as log-likelihood functions for regression and classification, respectively (Ripley 1994; Geman, Bienenstock, and Doursat 1992); this correspondence is made explicit below. Backpropagation is a parameter-search method that performs gradient descent in parameter (weight) space, starting from random initial conditions, to find a local minimum of the loss (equivalently, a local maximum of the likelihood function).

Nonlinear regression methods, although representationally powerful, can be difficult to interpret. For example, although the classification boundaries of figure 7 might be more accurate than the simple threshold boundary of figure 6, the threshold boundary has the advantage that the model can be expressed, to some degree of certainty, as a simple rule of the form "if income is greater than threshold, then loan will have good status."

Example-Based Methods

The representation is simp...
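To make the representational tradeoff above concrete, here is a minimal sketch contrasting the two model spaces; the feature names, weights, and thresholds are hypothetical illustrations, not taken from the article's loan data:

    def threshold_rule(income, debt):
        # Axis-parallel split, the kind a univariate decision tree
        # induces; it reads directly as a rule about one variable.
        return "good" if income > 40_000 else "bad"

    def hyperplane_rule(income, debt):
        # Multivariate hyperplane at an arbitrary angle: a linear
        # combination of inputs. More expressive, but no longer a
        # one-line statement about a single, meaningful variable.
        return "good" if 0.8 * income - 1.3 * debt > 0 else "bad"

    print(threshold_rule(50_000, 30_000))   # good
    print(hyperplane_rule(30_000, 25_000))  # bad: 0.8*30000 - 1.3*25000 < 0

The first rule is what the article calls comprehensible; the second buys classification flexibility at the cost of exactly that readability.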
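Finally, backpropagation itself can be sketched as gradient descent on the cross-entropy (the negative Bernoulli log-likelihood). The one-hidden-layer architecture, synthetic data, and learning rate below are all illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))              # two hypothetical input features
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic binary labels

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1 = rng.normal(scale=0.5, size=(2, 8))    # random initial conditions,
    W2 = rng.normal(scale=0.5, size=(8,))      # as the article describes

    lr = 0.5
    for _ in range(500):
        h = sigmoid(X @ W1)                    # forward pass: hidden layer
        p = sigmoid(h @ W2)                    # forward pass: P(y = 1 | x)
        d_out = p - y                          # backward pass: output error
        grad_W2 = h.T @ d_out / len(y)
        d_hid = np.outer(d_out, W2) * h * (1 - h)
        grad_W1 = X.T @ d_hid / len(y)
        W2 -= lr * grad_W2                     # gradient-descent updates
        W1 -= lr * grad_W1

    p = sigmoid(sigmoid(X @ W1) @ W2)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    print(f"final cross-entropy: {loss:.3f}")

Because the descent starts from random initial weights and the loss surface is nonconvex, each run finds only a local optimum of the likelihood, as the article notes.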