ion power) of the
model. For example, figure 6 illustrates the effect of a threshold split applied to the income variable for a loan data set: It is clear that using such simple threshold splits (parallel to the feature axes) severely limits the type of classification boundaries that can be induced.
If one enlarges the model space to allow more
general expressions (such as multivariate hyperplanes at arbitrary angles), then the model
is more powerful for prediction but can be
much more difficult to comprehend. A large number of decision tree and rule-induction algorithms are described in the machine-learning and applied statistics literature (Quinlan 1992; Breiman et al. 1984).
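As a concrete illustration of such an axis-parallel threshold split, the following sketch greedily searches a single variable for the split value that minimizes weighted Gini impurity, much as a tree-induction algorithm does at each node. The toy loan data and the choice of Gini as the split score are illustrative assumptions, not taken from the article.

```python
# Sketch of one greedy, axis-parallel threshold split on a single
# variable ("income"), as a decision-tree learner would perform at a node.
# The data and the Gini criterion are illustrative assumptions.

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return the split value minimizing the weighted Gini impurity
    of the two resulting partitions."""
    best_t, best_score = None, float("inf")
    n = len(labels)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy loan data: income (in thousands) and repayment status (1 = good).
incomes = [21, 25, 33, 47, 52, 60, 74, 88]
status  = [0,  0,  0,  1,  0,  1,  1,  1]
t, score = best_threshold(incomes, status)
```

Because the data are not perfectly separated by income, the best single threshold (here, income > 33) still misclassifies one case, which is exactly the representational limitation the figure illustrates.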
To a large extent, they depend on likelihood-based model-evaluation methods, with varying degrees of sophistication in terms of penalizing model complexity. Greedy search methods, which involve growing and pruning rule and tree structures, are typically used to explore the super-exponential space of possible models. Trees and rules are primarily
used for predictive modeling, both for classification (Apte and Hong 1996; Fayyad, Djorgovski, and Weir 1996) and regression, although they can also be applied to summary descriptive modeling (Agrawal et al. 1996).

Nonlinear Regression and Classification Methods
These methods consist of a family of techniques for prediction that fit linear and nonlinear combinations of basis functions (sigmoids, splines, polynomials) to combinations
of the input variables. Examples include feedforward neural networks, adaptive spline
methods, and projection pursuit regression
(see Elder and Pregibon [1996], Cheng and
Titterington [1994], and Friedman [1989] for
more detailed discussions). Consider neural
networks, for example. Figure 7 illustrates the
type of nonlinear decision boundary that a
neural network might find for the loan data set. In terms of model evaluation, although networks of the appropriate size can universally approximate any smooth function to any desired degree of accuracy, relatively little is known about the representation properties of fixed-size networks estimated from finite
data sets. Also, the standard squared-error and cross-entropy loss functions used to train neural networks can be viewed as log-likelihood functions for regression and classification, respectively (Ripley 1994; Geman, Bienenstock, and Doursat 1992). Backpropagation is a parameter-search method that performs gradient descent in parameter (weight) space to find a local maximum of the likelihood function starting from random initial conditions. Nonlinear regression methods, although representationally powerful, can be difficult to interpret.
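The loss-as-likelihood view above can be made concrete with the smallest possible "network," a single sigmoid unit: minimizing cross-entropy by gradient descent from random initial weights is a search for a local maximum of the Bernoulli log-likelihood. The toy data, learning rate, and iteration count below are illustrative assumptions, not taken from the article.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, xs, ys):
    """Cross-entropy loss, i.e., the negative log-likelihood of the
    data under a Bernoulli model with p = sigmoid(w*x + b)."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total

# Toy one-dimensional classification data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Gradient descent from random initial conditions: each step moves
# downhill on the loss, i.e., uphill on the log-likelihood.
random.seed(0)
w, b = random.gauss(0, 1), random.gauss(0, 1)
lr = 0.5
for _ in range(500):
    gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
    gb = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys))
    w, b = w - lr * gw, b - lr * gb
```

After training, the fitted weights classify the toy data correctly, but the learned decision boundary is encoded only implicitly in (w, b) rather than as a human-readable rule, which is the interpretability cost the text describes.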
For example, although the classification boundaries of figure 7 might be more accurate than the simple threshold boundary of figure 6, the threshold boundary has the advantage that the model can be expressed, to some degree of certainty, as a simple rule of the form "if income is greater than threshold, then loan will have good status."

Example-Based Methods
The representation is simp...