9 Additive Models, Trees, and Related Methods

T. Hastie et al., The Elements of Statistical Learning, Second Edition, DOI: 10.1007/b94608_9, © Springer Science+Business Media, LLC 2009

In this chapter we begin our discussion of some specific methods for supervised learning. These techniques each assume a (different) structured form for the unknown regression function, and by doing so they finesse the curse of dimensionality. Of course, they pay the possible price of misspecifying the model, and so in each case there is a tradeoff that has to be made. They take off where Chapters 3–6 left off. We describe five related techniques: generalized additive models, trees, multivariate adaptive regression splines, the patient rule induction method, and hierarchical mixtures of experts.

9.1 Generalized Additive Models

Regression models play an important role in many data analyses, providing prediction and classification rules, and data analytic tools for understanding the importance of different inputs. Although attractively simple, the traditional linear model often fails in these situations: in real life, effects are often not linear. In earlier chapters we described techniques that used predefined basis functions to achieve nonlinearities. This section describes more automatic flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called “generalized additive models.”

In the regression setting, a generalized additive model has the form

$$E(Y \mid X_1, X_2, \ldots, X_p) = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_p(X_p). \tag{9.1}$$

As usual $X_1, X_2, \ldots, X_p$ represent predictors and $Y$ is the outcome; the $f_j$’s are unspecified smooth (“nonparametric”) functions. If we were to model each function using an expansion of basis functions (as in Chapter 5), the resulting model could then be fit by simple least squares.
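To make the basis-expansion route concrete, the following is a minimal sketch of fitting the additive form (9.1) by ordinary least squares. It rests on several assumptions that are not from the text: each $f_j$ is represented by a centered cubic polynomial basis (chosen only for brevity; Chapter 5 discusses richer bases), the data are simulated and noise-free, and the helper names `poly_basis` and `fit_additive_ls` are hypothetical.

```python
import numpy as np

def poly_basis(x, degree=3):
    """Columns x, x^2, ..., x^degree, each centered to have mean zero."""
    B = np.column_stack([x ** d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)

def fit_additive_ls(X, y, degree=3):
    """Fit alpha + sum_j f_j(X_j) by least squares on stacked basis blocks."""
    n, p = X.shape
    blocks = [poly_basis(X[:, j], degree) for j in range(p)]
    design = np.column_stack([np.ones(n)] + blocks)   # intercept + one block per input
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha = coef[0]
    # Fitted values of each f_j at the training points
    f_hat = [blocks[j] @ coef[1 + j * degree: 1 + (j + 1) * degree] for j in range(p)]
    return alpha, f_hat

# Simulated, noise-free example (an assumption made for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = 1.0 + X[:, 0] ** 2 + 2.0 * X[:, 1] ** 3

alpha, f_hat = fit_additive_ls(X, y)
y_hat = alpha + f_hat[0] + f_hat[1]   # additive reconstruction of the response
```

Because every basis column is centered, the fitted intercept `alpha` equals the sample mean of `y`, and each fitted `f_hat[j]` averages to zero over the training data. This is one common way to make the individual functions identifiable.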
Our approach here is different: we fit each function using a scatterplot smoother (e.g., a cubic smoothing spline or kernel smoother), and provide an algorithm for simultaneously estimating all $p$ functions (Section 9.1.1).

For two-class classification, recall the logistic regression model for binary data discussed in Section 4.4. We relate the mean of the binary response $\mu(X) = \Pr(Y = 1 \mid X)$ to the predictors via a linear regression model and the logit link function:

$$\log\left(\frac{\mu(X)}{1 - \mu(X)}\right) = \alpha + \beta_1 X_1 + \cdots + \beta_p X_p. \tag{9.2}$$

The additive logistic regression model replaces each linear term by a more general functional form

$$\log\left(\frac{\mu(X)}{1 - \mu(X)}\right) = \alpha + f_1(X_1) + \cdots + f_p(X_p), \tag{9.3}$$

where again each $f_j$ is an unspecified smooth function. While the nonparametric form for the functions $f_j$ makes the model more flexible, the additivity is retained and allows us to interpret the model in much the same way as before. The additive logistic regression model is an example ...
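Returning to the regression form (9.1), the idea of fitting each function with a scatterplot smoother while estimating all $p$ functions together can be sketched as follows. This is a hedged illustration, not the algorithm of Section 9.1.1, which this excerpt does not reproduce. It assumes a cyclic scheme (often called backfitting) in which each $f_j$ is re-estimated by smoothing the partial residuals against $X_j$. The Gaussian kernel smoother, the bandwidth, the fixed iteration count, the simulated data, and the names `kernel_smooth` and `backfit` are all assumptions made for the example.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth=0.2):
    """Nadaraya-Watson (Gaussian kernel) estimate of E[r | x] at the observed x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, n_iter=20, bandwidth=0.2):
    """Cycle over inputs, smoothing partial residuals to update each f_j."""
    n, p = X.shape
    alpha = y.mean()              # intercept fixed at the mean response
    f = np.zeros((n, p))          # f[:, j] holds f_j evaluated at the training points
    for _ in range(n_iter):
        for j in range(p):
            # Residual after removing the intercept and all other functions
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            f[:, j] -= f[:, j].mean()   # center so alpha stays identifiable
    return alpha, f

# Simulated example (an assumption made for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

alpha, f = backfit(X, y)
resid = y - alpha - f.sum(axis=1)   # what the additive fit leaves unexplained
```

A fixed number of sweeps is used here for simplicity; a more careful version would stop when the functions change by less than some tolerance between sweeps.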