
# 8 Model Inference and Averaging

## 8.1 Introduction

For most of this book, the fitting (learning) of models has been achieved by minimizing a sum of squares for regression, or by minimizing cross-entropy for classification. In fact, both of these minimizations are instances of the maximum likelihood approach to fitting.

In this chapter we provide a general exposition of the maximum likelihood approach, as well as the Bayesian method for inference. The bootstrap, introduced in Chapter 7, is discussed in this context, and its relation to maximum likelihood and Bayes is described. Finally, we present some related techniques for model averaging and improvement, including committee methods, bagging, stacking and bumping.

## 8.2 The Bootstrap and Maximum Likelihood Methods

### 8.2.1 A Smoothing Example

The bootstrap method provides a direct computational way of assessing uncertainty, by sampling from the training data. Here we illustrate the bootstrap in a simple one-dimensional smoothing problem, and show its connection to maximum likelihood.

(T. Hastie et al., *The Elements of Statistical Learning*, Second Edition, © Springer Science+Business Media, LLC 2009, p. 261.)
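The resampling idea just described can be sketched in a few lines. The data-generating process and the fitted model below are illustrative assumptions (the book's actual dataset and smoother appear later in the section); a straight-line least-squares fit stands in for the smoother, and we record the refitted coefficients across bootstrap samples to estimate their standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (assumed; not the book's actual dataset)
N = 50
x = rng.uniform(0.0, 3.0, size=N)
y = np.sin(2.0 * x) + rng.normal(scale=0.5, size=N)

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; a simple stand-in for a smoother."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Nonparametric bootstrap: resample (x_i, y_i) pairs with replacement, refit
B = 200
boot_coefs = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, N, size=N)
    boot_coefs[b] = fit_line(x[idx], y[idx])

# Bootstrap standard errors of the intercept and slope
se = boot_coefs.std(axis=0, ddof=1)
```

The spread of the refitted coefficients across the `B` resampled datasets directly quantifies the sampling uncertainty of the fit, with no analytic variance formula required.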

**FIGURE 8.1.** *Left panel:* Data for smoothing example. *Right panel:* Set of seven B-spline basis functions. The broken vertical lines indicate the placement of the three knots.

Denote the training data by $Z = \{z_1, z_2, \ldots, z_N\}$, with $z_i = (x_i, y_i)$, $i = 1, 2, \ldots, N$. Here $x_i$ is a one-dimensional input, and $y_i$ the outcome, either continuous or categorical. As an example, consider the $N = 50$ data points shown in the left panel of Figure 8.1.

Suppose we decide to fit a cubic spline to the data, with three knots placed at the quartiles of the $X$ values. This is a seven-dimensional linear space of functions, and can be represented, for example, by a linear expansion of B-spline basis functions (see Section 5.9.2):

$$\mu(x) = \sum_{j=1}^{7} \beta_j h_j(x). \tag{8.1}$$

Here the $h_j(x)$, $j = 1, 2, \ldots, 7$ are the seven functions shown in the right panel of Figure 8.1. We can think of $\mu(x)$ as representing the conditional mean $\mathrm{E}(Y \mid X = x)$.

Let $\mathbf{H}$ be the $N \times 7$ matrix with $ij$th element $h_j(x_i)$. The usual estimate of $\beta$, obtained by minimizing the squared error over the training set, is given by

$$\hat{\beta} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{y}. \tag{8.2}$$

The corresponding fit $\hat{\mu}(x) = \sum_{j=1}^{7} \hat{\beta}_j h_j(x)$ is shown in the top left panel of Figure 8.2. The estimated covariance matrix of $\hat{\beta}$ is

$$\widehat{\mathrm{Var}}(\hat{\beta}) = (\mathbf{H}^T \mathbf{H})^{-1} \hat{\sigma}^2, \tag{8.3}$$

where we have estimated the noise variance by $\hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{\mu}(x_i))^2 / N$.
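Equations (8.2) and (8.3) can be computed directly once the basis matrix $\mathbf{H}$ is built. The data below are simulated stand-ins for the book's 50 points, and a truncated power basis is used in place of B-splines; it spans the same seven-dimensional space of cubic splines with three knots (B-splines are preferred in practice for numerical stability), so the least-squares machinery is identical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data standing in for the book's N = 50 points (assumed)
N = 50
x = np.sort(rng.uniform(0.0, 3.0, size=N))
y = np.sin(2.0 * np.pi * x / 3.0) + rng.normal(scale=0.4, size=N)

# Three knots at the quartiles of the x values
knots = np.quantile(x, [0.25, 0.50, 0.75])

def basis(x, knots):
    """Truncated power basis for a cubic spline with the given knots.

    Spans the same seven-dimensional space as the B-spline basis in the
    text: 1, x, x^2, x^3, plus (x - xi_k)_+^3 for each of the three knots.
    """
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)          # the N x 7 matrix H

H = basis(x, knots)

# Equation (8.2): beta_hat = (H^T H)^{-1} H^T y
beta_hat = np.linalg.solve(H.T @ H, H.T @ y)
mu_hat = H @ beta_hat

# Equation (8.3): Var(beta_hat) = (H^T H)^{-1} * sigma_hat^2,
# with the noise variance estimated by the mean squared residual
sigma2_hat = np.mean((y - mu_hat)**2)
cov_beta = np.linalg.inv(H.T @ H) * sigma2_hat
```

`cov_beta` is the plug-in covariance of the coefficients; the next part of the section compares the pointwise standard errors it implies for $\hat{\mu}(x)$ with those obtained by bootstrapping.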