8 Model Inference and Averaging

© Springer Science+Business Media, LLC 2009. T. Hastie et al., The Elements of Statistical Learning, Second Edition, p. 261. DOI: 10.1007/b94608_8

8.1 Introduction

For most of this book, the fitting (learning) of models has been achieved by minimizing a sum of squares for regression, or by minimizing cross-entropy for classification. In fact, both of these minimizations are instances of the maximum likelihood approach to fitting.

In this chapter we provide a general exposition of the maximum likelihood approach, as well as the Bayesian method for inference. The bootstrap, introduced in Chapter 7, is discussed in this context, and its relation to maximum likelihood and Bayes is described. Finally, we present some related techniques for model averaging and improvement, including committee methods, bagging, stacking and bumping.

8.2 The Bootstrap and Maximum Likelihood Methods

8.2.1 A Smoothing Example

The bootstrap method provides a direct computational way of assessing uncertainty, by sampling from the training data. Here we illustrate the bootstrap in a simple one-dimensional smoothing problem, and show its connection to maximum likelihood.
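The resampling idea just described can be sketched in a few lines. The following is a minimal illustration, not code from the text; the toy data and the function name `bootstrap_se` are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se(data, statistic, B=1000):
    """Estimate the standard error of `statistic` by drawing B
    samples of size N from the data, with replacement."""
    n = len(data)
    stats = np.array([
        statistic(data[rng.integers(0, n, size=n)])  # one bootstrap replicate
        for _ in range(B)
    ])
    return stats.std(ddof=1)

# Toy data (illustrative only): 50 noisy outcomes.
y = rng.normal(loc=2.0, scale=1.0, size=50)
se = bootstrap_se(y, np.mean)
```

For the mean of N = 50 draws with unit variance, the bootstrap standard error should come out near the theoretical value 1/sqrt(50) ≈ 0.14; the same machinery applies to statistics with no closed-form variance, which is where the bootstrap earns its keep.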
FIGURE 8.1. (Left panel:) Data for smoothing example. (Right panel:) Set of seven B-spline basis functions. The broken vertical lines indicate the placement of the three knots.

Denote the training data by $Z = \{z_1, z_2, \ldots, z_N\}$, with $z_i = (x_i, y_i)$, $i = 1, 2, \ldots, N$. Here $x_i$ is a one-dimensional input, and $y_i$ the outcome, either continuous or categorical. As an example, consider the $N = 50$ data points shown in the left panel of Figure 8.1.

Suppose we decide to fit a cubic spline to the data, with three knots placed at the quartiles of the $X$ values. This is a seven-dimensional linear space of functions, and can be represented, for example, by a linear expansion of B-spline basis functions (see Section 5.9.2):

$$\mu(x) = \sum_{j=1}^{7} \beta_j h_j(x). \tag{8.1}$$

Here the $h_j(x)$, $j = 1, 2, \ldots, 7$ are the seven functions shown in the right panel of Figure 8.1. We can think of $\mu(x)$ as representing the conditional mean $\mathrm{E}(Y \mid X = x)$.

Let $\mathbf{H}$ be the $N \times 7$ matrix with $ij$th element $h_j(x_i)$. The usual estimate of $\beta$, obtained by minimizing the squared error over the training set, is given by

$$\hat{\beta} = (\mathbf{H}^T \mathbf{H})^{-1} \mathbf{H}^T \mathbf{y}. \tag{8.2}$$

The corresponding fit $\hat{\mu}(x) = \sum_{j=1}^{7} \hat{\beta}_j h_j(x)$ is shown in the top left panel of Figure 8.2. The estimated covariance matrix of $\hat{\beta}$ is

$$\widehat{\mathrm{Var}}(\hat{\beta}) = (\mathbf{H}^T \mathbf{H})^{-1} \hat{\sigma}^2, \tag{8.3}$$

where we have estimated the noise variance by $\hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{\mu}(x_i))^2 / N$.
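Equations (8.2) and (8.3) can be sketched directly in code. The toy data below stand in for the points of Figure 8.1, which are not reproduced here; and for simplicity this sketch uses a truncated-power basis for the cubic spline rather than the B-spline basis $h_j$ of the text. Both span the same seven-dimensional space of cubic splines with three knots, so $\hat{\mu}(x)$ is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (assumption: the book's actual 50 points are not shown).
N = 50
x = np.sort(rng.uniform(0.0, 3.0, N))
y = np.sin(2 * x) + rng.normal(scale=0.3, size=N)

# Three knots at the quartiles of the x values, as in the text.
knots = np.quantile(x, [0.25, 0.50, 0.75])

def basis(x):
    """Truncated-power basis for a cubic spline with three knots:
    1, x, x^2, x^3, (x - k)_+^3 for each knot k -- seven functions."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

H = basis(x)                                   # the N x 7 matrix H

# Equation (8.2): beta_hat = (H^T H)^{-1} H^T y, via least squares.
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
mu_hat = H @ beta_hat                          # fitted values mu_hat(x_i)

# Equation (8.3): Var(beta_hat) = (H^T H)^{-1} sigma_hat^2,
# with sigma_hat^2 the mean squared residual.
sigma2_hat = np.mean((y - mu_hat) ** 2)
cov_beta = np.linalg.inv(H.T @ H) * sigma2_hat
```

Using `lstsq` rather than forming $(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{y}$ explicitly is numerically safer but solves the same normal equations as (8.2).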

