11 Neural Networks

11.1 Introduction

In this chapter we describe a class of learning methods that was developed separately in different fields, statistics and artificial intelligence, based on essentially identical models. The central idea is to extract linear combinations of the inputs as derived features, and then model the target as a nonlinear function of these features. The result is a powerful learning method, with widespread applications in many fields. We first discuss the projection pursuit model, which evolved in the domain of semiparametric statistics and smoothing. The rest of the chapter is devoted to neural network models.

11.2 Projection Pursuit Regression

As in our generic supervised learning problem, assume we have an input vector $X$ with $p$ components, and a target $Y$. Let $\omega_m$, $m = 1, 2, \ldots, M$, be unit $p$-vectors of unknown parameters. The projection pursuit regression (PPR) model has the form

$$f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X). \tag{11.1}$$

This is an additive model, but in the derived features $V_m = \omega_m^T X$ rather than the inputs themselves. The functions $g_m$ are unspecified and are estimated along with the directions $\omega_m$ using some flexible smoothing method (see below).
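To make (11.1) concrete, here is a minimal sketch (in Python/NumPy, not from the text) of how a PPR model is evaluated once its directions and ridge functions are in hand. The helper name `ppr_predict` and the particular choices of $\omega_m$ and $g_m$ are illustrative placeholders; in an actual fit, both would be estimated from the training data.

```python
import numpy as np

def ppr_predict(X, omegas, gs):
    """Evaluate f(X) = sum_m g_m(omega_m^T X) row-wise.

    X      : (n, p) array of inputs
    omegas : list of M unit p-vectors (the directions omega_m)
    gs     : list of M callables (the ridge functions g_m)
    """
    f = np.zeros(X.shape[0])
    for omega, g in zip(omegas, gs):
        v = X @ omega   # derived feature V_m = omega_m^T X
        f += g(v)       # add the m-th ridge function's contribution
    return f

# Toy example with M = 2 terms in p = 2 dimensions (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
omegas = [np.array([1.0, 1.0]) / np.sqrt(2),       # varies along X1 + X2
          np.array([1.0, 0.0])]                    # varies along X1 only
gs = [lambda v: 1 / (1 + np.exp(-5 * (v - 0.5))),  # a smooth sigmoid ridge
      np.sin]                                      # an oscillating ridge
print(ppr_predict(X, omegas, gs))
```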
The function $g_m(\omega_m^T X)$ is called a ridge function in $\mathbb{R}^p$. It varies only in the direction defined by the vector $\omega_m$. The scalar variable $V_m = \omega_m^T X$ is the projection of $X$ onto the unit vector $\omega_m$, and we seek $\omega_m$ so that the model fits well, hence the name "projection pursuit." Figure 11.1 shows some examples of ridge functions. In the example on the left $\omega = (1/\sqrt{2})(1, 1)^T$, so that the function only varies in the direction $X_1 + X_2$. In the example on the right, $\omega = (1, 0)$.

[Figure 11.1: Perspective plots of two ridge functions. (Left:) $g(V) = 1/[1 + \exp(-5(V - 0.5))]$, where $V = (X_1 + X_2)/\sqrt{2}$. (Right:) $g(V) = (V + 0.1)\sin(1/(V/3 + 0.1))$, where $V = X_1$.]

The PPR model (11.1) is very general, since the operation of forming nonlinear functions of linear combinations generates a surprisingly large class of models. For example, the product $X_1 \cdot X_2$ can be written as $[(X_1 + X_2)^2 - (X_1 - X_2)^2]/4$, and higher-order products can be represented similarly. In fact, if $M$ is taken arbitrarily large, for appropriate choice of $g_m$ the PPR model can approximate any continuous function in $\mathbb{R}^p$ arbitrarily well. Such a class of models is called a universal approximator. However this generality comes at a price. Interpretation of the fitted model is usually difficult, because each input enters into the model in a complex and multifaceted way. As a result, the PPR model is most useful for prediction, and not very useful for producing an understandable model for the data. The $M = 1$ model, known as the single index model in econometrics, is an exception. It is slightly more general than the linear regression model, and offers a similar interpretation.
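The defining property of a ridge function, constancy along directions orthogonal to $\omega$, can be checked numerically. The following sketch (again Python/NumPy, illustrative only) uses the left-hand example of Figure 11.1:

```python
import numpy as np

# A ridge function g(omega^T x) is constant along any direction orthogonal
# to omega. Check this for the left panel of Figure 11.1, where
# omega = (1, 1)/sqrt(2) and g(V) = 1 / (1 + exp(-5 (V - 0.5))).
omega = np.array([1.0, 1.0]) / np.sqrt(2)
g = lambda v: 1 / (1 + np.exp(-5 * (v - 0.5)))
f = lambda x: g(x @ omega)

x = np.array([0.3, 0.7])                   # an arbitrary test point
orth = np.array([1.0, -1.0]) / np.sqrt(2)  # orthogonal to omega

# Moving along the orthogonal direction leaves f unchanged...
print(f(x), f(x + 2.5 * orth))             # identical values
# ...while moving along omega changes it.
print(f(x + 0.5 * omega))                  # a different value
```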
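Similarly, the identity $X_1 X_2 = [(X_1 + X_2)^2 - (X_1 - X_2)^2]/4$ used above is itself an $M = 2$ PPR model, with quadratic ridge functions along the unit directions $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$. A quick numeric check (a sketch with arbitrary test points, not code from the text):

```python
import numpy as np

# The product X1 * X2 as an M = 2 PPR model. Since
# ((X1 + X2)/sqrt(2))^2 = (X1 + X2)^2 / 2, the identity
# X1*X2 = [(X1+X2)^2 - (X1-X2)^2]/4 becomes g1(V1) + g2(V2)
# with g1(v) = v^2/2 and g2(v) = -v^2/2.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))

w1 = np.array([1.0, 1.0]) / np.sqrt(2)
w2 = np.array([1.0, -1.0]) / np.sqrt(2)
ppr = (X @ w1) ** 2 / 2 - (X @ w2) ** 2 / 2  # g1(w1^T X) + g2(w2^T X)

print(np.allclose(ppr, X[:, 0] * X[:, 1]))   # True: the identity holds
```

The same device extends to higher-order products, which is the germ of the universal approximation property noted above.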