
# 11 Neural Networks


## 11.1 Introduction

In this chapter we describe a class of learning methods that was developed separately in different fields, statistics and artificial intelligence, based on essentially identical models. The central idea is to extract linear combinations of the inputs as derived features, and then model the target as a nonlinear function of these features. The result is a powerful learning method, with widespread applications in many fields. We first discuss the projection pursuit model, which evolved in the domain of semiparametric statistics and smoothing. The rest of the chapter is devoted to neural network models.

## 11.2 Projection Pursuit Regression

As in our generic supervised learning problem, assume we have an input vector X with p components, and a target Y. Let ω_m, m = 1, 2, ..., M, be unit p-vectors of unknown parameters. The projection pursuit regression (PPR) model has the form

$$f(X) = \sum_{m=1}^{M} g_m(\omega_m^T X). \qquad (11.1)$$

This is an additive model, but in the derived features V_m = ω_m^T X rather than the inputs themselves. The functions g_m are unspecified and are estimated along with the directions ω_m using some flexible smoothing method (see below).

The function g_m(ω_m^T X) is called a *ridge function* in ℝ^p. It varies only in the direction defined by the vector ω_m. The scalar variable V_m = ω_m^T X is the projection of X onto the unit vector ω_m, and we seek ω_m so that the model fits well, hence the name "projection pursuit." Figure 11.1 shows some examples of ridge functions. In the example on the left, ω = (1/√2)(1, 1)^T, so that the function varies only in the direction X1 + X2. In the example on the right, ω = (1, 0).

FIGURE 11.1. Perspective plots of two ridge functions. (Left:) g(V) = 1/[1 + exp(−5(V − 0.5))], where V = (X1 + X2)/√2. (Right:) g(V) = (V + 0.1) sin(1/(V/3 + 0.1)), where V = X1.

The PPR model (11.1) is very general, since the operation of forming nonlinear functions of linear combinations generates a surprisingly large class of models. For example, the product X1 · X2 can be written as [(X1 + X2)² − (X1 − X2)²]/4, and higher-order products can be represented similarly. In fact, if M is taken arbitrarily large, then for appropriate choice of g_m the PPR model can approximate any continuous function in ℝ^p arbitrarily well. Such a class of models is called a *universal approximator*. However, this generality comes at a price: interpretation of the fitted model is usually difficult, because each input enters the model in a complex and multifaceted way. As a result, the PPR model is most useful for prediction, and not very useful for producing an understandable model for the data. The M = 1 model, known as the *single index model* in econometrics, is an exception: it is slightly more general than the linear regression model, and offers a similar interpretation.

© Springer Science+Business Media, LLC 2009. T. Hastie et al., *The Elements of Statistical Learning*, Second Edition. DOI: 10.1007/b94608_11
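The product identity above can be checked numerically. The sketch below (variable names are illustrative, not from the text) writes X1 · X2 as a PPR model with M = 2 terms: directions proportional to (1, 1) and (1, −1), with quadratic ridge functions g1(V) = V²/4 and g2(V) = −V²/4.

```python
import numpy as np

# Two ridge directions (left unnormalized here to keep the identity clean):
# w1 = (1, 1)  gives V1 = X1 + X2
# w2 = (1, -1) gives V2 = X1 - X2
# With g1(V) = V^2/4 and g2(V) = -V^2/4, the PPR sum equals X1 * X2,
# since (X1 + X2)^2 - (X1 - X2)^2 = 4 * X1 * X2.

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # five sample points in R^2

V1 = X @ np.array([1.0, 1.0])
V2 = X @ np.array([1.0, -1.0])
ppr = V1**2 / 4 - V2**2 / 4          # g1(V1) + g2(V2)

print(np.allclose(ppr, X[:, 0] * X[:, 1]))  # True
```

Absorbing the normalization of ω into g_m, as done here, changes nothing essential: rescaling a unit ω by c just replaces g(V) with g(cV).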
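A minimal sketch of evaluating the PPR model (11.1) itself, using the left-hand ridge function of Figure 11.1 as the single term (so M = 1, the single index model). The helper name `ppr_predict` and the sample points are illustrative assumptions, not part of the text:

```python
import numpy as np

def ppr_predict(X, omegas, gs):
    """Evaluate f(X) = sum_m g_m(omega_m^T X) for an n x p input matrix X."""
    return sum(g(X @ w) for w, g in zip(omegas, gs))

# Single term (M = 1): the left panel of Figure 11.1.
omega = np.array([1.0, 1.0]) / np.sqrt(2.0)           # unit vector in R^2
g = lambda v: 1.0 / (1.0 + np.exp(-5.0 * (v - 0.5)))  # sigmoidal ridge function

X = np.array([[0.0, 0.0],
              [1.0, 1.0]])
print(ppr_predict(X, [omega], [g]))  # values lie in (0, 1) since g is logistic
```

Fitting, by contrast, requires estimating both the g_m (by a flexible smoother) and the directions ω_m, which is taken up in the rest of the section.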
