STA 414/2104 — Mar 9, 2010

Notes
▶ Sample test questions posted
▶ Review and/or questions on Thursday this week
▶ Test will have 3 questions: one from the sample test, one specific to 414/2104
▶ Extra office hour Monday, March 15, 3–4
▶ Watch the web site for late-breaking announcements re the midterm

Neural Networks
▶ feed-forward, single-hidden-layer neural network:
  Y_k = g_k( β_{0k} + Σ_{m=1}^M β_{km} σ( α_{0m} + Σ_{j=1}^p α_{jm} X_j ) ) = f_k(X)
▶ σ(x) = 1 / (1 + e^{−x});  tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}), maps to (−1, +1)

... neural networks
▶ θ = (α_{0m}, α_m, β_{0k}, β_k)
▶ R(θ) = Σ_{i=1}^N Σ_{k=1}^K { y_{ik} − f_k(x_i) }^2 (squared error), or
▶ R(θ) = − Σ_{i=1}^N Σ_{k=1}^K y_{ik} log f_k(x_i) (log-likelihood)
▶ dim(θ) = M(p + 1) + K(M + 1)
▶ regularization/shrinkage, also called weight decay:
▶ minimize R(θ) + λ J(θ) = R(θ) + λ ( Σ_{km} β_{km}^2 + Σ_{jm} α_{jm}^2 )
▶ standardize inputs to mean 0, variance 1 for regularization
▶ back-propagation algorithm for minimizing R(θ) described in §11.4; extension to R(θ) + λ J(θ) in §11.5.2

... neural networks
▶ nnet in the MASS library: recommended λ ∈ (10^{−4}, 10^{−2}) for squared-error loss; (0.01, 0.1) for log-likelihood
▶ compare Figure 11.4 top/bottom
▶ results very sensitive to starting values: R(θ) has many local minima
▶ recommendation (Ripley): average predictions over several nnet fits
▶ weight decay seems to be more important than the number of hidden units
▶ See §§11.7–11.9 for interesting examples where neural nets work well
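The single-hidden-layer model Y_k = g_k(β_{0k} + Σ_m β_{km} σ(α_{0m} + Σ_j α_{jm} X_j)) can be sketched in NumPy. This is an illustration, not the course's own code; the function and parameter names (`forward`, `alpha0`, `beta`, …) are my own.

```python
import numpy as np

def sigmoid(x):
    # logistic activation: sigma(x) = 1 / (1 + e^{-x}), maps R to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, alpha0, alpha, beta0, beta, output=lambda t: t):
    """Feed-forward pass of a single-hidden-layer network.

    X      : (N, p) input matrix
    alpha0 : (M,)   hidden-unit intercepts alpha_{0m}
    alpha  : (p, M) input-to-hidden weights alpha_{jm}
    beta0  : (K,)   output intercepts beta_{0k}
    beta   : (M, K) hidden-to-output weights beta_{km}
    output : g_k; the identity for regression with squared-error loss

    Returns f(X) with shape (N, K).
    """
    Z = sigmoid(alpha0 + X @ alpha)   # derived features Z_m
    T = beta0 + Z @ beta              # linear outputs T_k
    return output(T)

# tanh(x) = (e^x - e^-x) / (e^x + e^-x) maps to (-1, +1);
# it is just a rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
```

For classification one would replace the identity `output` with the softmax, so that the f_k(x_i) in the log-likelihood criterion are probabilities.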
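The weight-decay criterion R(θ) + λJ(θ) and the parameter count dim(θ) = M(p+1) + K(M+1) are small enough to write out directly. A minimal sketch, with my own helper names; note the intercepts α_{0m}, β_{0k} are left out of the penalty, matching the sums Σ β_{km}² + Σ α_{jm}² on the slide:

```python
import numpy as np

def penalized_sse(y, f, alpha, beta, lam):
    # R(theta) + lambda * J(theta): squared-error loss plus weight decay
    R = np.sum((y - f) ** 2)
    J = np.sum(beta ** 2) + np.sum(alpha ** 2)  # intercepts not penalized
    return R + lam * J

def n_params(p, M, K):
    # dim(theta) = M(p + 1) + K(M + 1):
    # each of M hidden units has p weights + 1 intercept,
    # each of K outputs has M weights + 1 intercept
    return M * (p + 1) + K * (M + 1)
```

The count makes the overfitting danger concrete: even a modest net with p = 10 inputs, M = 10 hidden units and K = 1 output already has 121 free parameters, which is why shrinkage via λ matters more than fine-tuning M.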
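Two of the practical recommendations above — standardize inputs before applying weight decay, and (per Ripley) average predictions over several fits because R(θ) has many local minima — can be sketched as follows. The helper names are hypothetical, and `fits` stands in for whatever prediction functions several nnet-style fits would produce:

```python
import numpy as np

def standardize(X):
    # scale each input to mean 0, variance 1, so a single decay
    # parameter lambda penalizes all weights on a comparable scale
    return (X - X.mean(axis=0)) / X.std(axis=0)

def average_predictions(fits, X):
    # average the predictions of several independently fitted networks
    # (different random starting weights), rather than trusting one
    # local minimum of R(theta)
    return np.mean([f(X) for f in fits], axis=0)
```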