### Neuro_3

Course: CSCI 5525, Spring 2012
School: Minnesota

#### Multilayer Perceptron

CSCI 5525: Machine Learning (Spring 2012). Rui Kuang, Department of Computer Science and Engineering, University of Minnesota.

##### Tuning the Network Size

- Destructive. Weight decay: penalize non-zero parameters, which is the same as adding additional constraints.
- Constructive. Growing networks: if the training error is high, add more hidden units.

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$$

$$E' = E + \frac{\lambda}{2} \sum_i w_i^2$$

(E. Alpaydin, Introduction to Machine Learning)

##### Bayesian Learning

Consider the weights $w_i$ as random variables with prior $p(w_i)$.

$$p(\mathbf{w} \mid X) = \frac{p(X \mid \mathbf{w})\, p(\mathbf{w})}{p(X)}$$

$$\log p(\mathbf{w} \mid X) = \log p(X \mid \mathbf{w}) + \log p(\mathbf{w}) + C$$

$$p(\mathbf{w}) = \prod_i p(w_i), \quad \text{where } p(w_i) = c \cdot \exp\left[ -\frac{w_i^2}{2\,(1/(2\lambda))} \right]$$

$$E' = E + \lambda \|\mathbf{w}\|^2$$

- Weight decay, ridge regression, regularization
- cost = data-misfit + complexity

##### Learning Time

Applications:

- Sequence recognition: speech recognition
- Sequence reproduction: time-series prediction
- Sequence association

Network architectures:

- Recurrent networks (Rumelhart et al., 1986)
- Time-delay networks (Waibel et al., 1989)

##### Time-Delay Neural Networks

[Figure: time-delay neural network]

##### Recurrent Networks

Feed-forward networks:

- Information flows only one way
- One input pattern produces one output
- No sense of time (or memory of previous state)

Recurrency:

- Nodes connect back to other nodes or to themselves
- Information flow is multidirectional
- Sense of time and memory of previous state(s)

Biological nervous systems show a high degree of recurrency.

[Figure: recurrent networks]

##### Unfolding in Time

[Figure: unfolding in time]

#### Local Models

CSCI 5525: Machine Learning (Spring 2012). Rui Kuang, Department of Computer Science and Engineering, University of Minnesota.

##### Introduction

Divide the input space into local regions and learn simple (constant/linear) models in each patch.

- Unsupervised: competitive, online clustering
- Supervised: radial-basis functions, mixture of experts

##### Competitive Learning

$$E\left(\{\mathbf{m}_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \left\| \mathbf{x}^t - \mathbf{m}_i \right\|^2$$

$$b_i^t = \begin{cases} 1 & \text{if } \left\| \mathbf{x}^t - \mathbf{m}_i \right\| = \min_l \left\| \mathbf{x}^t - \mathbf{m}_l \right\| \\ 0 & \text{otherwise} \end{cases}$$

Batch k-means:

$$\mathbf{m}_i = \frac{\sum_t b_i^t \mathbf{x}^t}{\sum_t b_i^t}$$

##### Online K-means

$$\Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta\, b_i^t \left( x_j^t - m_{ij} \right)$$

##### Network Interpretation

- Winner-take-all network
- Renormalizing: $\|\mathbf{m}_i\| = 1, \ \forall i$
- Weight decay term:

$$\Delta m_{ij} = \eta\, b_i^t \left( x_j^t - m_{ij} \right) = \eta\, b_i^t x_j^t - \eta\, b_i^t m_{ij}$$

##### Adaptive Resonance Theory

Incremental; add a new cluster if the input is not covered; coverage is defined by the vigilance $\rho$ (Carpenter and Grossberg, 1988).

$$b_i = \left\| \mathbf{x}^t - \mathbf{m}_i \right\| = \min_{l=1}^{k} \left\| \mathbf{x}^t - \mathbf{m}_l \right\|$$

$$\begin{cases} \mathbf{m}_{k+1} \leftarrow \mathbf{x}^t & \text{if } b_i > \rho \\ \Delta \mathbf{m}_i = \eta \left( \mathbf{x}^t - \mathbf{m}_i \right) & \text{otherwise} \end{cases}$$
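The Adaptive Resonance Theory rule, together with the online k-means update it falls back on, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the slides' own implementation: the function name `art_online_clustering`, the parameter names, Euclidean distance, and the choice to seed the first cluster with the first sample are all assumptions made for the example.

```python
import math


def art_online_clustering(samples, vigilance, eta=0.1):
    """Incremental clustering in the spirit of the ART rule.

    samples   : iterable of equal-length sequences of floats (the x^t)
    vigilance : threshold rho; a sample farther than this from every
                existing center starts a new cluster
    eta       : learning rate of the online k-means style update
    Returns the list of centers m_i, each a list of floats.
    """
    centers = []
    for x in samples:
        x = list(x)
        if not centers:
            # Assumption: with no cluster yet, the first sample becomes m_1.
            centers.append(x)
            continue
        # Winner-take-all: find the closest center and its distance b_i.
        dists = [math.dist(x, m) for m in centers]
        i = min(range(len(centers)), key=dists.__getitem__)
        if dists[i] > vigilance:
            # Not covered by any existing cluster: m_{k+1} <- x^t.
            centers.append(x)
        else:
            # Covered: move only the winner toward the sample,
            # delta m_i = eta * (x^t - m_i).
            centers[i] = [m_j + eta * (x_j - m_j)
                          for m_j, x_j in zip(centers[i], x)]
    return centers


# Two well-separated groups of points and a vigilance of 1.0.
data = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 5.4)]
centers = art_online_clustering(data, vigilance=1.0, eta=0.5)
print(len(centers), centers)
```

With this toy data the sketch ends with two centers, one per group. Setting the vigilance very large means no new cluster is ever added, so only the single center seeded from the first sample is updated; setting it to zero turns every distinct sample into its own cluster.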
