Lecture 4: More on neural networks
COMP-652, September 17, 2007

Outline:
- Finding a good network structure: overfitting
- Automatic construction of network structure:
  - Constructive methods
  - Destructive methods
- Other kinds of networks:
  - Autoencoders
  - Recurrent neural networks

How large should the network be?

Overfitting occurs if there are too many parameters compared to the amount of data available.

Choosing the number of hidden units:
- Too few hidden units do not allow the concept to be learned.
- Too many lead to slow learning and overfitting.
- If the n inputs are binary, log n is a good heuristic choice.

Choosing the number of layers:
- Always start with one hidden layer.
- Never go beyond 2 hidden layers, unless the task structure suggests something different.

Overtraining in feed-forward networks

[Figure: two plots of error versus number of weight updates (examples 1 and 2), each showing training set error and validation set error; training error keeps decreasing while validation error eventually starts to rise.]

This is a different form of overfitting, which occurs when weights take on large magnitudes, pushing the sigmoids into saturation. Effectively, as learning progresses, the network has more parameters. Use a validation set to decide when to stop training!

k-fold cross-validation

1. Split the training data into k partitions (folds).
2. Repeat k times:
   (a) Take one fold to be the test set.
   (b) Take one fold to be the validation set.
   (c) Take the remaining k - 2 folds to form the training set.
   (d) Train the parameters on the training set, using the validation set to decide when to stop, then measure J_train(i) and J_test(i) on fold i.
3. Report the average of J_train(i) and the average of J_test(i), i = 1, ..., k.
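The k-fold procedure above can be sketched as follows. This is a minimal illustration, not the course's code: `train_fn` and `eval_fn` are hypothetical placeholders standing in for "train with early stopping on the validation set" and "measure error on a subset".

```python
import random

def k_fold_indices(n_examples, k, seed=0):
    """Split example indices into k roughly equal, shuffled folds."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_fn, eval_fn):
    """Run the k-fold procedure from the slide.

    For each i: fold i is the test set, fold (i+1) mod k is the
    validation set, and the remaining k - 2 folds form the training set.
    train_fn(train, valid) -> model (early-stops using valid);
    eval_fn(model, subset) -> error.
    Returns the averages of J_train(i) and J_test(i).
    """
    folds = k_fold_indices(len(data), k)
    j_train, j_test = [], []
    for i in range(k):
        valid_fold = (i + 1) % k
        test = [data[j] for j in folds[i]]
        valid = [data[j] for j in folds[valid_fold]]
        train = [data[j] for f in range(k)
                 if f not in (i, valid_fold) for j in folds[f]]
        model = train_fn(train, valid)      # stop training via valid
        j_train.append(eval_fn(model, train))
        j_test.append(eval_fn(model, test))
    return sum(j_train) / k, sum(j_test) / k
```

Note that each fold serves once as the test set and once as the validation set, so every example contributes to all three roles across the k repetitions.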
Magic number: k = 10.

More on cross-validation:
- It is good to ensure the same distribution of examples in each fold.
- If two algorithms are compared, it should be on the same folds.
- We get an idea not only of the average performance, but also of...
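The advice to keep the same distribution of examples in each fold corresponds to stratified splitting for classification data. A minimal sketch (the function name and round-robin scheme are illustrative, not from the lecture):

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign example indices to k folds so that every fold has
    (approximately) the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for idx in by_class.values():
        rng.shuffle(idx)
        # deal this class's examples round-robin across the folds
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds
```

Using the same seed (hence the same folds) for every algorithm being compared also satisfies the second point above: differences in measured error then reflect the algorithms, not the random split.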