training where we split our dataset into k folds instead of a typical training and test split.
● For example, let’s say we’re using 5 folds. We train on 4 and use the 5th fold as our test set. We then train on a different 4 folds and test on the remaining one, repeating until every fold has served as the test set once.
● We then average the results (for example, the test accuracy) coming out of each cycle.
● Cross Validation reduces overfitting but slows the training process.
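The fold rotation described above can be sketched in plain Python. This is only an illustration under stated assumptions: `k_fold_indices` is a hypothetical helper (not from any library), it works on sample indices rather than real data, and it does not shuffle, which we would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs, one pair per fold.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 10 samples, 5 folds: each cycle trains on 8 samples and tests on 2.
folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In each cycle we would train a fresh model on `train_idx`, evaluate it on `test_idx`, and keep the score for that fold.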
Early Stopping
Dropout
● Dropout refers to dropping nodes (both hidden and visible) in a neural network with the aim of reducing overfitting.
● During training, certain parts of the neural network are ignored during some forward and backward propagations.
● Dropout is an approach to regularization in neural networks that helps reduce interdependent learning amongst the neurons. Thus the NN learns more robust or meaningful features.
● In Dropout we set a parameter ‘p’, the probability that a node is kept; each node is dropped with probability (1 - p).
● Dropout can roughly double the time needed to converge in training.
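A minimal sketch of the keep/drop rule, assuming a layer's activations are a plain list of floats. The function name `dropout` and the rescaling by `1 / keep_prob` ("inverted dropout") are assumptions added for illustration; the slides only specify the keep probability p.

```python
import random


def dropout(activations, keep_prob, rng):
    """Apply dropout to a list of activations during training.

    Each node is kept with probability keep_prob and dropped otherwise.
    Kept values are divided by keep_prob (an assumption, not stated in the
    slides) so the expected activation stays the same.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    output = []
    for a in activations:
        if rng.random() < keep_prob:
            output.append(a / keep_prob)  # kept node, rescaled
        else:
            output.append(0.0)            # dropped node contributes nothing
    return output


rng = random.Random(0)  # seeded so the run is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(dropped)
```

A new random mask is drawn on every training pass, so no neuron can rely on a particular neighbour always being present. At test time dropout is switched off and all nodes are used.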
Dropout Illustrated
Data Augmentation
● Data Augmentation is one of the easiest ways to improve our models.
● It’s simply taking our input dataset and making slight variations to it in order to increase the amount of data we have for training. Examples below.
● This allows us to build more robust models that are less prone to overfitting.
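To make "slight variations" concrete, here is a small sketch that treats an image as a list of pixel rows. The helpers `flip_horizontal` and `shift_right` are hypothetical names chosen for this example; real pipelines usually apply such transforms randomly and with a library.

```python
def flip_horizontal(image):
    """Return a copy of the image mirrored left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Return a copy shifted right by `pixels`, padding the left with `fill`."""
    width = len(image[0])
    if not 0 <= pixels <= width:
        raise ValueError("pixels must be between 0 and the image width")
    return [[fill] * pixels + row[:width - pixels] for row in image]


# A tiny 2x3 "image"; the values stand in for pixel intensities.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

# One original sample becomes three training samples with the same label.
augmented = [image, flip_horizontal(image), shift_right(image, 1)]
for sample in augmented:
    print(sample)
```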
6.9 Epochs, Iterations and Batch Sizes
Understanding Some Neural Network Training Terminology
Epochs
● You may have seen or heard me mention Epochs in the training process, so what exactly is an Epoch?
● An Epoch occurs when the full set of our training data is forward propagated and then backpropagated through our neural network.
● After the first Epoch, we will have a decent set of weights. However, by feeding our training data again and again into our Neural Network, we can further improve the weights. This is why we train for several epochs (usually 50+).
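A toy sketch of why repeated Epochs help. The setup is an assumption made purely for illustration: a one-weight model fitting y = 2x on four made-up points, updated one sample at a time with a hand-picked learning rate.

```python
# Toy data for y = 2x; the "network" is a single weight w.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mean_squared_error(w):
    """Average squared error of the model y = w * x over the training set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w = 0.0
learning_rate = 0.01
losses = []
for epoch in range(5):
    # One Epoch: every training sample is passed through exactly once.
    for x, y in zip(xs, ys):
        error = w * x - y          # forward pass
        gradient = 2 * error * x   # backward pass for squared error
        w -= learning_rate * gradient
    losses.append(mean_squared_error(w))
    print(f"epoch {epoch + 1}: w = {w:.4f}, loss = {losses[-1]:.4f}")
```

Each pass over the same four samples moves `w` closer to 2 and lowers the loss, which is the effect the bullets describe.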
Batches
● Unless we have huge volumes of RAM, we can’t simply pass all our training data to our Neural Network at once in training. We need to split the data up into segments, or Batches.
● Batch Size is the number of training samples we use in a single batch.
● For example, say we had 1000 samples of data and specified a batch size of 100. In training, we’d take 100 samples of that data, use them in the forward/backward pass, then update our weights. If the batch size is 1, we’re simply doing Stochastic Gradient Descent.
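The 1000-sample example can be sketched as follows. `make_batches` is a hypothetical helper, and the integers stand in for real training samples.

```python
def make_batches(samples, batch_size):
    """Split `samples` into consecutive batches of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


samples = list(range(1000))            # stand-in for 1000 training samples
batches = make_batches(samples, batch_size=100)
print(len(batches), "batches of", len(batches[0]), "samples")

# With batch_size=1 every sample is its own batch, i.e. one weight update
# per sample, as in Stochastic Gradient Descent.
sgd_batches = make_batches(samples, batch_size=1)
```

In a real training loop we would run the forward/backward pass and one weight update for each element of `batches`.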
Iterations
● Many confuse Iterations and Epochs (I was one of them).
● However, the difference is quite simple: Iterations are the number of batches we need to complete one Epoch.
● In our previous example, we had 1000 items in our dataset and set a batch size of 100. Therefore, we’ll need 10 iterations (100 x 10 = 1000) to complete an Epoch.
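The arithmetic above as a short sketch. Both function names are hypothetical, and rounding up for a final partial batch is an assumption (some training setups drop the last incomplete batch instead).

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def total_iterations(n_samples, batch_size, epochs):
    """Number of weight updates over the whole training run."""
    return iterations_per_epoch(n_samples, batch_size) * epochs


print(iterations_per_epoch(1000, 100))              # 10
print(total_iterations(1000, 100, epochs=50))       # 500
```

So training for 50 Epochs with this setup means 500 iterations in total.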