Cross Validation
● Cross Validation is a method of training where we split our dataset into k folds instead of a typical training and test split.
● For example, let's say we're using 5 folds. We train on 4 of them and use the 5th fold as our test set. We then rotate, training on a different 4 folds and testing on the one held out, until every fold has served as the test set.
● We then use the average of the weights coming out of each cycle.
● Cross Validation reduces overfitting but slows the training process (see the sketch below).
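A minimal sketch of this 5-fold procedure using scikit-learn's KFold (the LogisticRegression model and the random toy data are illustrative stand-ins, not from the course; here the per-fold scores, rather than the weights, are averaged, which is the more common variant of the final averaging step):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Toy data standing in for a real dataset: 100 samples, 5 features
rng = np.random.default_rng(42)
X = rng.random((100, 5))
y = rng.integers(0, 2, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []

for train_idx, test_idx in kf.split(X):
    # Train on 4 folds, test on the held-out 5th fold
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

# Average the result coming out of each of the 5 cycles
print(np.mean(scores))
```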

Early Stopping
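Early stopping halts training once performance on a validation set stops improving, rather than running for a fixed number of epochs. A minimal sketch, assuming a pre-computed list of validation losses standing in for real per-epoch training and an illustrative patience of 3:

```python
# Simulated validation losses standing in for real per-epoch results
val_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60]

patience = 3                      # stop after 3 epochs with no improvement
best_loss = float("inf")
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_loss:
        best_loss, epochs_without_improvement = val_loss, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break
```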

Dropout
● Dropout refers to dropping nodes (both hidden and visible) in a neural network with the aim of reducing overfitting.
● In training, certain parts of the neural network are ignored during some forward and backward propagations.
● Dropout is an approach to regularization in neural networks which helps reduce interdependent learning amongst the neurons. Thus the NN learns more robust and meaningful features.
● In Dropout we set a parameter p that gives the probability that a node is kept; nodes are dropped with probability (1 - p), as sketched below.
● Dropout almost doubles the time it takes to converge in training.
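A minimal sketch of the masking step in NumPy, using the "inverted dropout" variant (the layer shape and keep probability are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.8                                      # probability a node is KEPT
activations = rng.standard_normal((4, 10))  # one layer's outputs for a batch of 4

# Each node survives with probability p and is zeroed with probability (1 - p);
# dividing by p ("inverted dropout") keeps the expected activation unchanged
mask = rng.random(activations.shape) < p
activations = activations * mask / p

# At test time no mask is applied and the full network is used as-is
```

Note that some libraries parameterize this the other way around: Keras's Dropout layer, for instance, takes the fraction of units to drop, i.e. (1 - p) in the slide's notation.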

Dropout Illustrated

Data Augmentation
● Data Augmentation is one of the easiest ways to improve our models.
● It's simply taking our input dataset and making slight variations to it in order to increase the amount of data we have for training; for images this can mean flips, rotations, shifts or added noise (see the sketch below).
● This allows us to build more robust models that don't overfit.
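A minimal sketch of a few such variations in NumPy (the image is a random stand-in; the specific transformations are illustrative, not an exhaustive list):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # stand-in for one real training image

flipped = np.fliplr(image)                    # horizontal flip
rotated = np.rot90(image)                     # 90-degree rotation
shifted = np.roll(image, shift=4, axis=1)     # small horizontal shift
noisy   = np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1)  # pixel noise

# One original image has become five training samples
augmented = [image, flipped, rotated, shifted, noisy]
```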

6.9 Epochs, Iterations and Batch Sizes
Understanding some Neural Network Training Terminology

Epochs
● You may have seen or heard me mention Epochs in the training process, so what exactly is an Epoch?
● An Epoch occurs when the full set of our training data is passed/forward propagated and then backpropagated through our neural network.
● After the first Epoch we will have a decent set of weights; however, by feeding our training data again and again into our Neural Network, we can further improve them. This is why we train for several epochs (usually 50+), as sketched below.
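A minimal sketch of setting the epoch count in Keras (the toy model and random data are assumptions for illustration; only the epochs argument matters here):

```python
import numpy as np
from tensorflow import keras

# Toy data and model purely to show the epochs parameter in action
X_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=1000)

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Each of the 50 epochs is one full pass of all 1000 samples through the network
model.fit(X_train, y_train, epochs=50, batch_size=100)
```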

Batches
● Unless we have huge volumes of RAM, we can't simply pass all our training data to our Neural Network in training. We need to split the data up into segments or... Batches.
● Batch Size is the number of training samples we use in a single batch.
● For example, say we had 1,000 samples of data and specified a batch size of 100. In training, we'd take 100 samples of that data, use them in the forward/backward pass, then update our weights (see the sketch below). If the batch size is 1, we're simply doing Stochastic Gradient Descent.
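A minimal sketch of that batching loop in NumPy (the data shapes are illustrative; the actual forward/backward pass is left as a comment):

```python
import numpy as np

X = np.random.rand(1000, 20)            # 1000 samples, 20 features
y = np.random.randint(0, 2, size=1000)
batch_size = 100

# Shuffle once per epoch so each epoch sees the batches in a different order
perm = np.random.permutation(len(X))

for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]
    X_batch, y_batch = X[idx], y[idx]
    # forward pass, backward pass, and weight update would happen here

# With batch_size = 1 this same loop performs Stochastic Gradient Descent
```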

Iterations
● Many confuse iterations and Epochs (I was one of them).
● However, the difference is quite simple: Iterations are the number of batches we need to complete one Epoch.
● In our previous example, we had 1,000 items in our dataset and set a batch size of 100. Therefore, we'll need 10 iterations (10 x 100 = 1,000) to complete an Epoch, as the short calculation below shows.
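The same arithmetic in code (values taken from the example above; math.ceil covers the case where the batch size doesn't divide the dataset evenly):

```python
import math

n_samples = 1000
batch_size = 100

# Iterations = number of batches needed to pass every sample once (one Epoch)
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)   # 10, since 10 x 100 = 1000
```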

