5. Thus, training an RNN is simply unrolling the RNN for a given size of input (and, correspondingly, the expected output) and training the unrolled RNN by computing the gradients and using stochastic gradient descent. As mentioned earlier in the chapter, RNNs can deal with arbitrarily long inputs and, correspondingly, they need to be trained on arbitrarily long inputs. Figure 6-7 illustrates how an RNN is unrolled for different sizes of inputs. Note that once the RNN is unrolled, the process of training it is identical to training a regular neural network, which we covered in earlier chapters. In Figure 6-7, the RNN described in Figure 6-1 is unrolled for input sizes of 1, 2, 3, and 4.
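The unrolling idea can be made concrete with a short sketch. The code below is illustrative, not the book's implementation: a forward pass that reuses the same parameters U, W, V, b, c at every time step, so unrolling for T steps yields a feedforward network of depth T with shared weights, which can then be trained by ordinary gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2   # illustrative dimensions

U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input-to-hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden
V = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden-to-output
b = np.zeros(n_hidden)
c = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def unrolled_forward(xs):
    """Forward pass of the RNN unrolled for len(xs) steps.

    The same parameters are applied at every step; only the
    depth of the unrolled network changes with the input size.
    """
    h = np.zeros(n_hidden)            # initial hidden state h(0)
    hs, ys = [], []
    for x in xs:                      # one layer per unrolled step
        h = np.tanh(U @ x + W @ h + b)
        hs.append(h)
        ys.append(softmax(V @ h + c))
    return hs, ys

# The same code handles input sizes of 1, 2, 3, and 4 (cf. Figure 6-7).
for T in (1, 2, 3, 4):
    xs = rng.normal(size=(T, n_in))
    hs, ys = unrolled_forward(xs)
```

Because the parameters are shared across steps, gradients computed on the unrolled network (backpropagation through time) simply sum the per-step contributions to each parameter.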
CHAPTER 6: RECURRENT NEURAL NETWORKS

Figure 6-7. Unrolling the RNN corresponding to Figure 6-1 for different sizes of inputs
Figure 6-8. Teacher forcing (top: training, bottom: prediction)

Given that the data set to be trained on consists of sequences of varying sizes, the input sequences are grouped so that sequences of the same size fall into one group. For each group, we can then unroll the RNN to the sequence length and train. Training on a different group requires the RNN to be unrolled to that group's sequence length. Thus, it is possible to train the RNN on inputs of varying sizes by unrolling it to match the sequence length of each group.
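The grouping step described above can be sketched in a few lines. The function name `bucket_by_length` is illustrative, not from the text; the idea is simply to bucket sequences by length so each bucket can be trained with the RNN unrolled to that length.

```python
from collections import defaultdict

def bucket_by_length(sequences):
    """Group variable-length sequences so that sequences of the
    same length fall into one bucket (one unrolling size each)."""
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    return dict(buckets)

# Toy data set of sequences with lengths 1, 2, and 3.
data = [[1, 2], [3, 4, 5], [6, 7], [8], [9, 10, 11]]
buckets = bucket_by_length(data)

# Each bucket would then be trained with the RNN unrolled to that length:
for length, group in sorted(buckets.items()):
    pass  # unroll the RNN `length` steps and train on `group`
```

In practice, libraries often pad sequences within a batch instead of strict bucketing, but the unrolling principle is the same.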
It must be noted that training the unrolled RNN (illustrated in Figure 6-1) is essentially a sequential process, as the hidden states depend on one another. In the case of RNNs where the recurrence is over the output instead of the hidden state (Figure 6-2), it is possible to use a technique called teacher forcing, as illustrated in Figure 6-8. The key idea here is to use the actual output y(t-1) instead of the predicted output ŷ(t-1) in the computation of h(t) while training. While making predictions (when the model is deployed for usage), however, ŷ(t-1) is used.

Bidirectional RNNs

Let us now take a look at another variation on RNNs, namely, the bidirectional RNN. The key intuition behind a bidirectional RNN is to use the entities that lie further along in the sequence to make a prediction for the current entity. For all the RNNs we have considered so far, we have been using the previous entities (captured by the hidden state) and the current entity in the sequence to make the prediction; we have not been using information concerning the entities that lie further along in the sequence. A bidirectional RNN leverages this information and can give improved predictive accuracy in many cases. A bidirectional RNN can be described by the following equations:

h_f(t) = tanh(U_f x(t) + W_f h_f(t-1) + b_f)
h_b(t) = tanh(U_b x(t) + W_b h_b(t+1) + b_b)
ŷ(t) = softmax(V_b h_b(t) + V_f h_f(t) + c)

The following points are to be noted:

1. The RNN computation involves first computing the forward hidden state and the backward hidden state for an entity in the sequence, denoted h_f(t) and h_b(t), respectively.
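These equations can be sketched directly in code. The following is a minimal illustration, assuming small example dimensions and separate parameter sets (U_f, W_f, b_f and U_b, W_b, b_b) for the two directions, as in the equations above: a left-to-right pass fills h_f, a right-to-left pass fills h_b, and the prediction at each step combines both.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 3, 4, 2   # illustrative dimensions

# Forward-direction parameters
Uf = rng.normal(scale=0.1, size=(n_hidden, n_in))
Wf = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
bf = np.zeros(n_hidden)
# Backward-direction parameters
Ub = rng.normal(scale=0.1, size=(n_hidden, n_in))
Wb = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
bb = np.zeros(n_hidden)
# Output parameters combine both hidden states
Vf = rng.normal(scale=0.1, size=(n_out, n_hidden))
Vb = rng.normal(scale=0.1, size=(n_out, n_hidden))
c = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def birnn_forward(xs):
    """Bidirectional RNN forward pass for a sequence xs of length T."""
    T = len(xs)
    hf = np.zeros((T + 1, n_hidden))   # hf[t] depends on hf[t-1]; hf[0] = 0
    hb = np.zeros((T + 2, n_hidden))   # hb[t] depends on hb[t+1]; hb[T+1] = 0
    for t in range(1, T + 1):          # left-to-right pass
        hf[t] = np.tanh(Uf @ xs[t - 1] + Wf @ hf[t - 1] + bf)
    for t in range(T, 0, -1):          # right-to-left pass
        hb[t] = np.tanh(Ub @ xs[t - 1] + Wb @ hb[t + 1] + bb)
    # ŷ(t) = softmax(V_b h_b(t) + V_f h_f(t) + c)
    return [softmax(Vb @ hb[t] + Vf @ hf[t] + c) for t in range(1, T + 1)]
```

Note that the prediction at every position sees the entire sequence, which is why bidirectional RNNs are suited to tasks where the full input is available at once (such as tagging) rather than online prediction.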