5. Thus, training an RNN simply amounts to unrolling the RNN for a given size of input (and, correspondingly, the expected output) and training the unrolled RNN by computing the gradients and using stochastic gradient descent. As mentioned earlier in the chapter, RNNs can deal with arbitrarily long inputs, and correspondingly, they need to be trained on arbitrarily long inputs. Figure 6-7 illustrates how an RNN is unrolled for different sizes of inputs. Note that once the RNN is unrolled, the process of training it is identical to training a regular neural network, which we covered in earlier chapters. In Figure 6-7 the RNN described in Figure 6-1 is unrolled for input sizes of 1, 2, 3, and 4.
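The unrolling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's code: the weight names U, W, V, b, c follow the standard RNN formulation (h(t) = tanh(U x(t) + W h(t-1) + b), ŷ(t) = softmax(V h(t) + c)), and all dimensions are arbitrary choices for the example. The point is that the same cell, with shared weights, is applied once per step, so the same network handles inputs of length 1, 2, 3, or 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3

# Shared weights, reused at every unrolled step (names are illustrative).
U = rng.normal(size=(n_hidden, n_in))      # input-to-hidden weights
W = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(n_out, n_hidden))     # hidden-to-output weights
b = np.zeros(n_hidden)
c = np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def unrolled_forward(xs):
    """Apply the same cell once per input step: one iteration = one unrolled copy."""
    h = np.zeros(n_hidden)
    outputs = []
    for x in xs:
        h = np.tanh(U @ x + W @ h + b)       # h(t) = tanh(U x(t) + W h(t-1) + b)
        outputs.append(softmax(V @ h + c))   # y-hat(t) = softmax(V h(t) + c)
    return outputs

# The unrolled network grows with the input, as in Figure 6-7.
for T in (1, 2, 3, 4):
    ys = unrolled_forward(rng.normal(size=(T, n_in)))
    print(T, len(ys))
```

Once unrolled this way, gradients can be computed through the loop exactly as for a feed-forward network (backpropagation through time), and the shared weights updated with stochastic gradient descent.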
CHAPTER 6 ■ RECURRENT NEURAL NETWORKS

Figure 6-7. Unrolling the RNN corresponding to Figure 6-1 for different sizes of inputs
Figure 6-8. Teacher Forcing (Top: Training, Bottom: Prediction)

Given that the data set to be trained on consists of sequences of varying sizes, the input sequences are grouped so that sequences of the same size fall in one group. For a given group, we can then unroll the RNN to the sequence length and train. Training on a different group requires the RNN to be unrolled to a different sequence length. Thus, it is possible to train the RNN on inputs of varying sizes by unrolling and training, with the unrolling done based on the sequence length.
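The grouping step can be sketched as follows. This is a minimal illustration (function and variable names are my own, not from the text): sequences are bucketed by length, and each bucket can then be trained with the RNN unrolled to that length.

```python
from collections import defaultdict

def bucket_by_length(sequences):
    """Group sequences so that all sequences of the same length fall in one bucket."""
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    return dict(buckets)

data = [[1, 2], [3], [4, 5], [6, 7, 8]]
print(bucket_by_length(data))
# → {2: [[1, 2], [4, 5]], 1: [[3]], 3: [[6, 7, 8]]}
# Each key is an unrolling length; each value is the group of sequences
# trained with the RNN unrolled to that many steps.
```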
It must be noted that training the unrolled RNN (illustrated in Figure 6-1) is essentially a sequential process, as the hidden states depend on one another. In the case of RNNs wherein the recurrence is over the output instead of the hidden state (Figure 6-2), it is possible to use a technique called teacher forcing, as illustrated in Figure 6-8. The key idea here is to use the true output y(t-1) instead of the predicted output ŷ(t-1) in the computation of h(t) while training. While making predictions (when the model is deployed for usage), however, ŷ(t-1) is used.

Bidirectional RNNs

Let us now take a look at another variation on RNNs, namely, the bidirectional RNN. The key intuition behind a bidirectional RNN is to use the entities that lie further along in the sequence to make a prediction for the current entity. For all the RNNs we have considered so far, we have been using the previous entities (captured by the hidden state) and the current entity in the sequence to make the prediction; we have not been using information concerning the entities that lie further along in the sequence. A bidirectional RNN leverages this information and can give improved predictive accuracy in many cases.

A bidirectional RNN can be described using the following equations:

h_f(t) = tanh(U_f x(t) + W_f h_f(t-1) + b_f)
h_b(t) = tanh(U_b x(t) + W_b h_b(t+1) + b_b)
ŷ(t) = softmax(V_b h_b(t) + V_f h_f(t) + c)

The following points are to be noted:

1. The RNN computation involves first computing the forward hidden state and the backward hidden state for an entity in the sequence, denoted by h_f(t) and h_b(t), respectively.
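The bidirectional equations above can be sketched directly in NumPy. This is an illustrative implementation under assumed dimensions (weight names follow the equations; everything else is my own choice): the forward pass runs left to right and depends on h_f(t-1), the backward pass runs right to left and depends on h_b(t+1), and the output at each step combines both hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 4, 8, 3

# Separate parameter sets for the forward (f) and backward (b) passes.
Uf, Wf = rng.normal(size=(n_hidden, n_in)), rng.normal(size=(n_hidden, n_hidden))
Ub, Wb = rng.normal(size=(n_hidden, n_in)), rng.normal(size=(n_hidden, n_hidden))
Vf, Vb = rng.normal(size=(n_out, n_hidden)), rng.normal(size=(n_out, n_hidden))
bf, bb, c = np.zeros(n_hidden), np.zeros(n_hidden), np.zeros(n_out)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def birnn_forward(xs):
    T = len(xs)
    hf = [None] * T
    hb = [None] * T
    h = np.zeros(n_hidden)
    for t in range(T):            # forward pass: h_f(t) depends on h_f(t-1)
        h = np.tanh(Uf @ xs[t] + Wf @ h + bf)
        hf[t] = h
    h = np.zeros(n_hidden)
    for t in reversed(range(T)):  # backward pass: h_b(t) depends on h_b(t+1)
        h = np.tanh(Ub @ xs[t] + Wb @ h + bb)
        hb[t] = h
    # y-hat(t) = softmax(V_b h_b(t) + V_f h_f(t) + c)
    return [softmax(Vb @ hb[t] + Vf @ hf[t] + c) for t in range(T)]

ys = birnn_forward(rng.normal(size=(5, n_in)))
print(len(ys))  # one output distribution per sequence position
```

Note that because h_b(t) requires the entire sequence to the right of t, a bidirectional RNN needs the full input before producing any output, which is why it suits tasks like tagging a complete sentence rather than online prediction.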