jurafsky&martin_3rdEd_17 (1).pdf


function STOCHASTIC GRADIENT DESCENT(L(), f(), x, y) returns $\theta$
    # where: L is the loss function
    #        f is a function parameterized by $\theta$
    #        x is the set of training inputs $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$
    #        y is the set of training outputs (labels) $y^{(1)}, y^{(2)}, \ldots, y^{(n)}$

    $\theta \leftarrow$ small random values
    while not done
        Sample a training tuple $(x^{(i)}, y^{(i)})$
        Compute the loss $L(f(x^{(i)}; \theta), y^{(i)})$    # How far off is $f(x^{(i)})$ from $y^{(i)}$?
        $g \leftarrow \nabla_\theta L(f(x^{(i)}; \theta), y^{(i)})$    # How should we move $\theta$ to maximize loss?
        $\theta \leftarrow \theta - \eta_k g$    # Go the other way instead

Figure 8.11 The stochastic gradient descent algorithm, after (Goldberg, 2017).

Stochastic gradient descent is called stochastic because it chooses a single random example at a time, moving the weights so as to improve performance on that single example. That can result in very choppy movements, so an alternative version of the algorithm, minibatch gradient descent, computes the gradient over batches of training instances rather than a single instance.

The learning rate $\eta_k$ is a parameter that must be adjusted. If it's too high, the learner will take steps that are too large, overshooting the minimum of the loss function. If it's too low, the learner will take steps that are too small, and take too long to get to the minimum. It is most common to begin the learning rate at a higher value and then slowly decrease it, so that it is a function of the iteration $k$ of training.

8.5 Neural Language Models

Now that we've introduced neural networks, it's time to see an application. The first application we'll consider is language modeling: predicting upcoming words from prior word context. Although we have already introduced a perfectly useful language modeling paradigm (the smoothed N-grams of Chapter 4), neural net-based language models turn out to have many advantages. Among these are that neural language models don't need smoothing, they can handle much longer histories, and they can generalize over contexts of similar words. Furthermore, neural net language models underlie many of the models we'll introduce for generation, summarization, machine translation, and dialog.
On the other hand, there is a cost for this improved performance: neural net language models are strikingly slower to train than traditional language models, and so for many tasks traditional language modeling is still the right technology.

In this chapter we'll describe simple feedforward neural language models, first introduced by Bengio et al. (2003b). We will turn to the recurrent language model, more commonly used today, in Chapter 9b.

A feedforward neural LM is a standard feedforward network that takes as input at time $t$ a representation of some number of previous words ($w_{t-1}$, $w_{t-2}$, etc.) and outputs a probability distribution over possible next words. Thus, like the traditional LM, the feedforward neural LM approximates the probability of a word given the entire prior context $P(w_t \mid w_1^{t-1})$ by approximating based on the $N$ previous words:

$$P(w_t \mid w_1^{t-1}) \approx P(w_t \mid w_{t-N+1}^{t-1}) \tag{8.23}$$
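To make the pieces concrete, the following is a minimal sketch that trains a tiny feedforward LM with the stochastic gradient descent loop of Figure 8.11. It is not the architecture of Bengio et al. (2003b): the embedding table `E`, the single tanh hidden layer, the cross-entropy loss, the decay schedule `eta0 / (1 + decay * k)`, and all function and parameter names are assumptions chosen for illustration. Here `context_size` is the number of previous words the network conditions on.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def forward(theta, x):
    """Map context word ids x to (embeddings, hidden layer, next-word distribution)."""
    e = theta["E"][x].reshape(-1)              # concatenate the context embeddings
    h = np.tanh(e @ theta["W"] + theta["b"])   # one hidden layer (assumed choice)
    p = softmax(h @ theta["U"] + theta["c"])   # distribution over the vocabulary
    return e, h, p


def train_feedforward_lm(tokens, context_size=2, dim=8, hidden=16,
                         steps=3000, eta0=0.2, decay=1e-3, seed=0):
    """Run SGD on one sampled example per step; return (theta, index, losses)."""
    rng = np.random.default_rng(seed)
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    ids = np.array([index[w] for w in tokens])
    # Training tuples: x = ids of the previous words, y = id of the next word.
    X = np.array([ids[i - context_size:i] for i in range(context_size, len(ids))])
    Y = ids[context_size:]
    V = len(vocab)

    # theta <- small random values
    theta = {
        "E": rng.normal(0.0, 0.1, (V, dim)),
        "W": rng.normal(0.0, 0.1, (context_size * dim, hidden)),
        "b": np.zeros(hidden),
        "U": rng.normal(0.0, 0.1, (hidden, V)),
        "c": np.zeros(V),
    }
    losses = []
    for k in range(steps):
        i = rng.integers(len(X))               # sample a training tuple
        x, y = X[i], Y[i]
        e, h, p = forward(theta, x)
        losses.append(-np.log(p[y]))           # cross-entropy loss (assumed L)

        # g <- gradient of the loss with respect to theta (manual backprop)
        d_logits = p.copy()
        d_logits[y] -= 1.0
        d_h = theta["U"] @ d_logits
        d_z = d_h * (1.0 - h ** 2)
        d_e = theta["W"] @ d_z
        g = {"U": np.outer(h, d_logits), "c": d_logits,
             "W": np.outer(e, d_z), "b": d_z}

        # theta <- theta - eta_k * g, with eta_k shrinking as k grows
        eta_k = eta0 / (1.0 + decay * k)
        for name in g:
            theta[name] -= eta_k * g[name]
        # subtract.at accumulates correctly if a word repeats in the context
        np.subtract.at(theta["E"], x, eta_k * d_e.reshape(context_size, dim))
    return theta, index, losses


def next_word_distribution(theta, index, context):
    """Estimate the distribution over the next word given the context words."""
    x = np.array([index[w] for w in context])
    return forward(theta, x)[2]
```

Calling `train_feedforward_lm` on a tokenized toy corpus and then `next_word_distribution(theta, index, ["sat", "on"])` returns one probability per vocabulary word. Because each step uses a single sampled example, the recorded losses fluctuate from step to step, which is the choppiness that minibatching is meant to smooth.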