LSA embeddings generally set k = 300, so these embeddings are relatively short by comparison to other dense embeddings. Instead of PPMI or tf-idf weighting on the original term-document matrix, LSA implementations generally use a particular weighting of each co-occurrence cell that multiplies two weights, called the local and global weights, for each cell (i, j), i.e. for term i in document j. The local weight of each term i is its log frequency: $\log f(i,j) + 1$. The global weight of term i is a version of its entropy: $1 + \frac{\sum_j p(i,j)\,\log p(i,j)}{\log D}$, where D is the number of documents.
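To make the weighting concrete, here is a minimal sketch of how the local and global weights could be computed and multiplied over a raw term-document count matrix. It is not taken from any particular LSA implementation; the function and variable names (lsa_weight, counts) are illustrative, and it assumes NumPy and D > 1.

import numpy as np

def lsa_weight(counts):
    # counts: a |V| x D array of raw term-document counts f(i, j).
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]  # D, the number of documents (assumed > 1)

    # Local weight of cell (i, j): log f(i, j) + 1 for nonzero counts.
    safe_counts = np.where(counts > 0, counts, 1.0)        # avoid log(0)
    local = np.where(counts > 0, np.log(safe_counts) + 1.0, 0.0)

    # Global weight of term i: 1 + (sum_j p(i, j) log p(i, j)) / log D,
    # where p(i, j) = f(i, j) / sum_j f(i, j).
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals,
                  out=np.zeros_like(counts), where=row_totals > 0)
    safe_p = np.where(p > 0, p, 1.0)
    plogp = np.where(p > 0, p * np.log(safe_p), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)

    # Each cell of the weighted matrix is local(i, j) * global(i).
    return local * global_w[:, np.newaxis]

The weighted matrix produced this way is what the SVD step is then applied to, in place of the raw term-document counts.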
LSA has also been proposed as a cognitive model for human language use (Landauer and Dumais, 1997) and applied to a wide variety of NLP applications; see the end of the chapter for details.

16.1.2 SVD applied to word-context matrices

Rather than applying SVD to the term-document matrix (as in the LSA algorithm of the previous section), an alternative that is widely practiced is to apply SVD to the word-word or word-context matrix. In this version the context dimensions are words rather than documents, an idea first proposed by Schütze (1992b). The mathematics is identical to what is described in Fig. 16.2: SVD factorizes the word-context matrix X into three matrices W, S, and C^T. The only difference is that we are starting from a PPMI-weighted word-word matrix, instead of a term-document matrix.

Once again only the top k dimensions are retained (corresponding to the k most important singular values), leading to a reduced |V| × k matrix W_k, with one k-dimensional row per word. Just as with LSA, this row acts as a dense k-dimensional vector (embedding) representing that word. The other matrices (S and C) are simply thrown away.

This use of just the top dimensions, whether for a term-document matrix like LSA, or for a term-term matrix, is called truncated SVD. Truncated SVD is parameterized by k, the number of dimensions in the representation for each word, typically ranging from 500 to 5000. Thus SVD run on term-context matrices tends to use many more dimensions than the 300-dimensional embeddings produced by LSA. This difference presumably has something to do with the difference in granularity: LSA counts for words are much coarser-grained, counting the co-occurrences in an entire document, while word-context PPMI matrices count words in a small window. Generally the dimensions we keep are the highest-order dimensions, although for some tasks it helps to throw out a small number of the highest-order dimensions, such as the first 1 or even the first 50 (Lapesa and Evert, 2014).

Fig. 16.3 shows a high-level sketch of the entire SVD process. The dense embeddings produced by SVD sometimes perform better than the raw PPMI matrices on semantic tasks like word similarity. Various aspects of the dimensionality reduction seem to be contributing to the increased performance. If low-order dimensions represent unimportant information, the truncated SVD may be acting to remove noise. By removing parameters, the truncation may also help the models generalize.
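As an illustration of the truncated SVD step described above, the following is a minimal sketch (assuming NumPy; the names ppmi, k, and skip_top are illustrative, not from the text) that factorizes a PPMI-weighted word-word matrix and keeps only the first k columns of W as the word embeddings:

import numpy as np

def svd_embeddings(ppmi, k=300, skip_top=0):
    # ppmi: a |V| x |V| PPMI-weighted word-word (word-context) matrix.
    # Factorize: ppmi = W . diag(s) . C^T, with singular values returned
    # in descending order.
    W, s, Ct = np.linalg.svd(np.asarray(ppmi, dtype=float),
                             full_matrices=False)

    # Truncate: keep only the top k dimensions, optionally skipping a few
    # of the very highest-order ones (cf. Lapesa and Evert, 2014).
    W_k = W[:, skip_top:skip_top + k]

    # Each row of W_k is a dense k-dimensional embedding for one word;
    # S and C are simply discarded.
    return W_k

Each row of the returned matrix can then be compared to other rows, for example with cosine similarity, on semantic tasks like word similarity.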