jurafsky&martin_3rdEd_17 (1).pdf


choose which is the correct synonym, as in the example: Levied is closest in meaning to: imposed, believed, requested, correlated (Landauer and Dumais, 1997). All of these datasets present words without context.

Slightly more realistic are intrinsic similarity tasks that include context. The Stanford Contextual Word Similarity (SCWS) dataset (Huang et al., 2012) offers a richer evaluation scenario, giving human judgments on 2,003 pairs of words in their sentential context, including nouns, verbs, and adjectives. This dataset enables the evaluation of word similarity algorithms that can make use of context words. The semantic textual similarity task (Agirre et al. 2012, Agirre et al. 2015) evaluates the performance of sentence-level similarity algorithms; it consists of a set of pairs of sentences, each pair with human-labeled similarity scores.

Another task used for evaluation is the analogy task, where the system has to solve problems of the form a is to b as c is to d, given a, b, and c and having to find d. The system is given two words that participate in a relation (for example Athens and Greece, which participate in the capital relation) and a word like Oslo, and must find the word Norway. There are also more syntactically oriented examples: given mouse, mice, and dollar, the system must return dollars. Large sets of such tuples have been created (Mikolov et al. 2013, Mikolov et al. 2013b).

15.6 Summary

The term-document matrix, first created for information retrieval, has a row for each word (term) in the vocabulary and a column for each document. Each cell specifies the count of that term in the document.

The word-context (or word-word, or term-term) matrix has a row for each (target) word in the vocabulary and a column for each context term in the vocabulary. Each cell indicates the number of times the context term occurs in a window (of a specified size) around the target word in a corpus.
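The two count matrices described in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names `term_document_counts` and `word_context_counts`, the two-sentence toy corpus, and the choice of a symmetric window are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def term_document_counts(docs):
    """Return {term: [count in doc 0, count in doc 1, ...]}.

    Each dictionary entry is one row of the term-document matrix;
    each list position is one document (column).
    """
    vocab = sorted({word for doc in docs for word in doc})
    per_doc = [Counter(doc) for doc in docs]
    return {term: [counts[term] for counts in per_doc] for term in vocab}


def word_context_counts(docs, window=2):
    """Return {target: Counter({context: count})}.

    A context word is counted once for every time it appears within
    `window` tokens to the left or right of the target word.
    """
    counts = defaultdict(Counter)
    for doc in docs:
        for i, target in enumerate(doc):
            start = max(0, i - window)
            end = min(len(doc), i + window + 1)
            for j in range(start, end):
                if j != i:  # a token is not its own context
                    counts[target][doc[j]] += 1
    return counts


# Hypothetical toy corpus: two tokenized "documents".
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

td = term_document_counts(docs)
wc = word_context_counts(docs, window=1)

print(td["cat"])       # row of the term-document matrix for "cat"
print(dict(wc["sat"])) # context counts for the target word "sat"
```

Both structures are stored as dictionaries rather than dense arrays because, on a real vocabulary, most cells are zero.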
Instead of using the raw word-word co-occurrence matrix, it is often weighted. A common weighting is positive pointwise mutual information, or PPMI. Alternative weightings are tf-idf, used for information retrieval tasks, and significance-based methods like the t-test.

PPMI and other versions of the word-word matrix can be viewed as offering high-dimensional, sparse (most values are 0) vector representations of words. The cosine of two vectors is a common function used for word similarity.

Bibliographical and Historical Notes

Models of distributional word similarity arose out of research in linguistics and psychology of the 1950s. The idea that meaning was related to distribution of words
in context was widespread in linguistic theory of the 1950s; even before the well-known Firth (1957) and Harris (1968) dictums discussed earlier, Joos (1950) stated that the linguist’s “meaning” of a morpheme ... is by definition the set of conditional probabilities of its occurrence in context with all other morphemes.
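Returning to the summary's PPMI weighting and cosine similarity, the sketch below shows one way they might be computed over sparse count dictionaries. It assumes the usual definition PPMI(w, c) = max(log2(P(w, c) / (P(w) P(c))), 0), with probabilities estimated from the counts themselves. The function names `ppmi` and `cosine` and the small count table are illustrative assumptions, not data or code from the text.

```python
import math


def ppmi(counts):
    """Weight a sparse count matrix {target: {context: count}} by PPMI.

    Cells whose PMI is zero or negative are left out of the result,
    which keeps the weighted vectors sparse.
    """
    total = sum(sum(row.values()) for row in counts.values())
    row_totals = {w: sum(row.values()) for w, row in counts.items()}
    col_totals = {}
    for row in counts.values():
        for context, n in row.items():
            col_totals[context] = col_totals.get(context, 0) + n

    weighted = {}
    for w, row in counts.items():
        weighted[w] = {}
        for context, n in row.items():
            if n == 0:
                continue
            # P(w, c) / (P(w) * P(c)) simplifies to n * total / (row * col)
            pmi = math.log2(n * total / (row_totals[w] * col_totals[context]))
            if pmi > 0:
                weighted[w][context] = pmi
    return weighted


def cosine(u, v):
    """Cosine of two sparse vectors stored as {dimension: value} dicts."""
    dot = sum(value * v.get(dim, 0.0) for dim, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Toy co-occurrence counts, chosen only for illustration.
counts = {
    "apricot": {"pinch": 1, "sugar": 1},
    "pineapple": {"pinch": 1, "sugar": 1},
    "digital": {"computer": 2, "data": 1, "result": 1},
    "information": {"computer": 1, "data": 6, "result": 4},
}

vectors = ppmi(counts)
print(cosine(vectors["apricot"], vectors["pineapple"]))
print(cosine(vectors["apricot"], vectors["digital"]))
print(cosine(vectors["digital"], vectors["information"]))
```

In this toy table, words with identical context counts end up with identical PPMI vectors, and words that share no contexts have a cosine of zero, which is the behavior the summary's description of cosine-based word similarity would lead one to expect.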