jurafsky&martin_3rdEd_17 (1).pdf

... typically including part-of-speech tagging, lemmatization, and, in some cases, syntactic parsing to reveal headwords and dependency relations. Context features relevant to the target word can then be extracted from this enriched input. A feature vector consisting of numeric or nominal values encodes this linguistic information as an input to most machine learning algorithms.

Two classes of features are generally extracted from these neighboring contexts, both of which we have seen previously in part-of-speech tagging: collocational features and bag-of-words features. A collocation is a word or series of words in a position-specific relationship to a target word (i.e., exactly one word to the right, or the two words starting 3 words to the left, and so on). Thus, collocational features encode information about specific positions located to the left or right of the target word. Typical features extracted for these context words include the word itself, the root form of the word, and the word's part-of-speech. Such features are effective at encoding local lexical and grammatical information that can often accurately isolate a given sense.

For example, consider the ambiguous word bass in the following WSJ sentence:

(17.17) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
A collocational feature vector, extracted from a window of two words to the right and left of the target word, made up of the words themselves, their respective parts-of-speech, and pairs of words, that is,

$$[w_{i-2}, \mathrm{POS}_{i-2}, w_{i-1}, \mathrm{POS}_{i-1}, w_{i+1}, \mathrm{POS}_{i+1}, w_{i+2}, \mathrm{POS}_{i+2}, w_{i-2}^{i-1}, w_{i+1}^{i+2}] \tag{17.18}$$

would yield the following vector:

[guitar, NN, and, CC, player, NN, stand, VB, guitar and, player stand]

High-performing systems generally use POS tags and word collocations of length 1, 2, and 3 from a window of words 3 to the left and 3 to the right (Zhong and Ng, 2010).

The second type of feature consists of bag-of-words information about neighboring words. A bag-of-words means an unordered set of words, with their exact position ignored. The simplest bag-of-words approach represents the context of a target word by a vector of features, each binary feature indicating whether a vocabulary word w does or doesn't occur in the context. This vocabulary is typically pre-selected as some useful subset of words in a training corpus. In most WSD applications, the context region surrounding the target word is generally a small, symmetric, fixed-size window with the target word at the center. Bag-of-words features are effective at capturing the general topic of the discourse in which the target word has occurred. This, in turn, tends to identify senses of a word that are specific to certain domains. We generally don't use stopwords, punctuation, or numbers as features, and words are lemmatized and lower-cased.
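The two feature types described above can be illustrated with a minimal Python sketch. It is not a reference implementation, and it rests on several assumptions: the function names, the hand-assigned POS tags for words other than guitar, and, player, and stand, the stopword list, and the 12-word vocabulary are all hypothetical. The two word-pair features are taken to be the two words to the left and the two words to the right of the target, in text order. Lemmatization is omitted, so only lower-casing and filtering are shown.

```python
def collocational_features(tokens, tags, i):
    """Position-specific features for the target word at index i.

    Follows the window-of-two template: word and POS tag at offsets
    -2, -1, +1, +2, then the left and right word pairs. Assumes at
    least two tokens exist on each side of i (no padding is done).
    """
    feats = []
    for offset in (-2, -1, 1, 2):
        feats.append(tokens[i + offset])
        feats.append(tags[i + offset])
    feats.append(" ".join(tokens[i - 2:i]))      # two words to the left
    feats.append(" ".join(tokens[i + 1:i + 3]))  # two words to the right
    return feats


def bag_of_words_features(tokens, i, vocabulary, window, stopwords=frozenset()):
    """Binary vector: 1 if a vocabulary word occurs within `window`
    tokens of the target at index i, else 0. Position is ignored."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    context = set()
    for j in range(lo, hi):
        if j == i:
            continue  # skip the target word itself
        word = tokens[j].lower()
        # isalpha() drops punctuation and numbers
        if word.isalpha() and word not in stopwords:
            context.add(word)
    return [1 if v in context else 0 for v in vocabulary]


# The start of sentence (17.17); tags are hand-assigned for illustration.
TOKENS = ["An", "electric", "guitar", "and", "bass", "player",
          "stand", "off", "to", "one", "side"]
TAGS = ["DT", "JJ", "NN", "CC", "NN", "NN",
        "VB", "RP", "TO", "CD", "NN"]
TARGET = TOKENS.index("bass")

# Hypothetical stopword list and pre-selected vocabulary.
STOPWORDS = frozenset({"an", "and", "off", "to", "one"})
VOCABULARY = ["fishing", "big", "sound", "player", "fly", "rod",
              "pound", "double", "runs", "playing", "guitar", "band"]

print(collocational_features(TOKENS, TAGS, TARGET))
print(bag_of_words_features(TOKENS, TARGET, VOCABULARY, 4, STOPWORDS))
```

In this sketch the collocational vector keeps the order of the template, so each position in the vector always refers to the same offset from the target word. The bag-of-words vector records only the presence of guitar and player within four tokens of bass and discards where they occurred.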