viterbi.4 - Natural Language Processing

Part-of-Speech Tagging

Each word must be assigned its correct part of speech, such as noun, verb, adjective, or adverb, based on its function in a sentence.

Simple heuristics go a long way! You get about 90% accuracy by choosing, for each word, the tag it carries most frequently in a large training corpus.

Most POS taggers are statistical or rule-based. Statistical taggers can achieve about 97% accuracy, but they require training data and do not explicitly represent intuitive rules.

POS Tag Sets

There is no universally agreed-upon set of part-of-speech tags! The size of different tag sets can vary a lot:

  • Penn Treebank uses 45 tags
  • Original Brown Corpus used 87 tags
  • British National Corpus Basic Tagset (C5) used 61 tags
  • Enriched C6 tagset used 160 tags
  • CMU POS tagger for Twitter used 25 tags

Rule-based POS Tagging

Rule-based taggers rely on a dictionary to provide the possible POS tags for a word and on rules to choose among them; the rules can be written manually or learned from training data. Manually developed disambiguation rules can perform reasonably well.

Example rules:

  • If preceding word = ART, then disambiguate {NOUN, VERB} as NOUN.
  • If a possible verb does not agree in number with the preceding NP, then eliminate the verb tag.
  • If the preceding word takes an S complement, then tag “that” as a subordinating conjunction (vs. determiner).

Statistical Part-of-Speech Tagging

Statistical part-of-speech tagging involves selecting the most likely sequence of tags for the words in a sentence. What we really want to calculate is:

$$P(T_1 \ldots T_n \mid w_1 \ldots w_n)$$

but estimating this directly would require an unreasonable amount of data. We could apply Bayes’ rule and calculate:

$$\frac{P(T_1 \ldots T_n) \cdot P(w_1 \ldots w_n \mid T_1 \ldots T_n)}{P(w_1 \ldots w_n)}$$

but this still requires too much data. Instead, we can approximate this function by making independence assumptions based on part-of-speech tag bigrams and lexical generation probabilities.
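Returning to the most-frequent-tag heuristic described under Part-of-Speech Tagging, the following is a minimal sketch of how such a baseline might be built. The function names, the tiny training set, and the fallback to the overall most common tag for unseen words are all assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)   # word -> Counter of tags seen with it
    all_tags = Counter()            # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    # Assumption: unseen words get the overall most common tag.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag_sentence(words, lexicon, default_tag):
    """Tag each word independently of its neighbours."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


# Toy training data, invented for illustration only.
TRAIN = [
    [("the", "ART"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "ART"), ("run", "NOUN"), ("ended", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
]

lexicon, default_tag = train_most_frequent_tag(TRAIN)
tagged = tag_sentence(["The", "run", "ended"], lexicon, default_tag)
print(tagged)
```

In this toy data "run" is seen more often as a verb, so the baseline tags it VERB even after an article. That is the kind of error that context-sensitive rules or tag-sequence probabilities are meant to correct.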
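A Complete POS Tagging Model using Tag Bigrams

$$P(T_1 \ldots T_n \mid w_1 \ldots w_n) = \frac{P(T_1 \ldots T_n) \cdot P(w_1 \ldots w_n \mid T_1 \ldots T_n)}{P(w_1 \ldots w_n)}$$

The denominator $P(w_1 \ldots w_n)$ is the same for every candidate tag sequence, so it can be ignored when comparing sequences. The numerator is approximated as:

$$P(T_1 \ldots T_n) \cdot P(w_1 \ldots w_n \mid T_1 \ldots T_n) \approx \prod_{i=1}^{n} P(T_i \mid T_{i-1}) \cdot P(w_i \mid T_i)$$

Probability Definitions

We use bigram transition probabilities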
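As a rough illustration of the tag-bigram model, the sketch below scores a tag sequence as the product of transition probabilities P(T_i | T_{i-1}) and lexical generation probabilities P(w_i | T_i), and picks the best sequence by exhaustive search. Everything concrete here is an assumption: the probability tables are invented toy numbers rather than corpus estimates, `START` is a hypothetical pseudo-tag standing in for T_0, and the function names are illustrative.

```python
import math
from itertools import product

START = "<s>"  # hypothetical start-of-sentence pseudo-tag playing the role of T_0
TAGS = ["ART", "NOUN", "VERB"]

# Toy probabilities invented for illustration, not estimated from any corpus.
TRANSITION = {  # P(T_i | T_{i-1}), keyed by (previous tag, current tag)
    (START, "ART"): 0.7, (START, "NOUN"): 0.2, (START, "VERB"): 0.1,
    ("ART", "NOUN"): 0.9, ("ART", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.3, ("NOUN", "ART"): 0.1,
    ("VERB", "ART"): 0.5, ("VERB", "NOUN"): 0.4, ("VERB", "VERB"): 0.1,
}
EMISSION = {  # P(w_i | T_i), keyed by (word, tag)
    ("the", "ART"): 0.6,
    ("run", "NOUN"): 0.01, ("run", "VERB"): 0.05,
    ("ended", "VERB"): 0.02,
}


def log_score(words, tags):
    """Log of prod_i P(T_i | T_{i-1}) * P(w_i | T_i); -inf if any factor is zero."""
    total = 0.0
    prev = START
    for word, tag in zip(words, tags):
        p = TRANSITION.get((prev, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
        prev = tag
    return total


def best_tags(words):
    """Try every tag sequence; the cost is exponential in sentence length."""
    candidates = product(TAGS, repeat=len(words))
    return max(candidates, key=lambda tags: log_score(words, tags))


words = ["the", "run", "ended"]
best = best_tags(words)
print(best, math.exp(log_score(words, best)))
```

With these toy numbers, "run" has a higher lexical probability as a verb, yet the high P(NOUN | ART) transition makes ART NOUN VERB the best-scoring sequence. Exhaustive search is only workable for very short sentences; a dynamic-programming search such as the Viterbi algorithm is normally used instead.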