POSProb - LINGUISTICS 384: LANGUAGE AND COMPUTERS PART-...

LINGUISTICS 384: LANGUAGE AND COMPUTERS
PART-OF-SPEECH (A.K.A. POS OR PoS) TAGGING
IN-CLASS EXERCISE

1 Introduction

OK. So we all know what a part-of-speech (POS) tag is by now. But how does a POS tagger actually decide how to label a word? "YOU NEED TO TRAIN IT!" yells the little voice in your head. This is true: we need a training set that has been labelled (by a human, presumably) with POS categories. You also need a labelled test set so that you can see whether any tagger you train is any good. You'll typically instruct the tagger to label the words in a test set, and then peek at the answers to see how well it has done.

So, with that in mind, we're going to take a tiny training set and use it to distinguish between prepositions and verbs in a few test sentences. Once we've done this, you will know how to translate these sentences, but the translations will only be correct if the part-of-speech tags are.

We will do this by looking at various information sources from the training set (things like "What was the previous word?") and counting up and dividing to produce the probability that an unseen word has a particular POS. If we know the probability (a.k.a. the likelihood) of, say, verb vs. preposition, we can pick the POS with the highest probability.

2 Training

So, given two sources of information, namely "What was the previous word?" and "What is the following word?", we will teach our tagger to distinguish verbs from prepositions. That way, it can tag words it has never seen before (at least, not in the training set) by looking at the contexts of these words, which is the very definition of part of speech. Without further ado, here's the training set:

(1.) I swim/verb in/prep that pool....
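The "count up and divide" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the handout's own procedure: the training and test tuples are invented (the handout's training set is cut off in this preview), the function names `train`, `probs`, `tag_word`, and `accuracy` are hypothetical, and averaging the previous-word and following-word probabilities is just one simple way to combine the two information sources.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration (NOT the handout's data).
# Each tuple is (previous word, target word, following word, gold tag).
TRAIN = [
    ("I", "swim", "in", "verb"),
    ("swim", "in", "that", "prep"),
    ("we", "walk", "to", "verb"),
    ("walk", "to", "the", "prep"),
    ("they", "sit", "on", "verb"),
    ("sit", "on", "the", "prep"),
]
TEST = [
    ("I", "jump", "in", "verb"),
    ("jump", "in", "the", "prep"),
    ("we", "dive", "to", "verb"),
]


def train(examples):
    """Count each tag per previous-word context and per following-word context."""
    by_prev = defaultdict(Counter)
    by_next = defaultdict(Counter)
    for prev, _word, nxt, tag in examples:
        by_prev[prev][tag] += 1
        by_next[nxt][tag] += 1
    return by_prev, by_next


def probs(counter):
    """Divide each tag's count by the total count for that context."""
    total = sum(counter.values())
    if total == 0:
        return {}  # context never seen in training
    return {tag: n / total for tag, n in counter.items()}


def tag_word(model, prev, nxt, tags=("verb", "prep")):
    """Pick the tag with the highest combined probability; ties go to the first tag listed."""
    by_prev, by_next = model
    p_prev = probs(by_prev.get(prev, Counter()))
    p_next = probs(by_next.get(nxt, Counter()))
    # Assumption: weight both information sources equally by averaging them.
    scores = {t: (p_prev.get(t, 0.0) + p_next.get(t, 0.0)) / 2 for t in tags}
    return max(tags, key=lambda t: scores[t]), scores


def accuracy(model, test):
    """Tag every test item, then 'peek at the answers' and return the fraction correct."""
    correct = sum(tag_word(model, prev, nxt)[0] == gold for prev, _word, nxt, gold in test)
    return correct / len(test)


model = train(TRAIN)
print(tag_word(model, "jump", "the"))  # ('prep', {'verb': 0.0, 'prep': 0.5})
print(accuracy(model, TEST))  # 1.0
```

In the first call, "jump" never appears as a previous word in the toy training data, so only the following-word source contributes, which is why the winning score is 0.5 rather than 1.0. A tagger trained on real data would need far more examples and, typically, some way of handling unseen contexts.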

This note was uploaded on 04/21/2010 for the course SOCIOLOGY 549 taught by Professor Kamizi during the Fall '09 term at Ohio State.
