LINGUISTICS 384: LANGUAGE AND COMPUTERS
PART-OF-SPEECH (A.K.A. POS OR PoS) TAGGING
IN-CLASS EXERCISE

1 Introduction

OK. So we all know what a part-of-speech (POS) tag is by now. But how does a POS tagger actually decide how to label a word? "YOU NEED TO TRAIN IT!" yells the little voice in your head. This is true: we need a training set that has been labelled (by a human, presumably) with POS categories.[1] You also need a labelled test set so that you can see whether any tagger you train is any good. You'll typically instruct the tagger to label the words in a test set, and then peek at the answers to see how well it has done.

So, with that in mind, we're going to take a tiny little training set and use that set to distinguish between prepositions and verbs in a few test sentences. Once we've done this, you will know how to translate these sentences. But the translations will only be correct if the part-of-speech tags are.

We will do this by looking at various information sources from the training set (things like "What was the previous word?") and counting up and dividing to produce the probability that an unseen word has a particular POS. If we know the probability (a.k.a. the likelihood) of, say, verb vs. preposition, we can pick the POS with the highest probability.

2 Training

So, given two sources of information, namely "What was the previous word?" and "What is the following word?", we will teach our tagger to distinguish verbs from prepositions. That way, it can tag words it has never seen before (at least in the training set) by looking at the contexts of these words: the very definition[2] of part of speech. Without further ado, here's the training set:[3]

(1.) I swim/verb in/prep that pool....
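
The "counting up and dividing" idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the handout's own procedure: the first two training rows are read off sentence (1.), the remaining rows are hypothetical examples invented here, and the function names (train, probability, tag_word) are made up for this sketch. How the two information sources should be combined is also an assumption; here the two probabilities are simply multiplied, and a context never seen in training is treated as uninformative.

```python
from collections import Counter, defaultdict

TAGS = ("verb", "prep")

# Each row: (previous word, target word, following word, POS of the target).
# Rows 1-2 come from sentence (1.) "I swim in that pool".
# Rows 3-4 are HYPOTHETICAL, invented for illustration only.
TRAINING = [
    ("I", "swim", "in", "verb"),
    ("swim", "in", "that", "prep"),
    ("I", "walk", "to", "verb"),      # hypothetical
    ("walk", "to", "that", "prep"),   # hypothetical
]


def train(examples):
    """Count how often each tag occurs after / before each context word."""
    prev_counts = defaultdict(Counter)  # previous word  -> tag counts
    next_counts = defaultdict(Counter)  # following word -> tag counts
    for prev, _word, nxt, tag in examples:
        prev_counts[prev][tag] += 1
        next_counts[nxt][tag] += 1
    return prev_counts, next_counts


def probability(counts, context, tag):
    """Count and divide: P(tag | context word) estimated from the training counts."""
    seen = counts.get(context)
    if not seen:
        # Assumption: an unseen context tells us nothing, so use a uniform guess.
        return 1 / len(TAGS)
    return seen[tag] / sum(seen.values())


def tag_word(prev, nxt, prev_counts, next_counts):
    """Pick the tag with the highest combined score.

    Assumption: the two sources are combined by multiplying their probabilities.
    """
    scores = {
        tag: probability(prev_counts, prev, tag) * probability(next_counts, nxt, tag)
        for tag in TAGS
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    prev_counts, next_counts = train(TRAINING)
    # The target word itself is never consulted, so unseen words can be tagged.
    print(tag_word("I", "to", prev_counts, next_counts))
    print(tag_word("run", "that", prev_counts, next_counts))
```

Note that the tagger never looks at the target word itself, only at its neighbours, which is what lets it label words that did not occur in the training set. With a training set this small, many probabilities come out as exactly 0 or 1; a real tagger would need far more data and some form of smoothing.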
