DRAFT

Speech and Language Processing: An introduction to speech recognition, natural language processing, and computational linguistics. Daniel Jurafsky & James H. Martin. Copyright © 2006, All rights reserved. Draft of June 30, 2007. Do not cite without permission.

3 WORDS & TRANSDUCERS

How can there be any sin in sincere? Where is the good in goodbye?
Meredith Willson, The Music Man

Ch. 2 introduced the regular expression, showing for example how a single search string could help a web search engine find both woodchuck and woodchucks. Hunting for singular or plural woodchucks was easy; the plural just tacks an s on to the end. But suppose we were looking for other fascinating woodland creatures; let’s say a fox, and a fish, that surly peccary and perhaps a Canadian wild goose. Hunting for the plurals of these animals takes more than just tacking on an s. The plural of fox is foxes; of peccary, peccaries; and of goose, geese. To confuse matters further, fish don’t usually change their form when they are plural.¹

It takes two kinds of knowledge to correctly search for singulars and plurals of these forms. Orthographic rules tell us that English words ending in -y are pluralized by changing the -y to -i- and adding an -es. Morphological rules tell us that fish has a null plural, and that the plural of goose is formed by changing the vowel.

The problem of recognizing that a word (like foxes) breaks down into component morphemes (fox and -es) and building a structured representation of this fact is called morphological parsing.

Parsing means taking an input and producing some sort of linguistic structure for it. We will use the term parsing very broadly throughout this book, including many kinds of structures that might be produced: morphological, syntactic, semantic, discourse; in the form of a string, or a tree, or a network.
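To make the two kinds of knowledge concrete, here is a minimal Python sketch of a plural recognizer for the animals above. It is an illustration only, not the transducer-based method this chapter develops: the tiny lexicon, the names NOUN_STEMS, IRREGULAR_PLURALS, NULL_PLURALS, and parse_noun, and the SG/PL feature labels are all assumptions made for the example.

```python
# Toy lexicon: a hypothetical stand-in for a real dictionary of noun stems.
NOUN_STEMS = {"woodchuck", "fox", "fish", "peccary", "goose"}

# Morphological knowledge: plurals that no spelling rule predicts.
IRREGULAR_PLURALS = {"geese": "goose"}   # vowel change
NULL_PLURALS = {"fish"}                  # same form for singular and plural

# Stems that should not also receive a regular -s plural in this sketch.
NO_REGULAR_PLURAL = NULL_PLURALS | set(IRREGULAR_PLURALS.values())


def parse_noun(surface):
    """Return a list of (stem, feature) analyses for a surface noun form."""
    analyses = []
    if surface in NOUN_STEMS:
        analyses.append((surface, "SG"))
        if surface in NULL_PLURALS:
            analyses.append((surface, "PL"))
    if surface in IRREGULAR_PLURALS:
        analyses.append((IRREGULAR_PLURALS[surface], "PL"))

    # Orthographic knowledge: undo the spelling changes that accompany -s.
    candidates = []
    if surface.endswith("ies"):
        candidates.append(surface[:-3] + "y")   # peccaries -> peccary
    if surface.endswith("es"):
        candidates.append(surface[:-2])         # foxes -> fox
    if surface.endswith("s"):
        candidates.append(surface[:-1])         # woodchucks -> woodchuck

    for stem in candidates:
        if stem in NOUN_STEMS and stem not in NO_REGULAR_PLURAL:
            if (stem, "PL") not in analyses:
                analyses.append((stem, "PL"))
    return analyses
```

Calling parse_noun("foxes") yields a single analysis pairing the stem fox with the plural feature, while parse_noun("fish") yields both a singular and a plural analysis. The sketch deliberately overgenerates: it would also accept a misspelling such as foxs, because it does not check which spelling rule a given stem requires.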
Morphological parsing or stemming applies to many affixes other than plurals; for example we might need to take any English verb form ending in -ing (going, talking, congratulating) and parse it into its verbal stem plus the -ing morpheme. So given the surface or input form going, we might want to produce the parsed form VERB-go + GERUND-ing.

Morphological parsing is important throughout speech and language processing. It plays a crucial role in part-of-speech tagging for morphologically complex languages like Russian or German, as we will see in Ch. 5. It is important for producing the large dictionaries that are necessary for robust spell-checking. We will need it in machine translation to realize for example that the French words va and aller should both translate to forms of the English verb go.

To solve the morphological parsing problem, why couldn’t we just store all the plural forms of English nouns and -ing forms of English verbs in a dictionary and do parsing by lookup? Sometimes we can do this, and for example for English speech recognition this is exactly what we do. But for many NLP applications this isn’t pos-

¹ (see e.g., Seuss (1960))
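As a rough illustration of the -ing parsing described in this section, the following Python sketch maps a surface form to the parsed form VERB-stem + GERUND-ing. The three-verb lexicon VERB_STEMS and the function name parse_ing are hypothetical, and the single spelling rule handled (restoring a silent -e dropped before -ing) is an assumption chosen to cover the three example verbs, not a complete account of English spelling.

```python
# Hypothetical toy lexicon of verb stems, used only for this example.
VERB_STEMS = {"go", "talk", "congratulate"}


def parse_ing(surface):
    """Return parses such as 'VERB-go + GERUND-ing' for a surface -ing form."""
    if not surface.endswith("ing"):
        return []
    base = surface[:-3]
    # The stem may have lost a silent -e before -ing
    # (congratulate -> congratulating), so try both spellings.
    candidates = [base, base + "e"]
    return [f"VERB-{stem} + GERUND-ing"
            for stem in candidates if stem in VERB_STEMS]
```

Here parse_ing("going") returns a list containing the string VERB-go + GERUND-ing, and a word such as sing, which merely happens to end in -ing, returns an empty list because s is not a known stem. Unlike a table that lists every -ing form explicitly, this rule-plus-lexicon approach covers a new verb as soon as its stem is added to the lexicon.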