Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 74–83, Sofia, Bulgaria, August 8 2013. © 2013 Association for Computational Linguistics

Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-Scale Typological Properties

Emily M. Bender, Michael Wayne Goodman, Joshua Crowgey, Fei Xia
Department of Linguistics, University of Washington, Seattle WA 98195-4340
{ebender,goodmami,jcrowgey,fxia}@uw.edu

Abstract

We propose to bring together two kinds of linguistic resources—interlinear glossed text (IGT) and a language-independent precision grammar resource—to automatically create precision grammars in the context of language documentation. This paper takes the first steps in that direction by extracting major-constituent word order and case system properties from IGT for a diverse sample of languages.

1 Introduction

Hale et al. (1992) predicted that more than 90% of the world’s approximately 7,000 languages will become extinct by the year 2100. This is a crisis not only for the field of linguistics—on track to lose the majority of its primary data—but also a crisis for the social sciences more broadly as languages are a key piece of cultural heritage. The field of linguistics has responded with increased efforts to document endangered languages. Language documentation not only captures key linguistic data (both primary data and analytical facts) but also supports language revitalization efforts. It must include both primary data collection (as in Abney and Bird’s (2010) universal corpus) and analytical work elucidating the linguistic structures of each language. As such, the outputs of documentary linguistics are dictionaries, descriptive (prose) grammars as well as transcribed and translated texts (Woodbury, 2003).

Traditionally, these outputs were printed artifacts, but the field of documentary linguistics has increasingly realized the benefits of producing digital artifacts as well (Nordhoff and Poggeman, 2012). Bender et al. (2012a) argue that the documentary value of electronic descriptive grammars can be significantly enhanced by pairing them with implemented (machine-readable) precision grammars and grammar-derived treebanks. However, the creation of such precision grammars is time consuming, and the cost of developing them must be brought down if they are to be effectively integrated into language documentation projects.

In this work, we are interested in leveraging existing linguistic resources of two distinct types in order to facilitate the development of precision grammars for language documentation. The first type of linguistic resource is collections of interlinear glossed text (IGT), a typical format for displaying linguistic examples. A sample of IGT from Shona is shown in (1).