SRILM - AN EXTENSIBLE LANGUAGE MODELING TOOLKIT

Andreas Stolcke
Speech Technology and Research Laboratory
SRI International, Menlo Park, CA, U.S.A.
http://www.speech.sri.com/

ABSTRACT

SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.

1. INTRODUCTION

Statistical language modeling is the science (and often art) of building models that estimate the prior probabilities of word strings. Language modeling has many applications in natural language technology and other areas where sequences of discrete objects play a role, with prominent roles in speech recognition and natural language tagging (including specialized tasks such as part-of-speech tagging, word and sentence segmentation, and shallow parsing). As pointed out in [1], the main techniques for effective language modeling have been known for at least a decade, although one suspects that important advances are possible, and indeed needed, to bring about significant breakthroughs in the application areas cited above; such breakthroughs have just been very hard to come by [2, 3].

Various software packages for statistical language modeling have been in use for many years, since the basic algorithms are simple enough that one can implement them with reasonable effort for research use. One such package, the CMU-Cambridge LM toolkit [1], has been in wide use in the research community and has greatly facilitated the construction of language models (LMs) for many practitioners.

This paper describes a fairly recent addition to the set of publicly available LM tools, the SRI Language Modeling Toolkit (SRILM). Compared to existing LM tools, SRILM offers a programming interface and an extensible set of LM classes, several non-standard LM types, and more comprehensive functionality that goes beyond language modeling to include tagging, N-best rescoring, and other applications. This paper describes the design philosophy and key implementation choices in SRILM, summarizes its capabilities, and concludes by discussing deficiencies and plans for future development. For lack of space we must refer to other publications for an introduction to language modeling and its role in speech recognition and other areas [3, 4].
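
To make the notion of estimating the prior probability of a word string from N-gram statistics concrete, the following is a minimal illustrative sketch of a bigram model with add-one smoothing over a closed vocabulary. It is not SRILM code and does not use the SRILM programming interface; the function names `train_bigram` and `sentence_log_prob`, the boundary markers `<s>` and `</s>`, and the choice of add-one smoothing are all assumptions made only for this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect bigram statistics from tokenized sentences (lists of words).

    Hypothetical helper for illustration; not part of any toolkit.
    """
    history_counts = Counter()  # how often each word occurs as a history
    bigram_counts = Counter()   # how often each (history, word) pair occurs
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    # Words that can be predicted: every training word plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return history_counts, bigram_counts, vocab


def sentence_log_prob(words, history_counts, bigram_counts, vocab):
    """Natural-log probability of a word string under the smoothed bigram model.

    Uses the chain rule with a one-word history and add-one smoothing,
    assuming every word in `words` belongs to the closed vocabulary.
    """
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = history_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    model = train_bigram(corpus)
    print(sentence_log_prob(["the", "cat", "sat"], *model))
    print(sentence_log_prob(["sat", "cat", "the"], *model))
```

Under these assumptions, a word order seen in training receives a higher log probability than a scrambled one. Practical toolkits use more refined smoothing methods than add-one and handle out-of-vocabulary words explicitly.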