summit-tutorial - Massachusetts Institute of Technology...

Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345 / HST.728 Automatic Speech Recognition
Spring, 2007

SUMMIT Speech Recognizer – Tutorial¹

1 BUILDING A RECOGNIZER USING SUMMIT: A TUTORIAL

This set of tutorial documents describes the steps required to build a recognizer using the MIT SUMMIT speech recognition system. SUMMIT is a probabilistic, segment-based recognizer implemented within a finite-state transducer (FST) framework. It resides inside a research software environment called SAPPHIRE.

The tutorial documentation is broken into the sections listed below. Please read each section carefully.

Please also note that the documentation often uses italics to refer to variable filenames, i.e., filenames that change depending on the domain of the recognizer or the personal choice of the user. The most common use of this convention is the variable 'domain'. This variable is set in the … as the default name of the domain for which you are building your recognizer. For example, the filename domain.espec would be realized as pegasus.espec in the PEGASUS domain.

2 OVERVIEW OF SUMMIT

2.1 The SLS SAPPHIRE Package

SUMMIT is an automatic speech recognition system that runs within a software package called SAPPHIRE. SUMMIT uses acoustic models, pronunciation models, and language models within a finite-state transducer search mechanism to determine sentence hypotheses for spoken utterances.

To get started, the environment variable SLS_HOME must be set to point to the location of the 'sls' directory tree on your fileserver. For example:

    setenv SLS_HOME /usr/Galaxy/sls

Next, the directory $SLS_HOME/bin should be added to your executable path. For example:

    set path = ( $SLS_HOME/bin $path )

It is advisable to add these commands to your shell initialization file (e.g., your .cshrc file).

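The two environment settings from section 2.1 can be double-checked from the same shell session with a short script. The sketch below is hypothetical: check_sls_env is not a SAPPHIRE or SUMMIT tool, and Python is used only for illustration. It simply verifies that SLS_HOME names an existing directory and that $SLS_HOME/bin appears on PATH.

```python
import os


def check_sls_env(env=None):
    """Return a list of problems with the SLS_HOME/PATH setup (empty if none).

    Hypothetical helper, not part of SAPPHIRE. `env` defaults to the current
    process environment; pass a dict to check some other environment.
    """
    env = os.environ if env is None else env
    sls_home = env.get("SLS_HOME")
    if not sls_home:
        return ["SLS_HOME is not set"]

    problems = []
    if not os.path.isdir(sls_home):
        problems.append(f"SLS_HOME={sls_home} is not a directory")

    # In csh/tcsh the `path` shell variable is kept in sync with the exported
    # PATH variable, so inspecting PATH is enough to see `set path = (...)`.
    bin_dir = os.path.normpath(os.path.join(sls_home, "bin"))
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_sls_env()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("SLS_HOME and PATH look consistent")
```

Run from the csh session in which the setenv and set path commands were executed, the script should report no problems. If it reports that SLS_HOME is not set, the commands may not have been added to .cshrc, or the shell may not have been restarted since they were added.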
2.2 Organizing Data

During training and testing, the SUMMIT recognizer uses stored speech waveforms. These waveform files can be stored in either the Microsoft RIFF format or the NIST SPHERE format.

¹ This document is adapted from online documentation released with the MIT SLS GALAXY system in 2002. Despite best efforts, some sections may be out of date.
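
Because waveforms in section 2.2 may arrive in either RIFF or NIST SPHERE format, it can help to know which one a given file uses. As a sketch, assuming only the standard header layouts of the two formats (a RIFF/WAVE file begins with the bytes RIFF, a 4-byte length, and WAVE; a SPHERE file begins with an ASCII header whose first line is NIST_1A), the format can be guessed from the first few bytes. The function name sniff_waveform_format is hypothetical and not part of SUMMIT or SAPPHIRE.

```python
import sys


def sniff_waveform_format(path):
    """Guess the container format of a waveform file from its first bytes.

    Returns "riff" for a RIFF/WAVE file (b"RIFF", 4-byte size, b"WAVE"),
    "sphere" for a NIST SPHERE file (header starts with the line "NIST_1A"),
    and "unknown" otherwise. Hypothetical helper, not a SAPPHIRE tool.
    """
    with open(path, "rb") as f:
        head = f.read(12)  # enough for both magic signatures
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "riff"
    if head.startswith(b"NIST_1A"):
        return "sphere"
    return "unknown"


if __name__ == "__main__":
    # Usage: python sniff_waveform.py file1.wav file2.sph ...
    for name in sys.argv[1:]:
        print(f"{name}: {sniff_waveform_format(name)}")
```

Only the magic bytes are inspected, so the check is cheap but says nothing about sample rate, encoding, or whether the rest of the header is valid.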

Orthographic transcriptions of the waveforms are required to train and test a recognizer. Details about our methods and tools for transcribing data are provided in section 3. To facilitate training and testing, a database of recorded speech, commonly called a corpus, can be constructed. The corpus records information about each speech waveform and organizes the data into sets specified by the creator of the corpus. Details about how to create a corpus can be found in section 4. To determine the list of utterances used by SAPPHIRE for any particular job, a control file must be

This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.
