EM TRAINING OF FINITE-STATE TRANSDUCERS AND ITS APPLICATION TO PRONUNCIATION MODELING

Han Shu and I. Lee Hetherington
Spoken Language Systems Group
Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139 USA
{hshu,ilh}@sls.lcs.mit.edu

ABSTRACT

Recently, finite-state transducers (FSTs) have been shown to be useful for a number of applications in speech and language processing. FST operations such as composition, determinization, and minimization make manipulating FSTs very simple. In this paper, we present a method to learn weights for arbitrary FSTs using the EM algorithm. We show that this FST EM algorithm is able to learn pronunciation weights that improve the word error rate for a spontaneous speech recognition task.

1. INTRODUCTION

Recently, finite-state transducers (FSTs) have been shown to be useful for a number of applications in speech and language processing [1]. For example, the summit segment-based speech recognizer successfully utilizes FSTs to specify various constraints [2]. FST operations such as composition, determinization, and minimization make manipulating FSTs very simple. In this paper, we present a method to learn weights for arbitrary FSTs using the EM algorithm [3]. Our method is similar to that of [4]; however, we do not explicitly make use of the "expectation semiring."

To test the FST EM algorithm, we apply it to the problem of learning pronunciation weights. With phonological rules and multiple phonemic pronunciations, the pronunciation graph for spontaneous speech can have high branching factors. Pronunciation weighting has been shown to be beneficial for segment-based speech recognition [5]. In [5], within-word pronunciation weights were ML-estimated from training examples. In this paper, we experiment with learning various pronunciation weights on phonological rules and phonemic pronunciations via the proposed FST EM algorithm.
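To make the idea of EM weight learning concrete, the following is a minimal sketch (ours, not the paper's implementation) of the M-step only: given expected arc traversal counts accumulated during the E-step, each arc's new weight is its expected count normalized over all arcs leaving the same state. The function name `m_step` and the `(state, arc_label)` keying are illustrative assumptions.

```python
from collections import defaultdict

def m_step(expected_counts):
    """Re-estimate arc probabilities from E-step expected counts.

    expected_counts: dict mapping (state, arc_label) -> expected number of
    times that arc was traversed, summed over the training data.
    Returns a dict of the same keys with probabilities that sum to 1
    over the arcs leaving each state.
    """
    # Total expected mass leaving each state.
    totals = defaultdict(float)
    for (state, _), count in expected_counts.items():
        totals[state] += count
    # Normalize each arc's count by its source state's total.
    return {
        (state, label): count / totals[state]
        for (state, label), count in expected_counts.items()
        if totals[state] > 0
    }
```

For example, if an arc labeled "a" out of state q0 has expected count 3.0 and a competing arc "b" has 1.0, the re-estimated weights are 0.75 and 0.25.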
2. PROBABILISTIC INTERPRETATION OF FSTS

A weighted FST, T, assigns a weight, or score, to each complete path through it, where a path corresponds to a particular input and output label sequence (x, y). The interpretation of the weights depends on how they are manipulated algebraically, and the algebraic structure is a semiring.

2.1. Weight Semirings

A semiring (K, ⊕, ⊗, 0, 1) defines the set K containing the weights and the operators ⊕ and ⊗, with identity elements 0 and 1 such that for all a, b, c ∈ K: a ⊕ 0 = a, 0 ⊕ a = a, a ⊗ 1 = a, 1 ⊗ a = a, a ⊗ 0 = 0, 0 ⊗ a = 0, (a ⊕ b) ⊗ c = (a ⊗ c) ⊕ (b ⊗ c), and a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c) [1]. When manipulating weighted transducers, the ⊗ and ⊕ operators are used to combine weights in series and parallel, respectively. ...

(This research was supported by DARPA under contract N660019918904 monitored through Naval Command, Control and Ocean Surveillance Center.)
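As an illustration of these definitions (a sketch of ours, not code from the paper), here are two semirings commonly used with weighted FSTs: the probability semiring, where ⊕ is addition and ⊗ is multiplication, and the tropical semiring, where ⊕ is min and ⊗ is addition. Weights along one path combine in series with ⊗; alternative paths combine in parallel with ⊕.

```python
class ProbabilitySemiring:
    """(R+, +, *, 0, 1): sum parallel paths, multiply weights in series."""
    zero, one = 0.0, 1.0
    @staticmethod
    def oplus(a, b): return a + b
    @staticmethod
    def otimes(a, b): return a * b

class TropicalSemiring:
    """(R ∪ {inf}, min, +, inf, 0): negative-log weights; min picks the best path."""
    zero, one = float("inf"), 0.0
    @staticmethod
    def oplus(a, b): return min(a, b)
    @staticmethod
    def otimes(a, b): return a + b

def path_weight(sr, arc_weights):
    """Combine the weights along a single path in series with otimes."""
    w = sr.one
    for a in arc_weights:
        w = sr.otimes(w, a)
    return w

def total_weight(sr, paths):
    """Combine alternative paths in parallel with oplus."""
    t = sr.zero
    for p in paths:
        t = sr.oplus(t, path_weight(sr, p))
    return t
```

For instance, two parallel paths with arc probabilities (0.5, 0.4) and (0.5, 0.6) give a total probability of 0.2 + 0.3 = 0.5 in the probability semiring, while the same structure in the tropical semiring yields the minimum path cost.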
This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.