LOOSELY COUPLED HMMS FOR ASR

H.J. Nock* and S.J. Young

Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK.
{hjn11,sjy}@eng.cam.ac.uk

ABSTRACT

Hidden Markov Models (HMMs) have been successful for modelling the dynamics of carefully dictated speech, but their performance degrades severely when used to model conversational speech. This paper presents a preliminary feasibility study of an alternative class of models: loosely coupled HMMs. Since speech is produced by a system of loosely coupled articulators, stochastic models explicitly representing this parallelism may have advantages for automatic speech recognition (ASR), particularly when trying to model the phonological effects inherent in casual spontaneous speech. The paper evaluates one coupled model on a simple ASR task, using both exact and approximate estimation schemes. We conclude such models merit further investigation.

1. INTRODUCTION

Hidden Markov Models (HMMs) have been successful for modelling the dynamics of carefully dictated speech. However, their performance degrades severely when they are used to model conversational speech, and it has been widely hypothesized that more sophisticated models will be required to achieve acceptable transcription performance on this type of data. This paper describes our preliminary investigations into an alternative class of models, which we describe informally as loosely coupled HMMs. Since speech is produced by a system of loosely coupled articulators, stochastic models which explicitly represent this parallelism may have advantages for automatic speech recognition (ASR), particularly when trying to model the phonological effects inherent in casual spontaneous speech.

Today's large-vocabulary recognizers are constructed using the notion of phonemic segments, corresponding (roughly) to particular configurations of the articulators.
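As a rough illustration of what explicitly representing articulatory parallelism can mean, the sketch below assumes two hidden chains (for example, two articulators) whose next states each depend on the previous states of both chains, and computes the observation likelihood exactly by summing over the joint (product) state space. The coupling structure, the single joint emission likelihood per frame, and the function name `coupled_forward` are assumptions made for illustration; this is not the paper's own model or its estimation schemes.

```python
from itertools import product


def coupled_forward(pi1, pi2, A1, A2, emit):
    """Exact forward pass for two coupled chains (hypothetical parameterisation).

    pi1[i], pi2[j] : initial state probabilities of chain 1 and chain 2
    A1[k][l][i]    : P(chain 1 moves to i | chain 1 was in k, chain 2 was in l)
    A2[k][l][j]    : P(chain 2 moves to j | chain 1 was in k, chain 2 was in l)
    emit[t][i][j]  : likelihood of frame t given the joint state (i, j)
    Returns the total likelihood of the observation sequence.
    """
    n1, n2 = len(pi1), len(pi2)
    states = list(product(range(n1), range(n2)))  # joint (product) state space

    # First frame: the two chains start independently.
    alpha = {(i, j): pi1[i] * pi2[j] * emit[0][i][j] for i, j in states}

    for t in range(1, len(emit)):
        new_alpha = {}
        for i, j in states:
            # Coupling: each chain's next state is conditioned on BOTH
            # chains' previous states, so we sum over every joint predecessor.
            total = sum(alpha[k, l] * A1[k][l][i] * A2[k][l][j] for k, l in states)
            new_alpha[i, j] = total * emit[t][i][j]
        alpha = new_alpha

    return sum(alpha.values())
```

With N1 and N2 states per chain, each frame of this naive recursion visits every pair of joint states, so it costs O((N1·N2)^2) per frame. Growth of this kind as chains are added is one reason approximate estimation schemes can be attractive for coupled models.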
We build one or more statistical models for each element of the resulting inventory of speech segments (the phone set) and model words as a simple concatenation of segments. However, both speech scientists and linguists agree that the notion of a phoneme or speech segment is not a realistic one. Whilst the phoneme concept may be adequate for carefully read speech, in which articulatory gestures correspond sufficiently closely to some abstract ideal, there is evidence from speech production studies showing that changes in speaking rate, manner and style can all lead to variation in the amplitude of and phase relations between articulatory gestures. These changes, ...

* H.J. Nock is supported by an EPSRC studentship. The authors thank Dr. M.J.F. Gales for helpful discussions.
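The conventional construction described above, in which a word model is a simple concatenation of phone models, can be sketched as follows. Each phone is assumed here to be a left-to-right HMM described only by the self-loop probabilities of its emitting states; the phone inventory, the example pronunciation, and the helper name `concat_phone_models` are hypothetical, and emission densities are omitted to keep the focus on topology.

```python
def concat_phone_models(phone_models, pronunciation):
    """Chain left-to-right phone HMMs into one word HMM (illustrative sketch).

    phone_models:  hypothetical dict, phone label -> self-loop probability of
                   each emitting state in that phone's model
    pronunciation: sequence of phone labels making up the word
    Returns (labels, trans): state labels and a row-stochastic transition
    matrix over the word's emitting states plus a final exit state.
    """
    labels, loops = [], []
    for phone in pronunciation:
        for idx, p_stay in enumerate(phone_models[phone]):
            labels.append(f"{phone}[{idx}]")
            loops.append(p_stay)
    labels.append("exit")

    n = len(loops)
    trans = [[0.0] * (n + 1) for _ in range(n + 1)]
    for s, p_stay in enumerate(loops):
        trans[s][s] = p_stay          # remain in the current state
        trans[s][s + 1] = 1 - p_stay  # advance; a phone's last state feeds the next phone (or exit)
    trans[n][n] = 1.0                 # absorbing exit state
    return labels, trans


# Hypothetical three-state models for a three-phone word.
labels, trans = concat_phone_models(
    {"k": [0.6, 0.5, 0.7], "ae": [0.8, 0.6, 0.5], "t": [0.4, 0.5, 0.6]},
    ["k", "ae", "t"],
)
# labels[:4] == ['k[0]', 'k[1]', 'k[2]', 'ae[0]']
```

The resulting state sequence is strictly linear: a single chain passes through one segment at a time, with no representation of overlapping articulatory gestures.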
This note was uploaded on 03/27/2010 for the course CS 123 taught by Professor Darghooz during the Spring '10 term at Albion College.