bocchieri93vectorquantization - VECTOR QUANTIZATION FOR THE...

VECTOR QUANTIZATION FOR THE EFFICIENT COMPUTATION OF CONTINUOUS DENSITY LIKELIHOODS

Enrico Bocchieri
Speech Research Dept., AT&T Bell Laboratories, MH-2C 568, Murray Hill, N.J. 07974

ABSTRACT

In speech recognition systems based on Continuous Observation Density Hidden Markov Models, the computation of the state likelihoods is an intensive task. This paper presents an efficient method for computing the likelihoods defined by weighted sums (mixtures) of Gaussians. The proposed method uses vector quantization of the input feature vector to identify a subset of Gaussian neighbors. It is shown that, under certain conditions, instead of computing the likelihoods of all the Gaussians, one needs to compute only the likelihoods of the Gaussian neighbors. Significant reductions in likelihood computation (up to a factor of nine) have been obtained on various databases, with only a small loss of recognition accuracy.

1. MOTIVATION AND ALGORITHM OVERVIEW

In speech recognition algorithms based on Hidden Markov Models (HMMs), the state likelihoods of the input observation vectors are typically represented by either discrete or parametric (continuous) distributions. In the continuous case, the state likelihood computation is an intensive task that requires up to 96% of the total computation in a typical small-vocabulary application (connected digit recognition), and more than 80% of the total computation in a large-vocabulary system based on beam search of the state path (DARPA Resource Management). The likelihood computation is also intensive for tied-mixture systems, as discussed in [1]. Therefore, an efficient likelihood computation method can significantly reduce the computational requirements of a recognition system. In particular, we have studied an efficient algorithm for the computation of state likelihoods represented by Gaussian mixtures.
The generic Gaussian component likelihood of the frame feature vector f_t is denoted by:

    G(f_t, μ_m, U_m),   m = 1, …, M                                   (1)

where μ_m and U_m are the mean vector and diagonal covariance of the m-th Gaussian component, respectively, and M is the total number of mixture components over all the states of all the words (speech units) in the vocabulary. The likelihood l_s(f_t) of observation f_t, given a certain state s, is computed as:

    l_s(f_t) = Σ_{m ∈ M_s} c_m G(f_t, μ_m, U_m),   Σ_{m ∈ M_s} c_m = 1   (2)

where c_m is the mixture weight and M_s is the subset of mixture components (1) that belong to state s. It is well known that the Gaussian models (1) are statistically accurate only if the input feature vector is near the Gaussian means. The Gaussian model provides at best a poor approximation of the likelihood (and a small likelihood contribution) when the feature vector falls on the tail of its distribution (outlier feature vector). The proposed method computes only the likelihoods of those Gaussians for which the input vector is not an outlier. During system training, all the mixture components (1) are clustered into neighborhoods. A vector quantizer, consisting of one
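The mixture computation of equations (1)–(2), together with the vector-quantization shortlist idea described above, can be sketched as follows. This is a minimal illustration only, assuming diagonal covariances as stated in the text; the function and variable names (log_gaussian_diag, state_likelihood, quantize, the neighbor list) are hypothetical and not taken from the original implementation.

```python
import numpy as np

def log_gaussian_diag(f, mean, var):
    """Log-likelihood of feature vector f under one diagonal-covariance
    Gaussian component, as in equation (1)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (f - mean) ** 2 / var)

def state_likelihood(f, weights, means, vars_, active):
    """Weighted mixture likelihood of equation (2), summed over only the
    component indices in 'active'. Passing all indices gives the full
    (exact) likelihood; passing a neighbor shortlist gives the pruned one."""
    return sum(weights[m] * np.exp(log_gaussian_diag(f, means[m], vars_[m]))
               for m in active)

def quantize(f, codebook):
    """Nearest codeword index (Euclidean distance) for the input frame;
    the precomputed neighbor list of that codeword would then select
    which Gaussians to evaluate."""
    return int(np.argmin(np.sum((codebook - f) ** 2, axis=1)))
```

Because every mixture term is non-negative, the pruned sum is a lower bound on the full likelihood; the paper's claim is that, for non-outlier components, the neglected terms are negligibly small.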