MULTI-MICROPHONE CORRELATION-BASED PROCESSING FOR ROBUST SPEECH RECOGNITION

Thomas M. Sullivan and Richard M. Stern
Department of Electrical and Computer Engineering
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

ABSTRACT

In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These operations provide an estimate of the energy contained in the speech signal in each frequency band, and provide rejection of off-axis jamming noise sources. We demonstrate that this method increases recognition accuracy for a multi-channel signal compared to equivalent processing of a monaural signal.

1. INTRODUCTION

The need for speech recognition systems and spoken language systems to be robust with respect to their acoustical environment has become more widely appreciated in recent years. Results of several studies have demonstrated that even automatic speech recognition systems that are designed to be speaker independent can perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained, even in a relatively quiet office environment (e.g. [1]). Applications such as speech recognition over telephones, in automobiles, on a factory floor, or outdoors demand an even greater degree of environmental robustness. This paper describes a novel algorithm for combining the outputs of multiple microphones that improves the recognition accuracy of automatic speech recognition systems.

Several different types of array processing strategies have been applied to speech recognition systems. The simplest such system is the delay-and-sum beamformer (e.g. [2]). In delay-and-sum systems, steering delays are applied at the outputs of the microphones to compensate for differences in the arrival time of a desired signal between microphones, reinforcing the desired signal over other signals present. A second approach is to use an adaptive algorithm based on minimizing mean square energy, such as the Frost or the Griffiths-Jim algorithm [3]. These algorithms can provide nulls in the direction of undesired noise sources, as well as greater sensitivity in the direction of the desired signal, but they assume that the desired signal is statistically independent of all sources of degradation. Consequently, they do not perform well in environments where the distortion is at least in part a delayed version of the desired speech signal, as is the case in many typical reverberant rooms (e.g. [4]). (This problem can be avoided by adapting only during non-speech segments [5].)
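The delay-and-sum idea can be sketched in a few lines. This is an illustrative toy rather than any of the cited systems: it assumes integer-sample steering delays and uses a circular shift (np.roll) as a crude stand-in for proper fractional-delay filtering, and the function name is ours, not from the literature.

```python
import numpy as np

def delay_and_sum(mics, delays_samples):
    """Advance each channel by its steering delay (in samples) and average.
    The delays compensate the arrival-time differences of the desired
    source, so it adds coherently while off-axis sounds do not."""
    out = np.zeros(len(mics[0]))
    for x, d in zip(mics, delays_samples):
        out += np.roll(x, -d)  # circular shift: a toy stand-in for a true delay
    return out / len(mics)
```

With M microphones, the aligned desired signal adds in amplitude while uncorrelated noise adds in power, giving up to an M-fold SNR gain in power for spatially white noise.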
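The Griffiths-Jim structure mentioned above can likewise be caricatured for two microphones. The sketch below is a hypothetical minimal version, not the published algorithm: a fixed path averages the (already steered) channels, a difference signal blocks the desired source to form a noise reference, and a plain LMS filter removes whatever in the fixed path correlates with that reference. Function name, tap count, and step size are all assumptions for the example.

```python
import numpy as np

def griffiths_jim_2mic(x1, x2, n_taps=16, mu=0.01):
    """Toy two-microphone generalized sidelobe canceller (Griffiths-Jim style).
    Assumes the desired source arrives identically at both (pre-steered) mics,
    so it cancels exactly in the blocking path b and survives in the fixed path d."""
    d = 0.5 * (x1 + x2)   # fixed beamformer output: desired signal + residual noise
    b = x1 - x2           # blocking-matrix output: noise reference, desired removed
    w = np.zeros(n_taps)
    y = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        u = b[n - n_taps:n][::-1]   # recent noise-reference samples, newest first
        e = d[n] - np.dot(w, u)     # fixed path minus adaptive noise estimate
        w += mu * e * u             # LMS weight update
        y[n] = e
    return y
```

As the introduction notes, this works only while the desired signal is absent from the noise reference; reverberation leaks desired speech into b and makes the canceller attack the target itself.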
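The correlation-based front end described in the abstract, bandpass filtering, nonlinear rectification, then cross-correlation of the channel outputs within each band, can be sketched as below. This is an assumption-laden toy, not the authors' implementation: it assumes exactly two microphones, a crude FFT-mask bandpass in place of a real filterbank, half-wave rectification as the nonlinearity, and only the zero-lag cross-correlation as the per-band energy estimate.

```python
import numpy as np

def bandpass(x, lo, hi, fs):
    """Crude FFT-mask bandpass: zero every bin outside [lo, hi] Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def band_energy_estimate(mics, bands, fs):
    """For each band: filter each of two channels, half-wave rectify,
    then cross-correlate the channels at zero lag.  A source common to
    both microphones (on-axis speech) correlates across channels; an
    off-axis jammer that differs between channels contributes little."""
    energies = []
    for lo, hi in bands:
        rect = [np.maximum(bandpass(m, lo, hi, fs), 0.0) for m in mics]
        energies.append(float(np.mean(rect[0] * rect[1])))  # zero-lag correlation
    return energies
```

The per-band energies produced this way can then feed a recognizer's feature computation in place of single-channel band energies, which is the spirit of the method the paper evaluates.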