10.1.1.51.5434 - Multi-Microphone Correlation-Based...

Multi-Microphone Correlation-Based Processing for Robust Automatic Speech Recognition

by Thomas M. Sullivan

Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213

Submitted to the Department of Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy.

August 1996
Abstract . . . 5
Acknowledgments . . . 6
Chapter 1. Introduction . . . 8
  1.1. The Cross-Condition Problem . . . 8
  1.2. Thesis Statement . . . 10
  1.3. Thesis Overview . . . 10
Chapter 2. Background . . . 12
  2.1. Delay-and-Sum Beamforming . . . 12
    2.1.1. Application of Delay-and-Sum Processing to Speech Recognition . . . 13
  2.2. Traditional Adaptive Arrays . . . 13
    2.2.1. Adaptive Noise Cancelling . . . 15
    2.2.2. Application of Traditional Adaptive Methods to Speech Recognition . . . 16
  2.3. Cross-Correlation Based Arrays . . . 18
    2.3.1. Phenomena . . . 20
    2.3.2. Binaural Models . . . 20
  2.4. Other Multi-Channel Processing Systems . . . 23
    2.4.1. Binaural Dereverberation . . . 23
    2.4.2. Binaural Processing Systems . . . 23
    2.4.3. Sub-Band Multi-Channel Processing . . . 23
    2.4.4. Recent Binaural Methods . . . 24
  2.5. Monophonic Enhancement Techniques . . . 24
    2.5.1. Spectral Subtraction . . . 25
    2.5.2. Environmental Normalization . . . 26
    2.5.3. Homomorphic Processing . . . 26
  2.6. Summary . . . 27
  2.7. Goals for this Thesis . . . 27
Chapter 3. The SPHINX-I Speech Recognition System . . . 29
  3.1. An Overview of the SPHINX-I System . . . 29
  3.2. Signal Processing . . . 29
  3.3. Vector Quantization . . . 30
  3.4. Hidden Markov Models . . . 31
  3.5. System Training . . . 33
Chapter 4. Pilot Experiment Using an Existing Delay-and-Sum Array . . . 35
  4.1. Pilot Experiment with a Working Array . . . 35
    4.1.1. Experimental Procedure . . . 35
    4.1.2. Array System Description . . . 36
    4.1.3. A Discussion on Spatial Aliasing . . . 37
    4.1.4. Experimental Results . . . 38
Chapter 5. An Algorithm for Correlation-Based Processing of Speech . . . 40
  5.1. The Correlation-Based Array Processing Algorithm . . . 40
  5.2. Details of the Correlation-Based Processing Algorithm . . . 42
    5.2.1. Sensors and Spacing . . . 42
    5.2.2. Steering Delays . . . 42
    5.2.3. Filterbank . . . 43
    5.2.4. Rectification . . . 46
    5.2.5. Correlation . . . 47
    5.2.6. Feature Vector . . . 48
  5.3. Summary . . . 48
Chapter 6. Pilot Experiments Using Correlation-Based Processing . . . 49
  6.1. Amplitude Ratios of Tones Plus Noise through Basic Bandpass Filters . . . 49
  6.2. Amplitude Ratios of Tones Plus Noise through Auditory Filters . . . 56
  6.3. Spectral Profiles of Speech Corrupted with Noise . . . 56
  6.4. Summary . . . 59
Chapter 7. Speech Recognition Experiments . . . 61
  7.1. Effects of Component Choices and Parameter Values on Recognition Accuracy . . . 62
    7.1.1. Effect of Rectifier Shape . . . 62
    7.1.2. Effect of the Number of Input Channels . . . 66
    7.1.3. Implementation of Steering Delays . . . 67
    7.1.4. Effects of Microphone Spacing and the Use of Interleaved Arrays . . . 70
    7.1.5. Effect of the Shape of the Peripheral Filterbank . . . 74
    7.1.6. Reexamination of Rectifier Shape . . . 75
    7.1.7. Summary . . . 77
  7.2. Comparison of Recognition Accuracy Obtained with Cross-Correlation Processing, Delay-and-Sum Beamforming, and Traditional Adaptive Filtering . . . 78
    7.2.1.

