186 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 1, JANUARY 2008

Unsupervised Pattern Discovery in Speech

Alex S. Park, Member, IEEE, and James R. Glass, Senior Member, IEEE

Abstract— We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.

Index Terms— Speech processing, unsupervised pattern discovery, word acquisition.

I. INTRODUCTION

Over the last several decades, significant progress has been made in developing automatic speech recognition (ASR) systems which are now capable of performing large-vocabulary continuous speech recognition. In spite of this progress, the underlying paradigm of most approaches to speech recognition has remained the same.
The problem is cast as one of classification, where input data (speech) is segmented and classified into a preexisting set of known categories (words). Discovering where these word entities come from is typically not addressed. This problem is of interest to us because it represents a key difference in the language processing strategies employed by humans and machines. Equally important, it raises the question of how much can be learned from speech data alone, in the absence of supervised input.

In this paper, we propose a computational technique for extracting words and linguistic entities from speech without supervision. The inspiration for our unsupervised approach to speech processing comes from two sources. The first source comes

Manuscript received January 10, 2007; revised July 26, 2007. This work was supported by the National Science Foundation under Grant #IIS-0415865. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Helen Meng....
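The abstract refers to a segmental variant of a widely used dynamic programming technique for finding matching acoustic patterns between utterances. As a rough illustration of that family of techniques, the sketch below implements classic dynamic time warping (DTW) between two sequences of feature vectors. This is an assumption made for illustration: it is the standard whole-sequence algorithm, not the authors' segmental variant, and the function name `dtw_distance` and the Euclidean frame distance are hypothetical choices that do not come from the paper.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    Illustrative only: this aligns the two sequences end to end, whereas the
    paper describes a segmental variant for finding matching subsequences.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimum accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local frame distance (Euclidean, an assumed choice)
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two toy one-dimensional "utterances"; the second holds its middle frame longer.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(x, y))
```

Because the warping path may repeat a frame, the two toy sequences align with zero accumulated cost even though their lengths differ, which is the property that makes this kind of alignment useful for comparing speech spoken at different rates.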
This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.