zue04eightychallenges

Zue. Speech Input/Output Technologies

Eighty Challenges Facing Speech Input/Output Technologies

Victor Zue
MIT Computer Science and Artificial Intelligence Laboratory
Cambridge, MA, USA
[email protected]

ABSTRACT

During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. Despite these successes, we are far from reaching human capabilities of recognizing nearly perfectly the speech spoken by many speakers, under varying acoustic environments, with essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robot-like, lacking in real personality and emotion. There are many challenges that will remain unmet unless we can advance our fundamental understanding of human communication – how speech is produced and perceived, utilizing our innate linguistic competence. This paper outlines some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculates on how these challenges could be met.

1. INTRODUCTION

During the past three decades, we have witnessed remarkable progress in the development of speech input/output technologies. The ability to speak and listen to computers, as if they were human, no longer exists only in Hollywood fantasies and advanced research laboratories. Speech recognition error rates continue to fall steadily even as task complexity increases, and the quality and intelligibility of computer-generated speech continue to improve. Today, our lives are touched almost daily by systems that allow us to dial phone numbers, issue verbal commands, perform transactions, or even dictate a letter, all using the devices we are born with.

Despite these successes, we are far from reaching human capabilities of recognizing nearly perfectly the speech spoken by many speakers, under varying acoustic environments, with essentially unrestricted vocabulary. Synthetic speech still sounds stilted and robot-like, lacking in real personality and emotion. How are we going to reach nirvana – enjoying truly anthropomorphic interfaces that can deal with us on our terms, using human language technologies? While mathematical formalisms, data collection from humans, and rigorous performance evaluations are key ingredients, there are many challenges that will remain unmet unless we can advance our fundamental understanding of human communication – how speech is produced and perceived, utilizing our innate linguistic competence. In this paper, I will outline some of these challenges, ranging from signal presentation and lexical access to language understanding and multimodal integration, and speculate on how these challenges could be met.

2. FUNDAMENTAL CHALLENGES

This note was uploaded on 05/08/2010 for the course CS 6.345 taught by Professor Glass during the Spring '10 term at MIT.
