124.11.lec20 - CS 124/LINGUIST 180: From...

Dan Jurafsky
Lecture 20: Speech Recognition
6/1/11

The final exam
  Friday March 18, 12:15-3:15 in 370-370
  Open book and open note
  You won’t need a calculator
  Computers are ok to read, e.g., the slides and the textbooks, but no use of the internet on your laptop or any internet-aware devices, on the honor code
    i.e., open book and notes, but not open-web
  The problems will be very much like
Topics we covered
  http://www.stanford.edu/class/cs124/
Some classes in these areas
  cs276 Information Retrieval and Web Search, Nayak/Raghavan, Spring 2011
  cs224N Natural Language Processing, Manning, Spring 2012
  cs224W Social and Information Network Analysis, Leskovec (Winter 2011?)
  cs224U Natural Language Understanding, fall 2011 (or winter 2012)
  cs224S Speech Recognition, Understanding, Dialogue, Jurafsky, not taught next year
  ling284 History of Computational Linguistics, Jurafsky and Kay, winter 2011
  cs121 Intro to AI, Latombe
  cs221 Artificial Intelligence, Thrun or Ng, often Winter
  cs228 Structured Probabilistic Models, Koller
Speech
  speech recognition
  speech synthesis
  dialogue
  spoken sentiment extraction
  speaker/language id
Applications of Speech Recognition/Understanding (ASR/ASU)
  Dictation
  Telephone-based Information
    GOOG 411
    Directions, air travel, banking, etc
  “Google Voice” Voice mail transcription
  Hands-free (in car)
  Second language ('L2') (accent reduction)
  Audio archive searching and aligning
1/5/07
Speaker Recognition tasks
  Speaker Recognition
    Speaker Verification (Speaker Detection)
      Is this speech sample from a particular speaker?
      Is that Jane?
    Speaker Identification
      Which of this set of speakers does this speech sample come from?
      Who is that?
      Related tasks: Gender ID, Language ID
      Is this a woman or a man?
  Speaker Diarization
    Segmenting a dialogue or multiparty conversation
    Who spoke when?
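The difference between verification (a yes/no decision about one claimed speaker) and identification (a choice among N enrolled speakers) can be sketched as follows. This is only an illustration, not the method from the lecture: it assumes each speaker is summarized by a hypothetical fixed-length feature vector and that similarity is measured with cosine similarity. The names `verify` and `identify`, the threshold of 0.8, and the toy vectors are all made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(sample, claimed_model, threshold=0.8):
    """Speaker verification: binary decision against ONE claimed speaker.
    The threshold is an arbitrary illustrative value."""
    return cosine(sample, claimed_model) >= threshold

def identify(sample, models):
    """Speaker identification: pick the best match among N enrolled speakers."""
    return max(models, key=lambda name: cosine(sample, models[name]))

# Hypothetical enrolled speaker vectors and one incoming speech sample.
models = {"jane": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.3]}
sample = [0.85, 0.15, 0.25]

is_jane = verify(sample, models["jane"])   # "Is that Jane?"
who = identify(sample, models)             # "Who is that?"
```

Verification compares against a single model and needs a threshold; identification needs no threshold but always returns one of the enrolled speakers, even when none of them is a good match.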
Applications of Speaker Recognition and Language Recognition
  Language recognition for call routing
  Speaker Recognition:
    Speaker verification (binary decision)
      Voice password, telephone assistant
    Speaker identification (one of N)
      Criminal investigation
Speech synthesis
  Telephone dialogue systems
  Games
  The iPod shuffle
    http://www.apple.com/ipodshuffle/voiceover.ht
  Compare to state-of-the-art synthesis:
    http://www.research.att.com/~ttsweb/tts/dem
LVCSR
  Large Vocabulary Continuous Speech Recognition
  ~20,000-64,000 words
  Speaker independent (vs. speaker-dependent)
  Continuous speech (vs. isolated-word)
  Useful for:
    Dictation
    Voice-mail transcription
Outline for ASR
  ASR Tasks and Architecture
  Five easy pieces of an ASR system
    1) The Lexicon (An HMM with phones as hidden states)
    2) The Language Model
    3) The Acoustic Model (phone detector)
    4) Feature extraction (“MFCC”)
    5) HMM Stuff:
      1) Viterbi decoding
      2) EM (Baum-Welch) training
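To make the "HMM with phones as hidden states" and "Viterbi decoding" items concrete, here is a minimal sketch of Viterbi decoding over a toy two-phone HMM. It is an illustration under stated assumptions, not the lecture's system: the phones ("s", "iy"), the discrete observation symbols ("hiss", "voiced"), and all probabilities are invented for the example. A real recognizer would use continuous acoustic features such as MFCCs rather than two symbols.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence.
    Works in log space; assumes all probabilities used are nonzero."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at time t
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model (all values hypothetical): the phones of a word like "see".
states = ["s", "iy"]
start_p = {"s": 0.9, "iy": 0.1}
trans_p = {"s": {"s": 0.6, "iy": 0.4}, "iy": {"s": 0.1, "iy": 0.9}}
emit_p = {"s": {"hiss": 0.8, "voiced": 0.2}, "iy": {"hiss": 0.1, "voiced": 0.9}}

path = viterbi(["hiss", "hiss", "voiced", "voiced"], states, start_p, trans_p, emit_p)
```

For this toy input the decoded path assigns the two "hiss" frames to the phone "s" and the two "voiced" frames to "iy".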
Current error rates

Task               Vocabulary   Error Rate%
Digits             11           0.5
WSJ read speech    5K           3
WSJ read speech
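The slide only says "Error Rate%". Assuming it means word error rate (WER), the usual ASR metric, the figure is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)

# One substitution ("two" -> "too") and one deletion ("four"): 2 errors in 4 words.
wer = word_error_rate("one two three four", "one too three")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.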