Ling250-16-SpeechTech2-ASR-S16

Linguistics 250: Sound Patterns in Human Language
May 26, 2016
Speech technology: Automatic speech recognition

The Blizzard Challenge 2016
  • Online experiment: rate current speech synthesis systems!
  • http://groups.inf.ed.ac.uk/blizzard/blizzard2016/english/register-er.html

Last time
  • Text-to-speech systems (computer-generated speech production)
  • Two broad types:
      • Parametric synthesis: starting from features or properties of speech
          • Synthesis-by-rule: an acoustic approach
          • Articulatory synthesis: an articulatory approach
      • Concatenative synthesis: starting from real speech samples
  • Challenges to natural-sounding synthetic speech production

Today: Automatic Speech Recognition
  • Automatic speech recognition (ASR) takes place when a computer recognizes spoken utterances (e.g., can transcribe what we say).
  • Recognition is different from understanding: the computer recognizes the words that make up the input utterance, but not necessarily their meaning.
  • But understanding is also necessary to properly recognize speech (e.g., think about homophones or syntactically ambiguous sentences).

Many of these slides are modified from: MIT OpenCourseWare: Home > Courses > Electrical Engineering and Computer Science > Automatic Speech Recognition > Lecture Notes

Synthesis vs. Recognition
  • Synthesis is simpler?
  • Synthesis is limited to the generation of speech from one human-like voice and to a limited set of materials to be ‘spoken’.
  • Recognition is more open-ended. Unpredictability and variation include:
      • the words that are spoken
      • the accent(s) of the speaker(s)
      • the individual characteristics of a speaker’s voice (related to gender, size, age, etc.)

How does ASR work?
  • Speech recognition involves pattern matching: the incoming speech signal is matched against stored patterns of speech sounds.
  • Stored speech sounds: The computer stores acoustic information (spectra = amplitude by frequency) for all possible speech sounds, pronounced in different contexts.
  • Incoming speech analysis: The incoming speech signal is analyzed into spectra taken at regular intervals (e.g., every 10 ms).
  • Pattern matching: The computer compares the incoming spectra to those stored and comes up with the probabilities that each incoming spectrum has of being a match for each stored spectrum.
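The three steps above (stored spectra, spectra taken every 10 ms, probabilistic matching) can be sketched as a toy example. Everything in it is an illustrative assumption rather than the slides' actual method: the two stored "sounds" are pure tones with the hypothetical labels "low" and "high", the spectra come from a naive discrete Fourier transform, and the probabilities come from a softmax over negative Euclidean distance. Real recognizers use much richer acoustic features and statistical models.

```python
import cmath
import math

def split_frames(signal, sample_rate, step_ms=10):
    """Cut the signal into consecutive, non-overlapping frames of step_ms each."""
    step = int(sample_rate * step_ms / 1000)
    return [signal[i:i + step] for i in range(0, len(signal) - step + 1, step)]

def spectrum(frame):
    """Amplitude by frequency: naive DFT magnitudes, scaled to unit length."""
    n = len(frame)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0  # guard against all-zero frames
    return [m / norm for m in mags]

def match_probabilities(incoming, stored):
    """Assumed scoring rule: softmax over negative Euclidean distance."""
    dists = {label: math.dist(incoming, ref) for label, ref in stored.items()}
    closest = min(dists.values())  # shift so the best match has weight 1 (numerically stable)
    weights = {label: math.exp(closest - d) for label, d in dists.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def tone(freq_hz, n_samples, sample_rate):
    """Hypothetical stand-in for a speech sound: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]

SAMPLE_RATE = 8000  # assumed sampling rate; one 10 ms frame is then 80 samples

# Stored speech sounds: one reference spectrum per (toy) sound.
stored = {
    "low": spectrum(tone(300, 80, SAMPLE_RATE)),
    "high": spectrum(tone(2000, 80, SAMPLE_RATE)),
}

# Incoming speech: 20 ms of the low tone followed by 20 ms of the high tone.
incoming = tone(300, 160, SAMPLE_RATE) + tone(2000, 160, SAMPLE_RATE)

for i, frame in enumerate(split_frames(incoming, SAMPLE_RATE)):
    probs = match_probabilities(spectrum(frame), stored)
    best = max(probs, key=probs.get)
    print(f"{i * 10} ms: best match = {best}, probabilities = {probs}")
```

Each 10 ms frame gets a probability for every stored spectrum rather than a single hard decision, which mirrors the pattern-matching step described above.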