224s.09.lec6

CS 224S / LINGUIST 281: Speech Recognition, Synthesis, and Dialogue
  Dan Jurafsky
  Lecture 6: Waveform Synthesis (in Concatenative TTS)
  IP Notice: many of these slides come directly from Richard Sproat’s slides, and others (and some of Richard’s) come from Alan Black’s excellent TTS lecture notes. A couple are also from Paul Taylor.
Goal of Today’s Lecture
  Given:
    String of phones
    Prosody
      Desired F0 for entire utterance
      Duration for each phone
      Stress value for each phone, possibly accent value
  Generate:
    Waveforms
Outline: Waveform Synthesis in Concatenative TTS
  Diphone Synthesis
  Break: Final Projects
  Unit Selection Synthesis
    Target cost
    Unit cost
  Joining
    Dumb
    PSOLA
The hourglass architecture
Internal Representation: Input to Waveform Synthesis
Diphone TTS architecture
  Training:
    Choose units (kinds of diphones)
    Record 1 speaker saying 1 example of each diphone
    Mark the boundaries of each diphone, cut each diphone out, and create a diphone database
  Synthesizing an utterance:
    Grab the relevant sequence of diphones from the database
    Concatenate the diphones, doing slight signal processing at the boundaries
    Use signal processing to change the prosody (F0, energy, duration) of the selected sequence of diphones
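The lookup-and-concatenate step above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function names, the toy database, and the choice to pad the utterance with silence (`pau`) at both ends are all assumptions, and the boundary smoothing and prosody modification steps are left out.

```python
from typing import Dict, List, Tuple

Diphone = Tuple[str, str]


def phones_to_diphones(phones: List[str]) -> List[Diphone]:
    """Pair each phone with its successor, padding with silence ('pau')."""
    padded = ["pau"] + phones + ["pau"]  # padding is an assumption of this sketch
    return list(zip(padded, padded[1:]))


def synthesize(phones: List[str], database: Dict[Diphone, List[float]]) -> List[float]:
    """Look up each diphone and concatenate its samples, with no smoothing."""
    samples: List[float] = []
    for diphone in phones_to_diphones(phones):
        if diphone not in database:
            raise KeyError(f"diphone {diphone[0]}-{diphone[1]} missing from database")
        samples.extend(database[diphone])
    return samples


# Hypothetical toy database: each diphone maps to a few made-up sample values.
toy_db = {
    ("pau", "k"): [0.0, 0.1],
    ("k", "ae"): [0.2, 0.3],
    ("ae", "t"): [0.4],
    ("t", "pau"): [0.1, 0.0],
}

units = phones_to_diphones(["k", "ae", "t"])
wave = synthesize(["k", "ae", "t"], toy_db)
```

A real system would store recorded audio for each unit and would smooth the joins and adjust F0, energy, and duration after concatenation.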
Diphones
  Mid-phone is more stable than edge
Diphones
  Mid-phone is more stable than edge
  Need O(phone^2) number of units
    Some combinations don’t exist (hopefully)
    AT&T (Olive et al. 1998) system had 43 phones
      1849 possible diphones
      Phonotactics ([h] only occurs before vowels); don’t need to keep diphones across silence
      Only 1172 actual diphones
    May include stress, consonant clusters
      So could have more
    Lots of phonetic knowledge in design
  Database relatively small (by today’s standards)
    Around 8 megabytes for English (16 kHz, 16 bit)
  Slide from Richard Sproat
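The figures on this slide can be checked with a little arithmetic. The sketch below reproduces the 1849 count and estimates how much audio 8 megabytes holds. It assumes "8 megabytes" means 8,000,000 bytes of uncompressed mono audio, and the per-diphone average additionally assumes the size figure describes an inventory of about 1172 units, which the slide does not state.

```python
n_phones = 43
possible_diphones = n_phones ** 2        # every ordered pair of phones
actual_diphones = 1172                   # after phonotactic pruning (from the slide)
fraction_kept = actual_diphones / possible_diphones

db_bytes = 8 * 10**6                     # assumption: 8 MB = 8,000,000 bytes
sample_rate_hz = 16_000                  # 16 kHz
bytes_per_sample = 2                     # 16 bit, mono (mono is an assumption)
total_seconds = db_bytes / (sample_rate_hz * bytes_per_sample)

# Assumption: the 8 MB figure covers roughly the 1172-unit inventory.
seconds_per_diphone = total_seconds / actual_diphones

print(possible_diphones, round(fraction_kept, 2), total_seconds)
```

Under these assumptions the database holds about 250 seconds of speech, or roughly a fifth of a second per diphone.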
Voice
  Speaker: called a voice talent
  Diphone database: called a voice
Designing a diphone inventory: Nonsense words
  Build set of carrier words:
    pau t aa b aa b aa pau
    pau t aa m aa m aa pau
    pau t aa m iy m aa pau
    pau t aa m iy m aa pau
    pau t aa m ih m aa pau
  Advantages:
    Easy to get all diphones
    Likely to be pronounced consistently
    No lexical interference
  Disadvantages:
    (Possibly) bigger database
    Speaker becomes bored
  Slide from Richard Sproat
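The carrier words listed above all appear to follow one template, `pau t aa C V C aa pau`, which places the consonant-vowel and vowel-consonant diphones in the middle of the word. A minimal sketch of generating such prompts, assuming that template (inferred from the examples, not stated on the slide) and hypothetical helper names:

```python
from typing import Dict, List, Tuple


def carrier(consonant: str, vowel: str) -> List[str]:
    """Build one nonsense carrier word using the inferred 'pau t aa C V C aa pau' template."""
    return f"pau t aa {consonant} {vowel} {consonant} aa pau".split()


def target_diphones(consonant: str, vowel: str) -> List[Tuple[str, str]]:
    """The two diphones the carrier is meant to elicit: C-V and V-C."""
    return [(consonant, vowel), (vowel, consonant)]


# Small illustrative phone sets; a real inventory would cover the full phone set.
consonants = ["b", "m"]
vowels = ["aa", "iy", "ih"]

prompts: Dict[Tuple[str, str], List[str]] = {
    (c, v): carrier(c, v) for c in consonants for v in vowels
}

for (c, v), words in prompts.items():
    print(" ".join(words), "->", target_diphones(c, v))
```

Other diphone types (vowel-vowel, consonant-consonant, silence-initial) would need their own templates, which this sketch does not attempt.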
Designing a diphone inventory: Natural words
  Greedily select sentences/words:
    Quebecois arguments
    Brouhaha abstractions
    Arkansas arranging
  Advantages:
    Will be pronounced naturally
    Easier for speaker to pronounce
    Smaller database? (505 pairs vs. 1345 words)
  Disadvantages:
    May not be pronounced correctly
  Slide from Richard Sproat
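The slide does not say how the greedy selection works. One common reading is a greedy set cover: repeatedly pick the word that adds the most diphones not yet covered. The sketch below illustrates that idea under this assumption; the function names and the toy lexicon with its pronunciations are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones_of(phones: List[str]) -> Set[Diphone]:
    """All adjacent phone pairs inside one word."""
    return set(zip(phones, phones[1:]))


def greedy_select(lexicon: Dict[str, List[str]]) -> List[str]:
    """Pick words one at a time, each time taking the word with the most uncovered diphones."""
    remaining: Set[Diphone] = set().union(*(diphones_of(p) for p in lexicon.values()))
    chosen: List[str] = []
    while remaining:
        # Ties go to the alphabetically first word, since max keeps the first maximum.
        best = max(sorted(lexicon), key=lambda w: len(diphones_of(lexicon[w]) & remaining))
        gain = diphones_of(lexicon[best]) & remaining
        if not gain:
            break
        chosen.append(best)
        remaining -= gain
    return chosen


# Hypothetical toy lexicon; the pronunciations are simplified for illustration.
toy_lexicon = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
    "tack": ["t", "ae", "k"],
    "act": ["ae", "k", "t"],
}

chosen = greedy_select(toy_lexicon)
print(chosen)
```

Here "at" is never chosen because its only diphone is already covered by "cat", which is how a greedy pass can keep the recording list short.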