20065ee113D_1_exp_4 - UCLA Electrical Engineering Professor...


UCLA Electrical Engineering
Professor Jain
EE 113D
TA Delbert Huang

Experiment 4
Introduction to Speech Synthesis

Purpose

To design a speech synthesizer that will produce vowel sounds on the TMS320C542 DSKplus board.

Introduction

A Simple Vowel Synthesizer

It is well known that speech signals can be generated with either analog hardware or software implementations. Software implementations are more advantageous since they are more flexible and the signal-to-noise ratio (SNR) can be as large as desired. One of the simplest ways to produce speech signals is to use the output of a linear system that models the vocal tract, driven by a suitable input. In the case of vowels, this input is modelled as a train of impulses. The distance between these impulses controls the pitch. Pitch is one of the characteristics of a voiced sound that makes the sounds produced by a male differ from those produced by a female or a child. Typically, pitch is higher for female and child speakers than it is for males, as shown in Table 1.

Speaker   Average (Hz)   Minimum (Hz)   Maximum (Hz)
Male      125            80             200
Female    225            150            350
Child     300            200            500

Table 1: Pitch frequencies (f) for different types of speakers

The vocal tract can be thought of as cascaded sections of resonators, where each section can be modelled digitally using a simple second-order all-pole IIR filter. Fig. 1 shows a simple vowel synthesizer based on a linear system with three resonators (R1, R2 and R3) that model three formant frequencies, and an impulse generator that controls the pitch frequency f.

+-------------------+      +------+   +------+   +------+
| Impulse Generator |----->|  R1  |-->|  R2  |-->|  R3  |--> s(n)
+-------------------+      +------+   +------+   +------+

Fig 1. Digital model for speech production.

Vowels are distinguished mainly by the values of the first three formant frequencies and bandwidths, as shown in Table 2.
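The model in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the assembly code the experiment asks for: the 8 kHz sampling rate is assumed (the handout does not state one), the coefficient formulas are the common textbook mapping from formant frequency and bandwidth to a two-pole resonator, and all function names are hypothetical.

```python
import math

FS = 8000  # ASSUMED sampling rate in Hz; not stated in the handout


def impulse_train(pitch_hz, n_samples, fs=FS):
    """Unit impulses spaced one pitch period apart (period rounded to whole samples)."""
    period = round(fs / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator_coeffs(formant_hz, bandwidth_hz, fs=FS):
    """Textbook two-pole mapping (an assumption here): pole radius from BW, angle from F."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * formant_hz / fs
    # Coefficients for y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
    return 2.0 * r * math.cos(theta), -r * r


def resonator(x, a1, a2):
    """Second-order all-pole IIR section applied to the sequence x."""
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def synthesize_vowel(formants, bandwidths, pitch_hz, n_samples, fs=FS):
    """Impulse generator followed by the cascade R1 -> R2 -> R3."""
    s = impulse_train(pitch_hz, n_samples, fs)
    for f, bw in zip(formants, bandwidths):
        a1, a2 = resonator_coeffs(f, bw, fs)
        s = resonator(s, a1, a2)
    return s


# Male /a/ (Table 2) at the average male pitch (Table 1), 100 ms of samples
s = synthesize_vowel((700, 1250, 2600), (130, 150, 160), 125, 800)
```

With these assumed values the pitch period is 8000 / 125 = 64 samples, so raising the pitch frequency simply shortens the spacing between impulses. The sketch uses floating point and applies no scaling, so it ignores the fixed-point range of the DSP.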
It is important to notice that the coefficients of the three sections generated by this method must be scaled for cascade implementation on a DSP chip. This is done by multiplying the input to each IIR filter by a value (the inverse of the filter's maximum gain) such that the output will not exceed the range of the DSP.

Phoneme   Speaker   F1     F2     F3     BW1   BW2   BW3
/a/       Male      700    1250   2600   130   150   160
/i/       Male      270    2300   3000   60    200   400
/a/       Female    890    1220   2800   130   150   200
/i/       Female    1350   2800   3300   60    200   400

Table 2: Values of BW and F for different phonemes

Procedure

1. In the C:\EE113L\SOURCE\EXP_4 directory, a main template file SPEECH.ASM has been provided for you. This file contains some of the usual initialization as well as a section which generates the impulse train required as the input for this experiment (as shown in Fig. 1). The file also makes macro calls to the usual AC01 initialization routines in SPC_INIT.ASM and to the interrupt vector table in SPC_VECS.ASM, both of which have been provided to you. Lastly, it contains section headings where your code for this experiment needs to be included.

...
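The scaling step described before Table 2 can be illustrated with a short Python sketch. It reads "maximum gain" as the peak of the magnitude response of each resonator, which is one common interpretation and not necessarily the one intended in the full handout. The 8 kHz sampling rate, the coefficient formulas, and all function names are assumptions made for illustration.

```python
import cmath
import math

FS = 8000  # ASSUMED sampling rate in Hz; not stated in the handout


def resonator_coeffs(formant_hz, bandwidth_hz, fs=FS):
    """Textbook two-pole mapping (an assumption here), for
    y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2.0 * math.pi * formant_hz / fs
    return 2.0 * r * math.cos(theta), -r * r


def max_gain(a1, a2, n_points=4096):
    """Peak of |H(e^{jw})| for H(z) = 1 / (1 - a1 z^-1 - a2 z^-2), sampled on [0, pi]."""
    peak = 0.0
    for k in range(n_points + 1):
        w = math.pi * k / n_points
        z1 = cmath.exp(-1j * w)  # z^-1 evaluated on the unit circle
        h = 1.0 / (1.0 - a1 * z1 - a2 * z1 * z1)
        peak = max(peak, abs(h))
    return peak


def input_scale(formant_hz, bandwidth_hz, fs=FS):
    """Factor applied to the section's input: the inverse of its maximum gain."""
    a1, a2 = resonator_coeffs(formant_hz, bandwidth_hz, fs)
    return 1.0 / max_gain(a1, a2)


# Input scale factors for R1, R2, R3 using the male /a/ row of Table 2
scales = [input_scale(f, bw) for f, bw in zip((700, 1250, 2600), (130, 150, 160))]
```

Each factor is well below 1 because a narrow-bandwidth resonator has a large gain at its formant frequency. Normalizing the peak magnitude response to 1 bounds the output for sinusoidal inputs; it is not a strict guarantee against overflow for every possible input, so a more conservative factor may be needed in practice.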