ELECTRICAL ENGINEERING 113L
Digital Signal Processing Laboratory
© 1998, Dr. M. Werter

Lecture 6: Modulation Techniques

Introduction

Most signals cannot be sent directly over transmission channels. Instead, a carrier wave, whose properties are better suited to the transmission medium, is modified to represent the signal. Modulation is the systematic alteration of a carrier wave in accordance with a message (the modulating signal).

It is interesting to note that many nonelectrical forms of communication also involve a modulation process, speech being a good example. When a person speaks, the movements of the mouth take place at rather low rates, on the order of 10 Hz, and as such they cannot effectively produce propagating acoustic waves. Transmission of voice through air is achieved by generating higher-frequency carrier tones in the vocal cords and modulating these tones with the muscular actions of the oral cavity. What the ear hears as speech is thus a modulated acoustic wave, similar in many respects to a modulated electric wave.

Why Modulate?

I have already given an answer to this question; namely, that modulation is required to match a signal to the transmission medium. However, there are some other considerations involved:

1: Modulation for ease of radiation

Efficient electromagnetic radiation requires radiating elements (antennas) whose physical dimensions are at least 1/10 wavelength or so. But many signals, especially audio signals, have frequency components down to 100 Hz or lower, which would require antennas some 300 km (200 miles) long if radiated directly. Using the frequency translation property of modulation, these signals can be impressed on a high-frequency carrier, thereby permitting substantial reduction of the antenna size.
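The antenna length quoted above follows from the wavelength relation $\lambda = c/f$ and the one-tenth-wavelength rule of thumb. The following is a minimal sketch of that arithmetic, assuming free-space propagation at a rounded $c = 3 \times 10^8$ m/s; the function name `min_antenna_length_m` and the 100 MHz example carrier are illustrative assumptions, not part of the lecture.

```python
C_LIGHT = 3.0e8  # assumed free-space propagation speed in m/s (rounded)


def min_antenna_length_m(frequency_hz, fraction=0.1):
    """Rule-of-thumb minimum antenna size: a fraction of one wavelength."""
    wavelength_m = C_LIGHT / frequency_hz
    return fraction * wavelength_m


if __name__ == "__main__":
    # 100 Hz is the audio component named in the text;
    # 100 MHz is an assumed example of a high-frequency carrier.
    for label, freq in [("100 Hz audio component", 100.0),
                        ("100 MHz carrier (assumed)", 100.0e6)]:
        print(f"{label}: about {min_antenna_length_m(freq):.3g} m")
```

Moving the same message from 100 Hz to a carrier a million times higher shrinks the required antenna by the same factor, which is the point of the frequency translation argument.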
In the FM broadcast band, for example, where carriers are in the 88 to 108 MHz range, antennas need be no more than one meter (3 feet) or so across.

2: Modulation to reduce noise and interference

It is impossible to eliminate noise from a system. And though it is possible to eliminate interference, it may not be practical. Fortunately, certain types of modulation have the useful property of suppressing both noise and interference. The suppression, however, is not without a price; it generally requires a wider transmission bandwidth (frequency range).

3: Modulation for frequency assignment

The owner of a radio or television station has the option to broadcast his program over the same transmission medium as other stations. The selection and separation of any one station is possible because each has a different assigned carrier frequency. Were it not for modulation, only one station could operate in a given area. Two or more stations transmitting directly in the same medium, without modulation, would produce a hopeless jumble of interfering signals.

4: Modulation for multiplexing

Often it is desired to send many signals simultaneously between the same two points. Multiplexing techniques permit multiple-signal transmission on one channel such that each signal can be picked out at the receiving end. Applications of multiplexing include AM and FM broadcasting, and long-distance telephone. It is quite common, for instance, to have as many as 1,800 intercity telephone conversations multiplexed for transmission on a single coaxial cable.

Amplitude Modulation (AM)

The carrier signal used in most systems of communication is a high-frequency sinusoidal wave of the form

$$A_c \cos(2\pi F_c t + \phi)$$

In the carrier signal, $A_c$ is called the amplitude, $F_c$ is called the carrier frequency, and $\phi$ is called the phase. Without loss of generality we can set $A_c = 1$ and $\phi = 0$ for amplitude modulation.
If the information signal $s(t)$ that is to be transmitted is realized by variation of the amplitude of the carrier signal, the latter is said to be amplitude-modulated (AM):

$$y(t) = s(t)\cos(2\pi F_c t)$$

Thus, if the carrier is amplitude-modulated by a sinusoidal signal

$$s(t) = A_m \cos(2\pi F_m t)$$

then the amplitude-modulated carrier is of the form

$$y(t) = A_m \cos(2\pi F_m t)\cos(2\pi F_c t)$$

Using the cosine identity

$$\cos(\alpha)\cos(\beta) = \tfrac{1}{2}\cos(\alpha + \beta) + \tfrac{1}{2}\cos(\alpha - \beta)$$

we get

$$y(t) = \tfrac{1}{2} A_m \cos(2\pi (F_c + F_m) t) + \tfrac{1}{2} A_m \cos(2\pi (F_c - F_m) t)$$

The effect of modulation may therefore be expressed as a pair of sinusoidal components of amplitude $A_m/2$, differing in frequency from the carrier by plus and minus the modulation frequency. These sinusoidal components due to modulation are called sidebands.

More generally, having an arbitrary input signal $s(t)$ and using Euler's formula we can write

$$y(t) = s(t)\cos(2\pi F_c t) = \tfrac{1}{2} s(t)\, e^{j 2\pi F_c t} + \tfrac{1}{2} s(t)\, e^{-j 2\pi F_c t}$$

Applying the Fourier transform to this equation we get

$$Y(F) = \tfrac{1}{2} S(F - F_c) + \tfrac{1}{2} S(F + F_c)$$

Let us now apply a second modulation to $y(t)$, and call the result $y_2(t)$. Then

$$y_2(t) = y(t)\cos(2\pi F_c t) = s(t)\cos^2(2\pi F_c t) = \tfrac{1}{2} s(t) + \tfrac{1}{2} s(t)\cos(4\pi F_c t)$$

$$Y_2(F) = \tfrac{1}{2} S(F) + \tfrac{1}{4} S(F - 2F_c) + \tfrac{1}{4} S(F + 2F_c)$$

If we send the signal $y_2(t)$ through a low-pass filter with a corner frequency $F_{LP} < F_c$, then the frequency components shifted by $\pm 2F_c$ will be filtered out of this signal, and only the original signal $s(t)$ (at half amplitude) will remain. We have now restored the original signal $s(t)$.

Phase Modulation (PM)

In the preceding section we pointed out how a sinusoidal wave of the form $A_c \cos(2\pi F_c t + \phi)$ can be made to carry information by modulating (varying) the amplitude factor $A_c$, thus giving what is called an amplitude-modulated signal. This, however, is not the only way in which the carrier wave can be made to carry information. It is, for example, possible to keep the amplitude $A_c$ constant and vary the argument of the sinusoidal function in accordance with the signal to be transmitted. This is called angle modulation. Two simple schemes for doing this are called phase modulation and frequency modulation, respectively.

In phase modulation, the phase $\phi$ in the above equation is varied in accordance with a signal $s(t)$. Thus if the carrier is phase-modulated by a signal $s(t) = A_m \cos(2\pi F_m t)$, then the phase-modulated carrier is of the form

$$y(t) = A_c \cos\big(2\pi F_c t + A_m \cos(2\pi F_m t)\big)$$

Frequency Modulation (FM)

In frequency modulation, the instantaneous frequency of the carrier signal is varied in accordance with the signal. We define the instantaneous frequency $F$ as

$$F = \frac{1}{2\pi}\frac{d\Phi}{dt}$$

where $\Phi(t)$ is the argument of the sinusoid. Let

$$s(t) = A_m \cos(2\pi F_m t)$$

Then

$$F = \frac{1}{2\pi}\frac{d\Phi}{dt} = F_c + s(t) = F_c + A_m \cos(2\pi F_m t)$$

So

$$\Phi(t) = 2\pi F_c t + \frac{A_m}{F_m}\sin(2\pi F_m t)$$

Thus the frequency-modulated signal becomes

$$y(t) = A_c \cos\Big(2\pi F_c t + \frac{A_m}{F_m}\sin(2\pi F_m t)\Big)$$

Sinewave generator by the difference equation

In lecture 4 we have seen that one way to create sinusoidal signals is based on the second-order difference equation:

$$y(n) = H_1\, x(n-1) + A_1\, y(n-1) + A_2\, y(n-2)$$

If we set $H_1 = 1$, $A_1 = 2\cos(\theta)$, and $A_2 = -1$, then with a delta function $x(n) = \delta(n)$ as input, we will have a sinusoidal solution. Note that $|A_1| < 2$; therefore we define $C_1 = A_1/2$, so that the value of $C_1$ is normalized: $|C_1| < 1$. The frequency of this sinusoidal oscillator is given by the formula

$$F_c = \frac{F_s}{2\pi}\arccos(A_1/2) = \frac{F_s}{2\pi}\arccos(C_1)$$

The following lines show how we can describe the AM modulation in assembly language.

            .setsect ".data", 0x400, 1      ; .data section is located at 0x400 in data memory
            .setsect ".text", 0x1800, 0     ; .text section is located at 0x1800 in program memory
            .setsect "vectors", 0x180, 0    ; vectors section is located at 0x180 in program memory

            .sect "vectors"                 ; start of vectors section
            .copy "am_vecs.asm"             ; copy values from "am_vecs.asm" into vectors section
                                            ; am_vecs.asm contains the interrupt handling vectors

            .data                           ; start of data values in ".data" section
    yminus1 .word 0                         ; this is the location of y(n-1)
    yminus2 .word 16384                     ; the initial impulse input is set to 2^14
    coeff   .word 7991                      ; C1 = A1/2 = 7991/2^15 = 0.24387, Fc = 0.21 Fs
    OUTPUT  .word 0                         ; this is the location for the output value

            .text                           ; start of executable code in ".text" section
    start:                                  ; here we substitute the initialization, as always
    WAIT:   goto WAIT                       ; infinite loop: wait for the receive interrupt

    receive:
            DP = #yminus1                   ; set data memory page pointer to location of y(n-1)
            A = #0                          ; zero accumulator A
            A = trcv                        ; load new input s(n) in accumulator A
            @OUTPUT = A                     ; store new input at temporary location OUTPUT

    ;------- generate carrier frequency
            A = @yminus2 << 16              ; place y(n-2) in accumulator A high word
            A = -A                          ; -y(n-2) --> A
            macp(@yminus1, coeff, A)        ; C1*y(n-1) - y(n-2) --> A
            macd(@yminus1, coeff, A)        ; 2*C1*y(n-1) - y(n-2) --> A
                                            ; also move y(n-1) --> y(n-2)
            @yminus1 = hi(A)                ; y(n) = A, y(n) --> y(n-1)

    ;------- AM modulation section
    ; insert here the code for AM modulation
    ; and store result at location OUTPUT

    ;------- send output
            A = @OUTPUT                     ; load value at OUTPUT in accumulator A
            A = A & #0FFFCh                 ; set two LSBs to zero for AIC
            txdr = A                        ; send output to transmit register
            return_enable                   ; enable interrupts and return from interrupt

    transmit:
            return_enable                   ; enable interrupts and return from interrupt

            .copy "am_init.asm"             ; include AIC initialization from "am_init.asm"
            .end

In the case of FM modulation we need to change the coefficient $C_1$ before the "generate carrier frequency" section, by adding the received input signal $s(n)$ to coefficient $C_1$:

$$C_{1,new} = C_{1,old} + s(n)$$

Notice that the above equation is not exact, because the frequency of the oscillator is related to the coefficient $C_1$ by an arccos function (a nonlinear function).
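The oscillator recursion and the arccos relation between $C_1$ and the oscillation frequency can be checked numerically. The sketch below is a floating-point Python model rather than the fixed-point DSP code; the function names are assumptions for illustration, and the local slope $dF_c/dC_1$ is included only to show how nonlinear the relation is around a given operating point.

```python
import math


def oscillator(c1, num_samples):
    """y(n) = x(n-1) + 2*C1*y(n-1) - y(n-2), driven by x(n) = delta(n)."""
    y1 = 0.0  # holds y(n-1)
    y2 = 0.0  # holds y(n-2)
    out = []
    for n in range(num_samples):
        x_prev = 1.0 if n == 1 else 0.0  # x(n-1) is nonzero only at n = 1
        y = x_prev + 2.0 * c1 * y1 - y2
        out.append(y)
        y2, y1 = y1, y
    return out


def normalized_frequency(c1):
    """Fc/Fs = arccos(C1) / (2*pi)."""
    return math.acos(c1) / (2.0 * math.pi)


def frequency_slope(c1):
    """d(Fc/Fs)/dC1 = -1 / (2*pi*sqrt(1 - C1^2)), the local linear gain."""
    return -1.0 / (2.0 * math.pi * math.sqrt(1.0 - c1 * c1))


if __name__ == "__main__":
    c1 = 7991 / 32768  # coefficient value used in the assembly listing
    print("Fc/Fs =", normalized_frequency(c1))
    delta = 0.01       # assumed small change in C1
    exact = normalized_frequency(c1 + delta)
    linear = normalized_frequency(c1) + frequency_slope(c1) * delta
    print("exact:", exact, "linear estimate:", linear)
```

With unit impulse input the model produces $y(n) = \sin(n\theta)/\sin(\theta)$, where $\theta = \arccos(C_1)$, so the amplitude grows as $\theta$ approaches 0 or $\pi$.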
If the amplitude $A_m$ of the input signal $s(n)$ is small enough, however, we can use a linear approximation around the operating point $C_{1,old}$, so that the update $C_{1,new} = C_{1,old} + s(n)$ can be used.

Lecture 7: Speech Synthesis

Human Speech Production

The figure below illustrates a cross-section through the human head, showing the vocal tract and speech articulators. The vocal tract may be said to begin properly at the larynx (vocal cords), even though the trachea (windpipe) and lungs are obviously involved in vocalisation. The lungs do, of course, provide the primary acoustic energy for speech, and their air is forced through the windpipe into the vocal tract, where the vibratory patterns of speech are imposed on it.

Fig. 2.1. Cross-section through the human head, showing the speech articulators.

There are two principal ways of superimposing vibrations on this air stream. The simplest is to create turbulence by making a constriction at some point in the vocal tract. This effectively excites the tract at the point of constriction with random acoustic noise. The more efficient method of excitation is to cause the vocal cords to vibrate in a periodic manner. By varying the tension of the cords and the air pressure from the lungs, humans can control the frequency of these vocal cord vibrations, called the pitch frequency, over a typical range of 50 - 500 Hz. The pitch frequency for female and child speakers is higher than that for males.

The vibratory patterns superimposed on the airflow through the vocal tract are said to be 'voiced' (periodic vibration of vocal cords), 'unvoiced' (turbulent flow at constriction), or mixed (periodic and turbulent). In the English language we distinguish vowels, diphthongs, stops, fricatives, liquids and semi-vowels.
A list of the International Phonetic Association (IPA) symbols of English is given in the table below.

Fig. 2.4. Some suggested IPA symbols for the phonetic transcription of English.

    Vowels:             i beat, ɪ bit, ɛ bet, æ bat, ə further, ɑ father, ʌ hut, ɒ hot, ɔ horn, ʊ hood, u loot
    Diphthongs:         eɪ rate, aɪ rite, ɔɪ lois, ʊɪ louis, ɪə peer, ɛə pair, ɔə boar, ʊə boor, əʊ load, aʊ loud
    Stops:              p pea, t tea, k key, b by, d die, g guy
    Nasals:             m ram, n ran, ŋ rang
    Glottal stop:       ʔ butter
    Fricatives:         θ thigh, f fie, s sigh, ʃ shy, ð father, v carver, z mars, ʒ largess
    Glottal fricative:  h hot
    Liquids:            r rye, l lie
    Semi-vowels:        w we, j ye

Spectrograms

It has long been known that the human ear detects frequency spectra of sounds, rather than their actual time waveforms. The basilar membrane of the inner ear performs an approximate short-time frequency transform of sound pressure, and it is this information which is presented to the brain via the auditory nerve. In consequence, the speech spectrograph was invented to convert the time waveform of sounds to short-time spectra, and to display this three-dimensional information (time, frequency and intensity) as an energy-density map. The spectrogram has thus become the accepted method of describing the acoustic features of speech. This time/frequency/intensity display shows the spectral distribution of energy throughout an utterance in a dramatic and meaningful way. The result is a "sound picture", meaningful in the sense that familiar sounds give similar spectrograms. That is, speech sounds which sound the same to the human ear look the same on the spectrogram, to the human eye.

Fig. 2.7. (a) Spectrogram of 'seat', narrow band.

The above figure shows the spectrogram of the word 'seat', phonetically /sit/. The vertical axis represents frequency, covering the range 0 - 8 kHz, and the horizontal axis is proportional to time. The "blackness" of the spectrogram at each point is proportional to the energy, in dB, of the speech at that particular frequency and that particular time.
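The conversion from a time waveform to a sequence of short-time spectra can be sketched in a few lines. This is a minimal illustration using a direct DFT on Hann-windowed frames; the frame length, hop size, the dB floor used to clip the display range, and the function name are all assumptions for illustration, not the parameters of the spectrograph described above.

```python
import cmath
import math


def short_time_spectra_db(samples, frame_len=64, hop=32, floor_db=-40.0):
    """Return one row per frame with dB levels for bins 0..frame_len/2.

    Levels are relative to the strongest bin found anywhere in the signal
    and are clipped at floor_db, which plays the role of the grey-scale range.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    power = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc) ** 2)
        power.append(row)
    peak = max((max(row) for row in power), default=0.0)
    if peak <= 0.0:
        return [[floor_db] * len(row) for row in power]
    return [[max(10.0 * math.log10(p / peak), floor_db) if p > 0.0 else floor_db
             for p in row] for row in power]


if __name__ == "__main__":
    fs = 8000.0  # assumed sampling frequency in Hz
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(256)]
    spectra = short_time_spectra_db(tone)
    strongest = spectra[0].index(max(spectra[0]))
    print("strongest bin:", strongest, "=", strongest * fs / 64, "Hz")
```

Each row corresponds to one vertical slice of a spectrogram; a longer frame gives finer frequency resolution and coarser time resolution, which is the trade-off behind narrow-band and wide-band displays.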
The grey scale (the "blackness") of the spectrogram covers an intensity range of about 40 dB.

Fig. 2.8. (a) Cross-section of spectrogram of Fig. 2.7(a), from vowel portion of 'seat'. (b) Cross-section of spectrogram of Fig. 2.7(b), from fricative in 'seat'. Horizontal axis: frequency (kHz).

Cross-sections of these spectrograms at particular time instants are illustrated in the figure above. The harmonic structure of the vowel spectrum is clearly visible in this figure. This harmonic structure is shaped by an "envelope" function, characteristic of the resonances of the vocal tract. The peaks in this spectral envelope are known as "formants." In vowels, there are usually four to five formants in the frequency band of interest (up to 5 kHz), though for a given speaker three formants are usually enough to characterise a particular vowel. Other speech sounds may also be characterised by formants, but the situation is usually more complicated.

The fricative /s/ of 'seat' appears on the left of the spectrogram as wideband energy. There are no distinguishable horizontal or vertical structures because the sound is not voiced. Its spectral cross-section is characteristic of wideband noise shaped by high-frequency formants. However, the detailed structure of the formants is not as perceptually significant as it is for voiced sounds.

The /t/ in 'seat' appears on the right of the spectrogram as a silent gap, which is the closure of the stop, followed by a burst of wideband energy, which is the turbulence noise emitted when the stop is released. The formants characteristic of the vowel /i/ are to be seen in the centre of the voiced region. As the vocal tract changes shape in the transition from the fricative /s/ to the vowel /i/, and from /i/ to the stop /t/, the formant frequencies also change position.
Perceptual clues to the identity of the fricative /s/ and the stop /t/ are present in these formant transitions, which are the acoustic correlates of co-articulation.

Spectrograms are indispensable aids in analysing both human and synthetic speech. Usually, when a spectrogram of synthetic speech resembles that of real speech, the synthetic and the real speech will sound the same.

Formant Synthesisers

The chief merit of the cascaded formant synthesiser is that its transfer function resembles that of the vocal tract (without nasal coupling), and it is therefore good for non-nasal voiced sounds. Usually, only three to five resonators are necessary to obtain acceptable quality speech, thus only three to five formant frequencies and bandwidths need to be specified to define the spectrum. The simplest form of an analog resonant low-pass filter has a transfer function

$$H_a(s) = \frac{\Omega_0^2}{s^2 + B' s + \Omega_0^2} = \frac{F_0^2}{F_0^2 + jBF - F^2}$$

where $B = B'/2\pi$ is the bandwidth and $F_0 = \Omega_0/2\pi$ is the resonant frequency; the second form follows from setting $s = j2\pi F$.

It has been shown that simple, low-pass, second-order functions of this kind are quite adequate if a compensating network is used to correct for the absence of higher formants. This correcting function, illustrated in the figure below, depends on the number and frequency of the formants used, but in a practical situation can be approximated by a fixed network. The fact that formant amplitudes cannot be controlled individually does not matter if the bandwidths are adjusted correctly.

Fig. 4.3. Spectrum correction factor for serial formant synthesis. After Fant (1956). The correction factor is the contribution from larynx source + radiation + higher poles; a second panel shows the spectral envelope for a vowel. Horizontal axis: frequency (kHz).

In a practical synthesiser it is not too much of an approximation to fix the bandwidth of each formant so that only three to five formant frequencies are required to specify any non-nasal voiced sound. This makes the cascaded formant synthesiser very attractive in terms of control data economy. However, the model fits only non-nasal voiced speech. Stops, nasals and fricatives have to be catered for by extra resonators designed to introduce additional poles and zeros of the transfer function. A typical formant synthesiser is shown schematically in the figure below.

Fig. 4.4. Schematic diagram of the OVE II synthesiser. $F_0$ is pitch; $A_N$, $A_0$, $A_H$ & $A_C$ are amplitudes; $F_1$, $F_2$ & $F_3$ are formant frequencies; $K_0$ is a fricative zero and $K_1$, $K_2$ are fricative poles. After Fant (1973).

Fig. 4.5. Schematic diagram of serial formant synthesiser. $F_0$ is pitch; the $A$ terms are amplitudes; $F_1$, $F_2$ & $F_3$ are formant frequencies; $P_N$ is a nasal pole and $Z_N$ is a nasal zero; $Z_F$ is a fricative zero and the $P_F$ terms are fricative poles. After Flanagan, Coker & Bird (1962).

The vocal tract can be thought of as cascaded sections of resonators, where each section can be modeled digitally using a simple second-order all-pole IIR filter with transfer function

$$H(z) = \frac{K}{1 + b_1 z^{-1} + b_2 z^{-2}} = \frac{K}{1 - 2R\cos(\phi)\, z^{-1} + R^2 z^{-2}}$$

where $R$ is related to the 3-dB bandwidth as

$$B = \frac{B'}{2\pi} \approx (1 - R)\frac{F_s}{\pi}$$

and the resonant frequency $F_0$ can be found approximately by

$$F_0 = \frac{\Omega_0}{2\pi} \approx \phi\,\frac{F_s}{2\pi}$$

In these equations the value of $F_s$ is the sampling frequency. The magnitude response of this function is shown in the figure below.

Fig. 4.22. Two-delay digital filter giving complex conjugate poles in the z-plane. Frequency response is lowpass resonant.

The factor $K$ will be chosen to make $|H(z)| = 1$ at zero frequency (DC), where $z = 1$; that is,

$$K = 1 + b_1 + b_2$$
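The relations above can be collected into a short numerical sketch. It converts a formant frequency and bandwidth into $b_1$ and $b_2$, applies $K = 1 + b_1 + b_2$, evaluates $|H|$ on the unit circle, and runs the corresponding difference equation. The function names and the example values ($F_0 = 500$ Hz, $B = 50$ Hz, $F_s = 10$ kHz) are assumptions chosen only for illustration, and the model is floating point rather than the fixed-point arithmetic of a DSP.

```python
import cmath
import math


def resonator_coefficients(f0_hz, bandwidth_hz, fs_hz):
    """Return (K, b1, b2) for H(z) = K / (1 + b1 z^-1 + b2 z^-2), unity DC gain."""
    r = 1.0 - math.pi * bandwidth_hz / fs_hz   # from B ~ (1 - R) Fs / pi
    phi = 2.0 * math.pi * f0_hz / fs_hz        # from F0 ~ phi Fs / (2 pi)
    b1 = -2.0 * r * math.cos(phi)
    b2 = r * r
    k = 1.0 + b1 + b2
    return k, b1, b2


def magnitude(k, b1, b2, f_hz, fs_hz):
    """|H| evaluated at z = exp(j 2 pi f / Fs)."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return abs(k / (1.0 + b1 * z_inv + b2 * z_inv * z_inv))


def filter_samples(k, b1, b2, x):
    """Difference equation y(n) = K x(n) - b1 y(n-1) - b2 y(n-2)."""
    y1 = 0.0
    y2 = 0.0
    out = []
    for sample in x:
        y = k * sample - b1 * y1 - b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    fs = 10000.0
    k, b1, b2 = resonator_coefficients(500.0, 50.0, fs)
    print("K, b1, b2 =", k, b1, b2)
    print("gain at DC:", magnitude(k, b1, b2, 0.0, fs))
    print("gain at F0:", magnitude(k, b1, b2, 500.0, fs))
```

Running the sketch shows unity gain at DC and a much larger gain near $F_0$, which is the expected behaviour of a lowpass resonant section with a narrow bandwidth.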
Digital resonators of this type are very useful in the design of cascaded formant synthesisers, since they are low-pass and have unity DC gain. There is an additional advantage in this application in that no higher pole correction is necessary. The reason is that the periodic nature of the digital filter transfer function effectively puts in an infinity of higher poles, and extra compensating poles are not required.

To cater for the production of nasals, fricatives and stops in a cascaded synthesiser, it is necessary to use functions which give both poles and zeros in the z-plane. The transfer function is

$$H(z) = \frac{K_1}{K_2}\cdot\frac{1 + a_1 z^{-1} + a_2 z^{-2}}{1 + b_1 z^{-1} + b_2 z^{-2}}$$

where $K_1 = 1 + b_1 + b_2$ and $K_2 = 1 + a_1 + a_2$ to get a unity gain at DC. The complete schematic diagram of a typical digital formant synthesiser is shown in the figure below.

Fig. 4.25. Schematic diagram of typical digital formant synthesiser. After Rabiner, Schafer & Coker (1971).

Vowel synthesiser

For this experiment we are going to create a speech synthesiser that is limited to vowels. Vowels are distinguished mainly by the values of the first three formant frequencies and bandwidths, as shown in the table below.

Fig. 6.2. Phoneme parameter tables for segmental synthesis from formant data. $F_1$, $F_2$ and $F_3$ are target formant frequencies, and $R_1$, $R_2$ and $R_3$ are their corresponding target regions. From Rabiner, 1968b.

The vowel rows of the table are as follows (formant frequencies in Hz):

    Phoneme   F1    R1    F2    R2    F3    R3
    i         270   75    2290  75    3010  150
    ɪ         390   75    1990  75    2550  110
    ɛ         530   75    1840  80    2480  110
    æ         660   75    1720  75    2410  110
    ʌ         520   75    1190  75    2390  75
    ɑ         730   37    1090  75    2440  115
    ɔ         570   75    840   75    2410  115
    ʊ         440   75    1020  75    2240  90
    u         300   75    870   80    2240  90
    ə         490   75    1350  80    1690  100

The original table also lists the semi-vowels (w, j), liquids (l, r), stops (b, d, g, p, t, k), nasals (m, n, ŋ) and fricatives, together with Nasal, Fric and Voiced flags for each phoneme.

Impulse Generator

The input signal to the three variable resonators is an impulse generator operating at the pitch frequency. The average pitch frequency $F_{pitch}$ is 125 Hz for a male, 225 Hz for a female, and 300 Hz for a child. We create the impulse generator by presenting an impulse every pitch period $T = 1/F_{pitch}$. For example, let the DSP operate at a sampling frequency of $F_s = 10$ kHz; then to get an impulse at the pitch frequency for a male we need to present an impulse every $F_s/F_{pitch} = 80$ samples. The assembly code below will generate this pulse train.

            .data                           ; start of data values in ".data" section
    x1n     .word 16384                     ; this is location of input to 1st section

            .text                           ; start of executable code in ".text" section
    start:  AR1 = #79                       ; initialize for 80 iterations so that
                                            ; F_pitch = Fs/80 = (10 kHz)/80 = 125 Hz
    WAIT:   goto WAIT                       ; infinite loop: wait for the receive interrupt

    receive:
            DP = #x1n                       ; set data memory page pointer to x1n
            A = trcv                        ; dummy read
            if (*AR1- != 0) goto noupdate   ; check if AR1 = 0
            @x1n = #16384                   ; set impulse value x1(n) = 2^14
            AR1 = #79                       ; reset AR1
            goto continue
    noupdate:
            @x1n = #0                       ; set interim values to 0
    continue:
            A = #0                          ; zero accumulator A

Scaling to prevent overflow

In the previous filter design we have scaled the digital filters to get unity gain at DC. However, the gain at the resonant frequency will then be more than one, which can cause overflow and could lead to overflow oscillations and other nonlinearities. To prevent this from happening in the digital filter we will properly scale each section. That is, given

$$H(e^{j\omega}) = \frac{K}{1 - 2R\cos(\phi)\, e^{-j\omega} + R^2 e^{-2j\omega}}$$

we will find the maximum of the magnitude characteristic by solving the equation

$$\frac{d\,|H(e^{j\omega})|}{d\omega} = 0$$

The solution of this equation is

$$\cos(\omega_{MAX}) = \frac{1 + R^2}{2R}\cos(\phi)$$

Substituting this result back in the magnitude characteristic yields

$$H_{MAX} = \frac{K}{(1 - R^2)\sin(\phi)}$$

Therefore, to properly scale the digital filter to prevent overflow we need to set

$$K = (1 - R^2)\sin(\phi)$$

Note that we previously found the multiplier coefficients $-b_1 = 2R\cos(\phi)$ and $-b_2 = -R^2$, where

$$R = 1 - \frac{\pi B}{F_s}, \qquad \phi = \frac{2\pi F_0}{F_s}$$

References

R. Linggard, Electronic Synthesis of Speech. Cambridge: Cambridge University Press, 1985.

This note was uploaded on 11/06/2010 for the course EE 113 taught by Professor Walker during the Spring '08 term at UCLA.
