Speech Processing
Short-Time Fourier Transform Analysis and Synthesis
Minimum-Phase Synthesis
Veton Kpuska, February 11, 2012

Speech and audio signals are time-varying and can be considered stochastic signals that carry information. This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).
The discrete-time short-time Fourier transform (STFT) consists of a separate FT for each time instant, computed from the signal in the neighborhood of that instant.
When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency. This discrete STFT is distinct from the discrete-time STFT, which is continuous in frequency.
In Linear Prediction and Homomorphic Processing, an underlying source/filter model is assumed, which leads to model-based analysis/synthesis. Both of those analysis methods also implicitly used short-time analysis (to be presented). In short-time analysis systems no such model restrictions apply.

Short-Time Analysis (STFT)
Two approaches to the STFT are explored:
1. Fourier-transform view
2. Filter-bank view

Fourier-Transform View
Recall (from Chapter 3):
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m}$$
w[n] is a finite-length, symmetric sequence (i.e., a window) of length $N_w$, with $w[n] \neq 0$ for $n \in [0, N_w-1]$. It is called the analysis window or analysis filter.
x[n] is the time-domain signal.
$f_n[m] = x[m]\,w[n-m]$ denotes the short-time section of x[m] at point n, that is, the signal at frame n.
$X(n,\omega)$ is the Fourier transform of the short-time windowed section $f_n[m]$.

Computing the DFT:
$$X(n,k) = X(n,\omega)\big|_{\omega = \frac{2\pi}{N}k}$$
Thus X(n,k) is the STFT evaluated at every frequency $\omega = (2\pi/N)k$:
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km}$$
Frequency sampling interval = $2\pi/N$
Frequency sampling factor = N

Example 7.1
Let x[n] be a periodic impulse train:
$$x[n] = \sum_{l=-\infty}^{\infty} \delta[n-lP]$$
[Figure: impulse train with impulses spaced P samples apart (..., P, 2P, 3P, ...)]
Also let w[n] be a triangle of length P.
[Figure: triangular window w[n], P points wide]
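The STFT of this sequence is:
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m} = \sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} \delta[m-lP]\,w[n-m]\,e^{-j\omega m} = \sum_{l=-\infty}^{\infty} w[n-lP]\,e^{-j\omega lP}$$
The double sum is nonzero only for m = lP, so each term is a window located at lP with linear phase $-\omega lP$.
Since the translated windows $w[n-lP]$ do not overlap, $|X(n,\omega)|$ is constant in $\omega$ and $\angle X(n,\omega)$ is linear in $\omega$.
Computing the DFT for N = P gives:
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km} = \sum_{m=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} \delta[m-lP]\,w[n-m]\,e^{-j\frac{2\pi}{P}km} = \sum_{l=-\infty}^{\infty} w[n-lP]\,e^{-j\frac{2\pi}{P}k\,lP}$$
$$X(n,k) = \sum_{l=-\infty}^{\infty} w[n-lP] = \text{constant}$$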
That is, the DFT of translated, non-overlapping windows has a phase shift of zero (due to the frequency sampling).

Spectrogram: $|X(n,\omega)|^2$
If the analysis window length is on the order of a pitch period or shorter: wideband spectrogram, vertical striations.
Otherwise: narrowband spectrogram, horizontal striations.

How often should the analysis window be applied to the signal? X(n,k) is decimated by a temporal decimation factor L:
$$X(nL,k) = \mathrm{DFT}\{f_{nL}[m]\}$$
The sections $f_{nL}[m]$ are a subset of the sections $f_n[m]$.
How to choose the sampling rates in time (L) and in frequency (N, the FFT length) is addressed in a later section.
[Figure: analysis window w[pL - m] positioned over x[m] at p = 1, 2, 3, advancing L samples per frame]
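The decimated discrete STFT can be sketched directly from its definition, $X(pL,k)=\sum_m x[m]\,w[pL-m]\,e^{-j2\pi km/N}$. The NumPy sketch below is illustrative only: the function name `stft`, its argument names, the zero-extension of x[n] outside its support, and the use of `np.bartlett` for the triangular window are assumptions, not part of the slides. It assumes a causal window w[0..Nw-1] and N ≥ Nw, and it reproduces the setting of Example 7.1.

```python
import numpy as np

def stft(x, w, N, L=1):
    """Discrete STFT X(pL, k) = sum_m x[m] w[pL - m] exp(-j 2 pi k m / N).

    Hypothetical helper: x is treated as zero outside 0..len(x)-1,
    w is a causal window w[0..Nw-1], and N >= Nw is required.
    Returns (frame_times, X) with X[p, k] = X(frame_times[p], k).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    if N < Nw:
        raise ValueError("this sketch assumes N >= Nw")
    frame_times = np.arange(0, len(x) + Nw - 1, L)   # window positions n = pL
    X = np.zeros((len(frame_times), N), dtype=complex)
    for p, n in enumerate(frame_times):
        # Short-time section f_n[m] = x[m] w[n - m], nonzero for n-Nw+1 <= m <= n
        m = np.arange(max(0, n - Nw + 1), min(n, len(x) - 1) + 1)
        buf = np.zeros(N)
        # exp(-j 2 pi k m / N) depends only on m mod N, so store f_n[m] there
        buf[m % N] = x[m] * w[n - m]
        X[p] = np.fft.fft(buf)
    return frame_times, X

# Example 7.1: impulse train of period P, triangular window of length P, N = P
P = 8
x = np.zeros(4 * P)
x[::P] = 1.0
w = np.bartlett(P + 2)[1:-1]        # triangle with P nonzero points
times, X = stft(x, w, N=P)
spectrogram = np.abs(X) ** 2        # |X(n,k)|^2, constant in k for each n
```

Passing `L > 1` keeps only every L-th frame, which is the temporal decimation described above.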
[Figure: spectrogram $|X(n,\omega)|^2$]

Fourier-Transform View
As a function of $\omega$, $X(n,\omega)$ is periodic with period $2\pi$ (as is the Fourier transform). For real sequences it is Hermitian symmetric:
Re{X(n,ω)} and |X(n,ω)| are symmetric.
Im{X(n,ω)} and arg{X(n,ω)} are antisymmetric.
A time shift results in a linear phase shift (as in the Fourier transform):
$$\tilde{X}(n,\omega) = \sum_{m=-\infty}^{\infty} x[m-n_0]\,w[n-m]\,e^{-j\omega m} = \sum_{q=-\infty}^{\infty} x[q]\,w[n-n_0-q]\,e^{-j\omega(q+n_0)}$$
$$= e^{-j\omega n_0}\sum_{q=-\infty}^{\infty} x[q]\,w[n-n_0-q]\,e^{-j\omega q} = e^{-j\omega n_0}\,X(n-n_0,\omega)$$
Thus a shift by $n_0$ in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift of each short-time section by $n_0$.

Filtering View
In this interpretation, w[n] is considered to be the impulse response of a filter, so w[n] is referred to as the analysis filter.
Fix the value $\omega = \omega_0$:
$$X(n,\omega_0) = \sum_{m=-\infty}^{\infty} \left(x[m]\,e^{-j\omega_0 m}\right) w[n-m]$$
This equation is the convolution of the sequence $x[n]e^{-j\omega_0 n}$ with the sequence w[n]. Thus:
$$X(n,\omega_0) = \left(x[n]\,e^{-j\omega_0 n}\right) * w[n]$$
The product $x[n]e^{-j\omega_0 n}$ is a modulation of x[n] that shifts its spectrum at frequency $\omega_0$ down to the origin.
Alternate view:
$$X(n,\omega_0) = e^{-j\omega_0 n}\left(x[n] * w[n]\,e^{j\omega_0 n}\right)$$
The discrete STFT can also be interpreted from the filtering viewpoint:
$$X(n,k) = e^{-j\frac{2\pi}{N}kn}\left(x[n] * w[n]\,e^{j\frac{2\pi}{N}kn}\right)$$
This equation leads to the interpretation of the discrete STFT as the output of a filter bank.
[Figure 7.5: filter-bank interpretation of the discrete STFT]

General Properties:
1. If x[n] has length N and w[n] has length M, then X(n,ω) has length N + M − 1 along n.
2. The bandwidth of $X(n,\omega_0)$ is less than or equal to that of w[n].
3. The sequence $X(n,\omega_0)$ has its spectrum centered at the origin.

Example 7.2
Consider a Gaussian window of the form:
$$w[n] = e^{-a(n-n_0)^2}$$
The discrete STFT with DFT length N can therefore be considered a bank of filters with impulse responses:
$$h_k[n] = e^{-a(n-n_0)^2}\,e^{j\frac{2\pi}{N}kn}$$
For $x[n] = \delta[n]$, $x[n] * h_k[n] = h_k[n]$.
If N = 50, corresponding to bandpass filters spaced by 200 Hz at a sampling rate of 10000 samples/s, then for k = 0, 5, 10, 15:
$$h_0[n] = e^{-a(n-n_0)^2}e^{j\frac{2\pi}{50}0n} = e^{-a(n-n_0)^2}$$
$$h_5[n] = e^{-a(n-n_0)^2}e^{j\frac{2\pi}{50}5n}$$
$$h_{10}[n] = e^{-a(n-n_0)^2}e^{j\frac{2\pi}{50}10n}$$
$$h_{15}[n] = e^{-a(n-n_0)^2}e^{j\frac{2\pi}{50}15n}$$
[Figure 7.6: impulse responses of the four filters]

Example 7.3
Consider the filter bank of Example 7.2, designed with the Gaussian window $w[n] = e^{-a(n-n_0)^2}$.
Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters $h_k[n]$ for k = 0, 5, 10, and 15, whose impulse responses are depicted in Figure 7.6.
[Figure 7.7: Fourier transform magnitudes of the bandpass filter outputs]
After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure but are centered at the origin.

Time-Frequency Resolution Tradeoffs
As discussed in Chapter 3, the basic issue in analysis window selection is the compromise between a long window, which shows signal detail in frequency, and a short window, which represents fine temporal structure:
$$X(n,\omega) = \mathcal{F}\{f_n[m]\} = \mathcal{F}\{x[m]\,w[n-m]\} = \frac{1}{2\pi}\,X(\omega) \circledast \left[W(-\omega)\,e^{-j\omega n}\right] = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\,e^{j\theta n}\,X(\omega+\theta)\,d\theta$$
Since both X(ω) and W(ω) are periodic with period 2π, the convolution is circular.
From the equation above, W(ω) smears (smooths) X(ω).
For good frequency resolution W(ω) should be as narrow as possible, ideally W(ω) = δ(ω).
W(ω) = δ(ω) results in an infinitely long w[n], and hence poor time resolution. The two goals conflict.

Example 7.4
Figure 7.8 depicts the time-frequency resolution tradeoff.
[Figure 7.8]

From the previous example, the smoothing interpretation of the STFT is not valid for nonstationary sequences.
For steady signals, long analysis windows are appropriate and yield good frequency resolution.
[Figure: long-window analysis of a steady signal]
However, for short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events. Shorter windows yield poorer frequency resolution.
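To make the tradeoff concrete, the main-lobe width of W(ω) can be measured numerically for a short and a long window. The sketch below is only an illustration under stated assumptions: the helper name `half_power_bandwidth`, the choice of a Hamming window, the −3 dB width as the bandwidth measure, and the window lengths 40 and 400 are not from the slides; the 10000 samples/s rate is the one used in Example 7.2.

```python
import numpy as np

def half_power_bandwidth(w, nfft=1 << 14):
    """Two-sided -3 dB width (rad/sample) of the main lobe of |W(omega)|."""
    mag = np.abs(np.fft.rfft(w, nfft))
    # First frequency bin at which the magnitude falls below |W(0)| / sqrt(2)
    first_below = np.flatnonzero(mag < mag[0] / np.sqrt(2.0))[0]
    return 2.0 * first_below * (2.0 * np.pi / nfft)

fs = 10000.0                                   # sampling rate from Example 7.2
for Nw in (40, 400):                           # 4 ms and 40 ms windows
    bw_hz = half_power_bandwidth(np.hamming(Nw)) * fs / (2.0 * np.pi)
    print(f"Nw = {Nw:3d} samples ({1e3 * Nw / fs:.0f} ms): main lobe ~{bw_hz:.0f} Hz wide")
```

The tenfold longer window has a roughly tenfold narrower main lobe, at the cost of averaging over ten times as much signal in time.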
Short-Time Synthesis
How can the original sequence be obtained from its discrete-time STFT?
The inversion is represented mathematically by a synthesis equation that expresses a sequence in terms of its discrete-time STFT.
Recall that for $f_n[m] = x[m]\,w[n-m]$:
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\,e^{-j\omega m}$$
Thus:
$$\mathcal{F}^{-1}\{X(n,\omega)\} = f_n[m] = x[m]\,w[n-m]$$
If $w[n] \neq 0$, recovery is complete.
For each n, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence $f_n[m]$. Evaluating $f_n[m]$ at m = n gives:
$$f_n[n] = x[n]\,w[0]$$
For $w[0] \neq 0$, x[n] is obtained by dividing $f_n[n]$ by w[0]. Taking the inverse Fourier transform of X(n,ω) for a specific n and then dividing by w[0] is expressed by the relation:
$$x[n] = \frac{1}{2\pi w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
which is the synthesis equation for the discrete-time STFT.

In contrast to the discrete-time STFT X(n,ω), the discrete STFT X(n,k) is not always invertible.
Example 1. Consider the case when w[n] is bandlimited with bandwidth B.
[Figure: filter regions of the discrete STFT for a bandlimited window]
If there are frequency components of x[n] that do not pass through any of the filter regions of the discrete STFT, then the discrete STFT is not a unique representation of x[n], and x[n] cannot be recovered from it.
Example 2. Consider X(n,k) decimated in time by a factor L, i.e., the STFT is applied every L samples, where w[n] is nonzero over its length $N_w$. If $L > N_w$, there are gaps in time where x[n] is not represented. In such cases, again, x[n] cannot be recovered.
[Figure: temporal decimation with $L > N_w$; the analysis windows w[pL − m] of length $N_w$ leave gaps in x[m]]

Conclusion: constraints must be adopted to ensure uniqueness and invertibility:
1. Adequate frequency sampling: $B \geq 2\pi/N_w$ (B is the window bandwidth)
2. Adequate temporal decimation: $L \leq N_w$

Filter Bank Summation (FBS) Method
The traditional short-time synthesis method is commonly referred to as Filter Bank Summation (FBS).
FBS is best described in terms of the filtering interpretation of the discrete STFT:
The discrete STFT is considered to be the set of outputs of a bank of filters.
The output of each filter is modulated with a complex exponential.
The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b)).
Recall the synthesis equation given earlier:
$$x[n] = \frac{1}{2\pi w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
The FBS method carries out a discrete version of this equation by using the discrete STFT X(n,k):
$$y[n] = \frac{1}{N w[0]}\sum_{k=0}^{N-1} X(n,k)\,e^{j\frac{2\pi}{N}kn}$$
The task is to derive conditions that ensure y[n] = x[n].
From Figure 7.5, analysis followed by synthesis maps x[n] to y[n]. Thus:
$$y[n] = \frac{1}{N w[0]}\sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km}\right] e^{j\frac{2\pi}{N}kn}$$
where the bracketed term is X(n,k). Interchanging the summations, this equation reduces to:
$$y[n] = \frac{1}{N w[0]}\;x[n] * \sum_{k=0}^{N-1} w[n]\,e^{j\frac{2\pi}{N}nk}$$
Furthermore, since $\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} = N\sum_{r=-\infty}^{\infty}\delta[n-rN]$ (a periodic impulse train with period N):
$$y[n] = \frac{1}{N w[0]}\;x[n] * \left(w[n]\,N\sum_{r=-\infty}^{\infty}\delta[n-rN]\right) = \frac{1}{w[0]}\;x[n] * \left(w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN]\right)$$
Thus y[n] is the output of the convolution of x[n] with the product of the analysis window and a periodic impulse sequence.
Note: $w[n]\sum_{r}\delta[n-rN]$ reduces to $w[0]\,\delta[n]$, so that y[n] = x[n], if:
the window length satisfies $N_w \leq N$, or
for $N_w > N$, $w[rN] = 0$ for $r \neq 0$, that is, $w[rN] = 0$ for $r = \pm 1, \pm 2, \pm 3, \ldots$
This constraint is known as the FBS constraint. It must be fulfilled in order to ensure exact signal synthesis with the FBS method.
The constraint is commonly expressed in the frequency domain:
$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\,w[0]$$
This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
In conclusion, a filter bank with N filters, based on an analysis filter of length less than or equal to N, is always an all-pass system.

Generalized FBS Method
$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\sum_{r=-\infty}^{\infty} f[n,n-r]\,X(r,\omega)\right] e^{j\omega n}\,d\omega$$
Note: the "smoothing" function f[n,m] is referred to as the time-varying synthesis filter.
It can be shown that any f[n,m] that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6):
$$\sum_{m=-\infty}^{\infty} f[n,-m]\,w[m] = 1$$
Note also that the basic FBS method is obtained by setting the synthesis filter to a non-smoothing filter: f[n,m] = δ[m].

Consider the discrete STFT with decimation factor L. The generalized FBS synthesis of the signal is given by:
$$y[n] = \frac{L}{N}\sum_{r=-\infty}^{\infty}\sum_{k=0}^{N-1} f[n,n-rL]\,X(rL,k)\,e^{j\frac{2\pi}{N}nk}$$
Furthermore, consider a time-invariant smoothing filter, f[n,m] = f[m], that is, f[n,n−rL] = f[n−rL]. Thus:
$$y[n] = \frac{L}{N}\sum_{r=-\infty}^{\infty} f[n-rL]\sum_{k=0}^{N-1} X(rL,k)\,e^{j\frac{2\pi}{N}nk}$$
This equation yields y[n] = x[n] when the following constraint is satisfied by the analysis and synthesis filters as well as by the temporal decimation and frequency sampling factors:
$$L\sum_{r=-\infty}^{\infty} f[n-rL]\,w[rL-n+pN] = \delta[p], \quad \text{for all } n$$
For f[m] = δ[m] and L = 1 this method reduces to the basic FBS method.
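The basic FBS case (f[m] = δ[m], L = 1) can be checked numerically. The sketch below is a hedged illustration rather than a reference implementation: the helper names `stft_frame` and `fbs_synthesis`, the Hamming windows, and the zero-extension of x[n] for negative time are assumptions. It evaluates $y[n] = \frac{1}{N w[0]}\sum_k X(n,k)e^{j2\pi kn/N}$ directly and shows that y[n] = x[n] when N_w ≤ N, while a window with N_w > N and w[N] ≠ 0 adds a time-aliased copy of x[n].

```python
import numpy as np

def stft_frame(x, w, n, N):
    """X(n, k) = sum_m x[m] w[n - m] exp(-j 2 pi k m / N), k = 0..N-1."""
    m = np.arange(max(0, n - len(w) + 1), min(n, len(x) - 1) + 1)
    k = np.arange(N)[:, None]
    return (x[m] * w[n - m] * np.exp(-2j * np.pi * k * m / N)).sum(axis=1)

def fbs_synthesis(x, w, N):
    """Basic FBS: y[n] = 1/(N w[0]) sum_k X(n, k) exp(j 2 pi k n / N)."""
    k = np.arange(N)
    y = np.empty(len(x), dtype=complex)
    for n in range(len(x)):
        X = stft_frame(x, w, n, N)
        y[n] = (X * np.exp(2j * np.pi * k * n / N)).sum() / (N * w[0])
    return y.real

rng = np.random.default_rng(1)
x = rng.standard_normal(30)

# Nw = N = 8: the FBS constraint holds, so y[n] = x[n]
y_exact = fbs_synthesis(x, np.hamming(8), N=8)

# Nw = 12 > N = 8 and w[8] != 0: the constraint fails and
# y[n] = x[n] + x[n - 8] w[8] / w[0] (time aliasing with period N)
w_long = np.hamming(12)
y_aliased = fbs_synthesis(x, w_long, N=8)
```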
The case of interest is L > 1, with f[n] used as an interpolator.
Interpolation FBS methods:
1. Helical interpolation (Portnoff)
2. Weighted overlap-add method (Crochiere)

Overlap-Add (OLA) Method
The FBS method was motivated by the filtering view of the STFT.
The OLA method is motivated by the Fourier-transform view of the STFT.
In the OLA method:
1. The inverse DFT is taken for each fixed time in the discrete STFT.
2. An overlap-and-add operation between the short-time sections is performed.
This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.
The basic idea is that the redundancy within overlapping segments, and the averaging of the redundant samples, removes the effect of windowing.
Recall the short-time synthesis relation:
$$x[n] = \frac{1}{2\pi w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
If x[n] is averaged over many short-time segments and normalized by W(0), then:
$$x[n] = \frac{1}{2\pi W(0)}\int_{-\pi}^{\pi}\sum_{p=-\infty}^{\infty} X(p,\omega)\,e^{j\omega n}\,d\omega$$
where
$$W(0) = \sum_{n=-\infty}^{\infty} w[n]$$
The discretized version of OLA is given by:
$$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} X(p,k)\,e^{j\frac{2\pi}{N}kn}\right]$$
The bracketed term is an inverse DFT, which returns the short-time section $f_p[n] = x[n]\,w[p-n]$ provided that $N \geq N_w$. The expression for y[n] thus becomes:
$$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty} x[n]\,w[p-n] = x[n]\,\frac{1}{W(0)}\sum_{p=-\infty}^{\infty} w[p-n]$$
so that y[n] = x[n] provided that:
$$\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$$
This is always true, because the sum of the values of a sequence always equals the first (DC) value of its Fourier transform.
For decimation in time by a factor of L, it can be shown (Exercise 7.4) that if
$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
Then x[n] can be synthesized using the following equation:
$$y[n] = \frac{L}{W(0)}\sum_{p=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} X(pL,k)\,e^{j\frac{2\pi}{N}kn}\right]$$
The condition $\sum_{p} w[pL-n] = W(0)/L$ is the general constraint imposed by the OLA method. It requires that the sum of all the analysis windows (obtained by sliding w[n] in L-point increments) add up to a constant.
[Figure: shifted analysis windows adding up to a constant]

Duality of the OLA constraint and the FBS constraint:
FBS: $$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\,w[0]$$
OLA: $$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
The FBS method requires that finite-length windows have a length $N_w$ no greater than the number of analysis filters N in order to satisfy the FBS constraint ($N \geq N_w$).
Analogously, for the OLA method it can be shown that its constraint is satisfied by all finite-bandwidth analysis windows whose maximum frequency is less than $2\pi/L$ (where L is the temporal decimation factor).
In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted window transform replicas $W(\omega - \frac{2\pi}{L}k)$, $k \neq 0$, to take on the value zero at the frequency origin ω = 0, that is:
$$W(\omega) = 0 \quad \text{at} \quad \omega = \frac{2\pi}{L}k, \; k \neq 0$$
This is analogous to the FBS constraint for $N_w > N$, where the window w[n] is required to take on the value zero at $n = \pm N, \pm 2N, \pm 3N, \ldots$
[Figure: OLA constraint viewed in the frequency domain]

Time-Frequency Sampling
This section gives a different, qualitative view of the time-frequency sampling concepts behind the OLA and FBS constraints, from the perspective of classical time-domain and frequency-domain aliasing.
The discussion serves as an additional summary of the sampling issues for the two methods and motivates the earlier statement that sufficient but not necessary conditions for invertibility of the discrete STFT are:
1. The analysis window is nonzero over its finite length $N_w$.
2. The temporal decimation factor satisfies $L \leq N_w$.
3. The frequency sampling interval satisfies $2\pi/N \leq 2\pi/N_w$.

Consider the windowed (short-time) signal $f_n[m] = x[m]\,w[n-m]$, whose Fourier transform is X(n,ω).
From the Fourier-transform point of view:
The analysis window has duration $N_w$.
Reconstruction of $f_n[m]$ from X(n,k) requires a frequency sampling interval of $2\pi/N_w$ or finer.
From the time-domain point of view:
The time decimation interval L is required to meet the Nyquist criterion based on the bandwidth of the window w[n].
This implies sampling X(n,k) at a time interval $L \leq 2\pi/\omega_c$ to avoid frequency-domain aliasing of the time sequence X(n,ω), where $\omega_c$ is the bandwidth of W(ω) over $[-\omega_c, \omega_c]$.
[Figure: time-frequency sampling of the STFT]

Sufficient (but not necessary) conditions for signal reconstruction are:
1. The window is nonzero over its length $N_w$.
2. The temporal decimation factor satisfies $L \leq N_w$ ($L \leq 2\pi/\omega_c$).
3. The frequency sampling interval satisfies $2\pi/N \leq 2\pi/N_w$.
To avoid aliasing:
I. In the time domain, ensure condition 3.
II. In the frequency domain, ensure condition 2.

Time Decimation Sampling
Implications for the use of practical windows:
I. Rectangular window of length $N_w$. Assuming a bandwidth equal to the extent of the main lobe, $[-2\pi/N_w, 2\pi/N_w]$:
$$B = \frac{4\pi}{N_w}, \qquad L \leq \frac{2\pi}{B} = \frac{N_w}{2} \quad \text{(50\% overlap in windows)}$$
II. Hamming window of length $N_w$, with bandwidth $B = 8\pi/N_w$:
$$L \leq \frac{2\pi}{B} = \frac{N_w}{4} \quad \text{(75\% overlap in windows)}$$

Summary: OLA Method (DFT of order N)
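As a concrete check of the OLA method, the constraint $\sum_p w[pL-n] = W(0)/L$ and the decimated synthesis equation can be verified numerically for a Hamming window at 75% overlap. This is a minimal sketch under stated assumptions: the function names `ola_constraint_holds` and `ola_synthesis` are hypothetical, x[n] is taken as zero outside its support, and the window is the periodic Hamming form $0.54 - 0.46\cos(2\pi n/N_w)$, chosen here because it satisfies the constraint exactly.

```python
import numpy as np

def ola_constraint_holds(w, L):
    """Check sum_p w[pL - n] = W(0)/L for every n (one sum per residue mod L)."""
    sums = np.array([w[r::L].sum() for r in range(L)])
    return bool(np.allclose(sums, w.sum() / L))

def ola_synthesis(x, w, N, L):
    """y[n] = L/W(0) * sum_p IDFT_k{ X(pL, k) }[n], assuming N >= Nw."""
    Nw = len(w)
    y = np.zeros(len(x))
    for n in range(0, len(x) + Nw - 1, L):            # frame times n = pL
        m = np.arange(max(0, n - Nw + 1), min(n, len(x) - 1) + 1)
        buf = np.zeros(N)
        buf[m % N] = x[m] * w[n - m]                  # f_n[m], stored modulo N
        X = np.fft.fft(buf)                           # analysis: X(pL, k)
        f = np.fft.ifft(X).real                       # synthesis: inverse DFT
        y[m] += f[m % N]                              # overlap and add
    return y * L / w.sum()

Nw, L, N = 16, 4, 16                                  # L = Nw/4: 75% overlap
w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(Nw) / Nw)
rng = np.random.default_rng(2)
x = rng.standard_normal(64)
y = ola_synthesis(x, w, N, L)                         # equals x when the constraint holds
```

With no overlap (L = N_w) the shifted windows no longer add up to a constant, and `ola_constraint_holds(w, Nw)` returns False.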
1. No time aliasing occurs if the window length $N_w$ is short enough that $2\pi/N \leq 2\pi/N_w$.
2. No frequency-domain aliasing occurs if the decimation factor L is small enough that the filter bandwidth satisfies $\omega_c \leq 2\pi/L$.
3. If zeros are allowed in W(ω), then condition 2 can be relaxed. In this case we can undersample in time and still recover the sequence.

Summary: FBS Method
1. No frequency-domain aliasing occurs if the decimation factor L meets the Nyquist criterion, i.e., $L \leq N_w$ ($L \leq 2\pi/\omega_c$), where $\omega_c$ is the bandwidth of w[n].
2. No time-domain aliasing occurs if $2\pi/N \leq 2\pi/N_w$, i.e., $N_w \leq N$.
3. If zeros in w[n] are allowed, then condition 2 can be relaxed. In this case we can undersample in frequency and still recover the sequence.

Short-Time Fourier Transform Magnitude (STFTM)
The spectrogram is a major tool in speech applications. It is the squared STFT magnitude (STFTM).
It has been suggested that the human ear extracts perceptual information strictly from a spectrogram-like representation of speech (J.C. Anderson, "Speech Analysis/Synthesis Based on Perception", PhD Thesis, MIT, 1984).
Experienced speech researchers have trained themselves to "read" the spectrogram itself (Victor Zue, MIT). This is the primary topic of FIT ECE 5528, "Acoustics of American Speech".
The STFTM discards phase information, which has numerous uses in application areas such as:
Time-scale modification
Speech enhancement
In all these applications, estimating the phase of the speech is difficult (e.g., because of noise in the signal).
Furthermore, a number of techniques have been developed to obtain a phase estimate from an STFT magnitude.
This section introduces the STFTM as an alternative time-frequency signal representation. In addition, analysis and synthesis techniques are developed for the STFTM.

Squared-magnitude and autocorrelation relationship:
$$r[n,m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(n,\omega)|^2\,e^{j\omega m}\,d\omega \quad \text{(short-time autocorrelation)}$$
$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\,e^{-j\omega m} \quad \text{(short-time squared magnitude)}$$
where m is the autocorrelation "lag".
Furthermore, the autocorrelation r[n,m] is given by the convolution of the short-time section with its time reverse:
$$r[n,m] = f_n[m] * f_n[-m], \quad \text{where } f_n[m] = x[m]\,w[n-m]$$

Signal Representation
Under what conditions can the STFTM be used to represent a sequence uniquely?
Note that $|\mathcal{F}\{x[n]\}| = |\mathcal{F}\{-x[n]\}|$. Because of this sign ambiguity, the STFTM is not a unique representation in all cases.
However, by imposing certain mild restrictions on the analysis window and on the signal, unique signal representation is indeed possible with the discrete-time STFTM.
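The relationship between the squared magnitude and the short-time autocorrelation can be checked numerically in its sampled form. The following is a minimal sketch, with the hypothetical helper name `short_time_autocorrelation` and the assumption that the DFT length satisfies N ≥ 2N_w − 1, so that the autocorrelation obtained from the inverse DFT is not time-aliased.

```python
import numpy as np

def short_time_autocorrelation(f, N):
    """r[n, m] for lags m = 0..len(f)-1, computed from |X(n, k)|^2."""
    f = np.asarray(f, dtype=float)
    if N < 2 * len(f) - 1:
        raise ValueError("need N >= 2*Nw - 1 to avoid aliasing of the autocorrelation")
    S = np.abs(np.fft.fft(f, N)) ** 2     # squared STFT magnitude of one section
    r = np.fft.ifft(S).real               # inverse DFT gives the autocorrelation
    return r[: len(f)]

rng = np.random.default_rng(3)
f = rng.standard_normal(8) * np.hamming(8)                 # one short-time section f_n[m]
r = short_time_autocorrelation(f, N=16)
direct = np.correlate(f, f, mode="full")[len(f) - 1:]      # f_n[m] * f_n[-m], lags >= 0
```

The largest lag, `r[7]`, equals the product of the first and last samples of the section.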
Suppose x[n] is the sum of two signals, $x_1[n]$ and $x_2[n]$, occupying different regions of the n-axis.
Furthermore, suppose that the gap of zeros between $x_1[n]$ and $x_2[n]$ is large enough that there is no analysis window position for which the corresponding short-time section includes nonzero samples of both $x_1[n]$ and $x_2[n]$.
Because of the sign ambiguity, the STFTM of
$x_1[n] + x_2[n]$, $x_1[n] - x_2[n]$, and $-x_1[n] + x_2[n]$
is the same.
Any uniqueness conditions must therefore include a restriction on the length of zero gaps between nonzero portions of the signal x[n].
Sufficient uniqueness conditions are the following:
1. The analysis window w[n] is a known sequence of finite length $N_w$, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most $N_w - 2$ consecutive zero samples, and the sign of its first nonzero value is known.

If the successive STFTMs correspond to overlapping signal segments, then:
If the short-time spectral magnitude of the signal segment at time n is known, the spectral magnitude of the adjacent section at time n+1 must be consistent with it in the region of overlap.
If the analysis window is nonzero and of length $N_w$, then after dividing out the analysis window, the first $N_w - 1$ samples of the segment at time n+1 must equal the last $N_w - 1$ samples of the segment at time n.
If the last sample of a segment can be extrapolated from its first $N_w - 1$ values, this process can be repeated to obtain the entire signal x[n].
[Figure: overlapping short-time sections at times n and n+1]

To develop the procedure for extrapolating the next sample of a sequence from its STFTM, assume that the sequence x[n] has been obtained up to some time n−1, so that the first $N_w - 1$ samples under the analysis window positioned at time n are known.
The goal is to compute the sample x[n] from these initial samples and the STFT magnitude $|X(n,\omega)|$, or equivalently r[n,m].
Note that $r[n, N_w-1]$, the autocorrelation at its maximum lag, is given by the product of the first and last values of the segment:
$$r[n, N_w-1] = \left(w[0]\,x[n]\right)\left(w[N_w-1]\,x[n-(N_w-1)]\right)$$
$$x[n] = \frac{r[n, N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$$
Note that:
$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\,e^{-j\omega m}$$
If the first value of the short-time section, $x[n-(N_w-1)]$, happens to be zero, the first nonzero value within the section must be found and the product relation used again with that value and its lag.
Such a sample can always be found, because it was assumed that there are at most $N_w - 2$ consecutive zero samples between any two nonzero samples of x[n].

Sequential extrapolation algorithm
1. Initialize with x[0].
2. Update the time n.
3. Compute $r[n, N_w-1]$ from the inverse DFT of $|X(n,k)|^2$.
4. Compute:
$$x[n] = \frac{r[n, N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$$
5. Return to step (2) and repeat.

Reconstruction from Time-Frequency Samples
To carry out STFTM analysis on a digital computer, the discrete STFTM must be used.
The uniqueness theory of the STFTM extends readily to the discrete STFTM:
Uniqueness of the STFTM is based on the short-time autocorrelation functions.
The autocorrelation functions can be obtained even if the STFTM is sampled in frequency (discrete STFTM), given adequate frequency sampling.
To consider the effects of temporal decimation by a factor L, note that adjacent short-time sections now have an overlap of $N_w - L$ instead of $N_w - 1$.
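The sequential extrapolation steps above (for L = 1) can be sketched as follows. This is an illustrative sketch, not the lecture's implementation: the function names, the zero threshold `tol`, and the DFT length N ≥ 2N_w − 1 are assumptions. It also handles the case where the first sample of the section is zero by using the first nonzero sample x[j] in the section and the lag n − j, as described earlier.

```python
import numpy as np

def stftm_squared(x, w, n, N):
    """|X(n,k)|^2 of the short-time section f_n[m] = x[m] w[n - m]."""
    Nw = len(w)
    xp = np.concatenate([np.zeros(Nw - 1), x])   # zero-extend x for m < 0
    section = xp[n:n + Nw] * w[::-1]             # samples m = n-Nw+1 .. n
    return np.abs(np.fft.fft(section, N)) ** 2

def reconstruct_from_stftm(S, w, x0, tol=1e-9):
    """Sequential extrapolation of x[n] from S[n][k] = |X(n,k)|^2.

    Assumes N >= 2*Nw - 1, x[n] = 0 for n < 0, and x[0] = x0 known and nonzero.
    """
    Nw = len(w)
    x = np.zeros(len(S))
    x[0] = x0                                    # step 1: initialize with x[0]
    for n in range(1, len(S)):                   # step 2: update time n
        r = np.fft.ifft(S[n]).real               # step 3: r[n, m] via inverse DFT
        lo = max(0, n - Nw + 1)
        nonzero = np.flatnonzero(np.abs(x[lo:n]) > tol)
        if nonzero.size == 0:
            raise ValueError("more than Nw - 2 consecutive zeros; cannot extrapolate")
        j = lo + nonzero[0]                      # first nonzero sample in the section
        lag = n - j                              # equals Nw - 1 when j = n - (Nw - 1)
        x[n] = r[lag] / (w[0] * w[lag] * x[j])   # step 4: product relation
    return x                                     # step 5 is the loop itself

rng = np.random.default_rng(0)
x_true = rng.uniform(0.5, 1.5, 40) * rng.choice([-1.0, 1.0], 40)
x_true[10:14] = 0.0                              # zero gap shorter than Nw - 2
w = np.hamming(8)
S = [stftm_squared(x_true, w, n, N=16) for n in range(len(x_true))]
x_rec = reconstruct_from_stftm(S, w, x0=x_true[0])
```

Because each sample is obtained by dividing by an earlier estimate, errors in the STFTM propagate forward, so this sketch is meant only for exact (unmodified) magnitudes.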
Sufficient uniqueness conditions for the partial-overlap case:
1. The analysis window w[n] is a known sequence of finite length $N_w$, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most $N_w - 2L$ consecutive zero samples, and L consecutive samples of x[n] (starting from the first nonzero sample) are known.
These conditions are sufficient but not necessary.

Signal Estimation from the Modified STFT or STFTM
Synthesis of a signal from a time-frequency function given by a modified STFT or STFTM is required in many applications.
Modification may arise due to:
1. Quantization errors (e.g., from speech coding)
2. Time-varying filtering
3. Speech enhancement
4. Signal rate modifications
Limitations:
Modifications in frequency should result in time modifications that are restricted to within an analysis window (Figure 7.18).
Overlapping sections must undergo similar modifications (Figure 7.19).

Example 7.5. Removal of an interfering tone.
Consider modifying a valid X(n,ω), the STFT of the short-time segment $f_n[m] = x[m]\,w[n-m]$, by inserting a zero gap where an unwanted interfering sine-wave component is known to lie.
The interfering signal is removed with H(n,ω). The resulting frequency representation is:
$$Y(n,\omega) = X(n,\omega)\,H(n,\omega)$$
Inverse transforming it gives a modified short-time sequence $g_n[m]$ that is nonzero beyond the extent of the original short-time segment $f_n[m] = x[m]\,w[n-m]$.

Example 7.6
At time n: suppose a time-decimated STFT, X(nL,ω), is multiplied by a linear phase factor $e^{j\omega n_0}$ to obtain
$$Y(nL,\omega) = X(nL,\omega)\,e^{j\omega n_0}$$
At time n+1: X((n+1)L,ω) is multiplied by the negative of this linear phase factor, $e^{-j\omega n_0}$, to obtain
$$Y((n+1)L,\omega) = X((n+1)L,\omega)\,e^{-j\omega n_0}$$
The overlapping sections of the inverse Fourier transforms, denoted by $g_{nL}[m]$ and $g_{(n+1)L}[m]$, are not consistent.

Heuristic Application of STFT Synthesis Methods
Although modifications of the STFT or STFTM may violate some principles, the results may be "reasonable".
The effect of modifying the STFT with another time-frequency function (for both FBS and OLA) can be shown to be a time-varying convolution between x[n] and a function $\hat{h}[n,m]$: $x[n] * \hat{h}[n,m]$.
Let X(n,ω) be modified by a function H(n,ω):
$$Y(n,\omega) = X(n,\omega)\,H(n,\omega)$$
This corresponds to a new short-time segment:
$$g_n[m] = f_n[m] * h[n,m]$$
where h[n,m] is the time-varying system impulse response (Chapter 2).
Consider the FBS method (discretization in frequency):
$$Y(n,k) = Y(n,\omega)\big|_{\omega=\frac{2\pi}{N}k} = X(n,k)\,H(n,k)$$
The N-point IDFT of H(n,k) is:
$$\tilde{h}[n,m] = \sum_{l=-\infty}^{\infty} h[n,m-lN], \quad \text{periodic over } N$$
Then the resulting sequence can be written as:
$$y[n] = \sum_{m=-\infty}^{\infty} x[n-m]\,\hat{h}[n,m]$$
where
$$\hat{h}[n,m] = w[m]\sum_{l=-\infty}^{\infty} h[n,m-lN]$$
Using the OLA method, it can be shown (see Exercise 7.11) that:
$$\hat{h}[n,m] = w[n] * \sum_{l=-\infty}^{\infty} h[n,m-lN]$$
with the convolution taken over the time variable n.
Contrasting FBS with OLA:
FBS: multiplication by the window, so a change in the modification acts instantaneously.
OLA: convolution with the window, so a change in the modification is smoothed.

Example 7.7
Suppose we want to deliberately introduce reverberation into a signal x[n] by convolution with the filter:
$$h[n] = \delta[n] + \delta[n-n_0]$$
whose Fourier transform is:
$$H(\omega) = 1 + e^{-j\omega n_0}$$
The STFT of the resulting signal is given by:
$$Y(n,\omega) = X(n,\omega)\,H(\omega)$$
where
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m}$$
Using the OLA method (7.21):
$$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y(p,k)\,e^{j\frac{2\pi}{N}kn}\right]$$
It is then possible to express y[n] in terms of the original sequence:
$$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} w[p-m]\,x[m]\left[\frac{1}{N}\sum_{k=0}^{N-1} H(k)\,e^{j\frac{2\pi}{N}k(n-m)}\right]$$
Here $\sum_{p} w[p-m] = W(0)$, and the bracketed IDFT equals $\sum_{r=-\infty}^{\infty} h[n-m+rN]$, so that:
$$y[n] = \sum_{m=-\infty}^{\infty} x[m]\,\hat{h}[n-m]$$
where
$$\hat{h}[n] = \sum_{r=-\infty}^{\infty} h[n+rN] = \sum_{r=-\infty}^{\infty}\left(\delta[n+rN] + \delta[n-n_0+rN]\right)$$
Here $\hat{h}[n]$ is the periodic extension of h[n] over N, of which only the interval [0, N−1] is considered.
This implies that the intended reverberated signal is obtained only when $n_0 < N$; otherwise temporal aliasing occurs (as illustrated in Figure 7.20).
[Figure 7.20: temporal aliasing of the reverberation filter]

Time-Scale Modification and Enhancement of Speech
The signal construction methods presented in this chapter can be applied in a variety of speech applications.

Time-Scale Modification
For speech, the goal is to change the articulation rate (faster or slower) without changing the pitch.
[Figure: time-scale modification of a speech waveform]
Methods:
Cut and paste (Fairbanks method): discard or duplicate frames in order to speed up or slow down the articulation, respectively.
Problem: pitch period mismatch at adjacent frames causes distortion.
Pitch-synchronous OLA (Scott & Gerber): select the frame size and location synchronously with the pitch periods, which avoids the pitch period mismatch.
Problem: pitch synchronization is not always easy.
STFTM synthesis: to avoid pitch synchronization problems, use only the magnitude of the STFT (i.e., the STFTM).
1. Compute $|X(nL,\omega)|$ at an appropriate frame interval (decimation rate) L (e.g., L = 128 at Fs = 10000 Hz, with N several pitch periods $T_0$ long).
2. Replace the decimation rate with a new rate M (e.g., M = L/2) for a speed-up by a factor of L/M (2 in this example): $|Y(nM,\omega)| = |X(nL,\omega)|$.
3. Apply the least-squared-error iterative estimation algorithm until $|Y(nM,\omega)|$ has converged.
Problem: an occasional reverberant characteristic of the synthesized signal is perceived, due to the lack of STFT phase control.
[Figure: time-scale modification example]

Noise Reduction
A number of techniques have been developed to remove or reduce additive noise.
The noise-corrupted signal is given by:
$$y[n] = x[n] + b[n]$$
STFT synthesis: subtract an estimate of the noise spectrum, $\hat{S}_b(\omega)$:
$$\hat{X}(nL,\omega) = \left[\,|Y(nL,\omega)|^2 - \alpha\,\hat{S}_b(\omega)\right]^{1/2} e^{j\angle Y(nL,\omega)}$$
If $|Y(nL,\omega)|^2 - \alpha\,\hat{S}_b(\omega) < 0$, then set $|Y(nL,\omega)|^2 - \alpha\,\hat{S}_b(\omega) = 0$.
The original phase spectrum $\angle Y(nL,\omega)$ is retained because the phase of the noise cannot, in general, be reliably estimated.
The factor α controls the degree of noise reduction.
STFTM synthesis: ignore the phase and use the sequential extrapolation or least-squared-error estimation method to construct the clean signal.
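The per-frame spectral subtraction rule above can be sketched in a few lines. The function name `spectral_subtract` and the sample values are hypothetical; `Y` stands for one STFT frame Y(nL,k), `Sb` for the noise power estimate at the same frequencies, and `alpha` for the control factor α.

```python
import numpy as np

def spectral_subtract(Y, Sb, alpha=1.0):
    """X_hat = sqrt(max(|Y|^2 - alpha*Sb, 0)) * exp(j * angle(Y))."""
    Y = np.asarray(Y, dtype=complex)
    power = np.maximum(np.abs(Y) ** 2 - alpha * np.asarray(Sb, dtype=float), 0.0)
    # The noisy phase is kept; only the magnitude is modified
    return np.sqrt(power) * np.exp(1j * np.angle(Y))

Y = np.array([3.0 + 4.0j, 1.0 + 0.0j])   # |Y|^2 = 25 and 1
Sb = np.array([9.0, 4.0])                # noise power estimate
X_hat = spectral_subtract(Y, Sb)         # second bin is clipped to zero
```

The enhanced signal would then be obtained by applying an STFT synthesis method (for example OLA) to the modified frames.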
This note was uploaded on 02/10/2012 for the course ECE 3552 taught by Professor Staff during the Fall '10 term at FIT.