Pitch Estimation by Enhanced Super Resolution

Speech Processing Project: Pitch Estimation by Enhanced Super Resolution Determinator
By Sunya Santananchai and Chia-Ho Ling
Professor: Dr. Kepuska

Pitch Estimation by Enhanced Super Resolution F0 Determinator

Introduction

The fundamental frequency (F0) of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds during voiced segments. The pitch of speech is the perceptual correlate of F0. The fundamental frequency of speech is important for the prosodic features of stress and intonation. However, determining F0 is not a simple task, and many approaches, called fundamental frequency determination algorithms (FDAs), have been proposed. The objective of an FDA is to determine the fundamental frequency of a speech waveform, that is, to analyze its pitch automatically. The FDAs considered here were chosen to examine methods of fundamental frequency extraction that use radically different techniques, and for their ease of implementation from the original descriptions of the algorithms. Existing algorithms for determining F0 include:

- Cepstrum-based F0 determinator (CFD) (Noll, 1969)
- Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970)
- Feature-based F0 tracker (FBFT) (Phillips, 1985)
- Parallel processing method (PP) (Gold & Rabiner, 1969)
- Integrated F0 tracking algorithm (IFTA) (Secrest & Doddington, 1983)
- Super resolution F0 determinator (SRFD) (Medan et al., 1991)

The CFD and HPS use frequency-domain representations of the speech signal to determine F0. The FBFT and PP produce fundamental frequency estimates by analyzing the waveform in the time domain. Finally, the IFTA and SRFD use a waveform similarity metric based on a normalized cross-correlation coefficient. The aim is to find the most reliable and accurate method of determining the fundamental frequency of a speech waveform, in order to minimise the number of errors made during F0 extraction that propagate into the prosodic analysis.
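To illustrate the frequency-domain family, here is a minimal Python sketch of a harmonic product spectrum (HPS) estimator for a single frame. It is not taken from the cited papers: the naive DFT, the function names `magnitude_spectrum` and `hps_f0`, and the parameters `num_harmonics` and `f_min` are illustrative assumptions. The idea is that the magnitude spectrum is sampled at bins k, 2k, 3k, ... and multiplied together, so the product peaks at the bin whose multiples all land on harmonics of F0.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N//2 via a naive O(N^2) DFT (no dependencies)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, sample in enumerate(frame):
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def hps_f0(frame, sample_rate, num_harmonics=3, f_min=50.0):
    """Estimate F0 (Hz) of one frame with a harmonic product spectrum (sketch)."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    # Highest candidate bin k such that k * num_harmonics is still a valid bin.
    k_max = (len(mags) - 1) // num_harmonics
    # Ignore DC and bins below the hypothetical lower F0 limit f_min.
    k_min = max(1, math.ceil(f_min * n / sample_rate))
    best_k, best_product = k_min, -1.0
    for k in range(k_min, k_max + 1):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            # Spectrum "decimated" by factor h, read at bin k.
            product *= mags[k * h]
        if product > best_product:
            best_k, best_product = k, product
    # Convert the winning bin index back to a frequency in Hz.
    return best_k * sample_rate / n
```

For example, a 512-sample frame sampled at 8 kHz containing harmonics at 125, 250 and 375 Hz gives `hps_f0(frame, 8000) == 125.0`, because 125 Hz falls exactly on DFT bin 8 (125 × 512 / 8000). A practical implementation would use an FFT and a window function; the naive DFT only keeps the sketch dependency-free.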

A preliminary evaluation of the reviewed FDAs [1] provided evidence that the super resolution F0 determinator (SRFD) and the feature-based F0 tracker (FBFT) perform most reliably and consistently. The enhanced SRFD (eSRFD) improves on the performance of the SRFD algorithm by reducing the occurrence of errors, so that it is optimized for prosodic analysis.

Enhanced Super Resolution F0 Determinator (eSRFD)

The eSRFD is based on the SRFD method, which uses a waveform similarity metric, the normalized cross-correlation coefficient, to quantify the degree of similarity between two adjacent, non-overlapping sections of the waveform in order to determine the fundamental frequency. The eSRFD algorithm proceeds as follows:

1. Pass the speech waveform through a low-pass filter.
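The similarity measure behind SRFD and eSRFD can be sketched as follows. This is a simplified, hypothetical Python illustration of the idea, not the published eSRFD: for each candidate period n it compares the section x = s[start:start+n] with the adjacent section y = s[start+n:start+2n] using the normalized cross-correlation coefficient ρ(n) = Σ x·y / sqrt(Σ x² · Σ y²), and returns the shortest candidate that is a local maximum within a ratio of the best score. The names `ncc` and `estimate_period`, the candidate range, the 0.95 ratio, and the assumption that the input has already been low-pass filtered are illustrative choices, not values from the source.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation coefficient of two equal-length sections."""
    dot = sum(a * b for a, b in zip(x, y))
    energy_x = sum(a * a for a in x)
    energy_y = sum(b * b for b in y)
    if energy_x == 0.0 or energy_y == 0.0:
        return 0.0  # silent section: treat as no similarity
    return dot / math.sqrt(energy_x * energy_y)


def estimate_period(signal, start, n_min, n_max, ratio=0.95):
    """Return (period_in_samples, score), or (None, best_score) if unvoiced.

    Assumes `signal` has already been low-pass filtered (hypothetical pre-step).
    """
    scores = {}
    for n in range(n_min, n_max + 1):
        x = signal[start:start + n]              # first section, length n
        y = signal[start + n:start + 2 * n]      # adjacent, non-overlapping section
        if len(y) < n:
            break  # not enough samples left for two full sections
        scores[n] = ncc(x, y)
    if not scores:
        return None, 0.0
    best = max(scores.values())
    if best <= 0.0:
        return None, best  # nothing resembles a periodic repetition
    # Prefer the shortest local maximum close to the best score, so that
    # multiples of the true period (2T, 3T, ...) are not chosen instead.
    for n in sorted(scores):
        score = scores[n]
        left = scores.get(n - 1, -math.inf)
        right = scores.get(n + 1, -math.inf)
        if score >= ratio * best and score >= left and score >= right:
            return n, score
    # Safety net: the global maximum always qualifies above, so this is not normally reached.
    best_n = max(scores, key=scores.get)
    return best_n, scores[best_n]
```

For a 125 Hz sine sampled at 8 kHz, `estimate_period(sine, 0, 20, 150)` returns a period of 64 samples, i.e. F0 = 8000 / 64 = 125 Hz. Picking the shortest qualifying peak rather than the global maximum is one simple guard against period doubling, since a perfectly periodic waveform also correlates strongly at 2T; the published SRFD and eSRFD use their own decision logic, which this sketch does not reproduce.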