n (NX) | d (D) | a (AX) | l (L) | ng (NX) | Q | a (AX) | t (T) | r (R) | o (AO) | s (Z) | Q | s (S) | k (K) | Q | i (IH) | Oa (OW) | d (ED) | sh (SH) | a (EY) | g (H) | i (IH) | v (V) | es (Z) | s (S) | o (OW)

1.0. INTRODUCTION

Speech is one of the essential tools of communication between humans. We use it to convey our views, our thoughts, and much else. Given modern technological advances, it is essential to understand how speech signals can be processed.

There are several ways of characterizing the communication potential of speech. One approach is in terms of information theory, according to which speech can be represented by its message content, or information. The first section of this project (Speech Segmentation) deals with this approach. Another approach is to characterize speech in terms of the signal that carries the message information, which is the acoustic waveform. The second and third sections of this document (Preemphasis of Speech and Short-Time Fourier Transform) follow this approach.

Throughout this project we use the speech file S5.MAT, which contains the utterance “Oak is strong and also gives shade.”

2.0. SPEECH SEGMENTATION

Segmentation is the first step in characterizing speech in terms of its message content. Its basic step is to break the sound signal apart and classify it as a string of phones. Segmentation is very difficult for both humans and computers. One reason is that a speech sound may blend smoothly into the sound adjacent to it, or the two may fuse together. It is nevertheless very useful to recognize the parts of the speech waveform that correspond to the different phonemes of the utterance.

1. Phonetic Representation of Text

One tool for representing sequences of speech in terms of ASCII characters is “ARPABET”. “ARPABET” is a phonetic transcription code developed by the Advanced Research Projects Agency (ARPA) as part of its Speech Understanding Project (1971-1976) [1]. It represents each phoneme of General American English with a distinct sequence of ASCII characters, which lets us represent speech sequences in a form that is useful to computers.

We now take a closer look at the S5.MAT file. This waveform was sampled at a rate of 8000 samples/sec. Table 1 shows the phonetic representation of the S5.MAT waveform using the “ARPABET” system of phonetic symbols.

Text     Oa  k       i   s       S   t   r   o   ng      a   n   d
ARPABET  OW  K   Q   IH  Z   Q   S   T   R   AO  NX  Q   AX  N   D

Text     a   l   s   o   g   i   v   es  sh  a   de
ARPABET  AX  L   S   OW  H   IH  V   Z   SH  EY  D

[1] Source: DARPA (Defense Advanced Research Projects Agency)

EE253 Term Project #1 – Introduction to Speech Processing

[Figure: waveform plot titled “Oak is strong and also gives shade: s5(1:24570)”; x-axis: Sample Index]

Table 1....
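As a rough illustration of how the Table 1 transcription can be handled by a computer, the sketch below stores the letter-group to ARPABET pairs in a Python list and computes the duration implied by the stated sampling rate and the plotted range s5(1:24570). The names `TABLE_1`, `transcribe`, and `duration_seconds` are hypothetical and not part of the project; the pairs are copied from Table 1 as printed (including H for the "g" in "gives"), and the Q symbols that appear between some words in the table are left out.

```python
# Letter-group to ARPABET pairs, copied from Table 1 as printed.
# The Q symbols shown between some words in the table are omitted here.
TABLE_1 = [
    ("Oak",    [("Oa", "OW"), ("k", "K")]),
    ("is",     [("i", "IH"), ("s", "Z")]),
    ("strong", [("S", "S"), ("t", "T"), ("r", "R"), ("o", "AO"), ("ng", "NX")]),
    ("and",    [("a", "AX"), ("n", "N"), ("d", "D")]),
    ("also",   [("a", "AX"), ("l", "L"), ("s", "S"), ("o", "OW")]),
    ("gives",  [("g", "H"), ("i", "IH"), ("v", "V"), ("es", "Z")]),
    ("shade",  [("sh", "SH"), ("a", "EY"), ("de", "D")]),
]

SAMPLE_RATE_HZ = 8000   # sampling rate stated for S5.MAT
NUM_SAMPLES = 24570     # length of the plotted range s5(1:24570)


def transcribe(table):
    """Return one space-separated ARPABET string per word."""
    return {word: " ".join(phone for _, phone in pairs) for word, pairs in table}


def duration_seconds(num_samples, sample_rate_hz):
    """Duration of a signal with num_samples samples at sample_rate_hz."""
    return num_samples / sample_rate_hz


if __name__ == "__main__":
    for word, phones in transcribe(TABLE_1).items():
        print(f"{word:8s} {phones}")
    seconds = duration_seconds(NUM_SAMPLES, SAMPLE_RATE_HZ)
    print(f"duration: {seconds:.3f} s")
```

Keeping the letter groups next to their symbols makes it easy to check that each word's letters are fully covered by the transcription, and the duration calculation gives a sense of how many samples each phone can occupy on average.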
This note was uploaded on 02/26/2010 for the course EE 253 taught by Professor Marouf,e during the Fall '08 term at San Jose State.