I6-2 - Sedma Nacionalna Konferencija so Meǵunarodno Učestvo...

PERFORMANCE OF DTW SPEECH RECOGNIZER ON PACKET SWITCHED NETWORK

Ivan Kraljevski¹, Zoran Gacovski², Sime Arsenovski², Martin Mihajlov³

¹ Faculty for Veterinary Medicine Skopje, ul. Lazar Pop-Trajkov 5-7, 1000, Skopje, Macedonia, kraljivan@fvm.ukim.edu.mk
² Faculty of Social Sciences, Anton Popov bb, 1000, Skopje, Macedonia, zoran@eurm.edu.mk, sime.arsenovski@eurm.edu.mk
³ Postgraduate Student, Faculty of Electrical Engineering – Skopje, Karpoš II bb, PO Box 574, 1000 Skopje, Macedonia

Abstract – In this work, speech recognizer performance was evaluated while simulating the transfer of voice over packet switched networks. The recognizer is based on a Dynamic Time Warping (DTW) speech recognition engine, and several tests were performed on a test set of voice samples from a single speaker, with simulated packet loss effects applied to the perceived speech. The results were compared with the values predicted by the E-model and with the MOS values of the transmission channel used, and their correlation was observed.

Keywords – speech recognition, packet loss, packet switched network, E-model, subjective quality

1. INTRODUCTION

Recent developments in speech technology have enabled a new generation of interactive voice response (IVR) services operating over both the conventional circuit switched PSTN telephone network and packet switched best-effort networks. There has been growing interest in using the Internet and other Internet protocol (IP) networks for telephony services. Motivations such as reduced cost and simplification of infrastructure through network convergence are making Voice over IP (VoIP), based on packet audio, a popular service.

Voice over IP networks differ from conventional telephone networks in that voice quality is affected by a wider variety of network impairments and can vary from call to call and even during a call. To properly identify and respond to user requests in an IVR system, high performance speech recognition is required.
Speech input achieved via the recognizer is usually supported with dial-pulse input. Speech output can be based on stored concatenated speech, or a text-to-speech system can be used. Speech recognition over telephony is a challenging task, since there is great signal, network and speaker variability [1].

Best-effort IP networks present significant new challenges to the delivery of real-time voice traffic. Unlike the circuit-switched PSTN, IP networks do not guarantee that sufficient bandwidth is reserved for the duration of a call. Delay is not guaranteed to be either minimal or constant in an IP network. In addition, dropped packets and packet delay variation, or jitter, introduce distortions not found in traditional telephony. Low bitrate (high compression ratio) codecs, used to reduce the required bandwidth, distort the original waveform significantly before it is even transmitted. The compressed speech produced by such codecs is also more sensitive to packet loss. These issues are particularly important for IVR systems, whose recognition performance may degrade when they are used over packet switched networks.
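
The effect of dropped packets on a template-matching recognizer can be illustrated with a small sketch. It is not the setup used in this paper. It assumes independent random packet loss, 20 ms packets at 8 kHz, and lost packets replaced by silence, and it computes a plain DTW distance on one-dimensional sample sequences, whereas a practical recognizer would compare feature vectors. The function names `simulate_packet_loss` and `dtw_distance` are hypothetical.

```python
import math
import random


def simulate_packet_loss(samples, sample_rate=8000, packet_ms=20,
                         loss_rate=0.1, seed=0):
    """Replace randomly chosen packets with silence (assumed loss model).

    Returns the degraded signal and the indices of the lost packets.
    """
    rng = random.Random(seed)
    packet_len = sample_rate * packet_ms // 1000  # samples per packet
    out = list(samples)
    lost = []
    for start in range(0, len(out), packet_len):
        if rng.random() < loss_rate:
            end = min(start + packet_len, len(out))
            out[start:end] = [0] * (end - start)  # silence substitution
            lost.append(start // packet_len)
    return out, lost


def dtw_distance(a, b):
    """Classic DTW distance with absolute difference as the local cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Allowed steps: insertion, deletion, match
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


if __name__ == "__main__":
    # Short synthetic tone standing in for a stored reference template
    template = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(480)]
    for rate in (0.0, 0.1, 0.3):
        degraded, lost = simulate_packet_loss(template, loss_rate=rate)
        print(rate, len(lost), round(dtw_distance(template, degraded), 3))
```

Silence substitution is the simplest loss model to reason about. Codecs with packet loss concealment would degrade the signal differently, so any distances obtained this way are only indicative.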