gillickcox89significancetesting

SOME STATISTICAL ISSUES IN THE COMPARISON OF SPEECH RECOGNITION ALGORITHMS

L. Gillick and Stephen Cox
Dragon Systems, Inc., Chapel Bridge Park, 90 Bridge Street, Newton, MA 02158, USA
British Telecom Research Laboratories, Martlesham Heath, Ipswich IP5 7RE, U.K.

ABSTRACT

In the development of speech recognition algorithms, it is important to know whether any apparent difference in performance of algorithms is statistically significant, yet this issue is almost always overlooked. We present two simple tests for deciding whether the difference in error rates between two algorithms tested on the same data set is statistically significant. The first (McNemar's test) requires the errors made by an algorithm to be independent events and is most appropriate for isolated word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.

1. INTRODUCTION

The speech recognition literature currently abounds with descriptions of novel or improved algorithms for speech recognition. It is common practice for researchers to test two or more algorithms together and then to make claims for their relative efficacy on the basis of the test results. However, these claims are seldom backed by evidence that any difference in performance is statistically significant; indeed, most papers show an almost complete lack of awareness of the importance of comparing results of experiments in a way that takes account of variability and uncertainty in a principled manner. In this paper, we present some statistical ideas and techniques that will make it possible to perform such comparisons on algorithms (or systems) that recognise isolated words and connected or continuous speech. We hope to thereby encourage researchers who are reporting empirical results to use statistical measures in summarizing their findings and drawing conclusions.
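The two tests outlined in the abstract can be illustrated with a short Python sketch. It assumes the textbook forms of each test rather than the authors' exact derivations, which the paper develops in its later sections. `mcnemar_exact` (a hypothetical name) takes per-word correct/incorrect flags for two algorithms scored on the same words and returns an exact two-sided binomial p-value computed from the discordant words only. `matched_pairs` (also hypothetical) takes per-segment error counts, for example one count per utterance, and compares the mean difference with its estimated standard error. It assumes segments are independent of one another and uses a normal approximation, so it presumes a reasonably large number of segments.

```python
import math
import statistics
from typing import Sequence


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Exact two-sided McNemar p-value for two algorithms scored on the same words.

    Assumes each word is scored independently (the isolated-word setting).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both algorithms must be scored on the same items")
    # Words that both algorithms get right (or both get wrong) say nothing
    # about which is better, so only the discordant words are counted.
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)  # only A correct
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)  # only B correct
    n = n10 + n01
    if n == 0:
        return 1.0
    # Under the null hypothesis of equal error rates, n10 ~ Binomial(n, 1/2)
    # given n; double the smaller tail for a two-sided test.
    k = min(n10, n01)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def matched_pairs(errors_a: Sequence[int], errors_b: Sequence[int]) -> tuple[float, float]:
    """Matched-pairs test on per-segment error counts.

    Segments (e.g. utterances) are treated as the independent units, so errors
    inside a segment may be dependent. Returns (W, two-sided p-value) under a
    normal approximation.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both algorithms must be scored on the same segments")
    if len(errors_a) < 2:
        raise ValueError("need at least two segments")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    var = statistics.variance(diffs)  # sample variance (divides by n - 1)
    if var == 0:
        raise ValueError("per-segment differences are constant; W is undefined")
    w = mean / math.sqrt(var / n)
    # Two-sided standard normal tail: 2 * (1 - Phi(|w|)) == erfc(|w| / sqrt(2)).
    p = math.erfc(abs(w) / math.sqrt(2))
    return w, p


if __name__ == "__main__":
    # 100 words: 8 only A gets right, 2 only B gets right, 90 both get right.
    a = [True] * 8 + [False] * 2 + [True] * 90
    b = [False] * 8 + [True] * 2 + [True] * 90
    print(mcnemar_exact(a, b))  # prints 0.109375
```

In the usage example, a 4:1 split of discordant words (8 against 2) still gives p = 0.109375, so that difference would not be called significant at the 5% level. The matched-pairs sketch handles dependence by moving the unit of analysis from the word to the segment: errors inside one utterance may be correlated, and only the per-segment differences are assumed independent.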
We concentrate on methods in which the algorithms are tested on the same data set. Algorithms are often compared on the same data because, when the test items are identical, the results reflect differences between the algorithms rather than accidental differences in the difficulty of the test items in independent data sets. However, testing different algorithms on the same data set makes their results dependent, so it calls for a more sophisticated statistical approach than would be required if each algorithm were tested on an independent set.

1.1. Notation

We use capital letters throughout to denote random variables (RVs) and lowercase letters for scalars or observed values of random variables. The one exception is an estimate of a parameter: although it is an RV, we denote it by a lowercase letter with a circumflex.

2. A SIMPLE APPROACH
