gillickcox89significancetesting

SOME STATISTICAL ISSUES IN THE COMPARISON OF SPEECH RECOGNITION ALGORITHMS

L. Gillick and Stephen Cox

Dragon Systems, Inc., Chapel Bridge Park, 90 Bridge Street, Newton, MA 02158, USA
British Telecom Research Laboratories, Martlesham Heath, Ipswich IP5 7RE, U.K.

ABSTRACT

In the development of speech recognition algorithms, it is important to know whether any apparent difference in performance of algorithms is statistically significant, yet this issue is almost always overlooked. We present two simple tests for deciding whether the difference in error-rates between two algorithms tested on the same data set is statistically significant. The first (McNemar’s test) requires the errors made by an algorithm to be independent events and is most appropriate for isolated word algorithms. The second (a matched-pairs test) can be used even when errors are not independent events and is more appropriate for connected speech.

1. INTRODUCTION

The speech recognition literature currently abounds with descriptions of novel or improved algorithms for speech recognition. It is common practice for researchers to test two or more algorithms together and then to make claims for their relative efficacy on the basis of the test results. However, these claims are seldom backed by evidence that any difference in performance is statistically significant; indeed, most papers show an almost complete lack of awareness of the importance of comparing results of experiments in a way that takes account of variability and uncertainty in a principled manner. In this paper, we present some statistical ideas and techniques that will make it possible to perform such comparisons on algorithms (or systems) that recognise isolated words and connected or continuous speech. We hope to thereby encourage researchers who are reporting empirical results to use statistical measures in summarizing their findings and drawing conclusions.
We concentrate on methods in which the algorithms are tested on the same data set. Algorithms are often compared by testing them with the same data because, when the test items are forced to be the same, the results reflect differences between the algorithms rather than any accidental differences in the difficulty of the test items in independent data sets. However, the constraint of testing different algorithms on the same data set calls for a more sophisticated statistical approach than that required if each algorithm were tested on an independent set.

1.1. Notation

We shall use capital letters throughout to denote random variables (RVs) and lowercase letters for scalars or observed values of random variables. An exception to the above rule is an estimate of a parameter which, although it is an RV, we denote by a circumflexed lower-case letter.
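As a small illustration of this notational convention, consider an algorithm tested on a fixed number of items. The symbols below are assumptions chosen for this example only and are not taken from the paper: the number of errors is a random variable (capital letter), the count seen in one experiment is an observed value (lowercase), and the estimate of the error rate, although itself an RV, is written in lowercase with a circumflex.

```latex
% Illustrative symbols only (assumed for this example, not the paper's own):
%   n        number of test items (a scalar)
%   p        true but unknown error rate (a parameter)
%   K        number of errors on the test set (a random variable)
%   k        number of errors actually observed (an observed value of K)
%   \hat{p}  estimate of p (an RV, but written in lowercase with a circumflex)
\[
  \hat{p} = \frac{K}{n},
  \qquad
  \text{with observed value } \frac{k}{n} \text{ in a particular experiment.}
\]
```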