33. STATISTICS
Revised September 2009 by G. Cowan (RHUL).
This chapter gives an overview of statistical methods used in high-energy physics. In
statistics, we are interested in using a given sample of data to make inferences about
a probabilistic model, e.g., to assess the model’s validity or to determine the values
of its parameters. There are two main approaches to statistical inference, which we
may call frequentist and Bayesian. In frequentist statistics, probability is interpreted as
the frequency of the outcome of a repeatable experiment. The most important tools
in this framework are parameter estimation, covered in Section 33.1, and statistical
tests, discussed in Section 33.2. Frequentist confidence intervals, which are constructed
so as to cover the true value of a parameter with a specified probability, are treated in
Section 33.3.2. Note that in frequentist statistics one does not define a probability for a
hypothesis or for a parameter.
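
The coverage property of a frequentist confidence interval can be checked numerically. The following is only a minimal sketch under stated assumptions, none of which come from the text: a single Gaussian-distributed measurement x with known standard deviation sigma, and the interval [x - sigma, x + sigma]. The function name `coverage` and all numerical values are hypothetical.

```python
import random


def coverage(mu_true, sigma, n_experiments, seed=1):
    """Fraction of simulated experiments whose interval contains mu_true.

    Assumption: each experiment yields one Gaussian measurement x with
    known standard deviation sigma, and reports [x - sigma, x + sigma].
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_experiments):
        x = rng.gauss(mu_true, sigma)      # one repetition of the experiment
        lo, hi = x - sigma, x + sigma      # the interval is the random quantity
        if lo <= mu_true <= hi:            # the true value stays fixed
            covered += 1
    return covered / n_experiments


frac = coverage(mu_true=5.0, sigma=2.0, n_experiments=100_000)
print(f"fraction of intervals covering the true value: {frac:.3f}")
```

The printed fraction should be close to 0.683, whatever value of mu_true is chosen. The probability statement refers to the ensemble of intervals from repeated experiments, not to the parameter itself.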
Frequentist statistics provides the usual tools for reporting the outcome of an
experiment objectively, without needing to incorporate prior beliefs concerning the
parameter being measured or the theory being tested. As such, its methods are used for
reporting most measurements and their statistical uncertainties in high-energy physics.
In Bayesian statistics, the interpretation of probability is more general and includes
degree of belief (called subjective probability). One can then speak of a probability
density function (p.d.f.) for a parameter, which expresses one’s state of knowledge about
where its true value lies. Bayesian methods allow for a natural way to input additional
information, such as physical boundaries and subjective information; in fact they
require the prior p.d.f. as input for the parameters, i.e., the degree of belief about the
parameters’ values before carrying out the measurement. Using Bayes’ theorem, Eq. (32.4),
the prior
degree of belief is updated by the data from the experiment. Bayesian methods for
interval estimation are discussed in Sections 33.3.1 and 33.3.2.6.
Bayesian techniques are often used to treat systematic uncertainties, where the author’s
beliefs about, say, the accuracy of the measuring device may enter. Bayesian statistics
also provides a useful framework for discussing the validity of different theoretical
interpretations of the data. This aspect of a measurement, however, will usually be
treated separately from the reporting of the result.
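
The updating of a prior degree of belief by data can be illustrated with a small numerical sketch. The setup is an assumption made only for illustration: a binomial parameter theta (for example an efficiency), k successes observed in n trials, and a flat prior on [0, 1]. The names `posterior_on_grid`, `flat_prior`, and `trapezoid` are hypothetical helpers, and the posterior is evaluated on a simple grid rather than analytically.

```python
def trapezoid(ys, step):
    """Trapezoidal-rule integral of equally spaced samples ys."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def flat_prior(theta):
    """Assumed prior p.d.f.: constant on [0, 1]."""
    return 1.0


def posterior_on_grid(k, n, prior, n_points=2001):
    """Posterior p.d.f. for theta on a grid, via Bayes' theorem.

    posterior(theta) is proportional to likelihood(k | theta) * prior(theta);
    the binomial coefficient is dropped because it cancels in the
    normalisation.
    """
    thetas = [i / (n_points - 1) for i in range(n_points)]
    step = thetas[1] - thetas[0]
    unnorm = [t**k * (1.0 - t) ** (n - k) * prior(t) for t in thetas]
    norm = trapezoid(unnorm, step)
    return thetas, [u / norm for u in unnorm], step


thetas, post, step = posterior_on_grid(k=7, n=10, prior=flat_prior)
mean = trapezoid([t * p for t, p in zip(thetas, post)], step)
print(f"posterior mean of theta: {mean:.4f}")
```

With a flat prior the posterior mean should be close to (k + 1)/(n + 2), here 2/3, rather than the observed fraction k/n. Passing a different `prior` function shows how the choice of prior enters the result.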
For many inference problems, the frequentist and Bayesian approaches give similar
numerical answers, even though they are based on fundamentally different interpretations
of probability. For small data samples, however, and for measurements of a parameter
near a physical boundary, the different approaches may yield different results, so we are
forced to make a choice. For a discussion of Bayesian vs. non-Bayesian methods, see
the references written by a statistician [1] and by a physicist [2], or the more detailed
comparison in Ref. [3].
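
The difference between the two approaches near a physical boundary can be seen in a simple case. The sketch below is an illustration under assumptions not taken from the text: a Gaussian measurement x with known sigma of a parameter mu that must satisfy mu >= 0, a classical one-sided upper limit, and a Bayesian upper limit with a prior that is flat for mu >= 0 and zero otherwise. The function names are hypothetical, and this is not meant to represent the specific procedures recommended later in the chapter.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and sigma 1


def frequentist_upper_limit(x, sigma, cl=0.95):
    """Classical one-sided upper limit; ignores the boundary mu >= 0."""
    return x + sigma * STD_NORMAL.inv_cdf(cl)


def bayesian_upper_limit(x, sigma, cl=0.95):
    """Upper limit from the posterior with a flat prior restricted to mu >= 0.

    The posterior is a Gaussian centred at x, truncated at zero.
    Requiring a posterior probability cl below mu_up gives
    Phi((mu_up - x) / sigma) = 1 - (1 - cl) * Phi(x / sigma).
    """
    p = 1.0 - (1.0 - cl) * STD_NORMAL.cdf(x / sigma)
    return x + sigma * STD_NORMAL.inv_cdf(p)


for x in (-2.0, 0.0, 10.0):
    f = frequentist_upper_limit(x, sigma=1.0)
    b = bayesian_upper_limit(x, sigma=1.0)
    print(f"x = {x:5.1f}: classical limit = {f:6.3f}, Bayesian limit = {b:6.3f}")
```

For x far above the boundary the two limits agree, while for a measurement that fluctuates below zero the classical limit can itself be negative and the Bayesian limit remains in the physical region.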