Statistics allows us to look at our data in different ways and make objective and intelligent decisions regarding their quality and use.

CHAPTER 3
STATISTICAL EVALUATION OF DATA

We have shown that the magnitude of the indeterminate error associated with an individual measurement is determined by a chance combination of tiny individual errors, each of which may be positive or negative.
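This chance combination is easy to simulate. The sketch below is not part of the original text, and the number of error sources (100) and their size (0.01) are invented for the illustration; it adds up many tiny errors, each equally likely to be positive or negative, and shows that the combined errors cluster symmetrically about zero.

```python
import random
import statistics

# Model an indeterminate error as the chance combination of many tiny
# individual errors, each equally likely to be positive or negative.
# The number of sources (100) and their size (0.01) are invented values.
random.seed(42)

def indeterminate_error(n_sources=100, size=0.01):
    """Sum of n_sources tiny errors, each randomly +size or -size."""
    return sum(random.choice((-size, size)) for _ in range(n_sources))

# Simulate many measurements subject only to this kind of error.
errors = [indeterminate_error() for _ in range(10_000)]

mean = statistics.mean(errors)
s = statistics.stdev(errors)

# The combined errors cluster symmetrically about zero, and roughly
# two thirds of them fall within one standard deviation of the mean --
# the behavior of the normal error curve described later in this chapter.
within_1s = sum(1 for e in errors if abs(e - mean) <= s) / len(errors)
print(round(mean, 3), round(within_1s, 2))
```

Because the individual errors are discrete steps rather than truly continuous, the fraction within one standard deviation comes out a few points above the ideal 68.3%, but the overall bell-shaped behavior is the same.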
Because chance is involved in this type of error, we can use the laws of statistics to extract information from experimental data.¹ In this chapter we describe several important statistical procedures and show how they are used to estimate the magnitude of the indeterminate error in an analysis.

3A THE STATISTICAL TREATMENT
OF INDETERMINATE ERRORS

Statistics is the mathematical science that deals with chance variations. We must emphasize at the outset that statistics only reveals information that is already present in a data set. That is, no new information is created by statistics. Statistical treatment of a data set does, however, allow us to make objective judgments concerning the validity of results that are otherwise difficult to make.

3A—1 The Population and the Sample

In order to use statistics to treat our data, we must assume that the few
replicate experimental results gathered in the laboratory are a tiny but representative fraction of an infinite number of results that could be collected if we had infinite time. Statisticians call this small set of data a sample and view it as a subset of a population, or a universe, of data that in principle exists. For example, the data in Table 2—3 make up a statistical sample of an infinite population of pipet-calibration measurements that can be imagined (but not performed).

¹References on statistical applications include R. L. Anderson, Practical Statistics for Analytical Chemists. New York: Van Nostrand-Reinhold, 1987; R. Caulcutt and R. Boddy, Statistics for Analytical Chemists. New York: Chapman and Hall, 1983; J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, Chapter 5. New York: Wiley, 1978.

The laws of statistics apply strictly to a population of data only. To use these laws, we must assume that the handful of data that make up the typical sample truly represents the infinite population of results. Unfortunately, there is no guarantee that this assumption is valid. As a result, statistical estimates about the magnitude of indeterminate errors are themselves subject to uncertainty and therefore can only be made in terms of probabilities.

The Population Mean (μ) and the Sample Mean (x̄)

We will find it useful to differentiate between the sample mean and the
population mean. The sample mean is the mean of a limited sample drawn from a population of data. It is defined by Equation 2—1, when N is a small number. The population mean, in contrast, is the true mean for the population. It is also defined by Equation 2—1 when N approaches infinity. If the data are free of determinate error, the population mean is also the true value. To emphasize the difference between the two means, the sample mean is symbolized by x̄ and the population mean by μ. More often than not, particularly when N is small, x̄ differs from μ because a small sample of data does not exactly represent its population.

The Sample Standard Deviation (s) and the Population Standard Deviation (σ)

We must also differentiate between the sample standard deviation and the
population standard deviation. The sample standard deviation s was defined in Equation 2—2; that is,

s = √[Σ(xᵢ − x̄)²/(N − 1)]   (3—1)

In contrast, the population standard deviation σ, which is the true standard deviation, is given by

σ = √[Σ(xᵢ − μ)²/N]   (3—2)

Note that this equation differs from Equation 3—1 in two ways. First, the population mean μ appears in the numerator of Equation 3—2 in place of the sample mean, x̄. Second, N replaces the number of degrees of freedom (N − 1) that appears in Equation 3—1. The reason the number of degrees of freedom must be used when N is
small is as follows. When σ is unknown, two quantities must be extracted from a set of data: x̄ and s. One degree of freedom is used to establish x̄ because, with their signs retained, the sum of the individual deviations must add up to zero. Thus, when N − 1 deviations have been computed, the final one is known. Consequently, only N − 1 deviations provide an independent measure of the precision of the set.

Do not confuse the statistical sample with the analytical sample. Four analytical samples analyzed in the laboratory represent a single statistical sample. This is an unfortunate duplication of the term "sample" that should cause no trouble once you are aware of its two meanings.

Sample mean = x̄, where x̄ = (Σxᵢ)/N when N is small. Population mean = μ, where μ = (Σxᵢ)/N when N → ∞. In the absence of determinate error, the population mean μ is the true value of a measured quantity. When N → ∞, x̄ → μ and s → σ.

Figure 3—1 Normal error curves. The standard deviation for curve B is twice that for curve A; that is, σ_B = 2σ_A. (a) The abscissa is the deviation from the mean in the units of measurement. (b) The abscissa is the deviation from the mean in units of σ; thus, the two curves A and B are identical here.

3A—2 Properties of the Normal Error Curve

Figure 3—1a shows two Gaussian curves in which the relative frequency
of occurrence of various deviations from the mean is plotted as a function of deviation from the mean (x − μ). The two curves are for two populations of data that differ only in standard deviation. The standard deviation for the population yielding the broader but lower curve (B) is twice that for the population yielding curve A.

Figure 3—1b shows another type of normal error curve in which the abscissa is now a new variable, z, which is defined as

z = (x − μ)/σ   (3—3)

Feature 3—1
WHY USE THE NUMBER OF DEGREES OF FREEDOM INSTEAD OF N?

The effect of using the number of degrees of freedom for calculating the standard deviation can be demonstrated by dividing the data in Table 2—3 into 10 samples of 5 data each, 5 samples of 10 data each, and 2 samples of 25 data each. When σ and s are calculated for each sample using Equations 3—2 and 3—1, respectively, the results are

Number of Samples   Mean σ of Samples   Mean s of Samples
and Size            (Equation 3—2)      (Equation 3—1)
10 samples of 5     0.0053              0.0059
5 samples of 10     0.0058              0.0061
2 samples of 25     0.0059              0.0061
1 sample of 50      0.0060              0.0060

A negative bias accompanies application of Equation 3—2 to small sets of data; this bias is reflected in the data of column 2. Note (in column 3) that the bias disappears when s is calculated using Equation 3—1.

Note that z is the deviation from the mean of the data expressed
in units of standard deviation. That is, when x − μ = σ, z is equal to one standard deviation; when x − μ = 2σ, z is equal to two standard deviations; and so forth. Since z is the deviation from the mean in standard-deviation units, a plot of relative frequency versus this parameter yields a single Gaussian curve that describes all populations of data regardless of standard deviation. Thus, Figure 3—1b is the normal error curve for both sets of data used to plot curves A and B in Figure 3—1a.

The normal error curve has several general properties. (1) The mean occurs at the central point of maximum frequency. (2) There is a symmetrical distribution of positive and negative deviations about the maximum. (3) There is an exponential decrease in frequency as the magnitude of the deviations increases. Thus, small indeterminate uncertainties are observed much more often than very large ones.

Areas Under a Normal Error Curve

It can be shown that 68.3% of the area beneath any normal error curve lies within one standard deviation (±1σ) of the mean μ. Thus, 68.3% of the data making up the population lie within these bounds. Furthermore, approximately 95.5% of all data are within ±2σ of the mean and 99.7% within ±3σ. The vertical dashed lines in Figure 3—1b show the areas bounded by ±1σ, ±2σ, and ±3σ.

Because of area relationships such as these, the standard deviation of a population of data is a useful predictive tool. For example, we can say that the chances are 68.3 in 100 that the indeterminate uncertainty of any single measurement in a normal distribution is no more than ±1σ. Similarly, the chances are 95.5 in 100 that the error is less than ±2σ, and so forth.

When N > 20, s ≅ σ.

Figure 3—2 Relative error in s as a function of N.

Standard Error of a Mean

The figures on percentage distribution just quoted refer to the probable
error for a single measurement. If a series of samples, each containing N data, are taken randomly from a population of data, the mean of each set will show less and less scatter as N increases. The standard deviation of each mean is known as the standard error of the mean and is given the symbol σₘ. It can be shown that the standard error is inversely proportional to the square root of the number of data N used to calculate the mean:

σₘ = σ/√N   (3—4)

where σ is defined by Equation 3—2. An analogous equation can be written for a sample standard deviation:

sₘ = s/√N   (3—5)

3A—3 Properties of the Standard Deviation

Effect of N on the Reliability of s

Uncertainty in the calculated value of s decreases as N in Equation 3—1
increases. Figure 3+2 shows the error in o as a function of N. When N is
greater than about 20, s and 0" can be assumed to be identical for all
practical purposes For example, if the 50 measurements in Table 2—3 are
divided into ten subgroups of 5 measurements each, the value of s varies
widely from one subgrOup to another (0.0023 to 0.0079 mL) even though
the average of the computed values of s is that of the entire set (0.0056 50 Relative error in .i'. % IO 3A The Statistical Treatment of Indeterminate Errors 39 mL). In contrast, the computed values of s for two subsets of 25 measure—
ments each are nearly identical (0.0054 and 0.0058 mL). The rapid improvement in the reliability of s as N increases makes it
feasible to obtain a good approximation of 0 when the method of meas—
urement is not excessively timeconsuming and when an adequate supply
of sample is available. For example, ifthe pH of numerous solutions is to
be measured in the course of an investigation, it is useful to evaluate s in a
series of preliminary experiments. This measurement is simple, requiring
only that a pair of rinsed and dried electrodes be immersed in the test
solution. The voltage between the electrodes is proportional to pH. To
determine s, 20 to 30 portions of a buffer solution of ﬁxed pH can
be measured with all steps of the procedure being followed exactly.
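A preliminary estimate of this kind takes only a few lines to sketch. The buffer pH (4.00) and error size (0.02 pH units) below are invented for the illustration; `statistics.stdev` implements Equation 3—1, and the divide-by-N form of Equation 3—2 is computed by hand for comparison.

```python
import math
import random
import statistics

# Hypothetical preliminary experiment: 25 replicate readings of one
# buffer, simulated as a true pH of 4.00 with an indeterminate error of
# about 0.02 pH units (both numbers are invented for this sketch).
random.seed(7)
readings = [random.gauss(4.00, 0.02) for _ in range(25)]

# Equation 3-1: sample standard deviation s (N - 1 degrees of freedom).
s = statistics.stdev(readings)

# Equation 3-2 applied to the same data (dividing by N) gives the
# slightly smaller, negatively biased estimate discussed in Feature 3-1.
n = len(readings)
mean = statistics.fmean(readings)
sigma_style = math.sqrt(sum((x - mean) ** 2 for x in readings) / n)

# With N near 25 the two estimates nearly coincide (s is close to sigma
# for N > 20), so s from preliminary runs can stand in for sigma later.
print(round(s, 4), round(sigma_style, 4))
```

With 25 readings the two estimates differ by only about 2%, which is the point of running enough preliminary portions: the resulting s can be treated as σ in the later analyses.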
Normally, it is safe to assume that the indeterminate error in this test is the same as that in subsequent measurements. The value of s calculated from Equation 3—1 is thus a valid and accurate measure of the theoretical σ.

Pooling Data to Improve the Reliability of s

The foregoing procedure is not always practical for analyses that are time-consuming. In this situation, data from a series of samples accumulated
over time can be pooled to provide an estimate of s that is superior to the value for any individual subset. Again, we must assume the same sources of indeterminate error in all the samples. This assumption is usually valid if the samples have similar compositions and have been analyzed in exactly the same way.

To obtain a pooled estimate of the standard deviation, s_pooled, deviations from the mean for each subset are squared; the squares for all subsets are then summed and divided by an appropriate number of degrees of freedom, as shown in Equation 3—6. The pooled s is obtained by extracting the square root of the quotient. One degree of freedom is lost for each subset. Thus, the number of degrees of freedom for the pooled s is equal to the total number of measurements minus the number of subsets:

s_pooled = √{[Σ(xᵢ − x̄₁)² + Σ(xⱼ − x̄₂)² + Σ(xₖ − x̄₃)² + ···]/(N₁ + N₂ + N₃ + ··· − Nₛ)}   (3—6)

where N₁ is the number of data in set 1, N₂ is the number in set 2, and so
forth. The term Nₛ is the number of data sets that are being pooled.

Example 3—1

The mercury content in samples of seven fish taken from the Sacramento River was determined by a method based upon the absorption of radiation by gaseous elemental mercury. Calculate a pooled estimate of the standard deviation for the method, based upon the first three columns of data:

           Number of                                            Mean,     Sum of Squares of
Specimen   Samples     Measured Hg Content, ppm                 ppm Hg    Deviations from Mean
1          3           1.80, 1.58, 1.64                         1.673     0.0258
2          4           0.96, 0.98, 1.02, 1.10                   1.015     0.0115
3          2           3.13, 3.35                               3.240     0.0242
4          6           2.06, 1.93, 2.12, 2.16, 1.89, 1.95       2.018     0.0611
5          4           0.57, 0.58, 0.64, 0.49                   0.570     0.0114
6          5           2.35, 2.44, 2.70, 2.48, 2.44             2.482     0.0685
7          4           1.11, 1.15, 1.22, 1.04                   1.130     0.0170
           N = 28                                                         Sum of Squares = 0.2196

The values in the last two columns for specimen 1 were computed as follows:

xᵢ           |xᵢ − x̄|    (xᵢ − x̄)²
1.80         0.127       0.0161
1.58         0.093       0.0086
1.64         0.033       0.0011
Σxᵢ = 5.02               Sum of squares = 0.0258

The other data in columns 4 and 5 were obtained similarly. Then

s_pooled = √[(0.0258 + 0.0115 + 0.0242 + 0.0611 + 0.0114 + 0.0685 + 0.0170)/(28 − 7)] = √(0.2196/21) = 0.10 ppm Hg

Note that one degree of freedom is lost for each of the seven samples.
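The arithmetic of this example can be reproduced directly from Equation 3—6. The sketch below uses the seven data sets from the table above; only the code itself is ours, not the book's.

```python
import math

# Hg results (ppm) for the seven fish specimens of Example 3-1.
specimens = [
    [1.80, 1.58, 1.64],
    [0.96, 0.98, 1.02, 1.10],
    [3.13, 3.35],
    [2.06, 1.93, 2.12, 2.16, 1.89, 1.95],
    [0.57, 0.58, 0.64, 0.49],
    [2.35, 2.44, 2.70, 2.48, 2.44],
    [1.11, 1.15, 1.22, 1.04],
]

# Equation 3-6: sum the squared deviations from each subset's own mean,
# then divide by (total measurements - number of subsets).
sum_squares = 0.0
n_total = 0
for data in specimens:
    mean = sum(data) / len(data)
    sum_squares += sum((x - mean) ** 2 for x in data)
    n_total += len(data)

degrees_of_freedom = n_total - len(specimens)  # 28 - 7 = 21
s_pooled = math.sqrt(sum_squares / degrees_of_freedom)

print(n_total, degrees_of_freedom, round(s_pooled, 2))  # 28 21 0.1
```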
Because more than 20 degrees of freedom remain, however, the computed value of s can be considered a good approximation of σ; that is, s → σ = 0.10 ppm Hg.

3B THE USES OF STATISTICS

Experimentalists use statistical calculations to sharpen their judgment
concerning the effects of indeterminate errors. The most common applications of statistics to analytical chemistry include:

1. Defining the interval around the mean of a set within which the population mean can be expected to be found with a given probability.
2. Determining the number of replicate measurements required to assure (at a given probability) that an experimental mean falls within a predetermined interval around the population mean.
3. Deciding whether an outlying value in a set of replicate results should be retained or rejected in calculating the mean for the set.
4. Estimating the probability that two samples analyzed by the same method are significantly different in composition, that is, whether a difference in experimental results is likely to be a consequence of indeterminate error or a real composition difference.
5. Estimating the probability that there is a difference in precision between two sets of data obtained by different workers or by different methods.
6. Defining and estimating detection limits.
7. Treating calibration data.

We will examine each of these applications in the sections that follow.

3B—1 Confidence Limits

The exact value of the mean μ for a population of data can never be determined exactly because such a determination requires an infinite number of measurements. Statistical theory allows us to set limits around an experimentally determined mean x̄, however, and the true mean μ lies within these limits with a given degree of probability. These limits are called confidence limits, and the interval they define is known as the confidence interval.

Confidence limits define an interval around x̄ that probably contains μ.
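The confidence levels used throughout this section correspond to areas under the normal error curve (Section 3A) and can be checked with the standard error function in Python's math module; the helper name `area_within` below is ours, not the book's.

```python
import math

def area_within(z):
    """Fraction of the area under a normal error curve that lies
    between -z and +z (z in units of the standard deviation)."""
    return math.erf(z / math.sqrt(2))

# Confidence level associated with several values of z (compare the
# 68.3%, 95.5%, and 99.7% areas quoted in Section 3A and the z values
# tabulated for confidence limits in Table 3-1).
for z in (0.67, 1.00, 1.29, 1.64, 1.96, 2.58, 3.00):
    print(f"z = {z:4.2f}  confidence level = {100 * area_within(z):.1f}%")
```

The printed levels agree with the tabulated ones to within the rounding of the tabulated z values.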
The size of the confidence interval, which is derived from the sample standard deviation, depends on the certainty with which s is known. If there is reason to believe that s is a good approximation of σ, then the confidence interval can be significantly narrower than if the estimate of s is based upon only two or three measurements.

The Confidence Interval When s Is a Good Approximation of σ

Figure 3—3 shows a series of five normal error curves. In each, the relative frequency is plotted as a function of the quantity z (Equation 3—3), which is the deviation from the mean in units of the population standard deviation. The shaded area in each plot lies between the values of −z and +z that are indicated to the left and right of the curves, respectively. The number within the shaded area is the percentage of the total area under the curve that is included within the z values. For example, as shown in the top curve, 50% of the area under any Gaussian curve is located between −0.67σ and +0.67σ. Proceeding downward, we see that 80% of the total area lies between −1.29σ and +1.29σ, and 90% lies between −1.64σ and +1.64σ. Relationships such as these allow us to define a range of values around a measurement within which the true mean is likely to lie with a certain probability. For example, we may assume that 90 times out of 100, the true mean, μ, will be within ±1.64σ of any measurement that we make. Here, the confidence level is 90% and the confidence interval is ±zσ = ±1.64σ.

The confidence level is the probability expressed as a percent. The confidence limits are the values above and below a measurement that bound its confidence interval.

Figure 3—3 Areas under a Gaussian curve for various values of ±z.

We find a general expression for the confidence limits (CL) of a single measurement by rearranging Equation 3—3. Remember that z can take positive or negative values. Thus,

CL for μ = x ± zσ   (3—7)

Values for z at various confidence levels are found in Table 3—1.

Example 3—2
Calculate the 50% and 95% confidence limits for the first entry (1.80 ppm Hg) in Example 3—1.

In that example, we calculated s to be 0.10 ppm Hg and had sufficient data to assume s → σ. From Table 3—1, we see that z = 0.67 and 1.96 for the two confidence levels. Thus, from Equation 3—7,

50% CL for μ = 1.80 ± 0.67 × 0.10 = 1.80 ± 0.07
95% CL for μ = 1.80 ± 1.96 × 0.10 = 1.80 ± 0.20

From these calculations, we conclude that the chances are 50 in 100 that μ, the population mean (and, in the absence of determinate error, the true value), lies in the interval between 1.73 and 1.87 ppm Hg. Furthermore, there is a 95% chance that μ lies in the interval between 1.60 and 2.00 ppm Hg.

Equation 3—7 applies to the result of a single measurement. Application of Equation 3—4 shows that the confidence interval is narrowed by a factor of √N for the average of N replicate measurements. Thus, a more general form of Equation 3—7 is

CL for μ = x̄ ± zσ/√N   (3—8)

Example 3—3

Calculate the 50% and 95% confidence limits for the mean value (1.67 ppm
Hg) for specimen 1 in Example 3—1. Again, s → σ = 0.10.

For the three measurements,

50% CL = 1.67 ± (0.67 × 0.10)/√3 = 1.67 ± 0.04
95% CL = 1.67 ± (1.96 × 0.10)/√3 = 1.67 ± 0.11

Thus, the chances are 50 in 100 that the population mean is located in the interval between 1.63 and 1.71 ppm Hg and 95 in 100 that it lies between 1.56 and 1.78 ppm.

Example 3—4

How many replicate measurements of specimen 1 in Example 3—1 are needed to decrease the 95% confidence interval to ±0.07 ppm Hg?

The pooled value is a good estimate of σ. For a confidence interval of ±0.07 ppm Hg, then, substitution into Equation 3—8 leads to

±0.07 = ±(1.96 × 0.10)/√N
√N = (1.96 × 0.10)/0.07 = 2.80
N = (2.80)² = 7.8

We conclude that eight measurements would provide a slightly better than 95% chance of the population mean lying within ±0.07 ppm of the experimental mean.

Equation 3—8 tells us that the confidence interval for an analysis can be
halved by carrying out four measurements. Sixteen measurements will narrow the interval by a factor of 4, and so on. We rapidly reach a point of diminishing returns in acquiring additional data. Ordinarily we take advantage of the relatively large gain attained by averaging two to four measurements but can seldom afford the time required for additional increases in confidence.

It is essential to keep in mind at all times that confidence intervals based on Equation 3—8 apply only in the absence of determinate errors and only if we can assume that s ≅ σ.

The Confidence Limits When σ Is Unknown

Often we are faced with limitations in time or amount of available sample that prevent us from accurately estimating σ. Here, a single set of replicate measurements must provide not only a mean but also an estimate of precision. As indicated earlier, s calculated from a small set of data may be quite uncertain. Thus, confidence limits are necessarily broader when a good estimate of σ is not available.

To account for the variability of s, we use the important statistical parameter t, which is defined in exactly the same way as z (Equation 3—3) except that here s is substituted for σ:

t = (x − μ)/s   (3—9)

Like z in Equation 3—3, t depends on the desired confidence level. It also depends on the number of degrees of freedom in the calculation of s.

Table 3—1
CONFIDENCE LEVELS FOR
VARIOUS VALUES OF z

Confidence Level, %    z
50                     0.67
68                     1.00
80                     1.29
90                     1.64
95                     1.96
96                     2.00
99                     2.58
99.7                   3.00
99.9                   3.29
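The arithmetic of Example 3—4 (choosing N to reach a target interval) can be checked in a few lines; the function below is an illustrative sketch, not part of the original text, and assumes s is a good estimate of σ.

```python
import math

def confidence_interval(s, n, z=1.96):
    """Half-width of the confidence interval from Equation 3-8,
    assuming s is a good estimate of sigma (default z: 95% level)."""
    return z * s / math.sqrt(n)

# Example 3-3: mean of three results, s -> sigma = 0.10 ppm Hg.
half_width_3 = confidence_interval(0.10, 3)

# Example 3-4: measurements needed to shrink the 95% interval to
# +/- 0.07 ppm; rearranging Equation 3-8 gives N = (z*sigma/target)**2.
target = 0.07
n_required = (1.96 * 0.10 / target) ** 2  # 7.8 -> round up to 8
print(round(half_width_3, 2), round(n_required, 1), math.ceil(n_required))
```

The printed values, 0.11 ppm for the three-measurement interval and 7.8 (rounded up to 8) for the required number of replicates, match the worked examples above.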
Number of Measurements    Relative Size of
Averaged, N               Confidence Interval
1                         1.00
2                         0.71
3                         0.58
4                         0.50
5                         0.45
6                         0.41
10                        0.32

The t statistic is often called Student's t.
Student was the name used by W. S. Gosset when he wrote the classic paper on t that appeared in Biometrika, 1908, 6, 1. Gosset was employed by the Guinness Brewery to analyze statistically the results of determinations of the alcohol content of their products. As a result of this work, he discovered the now-famous statistical treatment of small sets of data. To avoid the disclosure of any trade secrets of his employer, Gosset published the paper under the name Student.

As N − 1 → ∞, t → z.

Table 3—2
VALUES OF t FOR VARIOUS LEVELS OF PROBABILITY

Degrees of        Factor for Confidence Interval
Freedom      80%     90%     95%     99%     99.9%
1            3.08    6.31    12.7    63.7    637
2            1.89    2.92    4.30    9.92    31.6
3            1.64    2.35    3.18    5.84    12.9
4            1.53    2.13    2.78    4.60    8.60
5            1.48    2.02    2.57    4.03    6.86
6            1.44    1.94    2.45    3.71    5.96
7            1.42    1.90    2.36    3.50    5.40
8            1.40    1.86    2.31    3.36    5.04
9            1.38    1.83    2.26    3.25    4.78
10           1.37    1.81    2.23    3.17    4.59
11           1.36    1.80    2.20    3.11    4.44
12           1.36    1.78    2.18    3.06    4.32
13           1.35    1.77    2.16    3.01    4.22
14           1.34    1.76    2.14    2.98    4.14
∞            1.29    1.64    1.96    2.58    3.29

Table 3—2 provides values of t for a few degrees of freedom. More
extensive tables are found in various mathematical and statistical handbooks. Note that t → z (Table 3—1) as the number of degrees of freedom becomes infinite.

The confidence limits for the mean x̄ of N replicate measurements can be derived from t by an equation similar to Equation 3—8:

CL for μ = x̄ ± ts/√N   (3—10)

Example 3—5

A chemist obtained the following data for the alcohol content of a sample
of blood: % C₂H₅OH: 0.084, 0.089, and 0.079. Calculate the 95% confidence limits for the mean assuming (a) no additional knowledge about the precision of the method and (b) that, on the basis of previous experience, it is known that s → σ = 0.0050% C₂H₅OH.

(a) Σxᵢ = 0.084 + 0.089 + 0.079 = 0.252

Σxᵢ² = 0.007056 + 0.007921 + 0.006241 = 0.021218

s = √{[0.021218 − (0.252)²/3]/(3 − 1)} = 0.0050% C₂H₅OH

Here, x̄ = 0.252/3 = 0.084. Table 3—2 indicates that t = 4.30 for two degrees of freedom and 95% confidence. Thus,

95% CL = x̄ ± ts/√N = 0.084 ± (4.30 × 0.0050)/√3 = 0.084 ± 0.012% C₂H₅OH

(b) Because a good value of σ is available,

95% CL = x̄ ± zσ/√N = 0.084 ± (1.96 × 0.0050)/√3 = 0.084 ± 0.006% C₂H₅OH

Note that a sure knowledge of σ significantly decreases the confidence interval.

3B—2 Rejection of Outliers

When a set of data contains an outlying result that appears to differ excessively from the average, a decision must be made whether to retain or reject the result.² The choice of criterion for the rejection of a suspected result has its perils. If we set a stringent standard that makes the rejection of a questionable measurement difficult, we run the risk of retaining results that are spurious and have an inordinate effect on the average of the data. If we set lenient limits on precision and thereby make the rejection of a result easy, we are likely to discard measurements that rightfully belong in the set and thus introduce a bias to the data. It is an unfortunate fact that no universal rule can be invoked to settle the question of retention or rejection.

Outliers are the result of gross errors (Section 2C).

Statistical Tests

Several statistical procedures such as the Q test and the Tₙ test have been
developed to provide criteria for rejection or retention of outliers. Such tests assume that the distribution of the population data is normal, or Gaussian. Unfortunately, this condition cannot be proved or disproved for samples consisting of many fewer than 50 results. Consequently, statistical rules, which are perfectly reliable for normal distributions of data, should be used with extreme caution when applied to samples containing only a few data. J. Mandel, in discussing the treatment of small sets of data, writes, "Those who believe that they can discard observations with statistical sanction by using statistical rules for the rejection of outliers are simply deluding themselves."³ Thus, statistical tests for rejection should be used only as aids to common sense when small samples are involved.

The Q Test

The Q test is a simple, widely used statistical test.⁴ In this test the absolute value of the difference between the questionable result x_q and its

²J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, pp. 282—289. New York: Wiley, 1978.
³J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, p. 282. New York: Wiley, 1978.
⁴R. B. Dean and W. J. Dixon, Anal. Chem., 1951, 23, 636.

Table 3—3
CRITICAL VALUES FOR REJECTION QUOTIENT Q*

Q_crit (Reject if Q_exp > Q_crit)
Number of      90%          95%          99%
Observations   Confidence   Confidence   Confidence
3              0.941        0.970        0.994
4              0.765        0.829        0.926
5              0.642        0.710        0.821
6              0.560        0.625        0.740
7              0.507        0.568        0.680
8              0.468        0.526        0.634
9              0.437        0.493        0.598
10             0.412        0.466        0.568

*Reproduced from D. B. Rorabacher, Anal. Chem., 1991, 63, 139. By courtesy of the American Chemical Society.

nearest neighbor xₙ is divided by the spread w of the entire set to give the quantity Q_exp:

Q_exp = |x_q − xₙ|/w

This ratio is then compared with rejection values Q_crit found in Table 3—3. If Q_exp is greater than Q_crit, the questionable result can be rejected with the indicated degree of confidence.

Example 3—6

The analysis of a calcite sample yielded CaO percentages of 55.95, 56.00,
56.04, 56.08, and 56.23. The last value appears anomalous; should it be retained or rejected?

The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 − 55.95) is 0.28%. Thus,

Q_exp = 0.15/0.28 = 0.54

For five measurements, Q_crit at the 90% confidence level is 0.642. Because 0.54 < 0.642, we must retain the value 56.23.

The Tₙ Test

In the American Society for Testing and Materials (ASTM) Tₙ test, the quantity Tₙ serves as the rejection criterion,⁵ where

Tₙ = |x_q − x̄|/s

⁵For further discussion of this test, see J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, pp. 283—285. New York: Wiley, 1978.

Here, x_q is the questionable result, and x̄ and s are the mean and standard deviation of the entire set including the questionable result. Rejection is indicated if the calculated Tₙ is greater than the critical values found in Table 3—4.

Example 3—7
Apply the Tₙ test to the data in Example 3—6.

Σxᵢ = 55.95 + 56.00 + 56.04 + 56.08 + 56.23 = 280.30
Σxᵢ² = (55.95)² + (56.00)² + (56.04)² + (56.08)² + (56.23)² = 15713.6634

x̄ = 280.30/5 = 56.06

s = √{[15713.6634 − (280.30)²/5]/(5 − 1)} = 0.107

Tₙ = (56.23 − 56.06)/0.107 = 1.59

Table 3—4 indicates that the critical value of Tₙ for five measurements is greater than the experimental value at all confidence levels. Therefore, retention is also indicated by this test.

Recommendations for the Treatment of Outliers

Recommendations for the treatment of a small set of results that contains
a suspect value follow:

1. Reexamine carefully all data relating to the outlying result to see if a gross error could have affected its value. This recommendation demands a properly kept laboratory notebook containing careful notations of all observations.
2. If possible, estimate the precision that can be reasonably expected from the procedure to be sure that the outlying result actually is questionable.

Use caution when rejecting data for any reason.

Table 3—4
CRITICAL VALUES FOR REJECTION QUOTIENT Tₙ*

Number of       95%          97.5%        99%
Observations    Confidence   Confidence   Confidence
3               1.15         1.15         1.15
4               1.46         1.48         1.49
5               1.67         1.71         1.75
6               1.82         1.89         1.94
7               1.94         2.02         2.10
8               2.03         2.13         2.22
9               2.11         2.21         2.32
10              2.18         2.29         2.41

*Adapted from J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1, p. 284. New York: Wiley, 1978. With permission of John Wiley & Sons, Inc.

In statistics a null hypothesis postu
lates that two observed quantities are the same.

The notation to indicate the 5% probability level is P = 0.05.

3. Repeat the analysis if sufficient sample and time are available. Agreement between the newly acquired data and those data of the original
set that appear to be valid will lend weight to the notion that the outlying result should be rejected. Furthermore, if retention is still indicated, the questionable result will have a relatively small effect on the mean of the larger set of data.
4. If more data cannot be secured, apply the Q test or the Tₙ test to the existing set to see if the doubtful result should be retained or rejected on statistical grounds.
5. If the statistical test indicates retention, consider reporting the median of the set rather than the mean. The median has the great virtue of allowing inclusion of all data in a set without undue influence from an outlying value. In addition, the median of a normally distributed set containing three measurements provides a better estimate of the correct value than the mean of the set after the outlying value has been discarded.

The blind application of statistical tests to retain or reject a suspect measurement in a small set of data is not likely to be much more fruitful than an arbitrary decision. The application of good judgment based on broad experience with an analytical method is usually a sounder approach. In the end, the only valid reason for rejecting a result from a small set of data is the sure knowledge that a mistake was made in the measurement process. Without this knowledge, a cautious approach to rejection of an outlier is wise.

3B—3 Statistical Aids to Hypothesis Testing

Much of scientific and engineering endeavor is based upon hypothesis
testing. Thus, in order to explain an observation, a hypothetical model is advanced and tested experimentally to determine its validity. If the results from these experiments do not support the model, we reject it and seek a new hypothesis. If agreement is found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that refute it.

Experimental results seldom agree exactly with those predicted from a
theoretical model. Consequently, scientists and engineers frequently must judge whether a numerical difference is a manifestation of the indeterminate errors inevitable in all measurements. Certain statistical tests
are useful in sharpening these judgments.

Tests of this kind make use of a null hypothesis, which assumes that the numerical quantities being compared are, in fact, the same. The probability of the observed differences appearing as a result of indeterminate error is then computed from statistical theory. Usually, if the observed difference is greater than or equal to the difference that would occur 5 times in 100 (the 5% probability level), the null hypothesis is considered questionable and the difference is judged to be significant. Other probability levels,
such as 1 in 100 or 10 in 100, may also be adopted, depending upon the certainty desired in the judgment.

3B The Uses of Statistics 49

The kinds of testing that chemists use most often include the comparison of (1) the mean from an analysis x̄ with what is believed to be the true value μ; (2) the means x̄1 and x̄2 from two sets of analyses; (3) the standard deviations s1 and s2 or σ1 and σ2 from two sets of measurements; and (4) the standard deviation s of a small set of data with the standard deviation σ of a larger set of measurements. The sections that follow consider some of the methods for making these comparisons.

Comparison of an Experimental Mean with a True Value

A common way of testing for determinate errors in an analytical method is
to use the method to analyze a sample whose composition is accurately known. It is likely that the experimental mean x̄ will differ from the accepted value μ; the judgment must then be made whether this difference is the consequence of indeterminate error or determinate error.

In treating this type of problem statistically, the difference x̄ − μ is compared with the difference that could be caused by indeterminate error. If the observed difference is less than that computed for a chosen probability level, the null hypothesis that x̄ and μ are the same cannot be rejected; that is, no significant determinate error has been demonstrated.
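Carried out numerically, the test just described amounts to comparing the magnitude of x̄ − μ with a critical value of the form ±ts/√N (developed below as Equation 3-12). The Python sketch below is illustrative only; the function name is my own, while the data and t value are those of Example 3-8 and Table 3-2.

```python
import math
from statistics import mean, stdev

def mean_vs_true(data, mu, t_crit):
    """Compare the mean of replicate results with an accepted value mu.
    The difference xbar - mu is judged significant when its magnitude
    reaches the critical value t * s / sqrt(N)."""
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, N - 1 degrees of freedom
    critical = t_crit * s / math.sqrt(len(data))
    return xbar - mu, critical, abs(xbar - mu) >= critical

# Sulfur data of Example 3-8 below, tested against the accepted value
# 0.123% S with t = 3.18 (three degrees of freedom, 95% confidence level).
diff, crit, significant = mean_vs_true([0.112, 0.118, 0.115, 0.119],
                                       0.123, t_crit=3.18)
# diff is about -0.007% S and crit about 0.005% S, so the difference
# is judged significant at the 95% level.
```

When the difference falls below the critical value, such a test shows only that no determinate error has been demonstrated.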
It is important to realize, however, that this statement does not say that there is no determinate error; it says only that whatever determinate error is present is so small that it cannot be detected. If x̄ − μ is significantly larger than the critical value, we may assume that the difference is real and that the determinate error is significant.

The critical value for the rejection of the null hypothesis is calculated
by rewriting Equation 3-10 in the form

x̄ − μ = ± ts/√N    (3-12)

where N is the number of replicate measurements employed in the test. If a good estimate of σ is available, Equation 3-12 can be modified by replacing t with z and s with σ.

Example 3-8

A new procedure for the rapid determination of sulfur in kerosenes was
tested on a sample known from its method of preparation to contain 0.123% S. The results were % S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate that there is a determinate error in the method?

Σxi = 0.112 + 0.118 + 0.115 + 0.119 = 0.464
x̄ = 0.464/4 = 0.116% S
x̄ − μ = 0.116 − 0.123 = −0.007% S
Σxi² = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854
s = √[(0.053854 − (0.464)²/4)/(4 − 1)] = √(0.000030/3) = 0.0032

If it was confirmed by further experiments that the method always gave low results, we would say that the method had a negative bias.

Even if a mean value is shown to be equal to the true value at a given confidence level, we cannot conclude that there is no determinate error in the data.

From Table 3-2, we find that at the 95% confidence level, t has a value
of 3.18 for three degrees of freedom. Thus,

± ts/√N = ± (3.18 × 0.0032)/√4 = ± 0.0051

An experimental mean can be expected to deviate by ±0.0051 or greater no more frequently than 5 times in 100. Thus, if we conclude that x̄ − μ = −0.007% S is a significant difference and that a determinate error is present, we will, on the average, be wrong fewer than 5 times in 100.

If we make a similar calculation employing the value for t at the 99% confidence level, ts/√N assumes a value of ±0.0093. Thus, if we insist upon being wrong no more often than 1 time in 100, we must conclude that no difference between the results has been demonstrated. Note that this statement is different from saying that there is no determinate error.

Comparison of Two Experimental Means

The results of chemical analyses are frequently used to determine
whether two materials are identical. Here, the chemist must judge whether a difference in the means of two sets of measurements is real and constitutes evidence that the samples are different or whether the discrepancy is simply a consequence of indeterminate errors in the two sets. To illustrate, let us assume that N1 replicate analyses of material 1 yielded a mean value of x̄1, and N2 analyses of material 2 obtained by the same method gave a mean of x̄2. If the data were collected in an identical way, it is usually safe to assume that the standard deviations of the two sets of measurements are the same and modify Equation 3-12 to take into account that one set of results is being compared with a second rather than with the true mean of the data, μ.

In this case, as with the previous one, we invoke the null hypothesis
that the samples are identical and that the observed difference in the results, (x̄1 − x̄2), is the result of indeterminate errors. To test this hypothesis statistically, we modify Equation 3-12 in the following way. First, we substitute x̄2 for μ, thus making the left side of the equation the numerical difference between the two means, x̄1 − x̄2. Since we know from Equation 3-5 that the standard deviation of the mean x̄1 is

s_m1 = s1/√N1

and likewise for x̄2,

s_m2 = s2/√N2

the variance s_d² of the difference (d = x̄1 − x̄2) between the means is given by

s_d² = s_m1² + s_m2²

By substituting the values for s_m1 and s_m2 into this equation, we have

s_d² = s1²/N1 + s2²/N2

If we then assume that the pooled standard deviation s_pooled is a good estimate of both s1 and s2, then

s_d² = (s_pooled)²/N1 + (s_pooled)²/N2 = (s_pooled)² (N1 + N2)/(N1N2)

and

s_d = s_pooled √[(N1 + N2)/(N1N2)]

Substituting this equation into Equation 3-12 (and also x̄2 for μ), we find that

x̄1 − x̄2 = ± t·s_pooled √[(N1 + N2)/(N1N2)]    (3-13)

The numerical value for the term on the right is computed using t for the
particular confidence level desired. The number of degrees of freedom for finding t in Table 3-2 is N1 + N2 − 2. If the experimental difference x̄1 − x̄2 is smaller than the computed value, the null hypothesis is not rejected and no significant difference between the means has been demonstrated. An experimental difference greater than the value computed from t indicates that there is a significant difference between the means.

If a good estimate of σ is available, Equation 3-13 can be modified by inserting z for t and σ for s.

Example 3-9

The composition of a flake of paint found on the clothes of the victim of a
hit-and-run accident was compared with that of paint from the car suspected of causing the accident. Do the following data for the spectroscopic determination of titanium in the paint suggest a difference in composition between the two materials? From previous experience, the standard deviation for the method is known to be 0.35% Ti; that is, s → σ = 0.35% Ti.

Paint from clothes: % Ti = 4.0, 4.6
Paint from car: % Ti = 4.5, 5.3, 5.5, 5.0, 4.9

x̄1 = (4.6 + 4.0)/2 = 4.3% Ti
x̄2 = (4.5 + 5.3 + 5.5 + 5.0 + 4.9)/5 = 5.0% Ti
x̄1 − x̄2 = 4.3 − 5.0 = −0.7% Ti

Modifying Equation 3-13 to take into account our knowledge that
s → σ and taking values of z from Table 3-2, we calculate for the 95% confidence level

± z·σ √[(N1 + N2)/(N1N2)] = ± 1.96 × 0.35 √[(2 + 5)/(2 × 5)] = ± 0.57% Ti

and for the 99% confidence level

± z·σ √[(N1 + N2)/(N1N2)] = ± 2.58 × 0.35 √[(2 + 5)/(2 × 5)] = ± 0.76% Ti

Only 5 out of 100 data should differ by 0.57% Ti or greater, and only 1 out of 100 should differ by as much as 0.76% Ti. Thus, it seems reasonably probable (between 95% and somewhat less than 99% certain) that the observed difference of −0.7% does not arise from indeterminate error but in fact is caused, at least in part, by a real difference between the two paint samples. Hence, we conclude that the suspected vehicle was probably not involved.

Example 3-10

Two barrels of wine were analyzed for their alcohol content in order to
determine whether they were from different sources. On the basis of six analyses, the average content of the first barrel was established to be 12.61% ethanol. Four analyses of the second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled value of s = 0.070%. Do the data indicate a difference between the wines?

Here we employ Equation 3-13, using t for eight degrees of freedom (10 − 2). At the 95% confidence level,

± t·s √[(N1 + N2)/(N1N2)] = ± 2.31 × 0.070 √[(6 + 4)/(6 × 4)] = ± 0.10%

The observed difference is

x̄1 − x̄2 = 12.61 − 12.53 = 0.08%

As often as 5 times in 100, indeterminate error will be responsible for a difference as great as 0.10%. At the 95% confidence level, then, no difference in the alcohol content of the wine has been established.

In Example 3-10, no significant difference between the two wines was
detected at the 95% probability level. Note that this statement is not equivalent to saying that x̄1 is equal to x̄2; nor do the tests prove that the wines come from the same source. Indeed, it is conceivable that one wine is a red and the other is a white. To establish with a reasonable probability that the two wines are from the same source would require extensive testing of other characteristics, such as taste, color, odor, and refractive index, as well as tartaric acid, sugar, and trace element content. If no significant differences are revealed by all of these tests and by others, then it might be possible to judge the two wines as having a common origin. In contrast, the finding of one significant difference in any test would clearly show that the two wines are different. Thus, the establishment of a significant difference by a single test is much more revealing than the establishment of an absence of difference.

Estimation of Detection Limits

Equation 3-13 is useful for estimating the detection limit for a measurement. Here, the standard deviation from several blank determinations is computed. The minimum detectable quantity Δx_min is

Δx_min = x̄1 − x̄b > t·s_b √[(N1 + N2)/(N1N2)]    (3-14)

where the subscript b refers to the blank determinations and N2 is the number of blank results.

Example 3-11

A method for the analysis of DDT gave the following results when applied
to pesticide-free foliage samples: µg DDT = 0.2, −0.5, −0.2, 1.0, 0.8, −0.6, 0.4, 1.2. Calculate the DDT detection limit (at the 99% confidence level) of the method for (a) a single analysis and (b) the mean of five analyses.

Here we find

Σxi = 0.2 − 0.5 − 0.2 + 1.0 + 0.8 − 0.6 + 0.4 + 1.2 = 2.3
Σxi² = 0.04 + 0.25 + 0.04 + 1.0 + 0.64 + 0.36 + 0.16 + 1.44 = 3.93
s_b = √[(3.93 − (2.3)²/8)/(8 − 1)] = √(3.26875/7) = 0.68 µg

(a) For a single analysis, N1 = 1 and the number of degrees of freedom is 1 + 8 − 2 = 7. From Table 3-2, we find t = 3.50, and so

Δx_min > 3.50 × 0.68 √[(1 + 8)/(1 × 8)] > 2.5 µg DDT

Thus, 99 times out of 100, a result greater than 2.5 µg DDT indicates the presence of the pesticide on the plant.

(b) Here, N1 = 5, and the number of degrees of freedom is 11. Therefore, t = 3.11, and

Δx_min > 3.11 × 0.68 √[(5 + 8)/(5 × 8)] > 1.2 µg DDT
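The calculations in Examples 3-10 and 3-11 can be checked numerically. This Python sketch implements the right-hand sides of Equations 3-13 and 3-14; the function names are my own, and the t values are taken from Table 3-2.

```python
import math
from statistics import stdev

def two_mean_criterion(s_pooled, n1, n2, t_crit):
    """Critical difference from Equation 3-13: x1bar - x2bar is significant
    if its magnitude exceeds t * s_pooled * sqrt((N1 + N2)/(N1 * N2))."""
    return t_crit * s_pooled * math.sqrt((n1 + n2) / (n1 * n2))

def detection_limit(blanks, n_analysis, t_crit):
    """Minimum detectable quantity from Equation 3-14, using the standard
    deviation of replicate blank determinations."""
    return two_mean_criterion(stdev(blanks), n_analysis, len(blanks), t_crit)

# Example 3-10: six and four wine analyses, pooled s = 0.070% ethanol,
# t = 2.31 for eight degrees of freedom at the 95% confidence level.
wine = two_mean_criterion(0.070, 6, 4, t_crit=2.31)  # about 0.10%

# Example 3-11: eight blank results (micrograms DDT).
blanks = [0.2, -0.5, -0.2, 1.0, 0.8, -0.6, 0.4, 1.2]
single = detection_limit(blanks, 1, t_crit=3.50)  # (a) about 2.5 ug
five = detection_limit(blanks, 5, t_crit=3.11)    # (b) about 1.2 ug
```

Because the observed difference of 0.08% falls below the 0.10% criterion, the wine comparison fails to show a significant difference, in agreement with Example 3-10.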