© Journal of the American Statistical Association
June 1972, Volume 67, Number 338
Theory & Methods Section

On Simpson’s Paradox and the Sure-Thing Principle

COLIN R. BLYTH*

This paradox is the possibility of $P(A \mid B) < P(A \mid B')$ even though $P(A \mid B) \ge P(A \mid B')$ both under the additional condition C and under the complement C' of that condition. Details are given on why this can happen and how extreme the inequalities can be. An example shows that Savage’s sure-thing principle (“if you would definitely prefer g to f, either knowing that the event C obtained, or knowing that C did not obtain, then you definitely prefer g to f”) is not applicable to alternatives f and g that involve sequential operations.

1. INTRODUCTION

Simpson’s paradox is described in [3]. An occurrence in
real data is given in [1, page 449]. Here is an example
illustrating this paradox:

A doctor was planning to try out a new treatment on patients, mostly local (C), and a few in Chicago (C'). A statistician advised him to use a table of random numbers, and as each C patient became available, assign him to the new treatment with probability .91, leave him to the standard treatment with probability .09; and the same for each C' patient with probabilities .01 and .99 respectively. (These probabilities were expected to give him about the number of patients he could handle in each city.) When the doctor returned later with the data of Tabulation 1.1, the statistician told him that the new treatment was obviously a very bad one, and criticized him for having continued trying it on so many patients.

Tabulation 1.1

Treatment:   Standard        New
Dead:        5950            9005
Alive:       5050 (46%)      1095 (11%)

The doctor replied that he had continued because the
new treatment was obviously a very good one, having
nearly doubled the recovery rate in both cities; and he
was correct, as Tabulation 1.2 shows.

Tabulation 1.2

             C patients only             C' patients only
Treatment:   Standard     New            Standard       New
Dead:        950          9000           5000           5
Alive:       50 (5%)      1000 (10%)     5000 (50%)     95 (95%)

* Colin R. Blyth is professor of mathematics, Queen’s University, Kingston, Ontario, Canada, on leave from the University of Illinois at Urbana. This work was supported by National Science Foundation grants GP 14786 and GP 28154 at the University of Illinois.

From this data one would expect to have had about
10,700 recoveries had all patients received the new treatment, as compared to the actual 6,145 recoveries, and to about 5,600 recoveries had all received the standard treatment. The difficulty is not one of chance variation: the observed proportions might be the true ones, or the observed numbers could be multiplied by a constant large enough to make this essentially so. As with any paradox, there is nothing paradoxical once we see what has happened: The C patients are much less likely to recover, and the new treatment was given mostly to C patients; and of course a treatment will show a poor recovery rate if tried out mostly on the most seriously ill patients.

Section 2 is a discussion of what this paradox amounts to mathematically, and how extreme it can get. Section 3 is a discussion of the sure-thing principle, in view of this paradox.

2. SIMPSON'S PARADOX

Mathematically, Simpson’s paradox is this: It is possible to have $P(A \mid B) < P(A \mid B')$, and have at the same time both

$$
\begin{aligned}
P(A \mid BC) &\ge P(A \mid B'C) \\
P(A \mid BC') &\ge P(A \mid B'C').
\end{aligned}
\tag{2.1}
$$

One tends to reason intuitively that this is impossible
because

$$
\begin{aligned}
P(A \mid B) &= \text{an average of } P(A \mid BC) \text{ and } P(A \mid BC') \\
P(A \mid B') &= \text{an average of } P(A \mid B'C) \text{ and } P(A \mid B'C'),
\end{aligned}
\tag{2.2}
$$

which is easily seen to be true when all the conditioning events have positive probabilities:

$$
\begin{aligned}
P(A \mid B) &= \{P(C \mid B)\}\, P(A \mid BC) + \{P(C' \mid B)\}\, P(A \mid BC') \\
P(A \mid B') &= \{P(C \mid B')\}\, P(A \mid B'C) + \{P(C' \mid B')\}\, P(A \mid B'C');
\end{aligned}
\tag{2.2'}
$$

but the reasoning fails because these two averages have different weightings. However, in particular, if B and C are independent, these two weightings coincide and the reasoning correctly shows that (2.1) is impossible. The paradox can be said to result from the dependence or interaction of B and C.

In the introductory example, A = Alive, B = New treatment, C = Local patient; P( ) refers to probability for a patient chosen at random from among those recorded in Tabulation 1.2, and coincides with proportions in that table, now being taken as the total available population. In that
example we have

$$
\begin{aligned}
P(A \mid B) = .11 &< P(A \mid B') = .46, \\
P(A \mid BC) = .10 &> P(A \mid B'C) = .05, \\
P(A \mid BC') = .95 &> P(A \mid B'C') = .50.
\end{aligned}
$$

The initially surprising fact that an average of .10 and .95 is so much smaller than an average of .05 and .50 is easily explained by showing the numerical values in (2.2'):

$$
\begin{aligned}
.11 &= \{.99\}(.10) + \{.01\}(.95) \\
.46 &= \{.10\}(.05) + \{.90\}(.50).
\end{aligned}
$$

Here the paradox could not happen if B and C were independent, i.e., if the proportion receiving the new treatment were the same for C patients and C' patients.

Here is the most extreme possibility in Simpson’s paradox: Subject to the conditions

$$
\begin{aligned}
P(A \mid BC) &\ge \gamma\, P(A \mid B'C) \\
P(A \mid BC') &\ge \gamma\, P(A \mid B'C')
\end{aligned}
\tag{2.3}
$$

with $\gamma \ge 1$, it is possible to have $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1/\gamma$. In particular, with $\gamma = 1$, it is possible to have $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1$. To show (2.3), recall what caused the
paradox in the introductory example: the C patients were much more seriously ill, and the new treatment was given mostly to C patients. Let the probabilities of recovery for patients getting the standard, new treatments be $\alpha$, $\gamma\alpha$ for C patients and $\beta$, $\gamma\beta$ for C' patients, with $\beta > \alpha$. Then, by giving the new treatment to sufficiently many C patients and sufficiently few C' patients, we can make $P(A \mid B) \approx \gamma\alpha$ and $P(A \mid B') \approx \beta$. Now, by suitable choices of $\alpha$ and $\beta$, and noticing the requirement $\gamma\beta \le 1$, we can make $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1/\gamma$, which is clearly the most extreme possibility.

It is easy to construct examples using the preceding method: make $P(A \mid C)$ small enough, $P(A \mid C')$ large enough, $P(B \mid C)$ large enough, and $P(B \mid C')$ small enough. The introductory example was constructed in just this way with $\gamma = 2$. Tabulation 2.1 shows an example constructed in the same way with $\gamma = 1$.
Tabulation 2.1

             C patients only             C' patients only
Treatment:   Standard     New            Standard        New
Dead:        100          10000          1000            9
Alive:       9 (8%)       1000 (10%)     10000 (91%)     100 (92%)

             All patients
Treatment:   Standard        New
Dead:        1100            10009
Alive:       10009 (90%)     1100 (11%)

3. THE SURE-THING PRINCIPLE

The relatively formal statement of the sure-thing principle given by Savage, [2, pp. 21-22], is this:
If you would definitely prefer g to f, either knowing that the event C obtained, or knowing that the event C did not obtain, then you definitely prefer g to f. (3.1)

Here f and g are two alternative possible acts of almost any sort [1, paragraph on pp. 15-16]: “What in the ordinary way of thinking might be regarded as a chain of decisions, one leading to the other in time, is to be regarded as a single decision or act;” for example, the choice of a strategy in a game, or the choice of a sequential decision procedure in statistics.

In the Simpson’s paradox example of Section 1, the sure-thing principle is false as applied to

f: Draw patients at random from Tabulation 1.2 until you get one who got the standard treatment, and bet a dollar that he recovered.
g: Draw patients at random from Tabulation 1.2 until you get one who got the new treatment, and bet a dollar that he recovered.
C: The patient you bet on is a local one.

Given that C obtained, you would definitely prefer g to f because g gives you double the probability of winning a dollar. Given that C' obtained, you would definitely prefer g to f because g gives you nearly double the probability of winning a dollar. But you definitely prefer f to g because f gives you over four times the probability of winning a dollar.

Here, in detail, is the process being considered: We
have the record of each of the 21,100 patients of Tabulation 1.2 on a card whose color indicates the treatment that patient got. We draw cards at random, one at a time (with replacement, say), sequentially, from this deck. We are allowed to observe the color of each card as we draw it, and to use one of the two alternative possible stopping rules f and g. Using f, the card we bet on is equally likely to belong to any one of 11,000 patients of whom 5,050 (i.e., 46%) recover; using g, the card we bet on is equally likely to belong to any one of 10,100 patients of whom 1,095 (i.e., 11%) recover (see Tabulation 1.1); therefore we definitely prefer f to g. But if we knew that C obtained, i.e., that the card we stopped on was for a C patient (or to put it another way, if the drawing were declared void and a new drawing made when the card we stop on belongs to a C' patient; of course, such knowledge would make the process a different, conditional, one), then we would know that, using f, the card we bet on would be equally likely to belong to any one of 1,000 patients of whom 50 (i.e., 5%) recover; and, using g, the card we bet on would be equally likely to belong to any one of 10,000 patients of whom 1,000 (i.e., 10%) recover (see Tabulation 1.2); therefore we would definitely prefer g to f. Similarly, if we knew that C' obtained, we would definitely prefer g to f.

The space of possible outcomes of the process just described is the familiar space of stopped sequences of sequential analysis; the probability model is that each card drawn, independent of other drawings, is equally likely to be any one of the 21,100 possible cards. Just as in sequential analysis, we can either say that a change of stopping rule changes the sample space; or else designate the sample space as the set of all stopped sequences, and say that a change of stopping rule changes the probability distribution on this space.

In sequential analysis, this paradox is so familiar that
it is not considered paradoxical: at a stage where we can take 0, 1 or 2 more observations, and where it is inadvisable to take 1 more and inadvisable to take 2 more (and so is inadvisable to use any randomization independent of observations between 1 and 2 more), it can be advisable to use a sequential continuation that takes sometimes 1 and sometimes 2 observations. Of course, this is the whole point of sequential analysis; were it not so, sequential methods would offer no advantage.

The sure-thing principle, then, seems not applicable to situations in which any action taken within f or g, or (what can be construed as the same thing) in which the choice between f and g, is allowed to be based sequentially on events dependent with C. Such situations appear to include any in which subjective methods are used, since subjective beliefs are presumably based sequentially on events that could be dependent with C.

My thanks to L. J. Savage, for telling me of [1], and
for other helpful correspondence. Similar thanks to G. W. Haggstrom, who pointed out that Simpson’s paradox is the simplest form of the false correlation paradox in which the domain of x is divided into short intervals, on each of which y is a linear function of x with large negative slope, but these short line segments get progressively higher to the right, so that over the whole domain of x, the variable y is practically a linear function of x with large positive slope.

[Received April 1970. Revised July 1971]

REFERENCES

[1] Cohen, M. R. and Nagel, E., An Introduction to Logic and Scientific Method, New York: Harcourt, Brace and Co., 1934.
[2] Savage, L. J., The Foundations of Statistics, New York: John Wiley and Sons, Inc., 1954.
[3] Simpson, E. H., “The Interpretation of Interaction in Contingency Tables,” Journal of the Royal Statistical Society, Ser. B, 13, No. 2 (1951), 238-41.

Some Probability Paradoxes in Choice from Among Random Alternatives

COLIN R. BLYTH*

The probability $P(X > Y)$ can be arbitrarily close to 1 even though the random variable X is stochastically smaller than Y; the probabilities $P(X < Y)$, $P(Y < Z)$, $P(Z < X)$ can all exceed $\tfrac{1}{2}$; it is possible that $P(X = \min\{X, Y, Z\}) < P(Y = \min\{X, Y, Z\}) < P(Z = \min\{X, Y, Z\})$ even though $P(X < Y)$, $P(X < Z)$, $P(Y < Z)$ all exceed $\tfrac{1}{2}$. This article examines these paradoxes and extensions of them, and discusses the difficulties they cause in the problem of choosing from among possible random losses or payoffs, as in choosing from among possible statistical decision procedures or from among possible wagers.

* Colin R. Blyth is professor of mathematics, Queen’s University, Kingston, Ontario, Canada, and is on leave from the University of Illinois at Urbana. This work was supported by National Science Foundation grants GP 8727 and GP 28154 at the University of Illinois.

1. INTRODUCTION AND SUMMARY

This article is concerned with choosing a “best” one from a class of available random alternatives, using this
simple probability model: corresponding to the ith alternative is a real-valued random variable $L_i$, and our choice is to be based solely on the joint distribution of the $L_i$’s, which is known. Here the desirabilities of the alternatives depend on the outcome of a chance process, with known distribution, which generates the $L_i$’s: on a particular realization of that process, the ith alternative delivers a numerical amount, say $L_i = u_i$, of something that is good: a payoff, such as money, time in useful life, pleasure, satisfaction, etc. (or else bad: a loss, such as money, time in a speed contest, displeasure, dissatisfaction, etc.). Our program is this: first we choose an i, and then there ...
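
The claim in the abstract of this second article, that $P(X < Y)$, $P(Y < Z)$ and $P(Z < X)$ can all exceed $\tfrac{1}{2}$, can be illustrated with a small exact computation. The sketch below is not taken from the article. It assumes three independent fair dice whose face values are chosen here purely for illustration, and the helper name `prob_less` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical face values for three independent fair dice (an assumption
# made for illustration; these numbers do not come from the article).
X = (1, 6, 8)
Y = (2, 4, 9)
Z = (3, 5, 7)


def prob_less(u, v):
    """Exact P(U < V) for independent U, V, each uniform on its faces."""
    favorable = sum(1 for a, b in product(u, v) if a < b)
    return Fraction(favorable, len(u) * len(v))


for name, p in [("P(X<Y)", prob_less(X, Y)),
                ("P(Y<Z)", prob_less(Y, Z)),
                ("P(Z<X)", prob_less(Z, X))]:
    print(name, "=", p)
```

With these assumed faces each of the three probabilities is 5/9, so each variable is more likely than not to be smaller than the next one around the cycle.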
