
© Journal of the American Statistical Association
June 1972, Volume 67, Number 338
Theory & Methods Section

On Simpson's Paradox and the Sure-Thing Principle

COLIN R. BLYTH*

This paradox is the possibility of $P(A \mid B) < P(A \mid B')$ even though $P(A \mid B) \ge P(A \mid B')$ both under the additional condition $C$ and under the complement $C'$ of that condition. Details are given on why this can happen and how extreme the inequalities can be. An example shows that Savage's sure-thing principle ("if you would definitely prefer $g$ to $f$, either knowing that the event $C$ obtained, or knowing that $C$ did not obtain, then you definitely prefer $g$ to $f$") is not applicable to alternatives $f$ and $g$ that involve sequential operations.

* Colin R. Blyth is professor of mathematics, Queen's University, Kingston, Ontario, Canada, on leave from the University of Illinois at Urbana. This work was supported by National Science Foundation grants GP 14786 and GP 28154 at the University of Illinois.

1. INTRODUCTION

Simpson's paradox is described in [3]. An occurrence in real data is given in [1, page 449]. Here is an example illustrating this paradox:

A doctor was planning to try out a new treatment on patients, mostly local ($C$), and a few in Chicago ($C'$). A statistician advised him to use a table of random numbers, and as each $C$ patient became available, assign him to the new treatment with probability .91, leave him to the standard treatment with probability .09; and the same for each $C'$ patient with probabilities .01 and .99 respectively. (These probabilities were expected to give him about the number of patients he could handle in each city.) When the doctor returned later with the data of Tabulation 1.1, the statistician told him that the new treatment was obviously a very bad one, and criticized him for having continued trying it on so many patients.

Tabulation 1.1

Treatment:    Standard       New
Dead:         5950           9005
Alive:        5050 (46%)     1095 (11%)

The doctor replied that he had continued because the new treatment was obviously a very good one, having nearly doubled the recovery rate in both cities; and he was correct, as Tabulation 1.2 shows.

Tabulation 1.2

              C patients only             C' patients only
Treatment:    Standard     New            Standard       New
Dead:         950          9000           5000           5
Alive:        50 (5%)      1000 (10%)     5000 (50%)     95 (95%)

From this data one would expect to have had about 10,700 recoveries had all patients received the new treatment, as compared to the actual 6,145 recoveries, and to about 5,600 recoveries had all received the standard treatment. The difficulty is not one of chance variation: the observed proportions might be the true ones, or the observed numbers could be multiplied by a constant large enough to make this essentially so. As with any paradox, there is nothing paradoxical once we see what has happened: The $C$ patients are much less likely to recover, and the new treatment was given mostly to $C$ patients; and of course a treatment will show a poor recovery rate if tried out mostly on the most seriously ill patients.

Section 2 is a discussion of what this paradox amounts to mathematically, and how extreme it can get. Section 3 is a discussion of the sure-thing principle, in view of this paradox.

2. SIMPSON'S PARADOX

Mathematically, Simpson's paradox is this: It is possible to have $P(A \mid B) < P(A \mid B')$, and have at the same time both

$$P(A \mid BC) \ge P(A \mid B'C), \qquad P(A \mid BC') \ge P(A \mid B'C'). \tag{2.1}$$
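The figures quoted above can be recomputed from the four cells of Tabulation 1.2. The following is a minimal Python sketch, not part of the original article; the names `COUNTS`, `recovery_rate` and `expected_recoveries` are illustrative assumptions. It uses exact fractions so that the pooled rates, the within-city rates in (2.1), and the "all patients on one treatment" projections can be compared without rounding.

```python
from fractions import Fraction

# Counts from Tabulation 1.2: (city, treatment) -> (dead, alive).
# "C" = local patients, "C'" = Chicago patients.
COUNTS = {
    ("C", "standard"): (950, 50),
    ("C", "new"): (9000, 1000),
    ("C'", "standard"): (5000, 5000),
    ("C'", "new"): (5, 95),
}


def recovery_rate(treatment, city=None):
    """Proportion alive for a treatment, pooled or within one city."""
    cells = [v for (c, t), v in COUNTS.items()
             if t == treatment and (city is None or c == city)]
    dead = sum(d for d, _ in cells)
    alive = sum(a for _, a in cells)
    return Fraction(alive, dead + alive)


def expected_recoveries(treatment):
    """Recoveries expected if every patient had received `treatment`."""
    total = Fraction(0)
    for city in ("C", "C'"):
        n_city = sum(d + a for (c, _), (d, a) in COUNTS.items() if c == city)
        total += n_city * recovery_rate(treatment, city)
    return total


if __name__ == "__main__":
    for t in ("standard", "new"):
        print(t, float(recovery_rate(t)),
              float(recovery_rate(t, "C")), float(recovery_rate(t, "C'")),
              float(expected_recoveries(t)))
```

Under these assumptions the projections come out to exactly 10,695 recoveries for the new treatment and 5,600 for the standard one, which matches the rounded figures in the text, while the pooled rate for the new treatment stays below the pooled rate for the standard treatment.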
One tends to reason intuitively that this is impossible because

$$P(A \mid B) = \text{an average of } P(A \mid BC) \text{ and } P(A \mid BC'),$$
$$P(A \mid B') = \text{an average of } P(A \mid B'C) \text{ and } P(A \mid B'C'), \tag{2.2}$$

which is easily seen to be true when all the conditioning events have positive probabilities:

$$P(A \mid B) = \{P(C \mid B)\}\, P(A \mid BC) + \{P(C' \mid B)\}\, P(A \mid BC'),$$
$$P(A \mid B') = \{P(C \mid B')\}\, P(A \mid B'C) + \{P(C' \mid B')\}\, P(A \mid B'C'); \tag{2.2'}$$

but the reasoning fails because these two averages have different weightings. However, in particular, if $B$ and $C$ are independent, these two weightings coincide and the reasoning correctly shows that (2.1) is impossible. The paradox can be said to result from the dependence or interaction of $B$ and $C$.

In the introductory example, $A$ = Alive, $B$ = New treatment, $C$ = Local patient; $P(\cdot)$ refers to probability for a patient chosen at random from among those recorded in Tabulation 1.2, and coincides with proportions in that table, now being taken as the total available population. In that example we have

$$P(A \mid B) = .11 < P(A \mid B') = .46,$$
$$P(A \mid BC) = .10 > P(A \mid B'C) = .05,$$
$$P(A \mid BC') = .95 > P(A \mid B'C') = .50.$$

The initially surprising fact that an average of .10 and .95 is so much smaller than an average of .05 and .50 is easily explained by showing the numerical values in (2.2'):

$$.11 = \{.99\}(.10) + \{.01\}(.95),$$
$$.46 = \{.09\}(.05) + \{.91\}(.50).$$

Here the paradox could not happen if $B$ and $C$ were independent, i.e., if the proportion receiving the new treatment were the same for $C$ patients and $C'$ patients.

Here is the most extreme possibility in Simpson's paradox: Subject to the conditions

$$P(A \mid BC) \ge \gamma \cdot P(A \mid B'C), \qquad P(A \mid BC') \ge \gamma \cdot P(A \mid B'C') \tag{2.3}$$

with $\gamma \ge 1$, it is possible to have $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1/\gamma$. In particular, with $\gamma = 1$, it is possible to have $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1$.

To show (2.3), recall what caused the paradox in the introductory example: the $C$ patients were much more seriously ill, and the new treatment was given mostly to $C$ patients.
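Both the weighted-average identity (2.2') and this imbalance in who received the new treatment can be checked numerically from Tabulation 1.2. The following is a minimal Python sketch, not taken from the article; the names `N`, `ALIVE`, `weight` and `pooled_rate` are illustrative assumptions.

```python
# Patient counts and recoveries from Tabulation 1.2, keyed by (treatment, city).
N = {("new", "C"): 10000, ("new", "C'"): 100,
     ("standard", "C"): 1000, ("standard", "C'"): 10000}
ALIVE = {("new", "C"): 1000, ("new", "C'"): 95,
         ("standard", "C"): 50, ("standard", "C'"): 5000}
CITIES = ("C", "C'")


def weight(city, treatment):
    """P(city | treatment): the weight in braces in (2.2')."""
    group_size = sum(N[(treatment, c)] for c in CITIES)
    return N[(treatment, city)] / group_size


def pooled_rate(treatment):
    """Right-hand side of (2.2'): weighted average of the per-city rates."""
    return sum(weight(c, treatment) * ALIVE[(treatment, c)] / N[(treatment, c)]
               for c in CITIES)


if __name__ == "__main__":
    for t in ("new", "standard"):
        print(t, [round(weight(c, t), 2) for c in CITIES],
              round(pooled_rate(t), 2))
```

The two treatments average the per-city recovery rates with very different weights: roughly 99% of new-treatment patients are $C$ patients, against roughly 9% of standard-treatment patients. If the weights were equal, as they would be with $B$ and $C$ independent, the reversal could not occur.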
Let the probabilities of recovery for patients getting the standard, new treatments be $\alpha$, $\gamma\alpha$ for $C$ patients and $\beta$, $\gamma\beta$ for $C'$ patients, with $\beta > \alpha$. Then, by giving the new treatment to sufficiently many $C$ patients and sufficiently few $C'$ patients, we can make $P(A \mid B) \approx \gamma\alpha$ and $P(A \mid B') \approx \beta$. Now, by suitable choices of $\alpha$ and $\beta$, and noticing the requirement $\gamma\beta \le 1$, we can make $P(A \mid B) \approx 0$ and $P(A \mid B') \approx 1/\gamma$, which is clearly the most extreme possibility.

It is easy to construct examples using the preceding method: make $P(A \mid C)$ small enough, $P(A \mid C')$ large enough, $P(B \mid C)$ large enough, and $P(B \mid C')$ small enough. The introductory example was constructed in just this way with $\gamma = 2$. Tabulation 2.1 shows an example constructed in the same way with $\gamma = 1$.

Tabulation 2.1

              C patients only             C' patients only
Treatment:    Standard     New            Standard        New
Dead:         100          10000          1000            9
Alive:        9 (8%)       1000 (10%)     10000 (91%)     100 (92%)

              All patients
Treatment:    Standard        New
Dead:         1100            10009
Alive:        10009 (90%)     1100 (11%)

3. THE SURE-THING PRINCIPLE

The relatively formal statement of the sure-thing principle given by Savage [2, pp. 21-22] is this:

If you would definitely prefer $g$ to $f$, either knowing that the event $C$ obtained, or knowing that the event $C$ did not obtain, then you definitely prefer $g$ to $f$. (3.1)

Here $f$ and $g$ are two alternative possible acts of almost any sort [2, paragraph on pp. 15-16]: "What in the ordinary way of thinking might be regarded as a chain of decisions, one leading to the other in time, is to be regarded as a single decision or act;" for example, the choice of a strategy in a game, or the choice of a sequential decision procedure in statistics.

In the Simpson's paradox example of Section 1, the sure-thing principle is false as applied to

f: Draw patients at random from Tabulation 1.2 until you get one who got the standard treatment, and bet a dollar that he recovered.
g: Draw patients at random from Tabulation 1.2 until you get one who got the new treatment, and bet a dollar that he recovered.

C: The patient you bet on is a local one.

Given that $C$ obtained, you would definitely prefer $g$ to $f$ because $g$ gives you double the probability of winning a dollar. Given that $C'$ obtained, you would definitely prefer $g$ to $f$ because $g$ gives you nearly double the probability of winning a dollar. But you definitely prefer $f$ to $g$ because $f$ gives you over four times the probability of winning a dollar.

Here, in detail, is the process being considered: We have the record of each of the 21,100 patients of Tabulation 1.2 on a card whose color indicates the treatment that patient got. We draw cards at random, one at a time (with replacement, say), sequentially, from this deck. We are allowed to observe the color of each card as we draw it, and to use one of the two alternative possible stopping rules $f$ and $g$. Using $f$, the card we bet on is equally likely to belong to any one of 11,000 patients of whom 5,050 (i.e., 46%) recover; using $g$, the card we bet on is equally likely to belong to any one of 10,100 patients of whom 1,095 (i.e., 11%) recover (see Tabulation 1.1); therefore we definitely prefer $f$ to $g$.

But if we knew that $C$ obtained, i.e., that the card we stopped on was for a $C$ patient (or to put it another way, if the drawing were declared void and a new drawing made when the card we stop on belongs to a $C'$ patient; of course, such knowledge would make the process a different, conditional, one), then we would know that, using $f$, the card we bet on would be equally likely to belong to any one of 1,000 patients of whom 50 (i.e., 5%) recover; and, using $g$, the card we bet on would be equally likely to belong to any one of 10,000 patients of whom 1,000 (i.e., 10%) recover (see Tabulation 1.2); therefore we would definitely prefer $g$ to $f$. Similarly, if we knew that $C'$ obtained, we would definitely prefer $g$ to $f$.
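The card-drawing process just described can be simulated directly. The following is a minimal Python sketch under stated assumptions, not part of the original article: the deck is built from the counts of Tabulation 1.2, cards are drawn with replacement, and the names `CELLS`, `DECK`, `play` and `win_rates`, the seed, and the number of games are all illustrative choices.

```python
import random

# One card per patient in Tabulation 1.2: (treatment, city, dead, alive).
CELLS = [
    ("standard", "C", 950, 50),
    ("new", "C", 9000, 1000),
    ("standard", "C'", 5000, 5000),
    ("new", "C'", 5, 95),
]
# Each card records (treatment, city, recovered).
DECK = [(t, c, r) for t, c, dead, alive in CELLS
        for r in [False] * dead + [True] * alive]


def play(rule, rng):
    """Stopping rule: draw with replacement until a card shows `rule`."""
    while True:
        card = rng.choice(DECK)
        if card[0] == rule:
            return card


def win_rates(rule, n_games=100_000, seed=0):
    """Estimated chance of winning the bet, overall and given each city."""
    rng = random.Random(seed)
    tally = {"all": [0, 0], "C": [0, 0], "C'": [0, 0]}  # [wins, games]
    for _ in range(n_games):
        _, city, recovered = play(rule, rng)
        for key in ("all", city):
            tally[key][0] += recovered
            tally[key][1] += 1
    return {k: wins / games for k, (wins, games) in tally.items()}


if __name__ == "__main__":
    print("f (standard):", win_rates("standard"))
    print("g (new):     ", win_rates("new"))
```

With enough games the estimates should settle near the proportions in Tabulations 1.1 and 1.2: $f$ wins more often overall, while $g$ wins more often among games that stopped on a $C$ card and also among games that stopped on a $C'$ card.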
The space of possible outcomes of the process just described is the familiar space of stopped sequences of sequential analysis; the probability model is that each card drawn, independent of other drawings, is equally likely to be any one of the 21,100 possible cards. Just as in sequential analysis, we can either say that a change of stopping rule changes the sample space; or else designate the sample space as the set of all stopped sequences, and say that a change of stopping rule changes the probability distribution on this space.

In sequential analysis, this paradox is so familiar that it is not considered paradoxical: at a stage where we can take 0, 1 or 2 more observations, and where it is inadvisable to take 1 more and inadvisable to take 2 more (and so is inadvisable to use any randomization independent of observations between 1 and 2 more), it can be advisable to use a sequential continuation that takes sometimes 1 and sometimes 2 observations. Of course, this is the whole point of sequential analysis; were it not so, sequential methods would offer no advantage.

The sure-thing principle, then, seems not applicable to situations in which any action taken within $f$ or $g$, or (what can be construed as the same thing) in which the choice between $f$ and $g$, is allowed to be based sequentially on events dependent with $C$. Such situations appear to include any in which subjective methods are used, since subjective beliefs are presumably based sequentially on events that could be dependent with $C$.

My thanks to L. J. Savage, for telling me of [1], and for other helpful correspondence. Similar thanks to G. W.
Haggstrom, who pointed out that Simpson's paradox is the simplest form of the false correlation paradox in which the domain of $x$ is divided into short intervals, on each of which $y$ is a linear function of $x$ with large negative slope, but these short line segments get progressively higher to the right, so that over the whole domain of $x$, the variable $y$ is practically a linear function of $x$ with large positive slope.

[Received April 1970. Revised July 1971]

REFERENCES

[1] Cohen, M. R. and Nagel, E., An Introduction to Logic and Scientific Method, New York: Harcourt, Brace and Co., 1934.
[2] Savage, L. J., The Foundations of Statistics, New York: John Wiley and Sons, Inc., 1954.
[3] Simpson, E. H., "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13, No. 2 (1951), 238-41.

Some Probability Paradoxes in Choice from Among Random Alternatives

COLIN R. BLYTH*

The probability $P(X > Y)$ can be arbitrarily close to 1 even though the random variable $X$ is stochastically smaller than $Y$; the probabilities $P(X < Y)$, $P(Y < Z)$, $P(Z < X)$ can all exceed $1/2$; it is possible that $P(X = \min(X, Y, Z)) < P(Y = \min(X, Y, Z)) < P(Z = \min(X, Y, Z))$ even though $P(X < Y)$, $P(X < Z)$, $P(Y < Z)$ all exceed $1/2$. This article examines these paradoxes and extensions of them, and discusses the difficulties they cause in the problem of choosing from among possible random losses or payoffs, as in choosing from among possible statistical decision procedures or from among possible wagers.

* Colin R. Blyth is professor of mathematics, Queen's University, Kingston, Ontario, Canada, and is on leave from the University of Illinois at Urbana. This work was supported by National Science Foundation grants GP 8727 and GP 28154 at the University of Illinois.

1. INTRODUCTION AND SUMMARY

This article is concerned with choosing a "best" one from a class of available random alternatives, using this simple probability model: corresponding to the $i$th alternative is a real-valued random variable $L_i$, and our choice is to be based solely on the joint distribution of the $L_i$'s, which is known. Here the desirabilities of the alternatives depend on the outcome of a chance process, with known distribution, which generates the $L_i$'s: on a particular realization of that process, the $i$th alternative delivers a numerical amount, say $L_i = u_i$, of something that is good: a payoff, such as money, time in useful life, pleasure, satisfaction, etc. (or else bad: a loss, such as money, time in a speed contest, displeasure, dissatisfaction, etc.). Our program is this: first we choose an $i$, and then there ...