371chapter4f2011 - Chapter 4 The Poisson Distribution 4.1...


Chapter 4 The Poisson Distribution

4.1 The Fish Distribution?

The Poisson distribution is named for Siméon-Denis Poisson (1781–1840). In addition, poisson is French for fish.

In this chapter we will study a family of probability distributions for a countably infinite sample space, each member of which is called a Poisson distribution. Recall that a binomial distribution is characterized by the values of two parameters: n and p. A Poisson distribution is simpler in that it has only one parameter, which we denote by θ, pronounced theta. (Many books and websites use λ, pronounced lambda, instead of θ. We save λ for a related purpose.) The parameter θ must be positive: θ > 0. Below is the formula for computing probabilities for the Poisson.

$$P(X = x) = \frac{e^{-\theta}\,\theta^{x}}{x!}, \quad \text{for } x = 0, 1, 2, 3, \ldots \tag{4.1}$$

In this equation, e is the famous number from calculus,

$$e = \lim_{n \to \infty} (1 + 1/n)^{n} = 2.71828\ldots$$

You might recall from the study of infinite series in calculus that

$$\sum_{x=0}^{\infty} \frac{b^{x}}{x!} = e^{b},$$

for any real number b. Thus,

$$\sum_{x=0}^{\infty} P(X = x) = e^{-\theta} \sum_{x=0}^{\infty} \frac{\theta^{x}}{x!} = e^{-\theta} e^{\theta} = 1.$$

Because every term in Formula 4.1 is positive and the terms sum to 1, we see that Formula 4.1 is a mathematically valid way to assign probabilities to the nonnegative integers.

The mean of the Poisson is its parameter θ; i.e. µ = θ. This can be proven using calculus, and a similar argument shows that the variance of a Poisson is also equal to θ; i.e. $\sigma^{2} = \theta$ and $\sigma = \sqrt{\theta}$.

When I write X ∼ Poisson(θ) I mean that X is a random variable with its probability distribution given by the Poisson with parameter value θ.

I ask you for patience. I am going to delay my explanation of why the Poisson distribution is important in science.

Poisson probabilities can be computed by hand with a scientific calculator. Alternatively, the following website, which is linked to our course webpage, can be used:

http://stattrek.com/Tables/Poisson.aspx

I will give an example to illustrate the use of this site. Let X ∼ Poisson(θ).
The website calculates five probabilities for you: P(X = x); P(X < x); P(X ≤ x); P(X > x); and P(X ≥ x). You must give as input your value of θ and a value of x.

Suppose that I have X ∼ Poisson(10) and I am interested in P(X = 8). I go to the site and type ‘8’ in the box labeled ‘Poisson random variable,’ and I type ‘10’ in the box labeled ‘Average rate of success.’ I click on the ‘Calculate’ box and the site gives me the following answers: P(X = 8) = 0.1126; P(X < 8) = 0.2202; P(X ≤ 8) = 0.3328; P(X > 8) = 0.6672; and P(X ≥ 8) = 0.7798. (There is, of course, a great deal of redundancy in these five answers because two pairs of events are complements of each other.)

It can be shown that for the Poisson, if θ ≤ 5 then its probability histogram is markedly asymmetrical, but if θ ≥ 25 its probability histogram is approximately symmetric and bell-shaped. This last statement suggests that we might use a normal curve to compute approximate probabilities for the Poisson, provided θ is large.

For example, suppose that X ∼ Poisson(25) and I want to calculate P(X ≥ 30). We will use a modification of the method we learned for the binomial. First, we note that µ = 25 and $\sigma = \sqrt{25} = 5$. Thus, our approximating curve will be the normal curve with these values for its mean and standard deviation. Using the continuity correction, we replace P(X ≥ 30) with P(X ≥ 29.5). Next, going to the normal curve website, we find that the area above (to the right of) 29.5 is 0.1841. From the Poisson website, I find that the exact probability is 0.1821.

To summarize: To approximate P(X ≥ x) for X ∼ Poisson(θ),
• Use the normal curve with mean equal to θ and standard deviation equal to $\sqrt{\theta}$.
• Find the area under the normal curve above (to the right of) x − 0.5.

If θ is unknown we can use the value of X to estimate it. The point estimate is x and, following the presentation for the binomial, we can use the standard normal curve (snc) to obtain an approximate confidence interval for θ.
The result is:

$$x \pm z\sqrt{x}.$$

Here is an example of its use. Ralph assumes that X has a Poisson distribution, but does not know the value of θ. He observes x = 30. His point estimate of the mean is 30 and his 95% confidence interval is

$$30 \pm 1.96\sqrt{30} = 30 \pm 10.7 = [19.3, 40.7].$$

We will now investigate the accuracy of the snc approximation. Suppose that, in fact, θ = 40. The 95% confidence interval will be correct if, and only if,

$$X - 1.96\sqrt{X} \le 40 \le X + 1.96\sqrt{X}.$$

After algebra, this becomes 30 ≤ X ≤ 54. The probability of this event, from the website, is 0.9428, which is pretty close to the desired 0.9500.

I calculated the exact probability that the approximate CI is correct for several values of θ; my results are below.

θ:                                30      35      40      50      100
Exact Prob. of Correct Interval:  0.9308  0.9368  0.9428  0.9487  0.9450

In my opinion, the approximate CI works adequately for θ ≥ 40. If you believe that θ might be smaller than 40 (and evidence of this would be if X were smaller than 40), then you might want to use an exact method, as I illustrated in Chapter 3 for the binomial. In fact, the website that gives us exact CI’s for the binomial also gives exact CI’s for the Poisson.

Bart assumes that X ∼ Poisson(θ) but does not know the value of θ. He observes X = 3 and wants to obtain:
• The two-sided 95% CI for θ; and
• The upper one-sided 95% CI for θ.

I will use the website to find Bart’s CI’s. I type ‘3’ (the value of X) into the ‘Observed Events:’ box and click on compute. (I don’t need to specify the confidence level because the 95% two-sided CI is the default for this site.) I get [0.6187, 8.7673] as the exact two-sided 95% CI for θ.

For the one-sided CI, I scroll down and type ‘5’ in the ‘upper tail’ box and ‘0’ in the ‘lower tail’ box. Then I scroll up and hit compute. I get the CI: [0.0008, 7.7537]. The lower endpoint is clearly a computer error (round-off error) because the lower bound must be 0. So, the answer is that 7.7537 is the 95% upper bound for θ.
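The calculations in this section can also be reproduced without the websites. What follows is a minimal Python sketch using only the standard library; it is not the code behind either website, and the helper names (`poisson_pmf`, `poisson_cdf`, `normal_upper_area`, `ci_covers`, `coverage`) are hypothetical names introduced here for illustration. It evaluates Formula 4.1, the five probabilities for Poisson(10) at x = 8, the normal approximation to P(X ≥ 30) for Poisson(25), and the probability that the approximate 95% CI is correct when θ = 40.

```python
from math import erf, exp, lgamma, log, sqrt


def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta), Formula 4.1, computed on the log scale."""
    return exp(-theta + x * log(theta) - lgamma(x + 1))


def poisson_cdf(x, theta):
    """P(X <= x); an empty sum gives 0 when x < 0."""
    return sum(poisson_pmf(k, theta) for k in range(x + 1))


def normal_upper_area(cutoff, mean, sd):
    """Area under the normal curve to the right of cutoff."""
    z = (cutoff - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def ci_covers(x, theta, z=1.96):
    """True if the approximate interval x +/- z*sqrt(x) contains theta."""
    half_width = z * sqrt(x)
    return x - half_width <= theta <= x + half_width


def coverage(theta, z=1.96, x_max=400):
    """Smallest and largest x whose interval contains theta, and P(that range).

    x_max is an assumed search limit, far above any theta used here.
    """
    covered = [x for x in range(x_max + 1) if ci_covers(x, theta, z)]
    prob = sum(poisson_pmf(x, theta) for x in covered)
    return min(covered), max(covered), prob


# The five probabilities for X ~ Poisson(10) and x = 8.
p_eq = poisson_pmf(8, 10)
p_lt = poisson_cdf(7, 10)
p_le = poisson_cdf(8, 10)
p_gt = 1.0 - p_le
p_ge = 1.0 - p_lt
print(p_eq, p_lt, p_le, p_gt, p_ge)

# Normal approximation (with continuity correction) versus the exact tail.
approx_tail = normal_upper_area(29.5, 25, sqrt(25))
exact_tail = 1.0 - poisson_cdf(29, 25)
print(approx_tail, exact_tail)

# Probability that the approximate 95% CI is correct when theta = 40.
low, high, prob = coverage(40)
print(low, high, prob)
```

The probability function is evaluated on the log scale (via `lgamma`) only to avoid overflow for large x; it is algebraically the same as Formula 4.1. Calling `coverage` with other values of θ gives a way to check the remaining entries of the table above.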
4.2 Poisson Approximation to the Binomial

Earlier I promised that I would provide some motivation for studying the Poisson distribution.

We have seen that for the binomial, if n is moderately large and p is not too close to 0 (remember, we don’t worry about p being close to 1) then a normal curve gives good approximations to binomial probabilities. In this section we will see that if p is close to 0 and n is large, the Poisson can be used to approximate the binomial. Thus, the Poisson provides an approximate method in one of the situations in which the normal curve approximation is poor.

I will show you the derivation of this fact below. If you have not studied calculus and limits, you might find this derivation too difficult to follow. This proof will not be on any exam in this course.

Remember, if X ∼ Bin(n, p), then for a fixed value of x,

$$P(X = x) = \frac{n!}{x!\,(n - x)!}\, p^{x} q^{n - x}.$$

Now, replace p in this formula by θ/n. In my ‘limit’ argument below, as n grows, θ will remain fixed, which means that p = θ/n will become smaller. We get:

$$P(X = x) = \frac{n!}{x!\,(n - x)!}\, (\theta/n)^{x} (1 - \theta/n)^{n - x} = \frac{\theta^{x}}{x!} \left[ \frac{n!}{(n - x)!\, n^{x}\, (1 - \theta/n)^{x}} \right] (1 - \theta/n)^{n}.$$

Now the term in the square brackets,

$$\frac{n!}{(n - x)!\, n^{x}\, (1 - \theta/n)^{x}},$$

for x fixed, converges (i.e. gets closer and closer) to 1 as n → ∞; thus, it can be ignored for large n. As shown in calculus, as n → ∞, $(1 - \theta/n)^{n}$ converges to $e^{-\theta}$. The result follows.

In the old days this result was very useful. For very large n and small p and computations performed by hand, the Poisson might be preferred to working with the binomial. Nowadays, as we will see below, this result is important mostly because it gives us greater insight into problems.

Next, we will consider estimation. Suppose that we have n = 10,000 Bernoulli trials (BT) and there are x = 10 successes observed. The website for the exact binomial confidence interval gives [0.0005, 0.0018] for the 95% two-sided confidence interval for p.
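Before turning to the Poisson version of this interval, the limit argument above can be checked numerically. The sketch below is only an illustration under assumed values: it fixes θ = 10 (chosen here for convenience), sets p = θ/n for n = 100, 1000 and 10,000, and records the largest difference between the Bin(n, p) and Poisson(θ) probabilities over x = 0, 1, ..., 40. The helper names are hypothetical.

```python
from math import comb, exp, lgamma, log


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta), Formula 4.1, computed on the log scale."""
    return exp(-theta + x * log(theta) - lgamma(x + 1))


theta = 10.0  # assumed value, held fixed while n grows
gaps = {}
for n in (100, 1000, 10000):
    p = theta / n  # p shrinks as n grows, as in the derivation
    gaps[n] = max(
        abs(binom_pmf(x, n, p) - poisson_pmf(x, theta)) for x in range(41)
    )
    print(n, p, gaps[n])
```

The largest gap should shrink as n grows with θ held fixed, which is what the convergence of the bracketed term to 1 and of $(1 - \theta/n)^{n}$ to $e^{-\theta}$ predicts.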
Alternatively, we can approximate the distribution of X by the Poisson with parameter θ = 10000p. Using the observed x = 10, the exact 95% two-sided confidence interval for θ is [4.7954, 18.3904]. The CI is an assertion that the following inequality is true:

4.7954 ≤ θ ≤ 18.3904.

Now we substitute θ = 10000p and this becomes

4.7954 ≤ 10000p ≤ 18.3904.

Dividing through by 10000 and rounding to four decimal places, we get the following CI for p:

0.0005 ≤ p ≤ 0.0018,

the same answer we had when we used the binomial distribution.

Now, I would understand if you are thinking, “Why should we learn to do the confidence interval for p two ways?” Fair enough; but computers ideally do more than just give us answers to specific questions; they let us learn about patterns in answers.

For example, suppose X ∼ Poisson(θ) and we observe X = 0. From the website, the 95% one-sided confidence interval for θ is [0, 2.9957]. Why is this interesting? Well, I have said that we don’t care about cases where p = 0. But sometimes we might hope for p = 0. Borrowing from the movie, Armageddon, let every day be a trial and the day is a ‘success’ if the Earth is hit by an asteroid/meteor that destroys all human life. Obviously, throughout human habitation of this planet there have been no successes. Given 0 successes in n trials, the above answer indicates that we are 95% confident that p ≤ 2.9957/n. Just don’t ask me exactly what n equals. Or how I know that the trials are i.i.d.

4.3 The Poisson Process

The binomial distribution is appropriate for counting successes in n i.i.d. trials. For p small and n large, the binomial can be well approximated by the Poisson. Thus, it is not too surprising to learn that the Poisson is also a model for counting successes.

Consider a process evolving in time in which at ‘random times’ successes occur. What does this possibly mean? Perhaps the following picture will help.
[Figure: a number line running from time 0 to time 6, with eight successes marked by the letter ‘O’ at scattered times along it.]

In this picture, observation begins at time t = 0 and time passing is denoted by moving to the right on the number line. At various times successes will occur, with each success denoted by the letter ‘O’ placed on the number line. Here are some examples of such processes.

1. A ‘target’ is placed near radioactive material and whenever a radioactive particle hits the target we have a success.
2. A road intersection is observed. A success is the occurrence of an accident.
3. A hockey (soccer) game is watched. A success occurs whenever a goal is scored.
4. On a remote stretch of highway, a success occurs when a vehicle passes.

The idea is that the times of occurrences of successes cannot be predicted with certainty. We would like, however, to be able to calculate probabilities. To do this, we need a mathematical model, much like our mathematical model for BT.

Our model is called the Poisson Process. A careful mathematical presentation and derivation is beyond the goals of this course. Here are the basic ideas:

1. The numbers of successes in disjoint intervals are independent of each other. For example, in a Poisson Process, the number of successes in the interval [0, 3] is independent of the number of successes in the interval [5, 6].
2. The probability distribution of the number of successes counted in any time interval depends only on the length of the interval. For example, the probability of getting exactly five successes is the same for interval [0, 2.5] as it is for interval [3.5, 6.0].
3. Successes cannot be simultaneous.

With these assumptions, it turns out that the probability distribution of the number of successes in any interval of time is the Poisson distribution with parameter θ, where θ = λ × w, where w > 0 is the length of the interval and λ > 0 is a feature of the process, often called its rate.

I have presented the Poisson Process as occurring in one dimension: time.
It also can be applied if the one dimension is, say, distance. For example, a researcher could be walking along a path and occasionally finds successes. Also, the Poisson Process can be extended to two or three dimensions. For example, in two dimensions a researcher could be searching a field for a certain plant or animal that is deemed a success. In three dimensions a researcher could be searching a volume of air, water or dirt looking for something of interest. The modification needed for two or three dimensions is quite simple: the process still has a rate, again called λ, and now the number of successes in an area or volume has a Poisson distribution with θ equal to the rate multiplied by the area or volume, whichever is appropriate.

4.4 Independent Poissons

Earlier we learned that if $X_1, X_2, \ldots, X_n$ are i.i.d. dichotomous outcomes (success or failure), then we can calculate probabilities for their sum X:

$$X = X_1 + X_2 + \cdots + X_n.$$

Probabilities for X are given by the binomial distribution. There is a similar result for the Poisson, but the conditions are actually weaker. The interested reader can think about how the following fact is implied by the Poisson Process.

Suppose that for i = 1, 2, 3, ..., n, the random variable $X_i$ ∼ Poisson($\theta_i$) and that the $X_i$’s are independent. (If all of the $\theta_i$’s are the same, then we have i.i.d. The point is that we don’t need the i.d., just the independence.) Define

$$\theta_{+} = \sum_{i=1}^{n} \theta_i.$$

The result is that the sum X ∼ Poisson($\theta_{+}$).

Because of this result we will often (as I have done above), but not always, pretend that we have one Poisson random variable, even if, in reality, we have a sum of independent Poisson random variables. I will illustrate what I mean with an estimation example.

Suppose that Cathy observes 10 i.i.d. Poisson random variables, each with parameter θ. She summarizes the ten values she obtains by computing their total, X, remembering that X ∼ Poisson(10θ).
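The additivity result that Cathy relies on can be checked numerically in the simplest case of two independent Poissons. The sketch below is an illustration only: the parameter values 1.5 and 2.5 and the helper names are assumptions made here. It computes P(X1 + X2 = x) by summing over every way of splitting x between the two variables and compares the answer with Formula 4.1 evaluated at θ+ = 1.5 + 2.5 = 4.

```python
from math import exp, lgamma, log


def poisson_pmf(x, theta):
    """P(X = x) for X ~ Poisson(theta), Formula 4.1, computed on the log scale."""
    return exp(-theta + x * log(theta) - lgamma(x + 1))


def pmf_of_sum(x, theta1, theta2):
    """P(X1 + X2 = x) for independent X1 ~ Poisson(theta1), X2 ~ Poisson(theta2).

    Independence lets us multiply the two probabilities for each split
    (X1 = k, X2 = x - k) and add over k = 0, ..., x.
    """
    return sum(
        poisson_pmf(k, theta1) * poisson_pmf(x - k, theta2) for k in range(x + 1)
    )


theta1, theta2 = 1.5, 2.5  # assumed values for illustration
theta_plus = theta1 + theta2

max_gap = max(
    abs(pmf_of_sum(x, theta1, theta2) - poisson_pmf(x, theta_plus))
    for x in range(30)
)
print(max_gap)  # differs from 0 only by floating-point rounding
```

Repeating the argument one variable at a time extends the check to any number of independent Poissons, including Cathy’s ten.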
Cathy can then calculate a CI for 10θ and convert it to a CI for θ.

For example, suppose that Cathy observes a total of 92 when she totals her 10 values. Because 92 is so large, I will use the formula for the approximate two-sided 95% CI for 10θ. It is:

$$92 \pm 1.96\sqrt{92} = 92 \pm 18.800 = [73.200, 110.800].$$

Dividing both endpoints by 10, the two-sided 95% CI for θ is [7.320, 11.080]. By the way, the exact CI for 10θ is [74.165, 112.83]. This is typically what happens; the exact CI for a Poisson is shifted to the right of the approximate CI.

4.5 *Why Bother with the Poisson? (Optional)

Suppose that we plan to observe an i.i.d. sequence of random variables and that each random variable has possible values 0, 1, 2, .... (This scenario frequently occurs in science.) In this chapter I have suggested that we assume that each random variable has a Poisson distribution. But why? What do we gain? Why not just do the following? Define

$$p_0 = P(X = 0),\quad p_1 = P(X = 1),\quad p_2 = P(X = 2),\ \ldots,$$

where there is now a sequence of probabilities known only to nature. As a researcher we can try to estimate this sequence.

This question is an example of a, if not the, fundamental question a researcher always considers: How much math structure should we impose on a problem? Certainly, the Poisson leads to values for $p_0, p_1, p_2, \ldots$. The difference is that with the Poisson we impose a structure on these probabilities, whereas in the ‘general case’ we do not impose a structure.

As with many things in human experience, some people are too extreme on this issue. Some people put too much faith in the Poisson (or other assumed structures) and cling to it even when the data make its continued assumption ridiculous; others claim the moral high ground and proclaim: “I don’t make unnecessary assumptions.”

I cannot give you any rules for how to behave; instead, I will give you an extended example of how answers change when we change assumptions.

Let us consider a Poisson Process in two dimensions.
For concreteness, imagine you are in a field searching for a plant/insect that you don’t particularly like; i.e. you will be happiest if there are none. Thus, you might want to know the numerical value of P(X = 0). Of course, P(X = 0) is what we call $p_0$ and for the Poisson it is equal to $e^{-\theta}$.

Suppose it is true (i.e. this is what Nature knows) that X ∼ Poisson(0.6931), which makes

$$P(X = 0) = e^{-0.6931} = 0.500.$$

Suppose further that we have two researchers:
• Researcher A assumes Poisson with unknown θ.
• Researcher B assumes no parametric structure; i.e. B wants to know $p_0$.

Note that both researchers want to get an estimate of 0.500 for P(X = 0).

Suppose that the two researchers observe the same data, namely n = 10 trials. Who will do better? Well, we answer this question by simulating the data. I used my computer to simulate n = 10 i.i.d. trials from the Poisson(0.6931) and obtained the following data:

1, 0, 1, 0, 3, 1, 2, 0, 0, 4.

Researcher B counts four occurrences of ‘0’ in the sample and estimates P(X = 0) to be 4/10 = 0.4. Researcher A estimates θ by the mean of the 10 numbers, 12/10 = 1.2, and then estimates P(X = 0) by $e^{-1.2} = 0.3012$. In this one simulated data set, each researcher’s estimate is too low and Researcher B does better than A.

One data set, however, is not conclusive. So, I simulated 999 more data sets of size n = 10 to obtain a total of 1000 simulated data sets. In this simulation, sometimes A did better, sometimes B did better. Statisticians try to decide which does better overall.

First, we look at how each researcher did on average. If you average the 1,000 estimates for A you get 0.5226 and for B you get 0.5066. Surprisingly, B, who makes fewer assumptions, is, on average, closer to the truth. When we find a result in a simulation study that seems surprising we should wonder whether it is a false alarm caused by the approximate nature of simulation answers.
While I cannot explain why at this point, I will simply say that this is not a false alarm. A consequence of assuming Poisson is that, especially for small values of n, there can be some bias in the mean value of an estimate. By contrast, the fact that the mean of the estimates by B exceeds 0.5 is not meaningful; i.e. B’s method does not possess bias.

I will still conclude that A is better than B, despite the bias; I will now describe the basis for this conclusion.

From the point of view of Nature, who knows the truth, every estimate value has an error: e = estimate minus truth. In this simulation the error e is the estimate minus 0.5. Now errors can be positive or negative. Also, trying to make sense of 1000 errors is too difficult; we need a way to summarize them. Statisticians advocate averaging the errors after making sure that the negatives and positives don’t cancel. We have two preferred ways of doing this:
• Convert each error to an absolute error by taking its absolute value.
• Convert each error to a squared error by squaring it.

For my simulation study, the mean absolute error is 0.1064 for A and 0.1240 for B. Because there is a minimum theoretical value of 0 for the mean absolute error, it makes sense to summarize this difference by saying that the mean absolute error for A is 14.2% smaller than it is for B. This 14.2% is my preferred measure and why I conclude that A is better than B.

As we will see, statisticians like to square errors, although justifying this in an intuitive way is a bit difficult. I will just remark that for this simulation study, the mean squared error for A is 0.01853 and for B it is 0.02574. (Because all of the absolute errors are 0.5 or smaller, squaring the errors makes them smaller.)

To revisit the issue of bias, I repeated the above simulation study, but now with n = 100. The mean of the estimates for A is 0.5006 and for B is 0.5007. These discrepancies from 0.5 are not meaningful; i.e. there is no bias.
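A simulation study of this kind can be sketched in a few lines of Python. This is not the program used for the results reported above: the sampler (Knuth's multiplication method), the seed, and all function names are assumptions made here, so the numbers it prints will differ somewhat from the ones quoted, as they would for any fresh set of 1000 simulated data sets.

```python
import random
from math import exp


def poisson_draw(theta, rng):
    """One Poisson(theta) value via Knuth's method: multiply uniforms
    until the running product drops to exp(-theta) or below."""
    limit = exp(-theta)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def estimate_a(data):
    """Researcher A: assume Poisson, estimate theta by the sample mean."""
    return exp(-sum(data) / len(data))


def estimate_b(data):
    """Researcher B: no structure, use the sample proportion of zeros."""
    return data.count(0) / len(data)


def summarize(estimates, truth):
    """Mean estimate, mean absolute error and mean squared error."""
    errors = [est - truth for est in estimates]
    size = len(estimates)
    return {
        "mean": sum(estimates) / size,
        "mae": sum(abs(e) for e in errors) / size,
        "mse": sum(e * e for e in errors) / size,
    }


def simulate(n_sets=1000, n=10, theta=0.6931, seed=371):
    """Run both researchers on the same simulated data sets (seed is arbitrary)."""
    rng = random.Random(seed)
    truth = exp(-theta)  # approximately 0.5
    est_a, est_b = [], []
    for _ in range(n_sets):
        data = [poisson_draw(theta, rng) for _ in range(n)]
        est_a.append(estimate_a(data))
        est_b.append(estimate_b(data))
    return summarize(est_a, truth), summarize(est_b, truth)


a, b = simulate()
print("A:", a)
print("B:", b)
```

Applied to the single data set 1, 0, 1, 0, 3, 1, 2, 0, 0, 4, `estimate_a` and `estimate_b` reproduce the two estimates computed by hand earlier in this section. Calling `simulate(n=100)` corresponds to the larger study used to revisit the issue of bias.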