Chapter 4
The Poisson Distribution
4.1 The Fish Distribution?
The Poisson distribution is named for Simeon-Denis Poisson (1781–1840). Coincidentally, poisson is
French for ‘fish.’
In this chapter we will study a family of probability distributions for a countably inﬁnite sample
space, each member of which is called a Poisson Distribution. Recall that a binomial distribution
is characterized by the values of two parameters: n and p. A Poisson distribution is simpler in that
it has only one parameter, which we denote by θ, pronounced theta. (Many books and websites
use λ, pronounced lambda, instead of θ. We save λ for a related purpose.) The parameter θ must
positive: θ > 0. Below is the formula for computing probabilities for the Poisson:

P(X = x) = e^{−θ} θ^x / x!, for x = 0, 1, 2, 3, . . . .    (4.1)

In this equation, e is the famous number from calculus,
e = lim_{n→∞} (1 + 1/n)^n = 2.71828 . . . .

You might recall from the study of infinite series in calculus that
∑_{x=0}^{∞} b^x/x! = e^b,

for any real number b. Thus,
∑_{x=0}^{∞} P(X = x) = e^{−θ} ∑_{x=0}^{∞} θ^x/x! = e^{−θ} e^θ = 1.

Thus, we see that Formula 4.1 is a mathematically valid way to assign probabilities to the nonnegative integers.
The mean of the Poisson is its parameter θ; i.e., µ = θ. This can be proven using calculus, and a
similar argument shows that the variance of a Poisson is also equal to θ; i.e., σ² = θ and σ = √θ.
When I write X ∼ Poisson(θ), I mean that X is a random variable with its probability distribution given by the Poisson with parameter value θ.
I ask you for patience. I am going to delay my explanation of why the Poisson distribution is
important in science.
Poisson probabilities can be computed by hand with a scientiﬁc calculator. Alternatively, the
following website, which is linked to our course webpage, can be used:
I will give an example to illustrate the use of this site.
Let X ∼ Poisson(θ). The website calculates ﬁve probabilities for you:
P (X = x); P (X < x); P (X ≤ x); P (X > x); and P (X ≥ x).
You must give as input your value of θ and a value of x. Suppose that I have X ∼ Poisson(10)
and I am interested in P (X = 8). I go to the site and type ‘8’ in the box labeled ‘Poisson random
variable,’ and I type ‘10’ in the box labeled ‘Average rate of success.’ I click on the ‘Calculate’
box and the site gives me the following answers:
P (X = 8) = 0.1126; P (X < 8) = 0.2202; P (X ≤ 8) = 0.3328; P (X > 8) = 0.6672;
and P (X ≥ 8) = 0.7798.
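The website's five answers can also be reproduced with a few lines of Python, using nothing beyond Formula 4.1 (the helper functions below are mine, purely for illustration):

```python
import math

def pmf(x, theta):
    # P(X = x) from Formula 4.1
    return math.exp(-theta) * theta**x / math.factorial(x)

def cdf(x, theta):
    # P(X <= x): sum the pmf over 0, 1, ..., x
    return sum(pmf(k, theta) for k in range(x + 1))

theta, x = 10, 8
print(f"P(X = 8)  = {pmf(x, theta):.4f}")          # 0.1126
print(f"P(X < 8)  = {cdf(x - 1, theta):.4f}")      # 0.2202
print(f"P(X <= 8) = {cdf(x, theta):.4f}")          # 0.3328
print(f"P(X > 8)  = {1 - cdf(x, theta):.4f}")      # 0.6672
print(f"P(X >= 8) = {1 - cdf(x - 1, theta):.4f}")  # 0.7798
```

Note how the redundancy mentioned above shows up in the code: the last two lines are just complements of the second and third.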
(There is, of course, a great deal of redundancy in these five answers because two pairs of events
are complements of each other.)
It can be shown that for the Poisson, if θ ≤ 5 then its probability histogram is markedly
asymmetrical, but if θ ≥ 25 its probability histogram is approximately symmetric and bell-shaped.
This last statement suggests that we might use a normal curve to compute approximate probabilities
for the Poisson, provided θ is large.
For example, suppose that X ∼ Poisson(25) and I want to calculate P (X ≥ 30). We will use
a modiﬁcation of the method we learned for the binomial.
First, we note that µ = 25 and σ = √25 = 5. Thus, our approximating curve will be the normal
curve with these values for its mean and standard deviation. Using the continuity correction, we
replace P (X ≥ 30) with P (X ≥ 29.5). Next, going to the normal curve website, we ﬁnd that
the area above (to the right of) 29.5 is 0.1841. From the Poisson website, I find that the exact
probability is 0.1821.
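Both numbers are easy to reproduce in, say, Python (a sketch of mine, for illustration; the normal tail is computed with the complementary error function):

```python
import math

def normal_sf(z):
    # P(Z > z) for a standard normal Z, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 25.0, math.sqrt(25.0)  # mean theta and sd sqrt(theta)
z = (29.5 - mu) / sigma            # continuity correction: 30 becomes 29.5
approx = normal_sf(z)

# exact P(X >= 30) = 1 - P(X <= 29) for X ~ Poisson(25)
exact = 1 - sum(math.exp(-25) * 25**k / math.factorial(k) for k in range(30))

print(round(approx, 4), round(exact, 4))  # 0.1841 0.1821
```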
To summarize: To approximate P (X ≥ x) for X ∼ Poisson(θ),
• Use the normal curve with mean equal to θ and standard deviation equal to √θ.
• Find the area under the normal curve above (to the right of) (x − 0.5).
If θ is unknown we can use the value of X to estimate it. The point estimate is x and, following
the presentation for the binomial, we can use the snc to obtain an approximate conﬁdence interval
for θ. The result is:
x ± z√x.
Here is an example of its use.
Ralph assumes that X has a Poisson distribution, but does not know the value of θ. He observes
x = 30. His point estimate of the mean is 30 and his 95% conﬁdence interval is
30 ± 1.96√30 = 30 ± 10.7 = [19.3, 40.7].
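Ralph's arithmetic can be scripted directly from the formula x ± z√x (a minimal sketch, for illustration):

```python
import math

def approx_poisson_ci(x, z=1.96):
    # approximate two-sided CI for theta: x +/- z * sqrt(x)
    half = z * math.sqrt(x)
    return x - half, x + half

lo, hi = approx_poisson_ci(30)
print(round(lo, 1), round(hi, 1))  # 19.3 40.7
```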
We will now investigate the accuracy of the snc approximation. Suppose that, in fact, θ = 40.
The 95% conﬁdence interval will be correct if, and only if,
X − 1.96√X ≤ 40 ≤ X + 1.96√X.
After algebra, this becomes (30 ≤ X ≤ 54). The probability of this event, from the website, is
0.9428, which is pretty close to the desired 0.9500.
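This coverage calculation can be replicated without the website. The Python sketch below (mine, for illustration; the pmf is computed on the log scale to avoid overflow for large x) finds the set of x for which the interval covers θ = 40 and then sums their probabilities:

```python
import math

def pmf(k, theta):
    # e^(-theta) * theta^k / k!, evaluated on the log scale for stability
    return math.exp(-theta + k * math.log(theta) - math.lgamma(k + 1))

theta = 40.0
# the CI covers 40 exactly when x - 1.96*sqrt(x) <= 40 <= x + 1.96*sqrt(x)
covered = [k for k in range(150)
           if k - 1.96 * math.sqrt(k) <= theta <= k + 1.96 * math.sqrt(k)]
print(min(covered), max(covered))  # 30 54
coverage = sum(pmf(k, theta) for k in covered)
print(round(coverage, 4))          # 0.9428
```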
I calculated the exact probability that the approximate CI is correct for several values of θ; my
results are below.
θ                                 …       30      40      …       …
Exact Prob. of Correct Interval   0.9308  0.9368  0.9428  0.9487  0.9450

In my opinion, the approximate CI works adequately for θ ≥ 40. If you believe that θ might be
smaller than 40 (and evidence of this would be if X was smaller than 40), then you might want to
use an exact method, as I illustrated in Chapter 3 for the binomial. In fact, the website that gives
us exact CI’s for the binomial also gives exact CI’s for the Poisson.
Bart assumes that X ∼ Poisson(θ) but does not know the value of θ. He observes X = 3 and
wants to obtain:
• The two-sided 95% CI for θ; and
• The upper one-sided 95% CI for θ.
I will use the website to find Bart’s CI’s. I type ‘3’ (the value of X) into the ‘Observed Events:’
box and click on compute. (I don’t need to specify the conﬁdence level because the 95% two-sided
CI is the default for this site.) I get [0.6187, 8.7673] as the exact two-sided 95% CI for θ.
For the one-sided CI, I scroll down and type ‘5’ in the ‘upper tail’ box and ‘0’ in the ‘lower tail’
box. Then I scroll up and hit compute. I get the CI: [0.0008, 7.7537]. This is clearly a computer
error (round-off error), because the lower bound must be 0. So, the answer is that 7.7537 is the
95% upper bound for θ.

4.2 Poisson Approximation to the Binomial
Earlier I promised that I would provide some motivation for studying the Poisson distribution.
We have seen that for the binomial, if n is moderately large and p is not too close to 0 (remember, we don’t worry about p being close to 1), then a normal curve gives good approximations to
binomial probabilities. In this section we will see that if p is close to 0 and n is large, the Poisson
can be used to approximate the binomial. Thus, the Poisson provides an approximate method in
one of the situations in which the normal curve approximation is poor.
I will show you the derivation of this fact below. If you have not studied calculus and limits,
you might ﬁnd this derivation too difﬁcult to follow. This proof will not be on any exam in this
course. Remember, if X ∼ Bin(n, p), then for a fixed value of x,

P(X = x) = [n! / (x!(n − x)!)] p^x q^{n−x}.

Now, replace p in this formula by θ/n. In my ‘limit’ argument below, as n grows, θ will remain
fixed, which means that p = θ/n will become smaller. We get:

P(X = x) = [n! / (x!(n − x)!)] (θ/n)^x (1 − θ/n)^{n−x} = (θ^x / x!) [n! / ((n − x)! n^x (1 − θ/n)^x)] (1 − θ/n)^n.

Now the term in the square brackets,

n! / ((n − x)! n^x (1 − θ/n)^x),

for x fixed, converges (i.e., gets closer and closer) to 1 as n → ∞; thus, it can be ignored for large n.
As shown in calculus, as n → ∞, (1 − θ/n)^n converges to e^{−θ}. The result follows.
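Here is a numerical illustration of this limit (a Python sketch of mine; the values n = 10,000 and θ = 10 are arbitrary): with p = θ/n = 0.001, the binomial and Poisson probabilities agree to several decimal places.

```python
import math

def poisson_pmf(x, theta):
    # Poisson probability e^(-theta) * theta^x / x!
    return math.exp(-theta) * theta**x / math.factorial(x)

def binom_pmf(x, n, p):
    # binomial probability C(n, x) * p^x * (1-p)^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, theta = 10000, 10.0
p = theta / n  # 0.001: large n, small p
for x in (0, 5, 10, 15):
    print(x, round(binom_pmf(x, n, p), 6), round(poisson_pmf(x, theta), 6))
```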
In the old days this result was very useful. For very large n and small p and computations
performed by hand, the Poisson might be preferred to working with the binomial. Nowadays, as
we will see below, this result is important mostly because it gives us greater insight into problems.
Next, we will consider estimation. Suppose that we have n = 10,000 BT and there are x = 10
successes observed. The website for the exact binomial conﬁdence interval gives [0.0005, 0.0018]
for the 95% two-sided conﬁdence interval for p. Alternatively, we can approximate the distribution
of X by the Poisson with parameter θ = 10000p. Using the observed x = 10, the exact 95%
two-sided conﬁdence interval for θ is [4.7954, 18.3904]. The CI is an assertion that the following
inequality is true:
4.7954 ≤ θ ≤ 18.3904.
Now we substitute θ = 10000p and this becomes
4.7954 ≤ 10000p ≤ 18.3904.
Dividing thru by 10000, we get the following CI for p:
0.0005 ≤ p ≤ 0.0018,
the same answer we had when we used the binomial distribution.
Now, I would understand if you are thinking, “Why should we learn to do the conﬁdence
interval for p two ways?” Fair enough; but computers ideally do more than just give us answers to
speciﬁc questions; they let us learn about patterns in answers.
For example, suppose X ∼ Poisson(θ) and we observe X = 0. From the website, the 95%
one-sided conﬁdence interval for θ is [0, 2.9957]. Why is this interesting?
Well, I have said that we don’t care about cases where p = 0. But sometimes we might hope for
p = 0. Borrowing from the movie Armageddon, let every day be a trial and the day is a ‘success’
if the Earth is hit by an asteroid/meteor that destroys all human life. Obviously, throughout human
habitation of this planet there have been no successes. Given 0 successes in n trials, the above
answer indicates that we are 95% conﬁdent that p ≤ 2.9957/n. Just don’t ask me exactly what n
equals. Or how I know that the trials are i.i.d.

4.3 The Poisson Process
The binomial distribution is appropriate for counting successes in n i.i.d. trials. For p small and n
large, the binomial can be well approximated by the Poisson. Thus, it is not too surprising to learn
that the Poisson is also a model for counting successes.
Consider a process evolving in time in which at ‘random times’ successes occur. What does
this possibly mean? Perhaps the following picture will help.

        O   O      O
    ────┴───┴──────┴─────▶
    0   3   4      6

In this picture, observation begins at time t = 0 and the passing of time is denoted by moving to the
right on the number line. At various times successes will occur, with each success denoted by the
letter ‘O’ placed on the number line. Here are some examples of such processes.
1. A ‘target’ is placed near radioactive material and whenever a radioactive particle hits the
target we have a success.
2. A road intersection is observed. A success is the occurrence of an accident.
3. A hockey (soccer) game is watched. A success occurs whenever a goal is scored.
4. On a remote stretch of highway, a success occurs when a vehicle passes.
The idea is that the times of occurrences of successes cannot be predicted with certainty. We
would like, however, to be able to calculate probabilities. To do this, we need a mathematical
model, much like our mathematical model for BT.
Our model is called the Poisson Process. A careful mathematical presentation and derivation
is beyond the goals of this course. Here are the basic ideas:
1. The numbers of successes in disjoint intervals are independent of each other.
For example, in a Poisson Process, the number of successes in the interval [0, 3] is independent of the number of successes in the interval [5, 6].
2. The probability distribution of the number of successes counted in any time interval depends
only on the length of the interval.
For example, the probability of getting exactly ﬁve successes is the same for interval [0, 2.5]
as it is for interval [3.5, 6.0].
3. Successes cannot be simultaneous.
With these assumptions, it turns out that the probability distribution of the number of successes
in any interval of time is the Poisson distribution with parameter θ = λ × w, where w > 0
is the length of the interval and λ > 0 is a feature of the process, often called its rate.
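These ideas can be checked by simulation. The Python sketch below (mine; the rate λ = 1.5 and window w = 4 are arbitrary choices) generates successes with independent exponential gaps between them, which is the standard way to construct a Poisson Process, and verifies that the count in [0, w] averages about λ × w:

```python
import random

random.seed(1)

def count_in_window(rate, w):
    # simulate one realization of the process on [0, w] using
    # independent exponential inter-arrival times; return the count
    t, count = 0.0, 0
    while True:
        t += random.expovariate(rate)
        if t > w:
            return count
        count += 1

lam, w = 1.5, 4.0
counts = [count_in_window(lam, w) for _ in range(20000)]
mean = sum(counts) / len(counts)
print(round(mean, 2))  # close to theta = lam * w = 6.0
```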
I have presented the Poisson Process as occurring in one dimension: time. It also can be
applied if the one dimension is, say, distance. For example, a researcher could be walking along a
path, occasionally finding successes. Also, the Poisson Process can be extended to two or three
dimensions. For example, in two dimensions a researcher could be searching a field for a certain
plant or animal that is deemed a success. In three dimensions a researcher could be searching a
volume of air, water, or dirt looking for something of interest.
The modiﬁcation needed for two or three dimensions is quite simple: the process still has a rate,
again called λ, and now the number of successes in an area or volume has a Poisson distribution
with θ equal to the rate multiplied by the area or volume, whichever is appropriate.

4.4 Independent Poissons
Earlier we learned that if X1 , X2 , . . . , Xn are i.i.d. dichotomous outcomes (success or failure), then
we can calculate probabilities for the sum of these guys, X:

X = X1 + X2 + · · · + Xn.
Probabilities for X are given by the binomial distribution. There is a similar result for the Poisson,
but the conditions are actually weaker. The interested reader can think about how the following
fact is implied by the Poisson Process.
Suppose that for i = 1, 2, 3, . . . , n, the random variable Xi ∼ Poisson(θi) and that the sequence
of Xi’s are independent. (If all of the θi’s are the same, then we have i.i.d. The point is that we don’t
need the i.d., just the independence.) Define θ+ = ∑_{i=1}^{n} θi. The result is that X ∼ Poisson(θ+).
Because of this result we will often (as I have done above), but not always, pretend that we have
one Poisson random variable, even if, in reality, we have a sum of independent Poisson random
variables. I will illustrate what I mean with an estimation example.
Suppose that Cathy observes 10 i.i.d. Poisson random variables, each with parameter θ. She
summarizes the ten values she obtains by computing their total, X, remembering that X ∼
Poisson(10θ). Cathy can then calculate a CI for 10θ and convert it to a CI for θ.
For example, suppose that Cathy observes a total of 92 when she totals her 10 values. Because
92 is so large, I will use the formula for the approximate two-sided 95% CI for 10θ. It is:

92 ± 1.96√92 = 92 ± 18.800 = [73.200, 110.800].
Thus, the two-sided 95% CI for θ is [7.320, 11.080]. By the way, the exact CI for 10θ is [74.165, 112.83].
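Cathy's computation is easy to script (a minimal Python sketch, for illustration):

```python
import math

total, z = 92, 1.96  # Cathy's total of ten i.i.d. Poisson(theta) values
half = z * math.sqrt(total)
lo, hi = total - half, total + half          # approximate 95% CI for 10*theta
print(round(lo, 3), round(hi, 3))            # 73.2 110.8
print(round(lo / 10, 3), round(hi / 10, 3))  # 7.32 11.08  (CI for theta)
```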
This is typically what happens; the exact CI for a Poisson is shifted to the right of the approximate
CI.

4.5 *Why Bother with the Poisson? (Optional)
Suppose that we plan to observe an i.i.d. sequence of random variables and that each random
variable has possible values 0, 1, 2, . . . . (This scenario frequently occurs in science.) In this
chapter I have suggested that we assume that each random variable has a Poisson distribution. But
why? What do we gain? Why not just do the following? Deﬁne
p0 = P (X = 0), p1 = P (X = 1), p2 = P (X = 2), . . . ,
where there is now a sequence of probabilities known only to nature. As a researcher we can try to
estimate this sequence.
This question is an example of a, if not the, fundamental question a researcher always considers: how much math structure should we impose on a problem? Certainly, the Poisson leads to
values for p0, p1, p2, . . . . The difference is that with the Poisson we impose a structure on these
probabilities, whereas in the ‘general case’ we do not impose a structure.
As with many things in human experience, some people are too extreme on this issue. Some
people put too much faith in the Poisson (or other assumed structures) and cling to it even when the
data make its continued assumption ridiculous; others claim the moral high ground and proclaim:
“I don’t make unnecessary assumptions.” I cannot give you any rules for how to behave; instead, I
will give you an extended example of how answers change when we change assumptions.
Let us consider a Poisson Process in two dimensions. For concreteness, imagine you are in a
ﬁeld searching for a plant/insect that you don’t particularly like; i.e. you will be happiest if there
are none. Thus, you might want to know the numerical value of P(X = 0). Of course, P(X = 0)
is what we call p0 and for the Poisson it is equal to e^{−θ}.
Suppose it is true (i.e., this is what Nature knows) that X ∼ Poisson(0.6931), which makes
P(X = 0) = e^{−0.6931} = 0.500.
Suppose further that we have two researchers:
• Researcher A assumes Poisson with unknown θ.
• Researcher B assumes no parametric structure; i.e. B wants to know p0 .
Note that both researchers want to get an estimate of 0.500 for P(X = 0).
Suppose that the two researchers observe the same data, namely n = 10 trials. Who will do
better? Well, we answer this question by simulating the data. I used my computer to simulate
n = 10 i.i.d. trials from the Poisson(0.6931) and obtained the following data:
1, 0, 1, 0, 3, 1, 2, 0, 0, 4.
Researcher B counts four occurrences of ‘0’ in the sample and estimates P (X = 0) to be 4/10 =
0.4. Researcher A estimates θ by the mean of the 10 numbers: 12/10 = 1.2 and then estimates
P (X = 0) by e−1.2 = 0.3012. In this one simulated data set, each researcher’s estimate is too low
and Researcher B does better than A.
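The two estimates are easy to verify (a Python sketch of mine, for illustration):

```python
import math

data = [1, 0, 1, 0, 3, 1, 2, 0, 0, 4]

# Researcher B: no structure; estimate p0 by the proportion of zeros
est_B = data.count(0) / len(data)

# Researcher A: assume Poisson; estimate theta by the sample mean,
# then estimate p0 = P(X = 0) by e^(-theta_hat)
theta_hat = sum(data) / len(data)
est_A = math.exp(-theta_hat)

print(est_B)            # 0.4
print(round(est_A, 4))  # 0.3012
```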
One data set, however, is not conclusive. So, I simulated 999 more data sets of size n = 10 to
obtain a total of 1000 simulated data sets. In this simulation, sometimes A did better, sometimes B
did better. Statisticians try to decide which does better overall.
First, we look at how each researcher did on average. If you average the 1,000 estimates for
A you get 0.5226, and for B you get 0.5066. Surprisingly, B, who makes fewer assumptions, is,
on average, closer to the truth. When we find a result in a simulation study that seems surprising,
we should wonder whether it is a false alarm caused by the approximate nature of simulation
answers. While I cannot explain why at this point, I will simply say that this is not a false alarm. A
consequence of assuming Poisson is that, especially for small values of n, there can be some bias
in the mean value of an estimate. By contrast, the fact that the mean of the estimates by B exceeds
0.5 is not meaningful; i.e. B’s method does not possess bias.
I will still conclude that A is better than B, despite the bias; I will now describe the basis for this conclusion.
From the point of view of Nature, who knows the truth, every estimate value has an error: e =
estimate minus truth. In this simulation the error e is the estimate minus 0.5. Now errors can be
positive or negative. Also, trying to make sense of 1,000 errors is too difficult; we need a way to
summarize them. Statisticians advocate averaging the errors after making sure that the negatives
and positives don’t cancel. We have two preferred ways of doing this:
• Convert each error to an absolute error by taking its absolute value.
• Convert each error to a squared error by squaring it.
For my simulation study, the mean absolute error is 0.1064 for A and 0.1240 for B. Because
there is a minimum theoretical value of 0 for the mean absolute error, it makes sense to summarize
this difference by saying that the mean absolute error for A is 14.2% smaller than it is for B. This
14.2% is my preferred measure and why I conclude that A is better than B.
As we will see, statisticians like to square errors, although justifying this in an intuitive way
is a bit difficult. I will just remark that for this simulation study, the mean squared error for A is
0.001853 and for B it is 0.002574. (Because all of the absolute errors are 0.5 or smaller, squaring
the errors makes them smaller.)
To revisit the issue of bias, I repeated the above simulation study, but now with n = 100. The
mean of the estimates for A is 0.5006 and for B is 0.5007. These discrepancies from 0.5 are not
meaningful; i.e. there is no bias.
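A simulation along these lines is easy to set up. The Python sketch below is mine; with a different random number generator and seed than the one I used above, its numbers will not match 0.1064 and 0.1240 exactly, but A's mean absolute error typically comes out smaller than B's:

```python
import math
import random

random.seed(3)

def poisson_draw(theta):
    # draw one Poisson(theta) value via exponential inter-arrival times
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > theta:
            return k
        k += 1

theta, p0, n, reps = 0.6931, 0.5, 10, 1000
mae_A = mae_B = 0.0
for _ in range(reps):
    sample = [poisson_draw(theta) for _ in range(n)]
    est_A = math.exp(-sum(sample) / n)  # Poisson-based estimate of p0
    est_B = sample.count(0) / n         # structure-free estimate of p0
    mae_A += abs(est_A - p0) / reps
    mae_B += abs(est_B - p0) / reps
print(round(mae_A, 3), round(mae_B, 3))  # A's error is typically the smaller
```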