Undergraduate probability
These notes are based on A First Course in Probability Theory, 6th edition, by S. Ross.
1. Combinatorics.
The first basic principle is to multiply.
Suppose we have 4 shirts of 4 different colors and 3 pants of different colors. How many different outfits (one shirt and one pair of pants) are there? By the multiplication principle, 4 · 3 = 12.
A synonym for normal is Gaussian. The first thing to do is show that this is a density. Let

I = ∫_0^∞ e^(−x²/2) dx.

Then

I² = ∫_0^∞ ∫_0^∞ e^(−x²/2) e^(−y²/2) dx dy.

Changing to polar coordinates,

I² = ∫_0^(π/2) ∫_0^∞ r e^(−r²/2) dr dθ = π/2.

So I = √(π/2), hence

∫_(−∞)^∞ e^(−x²/2) dx = √(2π),

as it should be.
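The computation above can be checked numerically. The following Python snippet (an illustration, not part of the original notes) approximates the integral with a midpoint Riemann sum:

```python
import math

def gaussian_integral(half_width=10.0, steps=200_000):
    """Midpoint-rule approximation of the integral of e^(-x^2/2) over R.

    The tails beyond [-half_width, half_width] are negligible.
    """
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * dx  # midpoint of the i-th subinterval
        total += math.exp(-x * x / 2) * dx
    return total

print(gaussian_integral())       # ≈ 2.5066
print(math.sqrt(2 * math.pi))    # ≈ 2.5066
```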
Example. Find P(1 ≤ X ≤ 4) if X is N(2, 25).
Answer. Write X = 2 + 5Z. So

P(1 ≤ X ≤ 4) = P(1 ≤ 2 + 5Z ≤ 4) = P(−1 ≤ 5Z ≤ 2) = P(−0.2 ≤ Z ≤ .4)
= P(Z ≤ .4) − P(Z ≤ −0.2) = Φ(0.4) − [1 − Φ(0.2)] = .6554 − [1 − .5793] = .2347.
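The table lookups can be reproduced with Python's standard library, writing the normal CDF Φ in terms of the error function (an illustrative sketch, not from the notes):

```python
import math

def Phi(x):
    """Standard normal CDF, expressed via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# X = 2 + 5Z, so P(1 <= X <= 4) = Phi(0.4) - Phi(-0.2)
p = Phi(0.4) - Phi(-0.2)
print(round(p, 4))  # 0.2347
```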
Example. Find c such that P(|Z| ≥ c) = .05.
Answer. By symmetry, we want c such that P(Z ≥ c) = .025, that is, Φ(c) = .975. From the normal table, c = 1.96.
Answer. Here p = 1/6, so np = 30 and √(np(1 − p)) = 5. Then P(Sn > 50) ≈ P(Z > 4), which is very small.
Example. Suppose a drug is supposed to be 75% effective. It is tested on 100 people. What is the probability more than 70 people will be helped?
Answer. Here Sn is a binomial with n = 100 and p = .75, so np = 75 and √(np(1 − p)) = √18.75 ≈ 4.33. Then P(Sn > 70) ≈ P(Z > (70 − 75)/4.33) = P(Z > −1.15) ≈ .876.
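Assuming the same normal approximation as in the dice example above (and, for simplicity, no continuity correction), the computation can be sketched in Python:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 100, 0.75
mu = n * p                           # 75
sigma = math.sqrt(n * p * (1 - p))   # sqrt(18.75), about 4.33

# Normal approximation: P(S_n > 70) ≈ P(Z > (70 - mu)/sigma)
approx = 1 - Phi((70 - mu) / sigma)
print(round(approx, 3))  # ≈ 0.876
```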
Cauchy. Here

f(x) = (1/π) · 1/(1 + (x − θ)²).

What is interesting about the Cauchy is that it does not have finite mean, that is, E|X| = ∞.
Often it is important to be able to compute the density of Y = g (X ). Let us give a couple of examples.
If X is uniform on
The multivariate distribution function of (X, Y) is defined by FX,Y(x, y) = P(X ≤ x, Y ≤ y). In the continuous case, this is

FX,Y(x, y) = ∫_(−∞)^x ∫_(−∞)^y fX,Y(x, y) dy dx,

and so we have

f(x, y) = ∂²F/∂x∂y (x, y).

The extension to n random variables is exactly similar.
We have

P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(a ≤ X ≤ b) P(c ≤ Y ≤ d) = ∫_a^b fX(x) dx ∫_c^d fY(y) dy.

One can conclude from this that

fX,Y(x, y) = fX(x) fY(y);

that is, the joint density factors. Going the other way, one can also see that if the joint density factors, then one has independence.
Example. Suppose one has a floor made out of wood planks and
Note that it is not always the case that the sum of two independent random variables will be a random
variable of the same type.
If X and Y are independent normals, then −Y is also a normal (with E(−Y) = −E Y and Var(−Y) =
(−1)² Var Y = Var Y), and so X − Y is also normal.
11. Expectations.
As in the one variable case, we have

E g(X, Y) = Σ_(x,y) g(x, y) p(x, y)

in the discrete case and

E g(X, Y) = ∫∫ g(x, y) f(x, y) dx dy

in the continuous case.
If we set g(x, y) = x + y, then

E(X + Y) = ∫∫ (x + y) f(x, y) dx dy = ∫∫ x f(x, y) dx dy + ∫∫ y f(x, y) dx dy = E X + E Y.
Proposition 11.2. If X and Y are independent, then
Var (X + Y ) = Var X + Var Y.
Proof. Since X and Y are independent, Cov(X, Y) = 0, so
Var(X + Y) = Var X + Var Y + 2 Cov(X, Y) = Var X + Var Y.
Since a binomial is the sum of n independent Bernoullis, its variance is np(1 − p). If we write

X = Σ_(i=1)^n Xi
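The claim that the variance of a binomial is np(1 − p) can be illustrated by simulation; the sketch below (with arbitrarily chosen n and p, not from the notes) builds each binomial sample as a sum of independent Bernoullis:

```python
import random

def binomial_sample(n, p, rng):
    """Sum of n independent Bernoulli(p) random variables."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)
n, p, trials = 20, 0.3, 200_000
xs = [binomial_sample(n, p, rng) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials

print(round(var, 2))       # empirical variance, ≈ 4.2
print(n * p * (1 - p))     # theoretical np(1-p) = 4.2
```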
Proposition 12.2. If mX(t) = mY(t) < ∞ for all t in an interval, then X and Y have the same distribution.
We will not prove this, but it is essentially the uniqueness of the Laplace transform. Note E e^(tX) =
∫ e^(tx) fX(x) dx. If fX(x) = 0 for x < 0, this is ∫_0^∞ e^(tx) fX(x) dx, the Laplace transform of fX (with t replaced by −t).
Proposition 13.2. If Y ≥ 0, then for any A,

P(Y > A) ≤ E Y / A.

Proof. Let B = {Y > A}. Recall 1B is the random variable that is 1 if ω ∈ B and 0 otherwise. Note
1B ≤ Y/A. This is obvious if ω ∉ B, while if ω ∈ B, then Y(ω)/A > 1 = 1B(ω). We then have

P(Y > A) = P(B) = E 1B ≤ E(Y/A) = E Y / A.
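Markov's inequality can be illustrated empirically; in the sketch below Y is taken, for concreteness, to be an exponential random variable with mean 1 (an arbitrary choice for illustration, not from the notes):

```python
import random

# Empirical check of Markov's inequality P(Y > A) <= E[Y]/A
# for a nonnegative random variable Y.
rng = random.Random(1)
ys = [rng.expovariate(1.0) for _ in range(100_000)]  # E[Y] = 1

A = 3.0
tail = sum(1 for y in ys if y > A) / len(ys)   # empirical P(Y > A)
bound = (sum(ys) / len(ys)) / A                # empirical E[Y]/A

print(tail <= bound)  # True: the tail (≈ e^-3 ≈ 0.05) is below the bound (≈ 1/3)
```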
If x ∈ [k/2^n, (k + 1)/2^n), then x differs from k/2^n by at most 1/2^n. So the last integral differs from

∫_(k/2^n)^((k+1)/2^n) x f(x) dx

by at most (1/2^n) P(k/2^n ≤ X < (k + 1)/2^n) ≤ 1/2^n, which goes to 0 as n → ∞. On the other hand,

Σ_k ∫_(k/2^n)^((k+1)/2^n) x f(x) dx = ∫_0^M x f(x) dx,

wh
Hypergeometric. Set

P(X = i) = C(m, i) C(N − m, n − i) / C(N, n).

This comes up in sampling without replacement: if there are N balls, of which m are one color and the other
N − m are another, and we choose n balls at random without replacement, then P(X = i) represents the probability that exactly i of the balls chosen are of the first color.
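The pmf can be written directly with `math.comb`; the parameters N, m, n below are arbitrary illustrative choices:

```python
from math import comb

def hypergeom_pmf(i, N, m, n):
    """P(X = i): draw n balls without replacement from N balls,
    m of which are the first color; X counts first-color balls drawn."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

N, m, n = 20, 7, 5
# The counts over all outcomes sum to C(N, n) (Vandermonde's identity),
# so the probabilities sum to 1.
total = sum(hypergeom_pmf(i, N, m, n) for i in range(n + 1))
print(total)  # 1.0 (up to floating-point rounding)
```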
doesn't matter, we will have the letters a, b, c 6 times, because there are 3! ways of arranging 3 letters. The
same is true for any choice of three letters. So we should have 5 · 4 · 3/3!. We can rewrite this as

5 · 4 · 3 / 3! = 5! / (3! 2!).

This is often written C(5, 3), read "5 choose 3."
In general, to divide n objects into one group of n1, one group of n2, . . ., and a kth group of nk, where
n = n1 + · · · + nk, the answer is

n! / (n1! n2! · · · nk!).

These are known as multinomial coefficients.
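Both the binomial and the multinomial coefficients are easy to compute directly; the helper below is an illustrative sketch (not part of the notes):

```python
from math import comb, factorial

def multinomial(*groups):
    """n!/(n1! n2! ... nk!) for n = n1 + ... + nk, using integer division."""
    out = factorial(sum(groups))
    for g in groups:
        out //= factorial(g)
    return out

print(comb(5, 3))         # 10, i.e. 5!/(3!2!)
print(multinomial(3, 2))  # 10, the same division of 5 objects into groups of 3 and 2
```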
Suppose one has 8 indistinguishable balls. How many w
Typically we will take F to be all subsets of S, and so (i)–(iii) are automatically satisfied. The only
times we won't have F be all subsets is for technical reasons or when we talk about conditional expectation.
So now we have a space S, a σ-field F, and we
one of, so the answer is

13 · C(4, 4) · 12 · C(4, 1) / C(52, 5).
Example. What is the probability that in a poker hand we get exactly 3 of a kind (and the other two cards
are of different ranks)?
Answer. The probability of 3 aces, 1 king and 1 queen is
the rank we have 3 of an
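The full three-of-a-kind count can be assembled from binomial coefficients along the lines the example suggests (choose the rank we have three of, the suits of those three cards, then two other ranks and one suit for each); a quick check in Python, not part of the notes:

```python
from math import comb

hands = comb(52, 5)  # all poker hands: 2,598,960

# rank for the triple, its 3 suits, two distinct other ranks, one suit each
three_kind = 13 * comb(4, 3) * comb(12, 2) * 4 * 4

print(three_kind / hands)  # ≈ 0.0211
```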
Suppose there are 200 men, of which 100 are smokers, and 100 women, of which 20 are smokers. What
is the probability that a person chosen at random will be a smoker? The answer is 120/300. Now, let us
ask, what is the probability that a person chosen at r
Answer. Let D be the families that own a dog, and C the families that own a cat. We are given P(D) =
.36, P(C) = .30, and P(C | D) = .22. We want to know P(D | C). We know P(D | C) = P(D ∩ C)/P(C). To find the
numerator, we use P(D ∩ C) = P(C | D)P(D) = (.22)(.36) = .0792. So P(D | C) = .0792/.30 = .264.
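The same computation in code form (numbers from the example):

```python
# P(D | C) = P(C | D) P(D) / P(C)
P_D, P_C, P_C_given_D = 0.36, 0.30, 0.22

P_D_and_C = P_C_given_D * P_D    # 0.0792
P_D_given_C = P_D_and_C / P_C

print(round(P_D_given_C, 3))  # 0.264
```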
Proposition 3.1. If E and F are independent, then E and F^c are independent.
Proof.

P(E ∩ F^c) = P(E) − P(E ∩ F) = P(E) − P(E)P(F) = P(E)[1 − P(F)] = P(E)P(F^c).

We say E, F, and G are independent if E and F are independent, E and G are independent, F and G are independent, and P(E ∩ F ∩ G) = P(E)P(F)P(G).
that is not possible. Neither can the slope be positive, or else we would have y < 0, and again this is not
possible, because probabilities must be between 0 and 1. Therefore the slope must be 0; that is, y(x) is constant,
so y(x) = 1 for all x. In other words
We have

E X = 1(1/4) + 2(1/8) + 3(1/16) + · · · = (1/4)[1 + 2(1/2) + 3(1/4) + · · ·] = (1/4) · 1/(1 − 1/2)² = 1.
Example. Suppose we roll a fair die. If 1 or 2 is showing, let X = 3; if a 3 or 4 is showing, let X = 4, and if
a 5 or 6 is showing, let X = 10. What is E X?
Similarly we have E(cX) = c E X if c is a constant. These linearity results are quite hard to derive using the first
definition.
It turns out there is a formula for the expectation of random variables like X² and e^X. To see how
this works, let us first look at an example.
5. Some discrete distributions.
Bernoulli. A r.v. X such that P(X = 1) = p and P(X = 0) = 1 − p is said to be a Bernoulli r.v. with
parameter p. Note E X = p and E X² = p, so Var X = p − p² = p(1 − p).
Binomial. A r.v. X has a binomial distribution with parameters n and p if

P(X = k) = C(n, k) p^k (1 − p)^(n−k).

The number of successes in n trials is a binomial.
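The binomial pmf, and the fact that E X = np, can be checked directly; n and p below are arbitrary illustrative values:

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial pmf: C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(round(mean, 6))  # 3.0, which is np
```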
so E X² = E(X² − X) + E X = λ² + λ, and hence Var X = λ.
Example. Suppose on average there are 5 homicides per month in a given city. What is the probability there
will be at most 1 in a certain month?
Answer. If X is the number of homicides, we are given that X is Poisson with parameter λ = 5. Then
P(X ≤ 1) = P(X = 0) + P(X = 1) = e^(−5) + 5e^(−5) ≈ .0404.
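The Poisson computation is a one-liner (λ = 5 as in the example):

```python
import math

# P(X <= 1) for X Poisson with lambda = 5:
# e^(-5) + 5 e^(-5) = 6 e^(-5)
lam = 5.0
p = math.exp(-lam) * (1 + lam)
print(round(p, 4))  # 0.0404
```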
Answer. We want

P(|Sn/n − 3.5| > .05).

We rewrite this as

P(|Sn − n E X1| > (.05)(3600)) = P( |Sn − n E X1| / (√n √(Var X1)) > 180 / (60 √(35/12)) )
≈ P(|Z| > 1.756) ≈ .08.
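The normal approximation in this example can be reproduced with the standard library (a sketch, using E X1 = 3.5 and Var X1 = 35/12 for a fair die):

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n = 3600
threshold = 0.05 * n             # 180
sd = math.sqrt(n * 35 / 12)      # 60 * sqrt(35/12)
z = threshold / sd               # ≈ 1.756

print(round(2 * (1 - Phi(z)), 2))  # ≈ 0.08
```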
Example. Suppose the lifetime of a human has expectation 72 and variance 36. What is the probability that
the