Undergraduate probability
These notes are based on A First Course in Probability Theory, 6th edition, by S. Ross.
1. Combinatorics.
The first basic principle is to multiply.
Suppose we have 4 shirts of
A synonym for normal is Gaussian. The first thing to do is show that this is a density. Let
I = \int_0^\infty e^{-x^2/2}\, dx.
Then
I^2 = \int_0^\infty \int_0^\infty e^{-x^2/2} e^{-y^2/2}\, dx\, dy.
Changing to polar coordinates,
I^2 = \int_0^{\pi/2} \int_0^\infty r e^{-r^2/2}\, dr\, d\theta = \pi/2.
So I = \sqrt{\pi/2},
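The value \sqrt{\pi/2} for this integral can be checked numerically; the sketch below uses a midpoint Riemann sum on [0, 10] (the cutoff 10 is an assumption, but the tail beyond it is smaller than e^{-50}).

```python
import math

# Numerically check that the integral of exp(-x^2/2) over [0, infinity)
# equals sqrt(pi/2), using a midpoint Riemann sum on [0, 10].
def gaussian_half_integral(upper=10.0, steps=100_000):
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h          # midpoint of the k-th subinterval
        total += math.exp(-x * x / 2) * h
    return total

approx = gaussian_half_integral()
exact = math.sqrt(math.pi / 2)
print(approx, exact)               # the two agree to several decimals
```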
Answer. Here p = 1/6, so np = 30 and \sqrt{np(1-p)} = 5. Then P(S_n > 50) \approx P(Z > 4), which is very small.
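To see just how small P(Z > 4) is, one can evaluate the standard normal tail with the complementary error function; the figures n = 180 and p = 1/6 below are inferred from np = 30 and \sqrt{np(1-p)} = 5 in the text.

```python
import math

# Normal approximation for the number of sixes in n die rolls.
n, p = 180, 1 / 6
mu = n * p                              # 30
sigma = math.sqrt(n * p * (1 - p))      # 5
z = (50 - mu) / sigma                   # essentially 4

# P(Z > z) for a standard normal, via the complementary error function
tail = 0.5 * math.erfc(z / math.sqrt(2))
print(z, tail)                          # tail is about 3e-5: very small
```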
Example. Suppose a drug is supposed to be 75% effective. It is tested on 100 people. What is the probability
Cauchy. Here
f(x) = \frac{1}{\pi}\, \frac{1}{1 + (x - \theta)^2}.
What is interesting about the Cauchy is that it does not have finite mean, that is, E|X| = \infty.
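A simulation illustrates what failing to have a mean does in practice: sample averages of Cauchy draws never settle down. Standard Cauchy variables (\theta = 0) can be generated as \tan(\pi(U - 1/2)) for U uniform on (0, 1); the sample sizes below are arbitrary choices.

```python
import math
import random

# The Cauchy has no finite mean, so the law of large numbers fails:
# repeated sample averages do not cluster at any fixed number.
random.seed(1)

def cauchy_sample(n):
    return [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

means = [sum(cauchy_sample(10_000)) / 10_000 for _ in range(5)]
print(means)   # the five averages do not cluster at any fixed number
```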
Often it is important to be able to compute the density of Y = g(X)
The multivariate distribution function of (X, Y) is defined by F_{X,Y}(x, y) = P(X \le x, Y \le y). In the continuous case, this is
F_{X,Y}(x, y) = \int_{-\infty}^x \int_{-\infty}^y f_{X,Y}(x, y)\, dy\, dx,
and so we have
f(x, y) = \frac{\partial^2 F}{\partial x\, \partial y}(x, y).
The extension
One can conclude from this that
f_{X,Y}(x, y) = f_X(x) f_Y(y),
i.e., the joint density factors. Going the other way, one can also see that if the joint density factors, then one has independence.
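The same factoring criterion holds for discrete random variables with the pmf in place of the density, and it is easy to check mechanically. The toy joint pmf below (two independent fair coins) is an assumed example.

```python
# Checking independence by factoring: the joint pmf should equal the
# product of the marginals at every point.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# marginal pmfs, obtained by summing out the other variable
px = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
py = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}

independent = all(
    abs(joint[(x, y)] - px[x] * py[y]) < 1e-12 for (x, y) in joint
)
print(independent)  # True: the joint pmf factors
```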
Note that it is not always the case that the sum of two independent random variables will be a random
variable of the same type.
If X and Y are independent normals, then X + Y is also a normal (with E(X + Y) = E X + E Y and Var(X + Y) = Var X + Var Y).
11. Expectations.
As in the one variable case, we have
E g(X, Y) = \sum_x \sum_y g(x, y)\, p(x, y)
in the discrete case and
E g(X, Y) = \int\!\!\int g(x, y) f(x, y)\, dx\, dy
in the continuous case.
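The discrete formula can be computed directly; the sketch below uses two independent fair dice (an assumed example) with g(x, y) = x + y.

```python
# Discrete case of E g(X, Y) = sum over (x, y) of g(x, y) p(x, y),
# illustrated with two independent fair dice.
pmf = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}

def expect(g, pmf):
    return sum(g(x, y) * prob for (x, y), prob in pmf.items())

e_sum = expect(lambda x, y: x + y, pmf)
print(e_sum)   # about 7.0 = E X + E Y = 3.5 + 3.5
```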
If we set g(x, y) = x + y,
Proposition 11.2. If X and Y are independent, then
Var(X + Y) = Var X + Var Y.
Proof. We have
Var(X + Y) = Var X + Var Y + 2 Cov(X, Y) = Var X + Var Y,
since Cov(X, Y) = 0 when X and Y are independent.
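The additivity of variances can be verified exactly for a small example; two independent fair dice (an assumption for illustration) give Var X = 35/12 and Var(X + Y) = 35/6.

```python
from itertools import product

# Exact check of Var(X + Y) = Var X + Var Y for two independent fair
# dice, by enumerating the 36 equally likely outcomes.
faces = range(1, 7)

def var(values):
    # variance of a uniform distribution over the given list of values
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

var_x = var(list(faces))                          # 35/12 for one die
var_sum = var([x + y for x, y in product(faces, faces)])
print(var_sum, 2 * var_x)                         # both are 35/6
```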
Since a binomial is the sum of n independent
Proposition 12.2. If m_X(t) = m_Y(t) < \infty for all t in an interval, then X and Y have the same distribution.
We will not prove this, but this is essentially the uniqueness of the Laplace transform. Note
Proposition 13.2. If Y \ge 0, then for any A > 0,
P(Y > A) \le \frac{E Y}{A}.
Proof. Let B = \{Y > A\}. Recall 1_B is the random variable that is 1 if \omega \in B and 0 otherwise. Note
1_B \le Y/A. This is obvious if \omega \in B, since then Y/A > 1, while if \omega \notin B, then 1_B = 0 \le Y/A.
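The inequality P(Y > A) \le E Y / A can be checked directly for a small nonnegative Y; the fair die below is an assumed example.

```python
# Sanity check of P(Y > A) <= E Y / A for a nonnegative Y; here Y is
# the value of a fair die.
pmf = {k: 1 / 6 for k in range(1, 7)}
ey = sum(k * p for k, p in pmf.items())            # 3.5

for a in (1, 2, 3, 4, 5):
    tail = sum(p for k, p in pmf.items() if k > a)
    assert tail <= ey / a
    print(a, tail, ey / a)   # the bound holds, though it can be crude
```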
If x \in [k/2^n, (k+1)/2^n), then x differs from k/2^n by at most 1/2^n. So the last integral differs from
\int_{k/2^n}^{(k+1)/2^n} x f(x)\, dx
by at most (1/2^n) P(k/2^n \le X < (k+1)/2^n) \le 1/2^n, which goes to 0 as n \to \infty. On the
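This dyadic approximation can be carried out explicitly. The sketch below uses an Exp(1) random variable (an assumed example, chosen so that E X = 1 and the interval probabilities have a closed form) and shows the sums approaching the mean as n grows.

```python
import math

# Dyadic approximation of E X: replace X by k/2^n on
# [k/2^n, (k+1)/2^n) and sum over k.
def dyadic_mean(n, cutoff=40.0):
    h = 1 / 2 ** n
    total, k = 0.0, 0
    while k * h < cutoff:
        # P(k/2^n <= X < (k+1)/2^n) for X ~ Exp(1)
        p = math.exp(-k * h) - math.exp(-(k + 1) * h)
        total += (k * h) * p
        k += 1
    return total

print([round(dyadic_mean(n), 4) for n in (2, 4, 8)])  # increases to 1
```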
Hypergeometric. Set
P(X = i) = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}.
This comes up in sampling without replacement: if there are N balls, of which m are one color and the other N - m are another, and we choose n balls at random
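The pmf above translates directly into code with binomial coefficients; the numbers in the example below (20 balls, 7 of the first color, 5 drawn) are assumptions for illustration.

```python
from math import comb

# The hypergeometric pmf: N balls, m of one color, n drawn without
# replacement, X = number of that color among those drawn.
def hypergeom_pmf(i, N, m, n):
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

probs = [hypergeom_pmf(i, 20, 7, 5) for i in range(6)]
print(probs)
print(sum(probs))   # the pmf sums to 1
```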
doesn't matter, we will have the letters a, b, c 6 times, because there are 3! ways of arranging 3 letters. The same is true for any choice of three letters. So we should have 5 \cdot 4 \cdot 3/3!. We can rewrite
In general, to divide n objects into one group of n_1, one group of n_2, \ldots, and a k-th group of n_k, where n = n_1 + \cdots + n_k, the answer is
\frac{n!}{n_1!\, n_2! \cdots n_k!}.
These are known as multinomial coefficients
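The formula n!/(n_1! n_2! \cdots n_k!) is easy to compute exactly with integer arithmetic:

```python
from math import factorial

# The multinomial coefficient n! / (n1! n2! ... nk!) for dividing n
# objects into labeled groups of the given sizes.
def multinomial(*group_sizes):
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)
    return result

print(multinomial(3, 2))      # 10, matching 5 * 4 * 3 / 3! above
print(multinomial(2, 2, 2))   # 90: six objects into three labeled pairs
```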
Typically we will take F to be all subsets of S, and so (i)-(iii) are automatically satisfied. The only times we won't have F be all subsets is for technical reasons or when we talk about conditional expectation.
one of, so the answer is
\frac{13 \cdot 12 \binom{4}{4}\binom{4}{1}}{\binom{52}{5}}.
Example. What is the probability that in a poker hand we get exactly 3 of a kind (and the other two cards are of different ranks)?
Answer. The probability of
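Since the answer above is cut off, here is a sketch of the standard count for exactly three of a kind: pick the rank (13 ways), three of its four suits, two other distinct ranks, and a suit for each.

```python
from math import comb

# Exactly three of a kind in a 5-card poker hand.
ways = 13 * comb(4, 3) * comb(12, 2) * 4 * 4
prob = ways / comb(52, 5)
print(ways, round(prob, 4))   # 54912 hands, probability about 0.0211
```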
Suppose there are 200 men, of which 100 are smokers, and 100 women, of which 20 are smokers. What
is the probability that a person chosen at random will be a smoker? The answer is 120/300. Now, let us
Answer. Let D be the families that own a dog, and C the families that own a cat. We are given P(D) = .36, P(C) = .30, P(C | D) = .22. We want to know P(D | C). We know P(D | C) = P(D \cap C)/P(C). To find
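The arithmetic with the numbers given in the text is short: P(D \cap C) = P(C | D) P(D), and then Bayes gives P(D | C).

```python
# Dog/cat computation: P(D) = .36, P(C) = .30, P(C | D) = .22.
p_d, p_c, p_c_given_d = 0.36, 0.30, 0.22

p_dc = p_c_given_d * p_d          # P(D and C) = 0.0792
p_d_given_c = p_dc / p_c          # P(D | C) = 0.264
print(p_dc, p_d_given_c)
```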
Proposition 3.1. If E and F are independent, then E and F^c are independent.
Proof.
P(E \cap F^c) = P(E) - P(E \cap F) = P(E) - P(E)P(F) = P(E)[1 - P(F)] = P(E)P(F^c).
We say E, F, and G are independent if
that is not possible. Neither can the slope be positive, or else we would have y < 0, and again this is not
possible, because probabilities must be between 0 and 1. Therefore the slope must be 0, or y
We have
E X = 1(\tfrac{1}{4}) + 2(\tfrac{1}{8}) + 3(\tfrac{1}{16}) + \cdots = \tfrac{1}{4}\left[1 + 2(\tfrac{1}{2}) + 3(\tfrac{1}{4}) + \cdots\right] = \tfrac{1}{4} \cdot \frac{1}{(1 - \tfrac{1}{2})^2} = 1.
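The series \sum_k k x^{k-1} = 1/(1-x)^2 used in the last step can be sanity-checked numerically at x = 1/2:

```python
# Numerical check of the series: the sum of k * (1/4) * (1/2)^(k-1)
# over k should equal (1/4) * 1/(1 - 1/2)^2 = 1.
total = sum(k * 0.25 * 0.5 ** (k - 1) for k in range(1, 200))
print(total)   # very close to 1
```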
Example. Suppose we roll a fair die. If 1 or 2 is showing, let X = 3; if a 3 or 4 is showing, let
Similarly we have E(cX) = cE X if c is a constant. These linearity results are quite hard using the first definition.
It turns out there is a formula for the expectation of random variables like X^2 and
5. Some discrete distributions.
Bernoulli. A r.v. X such that P(X = 1) = p and P(X = 0) = 1 - p is said to be a Bernoulli r.v. with parameter p. Note E X = p and E X^2 = p, so Var X = p - p^2 = p(1 - p).
so E X^2 = E(X^2 - X) + E X = \lambda^2 + \lambda, and hence Var X = \lambda.
Example. Suppose on average there are 5 homicides per month in a given city. What is the probability there
will be at most 1 in a certain month?
Answer. We want
P\left( \left| \frac{S_n}{n} - 3.5 \right| > .05 \right).
We rewrite this as
P(|S_n - nE X_1| > (.05)(3600)) = P\left( \frac{|S_n - nE X_1|}{\sqrt{n}\sqrt{\operatorname{Var} X_1}} > \frac{180}{(60)\sqrt{35/12}} \right) \approx P(|Z| > 1.756) \approx .08.
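The standardization above (n = 3600 rolls, E X_1 = 3.5, Var X_1 = 35/12) and the two-sided normal tail can be reproduced numerically:

```python
import math

# z = 180 / (60 * sqrt(35/12)) standard deviations, then the two-sided
# standard normal tail P(|Z| > z) = 2 P(Z > z) = erfc(z / sqrt(2)).
n = 3600
z = 0.05 * n / (math.sqrt(n) * math.sqrt(35 / 12))
tail = math.erfc(z / math.sqrt(2))
print(round(z, 3), round(tail, 3))   # about 1.757 and 0.079
```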
Example. Suppose the lifetime of a human has expectation