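We put the above together and obtain
\[
m_{Y_n}(t) = m_{S_n}(t/\sqrt{n}) = [m_{X_1}(t/\sqrt{n})]^n
= \Big[1 + t \cdot 0 + \frac{(t/\sqrt{n})^2}{2!} + R_n\Big]^n
= \Big[1 + \frac{t^2}{2n} + R_n\Big]^n,
\]
where $n|R_n| \to 0$ as $n \to \infty$. This converges to $e^{t^2/2} = m_Z(t)$ as $n \to \infty$.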
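As a numerical sanity check on this limit (not part of the proof), the following Python sketch drops the remainder $R_n$ and compares $[1 + t^2/(2n)]^n$ with $e^{t^2/2}$ for increasing $n$. The function names and the choice $t = 1.5$ are illustrative assumptions.

```python
import math


def mgf_approx(t: float, n: int) -> float:
    """Leading part of the n-th power in the proof, with the remainder R_n dropped."""
    return (1.0 + t * t / (2.0 * n)) ** n


def mgf_standard_normal(t: float) -> float:
    """Moment generating function of a N(0, 1): exp(t^2 / 2)."""
    return math.exp(t * t / 2.0)


t = 1.5  # arbitrary illustrative value of t
for n in (10, 100, 10_000, 1_000_000):
    # The first column approaches the second as n grows.
    print(n, mgf_approx(t, n), mgf_standard_normal(t))
```

The gap shrinks roughly like $1/n$, which is why the remainder has to be small compared with $1/n$ for the argument to go through.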
The previous proposition then implies that $X + Y$ is a $N(a + c, b^2 + d^2)$.
Similarly, if $X$ and $Y$ are independent Poisson random variables with
parameters $a$ and $b$, resp., then
\[
m_{X+Y}(t) = m_X(t) m_Y(t) = e^{a(e^t - 1)} e^{b(e^t - 1)} = e^{(a+b)(e^t - 1)},
\]
which is the mo
3. Exponential:
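\[
E e^{tX} = \int_0^\infty e^{tx} \lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}
\]
if $t < \lambda$ and $\infty$ if $t \ge \lambda$.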
4. N (0, 1):
\[
\frac{1}{\sqrt{2\pi}} \int e^{tx} e^{-x^2/2}\,dx
= e^{t^2/2} \frac{1}{\sqrt{2\pi}} \int e^{-(x-t)^2/2}\,dx = e^{t^2/2}.
\]
5. $N(\mu, \sigma^2)$: Write $X = \mu + \sigma Z$. Then
\[
E e^{tX} = E e^{t\mu} e^{t\sigma Z} = e^{t\mu} e^{(t\sigma)^2/2} = e^{t\mu + t^2\sigma^2/2}.
\]
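The closed forms for the exponential and normal cases can be checked numerically. The sketch below approximates $E e^{tX} = \int e^{tx} f(x)\,dx$ with a midpoint rule on a truncated range and compares it with $\lambda/(\lambda - t)$ and $e^{t\mu + t^2\sigma^2/2}$. The helper names, the parameter values, and the truncation limits are assumptions made for illustration; the exponential check is only meaningful for $t < \lambda$.

```python
import math


def mgf_numeric(density, t, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of e^{tx} * density(x) over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += math.exp(t * x) * density(x)
    return total * h


def exponential_density(lam):
    """Density lam * e^{-lam x} on x >= 0."""
    return lambda x: lam * math.exp(-lam * x)


def normal_density(mu, sigma):
    """Density of a N(mu, sigma^2)."""
    return lambda x: math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


# Exponential with lam = 2 at t = 0.5 (t < lam, so the mgf is finite).
lam, t = 2.0, 0.5
exp_numeric = mgf_numeric(exponential_density(lam), t, 0.0, 60.0)
exp_closed = lam / (lam - t)

# N(1, 4) at s = 0.3; the range mu +/- 15 sigma is an arbitrary truncation.
mu, sigma, s = 1.0, 2.0, 0.3
normal_numeric = mgf_numeric(normal_density(mu, sigma), s, mu - 15 * sigma, mu + 15 * sigma)
normal_closed = math.exp(s * mu + s**2 * sigma**2 / 2)

print(exp_numeric, exp_closed)
print(normal_numeric, normal_closed)
```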
Proposition 18.1 If X and Y are independent
One more example. Suppose X1 , X2 , . . . is an i.i.d. sequence and each Xi
has mean 0 and variance 25. How large must n be so that the probability
that the absolute value of the average is less than .1 is at least .99?
Answer. We want to choose n such th
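Theorem 17.4 Suppose the $X_i$ are i.i.d. Suppose $E X_i^2 < \infty$. Let $\mu = E X_i$
and $\sigma^2 = \operatorname{Var} X_i$. Then
\[
P\Big(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\Big) \to P(a \le Z \le b)
\]
for every $a$ and $b$, where $Z$ is a $N(0, 1)$.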
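A small simulation can illustrate the statement. The sketch below takes the $X_i$ to be uniform on $[0, 1]$ (so $\mu = 1/2$ and $\sigma^2 = 1/12$), which is an assumption chosen only for convenience, and estimates the probability on the left for $a = -1$, $b = 1$. The function names, the seed, and the sample sizes are likewise illustrative.

```python
import math
import random


def standard_normal_cdf(x: float) -> float:
    """Distribution function of a N(0, 1), written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clt_probability(a: float, b: float, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(a <= (S_n - n mu) / (sigma sqrt(n)) <= b) for uniform X_i."""
    rng = random.Random(seed)
    mu = 0.5
    sigma = math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        ratio = (s_n - n * mu) / (sigma * math.sqrt(n))
        if a <= ratio <= b:
            hits += 1
    return hits / trials


estimate = clt_probability(-1.0, 1.0, n=50, trials=20_000)
limit = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)
print(estimate, limit)
```

The two printed numbers should be close; the simulation only suggests the limit and says nothing about the rate of convergence.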
The ratio on the left is $(S_n - E S_n)/\sqrt{\operatorname{Var} S_n}$. We do not claim that this
ratio converges for any (in
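Proposition 17.2 If $Y \ge 0$, then for any $A$,
\[
P(Y > A) \le \frac{E Y}{A}.
\]
Proof. Let $B = \{Y > A\}$. Recall $1_B$ is the random variable that is 1 if
$\omega \in B$ and 0 otherwise. Note $1_B \le Y/A$. This is obvious if $\omega \notin B$, while if
$\omega \in B$, then $Y(\omega)/A > 1 = 1_B(\omega)$. We then have
\[
P(Y > A) = P(B) = E 1_B \le E(Y/A) = \frac{E Y}{A}.
\]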
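To see the inequality on a concrete case, the sketch below takes $Y$ to be the roll of a fair die (an assumption made purely for illustration) and compares $P(Y > A)$ with $E Y / A$ exactly, using rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die (illustrative choice of Y >= 0).
pmf = {y: Fraction(1, 6) for y in range(1, 7)}


def expectation(pmf):
    """E Y for a discrete random variable given by its mass function."""
    return sum(y * p for y, p in pmf.items())


def tail_probability(pmf, a):
    """P(Y > a)."""
    return sum(p for y, p in pmf.items() if y > a)


def markov_bound(pmf, a):
    """The bound E Y / a from the proposition (a > 0)."""
    return expectation(pmf) / a


for a in range(1, 7):
    print(a, tail_probability(pmf, a), markov_bound(pmf, a))
```

The bound is far from sharp here, which is typical: it uses nothing about $Y$ beyond its mean.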
When $b - a$ is small, there is a correction that makes things more accurate,
namely replace $a$ by $a - \frac12$ and $b$ by $b + \frac12$. This correction never hurts and is
sometimes necessary. For example, in tossing a coin 100 times, there is positive
probability that there
16
Normal approximation to the binomial
A special case of the central limit theorem is
Proposition 16.1 If Sn is a binomial with parameters n and p, then
\[
P\Big(a \le \frac{S_n - np}{\sqrt{np(1 - p)}} \le b\Big) \to P(a \le Z \le b),
\]
as $n \to \infty$, where $Z$ is a $N(0, 1)$.
This approximation is good if $np(1 - p) \ge 10$.
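The quality of the approximation can be examined on a concrete case. The sketch below computes $P(45 \le S_n \le 55)$ exactly for $n = 100$, $p = 1/2$ (so $np(1-p) = 25$) and compares it with the normal approximation, with and without widening the interval by $\frac12$ on each side. The function names and the chosen numbers are illustrative assumptions.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Distribution function of a N(0, 1), written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binomial_exact(n: int, p: float, lo: int, hi: int) -> float:
    """Exact P(lo <= S_n <= hi) for a binomial with parameters n and p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


def binomial_normal_approx(n: int, p: float, lo: int, hi: int, correction: float = 0.0) -> float:
    """Normal approximation; correction=0.5 widens [lo, hi] by one half on each side."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    upper = (hi + correction - mean) / sd
    lower = (lo - correction - mean) / sd
    return standard_normal_cdf(upper) - standard_normal_cdf(lower)


exact = binomial_exact(100, 0.5, 45, 55)
plain = binomial_normal_approx(100, 0.5, 45, 55)
corrected = binomial_normal_approx(100, 0.5, 45, 55, correction=0.5)
print(exact, plain, corrected)
```

In this example the half-unit correction brings the approximation noticeably closer to the exact value.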
The covariance of two random variables $X$ and $Y$ is defined by
\[
\operatorname{Cov}(X, Y) = E[(X - E X)(Y - E Y)].
\]
As with the variance, $\operatorname{Cov}(X, Y) = E(XY) - (E X)(E Y)$. It follows that if $X$
and $Y$ are independent, then $E(XY) = (E X)(E Y)$, and then $\operatorname{Cov}(X, Y) = 0$.
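The two expressions for the covariance can be compared on a small discrete example. The joint mass function below is a hypothetical one chosen for illustration, as are the function names; the second mass function is a product of marginals, so the covariance there should vanish.

```python
from fractions import Fraction

# Hypothetical joint mass function p(x, y) on {0, 1} x {0, 1}.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}


def expect(joint, g):
    """E g(X, Y) = sum of g(x, y) p(x, y) over all (x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def cov_by_definition(joint):
    """E[(X - EX)(Y - EY)]."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - ex) * (y - ey))


def cov_by_shortcut(joint):
    """E(XY) - (EX)(EY)."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: x * y) - ex * ey


# Independent case: the joint mass function is a product of marginals.
independent = {(x, y): Fraction(1, 2) * Fraction(1, 2) for x in (0, 1) for y in (0, 1)}

print(cov_by_definition(joint), cov_by_shortcut(joint), cov_by_shortcut(independent))
```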
Note
15
Expectations
As in the one variable case, we have
\[
E g(X, Y) = \sum_x \sum_y g(x, y) p(x, y)
\]
in the discrete case and
\[
E g(X, Y) = \int\!\!\int g(x, y) f(x, y)\,dx\,dy
\]
in the continuous case.
If we set $g(x, y) = x + y$, then
E(X + Y) = \int\!\!\int (x + y) f(x, y)\,dx\,dy = \int\!\!\int x f(x, y)
Just as in the one-dimensional case, there is a change of variables formula.
Let us recall how the formula goes in one dimension. If X has a density fX
and $Y = g(X)$, then
\[
F_Y(y) = P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = F_X(g^{-1}(y)).
\]
Taking the derivat
gamma with parameters $s + t$ and $\lambda$. In particular, the sum of $n$ independent
exponentials with parameter $\lambda$ is a gamma with parameters $n$ and $\lambda$.
(2) If $Z$ is a $N(0, 1)$, then $F_{Z^2}(y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) =
F_Z(\sqrt{y}) - F_Z(-\sqrt{y})$. Differentiating shows that f_{Z^2}(y
One can conclude from this that
fX,Y (x, y ) = fX (x)fY (y ),
or again the joint density factors. Going the other way, one can also see that
if the joint density factors, then one has independence.
An example. Suppose one has a floor made out of wood planks
or
\[
P((X, Y) \in D) = \int\!\!\int_D f_{X,Y}\,dy\,dx
\]
when $D$ is the set $\{(x, y) : a \le x \le b,\ c \le y \le d\}$. One can show this holds
when $D$ is any set. For example,
\[
P(X < Y) = \int\!\!\int_{\{x < y\}} f_{X,Y}(x, y)\,dy\,dx.
\]
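As an illustration of integrating a joint density over a region, the sketch below approximates $P(X < Y)$ by a midpoint sum over a grid on the unit square. The joint density $f_{X,Y}(x, y) = 2x$ on $[0,1]^2$ is a hypothetical choice (it corresponds to independent $X$ with density $2x$ and $Y$ uniform), for which the exact answer is $\int_0^1 2x(1 - x)\,dx = 1/3$. The function names and grid size are also assumptions.

```python
def joint_density(x: float, y: float) -> float:
    """Hypothetical joint density f(x, y) = 2x on the unit square, 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def probability_of_region(density, in_region, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of density over the part of
    the unit square where in_region(x, y) is true."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if in_region(x, y):
                total += density(x, y)
    return total * h * h


total_mass = probability_of_region(joint_density, lambda x, y: True)
p_x_less_y = probability_of_region(joint_density, lambda x, y: x < y)
print(total_mass, p_x_less_y)
```

The grid treats cells on the diagonal crudely, so the estimate is only accurate to about $1/n$.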
If one has the joint density of X and Y , one can recover the densities of
X and of
14
Multivariate distributions
We want to discuss collections of random variables (X1 , X2 , . . . , Xn ), which
are known as random vectors. In the discrete case, we can define the density
$p(x, y) = P(X = x, Y = y)$. Remember that here the comma means "and."
Remembering that $f_X(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$ and doing some algebra, we end up
with
\[
f_Y(x) = \frac{1}{\sqrt{2\pi}} x^{-1/2} e^{-x/2},
\]
which is a Gamma with parameters $\frac12$ and $\frac12$. (This is also a $\chi^2$ with one
degree of freedom.)
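The identification with a Gamma can be checked numerically. The sketch below assumes the parameterization in which a gamma with parameters $t$ and $\lambda$ has density $\lambda e^{-\lambda x} (\lambda x)^{t-1} / \Gamma(t)$ for $x > 0$; if a different convention is intended, the comparison needs to be adjusted. The function names are illustrative.

```python
import math


def gamma_density(x: float, t: float, lam: float) -> float:
    """Gamma density under the assumed convention: lam * e^{-lam x} (lam x)^{t-1} / Gamma(t)."""
    return lam * math.exp(-lam * x) * (lam * x) ** (t - 1) / math.gamma(t)


def density_of_square(x: float) -> float:
    """The density (1 / sqrt(2 pi)) x^{-1/2} e^{-x/2} obtained for the square of a N(0, 1)."""
    return x**-0.5 * math.exp(-x / 2) / math.sqrt(2 * math.pi)


for x in (0.1, 1.0, 2.5, 7.0):
    # The two columns agree up to rounding when t = 1/2 and lam = 1/2.
    print(x, density_of_square(x), gamma_density(x, 0.5, 0.5))
```

The agreement comes from $\Gamma(1/2) = \sqrt{\pi}$, which is the only nontrivial constant in the algebra.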
An exponential is the time for something to occur. A gamma is the time
for $t$ events to occur. A gamma with parameters $\frac12$ and $\frac{n}{2}$ is known as a $\chi^2_n$,
a chi-squared random variable with $n$ degrees of freedom. Gammas and chi-squareds come up frequently in sta
and integrate.
In particular, for $x$ large,
\[
P(Z \ge x) = 1 - \Phi(x) \sim \frac{1}{\sqrt{2\pi}} \frac{1}{x} e^{-x^2/2} \le e^{-x^2/2}.
\]
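To get a feel for how good the large-$x$ estimate $\frac{1}{\sqrt{2\pi}} \frac{1}{x} e^{-x^2/2}$ of $1 - \Phi(x)$ is, the sketch below compares it with the tail computed from the complementary error function. The function names are illustrative assumptions.

```python
import math


def normal_tail(x: float) -> float:
    """P(Z >= x) = 1 - Phi(x) for a N(0, 1), computed via erfc to keep accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_approximation(x: float) -> float:
    """The large-x approximation (1 / sqrt(2 pi)) (1 / x) e^{-x^2 / 2}."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))


for x in (1.0, 2.0, 3.0, 5.0, 8.0):
    exact = normal_tail(x)
    approx = tail_approximation(x)
    # The ratio exact / approx moves toward 1 as x grows.
    print(x, exact, approx, exact / approx)
```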
13
Some continuous distributions
We look at some other continuous random variables besides normals.
Uniform. Here $f(x) = 1/(b - a)$ if $a \le x \le b$ and 0 otherwise. To compute
The distribution function of a standard $N(0, 1)$ is often denoted $\Phi(x)$, so
that
\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy.
\]
Tables of $\Phi(x)$ are often given only for $x > 0$. One can use the symmetry of
the density function to see that
\[
\Phi(-x) = 1 - \Phi(x);
\]
this follows from
\Phi(-x) = P(Z
Changing to polar coordinates,
\[
I^2 = \int_0^{\pi/2} \int_0^\infty r e^{-r^2/2}\,dr\,d\theta = \pi/2.
\]
So $I = \sqrt{\pi/2}$, hence $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}$ as it should.
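The value $\sqrt{2\pi}$ can also be confirmed numerically. The sketch below applies a midpoint rule to $e^{-x^2/2}$ on $[-10, 10]$; the truncation limit, step count, and function name are assumptions, and the mass outside that interval is negligible at this precision.

```python
import math


def gaussian_integral(limit: float = 10.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of e^{-x^2/2} over [-limit, limit]."""
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        x = -limit + (k + 0.5) * h
        total += math.exp(-x * x / 2.0)
    return total * h


print(gaussian_integral(), math.sqrt(2.0 * math.pi))
```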
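Note
\[
\int_{-\infty}^{\infty} x e^{-x^2/2}\,dx = 0
\]
by symmetry, so $E Z = 0$. For the variance of $Z$, we use integration by parts:
\[
E Z^2 = \frac{1}{\sqrt{2\pi}} \int x^2 e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \int x \cdot x e^{-x^2/2}\,dx.
\]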
Proposition 11.1 $E g(X) = \int g(x) f(x)\,dx$.
As in the discrete case,
\[
\operatorname{Var} X = E[X - E X]^2.
\]
As an example of these calculations, let us look at the uniform distribution.
We say that a random variable X has a uniform distribution on [a, b] if
$f_X(x) = \frac{1}{b - a}$ if
We give another definition of the expectation in the continuous case. First
suppose $X$ is nonnegative and bounded above by a constant $M$. Define $X_n(\omega)$
to be $k/2^n$ if $k/2^n \le X(\omega) < (k + 1)/2^n$. We are approximating $X$ from below
by the largest multiple of $2^{-n}$. Eac
11
Continuous distributions
A random variable X is said to have a continuous distribution if there exists
a nonnegative function f such that
\[
P(a \le X \le b) = \int_a^b f(x)\,dx
\]
for every a and b. (More precisely, such an X is said to have an absolutely continuous dist
As an example, let $g(x, y) = x + y$. Then
\[
\begin{aligned}
E(X + Y) &= \sum_{x,y} (x + y) P(X = x, Y = y) \\
&= \sum_x \sum_y x P(X = x, Y = y) + \sum_x \sum_y y P(X = x, Y = y) \\
&= \sum_x x P(X = x) + \sum_y y P(Y = y) \\
&= E X + E Y,
\end{aligned}
\]
a result we had previously proved another way.
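The computation does not use independence, and a small example makes that visible. In the sketch below, $X$ is the first of two fair dice and $Y$ is the larger of the two, so $X$ and $Y$ are dependent; this setup and the function names are assumptions chosen for illustration. The sums are evaluated exactly with rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Joint mass function of X = first die and Y = max of the two dice.
joint = defaultdict(Fraction)
for d1 in range(1, 7):
    for d2 in range(1, 7):
        joint[(d1, max(d1, d2))] += Fraction(1, 36)


def expect(joint, g):
    """E g(X, Y) = sum over (x, y) of g(x, y) P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expect(joint, lambda x, y: x)
e_y = expect(joint, lambda x, y: y)
e_sum = expect(joint, lambda x, y: x + y)
print(e_x, e_y, e_sum)
```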
10
Joint distributions
Given two discrete random variables X and Y , we can talk about the joint
distribution or joint density:
P(X = x, Y = y ).
Here the comma means "and" and this is an abbreviation for
$P((X = x) \cap (Y = y))$.
An example: Suppose we roll two d
or
\[
P(F \cap E) = P(E) P(F),
\]
which agrees with the definition of independence we gave before.
Let us give two more examples.
An example: Suppose an urn holds 5 red balls and 7 green balls. You
draw two balls without replacement. What is the probability the seco
.30, $P(A \mid M) = .25$, $P(W) = .60$ and we want $P(W \mid A)$. From the definition
\[
P(W \mid A) = \frac{P(W \cap A)}{P(A)}.
\]
As in the previous example,
\[
P(W \cap A) = P(A \mid W) P(W) = (.30)(.60) = .18.
\]
To find $P(A)$, we write
\[
P(A) = P(W \cap A) + P(M \cap A).
\]
Since the class is 40% men,
P(M \cap A) = P(
An example: A family has 2 children. Given that one of the children is a
boy, what is the probability that the other child is also a boy?
Answer. Let B be the event that one child is a boy, and A the event
that both children are boys. The possibilities ar