.16 illustrates the
advantages of this approach, particularly where it is initially unclear whether or not the
expectation is ﬁnite. The following example shows that this approach can sometimes hide
convergence questions and give the wrong answer.
Example 1.3.5. Let Y be a geometric rv with the PMF p_Y(y) = 2^{−y} for integer y ≥ 1.
Let X be an integer rv that, conditional on Y, is binary with equiprobable values ±2^y
given Y = y. We then see that E[X | Y = y] = 0 for all y, and thus (1.37) indicates that
E[X] = 0. On the other hand, it is easy to see that p_X(2^k) = 2^{−k−1} for each integer k ≥ 1
and p_X(−2^k) = 2^{−k−1} for each integer k ≥ 1. Thus the expectation over positive values of
X is ∞ and that over negative values is −∞. In other words, the expected value of X is
undefined and (1.37) is incorrect.
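The divergence in this example is easy to check numerically. Below is a minimal sketch (the function name and the use of exact fractions are our own, not from the text) summing x · p_X(x) over the positive values of X; each term 2^k · 2^{−(k+1)} equals 1/2, so the partial sums grow without bound:

```python
from fractions import Fraction

# Sketch of Example 1.3.5: p_X(2^k) = p_X(-2^k) = 2^-(k+1) for k >= 1.
# Each conditional expectation E[X | Y = y] is 0, yet the sums over
# positive and negative values of X both diverge.
def partial_positive_sum(K):
    """Sum of x * p_X(x) over positive values x = 2^k for k = 1..K."""
    return sum(Fraction(2**k, 2**(k + 1)) for k in range(1, K + 1))

# Each term is exactly 1/2, so the partial sum is K/2 -> infinity.
print(partial_positive_sum(10))   # 5
print(partial_positive_sum(100))  # 50
```

By symmetry the sum over negative values diverges to −∞, so no ordering of the terms yields a well-defined expectation.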
The difficulty in the above example cannot occur if X is a nonnegative rv. Then (1.37) is
simply a sum of a countable number of nonnegative terms, and thus it either converges to
a finite sum independent of the order of summation, or it diverges to ∞, again independent
of the order of summation.
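As a sanity check on this order-independence for a nonnegative rv, here is a small sketch (the particular distributions are a hypothetical illustration, not from the text) that evaluates the double sum both ways, conditioning on Y first and summing over x first:

```python
from fractions import Fraction

# Illustrative sketch: Y uniform on {1, 2, 3}; given Y = y, X is uniform
# on {0, 1, ..., y}.  X is nonnegative, so the double sum in (1.37) gives
# the same value in either order of summation.
p_Y = {y: Fraction(1, 3) for y in (1, 2, 3)}
p_XY = {(x, y): Fraction(1, y + 1) * p_Y[y]      # joint PMF p_XY(x, y)
        for y in p_Y for x in range(y + 1)}

# Sum over y first (via conditional expectations), then sum over x first.
by_y = sum(p_Y[y] * sum(x * Fraction(1, y + 1) for x in range(y + 1))
           for y in p_Y)
by_x = sum(x * p for (x, y), p in p_XY.items())
print(by_y, by_x, by_y == by_x)
```

Here E[X | Y = y] = y/2, so both orders give E[X] = (1/3)(1/2 + 1 + 3/2) = 1.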
If X has both positive and negative components, we can separate it into X = X^+ + X^−, where
X^+ = max(0, X) and X^− = min(X, 0). Then (1.37) applies to X^+ and −X^− separately.
If at most one of E[X^+] and E[−X^−] is infinite, then (1.37) applies to X, and otherwise
E[X] is undefined. This is
summarized in the following theorem:

Theorem 1.1 (Total expectation). Let X and Y be discrete rv's. If X is nonnegative, then
E[X] = E[E[X | Y]] = Σ_y p_Y(y) E[X | Y = y]. If X has both positive and negative
values, and if at most one of E[X^+] and E[−X^−] is infinite, then E[X] = E[E[X | Y]] =
Σ_y p_Y(y) E[X | Y = y].

We have seen above that if Y is a discrete rv, then the conditional expectation E[X | Y = y]
is little more complicated than the unconditional expectation, and this is true whether X
is discrete, continuous, or arbitrary. If X and Y are continuous, we can essentially extend
these results to probability densities. In particular,
E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx,        (1.36)

and

E[X] = ∫_{−∞}^{∞} f_Y(y) E[X | Y = y] dy = ∫_{−∞}^{∞} f_Y(y) ∫_{−∞}^{∞} x f_{X|Y}(x | y) dx dy.        (1.37)

We do not state this as a theorem because the details about the integration do not seem
necessary for the places where it is useful.

1.3.9 Indicator random variables

For any event A, the indicator random variable of A, denoted I_A, is a binary rv that has
the value 1 for all ω ∈ A and the value 0 otherwise. Thus, as illustrated in Figure 1.6, the
distribution function F_{I_A}(x) is 0 for x < 0, 1 − Pr{A} for 0 ≤ x < 1, and 1 for x ≥ 1. It is
obvious, by comparing Figures 1.6 and 1.3, that E[I_A] = Pr{A}.

Figure 1.6: The distribution function of an indicator random variable.
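The identity E[I_A] = Pr{A} is also easy to confirm by simulation; the following sketch (the event A and the sample size are arbitrary choices for illustration) estimates E[I_A] as a sample mean:

```python
import random

# Simulation sketch: I_A = 1 on A and 0 otherwise, so the sample mean of
# I_A estimates E[I_A] = Pr{A}.  Here A = {a fair die roll is even},
# so Pr{A} = 1/2.
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
indicators = [1 if roll % 2 == 0 else 0 for roll in rolls]
estimate = sum(indicators) / len(indicators)
print(abs(estimate - 0.5) < 0.02)  # relative frequency is close to Pr{A}
```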
Indicator rv’s are useful because they allow us to apply the many known results about rv’s
and expectations to events. For example, the laws of large numbers are expressed in terms
of sums of rv’s, and those results all translate into results about relative frequencies through
the use of indicator functions.

1.3.10 Transforms

The moment generating function (mgf) for a rv X is given by
g_X(r) = E[e^{rX}] = ∫_{−∞}^{∞} e^{rx} dF_X(x).        (1.38)

Viewing r as a real variable, we see that for r > 0, g_X(r) only exists if 1 − F_X(x) approaches 0
at least exponentially as x → ∞. Similarly, for r < 0, g_X(r) exists only if F_X(x) approaches
0 at least exponentially as x → −∞. If g_X(r) exists in a region of r around 0, then derivatives
of all orders exist, given by
∂^n g_X(r) / ∂r^n = ∫_{−∞}^{∞} x^n e^{rx} dF_X(x) ;        ∂^n g_X(r) / ∂r^n |_{r=0} = E[X^n].        (1.39)

This shows that finding the moment generating function often provides a convenient way
to calculate the moments of a random variable. Another convenient feature of moment
generating functions is their use in dealing with sums of independent rv’s. For example,
suppose S = X1 + X2 + · · · + Xn . Then
g_S(r) = E[e^{rS}] = E[exp(Σ_{i=1}^{n} r X_i)] = E[∏_{i=1}^{n} exp(r X_i)] = ∏_{i=1}^{n} g_{X_i}(r).        (1.40)

In the last step here, we have used a result of Exercise 1.11, which shows that for independent
rv’s, the mean of the product is equal to the product of the means. If X1 , . . . , Xn are also
IID, then
g_S(r) = [g_X(r)]^n.        (1.41)

The variable r in the mgf can also be viewed as a complex variable, giving rise to a number
of other transforms. A particularly important case is to view r as a pure imaginary variable,
say iω where i = √−1 and ω is real. The mgf is then called the characteristic function.
Since |e^{iωx}| = 1 for all x, g_X(iω) must always exist, and its magnitude is at most one. Note
that g_X(iω) is the inverse Fourier transform of the density of X.
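These existence and magnitude properties can be checked directly; the sketch below (the discrete distribution is an arbitrary illustration, not from the text) evaluates g_X(iω) at several values of ω:

```python
import cmath

# Numerical sketch: since |e^{iwx}| = 1 for every real x, the characteristic
# function g_X(iw) = E[e^{iwX}] always exists and has magnitude at most one.
def char_fn(omega, values, probs):
    """Characteristic function g_X(iw) of a discrete rv."""
    return sum(p * cmath.exp(1j * omega * x) for x, p in zip(values, probs))

# X uniform on {-1, 0, 1}.
values, probs = [-1, 0, 1], [1/3, 1/3, 1/3]
for omega in (0.0, 0.5, 1.0, 2.5, 10.0):
    assert abs(char_fn(omega, values, probs)) <= 1 + 1e-12
print(abs(char_fn(0.0, values, probs)))  # g_X(0) = E[1] = 1
```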
The Z-transform is the result of replacing e^r with z in g_X(r). This is useful pri...
This note was uploaded on 09/27/2010 for the course EE 229 taught by Professor R. Srikant during the Spring '09 term at University of Illinois, Urbana-Champaign.