EE420/500 Class Notes 06/12/09 John Stensby

Chapter 2 - Random Variables
In this and the chapters that follow, we denote the real line as R = (−∞ < x < ∞), and the extended real line is denoted as R+ ≡ R ∪ {±∞}. The extended real line is the real line with ±∞ thrown in.
Put simply (and incompletely), a random variable is a function that maps the sample
space S into the extended real line.
Example 2-1: In the die experiment, we assign to the six outcomes f_i the numbers X(f_i) = 10i. Thus, we have X(f1) = 10, X(f2) = 20, X(f3) = 30, X(f4) = 40, X(f5) = 50, X(f6) = 60.
For an arbitrary value x0, we must be able to answer questions like "what is the probability that random variable X is less than or equal to x0?" Hence, the set {ρ ∈ S : X(ρ) ≤ x0} must be an event (i.e., the set must belong to the σ-algebra F) for every x0. (The algebraic, non-random variable x0 is sometimes said to be a realization of the random variable X.) This leads to the more formal definition.
Definition: Given a probability space (S, F, P), a random variable X(ρ) is a function

$$X : S \to R^{+}. \tag{2-1}$$

That is, random variable X is a function that maps sample space S into the extended real line R+. In addition, random variable X must satisfy the two criteria discussed below.
1) Recall the Borel σ-algebra B of subsets of R that was discussed in Chapter 1 (see Example 1-9). For each B ∈ B, we must have

$$X^{-1}(B) \equiv \{\omega \in S : X(\omega) \in B\} \in \mathcal{F}, \quad B \in \mathcal{B}. \tag{2-2}$$

A function that satisfies this criterion is said to be measurable. A random variable X must be a measurable function.
2) P[ω ∈ S : X(ω) = ±∞] = 0. Random variable X is allowed to take on the values ±∞; however, it must take on the values ±∞ with a probability of zero.

Updates at http://www.ece.uah.edu/courses/ee420500/
These two conditions hold in most elementary applications. Usually, they are treated as mere technicalities that impose no real limitations on applications of random variables, and they are seldom given much thought.
However, good reasons exist for requiring that random variable X satisfy conditions 1) and 2) listed above. In our experiment, recall that sample space S describes the set of elementary outcomes. It may happen that we cannot directly observe the elementary outcomes ω ∈ S. Instead, we may be forced to use a measuring instrument (i.e., a random variable) that provides us with measurements X(ω), ω ∈ S. Now, for each α ∈ R, we need to be able to compute the probability P[−∞ < X(ω) ≤ α], because [−∞ < X(ω) ≤ α] is a meaningful, observable event in the context of our experiment/measurements. For probability P[−∞ < X(ω) ≤ α] to exist, [−∞ < X(ω) ≤ α] must be an event; that is, we must have

$$\{\omega \in S : -\infty < X(\omega) \le \alpha\} \in \mathcal{F} \tag{2-3}$$

for each α ∈ R. It is possible to show that conditions (2-2) and (2-3) are equivalent. So, while (2-2) (or the equivalent (2-3)) may be a mere technicality, it is an important technicality.
Sigma-Algebra Generated by Random Variable X

Suppose that we are given a probability space (S, F, P) and a random variable X as described above. Random variable X induces a σ-algebra σ(X) on S. σ(X) consists of all sets of the form {ω ∈ S : X(ω) ∈ B}, B ∈ B, where B denotes the σ-algebra of Borel subsets of R that was discussed in Chapter 1 (see Example 1-9). Note that σ(X) ⊂ F; we say that σ(X) is the sub-σ-algebra of F that is generated by random variable X.
Probability Space Induced by Random Variable X

Suppose that we are given a probability space (S, F, P) and a random variable X as described above. Let B be the Borel σ-algebra introduced in Example 1-9. By (2-2), for each B ∈ B, we have {ω ∈ S : X(ω) ∈ B} ∈ F, so P[{ω ∈ S : X(ω) ∈ B}] is well defined. This allows us to use random variable X to define (R+, B, P′), a probability space induced by X. Probability measure P′ is defined as follows: for each B ∈ B, we define P′(B) ≡ P[{ω ∈ S : X(ω) ∈ B}]. We say that P induces probability measure P′.
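The induced measure P′ can be made concrete for the die of Example 2-1. The sketch below assumes a fair die (probability 1/6 per face, an assumption not stated in Example 2-1) and represents a Borel set B by a Python predicate on the real line; this only stands in for B on the sets actually tested. The helper name `induced_prob` is hypothetical.

```python
from fractions import Fraction

# Sample space S = {f1, ..., f6} of a fair die (assumption: each face has probability 1/6).
P = {f"f{i}": Fraction(1, 6) for i in range(1, 7)}

# Random variable of Example 2-1: X(f_i) = 10 i.
X = {f"f{i}": 10 * i for i in range(1, 7)}

def induced_prob(in_B):
    """P'(B) = P[{w in S : X(w) in B}]; the set B is given as a predicate in_B(x)."""
    return sum((P[w] for w in P if in_B(X[w])), Fraction(0))

print(induced_prob(lambda x: x <= 35))       # B = (-inf, 35]: {f1, f2, f3} -> 1/2
print(induced_prob(lambda x: 15 < x <= 45))  # B = (15, 45]:   {f2, f3, f4} -> 1/2
print(induced_prob(lambda x: True))          # B = R:          all of S     -> 1
```

Each probability P′(B) is computed by pulling B back to the event X⁻¹(B) in S and measuring that event with P, which is exactly the definition above.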
Distribution and Density Functions

The distribution function of the random variable X(ρ) is the function

$$F(x) = P[X(\rho) \le x] = P[\rho \in S : X(\rho) \le x], \quad -\infty < x < \infty. \tag{2-4}$$
Example 2-2: Consider the coin tossing experiment with P[heads] = p and P[tails] = q ≡ 1 − p. Define the random variable

X(head) = 1
X(tail) = 0.

If x ≥ 1, then both X(head) = 1 ≤ x and X(tail) = 0 ≤ x, so that

F(x) = 1 for x ≥ 1.

If 0 ≤ x < 1, then X(head) = 1 > x and X(tail) = 0 ≤ x, so that

F(x) = P[X ≤ x] = q for 0 ≤ x < 1.

Finally, if x < 0, then both X(head) = 1 > x and X(tail) = 0 > x, so that

F(x) = P[X ≤ x] = 0 for x < 0.

[Figure 2-1: Distribution function F(x) for the X(head) = 1, X(tail) = 0 random variable.]
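The three cases of Example 2-2 translate directly into code. A minimal sketch, with p = 0.3 chosen only for illustration; the strict inequalities in the tests make F right continuous, so F(0) = q and F(1) = 1 at the jumps.

```python
def coin_cdf(x, p):
    """F(x) = P[X <= x] for X(head) = 1, X(tail) = 0 and P[heads] = p."""
    q = 1.0 - p
    if x < 0:
        return 0.0   # X(head) = 1 > x and X(tail) = 0 > x
    if x < 1:
        return q     # only X(tail) = 0 <= x
    return 1.0       # both X(head) = 1 <= x and X(tail) = 0 <= x

p = 0.3  # illustrative value
for x in (-0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"F({x}) = {coin_cdf(x, p)}")
```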
See Figure 2-1 for a graph of F(x).

Properties of Distribution Functions
First, some standard notation:

$$F(x^{+}) \equiv \lim_{\rho \to x^{+}} F(\rho) \quad \text{and} \quad F(x^{-}) \equiv \lim_{\rho \to x^{-}} F(\rho).$$

Some properties of distribution functions are listed below.

Claim #1: F(+∞) = 1 and F(−∞) = 0. (2-5)

Proof: $F(+\infty) = \lim_{x \to \infty} P[X \le x] = P[S] = 1$ and $F(-\infty) = \lim_{x \to -\infty} P[X \le x] = P[\varnothing] = 0$.

Claim #2: The distribution function is a nondecreasing function of x: if x1 < x2, then F(x1) ≤ F(x2).

Proof: x1 < x2 implies that {X(ρ) ≤ x1} ⊂ {X(ρ) ≤ x2}. But this means that P[{X(ρ) ≤ x1}] ≤ P[{X(ρ) ≤ x2}] and F(x1) ≤ F(x2).

Claim #3: P[X > x] = 1 − F(x). (2-6)

Proof: {X(ρ) ≤ x} and {X(ρ) > x} are mutually exclusive. Also, {X(ρ) ≤ x} ∪ {X(ρ) > x} = S. Hence, P[{X ≤ x}] + P[{X > x}] = P[S] = 1.
Claim #4: Function F(x) may have jump discontinuities. It can be shown that a jump is the only type of discontinuity that is possible for a distribution F(x), and F(x) may have at most a countable number of jumps. F(x) must be right continuous; that is, we must have F(x+) = F(x). At a "jump", take F to be the larger value; see Figure 2-2.

Claim #5: P[x1 < X ≤ x2] = F(x2) − F(x1). (2-7)

Proof: {X(ρ) ≤ x1} and {x1 < X(ρ) ≤ x2} are mutually exclusive. Also, {X(ρ) ≤ x2} = {X(ρ) ≤ x1} ∪ {x1 < X(ρ) ≤ x2}. Hence, P[X(ρ) ≤ x2] = P[X(ρ) ≤ x1] + P[x1 < X(ρ) ≤ x2], and P[x1 < X(ρ) ≤ x2] = F(x2) − F(x1).

Claim #6: P[X = x] = F(x) − F(x−). (2-8)

Proof: P[x − ε < X ≤ x] = F(x) − F(x − ε). Now, take the limit as ε → 0+ to obtain the desired result.

Claim #7: P[x1 ≤ X ≤ x2] = F(x2) − F(x1−). (2-9)

Proof: {x1 ≤ X ≤ x2} = {x1 < X ≤ x2} ∪ {X = x1}, a union of mutually exclusive events, so that

$$P[x_1 \le X \le x_2] = \big(F(x_2) - F(x_1)\big) + \big(F(x_1) - F(x_1^{-})\big) = F(x_2) - F(x_1^{-}).$$

[Figure 2-2: Distributions are right continuous; at a jump located at x0, F takes the upper value F(x0).]
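Claims #5 through #7 can be checked on the die random variable of Example 2-1. A minimal sketch, assuming a fair die; for a discrete random variable the left limit F(x−) equals P[X < x], which is how the hypothetical helper `F_left` computes it.

```python
from fractions import Fraction

# Values of X(f_i) = 10 i; a fair die is assumed, so each value has probability 1/6.
VALUES = [10 * i for i in range(1, 7)]

def F(x):
    """Distribution function F(x) = P[X <= x]."""
    return Fraction(sum(1 for v in VALUES if v <= x), 6)

def F_left(x):
    """Left limit F(x-) = P[X < x] (valid for this discrete random variable)."""
    return Fraction(sum(1 for v in VALUES if v < x), 6)

x1, x2 = 20, 40
print(F(x2) - F(x1))       # Claim #5, (2-7): P[20 < X <= 40] = P[X in {30, 40}] = 1/3
print(F(x2) - F_left(x2))  # Claim #6, (2-8): P[X = 40] = 1/6
print(F(x2) - F_left(x1))  # Claim #7, (2-9): P[20 <= X <= 40] = 1/2
```

The difference between (2-7) and (2-9) is exactly the jump F(x1) − F(x1−) = P[X = x1] at the left endpoint.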
[Figure 2-3: Distribution function Fx(x) for a discrete random variable.]
Continuous Random Variables

Random variable X is of continuous type if Fx(x) is continuous. In this case, P[X = x] = 0; the probability is zero that X takes on any given value x.

Discrete Random Variables

Random variable X is of discrete type if Fx(x) is piecewise constant; the distribution should look like a staircase. Denote by xi the points of discontinuity of Fx(x). Then Fx(xi) − Fx(xi−) = P[X = xi] = pi. See Figure 2-3.

Mixed Random Variables

Random variable X is said to be of mixed type if Fx(x) is discontinuous but not a staircase.

Density Function
The derivative

$$f_x(x) \equiv \frac{dF_x(x)}{dx} \tag{2-10}$$

is called the density function of the random variable X. Suppose Fx has a jump discontinuity at a point x0. Then fx(x) contains the term

$$\big[F_x(x_0^{+}) - F_x(x_0^{-})\big]\,\delta(x - x_0) = \big[F_x(x_0) - F_x(x_0^{-})\big]\,\delta(x - x_0), \tag{2-11}$$

where the equality follows from right continuity. See Figure 2-4.

[Figure 2-4: "Jumps" in the distribution function cause delta functions in the density function. Left panel (Distribution Function): Fx(x) has a jump discontinuity of size k at x = x0. Right panel (Density Function): fx(x) contains the delta function kδ(x − x0).]
Suppose that X is of discrete type, taking values xi, i ∈ I, where I is an index set. The density can be written as

$$f_x(x) = \sum_{i \in I} P[X = x_i]\,\delta(x - x_i).$$

Figures 2-5 and 2-6 illustrate the distribution and density, respectively, of a discrete random variable.

Properties of fx
The monotonicity of Fx implies that fx(x) ≥ 0 for all x. Furthermore, we have

$$f_x(x) = \frac{dF_x(x)}{dx} \iff F_x(x) = \int_{-\infty}^{x} f_x(\rho)\,d\rho. \tag{2-12}$$

[Figure 2-5: Distribution function for a discrete random variable; Fx(x) is a staircase with jumps of size k1, k2, k3, k4 at x1, x2, x3, x4, rising to 1.]

[Figure 2-6: Density function for a discrete random variable; fx(x) consists of delta functions with weights k1, k2, k3, k4 at x1, x2, x3, x4.]
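Consequently,

$$P[x_1 < X \le x_2] = F_x(x_2) - F_x(x_1) = \int_{x_1}^{x_2} f_x(\rho)\,d\rho, \tag{2-13}$$

where, if fx contains delta functions, the integral is taken over (x1, x2], so that a delta at x1 is excluded and a delta at x2 is included. If X is of continuous type, then Fx(x1−) = Fx(x1), and

$$P[x_1 \le X \le x_2] = F_x(x_2) - F_x(x_1) = \int_{x_1}^{x_2} f_x(\rho)\,d\rho.$$

For continuous random variables, P[x < X ≤ x + Δx] ≈ fx(x)Δx for small Δx.

Normal/Gaussian Random Variables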
Let η, −∞ < η < ∞, and σ, σ > 0, be constants. Then

$$f_x(x) = \frac{1}{\sigma}\, g\!\left(\frac{x-\eta}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x-\eta}{\sigma}\right)^{2}\right], \qquad g(x) \equiv \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}x^{2}\right], \tag{2-14}$$

is a Gaussian density function with parameters η and σ. These parameters have special meanings that will be discussed below. The notation N(η;σ) is used to indicate a Gaussian random variable with parameters η and σ. Figure 2-7 illustrates a Gaussian density function.

[Figure 2-7: Density function for a Gaussian random variable.]
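In code, G need not come from a table. The sketch below (standard library only) evaluates the density (2-14) and, anticipating the distribution given next in (2-15) and (2-16), writes G through the error function using the identity G(x) = [1 + erf(x/√2)]/2; Python's `math.erf` supplies erf. A crude midpoint-rule integral of the density is compared against G((x − η)/σ). The values η = 1, σ = 2, x = 2.5 are arbitrary.

```python
import math

def gaussian_pdf(x, eta, sigma):
    """N(eta; sigma) density, (2-14): (1/sigma) g((x - eta)/sigma)."""
    z = (x - eta) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def G(x):
    """Zero-mean, unit-variance Gaussian distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_cdf(x, eta, sigma):
    """F(x) = G((x - eta)/sigma), the change of variable leading to (2-16)."""
    return G((x - eta) / sigma)

# Midpoint-rule integral of the density from far in the left tail up to x.
eta, sigma, x = 1.0, 2.0, 2.5
lo, n = eta - 10.0 * sigma, 20000    # eta - 10 sigma stands in for -infinity
h = (x - lo) / n
approx = h * sum(gaussian_pdf(lo + (i + 0.5) * h, eta, sigma) for i in range(n))
print(approx, gaussian_cdf(x, eta, sigma))
```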
Random variable X is said to be Gaussian if its distribution function is given by

$$F(x) = P[X \le x] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} \exp\!\left[-\frac{(u-\eta)^{2}}{2\sigma^{2}}\right] du \tag{2-15}$$

for given η, −∞ < η < ∞, and σ, σ > 0. Numerical values for F(x) can be determined with the aid of a table. To accomplish this, make the change of variable y = (u − η)/σ in (2-15) and obtain

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\eta)/\sigma} \exp\!\left[-\frac{y^{2}}{2}\right] dy = G\!\left(\frac{x-\eta}{\sigma}\right). \tag{2-16}$$

Function G(x) is tabulated in many reference books, and it is built into many popular computer math packages (e.g., Matlab, Mathcad).

Uniform
Random variable X is uniform between x1 and x2 if its density is constant, equal to 1/(x2 − x1), on the interval [x1, x2] and zero elsewhere. Figure 2-8 illustrates the distribution and density of a uniform random variable.

[Figure 2-8: Density and distribution functions for a uniform random variable; fx(x) has height 1/(x2 − x1) on [x1, x2], and Fx(x) rises linearly from 0 at x1 to 1 at x2.]

Binomial
Random variable X has a binomial distribution of order n with parameter p if it takes the integer values 0, 1, ..., n with probabilities

$$P[X = k] = \binom{n}{k} p^{k} q^{n-k}, \quad 0 \le k \le n. \tag{2-17}$$

Both n and p are known parameters, where p + q = 1, and

$$\binom{n}{k} \equiv \frac{n!}{k!\,(n-k)!}. \tag{2-18}$$

We say that binomial random variable X is B(n,p).
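The probabilities (2-17) are easy to tabulate with Python's `math.comb` (Python 3.8 or later). A minimal sketch; B(10, 0.3) is an arbitrary illustrative choice, and the hypothetical `binomial_cdf` builds the staircase distribution of (2-20) below by summing the probabilities for k ≤ x.

```python
import math

def binomial_pmf(k, n, p):
    """P[X = k] = C(n, k) p^k q^(n - k), q = 1 - p, as in (2-17)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def binomial_cdf(x, n, p):
    """Staircase distribution: sum of P[X = k] over integers 0 <= k <= x."""
    if x < 0:
        return 0.0
    m = min(math.floor(x), n)        # the integer m_x of (2-20), capped at n
    return sum(binomial_pmf(k, n, p) for k in range(m + 1))

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))                      # 1 up to floating-point rounding
print(binomial_cdf(2.7, n, p))       # P[X = 0] + P[X = 1] + P[X = 2]
```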
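The binomial density function is

$$f_x(x) = \sum_{k=0}^{n} \binom{n}{k} p^{k} q^{n-k}\,\delta(x - k), \tag{2-19}$$

and the binomial distribution is

$$F_x(x) = \begin{cases} 0, & x < 0, \\ \displaystyle\sum_{k=0}^{m_x} \binom{n}{k} p^{k} q^{n-k}, & m_x \le x < m_x + 1, \ \ 0 \le m_x \le n-1, \\ 1, & x \ge n. \end{cases} \tag{2-20}$$

Note that m_x depends on x; it is the integer part of x.

Poisson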
Random variable X is Poisson with parameter a > 0 if it takes on the integer values 0, 1, ... with

$$P[X = k] = e^{-a} \frac{a^{k}}{k!}, \quad k = 0, 1, 2, 3, \ldots \tag{2-21}$$

The density and distribution of a Poisson random variable are given by

$$f_x(x) = e^{-a} \sum_{k=0}^{\infty} \frac{a^{k}}{k!}\,\delta(x - k), \tag{2-22}$$

$$F_x(x) = e^{-a} \sum_{k=0}^{m_x} \frac{a^{k}}{k!}, \quad m_x \le x < m_x + 1, \ \ m_x = 0, 1, \ldots \tag{2-23}$$

Rayleigh
The random variable X is Rayleigh distributed with real-valued parameter α, α > 0, if it is described by the density

$$f_x(x) = \begin{cases} \dfrac{x}{\alpha^{2}} \exp\!\left[-\dfrac{1}{2}\left(\dfrac{x}{\alpha}\right)^{2}\right], & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2-24}$$

See Figure 2-9 for a depiction of a Rayleigh density function.
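The closed-form distribution (2-26) derived below can be spot-checked by integrating the density (2-24) numerically. A minimal sketch using a hand-written composite Simpson's rule (standard library only); α = 1.5 and the test points are arbitrary choices.

```python
import math

def rayleigh_pdf(x, alpha):
    """Rayleigh density (2-24)."""
    if x < 0:
        return 0.0
    return (x / alpha**2) * math.exp(-x * x / (2.0 * alpha**2))

def rayleigh_cdf(x, alpha):
    """Closed-form distribution (2-26)."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * alpha**2))

def simpson(f, a, b, n=10000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2.0 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0

alpha = 1.5
for x in (0.5, 1.5, 3.0):
    numeric = simpson(lambda u: rayleigh_pdf(u, alpha), 0.0, x)   # integral (2-25)
    print(f"x = {x}: numeric {numeric:.10f}, closed form {rayleigh_cdf(x, alpha):.10f}")
```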
The distribution function for a Rayleigh random variable is

$$F_x(x) = \begin{cases} \displaystyle\int_{0}^{x} \frac{u}{\alpha^{2}}\, e^{-u^{2}/2\alpha^{2}}\, du, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2-25}$$

[Figure 2-9: Rayleigh density function fx(x) = (x/α²) exp[−½(x/α)²] U(x), plotted against x/α from 0 to 3.]

[Figure 2-10: Exponential density function fx(x) = λ exp[−λx] U(x), plotted against λx from 0 to 3.]

To evaluate (2-25), use the change of variable y = u²/2α², dy = (u/α²)du, to obtain

$$F_x(x) = \begin{cases} \displaystyle\int_{0}^{x^{2}/2\alpha^{2}} e^{-y}\, dy = 1 - e^{-x^{2}/2\alpha^{2}}, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{2-26}$$

as the distribution function for a Rayleigh random variable.

Exponential
The random variable X is exponentially distributed with real-valued parameter λ, λ > 0, if it is described by the density

$$f_x(x) = \begin{cases} \lambda \exp[-\lambda x], & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2-27}$$

See Figure 2-10 for a depiction of an exponential density function.
The distribution function for an exponential random variable is

$$F_x(x) = \begin{cases} \displaystyle\int_{0}^{x} \lambda e^{-\lambda y}\, dy = 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2-28}$$

Conditional Distribution
Let M denote an event for which P[M] ≠ 0. Assuming the occurrence of event M, the conditional distribution F(x|M) of random variable X is defined as

$$F(x \mid M) = P[X \le x \mid M] \equiv \frac{P[X \le x, M]}{P[M]}. \tag{2-29}$$

Note that F(∞|M) = 1 and F(−∞|M) = 0. Furthermore, the conditional distribution has all of the properties of an "ordinary" distribution function. For example,

$$P[x_1 < X \le x_2 \mid M] = F(x_2 \mid M) - F(x_1 \mid M) = \frac{P[x_1 < X \le x_2, M]}{P[M]}. \tag{2-30}$$

Conditional Density
The conditional density f(x|M) is defined as

$$f(x \mid M) = \frac{d}{dx} F(x \mid M) = \lim_{\Delta x \to 0} \frac{P[x < X \le x + \Delta x \mid M]}{\Delta x}. \tag{2-31}$$

The conditional density has all of the properties of an "ordinary" density function.

Example 2-3: Determine the conditional distribution F(x|M) of random variable X(f_i) = 10i of the fair-die experiment, where M = {f2, f4, f6} is the event "even has occurred". First, note that X must take on values in the set {10, 20, 30, 40, 50, 60}. Hence, if x ≥ 60, then [X ≤ x] is the certain event and [X ≤ x, M] = M. Because of this,

$$F(x \mid M) = \frac{P[X \le x, M]}{P[M]} = \frac{P[M]}{P[M]} = 1, \quad x \ge 60.$$

If 40 ≤ x < 60, then [X ≤ x, M] = {f2, f4}, and

$$F(x \mid M) = \frac{P[X \le x, M]}{P[M]} = \frac{P[\{f_2, f_4\}]}{P[\{f_2, f_4, f_6\}]} = \frac{2/6}{3/6} = \frac{2}{3}, \quad 40 \le x < 60.$$

If 20 ≤ x < 40, then [X ≤ x, M] = {f2}, and

$$F(x \mid M) = \frac{P[X \le x, M]}{P[M]} = \frac{P[\{f_2\}]}{P[\{f_2, f_4, f_6\}]} = \frac{1/6}{3/6} = \frac{1}{3}, \quad 20 \le x < 40.$$

Finally, if x < 20, then [X ≤ x, M] = ∅, and

$$F(x \mid M) = \frac{P[X \le x, M]}{P[M]} = \frac{P[\varnothing]}{P[\{f_2, f_4, f_6\}]} = \frac{0}{3/6} = 0, \quad x < 20.$$

Conditional Distribution When Event M is Defined in Terms of X
If M is an event that can be expressed in terms of the random variable X, then F(x|M) can be determined from the "ordinary" distribution F(x). Below, we give several examples.
As a first example, consider M = [X ≤ a], and find both F(x|M) = F(x|X ≤ a) = P[X ≤ x | X ≤ a] and f(x|M). Note that

$$F(x \mid X \le a) = P[X \le x \mid X \le a] = \frac{P[X \le x, X \le a]}{P[X \le a]}.$$

Hence, if x ≥ a, we have [X ≤ x, X ≤ a] = [X ≤ a] and

$$F(x \mid X \le a) = \frac{P[X \le a]}{P[X \le a]} = 1, \quad x \ge a.$$

If x < a, then [X ≤ x, X ≤ a] = [X ≤ x] and

$$F(x \mid X \le a) = \frac{P[X \le x]}{P[X \le a]} = \frac{F_x(x)}{F_x(a)}, \quad x < a.$$

At x = a, F(x|X ≤ a) jumps by 1 − Fx(a−)/Fx(a) if Fx(a−) ≠ Fx(a). The conditional density is

$$f(x \mid X \le a) = \frac{d}{dx} F(x \mid X \le a) = \begin{cases} \dfrac{f_x(x)}{F_x(a)}, & x < a, \\ 0, & x \ge a, \end{cases} \; + \left[1 - \frac{F_x(a^{-})}{F_x(a)}\right] \delta(x - a).$$

As a second example, consider M = [b < X ≤ a], so that

$$F(x \mid b < X \le a) = \frac{P[X \le x, b < X \le a]}{P[b < X \le a]}.$$

Since

$$[X \le x, b < X \le a] = \begin{cases} [b < X \le a], & a \le x, \\ [b < X \le x], & b < x < a, \\ \varnothing, & x \le b, \end{cases}$$

we have

$$F(x \mid b < X \le a) = \begin{cases} 1, & a \le x, \\ \dfrac{F_x(x) - F_x(b)}{F_x(a) - F_x(b)}, & b < x < a, \\ 0, & x \le b. \end{cases}$$

F(x|b < X ≤ a) is continuous at x = b, because Fx is right continuous, so Fx(x) − Fx(b) → 0 as x → b+. At x = a, F(x|b < X ≤ a) jumps by 1 − {Fx(a−) − Fx(b)}/{Fx(a) − Fx(b)}, a value that is zero if Fx(a−) = Fx(a). The corresponding conditional density is

$$f(x \mid b < X \le a) = \frac{d}{dx} F(x \mid b < X \le a) = \begin{cases} \dfrac{f_x(x)}{F_x(a) - F_x(b)}, & b < x < a, \\ 0, & \text{otherwise}, \end{cases} \; + \left[1 - \frac{F_x(a^{-}) - F_x(b)}{F_x(a) - F_x(b)}\right] \delta(x - a).$$
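The case formula for F(x|b < X ≤ a) and the size of its jump at x = a can be checked on the die random variable of Example 2-1. A minimal sketch, assuming a fair die and the illustrative choice b = 15, a = 40, so that M = [15 < X ≤ 40] = {f2, f3, f4}; the jump at a should equal P[X = 40 | M] = (1/6)/(3/6) = 1/3. The helper names are hypothetical.

```python
from fractions import Fraction

VALUES = [10 * i for i in range(1, 7)]   # X(f_i) = 10 i, fair die assumed

def F(x):
    """Ordinary distribution function F(x) = P[X <= x]."""
    return Fraction(sum(1 for v in VALUES if v <= x), 6)

def F_left(x):
    """Left limit F(x-) = P[X < x]."""
    return Fraction(sum(1 for v in VALUES if v < x), 6)

def F_cond(x, b, a):
    """F(x | b < X <= a), computed from F alone, case by case."""
    if x <= b:
        return Fraction(0)
    if x >= a:
        return Fraction(1)
    return (F(x) - F(b)) / (F(a) - F(b))

b, a = 15, 40
jump = 1 - (F_left(a) - F(b)) / (F(a) - F(b))
print(jump)                                   # 1/3
print(Fraction(1, 6) / (F(a) - F(b)))         # P[X = 40] / P[M] = 1/3
print([str(F_cond(x, b, a)) for x in (10, 20, 30, 39, 40, 60)])
# ['0', '1/3', '2/3', '2/3', '1', '1']
```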
Example 2-4: Find f(x | |X − η| ≤ κσ), where X is N(η,σ). First, note that

$$|X - \eta| \le \kappa\sigma \iff \eta - \kappa\sigma \le X \le \eta + \kappa\sigma,$$

so that

$$P[|X - \eta| \le \kappa\sigma] = P[\eta - \kappa\sigma \le X \le \eta + \kappa\sigma] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\eta-\kappa\sigma}^{\eta+\kappa\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x-\eta}{\sigma}\right)^{2}\right] dx.$$

By the change of variable u = (x − η)/σ, this last result becomes

$$P[|X - \eta| \le \kappa\sigma] = \frac{1}{\sqrt{2\pi}} \int_{-\kappa}^{\kappa} \exp\!\left[-\frac{1}{2}u^{2}\right] du = G(\kappa) - G(-\kappa) = 2G(\kappa) - 1.$$

Hence, by the previous example (X is continuous, so there is no delta term at the endpoint), we have

$$f(x \mid |X - \eta| \le \kappa\sigma) = \begin{cases} \dfrac{1}{2G(\kappa) - 1} \cdot \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\dfrac{1}{2}\left(\dfrac{x-\eta}{\sigma}\right)^{2}\right], & \eta - \kappa\sigma \le x \le \eta + \kappa\sigma, \\ 0, & \text{otherwise}. \end{cases}$$

Total Probability - Continuous Form

Define B = [X ≤ x], and let A1, A2, ..., An be a partition of sample space S. That is,

$$\bigcup_{i=1}^{n} A_i = S \quad \text{and} \quad A_i \cap A_j = \varnothing \ \text{for}\ i \ne j. \tag{2-32}$$

From the discrete form of the Theorem of Total Probability discussed in Chapter 1, we have P[B] = P[B|A1]P[A1] + P[B|A2]P[A2] + ... + P[B|An]P[An]. Now, with B = [X ≤ x], this becomes

$$P[X \le x] = P[X \le x \mid A_1]P[A_1] + P[X \le x \mid A_2]P[A_2] + \cdots + P[X \le x \mid A_n]P[A_n]. \tag{2-33}$$

Hence,

$$\begin{aligned} F_x(x) &= F(x \mid A_1)P[A_1] + F(x \mid A_2)P[A_2] + \cdots + F(x \mid A_n)P[A_n], \\ f_x(x) &= f(x \mid A_1)P[A_1] + f(x \mid A_2)P[A_2] + \cdots + f(x \mid A_n)P[A_n]. \end{aligned} \tag{2-34}$$

Several useful formulas can be derived from this result. For example, let A be any event, and let X be any random variable. Then we can write a version of Bayes rule as

$$P[A \mid X \le x] = \frac{P[A, X \le x]}{P[X \le x]} = \frac{P[X \le x \mid A]\,P[A]}{P[X \le x]} = \frac{F(x \mid A)\,P[A]}{F(x)} = \frac{F(x \mid A)\,P[A]}{F(x \mid A_1)P[A_1] + F(x \mid A_2)P[A_2] + \cdots + F(x \mid A_n)P[A_n]}. \tag{2-35}$$

As a second example, we derive a formula for P[A | x1 < X ≤ x2]. The conditional distribution F(x|A) has the same properties as an "ordinary" distribution; that is, we can write P[x1 < X ≤ x2 | A] = F(x2|A) − F(x1|A), so that

$$P[A \mid x_1 < X \le x_2] = \frac{P[x_1 < X \le x_2 \mid A]\,P[A]}{P[x_1 < X \le x_2]} = \frac{F(x_2 \mid A) - F(x_1 \mid A)}{F_x(x_2) - F_x(x_1)}\,P[A]. \tag{2-36}$$

In general, we cannot write

$$P[A \mid X = x] = \frac{P[A, X = x]}{P[X = x]} \tag{2-37}$$

since this may result in an indeterminate 0/0 form. Instead, we must write

$$P[A \mid X = x] = \lim_{\Delta x \to 0^{+}} P[A \mid x < X \le x + \Delta x] = \lim_{\Delta x \to 0^{+}} \frac{F(x + \Delta x \mid A) - F(x \mid A)}{F_x(x + \Delta x) - F_x(x)}\,P[A] = \lim_{\Delta x \to 0^{+}} \frac{[F(x + \Delta x \mid A) - F(x \mid A)]/\Delta x}{[F_x(x + \Delta x) - F_x(x)]/\Delta x}\,P[A], \tag{2-38}$$

which yields

$$P[A \mid X = x] = \frac{f(x \mid A)}{f_x(x)}\,P[A]. \tag{2-39}$$

Now, multiply both sides of this last result by fx and integrate to obtain

$$\int_{-\infty}^{\infty} P[A \mid X = x]\,f_x(x)\,dx = P[A] \int_{-\infty}^{\infty} f(x \mid A)\,dx. \tag{2-40}$$

But the area under f(x|A) is unity. Hence, we obtain

$$P[A] = \int_{-\infty}^{\infty} P[A \mid X = x]\,f_x(x)\,dx, \tag{2-41}$$

the continuous version of the Total Probability Theorem. In Chapter 1, we gave a "finite-dimensional" version of this theorem. Compare (2-41) with the result of Theorem 1-1; conceptually, they are similar. P[A|X = x] is the probability of A given that X = x. Equation (2-41) tells us to average this conditional probability over all possible values of X to find P[A].

Example 2-5
Consider tossing a coin. The sample space is S = {h, t}, a "heads" and a "tails". However, assume that we do not know P[h] for the coin. So, we model P[h] as a random variable p̃. Our goal is to estimate P[h] by repeatedly tossing the coin.

As a random variable, p̃ must map some sample space, call it Sc, into [0, 1], the range of possible values for the probability of heads of any coin. To be more definite, let's take Sc as the set of all coins in a given large pot, and note that p̃ : Sc → [0, 1]. As assigned by p̃, each coin in Sc has a probability of heads in [0, 1].
Assume that we can guess (or we know) a density function f_p̃(p) that describes random variable p̃. Given numbers p1 and p2, the probability of selecting a coin with p̃ (i.e., the probability of heads) between p1 and p2 is

$$P[p_1 < \tilde{p} \le p_2] = \int_{p_1}^{p_2} f_{\tilde{p}}(p)\,dp. \tag{2-42}$$

It is reasonable to expect f_p̃(p) to peak at, or near, p = ½ and drop rapidly as p deviates from ½. After all, most coins are nearly balanced and fair. As we will show by the examples that follow, it is possible to use experimental data (i.e., statistics from coin toss trials) to update f_p̃(p). That is, we will condition the density that describes p̃ on experimental data obtained from tossing the selected coin. In this context, we will call f_p̃(p) the a priori density (before the coin tosses); the updated version will be called the a posteriori density (after the coin tosses).
Our combined, or joint, experiment consists of selecting a coin from Sc (the first experiment) and then tossing it (the second experiment). For the combined experiment, the relevant product space is

$$S_c \times S = [(\rho, h) : \rho \in S_c] \cup [(\rho, t) : \rho \in S_c]. \tag{2-43}$$

For the combined experiment, the event "we get a heads" or "heads occurs" is

$$H = [(\rho, h) : \rho \in S_c], \tag{2-44}$$

an event in the product, or combined, experiment. Regardless of how the probability function (measure) is defined on the product space, it is clear that the conditional probability of H, given that the selected coin has probability of heads p̃ = p, is simply

$$P[H \mid \tilde{p} = p] = P\big[[(\rho, h) : \rho \in S_c] \mid \tilde{p} = p\big] = p. \tag{2-45}$$

The Theorem of Total Probability can be used to express P[H] in terms of the known f_p̃(p).
Simply use (2-45) and the Theorem of Total Probability (2-41) to write

$$P[H] = \int_{0}^{1} P[H \mid \tilde{p} = p]\,f_{\tilde{p}}(p)\,dp = \int_{0}^{1} p\,f_{\tilde{p}}(p)\,dp. \tag{2-46}$$

On the right-hand side of (2-46), the integral is called the ensemble average, or expected value, of the random variable p̃.
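Equation (2-46) is a one-dimensional integral, so it is easy to evaluate numerically once a prior is chosen. The sketch below assumes, purely for illustration, the Gaussian-shaped prior on [0, 1] that appears later as (2-54), with σ = 0.3; because that prior is symmetric about ½, the result should be P[H] = ½. The helper names are hypothetical, and only the standard library is used.

```python
import math

def G(x):
    """Zero-mean, unit-variance Gaussian distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prior(p, sigma=0.3):
    """Assumed a priori density of p~: a Gaussian centered at 1/2, renormalized to
    unit area on [0, 1] and zero elsewhere."""
    if not 0.0 <= p <= 1.0:
        return 0.0
    g = math.exp(-((p - 0.5) ** 2) / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)
    return g / (1.0 - 2.0 * G(-1.0 / (2.0 * sigma)))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2.0 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0

area = simpson(prior, 0.0, 1.0)                     # unit area check
P_H = simpson(lambda p: p * prior(p), 0.0, 1.0)     # (2-46): integral of p f(p) over [0, 1]
print(area, P_H)
```

Swapping in an asymmetric prior (for example, one that favors coins biased toward tails) would move P[H] away from ½; only the `prior` function would need to change.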
Bayes Theorem - Continuous Form

From (2-39) we get

$$f(x \mid A) = \frac{P[A \mid X = x]}{P[A]}\,f_x(x). \tag{2-47}$$

Now, use (2-41) and (2-47) to write

$$f(x \mid A) = \frac{P[A \mid X = x]}{\int_{-\infty}^{\infty} P[A \mid X = \upsilon]\,f_x(\upsilon)\,d\upsilon}\,f_x(x), \tag{2-48}$$

a result known as the continuous form of Bayes Theorem.
Often, fx(x) is called the a priori density for random variable X, and f(x|A) is called the a posteriori density conditioned on the observed event A. In an application, we might "cook up" a density fx(x) that (crudely) describes a random quantity (i.e., variable) X of interest. To improve our characterization of X, we note the occurrence of a related event A and compute f(x|A).
The value of x that maximizes f(x|A) is called the maximum a posteriori (MAP) estimate of X. MAP estimation is used in statistical signal processing and many other problems where one must estimate a quantity from observations of related random quantities.
Example 2-6 (MAP estimate of the probability of heads in the previously discussed coin experiment): Find the MAP estimate of the probability of heads in the previously discussed coin selection and tossing experiment. First, recall that the probability of heads is modeled as a random variable p̃ with density f_p̃(p). We call f_p̃(p) the a priori ("before the coin toss experiment") density of p̃ (which we may have to guess). Suppose we toss the coin n times and get k heads. We want to use these experimental results with our a priori density f_p̃(p) to compute the conditional density

$$f(p \mid \underbrace{k \text{ heads, in a specific order, in } n \text{ tosses of a selected coin}}_{\text{Observed Event } A}). \tag{2-49}$$

This density is called the a posteriori ("after the coin toss experiment") density of random variable p̃ given the experimentally observed event

$$A = [k \text{ heads, in a specific order, in } n \text{ tosses of a selected coin}]. \tag{2-50}$$

The a posteriori density f(p|A) may give us a better idea than the a priori density f_p̃(p) of the probability of heads for the randomly selected coin that was tossed. Conceptually, think of f(p|A) as a density that results from using experimental data/observation A to "update" f_p̃(p). In fact, given that A occurred, the a posteriori probability that p̃ is between p1 and p2 is

$$\int_{p_1}^{p_2} f(p \mid A)\,dp. \tag{2-51}$$

Finally, the value of p that maximizes f(p|A) is the maximum a posteriori estimate (MAP estimate) of p̃ for the selected coin.
To find this a posteriori density, recall that p̃ is defined on the sample space Sc. The experiment of tossing the randomly selected coin n times is defined on the sample space (i.e., product space) Sc × S^n, where S = [h, t]. The elements of Sc × S^n have the form

$$(\rho, \underbrace{t\,h\,t \cdots h}_{n \text{ outcomes}}), \quad \text{where } \rho \in S_c \text{ and } \underbrace{t\,h\,t \cdots h}_{n \text{ outcomes}} \in S^{n}.$$

Now, given that p̃ = p, the conditional probability of event A is

$$P[\underbrace{k \text{ heads in specific order in } n \text{ tosses of specific coin}}_{\text{Observed Event } A} \mid \tilde{p} = p] = p^{k}(1-p)^{n-k}. \tag{2-52}$$

Substitute this into the continuous form of Bayes rule (2-48) to obtain

$$f(p \mid A) = f(p \mid \underbrace{k \text{ heads, in a specific order, in } n \text{ tosses of a selected coin}}_{\text{Observed Event } A}) = \frac{p^{k}(1-p)^{n-k}\,f_{\tilde{p}}(p)}{\int_{0}^{1} \upsilon^{k}(1-\upsilon)^{n-k}\,f_{\tilde{p}}(\upsilon)\,d\upsilon}, \tag{2-53}$$

a result known as the a posteriori density of p̃ given A. In (2-53), the quantity υ is a dummy variable of integration.
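Equation (2-53) can be evaluated on a grid, with the denominator computed by numerical integration, and the MAP estimate read off as the grid point where f(p|A) is largest. A minimal sketch, assuming the prior (2-54) of Example 2-7 with σ = 0.3 and the data used there (n = 10, k = 3 and n = 50, k = 15); the trapezoidal rule and the grid size are arbitrary choices, and the function names are hypothetical.

```python
import math

def G(x):
    """Zero-mean, unit-variance Gaussian distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prior(p, sigma=0.3):
    """A priori density of p~ on [0, 1], in the form of (2-54)."""
    g = math.exp(-((p - 0.5) ** 2) / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)
    return g / (1.0 - 2.0 * G(-1.0 / (2.0 * sigma)))

def posterior_on_grid(n, k, sigma=0.3, m=10000):
    """Values of the a posteriori density (2-53) at p = 0, 1/m, ..., 1."""
    h = 1.0 / m
    grid = [i * h for i in range(m + 1)]
    num = [p**k * (1.0 - p) ** (n - k) * prior(p, sigma) for p in grid]
    denom = h * (sum(num) - 0.5 * (num[0] + num[-1]))   # trapezoidal rule on [0, 1]
    return grid, [v / denom for v in num]

def map_estimate(n, k, sigma=0.3):
    """Grid point at which f(p | A) is largest."""
    grid, post = posterior_on_grid(n, k, sigma)
    return grid[max(range(len(post)), key=post.__getitem__)]

for n, k in ((10, 3), (50, 15)):
    print(f"n = {n}, k = {k}: MAP estimate ~ {map_estimate(n, k):.4f}")
```

Under these assumptions, both estimates should land slightly above k/n = 0.3, pulled toward ½ by the prior, with the n = 50 estimate closer to 0.3 than the n = 10 one.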
Suppose that the a priori density f_p̃(p) is smooth and slowly changing around p = k/n (indicating a lot of uncertainty in the value of p). Then, for large n, the numerator p^k(1 − p)^{n−k} f_p̃(p), and hence the a posteriori density f(p|A), has a sharp peak at p = k/n, indicating little uncertainty in the value of p. When f(p|A) is peaked at p = k/n, the MAP estimate of the probability of heads (for the selected coin) is the value p = k/n.

Example 2-7: For σ > 0, we use the a priori density

$$f_{\tilde{p}}(p) = \frac{\dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\dfrac{(p - 1/2)^{2}}{2\sigma^{2}}\right]}{1 - 2G\!\left(\dfrac{-1}{2\sigma}\right)}, \quad 0 \le p \le 1 \ \text{(and zero elsewhere)}, \tag{2-54}$$

where G(x) is a zero-mean, unit-variance Gaussian distribution function (verify that there is unit area under f_p̃(p)). Also, use numerical integration to evaluate the denominator of (2-53). For σ = 0.3, n = 10 and k = 3, the a posteriori density f(p|A) was computed, and the results are plotted in Figure 2-11. For σ = 0.3, n = 50 and k = 15, the calculation was performed a second time, and the results also appear in Figure 2-11. For both, the MAP estimate of p̃ is near 0.3, since this is where the plots of f(p|A) peak.

[Figure 2-11: Plot of a priori and a posteriori densities: f_p̃(p), f(p|A) for n = 10, k = 3, and f(p|A) for n = 50, k = 15, over 0 ≤ p ≤ 1.]

Expectation
Let X denote a random variable with density fx(x). The expected value of X is defined as

$$E[X] = \int_{-\infty}^{\infty} x\,f_x(x)\,dx. \tag{2-55}$$

Also, E[X] is known as the mean, or average value, of X. If fx is symmetrical about some value η (and the integral in (2-55) converges), then η is the expected value of X. For example, let X be N(η,σ), so that

$$f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\frac{(x-\eta)^{2}}{\sigma^{2}}\right]. \tag{2-56}$$

Now, (2-56) is symmetrical about η, so E[X] = η.

Suppose that random variable X is discrete and takes on the value xi with P[X = xi] = pi. Then the expected value of X, as defined by (2-55), reduces to

$$E[X] = \sum_{i} x_i\,p_i. \tag{2-57}$$

Example 2-8: Consider a B(n,p) random variable X (that is, X is binomial with parameters n and p). Using (2-57), we can write

$$E[X] = \sum_{k=0}^{n} k\,P[X = k] = \sum_{k=0}^{n} k \binom{n}{k} p^{k} q^{n-k}. \tag{2-58}$$

To evaluate this result, first consider the well-known, and extremely useful, binomial expansion

$$\sum_{k=0}^{n} \binom{n}{k} x^{k} = (1 + x)^{n}. \tag{2-59}$$

Differentiate (2-59) with respect to x; then, multiply the derivative by x to obtain

$$\sum_{k=0}^{n} k \binom{n}{k} x^{k} = x\,n(1 + x)^{n-1}. \tag{2-60}$$

Into (2-60), substitute x = p/q, and then multiply both sides by q^n. This results in

$$\sum_{k=0}^{n} k \binom{n}{k} p^{k} q^{n-k} = np\,q^{n-1}\left(1 + \frac{p}{q}\right)^{n-1} = np(p + q)^{n-1} = np. \tag{2-61}$$

From this and (2-58), we conclude that a B(n,p) random variable has

$$E[X] = np. \tag{2-62}$$

We now provide a second, completely different, evaluation of E[X] for a B(n,p) random variable. Define n new random variables

$$X_i = \begin{cases} 1, & \text{if the } i\text{th trial is a success}, \\ 0, & \text{otherwise}, \end{cases} \quad 1 \le i \le n. \tag{2-63}$$

Note that

$$E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p, \quad 1 \le i \le n. \tag{2-64}$$

B(n,p) random variable X is the number of successes out of n independent trials; it can be written as

$$X = X_1 + X_2 + \cdots + X_n. \tag{2-65}$$

With the use of (2-64), the expected value of X can be evaluated as

$$E[X] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = np, \tag{2-66}$$

a result equivalent to (2-62).

In what follows, we extend (2-55) to arbitrary functions of random variable X. Let g(x) be any function of x, and let X be any random variable with density fx(x). In Chapter 4, we will discuss transformations of the form Y = g(X); this defines the new random variable Y in terms of the old random variable X. We will argue that the expected value of Y can be computed as

$$E[Y] = E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_x(x)\,dx. \tag{2-67}$$

This brief "heads-up" note (on what is to come in Chapter 4) is used next to define certain statistical averages of functions of X.

Variance and Standard Deviation
The variance of random variable X, whose mean is η = E[X], is

$$\sigma^{2} = \mathrm{Var}[X] = E[(X - \eta)^{2}] = \int_{-\infty}^{\infty} (x - \eta)^{2} f_x(x)\,dx. \tag{2-68}$$

Almost always, variance is denoted by the symbol σ². The square root of the variance is called the standard deviation of the random variable, and it is denoted as σ. Variance is a measure of uncertainty (or dispersion about the mean): the smaller (alternatively, larger) σ² is, the more (alternatively, less) likely it is for the random variable to take on values near its mean.
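For a discrete random variable, the integral in (2-68) reduces to the sum Σ(xi − η)² pi, just as (2-55) reduced to (2-57). A minimal sketch computing η by (2-57) and σ² both from (2-68) and from the second-moment form (2-74) given below, for an arbitrarily chosen B(12, 0.25) random variable; the mean should agree with np from (2-62).

```python
import math

def binomial_pmf(k, n, p):
    """P[X = k] = C(n, k) p^k (1 - p)^(n - k), as in (2-17)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 12, 0.25
ks = range(n + 1)
w = [binomial_pmf(k, n, p) for k in ks]

eta = sum(k * wk for k, wk in zip(ks, w))                     # mean, (2-57)
var_def = sum((k - eta) ** 2 * wk for k, wk in zip(ks, w))   # variance, discrete form of (2-68)
m2 = sum(k * k * wk for k, wk in zip(ks, w))                 # second moment
var_mom = m2 - eta**2                                        # (2-74): m_2 - eta^2
print(eta, n * p)                                            # both 3, up to rounding
print(var_def, var_mom, math.sqrt(var_def))                  # variance and standard deviation
```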
Finally, note that (2-68) is an application of the basic result (2-67) with g(x) = (x − η)².

Example 2-9: Let X be N(η,σ). Then

$$\mathrm{Var}[X] = E[(X - \eta)^{2}] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (x - \eta)^{2} \exp\!\left[-\frac{(x - \eta)^{2}}{2\sigma^{2}}\right] dx. \tag{2-69}$$

Let y = (x − η)/σ, dx = σ dy, so that

$$\mathrm{Var}[X] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sigma^{2} y^{2} e^{-\frac{1}{2}y^{2}}\,dy = \sigma^{2}, \tag{2-70}$$

a result obtained by looking up the integral in a table of integrals.

Moments
The nth moment of random variable X is defined as

$$m_n = E[X^{n}] = \int_{-\infty}^{\infty} x^{n} f_x(x)\,dx. \tag{2-71}$$

The nth moment about the mean is defined as

$$\mu_n = E[(X - \eta)^{n}] = \int_{-\infty}^{\infty} (x - \eta)^{n} f_x(x)\,dx. \tag{2-72}$$

Note that (2-71) and (2-72) are basic applications of (2-67).

Variance in Terms of Second Moment and Mean
Note that the variance can be expressed as σ2 = Var[X] = E[(X − η)2 ] = E[X 2 − 2ηX + η2 ] = E[X 2 ] − E[2ηX] + E[η2 ] , (273) where the linearity of the operator E[·] has been used. Now, constants “come out front” of
expectations, and the expectation of a constant is the constant. Hence, Equation (273) leads to σ2 = E[X 2 ] − E[2ηX] + E[η2 ] = m 2 − 2ηE[X] + η2 = m 2 − η2 , (274) the second moment minus the square of the mean. In what follows, this formula will be used
extensively. Example 210: Let X be N(0,σ) and find E[Xn]. First, consider the case of n an odd integer Updates at http://www.ece.uah.edu/courses/ee420500/ 228 EE420/500 Class Notes 06/12/09 John Stensby ⎡ 1 x2 ⎤
∞ n
1
E[X ] =
∫ x exp ⎢ − 2 σ2 ⎥ dx = 0 ,
2πσ −∞
⎢
⎥
⎣
⎦
n (275) since an integral of an odd function over symmetrical limits is zero. Now, consider the case n an even integer. Start with the known tabulated integral
∞ 2
π
e −αx dx = α = α−1/ 2 π ,
∫−∞ α > 0. (276) Repeated differentiations with respect to α yields
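$$\begin{aligned}
\text{first } d/d\alpha:&\quad \int_{-\infty}^{\infty} -x^2 e^{-\alpha x^2}\,dx = -\frac{1}{2}\,\alpha^{-3/2}\sqrt{\pi}\\
\text{second } d/d\alpha:&\quad \int_{-\infty}^{\infty} x^4 e^{-\alpha x^2}\,dx = \frac{1}{2}\cdot\frac{3}{2}\,\alpha^{-5/2}\sqrt{\pi}\\
&\quad\vdots\\
k\text{th } d/d\alpha:&\quad \int_{-\infty}^{\infty} x^{2k} e^{-\alpha x^2}\,dx = \frac{1}{2}\cdot\frac{3}{2}\cdot\frac{5}{2}\cdots\frac{2k-1}{2}\,\alpha^{-(2k+1)/2}\sqrt{\pi}.
\end{aligned}$$

(After k differentiations both sides carry the factor $(-1)^k$, which cancels.) Let $n = 2k$ (remember, this is the case n even) and $\alpha = 1/2\sigma^2$ to obtain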
$$\int_{-\infty}^{\infty} x^n e^{-x^2/2\sigma^2}\,dx = \frac{1\cdot 3\cdot 5\cdots(n-1)}{2^{n/2}}\sqrt{\pi}\,\bigl(2\sigma^2\bigr)^{(n+1)/2} = 1\cdot 3\cdot 5\cdots(n-1)\sqrt{2\pi}\,\sigma^{n+1}. \tag{2-77}$$

From this, we conclude that the nth moment of a zero-mean Gaussian random variable is
$$m_n \equiv E[X^n] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x^n e^{-x^2/2\sigma^2}\,dx = \begin{cases}1\cdot 3\cdot 5\cdots(n-1)\,\sigma^n, & n = 2k \ (n \text{ even})\\ 0, & n = 2k-1 \ (n \text{ odd}).\end{cases} \tag{2-78}$$
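A hedged numerical check of (2-78): the sketch below compares the closed form $1\cdot3\cdot5\cdots(n-1)\,\sigma^n$ with a trapezoid-rule evaluation of the defining integral. The function names (`odd_product`, `gaussian_moment_formula`, `gaussian_moment_numeric`), the truncation of the integral to ±12σ and the step count are illustrative assumptions.

```python
import math


def odd_product(n):
    """1*3*5*...*(n-1) for even n; the empty product (n = 0) is 1."""
    result = 1
    for k in range(1, n, 2):
        result *= k
    return result


def gaussian_moment_formula(n, sigma):
    """E[X^n] for X ~ N(0, sigma), read off from (2-78)."""
    if n % 2:
        return 0.0
    return odd_product(n) * sigma ** n


def gaussian_moment_numeric(n, sigma, width=12.0, steps=4000):
    """Trapezoid-rule estimate of the integral in (2-78), truncated to [-width*sigma, width*sigma]."""
    a, b = -width * sigma, width * sigma
    h = (b - a) / steps

    def f(x):
        return x ** n * math.exp(-x * x / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h


for n in range(7):
    print(n, gaussian_moment_formula(n, 1.0), round(gaussian_moment_numeric(n, 1.0), 6))
```

The symmetric grid makes the odd-order sums cancel to round-off, mirroring the odd-function argument used for (2-75).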
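Example 2-11 (Rayleigh Distribution): A random variable X is Rayleigh distributed with parameter α if its density function has the form

$$f_X(x) = \begin{cases}\dfrac{x}{\alpha^2}\exp\left(-\dfrac{x^2}{2\alpha^2}\right), & x \ge 0\\[4pt] 0, & x < 0.\end{cases} \tag{2-79}$$

This random variable has many applications in communication theory; for example, it describes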
the envelope of a narrowband Gaussian noise process (as described in Chapter 9 of these notes).
The nth moment of a Rayleigh random variable can be computed by writing
$$E[X^n] = \frac{1}{\alpha^2}\int_0^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx. \tag{2-80}$$

We consider two cases, n even and n odd. For the case n odd, we have n + 1 even, and (2-80) becomes
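$$E[X^n] = \frac{1}{\alpha^2}\int_0^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx = \frac{1}{2\alpha^2}\int_{-\infty}^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx = \frac{\sqrt{2\pi}}{2\alpha}\left[\frac{1}{\sqrt{2\pi}\,\alpha}\int_{-\infty}^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx\right]. \tag{2-81}$$

On the right-hand side of (2-81), the bracket contains the (n+1)th moment of a N(0, α) random variable (compare (2-78) and the right-hand side of (2-81)). Hence, we can write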
$$E[X^n] = \left(\frac{\sqrt{2\pi}}{2\alpha}\right)\cdot 1\cdot 3\cdot 5\cdots n\;\alpha^{n+1} = 1\cdot 3\cdot 5\cdots n\;\alpha^n\sqrt{\pi/2}, \qquad n = 2k+1 \ (n \text{ odd}). \tag{2-82}$$

Now, consider the case n even, n + 1 odd. For this case, (2-80) becomes

$$E[X^n] = \frac{1}{\alpha^2}\int_0^{\infty} x^{n+1} e^{-x^2/2\alpha^2}\,dx = \int_0^{\infty} x^n e^{-x^2/2\alpha^2}\,\frac{x}{\alpha^2}\,dx. \tag{2-83}$$

Substitute $y = x^2/2\alpha^2$, so that $dy = (x/\alpha^2)\,dx$, in (2-83) and obtain
$$E[X^n] = \int_0^{\infty}\left(\sqrt{2\alpha^2 y}\right)^n e^{-y}\,dy = 2^{n/2}\alpha^n\int_0^{\infty} y^{n/2} e^{-y}\,dy = 2^{n/2}\alpha^n\left(\frac{n}{2}\right)!, \tag{2-84}$$

(note that n/2 is an integer here), where we have used the Gamma function

$$\Gamma(k+1) = \int_0^{\infty} y^k e^{-y}\,dy = k!, \qquad k \ge 0 \text{ an integer}. \tag{2-85}$$

Hence, for a Rayleigh random variable X, we have determined that
$$E[X^n] = \begin{cases}1\cdot 3\cdot 5\cdots n\;\alpha^n\sqrt{\pi/2}, & n \text{ odd}\\[4pt] 2^{n/2}\alpha^n\left(\dfrac{n}{2}\right)!, & n \text{ even}.\end{cases} \tag{2-86}$$

In particular, Equation (2-86) can be used to obtain the mean and variance

$$\begin{aligned} E[X] &= \alpha\sqrt{\pi/2}\\ \mathrm{Var}[X] &= E[X^2] - (E[X])^2 = 2\alpha^2 - \frac{\pi}{2}\alpha^2 = \left(2 - \frac{\pi}{2}\right)\alpha^2, \end{aligned} \tag{2-87}$$

given that X is a Rayleigh random variable.
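The following sketch evaluates (2-86) in closed form, compares it with Simpson's-rule integration of (2-80), and recovers the mean and variance in (2-87). The function names, the truncation of the upper limit to 15α and the step count are hypothetical choices made for illustration.

```python
import math


def rayleigh_moment_formula(n, alpha):
    """E[X^n] for a Rayleigh random variable with parameter alpha, following (2-86)."""
    if n % 2:  # n odd: 1*3*5*...*n * alpha^n * sqrt(pi/2)
        prod = 1
        for k in range(1, n + 1, 2):
            prod *= k
        return prod * alpha ** n * math.sqrt(math.pi / 2)
    return 2 ** (n // 2) * alpha ** n * math.factorial(n // 2)  # n even


def rayleigh_moment_numeric(n, alpha, upper=15.0, steps=6000):
    """Composite Simpson estimate of (2-80), with the upper limit truncated to upper*alpha."""
    b = upper * alpha
    h = b / steps  # steps must be even for Simpson's rule

    def f(x):
        return x ** (n + 1) * math.exp(-x * x / (2 * alpha ** 2)) / alpha ** 2

    total = f(0.0) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


alpha = 2.0
mean = rayleigh_moment_formula(1, alpha)                  # alpha * sqrt(pi / 2), as in (2-87)
variance = rayleigh_moment_formula(2, alpha) - mean ** 2  # (2 - pi / 2) * alpha^2
print(mean, variance)
```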
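Example 2-12: Let X be Poisson with parameter α, so that $P[X = k] = e^{-\alpha}\alpha^k/k!$, $k \ge 0$, and

$$f(x) = \sum_{k=0}^{\infty} e^{-\alpha}\frac{\alpha^k}{k!}\,\delta(x-k). \tag{2-88}$$

Show that E[X] = α and VAR[X] = α. To accomplish this, recall that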
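$$e^{\alpha} = \sum_{k=0}^{\infty}\frac{\alpha^k}{k!}. \tag{2-89}$$

With respect to α, differentiate (2-89) to obtain

$$e^{\alpha} = \sum_{k=1}^{\infty} k\,\frac{\alpha^{k-1}}{k!} = \frac{1}{\alpha}\sum_{k=1}^{\infty} k\,\frac{\alpha^k}{k!}. \tag{2-90}$$

Multiply both sides of this result by $\alpha e^{-\alpha}$ to obtain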
$$\alpha = \sum_{k=1}^{\infty} k\left(e^{-\alpha}\frac{\alpha^k}{k!}\right) = E[X], \tag{2-91}$$

as claimed. With respect to α, differentiate (2-90) (that is, obtain the second derivative of (2-89)) and obtain
$$e^{\alpha} = \sum_{k=1}^{\infty} k(k-1)\frac{\alpha^{k-2}}{k!} = \frac{1}{\alpha^2}\sum_{k=1}^{\infty} k^2\frac{\alpha^k}{k!} - \frac{1}{\alpha^2}\sum_{k=1}^{\infty} k\,\frac{\alpha^k}{k!}.$$

Multiply both sides of this result by $\alpha^2 e^{-\alpha}$ to obtain
$$\alpha^2 = \sum_{k=1}^{\infty} k^2\left(e^{-\alpha}\frac{\alpha^k}{k!}\right) - \sum_{k=1}^{\infty} k\left(e^{-\alpha}\frac{\alpha^k}{k!}\right) = \sum_{k=1}^{\infty} k^2 P[X=k] - \sum_{k=1}^{\infty} k\,P[X=k]. \tag{2-92}$$

Note that (2-92) is simply $\alpha^2 = E[X^2] - E[X]$. Finally, a Poisson random variable has a variance given by
$$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = E[X^2] - \alpha^2 = E[X] = \alpha, \tag{2-93}$$

as claimed.
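The truncated sums below sketch a numerical check of (2-91) through (2-93). The helper names and the truncation point `kmax` are assumptions; the probabilities are computed in log space with `math.lgamma` so that $\alpha^k$ and $k!$ never overflow individually.

```python
import math


def poisson_pmf(k, alpha):
    """P[X = k] = exp(-alpha) * alpha^k / k!, evaluated in log space (requires alpha > 0)."""
    return math.exp(-alpha + k * math.log(alpha) - math.lgamma(k + 1))


def poisson_mean_var(alpha, kmax=None):
    """Truncated-sum estimates of E[X] and E[X^2] - (E[X])^2 for Poisson X."""
    if kmax is None:
        # Heuristic truncation point; the neglected tail is far below double precision.
        kmax = int(alpha + 20 * math.sqrt(alpha) + 50)
    m1 = sum(k * poisson_pmf(k, alpha) for k in range(kmax + 1))
    m2 = sum(k * k * poisson_pmf(k, alpha) for k in range(kmax + 1))
    return m1, m2 - m1 ** 2


for alpha in (0.5, 3.0, 12.0):
    print(alpha, poisson_mean_var(alpha))
```

Both returned values should agree with α to many digits, as (2-91) and (2-93) predict.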
Conditional Mean

Let M denote an event. The conditional density $f(x|M)$ can be used to define the conditional mean

$$E[X|M] = \int_{-\infty}^{\infty} x\,f(x|M)\,dx. \tag{2-94}$$

The conditional mean has many applications, including estimation theory, detection theory, etc.

Example 2-13: Let X be Gaussian with zero mean and variance σ² (i.e., X is N(0, σ)). Let $M = [X > 0]$; find $E[X|M] = E[X|X > 0]$. First, we must find the conditional density $f(x|X > 0)$;
from previous work in this chapter, we can write

$$F(x|X>0) = \frac{P[X \le x,\ X > 0]}{P[X > 0]} = \frac{P[X \le x,\ X > 0]}{1 - P[X \le 0]} = \frac{P[X \le x,\ X > 0]}{1 - F_X(0)} = \begin{cases}\dfrac{F_X(x) - F_X(0)}{1 - F_X(0)}, & x \ge 0\\[4pt] 0, & x < 0,\end{cases}$$

so that, since $F_X(0) = 1/2$ for a zero-mean Gaussian random variable,

$$f(x|X>0) = \begin{cases}2f_X(x), & x \ge 0\\ 0, & x < 0.\end{cases}$$

From (2-94) we can write
$$E[X|X>0] = \frac{2}{\sqrt{2\pi}\,\sigma}\int_0^{\infty} x\exp\left[-x^2/2\sigma^2\right]dx.$$

Now, set $y = x^2/2\sigma^2$, $dy = x\,dx/\sigma^2$ to obtain

$$E[X|X>0] = \frac{2\sigma}{\sqrt{2\pi}}\int_0^{\infty} e^{-y}\,dy = \sigma\sqrt{\frac{2}{\pi}}.$$
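A Monte Carlo sketch of Example 2-13: draw samples of X ~ N(0, σ), keep only those in the conditioning event M = [X > 0], and average them. The function name, sample size and seed are arbitrary assumptions; as the number of trials grows, the estimate should approach the closed form $\sigma\sqrt{2/\pi}$.

```python
import math
import random


def conditional_mean_positive_mc(sigma, trials=200_000, seed=1):
    """Monte Carlo estimate of E[X | X > 0] for X ~ N(0, sigma).

    Samples are drawn from N(0, sigma); only those satisfying the
    conditioning event M = [X > 0] contribute to the average."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(trials):
        x = rng.gauss(0.0, sigma)
        if x > 0:
            total += x
            count += 1
    return total / count


sigma = 1.5
print(conditional_mean_positive_mc(sigma), sigma * math.sqrt(2 / math.pi))
```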
Tchebycheff Inequality

A measure of the concentration of a random variable near its mean is its variance. Consider a random variable X with mean η, variance σ² and density $f_X(x)$. The larger σ², the more "spread-out" the density function, and the more probable it is to find values of X "far" from the mean. Let ε denote an arbitrary small positive number. The Tchebycheff inequality says that the probability that X is outside (η − ε, η + ε) is negligible if σ/ε is sufficiently small.

Theorem (Tchebycheff's Inequality)
Consider random variable X with mean η and variance σ². For any ε > 0, we have

$$P\left[|X-\eta| \ge \varepsilon\right] \le \frac{\sigma^2}{\varepsilon^2}. \tag{2-95}$$

Proof: Note that
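$$\sigma^2 = \int_{-\infty}^{\infty}(x-\eta)^2 f_X(x)\,dx \ge \int_{\{x:\,|x-\eta| \ge \varepsilon\}}(x-\eta)^2 f_X(x)\,dx \ge \varepsilon^2\int_{\{x:\,|x-\eta| \ge \varepsilon\}} f_X(x)\,dx = \varepsilon^2 P\left[|X-\eta| \ge \varepsilon\right].$$

This leads to the Tchebycheff inequality

$$P\left[|X-\eta| \ge \varepsilon\right] \le \frac{\sigma^2}{\varepsilon^2}. \tag{2-96}$$

The significance of Tchebycheff's inequality is that it holds for any random variable, and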
it can be used without explicit knowledge of f(x). However, the bound is very "conservative" (or "loose"), so it may not offer much information in some applications. For example, consider Gaussian X. One can show

$$P\left[|X-\eta| \ge 3\sigma\right] = 2 - 2G(3) = 0.0027, \tag{2-97}$$

where G(3) is obtained from a table containing values of the Gaussian integral. However, the Tchebycheff inequality gives the rather "loose" upper bound of

$$P\left[|X-\eta| \ge 3\sigma\right] \le 1/9. \tag{2-98}$$
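The gap between (2-97) and (2-98) can be tabulated for several multiples k of σ. This sketch relies on the identity $2 - 2G(k) = \operatorname{erfc}(k/\sqrt{2})$, where G is the standard Gaussian distribution function, so Python's `math.erfc` stands in for the table lookup. The function names are hypothetical.

```python
import math


def gaussian_two_sided_tail(k):
    """Exact P[|X - eta| >= k*sigma] for Gaussian X, i.e. 2 - 2G(k) = erfc(k / sqrt(2))."""
    return math.erfc(k / math.sqrt(2))


def tchebycheff_bound(k):
    """Bound (2-95) with eps = k*sigma: sigma^2 / eps^2 = 1 / k^2, capped at 1 (it is a probability)."""
    return min(1.0, 1.0 / k ** 2)


for k in (1, 2, 3, 4):
    print(k, gaussian_two_sided_tail(k), tchebycheff_bound(k))
```

For k = 3 the exact tail matches the 0.0027 of (2-97), about 40 times smaller than the bound 1/9.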
Generalizations of Tchebycheff's Inequality

For a given random variable X, suppose that $f_X(x) = 0$ for $x < 0$, and let η = E[X] denote its mean. Then, for any α > 0, we have

$$P[X \ge \alpha] \le \eta/\alpha. \tag{2-99}$$

To show (2-99), note that
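$$\eta = E[X] = \int_0^{\infty} x f_X(x)\,dx \ge \int_{\alpha}^{\infty} x f_X(x)\,dx \ge \alpha\int_{\alpha}^{\infty} f_X(x)\,dx = \alpha P[X \ge \alpha], \tag{2-100}$$

so that $P[X \ge \alpha] \le \eta/\alpha$, as claimed.

Corollary: Let X be an arbitrary random variable, and let α and n be an arbitrary real number and a positive integer, respectively. The random variable $|X - \alpha|^n$ takes on only nonnegative values. Hence, applying (2-99) to $|X - \alpha|^n$ with threshold $\varepsilon^n$, ε > 0, gives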
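$$P\left[|X-\alpha|^n \ge \varepsilon^n\right] \le \frac{E\left[|X-\alpha|^n\right]}{\varepsilon^n}, \tag{2-101}$$

which, since the events $|X-\alpha| \ge \varepsilon$ and $|X-\alpha|^n \ge \varepsilon^n$ are the same, implies

$$P\left[|X-\alpha| \ge \varepsilon\right] \le \frac{E\left[|X-\alpha|^n\right]}{\varepsilon^n}. \tag{2-102}$$

The Tchebycheff inequality is a special case with α = η and n = 2.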
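Inequalities (2-99) and (2-102) can be illustrated with a distribution whose tails are known exactly. Purely as an example, the sketch assumes an exponential X with rate κ (density $\kappa e^{-\kappa x}$ for x ≥ 0, mean η = 1/κ), computes $E[|X-\eta|^n]$ with the midpoint rule, and prints each exact probability next to its bound. All names and numerical settings are assumptions.

```python
import math


def exp_tail(kappa, a):
    """Exact P[X >= a] for exponential X with rate kappa (density kappa*exp(-kappa*x), x >= 0)."""
    return math.exp(-kappa * a)


def exp_two_sided_tail(kappa, eps):
    """Exact P[|X - eta| >= eps] for the same exponential X, where eta = 1/kappa."""
    eta = 1.0 / kappa
    upper = math.exp(-kappa * (eta + eps))                               # P[X >= eta + eps]
    lower = 1.0 - math.exp(-kappa * (eta - eps)) if eps < eta else 0.0  # P[X <= eta - eps]
    return upper + lower


def abs_central_moment(kappa, n, upper=60.0, steps=100_000):
    """Midpoint-rule estimate of E[|X - eta|^n], truncating the integral at upper/kappa."""
    eta = 1.0 / kappa
    h = (upper / kappa) / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += abs(x - eta) ** n * kappa * math.exp(-kappa * x)
    return total * h


kappa = 0.5
eta = 1.0 / kappa
for a in (1.0, 4.0, 10.0):
    print("(2-99):", a, exp_tail(kappa, a), "<=", eta / a)
for n in (1, 2, 4):
    m_n = abs_central_moment(kappa, n)
    for eps in (1.0, 3.0, 6.0):
        print("(2-102):", n, eps, exp_two_sided_tail(kappa, eps), "<=", m_n / eps ** n)
```

With n = 2 the moment is the variance $1/\kappa^2$, so the second group of lines reproduces the Tchebycheff bound itself.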
Application: System Reliability

Often, systems fail in a random manner. For a particular system, we denote by $t_f$ the time interval from the moment a system is put into operation until it fails; $t_f$ is the time-to-failure random variable. The distribution $F_a(t) = P[t_f \le t]$ is the probability the system fails at, or prior to, time t. Implicit here is the assumption that the system is placed into service at t = 0. Also, we require that $F_a(t) = 0$ for $t \le 0$.

The quantity

$$R(t) = 1 - F_a(t) = P[t_f > t] \tag{2-103}$$

is the system reliability. R(t) is the probability the system is functioning at time t > 0.
We are interested in simple methods to quantify system reliability. One such measure of
system reliability is the mean time before failure
$$\mathrm{MTBF} \equiv E[t_f] = \int_0^{\infty} t\,f_a(t)\,dt, \tag{2-104}$$

where $f_a = dF_a/dt$ is the density function that describes random variable $t_f$.
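Equation (2-104) is an ordinary integral, so for a given failure density it can be evaluated numerically. The sketch below assumes, for illustration, the exponential density $\kappa e^{-\kappa t}$ studied in Example 2-14, with a hypothetical rate κ = 0.25 failures per hour; the function names and the midpoint-rule settings are also assumptions.

```python
import math


def mtbf(pdf, upper, steps=100_000):
    """Midpoint-rule estimate of MTBF = integral_0^inf t f_a(t) dt, per (2-104).

    The integral is truncated at `upper`, which must be large enough that
    f_a(t) is negligible beyond it."""
    h = upper / steps
    return h * sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(steps))


KAPPA = 0.25  # hypothetical failure rate (failures per hour)


def exponential_pdf(t):
    """Exponential time-to-failure density kappa*exp(-kappa*t) for t >= 0."""
    return KAPPA * math.exp(-KAPPA * t) if t >= 0 else 0.0


print(mtbf(exponential_pdf, upper=200.0))  # should be close to 1/KAPPA = 4 hours
```

Any other density can be passed as `pdf`, provided `upper` covers essentially all of its probability mass.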
Given that a system is functioning at time t′, t′ ≥ 0, we are interested in the probability
that a system fails at, or prior to, time t, where t > t′ ≥ 0. We express this conditional distribution
function as

$$F(t|t_f > t') = \frac{P[t_f \le t,\ t_f > t']}{P[t_f > t']} = \frac{P[t' < t_f \le t]}{P[t_f > t']} = \frac{F_a(t) - F_a(t')}{1 - F_a(t')}, \qquad t > t'. \tag{2-105}$$

The conditional density can be obtained by differentiating (2-105):

$$f(t|t_f > t') = \frac{d}{dt}F(t|t_f > t') = \frac{f_a(t)}{1 - F_a(t')}, \qquad t > t'. \tag{2-106}$$

$F(t|t_f > t')$ and $f(t|t_f > t')$ describe $t_f$ conditioned on the event $t_f > t'$. The quantity $f(t|t_f > t')\,dt$ is, to first order in dt, the probability that the system fails between t and t + dt given that it was working at t′.

Example 2-14
Suppose that the time-to-failure random variable $t_f$ is exponentially distributed. That is, suppose that

$$F_a(t) = \begin{cases}1 - e^{-\kappa t}, & t \ge 0\\ 0, & t < 0\end{cases} \qquad\qquad f_a(t) = \begin{cases}\kappa e^{-\kappa t}, & t \ge 0\\ 0, & t < 0,\end{cases} \tag{2-107}$$

for some constant κ > 0. From (2-106), we see that

$$f(t|t_f > t') = \frac{\kappa e^{-\kappa t}}{e^{-\kappa t'}} = \kappa e^{-\kappa(t - t')} = f_a(t - t'), \qquad t > t'. \tag{2-108}$$

That is, if the system is working at time t′, then the probability that it fails between t′ and t depends only on the positive difference t − t′, not on the absolute time t′. The system does not “wear out”
(become more likely to fail) as time progresses!
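The "no wear-out" property in (2-108) can be seen by evaluating (2-105) at several ages t′. For contrast, the sketch adds a hypothetical wear-out model in which $t_f$ is uniform on [0, 20]; that model, the rate κ = 0.3 and all function names are assumptions made for illustration.

```python
import math

KAPPA = 0.3  # hypothetical failure rate


def F_exponential(t):
    """Exponential distribution function (2-107)."""
    return 1.0 - math.exp(-KAPPA * t) if t >= 0 else 0.0


def F_uniform(t, T=20.0):
    """Hypothetical wear-out model for contrast: t_f uniform on [0, T]."""
    return min(max(t / T, 0.0), 1.0)


def conditional_cdf(F, t, t_prime):
    """F(t | t_f > t') = (F(t) - F(t')) / (1 - F(t')), t > t', per (2-105)."""
    return (F(t) - F(t_prime)) / (1.0 - F(t_prime))


dt = 1.5
for t_prime in (0.0, 5.0, 15.0):
    print(t_prime,
          conditional_cdf(F_exponential, t_prime + dt, t_prime),  # same for every t'
          conditional_cdf(F_uniform, t_prime + dt, t_prime))      # grows with t'
```

For the exponential model the probability of failing within the next 1.5 time units is $1 - e^{-0.45}$ at every age, whereas under the uniform model it rises from 0.075 to 0.3 as t′ goes from 0 to 15.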
With (2-106), we define $f(t|t_f > t')$ for t > t′. However, the function

$$\beta(t) \equiv \frac{f_a(t)}{1 - F_a(t)}, \tag{2-109}$$

known as the conditional failure rate (also known as the hazard rate), is very useful. To first order, β(t)dt (when this quantity exists) is the probability that a system still functioning at time t will fail between t and t + dt.

Example 2-15 (Continuation of Example 2-14)
Assume the system has $f_a$ and $F_a$ as defined in Example 2-14. Substitute (2-107) into (2-109) to obtain

$$\beta(t) = \frac{\kappa e^{-\kappa t}}{1 - \{1 - e^{-\kappa t}\}} = \kappa. \tag{2-110}$$

That is, the conditional failure rate is the constant κ. As stated in Example 2-14, the system does
not “wear out” as time progresses!
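Definition (2-109) can be evaluated directly. The sketch below compares the exponential model, whose hazard should stay at κ, with a hypothetical Rayleigh time to failure (the density of Example 2-11, whose distribution function $1 - e^{-t^2/2\alpha^2}$ follows by integrating (2-79)). For that model $\beta(t) = t/\alpha^2$ grows with t, so the system does wear out. The parameter values and function names are assumptions.

```python
import math

KAPPA = 0.3   # hypothetical constant failure rate for the exponential model
ALPHA = 2.0   # hypothetical parameter for a Rayleigh time to failure (Example 2-11)


def hazard(pdf, cdf, t):
    """Conditional failure rate beta(t) = f_a(t) / (1 - F_a(t)), per (2-109)."""
    return pdf(t) / (1.0 - cdf(t))


def exp_pdf(t):
    return KAPPA * math.exp(-KAPPA * t)


def exp_cdf(t):
    return 1.0 - math.exp(-KAPPA * t)


def rayleigh_pdf(t):
    return (t / ALPHA ** 2) * math.exp(-t * t / (2 * ALPHA ** 2))


def rayleigh_cdf(t):
    # Obtained by integrating (2-79): F(t) = 1 - exp(-t^2 / (2 alpha^2)) for t >= 0
    return 1.0 - math.exp(-t * t / (2 * ALPHA ** 2))


for t in (0.5, 1.0, 2.0, 4.0):
    print(t, hazard(exp_pdf, exp_cdf, t), hazard(rayleigh_pdf, rayleigh_cdf, t))
```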
If the conditional failure rate β(t) is a constant κ, we say that the system is a good-as-new system. That is, it does not “wear out” (become more likely to fail) over time. Examples 2-14 and 2-15 show that if a system's time-to-failure random variable $t_f$ is exponentially distributed, then

$$f(t|t_f > t') = f_a(t - t'), \qquad t > t' \tag{2-111}$$

and

$$\beta(t) = \text{constant}, \tag{2-112}$$

so the system is a good-as-new system.
The converse is true as well. That is, if β(t) is a constant κ for a system, then the system's time-to-failure random variable $t_f$ is exponentially distributed. To argue this, use (2-109) to write

$$f_a(t) = \kappa[1 - F_a(t)]. \tag{2-113}$$

But, since $f_a = dF_a/dt$, (2-113) leads to

$$\frac{dF_a}{dt} = -\kappa F_a(t) + \kappa. \tag{2-114}$$

Since $F_a(t) = 0$ for $t \le 0$, solving (2-114) with the initial condition $F_a(0) = 0$ gives

$$F_a(t) = \begin{cases}1 - e^{-\kappa t}, & t \ge 0\\ 0, & t < 0,\end{cases}$$

so $t_f$ is exponentially distributed. For β(t) to be equal to a constant κ, failures must be truly random in nature. A constant β requires that there be no time epochs where failure is more likely (no Year 2000-type problems!).
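The step from (2-114) to the exponential distribution can be checked by integrating the differential equation numerically from $F_a(0) = 0$ and comparing with $1 - e^{-\kappa t}$. The sketch uses the classical fourth-order Runge-Kutta method; the step count, the value κ = 0.3 and the function name are assumptions.

```python
import math


def integrate_failure_cdf(kappa, t_end, steps=10_000):
    """Solve dF_a/dt = -kappa*F_a + kappa, F_a(0) = 0, on [0, t_end] with classical RK4."""
    def rhs(F):
        # Right-hand side of (2-114); it does not depend explicitly on t.
        return -kappa * F + kappa

    h = t_end / steps
    F = 0.0
    for _ in range(steps):
        k1 = rhs(F)
        k2 = rhs(F + 0.5 * h * k1)
        k3 = rhs(F + 0.5 * h * k2)
        k4 = rhs(F + h * k3)
        F += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return F


kappa = 0.3  # hypothetical constant conditional failure rate
for t in (1.0, 5.0, 10.0):
    print(t, integrate_failure_cdf(kappa, t), 1 - math.exp(-kappa * t))
```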
Random variable $t_f$ is said to exhibit the Markov property, or it is said to be memoryless, if its conditional density obeys (2-111). We have established the following theorem.

Theorem
The following are equivalent:
1) A system is a good-as-new system.
2) β = constant.
3) $f(t|t_f > t') = f_a(t - t')$, t > t′.
4) $t_f$ is exponentially distributed.
The previous theorem and the Markov property are stated in the context of system reliability. However, both have far-reaching consequences in other areas that have nothing to do
with system reliability. Basically, in problems dealing with random arrival times, the time
between two successive arrivals is exponentially distributed if
1) the arrivals are independent of each other, and
2) the arrival time t after any specified fixed time t′ is described by a density function that
depends only on the difference t − t′ (the arrival time random variable obeys the Markov
property).
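The arrival-time claim can be illustrated with a simple discrete model. The sketch assumes, purely for illustration, that time is divided into slots of width dt and that each slot independently holds an arrival with probability rate·dt. The number of slots until the next arrival is then geometric, and the sketch samples it directly by inversion rather than looping slot by slot. It then checks that the mean gap is close to 1/rate and that the gaps are approximately memoryless. The slot model, its parameters, the seed and the function name are all assumptions.

```python
import math
import random


def simulate_interarrivals(rate, dt=1e-3, n_arrivals=50_000, seed=7):
    """Gaps between arrivals when each slot of width dt independently holds an
    arrival with probability p = rate*dt (an illustrative discrete model).

    The number of slots until the next arrival is geometric; it is sampled
    directly by inversion instead of looping slot by slot."""
    rng = random.Random(seed)
    p = rate * dt
    gaps = []
    for _ in range(n_arrivals):
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        slots = max(1, math.ceil(math.log(u) / math.log1p(-p)))
        gaps.append(slots * dt)
    return gaps


RATE = 2.0  # hypothetical mean arrival rate (arrivals per unit time)
gaps = simulate_interarrivals(RATE)
mean_gap = sum(gaps) / len(gaps)
s, t = 0.4, 0.3
survive_s = sum(g > s for g in gaps)
cond = sum(g > s + t for g in gaps) / survive_s  # estimate of P[gap > s + t | gap > s]
print(mean_gap, 1 / RATE)                        # sample mean vs exponential mean
print(cond, math.exp(-RATE * t))                 # memoryless: should match P[gap > t]
```

As dt shrinks, the geometric gap distribution approaches the exponential distribution with mean 1/rate.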
In Chapter 9, we will study shot noise (caused by the random arrival of electrons at a semiconductor junction, vacuum tube anode, etc.), an application of the above theorem and the Markov property.