CS 70
Discrete Mathematics and Probability Theory
Fall 2010
Tse/Wagner
Lecture 18
Multiple Random Variables and Applications to Inference
In many probability problems, we have to deal with multiple r.v.’s defined on the same probability space. We have already seen examples of that: for example, we saw that computing the expectation and variance of a binomial r.v. $X$ is easier if we express it as a sum $X = \sum_{i=1}^{n} X_i$, where $X_i$ represents the result of the $i$th trial. Multiple r.v.’s arise naturally in the case of inference problems, where we observe certain quantities and use our observations to draw inferences about other hidden quantities. This Note starts by developing some of the basics of handling multiple r.v.’s, then applies those concepts to several examples of inference problems.
Joint Distributions
Consider two random variables $X$ and $Y$ defined on the same probability space. By linearity of expectation, we know that $E(X+Y) = E(X) + E(Y)$. Since $E(X)$ can be calculated if we know the distribution of $X$ and $E(Y)$ can be calculated if we know the distribution of $Y$, this means that $E(X+Y)$ can be computed knowing only the individual distributions of $X$ and $Y$. In particular, to compute $E(X+Y)$, no information is needed about the relationship between $X$ and $Y$. However, this happy situation is unusual. For instance, consider the situation where we need to compute, say, $E((X+Y)^2)$, as arose when we computed the variance of a binomial r.v. Now we need information about the association or relationship between $X$ and $Y$, if we want to compute $E((X+Y)^2)$. This is because $E((X+Y)^2) = E(X^2) + 2E(XY) + E(Y^2)$, and $E(XY)$ depends on the relationship between $X$ and $Y$. How can we capture such a relationship, mathematically?
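To make the contrast concrete, here is a minimal Python sketch. The two tables below are hypothetical examples, not taken from the Note: each maps a pair of values $(a, b)$ to the probability that $X = a$ and $Y = b$. In both tables $X$ and $Y$ individually behave like a fair 0/1 coin, yet the relationship between them differs. The helper name `expectation` is an assumption made for illustration.

```python
from fractions import Fraction

def expectation(joint, f):
    """E[f(X, Y)] for a table mapping (a, b) -> Pr[X = a and Y = b]."""
    return sum(p * f(a, b) for (a, b), p in joint.items())

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical example 1: X and Y are two separate fair coin flips.
independent = {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter}

# Hypothetical example 2: Y is a copy of X, so the two always agree.
identical = {(0, 0): half, (1, 1): half}

for name, joint in [("independent", independent), ("identical", identical)]:
    e_sum = expectation(joint, lambda a, b: a + b)        # E(X + Y)
    e_sq = expectation(joint, lambda a, b: (a + b) ** 2)  # E((X + Y)^2)
    print(name, e_sum, e_sq)
```

Both cases give $E(X+Y) = 1$, but $E((X+Y)^2)$ is $3/2$ in the first case and $2$ in the second, so the individual distributions alone do not determine it.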
Recall that the distribution of a single random variable $X$ is the collection of the probabilities of all events $X = a$, for all possible values of $a$ that $X$ can take on. When we have two random variables $X$ and $Y$, we can think of $(X, Y)$ as a “two-dimensional” random variable, in which case the events of interest are $X = a \wedge Y = b$ for all possible values of $(a, b)$ that $(X, Y)$ can take on. Thus, a natural generalization of the notion of distribution to multiple random variables is the following.
Definition 18.1 (joint distribution): The joint distribution of two discrete random variables $X$ and $Y$ is the collection of values $\{(a, b, \Pr[X = a \wedge Y = b]) : (a, b) \in A \times B\}$, where $A$ and $B$ are the sets of all possible values taken by $X$ and $Y$ respectively.
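As an illustration of the definition, the following sketch tabulates a joint distribution by enumerating a small sample space. The setup is an assumption chosen for the example and does not come from the Note: two fair coin tosses, with $X$ the number of heads on the first toss and $Y$ the total number of heads.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair coin tosses, each outcome has probability 1/4.
sample_space = {outcome: Fraction(1, 4) for outcome in product("HT", repeat=2)}

def X(outcome):
    """Number of heads on the first toss (0 or 1)."""
    return 1 if outcome[0] == "H" else 0

def Y(outcome):
    """Total number of heads over both tosses (0, 1, or 2)."""
    return outcome.count("H")

# A and B are the sets of values taken by X and Y respectively.
A = sorted({X(w) for w in sample_space})
B = sorted({Y(w) for w in sample_space})

# One entry per pair (a, b) in A x B, holding Pr[X = a and Y = b].
joint = {(a, b): Fraction(0) for a in A for b in B}
for w, p in sample_space.items():
    joint[(X(w), Y(w))] += p

for (a, b), p in joint.items():
    print(f"Pr[X={a}, Y={b}] = {p}")
```

Note that some pairs in $A \times B$, such as $(0, 2)$, get probability 0: the table ranges over all of $A \times B$, not only the pairs that can actually occur.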
This notion obviously generalizes to three or more random variables. Since we will write $\Pr[X = a \wedge Y = b]$ quite often, we will abbreviate it to $\Pr[X = a, Y = b]$.
Just like the distribution of a single random variable, the joint distribution is normalized, i.e. $\sum_{a \in A,\, b \in B} \Pr[X = a, Y = b] = 1$. This follows from noticing that the events $X = a \wedge Y = b$ (where $a$ ranges over $A$ and $b$ ranges over $B$) partition the sample space.
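The partition argument can be checked numerically on a small case. The sketch below assumes a sample space of two fair six-sided dice and a hypothetical pair of r.v.'s ($X$ the first die, $Y$ the larger of the two); neither choice comes from the Note. It groups the sample points by the event "$X = a$ and $Y = b$" they fall into, then confirms that the groups cover every sample point exactly once and that the probabilities sum to 1.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair six-sided dice, 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

def X(w):
    """Value shown on the first die."""
    return w[0]

def Y(w):
    """Larger of the two values shown."""
    return max(w)

# Group sample points by the event "X = a and Y = b" they belong to.
events = defaultdict(list)
for w in outcomes:
    events[(X(w), Y(w))].append(w)

# Pr[X = a, Y = b] is the total probability of the points in that event.
joint = {ab: prob * len(points) for ab, points in events.items()}

# Each sample point is assigned to exactly one event, so the group sizes
# add up to the size of the whole sample space.
assert sum(len(points) for points in events.values()) == len(outcomes)

print(sum(joint.values()))
```

Pairs $(a, b)$ that never occur (here, any pair with $b < a$) correspond to empty events with probability 0, so leaving them out of the table does not change the sum.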