not true, however, that $0 \le p_X(x) \le 1$. Probability densities are always non-negative, but can have arbitrarily large values. Often densities can be manipulated in the same way that distributions are. In subsequent discussion we will avoid duplication whenever definitions and theorems are the same for both distributions and densities. Typically probability books say that this is only done when there is no chance for confusion. We know
2.1. RANDOM VARIABLES    AI-TR 1548

Simple Statistics
A random variable model of a process allows us to answer a variety of quantitative questions
about the behavior of the process. Though the voltage across a resistor is unpredictable, its
long term average is not. Let us define the intuitive notion of "long-term average" as the expected value or mean of an RV. The expected value $E_X[X]$ is defined as
$$E_X[X] \equiv \sum_{x_i \in X} x_i P(X = x_i) \; ,$$
or
$$E_X[X] \equiv \int x \, p_X(x) \, dx \; . \qquad (2.2)$$
For notational convenience we will sometimes refer to the expectation of $X$ as $E[X]$. The
mean of a random variable is a deterministic function of its distribution. Intuitively $E[X]$ is the average of the RV's value over a large sample. We will denote a sample $a$, somewhat non-standardly, by an ordered collection of trials $x_a$,
$$a = ( \ldots \; x_a \; \ldots ) \; .$$
The size of a sample $\|a\|$ we will refer to as $N_a$. In a small abuse of notation we will write
$$E_a[X] \equiv \frac{1}{N_a} \sum_a x_a \; ,$$
for the average over the sample $a$. Unlike the mean, the sample mean is a random variable.
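As a concrete sketch of the two definitions above (in Python; the RV, a fair six-sided die, is a hypothetical stand-in not taken from the text), the true mean $E[X]$ is a fixed number computed from the distribution, while the sample mean $E_a[X]$ varies from sample to sample:

```python
import random

# Hypothetical RV: a fair six-sided die, P(X = x_i) = 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
P = {x: 1 / 6 for x in outcomes}

# True mean: E[X] = sum over x_i of x_i * P(X = x_i).
expected_value = sum(x * P[x] for x in outcomes)
print(expected_value)                  # approximately 3.5

# Sample mean E_a[X]: the average over one sample a of N_a trials.
# Unlike E[X], this value changes if the sample is drawn again.
random.seed(0)
N_a = 1000
a = [random.choice(outcomes) for _ in range(N_a)]
sample_mean = sum(a) / N_a
print(sample_mean)
```

Re-running the last four lines with a different seed gives a different sample mean, which is the sense in which the sample mean is itself a random variable.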
The law of large numbers allows us to prove that in the limit the sample mean equals the expectation:
$$E_X[X] = \lim_{N_a \to \infty} E_a[X] = \lim_{N_a \to \infty} \frac{1}{N_a} \sum_a x_a \; . \qquad (2.3)$$
The mean is an example of a statistic. Statistics are deterministic values computed from
an RV that sum up its gross or long term behavior. Statistics of $X$ are defined as the expectation of functions of $X$, or possibly of $P(X)$.
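The limiting behavior in the law of large numbers can be seen empirically. The following sketch (again assuming a hypothetical fair die for $X$, with $E[X] = 3.5$) prints sample means for increasingly large samples:

```python
import random

# Sketch: sample means E_a[X] for growing sample sizes N_a.
# By the law of large numbers they settle toward E[X] = 3.5.
random.seed(0)
for n_a in (10, 1000, 100000):
    sample = [random.randint(1, 6) for _ in range(n_a)]
    sample_mean = sum(sample) / n_a
    print(n_a, sample_mean)
```

The printed values are random, but the fluctuation around 3.5 visibly shrinks as $N_a$ grows.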
By itself, $E[X]$ does not tell us much about $X$. For example, the average lottery number does not help us guess what the next lottery number will be. In addition to knowing the
mean, we would like to know on average how close samples of $X$ will be to the mean.

Paul A. Viola    CHAPTER 2. PROBABILITY AND ENTROPY

We can tell that the average lottery number is a useless statistic by the fact that the variation in lottery numbers is huge. One measure of expected variation is called variance and is defined as
$$\mathrm{Var}(X) \equiv E_X\!\left[ (X - E_X[X])^2 \right] = E_X[X^2] - \left( E_X[X] \right)^2 \; .$$
The square root of variance is the standard deviation, $\sigma_X$. The standard deviation is a measure of how far, on average, the samples of $X$ will be from $E[X]$.
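The two expressions for variance above, the central form $E_X[(X - E_X[X])^2]$ and the moment form $E_X[X^2] - (E_X[X])^2$, always agree. A small check (using a hypothetical three-point distribution, chosen only for illustration):

```python
# Sketch: both variance formulas give the same number.
# Hypothetical discrete distribution on {0, 1, 2}.
values = [0, 1, 2]
probs  = [0.25, 0.5, 0.25]

mean = sum(x * p for x, p in zip(values, probs))
# Central form: E[(X - E[X])^2]
var_central = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
# Moment form: E[X^2] - E[X]^2
var_moments = sum(p * x * x for x, p in zip(values, probs)) - mean ** 2
print(mean, var_central, var_moments)   # 1.0 0.5 0.5
```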
Though the expectation of an RV is equal to the infinite-sample mean, as in (2.3), we have not
explored its relation to the sample mean. Is the sample mean a good estimate for the true
mean of an RV? In a qualified sense the answer is yes. The expectation of the sample mean is the same as the expectation:
$$E\left[ E_a[X] \right] = E\!\left[ \frac{1}{N_a} \sum_a x_a \right] = \frac{1}{N_a} \sum_a E[x_a] = E[X] \; .$$
Expectation, because it is defined as an integral, is linear and can be moved inside the
summation. The sample mean is often called an unbiased estimator of the true mean. But,
how close on average will the sample mean be to the true mean? Under the assumption that
the different trials of $X$ are independent and identically distributed, the standard deviation
of the sample mean is
$$\sigma\!\left( E_a[X] \right) = \frac{\sigma_X}{\sqrt{N_a}} \; .$$
Therefore, the standard deviation of the sample mean approaches 0 as $N_a$ approaches infinity.
We can conclude that the sample mean is an unbiased estimate for the true mean, and that
the quality of the estimate is better for larger samples.
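The $\sigma_X / \sqrt{N_a}$ scaling can be checked by simulation. The sketch below (a hypothetical experiment with i.i.d. Uniform(0,1) trials, for which $\sigma_X = 1/\sqrt{12} \approx 0.289$) estimates the standard deviation of the sample mean by drawing many samples of each size:

```python
import random

# Sketch: empirical standard deviation of the sample mean shrinks
# as sigma_X / sqrt(N_a) for i.i.d. trials.
random.seed(1)

def sample_mean_std(n_a, n_repeats=2000):
    """Empirical std of the sample mean over many samples of size n_a."""
    means = []
    for _ in range(n_repeats):
        sample = [random.uniform(0.0, 1.0) for _ in range(n_a)]
        means.append(sum(sample) / n_a)
    mu = sum(means) / n_repeats
    return (sum((m - mu) ** 2 for m in means) / n_repeats) ** 0.5

# For Uniform(0,1), sigma_X = 1/sqrt(12) ~ 0.289, so we expect
# roughly 0.091 at N_a = 10 and 0.029 at N_a = 100.
for n_a in (10, 100):
    print(n_a, sample_mean_std(n_a))
```

Multiplying $N_a$ by 100 should divide the printed standard deviation by about 10, matching the $1/\sqrt{N_a}$ law.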
The mean and variance are the zeroth and first elements of an infinite class of moment statistics. These statistics can be used to classify the behavior of an RV with ever increasing
accuracy.

The Algebra of Random Variables
Random variables are useful descriptions of processes that occur in the real world. RV's can
be used in algebraic equations just as variables are. The value of an equation that includes
an RV is another random process. A new RV, $Y$, can be defined from $X$ as $Y = F(X)$. For discrete RV's, the probability distribution of $Y$ is easily defined as $P_Y(F(n)) = P_X(n)$. For continuous RV's it is not quite as simple:
$$p_Y(F(x)) = \frac{p_X(x)}{\frac{d}{dx} F(x)} \; .$$
Intuitively this equation tells us to scale down the density at points where $\frac{\partial F(x)}{\partial x}$ is large. In these regions $F$ acts to stretch out $X$. The density $p_X(x)$ gets diluted by this stretching.
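The change-of-variables formula can be verified numerically. In this sketch (a hypothetical example, not from the text) we take $X \sim \mathrm{Uniform}(0,1)$, so $p_X(x) = 1$, and $Y = F(X) = X^2$; the formula predicts $p_Y(F(x)) = p_X(x) / F'(x) = 1/(2x)$, i.e. $p_Y(y) = 1/(2\sqrt{y})$:

```python
import random

# Sketch: check p_Y(y) = 1 / (2 sqrt(y)) for Y = X^2, X ~ Uniform(0,1).
random.seed(2)
n = 200000
ys = [random.random() ** 2 for _ in range(n)]

# Empirical density of Y in a narrow bin around y0, versus the formula.
y0, half_width = 0.25, 0.01
in_bin = sum(1 for y in ys if abs(y - y0) < half_width)
empirical = in_bin / (n * 2 * half_width)
predicted = 1 / (2 * y0 ** 0.5)        # = 1.0 at y0 = 0.25
print(empirical, predicted)
```

Near $y_0 = 0$, where $F'(x) = 2x$ is small, the predicted density blows up: $F$ compresses $X$ there rather than stretching it, concentrating probability mass.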
With this new theory of random variables, and many identities that can only really be
hinted at here, we can begin to analyze systems such as the noisy circuit described above. We
can answer questions like, "If there is random noise in the voltage from a power supply, how
much variation will there be in the current across a resistor on the other side of the circuit?"
In general this kind of analysis starts from a description of the distribution of one RV and
derives the distribution of other functionally related RV's in the system.

Joint and Conditional Distributions
When one RV is a functio...