The theory of probability will help us determine how statistics converge, what
they converge to, and more importantly how alternative statistics might be more appropriate.
In this chapter we will introduce the basic mathematics that underlie probability. In
subsequent chapters we will assume that the reader has a fairly thorough knowledge of probability, statistics, entropy, and coding. This chapter is intended both as a review of the
required techniques and theorems and as a bridge for a reader unfamiliar with these topics.
The final sections of this chapter contain a new analysis and discussion of Parzen density
estimation. Parzen density estimation will play an important role in the estimation of entropy
in subsequent chapters.
We will sometimes use a simplified, or looser, definition of concepts like events and random
variables than is typical. If you get overly confused reading this chapter, any good book on
probability should clear things up (Papoulis, 1991; Baclawski et al., 1990). In general we
will leave out the proofs of anything that is easily looked up, and of course most of the theory
presented is cited here without reference. Unfortunately, probability and statistics seem to
have many conflicting standard notations. In our own definitions we will try to be consistent
with the prevailing conventions.

2.1 Random Variables
In many cases an algebraic model of a physical system allows us to accurately predict its
behavior. For instance, circuit theory can be used to analyze a particular circuit and predict
that when a switch is closed current will flow. The physics of many circuits can be modeled
as equations where unknown quantities are recorded as variables. In the case of a switched
circuit, we can model the resistance of a switch as a variable that can take on one of two
values: zero when closed or infinity when open. The current that flows through that resistor
can then be predicted from algebraic manipulations. Conversely, knowing the value of the
current allows us to predict whether the switch is open or closed. The equivalence of a circuit
and a circuit model is fundamental within the fields of physics and engineering.
In a wide variety of physical systems the behavior of particular measurements cannot
be easily predicted. The voltage of a wire may be a complex function of the circuit and the
thermal noise in a resistor. Even when all of the other circuit variables are known, the voltage
cannot be predicted accurately. Luckily, all hope is not lost. We may not know the actual
voltage but we may know that it will be "near" V_0, and that it is never, in our experience,
higher than V_max. Probability, random processes, and random variables provide the tools to
quantify the intuitive concepts of "near" and "never".
A random variable, or RV, is a variable whose value is unpredictable. Recall that a variable
is a symbol X and a set of values 𝒳 over which the variable can range. For example, X
could range over the real numbers between 1 and 10. In this thesis we will assume 𝒳 will
always be a subset of the real numbers. A random variable X is a variable together with a
function P_X : 𝒳 → [0, 1] called a probability distribution. For example, we can construct
an RV that models the roll of a six-sided die. If the die is "fair" we cannot know in advance
what its value will be, but we do know that its value will be one of the six integers from 1 to
6, and that each will appear roughly one sixth of the time. The RV that describes this die
includes the variable symbol X, a sample space 𝒳 = {1, 2, 3, 4, 5, 6} of possible outcomes,
and a probability distribution function P_X(n) which tells us the probability that X will take
on the value n. A particular value of an RV is called a trial, for example the result of a single
die roll. A collection of trials is called a sample. An event is a set A such that A ⊆ 𝒳. The
probability of an event, P_X(X ∈ A), is the proportion of times that you expect to see event A in a large
sample. The sum over the sample space of the probability distribution equals one:

$$\sum_{x_i \in \mathcal{X}} P(X \in \{x_i\}) = 1.$$

Here we denote the elements of the sample space 𝒳 with the lower case letter x. In many
cases we will write P_X(x_i), P(X = x_i), or P(x_i) for P_X(X ∈ {x_i}). An RV which takes on
a finite or discrete set of values is known as a discrete random variable. An RV whose range
includes some infinite set of continuous values is known as a continuous random variable.
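The discrete case above can be made concrete with a short sketch. This is my own illustration, not code from the text; the names `P_X`, `prob`, and `sample_space` are hypothetical, and the fair-die distribution follows the example given earlier.

```python
import random
from fractions import Fraction

# Sketch of the fair-die RV from the text: a sample space and a
# probability distribution P_X assigning 1/6 to each outcome.
sample_space = {1, 2, 3, 4, 5, 6}
P_X = {n: Fraction(1, 6) for n in sample_space}

# The distribution sums to one over the sample space.
assert sum(P_X.values()) == 1

# The probability of an event A (a subset of the sample space) is the
# sum of the probabilities of its outcomes.
def prob(A):
    return sum(P_X[n] for n in A)

print(prob({2, 4, 6}))  # P(X is even) = 1/2

# A sample: many trials of the die. The empirical frequency of each
# face should be roughly one sixth, as the text describes.
random.seed(0)
trials = [random.randint(1, 6) for _ in range(60000)]
freq = {n: trials.count(n) / len(trials) for n in sample_space}
```

Here the exact distribution is represented with `Fraction` to keep the normalization check exact, while the sample of trials illustrates "proportion of times you expect to see the event".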
A bit of thought leads one to a conundrum regarding continuous RVs: since there are
an infinite number of possible outcomes, the probability of almost every outcome will be zero.
This will in fact be a continuing annoyance to us as we move toward the definition of entropy.
Instead of probability distributions for continuous RVs we use probability densities:

$$p_X(x_0) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} P(x_0 \le X \le x_0 + \epsilon).$$
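The limit definition of a density can be checked numerically. The sketch below is my own (not from the text), assuming an exponential RV with rate 1, for which the interval probability P(a ≤ X ≤ b) = e^{-a} - e^{-b} is known in closed form and the true density is p_X(x) = e^{-x}.

```python
import math

# Interval probability for an exponential RV with rate 1,
# computed from its CDF: P(a <= X <= b) = e^{-a} - e^{-b}.
def interval_prob(a, b):
    return math.exp(-a) - math.exp(-b)

x0 = 0.5
true_density = math.exp(-x0)  # p_X(x0) = e^{-x0}

# As epsilon shrinks, P(x0 <= X <= x0 + eps) / eps approaches p_X(x0),
# matching the limit definition of the density.
for eps in (0.1, 0.01, 0.001):
    approx = interval_prob(x0, x0 + eps) / eps
    print(eps, approx)
```

The printed ratios move toward e^{-0.5} ≈ 0.6065 as epsilon decreases, illustrating why a density is a probability per unit length rather than a probability.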
The probability of an event can just as easily be defined from the density by

$$P(x_{\mathrm{low}} \le X \le x_{\mathrm{high}}) = \int_{x_{\mathrm{low}}}^{x_{\mathrm{high}}} p_X(x)\, dx.$$

The probability density of an RV always integrates to 1:

$$\int_{-\infty}^{\infty} p_X(x)\, dx = P_X(-\infty \le X \le \infty) = 1.$$
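Both integral relations can be verified numerically. The following is an illustrative sketch under my own assumptions (a standard normal density and a simple midpoint Riemann sum), not a method from the text.

```python
import math

# Standard normal density, used here only as a concrete example.
def p_X(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Midpoint-rule numerical integration of f over [a, b].
def integrate(f, a, b, n=200000):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Truncating the infinite integral at +/-10 loses negligible mass,
# so the density integrates to (nearly) 1.
total = integrate(p_X, -10.0, 10.0)
print(total)  # close to 1

# The probability of the event -1 <= X <= 1 is the integral of the
# density over that interval (about 0.6827 for a standard normal).
event = integrate(p_X, -1.0, 1.0)
print(event)
```

The same pattern applies to any density: the full-line integral is the normalization constraint, and integrals over sub-intervals give event probabilities.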