The theory of probability will help us determine how statistics converge, what they converge to, and, more importantly, how alternative statistics might be more appropriate. In this chapter we will introduce the basic mathematics that underlie probability. In subsequent chapters we will assume that the reader has a fairly thorough knowledge of probability, statistics, entropy, and coding. This chapter is intended both as a review of the required techniques and theorems and as a bridge for a reader unfamiliar with these topics. The final sections of this chapter contain a new analysis and discussion of Parzen density estimation. Parzen density estimation will play an important role in the estimation of entropy in subsequent chapters.

We will sometimes use a simplified, or looser, definition of concepts like events and random variables than is typical. If you get overly confused reading this chapter, any good book on probability should clear things up [Papoulis, 1991; Baclawski et al., 1990]. In general we will leave out the proofs of anything that is easily looked up, and of course most of the theory presented is cited here without reference. Unfortunately, probability and statistics seem to have many conflicting "standard" notations. In our own definitions we will try to be consistent with the prevailing conventions.

AI-TR 1548

2.1 Random Variables

In many cases an algebraic model of a physical system allows us to accurately predict its behavior. For instance, circuit theory can be used to analyze a particular circuit and predict that when a switch is closed current will flow. The physics of many circuits can be modeled as equations in which unknown quantities are recorded as variables. In the case of a switched circuit, we can model the resistance of a switch as a variable that can take on one of two values: zero when closed or infinity when open.
The current that flows through that resistor can then be predicted by algebraic manipulation. Conversely, knowing the value of the current allows us to predict whether the switch is open or closed. The equivalence of a circuit and a circuit model is fundamental within the fields of physics and engineering.

In a wide variety of physical systems the behavior of particular measurements cannot be easily predicted. The voltage on a wire may be a complex function of the circuit and the thermal noise in a resistor. Even when all of the other circuit variables are known, the voltage cannot be predicted accurately. Luckily, all hope is not lost. We may not know the actual voltage, but we may know that it will be "near" $V_0$, and that it is never, in our experience, higher than $V_{max}$. Probability, random processes, and random variables provide the tools to quantify the intuitive concepts of "near" and "never".

A random variable, or RV, is a variable whose value is unpredictable. Recall that a variable is a symbol $X$ together with a set of values $\mathcal{X}$ over which the variable can range. For example, $X$ could range over the real numbers between 1 and 10. In this thesis we will assume that $\mathcal{X}$ is always a subset of the real numbers. A random variable $X$ is a variable together with a function $P_X : \mathcal{X} \to [0, 1]$ called a probability distribution. For example, we can construct an RV that models the roll of a six-sided die. If the die is "fair" we cannot know in advance what its value will be, but we do know that its value will be one of the 6 integers from 1 to 6, and that each will appear roughly one sixth of the time. The RV that describes this die includes the variable symbol $X$, a sample space $\mathcal{X} = \{1, 2, 3, 4, 5, 6\}$ of possible outcomes, and a probability distribution function $P_X(n)$, which tells us the probability that $X$ will take on the value $n$. A particular value of an RV, for example a single die roll, is called a trial. A collection of trials is called a sample. An event is a set $A$ such that $A \subseteq \mathcal{X}$.
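As an illustrative sketch (our own, not from the original text), the fair-die RV described above can be modeled in a few lines of Python; the names `sample_space`, `P_X`, and `sample` are hypothetical choices for this example:

```python
import random

# Sample space and probability distribution P_X(n) for a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]
P_X = {n: 1.0 / 6.0 for n in sample_space}

# The distribution sums to one over the sample space.
assert abs(sum(P_X.values()) - 1.0) < 1e-12

# A "trial" is a single value of the RV; a "sample" is a collection of trials.
random.seed(0)  # fixed seed so the sketch is reproducible
sample = [random.choice(sample_space) for _ in range(60_000)]

# Each outcome should appear roughly one sixth of the time in a large sample.
for n in sample_space:
    freq = sample.count(n) / len(sample)
    print(n, round(freq, 3))
```

For a large sample the empirical frequencies converge toward $P_X(n) = 1/6$, which is exactly the "proportion of times you expect to see the event" reading of probability used in the text.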
The probability of an event, $P_X(X \in A)$, is the proportion of times that you expect to see event $A$ in a large sample.

Paul A. Viola, CHAPTER 2. PROBABILITY AND ENTROPY

The sum over the sample space of the probability distribution equals one:

$$\sum_{x_i \in \mathcal{X}} P(X \in \{x_i\}) = 1 \; .$$

Here we denote the elements of the sample space $\mathcal{X}$ with the lower case letter $x$. In many cases we will write $P_X(x_i)$, $P(X = x_i)$, or $P(x_i)$ for $P_X(X \in \{x_i\})$.

An RV which takes on a finite or discrete set of values is known as a discrete random variable. An RV whose range includes some infinite set of continuous values is known as a continuous random variable. A bit of thought leads one to a conundrum regarding continuous RVs: since there are an infinite number of possible outcomes, the probability of almost every outcome will be zero. This will in fact be a continuing annoyance to us as we move toward the definition of entropy. Instead of probability distributions, for continuous RVs we use probability densities:

$$p_X(x_0) = \lim_{\epsilon \to 0} \frac{P(x_0 \le X \le x_0 + \epsilon)}{\epsilon} \; .$$

The probability of an event can just as easily be defined from the density:

$$P(x_{low} \le X \le x_{high}) = \int_{x_{low}}^{x_{high}} p_X(x) \, dx \; .$$

The probability density of an RV always integrates to 1:

$$\int_{-\infty}^{\infty} p_X(x) \, dx = P_X(-\infty \le X \le \infty) = 1 \; .$$

It is...
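As a hedged sketch of the continuous case (our own example, not from the text), take an RV uniform on $[0, 10]$, so $p_X(x) = 0.1$ on that interval and zero elsewhere. The event probability $P(x_{low} \le X \le x_{high})$ and the normalization to one can both be checked by naive numerical integration; `p_X` and `integrate` are names invented for this illustration:

```python
def p_X(x):
    """Density of an RV uniform on [0, 10]; zero elsewhere."""
    return 0.1 if 0.0 <= x <= 10.0 else 0.0

def integrate(f, lo, hi, n=100_000):
    """Simple midpoint-rule numerical integration of f over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx

# P(2 <= X <= 5) is the integral of the density over [2, 5]: here 0.3.
print(integrate(p_X, 2.0, 5.0))

# The density integrates to (approximately) one over the whole real line;
# integrating over [-1, 11] suffices since p_X vanishes outside [0, 10].
print(integrate(p_X, -1.0, 11.0))
```

Note that while $P(X = x_0) = 0$ for every single outcome $x_0$ of this continuous RV, intervals of nonzero width carry nonzero probability, which is exactly the conundrum the density resolves.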