Chapter 3 Axiomatic approach to probability

© 2003 Benoît Champagne. Compiled February 2, 2012.

Chapter Overview:
• Axioms of probability and terminology
• Basic probability theorems
• Special cases of probability space:
  - Discrete (finite and countably infinite)
  - Continuous (uncountably infinite)

3.1 Axioms of probability

Random experiment: An experiment, either natural or man-made, in which exactly one of several identified results occurs is called a random experiment. The possible results of the experiment are called outcomes. A particular realization of the experiment, leading to a particular outcome, is called a trial.

Probability space: In the axiomatic approach to probability, a random experiment is modeled as a probability space, the latter being a triplet $(S, \mathcal{F}, P)$, where
• $S$ is the sample space,
• $\mathcal{F}$ is the set of events (events algebra),
• $P(\cdot)$ is the probability function.
These concepts are described individually below.

Sample space: The sample space $S$ is the set of all possible results, or outcomes, of the random experiment. In practical applications, $S$ is defined by the very nature of the problem under consideration. $S$ may be finite, countably infinite or uncountably infinite. The elements of $S$, i.e. the experimental outcomes, will usually be denoted by lower case letters (e.g. $s$, $a$, $x$, etc.).

Example 3.1:
Consider a random experiment that consists in flipping a coin twice. A suitable sample space may be defined as
$$S = \{HH, HT, TH, TT\}$$
where, for example, outcome $HT$ corresponds to heads on the first toss and tails on the second. Here, $S$ is finite with only 4 outcomes.

Events: In probability theory, an event $A$ is defined as a subset of $S$, i.e. $A \subseteq S$. Referring to a particular trial of the random experiment, we say that $A$ occurs if the experimental outcome $s \in A$.

Special events $S$ and $\emptyset$:
• Since for any outcome $s$, we have $s \in S$ by definition, $S$ always occurs and is thus called the certain event.
• Since for any outcome $s$, we have $s \notin \emptyset$, $\emptyset$ never occurs and is thus called the impossible event.

Example 3.1 (continued):
Consider the event $A$ = {getting heads on the first flip}. This can equivalently be represented by the following subset of $S$:
$$A = \{HH, HT\} \subseteq S$$
Let $s$ denote the outcome of a particular trial:
• if $s = HH$ or $HT$, then $A$ occurs;
• if $s = TH$ or $TT$, then $A$ does not occur.

Events algebra: Let $\mathcal{F}$ denote the set of all events under consideration in a given random experiment. Note that $\mathcal{F}$ is a set of subsets of $S$. Clearly:
• $\mathcal{F}$ must be large enough to contain all interesting events,
• but not so large as to contain impractical events that lead to mathematical difficulties. (This may be the case when $S$ is uncountably infinite, e.g. $S = \mathbb{R}^n$.)

In the axiomatic approach to probability, it is required that $\mathcal{F}$ be a $\sigma$-algebra:
(a) $S \in \mathcal{F}$
(b) $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$
(c) $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_i A_i \in \mathcal{F}$

Whenever $S$ is finite, the simplest and most appropriate choice for $\mathcal{F}$ is generally the power set $\mathcal{P}_S$. The proper choice for $\mathcal{F}$ when $S$ is infinite will be discussed later.

Example 3.1 (continued):
Consider flipping a coin twice and let $S = \{HH, HT, TH, TT\}$ be the corresponding sample space. An appropriate choice for $\mathcal{F}$ here is $\mathcal{P}_S$, i.e. the set of all subsets of $S$:
$$\mathcal{P}_S = \{\emptyset, \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH, HT\}, \{HH, TH\}, \{HH, TT\}, \{HT, TH\}, \{HT, TT\}, \{TH, TT\}, \{HH, HT, TH\}, \{HH, HT, TT\}, \{HH, TH, TT\}, \{HT, TH, TT\}, S\}$$
Note that $\mathcal{F} = \mathcal{P}_S$ contains $16 = 2^4$ different subsets, i.e. events, that may or may not occur during a particular realization of the random experiment. For example, the event $\{HH, HT, TH\} \in \mathcal{F}$ corresponds to obtaining at least one heads when the coin is flipped twice. Each event corresponds to a specific statement about the experimental outcome, and here there are only 16 different statements of this type that can be made.

The probability function: $P$ is a function that maps events $A$ in $\mathcal{F}$ into real numbers in $\mathbb{R}$, that is:
$$P : A \in \mathcal{F} \to P(A) \in \mathbb{R} \tag{3.1}$$
The number $P(A)$ is called the probability of the event $A$. The function $P(\cdot)$ must satisfy the following axioms:

Axiom 1: The function $P$ is nonnegative:
$$P(A) \geq 0 \tag{3.2}$$
Axiom 2: The function $P$ is normalized so that
$$P(S) = 1 \tag{3.3}$$
Axiom 3: Let $A_1, A_2, A_3, \ldots$ be a sequence of mutually exclusive events, that is, $A_i \cap A_j = \emptyset$ for $i \neq j$. Then
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \tag{3.4}$$

Remarks:
• From an operational viewpoint, the number $P(A)$ may be interpreted as a measure of the likelihood of event $A$ in a particular realization of the random experiment.
• If $P(A) = P(B)$, we say that events $A$ and $B$ are equally likely (this does NOT imply that $A = B$).
• As a special case of Axiom 3, it follows that for any events $A$ and $B$,
$$A \cap B = \emptyset \Rightarrow P(A \cup B) = P(A) + P(B) \tag{3.5}$$
• In the special case of a finite sample space $S$, it can be shown that (3.5) is in fact equivalent to Axiom 3. Thus, when $S$ is finite, we may replace Axiom 3 (infinite additivity) by the simpler condition (3.5).

Example 3.1 (continued):
Let the function $P$ be defined as follows, for any $A \in \mathcal{F}$:
$$P(A) \triangleq \frac{N(A)}{4}$$
where $N(A)$ denotes the number of elements in subset $A$. For example, consider event $A$ = {at least one tails}; we have
$$A = \{TH, HT, TT\} \Rightarrow N(A) = 3 \Rightarrow P(A) = \frac{3}{4}$$
It can be verified easily that function $P$ satisfies all the axioms of probability:
• Axiom 1: For any event $A$, $N(A) \geq 0$ and therefore $P(A) = N(A)/4 \geq 0$.
• Axiom 2: Since $N(S) = 4$, we immediately obtain $P(S) = N(S)/4 = 1$.
• Axiom 3: Since $S$ is finite, it suffices to check condition (3.5). Observe that if $A \cap B = \emptyset$, then $N(A \cup B) = N(A) + N(B)$ and therefore
$$P(A \cup B) = \frac{N(A \cup B)}{4} = \frac{N(A)}{4} + \frac{N(B)}{4} = P(A) + P(B)$$

3.2 Basic theorems

Introduction: Several basic properties follow from the axiomatic definition of the probability function $P(A)$. These are listed below as theorems along with their proofs.

Theorem 3.1: For any event $A \in \mathcal{F}$:
$$P(A^c) = 1 - P(A) \tag{3.6}$$
Proof: Observe that $A \cap A^c = \emptyset$ and $A \cup A^c = S$. Thus, using Axiom 3, we have $P(A) + P(A^c) = P(A \cup A^c) = P(S) = 1$, or equivalently, $P(A^c) = 1 - P(A)$.

Corollary: For any event $A \in \mathcal{F}$:
$$0 \leq P(A) \leq 1 \tag{3.7}$$
Proof: Left as exercise.

Theorem 3.2:
$$P(\emptyset) = 0 \tag{3.8}$$
Proof: Observe that $\emptyset = S^c$. Thus, invoking Theorem 3.1 and Axiom 2, we have $P(\emptyset) = P(S^c) = 1 - P(S) = 0$.

Theorem 3.3: If $A \subseteq B$, then
(a) $P(B - A) = P(B) - P(A)$ (3.9)
(b) $P(A) \leq P(B)$ (3.10)
Proof: Since $A \subseteq B$, set $B$ may be expressed as the union $B = A \cup (B - A)$, where $A$ and $B - A$ are mutually exclusive, that is $A \cap (B - A) = \emptyset$. The Venn diagram below illustrates this situation:
[Figure 3.1: Venn diagram for Theorem 3.3, showing $A$ and $B - A$ inside $B \subseteq S$.]

Using Axiom 3, we have
$$P(B) = P(A \cup (B - A)) = P(A) + P(B - A) \tag{3.11}$$
which proves part (a). To prove part (b), simply note (see Axiom 1) that $P(B - A) \geq 0$, so that $P(B) = P(A) + P(B - A) \geq P(A)$.

Theorem 3.4: For arbitrary events $A$ and $B$, we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{3.12}$$
Proof: Observe that for any events $A$ and $B$, we can always write
$$A \cup B = A \cup (B - (A \cap B)) \tag{3.13}$$
where $A$ and $B - (A \cap B)$ are mutually exclusive. This is illustrated by means of a Venn diagram below:
[Figure 3.2: Venn diagram for Theorem 3.4, showing $A$, $B$, $AB$ and $B - AB$ inside $S$. (Note: $AB \equiv A \cap B$.)]

Invoking Axiom 3, we first obtain
$$P(A \cup B) = P(A) + P(B - (A \cap B))$$
Since $A \cap B \subseteq B$, Theorem 3.3 yields
$$P(B - (A \cap B)) = P(B) - P(A \cap B)$$
Eq. (3.12) follows by combining the above two identities.

Remarks:
• Theorem 3.4 may be generalized to a union of more than two events.
• In the case of three events, say $A$, $B$ and $C$, the following relation can be derived:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \tag{3.14}$$
• The above formula can be proved by repeated application of Theorem 3.4. This is left as an exercise.
• See the textbook for a more general formula applicable to a union of $n$ events, where $n$ is an arbitrary positive integer.

Theorem 3.5: For any events $A$ and $B$:
$$P(A) = P(A \cap B) + P(A \cap B^c) \tag{3.15}$$
Proof: The theorem follows from Axiom 3 by noting that $A \cap B$ and $A \cap B^c$ are mutually exclusive and that their union is equal to $A$ (see Fig. 3.3).
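As a quick numerical sanity check (not a proof), the identities of Theorem 3.4, Eq. (3.14) and Theorem 3.5 can be evaluated on the two-flip sample space of Example 3.1 with the equiprobable assignment $P(A) = N(A)/4$. The sketch below is in Python; the helper `prob` and the particular events `A`, `B` and `C` are illustrative choices and are not part of the original notes.

```python
from fractions import Fraction
from itertools import product

# Sample space of Example 3.1: two flips of a coin.
S = frozenset("".join(flips) for flips in product("HT", repeat=2))


def prob(event):
    """Equiprobable assignment P(A) = N(A)/N(S), as in Example 3.1."""
    return Fraction(len(event), len(S))


# Three illustrative events (hypothetical choices made for this check).
A = frozenset(s for s in S if s[0] == "H")   # heads on the first flip
B = frozenset(s for s in S if s[1] == "H")   # heads on the second flip
C = frozenset(s for s in S if s[0] == s[1])  # both flips show the same face

# Theorem 3.4: P(A u B) = P(A) + P(B) - P(A n B)
lhs_34 = prob(A | B)
rhs_34 = prob(A) + prob(B) - prob(A & B)

# Eq. (3.14): inclusion-exclusion for three events
lhs_314 = prob(A | B | C)
rhs_314 = (prob(A) + prob(B) + prob(C)
           - prob(A & B) - prob(A & C) - prob(B & C)
           + prob(A & B & C))

# Theorem 3.5: P(A) = P(A n B) + P(A n B^c), with B^c = S - B
lhs_35 = prob(A)
rhs_35 = prob(A & B) + prob(A & (S - B))

print("Theorem 3.4:", lhs_34, "=", rhs_34)
print("Eq. (3.14): ", lhs_314, "=", rhs_314)
print("Theorem 3.5:", lhs_35, "=", rhs_35)
```

Exact fractions are used so that the two sides of each identity can be compared without rounding error. Agreement on one small example only illustrates the theorems; the proofs above are what establish them in general.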
[Figure 3.3: Venn diagram for Theorem 3.5, showing $AB^c$ and $AB$ inside $A$, with $B$ and $S$.]

Example 3.2:
In a certain city, three daily newspapers are available, labelled here as $A$, $B$ and $C$ for simplicity. The probability that a randomly selected person reads newspaper $A$ is $P(A) = 0.25$. Similarly, for newspapers $B$ and $C$, we have $P(B) = 0.20$ and $P(C) = 0.13$. The probability that a person reads both $A$ and $B$ is $P(AB) \equiv P(A \cap B) = 0.1$. In the same way, $P(AC) = 0.08$, $P(BC) = 0.05$ and $P(ABC) = 0.04$.
(a) What is the probability that a randomly selected person does not read any of these three newspapers?
(b) What is the probability that this person reads only $B$, i.e. reads $B$ but neither $A$ nor $C$?

Theorem 3.6: For any increasing or decreasing sequence of events $A_1, A_2, A_3, \ldots$ we have
$$\lim_{i \to \infty} P(A_i) = P\big(\lim_{i \to \infty} A_i\big) \tag{3.16}$$

Remarks:
• Recall that a sequence $A_i$, $i \in \mathbb{N}$, is increasing if $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$, in which case we define $\lim_{i \to \infty} A_i = \bigcup_{i=1}^{\infty} A_i$.
• Similarly, a sequence $A_i$, $i \in \mathbb{N}$, is decreasing if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, in which case we define $\lim_{i \to \infty} A_i = \bigcap_{i=1}^{\infty} A_i$.
• Theorem 3.6 is essentially a statement about the continuity of the probability function $P$. Specifically, it says that under proper conditions on the sequence $A_i$ (i.e. increasing or decreasing), the limit operation in (3.16) can be passed inside the argument of $P(\cdot)$.

Proof (optional reading): First consider the case of an increasing sequence, i.e. $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$. Define a new sequence of events as follows: $B_1 = A_1$ and $B_i = A_i - A_{i-1}$ for any integer $i \geq 2$. Note that the events $B_i$ so defined are mutually exclusive, i.e. $B_i \cap B_j = \emptyset$ if $i \neq j$. Furthermore, the following relations hold
$$\bigcup_{j=1}^{i} B_j = A_i, \qquad \bigcup_{j=1}^{\infty} B_j = \bigcup_{j=1}^{\infty} A_j$$
Making use of the above results together with Axiom 3, we first obtain:
$$P\big(\lim_{i \to \infty} A_i\big) = P\Big(\bigcup_{j=1}^{\infty} A_j\Big) = P\Big(\bigcup_{j=1}^{\infty} B_j\Big) = \sum_{j=1}^{\infty} P(B_j) \tag{3.17}$$
Finally, the infinite summation can be expressed in terms of limits as follows:
$$\sum_{j=1}^{\infty} P(B_j) = \lim_{i \to \infty} \sum_{j=1}^{i} P(B_j) = \lim_{i \to \infty} P\Big(\bigcup_{j=1}^{i} B_j\Big) = \lim_{i \to \infty} P(A_i) \tag{3.18}$$
A proof of (3.16) for decreasing sequences can be derived in a somewhat similar way.

3.3 Discrete probability space

Introduction: In many applications of probability (games of chance, simple engineering problems, etc.), the sample space $S$ is either finite or countably infinite. The word discrete is used to describe either of these two situations. Specifically, we say that a probability space $(S, \mathcal{F}, P)$ is discrete whenever the sample space $S$ is finite or countably infinite. In this section, we discuss discrete spaces along with related special cases of interest.

3.3.1 Finite probability space

Sample space: The sample space $S$ is a finite set comprised of $N$ distinct elements:
$$S = \{s_1, s_2, \ldots, s_N\} \tag{3.19}$$
where $N$ is a positive integer and $s_i$ denotes the $i$th possible outcome.

Events algebra: In the finite case, it is most convenient to take for events algebra the power set of the sample space $S$:
$$\mathcal{F} = \mathcal{P}_S = \text{set of all subsets of } S = \{\emptyset, \{s_1\}, \{s_2\}, \ldots, \{s_N\}, \{s_1, s_2\}, \{s_1, s_3\}, \ldots, S\} \tag{3.20}$$
That is, the events algebra consists of all possible subsets of $S$. Indeed, in the finite case, it is usually neither advantageous nor necessary to exclude certain subsets of $S$ from $\mathcal{F}$. Recall that $\mathcal{P}_S$, the power set of $S$, contains $2^N$ distinct elements (i.e. subsets). Thus, there are $2^N$ possible events or different statements that can be made about the experimental outcome.

Probability function: In the finite case, a standard way to define the probability function $P(\cdot)$ is via the introduction of a probability mass $p_i$. To each $s_i \in S$, $i = 1, \ldots, N$, we associate a real number $p_i$, such that:
(a) $p_i \geq 0$, $i = 1, \ldots, N$ (3.21)
(b) $\sum_{i=1}^{N} p_i = 1$ (3.22)

The probability of any event $A \in \mathcal{F}$ is then defined as
$$P(A) = \sum_{s_i \in A} p_i \tag{3.23}$$
For example, if $A = \{s_1, s_4, s_6\}$, then $P(A) = p_1 + p_4 + p_6$. In particular, for the elementary events $\{s_i\}$, we have
$$P(\{s_i\}) = p_i, \quad i = 1, \ldots, N \tag{3.24}$$

Axioms of probability: It may be verified that the probability function $P(\cdot)$ so defined satisfies the probability axioms:
• Axiom 1: From the condition $p_i \geq 0$ in (3.21), it follows that
$$P(A) = \sum_{s_i \in A} p_i \geq 0$$
• Axiom 2: From condition (3.22), it follows that
$$P(S) = \sum_{i=1}^{N} p_i = 1$$
• Axiom 3: Suppose $A$ and $B$ have no common element (i.e. $A \cap B = \emptyset$); then we have
$$P(A \cup B) = \sum_{s_i \in A \cup B} p_i = \sum_{s_i \in A} p_i + \sum_{s_i \in B} p_i = P(A) + P(B)$$

Example 3.3:

3.3.2 Equiprobable space

Definition: This is a special case of the finite probability space. We say that a probability space is equiprobable (also equilikely) if it is finite and the probability masses $p_i$ are all equal.

The probability mass: Let $N$ be the number of possible outcomes in the sample space $S$. Suppose that the numbers $p_i$ are all equal. Then, from condition (3.22), i.e. $\sum_{i=1}^{N} p_i = 1$, it follows that
$$P(\{s_i\}) = p_i = \frac{1}{N} \quad \text{for all } i = 1, \ldots, N \tag{3.25}$$

Probability function: Consider an arbitrary event $A \in \mathcal{F}$, containing $N(A)$ distinct elements. From (3.23) and (3.25), it follows that
$$P(A) = \frac{N(A)}{N} \tag{3.26}$$

Remarks:
• We say that the possible outcomes $s_i \in S$ are equally likely.
• Equation (3.26) corresponds to the classical definition of probability, as discussed in Chapter 1.
• In problem statements, the following standard terminology is used to indicate an equiprobable space:
  - random selection among $N$ possibilities;
  - a fair experiment;
  - equiprobable or equilikely outcomes.

Example 3.4:
What is the probability of at least one 6 when rolling four fair dice?

Example 3.5: Standard birthday problem
What is the probability that at least two students in a class of size $n$ have the same birthday?

3.3.3 Countably infinite probability space

Sample space: The sample space $S$ is a countably infinite set represented as
$$S = \{s_1, s_2, s_3, \ldots\} \tag{3.27}$$
where $s_i$, $i \in \mathbb{N}$, denotes the $i$th possible outcome. Examples of countably infinite sets include $\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{Q}$.

Events algebra: As in the finite case, it is usually most convenient to take as events algebra the power set of $S$:
$$\mathcal{F} = \mathcal{P}_S = \{A : A \subseteq S\} \tag{3.28}$$
Observe that since $S$ is infinite, so is $\mathcal{F} = \mathcal{P}_S$, and thus the number of events under consideration is infinite. Some of these events are finite, such as the elementary events $\{s_i\}$ for $i \in \mathbb{N}$, while others are infinite, such as $S$ or, for example, $A = \{s_i : i \text{ is even}\} = \{s_2, s_4, s_6, \ldots\}$.

Probability function: Much the same way as in the finite case, the probability function $P(\cdot)$ is defined via a probability mass $p_i$. To every $s_i \in S$, where $i$ now takes values in the set $\mathbb{N}$, we associate a real number $p_i$ such that:
(a) $p_i \geq 0$, for all $i \in \mathbb{N}$ (3.29)
(b) $\sum_{i=1}^{\infty} p_i = 1$ (3.30)

The probability of any event $A \in \mathcal{F}$ is defined as
$$P(A) = \sum_{s_i \in A} p_i \tag{3.31}$$
In particular, for any $i \in \mathbb{N}$, we have $P(\{s_i\}) = p_i$. It may be verified that the probability function $P(\cdot)$ so defined satisfies all the probability axioms.

Remark: The concept of an equiprobable space does not make sense here. If $p_i$ were constant, condition (3.30) could not be satisfied: the sum $\sum_{i=1}^{\infty} p_i$ would equal 0 if the constant is 0 and would diverge otherwise.

Example 3.6:
Consider flipping a fair coin until heads is observed for the first time. What is the probability that the number of required flips is even?

Solution:

3.4 Continuous probability space

Introduction: In many engineering applications of probability (e.g. design of a radio receiver, speech recognition system, image analysis, etc.) the sample space is uncountably infinite or, equivalently, continuous. We say that a probability space $(S, \mathcal{F}, P)$ is continuous whenever the sample space $S$ is uncountably infinite.

The proper, formal mathematical treatment of this case is beyond the scope of this course. Here, we adopt an engineering approach, relying more on intuition than mathematical formalism. You will have to accept certain results and concepts without complete justification. Still, we try to explain some of the technical difficulties associated with continuous spaces and we describe some of the mathematical apparatus available to handle this situation.

3.4.1 One-dimensional (1D) continuous space

Sample space: $S$ is either the set of real numbers $\mathbb{R}$, or an interval thereof:
$$S = \mathbb{R} \quad \text{or} \quad S = (a, b) \subset \mathbb{R} \tag{3.32}$$
where $a < b$ are real numbers. These are not the only possibilities but they cover most cases of interest. Note: the elements of $S$ cannot be counted.

Example:
• Waiting time of a person at a bus station.
• Analog voltage measurement on a ±5 volt scale: $S = [-5, +5] \subset \mathbb{R}$
• The power dissipated in a resistor: $S = [0, \infty)$

Events algebra: In the continuous case, it is NOT convenient to take the power set of $S$ as events algebra, so $\mathcal{F} \neq \mathcal{P}_S$: $\mathcal{P}_S$ includes some strange and complex subsets of $\mathbb{R}$ that are counterintuitive, are of no interest in engineering applications, and pose serious mathematical difficulties.
In practice, only those events that belong to the so-called Borel field of $S$, denoted $\mathcal{B}_S$, are included in the events algebra, that is
$$\mathcal{F} = \mathcal{B}_S \subset \mathcal{P}_S \tag{3.33}$$
While $\mathcal{B}_S$ is smaller than $\mathcal{P}_S$, it contains all subsets of practical significance in applications of probability. This includes intervals of the real axis and various combinations thereof. Additional explanations follow.

Borel field (optional reading): For simplicity, assume $S = \mathbb{R}$. Intervals from $\mathbb{R}$ may be combined via union, intersection and complementation to generate more complex subsets of $\mathbb{R}$. The Borel field of $\mathbb{R}$, denoted $\mathcal{B}_{\mathbb{R}}$, may be defined as the smallest $\sigma$-algebra that contains as elements all intervals of $\mathbb{R}$. For example, the following subsets of $\mathbb{R}$ all belong to $\mathcal{B}_{\mathbb{R}}$:
• The intervals $(a, b)$, $[a, b)$, etc., with $a, b \in \mathbb{R}$.
• Any subset of $\mathbb{R}$ obtained from such intervals via a countable number of union, intersection and/or complementation operations.

Because the Borel field $\mathcal{B}_{\mathbb{R}}$ is made up of subsets of $\mathbb{R}$, it is a subset of the power set $\mathcal{P}_{\mathbb{R}}$. However, $\mathcal{B}_{\mathbb{R}}$ does not contain every subset of $\mathbb{R}$:
$$\mathcal{B}_{\mathbb{R}} \neq \mathcal{P}_{\mathbb{R}} \tag{3.34}$$
The Borel field $\mathcal{B}_{\mathbb{R}}$ essentially contains those subsets of $\mathbb{R}$ which are meaningful from an application perspective. Other less interesting and problematic subsets are left out. Since $\mathcal{B}_{\mathbb{R}}$ is a $\sigma$-algebra, it can be used as an events algebra in a probability model.

Probability function: A standard way to define the probability function $P(\cdot)$ is via a probability density $p(x)$. To each $x \in S \subseteq \mathbb{R}$, we associate a real number $p(x)$, such that:
(a) $p(x) \geq 0$, for all $x \in S$ (3.35)
(b) $\int_S p(x)\,dx = 1$ (3.36)

The probability of any event $A \in \mathcal{F} = \mathcal{B}_S$ is then defined as
$$P(A) = \int_A p(x)\,dx \tag{3.37}$$
It may be verified that the probability function $P(\cdot)$ so defined satisfies the probability Axioms 1, 2 and 3:
• Axiom 1: From (3.37) and (3.35), it follows that
$$P(A) = \int_A p(x)\,dx \geq 0$$
• Axiom 2: From (3.37) and (3.36), we have
$$P(S) = \int_S p(x)\,dx = 1$$
• Axiom 3: Suppose $A \cap B = \emptyset$. Invoking basic properties of integration, we have
$$P(A \cup B) = \int_{A \cup B} p(x)\,dx = \int_A p(x)\,dx + \int_B p(x)\,dx = P(A) + P(B)$$

Uniform probability space: We say that a continuous 1D probability space is uniform if the sample space has finite length and the probability density $p(x)$ is constant. This is the simplest case of a 1D continuous probability space.

The sample space $S$ is typically a bounded interval, as in $S = (a, b)$ or $S = [a, b]$, where $a < b$ are finite real numbers. It does not matter whether the interval $S$ is open, closed, or semi-open.

Assuming that the function $p(x)$ is constant, it immediately follows from condition (3.36) that
$$p(x) = \frac{1}{b - a} \quad \text{for all } x \in (a, b) \tag{3.38}$$
The probability function is easily obtained by inserting (3.38) into (3.37). Specifically, for any event $A \in \mathcal{F}$, we find:
$$P(A) = \frac{1}{b - a} \int_A dx = \frac{\text{length of } A}{b - a} \tag{3.39}$$
The following special cases are of interest:
• If $A$ is an interval of the type $A = (\alpha, \beta)$ contained in $S$, i.e. $a \leq \alpha \leq \beta \leq b$, then
$$P(A) = \frac{\beta - \alpha}{b - a} \tag{3.40}$$
• For any $x \in S$, we have
$$P(\{x\}) = 0 \tag{3.41}$$

Example 3.7:
Random selection of a point from the interval $[-1, 1]$...

3.4.2 Continuous probability space in higher dimensions

In this section, we consider the generalization of the one-dimensional continuous probability space introduced in Section 3.4.1 to $n$ dimensions, where $n$ is a positive integer.

Sample space: The sample space is typically $\mathbb{R}^n$ or a subset thereof, i.e. $S \subseteq \mathbb{R}^n$. Examples include the plane $\mathbb{R}^2$, the three-dimensional space $\mathbb{R}^3$ or specific regions thereof (e.g. a delimited surface in $\mathbb{R}^2$ or volume in $\mathbb{R}^3$).

Events algebra: The standard choice is $\mathcal{F} = \mathcal{B}_S$, which contains all the subsets of practical interest in engineering applications. For example, if $S = \mathbb{R}^2$, the Borel field $\mathcal{B}_S$ will contain any geometrical region of practical interest within the real plane, such as:
• points, lines, curves, and geometrically delimited areas;
• other regions obtained from union, intersection and complementation of the above regions.

Probability function: $P(\cdot)$ may be defined via a probability density function $p(x)$, where $x \in S \subseteq \mathbb{R}^n$ is now a vector (when $n \geq 2$). To each element $x$ in $S$, we associate a real number $p(x)$, such that:
(a) $p(x) \geq 0$, for all $x \in S$ (3.42)
(b) $\int \cdots \int_S p(x)\,dx = 1$ (3.43)

The probability of any event $A \in \mathcal{F} = \mathcal{B}_S$ is then defined as
$$P(A) = \int \cdots \int_A p(x)\,dx \tag{3.44}$$
It may be verified that the probability function $P(\cdot)$ so defined satisfies the probability axioms. For now, we shall only consider a special case of (3.42)-(3.43) known as the uniform probability space.

Uniform probability space: Let $S \subseteq \mathbb{R}^n$. For any event $A \in \mathcal{B}_S$, we define
$$M(A) = \int \cdots \int_A dx \tag{3.45}$$
The number $M(A)$ ($0 \leq M(A) \leq \infty$) is called the measure of $A$. A probability space is uniform (equilikely) if its sample space $S \subseteq \mathbb{R}^n$ has a finite measure, i.e. $M(S) < \infty$, and $p(x)$ is constant for all $x \in S$.

Suppose $p(x)$ is constant. Then, from (3.43), it follows that
$$p(x) = \frac{1}{M(S)}, \quad \text{for all } x \in S \tag{3.46}$$
Consider an arbitrary event $A \in \mathcal{F}$ with measure $M(A)$. Using (3.44), (3.46) and (3.45), we obtain:
$$P(A) = \int \cdots \int_A p(x)\,dx = \frac{M(A)}{M(S)} \tag{3.47}$$

Remarks:
• For $n = 1, 2, 3$, the concept of measure admits an immediate physical interpretation:
  - $A \subseteq \mathbb{R} \Rightarrow M(A)$ = length of $A$
  - $A \subseteq \mathbb{R}^2 \Rightarrow M(A)$ = area of $A$
  - $A \subseteq \mathbb{R}^3 \Rightarrow M(A)$ = volume of $A$
• In problem statements, look for:
  - random selection from ...
  - fair experiment
  - uniformly distributed outcomes

Example 3.8:
Consider the random selection of two real numbers $x$ and $y$ from the interval $[0, 1]$. What is the probability that $x > 2y$?
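One possible approach to Example 3.8 is sketched here, under the assumption that "random selection" means the pair $(x, y)$ is uniformly distributed over the unit square $S = [0, 1]^2$. By (3.47), $P(x > 2y) = M(A)/M(S)$, where $A = \{(x, y) \in S : x > 2y\}$ is the triangle with vertices $(0, 0)$, $(1, 0)$ and $(1, 1/2)$. The Python sketch below computes this ratio exactly and compares it with a Monte Carlo estimate; the function names, the number of trials and the seed are illustrative assumptions, not part of the original notes.

```python
import random
from fractions import Fraction


def exact_probability():
    """Return M(A)/M(S) for A = {(x, y) in [0,1]^2 : x > 2y}.

    A is the triangle with vertices (0,0), (1,0), (1,1/2), so
    M(A) = (1/2) * base * height = (1/2) * 1 * (1/2).
    M(S) is the area of the unit square, i.e. 1.
    """
    area_A = Fraction(1, 2) * 1 * Fraction(1, 2)
    area_S = Fraction(1)
    return area_A / area_S


def monte_carlo_estimate(num_trials=200_000, seed=0):
    """Estimate P(x > 2y) by drawing (x, y) uniformly from the unit square."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(num_trials):
        x = rng.random()
        y = rng.random()
        if x > 2 * y:
            hits += 1
    return hits / num_trials


if __name__ == "__main__":
    print("exact:      ", exact_probability())
    print("Monte Carlo:", monte_carlo_estimate())
```

The exact ratio is 1/4, and the simulated relative frequency should approach this value as the number of trials grows. The simulation is only a cross-check of the geometric argument, in the spirit of the relative-frequency interpretation of probability.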
This note was uploaded on 02/12/2012 for the course ECSE 305 taught by Professor Champagne during the Spring '09 term at McGill.