Module 7: Discrete Probability
Theme 1: Elementary Probability Theory
Probability is usually associated with the outcome of an experiment. For example, the experiment may be a toss of a coin, while the two possible outcomes are "heads" and "tails". Roughly speaking, probability estimates our chance that the next outcome of this experiment is either a head or a tail (here we assume that head and tail are equally likely, that is, the probability of tossing a head, or a tail, is equal to 1/2, i.e., 50%). An experiment is a procedure that yields one of a set of possible outcomes. The set of all possible outcomes is called the sample space (e.g., in the experiment with a coin, the sample space is S = {head, tail}). Finally, an event is a subset of the sample space (e.g., a head). When there is a finite number of equally likely outcomes, Laplace suggested the following definition of the probability:

The probability of an event E (which is a subset of a finite sample space S of equally likely outcomes) is

P(E) = |E| / |S|,

where |E| and |S| are the cardinalities of the sets E and S, respectively. We often call the outcomes in E the favorable outcomes, while the outcomes in S are all possible outcomes.

Example 1: A box has 5 black balls and 7 green balls. What is the probability of selecting a green ball? The sample space S consists of 12 balls. The event E = "select a green ball" has seven elements. Therefore P(E) = 7/12.

Example 2: Let two dice be rolled (we recall that a die has six sides, and each side has one, two, ..., or six dots). What is the probability that the sum of the numbers on the two dice is 11? Let us first build the sample space S. It consists of pairs (i, j) with 1 ≤ i, j ≤ 6 (since every die has six outcomes, two of them together have 6 · 6 = 36 outcomes). The event E = "the sum is equal to 11" consists of (5, 6) and (6, 5); therefore

P(E) = 2/36 = 1/18.
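Laplace's definition can be checked mechanically by enumerating the sample space. The following Python sketch (an illustration added here, not part of the original notes) verifies the count in Example 2:

```python
from itertools import product
from fractions import Fraction

# Sample space: all 36 ordered pairs (i, j) of two dice outcomes.
S = list(product(range(1, 7), repeat=2))
# Event E: the sum of the two dice equals 11.
E = [outcome for outcome in S if sum(outcome) == 11]
p = Fraction(len(E), len(S))  # Laplace: |E| / |S|
print(E)  # [(5, 6), (6, 5)]
print(p)  # 1/18
```

Exact rational arithmetic via `Fraction` avoids any floating-point rounding in the answer.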
The counting problems encountered so far were very simple. Consider now the following problem.

Example 3: Find the probability that a hand of five cards in poker contains four cards of one kind. We recall that there are 52 cards in a deck; there are 13 different kinds of cards, with 4 cards of each kind. These kinds are two's, three's, ..., tens, jacks, queens, kings, and aces. There are also four suits: spades, clubs, hearts, and diamonds.

The number of ways to choose 5 cards out of 52 is C(52, 5) (which is a large number). This is the cardinality of the sample space. Let us now consider the event E that a hand has four cards of one kind. By the multiplication rule, the number of hands of five cards with four cards of the same kind is the number C(13, 1) of ways to pick one kind (in words, one out of the 13 kinds) times the number of ways to pick the fifth card from the remaining 48 cards, which is C(48, 1). Therefore, by the above definition,

P(E) = C(13, 1) · C(48, 1) / C(52, 5) ≈ 0.00024,

since there are C(52, 5) possible outcomes and C(13, 1) · C(48, 1) "favorable" outcomes for E.

Sometimes we know the probabilities of events E1 and E2 and need to know the probability of combinations of these events, such as E1 ∪ E2 (i.e., at least one of the events occurs), E1 ∩ E2 (both events must occur), or Ē (the event E does not occur). Let us start with the probability of the complementary event Ē = S − E. We claim that

P(Ē) = 1 − P(E).

Indeed, since |Ē| = |S| − |E|, we obtain

P(Ē) = |Ē| / |S| = (|S| − |E|) / |S| = 1 − |E| / |S| = 1 − P(E).
È Ë Example 4: What is the probability that among ﬁve randomly generated bits at least one is ¼?
This exactly the case when it easier to compute than . In this case there are ¾ possible binary strings of length ﬁve, only only one (i.e., ´¼
one, we ﬁnd
È since there are ¾ binary strings of length
È Let us now compute È ´ and there is only one string with all ½s. Hence ¿½
¿¾ È ¾
½ ¼ ¼ ¼ ¼µ) is the “favorable” ½
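The complement trick of Example 4 can be confirmed by brute force over all 32 strings (a Python sketch added for illustration, not part of the original notes):

```python
from itertools import product

strings = list(product("01", repeat=5))               # all 2**5 = 32 strings
at_least_one_zero = [s for s in strings if "0" in s]  # the event E
print(len(at_least_one_zero), len(strings))           # 31 32
```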
Let us now compute P(E1 ∪ E2). From previous modules we know that

|E1 ∪ E2| = |E1| + |E2| − |E1 ∩ E2|;

therefore, by the definition of probability,

P(E1 ∪ E2) = |E1 ∪ E2| / |S| = (|E1| + |E2| − |E1 ∩ E2|) / |S| = P(E1) + P(E2) − P(E1 ∩ E2).

In summary, we proved that

P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2).

In words, the probability of the union of two events is the sum of the probabilities of both events minus the probability of the intersection of the events. When the events are disjoint (i.e., E1 ∩ E2 = ∅), then

P(E1 ∪ E2) = P(E1) + P(E2).

Example 5: What is the probability that a randomly selected positive integer smaller than or equal to 100 is divisible either by 2 or by 5?

Let E1 be the event that the integer is divisible by 2, and E2 the event that the integer is divisible by 5. Clearly, |E1| = 50 and |E2| = 20. Observe that the event we are looking for is E1 ∪ E2. In order to compute P(E1 ∪ E2) we also need |E1 ∩ E2| = 10, since there are ten numbers in the range 1 to 100 that are divisible by 10. Therefore, by the definition of probability, we have

P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) = 50/100 + 20/100 − 10/100 = 3/5.
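The inclusion-exclusion count of Example 5 is easy to verify with set operations (a check added to these notes, not part of the original text):

```python
e1 = {k for k in range(1, 101) if k % 2 == 0}  # divisible by 2
e2 = {k for k in range(1, 101) if k % 5 == 0}  # divisible by 5
# |E1 ∪ E2| = |E1| + |E2| - |E1 ∩ E2|
print(len(e1), len(e2), len(e1 & e2), len(e1 | e2))  # 50 20 10 60
```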
Exercise 7A: What is the probability of generating a particular binary string of length seven, say (0, 0, 1, 1, 0, 1, 0), provided 0 and 1 are equally likely?

Theme 2: Probability Theory
In the previous section, we assumed that all outcomes of the sample space S are equally likely. This led us to the Laplace definition of probability. Here we generalize it.

Let S be a probability space. Throughout, we assume that S is finite, and often we just list all of its outcomes (e.g., S = {s1, ..., sn}). Any subset E of S will be called an event. We now define the probability P as a function from the set of subsets of S into the interval [0, 1] such that the following three properties hold (below P(E) denotes the probability of the event E):

1. P(E) ≥ 0;
2. P(S) = 1;
3. if E1 ∩ E2 = ∅, then P(E1 ∪ E2) = P(E1) + P(E2).

The above three properties say that the probability of any event must be nonnegative, that the probability of a "sure" event (i.e., S) is equal to one, and finally that the probability of the union of disjoint events is the sum of the probabilities of the corresponding events.
Using these three assumptions one can prove many properties of probability (that we already encountered in the previous section). For example, let Ē = S − E be the complementary event to E (that is, Ē is the same as "not E"). We have

P(Ē) = 1 − P(E).

Indeed, observe that S = E ∪ Ē and that E and Ē are disjoint; hence by the third property we find

1 = P(S) = P(E ∪ Ē) = P(E) + P(Ē),

which proves our claim that P(Ē) = 1 − P(E). By the way, as a corollary we see that

P(∅) = P(S̄) = 1 − P(S) = 0.

Let now all outcomes in S be equally likely. To be more precise, let S = {s1, ..., sn} and

P(s1) = P(s2) = ... = P(sn).

Since by the second property above we have P(s1) + ... + P(sn) = 1 (all elementary probabilities sum up to one), every outcome must have probability P(si) = 1/n. Consider now an event A consisting of k outcomes, say A = {s1, s2, ..., sk}. By the third property of the probability definition and the above we have

P(A) = P(s1 ∪ s2 ∪ ... ∪ sk) = P(s1) + P(s2) + ... + P(sk) = k/n.

In the above we first observe that the event A is a union of the elementary events si. All elementary events are disjoint, hence we can sum their probabilities. Finally, since every outcome is equally likely and there are n outcomes, each P(si) = 1/n, so P(A) = k/n = |A| / |S|. We have just recovered Laplace's definition of probability for equally likely outcomes.
Example 6: Find the probability that a randomly selected n-digit decimal number is also a valid octal number, that is, one whose digits are all between 0 and 7.

First, an n-digit number can be represented as (x1, x2, ..., xn), where xi ∈ {0, 1, ..., 9} if the number is decimal, and xi ∈ {0, 1, ..., 7} if the number is octal. The number of decimal numbers of length n is 10^n (just apply the multiplication rule). The number of valid octal numbers of length n is 8^n. Therefore, the probability is 8^n / 10^n = 0.8^n.
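For a small length the ratio 0.8^n of Example 6 can be verified by direct counting (a Python sketch added for illustration; the choice n = 3 is arbitrary and leading zeros are allowed, as in the digit-string model above):

```python
# Count 3-digit strings whose digits all lie in 0..7.
n = 3
decimal = 10 ** n
octal = sum(1 for x in range(decimal) if all(d <= '7' for d in f"{x:0{n}d}"))
print(octal, octal / decimal)  # 512 0.512
```

Indeed 512 / 1000 = 0.8^3.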
Conditional Probability

Consider the case when you know that event A has occurred, and knowing this you want to compute the probability of event B. This is known as the conditional probability and is denoted as P(B|A).

Example 7: There are five black balls and ten green balls in a box. You select a ball at random, and it happens to be a green ball. You do not return this ball to the box. What is the probability that in the second selection you pick up a green ball? If A is the event of selecting a green ball in the first pick, and B is the event of choosing another green ball in the second pick, then the probability we are seeking is denoted as P(B|A). In our case it is

P(B|A) = 9/14,

since after the first selection there are only nine green balls left in the box, which now contains 14 balls. (Here we used explicitly the fact that after picking a green ball there are only 14 balls left, with 9 green balls among them.)

We can compute this probability in a different way. Observe that

P(A) = 10/15,

since event A can occur in 10 ways out of 15. Let us now compute the probability of A ∩ B, that is, of picking a green ball in both the first and the second selection. Event A ∩ B can occur in 10 · 9 ways out of 15 · 14, since one ball was already taken out of the box in the first pick. Hence

P(A ∩ B) = (10 · 9) / (15 · 14) = 3/7,

and then we "define" (see below for additional explanations) the conditional probability P(B|A) as

P(B|A) = P(A ∩ B) / P(A) = ((10 · 9) / (15 · 14)) / (10/15) = 9/14.

Thus, we obtain the same result as the one computed directly. It suggests a definition of conditional probability that we shall discuss next.
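Example 7 can also be checked by enumerating all ordered pairs of draws with exact rational arithmetic (an added illustration, not part of the original notes; the balls are labeled 0-14, the first ten green):

```python
from fractions import Fraction
from itertools import permutations

balls = ["G"] * 10 + ["B"] * 5            # ten green, five black
pairs = list(permutations(range(15), 2))  # ordered (first, second) picks
first_green = [(i, j) for i, j in pairs if balls[i] == "G"]
both_green = [(i, j) for i, j in first_green if balls[j] == "G"]
p_b_given_a = Fraction(len(both_green), len(first_green))
print(p_b_given_a)  # 9/14
```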
Let us generalize this example. Consider a sample space S and two events A and B. Assume that event A has occurred. Then the sample space S effectively reduces to A: the occurrence of the event A restricts attention to those outcomes that fall into A. In a sense, A is the new sample space. In other words, the number of "favorable" outcomes is not |B| but |A ∩ B|. Therefore, for equally likely outcomes we compute P(B|A) as follows:

P(B|A) = |A ∩ B| / |A|.

Observe, however, that

P(B|A) = |A ∩ B| / |A| = (|A ∩ B| / |S|) / (|A| / |S|) = P(A ∩ B) / P(A).

In the second step above, we divided and multiplied by |S|, and then observed in the third step that we obtained the probabilities P(A ∩ B) and P(A). Actually, the last expression is used as the definition of the conditional probability.
Let A and B be events with P(A) > 0. The conditional probability of B given A, denoted as P(B|A), is defined as

P(B|A) = P(A ∩ B) / P(A).

Example 8: A box contains 5000 chips, 1000 of them made by company X and the rest by company Y. It is known that 10% of the chips made by company X are defective, while only 5% of the chips made by company Y are defective; that is, 100 chips made by X and 200 chips made by Y are defective. Compute the probability that a chip you pick up, which happens to be defective, comes from company X.

Let A be the event that a chip is made by company X, and B the event that a chip is defective. We need to find P(A|B), that is, the probability that, provided a chip is defective, it comes from company X. For this we need P(B) and P(A ∩ B). But

P(B) = (100 + 200) / 5000 = 0.06,
P(A ∩ B) = 100 / 5000 = 0.02.

Then

P(A|B) = P(A ∩ B) / P(B) = 0.02 / 0.06 = 1/3,

that is, one out of every three defective chips comes from company X.
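The chip computation above is a small Bayes-style calculation that exact fractions reproduce directly (a sketch added to these notes; the counts 100 and 200 are those of the example):

```python
from fractions import Fraction

total = 5000
defective_x, defective_y = 100, 200              # rejects from each company
p_d = Fraction(defective_x + defective_y, total)  # P(B) = 3/50 = 0.06
p_x_and_d = Fraction(defective_x, total)          # P(A ∩ B) = 1/50 = 0.02
print(p_x_and_d / p_d)  # 1/3
```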
Independence

If P(B|A) = P(B), then the knowledge of A does not change the probability of B. We say that A and B are independent events. Observe that the above condition is equivalent to P(A ∩ B) = P(A)P(B), which serves as a definition:

Two events A and B are said to be independent if and only if

P(A ∩ B) = P(A) · P(B).

Example 8: Consider a five-bit binary string. The probability of generating a zero is equal to p. Bits are generated independently. What is the probability of getting 00111?

Since we have independence, we easily compute

P(00111) = P(0) · P(0) · P(1) · P(1) · P(1) = p^2 (1 − p)^3,

since 1 − p is the probability of generating a one.

Exercise 7B: Show that if A and B are independent events, then A and B̄ are also independent events.

Binomial Distribution and Bernoulli Trials
In the last example, we generated five bits and asked for the probability of getting 00111. However, if we ask for the probability of generating two 0s and three 1s, the situation is different. This time we do not specify where the two 0s and three 1s are located. Therefore, strings like 01011, 11001, etc., satisfy the description of the event. In fact, we have C(5, 2) = C(5, 3) = 10 ways to select the positions of the two zeros out of five. Thus this probability is equal to

C(5, 2) p^2 (1 − p)^3 = 10 p^2 (1 − p)^3,

and this should be compared with the answer to the previous example. For instance, if p = 0.1, then the above becomes

C(5, 2) · 0.1^2 · 0.9^3 = 10 · 0.01 · 0.729 = 0.0729.

We shall generalize the last situation and introduce the so-called Bernoulli trials and the binomial distribution. Consider an experiment that has two outcomes, called success and failure. Let the probability of a success be p, while the probability of a failure is q = 1 − p. This experiment is called a Bernoulli trial. Let us repeat it n times. Many problems in probability can be solved by asking what the probability of k successes in n Bernoulli trials is. The last example can be viewed as five Bernoulli trials with a success being the generation of a zero.

Let us now consider n independent Bernoulli trials with the probability of a success equal to p. What is the probability of obtaining k successes? Since the trials are independent, a particular sequence of trials with k successes (and n − k failures) has probability p^k (1 − p)^(n−k). But we can choose the positions of the k successes out of n trials in C(n, k) ways; therefore, the probability of k successes in n independent Bernoulli trials is

C(n, k) p^k (1 − p)^(n−k).    (1)

Considered as a function of k, we call the above function the binomial distribution and denote it as

b(k; n, p) = C(n, k) p^k (1 − p)^(n−k).

Observe that (1) is a probability distribution since, by the definition of probability, it sums up to one. More precisely, by Newton's summation formula discussed in Module 5,

Σ_{k=0}^{n} C(n, k) p^k (1 − p)^(n−k) = (p + (1 − p))^n = 1^n = 1,

as needed.¹
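Formula (1) translates line by line into code. The helper below (added here as an illustration, not part of the original notes) evaluates the binomial distribution and checks both the p = 0.1 case above and the fact that the distribution sums to one:

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials, formula (1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two zeros among five bits when P(0) = 0.1:
print(binom_pmf(2, 5, 0.1))  # ≈ 0.0729
# The pmf sums to one, as Newton's formula guarantees:
print(isclose(sum(binom_pmf(k, 5, 0.1) for k in range(6)), 1.0))  # True
```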
Example 9: A biased coin is thrown four times. The probability of throwing a tail is 0.4. What is the probability of throwing three tails in the four trials?

Clearly, we have Bernoulli trials with a success being the throw of a tail. Hence the probability is equal to

C(4, 3) · (0.4)^3 · (0.6)^1 = 4 · 0.064 · 0.6 = 0.1536,

after substituting n = 4, k = 3, and p = 0.4 in (1).

Random Variables
Many problems are concerned with a numerical value associated with the outcome of an experiment. For example, we can assign the value 1 to a tail when throwing a coin, and the value 0 to a head. Such a numerical value assigned to an outcome is known as a random variable.

A random variable is a function from the sample space of an experiment to the set of real numbers.

Example 10: Let us flip a coin three times. Define a random variable X(t) to be the number of tails that appear when t is the outcome. We have

X(HHH) = 0,
X(HHT) = X(HTH) = X(THH) = 1,
X(HTT) = X(THT) = X(TTH) = 2,
X(TTT) = 3.

¹We recall that by Newton's formula

(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^(n−k).
Having defined a random variable, we can now introduce the probability mass function. Let

{X = t} = {s ∈ S : X(s) = t},

that is, {X = t} is the subset of S (an event) consisting of those outcomes to which X assigns the value t. Then

P(X = t) = Σ_{s : X(s) = t} P(s),

since {X = t} is a disjoint union of the elementary events s such that X(s) = t.

Let us now discuss an important notion of probability theory, namely, the "expected value" of an experiment. For example, one expects about 50 tails when flipping an unbiased coin 100 times. We are now in a position to define it precisely.

The expected value (also known as the mean value) E[X] of a random variable X defined over a sample space S is

E[X] = Σ_{s ∈ S} X(s) P(s) = Σ_t t · P(X = t).

The above formula extends the definition of the "average value" known from high school. Indeed, let all events {X = t} be equally likely, and assume that X takes the values t = 1, 2, ..., n, each with probability 1/n. We learned in high school to compute the average (expected value) as follows:

(1 + 2 + ... + n) / n = 1 · (1/n) + 2 · (1/n) + ... + n · (1/n) = Σ_t t · P(X = t),

which coincides with the above definition.
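The definition E[X] = Σ_s X(s) P(s) can be applied directly to the random variable of Example 10, assuming all eight outcomes of three fair coin flips are equally likely (a Python sketch added here, not part of the original notes):

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of three coin flips (fair coin assumed).
outcomes = list(product("HT", repeat=3))
p = Fraction(1, len(outcomes))
expected_tails = sum(outcome.count("T") * p for outcome in outcomes)
print(expected_tails)  # 3/2
```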
Example 11: We shall continue Example 10, assuming that the coin is fair (i.e., the probability of a head or a tail is 0.5). From the previous example we find that

P(X = 0) = 1/8,
P(X = 1) = 3/8,
P(X = 2) = 3/8,
P(X = 3) = 1/8,

since, for example, the outcomes satisfying X = 1 (i.e., the number of tails is equal to one) are HHT, HTH, THH; thus we have three out of 2^3 = 8 outcomes. Therefore,

E[X] = 0 · 1/8 + 1 · 3/8 + 2 · 3/8 + 3 · 1/8 = 12/8 = 1.5,

that is, on average we have 1.5 tails per three throws.

Let us now compute the expected value of the binomial distribution defined above. We define X as the number of successes in n Bernoulli trials.
Then²

E[X] = Σ_{k=0}^{n} k · b(k; n, p)
     = Σ_{k=0}^{n} k C(n, k) p^k (1 − p)^(n−k)
     = Σ_{k=1}^{n} n C(n−1, k−1) p^k (1 − p)^(n−k)
     = np Σ_{i=0}^{n−1} C(n−1, i) p^i (1 − p)^(n−1−i)
     = np (p + (1 − p))^(n−1)
     = np.

The first line is just the definition of the binomial distribution and the expected value (the second line spells it out; the term with k = 0 contributes nothing). In the third line we use the following property of the binomial coefficients (see Modules 4 and 6):

k C(n, k) = n C(n−1, k−1).

In the fourth line above we change the index of summation from k to i = k − 1 and factor out np, while in the fifth line we apply the Newton summation formula, discussed in Module 4, which we recall below:

(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^(n−k).

(In our case, a = p and b = 1 − p.)

Expectation has some nice properties. For example,

E[X + Y] = E[X] + E[Y];

that is, the expectation of a sum of random variables is the sum of the expectations. This is a very important result! Let us derive it. We have

E[X + Y] = Σ_{s ∈ S} (X(s) + Y(s)) P(s)
         = Σ_{s ∈ S} X(s) P(s) + Σ_{s ∈ S} Y(s) P(s)
         = E[X] + E[Y].

²This derivation is quite long and can be omitted in a first reading. We shall rederive the same result in Example 13 using simpler arguments.

Example 13: We just computed that E[X] = np for a binomially distributed X. We needed a long chain of computations, but we can prove the same result using the above property in a much easier way. Observe that

X = X1 + X2 + ... + Xn,

where Xi is equal to 1 when a success occurs in the i-th trial and 0 otherwise. Such a random variable is called a Bernoulli random variable or, more precisely, a Bernoulli distributed random variable. Clearly,

E[Xi] = 1 · p + 0 · (1 − p) = p.

Since the expectation of a sum of random variables is the sum of the expectations, we have

E[X] = E[X1] + E[X2] + ... + E[Xn] = np,

as before, but this time we derived it in a simple way.
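The identity E[X] = np for the binomial distribution can be checked numerically from the pmf (an added sketch, not part of the original derivation; n = 5 and p = 0.1 are arbitrary choices):

```python
from math import comb, isclose

n, p = 5, 0.1
# E[X] = sum over k of k * C(n, k) p^k (1-p)^(n-k)
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(isclose(mean, n * p))  # True
```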
However, in general, E[X · Y] is not equal to E[X] · E[Y]. To assure this equality one must assume that X and Y are independent, defined as follows:

Two random variables X and Y on the same sample space S are independent if

P(X = t and Y = r) = P(X = t) · P(Y = r)

for all values t and r.

Example 14: Let us roll two dice. What is the probability of getting a on the first die and b on the second die? Let X represent the number obtained on the first die and Y the number rolled on the second die. Since the events are independent, we have

P(X = a and Y = b) = P(X = a) · P(Y = b) = (1/6) · (1/6) = 1/36.

We now prove the following result.
Theorem 1. Let X and Y be independent random variables. Then

E[X · Y] = E[X] · E[Y].

Proof. We have

E[X · Y] = Σ_t Σ_r t · r · P(X = t and Y = r)
         = Σ_t Σ_r t · r · P(X = t) · P(Y = r)
         = (Σ_t t · P(X = t)) · (Σ_r r · P(Y = r))
         = E[X] · E[Y],

where in the second line we used independence, while in the third line we split the double sum into two independent sums.
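Theorem 1 can be illustrated with the two independent dice of Example 14, computed exactly (a sketch added to these notes, not part of the original text):

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
p = Fraction(1, 36)  # each ordered pair (x, y) is equally likely
e_xy = sum(x * y * p for x, y in product(die, die))  # E[X·Y]
e_x = Fraction(sum(die), 6)                          # E[X] = E[Y] = 7/2
print(e_xy, e_x * e_x)  # 49/4 49/4
```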
Finally, we shall discuss the variance. The expected value of a random variable tells us its average value but says nothing about its variability. The reader should not forget that X is a random variable and it (randomly) varies. While we would like to find one synthetic number (e.g., the expected value) to describe a random variable, such a characterization is usually very poor. Therefore, we try to introduce some parameters that can tell us (in a simplified way) more about the random variable. The variance, roughly speaking, determines how widely a random variable is distributed around the expected value. Formally:

Let X be a random variable defined on a sample space S. The variance of X, denoted as Var[X], is

Var[X] = Σ_{s ∈ S} (X(s) − E[X])^2 P(s) = E[(X − E[X])^2].

That is, the variance is the expected value of the following random variable: (X − E[X])^2. Since we expect that X is more likely to concentrate around E[X], the random variable (X − E[X])^2 tells us about the variations of X around the expected value. We can compute the variance using the following formula:

Var[X] = E[X^2] − (E[X])^2.    (2)

Indeed,

E[(X − E[X])^2] = E[X^2 − 2X · E[X] + (E[X])^2]
               = E[X^2] − 2E[X] · E[X] + (E[X])^2
               = E[X^2] − (E[X])^2,

where above we used the fact that the expected value of a sum of random variables is the sum of the expected values (note that E[X] is a constant), together with the following identity (let us call it the "square of sum identity"),

(a + b)^2 = a^2 + 2ab + b^2,

known from high school.
Example 15: Consider a Bernoulli random variable X taking the value 1 with probability p and zero otherwise. What is the variance of X?

We observe first that

E[X] = 1 · p + 0 · (1 − p) = p.

Then we compute

E[X^2] = 1^2 · p + 0^2 · (1 − p) = p.

Thus, a straightforward computation gives us

Var[X] = E[X^2] − (E[X])^2 = p − p^2 = p(1 − p) = p · q.

Unlike the expectation, the variance of a sum of two random variables is not, in general, the sum of the variances.
For this to hold, we need additional assumptions, as shown below.
Theorem 2. Let X and Y be independent random variables. Then

Var[X + Y] = Var[X] + Var[Y].

In general, if X1, X2, ..., Xn are pairwise independent random variables, then

Var[X1 + X2 + ... + Xn] = Var[X1] + Var[X2] + ... + Var[Xn].

Proof. From (2) we have

Var[X + Y] = E[(X + Y)^2] − (E[X + Y])^2.

But

E[(X + Y)^2] = E[X^2 + 2XY + Y^2] = E[X^2] + 2E[XY] + E[Y^2] = E[X^2] + 2E[X] · E[Y] + E[Y^2],

where in the second step we used the square of sum identity, and in the third step we applied the independence of X and Y (Theorem 1). Summing up, we obtain

Var[X + Y] = E[(X + Y)^2] − (E[X + Y])^2
           = E[X^2] + 2E[X]E[Y] + E[Y^2] − (E[X] + E[Y])^2
           = E[X^2] + 2E[X]E[Y] + E[Y^2] − (E[X])^2 − 2E[X]E[Y] − (E[Y])^2
           = (E[X^2] − (E[X])^2) + (E[Y^2] − (E[Y])^2)
           = Var[X] + Var[Y],

which completes the proof. In the first line we used the fact that Var[Z] = E[Z^2] − (E[Z])^2 (derived above), then we used again the square of sum identity, then we rearranged the terms of the sum, and finally obtained the desired identity.
Example 16: Let us compute the variance of the binomial distribution. We use the representation of the binomial distribution from Example 13, that is,

X = X1 + ... + Xn,

where the Xi are Bernoulli distributed with Var[Xi] = p(1 − p), as computed in Example 15. Therefore, by the last theorem,

Var[X] = Var[X1 + ... + Xn] = Var[X1] + ... + Var[Xn] = np(1 − p).

That is, the variance of a sum of (pairwise independent) Bernoulli distributed random variables is the sum of the variances of the individual random variables, and it is equal to np(1 − p).
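As a final numerical check (added here, not part of the original notes; n = 10 and p = 0.3 are arbitrary), formula (2) applied to the binomial pmf reproduces Var[X] = np(1 − p):

```python
from math import comb, isclose

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2  # formula (2)
print(isclose(var, n * p * (1 - p)))  # True
```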
Fall '08, W. Szpankowski