Module 7: Discrete Probability

Theme 1: Elementary Probability Theory

Probability is usually associated with an outcome of an experiment. For example, the experiment may be a throw of a coin, while the two possible outcomes are "heads" and "tails". Roughly speaking, probability estimates our chance that the next outcome of this experiment is either a head or a tail (here we assume that head and tail are equally likely, that is, the probability of tossing a head or a tail is equal to $1/2$, or 50%).

An experiment is a procedure that gives a set of possible outcomes. The set of all possible outcomes is called the sample space (e.g., in the experiment with a coin, the sample space is $S = \{\text{head}, \text{tail}\}$). Finally, an event is a subset of the sample space (e.g., a head).

When there is a finite number of equally likely outcomes, Laplace suggested the following definition of probability: the probability of an event $E$ (which is a subset of a finite sample space $S$ of equally likely outcomes) is
$$P(E) = \frac{|E|}{|S|},$$
where $|E|$ and $|S|$ are the cardinalities of the sets $E$ and $S$, respectively. We often call the events in $E$ the favorable events, while the events in $S$ are all possible events.

Example 1: A box has 5 black balls and 7 green balls. What is the probability of selecting a green ball?

The sample space $S$ consists of $12$ balls. The event $E$ = "select a green ball" has seven elements. Therefore $P(E) = 7/12$.

Example 2: Let two dice be rolled (we recall that a die has six sides, and each side has one, two, ..., or six dots). What is the probability that the sum of the numbers on the two dice is $11$?

Let us first build the sample space $S$. It consists of pairs $(i, j)$ where $1 \le i, j \le 6$ (since every die has six outcomes, two dice together have $6 \cdot 6 = 36$ outcomes). The event $E$ = "sum is equal to 11" consists of $(5,6)$ and $(6,5)$; therefore, $P(E) = 2/36 = 1/18$.

The counting problems encountered so far were very simple. Consider now the following problem.

Example 3: Find the probability that a hand of five cards in poker contains four cards of one kind.

We recall that there are $52$ cards in a deck; there are $13$ different kinds of cards, with 4 cards of each kind. These kinds are two's, three's, ..., tens, jacks, queens, kings, and aces. There are also four suits: spades, clubs, hearts, and diamonds. The number of ways to choose $5$ cards out of $52$ is $C(52,5)$ (which is a large number). This is the cardinality of the sample space. Let us now consider the event $E$ that a hand has four cards of one kind. By the multiplication rule, the number of hands of five cards with four cards of the same kind is the number $C(13,1)$ of ways to pick one kind (in words, one out of the $13$) times the number of ways to pick the fifth card, which is $C(48,1)$. Therefore, by the above definition,
$$P(E) = \frac{C(13,1) \cdot C(48,1)}{C(52,5)} = \frac{13 \cdot 48}{C(52,5)} \approx 0.00024,$$
since there are $C(52,5)$ possible outcomes and $C(13,1) \cdot C(48,1)$ "favorable" outcomes for $E$.

Sometimes we know the probabilities of events $E_1$ and $E_2$ and need to know the probability of combinations of events such as $E_1 \cup E_2$ (i.e., at least one event occurs), $E_1 \cap E_2$ (both events must occur), or $\bar{E}$ (event $E$ does not occur). Let us start with the probability of the complementary event $\bar{E} = S - E$. We claim that
$$P(\bar{E}) = 1 - P(E).$$
Indeed, since $|\bar{E}| = |S| - |E|$, we obtain
$$P(\bar{E}) = \frac{|\bar{E}|}{|S|} = \frac{|S| - |E|}{|S|} = 1 - \frac{|E|}{|S|} = 1 - P(E).$$
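The Laplace counts above are easy to check mechanically. Below is a small Python sketch (added here for illustration, not part of the original notes) that recomputes Examples 1–3 with exact fractions; `math.comb` plays the role of $C(n,k)$.

```python
from fractions import Fraction
from math import comb

def laplace(favorable: int, total: int) -> Fraction:
    """Laplace probability |E| / |S| for equally likely outcomes."""
    return Fraction(favorable, total)

# Example 1: 7 green balls out of 12
print(laplace(7, 12))                              # 7/12

# Example 2: sum of two dice equals 11
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
favorable = [pair for pair in outcomes if sum(pair) == 11]
print(laplace(len(favorable), len(outcomes)))      # 1/18

# Example 3: poker hand with four cards of one kind
p = laplace(comb(13, 1) * comb(48, 1), comb(52, 5))
print(p, float(p))                                 # 13*48 / C(52,5), about 0.00024
```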
Example 4: What is the probability that among five randomly generated bits at least one is $0$?

This is exactly the case when the complement $\bar{E}$ is easier to compute than $E$ itself. In this case there are $2^5 = 32$ possible binary strings of length five, and only one of them (namely $11111$) contains no $0$. Hence, with $\bar{E}$ = "all bits are 1", we find
$$P(E) = 1 - P(\bar{E}) = 1 - \frac{1}{32} = \frac{31}{32},$$
since there are $2^5$ binary strings of length five and there is only one string with all $1$s.

Let us now compute $P(E_1 \cup E_2)$. From previous modules we know that
$$|E_1 \cup E_2| = |E_1| + |E_2| - |E_1 \cap E_2|;$$
therefore, by the definition of probability,
$$P(E_1 \cup E_2) = \frac{|E_1 \cup E_2|}{|S|} = \frac{|E_1| + |E_2| - |E_1 \cap E_2|}{|S|} = P(E_1) + P(E_2) - P(E_1 \cap E_2).$$
In summary, we proved that
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2).$$
In words, the probability of the union of two events is the sum of the probabilities of both events minus the probability of the intersection of the events. When the events are disjoint (i.e., $E_1 \cap E_2 = \emptyset$), then $P(E_1 \cup E_2) = P(E_1) + P(E_2)$.

Example 5: What is the probability that a randomly selected positive integer smaller than or equal to $100$ is divisible either by $2$ or by $5$?

Let $E_1$ be the event that the integer is divisible by $2$, and $E_2$ the event that the integer is divisible by $5$. Clearly $|E_1| = 50$ and $|E_2| = 20$. Observe that the event we are looking for is $E_1 \cup E_2$. In order to compute it we need $|E_1 \cap E_2| = 10$, since there are ten numbers in the range $1$ to $100$ that are divisible by $10$. Therefore, by the definition of probability we have
$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = \frac{50}{100} + \frac{20}{100} - \frac{10}{100} = \frac{3}{5}.$$

Exercise 7A: What is the probability of generating a given binary string of length seven, provided $0$ and $1$ are equally likely?

Theme 2: Probability Theory

In the previous section we assumed that all outcomes of the sample space $S$ are equally likely. This led us to the Laplace definition of probability. Here we generalize it.

Let $S$ be a sample space. Throughout, we assume that $S$ is finite, and often we just list all of its outcomes (e.g., $S = \{s_1, \ldots, s_n\}$). Any subset $A \subseteq S$ will be called an event. We now define probability $P$ as a function from the set of subsets of $S$ into the interval $[0,1]$ such that the following three properties hold (below $P(A)$ denotes the probability of the event $A$):

1. $P(A) \ge 0$;
2. $P(S) = 1$;
3. if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.

The above three properties say that the probability of any event must be nonnegative, that the probability of a "sure" event (i.e., $S$) is equal to one, and finally that the probability of the union of disjoint events is the sum of the probabilities of the corresponding events.

Using these three assumptions one can prove many properties of probability (that we already encountered in the previous section). For example, let $\bar{A} = S - A$ be the complementary event to $A$ (that is, $\bar{A}$ is the same as "not $A$"). We have
$$P(\bar{A}) = 1 - P(A).$$
Indeed, observe that $S = A \cup \bar{A}$, and $A$ and $\bar{A}$ are disjoint, hence by property 3 we find
$$1 = P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A}),$$
which proves our claim that $P(\bar{A}) = 1 - P(A)$. By the way, as a corollary we see that
$$P(\emptyset) = P(\bar{S}) = 1 - P(S) = 0.$$

Let now all outcomes in $S = \{s_1, \ldots, s_n\}$ be equally likely, that is,
$$P(s_i) = \frac{1}{n},$$
since by the second property above we have $P(s_1) + \cdots + P(s_n) = 1$ (all elementary probabilities sum up to one). Let now $A = \{s_{i_1}, s_{i_2}, \ldots, s_{i_k}\}$, that is, $|A| = k$. By the third property of the probability definition and the above we have
$$P(A) = P(s_{i_1} \cup s_{i_2} \cup \cdots \cup s_{i_k}) = P(s_{i_1}) + P(s_{i_2}) + \cdots + P(s_{i_k}) = \frac{k}{n} = \frac{|A|}{|S|}.$$
In the above we first observe that the event $A$ is a union of the elementary events $s_{i_1}, s_{i_2}, \ldots, s_{i_k}$. All elementary events are disjoint, hence we can sum their probabilities, as the second step shows. Finally, since every outcome is equally likely and there are $n$ outcomes, we have $P(s_{i_j}) = 1/n$. We have just recovered Laplace's definition of probability for equally likely outcomes.
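As a sanity check, the sketch below (an added illustration, not from the original notes) builds the equally likely probability $P(A) = |A|/|S|$ on the sample space of Example 5 and verifies the three axioms, the complement rule, and the inclusion–exclusion rule by brute force.

```python
from fractions import Fraction

S = set(range(1, 101))                       # sample space of Example 5

def P(A: set) -> Fraction:
    """Equally likely outcomes: P(A) = |A| / |S|."""
    return Fraction(len(A & S), len(S))

E1 = {k for k in S if k % 2 == 0}            # divisible by 2
E2 = {k for k in S if k % 5 == 0}            # divisible by 5

# Inclusion-exclusion: P(E1 u E2) = P(E1) + P(E2) - P(E1 n E2) = 3/5
assert P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2) == Fraction(3, 5)

# Complement rule: P(not E1) = 1 - P(E1)
assert P(S - E1) == 1 - P(E1)

# The three axioms on this space
assert P(S) == 1 and P(set()) == 0
assert P(E1 | (S - E1)) == P(E1) + P(S - E1)   # disjoint events add
print("all checks pass")
```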
Example 6: Find the probability that a randomly selected $k$-digit decimal number is also a valid octal number, i.e., one whose digits are between $0$ and $7$.

First, a $k$-digit number can be represented as $(x_1 x_2 \ldots x_k)$, where $x_i \in \{0, 1, \ldots, 9\}$ if the number is decimal and $x_i \in \{0, 1, \ldots, 7\}$ if the number is octal. The number of decimal numbers of length $k$ is $10^k$ (just apply the multiplication rule). The number of valid octal numbers of length $k$ is $8^k$. Therefore, the probability is $8^k / 10^k = 0.8^k$.

Conditional Probability

Consider the case when you know that event $B$ has occurred, and knowing this you want to compute the probability of event $A$. This is known as the conditional probability and is denoted $P(A \mid B)$.

Example 7: There are five black balls and ten green balls in a box. You select randomly a ball, and it happens to be a green ball. You do not return this ball to the box. What is the probability that in the second selection you pick up a green ball?

If $A$ is the event of selecting a green ball in the first pick, and $B$ is the event of choosing another green ball in the second pick, then the probability we are seeking is denoted $P(B \mid A)$. In our case it is
$$P(B \mid A) = \frac{9}{14},$$
since after the first selection there are only nine green balls in the box containing $14$ balls. (Here we used explicitly the fact that after picking a green ball there are only $14$ balls left, with $9$ green balls among them.)

We can compute this probability in a different way. Observe that $P(A) = 10/15$, since $|S| = 15$ and $|A| = 10$. Let us now compute the probability of $A \cap B$. Event $A \cap B$ can occur in $10 \cdot 9$ ways out of $15 \cdot 14$, since one ball was already taken out of the box in the first pick. Hence
$$P(A \cap B) = \frac{10 \cdot 9}{15 \cdot 14},$$
and then we "define" (see below for additional explanations) the conditional probability $P(B \mid A)$ as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{10 \cdot 9}{15 \cdot 14} \cdot \frac{15}{10} = \frac{9}{14}.$$
Thus we obtain the same result as the one computed directly. It suggests a definition of conditional probability that we shall discuss next.

Let us generalize this example. Consider a sample space $S$ and two events $A$ and $B$. Assume event $B$ has occurred. Then the sample space effectively reduces to $B$. In a sense, the occurrence of event $B$ becomes the new sample space, and we must restrict $A$ to those outcomes that fall into $B$. In other words, the number of "favorable outcomes" is not $|A|$ but $|A \cap B|$. Therefore, for equally likely outcomes we compute $P(A \mid B)$ as follows:
$$P(A \mid B) = \frac{|A \cap B|}{|B|}.$$
Observe, however, that
$$P(A \mid B) = \frac{|A \cap B|}{|B|} = \frac{|A \cap B|}{|S|} \cdot \frac{|S|}{|B|} = \frac{P(A \cap B)}{P(B)}.$$
In the second step above we multiply and divide by $|S|$, and then observe in the third step that we obtain the probabilities $P(A \cap B)$ and $P(B)$.

Actually, the last expression is used as the definition of conditional probability. Let $A$ and $B$ be events with $P(B) > 0$. The conditional probability of $A$ given $B$, denoted $P(A \mid B)$, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Example 8: A box contains $5000$ chips, $1000$ of them made by company $X$ and the rest by company $Y$. It is known that $10\%$ of the chips made by company $X$ are defective, while only $5\%$ of the chips made by company $Y$ are defective, so $100$ chips made by company $X$ and $200$ chips made by company $Y$ are defective. Compute the probability that if you pick up a defective chip, it comes from company $X$.

Let $A$ be the event that a chip is made by company $X$ and $B$ the event that a chip is defective. We need to find $P(A \mid B)$, that is, the probability that, provided a chip is defective, it comes from company $X$. For this we need $P(B)$ and $P(A \cap B)$. But
$$P(B) = \frac{100 + 200}{5000} = 0.06, \qquad P(A \cap B) = \frac{100}{5000} = 0.02.$$
Then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.02}{0.06} = \frac{1}{3},$$
that is, one out of every three defective chips comes from company $X$.
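The two ways of computing $P(B \mid A)$ in Example 7, and the chip computation in Example 8, can be reproduced with exact fractions. The following sketch is illustrative only; the variable names are mine, not from the notes.

```python
from fractions import Fraction
from itertools import permutations

# Example 7: 5 black (B) and 10 green (G) balls, two picks without replacement.
balls = ["G"] * 10 + ["B"] * 5
pairs = list(permutations(range(15), 2))        # ordered pairs of distinct balls

A = [(i, j) for i, j in pairs if balls[i] == "G"]                       # first pick green
AB = [(i, j) for i, j in pairs if balls[i] == "G" and balls[j] == "G"]  # both picks green

P_A = Fraction(len(A), len(pairs))
P_AB = Fraction(len(AB), len(pairs))
print(P_AB / P_A)                               # P(B|A) = 9/14

# Example 8: 5000 chips, 1000 from X (100 defective), 4000 from Y (200 defective).
P_defective = Fraction(100 + 200, 5000)
P_X_and_defective = Fraction(100, 5000)
print(P_X_and_defective / P_defective)          # P(X | defective) = 1/3
```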
Independence

If $P(A \mid B) = P(A)$, then the knowledge of $B$ does not change the probability of $A$. We say that $A$ and $B$ are independent events. Observe that the above condition is equivalent to $P(A \cap B) = P(A)P(B)$, which serves as a definition.

Two events $A$ and $B$ are said to be independent if and only if
$$P(A \cap B) = P(A) \, P(B).$$

Example 8: Consider a five-bit binary string. The probability of generating a zero is equal to $p$. Bits are generated independently. What is the probability of getting $00111$?

Since we have independence, we easily compute
$$P(00111) = P(0) \cdot P(0) \cdot P(1) \cdot P(1) \cdot P(1) = p^2 (1-p)^3,$$
since $1 - p$ is the probability of generating a one.

Exercise 7B: Show that if $A$ and $B$ are independent events, then $\bar{A}$ and $\bar{B}$ are also independent events.

Binomial Distribution and Bernoulli Trials

In the last example, we generated five bits and asked for the probability of getting $00111$. However, if we ask for the probability of generating two $0$s and three $1$s, the situation is different. This time we do not specify where the two $0$s and three $1$s are located. Therefore, strings like $01011$, $11001$, etc. satisfy the description of the event. In fact, we have $C(5,2) = C(5,3)$ ways to select the positions of the two zeros out of five. Thus this probability is equal to
$$C(5,2)\, p^2 (1-p)^3 = 10\, p^2 (1-p)^3,$$
and this should be compared with the answer to the previous example. For instance, if $p = 0.1$, then the above becomes
$$C(5,2)\, (0.1)^2 \cdot (0.9)^3 = 10 \cdot 0.01 \cdot 0.729 = 0.0729.$$

We shall generalize the last situation and introduce the so-called Bernoulli trials and the binomial distribution. Consider an experiment that has two outcomes, called success and failure. Let the probability of a success be $p$, while the probability of a failure is $q = 1 - p$. This experiment is called a Bernoulli trial. Let us repeat it $n$ times. Many problems in probability can be solved by asking what the probability of $k$ successes in $n$ Bernoulli trials is. The last example can be viewed as five Bernoulli trials with a success being the generation of a zero.

Let us now consider $n$ independent Bernoulli trials with the probability of a success equal to $p$. What is the probability of obtaining $k$ successes? Since the outcomes are independent, a particular sequence of $n$ trials with $k$ successes has probability $p^k (1-p)^{n-k}$. But we can choose the positions of the $k$ successes out of $n$ trials in $C(n,k)$ ways; therefore, the probability of $k$ successes in $n$ independent Bernoulli trials is

$$C(n,k)\, p^k (1-p)^{n-k}. \qquad (1)$$

Considered as a function of $k$, we call the above function the binomial distribution and denote it as
$$b(k; n, p) = C(n,k)\, p^k (1-p)^{n-k}.$$
Observe that (1) is a probability since, as required by the definition of probability, it sums up to one. More precisely, by Newton's summation formula discussed in Module 5,
$$\sum_{k=0}^{n} C(n,k)\, p^k (1-p)^{n-k} = (p + 1 - p)^n = 1^n = 1,$$
as needed.¹

¹ We recall that by Newton's formula $(a+b)^n = \sum_{k=0}^{n} C(n,k)\, a^k b^{n-k}$.

Example 9: A biased coin is thrown four times. The probability of throwing a tail is $0.4$. What is the probability of throwing three tails in four trials?

Clearly, we have Bernoulli trials with a success being the throw of a tail. Hence, the probability is equal to
$$C(4,3)\, (0.4)^3 \cdot 0.6 = 0.1536,$$
after substituting $n = 4$, $k = 3$, and $p = 0.4$ in (1).
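A short sketch of formula (1) may help here (added for illustration; the helper name `binom_pmf` is mine, not from the notes). It reproduces the $0.0729$ and $0.1536$ values above and checks that the binomial probabilities sum to one.

```python
from math import comb, isclose

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials, formula (1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two 0s and three 1s among five bits, success = generating a 0, p = 0.1
print(binom_pmf(2, 5, 0.1))          # 0.0729

# Example 9: three tails in four throws, p(tail) = 0.4
print(binom_pmf(3, 4, 0.4))          # 0.1536

# The binomial probabilities sum to one (Newton's formula)
assert isclose(sum(binom_pmf(k, 10, 0.3) for k in range(11)), 1.0)
```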
Random Variables

Many problems are concerned with a numerical value associated with the outcome of an experiment. For example, we can assign value $1$ to a tail when throwing a coin, and value $0$ when throwing a head. Such a numerical value assigned to an outcome is known as a random variable.

A random variable is a function from the sample space of an experiment to the set of real numbers.

Example 10: Let us flip a coin three times. Define a random variable $X(t)$ to be the number of tails that appear when $t$ is the outcome.

We have
$$X(HHH) = 0,$$
$$X(HHT) = X(HTH) = X(THH) = 1,$$
$$X(HTT) = X(THT) = X(TTH) = 2,$$
$$X(TTT) = 3.$$

Having defined a random variable, we can now introduce the probability mass function. Let
$$\{X = t\} = \{s \in S : X(s) = t\},$$
that is, $\{X = t\}$ is the subset of $S$ (an event) consisting of those outcomes $s$ that are assigned the value $t$ by $X$. Then
$$P(X = t) = \sum_{s : X(s) = t} P(s),$$
since $\{X = t\}$ is a disjoint union of the elementary events $s$ such that $X(s) = t$.

Let us now discuss an important notion of probability theory, namely the "expected value" of an experiment. For example, one expects about $50$ tails when flipping an unbiased coin $100$ times. We are now in a position to define it precisely.

The expected value (also known as the mean value) of a random variable $X(s)$ over $s \in S$ is defined as
$$E[X] = \sum_{s \in S} X(s) P(s) = \sum_{t} t \, P(X = t).$$

The above formula extends the definition of "average value" known from high school. Indeed, let all events $\{X = t\}$ be equally likely, and assume that $t = 1, 2, \ldots, n$. We learned in high school to compute the average (expected value) as follows:
$$\frac{1 + 2 + \cdots + n}{n} = 1 \cdot \frac{1}{n} + 2 \cdot \frac{1}{n} + \cdots + n \cdot \frac{1}{n} = \sum_{t} t \, P(X = t),$$
which coincides with the above definition.

Example 11: We shall continue Example 10 assuming that the coin is fair (i.e., the probability of a head or a tail is $0.5$). From the previous example we find that
$$P(X = 0) = P(X = 3) = \frac{1}{8}, \qquad P(X = 1) = P(X = 2) = \frac{3}{8},$$
since, for example, $\{X = 1\} = \{HHT, HTH, THH\}$; thus we have three outcomes out of $2^3 = 8$ satisfying $X = 1$ (i.e., the number of tails is equal to one). Therefore,
$$E[X] = 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = 1.5,$$
that is, on average we have $1.5$ tails per three throws.

Let us now compute the expected value of the binomial distribution defined above. We define $X$ as the number of successes in $n$ Bernoulli trials. Then²
$$E[X] = \sum_{k=0}^{n} k \, C(n,k)\, p^k (1-p)^{n-k}
       = \sum_{k=1}^{n} k \, C(n,k)\, p^k (1-p)^{n-k}
       = \sum_{k=1}^{n} n \, C(n-1, k-1)\, p^k (1-p)^{n-k}$$
$$    = np \sum_{k=1}^{n} C(n-1, k-1)\, p^{k-1} (1-p)^{(n-1)-(k-1)}
       = np \sum_{i=0}^{n-1} C(n-1, i)\, p^{i} (1-p)^{n-1-i}
       = np\,(p + 1 - p)^{n-1} = np.$$
The first line is just the definition of the binomial distribution and the expected value. In the third step we use the following property of the binomial coefficients (see Modules 4 and 6):
$$k \, C(n,k) = k \, \frac{n!}{k!\,(n-k)!} = n \, \frac{(n-1)!}{(k-1)!\,(n-k)!} = n \, C(n-1, k-1).$$
In the next step we change the index of summation from $k$ to $i = k - 1$, while in the last step we apply the Newton summation formula, discussed in Module 4, which we recall below:
$$(a + b)^n = \sum_{k=0}^{n} C(n,k)\, a^k b^{n-k}.$$
(In our case, $a = p$ and $b = 1 - p$.)

² This derivation is quite long and can be omitted in a first reading. We shall re-derive the same result in Example 13 using simpler arguments.

Expectation has some nice properties. For example,
$$E[X + Y] = E[X] + E[Y],$$
that is, the expectation of a sum of random variables is the sum of the expectations. This is a very important result! Let us derive it. We have
$$E[X + Y] = \sum_{s \in S} \left( X(s) + Y(s) \right) P(s) = \sum_{s \in S} X(s) P(s) + \sum_{s \in S} Y(s) P(s) = E[X] + E[Y].$$

Example 13: We just computed that $E[X] = np$ for binomially distributed $X$. We needed a long chain of computations. But we can prove the same result using the above property in a much easier way. Observe that $X = X_1 + X_2 + \cdots + X_n$, where $X_i$ is equal to $1$ when a success occurs in the $i$-th trial and $0$ otherwise. Such a random variable is called a Bernoulli random variable or, more precisely, a Bernoulli distributed random variable. Clearly,
$$E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p.$$
Since the expectation of a sum of random variables is the sum of expectations, we have
$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np,$$
as before, but this time we derived it in a simple way.
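The expected values above are easy to confirm numerically. The snippet below (illustrative only, with names of my choosing) computes $E[X]$ for Example 11 directly from the sample space and checks $E[X] = np$ for the binomial distribution by summing $k \cdot b(k; n, p)$.

```python
from fractions import Fraction
from itertools import product
from math import comb, isclose

# Example 11: number of tails in three fair coin flips.
outcomes = list(product("HT", repeat=3))        # 8 equally likely outcomes
expectation = sum(Fraction(o.count("T"), len(outcomes)) for o in outcomes)
print(expectation)                              # 3/2, i.e. 1.5 tails on average

# Expected value of the binomial distribution equals n*p.
n, p = 10, 0.3
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
assert isclose(mean, n * p)
print(mean)                                     # 3.0
```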
However, in general, $E[XY]$ is not equal to $E[X] \cdot E[Y]$. To assure that $E[XY] = E[X] \cdot E[Y]$ one must assume that $X$ and $Y$ are independent, defined as follows.

Two random variables $X$ and $Y$ on the same sample space $S$ are independent if
$$P(X(s) = r_1 \text{ and } Y(s) = r_2) = P(X(s) = r_1) \cdot P(Y(s) = r_2).$$

Example 14: Let us roll two dice. What is the probability of getting $i$ on the first die and $j$ on the second die? Let $X$ represent the number obtained on the first die and $Y$ the number rolled on the second die. Since the events are independent, we have
$$P(X = i, Y = j) = P(X = i) \cdot P(Y = j) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}.$$

We now prove the following result.

Theorem 1. Let $X$ and $Y$ be independent random variables. Then
$$E[XY] = E[X] \cdot E[Y].$$

Proof. We have
$$E[XY] = \sum_{t} \sum_{r} t r \, P(X = t, Y = r)
        = \sum_{t} \sum_{r} t r \, P(X = t) \, P(Y = r)
        = \left( \sum_{t} t \, P(X = t) \right) \left( \sum_{r} r \, P(Y = r) \right)
        = E[X] \cdot E[Y],$$
where in the second step we used independence, while in the third step we split the expression into two independent sums.

Finally, we shall discuss the variance. The expected value of a random variable tells us its average value but says nothing about its variability. The reader should not forget that $X$ is a random variable and it (randomly) varies. While we would like to find one synthetic number (e.g., the expected value) to describe this random variable, such a characterization is usually very poor. Therefore, we try to introduce some parameters that can tell us (in a simplified way) more about the random variable. The variance, roughly speaking, determines how widely a random variable is spread around its expected value. Formally:

Let $X$ be a random variable defined on a sample space $S$. The variance of $X$, denoted $Var[X]$, is
$$Var[X] = \sum_{s \in S} \left( X(s) - E[X] \right)^2 P(s) = E\!\left[ (X - E[X])^2 \right].$$

That is, the variance is the expected value of the random variable $(X - E[X])^2$. Since we expect that $X$ is more likely to concentrate around $E[X]$, the random variable $(X - E[X])^2$ tells us about the variations of $X$ around the expected value.

We can compute the variance using the following formula:
$$Var[X] = E[X^2] - (E[X])^2. \qquad (2)$$
Indeed,
$$E\!\left[ (X - E[X])^2 \right] = E\!\left[ X^2 - 2 X E[X] + (E[X])^2 \right] = E[X^2] - 2 E[X] E[X] + (E[X])^2 = E[X^2] - (E[X])^2,$$
where above we used the fact that the expected value of a sum of random variables is the sum of the expected values, and the following identity (let's call it the "square of sum identity"), known from high school:
$$(a + b)^2 = a^2 + 2ab + b^2.$$

Example 15: Consider a Bernoulli random variable $X$ taking value $1$ with probability $p$ and zero otherwise. What is the variance of $X$?

We observe first that
$$E[X] = 1 \cdot p + 0 \cdot (1 - p) = p.$$
Then we compute
$$E[X^2] = 1^2 \cdot p + 0^2 \cdot (1 - p) = p.$$
Thus, a straightforward computation gives us
$$Var[X] = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) = p \cdot q.$$

Unlike the expectation, the variance of a sum of two random variables is not in general the sum of the variances. For this to hold, we need additional assumptions, as shown below.

Theorem 2. Let $X$ and $Y$ be independent random variables. Then
$$Var[X + Y] = Var[X] + Var[Y].$$
In general, if $X_1, X_2, \ldots, X_n$ are pairwise independent random variables, then
$$Var[X_1 + X_2 + \cdots + X_n] = Var[X_1] + Var[X_2] + \cdots + Var[X_n].$$

Proof. From (2) we have
$$Var[X + Y] = E\!\left[ (X + Y)^2 \right] - \left( E[X + Y] \right)^2.$$
But
$$E\!\left[ (X + Y)^2 \right] = E\!\left[ X^2 + 2XY + Y^2 \right] = E[X^2] + 2 E[XY] + E[Y^2] = E[X^2] + 2 E[X] E[Y] + E[Y^2],$$
where in the second step we use the square of sum identity and in the third step we apply the independence of $X$ and $Y$ (through Theorem 1). Summing up, we obtain
$$Var[X + Y] = E\!\left[ (X + Y)^2 \right] - \left( E[X + Y] \right)^2
            = E[X^2] + 2 E[X] E[Y] + E[Y^2] - \left( E[X] + E[Y] \right)^2$$
$$          = E[X^2] + 2 E[X] E[Y] + E[Y^2] - (E[X])^2 - 2 E[X] E[Y] - (E[Y])^2
            = \left( E[X^2] - (E[X])^2 \right) + \left( E[Y^2] - (E[Y])^2 \right)
            = Var[X] + Var[Y],$$
which completes the proof. In the first line we use the fact that $Var[Z] = E[Z^2] - (E[Z])^2$ (derived above), then we use again the square of sum identity, then we rearrange the terms of the sum, and finally obtain the desired identity.
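Formula (2) and Theorem 2 can be checked on a small example. The sketch below (added for illustration) enumerates the 36 equally likely outcomes of two dice, computes the variance of each die and of their sum via $Var[X] = E[X^2] - (E[X])^2$, and confirms that the variances add for these independent random variables.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
P = Fraction(1, len(outcomes))

def var(f):
    """Var[f] = E[f^2] - (E[f])^2 over the two-dice sample space."""
    mean = sum(f(s) * P for s in outcomes)
    mean_sq = sum(f(s) ** 2 * P for s in outcomes)
    return mean_sq - mean ** 2

X = lambda s: s[0]            # number on the first die
Y = lambda s: s[1]            # number on the second die
Z = lambda s: s[0] + s[1]     # their sum

assert var(Z) == var(X) + var(Y)    # Theorem 2: X and Y are independent
print(var(X), var(Z))               # 35/12 and 35/6
```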
Example 16: Let us compute the variance of the binomial distribution. We use the representation of the binomial distribution from Example 13, that is, $X = X_1 + \cdots + X_n$, where the $X_i$ are Bernoulli distributed with $Var[X_i] = p(1-p)$, as computed in Example 15. Therefore, by the last theorem,
$$Var[X] = Var[X_1 + \cdots + X_n] = Var[X_1] + \cdots + Var[X_n] = np(1-p).$$
That is, the variance of the sum of Bernoulli distributed random variables is the sum of the variances of the individual random variables, and it is equal to $np(1-p)$.
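As a final check (again an added illustration, not part of the notes), one can compute the mean and variance of the binomial distribution directly from formula (1) and compare them with $np$ and $np(1-p)$.

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    # Formula (1): probability of k successes in n Bernoulli(p) trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.25
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
second_moment = sum(k**2 * binom_pmf(k, n, p) for k in range(n + 1))
variance = second_moment - mean**2

assert isclose(mean, n * p)                  # E[X] = np
assert isclose(variance, n * p * (1 - p))    # Var[X] = np(1-p)
print(mean, variance)                        # 3.0 2.25
```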