CHAPTER 4: PROBABILITY & PROBABILITY DISTRIBUTIONS
4.1 HOW PROBABILITY CAN BE USED IN MAKING INFERENCES
We stated in Chapter 1 that a scientist uses inferential statistics to make statements about a population based on information contained in a sample. Because such statements or decisions are made under conditions of uncertainty, the use of probability concepts is required. Probability concepts help us make better decisions in the face of uncertainty, and allow us to assess the likelihood of an event happening.

Graphical and numerical descriptive techniques were presented in Chapter 3 as a means to summarize and describe a sample. However, a sample is not identical to the population from which it was selected. We need to assess how accurately the sample mean, sample standard deviation, or sample proportion represents the corresponding population value.

Suppose a company states in its promotional materials that its pregnancy test provides correct results in 75% of its applications by pregnant women. We want to evaluate the claim, and so we select 20 women who have been determined by their physicians, using the best possible testing procedures, to be pregnant. The test is taken by each of the 20 women. For all of these 20 women the test result is negative, indicating that none of the 20 women is pregnant. What do you conclude about the company's claim on the reliability of its test? Suppose you are further assured that each of the 20 women was in fact pregnant, as was determined several months after the test was taken. At what point do we decide that the result of the observed sample is so improbable, assuming the company's claim is correct, that we disagree with its claim? To answer this question, we must know how to find the probability of obtaining a particular sample outcome. Knowing this probability, we can then determine whether we agree or disagree with the company's claim.

Probability is the tool that enables us to make inferences. In other words, probability is the basis of inferential statistics. For example, predictions are based on probability, and hypotheses are tested by using probability.
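As a rough illustration of the pregnancy-test question, the following Python sketch computes how likely it would be for all 20 results to be wrong if the company's 75% claim were true. It assumes that the 20 test results are independent of one another, which the text does not state explicitly; the variable names are illustrative only.

```python
# Probability that all 20 tests give a wrong (negative) result,
# assuming the company's claim of 75% correct results is true.
# ASSUMPTION (not stated in the text): the 20 results are independent,
# so the individual probabilities can be multiplied.
p_correct = 0.75
n_women = 20

p_wrong = 1 - p_correct            # chance that a single test is wrong
p_all_wrong = p_wrong ** n_women   # chance that all 20 tests are wrong

print(f"Pr(all {n_women} results wrong) = {p_all_wrong:.3e}")
```

A probability this small suggests that either an extremely rare sample was drawn or the 75% claim is not credible, which is the kind of reasoning the rest of the chapter makes precise.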
Dr. LOHAKA QBA 2302 Chapter 4: PROBABILITY & PROBABILITY DISTRIBUTIONS Page 129
4.2 PROBABILITY, PROBABILITY EXPERIMENT, SAMPLE SPACE, EVENT, and PROBABILITY OF AN EVENT

DEFINITION 4.1: PROBABILITY
Probability is a value between zero and one, inclusive, describing the relative possibility (chance or likelihood) that something will happen. In practical terms, the probability of an outcome is a number between 0 and 1 that describes the long-run proportion of the time that outcome occurs when the experiment is performed repeatedly under identical conditions.

DEFINITION 4.2: PROBABILITY EXPERIMENT
A probability experiment is any planned act or process by which observations are made and/or data are collected. A probability experiment produces exactly one observable outcome at a time; that outcome is due to chance and, as such, cannot be predicted with certainty.

DEFINITION 4.3: SAMPLE POINT
A sample point is the most basic outcome of a probability experiment.

EXAMPLE 4.1
Problem: A well-balanced die is tossed once, and its up face is recorded.
(A) Is this a probability experiment? Explain.
(B) If so, list all the sample points for this experiment.
Solution:
(A) Yes, this is a probability experiment. We can observe one and only one of the basic outcomes, and the outcome cannot be predicted with certainty.
(B) The six basic possible outcomes (sample points) of this probability experiment are as follows:
1. Observe a 1.
2. Observe a 2.
3. Observe a 3.
4. Observe a 4.
5. Observe a 5.
6. Observe a 6.
DEFINITION 4.4: SAMPLE SPACE
The sample space, denoted by the Greek letter \(\Omega\) (read omega), of a probability experiment is the collection of all its sample points.

EXAMPLE 4.2
Problem: Two fair coins are tossed, and their up faces are recorded. Represent the sample space of this probability experiment in set notation.
Solution: Even for a seemingly trivial experiment, we must be careful when listing the sample points. At first glance, we might expect three basic outcomes:
1. Observe two heads (HH),
2. Observe two tails (TT), or
3. Observe one head and one tail.
However, further reflection reveals that the last of these, "Observe one head and one tail", can be decomposed into two outcomes:
A. Head on coin 1, Tail on coin 2 (HT);
B. Tail on coin 1, Head on coin 2 (TH).
Thus, we have four sample points:
1. Observe HH
2. Observe HT
3. Observe TH
4. Observe TT
The sample space for this experiment can be represented in set notation as a set of four sample points: \(\Omega = \{HH, HT, TH, TT\}\).

DEFINITION 4.5: EVENT
An event is a specific collection or set of one or more sample points in a sample space. Events are denoted using capital letters such as E.

DEFINITION 4.6: PROBABILITY OF AN EVENT
The probability of an event, say E, denoted by \(\Pr(E)\), is the sum of the probabilities of the sample points that constitute the event.
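The two-coin sample space and Definition 4.6 can be sketched in a few lines of Python. This is only an illustration: it assumes the two coins are fair, so that each of the four sample points carries probability 1/4, and the helper name `pr` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Enumerate every sample point for two coin tosses: HH, HT, TH, TT.
sample_space = ["".join(faces) for faces in product("HT", repeat=2)]

# ASSUMPTION: fair coins, so all sample points are equally likely.
point_prob = {point: Fraction(1, len(sample_space)) for point in sample_space}

def pr(event):
    """Probability of an event = sum of the probabilities of its sample points."""
    return sum(point_prob[point] for point in event)

one_head_one_tail = {"HT", "TH"}
print(sample_space)
print(pr(one_head_one_tail))
print(pr(sample_space))
```

Using `Fraction` keeps the results exact, which makes it easy to compare them with hand calculations.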
NOTE 4.1
Probabilities have some properties that must be satisfied.
1. The probability of any event E, \(\Pr(E)\), must be between 0 and 1 inclusive. That is, \(0 \le \Pr(E) \le 1\).
2. If an event is impossible, the probability of the event is 0.
3. If an event is a certainty, the probability of the event is 1.
4. The probability of any event is equal to the sum of the probabilities of the sample points in the event.
5. In particular, if the sample space is made up of n mutually exclusive events, \(\Omega = \{E_1, \ldots, E_n\}\), then the probability of \(\Omega\) is the sum of the probabilities of those n events. Symbolically, we have: \(\Pr(\Omega) = \Pr(E_1) + \Pr(E_2) + \cdots + \Pr(E_n) = 1\).
Note that the converses of properties 2 and 3 do not hold in general: events of "probability zero" are not necessarily impossible, and those of "probability one" are not always certain.
4.3 FINDING THE PROBABILITY OF AN EVENT Next, we introduce three methods for determining the probability of an event: (1) the classical method, (2) the empirical method, and (3) the subjective approach.
DEFINITION 4.7: CLASSICAL PROBABILITY
Classical (or theoretical) probability is used when each basic outcome in a sample space is equally likely to happen. The classical probability utilizes rules and laws. It involves an experiment. The classical probability for an event E is given by the following formula:
\[ \Pr(E) = \frac{\text{Number of outcomes in the event } E}{\text{Total number of outcomes in the sample space}} = \frac{\#E}{\#\Omega} \]
Probabilities can be expressed as fractions, decimals, or, where appropriate, percentages. If one asks, "What is the probability of getting a head when a coin is tossed?", typical responses can be any of the following three: "One-half", "Point five", and "Fifty percent". Note that we can apply the classical probability when the events have the same chance of occurring (called equally likely events), and the set of events is mutually exclusive and collectively exhaustive.
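The counting formula of classical probability can be sketched as a small Python helper. The function name `classical_probability` is hypothetical, and the sketch assumes that all outcomes in the sample space are equally likely, as the definition requires.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Return #E / #Omega, assuming equally likely outcomes."""
    sample_space = set(sample_space)
    # Only outcomes that actually belong to the sample space are counted.
    favorable = set(event) & sample_space
    return Fraction(len(favorable), len(sample_space))

# Tossing one fair coin: probability of a head.
p_head = classical_probability({"H"}, {"H", "T"})

# The same probability as a fraction, a decimal, and a percentage.
print(p_head, float(p_head), f"{float(p_head):.0%}")
```

The last line shows the three equivalent ways of stating the answer mentioned in the text: one-half, 0.5, and 50%.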
EXAMPLE 4.3
Problem: You roll a six-sided die and record the number that falls face up. Find the probability of the following events.
(A) Event A: rolling a 3.
(B) Event B: rolling a 7.
(C) Event C: rolling a number less than 5.
Solution: When rolling a six-sided die, the sample space consists of six outcomes: \(\Omega = \{1, 2, 3, 4, 5, 6\}\), meaning the number of elements in \(\Omega\), written as \(\#\Omega\), is 6.
(A) There is one outcome in event A: \(A = \{3\}\), meaning the number of elements in A, written as \(\#A\), is 1. Hence, we get the following:
\[ \Pr(A) = \frac{\#A}{\#\Omega} = \frac{1}{6} \approx 0.167. \]
(B) Because 7 is not in the sample space, there are no outcomes in event B, meaning \(\#B = 0\). Hence, the probability of event B is
\[ \Pr(B) = \frac{\#B}{\#\Omega} = \frac{0}{6} = 0. \]
(C) There are four outcomes in event C: \(C = \{1, 2, 3, 4\}\), which means that \(\#C = 4\). Therefore, we obtain
\[ \Pr(C) = \frac{\#C}{\#\Omega} = \frac{4}{6} = \frac{2}{3} \approx 0.667. \]

EXAMPLE 4.4
Problem: Sophia has three tickets to a concert. Yolanda, Michael, Kevin, and Marissa have all stated that they would like to go to the concert with Sophia. To be fair, Sophia decides to randomly select the two people who can go to the concert with her.
(A) Determine the sample space of the experiment. In other words, list all possible simple random samples of size n = 2.
(B) Compute the probability of the event "Michael and Kevin attend the concert."
(C) Compute the probability of the event "Marissa attends the concert."
Solution:
(A) The sample space, \(\Omega\), can be expressed as: \(\Omega\) = {(Yolanda, Michael), (Yolanda, Kevin), (Yolanda, Marissa), (Michael, Kevin), (Michael, Marissa), (Kevin, Marissa)}.
(B) Let M be the event: "Michael and Kevin attend the concert". We have 6 simple random samples of size n = 2 and there is one way the event M can occur. Hence, the probability of the event M is
\[ \Pr(M) = \frac{\#M}{\#\Omega} = \frac{1}{6}. \]
(C) Let A be the event: "Marissa attends the concert". We still have 6 simple random samples of size n = 2 and there are three ways the event A can occur: A = {(Yolanda, Marissa), (Michael, Marissa), (Kevin, Marissa)}. Thus, the probability of the event A is
\[ \Pr(A) = \frac{\#A}{\#\Omega} = \frac{3}{6} = \frac{1}{2} = 0.5. \]
In other words, the probability that Marissa will attend the concert is 50%.
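The sample space of Example 4.4 can also be listed mechanically. The sketch below uses Python's `itertools.combinations` to enumerate every unordered pair of friends and then counts the pairs that make up each event; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

friends = ["Yolanda", "Michael", "Kevin", "Marissa"]

# All simple random samples of size 2 (order does not matter).
sample_space = list(combinations(friends, 2))

# Event M: Michael and Kevin are the two people selected.
count_m = sum(1 for pair in sample_space if set(pair) == {"Michael", "Kevin"})
pr_michael_and_kevin = Fraction(count_m, len(sample_space))

# Event A: Marissa is one of the two people selected.
count_a = sum(1 for pair in sample_space if "Marissa" in pair)
pr_marissa = Fraction(count_a, len(sample_space))

print(len(sample_space), pr_michael_and_kevin, pr_marissa)
```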
DEFINITION 4.8: EMPIRICAL PROBABILITY
A second type of probability is empirical probability. Empirical probability can be used even if each outcome is not equally likely to occur. Empirical (or statistical) probability is based on accumulated historical data. The empirical probability of an event E is the relative frequency of event E. The following equation is used to assign this type of probability:
\[ \Pr(E) = \frac{\text{Number of times the event } E \text{ occurred in the past}}{\text{Total number of opportunities for the event to occur}} \]
Note that empirical probability is not based on rules or laws, but on what has happened in the past. For example, your company wants to decide on the probability that its inspectors are going to reject the next batch of raw materials from a supplier. Data collected from your company record books show that the supplier had sent your company 80 batches in the past, and inspectors had rejected 15 of them. By the method of empirical probability, the probability of the inspectors rejecting the next batch is \(\frac{15}{80}\), or about 0.19. If the next batch is rejected, the empirical probability for the subsequent shipment would change to \(\frac{16}{81}\), or about 0.20.
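The batch-rejection figures can be checked with a short Python sketch. The function name `empirical_probability` is hypothetical; it simply divides the number of past occurrences by the number of opportunities.

```python
def empirical_probability(occurrences, opportunities):
    """Relative frequency of an event in the historical record."""
    return occurrences / opportunities

# 15 rejected batches out of 80 received so far.
pr_reject_now = empirical_probability(15, 80)

# If the next batch is also rejected: 16 rejections out of 81 batches.
pr_reject_updated = empirical_probability(16, 81)

print(f"{pr_reject_now:.4f} -> {pr_reject_updated:.4f}")
```

The estimate moves each time a new batch is inspected, which is the sense in which empirical probability depends on experience and not on a fixed rule.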
The difference between classical and empirical probability is that classical probability assumes that certain outcomes are equally likely (such as the outcomes when a die is rolled), while empirical probability relies on actual experience to determine the likelihood of outcomes. In empirical probability, one might actually roll a given die 6000 times, observe the various frequencies, and use these frequencies to determine the probability of an outcome.

EXAMPLE 4.5
Problem: With reference to Table 4.1, what is the probability that a randomly chosen family will have annual household income
(A) between $20,000 and $40,000,
(B) less than $60,000,
(C) at one of the two extremes of being either less than $20,000 or at least $100,000?

Table 4.1: Annual household income for 500 families

Category, i    Annual income range     Number of families, n_i
1              Less than $20,000       65
2              $20,000 - $40,000       100
3              $40,000 - $60,000       145
4              $60,000 - $80,000       120
5              $80,000 - $100,000      45
6              $100,000 and above      25
Total                                  Sum of n_i = n = 500
Solution:
(A) Let A be the event "chosen family has annual income between $20,000 and $40,000" and n be "the total number of families randomly chosen". Reading from Table 4.1, the frequency of this event is 100. Since the total of the frequencies is 500, the empirical probability of choosing a family with an annual income in the range of $20,000 to $40,000 is:
\[ \Pr(A) = \frac{\text{Frequency of } A}{n} = \frac{100}{500} = \frac{1}{5} = 0.20. \]
(B) Let B be the event "chosen family has annual income less than $60,000" and n be "the total number of families randomly chosen". Based on the frequencies shown in Table 4.1, the cumulative frequency of this event is 310 (that is, 65 + 100 + 145). As the total of the frequencies is 500, the empirical probability of choosing a family with an annual income less than $60,000 is:
\[ \Pr(B) = \frac{\text{Frequency of } B}{n} = \frac{310}{500} = \frac{31}{50} = 0.62. \]
(C) If we denote by C the event "chosen family has annual income either less than $20,000 or at least $100,000" and if n is "the total number of families randomly chosen", the combined frequency of this event is 90 (that is, 65 + 25). Given that the total of the frequencies is 500, the empirical probability of choosing a family with the specified annual ranges is:
\[ \Pr(C) = \frac{\text{Frequency of } C}{n} = \frac{90}{500} = \frac{9}{50} = 0.18. \]
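The three answers of Example 4.5 can be reproduced from the frequencies in Table 4.1 with a short Python sketch. The dictionary keys and the helper name `pr` are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Frequencies copied from Table 4.1.
families = {
    "Less than $20,000": 65,
    "$20,000 - $40,000": 100,
    "$40,000 - $60,000": 145,
    "$60,000 - $80,000": 120,
    "$80,000 - $100,000": 45,
    "$100,000 and above": 25,
}
n = sum(families.values())  # total number of families

def pr(categories):
    """Empirical probability: combined frequency of the categories over n."""
    return Fraction(sum(families[c] for c in categories), n)

pr_a = pr(["$20,000 - $40,000"])
pr_b = pr(["Less than $20,000", "$20,000 - $40,000", "$40,000 - $60,000"])
pr_c = pr(["Less than $20,000", "$100,000 and above"])

for label, value in [("A", pr_a), ("B", pr_b), ("C", pr_c)]:
    print(f"Pr({label}) = {value} = {float(value):.2f}")
```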
EXAMPLE 4.6
Problem: Suppose a survey is conducted in which 50 families with three children are asked to disclose the gender of their children. Based upon the results, it was found that 19 of the families had two boys and one girl.
(A) Estimate the probability of having two boys and one girl in a three-child family using the empirical method.
(B) Compute the probability of having two boys and one girl in a three-child family using the classical method, assuming boys and girls are equally likely.
Solution:
(A) We need to determine the relative frequency of the event "two boys and one girl". Let E be the event "a family of three children will have two boys and one girl". The empirical probability of the event E, written as \(\Pr(E)\), is approximately equal to the relative frequency of E, which here is \(\frac{19}{50}\). We therefore get \(\Pr(E) \approx 0.38\). There is a 38% probability that a family of three children will have two boys and one girl.
(B) We must count the number of ways the event E (two boys and one girl) can occur and divide this by the number of possible outcomes for this probability experiment. Since we have two basic outcomes, boy and girl, and the number of children (trials) is three, the number of possible outcomes for this probability experiment is \(\#\Omega = 2^3 = 8\), which are (boy, boy, boy), (boy, boy, girl), (boy, girl, boy), (girl, boy, boy), (boy, girl, girl), (girl, boy, girl), (girl, girl, boy), and (girl, girl, girl). Hence, the sample space of this experiment may be written as \(\Omega = \{BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG\}\). For the event E = {two boys and one girl}, we get \(E = \{BBG, BGB, GBB\}\), or \(\#E = 3\). Since the outcomes are equally likely (for example, BBG is just as likely as BGB), the probability of E is
\[ \Pr(E) = \frac{\#E}{\#\Omega} = \frac{3}{8} = 0.375. \]
There is a 37.5% probability that a family of three children will have two boys and one girl.

In comparing the results of Example 4.6 (A) and Example 4.6 (B), we notice that the two probabilities are slightly different. Empirical probabilities and classical probabilities often differ in value. However, as the number of repetitions in a probability experiment increases, the empirical probability gets closer to the classical probability. That is, the classical probability is the theoretical relative frequency of an event after a large number of trials of the probability experiment. It is also possible that the two probabilities differ because having a boy and having a girl are not equally likely events. (Maybe the probability of having a baby boy is 50.2% and the probability of having a baby girl is 49.8%.) If this is the case, then the empirical probability will not get closer to the classical probability.

DEFINITION 4.9: SUBJECTIVE PROBABILITY
By the subjective probability approach, the probability of an event is the degree of belief by an individual that the event will occur, based upon all evidence available to the individual or employing opinions and inexact information. In a subjective probability, a person or group makes an educated guess at the chance that an event will occur. This guess is based on the person's experience and evaluation of the situation. For example, three different economists were asked "What is the probability the economy will fall into recession next year?" Each economist provided a different answer. The first economist said the probability is about 25%. The second economist was gloomier and said the probability is about 50%. Finally, the third economist stated the probability was about 10%. How can three well-trained economists have such different opinions regarding the probability of a recession? Because the probabilities they stated are educated guesses based upon information they currently have available. The differences in their probabilities come from the fact that people interpret information differently!
All three types of probability (classical, empirical, and subjective) are used to solve a variety of problems in business, engineering, and other fields.

4.4 BASIC EVENT RELATIONS and PROBABILITY LAWS

DEFINITION 4.10: MUTUALLY EXCLUSIVE EVENTS
Two events A and B are said to be mutually exclusive if (when the experiment is performed a single time) the occurrence of one of the events excludes the possibility of the occurrence of the other event.

The concept of mutually exclusive events is used to specify a second property that the probabilities of events must satisfy. When two (or even more) events are mutually exclusive, then the probability that any one of the events will occur is the sum of the event probabilities. If two events A and B are mutually exclusive, the probability that either A or B occurs is \(\Pr(\text{either } A \text{ or } B) = \Pr(A \cup B) = \Pr(A) + \Pr(B)\). Observe that \(\Pr(A \cup B)\) is read as "the probability of A union B".
EXAMPLE 4.7
Problem: Suppose that we toss a pair of dice and define the following events:
A: A sum equal to 2 appears.
B: A sum equal to 3 appears.
C: A sum equal to 4 appears.
Find the probability that a sum less than or equal to 4 shows up.
Solution: Let's define the event S = {a sum less than or equal to 4 shows up}. For this particular experiment, the dice can fall in 36 (or \(6^2\)) different equally likely ways. We can observe a 1 on die 1 and a 1 on die 2, denoted by the symbol (1, 1). We can observe a 1 on die 1 and a 2 on die 2, denoted by (1, 2). In other words, for this experiment, the possible outcomes are presented in Table 4.2 below.
Table 4.2: Sample space for rolling two dice (each entry is written as (die 1, die 2))

                  Die 1
Die 2    1         2         3         4         5         6
1        (1, 1)    (2, 1)    (3, 1)    (4, 1)    (5, 1)    (6, 1)
2        (1, 2)    (2, 2)    (3, 2)    (4, 2)    (5, 2)    (6, 2)
3        (1, 3)    (2, 3)    (3, 3)    (4, 3)    (5, 3)    (6, 3)
4        (1, 4)    (2, 4)    (3, 4)    (4, 4)    (5, 4)    (6, 4)
5        (1, 5)    (2, 5)    (3, 5)    (4, 5)    (5, 5)    (6, 5)
6        (1, 6)    (2, 6)    (3, 6)    (4, 6)    (5, 6)    (6, 6)
As you can see, only one of these outcomes, (1, 1), will result in a sum equal to 2. Therefore, we would expect a 2 to occur with a relative frequency of 1/36 in a long series of repetitions of the experiment, and we let \(\Pr(A) = \Pr(\text{Sum} = 2) = \frac{1}{36}\). The sum equal to 3 will occur if we observe either of the outcomes (1, 2) or (2, 1). Therefore, we have \(\Pr(B) = \Pr(\text{Sum} = 3) = \frac{2}{36}\). Similarly, the sum equal to 4 occurs for the outcomes (1, 3), (2, 2), and (3, 1), so we find \(\Pr(C) = \Pr(\text{Sum} = 4) = \frac{3}{36}\).

The events A, B, and C are mutually exclusive in that if you observe event A (a total of 2), you could not at the same time observe either event B (a total of 3) or event C (a total of 4). Thus, if A occurs, neither B nor C can occur at the same time (and vice versa). It follows that
\[ \Pr(\text{Sum} \le 4) = \Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) = \frac{1}{36} + \frac{2}{36} + \frac{3}{36} = \frac{6}{36}. \]
Hence, the probability of getting a sum of at most 4 is \(\Pr(S) = \Pr(\text{Sum} \le 4) = \frac{1}{6}\).

A third property of event probabilities concerns an event and its complement.
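Returning to Example 4.7, the addition law for mutually exclusive events can be checked by brute-force enumeration. The Python sketch below lists all 36 outcomes for two dice, assumes they are equally likely (fair dice), and compares the sum of the three separate probabilities with a direct count; the helper name `pr_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (die 1, die 2); assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def pr_sum(total):
    """Probability that the two dice add up to `total`."""
    favorable = sum(1 for d1, d2 in outcomes if d1 + d2 == total)
    return Fraction(favorable, len(outcomes))

pr_a, pr_b, pr_c = pr_sum(2), pr_sum(3), pr_sum(4)

# Addition law for mutually exclusive events.
pr_s_by_addition = pr_a + pr_b + pr_c

# Direct count of outcomes whose sum is at most 4.
pr_s_direct = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 <= 4), len(outcomes))

print(pr_a, pr_b, pr_c, pr_s_by_addition, pr_s_direct)
```

Both routes give the same value because no outcome can produce two different sums at once.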
DEFINITION 4.11: COMPLEMENT OF EVENT E
The complement of event E is the set of all outcomes in a sample space that are not included in event E. The complement of event E is denoted by \(E'\) (read as "E prime") or by \(\bar{E}\) (read as "E bar"). The complement of event E may also be written as "not E". For instance, if you roll a die and let E be the event "the number is at least 5", then \(E = \{5, 6\}\) and \(E' = \{1, 2, 3, 4\}\). Using the definition of the complement of an event and the fact that the sum of the probabilities of all outcomes is 1, you can determine the following formulas:
\[ \Pr(E) + \Pr(E') = 1, \qquad \Pr(E) = 1 - \Pr(E'), \qquad \Pr(E') = 1 - \Pr(E). \]
We can now define two additional event relations: the union and the intersection of two events.
DEFINITION 4.12: THE UNION OF TWO EVENTS
The union of two events A and B is the set of all outcomes that are included in either A or B (or both). The union of event A and event B is often denoted as \(A \cup B\), and it is read as "A union B".

DEFINITION 4.13: THE INTERSECTION OF TWO EVENTS
The intersection of two events A and B is the set of all outcomes that are included in both A and B. The intersection is often denoted as \(A \cap B\), and it is read as "A intersect B".

The additivity of probabilities for mutually exclusive events, called the addition law for mutually exclusive events, can be extended to give the general addition law.

DEFINITION 4.14: THE PROBABILITY OF THE UNION AND INTERSECTION
Consider two events A and B; the probability of the union of A and B is
\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B). \]
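The general addition law can be illustrated with Python sets, whose `|` and `&` operators correspond to union and intersection. The two events below (a number less than 5, and an odd number, on one roll of a fair die) are hypothetical examples chosen for this sketch and do not come from the notes.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die

def pr(event):
    """Classical probability of an event within omega."""
    return Fraction(len(event & omega), len(omega))

# Hypothetical events chosen only for illustration.
A = {x for x in omega if x < 5}        # {1, 2, 3, 4}
B = {x for x in omega if x % 2 == 1}   # {1, 3, 5}

lhs = pr(A | B)                   # probability of the union, counted directly
rhs = pr(A) + pr(B) - pr(A & B)   # general addition law

print(lhs, rhs)
```

Subtracting the intersection corrects for the outcomes 1 and 3, which would otherwise be counted twice.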
EXAMPLE 4.8
Problem: Suppose that the probability of inflation (event I) is 0.3, the probability of recession (event R) is 0.2, and the probability of inflation and recession is 0.06.
(A) Are the event I and the event R mutually exclusive?
(B) Use the appropriate formula to compute the probability of inflation I or recession R.
Solution:
(A) The events I and R are not mutually exclusive, since the probability of their intersection is NOT zero. In fact, this probability is \(\Pr(I \text{ and } R) = 0.06\).
(B) Since the events are not mutually exclusive, we use the formula given in DEFINITION 4.14:
\[ \Pr(I \text{ or } R) = \Pr(I) + \Pr(R) - \Pr(I \text{ and } R), \]
\[ \Pr(I \cup R) = \Pr(I) + \Pr(R) - \Pr(I \cap R) = 0.30 + 0.20 - 0.06. \]
Hence, we have: \(\Pr(\text{Inflation or Recession}) = 0.44\).

EXAMPLE 4.9
Problem: A CBS News/New York Times poll found that of 764 adults surveyed nationwide, 34% felt that we are spending too much on space exploration, 19% felt that we are spending too little, 35% felt that we are spending the right amount, and the rest said "don't know" or had no answer. (Source: http://www.pollingreport.com) If one of the respondents is selected at random,
(A) What is the probability the person felt that we are NOT spending the right amount?
(B) What is the probability the person felt that we are spending the right amount or too little?
Solution:
(A) If the event R is {A respondent felt that we are spending the right amount}, then we have \(\Pr(R) = 0.35\). The question is to determine \(\Pr(\bar{R})\) or \(\Pr(R')\).
Using the formula for the complement of an event, we get: \(\Pr(\bar{R}) = \Pr(\text{not } R) = \Pr(R') = 1 - \Pr(R) = 1 - 0.35\). Hence, the probability of the complement of event R is: \(\Pr(\text{not } R) = \Pr(\bar{R}) = \Pr(R') = 0.65\).
(B) If the event T is defined as {A respondent felt that we are spending too little}, then events R and T are mutually exclusive. The probability of their union would be \(\Pr(R \cup T) = \Pr(R) + \Pr(T) = 0.35 + 0.19\), or \(\Pr(R \cup T) = 0.54\).

EXAMPLE 4.10
Problem: A company has 140 employees, of which 30 are supervisors. Eighty of the employees are married, and 20% of the married employees are supervisors. If a company employee is randomly selected, what is the probability that the employee is married and is a supervisor?
Solution: Let M denote "married" and S denote "supervisor". The question is: \(\Pr(M \cap S) = {?}\) A two-way table of cross classification, called a contingency table (or cross-tabulation table), might be helpful to solve the problem without applying any formulae. It is shown in Table 4.3 below:

Table 4.3: Contingency table for marital status and employee's rank variables

                                  Marital status
Employee's hierarchical rank      Married    Not Married    Total
Supervisor                        16         14             30
Not Supervisor                    64         46             110
Total                             80         60             140

We then notice that
\[ \Pr(M \cap S) = \frac{\text{Number of married employees who are supervisors}}{\text{Total number of employees}} = \frac{16}{140} = \frac{4}{35} \approx 0.1143. \]
Hence, 11.43% of the 140 employees are married and are supervisors.

4.5 INDEPENDENCE

DEFINITION 4.15: INDEPENDENT EVENTS
Two events are independent if the occurrence or non-occurrence of one does not change the probability that the other will occur. If two events A and B are independent, then we use the following formula to compute the probability of the event A and B.
\[ \Pr(A \text{ and } B) = \Pr(A \,\&\, B) = \Pr(A \cap B) = \Pr(A) \times \Pr(B). \]
EXAMPLE 4.11
Problem: A die is to be rolled and we are to observe the number that falls face up.
(A) Find the probabilities for these events:
(i) A: "Observe a 6".
(ii) B: "Observe an even number".
(iii) C: "Observe a number greater than 2".
(B) Which of the events (A, B, and C) are independent? Which events are mutually exclusive?
Solution:
(A) \(\Omega = \{1, 2, 3, 4, 5, 6\}\).
(i) \(A = \{6\}\), so \(\Pr(A) = \#A/\#\Omega = 1/6\).
(ii) \(B = \{2, 4, 6\}\), so \(\Pr(B) = \#B/\#\Omega = 3/6\), or 1/2.
(iii) \(C = \{3, 4, 5, 6\}\), so \(\Pr(C) = \#C/\#\Omega = 4/6\), or 2/3.
(B) Independent events and mutually exclusive events are determined below. A, B, and C are NOT disjoint events (that is, they are NOT mutually exclusive events) in that they have the element 6 in common.
(i) First, we know that \(A \cap B = \{6\}\), so \(\Pr(A \cap B) = \#(A \cap B)/\#\Omega = 1/6\), which is different from \(\Pr(A) \times \Pr(B) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}\). As a result, A and B are NOT independent events.
(ii) Second, we have \(B \cap C = \{4, 6\}\), so \(\Pr(B \cap C) = \#(B \cap C)/\#\Omega = 2/6 = 1/3\), and \(\Pr(B) \times \Pr(C) = \frac{1}{2} \times \frac{2}{3} = \frac{1}{3}\). Hence, as \(\Pr(B \cap C) = \Pr(B) \times \Pr(C)\), the multiplication rule applies here because the events B and C are independent.
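The checks in Example 4.11 can be automated for every pair of events. The Python sketch below tests each pair for mutual exclusivity (empty intersection) and for independence (probability of the intersection equals the product of the probabilities); the helper names are hypothetical, and the die is assumed to be fair.

```python
from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
events = {"A": {6}, "B": {2, 4, 6}, "C": {3, 4, 5, 6}}

def pr(event):
    """Classical probability of an event within omega."""
    return Fraction(len(event & omega), len(omega))

def mutually_exclusive(e, f):
    """True when the two events share no outcome."""
    return not (e & f)

def independent(e, f):
    """True when Pr(E and F) equals Pr(E) * Pr(F)."""
    return pr(e & f) == pr(e) * pr(f)

for (name1, e1), (name2, e2) in combinations(events.items(), 2):
    print(name1, name2,
          "mutually exclusive:", mutually_exclusive(e1, e2),
          "independent:", independent(e1, e2))
```

The same loop also covers the pair A and C, which the worked solution does not examine.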
EXAMPLE 4.12 Problem: A statistical study by the U.S. Energy Information Administration found that 84.3% of U.S. households with incomes under $10,000 did not own a dishwasher while
only 21.8% of those in the over-$50,000 income range did not own a dishwasher (Source: Bureau of Census, Statistical Abstract of the United States 2001, p. 611). If one household is randomly selected from each income group, determine the probability that
(A) both households will own a dishwasher.
(B) either household will own a dishwasher.
(C) neither household will own a dishwasher.
(D) the lower-income household will own a dishwasher, but the higher-income household will not.
(E) the higher-income household will own a dishwasher, but the lower-income household will not.
Solution: Let's first describe the following events:
Event L: L = {lower-income household did not own a dishwasher}, with \(\Pr(L) = 0.843\)
Event H: H = {higher-income household did not own a dishwasher}, with \(\Pr(H) = 0.218\)
Sample space, \(\Omega\): \(\Omega = \{L, H\}\)
We may now answer the questions.
(A) Since events L and H are independent, so are their complementary events \(\bar{L}\) and \(\bar{H}\), respectively. Therefore, the probability of their intersection is equal to the product of their individual probabilities:
\[ \Pr(\bar{L} \cap \bar{H}) = \Pr(\bar{L}) \times \Pr(\bar{H}) = [1 - \Pr(L)] \times [1 - \Pr(H)] = (1 - 0.843) \times (1 - 0.218) = 0.157 \times 0.782. \]
The answer is: \(\Pr(\bar{L} \cap \bar{H}) = 0.123\).
(B) This is the probability that either \(\bar{L}\) or \(\bar{H}\), or both, will occur.
\[ \Pr(\bar{L} \cup \bar{H}) = \Pr(\bar{L}) + \Pr(\bar{H}) - \Pr(\bar{L} \cap \bar{H}) = 0.157 + 0.782 - 0.123. \]
The answer is: \(\Pr(\bar{L} \cup \bar{H}) = 0.816\).
(C) \(\Pr(L \cap H) = \Pr(L) \times \Pr(H) = 0.843 \times 0.218\), or \(\Pr(L \cap H) = 1 - \Pr(\bar{L} \cup \bar{H}) = 1 - 0.816\). In both cases, the answer is: \(\Pr(L \cap H) = 0.184\).
(D) \(\Pr(\bar{L} \cap H) = \Pr(\bar{L}) \times \Pr(H) = [1 - \Pr(L)] \times \Pr(H) = (1 - 0.843) \times 0.218 = 0.157 \times 0.218\), or \(\Pr(\bar{L} \cap H) = \Pr(H) - \Pr(L \cap H) = 0.218 - 0.184\). In both cases, the answer is: \(\Pr(\bar{L} \cap H) = 0.034\).
(E) \(\Pr(L \cap \bar{H}) = \Pr(L) \times \Pr(\bar{H}) = \Pr(L) \times [1 - \Pr(H)] = 0.843 \times (1 - 0.218) = 0.843 \times 0.782\), or \(\Pr(L \cap \bar{H}) = \Pr(L) - \Pr(L \cap H) = 0.843 - 0.184\). In both cases, the answer is: \(\Pr(L \cap \bar{H}) = 0.659\).
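The five answers of Example 4.12 can be reproduced with a few multiplications. The Python sketch below relies on the same independence assumption as the worked solution (the two households are selected from separate income groups); the variable names are illustrative.

```python
# Probabilities quoted in Example 4.12.
pr_L = 0.843  # lower-income household does NOT own a dishwasher
pr_H = 0.218  # higher-income household does NOT own a dishwasher

# Complements: each household DOES own a dishwasher.
pr_not_L = 1 - pr_L
pr_not_H = 1 - pr_H

# ASSUMPTION (as in the worked solution): the two households are independent,
# so the probability of an intersection is the product of the probabilities.
both_own = pr_not_L * pr_not_H                 # (A)
either_owns = pr_not_L + pr_not_H - both_own   # (B) general addition law
neither_owns = pr_L * pr_H                     # (C)
only_lower_owns = pr_not_L * pr_H              # (D)
only_higher_owns = pr_L * pr_not_H             # (E)

answers = [("A", both_own), ("B", either_owns), ("C", neither_owns),
           ("D", only_lower_owns), ("E", only_higher_owns)]
for label, value in answers:
    print(f"({label}) {value:.3f}")
```

As a consistency check, the probabilities for (A), (C), (D), and (E) describe four mutually exclusive cases that cover every possibility, so they add up to 1.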
4.6 DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

4.6.1 RANDOM VARIABLE
DEFINITION 4.16: RANDOM VARIABLE
A random variable is a quantitative variable whose values are the numerical outcomes of a probability experiment. That is, it is a function or rule that assigns a number to each outcome of an experiment.
4.6.2 DISCRETE RANDOM VARIABLE

DEFINITION 4.17: DISCRETE RANDOM VARIABLE
A discrete random variable is a random variable that can assume only certain clearly separated values, even though the list may continue indefinitely.

EXAMPLE 4.13
Problem: Is the "number of children per family" a discrete random variable? If yes, explain why.
Solution: Yes. The "number of children per family" is a random variable because it varies from one family to another and its value depends on which particular family is chosen. Furthermore, it is a discrete random variable in that its possible values can be listed individually, one by one, and they may be as follows: 0, 1, 2, 3, ..., 10,000.
4.6.3 RANDOM-VARIABLE NOTATION
Uppercase letters such as X, Y, and Z will be used to represent random variables, while lowercase letters such as x, y, and z will designate a particular value taken by X, Y, and Z, respectively.

4.6.4 PROBABILITY DISTRIBUTIONS

DEFINITION 4.18: PROBABILITY DISTRIBUTION
The probability distribution is a table, formula, or graph that describes the values of a random variable and the probability associated with each of these values.
EXAMPLE 4.14
Problem: Dr. Weiss, a business statistics professor at Baylor University, once asked his students "To the nearest hour, how many hours did you spend working on the first take-home midterm test?" Table 4.4 presents an ungrouped-data frequency table for that information. Let X denote the number of hours spent working on the first take-home test by a randomly selected student.
(A) What is the probability that the selected student spent four hours working on the first take-home test?
(B) Determine the probability distribution of the random variable X.
(C) Construct a probability histogram for the random variable X.
Table 4.4: Hours of work spent on the first take-home test by BU business statistics students

Hours of work, x    Number of students, n_i
0                   16
1                   25
2                   37
3                   50
4                   26
5                   23
6                   14
7                   7
8                   2
Total               n = 200

Solution:
(A) The probability that "the student selected spent four hours working on the first take-home test" will be written as:
\[ \Pr(X = x) = \Pr(X = 4) = \frac{26}{200} = 0.13 \quad \text{(from Table 4.4 above)}. \]
It is here the same as the corresponding relative frequency \(\frac{n_i}{n}\).
(B) Probability distribution of X (same as relative-frequency table). For instance, \(\Pr(X = 3) = \frac{n_i}{n} = \frac{50}{200} = 0.25\).

Table 4.5: Probability distribution of the number of work hours spent on first test

Hours of work, x    Probability, Pr (X = x)
0                   0.080
1                   0.125
2                   0.185
3                   0.250
4                   0.130
5                   0.115
6                   0.070
7                   0.035
8                   0.010
Total               1.000
Note that the sum of all the probabilities is always equal to 1.
(C) Probability histogram for the random variable X
Step-by-step instructions for easy reference
STEP 1: Enter the eleven values 0, 0.08, 0.125, 0.185, 0.250, ..., 0.010, and 0 (the nine probabilities, padded with a zero before and after) into column A (cells A1 thru A11).
STEP 2: Enter the eleven labels 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0 into column B (cells B1 thru B11).
STEP 3: Highlight cells A1 thru A11.
STEP 4: From the main menu, click on Insert; then click on Chart.
STEP 5: Select Column under Chart type. Then click on Next.
STEP 6: Click on Series, then in the Name dialog box type in Value of X. In the Category (X) axis labels dialog box, type in =Sheet1!B1:B11. When done, click on Next.
STEP 7: Click on Titles, then in the Chart title dialog box, type in Hours Spent on First Take-Home Test. In the Category (X) axis dialog box, type in Hours and in the Value (Y) axis dialog box, type in Probability.
STEP 8: Click on Gridlines and, under the Value (Y) axis dialog box, click in the first slot to uncheck Major Gridlines.
STEP 9: Press Next and then press Finish. If you wish, you may enlarge the generated chart.
STEP 10: To remove the gaps between bars, double click on any bar. The Format Data Point window will open. Reduce the Gap width by typing 0 (zero) in the slot in lieu of 150. Finally, press OK when done.
STEP 11: To remove the zero-frequency rows, simply delete the two "zeros" in Row 1 and the two other "zeros" in Row 11.
Excel produces the following probability histogram.
Figure 4.1: Probability histogram for the number of hours spent working on test
[Chart title: Hours Spent Working on First Take-Home Test. Horizontal axis: Number of hours (0 through 8). Vertical axis: Probability (0 to 0.3). Bar heights: 0.08, 0.125, 0.185, 0.25, 0.13, 0.115, 0.07, 0.035, 0.01.]
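The relative-frequency calculation behind Table 4.5 can also be sketched outside Excel. The short Python example below is only an illustration, not part of the original notes; the variable names `frequencies` and `distribution` are assumptions chosen for this sketch. It divides each count in Table 4.4 by the total number of students and confirms that the probabilities sum to 1.

```python
# Frequency table from Table 4.4: hours of work -> number of students.
frequencies = {0: 16, 1: 25, 2: 37, 3: 50, 4: 26, 5: 23, 6: 14, 7: 7, 8: 2}

# Total number of students surveyed.
n = sum(frequencies.values())

# Each probability is the relative frequency n_i / n.
distribution = {x: count / n for x, count in frequencies.items()}

for x, p in distribution.items():
    print(f"Pr(X = {x}) = {p:.3f}")

# The probabilities of a valid distribution must add up to 1.
print(f"Sum of probabilities = {sum(distribution.values()):.3f}")
```

The printed values can be compared line by line with Table 4.5.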
4.6.5 THE MEAN & STANDARD DEVIATION OF A DISCRETE RANDOM VARIABLE
The mean and standard deviation of a discrete random variable are analogous to the population mean ($\mu$) and population standard deviation ($\sigma$). The mean of a discrete random variable X, denoted by $\mu_X$ [or E(X)], is defined as the long-run average of occurrences and is computed as follows.

$$\mu_X = E(X) = \sum x \Pr(X = x),$$

where E(X) is the long-run average, x is an outcome, and Pr (X = x) is the probability of that outcome. The terms expected value and expectation are commonly used in place of mean.

The standard deviation of a discrete random variable X, denoted by $\sigma_X$, is defined by

$$\sigma_X = \sqrt{\sum (x - \mu_X)^2 \Pr(X = x)} = \sqrt{\sum x^2 \Pr(X = x) - \mu_X^2},$$

which may also be expressed as

$$\sigma_X = \sqrt{E(X^2) - [E(X)]^2}.$$

NOTE 4.2 The square of the standard deviation is called the variance of the random variable X, denoted by $\sigma_X^2$.
EXAMPLE 4.15 Problem: Ashley Field, the associate dean of a business school, has applied for the position of dean of the school of business at a much larger university. The salary at the new university has been advertised as $125,000, and Ashley is very excited about this possibility for making a big career move. She has been told by friends within the administration of the larger university that her résumé is impressive and her chances of getting the position she seeks are "about 65%." If Ashley stays at her current position, her salary next year will be $75,000. Assuming that her friends have accurately assessed her chances of success,
(A) What is Ashley's expected salary for next year?
(B) What standard deviation is associated with this expected salary for next year?
Solution: We start by constructing a probability distribution for the random variable X, where X represents the annual salary at either university.
Table 4.6: Probability distribution of annual salary

                       Annual salary, x     Probability, Pr (X = x)
Larger University      $125,000             0.65
Current University     $75,000              0.35
Total                                       1.00
(A) The expected value for the random variable X, E (X): Applying the formula $E(X) = \sum x \Pr(X = x)$ yields

$$E(X) = \$125{,}000 \times 0.65 + \$75{,}000 \times 0.35 = \$107{,}500.$$

Hence, the associate dean Ashley Field would expect a salary of $107,500 for next year.

(B) The population standard deviation for the random variable X, $\sigma_X$: Substituting into the population standard deviation formula

$$\sigma_X = \sqrt{\sum x^2 \Pr(X = x) - [E(X)]^2}$$

leads to

$$\sigma_X = \sqrt{(125{,}000)^2 \times 0.65 + (75{,}000)^2 \times 0.35 - (107{,}500)^2} = \sqrt{12{,}125{,}000{,}000 - 11{,}556{,}250{,}000},$$

or

$$\sigma_X = \sqrt{568{,}750{,}000} = 23{,}848.48.$$

Hence, the standard deviation associated with the above expected salary for next year will be $23,848.48.
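The two formulas of Section 4.6.5 can be checked numerically. The following Python sketch is an illustration only; the helper name `mean_and_sd` and the dictionary layout are assumptions made for this example. It computes E(X) and the standard deviation from the probability distribution in Table 4.6, using the shortcut form with E(X²) − [E(X)]².

```python
import math


def mean_and_sd(distribution):
    """Return (mean, standard deviation) of a discrete random variable.

    `distribution` maps each possible value x to Pr(X = x).
    """
    mean = sum(x * p for x, p in distribution.items())             # E(X)
    second_moment = sum(x**2 * p for x, p in distribution.items())  # E(X^2)
    variance = second_moment - mean**2                              # E(X^2) - [E(X)]^2
    return mean, math.sqrt(variance)


# Probability distribution of Ashley's salary (Table 4.6).
salary = {125_000: 0.65, 75_000: 0.35}

expected_salary, sd_salary = mean_and_sd(salary)
print(f"E(X)  = ${expected_salary:,.2f}")
print(f"sigma = ${sd_salary:,.2f}")
```

The same helper works for any discrete distribution given as a value-to-probability mapping, such as the one in Table 4.5.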
4.7 THE BINOMIAL DISTRIBUTION
DEFINITION 4.19: BINOMIAL EXPERIMENT A binomial experiment is a probability experiment that satisfies the following five characteristics:
1) The experiment consists of a fixed number n of identical trials.
2) Each trial can have only two outcomes or outcomes that can be reduced to two mutually exclusive outcomes. These outcomes can be considered as either success (denoted by S) or failure (denoted by F).
3) The probability of success on a single trial is equal to $\pi$ and remains the same from trial to trial.
4) The outcomes of each trial (repetition of the experiment) must be independent of each other. In other words, the outcome of one trial does not influence the outcome of any other trial.
5) The random variable X is the number of successes observed during the n trials.

DEFINITION 4.20: BINOMIAL RANDOM VARIABLE A binomial random variable is defined as the number of successes in the n trials of a binomial experiment. A binomial random variable, say X, is a discrete random variable in that it may take on any integer value from 0 up to n, x = 0, 1, 2, 3, ..., n.
EXAMPLE 4.16 Problem: An article in the March 5, 1998, issue of The New England Journal of Medicine discussed a large outbreak of tuberculosis. One person, called the index patient, was diagnosed with tuberculosis in 1995. The 232 co-workers of the index patient were given a tuberculin-screening test. The number of co-workers recording a positive reading on the test was the random variable of interest. Did this study possess the properties of a binomial experiment?
Solution: To answer the question, we check this experiment against each of the five characteristics of the binomial experiment to determine whether they were satisfied.
1) Were there n identical trials? YES. There were n = 232 co-workers who had approximately equal contact with the index patient.
2) Did each trial result in one of two outcomes? YES. Each co-worker recorded either a positive or negative reading on the test.
3) Was the probability of success, $\pi$, the same from trial to trial? YES, if the co-workers had equivalent risk factors and equal exposures to the index patient.
4) Were the n trials independent? YES. The outcome of one screening test was unaffected by the outcome of the other n - 1 screening tests.
5) Was the random variable of interest to the experimenter the number of successes X in the 232 screening tests? YES. The number of co-workers who obtained a positive reading on the screening test was the variable of interest.
All five characteristics were satisfied, so the tuberculin-screening test represented a binomial experiment.
EXAMPLE 4.17 Problem: A television cable company plans to conduct a survey to determine the fraction of households in the city that would use the cable television service. The sampling method is to choose a city block at random and then survey every household on that block. This sampling technique is called cluster sampling. Suppose 10 blocks are so sampled, producing a total of 124 household responses. Let X be the number of the 124 households that would use the television cable service. Can X be considered as a binomial random variable?
Solution:
1) Are there n identical trials? YES. There are n = 124 households potentially using the television cable service.
2) Does each trial result in one of two outcomes? YES. The survey produces dichotomous responses: YES (if the household would use the television cable service) and NO (otherwise).
3) Is the probability of success, $\pi$, the same from trial to trial? YES, if we assume that every household in the city has the same probability $\pi$ of answering yes.
4) Are the trials independent? NO. The responses of households within a particular block would be dependent, since the households within a block tend to be similar with respect to income, level of education, and general interests.
Thus, the binomial model would not be satisfactory for X if the cluster sampling technique were employed.
DEFINITION 4.21: BINOMIAL DISTRIBUTION The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution.
NOTE 4.3 As the word binomial indicates, any single trial of a binomial experiment contains only two possible, mutually exclusive outcomes. These two outcomes are labeled success or failure. Usually the outcome of interest to the researcher is labeled a success.
For example, if a quality control analyst is looking for defective products, he would consider finding a defective product a success even though the company would not consider a defective product a success. If researchers are studying left-handedness, the outcome of getting a left-handed person in a trial of an experiment is a success. The other possible outcome of a trial in a binomial experiment is called a failure. The word failure is used only in opposition to success. In the preceding experiments, a failure could be to get an acceptable part (as opposed to a defective part) or to get a right-handed person (as opposed to a left-handed person).

4.7.1 NOTATION FOR THE BINOMIAL DISTRIBUTION
The notation for the two outcomes and for their respective probabilities follows.
Pr (S): The symbol for the probability of success.
Pr (F): The symbol for the probability of failure.
$\pi$: The numerical probability of a success.
$1 - \pi$: The numerical probability of a failure.
Pr (S) = $\pi$ and Pr (F) = $1 - \pi$.
n: The number of trials.
x: The number of successes ($0 \le x \le n$).
4.7.2 BINOMIAL PROBABILITY FORMULA
In a binomial experiment, the probability of exactly x successes in n trials is

$$\Pr(X = x) = \frac{n!}{x!\,(n - x)!}\, \pi^{x} (1 - \pi)^{n - x}, \quad \text{for } x = 0, 1, 2, 3, \ldots, n,$$

where $n! = 1 \times 2 \times 3 \times \cdots \times (n - 1) \times n$.
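As a rough illustration of the formula above, the following Python sketch evaluates the binomial probability directly. The function name `binomial_pmf` and the parameter name `pi` are assumptions made for this example (they do not come from the original notes); `math.comb(n, x)` supplies the factor n!/(x!(n − x)!). The values n = 3 and a success probability of 0.30 are the ones used in Example 4.18.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials,
    each with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# Three customers, each buying with probability 0.30 (Example 4.18).
n, pi = 3, 0.30
for x in range(n + 1):
    print(f"Pr(X = {x}) = {binomial_pmf(x, n, pi):.3f}")

# The probabilities over x = 0, ..., n should add up to 1.
print(f"Total = {sum(binomial_pmf(x, n, pi) for x in range(n + 1)):.3f}")
```

With `cumulative` set to FALSE, Excel's BINOMDIST returns this same quantity.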
EXAMPLE 4.18 Problem: Let us consider the purchase decisions of the next three customers who enter the Martin Clothing store. On the basis of past experience, the store manager estimates the probability that any one customer will make a purchase is 0.30. What is the probability that two of the next three customers will make a purchase?
Solution: The experiment can be described as a sequence of three identical trials, one trial for each of the three customers who will enter the store. Hence, we have: n = 3. There are two possible, mutually exclusive outcomes: the customer makes a purchase (success or S) or the customer does not make a purchase (failure or F). The probability of success, Pr (S), is 0.30 and the probability of failure, Pr (F), is 1 - 0.30, or 0.70, respectively. The sample space $\Omega$ associated with this probability experiment contains $2^n = 2^3 = 8$ outcomes and is defined as follows.

$$\Omega = \{SSS, SSF, SFS, FSS, SFF, FSF, FFS, FFF\}.$$
If we define the random variable X as X = {Number of customers making a purchase}, then the probability that exactly two of the next three customers will make a purchase, written as Pr (X = 2), is found as follows.

(A) Using the probability distribution of X, the number of successes

$$\Pr(X = 2) = \Pr(\{SSF\}) + \Pr(\{SFS\}) + \Pr(\{FSS\}) = 0.3 \times 0.3 \times 0.7 + 0.3 \times 0.7 \times 0.3 + 0.7 \times 0.3 \times 0.3 = 3\,(0.3 \times 0.3 \times 0.7).$$

Hence, we get Pr (X = 2) = 0.189.

(B) Using the Binomial distribution
n = Number of customers entering the store (number of trials) = 3
$\pi$ = Probability of success = Probability that any one customer makes a purchase = 0.3
X ~ Binomial (x; $\pi$, n)
Now, we may compute Pr (X = 2) as

$$\Pr(X = 2) = \frac{3!}{(3 - 2)!\, 2!}\,(0.30)^2 (1 - 0.30)^{3 - 2} = \frac{6}{2}\,(0.063) = 3\,(0.063),$$

which reduces to Pr (X = 2) = 0.189.
(C) Using Microsoft EXCEL
In any empty cell type =BINOMDIST(x,n,π,FALSE). Here we have: =BINOMDIST(2,3,0.3,FALSE), then press ENTER. The answer is: 0.189.

EXAMPLE 4.19 Problem: Melinda Brown is a Baylor University student taking a business statistics course. Unfortunately, Melinda is not a good student. Melinda does not read the written notes distributed by the instructor before class, does not do homework assignments, and regularly misses class. Melinda intends to rely on luck to pass the next in-class test. The test consists of 10 multiple-choice questions. Each question has five possible answers, only one of which is correct. Melinda, who of course does not have a clue about the subject matter, plans on guessing the answer to each question. What is the probability that Melinda gets no answers correct?
Solution:
We have the following:
- Number of trials, here the number of questions, n: n = 10
- Probability of success, here the probability of guessing the correct answer, $\pi$: $\pi$ = 1/5
The probability of getting no answers correct may be expressed as Pr (X = 0) = ? Applying the formula for computing Pr (X = x) in a Binomial experiment,

$$\Pr(X = x) = \frac{n!}{(n - x)!\, x!}\, \pi^{x} (1 - \pi)^{n - x},$$

yields

$$\Pr(X = 0) = \frac{10!}{(10 - 0)!\, 0!} \left(\frac{1}{5}\right)^{0} \left(\frac{4}{5}\right)^{10 - 0} = \frac{1048576}{9765625} = 0.107374182.$$

Hence, Pr (X = 0) = 0.1074, rounded to four decimal places.
Using Microsoft EXCEL
In any empty cell type in =BINOMDIST(0,10,1/5,FALSE), then press ENTER. The answer is 0.107374182.
EXAMPLE 4.20 Problem: A student majoring in accounting is trying to decide on the number of firms to which he should apply. Given his work experience and grades, he can expect to receive a job offer from 70% of the firms to which he applies. The student decides to apply to only four firms. What is the probability that he receives (A) Exactly no job offers? (B) At most three job offers? (C) At least three job offers?
Solution: What is given follows.
- Probability of success (being hired), $\pi$: $\pi$ = 70%
- Number of trials (firms), n: n = 4

(A) Let X be the number of job offers received; then we have here: x = 0. Since X follows a binomial (x; $\pi$ = 0.70, n = 4), its corresponding probability is given below.

$$\Pr(X = 0) = \frac{4!}{(4 - 0)!\, 0!}\,(0.70)^{0} (0.30)^{4}$$

Hence, we get Pr (X = 0) = 0.0081. (This result may be obtained with any scientific calculator.)
Using Microsoft EXCEL
In any empty cell type =BINOMDIST(0,4,0.70,FALSE) then press ENTER. The answer is: 0.0081.

(B) "At most 3" means that the random variable X takes on any integer value from 0 to 3.

$$\Pr(X \le 3) = \Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3) = 0.0081 + 0.0756 + 0.2646 + 0.4116$$

$$\Pr(X \le 3) = 0.7599.$$
Using Microsoft EXCEL
In any empty cell type =BINOMDIST(3,4,0.70,TRUE) then press ENTER. The answer is: 0.7599.

(C) "At least 3" means that the random variable X takes on any integer value from 3 to 4.

$$\Pr(X \ge 3) = \Pr(X = 3) + \Pr(X = 4) = 1 - \Pr(X < 3) = 1 - \Pr(X \le 2)$$

$$= 1 - \Pr(X = 0) - \Pr(X = 1) - \Pr(X = 2) = 1 - 0.0081 - 0.0756 - 0.2646 = 1 - 0.3483$$

$$\Pr(X \ge 3) = 0.6517.$$
Using Microsoft EXCEL
In any empty cell type =1-BINOMDIST(2,4,0.70,TRUE) then press ENTER. The answer is: 0.6517.

4.7.3 MEAN, VARIANCE, AND STANDARD DEVIATION FOR THE BINOMIAL DISTRIBUTION
Without using the binomial table or the probability distribution, there exist some straightforward formulas to compute the mean, the variance, and the standard deviation of a binomial random variable X.

Mean, $\mu_X$ or E(X):

$$E(X) = n\pi$$

Variance, $\sigma_X^2$:

$$\sigma_X^2 = n\pi(1 - \pi)$$

and the standard deviation, $\sigma_X$, is written as

$$\sigma_X = \sqrt{n\pi(1 - \pi)}.$$
EXAMPLE 4.21 Problem: Suppose that the Bank of America of Waco has recently begun a new credit program. Customers meeting certain credit requirements can obtain a credit card that is accepted by participating area merchants and which carries a discount. Records since the new program started show that 30 percent of all applicants for this card are rejected. Given that credit acceptance or rejection is a Binomial process, out of 15 applicants, what is the probability that
(A) exactly 3 will be rejected?
(B) fewer than 4 will be rejected?
(C) more than 6 will be rejected?
(D) between 10 and 12, inclusively, will be rejected?
(E) Find the expected value, variance, and standard deviation of the number of credit card applicants who will be rejected.
Solution:
(A) We have the sample size, n: n = 15; the proportion of the population who were denied this credit card since the inception of the program, $\pi$: $\pi$ = 0.30; and the random variable X, defined as X = {Number of credit card applicants that will be rejected}. The question is: Pr (X = 3) = ? if X ~ Binomial(x; n = 15, $\pi$ = 0.30).
Using Microsoft EXCEL
In any empty cell, type the function =BINOMDIST(3,15,0.30,FALSE) then press the ENTER key. The answer is: Pr (X = 3) = 0.170040.

(B) $\Pr(X < 4) = \Pr(X \le 3) = ?$

Using Microsoft EXCEL
In any cell type =BINOMDIST(3,15,0.30,TRUE) then press ENTER. The answer is: $\Pr(X < 4) = 0.296868$.

(C) $\Pr(X > 6) = 1 - \Pr(X \le 6) = ?$
Using Microsoft EXCEL
In any cell type =1-BINOMDIST(6,15,0.30,TRUE) then press ENTER. The answer is: $\Pr(X > 6) = 0.13114$.

(D) $\Pr(10 \le X \le 12) = \Pr(X = 10) + \Pr(X = 11) + \Pr(X = 12) = ?$
Using Microsoft EXCEL
In any empty cell type the functions =BINOMDIST(10,15,0.30,FALSE)+BINOMDIST(11,15,0.30,FALSE)+BINOMDIST(12,15,0.30,FALSE) then press ENTER. The answer is: $\Pr(10 \le X \le 12) = 0.003644$.

(E) The mean of X, E (X):

$$E(X) = n\pi = 15 \times 0.30, \quad E(X) = 4.5.$$

On the average, we would expect 4 or 5 credit card applications to be denied.

The variance of the random variable X, $\sigma_X^2$, is

$$\sigma_X^2 = n\pi(1 - \pi) = 4.5 \times 0.70, \quad \sigma_X^2 = 3.15,$$

and the standard deviation of the random variable X, $\sigma_X$, is

$$\sigma_X = \sqrt{n\pi(1 - \pi)} = \sqrt{4.5 \times 0.7} = \sqrt{3.15}, \quad \sigma_X = 1.775.$$
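The Excel results of Example 4.21 can be cross-checked with a few lines of Python. This is a sketch under stated assumptions rather than part of the original solution: the helper names `binomial_pmf` and `binomial_cdf` are hypothetical, and the cumulative probability is simply accumulated from the binomial formula of Section 4.7.2, which mirrors what BINOMDIST returns when its last argument is TRUE.

```python
from math import comb, sqrt


def binomial_pmf(x, n, pi):
    """Pr(X = x) for a binomial random variable with n trials and success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binomial_cdf(x, n, pi):
    """Pr(X <= x): sum of the individual probabilities from 0 through x."""
    return sum(binomial_pmf(k, n, pi) for k in range(x + 1))


n, pi = 15, 0.30  # 15 applicants, 30% rejection rate

p_exactly_3 = binomial_pmf(3, n, pi)                             # (A) Pr(X = 3)
p_fewer_than_4 = binomial_cdf(3, n, pi)                          # (B) Pr(X < 4) = Pr(X <= 3)
p_more_than_6 = 1 - binomial_cdf(6, n, pi)                       # (C) Pr(X > 6)
p_10_to_12 = sum(binomial_pmf(k, n, pi) for k in range(10, 13))  # (D) Pr(10 <= X <= 12)

mean = n * pi                 # (E) E(X) = n * pi
variance = n * pi * (1 - pi)  #     variance = n * pi * (1 - pi)
sd = sqrt(variance)

print(f"(A) {p_exactly_3:.6f}")
print(f"(B) {p_fewer_than_4:.6f}")
print(f"(C) {p_more_than_6:.5f}")
print(f"(D) {p_10_to_12:.6f}")
print(f"(E) mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.3f}")
```

Writing "fewer than 4" as a cumulative probability up to 3, and "more than 6" as one minus the cumulative probability up to 6, is the same translation used in the Excel formulas above.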
EXAMPLE 4.22 Problem: A coin is tossed four times. Find the mean, variance, and standard deviation of the number of heads that will be obtained.
Solution: We have the following: the sample size, n: n = 4; the population proportion, $\pi$: $\pi$ = 1/2; and the random variable X, which is defined as X = {Number of heads obtained after tossing a coin four times}.
The questions are: mean of X, E (X) = ?; variance of X, Var (X) or $\sigma_X^2$ = ?; and standard deviation of X, $\sigma_X$ = ?

Mean, E (X):

$$E(X) = n\pi = 4 \times 0.50, \quad E(X) = 2.$$

Variance, $\sigma_X^2$:

$$\sigma_X^2 = n\pi(1 - \pi) = 2 \times 0.50, \quad \sigma_X^2 = 1.$$

Standard deviation, $\sigma_X$:

$$\sigma_X = \sqrt{n\pi(1 - \pi)} = \sqrt{2 \times 0.5} = \sqrt{1}, \quad \sigma_X = 1.$$
4.8 THE NORMAL DISTRIBUTION
4.8.1 CONTINUOUS RANDOM VARIABLE
DEFINITION 4.20: CONTINUOUS RANDOM VARIABLE A random variable that may assume any numerical value in an interval or collection of intervals is called a continuous random variable.

Thus, continuous random variables have no gaps or unassumed values. It could be said that continuous random variables are generated from experiments in which things are "measured", not "counted". For example, consider an experiment of monitoring incoming telephone calls to the claims office of a major insurance company. Suppose the random variable of interest is X = {Time between consecutive incoming calls in minutes}. This random variable may assume any value in the interval $x \ge 0$. Actually, an infinite number of values are possible for x, including values such as 1.26 minutes, 2.571 minutes, 4.33333 minutes, and so on. Once continuous data are measured and recorded, they become discrete data because the data are rounded off to a discrete number. Thus in actual practice, virtually all business data are discrete. However, for practical reasons, data analysis is facilitated greatly by using continuous distributions on data that were continuous originally.

4.8.2 IMPORTANCE OF THE NORMAL DISTRIBUTION
Probably the most widely known and used of all probability (density) distributions for describing a continuous random variable is the normal probability distribution (or simply normal distribution). Because of its many applications, the normal distribution is an extremely important distribution.

One reason the normal distribution is so important is that it fits many human characteristics, such as height, weight, length, speed, IQ, scholastic achievement, and years of life expectancy, among others. For instance, measures of reading ability, introversion, job satisfaction, and memory are among the many psychological and educational variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close. Many variables in business and industry also are approximately normally distributed. Some examples of variables that could produce roughly normally distributed measurements include the annual cost of household insurance, the cost per square foot of renting warehouse space, and managers' satisfaction with support from ownership on a five-point scale. In addition, most items produced or filled by machines are more or less normally distributed.

A second reason the normal distribution is important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived from normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.

Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles.
4.8.3 INTRODUCTION TO THE NORMAL DISTRIBUTION 4.8.3.1 DEFINITION AND PROPERTIES OF THE NORMAL DISTRIBUTION
DEFINITION OF THE NORMAL PROBABILITY DISTRIBUTION
DEFINITION 4.22: NORMAL PROBABILITY DISTRIBUTION A continuous random variable is normally distributed, or has a normal probability distribution, if its relative frequency histogram has the shape of a normal curve (bell-shaped and symmetric).

APPLICATIONS AND PROPERTIES OF THE NORMAL DISTRIBUTION
The Normal distribution appears in a variety of statistical applications. One reason for this is the Central Limit Theorem. This theorem tells us that sums of independent random variables are approximately normally distributed if the number of observations is large. For example, if we toss a coin, the distribution of the total number of heads approaches normality as the number of tosses grows. Even when a distribution may not be normal, it may still be convenient to assume that a normal distribution is a good approximation. In this case, many statistical procedures, such as the t-test, can still be used.

The Normal distribution is completely specified by two parameters: the mean ($\mu$) and the variance ($\sigma^2$). The mean of a Normal distribution locates the center of the density and can be any real number. The distribution is symmetrical, with mean, mode, and median all equal to $\mu$. The variance of a Normal distribution measures the variability of the density and can be any positive real number. The standard deviation ($\sigma$) is the square root of the variance and is used more for its interpretability. The variance ($\sigma^2$) is used more for its nice mathematical features.
For a Normal random variable given by $X \sim N(\mu, \sigma^2)$, the probability density function (p.d.f.) is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x - \mu)^2 / (2\sigma^2)} \qquad (1)$$

where $\mu$ stands for the mean of the random variable X ($-\infty < \mu < +\infty$), $\sigma$ is the standard deviation of X ($\sigma > 0$), $\pi \approx 3.14159$, $e \approx 2.71828$, and the random variable X may take on any real number, $-\infty < x < +\infty$.

Figure 4.2 shows the probability density function (p.d.f.) for a normal random variable with mean $\mu$ and variance $\sigma^2$.
Figure 4.2: Normal curve or p.d.f.
The normal distribution exhibits the following characteristics.
- It is a continuous distribution.
- It is a symmetrical distribution about the mean. Each half of the distribution is a mirror image of the other half.
- It is asymptotic to the horizontal axis. That is, in theory, it does not touch the x axis, and it goes on forever in each direction. The reality is that most applications of the normal curve are experiments that have finite limits of potential outcomes. For example, even though SAT scores are analyzed by the normal distribution, the range of scores on each part of the SAT is from 200 to 800.
- It is unimodal. The normal distribution is sometimes referred to as the bell-shaped curve. It is unimodal in that values mound up in only one portion of the graph, the center of the curve.
- It is a family of curves. Every unique value of the mean and every unique value of the standard deviation result in a different normal curve.
- The total area under the curve is 1. The area under the curve yields the probabilities, so the total of all probabilities on each side of the mean is 0.5.
The cumulative distribution function (c.d.f.) of a Normal random variable is obtained by integrating (1):

$$F(x) = \Pr(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(t - \mu)^2 / (2\sigma^2)}\, dt \qquad (2)$$
Probabilities and quantiles are obtained from (2):
1. $\Pr(X \le x_1) = F(x_1)$
2. $\Pr(x_0 \le X \le x_1) = F(x_1) - F(x_0)$
3. $\Pr(X \ge x_0) = 1 - F(x_0)$.

Because the integral of (1) has no closed-form expression, we use tables and computers to determine Normal probabilities and quantiles. The Normal table, found in any standard statistics textbook, relies on the Standard Normal distribution (denoted by Z). A Standard Normal distribution has mean 0 and variance 1. Any Normal distribution can be transformed to a Standard Normal Distribution using the Z-transformation:

$$Z = \frac{X - \mu}{\sigma} \qquad (3)$$

Conversely, if Z ~ N(0,1), then

$$X = \sigma Z + \mu \qquad (4)$$

is a normal random variable with mean $\mu$ and variance $\sigma^2$.
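The way the Z-transformation reduces every Normal probability to a Standard Normal one can be sketched in Python. The example below is illustrative only: the function names are assumptions, the numbers (mean 100, standard deviation 15) are hypothetical values chosen for the demonstration, and the standard normal c.d.f. is expressed through the error function `math.erf`, using the known identity Φ(z) = ½[1 + erf(z/√2)].

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Pr(Z <= z) for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """Pr(X <= x) for X ~ N(mu, sigma^2), via the Z-transformation (3)."""
    z = (x - mu) / sigma
    return standard_normal_cdf(z)


# Hypothetical illustration: X ~ N(mu = 100, sigma = 15).
mu, sigma = 100.0, 15.0
x0, x1 = 85.0, 115.0

p_below = normal_cdf(x1, mu, sigma)                                # Pr(X <= x1) = F(x1)
p_between = normal_cdf(x1, mu, sigma) - normal_cdf(x0, mu, sigma)  # F(x1) - F(x0)
p_above = 1.0 - normal_cdf(x0, mu, sigma)                          # 1 - F(x0)

print(f"Pr(X <= {x1}) = {p_below:.4f}")
print(f"Pr({x0} <= X <= {x1}) = {p_between:.4f}")
print(f"Pr(X >= {x0}) = {p_above:.4f}")
```

Only `standard_normal_cdf` needs to know anything about the Normal curve; every other Normal distribution is handled by converting x to a z-score first, which is exactly why a single printed table suffices.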
Figure 4.3 shows how the Z-transformation preserves the probabilities. The top curve is any Normal distribution; the bottom curve is the Standard Normal Distribution. The shaded regions are equivalent in area.
Figure 4.3: Z-transformation.
The applets display two normal densities in two stacked plots similar in representation to Figure 4.3. The shaded regions represent the probability being computed. The regions for both plots are equivalent in area because of the Z-transformation defined by (3).
4.8.3.2 STANDARD NORMAL PROBABILITY DISTRIBUTION
DEFINITION 4.23: STANDARD NORMAL PROBABILITY DISTRIBUTION A standard normal probability distribution, or simply standard normal distribution, is a normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$. Areas under this curve can be found using a standard normal table.

Once the standard normal distribution has been tabulated, the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution. Figure 4.3 shows a standard normal distribution curve. The density in the upper plot corresponds to a user-defined Normal random variable:

$$X \sim N(\mu, \sigma^2)$$

The density in the lower plot corresponds to a standard Normal random variable:

$$Z \sim N(0, 1)$$

The equation for the standard normal distribution is

$$f(z) = \frac{1}{\sqrt{2\pi}}\; e^{-z^2 / 2} \quad \text{for } -\infty < z < +\infty.$$
Since the general form of probability functions can be expressed in terms of the standard distribution, all subsequent formulas in this section are given for the standard form of the function.
Figure 4.4: Standard normal distribution curve.
Assume throughout this document then that we are working with a variable Z that has a standard normal distribution. The letter Z is usually used for such a variable; the small letter z is used to indicate the generic value that the variable may take. The standard normal distribution is sometimes called the Z-distribution. A z score always reflects the number of standard deviations above or below the mean a particular score is.
4.8.4 FINDING THE AREA UNDER THE STANDARD NORMAL CURVE
Some notable qualities of the standard normal distribution are given below:
1. The density function is symmetric about its mean value.
2. The mean is also its mode and median.
3. 68.26% of the area under the curve is within one standard deviation of the mean.
4. 95.44% of the area is within two standard deviations.
5. 99.74% of the area is within three standard deviations.
6. The inflection points of the curve occur at one standard deviation away from the mean.
NOTE 4.4: ILLUSTRATION OF THE "68-95-99.7 RULE" The illustration is given by Figure 4.5 below. In practice, one often assumes that data are from an approximately normally distributed population. If that assumption is justified, then about 68.26% of the values are within one standard deviation of the mean, about 95.44% of the values are within two standard deviations, and about 99.74% lie within three standard deviations. This is known as the "68-95-99.7 rule".
Figure 4.5: Illustration of the "68-95-99.7 rule"
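The percentages in the "68-95-99.7 rule" can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are assumptions made for this illustration. Computed without table rounding, the areas come out as 0.6827, 0.9545, and 0.9973, so the last digit differs slightly from the figures quoted above, which are derived from a four-decimal standard normal table.

```python
from statistics import NormalDist

# Standard normal random variable Z ~ N(0, 1).
Z = NormalDist(mu=0.0, sigma=1.0)

# Area within k standard deviations of the mean: Pr(-k <= Z <= k).
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"Within {k} standard deviation(s): {area:.4f}")
```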
EXAMPLE 4.23 Problem: Suppose that the average annual salary for first-year teachers in Waco, Texas, is $29,856. If the distribution of salaries is known to follow a normal distribution with standard deviation $3,200, what is the probability that a randomly selected first-year teacher makes these salaries? (A) Between $20,000 and $30,000 a year. (B) Greater than $30,000 a year.
Solution: Let the random variable X represent the annual salary for the first-year teachers. Hence, $X \sim N(\mu = 29{,}856,\ \sigma = 3{,}200)$.

(A) We need to compute $\Pr(20{,}000 \le X \le 30{,}000)$.

To do so, we have to calculate the z-scores using formula (3):

$$z_1 = \frac{x_1 - \mu}{\sigma} = \frac{20{,}000 - 29{,}856}{3{,}200} = -3.08 \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma} = \frac{30{,}000 - 29{,}856}{3{,}200} = 0.045.$$

Thus, $x_1$ = 20,000 lies 3.08 standard deviations below the mean ($\mu$ = 29,856) and $x_2$ = 30,000 lies 0.045 standard deviations above the mean. Referring to any standard normal table, we find the area corresponding to $z_1 = -3.08$ to be 0.0010 and the area corresponding to $z_2 \approx 0.05$ to be 0.5199.

The resulting probability follows:

$$\Pr(20{,}000 \le X \le 30{,}000) = \Pr(Z \le 0.05) - \Pr(Z \le -3.08) = 0.5199 - 0.0010$$

$$\Pr(20{,}000 \le X \le 30{,}000) = 0.5189.$$

We conclude that about 51.9% of the first-year teachers in Waco, Texas earn a salary between $20,000 and $30,000 per year.

(B) We need to compute $\Pr(X > 30{,}000)$.

From part (A), we had already computed the probability that the salary is less than or equal to $30,000. Hence, we simply take here the complement of that to get:

$$\Pr(X > 30{,}000) = 1 - \Pr(X \le 30{,}000) = 1 - 0.5199, \quad \text{or} \quad \Pr(X > 30{,}000) = 0.4801.$$
Using EXCEL
Type in any empty cell the function: =1-NORMDIST(30000,29856,3200,TRUE), then press Enter. The answer is 0.4821. The small difference from the table-based answer of 0.4801 arises because the table lookup rounded z = 0.045 up to 0.05, whereas Excel uses the unrounded value.

EXAMPLE 4.24 Problem: Suppose that at a large state university graduate research assistants are paid by the hour. Data from the personnel office show that the distribution of hourly wages paid to graduate students across the campus is approximately normal with a mean of $12.00 and a standard deviation of $2.50. Determine the probability of selecting at random from the personnel files a graduate assistant whose hourly wage is extreme in either direction: either $10 or below, or $14 or above.
Solution: Let the random variable X be the hourly wage paid to graduate students across the campus. Hence, $X \sim N(\mu = 12,\ \sigma = 2.5)$. We need to compute $\Pr(X \le 10) + \Pr(X \ge 14)$, since the two events are mutually exclusive.
Using EXCEL
Type in any empty cell the function: =NORMDIST(x,, ,TRUE), or alternatively: =NORMSDIST(z). Here we have: =NORMDIST(10,12,2.5,true)+1-NORMDIST(14,12,2.5,true), then press Enter. Hence, the answer is 0.423711. There is a 42% chance of selecting at random from the personnel files a graduate assistant whose hourly wage is extreme in either direction either $10 or below or $14 or above.
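The spreadsheet results for this example and for the teacher-salary example can be cross-checked outside Excel. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library class statistics.NormalDist; the variable names are illustrative and are not part of the original text.

```python
from statistics import NormalDist

# Example 4.24: hourly wages, X ~ N(mu = 12, sigma = 2.5)
wages = NormalDist(mu=12, sigma=2.5)

# Pr(X <= 10) + Pr(X >= 14): the two tails are mutually exclusive events
p_low = wages.cdf(10)
p_high = 1 - wages.cdf(14)
p_extreme = p_low + p_high
print(f"Pr(X <= 10 or X >= 14) = {p_extreme:.6f}")

# Teacher-salary example: X ~ N(mu = 29,856, sigma = 3,200)
salary = NormalDist(mu=29856, sigma=3200)
p_between = salary.cdf(30000) - salary.cdf(20000)  # part (A)
p_above = 1 - salary.cdf(30000)                    # part (B)
print(f"Pr(20,000 <= X <= 30,000) = {p_between:.4f}")
print(f"Pr(X >= 30,000)           = {p_above:.4f}")
```

Because the code works with unrounded z-scores, its results agree with the Excel figures rather than with the table lookups, which round z to two decimal places.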
EXAMPLE 4.25
Problem: It has been reported that households in the West spend an annual average of $6,050 for groceries. Assume a normal distribution with a standard deviation of $1,500. (David Stuckey and Sam Ward, "Home Cooked", USA Today, July 25, 2006, p. 1A) How much money would a Western household have to spend on groceries per year in order to be at the 99th percentile (i.e., only 1% of Western households would spend more on groceries)?

Solution: Let the random variable X be the annual spending on groceries of a Western household. Hence, X follows a normal distribution with mean \(\mu = 6{,}050\) and standard deviation \(\sigma = 1{,}500\). We need to compute x such that \(\Pr(X \le x) = 0.99\).

To do so, we first find z such that \(\Pr(Z \le z) = 0.99\). In any standard normal table, the area closest to 0.99 is 0.9901, and the corresponding z value is 2.33. Then we plug that value into formula (4) as follows:

\[ x = \mu + z\sigma = 6{,}050 + 2.33 \times 1{,}500 = 9{,}545. \]

We conclude that a Western household has to spend about $9,545 on groceries per year in order to be in the top 1% of all Western households. The researcher might use this information to identify those households that have unusually large expenditures.

Using EXCEL
Type in any empty cell the function: =NORMINV(probability, μ, σ). Here we have: =NORMINV(0.99,6050,1500), then press Enter. The answer is 9539.52181.
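The percentile calculation can also be sketched in Python. This is a minimal illustration, assuming Python 3.8 or later and the standard-library statistics.NormalDist class, whose inv_cdf method plays the role of NORMINV; the variable names are hypothetical.

```python
from statistics import NormalDist

# Example 4.25: annual grocery spending, X ~ N(mu = 6,050, sigma = 1,500)
mu, sigma = 6050, 1500
groceries = NormalDist(mu=mu, sigma=sigma)

# Step 1: the z-score with area 0.99 to its left (the table gives about 2.33)
z_99 = NormalDist().inv_cdf(0.99)

# Step 2: formula (4), x = mu + z * sigma
x_from_formula = mu + z_99 * sigma

# The same value in one step, analogous to =NORMINV(0.99, 6050, 1500)
x_99 = groceries.inv_cdf(0.99)

print(f"z for the 99th percentile: {z_99:.4f}")
print(f"x via formula (4):         {x_from_formula:.2f}")
print(f"x via inv_cdf:             {x_99:.2f}")
```

The hand calculation gives 9,545 rather than roughly 9,539.5 only because the table value of z is rounded to 2.33.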
NOTE 4.5 We are often interested in finding the z-score that has a specified area to its right. For this reason, we have special notation to represent this situation. Note that the area under the standard normal curve to the right of z equals 1 minus the area to the left of z.

DEFINITION 4.24: THE \(z_\alpha\) NOTATION
The symbol \(z_\alpha\) (pronounced "z sub alpha" or more simply "z alpha") is the z-score such that the area under the standard normal curve to the right of \(z_\alpha\) is \(\alpha\).

EXAMPLE 4.26
Problem: (A) Use the Normal Table to find the value of \(z_{0.025}\). (B) Use Microsoft Excel to find the value of \(z_{0.05}\).

Solution: (A) We wish to find the z value such that the area under the standard normal curve to the right of the z value is 0.025. The area to the right of the unknown z value is 0.025, so the area to the left of the z value is \(1 - 0.025 = 0.975\). We look in any standard normal table for the area closest or equal to 0.975. The area equal to 0.975 corresponds to a z value of 1.96. Therefore, \(z_{0.025} = 1.96\).

(B) Using Microsoft EXCEL
Type in any empty cell the function: =-1*NORMSINV(0.05), then press the Enter key. The answer is 1.6449.

EXAMPLE 4.28
Problem: Find the area under the standard normal curve to the right of z = -0.46.

Solution: We wish to find \(\Pr(Z \ge -0.46)\). We first compute \(\Pr(Z \le -0.46)\), which is equivalent to the area to the left of z = -0.46. From the standard normal table, we read \(\Pr(Z \le -0.46) = 0.3228\). Then, applying the theorem stating that the area to the right of -0.46 is equal to 1 minus the area to the left of -0.46, we get

\[ \Pr(Z \ge -0.46) = 1 - \Pr(Z \le -0.46) = 1 - 0.3228 = 0.6772. \]

We conclude that the area to the right of -0.46 is 0.6772.

Using Microsoft EXCEL
Type in any empty cell: =1-NORMSDIST(-0.46), then press Enter. The answer is 0.6772.
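The \(z_\alpha\) notation and the right-tail rule used in the last two examples can be expressed as two small helper functions. This is a sketch only: the function names z_alpha and right_tail_area are hypothetical, and it assumes Python 3.8 or later for statistics.NormalDist.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_alpha(alpha: float) -> float:
    """Return the z-score with area alpha to its right under the standard normal curve."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Area alpha to the right means area 1 - alpha to the left
    return STANDARD_NORMAL.inv_cdf(1 - alpha)


def right_tail_area(z: float) -> float:
    """Return the area to the right of z: 1 minus the area to the left of z."""
    return 1 - STANDARD_NORMAL.cdf(z)


print(f"z_0.025 = {z_alpha(0.025):.2f}")   # Example 4.26 (A)
print(f"z_0.05  = {z_alpha(0.05):.4f}")    # Example 4.26 (B)
print(f"Area to the right of z = -0.46: {right_tail_area(-0.46):.4f}")
```

Writing z_alpha in terms of 1 - alpha mirrors the table procedure in part (A); by symmetry it gives the same value as the Excel expression =-1*NORMSINV(alpha).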
4.8.5 ASSESSING NORMALITY
Normality tests check a given set of data for similarity to the normal distribution. The null hypothesis is that the data come from a normal distribution; a sufficiently small p-value therefore indicates non-normal data. Using a statistical package, it is easy to check for normality by carrying out at least one of the following:
1. Kolmogorov-Smirnov test.
2. Shapiro-Wilk test.
3. Normal probability plot (Normal Q-Q plot).

EXAMPLE 4.27
Problem: (A) The data in Table 4.8 represent the three-year rate of return of 19 randomly selected small-capitalization growth mutual funds. Is there evidence to support the belief that the variable "three-year rate of return" is normally distributed? Describe the distribution.

Table 4.8: Three-year return rate of 19 randomly selected small-capitalization growth mutual funds
15.8  16.7  18.2  18.4  18.4  18.5  19.2  19.5  21.3  22.2
22.6  23.7  23.7  25.5  27.0  27.4  28.5  29.1  29.6

(B) Table 4.9 represents the time spent waiting in line (in minutes) for the Demon Roller Coaster for 100 randomly selected riders. Is the random variable "waiting time" normally distributed? Describe the distribution.

Table 4.9: Time spent waiting in line for 100 randomly selected riders
  7   33   30    4   35   94    3   76    6    3
 39   18    5  107   14    8   21    0    2    7
  9   15    4   61    8    9   86   18   53   38
 37   45    6    0    9   38   16   15   11   93
 47   21   41   81    1   68    5   61    7   94
  1    6   55    9   25   22   10  115    3    9
 94   16   64   51   80   18   19   18   79   13
 80   21    1    0   41   24   26    8   40   18
  2    6   24   14    1   29   11   60    9    6
 12  121   12    0   47   56   93   34   19   30
Solution: For both distributions, we use SPSS. The steps to be followed are:
1. Enter the data into ONE column (like VAR000001) under Data View.
2. Name the variable VAR000001 "rate_of_return" or "time_spent" by clicking on Variable View.
3. Choose Analyze from the menu bar. Highlight Descriptive Statistics, then click on Explore. Highlight "rate_of_return" or "time_spent" and move it to the Dependent List window by clicking on the > icon. Click on Plots and select Normality Plots with Tests. Press Continue. Press OK.

The descriptive statistics output is shown in Table 4.10 below.

(A) Table 4.10: Descriptive statistics for the three-year return rate data
Descriptives: rate_of_return
                                                Statistic   Std. Error
Mean                                            22.3842     1.01729
95% Confidence Interval for Mean   Lower Bound  20.2470
                                   Upper Bound  24.5214
5% Trimmed Mean                                 22.3491
Median                                          22.2000
Variance                                        19.663
Std. Deviation                                  4.43424
Minimum                                         15.80
Maximum                                         29.60
Range                                           13.80
Interquartile Range                             8.60
Skewness                                        .272        .524
Kurtosis                                        -1.272      1.014
Description: Center, symmetry and shape
Looking at the descriptive output above (Table 4.10), we see that the mean (22.4) and the median (22.2) are nearly equal, and the skewness coefficient is close to zero (0.27). Together, these indicate that the distribution is centered at about 22.3 and is roughly symmetrical around that value. The kurtosis coefficient is well below zero (-1.27). SPSS reports excess kurtosis, for which a normal distribution has a value of 0, so this is an indication that the "peak" of the histogram of the data distribution is lower and flatter than that of a normal curve.
Table 4.11: Tests of normality for the three-year return rate data

Tests of Normality
                    Kolmogorov-Smirnov (a)         Shapiro-Wilk
                    Statistic   df    Sig.         Statistic   df    Sig.
rate_of_return      .163        19    .198         .931        19    .179
a. Lilliefors Significance Correction
Normality of the random variable: Statistical tests (based on Table 4.11 results)
The p-values associated with the Kolmogorov-Smirnov and Shapiro-Wilk tests both exceed 0.05 (0.198 and 0.179, respectively). Hence, we do not reject the hypothesis of normality: the data are consistent with the random variable "Return rate" being normally distributed.

Figure 4.6: Normal Q-Q plot of rate of return (horizontal axis: Observed Value; vertical axis: Expected Normal)
Normality of the random variable: Normal Q-Q plot (see Figure 4.6)
Since the normal probability plot is roughly linear, with all the data points clustered around the straight line provided by the SPSS software, we conclude that it is reasonable to believe that the sample return-rate data come from a population that follows a normal distribution.
(B) Table 4.12: Descriptive statistics for the time spent waiting data

Descriptives: time_spent
                                                Statistic   Std. Error
Mean                                            30.6100     3.06010
95% Confidence Interval for Mean   Lower Bound  24.5381
                                   Upper Bound  36.6819
5% Trimmed Mean                                 28.1000
Median                                          18.0000
Variance                                        936.422
Std. Deviation                                  30.60102
Minimum                                         .00
Maximum                                         121.00
Range                                           121.00
Interquartile Range                             39.25
Skewness                                        1.197       .241
Kurtosis                                        .488        .478
Description: Center, symmetry and shape
Looking at the descriptive output above (Table 4.12), we see that the mean (30.6) is much greater than the median (18.0). The skewness coefficient is also much greater than zero (1.20). Together, these indicate that although the distribution has a mean of 30.6, it is strongly skewed to the right. The kurtosis coefficient is slightly above zero (0.49). SPSS reports excess kurtosis, for which a normal distribution has a value of 0, so this is an indication that the "peak" of the histogram of the data distribution is somewhat sharper than that of a normal curve.

Normality of the random variable: Statistical tests (based on Table 4.13 results)
The p-values associated with the Kolmogorov-Smirnov and Shapiro-Wilk tests are both less than 0.05 (each is reported as .000, i.e., less than 0.001). Hence, we reject the hypothesis of normality: the random variable "Time spent waiting" is NOT normally distributed.

Table 4.13: Tests of normality for the time spent waiting data

Tests of Normality
                    Kolmogorov-Smirnov (a)         Shapiro-Wilk
                    Statistic   df    Sig.         Statistic   df    Sig.
time_spent          .183        100   .000         .845        100   .000
a. Lilliefors Significance Correction
Normality of the random variable: Normal Q-Q plot (see Figure 4.7)
Clearly, the normal probability plot is not linear. We conclude that the random variable "Time spent waiting" is NOT normally distributed.

Figure 4.7: Normal Q-Q plot of time spent waiting (horizontal axis: Observed Value; vertical axis: Expected Normal)
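The center and shape figures in Tables 4.10 and 4.12 can be reproduced approximately without SPSS. The sketch below uses only the Python standard library and assumes the bias-adjusted skewness and excess-kurtosis formulas commonly used by statistical packages; the function names are hypothetical. It does not reproduce the Kolmogorov-Smirnov or Shapiro-Wilk tests, which would need a third-party statistics package.

```python
from statistics import mean, median, stdev

returns = [15.8, 16.7, 18.2, 18.4, 18.4, 18.5, 19.2, 19.5, 21.3, 22.2,
           22.6, 23.7, 23.7, 25.5, 27.0, 27.4, 28.5, 29.1, 29.6]

waits = [7, 33, 30, 4, 35, 94, 3, 76, 6, 3,
         39, 18, 5, 107, 14, 8, 21, 0, 2, 7,
         9, 15, 4, 61, 8, 9, 86, 18, 53, 38,
         37, 45, 6, 0, 9, 38, 16, 15, 11, 93,
         47, 21, 41, 81, 1, 68, 5, 61, 7, 94,
         1, 6, 55, 9, 25, 22, 10, 115, 3, 9,
         94, 16, 64, 51, 80, 18, 19, 18, 79, 13,
         80, 21, 1, 0, 41, 24, 26, 8, 40, 18,
         2, 6, 24, 14, 1, 29, 11, 60, 9, 6,
         12, 121, 12, 0, 47, 56, 93, 34, 19, 30]


def sample_skewness(data):
    """Bias-adjusted sample skewness (assumed to match the packaged output)."""
    n, m, s = len(data), mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def excess_kurtosis(data):
    """Bias-adjusted excess kurtosis; a normal distribution scores about 0."""
    n, m, s = len(data), mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


for name, data in (("rate_of_return", returns), ("time_spent", waits)):
    print(f"{name}: mean={mean(data):.4f} median={median(data):.4f} "
          f"sd={stdev(data):.5f} skewness={sample_skewness(data):.3f} "
          f"kurtosis={excess_kurtosis(data):.3f}")
```

A mean close to the median together with a skewness near zero is consistent with symmetry, as for the return data; a mean far above the median together with a large positive skewness signals a right-skewed distribution, as for the waiting times.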
4.8.6 SAMPLING DISTRIBUTIONS
In this section, you will study the relationship between a population mean and the means of samples taken from the population. If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance it will be a little bit higher or a little bit lower. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower.

Imagine sampling 10 numbers and computing the mean over and over again, say about 1,000 times, and then constructing a relative frequency distribution of those 1,000 means. This distribution of means is a very good approximation to the sampling distribution of the mean. The sampling distribution of the mean is a theoretical distribution that is approached as the number of samples in the relative frequency distribution increases. With 1,000 samples, the relative frequency distribution is quite close; with 10,000 it is even closer. As the number of samples approaches infinity, the relative frequency distribution approaches the sampling distribution.

DEFINITION 4.25: SAMPLING DISTRIBUTION
A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population.

DEFINITION 4.26: SAMPLING DISTRIBUTION OF SAMPLE MEANS
The sampling distribution of sample means is the probability distribution of sample means, with all samples having the same sample size, n. In general, the sampling distribution of any particular statistic is the probability distribution of that statistic.

PROPERTIES OF SAMPLING DISTRIBUTIONS OF SAMPLE MEANS
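1. The mean of the sample means, \(\mu_{\bar{X}}\), is equal to the population mean \(\mu_X\):

\[ \mu_{\bar{X}} = \mu_X, \quad\text{where}\quad \mu_X = \frac{1}{N}\sum_{i=1}^{N} X_i. \]

2. The standard deviation of the sample means, \(\sigma_{\bar{X}}\), is equal to the population standard deviation divided by the square root of n:

\[ \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}, \quad\text{where}\quad \sigma_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_X\right)^2}. \]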
The standard deviation of the sampling distribution of the sample means is called the standard error of the mean. It is designated by the symbol \(\sigma_{\bar{X}}\). Note that the spread of the sampling distribution of the mean decreases as the sample size increases.

EXAMPLE 4.28
Problem: You write the population values {1, 3, 5, 7} on slips of paper and put them in a box. Then you randomly choose two slips of paper, with replacement.
(A) Find the sampling distribution of sample means.
(B) Find the mean, variance, and standard deviation of the sample means. Compare your results with the population mean,

\[ \mu_X = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1 + 3 + 5 + 7}{4} = \frac{16}{4} = 4, \]

and the population standard deviation,

\[ \sigma_X = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_X\right)^2}. \]

Since we have here \(\sigma_X^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5\), the population standard deviation is \(\sigma_X = \sqrt{5} = 2.236\).
Solution: List all 16 (= 4^2) possible samples of size n = 2 from the population of size N = 4, together with the mean of each sample. The outcomes are shown in Table 4.14 below.

Table 4.14: Sampling distribution of sample means for sample size of 2
Sample data   Sample mean      Sample data   Sample mean
1, 1          1                5, 1          3
1, 3          2                5, 3          4
1, 5          3                5, 5          5
1, 7          4                5, 7          6
3, 1          2                7, 1          4
3, 3          3                7, 3          5
3, 5          4                7, 5          6
3, 7          5                7, 7          7
(A) The relative frequency distribution of the sample means is given in Table 4.15 below.

Table 4.15: Relative frequency of the sample means
Sample mean x_i   Frequency n_i   Relative frequency f_i = n_i/n   x_i f_i   x_i^2 f_i
1                 1               1/16 = 0.0625                    1/16      1/16
2                 2               2/16 = 0.1250                    4/16      8/16
3                 3               3/16 = 0.1875                    9/16      27/16
4                 4               4/16 = 0.2500                    16/16     64/16
5                 3               3/16 = 0.1875                    15/16     75/16
6                 2               2/16 = 0.1250                    12/16     72/16
7                 1               1/16 = 0.0625                    7/16      49/16
Total             n = 16          1                                64/16     296/16
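To compute the mean of the sample means, we sum over the seven distinct values \(x_i\) of the sample mean, weighting each by its relative frequency \(f(x_i)\) from Table 4.15:

\[ E(\bar{X}) = \sum_{i} x_i\, p(x_i) = \sum_{i} x_i\, f(x_i) = \frac{64}{16} = 4 \ (= \mu_X). \]

To compute the variance of the sample means, we get:

\[ \mathrm{Var}(\bar{X}) = \sum_{i} x_i^2\, p(x_i) - \mu_{\bar{X}}^2 = \sum_{i} x_i^2\, f(x_i) - \mu_{\bar{X}}^2 = \frac{296}{16} - (4)^2 = 18.5 - 16 = 2.5. \]

Hence \(\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = 2.5\). (We have thus established that \(\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n} = \frac{5}{2}\).)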
If you construct both the relative frequency bar graph of the population values of X and the relative frequency histogram of the sampling distribution of the sample means, \(\bar{X}\), you would observe the following:

Figure 4.8: Relative Frequency Bar Graph for Population Values of X (horizontal axis: X values 1, 3, 5, 7; vertical axis: relative frequency; every bar has height 0.25)

Figure 4.9: Relative Frequency Histogram for the Sampling Distribution of the Sample Means, \(\bar{X}\) (horizontal axis: sample means 1 to 7; vertical axis: relative frequency; bar heights 0.0625, 0.125, 0.1875, 0.25, 0.1875, 0.125, 0.0625)
Comparing the two figures shows that the bar graph of population values has a uniform shape, whereas the histogram of sample means is bell shaped and symmetric, similar to a normal curve.
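(B) The mean, variance, and standard deviation of the 16 sample means follow. Applying the formula \(\mu_{\bar{X}} = \sum_{i} x_i\, p(x_i)\) yields

\[ \mu_{\bar{X}} = 1(0.0625) + 2(0.125) + 3(0.1875) + 4(0.25) + 5(0.1875) + 6(0.125) + 7(0.0625) = 4. \]

These results satisfy the properties of sampling distributions because

\[ \mu_{\bar{X}} = \mu_X = 4 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} = \sqrt{2.5} = 1.5811. \]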
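The whole example can be verified by brute-force enumeration. The following sketch, written with the Python standard library only, lists every ordered sample of size 2 drawn with replacement and recomputes the quantities in Tables 4.14 and 4.15 using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = [1, 3, 5, 7]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (4^2 = 16 of them)
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Relative frequency of each distinct sample mean (Table 4.15)
counts = Counter(sample_means)
probs = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean and variance of the sampling distribution
mean_of_means = sum(m * p for m, p in probs.items())
var_of_means = sum(m ** 2 * p for m, p in probs.items()) - mean_of_means ** 2

# Population mean and variance for comparison
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

for m, p in probs.items():
    print(f"sample mean {m}: relative frequency {p} = {float(p):.4f}")
print("mean of sample means:", mean_of_means, "| population mean:", pop_mean)
print("variance of sample means:", var_of_means, "| sigma^2 / n:", pop_var / n)
print(f"standard error: {sqrt(var_of_means):.4f}")
```

Using Fraction keeps the arithmetic exact, so the equality between the variance of the sample means and the population variance divided by n can be checked without rounding error.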
4.8.7 THE CENTRAL LIMIT THEOREM
4.8.7.1 INTRODUCTION
The central limit theorem states that, given a distribution with a mean \(\mu\) and variance \(\sigma^2\), the sampling distribution of the mean approaches a normal distribution with mean \(\mu\) and variance \(\sigma^2/n\) as n, the sample size, increases. The amazing and counterintuitive thing about the central limit theorem is that no matter what the shape of the original distribution, the sampling distribution of the mean approaches a normal distribution. Furthermore, for most distributions, a normal distribution is approached very quickly as n increases. Keep in mind that n is the sample size for each mean and not the number of samples. Remember that in a sampling distribution the number of samples is assumed to be infinite. The sample size is the number of scores in each sample; it is the number of scores that goes into the computation of each mean.

A simulation exercise demonstrates the central limit theorem. The computer sampled n scores from a uniform distribution and computed the mean. This procedure was performed 500 times for each of the sample sizes n = 1, n = 4, n = 7, and n = 10, so that each resulting frequency distribution is based on 500 means. For n = 4, 4 scores were sampled from a uniform distribution 500 times and the mean computed each time. The same method was followed with means of 7 scores for n = 7 and 10 scores for n = 10. Two things should be noted about the effect of increasing n:
1. The distribution's shape becomes more and more normal.
2. The spread of the distributions decreases.

4.8.7.2 USING THE CENTRAL LIMIT THEOREM TO FIND A PROBABILITY
You can find the probability that a sample mean will fall in a given interval of the sampling distribution of the sample means. To transform the sample mean to a z-score, you can use the formula

\[ z = \frac{\text{Sample mean} - \text{Population mean}}{\text{Standard error}} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}} \qquad (5) \]
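The simulation exercise described in Section 4.8.7.1 can be sketched in a few lines. This is an illustration under stated assumptions, not the program used for the original exercise: it draws from a Uniform(0, 1) distribution, uses an arbitrary random seed for reproducibility, and relies only on the Python standard library; the function name simulate_sample_means is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2302)  # arbitrary seed, chosen only so that reruns give the same numbers

POP_MEAN = 0.5            # mean of Uniform(0, 1)
POP_SD = sqrt(1 / 12)     # standard deviation of Uniform(0, 1)


def simulate_sample_means(n, num_samples=500):
    """Draw num_samples samples of size n and return the list of their means."""
    return [mean([random.random() for _ in range(n)]) for _ in range(num_samples)]


results = {}
for n in (1, 4, 7, 10):
    means = simulate_sample_means(n)
    results[n] = (mean(means), stdev(means))
    print(f"n = {n:2d}: mean of means = {results[n][0]:.3f}, "
          f"sd of means = {results[n][1]:.3f}, "
          f"sigma/sqrt(n) = {POP_SD / sqrt(n):.3f}")
```

In each row the observed standard deviation of the 500 means should sit close to the theoretical standard error, and it shrinks as n grows. Plotting a histogram of each list of means would show the shape moving from flat (n = 1) toward bell shaped (n = 10).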
EXAMPLE 4.29
Problem: Suppose you have selected a random sample of 36 observations from a population with mean equal to 80 and standard deviation equal to 6. It is known that the population is not extremely skewed. Find the probability that the sample mean will be larger than 82.

Solution: From the Central Limit Theorem, you know that the sampling distribution of the sample mean will be approximately normal since the sampled population distribution is not extremely skewed and the sample size, 36, exceeds 30 (i.e., a large sample size). You also know that the sampling distribution of \(\bar{X}\) will have mean and standard deviation

\[ \mu_{\bar{X}} = \mu_X = 80 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{6}{\sqrt{36}} = \frac{6}{6} = 1, \]

respectively. Now, the probability that the sample mean will exceed 82 may be expressed as \(\Pr(\bar{X} > 82)\). Using formula (5), we get:

\[ \Pr(\bar{X} > 82) = \Pr\!\left(Z > \frac{82 - 80}{6/\sqrt{36}}\right) = \Pr(Z > 2) = 1 - \Pr(Z \le 2) = 1 - 0.9772 = 0.0228. \]

Using Microsoft EXCEL
Type in any empty cell: =1-NORMDIST(82,80,1,TRUE), then press Enter. The answer is 0.0228.

EXAMPLE 4.30
Problem: A manufacturer of automobile batteries claims that the distribution of the lengths of life of its best battery has a mean of 54 months and a standard deviation of 6 months. Suppose a consumer group decides to check the claim by purchasing a sample of 64 of these batteries and subjecting them to tests that determine battery life. Assuming that the manufacturer's claim is true, what is the probability that the consumer group's sample has a mean life of 51 or fewer months?

Solution:
From the Central Limit Theorem, you know that the sampling distribution of the sample mean will be approximately normal since the sample size, 64, exceeds 30 (i.e., a large sample size). You also know that the sampling distribution of \(\bar{X}\) will have mean and standard deviation

\[ \mu_{\bar{X}} = \mu_X = 54 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{6}{\sqrt{64}} = \frac{6}{8} = 0.75, \]

respectively. Now, the probability that the sample mean life is 51 months or fewer may be expressed as \(\Pr(\bar{X} \le 51)\). After applying formula (5), it follows that:

\[ \Pr(\bar{X} \le 51) = \Pr\!\left(Z \le \frac{51 - 54}{6/\sqrt{64}}\right) = \Pr\!\left(Z \le \frac{-3}{6/8}\right) = \Pr\!\left(Z \le -\frac{12}{3}\right) = \Pr(Z \le -4) = 0.0000317. \]

Thus, the probability that the consumer group will observe a sample mean battery life of 51 or fewer months is only 0.00003, or 0.003% (a very unlikely event), if the manufacturer's claim is true.

Using Microsoft EXCEL
Type in any empty cell: =NORMDIST(51,54,0.75,TRUE), then press Enter. The answer is 3.1686E-05 or 0.000031686.

EXAMPLE 4.31
Problem: According to the Pew Research Center for the People and the Press, persons in higher age groups tend to spend much more time reading the newspaper. For persons 65 or more years of age who read a newspaper, the average time has been reported as 33 minutes. Assuming a population standard deviation of 10.0 minutes and a simple random sample of 36 newspaper readers in the 65-or-over age group, what is the probability that members of this group will average at least 35 minutes reading their next newspaper? (Source: Anne R. Carey and Web Bryant, "Speed readers", USA Today snapshot calendar, May 20, 1999).

Solution: We have:
The random variable X, defined as X = "Time (in minutes) spent reading a newspaper"
The population mean: \(\mu_X = 33\) minutes
The population standard deviation: \(\sigma_X = 10\) minutes
The sample size: n = 36

Again from the Central Limit Theorem, you know that the sampling distribution of the sample means will be approximately normal since the sample size, 36, exceeds 30 (i.e., a large sample size). You also know that the sampling distribution of sample means, \(\bar{X}\), will have mean and standard deviation

\[ \mu_{\bar{X}} = \mu_X = 33 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{10}{\sqrt{36}} = \frac{10}{6} = \frac{5}{3}, \]

respectively. Now, the probability that the sample mean time is at least 35 minutes may be expressed as \(\Pr(\bar{X} \ge 35)\). The successive computations are shown below.

\[ \Pr(\bar{X} \ge 35) = \Pr\!\left(Z \ge \frac{35 - 33}{10/\sqrt{36}}\right) = \Pr\!\left(Z \ge \frac{2 \times 6}{10}\right) = \Pr\!\left(Z \ge \frac{6}{5}\right) = \Pr(Z \ge 1.2). \]

Since \(\Pr(Z < 1.20) = 0.8849\), we obtain the answer as:

\[ \Pr(\bar{X} \ge 35) = 1 - 0.8849 = 0.1151. \]

One may infer that, for a random sample of 36 newspaper readers aged 65 or more, the probability that they will average at least 35 minutes reading their next newspaper is 11.51%.

Using Microsoft EXCEL
Type in any empty cell: =1-NORMDIST(35,33,10/SQRT(36),TRUE), then press Enter. The answer is 0.1150697.
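Formula (5) and the three examples in this subsection can be bundled into one small helper. The sketch below assumes Python 3.8 or later (for statistics.NormalDist); the function names are hypothetical, and the normal approximation is justified here only by the large-sample argument given in each example. For Example 4.31 it prints both the cumulative probability and its complement.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_z(x_bar, mu, sigma, n):
    """Formula (5): z-score of a sample mean, using the standard error sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / sqrt(n))


def prob_sample_mean_at_most(x_bar, mu, sigma, n):
    """Approximate Pr(sample mean <= x_bar) under the Central Limit Theorem."""
    return NormalDist().cdf(sample_mean_z(x_bar, mu, sigma, n))


# Example 4.29: Pr(sample mean > 82) with mu = 80, sigma = 6, n = 36
p_429 = 1 - prob_sample_mean_at_most(82, 80, 6, 36)

# Example 4.30: Pr(sample mean <= 51) with mu = 54, sigma = 6, n = 64
p_430 = prob_sample_mean_at_most(51, 54, 6, 64)

# Example 4.31: mu = 33, sigma = 10, n = 36, threshold 35 minutes
p_431_at_most = prob_sample_mean_at_most(35, 33, 10, 36)
p_431_at_least = 1 - p_431_at_most

print(f"Example 4.29: Pr(mean > 82)  = {p_429:.4f}")
print(f"Example 4.30: Pr(mean <= 51) = {p_430:.3e}")
print(f"Example 4.31: Pr(mean <= 35) = {p_431_at_most:.7f}")
print(f"Example 4.31: Pr(mean >= 35) = {p_431_at_least:.7f}")
```

For very small tail probabilities such as the one in Example 4.30, the last digits can differ slightly between software packages and versions.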
FIU - REL - 2011
Benjamin Franklins life is the essential rags to riches story. He lived a poor life while living in Boston, but it didnt make a difference in his later life unlike Jakob Walter. Walter was nothing but a pawn in Napoleons army at the time he was consc
FIU - REL - 2011
Gabriel Rotman 1677107 Traveling through the Enlightenment Back in the 1700's travel was an opportunity given only to certain men of a definite caliber. There are so many things interlocked with the term travel that every one of these terms means so
FIU - REL - 2011
Gabriel RotmanHistory books have always given students an idea about the horrifying events that took place between the years of 1914 and 1918. World War I was the first defining war that actually involved so much activity with so many different nat
FIU - REL - 2011
Gabriel Rotman April 8, 2007 REL 2011 Intellectual Influence on Hitler It's been 62 years since the end of World War II. The German surrender was just the final infringement in Hitler's failure to rule the Western world. His belief for a society with
University of Texas - M - 324K
1) Let p and q be statements. Are (p q) ( p q) and equivalent statement forms? Prove or disprove according to your answer. 2) Let D = {-1, 0, 1}. Consider the following statement: n D, m D such that n < m2 a) Is the statement you wrote in (
University of Texas - CC - 303
C C 303 (33087) INTRO TO CLASSICAL MYTHOLOGY, Fall 2007 SECOND MIDTERM EXAMINATION Answers Section A 1. Paris/Alexander/Alexandros 2. Hermes 3. Athena 4. Hera 5. Aphrodite 6. Mt. Ida 7. Hermes 8. That Polybus is dead and the people of Corinth want Oe
University of Texas - CC - 303
1NAME (last, first): UTEID: C C 303 (33087) INTRO TO CLASSICAL MYTHOLOGY, Fall 2007 SECOND MIDTERM EXAMINATION All questions are worth one point. The maximum score is 60 points. Section A: short answers. Please give your answers in the spaces provid
University of Texas - CC - 303
ANSWERS 1a 2d 3d 4d 5d 6c 7d 8c 9e 10 b 11 c 12 c 13 a 14 b 15 b 16 d 17 e 18 a 19 c 20 b 21 d 22 e 23 e 24 c 25 c 26 c 27 e 28 d 29 c 30 c 31 b 32 b 33 b 34 d 35 c 36 e 37 a 38 a 39 e 40 a41 Zeus 42 Hundred-Handers (or Hecatonchires) 43 Heaven/Sky
University of Texas - CC - 303
1NAME (last, first): UTEID: C C 303 (33087) INTRO TO CLASSICAL MYTHOLOGY, Fall 2007 FIRST MIDTERM EXAMINATION All questions are worth one point except where otherwise indicated. The maximum score is 60 points. Questions 140: multiple-choice section.
UVA - MAE - 210
Note that P1 is 2 bar (not 1)
Texas San Antonio - CLA - 2323
AgamemnonBy Aeschylus Written 458 B.C.E Translated by E. D. A. MorsheadDramatis PersonaeA WATCHMAN CHORUS OF ARGIVE ELDERS CLYTEMNESTRA, wife of AGAMEMNON A HERALD AGAMEMNON, King of Argos CASSANDRA, daughter of Priam, and slave of AGAMEMNON AEGI
Texas San Antonio - CLA - 2323
Hesiod's Theogony: Names to NoteSources:Hesiod's Theogony in Anthology (pp. 129-160) If you do not have the text: http:/ancienthistory.about.com/library/bl/bl_text_hesiod_theogony.htm)Names:Aphrodite Chaos Cronos Demeter Eros Gaia Hades Hera Hes