STAT 230 Midterm 2 Review Package Spring 2010
Waterloo SOS
Prepared by Grace Gu Spring 2010
Table of Contents
Important formulas (Memorizing these should help!)
Chapter 5 – Discrete Distributions
Chapter 7 – Expectation, Averages and Variability
Chapter 8 – Discrete Multivariate Distributions
Extra Practice
Past Midterm 2
Important formulas (Memorizing these should help!)
1.–10. …
11. If X and Y are independent, then Cov(X, Y) = 0.
12. The correlation coefficient of X and Y is $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.
13.–14. …
15. If we have n identically distributed random variables, and $a_i = 1$ for all $i = 1, \dots, n$, then $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = n\,\mathrm{Var}(X_1) + 2\sum_{i<j}\mathrm{Cov}(X_i, X_j)$.
16. …
Chapter 5 – Discrete Distributions
Definitions
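Random variable: a function that assigns a real number to each point in a sample space S.
Probability function (p.f.) of a discrete random variable X: the function $f(x) = P(X = x)$, defined for all x in the range of X.
Cumulative distribution function (cdf) of a random variable X: the function $F(x) = P(X \le x)$, defined for all real x.
1. The following are the properties of a cdf F(x):
a. F(x) is a non-decreasing function of x.
b. $0 \le F(x) \le 1$ for all x.
c. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
2. Distributions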
A) Discrete Uniform Distribution
If X takes on values a, a+1, a+2, ..., b with all values being equally likely, then X has a discrete uniform distribution on [a, b], with $f(x) = \dfrac{1}{b - a + 1}$ for x = a, a+1, ..., b.
B) Hypergeometric Distribution
We pick n objects at random without replacement from a collection of N items, r of which are "successes", and X is the number of successes among the n objects picked. Then X has a hypergeometric distribution with
$f(x) = \dfrac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}$.
Intuition:
Numerator: We use the counting techniques from Chapter 3. We have r “success items” within the collection of N items. We select x objects out of the r success items, and select the remaining n − x objects out of the N − r “failure items”. Denominator: If we don’t impose any restrictions, we can choose the n objects from any of the N items available, in $\binom{N}{n}$ ways.
C) Binomial Distribution
Suppose we conduct an experiment that results in either success or failure (a Bernoulli trial). Let the probability of success be p and the probability of failure be 1 − p. We then repeat the experiment n independent times. Let X be the number of successes obtained. Then X has a binomial distribution with
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, ..., n.
Intuition: The x successes can happen in any of the n trials, and the x successes and n − x failures are repeats. Thus, by the counting techniques from Chapter 3, we can arrange them in $\binom{n}{x}$ ways. Since the n trials are independent, by the multiplication rule from Chapter 4 we can simply multiply all the probabilities together.
D) Geometric Distribution
Suppose we conduct an experiment that can either result in success (with probability p) or failure (with probability 1 − p). We keep repeating the experiment independently until we obtain a success. Let X be the number of failures obtained before the first success. Then X has a geometric distribution with
$f(x) = (1-p)^x p$, x = 0, 1, 2, ...
Intuition: We use the multiplication rule from Chapter 4 for independent events. We have x failures before obtaining the first success, so we multiply the probabilities of all of these events together.
E) Negative Binomial Distribution (a generalization of the geometric distribution)
Suppose we conduct an experiment which results in success (probability p) or failure (probability 1 − p). We keep repeating the experiment independently until we obtain k successes. Let X be the number of failures obtained before the kth success. Then X has a negative binomial distribution with
$f(x) = \binom{x+k-1}{x} p^k (1-p)^x$, x = 0, 1, 2, ...
Intuition:
When we have x failures before obtaining the kth success, we have a total of x + k trials, the last of which is the kth success. The x failures and the previous k − 1 successes can happen in any order within the first x + k − 1 trials. Thus, by the counting techniques from Chapter 3, we can arrange them in $\binom{x+k-1}{x}$ ways. Since the x + k trials are independent, by the multiplication rule from Chapter 4 we can simply multiply the corresponding probabilities.
F) Poisson Distribution
There are two ways to derive a Poisson distribution: one as an approximation to the binomial, which we will see later, and another from the conditions of a Poisson process. We will see the latter first.
3 conditions of a Poisson process, for events occurring randomly over time (or space):
a. Independence: The numbers of events in non-overlapping intervals are independent.
b. Individuality: For a very small time interval, the probability of 2 or more events occurring in the same interval is close to 0. In other words, no more than 1 event can take place at one exact point in time.
c. Homogeneity/Uniformity: Events occur at a uniform rate of λ per unit time.
An event of some type occurs according to a Poisson process as defined above. Let X be the number of event occurrences in a time period of length t. Then X has a Poisson distribution with μ = λt:
$f(x) = \dfrac{e^{-\mu}\mu^x}{x!}$, x = 0, 1, 2, ...
Note: By the uniformity condition, if events occur at an average of μ₁ over a time period of length t₁, then they occur at an average of $\mu_1 t_2 / t_1$ over a time period of length t₂.
G) Approximations
a. Approximation of the Hypergeometric distribution using the Binomial distribution. Recall, if X has a Hypergeometric distribution, $f(x) = \binom{r}{x}\binom{N-r}{n-x} \big/ \binom{N}{n}$. If N and r are large, we can approximate this using a binomial distribution with p = r/N. Then we have
$f(x) \approx \binom{n}{x}\left(\dfrac{r}{N}\right)^x\left(1 - \dfrac{r}{N}\right)^{n-x}$.
b. Approximation of the Binomial distribution using the Poisson distribution. Suppose that X has a Binomial(n, p) distribution. If we let n → ∞ and p → 0, while keeping np fixed at some μ, then we can use a Poisson distribution with μ = np to approximate X.
Thus, we have $\binom{n}{x}p^x(1-p)^{n-x} \approx \dfrac{e^{-\mu}\mu^x}{x!}$, where μ = np.
Example 1
a) The H1N1 flu is spreading among UW students! Suppose that people get infected according to a Poisson process at a rate of 0.25 people per day. What is the probability that at least 3 people get infected within a week?
b) UW has 2 different concern levels for the H1N1 flu, updated every week. If at least 3 people are infected within that week, the warning level is red; otherwise it is green. If the concern level is red for at least 3 out of the 12 weeks of the term, the final exams will be cancelled. What is the probability that the final exams will be cancelled?
Solution:
(a) Let X denote the number of people who get infected in one week. The appropriate rate for 1 week is μ = 0.25 × 7 = 1.75, so X ~ Poisson(1.75) and
$P(X \ge 3) = 1 - P(X \le 2) = 1 - e^{-1.75}\left(1 + 1.75 + \dfrac{1.75^2}{2!}\right) \approx 0.256$.
(b) Each week is red with probability 0.256 (from part a), independently of other weeks, since the weeks are non-overlapping intervals. Now, let Y denote the number of weeks in which we have at least 3 infections. Then Y has a Binomial(12, 0.256) distribution, and
$P(Y \ge 3) = 1 - \sum_{y=0}^{2}\binom{12}{y}(0.256)^y(0.744)^{12-y} \approx 0.628$.
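The probability functions above, and the two numbers in Example 1, can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names are illustrative, not from the notes, and the parameter names (N, r, n, k, p, mu) follow the definitions given earlier.

```python
from math import comb, exp, factorial

# Probability functions from Chapter 5 (parameter names follow the notes)
def hypergeometric_pmf(x, N, r, n):
    """x successes in n draws without replacement from N items, r of them successes."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """x successes in n independent Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """x failures before the first success."""
    return (1 - p) ** x * p

def negative_binomial_pmf(x, k, p):
    """x failures before the k-th success."""
    return comb(x + k - 1, x) * p**k * (1 - p) ** x

def poisson_pmf(x, mu):
    """x events when the mean number of events is mu."""
    return exp(-mu) * mu**x / factorial(x)

# Example 1(a): 0.25 infections per day over 7 days gives mu = 1.75
mu = 0.25 * 7
p_red = 1 - sum(poisson_pmf(x, mu) for x in range(3))          # P(X >= 3)

# Example 1(b): Y ~ Binomial(12, p_red); exams cancelled if Y >= 3
p_cancel = 1 - sum(binomial_pmf(y, 12, p_red) for y in range(3))

print(f"P(red week) = {p_red:.3f}, P(exams cancelled) = {p_cancel:.3f}")
```

It should print values of about 0.256 and 0.628. Note that part (b) reuses the unrounded answer from part (a), which differs from 0.256 only in the fifth decimal place.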
Example 2
The probability function (pf) of a random variable X is given by f(x) = kx for x = 1, 2, …, 9.
a) Find k.
f(1) + f(2) + ... + f(9) = 1, so k(1 + 2 + ... + 9) = 1, i.e. 45k = 1 and k = 1/45.
b) Find F(x), the cumulative distribution function (cdf) of X, for all values of x.
$F(x) = f(1) + \dots + f(x) = \dfrac{1}{45}(1 + \dots + x) = \dfrac{1}{45}\cdot\dfrac{x(x+1)}{2} = \dfrac{x^2 + x}{90}$, for x = 1, 2, ..., 9.
F(x) = 0 for x < 1, and F(x) = 1 for x ≥ 9. Between integers F is flat: for 1 ≤ x < 9, F(x) = F(⌊x⌋), where ⌊x⌋ is the largest integer ≤ x.
c) Sketch the probability function (histogram) and the cumulative distribution function (graph) of X for 0 ≤ x ≤ 5.
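As a sanity check on Example 2, exact fractions confirm k = 1/45 and the closed form for F(x). This is a small sketch; the helper names `f` and `F` simply mirror the notation in the example.

```python
from fractions import Fraction

# f(x) = k x for x = 1, ..., 9; the probabilities must sum to 1
k = Fraction(1, sum(range(1, 10)))   # 1 / 45

def f(x):
    """Probability function; zero off the support {1, ..., 9}."""
    return k * x if x in range(1, 10) else Fraction(0)

def F(x):
    """cdf F(x) = P(X <= x); works for any real x, not only integers."""
    return sum((f(j) for j in range(1, 10) if j <= x), Fraction(0))

print(k, F(3), F(3.7), F(9))   # F is flat between integers, so F(3.7) == F(3)
```

Defining F by summing f over the support (rather than coding the closed form) is what makes it correct for non-integer x as well.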
Example 3
(a) An experiment has 3 possible outcomes, A, B and C, with respective probabilities k, l and m, where k + l + m = 1. The experiment is repeated until either outcome A or outcome B occurs. Show that A occurs before B with probability k/(k + l).
Solution: A occurs before B exactly when the first n trials all give C and trial n + 1 gives A, for some n = 0, 1, 2, .... By independence of the trials,
$P(A \text{ before } B) = \sum_{n=0}^{\infty} m^n k = \dfrac{k}{1 - m} = \dfrac{k}{k + l}$.
(b) In the game of craps, a player rolls two dice. They win at once if the total is 7 or 11, and lose at once if the total is 2, 3 or 12. Otherwise, they continue rolling the dice until they either win by throwing their initial total again, or lose by rolling 7. What is the probability that they win?
Solution: They can win in two ways: (i) throw 7 or 11 on the first roll; or (ii) throw none of 7, 11, 2, 3 or 12 on the first roll, and then keep rolling until they repeat their initial total before throwing a 7.
Let X be a r.v. denoting the total of a single roll. Recall that when we have two dice, the probability of throwing a total sum of k is $P(X = k) = \dfrac{6 - |k - 7|}{36}$ for k = 2, 3, ..., 12. Thus, P(X = 7) = 1/6 and P(X = 11) = 1/18. The other way is to first throw a 4, 5, 6, 8, 9 or 10: P(X = 4) = 1/12, P(X = 5) = 1/9, P(X = 6) = 5/36, P(X = 8) = 5/36, P(X = 9) = 1/9 and P(X = 10) = 1/12.
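Thus, by the independence of the first roll and the subsequent rolls, the result from part (a) (with P(X = j) in the role of k and P(X = 7) in the role of l), and the addition rule from chapter 3,
$P(\text{win}) = P(X=7) + P(X=11) + \sum_{j \in \{4,5,6,8,9,10\}} P(X=j)\,\dfrac{P(X=j)}{P(X=j) + P(X=7)} = \dfrac{8}{36} + 2\left(\dfrac{1}{36} + \dfrac{2}{45} + \dfrac{25}{396}\right) = \dfrac{244}{495} \approx 0.493$.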
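The craps calculation can be reproduced exactly with fractions. This sketch assumes fair dice, as the example does; `p_total` is an illustrative helper implementing the two-dice formula above.

```python
from fractions import Fraction

def p_total(s):
    """P(total = s) when rolling two fair dice, s = 2, ..., 12."""
    return Fraction(6 - abs(s - 7), 36)

# Win at once with 7 or 11
p_win = p_total(7) + p_total(11)

# Otherwise the initial total j must come up again before a 7; by part (a),
# with P(j) in the role of k and P(7) in the role of l, that has probability P(j) / (P(j) + P(7))
for j in (4, 5, 6, 8, 9, 10):
    p_win += p_total(j) * p_total(j) / (p_total(j) + p_total(7))

print(p_win, float(p_win))
```

Working in `Fraction` keeps the answer exact, so it can be compared directly with 244/495 rather than with a rounded decimal.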
Chapter 7 – Expectation, Averages and Variability
Definitions:
1. The median of a sample is a value such that half the results are below it and half the results are above it, when the results are arranged in numerical order.
2. The mode of the sample is the value which occurs most often. It is possible to have more than 1 mode in a sample.
3. The expected value of a discrete random variable X with probability function f(x) is $E(X) = \sum_{\text{all } x} x\,f(x)$.
4. Suppose that the random variable X has probability function f(x). Then the expected value of some function g(X) of X is given by $E[g(X)] = \sum_{\text{all } x} g(x)\,f(x)$.
Linear property of expectations: $E[a\,g_1(X) + b\,g_2(X)] = a\,E[g_1(X)] + b\,E[g_2(X)]$; in particular, $E(aX + b) = aE(X) + b$.
5. The variance of a r.v. X is given by $\mathrm{Var}(X) = E[(X - \mu)^2]$, where μ = E(X). We also have the following two expressions for variance, which are usually handier:
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ and $\mathrm{Var}(X) = E[X(X-1)] + E(X) - [E(X)]^2$.
6. The standard deviation of a random variable X is $\sigma = \sqrt{\mathrm{Var}(X)}$.
7. Means and variances of special discrete distributions:
a. Binomial distribution: If X is a Binomial(n, p) random variable, then E(X) = np and Var(X) = np(1 − p).
b. Poisson distribution: If Y is a Poisson(μ) random variable, then E(Y) = μ and Var(Y) = μ.
8. The moment generating function (m.g.f.) of a discrete r.v. X with p.f. f(x) is given by $M(t) = E(e^{tX}) = \sum_{\text{all } x} e^{tx} f(x)$.
9. The moments of the distribution of a random variable X can be derived from its m.g.f. as follows: $E(X^r) = M^{(r)}(0) = \dfrac{d^r}{dt^r}M(t)\Big|_{t=0}$. In other words, the rth moment of X can be obtained by differentiating M(t) r times and evaluating it at t = 0.
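To see the definitions in action, the sketch below computes E(X) and all three variance expressions from a probability function, then recovers the first two moments from the m.g.f. The distribution X ~ Binomial(10, 0.3) is an arbitrary illustrative choice (not from the notes), picked so the results can be compared with np = 3 and np(1 − p) = 2.1. The derivatives M′(0) and M″(0) are approximated numerically by central differences, which stands in for the symbolic differentiation you would do by hand.

```python
from math import comb, exp

# Illustrative choice (an assumption, not from the notes): X ~ Binomial(10, 0.3)
n, p = 10, 0.3
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

def expect(g):
    """E[g(X)] = sum over all x of g(x) f(x)."""
    return sum(g(x) * fx for x, fx in pmf.items())

mean = expect(lambda x: x)
var_def = expect(lambda x: (x - mean) ** 2)                   # E[(X - mu)^2]
var_short = expect(lambda x: x**2) - mean**2                  # E(X^2) - [E(X)]^2
var_fact = expect(lambda x: x * (x - 1)) + mean - mean**2     # E[X(X-1)] + E(X) - [E(X)]^2

def mgf(t):
    """M(t) = E(e^{tX})."""
    return expect(lambda x: exp(t * x))

# M'(0) = E(X) and M''(0) = E(X^2), approximated here by central differences
h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2
print(mean, var_def, m1, m2 - m1**2)   # compare with np = 3 and np(1 - p) = 2.1
```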
Example 1
Suppose that X is a discrete random variable with an m.g.f. as follows:
(a) Find E(X) and Var(X).
(b) Find
Solutions
(a)
(b)
Example 2
Assume that each week a stock either increases in value by $2 with probability 0.5 or decreases by $2, these moves being independent of the past. The current price of the stock is $25. I wish to purchase a put option, which gives me the option (if I wish to exercise it) of selling the stock 10 weeks from now at a “strike price” of $30. If the stock price is greater than $30 ten weeks from now, then the option will not be exercised. This gives a return to the option of $\max(30 - S_{10}, 0)$, where S₁₀ is the price of the stock in 10 weeks.
(a) What is the fair price of the option today, assuming no transaction costs and 0% interest? (i.e. what is $E[\max(30 - S_{10}, 0)]$?)
(b) What is the variance in the return to the option?
Solution:
(a) First we define X to be the number of weeks, out of the 10, in which the stock price increases. Since each weekly change is independently up or down with probability 0.5 (Bernoulli), X is a Binomial(10, 0.5) random variable. Now, note that the stock price 10 weeks from now is a function of the random variable X and the initial stock price. The stock price increases by $2 when the change is an increase (success) and decreases by $2 when the change is a decrease (failure), so
$S_{10} = 25 + 2X - 2(10 - X) = 4X + 5$.
The return $\max(30 - S_{10}, 0) = \max(25 - 4X, 0)$ is positive only when X ≤ 6, so
$E[\max(30 - S_{10}, 0)] = \sum_{x=0}^{6}(25 - 4x)\binom{10}{x}(0.5)^{10} = \dfrac{5920}{1024} \approx 5.78$, i.e. about $5.78.
Note: max(·) is NOT a linear function. Thus, $E[\max(30 - S_{10}, 0)] \ne \max(30 - E(S_{10}), 0) = \max(30 - 25, 0) = 5$. Be careful about situations like this in the exam.
(b) We use the formula for variance, Var(R) = E(R²) − [E(R)]², where R = max(30 − S₁₀, 0):
$E(R^2) = \sum_{x=0}^{6}(25 - 4x)^2\binom{10}{x}(0.5)^{10} = \dfrac{61840}{1024}$, so $\mathrm{Var}(R) = \dfrac{61840}{1024} - \left(\dfrac{5920}{1024}\right)^2 \approx 26.97$.
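The option's fair price and variance can be computed exactly by summing over the Binomial(10, 0.5) distribution of X. A minimal sketch; `option_return` is an illustrative name for max(30 − S₁₀, 0) written in terms of X.

```python
from fractions import Fraction
from math import comb

# X = number of up-weeks in 10 weeks, X ~ Binomial(10, 1/2)
# S10 = 25 + 2X - 2(10 - X) = 4X + 5
def option_return(x):
    """Return to the put option, max(30 - S10, 0), when X = x."""
    return max(30 - (4 * x + 5), 0)

prob = {x: Fraction(comb(10, x), 2**10) for x in range(11)}
fair_price = sum(option_return(x) * p for x, p in prob.items())
second_moment = sum(option_return(x) ** 2 * p for x, p in prob.items())
variance = second_moment - fair_price**2

# max() is not linear: plugging in E(S10) = 25 gives 5, not the fair price
naive = max(30 - (4 * 5 + 5), 0)
print(float(fair_price), float(variance), naive)
```

Comparing `fair_price` with `naive` makes the warning about max(·) concrete: the expectation of the return is not the return at the expected price.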
Chapter 8 – Discrete Multivariate Distributions
Definitions
1. Suppose that there are two random variables X and Y. We define f(x, y), the joint probability function of (X, Y), as $f(x, y) = P(X = x \text{ and } Y = y)$. And as usual, $\sum_{\text{all } x}\sum_{\text{all } y} f(x, y) = 1$.
2. We define the marginal distributions of X and Y as $f_1(x) = \sum_{\text{all } y} f(x, y)$ and $f_2(y) = \sum_{\text{all } x} f(x, y)$. In words, this means that to find the marginal distribution of X, sum the joint p.f. over all values of Y; and similarly, to find the marginal distribution of Y, sum the joint p.f. over all values of X.
3. Let X1, X2, …, Xn be some collection of random variables. We say that X1, X2, …, Xn are independent if and only if $f(x_1, \dots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$ for all values x1, …, xn. Note: This is similar to the definition of independent events we saw in Chapter 4.
4. The conditional probability function of X given Y = y is $f_1(x \mid y) = \dfrac{f(x, y)}{f_2(y)}$, provided f₂(y) > 0. Similarly, $f_2(y \mid x) = \dfrac{f(x, y)}{f_1(x)}$.
5. If we have a new variable U, such that U = g(X, Y), then the probability function for U is $f_U(u) = \sum_{\text{all } (x, y):\ g(x, y) = u} f(x, y)$.
6. The multinomial distribution: Similar to the binomial case, we conduct an experiment which has k different outcomes (instead of just two), with probabilities p1, p2, …, pk (p1 + p2 + … + pk = 1). This experiment is repeated independently n times. Let X1 be the number of times outcome 1 occurs, X2 the number of times outcome 2 occurs, …, Xk the number of times outcome k occurs. Then (X1, X2, …, Xk) has a multinomial distribution with joint probability function
$f(x_1, \dots, x_k) = \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}$, where x1 + x2 + … + xk = n.
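Intuition: Think about the multinomial in the same way as you did about the binomial distribution. There are n objects, and we have k repeating types among the n objects. There are $x_i$ objects of the ith repeating type, so by our repetition rule for permutations we can arrange these objects in $\dfrac{n!}{x_1!\cdots x_k!}$ ways. Due to independence, we can simply multiply in the probability $p_i$ of each type occurring $x_i$ times, i.e. $p_i^{x_i}$.
7. The expected value of a function g(X, Y) of discrete r.v.'s X and Y is $E[g(X, Y)] = \sum_{\text{all } (x, y)} g(x, y)\,f(x, y)$. This can be extended beyond two variables X and Y.
8. Property of Multivariate Expectation: $E[a\,g_1(X, Y) + b\,g_2(X, Y)] = a\,E[g_1(X, Y)] + b\,E[g_2(X, Y)]$.
9. The covariance of X and Y, denoted Cov(X, Y), is $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$. Note: A handier formula for covariance is $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$.
10. If X and Y are independent, then Cov(X, Y) = 0.
11. Suppose X and Y are independent random variables. Then, if g1 and g2 are any two functions, $E[g_1(X)\,g_2(Y)] = E[g_1(X)]\,E[g_2(Y)]$.
12. The correlation coefficient of X and Y is $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$. Note: This is a measure of the strength of the linear relationship between X and Y, and it lies in the interval [−1, 1].
13. Properties of Covariances:
a.
b. $\mathrm{Cov}(aX + bY, cU + dV) = ac\,\mathrm{Cov}(X, U) + ad\,\mathrm{Cov}(X, V) + bc\,\mathrm{Cov}(Y, U) + bd\,\mathrm{Cov}(Y, V)$.
Intuition: Think of this as multiplying the two terms (aX + bY) and (cU + dV) together (which is exactly how it is derived using the definition).
14. Variance of a linear combination: $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$.
In fact, more generally, if we have n r.v.'s X1, X2, …, Xn, then $\mathrm{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i) + 2\sum_{i<j} a_i a_j\,\mathrm{Cov}(X_i, X_j)$.
If we have n identically distributed random variables, and $a_i = 1$ for all i = 1, …, n, then $\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = n\,\mathrm{Var}(X_1) + 2\sum_{i<j}\mathrm{Cov}(X_i, X_j)$. Note: This general formula is very useful in problems involving indicator random variables.
If all n random variables are independent, then $\mathrm{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i)$.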
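A quick numerical check of items 7, 9, 12 and 14: on a small joint p.f. (made up purely for illustration, not from the notes), the two covariance formulas agree, ρ lands in [−1, 1], and Var(aX + bY) computed directly matches a²Var(X) + b²Var(Y) + 2ab Cov(X, Y). The values a = 3, b = −2 are likewise arbitrary.

```python
from fractions import Fraction as Fr
from math import sqrt

# A small hypothetical joint p.f. f(x, y), made up purely for illustration
joint = {(0, 0): Fr(1, 10), (0, 1): Fr(2, 10), (1, 0): Fr(3, 10),
         (1, 1): Fr(1, 10), (2, 1): Fr(3, 10)}

def E(g):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov_def = E(lambda x, y: (x - mx) * (y - my))      # definition
cov_short = E(lambda x, y: x * y) - mx * my        # E(XY) - E(X)E(Y)
rho = float(cov_def) / sqrt(var_x * var_y)

# Var(aX + bY) computed directly vs. via a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
a, b = 3, -2
direct = E(lambda x, y: (a * x + b * y - (a * mx + b * my)) ** 2)
formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_def
print(cov_def, round(rho, 3), direct == formula)
```

Exact fractions are used so the identity can be checked with `==` instead of a floating-point tolerance.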
Example 1
1. Assume random variables X and Y have joint probability function as follows (Y takes only the values 0 and 2):

f(x, y)   x = 0   x = 1   x = 2
y = 0     0.20    0.30    0
y = 2     0.05    0.20    0.25

a. Find the marginal probability function of X.
f1(x) = 0.25, 0.5, 0.25 for x = 0, 1, 2.
b. Find Cov(X, Y).
E(X) = 1, E(Y) = 1, E(XY) = (1)(2)(0.2) + (2)(2)(0.25) = 1.4, so Cov(X, Y) = E(XY) − E(X)E(Y) = 1.4 − (1)(1) = 0.4.
c. Are X and Y independent? Why or why not?
They are not independent, since Cov(X, Y) ≠ 0 (independent random variables always have zero covariance).
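The Example 1 answers can be reproduced from the joint table. This sketch stores the table (as read above, with Y taking only the values 0 and 2) in a dictionary keyed by (x, y); it also checks independence directly from the definition, which is a stronger test than looking at the covariance.

```python
from fractions import Fraction as Fr

# Joint p.f. of Example 1 (Y takes only the values 0 and 2)
joint = {
    (0, 0): Fr("0.20"), (1, 0): Fr("0.30"), (2, 0): Fr(0),
    (0, 2): Fr("0.05"), (1, 2): Fr("0.20"), (2, 2): Fr("0.25"),
}

f1, f2 = {}, {}
for (x, y), p in joint.items():
    f1[x] = f1.get(x, 0) + p    # marginal of X: sum over y
    f2[y] = f2.get(y, 0) + p    # marginal of Y: sum over x

EX = sum(x * p for x, p in f1.items())
EY = sum(y * p for y, p in f2.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
cov = EXY - EX * EY

# Independence needs f(x, y) = f1(x) f2(y) for EVERY pair (x, y)
independent = all(p == f1[x] * f2[y] for (x, y), p in joint.items())
print(f1, cov, independent)
```

Keep in mind the converse of item 10 fails in general: a zero covariance would not by itself prove independence, while a nonzero covariance, as here, does rule it out.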
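Example 2
Suppose that a pond contains 100 fish, and 40 of them are salmon. One day, 30 random fish are caught from the pond. Let X be the number of salmon caught. What are E(X) and Var(X)? Use indicator random variables to solve this problem.
Solution: We first define indicator variables X1, X2, …, X30, where $X_i = 1$ if the ith fish caught is a salmon and $X_i = 0$ otherwise, so that X = X1 + X2 + … + X30. Also note that $P(X_i = 1) = \dfrac{40}{100} = 0.4$.
Justification: Forty out of the 100 fish in the pond are salmon.
Now, $E(X_i) = 0.4$, so $E(X) = \sum_{i=1}^{30} E(X_i) = 30(0.4) = 12$.
Also, $\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = 0.4 - 0.16 = 0.24$.
Now, $\mathrm{Var}(X) = \sum_{i=1}^{30}\mathrm{Var}(X_i) + 2\sum_{i<j}\mathrm{Cov}(X_i, X_j)$.
And, for i ≠ j, $\mathrm{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$.
Now, note that $E(X_i X_j) = P(X_i = 1, X_j = 1) = \dfrac{40}{100}\cdot\dfrac{39}{99}$.
Justification: For fish i, we have a total of 100 fish and 40 salmon. If fish i is a salmon, then we have a total of 99 fish and 39 salmon left for fish j. Thus $E(X_i X_j) = \dfrac{40}{100}\cdot\dfrac{39}{99} = \dfrac{26}{165}$. This gives $\mathrm{Cov}(X_i, X_j) = \dfrac{26}{165} - (0.4)^2 = -\dfrac{2}{825}$.
Hence, $\mathrm{Var}(X) = 30(0.24) + 2\binom{30}{2}\left(-\dfrac{2}{825}\right) = 7.2 - \dfrac{116}{55} = \dfrac{56}{11} \approx 5.09$.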
*Note that the hypergeometric distribution (with N = 100, r = 40, n = 30) would work for this question too.
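The indicator-variable answer and the hypergeometric route mentioned in the note can be compared exactly. A minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction as Fr
from math import comb

N, r, n = 100, 40, 30   # fish in pond, salmon in pond, fish caught

# Indicator approach: X = X_1 + ... + X_n
p = Fr(r, N)                               # E(X_i) = P(X_i = 1)
var_i = p * (1 - p)                        # Var(X_i)
e_ij = Fr(r, N) * Fr(r - 1, N - 1)         # E(X_i X_j) for i != j
cov_ij = e_ij - p * p                      # Cov(X_i, X_j)
mean = n * p
var = n * var_i + 2 * comb(n, 2) * cov_ij  # n Var(X_1) + 2 * sum over i<j of Cov

# Cross-check with the hypergeometric p.f. directly
pmf = {x: Fr(comb(r, x) * comb(N - r, n - x), comb(N, n)) for x in range(n + 1)}
mean_h = sum(x * q for x, q in pmf.items())
var_h = sum(x * x * q for x, q in pmf.items()) - mean_h**2
print(mean, var, float(var))
```

The indicator method needs only one pair probability, 40/100 · 39/99, whereas the direct route sums over all 31 values of the hypergeometric p.f.; both give E(X) = 12 and Var(X) = 56/11.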
Extra Practice
Setup: Making lemonade requires lemons, sugar, and water. Good lemonade has a balance between sour (lemons) and sweet (sugar). If there are more lemons than sugar, it will be too sour, and if there is more sugar than lemons, it will be too sweet. Suppose you make 9 glasses of lemonade, where the distribution of the number of lemons, X, and the number of sugar cubes, Y, in each glass is given by the following joint probability function:

f(x, y)   y = 0   y = 1   y = 2
x = 0     0.05    0.05    0.10
x = 1     0.05    0.25    0.05
x = 2     0.05    0.05    0.35

1. Find the probability that:
(a) A single glass of lemonade is
i. actually just water: P(water) = f(0, 0) = 0.05
ii. too sweet: P(sweet) = P(X < Y) = f(0, 1) + f(0, 2) + f(1, 2) = 0.20
iii. too sour: P(sour) = P(X > Y) = f(1, 0) + f(2, 0) + f(2, 1) = 0.15
iv. good: P(good) = f(1, 1) + f(2, 2) = 0.60
(b) 1 glass is water, 2 are too sweet, 2 are too sour, and 4 are good.
Mult(9; 0.05, 0.20, 0.15, 0.60):
$f(1, 2, 2, 4) = \dfrac{9!}{1!\,2!\,2!\,4!}(0.05)^1(0.20)^2(0.15)^2(0.60)^4 = 0.022$
(c) 4 are good and 2 are too sour.
Mult(9; 0.25, 0.15, 0.60) (combine sweet and water into one category):
$f(3, 2, 4) = \dfrac{9!}{3!\,2!\,4!}(0.25)^3(0.15)^2(0.60)^4 \approx 0.057$
(d) Given 1 is water, 4 are good and 2 are sour.
Divide the probability in (b) by the probability of 1 water (which is Bin(9, 0.05)):
$\dfrac{0.022}{\binom{9}{1}(0.05)(0.95)^8} \approx 0.074$
(e) 4 are good.
Bin(9, 0.60) (marginals are Binomial): $\binom{9}{4}(0.60)^4(0.40)^5 = 0.167$
(f) Explain logically why the probability in (e) is Binomial.
We only care about glasses of lemonade being good or not good. So we have two possible outcomes, and a fixed number of independent trials. This is Binomial.
2. Refer to the same setup as Question 1.
(a) Tabulate the marginal probability functions f1(x) of X and f2(y) of Y.

x       0      1      2
f1(x)   0.20   0.35   0.45

y       0      1      2
f2(y)   0.15   0.35   0.50
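The Question 1 probabilities and the marginals from Question 2(a) can be reproduced from the joint table. This sketch assumes the table is read with rows indexed by the number of lemons x and columns by the number of sugar cubes y; `multinomial_pmf` is an illustrative helper, not a library function.

```python
from math import comb, factorial

# Joint p.f. of (lemons X, sugar cubes Y) from the setup; keys are (x, y)
joint = {
    (0, 0): 0.05, (0, 1): 0.05, (0, 2): 0.10,
    (1, 0): 0.05, (1, 1): 0.25, (1, 2): 0.05,
    (2, 0): 0.05, (2, 1): 0.05, (2, 2): 0.35,
}

p_water = joint[(0, 0)]
p_sweet = sum(p for (x, y), p in joint.items() if x < y)
p_sour = sum(p for (x, y), p in joint.items() if x > y)
p_good = sum(p for (x, y), p in joint.items() if x == y and x > 0)

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1^x1 * ... * pk^xk with n = sum(counts)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    result = float(coef)
    for c, p in zip(counts, probs):
        result *= p**c
    return result

q_b = multinomial_pmf((1, 2, 2, 4), (p_water, p_sweet, p_sour, p_good))
q_c = multinomial_pmf((3, 2, 4), (p_water + p_sweet, p_sour, p_good))
q_d = q_b / (comb(9, 1) * p_water * (1 - p_water) ** 8)
q_e = comb(9, 4) * p_good**4 * (1 - p_good) ** 5

f1 = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in range(3)}
f2 = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in range(3)}
print(round(q_b, 3), round(q_c, 3), round(q_d, 3), round(q_e, 3))
```

Part (c) works by merging the water and too-sweet categories into a single "other" category, so the multinomial has three cells instead of four.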
(b) Are the numbers of lemons and sugar cubes independent? Why or why not?
No, since f(0, 0) = 0.05 but f1(0)f2(0) = 0.20 × 0.15 = 0.03 ≠ 0.05. (Any pair (x, y) with f(x, y) ≠ f1(x)f2(y) works.)
(c) Calculate E[XY].
E[XY] = ΣΣ xy f(x, y) = 1·1·0.25 + 1·2·0.05 + 2·1·0.05 + 2·2·0.35 = 1.85 (the 5 terms with x = 0 or y = 0 are 0)
(d) Find the covariance of lemons and sugar cubes. (Cov(X, Y) = E[XY] − E[X]E[Y])
E[X] = Σ x f1(x) = 1·0.35 + 2·0.45 = 1.25
E[Y] = Σ y f2(y) = 1·0.35 + 2·0.50 = 1.35
Cov(X, Y) = 1.85 − 1.25 × 1.35 = 0.1625
3. Refer to the same setup as Question 1.
(a) Find the moment generating function M_Y(t) of Y.
M_Y(t) = E[e^{tY}] = Σ e^{ty} f2(y) = 0.15 + 0.35e^t + 0.5e^{2t}
(b) Explain (in point form) how you would use M_Y(t) to find the variance of Y.
First, find E[Y²] by evaluating M″_Y(0).
Second, subtract the square of the mean of Y (1.35²), since Var(Y) = E[Y²] − (E[Y])².

Past Midterm 2
Statistics 230, Winter 2010
Midterm Test 2, March 9, 2010. Duration: 75 minutes.
1. [8 marks] Recall that if X is a random variable, then the cumulative distribution function of X is the function F(x) defined by F(x) = P(X ≤ x). For each of the following functions, either explain why the function cannot be the cumulative distribution function of a random variable, or find the probability that the variable is between 1/4 and 3/4.
(a) $F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x^2 & \text{if } 0 \le x \le 0.8 \\ 0.64 & \text{if } 0.8 < x < 1.5 \\ 1 & \text{if } x \ge 1.5 \end{cases}$
(b) $F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x + \frac{1}{4}\sin(2\pi x) & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$
(c) $F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 - e^{-x} & \text{otherwise} \end{cases}$
(d) $F(x) = \begin{cases} \frac{1}{4} & \text{if } x < \frac{1}{2} \\ \frac{3}{4} & \text{otherwise} \end{cases}$
2. [3 marks] Suppose that the random variable X has the cumulative distribution function:
$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 0.3 & \text{if } 0 \le x < 2 \\ 0.5 & \text{if } 2 \le x < 3.5 \\ 1 & \text{if } 3.5 \le x \end{cases}$
(a) Sketch the cumulative distribution function of X.
(b) Determine the following probabilities: i. P(X = 2) ii. P(X = 3) iii. P(X = 4) iv. P(X ≤ 3)
3. [5 marks] Consider three random variables, X ∼ Bi(12, 1/3), Y ∼ Bi(120 000, 1/30 000), and Z ∼ Poisson(µ).
(a) What are the expected values of X, Y, and Z?
(b) What are the variances of X, Y, and Z?
(c) Determine exact expressions for P(X = 5), P(Y = 5), and P(Z = 5). (Do not evaluate the expressions.)
(d) By picking an appropriate value of µ, use your knowledge of Z to estimate P(Y = 5) to two decimal places.
(e) Why is Z not useful for approximating P(X = 5)?
4. [3 marks] Suppose that X is a random variable with probability function $f(x) = P(X = x) = \dfrac{(1 - e^{-1})^x}{x}$, for x = 1, 2, 3, .... Find the expected value of X. (Hint: It may help to make the substitution r = 1 − e^{−1}.)
5. [4 marks] The score of a hockey game can be estimated by assuming that each team scores according to a Poisson process, and determining appropriate intensities from available data. Suppose that Canada plays a game against Russia. Assume that Canada scores at a rate of 3.5 goals per hour of play, and that Russia scores independently at a rate of 2.5 goals per hour of play.
(a) What is the probability that there are a total of 3 goals during the first 20 minutes of the game?
(b) What is the probability that Canada scores 4 goals and Russia scores 2 goals during the first 60 minutes of the game?
(c) If a total of 6 goals are scored during the first 50 minutes of the game, what is the probability that Canada scores exactly 4 of them?
6. [6 marks] Recall that if X is a random variable, then its moment generating function M(t) is defined by M(t) = E(e^{tX}).
(a) Supposing that you know M(t), how could you determine the expected value of X^{12}?
(b) For a particular random variable X, it has been determined that $M_X(t) = E(e^{tX}) = \dfrac{1}{1 - 6t}$. Compute the following: i. E(X) ii. E(X²) iii. Var(X)
(c) If someone tells you that they have determined that the moment generating function of a different random variable, Y, is M_Y(t) = E(e^{tY}) = cos(t), why should you not believe them?
7. [5 marks] Two random variables, X and Y, satisfy E(X) = 3, E(Y) = 5, E(X²) = 21, and E(Y²) = 28.
(a) What additional information do you need if you want to compute the correlation coefficient of X and Y?
(b) What are the values of Var(X) and Var(Y)?
(c) Suppose, in addition, that E(XY) = 9. Compute Cov(X, Y).
(d) In fact, it is not possible for E(XY) to be less than 9. What is the largest possible value of E(XY)? (Hint: Try computing ρ, the correlation coefficient.)
8. [6 marks] A jar contains a large number of jelly beans. Suppose half of the jelly beans are red, one quarter are green, and the rest are blue. You plan to randomly select 20 beans from the jar. Let R, G, and B be the numbers of red, green, and blue beans you select.
(a) What is the probability that R = 9 and B = 4?
(b) You plan to buy the red beans, at a cost of 5 cents each, and the blue beans, at a cost of 8 cents each (you will not buy the green beans). So your total cost will be C = 5R + 8B cents.
i. Compute E(C).
ii. What are Var(R), Var(B), and Cov(R, B)?
iii. Give an expression for Var(C) involving Var(R), Var(B), and Cov(R, B), and use it to compute Var(C).
iv. Would the variance increase, decrease, or stay the same if there were only 40 beans in the jar?
Spring '10, WILKIE