STAT 101-106 Introduction to Statistics

Random Variables

Cartoon Guide, Chapter 4: Highly Recommended!

A Random Variable is:
The numerical outcome of a random experiment.
A function defined on the Sample Space S (the set of all possible outcomes). This function assigns a number to every possible outcome.
Denoted by X (or Y, or Z).

Example: Flip a coin twice and define X to be the number of heads. For a fair coin, each of the four outcomes has probability 1/4.

Outcome   X   P(outcome)
TT        0   1/4
TH        1   1/4
HT        1   1/4
HH        2   1/4

What are the probabilities for certain outcomes of X?

x   P(X=x)
0   1/4
1   1/2
2   1/4

This is called the PROBABILITY DISTRIBUTION of X.

Note on Notation: Uppercase X indicates a random variable; lowercase x indicates a particular value of the random variable.

Discrete vs. Continuous Random Variables (R.V.)

Discrete R.V.: takes on finitely many values (or countably infinitely many, e.g. X = 0, 1, 2, 3, . . .).
The distribution of a discrete random variable is the list of all values taken by the random variable, together with the probability of taking each value.

Example: see the last example.

Example: Roll a die and let X = number of spots on the face that comes up.
P(X=1) = 1/6, P(X=2) = 1/6, etc. We say that P(X=x) = 1/6 for all x = 1, 2, ..., 6.

[Figure: Roll of Die (Discrete). Bar chart of P(X=x) against X.]

Example: flood insurance. You take out a 2-year term flood insurance policy for $50,000. You pay $1,400 per year for this policy. The insurance company estimates that you have a .025 chance of having a flood in each of the next two years. If they pay in the first year, the policy is ended. Let X be the amount of money the insurance company makes on your policy. A flood in year 1 leaves the company with $1,400 - $50,000, a flood in year 2 with $2,800 - $50,000, and no flood with $2,800.

X           P(X=x)
-$48,600    .025
-$47,200    .025
 $2,800     .95

Continuous Random Variables

A continuous random variable takes on infinitely many values (uncountably infinite).
Probability is defined by a density function \( f(x) \). This is a nonnegative function with total area under the curve equal to 1!
Aside: In Calculus terms this means that
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1 \]
For a density function, area represents probability.
That is, P(a < X < b) = area under the density curve between a and b.
Aside: In Calculus terms this means that
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \ge 0 \]

Example: You play a game with a spinner that takes on values between 0 and 1. Let X be the value of the spinner (this is a uniform [0,1] random variable).
\[ P(0.25 \le X \le 0.5) = \int_{0.25}^{0.5} 1\,dx = 0.25 \]

[Figure: spinner marked 0, 0.25, 0.5 and 0.75.]

Example: Standard Normal Distribution. Suppose that X has a standard normal distribution. We write this as \( X \sim N(0,1) \), where \( \sim \) means `distributed as' in statistical parlance.
\[ P(-1 \le X \le 1) = \int_{-1}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx \approx 0.68 \]

[Figure: standard normal density, horizontal axis from -2 to 2.]

Now, let's play a game:

Example: Another spinner, this time discrete. X = payoff in game.
Distribution of X:

x   P(X=x)
0   0.5
2   0.3
9   0.2

What is a fair amount to pay to play this game?

[Figure: spinner with sectors $9, $0 and $2.]

Means and Variances of Random Variables

The fair amount to pay to play is the same as the expected winnings from the game. This is the mean of X:
\[ E(X) = \mu_X = (0)(0.5) + (2)(0.3) + (9)(0.2) = 0 + 0.6 + 1.8 = 2.4 \text{ dollars} \]

General Definition of the MEAN for Discrete Random Variables:
\[ \text{Mean}(X) = E(X) = \mu(X) = \mu_X = \sum_i x_i p_i \]

Notation: The mean of X is sometimes written as E(X), the Expected Value of X. Literally, what value would we expect X to take? This is also written as \( \mu_X \) or simply \( \mu \), the true, unknown average value of a distribution (the Gods speak Greek!).

Example: flood insurance. What is the average amount the insurance company makes writing policies like the one outlined above?

X           P(X=x)
-$48,600    .025
-$47,200    .025
 $2,800     .95

\[ \mu_X = \sum_i x_i p_i = (-48{,}600)(0.025) + (-47{,}200)(0.025) + (2{,}800)(0.95) = 265 \text{ dollars} \]

Now: $265 is the average amount of money the insurance company makes on a policy. What does this mean in practice?
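Before turning to that question, the two expected values computed so far can be reproduced with a few lines of Python. This is only a sketch: the helper name `expected_value` is made up for illustration, and the insurer's losses are entered as negative amounts, which is what the $265 total requires.

```python
def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(x * p for x, p in dist)

# Spinner game: payoff in dollars with its probability.
spinner = [(0, 0.5), (2, 0.3), (9, 0.2)]

# Flood policy: insurer's profit in dollars (losses are negative).
flood = [(-48_600, 0.025), (-47_200, 0.025), (2_800, 0.95)]

spinner_mean = expected_value(spinner)
flood_mean = expected_value(flood)

print(f"spinner mean: {spinner_mean:.2f} dollars")
print(f"flood policy mean: {flood_mean:.2f} dollars")
```

The same helper works for any finite list of values whose probabilities sum to 1.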
LAW OF LARGE NUMBERS

As we do many independent repetitions of an experiment, drawing more and more occurrences of a random variable from the same distribution, the mean of our sample will approach the mean of the distribution more and more closely.

Example: flood insurance. Let's simulate writing identical flood policies for 10,000 homeowners with identical risk of flooding. We'll keep track of our average profit as we go. Here is the first part of the simulated data:

Flood?   If flood,      Profit from    Cumulative   Average
         which year?    This Policy    Profit       Profit
No       NA              $2,800          $2,800      $2,800
No       NA              $2,800          $5,600      $2,800
No       NA              $2,800          $8,400      $2,800
No       NA              $2,800         $11,200      $2,800
No       NA              $2,800         $14,000      $2,800
No       NA              $2,800         $16,800      $2,800
No       NA              $2,800         $19,600      $2,800
No       NA              $2,800         $22,400      $2,800
No       NA              $2,800         $25,200      $2,800
No       NA              $2,800         $28,000      $2,800
No       NA              $2,800         $30,800      $2,800
No       NA              $2,800         $33,600      $2,800
No       NA              $2,800         $36,400      $2,800
No       NA              $2,800         $39,200      $2,800
Yes      1             -$48,600         -$9,400       -$627
Yes      2             -$47,200        -$56,600     -$3,538
No       NA              $2,800        -$53,800     -$3,165

Let's make a plot of our average profit on policies over many successive policies:

[Figure: Average Profit ($) against Number of Policies Written, from 1 to 10,000. The running average settles at about $275.]

NOW: Recall that the `Relative Frequency' Definition of Probability defines the probability of an event as the long-run relative frequency of the event:
\[ F_n = \frac{\#\text{ times event occurs in } n \text{ independent trials}}{n} \]
As the sample size n gets large, the sample proportion will approach the true probability of the event:
\[ F_n \to p \]

Example: The Spinner Game. Imagine playing the game n times, with n large.
Total "sample" winnings:
\[ X_1 + X_2 + \cdots + X_n = 0\,(\#\text{ of 0's}) + 2\,(\#\text{ of 2's}) + 9\,(\#\text{ of 9's}) \]
Mean of X:
\[ \bar{X}_n = 0\,\frac{\#\text{ of 0's}}{n} + 2\,\frac{\#\text{ of 2's}}{n} + 9\,\frac{\#\text{ of 9's}}{n} \]
Each relative frequency approaches the corresponding probability (0.5, 0.3 and 0.2). That is:
\[ \bar{X}_n \to 0(0.5) + 2(0.3) + 9(0.2) = \mu_X \]

[Figure: spinner with sectors $9, $0 and $2.]

Variance and Standard Deviation of Random Variables

For Discrete Random Variables:
\[ \text{Variance}(X) = \sigma^2(X) = \sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i \]
\[ SD(X) = \sigma(X) = \sigma_X = \sqrt{\sigma_X^2} \]

Example: Spinner Game

x    P(X=x)   x - mu_X   (x - mu_X)^2   (x - mu_X)^2 * p
0    0.5      -2.4        5.76          2.88
2    0.3      -0.4        0.16          0.048
9    0.2       6.6       43.56          8.71

\[ \sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i = 2.88 + 0.048 + 8.71 = 11.64 \]
\[ \sigma_X = \sqrt{11.64} \approx 3.41 \text{ dollars} \]

Example: flood insurance. Let's calculate the standard deviation of the profit on policies. Remember that the mean profit is $265.

X           P(X=x)
-$48,600    .025
-$47,200    .025
 $2,800     .95

\[ \sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i = 122{,}122{,}775 \ (\text{dollars}^2), \qquad \sigma_X = \sqrt{122{,}122{,}775} \approx 11{,}050.92 \text{ dollars} \]

Means and Variances for Continuous Random Variables (for the interested; nothing to remember here)
\[ \text{Mean}(X) = \mu(X) = \int x f(x)\,dx \]
\[ \text{Variance}(X) = \sigma^2(X) = \int (x - \mu_X)^2 f(x)\,dx \]

Rules for Means and Variances

For any constant number c (like 5 or 3.14 or 2 or n):

1) \( \mu(X + c) = \mu(X) + c \)
The mean of {a random variable plus a constant} equals the mean of {the random variable} plus the constant.
Easy to prove for discrete random variables, a bit more challenging for continuous random variables:
\[ \mu_{X+c} = \sum_i (x_i + c)\,p_i = \sum_i x_i p_i + \sum_i c\,p_i = \mu_X + c \sum_i p_i = \mu_X + c \]
Ex: Add two dollars to each possible payout on the spinner (the payouts become $11, $2 and $4). New mean payout = $2.40 + $2.00 = $4.40.

2) \( \mu(cX) = c\,\mu(X) \)
The mean of {a random variable times a constant} equals the mean of {the random variable} times the constant. Proof like above.
Ex: Change dollars to cents. The constant c = 100, and the payouts become 900, 0 and 200 cents. The mean payout is 240 cents.

3) \( \sigma^2(X + c) = \sigma^2(X), \quad \sigma(X + c) = \sigma(X) \)
The variance (and standard deviation) is not affected by adding a constant. Proof like above.
Ex: Adding two dollars to the payout (payouts $11, $2 and $4) does not change the standard deviation: still $3.41.

4)
2 (cX ) c2 2 ( X ), (cX )  c  ( X ) The standard deviation of {a constant times a random variable} equals the absolute value of the constant times the standard deviation of the random variable. Ex : Change dollars to cents. The constant c = 100, new standard deviation is 341. 900 0 200 STAT 101106 Introduction to Statistics 136 Rules for addition of two random variables: ( X1 ( X1 IF X2) X2) ( X1) ( X1) (X2) (X2) X 1 and X 2 are INDEPENDENT then 2 X1 X 2 2 X1 X 2 2 X1 2 X1 2 X2 2 X2 (Variances of independent random variables ALWAYS add) Example : flood insurance. Get mean and standard deviation of the profit on two independent policies (i.e. these people are not neighbors) : 2 X Y $265 * 2 $530 122,122,775 * 2 244245550 $15628.36 X X Y NOW : STAT 101106 Introduction to Statistics 137 Suppose we take a series of INDEPENDENT observations from the SAME distribution. Put the rules all together for a sum and average of n INDEPENDENT and IDENTICALLY DISTRIBUTED random variables : Mean and Variance for the SUM of INDEPENDENT, Identical Random Variables In General Let X1, X2 ,..., Xn be an independent random sample of size n from a distribution having mean and standard deviation X X An Example Flip a coin 10 times. X1=0 for tails, 2 for heads. Mean of X1=1, SD=1. Same for all 10 coins. . Add up the results for 10 coin tosses. Average total sum = 10 X 1 X n be Let S n the sum of these random variables Mean of Sn = Sn Sn Variance of 2 Sn n X Sn = , i.e. Std. Dev. X Variance of sum = 10, SD = n 2 X 10 of Sn = n STAT 101106 Introduction to Statistics 138 Mean and Variance for the AVERAGE of INDEPENDENT, Identical Random Variables In General Let An Example Calculate the sample average for 10 coin tosses. Mean of Sample average = 1 2 X Xn X1 X n n Xn X n= X X . This is the SAMPLE MEAN Mean of Variance of sample mean = 1/10, Standard deviation of sample mean = Std. Dev. Variance of n n , i.e. X n= 1 10 2 Why is Variance of X n= n ? Now for something truly shocking . . . . 
Example: the spinner game once again. X = payoff in game.
Distribution of X (a single play, \( S_1 \)):

x   P(X=x)
0   .5
2   .3
9   .2

[Figure: spinner with sectors $9, $0 and $2.]

The mean and variance of X:
\[ \mu_X = 2.4 \text{ dollars}, \qquad \sigma_X^2 = 11.64 \]

NOW: play the spinner game twice. Total winnings in 2 plays: \( S_2 = X_1 + X_2 \).

Possible Outcomes            Probabilities
 0 = 0 + 0                   (.5)(.5) = .25
 2 = 0 + 2 = 2 + 0           (.5)(.3) + (.3)(.5) = .3
 9 = 0 + 9 = 9 + 0           (.5)(.2) + (.2)(.5) = .2
 4 = 2 + 2                   (.3)(.3) = .09
11 = 2 + 9 = 9 + 2           (.3)(.2) + (.2)(.3) = .12
18 = 9 + 9                   (.2)(.2) = .04

Probability Distribution for \( S_2 \):

s    P(S_2 = s)
 0   .25
 2   .3
 4   .09
 9   .2
11   .12
18   .04

If \( S_2 = X_1 + X_2 \), then \( \mu_{S_2} = 2(2.4) = 4.8 \) and \( \sigma^2_{S_2} = 2(11.64) \), i.e. \( \sigma_{S_2} = \sqrt{2 \times 11.64} \approx 4.82 \).

Probability distributions for increasing sample size

[Figure: probability distributions of the total winnings, one panel each for n = 1, 2, 4, 8, 16, 32 and 64.]

As the sample size n increases, the shape is more and more normal!!
For n = 64, \( S_{64} \) has mean (64)(2.4) = 153.6 and SD \( \sqrt{64}\,(3.41) = 27.28 \) AND A NORMAL SHAPE!!!

[Figure: distribution of \( S_{64} \), horizontal axis marked at 127, 154 and 181.]

Central Limit Theorem

If \( X_1, X_2, \ldots, X_n \) are a sample of n independent and identically distributed trials from any distribution with mean \( \mu \) and standard deviation \( \sigma \), then for n large enough, approximately
\[ S_n \sim N(n\mu, \sqrt{n}\,\sigma), \quad \text{or} \quad \bar{X}_n \sim N(\mu, \sigma/\sqrt{n}) \]
(these are equivalent statements; the second argument of N is the standard deviation).

Example: play the spinner game 25 times and calculate total winnings \( S_{25} = X_1 + X_2 + \cdots + X_{25} \). What is the probability of winning at least 80 dollars?
\[ \mu_{S_{25}} = (2.4)(25) = 60 \]
\[ \sigma_{S_{25}} = \sqrt{25}\,\sigma_X = \sqrt{25} \times 3.41 = 17.05 \]
Use the NORMAL APPROXIMATION (Central Limit Theorem):
\[ \Pr(S_{25} \ge 80) \approx \Pr(N(60, 17.05) \ge 80) \approx 0.12 \]
(use a computer to figure this out). The computer reports the true value as 0.127: a pretty good approximation!
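One way to "use a computer" for this example is sketched below. It computes the normal tail area from the error function and, for comparison, builds the exact distribution of \( S_{25} \) by repeatedly combining the single-play distribution, the same bookkeeping used for two plays. The helper names `normal_cdf` and `convolve` are illustrative, not taken from the slides.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(N(mu, sigma) <= x), with sigma the standard deviation."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def convolve(pmf_a, pmf_b):
    """Distribution of A + B for independent discrete A and B."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

pmf_one = {0: 0.5, 2: 0.3, 9: 0.2}   # one play of the spinner
n = 25

# Exact distribution of the total winnings after n independent plays.
pmf_sum = {0: 1.0}
for _ in range(n):
    pmf_sum = convolve(pmf_sum, pmf_one)

mu = 2.4 * n              # mean of S_n
sigma = sqrt(n) * 3.41    # SD of S_n

approx = 1 - normal_cdf(80, mu, sigma)
exact = sum(p for s, p in pmf_sum.items() if s >= 80)

print(f"normal approximation: {approx:.3f}")
print(f"exact probability:    {exact:.3f}")
```

Calling `convolve(pmf_one, pmf_one)` reproduces the two-play table above, which is a quick way to confirm the helper behaves as intended.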
Example: flood insurance. Suppose that each day our insurance company writes exactly 200 identical flood policies. For all policies written on a particular day, we calculate the AVERAGE profit on policies written that day. We repeat this process for the next 3 years (i.e. 3*365 = 1095 days), each day taking the average profit on policies written on that day. What is the distribution of all of these averages?

The rules for means and variances for averages of identically distributed random variables say that
\[ \mu_{\bar{X}_{200}} = \mu_X = 265 \text{ dollars} \]
(Mean of sample means = mean of a single observation)
\[ \sigma_{\bar{X}_{200}} = \frac{\sigma_X}{\sqrt{200}} = \frac{11{,}051}{\sqrt{200}} \approx 781 \text{ dollars} \]
AND the Central Limit Theorem says that the distribution of those 1095 daily averages should be approximately Normal!

[Figure: Histogram of Mean Profit, 200 policies per day, 1095 days. Frequency against Mean Profit. Reported summary: Mean 212.1, StDev 796.9, N 1094.]
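A simulation along these lines can be sketched with Python's standard library. The seed and variable names are arbitrary choices, so the output will not match the histogram summary above digit for digit; the point is only that the mean and standard deviation of the daily averages land near $265 and $781.

```python
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed, only to make the run repeatable

# Insurer's profit per policy in dollars (losses negative) and probabilities.
profits = [-48_600, -47_200, 2_800]
probs = [0.025, 0.025, 0.95]

days = 1095
policies_per_day = 200

# For each day, draw 200 independent policies and record the average profit.
daily_means = [
    mean(random.choices(profits, weights=probs, k=policies_per_day))
    for _ in range(days)
]

sim_mean = mean(daily_means)
sim_sd = stdev(daily_means)

print(f"mean of daily averages: {sim_mean:.1f}")
print(f"SD of daily averages:   {sim_sd:.1f}")
```

Plotting `daily_means` as a histogram would give a picture comparable to the one described above.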
This note was uploaded on 04/07/2008 for the course STAT 102 taught by Professors Jonathan Reuning-Scherer and Donald Green during the Fall '05 term at Yale.