Simulation Modeling and Analysis (ORIE 4580/5580/5581)
Week 2: Review of Probability and Statistics (08/31/10 - 09/02/10)

Announcements and Agenda
• Recitations begin this Friday (9/3).
• There will be no recitation on Monday (9/6) because of Labor Day. You can attend the Friday sections, or just review the recitation questions and solutions that will be posted on Blackboard.
• HW1 is now available on Blackboard. Due date: 11am on 9/9. Please use the Discussion Board for homework questions; I have already posted additional hints on one of the threads.
• A ticket has been filed about the AC in B14. Will keep you posted.
• Simulation job opportunity at Intel.
• Cornell INFORMS Chapter announcement.

Sample Space
• Suppose we perform a random experiment whose outcome cannot be predicted in advance.
• Sample space = the set of all possible outcomes of the experiment.
• Example 1: The experiment is tossing a die.
  Sample space = {1, 2, 3, 4, 5, 6}
• Example 2: The experiment is flipping two coins simultaneously.
  Sample space = {HH, HT, TH, TT}

Random Variable (RV)
• A random variable (RV) is a function that maps the sample space to the real numbers. The value of the random variable is determined by the outcome of the random experiment.
• We typically denote random variables by capital letters, for example X, Y, or Z.
• Example 1: The experiment is tossing a die.
  X = 1 / (outcome of the die)
  Y = (outcome of the die) + 2
• Example 2: The experiment is flipping two coins simultaneously.
  X = number of tails
  Y = 1 if the two coins show the same face, and 0 otherwise.

Discrete Random Variables and Probability Mass Function
• Definition: A discrete random variable takes only discrete values.
• The examples on the previous slide are discrete random variables.
• Suppose we have a discrete random variable X taking integer values. Then p(x) = Pr{X = x} is called the probability mass function of X.
• NOTE the distinction between the random variable X (in upper case) and the variable x (in lower case).
• Properties: p(x) ≥ 0 for all x, and Σ_{x=-∞}^{∞} p(x) = 1.
• Example: X = number of tails when we flip 2 fair coins simultaneously.
  p(0) = 1/4, p(1) = 1/2, p(2) = 1/4, and p(x) = 0 for all x ∉ {0, 1, 2}

Discrete RV: Cumulative Distribution Function
• An alternative way to describe the probability law of a random variable X is to consider its cumulative distribution function F(⋅):
  F(x) = Pr{X ≤ x} = Σ_{i=-∞}^{x} Pr{X = i} = Σ_{i=-∞}^{x} p(i)
• Properties of F(⋅):
  F(x) ≥ 0 for all x
  lim_{x→-∞} F(x) = 0 and lim_{x→∞} F(x) = 1
  Pr{a ≤ X ≤ b} = Σ_{i=a}^{b} p(i) = Σ_{i=-∞}^{b} p(i) - Σ_{i=-∞}^{a-1} p(i) = F(b) - F(a-1)
• Knowing F(⋅) is the same as knowing p(⋅), because
  p(x) = Σ_{i=-∞}^{x} p(i) - Σ_{i=-∞}^{x-1} p(i) = F(x) - F(x-1)
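As a quick numerical check (not part of the original slides), here is a minimal Python sketch that estimates the pmf of the coin-flip example by simulation and compares it with the exact values; the trial count is an arbitrary choice.

import random

# Estimate the pmf of X = number of tails when flipping two fair coins,
# and compare against the exact values p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
n_trials = 100_000          # arbitrary; larger gives a better estimate
counts = [0, 0, 0]
for _ in range(n_trials):
    tails = sum(random.random() < 0.5 for _ in range(2))   # 0, 1, or 2 tails
    counts[tails] += 1

exact = [0.25, 0.50, 0.25]
for x in range(3):
    print(f"p({x}): estimate = {counts[x] / n_trials:.4f}, exact = {exact[x]:.4f}")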
Continuous Random Variables
• Consider the following random variable Y:
  Y = distance between your upper and lower eyelids at the end of the lecture.
• Y can take any value in a continuum (the positive real line), so it is called a continuous random variable.
• How would we describe the behavior of Y? Suppose we measure this distance after each lecture:
  Ŷi = the measurement at the end of the i-th lecture,
  and construct a histogram of { Ŷi : i = 1, 2, ..., n }.

Histogram and Probability Density Function
[Figure: histogram of the eyelid-distance measurements (frequency in # of lectures vs. distance in cm), next to a smooth limiting curve f(⋅).]
• As the sample size increases to infinity and the width of the histogram intervals converges to zero at an appropriate rate, the histogram converges to a function f(⋅) that might look like the curve shown.
• The function f(⋅) is called the probability density function of the random variable Y.

Important Facts About the Density Function
• It is NOT true that f(x) = Pr{X = x}. In fact, for any value x, Pr{X = x} = 0.
• Thus, for a continuous random variable, the probability that it takes on any specific value is ZERO!
• However, for any a ≤ b,
  Pr{a ≤ X ≤ b} = ∫_a^b f(x) dx
• Interpretation: the area under the density function over the interval [a, b].
  [Figure: density curve f(⋅) with the area between a and b shaded and labeled Pr{a ≤ X ≤ b}.]

Continuous RV: Cumulative Distribution Function
• The cumulative distribution function F(⋅) is defined by
  F(x) = Pr{X ≤ x} = Pr{-∞ < X ≤ x} = ∫_{-∞}^{x} f(u) du
• If f(⋅) is continuous in an interval containing x, then the derivative of the cumulative distribution function at x is equal to the density, that is, F′(x) = f(x).
• Properties of F(⋅) and f(⋅):
  f(x) ≥ 0 and F(x) ≥ 0 for all x
  ∫_{-∞}^{∞} f(x) dx = 1
  lim_{x→-∞} F(x) = 0 and lim_{x→∞} F(x) = 1
  Pr{a ≤ X ≤ b} = ∫_a^b f(x) dx = ∫_{-∞}^{b} f(x) dx - ∫_{-∞}^{a} f(x) dx = F(b) - F(a)

Example
• Let the probability density function of a random variable X be
  f(x) = x³/4 for 0 ≤ x ≤ 2, and f(x) = 0 otherwise.
• What is the cumulative distribution function? For 0 ≤ x ≤ 2,
  F(x) = ∫_{-∞}^{x} f(u) du = ∫_0^x (u³/4) du = u⁴/16 |_0^x = x⁴/16,
  with F(x) = 0 for x < 0 and F(x) = 1 for x > 2.
• What is Pr{1.0 ≤ X ≤ 1.5}?
  Pr{1.0 ≤ X ≤ 1.5} = F(1.5) - F(1.0) = 1.5⁴/16 - 1.0⁴/16 = 4.0625/16 ≈ 0.254

Expected Value (Mean, Average)
• If X is a discrete RV with probability mass function p(⋅), then
  E[X] = Σ_{x=-∞}^{∞} x p(x)
  and, for a real-valued function g(⋅), E[g(X)] = Σ_{x=-∞}^{∞} g(x) p(x).
• If X is a continuous RV with density function f(⋅), then
  E[X] = ∫_{-∞}^{∞} x f(x) dx
  and, for a real-valued function g(⋅), E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx.
• For any two random variables X and Y (either discrete or continuous) and any constants a and b:
  E[aX + bY] = a E[X] + b E[Y]
• Example: Suppose the density of X is given by f(x) = x³/4 for 0 ≤ x ≤ 2 (and 0 otherwise). Then
  E[X] = ∫_0^2 x f(x) dx = ∫_0^2 (x⁴/4) dx = x⁵/20 |_0^2 = 8/5
  E[X²] = ∫_0^2 x² f(x) dx = ∫_0^2 (x⁵/4) dx = x⁶/24 |_0^2 = 8/3

Variance
• Definition: Var(X) = E[(X - E[X])²]
• Note: E[X] is a deterministic quantity!
• Alternative formula:
  Var(X) = E[(X - E[X])²] = E[X² - 2E[X]X + (E[X])²]
         = E[X²] - 2(E[X])² + (E[X])² = E[X²] - (E[X])²
• Variance under a linear mapping:
  Var(aX + k) = E[(aX + k - E[aX + k])²] = E[(aX + k - aE[X] - k)²]
              = E[(aX - aE[X])²] = a² E[(X - E[X])²] = a² Var(X)
• Example: For the density f(x) = x³/4 on 0 ≤ x ≤ 2 above,
  Var(X) = E[X²] - (E[X])² = 8/3 - (8/5)² = 8/75
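The example above is easy to verify numerically. Here is a minimal sketch, assuming SciPy is available (scipy.integrate.quad; the original slides do not use it), that checks the total mass, the interval probability, and the mean and variance:

from scipy.integrate import quad

# Density from the example: f(x) = x^3 / 4 on [0, 2], zero elsewhere.
def f(x):
    return x**3 / 4

total, _ = quad(f, 0, 2)                        # total mass: should be 1
prob, _ = quad(f, 1.0, 1.5)                     # Pr{1.0 <= X <= 1.5} = 4.0625/16
mean, _ = quad(lambda x: x * f(x), 0, 2)        # E[X] = 8/5
second, _ = quad(lambda x: x**2 * f(x), 0, 2)   # E[X^2] = 8/3
var = second - mean**2                          # Var(X) = 8/75

print(f"total mass = {total:.4f}")
print(f"Pr{{1.0 <= X <= 1.5}} = {prob:.4f} (exact {4.0625 / 16:.4f})")
print(f"E[X] = {mean:.4f} (exact {8 / 5:.4f})")
print(f"Var(X) = {var:.6f} (exact {8 / 75:.6f})")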
Independence
• Intuition: The random variables X and Y are independent if knowledge of one of them tells us nothing about the value of the other!
• Suppose the random variables X and Y are independent. Then for any functions h1(⋅) and h2(⋅),
  E[h1(X)⋅h2(Y)] = E[h1(X)]⋅E[h2(Y)]
• A related measure of dependence is the covariance between the random variables:
  Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
            = E[XY - X E[Y] - E[X] Y + E[X]E[Y]]
            = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]
            = E[XY] - E[X]E[Y]
• If X and Y are independent, then E[XY] = E[X]⋅E[Y], and thus Cov(X, Y) = 0.
• IMPORTANT: Two random variables can have zero covariance but NOT be independent!
• Example: X ~ Uniform[-1, 1] and Y = X². In this case Y clearly depends on X, so they are NOT independent. However, Cov(X, Y) = E[X³] - E[X]E[X²] = 0, since E[X³] = E[X] = 0 by symmetry.

Variance and Covariance
• By definition, Cov(X, X) = Var(X).
• For any two random variables X and Y,
  Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
• Why? Expand E[(aX + bY - E[aX + bY])²] and collect the three expectation terms.
• IMPORTANT: If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).
• The variance of a sum of independent random variables is equal to the sum of the variances!

Normal Distribution
• A continuous random variable X is normally distributed with mean μ and variance σ² if its density function is given by
  f(x) = (1/(σ√(2π))) e^{-(x-μ)²/(2σ²)},  -∞ < x < ∞
• [Figure: normal density functions for different values of the mean and variance (picture from Wikipedia).]
• The density function is symmetric around the mean.
• We will write X ~ N(μ, σ²) to denote that the random variable X is normally distributed with mean μ and variance σ².
• When μ = 0 and σ² = 1, it is called the standard normal distribution.

Properties of the Normal Distribution
• By symmetry of the density function: Pr{X ≤ μ - a} = Pr{X ≥ μ + a}.
• If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²).
• If X ~ N(μ, σ²), then (X - μ)/σ ~ N(0, 1) is a standard normal.
• If X ~ N(μ1, σ1²), Y ~ N(μ2, σ2²), and X and Y are independent, then X + Y is also normally distributed, with mean μ1 + μ2 and variance σ1² + σ2².
• If X ~ N(μ, σ²), then
  Pr{X ≤ x} = Pr{(X - μ)/σ ≤ (x - μ)/σ} = Pr{N(0, 1) ≤ (x - μ)/σ}
• Thus, knowing the cumulative distribution of a standard normal allows us to determine the cumulative distribution of any other normally distributed random variable.
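To make the standardization property concrete, here is a small Python sketch using the standard library's statistics.NormalDist; the parameters μ = 10, σ = 2, and x = 13 are made-up values purely for illustration:

from statistics import NormalDist

# Standardization: if X ~ N(mu, sigma^2), then Pr{X <= x} = Phi((x - mu)/sigma).
# The parameters below are hypothetical, chosen only for the demonstration.
mu, sigma, x = 10.0, 2.0, 13.0

direct = NormalDist(mu, sigma).cdf(x)         # Pr{X <= x} computed directly
via_std = NormalDist().cdf((x - mu) / sigma)  # Phi evaluated at the z-score

print(f"direct: Pr{{X <= {x}}} = {direct:.6f}")
print(f"via standard normal   = {via_std:.6f}")   # identical by standardization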
Sums and Averages of Independent Random Variables
• Suppose X1, X2, ... are i.i.d. random variables where Xi ~ Uniform[0, 1].
• What does the density function of the sample mean X̄n = (X1 + ⋯ + Xn)/n look like?
• [Figure: simulated histograms of X̄n for n = 1, 2, 3, 4, 8, 10, 12, and 14; as n grows, the histograms concentrate around 1/2 and become increasingly bell-shaped.]
• E[X1] = 1/2 and Var(X1) = 1/12.

Qualitative Observations
• Suppose X1, X2, ... are independent and identically distributed (i.i.d.) random variables with a finite mean and variance.
• As n increases, the density of the sample mean becomes more "tightly concentrated": the variability of the sample mean decreases.
  Var(X̄n) = Var((1/n) Σ_{i=1}^{n} Xi) = (1/n²) Var(Σ_{i=1}^{n} Xi) = Var(X1)/n
• The average of n i.i.d. random variables is 1/n times as variable as any one of the random variables.
• NOTE: The sum of n i.i.d. random variables, however, is n times as variable.

Law of Large Numbers (LLN)
• Suppose X1, X2, ... are independent and identically distributed (i.i.d.) random variables with E[|X1|] < ∞. Then, as n → ∞, "almost surely"
  X̄n = (1/n) Σ_{i=1}^{n} Xi → E[X1]
• IMPORTANT: For any finite n, the sample mean X̄n is still a RANDOM VARIABLE. However, E[X1] is always a DETERMINISTIC quantity.
• Question: How much does the sample mean X̄n differ from E[X1]?
• Let us consider the random variable X̄n - E[X1]:
  E[X̄n - E[X1]] = E[X̄n] - E[X1] = E[X1] - E[X1] = 0
  Var(X̄n - E[X1]) = Var(X̄n) = Var(X1)/n
• Can we say something about the distribution of X̄n - E[X1]?

Central Limit Theorem (CLT)
• Suppose X1, X2, ... are i.i.d. random variables with Var(X1) = σ². Then, as n → ∞,
  X̄n - E[X1] = (1/n) Σ_{i=1}^{n} (Xi - E[X1])  →_D  N(0, σ²/n)
  where the symbol →_D means convergence in distribution. For large n, this means
  X̄n = (1/n) Σ_{i=1}^{n} Xi ≈ N(E[X1], σ²/n)
• Implication 1: The sample mean is approximately normally distributed with mean E[X1] and variance σ²/n.
• Implication 2: The approximation gets better as the sample size n increases.

Application of the CLT
• Suppose we want to determine the expected daily temperature in Ithaca.
• We can take n i.i.d. measurements X1, X2, ..., Xn and evaluate the sample mean X̄n.
• LLN tells us that E[X1] should be close to the sample mean, that is, E[X1] ∈ [X̄n - ε, X̄n + ε].
• Question: How do we compute a 95% confidence interval for E[X1]? That is, how do we choose ε to ensure that E[X1] ∈ [X̄n - ε, X̄n + ε] with a probability of 0.95?
• Key observation: The sample mean X̄n is a random variable. Use the CLT to approximate the distribution of X̄n!
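Before turning to confidence intervals, a short simulation sketch (the seed and replication count are arbitrary choices, not from the slides) confirms the two CLT implications for Uniform[0,1] samples: the mean of X̄n stays near 1/2 while Var(X̄n) shrinks like (1/12)/n.

import random
import statistics

# Sample means of n i.i.d. Uniform[0,1] draws: E[Xbar_n] stays at 1/2 while
# Var(Xbar_n) shrinks like (1/12)/n, mirroring the histograms above.
random.seed(5580)            # arbitrary seed for reproducibility
n_reps = 20_000              # arbitrary number of replications per n
for n in [1, 2, 4, 8, 16]:
    means = [statistics.fmean(random.random() for _ in range(n))
             for _ in range(n_reps)]
    print(f"n = {n:2d}: mean(Xbar_n) = {statistics.fmean(means):.4f}, "
          f"var(Xbar_n) = {statistics.variance(means):.5f} "
          f"(theory {1 / (12 * n):.5f})")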
95% Confidence Interval -- Part 1
• Suppose, for now, that we know the standard deviation σ, and we want to find a 95% confidence interval for E[X1].
• Goal: Choose z such that
  0.95 = Pr{X̄n - zσ/√n ≤ E[X1] ≤ X̄n + zσ/√n}
       = Pr{-zσ/√n ≤ X̄n - E[X1] ≤ zσ/√n}
       ≈ Pr{-zσ/√n ≤ N(0, σ²/n) ≤ zσ/√n}   (CLT)
       = Pr{-z ≤ N(0, 1) ≤ z}
• We want to choose z* so that the area under the standard normal density between -z* and z* is 0.95. Looking up a standard normal table shows that z* = 1.96.
• The above derivation is only an approximation because n is finite. It is exact if X1, ..., Xn are normally distributed.
• NOTE: If Φ(⋅) denotes the cumulative distribution of a standard normal, then Φ(z*) = 0.975, that is, z* = Φ⁻¹(0.975).

95% Confidence Interval -- Part 2
• What if we do NOT know the variance σ² in advance?
• In practice, we estimate the variance using its sample estimator:
  s_n² = (1/(n-1)) Σ_{i=1}^{n} (Xi - X̄n)²
• Why do we divide by n-1? In Question 3 on HW1, you will prove that this makes s_n² an unbiased estimator of the variance.
• So, the (approximate) 95% confidence interval for E[X1] is given by:
  [X̄n - 1.96 s_n/√n, X̄n + 1.96 s_n/√n]

100(1-α)% Confidence Interval
• Select a sample size n.
• Generate n i.i.d. samples X1, ..., Xn of the underlying random variable X.
• Compute the sample mean and sample variance:
  X̄n = (1/n) Σ_{i=1}^{n} Xi,  s_n² = (1/(n-1)) Σ_{i=1}^{n} (Xi - X̄n)²
• Look up the value z_{α/2} such that Pr{-z_{α/2} ≤ N(0, 1) ≤ z_{α/2}} = 1 - α.
  If Φ(⋅) denotes the cumulative distribution of a standard normal, then Φ(z_{α/2}) = 1 - α/2, that is, z_{α/2} = Φ⁻¹(1 - α/2).
  By symmetry of the density function, Φ⁻¹(1 - α/2) = -Φ⁻¹(α/2).
  The Excel command NORMINV(1-α/2, 0, 1) gives the inverse of the cumulative distribution function of a standard normal at 1 - α/2.
• The (approximate) 100(1-α)% confidence interval for E[X] is given by
  [X̄n - z_{α/2} s_n/√n, X̄n + z_{α/2} s_n/√n]
  (a code sketch of this recipe appears after the next slide).

Quantiles vs. Confidence Intervals
• Suppose a random variable X has probability density function f(⋅) and cumulative distribution function F(⋅).
  [Figure: density curve f(⋅) with the area to the left of q shaded and labeled Pr{X ≤ q} = p.]
• The p-th quantile of X is the value q such that F(q) = Pr{X ≤ q} = p.
• IMPORTANT: Suppose we select q1 and q2 such that Pr{q1 ≤ X ≤ q2} = 0.95. The interval [q1, q2] is NOT a 95% confidence interval for E[X].
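To tie the confidence-interval recipe together, here is a minimal Python sketch. Note two substitutions relative to the slides: the slides use Excel's NORMINV for Φ⁻¹, while this sketch uses the standard library's NormalDist().inv_cdf, and the "measurements" are simulated Uniform[0,1] placeholders (true mean 0.5) rather than real temperature data.

import math
import random
import statistics
from statistics import NormalDist

alpha = 0.05                 # 95% confidence level
n = 1_000                    # arbitrary sample size
random.seed(1)
samples = [random.random() for _ in range(n)]   # placeholder "measurements"

x_bar = statistics.fmean(samples)          # sample mean
s = statistics.stdev(samples)              # sample std dev (divides by n - 1)
z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2} = Phi^{-1}(1 - alpha/2)
half_width = z * s / math.sqrt(n)

print(f"z = {z:.4f}")                      # 1.9600 for alpha = 0.05
print(f"{100 * (1 - alpha):.0f}% CI for E[X]: "
      f"[{x_bar - half_width:.4f}, {x_bar + half_width:.4f}]  (true mean 0.5)")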