### Stats LEC 3

Course: STATS 10, Winter 2008
School: UCLA

#### Common Language and Notation: Parameters and Statistics

Variables can be summarized using statistics.

- A **statistic** is a numerical measure that describes a characteristic of the sample.
- A **parameter** is a numerical measure that describes a characteristic of the population.

We use statistics to estimate parameters.

A **population** is the entire group that we want to characterize. Population parameters include the mean, variance, standard deviation, and proportion. A **sample** is a collection of observations on which we measure one or more characteristics. Sample statistics include the mean, variance, standard deviation, and proportion.

#### Notation: Estimation and Inference

| | Population parameter | Sample statistic |
|---|---|---|
| Mean | $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Variance | $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$ | $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ |
| Standard deviation | $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$ | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ |
| Proportion | $p$ | $\hat{p}$ |

#### The Normal Distribution

- Bell shaped
- Symmetrical
- Mean, median, and mode are equal
- Location is determined by the mean, $\mu$
- Spread is determined by the standard deviation, $\sigma$
- The random variable has an infinite theoretical range: $-\infty$ to $+\infty$

By varying the parameters $\mu$ and $\sigma$, we obtain different normal distributions. Changing $\mu$ shifts the distribution left or right; changing $\sigma$ increases or decreases the spread.

Given the mean $\mu$ and standard deviation $\sigma$, we denote the normal distribution by $X \sim N(\mu, \sigma)$. The formula for the normal density is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where

- $e$ = the mathematical constant approximated by 2.71828
- $\pi$ = the mathematical constant approximated by 3.14159
- $\mu$ = the population mean
- $\sigma$ = the population standard deviation
- $x$ = any value of the continuous variable, $-\infty < x < \infty$

For a normal random variable $X$ with mean $\mu$ and standard deviation $\sigma$, i.e., $X \sim N(\mu, \sigma)$, the area under the curve to the left of a given point $x_0$ is the proportion of values at or below that point, $P(X \le x_0)$.
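As a minimal sketch (Python standard library only; the helper names are hypothetical, not from the course), the sample formulas above and the normal density formula translate directly into code. The example data are the 11 heights used later in this lecture.

```python
import math
import statistics

def sample_mean(xs):
    """x-bar = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2 = sum of (x_i - x-bar)^2, divided by n - 1 (not n)."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x (sigma is the standard deviation)."""
    coeff = 1 / (math.sqrt(2 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

heights = [61, 62.5, 63, 64, 64.5, 65, 66.5, 67, 68, 68.5, 70.5]
print(sample_mean(heights))                  # 65.5
print(round(sample_sd(heights), 4))          # 2.8723
print(round(statistics.stdev(heights), 4))   # 2.8723 (the stdlib also divides by n - 1)
print(round(normal_pdf(0, 0, 1), 4))         # 0.3989, peak of the standard normal curve
```

Dividing by $n - 1$ rather than $n$ is what distinguishes $s^2$ from the population formula for $\sigma^2$.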
#### Finding Normal Proportions

The area between $a$ and $b$ is the area to the left of $b$ minus the area to the left of $a$:

$$P(a < X < b) = P(X < b) - P(X < a)$$

#### The Standard Normal

Any normal distribution (with any combination of mean and variance) can be transformed into the standard normal distribution $Z$, which has mean 0 and variance 1: $Z \sim N(0, 1)$. To do so, we transform X units into Z units by subtracting the mean of X and dividing by its standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

The process of converting normal data to the standard scale is called **standardizing**. To convert Y into Z (a **z-score**), use the same formula, $Z = \frac{Y - \mu}{\sigma}$. A z-score measures how many standard deviations a value lies above (positive z) or below (negative z) the mean.

**Example:** If X is normally distributed with mean 100 and standard deviation 50, the Z value for X = 200 is

$$Z = \frac{200 - 100}{50} = 2.0$$

This says that X = 200 is two standard deviations above the mean of 100. The values 100 and 200 on the X scale ($\mu = 100$, $\sigma = 50$) correspond to 0 and 2.0 on the Z scale ($\mu = 0$, $\sigma = 1$). The distribution is the same; only the scale has changed. We can express the problem in original units (X) or in standardized units (Z).

#### Finding Normal Proportions Using Z

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = P\left(Z < \frac{b - \mu}{\sigma}\right) - P\left(Z < \frac{a - \mu}{\sigma}\right)$$
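Here is a sketch of standardizing and of the $P(a < X < b)$ formula, assuming Python's `math.erf` in place of the printed table: the standard normal CDF can be written as $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The function names are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def std_normal_cdf(z):
    """P(Z < z) for Z ~ N(0, 1), written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(a, b, mu, sigma):
    """P(a < X < b) = P(Z < (b - mu)/sigma) - P(Z < (a - mu)/sigma)."""
    return std_normal_cdf(z_score(b, mu, sigma)) - std_normal_cdf(z_score(a, mu, sigma))

# The example from the text: X ~ N(100, 50)
print(z_score(200, 100, 50))                      # 2.0
print(round(prob_between(100, 200, 100, 50), 4))  # 0.4772, area from the mean to 2 SDs above it
```

Every interval question reduces to two calls of the same CDF, which is exactly what the two table lookups in the formula do by hand.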
#### Area Under the Curve

The total area under the curve is 1.0, and the curve is symmetric, so half of the area (0.5) lies above the mean and half lies below.

#### The Standard Normal Table

The standard normal table in the textbook (Appendix A, Table A-98) gives areas under the standard normal curve. For a given Z-value $a$, the table shows the proportion of values of Z less than $a$: $P(Z < a)$.

- **Example:** $P(Z < 2.00) = 0.9772$.
- **Example:** $P(Z < -2.00) = 1 - 0.9772 = 0.0228$. For negative Z-values, use the symmetry of the distribution: the area below $-a$ equals the area above $a$, so $P(Z < -a) = 1 - P(Z < a)$.

**General procedure.** To find $P(a < X < b)$ when X is normally distributed:

1. Draw the normal curve for the problem in terms of X.
2. Translate X-values to Z-values.
3. Use the standard normal table.

Additionally, you can use tools like SOCR (socr.ucla.edu) or other applets (on Moodle).

**Example:** Suppose X is normal with mean 8.0 and standard deviation 5.0. Find $P(X < 8.6)$.

$$Z = \frac{8.6 - 8.0}{5.0} = 0.12, \qquad P(X < 8.6) = P(Z < 0.12) = 0.5478$$

**Example (upper tail):** With the same X, find $P(X > 8.6)$.

$$P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z \le 0.12) = 1.0 - 0.5478 = 0.4522$$

#### Finding the X Value for a Known Proportion

1. Find the Z value for the known proportion.
2. Convert to X units using the formula $X = \mu + Z\sigma$.

**Example:** Suppose X is normal with mean 8.0 and standard deviation 5.0. Find the X value such that only 20% of all values are below it.

Step 1: find the Z value for 20% in the lower tail. An area of 0.20 in the lower tail corresponds to $Z \approx -0.84$.
Step 2: convert to X units using $X = \mu + Z\sigma$:

$$X = 8.0 + (-0.84)(5.0) = 3.80$$

So 20% of the values from a distribution with mean 8.0 and standard deviation 5.0 are less than 3.80.

#### Relationship to the Empirical Rule

Recall the empirical rule:

- $\bar{y} \pm 1s$ contains about 68% of the data
- $\bar{y} \pm 2s$ contains about 95% of the data
- $\bar{y} \pm 3s$ contains about 99.7% of the data

We can use the standard normal table to verify these percentages:

| Interval | Area from the table |
|---|---|
| $-1 < z < 1$ | $0.8413 - 0.1587 = 0.6826 \approx 68\%$ |
| $-2 < z < 2$ | $0.9772 - 0.0228 = 0.9544 \approx 95\%$ |
| $-3 < z < 3$ | $0.9987 - 0.0013 = 0.9974 \approx 99.7\%$ |

#### Assessing Normality

Not all continuous random variables are normally distributed, so it is important to evaluate how well the data are approximated by a normal distribution. Several methods for checking normality:

- Mean approximately equal to the median.
- Empirical rule: check the percentage of data within 1, 2, and 3 standard deviations of the mean (it should be approximately 68%, 95%, and 99.7%).
- Histogram or dotplot.
- Normal plot.

#### Normal Plot

A normal plot is a graph used to assess normality in a data set. On a normal plot we want to see a straight line, which means the distribution is approximately normal. It is sometimes easier to see whether a line is straight than whether a histogram is bell shaped.

How the plot works: we plot the data against **normal scores**. The normal scores are the expected values of the ordered observations in a sample of size n drawn from N(0, 1). When we graph the actual y values against these normal scores, the points fall close to a straight line if the data are normal.

*Figure: Scatterplot of Y vs. Nscore.*
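The normal-plot recipe can be sketched in Python. The lecture defines normal scores as the exact expected values of N(0, 1) order statistics; computing those needs numerical integration, so this sketch swaps in Blom's approximation, $\Phi^{-1}\big((i - 3/8)/(n + 1/4)\big)$, which is an assumption and not the course's method. Instead of drawing the plot, it summarizes straightness with the correlation between the sorted data and the scores (close to 1 means the points lie close to a straight line). The data are the 11 heights from the example that follows; the function names are hypothetical.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected N(0, 1) order statistics for a sample of size n.

    Assumption: Blom's approximation Phi^{-1}((i - 3/8) / (n + 1/4)),
    not the exact expected values described in the lecture.
    """
    std = NormalDist()  # standard normal: mean 0, sd 1
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation; near 1 means the points fall near a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

heights = [61, 62.5, 63, 64, 64.5, 65, 66.5, 67, 68, 68.5, 70.5]
ordered = sorted(heights)
scores = normal_scores(len(ordered))

# The normal plot is these (normal score, height) pairs drawn as a scatterplot.
for z, y in zip(scores, ordered):
    print(f"{z:7.3f}  {y}")

print(round(correlation(scores, ordered), 3))
```

Plotting the printed pairs gives the normal plot itself; the single correlation number is just a convenient summary of how straight it is.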
**Example:** The heights of 11 women.

| Height | Standardized height, $z = (\text{height} - 65.5)/2.8723$ |
|---|---|
| 61 | -1.5667 |
| 62.5 | -1.0445 |
| 63 | -0.8704 |
| 64 | -0.5222 |
| 64.5 | -0.3482 |
| 65 | -0.1741 |
| 66.5 | 0.3482 |
| 67 | 0.5222 |
| 68 | 0.8704 |
| 68.5 | 1.0445 |
| 70.5 | 1.7408 |

Mean = 65.5, standard deviation = 2.8723.

**More examples.** *Figures: histograms and normal quantile plots for a population of students, for Height (fitted line: Normal Quantile = 0.230 Height - 15.3), Weight (Normal Quantile = 0.0293 Weight - 4.1), and Age_months (Normal Quantile = 0.0416 Age_months - 9.85).*

**Example:** Suppose that the systolic blood pressure (SBP) of Los Angeles freeway commuters follows a normal distribution with mean 130 mmHg and standard deviation 20 mmHg. Find the percentage of LA freeway commuters that have an SBP less than 100 mmHg.

1. Rewrite with notation: $Y \sim N(130, 20)$.
2. Identify what we are trying to solve: the percentage for $y < 100$.
3. Standardize: $Z = \frac{Y - \mu}{\sigma} = \frac{100 - 130}{20} = -1.5$.
4. Use the standard normal table: $P(Y < 100) = P(Z < -1.5) = 0.0668$.

Therefore approximately 6.7% of LA freeway commuters have an SBP less than 100 mmHg.

Visually, draw the picture! *Figure: the Y scale with 100 and 130 marked and the matching Z scale with -1.5 and 0 marked; the shaded area to the left of z = -1.5 is 0.0668.*

**Try these:** What percentage of LA freeway commuters have an SBP greater than 155 mmHg? Between 120 and 175 mmHg?
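The table lookups in this lecture can be double-checked with Python's `statistics.NormalDist` (standard library, Python 3.8+), which computes normal areas and percentiles directly. This is a sketch for checking answers, not a replacement for the table-based procedure; small differences in the last digit come from the table's rounding (for example, the exact 20th percentile uses $z \approx -0.8416$ rather than $-0.84$).

```python
from statistics import NormalDist

# X ~ N(8.0, 5.0): the table-lookup examples
X = NormalDist(mu=8.0, sigma=5.0)
print(round(X.cdf(8.6), 4))        # P(X < 8.6): 0.5478
print(round(1 - X.cdf(8.6), 4))    # P(X > 8.6): 0.4522
print(round(X.inv_cdf(0.20), 2))   # 20th percentile: 3.79 (3.80 when z is rounded to -0.84)

# Empirical rule: area within k standard deviations of the mean
Z = NormalDist()                   # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    print(k, round(Z.cdf(k) - Z.cdf(-k), 4))   # 0.6827, 0.9545, 0.9973

# SBP example: Y ~ N(130, 20), sigma is the standard deviation as in the lecture
Y = NormalDist(mu=130, sigma=20)
print((100 - Y.mean) / Y.stdev)    # z-score for y = 100: -1.5
print(round(Y.cdf(100), 4))        # P(Y < 100): 0.0668

# The "Try these" exercises: compare with your table-based answers
print(round(1 - Y.cdf(155), 4))            # P(Y > 155)
print(round(Y.cdf(175) - Y.cdf(120), 4))   # P(120 < Y < 175)
```

Note that the empirical-rule areas differ from the table-based 0.6826 and 0.9974 only in the fourth decimal place.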
