Topic 3-Statistical Inference


Topic #3: Statistical Inference
o Point Estimation
o Sampling Distribution of $\bar{x}$
o Sampling Distribution of $\bar{p}$
   • Not on Exam 1
o Hypothesis Testing
o Interval Estimation
   • Not on Exam 1
Slide 1

Form Your Team By End of 2nd Class!
Slide 2

Statistical Inference
o Definitions
   • A population is the set of all the elements of interest.
      • Example: all 50,000 UCF students
   • A sample is a subset of the population.
      • Example: this class
Slide 3

Statistical Inference
o Definitions (cont.)
   • A parameter is a numerical characteristic of a population.
      • Its value is usually unknown.
      • Example: the mean of a variable, such as the average GPA of all UCF students
   • An estimate is a statistical approximation of a parameter's value.
      • Example: the average GPA in this class
      • If the estimate is computed from a sample, it is a sample statistic.
Slide 4

Statistical Inference
o The purpose of statistical inference is to obtain information about a population from information contained in a sample.
   • The sample results provide only estimates of the values of the population characteristics.
o With proper sampling methods, the sample results will provide "good" estimates of the population characteristics.
   • See earlier coverage of different sampling methods.
Slide 5

Point Estimation
o In point estimation we use the data from the sample to compute a SINGLE VALUE of a sample statistic that serves as an ESTIMATE of a population parameter.
o ESTIMATOR
   • A formula
   • Gives a numerical estimate of the parameter
   • Example: the formula for the sample mean, $\bar{x} = \frac{\sum x_i}{n}$
o We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
o $s$ is the point estimator of the population standard deviation $\sigma$.
o $\bar{p}$ is the point estimator of the population proportion $p$.
Slide 6

Point Estimation
o Sampling Error
   • The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
   • Sampling error is the result of using a subset of the population (the sample), and not the entire population, to develop estimates.
   • The sampling errors are:
      • $|\bar{x} - \mu|$ for the sample mean
      • $|s - \sigma|$ for the sample standard deviation
      • $|\bar{p} - p|$ for the sample proportion
Slide 7

REVIEW
o ESTIMATE
   • Numerical value calculated from data in the sample
   • Approximation of the population parameter value
o ESTIMATOR
   • A formula
   • Gives the estimate
Slide 8

Example: ESPN
o ESPN annually receives 1,500 applications from prospective student interns. The application forms contain a variety of information, including the individual's scholastic aptitude test (SAT) score and whether or not the individual is an in-state resident.
o The director of interns would like to know, at least roughly, the following information:
   • the average SAT score for the applicants, and
   • the proportion of applicants that are in-state residents.
o We will now look at two alternatives for obtaining the desired information.
Slide 9

Example: ESPN
o Alternative #1: Take a Census of ALL 1,500 Applicants
   • SAT Scores
      • Population mean: $\mu = \frac{\sum x_i}{1{,}500} = 990$
      • Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{1{,}500}} = 80$
   • In-State Applicants
      • Population proportion: $p = \frac{1{,}080}{1{,}500} = .72$
Slide 10

Example: ESPN
o Alternative #2: Take a SAMPLE of 50 Applicants
   • Excel can be used to select a simple random sample without replacement.
   • The process is based on random numbers generated by Excel's RAND function.
   • The RAND function generates numbers in the interval from 0 to 1.
      • Any number in the interval is equally likely.
      • The numbers are actually values of a uniformly distributed random variable.

Example: ESPN
o Using Excel to Select a Simple Random Sample
   • 1,500 random numbers are generated, one for each applicant in the population.
   • RULE FOR CHOOSING SAMPLE
      • We choose the 50 applicants corresponding to the 50 smallest random numbers as our sample.
      • (A different rule could have been used.)
      • We find that 34 of the 50 are in-state residents.
   • Each of the 1,500 applicants has the same probability of being included.
Slides 11 and 12

Example: ESPN
o Point Estimates
   • $\bar{x}$ as point estimator of $\mu$ ($\mu = 990$): $\bar{x} = \frac{\sum x_i}{50} = \frac{49{,}850}{50} = 997$
   • $s$ as point estimator of $\sigma$ ($\sigma = 80$): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{49}} = \sqrt{\frac{277{,}097}{49}} = 75.2$
   • $\bar{p}$ as point estimator of $p$ ($p = .72$): $\bar{p} = \frac{34}{50} = .68$
o Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
Slide 13

Example: ESPN
o Sampling error of the…
   • …mean: |997 - 990| = 7
   • …standard deviation: |75.2 - 80| = 4.8
   • …proportion: |.68 - .72| = .04
o Note: Usually we do not know the population parameter, so we cannot compute the sampling error. (If we knew the population parameter, we would not bother estimating it with a sample.)
o But we can make probability statements about the sampling error, based on the sampling distribution.
   • Example: "The survey has a margin of error of plus or minus 3 percentage points" usually means that "There is a 95% probability that the sample proportion is within +/- 3 percentage points of the population proportion."
Slide 14

Sampling Distributions
o A sampling distribution is the probability distribution of a sample statistic.
o Example: the sampling distribution of the sample mean.
   • There are many possible samples that could be drawn from a given population.
   • The sample mean would vary from sample to sample.
   • The sampling distribution is the frequency distribution of the sample means obtained from all possible samples from the population.

What is a Sampling Distribution of $\bar{x}$?
o Example
   • a) estimating the average height of UCF students
   • b) row #1 / sample #1 mean = 5'4"
   • c) row #2 / sample #2 mean = 5'5"
   • d) row #3 / sample #3 mean = 6'9" (!)
   • e) etc.
   • f) DISTRIBUTION OF ESTIMATES (means)
      • (1) 4'4", 5'5", 5'7", 5'8", . . ., 6'3", 6'5", 6'9"
   • So each sample gives a different sample mean.
   • (Note: these should be random samples in order to use sampling theory.)
Slides 15 and 16

[Figure: Distribution of Average Heights. Vertical axis f(x); horizontal axis $\bar{x}$, running from 4'4" to 6'9" with $\mu$ at the center. Each sample gives one value. Point: taking many samples yields a distribution of $\bar{x}$ values.]
Slide 17

Sampling Distributions
o The sampling distribution is not the distribution of heights, but the distribution of the mean heights obtained from many different samples.
o If we follow good sampling protocol, then most samples would give us a sample mean that is close to the true but unknown population mean.
o But some samples, despite being random, will not be representative.
   • Some will include too many tall people, and give us a sample mean much larger than the population mean.
   • Others will include too many short people, and give us a sample mean much smaller than the population mean.
Slide 18

Central Limit Theorem
o The basic idea of the Central Limit Theorem is that, as the sample size increases, the sampling distribution of the sample mean
   • approaches a normal distribution,
   • with a mean equal to the population mean,
   • and a standard deviation equal to the population standard deviation divided by the square root of the sample size.
o Even if the population is not normally distributed, for a large sample size the sampling distribution of the sample mean is approximately normal.
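The Central Limit Theorem statement above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is a made-up, deliberately skewed set of 20,000 values (not data from the slides), and `sample_means` is a hypothetical helper name. It draws many simple random samples without replacement and compares the spread of the sample means with $\sigma / \sqrt{n}$.

```python
import random
import statistics


def sample_means(population, n, num_samples, seed=0):
    """Draw num_samples simple random samples of size n (without replacement)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]


# Hypothetical, right-skewed population (assumption: NOT data from the slides).
_rng = random.Random(42)
population = [_rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

for n in (4, 36, 100):
    means = sample_means(population, n, num_samples=2_000, seed=n)
    print(
        f"n={n:3d}  mean of sample means={statistics.fmean(means):.3f}  "
        f"sd of sample means={statistics.stdev(means):.3f}  "
        f"sigma/sqrt(n)={sigma / n ** 0.5:.3f}"
    )
```

If the theorem holds, the mean of the sample means should sit near `mu` for every n, and the standard deviation of the sample means should track `sigma / sqrt(n)` as n grows. Because n/N stays well under .05 here, no finite population correction is applied.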
[Figure: Distribution of Average Heights. Vertical axis f(x); horizontal axis $\bar{x}$, running from 4'4" to 6'9" with $\mu$ at the center. Each sample gives one value. Point: this is the sampling distribution of $\bar{x}$ values.]
Slides 19 and 20

[Figure, repeated: Distribution of Average Heights. Point: this is the sampling distribution of $\bar{x}$ values.]
Slide 21

Sampling Distribution of $\bar{x}$
o The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of $\bar{x}$ from all possible samples.
o Expected value: $E(\bar{x}) = \mu$, where $\mu$ = the population mean and $\bar{x} = \frac{\sum x_i}{n}$.
Slide 22

Sampling Distribution of $\bar{x}$
o Standard Deviation of $\bar{x}$
   • Finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
   • Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
   • NOTE: A finite population is treated as being infinite if n/N < .05.
   • $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.
   • $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.
Slide 23

Standard Deviation of $\bar{x}$
o $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.
   • Suppose $\sigma$ = 48 and n = 36. Then $\sigma_{\bar{x}}$ = 8.
   • Suppose $\sigma$ = 48 and n = 64. Then $\sigma_{\bar{x}}$ = 6.
   • Suppose $\sigma$ = 48 and n = 100. Then $\sigma_{\bar{x}} = 48 / \sqrt{100}$ = 4.8.
o What happens to $\sigma_{\bar{x}}$ as the sample size grows?
o Point: As the sample size grows, the standard error of the mean shrinks. As a result, a larger sample size provides a higher probability that the sample mean is within a specified distance of the population mean.
Slide 24

Sampling Distribution of $\bar{x}$
o If we use a large (n > 30) simple random sample,
   • the CENTRAL LIMIT THEOREM enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal probability distribution.
o When the simple random sample is small (n < 30),
   • the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal probability distribution.
o Whenever the population has a normal probability distribution,
   • the sampling distribution of $\bar{x}$ is a normal probability distribution for ANY sample size.
Slide 26

Standard Deviation of $\bar{x}$
o Suppose $\sigma$ = 48 and n = 64. Then $\sigma_{\bar{x}}$ = 6, and 95% of all values of $\bar{x}$ are between 188 and 212.
   • 95% of all samples give $\bar{x}$ within 2 $\sigma_{\bar{x}}$ of the mean (= 200, say).
o Suppose $\sigma$ = 48 and n = 36. Then $\sigma_{\bar{x}}$ = 8, and 95% of all possible $\bar{x}$ are between 184 and 216.
o Suppose $\sigma$ = 48 and n = 100. Then $\sigma_{\bar{x}}$ = 4.8, and 95% of all values of $\bar{x}$ are between 190.4 and 209.6.
o Point: As the sample size grows, 95% of all values lie closer and closer to the mean (that is, more values lie close to the mean).
Slide 25

Sampling Distribution of $\bar{x}$
o We have now completely described the sampling distribution of $\bar{x}$:
   • Use an estimator such that $E(\bar{x}) = \mu$.
   • Calculate the standard error of the mean.
   • Determine the shape of the sampling distribution.
      • Likely normal
Slide 27

Example: ESPN
o Sampling Distribution of $\bar{x}$ for the SAT Scores
   • NOTE THIS: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{50}} = 11.3$
   • $E(\bar{x}) = \mu$
   • NOTE: there are two possible sources for the value of $\sigma$:
      1. Know it from the population (e.g., see slide #13).
      2. Estimate it using the sample standard deviation.
Slide 28

Example: ESPN
o Sampling Distribution of $\bar{x}$ for the SAT Scores (n = 50)
   • What is the probability that a simple random sample of 50 applicants (n = 50) will provide an estimate of the population mean SAT score ($\mu$) within (plus or minus) 10 of $\mu$?
   • Note: assume that we don't know the value of $\mu$.
[Figure: sampling distribution of $\bar{x}$, centered at $\mu$, with the unknown areas (Area = ??) between $\mu - 10$ and $\mu$ and between $\mu$ and $\mu + 10$.]
Slides 29 and 30

Example: ESPN (cont.)
• NOTE:
   • The population standard deviation was calculated = 80 (previous slide).
   • The standard deviation of $\bar{x}$ was calculated = 11.3 (previous slide).
   • USE THE STANDARD DEVIATION OF $\bar{x}$ = 11.3.
• Here are the steps to find the answer:
• Step (1): convert the $\bar{x}$ value of ($\mu$ - 10) to a z-value:
   • $z_1 = \frac{(\mu - 10) - \mu}{11.3} = \frac{-10}{11.3} = -0.88$
   • So $\bar{x}$ = ($\mu$ - 10) is equivalent to z = -0.88.
• Step (2): convert the $\bar{x}$ value of ($\mu$ + 10) to a z-value:
   • $z_2 = \frac{(\mu + 10) - \mu}{11.3} = \frac{10}{11.3} = 0.88$
   • So $\bar{x}$ = ($\mu$ + 10) is equivalent to z = 0.88.
Slide 31

Example: ESPN (cont.)
• HINT: draw a picture.
• Step (3): P(-0.88 < Z < 0.88) is the area between z = -0.88 and z = 0.88.
   • This area is the sum of two areas:
      • the area between z = -0.88 and z = 0 (area A1), and
      • the area between z = 0 and z = 0.88 (area A2).
• Step (4): P(-0.88 < Z < 0.88) = area A1 + area A2 = 0.3106 + 0.3106 = 0.6212
• This answer means there is a 62.12% probability that $\bar{x}$ will be between ($\mu$ - 10) and ($\mu$ + 10) (or: 62.12% of all $\bar{x}$ values from samples of n = 50 fall between ($\mu$ - 10) and ($\mu$ + 10)).
Slide 32

Example: ESPN
o Sampling Distribution of $\bar{x}$ for the SAT Scores from an n = 50 sample
[Figure: sampling distribution of $\bar{x}$, centered at $\mu$, with Area = .3106 between $\mu - 10$ and $\mu$ and Area = .3106 between $\mu$ and $\mu + 10$.]
   • There is a 62.12% probability that $\bar{x}$ will be between ($\mu$ - 10) and ($\mu$ + 10) with n = 50.
Slide 33

Example: ESPN
o Sampling Distribution of $\bar{x}$
   • What is the probability that a simple random sample of 50 applicants will provide an estimate of the population mean SAT score that is within (plus or minus) 10 of the actual population mean $\mu$?
      • 62.12% probability that $\bar{x}$ falls between ($\mu$ - 10) and ($\mu$ + 10)
o What if n = 256?
   • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{256}} = 5$
      • 95.45% probability that $\bar{x}$ falls between ($\mu$ - 10) and ($\mu$ + 10)
Slide 34

Sampling Distribution of $\bar{p}$
o The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion $\bar{p}$.
o We want $E(\bar{p}) = p$, where p = the population proportion.
Slide 35

Sampling Distribution of $\bar{p}$
o Standard Deviation of $\bar{p}$
   • Finite population: $\sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}}$
   • Infinite population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$
   • $\sigma_{\bar{p}}$ is referred to as the standard error of the proportion.
Slide 36

When to Use the Finite Pop./Infinite Pop. Formula?
o Formulas for $\sigma_{\bar{x}}$ and $\sigma_{\bar{p}}$
   • Is n/N < .05?
      • YES: use the INFINITE population formula.
      • NO: use the FINITE population formula.
Slide 37

Example: ESPN
o Sampling Distribution of $\bar{p}$ for In-State Residents
   • .72 = population proportion p
   • NOTE THIS: $\sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{50}} = .0635$
   • $E(\bar{p}) = p = .72$
   • The normal probability distribution is an acceptable approximation since np = 50(.72) = 36 > 5 and n(1 - p) = 50(.28) = 14 > 5.
Slide 38

Example: ESPN
o Sampling Distribution of $\bar{p}$ for In-State Residents
   • What is the probability that a simple random sample of 50 applicants will provide an estimate of the population proportion of in-state residents that is within plus or minus .05 (5 percentage points) of the actual population proportion?
   • In other words, what is the probability that $\bar{p}$ will be between .67 and .77?
Slide 39

Example: ESPN
o Sampling Distribution of $\bar{p}$ for In-State Residents
[Figure: sampling distribution of $\bar{p}$, centered at 0.72, with Area = .2852 between 0.67 and 0.72 and Area = .2852 between 0.72 and 0.77.]
   • For $z = \frac{.77 - .72}{.0635} = .79$, the area = (.2852)(2) = .5704.
   • The probability is .5704 that $\bar{p}$ will be within +/- .05 of the actual population proportion.
Slide 40

Hypothesis Testing
Slide 41

Hypothesis Testing
o Generally, any formal or informal testing begins with an idea, theory, speculation, guess, hunch or hypothesis about the population of interest:
   • what is not true
   • what is true
o In formal hypothesis testing, there will always be two hypotheses:
   • the null hypothesis (H0) and
   • the alternative hypothesis (HA).
Slide 42

Hypothesis Testing (cont.)
o The null hypothesis
   • is often (but not always; see the example later) the idea that you think is not true.
o Tip:
   • Begin stating your hypotheses by first putting what you believe is true in the alternative hypothesis.
Slide 43

Example
o In our system of justice, the presumption about the guilt or innocence of the accused is . . .
o In other words, the null hypothesis is that the accused is ______________ and it is up to the prosecution to disprove the null hypothesis.
o (So, it appears that the prosecution believes that the null hypothesis is wrong.)
Slide 44

Hypothesis Testing (cont.)
o After testing the null hypothesis, you must draw one of two conclusions about it.
o Either the evidence favors the idea that the null hypothesis is
   • false (so you reject it in favor of the alternative) or
   • true (so you do NOT reject it).
o Notice that the two conclusions always refer to the null hypothesis.
Slide 45

Example
o A jury or judge will conclude either that the accused is ??? or is not ???.
   • (Ever wonder why the second verdict is never "innocent"?)
o In terms of hypothesis testing, the jury either
   • rejects the null hypothesis of innocence (accused is guilty) or
   • doesn't reject it (accused is not guilty).
Slides 46 and 47

Hypothesis Testing (cont.)
o Now, it is obvious that you may make either
   • correct decisions or
   • incorrect decisions about the null hypothesis.
Slide 48

Hypothesis Testing (cont.)
o Correct decisions are (H0: accused innocent):
      (in formal terms)            Example
      reject false null            convict guilty person
      don't reject true null       acquit innocent person
o Incorrect decisions are:
      (in formal terms)            Example
      reject true null             convict innocent person
      don't reject false null      acquit guilty person
Slide 49

Hypothesis Testing (cont.)
o It is important to emphasize that you will never be 100% certain that you have made the correct decision.
o Because you are using a sample, not the population, there will always be a chance of making a wrong decision.
o By the way, what are the sample and the population in a trial?
   • sample: the evidence presented
   • population: the complete truth
Slide 50

Type I & Type II Errors
o The error of rejecting a true null hypothesis is called a Type I error.
o The error of not rejecting a false null hypothesis is called a Type II error.
Slide 51

Hypothesis Testing (cont.)
o Over the years, researchers have concluded that it's easier to control the chance of making the first type of incorrect decision (Type I error: rejecting a true null, e.g. convicting an innocent person).
o So, every test of a hypothesis is conducted so as to control the probability of making the first type of error ("Type I error"), typically by making the probability of a Type I error small.
o For a given probability of Type I error, we prefer a test with a lower probability of a Type II error (for a given significance, we prefer greater power).
Slide 52

Hypothesis Testing (cont.)
o Example
   • When the jury or judge convicts a person, there could be a 1% chance that they are convicting an innocent person.
   • That is, there is a 1% chance that they have rejected a true null hypothesis of that person's innocence.
   • That is, the CHANCE OF MAKING A TYPE I ERROR is 1%.
o The probability of making a Type I error is called the level of significance of the test.
Slides 53 and 54

Hypothesis Testing (cont.)
Three Key Definitions
o The level of significance is a probability that
   • we set before doing the testing,
   • we want to be small: usually 10%, 5% or 1%, and
   • is the maximum probability of making a Type I error that we will tolerate.
o The software calculates a p-value.
   • "p" stands for probability.
   • This should be compared with the level of significance (above).
o The value of (1 - the p-value) is treated in this course as the level of confidence you have in a test's conclusions.
Slide 55

Hypothesis Testing (cont.)
o When you conduct a test using a PC, the statistical software will usually print a p-value for each test you do.
o This value can range from 0 through 1.00 (0% to 100%).
o Loosely, this is the risk of a Type I error that you take on if you reject the null. Precisely, it is the probability of getting a sample result at least as extreme as the one observed if the null is in fact true.
Slide 56

Hypothesis Testing (cont.)
o Additionally, the corresponding level of confidence tells you how confident you are that your rejection of the null is correct.
Slide 57

Who Uses This
• Stock market analysts have shown that concentrated mutual funds (those with fewer than 25 stocks in their portfolio) had statistically significant better returns during 1998 than non-concentrated mutual funds. (Source: The Wall Street Journal, April 22, 1999, p.4.)
Slide 58

Who Uses This
• Medical researchers have shown, using one-tailed hypothesis tests, that the much-touted family of antidepressants including Prozac are slightly worse in most aspects than the older generation of antidepressants. (Source: The New York Times, March 20, 1999.)
Slide 59

Who Uses This
• Researchers state that there is not statistically significant evidence that the much-touted supplement ginkgo biloba would improve memory. "The jury is still out." (Source: The New York Times, April 4, 1999, p.1.)
Slide 60

Who Uses This
• A study for Morgan Stanley Dean Witter examined stock fund performance over two consecutive five-year periods. The report found that of those in the top quartile in the first period, only 28% remained in the top half in the second period. These results supported the null hypothesis that there is no repetition of top performers. (Source: Investor's Business Daily, May 5, 1999, p.A4.)
Slide 61

Example
o Suppose you are trying to see if your company's revenues have increased from one year ago at the same time.
o You have a sample of data for several units showing average weekly unit revenues of $25,000 for last month one year ago and $25,500 for last month this year.
Slide 62

Example (cont.)
o It seems that revenues have, indeed, risen . . . but a little voice in the back of your head wonders if a different sample might give a different result. (After all, the difference is small.)
o You don't have the time to collect another sample. Instead, you decide to put your ECO 6416 tools to work and test the significance of that difference.
Slides 63 and 64

Example (cont.)
o Your first step is to state your hypotheses.
   • H0: your company's revenues have not changed (average weekly unit revenues last year equal average weekly unit revenues this year)
   • versus
   • HA: your company's revenues have increased over the last year (average weekly unit revenues this year are greater than average weekly unit revenues last year)
Slide 65

Example (cont.)
o You next perform a test of the difference whose results show a p-value of 0.046 (4.6%).
o This leads you to reject the null hypothesis.
o This means that, if revenues had not changed, a difference at least this large would show up in only 4.6% of samples; rejecting the null of no revenue increase therefore carries about a 4.6% risk of a Type I error.
Slide 66

Example (cont.)
o In other words, a result like this one would be unlikely (a 4.6% chance) if revenues had actually remained the same.
o Alternatively, in the terms used in this course, you are (1 - 4.6% =) 95.4% confident that revenues have risen during the period.
Slide 67

Hypothesis Testing: Basic Idea
o Suppose I have a coin that I claim is "fair".
o Suppose I toss it 20 times and it comes up heads all 20 times.
o Do you believe that my coin is "fair"?
o Why do you believe that?
Slide 68

Hypothesis Testing: Basic Idea
o Suppose I drive through an intersection with a traffic light every day.
o The light is supposed to be red ½ the time and green ½ the time. No yellow.
o Suppose I catch a red light every day for 30 straight days.
o Do you believe that the light is red ½ the time?
o Why do you believe that?
Slide 69

Hypothesis Testing: Basic Idea
o Suppose I toss a coin 20 times and it comes up heads all 20 times when I believe that it is fair.
o Suppose I catch a red light every day for 30 straight days when I believe that it is red ½ the time.
   1. You state a possibility: a hypothesis.
   2. An "experiment" yields an outcome that would rarely happen if your hypothesis is true. (For a fair coin, 20 heads in 20 tosses has probability (1/2)^20, roughly 1 in a million.)
   3. Probably your hypothesis is not true.
Slide 70

Hypothesis Testing Examples
• There are two approaches to hypothesis testing involving test statistic numbers.
• The first approach (example #1) is simpler than the second approach.
• So, we'll use this first approach in most, but not all, of this course.
• Look at the second approach on your own.
Slide 71

Hypothesis Testing Example #1
• Problem: What is the average height of Div. 1A basketball players?
• Step 1:
   a. Formulate hypotheses
      • H0: $\mu$ = 6'7"; HA: $\mu \neq$ 6'7"
      • We have here a two-sided alternative, so we will do a two-tail test.
Slide 72

Step 1 (cont.)
   b. Specify Decision Rule
      • i) Reject H0 if p-value ≤ .05**
      • ii) Do not reject H0 if p-value > .05**
      • ** NOTE: or .01 (or, rarely, .10). You select this value.
      • Most use .05 in business.
Slide 73

Example #1 (cont.)
• Step 2: Collect a sample of size n and calculate the sample mean height = 6'3".
   • If the true population mean is 6'7", what is the probability that a sample of size n would give a sample mean of 6'3"?
   • Suppose you calculate a p-value of 0.007.
   • This means: if the population mean is 6'7", only 7 out of 1,000 samples would give a sample mean at least as far from 6'7" as 6'3".
Slide 74

Example #1 (cont.)
• Step 3:
   a. Compare the calculated p-value with the value in the decision rule.
      • calculated p-value = 0.007 < 0.05
   b. Choose the correct hypothesis based upon the comparison above.
      • Following the decision rule in step 1, reject H0 and conclude that $\mu \neq$ 6'7".
Slide 75

Example #1 (cont.)
• WHAT'S THE PROBABILITY YOU'RE WRONG?
   • Assume the null is true. Then, by rejecting it, you made a Type I error. The p-value of 0.007 measures that risk, and it is below the maximum of 0.05 set in the decision rule.
• WHAT'S THE PROBABILITY YOU'RE RIGHT?
   • Assume the null is false. Then you did not make a Type II error. The probability of correctly detecting a true alternative is the power of the test.
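The three steps of Example #1 can be sketched in a few lines of code. This is a minimal illustration, not the course's software: the slides give neither the sample size nor the sample standard deviation, so the standard error of 1.5 inches used below is an assumed, hypothetical value, and the function names `two_sided_p_value` and `decide` are made up for this sketch. It uses a normal approximation for the two-tail p-value and then applies the decision rule from Step 1.

```python
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, std_error):
    """Two-tail p-value for H0: mu = mu0 versus HA: mu != mu0,
    using a normal approximation for the sampling distribution."""
    z = (sample_mean - mu0) / std_error
    return 2 * NormalDist().cdf(-abs(z))


def decide(p_value, alpha=0.05):
    """Decision rule from Step 1: reject H0 when the p-value is <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Heights in inches: 6'7" = 79 and 6'3" = 75.
# ASSUMPTION: standard error of 1.5 inches (the slides do not give s or n).
p = two_sided_p_value(sample_mean=75, mu0=79, std_error=1.5)
print(f"p-value = {p:.4f} -> {decide(p)}")
```

With a different assumed standard error the p-value changes, but the comparison against the chosen level of significance works the same way.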
Slide 76

Who Uses This
• The average starting salary differential between MBAs graduating from top 25 programs and from all other programs is statistically significant: $120,000 versus $60,000. (Source: U.S. Department of Education report cited in The New Times, August 2, 1998, p.1.)
Slide 77

Who Uses This
• Researchers at the American Medical Association regularly use two-sample hypothesis tests to compare different physician populations (e.g., specialist versus primary care) and to compare a physician population at different times (e.g., now and a year ago). (Source: Dr. Carol Kane, AMA, Chicago, IL.)
Slide 78

Who Uses This
• Officials at Indiana University have conducted studies showing that students living in dorms have statistically significant better grade averages than those living off campus. (Source: "College Rewrite Book on Dorm Life," Chicago Tribune, June 1, 1999, p.1.)
Slide 79

Exercise
o Do the "Statistics Review - Writing Hypotheses" Exercise.
Slide 80

Recall: Finite Population Correction Factor
o Standard deviation of $\bar{x}$
   • Finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
   • Infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
   • NOTE: A finite population is treated as being infinite if n/N < .05.
   • $\sqrt{(N - n)/(N - 1)}$ is the finite pop. correction factor.
Slide 81

When to Use the Finite Pop./Infinite Pop. Formula?
o Formulas for $\sigma_{\bar{x}}$ and $\sigma_{\bar{p}}$
   • Is n/N < .05?
      • YES: use the INFINITE population formula.
      • NO: use the FINITE population formula.
Slide 82

Recall: Finite Population Correction Factor
o Finite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$; infinite population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
   • NOTE: A finite population is treated as being infinite if n/N < .05.
   1. Sampling from a population of 10,000 with a sample size of 50. Use the correction factor?
   2. Sampling from a population of 1,000 with a sample size of 50. Use the correction factor?
   3. Sampling from a population of 100 with a sample size of 50. Use the correction factor?
Slide 83

Confidence Holds Steady
By Gary Langer
NEW YORK, Sept. 26 - Consumer Confidence Held Steady in the Latest ABCNEWS/Money Magazine Poll As Americans Were Unfazed by War Jitters Responsible for Wall Street's Worst Week in 61 Years.

   National Economy
      9/24/01          44%
      9/9/01           43%
      1/7/01           71%
      15-year avg.     42%

Methodology: The ABCNEWS/Money magazine Consumer Comfort Index is based on 1,009 telephone interviews in the month ending Sept. 23 and has an error margin of plus or minus 3 percentage points.
Slide 84

Gallup Poll (October 6, 2001) (www.gallup.com)
o Do you think the United States should -- or should not -- take military action in retaliation for last week's attacks on the World Trade Center and the Pentagon?
o Would you support or oppose the U.S. continuing a campaign against terrorism if you knew that 5,000 U.S. troops would be killed?
   • ± 3% Margin of Error
   • September 21-22, 2001
   • Sample Size = 1,005
Slide 85

Interval Estimation of a Population Mean: Large-Sample Case
o Sampling Error
o Probability Statements about the Sampling Error
o Constructing an Interval Estimate
o Calculating an Interval Estimate: Large-Sample Case with $\sigma$ Unknown
[Figure: three interval estimates, each drawn as a bracketed interval centered on its own sample mean $\bar{x}$, plotted against the population mean $\mu$ on the $\bar{x}$ axis.]
Slide 86

Sampling Error
o The absolute value of the difference between an unbiased point estimate and the population parameter it estimates is called the sampling error.
o For the case of a sample mean estimating a population mean, the sampling error is
   • Sampling Error = $|\bar{x} - \mu|$
Slide 87

What's an Unbiased Estimator?
o Definition
   • "A point estimator is unbiased if the mean of its sampling distribution is equal to the population parameter being estimated."
o Mean of estimator = parameter
   • If true, the estimator is "unbiased".
   • If untrue, the estimator is "biased".
o Example
   • Know that the mean of all of the estimator's possible values = height of the UCF student population
   • The estimator is unbiased.
Slide 88
What's an Unbiased Estimator?
o Definition
   • Mean of the estimator's distribution = parameter value
   • The center of the estimator's distribution is over the parameter value.
o Why should you care?
   • On average, any estimate (from that particular estimator) calculated from ONE SAMPLE = parameter value
o See Figures
Slide 89

What's an Unbiased Estimator?
[Figure: Sampling Distributions of 2 $\bar{x}$ estimators for the SAT Scores. Estimator #1 is plotted on an $\bar{x}_1$ axis and Estimator #2 on an $\bar{x}_2$ axis, each with $\mu$ = 990 marked.]
o WHICH IS UNBIASED? HOW CAN YOU TELL?
Slide 90

What's an Unbiased Estimator?
[Figure, repeated: Sampling Distributions of 2 $\bar{x}$ estimators for the SAT Scores, Estimator #1 and Estimator #2, each with $\mu$ = 990 marked.]
o IS THE AVERAGE ESTIMATE FROM THE BIASED ESTIMATOR > $\mu$ OR < $\mu$? HOW CAN YOU TELL?
Slide 91

Probability Statements About the Sampling Error
o Knowledge of the sampling distribution of $\bar{x}$ enables us to make probability statements about the sampling error even though the population mean $\mu$ is not known.
o A probability statement about the sampling error is a precision statement.
Slide 92

Probability Statements About the Sampling Error
o Precision Statement
   • There is a 1 - $\alpha$ probability that the value of a sample mean will provide a sampling error of $z_{\alpha/2} \sigma_{\bar{x}}$ or less.
[Figure: sampling distribution of $\bar{x}$, centered at $\mu$, with 1 - $\alpha$ of all $\bar{x}$ values in the middle and $\alpha/2$ in each tail.]
Slide 93

Probability Statements About the Sampling Error
o Precision Statement
   • There is a 1 - $\alpha$ probability that the value of a sample mean will be within +/- $z_{\alpha/2} \sigma_{\bar{x}}$ of the population mean $\mu$.
[Figure: sampling distribution of $\bar{x}$, with 1 - $\alpha$ of all $\bar{x}$ values between $\mu - z_{\alpha/2} \sigma_{\bar{x}}$ and $\mu + z_{\alpha/2} \sigma_{\bar{x}}$ and $\alpha/2$ in each tail.]
Slide 94

Interval Estimate of a Population Mean: Large-Sample Case (n > 30)

Example: National Discount, Inc.
o National Discount has 260 retail outlets throughout the United States.
o National evaluates each potential location for a new retail outlet in part on the mean annual income of the individuals in the marketing area of the new location.
o Sampling can be used to develop an interval estimate of the mean annual income for individuals in a potential marketing area for National Discount.
o A sample of size n = 36 was taken. The sample mean, $\bar{x}$, is $21,100 and the sample standard deviation, s, is $4,500.
o We will use .95 as the confidence level in our interval estimate.
Slide 96

Large-Sample Case (n > 30) with $\sigma$ Unknown
o In most applications the value of the population standard deviation is unknown.
o We simply use the value of the sample standard deviation, s, as the point estimate of the population standard deviation:
   • $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
Slide 95

Example: National Discount, Inc.
o Precision Statement
   • There is a .95 probability that the value of a sample mean for National Discount will provide a sampling error of $1,470 or less, determined as follows:
   • 95% of the sample means that can be observed are within ± 1.96 $\sigma_{\bar{x}}$ of the population mean $\mu$.
   • If $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4{,}500}{\sqrt{36}} = 750$, then $z_{\alpha/2} \sigma_{\bar{x}} = 1.96(750) = 1{,}470$.
Slide 97

Example: National Discount, Inc.
o Interval Estimate of the Population Mean: $\sigma$ Unknown
   • The interval estimate of $\mu$ is $21,100 ± $1,470, or $19,630 to $22,570.
   • We are 95% confident that the interval contains the population mean. (Because 95% of all interval estimates from this estimator, from all samples, will contain $\mu$.)
o National Discount will build a retail outlet in this market if the mean annual income of the individuals in this area is > $21,500. Should they open a new outlet?
Slide 98

Interval Estimation of a Population Mean: Small-Sample Case (n < 30)
o IF the Population is Not Normally Distributed
   • The only option is to increase the sample size to n > 30 and use the large-sample interval-estimation procedures.
o IF the Population is Normally Distributed and $\sigma$ is Known
   • The large-sample interval-estimation procedure can be used.
   (σ is RARELY known.)
o IF Population is Normally Distributed and σ is Unknown
   The appropriate interval estimate is based on a probability distribution known as the t distribution.

t Distribution
o The t distribution is a family of similar probability distributions.
o A specific t distribution depends on a parameter known as the degrees of freedom.
   • how it is calculated varies across different tests using the t distribution
   • here: degrees of freedom equals n - 1

t Distribution (cont.)
o As the value of the degrees of freedom increases (as n increases)
   • the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller
   • when n becomes large enough, there is practically no difference between the t and standard normal distributions
o A t distribution with more degrees of freedom has less dispersion.
o The mean of the t distribution is zero.

Interval Estimation of a Population Mean: Small-Sample Case (n < 30) with σ Unknown
o Interval Estimate
   \[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
   where 1 - α = the confidence coefficient
         \(t_{\alpha/2}\) = the t value providing an area of α/2 in the upper tail of a t distribution with n - 1 degrees of freedom
         s = the sample standard deviation

Interval Estimation of a Population Mean: Small-Sample Case (n < 30) with σ Unknown (cont.)
o Interval Estimate
   \[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
o Contrast with the large-sample (n > 30) case with σ unknown
   \[ \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \]

Interval Estimation of a Population Mean: Small-Sample Case (n < 30) with σ Unknown (cont.)
o Example: Ticket Prices
   A reporter for a student newspaper is writing an article on the cost of student tickets to basketball games. A sample of 10 universities resulted in a sample mean of $5.50 per game and a sample standard deviation of $0.60. Let us provide a 95% confidence interval estimate of the mean ticket price for the population of all southeastern universities.
   We’ll assume this population to be normally distributed.
o What is different between the small-sample (t) and large-sample (z) interval formulas?

Example: Ticket Prices
o t Value
   At 95% confidence, 1 - α = .95, α = .05, and α/2 = .025.
   \(t_{.025}\) is based on n - 1 = 10 - 1 = 9 degrees of freedom.
   In the t distribution table we see that \(t_{.025} = 2.262\).

   Degrees of              Area in Upper Tail
   Freedom       .10      .05      .025     .01      .005
      .           .        .        .        .        .
      7         1.415    1.895    2.365    2.998    3.499
      8         1.397    1.860    2.306    2.896    3.355
      9         1.383    1.833    2.262    2.821    3.250
     10         1.372    1.812    2.228    2.764    3.169
      .           .        .        .        .        .

Example: Ticket Prices (cont.)
o Interval Estimation of a Population Mean: Small-Sample Case (n < 30) with σ Unknown
   \[ \bar{x} \pm t_{.025} \frac{s}{\sqrt{n}} = 5.50 \pm 2.262 \frac{0.60}{\sqrt{10}} = 5.50 \pm 0.43 \]
   or $5.07 to $5.93.
   We are 95% confident that the mean ticket price per game for the population is between $5.07 and $5.93.
   The reporter wants to state that, based on the sample mean price of $5.50, ticket prices for UF games ($5.90) are higher than ticket prices in southeastern universities. Can she say that?

Sample Size for an Interval Estimate of a Population Mean
o Let E = the maximum sampling error mentioned in the precision statement.
o E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
o E is often referred to as the margin of error.
o We have
   \[ E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
o Solving for n we have
   \[ n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2} \]

Example: National Discount, Inc.
o Sample Size for an Interval Estimate of a Population Mean
   Suppose that National’s management team wants an estimate of the population mean such that there is a .95 probability that the sampling error is $500 or less. How large a sample size is needed to meet the required precision?

Example: National Discount, Inc. (cont.)
o Sample Size for Interval Estimate of a Population Mean
   \[ z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 500 \quad \text{(given)} \]
   At 95% confidence, \(z_{.025} = 1.96\). Recall that σ = 5,000.
   Solving for n we have
   \[ n = \frac{(1.96)^2 (5{,}000)^2}{(500)^2} = 384.16 \]
   We need to sample 384 or 385 to reach a desired precision of ± $500 at 95% confidence.
   NOTE: USUALLY ROUND UP SAMPLE SIZE (here, to 385).

Interval Estimation of a Population Proportion
o Interval Estimate
   \[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]
   where: 1 - α is the confidence coefficient
          \(z_{\alpha/2}\) is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution
          \(\bar{p}\) is the sample proportion

Interval Estimation of a Population Proportion (cont.)
o Interval Estimate
   \[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]
o The normal probability distribution is an acceptable approximation whenever np > 5 and n(1 - p) > 5.

Example: Political Science, Inc.
o Interval Estimation of a Population Proportion
   Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. Using telephone surveys, interviewers ask registered voters who they would vote for if the election were held that day.
   In a recent election campaign in Seminole County, FL, PSI found that 220 registered voters, out of 500 contacted, favored Al Gore. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters in Seminole County that favors Mr. Gore.

Example: Political Science, Inc. (cont.)
o Interval Estimate of a Population Proportion
   \[ \bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = .44 \pm 1.96 \sqrt{\frac{.44(1-.44)}{500}} = .44 \pm .0435 \]
   where: n = 500, \(\bar{p}\) = 220/500 = .44, \(z_{\alpha/2}\) = 1.96
   PSI is 95% confident that the proportion of all voters that favors the candidate is between .3965 and .4835.
   The Democratic National Committee wanted to use these results to report that Al Gore would win Seminole County. Could they say that?
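The three interval estimates worked in the examples (National Discount, Ticket Prices, Political Science, Inc.) all have the same shape: point estimate ± critical value × standard error. The following is a minimal Python sketch of that arithmetic, not part of the original slides. The function names (`interval`, `se_mean`, `se_proportion`) and constant names are illustrative assumptions; the critical values 1.96 and 2.262 are copied from the slides' tables rather than computed.

```python
import math


def interval(point_estimate, critical_value, standard_error):
    """Return (lower, upper, margin) for point_estimate ± critical_value * standard_error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin, margin


def se_mean(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_proportion(p_bar, n):
    """Estimated standard error of the sample proportion: sqrt(p_bar * (1 - p_bar) / n)."""
    return math.sqrt(p_bar * (1 - p_bar) / n)


# Critical values taken from the slides' tables (assumed, not computed here).
Z_025 = 1.96       # z value for 95% confidence
T_025_DF9 = 2.262  # t value for 95% confidence with 9 degrees of freedom

# National Discount: n = 36, sample mean 21,100, s = 4,500 (large-sample case).
national = interval(21_100, Z_025, se_mean(4_500, 36))

# Ticket Prices: n = 10, sample mean 5.50, s = 0.60 (small-sample case, t distribution).
tickets = interval(5.50, T_025_DF9, se_mean(0.60, 10))

# Political Science, Inc.: 220 of 500 voters (proportion).
psi = interval(220 / 500, Z_025, se_proportion(220 / 500, 500))

for label, (lower, upper, margin) in [
    ("National Discount (mean income)", national),
    ("Ticket Prices (mean price)", tickets),
    ("Political Science, Inc. (proportion)", psi),
]:
    print(f"{label}: {lower:.4f} to {upper:.4f} (margin {margin:.4f})")
```

Only the critical value and the standard-error formula change from one case to the next, which is the contrast the small-sample and large-sample formulas are meant to highlight.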
Sample Size for an Interval Estimate of a Population Proportion
o Let E = the maximum sampling error mentioned in the precision statement.
o We have
   \[ E = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \]
o Solving for n we have
   \[ n = \frac{(z_{\alpha/2})^2\, p(1-p)}{E^2} \]

Example: Political Science, Inc.
o Sample Size for an Interval Estimate of a Population Proportion
   Suppose that PSI would like a .99 probability that the sample proportion is within ± .03 of the population proportion. How large a sample size is needed to meet the required precision?

Example: Political Science, Inc. (cont.)
o Sample Size for Interval Estimate of a Population Proportion
   At 99% confidence, \(z_{.005} = 2.576\).
   \[ n = \frac{(z_{\alpha/2})^2\, p(1-p)}{E^2} = \frac{(2.576)^2 (.44)(.56)}{(.03)^2} \cong 1817 \]
   Note: We used .44 as the best estimate of p in the above expression.
o If no information is available about p, then .5 is often assumed because it provides the highest possible sample size.
o If we had used p = .5, the recommended n would have been 1843 (1,843.27 before rounding; 1,844 if rounded up).
...
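The two sample-size formulas can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original slides: the function names are hypothetical, and the z values are computed with Python's standard-library `statistics.NormalDist` instead of being read from a table, so the unrounded results differ slightly from the slide arithmetic (which uses 1.96 and 2.576).

```python
import math
from statistics import NormalDist


def z_upper_tail(tail_area):
    """z value that leaves `tail_area` in the upper tail of the standard normal."""
    return NormalDist().inv_cdf(1 - tail_area)


def n_for_mean(z, sigma, margin):
    """Unrounded sample size for estimating a mean: z^2 * sigma^2 / E^2."""
    return (z ** 2) * (sigma ** 2) / (margin ** 2)


def n_for_proportion(z, p, margin):
    """Unrounded sample size for estimating a proportion: z^2 * p * (1 - p) / E^2."""
    return (z ** 2) * p * (1 - p) / (margin ** 2)


z95 = z_upper_tail(0.025)  # close to the table value 1.96
z99 = z_upper_tail(0.005)  # close to the table value 2.576

# National Discount: sigma = 5,000, E = 500, 95% confidence.
n_mean = n_for_mean(z95, 5_000, 500)

# Political Science, Inc.: E = .03, 99% confidence, planning value p = .44.
n_prop = n_for_proportion(z99, 0.44, 0.03)

# Same requirement with the conservative planning value p = .5.
n_prop_half = n_for_proportion(z99, 0.50, 0.03)

# Rounding up guarantees the margin of error is no larger than requested.
# Note: the slide reports 1843 for p = .5 (rounded to nearest); rounding up gives 1844.
for label, n in [
    ("mean, sigma = 5,000, E = 500", n_mean),
    ("proportion, p = .44, E = .03", n_prop),
    ("proportion, p = .50, E = .03", n_prop_half),
]:
    print(f"{label}: unrounded n = {n:.2f}, rounded up = {math.ceil(n)}")
```

Because p(1 - p) is largest at p = .5, the third call gives the largest sample size of any planning value, which is why .5 is the usual default when nothing is known about p.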

This note was uploaded on 11/15/2010 for the course ECO 6416 taught by Professor Staff during the Spring '08 term at University of Central Florida.
