BASIC BUSINESS STATISTICS
Eleventh Edition
Berenson, Levine, Krehbiel
Slides Modified by Eugene Li

Topic 2: Inferences about Means and Proportions with Two Populations
• Estimation of the Difference between the Means of Two Populations: Independent Samples
• Hypothesis Tests about the Difference between the Means of Two Populations: Independent Samples
• Inference about the Difference between the Means of Two (Related) Populations: Matched Samples (also called Matched Pairs)
• Inference about the Difference between the Proportions of Two Populations
(Ch. 10.1-10.3)

Slide 2

Estimation of the Difference Between the Means of Two Populations: Independent Samples

• Point Estimate of the Difference between the Means of Two Populations
• Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
• Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case
• Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case

Slide 3

Point Estimate of the Difference Between the Means of Two Populations
• Let $\mu_1$ equal the mean of population 1 and $\mu_2$ equal the mean of population 2.
• The difference between the two population means is $\mu_1 - \mu_2$.
• To estimate $\mu_1 - \mu_2$, we select a simple random sample of size $n_1$ from population 1 and a simple random sample of size $n_2$ from population 2.
• Let $\bar{x}_1$ equal the mean of sample 1 and $\bar{x}_2$ equal the mean of sample 2.
• The point estimate of the difference between the means of populations 1 and 2 is $\bar{x}_1 - \bar{x}_2$.
Slide 4

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• The sampling distribution of $\bar{x}_1 - \bar{x}_2$ has the following properties.

Expected Value:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

Standard Deviation:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where
$\sigma_1$ = standard deviation of the variable in population 1
$\sigma_2$ = standard deviation of the variable in population 2
$n_1$ = sample size from population 1
$n_2$ = sample size from population 2

Slide 5

Interval Estimate of $\mu_1 - \mu_2$: Large-Sample Case ($n_1 > 30$ and $n_2 > 30$)

• Interval Estimate with $\sigma_1$ and $\sigma_2$ Known

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$

where $1 - \alpha$ is the confidence level.

• Interval Estimate with $\sigma_1$ and $\sigma_2$ Unknown

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$$

where
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

(Can we use t instead of z if $\sigma_1$ and $\sigma_2$ are unknown?)
Slide 6

Example: PaPa, Inc.

PaPa, Inc. is a manufacturer of golf equipment. PaPa has developed a new golf ball that has been designed to provide "extra distance." In a test of driving distance using a mechanical driving device, a sample of PaPa golf balls was compared with a sample of golf balls made by MaMa, Ltd., a competitor. The sample data are below.

                      Sample #1              Sample #2
                      PaPa, Inc.             MaMa, Ltd.
Sample Size           $n_1$ = 120 balls      $n_2$ = 80 balls
Mean                  $\bar{x}_1$ = 235 yards   $\bar{x}_2$ = 218 yards
Standard Deviation    $s_1$ = 15 yards       $s_2$ = 20 yards

Slide 7

Example: PaPa, Inc.

Do we need to check that driving distance is normally distributed for the PaPa golf balls as well as for the MaMa golf balls? We should, but because both sample sizes are large, normality is not strictly required in order to use z when developing the confidence interval.

Slide 8

Example: PaPa, Inc.
• Point Estimate of the Difference Between Two Population Means

$\mu_1$ = mean distance for the population of PaPa, Inc. golf balls
$\mu_2$ = mean distance for the population of MaMa, Ltd. golf balls

Point estimate of $\mu_1 - \mu_2$ = $\bar{x}_1 - \bar{x}_2$ = 235 - 218 = 17 yards.

Slide 9

Example: PaPa, Inc.

• 95% Confidence Interval Estimate of the Difference Between Two Population Means: Large-Sample Case, $\sigma_1$ and $\sigma_2$ Unknown

Substituting the sample standard deviations for the population standard deviations:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 17 \pm 1.96\sqrt{\frac{(15)^2}{120} + \frac{(20)^2}{80}} = 17 \pm 5.14$$

or 11.86 yards to 22.14 yards.

We are 95% confident that the difference between the mean driving distances of PaPa, Inc. balls and MaMa, Ltd. balls lies in the interval of 11.86 to 22.14 yards.
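As a minimal sketch, the large-sample interval for the PaPa example can be reproduced with Python's standard library. The function name `two_sample_z_interval` is a hypothetical helper introduced here for illustration (it is not part of the slides), and it assumes large independent samples so that z can be used with the sample standard deviations.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(mean1, s1, n1, mean2, s2, n2, confidence=0.95):
    """Large-sample interval for mu1 - mu2 using z (hypothetical helper)."""
    # Point estimate of the difference between the two population means.
    point = mean1 - mean2
    # Estimated standard error of x1_bar - x2_bar.
    std_err = sqrt(s1**2 / n1 + s2**2 / n2)
    # z_{alpha/2}: upper-tail critical value of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_err
    return point - margin, point + margin


# PaPa, Inc. versus MaMa, Ltd. summary data from the example.
lower, upper = two_sample_z_interval(235, 15, 120, 218, 20, 80)
print(f"{lower:.2f} to {upper:.2f} yards")  # prints 11.86 to 22.14 yards
```

The helper takes summary statistics rather than raw data because the example reports only sample sizes, means, and standard deviations.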
Slide 10

Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case ($n_1 < 30$ and/or $n_2 < 30$)

• Interval Estimate with the two $\sigma^2$ Known and equal

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$

where
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

If the two $\sigma^2$ are known but unequal, then just change the standard error to
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Slide 11

Interval Estimate of $\mu_1 - \mu_2$: Small-Sample Case ($n_1 < 30$ and/or $n_2 < 30$)
• Interval Estimate with the two $\sigma^2$ Unknown but (or after checking) equal

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\,s_{\bar{x}_1 - \bar{x}_2}$$

where
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

and the Pooled Variance Estimator is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

• Interval Estimate with the two $\sigma^2$ Unknown but (or after checking) unequal: just change

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

$$\text{d.f.} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

(see p. 397)
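The pooled variance estimator and the unequal-variance degrees of freedom can be sketched in a few lines of Python. The function names `pooled_variance` and `unequal_variance_df` are hypothetical, chosen here for illustration; the inputs are the PaPa summary statistics from the earlier example, used only as sample values for the formulas.

```python
def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances (equal-variance case)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def unequal_variance_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variance case."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Illustrative inputs: s1 = 15, n1 = 120, s2 = 20, n2 = 80.
sp2 = pooled_variance(15, 120, 20, 80)
df_unequal = unequal_variance_df(15, 120, 20, 80)
print(f"pooled variance = {sp2:.2f}, d.f. (unequal variances) = {df_unequal:.1f}")
```

The unequal-variance result is generally not an integer; a common convention is to round it down before looking up a t table value.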
Slide 12

Interval Estimate of $\mu_1 - \mu_2$

Note: Even if $n_1, n_2 \ge 30$, when $\sigma_1$ and $\sigma_2$ are unknown you can also use t and proceed as on the previous slide. It is always beneficial if $n_1 = n_2$.

Slide 13

Example: Specific Motors
Specific Motors of Detroit has developed a new automobile known as the M car. 12 M cars and 8 J cars (the same grade, from Japan) were road tested to compare miles-per-gallon (mpg) performance. The sample data are below.

                      Sample #1              Sample #2
                      M Cars                 J Cars
Sample Size           $n_1$ = 12 cars        $n_2$ = 8 cars
Mean                  $\bar{x}_1$ = 29.8 mpg    $\bar{x}_2$ = 27.3 mpg
Standard Deviation    $s_1$ = 2.56 mpg       $s_2$ = 1.81 mpg

Slide 14

Example: Specific Motors

• Point Estimate of the Difference Between Two Population Means

$\mu_1$ = mean miles-per-gallon for the population of M cars
$\mu_2$ = mean miles-per-gallon for the population of J cars

Point estimate of $\mu_1 - \mu_2$ = $\bar{x}_1 - \bar{x}_2$ = 29.8 - 27.3 = 2.5 mpg.

Slide 15

Example: Specific Motors
• 95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case

For the small-sample case we need to check the following:
1. The miles-per-gallon rating must be normally distributed for both the M car and the J car.
2. The variances of the miles-per-gallon rating must be equal for the M car and the J car.

Suppose both conditions hold. We then use the t distribution with $n_1 + n_2 - 2 = 18$ degrees of freedom; the appropriate t value is $t_{.025} = 2.101$. We will use a weighted average of the two sample variances as the pooled estimator of $\sigma^2$.

Slide 16

Example: Specific Motors
• 95% Confidence Interval Estimate of the Difference Between Two Population Means: Small-Sample Case

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{11(2.56)^2 + 7(1.81)^2}{12 + 8 - 2} = 5.28$$

$$(\bar{x}_1 - \bar{x}_2) \pm t_{.025}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 2.5 \pm 2.101\sqrt{5.28\left(\frac{1}{12} + \frac{1}{8}\right)} = 2.5 \pm 2.2$$

or 0.3 to 4.7 miles per gallon.

We are 95% confident that the difference between the mean mpg ratings of the two car types is from 0.3 to 4.7 mpg (with the M car having the higher mpg).
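The pooled-variance interval for the Specific Motors data can be checked with a short sketch. The helper `pooled_t_interval` is a hypothetical name introduced for illustration; the critical value 2.101 is taken from the slide rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Small-sample interval for mu1 - mu2 assuming equal variances."""
    # Pooled estimator of the common variance sigma^2.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Standard error of x1_bar - x2_bar under equal variances.
    std_err = sqrt(sp2 * (1 / n1 + 1 / n2))
    point = mean1 - mean2
    return sp2, point - t_crit * std_err, point + t_crit * std_err


# M cars versus J cars; t_.025 with 18 d.f. is 2.101 (value from the slide).
sp2, lower, upper = pooled_t_interval(29.8, 2.56, 12, 27.3, 1.81, 8, t_crit=2.101)
print(f"sp^2 = {sp2:.2f}, interval = {lower:.1f} to {upper:.1f} mpg")
# prints sp^2 = 5.28, interval = 0.3 to 4.7 mpg
```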
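Slide 17

Hypothesis Tests About the Difference Between the Means of Two Populations: Independent Samples

• Hypothesis Forms:

$H_0: \mu_1 - \mu_2 \le 0$      $H_0: \mu_1 - \mu_2 \ge 0$      $H_0: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 > 0$      $H_a: \mu_1 - \mu_2 < 0$      $H_a: \mu_1 - \mu_2 \ne 0$

• Test Statistic:

Large-Sample Case
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{or} \quad z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Small-Sample Case
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad \text{or} \quad t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

(remember the two different degrees of freedom?)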
Slide 18

Example: PaPa, Inc. (Yes! Again!)

PaPa, Inc. is a manufacturer of golf equipment. PaPa has developed a new golf ball that has been designed to provide "extra distance." In a test of driving distance using a mechanical driving device, a sample of PaPa golf balls was compared with a sample of golf balls made by MaMa, Ltd., a competitor. The sample data are below.

                      Sample #1              Sample #2
                      PaPa, Inc.             MaMa, Ltd.
Sample Size           $n_1$ = 120 balls      $n_2$ = 80 balls
Mean                  $\bar{x}_1$ = 235 yards   $\bar{x}_2$ = 218 yards
Standard Deviation    $s_1$ = 18 yards       $s_2$ = 20 yards

Slide 19

Example: PaPa, Inc.
• Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

Can we conclude, using a .01 level of significance, that the mean driving distance of PaPa, Inc. golf balls is greater than the mean driving distance of MaMa, Ltd. golf balls? (Assume we have checked and do not have enough evidence to say the population variances are unequal.)

$\mu_1$ = mean distance for the population of PaPa, Inc. golf balls
$\mu_2$ = mean distance for the population of MaMa, Ltd. golf balls

Hypotheses: $H_0: \mu_1 - \mu_2 \le 0$, $H_a: \mu_1 - \mu_2 > 0$

Slide 20

Example: PaPa, Inc.

• Hypothesis Tests About the Difference Between the Means of Two Populations: Large-Sample Case

Rejection Rule: Reject $H_0$ if z > 2.33

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(235 - 218) - 0}{\sqrt{\dfrac{(18)^2}{120} + \dfrac{(20)^2}{80}}} = \frac{17}{2.7749} \approx 6.13$$

Conclusion: Reject $H_0$. We have enough evidence that the mean driving distance of PaPa, Inc. golf balls is greater than the mean driving distance of MaMa, Ltd. golf balls.

(Do you know the value of the p-value?)
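One way to answer the p-value question is to compute the upper-tail area of the standard normal beyond the observed z. The sketch below uses only the standard library; `two_sample_z_test` is a hypothetical helper name, and the test is one-tailed (upper) to match $H_a: \mu_1 - \mu_2 > 0$.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Upper-tail large-sample z test for mu1 - mu2 (hypothetical helper)."""
    std_err = sqrt(s1**2 / n1 + s2**2 / n2)
    z = ((mean1 - mean2) - hypothesized_diff) / std_err
    # P(Z > z) equals P(Z < -z) by symmetry of the standard normal.
    p_value = NormalDist().cdf(-z)
    return z, p_value


z, p_value = two_sample_z_test(235, 18, 120, 218, 20, 80)
z_crit = NormalDist().inv_cdf(1 - 0.01)  # about 2.33 for alpha = .01
reject = z > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.1e}")
```

The p-value is far below .01, which agrees with rejecting $H_0$.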
Slide 21

Exercise: PaPa, Inc. (Sorry, can't get rid of him)

Try to do the PaPa, Inc. example with the following two situations:

1. (Equal variances)

                      Sample #1              Sample #2
                      PaPa, Inc.             MaMa, Ltd.
Sample Size           $n_1$ = 25 balls       $n_2$ = 16 balls
Mean                  $\bar{x}_1$ = 235 yards   $\bar{x}_2$ = 218 yards
Standard Deviation    $s_1$ = 18 yards       $s_2$ = 20 yards

2. (Unequal variances)

                      Sample #1              Sample #2
                      PaPa, Inc.             MaMa, Ltd.
Sample Size           $n_1$ = 25 balls       $n_2$ = 16 balls
Mean                  $\bar{x}_1$ = 235 yards   $\bar{x}_2$ = 218 yards
Standard Deviation    $s_1$ = 10 yards       $s_2$ = 40 yards
Slide 22

Which case to use: Equal variance or unequal variance?

• Whenever there is insufficient evidence that the population variances are unequal, it is preferable to perform the t-test with equal variances.
• This is so because, for any two given samples,

The number of degrees of freedom for the equal-variances case ≥ The number of degrees of freedom for the unequal-variances case

• How can you be sure there is sufficient evidence that the population variances are unequal? We will discuss this in Additional Topic 1.
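The degrees-of-freedom inequality can be checked numerically. This sketch compares the two formulas for the two small-sample PaPa exercise situations ($n_1 = 25$, $n_2 = 16$); the function names and the dictionary layout are assumptions made for illustration.

```python
def equal_variance_df(n1, n2):
    """Degrees of freedom for the pooled (equal-variances) t."""
    return n1 + n2 - 2


def unequal_variance_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variances t."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# (s1, n1, s2, n2) for the two exercise situations.
situations = {
    "s1 = 18, s2 = 20": (18, 25, 20, 16),
    "s1 = 10, s2 = 40": (10, 25, 40, 16),
}
results = {}
for label, (s1, n1, s2, n2) in situations.items():
    results[label] = (equal_variance_df(n1, n2), unequal_variance_df(s1, n1, s2, n2))
    print(label, "-> equal:", results[label][0], "unequal:", round(results[label][1], 2))
```

The gap widens as the sample variances become more different, so the unequal-variances test needs a larger critical value when the samples disagree strongly about spread.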
Slide 23

Note: If we have equal variances, t is robust to departures from normality (but not extreme ones!). That means the pooled-variance t will not lose its accuracy.

Slide 24

Inference About the Difference Between the Means of Two (Related) Populations: Matched Samples/Pairs

• With a matched samples/pairs design, each sampled item provides a pair of data values.
• The matched samples design can be referred to as blocking. (Note: the previous independent-sample cases are called the independent-sample design.)
• This design often leads to a smaller sampling error than the independent-sample design because variation between sampled items is eliminated as a source of sampling error.
• NOTE: It is not the same as the hypothesis test about the difference between the means of two populations for independent samples.

Slide 25

Example: Express Deliveries
A Vancouver-based firm has documents that must be quickly distributed to district offices throughout Canada. The firm must decide between two delivery services, UPX (United Parcel Express) and INTEX (International Express), to transport its documents. In testing the delivery times of the two services, the firm sent two reports under the same conditions (e.g. the same size, the same weight, etc.) to a random sample of ten district offices, with one report carried by UPX and the other report carried by INTEX. Do the data that follow indicate a difference in delivery times for the two services?

Slide 26

Example: Express Deliveries

                     Delivery Time (Hours)
District Office      UPX     INTEX     Difference
St. Johns             32       25           7
Halifax               30       24           6
Montreal              19       15           4
Kingston              16       15           1
Winnipeg              15       13           2
Ottawa                18       15           3
Regina                14       15          -1
Calgary               10        8           2
Prince George          7        9          -2
Toronto               16       11           5
Slide 27

Example: Express Deliveries

[Figure: Box-and-Whisker Plot of the differences]
[Figure: Normal Probability Plot of the differences (vertical axis: percentage; horizontal axis: Differences)]

Slide 28

Example: Express Deliveries
(Ask yourself: is this an independent-sample case or a matched-sample case? Why?)

• Inference About the Difference Between the Means of Two Populations: Matched Samples

Let $\mu_D$ = the mean of the difference values for the two delivery services for the population of district offices.

Hypotheses: $H_0: \mu_D = 0$, $H_a: \mu_D \ne 0$

We need to check that the population of difference values is approximately normally distributed; if it is, the t distribution with $n - 1$ degrees of freedom applies. With $\alpha = .05$, $t_{.025} = 2.262$ (9 degrees of freedom).

Rejection Rule: Reject $H_0$ if t < -2.262 or if t > 2.262

Slide 29

Example: Express Deliveries

• Inference About the Difference Between the Means of Two Populations: Matched Samples

$$\bar{D} = \frac{\sum D_i}{n} = \frac{7 + 6 + \dots + 5}{10} = 2.7$$

$$s_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n - 1}} = \sqrt{\frac{76.1}{9}} = 2.9$$

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{2.7 - 0}{2.9 / \sqrt{10}} = 2.94$$

Conclusion: Reject $H_0$. We have enough evidence to conclude there is a difference between the delivery times for the two services.

Can we conclude that INTEX provides faster service? Why?
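The matched-samples calculation can be reproduced from the delivery-time table with the standard library. This is a minimal sketch; the variable names are chosen here for illustration, and the critical value 2.262 is taken from the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Delivery times (hours) by district office, in table order.
upx = [32, 30, 19, 16, 15, 18, 14, 10, 7, 16]
intex = [25, 24, 15, 15, 13, 15, 15, 8, 9, 11]

# Differences are UPX minus INTEX, one per district office.
diffs = [u - i for u, i in zip(upx, intex)]
n = len(diffs)

d_bar = mean(diffs)   # mean of the differences
s_d = stdev(diffs)    # sample standard deviation (n - 1 denominator)
t_stat = (d_bar - 0) / (s_d / sqrt(n))

t_crit = 2.262        # t_.025 with 9 d.f., value from the slide
reject = abs(t_stat) > t_crit
margin = t_crit * s_d / sqrt(n)  # half-width of the 95% interval for mu_D

print(f"D_bar = {d_bar:.1f}, s_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject}")
```

Because `s_d` is kept unrounded here, interval endpoints computed from `margin` differ slightly in the third decimal from hand calculations that round $s_D$ to 2.9 first.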
Slide 30

Example: Express Deliveries

• Confidence Interval of $\mu_D$

$$\bar{D} \pm t_{\alpha/2,\,n-1}\,\frac{s_D}{\sqrt{n}}$$

With our example, we have

$$2.7 \pm 2.262\,\frac{2.9}{\sqrt{10}} = (0.6256,\ 4.7744)$$

Therefore, we are 95% confident that the mean difference in delivery time between the two services is between 0.6256 and 4.7744 hours. Because the differences were calculated as UPX minus INTEX, a positive value means UPX takes longer. This means we are 95% confident that UPX takes longer than INTEX by 0.6256 to 4.7744 hours on average.

Slide 31

Matched Samples Design (More explanation)
Another Example

• To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
• In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
• A random sample of 25 graduates was selected from each discipline, and the highest salary offer was recorded for each one. The data are on the next slide.
• Can we infer that finance majors obtain higher salary offers than do marketing majors among MBAs?

Slide 32

Matched Samples Design (More explanation)

Data ("Independent" dataset):

Finance
61228   51836   20620   73356   84186
79782   29523   80645   76125   62531
77073   86705   70286   63196   64358
47915   86792   75155   65948   29392
96382   80644   51389   61955   63573

Marketing
73361   36956   63627   71069   40203
97097   49442   75188   59854   79816
51943   35272   60631   63567   69423
68421   56276   47510   58925   78704
62553   81931   30867   49091   48843

Slide 33

Matched Samples Design (More explanation)
• Solution
• Compare two populations of interval data.
• The parameter tested is $\mu_1 - \mu_2$.

Finance:   61,228   51,836   20,620   73,356   84,186   ...
Marketing: 73,361   36,956   63,627   71,069   40,203   ...

$\mu_1$ = the mean of the highest salary offered to Finance MBAs
$\mu_2$ = the mean of the highest salary offered to Marketing MBAs

$H_0: (\mu_1 - \mu_2) \le 0$
$H_a: (\mu_1 - \mu_2) > 0$

Slide 34

Matched Samples Design (More explanation)
• Solution (continued)

From the data we have: $\bar{x}_1 = 65624$, $\bar{x}_2 = 60423$, $s_1^2 = 360433294$, $s_2^2 = 262228559$.

• Let us assume equal variances.

Equal Variances
                                Finance        Marketing
Mean                            65624          60423
Variance                        360433294      262228559
Observations                    25             25
Pooled Variance                 311330926
Hypothesized Mean Difference    0
df                              48
t Stat                          1.04
P(T<=t) one-tail                0.1513
t Critical one-tail             1.6772
P(T<=t) two-tail                0.3026
t Critical two-tail             2.0106

We don't have enough evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs.
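The pooled variance and t statistic in the output can be recomputed from the summary statistics alone. This is a minimal sketch with variable names chosen for illustration; the one-tail critical value 1.6772 is copied from the output table rather than computed.

```python
from math import sqrt

# Summary statistics for the independent-samples salary data.
mean_fin, var_fin, n_fin = 65624, 360433294, 25
mean_mkt, var_mkt, n_mkt = 60423, 262228559, 25

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n_fin - 1) * var_fin + (n_mkt - 1) * var_mkt) / (n_fin + n_mkt - 2)

# t statistic for H0: mu1 - mu2 <= 0 under equal variances.
t_stat = (mean_fin - mean_mkt) / sqrt(sp2 * (1 / n_fin + 1 / n_mkt))

t_crit_one_tail = 1.6772  # 48 d.f., alpha = .05 (value from the output table)
reject = t_stat > t_crit_one_tail

print(f"sp^2 = {sp2:.0f}, t = {t_stat:.2f}, reject H0: {reject}")
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, which is why it sits halfway between them.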
Slide 35

The effect of a large sample variability

• Question
• The difference between the sample means is 65624 - 60423 = 5,201. This does not look small: it is roughly 8% of either mean.
• So why could we not reject $H_0$ in favor of $H_a$, where $\mu_1 - \mu_2 > 0$?

Slide 36

The effect of a large sample variability

• Answer:
• $s_p^2$ is large (because the sample variances are large): $s_p^2 = 311{,}330{,}926$.
• A large variance reduces the value of the t statistic, and it becomes more difficult to reject $H_0$.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Slide 37

The variability of each sample

The values each sample consists of might markedly vary...

[Figure: the range of observations in sample A and the range of observations in sample B]

Slide 38

The variability of each sample in pairs

...but the differences between pairs of observations might be quite close to one another, resulting in a small variability of the differences.

[Figure: the range of the differences, shown relative to 0]

Slide 39

Matched Samples Design (More explanation)

• We rewrite the hypotheses in terms of $\mu_D$ (the mean of the differences) rather than in terms of $\mu_1 - \mu_2$.
• This formulation has the benefit of a smaller variability.

Slide 40

Matched Samples Design (More explanation)
• Example (a modification)
• It was suspected that salary offers were affected by students' GPA (which caused $s_1^2$ and $s_2^2$ to increase).
• To reduce this variability, the following procedure was used:
  • 25 ranges of GPAs were predetermined.
  • Students from each major were randomly selected, one from each GPA range.
  • The highest salary offer for each student was recorded.
• From the data presented, can we conclude that Finance majors are offered higher salaries?

Slide 41

Matched Samples Design (More explanation)

Data ("Matched" dataset):

Finance
95171   88009   98089   106322   74566
87089   88664   71200    69367   82618
69131   58187   64718    67716   49296
56625   63728   55425    37898   56244
51071   31235   32477    35274   45835

Marketing
89329   92705   99205   99003   74825
77038   78272   59462   51555   81591
68110   54970   68675   54110   46467
53559   46793   39984   30137   61965
47438   29662   33710   31989   38788

Slide 42

Matched Samples Design (More explanation)
• Solution
• The parameter tested is $\mu_D$ (use "Finance" - "Marketing").
• The hypotheses: $H_0: \mu_D \le 0$, $H_1: \mu_D > 0$
• The t statistic:

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$

Degrees of freedom = $n - 1$

The rejection region is $t > t_{.05,\,25-1} = 1.711$

Slide 43

Don't forget to check the requirement for the paired observations case!!!

• The validity of the results depends on the normality of the differences.

Slide 44

Matched Samples Design (More explanation)
• Solution
• From the "Matched" dataset, we can calculate:

GPA Group    Finance    Marketing    Difference
1             95171       89329         5842
2             88009       92705        -4696
3             98089       99205        -1116
4            106322       99003         7319
5             74566       74825         -259
6             87089       77038        10051
7             88664       78272        10392
8             71200       59462        11738
9             69367       51555        17812
10            82618       81591         1027
...             ...         ...          ...

Difference
Mean                   5065
Standard Error         1329
Median                 3285
Mode                   #N/A
Standard Deviation     6647
Sample Variance        44181217
Kurtosis               0.6594
Skewness               0.3597
Range                  23533
Minimum                -5721
Maximum                17812
Sum                    126613
Count                  25

Slide 45

The boxplot for checking normality of the differences

[Figure: Box-and-Whisker Plot of the differences; horizontal axis: Differences (X 1000.0), from -6 to 18]

Slide 46

Matched Samples Design (More explanation)
• Solution

$\bar{D}$ = 5,065, $s_D$ = 6,647

• Calculate t

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} = \frac{5065 - 0}{6647 / \sqrt{25}} = 3.81$$

Slide 47

Matched Samples Design (More explanation)

t-Test: Paired Two Sample for Means
                                Finance        Marketing
Mean                            65438          60374
Variance                        444981810      469441785
Observations                    25             25
Pearson Correlation             0.9520
Hypothesized Mean Difference    0
df                              24
t Stat                          3.81
P(T<=t) one-tail                0.0004
t Critical one-tail             1.7109
P(T<=t) two-tail                0.0009
t Critical two-tail             2.0639

3.81 > 1.711
.0004 < .05

Slide 48

Matched Samples Design (More explanation)

Conclusion: We have enough evidence to conclude that the Finance MBAs' highest salary offer is, on the average, higher than that of the Marketing MBAs.

Slide 49

Which case to use: Independent Samples or Matched Samples?
• First of all...
 Spring '09
 Li
 Statistics, Variance, interval estimate
