Review: Sections 10.4, 10.5; New: Section 10.7
Comparing Two Population Means: Inference from Small Samples

Outline
Sections 10.4 and 10.5 will cover:
• Small-sample confidence intervals and hypothesis testing for the difference between two population means (independent and paired samples)
Section 10.7 will cover:
• Comparing two population variances
Tables used:
• Table 4: Critical values of the t distribution
• Table 6: Percentage points of the F distribution

Statistical Inference with a Small Sample Size
There are two issues we must address when we have a sample size of fewer than 30 observations:
1) If the measurements do not follow a normal distribution, then $\bar{X}$ is not well approximated by a normal distribution.
2) The sample standard deviation $s$ is NOT likely to be a good approximation of the population standard deviation $\sigma$.
What do we do in this situation?

A Little History: W.S. Gosset to the Rescue
He worked for Guinness Breweries, 1899–1937, eventually as Head Brewer. He was the first person to work on inference from small samples. All his statistical research was motivated by work in the manufacture of stout and the production of barley.

Statistical Inference with a Small Sample Size
1) We must ensure that our SRS is coming from a
normal or approximately normal distribution. In practice, it is enough that the distribution be symmetric and single-peaked.
2) We must account for the variability of $s$ in estimating $\sigma$. This is done using the (Student) t distribution in place of the standard normal distribution (Table 4 in your textbook).

The Student t distribution
Recall that if $n \ge 30$, then
$$\bar{X} \overset{\text{approx}}{\sim} N\left(\mu, \frac{s^2}{n}\right) \quad \text{and} \quad Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \overset{\text{approx}}{\sim} N(0, 1).$$
Recall that we estimate $\sigma$ with $s$, which is a good approximation if $n$ is large. If $n$ is not large, then
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with $n - 1$ "degrees of freedom".

The Student t distribution
Like the standard normal distribution, the t distribution is bell-shaped and symmetric about 0. However, it has a lower peak and longer tails (more variation). As the degrees of freedom tend towards infinity, the t distribution converges to the standard normal distribution.
Table 4 contains values of the t distribution for various right-tailed areas and degrees of freedom.

Small-Sample Inference for Two Population Means: Three Cases
The way in which you will compute a
confidence interval or conduct a hypothesis test with small samples will depend on the information you are given from the two populations, and also on the way in which the samples were taken.
Case 1: Independent samples, $\sigma_1^2 \neq \sigma_2^2$.
Case 2: Independent samples, $\sigma_1^2 = \sigma_2^2$.
Case 3: Paired (dependent) samples.

Independent Samples: Equal vs. Non-Equal Variances?
There is a formal hypothesis test designed to test which of $\sigma_1^2 = \sigma_2^2$ and $\sigma_1^2 \neq \sigma_2^2$ is the more reasonable assumption. We shall learn how to perform this test shortly. For the moment, we shall just use some common sense.

Case 1: Independent Samples, $\sigma_1^2 \neq \sigma_2^2$
Recall that if $n_1 \ge 30$ and $n_2 \ge 30$, then
$$\bar{X}_1 - \bar{X}_2 \overset{\text{approx}}{\sim} N\left(\mu_1 - \mu_2,\ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right).$$

Case 1: Independent Samples, $\sigma_1^2 \neq \sigma_2^2$
However, if $n_1 < 30$ and/or $n_2 < 30$, then
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \overset{\text{approx}}{\sim} t_{df}$$
where the $df$ are given on the next slide.

Case 1: Independent Samples, $\sigma_1^2 \neq \sigma_2^2$
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
(rounded DOWN to the nearest integer)

Case 1: Confidence Interval
An approximate $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t_{\alpha/2,\,df}$ is the value of the t distribution having an area of $\alpha/2$ to the right of it. That is, $P(t_{df} > t_{\alpha/2,\,df}) = \alpha/2$.

Exercise 1
A researcher observes nerve conductivity speeds (in m/s) for two samples: a sample of healthy subjects (group 1) and a sample of subjects suffering from a nerve disorder (group 2). The following sample information is obtained:
$\bar{x}_1 = 53.994$, $s_1 = 0.974$, $n_1 = 25$
$\bar{x}_2 = 48.594$, $s_2 = 2.490$, $n_2 = 19$
We are interested in comparing the average nerve conductivity between the two groups. Give a 90% confidence interval for the true mean difference in nerve conductivity speeds between the two groups.

Exercise 1
From the boxplots, we can see that the population variances are likely different, as there is a large difference in the IQR and overall spread in the two data sets.
[Figure: boxplots of nerve conductivity speed (m/s) for the healthy and disorder groups]
NOTE: This is not a formal hypothesis test.

Case 2: Independent Samples, $\sigma_1^2 = \sigma_2^2$
Recall that we use
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad \text{to estimate} \quad \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$
How would things change if we could assume that
our population variances were equal to some common value? That is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Case 2: Independent Samples, $\sigma_1^2 = \sigma_2^2$
In that case,
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{becomes} \quad \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$
Since we are making the assumption that the populations share a common variance, it makes sense to estimate $\sigma^2$ with an $s^2$ computed from both samples.

Case 2: Independent Samples, $\sigma_1^2 = \sigma_2^2$
We compute a "pooled" sample variance as
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

Case 2: Confidence Interval
An approximate $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Exercise 2
An important property of air bags is the permeability of the woven fabric. This is related to their ability to absorb energy. Is there any difference in permeability at 0°C and 20°C? A researcher takes two independent SRSs of permeability measurements, one at each temperature setting, and obtains the following results:
0°C: 70, 85, 92, 80, 60
20°C: 40, 60, 50, 45
Compute a 95% CI for the true mean difference in
permeability and interpret.

Exercise 2
From the boxplots, we can see that the population variances are likely reasonably close in value, as there is not a large difference in the IQR and overall spread in the two data sets.

Case 3: Paired (Dependent) Samples
Paired samples occur when the two samples are dependent. Examples include:
• Weight measurements of a group of people before and after a diet program.
• Mileage measurements from the same cars for two types of gasoline.
• Two real estate agents estimate the values of the same group of houses.

Case 3: Paired (Dependent) Samples
The analysis of paired samples is done by reducing a two-sample problem to a one-sample problem.
The general procedure is as follows:
• We have paired data $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.
• For each pair we compute the difference $d_i = X_i - Y_i$, and we think of $\mu_1 - \mu_2$ as $\mu_d$.
• Now we have a single list of observations, which we can analyze using the one-sample t-procedure (see Chapter 10, Section 10.3).

Case 3: Confidence Interval
An approximate $(1-\alpha)100\%$ CI for $\mu_d = \mu_1 - \mu_2$ is given by:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$

Exercise 3
Trace metals in drinking water affect the flavour, and unusually high concentrations can pose a health hazard. The article "Trace Metals of South Indian River" (Envir. Studies, 1982: 62–66) reports on a study in which six river locations were randomly selected. The zinc concentration (mg/L) for both surface water and bottom water was determined at each location (see next slide). It is suspected that bottom levels are significantly higher, on average, than surface levels. Give a 99% CI for the true mean difference in zinc concentration (bottom − surface).

Exercise 3 (cont'd)
[Table: zinc concentration (mg/L) in bottom and surface water at each location]

Small-Sample Inference for Two Population Means: Hypothesis Testing
• Step 1: State the null and alternative hypotheses.
• Step 2: Compute the test statistic $t_0$.
• Step 3: Compute the p-value for the test.

Small-Sample Inference for Two Population Means: Hypothesis Testing
• Step 4: Make a conclusion based on the p-value. If p-value < α, we reject $H_0$ and accept $H_A$. If p-value > α, we do not reject $H_0$.
NOTE 1: When working with the t distribution, we
won't be able to get an exact p-value (unless we use MINITAB). What we can obtain from Table 4 is a range for the p-value.
NOTE 2: As with the z-tests, we can use the p-value method, the critical value method, and, in the case of a two-sided test, the confidence interval method.

Case 1: Test Statistic
The test statistic used is
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where the $df$ are computed as before.

Exercise 4: Exercise 1 Revisited
Use the information from Exercise 1. Are nerve conductivity speeds lower on average for the group with the nerve disorder? Perform the appropriate hypothesis test at the 10% level of significance using
a) the p-value method.
b) the critical value method.
Could we also use the confidence interval method to perform this hypothesis test? Why or why not? If yes, use it.

Case 2: Test Statistic
The test statistic used is
$$t_0 = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $s^2$ is computed as before and $df = n_1 + n_2 - 2$.

Exercise 5: Exercise 2 Revisited
Use the information from Exercise 2. A researcher
wonders if there is any difference in permeability at 0°C and 20°C. Perform the appropriate hypothesis test at the 5% level of significance using
a) the p-value method.
b) the critical value method.
Could we also use the confidence interval method to perform this hypothesis test? Why or why not? If yes, use it.

Case 3: Hypotheses
For some constant value $d_0$, there are three basic types of hypothesis tests for $\mu_d$:
• $H_0: \mu_d \le d_0$ versus $H_A: \mu_d > d_0$ (one-sided)
• $H_0: \mu_d \ge d_0$ versus $H_A: \mu_d < d_0$ (one-sided)
• $H_0: \mu_d = d_0$ versus $H_A: \mu_d \neq d_0$ (two-sided)

Case 3: Test Statistic
The test statistic used is
$$t_0 = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$$
where $df = n - 1$.

Exercise 6
A random sample of 6 viewers of Home Shopping Network was selected for an experiment. All viewers in the sample had recorded the amount of money they spent shopping during the holiday season of the previous year. The next year, these people were given access to the cable network and were asked to keep a record of their total purchases during the holiday season. Consider the data on the next slide. Is there sufficient evidence at the 5% level of significance to conclude that the average amount spent has increased?

Exercise 6 (cont'd)
[Table: amount spent by each viewer in the previous year and this year (Y)]

Statistical Inference for Two Population Variances
Generally speaking, if you have no reason to believe
otherwise, always assume that $\sigma_1 \neq \sigma_2$. Using this "general procedure" will provide valid (approximate) results even when $\sigma_1 = \sigma_2$. If we can correctly assume $\sigma_1 = \sigma_2$, then the benefit of using the pooled variance procedure is that it tends to provide slightly shorter confidence intervals and slightly smaller p-values. We will now discuss a hypothesis test we can perform to determine whether $\sigma_1 = \sigma_2$ is a plausible assumption.

Statistical Inference for Two Population Variances
To perform the hypothesis test we are about to discuss, we make three main assumptions, as follows:
• We take two independent, random samples, one from each population.
• Both populations are at least approximately normally distributed.
• The variability of the measurements in the two populations is the same; that is, $\sigma_1^2 = \sigma_2^2$.

The Hypothesis Test
• Step 1: State the null and alternative hypotheses.
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_A: \sigma_1^2 \neq \sigma_2^2$$
• Step 2: Compute the test statistic
$$F_0 = \frac{s_{\text{MAX}}^2}{s_{\text{MIN}}^2}.$$
• But what is $F_0$?

The F Distribution
Under the assumption that $\sigma_1^2 = \sigma_2^2$, $F_0$ is an observation from the F distribution. That is,
$$F_0 = \frac{s_{\text{MAX}}^2}{s_{\text{MIN}}^2} \sim F_{df_1,\,df_2}$$
where $df_1$ is the numerator degrees of freedom ($n - 1$ for the sample with the larger variance) and $df_2$ is the denominator degrees of freedom ($n - 1$ for the sample with the smaller variance).

The Hypothesis Test
• Step 3: Compute the p-value for the test:
$$\text{p-value} = 2P(F_{df_1,\,df_2} > F_0)$$
• Step 4: Make a conclusion based on the p-value. If p-value < α, we reject $H_0$ and accept $H_A$. If p-value > α, we do not reject $H_0$.
NOTE: We could also use the critical value method.
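Table 6, like Table 4, only brackets the p-value in Step 3. As a sketch of how software (MINITAB, or SciPy's `scipy.stats.f.sf`) can report it exactly, the F upper tail can be written through the regularized incomplete beta function, $P(F_{df_1,\,df_2} > x) = I_{df_2/(df_2 + df_1 x)}(df_2/2,\ df_1/2)$. The stdlib-only Python below evaluates that function with the continued-fraction method described in *Numerical Recipes*. The helper names (`reg_inc_beta`, `f_sf`, `f_test_p_value`) are hypothetical, and this is a teaching sketch rather than a production implementation.

```python
import math


def _betacf(a: float, b: float, x: float,
            max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate on the side where the fraction converges quickly,
    # using the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(x: float, df1: int, df2: int) -> float:
    """Upper tail P(F_{df1, df2} > x)."""
    if x <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * x))


def f_test_p_value(s_max: float, n_max: int,
                   s_min: float, n_min: int) -> tuple[float, float]:
    """Steps 2-3: F0 = s_MAX^2 / s_MIN^2 and two-sided p-value 2 P(F > F0)."""
    f0 = s_max ** 2 / s_min ** 2
    p_value = 2.0 * f_sf(f0, n_max - 1, n_min - 1)
    return f0, min(p_value, 1.0)
```

As a sanity check, for 2 and 2 degrees of freedom the F tail has the closed form $P(F > x) = 1/(1 + x)$, so `f_sf(3.0, 2, 2)` should return 0.25; likewise `f_sf(9.12, 4, 3)` comes out close to 0.05, matching the commonly tabled upper 5% point $F_{0.05}(4, 3) \approx 9.12$.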
Exercise 7: Exercise 1 Revisited
A researcher observes nerve conductivity speeds (in m/s) for two samples: a sample of healthy subjects (group 1) and a sample of subjects suffering from a nerve disorder (group 2). The following sample information is obtained:
$\bar{x}_1 = 53.994$, $s_1 = 0.974$, $n_1 = 25$
$\bar{x}_2 = 48.594$, $s_2 = 2.490$, $n_2 = 19$
In order to conduct a hypothesis test to compare the average nerve conductivity between the two groups, we first need to decide if $\sigma_1^2 = \sigma_2^2$. Conduct an appropriate hypothesis test at $\alpha = 0.10$.
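Step 2 of this exercise needs only the two sample standard deviations. A small Python sketch (the function name `f_statistic` is hypothetical) that places the larger sample variance in the numerator and reports the matching numerator and denominator degrees of freedom; the critical value or p-value range still comes from Table 6.

```python
def f_statistic(s_a: float, n_a: int, s_b: float, n_b: int) -> tuple[float, int, int]:
    """F0 = s_MAX^2 / s_MIN^2 with its (numerator df, denominator df)."""
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Exercise 7: healthy group (s1, n1) first, disorder group (s2, n2) second
f0, df_num, df_den = f_statistic(0.974, 25, 2.490, 19)
print(f"F0 = {f0:.3f} with df = ({df_num}, {df_den})")
```

The disorder group has the larger standard deviation, so it supplies the numerator df. The resulting $F_0$ is then compared with the upper $\alpha/2 = 0.05$ point of $F_{18,\,24}$ from Table 6, or bracketed there to get a p-value range.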
Exercise 8: Exercise 2 Revisited
An important property of air bags is the permeability of the woven fabric. This is related to their ability to absorb energy. Is there any difference in permeability at 0°C and 20°C? A researcher takes two independent SRSs of permeability measurements, one at each temperature setting, and obtains the following results:
0°C: 70, 85, 92, 80, 60
20°C: 40, 60, 50, 45
Are the population variances equal? Conduct an appropriate hypothesis test at the 5% level of significance.
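A sketch of the F test using the critical value method on the raw permeability data. It assumes Python's `statistics.variance` (the sample variance, with an $n - 1$ denominator) and takes $F_{0.025}(4, 3) = 15.10$ as the upper 2.5% point for this two-sided test at $\alpha = 0.05$; that value is from a standard F table, so confirm it in Table 6. The last lines compute the Case 2 pooled variance, which is only appropriate if equality of variances is not rejected.

```python
import statistics

cold = [70, 85, 92, 80, 60]  # permeability at 0°C
warm = [40, 60, 50, 45]      # permeability at 20°C

var_cold = statistics.variance(cold)  # sample variance, n - 1 denominator
var_warm = statistics.variance(warm)

# Step 2: larger sample variance in the numerator, with matching df
if var_cold >= var_warm:
    f0, df_num, df_den = var_cold / var_warm, len(cold) - 1, len(warm) - 1
else:
    f0, df_num, df_den = var_warm / var_cold, len(warm) - 1, len(cold) - 1

# Assumption: F_{0.025}(4, 3) = 15.10, read from an F table such as Table 6
f_crit = 15.10
reject_equal_variances = f0 > f_crit

# If equal variances are plausible, the Case 2 pooled variance applies
n1, n2 = len(cold), len(warm)
pooled_var = ((n1 - 1) * var_cold + (n2 - 1) * var_warm) / (n1 + n2 - 2)

print(f"F0 = {f0:.3f}, df = ({df_num}, {df_den}), reject H0: {reject_equal_variances}")
print(f"pooled s^2 = {pooled_var:.3f}")
```

Here $F_0 \approx 2.18$ is well below 15.10, so equality of variances is not rejected at the 5% level. That is consistent with the boxplot reading for Exercise 2 and with using the pooled (Case 2) procedure in Exercises 2 and 5.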
Winter '10
LaniHaque