CHAPTER 7, Sections 7.1 & 7.2 Revised Jan 27, 2012
Inference for the Mean of a Population When σ Is Unknown, Sec 7.1

Previously we made the assumption that we knew the population standard deviation, σ. We then developed a confidence interval and used tests of significance to evaluate evidence for or against a hypothesis, all with a known σ.

In many situations, σ is unknown. In this section, we will continue doing inference for the population mean, but we will use the sample standard deviation, s, as a substitute for the unknown population standard deviation. The procedures are called t procedures.

Confidence Interval for a Mean

First, assumptions for inference about the population mean:
• Our data are a simple random sample (SRS) of size n taken from a normally distributed population with mean μ and standard deviation σ, both of which are unknown.
• Unless a small sample is used, the assumption that the data come from an SRS is more important than the assumption that the population distribution is normal.

Because we do not know the population σ, we make two changes in our procedure:
1. The sample standard deviation, s, is used in place of the unknown σ to estimate the standard deviation of the sample mean. The result is called the standard error of the mean, SEM:

$$\mathrm{SEM} = SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

where s is the sample standard deviation and n is the sample size.
2. We calculate a different test statistic, t instead of z, and our P-value comes from the t distribution instead of the Normal distribution.

The t distributions:
• The t distribution is used when we do not know the population σ. The t distributions have density curves similar in shape to the standard normal curve, but with more spread.
• The t distributions have more probability in the tails and less in the center when compared with the standard normal distribution. This is because substituting the estimate s for the unknown value of σ introduces more uncertainty.
• For sample sizes less than 30, s tends to underestimate σ. As the sample size increases, the t density curve approaches the N(0,1) distribution. This is because s estimates σ with less bias as the sample size increases.

Lecture 9, Section 7.1 & 7.2

The t Distributions
Suppose that an SRS of size n is drawn from a N(μ, σ) population, but the values of μ and σ are unknown. Then the one-sample t statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t distribution with n − 1 degrees of freedom. There is a separate distribution for every sample size. For any given sample size the degrees of freedom = n − 1. Each line on the t table is for one specific degree of freedom.

For any line on the t table there are columns which show the value of t which coincides with certain specific probabilities in the right tail of the t distribution.

The One-Sample t Confidence Interval
Suppose that an SRS of size n is drawn from a normal population having unknown mean μ and unknown σ. A level C confidence interval for μ is

$$\bar{x} \pm t^{*} \frac{s}{\sqrt{n}}$$

where t* is the value for the t(n − 1) density curve with area C between −t* and t*. Here x̄ is the point estimate, and t* s/√n is the margin of error.

This interval is exact when the population distribution is normal and is approximately correct for large n in other cases.

Examples:
1. Suppose X, Bob's golf scores, are approximately normally distributed with unknown mean and standard deviation. An SRS of n = 16 scores is selected and a sample mean of x̄ = 77 and a sample standard deviation of s = 3 are calculated. Calculate a 90% confidence interval for μ.

90% CI for μ, with df = 15 and t* = 1.753:
77 ± 1.753 (3/√16) = 77 ± 1.31
Estimated μ = (75.69, 78.31)

2. (Example 7.1 in Textbook) Corn soy blend (CSB) is a highly nutritious, low-cost fortified food that can be incorporated into different food preparations worldwide. As part of a study to evaluate appropriate vitamin C levels in this commodity, measurements were taken on samples of CSB produced in a factory. The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend, for a random sample of size 8 from a production run. Compute a 95% confidence interval for μ, where μ is the population mean vitamin C content of the CSB.

26 31 23 22 11 22 14 31

By hand:
x̄ = 22.5, s = 7.191, n = 8, df = 7, t* = 2.365
95% CI for μ = 22.5 ± 2.365 (7.191/√8) = 22.5 ± 6.013
Estimated population μ = (16.49, 28.51)
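The two intervals worked by hand above can be re-computed with a short script. This is a minimal sketch using only the Python standard library; the helper name `t_confidence_interval` is made up for illustration, and the critical values t* = 1.753 (df = 15, 90%) and t* = 2.365 (df = 7, 95%) are typed in from a t table rather than computed.

```python
import math
import statistics

def t_confidence_interval(xbar, s, n, t_star):
    """Return (lower, upper) for xbar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: golf scores, 90% confidence, t* = 1.753 for df = 15 (table value).
golf_ci = t_confidence_interval(77, 3, 16, 1.753)

# Example 2: vitamin C data, 95% confidence, t* = 2.365 for df = 7 (table value).
vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
xbar = statistics.mean(vitamin_c)
s = statistics.stdev(vitamin_c)  # sample standard deviation (divides by n - 1)
csb_ci = t_confidence_interval(xbar, s, len(vitamin_c), 2.365)

print("golf:", golf_ci)
print("CSB: xbar =", xbar, "s =", round(s, 3), "CI =", csb_ci)
```

The critical value is passed in as an argument so the same helper works for any confidence level, as long as t* is looked up for df = n − 1.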
Using SPSS:
analyze > descriptive statistics > explore
Move "vitaminC" to "dependent list". Click "statistics", select "descriptives", and change/keep a 95% confidence interval. Click "continue" followed by "OK".

[SPSS output table "Descriptives: Vitamin C"; the values are not legible in this copy. Rows: Mean; 95% Confidence Interval for Mean (Lower Bound, Upper Bound); 5% Trimmed Mean; Median; Variance; Std. Deviation; Minimum; Maximum; Range; Interquartile Range; Skewness; Kurtosis]

The One-Sample t Test:
1. State the null and alternative hypotheses.
2. Find the test statistic:
Suppose that an SRS of size n is drawn from a normal population having unknown mean μ and unknown σ. To test the hypothesis H0: μ = μ0 based on an SRS of size n, compute the one-sample t statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

3. Calculate the P-value.
In terms of a random variable T having the t(n − 1) distribution, the P-value for a test of H0: μ = μ0 against
Ha: μ > μ0 is P(T ≥ t), one side right
Ha: μ < μ0 is P(T ≤ t), one side left
Ha: μ ≠ μ0 is 2P(T ≥ |t|), two side
These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases.
4. State the conclusions in terms of the problem. Use the given α, or if the value is not specified use α = 0.05 as the default. Then compare the P-value to the α level.
If P-value ≤ α, then reject H0. We have sufficient evidence to reject H0.
If P-value > α, then we fail to reject H0. We lack sufficient evidence to reject H0.
Either way, we should use the words of the story in the conclusion.

Examples:
1. Experiments on learning in animals sometimes measure how long it takes mice to find their way through a maze. Suppose the population mean time is 18 seconds for one particular maze. A researcher thinks that loud noise will decrease the time it takes a mouse to complete the maze. She measures how long each of 30 mice takes to complete the maze with a loud noise stimulus. She gets a sample mean, x̄ = 16 seconds, and a sample standard deviation, s = 3 seconds. Do a one-sided hypothesis test to test the researcher's assertion with α = 0.1.

H0: μ = 18
Ha: μ < 18
t = (16 − 18)/(3/√30) = −3.651
One-side P-value is < .001.
Reject H0. Sufficient evidence to show population mean < 18.

2. (Example 7.2 in Textbook) Suppose that we know that sufficient vitamin C was added to the CSB mixture to produce a mean vitamin C content in the final product of 40 mg/100 g. It is suspected that some of the vitamin C is lost in the production process. To test this hypothesis we can conduct a one-sided test to determine if there is sufficient evidence to conclude that vitamin C is lost. A sample of 8 batches of CSB was tested for vitamin C. The sample mean = 22.50 and the sample standard deviation = 7.191. Use the α = 0.05 level.

By hand:
H0: μ = 40
Ha: μ < 40, one side left test
t = (22.50 − 40)/(7.191/√8) = −6.88
P-value is < .0005 (|t| is beyond the largest df = 7 table entry, 5.408).
Reject H0.

Using SPSS:
analyze > compare means > One sample T test
Move "vitaminc" into the "test variable box" and type in 40 for the test value.
To change the confidence interval, click "options" and change the confidence interval from 95% to the level you want. I did not do this as I will keep the 95% default.
Click "continue". Lastly click "OK".

[SPSS output tables; the values are not legible in this copy. "One-Sample Statistics" columns: Std. Deviation, Std. Error Mean. "One-Sample Test" (Test Value = 40) columns: t, df, Mean Difference, 95% Confidence Interval of the Difference]
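The two one-sample tests above can be checked numerically. The sketch below uses only the Python standard library and is not how SPSS computes its output: the tail probability is approximated by integrating the t density with Simpson's rule, and the function names (`t_pdf`, `t_upper_tail`, `one_sample_t`) are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T >= t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # area to the right of |t|, by symmetry
    return upper if t >= 0 else 1.0 - upper

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic and its degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Maze example: Ha is mu < 18, so the P-value is P(T <= t) = P(T >= -t).
t_maze, df_maze = one_sample_t(16, 18, 3, 30)
p_maze = t_upper_tail(-t_maze, df_maze)

# CSB example: Ha is mu < 40, again a left-tail P-value.
t_csb, df_csb = one_sample_t(22.5, 40, 7.191, 8)
p_csb = t_upper_tail(-t_csb, df_csb)

print(round(t_maze, 3), df_maze, p_maze)
print(round(t_csb, 2), df_csb, p_csb)
```

For a two-side test the same helper would be used as `2 * t_upper_tail(abs(t), df)`.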
Matched Pairs Design:

A common design to compare two treatments is the matched pairs design. One type of matched pairs design has two subjects who are similar in important aspects matched in pairs, and each treatment is given to one of the subjects in each pair.

A second common design does not use matched subjects. Instead each subject is given two treatments in random order. Ex: Each subject does a left hand test and a right hand test.

Another common design uses before-treatment and after-treatment observations on each of the subjects in the experiment. Each subject thereby provides data on the difference, or improvement, or reduction associated with the treatment.

Assumptions for the matched pair t test:
1. The data values are paired and we analyze the line-by-line differences.
2. The line-by-line differences are independent of each other.
3. The line-by-line differences are normally distributed with unknown population mean and unknown population standard deviation.

Paired t Procedures:

To compare the responses to the two treatments in a matched pairs design, determine the difference between the two treatments for each subject and analyze
the observed differences.

Example: (Problem 7.31 is done by hand and using SPSS):

The researchers studying vitamin C in CSB in Example 7.1 were also interested in a similar commodity called wheat soy blend (WSB). Both these commodities are mixed with other ingredients and cooked. Loss of vitamin C as a result of cooking was a concern of the researchers. One preparation used in Haiti called gruel can be made from WSB, salt, sugar, milk, banana, and other optional items to improve the taste. Five samples of gruel prepared in Haitian households were obtained. The vitamin C content of these 5 samples was measured before and after cooking. Set up appropriate hypotheses and carry out a significance test for these data.

Hypotheses:
H0: μ of differences (before − after) = 0; μ = population mean of differences
Ha: μ of differences (before − after) > 0

Here are the data:
x̄ of differences = 5.0
Std Dev of differences = s = 3.937
Test statistic, t = 2.840
P-value is between .02 and .025.
Exact P-value = .0234 (from computer).
Conclusion: if α = .05: Reject H0. There is sufficient evidence to conclude that vitamin C is lower after cooking.

Using SPSS:
> Analyze > Compare Means > Paired-Samples T Test.
Move "before" and "after" to the "paired variable box" (whichever variable is listed first will come first in the subtraction).
Click "OK".

Paired Samples Statistics

                 Mean      N   Std. Deviation   Std. Error Mean
Pair 1  Before   80.8000   5   6.14003          2.74591
        After    75.8000   5   7.52994          3.36749
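The paired t statistic for the gruel example can be reproduced from the summary numbers alone, since the individual before/after values are not shown here. The sketch below is an illustration only: it takes the two means from the Paired Samples Statistics table and the standard deviation of the differences from the by-hand work, and it brackets the P-value with df = 4 critical values typed in from a t table (2.776 for tail area .025, 2.999 for tail area .02).

```python
import math

# Summary values taken from the notes: five before/after pairs.
n = 5
mean_before, mean_after = 80.8, 75.8
sd_diff = 3.937  # standard deviation of the five differences

mean_diff = mean_before - mean_after  # before - after
se_diff = sd_diff / math.sqrt(n)      # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n - 1

# Upper-tail critical values for df = 4, copied from a t table.
T_STAR_025 = 2.776  # tail probability .025
T_STAR_020 = 2.999  # tail probability .02
p_between_02_and_025 = T_STAR_025 < t_stat < T_STAR_020

print(round(mean_diff, 1), round(t_stat, 3), df, p_between_02_and_025)
```

Note that the standard deviation of the differences (3.937) cannot be obtained from the two separate standard deviations in the table; it has to come from the paired differences themselves.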
Paired Samples Test

[SPSS output table for Pair 1 (test value = 0); the values are not legible in this copy. Columns: Paired Differences (Std. Deviation, Std. Error Mean, 95% Confidence Interval of the Difference), t, df]

(P-values for t tests are given by SPSS as Sig. (2-tailed) and must be divided by 2 for one-side tests when the sample result falls in the direction of Ha.) One-side P-value = .0235.

A confidence interval or statistical test is called robust if the confidence level or P-value does not change very much when the assumptions of the procedure are violated. The t procedures are robust against non-normality of the population when there are no outliers, especially when the distribution is roughly symmetric and unimodal.

Robustness and use of the One-Sample t and Matched Pair t procedures:
• Unless a small sample is used, the assumption that the data come from an SRS is more important than the assumption that the population distribution is normal.
• n < 15: Use t procedures only if the data are close to normal with no outliers.
• n is 15 to 39: The t procedure can be used except in the presence of outliers or strong skewness.
• n ≥ 40: The t procedure can be used even for clearly skewed distributions.

Comparing Two Means: Two-Sample Problems: Sec 7.2
A two-sample problem is a situation in which two populations, or two treatments based on separate samples, are compared. A two-sample problem can arise:
• from a randomized comparative experiment which randomly divides the units into two groups and imposes a different treatment on each group.
• from a comparison of two random samples selected separately from two different populations.

Note: Do not confuse two-sample designs with matched pair designs! In the two-sample t problems, each group is composed of separate subjects, i.e., no subject is in both groups, and each subject furnishes only one piece of data, i.e., no subject is tested twice.

Assumptions for Comparing Two Means:
• Two independent simple random samples, from two distinct populations, are compared. The same variable is measured on both samples. The sample observations are independent, i.e., neither sample has an influence on the other.
• Both populations are normally distributed.
• The means μ1 and μ2 and standard deviations σ1 and σ2 of both populations are unknown.

Typically we want to compare two population means by giving a confidence interval for their difference, μ1 − μ2, or by testing the hypothesis of no difference, H0: μ1 − μ2 = 0, or equivalently μ1 = μ2.

The Two-Sample t Confidence Interval:
Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. The confidence interval for the difference between population means, μ1 − μ2, is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^{*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The value of t* is determined for the confidence level, C, desired. The confidence interval formula is still composed of the same two components, (1) a point estimate for the difference between population means, and (2) a margin of error which expresses the uncertainty involved. Note that the two sample sizes do not have to be equal.

Here, t* is the value for the t(k) density curve with area C between −t* and t*. The value of the degrees of freedom, k, is approximated by software, or if we do the calculations by hand, we use the df for the smaller of the two sample sizes, that is, the smaller of n1 − 1 and n2 − 1.

Two-Sample t Procedure For Tests Of Significance:
1. Write the hypotheses in terms of the difference between means.
H0: μ1 − μ2 = 0, or μ1 = μ2
Ha: μ1 − μ2 > 0, one side right, or μ1 > μ2
Ha: μ1 − μ2 < 0, one side left, or μ1 < μ2
Ha: μ1 − μ2 ≠ 0, two side, or μ1 ≠ μ2

2. Calculate the test statistic. An SRS of size n1 is drawn from a normal population with unknown mean μ1 and an SRS of size n2 is drawn from another normal population with unknown mean μ2. σ1 and σ2 are also unknown.
To test the hypothesis H0: μ1 − μ2 = 0, the two-sample t statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

and use P-values or critical values for the t(k) distribution, where the degrees of freedom k is either approximated by software, or is based on the df for the smaller of the two sample sizes.

Note: SPSS calculates a value for degrees of freedom using a complicated formula which takes the sample sizes and sample standard deviations into account. This value does not have to be an integer. The df value can be as small as the df for the smaller sample, or it can be as large as the sum of the two dfs. The closer the sample sizes are, and the closer the two sample standard deviations are, the higher the calculated df value will be.

3. Calculate the P-value.
Note: Unless we use software, we can only get a range for the P-value. We use the following formulas:
Ha: μ1 − μ2 > 0 use P(T ≥ t), one side right test
Ha: μ1 − μ2 < 0 use P(T ≤ t), one side left test
Ha: μ1 − μ2 ≠ 0 use 2P(T ≥ |t|), two side test
Note: If doing the calculations by hand you should use the df for the smaller of the two sample sizes. The resulting procedure is conservative.

4. State the conclusions in terms of the problem. Choose a significance level such as α = 0.05, then compare the P-value to the α level.
If P-value ≤ α, then reject H0; we have sufficient evidence to reject H0.
If P-value > α, then fail to reject H0; we lack sufficient evidence ...

Robustness and use of the Two-Sample t Procedures:
The two-sample t procedures are more robust than the one-sample t methods, particularly when the distributions are not symmetric. They are robust in the following circumstances:
• If two samples are equal size and the two populations that the samples come from have similar distributions, then the t distribution is accurate for a variety of distributions even when the sample sizes are as small as n1 = n2 = 5.
• When the two population distributions are different, larger samples are needed.
• n1 + n2 < 15: Use two-sample t procedures if the data are close to normal. If the data are clearly non-normal or if outliers are present, do not use t.
• n1 + n2 is between 15 and 39: The t procedures can be used except in the presence of outliers or strong skewness.
• n1 + n2 ≥ 40: The t procedures can be used even for clearly skewed distributions.
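Step 2 of the two-sample test procedure mentions that software approximates the degrees of freedom k with "a complicated formula" but does not state it. A widely used choice is the Welch-Satterthwaite approximation, written out below; that SPSS uses exactly this expression on its "Equal variances not assumed" line is an assumption here, not something these notes confirm. The symbols are the same s1, s2, n1, n2 used in the two-sample t statistic.

```latex
k \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^{2}
      + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^{2}},
\qquad
\min(n_1 - 1,\; n_2 - 1) \;\le\; k \;\le\; n_1 + n_2 - 2 .
```

The bounds on the right match the statement that the software df can be as small as the df for the smaller sample and as large as the sum of the two dfs, which is why the by-hand choice of the smaller df is conservative.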
Examples:
1. The U.S. Department of Agriculture (USDA) uses many types of surveys to obtain important economic estimates. In one pilot study they estimated wheat prices in July and in September using independent samples. Here is a brief summary from the report:

Month       n    x̄      s/√n (SEM)
July        90   3.50   0.023
September   45   3.61   0.029

a. Note that the standard error of the sample mean (SEM) was reported, instead of sample standard deviations. We find the standard deviation for each of the samples as follows:

July: s/√90 = .023, so s = .023 √90 = 0.2182
September: s/√45 = .029, so s = .029 √45 = 0.1945

b. Use a significance test to examine whether or not the price of wheat was the same in July and September. Be sure to give details and carefully state your conclusion.

H0: μ_July = μ_Sept
Ha: μ_July ≠ μ_Sept, two side test because direction not indicated
t = (3.61 − 3.50)/√(.2182²/90 + .1945²/45) = .11/.0370 = 2.97
df = 44 (use 40 in the table)
Right tail is between .0025 and .005, so the two-side P-value is between .005 and .01.
Reject H0. Sufficient evidence that the mean wheat price differed between July and September.
c. Give a 95% confidence interval for the increase in the population mean between July and September.

(3.61 − 3.50) ± 2.021 √(.2182²/90 + .1945²/45)
= .11 ± 2.021 (.0370)
= .11 ± .075
= (.035, .185)
The interval does not include zero.

2. The Survey of Study Habits and Attitudes (SSHA) is a psychological test designed to measure the motivation, study habits, and attitudes toward learning of college students. These factors, along with ability, are important in explaining success in school. Scores on the SSHA range from 0 to 200. A selective private college gives the SSHA to an SRS of both male and female first-year students.
Here are the data for 18 women (n1 = 18):
154 109 137 115 152 140 145 178 101
103 126 126 137 165 165 129 200 148

Here are the data for 20 men (n2 = 20):
108 140 114 91 180 115 126 92 169 146
109 132 75 88 113 151 70 115 187 104

a. Examine each sample graphically, with special attention to outliers and skewness. Is use of a t procedure acceptable for these data?

[Histogram of women scores (frequency vs. score): Mean = 140.56, Std. Dev. = 26.262, N = 18]
[Histogram of men scores (frequency vs. score): Mean = 121.25, Std. Dev. = 32.852]

b. Most studies have found that the mean SSHA score for men is lower than the mean score in a comparable group of women. Test this supposition here. That is, state the hypotheses, carry out the test and obtain a P-value, and give your conclusions.

H0: μW − μM = 0, or μW = μM
Ha: μW − μM > 0, or μW > μM
t = (140.56 − 121.25)/√(26.262²/18 + 32.852²/20) = 2.010
One-side P-value < .05
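Both two-sample examples above can be re-computed with a few lines of code. This is a sketch using only the Python standard library; the function names are made up for illustration, and `welch_df` implements the Welch-Satterthwaite approximation, which is assumed (not confirmed by these notes) to be the formula behind the software degrees of freedom.

```python
import math
import statistics

def two_sample_t(x1, se1, x2, se2):
    """Two-sample t statistic from means and standard errors (se = s / sqrt(n))."""
    return (x1 - x2) / math.sqrt(se1 ** 2 + se2 ** 2)

def welch_df(se1, n1, se2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (assumed software formula)."""
    v1, v2 = se1 ** 2, se2 ** 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Example 1 (wheat prices): the report gives the standard errors directly.
t_wheat = two_sample_t(3.61, 0.029, 3.50, 0.023)  # September minus July

# Example 2 (SSHA): start from the raw scores.
women = [154, 109, 137, 115, 152, 140, 145, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]
men = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146,
       109, 132, 75, 88, 113, 151, 70, 115, 187, 104]
se_w = statistics.stdev(women) / math.sqrt(len(women))
se_m = statistics.stdev(men) / math.sqrt(len(men))
t_ssha = two_sample_t(statistics.mean(women), se_w, statistics.mean(men), se_m)

df_software = welch_df(se_w, len(women), se_m, len(men))
df_by_hand = min(len(women), len(men)) - 1  # conservative choice for table lookup

print(round(t_wheat, 2))
print(round(t_ssha, 3), round(df_software, 1), df_by_hand)
```

Working from standard errors keeps one function usable for both examples: the wheat report supplies them directly, while for the SSHA data they are computed from the raw scores.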
Using SPSS:
Note: The data need to be typed in using two columns. In the first column you need to put all the scores. In the second column define the grouping variable as gender and enter "women" next to the women's scores and "men" next to the men's scores.

Analyze > Compare means > Independent Sample T test
Move "score" to the "Test Variable" box and "gender" to the "grouping variable" box. Click "define groups" and enter "women" for group 1 and "men" for group 2. Click "continue" followed by "OK".

Group Statistics

                 N    Mean     Std. Deviation   Std. Error Mean
score   women    18   140.56   26.262           6.190
        men      20   121.25   32.852           7.346

Independent Samples Test

[SPSS output table; the values are not legible in this copy. Columns: Levene's Test for Equality of Variances; t-test for Equality of Means (Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference, Lower and Upper). Rows: Equal variances assumed; Equal variances not assumed]

We ALWAYS use the second line, "Equal variances not assumed", to get the t test statistic, P-value, etc. Never assume equal variances in this test.

NOTE: The one-side P-value in the above example is .052/2 = .026.

Give a 95% confidence interval for the mean difference between the SSHA scores of male and female first-year students at this college.

(−.185, 38.797)
The interval includes 0, so a two-side test at α = .05 would not reject H0.

3. Suppose we wanted to compare how students performed on test 1 versus test 2 in Stat 301. Below are data for a random sample of 10 students taking Stat 301, along with the printout of the results from running A MATCHED PAIR T TEST.

[Data table by subject; not legible in this copy]

Paired Samples Statistics

                 Mean    N    Std. Deviation   Std. Error Mean
Pair 1  Test 1   82.80   10   14.382           4.548
        Test 2   80.70   10   14.507           4.588

Paired Samples Test

[SPSS output table; the values are not legible in this copy. Columns: Paired Differences (Std. Deviation, Std. Error Mean, 95% Confidence Interval of the Difference)]

Answer the questions below based on the test.

a. Why was a matched pairs test used as opposed to a two-sample t test?
Each student took both tests.
b. Is there a difference between test 1 and test 2 scores? Write out the two-side matched pair hypotheses to test this and give the P-value. Are our results significant at the 5% significance level?
H0: μdiff = 0
Ha: μdiff ≠ 0
From SPSS, t = 1.857, P-value = .096.
Result not significant at α = .05; cannot reject H0.

c. Suppose one of our friends thought test 1 was easier and the students generally did better on it. We want to test whether the student is correct. Write out the one-side matched pair hypotheses to test this and give the P-value. Are our results significant at the 5% significance level?
H0: μdiff = 0
Ha: μdiff > 0
From SPSS, t = 1.857, P-value = .096/2 = .048.
Result is significant at α = .05. Reject H0.

4. An instructor thought the material for the second test was more difficult and hence the students may have done worse on the second test. She decided to randomly sample 10 Exam 1 tests and a different 10 Exam 2 tests, with each sample taken from all the tests taken, i.e., the two groups of 10 are different students. Below are data for the two random samples and the analysis.

[Data table with columns Sample Test 1 and Sample Test 2; not legible in this copy]

Group Statistics

        test   N    Mean    Std. Deviation   Std. Error Mean
score   1      10   82.80   14.382           4.548

Independent Samples Test

[SPSS output table; the values are not legible in this copy. Columns: Levene's Test for Equality of Variances (F, Sig.); t-test for Equality of Means (Sig. (2-tailed), Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference)]

P-value = .749/2 = .3745 (one side)

a. What procedure was used and why?
Two-sample t test, because the two groups are made up of different students.

b. Write out the one-side hypotheses for this two-sample t test and give the P-value. Are your results significant at the 5% significance level?
H0: μ1 − μ2 = 0, or μ1 = μ2
Ha: μ1 − μ2 > 0, or μ1 > μ2
Cannot reject H0.