[7] Inference for Distributions
Lecture Notes
Simin Seury, Department of Economics, York University, Canada
Date: April 1, 2010

The one-sample t-test
As in the previous chapter, a test of hypotheses requires a few steps:
1. Stating the null and alternative hypotheses (H0 versus Ha)
2. Deciding on a one-sided or two-sided test
3. Choosing a significance level α
4. Calculating t and its degrees of freedom
5. Finding the area under the curve with Table D
6. Stating the P-value and interpreting the result

The P-value is the probability, if H0 is true, of randomly drawing a sample like the one obtained or more extreme, in the direction of Ha.
The P-value is calculated as the corresponding area under the curve, one-tailed (one-sided Ha) or two-tailed (two-sided Ha), for the test statistic

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Table D
For df = 9 we only look into the corresponding row.
The calculated value of t is 2.7. We find the 2 closest t values:
2.398 < t = 2.7 < 2.821
thus
0.02 > upper tail p > 0.01
For a one-sided Ha, this is the P-value (between 0.01 and 0.02);
for a two-sided Ha, the P-value is doubled (between 0.02 and 0.04).

Excel
TDIST(x, degrees_freedom, tails)
TDIST = P(X > x) for a random variable X following the t distribution (x positive).
Use it in place of Table C or to obtain the p-value for a positive t-value.
X is the standardized value at which to evaluate the distribution (i.e., “t”).
Degrees_freedom is an integer indicating the number of degrees of freedom.
Tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the one-tailed p-value. If tails = 2, TDIST returns the two-tailed p-value.

TINV(probability, degrees_freedom)
Gives the t-value (e.g., t*) for a given probability and degrees of freedom.
Probability is the probability associated with the two-tailed t distribution.
Degrees_freedom is the number of degrees of freedom of the t distribution.

Sweetening colas
Cola manufacturers want to test how much the sweetness of a new cola drink is affected by storage. The sweetness loss due to storage was evaluated by 10 professional tasters (by comparing the sweetness before and after storage):

Taster   Sweetness loss
1         2.0
2         0.4
3         0.7
4         2.0
5        −0.4
6         2.2
7        −1.3
8         1.2
9         1.1
10        2.3

Obviously, we want to test if storage results in a loss of sweetness, thus:
H0: µ = 0 versus Ha: µ > 0

Notice, here we do not know the population parameter σ.
The population of all cola drinkers is too large.
Since this is a new cola recipe, we have no population data.
This situation is very common with real data.

Sweetening colas (continued)
Is there evidence that storage results in sweetness loss for the new cola
recipe at the 0.05 level of significance (α = 5%)?
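
The calculation that follows can be cross-checked numerically. The sketch below is a minimal illustration using only the Python standard library, not part of the original notes. It takes the ten sweetness-loss values from the table above, computes x̄, s and t, and approximates the upper-tail P-value by integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are assumptions introduced here for illustration.

```python
import math
import statistics

# Sweetness loss reported by the 10 tasters (from the table above).
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t].

    steps must be even. By symmetry, P(T > t) = 0.5 - P(0 < T < t).
    """
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


n = len(losses)
mean = statistics.mean(losses)
s = statistics.stdev(losses)              # sample standard deviation (divisor n - 1)
t_stat = (mean - 0) / (s / math.sqrt(n))  # mu0 = 0 under H0
df = n - 1
p_one_sided = t_upper_tail(t_stat, df)    # Ha: mu > 0, so use the upper tail

print(f"mean = {mean:.2f}, s = {s:.3f}, t = {t_stat:.2f}, df = {df}")
print(f"one-sided P-value = {p_one_sided:.4f}")
print(f"P(T > 2.7) with df = 9: {t_upper_tail(2.7, 9):.5f}")
```

The last line plays the role of the Excel call =TDIST(2.7, 9, 1) quoted later in these notes, so it should agree with that value up to rounding.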
H0: µ = 0 versus Ha: µ > 0 (one-sided test)

Taster               Sweetness loss
1                     2.0
2                     0.4
3                     0.7
4                     2.0
5                    −0.4
6                     2.2
7                    −1.3
8                     1.2
9                     1.1
10                    2.3
___________________________
Average               1.02
Standard deviation    1.196
Degrees of freedom    n − 1 = 9

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70$$

The critical value is tα = 1.833.
t > tα, thus the result is significant.
2.398 < t = 2.70 < 2.821, thus 0.02 > p > 0.01.
p < α, thus the result is significant.

The t-test has a significant p-value. We reject H0.
There is a significant loss of sweetness, on average, following storage.

Sweetening colas (continued)

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1.02 - 0}{1.196/\sqrt{10}} = 2.70, \qquad df = n - 1 = 9$$

In Excel, you can obtain the precise p-value once you have calculated t.
Use the function TDIST(t, df, tails):
“=TDIST(2.7, 9, 1)” gives 0.01226.

Matched pairs t procedures
Sometimes we want to compare treatments or conditions at the individual level. These situations produce two samples that are not independent; they are related to each other. The members of one sample are identical to, or matched (paired) with, the members of the other sample.
Example: Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.
Example: Twin studies often try to sort out the influence of genetic factors by comparing a variable between sets of twins.
Example: Using people matched for age, sex, and education in social studies allows canceling out the effect of these potential lurking variables.

In these cases, we use the paired data to test the difference in the two population means. The variable studied becomes X_difference = (X1 − X2), and
H0: µ_difference = 0; Ha: µ_difference > 0 (or < 0, or ≠ 0)
Conceptually, this is not different from tests on one population.

Sweetening colas (revisited)
The sweetness loss due to storage was evaluated by 10 professional tasters (comparing the sweetness before and after storage); the ten values are those tabulated above.
We want to test if storage results in a loss of sweetness, thus:
H0: µ = 0 versus Ha: µ > 0
Although the text didn’t mention it explicitly, this is a pre-/post-test design and the variable is the difference in cola sweetness before minus after storage.
A matched pairs test of significance is indeed just like a one-sample test.

Does lack of caffeine increase depression?
Individuals diagnosed as caffeine-dependent are deprived of caffeine-rich foods and assigned to receive daily pills. Sometimes, the pills contain caffeine and other times they contain a placebo. Depression was assessed.

Subject   Depression      Depression     Placebo −
          with caffeine   with placebo   caffeine
1          5              16             11
2          5              23             18
3          4               5              1
4          3               7              4
5          8              14              6
6          5              24             19
7          0               6              6
8          0               3              3
9          2              15             13
10        11              12              1
11         1               0             −1

There are 2 data points for each subject, but we’ll only look at the difference.
The sample distribution appears appropriate for a t-test.
[Figure: normal quantile plot of the 11 “difference” data points (DIFFERENCE versus normal quantiles)]

Power of t-test
OMITTED from syllabus

Does lack of caffeine increase depression?
For each individual in the sample, we have calculated a difference in depression score (placebo minus caffeine).
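
As a hedged illustration of this matched pairs reduction, the sketch below (standard library only, not part of the original notes) forms the placebo-minus-caffeine differences from the table above and then treats them as a single sample. The variable names are assumptions chosen for readability. The critical value 3.169 is the Table D entry for df = 10 and upper tail 0.005 that these notes use.

```python
import math
import statistics

# Depression scores for the 11 subjects (from the table above).
caffeine = [5, 5, 4, 3, 8, 5, 0, 0, 2, 11, 1]
placebo = [16, 23, 5, 7, 14, 24, 6, 3, 15, 12, 0]

# Matched pairs: reduce the two related samples to one sample of differences.
differences = [p - c for p, c in zip(placebo, caffeine)]

n = len(differences)
mean_diff = statistics.mean(differences)
s_diff = statistics.stdev(differences)   # sample standard deviation (divisor n - 1)
std_error = s_diff / math.sqrt(n)
t_stat = (mean_diff - 0) / std_error     # H0: mu_difference = 0
df = n - 1

# Table D value for df = 10, upper tail 0.005 (as quoted in these notes).
T_CRIT_0_005 = 3.169
exceeds_0_005_cutoff = t_stat > T_CRIT_0_005

print("differences:", differences)
print(f"mean = {mean_diff:.3f}, s = {s_diff:.3f}, SE = {std_error:.3f}")
print(f"t = {t_stat:.3f}, df = {df}, t > {T_CRIT_0_005}: {exceeds_0_005_cutoff}")
```

Running a one-sample test on `differences` and running a paired test on the two raw columns are the same computation, which is why statistical software reports identical t and df for both.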
There were 11 “difference” points, thus df = n − 1 = 10.
We calculate that x̄ = 7.36; s = 6.92 (data as in the table of depression scores above).

H0: µ_difference = 0; Ha: µ_difference > 0

$$t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7.36}{6.92/\sqrt{11}} = 3.53$$

For df = 10, 3.169 < t = 3.53 < 3.581, thus 0.005 > p > 0.0025.
Caffeine deprivation causes a significant increase in depression.

Statistical output for the caffeine study:
a) Conducting a paired-sample t-test on the raw data (caffeine and placebo)
b) Conducting a one-sample t-test on the difference (placebo − caffeine)

Paired Samples Test (paired differences, Placebo − Caffeine)
  Mean                                              7.364
  Std. Deviation                                    6.918
  Std. Error Mean                                   2.086
  95% Confidence Interval of the Difference, Lower  2.716
  95% Confidence Interval of the Difference, Upper  12.011
  t                                                 3.530
  df                                                10
  Sig. (2-tailed)                                   .005

One-Sample Test (Test Value = 0, variable: Difference)
  t                                                 3.530
  df                                                10
  Sig. (2-tailed)                                   .005
  Mean Difference                                   7.364
  95% Confidence Interval of the Difference, Lower  2.72
  95% Confidence Interval of the Difference, Upper  12.01

Our alternative hypothesis was one-sided, thus our p-value is half of the two-tailed p-value provided in the software output (half of 0.005 = 0.0025).

Robustness
The t procedures are exactly correct when the population is distributed exactly normally. However, most real data are not exactly normal.
The t procedures are robust to small deviations from normality: the results will not be affected too much. Factors that strongly matter:
Random sampling. The sample must be an SRS from the population.
Outliers and skewness. They strongly influence the mean and therefore the t procedures. However, their impact diminishes as the sample size gets larger because of the Central Limit Theorem.
Specifically:
When n < 15, the data must be close to normal and without outliers.
When 15 ≤ n ≤ 40, mild skewness is acceptable but not outliers.
When n > 40, the t-statistic will be valid even with strong skewness.

Example: Apartment Rents
Interval Estimation of a Population Mean: σ Unknown
A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 efficiency apartments within a half-mile of campus resulted in a sample mean of $650 per month and a sample standard deviation of $55.
Let us provide a 95% confidence interval estimate of the mean rent per month for the population of efficiency apartments within a half-mile of campus. We will assume this population to be normally distributed.

At 95% confidence, α = .05, and α/2 = .025.
t.025 is based on n − 1 = 16 − 1 = 15 degrees of freedom.
In the t distribution table we see that t.025 = 2.131.

Degrees                    Area in Upper Tail
of Freedom   .20     .100    .050    .025    .010    .005
15           .866    1.341   1.753   2.131   2.602   2.947
16           .865    1.337   1.746   2.120   2.583   2.921
17           .863    1.333   1.740   2.110   2.567   2.898
18           .862    1.330   1.734   2.101   2.552   2.878
19           .861    1.328   1.729   2.093   2.539   2.861
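
The table lookup above can also be done numerically. The following is a minimal sketch under stated assumptions, not the method used in the notes: it finds t.025 for 15 degrees of freedom by bisection on a Simpson's-rule approximation of the t upper-tail area, then forms the margin of error for the apartment rents sample (n = 16, x̄ = 650, s = 55). The helper names `t_pdf`, `t_upper_tail` and `t_critical` are hypothetical and introduced only for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(upper_tail_area, df, lo=0.0, hi=50.0, iterations=60):
    """Find t* such that P(T > t*) = upper_tail_area (area < 0.5) by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_tail_area:
            lo = mid   # tail area still too large: t* lies to the right
        else:
            hi = mid   # tail area too small: t* lies to the left
    return (lo + hi) / 2


# Apartment rents sample from the notes.
n, x_bar, s = 16, 650.0, 55.0
alpha = 0.05                              # 95% confidence

t_star = t_critical(alpha / 2, n - 1)     # table value in the notes: 2.131
margin = t_star * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"t* = {t_star:.3f}")
print(f"margin of error = {margin:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the table rounds t.025 to three decimals, the interval computed this way can differ from a hand calculation in the last digit.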
Interval Estimation of a Population Mean: σ Unknown
Interval Estimate

$$\bar{x} \pm t_{.025}\,\frac{s}{\sqrt{n}} = 650 \pm 2.131\,\frac{55}{\sqrt{16}} = 650 \pm 29.30$$

The term t.025 · s/√n is the margin of error.
We are 95% confident that the mean rent per month for the population of efficiency apartments within a half-mile of campus is between $620.70 and $679.30.

Summary of Interval Estimation Procedures for a Population Mean
Can the population standard deviation σ be assumed known?
Yes (σ Known Case): use x̄ ± z_{α/2} · σ/√n
No (σ Unknown Case): use the sample standard deviation s to estimate σ, then use x̄ ± t_{α/2} · s/√n

p-Value Approach to Two-Tailed Hypothesis Testing
Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), find the area under the standard normal curve to the right of z.
   If z is in the lower tail (z < 0), find the area under the standard normal curve to the left of z.
3. Double the tail area obtained in step 2 to obtain the p-value.
The rejection rule:
Reject H0 if the p-value < α.

When the z score falls within the rejection region (shaded area on the tail-side), the p-value is smaller than α and you have shown statistical significance.
[Figure: standard normal curves (Z) with shaded rejection regions: one-sided test, α = 5% (z = 1.645); two-sided test, α = 1%]
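
The three-step p-value procedure above can be sketched in a few lines. This is an illustrative example rather than code from the notes: it relies on `statistics.NormalDist` from the Python standard library in place of a normal table, the function name `two_tailed_p_value` is an assumption, and z = −2.0 is used only as a sample input.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Two-tailed p-value for a z statistic, following the three steps above."""
    std_normal = NormalDist()              # standard normal: mean 0, sd 1
    if z > 0:
        tail = 1 - std_normal.cdf(z)       # step 2: area to the right of z
    else:
        tail = std_normal.cdf(z)           # step 2: area to the left of z
    return 2 * tail                        # step 3: double the tail area


alpha = 0.05
z = -2.0                                   # step 1: illustrative test statistic
p_value = two_tailed_p_value(z)
reject_h0 = p_value < alpha                # rejection rule

print(f"z = {z}, p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

For a one-tailed test the doubling in step 3 is skipped and only the tail in the direction of Ha is used.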
One-Tailed Tests About a Population Mean: σ Known
Critical Value Approach
4. Determine the critical value and rejection rule.
   For α = .05, z.05 = 1.645
   Reject H0 if z > 1.645
5. Determine whether to reject H0.
   Because 2.47 > 1.645, we reject H0.
   There is sufficient statistical evidence to infer that Health Services is not meeting the response goal of 12 minutes.

Rejection region for a two-tail test with α = 0.05 (5%)
A two-sided test means that α is spread between both tails of the curve, thus:
A middle area C of 1 − α = 95%, and
An upper tail area of α/2 = 0.025.
[Figure: normal curve with an area of 0.025 shaded in each tail]

Table C (…)
Upper tail probability p   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025  0.001  0.0005
z*                         0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807   3.091  3.291
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%    99%    99.5%   99.8%  99.9%

Recap: Confidence intervals
Remember, because a two-sided test is symmetrical, we can also use a confidence interval to test a two-sided hypothesis.
In a two-sided test, C = 1 − α.
C is the confidence level and α the significance level, with α/2 in each tail.

Packs of cherry tomatoes (σ = 5 g): H0: µ = 227 g versus Ha: µ ≠ 227 g
Sample average 222 g. 95% CI for µ = 222 ± 1.96 × 5/√4 = 222 g ± 4.9 g
227 g does not belong to the 95% CI (217.1 to 226.9 g). Thus, we reject H0.

Confidence interval test and p-value
Ex: Your sample gives a 99% confidence interval of x̄ ± m = 0.84 ± 0.0101.
With 99% confidence, could samples be from populations with µ = 0.86? µ = 0.85?
Reject H0: µ = 0.86 (0.86 lies outside the 99% C.I.).
Cannot reject H0: µ = 0.85 (0.85 lies inside the 99% C.I.).
A confidence interval gives a black and white answer: Reject or don't reject H0. But it also estimates a range of likely values for the true population mean µ.
A P-value quantifies how strong the evidence is against the H0. But if you reject H0, it doesn’t provide any information about the true population mean µ.

Does the packaging machine need revision?
H0: µ = 227 g versus Ha: µ ≠ 227 g
What is the probability of drawing a random sample such as yours if H0 is true?
x̄ = 222 g, σ = 5 g, n = 4

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{222 - 227}{5/\sqrt{4}} = -2$$

From Table A, the area under the standard normal curve to the left of z is 0.0228.
Thus, P-value = 2 × 0.0228 = 4.56%.
The probability of getting a random sample average so different from µ is so low that we reject H0.
The machine does need recalibration.
[Figure: sampling distribution of x̄ under H0, centered at µ = 227 g with σ/√n = 2.5 g (axis marks 217, 222, 227, 232, 237); 2.28% shaded in each tail, with x̄ = 222 at z = −2]

[6] Introduction to Inference
Use and Abuse of Tests

Cautions about significance tests
Choosing the significance level α
Factors often considered:
What are the consequences of rejecting the null hypothesis (e.g., global warming, convicting a person for life with DNA evidence)?
Are you conducting a preliminary study? If so, you may want a larger α so that you will be less likely to miss an interesting result.
Some conventions:
We typically use the standards of our field of work.
There are no “sharp” cutoffs: e.g., 4.9% versus 5.1%.
It is the order of magnitude of the P-value that matters: “somewhat significant,” “significant,” or “very significant.”

Practical significance
Statistical significance only says whether the effect observed is likely to be due to chance alone because of random sampling.
Statistical significance may not be practically important. That’s because statistical significance doesn’t tell you about the magnitude of the effect, only that there is one.
An effect could be too small to be relevant. And with a large enough sample size, significance can be reached even for the tiniest effect.
A drug to lower temperature is found to reproducibly lower patient temperature by 0.4° Celsius (P-value < 0.01). But clinical benefits of temperature reduction only appear for a 1° decrease or larger.

Don’t ignore lack of significance
Consider this provocative title from the British Medical Journal: “Absence of evidence is not evidence of absence”.
Having no proof of who committed a murder does not imply that the murder was not committed.
Indeed, failing to find statistical significance in results is not rejecting the null hypothesis. This is very different from actually accepting it. The sample size, for instance, could be too small to overcome large variability in the population.
When comparing two populations, lack of significance does not imply that the two samples come from the same population. They could represent two very distinct populations with similar mathematical properties.

Interpreting effect size: It’s all about context
There is no consensus on how big an effect has to be in order to be considered meaningful. In some cases, effects that may appear to be trivial can be very important.
Example: Improving the format of a computerized test reduces the average response time by about 2 seconds. Although this effect is small, it is important since this is done millions of times a year. The cumulative time savings of using the better format is gigantic.
Always think about the context. Try to plot your results, and compare them with a baseline or results from similar studies.
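
Two of the cautions above come down to the same mechanism: the standard error σ/√n shrinks as n grows. A small sample can miss a real effect, and a huge sample can make a tiny one significant. The sketch below illustrates this with hypothetical numbers (σ = 5 and an observed difference of 0.5 are assumptions chosen for illustration, not data from the notes), holding the observed difference fixed while n varies.

```python
import math
from statistics import NormalDist

sigma = 5.0        # hypothetical population standard deviation
difference = 0.5   # hypothetical observed difference between x-bar and mu0


def z_and_p(difference, sigma, n):
    """z statistic and two-sided p-value for a fixed observed difference."""
    z = difference / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


results = {n: z_and_p(difference, sigma, n) for n in (4, 100, 400, 10_000)}

for n, (z, p) in results.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>6}: z = {z:5.2f}, p = {p:.4f} -> {verdict} at alpha = 0.05")
```

The same 0.5-unit difference is far from significant with n = 4 and overwhelmingly significant with n = 10,000, so whether it matters in practice has to be judged from context, not from the p-value alone.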
This note was uploaded on 04/15/2010 for the course ECON ECON 2500 taught by Professor Siminseruy during the Spring '10 term at York University.