Discovering Statistics Using SPSS: Chapter 8 Answers
Dr. Andy Field, 8/13/2003
Task 1
Imagine that I was interested in how different teaching methods affected students’ knowledge.
I noticed that some lecturers were aloof and arrogant in their teaching style and humiliated
anyone who asked them a question, while others were encouraging and supportive of
questions and comments. I took three statistics courses where I taught the same material. For
one group of students I wandered around with a large cane and beat anyone who asked daft
questions or got questions wrong (punish). In the second group I used my normal teaching
style which is to encourage students to discuss things that they find difficult and to give
anyone working hard a nice sweet (reward). The final group I remained indifferent to and
neither punished nor rewarded their efforts (indifferent). As the dependent measure I took the
students’ exam marks (percentage). Based on theories of operant conditioning, we expect
punishment to be a very unsuccessful way of reinforcing learning, but we expect reward to be
very successful. Therefore, one prediction is that reward will produce the best learning. A
second hypothesis is that punishment should actually retard learning such that it is worse than
an indifferent approach to learning. The data are in the file Teach.sav. Carry out a one-way
ANOVA and use planned comparisons to test the hypotheses that (1) reward results in better
exam results than either punishment or indifference; and (2) indifference will lead to
significantly better exam results than punishment.
SPSS Output
Descriptives: Exam Mark

Group        N   Mean     Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
Punish       10  50.0000  4.13656         1.30809     47.0409       52.9591       45.00    57.00
Indifferent  10  56.0000  7.10243         2.24598     50.9192       61.0808       46.00    67.00
Reward       10  65.4000  4.29987         1.35974     62.3241       68.4759       58.00    71.00
Total        30  57.1333  8.26181         1.50839     54.0483       60.2183       45.00    71.00

This output shows the table of descriptive statistics from the one-way ANOVA; we're told the
means, standard deviations, and standard errors of the means for each experimental
condition. The means should correspond to those plotted in the graph. These diagnostics are
important for interpretation later on. It looks as though marks are highest after reward and
lowest after punishment.
Test of Homogeneity of Variances: Exam Mark

Levene Statistic  df1  df2  Sig.
2.569             2    27   .095

The next part of the output reports a test of the assumption of homogeneity of variance
(Levene's test). For these data, the assumption of homogeneity of variance has been met,
because our significance value is 0.095, which is bigger than the criterion of 0.05.
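Levene's test can be run in Python with scipy.stats.levene. Since the raw Teach.sav scores are not reproduced here, the sketch below simulates three groups with roughly the reported means and standard deviations; the data are an assumption for illustration only.

```python
# Sketch of Levene's test in Python. The three groups are SIMULATED to mimic
# the reported means and SDs (the raw Teach.sav scores are not listed here).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
punish = rng.normal(50.0, 4.14, size=10)
indifferent = rng.normal(56.0, 7.10, size=10)
reward = rng.normal(65.4, 4.30, size=10)

# Levene's test: the null hypothesis is that the group variances are equal.
stat, p = stats.levene(punish, indifferent, reward)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")
# As in the SPSS output, a p-value above .05 would mean the
# homogeneity-of-variance assumption is tenable.
```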
ANOVA: Exam Mark

Source          Sum of Squares  df  Mean Square  F       Sig.
Between Groups  1205.067        2   602.533      21.008  .000
Within Groups   774.400         27  28.681
Total           1979.467        29

The main ANOVA summary table shows us that, because the observed significance value is less
than 0.05, we can say that there was a significant effect of teaching style on exam marks.
However, at this stage we still do not know exactly what the effect of the teaching style was
(we don't know which groups differed).
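As a quick check, the F-ratio in the summary table can be reproduced from the two mean squares, and its p-value from the F(2, 27) distribution (a minimal sketch using scipy):

```python
# Recompute the F-ratio from the mean squares in the ANOVA table,
# and its p-value from the F distribution's survival function.
from scipy import stats

ms_between = 602.533  # mean square for the model (between groups)
ms_within = 28.681    # mean square for the residual (within groups)

F = ms_between / ms_within
p = stats.f.sf(F, 2, 27)  # upper-tail probability of F(2, 27)

print(f"F(2, 27) = {F:.3f}, p = {p:.6f}")  # F matches the 21.008 in the table
```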
Robust Tests of Equality of Means: Exam Mark

                Statistic(a)  df1  df2     Sig.
Welch           32.235        2    17.336  .000
Brown-Forsythe  21.008        2    20.959  .000

a. Asymptotically F distributed.

This table shows the Welch and Brown-Forsythe Fs, but we can ignore these because the
homogeneity of variance assumption was met.
Contrast Coefficients

          Type of Teaching Method
Contrast  Punish  Indifferent  Reward
1         -1      -1           2
2         -1      1            0

Because there were specific hypotheses, I specified some contrasts. This table shows the codes
I used. The first contrast compares reward (coded with 2) against punishment and
indifference (both coded with -1). The second contrast compares punishment (coded with -1)
against indifference (coded with 1). Note that the codes for each contrast sum to zero, and
that in contrast 2, reward has been coded with a 0 because it is excluded from that contrast.
Contrast Tests: Exam Mark

                                 Contrast  Value of Contrast  Std. Error  t      df      Sig. (2-tailed)
Assume equal variances           1         24.8000            4.14836     5.978  27      .000
                                 2         6.0000             2.39506     2.505  27      .019
Does not assume equal variances  1         24.8000            3.76180     6.593  21.696  .000
                                 2         6.0000             2.59915     2.308  14.476  .036

This table shows the significance of the two contrasts specified above. Because homogeneity of
variance was met, we can ignore the part of the table labelled 'Does not assume equal
variances'. The t-test for the first contrast tells us that reward was significantly different from
punishment and indifference (it's significantly different because the value in the column
labelled Sig. is less than 0.05). Looking at the means, this tells us that the average mark after
reward was significantly higher than the average mark for punishment and indifference
combined. The second contrast (and the descriptive statistics) tells us that the marks after
punishment were significantly lower than after indifference (again, it's significantly different
because the value in the column labelled Sig. is less than 0.05). As such we could conclude
that reward produces significantly better exam grades than punishment and indifference, and
that punishment produces significantly worse exam marks than indifference. So lecturers
should reward their students, not punish them!
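Because the groups are equal-sized and homogeneity was met, each contrast's value, standard error and t can be reproduced by hand from the group means, the residual mean square (28.681) and n = 10 per group: value = sum of c_i times mean_i, SE = sqrt(MS_R x sum of c_i squared / n), t = value / SE. A minimal numerical check:

```python
# Reproduce the 'Assume equal variances' contrast statistics from the
# group means and the residual mean square in the ANOVA table.
import math

means = {"punish": 50.0, "indifferent": 56.0, "reward": 65.4}
ms_r = 28.681  # residual (within-groups) mean square
n = 10         # participants per group

def contrast(codes):
    """Value, standard error and t for a planned contrast (equal variances)."""
    value = sum(c * means[g] for g, c in codes.items())
    se = math.sqrt(ms_r * sum(c ** 2 for c in codes.values()) / n)
    return value, se, value / se

# Contrast 1: reward (2) vs punishment and indifference (both -1)
v1, se1, t1 = contrast({"punish": -1, "indifferent": -1, "reward": 2})
# Contrast 2: indifference (1) vs punishment (-1); reward excluded (0)
v2, se2, t2 = contrast({"punish": -1, "indifferent": 1, "reward": 0})

print(v1, se1, t1)  # 24.8, ~4.148, ~5.978 as in the table
print(v2, se2, t2)  # 6.0, ~2.395, ~2.505 as in the table
```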
Calculating the Effect Size

The output provides us with three measures of variance: the between-group effect (SS_M), the
within-group effect (SS_R) and the total amount of variance in the data (SS_T). We can use
these, via the corresponding mean squares from the ANOVA table, to calculate omega squared
(ω²), where n is the number of scores per group:

ω² = (MS_M − MS_R) / (MS_M + (n − 1) × MS_R)
   = (602.53 − 28.68) / (602.53 + (10 − 1) × 28.68)
   = 573.85 / 860.65
   = 0.67

ω = √0.67 = 0.82
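The arithmetic above is easy to verify numerically (a sketch using the mean squares from the output and n = 10 scores per group):

```python
# Check the omega-squared calculation from the ANOVA mean squares.
import math

def omega_squared(ms_m, ms_r, n):
    """Omega squared from the model and residual mean squares,
    with n scores per group (formula used in the text)."""
    return (ms_m - ms_r) / (ms_m + (n - 1) * ms_r)

omega_sq = omega_squared(602.533, 28.681, 10)
omega = math.sqrt(omega_sq)
print(f"omega^2 = {omega_sq:.2f}, omega = {omega:.2f}")  # 0.67 and 0.82
```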
For the contrasts the effect sizes will be:

r_contrast = √(t² / (t² + df))

r_contrast1 = √((−5.978)² / ((−5.978)² + 27)) = 0.75

If you think back to our benchmarks for effect sizes, this represents a huge effect (it is well above 0.5, the
threshold for a large effect). Therefore, as well as being statistically significant, this effect is large and so
represents a substantive finding. For contrast 2 we get:

r_contrast2 = √((−2.505)² / ((−2.505)² + 27)) = 0.43

This too is a substantive finding and represents a medium to large effect size.
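The contrast effect sizes can be checked the same way:

```python
# Effect size r for a planned contrast, from its t statistic and df.
import math

def r_contrast(t, df):
    """r = sqrt(t^2 / (t^2 + df)); the sign of t is irrelevant once squared."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

r1 = r_contrast(-5.978, 27)
r2 = r_contrast(-2.505, 27)
print(f"r1 = {r1:.2f}, r2 = {r2:.2f}")  # 0.75 and 0.43
```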
Interpreting and Writing the Result

The correct way to report the main finding would be:

All significant values are reported at p < .05. There was a significant effect of teaching
style on exam marks, F(2, 27) = 21.01, ω = .82. Planned contrasts revealed that
reward produced significantly better exam grades than punishment and indifference,
t(27) = –5.98, r = .75, and that punishment produced significantly worse exam marks
than indifference, t(27) = –2.51, r = .43.

Task 2
In Chapter 11 (section 11.4) there are some data looking at whether eating Soya meals
reduces your sperm count. Have a look at this section, access the data for that example, but
analyse them with ANOVA. What’s the difference between what you find and what is found in
section 11.4? Why do you think this difference has arisen?
SPSS Output

Descriptives: Sperm Count (Millions)

Group                  N   Mean    Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
No Soya Meals          20  4.9868  5.08437         1.13690     2.6072        7.3663        .35      21.08
1 Soya Meal Per Week   20  4.6052  4.67263         1.04483     2.4184        6.7921        .33      18.47
4 Soya Meals Per Week  20  4.1101  4.40991         .98609      2.0462        6.1740        .40      18.21
7 Soya Meals Per Week  20  1.6530  1.10865         .24790      1.1341        2.1719        .31      4.11
Total                  80  3.8388  4.26048         .47634      2.8906        4.7869        .31      21.08

This output shows the table of descriptive statistics from the one-way ANOVA. It looks as
though, as soya intake increases, sperm counts do indeed decrease.
Test of Homogeneity of Variances: Sperm Count (Millions)

Levene Statistic  df1  df2  Sig.
5.117             3    76   .003

The next part of the output reports a test of the assumption of homogeneity of variance
(Levene's test). For these data, the assumption of homogeneity of variance has been violated,
because our significance is 0.003, which is smaller than the criterion of 0.05. In fact, these
data also violate the assumption of normality (see the chapter on non-parametric statistics).
ANOVA: Sperm Count (Millions)

Source          Sum of Squares  df  Mean Square  F      Sig.
Between Groups  135.130         3   45.043       2.636  .056
Within Groups   1298.853        76  17.090
Total           1433.983        79

The main ANOVA summary table shows us that, because the observed significance value is
greater than 0.05, we can say that there was no significant effect of soya intake on men's
sperm count. This is strange because, if you read the chapter on non-parametric statistics from
where this example came, the Kruskal-Wallis test produced a significant result! The reason for
this difference is that the data violate the assumptions of normality and homogeneity of
variance. As I mention in the chapter on non-parametric statistics, although parametric tests
have more power to detect effects when their assumptions are met, when their assumptions
are violated non-parametric tests have more power! This example was arranged to prove this
point: because the parametric assumptions are violated, the non-parametric test produced a
significant result and the parametric test did not because, in these circumstances, the
non-parametric test has the greater power!
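The Welch F reported in the robust-tests table below does not need the raw data: it can be reproduced from each group's mean, standard deviation and sample size in the descriptives table. The sketch below implements the standard Welch formula; small discrepancies from SPSS can arise because the printed summary statistics are rounded.

```python
# Welch's ANOVA computed from group summaries (mean, SD, n) taken from
# the descriptives table above.
import math

groups = [
    (4.9868, 5.08437, 20),  # no soya meals
    (4.6052, 4.67263, 20),  # 1 soya meal per week
    (4.1101, 4.40991, 20),  # 4 soya meals per week
    (1.6530, 1.10865, 20),  # 7 soya meals per week
]

def welch_anova(groups):
    """Welch's F and its degrees of freedom from (mean, sd, n) summaries."""
    k = len(groups)
    w = [n / sd ** 2 for _, sd, n in groups]  # precision weights n_i / s_i^2
    w_sum = sum(w)
    grand = sum(wi * m for wi, (m, _, _) in zip(w, groups)) / w_sum
    a = sum(wi * (m - grand) ** 2 for wi, (m, _, _) in zip(w, groups)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, (_, _, n) in zip(w, groups))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return a / b, df1, df2

F, df1, df2 = welch_anova(groups)
print(f"Welch F({df1}, {df2:.3f}) = {F:.2f}")  # close to SPSS's 6.284 (summaries are rounded)
```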
Robust Tests of Equality of Means: Sperm Count (Millions)

                Statistic(a)  df1  df2     Sig.
Welch           6.284         3    34.657  .002
Brown-Forsythe  2.636         3    58.236  .058

a. Asymptotically F distributed.

This table shows the Welch and Brown-Forsythe Fs. Note that the Welch test agrees with the
non-parametric test in that the significance of F is below the 0.05 threshold. However, the
Brown-Forsythe F is non-significant (it is just above the threshold). This illustrates the relative
superiority of the Welch procedure. However, in these circumstances, because normality and
homogeneity of variance have been violated, we'd use a non-parametric test anyway!

Task 3
Students (and lecturers for that matter) love their mobile phones, which is rather worrying
given some recent controversy about links between mobile phone use and brain tumours. The
basic idea is that mobile phones emit microwaves, and so holding one next to your brain for
large parts of the day is a bit like sticking your brain in a microwave oven and selecting the
‘cook until well done’ button. If we wanted to test this experimentally, we could get 6 groups
of people and strap a mobile phone on their heads (that they can’t remove). Then, by remote
control, we turn the phones on for a certain amount of time each day. After 6 months, we
measure the size of any tumour (in mm³) close to the site of the phone antenna (just behind
the ear). The six groups experienced 0, 1, 2, 3, 4 or 5 hours per day of phone microwaves for
6 months. The data are in Tumour.sav. (From Field & Hole, 2003, so there is a very detailed
answer in there).
SPSS Output

The error bar chart of the mobile phone data shows the mean size of brain tumour in each
condition, and the funny 'I' shapes show the confidence interval of these means. Note that in
the control group (0 hours), the mean size of the tumour is virtually zero (we wouldn't actually
expect them to have a tumour) and the error bar shows that there was very little variance
across samples. We'll see later that this is problematic for the analysis.

Descriptives: Size of Tumour (mm³)

Hours Per Day  N    Mean    Std. Deviation  Std. Error  95% CI Lower  95% CI Upper  Minimum  Maximum
0              20   .0175   .01213          .00271      .0119         .0232         .00      .04
1              20   .5149   .28419          .06355      .3819         .6479         .00      .94
2              20   1.2614  .49218          .11005      1.0310        1.4917        .48      2.34
3              20   3.0216  .76556          .17118      2.6633        3.3799        1.77     4.31
4              20   4.8878  .69625          .15569      4.5619        5.2137        3.04     6.05
5              20   4.7306  .78163          .17478      4.3648        5.0964        2.70     6.14
Total          120  2.4056  2.02662         .18500      2.0393        2.7720        .00      6.14

This output shows the table of descriptive statistics from the one-way ANOVA; we're told the
means, standard deviations, and standard errors of the means for each experimental
condition. The means should correspond to those plotted in the graph. These diagnostics are
important for interpretation later on.

Test of Homogeneity of Variances: Size of Tumour (mm³)

Levene Statistic  df1  df2  Sig.
10.245            5    114  .000

The next part of the output reports a test of this assumption, Levene's test. For these data, the
assumption of homogeneity of variance has been violated, because our significance is 0.000,
which is considerably smaller than the criterion of 0.05. In these situations, we have to try to
correct the problem and we can either transform the data or choose the Welch F.
ANOVA: Size of Tumour (mm³)

Source          Sum of Squares  df   Mean Square  F        Sig.
Between Groups  450.664         5    90.133       269.733  .000
Within Groups   38.094          114  .334
Total           488.758         119

The main ANOVA summary table shows us that, because the observed significance value is less
than 0.05, we can say that there was a significant effect of mobile phones on the size of
tumour. However, at this stage we still do not know exactly what the effect of the phones was
(we don't know which groups differed).
Robust Tests of Equality of Means: Size of Tumour (mm³)

                Statistic(a)  df1  df2     Sig.
Welch           414.926       5    44.390  .000
Brown-Forsythe  269.733       5    75.104  .000

a. Asymptotically F distributed.

This table shows the Welch and Brown-Forsythe Fs, which are useful because homogeneity of
variance was violated. Luckily our conclusions remain the same: both Fs have significance
values less than 0.05.
Multiple Comparisons
Dependent Variable: Size of Tumour (mm³)
Games-Howell

(I) Hours Per Day  (J) Hours Per Day  Mean Difference (I-J)  Std. Error  Sig.  95% CI Lower  95% CI Upper
0                  1                  -.4973*                .18280      .000  -.6982        -.2964
0                  2                  -1.2438*               .18280      .000  -1.5916       -.8960
0                  3                  -3.0040*               .18280      .000  -3.5450       -2.4631
0                  4                  -4.8702*               .18280      .000  -5.3622       -4.3783
0                  5                  -4.7130*               .18280      .000  -5.2653       -4.1608
1                  0                  .4973*                 .18280      .000  .2964         .6982
1                  2                  -.7465*                .18280      .000  -1.1327       -.3603
1                  3                  -2.5067*               .18280      .000  -3.0710       -1.9424
1                  4                  -4.3729*               .18280      .000  -4.8909       -3.8549
1                  5                  -4.2157*               .18280      .000  -4.7908       -3.6406
2                  0                  1.2438*                .18280      .000  .8960         1.5916
2                  1                  .7465*                 .18280      .000  .3603         1.1327
2                  3                  -1.7602*               .18280      .000  -2.3762       -1.1443
2                  4                  -3.6264*               .18280      .000  -4.2017       -3.0512
2                  5                  -3.4692*               .18280      .000  -4.0949       -2.8436
3                  0                  3.0040*                .18280      .000  2.4631        3.5450
3                  1                  2.5067*                .18280      .000  1.9424        3.0710
3                  2                  1.7602*                .18280      .000  1.1443        2.3762
3                  4                  -1.8662*               .18280      .000  -2.5607       -1.1717
3                  5                  -1.7090*               .18280      .000  -2.4429       -.9751
4                  0                  4.8702*                .18280      .000  4.3783        5.3622
4                  1                  4.3729*                .18280      .000  3.8549        4.8909
4                  2                  3.6264*                .18280      .000  3.0512        4.2017
4                  3                  1.8662*                .18280      .000  1.1717        2.5607
4                  5                  .1572                  .18280      .984  -.5455        .8599
5                  0                  4.7130*                .18280      .000  4.1608        5.2653
5                  1                  4.2157*                .18280      .000  3.6406        4.7908
5                  2                  3.4692*                .18280      .000  2.8436        4.0949
5                  3                  1.7090*                .18280      .000  .9751         2.4429
5                  4                  -.1572                 .18280      .984  -.8599        .5455

*. The mean difference is significant at the .05 level.

Because there were no specific hypotheses I just carried out post hoc tests and stuck to my
favourite Games-Howell procedure (because variances were unequal). It is clear from the table
that each group of participants is compared to all of the remaining groups. First, the control
group (0 hours) is compared to the 1-hour, 2-hour, 3-hour, 4-hour and 5-hour groups and
reveals a significant difference in all cases (all the values in the column labelled Sig. are less
than 0.05). In the next part of the table, the 1-hour group is compared to all other groups.
Again all comparisons are significant (all the values in the column labelled Sig. are less than
0.05). In fact, all of the comparisons appear to be highly significant except the comparison
between the 4-hour and 5-hour groups, which is non-significant because the value in the
column labelled Sig. is bigger than 0.05.
Calculating the Effect Size

The output provides us with three measures of variance: the between-group effect (SS_M), the
within-group effect (SS_R) and the total amount of variance in the data (SS_T). We can use
these, via the corresponding mean squares from the ANOVA table, to calculate omega squared
(ω²), where n is the number of scores per group:

ω² = (MS_M − MS_R) / (MS_M + (n − 1) × MS_R)
   = (90.13 − 0.33) / (90.13 + (20 − 1) × 0.33)
   = 89.80 / 96.40
   = 0.93

ω = √0.93 = 0.96
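As in Task 1, this arithmetic can be checked numerically (using the unrounded mean squares 90.133 and 0.334 from the ANOVA table, with n = 20 per group):

```python
# Check the Task 3 omega-squared calculation from the ANOVA mean squares.
import math

ms_m, ms_r, n = 90.133, 0.334, 20  # values from the ANOVA table above
omega_sq = (ms_m - ms_r) / (ms_m + (n - 1) * ms_r)
omega = math.sqrt(omega_sq)
print(f"omega^2 = {omega_sq:.2f}, omega = {omega:.2f}")  # 0.93 and 0.96
```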
Interpreting and Writing the Result

We could report the main finding as:

• Levene's test indicated that the assumption of homogeneity of variance had been
violated (F(5, 114) = 10.25, p < .001). Transforming the data did not rectify this
problem and so F-tests are reported nevertheless. The results show that using a mobile
phone significantly affected the size of brain tumour found in participants (F(5, 114) =
269.73, p < .001, ω = .96). The effect size indicated that the effect of phone use on
tumour size was substantial.

The next thing that needs to be reported are the post hoc comparisons. It is customary just to
summarise these tests in very general terms like this:

• Games-Howell post hoc tests revealed significant differences between all groups (p <
.001 for all tests) except between 4 and 5 hours (ns).

If you do want to report the results for each post hoc test individually, then at least include the
95% confidence intervals for the test, as these tell us more than just the significance value. In
this example, though, when there are many tests it might be as well to summarise these
confidence intervals as a table:

Mobile Phone Use (Hours Per Day)  Sig.    95% CI Lower  95% CI Upper
0 vs. 1                           < .001  -.6982        -.2964
0 vs. 2                           < .001  -1.5916       -.8960
0 vs. 3                           < .001  -3.5450       -2.4631
0 vs. 4                           < .001  -5.3622       -4.3783
0 vs. 5                           < .001  -5.2653       -4.1608
1 vs. 2                           < .001  -1.1327       -.3603
1 vs. 3                           < .001  -3.0710       -1.9424
1 vs. 4                           < .001  -4.8909       -3.8549
1 vs. 5                           < .001  -4.7908       -3.6406
2 vs. 3                           < .001  -2.3762       -1.1443
2 vs. 4                           < .001  -4.2017       -3.0512
2 vs. 5                           < .001  -4.0949       -2.8436
3 vs. 4                           < .001  -2.5607       -1.1717
3 vs. 5                           < .001  -2.4429       -.9751
4 vs. 5                           = .984  -.5455        .8599
View
Full Document
 Spring '10
 ennart
 Statistical significance, Nonparametric statistics, Effect size, Parametric statistics, Dr. Andy Field

Click to edit the document details