Welcome to the Companion Website for the 3rd edition of Discovering Statistics Using SPSS
by Andy Field.

Student Resources

This section contains a range of resources that are available to
students who use Andy Field - Discovering Statistics Using SPSS, Third Edition.
The resources available include:
• Interactive MCQs
• Flashcard glossary containing all the terms you need to study for assessment
• Smart Alex's answers
• Additional web material
• SPSS Flash animations
• Study skills

Click on the links on the left-hand side to access this student material.

Additional DSUS Material

Chapter 1 Self-Test Answers

⇒ Based on what you have read in this section, what qualities do you think a
scientific theory should have?
A good theory should do the following:
1. Explain the existing data.
2. Explain a range of related observations.
3. Allow statements to be made about the state of the world.
4. Allow predictions about the future.
5. Have implications.

⇒ What is the difference between reliability and validity?
Validity is whether an instrument measures what it was designed to measure, whereas
reliability is the ability of the instrument to produce the same results under the same
conditions.

⇒ Why is randomization important?
It is important because it rules out confounding variables (factors that could influence the
outcome variable other than the factor in which you’re interested). For example, with
groups of people, random allocation of people to groups should mean that factors such as
intelligence, age, gender and so on are roughly equal in each group and so will not
systematically affect the results of the experiment.

⇒ Compute the mean but excluding the score of 252.
The range is the lowest score (22) subtracted from the highest score (now
121), which gives us 121 − 22 = 99. For the mean, we first add up all of the scores:
$$\sum_{i=1}^{n} x_i = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 = 811$$
We then divide by the number of scores (in this case 10):
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{811}{10} = 81.1$$
The mean is 81.1 friends.

⇒ Twenty-one heavy smokers were put on a treadmill at the fastest setting.
The time in seconds was measured until they fell off from exhaustion: 18,
16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57.
Compute the mode, median, upper and lower quartiles, range and
interquartile range.
First, let’s arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29,
32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57.
The Mode: The scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24
(1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). Therefore,
there are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of
2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore,
not particularly helpful to us).
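The frequency count behind the mode can be checked with a few lines of code. This is only a sketch in Python (not part of the book, which uses SPSS); the variable names are illustrative.

```python
from collections import Counter

# Treadmill times in seconds, as listed in the question
times = [18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
         34, 34, 36, 36, 43, 42, 49, 46, 46, 57]

counts = Counter(times)        # score -> frequency
top = max(counts.values())     # the largest frequency
# Every score that reaches the largest frequency is a mode
modes = sorted(score for score, freq in counts.items() if freq == top)

print(top)    # 2
print(modes)  # [18, 22, 23, 34, 36, 46]
```

Six scores share the largest frequency, which is why the mode is not a useful summary here.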
The Median: The median will be the (n + 1)/2 score. There are 21 scores, so this will be
22/2 = 11. The 11th score in our ordered list is 32 seconds.
The Mean: The mean is 32.19 seconds:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16 + 18 + 18 + 22 + 22 + 23 + 23 + 24 + 26 + 29 + 32 + 34 + 34 + 36 + 36 + 42 + 43 + 46 + 46 + 49 + 57}{21} = \frac{676}{21} = 32.19$$
The Lower Quartile: This is the median of the lower half of scores. If we split the data at 32
(not including this score), there are 10 scores below this value. The median of 10 scores is
the (10 + 1)/2 = 5.5th score. Therefore, we take the average of the 5th score and the 6th
score. The 5th score is 22 and the 6th is 23; the lower quartile is therefore 22.5 seconds.
The Upper Quartile: This is the median of the upper half of scores. If we split the data at
32 (not including this score), there are 10 scores above this value. The median of 10 scores
is the (10 + 1)/2 = 5.5th score above the median. Therefore, we take the average of the 5th
score above the median and the 6th score above the median. The 5th score above the median is
42 and the 6th is 43; the upper quartile is therefore 42.5 seconds.
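The median and quartile steps above can be mirrored in a short sketch. It is written in Python as an illustration only, and it assumes the 'median of each half' method described in the text; statistical packages often use slightly different quartile definitions, so their output may not match exactly.

```python
from statistics import median

# Treadmill times in seconds, sorted into ascending order
times = sorted([18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                34, 34, 36, 36, 43, 42, 49, 46, 46, 57])

n = len(times)
mid = n // 2                 # index of the middle score when n is odd
med = median(times)          # the (n + 1)/2 = 11th score

# Split at the median; the median itself is left out when n is odd
lower_half = times[:mid]
upper_half = times[mid + 1:] if n % 2 else times[mid:]

lower_q = median(lower_half)  # average of the 5th and 6th scores
upper_q = median(upper_half)  # same positions, counted above the median
iqr = upper_q - lower_q

print(med, lower_q, upper_q, iqr)  # 32 22.5 42.5 20.0
```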
The Range: This is the highest score (57) minus the lowest (16), i.e. 41 seconds.
The Interquartile Range: This is the difference between the upper and lower quartile:
42.5 − 22.5 = 20 seconds.

⇒ Assuming the same mean and standard deviation for the Beachy Head
example above, what’s the probability that someone who threw
themselves off of Beachy Head was 30 or younger?
As in the example, we know that the mean of the suicide scores was 36, and the standard
deviation 13. First we convert our value to a z-score: the 30 becomes (30−36)/13 = −0.46.
We want the area below this value (because 30 is below the mean), but this value is not
tabulated in the Appendix. However, because the distribution is symmetrical, we could
instead ignore the minus sign and look up this value in the column labelled ‘Smaller
Portion’ (i.e. the area above the value 0.46). You should find that the probability is .32276,
or, put another way, a 32.28% chance that a suicide victim would be 30 years old or
younger. By looking at the column labelled ‘Bigger Portion’ we can also see the probability
that a suicide victim was aged 30 or more! This probability is .67724, or there’s a 67.72%
chance that a suicide victim was older than 30 years old!

Chapter 2 Self-Test Answers

⇒ We came across some data about the number of friends that 11 people had
on Facebook (22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252). We
calculated the mean for these data as 96.64. Now calculate the sums of
squares, variance and standard deviation.
To calculate the sum of squares, take the mean from each value, then square this
difference. Finally, add up these squared values. So, the sum of squared errors is a massive 37544.55.
The variance is the sum of squared errors divided by the degrees of freedom (N – 1). There
were 11 scores and so the degrees of freedom were 10. The variance is, therefore,
37544.55/10 = 3754.45.
Finally, the standard deviation is the square root of the variance: √3754.45 = 61.27.

⇒ Calculate these values again but excluding the outlier (252).
To calculate the sum of squares, take the mean from each value (note that it has changed
because the outlier is excluded), then square this difference. Finally, add up these squared
values. So, the sum of squared errors is 10992.90.
The variance is the sum of squared errors divided by the degrees of freedom (N – 1). There
were 10 scores and so the degrees of freedom were 9. The variance is, therefore,
10992.90/9 = 1221.43.
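The sums of squares and variances above, and the standard deviations that follow from them, can be reproduced with a small sketch. It is written in Python purely as a check on the hand calculation; the function name `spread` is made up for this illustration.

```python
import math

# Number of Facebook friends for the 11 people in the example
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

def spread(scores):
    """Return (sum of squared errors, variance, standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # squared deviations, summed
    variance = ss / (n - 1)                    # divide by degrees of freedom
    return ss, variance, math.sqrt(variance)

with_outlier = spread(friends)
without_outlier = spread([x for x in friends if x != 252])

print([round(v, 2) for v in with_outlier])     # [37544.55, 3754.45, 61.27]
print([round(v, 2) for v in without_outlier])  # [10992.9, 1221.43, 34.95]
```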
Finally, the standard deviation is the square root of the variance: √1221.43 = 34.95.
Note then that, like the mean itself, the standard deviation is hugely influenced by outliers:
the removal of this one value has almost halved the standard deviation!

⇒ We came across some data about the number of friends that 11 people had
on Facebook. We calculated the mean for these data as 96.64 and standard
deviation as 61.27. Calculate a 95% confidence interval for this mean.
First we need to calculate the standard error:
$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{11}} = 18.47$$
The sample is small, so to calculate the confidence interval we need to find the appropriate
value of t. First we need to calculate the degrees of freedom, N – 1. With 11 data points,
the degrees of freedom are 10. For a 95% confidence interval we can look up the value in
the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23.
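Once the critical value is known, each boundary is the mean plus or minus the critical value multiplied by the standard error. The sketch below (Python, for illustration only) takes the critical value 2.23 from the table rather than computing it, and rounds the standard error to two decimal places to mirror the hand calculation; the function name `confidence_interval` is an assumption made for this example.

```python
import math

def confidence_interval(mean, sd, n, critical):
    """Return (standard error, lower boundary, upper boundary)."""
    se = round(sd / math.sqrt(n), 2)  # rounded as in the hand calculation
    margin = critical * se
    return se, mean - margin, mean + margin

# Mean, standard deviation and sample size from the Facebook example;
# 2.23 is the tabulated two-tailed t value for 10 degrees of freedom
se, lower, upper = confidence_interval(96.64, 61.27, 11, 2.23)

print(f"SE = {se}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The same function can be called with a different sample size and critical value, for example a value of z when the sample is large.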
The confidence interval is, therefore:
Lower Boundary of Confidence Interval = X̄ − (2.23 × SE) = 96.64 − (2.23 × 18.47) = 55.45
Upper Boundary of Confidence Interval = X̄ + (2.23 × SE) = 96.64 + (2.23 × 18.47) = 137.83

⇒ Recalculate the confidence interval assuming that the sample size was 56.
First we need to calculate the new standard error:
$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{56}} = 8.19$$
The sample is big now, so to calculate the confidence interval we can use the critical value
of z for a 95% confidence interval (i.e. 1.96). The confidence interval is therefore:
Lower Boundary of Confidence Interval = X̄ − (1.96 × SE) = 96.64 − (1.96 × 8.19) = 80.59
Upper Boundary of Confidence Interval = X̄ + (1.96 × SE) = 96.64 + (1.96 × 8.19) = 112.69

⇒ What are the null and alternative hypotheses for the following questions:
• ‘Is there a relationship between the amount of gibberish that people speak and the amount of
vodka jelly they’ve eaten?’
o Null Hypothesis: There will be no relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.
o Alternative Hypothesis: There will be a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.
• ‘Is the mean amount of chocolate eaten higher when writing statistics books than when not?’
o Null Hypothesis: There will be no difference in the mean amount of chocolate eaten when writing statistics textbooks compared to when not writing them.
o Alternative Hypothesis: The mean amount of chocolate eaten when writing statistics textbooks will be higher than when not writing them.

Chapter 3 Self-Test Answers

⇒ Why is the ‘number of friends’ variable a ‘scale’ variable?
It is a scale variable because the numbers represent consistent intervals and ratios along
the measurement scale: the difference between having (for example) 1 and 2 friends is the
same as the difference between having (for example) 10 and 11 friends, and (for example)
20 friends is twice as many as 10.

⇒ Having created the first four variables with a bit of guidance, try to enter
the rest of the variables in Table 2.1 yourself.
The finished data and variable views should look like those in the figure below (more or
less!). You can also download this data file (Data with which to play.sav).

Chapter 4 Self-Test Answers

⇒ What does a histogram show?
A histogram is a graph in which values of observations are plotted on the horizontal axis,
and the frequency with which each value occurs in the data set is plotted on the vertical
axis.

⇒ Produce boxplots for the day 2 and day 3 hygiene scores and interpret them.
The boxplot for the day 2 data should look like this:
Note that as for day 1 the females are slightly more fragrant than males (look at the
median line). However, if you compare these to the day 1 boxplots (in the book) scores are
getting lower (i.e. people are getting less hygienic). In the males there are now more
outliers (i.e. a rebellious few who have maintained their sanitary standards).
The boxplot for the day 3 data should look like this:
Note that compared to day 1 and day 2 the females are getting more like the males (i.e.
smelly). However, if you look at the top whisker, this is much longer for the females. In
other words, the top 25% of females are more variable in how smelly they are compared to
males. Also, the top score is higher than for males. So, at the top end females are better at
maintaining their hygiene at the festival compared to males. Also, the box is longer for
females, and although both boxes start at the same score, the top edge of the box is
higher in females, again suggesting that above the median score more women are
achieving higher levels of hygiene than men. Finally, note that for both days 2 and 3, the
boxplots have become less symmetrical (the top whiskers are longer than the bottom
whiskers). On day 1 (see the book chapter), which is symmetrical, the whiskers on either
side of the box are of equal length (the range of the top
and bottom 25% of scores is the same); however, on days 2 and 3 the
whisker coming out of the top of the box is longer than that at the bottom,
which shows that the distribution is skewed (i.e. the top 25% of scores is
spread out over a wider range than the bottom 25%).

⇒ Use what you learnt earlier to add error bars to this graph and to label both
the x- (I suggest ‘Time’) and y-axis (I suggest ‘Mean Grammar Score (%)’).

Simple Line Charts for Independent Means
To begin with, imagine that a film company director was interested in whether there was
really such a thing as a ‘chick flick’ (a film that typically appeals to women more than
men). He took 20 men and 20 women and showed half of each sample a film that was
supposed to be a ‘chick flick’ (Bridget Jones’ Diary), and the other half of each sample a
film that didn’t fall into the category of ‘chick flick’ (Memento, a brilliant film by the way).
In all cases he measured their physiological arousal as a measure of how much they
enjoyed the film. The data are in a file called ChickFlick.sav on the companion website.
Load this file now.
First of all, let’s just plot the mean rating of the two films. We have just one grouping
variable (the film) and one outcome (the arousal); therefore, we want a simple line chart.
So, in the Chart Builder, double-click on the icon for a simple line chart. On the
canvas you will see a graph and two drop zones: one for the y-axis and one for the x-axis.
The y-axis needs to be the dependent variable, or the thing you’ve measured, or more
simply the thing for which you want to display the mean. In this case it would be arousal,
so select arousal from the variable list and drag it into the y-axis drop zone. The x-axis
should be the variable by which we want to split the arousal data. To plot the
means for the two films, select the variable film from the variable list and drag it into the
drop zone for the x-axis.

Dialog boxes for a simple line chart with error bars

The figure above shows some other options for the line chart. The main dialog box should
appear when you select the type of graph you want, but if it doesn’t, click on the button that
opens it in the Chart Builder. There are three important features of this dialog box. The first is that, by
default, the lines will display the mean value. This is fine, but just note that you can plot
other summary statistics such as the median or mode. Second, you can adjust the form of
the line that you plot. The default is a straight line, but you can have others like a spline
(curved line). Finally, we can ask SPSS to add error bars to our line chart by selecting
the error bars option. We have a choice of what our error bars represent. Normally, error bars show
the 95% confidence interval, and I have selected this option. Note, though, that you can
change the width of the confidence interval displayed by changing the ‘95’ to
a different value. You can also display the standard error (the default is to show two
standard errors, but you can change this to one) or standard deviation (again, the default
is two but this could be changed to one or another value). It’s important that when you
change these properties you click on the button that applies them: if you don’t, then the
changes will not be applied to the Chart Builder. Then click on the button that produces the graph.

Line chart of the mean arousal for each of the two films
The resulting line chart displays the mean (and the confidence interval of those means).
This graph shows us that, on average, people were more aroused by Memento than they
were by Bridget Jones’ Diary. However, we originally wanted to look for gender effects, so
this graph isn’t really telling us what we need to know. The graph we need is a multiple line
graph.

Multiple Line Charts for Independent Means
To do a multiple line chart for means that are independent (i.e. have come from different
groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the
book chapter). On the canvas you will see a graph as with the simple line chart but there is
now an extra drop zone. All we need to do is to drag our second grouping variable into
this drop zone. As with the previous example then, select arousal from the
variable list and drag it into the y-axis drop zone, and select film from the variable list
and drag it into the x-axis drop zone. In addition, though, we can now select the gender
variable and drag it into the extra drop zone. This will mean that lines representing males
and females will be displayed in
different colours. As in the previous section, select error bars in the properties dialog box
and click on the button that applies them to the Chart Builder. Then click on the button that produces the graph.

Dialog boxes for a multiple line chart with error bars

Line chart of the mean arousal for each of the two films
The resulting line chart tells us the same as the simple line graph: that is, arousal was
overall higher for Memento than Bridget Jones’ Diary, but it also splits this information by
gender. Look first at the mean arousal for Bridget Jones’ Diary; this shows that males were
actually more aroused during this film than females. This indicates they enjoyed the film
more than the women did! Contrast this with Memento, for which arousal levels are
comparable in males and females. On the face of it, this contradicts the idea of a ‘chick
flick’: it actually seems that men enjoy chick flicks more than the chicks (probably because
it’s the only help we get to understand the complex workings of the female mind!).

Simple Line Charts for Related Means
Hiccups can be a serious problem. Charles Osborne apparently got a case of hiccups while
slaughtering a hog (well, who wouldn’t?) that lasted 67 years. People have many methods
for stopping hiccups (a surprise, holding your breath), but actually medical science has put
its collective mind to the task too. The official treatment methods include tongue-pulling
manoeuvres, massage of the carotid artery and, believe it or not, digital rectal massage
(Fesmire, 1988). I don’t know the details of what the digital rectal massage involved, but I
can probably imagine. Let’s say we wanted to put this to the test. We took 15 hiccup
sufferers, and during a bout of hiccups administered each of the three procedures (in
random order and at 5-minute intervals) after taking a baseline of how many hiccups they
had per minute. We counted the number of hiccups in the minute after each procedure.
Load the file Hiccups.sav. Note that these data are laid out in different columns; there is
no grouping variable that specifies the interventions because each patient experienced all
interventions. In the previous two examples we have used grouping variables to specify
aspects of the graph (e.g. we used the grouping variable film to specify the x-axis). For
repeated-measures data we will not have these grouping variables and so the process of
building a graph is a little more complicated (but not a lot more).
To plot the mean number of hiccups go to the Chart Builder and double-click on the icon
for a simple line chart. As before, you will see a graph on the canvas with drop zones for
the x- and y-axes. Previously we specified the column in our data that contained data from
our outcome measure on the y-axis, but for these data we have four columns containing
data on the number of hiccups (the outcome variable). What we have to do then is to drag
all four of these variables from the variable list into the y-axis drop zone. We have to do
this simultaneously. To do this we first need to select multiple items in the variable list: to
do this select the first variable by clicking on it with the mouse. The variable will be
highlighted in blue. Now, hold down the Ctrl key on the keyboard and click on a second
variable. Both variables are now highlighted in blue. Again, hold down the Ctrl key and
click on a third variable in the variable list and so on for the fourth. In cases in which you
want to select a list of consecutive variables, you can do this very quickly by simply clicking
on the first variable that you want to select (in this case baseline), then holding down the
Shift key on the keyboard and clicking on the last variable that you want to select (in this
case digital rectal massage); notice that all of the variables in between have been
selected too. Once the four variables are selected you can drag them by clicking on any
one of the variables and then dragging them into the y-axis drop zone as shown in the figure:

Specifying a simple line chart for repeated-measures data

Once you have dragged the four variables onto the y-axis drop zone a new dialog box
appears. This box tells us that SPSS is c...