Welcome to the Companion Website for the 3rd edition of Discovering Statistics Using SPSS by Andy Field.

Student Resources

This section contains a range of resources that are available to students who use Andy Field - Discovering Statistics Using SPSS, Third Edition. The resources available include:

• Interactive MCQs
• Flashcard glossary containing all the terms you need to study for assessment
• Smart Alex's answers
• Additional web material
• SPSS Flash animations
• Study skills

Click on the links on the left-hand side to access this student material.

Additional DSUS Material

Chapter 1 Self-Test Answers

⇒ Based on what you have read in this section, what qualities do you think a scientific theory should have?

A good theory should do the following:

1. Explain the existing data.
2. Explain a range of related observations.
3. Allow statements to be made about the state of the world.
4. Allow predictions about the future.
5. Have implications.

⇒ What is the difference between reliability and validity?

Validity is whether an instrument measures what it was designed to measure, whereas reliability is the ability of the instrument to produce the same results under the same conditions.

⇒ Why is randomization important?

It is important because it rules out confounding variables (factors that could influence the outcome variable other than the factor in which you’re interested). For example, with groups of people, random allocation of people to groups should mean that factors such as intelligence, age, gender and so on are roughly equal in each group and so will not systematically affect the results of the experiment.

⇒ Compute the mean but excluding the score of 252.

The range is the lowest score (22) subtracted from the highest score (now 121), which gives us 121 − 22 = 99.
First, we add up all of the scores:

$$\sum_{i=1}^{n} x_i = 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 = 811$$

We then divide by the number of scores (in this case 10, because the outlier has been excluded):

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{811}{10} = 81.1$$

The mean is 81.1 friends.

⇒ Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57. Compute the mode, median, upper and lower quartiles, range and interquartile range.

First, let’s arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57.

The Mode: The scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24 (1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). Therefore, there are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of 2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore, not particularly helpful to us).

The Median: The median will be the (n + 1)/2 score. There are 21 scores, so this will be 22/2 = 11. The 11th score in our ordered list is 32 seconds.

The Mean: The mean is 32.19 seconds:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16 + 18 + 18 + 22 + 22 + 23 + 23 + 24 + 26 + 29 + 32 + 34 + 34 + 36 + 36 + 42 + 43 + 46 + 46 + 49 + 57}{21} = \frac{676}{21} = 32.19$$

The Lower Quartile: This is the median of the lower half of scores. If we split the data at 32 (not including this score), there are 10 scores below this value. The median of 10 scores is the (10 + 1)/2 = 5.5th score. Therefore, we take the average of the 5th score and the 6th score. The 5th score is 22 and the 6th is 23; the lower quartile is therefore 22.5 seconds.

The Upper Quartile: This is the median of the upper half of scores. If we split the data at 32 (not including this score), there are 10 scores above this value. The median of 10 scores is the (10 + 1)/2 = 5.5th score above the median.
Therefore, we take the average of the 5th score above the median and the 6th score above the median. The 5th score above the median is 42 and the 6th is 43; the upper quartile is therefore 42.5 seconds.

The Range: This is the highest score (57) minus the lowest (16), i.e. 41 seconds.

The Interquartile Range: This is the difference between the upper and lower quartile: 42.5 − 22.5 = 20 seconds.

⇒ Assuming the same mean and standard deviation for the Beachy Head example above, what’s the probability that someone who threw themselves off of Beachy Head was 30 or younger?

As in the example, we know that the mean of the suicide scores was 36, and the standard deviation 13. First we convert our value to a z-score: the 30 becomes (30 − 36)/13 = −0.46. We want the area below this value (because 30 is below the mean), but this value is not tabulated in the Appendix. However, because the distribution is symmetrical, we could instead ignore the minus sign and look up this value in the column labelled ‘Smaller Portion’ (i.e. the area above the value 0.46). You should find that the probability is .32276, or, put another way, a 32.28% chance that a suicide victim would be 30 years old or younger. By looking at the column labelled ‘Bigger Portion’ we can also see the probability that a suicide victim was aged 30 or more! This probability is .67724, or there’s a 67.72% chance that a suicide victim was older than 30 years old!

Chapter 2 Self-Test Answers

⇒ We came across some data about the number of friends that 11 people had on Facebook (22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252). We calculated the mean for these data as 96.64. Now calculate the sums of squares, variance and standard deviation.

To calculate the sum of squares, take the mean from each value, then square this difference. Finally, add up these squared values:

$$SS = \sum_{i=1}^{n} (x_i - \bar{X})^2 = 37544.55$$

So, the sum of squared errors is a massive 37544.55. The variance is the sum of squared errors divided by the degrees of freedom (N – 1).
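The two definitions just given (sum of squared errors, then division by N – 1) can be checked numerically. The following is a minimal Python sketch using only the standard library and the 11 Facebook scores quoted in the question; the function name `describe_spread` is hypothetical and is not part of the book or of SPSS. It also returns the square root of the variance, which is the standard deviation.

```python
import math

# The 11 Facebook friend counts quoted in the question above.
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]


def describe_spread(scores):
    """Return (sum of squares, variance, standard deviation) for a sample.

    The name and the return layout are hypothetical, chosen for this sketch.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Sum of squared errors: squared deviation of each score from the mean.
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    # The sample variance divides by the degrees of freedom, N - 1.
    variance = sum_of_squares / (n - 1)
    # The standard deviation is the square root of the variance.
    return sum_of_squares, variance, math.sqrt(variance)


ss, var, sd = describe_spread(friends)
print(f"SS = {ss:.2f}, variance = {var:.2f}, SD = {sd:.2f}")
```

Passing a shorter list, for example `describe_spread(friends[:-1])`, repeats the calculation without the final score of 252.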
There were 11 scores and so the degrees of freedom were 10. The variance is, therefore, 37544.55/10 = 3754.45.

Finally, the standard deviation is the square root of the variance: $\sqrt{3754.45} = 61.27$.

⇒ Calculate these values again but excluding the outlier (252).

To calculate the sum of squares, take the mean from each value (note that it has changed because the outlier is excluded), then square this difference. Finally, add up these squared values. So, the sum of squared errors is 10992.90. The variance is the sum of squared errors divided by the degrees of freedom (N – 1). There were 10 scores and so the degrees of freedom were 9. The variance is, therefore, 10992.90/9 = 1221.43. Finally, the standard deviation is the square root of the variance: $\sqrt{1221.43} = 34.95$. Note then that, like the mean itself, the standard deviation is hugely influenced by outliers: the removal of this one value has almost halved the standard deviation!

⇒ We came across some data about the number of friends that 11 people had on Facebook. We calculated the mean for these data as 96.64 and standard deviation as 61.27. Calculate a 95% confidence interval for this mean.

First we need to calculate the standard error:

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{11}} = 18.47$$

The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, N – 1. With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The confidence intervals are, therefore:

Lower Boundary of Confidence Interval = $\bar{X}$ − (2.23 × SE) = 96.64 − (2.23 × 18.47) = 55.45

Upper Boundary of Confidence Interval = $\bar{X}$ + (2.23 × SE) = 96.64 + (2.23 × 18.47) = 137.83

⇒ Recalculate the confidence interval assuming that the sample size was 56.
First we need to calculate the new standard error:

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}} = \frac{61.27}{\sqrt{56}} = 8.19$$

The sample is big now, so to calculate the confidence interval we can use the critical value of z for a 95% confidence interval (i.e. 1.96). The confidence intervals are therefore:

Lower Boundary of Confidence Interval = $\bar{X}$ − (1.96 × SE) = 96.64 − (1.96 × 8.19) = 80.59

Upper Boundary of Confidence Interval = $\bar{X}$ + (1.96 × SE) = 96.64 + (1.96 × 8.19) = 112.69

⇒ What are the null and alternative hypotheses for the following questions:

• ‘Is there a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten?’
  o Null Hypothesis: There will be no relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.
  o Alternative Hypothesis: There will be a relationship between the amount of gibberish that people speak and the amount of vodka jelly they’ve eaten.
• ‘Is the mean amount of chocolate eaten higher when writing statistics books than when not?’
  o Null Hypothesis: There will be no difference in the mean amount of chocolate eaten when writing statistics textbooks compared to when not writing them.
  o Alternative Hypothesis: The mean amount of chocolate eaten when writing statistics textbooks will be higher than when not writing them.

Chapter 3 Self-Test Answers

⇒ Why is the ‘number of friends’ variable a ‘scale’ variable?

It is a scale variable because the numbers represent consistent intervals and ratios along the measurement scale: the difference between having (for example) 1 and 2 friends is the same as the difference between having (for example) 10 and 11 friends, and (for example) 20 friends is twice as many as 10.

⇒ Having created the first four variables with a bit of guidance, try to enter the rest of the variables in Table 2.1 yourself.

The finished data and variable views should look like those in the figure below (more or less!).
You can also download this data file (Data with which to play.sav).

Chapter 4 Self-Test Answers

⇒ What does a histogram show?

A histogram is a graph in which values of observations are plotted on the horizontal axis, and the frequency with which each value occurs in the data set is plotted on the vertical axis.

⇒ Produce boxplots for the day 2 and day 3 hygiene scores and interpret them.

The boxplot for the day 2 data should look like this:

Note that as for day 1 the females are slightly more fragrant than males (look at the median line). However, if you compare these to the day 1 boxplots (in the book) scores are getting lower (i.e. people are getting less hygienic). In the males there are now more outliers (i.e. a rebellious few who have maintained their sanitary standards).

The boxplot for the day 3 data should look like this:

Note that compared to day 1 and day 2 the females are getting more like the males (i.e. smelly). However, if you look at the top whisker, this is much longer for the females. In other words, the top 25% of females are more variable in how smelly they are compared to males. Also, the top score is higher than for males. So, at the top end females are better at maintaining their hygiene at the festival compared to males. Also, the box is longer for females, and although both boxes start at the same score, the top edge of the box is higher in females, again suggesting that above the median score more women are achieving higher levels of hygiene than men.

Finally, note that for both days 2 and 3, the boxplots have become less symmetrical (the top whiskers are longer than the bottom whiskers). On day 1 (see the book chapter), which is symmetrical, the whiskers on either side of the box are of equal length (the range of the top and bottom 25% of scores is the same); however, on days 2 and 3 the whisker coming out of the top of the box is longer than that at the bottom, which shows that the distribution is skewed (i.e.
the top 25% of scores is spread out over a wider range than the bottom 25%).

⇒ Use what you learnt earlier to add error bars to this graph and to label both the x- (I suggest ‘Time’) and y-axis (I suggest ‘Mean Grammar Score (%)’).

Simple Line Charts for Independent Means

To begin with, imagine that a film company director was interested in whether there was really such a thing as a ‘chick flick’ (a film that typically appeals to women more than men). He took 20 men and 20 women and showed half of each sample a film that was supposed to be a ‘chick flick’ (Bridget Jones’ Diary), and the other half of each sample a film that didn’t fall into the category of ‘chick flick’ (Memento, a brilliant film by the way). In all cases he measured their physiological arousal as a measure of how much they enjoyed the film. The data are in a file called ChickFlick.sav on the companion website. Load this file now.

First of all, let’s just plot the mean rating of the two films. We have just one grouping variable (the film) and one outcome (the arousal); therefore, we want a simple line chart. In the Chart Builder, double-click on the icon for a simple line chart. On the canvas you will see a graph and two drop zones: one for the y-axis and one for the x-axis. The y-axis needs to be the dependent variable, or the thing you’ve measured, or more simply the thing for which you want to display the mean. In this case it would be arousal, so select arousal from the variable list and drag it into the y-axis drop zone ([icon]). The x-axis should be the variable by which we want to split the arousal data. To plot the means for the two films, select the variable film from the variable list and drag it into the drop zone for the x-axis ([icon]).

Dialog boxes for a simple line chart with error bars

The figure above shows some other options for the line chart. The main dialog box should appear when you select the type of graph you want, but if it doesn’t, click on [icon] in the Chart Builder.
There are three important features of this dialog box. The first is that, by default, the lines will display the mean value. This is fine, but just note that you can plot other summary statistics such as the median or mode. Second, you can adjust the form of the line that you plot. The default is a straight line, but you can have others like a spline (curved line). Finally, we can ask SPSS to add error bars to our line chart by selecting [icon]. We have a choice of what our error bars represent. Normally, error bars show the 95% confidence interval, and I have selected this option ([icon]). Note, though, that you can change the width of the confidence interval displayed by changing the ‘95’ to a different value. You can also display the standard error (the default is to show two standard errors, but you can change this to one) or standard deviation (again, the default is two but this could be changed to one or another value). It’s important that when you change these properties you click on [icon]: if you don’t then the changes will not be applied to the Chart Builder. Click on [icon] to produce the graph.

Line chart of the mean arousal for each of the two films

The resulting line chart displays the mean (and the confidence interval of those means). This graph shows us that, on average, people were more aroused by Memento than they were by Bridget Jones’ Diary. However, we originally wanted to look for gender effects, so this graph isn’t really telling us what we need to know. The graph we need is a multiple line graph.

Multiple Line Charts for Independent Means

To do a multiple line chart for means that are independent (i.e. have come from different groups) we need to double-click on the multiple line chart icon in the Chart Builder (see the book chapter). On the canvas you will see a graph as with the simple line chart but there is now an extra drop zone: [icon]. All we need to do is to drag our second grouping variable into this drop zone.
As with the previous example then, select arousal from the variable list and drag it into [icon], select film from the variable list and drag it into [icon]. In addition, though, we can now select the gender variable and drag it into [icon]. This will mean that lines representing males and females will be displayed in different colours. As in the previous section, select error bars in the properties dialog box and click on [icon] to apply them to the Chart Builder. Click on [icon] to produce the graph.

Dialog boxes for a multiple line chart with error bars

Line chart of the mean arousal for each of the two films

The resulting line chart tells us the same as the simple line graph: that is, arousal was overall higher for Memento than Bridget Jones’ Diary, but it also splits this information by gender. Look first at the mean arousal for Bridget Jones’ Diary; this shows that males were actually more aroused during this film than females. This indicates they enjoyed the film more than the women did! Contrast this with Memento, for which arousal levels are comparable in males and females. On the face of it, this contradicts the idea of a ‘chick flick’: it actually seems that men enjoy chick flicks more than the chicks (probably because it’s the only help we get to understand the complex workings of the female mind!).

Simple Line Charts for Related Means

Hiccups can be a serious problem. Charles Osborne apparently got a case of hiccups while slaughtering a hog (well, who wouldn’t?) that lasted 67 years. People have many methods for stopping hiccups (a surprise, holding your breath), but actually medical science has put its collective mind to the task too. The official treatment methods include tongue-pulling manoeuvres, massage of the carotid artery and, believe it or not, digital rectal massage (Fesmire, 1988). I don’t know the details of what the digital rectal massage involved, but I can probably imagine. Let’s say we wanted to put this to the test.
We took 15 hiccup sufferers, and during a bout of hiccups administered each of the three procedures (in random order and at 5-minute intervals) after taking a baseline of how many hiccups they had per minute. We counted the number of hiccups in the minute after each procedure. Load the file Hiccups.sav. Note that these data are laid out in different columns; there is no grouping variable that specifies the interventions because each patient experienced all interventions. In the previous two examples we have used grouping variables to specify aspects of the graph (e.g. we used the grouping variable film to specify the x-axis). For repeated-measures data we will not have these grouping variables and so the process of building a graph is a little more complicated (but not a lot more).

To plot the mean number of hiccups go to the Chart Builder and double-click on the icon for a simple line chart. As before, you will see a graph on the canvas with drop zones for the x- and y-axes. Previously we specified the column in our data that contained data from our outcome measure on the y-axis, but for these data we have four columns containing data on the number of hiccups (the outcome variable). What we have to do then is to drag all four of these variables from the variable list into the y-axis drop zone. We have to do this simultaneously. To do this we first need to select multiple items in the variable list: select the first variable by clicking on it with the mouse. The variable will be highlighted in blue. Now, hold down the Ctrl key on the keyboard and click on a second variable. Both variables are now highlighted in blue. Again, hold down the Ctrl key and click on a third variable in the variable list and so on for the fourth.
In cases in which you want to select a list of consecutive variables, you can do this very quickly by clicking on the first variable that you want to select (in this case baseline), then holding down the Shift key on the keyboard and clicking on the last variable that you want to select (in this case digital rectal massage); notice that all of the variables in between have been selected too. Once the four variables are selected you can drag them by clicking on any one of the variables and then dragging them into [icon] as shown in the figure:

Specifying a simple line chart for repeated-measures data

Once you have dragged the four variables onto the y-axis drop zone a new dialog box appears. This box tells us that SPSS is c...