DISCOVERING STATISTICS USING SAS

Student comments about Discovering Statistics Using SPSS

‘This book is amazing, I love it! It is responsible for raising my degree classification to a 1st class. I want to thank you for making a down to earth, easy to follow no nonsense book about the stuff that really matters in statistics.’ Tim Kock

‘I wanted to tell you; I LOVE YOUR SENSE OF HUMOUR. Statistics make me cry usually but with your book I almost mastered it. All I can say is keep writing such books that make our life easier and make me love statistics.’ Naïlah Moussa

‘Just a quick note to say how fantastic your stats book is. I was very happy to find a stats book which had sensible and interesting (sex, drugs and rock and roll) worked examples.’ Josephine Booth

‘I am deeply in your debt for your having written Discovering Statistics Using SAS (2nd edition). Thank you for a great contribution that has made life easier for so many of us.’ Bill Jervis Groton

‘I love the way that you write. You make this twoddle so lively. I am no longer crying, swearing, sweating or threatening to jack in this stupid degree just because I can’t do the statistics. I am elated and smiling and jumping and grinning that I have come so far and managed to win over the interview panel using definitions and phrases that I read in your book!!! Bring it on you nasty exams. This candidate is Field Trained ...’ Sara Chamberlain

‘I just wanted to thank you for your book. I am working on my thesis and making sense of the statistics. Your book is wonderful!’ Katya Morgan

‘Sitting in front of a massive pile of books, in the midst of jamming up revision for exams, I cannot help distracting myself to tell you that your book keeps me smiling (or mostly laughing), although I am usually crying when I am studying. Thank you for your genius book. You have actually made a failing math student into a first class honors student, all with your amazing humor. Moreover, you have managed to convert me from absolutely detesting statistics to ‘actually’ enjoying them. For this, I thank you immensely. At university we have a great laugh on your jokes … till we finish our degrees your book will keep us going!’ Amber Atif Ghani

‘Your book has brought me out of the darkness to a place where I feel I might see the light and get through my exam. Stats is by far not my strong point but you make me feel like I could live with it! Thank you.’ Vicky Learmont

‘I just wanted to email you and thank you for writing your book, Discovering Statistics Using SAS. I am a graduate student at the University of Victoria, Canada, and have found your book invaluable over the past few years. I hope that you will continue to write more in-depth stats books in the future! Thank you for making my life better!’ Leila Scannell

‘For a non-math book, this book is the best stat book that I have ever read.’ Dvir Kleper

DISCOVERING STATISTICS USING SAS
(and sex and drugs and rock ’n’ roll)

ANDY FIELD and JEREMY MILES

© Andy Field and Jeremy Miles 2010

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
33 Pekin Street #02-01
Far East Square
Singapore 048763

Library of Congress Control Number Available

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 978-1-84920-091-2
ISBN 978-1-84920-092-9

Typeset by C&M Digitals (P) Ltd, Chennai, India
Printed by CPI Antony Rowe, Chippenham, Wiltshire
Printed on paper from sustainable resources

CONTENTS

Preface
How to use this book
Acknowledgements
Dedication
Symbols used in this book
Some maths revision

1 Why is my evil lecturer forcing me to learn statistics?
  1.1. What will this chapter tell me?
  1.2. What the hell am I doing here? I don’t belong here
    1.2.1. The research process
  1.3. Initial observation: finding something that needs explaining
  1.4. Generating theories and testing them
  1.5. Data collection 1: what to measure
    1.5.1. Variables
    1.5.2. Measurement error
    1.5.3. Validity and reliability
  1.6. Data collection 2: how to measure
    1.6.1. Correlational research methods
    1.6.2. Experimental research methods
    1.6.3. Randomization
  1.7. Analysing data
    1.7.1. Frequency distributions
    1.7.2. The centre of a distribution
    1.7.3. The dispersion in a distribution
    1.7.4. Using a frequency distribution to go beyond the data
    1.7.5. Fitting statistical models to the data
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

2 Everything you ever wanted to know about statistics (well, sort of)
  2.1. What will this chapter tell me?
  2.2. Building statistical models
  2.3. Populations and samples
  2.4. Simple statistical models
    2.4.1. The mean: a very simple statistical model
    2.4.2. Assessing the fit of the mean: sums of squares, variance and standard deviations
    2.4.3. Expressing the mean as a model
  2.5. Going beyond the data
    2.5.1. The standard error
    2.5.2. Confidence intervals
  2.6. Using statistical models to test research questions
    2.6.1. Test statistics
    2.6.2. One- and two-tailed tests
    2.6.3. Type I and Type II errors
    2.6.4. Effect sizes
    2.6.5. Statistical power
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

3 The SAS environment
  3.1. What will this chapter tell me?
  3.2. Versions of SAS
  3.3. Getting started
  3.4. Libraries
  3.5. The program editor: getting data into SAS
    3.5.1. Entering words and numbers into SAS with the program editor
    3.5.2. Entering dates
  3.6. Entering data with Excel
    3.6.1. Saving the data
    3.6.2. Date formats
    3.6.3. Missing values
  3.7. The data step
  3.8. SAS formats
    3.8.1. Built in formats
    3.8.2. User-defined formats
  3.9. Variable labels
  3.10. More on data steps
    3.10.1. Calculating a new variable
    3.10.2. Conditional (If ...then...) statements
    3.10.3. Getting rid of variables
    3.10.4. Getting rid of cases
  3.11. Procs for checking data
  3.12. Output and results
  3.13. Looking at data with proc contents
  3.14. What have I discovered about statistics?
  3.15. Key terms that I’ve discovered
  3.16. Smart Alex’s tasks
  3.17. Further reading

4 Exploring data with graphs
  4.1. What will this chapter tell me?
  4.2. The art of presenting data
    4.2.1. What makes a good graph?
    4.2.2. Lies, damned lies, and … erm … graphs
  4.3. Charts in SAS
  4.4. Histograms: a good way to spot obvious problems
  4.5. Boxplots (box–whisker diagrams)
  4.6. Graphing means: bar charts and error bars
    4.6.1. Simple bar charts for independent means
    4.6.2. Clustered bar charts for independent means
    4.6.3. Simple bar charts for related means
    4.6.4. Clustered bar charts for ‘mixed’ designs
  4.7. Graphing relationships: the scatterplot
    4.7.1. Simple scatterplot
    4.7.2. Grouped scatterplot
    4.7.3. Simple and grouped 3-D scatterplots
    4.7.4. Matrix scatterplot
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

5 Exploring assumptions
  5.1. What will this chapter tell me?
  5.2. What are assumptions?
  5.3. Assumptions of parametric data
  5.4. The assumption of normality
    5.4.1. Oh no, it’s that pesky frequency distribution again: checking normality visually
    5.4.2. Quantifying normality with numbers
    5.4.3. Exploring groups of data
  5.5. Testing whether a distribution is normal
    5.5.1. Doing the Kolmogorov–Smirnov test on SAS
    5.5.2. Output from the explore procedure
    5.5.3. Reporting the K–S test
  5.6. Testing for homogeneity of variance
    5.6.1. Levene’s test
  5.7. Correcting problems in the data
    5.7.1. Dealing with outliers
    5.7.2. Dealing with non-normality and unequal variances
    5.7.3. Transforming the data using SAS
    5.7.4. When it all goes horribly wrong
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading

6 Correlation
  6.1. What will this chapter tell me?
  6.2. Looking at relationships
  6.3. How do we measure relationships?
    6.3.1. A detour into the murky world of covariance
    6.3.2. Standardization and the correlation coefficient
    6.3.3. The significance of the correlation coefficient
    6.3.4. Confidence intervals for r
    6.3.5. A word of warning about interpretation: causality
  6.4. Data entry for correlation analysis using SAS
  6.5. Bivariate correlation
    6.5.1. General procedure for running correlations on SAS
    6.5.2. Pearson’s correlation coefficient
    6.5.3. Spearman’s correlation coefficient
    6.5.4. Kendall’s tau (non-parametric)
  6.6. Partial correlation
    6.6.1. The theory behind part and partial correlation
    6.6.2. Partial correlation using SAS
    6.6.3. Semi-partial (or part) correlations
  6.7. Comparing correlations
    6.7.1. Comparing independent rs
    6.7.2. Comparing dependent rs
  6.8. Calculating the effect size
  6.9. How to report correlation coefficients
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

7 Regression
  7.1. What will this chapter tell me?
  7.2. An introduction to regression
    7.2.1. Some important information about straight lines
    7.2.2. The method of least squares
    7.2.3. Assessing the goodness of fit: sums of squares, R and R²
    7.2.4. Assessing individual predictors
  7.3. Doing simple regression on SAS
  7.4. Interpreting a simple regression
    7.4.1. Overall fit of the model
    7.4.2. Model parameters
    7.4.3. Using the model
  7.5. Multiple regression: the basics
    7.5.1. An example of a multiple regression model
    7.5.2. Sums of squares, R and R²
    7.5.3. Methods of regression
  7.6. How accurate is my regression model?
    7.6.1. Assessing the regression model I: diagnostics
    7.6.2. Assessing the regression model II: generalization
  7.7. How to do multiple regression using SAS
    7.7.1. Some things to think about before the analysis
    7.7.2. Main options
    7.7.3. Statistics
    7.7.4. Saving regression diagnostics
  7.8. Interpreting multiple regression
    7.8.1. Simple statistics
    7.8.2. Model parameters
    7.8.3. Comparing the models
    7.8.4. Assessing the issue of multicollinearity
    7.8.5. Casewise diagnostics
    7.8.6. Checking assumptions
  7.9. What if I violate an assumption?
  7.10. How to report multiple regression
  7.11. Categorical predictors and multiple regression
    7.11.1. Dummy coding
    7.11.2. SAS output for dummy variables
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

8 Logistic regression
  8.1. What will this chapter tell me?
  8.2. Background to logistic regression
  8.3. What are the principles behind logistic regression?
    8.3.1. Assessing the model: the log-likelihood statistic
    8.3.2. Assessing the model: R, R² and c
    8.3.3. Assessing the contribution of predictors: the Wald statistic
    8.3.4. The odds ratio
  8.4. Assumptions and things that can go wrong
    8.4.1. Assumptions
    8.4.2. Incomplete information from the predictors
    8.4.3. Complete separation
  8.5. Binary logistic regression: an example that will make you feel eel
    8.5.1. The main analysis
    8.5.2. Obtaining predicted probabilities and residuals
    8.5.3. Final syntax
  8.6. Interpreting logistic regression
    8.6.1. The initial output
    8.6.2. Intervention
    8.6.3. Listing predicted probabilities
    8.6.4. Interpreting residuals
    8.6.5. Calculating the effect size
  8.7. How to report logistic regression
  8.8. Testing assumptions: another example
    8.8.1. Testing for linearity of the logit
    8.8.2. Testing the assumption of linearity of the logit
  8.9. Predicting several categories: multinomial logistic regression
    8.9.1. Running multinomial logistic regression in SAS
    8.9.2. The final touches
    8.9.3. Interpreting the multinomial logistic regression output
    8.9.4. Reporting the results
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

9 Comparing two means
  9.1. What will this chapter tell me?
  9.2. Looking at differences
  9.3. The t-test
    9.3.1. Two example data sets
    9.3.2. Rationale for the t-test
    9.3.3. Assumptions of the t-test
  9.4. The dependent t-test
    9.4.1. Sampling distributions and the standard error
    9.4.2. The dependent t-test equation explained
    9.4.3. The dependent t-test and the assumption of normality
    9.4.4. Dependent t-tests using SAS
    9.4.5. Output from the dependent t-test
    9.4.6. Calculating the effect size
    9.4.7. Reporting the dependent t-test
  9.5. The independent t-test
    9.5.1. The independent t-test equation explained
    9.5.2. The independent t-test using SAS
    9.5.3. Output from the independent t-test
    9.5.4. Calculating the effect size
    9.5.5. Reporting the independent t-test
  9.6. Between groups or repeated measures?
  9.7. The t-test as a general linear model
  9.8. What if my data are not normally distributed?
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s task
  Further reading
  Interesting real research

10 Comparing several means: ANOVA (GLM 1)
  10.1. What will this chapter tell me?
  10.2. The theory behind ANOVA
    10.2.1. Inflated error rates
    10.2.2. Interpreting F
    10.2.3. ANOVA as regression
    10.2.4. Logic of the F-ratio
    10.2.5. Total sum of squares (SST)
    10.2.6. Model sum of squares (SSM)
    10.2.7. Residual sum of squares (SSR)
    10.2.8. Mean squares
    10.2.9. The F-ratio
    10.2.10. Assumptions of ANOVA
    10.2.11. Planned contrasts
    10.2.12. Post hoc procedures
  10.3. Running one-way ANOVA on SAS
    10.3.1. Planned comparisons using SAS
    10.3.2. Post hoc tests in SAS
    10.3.3. Options
    10.3.4. ODS graphics
  10.4. Output from one-way ANOVA
    10.4.1. Output for the main analysis
    10.4.2. Output for trends and planned comparisons
    10.4.3. Output for post hoc tests
    10.4.4. Graph output for one-way ANOVA using PROC GLM
  10.5. Calculating the effect size
  10.6. Reporting results from one-way independent ANOVA
  10.7. Violations of assumptions in one-way independent ANOVA
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

11 Analysis of covariance, ANCOVA (GLM 2)
  11.1. What will this chapter tell me?
  11.2. What is ANCOVA?
  11.3. Assumptions and issues in ANCOVA
    11.3.1. Independence of the covariate and treatment effect
    11.3.2. Homogeneity of regression slopes
  11.4. Conducting ANCOVA on SAS
    11.4.1. Inputting data
    11.4.2. Initial considerations: testing the independence of the independent variable and covariate
    11.4.3. The main analysis
    11.4.4. Contrasts and other options
  11.5. Interpreting the output from ANCOVA
    11.5.1. What happens when the covariate is excluded?
    11.5.2. The main analysis
    11.5.3. Contrasts
    11.5.4. Interpreting the covariate
  11.6. ANCOVA run as a multiple regression
  11.7. Testing the assumption of homogeneity of regression slopes
  11.8. Calculating the effect size
  11.9. Reporting results
  11.10. What to do when assumptions are violated in ANCOVA
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

12 Factorial ANOVA (GLM 3)
  12.1. What will this chapter tell me?
  12.2. Theory of factorial ANOVA (between groups)
    12.2.1. Factorial designs
    12.2.2. An example with two independent variables
    12.2.3. Total sums of squares (SST)
    12.2.4. The model sum of squares (SSM)
    12.2.5. The residual sum of squares (SSR)
    12.2.6. The F-ratios
  12.3. Factorial ANOVA using SAS
    12.3.1. Exploring the data: proc means
    12.3.2. Contrasts and estimates
    12.3.3. Post hoc tests
    12.3.4. Simple effects
    12.3.5. ODS graphics
  12.4. Output from factorial ANOVA
    12.4.1. The main ANOVA tables
    12.4.2. Contrasts
    12.4.3. Least squares means and post hoc analysis
    12.4.4. Simple effects
    12.4.5. Summary
  12.5. Interpreting interaction graphs
  12.6. Calculating effect sizes
  12.7. Reporting the results of two-way ANOVA
  12.8. Factorial ANOVA as regression
  12.9. What to do when assumptions are violated in factorial ANOVA
  What have I discovered about statistics?
  Key terms that I’ve discovered
  Smart Alex’s tasks
  Further reading
  Interesting real research

13 Repeated-measures designs (GLM 4)
  13.1. What will this chapter tell me?
  13.2. Introduction to repeated measures designs
    13.2.1. The assumption of sphericity
    13.2.2. How is sphericity measured?
    13.2.3. Assessing the severity of departures from sphericity
    13.2.4. What is the effect of violating the assumption of sphericity?
    13.2.5. What do you do if you violate sphericity?
  13.3. Theory of one-way repeated measures ANOVA
    13.3.1. The total sum of squares (SST)
    13.3.2. The within-participant sum of squares (SSW)
    13.3.3. The model sum of squares (SSM)
    13.3.4. The residual sum of squares (SSR)
    13.3.5. The mean squares
    13.3.6. The F-ratio
    13.3.7. The between-participant sum of squares
  13.4. One-way repeated-measures ANOVA using SAS
    13.4.1. The main analysis
    13.4.2. Defining contrasts for repeated measures
  13.5. Output for one-way repeated-measures ANOVA
    13.5.1. Model description
    13.5.2. Assessing and correcting for sphericity: Mauchly’s test
    13.5.3. The main ANOVA
    13.5.4. Contrasts
  13.6. Effect sizes for repeated-measures an...