[Richard_M._Heiberger,_Burt_Holland]_Statistical_A(BookZZ.org) - Springer Texts in Statistics Advisors George Casella Stephen Fienberg Ingram Olkin

Springer Texts in Statistics

Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Springer Texts in Statistics

Alfred: Elements of Statistics for the Life and Social Sciences
Berger: Introduction to Probability and Stochastic Processes, Second Edition
Bilodeau and Brenner: Theory of Multivariate Statistics
Blom: Probability and Statistics: Theory and Applications
Brockwell and Davis: Introduction to Time Series and Forecasting, Second Edition
Carmona: Statistical Analysis of Financial Data in S-Plus
Chow and Teicher: Probability Theory: Independence, Interchangeability, Martingales, Third Edition
Christensen: Advanced Linear Modeling: Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization, Second Edition
Christensen: Log-Linear Models and Logistic Regression, Second Edition
Christensen: Plane Answers to Complex Questions: The Theory of Linear Models, Second Edition
Creighton: A First Course in Probability Models and Statistical Inference
Davis: Statistical Methods for the Analysis of Repeated Measurements
Dean and Voss: Design and Analysis of Experiments
du Toit, Steyn, and Stumpf: Graphical Exploratory Data Analysis
Durrett: Essentials of Stochastic Processes
Edwards: Introduction to Graphical Modeling, Second Edition
Finkelstein and Levin: Statistics for Lawyers
Flury: A First Course in Multivariate Statistics
Heiberger and Holland: Statistical Analysis and Data Display: An Intermediate Course with Examples in S-PLUS, R, and SAS
Jobson: Applied Multivariate Data Analysis, Volume I: Regression and Experimental Design
Jobson: Applied Multivariate Data Analysis, Volume II: Categorical and Multivariate Methods
Kalbfleisch: Probability and Statistical Inference, Volume I: Probability, Second Edition
Kalbfleisch: Probability and Statistical Inference, Volume II: Statistical Inference, Second Edition
Karr: Probability
Keyfitz: Applied Mathematical Demography, Second Edition
Kiefer: Introduction to Statistical Inference
Kokoska and Nevison: Statistical Tables and Formulae
Kulkarni: Modeling, Analysis, Design, and Control of Stochastic Systems
Lange: Applied Probability

Continued after index

Richard M. Heiberger
Burt Holland

Statistical Analysis and Data Display
An Intermediate Course with Examples in S-PLUS, R, and SAS

With 200 Figures

Springer

Richard M. Heiberger
Department of Statistics
Temple University
Philadelphia, PA 19122
USA
[email protected]

Burt Holland
Department of Statistics
Temple University
Philadelphia, PA 19122
USA
[email protected]

Editorial Board

George Casella
Department of Statistics
University of Florida
Gainesville, FL 32611-8545
USA

Stephen Fienberg
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213-3890
USA

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
USA

Cover illustration: Cover art is a variation of Figure 14.14d. The data source is (Williams, 2001).

Cygwin: Copyright © 1996, 1998, 2001, 2003 Free Software Foundation, Inc.
EMACS: Copyright © 1989, 1991 Free Software Foundation, Inc.
Excel: Copyright © 1985-1999, Microsoft Corp.
Ghostscript: Copyright © 1994, 1995, 1997, 1998, 1999, 2000 Aladdin Enterprises, Menlo Park, California, U.S.A. All rights reserved.
GSview: Copyright © 1993-2001 Ghostgum Software Pty Ltd.
Internet Explorer: Copyright © 1995-2001 Microsoft Corp.
Linux: Copyright © 2004, Eklektix, Inc.
LogXact: Copyright © Cytel Software Corporation
MathType: Copyright © 1990-1999 Design Science, Inc.
Microsoft Windows: Copyright © 1981-2001 Microsoft Corp.
MiKTeX: Copyright © 1999 Christian Schenk
MS-DOS: Copyright © 1985-2001 Microsoft Corp.
MS-Word: Copyright © 1983-1999, Microsoft Corp.
PostScript: Copyright © Adobe Systems Incorporated
R: Copyright © 2002, The R Development Core Team
SAS: Copyright © 2002 by SAS Institute Inc., Cary, NC, USA.
sas.library/code/ischeffe.sas: copyright holder unknown.
S-Plus: Copyright © 1988, 2002 Insightful Corp.
Stata: Copyright © 1984-2002 Stata Corp.
TeX is a trademark of the American Mathematical Society.
Unix: Copyright © 1998 The Open Group
Windows XP: Copyright © 2001 Microsoft Corporation. All rights reserved.
XLISP-STAT 2.1 Copyright © 1990, by Luke Tierney

ISBN 978-1-4419-2320-2
ISBN 978-1-4757-4284-8 (eBook)
DOI 10.1007/978-1-4757-4284-8

© 2004 Springer Science+Business Media New York
Originally published by Springer Science+Business Media Inc. in 2004.
Softcover reprint of the hardcover 1st edition 2004

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1

SPIN 10935286

In loving memory of Mary Morris Heiberger

To my family: Margaret, Irene, Andrew, and Ben Holland

Preface

1 Audience

Students seeking master's degrees in applied statistics in the late 1960s and 1970s typically took a year-long sequence in statistical methods. Popular choices of course textbook in that period, before high-speed computing and graphics capability became available, were those authored by Snedecor and Cochran, and by Steel and Torrie. By 1980, the topical coverage in these classics failed to include a great many new and important elementary techniques in the data analyst's toolkit. In order to teach the statistical methods sequence with adequate coverage of topics, it became necessary to draw material from each of four or five text sources.
Obviously, such a situation makes life difficult for both students and instructors. In addition, statistics students need to become proficient with at least one high-quality statistical software package.

This book can serve as a standalone text for a contemporary year-long course in statistical methods at a level appropriate for statistics majors at the master's level or for students in other quantitatively oriented disciplines at the doctoral level. The topics include both concepts and techniques developed many years ago and a variety of newer tools not commonly found in textbooks.

This text requires some previous study of mathematics and statistics. We suggest a basic understanding of calculus, including maximization or minimization of functions of one or two variables, and the ability to carry out definite integrations of elementary functions. We recommend knowledge acquired from an earlier statistics course, including a basic understanding of statistical measures, probability distributions, interval estimation, and hypothesis testing.

2 Structure

The book is organized around statistical topics. Each chapter introduces concepts and terminology, develops the rationale for its methods, presents the mathematics and calculations for its methods, and gives examples supported by graphics and computer output, culminating in a writeup of conclusions. Some chapters have greater detail of presentation than others, based on our personal interests and expertise.

Our emphasis on graphical display of data is a distinguishing characteristic of this book. Many of our graphical displays appear here for the first time. Appendix G summarizes those new graphs that are based on Cartesian products. We show graphs, how to construct and interpret them, and how they relate to the tabular outputs that appear automatically when a statistical program "analyzes" a data set. The graphs are not automatic and so must be requested.
Gaining an understanding of a data set is always more easily accomplished by looking at appropriately drawn graphs than by examining tabular summaries. In our opinion, graphs are the heart of most statistical analyses; the corresponding tabular results are formal confirmations of our visual impressions. We advanced this point of view in seminars and presentations (Heiberger, 1998; Heiberger and Holland, 2002, 2003a, 2003b), and so have others, for example (Gelman et al., 2002). A vivid demonstration of it appears in Section 4.2.

We have chosen to work with both of what we believe are the two leading statistical languages available today: S (available as both S-PLUS and R), and SAS. S is an exceptionally well-developed tool for statistical research and analysis, that is, for exploring and designing new techniques of analysis as well as for carrying out analyses. S is especially strong for statistical graphics, the output of data analysis through which both the raw data and the results are displayed for the analyst and the client. SAS is the most widely used package for serious and extensive statistical analysis and data management. Because of our heavy use of graphics as an essential part of most analyses, we make somewhat heavier use of S than SAS. We frequently mention the package name S-PLUS, rather than the language name S, in situations where S-PLUS and R could equally well be used.

Although we do not explicitly teach S-PLUS or SAS, we make the reader aware of their powerful capabilities by using them to perform the data analyses we present. Sections B.5 and C.4 contain our currently recommended references for learning S-PLUS and SAS.

All S-PLUS and SAS code used in the book appears in the companion online files that readers are expected to download from the Springer website (see Preface Section 3). We anticipate that readers will wish to adapt our code to their own data analyses.
The code files used to produce the book's numerous graphs are identified alongside each graph. Readers are encouraged to examine these code files in the online files in order to gain a full understanding of what has been plotted.

We believe that a firm control of the language gives the analyst the tools to think about the ideal way to detect and display the information in the data. We focus our presentation on the written command languages, the most flexible descriptors of the statistical techniques. The written languages provide the opportunity for growth and understanding of the underlying techniques. Point-and-click technology is convenient for routine tasks. However, many interesting data analyses are not routine and therefore cannot be accomplished by pointing and clicking the icons provided by the program developers.

3 Data and Programs

The data for all examples and exercises in this book, and the sample code in both languages [S (meaning S-PLUS and R) and SAS] for all examples and figures, are provided in the accompanying online files (Heiberger and Holland, 2004b). Occasionally we produce listing (output) files that are too big to include in this text. In such situations we place the complete file in the online files and only excerpts in the text. (The collection of directories and files in the online files is distributed from the Springer web page as a downloadable zipped file. Search for "Heiberger Holland". We recommend that readers burn a CD of the unzipped directories for reference and copy the entire directory structure to their hard disk for use. See the file README.HH on the website for details.)

The filename in the online files is given in the text for every code fragment, function, and macro presented. The code and the PostScript file for every figure in the text are in the online files. Transcripts (*.st files for S-PLUS, and *.lst and occasionally *.log files for SAS) are included for code fragments that produce printed output.
The directories are structured by chapter, with three subdirectories for each:

chapter/code/
chapter/transcript/
chapter/figure/

The filename is indicated at the time the example is presented. In addition, there are several directories not associated with specific chapters:

datasets/
splus.library/
sas.library/
software/

All datasets are in the datasets directory. The splus.library and sas.library directories contain general utilities and new analysis and display functions. All our code and examples assume that these libraries are attached. In S-PLUS and R, the libraries are attached by running the .First function described in Appendix B. The .First function must be customized for the individual computer. In SAS, the macros are made available by running the file hh.sas described in Appendix C. The hh.sas file must be customized for the individual computer. Both customizations are simple, and these are the only customizations required. All our functions and input statements are defined relative to the paths defined in these customizations. Once these customizations have been made, all examples in the book work as written, with no changes.

4 Software

We include in the Software Appendix A and the (sftw/code/url.htm) file the URLs for the software we recommend:

• S-PLUS, Insightful's implementation of the S language
• R, the GNU-licensed implementation of the S language
• SAS
• Ghostscript/Ghostview for displaying PostScript graphs
• Emacs, the extensible text editor from the Free Software Foundation
• ESS (Emacs Speaks Statistics), an intelligent environment for statistical analysis
• Springer, the online files for this book are distributed from the Springer website
• LaTeX. We wrote this book in LaTeX (Lamport, 1986), the best mathematical typesetting software (and the one required by several statistics journals), so we provide the URL for that as well.
4.1 Microsoft Windows

We include URLs for

• Cygwin, an implementation of the Unix shell and other user tools for Microsoft Windows
• Standalone utilities (gunzip, gzip, tar) that work in the MS-DOS prompt window
• gnuserv and ispell, utilities that work with Emacs
• MathType fonts, for improved appearance of mathematics written in Microsoft Word.

4.2 Unix

Most of the software listed above is distributed as part of Unix systems and is probably already available on the Unix system you are using. The statistical programs S-PLUS, R, and SAS, and the ESS interface between Emacs and the statistical software will be needed.

5 Exercises

Learning requires that the student work a fair selection of the exercises provided, using, where appropriate, one of the statistical software packages we discuss. Beginning with the exercises in Chapter 5, even when not specifically asked to do so, the student should routinely plot the data in a way that illuminates its structure, state all assumptions made, and discuss their reasonableness.

Acknowledgments

We are indebted to many people for providing us with advice, comments, and assistance with this project. Among them are our editor John Kimmel and the production staff at Springer, our colleagues Francis Hsuan and Byron Jones, our current and former students (particularly Paolo Teles, who coauthored the paper on which Chapter 18 is based, Kenneth Swartz, and Yuo Guo), and Sara R. Heiberger. Each of us gratefully acknowledges the support of a study leave from Temple University. We are also grateful to Insightful Corp. for providing us with current copies of S-PLUS software for ourselves and our students, and to the many professionals who reviewed portions of early drafts of this manuscript.

Contents

Preface
  1 Audience
  2 Structure
  3 Data and Programs
  4 Software
    4.1 Microsoft Windows
    4.2 Unix
  5 Exercises

1 Introduction and Motivation
  1.1 Statistics in Context
  1.2 Examples of Uses of Statistics
    1.2.1 Investigation of Salary Discrimination
    1.2.2 Measuring Body Fat
    1.2.3 Minimizing Film Thickness
    1.2.4 Surveys
    1.2.5 Bringing Pharmaceutical Products to Market
  1.3 The Rest of the Book
    1.3.1 Fundamentals
    1.3.2 Linear Models
    1.3.3 Other Techniques
    1.3.4 New Graphical Display Techniques

2 Data and Statistics
  2.1 Types of Data
  2.2 Data Display and Calculation
    2.2.1 Presentation
    2.2.2 Rounding
  2.3 Importing Data
    2.3.1 S-PLUS
    2.3.2 SAS
    2.3.3 Data Rearrangement
  2.4 Analysis with Missing Data
    2.4.1 Missing Data in S-PLUS
    2.4.2 Missing Data in SAS
  2.5 Tables and Graphs
  2.6 Files for Statistical Analysis and Data Display (HH)
    2.6.1 Datasets
    2.6.2 Code, Transcripts, and Figures
    2.6.3 Functions and Macros
    2.6.4 Software

3 Statistics Concepts
  3.1 A Brief Introduction to Probability
  3.2 Random Variables and Probability Distributions
    3.2.1 Discrete Versus Continuous Probability Distributions
    3.2.2 Displaying Probability Distributions
  3.3 Concepts That Are Used When Discussing Distributions
    3.3.1 Expectation and Variance of Random Variables
    3.3.2 Median of Random Variables
    3.3.3 Symmetric and Skewed Distributions
    3.3.4 Displays of Univariate Data
    3.3.5 Multivariate Distributions - Covariance and Correlation
  3.4 Three Probability Distributions
    3.4.1 The Binomial Distribution
    3.4.2 The Normal Distribution
    3.4.3 The (Student's) t Distribution
  3.5 Sampling Distributions
  3.6 Estimation
    3.6.1 Statistical Models
    3.6.2 Point and Interval Estimators
    3.6.3 Criteria for Point Estimators
    3.6.4 Confidence Interval Estimation
    3.6.5 Example - Confidence Interval on the Mean μ of a Population Having Known Standard Deviation
    3.6.6 Example - One-Sided Confidence Intervals
  3.7 Hypothesis Testing
  3.8 Examples of Statistical Tests
  3.9 Power and Operating Characteristic (O.C.) Curves
  3.10 Sampling
    3.10.1 Simple Random Sampling
    3.10.2 Stratified Random Sampling
    3.10.3 Cluster Random Sampling
    3.10.4 Systematic Random Sampling
    3.10.5 Standard Errors of Sample Means
    3.10.6 Sources of Bias in Samples
  3.11 Exercises

4 Graphs
  4.1 Definition
  4.2 Example - Ecological Correlation
  4.3 Scatterplots
  4.4 Scatterplot Matrix
  4.5 Example - Life Expectancy
  4.6 Scatterplot Matrices - Continued
  4.7 Data Transformations
  4.8 Life Expectancy Example - Continued
  4.9 SAS Graphics
  4.10 Exercises

5 Introductory Inference
  5.1 Normal (z) Intervals and Tests
    5.1.1 Test of a Hypothesis Concerning the Mean of a Population Having Known Standard Deviation
    5.1.2 Confidence Intervals for Unknown Population Proportion p
    5.1.3 Tests on an Unknown Population Proportion p
    5.1.4 Example - One-Sided Hypothesis Test Concerning a Population Proportion
  5.2 t-intervals and Tests for the Mean of a Population Having Unknown Standard Deviation
  5.3 Confidence Interval on the Variance or Standard Deviation of a Normal Population
  5.4 Comparisons of Two Populations Based on Independent Samples
    5.4.1 Confidence Intervals on the Difference Between Two Population Proportions
    5.4.2 Confidence Interval on the Difference Between Two Means
    5.4.3 Tests Comparing Two Population Means When the Samples Are Independent ...