Introduction to regression lecture notes

Introduction to regression
Adam J. Rothman
April 12, 2011

Contents

1 Introduction
  1.1 Definitions
  1.2 Introductory example
    1.2.1 Exploring the dataset
    1.2.2 Modeling
  1.3 Using the model to predict
  1.4 Hypothesis testing
2 Simple linear regression
  2.1 Definition of the model
  2.2 Estimation of parameters
  2.3 Example: predicting satisfaction with garlic content
  2.4 Inference about the slope β1 in simple linear regression
    2.4.1 Assumptions and formula for the t-test for the slope β1
    2.4.2 Using R to perform the calculations
  2.5 Example: standardized test scores
  2.6 Generating data from the simple linear regression model
3 Polynomial regression
  3.1 Definition of the model
  3.2 Estimation of parameters
  3.3 Inference about the order of the polynomial population regression function using the f-test
    3.3.1 Assumptions and formula for the level α f-test
    3.3.2 How to select d
    3.3.3 Selecting the appropriate polynomial order for the corn data by hypothesis testing
    3.3.4 Revisiting the standardized test score data
  3.4 Relating the t-test and f-test
  3.5 Goodness of fit
4 Appendix

1 Introduction

1.1 Definitions

In one-way ANOVA, we modeled a response in J subpopulations defined by levels of a factor. Using realizations of J independent random samples (one from each subpopulation distribution), we tested H0: all J subpopulation means are equal versus Ha: at least one subpopulation mean is different. Regression is an extension of this idea, where the factor (categorical characteristic) is replaced with one or more "predictors" (numerical characteristics)....
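
In symbols, the comparison described above can be sketched as follows. The notation is an assumption for illustration and is not taken from the notes: μ_j stands for the mean of subpopulation j, and the second display uses the simple linear regression form, with intercept β0 and slope β1, as one instance of a model with a single numerical predictor x.

```latex
% One-way ANOVA hypotheses; \mu_j is the mean of subpopulation j (assumed notation)
\[
  H_0 : \mu_1 = \mu_2 = \cdots = \mu_J
  \quad \text{versus} \quad
  H_a : \mu_j \neq \mu_k \ \text{for some } j \neq k .
\]

% Regression with one numerical predictor x: the mean response is a function
% of x instead of a separate constant for each factor level (assumed form)
\[
  E(Y \mid x) = \beta_0 + \beta_1 x .
\]
```

Under this reading, ANOVA estimates one mean per factor level, while the regression model ties the mean response at every value of x to the two parameters β0 and β1.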
