# Lecture R: Simple Linear Regression (Stat 350)


Lecture R: Simple Linear Regression (Sections 3.3 and 11.1)

**Example Data:** For a science project, a student wanted to examine the effects of alcohol on performance. The student trained mice to run a maze. Once all mice were proficient at running the maze, the student randomly assigned each mouse a different dose of alcohol and timed it running the maze. The results are given in the table and plotted below.

| Alcohol dose (x) | Time to run maze (y) |
|------------------|----------------------|
| 0.85             | 6.80                 |
| 1.36             | 8.69                 |
| 3.12             | 13.70                |
| 4.64             | 12.19                |
| 5.60             | 16.64                |

[Scatterplot of y (time to run maze) versus x (alcohol dose)]

- x is the explanatory variable (a.k.a. independent or predictor variable)
- y is the response variable (a.k.a. dependent variable)

**Least Squares Regression Line:** $\hat{y} = a + bx$, where

$$b = \frac{\sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}} = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

**Example Continued:**

| i   | $x_i$ | $y_i$ | $x_i^2$ | $y_i^2$ | $x_i y_i$ |
|-----|-------|-------|---------|---------|-----------|
| 1   | 0.85  | 6.80  | 0.723   | 46.240  | 5.780     |
| 2   | 1.36  | 8.69  | 1.850   | 75.516  | 11.818    |
| 3   | 3.12  | 13.70 | 9.734   | 187.690 | 42.744    |
| 4   | 4.64  | 12.19 | 21.530  | 148.596 | 56.562    |
| 5   | 5.60  | 16.64 | 31.360  | 276.890 | 93.184    |
| SUM | 15.57 | 58.02 | 65.196  | 734.932 | 210.088   |

$$b = \frac{210.088 - \frac{(15.57)(58.02)}{5}}{65.196 - \frac{(15.57)^2}{5}} = \frac{29.4137}{16.7110} \approx 1.7601$$

$$a = \bar{y} - b\bar{x} = 11.604 - (1.7601)(3.114) \approx 6.1230$$

(Knapp, Stat 350, Spring 2009, Lecture R: Simple Linear Regression, page 1)

[Scatterplot of y versus x with the fitted regression line]

Predicted or fitted values are the values $\hat{y}_i$ for each $x_i$ if the points fell on the regression line (i.e., $\hat{y}_i = a + bx_i$). Residuals are the vertical deviations from the line: the difference between the observed value and the predicted/fitted value (the $i$th residual $= y_i - \hat{y}_i$). A residual is positive (+) if the observed point is above the regression line and negative (−) if the observed point is below the regression line.
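The slope and intercept formulas above can be cross-checked outside of SAS. Below is a minimal Python sketch (not part of the original notes; variable names are my own) that applies the $SS_{xy}/SS_{xx}$ formulas to the mouse data:

```python
# Cross-check of the hand computation of the least-squares line (sketch,
# not from the original notes).
x = [0.85, 1.36, 3.12, 4.64, 5.60]     # alcohol dose
y = [6.80, 8.69, 13.70, 12.19, 16.64]  # time to run maze
n = len(x)

# Computational forms of SS_xy and SS_xx from the lecture
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

b = Sxy / Sxx                      # slope
a = sum(y) / n - b * sum(x) / n    # intercept: ybar - b * xbar

print(round(b, 5), round(a, 5))    # → 1.76013 6.12296
```

These match the SAS parameter estimates reported later in the notes (intercept 6.12296, slope 1.76013).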
**Predicted/Fitted Values:**

| i   | $x_i$ | $y_i$ | $\hat{y}_i = 6.12293 + 1.76014\,x_i$ | residual $(y_i - \hat{y}_i)$ | squared residual $(y_i - \hat{y}_i)^2$ |
|-----|-------|-------|---------|--------|-------|
| 1   | 0.85  | 6.80  | 7.619   | −0.819 | 0.671 |
| 2   | 1.36  | 8.69  | 8.517   | 0.173  | 0.030 |
| 3   | 3.12  | 13.70 | 11.615  | 2.085  | 4.349 |
| 4   | 4.64  | 12.19 | 14.290  | −2.100 | 4.410 |
| 5   | 5.60  | 16.64 | 15.980  | 0.660  | 0.436 |
| SUM | 15.57 | 58.02 | 58.020  | 0.000  | 9.896 |

Find the Pearson correlation coefficient for this data:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}} = \frac{29.4137}{\sqrt{(16.7110)(61.6677)}} \approx 0.916$$

**Assessing the Fit of the Least Squares Line: Regression Sums of Squares**

Total Sum of Squares (SSTo):

$$SSTo = \sum (y_i - \bar{y})^2 = SS_{yy} = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$

Residual Sum of Squares (SSResid), also called Error Sum of Squares (SSE). This is the measure of the variation in y **not** explained by the linear relationship between x and y:

$$SSResid = SSE = \sum (y_i - \hat{y}_i)^2 = SSTo - b\, SS_{xy}$$

Regression Sum of Squares (SSReg). This is the measure of the variation in y that **is** explained by the linear relationship between x and y:

$$SSReg = \sum (\hat{y}_i - \bar{y})^2 = SSTo - SSResid$$

**Partitioning the Sums of Squares:**

$$SSTo = SSReg + SSE, \qquad \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

[Scatterplot with the regression line, illustrating the three deviations for a single point]

Note: for a given point,

$$\underbrace{(y_i - \bar{y})}_{\text{deviation from the mean}} = \underbrace{(\hat{y}_i - \bar{y})}_{\text{deviation of fitted value around the mean}} + \underbrace{(y_i - \hat{y}_i)}_{\text{deviation from the fitted line (residual)}}$$

SSResid/SSTo is the proportion of the total variation that is NOT EXPLAINED by the linear relationship between x and y. SSReg/SSTo is the proportion of the total variation that IS EXPLAINED by the linear relationship.

**Coefficient of Determination $r^2$:**

$$r^2 = 1 - \frac{SSResid}{SSTo} = \frac{SSReg}{SSTo}$$

This is the proportion of the variation in y that can be explained by the linear relationship between x and y. $r^2$ is the square of the Pearson correlation coefficient $r$.
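The sums-of-squares partition for the mouse data can be verified numerically. A short Python sketch (my own cross-check, not from the notes; the intercept and slope are the values computed earlier):

```python
# Verify SSTo = SSReg + SSResid and compute r^2 for the mouse data (sketch).
x = [0.85, 1.36, 3.12, 4.64, 5.60]
y = [6.80, 8.69, 13.70, 12.19, 16.64]
n = len(x)
a, b = 6.12296, 1.76013            # intercept and slope from the hand computation

yhat = [a + b * xi for xi in x]    # fitted values
ybar = sum(y) / n

SSTo = sum((yi - ybar) ** 2 for yi in y)                   # total SS
SSResid = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # error SS
SSReg = sum((yh - ybar) ** 2 for yh in yhat)               # regression SS

r2 = 1 - SSResid / SSTo
print(round(SSTo, 3), round(SSResid, 3), round(SSReg, 3), round(r2, 4))
# → 61.668 9.896 51.772 0.8395
```

These agree with the squared-residual column above (sum 9.896) and with the SAS ANOVA table and R-Square shown later.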
**Standard Deviation about the Least Squares Line, $s_e$:**

$$s_e = \sqrt{\frac{SSResid}{n - 2}}$$

**Regression in SAS:**

```sas
data mice;
   input x_alcohol y_mazetime;
   cards;
0.85 6.80
1.36 8.69
3.12 13.70
4.64 12.19
5.60 16.64
;
run;

proc reg data=mice;
   model y_mazetime = x_alcohol;
   plot y_mazetime * x_alcohol;
run;
```

SAS output:

```
The REG Procedure
Model: MODEL1
Dependent Variable: y_mazetime

Number of Observations Read    5
Number of Observations Used    5

                     Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1         51.77193      51.77193     15.70   0.0287
Error              3          9.89579       3.29860
Corrected Total    4         61.66772

Root MSE          1.81620    R-Square    0.8395
Dependent Mean   11.60400    Adj R-Sq    0.7860
Coeff Var        15.65153

                     Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              6.12296          1.60431      3.82     0.0316
x_alcohol    1              1.76013          0.44429      3.96     0.0287
```

[SAS plot of y_mazetime versus x_alcohol with the fitted line y_mazetime = 6.123 + 1.7601 x_alcohol; N = 5, Rsq = 0.8395, Adj Rsq = 0.7860, RMSE = 1.8162]

**Obtaining Predicted Values and Residuals with SAS:**

```sas
proc reg data=mice;
   model y_mazetime = x_alcohol;
   output out=miceout p=timepred r=resid;
run;

proc print data=miceout;
run;
```

Output:

```
Obs   x_alcohol   y_mazetime   timepred      resid
  1        0.85         6.80     7.6191   -0.81907
  2        1.36         8.69     8.5167    0.17327
  3        3.12        13.70    11.6146    2.08544
  4        4.64        12.19    14.2900   -2.09996
  5        5.60        16.64    15.9797    0.66032
```

**Simple Linear Regression Models**

In simple linear regression we assume the following model underlies the data: $y = \alpha + \beta x + e$.

- $\alpha$ and $\beta$ are parameters (constants) representing the y-intercept and slope of the true/population regression line.
- $e$ represents the deviation from the line.
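The $s_e$ formula can be checked against the Root MSE that SAS reports. A small Python sketch (my own, not from the notes), using the residuals from the SAS output:

```python
import math

# s_e = sqrt(SSResid / (n - 2)): residual standard deviation (sketch).
# SAS reports this same quantity as "Root MSE".
resid = [-0.81907, 0.17327, 2.08544, -2.09996, 0.66032]
n = len(resid)

SSResid = sum(r ** 2 for r in resid)
se = math.sqrt(SSResid / (n - 2))   # n - 2 because two parameters (a, b) were estimated

print(round(se, 4))                 # → 1.8162, matching SAS's Root MSE 1.81620
```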
We assume that the deviations for each observation are independent of each other, and that each has a Normal distribution with mean 0 ($\mu_e = 0$) and standard deviation $\sigma$ ($\sigma_e = \sigma$).

- Thus, for a given value of x, the corresponding value of y has a Normal distribution with mean $\alpha + \beta x$ and standard deviation $\sigma$.

[Figure: Normal distributions of y, each centered on the population regression line, at several values of x]

Least squares regression line: $\hat{y} = a + bx$

- $a$ is a point estimate of the parameter $\alpha$
- $b$ is a point estimate of the parameter $\beta$
- $s_e$ is a point estimate of the parameter $\sigma$
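The model assumptions can be illustrated by simulation: generate data from $y = \alpha + \beta x + e$ with $e \sim N(0, \sigma)$ and check that $a$, $b$, and $s_e$ recover the parameters. The sketch below is my own illustration (not from the notes); the values of $\alpha$, $\beta$, $\sigma$ are made up:

```python
import math
import random

# Simulate the simple linear regression model and estimate its parameters
# (illustrative sketch; alpha, beta, sigma are arbitrary choices).
random.seed(1)
alpha, beta, sigma = 6.0, 1.8, 1.5
n = 500

x = [random.uniform(0, 6) for _ in range(n)]
y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]  # y = a + b*x + e

xbar, ybar = sum(x) / n, sum(y) / n
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)

b = Sxy / Sxx                        # point estimate of beta
a = ybar - b * xbar                  # point estimate of alpha
SSResid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(SSResid / (n - 2))    # point estimate of sigma

print(round(a, 2), round(b, 2), round(se, 2))  # close to 6.0, 1.8, 1.5
```

With a larger n the estimates cluster more tightly around the true parameters, which is the sense in which $a$, $b$, $s_e$ are point estimates of $\alpha$, $\beta$, $\sigma$.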

## This note was uploaded on 02/16/2010 for the course MA 350 taught by Professor Sellke during the Spring '10 term at Purdue.
