# lecture4 - ISYE6414 Summer 2010 Lecture 4 The Linear Model...

ISYE6414 Summer 2010, Lecture 4
The Linear Model: OLS Regression, Some Diagnostics and Etc.
Dr. Kobi Abayomi
June 3, 2010

## 1 Diagnostics

The theme of diagnostics in the linear regression model is departures from the assumptions, which are chiefly: normality of the error, constancy of the error variance (homoscedasticity), and independence of the error. Departures also show up as non-linearity of the regression function and as the exclusion of latent predictors.

The first thing you should always do is make plots of the data! Remember that a linear regression is a linear model: in many cases the relationship in the data is not linear, or not linear for all of the data.

### 1.1 Checking Observations: Predictors and Response

Any data point that is far from the majority of the data (in both x and y) is called an outlier. An outlier is an observation that is unusually small or large.

Remember that our estimates for the regression parameters came from our attempt to fit the mean line through the y's at each value of x. Data points that are far from the mean of the x's are called leverage points. A data point that is far from the mean of both the y's and the x's is often influential and can change the value of the estimated parameters significantly.

The upshot: often a regression line will differ greatly depending upon what subsets of the data are included. Sometimes there are good reasons for excluding subsets (there were errors in the data entry; there were errors in the experiment). Sometimes the outlier belongs in the data. Outliers should always be examined.
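The distinction between leverage and influence can be checked numerically. The following is a minimal sketch, not part of the original lecture, using the shoe size and IQ data from the lecture's R example. It assumes base R's `hatvalues()` and `cooks.distance()` as the leverage and influence measures, and the `2p/n` leverage cutoff is a common rule of thumb rather than something the notes prescribe.

```r
# Shoe size / IQ data from the lecture's example, including the two added points
# (observation 10 is (20, 115); observation 11 is (15, 140)).
shoesize <- c(8.25, 8.5, 12, 12, 7.75, 12.5, 12, 9.5, 9.7, 20, 15)
IQ       <- c(90, 90, 110, 120, 80, 130, 125, 95, 100, 115, 140)
fit <- lm(IQ ~ shoesize)

# Leverage: in simple regression it grows with the squared distance
# of each x from the mean of the x's.
lev <- hatvalues(fit)

# Cook's distance: summarizes how much the fitted values move
# when a single observation is dropped.
cooks <- cooks.distance(fit)

# Assumed rule of thumb (not from the lecture): flag leverage above 2p/n.
p <- length(coef(fit))
n <- length(IQ)
high_leverage <- which(lev > 2 * p / n)

print(round(cbind(shoesize, IQ, leverage = lev, cooks = cooks), 3))
print(high_leverage)
```

Under these assumptions, the observation with shoe size 20 is the only one flagged for high leverage and also has the largest Cook's distance, while the point (15, 140) is far out in y but has only moderate leverage.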

### 1.2 In R

    ### outlier example
    ### the original shoe size / IQ data
    IQ <- c(90, 90, 110, 120, 80, 130, 125, 95, 100)
    shoesize <- c(8.25, 8.5, 12, 12, 7.75, 12.5, 12, 9.5, 9.7)

    ### to the original shoe IQ data
    ### I add an outlier that is influential
    shoesize1 <- c(shoesize, 20)
    IQ1 <- c(IQ, 115)

    ### to the original shoe IQ data
    ### I add an outlier that is not influential
    shoesize2 <- c(shoesize, 15)
    IQ2 <- c(IQ, 140)

    ### all of the data with the affected regression lines
    shoesize3 <- c(shoesize, 20, 15)
    IQ3 <- c(IQ, 115, 140)
    shoeIQdata <- as.data.frame(cbind(shoesize3, IQ3))
    names(shoeIQdata) <- c("shoesize", "IQ")

    ### regline1: original data; regline2: plus the influential point (row 10);
    ### regline3: plus the non-influential point (row 11) only
    regline1 <- lm(IQ ~ shoesize, data = shoeIQdata[1:9, ])
    regline2 <- lm(IQ ~ shoesize, data = shoeIQdata[1:10, ])
    regline3 <- lm(IQ ~ shoesize, data = shoeIQdata[-10, ])

    plot(shoesize3, IQ3, type = "n")
    points(shoesize3[1:9], IQ3[1:9], col = "black")
    points(shoesize3[10], IQ3[10], col = "red")
    points(shoesize3[11], IQ3[11], col = "green")
    lines(shoesize3[1:9], regline1$fitted, col = "black", lwd = 2)
    lines(shoesize3[1:10], regline2$fitted, col = "red", lwd = 2)
    lines(shoesize3[-10], regline3$fitted, col = "green", lwd = 2)

Some diagnostic plots you can use:

Dotplot

    par(mfrow = c(2, 1))
    dotchart(shoeIQdata[, 1], main = "shoesize-predictor")
    dotchart(shoeIQdata[, 2], main = "IQ-response")

Stem and Leaf plot

    stem(shoeIQdata[, 1])
    stem(shoeIQdata[, 2])

Box Plot

    boxplot(shoeIQdata[, 1], main = "shoesize-predictor")
    boxplot(shoeIQdata[, 2], main = "IQ-response", horizontal = TRUE)
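Beyond plotting the three regression lines, the effect of each added point can be read directly from the fitted coefficients. The sketch below is an illustrative addition, not part of the original notes. It rebuilds the same three fits, and the list names (`original`, `with_point_10`, `with_point_11`) and the `slope_shift` summary are hypothetical labels chosen here.

```r
# Same data as the lecture's example: rows 1-9 are the original data,
# row 10 is the added point (20, 115), row 11 is the added point (15, 140).
shoesize <- c(8.25, 8.5, 12, 12, 7.75, 12.5, 12, 9.5, 9.7, 20, 15)
IQ       <- c(90, 90, 110, 120, 80, 130, 125, 95, 100, 115, 140)
shoeIQ   <- data.frame(shoesize = shoesize, IQ = IQ)

# Refit on the three subsets used for the black, red and green lines.
fits <- list(
  original      = lm(IQ ~ shoesize, data = shoeIQ[1:9, ]),
  with_point_10 = lm(IQ ~ shoesize, data = shoeIQ[1:10, ]),
  with_point_11 = lm(IQ ~ shoesize, data = shoeIQ[-10, ])
)

# One row per fit, columns "(Intercept)" and "shoesize".
coef_table <- t(sapply(fits, coef))
print(round(coef_table, 2))

# Absolute change in slope relative to the fit on the original nine points.
slope_shift <- abs(coef_table[, "shoesize"] - coef_table["original", "shoesize"])
print(round(slope_shift, 2))
```

The slope moves far more when the point at shoe size 20 is included than when the point (15, 140) is, which matches the lecture's labeling of the first as influential and the second as not.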
