Lecture 4. Prediction and Residual Diagnostics

Prediction

How do we predict in R using a regression model, and how do we construct a prediction interval (PI)?

Example 1. Let us consider the Albuquerque home prices. The linear regression model for PRICE vs. SQFT is given by

    Y_j = 4781.93 + 61.37 X_j + ε_j.    (4.1)

Suppose now that we have a house of 4000 square feet. How much can we roughly expect to pay for this house if we know only its square footage?

> SQFT <- data$SQFT
> PRICE <- data$PRICE
> new <- data.frame(SQFT = c(4000))
> predict(lm(PRICE ~ SQFT), new, se.fit = TRUE, interval = "prediction")
$fit
       fit      lwr      upr
1 250248.7 206227.9 294269.4

$se.fit
[1] 8711.35

$df
[1] 115

$residual.scale
[1] 20445.12

Indeed, 4781.931 + 61.367 * 4000 = 250249.9 USD. The 95% PI is (206227.9, 294269.4). Hence, with 95% probability, we can expect to pay between 206227.9 and 294269.4 USD for this house of 4000 sq. ft.
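As a sanity check, the PI can be reconstructed by hand from the pieces that predict() returns: the standard error for a new observation is sqrt(se.fit^2 + residual.scale^2), and the margin is the 97.5% Student-t critical value with df degrees of freedom times that standard error. A minimal sketch, plugging in the numbers printed above:

```r
# Reconstruct the 95% prediction interval from the predict() output above
fit.val <- 250248.7   # fitted value at SQFT = 4000
se.fit  <- 8711.35    # standard error of the fitted mean
s       <- 20445.12   # residual.scale: estimate of sigma
df      <- 115        # residual degrees of freedom

se.pred <- sqrt(se.fit^2 + s^2)     # SE for predicting a new observation
margin  <- qt(0.975, df) * se.pred  # t critical value times SE
c(lwr = fit.val - margin, upr = fit.val + margin)
# close to (206227.9, 294269.4), up to rounding of the printed inputs
```

The PI is wider than a confidence interval for the mean response because of the extra residual.scale^2 term, which accounts for the noise in a single new observation.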

Residual diagnostics

Recall the Gauss-Markov theorem. We need to check that the residuals ε_i are uncorrelated and homoscedastic. Since we typically also want to construct confidence and prediction intervals (CIs and PIs), which are based on normal critical values, we additionally assume that the residuals are normally distributed.

Note that if the residuals ε_i are i.i.d. N(0, σ²), then all three properties above are automatically satisfied. In this case the ε_i are not only uncorrelated, they are independent, and independence is a much stronger property than uncorrelatedness. While a given model may still have useful predictive value even when these three assumptions are violated, the confidence intervals, prediction intervals, and p-values associated with the t-statistics will generally be incorrect when the assumptions do not hold.

In a residual analysis we attempt to assess the validity of the assumptions by examining the estimated residuals ε_i to see whether they actually behave like i.i.d. N(0, σ²) random variables. Later in the course we will add other procedures based on residual analysis (outliers, lack of fit, influential observations, etc.), but for now we concentrate only on verifying the assumptions of the Gauss-Markov (GM) theorem. We perform our diagnostics as a step-by-step verification of each GM assumption and illustrate it by application to the residuals from the SLR model for the Albuquerque home price data. We shall begin with "by-eye" diagnostics and proceed further with the formal definitions and theoretical background.

Diagnostics "by eye"

A basic technique for investigating the aptness of a model is to analyze the residuals ε_i. If the model is apt, the observed residuals should reflect the three assumptions above. A lot of useful diagnostic information may be obtained from a residual plot.
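The "by-eye" diagnostics can be reproduced in R along the following lines. This is a sketch only: the Albuquerque data file is not reproduced here, so simulated data generated from the coefficients in (4.1) stands in for it.

```r
# Simulated stand-in for the Albuquerque data (the real file is not shown here)
set.seed(1)
SQFT  <- runif(117, 800, 4000)
PRICE <- 4781.93 + 61.37 * SQFT + rnorm(117, sd = 20445)
fit <- lm(PRICE ~ SQFT)
res <- resid(fit)

plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                 # look for a fan shape: heteroscedasticity
plot(seq_along(res), res, type = "b")  # residuals vs. index: trends suggest correlation
qqnorm(res); qqline(res)               # points near the line support normality
```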
What might you notice from the residual plot? A change of variability with the index or with the fitted values indicates that the variance of ε_i is not constant, i.e. heteroscedasticity.
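To see what heteroscedasticity looks like in such a plot, one can simulate data whose error standard deviation grows with the predictor (a purely illustrative example, not the Albuquerque data):

```r
# Illustrative heteroscedastic data: the error sd grows with x
set.seed(2)
x <- 1:200
y <- 2 + 3 * x + rnorm(200, sd = 0.5 * x)
m <- lm(y ~ x)
plot(fitted(m), resid(m), xlab = "Fitted values", ylab = "Residuals")
# the residual cloud fans out to the right as the variance increases
```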
