Lecture 4. Prediction and Residual Diagnostics

Prediction

How do we predict in R using a regression model, and how do we construct a prediction interval (PI)?

Example 1. Let us consider the Albuquerque home prices. The linear regression model for PRICE vs. SQFT is given by

    Yj = 4781.93 + 61.37 Xj + εj.    (4.1)

Suppose now that we have a house of 4000 square feet. How much can we roughly expect to pay for this house if we know only its square footage?

> SQFT <- data$SQFT
> PRICE <- data$PRICE
> new <- data.frame(SQFT = c(4000))
> predict(lm(PRICE ~ SQFT), new, se.fit = TRUE, interval = "prediction")
$fit
       fit      lwr      upr
1 250248.7 206227.9 294269.4

$se.fit
[1] 8711.35

$df
[1] 115

$residual.scale
[1] 20445.12

Indeed, 4781.931 + 61.367 * 4000 = 250249.9 USD (the small discrepancy with $fit is due to rounding of the coefficients). The 95% PI is (206227.9, 294269.4). Hence, with 95% probability, we can expect to pay between 206227.9 and 294269.4 USD for this house of 4000 sq. ft.

Residual diagnostics

Recall the Gauss–Markov theorem. We need to check that the residuals εi are:
- uncorrelated,
- homoscedastic.

As we typically also want to construct confidence and prediction intervals (CI and PI), which are based on normal critical values, we additionally assume that the residuals are normally distributed. Note that if the residuals εi are i.i.d. N(0, σ²), then all three properties above are automatically satisfied. In this case the εi are not only uncorrelated, they are independent; independence is a much stronger property than uncorrelatedness. While a given model may still have useful predictive value even when these three assumptions are violated, the confidence intervals, prediction intervals, and p-values associated with the t-statistics will generally be incorrect when the assumptions do not hold. In a residual analysis we attempt to assess the validity of the assumptions by examining the estimated residuals ε̂i to see whether they actually behave like i.i.d. N(0, σ²) random variables.
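To see what R computes behind the scenes in `predict(..., interval = "prediction")`, here is a minimal sketch in Python (numpy/scipy) of the SLR prediction-interval formula. The data below are synthetic and only illustrative — this is not the Albuquerque data set, so the numbers will not match the R output above.

```python
import numpy as np
from scipy import stats

def slr_prediction_interval(x, y, x0, level=0.95):
    """Fit y = b0 + b1*x by least squares and return the point
    prediction and the prediction interval for a NEW observation at x0
    (the same quantities R's predict(..., interval="prediction") reports)."""
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)          # residual variance, df = n - 2
    yhat0 = b0 + b1 * x0
    # PI standard error has the extra "+1" for the new observation's noise
    se_pred = np.sqrt(s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    tcrit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    return yhat0, yhat0 - tcrit * se_pred, yhat0 + tcrit * se_pred

# Synthetic data roughly mimicking the fitted model (4.1); illustrative only.
rng = np.random.default_rng(0)
x = rng.uniform(1000, 5000, size=117)
y = 4800 + 61 * x + rng.normal(0, 20000, size=117)
fit, lwr, upr = slr_prediction_interval(x, y, 4000.0)
```

The "+1" inside the square root is what distinguishes a prediction interval from a confidence interval for the mean response: it accounts for the noise of the single future observation, which is why PIs are always wider.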
Later in the course we will add other procedures based on residual analysis (outliers, lack-of-fit, influential observations, etc.), but for now we concentrate only on verifying the assumptions of the Gauss–Markov (GM) theorem. We perform our diagnostic analysis as a step-by-step verification of each GM assumption and illustrate it with the residuals from the SLR model for the Albuquerque home price data. We begin with by-eye diagnostics and then proceed to the formal definitions and theoretical background.

Diagnostics by eye

A basic technique for investigating the aptness of a model is to analyze the residuals ε̂i. If the model is apt, the observed residuals should reflect the three assumptions above. A lot of useful diagnostic information can be obtained from a residual plot.
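As a numerical companion to the by-eye residual plots, the sketch below (Python/numpy/scipy, synthetic data — an assumption of this example, not part of the course material) computes three quick checks on the estimated residuals: their mean (exactly zero for OLS with an intercept), a Shapiro–Wilk normality p-value, and the Durbin–Watson statistic, which is close to 2 when residuals are uncorrelated.

```python
import numpy as np
from scipy import stats

def residual_diagnostics(x, y):
    """Fit an SLR model by least squares and return numerical
    residual diagnostics: (mean residual, Shapiro-Wilk p-value,
    Durbin-Watson statistic)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Durbin-Watson: sum of squared successive differences / sum of squares
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    _, shapiro_p = stats.shapiro(resid)       # tests normality of residuals
    return resid.mean(), shapiro_p, dw

# Synthetic data satisfying the GM assumptions (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 + 3 * x + rng.normal(0, 1, 200)
m, p, dw = residual_diagnostics(x, y)
```

These statistics do not replace the residual plots — a curved pattern or a funnel shape is often easier to spot by eye than to detect with a single test statistic — but they give a useful second opinion.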
This note was uploaded on 01/12/2012 for the course STAT 331 taught by Professor Yulia Gel during the Spring '08 term at Waterloo.