assump - Lecture 7 Linear Regression Diagnostics BIOST 515...


Lecture 7
Linear Regression Diagnostics
BIOST 515
January 27, 2004

BIOST 515, Lecture 6

Major assumptions

1. The relationship between the outcomes and the predictors is (approximately) linear.
2. The error term $\epsilon$ has zero mean.
3. The error term $\epsilon$ has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed, or we have an adequate sample size to rely on large-sample theory.

We should always check fitted models to make sure that these assumptions have not been violated.

Departures from the underlying assumptions cannot be detected using any of the summary statistics we've examined so far, such as the t or F statistics or $R^2$. In fact, tests based on these statistics may lead to incorrect inference, since they rely on many of the assumptions above.

Residual analysis

The diagnostic methods we'll be exploring are based primarily on the residuals. Recall that the residual is defined as

$$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n,$$

where $\hat{y} = X\hat{\beta}$. If the model is appropriate, it is reasonable to expect the residuals to exhibit properties that agree with the stated assumptions.

Characteristics of residuals

• The mean of the $\{e_i\}$ is 0:
$$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0.$$
• The estimate of the population variance computed from the sample of the $n$ residuals is
$$S^2 = \frac{1}{n-p-1}\sum_{i=1}^{n} e_i^2,$$
which is the residual mean square, $\mathrm{MSE} = \mathrm{SSE}/(n-p-1)$.
• The $\{e_i\}$ are not independent random variables. In general, if the number of residuals ($n$) is large relative to the number of independent variables ($p$), the dependency can be ignored for all practical purposes in an analysis of residuals.

Methods for standardizing residuals

• Standardized residuals
• Studentized residuals
• Jackknife residuals

Standardized residuals

An obvious choice for scaling residuals is to divide them by their estimated standard error. The quantity

$$z_i = \frac{e_i}{\sqrt{\mathrm{MSE}}}$$

is called a standardized residual. Based on the linear regression assumptions, we might expect the $z_i$ to resemble a sample from a $N(0, 1)$ distribution.

Studentized residuals

Using MSE as the variance of the $i$th residual $e_i$ is only an approximation. We can improve the residual scaling by dividing $e_i$ by the standard deviation of the $i$th residual. We can show that the covariance matrix of the residuals is

$$\mathrm{var}(e) = \sigma^2 (I - H).$$

Recall that $H = X(X'X)^{-1}X'$ is the hat matrix. The variance of the $i$th residual is

$$\mathrm{var}(e_i) = \sigma^2 (1 - h_i),$$

where $h_i$ is the $i$th element on the diagonal of the hat matrix and $0 \le h_i \le 1$.

The quantity

$$r_i = \frac{e_i}{\sqrt{\mathrm{MSE}\,(1 - h_i)}}$$

is called a studentized residual and approximately follows a $t$ distribution with $n - p - 1$ degrees of freedom (provided the assumptions stated at the beginning of the lecture are satisfied).

Studentized residuals have a mean near 0 and a variance,

$$\frac{1}{n-p-1}\sum_{i=1}^{n} r_i^2,$$

that is slightly larger than 1. In large data sets, the standardized and studentized residuals should not differ dramatically.

Jackknife residuals

The quantity

$$r_{(-i)} = \frac{e_i}{\sqrt{\mathrm{MSE}_{(-i)}\,(1 - h_i)}} = r_i \sqrt{\frac{\mathrm{MSE}}{\mathrm{MSE}_{(-i)}}} = r_i \sqrt{\frac{(n-p-1)-1}{(n-p-1)-r_i^2}}$$

is called a jackknife residual (or R-Student residual). $\mathrm{MSE}_{(-i)}$ is the residual variance computed with the $i$th observation deleted.

Jackknife residuals have a mean near 0 and a variance

$$\frac{1}{(n-p-1)-1}\sum_{i=1}^{n} r_{(-i)}^2$$

that is slightly greater than 1. Jackknife residuals are usually the preferred residual for regression diagnostics.

How to use residuals for diagnostics?

Residual analysis is usually done graphically.
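Before anything can be plotted, the scaled residuals have to be computed. The following is a minimal sketch of the standardized, studentized, and jackknife residuals as defined above, assuming NumPy is available. The function name `scaled_residuals` and the simulated data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def scaled_residuals(X, y):
    """Return raw, standardized, studentized, and jackknife residuals.

    X is an (n, p) array of predictors WITHOUT the intercept column;
    the intercept is added here, so the residual df is n - p - 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    H = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)    # hat matrix X (X'X)^{-1} X'
    h = np.diag(H)                               # leverages h_i
    e = y - H @ y                                # raw residuals e_i
    df = n - p - 1
    mse = e @ e / df                             # residual mean square
    z = e / np.sqrt(mse)                         # standardized residuals
    r = e / np.sqrt(mse * (1 - h))               # studentized residuals
    r_jack = r * np.sqrt((df - 1) / (df - r**2)) # jackknife residuals
    return e, z, r, r_jack

# Hypothetical simulated data: 50 observations, 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=50)
e, z, r, r_jack = scaled_residuals(X, y)
```

The jackknife residuals are obtained here from the closed-form expression in $r_i$, so no model is refit with observations deleted.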
For a graphical residual analysis, we may look at
• Quantile plots: to assess normality
• Scatterplots: to assess model assumptions, such as constant variance and linearity, and to identify potential outliers
• Histograms, stem-and-leaf diagrams, and boxplots

Quantile-quantile plots

Quantile-quantile plots can be useful for comparing two samples to determine whether they arise from the same distribution. Similarly, we can compare the quantiles of a sample to the quantiles expected if the sample came from some distribution $F$, for a visual assessment of whether the sample arises from $F$. In linear regression, this can help us assess the normality of the residuals (if we have relied on an assumption of normality).

To construct a quantile-quantile plot for the residuals, we plot the quantiles of the residuals against the theoretical quantiles expected if the residuals arose from a normal distribution. If the residuals come from a normal distribution, the plot should resemble a straight line. A straight line connecting the 1st and 3rd quartiles is often added to the plot to aid in visual assessment.
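The construction just described can be sketched numerically. This is only an illustration: the plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here, as are the function names `normal_qq` and `quartile_line`.

```python
import numpy as np
from statistics import NormalDist

def normal_qq(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    sample = np.sort(np.asarray(residuals, dtype=float))
    n = sample.size
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, sample

def quartile_line(sample):
    """Slope and intercept of the line through the 1st and 3rd quartiles."""
    q_s = np.quantile(sample, [0.25, 0.75])
    q_t = [NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)]
    slope = (q_s[1] - q_s[0]) / (q_t[1] - q_t[0])
    intercept = q_s[0] - slope * q_t[0]
    return slope, intercept

# Hypothetical example: residuals that really are N(0, 1).
rng = np.random.default_rng(1)
theo, sample = normal_qq(rng.normal(size=200))
slope, intercept = quartile_line(sample)
```

Plotting `sample` against `theo` with any plotting library, and overlaying the line `intercept + slope * theo`, gives the kind of display described above. For normal residuals the points should fall close to that line.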
[Figure: Samples from a N(0, 1) distribution. Normal Q-Q plot; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]

[Figure: Samples from a skewed distribution. Normal Q-Q plot (x-axis: Theoretical Quantiles, y-axis: Residuals) and histogram of y (x-axis: Residuals, y-axis: Density).]

[Figure: Samples from a heavy-tailed distribution. Normal Q-Q plot (x-axis: Theoretical Quantiles, y-axis: Residuals) and histogram of y (x-axis: Residuals, y-axis: Density).]

[Figure: Samples from a light-tailed distribution. Normal Q-Q plot (x-axis: Theoretical Quantiles, y-axis: Residuals) and histogram of y (x-axis: Residuals, y-axis: Density).]

Scatterplots

Another useful aid for inspection is a scatterplot of the residuals against the fitted values and/or the predictors. These plots can help us identify:
• Non-constant variance
• Violation of the assumption of linearity
• Potential outliers

[Figure: Satisfactory residual plot. x-axis: $\hat{y}_i$, y-axis: $e_i$.]

[Figure: Non-constant variance. x-axis: $\hat{y}_i$, y-axis: $e_i$.]

[Figure: Non-constant variance, second plot. x-axis: $\hat{y}_i$, y-axis: $e_i$.]
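As a rough numerical companion to a residual-versus-fitted scatterplot, one can compare the spread of the residuals across bands of fitted values. The sketch below is not a formal test and is not taken from the lecture. The helper `spread_by_fitted`, the simulated data, and the choice of three bands are all assumptions made for illustration.

```python
import numpy as np

def spread_by_fitted(x, y, groups=3):
    """Fit y ~ 1 + x by least squares and return the residual SD within
    equal-sized groups of observations ordered by fitted value."""
    Xd = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    fitted = Xd @ beta
    resid = y - fitted
    order = np.argsort(fitted)                   # indices from low to high fit
    return [resid[idx].std(ddof=1) for idx in np.array_split(order, groups)]

# Hypothetical data: same mean function, two different error structures.
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=600)
y_const = 1 + 2 * x + rng.normal(scale=2.0, size=600)      # constant variance
y_fan = 1 + 2 * x + rng.normal(scale=0.5 * x, size=600)    # error SD grows with x

sd_const = spread_by_fitted(x, y_const)
sd_fan = spread_by_fitted(x, y_fan)
```

With constant variance the band SDs should be similar. A clear upward or downward trend across bands corresponds to the fan shape seen in a non-constant variance plot.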
Example

Suppose the true relationship between a predictor, $x$, and an outcome, $y$, is

$$E(y_i) = 1 + 2x_i - 0.25x_i^2,$$

but we fit the model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i.$$

Can we diagnose this with residual plots?

Fit of $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

[Figure: Scatterplot of x vs. y. x-axis: x, y-axis: y.]

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -1.7761      0.5618    -3.16    0.0017
x               0.7569      0.1111     6.81    0.0000

[Figure: Normal Q-Q plot of the residuals. x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]

[Figure: Fitted values versus residuals. x-axis: $\hat{y}_i$, y-axis: $e_i$.]

Next, we fit $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$.

              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     1.9824      0.6000     3.30    0.0010
x               1.9653      0.1637    12.00    0.0000
x2             -0.2640      0.0268    -9.85    0.0000

[Figure: Normal Q-Q plot of the residuals. x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]

[Figure: Fitted values versus residuals. x-axis: $\hat{y}_i$, y-axis: $e_i$.]

What if we have more than one predictor and only one is misspecified? So, the true model is

$$E[y_i] = 1 + 3w_i + 2x_i - 0.25x_i^2$$

and we fit

$$y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \epsilon_i.$$

How can we diagnose model misspecification with residual plots in this case?
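One possible way to approach this question, following the earlier suggestion to plot residuals against the predictors as well as the fitted values, is to examine the residuals against each predictor separately. The sketch below simulates the model stated above and summarizes each residual-versus-predictor relationship by its correlation with the centered square of that predictor. The distributions of $w$ and $x$, the unit error variance, and the helper `curvature` are assumptions for illustration, not the lecture's own data or method.

```python
import numpy as np

# Simulate the stated true model (predictor distributions are assumed).
rng = np.random.default_rng(3)
n = 500
w = rng.normal(size=n)
x = rng.uniform(-5, 10, size=n)
y = 1 + 3 * w + 2 * x - 0.25 * x**2 + rng.normal(size=n)

# Fit the misspecified model y ~ 1 + w + x by least squares.
Xd = np.column_stack([np.ones(n), w, x])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta

def curvature(resid, v):
    """Correlation between residuals and the centered square of predictor v.

    A value far from 0 suggests a curved pattern in the plot of residuals
    against v, i.e. a missing nonlinear term in v.
    """
    return np.corrcoef(resid, (v - v.mean()) ** 2)[0, 1]

curv_w = curvature(resid, w)   # expected to be near 0
curv_x = curvature(resid, x)   # expected to be strongly negative
```

In this simulation the curvature shows up against $x$ but not against $w$, which is the pattern a scatterplot of the residuals against each predictor would be expected to reveal.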