Lecture 7
Linear Regression Diagnostics
BIOST 515
January 27, 2004
BIOST 515, Lecture 6

Major assumptions
1. The relationship between the outcomes and the predictors is
(approximately) linear.
2. The error term has zero mean.
3. The error term has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed or we have an adequate
sample size to rely on large sample theory.
We should always check fitted models to make sure that these
assumptions have not been violated.
Departures from the underlying assumptions cannot be detected using any of the summary statistics we’ve examined so far, such as the t or F statistics or $R^2$. In fact, tests based on these statistics may lead to incorrect inference, since they are based on many of the assumptions above.

Residual analysis
The diagnostic methods we’ll be exploring are based primarily
on the residuals. Recall that the residual is defined as
$$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n,$$
where $\hat{y} = X\hat{\beta}$.
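As a concrete illustration, here is a minimal Python sketch (standard library only) that fits a straight line with one predictor (so p = 1), forms the residuals $e_i = y_i - \hat{y}_i$, and checks two of the characteristics of residuals listed below: they average to zero, and MSE = SSE/(n − p − 1). The data values and the helper name `simple_ols` are hypothetical, chosen only for illustration.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (one predictor, so p = 1)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]

b0, b1 = simple_ols(x, y)
fitted = [b0 + b1 * xi for xi in x]             # y-hat = X beta-hat
resid = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - y-hat_i

n, p = len(x), 1
sse = sum(e ** 2 for e in resid)
mse = sse / (n - p - 1)                         # residual mean square, SSE / (n - p - 1)

print(f"mean residual = {sum(resid) / n:.1e}")  # zero up to floating-point rounding
print(f"MSE = {mse:.4f}")
```

The zero mean relies on the model containing an intercept; without one, the residuals need not sum to zero.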
If the model is appropriate, it is reasonable to expect the residuals to exhibit properties that agree with the stated assumptions.

Characteristics of residuals
• The mean of the $\{e_i\}$ is 0 (when the model includes an intercept):
$$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0.$$
• The estimate of the population variance computed from the sample of the n residuals is
$$S^2 = \frac{1}{n - p - 1}\sum_{i=1}^{n} e_i^2,$$
which is the residual mean square, $MSE = SSE/(n - p - 1)$.
• The $\{e_i\}$ are not independent random variables. In general, if the number of residuals (n) is large relative to the number of independent variables (p), the dependency can be ignored for all practical purposes in an analysis of residuals.

Methods for standardizing residuals
• Standardized residuals
• Studentized residuals
• Jackknife residuals

Standardized residuals
An obvious choice for scaling residuals is to divide them by
their estimated standard error. The quantity
$$z_i = \frac{e_i}{\sqrt{MSE}}$$
is called a standardized residual. Based on the linear regression assumptions, we might expect the $z_i$'s to resemble a sample from a N(0, 1) distribution.

Studentized residuals
Using MSE as the variance of the ith residual ei is only an
approximation. We can improve the residual scaling by dividing
ei by the standard deviation of the ith residual. We can show
that the covariance matrix of the residuals is
$$\operatorname{var}(e) = \sigma^2 (I - H).$$
Recall that $H = X(X'X)^{-1}X'$ is the hat matrix. The variance of the ith residual is
$$\operatorname{var}(e_i) = \sigma^2 (1 - h_i),$$
where $h_i$ is the ith element on the diagonal of the hat matrix and $0 \le h_i \le 1$.
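To make the hat matrix concrete, the sketch below builds H for the special case of an intercept plus one predictor, where a standard closed form is $h_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x}) / \sum_k (x_k - \bar{x})^2$. It checks that each leverage $h_i$ lies in [0, 1], that the leverages sum to p + 1 (the trace of H), and that H is idempotent, which is what makes $\operatorname{var}(e) = \sigma^2(I - H)$. The x values and the function name `hat_matrix_simple` are hypothetical.

```python
def hat_matrix_simple(x):
    """Hat matrix H = X (X'X)^{-1} X' for the design X = [1, x] (intercept + one predictor).

    For this design, h_ij = 1/n + (x_i - xbar)(x_j - xbar) / Sxx.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]


# Hypothetical predictor values; the last point lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
n, p = len(x), 1
H = hat_matrix_simple(x)
h = [H[i][i] for i in range(n)]  # leverages h_i, the diagonal of H

# H is symmetric and idempotent (HH = H), so I - H is too, and
# var(e) = (I - H) sigma^2 I (I - H)' = sigma^2 (I - H).
HH = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
max_err = max(abs(HH[i][j] - H[i][j]) for i in range(n) for j in range(n))

print("leverages:", [round(hi, 3) for hi in h])
print("sum of leverages:", round(sum(h), 10))  # equals p + 1 = 2
print("max |HH - H|:", max_err)
```

The point far from $\bar{x}$ gets the largest leverage, so by $\operatorname{var}(e_i) = \sigma^2(1 - h_i)$ its residual has the smallest variance.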
The quantity
$$r_i = \frac{e_i}{\sqrt{MSE\,(1 - h_i)}}$$
is called a studentized residual and approximately follows a t distribution with n − p − 1 degrees of freedom (provided the assumptions stated at the beginning of the lecture are satisfied). Studentized residuals have a mean near 0 and a variance,
$$\frac{1}{n - p - 1}\sum_{i=1}^{n} r_i^2,$$
that is slightly larger than 1. In large data sets, the standardized and studentized residuals should not differ dramatically.
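A minimal sketch comparing the two scalings for a straight-line fit (p = 1) on hypothetical data. Because $1 - h_i \le 1$, each $|r_i|$ is at least $|z_i|$, with the largest inflation at high-leverage points; the sum of the $z_i^2$ divided by n − p − 1 is exactly 1, while the corresponding quantity for the $r_i^2$ is at least 1. The function name `standardized_and_studentized` is illustrative.

```python
import math


def standardized_and_studentized(x, y):
    """Return (z, r, h) for a straight-line least-squares fit y = b0 + b1 * x."""
    n, p = len(x), 1
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(ei ** 2 for ei in e) / (n - p - 1)
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]              # leverages
    z = [ei / math.sqrt(mse) for ei in e]                           # standardized
    r = [ei / math.sqrt(mse * (1 - hi)) for ei, hi in zip(e, h)]    # studentized
    return z, r, h


# Hypothetical data; the last observation has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [1.8, 4.2, 5.9, 8.3, 9.7, 12.4, 13.8, 27.5]
z, r, h = standardized_and_studentized(x, y)
df = len(x) - 1 - 1  # n - p - 1

for zi, ri, hi in zip(z, r, h):
    print(f"h = {hi:.3f}   z = {zi:+.3f}   r = {ri:+.3f}")
print("sum(z^2)/(n-p-1) =", sum(zi ** 2 for zi in z) / df)  # 1 up to rounding
print("sum(r^2)/(n-p-1) =", sum(ri ** 2 for ri in r) / df)  # at least 1
```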
Jackknife residuals
The quantity
$$r_{(-i)} = r_i\sqrt{\frac{MSE}{MSE_{(-i)}}} = \frac{e_i}{\sqrt{MSE_{(-i)}(1 - h_i)}} = r_i\sqrt{\frac{(n - p - 1) - 1}{(n - p - 1) - r_i^2}}$$
is called a jackknife residual (or RStudent residual). $MSE_{(-i)}$ is the residual variance computed with the ith observation deleted. Jackknife residuals have a mean near 0 and a variance
$$\frac{1}{(n - p - 1) - 1}\sum_{i=1}^{n} r_{(-i)}^2$$
that is slightly greater than 1. Jackknife residuals are usually the preferred residual for regression diagnostics.
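The last form of the jackknife residual means it can be computed from the studentized residual of a single fit, without refitting the model n times. The sketch below (straight-line fit, p = 1, hypothetical data with one planted outlier) computes $r_{(-i)}$ both ways: from that closed form, and by actually deleting each observation, refitting, and using $MSE_{(-i)}$ with (n − 1) − p − 1 degrees of freedom. The two should agree up to rounding. All function names are illustrative.

```python
import math


def fit_line(x, y):
    """Least-squares (b0, b1, xbar, Sxx) for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1, xbar, sxx


def jackknife_closed_form(x, y, p=1):
    """r_(-i) = r_i * sqrt(((n-p-1) - 1) / ((n-p-1) - r_i^2)), from one fit."""
    n = len(x)
    b0, b1, xbar, sxx = fit_line(x, y)
    df = n - p - 1
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(ei ** 2 for ei in e) / df
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    r = [ei / math.sqrt(mse * (1 - hi)) for ei, hi in zip(e, h)]
    return [ri * math.sqrt((df - 1) / (df - ri ** 2)) for ri in r]


def jackknife_by_deletion(x, y, p=1):
    """r_(-i) = e_i / sqrt(MSE_(-i) (1 - h_i)), refitting without observation i."""
    n = len(x)
    b0, b1, xbar, sxx = fit_line(x, y)
    out = []
    for i in range(n):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1, _, _ = fit_line(xs, ys)
        sse_i = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(xs, ys))
        mse_i = sse_i / ((n - 1) - p - 1)
        e_i = y[i] - (b0 + b1 * x[i])           # residual from the full fit
        h_i = 1 / n + (x[i] - xbar) ** 2 / sxx  # leverage from the full fit
        out.append(e_i / math.sqrt(mse_i * (1 - h_i)))
    return out


# Hypothetical data; the observation at x = 8 is a planted outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 3.8, 6.4, 7.6, 10.3, 11.7, 14.2, 19.5, 18.1, 20.4]
fast = jackknife_closed_form(x, y)
slow = jackknife_by_deletion(x, y)
for xi, a, b in zip(x, fast, slow):
    print(f"x = {xi:4.1f}   closed form = {a:+.4f}   by deletion = {b:+.4f}")
```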
How to use residuals for diagnostics?
Residual analysis is usually done graphically. We may look
at
• Quantile plots: to assess normality
• Scatterplots: to assess model assumptions, such as constant
variance and linearity, and to identify potential outliers
• Histograms, stem-and-leaf diagrams, and boxplots

Quantile-quantile plots
Quantile-quantile plots can be useful for comparing two samples to determine whether they arise from the same distribution. Similarly, we can compare the quantiles of a sample to the quantiles expected if the sample came from some distribution F, for a visual assessment of whether the sample arises from F. In linear regression, this can help us assess the normality of the residuals (if we have relied on an assumption of normality).
To construct a quantile-quantile plot for the residuals, we plot the quantiles of the residuals against the theoretical quantiles we would expect if the residuals arose from a normal distribution. If the residuals come from a normal distribution, the plot should resemble a straight line. A straight line connecting the 1st and 3rd quartiles is often added to the plot to aid in visual assessment.
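The plotting coordinates can be computed directly. In this sketch, the theoretical quantiles use the plotting positions (i − 0.5)/n, which is one common convention (statistical packages differ slightly, especially for small n), and the reference line passes through the points formed by the sample and theoretical 1st and 3rd quartiles. It uses Python's `statistics.NormalDist` and `statistics.quantiles`; the residual values and helper names are hypothetical.

```python
from statistics import NormalDist, quantiles

STD_NORMAL = NormalDist()  # N(0, 1)


def qq_points(resid):
    """(theoretical quantile, sample quantile) pairs for a normal Q-Q plot.

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(resid)
    theo = [STD_NORMAL.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, sorted(resid)))


def quartile_line(resid):
    """Intercept and slope of the reference line through the 1st and 3rd quartiles."""
    q1, _, q3 = quantiles(resid, n=4, method="inclusive")
    z1, z3 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    return q1 - slope * z1, slope


resid = [-1.2, 0.3, 0.8, -0.4, 2.5]  # hypothetical residuals
for t, s in qq_points(resid):
    print(f"theoretical {t:+.3f}   sample {s:+.3f}")
intercept, slope = quartile_line(resid)
print(f"reference line: sample = {intercept:.3f} + {slope:.3f} * theoretical")
```

Plotting the pairs from `qq_points` together with the reference line gives the Q-Q plot described above; systematic departures of the points from the line indicate departures from normality.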
Samples from N(0, 1) distribution

[Figure: normal Q-Q plot of sample quantiles versus theoretical quantiles]

Samples from a skewed distribution
[Figure: normal Q-Q plot of the residuals (sample versus theoretical quantiles) and a histogram of y (density versus residuals)]

Samples from a heavy-tailed distribution
[Figure: normal Q-Q plot of the residuals (sample versus theoretical quantiles) and a histogram of y (density versus residuals)]

Samples from a light-tailed distribution
[Figure: normal Q-Q plot of the residuals (sample versus theoretical quantiles) and a histogram of y (density versus residuals)]

Scatterplots
Another useful aid for inspection is a scatterplot of the
residuals against the fitted values and/or the predictors. These
plots can help us identify:
• Non-constant variance
• Violation of the assumption of linearity
• Potential outliers

Satisfactory residual plot
[Figure: residuals $e_i$ plotted against fitted values $\hat{y}_i$]

Non-constant variance
[Figure: residuals $e_i$ plotted against fitted values $\hat{y}_i$]

Non-constant variance
[Figure: residuals $e_i$ plotted against fitted values $\hat{y}_i$]

Example
Suppose the true relationship between a predictor, x, and an outcome, y, is
$$E(y_i) = 1 + 2x_i - 0.25x_i^2,$$
but we fit the model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.$$
Can we diagnose this with residual plots?

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

[Figure: Scatterplot of x vs. y]
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)    −1.7761       0.5618     −3.16     0.0017
x               0.7569       0.1111      6.81     0.0000

Normal Q-Q Plot
[Figure: sample quantiles of the residuals versus theoretical quantiles]

Fitted values versus residuals
[Figure: residuals $e_i$ versus fitted values $\hat{y}_i$]
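A residual plot like this can also be summarized numerically. The sketch below simulates data from $E(y_i) = 1 + 2x_i - 0.25x_i^2$, fits the straight line $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, and averages the residuals within the lowest, middle, and highest thirds of the fitted values. The x range (uniform on [−5, 10]), the N(0, 1) errors, n = 200, and the seed are assumptions for illustration, not the lecture's actual data, and the "mean residual by thirds" summary is a simple numerical stand-in for the plot rather than a method from the lecture. Under a correctly specified model each third should average close to 0; here the middle third comes out clearly positive and the outer thirds negative, the numerical counterpart of a curved band of residuals.

```python
import random


def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def mean_by_thirds(keys, values):
    """Average `values` within the lowest, middle, and highest thirds of `keys`."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    k = len(order) // 3
    groups = [order[:k], order[k:2 * k], order[2 * k:]]
    return [sum(values[i] for i in g) / len(g) for g in groups]


rng = random.Random(515)  # assumed seed
x = [rng.uniform(-5, 10) for _ in range(200)]
y = [1 + 2 * xi - 0.25 * xi ** 2 + rng.gauss(0, 1) for xi in x]

b0, b1 = simple_ols(x, y)                        # misspecified: no x^2 term
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

low, mid, high = mean_by_thirds(fitted, resid)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"mean residual by third of fitted values: {low:+.2f} {mid:+.2f} {high:+.2f}")
```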
Next, we fit
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i.$$
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)     1.9824       0.6000      3.30     0.0010
x               1.9653       0.1637     12.00     0.0000
x2             −0.2640       0.0268     −9.85     0.0000

Normal Q-Q Plot
[Figure: sample quantiles of the residuals versus theoretical quantiles]

Fitted values versus residuals
[Figure: residuals $e_i$ versus fitted values $\hat{y}_i$]
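For comparison with a correctly specified fit, the sketch below simulates data from $E(y_i) = 1 + 2x_i - 0.25x_i^2$ (assuming x uniform on [−5, 10], N(0, 1) errors, n = 200, and an arbitrary seed; none of these come from the lecture) and fits the quadratic model by solving the normal equations $(X'X)b = X'y$ with Gaussian elimination, a solver chosen here only to stay within the standard library. It then averages the residuals within thirds of the fitted values, a simple numerical stand-in for the residual plot. With the $x^2$ term included, all three averages should sit near 0 and the estimate of $\beta_2$ should be close to the true −0.25. The helper names `ols` and `mean_by_thirds` are illustrative.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def mean_by_thirds(keys, values):
    """Average `values` within the lowest, middle, and highest thirds of `keys`."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    k = len(order) // 3
    groups = [order[:k], order[k:2 * k], order[2 * k:]]
    return [sum(values[i] for i in g) / len(g) for g in groups]


rng = random.Random(515)  # assumed seed
x = [rng.uniform(-5, 10) for _ in range(200)]
y = [1 + 2 * xi - 0.25 * xi ** 2 + rng.gauss(0, 1) for xi in x]

X = [[1.0, xi, xi ** 2] for xi in x]  # design with intercept, x, x^2
b = ols(X, y)
fitted = [sum(bj * xij for bj, xij in zip(b, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]

print("estimates (b0, b1, b2):", [round(bj, 3) for bj in b])
print("mean residual by third of fitted values:",
      [round(m, 2) for m in mean_by_thirds(fitted, resid)])
```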
What if we have more than one predictor and only one is misspecified?
So, the true model is
$$E[y_i] = 1 + 3w_i + 2x_i - 0.25x_i^2$$
and we fit
$$y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \varepsilon_i.$$
How can we diagnose model misspecification with residual plots in this case?
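Since residuals can be plotted against the predictors as well as the fitted values, one way to explore this is to summarize the residuals of the misspecified fit against each predictor separately. The sketch below simulates the scenario, fits $y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \varepsilon_i$, and averages the residuals within thirds of w and within thirds of x, a numerical stand-in for the two residual plots. The distributions of w (uniform on [0, 10]) and x (uniform on [−5, 10]), the N(0, 1) errors, n = 3000, the seed, and the helper names are all assumptions for illustration. The averages should be roughly flat across w but curved across x, pointing to x as the misspecified term.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def mean_by_thirds(keys, values):
    """Average `values` within the lowest, middle, and highest thirds of `keys`."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    k = len(order) // 3
    groups = [order[:k], order[k:2 * k], order[2 * k:]]
    return [sum(values[i] for i in g) / len(g) for g in groups]


rng = random.Random(27)  # assumed seed
n = 3000
w = [rng.uniform(0, 10) for _ in range(n)]
x = [rng.uniform(-5, 10) for _ in range(n)]
y = [1 + 3 * wi + 2 * xi - 0.25 * xi ** 2 + rng.gauss(0, 1) for wi, xi in zip(w, x)]

X = [[1.0, wi, xi] for wi, xi in zip(w, x)]  # misspecified: x^2 omitted
b = ols(X, y)
resid = [yi - sum(bj * xij for bj, xij in zip(b, row)) for yi, row in zip(y, X)]

by_w = mean_by_thirds(w, resid)
by_x = mean_by_thirds(x, resid)
print("mean residual by third of w:", [round(m, 2) for m in by_w])  # roughly flat
print("mean residual by third of x:", [round(m, 2) for m in by_x])  # curved
```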
[Figure: plot not recoverable from the extracted text]
...
 Fall '15
 Rogers