A distance value greater than the chi-square critical value
(with degrees of freedom equal to the number of
explanatory variables) is classified as an outlier.

Business Analytics – The Science of Data Driven Decision Making
Cook’s Distance
Cook’s distance measures how much the predicted values
of the dependent variable change across all observations
in the sample when a particular observation is excluded
from the sample used to estimate the regression parameters.
Cook’s distance for simple linear regression is given by

$$D_i = \frac{\sum_{j=1}^{n} \left(\hat{Y}_j - \hat{Y}_{j(i)}\right)^2}{2\,\mathrm{MSE}}$$

where $D_i$ is the Cook’s distance measure for the $i$th
observation, $\hat{Y}_j$ is the predicted value of the $j$th
observation including the $i$th observation, $\hat{Y}_{j(i)}$ is the
predicted value of the $j$th observation after excluding the
$i$th observation from the sample, MSE is the mean squared
error, and the factor 2 is the number of estimated
parameters ($\beta_0$ and $\beta_1$).
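The leave-one-out definition above can be sketched directly in code. This is a minimal illustration, not the book's implementation: the helper names `fit_slr` and `cooks_distance` and the five data points are made up for the example, and the denominator uses 2 × MSE, where 2 is the number of estimated parameters in simple linear regression (the usual convention).

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def cooks_distance(x, y):
    """Cook's distance for every observation, using leave-one-out refits."""
    n = len(x)
    b0, b1 = fit_slr(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # MSE of the full model, with n - 2 degrees of freedom.
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)
    distances = []
    for i in range(n):
        # Refit the line without observation i.
        c0, c1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Predict ALL n observations from the reduced model.
        fitted_loo = [c0 + c1 * xj for xj in x]
        shift = sum((fj - gj) ** 2 for fj, gj in zip(fitted, fitted_loo))
        distances.append(shift / (2 * mse))
    return distances


# Hypothetical illustrative data (not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print([round(d, 3) for d in cooks_distance(x, y)])
```

On this toy data the first observation has by far the largest distance (about 1.5), because it sits at the edge of the X range and lies well below the fitted line.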

Leverage Value
Leverage value of an observation measures the influence
of that observation on the overall fit of the regression
function.
The leverage value for an observation in SLR is
given by

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}$$

An observation with a leverage value of more than 2/n or 3/n
is treated as highly influential. In the equation above, the
first term (1/n) tends to zero for large values of n.
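A short sketch of the leverage calculation for simple linear regression may help. The function names `leverage_values` and `flag_high_leverage` and the data are hypothetical, and the cutoff is left as a parameter so that whichever threshold is preferred can be passed in.

```python
def leverage_values(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xj - x_bar) ** 2 for xj in x)
    return [1.0 / n + (xi - x_bar) ** 2 / ss_x for xi in x]


def flag_high_leverage(h, cutoff):
    """Indices of observations whose leverage exceeds the given cutoff."""
    return [i for i, hi in enumerate(h) if hi > cutoff]


# Hypothetical illustrative data (not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = leverage_values(x)
print([round(hi, 2) for hi in h])
print(flag_high_leverage(h, cutoff=2 / len(x)))
```

Leverage depends only on the X values: points far from the mean of X get the largest values, and in simple linear regression the leverages always sum to 2.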

DFFit and DFBeta
• DFFit is the change in the predicted value of $Y_i$ when case $i$ is removed from the data set.
• DFBeta is the change in the regression coefficient values when observation $i$ is removed from the data.
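Both quantities can be computed by refitting the line without the case in question. The sketch below follows the unscaled definitions given above; the function names, the data, and the sign convention (full-sample value minus leave-one-out value) are assumptions made for illustration. Statistical packages often report scaled versions of these measures instead.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / ss_x
    return y_bar - b1 * x_bar, b1


def deletion_diagnostics(x, y, i):
    """Unscaled DFFit and DFBeta for case i (full fit minus leave-one-out fit)."""
    b0, b1 = fit_slr(x, y)
    c0, c1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    # Change in the predicted value of Y_i when case i is removed.
    dffit = (b0 + b1 * x[i]) - (c0 + c1 * x[i])
    # Change in the intercept and in the slope when case i is removed.
    dfbeta = (b0 - c0, b1 - c1)
    return dffit, dfbeta


# Hypothetical illustrative data (not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
for i in range(len(x)):
    print(i, deletion_diagnostics(x, y, i))
```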

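Confidence Interval for Regression Coefficients $\beta_0$ and $\beta_1$
The standard errors of the estimates of $\beta_0$ and $\beta_1$ are given by

$$S_e(\hat{\beta}_0) = S_e \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \, SS_X}}, \qquad S_e(\hat{\beta}_1) = \frac{S_e}{\sqrt{SS_X}}$$

where

$$S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}$$

is the standard error of the residuals and $SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$.
The interval estimates, or $(1-\alpha)100\%$ confidence
intervals, for $\beta_0$ and $\beta_1$ are given by

$$\hat{\beta}_0 \pm t_{\alpha/2,\,n-2} \, S_e(\hat{\beta}_0)$$

$$\hat{\beta}_1 \pm t_{\alpha/2,\,n-2} \, S_e(\hat{\beta}_1)$$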
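The formulas above translate into a few lines of code. This is a sketch under stated assumptions: the function name `coefficient_intervals` and the data are made up, and the critical value `t_crit` is supplied by the caller (for instance from a t-table) rather than computed. The value 3.182 used below is the two-sided 95% critical value for n - 2 = 3 degrees of freedom.

```python
import math


def coefficient_intervals(x, y, t_crit):
    """Confidence intervals for the intercept and slope of an SLR fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    # Standard error of the residuals, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard errors of the two coefficient estimates.
    se_b0 = s_e * math.sqrt(sum(xi ** 2 for xi in x) / (n * ss_x))
    se_b1 = s_e / math.sqrt(ss_x)
    return {
        "beta0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "beta1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Hypothetical illustrative data (not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(coefficient_intervals(x, y, t_crit=3.182))
```

With only five points the interval for the slope includes zero, so this toy sample does not show a statistically significant slope at the 5% level.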

Confidence Interval for the Expected Value of $Y$ for a Given $X$
• Point estimates are subject to error, because of uncertainty in the estimated parameters and natural variation in the data around the fitted line, so the user would like to know the interval estimate, or confidence interval, for the conditional expected value.
• The confidence interval of the expected value of $Y_i$ for a given value of $X_i$ is given by

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2} \, S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$$

• where the term $S_e \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$ is the standard error of $E(Y|X)$.
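The interval for the conditional expected value can be sketched as follows. The function name `mean_response_interval` and the data are hypothetical, and the critical value `t_crit` is passed in by the caller (3.182 is the two-sided 95% value for 3 degrees of freedom).

```python
import math


def mean_response_interval(x, y, x0, t_crit):
    """Confidence interval for E(Y | X = x0) in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xj - x_bar) ** 2 for xj in x)
    b1 = sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yj - (b0 + b1 * xj)) ** 2 for xj, yj in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # Standard error of the estimated mean response at x0.
    se_mean = s_e * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    y_hat = b0 + b1 * x0
    return y_hat - t_crit * se_mean, y_hat + t_crit * se_mean


# Hypothetical illustrative data (not from the source).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
for x0 in (1.0, 3.0, 5.0):
    print(x0, mean_response_interval(x, y, x0, t_crit=3.182))
```

The interval is narrowest at the mean of X, where the second term under the square root vanishes, and widens as the given X moves away from it.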

Prediction Interval for the Value of $Y$ for a Given $X$
The prediction interval of $Y_i$ for a given value of $X_i$ is given by

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2} \, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$$

where the term $S_e \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}}$ is the standard error of $Y_i$ for a given value of $X_i$.
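As a worked illustration with made-up numbers (not from the source), take the five points $(X, Y)$ = (1, 2), (2, 4), (3, 5), (4, 4), (5, 5). Least squares gives $\hat{Y} = 2.2 + 0.6X$, with $\bar{X} = 3$, $\sum_j (X_j - \bar{X})^2 = 10$ and $S_e = \sqrt{2.4/3} \approx 0.8944$. Using the tabulated value $t_{0.025,\,3} = 3.182$, the 95% prediction interval at $X_i = 3$, where $\hat{Y}_i = 4.0$, is approximately

$$4.0 \pm 3.182 \times 0.8944 \times \sqrt{1 + \frac{1}{5} + \frac{(3 - 3)^2}{10}} \approx 4.0 \pm 3.118$$

that is, roughly (0.88, 7.12). Dropping the leading 1 under the square root gives the interval for the expected value at the same point, $4.0 \pm 3.182 \times 0.8944 \times \sqrt{1/5} \approx 4.0 \pm 1.273$, which is much narrower because it does not include the variation of an individual observation around the line.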

For large n, the confidence interval of E(Y|X

