Soc 63993, Advanced Social Statistics II
Homework No. 2
Multicollinearity/Missing Data
I. Multicollinearity
[The following problem is adapted from Greene, Econometric Analysis, Fourth Edition.] The data in longley.dta (available at http://www.nd.edu/~rwilliam/xsoc63993/index.html) were collected by James W. Longley (“An Appraisal of Least Squares Programs for the Electronic Computer from the point of view of the User,” Journal of the American Statistical Association, Vol. 62, No. 319 (Sep. 1967), pp. 819-841) for the purpose of assessing the accuracy of least squares computations by computer programs. (If you want to see how they did things before the advent of modern computers, the article is available on JSTOR in the statistics journals.) Economic data were collected for the US for each of the years 1947-1962.
The variables are:
Variable    Description
employ      Number of people employed (in thousands). This is the dependent variable in the analysis.
price       Gross National Product Implicit Price Deflator. This is an adjustment for inflation. It equals 100 in the base year, 1954. Because of inflation, it is higher in years after 1954, and lower in years before that. A value of 110 would mean that, in that particular year, it cost $110 to buy the same goods that cost $100 in 1954.
gnp         Gross National Product (in millions of dollars)
armed       Size of armed forces (in thousands)
year        Year the data are from
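To make the deflator arithmetic in the description of price concrete, here is a small Python sketch. It assumes the usual convention that an amount in base-year dollars equals the nominal amount times 100 divided by the deflator. The function name and the sample amounts are hypothetical and are not taken from longley.dta.

```python
def to_base_year_dollars(nominal_amount, deflator):
    """Convert a nominal amount to base-year (1954) dollars.

    deflator is the price index for the year in question, scaled so that
    the base year equals 100.
    """
    return nominal_amount * 100.0 / deflator


# Goods priced at $110 in a year whose deflator is 110 correspond to
# $100 of goods in the base year, matching the example in the table.
print(to_base_year_dollars(110.0, 110.0))
```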
A. Diagnosis. Analyze these data with Stata. First, give the commands
. list
. summarize
just so you can get a feel for the characteristics of the data.
Then give the command
. regress employ price gnp armed year
Then, do further examination to determine what evidence, if any, suggests that multicollinearity may or may not be present in these data. Estimate and examine the bivariate correlations, tolerances/VIFs, condition numbers, the sample size, and anything else that you think would help to diagnose a problem of multicollinearity if it existed. For everything you do, be sure to explain what it means and how it applies to multicollinearity; don’t just give numbers without explanation. If you find that multicollinearity is present, offer a substantive explanation for it, i.e., why are these variables so highly correlated with each other?
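As a rough illustration of what tolerances and VIFs measure (not a substitute for running the diagnostics in Stata on longley.dta), the following Python sketch computes a bivariate correlation, then the tolerance (1 minus R-squared) and the VIF (1 divided by the tolerance). The function names and the two short data lists are made up for illustration and are not the Longley data. The sketch relies on the fact that with only two predictors, the R-squared from regressing one predictor on the other equals their squared correlation; with more predictors, the R-squared would have to come from regressing each predictor on all of the others.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) given the R-squared from regressing one
    predictor on the other predictor(s)."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Made-up illustrative values (NOT the Longley data): two predictors
# that move together closely.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

r = pearson_r(x1, x2)
tol, vif = tolerance_and_vif(r ** 2)
print(f"r = {r:.4f}, tolerance = {tol:.4f}, VIF = {vif:.2f}")
```

The closer the correlation is to 1 in absolute value, the closer the tolerance is to 0 and the larger the VIF, which is why highly correlated predictors produce inflated standard errors.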
