* Stata OLS Regression Example.doc: Here's an example of regression analysis in Stata. The
example uses the UCLA ATS dataset hsb2.dta. If any of the commands don't work, download
them via Stata's ‘findit’ command or via ‘Help/STB & Userwritten Programs’ (which you select
by clicking ‘help’ on the top tool bar). April 2008.
* See ‘Explanatory Variables in OLS Regression.doc’.
* Open, describe & summarize the data set, & save the listwise observations as a new data set
use hsb2, clear
d
su, d
* Create the dummy variable ‘complete’, which equals 1 for observations with nonmissing data
on all of the listed variables & 0 otherwise (i.e. it marks the listwise-complete observations,
which is what regression analysis uses) (even though in this particular data set there are no
missing data).
mark complete
markout complete science female race ses schtyp prog read write math socst
tab complete
keep if complete==1
save complete_dataset
d
su
* The ‘complete data’ were saved above as a new data set, thus avoiding having to type
‘if complete==1’ repeatedly.
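* A possible cross-check on ‘markout’ (a minimal sketch, not part of the original example):
count the missing values per observation directly. It assumes hsb2.dta is in the working
directory; ‘nmiss’ is a hypothetical variable name.

```stata
* Work on a temporary copy so the data currently in memory are left untouched.
preserve
use hsb2, clear
* nmiss = number of missing values across the variables passed to markout.
egen nmiss = rowmiss(science female race ses schtyp prog read write math socst)
tab nmiss
* Observations with nmiss==0 are the listwise-complete cases.
count if nmiss == 0
restore
```

If every observation shows nmiss==0, then ‘keep if complete==1’ dropped nothing.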
* Note: Do the following only after thoroughly checking & cleaning the data set,
including systematic univariate, bivariate, & multivariate exploratory analysis. This
should include the following (or other) checks for curvilinearity in regard to each
explanatory variable:
twoway qfitci science read
lowess science read
[help lowess]
twoway scatter science read || lowess science read, lcolor(red) || lfit science read, lcolor(blue)
twoway scatter science read || lowess science read, lcolor(red) || lfit science read, lcolor(blue) ||, by(female)
* Use ‘mrunning’ to see a lowess-like smoothed graph of the dv against each iv, holding constant
the other iv’s. The decision could be to categorize a quantitative iv.
xi: mrunning science read write math socst female i.race i.ses
[download ‘mrunning’]
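* If the decision is to categorize a quantitative iv, one way to do it is sketched here (an
illustration only; quartiles are an arbitrary choice, & ‘read4’ is a hypothetical variable
name). It assumes the hsb2 data are in memory.

```stata
* read4 = quartile group of read (1 = lowest, 4 = highest).
xtile read4 = read, nq(4)
tab read4
* Enter the categorized variable as a set of dummies in place of the quantitative read.
xi: regress science i.read4 write math socst female i.race i.ses
```

Cut points that make sense substantively are usually preferable to quantiles.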
locpoly science read
sparl science read
[download ‘sparl’]
sparl science read, logy
sparl science read, logx
sparl science read, logy logx
sparl science read, quad
* ‘boxcox’ to examine whether the dv needs to be transformed: theta = 1.0, don’t transform the
dependent variable; .5 = square root of dv; 0 = natural log transform of dv; -.5 = reciprocal
square root of dv; -1.0 = reciprocal transform of dv (compare results to ‘ladder dv’, but don’t
do any of these unless they make sense substantively).
boxcox science read write math socst
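* A sketch of how the theta values map onto actual transformations of the dv (an illustration
only; the new variable names are hypothetical). It assumes the hsb2 data are in memory & that
science is strictly positive.

```stata
* Compare the boxcox estimate of theta with the ladder of powers.
ladder science
gladder science
* theta near .5: square root of the dv.
generate sqrt_science = sqrt(science)
* theta near 0: natural log of the dv.
generate ln_science = ln(science)
* theta near -1: reciprocal of the dv.
generate inv_science = 1/science
```

Only the transformation that is actually chosen needs to be generated.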
* ‘boxtid’ to explore possible transformations of explanatory variables. Examine the p-value of
the nonlinearity test (reported next to ‘Nonlin. dev.’ in the output) for each variable. The
decision could be to categorize a quantitative explanatory variable.
boxtid regress science read write math socst female race2 race3 race4 ses2 ses4
[download ‘boxtid’]
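* The dummy variables in the command above must already exist. One way to create them is
sketched here (an assumption; this step is not shown in the example). It assumes the hsb2 data
are in memory & that no variables named race1-race4 or ses1-ses3 exist yet.

```stata
* race has four categories, so this creates race1, race2, race3, race4.
tabulate race, generate(race)
* ses has three categories, so this creates ses1, ses2, ses3.
tabulate ses, generate(ses)
describe race1-race4 ses1-ses3
```

With this naming the ses dummies are ses1-ses3, so a name such as ses4 would have to come from
a different coding of ses.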
* ‘Fractional polynomials’ & ‘fracplot’ to evaluate whether a polynomial transformation will
improve the model. If a transformation is suggested, do a lowess plot. The decision could be to
categorize a quantitative explanatory variable.
fracpoly regress science read write math socst female race2 race3 race4 ses2 ses4, compare
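* The follow-up plots mentioned above might look like this (a sketch; it assumes the fracpoly
model has just been fitted, with read as the first explanatory variable).

```stata
* Plot the fitted fractional polynomial for read together with the data.
fracplot read
* Compare with a lowess smooth of the raw bivariate relationship.
lowess science read
```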
 Spring '09
 Tardanico
 Regression Analysis, reg science