lab7 - STAT5044: Lab7 Inyoung Kim Outline 1 How to perform...

STAT5044: Lab7
Inyoung Kim

Outline
1 How to perform variable selection using R

Example

We illustrate the variable selection methods on data on the 50 states from the 1970s. The data were collected by the US Bureau of the Census. We take life expectancy as the response and the remaining variables as predictors.

Call data

> #1. call data
> data(state)
> statdata<-data.frame(state.x77,row.names=state.abb)
>
> statdata
   Population Income Illiteracy Life.Exp Murder HS.Grad Frost   Area
AL       3615   3624        2.1    69.05   15.1    41.3    20  50708
AK        365   6315        1.5    69.31   11.3    66.7   152 566432
AZ       2212   4530        1.8    70.55    7.8    58.1    15 113417
AR       2110   3378        1.9    70.66   10.1    39.9    65  51945
CA      21198   5114        1.1    71.71   10.3    62.6    20 156361
CO       2541   4884        0.7    72.06    6.8    63.9   166 103766
CT       3100   5348        1.1    72.48    3.1    56.0   139   4862
. . .

All pair scatter plot

> pairs(statdata)

[Figure: scatter plot matrix of Population, Income, Illiteracy, Life.Exp, Murder, HS.Grad, Frost, and Area]

Simple linear regression

> #####################################################
> ##PartI: sequential variable selection
> ######################################################
> #1. Illustrate the backward method: at each stage we remove the predictor with the largest p-value over 0.05
> g<-lm(Life.Exp ~ ., data=statdata) #multiple regression including all variables

Multiple linear regression

> summary(g)

Call:
lm(formula = Life.Exp ~ ., data = statdata)

Residuals:
     Min       1Q   Median       3Q      Max
-1.48895 -0.51232 -0.02747  0.57002  1.49447

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.094e+01  1.748e+00  40.586  < 2e-16 ***
Population   5.180e-05  2.919e-05   1.775   0.0832 .
Income      -2.180e-05  2.444e-04  -0.089   0.9293
Illiteracy   3.382e-02  3.663e-01   0.092   0.9269
Murder      -3.011e-01  4.662e-02  -6.459 8.68e-08 ***
HS.Grad      4.893e-02  2.332e-02   2.098   0.0420 *
Frost       -5.735e-03  3.143e-03  -1.825   0.0752 .
Area        -7.383e-08  1.668e-06  -0.044   0.9649
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7448 on 42 degrees of freedom
Multiple R-Squared: 0.7362, Adjusted R-squared: 0.6922
F-statistic: 16.74 on 7 and 42 DF, p-value: 2.534e-10

Backward Elimination

Area has the largest p-value (0.9649), so it is removed first.

> gall<-g
> g<-update(g, .~.-Area)
> summary(g)

Call:
lm(formula = Life.Exp ~ Population + Income + Illiteracy + Murder + HS.Grad + Frost, data = statdata)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.099e+01  1.387e+00  51.165  < 2e-16 ***
Population   5.188e-05  2.879e-05   1.802   0.0785 .
...
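
The manual steps above (fit, inspect the p-values, drop the weakest predictor, refit) can be wrapped in a loop. The following is only a minimal sketch of that idea, not code from the lab: the helper name backward_by_pvalue and its alpha argument are assumptions introduced for illustration, and it relies on every predictor being numeric so that coefficient names match term names.

```r
# Sketch: automate backward elimination by p-value on the state data.
data(state)
statdata <- data.frame(state.x77, row.names = state.abb)

# Hypothetical helper (not from the lab): repeatedly drop the predictor
# with the largest p-value until all remaining p-values are <= alpha.
backward_by_pvalue <- function(fit, alpha = 0.05) {
  repeat {
    tab <- summary(fit)$coefficients
    # Ignore the intercept; keep matrix shape even with one row left.
    tab <- tab[rownames(tab) != "(Intercept)", , drop = FALSE]
    if (nrow(tab) == 0) break
    worst <- which.max(tab[, "Pr(>|t|)"])
    if (tab[worst, "Pr(>|t|)"] <= alpha) break
    drop_term <- rownames(tab)[worst]
    cat("Removing", drop_term, "\n")
    # Same operation as update(g, .~.-Area), with the term chosen by the loop.
    fit <- update(fit, as.formula(paste(". ~ . -", drop_term)))
  }
  fit
}

gall  <- lm(Life.Exp ~ ., data = statdata)  # full model, as in the lab
final <- backward_by_pvalue(gall)
summary(final)
```

The first predictor the loop removes should be Area, matching the manual step shown above. The 0.05 cutoff is the one stated in the lab comment; a different alpha can be passed to see how sensitive the selected model is to that choice.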