Sections 7.1, 7.2, 7.4, & 7.6
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I

Chapter 7 example: Body fat

n = 20 healthy females 25-34 years old.

$x_1$ = triceps skinfold thickness (mm)
$x_2$ = thigh circumference (cm)
$x_3$ = midarm circumference (cm)
$Y$ = body fat (%)

Obtaining $Y_i$, the percent of the body that is purely fat, requires immersing a person in water. We want to develop a model based on simple body measurements, so that people do not have to get wet.

SAS code

    *******************************
    * Body fat data from Chapter 7
    *******************************;
    data body;
    input triceps thigh midarm bodyfat @@;
    cards;
    19.5 43.1 29.1 11.9  24.7 49.8 28.2 22.8
    30.7 51.9 37.0 18.7  29.8 54.3 31.1 20.1
    19.1 42.2 30.9 12.9  25.6 53.9 23.7 21.7
    31.4 58.5 27.6 27.1  27.9 52.1 30.6 25.4
    22.1 49.9 23.2 21.3  25.5 53.5 24.8 19.3
    31.1 56.6 30.0 25.4  30.4 56.7 28.3 27.2
    18.7 46.5 23.0 11.7  19.7 44.2 28.6 17.8
    14.6 42.7 21.3 12.8  29.5 54.4 30.1 23.9
    27.7 55.3 25.7 22.6  30.2 58.6 24.6 25.4
    22.7 48.2 27.1 14.8  25.2 51.0 27.5 21.1
    ;
    proc sgscatter data=body; matrix bodyfat triceps thigh midarm; run;

Scatterplot

[Figure: scatterplot matrix of bodyfat, triceps, thigh, and midarm]

Correlation coefficients

    proc corr data=body; var triceps thigh midarm; run;

    Pearson Correlation Coefficients, N = 20
    Prob > |r| under H0: Rho=0

                 triceps      thigh     midarm
    triceps      1.00000    0.92384    0.45778
                             <.0001     0.0424
    thigh        0.92384    1.00000    0.08467
                  <.0001                0.7227
    midarm       0.45778    0.08467    1.00000
                  0.0424     0.7227

There is high correlation among the predictors. For example, $r = 0.92$ for triceps and thigh. These two variables are essentially carrying the same information. Maybe only one or the other is really needed.

In general, one predictor may be essentially perfectly predicted by the remaining predictors (a high multiple correlation with them), and so would be unnecessary if the other predictors are in the model.
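One way to see how well a predictor is predicted by the remaining predictors is to regress it on them and look at the resulting R-square. The sketch below does this for triceps, assuming the `body` data set created earlier is still available in the session. The choice of triceps as the response is only an illustration; any of the three predictors could be used.

```sas
* Regress one predictor on the other two predictors.            ;
* An R-square near 1 means triceps is almost perfectly          ;
* predicted by thigh and midarm.                                ;
proc reg data=body;
  model triceps = thigh midarm;
run;
quit;
```

The R-square reported for this fit measures how much of the information in triceps is already carried by thigh and midarm together, which the pairwise correlations alone cannot show.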

7.1 Extra sums of squares

Extra sums of squares are defined as the difference in SSE between a model with some predictors and a larger model that adds additional predictors.

Fact: As predictors are added, the SSE can only decrease or stay the same; it never increases. The extra sum of squares is how much the SSE decreases.

Definition: Let $x_1, x_2, \ldots, x_k$ be predictors in a model. Then

$$SSR(x_{j+1}, \ldots, x_k \mid x_1, x_2, \ldots, x_j) = SSE(x_1, x_2, \ldots, x_j) - SSE(x_1, x_2, \ldots, x_j, x_{j+1}, \ldots, x_k),$$

the difference in the sums of squared errors from the reduced to the full model. This is how much of the total variation SSTO is further explained by adding the new predictors.

Example with k = 8 predictors

The predictors under consideration are $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8$ ....
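
For the three-predictor body fat data, the extra sums of squares from Section 7.1 can be obtained in SAS. The sketch below assumes the `body` data set from the earlier SAS code. It uses the `ss1` option of the `model` statement in `proc reg` to print sequential (Type I) sums of squares. The label `thigh_midarm` on the `test` statement is a hypothetical name chosen for illustration.

```sas
proc reg data=body;
  * Sequential sums of squares, in the order the predictors are listed: ;
  *   triceps row: SSR(x1)                                             ;
  *   thigh row:   SSR(x2 | x1)                                        ;
  *   midarm row:  SSR(x3 | x1, x2)                                    ;
  model bodyfat = triceps thigh midarm / ss1;
  * F-test that thigh and midarm can both be dropped given triceps;    ;
  * its numerator sum of squares is SSR(x2, x3 | x1).                  ;
  thigh_midarm: test thigh = 0, midarm = 0;
run;
quit;
```

Because the sums of squares are sequential, they depend on the order of the predictors in the `model` statement. Listing the predictors in a different order gives a different decomposition of the same overall regression sum of squares.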
 Fall '11
 Staff
 Statistics
