STA 4702/5701 Spring 2009 HW #2 solutions

1. (a) Problem 4.8
a. 0.8804
b. 2
c. 2, because the first two components explain 88% of the variation. The first component can be roughly interpreted as the overall average of the 8 variables, because the loadings are spread fairly evenly across all 8 variables. The second component seems to represent the sprints rather than the long distance races.

(d)
a. 0.9852
b. 1
(i) Yes.
(ii) For part (a), the proportion of the variation explained by the first principal component differs from the value obtained when the correlation matrix is used. For part (b), I choose only the first principal component for the covariance matrix, while I choose the first two principal components for the correlation matrix.
(iii) Because the variables are standardized in the correlation matrix but not in the covariance matrix. With the covariance matrix, the variables with the largest variances tend to dominate the first component.
(iv) I would recommend the correlation matrix, because it represents the relationships among the variables after they are standardized, so it is not affected by the differing scales of the variables.

(e) The correlation matrix, with the correlations between the first three principal components and the variables:

            x1       x2       x3       x4       x5       x6       x7       x8
Prin1  0.79067  0.35685  0.90444  0.94711  0.96372  0.94632  0.95402  0.89572
Prin2  0.38939  0.85356  0.08942 -0.02680 -0.06967 -0.18270 -0.19016 -0.27521
Prin3 -0.41292  0.37688 -0.32195 -0.11453  0.05234  0.17265  0.15636  0.25528

The first component can be roughly interpreted as the overall average of the 8 variables. The second component seems to represent the sprints rather than the long distance races.
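The quantities used in these answers (the proportion of variance explained, the choice between the covariance and correlation matrices, and the component-variable correlations in part (e)) can be checked numerically. The following is a minimal NumPy sketch for illustration, not the procedure used to produce these solutions. It assumes that the rows of the table in part (e) are correlations r(Y_i, X_k) = e_ik * sqrt(lambda_i) between the components and the standardized variables. Under that assumption, the squared entries in row Prin_i should sum to lambda_i, and trace(R) = 8. The helper names, the 80% cutoff in n_components, and the 2-variable covariance matrix are hypothetical and were chosen only to illustrate the scale effect described in (iii).

```python
import numpy as np


def pca_proportions(S):
    """Principal components of a covariance or correlation matrix S.

    Returns the eigenvalues in decreasing order, the matching unit
    eigenvectors (as columns), and the cumulative proportion of the total
    variance, trace(S), explained by the first 1, 2, ... components.
    """
    eigvals, eigvecs = np.linalg.eigh(S)    # S is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder so component 1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, eigvecs, cumulative


def cov_to_corr(S):
    """Correlation matrix of the standardized variables: R = D^(-1/2) S D^(-1/2)."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


def component_correlations(S, eigvals, eigvecs):
    """r(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(s_kk); rows = variables, columns = components."""
    return eigvecs * np.sqrt(eigvals) / np.sqrt(np.diag(S))[:, None]


def n_components(cumulative, threshold=0.8):
    """Smallest number of components whose cumulative proportion reaches the (hypothetical) threshold."""
    return int(np.searchsorted(cumulative, threshold)) + 1


# 1) Check against rows Prin1 and Prin2 of the table in part (e).
prin1 = np.array([0.79067, 0.35685, 0.90444, 0.94711, 0.96372, 0.94632, 0.95402, 0.89572])
prin2 = np.array([0.38939, 0.85356, 0.08942, -0.02680, -0.06967, -0.18270, -0.19016, -0.27521])
lam1, lam2 = (prin1 ** 2).sum(), (prin2 ** 2).sum()   # sum of squared correlations = eigenvalue
cum = np.array([lam1, lam1 + lam2]) / 8               # trace(R) = number of variables = 8
print(round(lam1, 3), round(lam2, 3), round(cum[1], 4))   # 6.004 1.039 0.8804
print(n_components(cum, threshold=0.8))                    # 2

# 2) Covariance vs. correlation matrix on a made-up 2-variable example whose
#    variables are on very different scales (not the homework data).
S = np.array([[100.0, 6.0],
              [6.0, 1.0]])
_, _, cum_S = pca_proportions(S)
R = cov_to_corr(S)
vals_R, vecs_R, cum_R = pca_proportions(R)
print(round(cum_S[0], 3), round(cum_R[0], 3))              # 0.994 0.8
# For a correlation matrix, the squared correlations in each column sum to that eigenvalue.
print(np.allclose((component_correlations(R, vals_R, vecs_R) ** 2).sum(axis=0), vals_R))  # True
```

Recovering lambda_1 of about 6.004 and lambda_2 of about 1.039 from the table reproduces the 0.8804 in part (a), which is consistent with the reading of the table assumed above. In the made-up example, the high-variance variable drives the covariance-based first component to about 99% of the variance, compared with 80% on the correlation scale.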

