### Topic 03 CLG Slides

Course: STAT 502, Fall 2011
School: Purdue University -...

#### Activity CLG #1

Please discuss questions 3.1-3.6 from the handout.

#### CLG Activity #1 Q1: Research Questions?

- Is there an effect of smoking on SBP?
- Is body size associated with SBP?
- Can any combination of the three variables be used to predict SBP?

#### CLG Activity #1 Q2: The use of the Quetelet index as a measure of size?

This shouldn't adversely affect the analysis, though if data were available we might want to consider height and/or weight separately. A problem to be aware of is that the Quetelet index may be correlated with age to some extent. Correlations among predictors can be problematic in a multiple regression analysis.

#### CLG Activity #1 Q3: Smoking Status Variable

This is what is called an indicator variable. It is perfectly acceptable in regression, but the slope parameter has a different meaning for this type of variable. What is this meaning? The associated slope is the difference in average blood pressure between smokers and non-smokers.

#### CLG Activity #1 Q4A: Are there more smokers or non-smokers? Why do we care?

There are 17 smokers and 15 non-smokers. Ideally we would prefer balance; in this case, the standard error for the non-smokers will be slightly greater than the standard error for smokers.

#### Q4B: What is the mean response? What is the mean age?

Mean SBP = 144.5 mm Hg; Mean Age = 53.25

#### CLG Activity #1 Q4C: What is the sum of squares SSX for age (remember this plays a part in many SE formulas)?

From UNIVARIATE output: 1500. This is the Corrected SS (the uncorrected SS doesn't subtract the mean).

Do the variables look normal? Do we need them to? Somewhat symmetric but not really normal. We require only that the errors be normal.

#### CLG Activity #1 Q5: Using the correlations table

| Pair | r | p |
|---|---|---|
| SBP vs. Size | 0.74 | < 0.0001 |
| SBP vs. Smk | 0.24 | 0.17 |
| SBP vs. Age | 0.74 | < 0.0001 |
| Age vs. Size | 0.8 | < 0.0001 |

Since Age and Size are so highly related, they are unlikely to both be useful in the same model.
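The Q3 interpretation of an indicator slope can be checked numerically. The following is a minimal sketch, assuming a simple linear regression with a single 0/1 predictor. The SBP values are invented for illustration (they are not the handout data), and `slr_slope` is a hypothetical helper name. It also shows the corrected sum of squares SSX from Q4C appearing in the slope formula.

```python
from statistics import mean

def slr_slope(x, y):
    """Least-squares slope Sxy / Sxx for a simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # corrected SS: the mean is subtracted
    return sxy / sxx

# Hypothetical data (NOT the handout data): smk = 1 for smokers, 0 for non-smokers.
smk = [0, 0, 0, 1, 1, 1, 1]
sbp = [130, 142, 136, 150, 148, 139, 155]

slope = slr_slope(smk, sbp)
smoker_mean = mean([y for s, y in zip(smk, sbp) if s == 1])
nonsmoker_mean = mean([y for s, y in zip(smk, sbp) if s == 0])
diff = smoker_mean - nonsmoker_mean

# The fitted slope and the difference in group means agree.
print(slope, diff)
```

In a multiple regression the same slope is read as the smoker vs. non-smoker difference with the other predictors held fixed, so it no longer has to equal the raw difference in group means.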
#### CLG Activity #1 Q6: Scatter Plots

SBP increases linearly with either age or size. Notice there are a couple of semi-outlying values in size (next slide). Any difference in smoking status is not enough to overcome the variation, BUT once age or size has been accounted for, a difference in smoking status may show up. So we would not want to forget about that variable just yet.

[Figures: SBP vs. Size; SBP vs. Smoking Status]

#### CLG Activity #2

Please discuss questions 3.7-3.9 from the handout.

#### Q7: Comparison of Models

| Model | SSR | R² |
|---|---|---|
| Age | 3586 | 0.558 |
| Body Size | 3538 | 0.551 |
| Smoking Status | 393 | 0.061 |
| ALL THREE | 4489 | 0.761 |

#### Q7: Estimates & SEs

| Variable | SLR Slope (SE) | MLR Slope (SE) |
|---|---|---|
| Age | 1.60 (0.26) | 1.21 (0.32) |
| Body Size | 21.5 (3.5) | 8.59 (4.50) |
| Smoking Status | 7.02 (5.02) | 9.55 (2.66) |

#### Q7: Hypothesis Tests

Some collinearity exists between AGE and SIZE. They are not completely redundant, but when AGE is in the model, SIZE becomes only marginally significant. The effect of Smoking Status is small and, in fact, cannot be seen prior to accounting for AGE.

Note: It would be wrong to make a decision based on the SLR model here (as evidenced by the MLR output).

#### Assumptions (1)-(4)

[Figures: four slides of assumption-checking plots]

#### Interaction? (1)-(2)

[Figures: two slides of interaction plots]

#### Interaction? (3)

    proc reg;
      model SBP = age size / vif;  /* request variance inflation factors */
      by smk;                      /* separate fit per smoking group; data must be sorted by smk */
    run;

Smk=0

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Variance Inflation |
|---|---|---|---|---|---|---|
| Intercept | 1 | 48.61270 | 12.61735 | 3.85 | 0.0023 | |
| Age | 1 | 1.02892 | 0.37207 | 2.77 | 0.0171 | 3.5 |
| Size | 1 | 10.45104 | 6.77025 | 1.54 | 0.1486 | 3.5 |

Smk=1

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Variance Inflation |
|---|---|---|---|---|---|---|
| Intercept | 1 | 48.07526 | 18.61755 | 2.58 | 0.0217 | |
| Age | 1 | 1.46624 | 0.59598 | 2.46 | 0.0275 | 2.8 |
| Size | 1 | 6.74422 | 6.71965 | 1.00 | 0.3326 | 2.8 |

#### Question 3.9

- 95% CI for age slope: 0.55 to 1.88
- 95% CI for size slope: -0.6 to 17.8
- 95% CI for smoking vs. non-smoking: 4.5 to 15.4

Mean Response (Age = 60, Body Size = 4, Smk = 0): CI 147 to 157; PI 136 to 168.

PI for (Age = 40, Body Size = 7, Smk = 1): 121 to 207. This is much wider since the point is farther from X-bar (in fact, the point is outside the scope of the model).

Most PIs within the scope of the model have a width of 30-40. This is OK, but not all that great.
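The slope intervals in Question 3.9 can be approximately reproduced from the MLR estimates and standard errors in Q7. This is a sketch under stated assumptions: the intervals are taken to be of the usual form estimate ± t × SE, the error degrees of freedom are assumed to be 28 (32 subjects minus 4 parameters), and the critical value 2.048 is hard-coded rather than looked up. `slope_ci` is a hypothetical helper name.

```python
# Assumed critical value: t(0.975; 28 df), with 28 = 32 subjects - 4 parameters.
T_CRIT = 2.048

def slope_ci(estimate, se, t_crit=T_CRIT):
    """Two-sided interval estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

# MLR slopes and standard errors as reported (rounded) in the Q7 table.
age_ci = slope_ci(1.21, 0.32)
size_ci = slope_ci(8.59, 4.50)

print("age: ", [round(v, 2) for v in age_ci])
print("size:", [round(v, 2) for v in size_ci])
```

Because the inputs are already rounded, the endpoints agree with the reported intervals only to within a few hundredths. The size interval includes zero, which matches the remark that SIZE is only marginally significant once AGE is in the model.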
