Statistics 302 Spring 2010 Assignment #4 Solutions

1. Section 18.5, #12

(a) Scatter plot of fatality rate versus year.

There does appear to be a negative relationship between year and fatality rate, but it does not appear strongly linear.

(b) JMP output from a linear regression of rate on year:

Linear Fit
rate = 0.180553 - 0.0101633*year

Summary of Fit
  RSquare                      0.558887
  RSquare Adj                  0.514775
  Root Mean Square Error       0.034144
  Mean of Response             0.114492
  Observations (or Sum Wgts)   12

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|
  Intercept    0.180553   0.021014      8.59    <.0001
  year        -0.010163   0.002855     -3.56    0.0052

Residual plot (fitted values versus residuals):

The R² indicates that 55.9% of the variability in fatality rate is explained by the year, but the residual plot shows a distinct U-shaped pattern, so the residuals do not appear to be independent and the linear model may not be appropriate.

(c) Scatter plot of fatality rate versus ln(year).

(d) JMP output from a linear regression of rate on ln(year):

Linear Fit
rate = 0.2135065 - 0.0594469*ln(year)

Summary of Fit
  RSquare                      0.840284
  RSquare Adj                  0.824312
  Root Mean Square Error       0.020545
  Mean of Response             0.114492
  Observations (or Sum Wgts)   12

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|
  Intercept    0.2135065  0.014884     14.35    <.0001
  ln(year)    -0.059447   0.008196     -7.25    <.0001

Residual plot (fitted values versus residuals):

The R² has increased substantially, to 84.0%, implying that this model fits better than the one in part (b). However, there still appears to be a U-shaped pattern in the residual plot, so the linear model may still not be appropriate.

(e) Scatter plot of fatality rate versus 1/year:

(f) JMP output from a linear regression of rate on 1/year:

Linear Fit
rate = 0.0677316 + 0.1808192*(1/year)

Summary of Fit
  RSquare                      0.943277
  RSquare Adj                  0.937605
  Root Mean Square Error       0.012244
  Mean of Response             0.114492
  Observations (or Sum Wgts)   12

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|
  Intercept   0.0677316   0.005064     13.38    <.0001
  1/year      0.1808192   0.014022     12.90    <.0001

The R² is now greater than in either of the prior models (94.3%), implying that this model is the best of those attempted. The residual plot could be interpreted as heteroscedastic, but that impression rests on a single point with a large positive residual.

(g) Transforming year to 1/year provided the best fit of the three models attempted. This model had the largest R² and the most significant slope estimate (absolute value of the t-statistic = 12.9). In addition, the residual plot for the 1/year model was the most consistent with the assumptions for a linear regression (borderline heteroscedastic), while the other two models showed a clear U-shaped pattern.

2. Section 18.5, #13

(a) JMP output of summary statistics for expense per admission (expadm) and length of stay (los).

Quantiles
  Level                expadm    los
  100.0%  maximum      4612.0    9.7000
   99.5%               4612.0    9.7000
   97.5%               4459.9    9.6100
   90.0%               3585.6    8.6800
   75.0%  quartile     3101.0    8.3000
   50.0%  median       2600.0    7.7000
   25.0%  quartile     2248.0    6.6000
   10.0%               2061.6    5.9800
    2.5%               1798.1    5.4300
    0.5%               1772.0    5.4000
    0.0%  minimum      1772.0    5.4000

Moments
                       expadm       los
  Mean                 2716.8039    7.4901961
  Std Dev              603.94708    1.0151364
  Std Err Mean         84.569507    0.1421475
  upper 95% Mean       2886.6668    7.7757078
  lower 95% Mean       2546.9411    7.2046843
  N                    51           51

The mean and median expense per admission are $2,716.80 and $2,600; the mean and median length of stay are 7.49 days and 7.7 days. Expense per admission ranges from $1,772 to $4,612, and length of stay ranges from 5.4 days to 9.7 days.

(b) Scatterplot of expense per admission (expadm) versus length of stay (los):

The scatterplot shows a weak positive relationship between length of stay and expense per admission.

(c)
JMP output for regression of expense per admission (expadm) on length of stay (los):

Linear Fit
expadm = 1281.9595 + 191.563*los

Summary of Fit
  RSquare                      0.103675
  RSquare Adj                  0.085383
  Root Mean Square Error       577.5886
  Mean of Response             2716.804
  Observations (or Sum Wgts)   51

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|
  Intercept   1281.9595   608.1041     2.11     0.0402
  los         191.563     80.4654      2.38     0.0212

The estimated intercept is $1,281.96. This suggests a baseline expense per admission of about $1,281.96 regardless of the length of stay, although a stay of zero days lies outside the observed range (5.4 to 9.7 days). The estimated slope is $191.56/day: for each additional day in hospital, the estimated expense per admission increases by $191.56.

(d) JMP output for the 95% confidence interval for β.

Parameter Estimates
  Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
  Intercept   1281.9595   608.1041     2.11     0.0402     59.928506   2503.9904
  los         191.563     80.4654      2.38     0.0212     29.861721   353.26428

The 95% confidence interval for β, [$29.86/day, $353.26/day], excludes zero. The slope is therefore significantly different from zero at the 5% level, and we are 95% confident that expense per admission increases with length of stay.

(e) The R² for this model is 10.3%. This is the square of the Pearson correlation coefficient between expadm and los (r ≈ 0.32).

(f) Scatterplot of fitted values vs residuals:

The residual plot is used to check that the residuals are approximately normally distributed (with no outliers), have constant variance across values of x (homoscedasticity), and show no obvious pattern (independence).

3. Section 19.4, #11

(a) JMP output for summary of average salary per employee (salary).
Quantiles
  Level                salary
  100.0%  maximum      23594
   99.5%               23594
   97.5%               22126
   90.0%               16940
   75.0%  quartile     15578
   50.0%  median       14573
   25.0%  quartile     13559
   10.0%               12923
    2.5%               12102
    0.5%               11928
    0.0%  minimum      11928

Moments
  Mean                 14852.412
  Std Dev              1965.514
  Std Err Mean         275.22702
  upper 95% Mean       15405.221
  lower 95% Mean       14299.602
  N                    51

The mean and median average salary per employee are $14,852 and $14,573 respectively. The maximum and minimum average salary are $23,594 and $11,928 respectively.

(b) Scatterplot of average salary per employee (salary) and expense per admission (expadm):

The scatterplot indicates a strong positive and likely linear relationship between salary and expense per admission.

(c) JMP output for the regression of expense per admission (expadm) on average salary per employee (salary) and length of stay (los).

Summary of Fit
  RSquare                      0.758926
  RSquare Adj                  0.748881
  Root Mean Square Error       302.6486
  Mean of Response             2716.804
  Observations (or Sum Wgts)   51

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   -2582.736    464.77       -5.56    <.0001
  los          213.79672   42.20769      5.07    <.0001
  salary       0.248994    0.021799     11.42    <.0001

The estimated regression coefficient for length of stay can be interpreted as an estimated increase of $213.80 in expense per admission for each additional day in hospital, holding average salary constant. The estimated regression coefficient for salary can be interpreted as an estimated increase of $0.25 in expense per admission for every $1 increase in average salary per employee, holding length of stay constant.

(d) The coefficient for length of stay has increased by approximately $22/day (from $191.56/day to $213.80/day) when accounting for the effect of the average salary of employees.

(e) The R² increased from 10.3% when only length of stay was included to 75.9% when average salary of employees was added. Including average salary explains substantially more of the variability in the response and will result in better prediction of the expense per admission.

(f) Scatter plot of residuals vs predicted values:

The residual plot is clearly consistent with two of the assumptions required for a linear regression (normality of residuals and no obvious patterns). There is a possibility that the variance is higher for large predicted values. However, this perception is strongly influenced by the one point with the highest predicted value. Other than this single point, the variance seems constant, and we can conclude that the model is acceptable.

[ aside (not for marks): the influential point is Alaska. Average salaries in Alaska are substantially higher than in any other state. Removing this point increases the R² to 84% and reduces the slope for length of stay to $168.62/day, a reduction of roughly $45/day from the $213.80/day estimated above. ]

4. Section 19.4, #12

(a) Scatterplots of the four explanatory variables:

The variables police, register, and weekly appear to have a positive linear relationship with the homicide rate.
(b) JMP output for the separate simple regressions of homicide rate on police, register, unemployed (unemp), and weekly.

Linear Fit
homicide = -77.63027 + 0.3374493*police

Summary of Fit
  RSquare                      0.929414
  RSquare Adj                  0.922997
  Root Mean Square Error       4.546874
  Mean of Response             25.12692
  Observations (or Sum Wgts)   13

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   -77.63027    8.630939     -8.99    <.0001
  police       0.3374493   0.028039     12.03    <.0001

Linear Fit
homicide = 1.6620628 + 0.0430028*register

Summary of Fit
  RSquare                      0.666325
  RSquare Adj                  0.635991
  Root Mean Square Error       9.885853
  Mean of Response             25.12692
  Observations (or Sum Wgts)   13

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   1.6620628    5.708193     0.29     0.7763
  register    0.0430028    0.009175     4.69     0.0007

Linear Fit
homicide = 16.67298 + 1.4595121*unemp

Summary of Fit
  RSquare                      0.04416
  RSquare Adj                  -0.04274
  Root Mean Square Error       16.73189
  Mean of Response             25.12692
  Observations (or Sum Wgts)   13

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   16.67298     12.73452     1.31     0.2171
  unemp       1.4595121    2.047348     0.71     0.4908

Linear Fit
homicide = -33.05875 + 0.3423275*weekly

Summary of Fit
  RSquare                      0.788815
  RSquare Adj                  0.769616
  Root Mean Square Error       7.864729
  Mean of Response             25.12692
  Observations (or Sum Wgts)   13

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   -33.05875    9.335846     -3.54    0.0046
  weekly       0.3423275   0.053406      6.41    <.0001

The variables police, register and weekly each have a significant (α = 0.05) effect on the homicide rate; unemp does not (p = 0.4908).

(c) The coefficients of determination for each model are listed in the output in part (b). The variable police explains the most variability in the homicide rate (92.9%).

(d) The following is the ‘best’ model identified through forward selection (JMP output):

Response homicide

Summary of Fit
  RSquare                      0.967635
  RSquare Adj                  0.961163
  Root Mean Square Error       3.229115
  Mean of Response             25.12692
  Observations (or Sum Wgts)   13

Parameter Estimates
  Term        Estimate     Std Error   t Ratio   Prob>|t|
  Intercept   -64.96371    7.152402     -9.08    <.0001
  police       0.2699255   0.027975      9.65    <.0001
  register     0.0144691   0.00421       3.44    0.0064

Prediction Expression
homicide = -64.96371 + 0.2699255*police + 0.0144691*register

This model implies that the variables weekly and unemp do not add significant explanatory power for the homicide rate once police and register are in the model. The model further indicates that as police or gun registrations increase, the homicide rate increases.

[ aside (not worth marks): clearly police are not causing an increase in homicide rates; the increase in police is likely the result of increased homicide rates. This is an example of a related variable appearing in the model when it is not causal. ]
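The "RSquare Adj" values that JMP reports in Questions 3 and 4 can be reproduced by hand from R², the number of observations n, and the number of predictors p, using adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The following is a minimal sketch of that check in Python; it is not part of the original JMP analysis, and the function name adjusted_r2 and the list layout are illustrative choices. It also shows why the unemp model has a negative adjusted R²: with R² this close to zero, the penalty for estimating one slope from only 13 observations outweighs the variability explained.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a least-squares fit that includes an intercept.

    r2: ordinary R-squared
    n:  number of observations
    p:  number of predictors, not counting the intercept
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# (label, R-squared, n, number of predictors), copied from the JMP output above
fits = [
    ("Q3(c) expadm on los + salary", 0.758926, 51, 2),
    ("Q4(b) homicide on police", 0.929414, 13, 1),
    ("Q4(b) homicide on unemp", 0.04416, 13, 1),
    ("Q4(d) homicide on police + register", 0.967635, 13, 2),
]

for label, r2, n, p in fits:
    print(f"{label}: adjusted R^2 = {adjusted_r2(r2, n, p):.4f}")
```

Each printed value should agree with the corresponding RSquare Adj in the JMP tables up to rounding of the reported R².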