Lab 5, Stat 326 Solutions

What is driving the Cost of Cars?

Manufacturers are interested in understanding the driving factors of car prices. They want to make the best decisions in terms of car designs. One way to justify increasing the price of cars is to adjust the size of the car. But size is a function of several other components. Before investing in exotic materials and different sizes, manufacturers want evidence of the benefit. We will answer this question using multiple regression. Data collected on several new cars, on which various variables have been measured, are presented in the JMP file Cars.jmp, and a glossary of the tabulated variables is given below.

SHOW ALL YOUR WORK FOR FULL CREDIT!

Multivariate: Correlations

                  MidPrice   MPG.City   Horsepower       REV   Wheelbase
    MidPrice        1.0000    -0.5946       0.7882   -0.4264      0.5009
    MPG.City       -0.5946     1.0000      -0.6726    0.6959     -0.6671
    Horsepower      0.7882    -0.6726       1.0000   -0.6003      0.4869
    REV            -0.4264     0.6959      -0.6003    1.0000     -0.6368
    Wheelbase       0.5009    -0.6671       0.4869   -0.6368      1.0000

[Figure: Scatterplot matrix of MidPrice, MPG.City, Horsepower, REV, and Wheelbase]

Response MidPrice, Whole Model: Full Model (MPG.City, Horsepower, REV, Wheelbase) and Reduced Model (Horsepower, REV, Wheelbase)

Summary of Fit                      Full Model   Reduced Model
    RSquare                           0.664458        0.65979
    RSquare Adj                       0.649206        0.648322
    Root Mean Square Error            5.72107         5.728277
    Mean of Response                 19.50968        19.50968
    Observations (or Sum Wgts)       93              93

Analysis of Variance, Full Model
    Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
    Model       4        5703.7248       1425.93   43.5656   <.0001*
    Error      88        2880.2965         32.73
    C. Total   92        8584.0213

Analysis of Variance, Reduced Model
    Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
    Model       3        5663.6503       1887.88   57.5343   <.0001*
    Error      89        2920.3709         32.81
    C. Total   92        8584.0213

Parameter Estimates, Full Model
    Term         Estimate    Std Error   t Ratio   Prob>|t|
    Intercept   -37.84744
    MPG.City     -0.196655    0.177725     -1.11     0.2715
    Horsepower    0.1387173   0.015891      8.73    <.0001*
    REV           0.004644    0.00182       2.55     0.0125*
    Wheelbase     0.2979725   0.124047      2.40     0.0184*

Parameter Estimates, Reduced Model
    Term         Estimate    Std Error   t Ratio   Prob>|t|
    Intercept   -47.12988    14.52193      -3.25     0.0017*
    Horsepower    0.1460375   0.014467     10.09    <.0001*
    REV           0.0040063   0.001729      2.32     0.0228*
    Wheelbase     0.3491407   0.11525       3.03     0.0032*

Glossary of variables:
• Response variable: MidPrice (in $1,000; the midpoint between the price of the basic version of this model and the price of the premium version)
• Explanatory variables:
    MPG.City (city miles per gallon, by EPA rating)
    Horsepower (engine horsepower rating)
    REV (engine revolutions per mile, in highest gear)
    Wheelbase (distance between a car's front and rear wheels, in inches)

In this lab, you will use the JMP utilities called Multivariate and Fit Model.

1. Create a scatterplot matrix and a correlation matrix of all variables. To do this, go to Analyze → Multivariate Methods → Multivariate. Then put all variable names into Y, Columns and click OK. Keep this scatterplot matrix and the correlation matrix for reference, but there is no need to turn this output in.

(a) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a strong linear relationship with the response?
Horsepower has a strong linear relationship with MidPrice (r = 0.7882).

(b) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a moderate linear relationship with the response? Both conditions (a moderate strength and a linear form) must be met to include a variable here.
REV (r = -0.4264) and Wheelbase (r = 0.5009) both have moderate linear relationships with MidPrice.

(c) Based on the scatterplot or correlation matrix, which explanatory variable(s) appear(s) to have a positive linear relationship with the response?
Horsepower and Wheelbase have positive linear relationships with MidPrice.

(d) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a non-linear relationship with the response?
MPG.City has a non-linear relationship with MidPrice; the relationship looks closer to a concave-up curve.

2. Using Fit Model, fit a multiple regression model that predicts MidPrice based on all the explanatory variables in this dataset. We will call this our Full Model. Write out the estimated model in the space below.
\(\hat{y} = -37.84744 - 0.196655x_1 + 0.1387173x_2 + 0.004644x_3 + 0.2979725x_4\), where \(x_1\) = MPG.City, \(x_2\) = Horsepower, \(x_3\) = REV, and \(x_4\) = Wheelbase.

3. Perform a hypothesis test to determine if there is a significant negative linear relationship between MPG.City and the MidPrice of cars.
• Hypotheses: \(H_0: \beta_1 = 0\) vs. \(H_a: \beta_1 < 0\)
• Test statistic: \(t = -1.11\)
• P-value: 0.2715/2 = 0.13575 (JMP's Prob>|t| is two-sided; because the t ratio is negative, as \(H_a\) predicts, the one-sided p-value is half of it.)
• Decision: Fail to reject \(H_0\).
• Conclusion: There is no statistically significant evidence of a negative linear relationship between the MidPrice of cars and their city miles per gallon after accounting for Horsepower, REV, and Wheelbase.

4. Do your conclusions from part (3) surprise you based on your observations in part (1)? Explain.
The conclusions are not surprising. The scatterplot matrix shows that the relationship between MidPrice and MPG.City is curved (roughly quadratic) rather than linear, so a linear term in MPG.City captures it poorly. In addition, MPG.City is strongly correlated with Horsepower (-0.6726), REV (0.6959), and Wheelbase (-0.6671), so much of its information is already carried by the other explanatory variables in the model.

5. Based on your work from part (3), we will consider a new model including only the explanatory variables Horsepower, REV, and Wheelbase. This new model will not include MPG.City. Using Fit Model, fit a multiple regression model that predicts MidPrice based on the explanatory variables Horsepower, REV, and Wheelbase. We will call this our Reduced Model. Write the estimated model in the space below.
\(\hat{y} = -47.12988 + 0.1460375x_2 + 0.0040063x_3 + 0.3491407x_4\)

6. Compare the \(R^2\) values from both models (Full Model and Reduced Model). Based on the \(R^2\), make a case for the Reduced Model despite the larger \(R^2\) value in the Full Model.
Every time we add a variable to a model, \(R^2\) stays the same or increases, so the Full Model will always have at least as large an \(R^2\). Although the \(R^2\) value is larger in the Full Model (0.664458 vs. 0.65979), the difference is less than 0.005 (0.004668).
This very small increase in \(R^2\) does not justify the additional complexity of the four-variable model compared to the three-variable model.

7. (Reduced Model) Provide an interpretation for the estimated slope associated with the variable Wheelbase.
For every additional inch of distance between a vehicle's front and rear wheels, the predicted MidPrice increases by about $349.14 (0.3491407 × $1,000), holding Horsepower and REV fixed.

8. (Reduced Model) Perform a hypothesis test to determine if there is a significant positive linear relationship between REV and the MidPrice of cars.
• Hypotheses: \(H_0: \beta_3 = 0\) vs. \(H_a: \beta_3 > 0\)
• Test statistic: \(t = 2.32\)
• P-value: 0.0228/2 = 0.0114 (half of JMP's two-sided Prob>|t|, since t is positive as \(H_a\) predicts)
• Decision: Reject \(H_0\).
• Conclusion: There is statistically significant evidence of a positive linear relationship between the number of engine revolutions per mile and the MidPrice of cars, holding Horsepower and Wheelbase fixed.

9. (Reduced Model) Provide an approximate interpretation of the RMSE using the empirical rule with 95%. Be sure your interpretation is in the context of this example.
Approximately 95% of the actual MidPrices of cars will be within 2 × RMSE = 2 × 5.728277 ≈ 11.457 thousand dollars (about $11,457) of the predicted MidPrice.

10. (Reduced Model) For simple linear regression we discussed finding the standard error for an individual in order to create a prediction interval (PI) for an individual response. We also discussed using JMP to find the standard error for the mean response, used to create a confidence interval (CI) for the mean response. In multiple regression we can also use JMP to get the standard error for an individual response and the standard error for the mean response. Although the formulas behind these standard errors are much more complicated for multiple regression (compared to simple linear regression), the same underlying ideas apply for creating PIs for an individual response and CIs for a mean response.
Here we will ask you to use a procedure similar to the one used in Lab 3 to find a 95% prediction interval for one specific car: this car has 25 City MPG, a horsepower of 95, a wheelbase of 120 inches, and 2500 engine revolutions per mile. First, add the new row of information to your data table. Next, exclude this new row. Finally, rerun the model. Save the predicted values and save the column with the standard error required to make a prediction interval (Standard Error for Individual).

(a) Find the predicted MidPrice of this new car. You may use JMP to automate this prediction. Also show your work below to find the predicted value based on the prediction equation.
\(\hat{y} = -47.12988 + 0.1460375 \times 95 + 0.0040063 \times 2500 + 0.3491407 \times 120 = 18.656\)
$18,656 is the predicted MidPrice.

(b) Use JMP to find the standard error needed to create a prediction interval for the new car. Report this value here.
\(SE_{\text{ind}} = 6.1618\)

(c) The critical value will have 95% confidence and error degrees of freedom \(= n - p - 1\) (where \(p\) is the number of explanatory variables). Report the critical value you will use in this calculation.
DF = 93 - 3 - 1 = 89; we will use DF = 80, giving \(t^* = 1.989\).

(d) Write out the full calculation for the prediction interval for this new car.
\(18.656 \pm 1.989 \times 6.1618 \Rightarrow (6.400,\ 30.912)\)

(e) Provide an interpretation of the interval.
We are 95% confident that an individual car with a horsepower of 95, a wheelbase of 120 inches, and 2500 revolutions per mile will have an actual MidPrice between $6,400 and $30,912.

(f) Record the width of the interval from 10e.
2 × 1.989 × 6.1618 = 24.51164 (or 30.91182 - 6.40018 = 24.51164).

(g) What is the value of 4 × RMSE?
4 × RMSE = 4 × 5.728277 = 22.91311.

(h) Compare the width of the interval from 10e to the value of 4 × RMSE. Explain why these two values are close but not exactly the same.
The prediction interval width (24.51164) is close to 4 × RMSE (22.91311). Remember that the RMSE gives only an approximate interpretation.
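The exact prediction interval is slightly wider than the RMSE approximation. The standard error for an individual (6.1618) combines the residual variation (RMSE = 5.728277) with the uncertainty in estimating the mean response for this particular car, so it is larger than the RMSE; this more than offsets the critical value 1.989 being slightly smaller than 2. This demonstrates that we can get a quick guess at the width and range using the RMSE, but we should perform the prediction interval calculations to produce the accurate (non-approximated) interval. ...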
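The one-sided p-values in Questions 3 and 8 come from halving JMP's two-sided Prob>|t|, which is only valid when the sign of the t ratio agrees with the alternative hypothesis. A minimal sketch of that rule is below; the function name `one_sided_p_value` is a hypothetical helper, and the significance level of 0.05 is an assumption, since the lab does not state one.

```python
def one_sided_p_value(t_ratio: float, two_sided_p: float, alternative: str) -> float:
    """Convert JMP's two-sided Prob>|t| into a one-sided p-value.

    alternative: "less" for Ha: beta < 0, "greater" for Ha: beta > 0.
    Halving is valid only when the sign of t agrees with Ha; otherwise
    the one-sided p-value is 1 - p/2.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    agrees_with_ha = t_ratio < 0 if alternative == "less" else t_ratio > 0
    return two_sided_p / 2 if agrees_with_ha else 1 - two_sided_p / 2


ALPHA = 0.05  # assumed significance level; the lab does not state one

# Question 3 (Full Model, MPG.City): Ha: beta1 < 0, t = -1.11, Prob>|t| = 0.2715
p_q3 = one_sided_p_value(-1.11, 0.2715, "less")
# Question 8 (Reduced Model, REV): Ha: beta3 > 0, t = 2.32, Prob>|t| = 0.0228
p_q8 = one_sided_p_value(2.32, 0.0228, "greater")

print(f"Q3: p = {p_q3:.5f}, reject H0: {p_q3 < ALPHA}")
print(f"Q8: p = {p_q8:.4f}, reject H0: {p_q8 < ALPHA}")
```

This prints p = 0.13575 (fail to reject) for Question 3 and p = 0.0114 (reject) for Question 8, matching the decisions above. If the t ratio had the opposite sign from the alternative, the one-sided p-value would be above 0.5 and the test could never reject.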
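The RSquare Adj values in the Summary of Fit tables follow from the usual adjustment \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}\), which penalizes each added explanatory variable. The sketch below reproduces JMP's numbers from the reported \(R^2\), \(n = 93\), and \(p\); the helper name `adjusted_r_squared` is hypothetical. Note that the Full Model's adjusted \(R^2\) is still slightly higher (0.649206 vs. 0.648322), so the case for the Reduced Model in Question 6 rests on how negligible the gain is, together with the non-significant MPG.City slope from Question 3.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p explanatory variables fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N = 93                      # observations in Cars.jmp
FULL_R2 = 0.664458          # Full Model, 4 explanatory variables
REDUCED_R2 = 0.65979        # Reduced Model, 3 explanatory variables

gain = FULL_R2 - REDUCED_R2
full_adj = adjusted_r_squared(FULL_R2, N, 4)
reduced_adj = adjusted_r_squared(REDUCED_R2, N, 3)

print(f"R^2 gain from adding MPG.City: {gain:.6f}")
print(f"Adjusted R^2, Full Model:      {full_adj:.6f}")
print(f"Adjusted R^2, Reduced Model:   {reduced_adj:.6f}")
```

The printed values are 0.004668, 0.649206, and 0.648322, which agree with the RSquare Adj rows in the JMP output.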
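The arithmetic in Question 10 (the point prediction, the prediction interval, its width, and the comparison with 4 × RMSE) can be reproduced in a few lines. This is a sketch under stated assumptions: the coefficients, the Standard Error for Individual, the RMSE, and t* = 1.989 are copied from the output and answers above rather than recomputed, and `predict` and `prediction_interval` are hypothetical helpers, not JMP features. The last lines use the standard identity SE(individual)² = RMSE² + SE(mean)² to back out the standard error for the mean response; because 6.1618 is rounded, that implied value is approximate.

```python
import math

# Reduced Model estimates from the Parameter Estimates table (MidPrice in $1,000)
INTERCEPT = -47.12988
SLOPES = {"Horsepower": 0.1460375, "REV": 0.0040063, "Wheelbase": 0.3491407}


def predict(car: dict) -> float:
    """Point prediction of MidPrice (in $1,000) from the Reduced Model."""
    return INTERCEPT + sum(slope * car[name] for name, slope in SLOPES.items())


def prediction_interval(y_hat: float, se_individual: float, t_crit: float) -> tuple:
    """Return (lower, upper) = y_hat -/+ t* x SE(individual)."""
    margin = t_crit * se_individual
    return y_hat - margin, y_hat + margin


# The new car; its City MPG of 25 is not used because MPG.City is not in this model.
new_car = {"Horsepower": 95, "REV": 2500, "Wheelbase": 120}

y_hat = predict(new_car)
SE_INDIVIDUAL = 6.1618   # JMP "Standard Error for Individual" for the excluded row
T_CRIT = 1.989           # critical value used in part (c)
RMSE = 5.728277          # Reduced Model Root Mean Square Error

lower, upper = prediction_interval(y_hat, SE_INDIVIDUAL, T_CRIT)
width = upper - lower

print(f"Predicted MidPrice: {y_hat:.3f}")
print(f"95% PI: ({lower:.3f}, {upper:.3f}), width {width:.3f}")
print(f"4 x RMSE: {4 * RMSE:.3f}")

# SE(individual)^2 = RMSE^2 + SE(mean)^2, so the SE for the mean response is implied.
se_mean = math.sqrt(SE_INDIVIDUAL**2 - RMSE**2)
print(f"Implied SE for the mean response: {se_mean:.2f}")
```

Running it prints a predicted MidPrice of 18.656, the interval (6.400, 30.912) with width 24.512, 4 × RMSE = 22.913, and an implied mean-response standard error of about 2.27. That nonzero mean-response term is what makes the individual standard error exceed the RMSE and the interval wider than 4 × RMSE.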