Lab 5 Stat 326 Solutions

SHOW ALL YOUR WORK FOR FULL CREDIT!

What is driving the Cost of Cars?

Manufacturers are interested in understanding the driving factors of car prices. They want to make the best decisions in terms of car designs. One way to justify increasing the price of cars is to adjust the size of the car. But size is a function of several other components. Before investing in exotic materials and different sizes, manufacturers want evidence of the benefit. We will answer this question using multiple regression. Data collected on several new cars, on which various variables have been measured, are presented in the JMP file Cars.jmp, and a glossary of the tabulated variables is given below.
Multivariate

             MidPrice  MPG.City  Horsepower      REV  Wheelbase
MidPrice       1.0000   -0.5946      0.7882  -0.4264     0.5009
MPG.City      -0.5946    1.0000     -0.6726   0.6959    -0.6671
Horsepower     0.7882   -0.6726      1.0000  -0.6003     0.4869
REV           -0.4264    0.6959     -0.6003   1.0000    -0.6368
Wheelbase      0.5009   -0.6671      0.4869  -0.6368     1.0000

[Scatterplot Matrix of the five variables above]
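As a quick cross-check of the correlation matrix, the MidPrice row can be ranked by absolute correlation. The following is only a minimal Python sketch: the names `corr_with_midprice`, `ranked` and `positive` are illustrative (they are not part of the lab or of JMP), and the values are copied from the MidPrice row of the matrix.

```python
# Correlation of each explanatory variable with MidPrice,
# copied from the MidPrice row of the correlation matrix.
corr_with_midprice = {
    "MPG.City": -0.5946,
    "Horsepower": 0.7882,
    "REV": -0.4264,
    "Wheelbase": 0.5009,
}

# Rank variables by |r|, the strength of the linear association.
ranked = sorted(
    corr_with_midprice,
    key=lambda name: abs(corr_with_midprice[name]),
    reverse=True,
)

# Variables whose linear association with MidPrice is positive.
positive = [name for name, r in corr_with_midprice.items() if r > 0]

for name in ranked:
    r = corr_with_midprice[name]
    print(f"{name:<10} r = {r:+.4f}   |r| = {abs(r):.4f}")
print("Positive direction:", positive)
```

A correlation coefficient only measures linear association, so a ranking like this cannot reveal curvature; the scatterplot matrix is still needed to judge whether a relationship is linear.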
Full Model (Response MidPrice, Whole Model)

Summary of Fit
RSquare                       0.664458
RSquare Adj                   0.649206
Root Mean Square Error        5.72107
Mean of Response              19.50968
Observations (or Sum Wgts)    93

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio
Model        4        5703.7248       1425.93   43.5656
Error       88        2880.2965         32.73   Prob > F
C. Total    92        8584.0213                 <.0001*

Parameter Estimates
Term          Estimate    Std Error   t Ratio   Prob>|t|
Intercept    -37.84744    (Std Error, t Ratio and Prob>|t| illegible in source)
MPG.City     -0.196655    0.177725      -1.11    0.2715
Horsepower    0.1387173   0.015891       8.73    <.0001*
REV           0.004644    0.00182        2.55    0.0125*
Wheelbase     0.2979725   0.124047       2.40    0.0184*

Reduced Model (Response MidPrice, Whole Model)

Summary of Fit
RSquare                       0.65979
RSquare Adj                   0.648322
Root Mean Square Error        5.728277
Mean of Response              19.50968
Observations (or Sum Wgts)    93

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio
Model        3        5663.6503       1887.88   57.5343
Error       89        2920.3709         32.81   Prob > F
C. Total    92        8584.0213                 <.0001*

Parameter Estimates
Term          Estimate    Std Error   t Ratio   Prob>|t|
Intercept    -47.12988    14.52193      -3.25    0.0017*
Horsepower    0.1460375   0.014467      10.09    <.0001*
REV           0.0040063   0.001729       2.32    0.0228*
Wheelbase     0.3491407   0.11525        3.03    0.0032*

Glossary

• Response Variable: MidPrice (in $1,000, average price between the basic version of this model and the price for the premium version)
• Explanatory variables:
    MPG.City (city miles per gallon, by EPA rating)
    Horsepower (number of)
    REV (engine revolutions per mile, in highest gear)
    Wheelbase (distance between a car's front and rear wheels, in inches)

In this lab, you will use the utilities in JMP called Multivariate and Fit Model.

1. Create a scatterplot matrix and a correlation matrix of all variables. To do this, go to Analyze -> Multivariate Methods -> Multivariate. Then put all variable names into Y, Columns and click OK. Keep this scatterplot matrix and the correlation matrix for reference; there is no need to turn this output in.

(a) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a strong linear relationship with the response?
Horsepower has a strong linear relationship with MidPrice.

(b) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a moderate linear relationship with the response? Both conditions must be met to include a variable here.
REV and Wheelbase both have moderate linear relationships with MidPrice.

(c) Based on the scatterplot or correlation matrix, which explanatory variable(s) appear(s) to have a positive linear relationship with the response?
Horsepower and Wheelbase have positive linear relationships with MidPrice.

(d) Based on the scatterplot and correlation matrix, which explanatory variable(s) appear(s) to have a non-linear relationship with the response?
MPG.City has a non-linear relationship with MidPrice. The relationship looks closer to a concave-upward curve.

2. Using Fit Model, fit a multiple regression model that predicts MidPrice based on all the explanatory variables in this dataset. We will call this our Full Model. Write out the estimated model in the space below.

\[ \hat{y} = -37.84744 - 0.196655\,x_1 + 0.1387173\,x_2 + 0.004644\,x_3 + 0.2979725\,x_4 \]

where x_1 = MPG.City, x_2 = Horsepower, x_3 = REV and x_4 = Wheelbase.

3. Perform a hypothesis test to determine if there is a significant negative linear relationship between MPG.City and MidPrice of cars.
• Hypothesis: H0: β1 = 0 vs Ha: β1 < 0
• Test Statistic: t = -1.11
• P-value: 0.2715/2 = 0.13575
• Decision: Fail to reject H0.
• Conclusion: There is not statistically significant evidence of a negative linear relationship between the MidPrice of cars and the city miles per gallon of cars after accounting for horsepower, REV and wheelbase.

4. Do your conclusions from part (3) surprise you based on your observations in part (1)? Explain.
The conclusions are not surprising because the relationship between MidPrice and MPG.City is shown to be quadratic, not linear, in the scatterplot matrix.

5. Based on your work from part (3), we will consider a new model only including the explanatory variables Horsepower, REV and Wheelbase. This new model will not include MPG.City. Using Fit Model, fit a multiple regression model that predicts MidPrice based on the explanatory variables Horsepower, REV and Wheelbase. We will call this our Reduced Model. Write the estimated model in the space below.

\[ \hat{y} = -47.12988 + 0.1460375\,x_2 + 0.0040063\,x_3 + 0.3491407\,x_4 \]

6. Compare the R² values from both models (Full Model and the Reduced Model). Based on the R², make a case for the Reduced Model despite the larger R² value in the Full Model.
Every time we add an additional variable to a model we should see a slightly larger R². Although the R² value is larger in the Full Model (0.664458 vs 0.65979), the difference is less than 0.005. This very small increase in the R² value does not justify the additional complexity of the 4-variable model compared to the 3-variable model.

7. (Reduced Model) Provide an interpretation for the estimated slope associated with the variable Wheelbase.
For every additional inch of distance between the vehicle's front and rear wheels, the predicted MidPrice of cars goes up by $349.14, assuming horsepower and REV remain fixed.

8. (Reduced Model) Perform a hypothesis test to determine if there is a significant positive linear relationship between REV and the MidPrice of cars.
• Hypothesis: H0: β3 = 0 vs Ha: β3 > 0
• Test Statistic: t = 2.32
• P-value: 0.0228/2 = 0.0114
• Decision: Reject the null hypothesis.
• Conclusion: There is statistically significant evidence for a positive linear relationship between the number of engine revolutions per mile and the MidPrice of cars, assuming horsepower and wheelbase remain fixed.

9. (Reduced Model) Provide an approximate interpretation of the RMSE using the empirical rule with 95%. Be sure your interpretation is in the context of this example.
Approximately 95% of the actual MidPrices of cars will be within $11,456 of the predicted MidPrice of cars (2 * RMSE = 2 * 5.728277, in $1,000).

10. (Reduced Model) For Simple Linear Regression we discussed finding the standard error for an individual in order to create a Prediction Interval (PI) for an individual response. We also discussed using JMP to find the standard error for the mean response used to create a Confidence Interval (CI) for the mean response. In multiple regression we can also use JMP to get the standard error for an individual response and the standard error for the mean response. Although the formulas going into the calculations of these standard errors are much more complicated for multiple regression (compared to simple linear regression), the same underlying ideas apply for creating PIs for an individual response and CIs for a mean response.

Here we will ask you to use a similar procedure as used in Lab 3 to find a 95% prediction interval for one specific car: this car has 25 City MPG, a horsepower of 95, a wheelbase of 120 inches, and does 2500 engine revolutions per mile. First, add the new row of information to your data table. Next, Exclude this new row. Finally, rerun the model. Save the predicted values and save the column with the standard error required to make a prediction interval (Standard Error for Individual).

(a) Find the predicted MidPrice of this new car. You may use JMP to automate this prediction. Also show your work below to find the predicted value based on the prediction equation.

\[ \hat{y} = -47.12988 + 0.1460375 \times 95 + 0.0040063 \times 2500 + 0.3491407 \times 120 = 18.656 \]
$18,656 is the predicted MidPrice.

(b) Use JMP to find the standard error needed to create a prediction interval for the new car. Report this value here.
SE = 6.1618 (Standard Error for Individual)

(c) The critical value will have 95% confidence and error degrees of freedom = n - p - 1 (where p is the number of explanatory variables). Report the critical value you will use in this calculation.
DF = 93 - 3 - 1 = 89; we will use DF = 80. t = 1.989

(d) Write out the full calculation for the prediction interval for this new car.
18.656 ± 1.989 * 6.1618 = (6.400, 30.912)

(e) Provide an interpretation of the interval.
We are 95% confident that a car with a horsepower of 95, a wheelbase of 120 inches and 2500 engine revolutions per mile will have an actual MidPrice between $6,400 and $30,912.

(f) Record the width of the interval from 10e.
2 * 1.989 * 6.1618 = 24.51164, or 30.91182 - 6.40018 = 24.51164.

(g) What is the value of the RMSE*4?
4 * RMSE = 4 * 5.728277 = 22.91311.

(h) Compare the width of the interval from 10e to the value of the RMSE*4. Explain why these two values are close but not exactly the same.
4 * RMSE = 4 * 5.728277 = 22.91311. Remember that the RMSE has only an approximate interpretation. The exact prediction interval is slightly wider than the RMSE approximation. This demonstrates that we can get a quick guess at the width and range using the RMSE, but the accurate (non-approximated) answer requires performing the prediction interval calculations.
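The arithmetic in parts 10(a) through 10(h) can be rechecked with a few lines of Python. This is only a sketch under stated assumptions: the variable names are illustrative, the coefficients, standard error and RMSE are copied from the JMP output for the Reduced Model, and the critical value 1.989 is the table value used in the solution rather than one computed here.

```python
# Reduced Model coefficients, copied from the JMP Parameter Estimates table.
coef = {
    "Intercept": -47.12988,
    "Horsepower": 0.1460375,
    "REV": 0.0040063,
    "Wheelbase": 0.3491407,
}

# The new car from question 10 (MPG.City is not in the Reduced Model).
new_car = {"Horsepower": 95, "REV": 2500, "Wheelbase": 120}

# (a) Predicted MidPrice in $1,000 from the prediction equation.
y_hat = coef["Intercept"] + sum(coef[name] * value for name, value in new_car.items())

se_individual = 6.1618  # (b) Standard Error for Individual, reported by JMP
t_crit = 1.989          # (c) table value used in the solution (DF = 80 row)
rmse = 5.728277         # Root Mean Square Error of the Reduced Model

# (d) 95% prediction interval: y_hat +/- t * SE.
margin = t_crit * se_individual
lower, upper = y_hat - margin, y_hat + margin

# (f)-(h) Exact interval width versus the 4 * RMSE rule of thumb.
width = upper - lower
approx_width = 4 * rmse

print(f"predicted MidPrice: {y_hat:.3f}")
print(f"95% PI: ({lower:.3f}, {upper:.3f})")
print(f"width: {width:.5f}   4 * RMSE: {approx_width:.5f}")
```

The exact width comes out larger than 4 * RMSE because the standard error for an individual (6.1618) exceeds the RMSE (5.728277): it also carries the uncertainty in the estimated mean response, while the multiplier 1.989 stays close to 2.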