STAT 326 LAB 6

Group Number:
NAMES:
NAMES:

In this lab, you will use the JMP utilities called Fit Y by X and Fit Model.

1. LEGO. In this example we will consider models aimed at predicting the original price for different LEGO sets.
The variables in this example include the following.

The Response Variable
- Numerical: \(y\) = Original Price ($) for a LEGO set.

Explanatory Variables
- Numerical: \(x_1\) = Pieces: number of LEGO pieces in the set.
- Categorical: Type. This is the type of LEGO set. Levels for Type include Duplo (D) and Traditional (T). In this example we will use Duplo as the base level.

(a) Create side-by-side boxplots of the original price of LEGO sets by the type of LEGO set. Comment on the similarities and/or differences in the original price by LEGO set type.

Go to Fit Y by X. Put Type into the X, Factor box and put Original Price into the Y, Response box.

There is a lot of overlap between the prices of the two types of LEGO sets. The spread of prices for the Traditional sets is larger than the spread of prices for the Duplo sets.

(b) Create a scatterplot of the Number of Pieces versus the Original Price of different LEGO sets. The symbols on the graph distinguish between the types of LEGO sets: D represents Duplo sets and T represents Traditional sets.

i. Based on this scatterplot, comment on the relationship between number of pieces and the original price of LEGO sets for each of the 2 types of LEGO sets.
For each level of LEGO Type there is a strong positive linear relationship between number of pieces and the price of LEGO sets.

ii. Clearly explain why the scatterplot suggests including an interaction between number of pieces and type of LEGO set.

The visual slope between number of pieces and the original price of LEGO sets is steeper for the Duplo sets than for the Traditional sets.

(c) Assume Duplo is the base level for LEGO set Type. Define the indicator variable needed in the space below. Create an appropriate column in JMP for this indicator variable.

Select the column Type. Go to Cols. Select Utilities. Select Make Indicator Columns. Change the setting so the new column is numerical. To do this, click on the red chart icon next to the word Traditional on the left side of the data table. Make sure the Continuous option is selected.

\(x_T\) = 1 if the set is of type Traditional and 0 if Duplo.

(d) Write out the population model for the original price of LEGO sets based on a linear relationship between
price and number of pieces and different intercepts for each LEGO set type. Additionally we will assume the relationship between number of pieces and price depends on the Type of LEGO set. (This model should include an intercept and three additional beta coefficients (p = 3). If you have questions please discuss with groups around you and then ask your TA.)

\[ \mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_T + \beta_3 x_1 x_T \]

(e) Use the JMP utility Fit Model to find the estimate of the model from part 1d. To include any cross product terms in your estimated model, highlight both variables required for the cross product and click “Cross” under Construct Model Effects. If you hold down the Ctrl key you will be able to select two different variables simultaneously. Be sure to also include all first-order terms in the estimated model.

Reminder: Do not put the categorical column from JMP labeled “Type” into your Fit Model. By default this will create variables with a 1, 0, -1 coding. Instead, include the indicator variable column you have created in your model (along with any cross product terms).

For this example: Go to Analyze, Fit Model and put Original Price in the Y box. Select “Number of Pieces” and click Add. Select “Traditional” and click Add. Hold down the Ctrl key and select “Number of Pieces” and “Traditional”. While both “Traditional” and “Number of Pieces” are selected, click “Cross” (right below Add). Before clicking Run, go to the model specification and uncheck the option Center Polynomials.
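
As a rough illustration outside JMP, the indicator column and the cross product term described above can be built by hand. The Python sketch below uses a few made-up rows (hypothetical values, not the lab's data set), and the helper name `design_row` is an assumption introduced here. It only shows the 0/1 coding with Duplo as the base level and that the “Cross” term is the product of the two columns.

```python
# Sketch only: hand-built model columns for
#   price ~ pieces + Traditional + pieces*Traditional
# The rows below are made-up examples, NOT the lab's data.

def design_row(pieces, set_type):
    """Return [1, x1, xT, x1*xT] for one LEGO set (hypothetical helper)."""
    if set_type not in ("D", "T"):
        raise ValueError("set_type must be 'D' (Duplo) or 'T' (Traditional)")
    x_t = 1 if set_type == "T" else 0      # indicator: Duplo is the base level
    return [1, pieces, x_t, pieces * x_t]  # intercept, x1, xT, cross product

# (number of pieces, type) pairs, invented for illustration
hypothetical_sets = [(40, "D"), (120, "T"), (75, "D"), (300, "T")]
design_matrix = [design_row(p, t) for p, t in hypothetical_sets]

for row in design_matrix:
    print(row)
```

For every Duplo row the last two columns are 0, so only the intercept and the Number of Pieces coefficient contribute to the Duplo line. For Traditional rows both extra columns are active, which shifts the intercept and the slope.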
i. Write out the full prediction equation.

\[ \hat{y} = 9.986 + 0.379 x_1 - 11.739 x_T - 0.268 x_1 x_T \]

ii. Use this output to write the simplified prediction equation for the linear relationship between the Price and Number of Pieces for Traditional LEGO sets. Show all your work.

\[ \hat{y} = 9.986 + 0.379 x_1 - 11.739 \cdot 1 - 0.268 x_1 \cdot 1 \]
\[ \hat{y} = (9.986 - 11.739) + (0.379 - 0.268) x_1 \]
\[ \hat{y} = -1.753 + 0.111 x_1 \]

iii. Use this output to write the simplified prediction equation for the linear relationship between the Price and Number of Pieces for Duplo LEGO sets. Show all your work.

\[ \hat{y} = 9.986 + 0.379 x_1 - 11.739 \cdot 0 - 0.268 x_1 \cdot 0 \]
\[ \hat{y} = 9.986 + 0.379 x_1 \]

(f) Use your answers to parts 1(e)ii and 1(e)iii to explain what it means for the relationship between number of pieces and price to depend on the LEGO set type.

The linear relationship between number of pieces and price has a different estimated slope for each LEGO set type: 0.379 dollars per piece for Duplo sets versus 0.111 dollars per piece for Traditional sets. The slopes of these relationships depend on the LEGO set type.

(g) Check the assumption of the form of the model. To answer this question we create a scatterplot with
predicted price on the x axis and residuals on the y axis. In the space below comment on the validity of this assumption.

We see positive and negative residuals across the range of predicted values. This suggests we have no violation of the assumption related to the form of the model.

(h) Check the assumption of constant variance. To answer this question we create a scatterplot with predicted price on the x axis and residuals on the y axis. In the space below comment on the validity of this assumption.

We see a similar spread of the residuals at each predicted value. This suggests we have no concern with the constant variance assumption.

(i) Check the assumption of normally distributed errors. To answer this question we create a normal quantile plot of the residuals. In the space below comment on the validity of this assumption.

The residuals appear to follow the straight line in the normal quantile plot. Furthermore, all the observations are between the upper and lower bands. We do not have any concern with the normality assumption.

(j) Is the multiple regression model proposed in part (1d) useful in modeling the original price of LEGO sets?
Show a full hypothesis test to justify your answer.

\(H_0: \beta_1 = \beta_2 = \beta_3 = 0\) vs \(H_a: \beta_i \neq 0\) for at least one \(i\) in (1, 2, 3)
F Ratio: 170.71
p-value: < 0.0001
Reject the null hypothesis.
There is statistically significant evidence to suggest this model is helpful in describing the original price of LEGO sets.

(k) Complete the \(R^2\) interpretation: 91.1 percent of the variability in the price of LEGO sets is explained by the model with explanatory variables of number of LEGO pieces and Type of LEGO set.

(l) Based on this example, complete the interpretation of the RMSE: approximately 95 percent of the actual original prices of these LEGO sets will be within $14.86 (2 × RMSE) of the corresponding predicted LEGO set price.

(m) Conduct a hypothesis test to determine if the relationship between Number of Pieces and the price of LEGO sets depends on the LEGO set Type.

\(H_0: \beta_3 = 0\) vs \(H_a: \beta_3 \neq 0\)
t Ratio: -7.06
p-value: < 0.0001
Reject the null hypothesis.
There is statistically significant evidence to suggest the relationship between Number of Pieces and the price of LEGO sets depends on the LEGO set Type.

2. A conceptual example: Assume you are a manager for a large cereal manufacturer. This manufacturer
has recently developed a new brand of cereal called “Morning Chipper”. Your marketing team has proposed 4 different visuals for the outside of the box (Design A, Design B, Design C, Design D). Your goal is to determine which box design would be best for sales even after accounting for the price of the cereal. You propose to model the sales (response variable, in number of boxes sold) as a linear function of the price (explanatory variable, in $) and the box design type. Additionally, you want this model to account for the possibility that the relationship between price and sales depends on the box design. Assume design version D as the base level.

(a) Define all variables needed to write out the population model. This requires defining all numerical variables and all indicator variables.
\(x_1\) is the price of the box of cereal.
\(x_A\) is 1 if Design A and 0 if not Design A.
\(x_B\) is 1 if Design B and 0 if not Design B.
\(x_C\) is 1 if Design C and 0 if not Design C.

(b) Clearly write out the full population model.

\[ \mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_A + \beta_3 x_B + \beta_4 x_C + \beta_5 x_1 x_A + \beta_6 x_1 x_B + \beta_7 x_1 x_C \]
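
One way to keep track of which coefficients apply to each box design is to tabulate them. The Python sketch below is only a bookkeeping aid: the function names `simplified_model` and `error_df` are assumptions introduced here, and the coefficient numbering follows the population model above (intercept shifts are β2 to β4, slope shifts are β5 to β7). It also counts the non-intercept coefficients, which is the p used in the error degrees of freedom n - p - 1.

```python
# Sketch only: which coefficients survive for each box design when D is the base level.
# Coefficient numbering follows the population model (beta0 ... beta7).

DESIGNS = ["A", "B", "C", "D"]       # D is the base level (all indicators are 0)
INDICATOR_LEVELS = ["A", "B", "C"]   # one indicator per non-base design

def simplified_model(design):
    """Return (intercept_betas, slope_betas) as lists of beta indices."""
    if design not in DESIGNS:
        raise ValueError(f"unknown design: {design}")
    intercept, slope = [0], [1]      # beta0 and beta1 apply to every design
    for k, level in enumerate(INDICATOR_LEVELS):
        if design == level:          # this design's indicator equals 1
            intercept.append(2 + k)  # beta2, beta3, beta4 shift the intercept
            slope.append(5 + k)      # beta5, beta6, beta7 shift the slope
    return intercept, slope

def error_df(n, p):
    """Error degrees of freedom n - p - 1 for p non-intercept coefficients."""
    return n - p - 1

for d in DESIGNS:
    b_int, b_slope = simplified_model(d)
    print(d, "intercept betas:", b_int, "slope betas:", b_slope)

p = 1 + 2 * len(INDICATOR_LEVELS)    # price + 3 indicators + 3 cross products
print("error df for n = 250:", error_df(250, p))
```

For Design D both lists contain only the base coefficients, which reflects that every other design is measured as a shift away from the base level.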
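(c) Write out the simplified version of the population model for Design version A.

\[ \mu_y = \beta_0 + \beta_1 x_1 + \beta_2 \cdot 1 + \beta_3 \cdot 0 + \beta_4 \cdot 0 + \beta_5 x_1 \cdot 1 + \beta_6 x_1 \cdot 0 + \beta_7 x_1 \cdot 0 \]
\[ \mu_y = \beta_0 + \beta_1 x_1 + \beta_2 + \beta_5 x_1 \]
\[ \mu_y = (\beta_0 + \beta_2) + (\beta_1 + \beta_5) x_1 \]

(d) Write out the simplified version of the population model for Design version D.

\[ \mu_y = \beta_0 + \beta_1 x_1 + \beta_2 \cdot 0 + \beta_3 \cdot 0 + \beta_4 \cdot 0 + \beta_5 x_1 \cdot 0 + \beta_6 x_1 \cdot 0 + \beta_7 x_1 \cdot 0 \]
\[ \mu_y = \beta_0 + \beta_1 x_1 \]

(e) Assume you collect a random sample of 250 observations. Based on the model described above, report the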
error degrees of freedom for this example.

Error degrees of freedom: \(n - p - 1 = 250 - 7 - 1 = 242\)

JMP output for the LEGO example (Response: Original Price)

Summary of Fit
RSquare                      0.911051
RSquare Adj                  0.905714
Root Mean Square Error       7.431279
Mean of Response             36.89741
Observations (or Sum Wgts)   54

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       3        28281.342       9427.11   170.7071   <.0001*
Error      50         2761.196         55.22
C. Total   53        31042.537

Parameter Estimates
Term                           Estimate     Std Error   t Ratio   Prob>|t|
Intercept                      9.9860168    2.571364      3.88    0.0003*
Number of Pieces               0.3791072    0.037551     10.10    <.0001*
Traditional                    -11.73928    3.622042     -3.24    0.0021*
Number of Pieces*Traditional   -0.26825     0.037977     -7.06    <.0001*

[Figure: Distributions, Normal Quantile Plot of the residuals]
[Figure: Bivariate Fit of Residual Original Price By Pred Formula Original Price, points marked D and T]
...
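
The summary numbers in the JMP output for the LEGO model can be cross-checked by hand. The Python sketch below recomputes the F ratio, R-square, adjusted R-square, RMSE, the t ratio for the cross product term, and the two simplified prediction lines, using only values copied from the output. The variable names are illustrative, and the "2 × RMSE" bound is the usual rough rule of thumb, not an exact interval.

```python
import math

# Values copied from the JMP output for the LEGO model.
n, p = 54, 3                                   # observations, non-intercept terms
ss_model, ss_error, ss_total = 28281.342, 2761.196, 31042.537
b0, b1, b2, b3 = 9.9860168, 0.3791072, -11.73928, -0.26825
se_b3 = 0.037977                               # std error of the cross product term

ms_model = ss_model / p                        # mean square for the model
ms_error = ss_error / (n - p - 1)              # mean square error
f_ratio = ms_model / ms_error                  # overall F test statistic
r_square = ss_model / ss_total
r_square_adj = 1 - (1 - r_square) * (n - 1) / (n - p - 1)
rmse = math.sqrt(ms_error)
t_ratio_b3 = b3 / se_b3                        # test of H0: beta3 = 0

# Simplified prediction lines: (intercept, slope) for each set type.
duplo_line = (b0, b1)                          # xT = 0
traditional_line = (b0 + b2, b1 + b3)          # xT = 1

print(f"F ratio      = {f_ratio:.4f}")
print(f"R-square     = {r_square:.6f}")
print(f"R-square adj = {r_square_adj:.6f}")
print(f"RMSE         = {rmse:.6f}  (2 x RMSE = {2 * rmse:.2f})")
print(f"t ratio b3   = {t_ratio_b3:.2f}")
print(f"Duplo:       y-hat = {duplo_line[0]:.3f} + {duplo_line[1]:.3f} x1")
print(f"Traditional: y-hat = {traditional_line[0]:.3f} + {traditional_line[1]:.3f} x1")
```

Small differences in the last digit are expected because the output itself is rounded.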


- Fall '08
- GENSCHEL
- Statistics