Homework 2
STAT 331
Fall 2008
1.
DESCRIPTIVE ABSTRACT:
Data taken from the advertising pages of the Sunday Times a few years
ago, listing Mercedes-Benz cars for sale in the UK (mainly in and around
London). The asking prices (in pounds sterling) are classified according
to type/model of car, age of car (in six-month units based on date of
registration), recorded mileage, and vendor.
VARIABLE DESCRIPTIONS:
1. Case number
2. Asking price in pounds
3. Type/Model of car:
0=model 500, 1=450, 2=380, 3=280, 4=200
4. Age of car in six-month units, based on registration date
5. Recorded mileage (in thousands)
6. Vendor (0,1,2,3 are different dealerships, 4 means "sale by
owner")
Values are aligned and delimited by blanks.
YOUR TASK:
To forecast the prices of a 2-year-old Mercedes 500 with 60,000 miles
sold by owner, and of a 1-year-old Mercedes 500 with 26,000 miles
sold either by a dealership or by owner. You should
a) construct a linear regression of price on the available regressors
(you can add any interaction terms); omit any unimportant
regressors and discuss your choice; report the summary
output of the linear regression and discuss your findings, i.e.
R-squared, F-statistic, significance of regressors;
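Before fitting anything, it helps to encode the cars named in the task in the units of the data set: age in six-month units and mileage in thousands. The following is a minimal sketch in R, assuming the columns are named Mod, Age, Mile and Vend (as in the regression call used in the solution). As a further assumption, dealership code 0 stands in for "sold by dealership", since the task does not say which dealership is meant.

```r
# Cars to forecast, encoded in the units of the data set.
# Assumption: column names follow the solution's regression call.
# Assumption: dealership code 0 stands in for "sold by dealership".
new_cars <- data.frame(
  Mod  = c(0, 0, 0),                      # 0 = model 500
  Age  = c(2, 1, 1) * 2,                  # years -> six-month units
  Mile = c(60000, 26000, 26000) / 1000,   # miles -> thousands of miles
  Vend = c(4, 0, 4)                       # 4 = sale by owner
)
print(new_cars)
```

A data frame like this can be passed to predict() as newdata only if the model is fitted with plain column names and a data argument, for example lm(Price ~ Mod + Age, data = data). With terms written as data$Price ~ data$Mod + ..., predict() has no way to match the new values to the regressors.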
data <- read.table("Data.txt", header=TRUE)
l <- lm(data$Price~data$Mod+data$Age+data$Mile+data$Vend)
Summary
summary(l)
Call:
lm(formula = data$Price ~ data$Mod + data$Age + data$Mile + data$Vend)

Residuals:
    Min      1Q  Median      3Q     Max
-5307.4  -995.4   266.1  1235.5  4324.0

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27669.415    937.312  29.520  < 2e-16 ***
data$Mod    -2819.718    244.836 -11.517 5.85e-14 ***
data$Age    -1445.027    229.018  -6.310 2.14e-07 ***
data$Mile       7.631     40.293   0.189    0.851
data$Vend     291.241    238.171   1.223    0.229
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2002 on 38 degrees of freedom
Multiple R-squared:  0.8301,    Adjusted R-squared:  0.8122
F-statistic:  46.4 on 4 and 38 DF,  p-value: 3.982e-14
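The t values and p-values in the coefficient table can be checked by hand: each t value is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the residual degrees of freedom. A small sketch for Mile and Vend, using only the numbers printed in the summary (the variable names est, se and df_resid are chosen here for illustration):

```r
# Reproduce the t values and two-sided p-values for Mile and Vend
# from the estimates and standard errors printed in the summary.
est <- c(Mile = 7.631, Vend = 291.241)    # coefficient estimates
se  <- c(Mile = 40.293, Vend = 238.171)   # standard errors
df_resid <- 38                            # residual degrees of freedom

t_val <- est / se
# abs() makes the two-sided p-value independent of the sign of t
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)
print(round(cbind(t_val, p_val), 3))
```

Up to rounding, the results should agree with the t values (0.189, 1.223) and p-values (0.851, 0.229) in the table.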
The summary shows that the p-values for Mile (0.851) and Vend (0.229) are both above 0.1, so neither
regressor is statistically significant once Mod and Age are in the model. Therefore, I refit the
regression omitting the two variables Mile and Vend.
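Individual t-tests do not by themselves justify dropping two regressors at once; a partial F-test on the nested models is the usual check. The sketch below computes it from the R-squared values reported in the two summary outputs (0.8301 for the full model, 0.8233 for the model with Mod and Age only), so it needs no access to the data. The variable names are illustrative.

```r
# Partial F-test for dropping Mile and Vend together,
# computed from the reported R-squared values.
r2_full    <- 0.8301   # model with Mod, Age, Mile, Vend
r2_reduced <- 0.8233   # model with Mod and Age only
q          <- 2        # number of regressors dropped
df_full    <- 38       # residual degrees of freedom of the full model

f_partial <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_partial <- pf(f_partial, q, df_full, lower.tail = FALSE)
print(c(F = f_partial, p = p_partial))
```

The p-value is well above 0.1, which supports omitting both variables. With the data loaded, the same comparison can be made by fitting both models and calling anova() on the reduced and full fits.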
Summary (Omitting Mile and Vendor)
> l <- lm(data$Price~data$Mod+data$Age)
> summary(l)
Call:
lm(formula = data$Price ~ data$Mod + data$Age)

Residuals:
    Min      1Q  Median      3Q     Max
-5174.2  -996.4   186.5  1010.7  4040.1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  27847.7      920.0   30.27  < 2e-16 ***
data$Mod     -2815.7      239.7  -11.75 1.52e-14 ***
data$Age     -1392.8      161.2   -8.64 1.10e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1990 on 40 degrees of freedom
Multiple R-squared:  0.8233,    Adjusted R-squared:  0.8145
F-statistic: 93.22 on 2 and 40 DF,  p-value: 8.756e-16
F-statistic
> qf(0.99,2,40)
[1] 5.178508
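The overall F-statistic of the reduced model can be reproduced from its R-squared and degrees of freedom, then compared with the critical value above. A minimal sketch using only the reported numbers (R-squared 0.8233, 2 regressors, 40 residual degrees of freedom); the variable names are illustrative:

```r
# Overall F-test for the reduced model (Mod and Age),
# rebuilt from the reported R-squared.
r2       <- 0.8233   # multiple R-squared of the reduced model
k        <- 2        # number of regressors
df_resid <- 40       # residual degrees of freedom

f_overall <- (r2 / k) / ((1 - r2) / df_resid)
f_crit    <- qf(0.99, k, df_resid)                       # 1% critical value
p_overall <- pf(f_overall, k, df_resid, lower.tail = FALSE)

print(c(F = f_overall, critical = f_crit, p = p_overall))
```

Because R-squared is rounded to four digits, the computed F differs slightly from the 93.22 printed in the summary, but it is far above the critical value either way.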
PRICE = 27847.7 - 2815.7*MOD - 1392.8*AGE + ε
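The fitted equation gives the point forecasts the task asks for. The sketch below assumes the Mod and Age coefficients are negative (price falls for lower models and older cars) and uses a hypothetical helper, predict_price, that is not part of the original solution. Age is converted from years to six-month units before it enters the equation.

```r
# Point forecasts from the reduced model, using the reported coefficients.
# Assumption: the Mod and Age coefficients enter with a negative sign.
b0    <- 27847.7
b_mod <- -2815.7
b_age <- -1392.8

# Hypothetical helper: price forecast for a model code and an age in years.
predict_price <- function(mod, age_years) {
  age_units <- age_years * 2   # years -> six-month units
  b0 + b_mod * mod + b_age * age_units
}

forecasts <- c(
  two_year_old_500 = predict_price(mod = 0, age_years = 2),
  one_year_old_500 = predict_price(mod = 0, age_years = 1)
)
print(forecasts)
```

Since the reduced model contains neither Mile nor Vend, the forecast for the 1-year-old car is the same whether it is sold by a dealership or by owner, and the stated mileages do not affect either forecast.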
Since the F-statistic from the summary above, 93.22, exceeds the critical value 5.178508, at the 1% significance level we can reject H
 Spring '08
 YuliaGel