STAT 420
Examples for 10/11/2007
Fall 2007
1. The worksheet case1201.csv contains data on the average SAT scores by state. The states are
ordered by their average SAT score. Researchers have tried to explain the state-by-state
differences in scores. Column 2 holds the average SAT score (SAT); the remaining columns hold six
variables that may be associated with the SAT differences among states: percentage of the total
eligible students who took the exam (TAKERS), median income of families of test takers (INCOME),
average number of years that the test takers had formal studies in social studies, natural
sciences, and humanities (YEARS), percentage of test takers who attended public secondary schools
(PUBLIC), total state expenditure on secondary schools in dollars per student (EXPEND), and median
percentile ranking of the test takers within their secondary school classes (RANK).
> case1201.dat <- read.table("http://www.stat.uiuc.edu/~stepanov/case1201.csv",
+                            header=TRUE, sep=",")
> pairs(SAT ~ TAKERS+INCOME+YEARS+PUBLIC+EXPEND+RANK, case1201.dat)
> pairs(SAT ~ log(TAKERS)+INCOME+YEARS+PUBLIC+EXPEND+RANK, case1201.dat)
> case1201.dat <- subset(case1201.dat, STATE != "Alaska")
> case1201.fit <- lm(SAT ~ log(TAKERS)+INCOME+YEARS+PUBLIC+EXPEND+RANK,
+                    case1201.dat)
AKAIKE'S INFORMATION CRITERION (AIC):
Akaike proposed to choose the model that minimises

    AIC = -2 × (maximized log-likelihood) + 2 × (number of parameters in the model)
        = n + n ln(2π) + n ln(RSS/n) + 2p

R:  AIC = n ln(RSS/n) + 2p
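As a quick check of the two formulas, the following minimal sketch evaluates both versions of the AIC by hand. The function names aic_step and aic_full are made up for this illustration. The value n = 49 is an assumption (50 states with Alaska removed); the RSS values are the ones step() prints for the full model (p = 7: intercept plus six predictors) and for the model without PUBLIC (p = 6). The term n + n ln(2π) is the same for every model fitted to the same data, so leaving it out does not change which model has the smaller AIC.

```r
# AIC in the shortened form reported by step(): n * ln(RSS / n) + 2 * p
aic_step <- function(rss, n, p) n * log(rss / n) + 2 * p

# Full version: adds the constant n + n * ln(2 * pi)
aic_full <- function(rss, n, p) n + n * log(2 * pi) + aic_step(rss, n, p)

n <- 49  # assumption: 50 states minus Alaska

# RSS values taken from the step() output in this handout
start       <- aic_step(rss = 21397, n = n, p = 7)  # all six predictors
drop_public <- aic_step(rss = 21417, n = n, p = 6)  # PUBLIC removed

print(round(c(start = start, drop_public = drop_public), 2))

# The difference between two models is identical under either formula
print(start - drop_public)
print(aic_full(21397, n, 7) - aic_full(21417, n, 6))
```

With these inputs the first two values should agree with the "Start" and "Step" AIC lines of the backward elimination output (311.88 and 309.93), up to rounding of the printed RSS.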
BACKWARD ELIMINATION
> step(case1201.fit, direction = "backward")
Start: AIC= 311.88
SAT ~ log(TAKERS) + INCOME + YEARS + PUBLIC + EXPEND + RANK

              Df Sum of Sq   RSS AIC
- PUBLIC       1        20 21417 310
- INCOME       1       340 21737 311
<none>                     21397 312
- log(TAKERS)  1      2150 23547 315
- YEARS        1      2532 23928 315
- RANK         1      2679 24076 316
- EXPEND       1     10964 32361 330

Step: AIC= 309.93
SAT ~ log(TAKERS) + INCOME + YEARS + EXPEND + RANK

              Df Sum of Sq   RSS AIC
- INCOME       1       505 21922 309
<none>                     21417 310
- log(TAKERS)  1      2552 23968 313
- YEARS        1      3011 24428 314
