Midterm Test for ST4240 Data Mining
(Please answer all the questions for full marks. Please send your answers to staxyc@nus.edu.sg.)
1. For data A (at http://www.stat.nus.edu.sg/~staxyc/DM07testdata1.dat), there are 5 predictors X1, …, X5 and a response Y. A single-index model (SIM) is suggested:
Y = g(a1*X1 + … + a5*X5) + e
A. Estimate the model, plot the link function and its confidence band.
The estimated model is
Y = g(0.008595073*X1 - 0.740091476*X2 + 0.034182754*X3 - 0.671579756*X4 + 0.001703434*X5)
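A quick sanity check on these estimates: because g is left unspecified in a single-index model, the coefficient vector is identified only up to scale, and a common convention is to normalise it to unit length. The sketch below checks that the reported coefficients satisfy this. The name alpha_hat is illustrative and not taken from sim.R, and only the magnitudes are entered, since squaring makes the signs irrelevant.

```r
# Magnitudes of the reported coefficient estimates for X1..X5
# (alpha_hat is an illustrative name, not an object returned by sim.R)
alpha_hat <- c(0.008595073, 0.740091476, 0.034182754, 0.671579756, 0.001703434)

# Euclidean norm of the coefficient vector; a value close to 1 is
# consistent with a unit-norm identifiability constraint
norm_alpha <- sqrt(sum(alpha_hat^2))
print(norm_alpha)
```

A norm very close to 1 suggests the fit uses the unit-length normalisation, so the coefficients can be compared with each other but not read as slopes.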
The estimated function and its 95% confidence band are shown in Figure 1.
[Figure 1: plot of y against xalpha with the estimated link function and its 95% confidence band]
B. Which variables can be removed? Estimate the model again after removing the variables.
The estimated coefficients have standard errors (SE) 0.02222250, 0.01297481, 0.02339194, 0.01451642 and 0.02307927, respectively. Checking the t-statistics (each coefficient divided by its SE), we can see that X1, X3 and X5 are not significant and can be removed.
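The t-statistic check can be reproduced from the reported numbers. This is a minimal sketch, assuming the usual large-sample normal cutoff of about 1.96 (the answer does not state which cutoff was used). The names alpha_hat, se_hat, t_stat and removable are illustrative, and the coefficients are entered as magnitudes because only |t| matters here.

```r
# Reported coefficient magnitudes and standard errors for X1..X5
alpha_hat <- c(X1 = 0.008595073, X2 = 0.740091476, X3 = 0.034182754,
               X4 = 0.671579756, X5 = 0.001703434)
se_hat <- c(0.02222250, 0.01297481, 0.02339194, 0.01451642, 0.02307927)

# Absolute t-statistics: |estimate| / standard error
t_stat <- abs(alpha_hat) / se_hat

# Assumed rule: a variable is a candidate for removal when |t| is below
# the two-sided 5% normal critical value (about 1.96)
removable <- names(t_stat)[t_stat < qnorm(0.975)]

print(round(t_stat, 2))
print(removable)
```

Under this rule X2 and X4 have very large t-statistics, while X1, X3 and X5 fall below the cutoff, which agrees with the conclusion above.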
C. For a new X (X1=0, X2=0, X3=0, X4=0, X5=0), predict the function value, i.e. E(Ynew | X), and calculate its 95% confidence interval.
The predicted value is 1.023846, and the 95% confidence interval is [0.6610302, 1.386661].
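As a consistency check on the reported interval, the sketch below computes its midpoint and half-width. It also backs out a standard error under the assumption that the interval has the symmetric form prediction ± 1.96 × SE; that form is an assumption here, not something the answer or sim.R confirms. All variable names are illustrative.

```r
# Reported prediction and 95% confidence interval at X = (0, 0, 0, 0, 0)
pred <- 1.023846
ci <- c(0.6610302, 1.386661)

# Midpoint and half-width of the interval
mid <- mean(ci)
half_width <- diff(ci) / 2

# Standard error implied by a symmetric normal-approximation interval
# (assumed form: pred +/- qnorm(0.975) * se)
se_implied <- half_width / qnorm(0.975)

print(c(midpoint = mid, half_width = half_width, se_implied = se_implied))
```

The midpoint agrees with the predicted value to the reported precision, so the interval is symmetric about the prediction.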
CODE
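# Read data A: columns 1-5 hold the predictors X1..X5, column 6 the response Y
xy = read.table("testdata1.dat")
x = data.matrix(xy[,1:5])
y = data.matrix(xy[,6])

# Fit the single-index model with sim() from the course file sim.R
source("sim.R")
out = sim(x, y)
out$alpha   # estimated index coefficients
out$se      # their standard errors

# Figure 1: data, fitted link function and confidence band against the index
xalpha = x %*% out$alpha
plot(xalpha, y)
I = order(xalpha)   # sort by the index so the lines are drawn left to right
lines(xalpha[I], out$predict[I])
lines(xalpha[I], out$Ln[I])
lines(xalpha[I], out$Un[I])

# Part C: prediction and confidence limits at X = (0, 0, 0, 0, 0)
out = sim(x, y, xnew = c(0, 0, 0, 0, 0))
out$predict
out$Ln
out$Un      # added: presumably the upper limit, matching out$Ln above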
Fall '09, XIA Yingcun