Course: STAT W4240
Title: Data Mining
Semester: Fall 2015
Quiz 0: 2015/09/09
Explanation
As listed in the syllabus, prerequisites for this course include a previous course in statistics, elementary probability, multivariate calculus, and linear algebra. This

Homework 4
Statistics W4240: Data Mining
Columbia University, Fall 2015
Due Wednesday, November 11
For your .R submission, submit a file for each question labeled hw04_q1.R and so on. The
write-up should be saved as a .pdf of size less than 8MB. DO NOT subm

1. Why is the predictor on line 10 described as being a nearest-neighbor
approach to prediction?
(a) Because the new value of Y is predicted by the average of the Y
values in the sample whose Y values are nearest to the new value
of Y .
(b) Because it was

Support Vector Machines
Remember
Irreducible error
Variance of estimated conditional expectation
Squared bias in estimate of conditional expectation
But that was really all for squared error loss
Classification with -1/1 outcomes

Classification
Logistic Regression
Bayes Rules for Classification
Lecture 7
Discriminant Analysis
Classification
Categorical (sometimes only dichotomous) outcome Y
Predictors X
See X = x, predict

Supervised Learning with Squared Error Loss
A numerical example with small n and small number of predictors.
Lecture 3
Classification
Classification
S

STAT 4240.002, .003, and .D04
First Problems Set
Section .002, Due September 22nd by 5:00 PM E.S.T. in my mailbox
or by hand before the end of lecture
Section .003, Due September 22nd by 5:00 PM E.S.T. in my mailbox
Section .D04 Due September 22nd by e

Help for problem 3 on the first assignment.
Whatever version of SAS you are using (the web-based version, one you've
downloaded, one in the labs, or one you've bought), you will want to locate the
program window.

1. Fit a simple linear regression to the data in which each predictor appears as a
main effect in the regression. Compute the fitted values. Do any of the fitted
values fall outside of the range from 0 to 1?
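In R (the course's submission language), a minimal sketch of this check; the data frame and variable names below are placeholders, since the homework data set is not reproduced here (the real data set has four predictors):

```r
# Simulated stand-in for the homework data: a 0/1 outcome y and two predictors.
set.seed(1)
hw <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
hw$y <- as.numeric(hw$x1 + hw$x2 + rnorm(50) > 0)

fit  <- lm(y ~ x1 + x2, data = hw)  # each predictor as a main effect
yhat <- fitted(fit)                 # the fitted values
range(yhat)                         # do any fall outside [0, 1]?
any(yhat < 0 | yhat > 1)
```

With a 0/1 outcome, ordinary least squares has no reason to keep fitted values inside [0, 1], which is the point of the question.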

Here is some material on running linear regressions in SAS.
If you would like to follow along, the example data is the same as was used in the
first homework. It may be accessed in SAS OnDemand via the followi

PROBLEM 2 HELP
A procedure in SAS that implements the K-means approach to unsupervised learning is PROC
FASTCLUS. Here are some examples using the SENIC data set in which clusters are fit to
some variables, and
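For comparison, sections working in R rather than SAS can get the analogous K-means fit with the base `kmeans` function; this sketch uses Fisher's iris data (which appears later in these notes) as a stand-in for the SENIC variables:

```r
# K-means with 3 clusters on the four standardized iris measurements;
# nstart = 20 restarts guards against poor local optima.
set.seed(1)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 20)
table(km$cluster, iris$Species)   # how the clusters align with species
```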

Let Y be a normal vector with expectation μ and covariance matrix σ²I. Let P be
a k-dimensional projection, and consider
    |P Y|².
Writing Y = μ + σε, we may write the sum of squares of the projection as
    |P(μ + σε)|² = σ² |Pε + Pμ/σ|²,
where ε is a vector of independent standard normals.
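A quick numerical check of this setup in R (my sketch; the choice of P as the projection onto the first k coordinates is an assumption, not from the notes): |PY|²/σ² should behave like a noncentral chi-squared with k degrees of freedom and noncentrality |Pμ|²/σ², whose mean is k + |Pμ|²/σ².

```r
set.seed(1)
n <- 5; k <- 2; sigma <- 2
mu <- c(1, -1, 0, 3, 2)
P  <- diag(c(1, 1, 0, 0, 0))      # rank-k projection onto the first k coordinates
draws <- replicate(1e4, sum((P %*% (mu + sigma * rnorm(n)))^2))
mean(draws) / sigma^2             # near k + |P mu|^2 / sigma^2 = 2 + 2/4 = 2.5
```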

These hints are based on Fisher's Iris data. The data are on courseworks and our
site on SAS OnDemand.
To access the data in SAS OnDemand, you might use code like this.
LIBNAME classlib '/courses/de3a5b65ba27fe3

1. Page 180. Look at equation (5.2), and note that as the leverage increases, the
difference between outcome and fitted value is up-weighted. Does this
make sense intuitively? Why?
The naive estimate of r_i

1. Use Enterprise Miner to develop a support vector machine predictor with the
hwdata data set. Explore a few of the options available. Leave out 30% of
your data while you decide on a final support vector mach

Consider the usual set-up, in which we have a random sample of training
data consisting of n predictors Xi and corresponding outcomes Yi . Ultimately,
we are going to use new predictors to estimate new outcomes, and we measure
the loss associated with a p

PROC GAM DATA=data;
  MODEL v4 = SPLINE(v5) SPLINE(v7);
RUN;
Try several different choices of the effective degrees of freedom. Also
try cross-validation.
%MACRO m2(df1, df2);
PROC GAM DATA=data;
  MODEL v4 = SPLINE(v5, DF=&df1) SPLINE(v7, DF=&df2);
RUN;
%MEND;
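For R users, a rough analogue of the PROC GAM fit using the mgcv package (my choice of package; the variable names are kept from the SAS code and the data here are simulated placeholders):

```r
library(mgcv)   # ships with R; gam() fits penalized regression splines
set.seed(1)
d <- data.frame(v5 = runif(200), v7 = runif(200))
d$v4 <- sin(2 * pi * d$v5) + d$v7^2 + rnorm(200, sd = 0.2)
fit <- gam(v4 ~ s(v5) + s(v7), data = d)  # smoothness chosen automatically (GCV)
summary(fit)$edf                          # effective degrees of freedom per term
```

Unlike the SAS macro, which loops over fixed DF values, mgcv selects the effective degrees of freedom by minimizing a cross-validation-style criterion.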

2 F 36 21
4 F 31 31
17 F 64 122
17 F 28 7
18 F 69 159
24 F 62 193
25 F 67 126
26 F 61 276
25 F 62 71
27 F 56 175
30 F 71 425
30 F 64 146
31 F . .
32 F 68 186
32 F 64 111
32 F 64 205
33 F 63 131
32 F 68 165
34 F 69 185
34 F 66 141
34 F 68 205
34 F 69 296
3

Linear algebra
Probability theory
Lecture 2
Statistical inference
Let X be the n by 1 matrix (really, just a vector) with all entries equal
to 1,

    X = (1, 1, ..., 1)^T.

And consider the
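A small R illustration of what this X does in a regression (my gloss, not from the notes): projecting Y onto the all-ones vector replaces every entry by the sample mean.

```r
n <- 5
X <- matrix(1, n, 1)                      # the all-ones "design matrix"
H <- X %*% solve(t(X) %*% X) %*% t(X)     # hat matrix X (X'X)^{-1} X'
Y <- c(2, 4, 6, 8, 10)
drop(H %*% Y)                             # every entry is mean(Y) = 6
```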

The data to be used for this problem set may be found in courseworks as a flat file,
and also in SAS OnDemand in our course site both as a flat file and as a SAS data set.
There are four predictors and a

Here is something that I hope will help.
Suppose you have a population where the probability that Y = 1 is p,
and the conditional density of X given Y = y is f_y(x). And suppose that
you fit a logistic regression and find that

    P(Y = 1 | X = x) = e^{g(x)} / (1 + e^{g(x)}).
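In R, this fitted probability is just the logistic function applied to the linear predictor g(x) (a one-line sketch):

```r
logistic <- function(g) exp(g) / (1 + exp(g))  # P(Y = 1 | X = x) given g(x)
logistic(0)    # 0.5: the model is indifferent when g(x) = 0
```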


STAT 4240.002, .003, and .D04
Second Problems Set
Section .002, Due September 29th by 5:00 PM E.S.T. in my physical
mailbox on the 10th floor of the SSW building - or by hand before the
end of lecture.
Section .003, Due September 29th by 5:0

1. The first problem asks us to find the c that maximizes

    c^T Σ c

subject to the stricture that c^T c = 1.
We know something about this problem: in the lecture notes, we found
that the local maxima and minima are given by the eigenvectors of Σ,
at least when Σ is
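A numerical sketch of this fact in R (the matrix S below is an arbitrary symmetric positive definite example of my own, standing in for the matrix between c^T and c): the unit-norm maximizer is the top eigenvector, and the maximum value is the largest eigenvalue.

```r
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
S <- crossprod(A)              # a symmetric positive definite example matrix
e <- eigen(S)
c1 <- e$vectors[, 1]           # unit-norm top eigenvector
drop(t(c1) %*% S %*% c1)       # equals e$values[1], the largest eigenvalue
```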

Friday, September 25, 2015 05:49:11 PM
The REG Procedure
Model: MODEL1
Dependent Variable: HEIGHT
Number of Observations Read    815
Number of Observations Used

First consider simply sampling from the joint distribution of X and Y
and fitting a logistic regression model. Let's suppose this gives us one
subsample of size n with Y = 1 and a sample of size m with Y = 0. When you
fit that logistic regression, you will be

Friday, October 2, 2015 04:43:54 PM
The REG Procedure
Model: MODEL1
Dependent Variable: HEIGHT
Number of Observations Read    815
Number of Observations Used    80