Homework 1
Statistics W4240: Data Mining
Columbia University
Due Dates
T/Th section: due on Jan 29th, 2015
M/W section: due on Feb 2nd, 2015
For material from the James book, print out what appears on
Random projection trees and low dimensional manifolds
Sanjoy Dasgupta
Yoav Freund
UC San Diego
UC San Diego
ABSTRACT
We present a simple variant of the k-d
Homework 4
Statistics W4240: Data Mining
Columbia University, Fall 2015
Due Wednesday, November 11
For your .R submission, submit a file for each question labeled hw04_q1.R and so on. The
write up shoul
Homework 5
Statistics W4240: Data Mining
Columbia University, Fall 2015
Due Monday, November 30
For your .R submission, submit a file for each question labeled hw05_q1.R and so on. The write
up should b
Homework 6
Statistics W4240: Data Mining
Columbia University, Fall 2015
Due Wednesday, November 9
For your .R submission, submit files for each question labeled hw06_q1.R and so on. The
write up should
Homework 2
Statistics W4240: Data Mining
Columbia University, Fall 2015
Due Wednesday, October 7
For your .R submission, submit a file for each question labeled hw02_q1.R, and so on. The write
up should
Statistics W4240: Data Mining
Columbia University
Fall 2015
Version: November 25, 2015. The syllabus is subject to change, so look for the version with
the most recent date.
Course Description
Massive
Data Mining
W4240
Prof. Rahul Mazumder
Columbia Statistics
Lecture Handout #9
Outline
Broadening Linear Regression
Polynomial Regression
Some Pitfalls
Nonlinearity
Heteroscedasticity
Outliers
Collinea
General guidelines:
*) The test is cumulative. More weight will be placed on the material after Midterm 1.
*) You will be tested based on your conceptual understanding of the material
*) I will not as
General guidelines:
*) You will be tested based on your conceptual understanding of the material
*) I will not ask you R coding examples, R functions, etc
*) There will be minor computational exercise
General guidelines:
*) The test is cumulative. More weight will be placed on the material after Midterms 1-2.
*) You will be tested based on your conceptual understanding of the material
*) I will not
Homework 2
Statistics W4240: Data Mining
Columbia University
M/W Class: Due date Feb 25th (before class starts)
T/Th Class: Due date Feb 24th (before class starts)
For your .R submission, submit a fil
Homework 3
Statistics W4240: Data Mining
Columbia University
Due Date M/W section: March 30, (before class starts)
Due Date T/Th section: March 31, (before class starts)
Note:
1. The teaching staff ha
Homework 4
Statistics W4240: Data Mining
Columbia University
Due date: M/W class, April 22 (before class)
Due date: T/R class, April 21 (before class)
Note:
1. Please try to be punctual in submitting
Homework 5 (Final Homework)
Statistics W4240: Data Mining
Columbia University
Due date: T/Th Class May 12th, 7:40 PM
Due date: M/W Class May 11th, 1:10 PM
Problem 1. (10 Points) James 6.8.1
Problem 2.
Course: STAT W4240
Title: Data Mining
Semester: Spring 2015
Quiz 0
Explanation
Turn in this work on or before Jan 24th, 5 PM on courseworks. Submissions are
all electronic. No other form of submission
Midterm Exam
STAT W4240 Section 001: Data Mining
March 11, 2014
Explanation
This exam is to be done in-class. You have 75 minutes to complete the entirety. All solutions should
be written in the accom
Journal of Machine Learning Research 15 (2014) 1929-1958
Submitted 11/13; Published 6/14
Dropout: A Simple Way to Prevent Neural Networks from
Overfitting
Nitish Srivastava
Geoffrey Hinton
Alex Krizhe
Density-Based Clustering Validation
Davoud Moulavi
Pablo A. Jaskowia
Some notation
f(x): the conditional expectation of Y given X.
Why estimate conditional expectations, conditional
modes?
One answer has to do with prediction.
What is prediction?
You are going to
Linear Regression
Linear regression models are a class of models for the
conditional expectation of Y given X .
Models in which the conditional expectation is a linear function
of the parameters.
As w
ANOVA
\sum_{i=1}^{n} (Y_i - [\hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_{j-1} X_{i,j-1}])^2
The OLS
Also, n times the naive estimator of expected mean squared
error.
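As a rough illustration of the relationship stated above, the following sketch fits a least-squares line with a single predictor and checks that the minimized sum of squares equals n times the naive mean-squared-error estimate. It is a one-predictor special case of the slide's expression, written in plain Python; the data and the function names `fit_ols` and `sum_of_squares` are made up for illustration and do not come from the course.

```python
# Simple linear regression (one predictor) in plain Python.
# The data below are invented for illustration; they are not from the course.

def fit_ols(x, y):
    """Return (b0, b1) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_of_squares(x, y, b0, b1):
    """Sum of squared deviations of y from the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_ols(x, y)
rss = sum_of_squares(x, y, b0, b1)   # sum of squares at the least-squares fit
naive_mse = rss / len(x)             # naive estimate: average squared residual

print(b0, b1, rss, naive_mse)
```

Moving either coefficient away from the fitted value can only increase `sum_of_squares`, which is what makes the fitted pair the least-squares solution.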
ANOVA
Partitioning the sum of squares
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
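The partition named in this slide can be checked numerically. The sketch below assumes the usual convention that SSE is the sum of squared residuals, SSR is the sum of squared deviations of the fitted values from the sample mean, and the total sum of squares (called `ssto` here, a name not used in the excerpt) is the sum of squared deviations of the responses from the sample mean. It uses a one-predictor least-squares fit with an intercept and invented data.

```python
# Check that the total sum of squares splits into SSE + SSR
# for a least-squares line with an intercept (plain Python, invented data).

def anova_partition(x, y):
    """Return (sse, ssr, ssto) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual part
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression part
    ssto = sum((yi - y_bar) ** 2 for yi in y)                # total
    return sse, ssr, ssto

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sse, ssr, ssto = anova_partition(x, y)
print(sse, ssr, ssto, sse + ssr)
```

The equality relies on the model containing an intercept; without one the cross term need not vanish and the two pieces generally do not add up to the total.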
Course
Data Mining
STAT S4240.001
503 Hamilton
MTuWTh 10:45 AM - 12:20 PM
Instructor
Daniel Rabinowitz
1014 School of Social Work Building
(212) 851-2141
Teaching Assistant
Yixin
First Problem Set
1. In the lecture notes, it says that we will be looking at models and
methods for supervised and unsupervised problems.
(a) What is the difference between a model and a method?
A stat
Exercises to prepare for the first day.
Look up or remember the definitions of
Expectation
Variance
Covariance
Conditional distribution
Conditional expectation
Conditional variance
Independent random vari
Second Problem Set
1. This exercise is to work through in some detail the bias-variance tradeoff calculations in a specific example. In this example, the predictor
is one-dimensional, and takes values in