Kernel Regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Linear smoothers and kernels
Recall our basic setup: we are given i.i.d. samples $(x_i, y_i)$, $i = 1, \ldots, n$ from the model
$$y_i = r(x_i) + \epsilon_i, \quad i = 1, \ldots, n,$$
and our goal i
Error and Validation
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Testing and training errors
1.1
Setup and motivation
Let's suppose we have a function for predicting Y from X; let's call it r. Further suppose that
r was fit on n training
Direct Inference with Linear Smoothers
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Review of linear smoothers
Given samples $(x_i, y_i)$, $i = 1, \ldots, n$, recall that a linear smoother is an estimator for the underlying regression funct
Introduction and Regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Course logistics
Instructor: Ryan Tibshirani, TAs: Robert Lunde, Sonia Tardova
See course website: http://www.stat.cmu.edu/~ryantibs/advmethods/ for syllabus, oc
Degrees of Freedom
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Degrees of freedom
1.1
Motivation
So far we've seen several methods for estimating the underlying regression function
$r(x) = E(Y \mid X = x)$ (linear regression, k-nearest-neig
The Truth About Linear Regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Linear regression review
1.1
Model basics and assumptions
Recall our model building block from last time:
$$Y = r(X1) + \epsilon,$$
where $E(\epsilon) = 0$ and $\epsilon$ is independent o
The Bootstrap
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
The bootstrap
1.1
Basic idea
The bootstrap is one of the most general and the most widely used tools to estimate measures
of uncertainty associated with a given statistical me
Smoothing Splines
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Splines, regression splines
1.1
Splines
Smoothing splines, like kernel regression and k-nearest-neighbors regression, provide a flexible
way of estimating the underlying regr
Additive Models
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Nonparametric smoothing in multiple dimensions
1.1
Nonparametric review in one dimension
Assume for now that $X \in \mathbb{R}$. A model of the form
$$Y = r(X) + \epsilon,$$
where we don't make any as
Other Dimension Reduction Techniques
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Classical multidimensional scaling
1.1
PCA and SVD
Recall that last time we learned principal component analysis (PCA) applied to a data matrix
$X \in \mathbb{R}^{n \times p}$ wi
Clustering
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
Introduction to clustering
Clustering is the task of dividing up data points into groups or clusters, so that points in any
one group are more similar to each other than to points
Midterm Exam 1
Advanced Methods for Data Analysis (36-402/36-608)
Due Thursday March 6, 2014 at 11:59pm
Instructions: you will submit this take-home midterm exam in three parts.
1. Writeup. This will be a complete writeup, in full data analysis report for
Logistic Regression
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Classification
1.1
Introduction to classification
Classification, like regression, is a predictive task, but one in which the outcome takes only
values across discrete categor
Generalized Linear Models
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Generalized linear models
1.1
Introduction: two regressions
So far we've seen two canonical settings for regression. Let $X \in \mathbb{R}^p$ be a vector of predictors.
In linear r
Principal Component Analysis
Advanced Methods for Data Analysis (36-402/36-608)
Spring 2014
1
Unsupervised learning
1.1
Supervised versus unsupervised
Up until this point, we've been working in a setting in which we've been given pairs $(x_i, y_i)$,
i = 1, .