Cross validation and Bootstrap
April 15, 2015
Cross-validation and the Bootstrap
We discuss two resampling methods: cross-validation and the
bootstrap.
These methods refit a model of interest to samples formed fr
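As a rough illustration of the two resampling schemes named above, the sketch below refits a deliberately trivial model (the sample mean) on resampled data. The function names, the choice of model, and the use of squared error are assumptions made for the example, not taken from the slides.

```python
import random

def mean(values):
    return sum(values) / len(values)

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse_of_mean(y, k=5, seed=0):
    """k-fold cross-validated squared error of the 'predict the mean' model (k >= 2)."""
    errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        fit = mean(train)  # refit the model without the held-out fold
        errors.extend((y[i] - fit) ** 2 for i in held_out)
    return mean(errors)

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, with replacement."""
    return [rng.choice(data) for _ in data]

def bootstrap_se_of_mean(y, n_boot=200, seed=0):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    stats = [mean(bootstrap_sample(y, rng)) for _ in range(n_boot)]
    centre = mean(stats)
    return (sum((s - centre) ** 2 for s in stats) / (n_boot - 1)) ** 0.5
```

Cross-validation holds out each fold once and scores the refitted model on it; the bootstrap instead resamples the whole data set with replacement and looks at how the refitted statistic varies.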

Introduction
March 30, 2015
People
Instructor: Randy Lai (aka Chu Shing Lai)
TA: Chris Aden & Chunzhe Zhang
Lectures: MW 10:00-11:30am
Office hours: TBA
Course Description
Focus on linear and nonlinear statist

GLMNET and Principal Component Analysis
April 20, 2015
The limitations of the lasso
If p > n, the lasso selects at most n variables. The number of
selected predictors is bounded by the number of samples.
Groupe

K-Means
May 18, 2015
From K-means to hierarchical clustering
Recall two properties of K-means (K-medoids) clustering:
It fits exactly K clusters (as specified)
Final clustering assignment depends on the chosen ini
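The two properties above can be seen in a minimal K-means sketch. This is a plain Lloyd-style iteration written for illustration; the function names, the random choice of initial centers from the data, and the restart helper are assumptions, not the course's code.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers) for exactly k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k data points, chosen without replacement
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

def within_ss(points, labels, centers):
    """Total within-cluster sum of squares."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def best_of_restarts(points, k, seeds):
    """Run K-means once per seed and keep the run with the smallest within-cluster SS."""
    runs = [kmeans(points, k, seed=s) for s in seeds]
    return min(runs, key=lambda run: within_ss(points, *run))
```

Because the result depends on the seed through the initial centers, a common workaround is to run several restarts and keep the best one, which is what `best_of_restarts` does.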

Statistical Learning
April 1, 2015
Admin Work
office hours
some bad news and some good news
Motivating Example
income dataset

K-Means
May 18, 2015
What is clustering? And why?
Clustering: task of dividing up data into groups (clusters), so that
points in any one group are more similar to each other than to
points outside the group
Wh

Classification I
April 8, 2015
Classification
Classification is a predictive task in which the response takes values
across discrete categories (i.e., not continuous), and in the most
fundamental case, two catego

Support vector machine
May 13, 2015
Support Vector Machines
Here we approach the two-class classification problem in a direct way:
We try to find a plane that separates the classes in feature space.
If we cannot,

Boosting
May 11, 2015
Reminder: classification trees
Suppose that we are given training data $(x_i, y_i)$, $i = 1, \ldots, n$, with
$y_i \in \{1, \ldots, K\}$ the class label and $x_i \in \mathbb{R}^p$ the associated features
Recall that th

Linear regression
April 6, 2015
Linear regression
Linear regression, also called the method of least squares, is an old
topic, dating back to Gauss in 1795 (he was 18!).
Linear regression is a simple approach t
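For concreteness, here is a minimal least-squares sketch for the one-predictor case. The function name and the restriction to a single predictor are assumptions made to keep the example short.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) minimising the sum of squared residuals of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx              # requires at least two distinct x values
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Points lying exactly on y = 1 + 2x.
intercept, slope = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
```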

Bagging and Random Forest
May 6, 2015
Review: CART
The fully grown tree can overfit
Pruning can be applied to reduce the complexity of the final classifier as
well as improve predictive accuracy
Cross validation i

Trees
April 29, 2015
Tree-based methods
Tree-based methods for predicting y from a feature vector
$x \in \mathbb{R}^p$ divide up the feature space into rectangles, and then fit a
very simple model in each rectangle. This w
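The simplest instance of this idea is a tree with a single split, which cuts the feature space into two rectangles and predicts a constant in each. The sketch below assumes a numeric response, squared-error loss, and the mean as the "very simple model"; the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def fit_stump(X, y):
    """Find the axis-aligned split (feature j, threshold t) with the smallest
    residual sum of squares, predicting the mean of y on each side."""
    best = None
    p = len(X[0])
    for j in range(p):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            m_left, m_right = mean(left), mean(right)
            rss = (sum((v - m_left) ** 2 for v in left)
                   + sum((v - m_right) ** 2 for v in right))
            if best is None or rss < best[0]:
                best = (rss, j, t, m_left, m_right)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]  # (j, t, mean on the left, mean on the right)

def predict_stump(stump, x):
    """Predict the constant attached to the rectangle that x falls in."""
    j, t, m_left, m_right = stump
    return m_left if x[j] <= t else m_right
```

A full tree repeats this search inside each resulting rectangle, so every leaf ends up as a rectangle with its own constant prediction.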

Kernel smoothing and GAM
April 29, 2015
Review: Smoothing splines
The smoothing spline minimizes
$$\sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$$
The solution is a natural cubic spline, with a knot at every unique
value of $x_i$. The

PCA and Splines
April 20, 2015
Dimension Reduction
Let $Z_1, Z_2, \ldots, Z_M$ represent $M < p$ linear combinations of our
original $p$ predictors. That is,
$$Z_m = \sum_{j=1}^{p} \phi_{mj} X_j$$
for some constants $\phi_{m1}, \ldots, \phi_{mp}$.
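The definition above amounts to a small matrix product. As a sketch, with the constants stored in a list of M rows called `phi` (the name and the list-of-lists layout are assumptions for this example):

```python
def linear_combinations(X, phi):
    """Map each observation (a row of p predictors) to M new features,
    where feature m is the sum over j of phi[m][j] * row[j]."""
    p = len(X[0])
    if any(len(phi_m) != p for phi_m in phi):
        raise ValueError("each row of phi needs one constant per predictor")
    return [[sum(w * x for w, x in zip(phi_m, row)) for phi_m in phi]
            for row in X]

# Two observations with p = 3 predictors, reduced to M = 2 features.
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
phi = [[1.0, 0.0, 0.0],   # Z_1 copies the first predictor
       [0.5, 0.5, 0.0]]   # Z_2 averages the first two predictors
Z = linear_combinations(X, phi)
```

How the constants are chosen is what distinguishes one dimension-reduction method from another; this sketch only applies constants that are already given.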
We can

Classification II
April 13, 2015
Types of errors
False positive rate: The fraction of negative samples that are
classified as positive
False negative rate: The fraction of positive samples that are
classified as ne
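Both rates can be computed directly from true and predicted labels. The sketch below assumes binary labels with `1` as the positive class; the function name and the `positive` parameter are hypothetical.

```python
def error_rates(y_true, y_pred, positive=1):
    """Return (false positive rate, false negative rate)."""
    n_neg = sum(1 for t in y_true if t != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    if n_neg == 0 or n_pos == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Negatives that were classified as positive.
    false_pos = sum(1 for t, p in zip(y_true, y_pred)
                    if t != positive and p == positive)
    # Positives that were classified as negative.
    false_neg = sum(1 for t, p in zip(y_true, y_pred)
                    if t == positive and p != positive)
    return false_pos / n_neg, false_neg / n_pos

# 3 positives (one missed) and 5 negatives (one flagged by mistake).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
fpr, fnr = error_rates(y_true, y_pred)
```

Note that the two rates use different denominators: the false positive rate divides by the number of true negatives in the data, the false negative rate by the number of true positives.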

Model Selection
April 20, 2015
Reminder
Recall that we talked about:
1. Predictive ability: recall that we can decompose prediction error into
squared bias and variance. Linear regression has low bias (zero bia