Statistical Methods in Machine Learning
Cross validation and Bootstrap
April 15, 2015
Cross-validation and the Bootstrap
We discuss two resampling methods: cross-validation and the
bootstrap.
These methods refit a model of interest to samples formed from the training set.
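Here is a minimal NumPy sketch of both resampling ideas. It assumes a least-squares straight line as the "model of interest" and uses simulated data; the helper names are hypothetical and none of this comes from the slides. The bootstrap refits the line on rows drawn with replacement to estimate the standard error of the slope. K-fold cross-validation refits on K-1 folds and scores each fit on the held-out fold to estimate test MSE.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares (intercept, slope) for the model y ~ x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def bootstrap_se_slope(x, y, n_boot=1000, seed=0):
    """Bootstrap SE of the slope: refit on n rows drawn with replacement, n_boot times."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample row indices with replacement
        slopes[b] = fit_line(x[idx], y[idx])[1]
    return slopes.std(ddof=1)

def kfold_cv_mse(x, y, k=5, seed=0):
    """K-fold CV estimate of test MSE: fit on K-1 folds, score on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    fold_mse = []
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False
        b0, b1 = fit_line(x[train], y[train])
        fold_mse.append(np.mean((y[test_idx] - (b0 + b1 * x[test_idx])) ** 2))
    return np.mean(fold_mse)

# Simulated data (an assumption): y = 2 + 0.5 x + noise with standard deviation 1.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)
print(bootstrap_se_slope(x, y), kfold_cv_mse(x, y))
```

With this setup, the bootstrap standard error should land near the usual formula $\sigma / \sqrt{\sum_i (x_i - \bar x)^2}$, and the CV estimate should land near the noise variance of 1.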

Statistical Methods in Machine Learning
Introduction
March 30, 2015
People
Instructor: Randy Lai (aka Chu Shing Lai)
TAs: Chris Aden & Chunzhe Zhang
Lectures: MW 10:00-11:30am
Office hours: TBA
Course Description
Focus on linear and nonlinear statist

Statistical Methods in Machine Learning
GLMNET and Principal Component Analysis
April 20, 2015
The limitations of the lasso
If p > n, the lasso selects at most n variables. The number of
selected predictors is bounded by the number of samples.
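To see the bound concretely, here is a minimal cyclic coordinate-descent lasso in NumPy. It is a sketch, not the glmnet implementation. It assumes the objective $\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ with no intercept, and it uses simulated data in which 30 coefficients are truly nonzero but only n = 20 samples are available. The helper names `soft_threshold` and `lasso_cd` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, setting it exactly to zero if |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=300):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)        # residual y - X @ beta, with beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]      # partial residual that ignores predictor j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # restore full residual with the new beta[j]
    return beta

# Simulated p > n setting (an assumption): 30 truly active predictors, only 20 samples.
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:30] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.5)
n_selected = np.count_nonzero(beta_hat)
print(f"{n_selected} of {p} predictors selected with n = {n}")
```

Even though 30 predictors are truly active, the fitted lasso keeps at most n of them. For data in general position the lasso solution has at most n nonzero coefficients.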
Groupe

Statistical Methods in Machine Learning
K-Means
May 18, 2015
From K-means to hierarchical clustering
Recall two properties of K-means (K-medoids) clustering:
It fits exactly K clusters (as specified)
Final clustering assignment depends on the chosen initial cluster centers
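Both properties show up in a small NumPy sketch of Lloyd's algorithm, the standard iterative method for K-means. The helper names and the four-point toy dataset are assumptions. K is fixed by the number of starting centers the caller supplies, and two different starts on the same data end in different final clusterings.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, init_centers, n_iter=100):
    """Lloyd's algorithm for K-means, where K = len(init_centers)."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Move each center to the mean of its points (leave it in place if it has none).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = assign(X, centers)
    wss = ((X - centers[labels]) ** 2).sum()   # within-cluster sum of squares
    return labels, centers, wss

# Four points at the corners of a 4 x 1 rectangle, K = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
labels_a, _, wss_a = kmeans(X, X[[0, 2]])   # start from the two bottom corners
labels_b, _, wss_b = kmeans(X, X[[0, 1]])   # start from the two left corners
print(labels_a, wss_a)   # [0 0 1 1] 1.0
print(labels_b, wss_b)   # [0 1 0 1] 16.0
```

The second start gets stuck in a clearly worse local optimum. This is why K-means is usually run from several initializations, keeping the run with the smallest within-cluster sum of squares.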

Statistical Methods in Machine Learning
Statistical Learning
April 1, 2015
Admin Work
office hours
some bad news and some good news
Motivating Example
income dataset
[Figure: Income plotted against Years of Education and Seniority]
Can we predict Income using these two variables?
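One way to make this question concrete is to predict Income at a new point by averaging the incomes of the closest training points. This is K-nearest-neighbour regression, used here only as an illustration. The data below are simulated placeholders, not the actual Income dataset, and the "true" income surface and the helper name `knn_predict` are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5):
    """Average the responses of the k training points nearest to x_new."""
    dist = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    return y_train[nearest].mean()

# Simulated stand-in for the Income data (NOT the real dataset):
# column 0 = years of education, column 1 = seniority; the income surface is made up.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(10, 22, 200), rng.uniform(20, 180, 200)])
y = 20 + 4 * (X[:, 0] - 10) + 0.1 * X[:, 1] + rng.normal(0, 5, 200)

# Standardize so that both predictors contribute comparably to the distance.
mu, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sd
x0 = (np.array([16.0, 100.0]) - mu) / sd
print(knn_predict(Z, y, x0, k=10))   # estimated income at 16 years of education, seniority 100
```

Standardizing matters here. Without it, seniority (range about 160) would dominate years of education (range about 12) in the distance.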

Statistical Methods in Machine Learning
K-Means
May 18, 2015
What is clustering? And why?
Clustering: task of dividing up data into groups (clusters), so that
points in any one group are more similar to each other than to
points outside the group
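The definition can be checked numerically. For a good grouping, the average distance between points in the same group should be smaller than the average distance between points in different groups. The following NumPy sketch makes this comparison, assuming Euclidean distance, a hand-made toy dataset and a hypothetical helper name.

```python
import numpy as np

def within_between(X, labels):
    """Mean pairwise Euclidean distance for same-group pairs and for cross-group pairs."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)       # ignore each point's distance to itself
    return d[same & off_diag].mean(), d[~same].mean()

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
good = np.array([0, 0, 1, 1])   # groups the two nearby pairs together
bad = np.array([0, 1, 0, 1])    # pairs each point with a far-away point
print(within_between(X, good))
print(within_between(X, bad))
```

For `good`, the within-group mean distance is 1 and the between-group mean is about 5. For `bad`, the ordering flips, so it fails the definition.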
Wh

Statistical Methods in Machine Learning
Classification I
April 8, 2015
Classification
Classification is a predictive task in which the response takes values
across discrete categories (i.e., not continuous), and in the most
fundamental case, two categories.
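Because the response is a category, a classifier outputs a label rather than a number. It is usually scored by its misclassification rate rather than squared error. The sketch below illustrates this with a nearest-centroid rule, an illustrative choice rather than a method from the slides, on a tiny hand-made two-class dataset. All names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Store the mean feature vector of each class label."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(classes, centroids, X):
    """Assign each row of X to the class whose centroid is nearest."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

def error_rate(y_true, y_pred):
    """Fraction of observations assigned to the wrong category."""
    return np.mean(y_true != y_pred)

# Toy two-class data (an assumption): labels are categories, not numbers.
X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
y = np.array(["no", "no", "no", "yes", "yes", "yes"])
classes, centroids = fit_centroids(X, y)
print(error_rate(y, predict(classes, centroids, X)))
```
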

Statistical Methods in Machine Learning
Support vector machine
May 13, 2015
Support Vector Machines
Here we approach the two-class classification problem in a direct way:
we try to find a plane that separates the classes in feature space.
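A plane separates the classes when every training point with label $+1$ lies on one side of $\beta_0 + \beta^\top x = 0$ and every point with label $-1$ lies on the other. The classic perceptron update finds *some* such plane whenever one exists. Note that this is a simpler stand-in, not the support vector classifier, which picks the maximal-margin plane. The toy data and function name are assumptions.

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Find (b0, beta) with y_i * (b0 + beta @ x_i) > 0 for all i; labels are -1/+1."""
    b0, beta = 0.0, np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (b0 + beta @ xi) <= 0:   # point on the wrong side of (or on) the plane
                beta += yi * xi
                b0 += yi
                mistakes += 1
        if mistakes == 0:                     # every point is strictly on its own side
            break
    return b0, beta

# Linearly separable toy data (an assumption).
X = np.array([[-2.0, -2.0], [-1.0, -2.5], [-2.5, -1.0], [2.0, 2.0], [1.0, 2.5], [2.5, 1.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
b0, beta = perceptron(X, y)
print(b0, beta, np.sign(b0 + X @ beta))
```

If the classes are not separable, the loop simply stops after `max_epochs` without a valid plane. This failure case is what motivates the extensions that follow.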
If we cannot,

Statistical Methods in Machine Learning
Boosting
May 11, 2015
Reminder: classification trees
Suppose that we are given training data $(x_i, y_i)$, $i = 1, \ldots, n$, with
$y_i \in \{1, \ldots, K\}$ the class label and $x_i \in \mathbb{R}^p$ the associated features.
Recall that th

Statistical Methods in Machine Learning
Linear regression
April 6, 2015
Linear regression
Linear regression, also called the method of least squares, is an old
topic, dating back to Gauss in 1795 (he was 18!).
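Least squares picks the coefficients that minimize the residual sum of squares. For a design matrix $X$ that includes an intercept column, the minimizer solves the normal equations $X^\top X \hat\beta = X^\top y$. The NumPy sketch below fits a line to simulated data. The true intercept 3 and slope 2 are made up so the estimates can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=n)   # hypothetical truth: intercept 3, slope 2

X = np.column_stack([np.ones(n), x])              # add an intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||y - X b||^2
print(beta_hat)                                   # estimates near [3, 2]

# Same answer from the normal equations (fine for small, well-conditioned problems).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_hat, beta_normal))
```

`lstsq` is generally preferred over forming $X^\top X$ explicitly, because squaring the design matrix squares its condition number.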
Linear regression is a simple approach t

Statistical Methods in Machine Learning
Bagging and Random Forest
May 6, 2015
Review: CART
A fully grown tree can overfit.
Pruning can be applied to reduce the complexity of the final classifier as
well as to improve its predictive accuracy.
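A common way to make pruning precise is cost-complexity pruning. For a penalty $\alpha \ge 0$, it keeps the subtree $T$ that minimizes $R(T) + \alpha |T|$, where $R(T)$ is the training error and $|T|$ is the number of leaves. The value of $\alpha$ is then chosen by cross-validation. The Python sketch below shows only the selection step over a hand-made list of nested subtrees. The error values are hypothetical, not taken from a fitted tree.

```python
def best_subtree(subtrees, alpha):
    """Return the (n_leaves, train_error) pair minimizing train_error + alpha * n_leaves."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[0])

# Hypothetical nested subtrees of one fully grown tree: (number of leaves, training error).
subtrees = [(1, 40.0), (2, 25.0), (4, 15.0), (8, 11.0), (16, 10.0)]

for alpha in [0.0, 0.5, 2.0, 8.0, 20.0]:
    leaves, err = best_subtree(subtrees, alpha)
    print(f"alpha={alpha:5.1f} -> {leaves} leaves (training error {err})")
```

At $\alpha = 0$ the full tree wins. As $\alpha$ grows, the penalty on leaves favours smaller and smaller subtrees, down to the single-leaf root.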
Cross validation i

Statistical Methods in Machine Learning
Trees
April 29, 2015
Tree-based methods
Tree-based methods for predicting $y$ from a feature vector
$x \in \mathbb{R}^p$ divide up the feature space into rectangles, and then fit a
very simple model in each rectangle. This w
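The simplest case is a single split. It cuts the feature space into two rectangles $\{x_j \le s\}$ and $\{x_j > s\}$ and predicts the mean response in each. The NumPy sketch below searches every feature and cut point for the split with the smallest residual sum of squares. The function name and toy data are assumptions. A full regression tree would repeat this step inside each resulting rectangle.

```python
import numpy as np

def best_split(X, y):
    """Return (feature j, threshold s, RSS) for the best split {x_j <= s} vs {x_j > s}."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for s in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # Each rectangle predicts its mean, so RSS is the within-rectangle spread.
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best[2]:
                best = (j, s, rss)
    return best

X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [6.0, 1.0], [7.0, 9.0], [8.0, 4.0]])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])
j, s, rss = best_split(X, y)
print(j, s, rss)   # expected: feature 0, threshold 4.5
```
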

Statistical Methods in Machine Learning
Kernel smoothing and GAM
April 29, 2015
Review: Smoothing splines
The smoothing spline minimizes
$$\sum_{i=1}^{n} \big(y_i - g(x_i)\big)^2 + \lambda \int g''(t)^2 \, dt,$$
where $\lambda \ge 0$ is a tuning parameter.
The solution is a natural cubic spline, with a knot at every unique
value of $x_i$. The
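Computing the exact natural-cubic-spline solution takes some bookkeeping, but the role of $\lambda$ is easy to see in a discrete analogue. This is a Whittaker-type penalized smoother, not the smoothing spline itself. For equally spaced $x_i$, replace $\int g''(t)^2\,dt$ by the sum of squared second differences of the fitted values $g = (g_1, \ldots, g_n)$ and minimize $\lVert y - g\rVert^2 + \lambda \lVert D g\rVert^2$. Setting the gradient to zero gives $g = (I + \lambda D^\top D)^{-1} y$. The helper name and simulated data are assumptions.

```python
import numpy as np

def penalized_smoother(y, lam):
    """Minimize ||y - g||^2 + lam * ||D g||^2 with D the second-difference operator."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)           # (n-2) x n second differences
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)                          # equally spaced design points
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=50)

g_rough = penalized_smoother(y, lam=1e-6)   # lambda near 0: nearly interpolates y
g_smooth = penalized_smoother(y, lam=1e8)   # huge lambda: second differences ~0, a straight line
```

The two limits mirror the smoothing spline. As $\lambda \to 0$ the fit interpolates the data, and as $\lambda \to \infty$ it tends to the least-squares line, because linear functions have zero second derivative.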

Statistical Methods in Machine Learning
PCA and Splines
April 20, 2015
Dimension Reduction
Let $Z_1, Z_2, \ldots, Z_M$ represent $M < p$ linear combinations of our
original $p$ predictors. That is,
$$Z_m = \sum_{j=1}^{p} \phi_{mj} X_j$$
for some constants $\phi_{m1}, \ldots, \phi_{mp}$.
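One standard choice of the constants $\phi_{mj}$ is the principal component loadings. The NumPy sketch below computes them from the SVD of the centered predictor matrix and then forms $Z_m = \sum_j \phi_{mj} X_j$ for $M = 2$. The simulated correlated predictors and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, M = 100, 5, 2
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # simulated correlated predictors
Xc = X - X.mean(axis=0)                                   # center each predictor

# Rows of Vt are the principal component loading vectors (phi_m1, ..., phi_mp).
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
phi = Vt[:M]                          # M x p matrix of constants phi_mj
Z = Xc @ phi.T                        # n x M: column m is Z_m = sum_j phi_mj X_j

print(Z.shape)                        # (100, 2)
print(np.allclose(phi @ phi.T, np.eye(M)))   # loading vectors are orthonormal
```

The resulting $Z_1, \ldots, Z_M$ can then replace the $p$ original predictors in a least-squares fit, reducing the number of coefficients to estimate from $p$ to $M$.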
We can

Statistical Methods in Machine Learning
Classification II
April 13, 2015
Types of errors
False positive rate: The fraction of negative samples that are
classified as positive
False negative rate: The fraction of positive samples that are
classified as negative
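Both rates follow directly from true and predicted labels. The NumPy sketch below assumes 0/1 labels with 1 meaning "positive"; the helper name and the label vectors are made up for illustration.

```python
import numpy as np

def error_rates(y_true, y_pred):
    """False positive and false negative rates for 0/1 labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    negatives, positives = y_true == 0, y_true == 1
    fpr = np.mean(y_pred[negatives] == 1)   # negatives classified as positive
    fnr = np.mean(y_pred[positives] == 0)   # positives classified as negative
    return float(fpr), float(fnr)

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1, 1, 0]
print(error_rates(y_true, y_pred))   # (0.2, 0.4)
```

Here 1 of the 5 negatives is flagged as positive, and 2 of the 5 positives are missed. Moving a classifier's decision threshold trades one rate off against the other.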

Statistical Methods in Machine Learning
Model Selection
April 20, 2015
Reminder
Recall that we talked about:
1. Predictive ability: recall that we can decompose prediction error into
squared bias and variance. Linear regression has low bias (zero bia
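The decomposition can be checked by simulation. The idea is to refit least squares on many independent training sets, record the prediction $\hat f(x_0)$ at a fixed point, and compare its mean squared error with squared bias plus variance. The linear true function, noise level, and sample sizes below are assumptions. Because the truth is linear here, the estimated bias should be close to zero. The identity $\text{MSE} = \text{bias}^2 + \text{variance}$ for $\hat f(x_0)$ holds exactly for the simulated values when the variance uses `ddof=0`. The irreducible noise term is not included in this check.

```python
import numpy as np

def f(x):
    """Hypothetical true regression function (linear, so least squares is unbiased)."""
    return 1.0 + 2.0 * x

rng = np.random.default_rng(0)
x0, n, sigma, reps = 1.5, 30, 1.0, 2000

preds = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0, 2, size=n)                # a fresh training set each time
    y = f(x) + rng.normal(0, sigma, size=n)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line (highest degree first)
    preds[r] = intercept + slope * x0            # prediction f_hat(x0)

mse = np.mean((preds - f(x0)) ** 2)
bias_sq = (preds.mean() - f(x0)) ** 2
variance = preds.var()                           # ddof=0 makes the identity exact
print(mse, bias_sq, variance)
```
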