Name:
10-702 Statistical Machine Learning: Midterm Exam
March 4, 2010
Submit solutions to any three of the following five problems. Clearly indicate
which problems you are submitting solutions for. Writ
Practice Problems: 10/36-702
1. Let (X1, Y1), . . . , (Xn, Yn) be iid. Suppose that X1, . . . , Xn ∼ Unif(0, 1) and that
Yi = m(Xi) + εi
where ε1, . . . , εn are iid with mean 0 and variance σ² and a
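A quick simulation of this model may help fix ideas; the regression function m(x) = sin(2πx) and noise level σ = 0.3 are assumptions for illustration only, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# m and sigma are not specified by the problem; assumed here for illustration.
m = lambda x: np.sin(2 * np.pi * x)
sigma = 0.3

X = rng.uniform(0, 1, n)          # X_1, ..., X_n ~ Unif(0, 1)
eps = rng.normal(0, sigma, n)     # iid errors with mean 0 and variance sigma^2
Y = m(X) + eps                    # Y_i = m(X_i) + eps_i
```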
Random Matrix Theory
These notes are based on the following sources:
1. Introduction to the Non-asymptotic Analysis of Random Matrices by Roman Vershynin.
2. Error Bounds for Random Matrix Approximation
Linear Regression
We observe D = {(X1, Y1), . . . , (Xn, Yn)} where Xi = (Xi(1), . . . , Xi(d)) ∈ Rd and Yi ∈ R.
For notational simplicity, we will always assume that Xi(1) = 1.
Given a new pair (
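A minimal least-squares sketch under the convention Xi(1) = 1, so the first coordinate of the estimate plays the role of an intercept; the coefficients and noise level are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 3
# First column identically 1, matching the convention Xi(1) = 1.
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d - 1))])
beta_true = np.array([2.0, -1.0, 0.5])          # assumed coefficients
Y = X @ beta_true + 0.1 * rng.normal(size=n)    # assumed noise level

# Least squares: beta_hat minimizes ||Y - X beta||^2; beta_hat[0] is
# the intercept because of the leading column of ones.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```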
Nonparametric Classification
10/36-702
1
Introduction
Let h : X → {0, 1} denote a classifier, where X is the domain of X. In parametric classification we assumed that h took a very constrained form,
Concentration of Measure
1
Introduction
Often we need to show that a random quantity f(Z1, . . . , Zn) is close to its mean
μ(f) = E(f(Z1, . . . , Zn)). That is, we want a result of the form
P(|f(Z1, . . . , Zn) − μ(f)| > ε) ≤ small.
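A quick numerical illustration for the simplest case, the sample mean of bounded variables, compared against Hoeffding's bound 2 exp(−2nε²); all constants here are assumed for the demo:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, eps = 200, 5000, 0.1
Z = rng.uniform(0, 1, size=(reps, n))    # bounded variables, Z_i in [0, 1]
f = Z.mean(axis=1)                       # f(Z_1, ..., Z_n) = sample mean
mu = 0.5                                 # E(f)

empirical = float(np.mean(np.abs(f - mu) > eps))
hoeffding = 2 * np.exp(-2 * n * eps**2)  # Hoeffding's inequality for the mean
```

The empirical deviation frequency sits (far) below the bound, as the theory promises.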
10/36-702: Minimax Theory
1
Introduction
When solving a statistical learning problem, there are often many procedures to choose from.
This leads to the following question: how can we tell if one stati
Bayes versus Frequentist
This lecture combines three blog posts that I wrote on this topic.
1
Adventures in FlatLand: Stone's Paradox
Mervyn Stone is Emeritus Professor at University College London. He
Modeling Basics: Assessment, Selection, and Complexity
Statistical Machine Learning, Spring 2015
Ryan Tibshirani (with Larry Wasserman)
1
You (should) already know this stuff: statistical prediction
and
Nonparametric Bayesian Methods
1
What is Nonparametric Bayes?
In parametric Bayesian inference we have a model M = {f(y|θ) : θ ∈ Θ} and data
Y1, . . . , Yn ∼ f(y|θ). We put a prior distribution π(θ) on the
Practice Midterm
10/36-702
For Review Session on Monday Mar 2
(1) Let X1, . . . , Xn ∼ Unif(0, 1). Compute the bias and variance of the histogram density
estimator with binwidth h for this distribution.
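A minimal sketch of the histogram estimator on this distribution; the binwidth h = 0.1 and sample size are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(3)
n, h = 2000, 0.1
m = int(1 / h)                    # number of bins partitioning [0, 1]
X = rng.uniform(0, 1, n)

counts, _ = np.histogram(X, bins=m, range=(0.0, 1.0))
p_hat = counts / (n * h)          # estimated density, constant on each bin
# For Unif(0,1) the true density is p = 1 on [0, 1], so the estimator is
# unbiased in every bin and only the variance term remains.
```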
10/36-702 Review
These are things you should know from 36-705 and 10-715.
1
Probability
Xn →P 0 means that, for every ε > 0, P(|Xn| > ε) → 0 as n → ∞. Xn ⇝ Z means
that P(Xn ≤ z) → P(Z ≤ z) at all co
Undirected Graphical Models
10/36-702
Graphical models are a way of representing the relationships between features (variables).
There are two main brands: directed and undirected. We shall focus on u
Density Estimation
10/36-702 Spring 2015
1
Introduction
Let X1, . . . , Xn be a sample from a distribution P with density p. The goal of nonparametric
density estimation is to estimate p with as fe
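A minimal kernel density estimate sketch with a Gaussian kernel; the sampling distribution N(0, 1) and the bandwidth h = 0.2 are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, h = 1000, 0.2                  # assumed sample size and bandwidth
X = rng.normal(0, 1, n)           # assumed P = N(0, 1)

def kde(x, data, h):
    """Gaussian-kernel estimate p_hat(x) = (1/(n h)) sum_i K((x - X_i)/h)."""
    u = (x[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h

grid = np.linspace(-3, 3, 61)
p_hat = kde(grid, X, h)
```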
Nonparametric Regression
Statistical Machine Learning, Spring 2015
Ryan Tibshirani (with Larry Wasserman)
1
Introduction, and k-nearest-neighbors
1.1
Basic setup, random inputs
Given a random pair (X
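As a minimal illustration of k-nearest-neighbors regression (the heading's topic), a sketch on assumed one-dimensional data with m(x) = x²; all constants are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 500, 20
X = rng.uniform(0, 1, n)                 # assumed 1-d inputs
Y = X**2 + 0.1 * rng.normal(size=n)      # assumed m(x) = x^2 plus noise

def knn_predict(x0, X, Y, k):
    """Average the responses of the k training inputs nearest to x0."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return float(Y[idx].mean())

y_hat = knn_predict(0.5, X, Y, k)        # estimates m(0.5) = 0.25
```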
Sparsity and the Lasso
Statistical Machine Learning, Spring 2014
Ryan Tibshirani (with Larry Wasserman)
1
Regularization and the lasso
1.1
A bit of background
If ℓ2 was the norm of the 20th century, t
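The lasso minimizes (1/2)‖y − Xβ‖²₂ + λ‖β‖₁. A minimal proximal-gradient (ISTA) sketch; the data, sparse truth, and λ are all assumed for illustration:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t * ||.||_1: shrink each coordinate toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient descent for (1/2)||y - X b||^2 + lam * ||b||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b

rng = np.random.default_rng(6)
n, d = 100, 10
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:2] = [3.0, -2.0]                   # sparse truth, assumed for the demo
y = X @ beta + 0.1 * rng.normal(size=n)
b_hat = lasso_ista(X, y, lam=1.0)
```

The ℓ1 penalty enters only through the soft-thresholding step, which is what drives the small coefficients to (near) zero.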
Clustering
10/36-702 Spring 2014
1
The Clustering Problem
In a clustering problem we aim to find groups in the data. Unlike classification, the data are
not labeled, and so clustering is considered an exa
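A sketch of one standard clustering method, Lloyd's k-means algorithm, run on assumed toy data (two well-separated blobs):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(7)
# Two well-separated blobs (assumed toy data): 50 points near 0, 50 near 3.
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centers = kmeans(X, 2)
```

Since the data are unlabeled, the algorithm only recovers the partition; which blob is called 0 and which 1 is arbitrary.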
Nonparametric Regression
10/36-702
Larry Wasserman
1
Introduction
Now we focus on the following problem: Given a sample (X1, Y1), . . . , (Xn, Yn), where
Xi ∈ Rd and Yi ∈ R, estimate the regression fun
Homework 3 Solution
1. (a)
E(θ̂_j − θ_j) = E[ (1/n) Σ_{i=1}^n ( m(x_i)φ_j(x_i) + ε_i φ_j(x_i) ) ] − θ_j
            = (1/n) Σ_{i=1}^n m(x_i)φ_j(x_i) − ∫_0^1 m(x)φ_j(x) dx,
where we used linearity of expectation and the fact that ε_i is the only
10702 Homework 2 Solution
Thanks to Akshay Krishnamurthy for providing his solution.
1
Convexity and Optimization
1. (Convexity)
(a) We'll show that the second derivative of 1/g(x) is always positive,
36-702 Homework 1 Solution
Thanks to William Bishop and Rafael Stern for providing their solutions.
Problem 1
(a) Let n(j) = Σ_i I_{j}(x_i). Then
L(θ) = θ^{n(1)} (θ/2)^{n(2)} (θ/3)^{n(3)} ((6 − 11θ)/6)^{n(4)} ∝ θ^{n(1)+n(2)+n(3)} (6 − 11θ)^{n(4)}
10/36-702 Homework 2
Additional hints for problem 4
Yifei Ma
4 Let H be a Hilbert space of functions. Suppose that the evaluation functionals δx(f) =
f(x) are continuous. Show that H is a reproducing k
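One standard route runs through the Riesz representation theorem; a sketch:

```latex
% Each delta_x is a continuous linear functional on H, so by the Riesz
% representation theorem there is a unique K_x in H with
\delta_x(f) = f(x) = \langle f, K_x \rangle_{\mathcal{H}}
  \quad \text{for all } f \in \mathcal{H}.
% Define the candidate kernel
K(x, y) := K_x(y) = \langle K_x, K_y \rangle_{\mathcal{H}}.
% Then K is symmetric and positive semidefinite, and
f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}
% is exactly the reproducing property, so H is an RKHS with kernel K.
```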
10-702/36-702
Midterm Exam Solutions
March 2 2011
There are five questions. You only need to do three. Circle the three questions
you want to be graded:
1   2   3   4   5
Name:
Problem 1: Let X1 , . . . , Xn
Lecture Notes 6
1
The Likelihood Function
Definition. Let X^n = (X1, . . . , Xn) have joint density p(x^n; θ) = p(x1, . . . , xn; θ) where
θ ∈ Θ. The likelihood function L : Θ → [0, ∞) is defined by
L(θ) ≡ L(θ; x
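A small numerical illustration, assuming a Bernoulli(θ) model so that L(θ) = Π_i θ^{x_i}(1 − θ)^{1−x_i}; the true parameter 0.7 and the grid are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(8)
# Assumed example: X_1, ..., X_n iid Bernoulli(theta).
x = rng.binomial(1, 0.7, size=50)

def likelihood(theta, x):
    """L(theta) = prod_i theta^{x_i} (1 - theta)^{1 - x_i}."""
    return float(np.prod(theta**x * (1.0 - theta)**(1 - x)))

grid = np.linspace(0.01, 0.99, 99)
L = np.array([likelihood(t, x) for t in grid])
theta_max = grid[L.argmax()]      # the maximizer is the MLE, here x-bar
```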
10-702/36-702
Statistical Machine Learning
Syllabus, Spring 2016
http://www.stat.cmu.edu/~larry/=sml
Lectures: Tuesday and Thursday 1:30 - 2:50 pm (HH B103)
Statistical Machine Learning is a second grad
Lecture Notes 8
1
Minimax Theory
Suppose we want to estimate a parameter θ using data X^n = (X1, . . . , Xn). What is the
best possible estimator θ̂ = θ̂(X1, . . . , Xn) of θ? Minimax theory provides a framework for
Lecture Notes 16
Model Selection
Not in the text except for a brief mention in 13.6.
1
Introduction
Sometimes we have a set of possible models and we want to choose the best model. Model
selection met
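One widely used model-selection method is cross-validation; a minimal sketch choosing a polynomial degree, with the data-generating model (true degree 2) assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x = rng.uniform(-1, 1, n)
y = 1 + 2 * x - x**2 + 0.2 * rng.normal(size=n)   # true degree is 2 (assumed)

def cv_error(degree, x, y, n_folds=5):
    """5-fold cross-validation MSE of a polynomial fit of the given degree."""
    idx = np.arange(len(x)) % n_folds
    errs = []
    for f in range(n_folds):
        tr, te = idx != f, idx == f
        coef = np.polyfit(x[tr], y[tr], degree)
        errs.append(np.mean((y[te] - np.polyval(coef, x[te]))**2))
    return float(np.mean(errs))

degrees = list(range(1, 8))
scores = [cv_error(d, x, y) for d in degrees]
best = degrees[int(np.argmin(scores))]
```

Underfitting (degree 1) shows up as a clearly larger CV error; among the unbiased degrees ≥ 2 the differences are small, which is why CV sometimes overselects slightly.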