Name:
10-702 Statistical Machine Learning: Midterm Exam
March 4, 2010
Submit solutions to any three of the following five problems. Clearly indicate
which problems you are submitting solutions for. Write your answers in the space provided;
additional sheets
Clustering
10/36-702 Spring 2014
1 The Clustering Problem
In a clustering problem we aim to find groups in the data. Unlike classification, the data are
not labeled, and so clustering is considered an example of unsupervised learning. In many
cases, clusterin
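As a concrete illustration of finding groups without labels, here is a minimal sketch of one standard clustering method, k-means (Lloyd's algorithm), in NumPy. The function name, toy data, and choice of initial centers are my own illustration, not from the notes:

```python
import numpy as np

def kmeans(X, centers, n_iter=50):
    """Lloyd's algorithm: alternate nearest-center assignment
    and centroid update for a fixed number of iterations."""
    centers = centers.astype(float)
    for _ in range(n_iter):
        # squared distance from every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):          # avoid emptying a cluster
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# two well-separated blobs; seed one initial center in each blob
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, X[[0, 99]])
```

With this initialization each blob keeps its own center and the algorithm converges in one pass; in general k-means is sensitive to initialization and is usually restarted several times.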
Sparsity and the Lasso
Statistical Machine Learning, Spring 2014
Ryan Tibshirani (with Larry Wasserman)
1 Regularization and the lasso
1.1
A bit of background
If ℓ2 was the norm of the 20th century, then ℓ1 is the norm of the 21st century. OK, maybe
that
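To see why the ℓ1 norm earns that billing: its proximal operator is soft-thresholding, which sets small coefficients exactly to zero, and this is the source of the lasso's sparsity. A quick sketch (the example numbers are my own); with an orthonormal design the lasso solution is exactly this thresholding of the least-squares coefficients:

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*|.| : argmin_b 0.5*(b - z)**2 + t*abs(b)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# with an orthonormal design, the lasso soft-thresholds the
# least-squares coefficients; entries below t become exactly 0
beta_ls = np.array([2.5, -0.3, 0.0, 1.1])
beta_lasso = soft_threshold(beta_ls, 0.5)   # -> [2.0, 0.0, 0.0, 0.6]
```

By contrast, the ℓ2 (ridge) penalty only shrinks coefficients toward zero without ever making them exactly zero.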
Nonparametric Regression
Statistical Machine Learning, Spring 2015
Ryan Tibshirani (with Larry Wasserman)
1 Introduction, and k-nearest-neighbors
1.1
Basic setup, random inputs
Given a random pair (X, Y) ∈ R^d × R, recall that the function
f0(x) = E(Y | X =
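The k-nearest-neighbors estimator named in the heading approximates the regression function f0 by averaging the responses of the k training inputs closest to the query point. A minimal 1-d sketch; the data and function names are my own illustration:

```python
import numpy as np

def knn_regress(x0, X, y, k):
    """Average the responses of the k training inputs nearest x0."""
    idx = np.argsort(np.abs(X - x0))[:k]
    return y[idx].mean()

X = np.linspace(0, 1, 101)          # 1-d design on a grid
y = X ** 2                          # noiseless responses, for illustration
est = knn_regress(0.5, X, y, k=3)   # averages y at x = 0.49, 0.50, 0.51
```

Larger k gives a smoother (lower-variance, higher-bias) estimate; k controls the usual bias-variance tradeoff.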
Density Estimation
10/36-702 Spring 2015
1 Introduction
Let X1, . . . , Xn be a sample from a distribution P with density p. The goal of nonparametric
density estimation is to estimate p with as few assumptions about p as possible. We denote
the estima
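One concrete nonparametric estimator is the histogram: partition [0, 1) into bins of width h and set the estimate in each bin to the fraction of points falling there, divided by h. A sketch (the function name and constants are my own):

```python
import numpy as np

def hist_density(x, data, h):
    """Histogram estimate: (# sample points in x's bin) / (n * h)."""
    j = int(x // h)                               # bin containing x
    in_bin = (data >= j * h) & (data < (j + 1) * h)
    return in_bin.sum() / (len(data) * h)

rng = np.random.default_rng(0)
data = rng.uniform(0, 1, 10000)                   # true density is 1 on [0,1)
p_hat = hist_density(0.25, data, h=0.1)
```

The binwidth h plays the role of a smoothing parameter, exactly analogous to k in k-nearest-neighbors.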
Undirected Graphical Models
10/36-702
Graphical models are a way of representing the relationships between features (variables).
There are two main brands: directed and undirected. We shall focus on undirected graphical
models. See Figure 1 for an example
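For intuition in the Gaussian case (my own example, not the notes' Figure 1): a missing edge in an undirected graphical model corresponds to a zero entry in the precision (inverse covariance) matrix, i.e. conditional independence of the two variables given the rest, even though the corresponding covariance entry is typically nonzero:

```python
import numpy as np

# chain graph X1 -- X2 -- X3: no edge between X1 and X3, so the
# precision matrix Omega has Omega[0, 2] = 0 (X1 and X3 are
# conditionally independent given X2), while Sigma[0, 2] != 0
Omega = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Omega)        # marginal covariance: dense
```

This is why sparse estimation of the precision matrix (e.g. the graphical lasso) recovers graph structure, while the covariance matrix itself does not display the edges.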
10/36-702 Review
These are things you should know from 36-705 and 10-715.
1 Probability
Xn →P 0 means that, for every ε > 0, P(|Xn| > ε) → 0 as n → ∞. Xn ⇝ Z means
that P(Xn ≤ z) → P(Z ≤ z) at all continuity points z. Xn = OP(an) means that Xn/an
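The definition of convergence in probability can be checked by simulation. Here the sample mean of N(0, 1) draws plays the role of Xn; the value of ε and the repetition counts are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1

def tail_prob(n, reps=2000):
    """Monte Carlo estimate of P(|Xbar_n| > eps) for N(0,1) data."""
    means = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
    return (np.abs(means) > eps).mean()

p10, p1000 = tail_prob(10), tail_prob(1000)   # tail shrinks as n grows
```

Since Xbar_n has standard deviation 1/sqrt(n), the tail probability at fixed ε vanishes as n grows, which is exactly Xn →P 0 with Xn = Xbar_n.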
Practice Midterm
10/36-702
For Review Session on Monday Mar 2
(1) Let X1 , . . . , Xn ∼ Unif(0, 1). Compute the bias and variance of the histogram density
estimator with binwidth h for this distribution. Show that the optimal value of h is h = 1.
(2) Given sa
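For problem (1): each sample point lands in the bin containing x with probability h, so the histogram estimate ν/(nh) is unbiased for the Unif(0, 1) density and has variance h(1 − h)/(nh²) = (1 − h)/(nh), which is minimized at h = 1. A quick Monte Carlo check (the constants are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 5000

def hist_at_half(h):
    """Histogram estimates at x = 0.5, over many Unif(0,1) samples."""
    data = rng.uniform(0, 1, size=(reps, n))
    j = int(0.5 // h)                            # bin containing 0.5
    counts = ((data >= j * h) & (data < (j + 1) * h)).sum(axis=1)
    return counts / (n * h)

var_h_small = hist_at_half(0.1).var()   # theory: (1-0.1)/(100*0.1) = 0.09
var_h_one = hist_at_half(1.0).var()     # theory: (1-1)/(100*1) = 0
```

With h = 1 every point falls in the single bin, so the estimate is identically 1 and both bias and variance are zero, confirming that h = 1 is optimal for this (atypically smooth) density.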
Nonparametric Bayesian Methods
1 What is Nonparametric Bayes?
In parametric Bayesian inference we have a model M = {f (y|θ) : θ ∈ Θ} and data
Y1 , . . . , Yn ∼ f (y|θ). We put a prior distribution π(θ) on the parameter θ and compute the
posterior distribution using
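The posterior is computed by Bayes' theorem: posterior ∝ likelihood × prior. The simplest worked instance (my own example, not from the notes) is the conjugate Beta-Bernoulli model, where the posterior is available in closed form:

```python
import numpy as np

# Bernoulli(theta) data with a Beta(a, b) prior on theta:
# the posterior is Beta(a + #successes, b + #failures)
a, b = 1.0, 1.0                        # Beta(1,1) = uniform prior
y = np.array([1, 1, 0, 1, 1, 0, 1, 1])
a_post = a + y.sum()                   # 1 + 6 = 7
b_post = b + len(y) - y.sum()          # 1 + 2 = 3
post_mean = a_post / (a_post + b_post)  # 7/10 = 0.7
```

Nonparametric Bayes replaces the finite-dimensional θ with an infinite-dimensional object (e.g. a distribution or a function), so priors must be placed on function spaces rather than on a Euclidean parameter.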
Modeling Basics: Assessment, Selection, and Complexity
Statistical Machine Learning, Spring 2015
Ryan Tibshirani (with Larry Wasserman)
1 You (should) already know this stuff: statistical prediction
and the bias-variance tradeoff
Suppose that we observe (X,
Concentration of Measure
1 Introduction
Often we need to show that a random quantity f (Z1 , . . . , Zn ) is close to its mean
μ(f ) = E(f (Z1 , . . . , Zn )). That is, we want a result of the form
P( |f (Z1 , . . . , Zn ) − μ(f )| > ε ) < δ.    (1)
Such result
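A canonical result of the form (1) is Hoeffding's inequality: for independent Xi ∈ [0, 1] with mean μ, P(|Xbar − μ| > ε) ≤ 2 exp(−2nε²). A simulation (constants are my own choices) comparing the empirical tail frequency with the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, reps = 200, 0.1, 5000

# Unif(0,1) variables are bounded in [0,1] with mean 0.5
means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
empirical_tail = (np.abs(means - 0.5) > eps).mean()
hoeffding_bound = 2 * np.exp(-2 * n * eps ** 2)   # 2*e^{-4}, about 0.037
```

The bound is distribution-free over all [0, 1]-valued variables, so for any particular distribution the true tail is usually far smaller, as this simulation shows.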
Bayes versus Frequentist
This lecture combines three blog posts that I wrote on this topic.
1 Adventures in Flatland: Stone's Paradox
Mervyn Stone is Emeritus Professor at University College London. He is famous for
his deep work on Bayesian inference as w
10-702/36-702
Midterm Exam Solutions
March 2, 2011
There are five questions. You only need to do three. Circle the three questions
you want to be graded:
1    2    3    4    5
Problem 1: Let X1 , . . . , Xn be a random sample where −B ≤ Xi ≤ B for some finite
B > 0. Fo
10/36-702 Homework 2
Additional hints for problem 4
Yifei Ma
4. Let H be a Hilbert space of functions. Suppose that the evaluation functionals δx f =
f (x) are continuous. Show that H is a reproducing kernel Hilbert space and find the
kernel.
Hint 1: Dual space
Su
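One way to see what continuity of δx buys: a continuous evaluation functional is bounded, |f(x)| ≤ C‖f‖, and the Riesz representation theorem then supplies an element k(·, x) ∈ H with f(x) = ⟨f, k(·, x)⟩, which is the reproducing kernel. A numerical illustration with the Gaussian kernel (my choice of kernel and coefficients, not part of the problem), checking the resulting bound |f(x)| ≤ ‖f‖_H √k(x, x):

```python
import numpy as np

def k(x, y, gamma=1.0):
    """Gaussian (RBF) kernel -- one concrete reproducing kernel."""
    return np.exp(-gamma * (x - y) ** 2)

# f = sum_i alpha_i k(., x_i) lies in the RKHS, with
# ||f||_H^2 = alpha^T K alpha; the reproducing property gives
# |f(x)| = |<f, k(., x)>| <= ||f||_H * sqrt(k(x, x))
xs = np.array([0.0, 1.0, 2.0])
alpha = np.array([0.5, -1.0, 0.25])
K = k(xs[:, None], xs[None, :])        # Gram matrix
norm_sq = alpha @ K @ alpha            # squared RKHS norm of f

x = 0.7
f_x = (alpha * k(xs, x)).sum()         # f(x) via the kernel expansion
```

The bound is just Cauchy-Schwarz applied to ⟨f, k(·, x)⟩, which is exactly the boundedness of δx that the problem assumes.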
36-702 Homework 1 Solution
Thanks to William Bishop and Rafael Stern for providing their solutions.
Problem 1
(a) Let n(j) = Σi I{j} (xi ). Then the likelihood is
L(θ) = (θ/2)^n(1) (θ/3)^n(2) θ^n(3) ((6 − 11θ)/6)^n(4) ∝ θ^(n(1)+n(2)+n(3)) (6 − 11θ)^n(4) .
Thus, there exists a constant k such that:
l(
10702 Homework 2 Solution
Thanks to Akshay Krishnamurthy for providing his solution.
1
Convexity and Optimization
1. (Convexity)
(a) We'll show that the second derivative of 1/g(x) is always positive, which implies that
1/g(x) is convex. First, the secon
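The computation presumably uses (1/g)'' = (2(g')² − g·g'')/g³, which is nonnegative whenever g is positive and concave (g'' ≤ 0). A numeric sanity check of the conclusion on g(x) = √x (my choice of example), via the midpoint convexity inequality for 1/g:

```python
import numpy as np

# g(x) = sqrt(x) is positive and concave on (0, inf), so 1/g = x**(-1/2)
# should be convex: f((a+b)/2) <= (f(a)+f(b))/2 for every pair a, b
x = np.linspace(0.5, 4.0, 200)
a, b = x[:, None], x[None, :]              # all pairs on the grid
mid = ((a + b) / 2) ** -0.5                # f at the midpoint
chord = (a ** -0.5 + b ** -0.5) / 2        # average of f at the endpoints
gap = (chord - mid).min()                  # should be >= 0 everywhere
```

Such grid checks cannot prove convexity, but they catch sign errors in a second-derivative argument quickly.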
Homework 3 Solution
1. (a)
E(θ̂j − θj ) = E[ (1/n) Σ_{i=1}^n ( m(xi )φj (xi ) + εi φj (xi ) ) ] − θj
= (1/n) Σ_{i=1}^n m(xi )φj (xi ) − ∫_0^1 m(x)φj (x) dx,
where we used linearity of expectation and the fact that εi is the only random quantity,
and it has mean 0.
We can lower bound
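The bias above is the gap between a Riemann-type sum over the design points and the corresponding integral; for smooth m and φj on an evenly spaced design it shrinks like O(1/n). A quick numeric check with an illustrative choice (m, φj, and the design are mine, not the homework's): m(x) = x and φ1(x) = √2 cos(πx), for which ∫_0^1 m φ1 dx = −2√2/π²:

```python
import numpy as np

def bias(n):
    """(1/n) sum_i m(x_i) phi_1(x_i) minus the exact integral."""
    x = np.arange(1, n + 1) / n                     # design x_i = i/n
    riemann = (x * np.sqrt(2) * np.cos(np.pi * x)).mean()
    exact = -2 * np.sqrt(2) / np.pi ** 2            # int_0^1 x*sqrt(2)*cos(pi x) dx
    return riemann - exact

b10, b1000 = abs(bias(10)), abs(bias(1000))         # gap shrinks with n
```

The integral follows from integration by parts: ∫ x cos(πx) dx = x sin(πx)/π + cos(πx)/π², which evaluates to −2/π² on [0, 1].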