10/36-702 Homework 2
Additional hints for problem 4
Yifei Ma
4. Let $H$ be a Hilbert space of functions. Suppose that the evaluation functionals $\delta_x f = f(x)$ are continuous. Show that $H$ is a reproducing kernel Hilbert space and find the kernel.
1 Dual space
Su

Homework 3 Solution
1. (a)
$$
E(\hat\theta_j) = E\left[\frac{1}{n}\sum_{i=1}^n \big(m(x_i) + \epsilon_i\big)\,\phi_j(x_i)\right]
= \frac{1}{n}\sum_{i=1}^n m(x_i)\,\phi_j(x_i)
\approx \int_0^1 m(x)\,\phi_j(x)\,dx,
$$
where we used linearity of expectation and the fact that $\epsilon_i$ is the only random quantity, and it has mean 0.
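A quick Monte Carlo sketch of the identity above. The regression function $m$, the cosine basis $\phi_j(x) = \sqrt{2}\cos(\pi j x)$, and the design $x_i = i/n$ are illustrative assumptions, since the excerpt does not show them:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j = 200, 3
x = np.arange(1, n + 1) / n                   # fixed design x_i = i/n (assumed)
phi = np.sqrt(2) * np.cos(np.pi * j * x)      # cosine basis function (assumed)
m = lambda t: t ** 2                          # hypothetical regression function

reps = 2000
eps = rng.standard_normal((reps, n))          # mean-zero noise
theta_hat = ((m(x) + eps) * phi).mean(axis=1) # (1/n) sum (m(x_i) + eps_i) phi_j(x_i)

mean_theta = theta_hat.mean()                 # Monte Carlo estimate of E(theta_hat_j)
riemann = (m(x) * phi).mean()                 # (1/n) sum m(x_i) phi_j(x_i)

# trapezoid rule for the integral of m * phi_j over [0, 1]
g = np.linspace(0, 1, 200001)
v = m(g) * np.sqrt(2) * np.cos(np.pi * j * g)
integral = ((v[:-1] + v[1:]) / 2).sum() * (g[1] - g[0])
```

The three quantities agree: averaging over the noise recovers the Riemann sum, which in turn approximates the integral.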
We can lower bound

Homework 3
10-702/36-702 Statistical Machine Learning
Due: Friday Feb 18 3:00
Hand in to: Michelle Martin GHC 8001.
1
Nonparametric Regression
1.1
Theory
Consider the following nonparametric regression model:
$$Y_i = m(x_i) + \epsilon_i, \qquad i = 1, \dots, n,$$
where $x_i =$

Copyright © 2008-2010 John Lafferty, Han Liu, and Larry Wasserman
Do Not Distribute
Chapter 27
Nonparametric Bayesian Methods
Most of this book emphasizes frequentist methods, especially for nonparametric problems. However, there are Bayesian approaches

Density Estimation
10/26-702 Spring 2014
1
Introduction
Let $X_1, \dots, X_n$ be a sample from a distribution $P$ with density $p$. The goal of nonparametric
density estimation is to estimate p with as few assumptions about p as possible. We denote
the estima
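A minimal sketch of the most common such estimator, the kernel density estimator; the sample distribution, Gaussian kernel, and bandwidth are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(500)                  # sample from P = N(0, 1) (assumed)
h = 0.3                                       # bandwidth, fixed for illustration

def p_hat(x):
    """Gaussian-kernel density estimate (1/(n h)) sum_i K((x - X_i)/h)."""
    u = (x[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h

est = p_hat(np.array([0.0]))[0]
truth = 1 / np.sqrt(2 * np.pi)                # true p(0) for N(0, 1)
```

The only assumption the estimator itself uses is smoothness of $p$, encoded through the kernel and bandwidth.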

Chapter 12
Bayesian Inference
This chapter covers the following topics:
12.1 Concepts and methods of Bayesian inference.
Bayesian hypothesis testing and model comparison.
Derivation of the Bayesian information criterion (BIC).
Simulation methods and Marko

Clustering Part II:
k-means and Related Methods
Let's begin with a few examples.
Example 1 Figures 1 and 2 show some synthetic examples where the clusters are meant to
be intuitively clear. In Figure 1 there are two blob-like clusters. Identifying cluster
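Two blob-like clusters are exactly the setting where k-means behaves well. A minimal sketch of Lloyd's algorithm on such synthetic data (the blob locations and sizes are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# two blob-like clusters, as in the description of Figure 1 (locations assumed)
X = np.vstack([rng.normal(0, 0.5, (100, 2)),
               rng.normal(5, 0.5, (100, 2))])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels

centers, labels = kmeans(X, 2)
lo, hi = sorted(centers[:, 0])
```

With well-separated blobs the recovered centers sit near the two blob means, regardless of the random initialization.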

Convexity and Optimization
Statistical Machine Learning, Spring 2014
Ryan Tibshirani (with Larry Wasserman)
1
An entirely too brief motivation
1.1
Why optimization?
Optimization problems are ubiquitous in statistics and machine learning. A huge number of

Density Clustering
10/26-702 Spring 2014
1
Modes and Clusters
Let $p$ be the density of $X \in \mathbb{R}^d$. Assume that $p$ has modes $m_1, \dots, m_{k_0}$ and that $p$ is a Morse function, which means that the Hessian of $p$ at each stationary point is non-degenerate. We
can us
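A common way to locate the modes of an estimated density is the mean-shift iteration, whose fixed points are modes of the Gaussian KDE. A one-dimensional sketch; the two-mode sample and the bandwidth are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# a sample whose density has two well-separated modes near -3 and 3 (assumed)
X = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])
h = 0.5

def mean_shift(x, iters=200):
    """Fixed-point iteration x <- sum w_i X_i / sum w_i with Gaussian weights;
    the limit is a mode of the Gaussian KDE with bandwidth h."""
    for _ in range(iters):
        w = np.exp(-0.5 * ((x - X) / h) ** 2)
        x = (w * X).sum() / w.sum()
    return x

m1 = mean_shift(-2.0)                         # started in the left basin
m2 = mean_shift(2.0)                          # started in the right basin
```

Points whose iterations converge to the same mode are assigned to the same density cluster.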

Assignment 2
10/36-702
Due Friday Feb 14 at 3:00 pm
Hand in to Mari-Alice McShane, Baker Hall 229K
1. Consider the $k$th order local polynomial regression estimate, trained on points $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R} \times \mathbb{R}$, which we know can be expressed as
$$\hat m(x) = \sum_{i=1}^n w_i(x)\, y_i.$$

4.
(a) Let $X_{(n)}$ denote $\max_{i=1,\dots,n} X_i$. The likelihood
$$L(\theta) = \prod_{i=1}^n p(X_i; \theta) = \frac{1}{\theta^n}\, I\big(\theta \ge X_{(n)}\big)$$
is a decreasing positive function on $[X_{(n)}, \infty)$ and 0 everywhere else. Therefore it is maximized by taking $\hat\theta_n = X_{(n)}$.
(b) $P(\hat\theta_n > \theta) = 0$ for all $n$, so $n(\hat\theta_n - \theta)$ ca
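A simulation confirming both facts: the maximum never exceeds $\theta$, and $n(\theta - \hat\theta_n)$ settles near an exponential limit with mean $\theta$ (the particular $\theta$, $n$, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 50, 1000
X = rng.uniform(0, theta, (reps, n))
theta_hat = X.max(axis=1)                     # the MLE is the sample maximum
frac_above = (theta_hat > theta).mean()       # the MLE never exceeds theta
scaled = n * (theta - theta_hat)              # approximate Exp limit with mean theta
```

Because $\hat\theta_n \le \theta$ always, the usual $\sqrt{n}$-normal asymptotics cannot apply, which is the point of part (b).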

Assignment 3
10/36-702
Due Friday March 21 at 3:00 pm
Hand in to Mari-Alice McShane, Baker Hall 229K
1. Suppose that $X_i \sim N(\mu_i, 1)$ for $i = 1, \dots, n$. We then say that $W = \sum_{i=1}^n X_i^2$ has a non-central $\chi^2$ distribution with $n$ degrees of freedom and non-ce
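A simulation check of the first moment of this distribution, $E\,W = n + \|\mu\|^2$; the mean vector below is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(10)
mu = np.array([1.0, -2.0, 0.5])               # hypothetical means
n, reps = len(mu), 200000
X = rng.standard_normal((reps, n)) + mu
W = (X ** 2).sum(axis=1)                      # noncentral chi^2: n dof, noncentrality ||mu||^2
emp_mean = W.mean()
theory = n + (mu ** 2).sum()                  # E W = n + ||mu||^2
```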

Solutions to Assignment 2
10/36-702
1.
(a) For any polynomial $q(x)$ of degree $k$ we have $q(x) = \beta^T b(x)$ where $b(x) = (1, x, \dots, x^k)^T$ and $\beta$ contains the coefficients. Then
$$\sum_{i=1}^n q(x_i)\, w_i(x) = \sum_{i=1}^n b(x_i)^T \beta\, w_i(x) = w(x)^T B \beta = b(x)^T (B^T B)^{-1} B^T B \beta = b(x)^T \beta = q(x).$$
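A numerical check of this polynomial-reproduction property, using the unkernelized weights $w(x)^T = b(x)^T (B^T B)^{-1} B^T$ from the derivation; the design points, degree, and polynomial are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 3
x_train = np.linspace(0, 1, 20)
B = np.vander(x_train, k + 1, increasing=True)  # rows are b(x_i)^T = (1, x_i, ..., x_i^k)

def weights(x):
    """w(x)^T = b(x)^T (B^T B)^{-1} B^T (unkernelized projection weights)."""
    b = x ** np.arange(k + 1)
    return b @ np.linalg.solve(B.T @ B, B.T)

beta = rng.standard_normal(k + 1)               # random degree-k polynomial
q = lambda t: np.polyval(beta[::-1], t)         # polyval wants highest degree first
x0 = 0.37
reproduced = weights(x0) @ q(x_train)           # sum_i q(x_i) w_i(x0)
exact = q(x0)
```

The weighted sum of the sampled polynomial values reproduces the polynomial exactly, up to floating-point error.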

Assignment 4
10/36-702
Due Friday April 11 at 3:00 pm
Hand in to Mari-Alice McShane, Baker Hall 229K
1. Here we'll study convexity (and concavity) in exponential families and generalized linear models. Consider an exponential family density (or mass) funct

Assignment 1
10/36-702
Due Friday Jan 24 3:00 pm
Hand In to Mari-Alice McShane, Baker Hall 229K
1. Let $X_1, \dots, X_n \sim P$ and let $\mu = E[X_i]$ and $\sigma^2 = \mathrm{Var}(X_i)$. Define
$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2.$$
(a) Prove that $S_n^2 \xrightarrow{P} \sigma^2$.
(b) Prove th
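A simulation illustrating the convergence in (a): the error $|S_n^2 - \sigma^2|$ shrinks as $n$ grows (the normal population and $\sigma^2 = 4$ are assumptions for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 4.0
errs = []
for n in [100, 10000, 1000000]:
    X = rng.normal(0, np.sqrt(sigma2), n)
    S2 = ((X - X.mean()) ** 2).mean()         # the (1/n)-version from the problem
    errs.append(abs(S2 - sigma2))
```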

Assignment 5
10/36-702
Due Friday May 2 at 3:00 pm
Hand in to Mari-Alice McShane, Baker Hall 229K
1. Given $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, consider the lasso problem
$$\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1.$$
Rewrite this as
$$\min_{\beta \in \mathbb{R}^p,\, z \in \mathbb{R}^n} \; \frac{1}{2}\|y - z\|_2^2 + \lambda \|\beta\|_1 \quad \text{subject to } z = X\beta, \qquad (1)$$
and now, st
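The split form (1) is the usual starting point for operator-splitting methods such as ADMM. As a sanity check on the unsplit problem, here is a short proximal-gradient (ISTA) sketch; the data, $\lambda$, and iteration count are all illustrative assumptions, not part of the assignment:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, lam = 50, 20, 0.5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]              # sparse truth (assumed)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def obj(b):
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.abs(b).sum()

L = np.linalg.norm(X, 2) ** 2                 # Lipschitz constant of the smooth gradient
b = np.zeros(p)
for _ in range(2000):
    g = b - X.T @ (X @ b - y) / L             # gradient step on (1/2)||y - Xb||^2
    b = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold (prox of l1)

grad = X.T @ (X @ b - y)                      # smooth gradient at the solution
```

At the solution the lasso KKT conditions hold: $\mathrm{grad}_i = -\lambda\,\mathrm{sign}(b_i)$ on the active set and $|\mathrm{grad}_i| \le \lambda$ elsewhere.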

1. Let $X_1, \dots, X_n \sim P$ where $P$ has density $p$. Let $\hat p$ be the kernel density estimator with kernel $K$ and bandwidth $h$. Prove that the bias $E[\hat p(x)] - p(x)$ is exactly
$$\int K(t)\,\big[p(x + th) - p(x)\big]\, dt.$$
Assume now that $|p(x) - p(y)| \le L|x - y|$. Use this to get an upper bound on th
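The exactness of the bias formula can be verified by quadrature: compute $E[\hat p(x)] = \int h^{-1} K((x-y)/h)\, p(y)\, dy$ directly and compare with the claimed integral. The Gaussian kernel and standard normal $p$ below are choices made for the check, not part of the problem:

```python
import numpy as np

def trap(f, a, b, m=200001):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    t = np.linspace(a, b, m)
    v = f(t)
    return ((v[:-1] + v[1:]) / 2).sum() * (t[1] - t[0])

p = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)   # N(0,1) density (assumed)
K = p                                                      # Gaussian kernel (assumed)
x0, h = 0.5, 0.4

rhs = trap(lambda t: K(t) * (p(x0 + t * h) - p(x0)), -8, 8)
# E p_hat(x0) = integral of (1/h) K((x0 - y)/h) p(y) dy
lhs = trap(lambda y: K((x0 - y) / h) / h * p(y), -10, 10) - p(x0)
```

The two sides agree to quadrature precision, as the change of variables $y = x + th$ in the proof predicts.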

SML Recitation Notes Week 1
Concentration of Measure:
Min Xu
January 13, 2011
1
Moments and C. of M.
Theorem 1 (Markov Inequality). Let $X$ be a non-negative random variable. Assume $EX < \infty$. Then
$$P(X > \epsilon) \le \frac{EX}{\epsilon}.$$
Example 1. Let $X_1, \dots, X_n$ be random variables. Assum
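A quick empirical look at Theorem 1; the exponential distribution and threshold are illustrative assumptions. The bound is loose here (the true tail is far smaller), which is typical of Markov's inequality:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.exponential(1.0, 100000)              # non-negative sample with EX = 1 (assumed)
eps = 3.0
emp = (X > eps).mean()                        # empirical P(X > eps)
bound = X.mean() / eps                        # Markov bound EX / eps
```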

Homework 2
10-702/36-702 Statistical Machine Learning
Due: Friday Feb 4 3:00
Hand in to: Michelle Martin GHC 8001.
1
Convexity and Optimization
1. (Convexity)
(a) Show that $1/g(x)$ is convex if $g$ is twice-differentiable, concave and positive (hint: use compo

10/36-702 Homework 1
Due: Friday 2/8/2013
Instructions: Hand in your homework to Michelle Martin (GHC 8001) before 3:00pm on
Friday 2/8/2013.
1. Suppose that $E(V \mid Z) = 0$ and that
$$h(Z) \le V \le h(Z) + c$$
for some $h$ and some $c > 0$. Show that
$$E\big(e^{tV} \mid Z\big) \le e^{t^2 c^2 / 8}.$$
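This is Hoeffding's lemma. A numerical check in the simplest case $V \sim \mathrm{Uniform}(-c/2,\, c/2)$ (an assumed special case: mean 0, range of length $c$), where the MGF has a closed form:

```python
import numpy as np

# For V ~ Uniform(-c/2, c/2): E(V) = 0 and V lies in an interval of length c,
# so the lemma applies, and E exp(tV) = sinh(tc/2) / (tc/2) in closed form.
c = 2.0
t = np.linspace(-5, 5, 1001)
t = t[t != 0]                                 # avoid 0/0 (the limit there is 1)
mgf = np.sinh(t * c / 2) / (t * c / 2)        # E exp(tV) for the uniform
bound = np.exp(t ** 2 * c ** 2 / 8)           # Hoeffding bound exp(t^2 c^2 / 8)
gap_ok = bool(np.all(mgf <= bound))
```

The bound holds over the whole range of $t$, consistent with the series comparison $\sinh(u)/u \le e^{u^2/2}$.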

10/36-702 Homework 5
Due Friday 5/3/2013
1
Graphical Models
1. Consider a p-dimensional Gaussian graphical model $P_X \sim N(0, \Sigma)$ defined on $X = (X_1, \dots, X_p)$. Let $\Omega = \Sigma^{-1}$ denote the precision matrix. In this problem, you will show that $\Omega_{ij} = 0$ if and only if $X_i$
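The claim being set up ($\Omega_{ij} = 0$ iff $X_i \perp X_j$ given the rest) can be checked numerically: build a precision matrix with a zero entry and compute the conditional covariance via the Gaussian conditioning formula. The particular $3\times 3$ matrix is an assumed example:

```python
import numpy as np

# A 3x3 precision matrix with Omega[0,1] = 0 (values assumed for illustration)
Omega = np.array([[2.0, 0.0, 0.5],
                  [0.0, 2.0, 0.5],
                  [0.5, 0.5, 2.0]])
Sigma = np.linalg.inv(Omega)                  # the covariance matrix

# Gaussian conditioning: Cov((X0, X1) | X2) = A - B D^{-1} B^T in block form
A = Sigma[:2, :2]
Bb = Sigma[:2, 2:]
D = Sigma[2:, 2:]
cond = A - Bb @ np.linalg.inv(D) @ Bb.T
off = cond[0, 1]                              # ~0: X0, X1 conditionally independent
marg = Sigma[0, 1]                            # generally nonzero marginally
```

The conditional covariance equals $(\Omega_{[1:2,1:2]})^{-1}$, whose off-diagonal entry vanishes exactly when $\Omega_{12} = 0$, even though the marginal covariance $\Sigma_{12}$ is nonzero.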

Homework 4: 10-36/702 due April 19
1. Simulation
Let
$$X_1, \dots, X_n \sim \tfrac{1}{2} N(\mu_1, 1) + \tfrac{1}{2} N(\mu_2, 1).$$
(a) Consider the prior
$$\pi(\mu_1, \mu_2) \propto 1.$$
Show that the posterior is improper (i.e. the posterior does not have finite integral).
Hint: we can write the model as:
Z

10/36-702 Homework 2
Due: Friday 3/1/2013
Instructions: Hand in your homework to Michelle Martin (GHC 8001) before 3:00pm on
Friday 3/1/2013.
1. Let $X_1, \dots, X_n \sim P$ where $P$ has density $p$ and $0 \le X_i \le 1$. Find the asymptotic bias of $\hat p_h(0)$ where $\hat p_h$ is the ker
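A simulation illustrating the boundary effect the problem is after: at $x = 0$ the estimator only sees data on one side, so it is biased downward by roughly a factor of two. The uniform density and Gaussian kernel are assumptions for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(9)
# p = Uniform[0, 1] (assumed): at x = 0 the KDE sees data on one side only
n, h, reps = 2000, 0.1, 200
est = []
for _ in range(reps):
    X = rng.uniform(0, 1, n)
    u = (0.0 - X) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    est.append(K.mean() / h)
mean_est = float(np.mean(est))                # ~ p(0)/2 = 0.5, not p(0) = 1
```

Half the kernel mass falls outside $[0, 1]$, so $E[\hat p_h(0)] \approx p(0)/2$ rather than $p(0)$.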

Homework 3 10/36-702: Due March 22
1. Let $X_1, \dots, X_n \sim P$ where $P$ has a density $p$ on $[0, 1]$. Suppose that $p \in \mathcal{P}$ where
$$\mathcal{P} = \Big\{ p : p \ge 0, \ \int_0^1 p = 1, \ |p(y) - p(x)| \le L|x - y| \text{ for all } x, y \in [0, 1] \Big\}.$$
We want to estimate the density $p$. Let $d(p, q) = \int_0^1 (p(x) - q(x))^2\, dx$. Let $R_n$ be the

SML Recitation Notes Week 6:
Information Lower Bound (Minimax)
Min Xu
February 22, 2011
1
Problem Set-up
Let $\mathcal{P} = \{P_\theta\}$ be a set of distributions parametrized by $\theta \in \Theta$, with support $\mathcal{X}$.
We define an estimator to be a function $\hat\theta : \mathcal{X}^n \to \Theta$.
Definition 1. The minimax risk

SML Recitation Notes Week 2:
Convexity
Min Xu
January 21, 2011
1
Geometry Fundamentals
We can describe a hyperplane in $\mathbb{R}^d$ by a normal vector $h \in \mathbb{R}^d$ and an offset $x_0 \in \mathbb{R}^d$.
If $x$ is on the plane, then $h^T (x - x_0) = 0$.
If $x$ is on one side of the plane, then $h^T (x - x_0) > 0$.
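A two-dimensional check of these two facts; the particular $h$ and $x_0$ are arbitrary illustrative values:

```python
import numpy as np

h = np.array([1.0, 2.0])                      # normal vector (values assumed)
x0 = np.array([3.0, 0.0])                     # offset point on the plane
v = np.array([-h[1], h[0]])                   # orthogonal to h in R^2
x_on = x0 + 4.2 * v                           # moving orthogonally to h stays on the plane
x_off = x0 + h                                # stepping along the normal leaves it
side_on = h @ (x_on - x0)                     # 0: on the plane
side_off = h @ (x_off - x0)                   # ||h||^2 > 0: strictly on one side
```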

Name:
10-702 Statistical Machine Learning: Midterm Exam
March 4, 2010
Submit solutions to any three of the following five problems. Clearly indicate which problems you are submitting solutions for. Write your answers in the space provided; additional sheets

Undirected Graphical Models - Representation
What are undirected graphical models?
Consider a random vector X = (X1 , . . . , Xp ) with a multivariate distribution PX . An undirected
graphical model is a multivariate distribution together with an undirect

10-702/36-702
Midterm Exam Solutions
March 2 2011
There are five questions. You only need to do three. Circle the three questions you want to be graded:  1   2   3   4   5
Name:
Problem 1: Let $X_1, \dots, X_n$ be a random sample where $-B \le X_i \le B$ for some finite $B > 0$. Fo