CS 294 - Homework 3
October 15, 2009
If you have questions, contact Alexandre Bouchard ([email protected]) for part 1.

CS 294 - Homework 2
October 2, 2006
Foundations and Trends® in Machine Learning
Vol. 1, Nos. 1-2 (2008) 1-305
© 2008 M. J. Wainwright and M. I. Jordan
DOI: 10.1561/2200000001
Graphical Models, Exponential Families, and Variational Inference
Martin J. Wainwright and Michael I. Jordan
CS 294-34 Homework 4
Due: Thursday, November 12, 2009
Part 1: Collaborative Filtering
Preliminaries: The collaborative filtering section of the assignment will make use of the recently released
machine learning benchmarking site, MLcomp. Before tackling the
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Multi-armed bandit algorithms.
Concentration inequalities.
$P(X \ge \epsilon) \le \exp(-\sup_{\lambda \ge 0}[\lambda\epsilon - \psi(\lambda)])$.
Cumulant generating function bounds.
Hoeffding's inequality for sub-Gaussian random variables.
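The concentration bound above can be checked numerically. The sketch below (function names and the Bernoulli setup are illustrative, not from the notes) compares a Monte Carlo estimate of a tail probability against the Hoeffding bound $\exp(-2n\epsilon^2)$ for i.i.d. $[0,1]$-valued variables:

```python
import math
import random

def hoeffding_bound(n, eps):
    # Hoeffding: for i.i.d. X_1..X_n in [0, 1], the probability that the
    # sample mean exceeds its expectation by eps is at most exp(-2 n eps^2).
    return math.exp(-2.0 * n * eps * eps)

def empirical_tail(n, eps, p=0.5, trials=20000, seed=0):
    # Monte Carlo estimate of P(sample mean - p >= eps) for Bernoulli(p) draws.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        if mean - p >= eps:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n, eps = 50, 0.2
    print("Hoeffding bound :", hoeffding_bound(n, eps))
    print("Empirical tail  :", empirical_tail(n, eps))
```

For Bernoulli variables the empirical tail is typically well below the bound, since Hoeffding only uses the range of the variables, not their variance.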
U
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Discrete decision problems with partial monitoring
Definition: loss and feedback. Stochastic and adversarial.
Examples.
Minimax regret: algorithms and lower bounds.
Discre
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Adversarial bandits
Definition: sequential game.
Lower bounds on regret from the stochastic case.
Exp3: exponential weights strategy.
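A minimal sketch of the Exp3 exponential-weights strategy, in the loss-based formulation (importance-weighted loss estimates, no explicit exploration mixing); the function names and the fixed learning rate are my assumptions, not from the notes:

```python
import math
import random

def exp3(n_rounds, k, reward_fn, eta, seed=0):
    # Exp3 (loss-based variant) for adversarial bandits with rewards in [0, 1].
    # Only the pulled arm's reward is observed; the unobserved arms are handled
    # through an importance-weighted, unbiased estimate of the pulled arm's loss.
    rng = random.Random(seed)
    weights = [1.0] * k
    total = 0.0
    for t in range(n_rounds):
        s = sum(weights)
        probs = [w / s for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]  # sample from exp-weights
        x = reward_fn(t, arm)
        total += x
        est_loss = (1.0 - x) / probs[arm]  # importance-weighted loss estimate
        weights[arm] *= math.exp(-eta * est_loss)
    return total
```

With the loss formulation, weights only shrink, so no renormalization or exploration mixing is needed for the basic regret guarantee; the gain-based variant of Exp3 mixes in a uniform exploration term instead.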
Adversarial bandits
Repeated game: st
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Gittins Index:
Discounted, Bayesian (hence Markov arms).
Reduces to stopping problem for each arm.
Interpretation as (scaled) equivalent lump sum.
Computation.
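The "equivalent lump sum" interpretation suggests a direct computation: binary-search the retirement rate λ at which retiring forever (worth λ/(1 − β)) and continuing to play are equally attractive, solving the per-arm stopping problem by value iteration. A sketch for a Bernoulli arm with a Beta posterior; the horizon truncation and the function names are my assumptions:

```python
from functools import lru_cache

def gittins_index(a, b, beta=0.9, horizon=100, tol=1e-4):
    # Approximate Gittins index of a Bernoulli arm with Beta(a, b) posterior.
    # Calibration: find the retirement rate lam at which the stopping problem
    # (play on vs. retire at rate lam) is indifferent at the current state.
    def continue_value(lam):
        retire = lam / (1.0 - beta)  # discounted value of retiring forever

        @lru_cache(maxsize=None)
        def v(s, f):
            if s + f - a - b >= horizon:   # truncate: assume retirement
                return retire
            p = s / (s + f)                # posterior mean success probability
            cont = p + beta * (p * v(s + 1, f) + (1 - p) * v(s, f + 1))
            return max(retire, cont)       # stop (retire) or keep playing

        p = a / (a + b)
        return p + beta * (p * v(a + 1, b) + (1 - p) * v(a, b + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if continue_value(lam) > lam / (1.0 - beta):
            lo = lam   # still worth playing at rate lam: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

The index always exceeds the posterior mean (the option to adapt has positive value), and the gap between them is the exploration bonus that shrinks as the posterior concentrates.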
Gittins
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Minimax regret bounds
Upper bounds: worst case over the reward distributions Pj.
Lower bounds.
Pseudo-regret
Recall
$$R_n(P) = \max_{j=1,\ldots,k} \mathbb{E}\left[\sum_{t=1}^{n} X_{j,t} - \sum_{t=1}^{n} X_{I_t,t}\right] = n\mu^* - \mathbb{E}\left[\sum_{t=1}^{n} X_{I_t,t}\right],$$
where $\mu^* = \max_j \mu_j$.
We have
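As a concrete illustration of the definition above, the sketch below measures pseudo-regret of a simple policy on Bernoulli arms; note that it sums the arm means $\mu_{I_t}$, not the realized rewards. The ε-greedy policy and all names here are illustrative choices, not from the notes:

```python
import random

def pseudo_regret(mu, pulls):
    # Pseudo-regret n * mu_star - sum_t mu_{I_t}: regret against the best
    # fixed arm, measured with the arm *means*, not realized rewards.
    mu_star = max(mu)
    return len(pulls) * mu_star - sum(mu[i] for i in pulls)

def epsilon_greedy(mu, n, eps=0.1, seed=0):
    # A simple policy to plug into pseudo_regret: pull each arm once, then
    # explore uniformly with probability eps, otherwise play the empirical best.
    rng = random.Random(seed)
    k = len(mu)
    counts, sums, pulls = [0] * k, [0.0] * k, []
    for t in range(n):
        if t < k:
            arm = t
        elif rng.random() < eps:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda j: sums[j] / counts[j])
        x = 1.0 if rng.random() < mu[arm] else 0.0  # Bernoulli(mu[arm]) reward
        counts[arm] += 1
        sums[arm] += x
        pulls.append(arm)
    return pulls
```

Because each term $\mu^* - \mu_{I_t}$ is nonnegative, pseudo-regret is always nonnegative, which need not hold for the realized-reward regret.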
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Lower bounds on regret for multi-armed bandits.
Stochastic bandit problem: notation.
k arms.
Arm j has unknown reward distribution Pj, for j = 1, ..., k.
Reward: Xj,t ~ Pj.
Mean r
CS294-34 Homework 2
The homework is due on October 8. Please submit a PDF on bSpace. If you have questions about the
clustering problems, email Sriram; for the dimensionality reduction ones, email Percy.
This assignment will be done entirely in R. You will
Stat 260/CS 294-102. Learning in Sequential Decision Problems.
Peter Bartlett
1. Contextual bandits.
Bandits with side information.
Model assumptions versus comparison class.
Woodroofe/Sarkar one-armed bandit with side information.
|X | distinct bandi
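The decomposition into |X| distinct bandit problems can be sketched as one independent learner per observed context, here one UCB1 instance per context. The class and function names are mine, and this deliberately ignores any shared structure across contexts, which is exactly the trade-off (model assumptions versus comparison class) the outline points at:

```python
import math

class UCB1:
    # One UCB1 learner over k arms with rewards in [0, 1].
    def __init__(self, k):
        self.k = k
        self.counts = [0] * k
        self.sums = [0.0] * k
        self.t = 0

    def select(self):
        self.t += 1
        for j in range(self.k):
            if self.counts[j] == 0:
                return j  # initialize: pull each arm once
        return max(range(self.k),
                   key=lambda j: self.sums[j] / self.counts[j]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[j]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def per_context_bandit(contexts, k, reward_fn):
    # Treat each distinct context as its own bandit: |X| independent UCB1
    # instances, created lazily as contexts appear.
    learners = {}
    total = 0.0
    for x in contexts:
        if x not in learners:
            learners[x] = UCB1(k)
        arm = learners[x].select()
        r = reward_fn(x, arm)
        learners[x].update(arm, r)
        total += r
    return total
```

The price of this decomposition is that regret scales with the number of distinct contexts, since nothing learned in one context transfers to another.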