© 2016 Robert Nowak
Rademacher Complexity and Learning with Convex Loss Functions
Convex Losses
Suppose we have training data {(xi, yi)}_{i=1}^n, a set of prediction rules F, and a loss function L. Empirical risk minimization is the optimization

    min_{f ∈ F} (1/n) Σ_{i=1}^n L(f(xi), yi).
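Concretely, the minimization above can be carried out by gradient descent when L is the squared loss and F is the class of linear predictors; the toy data, step size, and iteration count below are illustrative assumptions, not part of the original note.

```python
import numpy as np

def erm_linear_squared(X, y, lr=0.1, iters=500):
    """Gradient descent on the empirical risk (1/n) * sum_i (w.x_i - y_i)^2,
    i.e. ERM with squared loss over linear prediction rules."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        residual = X @ w - y                 # f(x_i) - y_i for every i
        w -= lr * (2.0 / n) * (X.T @ residual)
    return w

# Toy data (an assumption): y depends linearly on the first feature only
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = erm_linear_squared(X, y)
```

On this data the empirical risk is minimized by weighting the first feature by 2 and ignoring the second.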
Computational Learning Theory
www.cs.wisc.edu/~dpage/cs760/
Goals for the lecture
you should understand the following concepts
PAC learnability
consistent learners and version spaces
sample complexity
PAC learnability in the agnostic setting
the VC
Evaluating Machine Learning Methods
www.cs.wisc.edu/~dpage/cs760/
Goals for the lecture
you should understand the following concepts
test sets
learning curves
validation (tuning) sets
stratified sampling
cross validation
internal cross validation
Name:
Problem 1 ~ SVM Learning by the SMO Algorithm (20 points)
You have the small dataset below that involves three features (A, B, and C) and the Category (−1
for negative and +1 for positive). Show one update step of the SMO algorithm on this d
Instance-Based Learning
www.biostat.wisc.edu/~dpage/cs760/
Goals for the lecture
you should understand the following concepts
k-NN classification
k-NN regression
edited nearest neighbor
k-d trees for nearest neighbor identification
Nearest-neighbor cl
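A minimal sketch of k-NN classification as listed in the goals above, assuming Euclidean distance and majority voting; the toy dataset is an illustrative assumption.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Predict the majority label among the k nearest training points
    under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Two well-separated clusters (an assumption, for illustration)
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
label = knn_classify(X_train, y_train, np.array([0.5, 0.5]))
```

k-NN regression replaces the majority vote with the mean of the k neighbors' targets.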
Decision Tree Learning
Goals for the lecture
you should understand the following concepts
the decision tree representation
the standard top-down approach to learning a tree
Occam's razor
entropy and information gain
types of decision-tree splits
test
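The entropy and information-gain computations used to score decision-tree splits can be sketched as follows; the helper names and the toy split are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_v p(v) * log2 p(v) over the observed label values v."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - remainder

# A pure split of a balanced binary parent gains exactly one bit
gain = information_gain([0, 0, 1, 1], [[0, 0], [1, 1]])
```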
CS 731
Written Homework 2
Assigned Wed, Oct 21
Due Wed, Oct 28
1.
Given the Bayes net below, show the result of three cycles of the EM algorithm to update the
CPTs, using two data points: one with A=true, B=true, C=true, and D missing, and one with
A=true
1.
In the Bayesian Network below, the variables are A, B, C, D, E, and they are all Boolean. In the CPTs,
the notation P(a) denotes the probability that A is set to True, with similar meanings for P(b), P(c), etc.
Use variable elimination to determine the
Support Vector and Kernel Machines
Nello Cristianini BIOwulf Technologies [email protected] http://www.support-vector.net/tutorial.html
ICML 2001
A Little History
SVMs introduced in COLT-92 by Boser, Guyon, Vapnik. Greatly developed ever sin
CHAPTER 1
GENERATIVE AND DISCRIMINATIVE
CLASSIFIERS:
NAIVE BAYES AND LOGISTIC REGRESSION
Machine Learning
Copyright © 2005, 2010. Tom M. Mitchell. All rights reserved.
*DRAFT OF January 19, 2010*
*PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR'S PERMISSION*
This
Neural Networks and Deep Learning
www.cs.wisc.edu/~dpage/cs760/
Goals for the lecture
you should understand the following concepts
perceptrons
the perceptron training rule
linear separability
hidden units
multilayer neural networks
gradient descen
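The perceptron training rule from the list above can be sketched as follows, assuming ±1 labels and an appended bias feature; the toy data and epoch budget are illustrative assumptions.

```python
import numpy as np

def perceptron_train(X, y, epochs=200, lr=1.0):
    """Perceptron training rule: on each mistake, w <- w + lr * y_i * x_i.
    Labels are in {-1, +1}; a constant bias feature is appended to each x."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:          # misclassified (or on the boundary)
                w += lr * yi * xi
                mistakes += 1
        if mistakes == 0:                   # converged: data are separated
            break
    return w

# Linearly separable toy data (an assumption): positive iff x1 + x2 > 1
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([-1, -1, -1, 1, 1])
w = perceptron_train(X, y)
preds = np.sign(np.hstack([X, np.ones((5, 1))]) @ w)
```

Because the data are linearly separable, the perceptron convergence theorem guarantees the loop eventually completes a mistake-free epoch.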
SVM by Sequential Minimal
Optimization (SMO)
www.cs.wisc.edu/~dpage
Quick Review
As last lecture showed us, we can
Solve the dual more efficiently (fewer unknowns)
Add parameter C to allow some misclassifications
Rep
SPRING 2001
CS 731: ADVANCED ARTIFICIAL INTELLIGENCE
COMPUTER SCIENCES DEPARTMENT
UNIVERSITY OF WISCONSIN MADISON
Wednesday, May 1, 2002
11:00 AM - 12:30 PM
2534 Engineering Hall
The first five questions are worth 8 points each. The last five are worth 12 po
© 2016 Rebecca Willett
Stochastic Gradient Descent
In many machine learning and signal processing settings, we wish to solve an optimization problem of
the form
    minimize_w f(w)

where the objective function can be decomposed as

    f(w) = Σ_{i=1}^n f_i(w).
For
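A minimal sketch of stochastic gradient descent for an objective decomposed as f(w) = Σ_i f_i(w): each step follows the gradient of a single randomly chosen term. The least-squares form of f_i, the step size, and the data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad_fi, w0, n, lr=0.01, steps=5000):
    """Stochastic gradient descent on f(w) = sum_i f_i(w): each step moves
    along the negative gradient of one randomly chosen term f_i."""
    w = w0.copy()
    for _ in range(steps):
        i = rng.integers(n)
        w -= lr * grad_fi(w, i)
    return w

# Assumed instance: f_i(w) = 0.5 * (x_i . w - y_i)^2 (least squares, no noise)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
grad_fi = lambda w, i: (X[i] @ w - y[i]) * X[i]
w_hat = sgd(grad_fi, np.zeros(3), n=100)
```

Each step costs O(p) rather than the O(np) of a full gradient, which is the point of SGD in large-n settings.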
© 2016 Rebecca Willett
Backpropagation in Neural Networks
Artificial neural networks can be used to learn predictors in a wide variety of machine learning settings.
The basic idea is to take a feature vector x ∈ R^p, compute different weighted combinations of t
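A minimal sketch of backpropagation for a one-hidden-layer network with tanh units and squared loss, checked against a finite-difference approximation; the architecture and random weights are illustrative assumptions, not the note's own example.

```python
import numpy as np

def forward(x, W1, W2):
    """One hidden layer: h = tanh(W1 x), scalar output = W2 . h."""
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def backprop(x, y, W1, W2):
    """Gradients of the squared loss 0.5*(out - y)^2 via the chain rule."""
    h, out = forward(x, W1, W2)
    delta_out = out - y                      # dL/d(out)
    gW2 = delta_out * h                      # dL/dW2
    delta_h = delta_out * W2 * (1 - h**2)    # back through tanh
    gW1 = np.outer(delta_h, x)               # dL/dW1
    return gW1, gW2

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), 0.7
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)
gW1, gW2 = backprop(x, y, W1, W2)

# Finite-difference check on one entry of W1
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
loss = lambda A, b: 0.5 * (forward(x, A, b)[1] - y) ** 2
num = (loss(W1p, W2) - loss(W1, W2)) / eps
```

The agreement between `num` and `gW1[0, 0]` is the standard sanity check that the chain-rule derivation is correct.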
© 2016 Robert Nowak
Note on Proximal Gradient Algorithms
These notes consider optimization problems of the following form
    min_{w ∈ R^p} f(w) + c(w),
where the functions f and c are convex, and f is also differentiable. Special cases include ridge regression
a
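A minimal sketch of the proximal gradient iteration for min f(w) + c(w) with f(w) = ½‖Xw − y‖² (smooth) and c(w) = λ‖w‖₁, whose proximal operator is soft thresholding; the sparse-recovery instance and regularization level are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrink each coordinate toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(X, y, lam, iters=2000):
    """Minimize f(w) + c(w) with smooth f(w) = 0.5*||Xw - y||^2 and
    c(w) = lam*||w||_1: a gradient step on f, then the prox step on c."""
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X).max()   # step size 1/L
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

# Assumed sparse-recovery instance: 2-sparse truth, noiseless observations
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))
w_star = np.zeros(10); w_star[:2] = [3.0, -3.0]
y = X @ w_star
w_hat = proximal_gradient(X, y, lam=0.1)
```

With c(w) = 0 the iteration reduces to plain gradient descent; with c(w) = λ‖w‖² the prox step becomes a rescaling, recovering ridge regression as a special case.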
© 2016 Robert Nowak
Note on Lasso
This short note is based on the analysis framework developed in [1]. Let w* be an s-sparse vector and
suppose that we observe

    y = Xw* + ε,

where X is a known n × p matrix with entries Xij ~ N(0, 1) i.i.d. and ε is an unknown e
© 2016 Robert Nowak
Note on Cross-Validation
Let f be the minimizer of the regularized problem

    min_{f ∈ F} { (1/N) Σ_{i=1}^N L(yi, f(xi)) + c(f) },    (1)
where F is a class of predictors, L is a loss function (e.g., squared error, logistic loss, hinge loss, etc
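A sketch of k-fold cross-validation for estimating the held-out loss of a fitted predictor; the least-squares fit and squared-error loss below are illustrative assumptions standing in for the abstract F, L, and c(f).

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, loss, k=5, seed=0):
    """Estimate out-of-sample error: average the held-out loss over k folds,
    refitting the predictor on the remaining k-1 folds each time."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(np.mean(loss(predict(model, X[test]), y[test])))
    return float(np.mean(errors))

# Assumed example: least-squares fit scored by squared error, noiseless data
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = X @ np.array([1.0, 2.0])
fit = lambda A, b: np.linalg.lstsq(A, b, rcond=None)[0]
predict = lambda w, A: A @ w
cv_error = k_fold_cv(X, y, fit, predict, lambda p, t: (p - t) ** 2)
```

In practice the same routine is run once per candidate regularization level, and the level with the smallest cross-validated error is selected.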
Learning Bayesian Networks
www.biostat.wisc.edu/~page/cs760/
Goals for the lecture
you should understand the following concepts
the Bayesian network representation
inference by enumeration
variable elimination inference
junction tree (clique tree) inf
Decision Tree
MakeSubtree(set of training instances D)
    C = DetermineCandidateSplits(D)
    if stopping criteria met
        make a leaf node N
        determine class label/probabilities for N
    else
        make an internal node N
        S = FindBestSplit(D, C)
        for each outcome k of S
            Dk =
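The MakeSubtree pseudocode above can be sketched as a runnable recursion, assuming binary features, an information-gain FindBestSplit, and majority-vote leaves; these concrete choices are assumptions where the pseudocode leaves them abstract.

```python
import math
from collections import Counter

def entropy(ys):
    """H(Y) over a list of labels."""
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())

def make_subtree(D, features):
    """Top-down tree induction over binary features, mirroring MakeSubtree:
    stop when the node is pure or no features remain, else split on the
    feature with the best information gain."""
    ys = [y for _, y in D]
    if len(set(ys)) == 1 or not features:
        return Counter(ys).most_common(1)[0][0]        # leaf node N
    def gain(f):
        groups = [[y for x, y in D if x[f] == v] for v in (0, 1)]
        return entropy(ys) - sum(len(g) / len(ys) * entropy(g)
                                 for g in groups if g)
    best = max(features, key=gain)                     # S = FindBestSplit(D, C)
    rest = [f for f in features if f != best]
    children = {}
    for v in (0, 1):                                   # each outcome k of S
        Dk = [(x, y) for x, y in D if x[best] == v]
        children[v] = (make_subtree(Dk, rest) if Dk
                       else Counter(ys).most_common(1)[0][0])
    return (best, children)

def tree_predict(tree, x):
    """Walk internal nodes (feature, children) down to a leaf label."""
    while isinstance(tree, tuple):
        f, children = tree
        tree = children[x[f]]
    return tree

# XOR over two binary features is learned exactly
D = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
tree = make_subtree(D, [0, 1])
```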
Review of probability
www.biostat.wisc.edu/~page/cs760/
Goals for the lecture
you should understand the following concepts
definition of probability
random variables
joint distributions
conditional distributions
independence
union rule
Bayes theore
University of Wisconsin Madison
Computer Sciences Department
CS 760 - Machine Learning
Spring 2003
Exam
7:15-9:15pm, May 6, 2003 Room 3345 Engineering Hall CLOSED BOOK (one sheet of notes and a calculator allowed)
Write your answers on these pages and sho
University of Wisconsin-Madison
Computer Sciences Department
CS 760 Machine Learning
Spring 1990
Midterm Exam
(five pages of notes allowed)
100 points, 90 minutes May 1, 1990
Write your answers on these pages and show your work. If you feel that a question
University of Wisconsin Madison
Computer Sciences Department
CS 760 - Machine Learning
Fall 1999
Exam
7:15-9:15pm, December 14, 1999
Room 1240 CS & Stats
CLOSED BOOK
(one sheet of notes and a calculator allowed)
Write your answers on these pages and show
CS 760 - Homework 4
Out: 4/12/10
Due: 4/19/10
50 points
Consider the deterministic reinforcement environment drawn below. The numbers on the arcs
are the immediate rewards. Let the discount rate equal 0.8 and the probability of taking an
exploration step
Solution to CS760 HW 4 (Spring 2010)
1. Initial values: all Q=3.
[Figure: state diagram with states start, a, b, c, d, end and actions on the arcs; every state-action pair is initialized to Q = 3]
i) For the first episode, start -> a -> b -> d -> end, we have the following Q values:
Step 1: start -> a. We have Q(start, L)
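The update being traced is the deterministic Q-learning rule Q(s, a) ← r + γ max_{a'} Q(s', a'), with γ = 0.8 and every Q initialized to 3 as in the solution. A minimal sketch of the episode follows; the arc rewards and action labels below are illustrative assumptions, since the actual values are in the homework figure.

```python
GAMMA = 0.8  # discount rate from the problem statement

def q_update(Q, s, a, r, s_next, actions):
    """Deterministic Q-learning: Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = r + GAMMA * best_next

# States/actions as in the diagram; rewards here are assumptions
states = ["start", "a", "b", "c", "d", "end"]
actions = ["L", "R", "C"]
Q = {(s, a): 3.0 for s in states for a in actions}  # all Q initialized to 3
episode = [("start", "L", 0.0, "a"), ("a", "R", 0.0, "b"),
           ("b", "C", 0.0, "d"), ("d", "R", 10.0, "end")]
for s, a, r, s2 in episode:
    q_update(Q, s, a, r, s2, actions)
```

Note that each early update pulls Q down toward 0.8 × 3 = 2.4 because the successor's Q values are still at their optimistic initial value of 3.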
University of Wisconsin-Madison
Computer Sciences Department
CS 760 Machine Learning
Fall 1998
Exam
(one page of notes and calculators allowed)
100 points, 105 minutes December 10, 1998
Write your answers on these pages and show your work. If you feel tha
University of Wisconsin-Madison
Computer Sciences Department
CS 760 Machine Learning
Fall 1997
Midterm Exam
(one page of notes allowed)
100 points, 90 minutes December 3, 1997
Write your answers on these pages and show your work. If you feel that a questi
University of Wisconsin-Madison
Computer Sciences Department
CS 760 Machine Learning
Spring 1995
Midterm Exam
(two pages of notes allowed)
100 points, 90 minutes May 3, 1995
Write your answers on these pages and show your work. If you feel that a question