6.867 Machine learning, lecture 7 (Jaakkola)
Lecture topics:
Kernel form of linear regression
Kernels, examples, construction, properties
Linear regression and kernels
Consider a slightly simpler model where we omit the offset parameter $\theta_0$, reducing the
6.867 Machine learning, lecture 9 (Jaakkola)
Lecture topics:
Kernel optimization
Model (kernel) selection
Kernel optimization
Whether we are interested in (linear) classification or regression, we are faced with the
problem of selecting an appropriate ke
6.867 Machine learning, lecture 14 (Jaakkola)
Lecture topics:
margin and generalization
linear classifiers
ensembles
mixture models
Margin and generalization: linear classifiers
As we increase the number of data points, any set of classifiers we are consi
6.867 Machine learning, lecture 13 (Jaakkola)
Lecture topics:
Boosting, margin, and gradient descent
complexity of classifiers, generalization
Boosting
Last time we arrived at a boosting algorithm for sequentially creating an ensemble of
base classifiers.
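As an illustration of building an ensemble sequentially, here is a minimal sketch. The excerpt does not show which boosting variant the lecture derived, so AdaBoost with decision stumps is assumed here purely for illustration; the names `fit_stump`, `adaboost`, and `ensemble_predict` are hypothetical.

```python
import math

def stump_predict(stump, x):
    """Decision stump: predicts `polarity` if x[feature] > threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the smallest weighted training error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost(X, y, rounds=5):
    """Assumed variant: AdaBoost. Labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this base classifier
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    """Sign of the weighted vote; a score of exactly 0 maps to -1 by convention."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
predictions = [ensemble_predict(ensemble, x) for x in X]
```

Each round adds one base classifier fitted to the reweighted training set, which is what makes the construction sequential rather than parallel.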
6.867 Machine learning, lecture 16 (Jaakkola)
Lecture topics:
Mixture of Gaussians (contd)
The EM algorithm: some theory
Additional mixture topics
regularization
stage-wise mixtures
conditional mixtures
Mixture models and clustering
Mixture of Ga
6.867 Machine learning, lecture 17 (Jaakkola)
Lecture topics:
Mixture models and clustering, k-means
Distance and clustering
Mixture models and clustering
We have so far used mixture models as flexible ways of constructing probability models for
predict
6.867 Machine learning, lecture 8 (Jaakkola)
Lecture topics:
Support vector machine and kernels
Kernel optimization, selection
Support vector machine revisited
Our task here is to first turn the support vector machine into its dual form where the exam
p
6.867 Machine learning, lecture 6 (Jaakkola)
Lecture topics:
Active learning
Non-linear predictions, kernels
Active learning
We can use the expressions for the mean squared error to actively select input points
$x_1, \ldots, x_n$, when possible, so as t
6.867 Machine learning, lecture 5 (Jaakkola)
Linear regression, active learning
We arrived at the logistic regression model when trying to explicitly model the uncertainty
about the labels in a linear classifier. The same general modeling approach permits
6.867 Machine learning, lecture 20 (Jaakkola)
Lecture topics:
Hidden Markov Models (contd)
Hidden Markov Models (contd)
We will continue here with the three problems outlined previously. Consider having given
a set of sequences of observations y1 , . .
6.867 Machine learning, lecture 19 (Jaakkola)
Lecture topics:
Markov chains (contd)
Hidden Markov Models
Markov chains (contd)
In the context of spectral clustering (last lecture) we discussed a random walk over the
nodes induced by a weighted graph.
6.867 Machine learning, lecture 23 (Jaakkola)
Lecture topics:
Markov Random Fields
Probabilistic inference
Markov Random Fields
We will briefly go over undirected graphical models or Markov Random Fields (MRFs) as
they will be needed in the context of p
6.867 Machine learning, lecture 22 (Jaakkola)
Lecture topics:
Learning Bayesian networks from data
maximum likelihood, BIC
Bayesian, marginal likelihood
Learning Bayesian networks
There are two problems we have to solve in order to estimate Bayesian
6.867 Machine learning, lecture 1 (Jaakkola)
Example
Let's start with an example. Suppose we are charged with providing automated access
control to a building. Before entering the building each person has to look into a camera so
we can take a still imag
6.867 Machine learning, lecture 2 (Jaakkola)
Perceptron, convergence, and generalization
Recall that we are dealing with linear classifiers through origin, i.e.,
$f(x; \theta) = \operatorname{sign}(\theta^T x)$    (1)
where $\theta \in \mathbb{R}^d$ specifies the parameters that we have to estimate on the basis
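Equation (1) can be read as a simple prediction rule. The sketch below is only an illustration: the function names are hypothetical, and mapping a zero inner product to -1 is an assumed convention that the excerpt does not specify.

```python
def sign(z):
    """Return +1 for positive z and -1 otherwise (z == 0 -> -1 is an assumption)."""
    return 1 if z > 0 else -1

def predict(theta, x):
    """Linear classifier through the origin: sign of the inner product theta^T x."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same dimension d")
    return sign(sum(t * xi for t, xi in zip(theta, x)))

theta = [1.0, -2.0]                    # example parameter vector in R^2
label_a = predict(theta, [3.0, 1.0])   # inner product is 3 - 2 = 1
label_b = predict(theta, [1.0, 1.0])   # inner product is 1 - 2 = -1
```

Because there is no offset term, the decision boundary $\theta^T x = 0$ always passes through the origin.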
6.867 Machine learning, lecture 21 (Jaakkola)
Lecture topics:
Bayesian networks
Bayesian networks
Bayesian networks are useful for representing and using probabilistic information. There
are two parts to any Bayesian network model: 1) directed graph over
6.867 Machine learning, lecture 3 (Jaakkola)
The Support Vector Machine
So far we have used a reference assumption that there exists a linear classifier that has
a large geometric margin, i.e., whose decision boundary is well separated from all the
traini
6.867 Machine learning, lecture 4 (Jaakkola)
The Support Vector Machine and regularization
We proposed a simple relaxed optimization problem for finding the maximum margin
separator when some of the examples may be misclassified:
minimize $\frac{1}{2}\|\theta\|^2 + C \sum_{t=1}^{n} \xi_t$
6.867 Machine learning, lecture 18 (Jaakkola)
Lecture topics:
Spectral clustering, random walks and Markov chains
Spectral clustering
Spectral clustering refers to a class of clustering methods that approximate the problem
of partitioning nodes in a we