CSE 250B: Machine learning
Fall 2016
Worksheet 6 Algorithms for regression and classification
This worksheet covers the following topics:
Least-squares and regularized least-squares regression.
Isotonic regression.
Variants of the perceptron algorithm.
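As a warm-up for the first topic, here is a minimal NumPy sketch of regularized least-squares regression; the function name ridge_fit and the toy data are ours, not the worksheet's:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form regularized least squares: w = (X^T X + lam I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Toy one-feature data lying exactly on the line y = 2x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = ridge_fit(X, y, lam=0.0)   # lam = 0 recovers ordinary least squares
```

With lam > 0 the solution shrinks toward zero, which is the regularized variant the topic list refers to.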
Natural Language Processing (2)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Outline
Lexicons and Lexical Analysis
Lexicon: A Language Resource
A Lexicon for English Words: WordNet
Lexic
CSE 250a. Assignment 8
Out: Tue Nov 15
Due: Tue Nov 22 in class
Reading: Sutton & Barto, Chapters 1-4.
8.1
Policy improvement
Consider the Markov decision process (MDP) with three states s ∈ {1, 2, 3}, two actions, discount factor γ = 3/4, and the s
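The worksheet's transition and reward structure is cut off above, so the sketch below uses made-up dynamics purely to illustrate policy evaluation and greedy policy improvement with γ = 3/4:

```python
import numpy as np

gamma = 3 / 4
n_states = 3

# Hypothetical dynamics (NOT the worksheet's): P[a, s, s'] and rewards R[s]
P = np.array([
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],  # action 0
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # action 1
])
R = np.array([0.0, 1.0, 2.0])

def evaluate(policy):
    # Solve the linear system V = R + gamma * P_pi V for a deterministic policy
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)

def improve(V):
    # Greedy policy improvement: pick the action maximizing one-step lookahead
    Q = R[:, None] + gamma * np.einsum('ast,t->sa', P, V)
    return np.argmax(Q, axis=1)

V0 = evaluate(np.array([0, 0, 0]))
pi1 = improve(V0)
```

By the policy improvement theorem, the value of pi1 is at least V0 in every state.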
Informative projection
Suppose we wanted just one feature for the following data.
Informative projections
We could pick a single coordinate.
Or an arbitrary direction.
A good choice: the direction of maximum variance.
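That choice can be computed as the top eigenvector of the data's covariance matrix; a small NumPy sketch (the function name and toy data are ours):

```python
import numpy as np

def max_variance_direction(X):
    # The direction of maximum variance is the top eigenvector of the covariance
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]          # eigh returns eigenvalues in ascending order

# Points spread mostly along the first coordinate
X = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]])
u = max_variance_direction(X)
```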
Two types of projection
Pr
Beyond projections
PCA and SVD find informative linear projections. Given a data set in R^p,
and a number k < p, they:
Find orthogonal directions u_1, . . . , u_k ∈ R^p
Approximate points in R^p by their projection into the subspace
spanned by these directions
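A minimal sketch of that recipe using NumPy's SVD (the function name pca_project is ours):

```python
import numpy as np

def pca_project(X, k):
    # The top-k right singular vectors of the centered data give the
    # orthogonal directions u_1, ..., u_k spanning the best-fit subspace
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Uk = Vt[:k]                                # k x p, orthonormal rows
    return Xc @ Uk.T @ Uk + mean               # points projected back into R^p

X = np.random.RandomState(0).randn(50, 5)
Xhat = pca_project(X, k=2)
```

The projected points live in R^p but span only a k-dimensional affine subspace.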
Multiclass classification
Richer output spaces
We have mostly discussed binary classification problems, with |Y| = 2.
Do the methods we've studied generalize to cases with k > 2 labels?
Nearest neighbor?
Generative models?
Linear classifiers?
Recall: Perceptron
Input space X = R^p, label space Y = {−1, +1}
w = 0
while some (x, y) is misclassified:
    w = w + y x
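The update loop above, written out as a runnable NumPy sketch (the cycling scan over the data and the toy data set are our choices):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    # X: n x p data matrix, y: labels in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:      # misclassified (or on the boundary)
                w += yi * xi
                mistakes += 1
        if mistakes == 0:
            return w                    # all points correctly classified
    return w

# Linearly separable toy data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```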
Kernels
[Figure: eight labeled data points (1–8), with classes +1 and −1]
Separator: w = x^(1) + x^(6)
Deviations from linear separability
Systematic inseparability
In t
Matrix-vector notation
Some linear algebra background
Vector x ∈ R^p and matrix M ∈ R^{r×p}:

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_p \end{pmatrix}, \quad
M = \begin{pmatrix} M_{11} & M_{12} & \cdots & M_{1p} \\ M_{21} & M_{22} & \cdots & M_{2p} \\ \vdots & \vdots & & \vdots \\ M_{r1} & M_{r2} & \cdots & M_{rp} \end{pmatrix}

Transpose x^T and M^T ∈ R^{p×r}:

x^T = \begin{pmatrix} x_1 & x_2 & \cdots & x_p \end{pmatrix}, \quad
M^T = \begin{pmatrix} M_{11} & M_{21} & \cdots & M_{r1} \\ M_{12} & M_{22} & \cdots & M_{r2} \\ \vdots & \vdots & & \vdots \\ M_{1p} & M_{2p} & \cdots & M_{rp} \end{pmatrix}
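The transpose rule (M^T)_{ij} = M_{ji} can be checked directly in NumPy (a toy illustration, not part of the notes):

```python
import numpy as np

M = np.arange(6).reshape(2, 3)    # r = 2 rows, p = 3 columns
MT = M.T                          # shape becomes (3, 2)

# Transposing swaps the roles of rows and columns
assert MT.shape == (3, 2)
assert MT[2, 1] == M[1, 2]

x = np.array([1.0, 2.0, 3.0])
xTx = x @ x                       # the dot product x^T x
```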
Natural Language Processing (3)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Outline
Lexicons and Lexical Analysis
Generative Lexicon (1)
Lexicons and Lexical Analysis (31)
Generative Le
Natural Language Processing (1)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Course Info 1/2
Curriculum Venue and Timeslot
Time: the 3rd and 4th class periods, Tuesday morning,
The 7th and 8th cla
Natural Language Processing
Syntactic Parsing
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Revised from
Roxana Girju (University of Illinois at Urbana-Champaign)
Overview
An introduction to
Natural Language Processing (5)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Outline
Lexicons and Lexical Analysis
Collocation
Hypothesis Testing
T Test
Mutual Information
Lexicons and L
Opinion Mining on YouTube
Aliaksei Severyn (1), Alessandro Moschitti (3,1), Olga Uryupina (1), Barbara Plank (2), Katja Filippova (4)
(1) DISI - University of Trento, (2) CLT - University of Copenhagen,
(3) Qatar Computing Research Institute, (4) Google Inc.
severyn@disi.un
Weakly Supervised User Profile Extraction from Twitter
Jiwei Li (1), Alan Ritter (2), Eduard Hovy (1)
(1) Language Technology Institute, (2) Machine Learning Department
Carnegie Mellon University, Pittsburgh, PA 15213, USA
bdlijiwei@gmail.com, rittera@cs.cmu.edu, eh
Linguistic Regularities in Continuous Space
Word Representations
Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig
Microsoft Research, Redmond
NAACL 2013
Abstract
Neural-network language models and distributed word representations (vector representations).
Ca
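The paper's headline finding is that analogies like "man is to king as woman is to ?" can be solved by vector offsets. A toy sketch with hand-picked 3-d vectors (real models learn these from large corpora; the vectors here are fabricated for illustration):

```python
import numpy as np

# Hand-crafted toy "word vectors" with a gender offset built in
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.1]),
    "queen": np.array([0.1, 0.8, 0.1]),
}

def analogy(a, b, c):
    # "a is to b as c is to ?" via the vector offset v_b - v_a + v_c,
    # scored by cosine similarity over the remaining vocabulary
    target = vecs[b] - vecs[a] + vecs[c]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates,
               key=lambda w: (candidates[w] @ target) /
                             (np.linalg.norm(candidates[w]) * np.linalg.norm(target)))

result = analogy("man", "king", "woman")
```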
Computational Linguistics:
the Literature Resources
Zhao Hai
Department of Computer Science and Engineering,
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
The Research
Community or System
Publish or Perish
Publishing Impact: the index to evalu
Natural Language Processing (4)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Outline
Lexicons and Lexical Analysis
Finite State Models and Morphological Analysis
Collocation
Lexicons and
The decision boundary
More linear classification
The decision boundary in R^p is a hyperplane.
How is this boundary parametrized?
How can we learn a hyperplane from training data?
Hyperplanes
Homogeneous linear separators
Hyperplanes that pass through the origin.
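Parametrizing a hyperplane by a normal vector w and an offset b, classification is just the sign of w·x + b; a homogeneous separator is the special case b = 0. A toy sketch:

```python
import numpy as np

def predict(w, b, x):
    # The hyperplane {x : w.x + b = 0} splits R^p into two half-spaces
    return 1 if w @ x + b > 0 else -1

w, b = np.array([1.0, -1.0]), 0.5
pred_pos = predict(w, b, np.array([2.0, 0.0]))   # w.x + b = 2.5
pred_neg = predict(w, b, np.array([0.0, 2.0]))   # w.x + b = -1.5
```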
Nearest neighbor classification
Given a labeled training set (x (1) , y (1) ), . . . , (x (n) , y (n) ).
Example: the MNIST data set of handwritten digits.
Nearest neighbor classification
To classify a new instance x:
Find its nearest neighbor among the training points and return that point's label.
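The rule above is a few lines of NumPy (Euclidean distance is our assumed metric; a data set like MNIST would plug in as the training set):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    # Return the label of the closest training point under Euclidean distance
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [10.0, 10.0]])
y_train = np.array([0, 1])
label = nn_classify(X_train, y_train, np.array([1.0, 1.0]))
```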
CSE 250B: Machine learning
Fall 2016
Homework 3 Coordinate descent
Overview
In this homework we consider a standard unconstrained optimization problem:
min_w L(w)
where L(·) is some cost function and w ∈ R^p. In class, we looked at several approaches to solving this problem.
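Coordinate descent, the subject of this homework, updates one coordinate of w at a time; a minimal sketch on a toy quadratic (the step size and cycling order are our choices, not prescribed by the homework):

```python
import numpy as np

def coordinate_descent(grad, w0, lr=0.1, iters=500):
    # Cycle through coordinates, taking a gradient step in one coordinate at a time
    w = w0.copy()
    for t in range(iters):
        j = t % len(w)
        w[j] -= lr * grad(w)[j]
    return w

# L(w) = (w_1 - 1)^2 + (w_2 + 2)^2 has its minimum at (1, -2)
grad = lambda w: 2 * (w - np.array([1.0, -2.0]))
w = coordinate_descent(grad, np.zeros(2))
```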
CSE 250B: Machine learning
Fall 2016
Homework 2 Sparse generative models
Overview
The multinomial naive Bayes model is a quick-and-dirty way to do text classification. In some situations, it
would be helpful to have a sparse version of this model, that is,
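As a reference point, the (non-sparse) multinomial naive Bayes model can be sketched in a few lines of NumPy; the toy corpus and the Laplace smoothing constant are ours:

```python
import numpy as np

def train_nb(counts, labels, alpha=1.0):
    # counts: n x V word-count matrix; Laplace smoothing with pseudocount alpha
    classes = np.unique(labels)
    logprior = np.log(np.array([(labels == c).mean() for c in classes]))
    loglik = []
    for c in classes:
        total = counts[labels == c].sum(axis=0) + alpha
        loglik.append(np.log(total / total.sum()))
    return classes, logprior, np.array(loglik)

def predict_nb(model, x):
    classes, logprior, loglik = model
    return classes[np.argmax(logprior + loglik @ x)]

# Toy corpus over a 3-word vocabulary: class 0 favors word 0, class 1 word 2
counts = np.array([[5, 1, 0], [4, 2, 0], [0, 1, 5], [0, 2, 4]])
labels = np.array([0, 0, 1, 1])
model = train_nb(counts, labels)
```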
CSE 250B: Machine learning
Fall 2016
Homework 1 Prototype selection for nearest neighbor
One way to speed up nearest neighbor classification is to replace the training set by a carefully chosen
subset of prototypes.
Think of a good strategy for choosing prototypes
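A natural baseline to compare any cleverer strategy against is uniform random sampling per class; a sketch (the function name and sample sizes are ours):

```python
import numpy as np

def select_prototypes(X, y, m, seed=0):
    # Baseline strategy: a uniform random sample of up to m points per class
    rng = np.random.RandomState(seed)
    Xs, ys = [], []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        pick = rng.choice(idx, size=min(m, len(idx)), replace=False)
        Xs.append(X[pick])
        ys.append(y[pick])
    return np.vstack(Xs), np.concatenate(ys)

X = np.random.RandomState(1).randn(100, 2)
y = (X[:, 0] > 0).astype(int)
Xp, yp = select_prototypes(X, y, m=10)
```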
CSE 250B: Machine learning
Fall 2016
Worksheet 4 Unconstrained convex optimization
This worksheet covers the following topics:
Decision boundaries for Gaussian generative models.
Convex functions and sets.
Deriving gradient descent and Newton-Raphson u
CSE 250B: Worksheet 2 Solutions
1. Bayes optimality in a multi-class setting. The Bayes-optimal classifier predicts the label that is most
likely:
h^*(x) = \arg\max_{1 \le i \le |Y|} \pi_i(x)
2. Classification with an abstain option. The classifier predicts the most likely label only when its probability exceeds θ:
h(x) = \arg\max_i \pi_i(x) if \max_i \pi_i(x) > θ, and abstain otherwise.
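The abstain rule can be written directly as code (a sketch; probs stands in for the vector of class probabilities π_i(x)):

```python
import numpy as np

def classify_with_abstain(probs, theta):
    # Predict the argmax label only if its probability exceeds theta
    i = int(np.argmax(probs))
    return i if probs[i] > theta else "abstain"

confident = classify_with_abstain(np.array([0.7, 0.2, 0.1]), theta=0.5)
unsure = classify_with_abstain(np.array([0.4, 0.35, 0.25]), theta=0.5)
```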
CSE 250B: Worksheet 4 Solutions
1. Linear decision boundary. The positive side of the boundary is shaded.
2. Decision boundaries for Gaussian classes.
(a) \mu_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} and \Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} -1 \\ 0 \end{pmatrix} and \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
The equation for this boundary is x_1 = 0.
1/4 0
CSE 250B: Machine learning
Fall 2016
Worksheet 5 Convex optimization
This worksheet covers the following topics:
Norms.
Perceptron and SVM.
Writing an optimization problem as a convex program.
1. Norms. In class, we talked about ℓ_p norms on R^p, which
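For concreteness, the three most common ℓ_p norms evaluated on one vector (a toy check, not part of the worksheet):

```python
import numpy as np

x = np.array([3.0, -4.0])
l1 = np.abs(x).sum()          # l_1 norm: sum of absolute values
l2 = np.sqrt((x ** 2).sum())  # l_2 (Euclidean) norm
linf = np.abs(x).max()        # l_infinity norm: largest magnitude
```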
CSE 250B: Machine learning
Fall 2016
Worksheet 3 The multivariate Gaussian
This worksheet covers the following topics:
Basics of linear algebra: dot products, orthogonality, positive semidefiniteness, eigenvalues and eigenvectors.
The multivariate Gaussian
Natural Language Processing (6)
Zhao Hai
Department of Computer Science and Engineering
Shanghai Jiao Tong University
zhaohai@cs.sjtu.edu.cn
Revised from
Joshua Goodman (Microsoft Research) and
Michael Collins (MIT)
Outline
(Statistical) Language Model