Natural Language Processing:
Background and Overview
Regina Barzilay and Michael Collins
EECS/CSAIL
September 8, 2005
Course Logistics
Instructor
Regina Barzilay, Michael Collins
Classes
Tues & Thurs 13:00-14:30
6.864, Fall 2005: Problem Set 5
Total points: 140 regular points
Due date: 5pm, 29 November 2005
Late policy: 5 points off for every day late, 0 points if handed in after 5pm on 3 December 2005
Question 1a (25 points)
Say that we have used IBM model 2 to
6.864: Lecture 9 (October 5th, 2005)
Log-Linear Models
Michael Collins, MIT
The Language Modeling Problem
$w_i$ is the $i$th word in a document
Estimate a distribution $P(w_i \mid w_1, w_2, \ldots, w_{i-1})$ given previous
history $w_1, \ldots, w_{i-1}$.
E.g., w1 , . . . ,
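As a rough illustration of the estimation problem stated above, the sketch below fits a maximum-likelihood bigram model, which approximates the full history by the single previous word. This is a simplifying assumption and not the log-linear approach the lecture goes on to develop; the function name `train_bigram`, the `<s>` start symbol, and the toy corpus are all hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(cur, prev) giving the MLE of P(cur | prev)."""
    context_counts = Counter()  # how often each word appears as a history
    bigram_counts = Counter()   # how often each (prev, cur) pair appears
    for sentence in sentences:
        # "<s>" is an assumed start-of-sentence marker.
        tokens = ["<s>"] + sentence.split()
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(cur, prev):
        # Unseen histories get probability 0 here; a real model would smooth.
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, cur)] / context_counts[prev]

    return prob


# Toy corpus, invented for illustration only.
corpus = ["the dog barks", "the cat sleeps", "the dog sleeps"]
p = train_bigram(corpus)
print(p("dog", "the"))
print(p("barks", "dog"))
```

The unsmoothed estimate assigns zero probability to any unseen pair, which is one reason richer models of the history are of interest.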
Lexical Semantics
Regina Barzilay
MIT
October, 5766
Last Time: Vector-Based Similarity
Measures
man
woman
grape
orange
apple
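Euclidean: $|\vec{x} - \vec{y}| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Cosine: $\cos(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}|\,|\vec{y}|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$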
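The two vector-based measures can be sketched in a few lines of Python. This is a minimal illustration, assuming words are represented as equal-length lists of numbers; the function names and the example vectors are hypothetical and do not come from the lecture.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine_similarity(x, y):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


# Invented context-count vectors, for illustration only.
vectors = {
    "apple": [4.0, 1.0, 0.0],
    "orange": [3.0, 2.0, 0.0],
    "man": [0.0, 1.0, 5.0],
}

for word in ("orange", "man"):
    print(word,
          euclidean_distance(vectors["apple"], vectors[word]),
          cosine_similarity(vectors["apple"], vectors[word]))
```

Cosine depends only on the angle between the vectors, so scaling a vector leaves it unchanged, whereas Euclidean distance is sensitive to vector length.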
Last Ti
Lexical Semantics
Regina Barzilay
MIT
July, 2005
Today: Semantic Similarity
It's not pinin', it's passed on! This parrot is no more!
It has ceased to be! It's expired and gone to meet its
maker! This is a late parrot! It's a stiff! Bereft of life,
it rests in
6.864: Lecture 6 (September 27th, 2005)
The EM Algorithm Part II
Hidden Markov Models
A hidden Markov model $(N, \Sigma, \Theta)$ consists of the following
elements:
N is a positive integer specifying the number of states in the
model. Without loss of generality, we w
6.864: Lecture 5 (September 22nd, 2005)
The EM Algorithm
Overview
The EM algorithm in general form
The EM algorithm for hidden Markov models (brute force)
The EM algorithm for hidden Markov models (dynamic
programming)
An Experiment/Some Intuition
I h
6.891: Lecture 4 (September 20, 2005)
Parsing and Syntax II
Overview
Weaknesses of PCFGs
Heads in context-free rules
Dependency representations of parse trees
Two models making use of dependencies
Weaknesses of PCFGs
Lack of sensitivity to lexical in
6.864: Lecture 2, Fall 2005
Parsing and Syntax I
Overview
An introduction to the parsing problem
Context free grammars
A brief(!) sketch of the syntax of English
Examples of ambiguous structures
PCFGs, their formal properties, and useful algorithms
6.864, Fall 2005: Problem Set 6
Total points: 140 regular points
Due date: 5pm, 8 December 2005
Late policy: 5 points off for every day late, 0 points if handed in after 5pm on 12 December 2005
Question 1 (25 points) Figure 1 shows the perceptron algorith
6.864, Fall 2005: Problem Set 4
Total points: 160 regular points
Due date: 5pm, 15 November 2005
Late policy: 5 points off for every day late, 0 points if handed in after 5pm on 19 November 2005
Question 1 (15 points)
Describe an algorithm for hierarchica
6.864, Fall 2005: Problem Set 3
Total points: 110 regular points
Due date: 5 pm, 1st November 2005
Late policy: 5 points off for every day late, 0 points if handed in after 5pm on November 4th 2005
Question 1 (15 points)
Clarissa Linguistica decides to bu
6.864, Fall 2005: Problem Set 2
Total points: 160 regular points
Due date: 5pm, 18th October 2005
Late policy: 5 points off for every day late, 0 points if handed in after 5pm on October 22nd 2005
Question 1 (15 points)
In the absolute discounting model o
6.864, Fall 2005: Problem Set 1
Total points: 90 regular points, 10 bonus points
Due date: 5pm, 29th September 2005
Late policy: 5 points off for every day late, 0 points if handed in after 1pm on October 4th 2005
Question 1 (20 points)
A probabilistic co