Question 1
We would like to ensure that for all t, u, v:
    Σ_w q_BO(w | t, u, v) = 1
Note that the missing probability mass is
    1 − Σ_{w ∈ A(t,u,v)} c(t, u, v, w) / c(t, u, v)
If we set
    α(t, u, v) = 1 − Σ_{w ∈ A(t,u,v)} c(t, u, v, w) / c(t, u, v)
then Σ_w q_BO(w | t, u, v) = 1.
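As a sanity check, the missing-mass computation can be sketched in a few lines of Python; the count dictionaries and the trigram context below are made-up toy values, not part of the original question:

```python
def missing_mass(counts4, counts3, t, u, v):
    """alpha(t,u,v) = 1 - sum over observed words w of c(t,u,v,w)/c(t,u,v)."""
    seen = counts4.get((t, u, v), {})   # words w with c(t,u,v,w) > 0
    total = counts3[(t, u, v)]          # c(t,u,v)
    return 1.0 - sum(c / total for c in seen.values())

# Toy counts (hypothetical): c(t,u,v,w) and c(t,u,v)
counts4 = {("the", "dog", "barks"): {"loudly": 2, "today": 1}}
counts3 = {("the", "dog", "barks"): 4}

alpha = missing_mass(counts4, counts3, "the", "dog", "barks")
# 1 - (2/4 + 1/4) = 0.25
```

The leftover mass α(t, u, v) is what the back-off definition redistributes over the words not seen in this context.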

Question 1a
One parse tree:

(S (NP (DT the) (NN man))
   (VP (VB saw)
       (NP (NP (DT the) (NN dog))
           (PP (IN in) (NP (DT the) (NN park))))))
Question 1b
Two parse trees; parse tree 1:

(S (NP (DT the) (NN man))
   (VP (VB saw)
       (NP (NP (DT the) (NN dog))
           (PP (IN in)
               (NP (NP (DT the) (NN park))
                   (PP (IN with) (NP (DT the) (NN cat))))))))
Q

Questions for Flipped Classroom Session of COMS 4705
Week 2, Fall 2014. (Michael Collins)
Question 1 In lecture we saw how to build trigram language models using discounting methods and the Katz back-off definition. We're now going to build a
four-gram la

Questions for Flipped Classroom Session of COMS 4705
Week 3, Fall 2014. (Michael Collins)
Question 1 Consider a trigram HMM tagger with:
The set K of possible tags equal to {D, N, V}
The set V of possible words equal to {the, dog, barks}
The follow

Questions for Flipped Classroom Session of COMS 4705
Week 1, Fall 2014. (Michael Collins)
Question 1
We'd like to define a language model with V = {the, a, dog}, and p(x1 . . . xn) =
0.5^n for any x1 . . . xn such that xi ∈ V for i = 1 . . . (n − 1), and

Questions for Flipped Classroom Session of COMS 4705
Week 5, Fall 2014. (Michael Collins)
Question 1 In this question our goal is to design an algorithm that takes a sentence s and a context-free grammar in Chomsky normal form as input, and as its
output

Final for COMS W4705
Name:
Points per part: 30 / 15 / 15 / 30 / 15 / 15 / 30 / 20
COMS W4705 Final
page 1 of 19
Part #1
30 points
Consider a very simple bigram language model, where the vocabulary consists
of the single word a, and the parameters of the model are
q(a|*) = 1.0
q(a|a) =

Question 1a
Set the following translation parameters equal to 1 (all other
translation parameters are 0): t(aate|ate), t(athe|the),
t(adog|dog), t(acat|cat), t(abanana|banana)
Set the following alignment parameters equal to 1 (all others
are zero):
q(3|1,

COMS W4705, Spring 2015: Problem Set 2
Total points: 140
Analytic Problems (due March 2nd)
Question 1 (20 points)
A probabilistic context-free grammar G = (N, Σ, R, S, q) in Chomsky Normal Form is defined as follows:
N is a set of non-terminal symbols (e.

COMS W4705, Spring 2015: Problem Set 4
Total points: 145
Analytic Problems (due April 27th at 5pm)
Question 1 (20 points)
Clarissa Linguistica decides to build a log-linear model for language modeling. She has a training sample
(x_i, y_i) for i = 1 . . .

Questions for Flipped Classroom Session of COMS 4705
Week 13, Fall 2014. (Michael Collins)
Question 1 Consider an application of global linear models to dependency parsing. In this scenario each input x is a sentence. GEN(x) returns the set of all
depende

Question 1

(S(dog) (NP(man) (D the) (N man))
        (VP(dog) (V saw)
                 (NP(dog) (NP(dog) (D the) (N dog))
                          (PP(telescope) (P with)
                                         (NP(telescope) (D the) (N telescope))))))
Question 2

(S(likes) (NP(Bob) Bob)
          (VP(likes) (VB(likes) likes)
                     (NP(parks) (NP(parks) parks)
                                (PP(in) (IN(in) in)
                                        (NP(Paris) Paris)))))
NP-

Question 1
g1(x, h, m) = 1 if x_h = car and x_m = the
g2(x, h, m) = 1 if POS(h) = NN and POS(m) = DT
    and POS(i) ≠ VB for i ∈ {(h + 1) . . . (m − 1)}
    and i ∈ {(m + 1) . . . (h − 1)}
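A minimal Python sketch of these two indicator features, assuming a 1-indexed sentence x with a parallel POS list (the example sentence and all names here are illustrative, not from the original):

```python
def g1(x, pos, h, m):
    # 1 if the head word is "car" and the modifier word is "the"
    return 1 if x[h] == "car" and x[m] == "the" else 0

def g2(x, pos, h, m):
    # 1 if POS(h) = NN, POS(m) = DT, and no VB occurs strictly
    # between the head and the modifier (in either direction)
    if pos[h] != "NN" or pos[m] != "DT":
        return 0
    lo, hi = min(h, m), max(h, m)
    if any(pos[i] == "VB" for i in range(lo + 1, hi)):
        return 0
    return 1

x   = [None, "the", "car", "stops"]   # index 0 unused: 1-indexed words
pos = [None, "DT", "NN", "VB"]
# g1(x, pos, 2, 1) -> 1 ; g2(x, pos, 2, 1) -> 1
```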
Question 2a
[Figure omitted: a grid of tag labels C and I with associated numbers (0.2, 3.2, 1.1, 1.2, 2.1, 2.2); the original layout is not recoverable from the extraction.]

Questions for Flipped Classroom Session of COMS 4705
Week 9, Fall 2014. (Michael Collins)
Definition of consistent(A, (s, t), (s′, t′)):
(Recall that A is an alignment matrix with A_{i,j} = 1 if French word i is aligned to English
word j. (s, t) represents

Question 1
g1(r) = 1 if r = S → NP VP, 0 otherwise
g2(r) = 1 if r = N → dog, 0 otherwise
g3(r) = 1 if r = NP → NP NP, 0 otherwise
Question 2a
Input: a sentence s = x1 . . . xn, a PCFG G = (N, Σ, S, R, q).
Initialization:
For all i ∈ {1 . . . n}, for all X

Question 1
The only tag sequence y1 . . . yn+1 for which p(y1 . . . yn+1) > 0
is D N V STOP. Thus the only sequences that satisfy the
conditions are:
    the dog dog, D N V STOP
    the barks dog, D N V STOP
    the dog barks, D N V STOP
    the barks barks, D N V STOP
Q

Questions for Flipped Classroom Session of COMS 4705
Week 11, Fall 2014. (Michael Collins)
Question 1 In this question we consider the problem of mapping a sentence to
an underlying sequence of tags, using a log-linear tagger. The input to the tagger
is a

Questions for Flipped Classroom Session of COMS 4705
Week 6, Fall 2014. (Michael Collins)
Question 1 Consider the following parse tree:
(S (NP (D the) (N man))
   (VP (V saw)
       (NP (NP (D the) (N dog))
           (PP (P with) (NP (D the) (N telescope))))))

Now assume that we add head-words to the non-terminals

Question 1
Input: a sentence s = x1 . . . xn, a context-free grammar
G = (N, Σ, S, R).
Initialization:
For all i ∈ {1 . . . n}, for all X ∈ N,
    π(i, i, X) = 1 if X → xi ∈ R, 0 otherwise
Algorithm:
    For l = 1 . . . (n − 1)
        For i = 1 . . . (n − l)
            Set j = i + l
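The pseudocode above can be fleshed out as a boolean CKY recognizer; the grammar encoding (sets of unary and binary rules) is an assumption made for illustration:

```python
def cky_recognize(sentence, N, S, unary, binary):
    """Boolean CKY: chart[(i, j, X)] is True iff non-terminal X
    derives words i..j (1-indexed, inclusive).
    unary:  set of (X, word) rules   X -> word
    binary: set of (X, Y, Z) rules   X -> Y Z
    """
    n = len(sentence)
    chart = {}
    # Initialization: pi(i, i, X) = 1 if X -> x_i is a rule
    for i in range(1, n + 1):
        for X in N:
            chart[(i, i, X)] = (X, sentence[i - 1]) in unary
    # Main loop: span lengths l = 1..n-1, start points i = 1..n-l
    for l in range(1, n):
        for i in range(1, n - l + 1):
            j = i + l
            for X in N:
                chart[(i, j, X)] = any(
                    chart[(i, s, Y)] and chart[(s + 1, j, Z)]
                    for (W, Y, Z) in binary if W == X
                    for s in range(i, j)
                )
    return chart[(1, n, S)]

# Toy grammar (hypothetical): S -> A B, A -> a, B -> b
N = {"S", "A", "B"}
unary = {("A", "a"), ("B", "b")}
binary = {("S", "A", "B")}
# cky_recognize(["a", "b"], N, "S", unary, binary) -> True
```

Replacing `any(...)` with a sum of rule probabilities times the two sub-span values gives the probabilistic (PCFG) variant.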

Questions for Flipped Classroom Session of COMS 4705
Week 12, Fall 2014. (Michael Collins)
Question 1 Consider an application of global linear models to parsing. In this
scenario each input x is a sentence. We have a fixed context-free grammar; GEN(x)
ret

The Forward-Backward Algorithm
Michael Collins
1
Introduction
This note describes the forward-backward algorithm. The forward-backward algorithm has very important applications to both hidden Markov models (HMMs) and
conditional random fields (CRFs). It i

We would like
    Σ_{n=1}^{∞} Σ_{w1 . . . wn} p(w1 . . . wn) = Σ_{n=1}^{∞} Σ_{w1 . . . wn} g(w1 . . . wn, n) · 0.5^n = 1
where we can choose the function g(w1 . . . wn, n).
Note that we have 3 words in the vocabulary V, so there are 3^(n−1)
sequences of the form w1 . . . wn. If we set
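One choice of g that makes the sum equal 1 (an illustration; the original answer may differ) is g(w1 . . . wn, n) = (1/3)^(n−1): each length-n term then contributes 3^(n−1) · (1/3)^(n−1) · 0.5^n = 0.5^n, and Σ_{n≥1} 0.5^n = 1. A quick numeric check:

```python
def partial_sum(N):
    # Sum over lengths n = 1..N of (#sequences of length n) * g(n) * 0.5**n
    total = 0.0
    for n in range(1, N + 1):
        num_sequences = 3 ** (n - 1)    # 3 vocabulary words, final symbol fixed
        g = (1.0 / 3.0) ** (n - 1)      # chosen g, depending only on n
        total += num_sequences * g * 0.5 ** n
    return total

# partial_sum(50) -> 1.0 up to floating point (geometric series sums to 1)
```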

Questions for Flipped Classroom Session of COMS 4705
Week 10, Fall 2014. (Michael Collins)
Question 1 This question considers log-linear models. We'd like to build a model
that estimates a distribution p(tag|word) using a log-linear model. The variable
tag

Questions for Flipped Classroom Session of COMS 4705
Week 8, Fall 2014. (Michael Collins)
Question 1 This question concerns training IBM Model 2 for statistical machine
translation. Assume that we have a bilingual corpus of English sentences e paired
with

The Inside-Outside Algorithm
Michael Collins
1
Introduction
This note describes the inside-outside algorithm. The inside-outside algorithm has
very important applications to statistical models based on context-free grammars.
In particular, it is used in E

COMS W4705, Spring 2015: Problem Set 1
Total points: 100
Due date: Analytical problems (questions 1–3) due Monday 9th February, 5pm. Programming problems due 16th February, 5pm.
Late policy: 5 points off for every day late, 0 points if handed in after 5

Questions for Flipped Classroom Session of COMS 4705
Week 4, Fall 2014. (Michael Collins)
Question 1 Consider a context-free grammar with the following rules (assume
that S is the start symbol):
S → NP VP
NP → DT NN
NP → NP PP
PP → IN NP
VP → VB NP
DT → the
NN → man
NN →

Question 1a
f1(word, tag) = 1 if word = the and tag = D, 0 otherwise
f2(word, tag) = 1 if word = dog and tag = N, 0 otherwise
f3(word, tag) = 1 if word = sleeps and tag = V, 0 otherwise
f4(word, tag) = 1 if word ∉ {the, dog, sleeps} and tag = D, 0
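These indicator features translate directly into Python; this is a sketch under the assumption that tags are the strings "D", "N", "V", and the `features` helper is added for illustration:

```python
VOCAB = {"the", "dog", "sleeps"}

def f1(word, tag): return 1 if word == "the" and tag == "D" else 0
def f2(word, tag): return 1 if word == "dog" and tag == "N" else 0
def f3(word, tag): return 1 if word == "sleeps" and tag == "V" else 0
def f4(word, tag): return 1 if word not in VOCAB and tag == "D" else 0

def features(word, tag):
    """Feature vector for a (word, tag) pair."""
    return [f(word, tag) for f in (f1, f2, f3, f4)]

# features("the", "D") -> [1, 0, 0, 0]
# features("cat", "D") -> [0, 0, 0, 1]   (unseen word tagged D fires f4)
```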

COMS 4705, Spring 2015: Problem Set 3
Total points: 140
Analytic Problems (due April 3rd at 5pm)
Question 1 (25 points)
Say that we have used IBM Model 2 to estimate a model of the form
    p(f, a | e, m) = Π_{j=1}^{m} t(f_j | e_{a_j}) · q(a_j | j, l, m)
where f is a French
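Given parameter estimates, p(f, a | e, m) is just a product of t and q terms. A minimal sketch, assuming dictionary-valued parameters and a NULL word at English position 0 (all names and toy values here are illustrative):

```python
def model2_prob(f, a, e, t, q):
    """p(f, a | e, m) = product over j = 1..m of t(f_j | e_{a_j}) * q(a_j | j, l, m).
    f: French words (length m); e: English words with e[0] = "NULL" (length l + 1);
    a: alignments, a[j-1] is the English position aligned to French word j;
    t, q: parameter dictionaries keyed as below."""
    m, l = len(f), len(e) - 1
    p = 1.0
    for j in range(1, m + 1):
        fj, aj = f[j - 1], a[j - 1]
        p *= t[(fj, e[aj])] * q[(aj, j, l, m)]
    return p

# Toy one-word example (hypothetical parameters)
e = ["NULL", "dog"]
f = ["chien"]
a = [1]
t = {("chien", "dog"): 0.9}
q = {(1, 1, 1, 1): 0.8}
# model2_prob(f, a, e, t, q) -> 0.72
```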

Question 1a
(adog, dog)
(adog, the dog)
(aswims, swims)
(adog aswims, dog swims)
(adog aswims, the dog swims)
Question 1b
f = adog
e = the dog swims
A_{1,2} = 1, all other A_{i,j} values equal to 0.
Question 1c
(1, 1, the dog) (2, 2, swims)
(1, 1, dog) (2, 2, s