So we've now described the basic form of
log-linear models.
And we've talked about parameter
estimation in these models.
The final piece of the puzzle is going to
be to talk about smoothing and
regularization, which is going to be a
slight modif
Language Modeling
Michael Collins, Columbia University
Overview
- The language modeling problem
- Trigram models
- Evaluating language models: perplexity
- Estimation techniques:
  - Linear interpolation
  - Discounting methods
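One of the estimation techniques listed above, linear interpolation, mixes trigram, bigram, and unigram maximum-likelihood estimates. A rough sketch of that idea in Python — the toy corpus and lambda weights below are illustrative assumptions, not values from the lectures:

```python
from collections import Counter

def interpolated_trigram_lm(tokens, l1=0.5, l2=0.3, l3=0.2):
    """q(w | u, v) = l1*qML(w|u,v) + l2*qML(w|v) + l3*qML(w)."""
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # interpolation weights must sum to 1
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    n = len(tokens)

    def q(w, u, v):
        # Each term is a maximum-likelihood estimate, guarded against a zero denominator.
        q_tri = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
        q_bi = bi[(v, w)] / uni[v] if uni[v] else 0.0
        q_uni = uni[w] / n
        return l1 * q_tri + l2 * q_bi + l3 * q_uni

    return q

# Toy corpus -- an illustrative assumption, not from the lecture.
corpus = "the dog barks the dog runs the cat barks".split()
q = interpolated_trigram_lm(corpus)
prob = q("barks", "the", "dog")
```

Because the weights sum to one and each component is a valid distribution, the mixture is itself a valid distribution, which is what lets it back off smoothly when trigram counts are sparse.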
The Language Modeling Problem
So let's consider this modeling problem
that we're left with a little bit more
closely.
And to do this, I'm going to use the
following example.
So if we consider this word, "base", there
are many possible tags.
At this particular position for this wor
So, here's a quick recap of how
Log-Linear Models were applied to the
tagging problem.
So, remember we use this notation w1
colon n, to refer to an input sentence w1
through wn.
And similarly t1 colon n is a tag
sequence.
The first critical
So first, let's give a quick recap of the
tagging problem.
So, remember we considered a couple of
very important examples of tagging
problems much earlier in this class.
And the first one we looked at was
Part-of-Speech tagging.
So the problem here
So let's now talk about what makes a
dependency structure well formed.
That is how do we define the set of
possible dependency structures for a
given sentence.
And we're going to focus on two
constraints which will be important.
So the first constr
So the first example problem I'm going to
use to motivate log-linear models is the
Language Modeling Problem, which we saw
right at the start of this class.
So just to recap quickly, the problem is
as follows.
We define w sub i to be the ith word in
So in last week's lectures, we developed
log-linear models.
This is a very new way of looking at modeling
for natural language processing, and also
parameter estimation in natural language
processing.
In the current week's lectures, we're going
to look at vari
So the next set of categories we're going
to look at are verbs, verb phrases, and
sentences.
So our first critical observation is that
verbs in English can be sub
Okay.
To summarize this segment of the class on
log-linear taggers.
These were the key ideas.
The first key idea was, to directly model
the conditional probability of any tag
sequence conditioned on a word sequence.
Using a decomposition, where we
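The decomposition referred to here, in the standard presentation of log-linear taggers, is the chain rule combined with an independence assumption that conditions each tag on only the previous two tags. A sketch of that usual form (not a verbatim quote from the lecture):

```latex
p(t_{1:n} \mid w_{1:n})
  = \prod_{i=1}^{n} p(t_i \mid w_{1:n},\, t_1, \ldots, t_{i-1})
  \approx \prod_{i=1}^{n} p(t_i \mid w_{1:n},\, t_{i-2}, t_{i-1})
```

where each factor is itself a log-linear model, and t at positions -1 and 0 are taken to be a special start symbol.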
So in this lecture I'm first going to give
an introduction to the parsing problem.
We'll then describe context-free grammars.
I'll then give a very brief sketch of how
we can apply context-free grammars to
develop a model of the grammatical
structures see
So that's basically it.
What we've seen here is a way of using
log-linear models to construct a very
different type of parsing model from the
probabilistic context
1 Independence Assumptions in Log-linear Taggers
1.1 Question (time: 8:32, slide: 8)
Say we have w1 ... w3 = "the dog barks". We would like
p(D N V | the dog barks) = 0.5
p(D N N | the dog barks) = 0.5
What should be the value for the following probab
So let's now consider our second example
problem, which was part-of-speech
tagging.
So remember in this case, each x is a
history which consists of a sentence.
Fo
So that was a description of a set of
potential features in our log-linear
tagger.
This is just a recap: log-linear
models then take features and produce
conditional probability distributions of
the form p(y | x) under a parameter
vector v.
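That form p(y | x) under a parameter vector v can be written out concretely. A minimal sketch — the feature function, labels, and weight values here are all illustrative assumptions, not the lecture's features:

```python
import math

def log_linear_prob(x, y, labels, features, v):
    """p(y | x; v) = exp(v . f(x, y)) / sum over y' of exp(v . f(x, y'))."""
    def score(label):
        # Sparse dot product v . f(x, label).
        return sum(v.get(k, 0.0) * val for k, val in features(x, label).items())

    scores = {label: score(label) for label in labels}
    m = max(scores.values())  # log-sum-exp shift for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[y] - m) / z

# Illustrative feature function: an indicator on the (word, tag) pair.
def features(x, y):
    return {f"word={x}_tag={y}": 1.0}

labels = ["D", "N", "V"]
v = {"word=dog_tag=N": 2.0}  # hypothetical learned weight
p_n = log_linear_prob("dog", "N", labels, features, v)
```

The max-subtraction is the usual log-sum-exp trick; it leaves the probabilities unchanged but avoids overflow when scores are large.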
So now, we'll see how we can apply
these ideas very directly to the language
modeling problem.
And we'll see how trigram language
models can be derived as a
So, the first important question in this
kind of approach is, how do we implement
step one?
How do we represent a tree as a sequence
of decisions?
And so, the next
So we've now described some key steps in
constructing a log-linear tagger.
Firstly, how to define features that take
into account features of the context or
the h
So, we've now seen these first two layers
of structure.
Firstly, the part of speech tags for the
sentence and secondly, the sequence of
chunking decisions.
We're
Log-Linear Models
Michael Collins, Columbia University
The Language Modeling Problem
- wi is the ith word in a document
- Estimate a distribution p(wi | w1, w2, ..., wi-1) given previous history w1, ..., wi-1.
- E.g., w1, ..., wi-1 =
Third, the n