Economics 241B
Relation to Method of Moments and Maximum Likelihood
OLSE as a Maximum Likelihood Estimator
Under Assumption 1.5 we have specified the distribution of the error, so we can estimate the model parameters $\theta = (\beta, \sigma^2)$ with the principle of maximum likelihood. Under the assumption that the error is Gaussian, we will see that the OLS estimator $b$ is equivalent to the MLE and the OLS estimator of $\sigma^2$ differs only slightly from its ML counterpart. Further, $b$ achieves the Cramér-Rao lower bound.
ML Principle
The intuitive idea of the ML principle is to choose the value of the parameter that is most likely to have generated the data. Precisely, we assume that the probability distribution of a sample $(Y)$ is a member of a family of functions indexed by $\theta$ (this is described as parameterizing the distribution). This function, viewed as a function of the parameter vector $\theta$, is called the likelihood function.
In general, the likelihood function has the form of the joint density function
$$L(\theta \mid Y_1 = y_1, \ldots, Y_n = y_n) = f_{Y_1 \cdots Y_n}(y_1, \ldots, y_n; \theta).$$
For an i.i.d. sample of a continuous random variable, we form the likelihood function as
$$L(\theta \mid Y_1 = y_1, \ldots, Y_n = y_n) = \prod_{t=1}^{n} f_Y(y_t; \theta).$$
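As a concrete illustration of the i.i.d. case, the sketch below (a hypothetical example using simulated data and numpy; the known error variance $\sigma^2 = 1$ and the grid search are assumptions made for simplicity) evaluates the Gaussian log-likelihood over a grid of candidate means and locates its maximizer, which for this model is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=200)  # i.i.d. sample, true mean 2

def log_likelihood(mu, y, sigma2=1.0):
    """Log of the product of N(mu, sigma2) densities over the sample."""
    n = y.size
    return (-n / 2 * np.log(2 * np.pi * sigma2)
            - np.sum((y - mu) ** 2) / (2 * sigma2))

# Evaluate over a grid of candidate means; the grid maximizer approximates
# the MLE, which for a Gaussian mean is exactly the sample mean.
grid = np.linspace(0.0, 4.0, 4001)
ll = np.array([log_likelihood(m, y) for m in grid])
mle = grid[np.argmax(ll)]
print(mle, y.mean())
```

Working in logs turns the product of densities into a sum, which avoids numerical underflow for even moderately sized samples.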
Definition. The maximum likelihood estimator (MLE) of $\theta$, $\hat{\theta}_{ML}$, is the value of $\theta$ (in the parameter space) that maximizes $L(\theta \mid Y_1 = y_1, \ldots, Y_n = y_n)$.
Conditional versus Unconditional Likelihood
For the regression model, we have a sample $(Y, X)$, whose joint density we parameterize. Because the joint density is the product of a marginal density and a conditional density, we can write the joint density of the data as
$$f(y, x; \theta, \psi) = f(y \mid x; \theta) \cdot f(x; \psi).$$
The parameter vector of interest is $\theta$.
If we knew the parametric form of $f(x; \psi)$, then we could maximize the joint likelihood function. We cannot do this, as the classic model does not specify $f(x; \psi)$. However, if there is no functional relation between $\theta$ and $\psi$ (such as the value of an element of $\psi$ depending on an element of $\theta$), then maximizing the joint likelihood is achieved by separately maximizing the conditional and marginal likelihoods. In such a case, the ML estimate of $\theta$ is obtained by maximizing the conditional likelihood alone.
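The separability argument can be checked numerically. In this sketch (simulated data; the linear conditional mean and the symbols `beta` and `psi` are illustrative assumptions, not part of the original notes), the joint log-likelihood is the sum of a conditional term involving only $\beta$ and a marginal term involving only $\psi$, so the maximizer in $\beta$ is unchanged by the marginal term:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=500)   # marginal density f(x; psi), psi = Var(x)
y = 1.5 * x + rng.normal(size=500)   # conditional density N(beta * x, 1)

def cond_ll(beta):
    """log f(y | x; beta), with the error variance fixed at 1."""
    return -0.5 * np.sum((y - beta * x) ** 2)

def marg_ll(psi):
    """log f(x; psi); note that beta does not appear here."""
    n = x.size
    return -n / 2 * np.log(2 * np.pi * psi) - np.sum(x ** 2) / (2 * psi)

betas = np.linspace(1.0, 2.0, 2001)
psi_hat = np.mean(x ** 2)  # any fixed psi gives the same argmax in beta

joint_argmax = betas[np.argmax([cond_ll(b) + marg_ll(psi_hat) for b in betas])]
cond_argmax = betas[np.argmax([cond_ll(b) for b in betas])]
print(joint_argmax, cond_argmax)  # identical: the marginal term is a constant in beta
```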
Log-Likelihood for the Regression Model
As we have already seen, Assumption 1.2 (strict exogeneity), Assumption 1.4 (spherical error variance), and Assumption 1.5 (Gaussian) together imply $U \mid X \sim N(0, \sigma^2 I_n)$. Because $Y = X\beta + U$, we have
$$Y \mid X \sim N\left(X\beta, \sigma^2 I_n\right).$$
The log-likelihood function, which is simpler to maximize, is
$$\ln L\left(\tilde{\beta}, \tilde{\sigma}^2 \,\middle|\, (Y_1, X_1) = (y_1, x_1), \ldots, (Y_n, X_n) = (y_n, x_n)\right) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \tilde{\sigma}^2 - \frac{1}{2\tilde{\sigma}^2} \left(Y - X\tilde{\beta}\right)' \left(Y - X\tilde{\beta}\right).$$
(Because the likelihood function has the form of a joint density function, it is nonnegative. Whenever the likelihood takes a value below one, as is typical here, the log-likelihood is negative; note, though, that a continuous density can exceed one, so the log-likelihood is not guaranteed to be negative in general.)
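To see the connection to OLS concretely, the following sketch (simulated data and numpy; the design matrix and parameter values are illustrative assumptions) evaluates this log-likelihood and checks that, for a fixed $\tilde{\sigma}^2$, perturbing the coefficient vector away from the OLS estimator $b$ can only lower it:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

def log_lik(beta, sigma2):
    """Gaussian regression log-likelihood at (beta, sigma2)."""
    resid = Y - X @ beta
    return (-n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2)
            - resid @ resid / (2 * sigma2))

b = np.linalg.solve(X.T @ X, X.T @ Y)  # OLS estimator

# Perturbing b in any coordinate direction lowers the log-likelihood
# at fixed sigma2, because b minimizes the sum of squared residuals.
s2 = 1.0
for d in np.eye(k):
    assert log_lik(b, s2) >= log_lik(b + 0.01 * d, s2)
    assert log_lik(b, s2) >= log_lik(b - 0.01 * d, s2)
print(log_lik(b, s2))
```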
ML via Concentrated Likelihood
We could maximize the log-likelihood in two stages. First, maximize over $\tilde{\beta}$ for any given $\tilde{\sigma}^2$. The $\tilde{\beta}$ that maximizes the objective function could (but in this case, does not) depend on $\tilde{\sigma}^2$.
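A minimal numerical sketch of the two stages (simulated data and numpy; the design matrix is an illustrative assumption): stage one yields the OLS coefficients for any value of $\tilde{\sigma}^2$, and stage two, maximizing the concentrated log-likelihood, yields $\tilde{\sigma}^2 = SSR/n$, which differs from the OLS variance estimator $s^2 = SSR/(n - K)$ only by a degrees-of-freedom factor:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

# Stage 1: for any sigma2, the maximizing beta minimizes the sum of
# squared residuals, i.e. it is the OLS estimator b.
b = np.linalg.solve(X.T @ X, X.T @ Y)
ssr = float(np.sum((Y - X @ b) ** 2))

def concentrated_ll(sigma2):
    """Log-likelihood with the OLS b plugged in, as a function of sigma2."""
    return (-n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2)
            - ssr / (2 * sigma2))

# Stage 2: the first-order condition gives sigma2_ML = SSR / n;
# check it is a local maximum of the concentrated log-likelihood.
sigma2_ml = ssr / n
assert concentrated_ll(sigma2_ml) >= concentrated_ll(1.05 * sigma2_ml)
assert concentrated_ll(sigma2_ml) >= concentrated_ll(0.95 * sigma2_ml)

s2_ols = ssr / (n - k)  # the OLS estimator divides by n - K instead
print(sigma2_ml, s2_ols)
```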
 Fall '08
 Staff