# Stat 5102 Lecture Slides, Deck 3


Charles J. Geyer, School of Statistics, University of Minnesota

## Likelihood Inference

We have learned one very general method of estimation: the method of moments. Now we learn another: the method of maximum likelihood.
## Likelihood

Suppose we have a parametric statistical model specified by a PMF or PDF. Our convention of using boldface to distinguish between scalar data $x$ and vector data $\mathbf{x}$, and between a scalar parameter $\theta$ and a vector parameter $\boldsymbol{\theta}$, becomes a nuisance here. To begin our discussion we write the PMF or PDF as $f_\theta(x)$. It makes no difference in likelihood inference if the data $x$ is a vector, nor does it make a difference in the fundamental definitions if the parameter $\theta$ is a vector. You may consider $x$ and $\theta$ to be scalars, but much of what we say until further notice works equally well if either $x$ or $\theta$ is a vector, or both are.

## Likelihood

The PMF or PDF, considered as a function of the unknown parameter or parameters rather than of the data, is called the *likelihood function*

$$L(\theta) = f_\theta(x)$$

Although $L(\theta)$ also depends on the data $x$, we suppress this in the notation. If the data are considered random, then $L(\theta)$ is a random variable, and the function $L$ is a random function. If the data are considered nonrandom, as when the observed value of the data is plugged in, then $L(\theta)$ is a number, and $L$ is an ordinary mathematical function. Since the data $X$ or $x$ do not appear in the notation $L(\theta)$, we cannot distinguish these cases notationally and must do so by context.
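The two roles of $f_\theta(x)$ can be seen concretely. Here is a minimal sketch, assuming a normal model with known variance 1; the observed value `x_obs` is a hypothetical number chosen for illustration:

```python
import math

def f(x, theta):
    # N(theta, 1) density: a function of the data x for fixed theta
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

x_obs = 1.3  # hypothetical observed data

def L(theta):
    # likelihood: the same formula with the observed data plugged in,
    # now viewed as a function of the parameter theta
    return f(x_obs, theta)

# For this model the likelihood is largest at theta equal to the observed x
print(L(1.3) > L(0.0))  # True
```

The same expression serves as a PDF when $x$ varies and a likelihood when $\theta$ varies; only the point of view changes.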
## Likelihood (cont.)

For all purposes that likelihood gets used in statistics (it is the key to both likelihood inference and Bayesian inference), it does not matter if multiplicative terms not containing unknown parameters are dropped from the likelihood function. If $L(\theta)$ is a likelihood function for a given problem, then so is

$$L^*(\theta) = L(\theta) h(x)$$

where $h$ is any strictly positive real-valued function.
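This invariance is easy to check numerically. A minimal sketch, again assuming a $N(\theta, 1)$ model, where the constant factor $1/\sqrt{2\pi}$ plays the role of $h(x)$ and the observed value is hypothetical:

```python
import math

x_obs = 2.5  # hypothetical observed data

def L(theta):
    # full normal likelihood, including the constant 1/sqrt(2*pi)
    return math.exp(-0.5 * (x_obs - theta) ** 2) / math.sqrt(2 * math.pi)

def L_star(theta):
    # same likelihood with the constant factor dropped
    return math.exp(-0.5 * (x_obs - theta) ** 2)

# Both versions are maximized at the same theta
grid = [i / 10 for i in range(0, 51)]
print(max(grid, key=L) == max(grid, key=L_star))  # True: both peak at 2.5
```

Dropping $h(x)$ rescales the likelihood but cannot change which $\theta$ maximizes it, which is why either version may be used.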

## Log Likelihood

In frequentist inference, the *log likelihood function*, which is the logarithm of the likelihood function, is more useful. If $L$ is the likelihood function, we write

$$l(\theta) = \log L(\theta)$$

for the log likelihood. When discussing asymptotics, we often add a subscript denoting sample size, so the likelihood becomes $L_n(\theta)$ and the log likelihood becomes $l_n(\theta)$. Note: we have yet another capital and lower case convention: capital $L$ for likelihood and lower case $l$ for log likelihood.
## Log Likelihood (cont.)

As we said before, we may drop multiplicative terms not containing unknown parameters from the likelihood function. If

$$L(\theta) = h(x) g(x, \theta)$$

we may drop the factor $h(x)$. Since

$$l(\theta) = \log h(x) + \log g(x, \theta)$$

this means we may drop additive terms not containing unknown parameters from the log likelihood function.
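Taking logs turns the dropped multiplicative factor $h(x)$ into a dropped additive constant $\log h(x)$. A minimal sketch with hypothetical binomial data ($x = 7$ successes in $n = 10$ trials), where $h(x)$ is the binomial coefficient:

```python
import math

x, n = 7, 10  # hypothetical binomial data

def logL(p):
    # full binomial log likelihood
    return math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)

def logL_drop(p):
    # log likelihood with the additive constant log C(n, x) dropped
    return x * math.log(p) + (n - x) * math.log(1 - p)

# The two versions differ by the same constant at every p,
# so they have the same shape and the same maximizer
diff_a = logL(0.2) - logL_drop(0.2)
diff_b = logL(0.8) - logL_drop(0.8)
print(abs(diff_a - diff_b) < 1e-12)  # True
```

Since an additive constant shifts the whole curve without moving its peak, either version of the log likelihood leads to the same inference.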

## Examples

Suppose $X$ is $\text{Bin}(n, p)$. Then the likelihood is

$$L_n(p) = \binom{n}{x} p^x (1 - p)^{n - x}$$

but we may, if we like, drop the term that does not contain the parameter, so

$$L_n(p) = p^x (1 - p)^{n - x}$$

is another (simpler) version of the likelihood.
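The simplified version is all that is needed to find the maximum likelihood estimate. Calculus gives the maximizer $\hat p = x / n$; the grid search below, a sketch with hypothetical data ($x = 3$ successes in $n = 10$ trials), just confirms this numerically:

```python
x, n = 3, 10  # hypothetical data: 3 successes in 10 trials

def L(p):
    # simplified binomial likelihood, binomial coefficient dropped
    return p ** x * (1 - p) ** (n - x)

# crude grid search over p in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=L)
print(abs(p_hat - x / n) < 1e-9)  # True: the MLE is x/n = 0.3
```

Because the dropped binomial coefficient does not depend on $p$, maximizing the simplified likelihood gives exactly the same $\hat p$ as maximizing the full one.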