University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS"
Spring 2008
Lab 10: Maximum Likelihood and Modeltest
In this lab we’re going to use PAUP* to find a phylogeny using molecular data, with Maximum Likelihood as the optimality criterion. The computer evaluates the likelihood of each tree, including its topology and branch lengths, one at a time. For each site (each aligned nucleotide position), it calculates the probability of the bases changing in such a way as to generate the states observed at the tips of the branches, given the tree and a set of parameters describing how the bases change over time. Assuming that sites evolve independently, the likelihood of a data set for a given tree is the product of these probabilities over all sites. The computer chooses the topology and branch lengths that produce the highest likelihood for the data set.
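As a minimal sketch of this calculation, the Python below scores the simplest possible "tree": two aligned sequences joined by a single branch of length t, under the Jukes-Cantor model (assumed here only for simplicity; in the lab the model is chosen with ModelTest). Each site contributes the frequency of the first base (1/4) times the probability of changing into the second base along the branch, and because sites are treated as independent the log-likelihood is a sum over sites. A grid search then picks the branch length with the highest likelihood. The function names are hypothetical, and this is not how PAUP* is implemented internally.

```python
import math


def jc69_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in the same base (same=True) or in one
    specific different base (same=False) after a branch of length t, measured
    in expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by one branch of length t.
    Sites are treated as independent, so per-site probabilities multiply
    (and their logs add)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for x, y in zip(seq1, seq2):
        # P(site) = pi_x * P(x -> y | t), with pi_x = 1/4 under Jukes-Cantor
        total += math.log(0.25 * jc69_prob(x == y, t))
    return total


def best_branch_length(seq1: str, seq2: str, step: float = 0.001, t_max: float = 3.0) -> float:
    """Grid search for the branch length that maximizes the likelihood."""
    grid = [step * i for i in range(1, round(t_max / step) + 1)]
    return max(grid, key=lambda t: log_likelihood(seq1, seq2, t))


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "ACGTACGTAA"  # toy alignment: 1 difference in 10 sites
    t_hat = best_branch_length(a, b)
    print(f"best t = {t_hat:.3f}, lnL = {log_likelihood(a, b, t_hat):.4f}")
```

For this toy alignment the grid optimum should lie close to the analytical Jukes-Cantor distance, -(3/4) ln(1 - 4p/3), which is about 0.107 for p = 0.1 differing sites. Real programs search topologies as well and use numerical optimizers rather than a grid.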
So what parameters of nucleotide change do we use, and what values do we give them? This is called the model of nucleotide change, and today we will pick a model using ModelTest.
There is an infinite number of possible models. Many have been implemented in various programs, many have been suggested but never implemented, and even more have never been conceived. Today we are going to deal only with a few models that are implemented in PAUP* and evaluated by ModelTest.
A model is considered nested within another model if it is a special case of that model, that is, if it can be obtained by fixing some of the other model's parameters to particular values. For example, the Jukes-Cantor model, which assumes that every nucleotide changes to each of the others at the same rate, is nested within the Kimura two-parameter model, which allows different transition and transversion rates: setting the two rates equal recovers Jukes-Cantor. Likewise, a model without any invariant sites is nested within the same model with a proportion of invariant sites added (fixing that proportion at zero recovers the simpler model). Not every pair of models is nested, however.
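A quick numerical check of the Jukes-Cantor example, sketched in Python under the assumption that we use the standard closed-form transition probabilities of both models (alpha is the transition rate and beta the rate to each of the two transversion targets; the function names are hypothetical). When alpha equals beta, the Kimura two-parameter probabilities collapse to the Jukes-Cantor ones, which is exactly what it means for Jukes-Cantor to be nested within Kimura two-parameter.

```python
import math


def k80_probs(alpha: float, beta: float, t: float) -> tuple[float, float, float]:
    """Kimura two-parameter (K80) probabilities after time t for:
    (no change, one specific transition, one specific transversion).
    alpha = transition rate, beta = rate to each transversion target."""
    e1 = math.exp(-4.0 * beta * t)
    e2 = math.exp(-2.0 * (alpha + beta) * t)
    same = 0.25 + 0.25 * e1 + 0.5 * e2
    transition = 0.25 + 0.25 * e1 - 0.5 * e2
    transversion = 0.25 - 0.25 * e1
    return same, transition, transversion


def jc69_probs(rate: float, t: float) -> tuple[float, float]:
    """Jukes-Cantor probabilities (no change, one specific change) when every
    base changes to each other base at the same rate."""
    e = math.exp(-4.0 * rate * t)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def k80_reduces_to_jc(rate: float, t: float, tol: float = 1e-12) -> bool:
    """Fix the K80 parameters so that alpha == beta and compare with Jukes-Cantor."""
    same, ts, tv = k80_probs(rate, rate, t)
    jc_same, jc_diff = jc69_probs(rate, t)
    return (abs(same - jc_same) < tol
            and abs(ts - jc_diff) < tol
            and abs(tv - jc_diff) < tol)


if __name__ == "__main__":
    for t in (0.01, 0.1, 1.0):
        print(t, k80_reduces_to_jc(rate=0.5, t=t))  # True when alpha == beta
    # With alpha > beta the models differ: a transition becomes more likely
    # than any single transversion
    same, ts, tv = k80_probs(alpha=2.0, beta=0.5, t=0.1)
    print(f"transition {ts:.4f} vs transversion {tv:.4f}")
```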
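Adding parameters to a model never decreases the maximum likelihood of the data, and it almost always increases it, because the simpler model is just a special case of the more complex one. However, if a model has too many parameters relative to the amount of data, the parameter estimates become imprecise and maximum likelihood becomes unreliable. Therefore, a new parameter should be accepted into your model only if it produces a significant increase in the likelihood.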
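How do you tell if a difference in likelihood is significant? Well, I’m sure you’ll be shocked to learn that there is a formula. It is called the Likelihood Ratio Test (LRT). For a model with likelihood $\Lambda_1$ nested within another model with likelihood $\Lambda_2$, where the nested model has n fewer parameters, the test statistic (chi-squared) is

$$\chi^2 = 2\,(\ln \Lambda_2 - \ln \Lambda_1)$$

with n degrees of freedom. If this value exceeds the critical value of the chi-squared distribution with n degrees of freedom (3.84 for n = 1 at the 0.05 significance level), the more complex model fits the data significantly better.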
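The test is easy to carry out in a few lines of code. The Python sketch below uses only the standard library; the function names and the ln-likelihood values are hypothetical, not output from PAUP* or ModelTest. It computes the statistic and its p-value from the chi-squared distribution, using the closed-form upper-tail formulas that exist for integer degrees of freedom. If your program reports scores as -ln L, negate them before plugging them in.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df,
    using the closed forms for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(lnl_nested: float, lnl_general: float, n: int) -> tuple[float, float]:
    """Return (chi-squared statistic, p-value) for a nested model with n fewer
    parameters than the general model. Inputs are ln-likelihoods (not -ln L)."""
    stat = 2.0 * (lnl_general - lnl_nested)
    return stat, chi2_sf(stat, n)


if __name__ == "__main__":
    # Hypothetical ln-likelihoods, both computed on the same tree
    stat, p = likelihood_ratio_test(lnl_nested=-2500.0, lnl_general=-2495.0, n=1)
    print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
    print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

Checking the p-value directly is equivalent to comparing the statistic with a critical value; for example, chi2_sf(3.84, 1) is very close to 0.05.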
You can use this test to pick the most inclusive model that cannot be significantly improved on by adding further parameters. The only drawback of this test is that you cannot use it to compare different trees, because different trees are not different models; they are more like alternative parameter values. Therefore, you have to compare the different models on a single tree, and which tree to compare them on may not be obvious. Luckily, you tend to get similar results as long as you use a reasonable tree.
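The selection procedure can be sketched as a walk along a path of nested models, comparing each model with the next by an LRT. In this Python sketch the path (JC, K80, HKY, GTR), the parameter counts, and the ln-likelihoods are illustrative assumptions; it is not ModelTest's actual hierarchy or output. All ln-likelihoods must come from the same tree, for the reason given above. The 0.05 critical values of the chi-squared distribution are hard-coded for 1 to 5 degrees of freedom.

```python
# Chi-squared critical values at the 0.05 level, indexed by degrees of freedom
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def select_model(path: list[tuple[str, int, float]]) -> str:
    """Walk a path of nested models ordered from simplest to most complex.
    Each entry is (name, number of free parameters, ln-likelihood on a fixed tree).
    Stop at the first model that is not significantly improved on by the next.
    Raises KeyError if a step adds more than 5 parameters (no critical value stored)."""
    current_name, current_k, current_lnl = path[0]
    for name, k, lnl in path[1:]:
        df = k - current_k
        stat = 2.0 * (lnl - current_lnl)
        if stat <= CRITICAL_05[df]:
            break  # the extra parameters do not help significantly
        current_name, current_k, current_lnl = name, k, lnl
    return current_name


if __name__ == "__main__":
    # Hypothetical values for illustration only; parameter counts exclude branch lengths
    path = [
        ("JC", 0, -2600.0),
        ("K80", 1, -2550.0),   # 2 * 50 = 100 > 3.841, so K80 is accepted
        ("HKY", 4, -2547.0),   # 2 * 3 = 6 <= 7.815, so the walk stops at K80
        ("GTR", 8, -2540.0),
    ]
    print(select_model(path))  # K80
```

Stopping at the first non-significant step is only one simple design choice; a fuller scheme would also test parameters such as the proportion of invariant sites, and may order the comparisons differently.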
LINDBERG, MISHLER, WILL