BSC5936 Fall 2005, PB, FR: Computational Evolutionary Biology

Assessing significance [JF 20]
Fredrik Ronquist
November 30, 2005

1 Introduction

With the exception of Bayesian analysis, phylogenetic inference procedures typically identify a best estimate of phylogenetic relationships, a so-called point estimate of the phylogeny. However, the point estimate is often relatively uninteresting in itself unless we have some measure of its reliability. This lecture will be about techniques for examining the robustness or significance of the results of phylogenetic analysis.

The techniques can be divided into nonparametric and parametric approaches. The nonparametric methods are quite general in that they can be combined with any approach to phylogenetic inference; the parametric approaches, however, rely on stochastic models. Five different methods will be discussed here, three of which are nonparametric (permutation tests, jackknifing and bootstrapping) and two of which are parametric (parametric bootstrapping and posterior predictive distributions).

2 Permutation tests

Permutation tests are widely used nonparametric techniques. They are particularly easy to apply in hypothesis-testing situations. Assume, for instance, that we have two samples of 50 data points each and want to examine the hypothesis that they both come from the same distribution. To use a permutation test to accomplish this, first calculate the difference between the means of the two samples. Then randomly permute all of the 100 data points and calculate the difference between the means of the first 50 and the last 50 data points. Because of the permutation, each of these sets will consist of a random mix of data points from the first and the second sample. Repeat the permutation a large number of times, say 999 times, and calculate the difference between the means of the first 50 and last 50 points for each permutation.
If we add the observed difference between the means, we now have 1,000 values, which we order from lowest to highest. If the actual difference lies in the bottom 25 or top 25, corresponding to the 0.025 and 0.975 quantiles, we can reject the null hypothesis with α = 0.05.

A simple application of this approach to test for the presence of phylogenetic signal is the permutation tail probability test. Assume we have an aligned data set X = {x_1, x_2, x_3, ..., x_n} where each x_i is a column vector representing a single site (or character) in the alignment. The idea is to generate a large number of permuted data sets, in which the elements of each column vector x_i have been permuted independently, and calculate some test statistic, such as the parsimony or likelihood score, for each. The distribution of values from the permuted data sets is then compared to the value of the test statistic for the observed data. If the latter value is sufficiently far into the appropriate tail of the reference distribution, we would say there is significant phylogenetic signal.
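
The two-sample procedure described in this section can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function name two_sample_permutation_test, its parameters, and the use of NumPy are assumptions made for the example. It pools the two samples, permutes them 999 times, adds the observed difference to the reference set, and rejects if the observed value falls among the lowest or highest 25 of the 1,000 ordered values.

```python
import numpy as np

def two_sample_permutation_test(a, b, n_perm=999, alpha=0.05, seed=0):
    """Two-sided permutation test on the difference between sample means.

    Hypothetical helper written for illustration only.
    Returns (observed difference, True if the null hypothesis is rejected).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()

    pooled = np.concatenate([a, b])
    n_a = a.size
    diffs = np.empty(n_perm + 1)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)  # random mix of both samples
        diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    diffs[n_perm] = observed                # add the observed difference
    diffs.sort()                            # order from lowest to highest

    # Number of values in each tail: 25 when there are 1,000 values
    # and alpha = 0.05.
    k = int(round((n_perm + 1) * alpha / 2))
    lower_cut = diffs[k - 1]   # largest value in the bottom k
    upper_cut = diffs[-k]      # smallest value in the top k
    reject = bool(observed <= lower_cut or observed >= upper_cut)
    return observed, reject

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample_1 = rng.normal(loc=0.0, scale=1.0, size=50)
    sample_2 = rng.normal(loc=0.5, scale=1.0, size=50)
    print(two_sample_permutation_test(sample_1, sample_2))
```

The observed difference is included in the reference set on purpose: under the null hypothesis it is just one more random partition of the pooled data, which is why 999 permutations give a convenient total of 1,000 values.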
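
The column-wise permutation used by the permutation tail probability test can be illustrated in the same way. The sketch below makes several assumptions that are not in the lecture: the alignment has exactly four taxa, so the shortest parsimony tree can be found by checking all three unrooted topologies with Fitch's algorithm; the lower tail is taken as the appropriate one (shorter trees indicate more signal); and the names fitch_quartet_length, min_parsimony_length and ptp_test are hypothetical. A real analysis would use a tree-search program to obtain the test statistic.

```python
import numpy as np

# The three unrooted topologies for four taxa, written as (pair | pair).
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def fitch_quartet_length(alignment, topology):
    """Fitch parsimony length of a 4-taxon alignment on one topology."""
    (a, b), (c, d) = topology
    length = 0
    for site in alignment.T:  # one column (site) at a time
        left = {site[a]} & {site[b]}
        if not left:          # the two taxa differ: one change
            left = {site[a], site[b]}
            length += 1
        right = {site[c]} & {site[d]}
        if not right:
            right = {site[c], site[d]}
            length += 1
        if not (left & right):  # change on the internal branch
            length += 1
    return length

def min_parsimony_length(alignment):
    """Length of the shortest tree over all three quartet topologies."""
    return min(fitch_quartet_length(alignment, t) for t in QUARTETS)

def ptp_test(sequences, n_perm=999, seed=0):
    """Permutation tail probability for a list of four aligned sequences.

    Returns (observed length, proportion of data sets, observed included,
    whose shortest tree is as short as or shorter than the observed one).
    """
    rng = np.random.default_rng(seed)
    alignment = np.array([list(s) for s in sequences])  # taxa x sites
    observed = min_parsimony_length(alignment)
    count = 1  # the observed data set is part of the reference set
    for _ in range(n_perm):
        # Permute the states within each column independently.
        permuted = np.column_stack(
            [rng.permutation(col) for col in alignment.T]
        )
        if min_parsimony_length(permuted) <= observed:
            count += 1
    return observed, count / (n_perm + 1)

if __name__ == "__main__":
    seqs = ["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCC"]
    print(ptp_test(seqs))
```

Permuting within columns keeps the state frequencies at every site unchanged while destroying any agreement between sites, so only the congruence among characters, the phylogenetic signal, distinguishes the observed data from the permuted data sets.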
This note was uploaded on 11/27/2011 for the course BSC 5936 taught by Professor Staff during the Spring '08 term at FSU.