Lecture_25 - Assessing significance [JF 20] Fredrik...


Assessing significance [JF 20]
Fredrik Ronquist
November 30, 2005
BSC5936-Fall 2005-PB,FR Computational Evolutionary Biology

1 Introduction

With the exception of Bayesian analysis, phylogenetic inference procedures typically identify a best estimate of phylogenetic relationships, a so-called point estimate of the phylogeny. However, the point estimate is often relatively uninteresting in itself unless we have some measure of its reliability. This lecture covers techniques for examining the robustness or significance of the results of a phylogenetic analysis.

The techniques can be divided into non-parametric and parametric approaches. The non-parametric methods are quite general in that they can be combined with any approach to phylogenetic inference; the parametric approaches, however, rely on stochastic models. Five different methods will be discussed here, three of which are non-parametric (permutation tests, jackknifing and bootstrapping) and two of which are parametric (parametric bootstrapping and posterior predictive distributions).

2 Permutation tests

Permutation tests are widely used non-parametric techniques. They are particularly easy to apply in hypothesis-testing situations. Assume, for instance, that we have two samples of 50 data points each and want to examine the hypothesis that they both come from the same distribution. To use a permutation test to accomplish this, first calculate the difference between the means of the two samples. Then randomly permute all 100 data points and calculate the difference between the means of the first 50 and the last 50 data points. Because of the permutation, each of these sets will consist of a random mix of data points from the first and the second sample. Repeat the permutation a large number of times, say 999 times, and calculate the difference between the means of the first 50 and last 50 points for each permutation.
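The resampling step just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `permuted_mean_differences`, the fixed seeds, and the simulated normal data are all assumptions made for the example.

```python
import random
from statistics import mean


def permuted_mean_differences(sample_a, sample_b, n_permutations=999, seed=0):
    """Return the observed difference between the two sample means and
    the list of differences obtained from randomly permuted data."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    n_a = len(sample_a)
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    permuted = []
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # each half becomes a random mix of both samples
        permuted.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return observed, permuted


# Hypothetical data: two samples of 50 points each.
data_rng = random.Random(1)
sample_a = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
sample_b = [data_rng.gauss(0.5, 1.0) for _ in range(50)]
observed, permuted = permuted_mean_differences(sample_a, sample_b)
```

The function returns the observed difference separately from the 999 permuted differences, so the two can be combined and ranked in whatever way the test requires.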
If we add the observed difference between the means, we now have 1,000 values, which we order from lowest to highest. If the observed difference lies in the bottom 25 or the top 25, that is, below the 2.5th or above the 97.5th percentile, we can reject the null hypothesis with $\alpha = 0.05$.

A simple application of this approach to test for the presence of phylogenetic signal is the permutation tail probability test. Assume we have an aligned data set $X = \{x_1, x_2, x_3, \ldots, x_n\}$, where each $x_i$ is a column vector representing a single site (or character) in the alignment. The idea is to generate a large number of permuted data sets, in which the elements of each column vector $x_i$ have been permuted independently, and calculate some test statistic, such as the parsimony or likelihood score, for each. The distribution of values from the permuted data sets is then compared to the value of the test statistic for the observed data. If the latter value is sufficiently far into the appropriate tail of the reference distribution, we would say there is significant phylogenetic signal.
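As a rough sketch of the permutation tail probability test, the Python code below permutes each alignment column independently and uses the length of the shortest parsimony tree as the test statistic. Several things here are assumptions made for illustration rather than details from the lecture: the example is restricted to four taxa so that all three unrooted topologies can be enumerated, lengths are computed with the Fitch algorithm, the lower tail is used because shorter trees indicate more signal, and the names `fitch_join`, `quartet_length`, `best_tree_length` and `ptp_test` are hypothetical.

```python
import random

# The three unrooted topologies for four taxa, written as pairs of cherries.
SPLITS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(s, t):
    """One Fitch step: return (state set of the parent node, cost 0 or 1)."""
    common = s & t
    return (common, 0) if common else (s | t, 1)


def quartet_length(column, split):
    """Parsimony length of one site on one four-taxon topology."""
    (a, b), (c, d) = split
    left, cost_left = fitch_join({column[a]}, {column[b]})
    right, cost_right = fitch_join({column[c]}, {column[d]})
    _, cost_root = fitch_join(left, right)
    return cost_left + cost_right + cost_root


def best_tree_length(columns):
    """Length of the shortest of the three possible trees."""
    return min(sum(quartet_length(col, s) for col in columns) for s in SPLITS)


def ptp_test(alignment, n_permutations=999, seed=0):
    """Return (observed tree length, lower-tail probability)."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*alignment)]  # one list per site
    observed = best_tree_length(columns)
    count = 1  # the observed data set counts as one of the values
    for _ in range(n_permutations):
        # Permute the states within each column independently across taxa.
        permuted = [rng.sample(col, len(col)) for col in columns]
        if best_tree_length(permuted) <= observed:
            count += 1
    return observed, count / (n_permutations + 1)


# Hypothetical alignment: rows are taxa, columns are sites.
alignment = ["A" * 20, "A" * 20, "C" * 20, "C" * 20]
observed_length, p_value = ptp_test(alignment)
```

For real data sets with more taxa, the exhaustive enumeration in `best_tree_length` would have to be replaced by a tree search, which is the expensive part of the test.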

This note was uploaded on 11/27/2011 for the course BSC 5936 taught by Professor Staff during the Spring '08 term at FSU.
