Lecture_23_24

# Lecture_23_24 - Model Selection and Model Averaging in...


Model Selection and Model Averaging in Phylogenetics

Johan Nylander

November 27, 2005

## 1 Key Concepts

Model selection, that is, using the data to select a model, should be an integral part of inference [4].

The data-generating or "true" model ($f$) has an infinite number of parameters and is unreachable. The best approximate model ($g$) is the best descriptive model given the limited sample size. Finding the best $g$ is (or can be) the goal of model selection.

A more parameter-rich model has a higher potential than a less parameter-rich model: it has less discrepancy due to approximation. However, a more parameter-rich model tends to perform farther below its potential than a less parameter-rich model, because of the discrepancy due to estimation [20].

The aim is a parsimonious trade-off between error (which decreases with additional parameters) and variance (which increases with additional parameters). To help with this trade-off, apply a model selection criterion.

## 2 Model Selection Criteria in Phylogenetics

### 2.1 Likelihood

Changing the model changes the likelihood, which is proportional to the probability of the data given the parameters and the model [5].

Computational Evolutionary Biology

The maximized log-likelihood is biased upward as an estimator of the target model. The bias is proportional to the number of parameters [4].

### 2.2 Likelihood Ratio Testing [7]

$$\delta = -2(\ln L_0 - \ln L_1)$$

Basic idea: is the increase in likelihood significant?

- $\delta$ is asymptotically $\chi^2_n$ distributed, with degrees of freedom $n$ equal to the difference in the number of free parameters between the models.
- Valid only for nested models (model $L_0$ must be a special case of $L_1$).
- Mixed $\chi^2_n$ distributions apply when one parameter is at its limit (e.g., GTR vs. GTR+G) [19].
- Applications: Modeltest [13], MrModeltest2 [15].

### 2.3 AIC - Akaike Information Criterion [1, 4]

$$\mathrm{AIC}_i = -2\ln(L) + 2p$$

Here $L$ is the maximized likelihood for model $i$ (so $\ln L$ is its maximized log-likelihood) and $p$ is the number of parameters.

- Estimates the expected Kullback-Leibler (K-L) distance: the information lost when model $g$ is used to approximate $f$.
- The model with the minimum AIC is the best K-L model in the set of competing models.
- No accept or reject decision (not a strict test).
- Applies to nested and non-nested models.
- AIC$_c$ takes sample size into account.
- Must be based on the maximum likelihood, which may be problematic(?).
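As a rough illustration of the two criteria above, the following Python sketch computes the likelihood ratio statistic $\delta$ with a plain $\chi^2$ p-value, and AIC for two competing models. The log-likelihoods, parameter counts, and sample size are made-up numbers, and the function names are hypothetical. The AIC$_c$ correction term, $2p(p+1)/(n-p-1)$, is the commonly used small-sample form and is not spelled out in the notes. The p-value uses the ordinary $\chi^2$ approximation only, so it does not cover the mixed-distribution case mentioned for parameters at their limit.

```python
import math


def lrt_statistic(lnL0, lnL1):
    """delta = -2 (ln L0 - ln L1), where model 0 is nested in model 1."""
    return -2.0 * (lnL0 - lnL1)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed-form series that exists for integer df, so only the
    standard library is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def aic(lnL, p):
    """AIC = -2 ln(L) + 2p, with lnL the maximized log-likelihood."""
    return -2.0 * lnL + 2.0 * p


def aicc(lnL, p, n):
    """Small-sample corrected AIC (assumed standard form; requires n > p + 1)."""
    return aic(lnL, p) + 2.0 * p * (p + 1) / (n - p - 1)


# Hypothetical values: model 0 is nested in model 1, which has one extra parameter.
lnL0, p0 = -2050.0, 8
lnL1, p1 = -2045.0, 9
n_sites = 1000  # assumed sample size (e.g. alignment length)

delta = lrt_statistic(lnL0, lnL1)      # 10.0
p_value = chi2_sf(delta, df=p1 - p0)   # plain chi-square approximation
print("delta =", delta, "p =", p_value)

scores = {"model 0": aic(lnL0, p0), "model 1": aic(lnL1, p1)}  # 4116.0, 4108.0
best = min(scores, key=scores.get)
print("AIC:", scores, "-> min AIC:", best)
print("AICc model 1:", aicc(lnL1, p1, n_sites))
```

With these made-up numbers both approaches favor model 1, but note the difference in kind: the LRT gives a significance test for one nested pair, whereas AIC simply ranks every model in the candidate set.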