…atter distribution is 0.11. … be used for estimating the number of nucleotide substitutions in the control region of mtDNA. In this model, $\alpha_1$, $\alpha_2$, and $\beta$ stand for the rates of transitional changes between purines and between pyrimidines and of transversional change, respectively, and $g_A$, $g_T$, $g_C$, and $g_G$ represent the equilibrium frequencies of nucleotides A, T, C, and G, respectively. The latter parameters are incorporated into the substitution rates per unit evolutionary time, because the rates clearly depend on the frequencies of the mutant nucleotides (table 1). These parameters are also important to take care of the differences among the equilibrium values of $g_A$, $g_T$, $g_C$, and $g_G$ (Tajima and Nei 1984). Note that when $\alpha_1 = \alpha_2$, the model presented in table 3 becomes identical with Hasegawa et al.'s (1985), though those authors did not derive any analytical formulas for estimating the number of nucleotide substitutions.

We first consider a particular nucleotide site and derive an equation for the expected number of nucleotide substitutions between two sequences at that site. This equation obviously applies both to a set of nucleotide sites that have the same substitution pattern and the same substitution rate and to the case where the substitution pattern and the substitution rate are the same for all sites. When the nucleotide frequencies are in equilibrium, the average rate of nucleotide substitution per site for this model is given by $\lambda = 2 g_A g_G \alpha_1 + 2 g_T g_C \alpha_2 + 2 g_R g_Y \beta$, where $g_R = g_A + g_G$ and
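$g_Y = g_T + g_C$, whereas the expected number of nucleotide substitutions between two sequences that diverged $t$ evolutionary time units ago ($d = 2\lambda t$) is

$$d = 4 g_A g_G \alpha_1 t + 4 g_T g_C \alpha_2 t + 4 g_R g_Y \beta t. \tag{3}$$

Table 3
Rates of Nucleotide Substitution in the Model Used

                        MUTANT NUCLEOTIDE
ORIGINAL NUCLEOTIDE     A                T                C                G
A                       -                $g_T\beta$       $g_C\beta$       $g_G\alpha_1$
T                       $g_A\beta$       -                $g_C\alpha_2$    $g_G\beta$
C                       $g_A\beta$       $g_T\alpha_2$    -                $g_G\beta$
G                       $g_A\alpha_1$    $g_T\beta$       $g_C\beta$       -

Nucleotide Substitution in mtDNA 517

To derive a formula for estimating $d$, one must know the expected proportions of nucleotide sites showing transitional differences between purines ($P_1$) and between pyrimidines ($P_2$) and of those showing transversional differences ($Q$), as expressed in terms of the substitution rates and evolutionary time. These proportions can be derived by the method described by Tamura (1992a) and become

$$P_1 = \frac{2 g_A g_G}{g_R}\left\{ g_R + g_Y \exp(-2\beta t) - \exp[-2(g_R \alpha_1 + g_Y \beta)t] \right\}, \tag{4}$$

$$P_2 = \frac{2 g_T g_C}{g_Y}\left\{ g_Y + g_R \exp(-2\beta t) - \exp[-2(g_Y \alpha_2 + g_R \beta)t] \right\}, \tag{5}$$

$$Q = 2 g_R g_Y \left[ 1 - \exp(-2\beta t) \right]. \tag{6}$$

Since $P_1$, $P_2$, and $Q$ are estimable from sequence comparison and since $g_A$, $g_T$, $g_C$, and $g_G$ can be estimated by the average nucleotide frequencies of the two sequences compared, one can obtain the estimator of $d$ as defined in equation (3). It becomes

$$\hat{d} = -\frac{2 g_A g_G}{g_R} \log_e\!\left(1 - \frac{g_R}{2 g_A g_G}\hat{P}_1 - \frac{1}{2 g_R}\hat{Q}\right) - \frac{2 g_T g_C}{g_Y} \log_e\!\left(1 - \frac{g_Y}{2 g_T g_C}\hat{P}_2 - \frac{1}{2 g_Y}\hat{Q}\right) - 2\left(g_R g_Y - \frac{g_A g_G g_Y}{g_R} - \frac{g_T g_C g_R}{g_Y}\right)\log_e\!\left(1 - \frac{1}{2 g_R g_Y}\hat{Q}\right), \tag{7}$$

where $\hat{P}_1$, $\hat{P}_2$, and $\hat{Q}$ are the estimates of $P_1$, $P_2$, and $Q$, respectively. Here, $g_A$, $g_T$, etc., represent the estimated nucleotide frequencies, but the caret (^) is omitted for simplicity. It can be shown that equation (7) is a maximum-likelihood estimator of $d$ under the model of nucleotide substitution given in table 3. The large-sample variance of $\hat{d}$ is given by

$$V(\hat{d}) = \left[ (c_1^2 \hat{P}_1 + c_2^2 \hat{P}_2 + c_3^2 \hat{Q}) - (c_1 \hat{P}_1 + c_2 \hat{P}_2 + c_3 \hat{Q})^2 \right] / n, \tag{8}$$

where $n$ is the number of nucleotides examined, and

$$c_1 = \frac{\partial \hat{d}}{\partial \hat{P}_1} = \frac{2 g_A g_G g_R}{2 g_A g_G g_R - g_R^2 \hat{P}_1 - g_A g_G \hat{Q}}, \tag{9}$$

$$c_2 = \frac{\partial \hat{d}}{\partial \hat{P}_2} = \frac{2 g_T g_C g_Y}{2 g_T g_C g_Y - g_Y^2 \hat{P}_2 - g_T g_C \hat{Q}}, \tag{10}$$

$$c_3 = \frac{\partial \hat{d}}{\partial \hat{Q}} = \frac{2 g_A^2 g_G^2}{g_R (2 g_A g_G g_R - g_R^2 \hat{P}_1 - g_A g_G \hat{Q})} + \frac{2 g_T^2 g_C^2}{g_Y (2 g_T g_C g_Y - g_Y^2 \hat{P}_2 - g_T g_C \hat{Q})} + \frac{g_R^2 (g_T^2 + g_C^2) + g_Y^2 (g_A^2 + g_G^2)}{2 g_R^2 g_Y^2 - g_R g_Y \hat{Q}}. \tag{11}$$

518 Tamura and Nei

Our computer simulations have shown that equation (7) gives good estimates, unless $\hat{P}_1$, $\hat{P}_2$, and $\hat{Q}$ are very large and $n$ is relatively small. In the latter case, equation (...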
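As a rough numerical illustration of equations (3) through (11), the following Python sketch computes the expected proportions, the distance estimate, and its large-sample variance. The function names (`tn93_proportions`, `tn93_distance`, `tn93_coefficients`, `tn93_variance`) and the argument order are hypothetical choices made for this example and do not come from the text. The code assumes that the nucleotide frequencies sum to 1 and that the observed proportions are small enough for all three logarithms in equation (7) to be defined.

```python
import math


def tn93_proportions(g_a, g_t, g_c, g_g, alpha1, alpha2, beta, t):
    """Expected proportions P1, P2, Q from equations (4)-(6)."""
    g_r, g_y = g_a + g_g, g_t + g_c
    e_b = math.exp(-2.0 * beta * t)
    p1 = (2.0 * g_a * g_g / g_r) * (
        g_r + g_y * e_b - math.exp(-2.0 * (g_r * alpha1 + g_y * beta) * t))
    p2 = (2.0 * g_t * g_c / g_y) * (
        g_y + g_r * e_b - math.exp(-2.0 * (g_y * alpha2 + g_r * beta) * t))
    q = 2.0 * g_r * g_y * (1.0 - e_b)
    return p1, p2, q


def tn93_distance(g_a, g_t, g_c, g_g, p1, p2, q):
    """Distance estimate from equation (7).

    math.log raises ValueError when an argument is not positive,
    which happens when p1, p2, or q is too large for the model.
    """
    g_r, g_y = g_a + g_g, g_t + g_c
    k1 = 2.0 * g_a * g_g / g_r
    k2 = 2.0 * g_t * g_c / g_y
    k3 = 2.0 * (g_r * g_y - g_a * g_g * g_y / g_r - g_t * g_c * g_r / g_y)
    w1 = 1.0 - p1 / k1 - q / (2.0 * g_r)
    w2 = 1.0 - p2 / k2 - q / (2.0 * g_y)
    w3 = 1.0 - q / (2.0 * g_r * g_y)
    return -k1 * math.log(w1) - k2 * math.log(w2) - k3 * math.log(w3)


def tn93_coefficients(g_a, g_t, g_c, g_g, p1, p2, q):
    """Partial derivatives c1, c2, c3 from equations (9)-(11)."""
    g_r, g_y = g_a + g_g, g_t + g_c
    den1 = 2.0 * g_a * g_g * g_r - g_r ** 2 * p1 - g_a * g_g * q
    den2 = 2.0 * g_t * g_c * g_y - g_y ** 2 * p2 - g_t * g_c * q
    c1 = 2.0 * g_a * g_g * g_r / den1
    c2 = 2.0 * g_t * g_c * g_y / den2
    c3 = (2.0 * g_a ** 2 * g_g ** 2 / (g_r * den1)
          + 2.0 * g_t ** 2 * g_c ** 2 / (g_y * den2)
          + (g_r ** 2 * (g_t ** 2 + g_c ** 2)
             + g_y ** 2 * (g_a ** 2 + g_g ** 2))
          / (2.0 * g_r ** 2 * g_y ** 2 - g_r * g_y * q))
    return c1, c2, c3


def tn93_variance(g_a, g_t, g_c, g_g, p1, p2, q, n):
    """Large-sample variance of the distance estimate, equation (8)."""
    c1, c2, c3 = tn93_coefficients(g_a, g_t, g_c, g_g, p1, p2, q)
    second_moment = c1 ** 2 * p1 + c2 ** 2 * p2 + c3 ** 2 * q
    mean = c1 * p1 + c2 * p2 + c3 * q
    return (second_moment - mean ** 2) / n


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    freqs = (0.30, 0.25, 0.25, 0.20)  # g_A, g_T, g_C, g_G
    p1, p2, q = tn93_proportions(*freqs, alpha1=2.0, alpha2=3.0,
                                 beta=0.5, t=0.05)
    print(tn93_distance(*freqs, p1, p2, q))
    print(tn93_variance(*freqs, p1, p2, q, n=1000))
```

Feeding the expected proportions from equations (4) through (6) back into equation (7) should recover the value of $d$ in equation (3) up to floating-point error, which makes a convenient sanity check on any implementation.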
This note was uploaded on 01/06/2010 for the course NS 2750 taught by Professor Haas&gu during the Spring '08 term at Cornell.