des. The 20 sequences used for estimating the age of common ancestral mtDNA are indicated by asterisks. The numbers in parentheses refer to the sequence number in the original data from Vigilant et al. (1991).

Nucleotide Substitution in mtDNA

Table 1 Estimated Relative Frequencies of Nucleotide Substitutions in the Hypervariable Segments of the mtDNA Control Region in Humans
ORIGINAL              MUTANT NUCLEOTIDE
NUCLEOTIDE         A       T       C       G
A ...........      -       0.3     0.4     14.1
T ...........      1.1     -       33.8    0.3
C ...........      1.1     25.8    -       0.5
G ...........      20.0    1.1     1.6     -

mtDNA (Tamura 1992b). Close examination of the substitution pattern also suggests that the substitution rate depends on the frequency of the mutant nucleotide. Thus, $f_{GA}$ and $f_{TC}$ are higher than $f_{AG}$ and $f_{CT}$, respectively.

As mentioned earlier, the rate of nucleotide substitution $\lambda$ in the control region is known to vary substantially from site to site. Kocher and Wilson (1991) showed that the distribution of the number of nucleotide substitutions per site approximately follows the negative binomial distribution rather than the Poisson distribution. Since we have a larger set of substitution data, we reexamined this problem. Table 2 shows the observed numbers of sites showing 0, 1, 2, and ≥3 substitutions, together with the expected numbers obtained for the Poisson and the negative binomial distributions. The observed data follow the negative binomial rather than the Poisson distribution. The negative binomial distribution is known to be generated when the Poisson parameter $\lambda$ varies according to the following gamma distribution among sites (Johnson and Kotz 1973, pp. 124-125):

$$f(\lambda) = \frac{b^{a}}{\Gamma(a)} e^{-b\lambda} \lambda^{a-1}, \qquad (1)$$

where $a = \bar{\lambda}^2 / V(\lambda)$ and $b = a / \bar{\lambda}$, $\bar{\lambda}$ and $V(\lambda)$ being, respectively, the mean and variance of $\lambda$, and $\Gamma(a)$ is the gamma function. Therefore, one can derive an equation for the average number of nucleotide substitutions for the entire control region by using the above gamma distribution, as will be shown later. However, to estimate this average number, we need an estimate of parameter $a$ in equation (1). This parameter can be estimated by

$$a = m^2 / (s^2 - m), \qquad (2)$$

where $m$ and $s^2$ are, respectively, the mean and variance of the number of nucleotide substitutions per site (Johnson and Kotz 1973, p. 131). In the present case, we have $a = 0.11$ from observed data in table 2.
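As a quick check of equation (2), the following minimal sketch computes the moment estimate of $a$ from the mean and variance reported in the footnote to table 2. The function name `gamma_shape_from_moments` is a hypothetical label chosen for illustration and does not come from the original text.

```python
def gamma_shape_from_moments(m: float, s2: float) -> float:
    """Moment estimate a = m^2 / (s^2 - m) of the gamma shape parameter (equation 2)."""
    if s2 <= m:
        # Equation (2) only makes sense when the counts are overdispersed
        # relative to a Poisson distribution (variance greater than mean).
        raise ValueError("variance must exceed the mean")
    return m * m / (s2 - m)


# Mean and variance of the number of substitutions per site (table 2 footnote).
m, s2 = 0.1228, 0.2582

a = gamma_shape_from_moments(m, s2)
b = a / m  # b = a / mean, as defined below equation (1)

print(f"a = {a:.2f}, b = {b:.2f}")  # prints "a = 0.11, b = 0.91"
```

Because $s^2$ is roughly twice $m$ here, the denominator is comfortably positive; for data with variance close to the mean the estimate becomes unstable, which is why the sketch rejects $s^2 \le m$.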
This estimate of $a$ is virtually identical with that obtained by Kocher and Wilson (1991).

Mathematical Methods

On the basis of the pattern of nucleotide substitution observed in table 1, we propose that the mathematical model of nucleotide substitution presented in table 3

Tamura and Nei

Table 2 Observed and Expected Distributions of the Number of Nucleotide Substitutions in the Control Region of Human mtDNA

NO. OF SUBSTITUTIONS                NO. OF SITES^b
PER SITE^a             Observed    Poisson    Negative Binomial
0 ..............       1,028       987.0      1,028.0
1 ..............       58          121.2      59.3
2 ..............       21          7.4        17.4
≥3 .............       9           0.4        11.3

a Estimated by parsimony analysis of 20 human sequences. The mean and variance of the estimated number of nucleotide substitutions per site are $m = 0.1228$ and $s^2 = 0.2582$, respectively.
b The $\chi^2$ value for the difference between the observed and expected distributions is 97.8 (P < 0.001; 2 df) for the Poisson distribution and 1.24 (0.5 < P < 0.7; 2 df) for the negative binomial distribution. The number of degrees of freedom is 2 for the Poisson distribution because the substitution classes “2” and “≥3” were pooled and one parameter was estimated, whereas the number for the negative binomial distribution is 2 because all the four classes were used and two parameters were estimated. The a value used for the
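The goodness-of-fit values quoted in footnote b of table 2 can be reproduced from the tabulated counts. The sketch below is an illustration, not the authors' own procedure: it recomputes both $\chi^2$ statistics, pooling the last two classes for the Poisson case as the footnote describes, and cross-checks the Poisson expectation for the zero class. The helper names are hypothetical. The P value uses the fact that the chi-square survival function with 2 df is $e^{-x/2}$.

```python
import math

# Counts of sites with 0, 1, 2, and >=3 substitutions, copied from table 2.
observed = [1028, 58, 21, 9]
poisson_expected = [987.0, 121.2, 7.4, 0.4]
neg_binomial_expected = [1028.0, 59.3, 17.4, 11.3]


def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over classes."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def pool_last_two(values):
    """Merge the final two classes (here "2" and ">=3") into one."""
    return values[:-2] + [values[-2] + values[-1]]


# Poisson: classes "2" and ">=3" are pooled, as stated in footnote b.
chi2_poisson = chi_square(pool_last_two(observed), pool_last_two(poisson_expected))

# Negative binomial: all four classes are used.
chi2_nb = chi_square(observed, neg_binomial_expected)

# With 2 degrees of freedom, P(X > x) = exp(-x / 2).
p_nb = math.exp(-chi2_nb / 2)

# Cross-check of the Poisson zero class: N * exp(-m), with m from footnote a.
n_sites = sum(observed)
expected_zero = n_sites * math.exp(-0.1228)

print(f"chi2 (Poisson)           = {chi2_poisson:.1f}")
print(f"chi2 (negative binomial) = {chi2_nb:.2f}, P = {p_nb:.2f}")
print(f"Poisson expected zeros   = {expected_zero:.1f}")
```

The recomputed statistics agree with the footnote (about 97.8 and 1.24), and the negative binomial P value falls inside the stated 0.5 to 0.7 interval.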
This note was uploaded on 01/06/2010 for the course NS 2750 taught by Professor Haas&gu during the Spring '08 term at Cornell.