The Evolution of the Mitochondrial D-Loop Region and the Origin of Modern Man ’

The origin of modern man is a highly debated issue that has recently been tackled by using mitochondrial DNA sequences. The limited genetic variability of human mtDNA has been explained in terms of a recent common genetic ancestry, thus implying that all modern-population mtDNAs originated from a single woman who lived in Africa ~0.2 Mya. This divergence time is based on both the estimation of the rate of mtDNA change and its calibration date. Because different estimates of the rate of mtDNA evolution can completely change the scenario of the origin of modem man, we have reanalyzed the available mitochondrial sequence data by using an improved version of the statistical model, the “Markov clock,” devised in our laboratory. Our analysis supports the African origin of modem man, but we found that the ancestral female from which all extant human mtDNAs originated lived in a time span of 0.3-0.8 Mya. Pushing back the date of the deepest root of the humans implies that the earliest divergence would have been in the Homo erectus population.


Introduction
In the past decade, interpretations of new genetic evidence-in particular, massive sequencing of selected regions of mitochondrial DNA (mtDNA)-has engendered much debate about the origin of modem man. The limited genetic variability of human mtDNA has been explained in terms of a recent common genetic ancestry (Cann et al. 1987;Lewin 1987;Wainscoat 1987;Wilson et al. 1987;Stringer and Andrews 1988). Modern populations are thought to have their mtDNA originating from a single woman who lived in Africa in a small diverging population that successively spread throughout the world, replacing preexisting populations without allowing further admixture. The date of this event has been estimated by Cann et al. ( 1987) to be 0.2 Mya.
This "single origin" hypothesis has been recently termed the "Garden of Eden" hypothesis ( Wainscoat 1987). It competes with two other models of the origin of the human species: ( 1) the "multiregional origin" model, which postulates that modern populations originated independently in different areas of the world, from the evolution of the predecessors of local ancient Homo erectus individuals, and (2) the "gene-flow/ hybridization" model, which postulates that there was neither a single center nor a worldwide origin of modern populations, whose genes should thus derive from various ancestral humans ( Wolpoff 1989).
As stressed above, the strongest genetic support for the "Garden of Eden" hypothesis stems from th< interpretations of the sequence variability of the main regulatory region of the mtDNA, the so called D-loop region (in particular, from the estimate of the rate of mtDNA evolution) and the inference of the time of the deepest root of the human mtDNA tree (Stoneking et al. 1986 ). There has been much debate, essentially based on the mathematical models used by various authors, about the rate of mtDNA change and its calibration. Wilson's group put the age of modern human mtDNAs within the range of 0.14-0.28 Mya (Cann et al. 1987;Vigilant et al. 1989, 199 1). The data of the D-loop region have been recently analyzed by Hasegawa and Horai ( 199 I), who determined the time of the deepest root of all extant human mtDNAs at 0.28 + 0.05 Mya. This time estimate, considered in light of its statistical fluctuation, does not allow us to prefer any of the three competing models for the origin of modern man. Since different estimates of mtDNA evolution can completely change the scenario of the origin of the human species, we reanalyzed the available sequence data by using the "Markov clock" method (Preparata and Saccone 1987;Saccone et al. 1990). We applied this method to 16 sequences ( 14 human sequences and 2 chimpanzee species) of the regulatory region of mtDNA, and, because no model of homogeneous evolution of the D-loop sites can fit the data, we here present an improved version of the Markov model, one that holds when only a fraction of the sites under examination evolve at an appreciable rate.

Sequence Data
The entire D-loop region of 14 human sequences and two chimpanzee speciescommon (Pan troglodytes) and pygmy (P. pygrnaeus and P. paniscus)-were aligned and analyzed. The sequence data and the multiple alignment are from Kocker and Wilson ( 199 1). The human sequences correspond to the most divergent mtDNA sequences previously studied by Cann et al. ( 1987 ). The alignment has 1,135 sites, of which 203 carry nucleotide substitutions, and 900 are identical in all 16 sequences. In addition we analyzed nine more sequences of two noncontiguous segments of the left and right domain of the D-loop, -650 nucleotides long, of !Kung (Botswana) individuals, published by Vigilant et al. ( 1989).

Method
The stochastic model adopted for drawing phylogenetic trees and evaluating the evolutionary rates is that described by Saccone et al. ( 1990). It is based on a Markov stationary process that for its application requires that the nucleotide frequencies of the various sequences be the same within the statistical fluctuations. For each pair of sequences the evolutionary process is described by the 4 X 4 rate matrix &( T) , which fully describes the propensity of nucleotide j to be substituted by nucleotide i within time T (i, j = A, C, G, or T). The RU matrix is obtained from the experimentally observed data, through the substitution matrix Nij that counts the number of times the nucleotide i in the first sequence has become j in the second. To introduce the effect of "invariable" sites in D-loop evolution, the Nti matrices have been recalculated in the following way: where f is the fraction of sites free to vary, L is the sequence length, and qi is the frequency of the ith nucleotide.
The Markov clock "renormalization" that we use to eliminate those sites whose evolutionary dynamics, because of some functional constraint, are much slower than those of others, is admittedly rather rudimentary, but it is the simplest that we can find. Indeed, our renormalization simply assumes that a fraction of sites ( 1 -f) (fr 1) with a stationary base composition identical to the variable fraction fdoes not participate in the basic Markov process. This introduces into our analysis a new parameter f, which we shall adjust to observation. By fixing fand the time of divergence of any one of the pair of species under analysis, our renormalized Markov-clock model produces a set of theoretical Nii ( T) matrices, that are functions of the time of divergence and that can be compared with those observed, in order to determine the relevant times of divergence.

Results
By assuming that f = 1 in our previous, unrenormalized model and by fixing the time of divergence between human and chimp at 5 Mya (Wilson and Sarich 1967), we have obtained the tree shown in figure la. From the root of all the human sequences depart two primary branches, one leading to HO2 and the other leading to the common ancestor of all the 13 remaining human mtDNA types. Including the sequences from Vigilant et al. ( 1989), we obtained a phylogenetic tree (fig. lb) in which, as expected, all the !Kung sequences clustered together because of both the low genetic diversity and the low average heterozygosity observed in this population and HO2 remained the first to diverge on the human tree. If we assume that the time of divergence of this root corresponds to the age of "our common mitochondrial mother" (Wilson et al. 1987), we obtain 0.8 + 0.2 Mya.
It has already been observed that a homogeneous model of gene evolution cannot be applied to the mitochondrial D-loop, because of the various functional constraints acting on this regulatory DNA region (Hasegawa and Horai 199 1;Saccone et al. 199 1) . The claimed high evolutionary rate for this region of the mitochondrial genome does not imply that all the possible substitutions are selectively neutral. The regulatory role of this region, which involves various DNA-protein interactions, may well exert a strong functional constraint on many sites. The nature and the extent of the functional constraints-and a possible structure-function model-have been discussed by Saccone et al. ( 199 1). The tendency of a nucleotide site to undergo a substitution process is strongly related to the extent of the functional constraint acting on that particular site. In this sense each site could be characterized by its own evolutionary dynamics, but a stochastic description of the process necessitates the grouping together of all the sites that presumably are subjected to the same evolutionary dynamics. The problem can be simplified by assuming that the sequences to be analyzed are composed of two homogeneous classes of sites: ( 1) equally variable sites (varied + unvaried variables) and ( 2) invariable sites (in this class we also include those sites that change so slowly that, for practical purposes, they can be considered invariable ) .
Starting from the average Nij substitution matrix observed between humans and chimpanzees and fixing the time of divergence between human and chimpanzee at 5 Mya, we can calculate, by applying our Markov model, the expected substitution matrix as a function of the time T. In the stochastic process nonhomogeneity of the nucleotide sites can be recognized by the observation of the time dependewe of the  I   I  I  I  I  I  I  I  I  I  I  I  I  tution matrices calculated for the sequences under examination. The TS/TV observed by comparing HO2 with the other human sequences is, on average, 12.3, compared with the theoretically expected 3.9 f 0.4 (see fig. 2a). In the observed sequences we noted a quick drop in the TS/TV when going from the intraspecies comparison of humans to pygmy chimpanzee-common chimpanzee and to humans-chimpanzee. This particular pattern can be explained as being due to a limited fraction of sites that are effectively free to accumulate substitutions, thus allowing multiple hits to occur at the same sites. To determine fwe proceeded as follows: by assigning the time of divergence between human and chimpanzee ( THc) at the most commonly used date of 5 Mya (Wilson and Sarich 1967) and varying ffrom 0.26 to 1.00, we determined both the curve of the TS/TV ratio as a function of time and the time of divergence of the root of humans ( TDR) and of the pygmy and common chimpanzees ( TX). By comparing the observed TS/TV ratio with that expected, we determined a x2 ( The error range is determined according to the classical x2 minimization functions for which the best estimate of the parameter to be determined by this function ranges between -1 minimum x2 and + 1 minimum x2. Using equation ( 3 ) The presence of invariable sites, which is now taken into account by introducing the parameter f, does not influence the topology of the tree but strongly affects the times of divergence. Obviously the choice of T HC = 5 Mya, which does not affect the f estimate, also changes the relative times of divergence, proportionally to their lengths.
In this way the order of the nodal times remains conserved. If we now plot the TS/ TV ratio against the time T, fixing f = 0.28 and T HC = 5 Mya, we obtain a theoretical curve ( fig. 2b) that fits the observed TS/TV data very well.
We are now able to calculate the average substitution rate for the D-loop according to a method described by Saccone et al. ( 1990). The average rate is 12 sf: 5 substitutions/ 100 variable sites / Myr.

Discussion
The adequacy of a given evolutionary model can be established on the basis of how well the theoretical description of the evolutionary process fits the observed one. It has been shown that the substitution process is not the same for all nucleotide positions. Furthermore, in many cases the evolutionary process concerns only some sites, the others being practically invariable (Fitch and Margoliash 1967;Fitch 1986~). If the presence of invariable sites is not taken into account when the sequence divergence is calculated, the application of any evolutionary model produces inconsistent results (Fitch 1986b;Shoemaker and Fitch 1989 ). Correct estimates of the evolutionary rate are of fundamental importance to obtain a reliable time-scale of evolutionary events. Rate values obtained by mixing together such different dynamics as those of the three Mitochondrial D-Loop Region Evolution and Modern Man 595 codon positions, tRNAs, rRNAs, and the regulatory regions are not very reliable. The methodology that takes multiple hits into account by adjusting the number of TS to 10 times that of TV (Vigilant et al. 1989)-and, more recently, to 15 times that of TV (Vigilant et al. 1991 )-appears to be rather rough. In fact it does not consider the possibility of multiple TV on the same site-which, especially in the humanchimp comparison, are likely due to the small fraction of sites free to vary, as well as to the relatively high extent of divergence.
The crucial point is a believable and consistent determination of the evolutionary rate. The rate estimates based on restriction analyses of the mtDNA (Stoneking et al. 1986) and the phylogenetic trees constructed through a parsimony principle (Wilson et al. 1987) may not be very reliable. Indeed, the disagreement in the estimate of the substitution rate of the total mtDNA-given as 2%-4% /Myr by Wilson et al. ( 1985) and as 1.4% /Myr by Nei ( 1985)-bear witness to the weakness of such analysis. On the other hand, our simple, self-consistent model for neutral evolution, the stationary Markov clock (Preparata and Saccone 1987;Saccone et al. 1990)) contains within its basic structure the criteria to assess its applicability to a specific problem in evolutionary dynamics. With no arbitrary assumptions about the structure of the basic stochastic rate matrix and now with a further element that allows us to eliminate from consideration the hypovariable sites, we can analyze the evolution of the D-loop in mtDNA. At the same time we obtain an extra bonus: a reliable estimate of the statistical fluctuations associated with the relatively small number of variable sites ( -320, according to our analysis).
We have estimated the time of divergence T DR by applying a corrected version of the Markov model. The reliability of this result is confirmed by the good match between the 12.3 observed value and the 10.4 f 2.1 theoretically expected value of the TS/TV ratio, as shown in figure 2b. Hasegawa and Horai ( 199 1 ), by analyzing a smaller sequence sample composed of three different data sets of incomplete sequences of the D-loop region, estimated that the fraction of the variable sites is 0.27-0.31, depending on the data set, and estimated that the time of the deepest root of the humans mtDNA tree was 0.28 f 0.05 Mya. However, their tree is calibrated by dating the split between human and chimpanzee at 4 Mya. Their methodology is based on a Markov analysis that is different from that used by us. In particular, our analysis is based on the observed matrices of substitutions under the fulfilment of the stationary condition but without any "a priori" condition on the structure of the rate matrix. The Hasegawa and Horai ( 199 1) Markov analysis, on the other hand, is based on a two-parameter model that assumes that all TS are as probable as all TV. This is not the case for mammalian mtDNA, which displays considerable asymmetry in the base composition of the two strands (Cantatore and Saccone 1987).
The ancestral date that we propose, being longer ago than the one inferred by Cann et al. ( 1987), has important implications for the question of the origin of modern man. The ancestral date that we propose depends on the reference date and would be even greater earlier had we chosen an earlier date for the divergence between human and chimpanzee. Fossil records dated at -5.5 Mya have been ascribed to the line leading to hominids after the separation from the chimpanzee line (Groves 1989). Because fossils only provide a minimum date of divergence, 5 Mya is probably a low estimate of the date of the ape-human divergence, while the estimate of 7.4 f 1.5 Mya, previously obtained on the basis of the ~rl-globin genes (Holmes et al. 1989), looks more reliable because of the exceptionally good molecular clock of this pseu- dogene. This 7.4-Mya estimate is further supported by all the other times of divergence obtained in the present study, which agree well with paleontological estimates ( Andrews 1986). Changing the human-chimp calibrating time from 5 to 7.4 Mya would date the root of the human tree at 0.6 f 0.2 Mya.
The divergence times reported by Hasegawa and Horai ( 199 1) do not discriminate between the two competing hypotheses put forward for the human origin-i.e., the "Garden of Eden" model and the multiregional/gene flow model. They state that their time estimate "is consistent with previous estimates from restriction analysis and with the data from Vigilant et al. ( 1989)." Our data, on the other hand, put the age of the common ancestral human mitochondrion at either 0.4 + 0.1 Mya or 0.6 + 0.2 Mya, depending on whether one considers the human-chimp divergence to be 5 Mya or 7.4 Mya. Both dates put the ancestral human mitochondrion in the Homo erectus period. These two times estimates, in light of their statistical fluctuations, determine a time span of 0.3-0.8 Mya.
Our analysis supports the African origin of modern man and thus is in agreement with the observations, based on both mitochondrial and nuclear analyses (Nei and Livshits 1989;Rapacz et al. 199 1) , of a greater genetic variability-both within Africans and between Africans and other populations-compared with that within the Euro-Asiatic populations. In addition, by pushing back the date of the root of humans, our analysis implies that the ancestor would have been in the H. erectus population. Thus the appearance of H. sapiens need not have been a speciation event. The time that we propose is more recent than the 1.0 Mya estimated for the first appearance of H. erectus outside of Africa (Simons 1989)) although this latter estimate is debatable (Eckhardt 1987).
The results presented in the present paper are further supported by a recent report by Xiong et al. ( 199 1)) who demonstrate that a Japanese defective apo-cI1 allele arose -0.5 Mya. The claimed occurrence of a severe bottleneck in the human species -0.2 Mya-which would have rendered a recent speciation event more likely-is thus weakened by the antiquity of this Asian allele.