Although fossil remains show that anatomically modern humans dispersed out of Africa into the Near East ∼100 to 130 ka, genetic evidence from extant populations has suggested that non-Africans descend primarily from a single successful later migration. Within the human mitochondrial DNA (mtDNA) tree, haplogroup L3 encompasses not only many sub-Saharan Africans but also all ancient non-African lineages, and its age therefore provides an upper bound for the dispersal out of Africa. An analysis of 369 complete African L3 sequences places this maximum at ∼70 ka, virtually ruling out a successful exit before 74 ka, the date of the Toba volcanic supereruption in Sumatra. The similarity of the age of L3 to its two non-African daughter haplogroups, M and N, suggests that the same process was likely responsible for both the L3 expansion in Eastern Africa and the dispersal of a small group of modern humans out of Africa to settle the rest of the world. The timing of the expansion of L3 suggests a link to improved climatic conditions after ∼70 ka in Eastern and Central Africa rather than to symbolically mediated behavior, which evidently arose considerably earlier. The L3 mtDNA pool within Africa suggests a migration from Eastern Africa to Central Africa ∼60 to 35 ka and major migrations in the immediate postglacial again linked to climate. The largest population size increase seen in the L3 data is 3–4 ka in Central Africa, corresponding to Bantu expansions, leading diverse L3 lineages to spread into Eastern and Southern Africa in the last 3–2 ka.
Since the emergence of Homo sapiens 150–200 thousand years (ka) ago, the species is thought to have lived in small scattered populations and been at risk of extinction for much of its existence (Newman 1995; Willoughby 2007). This low effective population size is reflected in deeper parts of the mitochondrial DNA (mtDNA) tree of modern humans. Although the tree is highly starlike at shallower time depths, suggesting numerous episodes of rapid growth in the human population in the more recent past, it is only at a third of the time depth of the entire tree with the emergence of the L3 haplogroup that the first multifurcating node occurs (seven subbranches, including haplogroups M and N that encompass between them all the ancient diversity observed outside Africa) (Watson et al. 1997; Torroni et al. 2006; Behar et al. 2008). The explanation for this dramatic early population expansion after the appearance of L3 is unknown, but advances in technological or cognitive capacities and response to climatic change have been suggested (Klein 1992; Forster 2004; Mellars 2006; Scholz et al. 2007). Two main routes out of Africa are still widely discussed: the Sinai, linking Egypt to the Levant, and the Bab el-Mandab strait, from the Horn of Africa to the Arabian Peninsula (Derricourt 2005). Early dates for the settlement of Southeast Asia by ∼50 ka (Barker et al. 2007) and Australia by 48 ka (Turney et al. 2001) combined with the distribution and ages of mtDNA lineages (Kivisild et al. 2003; Macaulay et al. 2005; Thangaraj et al. 2005; Atkinson et al. 2008; Soares et al. 2009) have suggested a “southern coastal route” via Arabia (Field and Lahr 2005) as the sole major exit from Africa in the Late Pleistocene (Beyin 2006; Mellars 2006; Richards et al. 2006; Bulbeck 2007). Nevertheless, controversy about the timing in particular remains.
It is usually assumed that the exit should take place during an interglacial period, implying either marine isotope stage 5 (MIS 5 at 130–75 ka) or MIS 3 (60–25 ka) (Bar-Yosef 1995). An early arrival in South Asia (as early as 125–130 ka) has recently been proposed on the basis of lithic evidence in India and the United Arab Emirates (Haslam et al. 2010), with tool kit affinities between Arabia and the late Middle Stone Age (MSA) of northeast Africa suggesting (although not to all: Lawler 2011) that low sea level and increased rainfall during the transition between MIS 6 and 5, rather than technological innovation, might have allowed this proposed early dispersal into Arabia (Armitage et al. 2011). This would imply that the early movement of modern humans into the Levant at 100–130 ka known from fossil evidence (Grün et al. 2005) led to further expansions into Asia. Several other models propose an exit from Africa prior to the Toba supereruption in Sumatra at 73.5 ka. For example, Oppenheimer (2003) has proposed an exit toward the end of stage 5, due to a fall in the salinity of the Red Sea. On this scenario, early modern Asians might have been profoundly affected by the Toba eruption, although continuity of toolkits across the Toba layer in India has also been proposed (Petraglia et al. 2007).
Cohen et al. (2007) have argued for a similar model, providing evidence for several “megadroughts” across tropical Africa 130–90 ka, and proposing that the dispersal of modern humans took place during stage 5, 90–70 ka, albeit along the Nile valley, in the crossover period before high aridity hit Eurasia after 70 ka. However, Scholz et al. (2007) argued from the same data that the stable moist phase in tropical Africa during the glacial period of MIS 4, beginning ∼70 ka, most likely coincided with the expansion of modern human populations, fitting a spread from Africa a little later, by ∼60 ka. Mellars (2006) also invoked environmental change and a similar expansion time but coupled this to innovations related to behavioral modernity.
The time window for the out-of-Africa migration on the basis of mtDNA lies between the emergence of haplogroup L3 in Eastern Africa and its derivative non-African haplogroups M and N, which most likely arose during the departure or outside Africa (Richards et al. 2006). Since the age of L3 places an upper bound on the time of the out-of-Africa migration, a sufficiently precise estimate would allow us to test these hypotheses—at least to the extent of determining whether maternal lineages from any putative pre-Toba dispersal might have survived to the present day. The mtDNA molecular clock estimates have hitherto, however, indicated a range for the time of the most recent common ancestor (TMRCA) of haplogroup L3 at around 50–90 ka (Salas et al. 2002; Macaulay et al. 2005; Behar et al. 2008; Endicott and Ho 2008; Soares et al. 2009), placing a rather imprecise upper bound on the timing of the dispersal. An even younger date for the exit better fits estimates from Y-chromosome variation (Shi et al. 2010), but there is considerable uncertainty about Y-chromosome mutation rates (Zhivotovsky et al. 2004).
The population expansions after the emergence of haplogroup L3 most likely led not only to global colonization but also to range expansions within Africa. There is an increase in microlithic/Mode 5 (Later Stone Age or LSA) technologies in the archaeological record from ∼50 ka onward throughout Africa, but this is uneven and gradual at best for much of the continent (Phillipson 2005; Barham and Mitchell 2008). Climatology has suggested more recent episodes of climate change in Africa that might have led to population movements and expansion. For example, following the last glacial maximum ∼21 ka and the Younger Dryas cold snap, moist conditions prevailed in West Africa from 11.5 to 5.1 ka (Weldeab et al. 2007), and in the Sahara, the Holocene climatic optimum lasted until 7.3 ka (Kuper and Kröpelin 2006), with a shift to the present aridity from ∼6 ka (Brooks et al. 2005). These early Holocene conditions likely influenced the spread of pastoralism into the Sahara (Brooks et al. 2005; Kuper and Kröpelin 2006). One major expansion in Africa that left a reasonably clear archaeological, linguistic, and genetic trail is the spread of Bantu speakers, alongside early domesticated resources and subsequently ironworking, that began in West-Central Africa by ∼3.5 ka, resulting in continent-wide dispersals both south and east (Phillipson 2005).
Eastern Africa, thought to be the most genetically diverse region of the world and the likely place of origin of L3 (Torroni et al. 2006) as well as the source for non-African populations (Tishkoff et al. 2009), is clearly pivotal to most of these expansions (Scheinfeldt et al. 2010; Campbell and Tishkoff 2011). There is also evidence that at least part of this region acted as a refuge during the severest climatic episodes of the last several hundred thousand years, particularly around wooded lake margins and perhaps also in coastal regions (Barham and Mitchell 2008; Basell 2008)—although Compton (2011) has recently suggested that the Mediterranean coast and the South African coastal plain may also have acted as refugia for humans during this period. Moreover, Eastern Africa was a key secondary dispersal area for the Bantu expansion (Salas et al. 2002; Phillipson 2005). We have therefore enriched the databases of mtDNA L3 diversity in Eastern Africa by firstly characterizing at low-resolution 327 samples from Sudan, Ethiopia, and Somalia and then selecting from these 70 L3 lineages for complete mtDNA genome sequencing. The application of several methods and mutation rates for the evaluation of TMRCAs allowed us to narrow the time span for the emergence of L3 and establish an upper bound for the out-of-Africa migration. In addition, Bayesian skyline analysis of 328 complete L3 sequences and founder analysis of 2,359 L3 hypervariable segment I (HVS-I) sequences enabled us to infer both local demographic expansions and migrations within Africa.
Materials and Methods
Samples, mtDNA Sequencing, and Haplogroup Affiliation
We collected a total of 102 Sudanese, 77 Ethiopian (both emigrants in Dubai), and 148 Somali (refugees in Yemen) samples, belonging to unrelated individuals, who gave appropriate informed consent for their biological samples to be used for mtDNA characterization. The work complied with the Helsinki Declaration of Ethical Principles (59th World Medical Association General Assembly, Seoul, October 2008) and was approved by the Ethics Committee of the University of Porto (11/CEUP/2011). We sequenced hypervariable segments I and II (HVS-I and HVS-II) of all Sudanese and Ethiopian samples and HVS-I in the Somali population, using a procedure described previously (Pereira et al. 2000). This information was used to assign samples to haplogroups, following the most recent phylogenetic evidence reported on the PhyloTree website (van Oven and Kayser 2009). The sequences obtained are reported in supplementary table S1 and supplementary material 2, Supplementary Material online.
We selected for complete mtDNA sequencing a total of 21 Sudanese, 16 Ethiopian, and 20 Somali samples chosen from the sequences characterized in this work, and 11 from Chad and 2 from Soqotra belonging to haplogroup L3 from data sets published previously (Cerný et al. 2007; Cerný, Pereira, et al. 2009). We followed the methodology and checking procedures reported in Pereira et al. (2007), and mutations were scored relative to the revised Cambridge reference sequence (Andrews et al. 1999). The 70 complete mtDNA sequences are deposited in GenBank (accession numbers JN655773–JN655842).
For the L3 phylogeny reconstruction, preliminary reduced-median network analyses (Bandelt et al. 1995) led to a suggested branching order for the trees, which we then constructed most parsimoniously by hand. We used the software mtDNA-GeneSyn (Pereira et al. 2009) to convert files.
For estimation of the TMRCA for specific clades in the phylogeny, we used the ρ statistic (Forster et al. 1996) and maximum likelihood (ML). We used ρ (the mean sequence divergence from the inferred ancestral haplotype of the clade in question) with a mutation rate estimate for the complete mtDNA sequence of one substitution in every 3,624 years, correcting for purifying selection using the calculator provided (Soares et al. 2009) and a synonymous mutation rate of one substitution in every 7,884 years. We estimated standard errors as in Saillard et al. (2000). We also obtained ML estimates of branch lengths using PAML 3.13 (Yang 1997), assuming the HKY85 mutation model with gamma-distributed rates (approximated by a discrete distribution with 32 categories). We converted mutational distance in ML to time using the same complete mtDNA genome clock.
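The ρ calculation described above can be illustrated with a minimal sketch. It assumes the standard error takes the form usually attributed to Saillard et al. (2000), in which each branch contributes its mutation count weighted by the square of the number of sampled sequences descending from it. It uses the synonymous clock because that conversion is linear; the purifying-selection correction applied to the complete-genome clock is not reproduced here. The function name and the toy clade are hypothetical and are not data from the study.

```python
from math import sqrt

# Synonymous clock quoted in the text: one substitution every 7,884 years.
YEARS_PER_SYNONYMOUS_SUBSTITUTION = 7884


def rho_with_se(branches, n_samples):
    """Return (rho, standard error) for one clade.

    branches  : iterable of (mutations_on_branch, sampled_descendants) pairs,
                one pair per branch in the clade's tree.
    n_samples : total number of sampled sequences in the clade.
    """
    # rho: mean number of mutations separating each sample from the root.
    rho = sum(l * n for l, n in branches) / n_samples
    # Assumed Saillard-style variance: sum(n_i^2 * l_i) / n^2.
    variance = sum(l * n * n for l, n in branches) / n_samples ** 2
    return rho, sqrt(variance)


# Hypothetical toy clade of 4 sequences (illustration only).
toy_branches = [(1, 2), (1, 1), (2, 1)]
rho, se = rho_with_se(toy_branches, n_samples=4)
age_years = rho * YEARS_PER_SYNONYMOUS_SUBSTITUTION
se_years = se * YEARS_PER_SYNONYMOUS_SUBSTITUTION
print(f"rho = {rho:.2f} +/- {se:.2f}; age = {age_years:.0f} +/- {se_years:.0f} years")
```

In a perfectly starlike clade every branch has a single descendant, so the variance reduces to ρ/n; shared internal branches inflate it.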
We obtained Bayesian skyline plots (BSPs) (Drummond et al. 2005) from BEAST 1.4.6 (Drummond and Rambaut 2007) for the complete L3 sequences with a relaxed molecular clock (Drummond et al. 2006) (lognormal in distribution across branches and uncorrelated between them) and the HKY model of nucleotide substitutions with gamma-distributed rates. Previous ML analyses indicated that HKY is an adequate model, and it is not necessary to use more complex models (Soares et al. 2009). BSPs estimate the effective population size through time using random sequences from a given population.
Haplogroup L3 clearly does not equate to population data, but the pattern associated with this haplogroup might nevertheless signal demographic processes in the populations carrying it, as suggested previously (Atkinson et al. 2009). The main purpose of this analysis was to provide BSPs based on the L3 data, not to perform an independent calibration of the L3 phylogeny, since we did not discern any secure calibration point within L3. We therefore set up the analysis so that its mutation rate approximated the one used in the remaining analyses. This was based on the age of L3, assigning a normal distribution centered on 65 ka with a 95% interval of 60 to 70 ka, based on the age estimates obtained with ρ and ML. Since the mutation rate is time dependent, we provided minimum age estimates for some of the younger clades, maintaining some level of independence from the remaining analyses with ρ and ML. Some of the subclades within L3d1a1, L3e1a, L3e1b, L3e1d, L3e3a, L3e4a, L3f1b1, and L3f1b4 that are present in Southern Africa (and Central Africa) were assumed to have arrived there due to Bantu migrations (Newman 1995), a notion bolstered by the founder analysis results (see below). The age attributed to these clades was therefore at least 3.5 ka, since each clade must be at least as old as the Bantu expansion but could be older if it already carried some diversity in Central Africa before the expansion southward began. To take this into account, we set an exponential distribution with a mode of 3.5 ka and a 95th percentile of 8 ka, allowing the age to be substantially greater than 3.5 ka. BEAST uses a Markov chain Monte Carlo (MCMC) approach to sample from the posterior distributions of model parameters (branching times in the tree and substitution rates). Specifically, we ran 100,000,000 iterations, with samples drawn every 10,000 MCMC steps, after a discarded burn-in of 10,000,000 steps.
We checked for convergence to the stationary distribution and sufficient sampling by inspection of posterior samples, and we visualized BSPs with Tracer v1.3. We used a generation time of 25 years, as in Fagundes et al. (2008), and forced the larger subhaplogroups to be monophyletic in the analysis since we aimed at a tree structure that is directly comparable with the remaining analyses.
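The prior settings and chain length quoted above imply a few numbers that are easy to check. The sketch below rests on two assumptions that the text does not state outright: that the L3 root prior is a normal distribution whose central 95% interval spans 60 to 70 ka, and that the prior on the Bantu-associated clades is an exponential distribution offset to start at 3.5 ka (so that its mode is 3.5 ka). All variable names are illustrative, and nothing here is BEAST configuration syntax.

```python
from math import log
from statistics import NormalDist

# Root prior: normal centered on 65 ka, assumed central 95% interval 60-70 ka.
root_mean_ka = 65.0
z_975 = NormalDist().inv_cdf(0.975)          # about 1.96
root_sd_ka = (70.0 - root_mean_ka) / z_975   # implied standard deviation

# Clade prior: assumed offset exponential starting at 3.5 ka, with 95% of
# its mass below 8 ka. For an exponential, P(X <= x) = 1 - exp(-x / mean).
offset_ka = 3.5
percentile_95_ka = 8.0
exp_mean_ka = (percentile_95_ka - offset_ka) / log(20.0)

# Chain bookkeeping: states logged over the run, and the number of logged
# states that a 10,000,000-step burn-in corresponds to.
chain_length = 100_000_000
sampling_interval = 10_000
burn_in_steps = 10_000_000
logged_states = chain_length // sampling_interval
burn_in_states = burn_in_steps // sampling_interval

print(f"root prior sd = {root_sd_ka:.2f} ka")
print(f"exponential mean above the offset = {exp_mean_ka:.2f} ka")
print(f"logged states = {logged_states}, burn-in equivalent = {burn_in_states}")
```

Whether the burn-in is counted within the 100,000,000 iterations or precedes them is not specified in the text, so the sketch reports both counts rather than a single number of retained samples.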
We used the mutation rate obtained in the L3 overall tree to run analyses for the subhaplogroups and subregions. The subhaplogroup analyses allowed us to distinguish which clades were mainly responsible for the increases in effective population size observed in the overall L3 analysis. The subregion analyses allowed us to observe whether the L3 sequences of a given region carry the signal of a given population increase, indicating a possible population expansion within that region involving several of the subclades. To compare and describe systematically the periods of increase in the effective population size of the BSP, we calculated the rate of population size change through time, expressed as the number of individuals gained per 100 individuals over a period of 100 years (see supplementary fig. S1, Supplementary Material online, for a plot of this value for the BSP of overall L3). We defined a rate of at least 1 individual per 100 individuals per 100 years as a steep increase, which matches the visual impression given by the BSPs well.
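The rate of population size change can be sketched as follows. The exact formula is not given in the text, so this assumes a simple linear rate between consecutive BSP points: the relative change in effective size, scaled to 100 individuals and to 100 years. The function name, the threshold constant, and the example values are hypothetical.

```python
STEEP_THRESHOLD = 1.0  # individuals per 100 individuals per 100 years


def growth_rates(bsp_points):
    """Rate of change between consecutive BSP points.

    bsp_points: list of (years_before_present, effective_size) tuples,
                ordered from oldest to youngest.
    Returns a list of (older_time, younger_time, rate) tuples, where rate
    is individuals gained per 100 individuals per 100 years.
    """
    rates = []
    for (t_old, n_old), (t_young, n_young) in zip(bsp_points, bsp_points[1:]):
        percent_change = (n_young - n_old) / n_old * 100.0
        centuries = (t_old - t_young) / 100.0
        rates.append((t_old, t_young, percent_change / centuries))
    return rates


# Hypothetical BSP medians (illustration only).
example = [(5000, 10_000), (4000, 10_500), (3000, 14_000)]
rates = growth_rates(example)
steep = [(a, b) for a, b, r in rates if r >= STEEP_THRESHOLD]
print(rates)
print("steep intervals:", steep)
```

With these toy values only the interval from 4,000 to 3,000 years ago exceeds the threshold.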
To visualize the geographical distribution of L3 and its subhaplogroups, we constructed interpolation maps using the “Spatial Analyst Extension” of ArcView version 3.2 (www.esri.com/software/arcview/). We used the “inverse distance weighted” (IDW) option with a power of two for the interpolation of the surface. IDW assumes that each input point has a local influence that decreases with distance. The geographic location used is the center of the distribution area from which the individual samples of each population were collected. The data used are listed in supplementary table S2, Supplementary Material online.
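Inverse distance weighting with a power of two can be illustrated with a short sketch. It is not the ArcView implementation: it weights every input point rather than a search neighborhood, and it uses planar distances on arbitrary coordinates rather than geographic ones. The function name and the sample frequencies are hypothetical.

```python
from math import hypot


def idw(points, x, y, power=2.0):
    """Interpolate a value at (x, y) from (px, py, value) sample points.

    Each sample is weighted by 1 / distance**power, so nearby samples
    dominate and the influence of a sample decays with distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in points:
        distance = hypot(x - px, y - py)
        if distance == 0.0:
            # Exactly on a sample location: return the observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical haplogroup frequencies at two population centers.
samples = [(0.0, 0.0, 0.10), (2.0, 0.0, 0.30)]
midpoint = idw(samples, 1.0, 0.0)      # equidistant from both samples
near_first = idw(samples, 0.5, 0.0)    # pulled toward the first sample
print(midpoint, near_first)
```

Raising the power makes the surface more local; a power of two, as used for the maps, is a common default.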
We employed a founder analysis (Richards et al. 2000) for the L3 sequences in Africa. It assumes a strict division between source and sink populations and a series of criteria to partly account for homoplasy and back migrations between regions. The distinction between source and sink populations is straightforward for most lineages in Africa, except for some deep-ancestry clades which are present in both Eastern Africa and Central Africa. Some cautionary steps were taken in order to allow for the effect of gene flow in both directions. We performed the following founder analyses:
(1) From Eastern Africa into Central Africa: Eastern Africa was the source of most of the ancient L3 variation, although some subclades (L3b, L3d, and L3e, as we will discuss in Results) most probably emerged in Central Africa, with their presence in Eastern Africa related to migrants from Central Africa and not the other way around; so these subclades were not assumed to be founders from Eastern Africa. Instead, two founders (containing the entire L3bd and L3e clades) were accounted for as Eastern African founders into Central Africa. We opted for enlarged definitions of Central Africa (including all of Nigeria) and East Africa (extending from Sudan to Tanzania) for the four founder analyses to minimize potential problems with genetic drift that might arise with selected smaller data sets. In this case, the inclusion of all of Nigeria in Central Africa would only increase the data set of the sink population and is perfectly valid when considering an eastern source.
(2) From Central Africa into Eastern Africa: Central African haplogroups were involved in important expansions in Central Africa, namely the Bantu expansion. In this case, only L3b, L3d, and L3e would provide founders into Eastern Africa. We included a single founder, the L3 root type that emerged and evolved in situ, to account for the remaining L3 variation observed in Eastern Africa. Although only a small fraction of Nigeria was a source for Bantu speakers, the extended boundaries are nevertheless beneficial for its use as a source for Bantu lineages into Eastern Africa (or Southern Africa), since it would contain either lineages that had subsequently moved from the more strictly defined Central Africa, improving the source data set, or lineages not involved in these migrations, which would not affect the analysis.
(3) From Central and Eastern Africa into North Africa: We excluded the Sahel from the analysis as this region has had continuous recurrent gene flow both from north and south, dissolving the definition of well-defined populations, and the populations in this area would be difficult to either classify as source or sink population.
(4) From Central and Eastern Africa into Southern Africa: We performed this analysis in two ways, considering two versions of the Southern African data set as the sink population. At least two major routes are thought to have been taken by Bantu speakers spreading into Southern Africa, a “western stream” and an “eastern stream.” The western stream took the expanding Bantu from eastern Nigeria/Cameroon into Angola, South Africa, and Botswana from around 3.5 ka, whereas the eastern stream reached Mozambique and South Africa within the last 2 ka, starting from a core area in the Great Lakes region of Uganda 2.5 ka (Bandelt et al. 2001; Salas et al. 2002; Phillipson 2005). In the first, we considered the entire Southern African data set, but for the second, we tested for the possibility of a more recent ancestry for populations at the extreme southeastern tip of the continent, which likely received migrants primarily from the eastern stream of the Bantu expansion as well as more recent immigration from Eastern Africa (Phillipson 2005; Huffman 2007). Here, we therefore only included samples from Mozambique and South Africa.
We reconstructed, haplogroup by haplogroup, HVS-I networks in the range 16,051–16,400 bp of the reference sequence (Andrews et al. 1999). Data used are indicated in supplementary table S2, Supplementary Material online. We used an f1 criterion (which dictates that a sequence type is only considered a founder if it presents derived variation in the hypothetical source population) to identify founders, in order to allow for some homoplasy and back migrations (Richards et al. 2000).
We estimated the age of the migration of each founder using the ρ statistic (Forster et al. 1996) and an HVS-I mutation rate of one mutation every 16,677 years (Soares et al. 2009). In order to assess the error in the Bayesian partitioning across the different migration times realistically, we calculated an effective number of samples in each founder. This was obtained by multiplying the number of samples in each founder by a ratio of the variance assuming a starlike network and the variance calculated with the Saillard et al. (2000) method. We scanned the distribution of founder ages for each region defining equally spaced 200-year intervals for each migration from 0 to 100 ka. We performed a second analysis using defined migration times, based on the previous scan in the context of archaeological and climatological evidence. This second analysis allowed us to fractionate the L3 lineages into each migration and to probabilistically detect which lineages moved in each migration.
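The effective number of samples and the scan grid described above can be sketched briefly. The sketch assumes that the variance of ρ in a starlike network is ρ/n and that the Saillard et al. (2000) variance has already been computed for the founder cluster. It does not implement the Bayesian partitioning itself. The function name, the zero-variance fallback, and the toy values are hypothetical.

```python
HVS1_YEARS_PER_MUTATION = 16677  # HVS-I rate quoted in the text


def effective_sample_count(n_samples, rho, saillard_variance):
    """Scale the sample count by the starlike-to-Saillard variance ratio."""
    if saillard_variance == 0.0:
        # Assumption: a cluster with no derived variation keeps its raw count.
        return float(n_samples)
    star_variance = rho / n_samples
    return n_samples * star_variance / saillard_variance


# Hypothetical founder cluster: 4 samples, rho = 1.25, variance = 0.4375.
n_eff = effective_sample_count(4, 1.25, 0.4375)
founder_age_years = 1.25 * HVS1_YEARS_PER_MUTATION

# Candidate migration times: every 200 years from 0 to 100 ka inclusive.
scan_times = list(range(0, 100_001, 200))

print(f"effective samples = {n_eff:.2f}, founder age = {founder_age_years:.0f} years")
print(f"{len(scan_times)} scan points from {scan_times[0]} to {scan_times[-1]} years")
```

A cluster whose tree departs from a star has a larger Saillard variance than ρ/n, so its effective count falls below the raw count and it carries less weight in the partitioning.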
Phylogeography of Haplogroup L3
The complete mtDNA phylogeny of haplogroup L3 is shown in the supplementary tree, Supplementary Material online, including age estimates using the complete genome (using ρ and ML) and the synonymous (using ρ) clocks. Figure 1 presents an outline topology of L3, indicating the primary branches with age estimates using the same methods, scaled against the ML estimates.
L3 most likely had an origin in Eastern Africa (Torroni et al. 2006). This is supported by the presence of all major branches, with L3a and L3h virtually specific to the region and L3eikx and the L3f haplogroups having a probable origin there as well, whereas for L3bcd, the region of origin is less clear and will be discussed in more detail below. The other two main branches of L3, M and N, exist only outside Africa, except for some back-migrations into Africa around 50–30 ka in the form of haplogroups U6 and M1 (Olivieri et al. 2006; Pereira et al. 2010) and some more recent intrusions (Cherni et al. 2009; Ottoni et al. 2010). Since there is strong evidence that the dispersal out of Africa was via the Horn, soon after L3 arose (Macaulay et al. 2005), the distribution of M and N also points to Eastern Africa as the center of gravity for L3. It seems likely that L3 dates somewhere between 60 and 70 ka, as the TMRCA estimates vary between 58.9 ka (using ρ and the complete genome) and 70.2 ka (using ML), with the synonymous clock providing a value between the two (63.1 ka). We checked whether the L3 tree rejected a strict molecular clock by running the ML analysis without stipulating a molecular clock and performing a likelihood ratio test, which clearly did not reject the clock hypothesis (P = 0.9899). These results led us to consider an age of 65 ka (with a 95% interval of 60–70 ka) in the internal calibration of BEAST, as there was no other reliable calibration point we could use. The ages using the complete genome and the synonymous clock were 1.04 and 0.97 times the BEAST estimates, respectively, indicating that the BSPs were calculated using a similar rate to the other analyses, as we intended.
The BSP for L3 overall points to three main episodes of population growth within Africa associated with L3 (fig. 2 and table 1). The first is steepest at ∼40 ka, although with a gradual start from before 50 ka. This signal is not observed in the BSPs of any of the individual L3 branches analyzed; it corresponds to the emergence of the various subhaplogroups between 40 and 50 ka. Although the root type of L3 probably expanded soon after its emergence 60–70 ka, given its seven basal branches, the single coalescence point was not detected in the BSP. The second episode is a steep increase in the effective population size around the beginning of the Holocene (table 1), associated particularly with the main Central African haplogroups (L3bd, L3e). It is also suggested by several other haplogroups present in Eastern Africa at the time (L3h, L3x, and L3f); no signal was detected for any of them individually (supplementary figs. S2–S5, Supplementary Material online), but it was observed when they were combined (supplementary fig. S6, Supplementary Material online). The third major expansion took place from ∼4 ka and led to the largest increase (table 1), and as the signal is only pronounced in the Central African BSP (supplementary fig. S8, Supplementary Material online), this seems to have been a localized occurrence. Atkinson et al. (2009) performed a similar analysis on African lineages, including haplogroup L3, but the resolution of L3 was much lower in that case (80 sequences against >300 now), and they did not detect the two distinct increases in the Holocene.
|Data Set||Peak||Range of Increment||Increment Ratio|
NOTE.—Increment ratio is the factor by which the effective population size increased during this period.
Eastern African Origins
The diversification of L3 in Eastern Africa began early, as demonstrated by the ages of L3a and L3h (fig. 1), both of which are virtually specific to this region (fig. 3A). The BSP for Eastern Africa (supplementary fig. S6, Supplementary Material online) alone rises most steeply only after 40 ka (table 1), but the plot shows a progressive increase from before 50 ka. Accordingly, the scan of HVS-I diversity of founder L3 lineages in Eastern Africa showed a peak at ∼58.8 ka (corresponding to nearly three quarters of the L3 data in Eastern Africa; table 2), followed by a second peak at ∼1.8 ka.
|Region||Migration Time (ka)||% of L3 Lineages (SE)|
|East Africa||58.8||74.0 (0.5)|
|Central Africa||42.4||75.0 (2.7)|
|North Africa||35.0||7.4 (2.7)|
|South Africa (whole)||3.2||86.7 (4.3)|
|South Africa (southern)||1.8||83.4 (3.7)|
NOTE.—Migration times were obtained from a previous scan using increments of 200-year intervals resulting in probable peaks of migration. A migration time of 0.1 ka was included with each analysis to account for historical gene flow, with the exception of the founder analysis into North Africa where the 0.6 ka value was already close to this value. SE, standard error.
Haplogroup L3f most likely also arose in Eastern Africa, where it is both most frequent (fig. 3B) and most diverse (Salas et al. 2002), although several branches were carried by migrants into the Sahel and Central Africa: L3f1 has two main subclades (L3f1a and L3f1b) with basal lineages from Eastern Africa; L3f2 is specific to Eastern Africa; and L3f3 seems to have expanded into the Sahel 8–9 ka, hypothetically from an Eastern African source (Cerný, Fernandes, et al. 2009), as it is also present in Central and North Africa. The scan of the founder lineages in Central Africa (fig. 4B) suggests that this region received a strong genetic input of L3f lineages at the beginning of the Holocene, around the time of the expansion of L3f3, perhaps reflecting a wide-scale demographic process in the early Holocene, as also suggested by the BSP analysis (supplementary fig. S6, Supplementary Material online; table 1). L3eikx also most likely originated in Eastern Africa, with subclades L3i and L3x virtually exclusive to this region (fig. 3A) and the rare L3k mainly present in North Africa. The complete genome tree confirms, however, that L3e (the most frequent African L3 clade: fig. 3C) seems to have an origin in Central/West Africa (Bandelt et al. 2001; Salas et al. 2002): the Eastern African samples are clearly derived within the tree, suggesting gene flow from Central Africa. The recent gene flow into Eastern Africa is represented by the peak at ∼1.8 ka (fig. 4A and table 2) in the HVS-I founder analysis, which includes some L3e lineages.
L3bcd has three main subclades, with L3b and L3d tentatively united by a transition at control region position 16124 to form the putative subclade L3bd. The great age of L3bcd and its wide distribution across Africa make phylogeographic inferences difficult. Furthermore, L3c is extremely rare: Only two samples have been detected so far, one in Eastern Africa and the other in the Near East. This might echo an early origin of L3bcd in Eastern Africa, before moving west, but its rarity makes this conclusion extremely tentative. In a scenario of an early origin of L3bcd in Eastern Africa, M and N would be the only subclades of L3 to have most likely originated outside of Eastern Africa (although an origin in Eastern Africa remains possible: Richards et al. 2006). L3b and L3d most likely began to diversify in Central/West Africa, representing the earliest major spread of L3 lineages within Africa that we are able to detect. The recent peak of the founder scan for L3, dating to ∼1.8 ka in Eastern Africa, mainly comprises L3b and L3d lineages in the corresponding partition in table 2 (founders F8, F16, F17, and F25 in supplementary table S4, Supplementary Material online). Many of these lineages are also present in Southern Africa, with founder age estimates indicating that they arrived very recently, probably again within the last two millennia. The frequency of these four lineages is higher in the southern part of Eastern Africa (9% in Tanzania, 4.7% in Somalia, and 3.2% in Kenya) than to the north (2.3% in Ethiopia and 0.78% in Sudan). Together with the age estimates, this points to a genetic input from West/Central Africa into Eastern Africa within the last few thousand years, into regions that now have many Bantu-speaking populations.
Central and West Africa
The most common L3 lineages in Central and West Africa are L3b, L3d, L3e, and L3f, and apart from the latter, they most probably all originated within this region. L3b has a rare subbranch (L3b2) and one widespread subbranch (L3b1) with the frequency focus in Central/West Africa (fig. 3D). L3b1 has point estimates between 12.6 and 17.6 ka (depending on the clock), and its most common subclade, L3b1a, has point estimates of 11.7–14.8 ka, with starlike patterns suggesting involvement in major expansion. For L3d, all Eastern African samples except one (sample 78 in the supplementary tree, Supplementary Material online) are derived in the tree. The HVS-I data also suggest that the diversity in Eastern Africa (23.1 ± 10.5 ka) is lower than the overall value for L3d (28.9 ± 7.4 ka) and for Central/West Africa (29.7 ± 7.1 ka), and again the focus of the frequency distribution (fig. 3E) is Central/West Africa.
Although L3eikx arose in Eastern Africa, L3e itself most probably had an origin in Central Africa where all five of its subclades are found. The ancestors of haplogroup L3e may have moved into Central Africa around the same time as those of L3bd, as part of the same range expansion, especially if we consider an origin of L3bcd in Eastern Africa. In the founder analysis scan, L3bd and L3e were displayed as a single peak (although not very sharp) dating to ∼42.4 ka (fig. 4B). This is also the time when growth is observed in the BSP of the L3 data (fig. 2 and table 1). When this value is used as a migration time in the founder analysis, the corresponding partition comprises three quarters of the L3 Central African data. The second peak dating to 9.2 ka (table 2) corresponds mainly to L3f1 lineages (founders F59 and F69 in supplementary table S4, Supplementary Material online), closely matching the time of expansion of L3f3 into the Sahel (Cerný, Fernandes, et al. 2009).
The BSPs for Central Africa show a steep rise in the effective population size after 4 ka and centered at 2.6 ka (table 2). The signal detected in the BSP corresponds mainly to population growth within the Central African populations, not detected in the founder analysis, which only detects range expansions from one region to another. This demographic expansion most probably corresponds to the Bantu agriculturalist expansion that occurred after ∼4 ka in Central Africa.
The scan of founder lineage variation (fig. 4C) in North Africa indicates two peaks at 0.6 and at 6.6 ka. The data set for North Africa includes a high fraction of direct matches (nearly 20%) which contribute to the more recent peak, most likely reflecting movements into North Africa from Central/Eastern Africa within the last 1,000 years, perhaps including lineages carried with the trans-Saharan slave trade (Harich et al. 2010). We performed a founder analysis stipulating three migration times, including a third one of 35.0 ka, based on the ages of U6, L3k and the population increase in the BSP (supplementary fig. S7, Supplementary Material online) with results presented in table 1. The biggest slice corresponds to the peak at 6.6 ka, corresponding to one-third of the L3 lineages (table 2), mainly affiliated to haplogroup L3e5, which is largely restricted to Northwest Africa. L3e5 dates to 12.4–13.6 ka, and so its expansion may have begun earlier than 6.6 ka. The other major lineages contributing to the 6.6 ka partition (Central African L3b, L3e1, and L3e2 lineages: founders F17, F28, and F41 in supplementary table S4, Supplementary Material online) suggest that the postglacial period was characterized by gene flow across the Sahel belt (Cerný et al. 2007); these founder clades are mainly restricted to Northwest Africa and absent from Egypt.
The BSP for North Africa also indicates the major rate of increase in the effective population size centered at about 10 ka (supplementary fig. S7, Supplementary Material online; table 1), contemporaneous with Central and Eastern Africa. Since the majority of L3 lineages in the scan had an origin elsewhere, the timing of the growth signal observed in the North African BSP might in fact be describing the demography of the source, with many of the coalescences occurring prior to the dispersal into North Africa. However, an independent analysis of North African U6 (Pereira et al. 2010) indicated growth associated with this haplogroup ∼10 ka.
Southern African samples, all from Bantu speakers, are scattered throughout the L3 tree, with typically only a single sequence per founder type. A BSP for the Southern Africa data would be quite misleading since these sequences consistently coalesce outside Southern Africa and are few in number (19). Therefore, we must turn primarily to the larger volumes of HVS-I data.
Southern Africa is known to have received a high genetic input from further north with the Bantu expansion of the last two millennia (Pereira et al. 2001; Salas et al. 2002). The founder analysis for Southern Africa as a whole yielded a single peak at 3.2 ka (fig. 4D), closely matching the time of the initial expansion south of Bantu speakers ∼3.5 ka (Barham and Mitchell 2008). This indicates that all the L3 lineages in Southern Africa are likely to be the result of migrations within the last few millennia: L3 in the south does not predate the Bantu dispersals. Although the western stream was at least partly distinct from the eastern stream, the founder types are mostly the same, indicating that they were probably from the same proximal source in Central Africa or were already in contact before reaching the south. The founder analysis using only the data from Mozambique and South Africa yielded a single younger peak of founder lineages at 1.8 ka (fig. 4D), matching closely the time of arrival of the eastern stream of Bantu-speaking agriculturists at the tip of the continent (Phillipson 2005), which probably predominated in this region (Huffman 2007). This peak also provides a good corroboration for the HVS-I clock employed here, also recently supported by estimates of the expansion into Remote Oceania (Soares et al. 2011).
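Founder ages of this kind rest on converting sequence diversity into time with a molecular clock; the data note later in this section refers to a calculator for converting ρ values to age estimates. The following is a minimal sketch of a plain linear ρ conversion, not the authors' calculator, which may apply corrections that are not reproduced here. The function name `rho_age`, the toy cluster, and the rate of 10,000 years per mutation are all hypothetical placeholders chosen for illustration.

```python
import math


def rho_age(branches, n_samples, years_per_mutation):
    """Linear rho dating for one founder cluster.

    branches: list of (n_descendants, n_mutations) pairs, one per branch
              of the tree below the founder haplotype.
    n_samples: total number of sampled sequences in the cluster.
    years_per_mutation: clock rate supplied by the caller (an assumption).

    Returns (rho, sigma, age_years, se_years), where rho is the mean number
    of mutations separating the samples from the founder and sigma is the
    commonly used standard-error estimate sqrt(sum(n_i^2 * l_i)) / n.
    """
    rho = sum(n * l for n, l in branches) / n_samples
    sigma = math.sqrt(sum(n * n * l for n, l in branches)) / n_samples
    return rho, sigma, rho * years_per_mutation, sigma * years_per_mutation


# Hypothetical toy cluster of 10 samples: 5 match the founder exactly,
# 3 share one derived mutation, 2 share another, and 1 of those 2
# carries one further private mutation.
toy_branches = [(3, 1), (2, 1), (1, 1)]

# Placeholder rate for illustration only, not the clock used in the paper.
rho, sigma, age, se = rho_age(toy_branches, n_samples=10,
                              years_per_mutation=10_000)
print(f"rho = {rho:.2f} ± {sigma:.2f}; age = {age:.0f} ± {se:.0f} years")
```

Taking the clock rate as a parameter keeps the sketch independent of any particular calibration: the same ρ yields different ages under different clocks, which is why agreement between the 1.8 ka peak and the archaeological dates can serve as a check on the calibration.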
The time of emergence and expansion of L3 is a crucial issue in the study of human evolution since it provides an upper bound on the time of the out-of-Africa migration. Haplogroups M and N, the non-African branches of L3, most likely arose outside Africa or “en route,” serving as a lower bound for the dispersal. The range we have obtained for the age of L3 (∼60 to 70 ka, with overlapping 95% confidence intervals for the various estimates from 59 to 79 ka) serves to virtually exclude the suggestion of an out-of-Africa exit (at least, of the ancestors of the non-African human populations alive today) before 73.5 ka, the time of the Toba supereruption. Some archaeological data suggest the possibility of earlier migrations out of Africa during MIS5 or in the transition to it, either through the Levant (Bar-Yosef 1992) or via the southern route (Petraglia et al. 2007; Armitage et al. 2011), but no maternal descendants of these possible earlier migrants remain in the extant population; L3 and its M and N daughters had not yet been born at that time.
Furthermore, since the age of the M and N Eurasian founders of ∼50 to 65 ka (Soares et al. 2009) is close to the age of their ancestral L3 clade in Africa, the out-of-Africa dispersal may have been of a piece with the initial diversification and expansion of L3, so that the L3 expansions in Eastern Africa and the exit of modern humans from Africa ∼60 ka were all part of a single demographic process. It seems likely that the moister climate after ∼70 ka in Eastern Africa allowed dramatic human population growth (Scholz et al. 2007), perhaps associated with improved hunting, marine exploitation, exchange networks, and possibly even plant food management strategies as suggested by Mellars (2006). This generated the oldest major signal of expansion in the human mtDNA tree, the radiation of L3, leading rapidly to the spread of H. sapiens toward the rest of the world. It is worth stressing that this signal is not reflected in other mtDNA lineages at this time (Behar et al. 2008). Within Africa, the Pleistocene migrations detected in the L3 pool were responsible for the introduction of L3bd and L3e into Central Africa in the period between 60 and 35 ka (fig. 5A), but none reached Southern Africa at that time.
The timing therefore suggests that the expansion and dispersal may have been primarily stimulated by environmental factors (Cohen et al. 2007; Scholz et al. 2007), with the model of Scholz et al. (2007) providing the best current fit to the mtDNA chronology. This resembles the more detailed model proposed by Mellars (2006), but with the caveat that although Mellars proposed environmental change as an important driver, he also coupled the expansion to a step change in the move toward behavioral modernity, by analogy with the European Upper Paleolithic, and in contrast to more incremental models of the African transition to modernity. This increased evidence for modernity is most evident in the MSA finds in Southern Africa ∼70 to 80 ka (e.g., Henshilwood et al. 2002, 2009; Texier et al. 2010), although it might plausibly have originated in Eastern Africa a little earlier (Mellars 2006).
However, neither an origin of L3 in Southern Africa, as suggested by Compton (2011), nor a dispersal from Eastern Africa to Southern Africa at ∼70 to 80 ka, as implied by Mellars (2006), can be plausibly coupled with the expansion of haplogroup L3 as reconstructed here. The nearest phylogenetic relatives of L3, L4 and L6, are both localized entirely within Eastern Africa, Arabia, and the Near East (Kivisild et al. 2004; Torroni et al. 2006; Behar et al. 2008), strongly supporting an origin for L3 in Eastern Africa and effectively ruling out the possibility of a dispersal from Southern Africa, where the indigenous prefarming/herding lineages today belong solely to L0d and L0k.
There is an intriguing possible rider to this conclusion. North Africa has been entirely depopulated and repopulated, at least with respect to mtDNA variation (Pereira et al. 2010), since the time of the Aterian industry, where modern symbolic behavior is attested very early, similar to Southern Africa and in contrast to Eastern Africa (Barton et al. 2009). We might therefore contemplate a possible North African ancestry for L3, with its rapid radiation corresponding to an early range expansion into Eastern Africa. However, any potential dispersal between the Mediterranean and the Horn of Africa around the time of the MIS4/3 transition would have faced severe environmental difficulties, unlike the “green Sahara” conditions of MIS5 and the early Holocene (Drake et al. 2011). We therefore conclude that an indigenous origin for L3 in Eastern Africa remains by far the most likely scenario.
As Mellars (2006) has argued, the early evidence for symbolically mediated behavior in both North and Southern Africa rules out any simple direct link for the expansion of L3 (Watson et al. 1997; Ambrose 1998). Evidence of engraved ochre now extends back to at least 100 ka (Henshilwood et al. 2009); Nassarius marine shell beads were evidently present across the range of early modern humans, from Southern Africa to North Africa and the Levant, before 80 ka and possibly tens of thousands of years earlier (Mellars 2006; Vanhaeren et al. 2006; Bouzouggar et al. 2007; Barton et al. 2009; d'Errico et al. 2009); and evidence for burial ritual is found in early modern humans in the Levant dating to 90–110 ka (Mellars 2006; Shea 2008). Thus, as suggested by Basell (2008), the demographic expansions that led to the first successful dispersal out of Africa seem better explained by the play of palaeoenvironmental forces than by recourse to the advantages of “modernity.”
Major L3 dispersals within Africa also took place during the Holocene, including, but not limited to, the Bantu expansion (fig. 5B). Indeed, the immediate postglacial period may have been at least as significant for reshaping the African genetic landscape. The Lateglacial period of tropical Africa is now thought to have witnessed a megadrought at 16–17 ka comparable to those prior to 70 ka (Stager et al. 2011). From ∼11.5 ka, however, for a few thousand years, the climate was warm and humid, although less so at the northern and southern extremes (Brooks et al. 2005; Kuper and Kröpelin 2006; Weldeab et al. 2007; Barham and Mitchell 2008). The Holocene climatic optimum allowed population sizes to increase, and movement of people probably also took place. We detected increases in effective population sizes at this time both in Eastern Africa and Central/West Africa, which had distinct L3 mtDNA gene pools. Thus, the postglacial population expansions were widespread and most likely caused by continent-wide climate change. They mirror the mtDNA expansion signals elsewhere in the world in the early Holocene, such as in Europe (Atkinson et al. 2008; Soares et al. 2010) and Southeast Asia (Soares et al. 2008).
During the climatic optimum, genetic exchange occurred between Eastern Africa, Central Africa, and the Sahel, with the onset of wetter conditions and the shifting of the desert margins (Brooks et al. 2005; Kuper and Kröpelin 2006). Haplogroup L3f3 is thought to have expanded from Eastern Africa into Chad ∼8 ka (Cerný, Fernandes, et al. 2009). The present analysis shows that not only L3f3 but several L3f clades (mainly L3f1) probably also expanded around this time into Central Africa, leading to L3f’s widespread distribution in Central Africa and to its later contribution to the maternal gene pool of Bantu-speaking agriculturists. The North African data also show migrations across the Sahel belt in this time period, perhaps associated with the expansion of herding across the Sahara (Barham and Mitchell 2008), confirming its status as a major corridor for bidirectional migration (Cerný et al. 2007).
The most recent major signal of population expansion in L3 evidently corresponds to the Bantu expansions, which are well attested by linguistic, archaeological, and genetic evidence (Richards et al. 2004; Phillipson 2005). We detected a sharp population size increase ∼4 ka in the Central African data, the largest witnessed in the L3 data. Gignoux et al. (2011) have also detected population growth for sub-Saharan Africa in this time frame using a data set of combined African haplogroups previously assigned to Bantu dispersals (Salas et al. 2002), including L3d and L3e2. The founder analyses from Central Africa into Eastern Africa and Southern Africa showed migrations involving L3b, L3d, L3e, and L3f lineages within the last 2–3 ka. The data do not indicate any genetic input involving L3 lineages into Southern Africa before this time; nor is there any distinct signal for more recent immigration within the last millennium (Huffman 2007). The BSP for Central Africa and the founder analyses to Southern Africa show a strong correspondence between the beginning of the Bantu expansion in Central Africa and the timing of the Bantu dispersals to the south. They thereby corroborate the molecular clocks and dating methodologies that we have employed and strengthen our conclusions regarding the less well-attested earlier Holocene and late Pleistocene dispersals.
The URLs for data presented herein are as follows: Calculator for converting ρ values and ML estimates to age estimates, http://www.fbs.leeds.ac.uk/staff/profile.php?tag=Richards; Network 4, http://www.fluxus-engineering.com/sharenet.htm; and mtDNA-GeneSyn, http://www.ipatimup.pt/downloads/mtDNAGeneSyn.zip
We thank Paul Mellars, Peter Mitchell, and Stephen Oppenheimer for critical advice. FCT, the Portuguese Foundation for Science and Technology, supported this work through the research project PTDC/ANT/66275/2006 and the personal grants to V.F. (SFRH/BD/61342/2009), J.B.P. (SFRH/BD/45657/2008), M.D.C. (SFRH/BD/48372/2008), N.M.S. (SFRH/BD/69119/2010), and P.S. (SFRH/BPD/64233/2009). P.S. also thanks the de Laszlo Foundation for financial support. The Ministry of Education of the Czech Republic also supported this work through the grant KONTAKT ME 917. IPATIMUP is an Associate Laboratory of the Portuguese Ministry of Science, Technology, and Higher Education and is partially supported by FCT.