Abstract

Widespread interest in the first successful Out of Africa dispersal of modern humans ∼60–80 thousand years ago via a southern migration route has overshadowed the study of later periods of South Arabian prehistory. In this work, we show that the post-Last Glacial Maximum period of the past 20,000 years, during which climatic conditions were becoming more hospitable, has been a significant time in the formation of the extant genetic composition and population structure of this region. This conclusion is supported by the internal diversification displayed in the highly resolved phylogenetic tree of 89 whole mitochondrial genomes (71 being newly presented here) for haplogroup R0a—the most frequent and widespread haplogroup in Arabia. Additionally, two geographically specific clades (R0a1a1a and R0a2f1) have been identified in non–Arabic speaking peoples such as the Soqotri and Mahri living in the southern part of the Arabian Peninsula where a past refugium was identified by independent archaeological studies. Estimates of time to the most recent common ancestor of these lineages match the earliest archaeological evidence for seafaring activity in the peninsula in the sixth millennium BC.

Introduction

During the course of the past 10 years, the study of the first migration of anatomically modern humans Out of Africa through the Bab el-Mandab Strait has dominated both archaeological (Petraglia and Alsharekh 2003) and genetic (Quintana-Murci et al. 1999; Kivisild et al. 2004; Macaulay et al. 2005) investigations of the Arabian Peninsula. Consequently, this issue has overshadowed other periods of prehistory that might have been even more significant in the formation of the extant regional population.

In particular, recent archaeological and palaeoclimatological investigations have revealed the importance of the terminal Pleistocene and Holocene in the population history of the Arabian Peninsula (Petraglia and Rose 2009) and have identified three possible core zones (the Red Sea, the South Arabian [Dhofar], and the Persian Gulf Basins) that could have served as refugia during environmental downturns (Rose and Petraglia 2009). The period from 35 to 20 thousand years ago (henceforth referred to as KYA) appears to have been favorably wet, as lake deposits in northern and southern parts of the Peninsula testify. Subsequently, however, the Last Glacial Maximum (LGM), a period lasting several thousand years from around 20 KYA, was marked by eolian processes (erosion and deposition caused by wind) and dune formation in the Rub’ al Khali desert as well as elsewhere in Arabia (Bailey 2009; Parker 2009). This change to a harsher and more arid climate was likely associated with a general movement or exodus of hunter–gatherer populations to more hospitable locations.

New evidence points to a brief additional wet phase between 15 and 13 KYA (Parker 2009). This could be related to the Bölling-Allerød (BA) interstadial (also dated to between 15 and 13 KYA), which, in the northern latitudes, is associated with the mass wasting of ice sheets. According to Parker (2009), “The BA brief interlude may have permitted brief human entry into parts of Arabia either from outside the peninsula or from refugia within Arabia (e.g., Dhofar or the now submerged Arabo-Persian Gulf Basin).” The most suitable places for population concentration were high-altitude regions, which still bear the highest population densities today (Wilkinson 2009). The onset of the Younger Dryas ∼13.5 KYA ended the climatically favorable period until the Early Holocene, when monsoon precipitation again increased ∼9.0 KYA. This last pluvial phase lasted until ∼5.0 KYA, when present climatic conditions were established (Parker 2009).

Archaeological research has revealed that southwest Arabia, especially the highlands, must have been well populated over the past 5,000–6,000 years and may have acted as a long-term population centre during much of the Holocene (Wilkinson 2009); it remains to be determined if it also acted as a refuge for populations during the Palaeolithic. Furthermore, phylogenetic analyses of linguistic data demonstrate an expansion of extant Semitic languages throughout southern Arabia and East Africa some 1,000–3,000 years ago and their initial entry into the region may have occurred thousands of years earlier (Kitchen et al. 2009).

The contemporary distribution of ancestral and derived mitochondrial DNA (mtDNA) sequences shows that modern human colonization of Eurasia may have been accomplished via a southern coastal route ∼60–80 KYA (Forster et al. 2001; Metspalu et al. 2004; Macaulay et al. 2005; Thangaraj et al. 2005). Subsequent mtDNA diversification took place in south-western Asia, where haplogroups M and N (and its derivative R) evolved. In fact, precisely where these lineages first emerged is still under discussion. According to Metspalu et al. (2004), this process took place in three distinct regions in southern Asia (the Middle East, the Indus Valley, and East Asia) with only minor migrations between these regions. One of the most important results of such past mtDNA diversification processes is the absence or very low frequency of M haplogroups in Western Eurasia today, demonstrating that the division of an original “Out of Africa” gene pool can be placed somewhere west of the Indus Valley, with a later westward migration to the Middle East and back to Africa including only subbranch M1 (Forster et al. 2001; Olivieri et al. 2006).

Given its recent elevation to a pivotal location in terms of past human migrations, the Arabian Peninsula has become the focus of several genetic studies, both for mtDNA and for Y-chromosome markers (Richards et al. 2003; Kivisild et al. 2004; Luis et al. 2004; Abu-Amero et al. 2007; Rowold et al. 2007; Abu-Amero et al. 2008; Cadenas et al. 2008; Černý et al. 2008, 2009; Alshamali et al. 2009). At the mtDNA level, the South Arabian gene pool has proven highly complex, with gene flow from Eurasia accounting for a substantial proportion of Ethiopian and Yemeni mtDNA haplotypes, especially (preHV)1 (Richards et al. 2003; Kivisild et al. 2004). Furthermore, Saudi Arabian mtDNAs show a high frequency of (preHV)1 lineages, and molecular dates of the main branches of this haplogroup fall within the Early Holocene (Abu-Amero et al. 2007, 2008), although confidence intervals were large due to the small number of sequences.

The importance of the (preHV)1 haplogroup, recently renamed R0a (Torroni et al. 2006), was further demonstrated in a regional study of four Yemeni populations where a high level of sharing of R0a sequences was particularly striking given the otherwise low levels of haplotype sharing across Yemen (Černý et al. 2008). Moreover, it has been found that R0a is not only frequent but also diversified on Yemen’s Soqotra Island, where the new branch R0a1a1 has recently been identified (Černý et al. 2009).

Due to the high frequency of the R0a haplogroup and its internal diversification within Arabia and neighboring regions, we have focused our attention on this specific branch of mtDNA phylogeny. We present 71 new R0a whole-genome sequences from southern Arabia (Yemen) and northern/Horn of Africa (Somalia, Ethiopia, Sudan, Tunisia, and Morocco). These new sequences, together with 18 previously published genome sequences, reveal phylogeographic differences in the regional distribution of R0a throughout the Arabian Peninsula and East Africa. Estimations of the time to the most recent common ancestor (TMRCA) for different R0a subbranches, calculated with recently updated substitution rates, are discussed in the light of archaeological and paleoclimatic data.

Materials and Methods

Samples

Yemeni R0a haplotypes came from two different sampling collections undertaken by two authors (V.C. and C.M.). Those (n = 22) from the V.C. Yemeni data set (n = 250) were published recently and their locations are described therein (Černý et al. 2008, 2009). The Yemeni data set of C.M. (n = 550), from which we have used 28 R0a haplotypes, comes from highland areas (Dhamar, Amran, Hajja, and Taizz), lowland areas near the Red Sea (Zabid), desert areas (Al Bayda, Abyan, Jawf, Maarib, and Shabwa), and the eastern part of Yemen (Hadramawt and Al Mahra). We also performed complete mtDNA sequencing on one Ethiopian Jewish R0a haplotype from the collection of C.M. In addition, we surveyed our Central, East, and North African data sets for the presence of R0a haplotypes, resulting in 2 samples out of 448 Chad Basin samples (Černý et al. 2007), 4 samples out of 149 from Somalia (unpublished), 4 out of 77 from Ethiopia (unpublished), 4 out of 102 from Sudan (unpublished), 4 out of 304 from Tunisia (Cherni et al. 2009), and 2 out of 81 from Morocco (Harich et al. 2010). In total, 71 haplotypes were selected for complete genome analysis (see supplementary material 1, Supplementary Material online).

Laboratory Work

Amplification of the hypervariable segment 1 (HVS-1) of samples from the V.C. data set was reported previously (Černý et al. 2008, 2009). Amplification of HVS-1 of the C.M. data set was performed using primer pair P23 (Gonder et al. 2007). From the combined set of HVS-1 sequences, we classified R0a haplotypes as those displaying substitutions at positions 16126 and 16362 and clearly not ascribed to any other haplogroup, such as J (which has an additional substitution at position 16069) or T (which also has a substitution at position 16294).
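The HVS-1 screening rule described above can be sketched in a few lines. This is only an illustration of the stated motif logic, not the authors' actual procedure: the function name and the returned labels are hypothetical, and the step of "clearly not ascribed to any other haplogroup" is reduced here to the two exclusions named in the text (J and T).

```python
# HVS-1 substitution positions quoted in the text.
R0A_DIAGNOSTIC = {16126, 16362}
J_MARKER = 16069   # additional substitution seen in haplogroup J
T_MARKER = 16294   # additional substitution seen in haplogroup T


def classify_hvs1(substitutions):
    """Classify a haplotype from its HVS-1 substitution positions.

    Hypothetical helper: it only separates R0a candidates from the
    J and T motifs mentioned in the text; any real assignment would
    also have to rule out other haplogroups.
    """
    positions = set(substitutions)
    if 16126 in positions and J_MARKER in positions:
        return "J-like"
    if 16126 in positions and T_MARKER in positions:
        return "T-like"
    if R0A_DIAGNOSTIC <= positions:
        return "R0a candidate"
    return "unclassified"


print(classify_hvs1([16126, 16362]))         # ancestral R0a motif
print(classify_hvs1([16126, 16355, 16362]))  # motif reported for R0a1a
print(classify_hvs1([16069, 16126]))         # J motif
```

Candidates flagged this way would still need confirmation from coding-region variants, which is what the whole-genome sequencing provides.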

Whole-genome sequencing of all Yemeni, Chad, and Ethiopian Jewish samples (53 in total) was conducted using 24 primer pairs (Gonder et al. 2007). In contrast, the 18 samples from Somalia, Ethiopia, Sudan, Tunisia, and Morocco were analyzed using 32 primer pairs (Maca-Meyer et al. 2001; Pereira et al. 2007). Polymerase chain reaction (PCR) products were purified and sequenced using the forward primers only. However, for the poly-C stretches at nucleotides 568–573 and 16184–16193, the reverse primers were used as well. Sequencing was performed on an ABI 3100 DNA Analyzer (Applied Biosystems, Foster City, CA). Chromatograms were evaluated by two independent observers (V.C. and L.P.) with the help of SeqScape (Applied Biosystems) and BioEdit version 7.0.4.1 (Hall 1999). In cases of ambiguous results, new PCR amplification and sequencing reactions were performed. The complete mtDNA sequences are deposited in the GenBank database under accession numbers HM185203–HM185273.

Phylogenetic Analyses

A preliminary reduced median network analysis (Bandelt et al. 1995) led to a suggested branching order for the tree, which was then constructed by hand. All variable positions were used except 16182C, 16183C, and 16519, as they are too recurrent and inconsistently reported in the literature. We compared our phylogenetic reconstruction with the trees published in two papers (Abu-Amero et al. 2007; Alvarez-Iglesias et al. 2009) and found the only differences to be in the placement of the highly recurrent substitution at position 58 and the insertion 60.1T (for which we used the most parsimonious inference), and in the nomenclature of some branches due to updated information in our expanded phylogeny.

We used 71 new Yemeni, Moroccan, Tunisian, Sudanese, Ethiopian, Somali, and Chad whole R0a genome sequences and 18 previously published whole R0a genome sequences (Achilli et al. 2004; Palanichamy et al. 2004; Abu-Amero et al. 2007; Gasparre et al. 2007; Behar et al. 2008; Costa et al. 2009; Hartmann et al. 2009). An inconsistency was detected between the sequences deposited in GenBank and the tree published in Abu-Amero et al. (2007) for position 16355, which was reported as a transversion to A in GenBank and a transition in the published tree; we decided to consider it as the transition, as it seems to be the most probable variant when compared with other genome and HVS-1 sequences. For calculation of the TMRCA for specific clades in the phylogeny, the ρ statistic (mean divergence from the inferred ancestral haplotype) was used with two mutation rate estimates from Soares et al. (2009): one substitution every 3,624 years for the complete sequence, corrected for the effect of purifying selection, and one substitution every 7,884 years for synonymous substitutions only. Both were applied using the calculator provided in that paper. Standard errors were calculated as in Saillard et al. (2000).
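As a rough illustration of the dating approach, the sketch below computes ρ and its standard error for a toy tree and converts them to years with the synonymous rate quoted above. It assumes the usual formulation attributed to Saillard et al. (2000), in which each tree link i carries l_i mutations and leads to n_i sampled individuals, so that ρ = Σ n_i·l_i / n and σ = sqrt(Σ n_i²·l_i) / n. The function names and the toy tree are hypothetical. The complete-sequence rate is not reproduced here because it is applied with a correction for purifying selection through the calculator of Soares et al. (2009).

```python
import math

# Synonymous rate quoted in the text: one substitution every 7,884 years.
YEARS_PER_SYNONYMOUS_SUBSTITUTION = 7884


def rho_with_se(links, n_samples):
    """Return (rho, standard error) for a rooted tree.

    links: iterable of (descendant_count, mutation_count) pairs, one per
    link of the tree; descendant_count is the number of sampled
    individuals reached through that link.
    """
    rho = sum(n * l for n, l in links) / n_samples
    se = math.sqrt(sum(n * n * l for n, l in links)) / n_samples
    return rho, se


def synonymous_age(links, n_samples):
    """Convert rho and its standard error to years (linear synonymous clock).

    The links passed in should count synonymous mutations only.
    """
    rho, se = rho_with_se(links, n_samples)
    return (rho * YEARS_PER_SYNONYMOUS_SUBSTITUTION,
            se * YEARS_PER_SYNONYMOUS_SUBSTITUTION)


# Toy tree with four samples: one link with 1 mutation leading to three
# samples, one of which sits on a further link with 2 mutations, plus one
# sample on its own link with 1 mutation.
toy_links = [(3, 1), (1, 2), (1, 1)]
age, age_se = synonymous_age(toy_links, n_samples=4)
print(f"TMRCA ~ {age:.0f} +/- {age_se:.0f} years")
```

For a perfectly star-like clade the variance reduces to ρ/n, which is why clades with many branches deriving directly from the root tend to have comparatively tight estimates.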

Interpolation Maps and Evaluation of Spatial Autocorrelation

To determine and visualize the geographical distribution of R0a and R0a1a, interpolation maps were drawn using the “Spatial Analyst Extension” of ArcView version 3.2 (www.esri.com/software/arcview). The “inverse distance weighted” (IDW) option with a power of two was used for the interpolation of the surface. IDW assumes that each input point has a local influence that decreases with distance. The geographic location used for each population is the centre of the area from which its individual samples were collected. From these data, we obtained Moran’s I index and its Z score (a statistic used to test the null hypothesis, H0: there is no spatial pattern among the values associated with the geographic points in the study area). Correlograms of Moran’s I indices versus distance were obtained for the R0a and R0a1a haplogroups using PaSSAGE software v 1.0 (www.passagesoftware.net). An interval of ten distance classes was used for all data points studied here and compared with ten distance classes for fewer data points (grouping some of the closest Yemen samples). A cline is inferred when a continuous declining trend composed of statistically significant points is observed.
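The idea behind IDW interpolation with a power of two can be illustrated with a minimal sketch. It is not the ArcView implementation: the function name is hypothetical, distances are treated as planar Euclidean distances rather than geographic ones, and all sample points contribute to every estimate (no search radius or neighbour limit).

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    points: iterable of (px, py, value), e.g. haplogroup frequencies at
    population centres. Each point is weighted by 1 / distance**power.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two made-up populations with frequencies 0.30 and 0.10.
samples = [(0.0, 0.0, 0.30), (2.0, 0.0, 0.10)]
print(idw(samples, 1.0, 0.0))  # midway: both points weigh the same
print(idw(samples, 0.5, 0.0))  # closer to the first point, so nearer 0.30
```

With a power of two, a point twice as far away contributes one quarter of the weight, which is what gives each sampled population a mainly local influence on the map.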

Results

Frequencies and Geographic Distribution of R0a Haplogroups

In our population data sets of 800 HVS-1 sequences from Yemen, 176 samples (22%) were classified as R0a, comprising 46 different haplotypes. The ancestral type, with only two diagnostic variants present in HVS-1 (16126-16362), is the most common (n = 43). This haplotype is present in 16 of 22 different Yemeni populations (see supplementary material 2, Supplementary Material online). The second most common haplotype (n = 38) is a one-step descendant of the ancestral type (16126-16355-16362), which has been named R0a1a (Abu-Amero et al. 2007; Černý et al. 2009). These two haplotypes constitute almost half of the R0a pool in Yemen.

Among the next most frequent R0a haplotypes, R0a1a1a (16126-16172-16355-16362), previously identified only in Soqotra (Černý et al. 2009), was also observed in Al Mahra, the region of eastern Yemen near Omani Dhofar close to Soqotra. A similar frequency, but much wider geographical distribution, was also noted for haplotype 16126-16304-16362. The rest of the R0a haplotypes (n = 42) had quite low frequency, occurring mostly in only one of the sampled populations (see supplementary material 2, Supplementary Material online).

The geographic distribution of R0a frequencies was plotted across the Arabian Peninsula and neighboring regions. Figure 1a shows the highest frequency of R0a on Soqotra Island and the neighboring territory of Al Mahra, where it is observed in approximately one-third of the samples. Other locations in Yemen and Saudi Arabia show lower frequencies of R0a (except for some points near the Red Sea and in the central region of Saudi Arabia). Frequencies decrease in all directions away from this main focus of highest R0a frequency. The global value of Moran’s I index (Moran’s I index = 0.06; Z score = 1.9 standard deviations) shows that there is a 5–10% likelihood that this clustered pattern is the result of random chance. The plot of R0a1a is shown in figure 1b, and the global value of Moran’s I index (Moran’s I index = 0.01; Z score = 0.87 standard deviations) suggests that this pattern is random. The distribution of Moran’s I index versus distance classes (fig. 2a and b) shows that the distribution of both haplogroups is clinal, although some values of the distance classes are not statistically significant.
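To make the reported statistics more concrete, the sketch below computes a global Moran's I for a small made-up data set and converts a Z score into a two-sided probability under a normal approximation. The function names, the toy values, and the binary neighbour weights are assumptions for illustration; the weighting scheme actually used by the software is not stated in the text.

```python
import math


def morans_i(values, weights):
    """Global Moran's I for values with a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


def two_sided_p(z):
    """Two-sided probability of |Z| >= |z| under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Four sites on a line; only adjacent sites are neighbours (weight 1).
toy_values = [1.0, 1.0, 0.0, 0.0]
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(toy_values, toy_weights))  # positive: similar values cluster
print(two_sided_p(1.9))                   # Z score reported for R0a
print(two_sided_p(0.87))                  # Z score reported for R0a1a
```

Under this approximation, Z = 1.9 falls between the 5% and 10% levels, in line with the interpretation given for R0a, whereas Z = 0.87 lies well within the range expected for a random pattern.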

FIG. 1

Interpolation maps for R0a (a) and R0a1a (b) haplogroups.

FIG. 2.

Spatial autocorrelation analyses for R0a (a) and R0a1a (b) haplogroups; filled points are statistically significant at the 5% level.

R0a Phylogenetic Tree Based on Whole-Genome Information

The R0a tree now includes 89 individuals, 71 of which are new complete sequences. Whole-genome sequencing revealed several coding region mutations important for the R0a topology. Figure 3 shows a simplified version of the R0a tree; the tree in supplementary material 3 (Supplementary Material online) lists all mutations and all samples.

FIG. 3.

Main branches of the complete R0a mtDNA tree. The boxes with the haplogroup names are placed at the level of the respective TMRCA.

As was shown before (Abu-Amero et al. 2007), there are two main R0a branches, designated R0a1 and R0a2. R0a1 is defined by one substitution at position 827 and the largest proportion of its haplotypes, as explained in the previous section, bear substitutions at positions 146-8292-11761-16355 and form the previously named haplotype R0a1a. Most of the branches in R0a1a derive from the root. All Soqotra samples and one Tunisian sample belonging to this haplogroup share a substitution at position 13708; eight out of nine of these samples, and one sample from Al Mahra, share an additional substitution at position 16172, which we previously identified as haplogroup R0a1a1a (Černý et al. 2009). Thus, it seems that R0a1a1 may be widespread in Yemen but rare elsewhere, whereas R0a1a1a is dominant in Soqotra and only appears in geographically close areas.

Haplogroup R0a2 (bearing an insertion at position 60 and substitutions at 2355-15674) also shows several branches deriving directly from the root, but it contains a much larger number of subclades. Some of these subclades were previously named by Alvarez-Iglesias et al. (2009), but the improved resolution of our tree supports some revisions in nomenclature (see supplementary material 3, Supplementary Material online). An interesting branch is the widespread R0a2f. It is characterized by a substitution at position 8251, which was observed in one Italian and two nomadic Arabs from Chad. Further derived R0a2f haplotypes were observed in a cluster of eight Yemenis sharing substitutions at positions 131-7837-12542-13708-13827. This clade, R0a2f1, has no identifying mutations within the HVS-1 region and for that reason was not previously recognized (Černý et al. 2009); it is also specific to Soqotra and adjacent regions of Yemen (four haplotypes were found on Soqotra, three in Al Mahra, and one in Hadramawt).

Furthermore, we detected a new branch diverging from the root of the tree, which we named R0a3. This new haplogroup shares the T insertion in position 60 and substitution at position 15674 with R0a2, but it does not have the substitution at 2355 and has an additional substitution at 15466.

TMRCA Estimates

The complete sequencing of this large set of R0a samples allowed us to apply both mutation rate estimates recently proposed by Soares et al. (2009), one rate for all positions and one for synonymous substitutions (table 1). The TMRCAs obtained with only synonymous substitutions are ∼0.69–1.06 of the dates obtained when analyzing all positions. The two TMRCA estimates (mean ratio of 0.854) do not agree as closely as reported by Soares et al. (2009) (0.9452); those authors proposed the randomness inherent in the emergence of neutral mutations as the most plausible explanation for such differences in TMRCA estimates. For simplicity, in the following description, we refer only to the estimates obtained using all positions.
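The ratios quoted above can be recomputed directly from the point estimates in table 1, using the five haplogroups for which both estimates are available. The short script below is only a check of that arithmetic; the variable names are arbitrary.

```python
# Point estimates (years) from table 1: (all positions, synonymous only).
table1 = {
    "R0a": (22588, 15665),
    "R0a1": (21573, 16216),
    "R0a1a": (11489, 9683),
    "R0a2": (15608, 14484),
    "R0a3": (12455, 13146),
}

# Ratio of the synonymous-only estimate to the all-positions estimate.
ratios = {name: syn / all_pos for name, (all_pos, syn) in table1.items()}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
print(f"range: {min(ratios.values()):.2f} to {max(ratios.values()):.2f}")
print(f"mean:  {mean_ratio:.3f}")
```

The lowest ratio belongs to R0a itself and the highest to R0a3, the only clade in which the synonymous estimate exceeds the all-positions estimate.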

Table 1.

Age Estimates and Standard Deviations (in years) for the TMRCA of R0a Haplogroups.

Haplogroup        All Positions      Synonymous Substitutions
R0a               22,588 ± 851       15,665 ± 1,157
R0a1              21,573 ± 1,325     16,216 ± 1,852
R0a1a             11,489 ± 993       9,683 ± 1,543
R0a1a1a           3,160 ± 943        a
R0a2              15,608 ± 933       14,484 ± 1,466
R0a2f1            5,801 ± 1,294      a
R0a2f1 + 11365    3,030 ± 1,120      a
R0a3              12,455 ± 3,327     13,146 ± 5,786

a Insufficient information for TMRCA estimation.

Overall, TMRCA estimates based on all positions placed R0a and R0a1 in the time frame of 20,000–25,000 YBP (years before present) (22,588 ± 851 for R0a and 21,573 ± 1,325 for R0a1). These ages are somewhat older than the ones obtained by Abu-Amero et al. (2007), although those estimates had large confidence intervals: 18,959 ± 8,478 YBP and 18,993 ± 6,999 YBP for R0a (coding region and HVS-1, respectively) and 9,248 ± 7,604 YBP for R0a1 (coding region). Considerable population expansion seems to have occurred ∼12,000–16,000 YBP, with the emergence of the frequent and widespread haplogroups R0a1a (11,489 ± 993 YBP) and R0a2 (15,608 ± 933 YBP) as well as the less frequent and newly identified R0a3 (12,455 ± 3,327 YBP). Other R0a haplogroups seem to have emerged in the period between 5,000 and 11,000 YBP, with some being restricted to the Arabian Peninsula (such as R0a2c and an R0a1a haplogroup defined by 3438-5120-5333; both haplogroups have TMRCA estimates of 8,784 ± 2,744 YBP), whereas other haplogroups were observed in North and East Africa (R0a2a, 6,549 ± 2,045 YBP; R0a2b, 10,063 ± 2,298 YBP; and an R0a2 haplogroup defined by a substitution at position 9128, 6,094 ± 2,268 YBP). The two branches restricted to Soqotra and neighboring regions in Yemen display TMRCAs of 3,160 ± 943 YBP (R0a1a1a) and 3,030 ± 1,120 YBP (R0a2f1 plus a substitution at position 11365), overlapping with but narrowing the estimate previously reported on the basis of HVS-1 variants alone (3,363 ± 2,378 YBP; Černý et al. 2009). Curiously, both the R0a1a1a and R0a2f1 branches share the highly recurrent substitution at position 13708, which is a replacement substitution in the protein-coding gene ND5, converting an alanine (neutral apolar) to a threonine (neutral polar) (Pereira et al. 2009; Soares et al. 2009).

Discussion

In this work, we provide genetic evidence in support of human demographic expansions in South Arabia during the past 20,000 years, when the climate was rapidly improving. We show that mitochondrial haplogroup R0a displays a clustered pattern throughout the Arabian Peninsula (Z score = 1.9, corresponding to a 5–10% likelihood that the pattern arose by chance), reaching its highest frequency (∼30%) in southeast Yemen and especially on Yemen’s Soqotra Island. The distribution of Moran’s I index versus distance classes suggests a clinal distribution of R0a and R0a1a (fig. 2), although there are some nonsignificant values, likely reflecting the heterogeneity of Arabia’s natural landscape, where inhospitable desert alternates with more habitable oases and mountains. Furthermore, heterogeneity of sampling may play a role; in fact, when assaying fewer samples inside Yemen, the cline was clearer, with four out of five points showing statistical significance (data not shown). Unfortunately, no data from Oman were available to us.

Current evidence favors the introduction of R0 to Arabia from the Middle East, where the oldest lineages for the R clade are observed (Richards et al. 2000). However, it is possible that Arabia was an additional centre for R0 emergence. Our results show that several founders of R0a are present in southern Arabia, with TMRCA estimates suggesting population continuity between the terminal Pleistocene and Holocene. Some of these mtDNA lineages also spread to North and East Africa. Two periods of demographic expansions closely match two wet climatic periods: the end of the 35–20 KYA wet period and the brief wet phase between 15 and 13 KYA. The post-LGM expansion around 16 KYA seems to have been especially important (including the new R0a3 haplogroup and the very frequent haplogroup R0a1a with its many branches deriving directly from the root), which is concordant with the hypothesis of human resettlement of Arabia (Parker 2009).

Recent archaeological findings agree with the genetic evidence suggesting population continuity, especially in the Yemeni highlands where two Early Holocene pre-Neolithic and Neolithic sites have been identified (Fedele 2009). However, there also exists the hypothesis that the peopling of eastern Arabia by Pre-Pottery Neolithic B-related settlers was the result of widespread population dispersal during the Early Holocene (Uerpmann et al. 2009). Fedele (2009) examined this issue in the Yemeni highlands, where archaeological surveys at two sites attested to an Early Holocene “Pre-Neolithic” settlement throughout the eastern Yemen Plateau (and a single continuum from Pre-Neolithic to Neolithic) with features of its Pre-Neolithic industry displaying hints of similarity with East Africa rather than the Fertile Crescent. Further study (McCorriston and Martin 2009) focused on the Early Holocene pastoralists along the desert margins of southern Arabia and found evidence for cattle introduction from the Levant or possibly from northeastern Africa by the sixth millennium BC, if not earlier. They suggest multiple waves of expansion into Arabia, at different times, and even suggest that the earliest herd animals in southern Arabia were probably introduced as a pioneering strategy among local hunters.

Our results show that a substantial part of the Yemeni population is biologically related to one or more demographic expansion events that have taken place over the past 20,000 years. The southeast parts of Yemen, such as Al Mahra, Hadramawt, and mainly Soqotra, might have played an important role in later demographic expansion, as R0a1a1a and R0a2f1 now testify. The places where these haplogroups were identified closely match a South Arabian refugium suggested by paleoclimatic data (Rose and Petraglia 2009) and show the importance of the Holocene in this area. It would be very informative to have R0a sequences from Oman (especially Dhofar) in order to localize more precisely the origin of these genetically apparent demographic upheavals. It is very interesting that the genetic dates for the introduction and expansion of Soqotran R0a clades (between 6 and 3 KYA) match the earliest evidence for seafaring activity in the peninsula in the sixth millennium BC (Boivin et al. 2009). Our study is thus an example of how results in human genetics can closely overlap with those of other fields of anthropology.

This project was supported by the Fulbright-Masaryk Fellowship of V.C. at the Department of Anthropology, University of Florida; the Ministry of Education of the Czech Republic (grant number KONTAKT ME 917); the Council of American Overseas Research Centers; the American Institute for Yemeni Studies; and the Portuguese Fundação para a Ciência e a Tecnologia (PTDC/ANT/66275/2006) (L.P.). The Instituto de Patologia e Imunologia Molecular da Universidade do Porto is an Associate Laboratory of the Portuguese Ministry of Science, Technology and Higher Education and is partially supported by the Portuguese Foundation for Science and Technology (FCT). This research was also supported by a grant from the National Science Foundation to C.J.M. (BSR-0518530).

References

Abu-Amero
KK
Gonzalez
AM
Larruga
JM
Bosley
TM
Cabrera
VM
Eurasian and African mitochondrial DNA influences in the Saudi Arabian population
BMC Evol Biol
 , 
2007
, vol. 
7
 pg. 
32
 
Abu-Amero
KK
Larruga
JM
Cabrera
VM
Gonzalez
AM
Mitochondrial DNA structure in the Arabian Peninsula
BMC Evol Biol
 , 
2008
, vol. 
8
 pg. 
45
 
Achilli
A
Rengo
C
Magri
C
, et al. 
(21 co-authors)
The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool
Am J Hum Genet
 , 
2004
, vol. 
75
 (pg. 
910
-
918
)
Alshamali
F
Pereira
L
Budowle
B
Poloni
ES
Currat
M
Local population structure in Arabian Peninsula revealed by Y-STR diversity
Hum Hered
 , 
2009
, vol. 
68
 (pg. 
45
-
54
)
Alvarez-Iglesias
V
Mosquera-Miguel
A
Cerezo
M
, et al. 
(11 co-authors)
New population and phylogenetic features of the internal variation within mitochondrial DNA macro-haplogroup R0
PLoS One
 , 
2009
, vol. 
4
 pg. 
e5112
 
Bailey
G
Petraglia
MD
Rose
JI
The Red Sea, coastal landscapes, and hominin dispersals
The evolution of human populations in Arabia. Paleoenvironments, prehistory and genetics
 , 
2009
New York
Springer
(pg. 
15
-
38
)
Bandelt
HJ
Forster
P
Sykes
BC
Richards
MB
Mitochondrial portraits of human populations using median networks
Genetics
 , 
1995
, vol. 
141
 (pg. 
743
-
753
)
Behar
DM
Villems
R
Soodyall
H
, et al. 
(15 co-authors)
The dawn of human matrilineal diversity
Am J Hum Genet
 , 
2008
, vol. 
82
 (pg. 
1130
-
1140
)
Boivin
N
Blench
R
Fuller
DQ
Petraglia
MD
Rose
JI
Archaeological, linguistic and historical sources on ancient seafaring: a multidisciplinary approach to the study of early maritime contact and exchange in the Arabian Peninsula
The evolution of human populations in Arabia. Paleoenvironments, prehistory and genetics
 , 
2009
New York
Springer
(pg. 
251
-
278
)
Cadenas
AM
Zhivotovsky
LA
Cavalli-Sforza
LL
Underhill
PA
Herrera
RJ
Y-chromosome diversity characterizes the Gulf of Oman
Eur J Hum Genet
 , 
2008
, vol. 
16
 (pg. 
374
-
386
)
Černý
V
Mulligan
CJ
Rídl
J
Žaloudková
M
Edens
CM
Hájek
M
Pereira
L
Regional differences in the distribution of the sub-Saharan, West Eurasian, and South Asian mtDNA lineages in Yemen
Am J Phys Anthropol
 , 
2008
, vol. 
136
 (pg. 
128
-
137
)
Černý
V
Pereira
L
Kujanová
M
Vašíková
A
Hájek
M
Morris
M
Mulligan
CJ
Out of Arabia-the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity
Am J Phys Anthropol
 , 
2009
, vol. 
138
 (pg. 
439
-
447
)
Černý
V
Salas
A
Hájek
M
Žaloudková
M
Brdička
R
A bidirectional corridor in the Sahel-Sudan belt and the distinctive features of the Chad Basin populations: a history revealed by the mitochondrial DNA genome
Ann Hum Genet
 , 
2007
, vol. 
71
 (pg. 
433
-
452
)
Cherni
L
Fernandes
V
Pereira
JB
, et al. 
(12 co-authors)
Post-last glacial maximum expansion from Iberia to North Africa revealed by fine characterization of mtDNA H haplogroup in Tunisia
Am J Phys Anthropol
 , 
2009
, vol. 
139
 (pg. 
253
-
260
)
Costa
MD
Cherni
L
Fernandes
V
Freitas
F
Ammar El Gaaied
AB
Pereira
L
Data from complete mtDNA sequencing of Tunisian centenarians: testing haplogroup association and the “golden mean” to longevity
Mech Ageing Dev
 , 
2009
, vol. 
130
 (pg. 
222
-
226
)
Fedele FG. 2009. Early Holocene in the highlands: data on the peopling of the Eastern Yemen Plateau, with a note on the Pleistocene evidence. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 215-236.
Forster P, Torroni A, Renfrew C, Rohl A. 2001. Phylogenetic star contraction applied to Asian and Papuan mtDNA evolution. Mol Biol Evol. 18:1864-1881.
Gasparre G, Porcelli AM, Bonora E, et al. (16 co-authors). 2007. Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors. Proc Natl Acad Sci U S A. 104:9001-9006.
Gonder MK, Mortensen HM, Reed FA, de Sousa A, Tishkoff SA. 2007. Whole-mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol. 24:757-768.
Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 41:95-98.
Harich N, Costa MD, Fernandes V, Kandil M, Pereira JB, Silva NM, Pereira L. 2010. The trans-Saharan slave trade—clues from interpolation analyses and high-resolution characterization of mitochondrial DNA lineages. BMC Evol Biol. 10:138.
Hartmann A, Thieme M, Nanduri LK, Stempfl T, Moehle C, Kivisild T, Oefner PJ. 2009. Validation of microarray-based resequencing of 93 worldwide mitochondrial genomes. Hum Mutat. 30:115-122.
Kitchen A, Ehret C, Assefa S, Mulligan CJ. 2009. Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Near East. Proc Biol Sci. 276:2703-2710.
Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. 2004. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 75:752-770.
Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioglu C, Roseman C, Underhill PA, Cavalli-Sforza LL, Herrera RJ. 2004. The Levant versus the Horn of Africa: evidence for bidirectional corridors of human migrations. Am J Hum Genet. 74:532-544.
Maca-Meyer N, Gonzalez AM, Larruga JM, Flores C, Cabrera VM. 2001. Major genomic mitochondrial lineages delineate early human expansions. BMC Genet. 2:13.
Macaulay V, Hill C, Achilli A, et al. (21 co-authors). 2005. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 308:1034-1036.
McCorriston J, Martin L. 2009. Southern Arabia’s early pastoral population history: some recent evidence. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 237-250.
Metspalu M, Kivisild T, Metspalu E, et al. (16 co-authors). 2004. Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 5:26.
Olivieri A, Achilli A, Pala M, et al. (15 co-authors). 2006. The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science. 314:1767-1770.
Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP. 2004. Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet. 75:966-978.
Parker AG. 2009. Pleistocene climate change in Arabia: developing a framework for hominin dispersal over the last 350 ka. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 39-50.
Pereira L, Freitas F, Fernandes V, Pereira JB, Costa MD, Costa S, Maximo V, Macaulay V, Rocha R, Samuels DC. 2009. The diversity present in 5140 human mitochondrial genomes. Am J Hum Genet. 84:628-640.
Pereira L, Goncalves J, Franco-Duarte R, Silva J, Rocha T, Arnold C, Richards M, Macaulay V. 2007. No evidence for an mtDNA role in sperm motility: data from complete sequencing of asthenozoospermic males. Mol Biol Evol. 24:868-874.
Petraglia MD, Alsharekh A. 2003. The Middle Palaeolithic of Arabia: implications for modern human origins, behaviour and dispersals. Antiquity. 77:671-684.
Petraglia MD, Rose JI, editors. 2009. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer.
Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS. 1999. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 23:437-441.
Richards M, Macaulay V, Hickey E, et al. (37 co-authors). 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet. 67:1251-1276.
Richards M, Rengo C, Cruciani F, Gratrix F, Wilson JF, Scozzari R, Macaulay V, Torroni A. 2003. Extensive female-mediated gene flow from sub-Saharan Africa into Near Eastern Arab populations. Am J Hum Genet. 72:1058-1064.
Rose JI, Petraglia MD. 2009. Tracking the origin and evolution of human populations in Arabia. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 1-12.
Rowold DJ, Luis JR, Terreros MC, Herrera RJ. 2007. Mitochondrial DNA geneflow indicates preferred usage of the Levant Corridor over the Horn of Africa passageway. J Hum Genet. 52:436-447.
Saillard J, Forster P, Lynnerup N, Bandelt HJ, Norby S. 2000. mtDNA variation among Greenland Eskimos: the edge of the Beringian expansion. Am J Hum Genet. 67:718-726.
Soares P, Ermini L, Thomson N, Mormina M, Rito T, Rohl A, Salas A, Oppenheimer S, Macaulay V, Richards MB. 2009. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet. 84:740-759.
Thangaraj K, Chaubey G, Kivisild T, Reddy AG, Singh VK, Rasalkar AA, Singh L. 2005. Reconstructing the origin of Andaman Islanders. Science. 308:996.
Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ. 2006. Harvesting the fruit of the human mtDNA tree. Trends Genet. 22:339-345.
Uerpmann H-P, Potts DT, Uerpmann M. 2009. Holocene (Re-)Occupation of Eastern Arabia. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 205-210.
Wilkinson TJ. 2009. Environment and long-term population trends in southwest Arabia. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. New York: Springer. p. 51-68.

Supplementary data