Origin and Expansion of Haplogroup H, the Dominant Human Mitochondrial DNA Lineage in West Eurasia: The Near Eastern and Caucasian Perspective

More than a third of the European pool of human mitochondrial DNA (mtDNA) is fragmented into a number of subclades of haplogroup (hg) H, the most frequent hg throughout western Eurasia. Although there has been considerable recent progress in studying mitochondrial genome variation in Europe at the complete sequence resolution, little data of comparable resolution is so far available for regions like the Caucasus and the Near and Middle East, areas where most European genetic lineages, including hg H, have likely emerged. This gap in our knowledge seriously hinders progress in understanding the demographic prehistory of Europe and western Eurasia in general. Here we describe the phylogeography of hg H in the populations of the Near East and the Caucasus. We have analyzed 545 samples of hg H at high resolution, including 15 novel complete mtDNA sequences. As in Europe, most of the present-day Near Eastern–Caucasus area variants of hg H started to expand after the last glacial maximum (LGM) and presumably before the Holocene. Yet importantly, several hg H subclades in the Near East and Southern Caucasus region coalesce to the pre-LGM period. Furthermore, irrespective of their common origin, significant differences between the distribution of hg H sub-hgs in Europe and in the Near East and South Caucasus imply limited post-LGM maternal gene flow between these regions. In contrast, the North Caucasus mitochondrial gene pool has received an influx of hg H variants, arriving from the Ponto-Caspian/East European area.


Introduction
The Levantine part of the Near East was the area that was colonized foremost, though likely only episodically, about 100,000 years before present (YBP) (Shea 2003). Based on genetic data, it has been suggested that the earliest phase of the long-lasting settlement of Eurasia by anatomically modern humans (AMH) started 60,000-70,000 YBP and proceeded along the southern coast of the supercontinent, probably crossing first the Red Sea around Bab-el-Mandeb, continuing to India and further East (Cavalli-Sforza et al. 1994; Lahr and Foley 1994; Quintana-Murci et al. 1999; Kivisild et al. 2003a, 2004; Forster 2004; Metspalu et al. 2004; Macaulay et al. 2005; Thangaraj et al. 2005; Sun et al. 2006). According to the newest interpretation of the 14C calibration data, Europe was populated around 41,000-46,000 YBP, likely after some hiatus since the "opening" of the southern route (Mellars 2006).
The demographic history of human populations during the Pleistocene has been profoundly influenced by large-scale climate fluctuations, one of the most significant of which took place between 19,000 and 22,000 YBP, during the last glacial maximum (LGM), when the climate became significantly colder and drier (Yokoyama et al. 2000; Clark et al. 2004). During this cold peak, extreme deserts occupied most of the Near East and Central Asia, whereas much of Europe and northern Asia was covered by steppe-tundra, forcing forest into scattered refugium areas in the western Caucasus and southern European peninsulas (Adams and Faure 1997; Peyron et al. 1998; Tarasov et al. 1999, 2000; Crucifix et al. 2005). Postglacial expansion-recolonization from refugia is a concept that has recently been used to explain the genetic diversity of the present-day Europeans (Torroni et al. 1998, 2001; Semino et al. 2000; Achilli et al. 2004; Rootsi et al. 2004; Tambets et al. 2004; Pereira et al. 2005). Much less, however, is known about the LGM period in the Near East and in the Caucasus. After the postglacial recolonization, another expansion happened thousands of years later, when agriculture started to develop in the Near East, resulting, according to many authors, in an outward migration of agriculturist populations to Europe and different parts of Asia, with an impact whose extent is still hotly debated (Ammerman and Cavalli-Sforza 1984; Sokal et al. 1991; Barbujani et al. 1994; Cavalli-Sforza and Minch 1997; Chikhi et al. 2002; Dupanloup et al. 2004; Haak et al. 2005; Pinhasi et al. 2005).
An absolute majority of the western Eurasian mitochondrial DNA (mtDNA) pool consists of a small number of phylogenetically well-characterized branches of haplogroup (hg) R. The dominant hg in western Eurasia (H) descends from the HV family of hgs, defined by substitutions at nucleotide positions (nps) 73 and 11719 relative to R* (Macaulay et al. 1999; Saillard, Magalhaes et al. 2000; Finnilä et al. 2001; Torroni et al. 2006). It has been accepted for some time now that most of the mtDNA hgs presently found in Europe, including hg H (Torroni et al. 1994), originated in the Near and Middle East (Torroni et al. 1994; Richards et al. 1996; for a review, see Forster 2004). The open question is when they evolved. Hg H encompasses over 40% of the total mtDNA variation in most of Europe. Its frequency declines toward the East and South, but in the Near East, the Caucasus, and Central Asia, its frequency is still as high as 10-30% (Metspalu et al. 1999; Richards et al. 2000; Tambets et al. 2000; Al-Zahery et al. 2003; Achilli et al. 2004; Loogväli et al. 2004; Metspalu et al. 2004; Quintana-Murci et al. 2004; Pereira et al. 2005).
More than 10 subclades within hg H, as defined by coding region mutations, have been described thus far, and we have previously published a phylogenetic tree of 267 coding region sequences (Loogväli et al. 2004). A number of hg H subclades show characteristic regional distributions. Thus, H1 and H3 are common in western Europe, having expanded after the LGM from the Franco-Cantabrian refugium (Achilli et al. 2004; Loogväli et al. 2004; Pereira et al. 2005), whereas a subset of H2, defined by a transition at np 951, is typical of eastern Europe and Asia, and H6 is the most frequent among the identified subclades of hg H in Central Asia (Loogväli et al. 2004).
Irrespective of their likely ancestral status relative to Europeans, the West Asian and the Caucasus populations have been profoundly underrepresented in the published mtDNA data sets. Here we analyze spatial and temporal spread of hg H in the Near East and the Caucasus and interpret the obtained results in a comprehensive West Eurasian context of this major maternal lineage, informative in terms of ancient human migrations between West Asia, the Caucasus, and Europe.
All confirmed hg H mtDNAs were subsequently screened for a series of single nucleotide polymorphisms that define different subbranches of this mtDNA lineage. The transition at np 239 was screened by sequencing, as in Loogväli et al. (2004), in all samples that harbored a transition at np 16362. Twenty-four polymorphisms throughout the mitochondrial genome were analyzed in all 545 samples. Transitions at nps 477, 951, 1438, 3010, 3796, 4336, 4745, 4769, 4793, 5004, 7645, 8448, 8598, 8994, 9380, 13020, 13101, 13708, 16482, and the 14470TA transversion were detected by RFLP analysis (fig. 1). To identify the transition at np 3010, we used the mismatch forward primer 5′-np2981-acgacctcgatgttggatcaggacatcgc; similarly, a mismatch forward primer with the sequence 5′-np14448-caatagccatcgctgtaggat was used for the 14470TA transversion. A reverse mismatch primer, with the sequence 5′-np499-cgggggttgtattgatgagact, was employed to detect a polymorphism at np 477.
Mutations at nps 14869 and 14872 were detected by the absence of the 14869 MboI cutting site. To distinguish between the 2 transitions, all the samples that lacked this site were sequenced. Transitions at nps 456 and 6776 were detected by allele-specific polymerase chain reaction and by sequencing. Polymorphism at np 10166 was analyzed by sequencing samples lacking the DdeI site at np 5003. Polymorphisms at nps 709 and 4745 were analyzed by RFLP in samples that had a C to T mutation at np 14872. The polymorphism at np 11140 was screened in samples having the BseMII site at np 1438.
The HVS-1 sequence of all the 545 samples was scored between nps 16024 and 16383. In order to elucidate the topology of the so far poorly resolved subclades of hg H, 15 samples were selected for complete sequencing. Samples inside the desired clades were selected randomly. We sequenced 6 samples with the 14872 transition (samples: Abazin 43, Lezgin 19, Mingrelian 9, Jordanian 923, Tabasaran 6, Turk 209), 3 samples with the 1438 transition (Dargins 18, 29, 75), 2 samples with the 5004 transition (Lezgin 5, Turk 137), 2 samples with a transition at np 7645 (Armenian 2, Turk 345), one sample with the 239 transition (North Ossetian 71), and one sample with the transition at np 8994 (Abkhazian 59). DYEnamic ET Terminator Cycle Sequencing Kit from Amersham Pharmacia Biotech was used for sequencing on a MegaBACE 1000 Sequencer (Amersham Biosciences, Piscataway, NJ). Sequence trace files were analyzed either in Seqlab (GCG Wisconsin Package 10, Genetics Computer Group) or in case of complete sequencing in Phred, Phrap, and Consed programs (Nickerson et al. 1997;Ewing et al. 1998).
Due to the large size of the data set, only the part of the network with samples classified into sub-hgs was presented. Coalescence ages of sub-hgs were calculated based on the network, by means of the average transitional distance from the root haplotypes (rho). One transitional step between nps 16090 and 16365 was taken equal to 20,180 years and between 577 and 16023 equal to 5,138 years (Mishmar et al. 2003). For synonymous substitutions, we used the rate of one substitution in 6,764 years (Kivisild et al. 2006). Standard deviations (SDs) for age estimates were calculated as in Saillard, Forster et al. (2000).
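The rho-based dating described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and the toy network are hypothetical, and the SD formula is written in the form we understand Saillard, Forster et al. (2000) to use (variance = sum of n_i^2 * l_i over branches, divided by n^2), which should be checked against that paper before use.

```python
import math

# Years per substitution, as quoted in the text.
YEARS_PER_HVS1_TRANSITION = 20_180     # nps 16090-16365
YEARS_PER_CODING_SUBSTITUTION = 5_138  # nps 577-16023 (Mishmar et al. 2003)


def rho_statistic(distances_from_root):
    """Average number of mutational steps from the root haplotype."""
    if not distances_from_root:
        raise ValueError("at least one sampled lineage is required")
    return sum(distances_from_root) / len(distances_from_root)


def rho_sd(branches, n_samples):
    """SD of rho from per-branch counts (assumed form, see lead-in).

    `branches` is a list of (n_descendants, n_mutations) pairs, one per
    branch of the network; `n_samples` is the total number of lineages.
    """
    variance = sum(n_i ** 2 * l_i for n_i, l_i in branches) / n_samples ** 2
    return math.sqrt(variance)


def coalescence_age(rho, years_per_step):
    """Convert rho (or its SD) into years with a fixed per-step rate."""
    return rho * years_per_step


# Hypothetical toy network with 4 sampled lineages: one at the root,
# two that are 1 step away, and one that is 2 steps away.
toy_distances = [0, 1, 1, 2]
# The same network described branch by branch: (descendants, mutations).
toy_branches = [(2, 1), (1, 1), (1, 1)]

rho = rho_statistic(toy_distances)
sd = rho_sd(toy_branches, len(toy_distances))
print(coalescence_age(rho, YEARS_PER_HVS1_TRANSITION),
      coalescence_age(sd, YEARS_PER_HVS1_TRANSITION))
```

Because the age is a simple product of rho and the per-step rate, any uncertainty in the rate itself scales the estimate directly and is not captured by this SD.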

Peopling of Western Eurasia 437
Coalescence ages for the clades in Europe were calculated on the data from Loogväli et al. (2004).
In an analysis of hg H variability for the Near East and the Caucasus, the information on European populations was drawn from the data presented by Herrnstadt et al. (2002) and complemented by frequencies for French from Loogväli et al. (2004) and Portuguese and Spanish from Pereira et al. (2005). Note that the samples of Herrnstadt et al. (2002) are from the United States or United Kingdom and of unspecified descent. Yet, the sub-hg distribution is characteristic of other western European populations. To minimize deviation, we used average frequencies over the aforementioned populations (United States or United Kingdom, French, Portuguese, Spaniards), in case the polymorphism was studied in more than one of them. Otherwise we used the only available frequency. To plot hgs on the same graph as populations, their coordinates (ranging from -1 to 1) were multiplied by 10. We calculated mismatch distributions (distributions of pairwise differences between sequences) on HVS-1 data in Arlequin 3.01 (Excoffier et al. 2005).
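As an illustration of what a mismatch distribution is, the following sketch counts pairwise differences between haplotypes and normalizes the counts. It is not Arlequin's implementation. It assumes each haplotype is encoded as the set of HVS-1 positions that differ from the reference, so two different substitutions at the same position would be treated as identical; the toy haplotypes are invented for the example.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a, b):
    """Number of positions at which two haplotypes differ.

    Each haplotype is a set of variant positions relative to the
    reference, so the symmetric difference counts the mismatches.
    """
    return len(a ^ b)


def mismatch_distribution(haplotypes):
    """Relative frequency of each pairwise-difference count."""
    if len(haplotypes) < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(
        pairwise_differences(a, b) for a, b in combinations(haplotypes, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Invented toy sample: one reference-like haplotype and three derived ones.
toy_haplotypes = [
    frozenset(),
    frozenset({16256}),
    frozenset({16256}),
    frozenset({16256, 16352}),
]

for differences, share in mismatch_distribution(toy_haplotypes).items():
    print(differences, round(share, 3))
```

A distribution with a single peak at a low number of differences is the pattern usually read as a signature of a recent expansion; several peaks give the multimodal shape discussed for older or demographically complex clades.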

Topology of hg H Phylogenetic Tree
In a total of 6,199 samples from 11 Caucasus and Near Eastern populations, we found 1,219 samples to belong to hg H. Of these, 545 hg H samples were chosen randomly over the region, to be tested for markers defining major sub-hgs of hg H and their internal branches (fig. 1 and supplementary tables S1 and S2, Supplementary Material online). Altogether 61% of the samples could be clustered among 17 sub-hgs. A nomenclature, which we hereby update (supple- Inside hg H1, a new clade is characterized: H1d is defined by a transition at np 456 (fig. 1). The presence of a transition at np 3796, representing H1b, has been noticed previously (Herrnstadt et al. 2002; Mishmar et al. 2003; Simon et al. 2003; Achilli et al. 2004; Pereira et al. 2005). However, we found this mutation also on the hg H5 background, which is noteworthy due to its nonsynonymous nature: the observed A to G substitution results in a threonine to alanine replacement in the ND1 subunit of mitochondrial complex I. Notice that this mutation at np 3796 has been shown to be positively correlated with adult-onset dystonia and was suggested to cause abnormalities in the mitochondrial electron transport chain (Simon et al. 2003). Furthermore, outside hg H, the A to G transition at np 3796 has been detected in hg B (Herrnstadt et al. 2002), in hg M21 (Macaulay et al. 2005), and as a transversion from A to T in hg L1c, the latter substitution resulting in a serine codon (Ingman et al. 2000; Herrnstadt et al. 2002; Mishmar et al. 2003; Kivisild et al. 2006). Accordingly, nonsynonymous substitutions at np 3796 appear to be common in different, phylogenetically distant branches of human mtDNA and are, therefore, unlikely to be under strong purifying selection (see also Mitchell et al. 2006).
Based on the combined presence of transitions at nps 1438 and 4769, Finnilä et al. (2001) identified hg H2 as the second most frequent subclade of hg H among Finns. These 2 mutations were observed in tandem also among 11 Caucasian-American samples in Herrnstadt et al. (2002), whereas a complete mtDNA sequence of an Iraqi individual in Achilli et al. (2004) hinted at a potential intermediate branch between these 2 defining positions. In our sample from the Near East and the Caucasus, we detected 5 more samples with 1438 substitution, all of them lacking the 4769 transition ( fig. 1), adding thereby weight to the idea of the origin of hg H2 outside Europe. Therefore, we propose to redefine hg H2 by the 1438 transition and nominate lineages inside H2 with the transition at np 4769 as H2a, with transitions at nps 8598 and 16311 as H2b, and with the transition at np 951 as H2a1. In the 3 new completely sequenced H2a samples ( fig. 2), one possessed the transition at np 10810, which is characteristic of H2c (Achilli et al. 2004). For this reason, we renamed it as H2a3 and the 2 other samples that shared a substitution at np 11140 as H2a4.
The topology of H4 changes significantly as a result of the complete sequencing of 2 genomes (figs. 1 and 2). It was previously considered to be defined by 6 mutations (Loogväli et al. 2004). Here we show that 3 mutations in the coding region (3992, 5004, and 9123) make up the root of the clade, whereas 3 transitions at nps 4024, 14365, and 14582 separate H4a, and a transition at np 10166 distinguishes H4b.
One of the most diverse sub-hgs of hg H is H13 (figs. 1 and 2). A transition at np 2259 separates H13a, which is further divided into H13a1 by a transition at np 4745, and H13a2 by a transition at np 709. We have also completely sequenced 2 H14 genomes (fig. 2). It appears that 2 HVS-1 transitions at nps 16256 and 16352 can be used to define subclade H14a (figs. 1 and 2).
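The nested definitions of H2 and H4 given above lend themselves to a simple tree walk. The sketch below is only an illustration of that logic: the table lists just the markers named in the text for H2 and H4, the function name is hypothetical, and it assumes every defining position of a clade has been typed in the sample, which holds for complete sequences but not for samples screened at a single marker.

```python
# (clade, parent, defining positions), taken from the definitions in the text.
CLADES = [
    ("H2", "H", {1438}),
    ("H2a", "H2", {4769}),
    ("H2a1", "H2a", {951}),
    ("H2b", "H2", {8598, 16311}),
    ("H4", "H", {3992, 5004, 9123}),
    ("H4a", "H4", {4024, 14365, 14582}),
    ("H4b", "H4", {10166}),
]


def classify(mutations, root="H"):
    """Walk down from the root, following a child whose markers are all present.

    `mutations` is the set of variant positions observed in one sample.
    The walk stops at the deepest clade that is fully supported.
    """
    current = root
    moved = True
    while moved:
        moved = False
        for name, parent, markers in CLADES:
            if parent == current and markers <= mutations:
                current = name
                moved = True
                break
    return current


print(classify({1438}))                     # H2 without the 4769 transition
print(classify({1438, 4769, 951}))          # nested down to H2a1
print(classify({3992, 5004, 9123, 10166}))  # H4 root plus the H4b marker
```

A sample carrying 951 without 1438 stays at the root in this sketch, which reflects the hierarchical reading of the nomenclature rather than any claim about how such a sample should be interpreted.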
Four additional sub-hgs, H18, H19, H20, and H21, are defined here for the first time. H18 is defined by a transition at np 13708, which, notably, is a major nonsynonymous hot spot in mtDNA (Kivisild et al. 2006). H18 combines 3 previously determined mtDNA hg H complete or coding region sequences, which lack other diagnostic mutations of hg H subclades (Herrnstadt et al. 2002; Howell et al. 2003; Coble et al. 2004). However, taking into account the high variability of this position, the monophyletic nature of H18 should be considered with some caution. H19 is defined by a transition at np 14869. Besides the single Syrian haplotype in our sample, 3 other mtDNA coding region or complete sequences (Herrnstadt et al. 2002; Howell et al. 2003) justify the proposed definition. H20 is defined by a transition at np 16218 and a C to A transversion at np 16328, whereas H21 is defined by a transition at np 8994 (figs. 1 and 2). An analysis of HVS-1 databases (over 22,000 published and unpublished samples) revealed an absence of the 16328CA transversion outside hg H, supporting its monophyletic status. In all but one of the published cases and in all our samples, this transversion occurs together with a transition at np 16218.

[Figure caption fragment: (Anderson et al. 1981; Andrews et al. 1999). Substitution at np 152, variation of the number of Cs at np 309, and insertion at nps 522-523 were left out due to very fast mutation rates at these sites (from our sequenced samples DAR 29, LEZ 5, TAB 6, TUR 137, and TUR 345 have transition at np 152; DAR 75, LEZ 5, LEZ 19, OSE 71, and TUR 345 have a single C insertion at np 309; ABQ 43, DAR 16, and TAB 6 have 2 Cs inserted at np 309).]
The majority of samples that did not belong to any of the characterized sub-hgs have the CRS (Cambridge Reference Sequence) (Anderson et al. 1981; Andrews et al. 1999) or differ from it by one mutation; however, 12.3% possessed three or more mutations in their HVS-1 (supplementary table S2, Supplementary Material online). On the other hand, our published (Loogväli et al. 2004) tree of 267 coding region sequences of hg H reveals the presence of a large number of solitary or binary twigs arising from the defining node of hg H. This strongly suggests a major ongoing expansion and diversification of this dominant maternal clade over the area of its present spread. Figure 3 gives an overview of the frequencies of the studied hgs across populations (for exact frequencies, see supplementary table S1, Supplementary Material online). As in Europe, the most frequent subclade of hg H in the Near East and the Caucasus is H1. It encompasses over 11% of regional hg H samples, which makes its total frequency in the Caucasus and the Near East 2.3%. H1 is more common among the Lebanese (21% of hg H) and northern Caucasus populations (11-18%). These numbers are similar to those in eastern Europe, where it forms about 12% of the hg H gene pool in the Balkans and 18% among Slovaks (Loogväli et al. 2004). Interestingly, H1 is considerably more frequent (around 30% of hg H) both in West Europe and among Slavic-speaking East Europeans (Achilli et al. 2004; Loogväli et al. 2004). A finer clustering reveals an informative difference: whereas in Karatchaians-Balkarians (the North-Central Caucasus), all H1 samples fall into H1a and H1b, the 2 most common subclades of H1 in Europe, none of the Lebanese samples belong to these subclades of H1. Besides the North Caucasus populations, we found H1a and H1b outside of Europe only in Turks (supplementary table S1, Supplementary Material online).

Frequency Distribution of H Sub-hgs
A number of subclades of hg H reach their highest frequency among the western Caucasus populations (figs. 1 and 3). The most frequent of them is H5*, which forms over 20% of the hg H gene pool in Karatchaians-Balkarians and Georgians, that is, in people living in the immediate vicinity of the 2 sides of the High Caucasus. These numbers are considerably higher than the estimates in Europe or Central Asia, which vary from a total absence in Volga-Uralic Finno-Ugrians and Central Asian populations to 8% in Slovaks and French (Loogväli et al. 2004). At the same time, its subcluster, H5a, which represented 10% of hg H mtDNAs in the Balkans, is present in the Caucasus and the Near Eastern populations at a very low frequency. The frequencies of H20 and H21 peak in Georgians, with their spread limited to neighboring populations and to Syrians and Jordanians (figs. 1 and 3).
Certain subclades of hg H were more prevalent in the Arabian Peninsula (figs. 1 and 3), including H2a1, H4b, H6, and H18, together forming approximately one half of the Arabian H lineages. Interestingly, H2a1 has been found at a similarly high frequency in Central and Inner Asia (12.5%), whereas in Europe, it has been found only in Eastern Slavs (9% of hg H), Estonians (6%), and Slovaks (2%) (Loogväli et al. 2004). H2 forms a quarter of all hg H lineages in Daghestan. Yet, besides H2a1, common in the Arabian Peninsula, other variants of H2, like H2a4, form a large share of hg H in Daghestan. H6 is even more frequent in Central and Inner Asia (21%), especially so in Altaians (35%) (Loogväli et al. 2004).
One of the most diverse subclades of hg H, H13, reaches its highest frequency in Daghestan and in Georgia (15% and 13.3% of hg H, respectively) (fig. 3, Supplementary Material online). Although all the H13 samples in Daghestan and also in Europe (Herrnstadt et al. 2002; Coble et al. 2004; Brandstätter et al. 2006) fall into H13a, the largest subclade of H13, additional H13 lineages are present in the southern Caucasus and Near East populations (fig. 1).
We carried out principal component analysis to explore affinities of mtDNA pools among different populations based on the frequency distributions of hg H subclades (fig. 4A) as well as other hgs (fig. 4B). In both plots, European populations are clearly separated from the rest. The populations from the southern Caucasus are more similar to Levantine populations, a trend that was particularly evident from the closeness of Syrians and Armenians. On the other hand, the northern Caucasus populations are genetically intermediate between European and Near Eastern populations. Because of the high H1 frequency in Lebanese, they are located, together with the northern Caucasus populations, closer to Europeans (fig. 4A). An important observation of this analysis is the fact that the 2 PC plots, for hg H subgroups and, independently, for the joint mtDNA pool, are congruent in their basic pattern of the distribution of populations. Figure 4C demonstrates hgs whose frequency determines the placement of populations in principal component plots. The more frequent clades, characteristic of the European group of populations, are H1, H3, H5a, U5, and pre-V-V (HV0 in Torroni et al. 2006). The hgs HV, H4, H20, U1, U3, U6, and X appear typical of southern Caucasus populations, Turkey, and Syria, whereas in the Arabian Peninsula, hgs J and pre-HV (R0 according to Torroni et al. 2006), as well as African hg L lineages and H6b, are present at elevated frequencies in comparison with other populations. Finally, we estimated the effect of the previously uncharacterized subclades of hg H on the overall genetic landscape (fig. 4D). The relatively high frequency of H13a1, together with those of H2a4 and H6a, characterizes Daghestan populations, distinguishing them from other northern Caucasus populations. H20 and H21, in addition to H5*, separate Georgians and Karatchaians-Balkarians from the rest.
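To make the principal component step concrete, here is a dependency-free sketch that extracts only the first component from a small frequency table by power iteration on the covariance matrix. The software and settings actually used for figure 4 are not stated in the text, so this is an assumption-laden illustration; the population labels and frequencies below are invented and are not data from this study.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (loadings, scores) for the first principal component.

    `rows` holds one list of frequencies per population. Columns are
    centred, the covariance matrix is built, and its leading eigenvector
    is approximated by power iteration.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]

    # Non-uniform start vector: a vector of ones is mapped to zero by the
    # covariance matrix whenever every row sums to the same total.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate data: no variance along the start vector")
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores


# Invented frequencies of two sub-hgs in four hypothetical populations.
toy = {
    "pop_A": [0.30, 0.05],
    "pop_B": [0.28, 0.07],
    "pop_C": [0.10, 0.20],
    "pop_D": [0.12, 0.18],
}
loadings, scores = first_principal_component(list(toy.values()))
for name, score in zip(toy, scores):
    print(name, round(score, 3))
```

The sign of a principal component is arbitrary, so only the relative placement of populations along the axis is meaningful: populations with similar sub-hg spectra receive scores of the same sign and similar size.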

Coalescence Analysis
From the HVS-1 coalescence analysis (table 1), it is evident that most clades of hg H bear the strongest signal for the beginning of their expansion after the LGM, during the Late Pleistocene and early Holocene. Significantly older is the estimate for H13. The apparent coalescence time for H1 is influenced by its subclades H1a and H1b, as without them the respective estimate in the Near East and the Caucasus drops from around 20,000 to 12,000 YBP. H6, one of the oldest clades in the Near East and the Caucasus, shows, in sharp contrast, an expansion age of a mere 3,400 YBP in Europe, which is the youngest estimate overall for the major subclades of hg H.
In addition to HVS-1 analysis, we also estimated the coalescence age from coding region data (fig. 2). Using the calibration method of Mishmar et al. (2003), which does not differentiate between mutation types (synonymous vs. nonsynonymous), the age for H13 is 24,300 (SD 6,900) YBP and for H4 is 27,500 (SD 9,400) YBP. The age estimate for H13, when counting only synonymous substitutions (Kivisild et al. 2006), is 18,500 YBP (SD 6,600) and 10,100 YBP (SD 6,000) for H4. As an interesting empirical observation, we found that the ratio of nonsynonymous to synonymous mutations differs considerably between sub-hgs and, as estimated on the tree presented in figure 2, equals 0.5 for H13, only 0.2 for H13a1, and 0.67 for H4.
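Because each age is the product of rho and a fixed rate, the reported coding-region ages can be converted back into the average number of substitutions per lineage that they imply. The short calculation below does this with the rates quoted in the text; the resulting rho values are back-calculated for illustration and are not figures reported by the authors.

```python
# Years per substitution, as quoted in the text.
RATE_ALL_CODING = 5_138   # nps 577-16023 (Mishmar et al. 2003)
RATE_SYNONYMOUS = 6_764   # synonymous substitutions (Kivisild et al. 2006)


def implied_rho(age_years, years_per_substitution):
    """Average number of substitutions per lineage implied by an age."""
    return age_years / years_per_substitution


# Reported ages (YBP) paired with the rate used to obtain them.
reported = {
    ("H13", "all"): (24_300, RATE_ALL_CODING),
    ("H4", "all"): (27_500, RATE_ALL_CODING),
    ("H13", "synonymous"): (18_500, RATE_SYNONYMOUS),
    ("H4", "synonymous"): (10_100, RATE_SYNONYMOUS),
}

implied = {key: implied_rho(age, rate) for key, (age, rate) in reported.items()}
for (clade, kind), rho in implied.items():
    print(f"{clade} ({kind}): rho ~ {rho:.2f}")
```

Read this way, the gap between the two H4 estimates corresponds to a much lower implied count of synonymous substitutions per lineage, which is in line with the relatively high nonsynonymous to synonymous ratio noted for H4.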
We calculated the mean number of pairwise differences for some clades (supplementary fig. S2, Supplementary Material online). Sub-hgs with younger coalescence times show mainly unimodal mismatch distributions, with the peak centered at one difference between sequence pairs. For a comparison, we have added our previous data of H3 sequences from European populations (Loogväli et al. 2004) because they represented lineages that were characteristic of postglacial recolonization of northern Europe (for a discussion, see Achilli et al. 2004; Loogväli et al. 2004). In older clades, there is a shift toward larger differentiation between lineages, moving the peak of mismatch distributions to 2 or 3 differences. The distributions can become multimodal as a result of constant population size for a longer period or multiple expansions and bottlenecks. The subclades of H6 show multimodal mismatch distributions, caused either by small sample sizes or, rather, by the complex demographic history of their carriers. Slightly multimodal is the distribution in the case of H1, which could be transformed to unimodal by excluding H1a and H1b.

FIG. 3.-Frequency of hg H and its subclades (supplementary table S1, Supplementary Material online). Subclades H13, H14, H18, and H21 were not studied in French, Eastern Slavs, and Central Asia, for which data were taken from Loogväli et al. (2004).

442 Roostalu et al.

Discussion
The peopling of Europe by AMH probably started more than 40,000 YBP (Mellars 2006), with the first evidence in the Lower Danube Basin (Churchill and Smith 2000; Conard and Bolus 2003), suggesting the Near East-Anatolia as a likely route for these pioneer hunter-gatherers to Europe. The present-day variation of hg H suggests that this mtDNA clade arose outside Europe before the LGM (Torroni et al. 1998; Richards et al. 2000; Loogväli et al. 2004; Pereira et al. 2005). In our attempt to expose pre-LGM limbs of hg H, we have characterized here the phylogeography of H13, which is one of the most diverse sub-hgs in the Near East and the Caucasus. It has a coalescence age of about 31,000 YBP according to HVS-1 (table 1) [...] its origin before the LGM because the coalescence age, signaling the beginning of the expansion, is only the minimal absolute age of the clade. The beginning of the expansion of some other clades, like H6 and H14, dates to the pre-LGM period as well, but with SDs rather large, a more exact placement of their temporal origin is not currently possible. Furthermore, the timing of expansions relies heavily on the molecular clock exploited.
The topology of H14 ( fig. 1) illustrates the intricacy of estimating coalescence age in the case of a complex demographic history. Thus, H14a, being on a root of 2 HVS-1 mutations, elevates the apparent coalescence age of the whole H14 to 39,000 YBP. Yet, the topology of H14 is perhaps better explained by assuming the presence of 2 founders of unknown and unequal time of origin (H14 root haplotype and that of H14a), subject to a later, likely simultaneous expansion phase, manifested in their present-day diversity.
It is likely that the subclades of hg H that are common today, some of which are associated with post-LGM reoccupation, were already frequent before the LGM, decreasing the probability of their extinction. This suggestion is indirectly supported by multimodal mismatch distributions observed for H6 subclades and H1 (Supplementary Material online). In particular, H13 shows a significantly earlier "summary" coalescence age, compared with other large subclades of hg H, and a unimodal mismatch distribution (see table 1 and supplementary fig. S2, Supplementary Material online). The reason for this could lie in its area of spread, centered in the southern Caucasus and the northern part of the Near East (fig. 3), which presumably had a milder and less arid climate during the LGM, favorable for human occupation (Adams and Faure 1997; Ramrath et al. 1999; Tarasov et al. 1999, 2000; Aksu et al. 2002). A global climate model, based on solar output, has revealed that a significant warming of the Earth's climate occurred between 33,000 and 26,000 YBP (Perry and Hsu 2000). Independently, more humid conditions in the Near East and Greece before the LGM (31,000-25,000 YBP) have been deduced from geological and pollen data analysis (Abed and Yaghan 2000; Tzedakis et al. 2002; Vaks et al. 2003; Hughes et al. 2005). These estimates overlap with the coalescence dates calculated here for the oldest subclades of hg H. We assume, therefore, that the first expansion wave of hg H may have taken place during this favorable time frame, probably in the northern part of the Near East and the southern Caucasus, where the oldest clades of hg H appear to be more diverse to this day. It has been shown that the Upper Paleolithic archaeological culture was present in the South Caucasus more than 30,000 YBP, well before the LGM (Adler et al. 2006), giving support for our estimates of past population expansions in this region.
How far the pre-LGM expansion of hg H from the Near East may have reached before the onset of the LGM is indicated by the distributions of some hg H subclades (H1, H3) (Achilli et al. 2004; Pereira et al. 2005), as well as its sister clade hg V (Torroni et al. 1998, 2001). In Europe, these clades display frequency clines radiating from the Iberian Peninsula. This pattern has been associated with the spread of the carriers of the Magdalenian culture after the LGM, suggesting that hg H had reached Europe (Pereira et al. 2005) and, perhaps, western Siberia/Inner Asia (Loogväli et al. 2004), before the LGM.
It is most likely that the initial population expansion in the southern Caucasus and the Near East involved other maternal lineages besides hg H as well. In this context, it is worth pointing out that hg U3 has been shown to be most divergent in this region, having begun to expand about  (Metspalu et al. 1999). Similarly, hg HV1, with an analogous coalescence estimate, is most common and diverse in the southern Caucasus and is present in the eastern Mediterranean. On the other hand, neither of the 2 ever became as frequent in Europe as hg H did (Tambets et al. 2000), suggesting that profoundly different later migration scenarios apply to them.
It should be stressed that for the majority of hg H subclades, the signal of expansion in the Near East and the Caucasus lies in a time frame between 18,000 and 10,000 YBP (table 1). It may suggest that such subclades not only expanded but also in fact arose much later than the earliest limbs of hg H. The European hg H gene pool differs significantly from that in the southern Caucasus and the Near East ( fig. 4A) because different sub-hgs have expanded after the LGM in different large subcontinental areas. Most importantly, it appears that after the initial migration of the carriers of hg H into Europe, presumably already before or during the Gravettian period, there was little subsequent admixture of the West Asian and European hg H lineages.
As for Europe, a number of frequency/diversity clines in the Near East and the Caucasus could be associated with the postglacial population expansion phase. This can be partially ascribed, as in Europe, to the (re)colonization of areas that were unsuitable for human occupation during the LGM due to aridity and lower temperatures. Sub-hgs H5*, H20, and H21 are the most frequent and diverse in the western Caucasus hg H gene pool. The region, stretching over the southeastern coast of the Black Sea, was a refugium area for forest (Adams and Faure 1997; Tarasov et al. 1999, 2000) and could have thus provided better conditions for fauna, as well as perhaps for human beings during the LGM. The phylogeography of H20 and H21 appears to be strictly limited within the immediate neighboring populations, suggesting their autochthonous origin in the Caucasus, whereas H5* has also been found throughout western Eurasia, albeit at a lower frequency (Loogväli et al. 2004). The expansion of humans to the Arabian Peninsula likely took place later, due to persisting aridity, which is still characteristic of the region today. As a consequence, the overall genetic diversity of hg H lineages in this region is very low (fig. 1), and the corresponding frequency pattern of hg H subclades differs from that observed elsewhere in the Near East (fig. 3).
Furthermore, our analysis provides evidence for possible back migration to the Caucasus and the Near East from the European populations. This possibility, as far as the Near East is concerned, has been discussed in some detail by Richards et al. (2000), where a need for rigorous comparative phylogeographic lineage analysis (founder analysis) has been stressed. Phylogeographic analysis based on complete mtDNA sequences, an approach that became available only recently, offers a new and more powerful means for such analysis (Torroni et al. 2006). Our results show that hg H-related gene flow from the East European Plain to the Caucasus populations is particularly evident in the mtDNA pool of the Turkic-speaking Karatchaians-Balkarians, where typically European sub-hgs of hg H, such as H1a, H1b, and H3, are present at a high frequency (figs. 1 and 2 and Supplementary Material online). This apparent overlap may have ancient roots, such as shared ancestry of Karatchaians-Balkarians and northern Ponto-Caspian nomadic people.
Taken together with the recent series of predominantly "eurocentric" high-resolution phylogeographic analyses of hg H (Achilli et al. 2004; Loogväli et al. 2004; Pereira et al. 2005), the data presented here suggest that hg H had already expanded before the LGM, with its oldest lineages being frequent in the southern Caucasus and the northern part of the Near East. A new phase of expansion followed the climate amelioration after the LGM. Later on, there appears to have been only limited mtDNA flow from the Near East/the southern Caucasus toward Europe, as far as the dominant maternal lineage cluster, hg H, is concerned. As a result, different frequency spectra of hg H subclades characterize an otherwise largely joint Near Eastern heritage of maternal lineages for both West Asia and Europe.