Extended Y Chromosome Investigation Suggests Postglacial Migrations of Modern Humans into East Asia via the Northern Route

Genetic diversity data, from Y chromosome and mitochondrial DNA as well as recent genome-wide autosomal single nucleotide polymorphisms, suggested that mainland Southeast Asia was the major geographic source of East Asian populations. However, these studies also detected Central–South Asia (CSA)- and/or West Eurasia (WE)-related genetic components in East Asia, implying either recent population admixture or ancient migrations via the proposed northern route. To trace the time period and geographic source of these CSA- and WE-related genetic components, we sampled 3,826 males (116 populations from China and 1 population from North Korea) and performed high-resolution genotyping according to the well-resolved Y chromosome phylogeny. Our data, in combination with the published East Asian Y-haplogroup data, show that four dominant haplogroups, O-M175, D-M174, C-M130 (not including C5-M356), and N-M231, account for 92.87% of East Asian Y chromosomes and occur in both southern and northern East Asian populations, which is consistent with the proposed southern route of modern human origin in East Asia. However, other haplogroups (6.79% in total: E-SRY4064, C5-M356, G-M201, H-M69, I-M170, J-P209, L-M20, Q-M242, R-M207, and T-M70) were detected primarily in northern East Asian populations and were identified by phylogeographic analysis as being of Central–South Asian and/or West Eurasian origin. In particular, the geographic distribution and Y chromosome short tandem repeat (Y-STR) diversity indicate that haplogroups Q-M242 (the ancestral haplogroup of the Native American-specific haplogroup Q1a3a-M3) and R-M207 probably migrated into East Asia via the northern route. Age estimates of Y-STR variation within haplogroups suggest postglacial (~18 Ka) migrations via the northern route as well as recent (~3 Ka) population admixture.
We propose that although the Paleolithic migrations via the southern route played a major role in modern human settlement in East Asia, there were also limited ancient contributions from WE, which partly explain the genetic divergence between current southern and northern East Asian populations.


Introduction
Recent results from genome-wide autosomal single nucleotide polymorphisms (SNPs) showed that East Asian populations have a clinal structure, with haplotype diversity decreasing from south to north (Abdulla et al. 2009). The south-to-north cline was also demonstrated by previous mitochondrial DNA (mtDNA) and Y chromosome studies (Su et al. 1999; Kivisild et al. 2002; Shi et al. 2005; Shi et al. 2008). However, other studies observed higher Y chromosome diversity in northern East Asia (NEAS) than in southern East Asia (SEAS), suggesting that NEAS populations have a closer genetic relationship with Central Asian populations (Xue et al. 2006). Indeed, many studies have detected Central-South Asia (CSA)- and/or West Eurasia (WE)-related genetic components in NEAS, in both the Y chromosome (Xue et al. 2006) and mtDNA (Yao et al. 2004; Yang et al. 2008). (In this study, Xinjiang is treated as part of NEAS, although it is often used to represent Central Asian populations.) Similarly, autosomal markers also revealed CSA- and WE-related haplotypes in NEAS (Norton et al. 2007; Hellenthal et al. 2008; Abdulla et al. 2009). For example, the whole-genome data showed that 5% of the East Asian haplotypes were found only in CSA (Abdulla et al. 2009).
These CSA- and WE-related genetic components in East Asia have been interpreted either as evidence of ancient migration via the northern route (Underhill et al. 2001) or simply as a reflection of recent population admixture (Xu et al. 2008) associated with trade along the ancient Silk Road (Yao et al. 2004; Zhang et al. 2007; Yang et al. 2008). It is implausible that the genetic input from CSA and WE can be attributed entirely to ancient trade, because high frequencies of CSA- and WE-related genetic components have been found in NEAS populations (e.g., >40% of mtDNA and >50% of Y chromosomes in Xinjiang Uygur) (Yao et al. 2004; Hammer et al. 2006; Xue et al. 2006; Yang et al. 2008). Recently, two archeological populations in Xinjiang (approximately 2,500 and 4,000 years before present, respectively) were found to contain 27% (Zhang et al. 2009) and 30% (Li et al. 2010) CSA- and WE-related mtDNA lineages, respectively, indicating that CSA- and WE-related genetic input into NEAS occurred earlier than the proposed relatively recent population admixture. These findings imply that there might have been even earlier migration events into NEAS via the northern route.
At least two Y chromosome haplogroups can be used to test ancient human dispersals into East Asia via the northern route. Haplogroup R-M207 occurs frequently in CSA and WE populations. It has also been detected in East Asia, at relatively high frequency in Uygur in particular (Xue et al. 2006). Although the presence of R-M207 in NEAS could be explained by recent population admixture, it has not been systematically examined in East Asia. The other is haplogroup Q-M242, which has been detected in CSA (Seielstad et al. 2003; Sharma et al. 2007), WE, Siberia (Pakendorf et al. 2006; Pakendorf et al. 2007), East Asia (Su et al. 2000), and the Americas (Bortolini et al. 2003; Zegura et al. 2004), but not in Southeast Asia (except Vietnam) (Hammer et al. 2006) or Oceania (Karafet et al. 2010). Hence, a systematic screening of haplogroup Q-M242 in East Asia will probably provide new insights into the possible northern route and the early migrations into the Americas (Q1a3a-M3, a subclade of Q-M242, is the Native American-specific haplogroup). It has been shown that the dominant East Asian haplogroups O-M175, D-M174, C-M130, and N-M231 all have a southern origin (Su et al. 1999; Shi et al. 2005; Kumar et al. 2007; Rootsi et al. 2007; Li et al. 2008; Shi et al. 2008; Zhong et al. 2010). However, the remaining haplogroups detected in East Asia may still be related to CSA and WE populations.
Most of the CSA- and WE-related haplogroups previously identified in East Asians were not analyzed in detail, owing to insufficient sampling and/or the scarcity of Y chromosome markers. Recently, a high-resolution Y chromosome phylogeny has been constructed and detailed phylogeographic information on Y chromosome haplogroups has been revealed (Semino et al. 2004; Sengupta et al. 2006; Karafet et al. 2008; Tofanelli et al. 2009; Underhill et al. 2009; Balaresque et al. 2010), providing an opportunity to dissect the CSA- and WE-related Y chromosome components in East Asia.
In this study, we investigated the Y chromosome population substructure of East Asians by high-resolution genotyping and fine-scale sampling in both NEAS and SEAS, and we performed phylogeographic analysis of CSA- and WE-related Y chromosome haplogroups in East Asia in order to trace their migratory history and geographic source.

Materials and Methods
Samples
A total of 3,826 unrelated male samples (116 populations from China and 1 population from North Korea; figs. 1 and 2) were recruited with informed consent. The protocol of this study was approved by the Institutional Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences. To compare the Y chromosome population structure of East and West Eurasians, Y-haplogroup data for a total of 6,308 individuals from 137 populations (supplementary tables S1 and S2, Supplementary Material online) were extracted from the literature as reference populations (Hammer et al. 2006; Regueiro et al. 2006; Sengupta et al. 2006; Capelli et al. 2007; Gayden et al. 2007; Balanovsky et al. 2008; Gusmao et al. 2008; King et al. 2008; Lopez-Parra et al. 2009; Mirabal et al. 2009). In addition, Y chromosome short tandem repeat (Y-STR) data for the Y-haplogroups detected in East Asians (supplementary table S3, Supplementary Material online) were extracted from the literature (Seielstad et al. 2003; Cinnioglu et al. 2004; Shi et al. 2005; Bosch et al. 2006; Hammer et al. 2006; Sengupta et al. 2006; Capelli et al. 2007; Martinez et al. 2007; Nonaka et al. 2007; Sharma et al. 2007; King et al. 2008; Li et al. 2008; Shi et al. 2008; Tofanelli et al. 2009; Zhong et al. 2010).

Y Chromosome Genotyping
Following the hierarchical genotyping strategy (Underhill et al. 2000; Hammer et al. 2001), M175 was typed first because it occurs at high frequency in East Asians (Shi et al. 2005); all individuals without the M175 deletion (M175-Del) were then examined for M130, YAP, and M231; finally, the remaining individuals, ancestral at M175, M130, YAP, and M231, were subtyped further according to the high-resolution Y Chromosomal Haplogroup Tree (Karafet et al. 2008; Underhill et al. 2009). The Y chromosome biallelic markers genotyped, shown in figure 2, were determined by sequencing and PCR-restriction fragment length polymorphism (PCR-RFLP) analysis. To evaluate the diversity of haplogroups F2, G, H, J, L, Q, R, and T, we also typed eight commonly used Y-STR markers (DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393) using fluorescence-labeled primers on an ABI 3730 Genetic Analyzer. The Y-STR nomenclature follows the system proposed by Butler et al. (2002). In addition, another Y-STR marker, DYSA7.2 (Sengupta et al. 2006), was typed in the M241-derived individuals.
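The first pass of the hierarchical strategy described above can be sketched as a small decision function. This is a minimal illustration, not the authors' pipeline: the function name `first_pass_haplogroup`, the label strings, and the encoding of marker calls (True = derived, False = ancestral, missing = not yet typed) are assumptions. For simplicity the sketch checks M130, YAP, and M231 one after another, whereas the text types all three in M175-ancestral samples.

```python
from typing import Dict, Optional

# Hypothetical encoding: True = derived, False = ancestral, missing/None = not yet typed.
Calls = Dict[str, Optional[bool]]

# Second-tier markers screened in M175-ancestral samples; labels are illustrative.
SECOND_TIER = [("M130", "C-M130"), ("YAP", "DE-YAP"), ("M231", "N-M231")]


def first_pass_haplogroup(calls: Calls) -> str:
    """Return a coarse haplogroup label, or the next marker that needs typing."""
    m175 = calls.get("M175")
    if m175 is None:
        return "type M175"
    if m175:  # M175 deletion present
        return "O-M175"
    for marker, label in SECOND_TIER:
        state = calls.get(marker)
        if state is None:
            return f"type {marker}"
        if state:
            return label
    # Ancestral at M175, M130, YAP, and M231: move on to finer subtyping.
    return "subtype further"


print(first_pass_haplogroup({"M175": False, "M130": False, "YAP": True}))  # -> DE-YAP
```

Returning the next marker to type, rather than failing on missing calls, mirrors the idea that each tier is only typed when the tier above it is ancestral.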

Data Analysis
Zhong et al. · doi:10.1093/molbev/msq247 MBE
Together with the published data (supplementary tables S1 and S2, Supplementary Material online), the haplogroup frequency data of East Asians were used to perform correspondence analysis (CA) with the software XLSTAT (http://www.xlstat.com) and to show the geographical distribution of different haplogroups. Because the literature data were incompletely genotyped, subbranches were pooled into their parent haplogroups (D, E, C, G, H, I, J, L, N, O, Q, and R), and the converted frequency data (supplementary table S1, Supplementary Material online) were used to perform CA.
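The pooling step above (collapsing subbranches into parent haplogroups before CA) can be sketched as follows. This is a hypothetical helper, not the authors' code: `collapse_to_parent` and the leading-letter rule are assumptions. Note that the leading-letter rule would pool C5-M356 into C, whereas the text treats C5 separately from C-M130, so a real analysis would need an explicit label-to-parent mapping.

```python
from collections import Counter
from typing import Dict

# Parent haplogroups listed in the text for pooling before CA.
PARENTS = set("DECGHIJLNOQR")


def collapse_to_parent(counts: Dict[str, int]) -> Dict[str, float]:
    """Pool subclade counts (e.g. 'R1a1-M17') by their leading haplogroup letter
    and return relative frequencies; letters not in PARENTS go to 'other'."""
    pooled: Counter = Counter()
    for label, n in counts.items():
        parent = label[0].upper()
        pooled[parent if parent in PARENTS else "other"] += n
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {hg: n / total for hg, n in sorted(pooled.items())}


print(collapse_to_parent({"R1a1-M17": 3, "R2-M124": 1, "Q1a1-M120": 2, "T-M70": 2}))
# -> {'Q': 0.25, 'R': 0.5, 'other': 0.25}
```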
CSA- and WE-related Y chromosome haplogroups were selected for median-joining network construction using Y-STR data (supplementary table S3, Supplementary Material online) from both the literature and this study. Median-joining networks were constructed using the program NETWORK 4.5.1.0 (Fluxus Engineering) (Bandelt et al. 1999). When a network lacked resolution, it was reconstructed by the median-joining method after processing the data with the reduced median method (reduction threshold = 1.0) or after reweighting the STR loci according to their variances (higher weights were assigned to the least variable loci). The age of STR variation within each Y chromosome haplogroup was estimated following the published method (Zhivotovsky 2001; Zhivotovsky et al. 2004; Sengupta et al. 2006). Because the DYS389II allele includes the DYS389I repeats, DYS389I was subtracted from DYS389II in the Y-STR analyses, and the resulting value was designated DYS389b.
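The DYS389b correction and a variance-based age of STR variation can be sketched together. This is a minimal sketch under stated assumptions, not the cited method verbatim: it takes the mean per-locus (population) variance of repeat counts and divides it by an assumed evolutionary effective mutation rate of 6.9 × 10^-4 per locus per 25 years (the rate associated with Zhivotovsky et al. 2004). The cited method may differ in detail, for example by using the average squared distance from a founder haplotype or a sample-size correction. The names `add_dys389b`, `str_variation_age_years`, and `OMEGA` are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List

Haplotype = Dict[str, int]  # locus name -> repeat count

# Assumed evolutionary effective mutation rate (Zhivotovsky et al. 2004):
# 6.9e-4 per locus per 25 years. Check against the cited method before use.
OMEGA = 6.9e-4
YEARS_PER_OMEGA_UNIT = 25


def add_dys389b(hap: Haplotype) -> Haplotype:
    """Return a copy with DYS389II replaced by DYS389b = DYS389II - DYS389I."""
    out = dict(hap)
    out["DYS389b"] = out.pop("DYS389II") - out["DYS389I"]
    return out


def str_variation_age_years(haplotypes: List[Haplotype]) -> float:
    """Mean per-locus repeat variance divided by OMEGA, expressed in years."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    haps = [add_dys389b(h) for h in haplotypes]
    loci = sorted(haps[0])
    mean_var = sum(pvariance([h[locus] for h in haps]) for locus in loci) / len(loci)
    return mean_var / OMEGA * YEARS_PER_OMEGA_UNIT
```

Subtracting DYS389I first matters: otherwise a single mutation in the DYS389I segment would be counted twice, once at DYS389I and again inside DYS389II.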

Y Chromosome Haplogroups Detected in East Asia
As shown in figure 2 and supplementary table S1 (Supplementary Material online), there are four dominant Y chromosome haplogroups in East Asians, namely D-M174 (10.72%), C-M130 (12.43%, not including C5-M356), N-M231 (5.96%), and O-M175 (63.75%), which together account for 92.87% of the Y chromosomes in East Asia. Because these four haplogroups have been well studied and shown to have a clear southern origin during the Paleolithic (Su et al. 1999; Shi et al. 2005; Kumar et al. 2007; Rootsi et al. 2007; Li et al. 2008; Shi et al. 2008; Zhong et al. 2010), we did not subtype them further, except for C-M130, for which we have previously reported subtyping data (Zhong et al. 2010).

Y Chromosome Population Substructure of East Asians
We performed CA based on the genotyping data of Y chromosome biallelic markers. The CA plots (fig. 3a) show the following features: 1) the SEAS populations cluster tightly together, except for the Naxi group from the literature (Sengupta et al. 2006) and Yunnan Hui (a Muslim population); 2) the SEAS and most of the NEAS populations cluster together with all Southeast Asian populations, but with only 7 of 37 South Asian populations (3 Tibeto-Burman groups from northeastern India, 2 Austro-Asiatic groups from eastern India, 1 Dravidian group from southern India, and 1 Sino-Tibetan group from Nepal); 3) NEAS populations are loosely clustered because many of them lie close to South Asian and West Eurasian populations; 4) in particular, the Uygur, Kyrgyz, and Hui populations in northwestern China show a close relationship to West Eurasian populations. Although two SEAS populations (Hui and Naxi) are also close to West Eurasians, they are known immigrants from northwestern China (Du and Yip 1993). Collectively, these features indicate that CSA- and WE-related Y chromosome haplogroups occur almost exclusively in northwestern populations rather than in SEAS and Southeast Asian populations. As expected, the Xinjiang Uygur populations carry all of these CSA- and WE-related Y chromosome haplogroups (fig. 2), which account for 71.79% of all Uygur males from Xinjiang, China.
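For readers unfamiliar with CA, the computation behind such a plot can be sketched in plain Python. The study used XLSTAT; this is a swapped-in minimal sketch of standard CA, not XLSTAT's implementation. It forms the standardized residual matrix S = (P - r c^T) / sqrt(r c^T) from a population-by-haplogroup count table and finds only the first axis, by power iteration on S^T S instead of a full SVD. The function names are hypothetical, and the table is assumed to have no empty rows or columns.

```python
import math
from typing import List, Tuple

Table = List[List[float]]  # rows = populations, columns = haplogroups (counts)


def standardized_residuals(table: Table) -> Tuple[Table, List[float], List[float]]:
    """Return S = (P - r c^T) / sqrt(r c^T) plus row and column masses.
    Assumes no empty rows or columns."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]
    c = [sum(col) / n for col in zip(*table)]
    s = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))]
         for i in range(len(r))]
    return s, r, c


def first_ca_axis(table: Table, iterations: int = 1000) -> Tuple[float, float, List[float]]:
    """Return (inertia of axis 1, total inertia, row principal coordinates on axis 1)."""
    s, r, c = standardized_residuals(table)
    rows, cols = len(s), len(c)
    total_inertia = sum(x * x for row in s for x in row)  # equals chi-square / n
    # Cross-product matrix S^T S; its leading eigenvector defines the first axis.
    sts = [[sum(s[i][a] * s[i][b] for i in range(rows)) for b in range(cols)]
           for a in range(cols)]
    v = [float(j + 1) for j in range(cols)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(sts[a][b] * v[b] for b in range(cols)) for a in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no association at all: every axis has zero inertia
            return 0.0, total_inertia, [0.0] * rows
        v = [x / norm for x in w]
    sv = [sum(s[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    axis_inertia = sum(x * x for x in sv)  # squared leading singular value
    coords = [sv[i] / math.sqrt(r[i]) for i in range(rows)]
    return axis_inertia, total_inertia, coords
```

Populations with similar haplogroup profiles receive nearby coordinates on this axis, which is the sense in which populations "cluster" in the CA plots. The sign of an axis is arbitrary, and further axes would need deflation or a proper SVD routine.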

Phylogeographic Analysis of CSA-and WE-Related Y Chromosome Haplogroups
Haplogroups E, C5, G, H, I, L, and T are prevalent in CSA and/or WE populations, where they have more subhaplogroups; they also appear at low frequency in NEAS populations but are generally absent from SEAS populations (figs. 2 and 3b and supplementary tables S1 and S2, Supplementary Material online). In the median-joining networks (fig. 4), the Y-STR haplotypes of CSA and WE populations are highly diverged, with consecutive mutational steps, whereas those of NEAS populations are either partially clustered or sporadically distributed. Within haplogroups G and T, the NEAS Y-STR haplotypes mainly link to West Asian haplotypes, whereas those of haplogroups H and L tend to link to Indian and Pakistani haplotypes, respectively.
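Which reference region a haplotype "links to" can be approximated, very crudely, by its nearest reference haplotype under the stepwise mutation model, where the distance is the total repeat difference across loci. This is only a sketch of the distance underlying such networks; it is not median-joining, which also infers median vectors and resolves reticulations. The names `step_distance` and `nearest_reference` and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # locus name -> repeat count


def step_distance(a: Haplotype, b: Haplotype) -> int:
    """Total repeat difference over shared loci (stepwise mutation model)."""
    return sum(abs(a[locus] - b[locus]) for locus in a.keys() & b.keys())


def nearest_reference(query: Haplotype,
                      refs: List[Tuple[str, Haplotype]]) -> Tuple[str, int]:
    """Return (region label, distance) of the closest reference haplotype;
    ties go to the first reference in the list."""
    label, hap = min(refs, key=lambda item: step_distance(query, item[1]))
    return label, step_distance(query, hap)


# Hypothetical haplotypes, for illustration only.
refs = [("West Asia", {"DYS19": 14, "DYS390": 23}),
        ("South Asia", {"DYS19": 16, "DYS390": 25})]
print(nearest_reference({"DYS19": 14, "DYS390": 24}, refs))  # -> ('West Asia', 1)
```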
Haplogroup Q occurs across Eurasia, and its subhaplogroups show different distributions. Q1a1-M120 is widely distributed in both SEAS and NEAS populations but is absent outside East Asia except for one incidence (Du and Yip 1993; Wen, Li, et al. 2004). In further network analysis (fig. 4f), the East Asian Y-STR haplotypes form a relatively separate cluster, whereas the CSA Y-STR haplotypes show a loose structure and long mutational steps; in addition, the NEAS Y-STR haplotypes display higher differentiation than the SEAS haplotypes (Q1a1-M120 and Q1a3*-M346). The high diversity and structured distribution of Q-M242 lineages in CSA, WE, and NEAS populations indicate their prehistoric migrations via the northern route into East Asia.
Haplogroup R occurs at high frequencies in Eurasians, especially in West Eurasians (fig. 3b and supplementary table S2, Supplementary Material online). In East Asia, it is frequent in NEAS but rare in SEAS (sparsely detected in 5 of 82 SEAS populations). The R subhaplogroups occur mainly in CSA and WE: R1a1-M17 in Europe, West Asia, and CSA; R1b1b2-M269 in Europe and West Asia; and R2-M124 in South Asia. However, R1b1b1-M73 was detected mainly in NEAS and only sporadically in South Asia and WE. In the network of R1b-M343 (fig. 4h), most of the M73-derived individuals occur at the terminals of the network with multistep mutations, indicating that M73 has an origin different from that of the West Eurasian individuals, possibly in NEAS given its high Y-STR diversity. All M269-derived individuals from NEAS either link to or share haplotypes with West Eurasians. A recent study suggests a West Asian origin of R1b1b2-M269 and its Neolithic expansion (Balaresque et al. 2010); therefore, the NEAS R1b1b2-M269 likely came from West Asia. Although R1a1-M17 occurs across Eurasia, its network (fig. 4h) does not exhibit region-specific clusters, and the NEAS Y-STR haplotypes of R1a1-M17 show consecutive mutational steps and high differentiation. In addition, these M17-derived individuals were shown to be ancestral R1a1* by subtyping seven downstream SNPs (fig. 2). Therefore, this lineage likely represents early immigrants from CSA or WE. Haplogroup R2-M124 is predominantly distributed in South Asia (fig. 3b) and exhibits a star-like structure in the network (fig. 4g), with more diversified haplotypes in South Asia than in NEAS. These features of R2-M124 reflect a relatively recent expansion from South Asia to NEAS via the northern route. The phylogeography of haplogroup R excludes the possibility that it migrated eastward via the southern route.
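Comparisons of "Y-STR diversity" between regions need a concrete statistic. The paragraph above does not name one; one common choice is Nei's unbiased haplotype diversity, h = n/(n - 1) · (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n chromosomes. The sketch below assumes that measure, which is not necessarily the statistic behind the statements above; `haplotype_diversity` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List

Haplotype = Dict[str, int]  # locus name -> repeat count


def haplotype_diversity(haplotypes: List[Haplotype]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Dicts are unhashable, so each haplotype becomes a sorted tuple of (locus, repeats).
    counts = Counter(tuple(sorted(h.items())) for h in haplotypes)
    sum_p2 = sum((k / n) ** 2 for k in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)
```

For example, four chromosomes carrying three distinct haplotypes in proportions 2:1:1 give h = 4/3 × (1 - 0.375) = 5/6. Haplotype diversity only counts distinct haplotypes, so a repeat-variance measure such as the one used for the STR ages would better capture how many mutational steps separate them.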
Haplogroup J, of WE origin (Semino et al. 2004), shows a distribution pattern in East Asia similar to that of R, occurring primarily in NEAS but rarely in SEAS (fig. 2). As shown in figure 4c and d, Y-STR diversity is higher in CSA and WE than in East Asian populations. East Asian J1-M267 individuals share haplotypes with West Eurasians and tend to link to Caucasian haplotypes. J2b2-M241 individuals in NEAS carry an 8-repeat motif at the DYSA7.2 STR locus (West Eurasian 8-repeat motif vs. South Asian 7-repeat motif; Cinnioglu et al. 2004; Sengupta et al. 2006) and link to West Asian haplotypes. In the network of J2-M172, 61% of the NEAS haplotypes are shared with West Eurasians and South Asians, again suggesting a WE origin of these haplotypes.
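A shared-haplotype proportion such as the 61% reported for J2-M172 can be computed as the fraction of query haplotypes that also occur, identically at all typed loci, in a reference set. The text does not say whether individuals or distinct haplotypes were counted, so this sketch exposes both through a hypothetical `distinct` flag; `shared_fraction` is likewise a hypothetical name.

```python
from typing import Dict, List

Haplotype = Dict[str, int]  # locus name -> repeat count


def _key(h: Haplotype) -> tuple:
    """Hashable, order-independent representation of a haplotype."""
    return tuple(sorted(h.items()))


def shared_fraction(query: List[Haplotype], reference: List[Haplotype],
                    distinct: bool = False) -> float:
    """Fraction of query haplotypes found identically in the reference set.
    With distinct=True, each distinct query haplotype is counted once."""
    ref_keys = {_key(h) for h in reference}
    keys = [_key(h) for h in query]
    if distinct:
        keys = list(set(keys))
    if not keys:
        raise ValueError("query is empty")
    return sum(k in ref_keys for k in keys) / len(keys)
```

The two counting conventions can give noticeably different numbers when a few common haplotypes dominate the query sample, which is why the choice should be stated alongside any such percentage.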

The Age of STR Variation within Y Chromosome Haplogroups
To evaluate the early movements of different haplogroups detected in East Asia, the ages of STR variation within each

Southern versus Northern Route
Our previous Y chromosome data, primarily from haplogroups O-M175, C-M130, and D-M174 (Su et al. 1999; Shi et al. 2005; Shi et al. 2008; Zhong et al. 2010), suggested that the southern route made a substantial contribution to the early peopling of East Asia by anatomically modern humans, which was also supported by the phylogeography of haplogroup N-M231 (Rootsi et al. 2007). In this study, the four predominant haplogroups account for 92.87% of East Asian Y chromosomes and are continuously distributed across East Asia, again indicating the substantial contribution of the southern route. However, CSA- and WE-related Y chromosome haplogroups, accounting for 6.79% of East Asian males, were also detected and exhibit a structured distribution: they occur primarily in northwestern East Asia and gradually decrease in frequency from northwest to northeast. Moreover, in contrast to the ages of STR variation within haplogroups C, D, and O (18.78–50.62 Ka), the CSA- and WE-related haplogroups and subhaplogroups show relatively young ages, ranging from 2.72 to 23.11 Ka (table 1). Of the 19 CSA- and WE-related haplogroups and subhaplogroups, most have ages between 9.68 and 17.83 Ka, suggesting ancient migrations from CSA and WE into NEAS via the northern route, which likely occurred after the last glacial maximum.
Furthermore, maternally inherited mtDNA shows a distribution pattern in East Asia similar to that of the Y chromosome. The East Asian-specific mtDNA haplogroups are prevalent in both SEAS and NEAS populations (Kivisild et al. 2002; Derenko et al. 2007), whereas the West Eurasian-specific mtDNA haplogroups occur mainly in NEAS, especially in Uygur and Hui (Yao et al. 2004; Derenko et al. 2007; Yang et al. 2008). These West Eurasian mtDNA haplogroups also show postglacial dispersal in northern Asia (Derenko et al. 2007). Therefore, both Y chromosome and mtDNA data support the view that Paleolithic migrations of modern humans via the southern route made the major contribution to current East Asians, whereas postglacial migrations via the northern route also contributed to current NEAS populations.

Early Postglacial and Neolithic Migrations to NEAS via the Northern Route
Although CSA- and WE-related Y chromosome haplogroups account for only 6.79% of East Asian males, it is noteworthy that 69.55% of them belong to two haplogroups, Q and R, which probably represent early migrations from CSA and WE into NEAS via the northern route.
Q1a1-M120 and Q1a3*-M346, the two major sublineages of haplogroup Q, have similar ages of STR variation, 15.42 and 17.77 Ka, respectively (table 1). Q1a1-M120 is an East Asian-specific subhaplogroup that occurs in most NEAS populations. In SEAS populations, it occurs mainly in southern Han Chinese, with relatively low Y-STR diversity (fig. 4f), implying that Q1a1-M120 spread from north to south, likely through the demic diffusion of Han culture during the Neolithic (Wen, Li, et al. 2004). Q1a3*-M346 was not detected in SEAS aboriginal populations and appears only sporadically in southern Han and Hui, again suggesting southward migration from NEAS populations (Du and Yip 1993; Wen, Li, et al. 2004). Notably, Q1a3*-M346 is the ancestral haplogroup of Q1a3a-M3, which occurs only in Native American populations (Bortolini et al. 2003; Zegura et al. 2004; Karafet et al. 2008). The presence of Q1a3*-M346 in NEAS populations provides important evidence supporting the proposed prehistoric migration from Central Asia to the Americas. Collectively, the phylogeographic structure of haplogroup Q reveals early demographic expansions via northern Eurasia.
R1a1*-M17 (recently renamed R1a1a*) is the major sublineage of haplogroup R, and its distribution in East Asia is similar to those of Q1a1-M120 and Q1a3*-M346: prevalent in NEAS but rare in SEAS populations. The STR variation age of R1a1* in East Asia (15.37 Ka) is also similar to those of Q1a1-M120 and Q1a3*-M346, suggesting that R1a1* was one of the lineages that entered East Asia via the northern route. R1a1*-M17 is well known to be common in West Asia and Europe. However, none of the seven currently known subbranches of M17 was detected in our 84 M17-derived individuals, implying that these individuals probably migrated into NEAS before the seven subbranches arose in West Asia and Europe. Interestingly, the STR variation age of East Asian R1a1*-M17 is similar to that of West Indian R1a1*-M17 (15.8 Ka), and both are older than R1a1*-M17 in CSA and WE. The ancient age of East Asian R1a1*-M17 is also reflected in the network (fig. 4h), which shows high differentiation and consecutive mutational steps. All these features indicate that R1a1*-M17 in East Asia represents ancient immigrants, probably from CSA, that underwent a long period of independent differentiation. However, more informative subbranch markers are required for a better phylogeography of M17.
Neolithic expansions in the Near and Middle East have been shown to have contributed greatly to the peopling of Europe (Semino et al. 2004; Balaresque et al. 2010). Did Neolithic expansions also contribute to NEAS populations? Several Y chromosome haplogroups bearing Neolithic expansion signatures were detected in NEAS, such as R1b1b2-M269 (Balaresque et al. 2010), J1-M267 (Tofanelli et al. 2009), and J2a-M410 (Semino et al. 2004). Furthermore, these haplogroups and several others detected in NEAS (G-M201, H-M69, L-M20, and R2-M124) all show close relationships with CSA and WE haplotypes in the networks (fig. 4), suggesting Neolithic migrations to NEAS via the northern route. These Neolithic expansions are further supported by evidence from ancient human remains that CSA- and WE-related genetic components were brought into NEAS and Siberia about 2,000–4,000 years ago (Francalacci 1995; Mair 1995; Lalueza-Fox et al. 2004; Keyser et al. 2009; Zhang et al. 2009; Li et al. 2010).
The early migrations into NEAS via the northern route are also reflected in mtDNA data. Several mtDNA haplogroups (H, V, and X) showing signatures of postglacial expansion (Torroni et al. 2001; Reidla et al. 2003; Derenko et al. 2007) have been detected in NEAS populations (Yao et al. 2004; Yang et al. 2008). More detailed analysis of mtDNA in East Asia is needed to quantify the maternal contribution of the early migrations via the northern route.

More Recent Gene Flow from CSA and WE to NEAS
Trade along the ancient Silk Road has been widely invoked to explain the occurrence of CSA- and WE-related genetic components in NEAS (Yao et al. 2004; Yang et al. 2008). We show that most of the CSA- and WE-related genetic contribution to NEAS occurred much earlier than the time of the ancient Silk Road. Nevertheless, the impact of the ancient Silk Road on NEAS populations is also reflected in the sporadic appearance of minor CSA- and WE-related haplogroups, such as E-SRY4064, C5-M356, I-M170, J2a2*-M67, J2b2-M241, and T-M70, which have not had enough time to accumulate within-haplogroup genetic diversity. Among these rarely detected haplogroups, E-SRY4064, I-M170, and T-M70 are prevalent in WE populations. These features are consistent with a recent finding that gene flow from WE populations to the extant ethnic groups in Northwest China was relatively weak (Shou et al. 2010), indicating that recent population admixture was rather limited and that the CSA- and WE-related haplogroups mainly came from ancient demographic expansions via the northern route. Finally, it should be noted that the Muslim populations (Hui) in China originated from Central Asian, Persian, and Arab immigrants during the Yuan Dynasty (~700 years ago) (Du and Yip 1993; Gladney 1996; Lipman 1997); indeed, the Hui populations were found to carry the majority of CSA- and WE-related Y chromosomes (fig. 2).

Conclusions
Our investigation of East Asian Y chromosomes, based on high-resolution genotyping and fine-scale sampling, supports the existence of demographic expansions toward East Asia via the northern route, which started ~15–18 Ka (after the last glacial maximum). However, the demographic input via the northern route is mostly restricted to NEAS populations and contributed only a limited share (6.79%) of the Y chromosomes of current East Asian populations. Among these, haplogroups Q-M242 and R-M207 likely represent the earliest settlers via the northern route. In addition, our data confirm that the Paleolithic colonization of East Asia by anatomically modern humans via the southern route made a substantial contribution (92.87%) to the extant East Asian Y chromosome gene pool. Paleolithic migrations via the southern route shaped the south-to-north clinal structure, and postglacial migrations via the northern route enlarged the genetic divergence between SEAS and NEAS populations.