Multiple Human Population Movements and Cultural Dispersal Events Shaped the Landscape of Chinese Paternal Heritage

Abstract Large-scale genomic projects and ancient DNA innovations have ushered in a new paradigm for exploring human evolutionary history. However, the genetic legacy of spatiotemporally diverse ancient Eurasians within Chinese paternal lineages remains unresolved. Here, we report an integrated Y-chromosome genomic database encompassing 15,563 individuals from both modern and ancient Eurasians, including 919 newly reported individuals, to investigate the Chinese paternal genomic diversity. The high-resolution, time-stamped phylogeny reveals multiple diversification events and extensive expansions in the early and middle Neolithic. We identify four major ancient population movements, each associated with technological innovations that have shaped the Chinese paternal landscape. First, the expansion of early East Asians and millet farmers from the Yellow River Basin predominantly carrying O2/D subclades significantly influenced the formation of the Sino-Tibetan people and facilitated the permanent settlement of the Tibetan Plateau. Second, the dispersal of rice farmers from the Yangtze River Valley carrying O1 and certain O2 sublineages reshapes the genetic makeup of southern Han Chinese, as well as the Tai-Kadai, Austronesian, Hmong-Mien, and Austroasiatic people. Third, the Neolithic Siberian Q/C paternal lineages originated and proliferated among hunter-gatherers on the Mongolian Plateau and the Amur River Basin, leaving a significant imprint on the gene pools of northern China. Fourth, the J/G/R paternal lineages derived from western Eurasia, which were initially spread by Yamnaya-related steppe pastoralists, maintain their presence primarily in northwestern China. Overall, our research provides comprehensive genetic evidence elucidating the significant impact of interactions with culturally distinct ancient Eurasians on the patterns of paternal diversity in modern Chinese populations.


Introduction
Population genomics and human pangenome projects aim to comprehensively document the genetic landscapes of globally diverse populations, elucidate their demographic histories, and uncover the genetic underpinnings of complex traits and diseases (Bergstrom et al. 2020; Byrska-Bishop et al. 2022). East Asia serves as one of the earliest cradles of civilization and the crossroads of the peopling of Oceania, Siberia, and America, yet its genetic landscape remains poorly characterized in the era of population genomics. China harbors extensive genetic, physical, cultural, and ethnolinguistic diversity, positioning it uniquely for studying the intricate demographic histories of diverse populations, including human divergence, migration, and admixture, and the interplay between genetics and culture (Wang et al. 2021b; Kumar et al. 2022). Numerous studies have sought to bridge the knowledge gap regarding the genetic diversity of Chinese populations by examining their evolutionary histories and the genetics of complex traits and diseases. Recent research utilized genome-wide SNP microarrays to analyze the genomic diversity and population history of various Sino-Tibetan, Mongolic, Tungusic, Turkic, Tai-Kadai, and Hmong-Mien groups (Feng et al. 2017; He et al. 2022; Wang et al. 2022; He et al. 2023b; Sun et al. 2023; Li et al. 2024). Additionally, whole-genome sequencing studies have expanded, including projects such as the Westlake BioBank for Chinese, the NyuWa genome resource, the China Metabolic Analytics Project, and the 10K Chinese People Genomic Diversity Project (10K_CPGDP; Cao et al. 2020; Zhang et al. 2021a; Cong et al. 2022; Cheng et al. 2023; He et al. 2023c). These efforts enhance our understanding of the genetic diversity, demographic history, and genetic architecture of complex traits and diseases in ethnolinguistically distinct Chinese populations from an autosomal perspective, and they motivate further exploration of their fine-scale genetic structure from both uniparental and population-scale project perspectives.
The nonrecombining portion of the Y-chromosome has become pivotal in studying human evolutionary history across various time scales (Poznik et al. 2016). Recent advancements in sequencing technologies and computational methods for genome assembly, read mapping, variant calling, and benchmarking have significantly improved the generation of complete Y-chromosome sequences, enriching our understanding of Y-chromosome variations (Olson et al. 2023). These developments have facilitated the construction of a robust phylogenetic tree, with branch lengths indicating mutation counts (Poznik et al. 2016; Zhabagin et al. 2022). Over the past two decades, studies on targeted Y-SNPs have traced ancestral lines through paternal lineages, providing crucial phylogenetic data for research on human origins, migrations, and admixture (Su et al. 1999; Zerjal et al. 2003). Resequencing the entire Y-chromosome region using advanced next-generation sequencing and computational techniques has transformed research paradigms. For instance, Wei et al. identified 6,662 high-confidence variants across 36 diverse Y-chromosome sequences, refining existing Y-chromosome phylogenies (Wei et al. 2013). Similarly, Poznik et al. (2016) analyzed 1,244 complete Y-chromosome genomes from the 1000 Genomes Project (1KGP), uncovering over 65,000 variants and identifying recent expansions within specific paternal lineages. Studies on single populations or specific lineages have also been conducted. The O1a-M119 lineage, which is shared among the Sinitic, Tai-Kadai, and Austronesian groups, and key paternal lineages like C2a-F5484 and Q1a1a-M120 have been examined to trace their origins, diffusion, and contributions to the gene pools of Chinese ethnolinguistically diverse groups (Sun et al. 2019; Wu et al. 2020; Sun et al. 2021). However, the availability of large-scale Y-chromosome genomic databases for China remains limited, underscoring the need for more comprehensive databases to explore the paternal genetic landscape and its historical influences on diverse populations.
Recent increases in genomic resources from Chinese populations have highlighted the gap in our understanding of the paternal genetic diversity among ethnic minorities, which lags significantly behind that of Han Chinese and other global populations (Karmin et al. 2022). To address this issue, we launched the 10K_CPGDP by employing anthropologically informed sampling strategies (He et al. 2023c). Additionally, we introduce the YanHuang cohort (YHC) genomic resource that includes new Y-chromosome sequences from ethnolinguistically diverse ethnic minorities and integrates data from the 10K_CPGDP. The YHC aims to provide a high-quality population-specific Y-chromosome database, delineate the fine-scale paternal demographic history of underrepresented groups, construct a high-resolution, time-stamped phylogenetic tree, and develop novel East Asian-specific next-generation sequencing panels covering SNPs, STRs, InDels, and other variants for medical and forensic use. We also developed the "YHSeqY3000", the highest-resolution Y-specific targeted resequencing panel designed from whole-genome and genome-wide SNP data of Y-chromosomes within the YHC. We genotyped 2,999 panel-related Y-SNPs in 919 males from 57 diverse ethnic minorities who were also genotyped by whole Y-chromosome sequencing. Our efforts culminated in a comprehensive Y-chromosome database encompassing 15,563 individuals from modern and ancient Eurasian backgrounds, allowing us to construct the first fully resolved phylogeny incorporating ancient DNA sequences. This phylogeny helps estimate the coalescence dates of dominant lineages, trace the origins of Chinese paternal lineages, and elucidate the impacts of historical migrations, admixture, and shifts in subsistence strategies on the genetic architecture of these diverse groups.

Wang et al. • https://doi.org/10.1093/molbev/msae122 MBE

Results and Discussion
Genetic Diversity of YHC Paternal Lineages Inferred from Y-Chromosome Sequences and the YHSeqY3000 Panel
We performed whole Y-chromosome sequencing on 919 participants from 57 populations of 39 ethnic minorities (Fig. 1a; supplementary table S1, Supplementary Material online), integrated the genetic data of nearly 15,000 modern and ancient Eurasian people (supplementary tables S2 and S3, Supplementary Material online), and developed a high-resolution YHSeqY3000 panel, including Y-SNPs not present in existing phylogenetic databases (ISOGG, Yfull).
We grouped populations by linguistic and ethnic traits to investigate genetic affinities within language- or ethnicity-based metapopulations (supplementary fig. S7c to h, Supplementary Material online). Geographically close populations, including the Austronesian-speaking Saisiyat, Thao, Taroko, Atayal, and Tsou from Taiwan Province, clustered distinctly, separating early from other reference groups (supplementary fig. S7c, Supplementary Material online). Distinct branches primarily comprised Tai-Kadai, nearby Austronesian groups like Ede and Giarai, and southern Tibeto-Burman speakers, such as Sila and Lolo. The genetic closeness between the Austronesian-related and Tai-Kadai-dominant clusters supports the hypothesis of a shared origin for Austronesian and Tai-Kadai speakers, as demonstrated by phylogenetic analyses based on neighbor-joining methods and clustering inferred from the haplogroup frequency spectra, PCA, and MDS (supplementary fig. S7d to f, Supplementary Material online). These analyses also revealed fine-scale genetic differences between Han Chinese and Tibeto-Burman populations and among linguistically diverse groups, underscoring frequent massive population movements and gene flow events in historical contexts. To determine whether paternal lineages corroborate current language family classifications and further explore genetic relationships within linguistically defined metapopulations, we merged all groups based on linguistic affinities for a comprehensive population genetic analysis (supplementary fig. S7g and h, Supplementary Material online).

Clustering between the Tai-Kadai and Austroasiatic groups and between the Mongolic/Tungusic groups and the Amur River Basin ancient populations was observed. The neighbor-joining tree also indicated close genetic relationships between the Turkic and ancient Xinjiang populations, between the Koreanic and Japonic populations, and between the Austronesian and ancient Hanben populations (supplementary fig. S7h, Supplementary Material online). This study provides robust paternal genetic evidence supporting complex admixture and interactions among modern Chinese populations and ancient Eurasians. However, caution is advised regarding potential biases from low-coverage sampling and the simplistic grouping of linguistically similar, yet geographically disparate populations.

Complex Population Migration and Admixture Events Inferred from the Y-Chromosome Diversity Landscape
The observed paternal genetic structure indicated that multiple complex ancient migration and admixture events significantly shaped the gene pool of Chinese populations. A time-stamped phylogenetic tree revealed multiple lineage diversifications after the last glacial maximum (20 kya), with these lineages dispersing at varying times (Fig. 1b; supplementary fig. S1, Supplementary Material online). Analysis using a maximum likelihood (ML) tree incorporating ancient DNA sequences revealed diverse founding populations contributing to the Chinese paternal gene pool that likely originated from ancient migrations of descendants from indigenous rice or millet farmers, Siberian hunter-gatherers, or western Eurasian steppe pastoralists (Fig. 2; supplementary figs. S9 to S12, Supplementary Material online). The extent to which ancestral sources affected the paternal genetic makeup of Chinese ethnic minorities was systematically investigated, along with the geographical spread of identified lineages and their associations with expansions related to ancient farmers, hunter-gatherers, and pastoralists. Additionally, to determine the origins and distribution patterns of dominant paternal lineages in China, the participants were grouped into geographically defined metapopulations, and general geographical distribution patterns were estimated (Figs. 3 to 5; supplementary figs. S17 and S18, Supplementary Material online). Finally, we systematically assessed how ancient technological innovations and human migration events have influenced the paternal genetic landscape of Chinese populations, revealing a complex interplay of genetic inputs from various ancient populations.

Gene Flow from Ancient Pastoralists and Barley Farmers in West Eurasia and Central/South Asia to East Asia
Prehistoric and historical cultural exchanges along the southern Bactria-Margiana Archaeological Complex oasis farming route, the Inner Asian Mountain Corridor, and the northern Yamnaya/Afanasievo steppe pastoralist migration routes have significantly shaped the autosomal gene pool of ancient populations in the Altai Mountains and surrounding areas of northwestern and northern East Asia (Zhang et al. 2021b).
Haplogroups J/G/R and their major sublineages, which are prevalent among ancient western Eurasians, exhibit the highest frequencies in Northwest China (Figs. 2 and 3a and b; supplementary fig. S9a, Supplementary Material online). Specifically, most J haplogroup carriers in China belong to the J2-M172 sublineage, particularly J2a-M410. The origins of J2a in ancient populations can likely be traced back to the northern Fertile Crescent, and its current distribution primarily reflects expansions and admixture events related to ancient barley farmers (Figs. 2 and 3a). Similarly, individuals carrying G-M201 in Northwest China were predominantly classified under sublineage G2a (Fig. 3b). An optimized hot spot analysis revealed diffusion centers for J2a and G2a in the Xinjiang and Gansu-Qinghai regions, suggesting a correlation with these areas (supplementary fig. S17a, Supplementary Material online). Generally, the introduction of J/G-derived lineages into China is attributed to the eastward migration of barley farmer-related ancestral populations likely facilitated by gene flow events along the ancient Silk Road (Zhabagin et al. 2022; He et al. 2023c).
R-M207 is predominantly found among ancient western Eurasians and modern populations in North China, particularly in Northwest China (Figs. 2 and 3a and b). The basal haplogroup R was identified in a ∼24,000-year-old individual from the Mal'ta site near Lake Baikal in Siberia (Raghavan et al. 2014). In China, approximately 90% of R carriers are categorized as R1-M173, which bifurcates into R1a-L146 and R1b-M343 approximately 23 kya. The frequency of R1a-L146 notably exceeded that of R1b-M343 (Figs. 1b and 3b; supplementary fig. S1, Supplementary Material online). Furthermore, all individuals within R1a were classified into R1a1a sublineages, with R1a1a1b diverging approximately 5 kya and being the most prevalent (Fig. 1b; supplementary fig. S1 and table S5, Supplementary Material online). The spatiotemporal distribution of R1 subclades is closely linked to the movements of ancient steppe pastoralists, underscoring a significant genetic flow into China (Figs. 2 and 3a). Conversely, R2-M479 appears in East Asia at low frequencies (supplementary fig. S17a, Supplementary Material online) and is primarily concentrated in Central/South Asia, having recently extended from South Asia to North China via the northern route. Analysis combining ancient and modern population phylogenies revealed that samples from Mongolia with substantial West Eurasian ancestry, such as Mongolia_EIA_Sagly_4 and Mongolia_LBA_MongunTaiga_3, fall within the R1a1a sublineage. Nearly half of the ancient Xinjiang individuals are categorized within sublineages R1a or R1b, reflecting the historical impact of the Yamnaya/Afanasievo-related pastoralists on the genetic makeup of the northwestern Chinese populations (Figs. 2 and 3a). Additionally, the sporadic presence of other rare haplogroups like H-L901 and I-M170 in China suggests a broad and recent gene flow from Central/South Asian and West Eurasian ancestors into the region.
To confirm that migrations related to pastoralist populations have reshaped the distribution of western Eurasian-related lineages in Chinese populations, we estimated the correlation between haplogroup frequencies and both geographical (longitude and latitude) and genetic features (PC1-2, haplogroup frequency, Fst matrix, and autosomal-based admixture proportions). The frequencies of R-related sublineages correlate with latitude and exhibit high frequency in modern northwestern Chinese populations (supplementary fig. S14a and b, Supplementary Material online). Furthermore, the distribution patterns of R and its sublineages were significantly correlated (supplementary fig. S14c, Supplementary Material online). To elucidate the direct genetic contributions from ancient sources to modern Chinese populations, we constructed a six-source admixture model, revealing a gradual decrease in ancestral proportions from their archeologically confirmed origins or earliest emergence areas in China (Fig. 4b). If ancient migration events directly influence the lineage frequency patterns in Chinese populations, a solid positive correlation would be expected between the proportion of autosomal-based admixture from presumed ancestral sources and the frequency of founding lineages. Intriguingly, a significant correlation was observed between the Afanasievo-related ancestral proportions and the haplogroup frequencies of multiple H, J, and R sublineages (Fig. 5a and g). These findings, derived from the haplogroup frequency spectra of modern and ancient Eurasians, phylogeographic origin inferences, and multiple factor correlations, suggest that migrations of western Eurasian barley farmer- and pastoralist-related populations likely facilitated the development of these Chinese founding lineages.

Fig. 2. Maximum likelihood phylogenetic tree among modern and ancient Eurasian populations. This tree includes newly genotyped individuals from Chinese ethnic minorities and ancient Eurasian reference populations. The colored regions on the map signify the sampling locations of both ancient and modern Eurasian populations, with black triangles marking the sites where ancient Eurasian samples were collected. The branch lengths are proportional to the mutation counts, and the branch colors indicate the sampling locations, with solid lines representing modern individuals and dashed lines depicting ancient individuals. Inside the circle, different shapes and colors denote the language families of modern individuals: rosy for Sinitic; dark goldenrod for Tibeto-Burman; bluish violet for Mongolic; brown for Tungusic; gray for Turkic; lavender for Koreanic; green for Tai-Kadai; and medium sea green for Hmong-Mien. Due to the small sample sizes (fewer than five), Austronesian, Austroasiatic, and Indo-European-speaking populations are not labeled. The outer circle displays sample information and corresponding haplogroup results, with violet representing modern individuals and light blue for ancient ones. Enlargements of dominant lineages and their representative ancient genomes are highlighted at the four corners of the map. Detailed views of these branches are provided in supplementary figs. S9 to S12, Supplementary Material online.

Siberian Hunter-Gatherer-Dominant Paternal Lineages Are Widely Distributed in China
Ancient DNA studies have identified an ancestral component, termed Ancient Northeast Asian (ANA) ancestry, related to Neolithic hunter-gatherers from the Russian Far East, Mongolian Plateau, and Baikal region (Jeong et al. 2020; Fig. 4a and c). This ANA-related ancestry has contributed variably to distinct ancient populations in these regions, [...] and 3a). The frequencies of the C2/N1/R1 sublineages were significantly positively correlated with the ANA-related ancestry (P < 0.05, Fig. 5b and g). The haplogroup Q-M242 appears in China at very low frequencies (<3%, supplementary table S5, Supplementary Material online) and displays varied distribution patterns between North and South China (Fig. 3c; supplementary table S5, Supplementary Material online). This lineage, which might have originated in Central Asia and southern Siberia approximately 31 kya (Fig. 1b), includes the Q1a1a-M120 subclade. This subclade, unique to East Asians, is relatively prevalent among Han Chinese individuals (∼81% of all Q lineages, supplementary table S5, Supplementary Material online) and likely underwent a local expansion in Northwest China between 5 and 3 kya (Sun et al. 2019). Furthermore, the Q1a1a1-F1626 subclade, a derivative of Q1a1a-M120, diversified approximately 4.3 kya (Fig. 1b). The ML phylogenetic topology indicated that ancient Mongolian individuals with minimal West Eurasian-related ancestry (<20%) belonged to Q1a1a or its sublineages (Figs. 2 and 3a). Venn diagrams illustrating shared ancestry-correlated lineages also show that the Q and R lineages are common among the Yamnaya and ANA-associated lineages (supplementary fig. S15, Supplementary Material online). Moreover, ancient individuals from the middle Neolithic (MN) Yangshao culture and approximately 3,000-year-old Hengbei residents from Shanxi, who carried the Q1a1a-M120 lineage, indicate that this haplogroup influenced the Han Chinese gene pool at least 6 kya. Q1b-M346, although rare in China, is [...] S5, Supplementary Material online), with some Bronze Age (BA) and IA individuals from the Mongolian Plateau and Xinjiang regions genotyped for Q1b or its subclades (Figs. 2 and 3a).
Ancient DNA evidence from autosomal variations and maternal lineages further underscores the substantial impact of Neolithic millet farmers on the permanent settlement of the Tibetan Plateau (Wang et al. 2023).
Archeological evidence indicates that millet-based agriculture independently emerged in the Yellow River Basin and West Liao River at approximately 6,000 BCE, fostering the development of foxtail (Setaria italica)-prevalent Yangshao and broomcorn (Panicum miliaceum)-prevalent Xinglongwa cultures, respectively (Miller et al. 2016; Leipe et al. 2019). Leipe et al. noted that shifts in agricultural practices from approximately 6,000 to 2,000 BCE led to a quasi-exponential population growth in North China, aligning with the major dispersal of Sino-Tibetan-speaking populations from the Yellow River Basin during the fourth millennium BCE (Leipe et al. 2019). Ancient DNA analyses of millet farmers from the Yangshao and Longshan cultures suggested that the Sino-Tibetan people originated in North China (Ning et al. 2020b). The Haojiatai-related ancestry dominant in Chinese populations correlated strongly with the O/Q/C/N lineages (Figs. 4e and 5d). O-M175, which is prevalent in East and Southeast Asians, includes the significant O1-F265 and O2-M122 subclades, whose expansions are associated with the spread of millet and rice agriculture from domestication centers in the Yellow River Basin, West Liao River, and Yangtze River Basin (Fig. 5d to f). The influence of Ancient Northern East Asian (ANEA) on modern East Asian paternal genetic diversity requires a further comprehensive assessment. O-related sublineages, with O2 lineages diversifying approximately 29 kya (Fig. 1b), are broadly distributed in North China and the Tibetan Plateau (Figs. 2 and 3a). O2-M122, particularly subclade O2a-M324, is a major paternal lineage in East and Southeast Asians, showing a strong correlation in distribution patterns (Fig. 3d; supplementary figs. S17 and S18c, Supplementary Material online). O2a-M324 is found at high frequencies along China's coast and surrounding areas (>52%), suggesting ancestral migration routes along the coast extending into SEA (supplementary fig. S17b, Supplementary Material online). An ancient individual from the MN West Liao River Hongshan culture identified as belonging to O2a-M324 supports this lineage's association with early cultural developments in Northeast China. Systematic evidence further corroborated that O2a-M324 originated in Northeast China, particularly in Heilongjiang Province, where it remains highly prevalent (supplementary fig. S17b, Supplementary Material online). However, the [...]

[...] histories of diverse human populations. However, research into the ancient genetic legacy reflected in modern Chinese populations via Y-chromosome analysis remains sparse. To address this gap, we used the YHC to analyze the Y-chromosome diversity in ethnolinguistically diverse Chinese populations through whole Y-chromosome sequencing and our newly developed high-resolution YHSeqY3000 panel. This project reconstructs demographic events, such as isolation, expansion, and admixture, using various computational models. The new data were integrated with a Y-chromosome genomic database of 14,644 individuals, creating a comprehensive database that includes 1,786 ancient Eurasians and 115 modern Chinese populations from 47 ethnic groups. This integration facilitates an in-depth exploration of the paternal genetic diversity of Chinese populations. Our findings indicate that multiple founding lineages associated with millet/rice farmers from the Yellow River Basin and the Yangtze River Basin, Siberian hunter-gatherers, and ancient western Eurasian pastoralists and farmers significantly influence the geographical patterns of paternal genetic stratification in Chinese populations. There is a strong correlation between the frequency of subsistence model-related founding lineages and the proportion of autosomal-based admixture from presumed ancestral sources, as well as between the latitude and a differentiated north-to-south genetic matrix. These correlations suggest that ancient migrations and extensive admixtures with indigenous populations primarily shaped the paternal genetic landscape of Chinese populations. To further elucidate the paternal evolutionary history of East Asians, we emphasize the importance of combining high-depth whole-genome sequencing data from both modern and spatiotemporally diverse ancient populations. This comprehensive approach will enhance our understanding of the dynamic interplay between migration, admixture, and cultural development in this region.

Study Participants
To comprehensively characterize the paternal diversity across China, saliva samples were collected from 919 participants representing 39 ethnolinguistic groups (supplementary table S1, Supplementary Material online). The participants were all descendants of self-identified ethnic group members, with their grandparents having resided in their respective sampling districts for at least three generations. The study received approval from the Medical Ethics Committee of West China Hospital of Sichuan University (2023-306) and was conducted following the Helsinki Declaration of 2013 (World Medical Association 2013). Informed consent was obtained from each participant before sample collection.

DNA Extraction, Whole-Genome Sequencing, and Genotyping
Genomic DNA was extracted using the QIAamp DNA Mini Kit (QIAGEN, Germany). DNA concentrations were quantified with the Qubit dsDNA HS Assay Kit, following the standard protocol on a Qubit 3.0 fluorometer (Thermo Fisher Scientific). Sequencing was conducted on the Illumina platform (Illumina, San Diego, CA, USA), achieving 80× genome-wide coverage. The raw sequencing reads were mapped to the human reference genome GRCh37 using BWA v0.7.13 (Li and Durbin 2009). Duplicate reads were removed with Picard v3.0.0, followed by a base quality score recalibration via GATK v4.2.6.1. Joint variant calling was executed using the GATK HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs modules (McKenna et al. 2010). High-quality variant calls within a 10 Mb region were obtained through a sequence mask (Poznik et al. 2013).
Variants exhibiting missing call rates greater than 5%, base quality below 20, or heterogeneity rate above 15% were filtered out using BCFtools v1.8 (Li 2011). Samples with missing call rates exceeding 5% were removed via vcftools v0.1.16 (Danecek et al. 2011). Ultimately, 914 samples meeting quality standards were selected for the downstream analysis, including the reconstruction of a time-scaled phylogenetic tree. Additionally, Y-specific target sequences with 100× coverage were generated using the custom-designed YHSeqY3000 panel on the MGI sequencing platform to validate the sequencing performance.
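The two-stage missingness filter described above can be illustrated with a minimal Python sketch. It is not the BCFtools/vcftools invocation used in the study; the in-memory genotype matrix, the function names, and the choice to evaluate samples only on the surviving variants are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: 0 = reference allele, 1 = alternate allele, None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of missing calls in a list of genotypes."""
    return sum(c is None for c in calls) / len(calls) if calls else 0.0


def filter_matrix(matrix: List[List[Genotype]],
                  max_missing: float = 0.05) -> Tuple[List[int], List[int]]:
    """Apply a variant-level, then a sample-level missingness filter.

    matrix[v][s] is the call for variant v in sample s.
    Returns (kept_variant_indices, kept_sample_indices).
    """
    # Stage 1: drop variants whose missing call rate exceeds the threshold.
    kept_variants = [v for v, row in enumerate(matrix)
                     if missing_rate(row) <= max_missing]
    # Stage 2: drop samples whose missing rate over the kept variants exceeds it.
    n_samples = len(matrix[0]) if matrix else 0
    kept_samples = [
        s for s in range(n_samples)
        if missing_rate([matrix[v][s] for v in kept_variants]) <= max_missing
    ]
    return kept_variants, kept_samples
```

With the default `max_missing=0.05`, this mirrors the 5% thresholds stated in the text; the base quality and heterogeneity filters are not modeled here.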

Haplogroup Classification and Phylogenetic Relationship Construction
The initial classification of the Y-chromosome haplogroups was performed using in-house scripts based on a newly reconstructed phylogenetic tree supplemented by classifications from HaploGrouper (Jagadeesan et al. 2021) and Y-LineageTracker (Chen et al. 2021), referencing the Y-DNA Haplogroup Tree 2019-2020 (https://isogg.org/tree/index.html). BEAST v1.10.4 (Suchard et al. 2018) facilitated the construction of a phylogenetic tree and the estimation of the TMRCA for various nodes using approximately 10 Mb of Y-chromosome sequences. B-related haplotypes served as an outgroup (Mallick et al. 2016). The optimal substitution model was selected via jModelTest v2.1.10 (Darriba et al. 2012). Markov chain Monte Carlo sampling was executed over 100 million iterations, with samples logged every 1,000 iterations and the initial 10 million iterations discarded as a burn-in. An exponential growth coalescent tree prior was used alongside the GTR (general time reversible) substitution model and a strict molecular clock. The substitution rate was set at 7.6 × 10⁻¹⁰ mutations per base pair per year (95% confidence interval: 6.7 × 10⁻¹⁰ to 8.6 × 10⁻¹⁰), as estimated by Fu et al. (2014). Three independent runs were amalgamated using LogCombiner, with the quality of the combined output manually verified using Tracer v1.7.1 (Rambaut et al. 2018). The maximum clade credibility tree was then generated with TreeAnnotator v1.10 and visualized using FigTree. To further investigate the ancient influences on the paternal landscape of the recently genotyped Chinese ethnic minorities, an ML phylogenetic tree was constructed using RAxML (Stamatakis 2014) (Martiniano et al. 2022), and the tree was refined with iTOL (Letunic and Bork 2021). For the complete data set of Y-chromosome target sequences from 919 samples, a network-based analysis of shared haplotypes was conducted using PopART (Leigh and Bryant 2015), providing a comprehensive view of haplogroup relationships.
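The MCMC settings and substitution rate above imply some simple arithmetic, sketched below in Python. The constants are taken from the text; the function names are hypothetical, the callable region is rounded to exactly 10 Mb, and the count ignores the initial state that a sampler may also log, so the true figures may differ slightly.

```python
CHAIN_LENGTH = 100_000_000   # MCMC iterations per run
LOG_EVERY = 1_000            # sampling interval
BURN_IN = 10_000_000         # iterations discarded as burn-in
N_RUNS = 3                   # independent runs combined
RATE = 7.6e-10               # mutations per base pair per year
CALLABLE_BP = 10_000_000     # "approximately 10 Mb", rounded (assumption)


def retained_samples(chain_length: int, log_every: int, burn_in: int) -> int:
    """Number of logged states kept per run after discarding the burn-in."""
    return (chain_length - burn_in) // log_every


def years_per_mutation(rate: float, callable_bp: int) -> float:
    """Expected waiting time (years) for one mutation along a single lineage."""
    return 1.0 / (rate * callable_bp)


per_run = retained_samples(CHAIN_LENGTH, LOG_EVERY, BURN_IN)
combined = per_run * N_RUNS
print(per_run, combined, round(years_per_mutation(RATE, CALLABLE_BP), 1))
```

Under these assumptions, each run retains 90,000 samples (270,000 after combining three runs), and one mutation is expected roughly every 130 years per lineage across the sequenced region, which gives a sense of the time resolution of individual branches.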

Data Set Composition
We integrated previously published haplogroup data from 11,979 East Asian individuals across 79 populations drawn from key studies, the 1KGP, and the Human Genome Diversity Project (Poznik et al. 2016; Bergstrom et al. 2020). Additionally, data from 879 individuals across 27 SEA populations; 252 ancient East Asians from regions, including the Tibetan Plateau, Xinjiang, Amur River Basin, Yellow River Basin, West Liao River, and South China; and 1,534 ancient western Eurasians from the Allen Ancient DNA Resource were included (supplementary tables S2 and S3, Supplementary Material online; Mallick et al. 2024). A total of 13,777 modern individuals from 12 linguistically distinct groups were sampled, spanning 22 provinces, five autonomous regions, and four municipalities in China, as well as Thailand and Vietnam. These included 135 Austroasiatic-, 693 Austronesian-, 285 Hmong-Mien-, 75 Japonic-, 35 Koreanic-, 994 Mongolic-, 863 Tai-Kadai-, 1,338 Tibeto-Burman-, 260 Tungusic-, 1 Indo-European-, 291 Turkic-, and 805 Sinitic-speaking Hui, 3,248 northern Han Chinese, and 4,754 southern Han individuals (supplementary tables S1 and S2, Supplementary Material online). The haplogroups were manually revised according to variant information and the Y-DNA Haplogroup Tree 2019-2020. To facilitate the estimation of the spatial distributions of the paternal lineages, we aggregated haplogroup data to create metapopulations based on geographical region, ethnicity, and language family. The haplogroup frequencies were estimated at various levels of terminal haplogroups. Population genetic analyses were conducted on individual populations with sample sizes exceeding 10 and metapopulations exceeding 30.
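As an illustration of the aggregation step, the following Python sketch groups individuals into metapopulations, applies a minimum size threshold, and computes haplogroup frequencies. The data layout, labels, and function names are hypothetical; the study used Y-LineageTracker and manual curation rather than this code.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_METAPOPULATION_SIZE = 30  # metapopulations must exceed this size (from the text)


def haplogroup_frequencies(haplogroups: List[str]) -> Dict[str, float]:
    """Relative frequency of each haplogroup label in one group."""
    counts = Counter(haplogroups)
    total = len(haplogroups)
    return {hg: n / total for hg, n in counts.items()}


def metapopulation_frequencies(
    samples: List[Tuple[str, str]],
    min_size: int = MIN_METAPOPULATION_SIZE,
) -> Dict[str, Dict[str, float]]:
    """samples holds (metapopulation_label, haplogroup) pairs.

    Groups whose size does not exceed min_size are skipped.
    """
    grouped: Dict[str, List[str]] = {}
    for label, haplogroup in samples:
        grouped.setdefault(label, []).append(haplogroup)
    return {
        label: haplogroup_frequencies(hgs)
        for label, hgs in grouped.items()
        if len(hgs) > min_size
    }
```

The same function can be reused for individual populations by passing `min_size=10`, matching the second threshold stated above.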

Population Structure Inference
Pairwise Fst genetic distances were calculated from the haplogroup frequency spectra using Y-LineageTracker. MDS analyses were conducted based on these genetic distances utilizing the "cmdscale" function in R (https://itol.embl.de/itol.cgi). Additionally, PCA was performed on the haplogroup frequency spectra using Y-LineageTracker.
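
For readers unfamiliar with classical (metric) MDS, the Python sketch below shows the standard double-centering and eigendecomposition procedure, which is the same technique that R's "cmdscale" implements. It is an illustration using NumPy rather than the R code used in the study, and the distance matrix is a toy example built from four points on a plane, not an Fst matrix from the paper.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Embed a symmetric distance matrix in k dimensions by classical MDS."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the k largest eigenvalues; clip small negatives that arise when
    # the distances (such as Fst) are not exactly Euclidean.
    order = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy distances between the four corners of a 3 x 4 rectangle.
toy_dist = np.array([
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
])
coords = classical_mds(toy_dist, k=2)
# Pairwise distances recomputed from the embedding
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.round(recovered, 6))
```

Because the toy distances are exactly Euclidean in two dimensions, the embedding reproduces them up to rotation and reflection; real Fst matrices are generally only approximated by the leading axes.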

Spatial Statistics Correlated with the Phylogeographic Origin of Founding Lineages
The frequency of specific haplogroups within a province-defined population at various terminal haplogroup levels was computed using Y-LineageTracker, with level parameters adjusted from 0 to 6. The Chinese populations were grouped according to provincial administrative boundaries, while populations from island and mainland SEA were aggregated by country. The spatial distribution patterns of the dominant haplogroups in China were examined using ArcMap. This included the application of the Getis-Ord General G method for optimized hot spot analysis and spatial autocorrelation analysis using Moran's I. The clusters identified through optimized hot spot analysis, referred to as hot and cold spots, approximated the potential geographical origins or diffusion centers of specific haplogroups, and the mirroring regions illustrated the general distribution trends of these haplogroups.
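
To make the spatial autocorrelation statistic concrete, the following Python sketch computes global Moran's I from a list of regional values and a spatial weight matrix. It is a textbook formulation, not a reproduction of the ArcMap workflow: the binary adjacency weights, the function name, and the four-region toy layout are assumptions introduced for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for regional values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of regions
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Four hypothetical regions arranged in a chain: 0-1-2-3 (binary adjacency).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)       # smooth spatial trend
checkerboard = morans_i([1.0, 4.0, 1.0, 4.0], chain)   # alternating values
print(gradient, checkerboard)
```

A smooth gradient in haplogroup frequency yields a positive value, indicating that neighboring regions resemble each other, whereas an alternating pattern yields a negative value.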

Autosomal-Based ADMIXTURE Estimation
A data set was constructed from 445 ancient individuals across 88 Eurasian populations and 1,325 modern individuals from 62 geographically diverse populations, sourced from our integrated 10K_CPGDP database. Admixture proportions of Chinese populations were estimated using model-based ADMIXTURE. The autosomal data set was pruned using PLINK (Chang et al. 2015) with the parameters "--indep-pairwise 200 25 0.4" and "--allow-no-sex". Subsequently, ADMIXTURE was run with the number of predefined ancestral sources ranging from 2 to 15 (Alexander et al. 2009). The optimal admixture model was determined based on the lowest cross-validation error, and correlations between the haplogroup frequencies and autosomal-based admixture proportions of modern Chinese populations were estimated.
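
Selecting the model with the lowest cross-validation error can be sketched in Python as below. The sketch assumes that the ADMIXTURE logs were produced with cross-validation enabled and contain lines of the form "CV error (K=6): 0.5123"; the function name is hypothetical, and the error values in the toy log are invented placeholders, not results from this study.

```python
import re

# Matches lines such as "CV error (K=6): 0.5123" in ADMIXTURE log output.
CV_PATTERN = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) parsed from log text."""
    errors = {int(k): float(err) for k, err in CV_PATTERN.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k_best = min(errors, key=errors.get)
    return k_best, errors


# Invented placeholder values for illustration only.
toy_log = """\
CV error (K=2): 0.5410
CV error (K=3): 0.5302
CV error (K=4): 0.5288
CV error (K=5): 0.5297
"""

k_best, cv_errors = best_k(toy_log)
print(k_best, cv_errors[k_best])
```

In practice, the logs from all runs across K = 2 to 15 would be concatenated before parsing, and the cross-validation curve would be inspected rather than relying on the minimum alone.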

Correlation between Haplogroup Frequency and ADMIXTURE-Based Ancestral Proportion
The haplogroup frequencies of geographically defined metapopulations were initially calculated. The Chinese populations distinguished by geographic and ethnolinguistic characteristics were grouped by provincial administrative region. All examined lineages were truncated at the ninth level, identifying 139 common lineages with a frequency exceeding 0.05 in at least one population, 177 low-frequency lineages, and 165 rare lineages. Pearson's correlation coefficients between haplogroup frequencies and geographic coordinates (longitude and latitude), along with their intercorrelations and statistical significance, were estimated using the "corrplot" R package. Subsequently, all Chinese populations were consolidated into a single subpopulation, defining common lineages with frequencies above 0.01 or 0.05. The "corrplot" R package was also utilized to assess the correlation between admixture proportions and haplogroup frequencies.
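
The correlation step can be illustrated with a plain Python implementation of Pearson's coefficient. This sketch stands in for the R workflow described above and omits the significance testing; the function name and the toy latitude and frequency values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical provincial latitudes and frequencies of one lineage.
toy_latitude = [22.0, 30.0, 38.0, 46.0]
toy_frequency = [0.05, 0.10, 0.20, 0.25]

r_latitude = pearson_r(toy_latitude, toy_frequency)
print(f"r = {r_latitude:.3f}")
```

The same function applies unchanged when the first argument is an autosomal ancestry proportion per population instead of a geographic coordinate.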

Ethics Approval and Consent to Participate
This study received approval from the Medical Ethics Committee of West China Hospital of Sichuan University and was conducted following the principles outlined in the Helsinki Declaration.

Consent for Publication
Not applicable.

Fig. 1. Geographic location and phylogenetic characteristics of 919 newly sequenced Chinese minority individuals. a) A map of East Asia displays essential data for 919 individuals from 57 Chinese ethnic minority groups. Circle sizes on the map indicate the sample sizes of individual populations, while colored provinces represent sampling locations, with colors denoting the total sample size from those regions. Additionally, ancient subsistence strategies, such as pastoralism, hunter-gathering, and agriculture, from western Eurasia, the Mongolian Plateau, and the origin centers of Chinese agriculture in the Yellow and Yangtze River basins are depicted. b) The Y-chromosome phylogeny includes 914 individuals who passed quality control, illustrating the time to the most recent common ancestor (TMRCA) of various prevalent paternal lineages. B-lineage-related representative haplotypes from the Simons Genome Diversity Project serve as an outgroup. Branch lengths correlate with the estimated TMRCA. Major lineages are indicated by colored triangles, with the base width of each triangle proportional to the sample size. A detailed, time-stamped phylogenetic tree is presented in supplementary fig. S1, Supplementary Material online, with scales of divergence times differentiated by varying background colors.

Fig. 3. Frequency spectrum of dominant Chinese paternal lineages in ancient Eurasians and modern East and Southeast Asians. a) The geographic distribution of 1,284 ancient individuals carrying 12 Y-chromosome lineages is depicted. Various haplogroups are represented by circles of different colors, with the size of each circle proportional to the frequency of the corresponding haplogroups. b and c) Frequencies of paternal lineages related to Western-origin and Siberian hunter-gatherers among eastern Eurasian populations are shown. Optimized hot spot analysis was employed to suggest the geographic origins of these focused lineages. d and e) Frequencies of sublineages associated with early East Asian-related D, ANEA millet farmer-related O2, and ancient Southern East Asian rice farmer-related O1 are displayed. Areas with high frequencies or significant phylogeographic relevance of the studied lineages are highlighted in red. Detailed frequency distributions of additional sublineages are provided in supplementary figs. S13 and S16 to S18, Supplementary Material online.

Fig. 4. Admixture results and geographic distribution of ancestral sources. a) Model-based ADMIXTURE analysis was performed for modern and ancient East Asian populations using predefined ancestral sources ranging from 2 to 15. The optimal fit was achieved with a six-way admixture model, which exhibited the lowest cross-validation error. b to g) The distribution of admixture proportions across various Chinese populations is depicted, with red indicating the highest proportion of a specific ancestral component.

Fig. 5. Correlation between the autosomal ancestral proportions and the frequencies of the Y-chromosome lineages. a to f) Scatter plots display statistically significant positive correlations between autosomal-based ancestral proportions and population-specific founding paternal lineages. g) Correlations between autosomal-based admixture estimates of ancestral proportions and frequencies of paternal lineages are shown. Correlations involving proportions of different ancestral sources were excluded, and the initial clustering positions of these correlations are marked with arrows. This visual analysis elucidates the complex interplay between autosomal and Y-chromosomal data in tracing genetic heritage and lineage dynamics.