Ancient Genomes Reveal the Evolutionary History and Origin of Cashmere-Producing Goats in China

Abstract Goats are one of the most widespread farmed animals across the world; however, their migration route to East Asia and local evolutionary history remain poorly understood. Here, we sequenced 27 ancient Chinese goat genomes dating from the Late Neolithic period to the Iron Age. We found close genetic affinities between ancient and modern Chinese goats, demonstrating their genetic continuity. We found that Chinese goats originated from the eastern regions around the Fertile Crescent, and we estimated that the ancestors of Chinese goats diverged from this population in the Chalcolithic period. Modern Chinese goats were divided into a northern and a southern group, coinciding with the most prominent climatic division in China, and two genes related to hair follicle development, FGF5 and EDA2R, were highly divergent between these populations. We identified a likely causal de novo deletion near FGF5 in northern Chinese goats that increased to high frequency over time, whereas EDA2R harbored standing variation dating to the Neolithic. Our findings add to our understanding of the genetic composition and local evolutionary process of Chinese goats.


Introduction
As one of the most widespread and adaptable farm animals, goats inhabit a wide agroecological niche spanning all continents. In China, there are 138 million goats, which are distributed among 58 indigenous breeds adapted to various agroclimatic conditions (Du 2011;Skapetas and Bampidis 2016). Due to marked differences in climate, China is often partitioned into a northern and a southern region bounded by the Qinling Mountains-Huaihe River line, which approximates the 0 C January isotherm and 800 mm isohyet (Jian et al. 2012). Northern China has a relatively cold and dry climate, whereas southern China is relatively hot and humid (Jian et al. 2012). Goats in northern and southern China correspondingly evolved a series of distinct morphological traits (Du 2011). For example, goats in northern China, the main breeding area for cashmere goats, have an extraordinarily dense coat of hair and a more compact body conformation than those in southern China (Liu and Feng 1993;Hai-zhi et al. 2001). These local adaptations of northern and southern goats provide an opportunity to study fine-scale environmental adaptation under the framework of domestication. Understanding the evolutionary genomics behind such adaptations can help animal husbandry meet the challenges of global climate change.
A recent paleogenomic study indicated three distinct Neolithic goat populations around the Fertile Crescent, which contributed differentially to modern goat populations. These include the Neolithic West (Anatolia and the Balkans), Neolithic East (Iran and Turkmenistan), and Neolithic Levant (Jordan and Israel) populations (Daly et al. 2018). However, the origin, genetic turnover, and differentiation of Chinese goats are not well studied at the genomic level. Morphology-based archeological studies show that Chinese domestic goats may have been introduced from the Eurasian steppe to China by the early second millennium BC, concurrent with the gradual cooling and aridification of northern China, which increased the availability of grazing (Liu and Chen 2012a). This time period also included increasing contact between Chinese and Eurasian agricultural civilizations (Yuan et al. 2008;Liu and Chen 2012b). The geographical origin of Chinese goats and the process of their adaptive differentiation are still not clear, however, due to the lack of ancient genomic evidence. Although the origin of Chinese goats has been studied based on mitochondrial sequences (Chen et al. 2005;Han et al. 2010), such studies have severe limitations due to the reduced information content of singlelocus analyses.
Time-stamped ancient DNA data can help clarify historical selection processes and provide direct evidence of the genomic dynamics experienced by populations. In this study, we generated genomic data from goat remain dated to $3,900-450 years before the present (YBP) from nine archeological sites in China. By comparing the published data on ancient and modern goats, the origin and genetic differentiation of Chinese goats were investigated.

Genome Sequencing of Ancient Goats
We generated genome-wide data from 27 sets of Chinese ancient goat remains (table 1), including three sets of goat remains from the Shimao site of the Late Neolithic period ($3,900 YBP), 16 from six Bronze Age sites ($2,700-2,500 YBP), and eight from two Iron Age sites ($650-450 YBP) ( fig. 1a, supplementary figs. S1, S2 and tables S1, S2, Supplementary Material online). Notably, the Shimao goats represent the oldest known Chinese archaeological samples. We analyzed these data in combination with 54 previously reported ancient goat whole-genome sequencing data sets  . 1b). These relationships were also confirmed by the phylogenetic tree (supplementary fig. S4, Supplementary Material online).
To further determine the origin of Chinese goats, we then used D statistics to measure the genetic affinities among all

The Genetic Differentiation of Chinese Goats
To focus on the diversity within China, we then performed phylogenetic analysis and PCA using only Chinese goats. The phylogenetic analyses using modern Chinese goats and three ancient samples (YJL01 and YJL02 from Yanjialiang, WDH06 from Wangdahu) with at least 7Â coverage (7.350Â, 13.439Â, and 8.106Â, respectively) show that ancient Chinese goats are basal to the lineage of all Chinese goats, whereas modern Chinese goats form a monophyletic group with a north-south structure ( fig. 2a). This topology was supported by PCA with additional ancient samples, including low-depth samples. All the ancient Chinese goats were located in the center of the modern Chinese goats, and the first component was driven by the difference between NC and SC ( fig. 2b), which also coincides with the results of the fineSTRUCTURE and TreeMix analysis (supplementary figs. S7 and S8, Supplementary Material online). We then calculated the nucleotide diversity (h p ) within each group and the inbreeding coefficient (F) for each individual (supplementary figs. S9 and S10, Supplementary Material online). The results showed that modern SC has lower diversity (Student's t-test P ¼ 2.14 Â 10 À24 ) and a higher inbreeding coefficient (Student's t-test P ¼ 0.0035) than NC. The divergence time between NC and SC as estimated by diffusion approximations for demographic inference (@a@i) was $3,013 YBP (1,947-6,059 YBP, 95% CI) (supplementary fig. S11, Supplementary MBE Material online). These results confirmed that modern Chinese goats were mainly descended from ancient Chinese goats with a subsequent north-south separation.
The genetic differentiation within Chinese goats is also reflected in mitochondrial DNA. The maternal phylogenetic analysis showed that ancient Chinese goats were highly diverse, with four haplogroups (58% A, 29% B, 6% C, and 6% D) ( fig. 2c). This diverse mtDNA gene pool has existed since the Late Neolithic, with three haplogroups (A-C) found in Shimao (supplementary fig. S12, Supplementary Material online). In modern Chinese goats, SC has a similar haplotype A frequency (61%) to that of ancient Chinese goats, which is significantly lower than that of NC (92%, Fisher's exact test P ¼ 0.02) and the previously reported worldwide haplogroup A frequency (Naderi et al. 2007) (fig. 2c). The remaining SC goats fall within the B haplogroup, with a significantly higher frequency than that of NC (Fisher's exact test P ¼ 0.01) ( fig. 2c).
We then calculated outgroup f3 statistics to measure the genetic affinities between the global population of ancient goats and modern Chinese goats. The results showed that among all ancient populations, modern Chinese goats have higher genetic affinity to ancient Chinese goats (supplementary fig. S13, Supplementary Material online), which further corroborates the genetic continuity of Chinese goats rather than indicating population turnover due to immigration. However, compared with SC, NC exhibited less allele sharing with ancient Chinese goats from all periods (

Local Adaptation by Selective Sweeps
Northern and southern Chinese goats inhabit divergent environments and have separated with respect to both genotype and phenotype (Du 2011). For example, 11 of the 19 local goat breeds in northern China are cashmere-producing, whereas none of the 39 southern breeds are cashmere-producing (Du 2011). To gain insight into the genetic basis of the north-south divergence in modern Chinese goats, we next scanned for selection signatures separately in NC and SC. We calculated F ST and the h p ratio between NC and SC and used an outlier approach to identify genomic regions undergoing selective sweeps in these two groups. A total of 24 selective sweep regions spanning 33 candidate genes were identified (supplementary fig. S17 and table S6, Supplementary Material online). The top two F ST outliers contained two genes, Fibroblast Growth Factor 5 (FGF5) on chromosome 6 and Ectodysplasin A2 Receptor (EDA2R) on chromosome X ( fig. 3a), both of which are related to the development of hair follicles (Botchkarev and Fessing 2005;Zhang et al. 2009;Wang, Cai, et al. 2016). Furthermore, several other genes that show high selection signals are plausibly related to environmental adaptation. For example, MAGED1 (selected in NC) can bind to nuclear receptor RORa and thereby affect circadian clock function (Wang et al. 2010). CDC25A (selected in SC) is related to body size in goats (Wang, Liu, et al. 2016). SPAG17 (selected in NC) is associated with human adult height (Weedon et al. 2008), and a known mutation can produce mice with significantly shorter hindlimb length (Teves et al. 2015), consistent with selection for contracted limbs in cold-adapted populations (Allen 1877).
The strongest selection signal found on the autosomes (chromosome 6: 95,400-95,640 kb) harbors the FGF5 gene ( fig. 3a), which is related to the regulation of hair length in mice, dogs, and humans (Mizuno et al. 2011;Dierks et al. 2013;Higgins et al. 2014). In goats, a previous CRISPR/Cas9mediated animal experiment showed that the disruption of FGF5 resulted in more secondary hair follicles and longer fibers, resulting in increased cashmere production (Wang, Cai, et al. 2016). We validated the expression of FGF5 in the outer root sheaths of follicles in cashmere goats by immunohistochemical experiments (supplementary fig. S18, Supplementary Material online). Therefore, FGF5 appears to negatively regulate the development of hair follicles.
To identify the potential causal mutation around the FGF5 locus, we inspected all of the divergent mutations between NC and SC. There were no missense mutations in coding regions, but a 504-bp (chromosome 6: 95,454,685-95,455,188 bp) deletion was detected $14 kb downstream of FGF5 ( fig. 3b). The frequency of this deletion was high in NC (83.3%) but much lower in other Asian goats (SC: 8.3%, SAS: 16.7%, SWA: 2.8%). This deletion is absent in European and African goats, as well as in all bezoars (Capra aegagrus) ( fig. 3c). Furthermore, all 19 Chinese cashmere goats in this study carried this mutation, with 14 homozygous animals and 5 heterozygous animals. Notably, none of the ancient goats collected in this study (in China and the Near East regions) were found to harbor this deletion (supplementary note 15, Supplementary Material online).
Multiple lines of evidence suggest that this deletion in NC may have an enhancer function with regards to FGF5. It is located in a highly conserved element across 100 vertebrate species (Casper et al. 2018) and overlaps a H3K4Me1 peak (usually enriched at enhancers [Rada-Iglesias 2018]) in NHEK cells according to ENCODE (Encyclopedia of DNA Elements) data (Rosenbloom et al. 2012) (fig. 3b). NHEK cells are derived from normal human epidermal keratinocytes, which are used to replace hair follicle cells. Furthermore, this deletion region harbors three transcription factor binding sites (EP300, FOS, and CEBPB) verified in the human genome (Wang et al. 2013 show that the FGF5 downstream sequence led to a significant increase in luciferase expression (Student's t-test P ¼ 2 Â 10 À4 and 2 Â 10 À5 , respectively) compared with the promoter-only construct ( fig. 3e and supplementary fig.  S19, Supplementary Material online).
Another region with extremely high F ST occurs on the X chromosome at 17,915-18,539 kb ( fig. 3a). Low h p and Tajima's D values confirmed this selective sweep in NC ( fig. 4a). This 624 kb region contained only one gene, EDA2R (also known as XEDAR), which has been reported to regulate primary hair follicle placode formation (Botchkarev and Fessing 2005;Zhang et al. 2009) and is associated with male pattern hair loss (Prodi et al. 2008). The expression of EDA2R in northern Chinese cashmere MBE goats exhibits a seasonal pattern, being mainly expressed from August to October, and has a high correlation with FGF5 expression (R 2 ¼ 0.8, P ¼ 0.007). It may, therefore, be related to seasonal cold adaptation (supplementary fig.  S20, Supplementary Material online). The haplotype network of the EDA2R-selected region showed two highly divergent haplogroups in worldwide domestic goats ( fig. 4b). Of the NC individuals, 95.3% belong to one haplogroup (hereafter referred to as the NC-type). This haplotype is also found in SC and SWA, but at lower frequencies (36% and 57%, respectively). We traced the emergence of the NC-type among all available ancient and modern samples. At the early domestication stage, the NC-type was already present in the eastern Fertile Crescent (fig. 4b). Furthermore, the NC-type also existed in Uzbekistan and Turkmenistan in the post-Neolithic period. Using all the ancient Chinese samples, we observed that the frequency of the NC-type in the Iron Age (78.6%) was significantly higher than that in the Bronze Age (33.3%) (Fisher's exact test P ¼ 0.009) (supplementary fig. S21, Supplementary Material online). These results suggest that the NC-type of EDA2R descended from an ancestral standing variant >8,000 years old and then spread into Asian goat populations (NC, SC, and SWA).

Discussion
Time-stamped ancient goat DNA data from China provide novel opportunities to investigate the evolutionary origin and genetic differentiation of Chinese goats. Our paleogenomic analyses, including phylogenetic analysis using high-coverage samples ( fig. 2a) in addition to PCA and outgroup f3 using all the ancient Chinese samples (figs. 1b and 2b and supplementary fig. S13, Supplementary Material online), demonstrate the genetic continuity of Chinese goats from the Late Neolithic to the present. In addition, among three distinct Neolithic goat populations located around the domestication center, Chinese goats have a closer genetic affinity with goats in the eastern Fertile Crescent than with those in other regions ( fig. 1b and c), demonstrating that the Neolithic origin of Chinese goats was the eastern Fertile Crescent. Evolutionary History and Origin of Cashmere-Producing Goats . doi:10.1093/molbev/msaa103 that goats from the Chalcolithic period show more genetic affinity with Chinese goats than those from the Neolithic (sample dates ranging $9,000-8,000 YBP) or Bronze Age (sample dates ranging $4,400-3,900 YBP) ( fig. 1d, supplementary figs. S5 and S6, Supplementary Material online). On the basis of the above, we propose that Chinese goats are derived from a stock that separated from the eastern Fertile Crescent population between $8,000 and 4,400 YBP, roughly corresponding to the Chalcolithic period.
Modern Chinese goats showed north-south genetic differentiation in both the nuclear ( fig. 2a and b) and mitochondrial ( fig. 2c) genomes, and southern Chinese goats maintained a gene pool more similar to that of ancient Chinese goats ( fig. 2c-e). The gene flow from EUR into NC ( fig. 2d and supplementary fig. S16, Supplementary Material online) may contribute to this divergence. The higher level of allele sharing between goats in northern China and Europe, in line with the north-to-south decline of European gene flow observed in East Asian humans (Qin et al. 2015), suggested a northern path of gene flow from Europe to East Asia. The diffusion of goats with European ancestry into northern China may be associated with the recent introgression of Mongolian people into extensive areas of China (May 2012); this migration could also have been the driving force behind the exotic introgression of Eurasian taurine cattle into northern China (Chen et al. 2018).
By performing whole-genome selection scans, we analyzed the genetic basis of the divergence between modern Chinese goat populations. The top two divergent genes between NC and SC, a primary hair follicle-related gene (EDA2R) and a secondary hair follicle-related gene (FGF5) are both involved in hair growth. FGF5 is an inhibitor of the hair anagen phase and knocking out this gene can increase cashmere production (Wang, Cai, et al. 2016). We reported a downstream deletion of FGF5 that carries potential cis-regulatory enhancer regions ( fig. 3b, d, and e). This deletion was found in all Chinese cashmere goats and had a high frequency in NC goats compared with goats in other areas ( fig. 3c). These results indicate that this deletion may promote cashmere production by reducing FGF5 expression.
Ancient DNA provides direct insight with which to clarify historical selection processes. We searched for the source of the FGF5 downstream deletion but did not find it in any of the ancient samples (supplementary note 15, Supplementary Material online). Although our sample size does not allow us to exclude that the deletion did, in fact, exist in historical times, our result is compatible with the deletion emerging as a de novo mutation within the last 450 years. Therefore, FGF5 may represent a case of rapid gain of function by cisregulatory element evolution, a process that has previously been established in Yakutian horses (Librado et al. 2015). The evolutionary trajectory of EDA2R is different. The NC-type of EDA2R in Chinese goats appears to have descended from an ancestral standing variant of ancient goats in the eastern Fertile Crescent ( fig. 4b) and then undergone a significant increase in frequency in northern China during the Iron Age (sample dates ranging $650-450 YBP) (supplementary fig. S21, Supplementary Material online).
As wool is an important animal product with a long history of utilization by humans in China (Zhao and Jin 2007), the evolution of both FGF5 and EDA2R is likely to have been driven by artificial selection. Furthermore, the selected haplotypes of these two genes emerged or increased roughly contemporaneously with the period of large-scale cooling in the Northern Hemisphere (supplementary fig. S22, Supplementary Material online), known as the Little Ice Age ($600-300 YBP) (Mann et al. 2009). Hence, it is possible that decreasing temperatures motivated an increase in artificial selection for wool production in the cooler northern parts of China. This may represent a case of environmental change strengthening the human-induced selection regime for wool production traits. Such dynamic selection pressures during domestication and husbandry is an interesting and understudied topic worthy of further investigation.
In conclusion, through an analysis of worldwide ancient and modern goat genomes, we demonstrated that Chinese goats are mainly descended from Chalcolithic eastern Fertile Crescent goats with little genetic turnover from their first arrival in China to the present day. Modern goats in southern China in particular have retained a relatively archaic genetic profile, whereas northern Chinese goats show shifts in both nuclear and mitochondrial DNA over time. The two most divergent genes between NC and SC are both related to hair follicle development, and the evolutionary trajectories of these genes were uncovered through ancient DNA. Our study reveals the genetic origin and genetic differentiation of Chinese goats and contributes to a new understanding of the eastward dispersal of domesticated goats.

Materials and Methods
Ancient Samples Sequencing DNA was extracted from 27 ancient bone and teeth samples in a dedicated ancient DNA laboratory at Jilin University using a modified silica-spin column method. A genomic DNA library was prepared from 55.5 ml of ancient DNA using NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs Inc.). Then, all bar-coded libraries were sequenced on an Illumina HiSeq X Ten platform (paired-end 150 bp). Besides, we downloaded published genomic data of worldwide modern and ancient goats from the NCBI deposit (supplementary tables S3 and S4, Supplementary Material online).

Processing and Alignment of Sequencing Reads
After a series of quality controls (supplementary note 3, Supplementary Material online), the cleaned reads were aligned against the most recent goat reference genome (ARS1, GCF_001704415.1) (Bickhart et al. 2017) using BWAbacktrack (BWA aln) algorithm (Li and Durbin 2010). The DNA damage pattern was characterized using mapDamage (J onsson et al. 2013). The SNP calling was performed using GATK (McKenna et al. 2010), and the genotype likelihoods were estimated using ANGSD (Korneliussen et al. 2014). The transition sites were excluded in ancient samples to minimize the false-positive results.

Population Genetics Analysis
For phylogenetic analysis, we use modern and ancient goats with at least 3Â sequence depth. MEGA X (Kumar et al. 2018) was employed to construct the Neighbor-Join tree with 100 bootstraps (supplementary note 7, Supplementary Material online). Then, iTOL (Letunic and Bork 2016) was used to visualize the topological structure. We employed smartpca implemented in the EIGENSOFT package to perform PCA analysis (Patterson et al. 2006), using the "lsqproject" and "autoshrink" options (supplementary note 6, Supplementary Material online). The outgroup f3 statistics were calculated using AdmixTools (Patterson et al. 2012) and with Qazvin Bezoar (C. aegagrus) used as outgroup (supplementary note 9, Supplementary Material online). The D statistics were calculated via ANGSD (Soraggi et al. 2018) with argali (Ovis ammon) as the outgroup (supplementary note 9, Supplementary Material online).

Demographic History Analyses
We inferred the demographic history for Chinese goats using @a@i (Gutenkunst et al. 2009). The site frequency spectra used in @a@i was computed from the total of 454 Mb sequence of the 11 NC and 11 SC (supplementary note 12, Supplementary Material online). A simple model was fit first, then the complexity of the model was increased gradually (supplementary fig. S11, Supplementary Material online). The best model was selected according to the likelihoods and Akaike's information criterion, and nonparametric bootstrapping (100 times) was performed to determine the confidence interval of each parameter. Furthermore, we compared four hypothesized models (supplementary fig. S16, Supplementary Material online) with three groups (EUR, NC and SC) using ABC approach implemented in DIYABC (Cornuet et al. 2014). We performed four million simulations for each scenario using 6,497 SNPs which thinned using a distance filter of interval >50-kb and a rare SNP filter of MAF >0.05. All one-sample and two-sample summary statistics were used.

Genome-Wide Selection Analysis
To detect regions under selection, several statistics including F ST , h p , and Tajima's D were calculated with a 50-kb sliding window and 10-kb step size via vcftools (Danecek et al. 2011). The outlier windows with high F ST value were first retained (supplementary note 10, Supplementary Material online). Then, according to the h p value in each group, we classified the windows as being under selection in either NC or SC (supplementary fig. S17, Supplementary Material online). The X chromosome in the goat reference genome (ARS1) was separated into two scaffolds: NW_017189516.1 and NW_017189517.1. To avoid ambiguity, we concatenated these two scaffolds into a continuous one (NW_017189516.1 is before NW_017189517.1) in the Manhattan plot ( fig. 3a), referred to as X.

Immunohistochemistry Staining
According to the manufacturer's instructions, Paraffinembedded tissue was deparaffinized, and then antigen retrieval was carried out by treatment with EDTA antigen repair buffer (pH ¼ 9.0). Tissues were immersed in 3% hydrogen peroxide for 25 min to eliminate the endogenous peroxidase activity and blocked with blocking reagent in 3% BSA for a further 30 min. Then, samples were incubated at 4 C overnight in a humidified chamber with the following primary antibody: FGF5 (rabbit, 1:300; Abcam, UK). After being washed three times with PBS for 5 min each time, these samples were treated with a secondary antibody (HRP marker) for 50 min at room temperature. Later, samples were rinsed several times with PBS, treated with DBA, counterstained with hematoxylin, and imaged with light microscopy (supplementary fig. S18, Supplementary Material online).

Dual-Luciferase Reporter Analysis
The CNV sequence downstream FGF5 gene of goats was amplified through PCR. Then the sequence was inserted into the pGL3 luciferase reporter vector (Promega) and confirmed by sanger sequencing. The transfection was performed using TurboFect (R0531, Thermo Scientific, Waltham, USA). After 24 h of transfection, the activity of luciferase was measured with the Dual-Luciferase Reporter Assay System (Promega). Both goat and sheep fibroblast cells were used in this assay. Each experiment was independently performed at least three times ( fig. 3e and supplementary fig. S19, Supplementary Material online).