Genomic Analyses Unveil Helmeted Guinea Fowl (Numida meleagris) Domestication in West Africa

Abstract Domestication of the helmeted guinea fowl (HGF; Numida meleagris) in Africa remains elusive. Here we report a high-quality de novo genome assembly for domestic HGF generated by long- and short-reads sequencing together with optical and chromatin interaction mapping. Using this assembly as the reference, we performed population genomic analyses for newly sequenced whole-genomes for 129 birds from Africa, Asia, and Europe, including domestic animals (n = 89), wild progenitors (n = 34), and their closely related wild species (n = 6). Our results reveal domestication of HGF in West Africa around 1,300–5,500 years ago. Scanning for selective signals characterized the functional genes in behavior and locomotion changes involved in domestication of HGF. The pleiotropy and linkage in genes affecting plumage color and fertility were revealed in the recent breeding of Italian domestic HGF. In addition to presenting a missing piece to the jigsaw puzzle of domestication in poultry, our study provides valuable genetic resources for researchers and breeders to improve production in this species.


Introduction
The domestication of animals and plants in Africa provides valuable agricultural resources for human societies (Fuller and Hildebrand 2013). In contrast to many plants, only two animals were domesticated in Africa: donkey and HGF (Numida meleagris) (Gifford-Gonzalez and Hanotte 2011). After domestication from wild HGF, the domestic HGF spread widely across sub-Saharan Africa, and then underwent a global expansion due to human translocations and its ability to adapt to a wide range of habitats (Blench and MacDonald 2000;Roberts 2002). Nowadays, domestic HGFs together with gooses account for 1.99% of the world's poultry population (chickens 91.18%, ducks 5.60%, and turkeys 1.22%; FAOSTAT, last accessed December 22, 2020) and are widely valued as source of meat, eggs, and feathers. Production is increasing rapidly. In addition to its roles in natural control of the deer ticks, which are vectors of the Lyme disease (Duffy et al. 1992), domestic HGF also serves as a physiological animal model in studying the neuromuscular, mechanical, and energetic strategies for locomotion.Despite the economic and scientific importance of domesticated HGF, its domestication and evolution remain poorly understood. Two competitive hypotheses exist for the single-origin mode. Darwin proposed that domestic HGF evolved from wild guinea fowl in East Africa (Darwin 1883). In contrast, archaeological, linguistic, and ethnographic evidence has pointed to domestication in West Africa (Blench and MacDonald 2000). Recent artistic and osteological evidence suggest dual origins less than 2,000 years ago (YBP) in both West (Mali) and East (Sudan) Africa (Andah et al. 2014). However, this dating is younger than hieroglyph records in Egypt, which date back to around 2400 BC, and even younger than introduction of this bird into Europe around 500 BC by ancient Greeks (Roberts 2002), although there was no indication that those birds were domesticated or wild. Still, the scanty records from archaeology and history, together with the osteological similarities between HGF and the francolin species or even domestic chicken (MacDonald 1992), make hypotheses about the domestication of HGF await testing.
Genetic approaches have been applied to trace the history of domestication for poultries during the past decade (Dalloul et al. 2010;Shapiro et al. 2013;Lu et al. 2015;Zhou et al. 2018;Wang et al. 2020). Previous genetic diversity studies based on mitochondrial DNA (Adeola et al. 2015;Murunga et al. 2018) and microsatellite markers (Kayang et al. 2010;Botchway et al. 2013) revealed an absence of genetic structuring in populations of African domestic HGF, implying a recent domestication accompanied with rapid subsequent dispersal in Africa. Most recently, Vignal et al. assembled the reference genome of HGF (NumMel1.0; accession in NCBI: GCA_002078875.2) and then conducted population genomic analyses with whole-genome sequencing (WGS) for pools of individuals from wild and domestic populations from Europe and Africa to investigate domestication of HGF (Vignal et al. 2019). However, the lack of genomic data of samples from East Africa-one candidate domestication center (Darwin 1883;Andah et al. 2014) hampers hypotheses testing.

Significance
The helmeted guinea fowl (HGF, Numida meleagris) is the only bird domesticated in sub-Saharan Africa. Its domestication and evolution remains elusive since Charles Darwin. In this study, we provided valuable genomic resource for this bird and revealed the domestication of HGF in West Africa. We also identified selective signals involved in early domestication and recent breeding process. The future integration of genomic evidence from animals, plants, and human populations has potential to provide insights into the dispersal of agriculture in Africa.
Herein, we employ the combined strategy based on highdepth PacBio long-read sequencing, BioNano optical mapping, and high-throughput/resolution chromosome conformation capture (Hi-C) scaffolding to obtain a de novo assembly for HGF. This improved reference provides the backbone for WGS for a total of 129 guinea fowl samples from Africa, Asia, and Europe. These data serve to test the competing hypotheses on the domestication of HGF and further explore genetic diversity and population history. Our results fail to reject the hypothesis of HGF domestication in West Africa.

An Improved Helmeted Guineafowl Genome Assembly
We sequenced multiple samplings of DNA and RNA (supplementary table S1, Supplementary Material online). Using the strategy based on PacBio long-reads sequencing, BioNano optical mapping, and Hi-C scaffolding, we assembled 1,127 scaffolds with total length of 1049.9 Mb to generate assembly HGFv1 (supplementary fig. S1, Supplementary Material online), in which the first 16 scaffolds accounted for 90% (949.6 Mb) of the total assembly, approaching nearchromosome level ( fig. 1A). We further computed synteny between the HGFv1 assembly and the chicken reference GRCg6a to show HGFv1 scaffold 5 corresponding to GRCg6a chromosome Z; scaffold 4 to chromosomes 4q and 9; scaffold 6 to chromosomes 6 and 7 ( fig. 1B). The results were in agreement with the karyotypes for HGF and chicken (Shibusawa et al. 2002).
Comparing the HGFv1 assembly with the reference NumMel1.0 (Vignal et al. 2019), the contig N50 increases $ 62 times (14.17 vs. 0.23 Mb) whereas the gaps reduce $15 Mb (5.38 vs. 20.12 Mb) (table 1). BUSCO assessments for HGFv1 showed that 94.5% of the 4,915 expected avian genes (aves_odb9) were identified as complete. The LAI of 10.29 suggested that the assembly continuity of HGFv1 reaches the reference criteria (Ou et al. 2018). In addition, 2.05 Gb of RNA-seq data obtained from pancreas, hypothalamus, bone marrow, and bursa (Darris et al. 2015) were mapped onto the assemblies of HGFv1 and NumMel1.0, respectively. Overall, 92.28% and 88.48% of the RNA-seq reads could be mapped to the two assemblies, respectively. Around 13.64% of the HGFv1 assembly were characterized as repeats (details in Materials and Methods). Analyses identified 15,173 protein-coding genes, in which 14,373 were functionally annotated. We classified noncoding RNA into rRNA ($45 kb), snRNA ($32 kb), tRNA ($22 kb), and miRNA ($15 kb).

Genome Variation
A total of 129 genomes from 89 domestic HGF, 34 wild HGF, 5 vulturine guinea fowl (Acryllium vulturinum), and 1 crested FIG. 1.-Genome architecture of the HGFv1 assembly. (A) Circos plot of HGFv1 assembly, repeat content, gene density, and GC content (%). The 1-16 scaffolds, representing 90% of the total length of 1,127 scaffolds, are shown as the outermost tracks. The repeat content is indicated by the gray line. Regions within the gene density of more than ten genes are shown as red spikes, whereas those with five to ten genes are indicated by orange spikes. Green spikes represent regions with fewer than five genes. Regions of the genome with GC content higher than average (37.54%) are shown in light blue. All data were plotted in 100-kb windows. (B) Dot plots showing synteny between the HGFv1 assembly and chicken reference genome GRCg6a (GCA_000002315.5). The sky-blue dots represent the accordant alignments, whereas the firebrick dots are reverse adjustments. guinea fowl (Guttera pucherani) samples were resequenced ( fig. 2A and supplementary table S2, Supplementary Material online). All Illumina reads were mapped to the HGFv1 assembly to an average depth of 18Â (ranging from 10.4 to 42.2). Joint variant calling produced 44,035,924 SNPs and 4,214,076 InDels (details in Materials and Methods). Among them, a total of 17,334,420 SNPs and 1,591,307 InDels existed in HGF populations. For convenience, subsequent population genomic analyses used the biallelic SNPs from the 30 autosomal scaffolds with more than 20,000 markers only, which accounted for more than 97.5% length of the assembly.

Phylogeny and Population Structure
When using vulturine guinea fowl and crested guinea fowl as outgroup taxa, phylogenetic trees constructed with FastTree and RAxML showed that the clade of wild HGF from East Africa (Sudan and Kenya) diverged first, followed by the split of the wild HGF from Nigeria and all the domestic HGF ( fig The phylogeny, PCA, and ADMIXTURE identified ten individuals as outliers. Three Nigerian (Y10, Y4, and Y27) and one Sudanese (WSU15) "wild" HGF individuals clustered with domestic samples, which potentially indicated feral domesticated HGF. Sudanese "domestic" individuals DSU02 and DSU08 clustered with the wild samples, likely reflecting wild-caught individuals. In addition, Iranian gfIR09 and gfIR10 and Hungarian HUNG12 and HUNG13 grouped with  the samples from Hungary and China, respectively. To reduce the effects of population structure and cryptic relatedness (Lawson and Falush 2012), we removed these 10 outliers and re-grouped the remaining 119 samples into 12 populations in subsequent population genomic analyses.

Population Genomic Diversity
We calculated genetic diversity indexes across ten HGF populations. Wild HGF populations had greater nucleotide diversity ( fig. 3A) and singleton statistic (fig. 3B) than domestic HGF populations. Further, wild HGF samples from Kenya and Sudan had higher diversity than the Nigerian population. Among the domestic HGF samples, the genetic diversity was highest in the Nigerian population and lowest in the Italian population. Domestic HGF, with the exception of the Chinese population, generally had higher mutational load (measured by GERP score > 2) than wild HGF ( fig. 3C). LD (expressed as R 2 ) decay rates were higher in wild HGF than in domestic HGF populations ( fig. 3D). Among the domestic HGF samples, LD decay rates were highest in the Nigerian population and lowest in the Italian population followed by the Sudanese population. Scanning genomes obtained data on runs of homozygosity (ROHs > 1 Mb) and provided insights into inbreeding in HGF populations. Wild HGF populations had a lower mean number and sum length of ROHs than domestic HGF populations ( fig. 3E). Among the domestic HGF samples, the Nigerian population had the lowest level of ROHs. The Italian and Sudanese populations presented the highest mean number and sum length of ROHs, respectively.

Detection of Genetic Admixture
We used f statistics to explore the divergence and gene flow between wild and domestic HGF. Outgroup f 3 statistics (Raghavan et al. 2014) were obtained for "outgroup-vulturine guinea fowl: wild HGF, target-domestic HGF." Compared with wild HGF from Kenya and Sudan, wild HGF from , showed that domestic HGF populations split from wild HGF populations at 5,452 YBP (95% bootstrap interval 9,916-2,548). The divergence between Nigerian and Sudanese domestic HGF was dated to 1,261 YBP (95% bootstrap interval 5,180-162). The intensity of gene flow from wild Sudanese HGF to wild Nigerian HGF (9.3%) was stronger than that from wild to domestic HGF populations in Sudan (5.4%). The time of gene flow from Sudanese wild HGF to Nigerian wild HGF dated to around 1,448 YBP, whereas that from wild to domestic HGF populations in Sudan occurred around 154 YBP.

Scan of Selective Signals
Because PCA ascertained HGF population structure ( fig. 2C), we adopted the outlier approach with PCAdapt (Luu et al. 2017) to identify genomic regions that were affected by positive selection based on allele frequency data of 4,695,945 SNPs. Using the threshold of top > 0.1% SNPs (P < 0.001), we identified 4,393 SNPs as potentially having selective signals, among which 1,941 were annotated into 453 genes (supplementary table S3  involved 347 and 101 potentially selected genes, respectively. We further applied PBS (target-all combined domestic HGF, control-wild HGF from Nigeria, background-wild HGF from Kenya and Sudan) to detect positive selection using the empirical quantiles of top 0.5% SNPs. A total of 78 genes were detected in both PCAdapt and PBS and they were significantly enriched in GO terms (Fisher's exact test P value < 0.05) related to nervous system (e.g., "extracellular ligand-gated ion channel activity," GO:0005230; "GABA receptor activity," GO:0016917) and muscle ("mechanosensitive ion channel activity," GO:0008381; "cytoskeletal protein binding," GO:0008092) (table 2). For instance, GRIA4 encodes an AMPA-sensitive glutamate receptor that functions as a ligand-gated ion channel and mediates synaptic transmission and neuroplasticity (Zhu et al. 2000). The genes of ACTN1 Because plumage color is a classical paradigm in poultry domestication and breeding (Domyan et al. 2014;Chen et al. 2015;Zhou et al. 2018 ), we screened for selective sweeps on the Italian domestic breed Camosciata showing cream white plumage with defined white spots (mutant type; fig. 5) (Ghigi 1936). The breed Selvatica with its gray-black (wild type) plumage, which showed a close relationship with Camosciata ( fig. 2A), was used for comparison. The PBS (target-Camosciata, control-Selvatica, background-wild HGF from Kenya and Sudan) and Pi-ratio statistic (ratio of background-wild HGF from Kenya and Sudan to target-Camosciata) were used in a sliding window approach ( fig.  5A and B). The top > 1% SNPs were selected as outliers. Overlapping in 141 windows identified 63 potentially selected genes (supplementary table S4, Supplementary Material online). GO categories "melatonin receptor activity (GO:0008502)," "steroid binding (GO:0005496)," "L-malate dehydrogenase activity (GO:0030060)," and "catalytic activity (GO:0003824)" were enriched (Fisher's exact test P value < 0.05) in the GO analyses (table 3) 5D). This locus was conserved across species and located closely to the highly conserved histidine residues which were essential in catalytic activity of tyrosinase via binding to copper ions ( fig. 5E). The mutation p.Trp218Gly was predicted to be probably damaging with a score of 0.998 (sensitivity: 0.27; specificity: 0.99) in the PolyPhen-2 prediction, suggesting it likely affected tyrosinase function and caused white plumage in the Camosciata breed.

Discussion
By integrating multiple genomic technologies, we provide updated HGF genome assembly HGFv1 to a nearchromosome level. Compared with reference genome NumMel1.0, HGFv1 is more improved in sequence contiguity (table 1). Incorporating PacBio full-length RNA sequencing data also improves the annotation, leading to the characterization of 14,573 protein-coding genes. The HGFv1 assembly is compatible with current poultry reference genomes (Dalloul et al. 2010;Shapiro et al. 2013;Lu et al. 2015;ZhOu et al. 2018;Morris et al. 2020). Thus, HGFv1 provides a robust reference not only for various HGF researches, but also for Galliformes poultry (e.g., chicken, quail, and turkey) studies.
The whole genomes of 129 guinea fowl samples facilitates hypothesis testing on the domestication of HGF. Wild HGF from Nigeria is more closely related to the domestic HGF than the wild HGF from Kenya and Sudan ( fig. 2 and supplementary fig. S3, Supplementary Material online). This result reject Darwin's hypothesis of an East African origin of domestic HGF (Darwin 1883). This pattern is also supported by previous study revealing that the wild HGF populations from Burkina Faso in West Africa rather than those from South Africa showed a closer relationship to domestic HGF populations (Vignal et al. 2019). Among the domestic HGF populations in our study, the domestic HGF from Nigeria has the highest level of genetic diversity and LD decay rate, but the lowest level of ROHs ( fig. 3) . 4B), which appears to have occurred after the split of wild and domestic HGF around 5,500 YBP but before the divergence of domestic HGF around 1,300 YBP. During this period, a roughly 12-fold population bottleneck occurred. The genetic diversity was decreased but the levels of LD and ROHs were increased ( fig. 3). The scenario overlaps with the domestication and diffusion inferred by genomic data of pearl millet (Cenchrus americanus) during 5,889-3,685 YBP (Burgarella et al. 2018) and African rice (Oryza glaberrima) during 3,200-2,000 YBP (Cubry et al. 2018) in West Africa. The domestication of HGF and some Western African crops likely mirror each other, which are ascribed to cultural responses to the transition from a "green Sahara" to the desert and subsequent climate changes (Kropelin et al. 2008).
The population genomic analyses provide novel insights into the genetic changes as well as their potential effects in ancient domestication and recent breeding of HGF. In general, the accumulation of deleterious mutations was increased  The PBS statistic was constructed using the breed Camosciata as target, the breed Selvatica as control and the grouped wild HGF background. (B) The Pi-Ratio was calculated from the ratio of nucleotide diversity for the grouped wild HGF background to that for the breed Camosciata. All dots represent the sliding window of 10 kb with 10 kb step size. The windows with values over top 1% quantile for both two statistics were overlapped to identify selective genes which were noted with cross labels. (C) The genes located in the LD block of scaffold 1:7,845,520-17,375,296. The selective genes were in red. (D) The Camosciata breed with cream white plumage (mutation) and the Selvatica breed with wild plumage type. The Camosciata was selected from a small breeding flock originated in France and brought to Italy in 1922 and differs from the solid white variety due to the pigmented skin of the neck and the visible white spotting of the plumage absent in the solid white birds. The photos were taken at Az. Agricola E. Oggioni in Italy and provided by Maria Giuseppina Strillacci and Erica Gorla. (E) The mutation p.Trp218Gly in exon 1 of TYR gene. The alignment of protein sequences was shown. This mutation was indicated by star label. The neighboring histidine residues which were essential for TYR function via binding to copper ions were noted with arrows. (fig. 3C). Domestic HGFs are generally less skittish and locomotive than their wild counterparts (Roberts 2002). Among the selected genes with differentiation between wild and domestic HGF (table 2 and supplementary table S3, Supplementary Material online), our analyses identify selective genes such as GRIA4 involved in nervous system, implying genetic basis for behavioral changes in domestication. GRIA4 (also known as GluR4 in mice) belongs to glutamate receptor genes which downregulate excitatory signaling and stress response in domesticated animals (O'Rourke and Boeckx 2020). Interestingly, GRIA4 was also identified as a susceptibility locus for refractive error and myopia (Verhoeven et al. 2013), implying additional roles in visual deterioration involved in domestication (Wang et al. 2016). Meanwhile, our analyses detect several selective genes (e.g., ACTN1 and PIEZO2) playing substantial roles in muscle function. This may explain the reduced locomotion ability (including flying) in domesticated poultry (Wang et al. 2017;Stover et al. 2018). The results suggest selective genes involved in behavioral and locomotive changes facilitating the management of domesticated HGF.
As compared with other domestic HGF populations, Italian population, consisting of two breeds (Camosciata and Selvatica), shows the lowest levels of genetic diversity but the highest levels of LD and ROHs ( fig. 3). The N e estimation was small ( fig. 4A and supplementary fig. S8, Supplementary Material online). All the patterns consistent with bottlenecking in extensive breeding practices, as known by the evidences on selection starting in the first half of 1900s (Ghigi 1936). By screening signals of selection, our results identify mutation p. Trp218Gly in TYR as the candidate causal locus for white plumage in Camosciata breed. Selected genes PGR (progesterone receptor) and MMP13 (Matrix Metallopeptidase 13), which function in poultry fertility (Shen et al. 2016;Yuan et al. 2016), link with TYR in the 9.5 Mb LD block containing other selected genes (fig. 5C). The long LD block in the genomes brought challenge to refining the casual loci of white plumage, but provided an opportunity to investigate the pleiotropy and linkage involved in the breeding (Wright et al. 2010). Our results arouse the possibility that the white plumageorientated selection leads to numerous changes in fecundity and other phenotypes in the breeding of Camosciata.
In summary, we de novo assemble the genome of HGF to obtain a reference-quality avian genome. Together with sequenced population genomes, the data resource has potential to facilitate innovations in genetic resource management and improvement for HGF. Our population genomic analyses provide in-depth insights into the genomic architecture and population history of wild and domestic HGF populations. Our findings in combination with recent genomic analyses of African rice (Cubry et al. 2018;Choi et al. 2019), African yam (Scarcelli et al. 2019), and pearl millet suggest West Africa as a major cradle of both animal and plant domestication.

Materials and Methods
Sampling A total of four adult male domestic (HGF) individuals from China were sampled for de novo genome assembly (supplementary table S1, Supplementary Material online). Independently, a total of 129 samples were collected for WGS (supplementary table S2, Supplementary Material online). The study was approved by the Internal Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences (SYDW20150605001). The sampling of wild species was approved by Kenyan Wildlife Service and Nigeria National Park Service under permit numbers KWS/BRM/5001 and NPH/ GEN/530/I/33, respectively. The samples from Sudan were taken from available collections (Weimann et al. 2016). A "no objection for the research" from the Directorate of Veterinary Services, Ministry of Agriculture, Livestock and Fisheries in Kenya under permit number RES/POL/VOL.XXVII/ 162 was obtained to use domestic Kenyan HGF samples. The Italian HGF populations were available according to the n. OPBA-56-216 document, allowing the use of collected samples for research purpose in available bio-banks. The domestic samples from Hungary, Iran, and Nigeria were collected based on the informed consent of the private HGF owners.

Genome Assembly and Annotation
We followed the combined strategy (Bickhart et al. 2017) based on PacBio long-read sequencing, BioNano optical mapping, and Hi-C scaffolding to obtain a de novo assembly of HGFv1. The genome synteny between HGFv1 and chicken reference genome GRCg6a (GCA_000002315.5) was checked by using MUMmer4 (Marcais et al. 2018). The assembly quality of HGFv1 was evaluated with BUSCO v3 (Waterhouse et al. 2018), LAI (LTR assembly index) (Ou et al. 2018), and RNA-seq reads mapping (Darris et al. 2015

Genetic Diversity, LD, and ROHs
We regrouped populations according to PCA and ADMIXTURE results. The nucleotide diversity (Nei and Li 1979) within each population was calculated using R package PopGenome (Pfeifer et al. 2014). We used R package SeqVarTools (Gogarten et al. 2019) to count singletons per individual. We applied a recently developed unbiased estimator for linkage disequilibrium (LD) (Ragsdale and Gravel 2020) that was not sensitive to small population size. We used the R package detectRUNS (Biscarini et al. 2018) to detect ROHs using the pruned data set to eliminate the impact of strong LD. The result was summarized with two measurements defined as the mean of total length of ROHs more than 1 Mb and the number of ROHs.

Genetic Load
We used the genomic evolutionary rate profiling (GERP) scores computed for the 58 sauropsids multiple wholegenome alignment (ftp://ftp.ensembl.org/pub/release-100/ compara/, last accessed May 21, 2020) as a measure of evolutionary constraint acting on the SNPs. Positive GERP scores, larger than 2, represented a substitution deficit, which were expected for sites under selective constraint. We used MUMmer4 to align the HGFv1 assembly to that of NumMel1.0 and extracted one-to-one alignment under the minimum identity of 90% and minimum length of 1,000 using the options "-i 90 -l 1000 -1 -q".

Detection of Gene Flow
Using the genome wide allele frequency data for each HGF population, we used qp3Pop and qpDstat as implemented in AdmixTools v5.1 (Patterson et al. 2012) to calculate outgroup f 3 statistic (Patterson et al. 2012) and D statistic (Durand et al. 2011). We adopted the TreeMix software v1.13 (Pickrell and Pritchard 2012) to build a ML tree setting the vulturine guinea fowl as outgroup. We used the options "-k 1000 -global" to make blocks of 1,000 SNPs. We ran 1,000 replicates for each tree, adding the option "-bootstrap." When there's migration event, we add "-se" option to calculate the standard errors of migration weights. We used qpGraph with the parameter "allsnps: NO" in AdmixTools to build an admixture graph. The vulturine guinea fowl was set to be outgroup.

Temporal Fluctuation of N e
We randomly selected eight different samples for each of HGF populations to avoid bias in sample size. Because the mutation rate for HGF was unavailable, we used the rate 1.91 Â 10 À9 per site per year (Jarvis et al. 2014) for the chicken lineage. We adopted SMCþþ (Terhorst et al. 2017) to infer Ne changes of HGF.
We also employed PopSizeABC ( Boitard et al. 2016) that was accurate even for recent history with the recombination rate of 1.7 cM/Mbp (Pigozzi 2016). The details were described in the supplementary data, Supplementary Material online.

Inference of Demographic Model
We used momi2 (Kamm et al. 2020) to explore demographic model based on four populations: wild HGF from Sudan, wild HGF from Nigeria, domestic HGF from Nigeria, and domestic HGF from Sudan. We split the extracted folded site frequency spectrum into 100 equally sized blocks for jackknifing and bootstrapping. We introduced two gene flow events originated from wild HGF from Sudan. We set constant population size for wild HGF populations while allowing changes for domestic HGF populations after their divergence from wild ancestor. We fitted 20 independent runs with different starting parameters and kept the model with the biggest log-likelihood value of the three runs. We referred to the Akaike information criterion (AIC) to select model with smallest AIC. We conducted 100 bootstrap calculations for the estimation of parameter range.

Scan of Selective Sweep
We used the R package PCAdapt (Luu et al. 2017) to detect selective signals under the context of PCA for wild and domestic HGF based on the allele frequency data. The SNPs with minor allele frequency 0.05 were filtered. We randomly sampled 100,000 SNPs to get a background distribution of statistics, we used the threshold of top 0.1% corresponded to a P value cut-off of 0.001. We considered loci over this threshold as outliers under potential selection. We classified the outlier loci according to the association with PC1 and PC2, respectively. The potential selective genes (genes under selection) were characterized according to the genome assembly annotation.
We also used population branch statistic (PBS) (Yi et al. 2010) in the form (target-all combined HGF; control-wild HGF from Nigeria; background-wild HGF from Sudan and Kenya) to detect selective sweep. We used the R package topGO (Alexa and Rahnenfuhrer 2019) with the algorithm set to be "parentchild" for Gene Ontology enrichment analysis.
In contrast to the SNP-based PBS calculation, we adopted PBS and Pi-Ratio statistics (Nei and Li 1979) in sliding window to detect the selective sweeps in Camosciata breed. The SNPs with low frequency (<0.10) were filtered. We used PopGenome (Pfeifer et al. 2014) to calculate the fixation index (F ST ) and nucleotide diversity within population (Pi). We set the window size of 10 kb and the sliding step of 10 kb. Using the data of randomly sampled 10,000 windows, we set the threshold from sample distribution with a P value cut-off of 0.01. The windows with values over top 1% quantile for both two statistics were overlapped to identify selective genes relative to such windows. For the block that contained dense significant genes located in scaffold 1:7,845,520-17,375,296, we checked the pairwise LD using the unbiased LD estimator (Ragsdale and Gravel 2020). The potential effect of nonsynonymous mutation of TYR gene was evaluated by PolyPhen-2 v2.2.2-release 398 (Adzhubei et al. 2010).

Statistical Thresholds for Outlier Approaches
To identify a threshold for identifying extreme outliers, we used an approach by randomly sampling from the data to get a background distribution. Specifically, for the SNPbased methods of PCAdapt and PBS, we randomly sampled 100,000 SNPs and used the score within top 0.1% as the threshold corresponding to P value cut-off of 0.001. For the SNP-based PBS results, we used a threshold with the top 0.5%. For the window-based PBS and Pi-Ratio analyses, we randomly sampled 10,000 windows of 10 kb to estimate the distribution of PBS and Pi-Ratio scores. We then set top 1% as the threshold corresponding to P value cut-off of 0.01 for PBS and Pi-Ratio.

Data Availability
Raw sequencing data that support the findings of this study have been deposited to the NCBI BioProject database under accession PRJNA639701 and PRJNA639777.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.