Analysis of Population Structure and Differentially Selected Regions in Guangxi Native Breeds by Restriction Site Associated with DNA Sequencing

Guangxi indigenous chicken breeds play a very important role in promoting the high-quality development of the broiler industry in China. However, studies on genomic information of Guangxi indigenous chicken to date remain poorly explored. To decipher the population genetic structure and differentially selected regions (DSRs) in Guangxi indigenous chickens, we dug into numerous SNPs from seven Guangxi native chickens (GX) by employing the restriction site associated with DNA sequencing (RAD-seq) technology. Another three breeds, Cobb, White Leghorn, and Chahua (CH) chicken, were used as a control. After quality control, a total of 185,117 autosomal SNPs were kept for further analysis. The results showed a significant difference in population structure, and the control breeds were distinctly separate from the Guangxi native breeds, which was also strongly supported by the phylogenetic tree. Distribution of FST indicated that there were three SNPs with big genetic differentiation (FST value all reach to 0. 9427) in GX vs. CH group, which were located on chr1-96,859,720,chr4-86,139,601, and chr12-8,128,322, respectively. Besides, we identified 717 DSRs associated with 882 genes in GX vs. Cobb group, 769 DSRs with 476 genes in GX vs. Leghorn group, and 556 DSRs with 779 genes in GX vs. CH group. GO enrichment showed that there were two significant terms, namely GPI-linked ephrin receptor activity and BMP receptor binding, which were enriched in GX vs. Leghorn group. In conclusion, this study suggests that Guangxi native chickens have a great differentiation with Cobb and Leghorn. Our findings would be beneficial to fully evaluate the genomic information on Guangxi native chicken and facilitate the application of these resources in chicken breeding.

polymorphism (Dai and Long 2015), DNA sequencing and so on. Notably, most of these ways are usually time-consuming or expensive, especially for those experiments with a large sample size. In order to cost-effectively get sufficient SNPs, the routine approach is using a restriction site associated with DNA sequencing (RAD-seq) to analyze the genetic differences in Guangxi native breeds. RAD-seq is one of the reduced representation genomes sequencing strategy based on Next-generation sequencing (NGS). This approach can obtain thousands of SNPs inexpensively compared to the whole genome sequencing, although the number of SNPs is less than the latter. Currently, it has become an important method to generate genome-wide molecular data for population genetic studies (Hou et al. 2015). RAD markers were applied using microarrays initially and subsequently adapted to NGS (Shendure and Ji 2008). Now, RAD-seq has been widely used in the research of genetic variation. Miller et al. (Miller et al. 2007) used the RAD-seq approach to efficiently map the stickleback major lateral plate locus to linkage group (LG) IV. They indicated that the RAD techniques can be utilized for genetic analysis in the model and non-model organisms. Since then, many empirical studies suggested that RAD data could be applicable for the estimation of genetic relationships (Eaton and Ree 2013;Wang et al. 2013;Gonen et al. 2015) and evolutionary history (Catchen et al. 2013).
As a typical broiler, Cobb has the advantage of low feed conversion ratio (FCR), high growth rate, and less costly nutrition. White Leghorn is a great layer of white eggs, it has been much used to highly productive egg-laying hybrids. In addition, Chahua chicken is an indigenous breed in Yunnan Province, which has a close genetic relationship with Gallus gallus. Therefore, in this study, we performed the RAD-seq to analyze the population structure and differentially selected regions among seven Guangxi native breeds and three other chicken breeds, which are Cobb, White Leghorn, Chahua chicken, respectively, to investigate the genetic information of Guangxi native breeds.

Samples collection and DNA extraction
In the current study, ten chicken breeds were selected. Among these breeds, seven of them were selected as the major objective of this study. All these seven breeds were from Guangxi Zhuang Autonomous Region, which are Three-Yellow chicken (SH), Nandan-Yao chicken (NDY), Lingshan-Xiang chicken (LSX), Xiayan chicken (XY), Longsheng-Feng chicken (LSF), Donglan-Wu chicken (DLW) and Lingyun-Wu chicken (LYW), respectively. The information is presented in Table 1. Besides, one breed named Chahua chicken (CH, Yunnan Province, China) as well as two commercial chicken breeds (Cobb and White Leghorn) were chosen as reference populations. Each breed with many individuals was sampled, and blood was collected from wing sinus for DNA extraction. Genomic DNA was extracted using the phenol/chloroform method. Chickens were handled in accordance with the principles and procedures outlined by the Guangxi University's Animal Care and Use Committee.
Library for RAD-seq and sequence analysis We randomly picked out 10 samples from each breed for library construction. After all DNA samples were digested with the restriction enzyme EcoRI, the fragments with P1 and P2 adapters were enriched by PCR amplification. Subsequently, the sequencing procedure was operated in Tianjin Novogene Bioinformatics Technology Co., Ltd.
After sequencing by Illumina HiSeqXten PE150, the raw data were subjected to quality control (QC) procedure to remove reads under cutoffs. Subsequently, eligible reads were aligned to the reference genome (Gallus_gallus-5.0/galGal5) using BWA software (Li and Durbin 2009) with default parameters, and duplicate removal was performed using SAMtools software ). The information about SNP detection was stored in the VCF files. To include SNPs in further analysis, we only kept autosomal SNPs genotyped in at least 70% of the samples of each breed.

Population structure
The analysis of the population structure was performed on 10 breeds. The SNPs analyzed in STRUCTURE (Pritchard et al. 2000) were randomly sampled from each breed, all these SNPs were filtered by minor allele frequency (MAF) $ 0.2. A total of 1,000 loci were selected with 10 times for STRUCTURE analysis. The K value we set in this study was ranged from 111, and for all analyses, 10,000 burn-in steps and 10,000 replicates were used. The optimal K was determined on webpage STRUCTURE Harvester (http://taylor0.biology.ucla.edu/struct_harvest/) by Evanno method (Evanno et al. 2005), and we processed these data for final results by using CLUMPP (Jakobsson and Rosenberg 2007) and DISTRUCT software (Rosenberg 2004). For phylogenetic analysis, we sampled at least 15,000 SNPs from each breed to avoid the effect caused by unique loci. Finally, the evolutionary history was inferred using the Neighbor-Joining method (Saitou and Nei 1987) conducted in MEAG7 software (Kumar et al. 2016) with 10,000 replicates.
Calculated for F ST For F ST calculation, the seven Guangxi native breeds were set as one indigenous population (GX) to compare with three other breeds, which were Cobb, Leghorn, and Chahua chicken (CH), respectively. The F ST value of each SNP was calculated using the R packages ("hierfstast"), and the distribution of F ST of three compared groups was plotted by R packages ("CMplot"). Subsequently, we chose the SNPs of which values of F ST were distributed within the top 1% and the top 5% as n■ the top significant SNPs. These SNPs were selected as elements in the discovery of DSRs.

Differentially Selected Regions
To locate the potentially functional genes that may directly influence the population differentiation, Differentially Selected Regions (DSRs) based on F ST -SNPs were used to search for the most likely candidate genes in three compared groups. Two algorithms were used to define a valued DSR: 1) there should be a significant SNP (top 1% in F ST values) as a center, the boundary of region would be determined by the adjacent SNPs, until there were no more than 2 sequential SNPs (top 5%); 2) there were more than 5 sequential SNPs (top 5%) without high significant SNPs (top 1%) also can be considered as a DSR (Li et al. 2017).

GO enrichment
The SNPs derived from DSRs were annotated by ANNOVAR (Wang et al. 2010). The gene functional enrichment analysis was performed using Panther bioinformatics resources (www.pantherdb.org). We selected Fisher's Exact as test type and Bonferroni correction for multiple comparisons. Finally, significant terms with the enrichment value of more than 1 and the P-value is less than 0.05 were reserved (Supplemental Material, Table S2-S4).

Data availability
Raw sequence data were deposited in the NCBI Sequence Read Archive under project accession number PRJNA589666. Table S1 contains information about DSRs in three compared groups. Table  S2-S4 contain information about the genes derived from DSRs and the GO enrichment in three compared groups. Figure S1

RESULTS
Sequencing data quality A total of 997.57 million reads were obtained from all individuals with an average depth of 8.41·. After quality control, several individuals are filtered because of the low level of sequencing data. As shown in Table 2, the range of Q30 and GC content were 81.48-95.31% and 39.45-55.01%, respectively. The average number of SNPs per breed were 565,326,560,692,517,315,581,273,591,750,634,112,651,443,528,045,624,448 and 625,699, respectively. With a series of filtered control, a total of 185,117 SNPs (autosomal) were left, all of these SNPs would be used in F ST analysis and differentially selected region (DSR) analysis.

Population structure
To infer the potential population structure, we performed the STRUCTURE analysis and phylogenetic analysis on all 10 breeds. For STRUCTURE analysis, we selected the STRUCTURE software (Pritchard et al. 2000;Falush et al. 2003) to calculate genetic structure, which is applicable to most studies of genetic variation. Normally, the loci analyzed in STRUCTURE should not be in tight linkage (Pritchard et al. 2000), but multiple SNPs derived from a single RAD site are assumed to be in perfect linkage (Catchen et al. 2013). To avoid this situation, we analyzed 1,000 randomly chosen loci for STRUCTURE analysis. By using the deltaK approach proposed by Evanno et al. (Evanno et al. 2005), we found that the optimal K was K = 9 (Figure S1-S2). As shown in Figure 1, although Nandan-Yao chicken can not exhibit as an independent cluster while K = 9, Guangxi native chicken breeds distinctly separate from others. Moreover, it is distinct that each breed can represent as a cluster while K = 10. For phylogenetic analysis, we performed the Neighbor-Joining method (Saitou and Nei 1987) on 10 breeds, the expected relationships in the current experiment are given in Figure 2.

Calculation of F ST
For better understanding the genetic variation of local chickens in Guangxi Zhuang Autonomous Region, we integrated seven Guangxi chickens as one indigenous population to compare with the control breeds, which were Cobb, Leghorn, and Chahua chicken (CH), respectively. F ST is commonly used to define the variance of allele frequencies between populations (Holsinger and Weir 2009), therefore, we calculated all F ST values for every SNP locus. The global distribution of F ST in three compared groups are shown in Figure 3. For all F ST values, we observed three SNPs located on chromosome 1, chromosome 4 and chromosome 12, derived from the GX-CH group with the highest F ST values (0.9427).

Identification of DSRs and analysis of gene enrichment
To find the difference among Guangxi native breeds and the control breeds in the genome, we defined regions with shared differentially selection signals across breeds as differentially selected regions (DSRs) (Li et al. 2017). The results also demonstrated as three groups, due to the algorithm for finding DSRs was based on F ST values. A total of 717 DSRs were validated from the GX-Cobb group with 882 genes annotated, simultaneously, 769 DSRs from the GX-Leghorn group with 476 genes and 556 DSRs from the GX-CH group with 779 genes (Table S1). A Venn diagram displayed the unique genes in each group, as shown in Figure 4, we recognized 469, 216 and 408 special genes respectively.
n■ All associated genes with DSRs were used to GO analysis in each group. A large number of terms were simulated (Table S2-S4), therefore we just exhibited the most enriched 10 terms within every ontology in each group ( Figure 5, 6, and 7). Notably, as Figure 6 showed, there were two significant terms with fold enrichment over 20 in the GX-Leghorn group, namely BMP receptor binding and GPI-linked ephrin receptor activity. Moreover, some interesting GO terms were observed in three groups. Many terms associated with female gonad development were found in the GX-Cobb group, such as "development of primary sexual characteristics", "female sex differentiation" and "regulation of gonadotropin secretion", we also found these same terms in the GX-CH group. Meanwhile, several terms contained "embryo development ending in birth or egg hatching", "developmental process involved in reproduction", "regulation of hormone levels" and so on, were detected both in the GX-Leghorn group and the GX-CH group, all these terms were related to reproduction. The detail information was listed in Table S2-S4.

DISCUSSION
Cobb is a well-known and most commonly raised modern highyielding broilers with low feed conversion but a high growth rate, and White Leghorn is usually regarded as a high-productive layer. They are two classic commercial chicken breeds. In addition, there are abundant germplasm resources with varieties of chicken breeds in Guangxi Zhuang Autonomous Region. To investigate the genomic information and genetic variation in Guangxi native breeds, in the current study, we developed genetic analysis among Guangxi native breeds and three other breeds using reduced represent SNPs which were originated from RAD-seq. As the results showed above, we described the population structure by STRUCTURE and phylogenetic tree analyses, plotted the distribution of F ST by calculating F ST values and identified the DSRs between Guangxi native breeds and the control breeds. In addition, the genes from DSRs were all performed on GO enrichment for functional analysis. According to these results, we discussed separately as below: From STRUCTURE analysis, all breeds were separate and each breed displayed as a cluster while K = 10, suggesting obvious genetic differentiation in Guangxi native breeds. But as we concluded from the Evanno method (Evanno et al. 2005), the optimal genetic structure exhibited when K was 9, it suggested that the Nandan-Yao chicken was more similar to Longsheng-Feng breed. We inferred that there still existed many linked RAD-sites, which led us to underestimate the genetic diversity of Guangxi native breeds (Cariou et al. 2016). Furthermore, as shown in the phylogenetic tree, Cobb and Leghorn were clustered in one branch and had further genetic distance with Guangxi native breeds. All of these results could suggest that Guangxi native breeds have various genetic diversity on the genome.
Notably, all of the F ST values of SNPs are under the 0.9 in three compared groups, excepted three sites in which the value of F ST is the same (0.9427) in the GX-CH group. These three loci were located in chr1-96,859,720, chr4-86,139,601 and chr12-8,128,322, respectively. But only two sites could be found in significant DSRs (chr1-96,859,720 and chr12-8,128,322), and these two sites are closed to three genes (HSPA13, WNT5A, ARHGEF3), which were found in all three DSR datasets concurrently. WNT5A (Wnt family member 5A) has been reported in many studies, implying that it exerts pivotal effects in the differentiation of chicken embryonic stem cells (ESCs) into spermatogonial stem cells (SSCs) (He et al. 2018) and the differentiation of the glandular stomach (Listyorini and Yasugi 2006) (Liu et al. 2016) and platelet function (Zou et al. 2017), as for HSPA13, few studies in NCBI database. These three genes might reveal that Guangxi native breeds have promising values in disease research.
In addition, we acquired some information about functional genes from GO enrichment. 1) Many terms (GO:0016342, GO:0019987, GO:0016339, etc.) involved with cadherin superfamily genes (CDH2, CDH6 and so on.) were observed from GX-Cobb group. Cadherin superfamily is made up of a large and multitudinous collection of molecules, related to various fundamental cellular processes such as cell recognition and cell-cell adhesion (Gul et al. 2017), furthermore, members in superfamily heavily impact the neural development (Jontes 2018). Meanwhile, we got two genes (INHBA, INHBB) that are relevant to the generation of activin and inhibin, the presence of them can regulate FSH. According to the trend between inhibin and FSH, the developmental period of chicken might be estimated (Vanmontfort et al. 1995). Considering the difference of growth rate among the Guangxi native breeds and Cobb, we thought that these two types of genes might have an important role in muscle growth and development. 2) The BMP4 (Bone morphogenetic protein-4) gene was strikingly observed in the GX-Leghorn group. This gene is a member of bone morphogenetic protein family (Katagiri and Watabe 2016), with great effects in embryonic development, organ formation (Wu et al. 2016)   and regulation of sex hormone levels (Liu et al. 2017), as well as transition of primordial-to-primary follicle (Knight and Glister 2006). All of these functions confirm the importance of BMP4 for egg production, suggesting that we can make use of this gene to improve the egg-laying performance of Guangxi native breeds. 3) We found many Eph family genes both in Leghorn and Chahua comparison groups. The family of Eph receptor tyrosine kinases and their ephrin ligands have important functions in axonal guidance, bone remodeling, immune system and cancer (Shiuan and Chen 2016), because of their binding relationship for either the glycosylphosphatidylinositol-linked ephrin-A ligands or the transmembrane-bound ephrin-B ligands, Ephs can be divided into two subclasses, EphAs and EphBs (Eph Nomenclature Committee 1997). In this study, EphAs appeared in two groups, but EphBs only enriched in the GX-CH group. This might speculate that Chahua or Guangxi native breeds have more robust neurodevelopment, because the EphB signaling has a distinct effect on axon guidance and morphogenesis (Allen-Sharpley and Cramer 2012). From the above, the analysis of genetic difference showed some surprising results, which may be since we treated all Guangxi native breeds as one population to compare with others, thus many traits those we more expected and their associated genes were not prominent. But combining with the results of population structure, this also partly reflected the rich genetic diversity of Guangxi native breeds.  With decreasing costs in NGS, various sequencing strategies can be used to study population genomics and genetic variation according to experimental requirements. Therefore, a valid even unique analytical method for each NGS project is necessary and vital (Holsinger and Weir 2009;Etter et al. 2011). Even though many researchers had focused on exploring software and quantifying the error rates for improving the reliability and accuracy of RAD data (Etter et al. 2011;Mastretta-Yanes et al. 2015;Rochette and Catchen 2017), it was still deficient compared with the whole-genome resequencing. Based on this condition, we may have more consideration of details of data processing, for deeply Figure 5 The most enriched GO terms in the GX-Cobb group: the results only showed the top 10 terms in each ontology, which are biological process, molecular function, cellular component, respectively.

Figure 6
The most enriched GO terms in the GX-Leghorn group: the results only showed the top 10 terms in each ontology, which are biological process, molecular function, cellular component, respectively.
excavating the genomic information in Guangxi native breeds based on RAD-seq.
The experiment has analyzed the genetic variation of Guangxi native chicken breeds by comparing Guangxi native chickens with three other breeds, and the results suggested that indigenous chickens have abundant genetic information that deserves us to further explore. In conclusion, this study provided a rich data resource and established a theoretical basis for further exploring the genetic mechanism of Guangxi native chickens, as well as accelerating modern chicken breeding.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (31660635) and the Science and Technology Major Project of Guangxi (GK AA17204027).