Chromosome-Level Genome Assembly of the Green Peafowl (Pavo muticus)

Abstract
The green peafowl (Pavo muticus) faces a high risk of extinction owing to long-term and widespread threats from poaching and habitat conversion. Here, we present a high-quality chromosome-level genome assembly of the green peafowl with high contiguity and accuracy, generated using PacBio long-read sequencing, DNBSEQ short-read sequencing, and Hi-C sequencing technologies. The final genome size was estimated to be 1.049 Gb, of which 1.042 Gb was assigned to 27 pseudochromosomes. The scaffold N50 length was 75.5 Mb, and the complete BUSCO score was 97.6%. We identified the W and Z chromosomes and validated them by resequencing 14 additional individuals. In total, 167.04 Mb of repetitive elements were identified in the genome, accounting for 15.92% of the total genome size. We predicted 14,935 protein-coding genes, of which 14,931 were functionally annotated. This is the most comprehensive and complete de novo assembly of the Pavo genus, and it will serve as a valuable resource for future studies on green peafowl ecology, evolution, and conservation.


Introduction
The green peafowl is one of the most striking pheasants. Its long, eye-catching feathers, especially the tail feathers, are widely regarded as remarkable ornaments (McGowan and Kirwan 2019). The species is distributed in East and Southeast Asia (McGowan et al. 1998) but has undergone a sharp population decline over the past three decades, largely because of long-term and widespread threats from human activities such as poaching and habitat conversion (McGowan et al. 1998; Kong et al. 2018). The green peafowl has now disappeared from most of its historical range and persists only in scattered areas as small, isolated populations (McGowan and Kirwan 2019). This fragmentation reduces gene flow and leads to a progressive loss of genetic diversity, which could substantially impair the species' long-term survival. Owing to its high risk of extinction, the green peafowl is classified as "Endangered" on the International Union for Conservation of Nature (IUCN) Red List (Kong et al. 2018; Wu et al. 2019) and urgently requires systematic conservation efforts.
Genomic analysis is essential for developing strategies to protect and conserve endangered animals. Such analyses provide key information on local populations and metapopulations, including genetic diversity, gene flow, phylogenetic relationships, genetic load, the effects of inbreeding and outbreeding on individuals or populations, and adaptive evolution. A high-quality chromosome-level reference genome greatly improves these analyses, especially the precise estimation of inbreeding effects from runs of homozygosity (ROH) and of genetic load. A de novo draft genome of the green peafowl was recently reported (Dong et al. 2021). However, it was assembled from second-generation (short-read) sequencing data only, which inevitably yields a highly fragmented and error-prone assembly (Mittal et al. 2019). Such quality flaws often bias the estimation of genetic parameters and genome characteristics.
We therefore assembled the first chromosome-level genome of the green peafowl using state-of-the-art sequencing technologies: Pacific Biosciences (PacBio) long reads, DNBSEQ short reads, and Hi-C data. Compared with the previously published genome, our assembly shows clear improvements in quality, contiguity, and accuracy. This substantially improved assembly will provide a valuable resource for future studies on the ecology, evolution, and conservation of this species.

Genome Assembly
The genome size of the green peafowl was estimated to be 1.05 Gb by analyzing the frequency of 17-mers using ~139.52 Gb of DNBSEQ shotgun reads (table 1 and

Synteny Analysis and Sex Chromosome Identification
We performed synteny analysis between the green peafowl genome and the chicken (G. gallus) genome (fig. 1a). The two genomes showed high collinearity with clear one-to-one blocks, supporting the accuracy of our assembly at the chromosome level. We also detected fission and fusion events in this comparison. Chr2 of the green peafowl genome was identified as a fusion of Chr2 and Chr4 of the chicken genome, and fusion events were also found in Chr3, Chr4, and Chr6. Conversely, Chr1 of the chicken genome was split into Chr26 and Chr27 in the green peafowl genome, and fission events were also found in Chr2, Chr3, and Chr4 of the chicken genome.
We initially identified Chr29 and Chr30 as the Z and W chromosomes of the green peafowl based on their high similarity to the Z and W chromosomes of the chicken genome. To validate this inference, we resequenced 14 individuals (8 females and 6 males) and mapped their whole-genome sequencing reads to our assembly. As expected, the sequencing depth of Chr29 and Chr30 in females was significantly lower than that of the autosomes (fig. 1c and d), consistent with birds' ZW sex-determination system, in which females carry a single copy each of Z and W. We therefore concluded that Chr29 and Chr30 are the Z and W chromosomes of the green peafowl genome.
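The depth-based validation above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes mean per-chromosome depths have already been computed for each resequenced individual, and the helper names, the tolerance `tol`, and the toy depths are all hypothetical. In a ZW system, a Z chromosome is expected at about 0.5 of autosomal depth in females and about 1.0 in males, while a W chromosome is expected at about 0.5 in females and near 0 in males.

```python
from statistics import mean


def normalized_depth(depths, autosomes, chrom):
    """Depth of `chrom` divided by the mean autosomal depth of one individual."""
    auto_mean = mean(depths[c] for c in autosomes)
    return depths[chrom] / auto_mean


def classify_sex_chromosome(female_ratios, male_ratios, tol=0.25):
    """Hypothetical rule for a ZW system, using depth ratios relative to autosomes.

    Z: ~0.5 in females (one copy), ~1.0 in males (ZZ)
    W: ~0.5 in females (one copy), ~0 in males (absent)
    """
    f = mean(female_ratios)
    m = mean(male_ratios)
    if abs(f - 0.5) <= tol and abs(m - 1.0) <= tol:
        return "Z"
    if abs(f - 0.5) <= tol and m <= tol:
        return "W"
    if abs(f - 1.0) <= tol and abs(m - 1.0) <= tol:
        return "autosome"
    return "unclassified"


# Toy depths for one female and one male (made-up numbers, not data from this study)
autosomes = ["auto1", "auto2"]
female = {"auto1": 30.0, "auto2": 30.0, "cand_a": 15.0, "cand_b": 14.0}
male = {"auto1": 28.0, "auto2": 28.0, "cand_a": 27.0, "cand_b": 0.5}
for chrom in ["auto1", "cand_a", "cand_b"]:
    f = normalized_depth(female, autosomes, chrom)
    m = normalized_depth(male, autosomes, chrom)
    print(chrom, classify_sex_chromosome([f], [m]))
# expected: auto1 autosome, cand_a Z, cand_b W
```

With real data, each list would hold one ratio per resequenced female or male, and a statistical test rather than a fixed tolerance would typically be used to call a difference significant.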

Discussion
Here, we report the first chromosome-level genome of the green peafowl, with ten scaffolds totaling 790.8 Mb anchored to eight macrochromosomes and two sex chromosomes (chromosome Z and chromosome W). A karyotypic study of the blue peafowl (Pavo cristatus), the closest relative of the green peafowl, found eight pairs of macrochromosomes and one pair of sex chromosomes (De Boer and Van Bocxstaele 1981). The correspondence between the karyotypic and genomic results indicates the high accuracy of our assembly at the chromosome level. The GC content of the new assembly was 42.1%, very similar to that of the chicken (42.3%, GRCg6a) and blue peafowl (42.3%, AIIM_Pcri_1.0) genomes. In addition, 98.86% of DNBSEQ shotgun reads and 98.80% of Hi-C reads mapped to the previously published genome (Dong et al. 2021) (GPF.v1 hereafter), lower rates than those obtained for our assembly. Moreover, the contig N50 and scaffold N50 of our assembly were 279-fold and 37-fold longer, respectively, than those of GPF.v1. For our annotated gene set, the BUSCO score was 17.3% higher than that of GPF.v1 (supplementary table S5, Supplementary Material online). Comparing the gene sets of the two assemblies, we found that the numbers of genes were very similar, but many more genes in our assembly were supported by homologous genes in the chicken genome, indicating the higher accuracy of our assembly (fig. 1b). Taken together, our green peafowl genome is not only the most contiguous, complete, and accurate de novo assembly of this species, but also the most contiguous de novo assembly of the Pavo genus to date. With its much-improved genome annotation, our assembly will provide a valuable resource for future research on the ecology, evolution, and conservation of the green peafowl.
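N50, the statistic behind the contig and scaffold comparisons above, can be computed from a list of sequence lengths as shown in this minimal sketch of the standard definition. The helper names and toy lengths are hypothetical; real values would come from the contig or scaffold FASTA. A fold improvement such as those reported above is simply the ratio of the two assemblies' N50 values.

```python
def n50(lengths):
    """Return N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # "at least half" without float division
            return length


def fold_improvement(new_lengths, old_lengths):
    """Ratio of N50 values, e.g. a new assembly versus a previous one."""
    return n50(new_lengths) / n50(old_lengths)


# Toy example (hypothetical lengths in bp, not data from this study)
new_assembly = [600, 300, 100]
old_assembly = [30, 25, 20, 15, 10]
print(n50(new_assembly))  # 600
print(n50(old_assembly))  # 25
print(fold_improvement(new_assembly, old_assembly))  # 24.0
```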

Samples and Ethics Statement
One female green peafowl from the Xinxing breeding base, Liaoning Province, China, was selected for genome assembly. A fresh blood sample (1.5 ml) was collected, immediately frozen in liquid nitrogen for 2 h, and then transferred to

Genome Assembly and Assessment
We estimated the size and heterozygosity of the P. muticus genome with a k-mer frequency-based method (Lander and Waterman 1988). The de novo assembly was built with PacBio long reads, DNBSEQ short reads, and Hi-C sequencing data. Initial contigs were assembled from PacBio long reads with the Canu (v2.0) pipeline (Koren et al. 2017). NextPolish (v1.4.0) (Hu et al. 2020) was then used to polish the initial assembly with DNBSEQ short reads, and redundant sequences were removed with purge_dups (v1.2.5) (Guan et al. 2020). Hi-C clean reads were mapped to the assembly with the Burrows-Wheeler Aligner (BWA, v0.7.17) (Li and Durbin 2010) using default parameters, and Hi-C data quality control was performed with Juicer (v1.5.7) (Durand et al. 2016). Finally, the 3d-DNA pipeline (v180922) (Durand et al. 2016) was used to anchor contigs into chromosome-level scaffolds. To assess the completeness of the assembly, we first ran BUSCO (Simão et al. 2015) with the vertebrata_odb9 database. We then mapped the DNBSEQ short reads and Hi-C reads to our assembly with BWA-MEM using default parameters to calculate mapping rates.
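The k-mer method described above (17-mers in this study) rests on the relation G ≈ N_k / D_peak, where N_k is the total number of k-mers counted from the reads and D_peak is the depth of the main peak in the k-mer frequency histogram. The sketch below illustrates only that relation. It assumes a histogram in the two-column "depth count" form written by k-mer counters such as `jellyfish histo`, drops depths below a hypothetical `min_depth` cutoff as likely sequencing errors, assumes the tallest remaining peak is the homozygous one, and ignores the heterozygosity modeling that dedicated profiling tools perform.

```python
def read_histogram(lines):
    """Parse 'depth count' lines (e.g. output of `jellyfish histo`) into a dict."""
    hist = {}
    for line in lines:
        if line.strip():
            depth, count = line.split()
            hist[int(depth)] = int(count)
    return hist


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (main peak depth).

    histogram: dict mapping k-mer depth -> number of distinct k-mers at that depth.
    min_depth is a hypothetical error cutoff, not a value taken from the paper.
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Assume the depth with the most distinct k-mers is the homozygous peak.
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (made-up counts, not data from this study)
toy = read_histogram(["1 1000", "2 100", "10 50", "20 400", "30 60"])
size, peak = estimate_genome_size(toy)
print(peak, size)  # 20 515.0
```

Here the low-depth k-mers at depths 1 and 2 are discarded, the peak sits at depth 20, and the remaining 10,300 k-mers divided by 20 give an estimated size of 515 (in the same units as the k-mer count).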

Genome Annotation
We used ab initio and homology-based approaches to identify repetitive regions in the genome assembly. RepeatModeler2 (v2.0.1) (Flynn et al. 2020) was used for ab initio prediction of repeats with default parameters, and the repeats it generated were merged with RepBase as known elements. RepeatMasker (v4.0.5) (Tarailo-Graovac and Chen 2009) was then run with BLASTN searches against the RepBase library (Jurka et al. 2005) to identify and classify transposable elements. We also applied Tandem Repeats Finder (TRF v4.09) (Benson 1999) to identify and locate tandem repeats. Repeats were masked before gene annotation.
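Because RepeatMasker and TRF can report overlapping hits, summing their lengths directly would double-count bases when reporting total repeat content (167.04 Mb, or 15.92% of the genome, in this assembly). One way to avoid this, sketched below, is to merge overlapping intervals per sequence before summing. The sketch assumes hits have already been parsed into half-open (start, end) coordinates; the helper names and toy coordinates are hypothetical. The last line only checks the reported percentage against the figures given in the abstract.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def repeat_content(hits_by_seq, genome_size):
    """Return (masked bases, fraction of genome) after merging overlaps per sequence.

    hits_by_seq: dict sequence name -> list of (start, end) repeat hits pooled
    from all repeat-finding tools.
    """
    masked = 0
    for hits in hits_by_seq.values():
        masked += sum(end - start for start, end in merge_intervals(hits))
    return masked, masked / genome_size


# Toy coordinates (hypothetical): overlapping hits on seq1 are counted once.
toy_hits = {"seq1": [(0, 10), (5, 20), (30, 40)], "seq2": [(0, 5)]}
print(repeat_content(toy_hits, 100))  # (35, 0.35)

# Consistency check with the abstract: 167.04 Mb of repeats in a 1.049 Gb genome.
print(round(167.04 / 1049.0 * 100, 2))  # 15.92
```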

Synteny Analysis
The syntenic blocks between the green peafowl and chicken genomes were defined with MCscan (v0.8) (Tang et al. 2008) based on core orthologous gene sets identified using BLASTP (E-value ≤ 1e-5). At least five genes (i.e., more than four) were required to call a syntenic block.
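MCscan chains orthologous anchors with its own scoring procedure; the sketch below is a deliberately simplified greedy stand-in, not MCscan's algorithm, meant only to show what a "more than four collinear genes" threshold means in practice. It assumes ortholog pairs have already been filtered (for example at BLASTP E ≤ 1e-5) and that each gene is represented by its rank along its chromosome. The function name, `max_gap`, and the toy anchors are hypothetical.

```python
from collections import defaultdict


def call_syntenic_blocks(anchors, max_gap=5, min_genes=5):
    """Greedy collinear-block caller (simplified illustration, not MCscan).

    anchors: list of (chrom_a, idx_a, chrom_b, idx_b) ortholog pairs, where idx_*
    is the gene's rank along its chromosome. Within each chromosome pair, anchors
    are walked in order along chrom_a, and a block is extended while idx_b keeps
    moving in one direction with gaps <= max_gap in both genomes. Blocks with
    fewer than min_genes anchors are discarded.
    """
    by_pair = defaultdict(list)
    for chrom_a, idx_a, chrom_b, idx_b in anchors:
        by_pair[(chrom_a, chrom_b)].append((idx_a, idx_b))

    blocks = []
    for pair, hits in by_pair.items():
        hits.sort()
        current = [hits[0]]
        direction = 0  # +1 forward, -1 inverted, 0 not yet known
        for idx_a, idx_b in hits[1:]:
            prev_a, prev_b = current[-1]
            step_b = idx_b - prev_b
            step_dir = 1 if step_b > 0 else -1
            extends = (
                0 < idx_a - prev_a <= max_gap
                and 0 < abs(step_b) <= max_gap
                and (direction == 0 or step_dir == direction)
            )
            if extends:
                direction = step_dir
                current.append((idx_a, idx_b))
            else:
                if len(current) >= min_genes:
                    blocks.append((pair, current))
                current = [(idx_a, idx_b)]
                direction = 0
        if len(current) >= min_genes:
            blocks.append((pair, current))
    return blocks


# Toy anchors (hypothetical): a forward run of 6, an inverted run of 5, and a run of 3.
anchors = [("a1", i, "b1", 10 + i) for i in range(6)]
anchors += [("a2", i, "b2", 50 - i) for i in range(5)]
anchors += [("a3", i, "b3", i) for i in range(3)]
print([(pair, len(genes)) for pair, genes in call_syntenic_blocks(anchors)])
# [(('a1', 'b1'), 6), (('a2', 'b2'), 5)]
```

The three-gene run is rejected because it falls below the five-gene minimum, while the inverted run is kept as a block in reverse orientation.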

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.

Data Availability
The data supporting the findings of this study have been deposited in the CNGB Sequence Archive (CNSA, https://db.cngb.org/cnsa/) (Guo et al. 2020) of the China National GeneBank DataBase (CNGBdb) (Chen et al. 2020) under accession number CNP0002498.