Abstract

A high-quality reference genome coupled with resequencing data is a promising strategy to address issues in conservation genomics. This has greatly enhanced the development of conservation plans for endangered species. Pangolins are fascinating animals with a variety of unique features. Unfortunately, they are the most trafficked wild animal in the world. In this study, we assembled a chromosome-scale genome with HiFi long reads and Hi-C short reads for the Chinese and Malayan pangolin and provided two new representative reference genomes for the pangolin species. We found a great improvement in the evaluation of genetic diversity and inbreeding based on these high-quality genomes and obtained different results for the detection of genome-wide extinction risks compared with genomes assembled using short reads. Moderate inbreeding and genetic diversity were reverified in these two pangolin species, except for one Malayan pangolin population with high inbreeding and low genetic diversity. Moreover, we identified a much higher inbreeding level (FROH = 0.54) in the Chinese pangolin individual from Taiwan Province compared with that from Mainland China, but more than 99.6% runs of homozygosity (ROH) fragments were restricted to less than 1 Mb, indicating that the high FROH in Taiwan Chinese pangolins may have accumulated from historical inbreeding events. Furthermore, our study is the first to detect relatively mild genetic purging in pangolin populations. These two high-quality reference genomes will provide valuable genetic resources for future studies and contribute to the protection and conservation of pangolins.

Introduction

High-quality reference genomes enable the comprehensive analysis of population genomics and contribute to the revolution in conservation genomics [1]. However, fewer than 1% of the threatened species listed on the International Union for Conservation of Nature (IUCN) Red List have a reference genome [2], and this number will further decrease if long-read assembly is considered. In the shift from conservation genetics to conservation genomics [3], high-quality reference genomes play an important role in providing the necessary support for the conservation of endangered species [1, 4, 5, 6]. In particular, genetic rescue is considered an important strategy to facilitate gene flow and avoid inbreeding to increase the health of a population [7, 8]. A deep understanding of the genome-wide extinction risks and the basic genetic background of a small population is necessary for developing evidence-based strategies for genetic rescue. These include population structure, genomic diversity, genome-wide inbreeding, mutational load, and population demography [9–13]. A high-quality reference genome usually facilitates population-level studies in conservation genomics [1]. For example, the evaluation of inbreeding by measuring runs of homozygosity (ROH) relies on a high-quality reference genome with outstanding contiguity because the long ROH fragments in small populations with high-level inbreeding often span several millions of base pairs [13–15] and can hardly be detected based on the fragmented genomes assembled by short reads.

The pangolin is a living fossil with many unique biological characteristics [16, 17], such as overlapping keratin scales covering the body, a specialized diet, a long and muscular tongue, a sensitive olfactory system, and burrowing ability [18, 19]. Locals across its distribution areas have traditionally used its scales and meat for medicine and food [20]. The overexploitation of pangolins driven by the soaring demand for luxury food and traditional Chinese medicine has pushed this animal to the edge of extinction [21–24]. Currently, the pangolin is the most heavily trafficked wild animal worldwide, with more than 900,000 individuals poached over the past two decades and 67 countries from 6 continents involved in illegal poaching and trade [25]. For protection, all pangolin species have been placed into Appendix I of the Convention of International Trade of Endangered Species of Wild Fauna and Flora (CITES) as of 2016. Poaching is more rampant for Asian pangolins compared with African pangolins, particularly the Malayan pangolin (Manis javanica, hereafter MJ; NCBI:txid9974) and the Chinese pangolin (Manis pentadactyla, hereafter MP; NCBI:txid143292), which are under extreme survival pressure because of significant poaching and trafficking [10]. These two species have been listed as “Critically Endangered” on the IUCN Red List since 2014.

Previous studies identified two main Malayan populations [10] (hereafter MjavA and MjavB) and three main Chinese pangolin populations (hereafter MpenA, MpenB, and MpenC). The MjavA population was identified from the mainland (China and Myanmar) and diverged from the MjavB population, which may have originated from the Southeast Asian islands. Among the three Chinese pangolin populations, the MpenA population is a newly discovered population distributed in Guangdong, China. The MpenB population is distributed over a vast area, including southern China and Thailand. The MpenC population highly diverged from the other two populations, which are likely to have originated from Myanmar (Fig. 1A, Supplementary Tables S1 and S2). Although the varying population of the Chinese pangolin is still controversial [26], the highly divergent pangolin populations indicate that deep isolation may have occurred, which is usually detrimental to the survival of endangered species.

Introduction to the species distribution and chromosome synteny of the Chinese and Malayan pangolins. (A) The distribution area and sampling sites of the Chinese and Malayan pangolins in this study. The circles represent sampling sites of the Chinese pangolins reported by Wang et al. [27]. Samples without detailed locations are not shown on the map. (B) The chromosome-scale synteny analysis between the Malayan pangolin and Chinese pangolin genomes.
Figure 1:

Introduction to the species distribution and chromosome synteny of the Chinese and Malayan pangolins. (A) The distribution area and sampling sites of the Chinese and Malayan pangolins in this study. The circles represent sampling sites of the Chinese pangolins reported by Wang et al. [27]. Samples without detailed locations are not shown on the map. (B) The chromosome-scale synteny analysis between the Malayan pangolin and Chinese pangolin genomes.

Pacific Biosciences (PacBio) high-fidelity (HiFi) sequencing technology combined with a HiFi-specific assembler and Hi-C or parental sequencing data can generate high-quality, haplotype-resolved de novo assemblies, which represents one of the most promising strategies for genome assembly [28, 26]. This technique may facilitate a more accurate analysis of genome-wide genetic risks, specifically with respect to inbreeding. Because of the urgent need for establishing genomic backgrounds to support conservation, several reference genomes of pangolins have been assembled, annotated, and published [10, 18, 29–32]. However, high-quality reference genomes assembled from HiFi long-reads do not exist for pangolins. In this study, we present the genomes of the Malayan and Chinese pangolins at the chromosome scale with haplotypes resolved down to the chromosomal level, which provides new representative reference genomes for the pangolin species. We systematically examined the genomic backgrounds and evaluated the genome-wide extinction risks for five pangolin populations based on these two reference genomes.

Results

New representative reference genomes for the pangolin

To obtain high-quality reference genomes for both the Malayan and Chinese pangolin, we combined PacBio HiFi long reads, Hi-C reads, and DNBSEQ short reads for genome assembly (Table 1). We first generated phased contigs for both species, and the haploid-resolved contigs were further linked at the chromosome level by combining the HiFi long reads and Hi-C reads. We assembled 19 and 20 chromosome-scale pseudomolecules for MJ and MP, respectively (Fig. 1B, Supplementary Fig. S1), which was consistent with the karyotypic analysis [33], although the karyotype in pangolins may vary [33]. The diploid genome sizes assembled for MJ and MP were ∼2.56 Gb and ∼2.64 Gb and represented 96.78% and 92.48% of the estimated genome size (∼2.65 Gb for MJ and ∼2.86 Gb for MP), respectively (Supplementary Fig. S2). The contig and scaffold NG50 of the MJ/MP were 46.22 Mb/56.16 Mb and 141.80 Mb/140.71 Mb, respectively (Table 1, Supplementary Table S3). In addition, we identified the X-chromosome and Y-linked regions of both genomes (Supplementary Fig. S3). The hifiasm assembler simultaneously yielded in two groups of haplotigs for each of the MJ (hereafter MJH1, MJH2) and MP (hereafter MPH1, MPH2) genomes (Supplementary Table S4). Multiple lines of evidence support the high completeness and low level of artificial duplication of the haplotype-resolved assemblies. Both the base-level quality evaluation and the structural-level assessment showed that all genomes (2 diploid assemblies and 4 haploid assemblies) had high assembly accuracy (see the Supplementary Material online). High collinearity between the MP and MJ genomes was also observed with 4 fissions and 3 fusions in the MJ genome compared with the MP genome (Fig. 1B), which was consistent with a previous karyotypic analysis [34] showing that these two genomes were accurately assembled at the chromosome level. Overall, we are confident in these two new, high-quality, and representative reference genomes for pangolins.

Table 1:

Summary statistics for the genome sequences

CategoryMetricChinese pangolin (MP)Malayan pangolin (MJ)
Sequencing dataWGS (Gb)/depth (X)254.05/96.18245.34/95.87
HiFi (Gb)/depth (X)65.81/25.2095.99/37.51
Hi-C (Gb)/depth (X)239.77/90.78227.48/88.89
RNA-seq (Gb)6.216.35
ContinuityGenome size (Gb)2.642.56
Scaffold NG50a (Mb)140.71141.80
Scaffold number8962
Longest scaffold (Mb)234.25241.98
Structural accuracyReliable blocksb96.56%98.09%
False duplicationsc0.40%0.48%
CurationManualManual
Base accuracyBase pair QV57.0655.08
k-mer completeness96.81%95.40%
Functional completenessBUSCO assessment98.00% complete97.50% complete
Transcript mappability97.77%97.26%
Chromosome statusAssignedd97.22%98.72%
Pseudo-chromosomes number2019
CategoryMetricChinese pangolin (MP)Malayan pangolin (MJ)
Sequencing dataWGS (Gb)/depth (X)254.05/96.18245.34/95.87
HiFi (Gb)/depth (X)65.81/25.2095.99/37.51
Hi-C (Gb)/depth (X)239.77/90.78227.48/88.89
RNA-seq (Gb)6.216.35
ContinuityGenome size (Gb)2.642.56
Scaffold NG50a (Mb)140.71141.80
Scaffold number8962
Longest scaffold (Mb)234.25241.98
Structural accuracyReliable blocksb96.56%98.09%
False duplicationsc0.40%0.48%
CurationManualManual
Base accuracyBase pair QV57.0655.08
k-mer completeness96.81%95.40%
Functional completenessBUSCO assessment98.00% complete97.50% complete
Transcript mappability97.77%97.26%
Chromosome statusAssignedd97.22%98.72%
Pseudo-chromosomes number2019
a

Scaffold NG50: This metric represents the minimum length of a scaffold such that when all scaffolds are ranked by size, the cumulative length of scaffolds exceeding this threshold accounts for at least half of the estimated genome size.

b

Reliable blocks: These refer to genomic regions that are robustly supported by both Hi-C and HiFi sequencing reads, ensuring a reliable assembly [35].

c

False duplications: These are additional copies of k-mers present in a genome assembly beyond the expected count, as indicated by the k-mer histogram derived from the original high-fidelity reads [35].

d

Assigned: This term denotes the percentage of the genome assembly that has been confidently assigned to specific chromosomes.

Table 1:

Summary statistics for the genome sequences

CategoryMetricChinese pangolin (MP)Malayan pangolin (MJ)
Sequencing dataWGS (Gb)/depth (X)254.05/96.18245.34/95.87
HiFi (Gb)/depth (X)65.81/25.2095.99/37.51
Hi-C (Gb)/depth (X)239.77/90.78227.48/88.89
RNA-seq (Gb)6.216.35
ContinuityGenome size (Gb)2.642.56
Scaffold NG50a (Mb)140.71141.80
Scaffold number8962
Longest scaffold (Mb)234.25241.98
Structural accuracyReliable blocksb96.56%98.09%
False duplicationsc0.40%0.48%
CurationManualManual
Base accuracyBase pair QV57.0655.08
k-mer completeness96.81%95.40%
Functional completenessBUSCO assessment98.00% complete97.50% complete
Transcript mappability97.77%97.26%
Chromosome statusAssignedd97.22%98.72%
Pseudo-chromosomes number2019
CategoryMetricChinese pangolin (MP)Malayan pangolin (MJ)
Sequencing dataWGS (Gb)/depth (X)254.05/96.18245.34/95.87
HiFi (Gb)/depth (X)65.81/25.2095.99/37.51
Hi-C (Gb)/depth (X)239.77/90.78227.48/88.89
RNA-seq (Gb)6.216.35
ContinuityGenome size (Gb)2.642.56
Scaffold NG50a (Mb)140.71141.80
Scaffold number8962
Longest scaffold (Mb)234.25241.98
Structural accuracyReliable blocksb96.56%98.09%
False duplicationsc0.40%0.48%
CurationManualManual
Base accuracyBase pair QV57.0655.08
k-mer completeness96.81%95.40%
Functional completenessBUSCO assessment98.00% complete97.50% complete
Transcript mappability97.77%97.26%
Chromosome statusAssignedd97.22%98.72%
Pseudo-chromosomes number2019
a

Scaffold NG50: This metric represents the minimum length of a scaffold such that when all scaffolds are ranked by size, the cumulative length of scaffolds exceeding this threshold accounts for at least half of the estimated genome size.

b

Reliable blocks: These refer to genomic regions that are robustly supported by both Hi-C and HiFi sequencing reads, ensuring a reliable assembly [35].

c

False duplications: These are additional copies of k-mers present in a genome assembly beyond the expected count, as indicated by the k-mer histogram derived from the original high-fidelity reads [35].

d

Assigned: This term denotes the percentage of the genome assembly that has been confidently assigned to specific chromosomes.

Genome annotation

The total length of the repeat elements reached 1,260.92 Mb and 1,332.41 Mb, which accounted for 49.25% and 50.44% of the MJ and MP genomes, respectively (Supplementary Table S5). The composition of repeats in the MJ and MP genomes was similar, with the most abundant repeat element being LINE (long interspersed nuclear element) (MJ: 35.41%, MP: 37.36%), followed by LTR (long terminal repeat) (MJ: 11.14%, MP: 13.61%), DNA element (MJ: 2.23%, MP: 2.11%), and SINE (short interspersed nuclear element) (MJ: 0.96%, MP: 0.91%). We predicted 19,680 and 19,886 gene models in the MJ and MP genomes, respectively (Supplementary Table S6). The gene regions spanned over 853.90 Mb and 791.49 Mb, which comprised 33.36% and 29.96% of the MJ and MP genomes, respectively (Supplementary Table S6). The average gene length, exon length, and intron length were 43.39 kb, 175.75 bp, and 4.92 kb for the MJ genome and 39.80 kb, 175.07 bp, and 4.64 kb for the MP genome (Supplementary Table S6). The BUSCO analysis revealed high completeness for the gene sets of both genomes, with the lowest BUSCO score greater than 95% (Supplementary Table S7). Overall, 19,575 (99.47%) and 19,792 (99.53%) genes were functionally annotated in the MJ and MP genomes, respectively (Supplementary Table S8). In addition, we predicted 288/435 ribosomal RNA, 1,296/1,348 microRNA (miRNA), 806/350 transfer RNA (tRNA), and 1,521/1,412 small nuclear RNA (snRNA) in the MJ/MP genomes (Supplementary Table S9).

The HiFi genome improves the evaluation of genetic diversity and inbreeding

We compared the short-read assembled genome (hereafter SG; the YNU_ManPten_2.0 and YNU_ManJav_2.0 were used here to represent the Chinese pangolin and Malayan pangolin genome, respectively) and the long-read assembled genome (PacBio HiFi assembled genome in this study, hereafter long-read assembled genome [LG]) to evaluate the commonly used genetic parameters in population genomics, particularly in conservation genomics, including population structure, population history and separation, genetic diversity, and inbreeding (Supplementary Table S10). We found that the population structure (principal component analysis [PCA], phylogenetic tree, and admixture), population history, and population separation (inferred by MSMC2) were not significantly affected by these two types of reference genomes (Supplementary Figs. S4S7), because the results calculated based on the LG and SG were the same and consistent with previous reports. In contrast to other studies, however, we did not observe a distinct separation between the individual from Taiwan Province and other Chinese pangolins, although the Chinese pangolin in Taiwan Province is considered a subspecies of the Chinese pangolin [36]. Therefore, either the genetic differentiation is still relatively small between the Chinese pangolin in Taiwan Province and Mainland China, or the Taiwan individual was recently translocated from Mainland China. For genetic diversity, the difference between π values calculated by LG and SG was not large; however, this difference was significant, with a higher π value calculated based on the LG compared with that for the SG (Fig. 2A, Supplementary Table S10). Moreover, the ROH is an important genetic factor that reflects the inbreeding level in a population, whereas it was very sensitive to the quality of the reference genome. By screening ROHs across the genome to evaluate genome-wide inbreeding, we identified significant differences between the SG and LG (Supplementary Tables S11S12). For the Malayan pangolin populations, the FROH calculated based on the LG was markedly higher compared with those calculated based on the SG (Fig. 2B, Supplementary Table S13). However, for the Chinese pangolin, this difference was significant only for ROHs larger than 1 Mb, although FROH was larger for the LG at other ROH lengths (Fig. 2C).

Comparison of the genome-wide genetic diversity and inbreeding estimated based on the LG (long-read HiFi assembled genome) and SG (short-read assembled genome). (A) Comparison of genome-wide π calculated based on the LG and SG in the Chinese and Malayan pangolin genomes. (B) Comparison of FROH calculated based on LG and SG in the Malayan pangolin genomes. (C) Comparison of FROH calculated based on LG and SG in the Chinese pangolin genomes. NS: P ≥ 0.5, *P < 0.05, **P < 0.01, ***P < 0.001.
Figure 2:

Comparison of the genome-wide genetic diversity and inbreeding estimated based on the LG (long-read HiFi assembled genome) and SG (short-read assembled genome). (A) Comparison of genome-wide π calculated based on the LG and SG in the Chinese and Malayan pangolin genomes. (B) Comparison of FROH calculated based on LG and SG in the Malayan pangolin genomes. (C) Comparison of FROH calculated based on LG and SG in the Chinese pangolin genomes. NS: P ≥ 0.5, *P < 0.05, **P < 0.01, ***P < 0.001.

Genome-wide genetic diversity and inbreeding

Considering the improvement in evaluating the genetic diversity and inbreeding based on the high-quality HiFi genomes, we performed a reassessment based on the LG and found that the genome-wide genetic diversity (π) of the Chinese pangolin and Malayan pangolin was 0.0026 and 0.0018, respectively, which was higher than those calculated based on the SG (πMP = 0.0025, πMJ = 0.0016) (Supplementary Table S10). For the Chinese pangolin populations, MpenB exhibited the highest genetic diversity (πMpenB = 0.0020), followed by MpenC (πMpenC = 0.0018) and MpenA (πMpenA = 0.0017), which was consistent with a previous study but with higher π values [37]. Among the Malayan pangolin populations, the genetic diversity of MjavB (πMjavA = 0.0024) was higher than that of MjavA (πMjavB = 0.0007). The mean He of MjavA and MjavB was 0.063% and 0.189%, respectively, which was higher than the 0.043% and 0.141% values in a previous study [10]. The average genetic diversity of the Chinese pangolin was higher compared with that of the Malayan pangolin (Supplementary Table S10).

Inbreeding in small populations increases genome-wide homozygosity, and the resulting depression accelerates the loss of genetic diversity. For the MJ and MP populations, the average number of ROH fragments in each individual was 3,659.22 ± 215.96 and 1,543.43 ± 186.90, respectively. For both species, the ROHs were restricted to relatively small fragments (<1 Mb) (Fig. 3A, B), and the number of ROHs longer than 1 Mb accounted for only 0.94% and 1.27% of the ROH fragments for the MP and MJ genomes, respectively [10]. The total length of ROHs larger than 1 Mb also accounted for a small proportion of the two genomes (MP: 5.19%; MJ: 6.69%) [38]. We did not observe any ROH fragments greater than 5 Mb in either species. Similarly, the FROH in the MJ genomes (0.39 ± 0.024) was higher than that in the MP genomes (0.15 ± 0.023) (Supplementary Table S13). For ROHs longer than 1 Mb, however, the FROH was sharply reduced to 0.026 ± 0.003 and 0.0078 ± 0.002 for the MJ and MP, respectively (Supplementary Table S13), which were much lower than that reported in a previous study [10, 38]. The very high minor allele frequency of 0.2 and other abnormally stringent filtering parameters used in the previous study as a threshold to filter the single nucleotide polymorphism (SNP) may severely dilute the SNPs across the genome, resulting in an overestimation of the inbreeding in the genome.

Genome-wide inbreeding estimated by ROH in the Chinese and Malayan pangolin populations. (A) The length distribution of ROH across the genome in the Chinese pangolin population. (B) The length distribution of ROH across the genome in the Malayan pangolin population. (C) Genome-wide heterozygosity and inbreeding estimates (FROH) for all five pangolin populations. (D) Comparison of the averaged FROH in the MjavA and MjavB populations of the Malayan pangolins. (E) Comparison of the averaged FROH in the MpenA, MpenB, and MpenC populations of Chinese pangolins.
Figure 3:

Genome-wide inbreeding estimated by ROH in the Chinese and Malayan pangolin populations. (A) The length distribution of ROH across the genome in the Chinese pangolin population. (B) The length distribution of ROH across the genome in the Malayan pangolin population. (C) Genome-wide heterozygosity and inbreeding estimates (FROH) for all five pangolin populations. (D) Comparison of the averaged FROH in the MjavA and MjavB populations of the Malayan pangolins. (E) Comparison of the averaged FROH in the MpenA, MpenB, and MpenC populations of Chinese pangolins.

We further compared the ROH distribution in different populations of these two species (Supplementary Figs. S8S9). For Malayan pangolins, the inbreeding in the MjavA population (FROH = 0.55 ± 0.007) was more serious compared with that in the MjavB population (FROH = 0.15 ± 0.012) (Fig. 3C, D, Supplementary Table S13). Although this difference became smaller for ROHs greater than 1 Mb, it was still significant (Fig. 3D). Among the three Chinese pangolin populations, the inbreeding of the MpenA (FROH = 0.18 ± 0.036) and MpenB (FROH = 0.17 ± 0.039) populations was comparable but much worse compared with that of the MpenC population (FROH = 0.06 ± 0.005) (Fig. 3C, E, Supplementary Table S13). Similarly, the differences in FROH between the three MP populations were reduced for ROHs longer than 1 Mb (Fig. 3E, Supplementary Table S13). As reported in a previous study [10], we found that the FROH of the Taiwan individual (FROH = 0.54) was much higher than all other individuals in the MP population (Fig. 3C). Notably, we found that 99.6% of the ROH fragments in the genome of the Taiwan individual were less than 1 Mb, which were much higher than that reported in the previous study (Supplementary Table S13). In addition, the FROH varied greatly among individuals in the MpenA or MpenB population, but this was not observed in the MpenC, MjavA, and MjavB populations.

Genome-wide mutational load

The mutational load is the burden of deleterious variants carried by a population and reflects the evolutionary fitness of a population [39]. Although studies have examined the distribution of mutational load for the Chinese and Malayan pangolins [10, 27, 38], the HiFi genomes in this study provide new insight into the accumulation of mutational load in pangolins. We screened three categories of mutational load (loss of function, LoF; missense mutation; deleterious nonsynonymous mutation, dnsSNP) based on the HiFi genomes for the Chinese and Malayan pangolins (Fig. 4A, B, Supplementary Tables S14S15). We calculated the individual-level derived mutational load for each population to avoid bias introduced by different population sizes. In a previous study, the MpenC population had the most mutational load. In this study, however, we found that individuals in the MpenC population harbored the most mutational load, which was significantly more than that in the MpenB and MpenA populations (Fig. 4A). This may be the result of the HiFi reference genome used in this study and the different methods for the identifying the derived allele across the genome. The missense mutations and dnsSNPs in the MpenA population were comparable to those in the MpenB population (Supplementary Fig. S10a, b); however the MpenB population harbored many more LoFs compared with the MpenA population (Fig. 4A). Next, we focused on the derived homozygous mutational load (DHMD) and found that the MpenC population harbored the most DHMD, whereas MpenA and MpenB contained comparable DHMD (Fig. 4C, Supplementary Fig. S10c–f). For Malayan pangolins, the MjavB population harbored much more derived LoF compared with that of the MjavA population (Fig. 4B, Supplementary Tables S14S15). However, the proportion of DHMD in the MjavA population for the LoF was comparable to that in the MjavB population (Fig. 4D). The MjavA population exhibited a higher proportion of DHMD for the dnsSNP and missense mutations than that of the MjavB population (Supplementary Fig. S10f), possibly because of the more efficient genetic purging of large-effect deleterious mutations (LoF), which was not investigated in previous studies. In the genomic evolutionary rate profiling (GERP) analysis, we obtained a highly similar result with the MpenC population harboring the most relative mutational load among the MP populations and the MjavA population harboring more relative mutational load than that in the MjavB population (Fig. 4E, F).

Mutational load in the Chinese and Malayan pangolin populations. (A) Total number of individual-level LoF mutations across the Chinese pangolin populations. (B) Total number of individual-level LoF mutations across the Malayan pangolin populations. (C) The ratio of homozygous LoF mutations in the Chinese pangolin populations was calculated by the following formula: 2 × homozygous sites/(2 × homozygous sites + heterozygous site). (D) The ratio of homozygous LoF mutations in the Malayan pangolin populations was calculated using the same formula as that for the Chinese pangolin. (E) Relative mutational load in the Chinese pangolin populations (top 0.1% of GERP scores). (F) Relative mutational load in the Malayan pangolin populations (top 0.1% of GERP scores). The LoF here means loss-of-function mutations.
Figure 4:

Mutational load in the Chinese and Malayan pangolin populations. (A) Total number of individual-level LoF mutations across the Chinese pangolin populations. (B) Total number of individual-level LoF mutations across the Malayan pangolin populations. (C) The ratio of homozygous LoF mutations in the Chinese pangolin populations was calculated by the following formula: 2 × homozygous sites/(2 × homozygous sites + heterozygous site). (D) The ratio of homozygous LoF mutations in the Malayan pangolin populations was calculated using the same formula as that for the Chinese pangolin. (E) Relative mutational load in the Chinese pangolin populations (top 0.1% of GERP scores). (F) Relative mutational load in the Malayan pangolin populations (top 0.1% of GERP scores). The LoF here means loss-of-function mutations.

Site frequency spectrum (SFS) analysis revealed that 7.91% and 9.07% of putatively damaging and neutral alleles, respectively, were fixed in the MpenC population. These two ratios were 6.84% and 9.53%, respectively, in the MjavA population; however, the proportion of the fixed alleles in the other three populations (MpenA, MpenB, and MjavB) was much smaller (Fig. 5A. B). This indicates that the MpenC and MjavA populations may have experienced population bottleneck events, and genetic drift drove more rare alleles to fix into these two populations [9]. By comparing SFS lines between the Chinese and Malayan pangolins, we found that the SFS lines were flatter for polymorphic loci (fixed alleles excluded) in the Malayan pangolins, whereas the flattest SFS line was observed in the MjavA population, indicating the possibility of more serious bottlenecks in the evolutionary history of the MjavA population.

The SFS and genetic signals of genetic purging in pangolin populations. (A) SFS for putatively damaging (LoF and missense mutations) and neutral mutations (intergenic variants) in the MpenA, MpenB, and MpenC populations. (B) SFS for putatively damaging and neutral mutations in the MjavA and MjavB populations. (C) Dot plot showing the occurrence of LoF mutations in the two Malayan pangolin populations calculated as the ratio of the number of the mutational load to synonymous mutations in the ROH regions (ROHf) or non-ROH (non-ROHf) regions across the genome. Each point signifies the LoF frequency of an individual. The large dots represent the average LoF frequency for the population, while the lines indicate the standard deviation range around the mean. (D) The ratio of ROHf to non-ROHf for the LoF in the two Malayan pangolin populations. The box represents the interquartile range (IQR), stretching from the first quartile (Q1) to the third quartile (Q3). The line that bisects the box indicates the median value. The whiskers extended from the box to show the variability of the data, typically reaching to the minimum and maximum values that fall within 1.5 times of the IQR from Q1 and Q3, respectively. (E) Dot plot showing the occurrence of LoF mutations in the three Chinese pangolin populations calculated as that in (C). (F) The ratio of ROHf to non-ROHf for the LoF in the three Chinese pangolin populations.
Figure 5:

The SFS and genetic signals of genetic purging in pangolin populations. (A) SFS for putatively damaging (LoF and missense mutations) and neutral mutations (intergenic variants) in the MpenA, MpenB, and MpenC populations. (B) SFS for putatively damaging and neutral mutations in the MjavA and MjavB populations. (C) Dot plot showing the occurrence of LoF mutations in the two Malayan pangolin populations calculated as the ratio of the number of the mutational load to synonymous mutations in the ROH regions (ROHf) or non-ROH (non-ROHf) regions across the genome. Each point signifies the LoF frequency of an individual. The large dots represent the average LoF frequency for the population, while the lines indicate the standard deviation range around the mean. (D) The ratio of ROHf to non-ROHf for the LoF in the two Malayan pangolin populations. The box represents the interquartile range (IQR), stretching from the first quartile (Q1) to the third quartile (Q3). The line that bisects the box indicates the median value. The whiskers extended from the box to show the variability of the data, typically reaching to the minimum and maximum values that fall within 1.5 times of the IQR from Q1 and Q3, respectively. (E) Dot plot showing the occurrence of LoF mutations in the three Chinese pangolin populations calculated as that in (C). (F) The ratio of ROHf to non-ROHf for the LoF in the three Chinese pangolin populations.

Genetic purging in pangolin populations

Genetic purging is an important process that has an impact on the accumulation of deleterious mutations in the population, and it is usually more evident in small populations; however, few studies have discussed this issue in pangolins. To determine whether genetic purging has occurred in these pangolin populations facilitated by inbreeding, we first compared the occurrence of mutational load (LoF, missense mutation, and dnsSNP) in the ROH regions (ROHf, the ratio of the number of mutational load to synonymous mutations in the ROH regions across the genome) and outside of the ROH regions (non-ROHf, the ratio of the number of mutational load to synonymous mutations in the non-ROH regions across the genome) [12]. In the Malayan pangolin populations, the ROHf of highly deleterious mutations (LoF) was significantly lower than the non-ROHf in both the MjavA and MjavB population (Fig. 5C). This denoted that many large-effect deleterious alleles (e.g., LoF) still existed but are masked in non-ROH regions in both populations, indicating that the purging was less efficient to remove recessive deleterious mutations across the whole genome. On further inspection, we found that the difference between the ROHf and non-ROHf was smaller in the MjavA population than in the MjavB population (Fig. 5D and Supplementary Fig. S11a–c), indicating that the higher level of inbreeding in the MjavA population may have facilitated exposing more recessive deleterious alleles to be homozygous, which has resulted in a more efficient purging of deleterious mutations in this population than in the MjavB population [12, 13].

In the Chinese pangolin populations, the ROHf for LoF was lower than non-ROHf in all three populations. The values of non-ROH f and ROHf in the MpenA population were very similar, but the non-ROHf was significantly higher compared with that in the ROHf in both the MpenB and MpenC populations (Fig. 5E, F), indicating that the purging of deleterious alleles in the MpenA population might be more efficient than that in the MpenB and MpenC populations for the LoF mutations. However, this was not obvious for the relatively small-effect dnsSNP and missense mutations (Supplementary Fig. S11d–f). The Rxy also indicated that the MpenA population harbored the fewest deleterious mutations, followed by the MpenB and MpenC populations (Supplementary Fig. S12). In addition, the number of fixed damaging alleles was not significantly less than neutral alleles in all five populations (Fig. 5A, B), further suggesting that the genetic purging in the pangolin populations is weak and not sufficient to clear a large number of deleterious mutations.

Discussion

HiFi genomes improve the evaluation of genetic diversity and inbreeding

Accurate and precise evaluation of genome-wide extinction risks by measuring a series of genetic parameters is the central issue in conservation genomics and primarily depends on the quality of the reference genome [1]. However, what genetic parameters can be improved the most with a better reference genome? Here, we showed that the two genetic parameters promoted by a higher-quality reference genome are genetic diversity (π) and inbreeding (ROH) (Fig. 3AC). We detected more variants across the genome based on the LG than the SG because long reads could (i) span much more complex genomic regions [1] and (ii) generate much longer contigs than short reads [40, 41]. Many genomic regions that cannot be assembled by short reads can be assembled by long reads, and these regions may contain important variants. In addition, longer contigs facilitate a higher number of reads that align accurately with the reference genome. Both of these may contribute to enhancing the accuracy of genetic diversity calculations. Therefore, we do not recommend comparing the genetic diversity calculated based on the SG to that calculated based on the LG; however, it may still make sense to compare the genetic diversity that was calculated based on the SG genomes between different populations.

The estimation of inbreeding by detecting ROH across the genome highly depends on genome contiguity, because short contigs in the SG can hardly span over long ROH fragments. As predicted, the inbreeding level detected using the LG was significantly higher than those identified using the SG for ROH fragments larger than 1 Mb. When we focused on ROHs larger than 100 kb, this difference in FROH was still significant in the Malayan pangolin population but not in the Chinese pangolin population (Fig. 3B, C), which we inferred should have resulted from the different contiguities of the two pangolin genomes. Indeed, the contig N50 of the SG for the Chinese pangolin (133.77 kb) was much longer than that of the SG for the Malayan pangolin (73.8 kb), allowing for the detection of ROH longer than 100 kb. In contrast, the scaffold N50 of the Malayan pangolin genome was longer than that of the Chinese pangolin genome, suggesting that contiguity contributes more than scaffold contiguity to the detection of long ROH across the genome. It is worth noting that the Chinese pangolin displayed a higher FROH >1 Mb in comparison to the Malayan pangolin when the LG was used for ROH detection. However, this result was opposite when the SG genome was used for detection of ROH fragments. This undisputedly supports that the high-quality reference genome assembled by long reads is essential for accurately determining the inbreeding level.

Genome-wide extinction risks in different pangolin populations

Although the Chinese and Malayan pangolins are listed as Critically Endangered species by the IUCN Red List, the genome-wide genetic diversity of these two species is moderate and even higher than other endangered flagship species [10, 27], such as the tiger [40], giant panda [42, 43], golden snub-nosed monkey [44], and kākāpo [13]. In this study, we found an even higher genetic diversity than the previous reports [10, 27] for the two pangolin species (Supplementary Fig. S13, Supplementary Table S16). Although the pangolin populations have been declining for a long time, the recent population decline caused by poaching and illegal trade is more serious than ever, which has resulted in a very rapid decline in population size. Therefore, genetic drift and inbreeding may not have resulted in a substantial decrease in genetic diversity. Although the high-quality HiFi genome improved the estimation of inbreeding, the FROH in both species was still lower than many other endangered species [9, 40], indicating a relatively fine intrinsic genetic background for these pangolin populations.

We observed a much faster and sharper population decline for the MjavA population within the most recent 10,000 years compared with the other four pangolin populations (Supplementary Fig. S7), indicating that this Malayan pangolin population has a serious extinction risk. The LG-based population genomic analysis also revealed that both the genetic diversity (πMjavA = 0.0007) and inbreeding (FROH >100 kb = 0.55) in the MjavA Malayan population were much worse than those in other pangolin population, which may be caused by the isolation and limited gene flow with other populations, because the MjavA population is distributed across Southeast Asia [10] and the gene flow is easily separated by islands. However, we cannot precisely locate this population because of the lack of accurate sampling locations [10], which should be the subject of future conservation work. Another aspect to consider is that the Taiwan Chinese pangolin individual exhibited a very high level of inbreeding, with ROH fragments longer than 1 Mb accounting for only ∼0.4%, much less than the 8.04% reported in a previous study [10]. This may be attributed to the harsh filtration of SNPs in the previous study and the overestimation of ROH longer than 1 Mb. Two possibilities may explain this phenomenon: (i) the Chinese population in Taiwan Province has ever been extensively inbred, but this situation has gradually improved, and the long ROH fragments have been broken by recombination over generations, and (ii) this was a descendant of some highly inbred individuals translocated by humans from Mainland China to the Taiwan Province. The repeated mating with the native Taiwan pangolin population broke the long ROH segments into small fragments. However, we cannot rule out that the Taiwan population may still face serious survival risks, and more Taiwan Chinese pangolin individuals need to be added to the analysis to draw a clear conclusion.

The derived mutational load in all five pangolin populations was greater than that in the Amur tiger and South China tiger populations, even with their lower inbreeding levels [40]. For the Chinese pangolin, the MpenC population had the highest proportion of mutational load. We hypothesized that the MpenC population represents an ancient and isolated Chinese pangolin population in Yunnan Province and is less disturbed by human activity but has accumulated a large mutational load over its evolutionary history. We detected a stronger drift in the MpenC population, which can cause reduced efficacy of purifying selection to remove deleterious mutations [13]. In the Malayan pangolin, the MjavA population showed low genetic diversity and high inbreeding. Although the high inbreeding in the MjavA population may promote the purging of deleterious mutations, we still observed a large amount of DHMD in this population. The significantly lower frequency of LoF inside compared with outside the ROH regions indicates that many strongly recessive deleterious mutations remain in the non-ROH regions in a heterozygous state and have not been effectively removed. This may be explained by the less efficient genetic purging in the pangolin populations.

Novel implications for the global conservation of the two pangolin species

High-volume poaching and trafficking have resulted in the overexploitation of pangolins; thus, the wild population, particularly the Chinese and Malayan pangolins, has plummeted to near extinction [10, 21–24]. Although moderate inbreeding and genetic diversity for these two species indicate a fine intrinsic genetic status, the population differentiation (FST) among the Chinese and Malayan pangolin populations remains large, which was also reported in other studies [10, 27, 45], even larger than the genetic differences of many subspecies [5, 46, 47]. Genome-wide risks in different populations of Chinese and Malayan pangolins are also different [10, 27]. Therefore, in the effort to protect and conserve pangolins, addressing and managing the issues of illicit poaching and trafficking are just as important as genetic rescue efforts. Implementing timely protective and conservation measures for both the Chinese and Malayan pangolins will contribute to facilitating genetic rescue initiatives. Notably, both the genetic diversity and inbreeding of the MjavA population and the Taiwan individual are much worse than those in other pangolin populations, suggesting that the MjavA population, as well as the Taiwan Chinese pangolins, may suffer more serious survival pressures than other pangolin populations and should receive more attention and protection.

Materials and Methods

Samples, resequencing data, and ethics statements

The Chinese and Malayan pangolins used for genome assembly were wild individuals rescued by the Guangdong Wildlife Rescue Center. During a routine examination, 5 mL of blood was collected with an anticoagulant tube, immediately transferred to liquid nitrogen, and stored at −80°C. The sample collection, experiment, and research design were all approved by the Institutional Review Board of BGI (BGI-IRB E22017). We strictly adhered to the guidelines provided by the BGI-IRB for all procedures. The whole-genome sequencing data of 37 Chinese pangolin and 72 Malayan pangolin individuals were downloaded from the National Center for Biotechnology Information (NCBI) and the China National GeneBank DataBase (CNGBdb) for population genomic analysis in this study [10, 18, 27].

Nucleic acid extraction, library preparation, and sequencing

Total genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen) for whole-genome sequencing (WGS) library preparation. Total RNA was extracted from blood using Trizol reagent (Invitrogen) from blood, and 250 to 300 bp reverse transcribed cDNA fragments were used for DNA library construction. Two Hi-C libraries were prepared with DpnII restriction endonuclease. DNA libraries were subjected to the Illumina HiSeq X Ten platform (RRID:SCR_016385) at Novogene for paired-end sequencing. For high-molecular-weight genomic DNA, the isolation was performed using the sodium dodecyl sulfate–based method, and purification was carried out by the Qiagen Genomic Kit. A 15k library was constructed using high-quality DNA samples (main band >30 kb) and sequenced with the PacBio Sequel II platform at Novogene.

Genome assembly and assessment

To estimate the genome size, a total of ∼100 Gb WGS short reads were used for analysis by the jellyfish (RRID:SCR_005491) (v2.3.1) [48]. The hifiasm (RRID:SCR_021069) [28] (v0.16.1) software was used to generate the primary genome with PacBio HiFi and Hi-C sequencing data. Hifiasm utilized Hi-C sequencing data to achieve chromosome-level phasing in a method that does not require parental data [26]. This process can also phase the primary contigs at the same time into two sets of haplotigs, representing the two haploid genomes of a diploid genome. Genome redundancy was removed by the software Purge_dups (RRID:SCR_021173) [34] (v1.2.5). Then, the Hi-C sequencing reads were remapped to the primary genomes, after quality control by the Juicer (RRID:SCR_017226) [49] (v1.5) and the mem algorithm of the Burrows–Wheeler Aligner (BWA, v0.7.17; RRID:SCR_010910) [50, 51]. The 3D-DNA pipeline (RRID:SCR_017227) (v190716) was finally used to concatenate and review the primary scaffolds to chromosome-scale genomes [52]. We identified the X chromosome and Y-linked regions using SRY genes and WGS short reads and confirmed that the sequencing depth of the sex chromosomes was approximately half that of the autosomes.

The genome completeness was evaluated by BUSCO (RRID:SCR_015008) (v5.2.2) software using the vertebrata_odb10 dataset [53]. We conducted a Merqury (RRID:SCR_022964) [54] (release 20,200,430) k-mer analysis and alignment of the WGS reads to the reference genome to evaluate the accuracy of the genome assembly. Genome regions covered by PacBio long reads greater than 10-fold were considered accurately assembled regions [55]. The identification of syntenic blocks between pangolin genomes was primarily performed by the NUCmer program in MUMmer (RRID:SCR_018171) [56] (v4.0.0rc1), followed by filtration using the delta-filter program in MUMmer (v4.0.0rc1) with parameters “-i 90 -l 5000.”

Genome annotation

Repeat elements in the genome were annotated using de novo and homology-based methods. De novo repeats were first annotated using the LTR finder (RRID:SCR_015247) [57] (v1.0.6) and RepeatModeler2 (RRID:SCR_015027) [58] (v2.0.1), and the identified repeats were then merged into the RepBase library as known elements. Transposable elements were identified and classified using RepeatMasker (RRID:SCR_012954) (v4.0.5) with a conserved BLASTN search against the RepBase library [59]. The RepeatProteinMask program in RepeatMasker (v4.0.5) was used to identify repeat proteins [59]. The tandem repeats were annotated using Tandem Repeats Finder [60] (v4.07).

Protein-coding genes were annotated using de novo, homology-based, and transcript-based approaches after masking the repeat elements. For the de novo method, we used Augustus [61] (v3.0.3), GlimmerHMM (RRID:SCR_002654) [62] (v3.0.1), and SNAP (RRID:SCR_007936) [63] (v11/29/2013) to predict the gene models. For transcript-based prediction, the transcripts were mapped to the reference genome using HISAT2 (RRID:SCR_015530) [64] (v2.1.0) and then assembled using StringTie (RRID:SCR_016323) [65] (v1.3.3b) based on clean RNA-seq data. Homology-based gene annotation was performed by using Blastall [66] (v2.2.26) with an E-value cutoff of 1e-5 against the protein sequences of Homo sapiens, Mus musculus, Canis lupus familiaris, and Felis catus. The final protein-coding gene set was generated using the MAKER (RRID:SCR_005309) [67] (v3.01.03) pipeline by combining high-quality homology-based, de novo, and RNA-seq supported genes.

Functional annotation was performed by a BLAST (RRID:SCR_004870) (v2.13.0) search against the SwissProt, TrEMBL, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases with an E-value cutoff of 1e-5. InterProScan (RRID:SCR_005829) [68] (v5.52–86.0) was used to predict motifs, domains, and Gene Ontology (GO) terms. The tRNA genes were identified using tRNAscan-SE (RRID:SCR_008637) [69] (v1.3.1), whereas the snRNA and miRNA genes were detected by searching the reference sequences against the Rfam database (RRID:SCR_007891) (Release 12.0) using the BLAST (v2.13.0) and the program cmsearch from infernal (RRID:SCR_011809) (v1.1.1) software.

Detection of structural variants in the pangolin genome

To identify sequence differences between the parental genomes, sequence alignment was performed using Mummer (v4.0.0rc1) with the parameters “nucmer –maxmatch -c 500 -b 500 -l 100” [56]. Structural variants (SVs) were detected based on the alignment results using SyRi (RRID:SCR_023008) [70] (v1.3). To verify the accuracy of the detected SVs, we aligned the PacBio long reads to the reference genome using BLAST (v2.13.0) to determine whether the reads crossed the breakpoints. Moreover, we extracted 300 bp of upstream/downstream flanking sequences for each breakpoint and manually verified them using DNBSEQ short reads by IGV (RRID:SCR_011793) [71] (v2.13.3) software. To identify gene loss in the haploid genome, we screened for pseudogenes interrupted by SVs using Mummer alignment and checked whether these genes had other copies across the entire genome.

Genome-wide variant calling and quality control

The BWA mem algorithm (v0.7.17) [50] was applied to map the whole-genome resequencing data of 72 Malayan pangolins and 37 Chinese pangolins to each of their reference genomes with default parameters. Sentieon (RRID:SCR_025615) [72] (v202010.01) was then used to sort, reorder, and deduplicate the alignment files for variant calling. Variants were detected for each individual using the Sentieon DNAseq Haplotyper pipeline, which is similar to the Genome Analysis Toolkit (GATK) HaplotypeCaller pipeline. Joint variant calling was performed using the Sentieon DNAseq GVCFtyper with all genetic variant call format (gVCF) files to generate a VCF file. To prepare for downstream analysis, the variant set was filtered to remove indels and multiallelic variants. For variant quality control, a stringent filtering step was performed using the following parameters: “QD <2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0”. Additionally, we filtered SNPs that were missed in more than 20% of the individuals in a population. We used both the short-read assembled genome (SG) and long-read assembled genome (LG) as references to generate variant sets for downstream comparison.

Population structure analysis

Before we performed PCA, the VCF file was converted to PLINK format using PLINK software (RRID:SCR_001757) [73] (v1.90b6.10). Genome-wide complex trait analysis (GCTA) [74] (v1.92.2) software was used for PCA analysis using the default parameters. To construct a phylogenetic tree, vcf2phylip [75] (v2.7) was used to convert the VCF file into PHYLIP format. The best substitution model was then calculated using jModelTest [76] (v2.1.10), and the maximum likelihood phylogenetic tree was constructed using IQ-TREE [77] (v1.6.12) software with default parameters. ADMIXTURE [78] (v1.3.0) was used to determine the ancestry proportion with a specified number of clusters (K) ranging from 1 to 10. For this analysis, we used both the SG and LG as the reference genomes to generate 2 groups of results.

Population demography inference

SMC++ [79] (v1.5.1) was used to infer the historical changes in the effective population size of the various pangolin populations. The SMC++ results were visualized by scaling the time to real years using a generation time of 1 year and a mutation rate of μ = 1.47 × 10−8 [10, 18] for both the Malayan and Chinese pangolins. MSMC2 (RRID:SCR_023677) [80] (v2.1.1) was used to infer the changes in the effective population size over the evolutionary history with four randomly selected individuals from each pangolin population. SNPs were first phased by Beagle (RRID:SCR_001789) [81] (v5.1) and then subjected to MSMC2 for inference of the population history. We used the SG and LG as the reference genomes to generate 2 groups of results for comparison.

ROH and genetic diversity

To detect ROH fragments, multi-individual VCF files were converted into PLINK bfile format using the PLINK (RRID:SCR_001757) [73] (v1.90b6.10) software. The ROH was then detected using the PLINK [73] (v1.90b6.10) software with the parameters “–homozyg –homozyg-window-snp 20 –homozyg-kb 10 –homozyg-density 50” [82]. ROHs shorter than 100 kb were excluded from the downstream analysis. FROH was calculated as FROH = LROH/LAUTOSOME, where LROH represents the total length of ROHs in each genome and LAUTOSOME represents the total length of the autosomes. Genome-wide genetic diversity (π) was calculated using vcftools [83] (v0.1.16) using the parameters “vcftools –gzvcf vcf.gz –window-pi 500,000 -out result.” Genome-wide heterozygosity was calculated using vcftools [83] (v0.1.16) with “vcftools –gzvcf vcf.gz –het –out result” parameters. For ROH and genetic diversity analysis, we used both the SG and LG as reference genomes for comparison, but only the results calculated based on the LG were used for further discussion.

Mutational load and genetic purging analysis

To identify the mutational load in the protein-coding genes, the variants were first annotated using ANNOVAR (RRID:SCR_012821) [84] (v20191024) and SnpEff [85] (v.5.0e). Variants annotated as stop gained, splice acceptor variant, or splice donor by SnpEff [85] were predicted to be LoF mutations. Nonsynonymous variants with a Grantham score ≥150 were considered deleterious mutations (dnsSNP) [86]. To determine the derived allele, the Malayan pangolin genome was split into 100-bp reads and mapped to the Chinese pangolin genome. If an allele was found within the Malayan pangolin genome and simultaneously represented the major allele (with an allele frequency exceeding 50%) within the Chinese pangolin population, it was designated as the ancestral state within the Chinese pangolin genome [87]. The same approach was used to determine the ancestral state of variants of the Malayan pangolin.

The occurrence of the mutational load in the ROH and non-ROH regions (ROHf and non-ROHƒ) for each individual genome was calculated by dividing the total number of deleterious mutations (Nm) within the ROH or non-ROH region by the number of synonymous mutations in the same region (SROH and Snon-ROH) as follows:

To estimate the relative excess of deleterious mutations in one pangolin population compared with another, we performed the Rxy analysis for dnsSNP, missense mutations, LoF, and synonymous mutations between pairs of pangolin populations [12]. We calculated the Rxy value using the following formula:

where |$m_X^i$| represents the count of derived alleles for the abovementioned mutations observed at each site (i) within one population (X), and |$m_Y^i$| represents that in another population (Y). |$s_X^i$| and |$s_Y^i$| represent the total number of alleles at each site (i) of the population (X or Y). C represents the abovementioned category of protein-coding sites, whereas I denotes the intergenic sites. We used the jackknife method during the calculation to obtain a standard error measurement. If Rxy = 1, both populations have the same level of derived mutation load, whereas if Rxy < 1, then population Y has more derived load than X and vice versa if Rxy > 1.

Genomic evolutionary rate profiling scores

It is difficult to estimate the genetic load without fitness data. Therefore, we calculated the relative mutational load for each individual genome. First, we screened the derived alleles distributed in the highly conserved genome region of the two pangolin species using the GERP method. To calculate the GERP scores, we selected the genomes of 37 species (Acinonyx jubatus, Bos taurus, Callithrix jacchus, Canis lupus, Cavia porcellus, Choloepus hoffmanni, Dasypus novemcinctus, Dipodomys ordii, Echinops telfairi, Equus caballus, Erinaceus europaeus, Felis catus, Homo sapiens, Loxodonta africana, Lynx canadensis, Mus musculus, Myotis lucifugus, Ochotona princeps, Oryctolagus cuniculus, Panthera pardus orientalis, Panthera tigris, Pan troglodytes, Prionailurus bengalensis, Procavia capensis, Pteropus vampyrus, Puma concolor, Rattus norvegicus, Sorex araneus, Spermophilus tridecemlineatus, Tupaia belangeri, Tursiops truncates, Vicugna pacos, Manis javanica, Manis pentadactyla, Tamandua tetradactyla, Ovis aries, and Vulpes lagopus) for screening ultra-conserved genomic regions. We split these genomes into 100-bp reads to generate fastq files. Then, we respectively aligned these fastq files to the Malayan and Chinese pangolin genomes using the mem algorithm in BWA (v0.7.17-r1188) with the “-B 3” parameter. GERP (RRID:SCR_000563) scores were then calculated using the gerpcol program from the GERP++ [88] software based on the abovementioned alignment files. In general, low GERP scores (<1) usually represent putatively neutral genome regions, whereas high GERP scores (>1) indicate conserved genome regions [13]. Derived alleles in more conserved genome regions (those with higher GERP scores) are likely to be more deleterious. In this study, we calculated the relative mutational load with mutations having the top 0.1% GERP scores to select more deleterious alleles distributed in highly conserved genome regions [9]. The relative mutational load was calculated by the following formula: the sum of all homozygous and heterozygous derived alleles multiplied by their conservation score over the total number of derived alleles, with the heterozygous sites counted as 1 allele and the homozygous sites counted as 2 alleles [13]. Therefore, a higher relative mutation load indicates that a relatively larger proportion of derived alleles may be found in more conserved genomic regions.

Site-frequency spectrum analysis

For SFS analysis, we calculated the frequency of each type of mutations at every site in the various pangolin populations. We considered intergenic variants as neutral, whereas LoF and missense variants were considered putatively damaging mutations [9]. For SFS in each pangolin population, we subsampled nonmissing derived alleles from each locus for calculation [9]. Fixed (frequency = 1) and missing (frequency = 0) alleles were included in the SFS for the five populations. We used LG as the reference genome for SFS analysis.

Additional Files

Supplementary Fig. S1. The heatmap represents the contact matrices generated by aligning the Hi-C data to the haploid chromosome-level Chinese pangolin (a) and Malayan pangolin (b) genomes.

Supplementary Fig. S2. Estimated genome size of the Chinese and Malayan pangolin genomes by using k-mer frequency analysis with k-mer size of 17. (a) The k-mer spectra of the Chinese pangolin genome. (b) The k-mer spectra of the Malayan pangolin genome.

Supplementary Fig. S3. Sequencing depths of each pseudo-chromosome. (a) Sequencing depths of the 19 autosomes, X chromosome, and Y chromosome in the Chinese pangolin genome. (b) Sequencing depths of the 18 autosomes, X chromosome, and Y chromosome in the Malayan pangolin genome.

Supplementary Fig. S4. Comparisons of LG and SG on PCA analysis for both Chinese and Malayan pangolin populations. (a) PCA analysis of Malayan pangolin populations based on SG. (b) PCA analysis of Malayan pangolin populations based on LG. (c) PCA analysis of Chinese pangolin populations based on SG. (d) PCA analysis of Chinese pangolin populations based on LG.

Supplementary Fig. S5. Comparisons of LG and SG on the construction of the phylogenetic tree for both Chinese and Malayan pangolin populations. (a) Phylogenetic tree constructed based on the SG for Malayan pangolin populations. (b) Phylogenetic tree constructed based on the LG for Malayan pangolin populations. (c) Phylogenetic tree constructed based on the SG for Chinese pangolin populations. (d) Phylogenetic tree constructed based on the LG for Chinese pangolin populations.

Supplementary Fig. S6. Comparisons of LG and SG on the admixture analysis for both Chinese and Malayan pangolin populations. (a) Genome-wide admixture analysis for three populations of Chinese pangolin based on the SG. (b) Genome-wide admixture analysis for three populations of Chinese pangolin based on the LG. (c) Genome-wide admixture analysis for two populations of Malayan pangolin based on the SG. (d) Genome-wide admixture analysis for two populations of Malayan pangolin based on the LG.

Supplementary Fig. S7. Comparison of LG and SG assembly for analyzing population history and separation in Chinese and Malayan pangolin populations. (a) The dynamics of effective population size of Malayan pangolin populations analyzed based on the SG. (b) The population size dynamics of Malayan pangolin populations analyzed based on the LG. (c) The population size dynamics of Chinese pangolin populations analyzed based on the SG. (d) The population size dynamics of Chinese pangolin populations analyzed based on the LG. (e) The divergence time between two populations of Malayan pangolin estimated based on the SG. (f) The divergence time between two populations of Malayan pangolin estimated based on the LG. (g) The divergence time among three populations of Chinese pangolin estimated based on the SG. (h) The divergence time among three populations of Chinese pangolin estimated based on the LG.

Supplementary Fig. S8. (a) The population-level ROH distribution in three Chinese pangolin populations. (b) The population-level ROH distribution in two Malayan pangolin populations.

Supplementary Fig. S9. (a) The individual-level distribution of ROH larger than 100 Kb in Chinese pangolin genomes. (b) The individual-level distribution of ROH larger than 100 Kb in Malayan pangolin genomes. Each row represents an individual.

Supplementary Fig. S10. Total deleterious nonsynonymous SNP (dnsSNP) (a) and missense (b) mutations at the individual level were assessed across five populations of Chinese and Malayan pangolins. The number of individual-level homozygous LoF (c), dnsSNP (d), and missense (e) mutations across the five pangolin populations. (f) The ratio of homozygous missense mutations and dnsSNPs in Chinese and Malayan pangolin populations was calculated as the following formula: 2 × homozygous sites/(2 × homozygous sites + heterozygous site).

Supplementary Fig. S11. Dot plot shows the occurrence of dnsSNPs (a) and missense mutations (b) in the two Malayan pangolin populations calculated as the ratio of the number of mutational load to synonymous mutations in the ROH regions (ROHf) or non-ROH regions (non-ROHf) across the genome. The ratio of ROHf to non-ROHf for dnsSNP and missense mutations in Malayan pangolin populations (c). Dot plot shows the occurrence of dnsSNPs (d) and missense mutations (e) in the three Chinese pangolin populations calculated as the ratio of the number of mutational load to synonymous mutations in the ROH regions or non-ROH regions across the genome. The ratio of ROHf to non-ROHf for dnsSNP and missense mutations in Chinese pangolin populations (f).

Supplementary Fig. S12. The Rxy ratio of derived alleles in x population to y population (x/y: MpenB/MpenC; MpenA/MpenC; MpenA/MpenB) for dnsSNP, synonymous, missense and LOF. The Rxy <1 indicated the population y has more derived alleles than population x.

Supplementary Fig. S13. Comparison of genome-wide π of the Malayan pangolin and Chinese pangolin with other endangered species. Abbreviations along the x-axis are as follows: AFU: red panda (Ailurus fulgens); AME: giant panda (Ailuropoda melanoleuca); ASI: Chinese alligator (Alligator sinensis); CMA: brown eared pheasant (Crossoptilon mantchuricum); MBE: dwarf musk deer (Moschus berezovskii); MJ: Malayan pangolin (Manis javanica); MP: Chinese pangolin (Manis pentadactyla); PTA: Amur tiger (Panthera tigris altaica).

Supplementary Fig. S14. The k-mer spectra plot estimated by Merqury. (a) The k-mer spectra plot for the haplotype-resolved chromosome-level genome of MP. (b) The k-mer spectra plot for the haplotype-resolved chromosome-level genome of MJ.

Supplementary Fig. S15. The sequencing depths of the two groups of haplotigs in both MJ and MP genomes. (a, b) Alignment with DNBSEQ read. (c, d) Alignment with PacBio HiFi reads.

Supplementary Fig. S16. Pairwise differences observed between the haploid genomes of Malayan and Chinese pangolin. The sliding window was set to be 100 bp.

Supplementary Fig. S17. (a) Brief introduction and circos diagram of the two pangolin genomes. (b) Structural rearrangements between the two haplotypes of each chromosome in Malayan pangolin and Chinese pangolin genomes. (c) Dot plot between MJH1 (x-axis) and MJH2 (y-axis), plotted by pafCoordsDotPlotly. (d) Dot plot between MPH2 (x-axis) and MPH1 (y-axis), plotted by pafCoordsDotPlotly.

Supplementary Fig. S18. Validation of structural variants through contig mapping. Contigs are mapped to both the haploid genomes to verify structural variants and visualized by the IGV software. (a) The correct structural variants between the haploid genomes. The breakpoints of the SVs could be covered by complete contigs in both of the haploid genomes. (b) The incorrect structural variants between the haploid genomes. The breakpoints of the SVs in one of the two haploid genomes could not be covered by the complete contig. In the IGV screenshot, the gray bar represents the contig spanning over the SVs, and the contig corresponding SV regions were marked as the red box. All structural variants have been verified, and several are randomly displayed here.

Supplementary Fig. S19. (a) The GO enrichment result of genes distributed in structural variants of Malayan pangolin. (b) The KEGG enrichment result of genes distributed in structural variants of Malayan pangolin. (c) The GO enrichment result of genes distributed in structural variants of Chinese pangolin. (d) The KEGG enrichment result of genes distributed in structural variants of Chinese pangolin.

Supplementary Table S1. Summarized sample information in this study.

Supplementary Table S2. Information of five populations in Chinese and Malayan pangolin included in this study.

Supplementary Table S3. Statistics of genome assemblies for the Chinese pangolin and Malayan pangolin.

Supplementary Table S4. The length of each chromosome in the Chinese pangolin and Malayan pangolin genomes.

Supplementary Table S5. The overall statistics of repeats in the Chinese and Malayan pangolin genomes.

Supplementary Table S6. Statistics of the annotated genes in Chinese and Malayan pangolin genomes.

Supplementary Table S7. BUSCO analysis of genome assemblies and gene sets in this study.

Supplementary Table S8. Statistics of functional annotation for the Malayan and Chinese pangolin’s gene sets.

Supplementary Table S9. Statistics of ncRNA annotation.

Supplementary Table S10. SNPs number, genetic diversity (π), heterozygosity, and SNP density calculated based on LG and SG in Malayan and Chinese pangolin populations.

Supplementary Table S11. The comparison of the SG and LG for estimating ROH in Malayan pangolin and Chinese pangolin populations.

Supplementary Table S12. The count and length of ROH fragments in the five populations of Malayan pangolin and Chinese pangolin.

Supplementary Table S13. The FROH in the five populations of Malayan pangolin and Chinese pangolin.

Supplementary Table S14. The total count of dnsSNPs, LoF, missense, and synonymous SNPs in various populations, along with the average number of dnsSNPs, LoF, missense, and synonymous SNPs at the individual level, across different populations of the Malayan and Chinese pangolins.

Supplementary Table S15. The individual-level mutational load estimates in all Chinese pangolin and Malayan pangolin populations.

Supplementary Table S16. Comparison of genome-wide nucleotide diversity (π) of the Malayan pangolin and Chinese pangolin with reference to other endangered species on the IUCN Red List.

Supplementary Table S17. Quality assessment of the MP and MJ genomes by the Merqury software.

Supplementary Table S18. The mapping rates of four types of sequencing data to genomes assembled in this study.

Supplementary Table S19. Comparison of the assembly statistics with the previously published Chinese pangolin and Malayan pangolin genomes.

Supplementary Table S20. Pairwise differences observed in comparisons between the haplotype genomes of Chinese and Malayan pangolin; the sliding window was set to be 100 bp.

Supplementary Table S21. Chromosomal structural variants (>50 bp) of MJH1 and MJH2 and of MPH1 and MPH2.

Supplementary Table S22. Genes distributed in the structural variants of MJ and MP genomes.

Supplementary Table S23. The KEGG enrichment result of genes distributed in the structural variants of MJ/MP.

Supplementary Table S24. The GO enrichment result of genes distributed in the structural variants of MJ/MP.

Supplementary Table S25. Functional description of pseudogenes interrupted by structural variants.

Abbreviations

BUSCO: Benchmarking Universal Single-Copy Orthologs; dnsSNP: deleterious nonsynonymous mutation; GATK: Genome Analysis Toolkit; GERP: genomic evolutionary rate profiling; GO: Gene Ontology; Hi-C: high-throughput/resolution chromosome conformation capture; HiFi: high fidelity; KEGG: Kyoto Encyclopedia of Genes and Genomes; LG: long-read assembled genome; LoF: loss of function; PCA: principal component analysis; RNA-seq: RNA sequencing; ROH: runs of homozygosity; SFS: site-frequency spectrum; SG: short-read assembled genome; SNP: single nucleotide polymorphism; TE: transposable element; WGS: whole-genome sequence.

Acknowledgments

This study was supported by the National Key Program of Research and Development, Ministry of Science and Technology (Grant No. 2022YFF1301500); the Guangdong Provincial Key Laboratory of Genome Read and Write (Grant No. 2017B030301011); and the Start-up Scientific Foundation of Northeast Forestry University (60201524043). This work was also supported by China National GeneBank (CNGB). We thank Kinanti Seraphina Larasati and Shiqing Wang for their help in reviewing and polishing the language; Hui Liu and Tian Xia from the Shenzhen Safari Park Co., Ltd. for assisting with sample collection; and all the researchers (Shiqing Wang, Dongyi Yang, Jieyao Yu, Jiale Fan, Yuting Huang, Yingna Zhou, Tianlu Liu, Jiatong Cheng, Chen Lin, and Shiyu Liu) involved in sample collection, genome sequencing, and analysis.

Author Contributions

T.L., H.L., Y.T. and Y.H. conceived and initiated the project. Y.H., J.L., F.H., Y.M., T.Q., W.L., and K.W. collected the samples. J.C., C.Y., Y.L., and Y.T. performed DNA isolation, library preparation, and genome sequencing. H.L., M.S., B.L., and Q.W. assembled the genomes and conducted the genomics analysis. T.L. coordinated the genomic analysis. T.L. and H.L. wrote the manuscript. S.K.S., M.S., Y.X., Y.F., and B.L. reviewed and edited the manuscript. T.L. and Y.H. made important contributions to the revision of the manuscript. All the authors read and approved the final manuscript.

Data Availability

Bioproject and biosample for the genomic data of this study were submitted to NCBI under accession number PRJNA1114675. The accession numbers of genomes are GCA_040,802,235.1 and GCA_040802205.1. The data that support the findings in this study also have been deposited into CNGB Sequence Archive (CNSA) [89] of China National GeneBank DataBase (CNGBdb) [90] with accession number CNP0004630. The resequencing data in this study were retrieved from earlier studies (37 Chinese pangolins: CNP0001723, CNGBdb; PRJNA529540, and PRJNA20331, NCBI Read Archive; 72 Malayan pangolins: PRJNA529540, NCBI Read Archive) [10, 18, 27]. All additional supporting data are available in the GigaScience repository, GigaDB [91–93].

Competing Interests

The authors declare no competing interests.

References

1.

Formenti
 
G
,
Theissinger
 
K
,
Fernandes
 
C
, et al.  
The era of reference genomes in conservation genomics
.
Trends Ecol Evol
.
2022
;
37
(
3
):
197
202
.. .

2.

Kitts
 
PA
,
Church
 
DM
,
Thibaud-Nissen
 
F
, et al.  
Assembly: a resource for assembled genomes at NCBI
.
Nucleic Acids Res
.
2016
;
44
(
D1
):
D73
80
.. .

3.

Ouborg
 
NJ
,
Pertoldi
 
C
,
Loeschcke
 
V
, et al.  
Conservation genetics in transition to conservation genomics
.
Trends Genet
.
2010
;
26
(
4
):
177
87
.. .

4.

Brandies
 
P
,
Peel
 
E
,
Hogg
 
CJ
, et al.  
The value of reference genomes in the conservation of threatened species
.
Genes
.
2019
;
10
(
11
):
846
. .

5.

Guang
 
X
,
Lan
 
T
,
Wan
 
Q-H
, et al.  
Chromosome-scale genomes provide new insights into subspecies divergence and evolutionary characteristics of the giant panda
.
Sci Bull
.
2021
;
66
(
19
):
2002
13
.. .

6.

Li
 
HM
,
Liu
 
BY
,
Shi
 
MH
, et al.  
Haplotype-resolved and chromosome-scale genomes provide insights into co-adaptation between the Amur tiger and Amur leopard
.
Zoological Research
.
2024
;
45
(
3
):
464
7
. .

7.

Frankham
 
R
.
Genetic rescue of small inbred populations: meta-analysis reveals large and consistent benefits of gene flow
.
Mol Ecol
.
2015
;
24
(
11
):
2610
18
.. .

8.

Weeks
 
AR
,
Heinze
 
D
,
Perrin
 
L
, et al.  
Genetic rescue increases fitness and aids rapid recovery of an endangered marsupial population
.
Nat Commun
.
2017
;
8
(
1
):
1071
. .

9.

Khan
 
A
,
Patel
 
K
,
Shukla
 
H
, et al.  
Genomic evidence for inbreeding depression and purging of deleterious genetic variation in Indian tigers
.
Proc Nat Acad Sci U S A
.
2021
;
118
(
49
):
e2023018118
. .

10.

Hu
 
JY
,
Hao
 
ZQ
,
Frantz
 
L
, et al.  
Genomic consequences of population decline in critically endangered pangolins and their demographic histories
.
Natl Sci Rev
.
2020
;
7
(
4
):
798
814
.. .

11.

von Seth
 
J
,
Dussex
 
N
,
Diez-Del-Molino
 
D
, et al.  
Genomic insights into the conservation status of the world's last remaining Sumatran rhinoceros populations
.
Nat Commun
.
2021
;
12
(
1
):
2393
. .

12.

Xue
 
Y
,
Prado-Martinez
 
J
,
Sudmant
 
PH
, et al.  
Mountain gorilla genomes reveal the impact of long-term population decline and inbreeding
.
Science
.
2015
;
348
(
6231
):
242
45
.. .

13.

Nicolas
 
D
,
Tom
 
vV
,
Hernán
 
EM
, et al.  
Population genomics of the critically endangered kākāpō
.
Cell Genomics
.
2021
;
1
(
1
):
100002
. .

14.

Saremi
 
NF
,
Supple
 
MA
,
Byrne
 
A
, et al.  
Puma genomes from North and South America provide insights into the genomic consequences of inbreeding
.
Nat Commun
.
2019
;
10
(
1
):
4769
. .

15.

Xie
 
HX
,
Liang
 
XX
,
Chen
 
ZQ
, et al.  
Ancient demographics determine the effectiveness of genetic purging in endangered lizards
.
Mol Biol Evol
.
2022
;
39
(
1
):
msab359
. .

16.

Hua
 
L
,
Gong
 
S
,
Wang
 
F
, et al.  
Captive breeding of pangolins: current status, problems and future prospects
.
ZooKeys
.
2015
;
507
:
99
114
.. .

17.

Kondrashov
 
P
,
Agadjanian
 
AK
.
A nearly complete skeleton ofErnanodon (Mammalia, Palaeanodonta) from Mongolia: morphofunctional analysis
.
J Vertebr Paleontol
.
2012
;
32
(
5
):
983
1001
.. .

18.

Choo
 
SW
,
Rayko
 
M
,
Tan
 
TK
, et al.  
Pangolin genomes and the evolution of mammalian scales and immunity
.
Genome Res
.
2016
;
26
(
10
):
1312
22
.. .

19.

Ferreira-Cardoso
 
S
,
Billet
 
G
,
Gaubert
 
P
, et al.  
Skull shape variation in extant pangolins (Pholidota: manidae): allometric patterns and systematic implications
.
Zool J Linnean Soc
.
2019
;
188
(
1
):
255
75
.. .

20.

Heinrich
 
S
,
Wittman
 
TA
,
Ross
 
JV
, et al.  
The global trafficking of pangolins: a comprehensive summary of seizures and trafficking routes from 2010–2015
.
Petaling Jaya
:
TRAFFIC, Southeast Asia Regional Office
;
2017
.

21.

Zhang
 
F
,
Wu
 
S
,
Cen
 
P
.
The past, present and future of the pangolin in Mainland China
.
Global Ecol Conserv
.
2022
;
33
:
e01995
. .

22.

Challender
 
DWS
,
Harrop
 
SR
,
MacMillan
 
DC
.
Understanding markets to conserve trade-threatened species in CITES
.
Biol Conserv
.
2015
;
187
:
249
59
.. .

23.

Nijman
 
V
,
Zhang
 
MX
,
Shepherd
 
CR
.
Pangolin trade in the Mong La wildlife market and the role of Myanmar in the smuggling of pangolins into China
.
Global Ecol Conserv
.
2016
;
5
:
118
26
.. .

24.

Cheng
 
W
,
Xing
 
S
,
Bonebrake
 
TC
.
Recent pangolin seizures in China reveal priority areas for intervention
.
Conserv Lett
.
2017
;
10
(
6
):
757
64
.. .

25.

Zhang
 
F
,
Yu
 
Y
,
Wu
 
S
, et al.  
Reducing pangolin demand by understanding motivations for human consumption in Guangdong, China
.
Front Ecol Evol
.
2020
;
8
:
574161
. .

26.

Cheng
 
H
,
Jarvis
 
ED
,
Fedrigo
 
O
, et al.  
Haplotype-resolved assembly of diploid genomes without parental data
.
Nat Biotechnol
.
2022
;
40
(
9
):
1332
35
.. .

27.

Wang
 
Q
,
Lan
 
T
,
Li
 
H
, et al.  
Whole-genome resequencing of Chinese pangolins reveals a population structure and provides insights into their conservation
.
Commun Biol
.
2022
;
5
(
1
):
821
. .

28.

Cheng
 
H
,
Concepcion
 
GT
,
Feng
 
X
, et al.  
Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm
.
Nat Methods
.
2021
;
18
(
2
):
170
75
.. .

29.

Cao
 
P
,
Dai
 
Q
,
Deng
 
C
, et al.  
Genome-wide signatures of mammalian skin covering evolution
.
Sci China Life Sci
.
2021
;
64
(
10
):
1765
80
.. .

30.

Damas
 
J
,
Corbo
 
M
,
Kim
 
J
, et al.  
Evolution of the ancestral mammalian karyotype and syntenic regions
.
Proc Natl Acad Sci U S A
.
2022
;
119
(
40
):
e2209139119
. .

31.

Heighton
 
SP
,
Allio
 
R
,
Murienne
 
J
, et al.  
Pangolin genomes offer key insights and resources for the world’s most trafficked wild mammals
.
Mol Biol Evol
.
2023
;
40
(
10
):
msad190
. .

32.

Yan
 
D
,
Luo
 
X
,
Tang
 
J
, et al.  
High-quality genomes of pangolins: insights into the molecular basis of scale formation and adaption to myrmecophagous diet
.
Mol Biol Evol
.
2023
;
40
(
1
):
msac262
. .

33.

Nie
 
W
,
Wang
 
J
,
Su
 
W
, et al.  
Chromosomal rearrangements underlying karyotype differences between Chinese pangolin (Manis pentadactyla) and Malayan pangolin (Manis javanica) revealed by chromosome painting
.
Chromosome Res
.
2009
;
17
(
3
):
321
29
.. .

34.

Guan
 
D
,
McCarthy
 
SA
,
Wood
 
J
, et al.  
Identifying and removing haplotypic duplication in primary genome assemblies
.
Bioinformatics
.
2020
;
36
(
9
):
2896
98
.. .

35.

Rhie
 
A
,
McCarthy
 
SA
,
Fedrigo
 
O
, et al.  
Towards complete and error-free genome assemblies of all vertebrate species
.
Nature
.
2021
;
592
(
7856
):
737
46
.. .

36.

Challender
 
DW
,
Nash
 
HC
,
Waterman
 
C
.
Pangolins: Science, Society and Conservation
.
London, UK
:
Academic Press
,
2019
.

37.

Wang
 
Q
,
Lan
 
T
,
Li
 
H
, et al.  
Whole-genome resequencing of Chinese pangolins reveals a population structure and provides insights into their conservation
.
Commun Biol
.
2022
;
5
(
1
):
821
. .

38.

Wei
 
S
,
Fan
 
H
,
Zhou
 
W
, et al.  
Conservation genomics of the critically endangered Chinese pangolin
.
Sci China Life Sci
.
2024
;
67
(
10
):
2051
61
.. .

39.

Henn
 
BM
,
Botigué
 
LR
,
Bustamante
 
CD
, et al.  
Estimating the mutation load in human genomes
.
Nat Rev Genet
.
2015
;
16
(
6
):
333
43
.. .

40.

Zhang
 
L
,
Lan
 
T
,
Lin
 
C
, et al.  
Chromosome-scale genomes reveal genomic consequences of inbreeding in the South China tiger: a comparative study with the Amur tiger
.
Mol Ecol Resour
.
2023
;
23
(
2
):
330
47
.. .

41.

Yang
 
S
,
Lan
 
T
,
Zhang
 
Y
, et al.  
Genomic investigation of the Chinese alligator reveals wild-extinct genetic diversity and genomic consequences of their continuous decline
.
Mol Ecol Resour
.
2023
;
23
(
1
):
294
311
.. .

42.

Zhao
 
S
,
Zheng
 
P
,
Dong
 
S
, et al.  
Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation
.
Nat Genet
.
2013
;
45
(
1
):
67
71
.. .

43.

Lan
 
T
,
Yang
 
S
,
Li
 
H
, et al.  
Large-scale genome sequencing of giant pandas improves the understanding of population structure and future conservation initiatives
.
Proc Natl Acad Sci U S A
.
2024
;
121
:
e2406343121
. .

44.

Kuang
 
WM
,
Ming
 
C
,
Li
 
HP
, et al.  
The origin and population history of the endangered golden snub-nosed monkey (Rhinopithecus roxellana)
.
Mol Biol Evol
.
2019
;
36
(
3
):
487
99
.. .

45.

Hu
 
J
,
Roos
 
C
,
Lv
 
X
, et al.  
Molecular genetics supports a potential fifth Asian pangolin species (Mammalia, Pholidota, Manis)
.
Zoolog Sci
.
2020
;
37
(
6
):
538
43
.. .

46.

Pecnerova
 
P
,
Garcia-Erill
 
G
,
Liu
 
X
, et al.  
High genetic diversity and low differentiation reflect the ecological versatility of the African leopard
.
Curr Biol
.
2021
;
31
(
9
):
1862
71e5
.. .

47.

Carneiro
 
M
,
Albert
 
FW
,
Afonso
 
S
, et al.  
The genomic architecture of population divergence between subspecies of the European rabbit
.
PLoS Genet
.
2014
;
10
(
8
):
e1003519
. .

48.

Marçais
 
G
,
Kingsford
 
C
.
A fast, lock-free approach for efficient parallel counting of occurrences of k-mers
.
Bioinformatics
.
2011
;
27
(
6
):
764
70
.. .

49.

Durand
 
NC
,
Shamim
 
MS
,
Machol
 
I
, et al.  
Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments
.
Cell Syst
.
2016
;
3
(
1
):
95
98
.. .

50.

Li
 
H
.
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
.
arXiv:1303.3997v2
.
2013
; .

51.

Li
 
H
,
Durbin
 
R
.
Fast and accurate long-read alignment with Burrows–Wheeler transform
.
Bioinformatics
.
2010
;
26
(
5
):
589
95
.. .

52.

Dudchenko
 
O
,
Batra
 
SS
,
Omer
 
AD
, et al.  
De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds
.
Science
.
2017
;
356
(
6333
):
92
95
.. .

53.

Manni
 
M
,
Berkeley
 
MR
,
Seppey
 
M
, et al.  
BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes
.
Mol Biol Evol
.
2021
;
38
(
10
):
4647
54
.. .

54.

Rhie
 
A
,
Walenz
 
BP
,
Koren
 
S
, et al.  
Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies
.
Genome Biol
.
2020
;
21
(
1
):
1
27
.. .

55.

Qi
 
W
,
Lim
 
YW
,
Patrignani
 
A
, et al.  
The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features
.
Gigascience
.
2022
;
11
:
giac028
. .

56.

Marçais
 
G
,
Delcher
 
AL
,
Phillippy
 
AM
, et al.  
MUMmer4: a fast and versatile genome alignment system
.
PLoS Comput Biol
.
2018
;
14
(
1
):
e1005944
. .

57.

Xu
 
Z
,
Wang
 
H
.
LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons
.
Nucleic Acids Res
.
2007
;
35
(
Web Server issue
):
W265
68
.. .

58.

Flynn
 
JM
,
Hubley
 
R
,
Goubert
 
C
, et al.  
RepeatModeler2 for automated genomic discovery of transposable element families
.
Proc Nat Acad Sci U S A
.
2020
;
117
(
17
):
9451
57
.. .

59.

Chen
 
N
.
Using repeat Masker to identify repetitive elements in genomic sequences
.
Curr Protoc Bioinform
.
2004
;
5
(
1
):
4.10
.
1–4.10.14
. .

60.

Benson
 
G
.
Tandem repeats finder: a program to analyze DNA sequences
.
Nucleic Acids Res
.
1999
;
27
(
2
):
573
80
.. .

61.

Stanke
 
M
,
Steinkamp
 
R
,
Waack
 
S
, et al.  
AUGUSTUS: a web server for gene finding in eukaryotes
.
Nucleic Acids Res
.
2004
;
32
(
Web Server issue
):
W309
12
.. .

62.

Majoros
 
WH
,
Pertea
 
M
,
Salzberg
 
SL
.
TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders
.
Bioinformatics
.
2004
;
20
(
16
):
2878
79
.. .

63.

Korf
 
I
.
Gene finding in novel genomes
.
BMC Bioinf
.
2004
;
5
:
59
. .

64.

Kim
 
D
,
Langmead
 
B
,
Salzberg
 
SL
.
HISAT: a fast spliced aligner with low memory requirements
.
Nat Methods
.
2015
;
12
(
4
):
357
60
.. .

65.

Pertea
 
M
,
Pertea
 
GM
,
Antonescu
 
CM
, et al.  
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads
.
Nat Biotechnol
.
2015
;
33
(
3
):
290
95
.. .

66.

Mount
 
DW
.
Using the basic local alignment search tool (BLAST)
.
CSH Protoc
.
2007
;
2007
:
pdb top17
. .

67.

Campbell
 
MS
,
Holt
 
C
,
Moore
 
B
, et al.  
Genome annotation and curation using MAKER and MAKER-P
.
Curr Protoc Bioinform
.
2014
;
48
:
4.11.1
4.11.39
.. .

68.

Jones
 
P
,
Binns
 
D
,
Chang
 
H-Y
, et al.  
InterProScan 5: genome-scale protein function classification
.
Bioinformatics
.
2014
;
30
(
9
):
1236
40
.. .

69.

Lowe
 
TM
,
Eddy
 
SR
.
tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence
.
Nucleic Acids Res
.
1997
;
25
(
5
):
955
64
.. .

70.

Goel
 
M
,
Sun
 
H
,
Jiao
 
W-B
, et al.  
SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies
.
Genome Biol
.
2019
;
20
(
1
):
1
13
.. .

71.

Robinson
 
JT
,
Thorvaldsdottir
 
H
,
Turner
 
D
, et al.  
igv.Js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV)
.
Bioinformatics
.
2023
;
39
(
1
):
btac830
. .

72.

Freed
 
D
,
Aldana
 
R
,
Weber
 
JA
, et al.  
The Sentieon Genomics Tools—a fast and accurate solution to variant calling from next-generation sequence data
.
bioRxiv
.
2017
; .

73.

Purcell
 
S
,
Neale
 
B
,
Todd-Brown
 
K
, et al.  
PLINK: a tool set for whole-genome association and population-based linkage analyses
.
Am Hum Genet
.
2007
;
81
(
3
):
559
75
.. .

74.

Yang
 
J
,
Lee
 
SH
,
Goddard
 
ME
, et al.  
GCTA: a tool for genome-wide complex trait analysis
.
Am Hum Genet
.
2011
;
88
(
1
):
76
82
.. .

75.

Ortiz
 
E
. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis.
Zenodo
.
2019
. .

76.

Darriba
 
D
,
Taboada
 
GL
,
Doallo
 
R
, et al.  
jModelTest 2: more models, new heuristics and parallel computing
.
Nat Methods
.
2012
;
9
(
8
):
772
. .

77.

Nguyen
 
LT
,
Schmidt
 
HA
,
von Haeseler
 
A
, et al.  
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
.
Mol Biol Evol
.
2015
;
32
(
1
):
268
74
.. .

78.

Alexander
 
DH
,
Novembre
 
J
,
Lange
 
K
.
Fast model-based estimation of ancestry in unrelated individuals
.
Genome Res
.
2009
;
19
(
9
):
1655
64
.. .

79.

Terhorst
 
J
,
Kamm
 
JA
,
Song
 
YS
.
Robust and scalable inference of population history from hundreds of unphased whole genomes
.
Nat Genet
.
2017
;
49
(
2
):
303
9
.. .

80.

Schiffels
 
S
,
Durbin
 
R
.
Inferring human population size and separation history from multiple genome sequences
.
Nat Genet
.
2014
;
46
(
8
):
919
25
.. .

81.

Browning
 
SR
,
Browning
 
BL
.
Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
.
Am Hum Genet
.
2007
;
81
(
5
):
1084
97
.. .

82.

Dobrynin
 
P
,
Liu
 
S
,
Tamazian
 
G
, et al.  
Genomic legacy of the African cheetah, Acinonyx jubatus
.
Genome Biol
.
2015
;
16
(
1
):
1
20
.. .

83.

Danecek
 
P
,
Auton
 
A
,
Abecasis
 
G
, et al.  
The variant call format and VCFtools
.
Bioinformatics
.
2011
;
27
(
15
):
2156
58
.. .

84.

Wang
 
K
,
Li
 
M
,
Hakonarson
 
H
.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
.
Nucleic Acids Res
.
2010
;
38
(
16
):
e164
. .

85.

Cingolani
 
P
,
Platts
 
A
,
Wang
 
LL
, et al.  
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: sNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3
.
Fly
.
2012
;
6
(
2
):
80
92
.. .

86.

Grantham
 
R
.
Amino acid difference formula to help explain protein evolution
.
Science
.
1974
;
185
(
4154
):
862
64
.. .

87.

Feng
 
S
,
Fang
 
Q
,
Barnett
 
R
, et al.  
The genomic footprints of the fall and recovery of the crested ibis
.
Curr Biol
.
2019
;
29
(
2
):
340
49
.
e7
. .

88.

Davydov
 
EV
,
Goode
 
DL
,
Sirota
 
M
, et al.  
Identifying a high fraction of the human genome to be under selective constraint using GERP++
.
PLoS Comput Biol
.
2010
;
6
(
12
):
e1001025
. .

89.

Guo
 
X
,
Chen
 
F
,
Gao
 
F
, et al.  
CNSA: a data repository for archiving omics data
.
Database
.
2020
;
2020
:
baaa055
. .

90.

Chen
 
F
,
You
 
L
,
Yang
 
F
, et al.  
CNGBdb: china National GeneBank DataBase
.
Hereditas (Beijing)
.
2020
;
42
:
799
809
.. .

91.

Lan
 
T
,
Tian
 
Y
,
Shi
 
M
, et al.  Supporting data for “Enhancing Inbreeding Estimation and Global Conservation Insights through HiFi Assemblies of Chinese and Malayan Pangolin.”  
GigaScience Database
.
2024
. .

92.

Lan
 
T
,
Tian
 
Y
,
Shi
 
M
, et al.  The high-quality reference genome of Chinese pangolin (Manis pentadactyla).
GigaScience Database
.
2024
. .

93.

Lan
 
T
,
Tian
 
Y
,
Shi
 
M
, et al.  The high-quality reference genome of Malayan pangolin (Manis javanica).
GigaScience Database
.
2024
. .

Author notes

Equal contributions.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.