Chromosome-level genome assembly of Phrynocephalus forsythii using third-generation DNA sequencing and Hi-C analysis

Abstract Phrynocephalus forsythii is a viviparous sand lizard that is endemic to the Tarim Basin with a broad altitudinal range of 872–3,100 m. Such variation in altitude and ecological variables can offer an opportunity to uncover genetic mechanisms of ectothermic adaptation to extreme environments at high- and low-altitude. Furthermore, the evolutionary relationship of karyotype with two different chromosome numbers (2n = 46 or 2n = 48) in the Chinese Phrynocephalus is unclear. In this study, a chromosome-level reference genome of P. forsythii was assembled. The genome assembly size was 1.82 Gb with a contig N50 length of 46.22 Mb, 20,194 protein-coding genes were predicted and 95.50% of these genes were annotated in functional public databases. After cluster contigs into chromosome level using Hi-C paired-end reads, we found that two chromosomes of P. forsythii were originated from one ancestral chromosome of species with 46 chromosomes. Comparative genomic analysis revealed that numerous characteristics associated with high- or low-altitude adaptation, including energy metabolism pathways, hypoxic adaptation, and immune, exhibit rapid changes or show signals of positive selection in the P. forsythii genome. This genome provides an excellent genome resource for the study of the karyotype evolution and ecological genomics of Phrynocephalus.


Introduction
Phrynocephalus is one of the major components of the central Asian desert fauna.Chinese Phrynocephalus species form two distinct clades: lowland oviparous and highland viviparous clade, and the karyotype between these two clades are different, which can be divided into two types according to the number.The first type is species with 46 chromosomes, all other lowland oviparous species belong to this category apart from P.mystaceus.The karyotype characteristic of these species is 2n = 46, with 11 pairs of macrochromosomes and 12 pairs of microchromosomes.The second type is species with 48 chromosomes, all highland viviparous clade species and P. mystaceus belong to this category.The karyotype characteristic of these species is 2n = 48, among which viviparous clade species have 12 pairs of macrochromosomes and 12 pairs of microchromosomes, while P. mystaceus had 11 pairs of macrochromosome and 13 pairs of microchromosomes. 1 However, the sequence of karyotype evolution among Chinese Phrynocephalus has not been resolved.
Determining the adaptability of Chinese Phrynocephalus species to extreme environments is also of scientific interest.As ectothermic, Phrynocephalus species' body temperature is highly dependent on the environment, and therefore faces physiological challenges due to inhabiting habitats with extremely high-or low-temperature environments.Chinese Phrynocephalus is widely distributed, which spans from arid regions of northwestern China to the Qinghai-Tibet Plateau (Fig. 1).][4] For lowland oviparous species that are mainly distributed in low altitude arid regions of northwestern China, high ambient temperatures are seen as mainly survival stress. 5,6In fact, Phrynocephalus has been identified as at high risk of total extinction due to thermal limits being exceeded. 7However, how Chinese Phrynocephalus species adapt to these extreme environments at the genome level has not been reported.
Phrynocephalus forsythii (family: Agamidae), a sand lizard with a unique distribution and phylogenetic location in the Chinese Phrynocephalus, possesses research value for karyotype evolution and extreme environment adaptation research.Phrynocephalus forsythii is the first divergent species in highland viviparous clade, 8 as a transitional species between lowland oviparous and highland viviparous clade of Chinese Phrynocephalus, P. forsythii genome may provide some valuable information related to the karyotypes evolution.Furthermore, P. forsythii exhibits a broad distribution with range of 872-3,100 m, this distribution pattern was due to the ancestor of P. forsythii isolated on the Qinghai-Tibet Plateau, and then spread to the Tarim Basin. 8,9Therefore, P. forsythii may highlight the ancestral high-altitude environment adaptation state of Phrynocephalus.Besides, since P. forsythii subsequently spread and colonized to the northern Tarim Basin, a arid and semi-arid region with high temperature and low precipitation, this species may also underwent an adaptation to high-temperature environment.Our previous study indicated that the genetic diversity of P. forsythii population distributed in the northern Tarim Basin was lower than that of high, hint that the adaptability of population was lower under high-temperature environment. 10,11We thus consider that molecular mechanisms related to adaptation of high ambient temperatures can also be explored in the genome of P. forsythii.
In this study, we combined genomic sequencing data from Illumina short reads, Oxford Nanopore long reads, and Hi-C data to generate a chromosome-level reference genome of P. forsythii, which has 48 chromosomes. 1We also downloaded the genome of P. perzewalskii, 12 which belongs to Chinese Phrynocephalus lowland oviparous clade and has 46 chromosomes. 1We compared the two genomes to explore the karyotype relationship between the Chinese Phrynocephalus oviparous clade species with 46 chromosomes and viviparous clade species.We further annotated the protein-coding genes in the genome and used the comparative genome approach to verify how P. forsythii adapt to the extreme environment about high-and low-temperature environment.This genome provides an excellent resource for karyotype evolution and extreme environment adaptive evolution research of Chinese Phrynocephalus.

Ethics statement
All experiments were approved by the Ethics Committee of the School of Life Sciences, Lanzhou University and followed the committee guidelines.

Sample collection and sequencing
Adult female P. forsythii was collected by hand from Korla (Xinjiang Uygur Autonomous Region of China), at an elevation of 876 m, in the north of Tarim Basin in July 2019.Fresh tissues, including the heart, liver, brain, gonad, and kidney, were separated and immediately immersed in liquid nitrogen.All tissues were subsequently stored in a drikold and returned to the lab.Liver tissue was used to extract DNA for genome sequencing, and all of these inner visceral organs were used for RNA-sequencing (RNA-seq) analysis and genome annotation.
Short-read sequencing, long-read sequencing, and Hi-C sequencing technologies were used to obtain the P. forsythii genome information.For short-read sequencing, a short insert library with a size of 350 bp was constructed using the TruSeq Library Construction Kit and was then sequenced on an Illumina NovaSeq 6000 Sequencing System (Illumina Inc., San Diego, CA, USA).Raw reads with >10% unknown bases or with >20% low-quality bases and all adaptor sequences produced in the PCR process will be removed as low-quality sequences.For long-read sequencing, another library was constructed using the genomic sequencing kit SQK-LSK108 and sequenced using the Promethion platform (Oxford Nanopore Technologies, Oxford, UK).Generated nanopore raw reads <1 kb in length or with a mean quality value of <7 were removed.The Hi-C library was constructed for the chromosome assembly of P. forsythii.Cells form liver tissue were cross-linked using the formaldehyde solution, and the DNA was cut using the HindIII enzyme.Biotinylated nucleotides were used to mark ends of restriction fragments and the T4 Ligase was used to ligate.Subsequently, the ligated DNA was sheared into a length of 350 bp and sequenced through the Illumina NovaSeq 6000 Sequencing System (Illumina Inc., San Diego, CA, USA) and the lowquality Hi-C (high-throughput chromosome conformation capture) reads were filtered using HiC-Pro v2.10.0 with default parameters.
RNA libraries were constructed to assist in genome annotation.RNA sequencing was performed on five tissues (heart, liver, brain, ovaries, and kidney) obtained during sampling.The RNA-seq library was constructed using the NEBnext Ultra-Directional RNA Library Prep kit (NEB, protocol B) and sequenced on the Illumina HiSeq 2000 platform.

Genome characteristic estimation and assembly
Illumina short reads were first used to estimate genome size.The 17-mer depth frequency distribution and total k-mer number were calculated using Jellyfish (v2.2.6) 13 and SOAPdenovo. 14Genome size was estimated using the equation: genome size = K_num/K_depth, where K_num is the total k-mer number and K_depth is the peak k-mer frequency depth of 17-mer.
Nanopore long reads were employed to assemble primary genomes using wtdbg2 using the following parameters: -k 0 -p 21 -K 1000.049988-A -S 4.000000 -s 0.050000 -g 0 -X 50.000000 -e 3 -L 0. 15 To further improve the quality and accuracy of our genome assembly, the assembled genomic sequences were first polished by Racon v1.2.1 using Nanopore long reads and then polished by Pilon v1.21 16 using Illumina short reads with the parameters of -Xmx30g-diploidchanges.After the genome was initially assembled and polished, Hi-C data were used to complete the assisted assembly of a chromosome-level genome.We used the ALLHiC scaffolding method to identify chromosomal Hi-C interactions, linking contigs using the following parameters: enz = MboI shortest_ = 150 MaxLinkDensity = 3 minREs = 50 break = Y filter = yes CLUSTER = 24 NonInformativeRatio = 0. 17

Genome assembly evaluation
Three strategies were used to evaluate the quality and accuracy of the assembled genome.The integrity of the genome was first evaluated using BUSCO (v3.0.2b). 18Subsequently, the number of conserved eukaryotes present in the assembled genome of P. forsythii was determined using the Core Eukaryotic Genes Mapping Approach (CEGMA). 19Finally, all filtered Illumina short reads were aligned to the assembled genome using BWA to evaluate genome completeness. 20

Genome annotation
Repeat sequences (tandem repeats and transposable elements [TEs]) within the genome were annotated according to different strategies.Tandem repeats were identified using Tandem Repeat Finder.TEs were identified through a combined strategy of the homolog and de novo prediction at both the DNA and protein levels.At the DNA level, homolog prediction was first implemented using the Repbase database to extract repeat regions.The de novo repetitive elements database was built using long tandem repeats (LTR)_FINDER, RepeatScout, and RepeatModeler.All repeat sequences possessing lengths >100 bp and gap Nʹ values <5% constituted the raw TE library.After prediction, a combination of the homolog alignment library and the de novo prediction library was supplied to RepeatMasker for repeat identification. 21t the protein level, RepeatProteinMask was used to search against the TE protein database.
Protein-coding genes within the entire genome were annotated using de novo prediction, homology-based prediction, and RNA-Seq-assisted prediction.For de novo prediction, Augustus v3.2.3, Geneid v1.4,GENESCAN v1.0, GlimmerHMM v3.04, 22 and SNAP 23 were used in our automated gene prediction pipeline.For the homology-based prediction, the protein sequences of Salvator merianae, Anolis carolinensis, Lacerta viridis, Lacerta bilineata, Shinisaurus crocodilurus, Pogona vitticeps, Phrynocephalus przewalskii, and Phrynocephalus vlangalii were downloaded from Ensembl and NCBI.The protein sequences of the above species were aligned to the genome of P. forsythii using TblastN v2.2.26 (E-value < 1e−5), 24 and the matching proteins were then analysed for accurate spliced alignments using GeneWise v2.4.1. 25For RNA-Seq-assisted prediction, transcriptome read assemblies were first generated using Trinity v2.1.1. 26NA-seq-based gene prediction was performed using PASA v2.0.2.Finally, a non-redundant reference gene set was generated by merging the genes predicted by the above three methods using EvidenceModeler v1.1.1. 27ene functions were first assigned using InterProScan v5.31 28 by searching against six different public protein databases, including Pfam v27.0, 29 PRINTS v42.0, PROSITE v20.97,ProDom v2006.1,SMART v6.2, and PANTHER v12.0.Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entries.Additionally, the SwissProt and NR databases were also used for functional annotation, and the gene sets were mapped to a KEGG pathway to identify the best match for each gene.
The tRNAs were predicted using tRNAscan-SE v2.0. 30For rRNAs, we chose relative species rRNA sequences as references and predicted rRNA sequences using BLAST.Small nuclear RNA (snRNA) and microRNA were annotated using the non-coding database Rfam v14.0. 31
Given that P. przewalskii genome is fragmented and synteny is generally well conserved beyond genera, we downloaded a chromosome-level genome of Phrynosoma platyrhinos form NCBI (No. PRJNA685451), which has six macrochromosomes and 11 microchromosomes. 33urthermore, P.platyrhinos shared the most recent common ancestor with P. forsythii about 150 Mya (timetree.org),this species is an evolutionarily closer reptile to P. forsythii with a chromosome-scale assembly genome.We performed synteny analysis between P. forsythii and P.platyrhinos using MCscanx.

Comparative genome analysis 2.7.1. Species-specific genes
In order to explore the adaptation characteristics of P. forsythii in both high-and low-altitude environments at the genome level, species involved in the analysis were selected according to different research purposes.For P. forsythii high-altitude adaptation characteristic analysis, P. forsythii and 8 other low-altitude species Pseudonaja textilis, Podarcis muralis, Zootoca vivipara, Anolis carolinensis, Pogona vitticeps, Phrynocephalus przewalskii, Xenopus tropicalis, and Leptobrachium leishanense were selected and grouped them as the "high-altitude adaptation group", all species in this group have a lower elevation distribution than P. forsythii (Supplementary Table S1); for P. forsythii low-altitude adaptation characteristic analysis, we selected P. forsythii and seven other high-altitude species Rhinopithecus bieti, Bos grunniens, Pseudopodoces humilis, Phrynocephalus vlangalii, Nanorana parkeri, Bufo gargarizans, and Triplophysa tibetana into a group and named it as the "low-altitude adaptation group", these species have a higher elevation distribution than P. forsythii (Supplementary Table S1).Altitudinal range of these species were obtained from https://www.gbif.org/search.
Homology relationships (including orthologs and paralogs) among species were determined using OrthoMCL. 34Protein sequences of above species were acquired from Ensembl or NCBI and were filtered to retain the longest transcript.Sequences of fewer than 50 amino acids in length were removed.These retained sequences were aligned reciprocally with the BLASTP plug-in on NCBI with a threshold of e-value <1e−5.As P. forsythii may exhibit several unique characteristics for high-or low-altitude adaption, unique paralogs, and unclustered genes were identified as species-specific genes.Furthermore, in order to eliminate the disturbance of genetic differences caused by interspecific differences, species-specific genes of P. forsythii common to the high-and low-altitude groups were removed.GO terms that were significantly over-represented among P. forsythii-specific genes were identified using BiNGO 35 in Cytoscape. 36

Expanded and contracted gene families
Single-copy gene families were retrieved from the OrthoMCL results as described above and used for phylogenetic tree construction.The protein sequences from each family were aligned using MAFFT v7.475. 37Aligned sequences were eliminated in poorly aligned positions and divergent regions using Gblocks v0.91. 38The best-fit amino acid substitution model was selected using ProtTest v3.4.2 under the corrected Akaike information criterion and Bayesian information criterion, and the JTT + I + G + F model was finally selected. 39Subsequently, maximum likelihood phylogenetic tree was constructed using RAxML v8.2.10. 40The MCMCTREE program in the PAML v4.8 was used to calculate divergence time. 41The fossil calibration records were acquired from the TIMETREE website.The MCMC process was executed for 6,000,000 iterations after a burn-in of 2,000,000 iterations.Expansion and contraction gene families were determined by CAFE v3.1.Similarly, genes common to the high-and low-altitude adaptation groups from P. forsythii expanded and contracted families were removed.

Positively selected genes
To identify potential positively selected genes (PSGs) of P. forsythii, the gene family relationships between P. forsythii and three other species (Anolts carolinensis, Phrynocephalus przewalskii, and Phrynocephalus vlangalii) were calculated using the OrthoMCL.Protein sequences of each single-copy gene families were aligned using MAFFT v7.475 with default parameters.Subsequently, CDS alignments corresponding to those protein sequences were back-translated using PAL2NAL v14, 42 and the conservative region of these CDS alignments were extracted using Gblocks v0.91.The branch-site model of CODEML in PAML v4.8 was selected to test the potential PSGs as P. forsythii was set as the foreground branch and the other species as background branches.

Genome sequencing and assembly
Genome of P. forsythii was sequenced through the Illumina platform, Oxford Nanopore platforms, and Hi-C technology and a total of ~206.92Gb,~201.90Gb, and ~208.26Gb raw data were obtained, respectively (Table 1).According to the survey analysis, the total k-mer number was 94,714,458,115, and the peak k-mer frequency depth of 17-mer was 50 (Fig. 2).Finally, the estimated genome size was 1.89G.
The initial assembly of the genome was completed using the Nanopore long reads with N50 length of 19.58 Kb.After correcting the errors of the primarily assembled genome, a size of 1.82 Gb P. forsythii genome was assembled with the contig N50 was 46.22 Mb (Table 2), this genome size was basically consistent with the estimated genome size (1.89G).
Previous karyotype analysis showed that P. forsythii had 48 chromosomes (2n = 48), 1 we thus set k = 24 in the ALLHiC program to assemble the chromosome-level genome.Finally, ~207.91 Gb Hi-C clean data were mapped to the assembled contigs and 645 assembled contigs were mapped to 24 chromosomes, accounting for 99.35% of the entire genome, with the N50 lengths were 133.26Mb (Fig. 3A, Table 2).However, Hi-C-based heatmaps of DNA interaction of each chromosome indicated that two chromosomes (chromosome 11 and chromosome 24) have stronger interaction (marked with a red arrow in Fig. 3C), we, therefore, re-assembled chromosomal-scale genome and observed that contigs were well phased into 23 homologous chromosomes when K was set to 23, with 99.36% of contigs were clustered into the 23 groups (Fig. 3B).Given that previous studies have reported that P. forsythii has 48 chromosomes, we thus used the genome with 24 chromosome-scale scaffolds assembled by Hi-C data as the final assembly and for the subsequent analysis.

Genome assembly evaluation
BUSCO assessment indicated the presence of 94.7% of complete BUSCOs, with 94.2% complete and single-copy BUSCOs, 0.5% complete and duplicated BUSCOs, 2.8% fragmented BUSCOs, and 2.5% missing BUSCOs.Using the CEGMA method, a total of 239 conserved core genes (96.37%) of the 248 core eukaryotic genes were identified within the genome.Furthermore, a total of 97.78% of clean short reads could be mapped to the genome, and these clean reads covered 98.67% of the genome.These results indicated that the assembled genome covered most of the genetic regions, further confirming the assembly quality of the P. forsythii genome.

Genome annotation
A total of 845.99 Mb of repeat sequences were obtained, and these accounted for 46.56% of the P. forsythii genome.Among the repeated sequences, 3.26% (59.16 Mb) were tandem repeats and 46.1% (837.68Mb) were TEs.In the TEs, the long interspersed nuclear elements, short interspersed nuclear elements, DNA elements (SINEs) and long-terminal repeats account for 11.79%, 0.01%, 0.36%, and 40.97%, respectively (Table 3, Fig. 4).A total of 20,194 protein-coding genes were obtained by combining the results of three different approaches.Furthermore, the distributions of CDS length, exon length, exon number, gene length, and intron length of the P. forsythii genome were comparable to those of the closely related species, with the exception of Lacerta viridis and Lacerta bilineata, which have a large number of short-length CDSs and genes.We think that this feature may be unique to Lacerta species and therefore consider P. forsythii genome annotation results were accurate (Fig. 5).Additionally, of these protein-coding genes, 19,290 genes were annotated from the functional databases and accounted for 95.50% of predicted protein-coding genes (20,194) of P. forsythii (Table 4).

Genome alignment and synteny analysis
We mapped scaffolds-scale genome of P. przewalskii to the chromosomes of P. forsythii to identify the karyotype relationship in the Chinese Phrynocephalus species.In total, 1.16 Gbp sequences (67%) of P. przewalskii genome were aligned to the genome of P. forsythii.The alignments reveal that although the two species have different numbers of chromosomes, most of P. przewalskii scaffolds are aligned with 24 chromosomes of P. forsythii, and exhibit a unique forward alignment (Fig. 6A).
Genome synteny analysis showed that most of microchromosomes of P. forsythii exhibit genome synteny with those of P. platyrhinos.However, P. forsythii chromosomes 24, another microchromosomes, together with chromosome 11 exhibited genome synteny with one macrochromosome of P.platyrhinos (Fig. 6B and 6C).

Comparative genome analysis 3.5.1 Species-specific genes
To study the adaptive characteristics of P. forsythii in high altitude environments, we explored species-specific genes of P. forsythii.A total of 19,276 gene families and 10,986 unclustered genes were clustered in the nine species (Supplementary Table S3).Furthermore, 2,227 speciesspecific genes were identified.After removing species-specific genes common with the low-altitude group, 482 speciesspecific genes were identified (Fig. 7), these genes were over-represented in some major categories, such as DNA integration (corrected P = 2.48e−04), DNA metabolic process (corrected P = 1.67e−02), calcium-ion regulated exocytosis (corrected P = 2.48e−04), calcium-ion transport (corrected P = 5.68e−03) and calcium channel complex (corrected P = 5.34e−03) (Supplementary Table S4).
For the adaptation characteristics of P. forsythii to lowaltitude environments research, a total of 20,643 gene families and 13,047 unclustered genes were clustered in the 8 species (Supplementary Table S3).Of them, 2,437 species-specific genes were identified and 692 were retained after filtered genes common with the high-altitude group (Fig. 7).These speciesspecific genes unique to low-altitude environments were over-represented in some major categories, such as immune response-activating cell surface receptor signalling pathway (corrected P = 1.25e−03),B-cell receptor signalling pathway (corrected P = 1.25e−03), immune response-activating signal transduction (corrected P = 1.25e−03),ATPase inhibitor activity (corrected P = 4.47e−02), negative regulation of ATP-dependent activity (corrected P = 4.50e−02) and regulation of ATP-dependent activity (corrected P = 4.50e−02) (Supplementary Table S5).

Expanded and contracted gene families
In the high-altitude adaptation group, a total of 159 gene families were significantly expanded and 83 gene families contracted in the P. forsythii genome (Fig. 8A).After removing common genes with low-altitude adaptation group in expanded and contracted families, 455 genes from expanded gene families and 97 genes from contracted gene families were retained (Fig. 8C and D).These genes were executed    S6).For contracted gene families, genes were mainly enriched in gene function related to signal transduction (molecular transducer activity, signalling receptor activity, transmembrane signalling receptor activity, and G protein-coupled receptor activity) (Supplementary Table S6 and Supplementary Fig. S1).
In the low-altitude adaptation group, 138 and 285 gene families were significantly expanded and contracted (Fig. 8B).After removing common genes with the high-altitude adaptation group, 560 and 105 genes were retained from expanded and contracted gene families, respectively (Fig. 8C and D).Of then, genes from expanded gene families were over-represented in energy metabolism (lipid transport, lipid localization, long-chain fatty acid transport, fatty acid transport) and immune (lysozyme activity and peptidoglycan muralytic and genes from contracted gene families were mainly enriched in sensory perception (olfactory receptor activity, G protein-coupled receptor activity, transmembrane signalling receptor activity, signalling receptor activity, and molecular transducer activity) (Supplementary Table S7 and Supplementary Fig. S1).

Karyotypes relationship of Chinese Phrynocephalus
Previous study has revealed that karyotype of Chinese Phrynocephalus species originated from the same mutual ancestor and divergence into two different numbers of types: 2n = 46 or 2n = 48. 1 However, the karyotype evolutionary relationship between them has not been elucidated.In this study, we present a chromosome-level genome of P. forsythii which could provide some new insights into the chromosome relationships of Chinese Phrynocephalus species.
Twelve microchromosomes of P. forsythii (marked in the green box in Fig. 3C) exhibit strong interactions between chromosomes based on the Hi-C heat map.However, unlike the 12 microchromosomes mentioned above, there is a single microchromosome (chromosome 24, marked with a blue box in Fig. 3C) that has a strong interaction with a macrochromosome (chromosome 11, interaction was marked with red arrow in Fig. 3C).Hi-C is a powerful tool for detecting chromosomal rearrangements, which can be detected by interaction map between chromosomes located off the diagonal of the Hi-C heat map. 43Therefore, strong interaction between chromosome 11 and chromosome 24 of P. forsythii suggested that the two chromosomes may be fissioned from a single chromosome.To test our idea, we performed the genome synteny analysis between P. forsythii and Phrynosoma platyrhinos.The result showed that P. forsythii chromosome 11 and chromosome 24 exhibited genome synteny with one chromosome of P. platyrhinos, indicating that the two chromosomes of P. forsythii belong to one in genome of P.platyrhinos (Fig. 6B and 6C).Furthermore,  genome alignment analysis between P. forsythii (2n = 48) and P. przewalskii (2n = 46) showed that although the number of chromosomes differs between the two species, 1 almost all P. przewalskii scaffolds could align to 48 chromosomes of P. forsythii, indicating that the genomes of the two species have certain homology, otherwise the extra chromosome of P. forsythii could not be aligned by P. przewalskii.Therefore, we speculate that chromosomes 11 and 24 are a single chromosome in Chinese Phrynocephalus 2n = 46 species as well as species in other genera, and this chromosome breaks into two with the speciation of P. forsythii, which may also be the cause of the extra pair of chromosomes in the viviparous clade species.Novel microchromosomes created by macrochromosome fission have also been observed in other squamate reptiles, although such fission is much less common than the fusion of macro-and microchromosomes. 44Furthermore, although the number of macro-and microchromosomes varies among squamate reptiles, sequence comparisons show that macroand microchromosomes of these species are almost wholly conserved. 45In conclusion, our research on the karyotype relationship of Chinese Phrynocephalus provides evidence that microchromosomes could originate from macrochromosome fission.

High-altitude adaptation evolution
Compared with other species distributed at lower altitudes, several gene families in P. forsythii show expansion relative to species inhabiting lower altitudes, indicative of high-altitude adaptation in P. forsythii.First, the most major adaptation is  hypoxia response.When compared with other species living at low altitudes, P. forsythii species-specific genes were enriched in many GO categories related to calcium-ion transport.The calcium ion released from the sarcoplasmic reticulum is a principal determinant of cardiac contractility. 46In hypoxic environments, P. forsythii can enhance myocardial contractility by calcium-ion transport, thereby improving blood oxygen transport.Furthermore, many GO categories related to DNA methylation or demethylation have been found in P. forsythii expanded gene families.DNA methylation and demethylation enable animals to produce hypoxic responses and adaptation by regulating the transcription of hypoxia-response genes. 47hese results indicate that hypoxia is one of the major challenges for P. forsythii to adapt to high-altitude environments.Second, P. forsythii has increased energy metabolism, probably in response to hypoxic and low environment temperatures condition.Hypoxia reduces the oxygen supply to the cells and severely limits aerobic metabolism, thereby exacerbating the energy metabolism supply of animals that are native to high-altitude environments. 48Furthermore, ectothermic animals whose body temperatures are highly dependent upon the environment face a more serious challenge at high altitudes, as their access to energy is further limited in these environments. 49For these, gene involved in aerobic metabolism are observed in expanded gene families in the P. forsythii genome.Third, P. forsythii has responded to UV radiation.Increased exposure to UV radiation in high-elevation environments can damage DNA molecules, 50 so DNA damage response and repair pathways may show functional adaptations for high-altitude animals.Accordingly, we have found some P. forsythii genes from expanded gene families were enriched in GO categories related to DNA repair.Besides, immunity may also be affected by exposure to ultraviolet (UV) radiation.UV exposure in laboratory affects antigen presentation, inflammatory response, and cytokine Production. 51,52We identified genes that were significantly enriched in immunity categories in expanded gene families of P. forsythii, which indicated that high-altitude environment also weakens the immune system of P. forsythii.In summary, P. forsythii may have adapted to high-altitude environments by enhancing oxygen transport, energy, and immunological aspects.These may also be the genetic basis for the adaptation of other species in Chinese Phrynocephalus highland viviparous clade to the Tibetan plateau environment.

Low-altitude adaptation evolution
After originating at high altitudes, P. forsythii has subsequently spread and colonized the northern Tarim Basin, 10 an area with high ambient temperatures, and the high ambient temperature is thought another form of environmental stress for P. forsythii.Our result indicated that the genome of P.forsythii may have produced adaptive characteristics for the high ambient temperature environment.Firstly, compared with other species living at high altitudes, some P. forsythii genes from expanded gene families are enriched in lysozyme activity, which functions about antibacterial defence in animals are widely recognized. 53This indicated that the high ambient temperature has reduced the survival condition of low-altitude populations of P. forsythii and that bacteria are at higher densities at higher ambient temperatures, these conditions may lead to P. forsythii being more susceptible to bacterial invasion.This result is also consistent with our previous research, in which a larger number of pathogenic bacteria were found in the low-altitude population of P. forsythii. 54Furthermore, many species-specific genes of P. forsythii compared with other species living at high altitudes are enriched in many GO categories related to immune, which also indicated P. forsythii may respond to the infestation of these bacteria.Second, P. forsythii has also increased energy metabolism in high ambient temperature environments, and some enriched GO categories in P. forsythii expanded gene families are related to metabolism.As ectothermic, P. forsythii body temperature is highly dependent on the environment.In localities where ambient temperatures exceed the critical thermal maximum, ectotherms will retreat to thermal refugia on a daily basis, and therefore reducing foraging time and, hence, energy intake is therefore limited. 7Finally, positive selection gene Hspa1a has been found in P. forsythii genome.These genes belong to heat shock proteins (Hsp), when organisms are exposed to heat, cold, or some other stresses, they can synthesize Hsps which participate in unfolding and relocalization of proteins damaged by the stresses. 55These results indicated that P. forsythii has been stressed by the high-temperature environment and responded accordingly.

Conclusions
Based on Illumina short reads, Oxford Nanopore long reads, and Hi-C data, we generated a chromosome-level reference genome with a size of 1.82 Gb for P. forsythii.This chromosome-level genome indicated that two chromosomes of P. forsythii were originated from one ancestral chromosome of species with 46 chromosomes.Comparative genomic analysis revealed that P. forsythii genome has changed for coping with the life in the high-altitude extreme environment since it originated in the high altitude.In subsequent evolutionary history, P. forsythii has spread to a high ambient temperature environment, and the P. forsythii genome has produced some relevant adaptive characteristics.The genomic resources of P. forsythii and those candidate genes related to extreme environment adaptation provide more resources and insights about understanding the evolutionary development and ecological genomics of reptiles.

Figure 1 .
Figure 1.(A) photograph of Phrynocephalus forsythii from Xinjiang Uygur Autonomous Region.(B) The distribution of the Chinese Phrynocephalus species.The yellow circle represents the distribution regions of the lowland oviparous clade species and the green circle represents the distribution regions of the highland viviparous clade species.

Figure 3 .
Figure 3. HiC-based heatmaps of DNA interaction of each chromosome.(A) Set K = 24, the red arrow indicates that two chromosomes are strongly interacting.(B) Set K = 23.(C) A close-up view of the Hi-C data marked with a black box in (A).
for GO analysis, and were over-represented in some major molecular functional categories.In expanded gene families, genes were mainly enriched in DNA methylation or demethylation (DNA demethylation, DNA methylation or demethylation, oxidative DNA demethylation, oxidative single-stranded DNA demethylation, oxidative DNA demethylase), immune (antigen receptor-mediated signalling pathway, B-cell receptor signalling pathway, immune response-activating signal transduction, immune response-activating cell surface receptor signalling pathway, histone methyltransferase activity (H3-K4 specific) and immune response-activating cell surface receptor signalling pathway), energy metabolism (ATPase complex and ATP hydrolysis activity and ATP-dependent activity), and DNA repair (DNA dealkylation involved in DNA repair, DNA repair, cellular response to DNA damage stimulus, recombinational repair, and mismatch repair) (Supplementary Table

Figure 5 .
Figure 5. Annotation quality comparison of protein-coding genes.We compared the CDS length, exon length, exon number, gene length and intron length among 9 species.

Figure 6 .
Figure 6.Genome alignment and synteny analysis.(A) Large-scale genome alignment of P. forsythii (2n=48) and P.przewalskii (2n=46).The x-axis denotes the 24 chromosomes of P. forsythii.The y-axis represents genomic scaffolds of P.przewalskii.Blue dots denote unique forward alignments.Green dots denote unique reverse alignments.(B) and (C) Synteny between genomic regions of P. forsythii and P.platyrhinos.The thin lines represent all collinear regions in the corresponding chromosomes.Bold font denotes the corresponding chromosomes in P. forsythii and P.platyrhinos, respectively.Among them, chromosome 3 of the P.platyrhinos did not assemble into one; (B Synteny between all chromosomes of P. forsythii and P. latyrhinos; (C) Synteny between P. forsythii chromosome 11 and 24 and P. latyrhinos chromosome 2.

Figure 7 .
Figure 7.The count of P. forsythii species-specific genes obtained from high-altitude adaptation group and low-altitude adaptation group respectively.

Figure 8 .
Figure 8.Comparison of gene families.Gene family expansion and contraction in each evolutionary branch of high-altitude adaptation group (A) and lowaltitude adaptation group (B).The numbers of expanded and contracted gene families are shown at the nodes in the phylogenetic tree.The divergence time is given in millions of years.(C): Venn diagrams of genes from P. forsythii expansion gene family in the high-altitude adaptation group and lowaltitude adaptation group.(D) Venn diagrams of genes from P. forsythii contraction gene family in the high-altitude adaptation group and low-altitude adaptation group.

Table 1 .
Sequencing data used for moustache toad genome assembly and annotation

Table 2 .
Assembly data for the Phrynocephalus forsythii genome

Table 3 .
Statistics on TEs in Phrynocephalus forsythii genome

Table 4 .
Functional annotation of predicted protein-coding genes