Characterization of a pericentric inversion in plateau fence lizards (Sceloporus tristichus): evidence from chromosome-scale genomes

Abstract Spiny lizards in the genus Sceloporus are a model system among squamate reptiles for studies of chromosomal evolution. While most pleurodont iguanians retain an ancestral karyotype formula of 2n = 36 chromosomes, Sceloporus exhibits substantial karyotype variation ranging from 2n =  22 to 46 chromosomes. We present two annotated chromosome-scale genome assemblies for the Plateau Fence Lizard (Sceloporus tristichus) to facilitate research on the role of pericentric inversion polymorphisms on adaptation and speciation. Based on previous karyotype work using conventional staining, the S. tristichus genome is characterized as 2n =  22 with six pairs of macrochromosomes and five pairs of microchromosomes and a pericentric inversion polymorphism on chromosome 7 that is geographically variable. We provide annotated, chromosome-scale genomes for two lizards located at opposite ends of a dynamic hybrid zone that are each fixed for different inversion polymorphisms. The assembled genomes are 1.84–1.87 Gb (1.72 Gb for scaffolds mapping to chromosomes) with a scaffold N50 of 267.5 Mb. Functional annotation of the genomes resulted in ∼15K predicted gene models. Our assemblies confirmed the presence of a 4.62-Mb pericentric inversion on chromosome 7, which contains 62 annotated coding genes with known functions. In addition, we collected population genomics data using double digest RAD-sequencing for 44 S. tristichus to estimate population structure and phylogeny across the Colorado Plateau. These new genomic resources provide opportunities to perform genomic scans and investigate the formation and spread of pericentric inversions in a naturally occurring hybrid zone.


Introduction
Chromosomal rearrangements play important roles in adaptation, divergence, and speciation (Wellenreuther and Bernatchez 2018). Pericentric chromosome inversions, which are classified as inversions that include a centromere, impose significant evolutionary and ecological constraints on genome evolution by trapping alleles on inverted chromosome segments. This greatly reduces recombination with noninverted regions and increases linkage disequilibrium within inverted regions (Kirkpatrick 2010). Furthermore, when locally adapted alleles are located inside of an inversion, such as those that are ecologically relevant, the inversion can spread to fixation in a population (Kirkpatrick and Barton 2006).
The phrynosomatid lizard genus Sceloporus is a diverse clade containing 108 species with a broad distribution across North America (Leaché et al. 2016). Differentiation in the fundamental number of chromosomes is hypothesized to be a factor responsible for driving the rapid diversification of the genus (Hall 2009;Leaché and Sites Jr. 2009). While many species have the ancestral karyotype formula of 2n ¼ 36 chromosomes, Sceloporus exhibits substantial karyotype variation (ranging from 2n ¼ 22 to 46), sex chromosome evolution (Lisachov et al. 2020), and large genome rearrangements (Leaché and Sites Jr. 2009). Existing genomic resources for Sceloporus include an annotated genome for S. undulatus (Westfall et al. 2020), a de novo assembled shotgun genome for S. occidentalis and partial genomes for 34 other species (Arthofer et al. 2015).
Here, we present two annotated chromosome-scale genome assemblies for the Plateau Fence Lizard (Sceloporus tristichus), which is 2n ¼ 22 with six pairs of macrochromosomes and five pairs of microchromosomes, a karyotype formula shared with all 10 member of the undulatus species group (Leaché et al. 2016). An early study of the karyological differences within the group using conventional staining techniques identified a distinctive pericentric inversion polymorphism on chromosome 7 that varies geographically (Cole 1972). This inversion polymorphism produces distinctly recognizable variants classified by the position of the centromere (Cole 1972), and individuals with heteromorphic pairs of distinct chromosome 7 inversions can be found in a dynamic hybrid zone (Leaché and Cole 2007). The hybrid zone occurs in Arizona's Colorado Plateau at the ecotone between Great Basin Conifer Woodland and Grassland habitats (Leaché and Cole 2007). Temporal comparisons of clinal variation in kayotypes, morphology, mitochondrial DNA, and SNPs suggests that the hybrid zone is moving, possibly as a response to habitat modification (Leaché and Cole 2007;Leaché et al. 2017). The new genomic resources, reported here, provide a framework for increasing our understanding of the formation and spread of large chromosome inversions across broad temporal and spatial scales.

Materials and methods
Sampling, library preparation, and sequencing We sequenced two female specimens of S. tristichus collected from a hybrid zone in Navajo County, Arizona, USA in 2002. The two individuals were karyotyped to determine the position of the centromere on chromosome 7 in a previous study (Leaché and Cole 2007). One specimen (NCBI BioSample ID SAMN15450893; voucher specimen AMNH 153954) is from the northern end of the hybrid zone (Holbrook; HOL), and is submetacentric at chromosome 7 (a nonmedial centromere that is closer to the middle than to either end of the chromosome). The second specimen (NCBI BioSample ID SAMN15450894; voucher specimen AMNH 154032) is from the southern end of the hybrid zone (Snowflake; SNOW), and is telocentric at chromosome 7 (a terminal centromere that is close to the end of the chromosome) (Table S1).
Genome sequencing was performed by Dovetail Genomics (Santa Barbara, CA, USA). High molecular weight genomic DNA was extracted from flash frozen liver tissues and validated for large fragment sizes (50-100 kb) using pulsed field gel electrophoresis. 10X Chromium libraries were prepared using the Chromium Genome Reagent Kit (10x Genomics, Pleasanton, CA, USA) followed by Illumina HiSeqX sequencing (150-bp paired-end reads). One Chicago (Putnam et al. 2016) and one Hi-C library (Belton et al. 2012) were prepared for each sample. Quality control for these libraries was performed by mapping reads (75-bp paired-end MiSeq reads) to draft 10x Supernova assemblies. Finally, both libraries were sequenced on an Illumina HiSeqX (150-bp paired-end reads). Raw sequencing reads are deposited in the NCBI Sequence Read Archive (BioProjectID PRJNA644186).

Genome assembly and annotation
For each individual, a draft genome was de novo assembled from 709.35 million 10X Chromium sequence reads using the SuperNova assembly pipeline. Chromosome-scale scaffolding was achieved by mapping the Chicago and Dovetail Hi-C libraries back to the 10x Supernova assembly (Weisenfeld et al. 2017) using the HiRise software pipeline (Putnam et al. 2016).
Functional annotation of the two genomes was conducted with Funannotate v1.7.2. (Palmer and Stajich 2019) on a 64-core Ubuntu machine and the Hyak NextGen cluster supercomputer at the University of Washington. Briefly, Funnanotate aligns raw RNASeq reads to a genome sequence with minimap2 and assembles them using Trinity Grabherr et al. (2011). Such assemblies are used along with PASA predictions Haas et al. (2003) to build consensus gene model predictions, and train Augustus Stanke et al. (2004) and GeneMark-ES/ET Lomsadze et al. (2005). Repetitive regions are softmasked in Funannotate and the softmasked genomes are passed to the ab initio predictors. We used the Tetrapoda BUSCO (Benchmarking Universal Single-Copy Orthologs) dataset for gene prediction (Simão et al. 2015) and specified the option -repeats2evm, which passes repeat information to Evidence Modeler. Finally, Funnanotate performs genome functional annotations using the outputs of PFAM (Finn et al. . We ran the pipeline for each of the two HiRise genome assemblies and included a raw RNASeq dataset prepared from skeletal muscle of an adult female of S. undulatus (BioSample SAMN06312743) to build consensus gene model predictions. Functional annotation was performed on the 11 scaffolds or putative chromosomes, which included 86.4% and 91.3% of the total length of the HiRise assemblies (Table 1).

Pericentric inversion polymorphism
To identify the locations of inversions on chromosome 7 between HOL and SNOW we used progressive Mauve v.2.4.0 (Darling et al. 2004). The program was run with default "seed families" and default values for all other parameters. We determined the identity of large scaffolds by comparing S. tristichus assemblies to the assembled chromosome-scale genome of the closely related species S. undulatus (Westfall et al. 2020) (Table S3 and S4). Furthermore, HiCUP v0.8.0 (Wingett et al. 2015) was used to map the Hi-C paired end raw sequencing reads of HOL against the assembled chromosome 7 of SNOW using bowtie2 v.2.4.2. Briefly, HiCUP was used to digest the SNOW genome with DpnII, truncate the HOL raw reads at the putative Hi-C ligation junction, map Hi-C raw reads to the SNOW digested reference genome, and remove Hi-C artefacts and putative PCR duplicates. A pairwise interaction matrix was computed using 500-kb windows with HOMER (Heinz et al. 2010). The matrix was normalized for sequence depth. Finally, a normalized linkage density heatmap of the first 25 Mb was generated with TreeView3 (Page 1996).

Population genomics of the plateau fence lizard
To investigate the population structure of S. tristichus, we collected ddRADseq data for 44 samples from across the Colorado Plateau, USA (Table S5). The demultiplexed data are available at the NCBI Sequence Read Archive (SAMN14488621-SAMN14488665). The SNOW genome presented here was used as a reference to guide the assembly and alignment. To estimate the number of populations (K), we used PCA and discriminant analysis of principal components (DAPCs). A concatenated phylogenetic analysis of the SNP data using maximum likelihood with an acquisition correction model was estimated using one sample of Sceloporus cowlesi to root the phylogenetic tree. Detailed methods Sequence lengths are given in megabases. L50, minimum number of scaffolds that make up 50% of the total assembly length; N50, minimum scaffold size of fragments that make up 50% of the total assembly length; L90, minimum number of scaffolds that make up 90% of the total assembly length; N90, minimum scaffold size of fragments that make up 90% of the total assembly length.
for sequencing and analysis of the population genomics data are provided in the Supplementary Materials.

Data availability
The raw data for genome sequencing underlying this article are available at the NCBI Sequence Read Archive (SRA) under Bioproject ID: PRJNA616379 (SRR12147734-SRR12147739). GenBank accession numbers for the two genome assemblies are JACSCI000000000 (HOL) and JACSCJ000000000 (SNOW). ddRADseq raw reads are under NCBI SRA accessions (SAMN14488621-SAMN14488665).

Results and discussion
Assembly and annotation  Figures S3 and S4). The top six scaffolds are on average 206 Mb longer than the subsequent five, which is expected given that S. tristichus has 6 macrochromsomes. Functional annotation of the genomes resulted in up to 14,128 and 15,469 predicted gene models for HOL and SNOW, respectively (https://doi.org/10.6084/m9.figshare.13180268). Chromosome 7 spanned 864 annotated genes for HOL, whereas 1,030 were identified for SNOW. The higher number of annotated genes in SNOW is likely the result of a higher quality and more complete assembly compared to HOL as interpreted from the assembly summary statistics in Table 1.

Pericentric inversion polymorphism
Alignment of chromosome 7 with Mauve and HOL Hi-C reads mapping to chromosome 7 in SNOW resulted in the detection of a 4.62-Mb pericentric inversion (Figures 1 and 2). A chromosomal rearrangement at chromosome 7 distinguishes populations of S. tristichus located at opposite ends of the hybrid zone and is expected to reduce recombination in hybrids with heteromorphic   pairs of chromosome 7, which are found at the center of the hybrid zone. A total of 62 annotated genes with known function are found within the region that spans the inversion (Figure 1 and Supplementary Material Table S4). Whether any of these genes are involved in local adaptation, functional enrichment, or fixation of the inversion remains to be examined. The large size of this inversion makes it more likely that it would span multiple loci involved in local adaptation. However, it also makes it more likely to suffer meiotic costs that could reduce hybrid fitness (Kirkpatrick and Barton 2006). Adding sequencing data to refine our assemblies would help to confirm the extent of the inversion, as chromosome 7 in HOL included two scaffolds. As we used S. undulatus as a reference to confirm our chromosome-scale assemblies, we expect to find 1:1 synteny with S. undulatus (Westfall et al. 2020). Therefore, the pericentric inversion could be much larger than reported here, since it is possible that either one or both of the chromosome 7 scaffolds in HOL are reversed.

Population genomics of the plateau fence lizard
Reference-based assembly of the ddRADseq data for 44 S. tristichus provided 8,652 loci mapping to scaffolds and a total of 20,299 SNPs (Supplementary Table S7). PCA analysis revealed regional population structuring across the Colorado Plateau, and DAPC results provide biologically plausible models for K ¼ 2, 3, and 4 (Supplementary Figures S5-S9). The hybrid zone samples (HOL and SNOW) belong to distinct populations under each of these models; HOL is a member of a northern population that extends north across the Colorado Plateau into Utah, Colorado, and Wyoming, while SNOW clusters with other southern samples from the Mogollon Rim and central Arizona. The phylogenetic analysis used 8,130 biallelic SNPs and supported clades that are concordant with the populations identified using PCA and DAPC (Supplementary Figure S10 and Supplementary Table S8). This analysis provides an evolutionary context for studying the formation of the hybrid zone, which appears to be the result of secondary contact and not primary divergence (Supplementary Figure S10).
These new genomic resources provide a starting point for understanding how a pericentric inversion polymorphism could cycle in frequency in a dynamic hybrid zone that is moving through time. Identifying the location, size, and gene content of this inversion polymorphism provides the necessary tools for investigating the role of inversions on local adaptation and speciation. Bins of 500 kb were used. The heatmap is visualized as log2 ratios of observed interactions relative to expected interactions. Raw Hi-C sequence reads of HOL were mapped to SNOW. Blue squares indicate high interactions that are visible at the length reported for the inversion in this study (4.62MB). Interactions along the diagonal are the highest, as expected. The blue arrow indicates the approximate number of interaction counts at the inversion.