Chromosome-Level Genome Assembly of the Cape Cliff Lizard (Hemicordylus capensis)

Abstract Squamates represent a highly diverse and species-rich vertebrate group that is remarkably understudied from a genomic perspective. A scarcity of genomic data is particularly evident for scincomorph lizards, which encompass over 10% of all living squamates, and for which high-quality genomic resources are currently lacking. To address this knowledge gap, we present the first chromosome-level reference genome for this group, generated from a male Cape cliff lizard (Hemicordylus capensis), using highly accurate PacBio HiFi long-read sequencing data, long-range Omni-C chromosomal conformation capture data and transcriptomic data for annotation. The rHemCap1.1 genome assembly spans 2.29 Gb, with a scaffold N50 of 359.65 Mb, and includes 25,300 protein-coding genes, with a BUSCO completeness score of 95.5% (sauropsida_odb10). We have generated the most contiguous and complete chromosome-level squamate reference genome assembly publicly available to date. Furthermore, we used short-read resequencing of 35 males and females and applied a differential coverage approach to infer the sex-determination system of the species, which was previously unknown. Our results suggest this species has XX/XY sex chromosomes, representing the first evidence of sex determination in the family Cordylidae. This reference genome will help to establish this species as an evolutionary model for studying variation in body armor, a key trait in cordylids and other squamate groups. Lastly, this is the first squamate reference genome from a continental African species and, as such, represents a valuable resource not only for further evolutionary research in cordylids but also in closely related groups.


Introduction
Squamates are the most diverse and species-rich group of reptiles (Vitt and Caldwell 2014;Simões and Pyron 2021) and the second-largest extant vertebrate order. They show variation in numerous ecologically relevant traits such as limblessness (Leal and Cohn 2018), coloration (Stuart-Fox et al. 2021), and chromosome number (Waters et al. 2021). Despite this group's exceptional phenotypic diversity, evolutionary research on its members is limited by a lack of high-quality genomic resources. To date, there are only 72 squamate reference genomes publicly available on NCBI (Sayers et al. 2021, accessed December 20, 2022, which is an order of magnitude lower when compared with similarsized groups such as the order Aves (734 reference genomes) or the clade Percomorphaceae (711 reference genomes; Sayers et al. 2021, accessed December 20, 2022. African squamates are especially poorly represented, with only the Madagascar ground gecko (Hara et al. 2018) present on NCBI (Sayers et al. 2021), and there is currently no genome assembly available for a continental African squamate species.
Within squamates, the infraorder Scincomorpha includes the families Cordylidae, Gerrhosauridae, Scincidae, and Xantusiidae, forming a sister group to all other squamates excluding dibamids and gekkotans and encompassing over 10% of all living squamates (Pyron et al. 2013;Vitt and Caldwell 2014). The family Cordylidae contains 68 species (Uetz et al. 2022) widely distributed across sub-Saharan Africa, from South Africa to Ethiopia (Stanley et al. 2011;Nielsen and Colston 2014). Cordylids are ecologically and morphologically diverse and are composed of the subfamilies Platysaurinae and Cordylinae (Stanley et al. 2011). The latter have evolved viviparity and a distinct appearance, characterized by the expression of osteoderms which constitute protective body armor (Broeckhoven et al. 2016(Broeckhoven et al. , 2018. Cordyline species differ greatly in this trait, and a reduction in body armor is thought to have evolved multiple times independently within the group (Stanley et al. 2011).
Given the lack of molecular data available for cordylids, the sex-determination system of this group remains unknown. Recently, the first evidence on the sex-determination systems of Gerrhosauridae (the sister group to Cordylidae) and Xantusiidae was published, suggesting a ZZ/ZW system shared among these groups (Kostmann et al. 2020;Kratochvíl et al. 2020;Nielsen et al. 2020). However, concrete evidence on the sex-determination system in cordylids is still lacking.
Among cordylids, the Cape cliff lizard, Hemicordylus capensis, exhibits remarkable variation in body armor, including between sexes and among populations (Stanley et al. 2011;Broeckhoven et al. 2017). It is an endemic species to South Africa, with a distribution that spans the Cape Fold Mountains in the Western Cape region, where it occupies highly diverse environments (Stanley et al. 2011). As such, this species provides an ideal model to study the rapid evolution of variation in body armor, a trait that fundamentally impacts many aspects of ecological performance (Broeckhoven et al. 2018).
We integrated PacBio HiFi long-read sequencing, Omni-C long-range proximity ligation, and RNA-sequencing data to generate a high-quality chromosome-level annotated genome assembly of the Cape cliff lizard, H. capensis. Furthermore, we used Illumina short-read population sequencing data to shed light on the sex-determination system in cordylids. This reference genome is a valuable resource that will enable evolutionary genomics studies for closely related species and families.

Genome Assembly
PacBio long-read sequencing yielded over 12.5 million HiFi reads across 6 SMRT Cells 8M, for a cumulative total of 98.14 Gb (supplementary table S1, Supplementary Material online). K-mer analysis (k = 32) showed a k-mer distribution profile consistent with the theoretical diploid model (bimodal distribution) and gave an estimated genome size of 2,276,570,909 bp (supplementary fig. S1, Supplementary Material online).
We generated a genome assembly from a male H. capensis ( fig. 1a) (Waters et al. 2021), the H. capensis genome comprises both macro-(6) and microchromosomes (11), with a diploid number identical to that of other cordylids and gerrhosaurids for which there is karyologic evidence (Olmo and Odierna 1980). The final assembly had a QV score of 61.49 (probability of <1 incorrect base call per 1 Mb) and a BUSCO completeness score of 95.5% using the sauropsida_odb10 gene set. This is the most contiguous and complete publicly available chromosome-scale squamate reference genome to date (supplementary fig. S2

Mitochondrial Genome Assembly
The mitochondrial genome assembled with MitoHiFi had a final size of 17,116 bp. It consists of 23 transfer RNAs and 14 protein-coding genes and has a base composition of A = 32.68%, C = 27.24%, G = 13.5%, T = 26.58%.

Synteny Analysis
We conducted a pairwise synteny analysis between H. capensis, Phrynosoma platyrhinos, and Sphaerodactylus townsendi to investigate how conserved chromosome structure is between these distantly related species. Synteny is largely conserved along the macrochromosomes across the three species (supplementary fig. S3, Supplementary Material online), especially between H. capensis and P. platyrhinos. Notably, chromosomes 12 and 16 are homologous to chromosome 16 in S. townsendi, but corresponding syntenic regions were not identified in P. platyrhinos. Chromosome 3 is syntenic to chromosome 3 in P. platyrhinos, which in turn is homologous to the long arm of chromosome 1, and chromosomes 6 and 9 in chicken (Koochekian et al. 2022). Our results suggest that this fusion may be more ancient than previously suggested and ancestral in Unidentata rather than in Toxicofera (Deakin et al. 2016).

Identification of the sex-determination System of H. capensis
To identify the sex chromosome and infer the sexdetermination system in H. capensis, we used a differential coverage approach, comparing whole-genome sequencing data from males and females. We short-read sequenced 35 individuals with an average genome-wide coverage of 4.4X (range 1.3-11.8X; supplementary table S7, Supplementary Material online). The 12th largest chromosome showed a marked difference in coverage between males and females (Welch's t = −13.7, P < 10 −10 ; fig. 2a). This difference resulted from the female sequences not aligning to specific regions on that chromosome ( fig. 2b). In other regions of chromosome 12, females had similar, but generally slightly lower, coverage compared with males ( fig. 2b). Of the eight juveniles for which sex was unknown, four were classified as female due to lower coverage on chromosome 12 and a mapping signal identical to that of the females (supplementary table S7 average coverage) showed a bimodal distribution in females, with peaks close to 0 and 1× the genome-wide average in females, whereas males had a higher peak close to 0.5× and a lower peak close to 1× the genome-wide average ( fig. 2c). In the regions present in both males and females, heterozygosity was significantly higher in males (Welch's t = −19.4, P < 10 −18 ; fig. 2d). In the regions present only in males, average male coverage was approximately half (55%) of that on the rest of chromosome 12 (Welch's t = −427, P ≈ 0). Together, these results suggest that chromosome 12 corresponds to a Y chromosome with extensive pseudo-autosomal regions. Therefore, we conclude that H. capensis has an XX/XY sex-determination system, with males being the heterogametic sex. This result is particularly interesting given the recent identification of ZZ/ZW systems in the two other Cordylomorpha families, Gerrhosauridae and Xantusiidae (Kostmann et al. 2020;Kratochvíl et al. 2020;Nielsen et al. 2020). As in other squamate clades (Augstenová et al. 2018;Nielsen et al. 2018;Keating et al. 2021), sex-determination systems appear to be exceptionally dynamic in this group, and further studies of species-specific scincomorph sexdetermination systems are warranted.

Sample Collection, DNA Extraction, and Sequencing
The reference genome was sequenced and assembled from a wild-caught male Cape cliff lizard (rHemCap1) collected at Gifberg, Western Cape, South Africa. High-molecular weight DNA was extracted from brain, liver, and gonadal tissue using the Nanobind Tissue Big DNA kit (Circulomics) and sequenced on six PacBio Sequel IIe SMRT Cells 8M (HiFi) by Inqaba Biotechnical Industries, Pretoria, South Africa.

Omni-C Library Preparation and Sequencing
Two Omni-C libraries were prepared from blood collected from the reference male using the Dovetail Omni-C kit according to the manufacturer's protocol. The fragment size distribution and library concentration were assessed by VIB-UAntwerpen using a Fragment Analyzer (Agilent Technologies) and a Qubit Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), respectively. Paired-end 150 bp sequencing was performed by GENEWIZ on an Illumina NovaSeq platform.

RNA-Sequencing
Total RNA was extracted from five tissues (brain, gonads, heart, liver, and muscle) for the reference male and a female from the same population, using the TRIzol™ Reagent and Phasemaker™ Tubes Complete System (Invitrogen) and following the manufacturer's instructions. RNA integrity was checked using a Fragment Analyzer. Library preparation with poly(A) enrichment and paired-end 150 bp sequencing was performed by GENEWIZ on an Illumina NovaSeq platform.

Genome Annotation
To identify and mask repetitive sequences, we used RepeatModeler2 v2.0.3 (Flynn et al. 2020) for de novo identification of transposable elements. These regions were then masked along with known metazoan repeats in the assembly using RepeatMasker v4.1.2 (Smit et al. 2013), and a repeat element landscape plot was generated (supplementary fig. S4, Supplementary Material online). Following repeat annotation, we conducted gene annotation. Firstly, RNA-seq reads were filtered with fastp v0.23.2 (Chen et al. 2018) in default mode and then mapped to the genome with STAR v2.7.10a (Dobin et al. 2013). Then, gene models in the masked assembly were predicted with BRAKER2 v2.1.6 using GeneMark-EX, and AUGUSTUS v.3.4.0 , with the RNA-seq mapped reads as evidence and the pre-trained parameter set for chicken, the closest available reference taxon. TSEBRA v1.0.3 (Gabriel et al. 2021) was used with default parameters to select the best predicted gene models based on RNA-seq evidence. Next, we ran InterProScan v5.57-90.0 (Jones et al. 2014) and eggNOG mapper v2.1.9 (Cantalapiedra et al. 2021) for functional annotation. All annotation information was integrated with funannotate v.1.8.13 (https://github. com/nextgenusfs/funannotate) to produce a final annotation file. Lastly, we evaluated the annotation completeness by running BUSCO v5.2.2 in protein mode with the saurop-sida_odb10 data set.

Sex-Determination Analysis
For Illumina short-read sequencing, DNA of 35 additional individuals (20 males, 7 females, and 8 juveniles for which sex was unknown) from the same population as the reference male was extracted from tail tissue with the DNeasy Blood & Tissue Kit (QIAGEN) following the manufacturer's instructions. Library preparation and paired-end 150 bp sequencing was performed by GENEWIZ on an Illumina NovaSeq platform. Raw reads were filtered with fastp v0.23.2 (Chen et al. 2018) and aligned to the reference genome with BWA MEM v0.7.17-r1188. Coverage was calculated for each sample with SAMtools v1.14 (Danecek et al. 2021), and BEDTools v2.30.0 (Quinlan and Hall 2010) was used to count alignments per sample along 150 kb windows along the 12th chromosome with the MultiCoV function. To investigate heterozygosity differences between sexes, variants on the 12th chromosome were called with BCFtools v1.14 (Danecek et al. 2021) with minimum mapping quality 30 and minimum base quality 30. Variants were filtered with VCFtools v0.1.16 (Danecek et al. 2011) to include only sites present in at least 95% of individuals, with a minimum depth of one read and a maximum depth of 50 reads. Heterozygosity was then calculated for each individual with plink v2.00a2.3LM (Chang et al. 2015) and tested for a difference between sexes with Welch's t-test.

Supplementary material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).