Ryuhei Hatanaka, Katsunori Tamagawa, Nami Haruta, Asako Sugimoto, The impact of differential transposition activities of autonomous and nonautonomous hAT transposable elements on genome architecture and gene expression in Caenorhabditis inopinata, Genetics, Volume 227, Issue 2, June 2024, iyae052, https://doi.org/10.1093/genetics/iyae052
Abstract
Transposable elements are DNA sequences capable of moving within genomes and significantly influence genomic evolution. The nematode Caenorhabditis inopinata exhibits a much higher transposable element copy number than its sister species, Caenorhabditis elegans. In this study, we identified a novel autonomous transposable element belonging to the hAT superfamily from a spontaneous transposable element-insertion mutant in C. inopinata and named this transposon Ci-hAT1. Further bioinformatic analyses uncovered 3 additional autonomous hAT elements (Ci-hAT2, Ci-hAT3, and Ci-hAT4) along with over 1,000 copies of 2 nonautonomous miniature inverted-repeat transposable elements, mCi-hAT1 and mCi-hAT4, likely derived from Ci-hAT1 and Ci-hAT4 through internal deletion. We tracked at least 3 sequential transpositions of Ci-hAT1 over several years. However, the transposition rates of the other 3 autonomous hAT elements were lower, suggesting varying activity levels. Notably, the distribution patterns of the 2 miniature inverted-repeat transposable element families differed significantly: mCi-hAT1 was primarily located in the chromosome arms, a pattern observed in the transposable elements of other Caenorhabditis species, whereas mCi-hAT4 was more evenly distributed across chromosomes. Additionally, interspecific transcriptome analysis indicated that C. inopinata genes with upstream or intronic insertions of these miniature inverted-repeat transposable elements tend to be more highly expressed than their orthologous genes in C. elegans. These findings highlight the significant role of de-silenced transposable elements in driving the evolution of genomes and transcriptomes, leading to species-specific genetic diversity.
Introduction
Transposable elements (TEs) have the ability to move or replicate within the genome, leading to mutations and changes in genomic structure. TEs are broadly classified into 2 classes based on their transposition mechanism: Class I, or retrotransposons, which use RNA intermediates, and Class II, or DNA transposons, which use DNA intermediates for movement (Wicker et al. 2007). TEs within these classes can either be “autonomous,” having their own functional coding regions for transposases that enable independent transposition, or “nonautonomous,” lacking such coding regions and thus relying on transposases from autonomous elements for their transposition (Wicker et al. 2007).
hAT transposons, a major superfamily of Class II TEs, use a cut-and-paste mechanism for transposition and are found across a wide range of eukaryotes (Atkinson 2015). Autonomous hAT transposons feature 11–24-bp terminal inverted repeats (TIRs) at both ends and an internal open reading frame (ORF) that encodes a transposase. Noncoding sequences between the TIRs and the ORF include short subterminal repeats near the ends. Sometimes, these repeats partially overlap with the internal TIR sequences (Atkinson 2015). The sequence and number of these repeats vary among different hAT transposons and serve as transposase binding sites, aiding in the recognition of TIRs. hAT transposases, approximately 600–800 amino acids in size, possess a DDE motif originating from the RNase H domain, forming the catalytic core (Hickman et al. 2014). During the transposition process, the hAT transposase excises the transposon from its original location and inserts it into a new site, typically resulting in the creation of 8-bp target site duplications (TSDs) (Atkinson 2015).
Miniature inverted-repeat transposable elements (MITEs) are a major group of nonautonomous Class II TEs (Fattash et al. 2013). Typically less than 1,000 bp in size, MITEs are AT-rich and often present in high copy numbers within genomes. These elements are derived from autonomous TEs through deletion, thus sharing significant structural and sequence similarities with their parental elements. Each MITE possesses TIRs at both ends, which are flanked by TSDs. Since MITE transposition relies on the recognition of TIRs by the transposase of related autonomous TEs, MITEs can be categorized into existing superfamilies of Class II TEs determined by the nature of their TIRs and TSDs. MITEs belonging to the hAT superfamily have been identified in a variety of organisms. For example, in the rice genome, the active MITE families nDart, dTok, and nDaiZ have been found (Fujino et al. 2005; Moon et al. 2006; Huang et al. 2009). In the genome of the lancelet Branchiostoma floridae, an estimated 2,500 copies of the MITE family LanceleTn-1 have been identified (Osborne et al. 2006).
Around 12–16% of the genome of the model nematode Caenorhabditis elegans is composed of transposon sequences (C. elegans Sequencing Consortium 1998; Laricchia et al. 2017). The majority of these belong to the Tc1/mariner superfamily, a group of Class II TEs, and their transposition activity has been observed (Eide and Anderson 1985; Bessereau 2006). While sequences derived from hAT transposons have been identified in C. elegans and other Caenorhabditis species, there are no reports of active hAT transposons in this genus (Bigot et al. 1996; Chesney et al. 2006; Woodruff and Teterina 2020). In C. elegans, transposon activity is typically suppressed by small RNA-mediated mechanisms (Sijen and Plasterk 2003; Slotkin and Martienssen 2007; Lee et al. 2012).
Within the Caenorhabditis genus, both Class I and Class II TEs have notably expanded in Caenorhabditis inopinata, the phylogenetically closest species to C. elegans (Kanzaki et al. 2018). This species has a genome approximately 23 megabases (Mb) larger than that of C. elegans, with a protein similarity to C. elegans of 81.3%. Despite their close relationship, these 2 species show significant differences in various aspects such as reproductive modes, body sizes, and habitats. In C. inopinata, several genes are disrupted due to TE insertions, including ergo-1, which is part of the 26G small-interfering (si)RNA pathway crucial for gene silencing in C. elegans (Yigit et al. 2006; Piatek and Werner 2014). Moreover, additional genes involved in this pathway, namely eri-6/7 and eri-9, are missing in C. inopinata, suggesting a divergence in small RNA regulatory mechanisms between the 2 species (Kanzaki et al. 2018). In contrast to C. elegans, where TE insertions typically lead to gene downregulation, in C. inopinata, they appear to cause gene upregulation (Kawahara et al. 2023). This highlights distinct TE regulatory patterns in C. inopinata compared to C. elegans. However, the current activities of TEs and their impacts in C. inopinata remain largely unexplored.
In this study, we identified 4 hAT superfamily TEs within the C. inopinata genome. We found that at least 3 of these transposons are currently active. Additionally, we found that 2 hAT MITE families, likely to be deletion derivatives of 2 of the identified hAT transposons, have significantly expanded in the C. inopinata genome. Furthermore, C. inopinata genes with insertions of these TEs tend to be more highly expressed than their counterparts in C. elegans. These findings imply that the transposition and expansion of de-silenced hAT TEs may have contributed to altering gene networks, leading to the species-specific traits of C. inopinata.
Materials and methods
C. inopinata maintenance and strains
Wild-type (NKZ35) and mutant C. inopinata strains were cultured as described (Kanzaki et al. 2018; Oomura et al. 2022). The alleles used in this study are listed in Supplementary Table 6. C. inopinata strains were maintained by transferring clusters of more than 50 worms that included both males and females.
The spontaneous Cin-bli-1(Sp34_20288200)(tj120) mutant was initially identified in the NKZ35 laboratory culture. Its Bli phenotype was detectable only in adult males and females. To establish the Cin-bli-1(tj120) mutant line, the original Bli female adult was first crossed with wild-type males to increase their numbers. After at least 2 generations, adult Bli males and L4 sibling females (a mix of +/+, Cin-bli-1/+, and Cin-bli-1/Cin-bli-1) were transferred to a new plate. Several days later, Bli gravid adult females (Cin-bli-1/Cin-bli-1) were transferred to a separate plate to maintain the homozygous Cin-bli-1 mutants.
Microscopy
Images of adult worms were captured on NGM plates using a Nikon SMZ1270 microscope equipped with a LEICA MC170 HD camera using the LAS-EZ software.
Homology search
To determine the original locus and distribution of Ci-hAT1 in the C. inopinata reference genome, a BLASTN search was conducted using the complete Ci-hAT1 sequence as the query. Additionally, BLASTP was performed to find homologs of the Ci-hAT1 transcript across the Caenorhabditis genus, utilizing data available in WormBase ParaSite (https://parasite.wormbase.org/index.html) (Howe et al. 2017).
Extraction of genomic DNA and single-worm PCR
For genomic DNA extraction, C. inopinata populations were washed from a 60-mm NGM agar plate with M9 buffer and collected by centrifugation. The resulting worm pellet was washed twice with M9 buffer and frozen in liquid nitrogen. The pellet was then resuspended in a 10-fold volume of lysis buffer [10-mM Tris–HCl, pH 8.5, 50-mM KCl, 2.5-mM MgCl2, 0.5% Tween 20, 0.45% NP-40 and 0.01% gelatin, 0.5 mg/ml of Proteinase K (NACALAI TESQUE) and RNase A (QIAGEN)] and incubated at 60°C for 1 h. Genomic DNA was subsequently isolated using phenol-chloroform-isoamyl alcohol extraction followed by ethanol precipitation.
For single-worm PCR, individual worms were lysed in 10-µl lysis solution (10-mM Tris–HCl, pH 8.5, 50-mM KCl, 1-mM EDTA, 0.5% Tween 20, and 0.5 mg/ml of Proteinase K) at 60°C for 1 h. Proteinase K was inactivated by heating at 95°C for 15 min, and the lysate was then directly used for PCR. All PCRs were performed using KOD One PCR Master Mix (TOYOBO, Japan). The primers used for each PCR are listed in Supplementary Table 7. PCR product was purified using either Wizard SV Gel and PCR Clean-Up System (Promega) or QIAquick Gel Extraction Kit (QIAGEN) for sequence analysis.
Bioinformatic characterization of hAT transposons in C. inopinata
Consensus TSDs and TIRs for Ci-hAT1, Ci-hAT2, Ci-hAT3, and Ci-hAT4 on the C. inopinata reference genome were identified manually, based on the coding regions of the respective transcripts. For repeat sequence detection, dot plot alignments were utilized (window size: 10, stringency: 8). The GC contents were calculated using a window size of 25 bp with a step size of 10 bp. Domain analysis of transposases and sequence alignment were conducted using the InterPro search tool (https://www.ebi.ac.uk/interpro/search/sequence/) and Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), respectively. The expression levels of each transposase were determined using TPM normalization from RNA-sequencing data retrieved from GenBank (PRJDB14254).
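The sliding-window GC calculation described above can be sketched as follows. This is a minimal illustration rather than the tool used in the study: the function name `gc_content_windows` is hypothetical, and skipping a trailing partial window is an assumption.

```python
def gc_content_windows(seq, window=25, step=10):
    """Return (window_start, gc_fraction) for each full window along seq.

    Assumption: trailing positions that do not fill a complete window
    are skipped; the original analysis may have handled them differently.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc_count = sum(base in "GC" for base in chunk)
        results.append((start, gc_count / window))
    return results


# Toy sequence: 25 G followed by 10 A gives two full windows.
profile = gc_content_windows("G" * 25 + "A" * 10)
```

With the 25-bp window and 10-bp step from the text, AT-rich subterminal regions would appear as runs of windows with low GC fractions.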
Determination of Ci-hAT1 insertion sites by inverse PCR
Insertion sites of Ci-hAT1 were determined using inverse PCR, a method adapted from previous studies (Boulin and Bessereau 2007). Genomic DNA extracted from C. inopinata wild-type or Bli mutant populations was digested with SphI (New England Biolabs), a 6-base restriction endonuclease that does not cleave the Ci-hAT1 sequence. After heat inactivation of the enzyme, the digested DNA was ligated with T4 DNA ligase (NIPPON GENE). The PCR was performed in 2 steps using adjacent primers oriented in opposite directions. For the second PCR, primers binding more externally than those used in the first PCR were employed, with a 100-fold dilution of the first PCR products serving as templates. Primers for the first and second steps are listed in Supplementary Table 7. The PCR products were electrophoresed on a 1% agarose gel, followed by purification for sequence analysis. The genomic loci of Ci-hAT1 were determined by identifying the junction between the Ci-hAT1 sequence and the host genomic sequence.
Phylogenetic analysis of hAT transposases
To deduce the phylogenetic relationships of the Caenorhabditis hAT transposases, we compiled members of the hAT superfamily from a diverse range of taxa, referencing the previous studies (Supplementary Table 8) (Arensburger et al. 2011; Atkinson 2015). The protein sequences were aligned using MAFFT (Katoh et al. 2002), followed by the removal of spuriously aligned regions using TrimAl with default parameters (Capella-Gutiérrez et al. 2009). These aligned sequences were then used to construct a maximum-likelihood phylogenetic tree using IQ-TREE with 1,000 replicates for ultrafast bootstrap analysis (Nguyen et al. 2015).
Extraction and characterization of Ci-hAT1-like and Ci-hAT4-like TEs
To extract additional Ci-hAT-like TEs from the nematode genomes, we developed a stepwise approach. Positions 1–7 of the Ci-hAT TIRs are common except for the first base of the Ci-hAT4 TIR. It is predicted that the transposases directly bind to the first 5 bases of TIRs of each hAT transposon (Hickman et al. 2014). Thus, as an initial step, “CAGTGT” (bases 1–6 of the TIRs of Ci-hAT1, Ci-hAT2, and Ci-hAT3) and “TAGTGT” (bases 1–6 of the TIR of Ci-hAT4) were selected as query sequences to search for 5′-TIRs. For each identified 5′-TIR, we then searched for its reverse complement sequence “ACACTG” or “ACACTA”, respectively (representing the 3′-TIRs) within a distance of 80–5,000 bp. For the pairs of TIRs found within this specified range, when the 8-bp sequences immediately flanking the outer ends of the 5′-TIR and 3′-TIR were identical, these 8-bp sequences were considered to be TSDs. The sequence enclosed by these matched regions was categorized as a Ci-hAT1-like or Ci-hAT4-like TE. Those less than 1,100 bp in length were further classified as Ci-hAT1-like or Ci-hAT4-like MITEs, respectively.
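The stepwise search above can be illustrated with a short sketch. It is not the script used in the study: the function name `find_hat_like_elements` is hypothetical, and the 80–5,000 bp range is assumed here to mean the full length from the first base of the 5′-TIR to the last base of the 3′-TIR. Because the two TIR queries are reverse complements of each other, an element in either orientation has the same layout on the forward strand, so only one strand is scanned.

```python
def find_hat_like_elements(genome, tir5="CAGTGT", tir3="ACACTG",
                           min_len=80, max_len=5000, tsd_len=8,
                           mite_max=1100):
    """Return (start, end, tsd, is_mite) for TIR pairs flanked by identical TSDs.

    Coordinates are 0-based, end-exclusive, and span TIR to TIR.
    Assumption: min_len and max_len apply to the whole element length.
    """
    hits = []
    start = genome.find(tir5)
    while start != -1:
        if start >= tsd_len:
            left_flank = genome[start - tsd_len:start]
            pos = genome.find(tir3, start + len(tir5))
            while pos != -1:
                end = pos + len(tir3)
                length = end - start
                if length > max_len:
                    break
                # Accept the pair only if both 8-bp flanks are identical (TSD).
                if length >= min_len and genome[end:end + tsd_len] == left_flank:
                    hits.append((start, end, left_flank, length < mite_max))
                pos = genome.find(tir3, pos + 1)
        start = genome.find(tir5, start + 1)
    return hits


# Toy example: a 112-bp element flanked by the TSD GTGGAGAA.
tsd = "GTGGAGAA"
element = "CAGTGT" + "T" * 100 + "ACACTG"
toy_genome = "CCCC" + tsd + element + tsd + "CCCC"
toy_hits = find_hat_like_elements(toy_genome)
```

For Ci-hAT4-like elements, the same function would be called with `tir5="TAGTGT"` and `tir3="ACACTA"`.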
The consensus sequence patterns of TSDs and TIRs of Ci-hAT-like MITEs were visualized using WebLogo 3.0 (Crooks et al. 2004). To identify sequence similarities within Ci-hAT-like MITEs and the full-length parental transposons, BLAST was used (Altschul et al. 1990). BLASTN searches, with a word size 6 and the sequence complexity filter disabled, were conducted against the full-length parental transposons, using the sequences of the Ci-hAT-like MITEs as queries.
Distribution of MITEs and clustering of genes with MITE insertions
The distribution densities of mCi-hAT1, mCi-hAT4, and protein-coding genes across the 6 chromosomes of C. inopinata were analyzed using kernel density estimation (KDE) via the seaborn library in Python (Waskom 2021).
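To make the idea of KDE concrete, the following sketch computes a Gaussian kernel density along a chromosome using only the Python standard library. It stands in for, and is not, the seaborn call used in the study. The function name `gaussian_kde_1d`, the Scott's-rule bandwidth, and the toy insertion positions are all assumptions for illustration.

```python
import math
import statistics


def gaussian_kde_1d(positions, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate at each grid point.

    Assumption: when no bandwidth is given, Scott's rule is used
    (sample standard deviation times n ** -0.2).
    """
    n = len(positions)
    if bandwidth is None:
        bandwidth = statistics.stdev(positions) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in grid
    ]


# Hypothetical insertion positions (in Mb) clustered near one chromosome end.
toy_positions = [0.8, 1.0, 1.2, 5.0]
toy_grid = [i * 0.01 - 10 for i in range(2501)]
toy_density = gaussian_kde_1d(toy_positions, toy_grid)
```

A family enriched on chromosome arms would give density peaks near both ends of the chromosome, whereas an evenly distributed family would give a flatter curve.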
To investigate the tendencies and effects of MITE insertions, protein-coding genes in the genome were categorized based on the insertion sites of mCi-hAT1 and mCi-hAT4. These categories included coding sequences (CDS), regions within 2-kb upstream of the first CDS, intron, and regions within 2-kb downstream of the last CDS. A 2-kb region was chosen as the flanking region for the analysis, based on the observation that the cis-acting sequences are typically found within 2-kb upstream of the translational start codon, and that long-range control mechanisms for transcription are rare in C. elegans (Reinke et al. 2013).
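The categorization above can be sketched as a simple coordinate test. The gene representation is hypothetical (a list of CDS blocks plus a strand), the insertion is reduced to a single coordinate, and any gap between CDS blocks is treated as an intron. These are simplifying assumptions rather than details taken from the study.

```python
def classify_insertion(pos, cds_blocks, strand, flank=2000):
    """Classify an insertion coordinate relative to one gene.

    cds_blocks: list of (start, end) CDS intervals, 0-based and end-exclusive.
    strand: "+" or "-"; upstream and downstream are swapped on the minus strand.
    Returns "CDS", "intron", "upstream", "downstream", or None.
    """
    blocks = sorted(cds_blocks)
    first_start, last_end = blocks[0][0], blocks[-1][1]
    if any(start <= pos < end for start, end in blocks):
        return "CDS"
    if first_start <= pos < last_end:
        return "intron"
    if first_start - flank <= pos < first_start:
        return "upstream" if strand == "+" else "downstream"
    if last_end <= pos < last_end + flank:
        return "downstream" if strand == "+" else "upstream"
    return None


# Hypothetical two-exon gene on the plus strand.
toy_gene = [(5000, 5200), (5400, 5600)]
labels = [classify_insertion(p, toy_gene, "+") for p in (4000, 5100, 5300, 6000)]
```

In a whole-genome analysis, an insertion lying within 2 kb of two neighboring genes could fall into a category for each of them, so the function would be applied per gene.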
Estimation of the difference in propagation timing of the 2 MITE families
The divergence between sequences can be used to estimate the timing of insertion of MITEs, i.e. a trace of the activity of transposons in the genome (Perumal et al. 2020). The MITE sequences of mCi-hAT1 and mCi-hAT4 were aligned separately for each family with MAFFT ginsi (Katoh and Standley 2013). Consensus sequences for each MITE family were constructed by the cons program implemented in EMBOSS (Rice et al. 2000). Kimura 2-parameter distance was calculated between the consensus sequence and each MITE sequence using DiStats to estimate the level of base substitution rate per site (Kimura 1980; Astrin et al. 2016).
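For reference, the Kimura 2-parameter distance between two aligned sequences is K = −½ ln[(1 − 2P − Q)√(1 − 2Q)], where P and Q are the proportions of transition and transversion differences. A minimal sketch follows. The function name `k2p_distance` is hypothetical, and skipping alignment columns that contain gaps or ambiguous bases is an assumption; DiStats may treat such columns differently.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption: columns where either sequence has a gap or an
    ambiguous base are ignored.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One A->G transition in 10 aligned sites.
toy_distance = k2p_distance("AAAAACCCCC", "GAAAACCCCC")
```

The logarithm is undefined for highly saturated comparisons (for example when Q reaches 0.5), so such pairs would need to be excluded or handled separately.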
One-to-one orthologs
To establish the orthologous relationships among Caenorhabditis nematodes, we sourced amino acid sequences of 11 species: Caenorhabditis briggsae, C. elegans, Caenorhabditis latens, Caenorhabditis nigoni, Caenorhabditis remanei, Caenorhabditis sinica, Caenorhabditis tropicalis (obtained from WormBase ParaSite version WBPS15), Caenorhabditis niphades (Sun et al. 2022), C. inopinata (GCA_003052745.1), Caenorhabditis sp. 44, and Caenorhabditis wallacei. For each species, proteins encoded by the longest isoforms were extracted. These protein sequences were then analyzed using OrthoFinder (Emms and Kelly 2019) to identify ortholog groups. Subsequently, the 11,264 one-to-one orthologs between C. elegans and C. inopinata were selected for the following analysis.
DEGs between species
We used RNA-sequencing data retrieved from GenBank (PRJDB14254) (Kawahara et al. 2023), and adapted their method for identifying interspecific differentially expressed genes (DEGs). The dataset includes wild-type C. inopinata females and C. elegans fem-1 (hc17) mutant hermaphrodites, which are feminized, allowing comparisons of gene expression within a consistent sexual state at both the L4 larval and young adult (YA) stages. The mRNA abundance in each sample was quantified using RSEM v1.3.0 (Li and Dewey 2011) with bowtie2 v2.3.5 (Langmead and Salzberg 2012) serving as the mapping program under default parameters. For detecting DEGs, we normalized the expected gene count data relative to gene length in C. inopinata compared to C. elegans. Pair-wise comparisons between species at each developmental stage were conducted using the Wald test in DESeq2 v1.36.0 (Love et al. 2014). In addition to DESeq2's automated criteria for the P-value calculation, we applied an additional filter to exclude genes with very low expression (sum of expected count < 10), resulting in the analysis of 11,103 one-to-one orthologs. Genes with an adjusted P-value < 0.01 and an absolute log2 (Fold Change) value greater than 1 were classified as interspecific DEGs. The list of DEGs was used to investigate the correlation between MITE insertions and gene expression levels.
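The filtering and classification thresholds above can be written as a small decision rule applied to each ortholog after the Wald test. The function name `classify_deg` and the sign convention (positive log2 fold change meaning higher expression in C. inopinata) are assumptions for illustration; the direction depends on how the contrast is set up in the actual analysis.

```python
def classify_deg(log2_fold_change, padj, total_count,
                 padj_cutoff=0.01, lfc_cutoff=1.0, min_count=10):
    """Label one ortholog as "higher", "lower", "not_DEG", or "filtered".

    Assumption: a positive log2 fold change means higher expression
    in C. inopinata than in C. elegans.
    """
    # Low-expression filter (sum of expected counts < 10) or missing P-value.
    if total_count < min_count or padj is None:
        return "filtered"
    if padj < padj_cutoff and abs(log2_fold_change) > lfc_cutoff:
        return "higher" if log2_fold_change > 0 else "lower"
    return "not_DEG"


# Hypothetical result rows: (log2 fold change, adjusted P-value, summed count).
toy_rows = [(2.3, 0.001, 500), (-1.5, 0.005, 40), (0.8, 0.0001, 500), (3.0, 0.001, 5)]
toy_labels = [classify_deg(*row) for row in toy_rows]
```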
Effects of MITE insertions on interspecific differential expression of neighboring genes
To assess if orthologs with insertions of mCi-hAT1 or mCi-hAT4 are more likely to be interspecific DEGs, we conducted Fisher's exact tests using the fisher.test function in R 4.2.1 on 2 × 2 contingency tables, where orthologs were categorized based on the presence or absence of the MITE in C. inopinata and the difference in expression level between species. Orthologs with MITE insertion exclusive to C. inopinata were further divided into interspecific DEGs, either exhibiting higher or lower expression in C. inopinata compared to C. elegans. Similarly, orthologs without MITE insertions exclusive to C. inopinata were grouped. The 2-tailed tests were conducted to evaluate the significance of associations between MITE insertions in C. inopinata and variations in expression level in this species.
The above analyses were performed separately for interspecific DEGs at both the L4 larval and young adult stages. Moreover, the analyses were stratified by the site of MITE insertion (within 2-kb upstream of the CDS, intron, or within 2-kb downstream of the CDS) and by MITE type (mCi-hAT1 and mCi-hAT4). Statistical significance was set at P < 0.05.
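For readers unfamiliar with the test, a 2-tailed Fisher's exact test on a 2 × 2 table can be sketched in a few lines of Python. This is an illustrative reimplementation, not the R code used in the study. The function name `fisher_exact_two_sided` and the example counts are hypothetical. The 2-tailed P-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one, using the same small relative tolerance as R's fisher.test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]].

    Example layout: rows are orthologs with or without a MITE insertion,
    columns are DEG or not DEG.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: 8 of 10 orthologs with a MITE are DEGs, versus 1 of 6 without.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

With these toy counts, the P-value is 400/11,440 (about 0.035), which would fall below the P < 0.05 threshold stated above.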
Results
Identification of a spontaneous Bli mutant in C. inopinata
A spontaneous blistered (Bli) mutant was isolated from a laboratory culture of a wild-type C. inopinata strain, NKZ35 (Fig. 1a). This mutant exhibited blisters on the cuticle at the head in adult females and at the head and/or tail in adult males. The backcrossing of Bli worms with wild-type (WT) worms demonstrated that this Bli phenotype was caused by a single recessive mutation.

Identification of a novel active hAT transposon in C. inopinata. a) Adult female and male wild-type (WT) and blistered (Bli) mutants of C. inopinata. Arrowheads indicate blisters present on the cuticle. Scale bars represent 100 μm. b) PCR analysis of the Cin-bli-1 gene (Sp34_20288200) in the WT and Bli mutants. c) Schematic gene structure of Cin-bli-1 alleles. Compared to the reference allele (Kanzaki et al. 2018), tjTi120 and tj182 had insertions of 3,144 and 14 bp, respectively, within the third exon. Insertions are indicated in red font. d) DNA sequences of the Cin-bli-1 alleles. In the tjTi120 allele, 8-bp target site duplications (TSDs) and 18-bp terminal inverted repeats (TIRs) of the transposon Ci-hAT1 were found at both ends of the insertion. In the tj182 allele, 2 TSDs (with a single base deletion) and an additional 8-bp insertion were identified as the transposon footprint. The target site and TSDs are highlighted in red letters, and the transposon and the additional 8-bp sequence are shown in bold black letters. e) The predicted BLI-1 amino acid sequence encoded by each allele. Frameshift mutations and premature termination codons were observed in tjTi120 and tj182.
In C. elegans, several genes are known to cause the Bli phenotype (Brenner 1974; Adams et al. 2023). Sequencing the C. inopinata genomic regions corresponding to the C. elegans Bli-causing genes revealed that the C. inopinata Bli population carried 2 mutant alleles in the gene orthologous to C. elegans bli-1 (Sp34_20288200), now termed Cin-bli-1. These alleles, named tjTi120 and tj182, had insertions of 3,144 and 14 bp, respectively, in exon 3 of Cin-bli-1 (Fig. 1, b, c, and d). The C. elegans bli-1 gene is known to encode a unique cuticular collagen, essential for strut formation in the medial layer of the adult cuticle (Adams et al. 2023). The insertions in Cin-bli-1 lead to frameshifts and premature termination of the protein (Fig. 1e).
Identification of an active hAT superfamily transposon in C. inopinata
The 3,144-bp insertion in the tjTi120 allele exhibited all the features typical of the hAT superfamily of Class II transposons (Atkinson 2015), including 8-bp TSDs, TIRs at both ends (Figs. 1d and 2a), subterminal repeats, and a putative transposase ORF (Fig. 2a). Consequently, we named this active transposon in C. inopinata Ci-hAT1.

Characterization of the Ci-hAT1 transposon. a) Schematic diagram of the Ci-hAT1 transposon. Two highly conserved nucleotides and subterminal repeats in the 5′-TIR are shown in red and blue, respectively. b) Dot plot alignment of Ci-hAT1 sequences. c) Diagram showing subterminal repeats at both ends of Ci-hAT1. The repeats on the top and bottom strands of Ci-hAT1 are depicted by blue circles above and below the transposon, respectively. d) GC content of Ci-hAT1 sequence. The AT-rich regions at both ends are indicated by gray squares. e) Domain organization of the predicted Ci-hAT1 transposase (Sp34_10071920). The DDE motifs are marked in red. f) Alignment of the amino acid sequences of RNase H-like domains in 8 known active hAT transposases and 6 Caenorhabditis hAT transposases. The distances between conserved blocks are indicated by the number of amino acid residues. Conserved amino acids are indicated by arrowheads. g) Transcripts per million (TPM) represent the abundance of Sp34_10071920 at both developmental stages. The transposases used in this analysis are listed in Supplementary Table 4.
Ci-hAT1 is 3,136 bp in length, with identical 18-bp TIRs at both ends. These TIRs do not resemble those of previously reported TEs, but they do contain 2 nucleotides that are highly conserved in the hAT superfamily (Warren et al. 1994): A at position 2 and G at position 5 of the TIRs (Figs. 1d and 2a). The 300-bp regions neighboring the TIRs are AT-rich and include the sequence 5′-CGAAAA-3′ repeated 42 times, which is regarded as a potential subterminal repeat required for transposase binding (Fig. 2, b, c, and d) (Kunze and Starlinger 1989; Kim et al. 2011; Hickman et al. 2014). The sole Ci-hAT1 transcript (Sp34_10071920) contains domains common to known hAT transposases: a BED-type zinc finger domain and a ribonuclease (RNase) H-like superfamily domain (Fig. 2e) (Yuan and Wessler 2011; Hickman et al. 2014). Nine amino acid residues within the RNase H-like superfamily domain, essential for the catalytic activity of the transposase (Hickman et al. 2014), are conserved in Ci-hAT1 (Fig. 2, e and f). RNA-seq analysis in C. inopinata females indicated high expression of the Ci-hAT1 transcript in both L4 larval and adult stages (Fig. 2g).
The hAT transposons initiate transposition by excising from the genome along with a single-stranded nucleotide adjacent to both 5′-ends of the transposon. The excision sites are then repaired through nonhomologous end joining (NHEJ), creating unique transposon footprint sequences (Lazarow et al. 2013). The footprint of the tj182 allele seems to be the result of a single base deletion adjacent to both Ci-hAT1 ends and the insertion of an 8-bp palindromic sequence, which is complementary to the TSD (Fig. 1d).
Transposition of Ci-hAT1 occurs frequently
Remapping to the reference genome suggested that only ∼0.5% of nucleotide positions in the reference genome carry residual heterozygosity in the inbred line (Kanzaki et al. 2018). Therefore, to identify transposon translocation events, we compared the reference genome to our laboratory stock, which was derived from the same inbred line used to determine the reference genome.
In the reference genome of C. inopinata, we located only 1 copy of the Ci-hAT1 sequence [99.9% identity (3,133 bp/3,136 bp)] in an intergenic region on chromosome 1 (Fig. 3a). In the laboratory stock from which the tjTi120 allele originated (referred to as the WT population), we found an additional genomic variation. Besides the sequence present in the reference genome sequence (Fig. 3, a and b, fragments (ii) 6,627 bp and (iii) 5,477 bp), this population also contained an allele with a 10,707-bp deletion (named tj180) that encompassed the Ci-hAT1 sequence and 2 adjacent genes, Sp34_10071910 and Sp34_10072000 (Fig. 3, a and b, fragment (i) 708 bp). Single-worm PCR analysis confirmed the WT population was a mix of homozygotes and heterozygotes for the reference allele and the tj180 allele (Supplementary Fig. 1). Further, inverse PCR analysis revealed a subpopulation in the WT population harboring a Ci-hAT1 insertion in an intergenic region on chromosome 5 (tjTi121) (Fig. 3, c and d, Supplementary Fig. 2, a, b, and c).

Frequent transposition of Ci-hAT1 within the C. inopinata genome. a) Structure of 2 alleles around the Ci-hAT1 insertion site on chromosome 1. The position and distance of each primer set are indicated. The red arrowhead below the chromatogram indicates the ligation site after the deletion of 10,707 bp in tj180. b) PCR analysis of each primer set in (a) for the genome of the wild-type population. c) Sequence variation at each Ci-hAT1 insertion site. Ci-hAT1 are shown as white pentagons, with acute angles indicating the coding orientation of the transcript. The TSDs and additional sequences are shown in red and black, respectively. The deleted region in chromosome 1 is indicated by a gray square. The populations in which each allele was observed are indicated next to the sequences. d) Predicted order of Ci-hAT1 transpositions in the C. inopinata genome.
To track the transposition events of Ci-hAT1, we examined its genomic location in the Bli population, which contains Cin-bli-1(tjTi120) and Cin-bli-1(tj182). In chromosome 5 of the Bli population, besides the Ci-hAT1-insertion allele tjTi121 identified in the WT population, we discovered an allele (tj181) missing the Ci-hAT1 sequence (Supplementary Fig. 2c). This tj181 allele had a footprint containing TSDs with 1 base deletion and a palindromic duplication complementary to the TSDs (Fig. 3c). In the subset of the Bli population, another Ci-hAT1 insertion (tjTi122) was identified in an intergenic region on chromosome 3 (Fig. 3, c and d, Supplementary Fig. 2, b and d). The absence of tjTi122 alleles in the WT population (Fig. 3c, Supplementary Fig. 2d) suggests that the Ci-hAT1 insertion into chromosome 3 occurred after its excision from the Cin-bli-1 (tjTi120) site (Fig. 3d).
Collectively, these findings suggest a sequential transposition of Ci-hAT1 from chromosome 1 in the reference genome to chromosome 5, then to chromosome 2 (the Cin-bli-1 gene), and finally to chromosome 3 (Fig. 3d).
Phylogenetic analysis of the Caenorhabditis hAT transposons
Applying criteria that include the presence of a hAT transposase-coding sequence, TIRs, and TSDs, we identified 3 more hAT transposons in the C. inopinata reference genome and named them Ci-hAT2, Ci-hAT3, and Ci-hAT4 (Fig. 4). Ci-hAT2, Ci-hAT3, and Ci-hAT4 each have distinct TIRs, measuring 10, 15, and 18 bp, respectively, which differ from the TIRs found in Ci-hAT1 (Table 1). The critical amino acid residues necessary for transposition activity are conserved in the transposases of these elements (Fig. 2f). RNA-seq analysis in C. inopinata females indicated that the transposases of these 3 transposons are expressed in both the L4 larval and adult stages (Supplementary Fig. 3a). In contrast to Ci-hAT1, the ∼300-bp subterminal regions at both ends of Ci-hAT2, Ci-hAT3, and Ci-hAT4 are not AT-rich. Instead, they contain unique repeat sequences: TCGATT, CGGTTA, and AATCGA, respectively (Fig. 4a, Supplementary Fig. 3a). In Ci-hAT3, we found a tandem repeat consisting of 13 consecutive 51-bp sequences located upstream of the transposase-coding region (Fig. 4a, Supplementary Fig. 3a). The variation in the subterminal sequences among Ci-hAT1, Ci-hAT2, Ci-hAT3, and Ci-hAT4 suggests that these transposons are unlikely to cross-mobilize due to their distinct binding requirements for transposases.

Additional identification of hAT superfamily TEs in C. inopinata. a) Dot plot alignments of the Ci-hAT2, Ci-hAT3, and Ci-hAT4 sequences. b) PCR validation of the alleles at the Ci-hAT2, Ci-hAT3, and Ci-hAT4 insertion sites in the WT population. c) Sequences of the Ci-hAT2, Ci-hAT3, and Ci-hAT4 insertion sites. Transposons are indicated by white pentagons with acute angles indicating the coding orientation of the transcript. The TSDs are indicated in red. B.I.: before insertion.
| Transposon | Transposase | Chr | TSD | 5′-TIR | 3′-TIR | Length (bp) |
|---|---|---|---|---|---|---|
| Ci-hAT1 | Sp34_10071920 | 1 | GTGGAGAA | CAGTGTGTCGTTTCGAAA | TTTCGAAACGACACACTG | 3136 |
| | | 5 | CATAGTAA | | | |
| | | 2 | CCCACCAT | | | |
| | | 3 | TTTGTAAG | | | |
| Ci-hAT2 | Sp34_40165700 | 4 | CTCACAAT | CAGTGTGTGT | ACACACACTG | 3151 |
| Ci-hAT3 | Sp34_40305600 | 4 | CCTGAAAC | CAGTGTGCTTCACCG | CGGTGAAGCACACTG | 4057 |
| Ci-hAT4 | Sp34_50247500 | 5 | GTCTCAAC | TAGTGTGTCGTGTCGATT | AATCGACACGACACACTA | 3198 |
We analyzed the genomic locations of Ci-hAT2, Ci-hAT3, and Ci-hAT4 in the reference genome of C. inopinata and compared them with the WT populations of our laboratory (Fig. 4, b and c, Supplementary Fig. 3b). For the Ci-hAT2 and Ci-hAT3 insertion loci, our analysis revealed 2 distinct sequences: one identical to the reference genome with the transposons present and another lacking the transposon and any associated footprint sequences. This suggests that the insertions of Ci-hAT2 and Ci-hAT3 into these loci occurred relatively recently. In contrast, the genomic site of Ci-hAT4 matched exactly with the reference genome, indicating a stable presence at this location.
In our survey of the 22 Caenorhabditis species with genome sequences available on WormBase ParaSite (http://parasite.wormbase.org) (Howe et al. 2017), we identified 2 genes homologous to the Ci-hAT1 transposase: CRE21737.1 from C. remanei and CSP29.g8031.t1 from C. becei. The predicted amino acid sequences of both genes include the 9 essential residues required for catalytic activity in the RNase H-like superfamily (Fig. 2f) (Hickman et al. 2014; Karakülah and Pavlopoulou 2018). However, we did not find TIRs or TSDs near these genes. This absence suggests that these regions in C. remanei and C. becei may not function as active transposons.
hAT transposons are categorically divided into 3 distinct groups: the Ac family, the Buster family, and the Tip family (Arensburger et al. 2011; Atkinson 2015). Through a maximum-likelihood phylogenetic analysis involving the 6 transposases detected in the Caenorhabditis genus (Ci-hAT1, Ci-hAT2, Ci-hAT3, Ci-hAT4, CRE21737.1, and CSP29.g8031.t1) and known hAT transposons, we found that these Caenorhabditis transposases all belong to the Ac family and form a monophyletic group. This suggests a common ancestral origin for these transposons within the ancestral Caenorhabditis species. It appears that most related species have subsequently lost this particular group of transposons, as evidenced by their current distribution and phylogeny (Fig. 5).

Phylogenetic tree of hAT superfamily transposases. Phylogenetic tree of transposase sequences from the Caenorhabditis genus and other groups of hAT elements. Sixty transposases were used to draw a maximum-likelihood tree. Bootstrap values are shown to the right of each branch. Each transposase name is colored according to the taxonomic group: red for animals, green for plants, and blue for fungi. The sequence of IpTip100 was used as the outgroup.
Two MITE families derived from Ci-hAT1 and Ci-hAT4 are highly expanded in the C. inopinata genome
MITEs are nonautonomous Class II transposable elements, typically originating from internal deletions in autonomous parent transposons (Feschotte et al. 2007; Fattash et al. 2013). These elements are characterized by having TIRs and TSDs at both ends, while lacking the coding capacity of transposases. They are generally short in length and appear in high copy numbers within genomes.
Kanzaki et al. (2018) used multiple TE identification tools to identify TEs in the C. inopinata genome, including RepeatModeler (v1.0.4, http://www.repeatmasker.org/RepeatModeler.html) and TransposonPSI (v08222010, http://transposonpsi.sourceforge.net), but this analysis did not detect Ci-hAT-like TEs in C. inopinata nor expansion of hAT TEs when compared to the C. elegans genome. Therefore, to specifically detect Ci-hAT-like TEs with higher sensitivity, we developed a step-by-step method to detect the presence of Ci-hAT-like MITEs in the C. inopinata genome. The first step was to identify sequences starting with CAGTGT (the initial 6 bases of the TIRs of Ci-hAT1, Ci-hAT2, and Ci-hAT3) or TAGTGT (the initial 6 bases of the TIRs of Ci-hAT4). These sequences served as the 5′-TIR tips, and we looked for their reverse complementary sequences within a span of 1,100 base pairs, which would represent the 3′-TIR tips. In the second step, we selected those specific sequences that were also flanked by perfectly identical 8-bp TSDs, classifying them as Ci-hAT-like MITEs.
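The two-step search described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function names, the single-strand scan, and the reading of "within a span of 1,100 base pairs" as a limit on the whole element (5′ tip through 3′ tip) are all assumptions made for the example.

```python
# Illustrative sketch only; names and the span interpretation are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_mite_candidates(genome, tip, max_span=1100, tsd_len=8):
    """Find tip ... revcomp(tip) spans flanked by identical TSDs.

    Returns a list of (start, end, tsd) tuples, where start/end are 0-based,
    end-exclusive coordinates of the candidate element (TSDs excluded).
    """
    rc_tip = reverse_complement(tip)
    hits = []
    start = genome.find(tip)
    while start != -1:
        if start >= tsd_len:  # need room for the upstream TSD
            left_tsd = genome[start - tsd_len:start]
            # Step 1: look for the 3' tip so that the whole element
            # fits inside max_span bases (assumed interpretation).
            window_end = min(len(genome), start + max_span)
            pos = genome.find(rc_tip, start + len(tip), window_end)
            while pos != -1:
                end = pos + len(rc_tip)
                right_tsd = genome[end:end + tsd_len]
                # Step 2: keep only spans with perfectly identical TSDs.
                if len(right_tsd) == tsd_len and right_tsd == left_tsd:
                    hits.append((start, end, left_tsd))
                pos = genome.find(rc_tip, pos + 1, window_end)
        start = genome.find(tip, start + 1)
    return hits
```

Calling `find_mite_candidates(chromosome_sequence, "CAGTGT")` and again with `"TAGTGT"` would correspond to the two searches in the text. A full analysis would also need to scan the opposite strand and resolve nested or overlapping candidates, which this sketch does not attempt.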
In our analysis, we identified a total of 808 Ci-hAT-like MITEs extracted using the “CAGTGT” sequence and 240 using the “TAGTGT” sequence (Fig. 6a, Supplementary Fig. 4, a and b, Supplementary Tables 1 and 2). These MITEs accounted for 0.4% of the C. inopinata genome. The majority of their TSDs conformed to the Ac pattern (5′-nTnnnnAn-3′) (Fig. 6b) (Arensburger et al. 2011). Further examination revealed that the 30-bp regions of both ends of these MITEs corresponded closely to the TIRs and subterminal repeats of Ci-hAT1 and Ci-hAT4, respectively (Fig. 6, c and d). Additionally, BLAST analysis indicated that this high homology extended into the subterminal regions of Ci-hAT1 and Ci-hAT4 (Fig. 6, e and f). These findings suggest that the majority of these Ci-hAT-like MITEs likely originated from internal deletions, possibly through a mechanism such as abortive gap repair (Rubin and Levy 1997; Capy 2021), in the parent transposons Ci-hAT1 and Ci-hAT4 (Fig. 6g). Consequently, we have designated these MITEs as miniature Ci-hAT1 (mCi-hAT1) and miniature Ci-hAT4 (mCi-hAT4).
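As a small illustration of the TSD check, the Ac pattern 5′-nTnnnnAn-3′ can be written as a regular expression: a T at position 2 and an A at position 7 of the 8-bp duplication. The helper names below are hypothetical, and treating `n` as any of A, C, G, or T is an assumption of this sketch.

```python
import re

# 5'-nTnnnnAn-3': T at position 2 and A at position 7 of an 8-bp TSD.
AC_TSD = re.compile(r"^[ACGT]T[ACGT]{4}A[ACGT]$")


def matches_ac_pattern(tsd):
    """Return True if an 8-bp TSD conforms to the Ac pattern."""
    return bool(AC_TSD.match(tsd.upper()))


def fraction_matching(tsds):
    """Fraction of TSDs in an iterable that conform to the Ac pattern."""
    tsds = list(tsds)
    if not tsds:
        return 0.0
    return sum(matches_ac_pattern(t) for t in tsds) / len(tsds)
```

Applied to the TSDs of all extracted elements, `fraction_matching` would give the proportion summarized by a sequence logo such as the one in Fig. 6b.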

Two groups of Ci-hAT-derived nonautonomous TEs. a) Histogram showing the size distribution of elements extracted from the C. inopinata genome by “CAGTGT” search (red) or “TAGTGT” search (blue). b) Sequence logos of TSDs of “CAGTGT” and “TAGTGT”-extracted elements. (c, d) Sequence logos of 30 bp at both ends of “CAGTGT” (c) and “TAGTGT”-extracted elements (d). The black logos indicate the queries. Arrows and dotted lines above the logos indicate the range of the Ci-hAT1 or Ci-hAT4 TIRs and subterminal repeats, respectively. (e, f) Graphical distribution of blast hits in Ci-hAT1 and Ci-hAT4 (E-value < 0.01). A total of 731 “CAGTGT”-extracted elements were plotted on Ci-hAT1 (e) and 237 “TAGTGT”-extracted elements were plotted on Ci-hAT4 (f). g) Comparison of Ci-hAT1 and mCi-hAT1 (left) and a hypothetical diagram of MITE formation by abortive gap repair (right). The formation of a stem-loop structure between the subterminal repeats of single-stranded Ci-hAT1 during gap repair causes only the terminal region of Ci-hAT1 to elongate in the homologous chromosomes and generate MITEs. Black triangles and gray boxes indicate TIRs and subterminal regions, respectively.
To determine if Ci-hAT-like MITEs were present in other species within the Caenorhabditis genus, we applied a similar approach. In C. elegans, C. briggsae, and C. remanei, our analysis identified 0, 41, and 97 “CAGTGT”-extracted, and 0, 0, and 3 “TAGTGT”-extracted Ci-hAT-like MITEs, respectively (Supplementary Fig. 4c). TSDs of the “CAGTGT”-extracted Ci-hAT-like MITEs in the C. briggsae and C. remanei genomes matched the Ac pattern (Arensburger et al. 2011), indicating their origins from hAT elements. However, the TIRs of these MITEs did not exhibit significant homology to the TIRs of Ci-hAT1, likely due to mutation accumulation over time (Supplementary Fig. 4d). This suggests that the ancestral species of the Caenorhabditis genus might have acquired hAT transposons through horizontal transfer, but most species within the genus appear to have lost these elements during their evolutionary history.
The chromosomal distribution of the 2 MITE families is different
The distribution of mCi-hAT1, mCi-hAT4, and protein-coding genes across the chromosome in the C. inopinata genome was analyzed using KDE (Fig. 7a). We found that the protein-coding genes were uniformly distributed along all chromosomes, indicating a widespread distribution throughout the genome. The density of mCi-hAT1 elements increased toward the ends of the chromosomes. mCi-hAT4 elements were also present on all chromosomes but exhibited 2 internal peaks or a single peak in their distribution on each chromosome (Supplementary Table 3). Notably, both types of MITEs were almost evenly distributed across the autosomes, but their presence on the X chromosome was markedly low (Fig. 7a, Supplementary Fig. 4, a and b).
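For readers unfamiliar with KDE, the following minimal sketch shows how a density profile along one chromosome can be computed from a list of element positions. It uses a plain Gaussian kernel with a user-supplied bandwidth; the kernel, the bandwidth choice, and the function names are assumptions for illustration and are not taken from the analysis reported here.

```python
import math


def gaussian_kde(positions, bandwidth):
    """Return a function giving the Gaussian kernel density estimate at x."""
    n = len(positions)
    if n == 0 or bandwidth <= 0:
        raise ValueError("need at least one position and a positive bandwidth")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per element position.
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions
        )

    return density


def density_profile(positions, chrom_length, bandwidth, n_points=200):
    """Evaluate the KDE on an even grid restricted to [0, chrom_length]."""
    kde = gaussian_kde(positions, bandwidth)
    step = chrom_length / (n_points - 1)
    grid = [i * step for i in range(n_points)]
    return grid, [kde(x) for x in grid]
```

With positions in base pairs, an arm-enriched family would give a profile that rises toward both ends of the grid, whereas a family concentrated in the chromosome center would peak internally.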

Chromosomal distribution and evolutionary distances of mCi-hAT1 and mCi-hAT4 in the C. inopinata genome. a) The top row for each chromosome plot shows the histogram and kernel density estimation (KDE) of protein-coding genes (black), the middle row shows mCi-hAT1 (red), and the bottom row shows mCi-hAT4 (blue). The x-axis is scaled in megabase pairs (Mbp), and the y-axis on the left indicates the density (×10−7) of each element for the KDE plot, while the y-axis on the right indicates the count of each element for the histogram. KDE plots beyond the data limits have been truncated. The numbers at the top of each plot indicate the total number of the corresponding elements on each chromosome. The gray lines at the bottom of the plots for mCi-hAT1 and mCi-hAT4 represent the relative chromosome lengths, and the overlapping rug plots show the position of each MITE on the chromosome. b) The distribution of evolutionary distances of 2 MITE types, mCi-hAT1 (red) and mCi-hAT4 (blue), from their consensus sequences. The evolutionary distance on the x-axis is calculated using the Kimura 2-parameter model, indicating the level of base sequence variation per site from the consensus. The right y-axis shows the number of MITE sequences corresponding to each distance in the histograms, while the left y-axis shows the density as estimated by kernel density estimation (KDE).
Further analysis was conducted to identify genes near these MITEs. Of the 808 mCi-hAT1 elements, 185 were located within 2 kb upstream of 211 genes, 207 within 2 kb downstream of 231 genes, 394 in the introns of 358 genes, and 1 in the exon of a pseudogene (Supplementary Table 4). Of the 240 mCi-hAT4 elements, 65 were within 2 kb upstream of 81 genes, 85 within 2 kb downstream of 100 genes, 96 in the introns of 95 genes, and 1 in the exon of a pseudogene (Supplementary Fig. 5, Supplementary Table 5). These results indicate that 77.5% of mCi-hAT1 and 79.2% of mCi-hAT4 elements are located within or in close proximity to (within 2 kb of) annotated genes.
To estimate the insertion times of these MITEs, we calculated the evolutionary distances between each MITE sequence and its consensus sequence for both MITE families (Fig. 7b). The majority of mCi-hAT1 elements clustered close to their consensus sequence, indicating little divergence within this family. In contrast, mCi-hAT4 showed a wide range of evolutionary distances, demonstrating significant sequence variability within this family. This suggests that mCi-hAT1, with its high copy number and limited divergence within the family, is a recently expanded, young MITE. On the other hand, mCi-hAT4, with fewer copies and greater divergence within the family, is likely to be an older MITE.
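The Kimura 2-parameter distance used for Fig. 7b corrects the observed difference between two aligned sequences for multiple substitutions, treating transitions and transversions separately. With P the proportion of sites differing by a transition and Q the proportion differing by a transversion, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal sketch follows; it assumes the MITE and its consensus are already aligned to equal length, and the function name and the skipping of gapped or ambiguous columns are choices made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Columns containing a gap or a non-ACGT symbol are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For heavily diverged pairs the logarithm arguments can become non-positive, in which case `math.log` raises `ValueError`; such pairs have no defined distance under this model.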
mCi-hAT1 or mCi-hAT4 insertions may affect gene expression
TE insertions can influence regulatory evolution and the formation of novel transcriptional networks by integrating into gene regulation within a specific species (Qiu and Köhler 2020; Carelli et al. 2022). Accordingly, we analyzed interspecific DEGs between C. elegans and C. inopinata, focusing on one-to-one orthologs. We aimed to explore whether insertions of mCi-hAT1 or mCi-hAT4 only in C. inopinata affect the transcription patterns of adjacent genes compared to those in C. elegans. As a control for expression differences arising from sexual dimorphism, RNA-seq data from feminized C. elegans hermaphrodites [fem-1(hc17)] and wild-type C. inopinata females were used.
The 11,103 one-to-one orthologs identified were categorized as DEGs with higher expression in C. inopinata, DEGs with higher expression in C. elegans, or non-DEGs at the L4 larval and young adult (YA) stages (Fig. 8a). Genes with mCi-hAT1 insertions, particularly in intronic or upstream regions, were frequently identified as DEGs with elevated expression in C. inopinata and were less commonly DEGs with higher expression in C. elegans. Similarly, a significant number of genes with upstream mCi-hAT4 insertions were DEGs with increased expression in C. inopinata. However, for genes with downstream MITE insertions, no notable association with interspecific DEGs was found (Fig. 8b, Supplementary Fig. 6). This pattern suggests that genes with upstream or intronic MITE insertions are more likely to exhibit differential upregulation in C. inopinata compared to C. elegans. These findings imply that the insertions of mCi-hAT1 and mCi-hAT4 may contribute to the divergence in gene expression between these species.

Analysis of differentially expressed genes (DEGs) between C. elegans and C. inopinata. a) Interspecific DEGs in the one-to-one orthologs between C. elegans and C. inopinata at 2 developmental stages. Orange dots indicate interspecific DEGs highly expressed in C. inopinata (ino), yellow dots indicate interspecific DEGs highly expressed in C. elegans (ele), and gray dots indicate non-DEGs. b) The proportion of interspecific DEGs at the young adult stage with or without MITEs (mCi-hAT1 or mCi-hAT4) at each gene location (upstream, intron, and downstream). Each bar indicates genes with (with) and without (without) MITEs, and includes a number indicating the number of genes in each category. Yellow indicates interspecific DEGs highly expressed in C. elegans, orange indicates interspecific DEGs highly expressed in C. inopinata, and gray indicates all other genes (including non-DEGs). Odds ratios (OR) and P-values shown at the top of each section were calculated using Fisher's exact test to indicate statistical differences between categories; P-values < 0.05 are highlighted in red.
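To make the odds ratios and P-values in panel b concrete, the sketch below runs Fisher's exact test on a single 2×2 table, for example genes with versus without a MITE (rows) against DEGs upregulated in C. inopinata versus all other genes (columns). The two-sided definition (summing all tables no more probable than the observed one) and the use of the sample odds ratio are assumptions of this example, since the exact settings are not stated here; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of all tables with the same margins that are no more
    probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)
```

An odds ratio above 1 with a small P-value would indicate that genes carrying a MITE at that location fall into the upregulated category more often than genes without one.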
Discussion
Novel members of the hAT superfamily transposons in C. inopinata
In this study, we identified 4 novel hAT transposons, designated as Ci-hAT1, Ci-hAT2, Ci-hAT3, and Ci-hAT4, within the genome of C. inopinata (Figs. 2 and 4, Supplementary Fig. 3a and Table 1). To our knowledge, this marks the first identification of active autonomous hAT transposons within the Caenorhabditis genus. These transposons possess typical features of the hAT superfamily (Atkinson 2015), including 8-bp TSDs and a transposase ORF. However, their TIRs and subterminal repeats exhibit no significant homology to any previously identified transposons. Phylogenetic analysis indicates that these transposons form a distinct monophyletic group within the Ac family, suggesting an evolutionary lineage from a common ancestral transposon within the Caenorhabditis genus (Fig. 5).
In our laboratory cultures of C. inopinata over several years, we detected 3 sequential transpositions of Ci-hAT1 and 1 transposition each for Ci-hAT2 and Ci-hAT3 (Figs. 3, c and d, and 4, b and c). However, no transpositions were detected for Ci-hAT4 during this period. These observed transposition frequencies may reflect differing activity levels among these transposons. The lack of detected Ci-hAT4 transpositions leaves its activity status uncertain: it could be inactive or transpose at a very low frequency. Notably, the overall transposon activity in C. inopinata appears to be higher than in C. elegans, which might be linked to genetic alterations in siRNA pathways. Unlike C. elegans, C. inopinata is missing several orthologs of genes in the ERGO-1 class 26G siRNA pathway, including ergo-1, eri-6/7, eri-9, and nrde-3 (Kanzaki et al. 2018; Kawahara et al. 2023). Additionally, the C. inopinata ortholog of mut-2, a gene known to suppress transposons in C. elegans (Collins et al. 1987; Tabara et al. 1999; Chen et al. 2005; Zhang et al. 2011), has a DNA transposon insertion within its coding sequence and is expressed at a significantly lower level compared to its C. elegans counterpart (Kawahara et al. 2023). These genetic distinctions could be contributing to the heightened transposon activities observed in C. inopinata.
MITEs derived from the hAT superfamily transposons
MITEs originate from autonomous elements or conventional nonautonomous elements, typically through mechanisms such as abortive gap repair (Rubin and Levy 1997; Capy 2021). Because of their smaller sizes and higher copy numbers, MITEs can transpose and amplify more efficiently than their related autonomous elements (Yang et al. 2009). In the C. inopinata genome, this pattern is evident. We found only a single copy of each of the 4 identified Ci-hAT transposons, while the 2 MITE families, mCi-hAT1 and mCi-hAT4, presumably derived from these Ci-hAT transposons, exist in over 1,000 copies (Fig. 6). Although some MITEs can be cross-mobilized by related but non-cognate autonomous elements (Yang et al. 2009), the significant differences in the subterminal regions between these MITEs and their putative parent transposons suggest that they do not share the same autonomous elements for transposition (Figs. 2, b and d, and 4a, Supplementary Fig. 3a). Unlike Ci-hAT1 and Ci-hAT4, Ci-hAT2 and Ci-hAT3, despite being active, have no associated MITEs detected. This indicates that the ratio of MITEs to their corresponding autonomous transposons can vary markedly within the same genome. This variation of copy numbers of autonomous transposons and their MITEs is also observed in other organisms. For instance, in the Nipponbare rice strain, the DNA transposon Ping, a member of the PIF/Harbinger superfamily, is present in fewer than 7 copies, whereas associated MITEs (mPing) number around 50 copies (Monden et al. 2009). In the same strain, the autonomous Dart1 transposon of the hAT superfamily is present in 38 copies, while the corresponding MITEs (nDart1) are present in only 13 copies (Fujino et al. 2005; Shimatani et al. 2009). The reasons for such wide disparities in the ratios between autonomous elements and their nonautonomous counterparts remain unclear, underscoring the need for further research to understand the dynamics of TE transposition and proliferation.
Relationship between chromosomal distributions and activities of MITEs
The genomic distribution of TEs is influenced by a variety of factors, including the properties of the TEs themselves (Yoshida et al. 2017), recombination rate (Rizzon et al. 2002), gene density (Medstrand et al. 2002), and chromatin states (Peng and Karpen 2008). In the Caenorhabditis genus, each chromosome typically has a high gene density in its central regions and a low density in its arm regions (Kamath et al. 2003; Hillier et al. 2007; Spieth et al. 2014; Teterina et al. 2020; Woodruff and Teterina 2020). The hAT TEs, including MITEs, in this genus are commonly concentrated on the chromosome arms, consistent with the concept of background selection (Cutter and Payseur 2003; Rockman and Kruglyak 2009; Rockman et al. 2010; Andersen et al. 2012; Woodruff and Teterina 2020). In contrast to other Caenorhabditis species, where gene enrichment in central chromosomal regions is more pronounced, C. inopinata shows a less distinct pattern (Fig. 7a) (Woodruff and Teterina 2020). However, mCi-hAT1 in C. inopinata is distributed predominantly on the chromosome arms, mirroring the distribution seen in other Caenorhabditis hAT elements, and peaking at the regions with lower gene density. On the other hand, mCi-hAT4 demonstrates a more even distribution across most chromosomes (Fig. 7a).
We speculate that the distinct distribution patterns of mCi-hAT1 and mCi-hAT4 in the C. inopinata genome are influenced by the activity levels of their corresponding parental transposons. Specifically, mCi-hAT1 may have a higher amplification rate due to the active nature of its parent transposon, Ci-hAT1, while the less active Ci-hAT4 may contribute to a lower propagation rate of mCi-hAT4. Consistently, minimal sequence divergence is observed within the mCi-hAT1 family, and considering the detection of 3 recent transpositions of Ci-hAT1, it is plausible that mCi-hAT1 is currently undergoing an active TE burst phase, that is, the rapid amplification of TE copies throughout a genome (Fig. 7b) (Belyayev 2014). In contrast, the greater sequence divergence within the mCi-hAT4 family suggests that it is an older, inactive TE. Chromosome arms, known for their higher recombination rates and consequently increased frequency of single nucleotide variations (SNVs) (Koch et al. 2000; Cutter and Payseur 2003; Hillier et al. 2007; Rockman et al. 2010), tend to accumulate mutations in TEs more than central chromosomal regions. This accumulation can lead to a gradual loss of inactive MITEs, such as mCi-hAT4, in the arms, eventually resulting in their relative concentration in central chromosomal regions. This hypothesis is consistent with observed patterns of greater variation in TE insertion age at the central regions of chromosomes compared to the arms in C. inopinata (Woodruff and Teterina 2020). Therefore, it appears that a combination of factors, including TE activities, regional recombination frequencies, and background selection processes, contributes to the differing distributions of mCi-hAT1 and mCi-hAT4.
The copy numbers of both mCi-hAT1 and mCi-hAT4 were notably lower on the X chromosome compared to the autosomes in the C. inopinata genome (Fig. 7, Supplementary Fig. 4, a and b). This pattern resembles observations made for certain MITE families in C. elegans (Surzycki and Belknap 2000). The most plausible explanation for this difference is the unique chromatin structure of the X chromosome, which is affected by dosage compensation mechanisms. In C. elegans and C. briggsae, the dosage compensation complexes (DCC) bind to the hermaphrodite's 2 X chromosomes, resulting in chromatin compression and subsequently reducing gene expression by half on each X chromosome (Meyer 2022; Yang et al. 2023). Such condensed chromatin would likely present a less accessible environment for transposon integration. Supporting this hypothesis, in Drosophila melanogaster, where the X chromosome does not undergo extensive condensation (Lucchesi and Kuroda 2015), the distribution of DNA transposons shows no significant disparity between autosomes and sex chromosomes (Kaminker et al. 2002). Thus, if C. inopinata employs a gene dosage compensation mechanism similar to that of C. elegans and C. briggsae, the considerably lower presence of MITEs on the X chromosome as compared to autosomes is likely attributable to the specialized chromatin structure unique to the X chromosome.
The correlation of mCi-hAT1 and mCi-hAT4 insertions and upregulation of neighboring genes
Our interspecific comparative gene expression analysis indicated that genes with mCi-hAT1 or mCi-hAT4 insertions in their upstream regions or introns tend to exhibit higher expression in C. inopinata (Fig. 8, Supplementary Fig. 6). These findings are consistent with a previous study reporting that TE insertions correlate with upregulated gene expression in C. inopinata, in contrast to their downregulatory effect in C. elegans (Kawahara et al. 2023). The causal relationship between the increased gene expression and MITE insertions is currently unclear. One possibility is that MITEs may be preferentially inserted into the open chromatin regions where transcription levels are already high. Alternatively, some MITE insertions may have triggered the increase of transcription levels of neighboring genes in C. inopinata. We speculate that the latter is more likely because MITE distribution is concentrated on chromosomal arms, regions typically associated with lower expression levels.
The interspecific differences in transcription levels associated with MITE insertions may be linked to the different status of transposon silencing. In C. elegans, TEs are effectively silenced, primarily through small RNA-dependent mechanisms (Malone and Hannon 2009). However, this level of silencing is not observed in C. inopinata, in which several siRNA pathway genes present in C. elegans, such as ergo-1, eri-6/7, eri-9, and nrde-3, are notably disrupted by TE insertions or absent (Kanzaki et al. 2018; Kawahara et al. 2023). This absence of part of small RNA pathways could directly or indirectly influence transposon silencing mechanisms. In C. elegans, the robust siRNA-dependent transposon silencing mechanisms tend to suppress the expression of genes harboring TEs (Guang et al. 2008; Padeken et al. 2021). In contrast, in C. inopinata, newly inserted TEs are not subjected to such stringent silencing, potentially leading to the upregulation of adjacent gene expression.
Insertions of mCi-hAT1 and mCi-hAT4 into the upstream region of genes or their introns were associated with increased gene expression, whereas insertions into downstream regions were not (Fig. 8, Supplementary Fig. 6). It has been shown that TEs can introduce novel regulatory sequences that affect the expression levels of adjacent genes (Chuong et al. 2017). For instance, in C. elegans, the m1m2 motif in the TIR of CELE2, a species-specific MITE, has been co-opted as a germline-specific promoter (Carelli et al. 2022). Similarly, in C. inopinata, mCi-hAT1 and mCi-hAT4 might have been integrated into the transcriptional network as new cis-regulatory elements, thereby influencing gene expression.
In summary, our study has identified active hAT transposons and MITEs in C. inopinata. The distribution of each MITE family appears to be influenced by its transposition activity. MITEs located in the upstream gene regions and introns tend to be associated with upregulated gene expression, compared to the corresponding genes in C. elegans. It is conceivable that the de-silenced TEs in C. inopinata contributed to its distinct phenotypic traits among the known Caenorhabditis species, including C. elegans. These findings provide further insights into the involvement of TEs in shaping genome architecture and transcriptional networks across different species.
Data availability
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, tables, figures, and Supplementary material.
Supplemental material available at GENETICS online.
Acknowledgments
We thank Prof. Taisei Kikuchi, Prof. Takashi Makino, and members of the Sugimoto lab for helpful discussions. Some strains were provided by the CGC, which is funded by the NIH Office of Research Infrastructure Programs (P40 OD010440).
Funding
This work was supported by Japan Science and Technology Agency (JST), Core Research for Evolutionary Science and Technology (CREST) Grant Number JPMJCR18S7, Japan (AS), and JST, Support for Pioneering Research Initiated by the Next Generation (SPRING), Grant Number JPMJSP2114 (RH).
Literature cited
Author notes
Conflicts of interest. The author(s) declare no conflicts of interest.