-
PDF
- Split View
-
Views
-
Cite
Cite
Lorena Ancona, Flávia A Nitta Fernandes, Roberto Biello, Andrea Chiocchio, Tiziana Castrignanò, Marco Barucca, Daniele Canestrelli, Emiliano Trucchi, Evolutionary Dynamics of Transposable Element Activity and Regulation in the Apennine Yellow-Bellied Toad (Bombina pachypus), Genome Biology and Evolution, Volume 17, Issue 4, April 2025, evaf062, https://doi.org/10.1093/gbe/evaf062
- Share Icon Share
Abstract
Transposable elements (TEs) expansion and accumulation represent one of the main drivers of genomic gigantism. Different host genome silencing mechanisms have evolved to counteract TE amplification, leading to a genomic arms race between them. Nevertheless, the evolutionary relationship between TEs and host genome silencing pathways remains poorly understood. Here, we investigate the activity of TEs and TE-silencing mechanisms in somatic and germline tissues of Bombina pachypus, a 10 Gb anuran genome. Our findings reveal a higher activity of TEs in the gonads compared to the brain, with retrotransposons as the most active class in both gonads (∼15% increased expression compared to the brain) and DNA transposons showing a 2-fold higher activity in ovaries. However, analysis of differentially expressed TEs between male and female gonads revealed a greater number of overexpressed TEs in testes (231 vs. 169), with maximum fold changes up to 22 in testes versus 8 in ovaries. This suggests a more permissive environment for TE expression in male gonads. Accordingly, increased activity of TE-silencing pathways was observed in ovaries compared to testes, with the KRAB-ZFP complex showing not only the highest overall expression levels but also a distinct ovary-specific expression pattern. Summarizing, while the higher TE activity in the male gonad may result from the lower efficiency of the KRAB-ZFP complex, the elevated activity of KRAB-ZFPs in ovaries, along with growing evidence of the functional role of TEs in the germline, suggests the existence of a broad range of host–TE dynamics going beyond the arms race model.
Transposable elements (TEs) are major drivers of genome size expansion and evolution. They constitute a significant fraction of vertebrate genomes, showing considerable variability in diversity and abundance across lineages due to their dual impact on host genomes and the different host strategies evolved to suppress TE mobilization. Nevertheless, the activity of TEs in large genomes remains poorly investigated, representing a crucial avenue for understanding the evolutionary dynamics between host genomes and their mobile elements. We explored the activity of TEs and TE-silencing mechanisms in both somatic and germline tissues of Bombina pachypus, an anuran species with a 10 Gb genome, shedding light on this complex coevolutionary relationship.
Introduction
Transposable elements (TEs) are DNA sequences that replicate and mobilize within a host genome, influencing genome architecture and evolution. Long considered junk DNA, their high evolutionary potential is increasingly being discovered (Frank et al. 2022; Choudhary et al. 2023). Transposons can have a double impact on the host genome: from the disruption of coding regions and genomic rearrangements via ectopic recombination to being an important source of genomic novelty by acting as gene regulatory elements and generating new genes through domestication (Chuong et al. 2017; Schrader and Schmitz 2019).
Intriguingly, TEs can be the drivers of genome expansion, playing a major role in genome size variation (Chalopin et al. 2015; Sotero-Caio et al. 2017). In particular, a positive correlation has been detected between the accumulation of specific TE families and species with very large genomes, with notable examples among vertebrates (Sun et al. 2012; Meyer et al. 2021). However, genomic TE abundance alone provides limited insight into these elements' past and recent amplification history, as it does not unveil the evolutionary dynamics between TEs and their host genome. In addition, while the characterization of TE diversity and abundance is now an important step in genome annotation, the expression of TEs is poorly studied in large genomes (Rogers et al. 2018; Carducci et al. 2021; Wang, Itgen et al. 2021; Wang et al. 2023). It is therefore necessary to study TE activity and its relationship to genome size to understand the dynamics of transposon expansion.
The concept of TE activity is often misunderstood as it encompasses various interconnected levels. The first one is TE transcription, which can be initiated with an internal TE promoter or as co-transcription within a hosting gene (i.e. the gene with the TE insertion), serving as a prerequisite for TE mobilization (Lanciano and Cristofari 2020). Subsequently, translation of the different TE protein domains becomes necessary to enable their enzymatic activity and successive mobilization. In this paper, we refer to TE activity as TE transcription, which represents the primary step in investigating TE activity, as it constitutes the initial phase of TE mobilization.
An inverse relationship between genome size and the proportion of transcriptionally active TE copies has been detected in previous simulation studies, suggesting that genomes with more TE copies have a lower number of active TE families, due to the competition between different elements (Kijima and Innan 2013; Boissinot and Sookdeo 2016). Moreover, the more the TE copies increase, the more the host genome is activated to control and limit their expansion, for instance, by degradation of TE transcripts through RNA silencing, thus resulting in fewer active sequences (Roessler et al. 2018). In this genomic arms race, the host genome has evolved different mechanisms to repress TE mobilization.
In the germline, transposons find a breeding ground for expansion and transmission to the next generation, facilitated by the deletion of global methylation patterns that determine cellular potency during germ-cell development. The PIWI-interacting RNA pathway is the specific safeguard in germ cells to maintain genome integrity, which acts both at the transcriptional level through methylation of target TEs and at the posttranscriptional level through degradation of target TE transcripts (Iwasaki et al. 2015). In addition to the small RNA silencing pathways, the Krüppel-associated box domain zinc finger proteins (KRAB-ZFPs), the largest family of transcription factors in vertebrates, together with the nucleosome remodeling and deacetylase (NuRD) complex, play an important role in transcriptional repression (Ecco et al. 2017). After having emerged in the last common ancestor of tetrapods, KRAB genes seem to have co-opted retrotransposon regulatory sequences to control and involve them in transcriptional regulatory networks (Bruno et al. 2019; Playfoot et al. 2021).
Nevertheless, the relationship between transposon expansion and TE silencing among vertebrates is complex, with studies often revealing different patterns and dynamics among species (Carducci et al. 2021; Liu et al. 2022; Wang et al. 2023). Particularly in large genomes, it remains unclear whether the intense activity and amplification of TEs result from a lower efficiency of silencing mechanisms and whether distinct mechanisms and dynamics operate in different tissues.
Amphibians represent one of the groups with the largest genomes in the animal kingdom, exhibiting highly variable genome sizes among the three orders (ranging from 1 Gb in the anuran Platyplectrum ornatum to 120 Gb in the urodel Necturus lewisi), and an exceptionally high abundance of repetitive elements. This makes them an excellent model for studying transposon dynamics (Wang, Itgen et al. 2021; Haley and Mueller 2022).
Here, we investigate the expression and regulation patterns of TEs in the Apennine yellow-bellied toad (Bombina pachypus), an anuran species endemic to the Italian peninsula, showing one of the largest genome sizes among the Anura order (approximately 10 Gb). B. pachypus is listed as an endangered species because of habitat loss, climate change, and vulnerability to chytridiomycosis, an amphibian-specific fungus infection (Barbieri et al. 2004; Andreone et al. 2009; Canestrelli et al. 2013). The species is known to have suffered a drastic population decline of more than 50% across its range, with the exception of Southern Italy populations that showed the highest levels of intrapopulation genetic variation (Zampiglia et al. 2019; Martino et al. 2022).
Using transcriptomic data from somatic and germline tissues, we investigate the complex dynamics acting between transposons and the host genome, aiming to answer the following questions: (i) What are the transcriptional activity patterns of TEs in the large genome of B. pachypus, and which TE families exhibit the highest expression levels? (ii) Are distinct patterns of TE expression between somatic and germline tissues determining a preferential pathway for TE expression and, likely, propagation? (iii) What is the contribution of TE-silencing gene pathways in the different tissues? Are there tissue-specific control systems to counteract TE expansion?
Results
TE Expression
We initially obtained a concatenated transcriptome of the brain and gonads tissues with 1,738,562 contigs. To improve the quality of our assembly, we filtered out contigs with an expression level of less than 1 transcript per million (TPM), obtaining a final transcriptome assembly with 102,138 contigs. BUSCO assessment showed a completeness score of 88.3%, with 51.3% represented as single-copy and 37.0% as duplicated (S, 51.3%; D, 37.0%).
Out of 102,138 contigs, 81,589 (79.88%) were annotated as repeat elements: of these repeat-annotated contigs, 67% contain up to three repeat elements annotated on the same contig, while the remaining 33% contain more than three (supplementary fig. S1, Supplementary Material online; supplementary table S1a, Supplementary Material online). Additionally, 78% of the contigs contain up to three different families of repeats, while the remaining 22% contain more than three (supplementary fig. S2, Supplementary Material online; supplementary table S1c, Supplementary Material online). The number of contigs annotated to each repeat family is shown in supplementary fig. S3, Supplementary Material online.
Our TE detection and expression pipeline identified 22 active superfamilies in the three different tissues of B. pachypus (brain, testes, and ovaries). When estimating individual TE expression in each tissue, we found higher average TE expression in ovaries and testes compared to the brain (Fig. 1; supplementary table S2, Supplementary Material online). In particular, retrotransposons were the most active class in both ovaries and testes (average normalized counts: ovaries 706,236; testes 701,833; brain 602,183), with the LTR/Gypsy family showing the highest expression, followed by the L1 family. On the other hand, DNA transposons were the most active class in ovaries (average normalized counts: ovaries 563,452; testes 325,878; brain 270,719), with the highest expression in DNA/TcMar and DNA/hAT families.

Average expression levels of TEs in B. pachypus: TE counts normalized using the TMM method. Total_DNATE, total of all DNA transposons; Total_RetroTE, total of all retrotransposons (from LTR/nc to PLE families); B, brain (blue); T, testes (green); O, ovaries (pink). Nc, unclassified elements at the family level.
The female individual BP402 showed very high peaks in some TE families, particularly for the brain tissue, which was found to be an outlier in the MDS plot (supplementary fig. S4, Supplementary Material online) despite the different normalization methods tested. In addition, we characterized TE expression without BP402 and obtained the same global expression pattern (supplementary fig. S5, Supplementary Material online). In comparison to the other samples, BP402 was sampled during a different breeding season, which may have had an effect on the observed expression patterns.
By analyzing patterns of expression between testes and ovaries, we identified a total of 400 differentially expressed TEs (logFC ≥2): 231 overexpressed in testes, while 169 overexpressed in ovaries. In particular, the testes exhibited a subset of highly overexpressed TEs with logFC values reaching 22, predominantly represented by retroelements, compared to a maximum logFC of 8 in the ovaries (Fig. 2; supplementary table S3, Supplementary Material online). Of the overexpressed transposons in the testes: 53% were retrotransposons, 42% were DNA transposons, and the remaining 5% were unknown. In the ovaries, proportions were slightly different: 38% were retrotransposons, 53% were DNA transposons, and the remaining 10% were unknown. The brain showed no significant differentially expressed TEs (DE-TEs) between male and female individuals.

Volcano plot showing the overexpressed DE-TEs between testes and ovaries: overexpressed TEs in ovaries (left) and in testes (right). Dotted lines indicate logFC ±2.
TE-Silencing Gene Pathway Expression and Dynamics
We then investigated whether there were different expression dynamics of TE-silencing gene pathways between the brain and gonads in B. pachypus.
We observed tissue-specific strategies in the gonads, with high activity observed in most of the TE-silencing genes that we analyzed, with the exception of five Argonaute pathways genes (AGO1,2,4, DICER, and DROSHA), which exhibited low expression across all tissues (Fig. 3; supplementary table S4, Supplementary Material online).

Average expression levels of key genes involved in negative regulation of TE activity in B. pachypus: TE-silencing gene counts normalized using the TMM method. B, brain (blue); T, testes (green); O, ovaries (pink).
Regarding the other Argonaute proteins, which are essential components of the RNA-induced silencing complex (RISC) (Peters and Meister 2007), the PIWI family (which is specifically expressed in germ cells), was expressed in both the ovaries and the testes. PIWIL1 and PIWIL2 were active in both gonad tissues, while PIWIL4 displayed higher expression levels in testes, being the silencing gene with the greatest fold change (5.6 logFC) between testes and ovaries (Fig. 4; supplementary table S5a, Supplementary Material online). In addition, genes involved in the primary and secondary biogenesis of piRNAs (the endonuclease PLD6, the transposon silencer MAEL, and the histone methyltransferase SETDB1) exhibited greater expression levels in gonad tissues compared to the brain. Specifically, all three of these genes together with PIWI genes showed elevated expression and were differentially expressed between brain and gonad tissues (Fig. 5; supplementary table S5b and c, Supplementary Material online).

Volcano plots showing the overexpressed DEGs (TE-silencing gene pathways) between testes and ovaries: overexpressed DEGs in ovaries (left) and testes (right). Dotted lines indicate logFC ±2.

Volcano plots showing the overexpressed DEGs (TE-silencing gene pathways) between brain and gonads: overexpressed DEGs in gonads (left; orange, ovaries; blue, testes) and brain (right; orange, brain female; blue, brain male). Dotted lines indicate logFC ±2.
Considering the transcriptional sequence-specific TE repression, the KRAB-ZFP emerged as the complex with the highest activity in the gonads of B. pachypus, showing markedly elevated expression in the ovaries. This ovary-specific high expression encompassed all the genes associated with the complex, including the scaffold protein Trim28/KAP1, the histone methyltransferase SETDB1, the heterochromatin proteins HP1(a, b, g), and the maintenance DNA methyltransferases DNMT1. All these genes, with the exception of HP1g, were significantly overexpressed in the ovaries compared to the testes. Furthermore, the de novo methyltransferase DNMT3A, although showing low expression levels, was significantly overexpressed in the testes compared to the ovaries (Fig. 4; supplementary table S5a, Supplementary Material online).
An increased expression of the NuRD complex, which is likewise linked with the KRAB-ZFP complex, was detected in gonad tissues compared to the brain. CHD5 and p66beta transcripts were an exception, showing a brain-specific higher expression compared to the gonads. When examining DEGs between testes and ovaries, the majority of NuRD genes exhibited significant overexpression in ovaries, with only three (CHD3, CHD5, and MBD2) displaying overexpression in testes (Fig. 4; supplementary table S5a, Supplementary Material online).
Discussion
While it is now well established that one of the drivers of genomic gigantism is the expansion and accumulation of transposons, the genomic abundance of TE copies alone does not capture the complex dynamics and conflicts occurring between the host genome and their mobile elements. This is because a significant portion of transposable elements in large genomes consists of fossil, truncated, and degenerated copies that can no longer be actively mobilized. Such phenomenon is common both in plants and animal genomes (Novák et al. 2020; Wang, Wang et al. 2021).
Hence, within the scope of this study, we explored the activity of transposable elements and the host genome safeguard system in both somatic and germline tissues of B. pachypus, to deepen our understanding of TE expansion in large genomes.
Sex-Specific TE Activity in the Germline
Transposable elements exhibited higher gonad-specific activity in B. pachypus compared to the brain. In addition, TEs displayed a slight sex-biased expression pattern, with the testes being characterized by a higher activity of retrotransposon families (LTR/Gypsy), while the ovaries showed a prevalence of DNA-TE families activity (DNA/hAT and DNA/TcMar) (Fig. 1).
Nevertheless, in further exploring the TE activity between the two sexes, we found that the testes exhibited a significantly higher number of overexpressed TEs and substantially larger fold changes compared to the ovaries (Fig. 2). Notably, a subset of elements, predominantly from the LTR order, displayed substantially higher relative expression in testes than ovaries. This is exemplified by the maximum observed logFC values, reaching 22 in the testes, while remaining significantly lower in the ovaries, with a maximum of 8. This striking difference underscores the greater transcriptional activity of these elements in the male gonad.
These findings highlight the male gonad as a notably permissive environment for TE expression, a pattern consistent with prior findings in the salamander Ranodon sibiricus (Wang et al. 2023), in the two grasshopper species Locusta migratoria manilensis and Angaracris rhodopa (Liu et al. 2022), and also in Drosophila melanogaster (Lawlor et al. 2021). In particular, these previous studies revealed how different dynamics operate and sometimes cooperate, contributing to increased TE expression in the male gonad.
In D. melanogaster, Lawlor et al. (2021) demonstrated that higher TE expression in males results from distinct TE dynamics on the sex-limited chromosome. Specifically, they reported increased TE expression in primary spermatocytes, which was co-expressed with Y-linked fertility factors. This was further confirmed by substantially higher TE copy numbers in males compared to females, as predicted by TE insertions located on the Y-chromosome.
This enhanced TE expression in the testes may be associated with the accumulation of TEs on the Y-chromosome, as a consequence of the suppression of recombination (Muller's ratchet process) (Peona et al. 2021). Such a scenario is possible for B. pachypus, which has heteromorphic sex chromosomes with a male heterogamety system (XY). However, to explore this hypothesis, further analyses of the sex chromosome in B. pachypus using genomic data will be required.
Interestingly, in the salamander R. sibiricus, which has homomorphic sex chromosomes, Wang et al. (2023) similarly observed a 2-fold higher TE expression in the testes compared to the ovaries. This male-biased expression was related to higher expression of piRNA pathway genes, which was proposed as the main silencing mechanism acting in the male gonad, in contrast to the KRAB-ZFP complex, identified as the primary repressive mechanism in the female gonad.
In a broader study, Wang et al. (2024) investigated TE expression, piRNA, and KRAB-ZFP-mediated repression pathways across six large salamander genomes with genome sizes ranging from 21.3 to 49.9 Gb. While they similarly reported higher TE expression in male gonads across all species, this was primarily attributed to reduced suppression by the KRAB-ZFP complex in the testes compared to the ovaries, rather than to differences in piRNA pathway activity. Consistent with these findings, we observed similar patterns of TE expression and KRAB-ZFP-mediated regulation in B. pachypus. This suggests that the differential regulation of TE activity between male and female gonads may represent a conserved feature among amphibians, potentially reflecting shared evolutionary dynamics between TE expansion and host genome repression strategies.
Complex Interplay Between TEs and the KRAB-ZFP System
B. pachypus exhibited remarkable gonad-specific expression of distinct gene pathways involved in TE regulation within its large genome (Fig. 3). Considering that the germline is the principal route for the propagation and inheritance of TEs in subsequent generations, enhanced silencing defense systems are expected, especially in genomes with a large amount of TEs.
In line with these expectations, a greater expression of TE-silencing genes was observed in the ovaries compared to the testes of B. pachypus, which aligns with the higher TE expression detected in the testes.
Among the TE-silencing gene pathways, the Krüppel-associated box zinc finger protein (KRAB-ZFP) complex showed not only the highest overall expression levels but also a distinct ovary-specific expression pattern in B. pachypus (Fig. 3). This pattern is highlighted by the markedly elevated expression of several core components of the KRAB-ZFP silencing complex within the female gonads. The complex initiates its silencing function through the recruitment of the scaffold protein TRIM28/KAP1 by the KRAB domain, which subsequently recruits heterochromatin-inducing factors. These include the histone methyltransferase SET domain bifurcated 1 (SETDB1), responsible for catalyzing the trimethylation of lysine 9 on histone H3 (H3K9me3), the heterochromatin proteins (HP1), and the DNA methyltransferases (DNMTs) that facilitate heterochromatin formation and transcriptional repression, along with the chromatin remodeling and histone deacetylase NuRD complex (Ecco et al. 2017).
Importantly, the strong directionality of the complex in the female gonads is further corroborated by its significant overexpression in the ovaries compared to the testes, as evidenced by the differential silencing pathways expression analysis (Fig. 4). The specific overexpression of the KRAB-ZFP complex in the ovaries, coupled with the increased TE activity observed in the testes, suggests a negative feedback dynamic between TE activity and host genome silencing mechanisms in the germline. The intensified host genome defense in the female gonads may contribute to this dynamic, with tighter control over TE expression in the ovaries in contrast with increased TE activity in the testes, where such elements are under comparatively weaker regulation.
Similar ovary-specific expression of the KRAB-ZFP complex has also been reported in the African lungfish (Wang, Wang et al. 2021) and two large salamander genomes: Cynops orientalis and Ranodon sibiricus (Carducci et al. 2021; Wang et al. 2023), highlighting a potential role for the KRAB-ZFP complex in ovary-specific regulatory mechanisms.
However, growing evidence also suggests that the diversification and co-evolution of zinc finger genes and TEs across metazoans cannot be fully explained by the arms race model alone, suggesting the possibility of a more nuanced hybrid model (Rosspopoff and Trono 2023). This model is supported by several evidences: many fossil TEs remain targeted by KRAB-ZFPs even millions of years after losing the ability to transpose (Imbeault et al. 2017), and some TEs display evidence of positive selection for KRAB-ZFPs recruitment, with observed interactions between KRAB-ZFPs and other transcription factors involved in cell-cycle regulation, genomic imprinting, and CTCF binding sites (Schmidt et al. 2012; Helleboid et al. 2019; Takahashi et al. 2019; Choudhary et al. 2020). Furthermore, many KRAB-ZFPs have acquired alternative TRIM28-independent functions, through the evolution of variant KRAB domains that do not interact with TRIM28 and are devoid of repressor activity (Helleboid et al. 2019; Tycko et al. 2020). This strongly suggests additional roles for the KRAB-ZFP complex that extend beyond acting solely as transcriptional repressors.
Moreover, it is now well established that TEs play a role in gene regulatory networks and KRAB-ZFPs are increasingly recognized as pivotal elements in the domestication of TE-embedded regulatory sequences (TEeRS), facilitating their integration into the host genome to rewire gene regulatory networks, with potential impacts on organismal development and physiology (Lynch et al. 2011; Ecco et al. 2017). As previously mentioned, KRAB-ZFPs and TEs are involved in different regulatory processes within the germline, including gametogenesis, gastrulation, and embryonic genome activation (Pontis et al. 2019; Iouranova et al. 2022; Xiang et al. 2022).
In line with these additional functions, the increased activity of KRAB-ZFPs in female gonads of B. pachypus could also be linked to the involvement of these transcriptional gene pathways in oogenesis, as maternal effect genes that regulate oocyte gene expression. Several studies have also demonstrated the crucial role of SETDB1 and other KRAB and NuRD-related genes in oocyte meiotic progression and embryonic development in different species (Clough et al. 2014; Brici et al. 2017; Stäubli and Peters 2021).
Finally, a recent study has proposed an alternative function for KRAB-ZFPs as potential genome stabilizers, by preventing ectopic recombination through heterochromatin formation in the repetitive regions of the genome (Wells et al. 2023). This hypothesis is further supported by the presence of KRAB genes in genomic regions marked by H3K9me3, suggesting their role in stabilizing repeats rather than suppressing them transcriptionally. Indeed, as shown by an increasing number of studies, heterochromatin formation through KRAB-ZFPs does not irreversibly silence TEs. Instead, KRAB-ZFPs may regulate TE activity by modulating their expression, allowing stage- and tissue-specific activity (Fadloun et al. 2013; He et al. 2019; Pehrsson et al. 2019). The role of KRAB-ZFPs as modulators, rather than simple repressors, of TE transcription would explain the generally high TE expression observed in the gonads of B. pachypus, even if the KRAB-ZFP system appears to be the most active regulatory mechanism in this tissue.
Taken together, while the higher TE activity in the male gonad may result from the lower efficiency of the KRAB-ZFP complex, the elevated activity of KRAB-ZFPs in the ovaries, along with growing evidence of the functional role of TEs in the germline, suggests a broader range of dynamics between TEs and the host genome. TEs are no longer viewed as merely competing with the host genome cellular functions, but rather as integral components of a dynamic interaction, encompassing both detrimental and mutually beneficial effects. Further genomic analyses will be necessary to explore the number of KRAB genes and their localization in the genome of B. pachypus, contributing to a deeper understanding of the different evolutionary dynamics involved.
Conclusions
Our study contributes to understanding the complex dynamics acting between TE expansion and host genome silencing mechanisms in the large genome of B. pachypus.
Anurans show remarkable variation in genome size, ranging from 1 Gb in Platyplectrum ornatum to approximately 10 Gb in the Bombina genus. Nevertheless, they are characterized by less extreme genome sizes as those observed in urodels or lungfish, suggesting anurans as a valuable model system to investigate the initial phases of genome gigantism and TE dynamics.
We found consistent differences in TE expression patterns between somatic and germline tissues, with male gonads exhibiting significantly higher transcriptional activity, emerging as a more permissive environment for TE expression. While such a pattern in the male gonad may result from the lower efficiency of the KRAB-ZFP complex, the elevated activity of KRAB-ZFPs in the ovaries, along with growing evidence of the functional role of both TEs and KRAB genes in the germline, suggests the existence of a broad range of host–TE dynamics going beyond the arms race model.
Materials and Methods
Sampling Design
We analyzed six adult yellow-bellied toad individuals (three females and three males) that were collected during a population monitoring program in the spring of 2020 and 2023, from the Aspromonte massif in Calabria, Southern Italy. Target sampling and lab breeding (two strategies potentially useful to increase sample size) were not feasible, as this species is threatened and strictly protected.
Sampling procedures were approved by the Italian Ministry of Ecological Transition and ISPRA (permit number: 20824, 18-03-2020).
RNA Extraction, Library Preparation, and Sequencing
Individuals were dissected to obtain brain and gonad tissues, and samples were stored in RNAprotect Tissue Reagent (Qiagen) until laboratory processing.
RNA extraction was performed using the RNeasy Plus Kit (Qiagen), according to the manufacturer's protocol, followed by RNA quality and quantification procedures with a spectrophotometer and a Bioanalyzer (Agilent Cary60 UV-vis and Agilent 2100, respectively—Agilent Technologies, Santa Clara, USA).
Library preparation was performed separately for each sample and sequencing was performed by Novogene (UK) Company Limited, using the NEBNext Ultra RNA Library Prep Kit for Illumina and Illumina NovaSeq platforms, respectively, through a paired-end 150 bp (PE150) strategy as described in Chiocchio et al. (2022). We obtained on average 52.8 million reads for each brain sample library and 58.3 million reads for each gonad sample library.
Transcriptome Assemblies
To explore the dynamics of TE activity and investigate different expression patterns in both somatic and germline tissues of B. pachypus, we concatenated two different transcriptome assemblies of the brain and gonads tissues.
The brain transcriptome was already assembled (Chiocchio et al. 2022). We then assembled another transcriptome version starting from gonad RNA samples and concatenated it with the “after CD-HIT-est” version of the brain assembly as described below.
First, raw read quality was examined using FastQC 0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC v.1.9 (Ewels et al. 2016), and trimming was performed with Trimmomatic v.0.39 (Bolger et al. 2014) (setting the option SLIDINGWINDOW: 4: 15, MINLEN: 36, and HEADCROP: 13) to remove low-quality bases and adapter sequences. After the cleaning step and removal of low-quality reads, 395,999,482 clean reads (i.e. 96% of raw reads) were maintained for building the de novo transcriptome assembly.
Brain and gonad transcriptomes were de novo assembled separately, using the bioinformatic protocol described in Chiocchio et al. (2022) (Fig. 2) with modifications. Briefly, we used rnaSPAdes v.3.14.1 (Bushmanova et al. 2019) to construct two optimized de novo transcriptomes. Transcriptome quality was validated using BUSCO v.4.1.4 (Simão et al. 2015), DETONATE v.1.11 (Li et al. 2014), and TransRate v.1.0.3 (Smith-Unna et al. 2016). We removed potential redundancy with CD-HIT-est v.4.8.1 (Fu et al. 2012) using the default parameters, corresponding to a similarity of 95%. We then concatenated the two transcriptome assemblies and ran another step of CD-HIT-est to remove redundant transcripts. Finally, we filtered out contigs with less than one transcript per million (TPM) to remove extremely low-expressed transcripts.
Transcriptome Annotation
All the unique transcripts were converted to peptide sequence using TransDecoder v.5.5.0 (Haas, BJ; https://github.com/TransDecoder/TransDecoder). Sequences were searched against the nonredundant NCBI protein database using DIAMOND v.0.9.10 (Buchfink et al. 2015) with an E-value cut-off of ≤1 × 10−5. BLAST2GO v.5.0 (Conesa et al. 2005) and INTERPROSCAN v.2.5.0 (Quevillon et al. 2005) were used to assign Gene Ontology (GO) terms. Protein domains were annotated by searching against the InterPro v.32.0 (Hunter et al. 2011) and Pfam v.27.0 (Punta et al. 2012) databases, using INTERPROSCAN v.5.52 (Quevillon et al. 2005) and HMMER v.3.3 (Finn et al. 2011), respectively.
TE Detection and Expression
With the aim of discovering how transposons can expand and mobilize in a large genome and to explore the dynamics of their activity in different tissues, we first identified which TE families are present in the B. pachypus transcriptome and then investigated which of these families are active elements.
We first detected and annotated TEs in the transcriptome assembly using the Extensive de novo TE Annotator (EDTA) v1.9.9 (Ou et al. 2019), a de novo pipeline that combines a suite of best-performing packages and includes a final step with RepeatModeler (Flynn et al. 2020) to generate a library of TE sequences. Ideally, a comprehensive and intact TE library would be constructed from a reference genome, as this allows for the inclusion of both expressed and nonexpressed TEs and ensures complete TE structures. However, since the reference genome for Bombina pachypus is currently not available, we constructed the TE library using the transcriptome assembly as the best available alternative. Although this approach might result in a less comprehensive and potentially fragmented representation of TEs, it provided the best available option for our study.
Afterward, we refined the library with DeepTE (Yan et al. 2020), which classifies unknown elements at order and superfamily levels, based on convolutional neural networks. Following this step, we used RepeatMasker v4.1.2 (Smit et al. 2013-2015) with the final TE library to annotate the transcriptome assembly and evaluate the number of contigs annotated as TEs. Subsequently, we estimated individual expression levels of the different TE families in the different tissues, mapping each individual sample of brain and male and female gonads to the TE library (TE families consensus sequences found in the transcriptome) with SalmonTE v.0.4 (Jeong et al. 2018). TE counts were normalized with edgeR (Robinson et al. 2010) using the TMM method, which is recommended for between-sample comparisons, as it takes into account sequencing depth, RNA composition, and gene length. Differential expression analyses were conducted between different tissues and different sexes with edgeR, using the negative binomial Generalized Linear Model (GLM) (glmQLFit and glmQLFTest functions in edgeR). First, we tested for sex-based expression differences by conducting pairwise comparisons between male and female samples of brain and gonads separately (male brain vs. female brain, male gonad vs. female gonad). Subsequently, we explored differentially expressed TEs between brain and gonads by contrasting one tissue against the other (male brain + female brain vs. male gonad + female gonad). TE families with an adjusted P-value <0.05 and log2 fold change ≥2 were considered differentially expressed.
Characterization and Expression of TE-Silencing Gene Pathways
To investigate if there are tissue-specific strategies to control TE mobilization, we searched the literature (Biscotti et al. 2017; Almeida et al. 2022) to compile a list of genes involved in TE host silencing mechanisms.
We selected and analyzed the activity of 32 target genes, including germline-specific repressors that prevent the vertical inheritance of TEs, transcriptional repressors that act through methylation, and posttranscriptional repressors that promote TE transcript degradation, all of which are further described below, to gain a broad view of the different silencing dynamics.
We first functionally annotated our concatenated transcriptome and then used tBLASTn (Gertz et al. 2006) to search for transcripts which could be orthologous to the following set of TE regulatory genes: (i) members of Ago (AGO1, AGO2, AGO4) and Piwi (PIWIL1, PIWIL2, PIWIL4) pathways; (ii) genes involved in small RNA biogenesis (DICER, DROSHA, DGCR8, PLD6, MAEL, SETDB1); (iii) genes participating in KRAB-ZFPs repression complex and chromatin-related corepressors (SETDB1, Trim28/KAP1, HP1a, HP1b, HP1g, DNMT1, DNMT3A, PRMT5); and (iv) genes related to the NuRD complex (CHD3, CHD4, CHD5, HDAC1, HDAC2, MBD2, MBD3, MTA1, MTA2, p66aplha, p66beta, RBBP4, RBBP7). Then we translated the 32 target sequences into their protein sequences, looking for the most complete CDS region. In the case of a sequence fragmented into several frames, we manually curated the sequence in order to reconstruct the CDS region and identify the 5′ and 3′ UTR regions.
To estimate the individual expression levels of the 32 target genes in the different tissues, we mapped each individual sample of brain and gonad tissues to the transcriptome with Salmon v1.4.0 (Patro et al. 2017), followed by normalization of all transcripts with edgeR (using the TMM method). Afterward, we extracted the expression values of the 32 genes of interest.
Differential expression analyses were conducted between different tissues and different sexes with edgeR, using the negative binomial Generalized Linear Model (GLM) (glmQLFit and glmQLFTest functions in edgeR) with the same contrasts as described for TE differential analysis. Genes with an adjusted P-value of <0.05 and log2 fold change ≥2 were considered as differentially expressed.
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Funding
This study was supported by grants from the Italian Ministry for Education, University and Research (Prin Project: 2017KLZ3MA), from the Aspromonte National Park, and by European Union—Next Generation EU, Missione 4 Componente 1, CUP I53D23003220006.
Data Availability
Transcriptome assembly and annotation are available on Zenodo (DOI: https://zenodo.org/doi/10.5281/zenodo.14530351). Transcriptomic raw reads are available under NCBI BioProject PRJNA1200595. The TE-silencing gene sequences are available in GenBank under the accession numbers provided in supplementary table S6, Supplementary Material online.