Strain-specific evolution and host-specific regulation of transposable elements in the model plant symbiont Rhizophagus irregularis

Abstract Transposable elements (TEs) are repetitive DNA that can create genome structure and regulation variability. The genome of Rhizophagus irregularis, a widely studied arbuscular mycorrhizal fungus (AMF), comprises ∼50% repetitive sequences that include TEs. Despite their abundance, two-thirds of TEs remain unclassified, and their regulation among AMF life stages remains unknown. Here, we aimed to improve our understanding of TE diversity and regulation in this model species by curating repeat datasets obtained from chromosome-level assemblies and by investigating their expression across multiple conditions. Our analyses uncovered new TE superfamilies and families in this model symbiont and revealed significant differences in how these sequences evolve both within and between R. irregularis strains. With this curated TE annotation, we also found that the number of upregulated TE families in colonized roots is 4 times higher than in the extraradical mycelium, and their overall expression differs depending on the plant host. This work provides a fine-scale view of TE diversity and evolution in model plant symbionts and highlights their transcriptional dynamism and specificity during host–microbe interactions. We also provide Hidden Markov Model profiles of TE domains for future manual curation of uncharacterized sequences (https://github.com/jordana-olive/TE-manual-curation/tree/main).


Introduction
Arbuscular mycorrhizal fungi (AMF) are an ancient group of plant symbionts capable of colonizing thousands of different species.The AMF provides vitamins and minerals from the soil to the plants and receives carbohydrates and lipids in return (Martin et al. 2017).This relationship may have been established since plants first conquered the land (Delaux and Schornack 2021), and there is evidence showing that AMF improve plant growth and ecosystem productivity (Bonfante and Anca 2009).More recently, studies reported that these symbionts can retain atmospheric carbon, indicating they play a significant role in global carbon sequestration (Hawkins et al. 2023;Horsch et al. 2023).Nonetheless, despite their relevance for plant fitness and longterm evolution, AMF show low morphological variability with no apparent plant specificity (Corradi and Brachmann 2017).
In addition to being ecologically and agriculturally important, AMF harbors peculiar cellular features.Their spores and hyphae always carry thousands of nuclei, and to date, no stage where 1 or 2 nuclei coexist in one cell has ever been observed (Kokkoris et al. 2020).Furthermore, despite their longevity, no sexual reproduction has been formally observed in these organisms.However, evidence has now shown that AMF strains can either carry 1 (AMF homokaryons) or 2 nuclear (AMF heterokaryons) genotypes in their cells, a genetic characteristic found in sexually multinucleated strains of ascomycetes and basidiomycetes, suggesting that sexual reproduction does exist in these prominent symbionts (Ropars et al. 2016;Sperschneider et al. 2023).
The low morphological diversity of AMF masks remarkable differences in gene content and structure in this group.For example, species vary in genome size and gene counts (Morin et al. 2019), and the variation in genome size is significantly correlated with the abundance of transposable elements (TEs; Mathu et al. 2022).Recent studies based on chromosome-level assemblies of Rhizophagus irregularis also showed this model species carries 2 different (A/B) compartments, which are reminiscent of the "2-speed genome" structure previously reported in plant pathogens (Yildirir et al. 2022;Sperschneider et al. 2023).Overall, the A-compartment is gene-rich and is mostly composed of euchromatin, whereas the B-compartment contains a high concentration of TEs and is mainly composed of heterochromatin (Winter et al. 2018).Accordingly, in AMF, the A-compartment carries most "core genes"-i.e.genes shared by all members of the same species-and shows significantly higher gene expression, while the B-compartment has a higher density in repetitive DNA, as well as in secreted proteins and candidate effectors involved in the molecular dialogs between the partners of mycorrhizal symbiosis (Yildirir et al. 2022).
TEs are repetitive DNA classified into retrotransposons (class I), which use RNA molecules as intermediate for "copy-and-paste", and DNA transposons (class II), which spread through "cutting-and-pasting" the DNA (Wicker et al. 2007).Each class is divided into orders and superfamilies based on pathways of transposition and phylogenetic relationships (Wicker et al. 2008;Seberg and Petersen 2009), and thus classifying TEs is important to describe the evolution of the genome and infer its impact on the biology of any organism (Wells and Feschotte 2020;Rech et al. 2022).For example, by modifying chromatin status and attracting transcription factors, these elements can also promote the regulation of gene expression (Bourque et al. 2018).
In the model, AMF R. irregularis, about 50% of the genome is composed of TEs (Yildirir et al. 2022;Manley et al. 2023;Sperschneider et al. 2023).Their higher abundance in the B-compartment is linked to higher rates of rearrangements (Sperschneider et al. 2023), and it was proposed that elevated TE expression in germinating spores may lead to new expansions of TEs in this AMF species (Dallaire et al. 2021).Despite recent findings, key questions regarding the diversity and evolution of TEs, as well as their role in mycorrhizal interactions, remain unanswered.For example, approximately two-thirds of TEs remain unclassified, making it difficult to infer their function in AMF genome biology and evolution (Miyauchi et al. 2020;Dallaire et al. 2021).Similarly, because analyses of TE expression have so far centered on germinating spores, it is unknown how these elements are controlled during host colonization, and whether some show host-specific regulation.The present study addresses these questions by providing an improved classification of TE families in all R. irregularis strains with chromosome-level assemblies and by investigating their expression among multiple hosts.

Curation and classification of TE families
We used chromosome-level assemblies of 5 homokaryotic (4401, A1, B3, C2, and DAOM197198; Yildirir et al. 2022) and 4 heterokaryotic strains (A4, A5, G1, and SL1; Sperschneider et al. 2023; Supplementary Table 1) as a source to build repeat libraries.The curation for nonmodel species followed the most recommended guides (Wells and Feschotte 2020;Goubert et al. 2022).Firstly, repeat libraries were generated using RepeatModeler2.0.3 (Flynn et al. 2020) with the -LTRstruct mode for detecting Long Terminal Repeats sequences implemented by LTRharvest and LTR_retriever.The libraries from all strains were merged to create a single reference for the curation.
The unique library was submitted to TEclass, which separates the sequences into the order level: nonLTR, LTR, or DNA (Abrusán et al. 2009).This step helped us to distinguish orders with similar protein domains, such as DIRS (nonLTR) and Crypton (DNA).Tirvish, a tool from genome-tools (http://genometools.org/), was used to detect Terminal Inverted Repeats (TIRs) in DNA transposons elements (for order DNA/TIR) with the parameter -mintirlength 8. Hidden Markov Model (hmm) profiles of specific TE superfamilies or order domains were generated from a combination of conserved regions described in Supplementary Table 2.The sequences for each domain were first aligned using MAFFT (Nakamura et al. 2018), converted to the Stockholm format using esl-reformat, and finally submitted to hmmbuild (version 3.1b2) to generate the hmm profiles (Eddy 2011).
Class II elements (DNA/TIRS) were retained in the final library based on the following criteria: identified as DNA by TEclass, had a match with a transposon domain from hmmsearch, harbored a TIR sequence, and a size ranging between 1 and 17 kb.Sequences with TIRS, ranging between 50 bp and 1 kb and lacking transposase domains, were classified as MITEs (Miniature Inverted-repeat Transposable Elements).Following the hmmrsearch, certain elements exhibited similarity with more than one domain from different orders due to close relationships.To achieve the most accurate classification, these sequences were analyzed using phylogenetics to determine their evolutionary relationships.For this work, the protein sequences corresponding to these elements were aligned using MAFFT (Nakamura et al. 2018), and submitted to RAxML (Stamatakis 2014) to produce a phylogenetic tree using the PROTGAMMA model with 1,000 bootstraps.The best tree resolution was visualized using ggtree in R (Yu et al. 2017).The final classification of sequences aligning with more than one domain was based on their relationships, clustering elements from the same orders together, as illustrated in Supplementary Fig. 1.
Throughout the curation process, TE sequences classified by RepeatModeler were retained only if accurately identified by the program TEclass (Abrusán et al. 2009) and if the respective transposition domain was identified within their sequence.For newly identified sequences that were previously labeled as "unknown" by RepeatModeler, the classification was determined based on TEclass (Abrusán et al. 2009), the presence of a transposition domain, and their relationship to known sequences based on the phylogenetic reconstruction.The final library and models are available on https://github.com/jordana-olive/TE-manual-curation/tree/main and can be applied in any other dataset to custom TE characterization.

Repeat landscapes of genomes and compartments
We ran the RepeatMasker (version 4.1.2-p1;Smit et al. 2015) for all strains, with the parameters -a and -s, using the curated library as the reference (-lib option).The repeat landscapes were generated from modified createlandscape.pland calculedivergence.plscripts, provided in RepeatMasker files.These scripts calculate the divergence levels between the alignment of each TE sequence and the consensus family in the reference library.The landscapes also were generated to the A-and B-compartments, which are currently available only for the strains DAOM197198, A1, C2, A4, and A5 (Yildirir et al. 2022;Sperschneider et al. 2023).To assess whether the distribution of TEs varies across compartments, we extracted the percentage values from the landscapes of each Kimura bin, and then conducted a paired t-test using R. A significant difference between the landscapes was considered when P < 0.05.

TE and gene expression analysis
We evaluate the expression of genes and TEs using available RNA-seq from different tissues [germinating spores (Dallaire et al. 2021), intraradical mycelium (IRM), arbuscules (ARB; Zeng et al. 2018), and extraradical mycelium (ERM; Tsuzuki et al. 2016)], and mycorrhized roots from different plant hosts colonized by DAOM197198 [Allium schoenoprasum, Medicago truncatula, Nicotiana benthamiana (Zeng et al. 2018), and Brachypodium distachyon (Kamel et al. 2017)].The accession numbers to the data are available in Supplementary Table 3.The reads were filtered using Trimmomatic (Bolger et al. 2014) and aligned to the DAOM197198 reference genome using Bowtie2 (Langmead and Salzberg 2012).The read count was accessed by TEtranscripts (Jin et al. 2015) guided by the TE annotation performed in this study and gene annotation executed by Yildirir et al. (2022).Using DESeq2 (Love et al. 2014), for each host condition and tissue, the differential expression was generated comparing germinated spores as control.A transcript was considered differentially expressed when Padj (adjusted P-value) is ≤0.05.

TE nearby gene correlation
TE location and gene expression correlation were analysed using available RNA-seq from DAOM197198 generated through Oxford Nanopore Technology (ONT) sequencing (Manley et al. 2023).The long ONT-RNA-seq reads were filtered and trimmed using pychopper (https://github.com/epi2me-labs/pychopper;7.98% did not pass the quality parameters and were discarded).Nucleotide correction was performed based on self-clustering using isONTcorrect (Sahlin and Medvedev 2021).The filtered and corrected reads were aligned to the DAOM197198 genome using hi-sat2 and annotated using stringtie (Pertea et al. 2015) guided by the TE annotation performed in this study and gene annotation executed by Yildirir et al. (2022).In the same way, stringtie generated the counts of the transcripts used in the expression analysis.For detecting TEs upstream of genes, we extracted the genomic regions up to −1,000 to the transcript start position and then intersected with TE annotation using bedtools (Quinlan and Hall 2010).The Pearson correlation method in R was employed to assess the coexpression between genes and their upstream TE pairs.

A curated database reveals new TE families in R. irregularis
Using RepeatModeler and RepeatMasker, recent analyses of R. irregularis chromosome-level datasets indicated that strains of this species carry an average of 50% of TEs (Yildirir et al. 2022;Sperschneider et al. 2023).However, only one-third of their repeat content could be classified, and thus, on average, 30% of all available genomes are composed of unclassified TE sequences (Miyauchi et al. 2020;Yildirir et al. 2022;Manley et al. 2023;Sperschneider et al. 2023).To address this, we used chromosomelevel assemblies from five homokaryons and 4 heterokaryons R. irregularis strains to generate curated repeat libraries.When all genomes are considered, out of a total of 9,257 TE sequences identified by RepeatModeler, only 2,369 (∼25%) can be considered well-defined, bona-fide nonredundant consensus sequences harboring transposition domains.The notable reduction in TE numbers is due to noncurated datasets containing highly degenerated TE without domains (relics), and repeats that cannot be classified with current knowledge of TE evolution.
In the final curated library, 1,458 sequences belong to families previously classified by RepeatModeler (K, Fig. 1a), while 636 sequences represent newly identified families (N, Fig. 1a) of the following orders: SINE, LINE, LTR, DNA/TIRS, DNA/Crypton, Maverick, and RC/Helitron (Fig. 1a).Following curation, in the DAOM197198 genome, sequences belonging to these orders increased in number by 4-fold on average, and similar results were obtained for all investigated R. irregularis strains (Fig. 1b).We also assessed the influence of the manually curated library on the repeat landscape of the DAOM197198 genome (see Fig. 1c-e).Following TE curation, the percentage of the DAOM197198 genome represented by classified TE increased from 12% (Fig. 1, c and d) to 36% (Fig. 1e).

Repeat landscapes differ among and within R. irregularis strains
In the R. irregularis isolate DAOM197198, TEs are mobilized and retained differently across the genome (Dallaire et al. 2021), with 2 main waves of expansions.We accessed the distribution of TEs in the other strains based on the Kimura substitution calculation, which estimates the level of divergence of annotated TE compared with its consensus in the reference library.A lower number of Kimura substitution indicates high similarity between annotated sequences and their consensus, suggesting recent insertion events, while a higher number indicates greater evolutionary distance and likely older TE insertions.
Our analysis revealed that the patterns of repeat landscapes based on Kimura substitution levels differ among R. irregularis strains.Specifically, in most strains, expansions of TEs are recent, as highlighted by the high number of elements with Kimura substitutions ranging from 0 to 5, supporting the findings that new TE expansion bursts exceed older TE insertions in DAOM197198 (Dallaire et al. 2021; Fig. 3).However, other strains exhibit different TE distributions.For example, strains A4 and SL1 carry a larger proportion of TEs with older Kimura substitution levels, indicating strain-specific patterns of TE emergence, evolution, and retention.Remarkably, the genome of C2 has a much larger proportion of younger, and thus likely more active TEs, and it is plausible that this feature is linked to the larger genome size of this strain compared with relatives-i.e. the C2 genome is 162 Mb compared with an average of 147 Mb for other R. irregularis strains (Fig. 3).
The A-and B-compartments were recently identified in the chromosome-level assembly of 3 homokaryons (DAOM197198, A1, and C2; Yildirir et al. 2022) and 2 heterokaryon (A4 and A5; Sperschneider et al. 2023) strains.Based on available data, we observed that the distribution of TEs based on Kimura substitution levels also differs significantly between the A-and Transposable elements in a plant symbiont | 3 B-compartment in all strains (Fig. 4).Specifically, the proportion of TEs with Kimura substitutions between 10 and 20-i.e.older TEsis always higher in the B-compartment (P < 0.05), indicating that these elements are maintained over time at higher levels in this portion of the genome.In contrast, old and degenerated TEs appear to be rapidly eliminated in the A-compartment (Fig. 4).

TE and gene expression are positively correlated within the B-compartment
The localization of TEs near genes can impact their expression (Bourque et al. 2018;Drongitis et al. 2019).We investigated whether the presence of TEs upstream of genes is associated with gene expression in compartments by identifying the expression of TEs located within 1,000 nucleotides upstream of genes using long-read RNA data from the DAOM197198 strain (Manley et al. 2023).We found no correlation between the expression of TEs and coding genes located in the A-compartment (r = 0.21, P < 0.0001; Fig. 4f); however, a positive and significant correlation exists for genes and TEs in the B-compartment (r = 0.84, P < 0.0001; Fig. 4g).This finding suggests that TEs and genes in the B-compartment are being coexpressed.

TEs are significantly more expressed in colonized roots compared with ERM
To obtain additional insights into the biology of TEs, we investigated their regulation during host colonization using R. irregularis DAOM197198 RNA-seq data from multiple tissues, including the micro-dissected cells of ARB and IRM, and ERM (Fig. 5a) in symbiosis with M. truncatula roots (Fig. 5b-d).
The number of upregulated TEs is more than 4 times higher in IRM and ARB than in the ERM in the same host (Fig. 5b-d).TEs significantly upregulated in colonized tissue (ARB and IRM) include DNA/TIRS, LTR, LINE, and SINES orders.Among these, LTR/Gypsy,   LINE, CMC-EnSpm, Plavaka, hAT-Tag1, hAT-Ac, and Helitron are the superfamilies with more expressed families (Supplementary Table 4).One mechanism to control TE mobilization is through RNAi and AGO proteins (Kelleher et al. 2018).Typically, fungi possess 1-4 AGO genes per genome (Chang et al. 2012).However, R. irregularis harbors 40 copies of AGO-like genes, and of these 25 contain all typical AGO core domains (e.g.PIWI, PAZ, MID, and N-terminal) with some exhibiting signs of expression (Silvestri et al. 2019).Among these AGO-like genes, only one (Rhiir2092, JGI 1582012) presents significant expression-i.e. more than >100 transcripts per million in the samples, while the remainder exhibit minimal  Transposable elements in a plant symbiont | 5 to no expression.We find that the Rhiir2092 gene is significantly more expressed in ERM (P < 0.05) compared with the other conditions (Fig. 5, b and f).Specifically, for this gene, ARB and IRM laserdissected cells have significantly reduced expression, which again differs from the significantly higher expression of TEs under these conditions (P < 0.05; Fig. 5, b and f).

TE regulation changes with different hosts and correlates with host genome size
Analyses of RNA-seq data roots from multiple hosts colonized by DAOM197198 (A. schoenoprasum, M. truncatula, N. benthamiana, and B. distachyon), also reveals that TE expression differs significantly among hosts, with some families being expressed only in 1 of 4 hosts (Fig. 6 and Supplementary Table 4).For example, a total of 404 families are upregulated during colonization with A. schoenoprasum roots (Fig. 6a), including both families from class II elements (e.g.Sola-1, hATm, Academ-1, and DIRS) and class I elements (e.g.CR1-Zenon and DIRS) that are only upregulated with this host.In N. benthamiana and M. truncatula roots, the symbiont upregulates a smaller number of TE families compared with A. schoenoprasum, 251 and 197, respectively, while B. distachyon was the condition with lowest TE upregulation overall (116 families; Fig. 6d).The differences in TE expression among hosts are not linked to the variability in expression of AMF AGO-like gene-i.e.hosts with highest TE expression do not always have low AGO-like expression and vice versa.
Given the observed variation in TE expression in the AMF, we wondered how the plant host influences the differential expression of these superfamilies.A significant and positive correlation (r = 0.91, P = 0.001) between the repeat content of the host genome and the number of overexpressed families in the symbiont.For example, A. schoenoprasum (14 Gb) is the host with the most TE content and its colonized roots have the highest number of TEs upregulated in the AMF.By contrast, B. distachyon (272 Mb) has the lowest TE content and has significantly lower TE upregulation in the symbiont.
It is noteworthy that, in contrast to fungi, the repeat content of plant species has been well characterized using available tools, because most repeat databases have been curated using plant genome information (Flynn et al. 2020;Storer et al. 2021).As such, it is very unlikely that the significant correlation we observed was skewed by the noncurated nature of the plant genomes we used, particularly given that these findings are consistent among multiple plant hosts and conditions.

An improved view of TE family diversity and evolution in a model plant symbiont
Through curation of R. irregularis repeat libraries, we first improved the proportion of annotated families in these model plant symbionts and uncovered novel sequences representing the largest proportion of their repetitive sequences.Our work also revealed how R. irregularis strains differ in TE retention and deletion over time within 2 genome compartments.For example, our findings indicate that strains with the largest genome sizes (C2) show a combination of a higher rate of TE emergence and retention of these elements.We also uncovered notable differences in how TEs evolve within each strain, with some (A1, B3) having much higher proportions of very young TEs compared with others (SL1, A4) that carry levels of Kimura substitution rates indicative of early repeat degeneration and fewer cases of recent expansion.

TE retention rates are different between compartments
By investigating the degree to which TEs accumulate mutations over time, a significant distinction emerged between A/B compartments.Specifically, all A-compartment landscapes show a continuing invasion of novel/young TE insertions, and the high methylation present in this compartment (Yildirir et al. 2022), and/or purifying selection, might be needed for their control and rapid removal, as evidenced by the lower TE density in this compartment.By contrast, these insertions accumulate in the B-compartment, leading to notable inflation of these elements over time, as shown by the stable TE density along the axis that Transposable elements in a plant symbiont | 7 defines the Kimura substitution rates.The accumulation of TEs in the B-compartment might be linked with their domestication (Zhang et al. 2019;Modzelewski et al. 2022) and/or the emergence of new functions, a view supported by the significant positive correlations we observed in the expression of TEs and genes and by similar findings from plant pathogens (Fouché et al. 2020(Fouché et al. , 2022)).

TE regulation and evolution further underpin similarities between AMF and known fungal pathogens
Obvious similarities between the genomes of AMF and those of plant pathogens have been known for some time (Mathieu et al. 2018;Reinhardt et al. 2021).These include enrichments in TEs, and genomes subdividing into highly diverging regions dense in effector genes and TEs, and more conserved ones composed of core genes and low repeat density.
In the plant pathogen Verticillium dahliae, TEs often locate in proximity to highly expressed pathogenicity-related genes within fast-evolving adaptive genomic regions (Torres et al. 2021).These regions are reminiscent of R. irregularis B-compartments (Reinhardt et al. 2021;Yildirir et al. 2022), which are also enriched in TEs and secreted proteins that promote symbiosis with different hosts (Teulet et al. 2023).As such, the significant correlation we observed between TEs and genes specific to the B-compartment may mirror identical processes in AMF and plant pathogens.
The increased upregulation of genes in close proximity to TEs during the colonization stages in the B-compartment could also indicates a TE control for derepressing those regions of the genome (Oggenfuss and Croll 2023).Indeed, it has been proposed that TE-effector regulation is well-timed, i.e. both are expressed during the infection and repressed in the absence of the host (Fouché et al. 2022).In filamentous fungi, the variability of TEs can also allow for escaping mechanisms of recognition by the plant immunity system (Fouché et al. 2020(Fouché et al. , 2022)), and it is thus possible that in AMF the expression of TEs could aid plant-symbiont communication during colonization.

What drives TE upregulation and host specificity during colonization?
TE expression is active in germinating spores (Dallaire et al. 2021), but our study shows their expression in colonized roots is much higher still.In this context, we observed that an AGO-like gene exhibits significantly higher expression in ERM compared with colonized roots (Silvestri et al. 2019).AGO proteins are known TE regulators (Thomson et al. 2015), and thus our results suggest that RNAi is one of the key factors implicated in the regulation of TEs across the stages of the mycorrhizal symbiosis-i.e.downregulation in ERM, and upregulation in planta .In fungi, it is expected that 1-4 AGO proteins regulate TEs (Chang et al. 2012;Lax et al. 2020).However, the genome of R. irregularis unexpectedly contains an expanded set of 40 copies of this gene (Silvestri et al. 2019).The remaining AGO-like genes, which exhibit minimal or no expression, may be implicated in other functions (Silvestri et al. 2019).This could be attributed to these copies being incomplete or carrying additional domains (Silvestri et al. 2019).Notably, the expression of this AGO-like gene did not vary significantly among hosts, and thus other factors could be responsible for the host-specific variation in the TE expression that we observed.One intriguing possibility is that TE expression in the hosts directly or indirectly influences TE expression in the AMF, as seen in multiple plant-microbe interactions in cross-talk regulations (Melayah et al. 2001;Weiberg et al. 2013;Ren et al. 2019;Wong-Bajracharya et al. 2022).In support of this view, we found a positive correlation between TE abundance in the host and TE expression in AMF.With mounting evidence of molecular crosscommunication between the mycorrhizal partners, including RNA from the fungi interacting with mRNA from the hosts (Silvestri et al. 2019(Silvestri et al. , 2020;;Ledford et al. 2023), it is likely that molecular dialogs between mycorrhizal partners also result in the increased TE expression we observed in the fungal symbiont.

Fig. 1 .
Fig. 1.A description of the curated TE library.a) The proportion of known (detected by RepeatModeler, in light color, K) and novel (in bold colors, N) curated sequences.b) RepeatModeler and curated library annotation comparison across the strains.The y-axis represents the R. irregularis strains, while the x-axis shows the relative abundance of the TEs in the genome from the homology-based annotation (RepeatModeler, left) and the current annotation (curation, right).c-e) Repeat landscape of DAOM197198 using different libraries as a source for annotation.Each plot shows the sequence divergence from its consensus (x-axis) in relation to the number of copies in the genome (y-axis).Newer insertions are shown in the left peaks, while the older and more degenerate insertions are on the right side of the graph.c) Annotation performed using the noncurated library from RepeatModeler, which depicts the unknown elements (in gray).d) Only noncurated TEs detected by RepeatModeler, which comprehends 12% of the repeat content.e) Landscape generated with the final curated library.

Fig. 2 .
Fig. 2.The annotation of TE families in R. irregularis strains.The heatmaps show the percentage of families in each strain considering the RepeatMasker annotation.a) The percentage of TE families in each genome using the noncurated library.b) Annotation using the curated library with known and novel TE families.Blank areas mean absence of certain families in that annotation, while the black rectangles highlight the new families found in the curation.

Fig. 3 .
Fig. 3. Repeat landscape of R. irregularis strains.a-h) Each plot shows the sequence divergence from its consensus (x-axis) in relation to the number of copies in the genome (y-axis).Newer insertions are shown in the left peaks, while the older and more degenerate insertions are on the right side of the graph.The strains with recent burst of TE expansions are indicated with the green arrow, while the ones representing early TE degeneration are highlighted with red arrow.

Fig. 4 .
Fig. 4. a-e) The repeat landscapes in R. irregularis strains with annotated A-and B-compartments.The arrows point to the divergence between the compartments.f-g) Coding genes and upstream TE correlation of expression in DAOM197198.Each point represents a coding gene-upstream TE pair, the y-axis is the TE expression (logTPM) and the x-axis is the coding gene expression (logTPM).The line represents the regression, surrounded by the confidence interval.

Fig. 5 .
Fig. 5.The expression of TEs in R. irregularis through the life stages of colonization.a) A schematic view of the different life stages compared with spores.b-d) Volcano plots of expressed TEs in each colonization stage, evidencing the number of downregulated and upregulated sequences on the left (DOWN) and right (UP), respectively.The ones expressed but not significantly different from spores are shown in the center as NO.e) The number of upregulated TEs in each condition.Inside the parentheses are the number of total upregulated TEs in that specific condition.The gray circle size is according to the number of upregulated sequences.f) AGO-like gene (Rhiir2092) expression under different conditions.Colonization stages collected by laser dissection of ARB and IRM and ERM and mycorrhized roots.The t-test compared all the conditions with spores, where *P < 0.05, **P < 0.001, and ns is no significance.The box limits represent the range of 50% of the data, the central line marks the median, and the vertical lines out of the boxes are the upper and lower values.normTPM, normalized transcripts per million.