Evolving Together: Cassandra Retrotransposons Gradually Mirror Promoter Mutations of the 5S rRNA Genes

Abstract The 5S rRNA genes are among the most conserved nucleotide sequences across all species. Similar to the 5S preservation we observe the occurrence of 5S-related nonautonomous retrotransposons, so-called Cassandras. Cassandras harbor highly conserved 5S rDNA-related sequences within their long terminal repeats, advantageously providing them with the 5S internal promoter. However, the dynamics of Cassandra retrotransposon evolution in the context of 5S rRNA gene sequence information and structural arrangement are still unclear, especially: (1) do we observe repeated or gradual domestication of the highly conserved 5S promoter by Cassandras and (2) do changes in 5S organization such as in the linked 35S-5S rDNA arrangements impact Cassandra evolution? Here, we show evidence for gradual co-evolution of Cassandra sequences with their corresponding 5S rDNAs. To follow the impact of 5S rDNA variability on Cassandra TEs, we investigate the Asteraceae family where highly variable 5S rDNAs, including 5S promoter shifts and both linked and separated 35S-5S rDNA arrangements have been reported. Cassandras within the Asteraceae mirror 5S rDNA promoter mutations of their host genome, likely as an adaptation to the host's specific 5S transcription factors and hence compensating for evolutionary changes in the 5S rDNA sequence. Changes in the 5S rDNA sequence and in Cassandras seem uncorrelated with linked/separated rDNA arrangements. We place all these observations into the context of angiosperm 5S rDNA-Cassandra evolution, discuss Cassandra's origin hypotheses (single or multiple) and Cassandra's possible impact on rDNA and plant genome organization, giving new insights into the interplay of ribosomal genes and transposable elements.


Introduction
Despite both being repetitive genome components, the tandemly arranged ribosomal genes (rDNAs) and the dispersed transposable elements (TEs) seem to not have much in common.Yet, rDNAs and TEs can enter intricate co-evolution processes that are sometimes proposed, but neither fully appreciated nor understood; neither in their evolutionary mechanisms and implications nor in their effect on the genome (reviewed in Garcia et al. 2024).
Ribosomal RNAs (those encoded by ribosomal DNAs) comprise 80% of the RNA found in a typical cell and account for 60% of the ribosomal mass, having an essential role in protein synthesis (O'Connor and Adams 2010;Eaves et al. 2020).In eukaryotes there are four ribosomal RNA genes.The 18S, 5.8S, and 26S (28S/25S) rRNA genes are coded in a single operon (called 35S in plants and 45S rDNA in animals Hemleben et al. 2021) and the 5S rRNA gene is usually coded outside this operon, by 5S rDNA.These rDNAs are present in high numbers, from 50 to 13,000 copies per cell.
The 5S rDNA is perhaps the most enigmatic of the rRNA genes.It consists of a small transcriptional unit of about 120 base pairs and a nontranscribed spacer (NTS), normally clustered in long tandem arrays.While the 5S rRNA gene sequence is highly conserved, the NTS is variable in length and sequence, even in closely related species.The rapidly evolving nature of the NTS has been used for inferring interspecific relationships in many plant species (Perina et al. 2011;Waminal et al. 2014;Alexandrov et al. 2021).For such an apparently simple molecule of fundamental importance, we know so very little, and many features of 5S remain unknown or controversial, e.g.although 5S rRNA is essential for the function of ribosomes, its specific role still remains unclear.Evolutionary biologists have conflicting views on its evolution, some assume 5S rDNA to be a paradigm of concerted evolution, whilst others propose alternative models (Brown et al. 1972;Nei and Rooney 2005).Another interesting feature is the range of genomic arrangements in which 5S rDNA can be found: although typically organized in tandems, it can be linked to their repetitive gene families (such as the 35S/45S DNA), scattered in the genome or located in linear or circular extrachromosomal DNA units (Drouin and de Sá 1995;Cohen et al. 2010;Rebordinos et al. 2013;Vierna et al. 2013).However, we still do not know of any evolutionary advantage or biological functional differences of one 5S rDNA arrangement over another.
In contrast to the 5S rDNA, TEs occur in a wide variety of structures and sequences (Bourque et al. 2018).Among those, long terminal repeat (LTR) retrotransposons are the most widespread in plants, sometimes accounting for more than 80% of their genomes (Schnable et al. 2009).They are flanked by the name-giving LTRs that encode Polymerase II (Pol II) promoter motifs as well as transcription start and stop sites.Full-length retrotransposons usually carry all protein domains needed for their retrotransposition and can operate autonomously.Nevertheless, nonautonomous LTR retrotransposons also exist, with many not carrying any open reading frames at all.Instead, these short terminal repeat retrotransposons in miniature (TRIMs) proliferate by exploiting the amplification machinery of their autonomous counterparts (Witte et al. 2001;Gao et al. 2016).
Intriguingly, the ubiquitous TRIM family Cassandra deviates strongly from all other known TRIMs: Instead of carrying a retrotransposon-typical Pol II promoter, the Cassandra TRIMs have replaced this by co-opting the Polymerase III (Pol III) promoter from the 5S rRNA gene; hence, Cassandra retrotransposons and the 5S rDNA often share considerable sequence stretches and adopt similar secondary structures after transcription (Kalendar et al. 2008).Being unusually widespread across plant genomes (Gao et al. 2016), including monocots, dicots, and ferns (but not gymnosperms), Cassandras are believed to be ancient (Antonius-Klemola et al. 2006;Kalendar and Schulman 2006;Yin et al. 2014;Gao et al. 2016;Maiwald et al. 2021).
Both being ancient parts of the genome, actively using Pol III promoters and generating RNAs that adopt distinct secondary structures, Cassandra retrotransposons and 5S rDNAs share many similarities.Nevertheless, it is still unclear how Cassandra retrotransposons and the 5S rDNA depend on each other and if these two sequence classes co-evolve.We aim to understand how changes in the 5S rDNA sequence, structure and organization are mirrored by Cassandra retrotransposons.
To tackle this question, we focus on the Asteraceae, probably the largest and most diverse plant family, which harbors a wide variation in 5S rDNA sequence, structure and genomic organization.The Asteraceae family comprises over 25,000 species that fall into more than 1,700 genera (Mandel et al. 2019).An unusual 5S rDNA organization, in which the 5S rDNA is linked to the 35S rDNA, has been detected in three large groups of subfamily Asteroideae (tribes Anthemideae, Gnaphalieae, and the Heliantheae alliance), accounting for nearly 25% of this families' species (Garcia et al. 2010).A later study detected a significant sequence divergence in the conserved C-Box of the 5S promoter in some, but not all, Asteraceae with 35S-5S linkage (Garcia et al. 2012).Thus, the Asteraceae harbor a range of 5S rDNAs variations-to our knowledge, much more than other plant families.If Cassandra retrotransposons are impacted by changes in the 5S rDNA, this plant family offers a superior starting point to understand any potential 5S rDNA-Cassandra co-evolution.To date, over 20 Asteraceae genome assemblies are available (www.plabipd.de;last accessed 22.05.2023),offering a generous resource for investigating the potential rDNA-retrotransposon co-evolution.
Here, we systematically follow Cassandra and 5S rDNA evolution across plants and especially within the Asteraceae.To better understand the make-up of canonical Cassandra retrotransposons and their dependence on the 5S rDNA, we first mined all published sequences across plants.Then, focusing on the Asteraceae with their diverging 5S rDNA landscapes, we analyzed nearly 2,000 newly identified Cassandra sequences from 15 out of 22 screened Asteraceae genomes, and checked how 5S rDNA changes may impact Cassandra evolution.For this, we targeted two major shifts in 5S rDNA evolution: Promoter sequence mutation and emergence of the 35S-5S linkage.

Plant Material and Genomic DNA Sequencing
To provide short read sequencing data for 5S rDNA linkage analysis, Illumina short read sequencing was performed.Plants of Artemisia annua (MV8) from the living collection of the Institut Botànic de Barcelona (Spain), and Tragopogon porrifolius (TRA18) provided by the IPK Genebank of the Plant Genome Resources Center Gatersleben (Germany), were grown under long day conditions in the greenhouse.Genomic DNA was extracted from 1 to 3 g of fresh leaf material with a modified CTAB (cetyltrimethyl/ammonium bromide) protocol after Doyle and Doyle (1987) and Cullings (1992).
For both species WGS library preparation (TruSeq DNA kit) and sequencing was carried out by Macrogen Inc. Europe, using an Illumina NovaSeq machine.The sequencing yielded around 4.5 Gb of 151 bp paired-end reads for each species, with an insert size of 660 bp (A.annua) and 470 bp (T.porrifolius), respectively.

Retrieval of Published Plant Cassandra Retrotransposon Sequences
We manually extracted representative Cassandra retrotransposon sequences from all published reports that targeted plant genomes.For further analyses, we included 66 Cassandra-named sequences from these studies, which we could unambiguously determine to be Cassandra retrotransposons by manual annotation of Cassandra-typical sequence features (supplementary table S1, Supplementary Material online).As some Asteraceae species show an unusual variation in 5S rDNA sequence and genomic arrangement, we focused on de novo identification of Cassandra sequences in these species.For identification purposes we used published genomes of 22 Asteraceae species (supplementary table S1, Supplementary Material online).A multiquery BLAST search with known plant Cassandra sequences against these genomes was not successful.The first de novo identification in an Asteraceae genome was performed in A. annua with TRIM-specific LTR Finder settings (Gao et al. 2016): -d 30, -D 2000, -l 30, -L 500 and relaxed parameters for motif detection, only looking for primer binding site (PBS) and polypurine tract (PPT) motifs (no conserved regions, TG-CA architecture or target site duplications (TSD sequences; to detect diversified sequences).LTR-Finder hits were extracted and used to perform a multiquery BLAST search (Megablast, wordsize 28, Gap cost: 2/2, scoring 1/−2) with known Cassandra sequences and the A. annua-specific 5S rRNA gene.BLAST hits were processed by manual inspection, followed by annotation of a representative reference full length sequence.The A. annua-specific Cassandra sequence was included in the Cassandra dataset, which was then again used for a multiquery BLAST against the remaining Asteraceae genome assemblies.Genomes of Cichorioideae subfamily showed no Cassandra positive BLAST hits and therefore were screened with the LTR-Finder routine again, but yielded no results.
For each species, initial Megablast outputs of Cassandra population hits were inspected manually and reference sequences were chosen by focusing on the availability of TSD and conserved 5′/3′ LTRs with pairwise identities of 70% and above.Full length identification and annotation of Cassandra sequences were performed via discontinuous Megablast (wordsize 11, Gap cost: 2/2, scoring 1/−2) with the annotated Cassandra references against the corresponding genome assembly.Hits were manually inspected and annotation of LTRs, promoter motifs and (if available) TSD sequences was performed.When necessary, Cassandra sequences were grouped into different variants.

5S rRNA Gene Retrieval and Identification
5S rRNA gene information for selected species was obtained from the 5S rRNA database (http://combio.pl/rrna/;Szymanski et al. 2016) and the NCBI nucleotide database.As we focused especially on species within the Asteraceae, of which some do not have published 5S rDNA data, we performed readcluster analysis with the RepeatExplorer pipeline on Galaxy (Novák et al. 2013).Paired-end, randomly selected WGS datasets (Illumina) were downloaded from ENA and analyzed by the pipeline RepeatExplorer2 (https://repeatexplorer-elixir.cerit-sc.cz/galaxy).For long read data, we used the RepeatExplorer Utilities "Create sample of long reads" and "Get pseudo short paired-end reads from long reads" to adapt read length.After checking quality with FastQC (Andrews 2010), reads were trimmed and adapters removed when present.Read sampling (5 million reads per pair) was carried out, reads were interlaced and subsequently analyzed by TAREAN (a tool for the identification of genomic tandem repeats from NGS data), as implemented in RepeatExplorer2 (default settings).5S rDNAs were detected as one of the tandem repeats present in the analyzed genomes and the consensus sequence was extracted for each of the analyzed species.To check whether we observed populations of different C-Box motifs we performed read mapping with bowtie2 allowing 1 mismatch (-N 1 -L 20 -R 3) and a less stringent read mapping in Geneious 6.
These 5S rDNA sequences were compared with a standard 5S rDNA reference sequence (Arabidopsis thaliana E006, from the 5SrRNAdb by Szymanski et al. 2016) with Geneious 6, and the corresponding genic portion of each of the target species was extracted.

Determining the Linkage between 5S rDNA and 35S rDNA by low Coverage Assembly
To confirm the potential linkage of the ribosomal genes, WGS data from selected species (supplementary table S1, Supplementary Material online) was retrieved from the respective sequence archives (ENA, SRA, and GSA).The read quality was checked using FastQC 0.11.9 (Andrews 2010) and quality or adapter trimming was carried out using Trimmomatic 0.39 (Bolger et al. 2014) when necessary.Reads were subsampled to match a 1× genome coverage using seqtk 1.3 (Li et al. 2013) or the RepeatExplorer Utility "Create sample of long reads" (https://www.elixir-czech.cz/;Novák et al. 2020) for short and long reads, respectively.For sequencing data with less data, all available reads were used.For the first assembly round MEGAHIT 1.2.9 (Li et al. 2016) with the meta-large preset was used and only contigs larger than 5 kb were kept.The second assembly round was done with SPAdes genome assembler 3.15.5 (Bankevich et al. 2012) using the isolate preset, a coverage cut-off of 20 and the MEGAHIT final contigs as trusted contigs.The resulting assemblies were visualized using the Bandage 0.9.0 (Wick et al. 2015) assembly viewer.Nodes were colored by BLAST hits using the ribosomal genes of Helianthus annuus and drawn around the BLAST hits using a distance of 5 to 25, respectively.

Cassandra Metadata and Comparison With 5S rRNA Genes
Sequence comparison for Cassandra full length and LTR sequences was performed with multisequence alignments (MUSCLE; Edgar 2004) and manual refinement.Cassandra sequences were then grouped into families according to their level of similarity.We applied a pairwise identity threshold of 70% for family assignment.Exceptions were made for families with different variants.These families showed variants with indels, which affect the overall alignment pairwise identity in a negative way.But nevertheless, sequences could be grouped into one family due to their similarity in nonindel areas.To compare species-specific Promoter Mimicry of Cassandra Retrotransposons • https://doi.org/10.1093/molbev/msae010MBE 5S rRNA genes with the corresponding Cassandra sequence, we performed species-, lineage-, and family-specific alignments (MUSCLE) and dotplots.
All studies regarding nucleotide sequence motifs and variability of Cassandra sequences were performed by manual inspections of the MUSCLE alignments.

Results
Cassandras Form a Lineage in Plants: They Share 5S Promoters in their LTRs, but Differ in Sequence and Length To better understand the relationship between Cassandra retrotransposons and the 5S rDNA, we first compiled a Cassandra dataset across different plant species and expanded it with full length Cassandras from published Asteraceae genomes.Our motivation to investigate the structural hallmarks of Cassandra retrotransposons beyond the Asteraceae were initial difficulties in Cassandra identification: As Cassandras carry highly conserved regions and share LTR similarities across the plant kingdom (Kalendar et al. 2008, Maiwald et al. 2021), one could assume that they constitute a single Cassandra family with derivatives, scattered across plant genomes.However, for Asteraceae genomes, an initial similarity search via BLAST was not successful, indicating either greater divergence in Cassandra sequence or absence in Asteraceae genomes.Adopting a de novo approach relying solely on structurebased criteria, however, we retrieved Cassandra sequences for 15 out of the 22 investigated Asteraceae genome sequences.Sequence-wise, Asteraceae-derived Cassandras differ slightly from previously published Cassandra sequences (see below).
We complemented this Asteraceae dataset with 66 published Cassandra sequences from other plants, reaching 81 Cassandra reference sequences, each representing the Cassandra retrotransposon landscape of the respective genome.This dataset allows us to investigate structural hallmarks of Cassandra sequences across the plants and to understand how Asteraceae Cassandras compare to other plant Cassandras.
Being nonautonomous LTR retrotransposons, all Cassandras in our dataset show no remains of any coding regions.Sequence-wise, all of them show LTR/LTR identities of at least 74% (supplementary table S2, Supplementary Material online).Structurally, they harbor a methionine PBS and a PPT for first and second strand synthesis by a reverse transcriptase (supplementary table S2, Supplementary Material online).
Regarding element and LTR lengths, Cassandra sequences across the angiosperms show a wide range of sizes.The smallest Cassandra in our dataset is from Saruma henryi (Aristolochiaceae) with a length of 563 bp, whereas the largest from Glycine max (Fabaceae), with a length 968 bp, is almost double in size.It is noticeable that Cassandra fulllength elements are more or less equal-sized within one plant family (supplementary table S2, Supplementary Material online).Furthermore, longer Cassandra full length retrotransposons also tend to have rather LTR sequences.For example, the longest Cassandras reside within the Fabaceae (Cassandra length approx.900 bp; LTR lengths: approx.400 bp), whereas smaller representatives are found in the Piperales and Polypodiales (Cassandra length: approx.600 bp; LTR length approx.200 bp; Fig. 1, supplementary table S2, Supplementary Material online).Similar rules apply to the internal region.There seems to be a plant familyspecific conserved preference of internal region size lengths, which, in contrast to the LTR lengths, does not correlate with the overall length.
Regarding LTR sequence conservation, a typical familydefining feature of an LTR retrotransposon, the variation is very high across all 81 plant Cassandra sequences.All of them have an overall pairwise identity of 44.4%.Dotplot comparison about all plant Cassandras led us to the identification of Cassandra families based on plant familyspecific similarities (see a comparative analysis among all Cassandra species in Fig. 1).A closer look into LTR identities also revealed high values >70% within different plant families (Table 1; plots with darker shading in Fig. 1).The exceptionally low sequence identity for Cassandras within the Fabaceae and Asteraceae can be explained by variant formation (supplementary fig.S1, Supplementary Material online) due to indels within the LTRs of certain species.
To understand the main hallmarks of Cassandra retrotransposons and to assess the extent of Cassandra variability in angiosperms, we compared the sequence relationships of the Cassandra LTR regions, the internal regions, and the plant species.As seen in Table 1, Cassandra shows high pairwise identity values >70% for at least one of the two main sequence compartments: LTR (Poaceae, Brassicaceae, Caryophyllaceae, Amaranthaceae) or internal region (Fabaceae).Exceptions are Rosaceae Cassandras, with the highest values of pairwise identity for both LTR and internal region, and Asteraceae Cassandras, with the lowest conservation for both structural components.A patchy pattern for Asteraceae Cassandras is also seen within the dotplot (Fig. 1) and sequence alignments (supplementary fig.S2, Supplementary Material online) and resembles the phylogenetic assignment to the corresponding tribes of these species.Hence, a higher similarity of LTRs is not accompanied by conservation of the internal region and the other way around.Also, transcription-wise, there seems to be no need to preserve either LTR and/or internal regions as all combinations of similarity patterns in the two components are present across Cassandra families.
Considering this context, we find that LTRs of Asteraceae Cassandras are characterized by their short lengths ( MBE observed a mix of Cassandra-positive hits in the genomes of the Carduoideae.This led us to identify, for the first time, Cassandra-related nonautonomous LTR elements without the 5S promoter.These occur in some Asteraceae species of subfamily Carduoideae.These Cassandra-like elements share a part of their internal region with the canonical Cassandra, but differ in LTR sequence and length (supplementary fig.S2, Supplementary Material online).Most strikingly, as the iconic Cassandra 5S promoter is missing, these sequences do not represent Cassandra retrotransposons, but canonical TRIMs.Instead of the 5S rDNA promoter regions these TRIMs harbor TATA boxes as a putative signal for polymerase II transcription 48 nt from the 5′ LTR terminus (supplementary fig.S2, Supplementary Material online).In two species (Arctium lappa and Carthamus tinctorius), these Cassandra-like TRIMs co-exist with Cassandra, whereas in Cynara cardunculus only the Cassandra-like TRIM is present.As such, these Cassandra-like TEs may represent evolutionary progenitors or intermediates.
Plant Cassandra Sequences Share a Conserved 5S rDNA Similarity Region Out of the 81 Cassandra retrotransposons in our dataset, we compared them to their corresponding published or newly annotated Asteraceae 5S rRNA gene, if available.By definition, all Cassandra sequences harbor a region similar to their corresponding 5S rRNA gene.The length of this similarity region is species-specific and always located approximately in the middle of the LTR sequence (supplementary table S2, Supplementary Material online).The shared similarity region spans to the internal control region (ICR) with the promoter motifs (A-Box, intermediate element [IE] and C-Box) in all Cassandras.Although some species, like Malus × domestica (supplementary table S2, Supplementary Material online), show a large similarity region of more than 100 nt, we were able to define a "core" unit of ∼ 70 bp present in all Cassandra sequences (supplementary table S2, Supplementary Material online; Fig. 2).Although the core unit itself is present and conserved in all Cassandras, the position of the similarity regions within the LTR varies (supplementary table S2, Supplementary Material online), but it is never located at the LTR termini.We observe familyspecific similarity regions located within a region of 25% to 75% of all LTR nucleotide positions.
Within this highly conserved similarity region there is a very prominent variation: two kinds of C-Boxes.One of these motifs is solely specific for the Asteraceae Cassandra sequences within the Anthemideae tribe, the other is found in the remaining species.These prominent deviations can be explained by a closer look at the corresponding genomic 5S rRNA genes.In certain Asteraceae species, the usually highly conserved C-Box is slightly different, showing a 5′-GGCTTGGGTG-3′ motif instead of the canonical 5′-AGGATGGGTG-3′.This C-Box variation is mirrored by the corresponding Cassandra sequence, although it is not identical, showing a 5′-GGCCTGGGTG-3′ motif (Fig. 2; bold bases within the C-Box).To check, how common these C-Box shifts are among the Asteraceae Cassandras, we deeply screened the 15 Asteraceae genomes to collect in total 1,992 full length Cassandras, which can be grouped into 24 variants.We observed a changed C-Box in all annotated Cassandra sequences of the Anthemideae and in Conyza canadensis (Astereae).Within one genome, e.g. in A. annua, up to four different C-Box populations were detected (supplementary fig.S2, Supplementary Material online).We hence observe that Cassandra elements mimic changes in the 5S promoter of the 5S rRNA gene.
The 5S Promoter Motif Changed after the Emergence of the 35S-5S Linkage in the Asteraceae Interestingly, the 5S promoters with the mutated C-Box emerged in the Anthemideae, a tribe well known for harboring 5S genes in a linked arrangement, such as species of the genus Artemisia (Garcia et al. 2009).We wondered, if there was any relation between the 35S-5S linkage and the promoter shift, and if they evolved independently or within similar timespans.
We already have data regarding the promoter sequences (Fig. 2), but for some of the species, the 5S arrangement has yet to be determined.Hence, we analyzed the 5S arrangement in all Asteraceae species and the three chosen outgroups for Fig. 2.This was done by low coverage read assembly, where the 35S and 5S contigs were visualized (Fig. 3).Arrangement-wise, we observed 10 separate and 5 linked rDNA arrangements within our dataset.
We conclude that the Asteraceae species A. annua, Glebionis coronaria, Tanacetum cinerariifolium (all within the Anthemideae), Helichrysum umbraculigerum (Gnaphalieae), and Bidens hawaiensis (Coreopsideae) have 35S-5S linkage.Clearly, despite 35S-5S linkage occurring often in the Anthemideae as reported, it is neither restricted to this tribe nor present in all species of this tribe.Instead, linkage occurs polyphyletically and is not necessarily associated with the promoter shift limited to the Anthemideae (Fig. 2).Nevertheless, considering linkage in the Anthemideae and its sister tribe Gnaphalieae, we assume that the linked 35S-5S arrangement emerged first and that the promoter sequences shifted later in the evolutionary timeline.
Fixed in the 5S Gene, but Variable in the TE: Two 5S-derived Regions are Variable in Cassandra Having clarified that the promoter shifts and 35S-5S linkages are not always correlated, we wondered about the sequence variability between the promoter box motifs.As highlighted in Fig. 2, the LTR core unit includes the highly conserved 5S rDNA promoter box motifs and parts of their flanking genic region.However, in Cassandra, not all parts of the similarity region are as highly conserved as the promoter box motifs.Comparing Cassandra retrotransposon sequences to the 5S rRNA genes, we note that Cassandra retrotransposons have regions of high variability just next to the box motifs: In the promoter-flanking regions, an accumulation of mutations appears to be tolerated between the boxes (supplementary fig.S3, Supplementary Material online).The nucleotide Promoter Mimicry of Cassandra Retrotransposons • https://doi.org/10.1093/molbev/msae010MBE sequence most prone to mutations thereby seems to be adjacent to the conserved box motifs, e.g. between the A-Box and the IE, we observe 4 nt long and between the IE and the C-Box 5 nt long polymorphic motifs, further referred as MotIE and MotC (supplementary fig.S3, Supplementary Material online).We performed an all-against-all dotplot analysis, where similarities between defined windows (k = 12, allowed mismatches n = 3) are marked as a dot.If sequences show larger regions of similarity, multiple dots form linear patterns.Sequences are shaded according to the longest common substring, meaning sequences with a high number of matching windows are shaded in dark gray and white/light gray for sequences with a small number of similar sequence units.Each and every sequence shows at least a small diagonal, which information-wise can be mainly limited to the conserved region Cassandra sequences share with the 5S rDNA and a short promoter-flanking region.Cassandra LTR sequences show increased similarity within one plant family but tend to accumulate mutations for more distal ones.Visualization of phylogenetic relationships are indicated by colored lines and resemble the current taxonomy proposed by The Angiosperm Phylogeny Group (2016).
Placing the Asteraceae Cassandra retrotransposons into this framework of typical plant Cassandra lengths, they are on the smaller side, due to their short LTR and internal sequences with median length values of 270 and 82 bp, respectively, similarly to Cassandras from the Rosaceae, Amaranthaceae, and ferns.To better understand the co-evolution mechanisms that accompany the interplay of Cassandra retrotransposons and the 5S rDNA, we collected comprehensive Cassandra data across the vascular plants.We focused especially on the Asteraceae, combining Cassandra information, 5S rDNA promoters and 35S-5S linkage to illustrate the variable landscape of the 5S-Cassandra interaction (Fig. 4).For Asteraceae we gathered data from 11 plant tribes and 22 plant families (Fig. 4; column 1).Limitations are in the data foundation, as for some species not all required sequencing reads were available (Fig. 4; column 2).Nevertheless, our data provides a comprehensive framework that-for the first time-allows tracing Cassandra evolution (Fig. 4; column 3) in comparison to 5S evolutionary changes, such as promoter and arrangement shifts (Fig. 4; column 4).

Discussion
Plant Cassandras Share Structural Hallmarks, But Are Variable in Sequence and Length: Cassandra Is Better Defined as a Lineage than as a Family As they are present across the plant kingdom, it could be assumed that Cassandra forms a single family of nonautonomous LTR retrotransposons.Our dataset of plant Cassandras now offers the most comprehensive framework up to date to re-examine, if all plant Cassandra sequences indeed form a single family.If we follow TE taxonomy, LTR retrotransposons are firstly classified into superfamilies and lineages according to their structure, e.g. by the presence and order of conserved regions, and in a second step by the sequence of the conserved regions (Wicker et al. 2007;Arkhipova 2017;Neumann et al. 2019).The main focus is usually on the enzymatic domains, as these are most conserved (Malik and Eickbush 1999;Malik 2005;Eickbush and Jamburuthugoda 2008).For family classification, LTR sequences are compared across sequences, but other parameters, such as element and LTR lengths as well as the internal sequences are also considered (Wicker et al. 2007).As Cassandra retrotransposons do not code for any proteins, we can only take into account these length and sequence parameters.
Across the plant kingdom, represented here by 18 plant families, 81 Cassandra retrotransposons share the 5S promoter as a central motif in the LTR.However, apart from this, they show large variability in element length, structure, and nucleotide information (Fig. 1, Table 1, supplementary table S1, Supplementary Material online).For the LTR as a family-defining hallmark (Simon and Zimmerly 2008), we recognized family-specific sequence and length similarities beyond the conserved 5S-derived similarity region.Nevertheless, for Asteraceae and Fabaceae sequences, we surprisingly observe more variability in the LTRs (Table 1) than in the internal region.This is caused by indels in the LTRs of the Asteraceae and Fabaceae Cassandras that can be interpreted as tribe/subfamilyspecific phylogenetic signals.We observe a link between Cassandra diversification and species richness in the Asteraceae and Fabaceae, as these two belong to the most species-rich plant families, with various subfamilies (Christenhusz and Byng 2016).Therefore, it is not surprising that we see this variation also in repetitive elements.Regarding internal regions, we observe mostly preferred size ranges and conservation within plant families (i.e.within the Rosaceae and Fabaceae).Despite this general trend, the internal Cassandra regions can sometimes vary within the same family as observed in the Amaranthaceae (see also Maiwald et al. 2021 for an in-depth report).
Concluding, apart from the 5S promoter as shared structural hallmark, Cassandra sequences and lengths are variable and can form several Cassandra families.Taxonomically, we therefore understand Cassandra retrotransposons as a lineage of nonautonomous LTR retrotransposons rather than a family.This is in line with the lineage concepts of autonomous LTR retrotransposons that are based on structural similarities (Llorens et al. 2009;Du et al. 2010).For example, TEs of the chromovirus lineage have a chromodomain in the integrase region (Neumann et al. 2011;Weber et al. 2013) and TEs of the Retand/ Ogre lineage have two instead of a single ribonuclease H region (Neumann et al. 2019).If we put our observations in the frameworks for TE classification (see also the TE Hub initiative; Elliott et al. 2021), we can extend the lineage concept toward nonautonomous retrotransposons, and suggest the concept of a Cassandra lineage, encompassing all nonautonomous LTR retrotransposons that contain a similar 5S promoter region.

Co-evolution of Cassandra and the 5S rDNA
For all Cassandras in our dataset for which we could retrieve the corresponding 5S rDNA gene, we observe

MBE
mirroring of the conserved promoter box motifs.This is particularly noticeable within the Asteraceae, where different 5S variants and organizations have been described (Garcia et al. 2010(Garcia et al. , 2012) and which we therefore investigated in detail (Fig. 4).We here describe Cassandra's promoter mimicry in a depth that was not possible before, with the identification of almost 2,000 Cassandra sequences and scanning of all C-box promoter motifs.Nevertheless, promoter mimicry is not exclusively observed in Cassandra sequences.In plants, sequence mirroring strategies are widely found: "target mimicry", for example, describes endogenous long noncoding RNAs MBE that mimic and inhibit other small RNA molecules (miRNAs; Wu et al. 2013;Ye et al. 2014;Li et al. 2015).
Although compared to these mechanisms, Cassandra promoter shifts may not affect genome integrity as strongly.
Regarding the mechanisms behind Cassandra promoter mimicry, an initial mutation within the 5S rDNA gene in an Anthemideae ancestor is a likely scenario.Due to concerted evolution of the ribosomal genes (Nei and Rooney 2005), mutations can spread through the array and become fixed.This is well in line with the current concerted evolution models of the 5S rDNA.We can assume that a mutated C-Box in the 5S rDNA would be followed by molecular changes in the corresponding transcription factors on a cellular level to guarantee sufficient transcription.
Most likely, Cassandras adapt only slowly to these genic promoter changes.This is supported by the detection of Fig. 3. 35S-5S linkage of ribosomal RNA genes is present in several, but not all, Asteraceae species.The graphs represent the low coverage assembly of the rRNA genes and show gene order, arrangement, and organization.Complete circular graphs represent the typical rDNA monomer for each species.Linked ribosomal genes (L) show a single circular graph including the 5S rDNA gene (marked by arrows; five instances).Separated ribosomal RNA genes (S) show two graphs (10 instances).For Scalesia atractyloides (*), read quality strongly impaired the alignments and prevented full circular assemblies.Nevertheless, the rRNA genes were assembled completely, with several copies being separated by spacers.A separated arrangement can be concluded even from the imperfect assembly of S. atractyloides rDNAs.Five of the Asteraceae show the 35S-5S linkage being neither restricted to species with the promoter shift (see H. umbraculigerum and B. hawaiensis) nor affecting all species with the shifted promoter (see Chrysanthemum indicum).The line thickness is associated to the coverage depth of individual contigs; however, these are not comparable across the species shown.Sequence representation is not to scale.
Promoter Mimicry of Cassandra Retrotransposons • https://doi.org/10.1093/molbev/msae010MBE multiple Cassandra populations with differing C-Boxes in a single genome (detected in A. annua, G. coronaria, and T. cinerariifolium within the Anthemideae tribe).Despite this, the main Cassandra C-Box variant always corresponds to the genic rDNA within each of these genomes.
Interestingly, in C. canadensis, a species with a canonical rDNA, a very small (four members) Cassandra population with a deviating C-Box was detected.Here, we do not consider promoter mimicry, but rather consider these Cassandras to be remnants of a soon to be extinct Cassandra population.
As Cassandra retrotransposons rely on the 5S rDNA promoter for transcription (Kalendar et al. 2008;Maiwald et al. 2021), they need to either adapt or become inactive relics.This would also increase the mutation pressure for Cassandra sequences.Cassandra's promoter mimicry that mirrors the mutation in the C-Box (either random or also caused by gene conversion) was likely necessary to maintain TE integrity and to avoid vanishing.As an alternative hypothesis, an independent emergence of the C-Box in Cassandra sequences within the Anthemideae spawned from each 5S rDNA variant could also be possible, but seems unlikely, as overall sequence identity between Anthemideae Cassandras and the other Asteraceae is too high.
We thus suggest the following chain of events: First, the rDNA promoter mutated, followed by homogenization/ concerted evolution.Second, Cassandra retrotransposons mirrored these C-Box shifts to survive.Third, mobilization (and with this reverse transcription, amplification and integration) of Cassandra retrotransposons led to further copies with the new promoter.At the same time, spread through already existing Cassandra copies may have been influenced by gene conversion and even by homogenization in tandemly arranged Cassandras (as seen in Yin et al. 2014;Kalendar et al. 2020;Maiwald et al. 2021).
In contrast to the highly conserved 5S promoter box motifs and the observed cases of promoter mimicry, we observed variable motifs in the immediate vicinity of the highly conserved promoter boxes in the Cassandra LTRs.The variable motifs MotIE and MotC of Cassandra are most striking in this context, i.e. within the expected most conserved region of the already highly conserved 5S rRNA gene.However, we assume the necessary preservation of the promoter boxes to enable transcription, whereas mutations in other regions are tolerated and can at least reach as much variability as the non-5S-derived sequence regions.Interestingly, mutations in these variable motifs did not include larger indels.So, retention of certain lengths and spacing between the conserved promoter boxes seems to be mandatory, likely to enable the folding of secondary structures, as suggested previously for both 5S rDNA C-Box variants (Garcia et al. 2012).

Did Cassandra Retrotransposons Emerge Multiple Times?
Despite having rebuked an independent Cassandra emergence in the tribe Anthemideae, the question holds: Is it likely that Cassandra emerged several times in the plant kingdom?For the purpose of this study, we consider Cassandra emergence as the process of obtaining a 5S promoter in the LTR of a retrotransposon.The acquisition of new, prominent, genic promoters such as the 5S, would enable a retrotransposon to pursue new strategies for proliferation.This is already well known for nonautonomous retrotransposons, e.g. for SINEs in animals (Vassetzky and Kramerov 2013).Even apart from promoter structures, many instances of co-option of other sequence modules by TEs were observed (reviewed by Cosby et al. 2019 and  Wang and Han 2021).We hypothesize that the emergence

MBE
of Cassandra retrotransposons was caused by spatial association of an LTR retrotransposon near an existing or extrachromosomally located 5S rDNA array (Fig. 5a).Also, a direct integration of Cassandra into a 5S rDNA cluster is imaginable as this would enable a first quick Cassandra outburst through accessible regulatory transcription factors (Dewannieux et al. 2004;Lynch et al. 2015) before elimination due to 5S homogenization (Garcia et al. 2024).Either way, the close proximity of a Cassandra progenitor and nuclear 5S rDNA would have allowed promoter obtainment, without harming the 5S gene itself.An explanation for this neutral interaction could be a tendency to pseudogene accumulation at the borders of ribosomal RNA arrays (Robicheau et al. 2017).
For Cassandra establishment we will discuss two hypotheses: On the one hand, Cassandra retrotransposons may have arisen only once in the early vascular plants (Fig. 5b-left side), which led to a stable Cassandra population in ancestral species.The single origin hypothesis is supported by the strong conservation of the 5S promoter region-a 70 bp sequence stretch that extends well past the promoter boxes, as pointed out previously.In this scenario, this 5S region was obtained by an LTR retrotransposon once and was retained through plant and Cassandra evolution.This retention across the evolution of vascular plants may well be possible.If TE populations do not harm or significantly alter gene expression, they tend to be tolerated by the host (Zhang and Mager 2012), which could have been the case for Cassandra 5S regions.Following this line of thought, the large sequence differences between Cassandra retrotransposons of different plant families arose as result of mutation and sequence reshuffling.
On the other hand, Cassandra retrotransposons may have originated at multiple time points during plant evolution (Fig. 5b-right side), forming several independent Cassandra populations in the plant kingdom.If Cassandra emerged independently, we would consider the 70 bp region including the 5S promoter as a sequence module that was taken up multiple times by nonautonomous LTR retrotransposons, thereby forming Cassandra sequences repeatedly, in independent manners.Modular evolution is one of the typical strategies of TE evolution, occurring across all major TE clades and lineages (Smyshlyaev et al. 2013;Heitkam et al. 2014;Seibt et al. 2020).
In the light of both scenarios, the newly identified Cassandra-like retrotransposon-a TE with high similarity to Cassandra, but without the 5S promoter-could be a possible Cassandra precursor.By acquiring the 5S promoter module, a full Cassandra may have been formed from this Cassandra-like TRIM.The presence of polymerase II promoter motifs in these Cassandra-like elements in the form of TATA boxes, which may have allowed them to proliferate,

Cassandra origin
e.g. by transposition, recombination, eccDNA integration Fig. 5. Cassandra origin and evolution within the plant kingdom.We suggest a Cassandra origin by obtaining 5S rDNA promoter (and flanking) regions due to spatial association, e.g.transposition, recombination or eccDNA integration through an LTR-retrotransposon (a).Based on our data we assume Cassandra originating (star) either as a single event in early vascular plants (b-left side; purple star) or multiple origins (b-right side; red/yellow stars) in the plant kingdom.In the single origin scenario one originator formed multiple variants (purple/pink circles).Each of the Cassandra families, we observe today (purple/pink rectangles), is a successor of one of these variants.For the multiple origin hypothesis, the scenario slightly changes with each family (red/yellow rectangles) being assignable to one of the newly emerged Cassandras (red/yellow stars) within the plant kingdom.
Promoter Mimicry of Cassandra Retrotransposons • https://doi.org/10.1093/molbev/msae010MBE point into this direction.Alternatively, these Cassandra-like elements may have arisen from Cassandras that have lost the 5S rDNA promoter.Nevertheless, losing such a beneficial hallmark does not seem to be advantageous.Either way, after Cassandra emergence, a range of different evolutionary mechanisms act on these TEs, including the accumulation of mutations, rearrangements, recombination, reshuffling, and even polymerization.These processes caused Cassandra diversification into different variants which, through further diversification, formed the Cassandra families we observe today (Fig. 5b-both sides).Both scenarios can explain Cassandra retrotransposons with large LTR similarity within plant families (Fig. 1).Diversification of TE due to read-through transcription (Xiao et al. 2008;Elrouby and Bureau 2010;Jiang and Ramachandran 2013), recombination (Devos et al. 2002;El Baidouri and Panaud 2013;Yin et al. 2014Yin et al. , 2015;;Kalendar et al. 2020;Maiwald et al. 2021), and chimeric TE formation (Vicient et al. 2005;Wollrab et al. 2012;Sanchez et al. 2017) is a common mechanism.Also, the formation of extrachromosomal DNA during reverse transcription can influence TE sequence and structure on a nucleotide level (Drost and Sanchez 2019).
Whatever form of evolution has taken place, the 5S promoter is a powerful sequence module to be shuffled around, as it grants access to a reliable, environmental stressindependent transcription.The procurement of additional sequence information through reshuffling is known as a great step in LTR retrotransposon evolution (Malik andEickbush 1999, 2001) and, apart from conserved protein domains, this could also apply to promoter regions, as seen for SINEs (Seibt et al. 2020).The neat observed borders of the similarity region within all Cassandra LTRs (Fig. 2) clearly hint to some sort of modular reshuffling, with either the flanking LTR regions diversifying over time (single origin) or modular obtainment of this region by multiple "initiator" elements (multiple origin).Nevertheless, the strong similarity across the whole 5S region leads us to favor the single origin hypothesis.
Has Cassandra Carried the 5S Gene into the Linked Arrangement?
35S-5S linkage is one the of the most prominent hallmarks of 5S rDNA in the Asteraceae.It is reported for several species (Garcia et al. 2010), but as a big controversy not necessarily in line with the recent phylogeny (Mandel et al. 2019).Our integrative dataset includes five Asteraceae with 5S rDNAs in linked arrangements (Fig. 4).We find that linkage is present in the Anthemideae, Gnaphalieae, and Coreopsideae tribes, but in a patchy manner: First, phylogenetically, these three tribes are not direct sisters.Second, there is variation even within a tribe, as for instance, the Anthemideae contains both arrangement types (Figs. 3 and 4;Garcia et al. 2009Garcia et al. , 2010)).This patchy pattern of linked and separate arrangements across the phylogeny of the Asteraceae can be explained by either (1) multiple emergences of 35S-5S linkage or (2) a singular emergence of linkage followed by subsequent losses.
Regarding the multiple emergences of 35S-5S linkage in closely related Asteraceae, we consider this as unlikely.Despite reports of independent emergence for example in Ginkgo and other gymnosperm taxa (Garcia and Kovařík 2013), we consider the affected Asteraceae tribes as too closely related to have linkage occurring by chance.
Instead, we favor a single emergence, followed by loss of linkage within certain tribes/species and the retention in others.This scenario could be the result of frequent hybridization and polyploidization among the Asteraceae: If species with separate (S) and linked (L) arrangements hybridize, they would generate F1 hybrids (S × L) that have both arrangement types.One of them likely vanishes during the following generations (probably through concerted evolution mechanisms), thus enabling alternating and patchy distributions of S and L arrangements in and across populations.This scenario is supported by reports of frequent hybridization, polyploidization and general species richness in the Asteraceae (Semple and Watanabe 2023).Nevertheless, in both scenarios, there had to be a coexistence of linked and separate arrangements for a certain timespan after emergence, which can be seen by accidental observation of unlinked 5S rDNA units in an otherwise Ltype species, Coreopsis major (Garcia et al. 2010).Clearly, for all scenarios, one 5S arrangement was retained whereas the other had to vanish-either by chance or by selection.On a molecular level, the fixation of one 5S arrangement type can be achieved by homogenization and concerted evolution processes.
As discussed for the promoter shifts, we see that Cassandra clearly follows 5S rDNA evolution.Therefore, we asked if Cassandra could also impact 35S-5S linkage emergence.At first glance it seems obvious: due to their mobility and copy-and-paste mode of proliferation, Cassandras could have carried the 5S rRNA gene into the 35S rDNA array, as considered previously (Garcia et al. 2009(Garcia et al. , 2010)).However, if we have a closer look at Cassandra itself, this seems very unlikely for two main reasons: 1) None of the 81 species-specific Cassandras in our dataset (from 17 different plant families all across the plant kingdom) carried the whole 5S rDNA gene.
The region is always limited to a module containing the promoter regions and a short part of the flanking genic region.2) Retrotransposons have a totally different structure compared to ribosomal DNA clusters, with other mechanisms putting evolutionary pressure on the sequence.
On the flip-side, we assume that 35S-5S linkage might have had an influence on Cassandra emergence and evolution.One can assume a limitation of 5S diversification through linkage, as concerted evolution may have a greater impact on 5S in linked arrangements as opposed to 5S in separated arrangements.Mutation rates support this hypothesis (Sònia Garcia et al. personal communication).If the linked arrangements allowed a faster spreading and Maiwald et al. • https://doi.org/10.1093/molbev/msae010MBE fixation of 5S mutations in the array, it would be the perfect starting ground for the observed promoter shifts in the Anthemideae.As discussed above, in order to survive, Cassandra sequences then must adapt to this shift and mimic the new C-Box variant/information.

Conclusion
We collated a dataset of representative species-specific plant Cassandra retrotransposons and defined the presence of the 5S-related sequence stretch in the LTR as a hallmark defining the Cassandra TE lineage.Narrowing down on the Asteraceae, a plant family with wide variation in the 5S gene sequence and organization, we put together a comprehensive Cassandra-5S rDNA framework to trace the interplay between Cassandra TEs and 5S rDNA evolution.We find that shifts in 5S promoters are mimicked closely by the TE, whereas overall reorganization in 5S rDNA architecture does not impact the Cassandra landscapes.We here provide convincing evidence for gradual Cassandra-5S rDNA co-evolution that gives insight into the interplay between TEs and rDNA in plant genomes.

Fig. 1 .
Fig. 1.LTR comparison of 81 Cassandra retrotransposons.We performed an all-against-all dotplot analysis, where similarities between defined windows (k = 12, allowed mismatches n = 3) are marked as a dot.If sequences show larger regions of similarity, multiple dots form linear patterns.Sequences are shaded according to the longest common substring, meaning sequences with a high number of matching windows are shaded in dark gray and white/light gray for sequences with a small number of similar sequence units.Each and every sequence shows at least a small diagonal, which information-wise can be mainly limited to the conserved region Cassandra sequences share with the 5S rDNA and a short promoter-flanking region.Cassandra LTR sequences show increased similarity within one plant family but tend to accumulate mutations for more distal ones.Visualization of phylogenetic relationships are indicated by colored lines and resemble the current taxonomy proposed by The Angiosperm Phylogeny Group (2016).Placing the Asteraceae Cassandra retrotransposons into this framework of typical plant Cassandra lengths, they are on the smaller side, due to their short LTR and internal sequences with median length values of 270 and 82 bp, respectively, similarly to Cassandras from the Rosaceae, Amaranthaceae, and ferns.

Fig. 2 .
Fig. 2. Similarity region shared by the 5S rRNA gene and Cassandra retrotransposonsin the Asteraceae and other plants.Differences from the 5S gene are highlighted in colors (A = red, T = green, C = blue, G = yellow).Key sequence variation in the C-Box is highlighted in bold.The arrangement of the 5S rDNA in a linked (L) or separated (S) configuration is indicated (see also Fig.3).The regions of high sequence divergence between the Box motifs are marked with an orange rectangle.

Fig. 4 .
Fig. 4. Framework for understanding the interplay between Cassandra retrotransposons and the 5S rDNA in the Asteraceae: summary, integrative overview, and data-based limitations.The Asteraceae phylogeny in the first column is based on Mandel et al. (2019).

Table 1 ,
supplementary tableS2, Supplementary Material online) and lesser conservation compared to most of the other plant families studied.

Table 1
Median length (bp) and family pairwise identity values (%) of plant Cassandras for families with two or more members.The number of representative Cassandras resembles the number of species with Cassandra retrotransposons within this plant family Promoter Mimicry of Cassandra Retrotransposons • https://doi.org/10.1093/molbev/msae010