Sample Sequence Analysis Uncovers Recurrent Horizontal Transfers of Transposable Elements among Grasses

Abstract Limited genome resources are a bottleneck in the study of horizontal transfer (HT) of DNA in plants. To solve this issue, we tested the usefulness of low-depth sequencing data generated from 19 previously uncharacterized panicoid grasses for HT investigation. We initially searched for horizontally transferred LTR-retrotransposons by comparing the 19 sample sequences to 115 angiosperm genome sequences. Frequent HTs of LTR-retrotransposons were identified solely between panicoids and rice (Oryza sativa). We consequently focused on additional Oryza species and conducted a nontargeted investigation of HT involving the panicoid genus Echinochloa, which showed the most HTs in the first set of analyses. The comparison of nine Echinochloa samples and ten Oryza species identified recurrent HTs of diverse transposable element (TE) types at different points in Oryza history, but no confirmed cases of HT for sequences other than TEs. One case of HT was observed from one Echinochloa species into one Oryza species with overlapping geographic distributions. Variation among species and data sets highlights difficulties in identifying all HT, but our investigations showed that sample sequence analyses can reveal the importance of HT for the diversification of the TE repertoire of plants.


Introduction
Evolution results from selection and drift controlling the fate of modifications of the genetic material of organisms. Genetic variants can result from gene or genome duplication (Crow and Wagner 2006), substitutions (Lenski et al. 2003), recombination (Bodmer and Parsons 1963), transposon activity (Kidwell and Lisch 2000), and horizontal transfer (HT) of DNA . As more genome sequences become available, we are able to gain new insights into these processes and their impacts. While the importance of HT for some eukaryotes has been clearly established (Keeling and Palmer 2008;Schaack et al. 2010), the phenomenon remains poorly studied, mainly because its existence remained debated until recently Martin 2017;Sibbald et al. 2020).
HT refers to the movement of DNA across mating barriers. The phenomenon has been reported among fungi (Fitzpatrick 2012), insects (Crow and Wagner 2006), and other animals (Thomas et al. 2010), and between kingdoms (Gladyshev et al. 2008;Richards et al. 2009;Hotopp 2011;Mayer et al. 2011). Several lineages of plants have received HTs from viruses, prokaryotes, and nonplant eukaryotes (Yue et al. 2012;Cheng et al. 2019), as well as from other plants (Vallenback et al. 2010;Christin et al. 2012;El Baidouri et al. 2014;Dunning et al. 2019). Previously reported HTs in plants concerned mainly genes and organelle genomes (Dowson et al. 1989;Jain et al. 1999;Bergthorsson et al. 2003;Richardson and Palmer 2007;Stegemann et al. 2012;Xi et al. 2013). Although transposable elements (TEs) are the major components of many plant genomes (Bennetzen 2000;Schnable et al. 2009;Kim et al. 2014), only three studies have focused on their HT among plants (Diao et al. 2006;Roulin et al. 2008;El Baidouri et al. 2014), and have suggested that HT of LTR-retrotransposons in particular is widespread across diverse plant lineages (El Baidouri et al. 2014). However, more studies in plants are required to understand the role of horizontally transferred TEs in plant genome evolution.
In this study, we develop a novel method to investigate HT among plants using low-coverage sample sequences. We focused on panicoid grasses, which have been shown to be involved in HTs (Christin et al. 2012;Dunning et al. 2019). We first look for HT of LTR-retrotransposons from any of 19 panicoid grasses to any of 115 angiosperms with complete genome sequences. We then focus on two genera of grasses for which multiple genome sequences are available to track the dynamics of HT through time. Our study sheds new light on the importance of HT for the diversification of plant genomes.
(with $2.3Â coverages of the expected genomes, given their predicted sizes) from 19 panicoid genomes (supplementary table S1 and fig. S1, Supplementary Material online). Targeted investigation of LTR-retrotransposons was conducted by screening the LTR-retrotransposons of 115 plant species that have reported complete genome sequences (supplementary table S2, Supplementary Material online) with reverse transcriptase (RT) sequences. Initially, the wellconserved YXDD domain of the RT was used in the HMM search. Horizontally transferred LTR-retrotransposons were identified as those with mapped sample sequences with >97% identity. Through this, a total of four cases of possible horizontally transferred LTR-retrotransposons were identified as events into the O. sativa genome from a panicoid genome, with an average identity of mapped sample sequences between 97.5% and 98.5% ( fig. 1).
To verify the adequacy of the identity threshold, we compared the identity of nuclear gene coding sequence (CDS) between O. sativa and each of the four panicoid species that were involved in the putative HT events. The pairwise identity between all four panicoid sample sequences and the CDS of O. sativa peaked at 86% ( fig. 1). Only 2.6% of the sample sequences had >97% identity with the CDS of O. sativa. Therefore, the >97% threshold is high enough for the identification of horizontally transferred LTR-retrotransposons.

HTs of LTR-Retrotransposons between the Panicoid Grasses and Oryza
Because all four possible horizontally transferred LTRretrotransposons were identified in O. sativa among the 115 plant species, subsequent analyses focused on the genus Oryza. Using ten whole-genome sequences of different Oryza species (Goff et al. 2002;Stein et al. 2018), we identified 15 LTR-retrotransposons with high similarity (>97% identity) between the studied panicoid grasses and Oryza species ( fig. 2A, table 1, and supplementary fig. S2, Supplementary Material online).
FIG. 1. Horizontally transferred LTR-retrotransposons detected between 19 panicoid species and 115 other plant species. The histograms in the "Degree of CDS sequence identity" column indicate the genetic distance between each species pair as shown by the level of CDS sequence homology (the total number of comparisons is shown with a green number on each histogram). The peak point of the histogram is considered as a representation of the degree of divergence from the speciation event and is marked with a vertical dotted red line, with the percent identity at the top. For the four HTs, the identity of the panicoid sample sequence hits to the target genome is shown on the right. Sequences of the recipient genomes are depicted by horizontal red bars and domains of LTR-retrotransposons by gray, pink, orange, green, and light blue boxes that represent gag (GAG), aspartic protease (AP), integrase (IN), reverse transcriptase (RT), and RNase H (RH), respectively. Because of sequence assembly issues or internal rearrangements, some domains are missing (CL102, Iseilema membranaceum and CL102, Cenchrus sieberianus) or in a nonstandard order (CL129, Eriochloa meyeriana). The whole unit of the LTR-retrotransposon is delimited with blue bars, showing that there was no additional high (HT-derived) homology between the panicoid and Oryza genomes flanking the precise TE ends themselves. Vertical bars indicate the position and identity of the hits. The height and color intensity of the bars are proportional to the degree of identity. The level of CDS identity indicating the speciation point is depicted by a dotted horizontal red line on each graph. The RT sequences used in the screening of horizontally transferred LTR-retrotransposons were clustered for subsequent analyses and a total of 368 RT clusters were obtained. To investigate which of these 368 might be horizontally transferred, we constructed phylogenetic trees with the RT sequences of the clusters containing possible horizontally transferred LTR-retrotransposons and their homologs in the Oryza species. For 14 of the 15 putative horizontally transferred LTR-retrotransposons, phylogenetic analyses confirmed that some Oryza RT sequences are nested among those of the panicoid species (supplementary fig. S3, Supplementary Material online), as expected with HT. For the last candidate (CL329), only five RT sequences were present in the cluster, making the phylogenetic tree uninformative. In all phylogenetic trees, the Oryza RT sequences were nested among those from panicoid grasses, indicating that the Oryza lineage was the recipient of the HT. In two cases, the HTs of two different Oryza species were placed together in a single phylogenetic tree ( Material online). Considering CL329 as well, the 15 horizontally transferred LTR-retrotransposon candidates would represent 12 HT events (table 1).
To examine whether the presence of horizontally acquired sequences in multiple Oryza species results from a single transfer in their common ancestor, we first analyzed the HT case of CL025, which was also identified in the first search ( fig.  1). Some CL025 elements were identified with high identity between Cymbopogon citratus and both of the closely-related species O. sativa and Oryza nivara ( fig. 2B). Although it was not detected in our HT search, potentially because of a low number of mapped reads, one homolog of the sequence was also identified in Oryza rufipogon. The phylogenetic analysis of CL025 showed that sequences from the three Oryza species grouped together and nested within C. citratus, as expected following HT to the common ancestor of the three Oryza species ( fig. 2C). We performed similar analyses with all of the other HT candidates, and the phylogenetic analysis demonstrated that CL102 was likely transferred from Iseilema membranaceum or its relative to the common ancestor of O. sativa  The HT of an LTR-retrotransposon would appear as a new insertion when compared with the collinear regions of other Oryza genomes that lack the HT. To test this prediction, comparative analyses of genomic fragments examined all putative HTs. Seven out of twelve putative HT events are located in highly repetitive regions, leading to potential assembly problems (table 1, HT events # 1, 2, 3, 5, 6, 8, and 9). For the other five candidates, orthologous sequences to the regions harboring the horizontally transferred LTRretrotransposon were isolated from different Oryza species. The CL102 of O. sativa has >97% identity to an I. membranaceum LTR-retrotransposon, and the same LTRretrotransposon is identified in the collinear region of O. rufipogon, suggesting that the sequence was transferred to this genomic region in the common ancestor of the two

Nontargeted Investigation of HTs between Echinochloa and Oryza
Out of 12 HT events of LTR-retrotransposons detected in Oryza species, four were received from the Echinochloa genus, which was thus selected for nontargeted investigations to allow the detection of other types of horizontally transferred fragments. Six additional Echinochloa species were sample sequenced (supplementary table S3, Supplementary Material online), leading to a total of nine Echinochloa and ten Oryza species in these analyses (supplementary fig. S7, Supplementary Material online).
The pooled reads of all Echinochloa sample sequences were mapped simultaneously to each of the Oryza genomes. Using a 97% identity threshold, a total of 58 densely mapped regions were detected (figs. 3 and 4A and supplementary fig. S8, Supplementary Material online). As expected, the number of predicted HT events varies with the threshold, with 3 cases at 100% and up to 165 at a 94% threshold ( fig. 4B  and supplementary fig. S8, Supplementary Material online). At high homology levels, all of the apparent HT events were of TEs (see below). While a lower threshold might result in more false positives, the broad distribution of these degrees of high similarity suggests that HTs have occurred frequently during the diversification of the Echinochloa and Oryza genera.
The pairwise identity between the different Oryza and Echinochloa species was reported for each of the 58 HTs detected with a 97% threshold ( fig. 4A). Most fragments show differences between Oryza and Echinochloa sequences (<100% identity, fig. 4A), indicating that the exact donor was not sampled or mutations accumulated since the transfers. HTs detected in O. sativa, O. rufipogon, and O. punctata were more similar to sequences from the closely related Echinochloa esculenta, Echinochloa crus-galli, Echinochloa oryzoides, and E. haploclada ( fig. 4A), indicating that the HT occurred from this lineage.
Interestingly, three regions that contained horizontally transferred sequences had a 100% identity between Echinochloa callopus and Oryza brachyantha ( fig. 4A, C, and D). A comparison of the HT sequence with a maize ortholog indicated that the sequences from O. brachyantha represent three truncated versions of the same LTR-retrotransposon, likely derived from a single HT. One of the sequences corresponds to a full unit of an LTR-retrotransposon missing one LTR, another one to a fragment of the LTR sequence, and the third to a solo LTR (fig. 4C)

The Frequency of HTs of TEs into Plant Genomes
The dynamic activity of TEs provides plasticity to plant genomes. A TE family can increase its population size through its amplification activity, and it also can become extinct by deletion or degeneration in a plant genome (Kaplan et al. 1985). Because it is an irreversible phenomenon, the repeated extinction of TE families would gradually reduce TE diversity. In our study, dense sampling of a group of donors and a group of recipients allowed direct assessments of the dynamics of HT of TEs. We identified a minimum of 27 independent HT events (nineteen from the HT clusters and eight from the singletons) between the Echinochloa genus and the O. punctata lineage ( fig. 5A and supplementary table S5, Supplementary Material online), by far the most active exchange history detected. Interestingly, these HTs involved diverse types of TEs, such as Gypsy and Copia LTRretrotransposons, the DNA transposons Harbinger, MuDR, EnSpm, and hAT, and the L1 non-LTR retrotransposon family. Moreover, many (perhaps all) of these TEs were amplified in the host genome after the HT events ( fig. 5B and supplementary fig. S6, Supplementary Material online). Therefore, we provide direct evidence that HT of TEs can play an important role in maintaining or increasing the diversity of the TE families against the extinction of TE families.
Because of lineage sorting of initially hemizygous TE transpositions, one always expects to find only a tiny minority of the TE insertions generated over evolutionary time in any single haplotype. Hence, the discoveries in this analysis only provide a minimum estimate of the true frequencies of events. Moreover, HT of TEs or other types of sequences might be expected to be most frequent between closely related plants, because of basic genetic compatibility, but these were excluded from our study because of an inability to distinguish between horizontally transferred sequences that are highly homologous and sequences that are highly homologous merely by vertical descent from common recent ancestors. Finally, horizontally transferred sequences that did not Recurrent Horizontal DNA Transfer in Grasses . doi:10.1093/molbev/msab133 MBE amplify, were from very ancient HT events, or were selected against would also be underrepresented in our analysis.
Hence, it appears that HT is very common between some plant lineages, even those that have diverged from a common ancestor for >50 My, as is the case for the panicoid grasses and the genus Oryza.

Possible Mechanisms of the HT of TEs
Despite repeated reports of HT among plants, the underlying mechanism remains unknown. In the case of TEs, several scenarios are possible. For instance, an HT could involve a large fragment of DNA containing TEs that are subsequently amplified (Dunning et al. 2019). Or the TEs could move on their own, perhaps as an LTR-retrotransposon that occasionally acts as a retrovirus. How plant tissues interact to allow this transfer to occur is another issue, with tissue wounding and insect-mediation or interspecies root cell-cell interactions all as possibilities.
No evidence of HT of a large DNA fragment was found in this study, despite 100% similarities suggesting recent transfers ( fig. 4A). The numerous HTs detected in O. punctata involved LTR-retrotransposons, DNA transposons, and non-LTR-retrotransposons, indicating that multiple types of TEs can be exchanged. This great variety in TE types that underwent HT demonstrates that the movements do not rely on any specialized function specific to one type of TE.
Massive HTs of TEs were detected after analyzing 195 insect genomes (Peccoud et al. 2017), and we suggest that insects might also move TEs among plants. Plant TEs can be activated with wound stress caused by physical damages from insects (Wessler 1996;Takeda et al. 1998), and they could therefore be transferred temporarily to the insect, which could place them into the wounds of the next plant it attacks. For example, aphids are known to act as a vector of viruses among plants (Ng and Perry 2004). It is known that the grafting of two sexually incompatible plants can transfer the chloroplast genome and cause HT (Stegemann et al. 2012). With a similar mechanism, DNA fragments transferred by insects might integrate into the genome of a gametophytic or meristematic cell, resulting in HT. Because the Echinochloa genus includes species that are well-known as rice paddy weeds (Bhagirath and David 2009), insect-mediated, or root-root interaction-based HTs could occur in this environment, perhaps explaining their preponderance in our analyses.
Our nontargeted investigation identified no HT of standard nuclear protein-encoding genes. With the exception of LINEs and SINEs, all TEs have a DNA intermediate during their replication. Because many of the HTs that we identified consisted of an intact unit of a TE that was able to subsequently proliferate, the transfer likely involved the DNA form of the intermediate. By contrast, genes are not routinely mobile, but rather are transcribed from chromosomes in the form of RNA, which is less stable and lacks an intrinsic integration property. DNA transposons were more frequently transferred among insects than retrotransposons (Peccoud et al. 2017), and 16 of the 27 HT events identified in O. punctata involved DNA transposons. The relatively high number of DNA transposons transferred into O. punctata suggests that DNA-type TEs have been particularly active in this species and/or in the donor lineages that have contributed to its genome. We conclude that the extrachromosomal stability of the TEs coupled with their propensity for self-replication likely accounts for the numerous successful HTs identified here.

Conclusions
One of the biggest bottlenecks in studying HTs among angiosperms is the lack of abundant sequence resources (Wallau et al. 2018). In this study, we show that low-depth sample sequence analyses can solve the genome sequence shortage by generating data that can be used to detect HTs for large numbers of species. We first scanned the genomes of 115 angiosperms with sequences belonging to 19 panicoid grasses. Despite the large number of species, putative HTs were initially detected only to O. sativa. These results suggest that HT happens frequently only between some groups of plants. Previous studies have reported HT between other distantly related groups of grasses (Mahelka et al. 2017;Hibdige et al. 2020), demonstrating that the phenomenon is widespread at least among some grasses. The small number of HTs identified in our first scan may thus reflect computational limitations. First, the number of HTs detected depends on the similarity threshold ( fig. 4B). Second, relying on high similarities means that only recent HT can be detected. Large-scale surveys with diverse species are consequently required, and we have shown that increasing either the number of potential recipients or donors increases the number of HTs detected. Because sample sequences, such as those used in this study, are becoming available for large numbers of species, we predict that our method will allow large-scale systematic detections of HTs in the future across hundreds or thousands of sample-sequenced lineages.

Low-Depth Sample Sequencing
A total of 25 panicoid species were selected for analyses, most being grown in a comparative experiment by Atkinson et al. (2016) (supplementary table S1, Supplementary Material online). Genomic DNAs were isolated from leaf tissue using Qiagen spin columns (DNeasy Plant Mini Kit; Qiagen). Illumina NextSeq was used for sequencing. The sequences generated were targeted to be $151 bp single-end reads. The sequences used were at least 100 bp in length after quality-trimming.

LTR-Retrotransposon-Targeted Search for HT
To look into possible HT of LTR-retrotransposons from the 19 studied panicoid species, we compared the panicoid RTs that we discovered to 115 publicly available plant genome sequences. These 115 plant genome sequences are listed and referenced in supplementary table S2, Supplementary Material online, and were chosen because they covered a great breadth of angiosperm diversity and/or because they were considered reference or otherwise highly assembled genomes. Reference Park et al. . doi:10.1093/molbev/msab133 MBE sequences from other panicoid genomes (e.g., maize, sorghum, foxtail millet, pearl millet, proso millet) were not used in the analysis because their genes and TEs are closely related to those in the studied panicoids by shared vertical descent. Hence, the closest relatives to the panicoids of these 115 species were grasses in the Pooid and Oryzoid subfamilies of the Poaceae, which last shared a common ancestor with the panicoids >50 Ma (Christin et al. 2014).
To estimate the degree of divergence between some of the reference genomes and the 19 panicoids, we used the CDS of all of their nuclear genes. The CDS were compared with the panicoid sample sequences by BLAST (single HSP, score: >60, and e-value: <e-8). In each case, the best nonoverlapping hits among the mapped panicoid sample sequences were considered. The hits corresponding to possible paralogs would be impossible to discern from orthologs for these short and unassembled sample reads, so they were not removed. Histograms were drawn showing the identity of the compared sequences, and the peak point was considered as an indication of the approximate speciation point.
All of the RT sequence reads were isolated from the 19 panicoid sample sequences with HMMER (default options) (Mistry et al. 2013) using customized HMM profiles for the short-read sequences (supplementary additional files 1 and 2, Supplementary Material online). The isolated RT sequences were clustered based on sequence similarity with the RepeatExplorer pipeline (default options) (Nov ak et al. 2013). As a result, a total of 368 RT clusters was generated. Each of the RTs was compared with the 115 plant genome sequences by BLAST (single HSP, score: >60, and e-value: <e-8). The top three homologies with >85% identity and at least 140 bp matched length were considered. Using their position information, the sequence contig of the target genome containing the match was isolated with 10 kb flanking regions on each side. The isolated 20 kb-length sequence contigs were compared again to all of the 19 panicoid sample sequences by BLAST (single HSP, score: >60, and e-value: <e-8) to find all high homologies. From the BLAST results, the best hits in each position across the whole sequence contig were isolated with the threshold of a minimum of 100 bp matched length and a maximum of 75 bp overlap with other homologies. If there were a minimum of 10 hits within a LTR-retrotransposon, and their average identity was >97.0%, the sequence was considered as a horizontally transferred LTR-retrotransposon. The HT of panicoid LTR-retrotransposons into the ten Oryza species was investigated by the same method. The horizontally transferred LTR-retrotransposons were classified with the maize repeat database (supplementary additional file 3, Supplementary Material online).
Homology/annotation of the five domains of LTRretrotransposons to the target genome sequence contigs was carried out with HMMER 3.0 (default options). The HMM profiles for the five standard LTR-retrotransposon domains were downloaded from GyDB. The position information of the annotated domains was used for identifying the candidate LTR-retrotransposons.
Homologous copies of horizontally transferred LTRretrotransposons in the ten Oryza species were identified by comparing the RT sequences of the horizontally transferred LTR-retrotransposons of panicoids to all the Oryza genome sequences by BLAST (single HSP, score: >60, and evalue: <e-8). From the BLAST results, a matched length longer than half of the query RT sequence and with identity >85% was considered a homologous copy of the horizontally transferred LTR-retrotransposon. The identity of these homologous copies was used for drawing the identity histograms. All of the analyses and visualizations of the results were carried out with in-house python scripts.

Nontargeted Search for HT
The sample sequences of the nine Echinochloa species were mapped to the ten Oryza genome sequences with bowtie 1.2.0 (default options) (Langmead 2010). To use a 94% threshold for the initial search, we generated 50 bp short-read sequences and mapped them by allowing a maximum of three mismatches. With this threshold, if the length of mapped regions in the target Oryza genomes was >500 bp and the interval between mapped sequences was <150 bp, we isolated the mapped region with 20 kb flanking regions on both sides. The 150 bp-long sample sequences of all the Echinochloa species were mapped again to the isolated target sequence contigs with BLAST (single HSP, score: >60, and evalue: <e-8). The target sequences mapped with the Echinochloa sample sequences by >97% identity and >500 bp mapped region were considered as candidate contigs containing horizontally transferred fragments. The candidate contigs were filtered by removing the contigs if the mapped region corresponded to organelle genome sequences, ribosomal sequences, simple repeat sequences, or highly conserved genes. After filtering, the horizontally transferred fragments were clustered with RepeatExplorer (default options). The clustered and nonclustered elements were classified using the maize repeat database. In the case of non-LTR retroelements, their identities were further confirmed by finding RT domains with the non-LTR retroelement domain search tool, RT1class1 (http://www.girinst.org/RTphylogeny/ RTclass1, last accessed May 12, 2021) (supplementary additional file 4, Supplementary Material online).

Phylogenetic Analysis of Panicoids
The long single copy part of the maize chloroplast sequence ($82 kb) (Maier et al. 1995) was used as the reference. Reads from each of the 25 panicoid sample sequences mapping to this reference were identified and assembled into speciesspecific contigs by Velvet 1.2.10 with default options (Zerbino and Birney 2008). The assembled chloroplast sequences were corrected again by mapping reads, and a > 90% consensus was extracted in Geneious (Kearse et al. 2012). The chloroplast sequences were aligned using MAFFT v. 7.392 (Katoh and Standley 2013) with six other chloroplast sequences (Phragmites australis, Aristida rufescens, Chloris virgata, O. sativa, Brachypodium distachyon, and Bambusa multiplex) obtained from GenBank, with the three last species used to root the tree. Trimal v. 1.4.rev6 (Capella-Guti errez et al. 2009) was used to remove all sites with ambiguous or missing data. The 56,388 bp alignment was used to infer a Recurrent Horizontal DNA Transfer in Grasses . doi:10.1093/molbev/msab133 MBE maximum-likelihood phylogenetic tree with RAxML v. 8.2.12 (Stamatakis 2014). Using the GTRCAT model, the best tree out of ten runs was identified, and support values were then estimated with 100 bootstrap pseudoreplicates. The inferred relationships mirrored those based on coalescence analyses of nuclear genes (Dunning et al. 2019). A similar approach was used to infer a tree for nine Echinochloa species, except that the plastid genome sequence from Setaria italica was used as the outgroup.
Phylogenetic Tree for HT Analysis RTs in the Oryza genomes were identified through BLAST searches (single HSP, score: >60, and e-value: <e-8, aligned length > 100 bp), using the panicoid RT sequences as the query from the clusters containing the horizontally transferred sequences. All RTs with at least one hit from the panicoids were retrieved from the Oryza genomes, and aligned to infer a phylogenetic tree. The alignment was conducted with MUSCLE as implemented in MEGA7.0 software (Kumar et al. 2016), and the phylogenetic tree was constructed by the Neighbor-Joining method (p-distance and pairwise deletion).

Analysis of Intrafamily TE Divergence to Determine Transposition History
The UPGMA algorithm (Legendre and Legendre 1998) was used to estimate the relative activity history of TEs based on pairwise distances. For this, the pairwise sequence identity of the TE sequences was calculated in each cluster by all-to-all BLAST analyses (single HSP, score: >60, and e-value: <e-8).
Based on the UPGMA algorithm, the phylogenetic relatedness and its node values were calculated from the pairwise distances of the RT sequences. The software used to calculate the relative activity history of TEs is found in supplementary additional file 5, Supplementary Material online. The distances at all the nodes were used for drawing identity histograms to estimate the relative activity history of TEs. The histograms were drawn by counting the number of nodes in each of the one percent intervals. Calculation of the distances at each node and visualization of the results were performed with two in-house python scripts, one to calculate the distances at each branching point and the other to visualize the results.

Phylogenetic Tree of Ten Oryza Species
To construct a phylogenetic tree of the ten Oryza species, orthologous CDSs of all nuclear genes were identified as the reciprocal best hits from BLAST searches (single HSP, score: >150, and e-value: <e-8). Each of the orthologous CDSs was translated into the amino acid sequence, and the amino acid sequences were aligned with prank (Löytynoja and Goldman 2010). The unaligned regions were trimmed, and the remaining sequences were converted back into CDS. The third positions of codons of all sequences were concatenated, and a phylogenetic tree was inferred with RAxML, as described above. The inferred relationships mirrored those inferred in previous studies (

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.