Discovery of divided RdRp sequences and a hitherto unknown genomic complexity in fungal viruses

Abstract By identifying variations in viral RNA genomes, cutting-edge metagenome technology has potential to reshape current concepts about the evolution of RNA viruses. This technology, however, cannot process low-homology genomic regions properly, leaving the true diversity of RNA viruses unappreciated. To overcome this technological limitation, we applied an advanced method, Fragmented and Primer-Ligated Double-stranded (ds) RNA Sequencing (FLDS), to screen RNA viruses from 155 fungal isolates, which allowed us to obtain complete viral genomes in a homology-independent manner. We created a high-quality catalog of 19 RNA viruses (12 viral species) that infect Aspergillus isolates. Among them, nine viruses were not detectable by the conventional methodology involving agarose gel electrophoresis of dsRNA, a hallmark of RNA virus infections. Segmented genome structures were determined in 42 per cent of the viruses. Some RNA viruses had novel genome architectures; one contained a dual methyltransferase domain and another had a separated RNA-dependent RNA polymerase (RdRp) gene. A virus from a different fungal taxon (Pyricularia) had an RdRp sequence that was separated on different segments, suggesting that a divided RdRp is widely present among fungal viruses, despite the belief that all RNA viruses encode RdRp as a single gene. These findings illustrate the previously hidden diversity and evolution of RNA viruses, and prompt reconsideration of the structural plasticity of RdRp.


Introduction
Mycoviruses are viruses that infect fungi. These viruses inhabit the insides of the rigid fungal structure, and are transmitted to other fungal cells through cell division, sporogenesis, or cell-tocell fusion (hyphal anastomosis), and their extracellular phase is hardly ever observed (Ghabrial and Suzuki 2009). While there are no reports of mycoviruses killing their host fungi during the fungal life cycle, some asymptomatic infections appear to be reminiscent of symbiotic relationships. Surveillance studies have revealed that mycoviruses are, to a certain extent, present in isolates of plant pathogenic fungi (van Diepeningen et al. 2006;Urayama et al. 2010;Arjona-Lopez et al. 2018) and human pathogenic fungi (Kotta-Loizou and Coutts 2017). In fact, more than 200 viral species have been detected in fungi to date (Gilbert et al. 2019).
Double-stranded (ds) RNA (dsRNA) has traditionally been used as the hallmark of RNA mycovirus detection. The rapid and specific extraction method for dsRNA from fungal cells and the conventional detection method, agarose gel electrophoresis (AGE) have accelerated large-scale mycovirus screening (Morris and Dodds 1979;Okada et al. 2015;Khankhum et al. 2017;Arjona-Lopez et al. 2018). However, AGE cannot detect low-level mycovirus infections because its sensitivity of viral detection is relatively low.
The establishment of metagenomic and metatranscriptomic analyses for virus surveillance over the last decade has enabled more viruses to be detected in ecologically diverse environments as well as in all living creatures. These methods, which rely on deep-sequencing analysis, were expected to be more sensitive than AGE at detecting mycoviruses. Indeed, Illumina sequencing technology has successfully identified mycoviruses from fungal isolates previously considered free of these viruses by AGE (Nerva et al. 2016). Deep-sequencing methods, however, are limited in their ability to identify novel viral sequences that lack homology to known virus-related sequences. Thus, finding a homologyindependent method capable of detecting unidentified viruses and viral sequences in samples is required to fill this gap in viral sequence detection. Such a method would expand the list of virusrelated sequences and augment current knowledge about viruses.
The multi-segmented RNA genomes possessed by some mycoviruses are coordinately replicated in the host. One segment contains an open reading frame (ORF) encoding an essential and universally seen RNA-dependent RNA polymerase (RdRp) in RNA viruses. Additional ORFs are contained in the other segment(s). In most cases, the cognate segments can be identified via the conserved sequences at their terminal ends (Urayama et al. 2018), but the conventional high-throughput sequencing methods mostly suffer from a loss of information for the terminal sequences. This technical limitation has hindered the discovery of the cognate segments and the novel viral sequences on them. To overcome this issue, we have recently developed 'Fragmented and primer-Ligated DsRNA Sequencing (FLDS)' technology to enable researchers to identify complete viral RNA genomes (Urayama et al. 2016(Urayama et al. , 2018. This method provides reliable terminal sequences for each genome, from which the segmented RNA genomes of viruses can be determined in a homology-independent manner. This advantage has allowed us to obtain highly informative complete genomes.
In this study, we adopted FLDS technology to comprehensively screen for mycoviruses in Aspergillus species. Altogether, 155 isolates of Aspergillus fumigatus and its related species were screened, and the complete genomes of eighteen viruses, as well as one with an incomplete genome, were determined for sixteen fungal isolates. The FLDS-based high-quality sequences, we obtained, supported the multi-segmented genome structure for eight viruses and uncovered novel viral sequences containing seven predicted novel ORFs in total. The most surprising finding from this screening is that one virus carries a partial RdRp lacking the essential C/D motif. The cognate segment of this virus encodes a novel ORF containing a C/D-like sequence motif. These unexpected results confirm that the ability to obtain complete, highquality viral genomes makes FLDS technology a powerful tool for expanding our current understanding of diversity in RNA viruses.

Strains and culture conditions
The Aspergillus strains used in this study are listed in Table S1. These strains were cultured in potato dextrose broth (PDB) with reciprocal shaking (120 rpm) for up to 5 days at 30 C or 37 C. All the strains were provided by the National BioResource Project (https://nbrp.jp/). Pyricularia (Magnaporthe) oryzae APU10-199A (Higashiura et al. 2019) was cultured in PDB for 1 week at 30 C before harvesting.

RNA extraction
Fungal mats (fresh weight, 100 mg) were disrupted in liquid nitrogen in a mortar or using FastPrep 24 (MP Biomedicals Inc., OH, USA). dsRNA and ssRNA purification was performed as described previously (Urayama et al., 2016(Urayama et al., , 2018(Urayama et al., , 2020. In brief, total nucleic acids were manually extracted from the ground cells with sodium dodecyl sulfate-phenol. dsRNA was purified using the cellulose resin chromatography method (Morris and Dodds 1979;Okada et al. 2015) and subjected to AGE analysis. To obtain sequence-grade dsRNA, the remaining DNA and ssRNA were removed with amplification grade DNase I (Invitrogen, Carlsbad, CA, USA) and S1 nuclease (Invitrogen). Total RNA was extracted from the pulverized samples using the TRIzol Plus RNA Purification Kit (Invitrogen), and the eluted RNA was treated with amplification grade DNase I (Invitrogen) and purified using RNA Clean & Concentrator-5 (Zymo research, Irvine, CA, USA).

Sample pooling, sequence library construction and sequencing
When dsRNA band(s) were visible by AGE, the dsRNA samples from each isolate were individually prepared for viral genome sequencing by FLDS. When no visible dsRNA band or bands were observed, the dsRNA samples from up to twenty isolates were pooled into a single sample (referred to as pool 1-8), following FLDS analysis (referred to as pooled-FLDS analysis).
dsRNA was converted into dscDNA by the FLDS method (Urayama et al. 2018). Each purified dsRNA was fragmented by ultrasound using a Covaris S220 ultrasonicator (Woburn, MA, USA), and a U2 adapter was ligated to each dsRNA fragment using T4 RNA ligase (Takara Bio Inc., Kusatsu, Japan). After denaturing the product, single-stranded (ss) cDNA (sscDNA) was synthesized using the SMARTer RACE 5 0 /3 0 Kit (Takara) with a U2-complementary primer. dscDNA was obtained by PCR with a U2-complementary primer and a universal primer mix (provided by the SMARTer RACE 5 0 /3 0 Kit). cDNA libraries were constructed as described previously (Urayama et al. 2018). Each 300 bp of the paired-end sequences from each fragment were determined on the Illumina MiSeq platform (Illumina, CA, USA).

Data processing
Clean reads were obtained by removing low-quality, adapter, and low-complexity sequences as described previously (Urayama et al. 2018). For the RNA virome analyses, contaminated rRNA reads were removed by SortMeRNA (Kopylova et al. 2012). According to a previous method (Urayama et al. 2018), the cleaned reads were subjected to de novo assembly using the CLC Genomics Workbench version 11.0 (CLC Bio, Aarhus, Denmark). The resulting assemblies were manually examined and extended using the assembly Tablet viewer (Milne et al. 2010). Where the terminal end of a contig ended with same bases for more than ten reads or a polyA sequence was present, the position was recognized as the terminal end of the RNA genome. When a contig had termini at both of its ends, it was considered to be the full-length sequence of the RNA genome. Multisegment genomes were judged according to the terminal sequence similarities of the segments. BlastN and BlastX programs (Camacho et al. 2009) were used to identify sequence similarities among known nucleotide sequences and protein sequences, respectively.

Phylogenetic analysis
ORF prediction was performed by ORFfinder, after which Pfam domain searching was conducted (Finn et al. 2016). Phylogenetic analysis of the helicase and methyltransferase domains of RdRp, which were based on the amino acid sequences obtained from the NCBI nr database, were aligned using MUSCLE (Edgar 2004) in MEGA6 (Tamura et al. 2013). The accession numbers of the sequences used for the analyses are listed in Table S4. Alignment-ambiguous positions were removed with trimAl (Capella-Gutié rrez et al. 2009). Maximum likelihood-based phylogenetic analyses were performed using RAxML (Stamatakis 2014), and bootstrap tests were conducted with 1,000 samplings. The amino acid substitution model was selected by Aminosan (Abascal et al. 2005) using Akaike's information criterion (Sugiura 1978). To visualize phylogenetic trees, FigTree was used.

Reverse transcription (RT) PCR (RT-PCR)
To detect RNA viruses from host fungi in pooled samples, RT-PCR analyses were performed using specific primer sets (Table  S5) as described previously (Urayama et al. 2015). After pricking the mycelia grown on a potato dextrose agar several times with a toothpick, the toothpick was then dipped into the one-step RT-PCR mix in a PCR tube and twisted three times. One-step RT-PCR was performed using the Super-Script III One-Step RT-PCR System with Platinum Taq (Invitrogen) according to the manufacturer's protocol. To support the sequences provided by FLDS, the 3 0 end sequences of RNAs 1 and 2 from AfuRV1 were confirmed by one-step RT-PCR using an oligo (dt) primer and specific primers (Table S5). Total RNA was used as the template. PCR products were run on 1 per cent agarose gels and the visualized fragments were excised, purified using the FastGene Gel/

Comprehensive screening for RNA viruses using FLDS
Altogether, 155 Aspergillus strains were used to screen for RNA viruses (Table S1). First, the dsRNA extracted from the mycelium of each strain was subjected to AGE. As a result, dsRNA bands were visibly detected from four A. fumigatus and five Aspergillus lentulus strains (Fig. 1). The patterns of the dsRNA bands differed from each other. The dsRNAs from the remaining 146 strains were pooled into eight groups (dsRNAs from $20 strains per group) and sequenced using FLDS to identify viruses that were undetectable by AGE. After RT-PCR verification, viral sequences were identified in eight strains (Fig. S1A). Overall, seventeen fungi were identified that were infected with an RNA virus (Table 1), and the frequency of RNA virus-positive isolates of each species was as follows: A. fumigatus 8/79 (10.1 per cent), A. lentulus 5/27 (18.5 per cent), and Aspergillus pseudoviridinutans 4/15 (26.7 per cent), Aspergillus udagawae 0/19 (0 per cent), and Neosartorya fischeri 0/15 (0 per cent) (Fig. S2). The A. fumigatus set, which included twenty environmental and fifty-nine clinical isolates, contained three and five isolates that were infected with RNA viruses, respectively. We also identified additional RNA viruses, which did not possess visible dsRNA band from two AGE-positive strains, and confirmed the presence of these viruses using RT-PCR (Fig. S1B).

Determination of segmented virus genomes
We determined the full-length viral sequences by FLDS for the strains that we identified by AGE and the pooled FLDS. Altogether, we determined the complete sequences of sixteen fungal strains within three species, A. fumigatus, A. lentulus, and Aspergillus pseudoviridinutans. The complete sequence from A. pseudoviridinutans IFM 59503 could not be obtained despite this virus being detected by RT-PCR. The segmented viral genomes were determined according to the terminal sequence similarity among them (Fig. S3). For example, an RNA virus isolated from A. fumigatus IFM 62632 was found to contain four segments, whereas A. fumigatus IFM 63431 and IFM 62629 each contained three segments (Table 1). As a consequence, clarification of each segment resulted in us being able to differentiate the coinfecting viruses in the fungal isolates. In fact, A. fumigatus IFM 62632 and IFM 63147 were co-infected with two and three viruses, respectively. Altogether, twenty viruses were identified in seventeen Aspergillus isolates, among which eight had segmented genomes (two each for bi-segments, four each for trisegments, and two each for tetra-segments) ( Table 1). The FLDS method detected the viral genomes with high sensitivity, even when multiple viruses had co-infected the same host, and it accurately discriminated segmented genomes in the viruses.

RNA virus classification
Our BlastX analysis revealed that all twenty of the viral RNA sequences contained RdRp (Table S2). Based on the criterion that the RdRp-encoding segments sharing >90 per cent nucleic acid sequence identity were recognized as single operational taxonomic units (OTUs), twenty viruses fell into twelve OTUs. The virus from A. pseudoviridinutans IFM 59503 with a partial sequence also fell into one of the OTUs. Based on the high sequence identity (>95 per cent) shared with known viral sequences, four viruses were recognized as Aspergillus fumigatus polymycovirus 1 (AfuPmV1), Aspergillus fumigatus chrysovirus (AfuCV), Aspergillus fumigatus narnavirus 2 (AfuNV2), and Aspergillus fumigatus mitovirus 1 (AfuMV1). The other eight OTUs were regarded as novel viral species, and were therefore tentatively named according to the taxonomical linage of the top 'hits' for RNA viruses in the BlastX analysis against the NCBI nr database (Table S2). Six of eight viral species were related to dsRNA virus families (Partitiviridae and Totiviridae), positive ssRNA virus families (Narnaviridae and Botourmiaviridae), and a negative ssRNA virus family (Betamycobunyaviridae, a previously suggested virus family), whereas two OTUs were considered to be unclassified RNA viral linages. Notably, six of the twenty viruses identified herein, all of which had been detected by AGE, are considered to be dsRNA viruses.

Structures of the novel RNA viruses identified in Aspergillus fungi
AfuRV1, which was identified in A. fumigatus IFM 63439, consists of three RNA segments (3,611, 3,447, and 1,943 nucleotides long, excluding the polyA-like region) ( Fig. 2A). Each segment contains a single ORF lacking significant nucleotide identity to known sequences in the NCBI nt database. ORF1 contains methyltransferase (E-value ¼ 1.8 Â 10 210 ) and RdRp (E-value ¼ 2.6 Â 10 282 ) domains, and ORF2 contains methyltransferase (E-value ¼ 3.3 Â 10 225 ) and helicase (E-value ¼ 2.5 Â 10 221 ) domains. BlastX analysis showed that the sequence with the top hit for ORF1 was RdRp from Luckshill virus (LuV) (an unclassified ssRNA virus) (coverage, 93.0 per cent; E-value, 0; identity, 32.7 per cent), whereas that for ORF2 was a hypothetical protein from Cyril virus (another unclassified ssRNA virus) (coverage, 78.0 per cent; E-value, 3.0 Â 10 240 ; identity, 32.2 per cent). Phylogenetic analysis of the RdRp domain showed that AfuRV1 falls into the virga-like virus clade of viruses previously isolated  Table S4. The viral sequences from fungi, invertebrates, oomycetes, plants, and an unknown host are indicated by orange circles, blue squares, light blue diamonds, green triangles, and question marks, respectively. The viral sequences identified in this study are shown in red font. from invertebrates and fungi (Fig. 2B). AfuRV1, however, fell into the invertebrate-derived sub-clade containing LuV, rather than the mycovirus sub-clade. Interestingly, viruses within the virga-like clade have never been reported to possess segmented genomes. In our RT-PCR experiments to detect nonsegmentated RNA genome of AfuRV1, no product was obtained, supporting our conclusion that these three segments were not subgenomic RNAs of a hypothetical monopartite RNA virus (data not shown). We found that the AfuRV1 genome contains an additional segment. Comparing the methyl transferase domains of ORF1 and ORF2, ORF2 fell into the virga-like virus clade, but ORF1 did not (Fig. 2C). The helicase domain in ORF2 did not fall within an established family or proposed group (Fig. 2D).
AleNV1, which originated from A. lentulus IFM 63547, contains two segmented genomes (3,071 and 1,814 nucleotides long, excluding the polyA-like region) (Fig. S4), whose RdRp genes share low sequence homology with that of the Beihai narna-like virus 21. The ORFs in RNA2 did not share significant similarity with known proteins. Based on its RdRp sequence, AleNV1 belongs to the well-established Narnavirus genus (Fig. 3). No viruses with bi-segmented genomes have been reported so far in viral genera, except LepseyNLV1 (whose complete sequence has not been reported), and MaRNAV1, which was isolated from the human trypanosomatid parasite Leptomonas seymouri (Grybchuk et al. 2018) and the human malaria parasite Plasmodium vivax (Charon et al. 2019). Thus, AleNV1 has a unique genome structure and is the first reported The arrangements of the amino acid residues of motifs A and C. The green and blue stick model show amino acid residues of motifs A and C from AfuNV2, respectively (red, oxygen atom; blue, nitrogen atom). White stick model shows the corresponding residues from bacteriophage Qb. Characters refer to amino acid residues of AfuNV2 and parentheses are amino acid residues of bacteriophage Qb. (C) Hydrogen bonds between ORF1 and ORF2. Green and blue ball models show ORF1 and ORF2 residues, respectively. Stick models show amino acid residues of ORF1 and ORF2 that interact with each other. Pink dashed lines refer to hydrogen bond between amino acid residues of ORF1 and ORF2.
isolate with a bi-segmented genome among mycoviruses from the Narnavirus genus.

Discovery of a novel divided RdRp sequence
AfuNV2, a previously reported virus (Zoll et al. 2018), was identified in three different A. fumigatus isolates (IFM 63147, IFM 63431, and IFM 62629) (Table 1). Although the sequence reported by another research group is non-segmented, our FLDS-based sequencing revealed that AfuNV2 possesses RNA2 and RNA3 segments in addition to RNA1 (RdRp). This tri-segmented genome was confirmed to contain the highly conserved terminal sequences described above (Fig. S3), and was also confirmed by AGE where the band patterns were seen to correspond to the tri-segmented genome's length (Fig. 1, lanes 2 and 3). RNA2 and RNA3 are predicted to encode a large single ORF (ORF2) and multiple short ORFs (ORF3, ORF4, and ORF5) (Fig. 4A). BlastP analysis showed that the sequence with the top hit for ORF2 was RdRp from Plasmopara viticola-associated narnavirus 33 (PvaNV33) (coverage, 74.0 per cent; E-value, 3EÀ17; identity, 23.9 per cent), whereas ORF3, ORF4, and ORF5 share no significant similarities (E-value 1 Â 10 25 ) with known protein sequences in the NCBI nr database or in the Pfam domain database.
When the RdRp amino acid sequences were aligned, we noticed that the RdRp-encoding ORF1 from AfuNV2 lacked motifs C and D, but the well-conserved motifs (F, A, and B) among the narnaviruses from other fungi were present (Fig. 4B, C). Interestingly, closely-related narnaviruses (CtNV1, AtNV1, NpNV2, FuPNV2, and AfuNV1) were also found to lack the C and D motifs (Fig. 4C) (Lin et al. 2020). In motif C, GDD, a conserved amino acid sequence, is believed to play an essential catalytic role. Therefore, we searched for the GDD sequence in the other RNA2 and RNA3 ORFs. After much searching, we found a potential sequence in the N-terminal region of the RNA2 ORF. Sequence alignments suggested that this region includes amino acid residues that are conserved in the C and D motif regions of the RdRp proteins from narnaviruses (Fig. 4C). This was observed in three of the sequences from the AfuNV2 viruses we isolated. To rule out the possibility that there are RNAs which have RNA1 and RNA2 sequences in single molecule, we performed RT-PCR analysis (Fig. S5). Our data supported that AfuNV2 has RNA1 and RNA2 sequences in different RNA molecules. However, we were unable to confirm whether the aforementioned closely-related CtNV1, AtNV1, NpNV2, FuPNV2, and AfuNV1 species contain RNA2 with an ORF containing a C/Dlike motif sequence, because their sequences were deposited as non-segmented genomes.
To support this finding, we searched for other viruses that lack the C and D motifs in RdRp. We eventually found one known RNA virus by FLDS sequencing, Magnaporthe oryzae narnavirus 1 (MoNV1) (MN480844) from Pyricularia (Magnaporthe) oryzae APU10-199A, with a complete genome and reliable terminal sequences in each segment. While a non-segmented genome was reported for MoNV1, the MoNV1 we isolated has four genomic segments, but the motifs C and D are missing from RdRp on RNA2 (Fig. 4D). The C/D-like motif lies within the N-terminal region of the ORF of RNA1, as was the case of AfuNV2 RNA2 (Fig. 4B). The ORFs on RNA3 and RNA4 from MoNV1 contain no predicted proteins or domains with significant homology. In addition to MoNV1, another narnavirus, MoNV2, coinfected the same strain. MoNV2 has a non-segmented genome containing a single ORF that shares significant nucleotide identity with RdRp from AleNV1 (coverage, 99.0 per cent; E-value, 0; identity, 74.61 per cent) (Fig. 3). The RdRp from MoNV2 harbors a C/D motif (Fig. 4C). Collectively, the results for these two different fungal species (A. fumigatus and P. oryzae) suggest that certain narnaviruses carry RdRp sequences that are divided into two different ORFs, with one ORF containing motifs F, A, and B, and the other potentially containing motifs C and D.

Homology modeling of ORF1 and ORF2
To obtain structural information about ORF1 and ORF2, the SWISS-MODEL program was used to perform homology modeling of ORF1 and ORF2 together. We used the RdRp domain derived from bacteriophage Qb, which was the best structural match, as a template for building the model (global model quality estimation, 0.03; sequence identity, 18.88per cent). Ser437-Asn626 of ORF1 and Met1-Leu108 of ORF2 were modeled and superimposed with the RdRp of bacteriophage Qb (Fig. 5A). Although ORF1 and ORF2 showed low sequence identity with the RdRp from bacteriophage Qb, the three-dimensional structures of the motif regions were highly conserved, especially the arrangements of the amino acid residues that form motifs A (STDWESATD) and C (GDDEI) in the model, which were nearly the same as those of the RdRp from bacteriophage Qb (Fig. 5B). Some amino acid residues of ORF1 and ORF2 were predicted to interact with each other by hydrogen bonds and van deer Waals contacts ( Fig. 5C and Table S3). In addition to interactions between motifs A and C, such as Thr521/Trp523 and Glu65, there were also hydrogen bonds formed between amino acid residues of ORF1 and ORF2 that did not contain any known structural motifs. The regions at the C-terminus of modeled ORF1 and the N-terminus of modeled ORF2 formed a flexible loop structure, suggesting that these regions are not essential for the function of RdRp. Together, these data suggest that the split occurred at a domain boundary and that the two fragments could associate to form a functional enzyme.

Discussion
DsRNA mycoviruses infect a wide range of fungal species including Ascomycota, Basidiomycota, Glomeromycota, and Mucoromycotina (Gilbert et al. 2019). Some viruses can affect growth, development, toxin production, and pathogenicity in their fungal hosts. Fungal viruses have not been intensively studied despite their potential impact on ecology, agriculture, food security, and public health. In this study, several new viral sequences were successfully identified as complete genomes. A most intriguing finding was the presence of a divided RdRp gene. No examples like this irregular form have been reported in any other RNA viruses known to infect living creatures (on the way of reviewing process, another example was reported (Sutela et al. 2020), see below).
Large-scale AGE screening for RNA viruses was conducted on a set of more than 300 Aspergillus strains (Bhatti et al. 2012), and 6.6 per cent of the detected strains contained dsRNA. Elsewhere, of the eighty-six A. fumigatus clinical isolates that were examined 18.6 per cent contained dsRNA (Refos et al. 2013). Here, we detected dsRNA by AGE from 9/155 strains (5.8 per cent) of Aspergillus species. Another advanced screening method, metatranscriptomics, was recently used to detect viral sequences, and seventy-two assembled virus-related sequences from 275 isolates of plant pathogenic fungi were identified (Marzano et al. 2016). The searches were based on identifying sequences with significant identity to known amino acidencoding viral sequences. Notably, only 15 per cent of the sequences were predicted to be derived from dsRNA, while 73 per cent and 12 per cent were predicted to be derived from positive-sense RNA or negative-sense RNA viruses, respectively. More recently, a pipeline to efficiently detect viral sequences from a transcriptomics dataset was proposed (Gilbert et al. 2019). This was also based on sequence similarity to RdRp. Of the 569 RNA-Seq samples, fifty-nine complete mycoviral genomes were identified in forty-seven datasets, thirty-four viruses (57 per cent) were predicted to be dsRNA viruses, and 88 per cent were new species. Here, ten viruses were identified by FLDS that were overlooked by AGE. All the overlooked viruses were predicted to be ssRNA viruses. We consider that the amount of dsRNA recovered as a replicative intermediate of ssRNA is low. Nonetheless, FLDS captured sequences derived from ssRNA viruses. Therefore, the highly sensitive, highthroughput sequence-based screening of FLDS is a powerful tool for constructing deep and wide viral catalogs.
One apparent limitation occurs with homology-based viral sequence detection whereby the novel ORFs accumulated by knowledge-based updating can overlook those lacking homologies to known virus-related ORFs, resulting in a biased list of viral sequences. Our FLDS-based screening was clearly able to circumvent this issue by identifying six viral species with segmented genomes and discovering seven ORFs that had never been recognized as virus-related sequences (ORFs encoding hypothetical proteins with unknown functions, Table S2).
The discovery of segmented genomes provides evolutionary insight into the origin of certain mycovirus groups. AfuRV1 has three segmented genomes and contains two methyltransferase domains in different segments. As far as we are aware, RNA viruses with multiple methyltransferase domains have never been reported in published literature. Phylogenetic analysis revealed that the methyltransferase domain of ORF2 (MT2) and the RdRp domain of ORF1 belong to a virga-like virus clade. In contrast, the methyltransferase domain of ORF1 (MT1) and the helicase domain of ORF2 fall into an unclassified group (not the virga-like virus clade). This suggests that the ancestor of AfuRV1 acquired MT1 and its helicase domain from a different virus. Identifying the origins of these domains and the evolutional history of AfuRV1 is not straightforward because MT1 and the helicase domain are distantly related to the domains of known viruses. By FLDS analysis, we identified segmented genomes in AfuNV2 and AleNV1 that both belong to the Narnaviridae family. Interestingly, segmented genomes have only been reported for LepseyNLV1 (a protozoal virus) and Botourmiaviridae family plant viruses, and they have not been reported among the Narnaviridae family or closely-related ssRNA viruses (Rastgou et al. 2009;Grybchuk et al. 2018). Hence, this is the first identification of Narnaviridae family mycoviruses with multi-segmented genomes. It is noteworthy that AfuNV2 and AleNV1 were not classified as belonging to the sub-clade that includes LepseyNLV1. Furthermore, AfuNV2 and AleNV1 narnaviruses are not closely related to each other and have different genome structures. This suggests that during evolution of the Narnaviridae family and its relative ssRNA viruses, the acquisition of segments and changes in genome structure occurred independently in each host kingdom. We cannot, however, rule out the possibility that non-RdRp-encoding segments are satellite RNAs that tentatively coexist. Further investigations in this area are required.
The gene encoding RdRp is universally present among RNA viruses (Wolf et al. 2018), and all known RNA viruses encode RdRp in a single ORF (King et al. 2012). Unexpectedly, we found that viruses in a certain clade of Narnaviridae encode an RdRp that lacks catalytic domains C and D, and a different coexisting ORF encodes the missing domains. This irregular genome structure was identified in two different viruses from two different fungal hosts, Aspergillus and Pyricularia. Besides these two viruses, this group includes other viruses isolated from Fusarium (FuPNV2), Neofusicoccum (NpNV2), Cladosporium (CtNV1), and Alternaria (AtNV1) (Fig. 3). According to the deposited sequences, these viruses appear to lack domains C and D of RdRp. These sequences were not obtained by FLDS; thus the corresponding ORF with a GDD motif or cognate genomes were unlikely to be detected; indeed, no such sequences have been deposited. The deposited sequence dataset from other narnaviruses supports the possibility that imperfect RdRp proteins exist in a certain group of mycoviruses that infect a wide range of the Ascomycetes phylum. As described above, the ORF2 of AfuNV2 showed low but certain similarity to the RdRp sequence from PvaNV33. Interestingly, PvaNV33 showed certain similarities not only to ORF2 but also to ORF1 of AfuNV2 (Fig. S6). Moreover, sequences of RdRp from PvaNV32 (GenBank: QIR30311.1) and PvaNV35 (GenBank: QIR30314.1) that formed a single clade with PvaNV33 also shared certain similarities to both ORFs of AfuNV2. This complementary genomic structure suggests that the RdRps from these Plasmopara viticola-associated narnaviruses are the molecular ancestors from which ORF1 and ORF2 of AfuNV2 were derived by division. During the preparation of this manuscript, another research group reported on the identification of MoNV1 from an M. oryzae strain isolated in China (Lin et al. 2020). The viral sequence was deposited as a single genome with RdRp lacking domains C and D, which concords with our results for the MoNV1 we identified in our work. This consistency in the viral sequences from different countries suggests that viruses with irregular genome structures are widely distributed, meaning that the event leading to the atypical structures did not occur locally. During revision of this manuscript, Dr. Turina and his colleagues reported a virus with a split RdRp that infects the ericoid mycorrhizal fungi Oidiodendron maius (Sutela et al. 2020), which further supports our finding that AfuNV2 from three A. fumigatus strains and MoNV1 from a P. oryzae strain possess divided RdRp sequences on different genome segments. Our homology modeling analysis for the artificially fused RdRp fragment suggests that proteins from the divided RdRp of AfuNV2 can form a structure that is similar to the existing functional enzyme. Our finding of divided RdRp sequences raises questions about how divided RdRps function in host cells and whether atypical structures are limited to mycoviruses. Answering these questions is of great interest to us.
In conclusion, our FLDS-based screening for mycoviruses using 155 fungal isolates has led to the discovery of novel species and novel segmented genomes in some of the viruses. Some viral genomes have novel structures that would not have been captured by conventional methods.

Data accessibility
Data sets supporting the results of this study are available in the GenBank database repository (Accession Nos. DDBJ: LC553675-LC553714) and the Short Read Archive database (Accession No. DDBJ: DRA010415).

Supplementary data
Supplementary data are available at Virus Evolution online. Identification of strains infected with viruses detected by pooled FLDS. In ApvBOV1, virus-infected strains were investigated using two primer sets. Strains that produced the expected size products in both primer sets were designated as infected strains. (B) RT-PCR detection of viruses obtained by FLDS. RT-PCR analysis was conducted as described in Materials and Methods except for AfuRV1. In AfuRV1, total nucleic acid was used as template. The PCR products were electrophoresed, and the gel was stained with GelRed. The DNA size marker was kHind III digests. The number shows the tested strain number (IFM number), and bold indicates virus-infected strains. For sequence of primers and their expected product size, see Table S5.  Table S5. (B) Detection of the RT-PCR products. Mixture of total RNA extracted from AfuNV2 infected strains (A. fumigatus IFM 62629, IFM 63147 and IFM 63431) was used as the template. We extracted total RNA independently, and mixed equal amounts of them. Total RNA extracted from A. fumigatus IFM 64916 was used as negative control. The PCR products were electrophoresed, and the gel was stained with GelRed. The DNA size marker was 200bp DNA Ladder. Fig. S6: Schematic model of RdRp protein from PvaNV33 and comparison with AfuNV2. Conserved AD and F motifs in RdRp from PvaNV33 and AfuNV2 are shown. The regions with similarity between two proteins are shown in a striped square.