Deciphering Genome Content and Evolutionary Relationships of Isolates from the Fungus Magnaporthe oryzae Attacking Different Host Plants

Deciphering the genetic bases of pathogen adaptation to its host is a key question in ecology and evolution. To understand how the fungus Magnaporthe oryzae adapts to different plants, we sequenced eight M. oryzae isolates differing in host specificity (rice, foxtail millet, wheat, and goosegrass), and one Magnaporthe grisea isolate specific to crabgrass. Analysis of Magnaporthe genomes revealed small variation in genome sizes (39–43 Mb) and gene content (12,283–14,781 genes) between isolates. The whole set of Magnaporthe genes comprised 14,966 shared families, 63% of which included genes present in all the nine M. oryzae genomes. The evolutionary relationships among Magnaporthe isolates were inferred using 6,878 single-copy orthologs. The resulting genealogy was mostly bifurcating among the different host-specific lineages, but was reticulate inside the rice lineage. We detected traces of introgression from a nonrice genome in the rice reference 70-15 genome. Among M. oryzae isolates and host-specific lineages, the genome composition in terms of frequencies of genes putatively involved in pathogenicity (effectors, secondary metabolism, cazome) was conserved. However, 529 shared families were found only in nonrice lineages, whereas the rice lineage possessed 86 specific families absent from the nonrice genomes. Our results confirmed that the host specificity of M. oryzae isolates was associated with a divergence between lineages without major gene flow and that, despite the strong conservation of gene families between lineages, adaptation to different hosts, especially to rice, was associated with the presence of a small number of specific gene families. All information was gathered in a public database (http://genome.jouy.inra.fr/gemo).

• Thresholds setting
Thresholds for Burkholderia and Magnaporthe were set manually to best separate the two main modes of the distribution when a second peak was detected (Figure S1a). To avoid false positives caused by border effects at the thresholds, a double-threshold system was used. Regions were labelled as Burkholderia when the following two conditions were met: (i) the KL divergence to the Magnaporthe prototype exceeded the Magnaporthe threshold (1000 arbitrary units, AU), and (ii) the KL divergence to the Burkholderia prototype was smaller than the Burkholderia threshold (930 AU).

• Taxonomical assignment
The sequences selected with this parametric method were analyzed with GOHTAM (Menigaud, et al. 2012), which confirmed their homogeneity and their origin from Burkholderiales. Comparison of the selected DNA with the Blast results used to learn the Burkholderia prototype showed a significant increase in DNA coverage for the parametric method over Blast, while the homogeneity of the oligonucleotide composition was conserved.
• Manual curation
This automatic detection of Burkholderia regions was followed by a manual curation. Scaffolds that comprised both M. oryzae and Burkholderia regions were systematically verified by Blastn and Blastx, and corrected when necessary. A Burkholderia tag was reported in corresponding OrthoMCL families for all genes located in Burkholderia regions. Following this, some OrthoMCL families comprised both Burkholderia and Magnaporthe genes. In these cases, scaffolds containing these particular M. oryzae genes were verified by Blastn and Blastx, and corrected when necessary.
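
The double-threshold rule described under "Thresholds setting" can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `is_burkholderia_window` is hypothetical, and the default thresholds are simply the values quoted above (1000 AU and 930 AU), which are expressed in arbitrary units and would have to be matched to the scale of whatever KL implementation is used.

```python
# Threshold values quoted in the text, in arbitrary units (AU).
MAGNAPORTHE_THRESHOLD_AU = 1000.0
BURKHOLDERIA_THRESHOLD_AU = 930.0


def is_burkholderia_window(
    kl_to_magnaporthe: float,
    kl_to_burkholderia: float,
    magnaporthe_threshold: float = MAGNAPORTHE_THRESHOLD_AU,
    burkholderia_threshold: float = BURKHOLDERIA_THRESHOLD_AU,
) -> bool:
    """Return True only when both conditions of the double threshold hold.

    (i)  the window is far from the Magnaporthe prototype, and
    (ii) the window is close to the Burkholderia prototype.
    Windows that satisfy only one condition are not labelled as Burkholderia,
    which is what limits false positives near either threshold.
    """
    far_from_fungus = kl_to_magnaporthe > magnaporthe_threshold
    close_to_bacterium = kl_to_burkholderia < burkholderia_threshold
    return far_from_fungus and close_to_bacterium


if __name__ == "__main__":
    # Hypothetical divergence pairs: (KL to Magnaporthe, KL to Burkholderia).
    for pair in [(1500.0, 400.0), (1500.0, 1200.0), (500.0, 400.0)]:
        print(pair, is_burkholderia_window(*pair))
```

Requiring both conditions means that a window with an ambiguous composition (divergent from both prototypes, or close to both) is left unlabelled and falls to the manual curation step.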

Results
Large supplementary genomic regions were confirmed in four out of the nine genomes (FR13, GY11, PH14 and TH12). The cumulative sizes of Burkholderia regions were estimated to be 0.72, 7.26, 9.78, and 8.39 Mb in the FR13, GY11, PH14 and TH12 assemblies, respectively. Burkholderia scaffolds and regions (see Table S1) were systematically tagged and filtered out of the assemblies for further bioinformatics analyses.
GOHTAM taxonomical assignment of these regions confirmed their homogeneity and origin from Burkholderiales. The results suggested an unsequenced species closely related to Burkholderia phytofirmans and Burkholderia xenovorans. Targeted Blast comparisons indicated that some of these supplementary regions are almost identical to Burkholderia fungorum sequences (100% identity for the 16S, recA and gyrB genes), and therefore probably originate from one or several bacterial isolate(s) of this species.

Figure S1a: Distribution of the window compositional divergence according to the Burkholderia prototype
Distribution of the Kullback-Leibler (KL) divergence, per genome, between the tetranucleotide composition of the sliding windows and the prototype tetranucleotide composition of the genome-specific Burkholderia sequences identified by Blast homology to Burkholderia species present in GenBank nr. Burkholderia prototypes were learned per genome, on more than 100 kb of DNA. The Burkholderia prototypes built from each genome were very similar, and the KL distribution for a given genome was comparable when the Burkholderia prototype from another genome was used. Relative to a Burkholderia prototype, low divergences are expected for sequences from Burkholderia or from compositionally close organisms. Hence the low-value peak corresponded to Burkholderia sequences and the high-value peak to Magnaporthe sequences. We used strains 70-15 and US71 as controls to set the threshold at the junction of the distributions.

Figure S1b: Distribution of the window compositional divergence according to the Magnaporthe prototype
Distribution of the Kullback-Leibler (KL) divergence, per genome, between the tetranucleotide composition of the sliding windows and the prototype tetranucleotide composition of the genome-specific Magnaporthe sequences, extracted from the assembly scaffolds whose Blast hits against GenBank nr did not match Burkholderia species for more than 1% coverage. All scaffolds meeting this criterion were used to build the prototype compositional profile. Relative to a Magnaporthe composition, low divergences are expected for sequences from the fungal genome. The low-value peak corresponds to Magnaporthe sequences, and the second peak to Burkholderia.
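
The window-versus-prototype comparison summarized in the Figure S1 captions can be illustrated with a short sketch. Several choices here are assumptions that the text does not specify: the window length and step, the pseudocount used to avoid zero frequencies, the natural logarithm, and the direction of the divergence (window relative to prototype). All function names are hypothetical, and the raw values returned are not on the arbitrary-unit scale of the thresholds quoted earlier.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides over the unambiguous DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq, pseudocount=1.0):
    """Relative frequency of each tetranucleotide in seq.

    Words containing ambiguous bases (e.g. N) are ignored. The pseudocount
    (an assumption) keeps every frequency strictly positive.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS) + pseudocount * len(KMERS)
    return {k: (counts[k] + pseudocount) / total for k in KMERS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q), in nats."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in KMERS)


def sliding_window_kl(seq, prototype, window=5000, step=1000):
    """Yield (start, KL divergence to the prototype) for each window.

    window and step are illustrative values, not those of the study.
    """
    for start in range(0, len(seq) - window + 1, step):
        freqs = tetranucleotide_frequencies(seq[start:start + window])
        yield start, kl_divergence(freqs, prototype)


if __name__ == "__main__":
    # Toy sequences standing in for a prototype and a scaffold.
    prototype = tetranucleotide_frequencies("ACGT" * 500)
    scaffold = "ACGT" * 300 + "GGCC" * 300
    for start, kl in sliding_window_kl(scaffold, prototype, window=400, step=400):
        print(start, round(kl, 3))
```

In this toy run, windows drawn from the same repeat as the prototype give divergences near zero, whereas the compositionally different part of the scaffold gives higher values, which is the contrast that produces the two peaks described in the captions.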