Ramesh R Vetukuri, Sucheta Tripathy, Mathu Malar C, Arijit Panda, Sandeep K Kushwaha, Aakash Chawade, Erik Andreasson, Laura J Grenville-Briggs, Stephen C Whisson, Draft Genome Sequence for the Tree Pathogen Phytophthora plurivora, Genome Biology and Evolution, Volume 10, Issue 9, September 2018, Pages 2432–2442, https://doi.org/10.1093/gbe/evy162
Abstract
Species from the genus Phytophthora are well represented among organisms causing serious diseases on trees. Phytophthora plurivora has been implicated in long-term decline of woodland trees across Europe. Here we present a draft genome sequence of P. plurivora, originally isolated from diseased European beech (Fagus sylvatica) in Malmö, Sweden. When compared with other sequenced Phytophthora species, the P. plurivora genome assembly is relatively compact, spanning 41 Mb. This is organized in 1,919 contigs and 1,898 scaffolds, encompassing 11,741 predicted genes, and has a repeat content of approximately 15%. Comparison of allele frequencies revealed evidence for tetraploidy in the sequenced isolate. As in other sequenced Phytophthora species, P. plurivora possesses genes for pathogenicity-associated RXLR and Crinkle and Necrosis effectors, predominantly located in gene-sparse genomic regions. Comparison of the P. plurivora RXLR effectors with orthologs in other sequenced species in the same clade (Phytophthora multivora and Phytophthora capsici) revealed that the orthologs were likely to be under neutral or purifying selection.
Introduction
Species of the genus Phytophthora are all known to be pathogens of plants, some with a narrow host range, and others able to infect hundreds of different plant species (Erwin and Ribeiro 1996). Phytophthora species superficially resemble fungi but are hyphae-forming oomycetes, and are placed in the stramenopiles (syn. Heterokonta; kingdom Chromista, SAR supergroup), along with brown algae and diatoms (Cavalier-Smith 2018). Phytophthora plurivora is a soil-borne root rot pathogen infecting a broad range of woody plants, such as Quercus spp., Acer spp., Alnus spp., Vaccinium spp., Rhododendron spp., and Fagus spp. (Jung and Burgess 2009; Schoebel et al. 2014). It is predominantly implicated in widespread declines of European beech (Fagus sylvatica) and oak species (Quercus spp.) (Jung et al. 2000; Jung 2009). Common symptoms of P. plurivora disease include collar rots, bark cankers, extensive damage to fine roots, and crown dieback on young and mature trees (Jung et al. 2005; Orlikowski et al. 2011). Stem inoculations to test the relative susceptibility of conifers and broadleaved tree species common in Sweden demonstrated P. plurivora to be highly aggressive on pedunculate oak (Quercus robur L.), European beech and black cottonwood (Populus trichocarpa), highlighting the overall risk of different Phytophthora species to forest trees (Cleary et al. 2017).
The most recent phylogeny of the genus Phytophthora places P. plurivora in Clade 2, subclade 2c, together with the related pathogens Phytophthora acerina, Phytophthora pachyleura, Phytophthora capensis, Phytophthora pini, Phytophthora multivora, and Phytophthora citricola (Yang et al. 2017). P. plurivora is proposed to be native to Europe on the basis of haplotype and microsatellite data (Schoebel et al. 2014). It is now distributed worldwide, aided by the dissemination of diseased plant material through the plant nursery trade (Schoebel et al. 2014). In southern Sweden, P. plurivora, along with Phytophthora cactorum, has been recognized as an increasing threat to cultivated plantation forests (Cleary et al. 2017; Grenville-Briggs et al. 2017).
Most studies on Phytophthora species have focused on those that infect important annual crop plants, for example, Phytophthora infestans, which infects potato and tomato, and Phytophthora sojae, which infects soybean (Kamoun et al. 2015). Here, we present a draft genome sequence assembly for a P. plurivora strain recently isolated in Sweden, as a resource for future studies aimed at understanding how Phytophthora species such as P. plurivora cause disease on woody host plants. We then used this resource to identify candidate rapidly evolving RXLR and Crinkler (CRN) effector-coding genes (Schornack et al. 2009), to examine collinearity of genomic regions with related species, and to test whether the effector genes reside in gene-poor genomic regions, as in other Phytophthora species.
Materials and Methods
Sample Collection and Sequencing
P. plurivora strain AV1007 was isolated from a bleeding canker on a diseased European beech tree (F. sylvatica) in Malmö, Sweden in 2016. P. plurivora was cultured in liquid V8 juice medium and DNA was extracted as described (Löbmann et al. 2016). Paired-end reads of 2 × 250 bp (37 million reads from each library) were sequenced using the Illumina HiSeq2000 rapid mode sequencing platform at MR DNA Molecular Research Laboratory, USA. FastQC was used to assess raw data quality, and adapters were removed using Trimmomatic (Bolger et al. 2014). To prepare mycelium samples for RNA isolation, P. plurivora was grown in liquid V8 medium for three days at 20 °C, collected by gravity filtration, snap-frozen in liquid nitrogen, and stored at −70 °C until RNA extraction. Three independent biological replicates were prepared. Total RNA for RNA sequencing was extracted from frozen mycelium samples using the RNeasy Plant Mini Kit (QIAGEN) following the manufacturer's protocol. RNA yield and integrity were assessed using a NanoDrop Micro Photometer (NanoDrop Technologies), an Agilent Bioanalyzer (Agilent Research Laboratories), and agarose gel electrophoresis. Libraries for RNA sequencing were prepared with poly-A selection using the Illumina TruSeq RNA preparation kit. Illumina HiSeq2500 2 × 126 bp paired-end sequencing was performed at SciLifeLab, Sweden. The reads obtained (Library 1: 30,193,560; Library 2: 36,829,808; Library 3: 38,506,144) were mapped onto the reference genome, and gene expression levels were assessed using the Tuxedo suite (Trapnell et al. 2012).
Genome Assembly and Annotation
The Illumina sequence reads were assembled de novo using SPAdes 3.5.0, a de Bruijn graph-based assembler that incorporates error correction and removal of poor-quality reads (Bankevich et al. 2012). The assembled genome was assessed using QUAST (Gurevich et al. 2013). To evaluate whether the assembled genome size for P. plurivora is a marked underestimate of the true genome size, as may occur when repeat content is high, we estimated genome size by k-mer analysis of our sequence data (Schell et al. 2017). We used Jellyfish (Marçais and Kingsford 2011) to generate k-mer histograms with a k-mer size of 25 and a minimum read phred quality of 20. The histograms generated by Jellyfish were fed into GenomeScope (github.com/schatzlab/genomescope; last accessed August 23, 2018). For comparison, we also estimated the genome sizes of Phytophthora infestans and P. multivora using the same method. The raw sequence reads for P. infestans and P. multivora (isolate NZFS 3378) were downloaded from NCBI (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=ERR1990236 and https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR2126502). Augustus (Stanke et al. 2004) was used for gene prediction, with the previously sequenced plant pathogenic relative Phytophthora capsici as a training model. Completeness of the P. plurivora gene space was evaluated with BUSCO 2.0 (Benchmarking Universal Single-Copy Orthologs [BUSCOs]) using a set of common stramenopile genes (Simão et al. 2015). For comparison, BUSCO analysis was also carried out for the genome assemblies of P. infestans, P. capsici, P. ramorum, and P. multivora. The predicted protein sequences were analyzed with InterProScan 5 (Jones et al. 2014) on our local server to annotate Pfam domains.
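The k-mer approach rests on a simple relation: genome size is roughly the total number of k-mer occurrences divided by the depth of the main k-mer coverage peak. The sketch below shows that back-of-envelope calculation on a histogram in the two-column "depth count" text format written by `jellyfish histo`. It is a simplified peak-based estimate, not GenomeScope's model fit, and the error cut-off rule (first local minimum of the histogram) is an assumption for illustration. For heterozygous or polyploid genomes the tallest peak may not be the homozygous one, which is one reason tools such as GenomeScope fit a full model instead.

```python
# Simplified genome size estimate from a k-mer depth histogram.
# Assumes the two-column "depth count" text written by `jellyfish histo`;
# this is NOT GenomeScope's model fit, only a peak-based approximation.
from typing import Dict, Iterable


def parse_histo(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'depth count' lines into {depth: number of distinct k-mers}."""
    histo: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            histo[int(fields[0])] = int(fields[1])
    return histo


def first_valley(histo: Dict[int, int]) -> int:
    """Depth of the first local minimum; lower depths are treated as error k-mers."""
    depths = sorted(histo)
    for prev, cur, nxt in zip(depths, depths[1:], depths[2:]):
        if histo[cur] <= histo[prev] and histo[cur] < histo[nxt]:
            return cur
    return depths[0]  # no valley found: keep everything


def estimate_genome_size(histo: Dict[int, int]) -> float:
    """Total k-mer occurrences above the error valley divided by the peak depth."""
    cutoff = first_valley(histo)
    kept = {depth: count for depth, count in histo.items() if depth >= cutoff}
    peak_depth = max(kept, key=lambda depth: kept[depth])
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth
```

A hypothetical usage would be `estimate_genome_size(parse_histo(open("reads_k25.histo")))`, where the file name is illustrative only.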
Repeat and Effector Analysis
Analysis of repetitive DNA sequences was carried out using RepeatModeler 1.0.10 (http://www.repeatmasker.org/; Last accessed August 23, 2018) for de novo repeat identification, using oomycete genomes as a model repeat library. Repeat finding in P. plurivora was carried out using RepeatMasker (http://www.repeatmasker.org) (Smit et al. 2013–2015), using the consensus file generated by RepeatModeler.
Four different pipelines were used for RXLR effector prediction. The most stringent was based on an in-house protocol (supplementary fig. 1a, Supplementary Material online, Effector Prediction Pipeline A), and the second was based on EffectorP 1.0, a machine-learning predictor developed for fungal effectors (supplementary fig. 1a, Supplementary Material online, Effector Prediction Pipeline B) (Sperschneider et al. 2016). The other two pipelines used earlier RXLR effector prediction methods (Whisson et al. 2007; Win et al. 2007) run on a Galaxy server (Afgan et al. 2016; Cock et al. 2013) (supplementary fig. 1a, Supplementary Material online, Effector Prediction Pipelines C and D). The in-house method was used to predict effectors for P. plurivora and for closely related oomycetes such as P. multivora (two isolates) and P. capsici for comparative genomic analysis.
Prediction of Crinkle and Necrosis (CRN) effectors was performed as described previously (Haas et al. 2009; Yin et al. 2017). SignalP 3.0 and Phobius 1.01 searches to predict secretion signal peptides were performed as described by Yin et al. (2017).
Intergenic Distance and Synteny Analysis
Analysis of intergenic distances was carried out as described in Saunders et al. (2014). Briefly, the 5′ and 3′ intergenic distances for all genes, effectors, and BUSCOs were 2D-binned and plotted using the R packages GenomicRanges, rtracklayer, Rsamtools, and ggplots. The NUCmer program from MUMmer 3.0 (Kurtz et al. 2004) was run with maxgap 50 and breaklen 400 to compare the genome of P. plurivora with those of the related species P. multivora and P. capsici.
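The 5′ and 3′ flanking distances are the gaps to the nearest neighboring gene on each side, with the sides swapped for genes on the minus strand. The study computed them with R/Bioconductor packages; the Python sketch below only illustrates the calculation. The `(name, contig, start, end, strand)` tuple format with 1-based inclusive coordinates is a hypothetical input layout, and genes at contig ends receive no distance (`None`).

```python
# Flanking intergenic distances (5' and 3') for genes on assembled contigs.
# Hypothetical input: (name, contig, start, end, strand), 1-based inclusive.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str, int, int, str]


def flanking_distances(genes: List[Gene]) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {gene: (five_prime_distance, three_prime_distance)}."""
    by_contig: Dict[str, List[Gene]] = defaultdict(list)
    for gene in genes:
        by_contig[gene[1]].append(gene)

    result: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g[2])
        for i, (name, _, start, end, strand) in enumerate(contig_genes):
            # Gap to the immediate left/right neighbour (0 if they overlap).
            left = max(0, start - contig_genes[i - 1][3] - 1) if i > 0 else None
            right = (max(0, contig_genes[i + 1][2] - end - 1)
                     if i + 1 < len(contig_genes) else None)
            # On the minus strand the 5' end faces the right-hand neighbour.
            result[name] = (left, right) if strand == "+" else (right, left)
    return result
```

Binning the resulting pairs in two dimensions (5′ distance against 3′ distance) gives plots of the kind shown in fig. 1.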
Computing dN/dS Ratio for RXLR Effectors
RXLR effectors predicted by our in-house pipeline were analyzed using OrthoMCL (Li et al. 2003) to predict clusters of orthologous RXLR effector genes from the genomes analyzed here. The clustered groups of effectors from six genomes (P. plurivora, P. multivora isolates NZFS 3378 and NZFS 3448, P. capsici, P. ramorum, and P. cinnamomi) were used to calculate the ratio of non-synonymous to synonymous substitution rates (dN/dS). Only OrthoMCL orthologous clusters containing at least three genes were used for dN/dS calculation with the maximum likelihood program CODEML from the PAML package (Yang 2007).
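CODEML estimates dN/dS by maximum likelihood under a codon-substitution model, which is beyond a short example. As a simpler stand-in that shows what the ratio measures, the sketch below implements the Nei-Gojobori (1986) counting method for a single pair of aligned coding sequences. This is a different estimator from the one used in the study, and its simplifications (listed in the comments) are assumptions made for brevity.

```python
# Nei-Gojobori (1986) counting estimate of dN/dS for two aligned coding sequences.
# Illustration only: the study itself used maximum-likelihood CODEML (PAML).
# Simplifications: mutations to stop codons count as non-synonymous sites,
# and codons with gaps, ambiguous bases or stops are skipped.
import math
from itertools import permutations
from typing import Optional, Tuple

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites: 1/3 for every single-base change that keeps the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """(synonymous, non-synonymous) differences averaged over mutational pathways."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, s, n = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # pathways passing through a stop codon are ignored
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        else:
            total_s += s
            total_n += n
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0
    return total_s / n_paths, total_n / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at the same site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Tuple[float, float, Optional[float]]:
    """Return (dN, dS, dN/dS); the ratio is None when dS is zero."""
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        mean_s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += mean_s
        n_sites += 3 - mean_s
        sd, nd = codon_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no usable synonymous or non-synonymous sites")
    dn = jukes_cantor(n_diff / n_sites)
    ds = jukes_cantor(s_diff / s_sites)
    return dn, ds, (dn / ds if ds > 0 else None)
```

A ratio below 1 indicates purifying selection, near 1 neutral evolution, and above 1 diversifying selection; counting methods like this one are cruder than CODEML's likelihood models.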
Pathogen–Host Interactions, CAZY, Secondary Metabolite, and Ploidy Analyses
To identify pathogenicity factors in common with other pathogens, BLASTP searches of all P. plurivora protein sequences were performed against the pathogen–host interactions (PHI) database (Urban et al. 2015). Predicted protein sequences from the P. plurivora genome were screened for carbohydrate-active enzymes by BLASTP sequence similarity search against the dbCAN database (http://csbl.bmb.uga.edu/dbCAN/; Last accessed August 23, 2018) with an e-value cut-off of 1e−10 (Lombard et al. 2014). The entire genome assembly was uploaded to the antiSMASH 3.0 server (Weber et al. 2015; Blin et al. 2017) to identify regions potentially involved in secondary metabolite production. To estimate the ploidy of the sequenced P. plurivora genome, we used ploidyNGS (Corrêa dos Santos et al. 2017), which derives ploidy information from allele frequencies in the Illumina short reads.
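The e-value cut-off step can be sketched as follows, assuming BLASTP was run with tabular output (`-outfmt 6`, default 12 columns), in which column 11 holds the e-value and column 12 the bit score. Keeping only the highest-scoring hit per query is an assumption added for illustration; the text does not say how multiple hits were resolved.

```python
# Filter BLASTP tabular hits (-outfmt 6, default 12 columns) by e-value and
# keep the highest-scoring hit per query protein.
from typing import Dict, Iterable, Sequence, Tuple


def best_hits(rows: Iterable[Sequence[str]],
              max_evalue: float = 1e-10) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for hits at or below max_evalue."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # fails the e-value cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

With a hypothetical results file, the rows could come from `csv.reader(handle, delimiter="\t")`.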
Results and Discussion
Overview of the P. plurivora Draft Genome Assembly
Here we present the first draft genome assembly for the plant pathogen P. plurivora, the fourth species sequenced from Clade 2 of the genus Phytophthora. P. plurivora was recently separated from the P. citricola species complex (Jung and Burgess 2009), together with P. multivora and P. pini (Hong et al. 2009; Scott et al. 2009). Using SPAdes 3.5.0, a total of 41 Mb of the P. plurivora genome was assembled into 1,919 contigs and 1,898 scaffolds (mean coverage, 220×). Contigs shorter than 2 kb were removed from the assembly, mitochondrial genome sequences were not screened out, and the 9% of sequence reads that remained unassembled were discarded. This is one of the smallest draft genome assemblies for a Phytophthora species (McGowan and Fitzpatrick 2017); P. multivora, P. kernoviae, and P. agathidicida have similar genome assembly sizes of 40, 43, and 37 Mb, respectively (supplementary fig. 2a, Supplementary Material online). Other Phytophthora genomes sequenced to date are typically larger than 50 Mb, ranging up to 240 Mb (Haas et al. 2009). Assessment of assembly quality gave the following statistics: N50 = 48,620 bp; N75 = 21,603 bp; L50 = 242; L75 = 547; longest contig = 294,496 bp; number of contigs >25 kb = 489; number of contigs >10 kb = 921. Using the previously sequenced plant pathogenic relative P. capsici as a training model, Augustus predicted 11,749 genes. Preliminary annotation revealed 6,353 predicted P. plurivora gene products with Pfam domains. The gene sequences predicted from the P. plurivora genome are available in EumicrobeDB (www.eumicrobedb.org; Last accessed August 23, 2018) (Panda et al. 2018). In an RNAseq dataset generated from in vitro cultured mycelium of P. plurivora, over half of the predicted genes were detected with a fragments per kilobase of transcript per million mapped reads (FPKM) value of 10 or greater, and approximately 74% were detected at an FPKM of 5 or greater. 
All the predicted genes (11,749) overlapped with RNAseq data (Supplementary material 1, Supplementary Material online). Similar proportions of predicted genes were expressed at these levels in in vitro cultured mycelium of P. infestans (Ah-Fong et al. 2017), and lower proportions in P. capsici (Chen et al. 2013). This suggests that the accuracy of our gene predictions for P. plurivora compares favorably with that for other Phytophthora species.
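The assembly and expression statistics quoted above follow standard definitions, sketched below. Nx is the length of the contig at which the running total of contigs sorted longest-first first reaches x% of the assembly, and Lx is the number of contigs needed to get there. FPKM scales a gene's fragment count by its length in kilobases and the library size in millions of mapped fragments. Cufflinks (Tuxedo suite) uses effective transcript lengths and can apply bias corrections, so its values can differ from this plain formula.

```python
# Standard-definition sketches of the assembly and expression statistics above.
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (Nx, Lx): the contig length and contig count at which the
    cumulative length first reaches `fraction` of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("no contig lengths supplied")


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fraction_at_or_above(values: Iterable[float], threshold: float) -> float:
    """Fraction of genes whose FPKM is >= threshold."""
    values = list(values)
    return sum(value >= threshold for value in values) / len(values)
```

For example, `nx_lx(lengths, 0.75)` gives the N75/L75 pair, and `fraction_at_or_above(fpkms, 5)` gives the share of genes detected at FPKM 5 or greater.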
An assessment of gene space representation revealed that 226 out of the 234 stramenopile BUSCOs (96.6%) were represented as single complete copies in the P. plurivora genome. No complete duplicated BUSCOs were identified, while one was fragmented and seven were not found (3.0%). By comparison, 16 BUSCOs were not found in P. capsici, 10 and 12 BUSCOs were absent from the genome assemblies of two sequenced isolates of P. multivora (NZFS 3378 and NZFS 3448, respectively), four BUSCOs were absent from the P. ramorum genome, and nine BUSCOs were absent from the P. infestans genome. We compared the absent BUSCOs with each other and found none that were common to genome assemblies of P. plurivora, two isolates of P. multivora, P. capsici, P. ramorum, or P. infestans (supplementary fig. 3, Supplementary Material online and supplementary material 2, Supplementary Material online). The BUSCO analysis suggests that our genome assembly is highly representative of the gene space in P. plurivora, and compares favorably to other Phytophthora genome assemblies.
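As an arithmetic check, the four BUSCO categories reported above (complete single-copy, complete duplicated, fragmented, missing) account for the full stramenopile set, and the percentages follow directly from it:

$$226 + 0 + 1 + 7 = 234, \qquad \frac{226}{234} \approx 96.6\%, \qquad \frac{7}{234} \approx 3.0\%$$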
To further evaluate our draft P. plurivora genome, we compared its assembly statistics with those for P. multivora (two isolates), P. kernoviae, and P. agathidicida, which were also sequenced and assembled using similar strategies (Studholme et al. 2016). The P. plurivora assembly also compares favorably with these (table 1; supplementary fig. 2a and b, Supplementary Material online). Genome sizes have not been estimated by methods such as flow cytometry for many Phytophthora species (Jung et al. 2017), and before this study no genome size estimate was available for P. plurivora. From k-mer analysis of our sequence data, we obtained a genome size estimate of 45 Mb (supplementary fig. 4a, Supplementary Material online). For comparison, the same analysis gave genome size estimates of 145 Mb for P. infestans and 49 Mb for P. multivora isolate NZFS 3378. The lower than expected value for P. infestans, whose genome assembly spans approximately 240 Mb (Haas et al. 2009), is likely due to the high repeat content of that genome. Our calculations thus suggest that our P. plurivora genome assembly spans over 90% of the estimated genome size, and that repetitive sequences are less prevalent than in some other Phytophthora genomes, especially P. infestans (Tyler et al. 2006; Haas et al. 2009).
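The "over 90%" figure is a simple ratio of assembly size to k-mer-based size estimate (a size comparison, not a per-base completeness measure). The same ratio for P. multivora NZFS 3378 uses its 40 Mb assembly size from table 1:

$$\frac{41\ \text{Mb}}{45\ \text{Mb}} \approx 0.91, \qquad \frac{40\ \text{Mb}}{49\ \text{Mb}} \approx 0.82$$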
Genome | Total Genes | Genome Size (Mb) | Number of Predicted RXLRs | Total No. of Contigs/Scaffolds | Host |
---|---|---|---|---|---|
P. plurivora | 11,741 | 41 | 84 (Pipeline A) | 1,898 | F. sylvatica |
P. multivora isolate 1 (NZFS 3378) | 14,200 | 40 | 92 (Pipeline A) | 2,844 | Idesia polycarpa |
P. multivora isolate 2 (NZFS 3448) | 15,091 | 40 | 84 (Pipeline A) | 2,840 | Metrosideros kermadecensis |
P. ramorum | 16,134 | 65 | 370 (Jiang et al. 2008) | 2,576 | Quercus agrifolia |
P. capsici | 20,378 | 62 | 140 (Pipeline A) | 917 | Laboratory backcross progeny |
P. cinnamomi | 26,132 | 58 | 565 (Studholme et al. 2016; McGowan and Fitzpatrick 2017) | 1,314 | Eucalyptus marginata |
Phytophthora species secrete effector proteins that act either in the apoplastic space or inside host cells to aid infection and/or elicit defense responses (reviewed in Schornack et al. 2009; Whisson et al. 2016). The class acting inside host cells includes the RXLR and CRN (Crinkler) effector families, which are characterized by a signal peptide and conserved peptide motifs required for translocation into the host (Whisson et al. 2007; Schornack et al. 2010). SignalP v3.0 analysis predicted 1,737 secreted proteins, of which 84 were predicted by our HMM to be RXLR class effector proteins (see the RXLR and CRN Effector Prediction section below and supplementary material 3, Supplementary Material online) and 60 were grouped as predicted secreted CRN class effectors. Within the RXLR class, the following motifs were present: EER (80 proteins), WY (24 proteins), WYY (23 proteins), and LYD (14 proteins) (Win et al. 2012; Ye et al. 2015).
Large numbers of CAZy proteins have been identified in other sequenced Phytophthora species (Ospina-Giraldo et al. 2010; Grenville-Briggs et al. 2017). Here, we identified glycoside hydrolases (332), glycosyltransferases (271), carbohydrate-binding modules (304), polysaccharide lyases (49), and carbohydrate esterases (43). The number of polysaccharide lyases predicted in P. plurivora is similar to that found in other Phytophthora species, but our analysis here shows elevated numbers of the other CAZy classes in P. plurivora compared with other Phytophthora species (Ospina-Giraldo et al. 2010). Homologs in the PHI database were found for 2.6% of the predicted P. plurivora proteome (308/11,749), predominantly entries from P. sojae and Fusarium graminearum. Secondary metabolite analysis of the P. plurivora genome revealed six genomic regions with the potential to encode enzymes involved in secondary metabolite formation (supplementary table 1, Supplementary Material online). These six regions comprise 56 genes that match the antiSMASH 3.0 database (Weber et al. 2015), including a nonribosomal peptide synthetase and enzymes for ectoine and terpene biosynthesis. Compared with fungi, Phytophthora species are not known to produce many secondary metabolites. The best characterized secondary metabolites from Phytophthora are the mating hormones α1 and α2, which are diterpene molecules (Tomura et al. 2017). Phytophthora species have also been shown to produce a signal molecule derived from 4,5-dihydroxy-2,3-pentanedione that has quorum-sensing activity in bacteria (Kong et al. 2010). The predicted secondary metabolite regions in the P. plurivora genome may be involved in the synthesis of such bioactive secondary metabolites.
Ploidy levels in Phytophthora species can vary within and between species, but are at least diploid (Bertier et al. 2013; Li et al. 2017). Analysis of allele frequency distributions with ploidyNGS, based on Kolmogorov-Smirnov distance, suggested that the P. plurivora strain sequenced here is most likely tetraploid. P. plurivora is a homothallic (self-fertile, inbreeding) species; if it survives in the environment via sexually derived oospores, heterozygosity should decrease with each generation, as has been observed in P. plurivora strains sampled from different countries (Schoebel et al. 2014). Despite this, heterozygous loci with additional alleles were detected, suggestive of an elevated ploidy level (supplementary fig. 5, Supplementary Material online).
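The idea behind a Kolmogorov-Smirnov (KS) ploidy comparison can be shown with a toy model: heterozygous sites in a tetraploid should show allele frequencies near 0.25, 0.5 and 0.75, a triploid near 1/3 and 2/3, and a diploid near 0.5. The sketch below simulates read-count allele frequencies for each candidate ploidy and reports the KS distance to the observed frequencies; the smallest distance marks the best-fitting ploidy. The model assumptions (equally frequent heterozygous genotypes, binomial read sampling at a fixed depth) are illustrative only, and how ploidyNGS builds its reference distributions is not reproduced here.

```python
# Toy ploidy comparison: KS distance between observed allele frequencies and
# frequencies simulated for each candidate ploidy. Assumptions (not ploidyNGS
# internals): heterozygous genotypes are equally frequent and read counts are
# binomial at a fixed sequencing depth.
import bisect
import random
from typing import Dict, List, Sequence


def simulate_frequencies(ploidy: int, n_sites: int, depth: int,
                         rng: random.Random) -> List[float]:
    """Simulated alternative-allele frequencies at heterozygous sites."""
    expected = [k / ploidy for k in range(1, ploidy)]  # 0.25/0.5/0.75 if tetraploid
    freqs = []
    for _ in range(n_sites):
        p = rng.choice(expected)
        alt_reads = sum(rng.random() < p for _ in range(depth))
        freqs.append(alt_reads / depth)
    return freqs


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
               for x in a + b)


def ks_by_ploidy(observed: Sequence[float], depth: int,
                 ploidies: Sequence[int] = (2, 3, 4, 5, 6),
                 seed: int = 1) -> Dict[int, float]:
    """KS distance between the observed data and a simulation for each ploidy."""
    rng = random.Random(seed)
    return {p: ks_distance(observed, simulate_frequencies(p, len(observed), depth, rng))
            for p in ploidies}
```

Higher ploidies produce more closely spaced frequency peaks, so distinguishing, say, tetraploid from pentaploid needs more sites and greater depth than distinguishing diploid from tetraploid.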
Phytophthora genomes often contain high levels of repetitive DNA; for example, over 75% of the P. infestans genome consists of repetitive sequences (Tyler et al. 2006; Haas et al. 2009). Approximately 15% of the P. plurivora genome consists of repetitive sequences, far less than in many other Phytophthora genomes sequenced to date. The predominant repeat type is interspersed repeats, accounting for approximately 50% of the total repeats. DNA elements and long terminal repeat (LTR) retroelements together comprise approximately 30% of the total repeats (table 2).
Repeats | Number | Length Occupied (bp) | Percentage of Sequence |
---|---|---|---|
SINEs: | 52 | 6,478 | 0.02% |
ALUs | 0 | 0 | 0.00% |
MIRS | 0 | 0 | 0.00% |
LINEs: | 181 | 87,398 | 0.22% |
LINE1 | 114 | 43,446 | 0.11% |
LINE2 | 0 | 0 | 0.00% |
L3/CR1 | 25 | 20,942 | 0.05% |
LTR elements: | 1,543 | 945,917 | 2.34% |
ERVL | 0 | 0 | 0.00% |
ERVL-MaLRs | 0 | 0 | 0.00% |
ERVL-class I | 0 | 0 | 0.00% |
ERVL-class II | 0 | 0 | 0.00% |
DNA elements: | 2,430 | 1,115,747 | 2.76% |
hAT-charlie | 0 | 0 | 0.00% |
TcMar-Tigger | 3 | 887 | 0.00% |
Unclassified | 1,427 | 826,032 | 2.04% |
Total interspersed repeats: | – | 2,981,572 | 7.37% |
Satellites: | 0 | 0 | 0.00% |
Simple repeats | 4,134 | 186,013 | 0.46% |
Low complexity: | 500 | 26,340 | 0.07% |
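As a consistency check on table 2 and the preceding paragraph: the five interspersed classes (SINEs, LINEs, LTR elements, DNA elements, and unclassified repeats) sum exactly to the total interspersed length, so "Unclassified" is a top-level class rather than a subtype of DNA elements. Dividing by the approximately 15% total repeat content gives the stated shares:

$$
\begin{aligned}
6{,}478 + 87{,}398 + 945{,}917 + 1{,}115{,}747 + 826{,}032 &= 2{,}981{,}572\ \text{bp} \quad (7.37\%)\\
7.37\% / 15\% &\approx 0.49\\
(2.34\% + 2.76\%) / 15\% = 5.10\% / 15\% &= 0.34
\end{aligned}
$$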
RXLR and CRN Effector Prediction
As in other Phytophthora species sequenced to date, the genome of P. plurivora contains many genes encoding RXLR class effectors. RXLR effectors have been extensively researched in P. infestans and P. sojae, where they are translocated inside plant cells during infection to facilitate the infection process (Whisson et al. 2016; Wang et al. 2017). These effectors are modular proteins that contain an N-terminal signal peptide, a conserved RXLR peptide motif typically within the next 40 amino acids, and often an EER motif near the RXLR. The functional effector domain lies between the RXLR–EER region and the C-terminus. Most of the functionally characterized effectors in this class contain all three of these features (Anderson et al. 2015; Whisson et al. 2016). We used four different methods to predict effectors from Phytophthora genomes: A is a modified version of the prediction method described by Jiang et al. (2008); B is EffectorP, a machine-learning program for fungal effector prediction (Sperschneider et al. 2016); and C (Win et al. 2007) and D (Whisson et al. 2007) are older prediction methods run through a Galaxy pipeline (supplementary fig. 1a, Supplementary Material online). The four pipelines yielded differing numbers of candidate RXLR effectors (supplementary material 4, Supplementary Material online), which we compared in a Venn diagram (supplementary fig. 1b, Supplementary Material online). Pipeline C (Win et al. 2007) produced the largest number of RXLR effectors (196), but half of these lacked the canonical EER motif. The HMM-based prediction methods predicted fewer candidate RXLR effectors. Pipeline A, with additional filtering steps, was the most stringent, yielding 84 candidate effectors, of which 80 contain the signal peptide, RXLR, and EER motifs. 
As such, we regarded this as a ‘high confidence’ set of predicted effectors for further analysis. We predicted CRN effectors as described (Yin et al. 2017) by using CRN sequences from P. infestans as training material (Haas et al. 2009). We predicted 139 CRN proteins from P. plurivora, of which 60 had signal peptides for secretion, and these can be considered as candidate effectors. A large proportion of CRN proteins identified from other Phytophthora genomes, such as P. infestans and P. capsici, also do not possess a predicted signal peptide (Haas et al. 2009; Stam et al. 2013).
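The modular structure described above (signal peptide, then RXLR, then EER) lends itself to a simple motif scan, sketched below. This is a toy regular-expression illustration, not any of Pipelines A to D, several of which use HMMs. The cleavage site is assumed to come from a separate signal peptide predictor such as SignalP; the 40-residue RXLR window follows the text, while the 25-residue EER window and the EER-like pattern `[DE][DE][KR]` are assumptions for illustration.

```python
# Toy RXLR/EER motif scan, for illustration only (not Pipelines A-D).
# Assumptions: the signal-peptide cleavage site is supplied by a separate
# predictor; RXLR must start within `rxlr_window` residues of the mature
# protein; an EER-like motif ([DE][DE][KR], an assumed pattern) must start
# within `eer_window` residues after the RXLR.
import re
from typing import Optional, Tuple

RXLR = re.compile(r"R.LR")
EER_LIKE = re.compile(r"[DE][DE][KR]")


def find_rxlr_eer(protein: str, cleavage_site: int, rxlr_window: int = 40,
                  eer_window: int = 25) -> Optional[Tuple[int, Optional[int]]]:
    """Return 0-based (RXLR start, EER start or None), or None without an RXLR."""
    # endpos allows a 4-residue match starting at the last position of the window.
    rxlr = RXLR.search(protein, cleavage_site, cleavage_site + rxlr_window + 3)
    if rxlr is None:
        return None
    eer = EER_LIKE.search(protein, rxlr.end(), rxlr.end() + eer_window + 2)
    return rxlr.start(), (eer.start() if eer else None)
```

Candidates with both motifs would correspond to the "signal peptide, RXLR and EER" category used to define the high-confidence set; a regex scan like this tends to over-predict compared with HMM-based filters.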
Effector-Coding Genes are Located in Gene-Sparse Regions of the P. plurivora Genome
Comparisons of RXLR class effectors between Phytophthora species from different clades have typically revealed diverse sequences, with many effectors having no homologs in other species (Quinn et al. 2013; McGowan and Fitzpatrick 2017), indicating that they are evolving rapidly. Comparing effectors from more closely related species can reveal the genomic processes acting on effector-coding genes to drive their diversification. Phytophthora species have been proposed to have two-speed genomes, in which effector-coding genes reside in gene-poor, repeat-rich regions that are prone to rapid evolution (Dong et al. 2015). The RXLR and CRN effector genes in P. plurivora also reside in more gene-poor regions of the genome, as evidenced by greater intergenic distances between them and their neighboring genes (fig. 1A and B). Specifically, the mean 5′ intergenic distance is 1,107 bp for all P. plurivora genes, compared with 3,117 bp for RXLR effector-coding genes and 2,018 bp for CRN effector-coding genes. At the 3′ end, the mean intergenic distance is 848 bp for the entire predicted gene set, compared with 2,128 bp for RXLR effector genes and 3,087 bp for CRN effector genes. The median intergenic distance for all predicted genes was also significantly smaller than for either effector gene class: the median 5′ intergenic distance was 582 bp for all predicted genes, compared with 1,920 and 1,518 bp for the RXLR and CRN effector genes, respectively, and the median 3′ intergenic distance was 316 bp for all genes, compared with 1,414 and 1,925 bp for RXLR and CRN effectors, respectively. A t-test showed that the flanking intergenic distances of effector genes differ significantly from those of core genes (supplementary material 5, Supplementary Material online and supplementary fig. 6, Supplementary Material online). These results for P. plurivora follow a trend similar to that observed for other Phytophthora genomes (Dong et al. 2015).
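The group comparison can be sketched with a two-sample t statistic. The text reports "a t-test" without specifying the variant; the Welch (unequal-variance) form below is an assumption, chosen because effector and core gene sets differ greatly in size and spread. With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.

```python
# Welch's two-sample t statistic for flanking intergenic distances
# (the unequal-variance variant is an assumption; the text says "a t-test").
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Because intergenic distances are typically right-skewed, a rank-based test such as the Mann-Whitney U test is a common alternative for this comparison.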

Fig. 1. Intergenic distances between genes in P. plurivora. (A) Red dots indicate the intergenic distances for RXLR effector-coding genes. (B) Black dots indicate the intergenic distances for CRN effector-coding genes. The color scale to the right of the plots represents the gene content per bin; values on the x- and y-axes are nucleotides (nt). See supplementary material 4, Supplementary Material online and supplementary fig. 6, Supplementary Material online for data and boxplot analysis.
Synteny Analysis
The availability of draft genome sequences for two P. multivora isolates (Studholme et al. 2016) allowed us to compare them with our P. plurivora draft assembly. The genome assemblies of P. plurivora and both P. multivora isolates are of similar size, at approximately 40 Mb each. Among the other Clade 2 species compared, P. plurivora genes showed the highest level of similarity to those of P. multivora. There were 2,158 and 1,472 P. plurivora genes without orthologs in P. multivora NZFS 3378 and NZFS 3448, respectively, of which 1,361 genes lacked orthologs in both isolates (fig. 2; supplementary material 6, Supplementary Material online).
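Writing A and B for the sets of P. plurivora genes without orthologs in NZFS 3378 and NZFS 3448, respectively, the counts above determine the remaining regions of the two-isolate Venn diagram by inclusion-exclusion:

$$
\begin{aligned}
|A \cup B| &= |A| + |B| - |A \cap B| = 2{,}158 + 1{,}472 - 1{,}361 = 2{,}269\\
|A \setminus B| &= 2{,}158 - 1{,}361 = 797, \qquad |B \setminus A| = 1{,}472 - 1{,}361 = 111
\end{aligned}
$$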

Fig. 2. Venn diagram comparing P. plurivora genes not found in the genome assemblies of P. multivora isolates and/or P. capsici. The bar graph below the Venn diagram shows the total number of P. plurivora-specific genes that were not found in the genome assemblies of P. multivora or P. capsici. See supplementary material 5, Supplementary Material online for lists of genes used for figure construction.
Genome collinearity was examined between P. plurivora and both P. multivora isolates, and between P. plurivora and P. capsici. The largest scaffold (scaffold_1, 294,496 bp, containing 95 protein-coding genes [GenBank NMPK01000001.1]) was almost fully collinear with both P. multivora isolates, except for three missing genes: g65, g72, and g95. These three genes are highly conserved in the two P. multivora isolates and in other Phytophthora species. Blocks of genome collinearity have previously been identified between more distantly related species such as P. infestans, P. sojae, and P. ramorum (Haas et al. 2009). Because P. plurivora and P. multivora are more closely related than the species in these earlier comparisons (Yang et al. 2017), it is unsurprising that most of their genomes exhibit strong collinearity (fig. 3A and B).
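A simple gene-order check can complement whole-genome alignments when summarizing collinearity. The sketch below is a hypothetical helper, not the method used here (the figures are based on MUMmer alignments): given the gene order along a query scaffold and the positions of orthologs along one target scaffold, it lists genes without a mapped ortholog and reports the fraction of consecutive mapped genes whose target positions keep one consistent direction (forward or reverse).

```python
def collinearity_report(query_order, ortholog_pos):
    """Summarize how well a query scaffold's gene order is kept in a target.

    query_order: gene names in their order along the query scaffold.
    ortholog_pos: {query gene: position of its ortholog on one target scaffold}.
    Returns (genes with no ortholog, fraction of consecutive steps that
    follow the dominant direction).
    """
    missing = [g for g in query_order if g not in ortholog_pos]
    positions = [ortholog_pos[g] for g in query_order if g in ortholog_pos]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    if not steps:
        return missing, 1.0
    forward = sum(s > 0 for s in steps)  # same orientation
    reverse = sum(s < 0 for s in steps)  # inverted block
    return missing, max(forward, reverse) / len(steps)
```

A fully collinear scaffold with a few absent genes, as described for scaffold_1, would return those genes in the first element and a fraction of 1.0.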

Fig. 3. Collinearity between P. plurivora and P. multivora genomes. (A) Whole-genome alignment (MUMmer) between P. plurivora and P. multivora isolate 1 (NZFS 3378). (B) Whole-genome alignment (MUMmer) between P. plurivora and P. multivora isolate 2 (NZFS 3448). The x-axis shows the P. plurivora reference genome and the y-axis shows P. multivora. Blue dots represent reverse-complement matches and red dots represent forward matches.
Comparison of P. plurivora and P. multivora revealed evidence of localized gene duplication and sequence diversification in a cluster of RXLR effectors. P. plurivora Scaffold_267 contains seven RXLR genes across its 45-kb length, and we identified two scaffolds in P. multivora NZFS 3378 (isolate 1; LGSM010000246.1 and LGSM01000099.1) and two scaffolds in P. multivora NZFS 3448 (isolate 2; LGSL01000255.1 and LGSL01000075.1) that are collinear with it. All seven RXLR effectors on this P. plurivora scaffold had collinear homologs on the P. multivora scaffolds. However, one P. plurivora RXLR, PlRXLR39, is duplicated in P. multivora NZFS 3378 (isolate 1), and conversely one RXLR of P. multivora NZFS 3378 (isolate 1), MlRXLR15, is duplicated into two RXLRs in P. plurivora, namely PlRXLR43 and PlRXLR44 (fig. 4).
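Duplications such as those described for PlRXLR39 and MlRXLR15 appear as one-to-many relationships in a list of collinear homolog pairs. The following sketch, with a hypothetical function name and a toy pair list built only from the gene names given above (not the full homology table), flags genes in either genome that match more than one homolog in the other.

```python
from collections import defaultdict


def one_to_many(pairs):
    """Find genes matched to more than one homolog in the other genome.

    pairs: iterable of (gene_in_genome_a, gene_in_genome_b) homolog pairs.
    Returns two dicts: genes of A expanded in B, and genes of B expanded in A.
    """
    by_a, by_b = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        by_a[a].add(b)
        by_b[b].add(a)
    expanded_in_b = {a: sorted(bs) for a, bs in by_a.items() if len(bs) > 1}
    expanded_in_a = {b: sorted(a_genes) for b, a_genes in by_b.items() if len(a_genes) > 1}
    return expanded_in_b, expanded_in_a


# Toy input using only the relationships stated in the text and fig. 4 caption
pairs = [
    ("PlRXLR39", "MlRXLR29"), ("PlRXLR39", "MlRXLR30"),
    ("PlRXLR43", "MlRXLR15"), ("PlRXLR44", "MlRXLR15"),
]
```

Calling `one_to_many(pairs)` on this toy list reports PlRXLR39 as expanded in P. multivora and MlRXLR15 as expanded in P. plurivora.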
Fig. 4. Scaffold_267 of P. plurivora, which contains seven RXLR-coding genes across its 45-kb length, is collinear with two scaffolds in P. multivora NZFS 3378 (isolate 1; LGSM010000246.1 and LGSM01000099.1) and two scaffolds in P. multivora NZFS 3448 (isolate 2; LGSL01000255.1 and LGSL01000075.1 [not shown]). P. plurivora PlRXLR39 has undergone duplication in P. multivora into MlRXLR29 and MlRXLR30. Purple rectangles indicate genes on the plus strand and yellow rectangles indicate genes on the minus strand.
Analysis of Synonymous and Non-synonymous Codon Substitution of P. plurivora RXLR Effectors Reveals a Subset under Neutral or Purifying Selection
To determine whether RXLR effector sequence diversification was selectively neutral or driven by positive selection, we compared the P. plurivora RXLR effector complement with those predicted from two isolates of P. multivora, and from P. capsici, P. cinnamomi, and P. ramorum. All five species are considered to be plant pathogens with broad host ranges. P. multivora is closely related to P. plurivora, whereas P. capsici is more distantly related within Clade 2 (Yang et al. 2017). P. cinnamomi and P. ramorum are both pathogens of trees and are placed in Clades 7c and 8c, respectively (Yang et al. 2017). Using pipeline A, we predicted effectors for the two P. multivora isolates NZFS 3378 (isolate 1) and NZFS 3448 (isolate 2) (Studholme et al. 2016), P. capsici (Lamour et al. 2012), P. cinnamomi (Studholme et al. 2016), and P. ramorum (Jiang et al. 2008). For P. capsici, 140 RXLR effectors were predicted, whereas 84 and 92 were predicted for P. multivora NZFS 3378 and NZFS 3448, respectively.
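Pipeline A is not described in this section. For orientation only, the sketch below shows the kind of motif scan that RXLR effector prediction commonly involves: an RXLR motif within an N-terminal window, followed closely by an EER-like motif. The window sizes, the `[DE][DE][KR]` EER pattern, and the function name are illustrative assumptions, not the parameters of pipeline A, and signal-peptide prediction is assumed to have been done separately by a dedicated tool.

```python
import re

# Zero-width lookahead so that overlapping RXLR candidates are all reported.
RXLR = re.compile(r"(?=R.LR)")
# Assumed EER-like pattern: two acidic residues followed by a basic residue.
EER = re.compile(r"[DE][DE][KR]")


def find_rxlr_eer(protein, rxlr_window=(30, 60), eer_max_gap=25):
    """Return 1-based (rxlr_position, eer_position) or None.

    rxlr_window and eer_max_gap are illustrative assumptions. The protein is
    assumed to have already passed a signal-peptide check.
    """
    lo, hi = rxlr_window
    for m in RXLR.finditer(protein):
        rxlr_pos = m.start() + 1  # 1-based position of the first R
        if not lo <= rxlr_pos <= hi:
            continue
        rxlr_end = m.start() + 4  # index just after the RXLR motif
        downstream = protein[rxlr_end:rxlr_end + eer_max_gap]
        e = EER.search(downstream)
        if e:
            return rxlr_pos, rxlr_end + e.start() + 1
    return None
```

A protein whose RXLR lies outside the window, or that has no EER-like motif within the allowed distance downstream, returns `None`.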
OrthoMCL analysis (default BLASTP parameters, inflation value 1.5) identified 105 clusters with at least three members each. Of these, 48 clusters contained members from a single species, 42 contained members from two species, and 15 contained members from three species; no cluster comprised RXLRs from more than three Phytophthora species. Among the clusters with three or more members, P. plurivora effectors were represented in 47, P. multivora effectors in 53, and P. capsici effectors in 33. dN/dS ratios were calculated for all clusters containing at least three members (supplementary material 7, Supplementary Material online), because a previous study had shown that RXLR effector paralogs within a species could be under positive selection (Win et al. 2007). A dN/dS ratio >1.0 suggests positive selection, a ratio close to 1.0 suggests neutral evolution, and a ratio <1.0 suggests purifying selection. The 105 clusters resolved into 432 protein pairs. The highest dN/dS ratios ranged from 1.0 to 3.6 and included five effectors from P. plurivora, but only PlRXLR53 exhibited a markedly elevated dN/dS ratio (1.9) suggestive of positive selection (fig. 5A and B).
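This section does not state how dN and dS were estimated. As an illustration only, the following sketch implements the Nei–Gojobori (1986) counting method with a Jukes–Cantor correction for a pair of codon-aligned coding sequences. It is a swapped-in, simplified technique: it uses the standard genetic code, skips gapped, ambiguous, and stop codons, counts mutations to stop codons as nonsynonymous sites, and ignores transition/transversion bias, so it may differ from the tool used to produce the values reported here.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ("*" = stop)
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_sites(codon):
    """Synonymous and nonsynonymous site counts for one codon (S + N = 3)."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all
    single-step mutational pathways that avoid stop codons."""
    diff_positions = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_positions:
        return 0.0, 0.0
    total_syn = total_non = 0.0
    n_paths = 0
    for order in permutations(diff_positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # pathway passes through a stop codon; discard it
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:  # runs only if the pathway was not discarded
            total_syn += syn
            total_non += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_syn / n_paths, total_non / n_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two codon-aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and equal in length")
    sites_s = sites_n = diffs_s = diffs_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped/ambiguous codons and stop codons
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2
        sites_n += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        diffs_s += sd
        diffs_n += nd
    if sites_s == 0 or sites_n == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(diffs_n / sites_n), jukes_cantor(diffs_s / sites_s)
```

The ratio for a pair is then `dn / ds`; it is undefined when `ds` is zero, as happens for pairs that differ only at nonsynonymous sites.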

Fig. 5. dN/dS analysis of RXLR effector-coding genes from P. plurivora, P. capsici, P. ramorum, P. multivora isolates, and P. cinnamomi. (A) Scatter plot of all dN and dS values for RXLR effector-coding genes from these species. (B) Scatter plot of dN and dS values after removal of the outlier pair PrAvh147 and PrAvh227 from P. ramorum. See supplementary material 6, Supplementary Material online for dN and dS values used for figure construction.
Oomycete RXLR effectors are considered to evolve rapidly under positive selection, because homologs are often absent from species in other Phytophthora clades (Quinn et al. 2013; McGowan and Fitzpatrick 2017). It was therefore unexpected that only PlRXLR53, which grouped with Ml1RXLR62, Ml2RXLR50, and Ml2RXLR83 from P. multivora, showed a dN/dS ratio (1.9) suggestive of positive selection. All other P. plurivora effectors in OrthoMCL groups of three or more effector genes were either evolving neutrally or under purifying selection. By comparison, effectors from P. ramorum showed higher dN/dS ratios (supplementary material 7, Supplementary Material online), as shown previously (Win et al. 2007). Seven P. plurivora RXLR effectors (PlRXLR4, PlRXLR6, PlRXLR18, PlRXLR27, PlRXLR36, PlRXLR52, and PlRXLR58) had no homologs in any of 26 sequenced Phytophthora genomes. Taken together, these results suggest that P. plurivora RXLR effectors are subject to diverse evolutionary pressures: one subset shows evidence of purifying selection, while another has evolved rapidly and is specific to P. plurivora. Sequencing additional Clade 2 species, including species more closely related to P. plurivora, may help to clarify the mode of selection acting on this class of effectors. Multiple sequence alignments of the PlRXLRs with the highest dN/dS ratios, from group12 (PlRXLR9), group14 (PlRXLR20, 32, 76, and 81), and group39 (PlRXLR53), are shown in supplementary fig. 7a–c, Supplementary Material online.
Conclusion
The genome sequence presented here provides a resource to underpin further investigation of the mechanisms of disease caused by P. plurivora, a prevalent but little-researched pathogen of important tree species. Our genome sequence of P. plurivora is consistent with the genome architecture of other sequenced Phytophthora species, and we found evidence for elevated ploidy, as can occur in Phytophthora species. This genome resource can be used in future population genomic studies to identify haplotypes and alleles, and to determine which effectors may function in infection of woody host plants.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.
Footnotes
Data deposition: This project has been deposited at DDBJ/ENA/GenBank under the accessions NMPK00000000 and SRP132452. The genome version described in this paper is NMPK01000000. Raw sequence data can be found in the NCBI Sequence Read Archive under accession number SRP132452.
Acknowledgments
We are grateful to Laura Vetukuri for collecting and isolating P. plurivora strain AV1007. This work is supported by research funds from Swedish Research Council Formas (2015-430), the Swedish Foundation for Strategic Research (FFL5), Helge Ax:son Johnsons Stiftelse, Nordic Joint Committee for Agricultural and Food Research (NKJ), Nordic Forest Research (SNS) network and the Scottish Government Rural and Environment Science and Analytical Services Division (RESAS). Funding support to ST through a DBT-Ramalingaswamy fellowship is gratefully acknowledged. MMC was supported by a DST-INSPIRE AORC fellowship. The authors acknowledge support from the National Genomics Infrastructure in Stockholm funded by Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, Partnership Alnarp, Parvatha Vardhini foundation and the Swedish Research Council, and SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure.
Literature Cited
Author notes
Sucheta Tripathy and Mathu Malar C are joint first authors and contributed equally to this work.