Draft genome assemblies for tree pathogens Phytophthora pseudosyringae and Phytophthora boehmeriae

Abstract Species of Phytophthora, plant pathogenic eukaryotic microbes, can cause disease on many tree species. Genome sequencing of species from this genus has helped to determine components of their pathogenicity arsenal. Here, we sequenced genomes for two widely distributed species, Phytophthora pseudosyringae and Phytophthora boehmeriae, yielding genome assemblies of 49 and 40 Mb, respectively. We identified more than 270 candidate disease promoting RXLR effector coding genes for each species, and hundreds of genes encoding candidate plant cell wall degrading carbohydrate active enzymes (CAZymes). These data boost genome sequence representation across the Phytophthora genus, and form resources for further study of Phytophthora pathogenesis.


Introduction
Phytophthora species are oomycetes, eukaryotic plant pathogens with a filamentous growth habit that superficially resemble fungi. However, they are stramenopiles, phylogenetically separated from the true fungi, belonging to the SAR group (stramenopiles-alveolata-rhizaria) (Cavalier-Smith 2018). Species of Phytophthora represent significant threats to food security, causing billions of dollars of losses to key crops such as potato, tomato, and soybean (Tyler 2007;Fry et al. 2015). Phytophthora species also represent some of the greatest threats to tree and forest health (Hansen et al. 2012). Sampling from diseased trees, forest soils, and water sources has led to the identification of many Phytophthora species that can infect trees (Jung et al. 2017(Jung et al. , 2018. While molecular host-pathogen interactions are intensively researched for a small number of crop pathogenic Phytophthora species, such as Phytophthora infestans (potato late blight) and Phytophthora sojae (soybean root rot), most tree pathogenic species have not received as much attention (Kamoun et al. 2015). Sudden oak death caused by Phytophthora ramorum, and Jarrah dieback caused by Phytophthora cinnamomi, are well-known examples of tree pathogenic species with global impact (Grü nwald et al. 2008;Fisher et al. 2012;Hardham and Blackman 2018). Pathogens of woody host plants are present in most, if not all, clades of the Phytophthora genus. The genomes of many tree pathogenic species have been sequenced in recent years (for example Feau et al. 2016;Studholme et al. 2016Studholme et al. , 2019Vetukuri et al. 2018), providing resources to predict and compare the pathogenicity factors encoded in these genomes.
Among the most commonly detected Phytophthora species in the United Kingdom in diseased plants and soil is Phytophthora pseudosyringae (Clade 3) (Riddell et al. 2019). P. pseudosyringae is also present in many other countries (Linzer et al. 2009;Scanu et al. 2015;Stamler et al. 2016;Hansen et al. 2017;Khaliq et al. 2019). A draft genome sequence was recently reported for P. pseudosyringae (McGowan et al. 2020), which provided the first view of the pathogenicity arsenal in this species. The only other genome sequence from a Clade 3 species is from Phytophthora pluvialis (Studholme et al. 2016). Another globally distributed species is Phytophthora boehmeriae (Clade 10) which not only causes disease on trees such as black wattle (Acacia mearnsii), but also crops such as cotton and chili (Chowdappa et al. 2014;Santos 2016;Wang et al. 2020). Phytophthora Clade 10 contains few species (Yang et al. 2017). While there are multiple genome sequences for different isolates of the tree pathogen P. kernoviae, only one other Clade 10 genome is available (Studholme et al. 2019).
To build molecular biology resources and enable future gene diversity studies in P. pseudosyringae, we generated a further draft genome sequence assembly for this species. To facilitate more robust evolutionary studies into pathogenicity of Clade 10 species in future, we also generated a genome sequence assembly for P. boehmeriae.

Materials and methods
Phytophthora cultures, DNA preparation, and Illumina sequencing Cultures of P. pseudosyringae (SCRP734) and P. boehmeriae (SCRP23) from the James Hutton Institute culture collection were maintained on rye agar. Hyphae for both species were grown without shaking in amended lima bean broth (Bruck et al. 1981) for 72 h at 20 C, harvested by gravity filtration through 70 mm nylon mesh, and immediately frozen in liquid nitrogen until used for DNA extraction. Genomic DNA from each species was extracted from the frozen hyphae using the protocol of Raeder and Broda (1985), followed by RNaseA treatment and DNeasy (Qiagen) column cleanup using the manufacturer's protocol. DNA quality was assessed by absorbance at 260 and 280 nm (Nanodrop), and agarose gel electrophoresis. Genomic DNA was fragmented for sequencing library construction by physical shearing using a M220 ultrasonicator (Covaris) as recommended. Libraries for whole-genome sequencing were prepared using the Illumina TruSeq DNA PCR-Free kit, following the manufacturer's protocol (350 bp insert) and standard indexing. Libraries were sequenced (150 bp, paired-end) on a NextSeq 500 (Illumina) as recommended.

Phytophthora genome assembly and gene prediction
Reads for each species were separately subjected to quality control using trimmomatic (Q15) (Bolger et al. 2014). FASTQC (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc/) (last accessed 16 August 2021) was used to assess read quality pre and post-read trimming. CLC assembler (version 4.10.86742) (Qiagen CLC Genomics Workbench) was used for a first pass assembly. This assembly was subject to Blobtools (version 1.0) (Laetsch and Blaxter 2017) analysis to identify contaminant contigs. The contigs were DIAMOND-BLASTp (version 0.9.24.125) (Buchfink et al. 2015) searched against the NCBI NR database and taxonomically assigned using Blobtools. For final assembly, both genomes were processed through multiple rounds of read filtering/assembly and Blobtools-contamination identification. The final assembly was made with CLC using the contamination filtered reads (as described in Thorpe et al. 2018), and was passed as trusted contigs for SPAdes (version 3.13.0) (Bankevich et al. 2012). The resulting SPAdes assembly was screened through Blobplots once more. The final contig assembly was then subjected to scaffolding using SSPACE (version 3.0) (Boetzer et al. 2011). To reduce false-positive scaffolding, dereplicated reads are needed; the Q15 or greater quality reads already prepared using trimmomatic were dereplicated using PRINSEQ (prinseq-lite-0.20.4) (Schmieder and Edwards 2011). Assemblies were repeat masked using a de novo set of Phytophthora specific repeats (generated here, as described in Thorpe et al. 2018) and repbase (Bao et al. 2015). These were modeled from multiple Phytophthora genomes. This was then taken further to predict transposons.
The completeness of each Phytophthora genome assembly was assessed by analysis of 234 conserved stramenopile genes, with BUSCO (Seppey et al. 2019) run in long mode.
Ploidy level was estimated by mapping the dereplicated Q15 reads to the final genome using BWA-mem (Li 2013) and assessed using nQuire (Weiß et al. 2018). nQuire uses several statistical approaches, for example, correlating the SNP distribution with the expected distribution of various known ploidy distributions. The resulting statistical output was then interrogated to determine the most statistically significant ploidy level.

Secreted proteins, RXLR effectors, and CAZymes
The catalog of secreted proteins (presence of a signal peptide and absence of a transmembrane domain) for each Phytophthora genome was predicted by SignalP v4.1 (Nielsen 2017) and Phobius (Kä ll et al. 2004).
To identify genes encoding secreted RXLR effectors, we adopted an inclusive strategy. RXLR effector genes were predicted using three strategies, described in Whisson et al. (2007), Win et al. (2007), and Bhattacharjee et al. (2006). The resulting gene lists were then combined to form a final nonredundant catalog of genes encoding candidate RXLR effectors.
Intergenic distances (bp) were calculated for all genes, and separately for the BUSCO and RXLR effector coding genes. Intergenic distances at 5' and 3' ends of genes were then plotted against each other to identify if there was a bias in the intergenic distances for the RXLR effector coding genes.
Candidate secreted crinkle and necrosis (CRN) effectors were predicted using a regular expression search for L[FY]LA[RK] in the predicted proteins from both species sequenced (https://github. com/peterthorpe5/reg_exp_finder/blob/master/crinkler_reg.py) (last accessed 16 August 2021). The positive hits were then aligned using Muscle (Edgar 2004) and inputted to hmmbuild. The resulting Hidden Markov Model was used to interrogate the predicted proteins using hmmsearch (1e À10 cut off), to yield the candidate CRNs. A further regular expression search only for the LXLFLAK motif characteristic of CRN effectors (Schornack et al. 2010) was also performed (https://github.com/test1932/Regexscript-s-) (last accessed 16 August 2021). Returned sequences were assessed by SignalP v5.0 for the presence of a signal peptide.
Additional regular expression searches were performed for proteins containing the PEP13 elicitor (VWNQPVRGFKVXE) (Brunner et al. 2002), the GHRHDWE peptide characteristic of necrosis and ethylene-inducing 1-like (NLP) proteins (Chen et al. 2018), and HXGPCEXXXDD found in a class of candidate secreted effector proteins in P. palmivora (Evangelisti et al. 2017).
Carbohydrate active (CAZy) proteins were predicted through the dbCAN2 web server, using hmmer, DIAMOND-BLAST, and HotPep tools ).
Alien Index calculation to detect candidate horizontal gene transfer events Potential horizontal gene transfer (HGT) events in the evolutionary history of the Phytophthora species sequenced here were identified through calculation of the Alien Index (AI) for each gene (Gladyshev et al. 2008;Flot et al. 2013), as described by Thorpe et al. (2018). Differences to the published method were that DIAMOND-BLASTP results identified as nonstramenopile were considered as candidate HGT events, and the strict AI threshold value for a gene to be considered as an HGT event was 30.

BLAST similarity and dN/dS analysis of RXLR effectors
Predicted RXLR effector protein sequences and predicted proteomes were searched against the NCBI NR database. Percentage identity, length of alignment, and bitscore were recorded and binned for each query sequence. Percentage identity, alignment, and bitscore were plotted against number of sequences in each bin.
For comparison with the RXLR effectors from the species sequenced in this report, genome assemblies (including versions) of 27 Phytophthora species were used to source RXLR effectors for dN/dS calculations (Supplementary Table S1). As a negative control, the set of genes identified from BUSCO analysis for the species assembled here were analyzed.
Analysis of selection on RXLR and BUSCO proteins was carried out by Orthofinder (Emms and Kelly 2019) and separately a Reciprocal Best BLAST Hit (RBBH) clustering network, as described in Thorpe et al. (2016). Briefly, RBBH searches were performed between all species. The results were then clustered using MCL (version 14-137) (Dongen 2000) using inflation value I ¼ 6, which resulted in an RBBH clustering network. dN/dS analysis was performed as described in Thorpe et al. (2016) with slight modification. Briefly, amino acid sequences for each cluster were aligned using Muscle (Edgar 2004) and badseq_remover.pl (https://github.com/dukecomeback/bad-sequence-remover) (last accessed 16 August 2021) was used to remove sequences too divergent from each other. Then the alignment was refined using Muscle. The aligned amino acid sequence was back translated to its original nucleotide sequence, preserving its alignment information. The nucleotide aligned clusters were then filtered for the most informative coding regions, and indels removed using trimAI (no_gaps) (Capella-Gutié rrez et al. 2009). On clusters with three or more members, Codonphyml (Gil et al. 2013) was used to perform dN/dS analysis. Detailed method and scripts can be found at the github page listed in the data availability statement.

Genome assembly
Illumina sequencing yielded 118,669,161 and 168,739,903 raw reads for P. pseudosyringae and P. boehmeriae, respectively. Our 48.9 Mb assembly for P. pseudosyringae is broadly similar in size to that reported for a different isolate by McGowan et al. (2020), but the N50 value of 76 kb for the assembly reported here is threefold greater ( Table 1) Our estimation of heterozygosity in the isolates of the two species sequenced here showed low levels for P. boehmeriae and P. pseudosyringae (0.01 and 0.03%, respectively). Both P. pseudosyringae and P. boehmeriae are homothallic (inbreeding), and the observed low levels of heterozygosity are consistent with this.
Genome sizes estimated from sequence reads were 52.1 and 46.2 Mb for P. pseudosyringae and P. boehmeriae, respectively. These estimates suggest that our assemblies for P. pseudosyringae and P. boehmeriae are largely representative of the genome content. Discrepancy in genome size estimates and assembly sizes may be resolved through use of longer read sequencing technologies and flow cytometry (Cui et al. 2019;Lee et al. 2020;van Poucke et al. 2021). Genome sizes of Phytophthora species, estimated by flow cytometry, were typically larger than the genome assembly sizes from sequencing (McGowan and Fitzpatrick 2017;van Poucke et al. 2021). Flow cytometry estimated a haploid genome size for P. pseudosyringae ranging from 72 to 86 Mb (van Poucke et al. 2021); no data are available for P. boehmeriae.
The repetitive DNA content in Phytophthora genomes can be highly variable, ranging from 74% for P. infestans (Haas et al. 2009) to 15% in P. plurivora (Vetukuri et al. 2018). These repeats encompass a diverse range of DNA sequences, but are primarily made up of mobile elements, either intact or degraded. We estimated the proportion of mobile element repeats in the two genome assemblies reported here at 22.4% (10.9 Mb) for P. pseudosyringae and 13.3% (5.3 Mb) for P. boehmeriae.
Phytophthora species are at least diploid during asexual stages and may have elevated ploidy. We used nQuire to estimate ploidy from sequence data, where the smallest DlogL is accepted as the ploidy level. We found the greatest support for tetraploidy in P. pseudosyringae (DlogL ¼ 1534) and P. boehmeriae (DlogL ¼ 603.8).

Gene prediction, and BUSCO v2 estimation of genome completeness
We predicted 15,624 genes for P. pseudosyringae and 12,121 genes for P. boehmeriae. For P. pseudosyringae, this is a similar gene number to that previously predicted by McGowan et al. (2020). The completeness of the genome assemblies was estimated using a set of 234 conserved stramenopile genes (BUSCO v2; Table 2). This showed a high degree of gene representation for P. pseudosyringae (92.3%) and P. boehmeriae (90.2%). The assemblies for P. pseudosyringae and P. boehmeriae had no duplicated or fragmented BUSCO genes (Table 2).
Gene duplication was also assessed for both species, identifying single copy, dispersed copies, proximal duplications, tandem duplications, and segmental duplications. P. boehmeriae and P. pseudosyringae have similar proportions of single-copy and duplicated genes (Figure 1; Supplementary Table S2).

Pathogenicity genes
Phytophthora species are known to deploy a diverse array of secreted proteins to facilitate infection (Schornack et al. 2009;McGowan and Fitzpatrick 2017). We predicted signal peptides for a total of 1,599 and 1,459 proteins from P. pseudosyringae and P. boehmeriae, respectively.
A major component of the pathogenicity arsenal of Phytophthora species are a large group of secreted proteins that are translocated into plant cells to exert their function. They are characterized by a conserved RXLR peptide motif (arginine-any amino acid-leucine-arginine) within the N-terminal 50 amino acids (Whisson et al. 2007;Wang et al. 2017). We adopted an inclusive strategy for identifying candidate RXLR effector coding genes, incorporating results from three different search strategies. We identified 279 and 380 candidate RXLR effectors from P. pseudosyringae and P. boehmeriae, respectively (Supplementary Table S3). The candidate RXLR effector number for P. pseudosyringae is higher than that determined by McGowan et al. (2020) due to the more inclusive prediction strategy used here. The list of RXLR effector candidates will likely include some false positive predictions, but these can be eliminated using future transcriptome experiments of plant infection; RXLR effectors are typically up-regulated during infection (Haas et al. 2009;Jupe et al. 2013). From our RXLR predictions, we identified 118 and 276 proteins possessing both complete RXLR and downstream EER motifs from P. pseudosyringae and P. boehmeriae, respectively. Both of these motifs have been demonstrated to have a role in translocating effectors into plant cells (Whisson et al. 2007;Dou et al. 2008), and the RXLR motif represents a protease cleavage site (Wawra et al. 2017).
In other Phytophthora genomes, RXLR effectors predominantly reside in gene poor, repeat rich regions, which is reflected in larger intergenic distances (Haas et al. 2009;Vetukuri et al. 2018).
We also determined the 5' and 3' intergenic distances from RXLR effector coding genes from P. pseudosyringae and P. boehmeriae and plotted them against all genes from the genomes, and against the set of genes identified in BUSCO analysis. Only genes for which both 5' and 3' intergenic distances could be calculated were included in plots. This revealed that, similar to other examined Phytophthora species, the intergenic distances for RXLR effector coding genes in these species are greater than those for the core set of genes (Figure 2; Supplementary Table S4).
Using regular expression and HMM searches to predict candidate secreted CRN effectors, we predicted only a single candidate in P. pseudosyringae, and two in P. boehmeriae (Supplementary  Table S5). This was surprising, as most Phytophthora genomes sequenced to date encode numerous candidate CRN effectors (McGowan and Fitzpatrick 2017). Many predicted CRN-family proteins do not possess predicted signal peptides (McGowan and Fitzpatrick 2017) and here we found that none of the three CRN candidates possessed a predicted signal peptide. McGowan et al. (2020) manually curated 90 CRN candidates in P. pseudosyringae, but only 37 were considered to be secreted effector proteins. Our findings suggest that CRN proteins are not major contributors to pathogenicity in the two species sequenced.
Phytophthora species are also known to possess multiple genes encoding proteins that can cause plant cell death, either by cytotoxicity or through elicitation of plant immune responses. Regular expression searches for two families of these proteins revealed four candidate PEP13 elicitor containing proteins in P. pseudosyringae and P. boehmeriae (Supplementary Table S5). For cytotoxic NLP1-like proteins, seven candidates were identified in P. pseudosyringae (six predicted secreted), and 14 in P. boehmeriae (12 secreted) (Supplementary Table S5). Evangelisti et al. (2017) identified a class of 42 predicted secreted proteins from P. palmivora that contained a conserved HXGPCEXXXDD peptide motif. Searches of the predicted proteomes of the two species sequenced here identified 34 proteins containing this motif from P. pseudosyringae (29 predicted secreted), and 42 from P. boehmeriae (36 secreted) (Supplementary Table S5). That the motif is conserved in similar numbers of proteins in these species from different clades of the genus, and a high proportion are predicted to be secreted, suggests that they may be a class of candidate effector proteins important for Phytophthora pathogenicity.
Another significant set of genes involved in pathogenicity encode carbohydrate active (CAZy) proteins, especially those with   potential enzymatic activity for digesting plant cell walls, allowing pathogen ingress. Phytophthora genomes are predicted to encode hundreds of CAZymes, often present as gene families of closely related members (Ospina-Giraldo et al. 2010;Brouwer et al. 2014;McGowan and Fitzpatrick 2017). We used three CAZy prediction tools within dbCAN2 to identify candidate CAZy proteins and included all positive returned sequences as candidates, since Phytophthora proteins may be more divergent than many CAZy proteins represented in databases. We predicted 565 and 503 CAZy proteins for P. pseudosyringae and P. boehmeriae, respectively. Of particular interest are the lytic polysaccharide monoxygenases (within the auxiliary activity grouping), glycoside hydrolases, polysaccharide lyases, and carbohydrate esterases (Supplementary Table S6). The number of proteins predicted for each CAZy family was broadly similar in the two Phytophthora species sequenced, but with some differences that may reflect the pathogenic niche for each species. For example, P. pseudosyringae possesses single secreted lytic cellulose monooxygenase (AA16) and secreted cellulose binding (CBM1) proteins, while P. boehmeriae possesses four of each. Similarly, P. boehmeriae possesses a single secreted glycoside hydrolase family 10 protein, while P. pseudosyringae possesses four.

Evidence for HGT
During their evolution, the oomycetes may have acquired genes from other kingdoms, with genes potentially transferred horizontally from fungi and bacteria to oomycetes (Morris et al. 2009;Richards et al. 2011;McCarthy and Fitzpatrick 2016). We carried out a strict analysis of potential HGT events in the two genomes presented here, using a high AI threshold value (Supplementary  Table S7). Our approach identified four potential HGT events in P. pseudosyringae, all but one from eukaryotic sources. These encompassed proteins with similarity to a TPR/SEL1 repeat protein (PHYPSEUDO_011742), a glycoside hydrolase family 63 (PHYPSEUDO_013022), an alpha/beta hydrolase (PHYPSEU DO_013053), and an M54 peptidase (PHYPSEUDO_014249). A single virus (Catovirus) to oomycete HGT event was identified for P. boehmeriae (mRNA capping enzyme; PHYBOEH_005573). Whether these candidate HGTs contribute significantly to pathogenicity remains to be determined experimentally.

Analysis of selection on RXLR effectors and BUSCOs
The RXLR effectors encoded by Phytophthora genomes are considered to be evolving at a greater rate than core ortholog proteins (Win et al. 2007). This may be reflected in characteristics such as lower levels of sequence identity with orthologous proteins and shorter length of sequence similarity. When the RXLR effector complements and total predicted proteomes from both genomes in this report were searched against the NCBI NR database, the RXLR effectors exhibited lower levels of sequence identity, lower alignment lengths, and lower bitscores (Figure 3; Supplementary Figure S1). The total predicted proteome showed the opposite trend to the RXLR effectors, with higher levels of sequence identity ( Figure 3; Supplementary Figure S1). The more rapid rate of evolution of RXLR effectors has been reflected in a dN/dS score greater than 1.0 (Win et al. 2007). We evaluated dN/dS scores for the two genome assemblies reported here, using both Orthofinder and a reciprocal best blast hit (RBBH) clustering network, the latter being a stricter method. For comparison, we also performed this analysis on the BUSCO orthologs identified when analyzing gene representation in the two genomes. Using Orthofinder, only one gene in the BUSCO gene set had a dN/dS value greater than 1, signifying positive selection (PHYPSEUDO_008773; PITH domain protein; dN/dS ¼ 1.6). Using RBBH, no genes in the BUSCO set showed evidence for positive selection. Using Orthofinder and all RXLR coding genes from both genomes, we found 238 RXLR coding genes were present in clusters with at least two other genes, of which 54 genes showed evidence of positive selection (dN/dS > 1.0) (Supplementary Table S8). RXLR effectors from P. pseudosyringae were most highly represented (44) among those exhibiting positive selection, with over four-fold fewer from P. boehmeriae (10). The RBBH strategy revealed only three RXLR genes with evidence for positive selection, two from P. pseudosyringae (PHYPSEUDO_006359 and PHYPSEUDO_007604) and one from P. boehmeriae (PHYBOE H_007729). In other Phytophthora species, the number of candidate RXLR effectors exhibiting a signature of positive selection has ranged from as few as one in P. plurivora (Vetukuri et al. 2018) to greater than 20 in P. ramorum (Win et al. 2007). The effectors with Figure 2 Plots of 5' against 3' intergenic distances (log 10 ) for genes from P. pseudosyringae and P. boehmeriae. Richness of gene density for intergenic distances is represented by color scale ranging from blue (low) to red (high). Genes encoding RXLR effector proteins are shown as black triangles; BUSCO genes are shown as yellow dots. Only genes for which both 5' and 3' intergenic distances could be calculated are shown.
the greatest level of positive selection are candidates for further studies into their function in disease development.

Conclusions
Genomes from two Phytophthora species that can infect trees have been sequenced and assembled. Our assembly for P. pseudosyringae is the most complete. Less fragmented genome assemblies may be achieved in the future by using longer read sequencing. The P. pseudosyringae strain used here is the second for this species to be sequenced and will begin to provide insights into gene diversity in this globally prevalent Phytophthora species. The genome sequence for P. boehmeriae will add to the genome sequencing coverage in Clade 10, which includes the important tree pathogen P. kernoviae, and provide a resource for evolutionary studies within this clade and the genus. The candidate pathogenicity proteins identified here will provide a basis for further experimental research, such as transcriptomic analyses and effector function assays, to gain a deeper understanding of Phytophthora pathogenesis on trees.
Supplementary material is available at G3 online.