A High-Quality Grapevine Downy Mildew Genome Assembly Reveals Rapidly Evolving and Lineage-Specific Putative Host Adaptation Genes

Abstract Downy mildews are obligate biotrophic oomycete pathogens that cause devastating plant diseases on economically important crops. Plasmopara viticola is the causal agent of grapevine downy mildew, a major disease in vineyards worldwide. We sequenced the genome of Pl. viticola with PacBio long reads and obtained a new 92.94 Mb assembly with high contiguity (359 scaffolds for a N50 of 706.5 kb) due to a better resolution of repeat regions. This assembly presented a high level of gene completeness, recovering 1,592 genes encoding secreted proteins involved in plant–pathogen interactions. Plasmopara viticola had a two-speed genome architecture, with secreted protein-encoding genes preferentially located in gene-sparse, repeat-rich regions and evolving rapidly, as indicated by pairwise dN/dS values. We also used short reads to assemble the genome of Plasmopara muralis, a closely related species infecting grape ivy (Parthenocissus tricuspidata). The lineage-specific proteins identified by comparative genomics analysis included a large proportion of RxLR cytoplasmic effectors and, more generally, genes with high dN/dS values. We identified 270 candidate genes under positive selection, including several genes encoding transporters and components of the RNA machinery potentially involved in host specialization. Finally, the Pl. viticola genome assembly generated here will allow the development of robust population genomics approaches for investigating the mechanisms involved in adaptation to biotic and abiotic selective pressures in this species.

Introduction biotrophy (Thines and Kamoun 2010), a parasitic relationship in which the pathogen extracts nutrients from the living tissues of its host. An understanding of the virulence mechanisms of oomycete pathogens and of the ways in which they adapt to their hosts is the key to developing sustainable control strategies. Several oomycete genome sequences have already been published, for species from the genus Saprolegnia (Jiang et al. 2013), white rusts (Kemen et al. 2011;McMullan et al. 2015), the genus Pythium (L evesque et al. 2010;Adhikari et al. 2013), the genus Phytophthora (Tyler et al. 2006;Haas et al. 2009;Lamour et al. 2012;Liu et al. 2016), and downy mildews (Baxter et al. 2010;Derevnina et al. 2015;Sharma et al. 2015;Kobayashi et al. 2017;Fletcher et al. 2018). These abundant genomic data for oomycetes have provided opportunities to improve our understanding of the evolutionary processes linked to pathogenic lifestyles. For example, it has made it possible to identify genes lost following the adaptation to obligate biotrophy (Baxter et al. 2010;Kemen et al. 2011) and novel effector families involved in the manipulation of plant defenses (Bozkurt et al. 2012), and has given rise to the "two-speed genome" concept for filamentous plant pathogens (Dong et al. 2015).
Plasmopara viticola, an obligate biotrophic and heterothallic (i.e., sexual reproduction requires individuals with different mating types) diploid oomycete, is the causal agent of downy mildew on grapevine (species of the Vitaceae family). It is endemic to North America and was introduced into Europe in the late 1870s (Fontaine et al. 2013). This pathogen is now found on cultivated grapevine (Vitis vinifera) worldwide, and downy mildew is considered to be one of the most important diseases of grapevine. Plasmopara viticola has recently been shown to form a complex of host-specialized species on wild Vitis in North America (Rouxel et al. 2013(Rouxel et al. , 2014. The development of microsatellite markers for Pl. viticola over the last decade has led to improvements in our understanding of the genetic diversity, epidemiology (Gobbin et al. 2005(Gobbin et al. , 2006, and invasion history of this species in Europe (Fontaine et al. 2013). Plasmopara viticola has a strong adaptive potential, as demonstrated by the rapid emergence of fungicide resistances in this species (reviewed in Delmas et al. 2017) and of the recent erosion of plant partial resistances (Delmas et al. 2016). Resources for population genetics studies based on genomewide diversity, and notably a highly contiguous reference genome assembly, are required to decipher the processes involved in this rapid adaptation, but such tools are currently lacking.
Upon infection, oomycetes secrete effector proteins, which modify host metabolism and defenses for the benefit of the pathogen. Effectors are described as apoplastic when they act in the extracellular space of the plant, and as cytoplasmic when translocated into the plant cell. RxLR effectors are the most widely studied class of cytoplasmic effectors (Bozkurt et al. 2012). Candidate apoplastic and cytoplasmic effectors have been described for Pl. viticola (Yin et al. 2015;Mestre et al. 2012Mestre et al. , 2016Xiang et al. 2017) and several genes encoding RxLR effectors have recently been functionally characterized (Xiang et al. 2016(Xiang et al. , 2017Liu et al. 2018). However, the role of these proteins in infection and adaptation to the host remains unclear. Genomic resources for Pl. viticola are required to improve our understanding of the genetic mechanisms underlying plant-pathogen interactions in this pathosystem and their coevolution. Recent years have seen the publication of various transcriptomes (Yin et al. 2015;Mestre et al. 2016) and draft genome sequences based on short-read sequencing (Dussert et al. 2016;Yin et al. 2017;Brilli et al. 2018). Unfortunately, these assemblies have low completeness scores and remain fragmented.
We produced a high-quality assembly of the Pl. viticola genome with PacBio long-read technology, which we used to characterize more extensively the repertoire of pathogenicity-related genes in this species. We also used short reads to assemble the first genome for Plasmopara muralis, a phylogenetically close species that infects members of the Vitaceae family such as Parthenocissus quinquefolia and Parthenocissus tricuspidata. We used these sequences and available oomycete genome sequences for a comparative genomics analysis, which identified gene families absent from all biotrophic oomycete species. Finally, we performed several tests to identify genes potentially under positive selection and involved in the adaptation of Pl. viticola and Pl. muralis to their hosts.

Preparation of the Material and DNA Extraction
Sporangia from the sequenced Pl. viticola isolate (INRA-PV221) were previously collected for the assembly of a draft genome sequence (Dussert et al. 2016). This isolate belongs to clade B (Pl. viticola f. sp. aestivalis), as defined by Rouxel et al. (2013). Detached grapevine leaves from V. vinifera cv. Cabernet-Sauvignon plants were disinfected with 65% calcium hypochlorite (50 g/l), washed with sterilized water and infected with fresh sporangia (the inoculum also contained 5 g/l ampicillin, 1 g/l ticarcillin-clavulanate, and 1 g/l boscalid). The plants were placed in a growth chamber for 5-7 days at 22 C (12 h light/12 h dark photoperiod). We then collected sporangiophores and sporangia in TE buffer (10 mM Tris-HCl, 1 mM EDTA). The suspension was centrifuged (20,000Â g, 4 C), the supernatant was removed and the pellet was stored at À80 C. Genomic DNA was extracted with a CTAB procedure, according to the protocol of Cheeseman et al. (2014), but with an additional ethanol precipitation step. The extracted DNA was quantified with a Qubit fluorometer (Invitrogen), and sequenced with a PacBio RS II (Pacific Biosciences) sequencer using P6-C4 chemistry (GeT-PlaGe facility, Toulouse, France).
Plasmopara muralis infects Parthenocissus quinquefolia in North America (Rouxel et al. 2014) and is also found on grape ivy (Parthenocissus tricuspidata) in Europe (Thines 2011). Sporangiophores and sporangia were collected from Pl. muralis isolate INRA-PM001 growing on the leaves of a Parthenocissus tricuspidata plant in Villenave d'Ornon, France. Genomic DNA was extracted with a standard CTAB protocol, as described by Delmotte et al. (2006), and sent to Beckman Coulter Genomics (Grenoble, France) for sequencing on an Illumina HiSeq 2000 sequencer (2Â100 bp pairedend reads).

RNA Extraction and Sequencing
Leaves of V. vinifera cv. Muscat Ottonel were inoculated with Pl. viticola isolate INRA-PV221, as described by Mestre et al. (2012). Infection was checked 24, 48, and 72 h postinoculation (hpi), by observation under a microscope, and leaves for these three time points were snap-frozen in liquid nitrogen and stored at À80 C. For each time point, three biological repetitions, with different batches of plants and inoculum, were performed. Total RNA was extracted by CTAB/ phenol-chloroform extraction followed by LiCl precipitation (Zeng and Yang 2002). Residual genomic DNA was removed by DNase treatment with the Ambion Turbo DNA-free kit. RNA quality was checked with an Agilent BioAnalyzer. RNAseq libraries were constructed from 4 mg of total RNA with the Illumina TruSeq kit. Sequencing was performed at the GeT-PlaGe facility (Toulouse, France) on an Illumina HiSeq 2000 sequencer (2Â100 bp paired-end reads) for samples at 24 and 48 hpi and on an Illumina HiSeq 3000 sequencer (2Â150 bp) for samples at 72 hpi, producing a mean of 38 million pairedend reads per sample.

Genome Assemblies
The PBcR wgs8.3rc2 assembly pipeline (Berlin et al. 2015) was used for read correction and assembly of the corrected reads. Corrected reads were also assembled with FALCON (Chin et al. 2016). Despite parameter tuning, the preliminary version of FALCON-Unzip (downloaded in April 2016) was unable to generate haplotypes and the FALCON assemblies were less comprehensive than our best PBcR assembly. However, the PBcR assembly was about twice the expected size of the genome due to the high degree of heterozygosity. We eliminated redundancy in the assembly with an ad hoc procedure, and used the FALCON assembly to scaffold the PBcR contigs. The assembly was then polished with Pilon 1.20 (Walker et al. 2014), for which outputs were modified using custom scripts. Details of the assembly, redundancy removal, and polishing procedures are available in the supplementary note, Supplementary Material online.
The Pl. muralis genome was assembled with SPAdes 3.9.0 (Bankevich et al. 2012), using a custom procedure to discard contaminants (supplementary note, Supplementary Material online). Only scaffolds of more than 1 kb in length were retained. The median read coverage for the final assembly was 120Â.

Repeat Annotation
Repeated sequences in the two genomes were annotated with REPET 2.2 (Quesneville et al. 2005;Flutre et al. 2011): De novo libraries of repeated sequences were built with the TEdenovo pipeline (default parameters except minNbSeqPerGroup: 5 and no structural search), and genomes were annotated with the TEannot pipeline (default parameters, with the procedure recommended on the software website). Repeated elements from the Repbase20.05 database were also aligned to genome sequences during this step, but were used only to softmask sequences for gene prediction and not for other analyses.

Gene Prediction
For Pl. viticola, RNA-seq reads were aligned against the genome with STAR 2.4 (Dobin et al. 2013) then assembled de novo into transcripts with Trinity release 20140717 (Grabherr et al. 2011), using a genome-guided approach. Genes were predicted with BRAKER 1.8 (Hoff et al. 2016), with the aligned RNA-seq reads, and MAKER 2.31 (Holt and Yandell 2011), with the assembled transcripts and proteomes of nine other oomycete species as evidence. For Pl. muralis, for which no RNA data were available, genes were predicted with MAKER, using gene predictor software trained with the Pl. viticola proteome. Details are available in the supplementary note, Supplementary Material online.

Quality Control
The quality of genome assemblies and gene annotations was assessed with BUSCO 2.0 (Simão et al. 2015), using the Alveolata-Stramenopiles data set. Comparison of the k-mer content between the self-corrected PacBio reads and the resulting Pl. viticola genome assembly was carried out using KAT (Mapleson et al. 2016) with a k-mer size of 21. The Pl. viticola long-read assembly was aligned against the previously published draft assembly for the same isolate (Dussert et al. 2016) with the SynMap tool ) from CoGe  with default parameters, and the results were visualized with Circos (Krzywinski et al. 2009). We also used QUAST 2.3 (Gurevich et al. 2013) to compare the annotated PacBio assembly (used as a reference) with the previous draft.

Protein Functional Annotation and Secretome Characterization
Functional annotation of the proteome and gene ontology (GO) term mapping were performed with Blast2Go PRO 4.1.5 (Gotz et al. 2008), using results from BLASTPþ (-evalue 1e À5 -max_hsp 20 -num_alignment 20) with the NCBI nr database and from InterProScan. For the identification of putatively secreted proteins, we first discarded proteins predicted by TargetP 1.1 (Emanuelsson et al. 2000) to have a mitochondrial targeting peptide (RC value between 1 and 3). The remaining proteins were analyzed with TMHMM 2.0 (Krogh et al. 2001), retaining only those with no transmembrane domain or a single transmembrane domain in the first 70 amino acids. Finally, SignalP 4 was used to detect signal peptides in the remaining proteins (Petersen et al. 2011).
One of the most widely studied families of secreted effectors in Phytophthora species and downy mildews has a conserved N-terminal RxLR (Arg-X-Leu-Arg) domain, often followed by a dEER (Asp-Glu-Glu-Arg) motif. We searched for candidate effector RxLR proteins in each secretome, using the Galaxy workflow of Cock et al. (2013): Genes identified as putative effectors by the method described by Win et al. (2007), based on regular expression searches, and either of the two methods described by Whisson et al. (2007), based on regular expressions and a HMM profile, were retained. These candidates were used as queries in high-stringency (evalue 1e À30 ) BLASTP searches against the respective Plasmopara proteomes. The sequences of the proteins identified as BLAST hits were aligned, and those with clear RxLR and/or dEER motifs at the expected positions were retained. In parallel, secretomes were used as queries in a low-stringency (-evalue 1e À5 ) BLASTP search against a database of oomycete RxLR effectors, and BLAST hits were considered as candidates. Finally, candidate effector genes were subjected to a structural homology analysis with Phyre2 (Kelley et al. 2015). Proteins with high-confidence (100%) structural homology to proteins with known functions but lacking RxLR and/or dEER motifs at the expected positions were considered to be false-positives and were discarded. The WY-motif, a conserved structural fold, is often found in RxLR proteins (Boutemy et al. 2011). We searched for this motif in secreted proteins with HMMER 3.1b2 (available at hmmer.org; last accessed December 19, 2017), using the HMM profile from Boutemy et al. (2011), with the same domain score cutoff of 0. For proteins with WY motifs not detected as putative RxLR effectors, amino acid sequences were aligned with MAFFT 7 (Katoh and Standley 2013), and sequence logos from these alignments were generated using WebLogo 2.8.2 (Crooks et al. 2004).
The secretomes of the two species were tested for enrichment in GO terms for the "biological process" and "molecular function" ontologies with the topGO package (Alexa and Rahnenfuhrer 2016) in R (R Core Team 2017). We used Fisher's exact test with the "elim" algorithm, which takes into account local dependencies between GO terms (Alexa et al. 2006). With this algorithm, tests for different GO terms are not independent, and a classical false discovery rate (FDR) approach (Storey and Tibshirani 2003) is not applicable. We followed the procedure of Daub et al. (2013), using modified R code from this paper, to calculate the empirical FDR with 200 permutations of the gene label (secreted/not secreted), and we reported GO terms with an FDR <10%.

Genome Characteristics and Architecture
The size of coding sequences, coding exons and introns, and the number of exons were determined with GenomeTools (Gremme et al. 2013). The RNA-seq read count for Pl. viticola genes in each library was calculated with htseq-count from HTseq 0.6.1p1 (Anders et al. 2015), and read counts per million (CPM) were computed with R. A gene was considered to be supported by RNA data if its CPM value was at 1 or more in at least one biological replicate. Depending on the library, only 0.03-4.04% of reads were mapped onto the genome, so we decided not to perform differential gene expression analysis with these data.
For Pl. viticola, Illumina paired-end reads from a previous genome draft (and used for sequence correction, see supplementary note, Supplementary Material online) were aligned against the new assembly with BWA (Li and Durbin 2009). Heterozygous sites were then detected with VARSCAN 2.3 (Koboldt et al. 2012) (-min-coverage 20 -min-reads2 10min-avg-qual 15 -min-var-freq 0.2 -P-value 0.01). Sites with a read coverage more than 1.5 times the mean genome coverage were discarded. Using BEDtools 2.25 (Quinlan and Hall 2010), we determined the proportion of heterozygous sites, gene density, repeat coverage and GC content for genomic windows of 100 kb, and visualized the results with Circos. This analysis was not carried out with Pl. muralis, because the genome assembly was too fragmented.
The genome architecture of the two species was characterized by assessing local gene density (Saunders et al. 2014). For each gene, the 5 0 and 3 0 intergenic distances were calculated, and the gene count for 2D distance bins was plotted as a heatmap in R. Genes located at the scaffold extremities were not taken into account for this analysis. The distance between genes and their closest repeat sequence (not allowing for overlaps) was calculated with BEDtools. Additionally, distances to the closest gene and to the closest repeat sequence were compared between gene categories (i.e., secreted and nonsecreted, core, and specific) using a Wilcoxon rank-sum test in R. We investigated the possible relationship between intergenic distance and the orientation of adjacent gene pairs, by classifying adjacent gene pairs as oriented head-to-head (HH, adjacent pairs oriented !), tail-to-tail (TT, ! ), or tail-to-head (TH, ! ! or ). The observed proportions of gene pairs in the different orientations were compared with the proportions expected by chance (0.25 for HH and TT, 0.50 for TH) using a v 2 test performed in R.

Comparative Genomics with Other Oomycetes
Gene synteny between the genomes of Pl. viticola and Pl. muralis, Pl. halstedii, Ph. infestans, and Py. ultimum was assessed with SynMap (merging of syntenic blocks with Quota Align Merge, syntenic depth 1:1) from CoGe, and visualized with Circos. Orthology relationships for the genes of Pl. viticola, Pl. muralis, and 19 other oomycetes from the orders Peronosporales, Pythiales, Albuginales, and Saprolegniales and including five other obligate biotrophic species (supplementary table 1, Supplementary Material online) were determined with OrthoFinder 1.1.2 (Emms and Kelly 2015). The results were visualized with the R package UpSetR (Conway et al. 2017) for various species groups. GO enrichment analyses were carried out for genes absent from all biotrophs as described above, using the GO annotation for nine species available from the UniProt database (supplementary table 1, Supplementary Material online). The Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation of the Ph. infestans proteome, available through the KEGG API, was also used to predict the function of orthologs missing from all biotrophs and the biological pathways they are involved into.
The amino acid sequences of single-copy genes detected in all oomycetes in the orthology analysis were aligned with MAFFT 7. Alignments were cleaned with GBlocks (Castresana 2000) with default parameters and concatenated, discarding protein alignments extending over less than 50 positions. The concatenated alignment was used to infer a phylogeny for the species considered with PhyML 3 (Guindon et al. 2010), available from http://www.atgc-montpellier.fr/phyml-sms (last accessed Septembre 15, 2017), with automatic selection of the substitution model with SMS (Lefort et al. 2017) and 100 bootstraps.

Detection of Positive Selection
To detect genes with a high substitution rate in Pl. viticola and Pl. muralis, we computed pairwise dN/dS values between pairs of orthologs from the two species. Pairs of orthologs were identified by the reciprocal best hits (RBH) method with BLASTPþ (-evalue 1e À6 -qcov_hsp_perc 50, min. identity 40%) and using the Smith-Waterman algorithm, as suggested by Moreno-Hagelsieb and Latimer (2008). The nucleotide sequences of RBH pairs were aligned with MUSCLE 3.8.31 (Edgar 2004), using protein alignments as guides, and poorly aligned regions were removed with GBlocks, trimming only entire codons (Àt ¼ c Àb3 ¼ 6 Àb4 ¼ 9). Alignments of less than 120 bp in length were discarded, and pairwise dN/dS values were calculated with codeml (runmode ¼À2) in PAML 4.8 (Yang 2007). Pairs of orthologs with dS values greater than 3 and dN/dS values of 99 were discarded (final data set: 9,053 pairs of orthologs). Mean pairwise dN/dS values for different gene categories (e.g., nonsecreted and secreted genes, or core and specific genes) were compared in t tests, assuming unequal variance, in R.
The pairwise dN/dS approach makes it possible to analyze large numbers of pairs of orthologs and to include genes specific to Pl. viticola and Pl. muralis, but has some drawbacks. First, focusing on gene pairs with a pairwise dN/dS >1 as a signature of positive selection is not a very sensitive approach, because pairwise dN/dS values provide an average estimate of selective pressure over the whole gene. Indeed, whereas most sites in most proteins are probably subject to strong evolutionary constraints, some proteins may have specific regions or sites under positive selection that remain undetected in the pairwise approach because their overall ratio is below 1. In addition, a high pairwise dN/dS value may indicate that the protein evolved rapidly in one or both species. We obtained a more precise assessment of rate changes in the various evolutionary branches and between sites, by carrying out a branch-site test (Zhang et al. 2005) to detect positive selection affecting a subset of sites in a particular lineage. This analysis was performed with data from six species: Pl. viticola, Pl. muralis, Pl. halstedii, Ph. infestans, Ph. nicotianae, and Ph. capsici. Orthology relationships for the genes of the six species were determined with OrthoFinder, retaining only single-copy orthologs (resulting in 4,042 ortholog groups). For each ortholog group, nucleotide sequences were aligned and cleaned as described for the pairwise dN/dS computation, resulting in a final data set of 4,017 ortholog groups. Positive selection was tested separately for the Pl. viticola and Pl. muralis branches. The likelihoods of the null (no selection) and alternative (positive selection) models were calculated with codeml (model ¼ 2, NSsites ¼ 2). For the alternative model, three runs with different starting values for omega (0.5, 1, and 1.5) were performed, and the run with the highest likelihood was retained. The two models were compared in a likelihood-ratio test in R, with the test statistic following a v 2 distribution with 1 degree of freedom. P-values were corrected for multiple testing by the FDR procedure in the R package qvalue (Dabney et al. 2015), and we retained genes with an FDR <10%. GO enrichment analyses for genes with a significant positive selection signal were performed as described above.

Results and Discussion
A New High-Quality Pl. viticola Genome Assembly The PacBio sequencing of Pl. viticola isolate INRA-PV221 produced 2.6 million reads (mean size: 8,072 bp, total data: 21.2 Gb). The genome size of Pl. viticola has been estimated at about 115 Mb (Voglmayr and Greilhuber 1998), so the amount of data obtained corresponded to a mean sequencing depth of 185Â. The use of long reads at high coverage, followed by ad hoc redundancy removal and polishing with short reads resulted in a high-quality pseudo-haploid assembly (table 1), with only 358 scaffolds and a total size of 92.94 Mb. This assembly was more complete (95.7% of expected BUSCO genes, Alveolata-Stramenopiles data set, supplementary table 2, Supplementary Material online) than previous drafts generated with short reads only (Dussert et al. 2016;Brilli et al. 2018) or with a hybrid approach in which long reads were used for scaffolding (Yin et al. 2017). The assembly was also more contiguous, resolving many of the repetitive regions (supplementary fig. 1, Supplementary Material online). Heterozygosity levels were high in the sequenced Pl. viticola isolate, with a mean of 7,952 heterozygous sites/ Mb. The assembly of highly heterozygous diploid species is often problematic. However, comparison of the 21-mer content between the self-corrected PacBio reads and the Pl. viticola assembly (supplementary fig. 2, Supplementary Material online) showed a k-mer spectrum very similar to what is expected in a good haploid assembly, with only a small proportion of the homozygous content found duplicated (blue area in the homozygous peak). The percentage of duplicated genes in our assembly was also lower than that for previous drafts, with only 1.7% of duplicated BUSCO genes. This indicated that our procedure for eliminating alternative haplotypes was efficient. The higher heterozygosity of Pl. viticola than of the homothallic Pl. halstedii (120 heterozygous sites/ Mb; Sharma et al. 2015) was probably due to its heterothallism, which strongly favors outcrossing. Repeat sequences accounted for 37.34% of the genome assembly for Pl. viticola. Local repeat coverage varied considerably, from 0% to 100% in our 100 kb window analysis ( fig. 1). Most (78.5%) of the identified repeat sequences were LTR retrotransposons and their nonautonomous derivatives (supplementary table 3, Supplementary Material online). In total, 15,960 proteincoding genes were predicted for Pl. viticola, 75.6% of which were supported by RNA-seq data. The completeness of the annotation was high, with 97.4% of the expected BUSCO proteins (table 1). Almost half the genes (46.7%) consisted of a single exon (table 1). As observed for repeat coverage, gene density in Pl. viticola varied considerably over the genome, from 0 to 53 genes per 100 kb window ( fig. 1).
The assembly of the Pl. muralis genome with short reads resulted in 1,437 scaffolds, for a total size of 59.54 Mb (table 1). Estimated genome sizes in Plasmopara species range from 90 to 160 Mb (Voglmayr and Greilhuber 1998), so the smaller assembly size suggested that regions of the genome might be missing. However, the percentage of genes expected to be conserved was high in the assembly (95.8% of BUSCO genes), indicating a high level of completeness. Only 10.43% of the assembly consisted of repeat elements, suggesting that most of the repeated regions were not assembled, probably due to the exclusive use of paired-end reads with a short insert size. As in Pl. viticola, most (62.4%) of the identified repeat sequences were retrotransposons of the LTR order and their derivatives (supplementary table 3, Supplementary Material online). We annotated 12,194 protein-coding genes in the genome of Pl. muralis. The completeness of the annotation was again high, with 93.6% of expected BUSCO proteins (table 1). As observed for Pl. viticola, a large proportion of genes (37.7%) in Pl. muralis were composed of single exons.

Gene Losses Linked to Biotrophic Lifestyle in Oomycetes
The Pl. viticola genome displayed a high degree of gene synteny with the genomes of Pl. muralis, Ph. infestans, and Py. ultimum. Depending on the species used for comparison, the mean size of the syntenic blocks in Pl. viticola ranged from 130 to 172 kb, with syntenic regions covering a total of 37.7% to 43.5% of the genome (supplementary table 4, Supplementary Material online). We observed a very low level of synteny between Pl. viticola and Pl. halstedii, which was surprising given the closeness of the phylogenetic relationship between these species: The mean syntenic block size was 38 kb, and syntenic regions accounted for only 10.2% of the genome. Moreover, syntenic blocks were not in the same order in Pl. viticola and Pl. halstedii, whereas their organization was similar in Pl. viticola and the other species (supplementary fig. 3, Supplementary Material online). This suggested that either the genome of Pl. halstedii had undergone extensive genomic rearrangements, or alternatively, could indicate the existence of missassemblies in the Pl. halstedii reference genome. The orthology analysis detected 18,115 ortholog groups. Among these groups, 8,430 included Pl. viticola proteins and 8,158 included Pl. muralis proteins (8,044 ortholog groups included proteins from both species). A total of 2,474 ortholog groups were common to all oomycete species considered ( fig. 2A), including 163 singlecopy orthologs. The phylogeny based on these single-copy orthologs (144 single-copy orthologs after filtering, for a total of 41,428 amino acid positions) suggested that downy mildews belonging to the Peronosporales (Plasmopara genus, Pe. tabacina and Hy. arabidopsidis) were paraphyletic and, consequently, that obligate biotrophy had evolved independently at least twice within this order ( fig. 2B; supplementary fig. 4 Previous comparative studies led to the discovery of genes lost following the adaptation to biotrophy in particular  oomycete species (Baxter et al. 2010;Kemen et al. 2011). We searched for gene families common to or lost in all oomycete biotrophic species. Biotrophic species had no specific ortholog group in common, and only one group of orthologs (containing proteins of unknown function) was common to the biotrophic species of the Peronosporales order (Plasmopara sp., Pe. tabacina, and Hy. arabidopsidis), consistent with the notion that the two lineages of biotrophic downy mildew species evolved independently. We identified 135 groups of orthologs absent exclusively in biotrophic species ( fig. 2A), corresponding to gene losses common to all biotrophs. Although some of these gene losses might be random, they also could be due to convergent adaptive processes. Some of the GO terms for which enrichment was detected in these ortholog groups (supplementary table 5, Supplementary Material online) were totally absent from biotrophic oomycete proteomes. This was the case for proteins with riboflavin (vitamin B2) transport activity. Riboflavin is the precursor of the FAD and FMN cofactors, found in flavoproteins, and a significant depletion of GO terms characterizing flavoproteins ("FAD binding", "FMN binding", "D-amino acid oxidase activity" and "succinate dehydrogenase activity") was also observed in biotrophs. Overall, biotrophic species had smaller numbers of flavoproteins, with only 17 on average, versus 57 in other species (supplementary table 6, Supplementary Material online). Cobalamin (vitamin B12, another cofactor) binding proteins, such as methylmalonyl-CoA mutase and the cobalamindependent methionine synthase, were also totally absent from biotrophic species. Furthermore, one of the groups of orthologs absent from biotrophic species contained proteins involved in cobalamin metabolism. A possible explanation for the absence of these genes is that biotrophic species might not be able to obtain these cofactors or their precursors easily from the host, and they have therefore lost genes encoding proteins requiring these cofactors.
Only 15 of the Ph. infestans orthologs missing in biotrophic species were KEGG-annotated (supplementary table 7, Supplementary Material online), and most of their functions were found in the GO enrichment analysis. Among them were two proteins (PITG_13128 and PITG_21426) from the sterol biosynthesis pathway. These genes are expressed in Ph. infestans, but their actual function is unknown because they do not seem to be involved in sterol modification (Dahlin et al. 2017). The protein from the pyrimidine metabolism pathway (PITG_02105) may be involved in the pyrimidine salvage pathway during the necrotrophic phase (Garc ıa-Bayona et al. 2014), which might explain why it has been lost in biotrophic oomycetes.
The GO terms depleted in biotrophic species also included "proteolysis", which encompassed two ortholog groups including proteins with PAN/Apple domains, several of which had a secretion peptide. These genes may have been lost to facilitate the evasion of host recognition, because PAN/Apple domains, which bind polysaccharides or proteins, can induce plant defenses (Larroque et al. 2012). The proteins annotated with the terms DNA binding/DNA recombination/DNA integration probably constituted two large transposable element families, surprisingly absent from all biotrophic oomycetes. Finally, the missing groups of orthologs included a large family of transmembrane amino acid transporters, two ortholog groups of ubiquitin-protein ligases, one ortholog group of patatin-like phospholipases (a function also found in the Ph. infestans KEGG annotation) and one group of secreted peroxiredoxins. These gene families may be involved in specific metabolic pathways that have been lost in biotrophic oomycetes, but the reasons for this loss remain unclear because the precise functions of these proteins are unknown.

Repertoire of Pathogenicity-Related Proteins in Pl. viticola and Pl. muralis
Plant pathogens secrete proteins to facilitate successful host colonization and growth. These proteins may remain in the apoplastic region or are delivered to the cytoplasm of plant cells. The proteome of Pl. viticola contained 1,592 putatively secreted proteins (9.97% of the proteome), 1,244 of which were supported by RNA-seq data. This corresponded to about 2.5 times as many secreted proteins as were found in Pl. halstedii (631), but was similar to the numbers found in Phytophthora species (see Sharma et al. 2015 for review). The number of secreted proteins in the Pl. muralis proteome (720, i.e., 5.90% of the proteome) was closer to that for Pl. halstedii. Differences in secretome size may be due at least partly to differences in assembly completeness. Indeed, a comparison of our previous draft with the long-read assembly with QUAST revealed that 4,477 genes, including 655 encoding secreted proteins, were missing from the INRA-PV221 short-read assembly. Thus, the PacBio assembly recovered a larger number of genes, including a considerable number of genes encoding secreted proteins. The secretomes of the two species included a large proportion of candidate pathogenicity-related genes (table 2) putatively involved in plant-pathogen interactions. Furthermore, GO enrichment analysis (supplementary tables 8 and 9, Supplementary Material online) showed that these secretomes were enriched in functions linked to plant cell wall modifications (degradation and modification of cellulose and pectin), proteolysis, protease inhibition, and reactive oxygen species metabolism, which are involved in plant defenses.
RxLR proteins, a class of cytoplasmic effectors (i.e., translocated into the plant cell), accounted for a large proportion of the proteins in the secretomes. The secreted proteins included 540 proteins from Pl. viticola and 132 proteins from Pl. muralis (33.9% and 18.3% of the secreted proteins, respectively) identified as candidate RxLR effector genes by at least one method (supplementary table 10, Supplementary Material online). Phyre2 analysis revealed that a large proportion of the candidates identified by BLAST also displayed high-confidence structural homology to known RxLR effectors. The number of putative RxLRs with RNA-seq support in Pl. viticola (484) was also high. There were 343 secreted proteins (including 317 RxLRs) with at least one WY motif in Pl. viticola, and 78 (66 RxLRs) in Pl. muralis. The number of WY domains in a given protein ranged from 1 to 17 (median: 3). The sequences of secreted proteins with WY motifs not annotated as RxLR contained a dEER domain, as shown by their sequence logo (supplementary fig. 5, Supplementary Material online), indicating that they might also act as RxLR-like effectors.
Genes specific to a lineage are often involved in speciesspecific adaptive processes (Khalturin et al. 2009). We therefore searched for genes restricted to Plasmopara species infecting hosts from the Vitaceae. Orthology analysis identified 229 ortholog groups (comprising 691 Pl. viticola genes and 483 Pl. muralis genes, for a total of 1,174 genes) common only to Pl. viticola and Pl. muralis ( fig. 2A). Secreted proteins accounted for a large proportion (32.7%) of these proteins, probably due to their role in pathogen-host interactions. Most of these secreted proteins (61.7%, distributed in 39 ortholog groups) were annotated as RxLR, suggesting that adaptation to hosts from the Vitaceae probably involved diversification of this effector family. Specific groups also included 17 secreted serine proteases (in one group), two glycosyl hydrolases and one necrosis-inducing NPP1-like protein. Another 1,347 genes were specific to Pl. viticola, including 151 secreted proteins, 28 of which were annotated as RxLRs. Plasmopara viticola-specific proteins also included two secreted cysteine proteases, three glycosyl hydrolases, three pectin esterases and one serine protease. There were 1,194 proteins (40 secreted) specific to Pl. muralis, including four RxLRs. Other interesting proteins included two glycosyl hydrolases (one of which was secreted) and one serine protease.
Most of the genes specific to Pl. viticola or Pl. muralis were not functionally annotated, with only 21.9% and 11.6%, respectively, of these genes having at least one mapped GO term.
A Two-Speed Genome Architecture Plasmopara viticola and Pl. muralis displayed the two-speed genome architecture ( fig. 3; supplementary fig. 6, Supplementary Material online) observed in several filamentous plant pathogens (Dong et al. 2015). Core genes common to all oomycetes were preferentially located in gene-dense regions ( fig. 3B), whereas genes encoding secreted proteins in Pl. viticola and Pl. muralis had larger flanking intergenic distances than other genes, indicating that they were preferentially located in gene-sparse regions ( fig. 3C). This was also the case for genes specific to Pl. viticola and Pl. muralis ( fig. 3D). Statistical comparison of distances between genes and their closest neighbor between gene categories (supplementary fig. 7, left panels, Supplementary Material online) confirmed these observations (P < 1e À15 for all Wilcoxon rank-sum tests). Gene-sparse regions coincided with repeatrich regions in Pl. viticola, as the distance between genes and their closest repeat sequence was shorter in these regions . Two major groups were identified when considering all genes or only core genes with short 3 0 intergenic distance: Genes with short (between 80 and 300 bp) or large (between 5 and 20 kb, approximately) 5 0 intergenic distances ( fig. 3A and B). In Pl. viticola, this pattern was explained by an excess of gene pairs in an HH orientation when considering intergenic distances >5 kb (32% of HH gene pairs compared with the expected 25%, P < 1e À15 for the v 2 test comparing the observed proportions to those expected by chance, supplementary table 11, Supplementary Material online), and an excess of gene pairs in a TT orientation for intergenic distances <500 bp (34% of TT gene pairs compared with the expected 25%, P < 1e À15 ). The same results were observed in Pl. muralis (supplementary table 11, Supplementary Material online). Thus, gene pairs with a TT orientation, which had a shared 3 0 intergenic region, were, on average, closer to each other than gene pairs with an HH orientation, which had a shared 5 0 intergenic region. Neighboring genes generally have correlated expression levels in eukaryotic genomes (Michalak 2008), but TT gene pairs are more frequently negatively coregulated in Arabidopsis thaliana (Williams and Bowles 2004) and Saccharomyces cerevisiae (Wang et al. 2014) than adjacent genes in HH and TH orientations. Unfortunately, we had too little RNA-seq data (not enough mapped reads, see the Materials and Methods section) to determine whether this was the case in Pl. viticola. In Ph. infestans, about 10% of adjacent genes have (positively or negatively) correlated expression levels (Roy et al. 2013), but the specific effect of different gene-pair orientations on gene expression has not been investigated in oomycetes.  4A) were significantly higher (P < 10 À15 ) for genes encoding secreted proteins (mean: 0.44) than for other genes (mean: 0.27). For RxLRs, the mean pairwise dN/dS was 0.70 (P < 10 À15 ), indicating an even faster rate of evolution. This pattern was also observed when comparing core genes (mean: 0.20) and genes specific to species infecting hosts from the Vitaceae (mean: 0.79, P < 10 À15 ), possibly due, at least in part, to relaxed selective constraints on lineagespecific genes (Cai and Petrov 2010). Overall, 92 pairs of orthologs had pairwise dN/dS values above 1 (supplementary table 12, Supplementary Material online), 64 of which were found expressed in our RNA-seq data in Pl. viticola. They included 37 genes encoding proteins secreted in at least one of the species, including 19 RxLRs, two serine proteases, a Kazallike protease inhibitor, a cysteine-rich protein, a glycosyl hydrolase, and a mannose-binding lectin identified in a previous study (Mestre et al. 2012). All these genes encoding for secreted proteins were expressed in Pl. viticola except for PVIT_0017527.T1. This gene had no other functional domain, but its ortholog in Pl. muralis, PMUR_0012778.T1, was annotated as an RxLR. Two of the candidate RxLRs had best BLASTP hits in the nr database with previously characterized Pl. viticola genes: PVIT_0013263.T1 displayed 99.6% amino acid identity with RxLR16, which triggers a hypersensitive Local gene density when considering (A) all genes, (B) core genes found in ortholog groups common to all oomycetes, (C) genes encoding secreted proteins and (D) genes found only in Pl. viticola or in both Pl. viticola and Pl. muralis. The number of genes for 2D (5 0 and 3 0 ) intergenic distance bins is represented as a heatmap. (E) Mean distance between genes and their closest repeat element, represented as a heatmap for 2D intergenic distance bins. (F) Orientation and intergenic distance of adjacent gene pairs. The gene count for different intergenic distances has been plotted for adjacent gene pairs with a head-to-head (HH), tail-to-head (TH) and tail-to-tail (TT) orientation. Dashed vertical lines indicate the cutoffs used for the gene orientation analysis. Distances are shown on a log scale for all figures. response in Nicotiana benthamiana (Xiang et al. 2017), and PVIT_0010026.T1 displayed 91.6% identity with RxLR28, which suppresses plant defenses in the same species (Xiang et al. 2016). Genes with dN/dS values above 1 were preferentially located in gene-sparse regions of the genome ( fig. 4B), consistent with the notion that genes in these regions tend to evolve faster and are involved in adaptive processes.

Genes Under Positive Selection in Pl. viticola and Pl. muralis
The branch-site test was used to detect positive selection in Pl. viticola and Pl. muralis. This test had a relatively low power given that data were available for only six species, but it was nevertheless more sensitive than the pairwise dN/dS approach. In total, 270 single-copy orthologs (6.72% of the 4,017 gene families tested) were found to be under positive selection in at least one of the branches tested in branch-site tests (10% FDR threshold). The overall proportion of genes encoding secreted proteins with positive selection signals was low (table 3). For the Pl. viticola branch, 131 significant genes were identified (supplementary table 13, Supplementary Material online), all of which were found expressed in the RNA-seq data except for PVIT_0009280.T1. Only three of the genes were encoding secreted proteins (2.54% of the tested secreted proteins). These proteins included a cysteine protease and two proteins of unknown function. The candidate genes also included a gene encoding a Kazal-like protease inhibitor, although this protein lacked a secretion signal. For the Pl. muralis branch, 157 genes were found to be under selection (supplementary table 14, Supplementary Material online), including nine encoding secreted proteins (5.17% of the secreted proteins tested): An NADPH-hemoprotein reductase, a ubiquitin-specific peptidase, a protein with an Lsm domain (found in proteins involved in mRNA maturation), a glycosyl hydrolase, a protein with a heavy metal-associated domain, three proteins with EGF-like or filamin domains, and one with no identified domain. The only secreted proteins with a positive selection signal in the two branches were the cysteine protease (no secretion peptide detected in Pl. muralis) and one of the proteins of unknown function.
Several genes encoding transporters also presented significant positive selection signals, and some had functions potentially involved in plant-pathogen interactions: For Pl. viticola, a major facilitator superfamily transporter with 1,3b-glucan synthase activity, a multidrug transporter, a copper ion transporter and a folate-biopterin transporter, and for Pl. muralis, an amino acid transporter and a phospholipidtranslocating ATPase. These transporters may play an important role in interactions between Pl. viticola and Pl. muralis and their hosts, notably for the acquisition of specific nutrients. For example, a sucrose transporter plays a major role not only in carbon uptake, but also in the avoidance of plant defenses in the biotrophic corn smut fungus Ustilago maydis (Wahl et al. 2010). These transporters may also be involved in resistance to plant defense compounds or fungicides (Del Sorbo et al. 2000). Finally, there is at least one example of a secreted effector protein resulting from the alternative splicing of a gene coding for a transporter in the obligate oomycete pathogen Pseudoperonospora cubensis (Savory et al. 2012).
Enrichment in functional terms was found only in the GO enrichment analysis for the Pl. viticola branch. The terms for which enrichment was detected were "snoRNA binding" (q ¼ 0.014) and "nuclear localization binding" (q ¼ 0.074). Small nucleolar RNAs (snoRNAs) are involved principally not only in ribosomal RNA modifications, but also in RNA interference (Scott and Ono 2011), and two of the three proteins with the "nuclear localization binding" term were transporters of ribosomal proteins. In addition, both tested branches included multiple candidate genes with functions linked to RNA modification or processing (table 3). Genes under selection in the Pl. viticola branch included two orthologs of Ph. infestans genes encoding chromodomain proteins potentially involved in gene silencing (PITG_07902 and PITG_15837; Vetukuri et al. 2011). The other genes were linked to RNA metabolism, mRNA maturation and processing, or rRNA and tRNA modification. The genes with a selection signal in both branches included an snoRNA binding protein and a DEAD/ DEAH RNA helicase. Small RNAs play a major role in plantpathogen interactions, being involved in the regulation of pathogen virulence by the host (host-induced gene silencing) and the manipulation of host immunity by the pathogen (reviewed in Weiberg and Jin 2015), potentially accounting for the positive selection signal for some components of the RNA machinery. A recent study by Brilli et al. (2018) demonstrated the presence of RNA interference machinery genes in Pl. viticola and the existence of a bidirectional exchange of small RNAs between Pl. viticola and its host V. vinifera, confirming the importance of small RNAs in this pathosystem.
Our list of candidate genes could guide the discovery of new effectors and the prioritization of functional validation analyses. This is particularly important for obligate biotrophic species, for which functional studies are difficult and currently restricted to heterologous expression systems. Genome editing is currently being developed in the hemi-biotrophic Phytophthora genus, the degree of success differing between species (van den Hoogen and Govers 2018), and may be even more challenging in biotrophic downy mildews.

Conclusions
This study presents a genome assembly of Pl. viticola, the grapevine downy mildew pathogen, generated exclusively using PacBio long reads at high coverage. This sequencing strategy yielded a better assembly of repetitive regions and, consequently, improved the detection of genes encoding secreted proteins potentially involved in pathogenicity and preferentially located in these regions. We also assembled the genome of Pl. muralis with short reads, the first step in the comparative genomics study of multiple Plasmopara species infecting hosts from the Vitaceae (Rouxel et al. 2014(Rouxel et al. , 2013, which will shed light on the genetic mechanisms underlying host specialization in this species complex. The different approaches used highlighted the importance of cytoplasmic RxLR effectors for the adaptation of Pl. viticola and Pl. muralis to their host, as already shown for other pathogens in the order Peronosporales (Bozkurt et al. 2012). Our genome-wide approach detected positive selection signals for other candidate genes, including several genes encoding transporters and genes linked to the RNA machinery.
The availability of a high-quality reference assembly for the Pl. viticola genome also paves the way for robust population genetic studies based on genome-wide diversity (i.e., by resequencing individuals), such as population genome scans to detect genes under selection and genome-wide association studies to link genotype and phenotype variations. This work will improve our understanding of the genetic basis of the rapid adaptation of Pl. viticola to abiotic (e.g., fungicides) and biotic (e.g., partially resistant cultivars) factors.