The NLR-Annotator tool enables annotation of the intracellular immune receptor repertoire.

Disease resistance genes encoding nucleotide-binding and leucine-rich repeat (NLR) intracellular immune receptor proteins detect pathogens by the presence of pathogen effectors. Plant genomes typically contain hundreds of NLR-encoding genes. The availability of the hexaploid wheat (Triticum aestivum) cultivar Chinese Spring reference genome allows a detailed study of its NLR complement. However, low NLR expression and high intra-family sequence homology hinder their accurate annotation. Here we developed NLR-Annotator, a software tool for in silico NLR identification independent of transcript support. Although developed for wheat, we demonstrate the universal applicability of NLR-Annotator across diverse plant taxa. We applied our tool to wheat and combined it with a transcript-validated subset of genes from the reference gene annotation to characterize the structure, phylogeny and expression profile of the NLR gene family. We detected 3,400 full-length NLR loci, of which 1,560 were confirmed as expressed genes with intact open reading frames. NLRs with integrated domains mostly group in specific subclades. Members of another subclade are predominantly located in close physical proximity to NLRs carrying integrated domains, suggesting a paired helper function. Most NLRs (88%) display low basal expression (in the lowest 10th percentile of transcripts). In young leaves subjected to biotic stress we found upregulation of 266 of the NLRs. To illustrate the utility of our tool for the positional cloning of resistance genes, we estimated the number of NLR genes within the intervals of mapped rust resistance genes. Our study will support the identification of functional resistance genes in wheat to accelerate the breeding and engineering of disease-resistant varieties.

Secondly, we developed NLR-Annotator, a tool for de novo genome annotation of loci associated with NLRs. By applying NLR-Annotator and our expression data to the genome of the wheat reference cultivar Chinese Spring, we found 3,400 loci that may be functional NLR genes or pseudogenes. We show here that wheat NLRs predominantly occur towards the telomeres and in close proximity to each other, display pronounced copy number and sequence variation between the A, B and D sub-genomes (yet maintain conservation of intron-exon structure), contain clade-specific integration of exogenous domains, and that NLR expression is modulated by development and biotic stress.

Motif-based annotation of NLR loci in whole genome assemblies
We set out to develop a method for annotation of NLRs independent of gene calling. The recently published pipeline NLR-Parser uses combinations of short motifs of 15 to 50 amino acids to classify a sequence as NLR-related. These motifs were defined by Jupe et al. (2012) based on manual curation of a training set of known NLR sequences and mainly resemble the sub-structures of NLR protein domains also described in other studies (Supplemental Figure S1; Jupe et al., 2012). Since these motifs may occur randomly in a genome, NLR-Parser searches for combinations of doublets or triplets of motifs that often occur in the same order. The drawback of NLR-Parser, however, is that it can only classify a sequence; it cannot distinguish the border between two NLRs within the same sequence. In the extreme case, a whole chromosome with multiple NLRs would be classified as a single complete NLR. NLR-Parser thus needs pre-defined gene models, or another method to delimit the borders it searches within.

Here, we present a new tool, NLR-Annotator, as an extension of NLR-Parser to annotate NLR loci in genomic sequence data. In this study, we define the term 'NLR locus' as a section of genomic sequence associated with a single NLR, i.e. one NB-ARC (nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4) domain potentially followed by one or more leucine-rich repeats (LRRs). An NLR locus may be a gene with an intact open reading frame but may also be the trace of a sequence pseudogenized in the observed genome. Our pipeline dissects genomic sequences into overlapping fragments, and then uses NLR-Parser to pre-select those fragments potentially harbouring NLR loci. In this step, the nucleotide sequence of each fragment is translated in all six frames to search for motifs. ... motifs that are associated with the NB-ARC domain (Supplemental Figure S1). This is used as a seed, which is then elongated into the pre-NB region and the LRRs by searching for additional motifs associated with those regions.
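The first two operations described above, dissecting a genomic sequence into overlapping fragments and translating each fragment in all six reading frames, can be illustrated with a minimal Python sketch. This is not the NLR-Annotator implementation (which is written in Java); the function names are hypothetical, the default fragment length and overlap (20 kb and 5 kb) are the values stated in the Methods, and the handling of ambiguous codons as 'X' is an assumption made for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fragments(sequence, length=20_000, overlap=5_000):
    """Yield (start, fragment) pairs covering the sequence with overlapping windows."""
    step = length - overlap
    start = 0
    while True:
        yield start, sequence[start:start + length]
        if start + length >= len(sequence):
            break
        start += step


def translate(dna):
    """Translate one reading frame; codons with non-ACGT bases become 'X' (assumption)."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    toy = "ATGGCC"
    for frame, protein in six_frame_translation(toy).items():
        print(frame, protein)
```

Each translated frame would then be searched for motifs; keeping the fragment start coordinate alongside the fragment is what allows motif positions to be mapped back onto the original sequence.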

We tested NLR-Annotator on the Arabidopsis thaliana genome sequence using Arabidopsis gene models (TAIR10, http://www.arabidopsis.org) as a gold standard. Because NLR-Annotator identifies NLR loci, which can be active genes or pseudogenes, the Arabidopsis gold standard annotation is not perfect for benchmarking, but it is nonetheless a good option. Independently of NLR-Annotator, we classified the 27,416 protein sequences from TAIR10 using Hidden Markov Models from Pfam domains. We found 166 protein sequences with an NB-ARC domain. Using NLR-Annotator, we found 171 loci in the TAIR10 genome assembly. Of the 166 annotated NLR genes, only eight did not overlap with one of our loci. We manually investigated those eight protein sequences and found that four were activated disease resistance (ADR) 1 or ADR1-like. We had already reported that NLR-Parser does not detect the ADR1 family genes. Similar to ADR1, the remaining four proteins were not detected because for at least one motif, the similarity to the consensus motif was below the default threshold. Of the 171 loci called by NLR-Annotator, 162 overlapped with our gold standard gene models. In four of these cases the protein contained two NB-ARC domains, resulting in two separate loci annotated by NLR-Annotator. We aligned the remaining nine NLR loci, which had no overlap with an NB-ARC-containing protein, to NCBI non-redundant proteins using BlastX (https://blast.ncbi.nlm.nih.gov). In all cases we found homology to NLR proteins from other species, which suggests that although those loci were not called as genes in the Arabidopsis reference, NLR-Annotator accurately identified sequences associated with NLRs. Having found 158 of the 166 NLR genes in TAIR10, we conclude a sensitivity (ratio of identified NLR genes to all NLR genes) of 95% for NLR-Annotator. Every one of the loci found with NLR-Annotator was validated to be associated with NLRs, supporting a specificity (ratio of correctly identified NLRs to all identified NLRs) of 100%.
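The benchmark figures can be re-derived from the counts given in the text. The short Python sketch below only restates that arithmetic; the variable names are chosen for illustration, and "specificity" follows the definition used here (correctly identified NLRs over all identified NLRs).

```python
# Counts taken from the Arabidopsis (TAIR10) benchmark described in the text.
nlr_genes_total = 166          # proteins with an NB-ARC domain (Pfam HMMs)
nlr_genes_missed = 8           # genes not overlapping any NLR-Annotator locus
nlr_genes_found = nlr_genes_total - nlr_genes_missed

loci_called = 171              # loci reported by NLR-Annotator
loci_overlapping_genes = 162   # loci overlapping gold-standard gene models
double_nb_arc_proteins = 4     # proteins split into two loci each
loci_without_gene = loci_called - loci_overlapping_genes  # confirmed by BlastX homology
loci_validated = loci_overlapping_genes + loci_without_gene

# 162 overlapping loci minus 4 extra loci from double NB-ARC proteins gives 158 genes.
assert loci_overlapping_genes - double_nb_arc_proteins == nlr_genes_found

sensitivity = nlr_genes_found / nlr_genes_total
specificity = loci_validated / loci_called

print(f"sensitivity: {sensitivity:.1%}")
print(f"specificity: {specificity:.1%}")
```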

Physical position of NLR loci
With the availability of pseudo-chromosomes, which represent 94% of the predicted wheat genome, most NLR loci can now be placed in the physical context of their location on the chromosome. The foremost practical implication of this is to take advantage of available mapping data and other resources to accelerate the identification of functional disease resistance genes. Annotated NLR loci were found to preferentially locate at the telomeres of each chromosome (Figure 2) and cluster together. More than 400 of the 3,400 observed NLR loci are within 5 kb of another NLR locus. Half of all loci are within 50 kb of another locus (Figure 2). The NLR loci are distributed over all chromosomes and the number of NLR loci per chromosome ranges between 29 (chromosome 4D) and 280 (chromosome 4A). No clear preference of NLR numbers towards one homeologue within each group could be observed (Supplemental Table S3).

To demonstrate the potential practical advantage of our resource, i.e. annotated NLR loci with a physical position on pseudo-chromosomes, we searched the literature for leaf rust and stem rust resistance genes that have been genetically mapped in wheat but not yet cloned. Most genes have been mapped in accessions other than the reference accession Chinese Spring and most resistance genes may not have a functional allele in Chinese Spring. Nevertheless, we hypothesized that in many cases, a non-functional allele or close homologue may be present, the sequence of which could then be used to speed up the cloning of the functional allele from the resistant accession.
To explore the suitability of this approach, we searched the Chinese Spring genome for homologues of a set of cloned R genes, namely the stem rust resistance genes Sr22, Sr33, Sr35, Sr45, and Sr50, the powdery mildew resistance genes Pm2, Pm3, Pm8, the leaf rust resistance genes Lr1, Lr10 and Lr22a, and the yellow rust resistance genes Yr5, Yr7 and YrSP (Supplemental Table S1). In eight cases (Sr22, Sr33, Sr45, Pm2, Pm3, Lr1, Yr5, Yr7), the best alignment of the protein sequence to Chinese Spring was on the same chromosome from which the gene was cloned. For Sr35 and YrSP, the second best alignment was on the right chromosome and the best alignment was on a homeologous chromosome. For Pm8, which was introgressed from rye (Secale cereale), and Lr22a, the second best alignment was on the right chromosome whereas the best alignment was on a scaffold yet to be assigned to a chromosome. Sr50 was cloned from rye as well (Mago et al., 2015) but characterized to be an orthologue of Mla, on group 1 chromosomes. As expected, the best three alignments for Sr50 were therefore with chromosomes 1A, 1D and 1B. For Lr10, we did not find an alignment on 1A, but on the homeologous chromosomes. However, this was expected since Chinese Spring was previously characterized to be a deletion haplotype of Lr10 (Isidore et al., 2005).

Next, we aligned sequences of flanking markers from 39 mapped, but not yet cloned Sr and Lr genes to Chinese Spring, defined the physical interval and counted the number of NLR-associated loci within the interval. The loci can be used for candidate gene approaches or further delimiting of map intervals in positional cloning projects. In 32 cases, we could determine a physical interval, either by using the exact flanking markers or by using chromosome ends or the centromere as surrogate, to determine a number of candidate NLRs. We found between 0 and 61 NLRs as candidates (average: 10) (Supplemental Table S4).

We expected a substantial number of our independently predicted NLR loci to be overlapping with genes from the automated gene annotation v1.1 of the Chinese Spring reference sequence (IWGSC RefSeq v1.0). Of the 3,400 loci we predicted by NLR-Annotator, 2,955 overlap with genes annotated in RefSeq v1.1. In total, 578 NLR loci defined by NLR-Annotator correspond to more than one gene in RefSeq v1.1. We looked at these 578 loci in more detail. In 70 cases, we found gaps (stretches of unassigned nucleotides, Ns) in the assembly potentially interfering with transcript mapping and thus hampering gene calling and giving rise to two falsely called genes. In 116 cases, we found one of the gene models resembling a complete NLR gene, i.e. the P-loop, at least three consecutive NB-ARC motifs and at least one LRR motif, indicating a potential overextension of the NLR locus, usually brought about by a randomly occurring LRR motif shortly downstream of the gene (Supplemental Table S5). In 29 of 30 random examples (Supplemental Table S4) of the remaining 405 cases, we observed a stop codon in the coding sequence interrupting the open reading frame in the transcript.

We believe that these cases with stop codons represent alleles of NLR genes that have recently been pseudogenized but which are still transcribed. An example supporting this theory is the Pm2 gene conferring resistance to Blumeria graminis, the causal agent of powdery mildew. The Pm2 gene was cloned from the wheat cultivar Ulka (Sanchez-Martin et al., 2016) where it encodes a full-length NLR. The allele in Chinese Spring has a stretch of 12 nucleotides replaced by five other nucleotides, thus causing a frame shift leading to an early stop codon, but is nonetheless still transcribed (Supplemental Figure S2).

A set of validated NLR gene models in Chinese Spring
For the subsequent analyses which involve gene models, we proceeded with gene models from the IWGSC gene annotation v1.1. To select appropriate transcripts that encode complete NLRs, we used NLR-Parser to classify sequences and selected those containing a P-loop (Meyers et al., 1999), at least three consecutive motifs associated with the NB-ARC domain (motifs 1, 6, 4, 5, 10, 3, 12, 2) and at least one LRR-associated motif (motifs 9, 11 or 19). Of 4,983 transcripts (Supplemental Table S5) that either overlapped with an NLR locus or were classified as "NLR-associated" by NLR-Parser, we identified 1,823 transcripts (corresponding to 1,560 genes) that encoded complete NLRs. In addition, we used ADR1 (AT1G33560) to search for homologues in the gene models. The four candidate ADR1 transcripts (TraesCS5D02G081800.1, TraesCS5B02G075900.1, TraesCS5A02G069600.1, TraesCS5A02G069600.2) were added. Our final list of 1,827 confirmed NLR transcripts (Supplemental Table S6) is most likely an underestimate of the total NLR gene content. Manual curation would likely be required to obtain a more precise estimate of the total true NLR complement.
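The selection rule for complete NLRs can be written as a small filter over the ordered list of motif identifiers found in a transcript. The Python sketch below is one possible reading of that rule, not the NLR-Parser code: it assumes that the P-loop corresponds to motif 1, and it interprets "three consecutive motifs" as three adjacent hits that are also adjacent in the stated NB-ARC motif order. Function names are hypothetical.

```python
NB_ARC_ORDER = [1, 6, 4, 5, 10, 3, 12, 2]   # motif order given in the text
LRR_MOTIFS = {9, 11, 19}                     # LRR-associated motifs given in the text
P_LOOP_MOTIF = 1                             # assumption: motif 1 is the P-loop


def has_consecutive_nb_arc_run(motifs, run_length=3):
    """True if `motifs` contains `run_length` adjacent hits that are adjacent in NB_ARC_ORDER."""
    position = {m: i for i, m in enumerate(NB_ARC_ORDER)}
    run = 0
    previous = None  # position of the previous hit in NB_ARC_ORDER, if it was an NB-ARC motif
    for m in motifs:
        if m in position and previous is not None and position[m] == previous + 1:
            run += 1
        elif m in position:
            run = 1
        else:
            run = 0
        previous = position.get(m)
        if run >= run_length:
            return True
    return False


def is_complete_nlr(motifs):
    """Apply the three criteria: P-loop, NB-ARC run of three, at least one LRR motif."""
    return (
        P_LOOP_MOTIF in motifs
        and has_consecutive_nb_arc_run(motifs)
        and any(m in LRR_MOTIFS for m in motifs)
    )


if __name__ == "__main__":
    print(is_complete_nlr([1, 6, 4, 5, 10, 9, 11]))  # P-loop, NB-ARC run and LRR motifs present
    print(is_complete_nlr([6, 4, 5, 9]))             # no P-loop
```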

Phylogenetic analysis of wheat NLR genes
NLRs constitute a large gene family and only for a very few individual genes has a function been assigned. A phylogenetic analysis provides a means to order this large number of genes with respect to their sequence relationship and arrange them into sub-families. We set out to establish this order and look for common features in various clades of the NLR gene family, which may hint at common functional attributes.
We extracted the NB-ARC domains from NLR protein sequences and calculated their phylogenetic relationships. For this analysis, we used the 1,827 NLR transcripts we identified in the IWGSC v1.1 gene annotation rather than loci annotated by NLR-Annotator. The first reason is that an intron frequently occurs within the NB-ARC domain, which complicates identification of the correct, complete open reading frame and its subsequent translation without the support of transcript data. Secondly, in our subsequent analysis below we intended to investigate integrated domains, which is not possible without gene models. Figure 3 shows the NLR phylogeny of Chinese Spring with clades highlighted that are discussed subsequently in this manuscript. We also added known resistance genes to the tree. An interactive version of Figure ...

For reasons mentioned above we used gene models for our phylogenetic analysis. However, we also tested whether information from NLR-Annotator can be used to construct phylogenetic trees, as a function for genomes lacking a gene annotation. To this end we used only the concatenated amino acid motifs within the NB-ARC domain that were identified by NLR-Annotator. The resulting tree (Supplemental Figure S3) is similar to the tree based on whole NB-ARC protein sequences (Figure 3). For example, clades with specific features that we identified (see sections below) were preserved in this tree based only on the concatenated NB-ARC motifs. To highlight the similarity between the two trees, we colour-coded NLR loci in the concatenated NB-ARC domain tree corresponding to genes from several clades with distinct features. Apart from six outliers, all loci which clustered in the full-length NB-ARC domain tree (Figure 3) also clustered in the concatenated NB-ARC motif tree (Supplemental Figure S3). All six outliers originate from proteins with two complete NLRs, which are fused. With NLR-Annotator those were separated as two loci and the outliers show the phylogenetic position of the secondary NB-ARC domain in the protein.

We screened the NLR protein sequences for integrated domains. In total, we found that 129 of the 1,560 protein sequences carry integrated domains. The most prevalent domains were protein kinase (41 cases), DDE acidic triad-transposase (DDE-Tnp4) (26 cases), zinc finger (zf)-BED (named after the Drosophila proteins BEAF and DREF) (10 cases), Jacalin (8 cases) and Motile Sperm (7 cases) (Supplemental Table S7).

We then associated integrated domains with the phylogeny of the NB-ARC domains of all NLRs. The clades with accumulated integrated domains are shown in

Tandem NLRs
NLRs occurring in tandem can function together. They can be arranged in a head-to-head formation in which the genes are on opposite strands and the distance between gene starts is shorter than the distance between gene ends (Narusaka et al., 2009). We searched for this type of tandem NLR genes in the reference wheat genome. We found 52 NLR pairs with a position as described above and a maximum distance of 20 kb (Supplemental Figure S5, Supplemental Table S8). Twenty-five pairs had one mate located in the red "Helper" clade. Fifteen of those pairs have the other mate located in Clade ID5, which accumulates various integrated domains.
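The head-to-head criterion can be made concrete with a short Python sketch. It is an illustration only, not the Java code used for the study: the `Gene` class and function names are hypothetical, and applying the 20 kb limit to the distance between gene starts is an assumption, since the text does not say which distance the limit refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"

    @property
    def gene_start(self):
        """Coordinate of the 5' end of the gene, taking the strand into account."""
        return self.start if self.strand == "+" else self.end

    @property
    def gene_end(self):
        """Coordinate of the 3' end of the gene, taking the strand into account."""
        return self.end if self.strand == "+" else self.start


def is_head_to_head(a, b, max_distance=20_000):
    """Opposite strands, gene starts closer than gene ends, starts within max_distance."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    start_distance = abs(a.gene_start - b.gene_start)
    end_distance = abs(a.gene_end - b.gene_end)
    return start_distance < end_distance and start_distance < max_distance


if __name__ == "__main__":
    left = Gene("nlr_a", "chr1A", 1_000, 4_000, "-")
    right = Gene("nlr_b", "chr1A", 6_000, 9_000, "+")
    print(is_head_to_head(left, right))
```

In the example, the two gene starts lie 2 kb apart and the gene ends 8 kb apart, so the pair is divergently oriented; swapping the strands turns it into a tail-to-tail pair, which the function rejects.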

Intron-Exon structure
We investigated introns in NLR genes and observed a large diversity in terms of length and number of introns. The average intron length is 1,197 bp, with introns extending up to 24,586 bp, while the average intron number is 2 and ranges between 0 and 23 (Supplemental Figure S6). Another interesting feature is the position of

NLR Expression
In a previous study we observed that in wheat the NLRs encoded by the stem rust resistance genes Sr22 and Sr45 were expressed at low levels in leaves of seedlings (~30/~60 Gb sequencing resulted in <20x/<15x coverage of Sr22/Sr45, respectively). To test whether this observation of low expression could be extended broadly to NLRs in wheat, we sequenced the transcriptomes of young leaves, generating more than 305 Gb of data in three samples. We also enriched the same samples for NLR genes (cDNA R gene enrichment sequencing, RenSeq) (Andolfo et al., 2014) to detect lowly expressed NLR genes. We considered expression of a gene detectable if more reads mapped to it than necessary to cover its entire length 5-fold. In the combined data sets, we found 1,074 NLR genes to be detectably transcribed. In the un-enriched transcriptome data, 859 genes (80%) could be detected. Scaling these numbers down to smaller data volumes (Figure 4), we estimate that generating 50 Gb of wheat transcriptome data would allow for 55% of NLRs expressed in young leaves to be detected.
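The detectability criterion and the down-scaling estimate can be sketched in a few lines of Python. This is a simplified illustration rather than the analysis code of the study: the read length of 150 and the 5-fold threshold are taken from the Methods, whereas the function names and the toy read counts are hypothetical.

```python
READ_LENGTH = 150        # read length stated in the Methods
MIN_FOLD_COVERAGE = 5    # a transcript must be covered more than 5-fold


def is_detectable(read_count, transcript_length,
                  read_length=READ_LENGTH, min_fold=MIN_FOLD_COVERAGE):
    """True if the combined length of mapped reads exceeds min_fold times the transcript length."""
    return read_count * read_length > min_fold * transcript_length


def fraction_detected(transcripts, scale=1.0):
    """Fraction of (read_count, transcript_length) pairs that stay detectable
    when read counts are reduced proportionally by `scale`."""
    transcripts = list(transcripts)
    detected = sum(is_detectable(count * scale, length) for count, length in transcripts)
    return detected / len(transcripts)


if __name__ == "__main__":
    # Toy data: (combined read count, transcript length in bp); values are invented.
    toy = [(101, 3_000), (1_000, 3_000), (40, 2_400), (5_000, 3_600)]
    print(fraction_detected(toy))                    # full data set
    print(fraction_detected(toy, scale=50 / 305))    # scaled to 50 Gb of 305 Gb
```

For a 3,000 bp transcript, more than 100 reads of 150 bp are needed, so proportional scaling of read counts quickly removes weakly expressed transcripts from the detectable set.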

Stress-induced expression of NLRs
There are more than a thousand NLRs in a wheat genome, representing 1-2% of the total gene coding capacity (Consortium, 2018). However, the immunity conferred by NLRs comes at a metabolic cost and over-activation of immune responses may give rise to stunted growth (Yi and Richards, 2009; Choulet et al., 2014). Therefore, we assume that NLR expression would be tightly regulated.

As an example, we checked whether NLR expression changes after biotic stress. ... hours after challenge. Three (non-NLR) genes that had previously been reported to be PAMP responsive showed differential expression by reverse transcription quantitative PCR (RT-qPCR), confirming a responsiveness to PAMP treatment (Supplemental Figure S8).

Within our set of 628 NLR genes for which we could obtain expression data (File S1), nearly half were differentially expressed. In total, 266 genes (257 in the case of chitin and 194 in the case of flg22) were found to be expressed at least two-fold more strongly than before treatment. Moreover, the expression of 243 (234 in the case of chitin and 166 in the case of flg22) of those genes was reduced again to resting state levels at 180 minutes after treatment. We visualized this general tendency by plotting expression values of NLR genes in ternary plots where the dot size and colour additionally depict the maximum expression found at any time point (Figure 5).

Most NLR genes that showed differential expression after chitin treatment also showed differential expression after treatment with flagellin (Supplemental Figure S9a). In particular, when we relaxed the threshold of "2-fold overexpression", we noted that treatment with either PAMP caused upregulation of the same genes (observed in 255 cases out of 266 differentially expressed genes, Supplemental Figure S9b).

The upregulated NLR genes were not associated with specific phylogenetic clades.
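The two-fold criterion and the transient response pattern described above can be illustrated with a minimal Python sketch. The inputs are assumed to be log2 fold changes relative to the untreated time point at 30 and 180 minutes; treating "resting state" as falling back below the two-fold threshold is an assumption made for this example, and the function and label names are hypothetical.

```python
import math

TWO_FOLD = math.log2(2.0)  # a two-fold change equals 1.0 on the log2 scale


def classify(log2fc_30, log2fc_180, threshold=TWO_FOLD):
    """Classify a gene by its log2 fold change at 30 and 180 minutes after treatment."""
    if log2fc_30 >= threshold and log2fc_180 < threshold:
        return "transient"            # induced at 30 min, back below threshold at 180 min
    if log2fc_30 >= threshold or log2fc_180 >= threshold:
        return "sustained_or_late"    # still (or only) induced at 180 min
    return "not_induced"


# Proportions implied by the counts reported in the text.
induced_fraction = 266 / 628      # induced genes among genes with expression data
returned_fraction = 243 / 266     # induced genes back at resting level after 180 min

if __name__ == "__main__":
    print(classify(1.5, 0.2))
    print(f"{induced_fraction:.0%} induced, {returned_fraction:.0%} of those transient")
```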

Discussion
In this study we present the tool NLR-Annotator for de novo annotation of NLR loci. Our tool is distinguished from standard gene annotation in that it does not require transcript support to identify NLR-associated loci. NLR-Annotator is therefore a powerful tool for the identification of potential R genes that could be used in breeding for resistance to wheat pathogens. Very often, however, the functional copy of an R gene is not present in the accession that was chosen for a reference genome. The positive selection imposed on NLRs often results in extensive accessional sequence and copy number variation at NLR loci (Noel et al., 1999; Kuang et al., 2004; Chavan et al., 2015). Thus, the corresponding NLR homologue in the wheat reference accession, Chinese Spring, would not be considered by a standard gene annotation. NLR-Annotator, on the other hand, is not limited to functional genes. In our analysis of Chinese Spring, we found 3,400 loci while only 1,560 complete NLRs could be confirmed through gene annotation in IWGSC RefSeq v1.1. Within the additional loci identified by NLR-Annotator there might be some NLRs with potential for function. However, we speculate that most of these are non-functional pseudogenes. Nevertheless, it is important to define these loci in the reference accession Chinese Spring because these genes may have functional alleles in other accessions. A case in point is Pm2 (from the cultivar Ulka), which in Chinese Spring has an out-of-frame insertion/deletion leading to an early stop codon. ...

To improve the annotation further would require manual annotation locus-by-locus, in each case taking into account existing gene annotations, mapped transcript data from different sources, and our NLR-Annotator loci. All these data would have to be integrated and inspected to provide bona fide gene models. Given the diversity of NLR complements in different wheat accessions, the amount of work required for this effort is considerable. For practical applications such as R gene cloning, our automated NLR locus discovery will, in most cases, likely be sufficient to identify a candidate gene, which can be annotated ad hoc in the relevant accessions. For studies requiring annotated genes we have to settle for a subset of the actual NLR complement.

We set out to investigate the sequence diversity relationships within the NLR gene family in Chinese Spring. It has been well documented that in NLRs the NB-ARC domain is far more conserved than the LRRs. Therefore, phylogenetic studies compare the NB-ARC domains to explore and depict the sequence relationship ... In brief, we found that the new assembly supports the observations made in the study by Bailey and colleagues. One additional feature that we discovered is a clade of NLRs which contain elongated NB-ARC domains (Clade "NB-ARC", Figure 3). The elongated part of the NB-ARC domain precedes the canonical NB-ARC domain and consists predominantly of motifs 6, 4 and 5, often preceded by motif 1 (see Supplemental Figure S1 and Supplemental Figure S7). We do not know the function of this elongated NB-ARC domain. However, a member of this clade is RGA2, which has previously been described to be essential for Lr10 function (Loutre et al., 2009).

To expand the previous phylogenetic analysis to include the entire set of all 3,400 NLR loci, we developed a new feature in NLR-Annotator. By reducing the NB-ARC domains to only the even more conserved motifs (Supplemental Figure S1), we can exclude introns from the phylogenetic analyses, and can also exploit the a priori knowledge of the position of the motifs within the NB-ARC domain to avoid an external multiple alignment. This then permits using the alignment from NLR-Annotator directly as an input to compute a phylogenetic tree. The resulting phylogeny (Supplemental Figure S3) was validated by visualising leaves that correspond to genes from specific clades in the phylogeny derived from gene models (Figure 3). The phylogenetic relationship of genes was maintained for corresponding loci in the tree generated from NLR-Annotator. One exception, however, was found in the Mla region, where a locus was associated with a neighboring sub-clade rather than the other loci of the Mla region.

Finally, with the IWGSC RefSeq v1.0 gene annotation v1.1, we performed NLR expression profiling. We found that many NLRs are expressed at low levels, requiring either extremely deep sequencing or target enrichment to be detected. We also looked at NLR induction in response to PAMPs and found that a subset of NLRs is triggered following PAMP exposure. The transcriptional response was measurable within 30 minutes and had returned to base level after 3 hours. This response shows the tight regulation of the NLR immune system and may reflect the fine tuning between the cost of defence and resource allocation for growth and reproduction.

Conclusions
In this study, we introduce NLR-Annotator to detect NLRs in a genome sequence. The power and novelty of this tool is that it is completely independent of a preceding gene annotation, thus enabling rapid ab initio interrogation of the NLR complement of a genome without reliance on gene expression or other limitations of gene prediction pipelines. We then used NLR-Annotator to catalogue NLRs in the reference sequence of wheat cv. Chinese Spring, and extended our analysis to comprehensively characterise this complex multigene family. We predict this tool will also facilitate the cloning of functional R genes from plant genomes by positioning NLR loci within mapped intervals, as demonstrated here for leaf and stem rust, two major diseases of wheat.

Methods

NLR-Annotator
The NLR-Annotator pipeline is divided into three steps: (1) dissection of genomic input sequence into overlapping fragments; (2) NLR-Parser, which creates an xml-based interface file; (3) NLR-Annotator, which uses the xml file as input, annotates NLR loci and generates output files based on coordinates and orientation of the initial input genomic sequence. All three programs are implemented in Java 1.5. Source code, executable jar files and further documentation have been published on GitHub (https://github.com/steuernb/NLR-Annotator). The version of NLR-Parser used here has been published previously along with the MutRenSeq pipeline (Steuernagel et al., 2016) and uses the MEME suite (Bailey and Gribskov, 1998). All genomes annotated in this publication were dissected into fragments of 20 kb length overlapping by 5 kb. Manual investigation of questionable Arabidopsis thaliana protein sequences was performed using SMART (Letunic et al., 2015).

Coding sequences of cloned R genes were aligned to pseudo-chromosomes of Chinese Spring using BLASTn (Zhang et al., 2000) and default parameters. To align marker sequences extracted from the literature to the pseudo-chromosomes we used BLASTn and the parameter -task blastn to account for short input sequences. In uncertain cases, matching positions were validated using DOTTER (Sonnhammer and Durbin, 1995).

Downloaded from www.plantphysiol.org on May 6, 2020. Copyright © 2020 American Society of Plant Biologists. All rights reserved.

Comparison of NLR loci to gene annotation
Comparison of NLR loci (generated by NLR-Annotator) with gene models from IWGSC gene annotation v1.1 was done using the Java program CalculateNLRGenes.java (https://github.com/steuernb/wheat_nlr). Overlaps were calculated using the position of gene transcripts. An overlap was only considered if both the locus and the transcript were on the same strand.
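The strand-aware overlap test described above can be sketched as follows. This Python example is not a port of CalculateNLRGenes.java; the `Feature` tuple and function names are hypothetical, coordinates are assumed to be closed intervals, and deriving a gene identifier by dropping the suffix after the last dot of a transcript identifier is an assumption based on the identifier style seen in this study (e.g. TraesCS5A02G069600.1).

```python
from collections import namedtuple

Feature = namedtuple("Feature", "name chrom start end strand")


def overlaps(locus, transcript):
    """True if both features share chromosome and strand and their intervals intersect."""
    return (
        locus.chrom == transcript.chrom
        and locus.strand == transcript.strand
        and locus.start <= transcript.end
        and transcript.start <= locus.end
    )


def genes_per_locus(loci, transcripts):
    """Map each locus name to the sorted gene IDs of same-strand overlapping transcripts."""
    result = {}
    for locus in loci:
        genes = {t.name.rsplit(".", 1)[0] for t in transcripts if overlaps(locus, t)}
        result[locus.name] = sorted(genes)
    return result


if __name__ == "__main__":
    loci = [Feature("locus_1", "chr5A", 100, 5_000, "+")]
    transcripts = [
        Feature("geneA.1", "chr5A", 200, 900, "+"),
        Feature("geneA.2", "chr5A", 200, 1_200, "+"),
        Feature("geneB.1", "chr5A", 3_000, 4_000, "+"),
        Feature("geneC.1", "chr5A", 3_000, 4_000, "-"),  # opposite strand, ignored
    ]
    print(genes_per_locus(loci, transcripts))
```

A locus that maps to more than one gene identifier in this way corresponds to the kind of case examined in the Results, where a single NLR locus spans several annotated genes.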

Phylogenetic analysis
Protein sequences from NLR genes as well as from known R genes were screened for the NB-ARC domain using HMMER v. 3.1b1 (Eddy, 2011) and PFAM-A v. 27.0 (http://pfam.xfam.org/). The command line call was "hmmscan --domtblout outputfile Pfam-A.hmm inputfile". Intervals within each sequence were defined based on the presence of motifs 1, 6, 4, 5, 10, 3, 12, 2. An interval has to start with motif 1; other motifs may be absent, but if present, their order must not change. For each protein sequence the largest interval including 20 flanking amino acids was used as the NB-ARC domain. A multiple alignment of NB-ARC domain sequences was generated using clustal-omega (Sievers et al., 2011). A phylogenetic tree was generated using FastTree (Price et al., 2010). The tree was visualized using iTOL ...

General feature format (GFF) files of both high- and low-confidence genes from the IWGSC RefSeq annotation v1.1 were screened for pairs of NLR genes that were on reverse complementary strands, at a distance of less than 20 kb and in a head-to-head relation, i.e. the distance between gene starts is shorter than the distance between gene ends. Java source code was deposited on GitHub

Elicitation with PAMPs
The protocol was modified from Schoonbeek et al. Chinese Spring wheat plants were grown for 3 weeks in a growth cabinet under a 16:8 hours day:night regime at 23:18°C. For each biological replicate, three strips (2 cm) were cut from leaves 2 and 3, placed in a 2 ml tube with sterile water and vacuum-infiltrated three times for 1 minute. The following day water was removed and replaced by fresh water or PAMPs dissolved in water at 1 g/l for chitin (Nacosy, Yaizu Suisankagaku Industry Co., Japan) or 500 nM for flg22 (www.peptron.com). Samples were drained and flash frozen in liquid nitrogen after 30 or 180 min prior to pulverisation with two stainless steel balls in a Geno/Grinder (SPEX). RNA was extracted using the RNeasy plant kit (www.Qiagen.com), the concentration was determined on a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific) and quality was assessed with an RNA 6000 Nano chip on a Bioanalyzer 2100 (Agilent Technologies). After removal of genomic DNA with the TURBO DNA-free Kit (Thermo Fisher Scientific), 1 µg of RNA was converted to cDNA with SuperScript IV (Thermo Fisher Scientific) and expression of PAMP-inducible genes (Schoonbeek et al., 2015) was verified by quantitative PCR (Supplemental Figure S8) ... replicates is expressed as log2 relative to the expression level at t=0, after infiltration and overnight incubation but before addition of fresh water or PAMP solutions. Expression was normalised to EF-1α (Elongation factor 1-alpha, M90077; with primers ATGATTCCCACCAAGCCCAT and ACACCAACAGCCACAGTTTGC

Expression data analysis
The transcript analysis pipeline (calculation of read counts and TPM (transcripts per million)) is based on Kallisto (Bray et al., 2016), and is shared with the global wheat study by Ramírez-González et al. (2018). Complete transcript lists were filtered for NLR genes as defined above. To estimate the detectability of NLR transcripts, we combined Kallisto read counts from 3-leaf stage replicates. We considered a transcript to be expressed if in total RNA or in RenSeq cDNA the combined length of mapped reads (i.e. a read length of 150 multiplied by the sum of counts from all replicates) exceeded 5 times the length of the transcript itself. Testing the same criterion using only read counts based on 305 Gb of total RNA input data, we found 80% of NLRs. We then gradually reduced Kallisto read counts in proportion to a reduced amount of input data to estimate the percentage of NLRs detected with less input data.

For the differential gene expression analysis, we pre-processed count values obtained with Kallisto using Degust (DOI: 10.5281/zenodo.3258932), loading only genes that had a read count of more than 10 in at least one sample. Data were normalized using the Voom/Limma method.

For ternary plots, data was pre-processed with the Java program

Availability of data and materials
The datasets generated and/or analysed during the current study are available in the EBI short read archive (SRA) under study numbers PRJEB23081 and PRJEB23056.

Competing interests
The authors declare that they have no competing interests.

... Health BB/P012574/1), the 2Blades Foundation, the Betty and Gordon Moore Foundation, and the Gatsby Foundation.

Acknowledgements
We thank the IWGSC for early access to the RefSeq v1.0 of Chinese Spring, our colleagues Yajuan Yue and JIC Horticultural Services for plant husbandry, and the NBI Computing Infrastructure for Science (CiS) group for HPC maintenance. We thank David Swarbreck and Gemy Kaithakottil for technical support with Web Apollo. We thank Tobin Florio (www.flozbox.com/Science.illustrated) for the artwork in

... that encode NLR-associated motifs as well as integrated domains are displayed. Color coding of motifs is consistent with Figure S1.