Genome-Wide Comparison of Nucleotide-Binding Site-Leucine-Rich Repeat-Encoding Genes in Arabidopsis 1[W][OA]

Plants, like animals, use several lines of defense against pathogen attack. Prominent among genes that confer disease resistance are those encoding nucleotide-binding site-leucine-rich repeat (NB-LRR) proteins. Likely due to selection pressures caused by pathogens, NB-LRR genes are the most variable gene family in plants, but there appear to be species-speciﬁc limits to the number of NB-LRR genes in a genome. Allelic diversity within an individual is also increased by obligatory outcrossing, which leads to genome-wide heterozygosity. In this study, we compared the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative Arabidopsis lyrata . We then complemented and contrasted the interspeciﬁc patterns with studies of NB-LRR diversity within A. thaliana . Three important insights are as follows: (1) that both species have similar numbers of NB-LRR genes; (2) that loci with single NB-LRR genes are less variable than tandem arrays; and (3) that presence-absence polymorphisms within A. thaliana are not strongly correlated with the presence or absence of orthologs in A. lyrata . Although A. thaliana individuals are mostly homozygous and thus potentially less likely to suffer from aberrant interaction of NB-LRR proteins with newly introduced alleles, the number of NB-LRR genes is similar to that in A. lyrata . In intraspeciﬁc and interspeciﬁc comparisons, NB-LRR genes are also more variable than receptor-like protein genes. Finally, in contrast to Drosophila , there is a clearly positive relationship between interspeciﬁc divergence and intraspeciﬁc polymorphisms.

Resistance (R) gene-dependent recognition of avirulence (Avr) determinants is a cornerstone of plant defense against pathogens. There are at least five diverse classes of R proteins, the largest of which encodes proteins that have a nucleotide-binding (NB) site along with leucine-rich repeats (LRRs; Dangl and Jones, 2001). NB-LRR proteins can be subdivided into two types based on structural features of the N terminus: TIR-NB-LRR (TNL) proteins have a domain that resembles the intracellular signaling domains of Drosophila Toll and mammalian IL-1 receptors, while CC-NB-LRR (CNL) proteins contain a putative coiled-coil domain. The predominant function of NB-LRR genes is in disease resistance (Kunkel et al., 1993;Grant et al., 1995;Botella et al., 1998;Bittner-Eddy et al., 2000;van der Biezen et al., 2002;Slusarenko and Schlaich, 2003;Allen et al., 2004;Sinapidou et al., 2004;McDowell and Simon, 2006), although they can sometimes mutate to produce phenotypes not clearly related to immunity (Faigón-Soverna et al., 2006;Huang et al., 2010;Yang et al., 2010). Because of their diversity, NB-LRR and other R genes are also prone to cause autoimmunity problems in hybrids, due to inappropriate recognition of proteins encoded in an unmatched parental genome (Bomblies and Weigel, 2007Weigel, , 2010. Comparative genomic analyses have indicated that plant genomes can encode several hundred NB-LRR genes, but there is great diversity in the total number of NB-LRR genes as well as in the distribution of subclasses of NB-LRR genes. An interesting example is provided by TNL genes, which are absent from grass genomes surveyed to date (Bai et al., 2002;McDowell and Simon, 2006;Bomblies and Weigel, 2010). Since this subfamily is present in conifers and mosses, it appears that it was lost in grasses and perhaps other monocots (Meyers et al., 1999(Meyers et al., , 2002Akita and Valkonen, 2002). In contrast, dicot genomes typically contain more TNL than CNL genes (Meyers et al., 2003;Yang et al., 2008;Porter et al., 2009).
Most NB-LRR genes reside in clusters (Meyers et al., 2005). NB-LRR genes within clusters can be divided into two types based on their phylogenetic relationships: tandem duplicates, so that genes group together in a species-wide gene tree; and mixed clusters that contain genes from different branches of a species-wide tree (McDowell and Simon, 2006). Two models have been proposed to explain NB-LRR distribution patterns that are inconsistent with simple tandem duplications (Leister, 2004): ectopic duplications and large-scale segmental duplications with subsequent local rearrangements.
Gene conversion is another important process that influences the evolution of gene families, including NB-LRR genes (Bendahmane et al., 2000;Baumgarten et al., 2003;Mondragon-Palomino and Gaut, 2005;Kuang et al., 2008). The extent of gene conversion is correlated with high levels of sequence identity, close physical clustering, gene orientation, and recombination rate (Mondragon-Palomino and Gaut, 2005;Xu et al., 2008), although close physical proximity is not a conditio sine qua non (Kuang et al., 2008).
LRRs are also found in receptor-like proteins (RLPs), which have extracellular LRRs, a transmembrane helix, and a short cytoplasmic region. In several plant species, RLP genes can encode R proteins. In Arabidopsis thaliana, several RLP genes have been implicated in developmental processes, but no R gene has been mapped to an RLP locus (Jones et al., 1994;Yang and Sack, 1995;Jeong et al., 1999;Nadeau and Sack, 2002;Wang et al., 2008). However, one RLP gene has been shown to confer nonhost resistance, while another one can mutate to a dominant form that activates responses normally associated with defense Zhang et al., 2010).
In our study, we used high-quality whole-genome sequences from reference strains of two closely related species, A. thaliana and Arabidopsis lyrata, to address several questions regarding the evolution of NB-LRR genes. (1) Has the mating system shift affected the number of NB-LRR genes? (2) How does their arrangement as single genes, in tandem clusters or in mixed clusters, affect the diversification of NB-LRR genes? (3) What is the relative importance of gene conversion? (4) Is there any correlation between the interspecies divergence of orthologs and the intraspecies polymorphism within A. thaliana? (5) How do the evolutionary patterns of NB-LRRs and RLPs compare? Our work complements a recent study (Chen et al., 2010) that had accessed an early draft of the A. lyrata genome sequence and that lacked extensive synteny information.

Identification, Classification, and Phylogenetic Analysis of NB-LRR Genes
We used the most conserved portion of NB-LRR proteins, a domain that includes the NB site, for an initial round of identification of NB-LRR-encoding genes in A. lyrata and A. thaliana. We then built an Arabidopsis-specific hidden Markov model (HMM) profile based on sequences of both species to identify additional NB-LRR genes (see "Materials and Methods"). We found 159 members of the NB-LRR gene family in A. thaliana. In the A. lyrata genome, we could identify 185 NB-LRR genes ( Table I). Note that not all NB-LRR family members have both NB and LRR domains. Previous studies in A. thaliana reported 166 (Richly et al., 2002), 174 (Yang et al., 2008;Chen et al., 2010), or 178 (Meyers et al., 2003) NB-LRR genes, while The Arabidopsis Information Resource (TAIR) annotation contains 127 (Swarbreck et al., 2008). The differences in our study resulted from more stringent criteria, in addition to exclusively using an HMM model instead of BLAST-based approaches. In A. lyrata, 182 have been reported (Chen et al., 2010); because the same methodology was not applied to A. thaliana, it is difficult to compare that number with our results.
Consistent with a recent study (Chen et al., 2010), only the CNL subfamily has more members in A. thaliana than A. lyrata (Table I; Supplemental Tables S1  and S2). There are 10 A. lyrata genes but no A. thaliana genes that only encode the NB site portion. We produced phylogenetic trees of the encoded NB domains for the CNL and TNL subgroups, combining genes that encode complete proteins with ones lacking the LRR domains (CN and TN; Figs. 1 and 2; Supplemental Figs. S1 and S2). Most branches include genes from both species, but several terminal branches consist of NB-LRR genes from only one of the two species.

Analysis of NB-LRR Gene Clusters
Slightly less than two-thirds of the A. lyrata NB-LRR genes, 118, are found in 35 clusters, each containing between two and 23 genes (mean of 3.4, median of 2.0). The proportion of NB-LRR genes in clusters is higher in A. thaliana, with 113 genes arranged in 38 clusters, each consisting of two to nine genes (mean of 3.0, median of 2.0; Table II). The largest cluster in A. lyrata is the RPS4 locus, a mixed cluster with 23 genes of both the CNL/CN and TNL/TN types. The RPS4 cluster in A. thaliana includes only six genes. One of its members, the namesake RPS4 gene, endows A. thaliana with specific resistance to Pseudomonas syringae pv tomato expressing avrRps4 (Gassmann et al., 1999). The largest cluster in A. thaliana contains nine TNL genes of unknown function, from AT1G72840 to AT1G72950. A. thaliana also has more mixed clusters, altogether 27, than does A. lyrata, which has 21 mixed clusters in total (Table III; Supplemental Tables S3 and S4). While we could identify a similar number of gene conversion events in TNL clusters (42 in A. thaliana and 37 in A. lyrata), we found many more such events in A. thaliana CNL clusters (45) than in A. lyrata CNL clusters (three; Table IV). In each class, the median length of conversion tracts is longer in A. lyrata than in A. thaliana ( Fig. 3; Table IV). A significant positive correlation between cluster size and gene conversion frequency was found in A. thaliana (Spearman's correlation test; r = 0.55, P = 0.03) but not in A. lyrata (r = 0.28, P = 0.30).

Divergence of Orthologs
We identified 88 NB-LRR gene pairs of orthologs shared between A. lyrata and A. thaliana (Table V). For CNL/CN and TNL/TN types, about three-quarters of gene pairs at the terminal branches of the phylogeny were orthologous between species (11 of 16 for CNL/CN and 36 of 48 for TNL/TN). There are great differences, however, in sequence divergence among orthologs (  Table S5). The nonsynonymous divergence (dN) values of the 88 ortholog pairs range from We determined synteny based on orthology of the five genes upstream and the five genes downstream of the focal locus. Most of the orthologs (86.4%) are found in syntenic regions (Supplemental Table S6). The dN values of the 76 syntenic ortholog pairs range from 0.02 to 0.35 (mean of 0.10, median of 0.07), while the dN values of the 12 nonsyntenic ortholog pairs range from 0.09 to 0.34 (mean of 0.19, median of 0.18). The dS values range from 0.08 to 0.74 (mean of 0.23, median of 0.17) for syntenic ortholog pairs and from 0.11 to 0.84 (mean of 0.47, median of 0.51) for nonsyntenic ortholog pairs. The resulting dN/dS ratios range from 0.09 to 0.89 (mean of 0.42, median of 0.39) for syntenic ortholog pairs and from 0.34 to 1.17 (mean of 0.49, median of 0.40) for nonsyntenic ortholog pairs. The dN and dS values of syntenic ortholog pairs are significantly lower than in nonsyntenic ones (Wilcoxon rank sum test; P , 0.01 in both cases).
Importantly, dN and dS values as well as dN/dS (P = 0.02) ratios are generally greater for ortholog pairs that reside in clusters than for singletons (Fig. 4B). The dN/ dS ratios of the 27 singletons range from 0.09 to 0.76 (median of 0.36, mean of 0.37), while the 61 genes in clusters have dN/dS ratios from 0.17 to 1.17 (median of 0.43, mean of 0.46).

Correlation between Within-and Between-Species Deletions
In the context of the 1,001 Genomes Project for A. thaliana (Weigel and Mott, 2009), 80 Eurasian accessions have recently been resequenced using Illumina short-read technology . Since presence-absence polymorphisms are frequent within A. thaliana (Bakker et al., 2006;Shen et al., 2006), we were interested to learn whether this group of genes was overrepresented among the ones that lack orthologs in A. lyrata, as a possible indication of similar evolutionary processes shaping NB-LRR gene diversity over different time scales.
For all 159 NB-LRR genes present in the reference genome of A. thaliana, we calculated the fraction of missing calls (which can result from sequences either having been deleted or being very divergent) across all 80 resequenced accessions. As expected from previous whole-genome studies Ossowski et al., 2008), the NB-LRR genes had on average many more missing calls (median of 0.11, mean of 0.19) than the genome-wide average (median of 0.01, mean of 0.03; P , 2.2e-16; Fig. 5A; Supplemental Table S7). Furthermore, we found that NB-LRR genes within clusters are substantially more likely to have missing calls than singletons: the median of missing calls for 46 singletons is 0.05 and the mean is 0.13 (range of 0.0001-0.66), while for the 113 genes in clusters, it is 0.14 and 0.22 (range of 0.0001-0.90; P = 0.02; Fig. 5A). The CN class has on average the fewest missing calls (significantly different from all NB-LRR genes; P = 0.03), and the CNL class has the most ( Fig. 5A; Supplemental Table  S6). A significantly positive correlation between dS and missing calls was found for NB-LRR genes with A. lyrata orthologs (Spearman's correlation test; r = 0.41, P = 8.0e-05). In addition, we assessed potential copy number variations of NB-LRR genes in the 80 resequenced A. thaliana genomes . Using a cutoff of 85% missing calls across the entire length of the gene, 33.3% (53 genes) of NB-LRR genes in the reference genome appeared to be deleted in at least one of the 80 accessions, with an allele frequency of one to 72 (median of 14.0, mean of 23.5). In comparison, only 12.5% (3,389) of genes in the entire genome are potentially deleted in at least one accession, with an allele frequency of 1 to 80 (median of 6.0, mean of 13.7; P = 0.002; Fig. 5B).
We then asked whether genes that are deleted in at least one of the 80 resequenced A. thaliana accessions are more likely to be missing from A. lyrata. Of the 106 A. thaliana genes without potential deletions, 43 lacked clear orthologs (40.6%). Of the 53 NB-LRR genes that are potentially deleted in at least one A. thaliana accession, this was the case for 28 genes (52.8%). Also, the median frequency of potential deletion alleles among these 28 genes was higher than for the 25 genes with A. lyrata orthologs (19.5 versus 12; Fig. 5B). To obtain additional insights into within-and between-species NB-LRR gene evolution, we shotgun sequenced a bacterial artificial chromosome (BAC) containing the RPP7 cluster from the A. thaliana accession Cape Verde Islands (Cvi-0). Specific alleles of RPP7 are linked to resistance of Hyaloperonospora arabidopsidis ex parasitica in accessions Columbia (Col-0) and Landsberg erecta of A. thaliana (Speulman et al., 1998;Nemri et al., 2010). The Cvi-0 BAC contains a tandem cluster of NB-LRR genes residing in contig 1 (AT1G58370-AT1G59530; hereafter Cvi1), separated from another NB-LRR gene, AT1G59620, by five intervening genes residing in contig 2 (AT1G59590-AT1G59630; hereafter Cvi2; Fig. 6A; Supplemental Table S8). At 108.2 kb and 22 genes, contig 1 is much shorter in Cvi-0 than in the reference sequence from the Col-0 strain, with 184.0 kb and 34 genes, while contig 2 is similar in size (20.1 compared with 20.0 kb; Fig. 6B). While such size differences are dramatic, similar differences have been reported for other A. thaliana clusters (Bomblies et al., 2007;Alcázar et al., 2009). In A. lyrata, the region syntenic to contig 1 is only slightly longer than in Col-0, 220.3 kb with 33 genes, while the region syntenic to contig 2 is about three times as long, at 67.2 kb ( Fig. 6; Supplemental Table S8; Supplemental Fig. S3).
In the RPP7 cluster proper, Col-0 has eight NB-LRR genes, while Cvi-0 and A. lyrata both have only three. A phylogeny across the three genomes indicates an ongoing birth/death process and variation in domain composition (Fig. 7A). Based on the phylogeny and similarity comparisons, we reconstructed the evolutionary history of the RPP7 cluster, which experienced several tandem duplications and deletions (Fig. 7B).

Evolution of RLP Genes
Slightly more than two-thirds of the 57 A. thaliana RLPs, 35, are found in 13 clusters, each containing two to five genes (mean of 2.7, median of 2.0; Supplemental Table S9). We identified 22 RLPs that were orthologous between A. lyrata and A. thaliana. While there are great differences in sequence divergence among orthologs, they are generally not as divergent as NB-LRR orthologs (Fig. 8A). The dN values of the 22 ortholog pairs range from 0.01 to 0.10 (mean and median of 0.04), and the dS values range from 0.10 to 0.23 (mean and median of 0.14). The resulting dN/dS ratios range from 0.05 to 0.60 (mean and median of 0.31). Importantly, dN and dS values as well as dN/dS ratios are generally greater for orthologs that reside in clusters, and dN/dS and dN for these are also significantly greater than for singletons (P = 0.05 and 0.01, respectively; Fig. 8A). The dN/dS ratios of the 10 singletons range from 0.05 to 0.60 (median of 0.23, mean of 0.25), while the 12 genes in clusters have dN/dS ratios from 0.25 to 0.57 (median and mean of 0.36). The dN value of the 10 singletons ranges from 0.01 to 0.07 (median and mean of 0.03), while the 12 genes in clusters have dN values from 0.04 to 0.10 (median and mean of 0.05).
Turning to within-species variation, as assessed with 1,001 Genomes Project data , we found that RLP genes had more missing calls in 80 A. thaliana accessions (median of 0.01, mean of 0.05) than the genome-wide average (median of 0.01, mean of 0.03; P = 0.03; Fig. 8B). We furthermore found that RLP genes within clusters are slightly more likely to have missing calls than singletons: the median and mean of missing calls for 22 singletons are 0.005 and 0.018 (range of 0-0.11), while they are a little higher for the 35 genes in clusters, 0.016 and 0.067 (range of 0.0003-0.425; Fig. 8B). In contrast to NB-LRR genes, there was no clear correlation between dS between species and missing calls in A. thaliana for genes with orthologs in both species.  We also assessed potential copy number variation of RLP genes in the 80 resequenced A. thaliana genomes. Using a cutoff of 85% missing calls across the entire length of the gene, only four RLP genes (AT1G71390, AT1G17250, AT1G07390, and AT3G23010; 7.0% of all RLPs) appear to be deleted in at least one of the 80 accessions, with an allele frequency of one to five. These numbers are significantly lower than for NB-LRR genes in terms of both the total number of deleted genes and the allele frequency of deletions.

Similar Numbers of NB-LRR Genes in A. thaliana and A. lyrata
Our careful annotation of high-quality genome sequences revealed that the transition to selfing in A.
thaliana has not substantially changed the number of NB-LRR genes, with 185 in the A. lyrata reference genome and 159 in the A. thaliana reference genome. This is similar to another recent comparison of NB-LRR genes in A. lyrata and A. thaliana (Chen et al., 2010). In other species, the NB-LRR gene complement appears to vary from as few as 54 (in papaya [Carica papaya]) to 535 (in grapevine [Vitis vinifera]), with the NB-LRR gene families in Brassica rapa, Medicago truncatula, poplar (Populus species), and rice (Oryza sativa) falling between these two extremes (Ameline-Torregrosa et al., 2008;Yang et al., 2008;Mun et al., 2009;Porter et al., 2009). Our findings for Arabidopsis confirm that there does not seem to be a strong correlation between genome size and NB-LRR gene number. However, A. thaliana has more CNLtype NB-LRR genes than A. lyrata, perhaps because the latter more easily loses sequences encoding the coiled-coil domain. This appears to be the case for the RPP7 cluster (Fig. 7), where most genes from two A. thaliana accessions are of the CN or CNL type but the genes from A. lyrata are all of the NB type.
A common cause for genetic incompatibility in F1 hybrids of A. thaliana is hybrid necrosis, which results from inappropriate activation of the immune system in the absence of a pathogen attack. As in other species, this seems to be caused by the aberrant recognition of other plant proteins by specific NB-LRR alleles, either because they resemble pathogen proteins or because they resemble pathogen-modified plant proteins (Krü ger et al., 2002;Bomblies et al., 2007;Bomblies   Obligatory outcrossing species are always heterozygous and therefore express at all times a greater spectrum of NB-LRR gene alleles than selfers, which are mostly homozygous. Therefore, one might speculate that the potential for such aberrant NB-LRR interactions is higher in outcrossing species. Given the highly dynamic nature of NB-LRR genes, with frequent copy number variation in A. thaliana and not much more than half of all NB-LRR genes having orthologs in A. lyrata, it may seem plausible that the 1 million or so years since A. thaliana became selfing (Tang et al., 2007;Bomblies and Weigel, 2010;Yamamoto et al., 2010) could have sufficed to substantially alter the number of NB-LRR genes. However, this is clearly not the case.

Abundant Mixed Clusters and Frequent Gene Conversion
In A. lyrata, about two-thirds of NB-LRR genes reside in clusters, as described earlier for A. thaliana (Meyers et al., 2003). Mixed clusters, which are thought to result from ectopic duplications and large-scale segmental duplication with subsequent rearrangement (Leister, 2004), are the largest class in both species. Gene conversion, which is affected by high levels of se-quence similarity, close physical clustering, gene orientation, and local recombination rate, is an important factor in NB-LRR gene evolution (Bendahmane et al., 2000;Mondragon-Palomino and Gaut, 2005;Xu et al., 2008;Chen et al., 2010). Consistent with larger clusters providing more potential donor sequences, cluster size and gene conversion frequency are positively correlated in A. thaliana. We also observed that the median length of conversion tracts is higher in A. lyrata than in A. thaliana. Because of fewer structural differences, selfers have higher rates of recombination than related outcrossers (Roze and Lenormand, 2005;Glémin et al., 2006;Ross-Ibarra, 2007), and increased recombination might more easily break long conversion tracts into smaller ones.

Different Evolutionary Patterns for Singleton and Clustered NB-LRR Genes
NB-LRR genes show much higher rates of dS and dN and show higher dN/dS ratios than the average gene in the genome (Hu et al., 2011). Importantly, NB-LRR genes in clusters are more diverged between A. lyrata and A. thaliana than are singletons, although there is only limited evidence that entire NB-LRR genes are under positive selection. This is different for the sequences encoding the LRR domains, which are responsible for direct or indirect interaction with pathogen factors (Bakker et al., 2006;Chen et al., 2010).
NB-LRR genes constitute the most variable class of genes in A. thaliana genomes (Rose et al., 2004;Bakker et al., 2006;Clark et al., 2007;Ossowski et al., 2008). This is further supported by our analysis of 80 resequenced A. thaliana genomes , where we found NB-LRR genes within clusters to be more polymorphic than singletons, which also proved to be true for the specific alleles at the RPP7 cluster that we compared. Using missing calls from Illumina sequencing as an indication of potential deletions, we found for NB-LRR genes a positive relationship between interspecific synonymous substitutions and presenceabsence polymorphisms within A. thaliana, which contrasts with the situation in Drosophila, where the relationship is negative (Hahn, 2008). We did not, however, find a clear correlation between the frequency of potential presence-absence polymorphisms in A. thaliana and missing orthologs in A. lyrata. Nevertheless, among genes with presence-absence polymorphisms, A. thaliana genes without orthologs in A. lyrata had a higher frequency of potential deletion alleles than the ones with orthologs.
Both NB-LRR and RLP genes vary substantially in sequence divergence between species. Also similar to NB-LRR genes, RLP genes within clusters are more likely to have missing calls across 80 A. thaliana accessions than do singletons, although NB-LRR genes are much more often deleted (33% compared with 7% for RLP genes). Furthermore, even though NB-LRR genes are overall more divergent between species than RLP genes, genes from both families that are located within clusters are generally more divergent than singletons.

CONCLUSION
The comprehensive analysis of NB-LRR genes provides a valuable resource for many questions related to the evolution of the immune system, including natural variation in innate immunity (Holub, 2007) and hybrid failure (Bomblies and Weigel, 2007). That only little more than half of all A. thaliana NB-LRR genes clearly have orthologs in A. lyrata provides additional evidence for rapid births and deaths of NB-LRR genes (Michelmore and Meyers, 1998). The total number of NB-LRR genes, however, does not differ greatly between the outcrosser A. lyrata and the selfer A. thaliana, indicating that mating system shifts do not have dramatic short-term consequences for the immune gene repertoire.
RLP gene annotation in A. thaliana was based on previous studies (Fritz-Laylin et al., 2005;Wang et al., 2008). Orthologous genes from A. lyrata were extracted from a whole-genome alignment analysis (Hu et al., 2011).

Domain Annotation
NB-LRR proteins were characterized using SMART (http://smart. embl-heidelberg.de/). In addition, based on the alignment of proteins, we extracted NB and adjacent domains separately. All three data sets were clustered independently using mcxdeblast implemented in mcl-06-058 (http://micans.org/mcl/src/) with default parameters.
For clusters of N-terminal domains, COILS (http://www.ch.embnet.org/ software/COILSform.html) was used to identify CC domains when no CC or TIR domain had been detected with SMART in the previous step, using default parameters and a stringency of 90% threshold. Pfam version 23.0 (July 2008; http://pfam.sanger.ac.uk/) was used to confirm the identity of the CC and TIR domains. As long as the N-terminal domain clustered with other CC (TIR)-containing proteins, we considered it a CC (TIR)-containing protein.
Similarly, as long as the C-terminal domain clustered with other LRR domains or the NB domain clustered with other NB domains, we considered it as LRR or NB containing, even when SMART and Pfam results were insignificant. The work flow of domain annotation is outlined in Supplemental Figure S4.

Phylogenetic Analyses
For phylogenetic analyses, NB-LRR protein sequences were aligned with T-Coffee version 7.54 (Notredame et al., 2000) and trimmed to retain only the NB domains. Trees were rooted using a sequence from Streptomyces coelicolor (P25941) as in a previous study (Meyers et al., 2003). PAUP* version 4.0b10 (Swofford, 2003) was used to reconstruct phylogenetic trees using the neighbor-joining method with the Kimura two-parameter model (Kimura, 1980;Saitou and Nei, 1987). Topological robustness was assessed by bootstrapping with 1,000 replicates (Felsenstein, 1985).

Cluster Analyses
The same definition as in a previous study (Richly et al., 2002) was used (i.e. linked NB genes that were not interrupted by more than eight non-NB proteins).

Detection of Gene Conversion
We first collected coding sequences of NB-LRR genes in each cluster, aligned the sequences, and then used GENECONV version 1.81a (Sawyer,  1989) to detect gene conversion with default settings. A global value of P # 0.05 was used as in previous studies (Drouin, 2002;Mondragon-Palomino and Gaut, 2005). GENECONV cannot distinguish between gene conversion and unequal crossing over.

Divergence and Polymorphism
Orthologs were extracted from a whole-genome alignment analysis (Hu et al., 2011). Next, a codon-based alignment of orthologous coding sequences was performed with PAL2NAL (Suyama et al., 2006). Finally, dN, dS, and dN/dS ratios were calculated for ortholog pairs using the codeml program implemented in PAML version 3.15 (Yang, 1997). Synteny was determined based on orthology of five genes each upstream and downstream of the focal locus. If at least one flanking gene was orthologous, orthologs were accepted as syntenic.
Polymorphisms within A. thaliana were identified in whole-genome Illumina resequencing data of 80 accessions (http://1001genomes.org; Cao et al., 2011) based on TAIR9 annotation (which is identical to the TAIR8 annotation, on which we based our initial NB-LRR gene analyses; Swarbreck et al., 2008). Missing call frequencies were calculated as sum_missed/ total_length = (sum_z + sum_n)/((80*gene_length) 2 sum_del 2 sum_svdel), where z indicates zero coverage, n indicates no call (with reads mapped to this region but no base called because of ambiguous alignments; this excludes repetitive regions, which are coded as n in the initial matrix), sum_del indicates the sum of deleted bases predicted from alignments, and sum_svdel indicates the sum of deleted bases based on structural variant predictions.
BAC Sequencing, Assembly, Annotation, and Analysis of the Cvi-0 RPP7 Cluster An AT1G58602-derived 900-bp probe, which was produced from Col-0 genomic DNA as a template with primers G11766 (5#-CAACAAAGAAT-AGCAGCTTGGAGGA-3#) and G11767 (5#-GGGCATGGAGCCTTGTTTT-ACTATC-3#), was used to screen a BAC library of A. thaliana Cvi-0. End sequencing was used to map the borders of positive BAC clones, and a clone that flanks the region from At1g58370 to At1g59630 was chosen for shotgun sequencing. BAC DNA was prepared using the Large Construct DNA Preparation Kit (Qiagen). After physical shearing of high-quality BAC DNA, the TOPO Shotgun Subcloning Kit (Invitrogen) was used to construct a shotgun library. T7 and SP6 primers were used to sequence clones on an ABI 3730XL DNA Analyzer. Sequence assembly was performed using Phred for base calling, CROSS_MATCH for removing vector sequences, and Phrap and Consed for assembly Gordon et al., 1998). Primer walking and PCR were used to fill or polish gaps and ambiguous regions. Annotation was performed using FGENESH and FGENESH+ (http://linux1.softberry.com/berry.phtml). Spidey (http://www.ncbi.nlm. nih.gov/spidey/) was used to confirm the annotation based on the alignment with A. thaliana mRNA when needed. Mummer version 3.19 (Kurtz et al., 2004) was used to compare BAC sequences with the reference genomes of A. lyrata and A. thaliana.
The two Cvi BAC contigs have been deposited in GenBank (accession nos. JN389445 and JN389516).

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Phylogenetic relationships of CNL and CN proteins across two species.
Supplemental Figure S2. Phylogenetic relationships of TNL and TN proteins across two species.
Supplemental Figure S4. Flow chart for domain annotation of NB-LRR proteins.
Supplemental Table S1. Domain annotation of A. thaliana NB-LRR proteins.
Supplemental Table S2. Domain annotation of A. lyrata NB-LRR proteins.
Supplemental Table S3. Cluster assignment in A. thaliana.
Supplemental Table S4. Cluster assignment in A. lyrata.
Supplemental Table S5. Mean (median) of dN, dS, and dN/dS for different NB-LRR gene types.    Figure S4. Flowchart for domain annotation of NB-LRR proteins.