Genome-wide association study for morphological traits and resistance to Peryonella pinodes in the USDA pea single plant plus collection

Abstract Peas (Pisum sativum) are the second most cultivated pulse crop in the world. They can serve as human food, fodder, and cover crop. The most serious foliar disease of pea cultivars worldwide is Ascochyta blight, which can be caused by several pathogens. Of these, Peyronella pinodes is the most aggressive and prevalent worldwide. Several traits, including resistance to Peyronella pinodes, stem diameter, internode length between nodes 2–3 and 5–6, and area of 7th leaf, were measured in 269 entries of the pea single plant plus collection. The heritability (H2) of the morphological traits was relatively high, while disease resistance had low heritability. Using 53,196 single-nucleotide polymorphism markers to perform a genome-wide association study to identify genomic loci associated with variation in all the traits measured, we identified 27 trait–locus associations, 5 of which were associated with more than 1 trait.


Introduction
Dry peas (Pisum sativum L.) are the second most cultivated pulse crop in the world, coming after dry beans (FAOSTAT 2014; https://www.fao.org/faostat/en/). The United States is the 4th largest producer and 3rd-largest exporter globally (Rawal and Navarro 2019). Peas can serve as human food, fodder, and cover crop. They fix biological nitrogen and can add up to 146 kg N ha À1 (Parr et al. 2011). Pea production can be affected by various abiotic and biotic stresses (Diego Rubiales et al. 2015). The most serious foliar disease of pea cultivars worldwide is Ascochyta blight (Bernard Tivoli et al. 2007), which can cause up to 75% yield loss under conditions that are favorable for the disease (Bretag et al. 2006). Ascochyta blight is a disease complex caused by several related fungal pathogens individually or in combination (Bernard Tivoli et al. 2007). In North America, the species associated with the disease are Didymella pisi, Peyronellaea pinodella, and Peyronellaea pinodes (Skoglund et al. 2011;Owati et al. 2020); the last of these being the most aggressive and prevalent worldwide (Rubiales et al. 2019;Xue et al. 1997;Bretag et al. 1995;Tivoli et al. 1996). Pathogens are very hard to discriminate based on the symptoms they cause, as all of them cause black to brown spots or lesions on leaves, stems, and pods (Skoglund et al. 2011). However they can be distinguished based on other characteristics such as spore characteristics, appearance on culture plates, and toxin production (Owati et al. 2020).
Ascochyta blight can be controlled with agronomic practices (Bretag et al. 2006;Liu et al. 2016). The use of clean and treated seed will prevent seed-born inoculum. Crop rotation, destruction of infected pea trash, and adjustment to the planting date will decrease the odds of soil and air-born inoculum infestation. To remediate losses, fungicides can also be applied but this results in increased input costs for the farmer. A more sustainable approach is to breed for resistance to the pathogen. So far, low levels of polygenic resistance to P. pinodes have been identified in Pisum sativum (Parihar et al. 2020), and high levels of resistance have been reported in wild peas (Fondevilla et al. 2008;Wroth 1998;Ambuj Bhushan Jha et al. 2012;Jha et al. 2017). Disease resistance has been associated with lower plant growth in many plant systems (Brown and Rant 2013). Although the causes of the tradeoff between growth and defense are not entirely known, they may be based partially on antagonistic signaling pathways and the metabolic costs of the defense response (Karasov et al. 2017). Since plant vigor is also paramount for a breeding program, it is important to know if disease resistance in a population is associated with undesired traits, such as hindered plant growth.
Although peas were the original system used for genetic studies (Ellis et al. 2011), the reference genome was only published in 2019 (Kreplak et al. 2019). Pea (2n ¼ 14) has a large, highly repetitive genome, of 4.45 Gb, almost twice as large as maize (2n ¼ 20, Investigation 2.4 Gb) and almost 4 times that of soybean (2n ¼ 20, 1.15 Gb). To preserve genetic diversity and aid in genomic-assisted breeding, the USDA assembled the pea single plant plus collection (PSPPC) (Holdsworth et al. 2017), which comprises 431 morphologically, geographically, and taxonomically diverse P. sativum accessions. Detailed genotypic data are publicly available for the PSPPC, along with 25 accessions of Pisum fulvum, a wild relative of peas (Holdsworth et al. 2017). Genome-wide association studies (GWAS) aim to connect underlining genetics to traits of interest, and requires a panel with diverse genotypes (Korte and Farlow 2013), the PSPPC is ideal for such studies.
All the published quantitative trait loci (QTL) mapping studies for resistance to P. pinodes have used biparental populations which can only detect effects at loci for which the parents possess functionally distinct alleles. GWAS, in contrast, has the potential to assess the relative effects of multiple alleles at each locus.
This study describes the first GWAS for resistance to P. pinode as well as several morphological traits.

Plant materials
Two hundred and sixty-nine P. sativum lines from the PSPPC collection were planted in a randomized complete block design in a growth chamber at the Phytotron at North Carolina State University, Raleigh, NC, USA. Cultivars Radley and Solara were used as resistant and susceptible checks, respectively. Due to space limitations, only 1 replicate was planted at a time. The experiment was run 7 times, i.e. in 7 replications. For each replication, 3 seeds per entry were planted in a 7.5-cm (225 ml) Styrofoam cup and thinned to 1 plant per plot before inoculation. The substrate used was 50% Sun Gro Propagation Growing Mix (Canadian Sphagnum peat moss 50-65%, vermiculite, dolomitic lime, 0.0001% silicon dioxide), and 50% cement sand. The cups were organized in metal trays holding 5 cups, and 4 metal traits were in each rolling cart. The chamber was kept at 21 C with 12 hr of light, and air humidity fluctuated between 50% and 100%. Plants were not inoculated with rhizobium but were fertilized 3 times a week when watered with a standard nutrient solution in the growth chambers containing nitrogen, phosphorus, potassium, and other micronutrients (https://phytotron.ncsu. edu/general-information/). As the plants grew, they were trellised using thin bamboo stakes.

Pathogen and inoculation
Six isolates of P. pinodes originating from different countries were obtained from the USDA collection kept in Washington (Supplementary Table 1). The pathogen was grown on potato dextrose agar plates for 12-15 days under 12 hr of light and 21 C. Pycnidia were harvested from the plates by gently flooding the plates with a solution of 0.12% Tween 20 in water followed by gentle brushing of the surface of the plate. The pycnidia were counted and the final inoculum concentration was adjusted to 5 Â 10 5 per ml in the same 0.12% Tween 20 solution (Fondevilla et al. 2005). The inoculum was prepared immediately before inoculation and was kept on ice to prevent premature spore germination. The entire procedure, from spore isolation to inoculation of the whole population took about $3 hr.
Plants were inoculated 10-13 days after planting when most of the plants were at the 3-4 expanded leaf stage. In the growth chamber, whole individual plants, and both sides of the leaves, were sprayed with inoculum solution using an airbrush until dripping, using approximately 1 ml of inoculum per plant. After inoculation, each metal tray containing 5 plants was covered in a clear plastic bag and tied to create high humidity conditions favorable to pathogen growth. After 96 hr, the bags were removed.

Phenotyping
The first 3 leaves were individually scored on a 0-6 scale developed by Schoeny et al. (1996), in which 0 corresponds to no symptoms and 6 to necrotic lesions in 75-100% of leaf area. Plants were scored 3-4 times over 10 days. On the final scoring, the total number of nodes of each plant was recorded. On the day after the final score, plants were cut at the base and imaged using an Epson Perfection V500 Photo Scanner. The 7th fully expanded leaf, or the 6th in case there was not a 7th, was removed and laid flat for the scanning. Images were obtained for 5 replicates and processed using the software ImageJ (Rasband 1997). Traits measured were internode lengths between nodes 2-3 and 5-6, stem diameter between nodes 5 and 6, and area of 7th or 6th leaf. These traits will be referenced throughout the article as internode 2-3, internode 5-6, diameter, and leaf area, respectively.

Statistical analysis
The distribution of each replicate for each trait was visualized using the package ggplot2 (Wickham 2016) in R studio. Outliers were defined as observations outside 1.5 times the interquartile range above the upper quartile and below the lower quartile and were excluded. Broad sense heritability (H 2 ) was calculated for each trait by dividing the variance due to genotype by the total variance for the phenotype, which was the sum of variance due to genotype, variance due to the interaction of genotype by replication, and variance due to error ( A single disease score per plant for each date was calculated as an average of the score of 3 leaves scored. For each replication, the standardized area under disease progress curve (sAUDPC) was calculated by averaging the value of 2 consecutive ratings and multiplying by the number of days between the ratings. This was done for each set of consecutive ratings. Values were then summed over the intervals and divided by the total number of days between the first and last evaluations (Campbell and Madden 1990). This calculation means that the sAUDPC scores are on the same scale as the original 1-6 scoring scale.
where y i is the disease score on the ith date, t i is the number of days since the first disease evaluation, and t n is the total number of days between first and last scoring. Best linear unbiased predictors (BLUPs) were calculated for each entry and trait. The ASReml package (Butler et al. 2018) in R studio was used to calculate BLUPS for sAUDPC and the lmer4 (Bates et al. 2015) was used to calculate BLUPS for internode 2-3, internode 5-6, diameter, and leaf area. To calculate BLUPs for sAUDPCs the following model was fit where y is the response variable sAUDPC, and the random effects are: G i is the effect of genotype, R j is the effect of replicate, GR ij is the interaction effect of genotype and replicate, SR kt is the effect of side of the chamber nested in replicate, TCR mnj is the effect of the inoculation unit of tray nested in cart and replicate, NR pj is the effect of the number of nodes nested in replicate.
A simpler model was used to calculate BLUPS for internode 2-3, internode 5-6, diameter, and leaf area where y is the response variable of each trait, G i is the effect of genotype, R j is the effect of replicate, and GR ij is the interaction effect of genotype and replicate.
Processed GBS genotype data aligned with the reference genome was obtained from Powers et al. (2021). The data were then filtered for minor allele frequency (MAF) (retaining alleles with >0.05 frequency) and heterozygosity (retaining lines with lower than 20% heterozygosity) using the software TASSEL (Bradbury et al. 2007). The GAPIT (Lipka et al. 2012) package in R was used to perform GWAS on each trait, using the Bayesian information and linkage disequilibrium iteratively nested keyway (BLINK) model. This model was selected because it has higher statistical power than the model fixed and random model circulating probability unification (FarmCPU). BLINK does not assume that causal genes are distributed evenly across the genome, as FarmCPU does, leading to fewer false positives and exclusion of causal genes (Huang et al. 2019). While FarmCPU divides the genome in equal sized bins to find the most significant marker in each bin, BLINK groups SNPs based on linkage disequilibrium regardless of how physically close they are, allowing the discovery of clusters. The BLINK models were fit as: where y is a vector of phenotypes; s i is a testing marker; S is a pseudo quantitative trait nucleotide; and e is the unobserved vector of residuals.
False discovery rate (FDR) adjusted P-values, with a threshold of 0.05, was calculated for all single nuclear polymorphisms (SNPs) using GAPIT (Lipka et al. 2012). The GAPIT (Lipka et al. 2012) package was also used to select the optimal number of principal components (PC) included in the model using a BIC; PC reflects the degree of population structure that should be accounted for in the model. The results indicated that no PCs were necessary in the model.

Candidate genes
Genes within 1 Mb of highly associated SNPs, as identified using the P. sativum v1a genome (https://www.pulsedb.org/jbrowses), were considered as candidate genes.

Results and discussion
For this experiment, 269 of the 431 lines that comprise the PSPPC were available for use. The lines were evaluated in controlled conditions for resistance to P. pinodes, and for 4 other morphological traits: 7 th leaf area, internode length between nodes 2 and 3, internode length between internodes 5 and 6, stem diameter between nodes 5 and 6. The experiment was run in 7 replications in a complete block design, with 1 full replicate per run. sAUDPC correlations between replicates were highly significant and ranged from r ¼ 0.16 to 0.35 (P-values <0.01) (Supplementary Table 2) and the broad-sense heritability for this trait was 0.3 (Table 1). Replicate was the largest source of variance for sAUDPC (Supplementary Table 3). The most likely causes of variation between replicates, and for the consequent moderate correlations and heritability we observed for sAUDPC, were variation in the inoculum, which was prepared fresh for each run. Replicate was also a substantial source of variance for most of the physiological traits measured (Supplementary Table 3). While growth chamber conditions were constant across replicates, variation in plant growth caused by factors such as variations in seed quality, germination speed, planting depth and local microclimates, may have also been a factor. Variation in plant growth was also a likely cause of variation in disease symptom measurements.
The 4 morphological traits (leaf area, diameter, internode 2-3, internode 5-6) were measured in 5 replicates. Each morphological trait had a higher correlation between replicates than sAUDPC, ranging from r ¼ 0.58 to 0.84 (P-values <0.01 for each trait and Supplementary Table 2), and broad-sense heritability ranging from 0.7 to 0.78 (Table 1). BLUPs were calculated for every trait and the largest source of variation was always entry (Supplementary Table 3). Correlations between BLUPs for all traits, including sAUDPC, ranged from 0.22 to 0.9 and were significant at a P-value < 0.01 (Fig. 1). Since the morphological traits are all parameters of plant growth, it was expected that they would be highly correlated. Disease resistance was moderately correlated with all the growth traits, suggesting that lower growth was associated with higher disease resistance. Associations between lower plant growth and higher disease resistance have been observed frequently in a number of plant systems (Brown and Rant 2013). The basis of the so-called "growth-defense tradeoff" is not entirely understood but may be based partially on antagonistic signaling pathways and the metabolic costs of the defense response (Karasov et al. 2017).
After filtering 319,141 SNPs for MAF (>0.05) and heterozygosity (<20%), 53,196 SNPs were used for GWAS. Q-Q plots for each GWAS suggested that models had a good fit and that reliable SNP-trait associations were identified (Fig. 2). In total, 27 significant trait-locus associations were identified across all 5 traits ( Fig. 2 and Table 2). The high phenotypic correlations between some of the growth-related traits, especially leaf area and stem diameter and between internodes 2-3 and internodes 5-6 were reflected in shared genetic control of the traits. Five SNPs were associated with more than 1 trait; 2 SNPs (S1LG6_261934321, S5LG3_544595701) were associated with both stem diameter and leaf area, 1 SNP was associated with internode 2-3 (S5LG3_569851018) and leaf area, and 1 SNP (S5LG3_572900348) was associated with internode 2-3 and internode 5-6 ( Table 2). One SNP (S1LG6_369964198) on chr1 (LG6) was associated with 3 traits, sAUDPC, leaf area, and stem diameter. All the SNPs that were significant for more than 1 trait had the minor allele phenotype related to a smaller plant; shorter internode, smaller leaf, thinner stem, and less disease. After calculating the physical distance between significant SNPs, only SNPs S2LG1_353493 and S2LG1_528924 were closer than 1 Mb, both were associated with internode 2-3, but 1 had the minor allele trait as longer internode while the other longer internode (  Table 4, we list the 415 predicted genes within 500-kb flanking significant SNPs. This constitutes a very preliminary list of candidate genes that might be involved in controlling these traits. The reference genome was published only 3 years ago (Kreplak et al. 2019) and many previous QTL and SNPs positions have been reported by linkage group (LG) instead of chromosome (chr). Here we will refer to the SNP position according to the chromosome and its associated LG reported in the reference genome (Kreplak et al. 2019). We used the BLAST tool available at PulseDB website (https://www.pulsedb.org/blast) to find the location on the reference genome of previously identified significant markers and proteins. Past studies conducted both in the field and in controlled conditions, found QTL associated with resistance to P. pinodes on multiple chromosomes (Fondevilla, Kü ster, et al. 2011;Prioul et al. 2004;Prioul-Gervais et al. 2007;S. Fondevilla et al. 2008S. Fondevilla et al. , 2018Carrillo et al. 2013Carrillo et al. , 2014Fondevilla, Almeida, et al. 2011;Timmerman-Vaughan et al. 2004, 2002Castillejo et al. 2020;Jha et al. 2017). Through the use of proteomics and gene expression studies, some candidate genes have been reported (Fondevilla et al. 2018;Castillejo et al. 2020), none of those were found to be within 500 kb flanking of the significant SNPs found in the current study. One reason for this may have been that previous studies relied on biparental populations, which will have fewer segregating alleles than in a diversity panel, and those alleles could be different than the ones segregating in this population. Experimental conditions and pathogen strain used can also be a cause of different results, most of the previous studies were phenotyped in the field, while we used controlled conditions. Little is known about the difference between the strains used in the present and past studies and the existence of P. pinodes pathotypes have been reported (Khan et al. 2013). If the strains belong to different pathotypes, difference in resistance could be observed. Furthermore, past studies utilized very few markers in comparison with the present study, which can result in a lower resolution for the QTL positioning.
The Le gene, described by Mendel (Ellis et al. 2011), was the first gene found to be associated with internode length in pea. The gene was mapped on chr5(LG3) (Psat5g299720.1, position: chr5LG3:567365719.567368443), and shown to affect gibberellin (GA) biosynthesis (Lester et al. 1997). GA affects many aspects of plant growth, including stem elongation and leaf expansion (Achard and Genschik 2009). Although Le plays a significant role in stem length, different studies have found multiple regions of the genome associated with internode length, suggesting that the trait is more complex than previously suggested by Mendel's studies Tafesse et al. 2020). Today we know that there are at least 3 more genes involved in the production of GA (Reid and Potts 1986;Ingram and Reid 1987;Martin et al. 1999;Davidson et al. 2003) and 2 more involved in the response to the hormone (Peck et al. 2001). The closest significant SNP to Le in the present study (S5LG3_569851018) is 2 Mb away and it was associated with leaf area and internode 2-3. Eight other SNPs were associated with internode length, illustrating that more than 1 gene is responsible for the trait.
Using biparental populations, Smitchger and Weeden (2019) found that a QTL on chr2(LG1) associated with seed size, side branch diameter, leaf length, main stem diameter, compressed side branch thickness, and compressed main stem. A QTL for seed size in a similar position was previously reported and called Tsw1.1, so Smitchger and Weeden (2019) called the QTL Putative Tsw1.1. We identified a significant SNP (S2LG1_353493) for internode 2-3 at 1.09 Mb distance from Putative Tsw1.1.
Marker S5LG3_544595701, which here was significantly associated with leaf area and stem diameter, is 0.23 Mb from Fig. 1. Distribution, correlations, and scatter plots of each trait BLUPs. Line for the best fit of data is displayed. *** correspond to P-value <0.001.
marker Chr5LG3_572669963 which was associated with reproductive stem length by Tafesse et al. (2020) in a GWAS that used a different population than the present study. Since these traits are both related to growth parameters and could be affected by similar processes, the markers could be associated with the same gene or a gene cluster in the region. We also found a marker (S5LG3_569851018) associated with leaf area to be 0.06 Mb from another marker (chr5LG3_569788697) previously associated with leaf chlorophyll concentration (Tafesse et al. 2020). Manhattan plots and the corresponding Q-Q plots for the traits indicated. a, e) sAUDPC for P. pinodes. b, f) Internode length between nodes 2 and 3. c, g) Internode length between nodes 5 and 6. d, h) Area of 6th leaf. e, i) Stem diameter between nodes 5 and 6. In the Manhattan plots, the Àlog 10 Pvalues adjusted for FDR (y-axis) are plotted against the position of each chromosome (x-axis), each circle represents an SNP, solid green line represents FDR adjusted P-value threshold of 0.005 and dotted line threshold of 0.05. Ashtari Mahini et al. (2020) used a biparental population to evaluate resistance to Sclerotinia sclerotium in peas in controlled conditions and found the same QTL to be associated with internode length and 2 measures of disease resistance. The allele conferring resistance was also associated with shorter plant stature. We found that SNP S1LG6_369964198 on chr1 (LG6) had the minor allele associated with less disease (smaller sAUDPC), smaller leaf area, and thinner stem. This SNP in conjunction with the positive correlation of growth traits and sAUDPC (Fig. 1) is a strong indication that growth parameters and disease resistance can be affected by the same genes. This is not a new phenomenon, disease resistance genes might come at a metabolic cost for plants (Karasov et al. 2017), which could explains the results observed in this study that genotypes with smaller leaves have less disease.

Data availability
GBS data aligned with the reference genome is available at https://github.com/selizpowers/GWAS (Powers et al. 2021). All phenotypic data are available in Supplementary Table 5.
Supplemental material is available at G3 online.

Acknowledgments
We would like to thank the NCSU Phytotron staff, for the good care of the facility and care of plants. Also, we thank the guidance of Dr. Rebecca McGee through the grant writing and experiment evaluations.

Conflicts of interest
None declared.