Phenotypic and genomic analyses of a fast neutron mutant population resource in soybean

Mutagenized populations have become indispensable resources for introducing variation and studying gene function in plant genomics research. In this study, fast neutron (FN) radiation was used to induce deletion mutations in the soybean (Glycine max (L.) Merrill) genome. Approximately 120,000 soybean seeds were exposed to FN radiation doses of up to 32 Gy to develop over 23,000 independent M 2 lines. Here, we demonstrate the utility of this population for phenotypic screening and associated genomic characterization of striking and agronomically important traits. Plant variation was catalogued for seed composition, maturity, morphology, pigmentation, and nodulation traits. Mutants that showed significant increases or decreases in seed protein and oil content across multiple generations and environments were identified. The application of CGH (comparative genomic hybridization) to lesion-induced mutants for deletion mapping was validated on a mid-oleate X-ray mutant M23 with a known FAD2-1A gene deletion. Using CGH, a subset of mutants was characterized, revealing deletion regions and candidate genes associated with phenotypes of interest. Exome resequencing and sequencing of PCR products confirmed FN-induced deletions detected by CGH. Beyond characterization of soybean fast neutron mutants, this study demonstrates the utility of CGH, exome sequence capture, and next generation sequencing approaches for analyses of mutant plant genomes. We present this FN mutant soybean population as a valuable public resource for future genetic screens and functional genomics research. This study describes the assembly and characterization of the largest collection of soybean FN mutants to date, accompanied by the only FN mutant database to display genome-wide coverage of deletion events in addition to recorded phenotypic traits.


INTRODUCTION
The release of whole genome sequences in crops such as soybean (Schmutz et al., 2010) marks a new era for genomics research in crop species. Soybean (Glycine max (L.) Merrill) is one of the most valued crops for its ability to fix nitrogen and provide seed protein and oil. Resources to study gene function in this important species are needed, and using mutagenesis to develop population resources has long proven to be a key step for identifying gene function in many organisms.
A number of mutagen sources exist for introducing genomic variation. These include chemical, radiation, and transformation-induced mutagenesis of plant genomes (Ostergaard and Yanofsky, 2004;Waugh et al., 2006;Kuromori et al., 2009). Each of these methods results in a signature footprint of structural variation across the genome (Alonso and Ecker, 2006). Fast neutron (FN) radiation is a particularly promising source of mutagenesis due to the potential to create deletions in a wide range of sizes (Li and Zhang, 2002) for gene knockouts and disruptions.
FN radiation has been used to induce mutations for many decades and has been shown to be an effective mutagen in plants (Koornneef et al., 1982). The majority of mutations that result from FN bombardment are DNA deletions that range in size from a few base pairs to several megabases (Li et al., 2001; Men et al., 2002). Precedent exists in many species, including Arabidopsis (Alonso et al., 2003), Medicago truncatula (Oldroyd and Long, 2003), Glycine soja (Searle et al., 2003), barley (Zhang et al., 2006), and Lotus japonicus (Hoffmann et al., 2007) for the use of FN mutagenesis in forward genetic screens. Many phenotype-associated genes have been successfully identified and cloned through such screens (Meinke et al., 2003).
Recent years have seen renewed interest in FN mutagenesis, and the advent of whole genome technologies has increased the capacity for mutant screening. Comparative genomic hybridization (CGH) microarrays and high-throughput next-generation sequencing (NGS) platforms are powerful tools for the analysis of copy number changes, polymorphisms, and structural variation in the genome. CGH compares DNA hybridization intensities from each genomic sample against a set of fixed probes and is capable of detecting regional copy number changes, such as deletions and duplications (Carter, 2007). NGS can be used to map deletions and insertions, polymorphisms, inversions and translocations (Medvedev et al., 2009). Coupling whole exome sequence capture with high-throughput NGS can target genes for selective resequencing. Previous studies have shown the utility of each of these approaches to identify and locate genomic changes (Sebat et al., 2004;Hodges et al., 2007;Korbel et al., 2007;Choi et al., 2009). Application of these technologies to a FN mutant population brings detection of gene deletions and phenome-based genetic screens to a whole genome level.
In this study, we describe the development of a FN mutant population resource in soybean (Glycine max). This population was used to screen for seed composition, maturity, morphology, pigmentation, and nodulation mutants. Phenotypes observed in the FN mutant population are described, and deletions within mutated soybean genomes of interest are characterized. We show the promise of the FN mutant population as a community resource and demonstrate the utility of CGH and NGS methods for performing informative genome analyses on such a population.

Soybean fast neutron population resource development
Approximately 60,000 soybean seeds of cultivar M92-220 were irradiated with fast neutrons in the first round of mutagenesis. One quarter (15,000) of the seeds were irradiated at each of the following doses: 4, 8, 16, and 32 Gy. In the first season, 20,000 M 1 seeds, 5,000 from each dose, were planted in two locations and harvested by single seed descent. An additional 60,000 seeds were irradiated in a second round of mutagenesis, half at 16 Gy, and half at 32 Gy, for greater representation in the higher doses, and seeds were also harvested by single seed descent. Independent M 2 plants were propagated in three locations and seed was harvested from individual plants. Statistics on dose comparisons for emergence, survival and propagation percentages were compiled (Table I). As expected, seeds exposed to greater FN radiation doses resulted in fewer emerged M 1 plants in the field and fewer individuals that produced viable seed at the end of the season. In the M 2 generation, radiation dose also correlated inversely to the number of mutant lines that produced seed but did not affect plant emergence in the same manner. As the number of loci involved in overall plant development and reproduction is far greater than the number limited to plant emergence, this result was not surprising. Since mutant phenotypes in the M 2 generation are more likely due to heritable effects, as opposed to mutant phenotypes in the M 1 generation that may be due to physical damage from FN bombardment, M 2 plant leaf tissue and phenotypes were collected and catalogued. Corresponding M 3 seed was collected from independent M 2 individuals to form the basis of a FN mutant library resource.

Forward screen for seed composition mutants
An immediate function for the FN soybean population was to serve as a forward genetics resource for seed composition mutants. To this end, 10,000 M 3 lines were screened by NIR (Near InfraRed) spectroscopy for seed protein or oil percentage of total seed composition.
Preliminary data on carbohydrate, fatty acid, and amino acid composition were also collected for a subset of samples. The mean value and detected range for seed composition components were assayed on seed harvested from three different locations (Supplemental Table S1) and showed significant increases and decreases in seed protein and oil within the FN mutant population.
A subset of mutant families consistently displayed increased or decreased seed protein or oil composition in subsequent generations. Eight seed protein and oil mutants, designated as "PO" mutants, were chosen primarily based on prior repeat performances in year-to-year rankings. PO1, PO3, and PO8 are mutants that displayed high seed protein phenotypes and ranked fourth, second, and first, respectively, among mutants in the M 4 generation. PO3 also exhibited chimeric leaf pigmentation. PO2 possessed the second highest seed oil content detected in the M 4 seed. PO4 and PO5 exhibited the lowest seed oil content, where PO4 also showed high seed raffinose and sucrose (0.82% and 8.29% of total seed carbohydrate, respectively) content. PO6 exhibited the second lowest seed protein content in the M 3 generation and the lowest seed protein content in the M 4 generation. PO7 exhibited the highest seed oil content in the M 3 and M 4 generations. PO3 showed an increase in combined seed protein and oil content compared to M92-220, and PO7, PO2, and PO8 also showed minor increases in combined seed protein and oil content.
The percent protein and oil composition of M 6 seed from the 2010 season for mutants PO1-PO8 are displayed (Table II). High seed protein and low seed protein as well as low seed oil phenotypes were confirmed in 2010. However, the selected high seed oil phenotypes were not observed in the 2010 season, and we attribute this variation to growing season conditions. Several seed protein and oil content mutants displayed maturity differences; however, maturity date did not correlate with a fixed trend in high or low seed protein or oil.

Fast neutron mutants with visual phenotypes
Visual phenotypes were recorded for the FN mutant population. Over five hundred independent individuals were observed that displayed an abnormal visual phenotype. Altered phenotypes were observed for approximately two percent of the soybean fast neutron mutant population.
Visual phenotypes observed in the field that were not attributed to disease were categorized under the areas of morphology, pigmentation, or maturity. Morphological phenotypes accounted for over seventy percent of the observed abnormal phenotypes. Pigmentation abnormalities were noted for over thirty percent of observed phenotypes, and maturity differences were recorded for around fifteen percent of recorded observations. Some of the mutant lines displayed more than one documented phenotype. Most visual phenotypes were observed for mutants derived from seed treated with 16 Gy (51.7%) or 32 Gy (38.8%) FN radiation doses.
A subset of mutants representing a few of the visual phenotypes observed in the FN population is shown in Figure 1. These include a short trichome mutant (Figure 1A), chimeric and yellow pigmentation mutants (Figure 1B and 1C), and a short petiole mutant with crinkled, curled leaves (Figure 1D). A separate screen for root and nodulation mutants was also conducted within the growth chamber using duplicate M 2 seed. Among observed phenotypes during the root and nodulation phenotype screen were a non-nodulating mutant (Figure 1F), a robust mutant with early podset (Figure 1G), and a hypernodulating mutant (Figure 1H).

CGH validation and marker development for a known FAD2 gene deletion in the M23 mutant
CGH (comparative genomic hybridization) possesses the potential to map deletions in mutant genomes. Accordingly, a custom NimbleGen soybean 700K-feature CGH microarray containing 696,139 unique soybean probes was designed. This design incorporated unique probes that were spaced approximately every 1100 bp along the reference soybean genome sequence (www.phytozome.org). To validate the CGH method, genomic DNA from the M23 line was analyzed to verify a known ω -6 fatty acid desaturase (FAD2-1A) gene deletion. Increased oleate content in soybean oil improves its oxidative stability, thereby reducing the risk of producing trans-fatty acids from chemical hydrogenation of processed oil. Several soybean cultivars with increased oleate content are available, one of which is the mid-oleate mutant line M23 (Rahman et al., 1994) derived from X-ray mutagenized seed of the Bay soybean cultivar. Genetic studies indicated that the mid-oleate phenotype of M23 is associated with the deletion of FAD2-1A (Alt et al., 2005;Sandhu et al., 2007;Anai et al., 2008).
Given the agronomic importance of M23 in current breeding programs for increased seed oleate content (e.g., Scherder et al., 2008), CGH could also be utilized to define the presence of other genetic lesions in the M23 genome, in addition to the previously reported FAD2-1A deletion. Based on the normalized log 2 ratio of M23 to control (cultivar Bay) CGH data, a 163.6 kb CNV (Copy Number Variation) event was detected on chromosome 10 where the M23 hybridization signal was approximately four-fold less than Bay, indicating a deletion (Figure 2A). According to the reference soybean genome sequence, twenty annotated genes are predicted within the chromosome 10 deletion (Supplemental Table S2). To confirm the chromosome 10 deletion, the genomic region encompassing the deletion junction was amplified by PCR. Flanking primers used for amplification were approximately 1.5 kb from the predicted deletion. As expected, no amplification was detected when genomic DNA from wild-type Bay was used as template, whereas a product of ~3.0 kb was obtained with M23 genomic DNA (Figure 2B). Sequencing of the PCR product precisely mapped the deletion to base positions 49369546 to 49533559 of chromosome 10. Except for an extra nucleotide at the deletion junction, no sequence change was detected in the DNA regions immediately flanking the deletion. The PCR primers designed to amplify this region could be used as a molecular marker to select for the mid-oleate phenotype in segregating breeding populations.
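The two numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the deletion size is simply the distance between the two sequenced breakpoints, and that a four-fold reduction in hybridization signal corresponds to a log 2 ratio of log2(1/4). The function names are hypothetical and are not part of the original analysis.

```python
import math

# Breakpoints of the M23 chromosome 10 deletion as reported in the text.
DELETION_START = 49369546
DELETION_END = 49533559


def deletion_size_bp(start: int, end: int) -> int:
    """Deletion size taken as the distance between the two breakpoints."""
    return end - start


def log2_ratio_for_fold_reduction(fold: float) -> float:
    """log2(sample/control) when the sample signal is 1/fold of the control."""
    return math.log2(1.0 / fold)


size_bp = deletion_size_bp(DELETION_START, DELETION_END)
size_kb = round(size_bp / 1000, 2)
expected_log2 = log2_ratio_for_fold_reduction(4)

print(f"sequenced deletion: {size_bp} bp ({size_kb} kb)")
print(f"log2 ratio for a four-fold reduction: {expected_log2}")
```

The sequenced size (about 164.01 kb) is close to the 163.6 kb CGH estimate, which is the level of agreement one would expect given probe spacing of roughly 1100 bp.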

Genomic analysis of fast neutron mutants by comparative genomic hybridization
Many aberrant phenotypes were observed in the soybean FN mutant population and found to be heritable. These phenotypes were hypothesized to result from genomic changes within the mutant after exposure to FN radiation. To confirm and characterize mutations at the genomic level, we performed comparative genomic hybridizations (CGH) using the custom NimbleGen soybean 700K-feature CGH microarray. Thirty microarray hybridizations were performed using genomic DNA from a subset of soybean FN mutants. These mutants were categorized by phenotype class: seed composition, visual phenotype, maturity, and root and nodulation. Eight seed protein and oil composition mutants were chosen for CGH analyses based on high or low seed protein or oil composition across multiple environments and generations (Table II).
The M 4 generation was used for CGH analyses, and high/low phenotypes were confirmed on the M 5 seed harvested from an M 4 plant. Genomic regions that exhibited changes in DNA copy number were detected in these soybean FN mutants and collectively displayed in Figure 3.
Soybean FN mutants with aboveground visual phenotypes were also assayed by CGH to detect and map genomic changes. VP1 (Figure 1A) and VP2 mutants were independently recovered and both exhibited a short trichome phenotype. VP3-VP5 pigmentation and short petiole mutants (Figure 1B-D) were also assayed. In addition, a copper leaf colored mutant (VP6), a mutant with abnormal floral meristem development (VP7), a mutant showing fused trifoliates (VP8), and a mutant exhibiting thick, twisted petioles (VP9) were subjected to CGH for genomic analysis. DNA copy number changes detected by CGH for these visual phenotype mutant lines are displayed in Figure 4. Also shown are genomic regions with DNA copy number changes detected by CGH for late maturity, early maturity, and root and nodulation mutants (Supplemental Figures S1-S3).
Microarray hybridizations were performed using genomic DNA from M 2 , M 3 , or M 4 plant tissue.
Of the thirty hybridization results, a subset of M 4 seed composition and maturity mutant results were derived from a common M 2 or M 3 plant. Thus, several detected DNA copy number change regions coincided (Figure 3 and Supplemental Figure S1: Chromosome (Ch) 10, ~48.7 Mb (PO1, LM1, LM5); Ch 10, ~23 Mb (PO4, PO5)) due to shared pedigree. These results refer only to the relationship of specific mutant plants that were assayed by CGH and not to mutant lines in general that are unique.

Screening for false-positive CNV events in the soybean FN population
A number of genomic locations consistently exhibited CNV across many FN mutant genotypes.
These can be observed in the multicolored groupings above or below the axis (Table S3, Supplemental Figure S4). These data indicate that there were several regions of genomic heterogeneity maintained among the M92-220 individuals within the population (bulked seed) that was exposed to FN mutagenesis. This type of intra-cultivar heterogeneity appears to be typical of many soybean accessions (Haun et al., 2011). Segregation of regions of genomic heterogeneity that include natural copy number polymorphisms may appear as false-positive fast neutron-induced duplications or deletions in CGH CNV analyses.
We used a combination of CGH and SNP genotyping data to mask regions of genomic heterogeneity from FN CNV analysis. Microarray probes located within confirmed polymorphic SNP regions were removed from consideration as FN-induced polymorphisms. The degree of variation at each probe position across 30 CGH microarrays was visualized by calculating and plotting values that crossed the 95th percentile log 2 CGH ratio threshold (Supplemental Figure S4). Major peaks of variation and SNP locations coincided to confirm regions of genomic heterogeneity within the population and to differentiate these regions from DNA copy number variation arising from other sources, i.e., FN bombardment.
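The heterogeneity scan summarized here, and detailed in the methods, can be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the study: each array is represented as a plain list of corrected log 2 ratios in probe order, deviation is expressed in units of the array-wide standard deviation, the percentile uses a nearest-rank definition, and all function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def per_array_deviation(log2_ratios):
    """Express each probe value as standard deviations from the array mean."""
    mu = mean(log2_ratios)
    sd = pstdev(log2_ratios)
    return [(x - mu) / sd for x in log2_ratios]


def mean_abs_deviation(arrays):
    """Average absolute deviation at each probe position across all arrays."""
    scored = [per_array_deviation(a) for a in arrays]
    return [mean(abs(s[i]) for s in scored) for i in range(len(scored[0]))]


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition; others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sliding_median(values, window=11):
    """Median over a centered window; windows are truncated at the ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def heterogeneity_candidates(arrays, pct=95, window=11):
    """Probe indices whose windowed median exceeds the percentile border."""
    scores = mean_abs_deviation(arrays)
    border = percentile(scores, pct)
    smoothed = sliding_median(scores, window)
    return [i for i, v in enumerate(smoothed) if v > border]


# Toy data: 200 probes of low-level noise with one block (probes 100-107)
# that deviates in every array, as a segregating natural variant might.
toy_array = [0.1 if i % 2 else -0.1 for i in range(200)]
for i in range(100, 108):
    toy_array[i] = -2.0
hits = heterogeneity_candidates([toy_array, toy_array, toy_array])
print(hits)
```

The sliding median suppresses isolated outlier probes, so only runs of consecutive deviant probes are flagged; in the study, such peaks were then compared with polymorphic SNP locations before masking.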

Genomic CNV events detected in fast neutron mutants
Analysis of CGH data revealed a total of 61 genomic DNA regions with CNV among the 30 lines tested. This number was calculated using a stringent threshold of segments greater or less than three standard deviations from the mean normalized log 2 ratio for the sample versus control.
Of the 61 CNV events that passed the threshold, 52 were putative deletion regions (85.25%), and 9 were putative duplication regions (14.75%). The average number of regions with CNV per mutant was 2.03, and the average number of deletion regions detected per mutant was 1.73.
The number of genes located within each of the 61 significant CNV events ranged from zero to 145. On average, seventeen genes were found per CNV event. A total of 1048 genes, including 634 high confidence genes, were found within all CNV events defined across the 30 CGH microarrays (Supplemental Table S5). For 130 of these genes, putative paralogous genes were found elsewhere in the genome. Over half of the detected CNV events occurred in pericentromeric regions, with 29 of 52 deletions and 5 of 9 duplications. This finding may be expected, as approximately half of the genome space consists of pericentromeric regions.
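The summary statistics quoted in this section follow directly from the reported counts. The short sketch below simply recomputes them from the numbers given in the text; the variable names are illustrative.

```python
# Counts reported in the text for the 30 CGH-assayed lines.
N_LINES = 30
N_DELETIONS = 52
N_DUPLICATIONS = 9
N_GENES = 1048
N_PERICENTROMERIC = 29 + 5  # deletions plus duplications

n_cnv = N_DELETIONS + N_DUPLICATIONS


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


summary = {
    "cnv_events": n_cnv,
    "pct_deletions": pct(N_DELETIONS, n_cnv),
    "pct_duplications": pct(N_DUPLICATIONS, n_cnv),
    "cnv_per_mutant": round(n_cnv / N_LINES, 2),
    "deletions_per_mutant": round(N_DELETIONS / N_LINES, 2),
    "genes_per_cnv": round(N_GENES / n_cnv, 1),
    "pct_pericentromeric": pct(N_PERICENTROMERIC, n_cnv),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```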
Importantly, deletion regions that contained single genes and mutants with as few as four predicted genes within all detected deletions were recovered.

Genomic analysis of soybean fast neutron mutants by exome resequencing
To confirm CGH-detected deletions in FN mutants, exome capture and resequencing were performed. Within the CGH-detected deletion regions examined, there was no evidence for the existence of exon sequence in the mutant sample DNA, while the control sample did provide sequence reads that mapped to these regions (Figure 5B and 5D). Similarly, exome resequencing and whole genome paired-end mapping with NGS confirmed a CGH-detected deletion on chromosome 13 at ~42.3 Mb in VP5 (data not shown). Exome resequencing was performed on genomic DNA from a sibling of the short trichome VP1 mutant assayed by CGH; this sibling also displayed the short trichome phenotype. In addition, whole genome paired-end mapping using NGS was performed on VP1 (data not shown). The same deletion was detected on chromosome 5 at ~36.4 Mb by three different approaches (CGH, exome resequencing, and whole genome paired-end mapping through NGS) in two individuals with the VP1 short trichome mutation.

Demarcation of fast neutron deletion regions and cosegregation of a deletion with a dominant mutant phenotype
A small deletion detected by both CGH and exome resequencing in VP1 contained a single gene (Glyma05g31280) encoding a tetratricopeptide repeat-containing chaperone binding protein.
This region was chosen for polymerase chain reaction (PCR) confirmation. Primers were designed within the CGH probe sequences that flanked the deletion region, and PCR was performed under short extension conditions on VP1 versus wild-type M92-220 genomic DNA templates. Gel electrophoresis of the PCR products was performed. A single 579 bp product was obtained from VP1 and not seen in the wild-type control (Figure 6A). Upon sequencing of the 579 bp product and alignment to the reference genome sequence, the exact breakpoint sites for the deletion were determined to span chromosome 5 base positions 36426532-36430207 (Figure 6B). A larger deletion of nearly 40 kb on chromosome 10 that was detected by CGH (Figure 5A) and exome resequencing (Figure 5B) was also confirmed by PCR in PO1 (Figure 6C and D).
A putative heterozygous deletion on chromosome 17 detected by CGH and by whole genome paired end mapping through NGS (data not shown) in VP5 was also confirmed by PCR.
Sequencing of the PCR product (Figure 6E) revealed an 837,919 bp deletion on chromosome 17. This deletion spans chromosome 17 base positions 6932666-7770585, encompasses 87 high-confidence genes, and interrupts a ubiquitin-specific proteinase gene at one end. M 3 progeny of VP5 segregated ~3:1 for the short petiole phenotype, indicative of a dominant mutant phenotype.
The chromosome 17 deletion was found to cosegregate with the mutant phenotype (Figure 6E and F) in all fourteen M 3 progeny. We used CGH to genotype six of the M 3 progeny (data not shown) and found three mutant individuals heterozygous for the chromosome 17 deletion. A total of 41 M 4 individuals derived from the heterozygous M 3 parents were scored for plant architecture and the chromosome 17 deletion. A perfect correlation was found between the presence of at least one copy of the chromosome 17 deletion and the short petiole phenotype among the 30 mutant and 11 wild-type segregating individuals. These data suggest that the chromosome 17 hemizygous deletion may be sufficient to confer the mutant phenotype in a dominant fashion or is tightly linked to the dominant causative locus. Furthermore, these data provide evidence for the potential to detect associated loci and develop specific markers for mutant phenotypes within this soybean FN mutant population.
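As an illustration of how the M 4 counts relate to dominant inheritance, the sketch below applies a chi-square goodness-of-fit test against a 3:1 ratio to the 30 mutant and 11 wild-type individuals. This test is not reported in the study; it is added here only as a worked example, and the critical value of 3.841 (one degree of freedom, alpha = 0.05) is the standard tabulated figure.

```python
def chi_square_3_to_1(n_mutant: int, n_wild_type: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = n_mutant + n_wild_type
    expected = (0.75 * total, 0.25 * total)
    observed = (n_mutant, n_wild_type)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1 = 3.841

# M 4 counts reported in the text: 30 short petiole, 11 wild type.
statistic = chi_square_3_to_1(30, 11)
consistent_with_3_to_1 = statistic < CRITICAL_VALUE_DF1

print(f"chi-square = {statistic:.4f}")
print(f"consistent with 3:1 at alpha 0.05: {consistent_with_3_to_1}")
```

With an expectation of 30.75 mutant and 10.25 wild-type plants, the statistic is far below the critical value, so the observed counts do not depart from a single dominant locus model.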

Soybean fast neutron mutant database
The complete catalog of soybean fast neutron M 2 mutants from this study has been launched at http://www.soybase.org/mutants. FN mutant trait data, observed phenotype descriptors, and photographs are presented on this site along with parallel data from the unmutagenized wild-type M92-220. Currently, the soybean FN mutant database lists over 23,000 independent FN mutant lines. The original seed composition data on M 3 seed from over ten thousand independent mutants are also displayed on the site. BLAST (Altschul et al., 1990) searches allow the user to find mutants with CNV events that cover a gene with nucleotide or protein sequence similarity to a sequence of interest.
An added "mutant" track for FN mutants on the genome browser displays the location of all CNV events defined in this study and leads to information on the genes within identified deletion and duplication regions.

Genomic mutation detection methods
The NimbleGen 700K soybean CGH microarray was designed with unique probes spaced approximately every 1100 bp along the soybean genome sequence for a total of 696,139 probe sequences. This coverage allowed for preliminary whole genome information to be gained regarding the effect of FN radiation exposure on soybean genomic DNA. The resolution of the microarray platform and analysis method was limited in many regions to approximately 2 kb due to the requirement for a CNV event to cover at least two adjoining probes. In practice, we were able to detect a 986 bp deletion because the spacing between probes in this case was closer than average at ~450 bp. It is also possible that a single deviant datapoint on the CGH array could represent a true deletion; however, such a region would not be identified using our stringent filtering criteria, and the exact breakpoints for fragmented deletion borders may not be resolved using CGH alone. The size of FN-induced deletions defined in this study ranges from less than 1 kb to almost 3 Mb and expands the range of FN-induced deletions reported in plants (Li et al., 2001; Men et al., 2002). However, an underestimation of structural variation events, particularly of smaller deletions and duplications, is likely to occur.
To validate the use of CGH for detecting gene deletions, we performed CGH on the mid-oleate X-ray mutant M23 with a known FAD2-1 gene deletion. Through CGH, we delineated an X-ray-induced genomic deletion to an estimated size of 163.6 kb on chromosome 10, encompassing the FAD2-1A gene, and subsequent sequencing of the region showed that the deletion covered 164.01 kb. This control provided a proof of concept that CGH analysis can be reliably used to identify and develop markers for important deletion breakpoints in the soybean genome.
The use of paired-end mapping NGS technologies to detect structural variations in the genome has been reported previously (Korbel et al., 2007). In this study, we combined paired-end mapping NGS with array capture technologies to detect the presence and absence of gene exons in mutant versus control genomic DNA. Our use of NGS combined with exome capture and whole genome paired-end mapping confirmed CGH-detected deletions. Hybridization-based platforms are subject to many sources of error that may reduce the power of detecting polymorphisms, particularly for small features. The addition of NGS approaches allowed us to impose stringent parameters that, in conjunction with CGH, resulted in high-confidence calls, particularly for polymorphic genomic regions. The use of NGS paired-end technologies also adds the utility of mapping the insertion location of genomic duplications and translocations. As NGS costs decrease in the future, the cost per genotype will likely become comparable or preferable to CGH and enable the high-throughput genotyping of lesion-induced mutant populations.

Connecting gene to function
Genes detected in copy number change regions are candidate genes for the associated traits of interest. Assessment of existing annotations reveals potential genes that may play a role in the observed phenotype. For example, genes involved in regulation of protein metabolism (Glyma10g18310, encoding ubiquitin-conjugating enzyme) and proteolysis (Glyma10g19260, encoding serine carboxypeptidase) are deleted in low seed oil mutants (PO4 and PO5, Supplemental Table S5). Transcripts for a gene encoding a ubiquitin-conjugating enzyme were previously observed to accumulate during seed development of a high seed oil soybean cultivar (Wei et al., 2008). In addition, a gene involved in protein biosynthesis (Glyma02g03750,  Table S5). Previously identified QTL regions for seed protein, seed oil, and pod maturity exist on chromosomes with detected deletions. Additional mapping and follow-up studies will be required to determine whether these regions coincide.
Greater evidence for the existence of functional genes within FN mutant deletion regions may be found through the use of complementary genome resources. For example, examination of genes within deletion regions that possess minimum transcript accumulation evidence of at least ten read counts in at least one tissue of the recently reported RNA-Seq soybean gene expression atlas (Libault et al., 2010;Severin et al., 2010) reduces the gene list from 1048 to just over two hundred genes. Of these genes, only ~150 do not possess paralogs elsewhere in the genome.
Using such deduction methods, the potential pool of candidate genes may be condensed. The genome browser function at SoyBase (www.soybase.org/gbrowse/cig-bin/gbrowse (Grant et al., 2009)) allows for direct comparison of the location of soybean fast neutron deletions and duplications detected in this study to soybean gene models, gene expression evidence, and duplicated gene segments.
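The filtering logic described above (retain genes with at least ten reads in at least one tissue, then set aside genes with paralogs elsewhere in the genome) can be sketched as below. The data structures, gene identifiers, tissue names, and read counts are all hypothetical placeholders chosen for illustration; they are not values from the RNA-Seq atlas.

```python
def is_expressed(gene, atlas, min_reads=10):
    """True if the gene reaches min_reads in at least one tissue."""
    return any(count >= min_reads for count in atlas.get(gene, {}).values())


def shortlist(genes, atlas, paralogs, min_reads=10):
    """Return (expressed genes, expressed genes without known paralogs)."""
    expressed = [g for g in genes if is_expressed(g, atlas, min_reads)]
    unique = [g for g in expressed if not paralogs.get(g)]
    return expressed, unique


# Hypothetical toy inputs: gene -> {tissue: read count}, gene -> paralog list.
toy_genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
toy_atlas = {
    "gene_a": {"seed": 42, "root": 3},
    "gene_b": {"seed": 2, "root": 9},
    "gene_c": {"leaf": 10},
    # gene_d has no expression evidence at all.
}
toy_paralogs = {"gene_a": ["gene_x"], "gene_c": []}

expressed, unique = shortlist(toy_genes, toy_atlas, toy_paralogs)
print(expressed)
print(unique)
```

Genes with paralogs are not necessarily poor candidates, since a deletion may still cause a dosage effect; the second list is simply the subset where functional redundancy is least likely to mask a phenotype.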
The assembly of a FN mutant population in soybean facilitates the study of genes with loss-of-function phenotypes. Our findings support the utility of FN radiation as a mutagen to delete gene regions with tandem duplications. For example, eighteen genes with F-box and WD domain annotations (Glyma15g19120-Glyma15g19290) were found within a deletion region detected in EM3 (Supplemental Table S5). Multiple leucine-rich repeat genes (Glyma16g27520-Glyma16g27560), aminotransferase-related genes (Glyma09g07320-  Together with NGS technological advances, the utility of a fast neutron mutant population may now be explored more rapidly and comprehensively than ever.

Resource development
Soybean seeds of mid-maturity group I cultivar M92-220, derived from the 2006 Crop Improvement Association seed stock of the variety MN1302 (Orf and Denny, 2004), were exposed to fast neutron radiation doses of 4, 8, 16, and 32 Gy at the McClellan Nuclear Radiation Center at UC-Davis. M 1 seed was planted and propagated by single seed descent. M 2 seed was planted in a grid format, and M 2 plants were individually tagged with an assigned barcode identifier. Young leaf tissue was collected for each M 2 plant, and M 3 seed was singleplant harvested from greater than 20,000 M 2 plants. Seeds were analyzed for seed composition by NIR using a Perten DA7200 diode array instrument (Huddinge, Sweden) equipped with calibration equations developed by Perten in cooperation with the University of Minnesota.

SNP genotyping
Genomic DNA samples from soybean leaf tissue were isolated using the Qiagen DNeasy kit protocol. DNA samples were then assayed on the Illumina Goldengate® platform for genotyping using 1536 soybean Universal SNP BARC markers (Hyten et al., 2010).
Polymorphic SNP markers across the soybean FN population were detected, and SNP markers were subject to BLAST (Altschul et al., 1990) analysis to recover the physical SNP position along the reference genome sequence.

Comparative Genomic Hybridization (CGH)
Comparative genomic hybridizations were performed as previously described. CGH data were analyzed using the Roche NimbleGen NimbleScan v2.5 segMNT algorithm. Probe segmentation and corrected log 2 ratios were obtained for each probe datapoint essentially as described (Haun et al., 2011).
Parameters were set for minimum segment lengths of two probes and segment log 2 ratio differences of 0.1 between segments at the 0.999 acceptance percentile. Spatial correction and qspline normalization was applied. Significant copy number changes were determined by retrieving segments with an average corrected log 2 ratio greater than the average plus three standard deviations (increase) or less than the average minus three standard deviations (decrease). If a gap between potential segments was less than half the size of the total distance covered by neighboring segments, then the entire region was considered a single CNV event.
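The thresholding and merging rules in this paragraph can be sketched as follows. This is an illustrative reading, not the NimbleScan procedure or the study's own code: it assumes the mean and standard deviation are taken over all probe log 2 ratios on an array, that only segments on the same chromosome and in the same direction are merged, and that a merged event takes the length-weighted mean log 2 ratio. The `Segment` class, function names, and `Gm05`/`Gm10` chromosome labels are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    log2: float  # average corrected log2 ratio of the segment


def call_significant(segments, probe_log2, n_sd=3.0):
    """Keep segments beyond mean +/- n_sd standard deviations."""
    mu, sd = mean(probe_log2), pstdev(probe_log2)
    upper, lower = mu + n_sd * sd, mu - n_sd * sd
    calls = []
    for seg in segments:
        if seg.log2 > upper:
            calls.append((seg, "gain"))
        elif seg.log2 < lower:
            calls.append((seg, "loss"))
    return calls


def merge_calls(calls):
    """Join neighbors when the gap is under half their combined length."""
    merged = []
    for seg, kind in sorted(calls, key=lambda c: (c[0].chrom, c[0].start)):
        if merged:
            prev, prev_kind = merged[-1]
            gap = seg.start - prev.end
            w_prev, w_seg = prev.end - prev.start, seg.end - seg.start
            same_event = prev.chrom == seg.chrom and prev_kind == kind
            if same_event and gap < 0.5 * (w_prev + w_seg):
                log2 = (prev.log2 * w_prev + seg.log2 * w_seg) / (w_prev + w_seg)
                joined = Segment(prev.chrom, prev.start, max(prev.end, seg.end), log2)
                merged[-1] = (joined, kind)
                continue
        merged.append((seg, kind))
    return merged


# Toy array: mostly noise around zero plus a few strongly negative probes.
toy_probes = [0.1, -0.1] * 500 + [-2.0] * 10
toy_segments = [
    Segment("Gm10", 1000, 5000, -2.0),
    Segment("Gm10", 6000, 9000, -1.8),     # 1 kb gap: merged with the first
    Segment("Gm10", 50000, 51000, -1.5),   # distant: kept separate
    Segment("Gm05", 100, 900, 0.9),        # gain
    Segment("Gm05", 5000, 9000, 0.05),     # within noise: discarded
]
events = merge_calls(call_significant(toy_segments, toy_probes))
for seg, kind in events:
    print(seg.chrom, seg.start, seg.end, kind, round(seg.log2, 3))
```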
Final mutant deletion and addition regions were determined after filtering each CNV event against regions of natural heterogeneity determined through SNP genotyping and CGH analysis of mutants within the population. Genes (ftp://ftp.jgipsf.org/pub/JGI_data/phytozome/v4.1/Gmax/annotation/Glyma1.gff2) that overlapped CNV events were determined using a custom Perl script. Paralogous genes in soybean were identified using BLAST (Altschul et al., 1990), DAGChainer (Haas et al., 2004), and selection of gene pairs from synteny blocks with average Ks values between 0.03 and 0.60.
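The gene overlap step was carried out with a custom Perl script that is not reproduced here. As a rough equivalent, the Python sketch below tests closed-interval overlap between a CNV event and a list of gene models. The CNV coordinates are those of the VP1 chromosome 5 deletion reported in the text; the gene records, their coordinates, and the `Gm05`/`Gm10` chromosome labels are hypothetical and serve only to exercise the function.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def genes_in_cnv(cnv, genes):
    """Names of gene models that overlap a CNV event on the same chromosome.

    cnv   -- (chrom, start, end)
    genes -- iterable of (name, chrom, start, end)
    """
    chrom, start, end = cnv
    return [name for name, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)]


# VP1 deletion breakpoints from the text; gene records are made up.
vp1_deletion = ("Gm05", 36426532, 36430207)
toy_genes = [
    ("gene_inside", "Gm05", 36427000, 36429500),
    ("gene_spanning_breakpoint", "Gm05", 36430000, 36433000),
    ("gene_outside", "Gm05", 36440000, 36442000),
    ("gene_other_chrom", "Gm10", 36427000, 36429500),
]

hits = genes_in_cnv(vp1_deletion, toy_genes)
print(hits)
```

A gene that only partially overlaps an event is still reported, which matches the case described for the chromosome 17 deletion, where a gene is interrupted at one end.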

CGH analysis of natural genetic heterogeneity within the mutant population
The standard deviation from the average corrected log2 ratio of Cy3 (sample) to Cy5 (control) intensities was calculated at each probe position for each CGH array. After the average absolute value of the above was calculated for each probe position across thirty CGH arrays, the 95th percentile border (1.103589) was calculated across 696,139 unique probe positions, and the median value across each eleven-probe sliding window was determined. Regions with median values that peaked above the 95th percentile border were candidate regions of genetic heterogeneity highlighted for further examination.

Exome resequencing and data analysis
Genomic DNA preparations from four mutants, PO1, PO8, VP1, and VP5, were extracted using the Qiagen (Valencia, CA) Plant DNeasy system. Library preparation, exome capture, amplification, and high-throughput sequencing of exome-captured libraries were performed as previously described (Haun et al., 2011). The Illumina Solexa 76-base paired-end short read sequences were aligned to the soybean genome sequence version 4.1 (Gmax.main_genome.scaffolds assembly; ftp://ftp.jgipsf.org/pub/JGI_data/phytozome/v4.1/Gmax/assembly/sequences/) using the SOAP2 software. The unique alignment allowed for a maximum of two mismatches. The Glyma v5.0 annotation file Glyma1_highConfidence.gff3 (2/8/2010) was used for exon annotations. There are 55,787 mRNAs and a total of 345,213 exons in this high-confidence annotation. After unique alignment of paired read sequences to the reference soybean genome, the number of reads in any direction at each exon in each gene was counted using a custom Perl script. For read counts, a minimum of 70 out of the 76 bases of read sequence was required to overlap the reference exon sequence.
The count number was globally normalized by dividing each number by the total counts in the sample. For visualization of exome differences, one count was added to each value, and the log2 ratio of the normalized sample over control count number was calculated and plotted.
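The exon read counting and normalization described for the exome data can be sketched in Python as follows. This is an illustration under stated assumptions rather than the custom Perl script used in the study: the function names and the tuple layouts for reads and exons are hypothetical, coordinates are taken as 1-based and inclusive, strand is ignored, and the pseudocount of one is assumed to be added to the raw count before dividing by the sample total (the text leaves the order of these two steps open).

```python
import math


def overlap_bases(read_start, read_end, exon_start, exon_end):
    """Number of positions shared by two closed intervals."""
    return max(0, min(read_end, exon_end) - max(read_start, exon_start) + 1)


def count_exon_reads(reads, exons, min_overlap=70):
    """Count reads with at least min_overlap bases inside each exon.

    reads: iterable of (chrom, start, end) for uniquely aligned reads.
    exons: list of (chrom, start, end).
    """
    counts = {exon: 0 for exon in exons}
    for chrom, r_start, r_end in reads:
        for exon in exons:
            e_chrom, e_start, e_end = exon
            if chrom == e_chrom and overlap_bases(r_start, r_end, e_start, e_end) >= min_overlap:
                counts[exon] += 1
    return counts


def log2_ratios(sample_counts, control_counts, pseudocount=1):
    """log2 of normalized sample over normalized control counts per exon.

    Assumes both totals are non-zero; the pseudocount placement is an
    assumption, as noted in the lead-in.
    """
    s_total = sum(sample_counts.values())
    c_total = sum(control_counts.values())
    ratios = {}
    for exon, s_count in sample_counts.items():
        s_norm = (s_count + pseudocount) / s_total
        c_norm = (control_counts[exon] + pseudocount) / c_total
        ratios[exon] = math.log2(s_norm / c_norm)
    return ratios
```

The nested loop is quadratic and only suitable for small inputs; for a full exome, an interval index or a sorted sweep over reads and exons would be the usual replacement. Exons with no read evidence in the sample but normal coverage in the control give strongly negative ratios, which is the pattern expected over a homozygous deletion.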

PCR analysis
Select regions with detected deletions were chosen for confirmation by PCR. Primers were designed within CGH probes that flanked the detected deletion region to yield a PCR product that spanned the deletion site. The PCR product was gel-extracted and sequenced. Alignment of the PCR product sequence to the reference G. max genome sequence assembly (www.phytozome.net) was performed to determine the exact breakpoint borders. The following primers were used for PCR confirmation: VP1 5'-GTA AGT AGC CTA CGC ATG ACC-3'
The normalized log2 ratio of sample to control data is plotted as the median across 11 probe datapoints across chromosome positions. Results from each array are color-coded for mutants PO1 through PO8. PO1-PO3, PO8 = high seed protein. PO4 = low seed oil and high seed raffinose and sucrose. PO5 = low seed oil. PO6 = low seed protein. PO7 = high seed oil. The gray overlay represents CGH data from control vs. control comparative genomic hybridizations. Colored regions above and below the control regions potentially represent copy number change differences. The y-axis scale is in terms of the number of standard deviations from average, with the segment threshold for deletions or duplications at +/-3.
RN3 = hypernodulating. The gray overlay represents CGH data from control vs. control comparative genomic hybridizations. Colored regions above and below the control regions potentially represent copy number change differences. The y-axis scale is in terms of the number of standard deviations from average, and the segment threshold for deletions or duplications is +/-3.
The absolute value of the normalized log2 ratio of sample to control intensities at each probe position was calculated for each of thirty CGH arrays. The median value across an eleven-probe sliding window was calculated using the average of the above at each probe position, and values that peaked above the 95th percentile border (1.103589) were plotted as regions of high variation detected by CGH (blue diamonds). BLAST positions of SNP markers from the Illumina Goldengate SNP genotyping platform that showed variation across the soybean fast neutron population are indicated (red bars). The majority of CGH and SNP genotyping results coincide (high stacks of blue diamonds above most red bars), indicating that these are likely regions of M92-220 intravarietal heterogeneity.
Each colored dot represents an exon in a high-confidence gene call. The color gradient indicates the lowest (red) to highest (blue) amount of read count evidence for an exon in sample PO1 compared to the control. The absence of sequence evidence for exons at ~48.7 Mb is shown and parallels the deletion found by CGH in (A). C) The corrected log2 ratios of sample PO8 to control intensities are shown for chromosome 16 where a deletion is detected at ~28.1 Mb. D)

Supplemental
The normalized exome resequencing log2 ratios of sample PO8 to control exon counts are displayed for chromosome 10. Each colored dot represents an exon in a high-confidence gene call. The color gradient indicates the lowest (red) to highest (blue) amount of read count evidence for an exon in sample PO8 compared to the control. The absence of sequence evidence for exons at ~28.1 Mb is shown and parallels the deletion found by CGH in (C).
This genetic marker locus in VP5 cosegregates with progeny displaying the short petiole phenotype (+) and is not found in wild-type (WT) or in progeny without the short petiole phenotype (-) shown in (F).