Nucleotide Polymorphism in the Wheat Transcriptional Activator Spa Influences Its Pattern of Expression and Has Pleiotropic Effects on Grain Protein Composition, Dough Viscoelasticity and Grain Hardness

Storage protein activator (SPA) is a key regulator of the transcription of wheat (Triticum aestivum L.) grain storage protein (GSP) genes and belongs to the Opaque2 transcription factor subfamily. We analyzed the sequence polymorphism of the three homoeologous Spa genes in hexaploid wheat. The level of polymorphism in these genes was high, particularly in the promoter. The deduced protein sequences of each homoeolog and haplotype show >93% identity. Two major haplotypes were studied for each Spa gene. The three Spa homoeologs have similar patterns of expression during grain development, with a peak in expression around 300 degree days after anthesis. On average, Spa-B is 10 and 7 times more strongly expressed than Spa-A and Spa-D, respectively. The haplotypes are associated with significant quantitative differences in Spa expression, especially for Spa-A and Spa-D. Significant differences were found in the quantity of total grain nitrogen allocated to the gliadin protein fractions for the Spa-A haplotypes, whereas the synthesis of glutenins is not modified. Genetic association analysis between Spa and dough viscoelasticity revealed that Spa polymorphisms are associated with dough tenacity, extensibility, and strength. Except for Spa-A, these associations can be explained by differences in grain hardness. No association was found between Spa markers and the average single grain dry mass or grain protein concentration. These results demonstrate that, in planta, Spa is involved in the regulation of GSP synthesis. The associations between Spa and dough viscoelasticity and grain hardness strongly suggest that Spa has complex pleiotropic functions during grain development.

Proteins are the most important components of wheat (Triticum aestivum L.) grain governing its end-use value. Grain storage protein (GSP) composition is known to determine dough cohesiveness and viscoelasticity (reviewed by Weegels et al., 1996), with highly elastic (strong) dough being suited for making bread and more extensible (weak) dough for making cakes and biscuits. The most abundant GSPs in wheat are the gluten-forming gliadins and glutenins, which together account for 60% to 80% of total grain protein. Gliadins are a mixture of monomeric proteins classed into four subgroups based on their amino-acid sequences and molecular weights (Wieser, 2007). Glutenins are composed of high-molecular-weight (HMW-GS) and low-molecular-weight (LMW-GS) subunits, which form very large macropolymers during grain desiccation (Don et al., 2006). It is generally accepted that glutenins have a prominent role in strengthening wheat dough by conferring elasticity, while gliadins contribute to the viscous properties of dough by conferring extensibility.
Because of the central role of GSP in determining dough processing properties, the loci coding for these proteins have been extensively studied. Analysis of alleles at the 12 main GSP loci revealed a high level of genetic and biochemical diversity (Zhang et al., 2003;2004;Ravel et al., 2006b) and direct relationships have been described between allelic variation at these loci and dough processing properties (e.g. Branlard et al., 2001;Békés et al., 2006). This information is now used in most wheat breeding programs. Significant quantitative genetic variations in GSP composition have also been reported for wheat (e.g. Graybosch et al., 1996;Huebner et al., 1997). However, only a few quantitative trait loci have been identified (Guillaumie et al., 2004;Charmet et al., 2005;Ravel et al., 2006c) and so far no gene or allele controlling natural variations in GSP composition has been identified.
The synthesis of GSPs during grain filling is controlled by several mechanisms including transcriptional and post-transcriptional modifications but it is generally thought that GSPs are primarily regulated through a network of interacting transcription factors (TFs; Verdier and Thompson, 2008). Three important cis-motifs are conserved in GSP promoters from wheat, barley, maize and rice. These are the GCN4-like motif (GLM) and the prolamin box (PB), which together constitute the endosperm box (EB;Colot et al., 1989), and the AACA motif (Zheng et al., 1993). The PB and the AACA motifs confer endosperm specific expression (Colot et al., 1987;Yoshihara et al., 1996), whereas GLM is thought to play a role in the nutritional regulation of prolamin gene expression (Müller and Knudsen 1993).
Several TFs have been reported to specifically interact with the EB and the AACA motif.
Recently, Moreno-Risueno (2008) reported the role of a B3-type TF (FUSCA3) binding to the RY box in the regulation of GSP genes. Full activation of GSP genes is achieved by the synergetic interaction of different TF combinations, suggesting that they are part of a regulatory network. Supporting this, complex mutual interactions through the formation of binary or ternary complexes mediated by PBF or other DOF proteins have been demonstrated for several of the TFs described above (e.g. Rubio-Somoza et al., 2006a;2006b;Yamamoto et al., 2006). In maize, Opaque2 has strong pleiotropic effects on grain size and composition and is thought to be pivotal in carbon allocation and amino acid synthesis during grain development (Hunter et al., 2002;Manicacci et al., 2009).
In wheat, the regulation of GSP gene expression by SPA has been established in vitro and in vivo in tobacco and maize leaf protoplasts, and only for a LMW-GS gene (Conlan et al., 1999). The GLM motif is well conserved in all the prolamin genes analyzed so far (Halford and Shewry, 2007), so it is possible that SPA also regulates the expression of other wheat GSP genes. This needs to be experimentally confirmed for each GSP gene since in maize Opaque2 appears to regulate the Z22 class of α-zeins specifically, despite the EB being present in other groups of zein genes.
Grain protein composition was one of the main targets in the genetic modification of wheat in the late 1990s with the focus on the overexpression of HMW-GS (e.g. Blechl and Anderson, 1996) and LMW-GS (Masci et al., 2003) and more recently on the silencing of γ-gliadins (Gil-Humanes et al., 2008). An alternative strategy to modify grain protein composition would be to alter the transcriptional network regulating GSP gene expression. While it remains to be demonstrated whether variations in GSP transcriptional activators affect GSP composition and dough viscoelasticity, a recent genetic association study showed that natural polymorphism in maize Opaque2 is associated with grain Lys content (Manicacci et al., 2009).
Environmentally-induced changes in GSP composition are associated with the altered expression of genes encoding GSPs in response to signals that indicate the relative availability of nitrogen and sulphur (Peak et al., 1997;Chiaiese et al., 2004;Hernandez-Sebastia et al., 2005).
These signals trigger transduction pathways in developing grains that, in general, balance the storage of nitrogen and sulphur to maintain a homeostasis of the total amount of protein per grain (Tabe et al., 2002;Islam et al., 2005). It is emerging that these regulatory networks, still poorly understood at the molecular level, result in allometric relationships between the quantity of nitrogen per grain and the quantity of the different GSPs (Sexton et al., 1998;Landry, 2002;Triboi et al., 2003). So the gene regulatory network involved in the control of GSP synthesis is coordinated such that the grain reacts predictably to nitrogen availability giving rise to a metamechanism at the level of the grain (Martre et al., 2003). Changes in the expression of Spa might allow a shift in grain nitrogen allocation and therefore modify the GSP composition independently of grain protein concentration.
In this study, we tested this hypothesis by investigating the nucleotide polymorphism of the three homoeologous copies of Spa paying special attention to their 5' flanking sequences. The expression of each Spa homoeolog was analyzed with respect to the native haplotypes detected for the promoter. The relationships between Spa haplotypes and GSP composition were analyzed.
Finally the genetic relationship between Spa promoter haplotypes and dough viscoelastic properties was assessed in an association study. The associations between Spa polymorphisms and grain hardness were also analyzed.

The Promoter of Spa Shows a High Level of Polymorphism
To assess the nucleotide diversity of wheat Spa loci, Spa genes were amplified and sequenced from 42 wheat accessions sampled from the INRA worldwide core collection with the aim of maximizing the diversity observed (Supplemental Table S1; Balfourier et al., 2007). The length of the consensus sequence for the three homoeologs is between 4,708 and 6,750 bp and starts 1,009 to 2,382 bp upstream of the start codon. Based on the BAC sequences, the consensus sequences for Spa-A, -B and -D, the three homoeologous copies of Spa, terminated 63 bp and 98 bp upstream and 218 bp downstream of the stop codon, respectively. The pairwise comparison between each SPA homoeologous protein showed they are very similar (> 93%).
For Spa-A, we detected 41 single nucleotide polymorphisms (SNPs) representing on average one mutation every 134 bp (Supplemental Fig. S1A). Twenty percent of the SNPs were found in only one accession (singleton). Twenty-seven SNPs are located upstream of the start codon and three are in either exon 1 or 6. The two SNPs in exon 6 are non-synonymous (they lead to amino acid changes). One SNP in the first intron modifies the splicing site. We also observed six insertion-deletion polymorphisms (indels), all of them being in non-coding regions. The longest indel also contained one SNP.
We found 40 SNPs for Spa-B, i.e. an average of one SNP every 118 bp (Supplemental Fig. S1B). Thirteen are upstream of the start codon and six are in exons 1, 2 or 6. The three SNPs in exon 1 and the distal one in exon 6 are non-synonymous. The most significant change at the protein level is likely to be the G/C change in exon 1 that creates a stop codon (Guillaumie et al., 2004). Of the four indels found, one is upstream of the translation start site (it is 276 bp long and includes two SNPs) and the others are in introns.
Spa-D contained 106 mutations (eight are singletons), i.e. an average of one SNP every 64 bp (Supplemental Fig. S1C), 44% of which are in the 5' flanking region. Of the 21 SNPs in the coding sequence, 12 were non-synonymous. We also found 14 indels, all but one of which are in non-coding regions. The two longest indels (173 bp and 214 bp long) are upstream of the translation start site. Annotation of the start site was complicated because in some accessions it is interrupted by a 21-bp indel.
Most of the mutations in the A, B and D homoeologs (72.5%, 57% and 68%, respectively) are transitions. The nucleotide diversity (π) averaged 0.00183 and it is 1.5 and 1.8 times higher for Spa-D than for Spa-A and -B, respectively (Table 1). The nucleotide diversity of Spa-D was particularly high in the coding region compared to Spa-B and -A. The profiles of diversity for each homoeolog were similar (Fig. 1). The highest π values were observed in the promoter with a mean value of 0.0031. The bZIP domain which spanned from exon 3 to exon 6 was highly conserved and contained no polymorphism in Spa-A and -B and three SNPs, two synonymous plus one non-synonymous, in Spa-D. The basic region was perfectly conserved, but there are five amino acid changes in the leucine zipper domain (Fig. 2). One of these changes is at the position of an important hydrophobic residue in Spa-D. However, as it corresponds to the substitution of a hydrophobic residue (Ala) by another hydrophobic residue (Val), the functional consequences may be minimal.
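The nucleotide diversity (π) and the transition fraction reported above can be illustrated with a minimal sketch. It assumes that π is the average number of pairwise differences per compared site across aligned sequences; the source does not state which software or which treatment of gaps was used, so the function names, the gap handling and the toy alignment below are illustrative assumptions only.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(base1, base2):
    """True for A<->G or C<->T substitutions, False for transversions."""
    pair = {base1.upper(), base2.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for aligned sequences.

    Assumption: positions where either sequence of a pair carries a gap
    or an ambiguous base are skipped for that pair.
    """
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no comparable sites")
    return total / n_pairs


# Hypothetical toy alignment (not data from this study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACCTACGA"]
toy_pi = nucleotide_diversity(toy_alignment)
```

In practice π would be computed separately over the promoter, exons and introns of each homoeolog to obtain a diversity profile.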

Linkage Disequilibrium (LD)
The haplotype diversity is 1.9 and 2.4 fold higher for Spa-A than for Spa-B and -D, respectively (Table 1). Nine, seven and six haplotypes were identified for Spa-A, -B and -D, respectively. For each homoeolog, we observed two main haplotypic groups of unequal size. For Spa-A and -B, these two groups mostly reflect differences between the European and Asian accessions (Supplemental Fig. S2A and B). For Spa-A, the European haplotypic group was subdivided into two main haplotypic groups with similar frequencies. This organization explains why the haplotype diversity value found for this gene was so high. For Spa-D, all but five accessions belonged to the same haplotypic group (Supplemental Fig. S2C). The second group consisted of genotypes from Near-Eastern and South-East European countries and Central America, all belonging to haplotype 2.
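As an illustration of how haplotype diversity can be computed, the sketch below uses Nei's unbiased estimator, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n accessions. The choice of this estimator is an assumption, since the text does not specify the formula used, and the haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity.

    haplotypes: one haplotype label per accession.
    Assumption: this estimator is used for illustration; the study's
    exact estimator is not stated in the text.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two accessions are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: two haplotypic groups of unequal size.
toy_labels = ["hap1"] * 6 + ["hap2"] * 4
toy_h = haplotype_diversity(toy_labels)
```

A gene with several haplotypes at similar frequencies, as described for Spa-A, gives a higher value than a gene dominated by a single haplotypic group, as described for Spa-D.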
To determine intragenic linkage disequilibrium (LD), we calculated mean squared allele-frequency correlation (r²) values and determined the number of significant pairwise comparisons using Fisher's exact test (Table 2). The level of LD was high along each homoeologous sequence. For Spa-A, -B and -D, respectively 38%, 26%, and 96% of the pairs of sites were in complete LD. LD analysis confirmed the organization of polymorphic sites into a few haplotypes.
In Spa-A, only 4% of polymorphic sites were in low LD (r² < 0.2; Table 2). For Spa-B and -D, all the sites were in high LD. These results indicated that the analysis of just one polymorphic site per gene can be used to distinguish between the main haplotypic groups. An additional SNP can be used to distinguish between the two haplotypes of Spa-A that constitute the largest haplotypic group (Supplemental Fig. S2A).
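The two LD statistics used in this section can be sketched for a single pair of biallelic sites. The example assumes fully homozygous lines, so that the two-site haplotype of each accession is observed directly, with alleles coded 0 and 1. The function names and the toy genotypes are hypothetical; the two-sided Fisher test below sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def ld_r2(site1, site2):
    """Squared allele-frequency correlation between two biallelic sites."""
    n = len(site1)
    if n == 0 or n != len(site2):
        raise ValueError("sites must cover the same, non-empty set of lines")
    p1 = sum(site1) / n
    p2 = sum(site2) / n
    p11 = sum(1 for a, b in zip(site1, site2) if a == 1 and b == 1) / n
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p11 - p1 * p2  # coefficient of disequilibrium D
    return d * d / denom


def contingency(site1, site2):
    """2x2 haplotype counts (n11, n10, n01, n00)."""
    pairs = list(zip(site1, site2))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical genotypes for eight lines at two sites in complete LD.
site_a = [0, 0, 0, 0, 1, 1, 1, 1]
site_b = [0, 0, 0, 0, 1, 1, 1, 1]
toy_r2 = ld_r2(site_a, site_b)
toy_p = fisher_exact_two_sided(*contingency(site_a, site_b))
```

When r² equals 1 for a pair of sites, genotyping either site is sufficient to infer the other, which is the rationale for selecting a single diagnostic site per homoeolog.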

Spa Polymorphisms are Associated with Dough Viscoelasticity
Based on the above results, we chose a single site per homoeolog for genotyping the core collection resulting in two haplotypes per Spa homoeologous copy (Supplemental Table S1). For Spa-A and -D, we genotyped indels 147-108 bp (ins3) and 21-1 bp (ins6) 5' to the UTR, respectively (Supplemental Fig. S1A and C). For these two homoeologs, all the lines characterized by the deletion were assigned as haplotype 1 of the promoter. For Spa-B, we genotyped a non-synonymous G/C mutation in the first exon (SNP 14 in Supplemental Fig. S1B).
Haplotype 1 was characterized by the G allele. The six haplotypes were genotyped for the 372 accessions of the core collection to study their effects on average single grain dry mass, grain protein content and concentration, dough viscoelasticity and grain hardness. To avoid false positives, the genetic structure of the core collection was taken into account as covariate.
None of the markers genotyped was associated with the average single grain dry mass, the quantity of protein per grain or the grain protein concentration (Table 3, Model I). However, we found significant associations at 1% between the markers and alveograph parameters. The polymorphism of the Spa-A promoter was highly significantly associated with dough tenacity (P), extensibility (L) and grain hardness. On average, L was 1.4 times higher for haplotype 1 than for haplotype 2, while P of haplotype 1 was only 74% that of haplotype 2. In good agreement with this, we found no significant effect of Spa-A haplotypes on dough strength (W). For Spa-D, haplotype 2 (insertion) increased P, W and grain hardness by 28%, 47%, and 29%, respectively.
Although not significant for Spa-B, the haplotypes at the three loci with the lower level of Spa transcripts had harder grain textures than the haplotypes with the higher level of Spa transcripts (Table 3, Model I). Grain hardness was positively correlated with P and W (r² = 0.63 and 0.49, respectively; both P-values < 0.0001), but not with L (r² = 0.073, P-value = 0.161). Once grain hardness was introduced in the general linear model as covariate (Table 3, Model II), Spa-B and -D polymorphism was no longer associated with dough viscoelasticity. However, this confirmed the significant effect of Spa-A on L.

Spa Polymorphism Changes Its Level of Expression during Grain Development
In wheat, under normal growing conditions GSPs accumulate linearly between 250 and 700 degree days (°Cd) after anthesis (e.g. Triboi et al., 2003). We quantified the expression of each Spa gene (homoeologs and haplotypes) by qRT-PCR. No transcript was detected in the ovary (Fig. 3). Between 150 and 400°Cd after anthesis, transcripts of the three Spa homoeologs were detected in all the accessions studied. In all the accessions, the transcript levels of Spa-B were significantly higher than those of Spa-A and Spa-D (fold differences in transcript levels ranged from 3.1 to 31.0 and the median fold differences in transcript abundance were 9.8 for Spa-B versus Spa-A and 6.9 for Spa-B versus Spa-D). The promoter haplotypes for Spa-A and -D were associated with significant quantitative differences in transcript levels, but their patterns of expression were similar (Fig. 3A and C). Spa-A transcript levels peaked at 300°Cd and were on average 2.9 times higher for haplotype 1 than for haplotype 2 (Fig. 3A). During early to mid grain-filling, Spa-D transcripts were maintained at a more constant level than Spa-A and -B transcripts (Fig. 3C). For Spa-D, the transcript levels were on average 2.5 times higher for haplotype 1 than for haplotype 2. For haplotype 1 the peak in transcript accumulation was 50°Cd later and the peak value 1.3 times higher than for haplotype 2.

Allocated to Gliadins
The protein composition of mature grains was analyzed for 20 accessions representing the different combinations of Spa haplotypes (Supplemental Table S2). The total quantity of nitrogen per grain varied twofold across all accessions (0.54 to 1.09 mg N grain⁻¹). There was no significant effect of Spa-B and -D haplotypes on the relationship between the quantity of each protein fraction and the total quantity of nitrogen per grain or on the gliadin-to-glutenin ratio (data not shown). However, clear differences were observed between Spa-A haplotypes in the allocation of grain nitrogen to the non-prolamin and gliadin protein fractions. The scaling exponents (αRMA) of the non-prolamin to total nitrogen relationship were not significantly different (P = 0.626) for the two haplotypes (Fig. 4A), with a common fitted slope of 1.000 (95% CI = 0.906-1.096). Therefore, the quantity of non-prolamin proteins varied in direct proportion to the total quantity of nitrogen. The allometric constant was significantly higher (P < 0.001) for the accessions with the deletion in the Spa-A promoter (haplotype 1; Supplemental Table S3), meaning that for any given quantity of total nitrogen per grain haplotype 1 contained more non-prolamin proteins than haplotype 2. The scaling exponent of the gliadin to total nitrogen relationship was significantly higher than one for both haplotypes, indicating that gliadins varied disproportionally with the total quantity of nitrogen. Most interestingly, the scaling exponent was significantly higher (P-value = 0.029) for haplotype 1 (αRMA = 1.607; 95% CI = 1.245-2.074) than for haplotype 2 (αRMA = 1.140; 95% CI = 1.014-1.281). The allometric parameters of the glutenin to total nitrogen relationship were similar for the two haplotypes and the common scaling exponent was not significantly different from one (αRMA = 0.962; 95% CI = 0.8603-1.0752; Fig. 4C).
As a consequence, the scaling exponents of the gliadin to glutenin relationship were significantly different for the two haplotypes (inset in Fig. 4C) and were very close to those estimated for the gliadin to total nitrogen relationship (Supplemental Table S3). The average gliadin-to-glutenin ratio was significantly higher (P-value < 0.001) for haplotype 2 (0.74 ± 0.04) than for haplotype 1 (0.55 ± 0.07), independently of the total quantity of nitrogen per grain.
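The allometric relationships analyzed above have the form y = b·x^α, where x is the total quantity of nitrogen per grain, y is the quantity of a protein fraction, α is the scaling exponent and b is the allometric constant. A minimal sketch of how these two parameters can be estimated by reduced major axis (RMA) regression on log-transformed data follows. It assumes base-10 logarithms and omits the confidence intervals and the tests for common slopes reported in the text; the function name and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def rma_allometry(x, y):
    """Fit y = b * x**alpha by reduced major axis regression on log10 data.

    Returns (alpha, b). The RMA slope is the ratio of the standard
    deviations of log y and log x, signed by their covariance.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = mean(lx), mean(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    if sxy == 0:
        raise ValueError("log x and log y are uncorrelated")
    alpha = math.copysign(stdev(ly) / stdev(lx), sxy)
    intercept = my - alpha * mx
    return alpha, 10 ** intercept


# Hypothetical values: total N per grain (mg) and a protein fraction (mg N)
# generated from an exact power law with alpha = 1.5 and b = 0.3.
total_n = [0.5, 0.6, 0.8, 1.0]
fraction_n = [0.3 * v ** 1.5 for v in total_n]
alpha_hat, b_hat = rma_allometry(total_n, fraction_n)
```

An exponent above one means the fraction increases more than proportionally with total grain nitrogen, as reported for gliadins, whereas an exponent of one indicates direct proportionality, as reported for the non-prolamin and glutenin fractions.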

DISCUSSION
Here we demonstrated the high level of sequence diversity in the Spa genes of wheat. The many polymorphisms found in the promoter regions lead to differential expression during grain development of the haplotypes identified. The germplasm analyzed here included accessions from different geographical origins and represents a wide range of the natural diversity of Triticum aestivum (Balfourier et al., 2007;Horvath et al., 2009). Significant associations were found between the alleles identified and dough viscoelasticity and grain hardness. Spa-A allelic variation also has a significant effect on the allocation of grain nitrogen to gliadin protein fractions and on the gliadin-to-glutenin ratio. As the translated sequences indicate that SPA amino acid sequences are highly conserved, we hypothesized that the associations detected can be explained by the observed differences in Spa expression.

The Spa Promoter Shows a High Level of Nucleotide Polymorphism
When the three wheat genomes are considered together, we found on average one SNP every 91 bp with a π value of 0.00183. A shorter sequence of Spa-B (2,858 bp) in a set of 27 accessions (Ravel et al., 2006b) was shown to have a similar level of polymorphism. The three homoeologous copies of Spa were more polymorphic than has been reported for structural wheat genes (Ravel et al., 2006b). Moreover, in Spa genes there was twice as much nucleotide diversity in the promoter region as in the coding region and the DNA-binding domain sequences were almost completely conserved. Such modular structures have previously been reported for other TFs (e.g. Purugganan and Wessler, 1994;Henry et al., 2005). It has been proposed that modification of the cis-regulatory regions of TFs is a predominant mode by which plant form and function evolve (Doebley and Lukens, 1998). Spa showed a much higher level of polymorphism than other wheat TFs from different families such as Pbf (Ravel et al., 2006a and b) or Gamyb (Haseneyer et al., 2008). The translated sequences of Spa revealed few amino acid differences between homoeologs and haplotypes. The fact that the high rate of nucleotide polymorphism was associated with a very low rate of variation at the protein level might indicate that these genes are under strong constraints that limit changes to the protein sequence. The high level of intragenic LD observed for the three homoeologs suggests a low frequency of intragenic recombination, consistent with the above hypothesis. Conversely, the high nucleotide diversity of Spa promoters might result from selection pressure related to grain phenotypes associated with quantitative and/or spatiotemporal differences in their expression. The main haplotypic groups for Spa-A, and to some extent for Spa-B, correspond to the two main routes of migration of wheat from the Middle East (Feldman, 2001).
The divergence of Spa-A and -B promoter sequences between European and Asian accessions supports the hypothesis that adaptive divergence is responsible for the differentiation of the promoter between these geographical origins. It is generally observed that the level of diversity of the D genome is lower than that of the A and B genomes (e.g. Ravel et al., 2006b; Chao et al., 2009). However, even when the promoter region was not considered, Spa-D showed a higher level of polymorphism than Spa-A and -B. SNPs in Spa-D are organized in two main haplotypic groups, supporting the existence of at least two different origins of the D genome, as has already been proposed (Caldwell et al., 2004).

Storage Protein Composition
It has been estimated that for between 20% and 29% of hexaploid bread wheat unigene loci at [...]
The association study showed no effect of Spa haplotypes on average single grain dry mass or grain protein concentration. However, the analysis of the GSP composition of 20 accessions clearly showed changes in the allocation of grain nitrogen to the non-prolamin versus gliadin protein fractions associated with Spa-A. In good agreement with this result, in maize Opaque2 mutations are associated with less Lys-poor α-zein, which is compensated for by having more Lys-rich globulin and legumin-like storage proteins (Hunter et al., 2002;Gibbon and Larkins, 2005). Surprisingly, the absolute allocation of grain nitrogen to the gliadin protein fraction was higher in haplotype 2 which had fewer Spa-A transcripts than haplotype 1. However, the scaling exponent of the gliadin to total nitrogen relationship was higher in haplotype 1 than in haplotype 2. This latter result suggests that the synthesis of gliadin is more responsive to nitrogen availability in haplotype 1 than in haplotype 2, in good agreement with the putative role of SPA in the nutritional regulation of GSP synthesis (Müller and Knudsen 1993).
The lack of variation in the allocation of grain nitrogen to the glutenin protein fraction is also surprising considering that it has been shown that Spa-B can activate the expression of a LMW-GS gene in both maize and tobacco leaf protoplasts (Conlan et al., 1999). These results may also be the consequence of the spatial regulation of expression of Spa. HPLC separation and quantification of glutenin subunits and gliadin protein classes could be used to determine if there are compensatory mechanisms between different glutenin subunits and which gliadin classes are most affected by Spa-A haplotypes. A consequence of the invariable allocation of grain nitrogen to the glutenin protein fraction is that the gliadin-to-glutenin ratio increases disproportionally with the total quantity of nitrogen per grain. The fact that nucleotide polymorphisms at the Spa-A loci resulted in clear changes in the scaling relationship between gliadins and total nitrogen supports the hypothesis that SPA is part of a transcriptional regulatory network of GSP and that the scaling relationships analyzed here underlie the dynamics structuring this regulatory network.

Spa Polymorphism Is Associated With Grain Hardness and Dough Viscoelasticity
The association study showed nucleotide polymorphisms at Spa-A and -D have a significant effect on grain hardness. The soft texture of Opaque2 mutants has been related to increased synthesis of γ-zeins and differences in starch structure (Gibbon et al., 2003;Gibbon and Larkins, 2005). Similar modifications could be involved in the association reported here between Spa polymorphism and grain hardness. Differences in puroindoline gene expression may explain the association between Spa and grain hardness. Although it is not associated with PB, a GLM motif is present in the promoter of PinA coding for the puroindoline protein expressed in the endosperm (Evrard et al., 2007).
Grain hardness is well known to affect dough viscoelasticity independently of GSP composition (Eagles et al., 2006). The effect of Spa-D polymorphism on dough viscoelasticity is most likely related to changes in grain hardness, since the association was not significant when grain hardness was introduced in the statistical model. Conversely, significant effects of Spa-A markers on alveograph parameters were observed even when grain hardness was taken into account as a covariate. Although Spa genes are located at less than 1.3 cM from the loci coding for the HMW-GS (Glu-1; Guillaumie et al., 2004), we found no LD between the electrophoretic diversity of HMW-GS and Spa markers (data not shown). This strongly suggests that polymorphisms at the Glu-1 loci are not responsible for the associations observed between Spa and dough extensibility. Numerous studies have shown that dough extensibility increases as the gliadin-to-glutenin ratio increases (e.g. Wieser and Kieffer, 2001). It is thus surprising that the Spa-A allele associated with a higher gliadin-to-glutenin ratio was also associated with lower dough extensibility. Analysis of the composition of the gliadin protein fractions in the accessions analyzed here could give some insight into this unexpected result. Nor can we rule out the possibility that the association between dough extensibility and Spa-A polymorphism was caused not by the observed changes in GSP composition but by their interactions. Opaque and floury mutants show higher levels of peroxidase gene expression and the unfolded protein response is activated in these mutants (Gibbon and Larkins, 2005). If such responses are induced in wheat endosperm in response to Spa polymorphism they could have significant effects on GSP polymerization and therefore on dough viscoelasticity.

Materials
This study was based on the INRA worldwide core collection of hexaploid wheat (Triticum aestivum L.) consisting of 372 accessions (Balfourier et al., 2007). The population structure of this core collection is characterized by five ancestor groups (Horvath et al., 2009). The whole core collection has recently been phenotyped for several quality traits (Bordes et al., 2008).
Average single grain dry mass (mg DM grain⁻¹), wholemeal flour protein concentration (mg protein g⁻¹ DM; determined by near infra-red spectroscopy), the total quantity of protein per grain (mg N grain⁻¹; calculated from average single grain dry mass and wholemeal flour protein concentration), grain hardness (dimensionless; determined by near infra-red spectroscopy), and dough strength (W, J), tenacity (P, mm H₂O) and extensibility (L, mm; determined using Chopin alveography) were the phenotypes measured in this genetic association analysis.
To study the nucleotide diversity of Spa genes, a subset of 42 lines from different origins and ancestor groups (Supplemental Table S1) was sampled from the core collection as explained in Haseneyer et al. (2008). Genomic DNA was extracted (Tixier et al., 1998) from leaves harvested from a pool of six 3-week-old seedlings per accession. Seeds used in this study were provided by the INRA-Clermont-Ferrand Genetic Resources Centre (http://www.clermont.ina.fr/umr-gdec) and came from a single self-pollinated head. All plantlets for a given accession were thus considered to be genetically identical.
Based on the polymorphism found in the promoter regions of Spa genes, seven accessions of the core collection were selected for expression studies (Supplemental Table S1). Seeds were sown in 294-cm³ pots filled with a peat moss mix and were kept in a greenhouse for 2 wks. The plants were then vernalized for 8 wks in a growth chamber where the temperature was maintained at 4 ± 1°C, the photosynthetic photon flux density (PPFD) at the top of the canopy was 43 mmol m⁻² d⁻¹ during the 8-h photoperiod, and the relative humidity was maintained at 40%. After vernalization, the plants were transplanted to soil beds in a greenhouse with a distance between plants of 30 cm and a row spacing of 0.5 m, where they received a mean total daily PPFD of 26 mol m⁻² d⁻¹ (80% of ambient solar radiation), with daily maximum/minimum air temperatures averaging 21°C/15°C and day/night relative humidity averaging 50%/60%. Plants were watered as needed to maintain a soil water potential in the rooting zone higher than -0.1 MPa and received 30 g m⁻² of N:P:K (17:17:17) fertilizer when transferred to the greenhouse. Air temperature next to the wheat ears was measured and recorded every 30 min using four HOBO H8 data loggers (Onset Computer Corp., Bourne, MA). Ears were tagged when the anthers of the central florets appeared (anthesis). Two ears per plant were harvested at 0 (ovary), 150, 200, 250, 300 and 400°Cd after anthesis. All samples were taken at 11:00 am to avoid possible diurnal effects on gene expression. Three independent biological replicates were used. The basal grains on the spikelets of the central third of the ear were collected, the embryo and the external pericarp were rapidly removed and the endosperm was frozen in liquid nitrogen and stored at -80°C prior to RNA extraction.
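Harvest dates expressed in degree days after anthesis can be derived from the 30-min air temperature records. The sketch below shows one way to accumulate thermal time from such records. It assumes a base temperature of 0°C, which is not stated in the text, and the function names and the constant-temperature example are hypothetical.

```python
MINUTES_PER_DAY = 1440.0


def thermal_time(temps_c, interval_min=30.0, base_temp_c=0.0):
    """Cumulative thermal time (degree days) from regularly logged temperatures.

    Assumption: base temperature of 0 degrees C; readings below the base
    contribute nothing.
    """
    degree_minutes = sum(max(t - base_temp_c, 0.0) * interval_min
                         for t in temps_c)
    return degree_minutes / MINUTES_PER_DAY


def readings_until(temps_c, target_cd, interval_min=30.0, base_temp_c=0.0):
    """Number of readings after anthesis needed to reach target_cd.

    Returns None if the target is not reached within the record.
    """
    degree_minutes = 0.0
    for count, t in enumerate(temps_c, start=1):
        degree_minutes += max(t - base_temp_c, 0.0) * interval_min
        if degree_minutes / MINUTES_PER_DAY >= target_cd:
            return count
    return None


# Hypothetical record: 10 days of 30-min readings at a constant 18 degrees C.
toy_record = [18.0] * (48 * 10)
toy_total = thermal_time(toy_record)
toy_first_harvest = readings_until(toy_record, 150.0)
```

With real logger data, the index returned for each target (150, 200, 250, 300 and 400°Cd) would be converted back to a calendar date for each tagged ear.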
Based on the different polymorphisms found in the promoter region of Spa genes, the grain protein composition was analyzed for 20 accessions of the core collection (Supplemental Table S2).

Polymorphism Analysis and Genotyping
For each Spa gene, several overlapping fragments were amplified and sequenced.
Homoeolog-specific primers in the coding and non-coding regions were designed using the homoeologous sequences at Spa loci in the Renan cultivar BAC library (Salse et al., 2008). The oligonucleotides used for amplification and sequencing and the annealing temperatures used in the touchdown program for PCR are given in Supplemental Table S4. Sequences were aligned using the Staden package (Staden et al., 2000) and consensus sequences were annotated using the accession number Y09013 as the reference mRNA (Albani et al., 1997).
Genotyping was based on the protocol described for SSR (Nicot et al., 2004) using forward primers 5'-tailed with the M13 forward consensus sequence (Supplemental Table S5). For SNPs, we designed primer pairs in which the first primer was allele-specific and the second was genome-specific. For each SNP, we genotyped the 372 accessions of the core collection with two primer pairs (one for each allele). For each indel, we designed one genome-specific primer pair. The presence or absence of amplification products was read using GeneMapper® software version 3.7 (Applied Biosystems, Foster City, CA). For a given SNP, lines giving no signal with either allele-specific primer pair were recorded as missing data. For indels, the amplification products from different alleles differ in size. Amplification products were visualized using an ABI PRISM® 3100 Genetic Analyzer (Applied Biosystems).
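The SNP scoring rule described above can be sketched as a small decision function. The labels and the function name `call_snp` are hypothetical. The text does not say how lines amplifying with both primer pairs were handled, so returning "ambiguous" for that case is an assumption.

```python
def call_snp(amplified_allele1, amplified_allele2):
    """Score one accession at one SNP from two allele-specific PCR results.

    Each argument is True if an amplification product was detected with the
    primer pair designed for that allele.
    """
    if amplified_allele1 and not amplified_allele2:
        return "allele1"
    if amplified_allele2 and not amplified_allele1:
        return "allele2"
    if not amplified_allele1 and not amplified_allele2:
        return None  # no signal with either pair: missing data (as in the text)
    return "ambiguous"  # both pairs amplified: handling assumed, not from the text


# Hypothetical readings for three accessions
for reading in [(True, False), (False, True), (False, False)]:
    print(reading, call_snp(*reading))
```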

Quantification of the Expression of Spa Homoeologs
Total RNA was extracted from ovaries and developing grains (without the embryo and external pericarp; Khaled et al., 2005). Transcript levels of A, B, and D homoeologs of Spa and control genes were quantified by real-time PCR (qRT-PCR) with an ABI Prism 7900HT sequence detection system using Power SYBR® Green PCR Master Mix (Applied Biosystems).
Details of primer pairs used for qRT-PCR are given in Supplemental Table S6. For each Spa gene, the specificity of the primer pairs was checked by confirming that the products gave a single peak in the real-time melting temperature curve and a single band after agarose gel electrophoresis, and that amplification from Chinese Spring nullitetrasomic lines was as expected for the primers used. PCR efficiency was determined by using a sample dilution series as template (Rasmussen, 2001). Amplification plots and predicted threshold cycle values were obtained from three independent biological replicates with the SDS software version 2.1 (Applied Biosystems). In preliminary experiments, the average coefficient of variation of technical replicates was estimated to be 0.94% (for n = 1000 independent experiments of three technical replicates each).
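As an illustration of how amplification efficiency can be derived from a dilution series, the sketch below fits threshold cycle (Ct) against log10 of the relative template quantity and applies the usual standard-curve relation E = 10^(-1/slope) - 1. This is assumed to correspond to the cited approach; the function name and the example Ct values are hypothetical.

```python
import math


def pcr_efficiency(relative_quantities, ct_values):
    """Return PCR efficiency (%) from a dilution series.

    relative_quantities : template amounts relative to the undiluted sample
    ct_values           : threshold cycles measured for each dilution
    """
    xs = [math.log10(q) for q in relative_quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx  # ordinary least-squares slope of Ct versus log10(quantity)
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


# Hypothetical 10-fold dilution series with perfect doubling per cycle:
# each 10-fold dilution delays Ct by log2(10) cycles, giving about 100%.
quantities = [1.0, 0.1, 0.01, 0.001]
cts = [20.0 + i * math.log2(10) for i in range(4)]
print(round(pcr_efficiency(quantities, cts), 2))
```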
The genes coding for actin, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), elongation factor 1 alpha (eF1α), β-tubulin, and 18S were used as internal controls. All primer pairs gave an amplification efficiency of 100 ± 10% and were comparable. The optimal number of internal control genes determined using the geNorm algorithm (Vandesompele et al., 2002) was four. GAPDH, eF1α, β-tubulin and 18S had the most stable expression during grain development and were thus chosen for normalizing the expression of Spa. Their geometric mean was calculated and the normalized quantity of Spa genes was then calculated (Pfaffl, 2004).
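The normalization step can be sketched as follows: each Ct is converted to a relative quantity, the geometric mean of the reference-gene quantities gives a normalization factor, and the target quantity is divided by that factor. The sketch assumes an amplification efficiency of 100% for every primer pair (the text reports 100 ± 10%), and all names and Ct values are hypothetical.

```python
import math


def relative_quantity(ct, efficiency=1.0):
    """Convert a Ct to a relative quantity; efficiency=1.0 means 100%."""
    return (1.0 + efficiency) ** (-ct)


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_expression(target_ct, reference_cts, efficiency=1.0):
    """Target quantity divided by the geometric mean of reference quantities."""
    factor = geometric_mean(
        [relative_quantity(ct, efficiency) for ct in reference_cts]
    )
    return relative_quantity(target_ct, efficiency) / factor


# Hypothetical Ct values: one Spa homoeolog and four reference genes
# (standing in for GAPDH, eF1-alpha, beta-tubulin, and 18S).
spa_ct = 25.0
reference_cts = [18.0, 22.0, 19.0, 21.0]
print(normalized_expression(spa_ct, reference_cts))
```

Using the geometric rather than the arithmetic mean keeps a single highly expressed reference gene from dominating the normalization factor.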
Primer pairs amplifying a cyclophilin gene and spanning one intron were used to check for the absence of DNA contamination.

Quantification of Grain Protein Fractions
Mature grains were ground to wholemeal flour using a 6800 Cyclotec mill equipped with a 0.75 mm sieve (FOSS A/S, Hillerød, Denmark). The non-prolamin (mainly albumin-globulin and amphiphilic proteins), gliadin, and glutenin protein fractions were sequentially extracted from 100 mg of wholemeal flour (Triboi et al., 2003). Supernatants containing each protein fraction (80 µL) were oven dried for 3 h at 45°C in 5 mm × 8 mm tin sample capsules and their total nitrogen concentration was determined with the Dumas combustion method (AOAC method n° 992.23) using a FlashEA 1112 N/Protein Analyzer (Thermo Electron Corp., Waltham, MA). Two independent flour samples were extracted in duplicate for each accession and growing year. Each extract was analyzed in duplicate.

Data Analysis
The nucleotide diversity (π) within each Spa homoeolog was calculated as the average number of nucleotide differences per base pair between sequences (Nei, 1987). Haplotype diversity, π, and the mean pairwise differences were computed using the DnaSP software version 4.10 (Rozas et al., 2003). Sites with alignment gaps were also considered. The patterns of diversity were analyzed by the sliding windows method (window size 100 bp, step size 25 bp) using Tassel software version 9.3.1 (Buckler et al., 2006). Single nucleotide polymorphism (SNP) information is available at http://urgi.infobiogen.fr/Gnp.
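For readers unfamiliar with π, the sketch below computes the average number of pairwise differences per site for a set of aligned sequences, and repeats the calculation over sliding windows. It is an illustration only, not a reimplementation of the software cited above; treating a gap character as an ordinary state is an assumption, and the example sequences are made up.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # A gap ('-') is compared like any other character (assumption).
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


def sliding_pi(seqs, window=100, step=25):
    """Return (window start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical alignment of three short sequences: 2 differences summed over
# 3 pairs and 4 sites.
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```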
Linkage disequilibrium (LD) for pairs of polymorphic sites with minor allele frequencies over 5% was estimated using squared allele-frequency correlations (r²; Hill and Robertson, 1968). The general linear model procedure of the SAS software version 9.3.1 was used to analyze genetic association. To reduce the rate of false positives, the structure components (i.e. the contribution of each line to the five ancestor groups) of the core collection (Horvath et al., 2009) were used as covariates (Model I). As grain hardness is known to modify dough viscoelasticity (e.g. Eagles et al., 2006), the effects of Spa markers were then tested with grain hardness added to Model I as a covariate (Model II). Significant genetic associations were judged at α = 0.01.
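The r² statistic can be illustrated for two biallelic sites scored in inbred lines, where each line carries a single haplotype. In the sketch below alleles are coded 0/1; the function name and the example genotypes are hypothetical, and the minor allele frequency filter mentioned in the text is left to the caller.

```python
def ld_r2(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites.

    site_a, site_b : lists of 0/1 allele codes, one entry per line,
                     in the same line order.
    """
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denominator


# Hypothetical data: completely associated sites, then independent sites.
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```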
The allocation of grain nitrogen to protein fractions was analyzed on Log-transformed data using the allometric model Log(F_i) = Log(β_RMA) + α_RMA × Log(N_tot), where N_tot is the total quantity of nitrogen per grain, F_i is the quantity of the protein fraction i per grain, α_RMA is the scaling exponent, and β_RMA is the allometric constant. Data were Log-transformed because this puts the variables on a multiplicative scale, allowing relative variability to be assessed in a meaningful way. α_RMA calculated on Log-transformed variables gives the proportional relationship between variables. Reduced major axis regression was used to estimate the parameters of the allometric equations.
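A reduced major axis fit of the allometric model can be sketched as below. The slope is the ratio of the standard deviations of the Log-transformed variables, signed by their covariance, and the line passes through the means. The use of base-10 logarithms, the function name, and the example data are assumptions; the sketch does not reproduce the slope and elevation tests of the software used in the study.

```python
import math


def rma_allometry(n_tot, fraction):
    """Fit Log(F) = Log(beta) + alpha * Log(N_tot) by reduced major axis.

    Returns (alpha, beta): the scaling exponent and the allometric constant.
    """
    log_x = [math.log10(v) for v in n_tot]
    log_y = [math.log10(v) for v in fraction]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    syy = sum((y - mean_y) ** 2 for y in log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    # RMA slope: sd(y) / sd(x), carrying the sign of the covariance.
    alpha = math.copysign(math.sqrt(syy / sxx), sxy)
    beta = 10 ** (mean_y - alpha * mean_x)
    return alpha, beta


# Hypothetical data following F = 2 * N_tot ** 1.5 exactly, so the fit
# should recover alpha close to 1.5 and beta close to 2.
n_values = [1.0, 2.0, 4.0, 8.0]
f_values = [2.0 * v ** 1.5 for v in n_values]
alpha, beta = rma_allometry(n_values, f_values)
print(round(alpha, 3), round(beta, 3))
```

Unlike ordinary least squares, the reduced major axis treats both variables as subject to error, which suits scaling relationships where neither variable is a controlled predictor.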
Tests for homogeneity of slopes between haplotypes and calculation of common slopes were done using a likelihood ratio method, with the test statistic closely approximating a chi-square distribution with Bartlett correction (Warton and Weber, 2002). Where slopes were shown to be homogeneous, differences in elevation (y-intercept) were tested using a Wald statistic (Warton et al., 2006). Curve fitting, confidence interval (CI) calculations, and statistical analysis of the differences in slopes and intercepts were done using SMATR software version 2 (Falster et al., 2006). Statistical significance was judged at α = 0.05.

reading of the manuscript. We also thank Rachel Carol from Emendo Bioscience (http://www.emendo.co.uk) for English improvement of the manuscript.

Supplemental Data
The following materials are available in the online version of the article.
Table S1. Haplotypes for the three homoeologous Spa genes of 42 accessions from the hexaploid wheat core collection used for nucleotide polymorphism and expression analysis.
Table S2. Haplotypes for the three homoeologous Spa genes of the accessions used to analyze grain protein composition of mature grains.
Table S3. Summary allometric analysis of grain protein composition for wheat accessions belonging to Spa-A haplotypes 1 and 2.
Table S4. Oligonucleotides and PCR conditions used for amplification of overlapping fragments and sequencing of Spa genes.
Table S5. Oligonucleotides and PCR conditions used for genotyping Spa genes.
Table S6. Sequences of primers used for qRT-PCR.

Figure legends (partial)
... Inserts show the data for Spa-A and -D with an expanded x-axis. Data are means ± 2 SE for n = 4 independent replicates per genotype. The number of genotypes analyzed for each haplotype is indicated in brackets after the haplotype number.
... and glutenin (C) protein fractions versus the total quantity of nitrogen per grain for Spa-A haplotypes of hexaploid wheat. Insert, quantity of gliadins per grain versus quantity of glutenins per grain. Lines are reduced major axis linear regressions of Log-transformed data for haplotypes 1 (dashed lines) and 2 (solid lines). Data are means ± 2 SE for two years, each analyzed in duplicate.