Recent Proliferation and Translocation of Pollen Group 1 Allergen Genes in the Maize Genome

The dominant allergenic components of grass pollen are known by immunologists as group 1 allergens. These constitute a set of closely-related proteins from the β -expansin family and have been shown to have cell-wall loosening activity. Group 1 allergens may facilitate the penetration of pollen tubes through the grass stigma and style. In maize the group 1 allergens are divided into two classes, A and B. We have identified 15 genes encoding group 1 allergens in maize, 11 genes in class A and 4 genes in class B, as well as 7 pseudogenes. The genes in class A can be divided by sequence relatedness into two complexes, whereas the genes in class B constitute a single complex. Most of the genes identified are represented in pollen-specific EST libraries and are under purifying selection, despite the presence of multiple copies that are nearly identical. The group 1 allergen genes are clustered in at least six different genomic locations. The single class B location and one of the class A locations show synteny with the rice regions where orthologous genes are found. Both classes are expressed at high levels in mature pollen but at low levels in immature flowers. The set of genes encoding the maize group 1 allergens is more complex than originally anticipated. If this situation is common in grasses, it may account for the large number of protein variants, or group 1 isoallergens, identified previously in turf grass pollen by immunologists.

During the spring and summer, grasses release large amounts of wind-dispersed pollen, causing hay fever and seasonal asthma in more than 25% of the population in temperate regions (Knox and Suphioglu, 1996a). Allergy symptoms are triggered by the release of allergens when the grass pollen comes in contact with the moist surface of the human respiratory tract (Knox and Suphioglu, 1996b). The allergenic proteins of grass pollen have been classified into different groups on the basis of antibody cross reactivity, physicochemical properties and sequence relatedness (Knox and Suphioglu, 1996a).
Group 1 allergens, along with group-5 allergens, are probably the most widespread inducers of human allergic disease, due to the frequency and strength of their IgE reactivity (Suphioglu, 2000). It has been estimated that more than 95% of the people who suffer from grass pollen allergies have antibodies that recognize group 1 allergens (Bhalla, 2003).
Grass group 1 pollen allergens are now considered part of a superfamily of plant cell wall proteins called expansins. In the current nomenclature, group 1 allergens are part of the EXPB or β-expansin family, one of four expansin families (Kende et al., 2004).
Thus, they may also be called pollen β-expansins. Expansins have the ability to induce rapid extension of isolated cell walls and enhance stress relaxation (Sampedro and Cosgrove, 2005). The first expansins were purified from young cucumber seedlings on the basis of their capacity to induce cell wall extension (McQueen-Mason et al., 1992).
When the first expansin sequences were obtained, it was recognized that they shared distant homology to grass group 1 allergens (Shcherban et al., 1995). It was later shown that group 1 allergens from maize indeed had cell wall loosening activity (Cosgrove et al., 1997).
Based on the analysis of the rice genome, it has been proposed that the lineage that gave rise to group 1 allergens (represented in this species by OsEXPB1, OsEXPB9, OsEXPB10 and OsEXPB13) originated in the monocots through a short-range translocation (Sampedro et al., 2005). On the basis of phylogenetic analysis, group 1 allergens have been divided into two rather divergent classes, A and B (~60% identical at the protein level), both of which are present in many different grass species (Li et al., 2003). The origin and subsequent divergence of the two classes has been linked in rice to an ancient whole genome duplication event shared by most grasses and predating the divergence of rice and maize (Sampedro et al., 2005). Class A and class B allergens from maize have similar wall-extension activity, showing a clear specificity for grass cell walls (Li et al., 2003).
The expression of group 1 allergen genes in maize is predominantly restricted to pollen grains (Broadwater et al., 1993;Wu et al., 2001). Mature grass pollen contains high amounts of the group 1 allergens and it has been suggested that the corresponding genes are probably expressed late in pollen development, coinciding with the accumulation of storage proteins and carbohydrates (Knox and Suphioglu, 1996b). However, in maize, transcripts for group 1 pollen allergens were detected in very low amounts in uninucleate microspores, and subsequently increased greatly after the first pollen mitosis (Broadwater et al., 1993).
On the basis of its pattern of expression and its activity, it was proposed that the grass group 1 pollen allergens loosen the walls of the stigma and style in order to help the elongating pollen tube reach the ovary (Cosgrove et al., 1997). Such a role could explain the unusual abundance, atypical pH optimum (5.5), and high solubility of the group 1 allergens, as well as their tendency to cause cell wall breakage, characteristics not seen in expansins involved in other growth processes (Li et al., 2003). Based on the crystal structure of a maize group 1 pollen allergen, it has been suggested that these proteins may act by detaching arabinoxylan chains from cellulose microfibrils (Yennawar et al., 2006).
The maize genome is the product of a polyploidization event that happened between 5 and 12 Mya, shortly after the split from the sorghum lineage, and was followed by extensive diploidization (Swigonova et al., 2004). Since the duplication event it has been estimated that at least 50% of duplicate genes have been lost, obscuring microsynteny in the process (Lai et al., 2004). The maize lineage diverged from rice, the only fully sequenced grass genome, about 50 Mya, during the early history of the Poaceae family and, although extensive gene collinearity can still be detected, considerable chromosomal rearrangements appear to have taken place (Salse et al., 2004).
The maize genome (n=10) contains around 2.4 Gbp, six times the size of rice, and at least 60% of the genome is composed of retrotransposons (Paterson et al., 2005). A number of sequencing projects, including sequencing of BAC ends, methyl-filtration and high-Cot selection libraries, as well as selected entire BACs, in addition to more than a quarter of a million cDNA sequences, have already provided a good representation of the maize gene content (Messing and Dooner, 2006). In parallel, a detailed physical map has been built and anchored to the genetic map (Gardiner et al., 2004). In its latest release, 7/19/05, this map covers 91% of the genome in 721 contigs (http://www.genome.arizona.edu/fpc/maize/#top). The inbred line selected for genomic sequencing as well as the physical map is B73, which was developed by Iowa State University in 1972 (Troyer, 1999).
In this study we combine the results of our own gene sequencing and localization with publicly available sequence data to construct a census of maize genes encoding group 1 allergens. While all class B genes are highly similar, class A genes in maize can be classified into two differentiated complexes. We also show that maize pollen group 1 allergens are encoded by a large set of genes that has grown through extensive tandem duplication and transpositions throughout the genome.

Maize contains a large number of group 1 allergen genes
To compile a census of maize pollen β-expansin genes, we analyzed the genomic and cDNA sequences deposited in GenBank. Due to evidence of variations among cultivars, the detailed analysis of maize cDNA sequences was limited to those from libraries obtained from inbred line B73, because they can be directly compared with the available genomic sequences. Also included in the analysis were two genomic sequences obtained by us in the course of this work (GenBank acc. nos. DQ525684 and DQ525685). Since B73 is highly inbred and has an extremely low level of heterozygosity (Gethi et al., 2002), we took all confirmed sequence differences to represent separate genes. We ignored, however, all the sequence mismatches that could be explained by sequencing errors or amplification artifacts. With these considerations in mind, we were able to identify a total of 15 genes, 11 of which fall into class A and 4 into class B (Table I). All of the class B genes and 9 of the class A genes are represented in pollen-specific EST collections. The assembled cDNA sequences cover the entire coding region for all the expressed genes and the same is true for all but one of the genomic sequences (EXPB10d). Taking into account that over 200 B73 cDNA sequences are group 1 allergen fragments and that 11 of the 13 expressed genes are also represented in genomic sequences, it is unlikely that many more highly expressed genes remain to be discovered.
In addition to active or potentially active genes, we also found genomic sequences for 7 pollen β -expansin pseudogenes, that is, genes whose coding region contains stop codons or frame-shifting insertions or deletions. The details of which sequences correspond to each of the genes or pseudogenes can be found in Supplemental Tables I, II and III and a detailed explanation of how they were grouped is presented in the Supplemental Text.
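The pseudogene criterion described above (a coding region interrupted by stop codons or frame-shifting indels) can be illustrated with a minimal Python sketch. This is not the procedure used in this work: the function name `coding_region_defects` is hypothetical, the input is assumed to be a coding region already extracted with introns removed, and a length that is not a multiple of three is used only as a crude proxy for a frameshift, which in practice is detected by alignment against an intact gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_region_defects(cds):
    """List defects that would flag a coding region as a likely pseudogene.

    `cds` is assumed to start at the initiation codon and to have had
    introns and alignment gaps ("-") removed or ignored.
    """
    seq = cds.upper().replace("-", "")
    defects = []

    # Crude frameshift proxy: an intact coding region is a whole number of codons.
    if len(seq) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frameshift)")

    # Split into complete codons, dropping any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # A stop codon anywhere except the final position is premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            defects.append(f"premature stop codon {codon} at codon {index + 1}")
    return defects
```

An empty list means that no defect was found by these two simple checks; it does not by itself show that a gene is active.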
In a previous study, four maize pollen β-expansin genes were named on the basis of cDNA sequences (Li et al., 2003). Two of these sequences (
A phylogenetic analysis of all the full length protein sequences of maize pollen β-expansin genes can be seen in Figure 1 (see Supplemental Figure 1 for DNA trees and pseudogene phylogeny). While all the class B genes are very similar (97.7% or higher DNA identity), the class A genes can be clearly separated into two distinct branches where the similarity between the genes in separate branches (85.3 to 87.2% DNA identity) is considerably lower than that inside each branch (93.7% and 97.7%). The two branches correspond to the previously identified EXPB10 and EXPB11 genes. For this reason we will refer to them in this work as the EXPB10 and EXPB11 gene complexes, while all the class B genes can be considered members of the EXPB1 complex, as indicated by the shading in Figure 1. In addition to rice, we included in our analyses sorghum, a species closely related to maize that has a large number of pollen cDNA sequences available.
There are more than 200 sorghum ESTs for pollen β -expansin genes in GenBank, thus providing a good sampling of the gene diversity in this species. The phylogenetic tree in Figure 1 includes all the sorghum full length assemblies found in the PlantGDB database (Dong et al., 2005). It clearly suggests that the EXPB10 and EXPB11 complexes diverged very recently, sometime after the maize lineage had split from sorghum, around 12 Mya (Swigonova et al., 2004). The remaining partial sorghum sequences are very similar to the full length assemblies (94% or more identity) and do not change this conclusion (see Supplemental Figure 1 and Supplemental Text for details).
One of the most surprising results of the sequence analysis is the large number of nearly identical genes that seem to be under purifying selection, i.e., selection to preserve the protein sequence. This is particularly the case for the EXPB11 complex, where 5 genes (EXPB11a to EXPB11e) produce identical mature proteins and show only synonymous changes outside the signal peptide (Figure 2A). A similar pattern of purifying selection is seen in the other complexes (Supplemental Figures 2 and 3). The ratio of pairwise nonsynonymous to synonymous substitutions, excluding the signal peptide, ranges from 0 to 0.15 for class A genes and from 0 to 0.22 for class B (Supplemental Figure 4). In addition to the large number of active genes, the presence of several pseudogenes in the EXPB11 complex is further evidence of the high rate of gene duplication in this lineage. They all seem to be fairly recent duplications, showing 91.7 to 96.7% DNA identity with expressed genes. The oldest pollen β-expansin pseudogene seems to be ψEXPB10e, which shows 82.6% identity and has lost a large portion of its sequence by deletion.
In addition to premature stop codons and/or frameshifting insertion or deletions, most pseudogenes show evidence of relaxed selection (Supplemental Figure 4).
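The distinction between synonymous and nonsynonymous changes that underlies the purifying-selection argument can be sketched as follows. This is only an illustration under stated assumptions: the function `count_codon_changes` is hypothetical, it takes two in-frame coding sequences of equal length without gaps, and it counts differing codons rather than estimating substitutions per site, so it does not reproduce the pairwise ratios reported in Supplemental Figure 4.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs must be aligned, gap-free, in-frame coding sequences
    of identical length.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")

    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Same amino acid despite a different codon: synonymous change.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

Under this simplification, a pair of genes encoding identical mature proteins, as described for EXPB11a to EXPB11e, would give a nonsynonymous count of zero when the signal peptide is excluded.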
Another important result of the sequence analysis is evidence of a gene conversion event in the case of EXPB1c. An interlocus conversion happens when the sequence of one gene, or part of it, is substituted for that of another related gene. In this case, the conversion tract in EXPB1c encompasses between 504 and 767 nucleotides and covers most of the coding sequence. This region is now identical to both EXPB1a and EXPB1b, the likely donor genes, while both the 5' and 3' ends show numerous mismatches (Supplemental Figure 3). An analysis using GENECONV (Padidam et al., 1999) strongly suggests that this pattern could be a product of gene conversion (P = 1.4 × 10⁻⁴). Even more revealing is the presence in inbred line W23 of what appears to be an unconverted version of EXPB1c, with 11 mismatches in the putative conversion tract (Supplemental Figure 3B). No conversion events could be detected in the other complexes.

Genomic localization of EXPB10 genes
Information regarding the localization of the different pollen β-expansin genes in the maize genome was obtained from three sources: database sequences, BAC library screening and FISH (fluorescence in situ hybridization). The results are presented in Table I and Figure 3. Both the sequenced BACs and many of the BACs whose ends contain full or partial pollen β-expansin genes are included in the physical map that covers most of the maize genome. This information is detailed in Supplemental Tables I, II and III. In the case of the EXPB10 genes, a BAC that contains EXPB10a has been localized to the long arm of chromosome 9 (9L), while ψEXPB10e can be found in three overlapping BACs located in the same region, but at a considerable distance (Figure 3E).
Finally, EXPB10c has recently been sequenced from BAC c0129O19, in chromosome 3 (see below).
As part of the maize mapping project, BAC libraries were hybridized with a large number of short probes derived from ESTs (Gardiner et al., 2004). Among the probes selected were three derived from the cDNA sequences of EXPB1 (PCO124530), EXPB10 (PCO110981), and EXPB11 (PCO068046). We ordered a number of BAC clones that reportedly hybridized to these probes and screened them by PCR and Southern blots. A large number of them turned out to be false positives (Supplemental Table IV). On the other hand, four overlapping BACs were found to contain two genes from the EXPB10 complex, which we then proceeded to sequence ( Figure 2C). One of them, EXPB10d, is not represented in the databases by any other sequences. As for EXPB10c, we sequenced it in BAC c0154E21, and an identical sequence has been recently obtained from BAC c0129O19, confirming our hypothesis ( Figure 2C). These BACs have been mapped to 3S, the short arm of chromosome 3 ( Figure 3E). Finally, when FISH was performed using an EXPB10 probe, signals were observed on chromosomes 3S, 5L, and 9L although the signal on 9L was very faint ( Figure 3C). The signals on 9L and 3S confirm the previous results, while the signal in chromosome 5 could indicate the location of EXPB10b or of other as yet unidentified genes belonging to the EXPB10 complex.
An automated analysis of maize-rice synteny, based on BAC ends and probe hybridization data (Soderlund et al., 2006), detected extensive conservation of gene order between maize chromosome 9 and rice chromosome 3, suggesting that the location of ψEXPB10e could be orthologous to that of the single cluster of class A genes in rice. The region surrounding ψEXPB10e has recently been sequenced, which allowed us to analyze in detail the conservation of microsynteny (Figure 4A). We found that the order and orientation of the genes in this small region has been perfectly preserved since the separation of the maize and rice lineages. The main differences are the proliferation of class A genes in the rice lineage and the distribution of the maize genes over a much larger span than their rice counterparts. As for the other EXPB10 genes, the genomic location of EXPB10a seems to be orthologous to a region in rice chromosome 3 hundreds of genes away from the class A genes, while the location of EXPB10c and EXPB10d on 3S appears to be orthologous to rice chromosome 1, where no class A genes are found (see Supplemental Text for details). Both locations are therefore likely to be products of independent transposition events. This is indicated by the use of white triangles in Figure 3E.

Genomic localization of EXPB11 genes
In the case of the EXPB11 gene complex, five active genes and two pseudogenes are located in close proximity in BAC c0329D10, which has been partially sequenced (Figure 2B). This BAC has been mapped to chromosome 5 (Figure 3E). During the BAC library screening we also detected this same group of genes through Southern hybridization, in a series of BACs that overlap each other and also BAC c0329D10 (Figure 2E). This result was further confirmed by FISH with an EXPB11 probe, which only produced a signal on the long arm of chromosome 5 (Figure 3D). This location is orthologous to rice chromosome 2, and thus appears to represent another case of translocation.
An end sequence for BAC b0259K07, mapped to chromosome 2, is the only evidence for would not be detected. It is also possible that the position of this BAC was mis-assigned.
At least two more EXPB11 genes and two other pseudogenes have not yet been localized.
It is possible that they are also part of the cluster of EXPB11 genes in 5L, since the sequencing of c0329D10 is incomplete and that of surrounding BACs has not started.

Genomic localization of EXPB1 genes
Two genes in this complex that have been mapped through genomic sequencing, double peaks where EXPB1a and EXPB1b have mismatches, indicating that all four known class B genes are located in the same genomic region (Figure 4B).
In the FISH experiments, probes against both EXPB1a and EXPB1c genes produced similar results, hybridizing to the long arm of chromosome 9 ( Figure 3A). The same result was obtained with a probe designed against EXPB9a ( Figure 3B). This information confirms that all of the class B genes in maize are likely located in a single cluster in chromosome 9, although distributed over a considerable distance. In the case of class B genes, the automated synteny analysis shows that the region of chromosome 9 where maize EXPB1 genes are located is orthologous to the section of rice chromosome 10 that contains OsEXPB9, the single class B gene in this species. However, there is not yet enough information to determine how well microsynteny is conserved ( Figure 4B).

Analysis of group 1 allergen expression
Using real-time PCR we studied the expression of the three complexes of group 1 allergen genes both in immature tassel flowers (~10 and 12 days before anthesis) and mature pollen (Figure 5). The expression of the three gene complexes in immature flowers was very low, with only slight differences between the samples collected 10 and 12 days before anthesis, which correspond to the late uninucleate microspore stage (Mascarenhas, 1989). In contrast, in mature pollen the levels of expression are ~10⁶-fold higher, indicating that these genes are only activated in full during the last phase of pollen development. A previous study had already found, using an EXPB9 probe, weak expression of maize group 1 allergen genes before the first pollen mitosis, at the uninucleate microspore stage, followed by much stronger expression at later stages (Broadwater et al., 1993). We have shown that the same pattern seems to apply to both class A and class B and to all three maize gene complexes. These results are consistent with the existing hypothesis that the main function of group 1 allergens is to aid in the penetration and growth of the pollen tube through the maternal tissue (Cosgrove et al., 1997; Li et al., 2003). However, the presence of these proteins in immature pollen suggests that they could also play a role in pollen development. As for the relative levels of expression of the three complexes, the number of ESTs deposited in databases suggests that they may be considerably lower in the case of EXPB10 (Table I). Although affected by amplification efficiency, the level of expression we detected in mature pollen for the EXPB10 complex was 4.3 and 8.2 times lower than those of EXPB11 and EXPB1 respectively (Figure 5).
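To give a sense of scale for these fold differences, and of why amplification efficiency matters, the following Python sketch relates a difference in threshold cycle (Ct) to a fold difference in template abundance. It assumes the common exponential model in which the product is multiplied by a fixed efficiency factor each cycle; the quantification method actually used for Figure 5 is not described in this section, and the function names and Ct values are hypothetical.

```python
import math


def fold_difference(ct_low_sample, ct_high_sample, efficiency=2.0):
    """Fold difference implied by two threshold cycles.

    `efficiency` is the per-cycle amplification factor (2.0 = perfect
    doubling). The sample with the lower Ct has more template.
    """
    return efficiency ** (ct_low_sample - ct_high_sample)


def cycles_for_fold(fold, efficiency=2.0):
    """Number of cycles separating two samples that differ `fold`-fold."""
    return math.log(fold) / math.log(efficiency)


if __name__ == "__main__":
    # With perfect doubling, a million-fold difference is about 20 cycles.
    print(round(cycles_for_fold(1e6), 2))
    # The same Ct gap implies a smaller fold difference at lower efficiency.
    print(round(fold_difference(30.0, 10.0, efficiency=1.9)))
```

Because the fold difference is exponential in the Ct gap, a small difference in primer efficiency between complexes can shift ratios such as the 4.3-fold and 8.2-fold values reported here, which is why they are qualified in the text.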

Conserved elements in group 1 allergen promoters
The availability of genomic sequence from two highly divergent grasses offers the opportunity of identifying conserved sequences in the upstream sequences of rice and maize pollen β -expansins. It has been shown that this region is sufficient to determine pollen specific expression in one of the rice allergens (Xu et al., 1999). We were able to analyze the promoter sequence of all 4 maize class B genes, as well as 7 class A genes (2 EXPB10s and 5 EXPB11s), and compared them with their rice orthologs. All these genes are expressed in pollen, with the possible exception of EXPB10c which has not yet been detected in any EST libraries. Since the genes share a common ancestor, one could also expect them to share some cis regulatory sequences. We searched for potential regulatory elements of at least 8 bp, conserved in similar positions in rice and maize genes, using FootPrinter (Blanchette and Tompa, 2003). Promoter sequences evolve at a very fast rate, so that the probability of a motif of this size being conserved by chance between these two species is less than 0.001 (Kaplinsky et al., 2002).
With the exception of ZmEXPB9, for which we only have a short upstream fragment, all the analyzed rice and maize upstream sequences include one or two RY-repeats (CATGCATG), perfectly conserved in most cases (Supplemental Figure 5). This element is bound by the B3 domain of several transcription factors involved in seed maturation (Reidt et al., 2000; Braybrook et al., 2006). It is possible that a related transcription factor is active in grass pollen and involved in the transcriptional control of pollen β-expansins. Our results also suggest that the conservation of regulatory sequences between species is more extensive for class B genes, with at least four 8 bp motifs conserved in the same order and orientation, than for class A, where we could only detect the conservation of the RY-repeats (Supplemental Figure 5). Due to their recent divergence, the motifs shared by the maize EXPB10 and EXPB11 genes may have little functional significance.
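A minimal Python sketch of the two searches discussed above is given here: locating RY-repeats in an upstream sequence and listing 8 bp words shared by two upstream sequences. It is a simplification and not a substitute for FootPrinter, which also takes phylogeny and motif position into account; the function names and the example sequences in the comments are hypothetical.

```python
RY_REPEAT = "CATGCATG"


def find_motif(sequence, motif=RY_REPEAT):
    """Return 0-based start positions of all matches, including overlaps.

    CATGCATG is its own reverse complement, so scanning one strand
    is enough for this particular motif.
    """
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def shared_kmers(seq_a, seq_b, k=8):
    """Return the set of k-mers present in both sequences (same strand)."""
    a, b = seq_a.upper(), seq_b.upper()
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b
```

For example, `find_motif("AACATGCATGCATGTT")` reports two overlapping matches, and `shared_kmers` applied to a rice and a maize upstream region would return candidate words that still need to be checked for conserved order, orientation and position.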

DISCUSSION
It has long been known that in some grass species the group 1 pollen allergens consist of many protein variants with small differences in pI and molecular size (Petersen et al., 1993). Whether this isoform diversity is due to multiple genes or variations in posttranslational modifications cannot be resolved without identifying the corresponding genes. The cloning of cDNA sequences encoding group 1 allergens from grass pollen began in the early 1990s (Perez et al., 1990; Broadwater et al., 1993; Xu et al., 1995), but gave few hints of the large genetic complexity we have discovered in maize. On the other hand, N-terminal sequencing of timothy grass group 1 allergens showed the existence of at least four slightly different protein sequences, all of which appear to belong to class A (Petersen et al., 1993). A previous study in maize identified only two class A and two class B genes (Li et al., 2003), and the recent sequencing of the rice genome revealed that this species had a single class B gene, and also a tandem of three genes and one pseudogene belonging to class A (Sampedro et al., 2006). In contrast to these numbers, the present work reveals the presence of at least 15 group 1 allergen genes in maize, at least 13 of which are simultaneously expressed in mature pollen (Table I). From the limited evidence we have collected for sorghum and the results of immunological studies, it would not be surprising to also find large numbers of group 1 allergen genes in other grass species. These results have implications for our understanding of the function and evolution of group 1 allergens. Additionally, they need to be taken into account, for example, in attempts to silence the expression of these genes, as was done for the group 5 allergens (Bhalla et al., 1999).

Genomic localization
We have shown that class B genes are still found in regions of colinear gene order in rice and maize (Figure 4). For class A genes, on the other hand, we have detected 4 translocation events from an ancestral location now occupied by ψ EXPB10e (Figure 3).
However, the fact that this pseudogene can be confidently placed in the EXPB10 complex (Supplemental Figure 1A) indicates that the oldest of these translocation events (barring long distance gene conversion) did not happen before the divergence of the EXPB10 and EXPB11 complexes. Our phylogenetic analysis shows that this divergence is more recent than the split of maize from sorghum (Figure 1). It is thus likely that class A genes in the last ancestor of these two species were found in their ancestral location, which has also been conserved in the rice lineage (Figure 4).
The last common ancestor of rice and maize is also an ancestor to all but a few basal grasses (Kellogg, 2001). In particular it is also the ancestor of turf grasses such as Lolium perenne, Poa pratensis, Phleum pratense, and Cynodon dactylon, whose pollen is most frequently responsible for human allergic disease. The evidence that class A and class B gene locations have been preserved in the maize and rice lineages for most of their history suggests a strategy for isolating orthologous genes in turfgrasses and other species by positional cloning, since there are already detailed comparative genetic maps that cover many grass groups (Devos, 2005). It should also be noted that the translocation events we have detected in maize appear to be quite exceptional in the expansin superfamily at large. In the rice lineage, since its divergence from Arabidopsis, we only found six possible cases of translocation out of a total of 58 genes (Sampedro et al., 2005).
There is no evidence that the recent polyploidization of maize had any effect on the proliferation and diversification of pollen β -expansins. We found only one location for class B genes and none of the class A locations appear to be part of recently duplicated segments, since they are orthologous to different regions of the rice genome. Selection at the translational level has also been proposed as an explanation for the very high GC content at third codon position of many grass genes (Wong et al., 2002).
All pollen β-expansins have high GC content at third codon positions, but class B genes appear to have particularly high levels (85.5-87% in maize, 91% in rice) when compared with class A genes (70-75% for maize EXPB10s, 81-83% for EXPB11s and 71-84% for rice). It has previously been shown that grass genes with higher GC content have a lower rate of synonymous substitution (Alvarez-Valin and Jabbari, 1999). This correlation, which could be caused by selection or other mechanisms, seems to apply to the two classes of pollen β-expansins and is evident in phylogenetic trees. Those trees based on DNA distances show much shorter branches for GC-rich class B than for class A (Supplemental Figure 1), a difference that is much reduced when the trees are based on protein sequences (Figure 1). This suggests that the differential GC enrichment of the two classes could be an ancestral characteristic maintained through evolutionary history, maybe caused by a different degree of selective pressure for efficient or rapid translation.
The unequal rate of evolution is also likely to be the reason why trees based on DNA sequence can have difficulties recognizing class A as a clade (Supplemental Figure 1B).
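The GC content at third codon positions quoted above is a simple quantity to compute from a coding sequence. The following Python sketch shows one way to do it, assuming an in-frame coding sequence with introns removed; the function name `gc3_percent` is hypothetical and the example sequence in the note below is not a real gene.

```python
def gc3_percent(cds):
    """Percentage of G or C at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    if len(seq) < 3 or len(seq) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")

    # Every third base starting from index 2 is a third codon position.
    third_positions = seq[2::3]
    gc_count = sum(base in "GC" for base in third_positions)
    return 100.0 * gc_count / len(third_positions)
```

For the toy sequence `ATGGCCAAGTTA` (codons ATG, GCC, AAG, TTA) the third positions are G, C, G and A, giving 75%. Whether the stop codon is included in the calculation is a convention that should be stated when comparing values between studies.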

Multiplicity of pollen β -expansin genes
One of the most remarkable findings of this work is the presence in maize of multiple gene copies that are expressed simultaneously, with minor or no differences at the protein level, and all of them apparently under purifying selection, particularly in the case of the genes in the EXPB11 complex (Figure 2). The expansin superfamily has been studied in rice, Arabidopsis and Populus and in these species the level of identity found inside the maize gene complexes is only seen in groups of two genes (Sampedro et al., 2005; Sampedro et al., 2006). Recent duplications appear to be more common in the maize genome, but a recent survey found evidence of copies with higher than 98% identity for only 1% of maize genes (Emrich et al., 2006).
The evidence for purifying selection in the pollen β -expansin duplicates suggests that each of these genes provides a selective advantage for the pollen grains that carry them, even though their function is likely to be identical. A possible explanation is that having multiple copies of group 1 allergen genes allows higher levels of total group 1 allergen expression to be reached. The apparently recent proliferation of class A and, to a lesser extent, class B genes in maize, could be linked to a particularly strong selection for high levels of pollen β -expansin in this lineage. Maize pollen grains have to grow through silks that can be up to 30 cm long, which could cause the levels of pollen β -expansin to be particularly critical for male-male competition. This idea is supported by reduced transmission through pollen of a β -expansin insertional mutant allele (Valdivia et al., 2007).
We should also consider the possibility that the phenomenon of gene proliferation in this group of genes is older than phylogenetic trees indicate. The existence of a cluster of class A genes in rice, together with the evidence from sorghum and the immunological results in other species, suggests that this tendency to duplicate may be a common characteristic of pollen β-expansins in grasses, particularly for class A genes. The appearance of a recent proliferation in multiple lineages could thus be a result of concerted evolution. Both unequal crossing over, through a process of repeated gene births and deaths, and gene conversion can have homogenizing effects on clusters of closely related genes. We have seen clear evidence of a gene conversion event in the case of the EXPB1 complex, and the numerous pseudogenes show how gene copies are being lost in all the lineages. If homogenization is a common occurrence in these genes, then a long-range transposition (or the maize polyploidy) may have provided the opportunity for the divergence and possible specialization of the EXPB10 and EXPB11 complexes. A similar situation of specialization associated with physical distance has been described for Arabidopsis resistance genes, which also appear in clusters and are subject to frequent homogenization (Baumgarten et al., 2003).
The degree of functional differentiation in the grass group 1 allergens still needs to be addressed. We have found no evidence of differential expression in the three complexes ( Figure 5), but it seems likely that at least class Establishing the sequences and numbers of the genes that code for the two classes of group 1 allergens should facilitate further research into their biological functions.
Moreover, the situation in maize is likely to apply in some degree to other species and it should serve as a caution about the potential genetic complexity underlying these proteins in turf grasses, where their allergenic properties are of significant medical concern.

MATERIALS AND METHODS
EST sequences from B73 libraries, as well as maize genomic sequences available in GenBank as of November 2, 2006 (NR, GSS and HTGS databases) were analyzed and grouped into genes using SeqMan (DNAstar, Madison, WI). Sorghum EST assemblies were obtained from the PlantGDB database (www.plantgdb.org).

Phylogenetic Analysis
Sequences were aligned with the help of ClustalW, as implemented in MEGA3.1 replicates for all trees.

BAC Clone Identification
Cell stocks for BAC clones were grown in 500 mL of LB broth with 20 µg/mL of chloramphenicol and BAC DNA was purified using the NucleoBond BAC Maxi plasmid purification kit (Clontech Laboratories, Palo Alto, CA). BAC clones were initially screened by PCR using class-specific primers (Table II). PCR fragments were used for direct sequencing using an ABI Hitachi 3730XL DNA Analyzer.

Southern Blot
Twenty µg of DNA from each BAC were digested with HindIII, XhoI, and EcoRV at 37 °C overnight. DNA was precipitated and separated on 0.8% agarose gels in 1X TAE and then transferred using a VacuGene XL Vacuum Blotting System (Pharmacia Biotech, Amersham Pharmacia, Piscataway, NJ) to a Hybond-N nylon membrane (Amersham Biosciences, Piscataway, NJ). The membrane was pre-hybridized using Ultrahyb solution (Ambion, Austin, TX) overnight at 37 °C. Hybridizations were performed at 68 °C for ten hours in a mixture containing the probe added to the prehybridization buffer. Blots were washed twice at 62 °C for 20 minutes in 5X SEN (5% SDS, 1 mM EDTA, 40 mM Na2HPO4) and then twice for 20 minutes in 1X SEN (1% SDS, 1 mM EDTA, 40 mM Na2HPO4). Blots were exposed to phosphor screens (Molecular Dynamics, Sunnyvale, CA) for 4 hours at room temperature and imaged using a Typhoon scanner and Image Quant TL software (GE Healthcare).
Two PCR-amplified fragments from the coding region of EXPB11 and EXPB1 genes were used as class specific probes (Table II). These probes were labeled with 32 P-dCTP using rediprime II random prime labeling system (Amersham Pharmacia, Piscataway, NJ). Gene specific probes and polymerase chain reaction (PCR) fragments were cloned with TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA).

FISH Protocol
FISH was performed according to published protocols (Kato et al., 2004; Kato et al., 2006). Expansin probes were made from PCR products amplified using the primers and templates listed in Table II.

this inbred line and counterstained with DAPI (blue). The cocktail contains CentC (green), "TAG" microsatellite clone 1-26-2 (green), and the 180bp knob probe (blue).
The gray values of the EXPB probe signals are displayed below the merged (RGB) image. Arrows indicate sites of hybridization to the EXPB probes. E, the maize chromosomes that contain group 1 allergen genes are represented to scale proportional to genetic distance. Black triangles indicate ancestral gene locations showing microsynteny with rice group 1 allergens (see Figure 4), white triangles are for locations where group 1 allergen genes have been translocated, black circles indicate centromeres. The closest markers to each location are shown.

Table I. Group 1 allergen genes identified in maize B73 cultivar. For each gene the availability of genomic sequence (asterisks indicate partial sequences), the number of diagnostic ESTs and the chromosome location, when known, are indicated (see Figure 3 for detailed localization).

Class  Complex  Gene  Genome  ESTs  Chr.