Alternative Splicing at NAGNAG Acceptor Sites Shares Common Properties in Land Plants and Mammals

In recent years, several papers have reported that a special type of alternative splicing (AS) event occurs at the tandem 3′ splice site, termed the "NAGNAG acceptor." This type of AS event (termed AS-NAGNAG) is well studied in both human and mouse. To illustrate the significance of AS-NAGNAG events, we focused on their occurrence in Arabidopsis thaliana and Oryza sativa (rice). Our study is the first genome-wide approach examining AS-NAGNAG events in land plants. Based on transcripts and genomic sequences, we found 321 and 372 AS-NAGNAG events in Arabidopsis and rice, respectively. These events were significantly enriched in genes encoding DNA-binding proteins, and more than half of all AS-NAGNAG events affected polar amino acid residues. The observed properties of AS-NAGNAG events in plants were similar to those seen in mammals. These results showed that AS-NAGNAG events may provide a mechanism for fine-tuning of DNA-binding proteins in both mammals and land plants. We found 7 gene groups of AS-NAGNAG events that were conserved between Arabidopsis and rice, including 2 groups for RNA-binding proteins. Conservation of the events for RNA-binding proteins is a property also seen in mammals. Furthermore, we found 23 gene groups containing AS-NAGNAG events that occurred in noncorresponding introns of homologous genes. They included 5 groups of DNA-binding proteins, a number larger than expected. We think there is a bias with which AS-NAGNAG events are fixed in genes for DNA-binding proteins. Our analysis showed that AS-NAGNAG events found in land plants share similar properties with those in mammals. Based on our results, we propose that AS-NAGNAG events are likely to be a common mechanism in the fine-tuning of protein functions, especially of DNA/RNA-binding proteins, in both mammals and plants. Their role might contribute to the construction of complicated transcriptomes and proteomes in the evolutionary history of mammals and land plants.


Introduction
Alternative splicing (AS) is a mechanism whereby 2 or more different mature mRNAs are generated from a single precursor mRNA (pre-mRNA). Recent reports have found that a significant percentage of genes in plants undergo AS. For example, over 20% of pre-mRNAs, corresponding to about 4,700 genes, are alternatively spliced in Arabidopsis thaliana, and over 20%, corresponding to about 6,500 genes, are alternatively spliced in Oryza sativa (rice) (Wang and Brendel 2006). On the other hand, more than 50% of genes are subject to AS in human and mouse (Johnson et al. 2003; Carninci et al. 2005). Recently, Hiller et al. (2004) reported that many human genes underwent AS events at splice acceptor sites with special consensus sequences. These sites exhibited tandem repeats of the consensus sequence "NAG" and were thus termed "NAGNAG acceptor" sites (fig. 1). In these AS events (which we will term "AS-NAGNAG events"), both the first AG (the "E" site; Hiller et al. 2004) and the second AG (the "I" site) may function as splice acceptor sites. Such events result in the variable presence or absence of the 3 nt of the latter NAG in the mature mRNAs (fig. 1). Because AS-NAGNAG events never result in frameshifts, they affect only 1 or 2 amino acids in the translated amino acid sequences unless they generate start or stop codons. Although the number of affected amino acids is very small, several properties of these events suggest their possible importance in the regulation of protein function. AS-NAGNAG events are reportedly enriched in genes encoding DNA-binding proteins (Akerman and Mandel-Gutfreund 2006) and tend to occur in regions enriched in polar residues (Hiller et al. 2004). Additionally, many AS-NAGNAG events are conserved between human and mouse (Akerman and Mandel-Gutfreund 2006). Based on these results, AS-NAGNAG events are thought to have a regulatory role in the fine-tuning of DNA-binding proteins.
One alternative point of view, however, is that AS-NAGNAG events may be explained by a simple physical model (Chern et al. 2006). Although Hiller et al. (2006) have argued against this theory, the biological importance of most AS-NAGNAG events remains unclear. Our study will contribute to this discussion using additional data derived from plant genomics. If DNA-binding proteins are also the principal targets of AS-NAGNAG events in plants and if these events primarily affect polar amino acids, the shared importance of AS-NAGNAG events in mammals and plants will be underscored, and these events may well be a shared strategy for regulating protein function. It should be noted that AS-NAGNAG events may be important not only because of their effects on DNA-binding proteins but also because of the influence that these effects then have on the entire transcriptome.
Recently, Campbell et al. (2006) analyzed sequence properties of the 3′ splice acceptor site of AS events and reported that these events occurred in Arabidopsis and rice. However, the number of studies of AS-NAGNAG events in plants remains small, and the characterization of such events remains especially poor. In this study, we established algorithms to detect AS-NAGNAG events based on full-length cDNAs and expressed sequence tags. We attempted to determine how frequently AS-NAGNAG events occur in Arabidopsis and in rice using genome-wide analyses.
Characterization of AS-NAGNAG events is also necessary to assess the potential impact of these events on the plant transcriptome. If AS-NAGNAG events found in Arabidopsis and rice share a common role in evolutionary pathways, such events should be conserved. For example, we found that AS events in genes for Ser/Arg-rich splicing factors were highly conserved in Arabidopsis, rice, and probably in moss (Iida and Go 2006). In this study, we surveyed the conservation of AS-NAGNAG events in Arabidopsis and rice. We also focused our attention on a second area, that of AS-NAGNAG events found at noncorresponding introns of homologous genes. If a pair of homologous genes possesses AS-NAGNAG events at different introns, these events should have been generated independently, and we term them AS-NAGNAG events at noncorresponding introns (ANNCIs). Although the amino acid positions of the events are different, these events may have a comparable impact on protein function when they are located in regions concerned with similar or identical molecular functions. In this paper, we provide examples of conserved AS-NAGNAG events and ANNCIs between Arabidopsis and rice, as these events seem to play an important role in modulating protein function. Based on the results from these analyses, we also discuss the similarities and differences of AS-NAGNAG events found in land plants and mammals.

Data Set
We used the entire genomic sequence and the annotated gene sets of Arabidopsis published on the Web site of The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org/). For rice, we used the whole genomic sequence and the annotated gene sets published by The Institute for Genomic Research (TIGR; http://www.tigr.org/). We also used transcript sequences published by UniGene (Wheeler et al. 2007) and full-length cDNA sequences collected by RIKEN (Seki et al. 2002; Yamada et al. 2003), Ceres Inc. (Haas et al. 2002), and "KOME" (Kikuchi et al. 2003). For Arabidopsis, there were 447,107 transcripts in UniGene, and the RIKEN and Ceres data sets had 280,569 and 5,000 sequences, respectively, which were full or partial reads of full-length cDNAs. There were redundancies between UniGene and the RIKEN or Ceres data sets, so the total number of transcripts in Arabidopsis was 501,736. For rice, the UniGene and KOME full-length cDNAs contained 374,632 and 32,127 transcripts, respectively. Because of some redundancy, the number of unique transcripts in rice was 374,954 (fig. 2).

Construction of Transcription Units
We mapped transcript sequences to the genomic sequences to construct transcription units (TUs) (Okazaki et al. 2002;Iida et al. 2004). We used Blast (Altschul et al. 1997) for rough mapping and GeneSeqer (Brendel et al. 2004) for detailed mapping. In the first Blast step, we selected transcripts whose conformity with the genomic sequence was greater than 88%. Next, we made a partial sequence of the genome corresponding to each locus of the transcript and used GeneSeqer to make pairwise alignments of the genomic sequence and each transcript, taking into account exon-intron boundary rules. Finally, we limited our sample to sequences in which over 90% of the length of the transcript could be mapped to the genomic sequence. We clustered transcripts into 1 TU when they had the same direction and overlapped the same region of the genome.
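The clustering step at the end of this procedure can be illustrated with a minimal sketch. It assumes that "overlapped the same region" means overlapping genomic spans on the same chromosome and strand; the function name `cluster_into_tus` and the tuple layout are hypothetical and are not taken from the original pipeline.

```python
from collections import defaultdict


def cluster_into_tus(alignments):
    """Group mapped transcripts into transcription units (TUs).

    alignments: iterable of (transcript_id, chromosome, strand, start, end)
    with inclusive genomic coordinates (hypothetical input layout).
    Transcripts on the same chromosome and strand whose spans overlap
    are merged into one TU.
    """
    by_locus = defaultdict(list)
    for tid, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, tid))

    tus = []
    for (chrom, strand), spans in sorted(by_locus.items()):
        spans.sort()  # sweep from left to right along the chromosome
        current = None
        for start, end, tid in spans:
            if current is not None and start <= current["end"]:
                # Overlaps the TU being built: extend it.
                current["end"] = max(current["end"], end)
                current["members"].append(tid)
            else:
                # No overlap: open a new TU.
                current = {"chrom": chrom, "strand": strand,
                           "start": start, "end": end, "members": [tid]}
                tus.append(current)
    return tus
```

A span-level sweep like this is the simplest reading of the rule; a stricter variant could require shared exonic sequence rather than overlapping spans.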

Identification of AS-NAGNAG Events
We searched for AS-NAGNAG events using multiple alignments of nucleotide sequences in each TU. We created the multiple alignments from pairwise alignments of the genomic sequence and each transcript sequence. When 1 TU had the following 3 features, we treated it as having an AS-NAGNAG event: 1) it had the NAGNAG consensus site in the genomic sequence, 2) at least 1 transcript used the former AG site as the 3′ splice acceptor site, and 3) at least 1 transcript used the latter AG site as the 3′ splice acceptor site. We encapsulated these rules in a computer script written in Perl.
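The three rules can be sketched as follows. This is an illustrative Python sketch, not the original Perl script; it assumes the genomic sequence is already oriented in the direction of transcription and that each transcript is summarized by its intron coordinates (0-based, half-open). The function name `find_as_nagnag` and the input layout are hypothetical.

```python
import re

# Genomic consensus: two tandem NAG triplets.
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def find_as_nagnag(genome, transcript_introns):
    """Report NAGNAG sites where both AGs are used as acceptors.

    genome: sequence of the TU locus, oriented 5' to 3' on the coding strand.
    transcript_introns: dict mapping transcript ID to a list of
        (intron_start, intron_end) pairs, 0-based half-open.
    """
    genome = genome.upper()

    # Collect which transcripts support each intron 3' end (acceptor).
    acceptor_support = {}
    for tid, introns in transcript_introns.items():
        for _start, end in introns:
            acceptor_support.setdefault(end, set()).add(tid)

    events = []
    for e_end in sorted(acceptor_support):
        i_end = e_end + 3  # the latter AG lies 3 nt downstream
        site_start = e_end - 3
        if i_end not in acceptor_support or site_start < 0:
            continue  # rule 2 or rule 3 is not satisfied
        motif = genome[site_start:i_end]
        if NAGNAG.fullmatch(motif):  # rule 1
            events.append({
                "site_start": site_start,
                "motif": motif,
                "e_transcripts": sorted(acceptor_support[e_end]),
                "i_transcripts": sorted(acceptor_support[i_end]),
            })
    return events
```

The sketch does not require the two transcripts to share the same donor site, because the rules as stated do not impose that condition.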

Gene Ontology Analysis
FIG. 2.-Flowchart outlining study procedures. We analyzed data derived from Arabidopsis and rice in parallel. During the final phase, we combined these results in order to more broadly survey conserved AS-NAGNAG events.
We performed gene ontology (GO) (Ashburner et al. 2000) analysis to determine which gene types tended to contain AS-NAGNAG events. For this analysis, we used InterProScan (Mulder et al. 2007) to assign motifs and GOs to all the annotated gene models. We evaluated the number of genes with each GO, comparing the entire gene set with the subset of genes with AS-NAGNAG events, using the fourth class of molecular function of GOs. When 1 gene had a GO whose class was under the fifth class, we converted it into a parent GO of the fourth class. We searched for GOs that were enriched in genes undergoing AS-NAGNAG and checked their significance using a chi-square test. When the frequency of a GO in genes with AS-NAGNAG events was greater than that in all genes and the P value computed from the chi-square test was less than 0.05, we considered the GO to be statistically enriched in the AS-NAGNAG group. We used the R package (http://cran.r-project.org/) to calculate P values in chi-square tests.
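The enrichment criterion can be sketched in a few lines. The original analysis used R; the Python version below is only an illustration under stated assumptions: the contingency table compares genes with AS-NAGNAG events against all remaining genes, and Yates' continuity correction is applied because R's `chisq.test` applies it to 2 x 2 tables by default. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test (1 degree of freedom) on the table [[a, b], [c, d]].

    a: AS-NAGNAG genes with the GO,   b: AS-NAGNAG genes without it,
    c: other genes with the GO,       d: other genes without it.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must have a nonzero total")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    stat = n * diff ** 2 / margins
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def is_enriched(go_in_subset, subset_size, go_in_all, all_size, alpha=0.05):
    """Apply both criteria: higher frequency in the subset and P < alpha."""
    a = go_in_subset
    b = subset_size - go_in_subset
    c = go_in_all - go_in_subset
    d = all_size - subset_size - c
    _stat, p_value = chi_square_2x2(a, b, c, d)
    higher = go_in_subset / subset_size > go_in_all / all_size
    return higher and p_value < alpha
```

For instance, `is_enriched(28, 258, 1500, 26751)` evaluates a GO carried by 28 of 258 AS-NAGNAG genes; the genome-wide count of 1,500 is a made-up figure used only to exercise the function.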

Analysis of Amino Acids Affected by AS-NAGNAG Events
We analyzed amino acids that were affected by AS-NAGNAG events. Although AS-NAGNAG events were defined by the mRNA sequences, we had to use the open reading frame information from each mRNA for this analysis. We therefore compared the constructed TUs and annotated gene models, and we used AS-NAGNAG events in our analysis when it was possible to map them to annotated gene models. We prepared the 6 nt of the NAGNAG site, 2 nt at the 3′ end of the previous exon, and 2 nt following the NAGNAG to determine the differences between amino acid sequences encoded by E and I transcripts (fig. 1) (Hiller et al. 2004). When 1 amino acid residue was exchanged for 2 different amino acid residues by an AS-NAGNAG event, we counted all 3 amino acids as affected amino acids. In addition, when an AS-NAGNAG event had more than 2 different previous exons caused by another AS event, we counted all possible variations. In this analysis, we listed the frequency of each amino acid in the entire protein sequence, in the exon junctions, and in sequences affected by AS-NAGNAG events (table 2).
We also performed a statistical analysis similar to that of Hiller et al. (2004). In this analysis, we listed 10 amino acids on both sides of the exon junctions with and without AS-NAGNAG events and assessed whether these flanking sequences tended to be polar or not.
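The 10-nt window (2 nt of the upstream exon, the 6-nt NAGNAG, and 2 nt downstream) is enough to translate the junction in any reading frame. The sketch below is an illustration of that idea and not the authors' code; the function name `affected_residues` and the `phase` argument (the number of upstream-exon nucleotides belonging to the codon that spans the junction) are assumptions introduced here.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def affected_residues(upstream2, nagnag, downstream2, phase):
    """Return the residues encoded at the junction by E and I transcripts.

    upstream2:   last 2 nt of the previous exon
    nagnag:      the 6-nt NAGNAG site
    downstream2: 2 nt following the NAGNAG
    phase:       0, 1, or 2 upstream nt in the junction-spanning codon
    """
    up = upstream2[len(upstream2) - phase:]   # nt completing the codon 5' side
    down = downstream2[:(3 - phase) % 3]      # nt completing the codon 3' side
    latter_nag = nagnag[3:]                   # retained only in the E transcript
    e_peptide = translate(up + latter_nag + down)
    i_peptide = translate(up + down)
    return e_peptide, i_peptide
```

In phase 0 the E transcript gains one whole codon (for example, CAG encodes glutamine, and TAG would create a stop codon), whereas in phases 1 and 2 one residue in the I transcript corresponds to two residues in the E transcript, which is the case in which all 3 amino acids were counted.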

Analysis of Conserved AS-NAGNAG Events, ANNCIs, and Genomic NAGNAG Sites
We searched for AS-NAGNAG events that were conserved between Arabidopsis and rice. Because it is difficult to align divergent mRNA sequences, we used amino acid sequences translated from mRNAs containing AS-NAGNAG events. We grouped homologous genes utilizing Blast (Altschul et al. 1997), using as a query amino acid sequences that were translated from mRNAs that contained AS-NAGNAG events. For the Blast searches, the parameters were set as follows: the low-complexity filter was on, the E value limit was 10^-20, and the database contained all proteins in Arabidopsis and rice. We next used ClustalW (Thompson et al. 1994) to make multiple alignments of the amino acid sequences in each constructed group. When we detected a pair of AS-NAGNAG events that occurred at the same site on the alignment and were in the same phase, we considered them conserved events. In this analysis, a deviation of 1 amino acid was allowed because we detected cases in which the amino acids encoded near the exon boundaries were more divergent than those not around exon boundaries. In addition, we set another margin when the types of AS-NAGNAG events were different (i.e., the I and E transcripts). We verified conserved event candidates by visual examination of the multiple alignments (supplementary figs. 1 and 2, Supplementary Material online). Furthermore, we examined genes homologous to those with NAGNAG events to verify the corresponding introns or genomic NAGNAG sites.
For the case of AT5G16840.1, we performed homology modeling to characterize AS-NAGNAG events on the tertiary structure. For the modeling, we used SWISS-MODEL (Schwede et al. 2003) with pairwise alignment made by Blast (Altschul et al. 1997).

Construction of TUs
We used 501,736 transcripts of Arabidopsis in our analysis. We mapped these transcripts to the genome using Blast (Altschul et al. 1997) and GeneSeqer (Brendel et al. 2004). As a result, we mapped 450,474 (89.8%) of the transcripts to the genomic sequence with high accuracy (see the Materials and Methods for a detailed description). We constructed TUs based on the mapped transcripts. For Arabidopsis, there were 25,380 TUs, including 21,064 TUs with more than 2 transcripts (fig. 2), and they corresponded to 24,857 loci of annotated genes in the TAIR data set. In the TAIR gene models, the total number of loci was 26,751. Thus, we were able to cover nearly all the Arabidopsis loci in our analysis.
For rice, we used 374,954 full-length cDNA sequences from UniGene and KOME. We were able to map 338,036 sequences to the rice genome and determined 26,508 TUs. In all, 19,080 TUs contained more than 2 transcripts. In all, 26,508 TUs corresponded to 27,510 loci of the annotated genes in the TIGR data set, which accounted for approximately 47% of the 57,916 total loci.

Identification of AS-NAGNAG Events
We described above a method to detect AS-NAGNAG events contained in multiple alignments of TU transcripts. By applying the method to the data sets from Arabidopsis and rice, we identified 321 AS-NAGNAG events in Arabidopsis and 372 events in rice. These events were located in 316 TUs in Arabidopsis (1.5% of all TUs containing more than 2 transcripts) and 363 TUs in rice (1.9% of all TUs containing more than 2 transcripts) (fig. 2). Because certain AS-NAGNAG events had been previously annotated in gene models in TAIR or TIGR, we searched the annotated gene sets for gene models that corresponded to TUs with AS-NAGNAG events. We found corresponding gene models for 258 TUs in Arabidopsis and 248 TUs in rice, containing 261 and 253 AS-NAGNAG events, respectively. Of these, 187 events in Arabidopsis and 107 events in rice were already annotated. Therefore, we found 74 and 146 new AS-NAGNAG events in Arabidopsis and rice, respectively.

GO Analysis
To determine what types of genes tended to contain AS-NAGNAG events, we performed GO analysis. For this analysis, we used InterProScan (Mulder et al. 2007) to assign GOs to the 26,751 annotated Arabidopsis genes and 43,720 annotated rice genes. We excluded 14,196 transposable element-related genes because they account for a large fraction of the rice genes and would negatively impact the GO analysis. Using InterProScan, we identified 47,595 GOs (types of GOs: 1,428) in 15,482 Arabidopsis genes. In rice, we determined 54,603 GOs (types of GOs: 1,406) in 18,120 genes. We then used a data set composed of 258 genes from Arabidopsis and 248 genes from rice that corresponded to TUs with AS-NAGNAG events (see the previous sections for a detailed description). We searched this data set for GOs that were statistically enriched in AS-NAGNAG gene groups. We found 28 genes in Arabidopsis and 29 genes in rice with "GO:0003677: DNA binding" (table 1). This GO was enriched in the group of genes with AS-NAGNAG events in both species, and the differences were statistically significant (P < 0.05). In addition, "GO:0008026: ATP-dependent helicase activity" and "GO:0016746: transferase activity, transferring acyl groups" were also enriched in AS-NAGNAG groups in Arabidopsis. In rice, 4 other GOs, "GO:0019887: protein kinase regulator activity," "GO:0015485: cholesterol binding," "GO:0031072: heat shock protein binding," and "GO:0019888: protein phosphatase regulator activity," were enriched in AS-NAGNAG groups.

Analysis of Amino Acid Residues Affected by AS-NAGNAG Events
We analyzed amino acid residues affected by AS-NAGNAG events. For this analysis, we used 506 genes (258 for Arabidopsis and 248 for rice) corresponding to TUs with AS-NAGNAG events. The results showed that in both Arabidopsis and rice, AS-NAGNAG events predominantly affected polar amino acid residues (table 2). Glutamine, serine, alanine, glutamic acid, and lysine each comprised more than 5% of the total number of amino acids in samples from both species. Excluding alanine, each of these is a polar residue, and together they accounted for more than 50% of amino acids affected by AS-NAGNAG events. We also compared the number of polar amino acids contained within the 10 flanking amino acid residues from exon junctions with and without AS-NAGNAG events. Flanking residues around exon junctions with AS-NAGNAG events were more polar than those without AS-NAGNAG events (P < 0.00001 in t-test), both in Arabidopsis and rice.

Analysis of Conserved AS-NAGNAG Events, ANNCIs, and Genomic NAGNAG Sites
We searched for AS-NAGNAG events that were conserved between Arabidopsis and rice. In this analysis, we used 506 genes (258 for Arabidopsis and 248 for rice) corresponding to TUs with AS-NAGNAG events. We found 7 homologous gene groups with conserved AS-NAGNAG events: small nuclear ribonucleoprotein D2 (Sm-D2), urease accessory protein D, similar to surfeit locus protein 2 (SURF2) family protein, RNA recognition motif (RRM)-containing protein, putative O-acetyltransferase, auxin-induced gene IAA13, and an unknown protein (fig. 2 and table 3). For 4 groups (Sm-D2, urease accessory protein D, RRM-containing protein, and IAA13), the influence on the encoded amino acids was also conserved (table 3). For example, we could assign the tertiary structure of the RRM domain of serine/arginine (SR)-rich factor 9G8 (Hargous et al. 2006) to the RRM domain encoded by AT5G16480. The sequence identity between these RRM domains was approximately 35%. We next analyzed the impact of AS-NAGNAG events on the tertiary structure by homology modeling (fig. 3). The sequence containing the AS-NAGNAG events was positioned on the loop structure situated between 2 beta strands. These beta strands are known to be responsible for RNA binding in the SR-rich factor 9G8 (Hargous et al. 2006). The RRM domain encoded by the E transcript had a longer loop than that encoded by the I transcript. Therefore, the position of Glu42, which might affect RNA-binding properties, was different between E and I transcripts. A second example is the gene Sm-D2 encoded by AT2G47640 and AT3G62840, to which we could assign the tertiary structure of human Sm-D2 (Kambach et al. 1999). Although the position of the AS-NAGNAG event was disordered in the tertiary structure of human Sm-D2, the AS-NAGNAG event did lie in the N-terminal region that may serve as an interaction face for a heptameric ring or for mRNA strands.
For urease accessory protein D, AS-NAGNAG events that led to creation of a premature termination codon (PTC) were conserved; we will return to this topic in the Discussion.
We searched for genes homologous to those containing AS-NAGNAG events and analyzed whether or not they had corresponding introns at the same locations as the AS-NAGNAG events. When they did, we also checked for the existence of genomic NAGNAG sites in the homologous genes. Out of all 506 genes with AS-NAGNAG events, 442 genes had at least 1 homologous gene in the alternate species (rice homologues for Arabidopsis genes and Arabidopsis homologues for rice genes). For 286 of the AS-NAGNAG events, at least 1 homologue contained corresponding introns. We found that 88 events (46 in Arabidopsis and 42 in rice) had homologues with genomic NAGNAG sites at corresponding sites (supplementary table 2, Supplementary Material online). Of these events, 73 AS-NAGNAG events (38 in Arabidopsis and 35 in rice) had homologues with genomic NAGNAG sites but no observed AS-NAGNAG event. We also found 23 cases where several homologous genes had AS-NAGNAG events in different introns (supplementary table 3, Supplementary Material online).
FIG. 3.-[...] We showed serine residues affected by AS-NAGNAG events and charged residues whose positions were also affected by the events. (B) and (D) show the surface of the proteins. Positively charged residues (Arg and Lys) are shown in white, and negatively charged ones (Asp and Glu) are shown in black. The front side of these figures is used as the RNA interaction surface in the original proteins. These models were built by homology modeling based on the RRM domain of human SR-rich factor 9G8 (PDB ID 2HVZ) (Hargous et al. 2006). (A) and (C) were drawn using MOLSCRIPT (Kraulis 1991) and Raster3D (Merritt and Murphy 1994). (B) and (D) were drawn using UCSF Chimera (Pettersen et al. 2004).

Discussion
This study is the first genome-wide study analyzing AS-NAGNAG events in A. thaliana and O. sativa. One of our goals was to characterize AS-NAGNAG events in plants and then compare our findings with existing data derived from studies on mammals. In the present study, we found that about 2% of all TUs had AS-NAGNAG events in both Arabidopsis and rice. On the other hand, Hiller et al. (2004) reported that at least 5% of all human genes contain AS-NAGNAG events. Using these numbers only, it would appear that the prevalence of AS-NAGNAG events in plants is smaller than that in human. When we consider all AS events and not solely those associated with NAGNAG, the fraction of genes with AS is reportedly about 20% in Arabidopsis and rice (Wang and Brendel 2006). In comparison, the prevalence in mammals is reportedly at least 50% (Johnson et al. 2003; Carninci et al. 2005). The difference between the rate of AS-NAGNAG events in plants and human may in fact reflect a different background frequency of more general AS events between the 2 groups.
We found that certain features of AS-NAGNAG events in plants highly resemble those in mammals. For example, AS-NAGNAG events in plants are enriched in genes encoding DNA-binding proteins and mainly affect polar amino acids such as lysine, glutamine, glutamic acid, and serine. These findings are highly consistent with those found in mammals (Hiller et al. 2004; Akerman and Mandel-Gutfreund 2006). This similarity suggests that AS-NAGNAG events have a common role in mammals and plants. Even though the absolute number of affected amino acids is small, polar amino acids can play an important regulatory role in DNA-binding proteins. Hiller et al. (2006) used the mouse Pax3 gene as an example. In this case, the presence or absence of glutamine caused by AS-NAGNAG events can change DNA-binding properties. Human EGR-1 protein, a C2H2-type zinc finger protein, follows a similar pattern. In this case, 3 amino acids, lysine-threonine-serine, were changed by AS, and these events can change the DNA-binding properties of the protein (Larsson et al. 1995; Stetefeld and Ruegg 2005). We hypothesize that AS-NAGNAG events found in plant DNA-binding proteins have at least a minor regulatory role of a similar kind. We also suggest that both mammals and plants use the same strategy for regulating protein functions through AS-NAGNAG events.
FIG. 4.-[...] In this pair, both genes had AS-NAGNAG events, but the sites were quite different. These events were thought to be generated independently, so they are termed ANNCIs. (B) A scheme of the domain structure of NAC family transcription factors. AT3G10480.1 from Arabidopsis has an AS-NAGNAG event in the DNA-binding domain, which accounts for the C-terminal part of the NAC domain. Os08g06140.1 from rice has the event in the activation domain. This scheme is based on the tertiary structure of abscisic-acid-responsive NAC (PDB ID 1UT4) (Ernst et al. 2004) and a figure drawn by Ooka et al. (2003).
Our study also compared the presence of AS-NAGNAG events in Arabidopsis and rice and found 7 homologous gene groups where these events were conserved. This number appears large when compared with the number of known conserved AS events at splice acceptor sites, which was only 5 and did not include the present AS-NAGNAG events (Wang and Brendel 2006). However, when we compared the number of conserved AS-NAGNAG events in land plants and mammals, there appeared to be a significant difference between the 2. Akerman and Mandel-Gutfreund (2006) reported high conservation of AS-NAGNAG events between human and mouse, based on findings from 215 conserved events between human and mouse. The possibility certainly exists that a larger number of cases show conservation of AS-NAGNAG events. Thirty-eight AS-NAGNAG events in Arabidopsis and 35 events in rice had corresponding introns in homologous genes with genomic NAGNAG sites but without confirmed AS events. Further accumulation of transcripts might provide more confirmation of AS-NAGNAG events. However, even in the case of human, for which more transcripts have been accumulated than for Arabidopsis and rice, only 13% of all genomic NAGNAG sites were associated with transcript-confirmed AS events (Hiller et al. 2004). We believe that a certain number of genomic NAGNAG sites cannot provide AS variants or can provide AS variants only at very low rates. We also cannot expect that all AS-NAGNAG events with homologous genes having genomic NAGNAG sites will become conserved cases. Although the number of conserved events in plants is very small, the result shows interesting consistency with mammals. Hiller et al. (2004) reported that several splicing-related genes, such as PRPF3, PRPF8, U2AF1, and U2AF2, had AS-NAGNAG events conserved between human and mouse. In our results, Sm-D2, a component of snRNP, and one protein containing an RRM that can act in RNA metabolism were included in the conserved cases.
Akerman and Mandel-Gutfreund (2006) reported that RNA-binding proteins were also common sites for AS-NAGNAG events. Although we did not find the relationship between AS-NAGNAG events and RNA-binding proteins to be statistically significant, AS-NAGNAG events may well have an important role in regulation of these RNA-binding proteins in plants because of the conservation of these events between Arabidopsis and rice. As described in the Results, these AS-NAGNAG events lie on protein-RNA or protein-protein interaction surfaces, suggesting their ability to modify protein function.
The presence of AS-NAGNAG events in RRM-containing proteins demonstrates the potential importance of AS-NAGNAG events in changing the positions of functional residues in tertiary structures (fig. 3). In such cases, AS-NAGNAG events occurring in positions neighboring the functional sites can modify the functions of the proteins, even if the sequences of the functional residues themselves are not changed by AS-NAGNAG events. Conservation of AS events in RNA-binding proteins resembles our previous results on SR proteins (Iida and Go 2006). These types of conserved AS events found in RNA-binding proteins should have an important role in both Arabidopsis and rice, along with the presence of AS-NAGNAG events.
We found the cooccurrence of AS-NAGNAG events and genes with the specific GO term DNA binding to be statistically significant; however, no genes for DNA-binding proteins showed conservation of the events between Arabidopsis and rice. On the other hand, we found that DNA-binding proteins were enriched in genes with ANNCIs. We are unable to suggest a mechanism by which an AS-NAGNAG event moves from its original site to another intron. Such events should be generated independently. Out of 23 groups with ANNCIs, 5 groups could be considered DNA-binding proteins (supplementary table 3, Supplementary Material online). In fact, many DNA-binding proteins contain AS-NAGNAG events. However, the expected number of ANNCIs, assuming no bias in the distribution of AS-NAGNAG events, is 0.9 (if the 28 and 29 genes encoding DNA-binding proteins in Arabidopsis and rice, respectively, are clustered into homologous gene groups). This difference is statistically significant (P < 0.01, binomial test). We therefore consider that some bias might affect which AS-NAGNAG events became fixed in mRNAs encoding DNA-binding proteins during evolution. It is likely that certain AS-NAGNAG events are useful in fine-tuning the function of DNA-binding proteins. One example is the NAC family of transcription factors (fig. 4): the Arabidopsis gene has an AS-NAGNAG event in the DNA-binding domain, and the rice gene has an event in the activation domain (Ooka et al. 2003). Even though the locations were different, both events could modulate the function of the transcription factors. In this case, the Arabidopsis AS-NAGNAG event is situated on the loop between 2 beta strands, whose high mobility precluded precise observation of its structure (Ernst et al. 2004). Such flexibility, however, likely facilitates the occurrence of AS-NAGNAG events within the protein.
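The comparison of 5 observed groups against an expectation of 0.9 can be checked with a short calculation. The text does not state how the binomial test was parameterized, so the sketch below rests on an assumption: each of the 23 ANNCI groups is treated as an independent trial with success probability 0.9/23, and a one-sided upper-tail probability is computed.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


n_groups = 23    # ANNCI groups reported in the text
observed = 5     # groups annotated as DNA-binding proteins
expected = 0.9   # expected count under no bias, from the text

# Assumed per-group probability implied by the expected count.
p_group = expected / n_groups

p_value = binomial_upper_tail(observed, n_groups, p_group)
print(f"P(X >= {observed}) = {p_value:.4f}")
```

Under this parameterization the tail probability falls below 0.01, which agrees with the significance level reported above; a different choice of trials or probability would give a different value.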
In this study, we found several AS-NAGNAG events that could generate premature termination codons (PTCs). The most interesting example was found in the mRNA encoding urease accessory protein D because this AS-NAGNAG event was conserved between Arabidopsis and rice (table 3). A second example occurred in the mRNA for geranyl diphosphate synthase, which we listed among the ANNCIs (ANNCI_22 in supplementary table 3, Supplementary Material online). In this case, 2 independent AS-NAGNAG events resulted in PTCs. Conserved AS-NAGNAG events and ANNCIs were expected to have potentially important roles, one of which is likely to be the creation of PTCs. This indicates that AS-NAGNAG may use a mechanism similar to "regulated unproductive splicing and translation" (RUST; Lewis et al. 2003). We previously found that AS events occurring in SR proteins in land plants can regulate mRNAs with this type of mechanism (Iida and Go 2006). Wang and Brendel (2006) also reported the importance of RUST in land plants. RUST seems to be a widespread regulatory mechanism in land plants, and AS-NAGNAG is likely the most economical way to generate PTCs.