Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries.

Full-length cDNA (FLcDNA) libraries consisting of 172,000 clones were constructed from a two-row malting barley cultivar (Hordeum vulgare 'Haruna Nijo') under normal and stressed conditions. After sequencing the clones from both ends and clustering the sequences, a total of 24,783 complete sequences were produced. By removing duplicates between these and publicly available sequences, 22,651 representative sequences were obtained: 17,773 were novel barley FLcDNAs, and 1,699 were barley specific. Highly conserved genes were found in the barley FLcDNA sequences for 721 of 881 rice (Oryza sativa) trait genes with 50% or greater identity. These FLcDNA resources from our Haruna Nijo cDNA libraries and the full-length sequences of representative clones will improve our understanding of the biological functions of genes in barley, which is the cereal crop with the fourth highest production in the world, and will provide a powerful tool for annotating the barley genome sequences that will become available in the near future.

In 2009, approximately 54 million hectares of barley (Hordeum vulgare) were cultivated, and harvests of this species produced approximately 150 million tons of grain worldwide, according to the Food and Agriculture Organization of the United Nations (http://faostat.fao.org/). Barley belongs to the Poaceae family (i.e. grasses), which also includes rice (Oryza sativa), maize (Zea mays), wheat (Triticum aestivum), and rye (Secale cereale; cross-pollinated). These members of the Poaceae, including barley, are the major cereal crops cultivated throughout the world. Barley is utilized for animal feed, malting, and as a human food source. Because barley is self-pollinated and has a diploid (2n = 14) genome, it is recognized as a genetic model of the Triticeae tribe within the Poaceae. Studies of the barley genome, transcriptome, and proteome are currently advancing our understanding of the molecular functions of agriculturally important barley genes (Sreenivasulu et al., 2008a, 2008b). Although barley has been extensively studied in terms of its genetics and breeding, molecular biology and genomics studies have been limited until recently because of the large genome size of this species (greater than 5 Gb, approximately 12 times that of rice; Varshney et al., 2007). To further promote barley genomics, a multinational collaboration, the International Barley Sequencing Consortium (IBSC), has been developed with the objective of obtaining the whole genome sequence of barley (http://barleygenome.org; Schulte et al., 2009).
The study of an organism's transcriptome (i.e. all transcribed sequences) is one of the most effective ways to investigate the structure and function of its active genes. In many plant species, transcript contigs have been constructed by assembling all the EST data available in PlantGDB, with the aim of identifying a data set of unique mRNA sequences and maximizing the information obtained for both protein-coding and noncoding regions in these sequences (Duvick et al., 2008). A large set of ESTs (501,620 from the vulgare subspecies and 24,161 from the spontaneum subspecies in the National Center for Biotechnology Information-dbEST release-100110 [www.ncbi.nlm.nih.gov/dbEST/] and 522,561 sequences in PlantGDB [www.plantgdb.org/]) has been accumulated in the public domain. These ESTs have been assembled into 23,595 clusters in National Center for Biotechnology Information's UniGene data set (http://www.ncbi.nlm.nih.gov/unigene) and into 134,482 PlantGDB-assembled unique transcripts (http://www.plantgdb.org/prj/ESTCluster/progress.php). It is estimated that approximately 75% of the genes in the barley genome have been captured (Sreenivasulu et al., 2008a). Although these assemblies provide researchers with a tremendous amount of information for understanding partial barley gene structures, the assembly of sequences from different barley varieties might result in erroneous contigs or lead to the fusion of unrelated transcripts via common domains or motif sequences. Moreover, these data might lead to an overestimation of the number of genes in barley, because these assemblies do not consider virtual connections between 5′ and 3′ end sequences where both sequences are derived from the same cDNA clone.
Capturing the transcripts of all active genes under defined (temporal, spatial, and stressed) conditions can provide a "snapshot" of living cells. The resultant data can be used to obtain quantitative (expression frequency) and qualitative (predicted protein function) information about these genes. For barley transcriptome analysis, a 22K Barley1 GeneChip probe array based on an EST database containing 350,000 sequences from 84 different RNA sources has been established (Close et al., 2004). The results of high-throughput transcript profiling of barley using this chip can be retrieved from PLEXdb (http://www.plexdb.org/) and ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/). Using several platforms for transcript profiling, researchers have described the responses of barley to conditions such as high salinity (Ueda et al., 2006; Walia et al., 2006), low temperature or freezing (Koo et al., 2008), and drought (Talamè et al., 2007; Tommasini et al., 2008) as well as during various stages, such as grain development (Sreenivasulu et al., 2006), germination (Sreenivasulu et al., 2008b), and growth (Druka et al., 2006), by means of coordinated up- or down-regulation of sets of genes.
Moreover, the construction of a comprehensive gene set for barley by genome sequencing is under way. However, following sequencing, genome annotation in the absence of information on exon-intron junctions is not an easy task, even for organisms for which complete genome sequences have been revealed. Although several gene prediction programs have been developed, the predicted gene structures might not always be correct, because these programs may select and connect incorrectly predicted exons (Bennetzen et al., 2004; Cruveiller et al., 2004; Jabbari et al., 2004). Hence, most gene annotation projects refer to EST or mRNA sequences for the accurate structural annotation of genes (Imanishi et al., 2004; Itoh et al., 2007).
Full-length cDNA (FLcDNA) is defined as the DNA complementary to an mRNA sequence that extends from the region near the 5′ cap structure to the poly(A) tail, which is a structure specific to eukaryotic mRNAs (Maruyama and Sugano, 1994; Suzuki et al., 1997; Carninci et al., 2000). FLcDNA technology has been applied to the genomes of humans (Ota et al., 2004), mice (Kawai et al., 2001), and several plant species, including Arabidopsis (Arabidopsis thaliana; Seki et al., 2002), O. sativa (Kikuchi et al., 2003), T. aestivum (Ogihara et al., 2004; Kawaura et al., 2009), soybean (Glycine max; Umezawa et al., 2008), Z. mays (Soderlund et al., 2009), and tomato (Solanum lycopersicum; Aoki et al., 2010). Recently, an FLcDNA library has been constructed for the Japanese malting barley variety Haruna Nijo (Sato et al., 2009b). From more than 45,000 cDNA clones, 5,006 nonoverlapping sequences were obtained. These sequences and clones have been made publicly available through the BarleyDB (http://www.shigen.nig.ac.jp/barley/), and they have proven effective in the identification of functional genes. However, this information is insufficient, because the number of obtained FLcDNAs is much lower than the number of expressed genes in the barley genome.
In this study, we constructed 12 FLcDNA libraries for Haruna Nijo from various organs or under different conditions to produce a comprehensive data set for this barley variety. After 5′ and 3′ end sequencing and clustering of approximately 172,000 cDNA clones, 24,783 complete FLcDNA sequences were retrieved. In conjunction with publicly available barley FLcDNAs, a total of 22,651 nonredundant barley FLcDNAs were obtained. These sequenced clones should provide comprehensive information about the barley gene repertory.

Barley FLcDNA Data Set
A total of 24,783 FLcDNAs were completely sequenced from the EST library (see Supplemental Information S1; Fig. 1). After eliminating contaminated sequences and possible fusions of unrelated transcripts, we obtained 23,614 barley FLcDNA sequences (henceforth, the National Institute of Agrobiological Sciences FLcDNAs, or NIAS FLcDNAs). These FLcDNAs were derived from the 12 cDNA libraries distinguished by library-specific tag sequences and an unknown group lacking the tag sequences (Supplemental Table S1). The number of FLcDNAs in the various libraries ranged from 857 in the library derived from flowers in the juvenile stage to 2,643 in the library derived from the plants treated with jasmonic acid. To characterize the data set, we compared the sequences with 4,999 Haruna Nijo FLcDNA sequences that had been published previously (henceforth, Public FLcDNAs; Sato et al., 2009b). We found that the NIAS FLcDNA inserts were clearly longer (1,701 ± 846 bp) than the Public FLcDNA inserts (1,455 ± 742 bp; Supplemental Table I). These results suggest that the qualities of the two FLcDNA data sets are quite similar in terms of capturing complete ORFs.
To evaluate the novel barley FLcDNAs revealed by our data, we compared the sequence similarity between the NIAS and Public FLcDNAs. We found that 21,586 of the 23,614 NIAS FLcDNAs showed no significant hits for any of the Public FLcDNAs. After redundant clones were removed, the remaining 17,773 FLcDNAs represented novel barley sequences (Fig. 1). Combined with the 4,878 nonredundant Public FLcDNAs, 22,651 representative barley FLcDNAs were constructed. We have designated these FLcDNAs "Uni-FLcDNAs" in this study and used this set for further comparative analyses. The average length of the Uni-FLcDNAs was 1,711 ± 863 bp.
Recently, many paired sense/antisense transcripts, referred to as natural antisense transcripts (NATs; Osato et al., 2003; Wang et al., 2005), have been identified. In Arabidopsis, 958 NAT pairs have been confirmed (Alexandrov et al., 2006), and in rice, 687 bidirectional transcription units have been discovered by means of FLcDNA mapping onto genome sequences (Osato et al., 2003). To test for the existence of NATs in the obtained FLcDNAs, we used BLASTN to screen for paired sense/antisense transcripts among all of the FLcDNAs, with a positive match criterion of greater than 95% identity and an E-value of less than 1 × 10⁻⁵. We determined that 2,051 FLcDNAs could form sense/antisense pairs. An example of a sense/antisense pair presented by ClustalX alignment software is shown in Supplemental Figure S2. NIASHv1002F20 and NIASHv1022F01 exhibit complete reverse homology, despite the fact that they both contain predicted ORFs. This suggests that our data set contains NATs, even for sequences that encode predicted proteins.
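The screening criterion described above can be illustrated with a minimal sketch. It assumes that the all-against-all BLASTN results are available in the standard 12-column tabular format, in which a hit on the minus strand is reported with a subject start coordinate greater than the subject end coordinate. The function name antisense_pairs and all alignment values in the example rows are hypothetical (only the two clone identifiers are taken from the text), and the sketch is not a description of the pipeline actually used in this study.

```python
def antisense_pairs(blast_rows, min_identity=95.0, max_evalue=1e-5):
    """Return unordered clone pairs whose BLASTN hit lies on the minus strand.

    Each row is one line of 12-column tabular BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    pairs = set()
    for row in blast_rows:
        f = row.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        identity = float(f[2])
        sstart, send = int(f[8]), int(f[9])
        evalue = float(f[10])
        if query == subject:
            continue  # skip self hits
        if identity <= min_identity or evalue >= max_evalue:
            continue  # thresholds from the text: >95% identity, E < 1e-5
        if sstart > send:  # minus-strand alignment, i.e. sense/antisense
            pairs.add(frozenset((query, subject)))
    return pairs


# Alignment values below are invented for illustration only.
rows = [
    "NIASHv1002F20\tNIASHv1022F01\t100.0\t1200\t0\t0\t1\t1200\t1200\t1\t0.0\t2200",
    "NIASHv1022F01\tNIASHv1002F20\t100.0\t1200\t0\t0\t1\t1200\t1200\t1\t0.0\t2200",
    "cloneA\tcloneB\t99.0\t800\t8\t0\t1\t800\t1\t800\t0.0\t1400",     # plus strand
    "cloneC\tcloneD\t90.0\t500\t50\t0\t1\t500\t500\t1\t1e-100\t600",  # identity too low
]
nat_pairs = antisense_pairs(rows)
print(sorted(sorted(p) for p in nat_pairs))
```

Storing each pair as a frozenset means that reciprocal hits (A against B and B against A) are counted once.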

Protein-Coding Genes
Based on homology searches and predictions of the longest ORF, 22,623 of 22,651 representative ORFs were identified; 19,212 of these ORFs (84.9%) were homologous to known functional genes according to the results of BLASTX searches against the RefSeq (Pruitt et al., 2009) and UniProtKB (UniProt Consortium, 2009) databases. The proportion of representative FLcDNAs that were deemed to contain complete ORFs was 85.4% (Table I), indicating that most of the FLcDNAs were associated with protein-coding genes.
To identify sequences in the Uni-FLcDNAs that were homologous to previously cloned rice trait genes, we employed 881 genes from Oryzabase (http://www.shigen.nig.ac.jp/rice/oryzabase/; Kurata and Yamazaki, 2006). Barley homologs of these genes could play important roles in barley traits (phenotypes). We searched for homologs of these sequences among the Uni-FLcDNAs using BLASTP (Supplemental Table S2). We found that 721 of the 881 genes (81.8%) had barley homologs with a high similarity (50% or greater identity). Although we currently do not know whether these homologs are orthologs, this result indicates that these trait genes are highly conserved between barley and rice. The detection of homologous sequences among the Uni-FLcDNAs could accelerate the comparative mapping of barley genes and the functional prediction of these homologous cDNAs to promote future barley genomics research.
We found 75 representative FLcDNAs that were longer than 5,000 bp, and some of these had the capacity to encode relatively long ORFs. For example, four of the 10 longest FLcDNAs encoded ORFs longer than 1,000 amino acids. In contrast, three of the 10 encoded ORFs shorter than 100 amino acids (Table II).
We estimated that there were 28 noncoding FLcDNAs among the Uni-FLcDNAs (see "Materials and Methods"). Although these noncoding FLcDNAs could have been derived from microRNAs, a BLASTN search against rice microRNAs revealed no significant homologies. Moreover, as 26 of the 28 noncoding FLcDNAs exhibited no homologies with four fully sequenced grass genomes (O. sativa, Z. mays, Sorghum bicolor, and Brachypodium distachyon), they might represent poorly conserved noncoding RNAs.
An analysis using InterProScan revealed that 16,859 of the 22,623 representative ORFs (74.5%) contained conserved domains, and 12,595 ORFs (55.7%) were assigned Gene Ontology (GO) terms (Barrell et al., 2009). Using GO2slim, we found that "binding" (GO:0005488) and "catalytic activity" (GO:0003824) were the most common second-level GO terms found in the data (Supplemental Fig. S3). The distribution of GO terms related to the barley Uni-FLcDNAs was quite similar to that seen in representative Rice Annotation Project (RAP) data (Tanaka et al., 2008).

Comparison of Barley FLcDNAs with the Triticeae Transcriptome
We described 17,773 novel barley nonredundant FLcDNAs (Fig. 1). To evaluate the amount of novel gene information in these sequences, homology searches of the FLcDNAs were conducted against sequences from publicly available transcripts. For this purpose, 6,625 mRNAs and 525,559 ESTs from barley and 3,433 mRNAs and 1,107,168 ESTs from wheat were downloaded from DDBJ/EMBL/GenBank.
A BLASTN search showed that 3,278 of the representative FLcDNAs had no homologous sequences. The average insert length of these novel FLcDNAs (1,602 ± 921 bp) was slightly shorter than that of the known FLcDNAs (1,729 ± 851 bp; Fig. 2). Characterization of protein functions using the second-level GO terms showed a similar distribution of terms among the novel and known FLcDNAs, except for "structural molecule activity" (GO:0005198; Fig. 3; 2.4% in the "known" gene set and 5.1% in the "novel" gene set), indicating that the molecular functions of the novel FLcDNAs were distributed similarly to those of the known FLcDNAs. The novel transcripts were predicted to code for many different kinds of essential proteins, such as proteins involved in protein synthesis, including ribosomal proteins and elongation factors, transcription factors, cytochromes, and stress-related proteins, including jasmonic acid-induced protein, CBFII-5.1, heat shock protein, and nucleotide-binding site-Leu-rich repeat disease resistance protein homolog (Supplemental Table S3). Hence, we anticipate that these novel transcripts will support the construction of an essential gene network for barley.
Of the 3,278 novel FLcDNAs, 1,974 were conserved in all four sequenced grass genomes (i.e. O. sativa, Z. mays, S. bicolor, and B. distachyon; International Rice Genome Sequencing Project, 2005; Paterson et al., 2009; Schnable et al., 2009; International Brachypodium Initiative, 2010) according to the results of BLASTN searches, but 942 had no grass homolog (Supplemental Table S4). We conducted BLASTP searches of the amino acid sequences derived from the novel ORFs against predicted protein sequences from the four species and found that 1,731 ORFs were homologous to genes annotated in at least one species. These results suggest that the molecular functions of approximately two-thirds of the novel barley FLcDNAs could be predicted by comparison with annotated genes in major grass species.

Detection of Barley-Specific FLcDNAs
To elucidate whether the Uni-FLcDNAs contained gene structures that are conserved among crop plants, BLASTN searches were conducted against the four complete genome sequences. The results showed that 1,699 FLcDNAs (specific FLcDNAs) were not associated with homologous regions in the four genomes, whereas 20,952 FLcDNAs (common FLcDNAs) had a homologous region in at least one of the genomes (Supplemental Table S5). Most FLcDNAs (19,778) were conserved in all four species, but more barley ORFs were homologous to ORFs in O. sativa than in B. distachyon, which shares a more recent common ancestor with barley than rice does. This result might have been caused by differences in the methodology used to sequence O. sativa and B. distachyon and the resulting quality of the reference genome sequences. The number of conserved genes detected in Z. mays was less than half the number detected in S. bicolor, even though the evolutionary distances of barley from Z. mays and S. bicolor are similar. The length of the inserts in the barley-specific FLcDNAs was shorter (1,301 ± 897 bp) than in the other FLcDNAs (1,745 ± 851 bp). Of the specific FLcDNAs, 263 contained known functional domains based on InterProScan analysis and 25 were noncoding transcripts; thus, 85% of the specific FLcDNAs were not assigned to any putative function. GO data suggested that the relative proportions of the genes related to "signal transducer activity" (GO:0004871) and "enzyme regulator activity" (GO:0030234) were greater in the specific FLcDNAs than in the total representative FLcDNAs (data not shown). This suggests that barley exhibits species-specific regulatory networks involved in signal transduction, transcription, and metabolism.

Comparison of FLcDNA Sequences with Published BAC Sequences of Barley
To evaluate the barley FLcDNA data set in terms of its ability to capture a large number of active FLcDNAs, we mapped all of the FLcDNAs on publicly available barley bacterial artificial chromosome (BAC) sequences. Taketa et al. (2008) reported a Haruna BAC contig (AP009567) in which two genes were predicted. BAG12385.1 encodes a putative iron deficiency-specific 4 protein, and BAG12386.1 is a putative ethylene-responsive transcription factor responsible for the nud (hull-less) phenotype. Even though the public FLcDNAs could not be mapped to the predicted loci, multiple FLcDNAs determined in this study could be mapped at each locus, which suggests that the structures of these genes are accurate (Supplemental Fig. S4). We also mapped the FLcDNAs on BACs of a different cultivar, Morex. Haruna Nijo is an established Japanese two-row malting cultivar, whereas Morex is a six-row cultivar that is used as the American malting industry's standard cultivar. Because Morex will provide the first reference sequence for the barley genome, there have been increasing numbers of submissions of Morex sequences in the DDBJ (www.ddbj.nig.ac.jp/). To estimate the conservation of gene sequences between these cultivars, we mapped the FLcDNAs to publicly available Morex BAC sequences using EST2genome (Mott, 1997). We found that 373 FLcDNAs mapped to genomic sequences from 113 of 181 BACs. Forty-five FLcDNAs (31 representatives) mapped to more than one locus. For example, NIASHv1123K08 was mapped to four loci in three BACs (one locus each of AY758233.1 and DQ249273.1 and two loci of AC239053.1). These genes were highly conserved (99.4% identity on average; Fig. 4), and 61 of them were completely identical. The results of this analysis indicate that the Haruna Nijo FLcDNAs can be utilized by IBSC to annotate the Morex genome.

DISCUSSION
Scientists have recently recognized that FLcDNAs are indispensable resources for gene identification and annotation. As the barley genome sequencing project progresses, obtaining information about barley FLcDNAs has become increasingly important. There is a considerable amount of Triticeae (barley and wheat) transcript data in the public domain; however, the number of barley FLcDNAs is insufficient compared with the number available for other crops, such as O. sativa, Z. mays, and G. max (Kikuchi et al., 2003; Umezawa et al., 2008; Soderlund et al., 2009).
The sequences developed in this study represent the largest collection of Triticeae FLcDNA data produced to date. To comprehensively cover the entire barley transcriptome with FLcDNA clones, we constructed novel libraries from Haruna Nijo from 12 different developmental stages, organs, and/or stressed conditions and isolated more than 170,000 clones (Supplemental Information S1). To identify the source of clones in the pooled library constructions, tagged sublibraries were used.
The ESTs we identified were clustered, and representative cDNA clones were selected for full sequencing. The quality of the FLcDNAs obtained in this study was evaluated by comparing their average sequence length (1,701 bp) with that of publicly available barley and rice FLcDNAs. A recent report in which the average length of 27,455 maize FLcDNAs was found to be 1,442 bp (Soderlund et al., 2009) also supports the length quality of the FLcDNAs in this study. The method of FLcDNA construction used here was the cap-trapper method (Carninci et al., 1996), which has been used in FLcDNA projects in mice, Arabidopsis (Seki et al., 2002), rice (Kikuchi et al., 2003), and soybean (Umezawa et al., 2008). Because this methodology is based on the selection of mRNAs with an existing cap structure, it provides high coverage of mRNA structures.
The 22,651 representative barley FLcDNAs described here were constructed using the clones produced in this study and publicly available barley FLcDNAs. Functional categorization of the Haruna Nijo FLcDNAs based on ORF prediction and GO assignments produced results similar to those found for rice FLcDNAs. This could indicate conservation of the entire gene sets between barley and rice. We also detected 28 noncoding genes. Only four of the 28 noncoding transcripts were included in the NAT pairs, suggesting that NATs are not the sole reason underlying the observed noncoding transcripts.
Comparative analysis of the Uni-FLcDNAs against public data indicated the importance of this data set.
First, 17,773 FLcDNAs were found to be novel in barley, and 3,278 cDNAs from the Uni-FLcDNAs exhibited no homology to public EST or mRNA data from barley or wheat. We consider it unlikely that this paucity of homologous sequences resulted from natural variation among Haruna Nijo and other barley varieties, because 1,974 of the 3,278 novel FLcDNAs had common homologs in all four grass species examined (i.e. O. sativa, Z. mays, S. bicolor, and B. distachyon). As full-length sequences are available for these four grass species, these 1,974 genes might be structurally conserved and functionally active in grasses in general. These findings should help us to assign gene functions to the novel barley FLcDNAs.
Second, we detected 1,699 FLcDNAs that showed no homology to any of the four grass genomes. Even though the mean insert length of these FLcDNAs was shorter than that of the other cDNAs, these 1,699 FLcDNAs still have the capacity to encode functional proteins. Therefore, we concluded that these genes are Triticeae-specific (or at least Hordeum-specific) sequences. Unfortunately, only 263 specific FLcDNAs could be assigned putative functions based on InterPro domains, and the functions of the other genes are unknown. The complete gene structure information and the FLcDNA clones of the barley-specific genes can be used in future experimental studies, such as for overexpression of recombinant proteins or microarray analyses, to reveal the functions of these genes. We note that the 20,952 FLcDNAs with homologs in at least one of the four grass genomes might still include some barley-specific genes, because the coding potential of their mapped regions has not been verified; additionally, there were cases where FLcDNAs mapped to nongenic regions in the latest annotated genome data.
The barley cv Haruna Nijo, which was used here as the source of the cDNA libraries generated, was released as a malting barley variety in 1981 and has been intensively used in the pedigree of Japanese malting barleys because of its excellent quality profiles for brewing. Additionally, several genetic/genomic resources have been established for this cultivar. More than 140,000 ESTs have been sequenced, and the positions of more than 2,890 of these have been located in genetic maps (Sato et al., 2009a). A BAC library has been constructed (Saisho et al., 2007), and some of these BAC clones have been beneficial for map-based cloning of trait genes in barley (Taketa et al., 2008  genome is approximately 100,000. This estimation is much larger than an estimation presented in a previous report (Mayer et al., 2009) of 38,000 to 48,000 genes in the barley genome (i.e. 104 to 132 kb per gene on average). It is likely that we have overestimated the number of genes, because the BAC clones used for genome sequencing come from gene-rich regions targeted for the purpose of map-based cloning or the investigation of genic regions. Based on the previously estimated gene number (Mayer et al., 2009), our Haruna Nijo data set of 22,651 nonredundant FLcDNAs could represent 47% to 59% of the total number of genes present in barley. The fact that 54% to 70% of the rice genes predicted by RAP have been validated by rice FLcDNAs indicates that our data set is similarly comprehensive. Not all genes present in barley are expected to have been expressed in the 12 conditions examined in this study, so we suggest that the proportion of active genes captured could be much greater than 47% to 59%.
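The per-gene spacing and coverage figures quoted above from the earlier gene-number estimate can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a genome size of exactly 5 Gb (the text states only "greater than 5 Gb"), and the function name is hypothetical.

```python
def gene_density_and_coverage(genome_bp, n_genes, n_flcdnas):
    """Return (kb of genome per gene, percent of genes represented by FLcDNAs)."""
    kb_per_gene = genome_bp / n_genes / 1_000
    percent_captured = 100 * n_flcdnas / n_genes
    return kb_per_gene, percent_captured


GENOME_SIZE_BP = 5_000_000_000   # assumption: exactly 5 Gb
UNI_FLCDNAS = 22_651             # nonredundant FLcDNAs reported in this study

for genes in (38_000, 48_000):   # range attributed to Mayer et al. (2009)
    kb, pct = gene_density_and_coverage(GENOME_SIZE_BP, genes, UNI_FLCDNAS)
    print(f"{genes} genes: {kb:.0f} kb per gene, {pct:.1f}% represented")
```

Under these assumptions the spacing works out to roughly 104 to 132 kb per gene, and the share of genes represented to roughly 47% to just under 60%, in line with the range given in the text.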

CONCLUSION
We cloned more than 170,000 Haruna Nijo barley FLcDNAs and identified more than 24,000 complete FLcDNAs. The final set of 22,651 representative FLcDNAs obtained will be a very useful resource for future studies of barley as well as for the annotation of barley genomic sequences in the ongoing IBSC genome sequencing project. These data will also support the future wheat genome sequencing project currently being organized by the International Wheat Genome Sequencing Consortium (http://www.wheatgenome.org/).

Full-Sequencing of Representative Clones
For each representative clone determined after clustering the end sequences of clones, both the 5′ and 3′ ends were resequenced using the Big Dye Terminator version 3.1 Cycle Sequencing Kit and then analyzed with an ABI 3730xl DNA sequencer (Applied Biosystems) to confirm the clone sequence identity. Next, the internal regions were sequenced using a primer-walking method until the sequences from opposite directions overlapped. We designed the sequencing strategy so that at least two reads covered every part of the insert region of each cDNA clone to ensure sequence quality. Finally, we assembled these reads to create a consensus sequence for each clone.

Removing Redundancy
All-against-all BLASTN searches (Altschul et al., 1990) were conducted among the barley (Hordeum vulgare) FLcDNA sequences to detect redundant sequences. When an FLcDNA was similar to another FLcDNA with 90% or greater identity and 95% or greater coverage, the FLcDNAs were clustered. If two FLcDNAs that did not overlap were in the same cluster, we considered that a fused transcript that bridged the two FLcDNAs was contained in this cluster. In this case, the cDNA was discarded and the cluster was separated into two clusters.
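The clustering rule described above can be sketched as single-linkage grouping with a union-find structure. This is a minimal illustration rather than the procedure actually used: the input is assumed to be a list of precomputed (query, subject, percent identity, percent coverage) tuples, how coverage is defined is left to whatever produced those tuples, the clone names are hypothetical, and the additional step of discarding fused transcripts and splitting their clusters is not implemented.

```python
def cluster_flcdnas(ids, hits, min_identity=90.0, min_coverage=95.0):
    """Single-linkage clustering of sequence IDs from pairwise similarity hits."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, coverage in hits:
        if query == subject:
            continue  # ignore self hits
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(query)] = find(subject)  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical clones and hit values, for illustration only.
ids = ["c1", "c2", "c3", "c4"]
hits = [
    ("c1", "c2", 99.2, 98.0),
    ("c2", "c3", 95.0, 96.5),
    ("c3", "c4", 99.0, 60.0),  # coverage below threshold: not merged
]
clusters = cluster_flcdnas(ids, hits)
print(clusters)
```

Single linkage means that c1 and c3 end up in the same cluster through c2 even without a direct hit between them, which is also how a fused transcript could bridge two otherwise unrelated FLcDNAs.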

Prediction of ORFs and Functional Annotation
Representative ORFs were predicted by BLASTX searches against the RefSeq and UniProtKB databases with a positive match cutoff set at an E value of less than 1 × 10⁻²⁰. If there were no homologous proteins in the databases, the longest ORFs (more than 70 amino acids in length) were assigned as predicted ORFs. If no ORFs were predicted for an FLcDNA, it was defined as a noncoding FLcDNA. Noncoding FLcDNAs were compared with rice microRNAs downloaded from miRBase (http://www.mirbase.org/; Griffiths-Jones et al., 2008) using BLASTN. To assign a gene function on the basis of conserved domains or motifs, InterProScan searches (Zdobnov and Apweiler, 2001; Hunter et al., 2009) were conducted using the predicted ORFs. Based on the results of the InterProScan searches, GO categories were assigned to each predicted ORF (Barrell et al., 2009). The GO assignments were categorized using map2slim software (http://www.geneontology.org/GO.slims.shtml#whatIs).
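The fallback rule for sequences without a protein database hit (longest ORF of more than 70 amino acids, otherwise noncoding) can be illustrated as follows. This is a sketch under stated assumptions, not the software used in the study: an ORF is taken to run from an ATG to the first in-frame stop codon, ORFs lacking a stop codon are ignored, the initiator Met is counted in the length, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_aa(seq):
    """Length (amino acids) of the longest ATG-to-stop ORF over all six frames."""
    seq = seq.upper()
    best = 0
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_complement):
        for frame in range(3):
            start = None
            for pos in range(frame, len(strand) - 2, 3):
                codon = strand[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos                      # open a new ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (pos - start) // 3)
                    start = None                     # close it at the stop codon
    return best


def is_noncoding(seq, has_protein_hit, min_orf_aa=70):
    """Noncoding = no database hit and no ORF longer than min_orf_aa residues."""
    return not has_protein_hit and longest_orf_aa(seq) <= min_orf_aa


# Synthetic sequences, for illustration only.
long_orf = "ATG" + "GCT" * 80 + "TAA"    # Met plus 80 Ala codons, then a stop
short_orf = "ATG" + "GCT" * 10 + "TAA"
print(longest_orf_aa(long_orf), is_noncoding(long_orf, has_protein_hit=False))
print(longest_orf_aa(short_orf), is_noncoding(short_orf, has_protein_hit=False))
```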

Comparison of FLcDNA Sequences with Complete Genome Sequences from Other Crop Species
To conduct genome-wide comparisons of the FLcDNA sequences, genome sequences and annotation data from four completely sequenced crop plants were obtained at the following sites: the Rice Annotation Project Database (RAP-DB; http://rapdb.dna.affrc.go.jp/) for Oryza sativa; the MaizeSequence database (http://www.maizesequence.org/index.html) for Zea mays; the JGI (http://genome.jgi-psf.org/Sorbi1/Sorbi1.home.html) for Sorghum bicolor; and BrachyBase (http://www.brachybase.org/) for Brachypodium distachyon. The FLcDNA sequences were mapped onto the genome sequences using BLASTN search at an E value of less than 1 × 10⁻⁵.

Detection of Novel Barley FLcDNAs
mRNAs and ESTs from barley and wheat (Triticum aestivum) were downloaded from the DDBJ. The poly(A) sequences were then trimmed, and repetitive regions were masked using RepeatMasker software (http://repeatmasker.org). We conducted BLASTN searches of the sequences against all barley FLcDNAs with the threshold of a positive match set at an identity of 90% or greater and a coverage of 90% or greater. Because the data set of barley FLcDNAs contained redundant sequences, multiple hits for the query mRNA or EST sequences were accepted.
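The novelty criterion above amounts to flagging every FLcDNA that receives at least one qualifying hit from a public transcript and reporting the remainder as novel. The sketch below assumes that hits are supplied as (public sequence, FLcDNA, percent identity, percent coverage) tuples with coverage already computed; the identifiers and the function name are hypothetical.

```python
def novel_flcdnas(flcdna_ids, hits, min_identity=90.0, min_coverage=90.0):
    """Return FLcDNAs that no public mRNA or EST matches at the given thresholds."""
    known = {
        flcdna
        for _public_id, flcdna, identity, coverage in hits
        if identity >= min_identity and coverage >= min_coverage
    }
    # One public sequence may mark several redundant FLcDNAs as known.
    return [f for f in flcdna_ids if f not in known]


# Hypothetical identifiers and values, for illustration only.
flcdnas = ["fl1", "fl2", "fl3"]
public_hits = [
    ("est1", "fl1", 98.5, 97.0),
    ("est1", "fl2", 97.0, 92.0),   # the same EST hits a redundant FLcDNA
    ("est2", "fl3", 99.0, 40.0),   # coverage too low to count
]
novel = novel_flcdnas(flcdnas, public_hits)
print(novel)
```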

Comparison with Rice Trait Genes
A list of rice trait genes was downloaded from Oryzabase (http://www.shigen.nig.ac.jp/rice/oryzabase/genes/genesTop.jsp). Of 4,124 trait genes, 881 genes had RAP-DB transcript information. The amino acid sequences of these genes were used as queries for BLASTP analyses with the predicted ORFs of the Uni-FLcDNAs.

Comparison with Barley BAC Sequences
To compare the FLcDNAs with BAC sequences, one Haruna Nijo BAC sequence (Taketa et al., 2008) and 181 Morex BAC sequences were downloaded from the DDBJ/EMBL/GenBank. The Morex BACs were downloaded using the keywords "Morex" and "BAC." After masking repeat sequences using RepeatMasker and the libraries of MIPS Repeat Element Database 4.3 (http://mips.helmholtz-muenchen.de/plant/genomes.jsp; Spannagl et al., 2007) and Triticeae Repeat Sequence Database release 10 (http://wheat.pw.usda.gov/ITMI/Repeats/), the FLcDNAs were mapped onto the BACs using the same method used for RAP annotation (Tanaka et al., 2008). The FLcDNAs were first mapped using BLASTN to determine the possible mapping regions in the genomic sequences and further mapped using EST2genome (Mott, 1997) with thresholds of 95% or greater identity and 90% or greater coverage. When two or more FLcDNAs were mapped to the same locus, the longest FLcDNA was used for further analysis. All of the sequence data for the representative FLcDNAs have been submitted to the DDBJ (accession nos. AK353559-AK377172).
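The rule of keeping the longest FLcDNA when several map to the same locus can be sketched as an interval-grouping step. The text does not define "same locus" operationally, so the sketch assumes that mappings on the same BAC whose coordinate ranges overlap belong to one locus; the tuple layout, identifiers, and function name are likewise hypothetical.

```python
def longest_per_locus(mappings):
    """Pick the longest FLcDNA at each locus.

    mappings: iterable of (flcdna_id, bac_id, start, end, flcdna_length).
    Assumption: overlapping intervals on the same BAC form one locus.
    """
    by_bac = {}
    for m in mappings:
        by_bac.setdefault(m[1], []).append(m)

    selected = []
    for bac in sorted(by_bac):
        group = sorted(by_bac[bac], key=lambda m: m[2])  # order by start coordinate
        locus = [group[0]]
        locus_end = group[0][3]
        for m in group[1:]:
            if m[2] <= locus_end:            # overlaps the current locus
                locus.append(m)
                locus_end = max(locus_end, m[3])
            else:                            # gap: close the locus, start a new one
                selected.append(max(locus, key=lambda x: x[4])[0])
                locus = [m]
                locus_end = m[3]
        selected.append(max(locus, key=lambda x: x[4])[0])
    return selected


# Hypothetical mappings, for illustration only.
mappings = [
    ("f1", "bacA", 100, 2000, 1500),
    ("f2", "bacA", 150, 2600, 2100),   # overlaps f1 and is longer
    ("f3", "bacA", 9000, 9800, 700),   # separate locus on the same BAC
    ("f4", "bacB", 50, 900, 800),
]
representatives = longest_per_locus(mappings)
print(representatives)
```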

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Distribution of insert lengths of FLcDNAs.
Supplemental Figure S4. Annotation of a BAC sequence using barley FLcDNAs.
Supplemental Table S1. Conditions used for sample preparation for library construction.
Supplemental Table S2. Homologous sequences for rice trait genes in the Uni-FLcDNA library.
Supplemental Table S4. Distribution of homologs of novel barley FLcDNAs in four grass species.
Supplemental Table S5. Distribution of homologs of representative barley FLcDNAs in four grass species.
Supplemental Information S1. Construction and evaluation of the barley FLcDNA libraries.