Multiple Models for Rosaceae Genomics

The plant family Rosaceae consists of over 100 genera and 3,000 species that include many important fruit, nut, ornamental, and wood crops. Members of this family provide high-value nutritional foods and contribute desirable aesthetic and industrial products. Most rosaceous crops have been enhanced by human intervention through sexual hybridization, asexual propagation, and genetic improvement since ancient times, 4,000 to 5,000 B.C. Modern breeding programs have contributed to the selection and release of numerous cultivars having significant economic impact on the U.S. and world markets. In recent years, the Rosaceae community, both in the United States and internationally, has benefited from newfound organization and collaboration that have hastened progress in developing genetic and genomic resources for representative crops such as apple (Malus spp.), peach (Prunus spp.)

cherry subgenera evolved in Asia Minor, particularly Iran, Iraq, and Syria (Vavilov, 1951), whereas both peach and apricot have a Chinese center of origin. Pots recovered from cave dwellings in Europe contained cherry pits dated from 5,000 to 4,000 B.C. (Marshall, 1954).
During ancient times, rosaceous fruits populated native human habitats, and fruits from these species were valuable sources of food. Subsequent centuries of selection and domestication produced the dramatic increase in fleshy fruit size that distinguishes today's commercial fruit from their wild relatives.
The economic importance of edible rosaceous crops derives from their flavorful fruits and nuts that provide unique contributions to dietary choices of consumers and overall human health. Rosaceous fruits are consumed in multiple forms, including fresh, dried, juice, and processed products. The variety of flavors, textures, and levels of sweetness and acidity offered by these fruits satisfies diverse consumer tastes and choices. Rosaceae fruits are also a major human dietary source of phytochemicals, such as flavonoids and other phenolic compounds, cyanogenic glucosides, phytoestrogens (Mazur et al., 2000), and phenols that could potentially yield health and disease-fighting advantages (Macheix et al., 1991; Swanson, 1998; Selmar, 1999). L-Ascorbic acid, quercetin, kaempferol, myricetin, p-coumaric acid, gallic acid, and ellagic acid are well-known antioxidants and/or cancer-inhibiting compounds that have been identified in these fruits.
Epidemiological evidence suggests that diets rich in fruit and vegetables significantly reduce cancer risk (Caragay, 1992; Ziegler et al., 1996). In vitro and in vivo studies with animal models provide evidence that fruit and leaf extracts from many Rosaceae species inhibit some cancers or have strong antioxidant activities (Yau et al., 2002). Ellagic acid, present in strawberry, red raspberry, arctic bramble, cloudberry, and other rosaceous berries (Mandal and Stoner, 1990; Häkkinen et al., 1998; Masuda et al., 1999; Häkkinen and Törrönen, 2000; Harris et al., 2001; Cordenunsi et al., 2002), has been shown to affect cell proliferation and apoptosis, suggesting a potential anticancer role. Some strawberry cultivars have the highest concentrations of L-ascorbic acid among all fruits (Haffner and Vestrheim, 1997). Other phenolic compounds belonging to distinct chemical classes, many of which are also potential antioxidants and anticancer agents (Eichholzer et al., 2001; Schieber et al., 2001), have been isolated from rosaceous fruits (Macheix et al., 1991).
growing human population. However, as malnutrition and infectious and nutrition-related diseases remain common, the research focus has begun to shift to include improving the dietary value of different foods and identifying novel compounds with pharmacological properties (Kishore and Shewmaker, 1999). Traditionally, new traits for improvement of crop plants have been primarily derived from their related wild species. However, genomics and bioinformatics advances over the past decade have provided new options for identifying useful compounds in plants and for manipulating the genes responsible for their production.
The profitable production of desirable fruits and nuts can be challenging, as rosaceous crop producers must balance stringent quality expectations with yield requirements, cost efficiency, and shipping/marketing constraints. A further challenge is to maintain crop quality following harvest while avoiding loss due to chilling injury disorders, decay, chemical contamination, and overripening. Thus, postharvest and consumer-valued qualities and traits serve as ideal targets for crop product improvement. Complementary to the inescapable economic realities of crop production and distribution, consumer perception of quality characteristics deserves recognition as a primary driving force for cultivar development and establishment of research priorities. Consumers make quality judgments based on sensory perception (including smell, taste, and appearance) as well as perceived nutritional value. The flavor quality of fruits and nuts is determined by many criteria, including firmness, juiciness, sweetness, acidity, aroma, and texture.
Recent consumer interests in purchasing locally produced fruits and vegetables have highlighted the importance of rosaceous fruits and invigorated their production in areas having relative proximity to large population centers. Rosaceous crop products are valued precisely because of their nutritional and aesthetic qualities, thereby offering opportunities for varietal improvement aimed at both problem-solving and market expansion. The impressive gains achieved by conventional breeding notwithstanding, the translation of structural and functional genomics research into breeding tools and broadened research perspectives offers unprecedented opportunities for rapid and sustainable enhancement of value to all stakeholders.

ROSACEAE PHYLOGENY AND DIVERSITY DICTATE THE NEED FOR MULTIPLE MODELS
The Rosaceae family has been traditionally divided into four subfamilies grouped by fruit type. These include: Rosoideae (Rosa, Fragaria, Potentilla, and Rubus; fruit, achene; x = 7, 8, or 9), Prunoideae (Prunus; fruit, drupe; x = 8), Spiraeoideae (Spiraea; fruit, follicle or capsule; x = 9), and Maloideae (Malus, Pyrus, and Cotoneaster; fruit, pome; x = 17; Potter et al., 2002). Recent phylogenetic analyses of combined sequence data from six nuclear and four chloroplast loci provided strong support for the division of Rosaceae into three subfamilies (Potter et al., 2007; Fig. 2). These include Dryadoideae (including Cercocarpus, Dryas, and Purshia; x = 9), Rosoideae (including Fragaria, Potentilla, Rosa, Rubus, and others; x = 7), and Spiraeoideae (including Kerria, Spiraea, and others; x = 8, 9, 15, or 17). Each of the latter two subfamilies has been further divided into supertribes, tribes, and subtribes. With less emphasis on fruit type, which is clearly homoplasious in the family (Fig. 2), the taxa formerly included within Maloideae and Prunoideae have now been reclassified into Spiraeoideae, with Malus placed in the tribe Pyreae and Prunus in the tribe Amygdaleae (Potter et al., 2007). The pome-bearing members of the family, such as Malus and Pyrus, as well as closely related genera that bear polypyrenous drupes (e.g. Crataegus), have been classified in subtribe Pyrinae, which corresponds to the former subfamily Maloideae.
Patterns of diversification within the Rosaceae, combined with the need for rapid translation of genomics research into agronomic practice, suggest that multiple rosaceous species must be designated as reference models for acquisition of genomic information. Currently, the best-developed model species for Rosaceae include apple, peach (Prunus persica), and diploid strawberry (Fragaria vesca; Table I). There is no widely accepted model for Dryadoideae, and because there are no major commercial crops within this subfamily, it will not be a focus of this report. Each of the three designated models represents a distant subtaxon within Rosaceae, and each has unique features making it well suited for targeted genomic research. Combined efforts toward developing integrated genetic and genomic resources in three diverse and representative species will advance genomic research in all rosaceous species.

APPLE
Apple, a pome fruit in which the floral receptacle is the fleshy edible tissue, is the most important deciduous tree fruit crop grown in the United States and around the world. The genus Malus has more than 25 species and hybrids; most apple cultivars are diploid (2n = 2x = 34), self-incompatible, and have a juvenile period of 3 to 7 years (Korban and Chen, 1992; Janick et al., 1996).
The apple genome, at 1.54 pg DNA/2C nucleus or 750 Mb per haploid genome complement, is approximately the same size as that of tomato (Solanum lycopersicum; Tatum et al., 2005). The apple molecular map currently has more than 900 markers on 17 linkage groups (Maliepaard et al., 1998; Liebhard et al., 2002, 2003a, 2003b; Xu and Korban, 2002b; Naik et al., 2006). Several efforts are under way to develop a substantial resource of molecular markers for genotyping and marker-assisted selection (MAS; Tartarini et al., 2000; Liebhard et al., 2002, 2003a, 2003b; Silfverberg-Dilworth et al., 2006). Transgenic apple has been obtained via Agrobacterium-mediated transformation of cultivars (Ko et al., 1998; Bolar et al., 2000; Defilippi et al., 2004; Teo et al., 2006), leading to promising transgenic lines with enhanced resistance to serious orchard diseases such as apple scab (Venturia inaequalis) and fire blight (Erwinia amylovora; Ko et al., 2000; Bolar et al., 2001) and lines with suppressed ethylene and volatile esters in the fruit (Defilippi et al., 2004, 2005). Antisense RNA technology was used to induce posttranscriptional gene silencing, and double-stranded RNA-mediated posttranscriptional gene silencing was used to engineer resistance to crown gall (Escobar et al., 2003). Over 300,000 apple ESTs have been generated from different tissues, conditions, and genotypes (Newcomb et al., 2006). Over 22,600 EST clones from nine apple cDNA libraries, subjected to temperature or drought stress, were clustered and annotated, then used to study gene expression in response to abiotic stress. Park et al. (2006) analyzed apple EST sequences available in public databases using statistical algorithms to identify genes associated with flavor and aroma components in mature fruit. Sung et al. (1998) generated ESTs from young fruit, peels of mature fruit, and carpels of the Fuji apple and found that most ESTs isolated from young fruit were preferentially expressed in reproductive organs, suggesting their important roles during reproductive organ development.
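The relationship between the two genome-size figures quoted above (1.54 pg DNA per 2C nucleus and about 750 Mb per haploid genome) can be checked with a short calculation. The sketch below assumes the commonly used conversion of roughly 978 Mb per picogram of DNA; that factor and the function name are illustrative assumptions and do not come from the text.

```python
# Assumed conversion factor: 1 pg of double-stranded DNA is roughly 978 Mb.
# This is a widely used approximation, not a value stated in the text.
MB_PER_PG = 978.0


def haploid_size_mb(pg_dna: float, copies: int = 1) -> float:
    """Convert a DNA mass in pg to a haploid (1C) genome size in Mb.

    `copies` is the C-value multiple the mass refers to (2 for a 2C nucleus).
    """
    return pg_dna / copies * MB_PER_PG


# Apple: 1.54 pg per 2C nucleus, as quoted in the text.
apple_mb = haploid_size_mb(1.54, copies=2)
print(f"apple: {apple_mb:.0f} Mb per haploid genome")
```

Under this assumption the apple value comes out near 753 Mb, in line with the rounded figure of 750 Mb.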
Two bacterial artificial chromosome (BAC) libraries were constructed (Xu et al., 2001; Xu and Korban, 2002a) and used for map-based positional cloning of a cluster of receptor-like genes within the scab resistance Vf locus in apple (Xu and Korban, 2002a). Other apple BAC libraries (from both fruiting cultivars and rootstocks) were constructed and are being used for gene cloning and mapping (Vinatzer et al., 1998, 2001). Recently, a scaffold physical map was constructed using BAC fingerprinting. A complete genome sequence of the apple is under way, and it will be completed by mid-2008 (R. Velasco, personal communication).
Figure 2. [...] Potter et al. (2007) and modified from their figure 4. Positions of several economically important genera are indicated, and a hypothesis for the evolution of fruit types, based on parsimony optimization of that character, is represented by the different colors of the branches. The three subfamilies recognized by Potter et al. (2007) are shown on the right margin. Dry, Dryadoideae. The tree was rooted with representatives of the family Rhamnaceae (Rh).

PEACH
Prunus has characteristic stone fruits or drupes in which seeds are encased in a hard, lignified endocarp (the stone), and the edible portion is a juicy mesocarp. Agriculturally important stone fruit species include P. persica (peach, nectarine), Prunus domestica (European or prune plum), Prunus salicina (Japanese plum), Prunus cerasus (sour cherry), Prunus avium (sweet cherry), Prunus armeniaca (apricot), and Prunus amygdalus (almond). The peach karyotype (x = 8) has a clearly identifiable, large submetacentric chromosome and seven smaller chromosomes, two of which are acrocentric (Jelenkovic and Harrington, 1972; Salesses and Mouras, 1977). Little is known about the chromosomal location and organization of gene sequences in peach, but recent fluorescence in situ hybridization in the closely related almond (Prunus dulcis) has enabled detection of individual chromosomes based on length and position of ribosomal DNA genes (Corredor et al., 2004). Peach chromosomal organization is probably similar to that of other Amygdalus species, as crosses between peach and those closely related species (Prunus ferganensis, Prunus mira, Prunus davidiana, Prunus kansuensis, and the cultivated almond) are possible and produce fertile hybrids. Crosses between peach and species of other subgenera (Prunophora and Cerasus), such as apricot (P. armeniaca), myrobalan plum (Prunus cerasifera), European plum (P. domestica), and Japanese plum (P. salicina), are also possible, but fertile hybrids are only occasionally produced (Scorza and Sherman, 1996). However, as predicted by their divergent evolutionary origin, sweet and sour cherry are considered the commercially important Prunus fruit species most distant from peach.
All commercial peach cultivars belong to P. persica, including nectarines, which differ from peach by the absence of pubescence on the fruit surface. This characteristic segregates as a simple trait, and it is presumably controlled by a single gene or a few closely linked genes. Unlike most congeneric species that exhibit gametophytic self-incompatibility, the peach is self-compatible. Breeding cultivars through self-pollination (Miller et al., 1989) combined with recent genetic bottlenecks (Scorza et al., 1985) have resulted in lower genetic variability in peach as compared with other Prunus crops (Byrne, 1990). Peach is the genetic and genomic reference species for Prunus because of its high economic value, self-compatibility allowing for development of F2 progenies, availability of homozygous doubled haploids (Pooler and Scorza, 1997), and possibility of shortening its juvenile period to 1 to 2 years following planting (Scorza and Sherman, 1996). However, one drawback is that peach transformation remains inefficient.
Various Prunus maps connected with anchor markers can be found at the Genome Database for Rosaceae (GDR; see below), including more than 2,000 markers, the backbone of which is the reference Prunus map. This consensus map has been constructed with an interspecific almond × peach F2 population (Texas × Earlygold) and currently has 827 transportable markers covering a total distance of 524 cM (average density of 0.63 cM/marker; Howad et al., 2005). Twenty-eight

STRAWBERRY
The cultivated strawberry with its accessory fruit comprised of a fleshy receptacle bearing many achenes (the "true" fruit) is genomically complex due to its octoploid (2n = 8x = 56) composition (Davis et al., 2007). Currently, 23 species are recognized in the genus Fragaria, including 13 diploids (2n = 2x = 14), four tetraploids, one hexaploid, and four octoploids (Folta and Davis, 2006). Accessions of the cultivars' two octoploid progenitor species, Fragaria chiloensis and Fragaria virginiana, are valued by strawberry breeders as sources of novel traits, especially pest resistance and abiotic stress tolerance. Because strawberry is a relatively new crop, dating to the 1700s (Darrow, 1966), as few as three introgressive backcrosses can yield selections of cultivar quality (J. Hancock, personal communication).
The current, provisional model of the octoploid strawberry genome constitution is AAA'A'BBB'B' (Bringhurst, 1990). This suggests that the octoploids are diploidized allopolyploids that have descended from four diploid ancestors. Although the diploid ancestry of the octoploid has yet to be definitively established, the leading ancestral candidate diploids are F. vesca and Fragaria iinumae, with Fragaria bucharica and Fragaria mandshurica as additional intriguing possibilities (Folta and Davis, 2006).
One ancestral diploid strawberry, F. vesca, has emerged as an attractive system for both structural and functional genomics due to its many favorable features (Folta and Davis, 2006). Plants are compact enough to be grown on a small scale in a laboratory, easy to propagate vegetatively, self-compatible, and produce many seeds per plant (Darrow, 1966; Folta and Davis, 2006).
An efficient transformation protocol renders F. vesca amenable to genetic manipulation (El Mansouri et al., 1996; Haymes and Davis, 1998; Alsheikh et al., 2002; Oosumi et al., 2006). The diploid model complements the development of transformation systems for the octoploid cultivated strawberry (Folta and Davis, 2006; Folta and Dhingra, 2006; Mezzetti and Costantini, 2006), allowing direct ingress of transgenes into breeding populations. The availability of robust transformation and reverse genetic systems in strawberry provides excellent resources for the broader Rosaceae genomics community by uniquely complementing the abundant DNA sequence resources available in apple and peach.
Diploid strawberry has several additional advantages over other widely used model species. Unlike the dry siliques of Arabidopsis, the fleshy strawberry fruit allows for studying the molecular mechanisms of fruit development and ripening. Furthermore, strawberry fruit has several unique properties, including nonclimacteric ripening and unusual metabolites that cannot be studied in traditional fruit models, such as tomato.
Recent evidence, based on nucleotide sequence data, has revealed a close phylogenetic relationship between Fragaria and Rosa (Potter et al., 2007), thus suggesting that genomic resources for either genus may be readily applicable to the other. Genomics resources for roses include a BAC library for Rosa rugosa consisting of 27,264 clones and corresponding to 5.2× haploid genome equivalents of the 0.5 pg or 500 Mb genome (Kaufmann et al., 2003). This BAC library has been used to develop a fine-scale physical map of the Rdr1 region and to assemble a 400-kb tiling path of six BAC clones covering the Rdr1 locus, which confers resistance to blackspot. Genes involved in ethylene perception, due to their role in flower initiation and senescence, and floral scent, among the most valuable traits in roses, have been of particular interest in rose research. Molecular studies have been conducted to understand ethylene perception and signal transduction pathways in rose by taking advantage of knowledge obtained from alterations in the triple response in ethylene-insensitive mutants in Arabidopsis (Lavid et al., 2002). Wang et al. (2004) isolated seven putative 1-aminocyclopropane-1-carboxylate synthase clones from a cDNA library of aging rose petals of Rosa hybrida 'Kardinal.' cDNA libraries from petals of the two tetraploid R. hybrida 'Fragrant Cloud' and 'Golden Gate' have been constructed and over 3,000 cDNA clones sequenced from these libraries. An annotated petal EST database of approximately 2,100 unique genes has been generated and used to create a microarray. Scalliet et al. (2006) explored the evolutionary pathway of scent production in European (Rosa gallica officinalis and Rosa damascena) and Chinese rose (Rosa chinensis and Rosa gigantea) species, both progenitors of the modern hybrid rose, R. hybrida. As genomic resources continue to develop for strawberry and rose, their applicability to each other and other Rosaceae will be more clearly defined.
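The figures quoted for the Rosa rugosa BAC library (27,264 clones, 5.2 haploid genome equivalents, a genome of about 500 Mb) are linked by the usual library-coverage relationship, coverage = clones × mean insert size / genome size. The sketch below works backward to the mean insert size these figures imply and applies the Clarke-Carbon approximation for the chance that a given locus is present in the library. The insert size is derived here and is not stated in the text; the function names are hypothetical.

```python
import math


def implied_insert_kb(n_clones: int, coverage: float, genome_mb: float) -> float:
    """Mean insert size (kb) implied by clone count, coverage, and genome size."""
    return coverage * genome_mb * 1000.0 / n_clones


def prob_locus_represented(coverage: float) -> float:
    """Clarke-Carbon approximation: P = 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


# Values quoted in the text for the Rosa rugosa BAC library.
insert_kb = implied_insert_kb(27_264, 5.2, 500.0)
p_hit = prob_locus_represented(5.2)

print(f"implied mean insert: {insert_kb:.1f} kb")
print(f"probability a given locus is in the library: {p_hit:.4f}")
```

Under these assumptions the implied mean insert is about 95 kb, and a library at this depth would be expected to contain any given single-copy locus with a probability above 99%.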

ROSACEAE GENOMICS
The Rosaceae genomics initiative, like those of other crop groups, exploits and extends the fundamental knowledge generated in the Arabidopsis model system, which has dramatically advanced our understanding of the physiology, biochemistry, genetics, and evolution of plants. Arabidopsis is characterized by its small genome, ease of transformation, rapid regeneration time, and sexual fecundity. As extensive forward and reverse genetic resources are available, this renders Arabidopsis an agile system for exploring fundamental biological questions. The genomic information generated directly from Arabidopsis has proven to translate well to other Brassicaceae. However, Arabidopsis shares only partial functional overlaps with many crop plants in more phylogenetically remote plant families. Here, other suitable research-friendly species have been established as family-specific models, including Medicago truncatula and Lotus japonicus for Fabaceae, rice (Oryza sativa) and Brachypodium for Poaceae, and tomato and potato (Solanum tuberosum) for Solanaceae. For families with valuable crop species, genomics has progressed by selecting a representative model species for the family and focusing research efforts on developing genomic tools for that system. Findings are then translated directly to other related species of agricultural importance.
In recent years, the application of molecular technologies has steadily increased for Rosaceae (Hokanson, 2001). Concerted efforts have been undertaken by the Rosaceae community to develop genomics tools for this economically important family. Coordination within the community led to the establishment of a dedicated bioinformatics portal (GDR) and the production of a national Rosaceae white paper as well as a U.S. Department of Agriculture (USDA) request for proposals that eventually funded 13 new projects to advance Rosaceae genomics (www.rosaceaewhitepaper.com). As these projects have unfolded, the need for developing multiple, subfamily-specific model systems within the Rosaceae has become apparent, focusing on peach (Prunus, Amygdaloideae), apple (Malus, Maloideae), and strawberry (Fragaria, Rosoideae) and their relative community strengths in the areas, respectively, of structural genomics, functional genomics, and reverse genetics. Initial translational genomic efforts have been directed toward developing high-density molecular linkage maps to accelerate breeding through MAS (Dirlewanger et al., 2004b).

Marker-Assisted Breeding
Markers tightly linked to major genes for important traits, such as disease and pest resistances, fruit or nut quality, and self-incompatibility, have been developed in apple (Bus et al., 2000; Tartarini et al., 2000; Huaracha et al., 2004; Tahmatsidou et al., 2006; Gardiner et al., 2007), peach, and strawberry (T. Sjulin, personal communication), and are currently being used for MAS in both public and private breeding programs (Dirlewanger et al., 2004b).
Most rosaceous crops are characterized by long generation cycles and large plant size; hence, the ability to eliminate undesirable progenies in breeding populations through MAS reduces cost and allows breeders to focus on populations comprised of individuals carrying desirable alleles of genes of interest. In peach, Etienne et al. (2002) and Quilot et al. (2004) mapped several major genes and QTLs related to fruit quality in intra- and interspecific peach progenies. Blenda et al. (2006) identified an amplified fragment length polymorphic (AFLP) marker significantly associated with peach tree short life, a complex disease syndrome. This identification could lead to a MAS application in peach. Lalli et al. (2005) generated a resistance map for Prunus with 42 map locations for putative resistance regions distributed among seven of eight linkage groups, and both Lu et al. (1998) and Dirlewanger et al. (2004a) identified markers linked to genes determining root knot nematode resistance. Selection for self-compatibility alleles can be performed in Prunus self-incompatible species using markers based on genes coding for this character (Habu et al., 2006; López et al., 2006). In strawberry, a series of markers has been developed for traits of economic importance, namely, for anthracnose resistance (Sugimoto et al., 2005). In addition, a sequence-characterized amplified region (SCAR) marker, SCAR-R1A, has been linked to locus Rpf1, which confers resistance to a race of Phytophthora fragariae var. fragariae, which is the causal organism of red stele root rot disease in cultivated octoploid strawberry (Haymes et al., 2000). The development of additional markers for MAS is dependent on the ability to perform genome scans of a progeny from a population segregating for a trait of interest and then validating the trait-marker association(s) in alternate populations.
However, in many rosaceous species, a genome scan is currently not possible due to a lack of evenly distributed markers that are polymorphic in the respective breeding population. Thus, marker development remains a top priority in most, if not all, rosaceous crops.

Physical Maps
Physical maps are important resources for positional cloning, marker development, and QTL mapping and cloning. They serve as useful platforms for whole-genome sequencing efforts. Physical maps are available for several Rosaceae species, including peach and apple. The current physical map of peach is estimated to cover 287.1 Mb of the approximate peach genome size of 290 Mb (Baird et al., 1994), and it is comprised of 1,899 contigs, 239 of which are anchored to eight linkage groups of the Prunus reference map. The peach map is a second-generation map, expanded by high-information-content fingerprinting of additional BAC library clones from the existing BAC libraries of peach. A total of 2,511 markers have been incorporated into the physical framework. Due to the abundance of hybridization data, the initial physical map for peach was biased toward expressed genome regions. According to a conservative estimate, the map covers the entire euchromatic peach genome. This map is publicly available at www.rosaceae.org.
A BAC-based physical map of the apple genome from 74,281 BAC clones representing approximately 10.5× haploid equivalents has been constructed, and a physical map of F. vesca is under construction using a large-insert BAC library (V. Shulaev and A.G. Abbott, personal communication).

Genome Sequencing
Full genome sequencing is an invaluable tool for plant researchers. The Rosaceae community will benefit greatly from sequencing genomes of the representative models for each of the three subfamilies. At present, the largest publicly available genomic sequence resource in Rosaceae is that of F. vesca, from which approximately 1.75 Mb of genomic sequence derived from 50 genomic sites (as fosmid clones) has been deposited in GenBank under accession numbers EU024823 to EU024872. Preliminary analysis of these sequences has revealed a gene density of about one gene per 6 kb and a simple sequence repeat (SSR) density of about one SSR locus per 5 kb (Folta and Davis, 2006). The available Fragaria genomic sequence database will provide a valuable resource in relation to assembly and annotation of other rosaceous genomes.
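To make the density estimates above concrete, the following sketch converts them into expected feature counts for the 1.75 Mb of F. vesca sequence sampled at 50 sites. It uses only the figures quoted in the paragraph; the function name is hypothetical, and the results are simple averages rather than counts reported in the source.

```python
def expected_features(sequence_kb: float, kb_per_feature: float) -> float:
    """Expected number of features in a sequence at a given average spacing."""
    return sequence_kb / kb_per_feature


sampled_kb = 1.75 * 1000.0  # 1.75 Mb of fosmid-derived genomic sequence
n_sites = 50                # genomic sites sampled

mean_site_kb = sampled_kb / n_sites                      # average length per site
genes_in_sample = expected_features(sampled_kb, 6.0)     # one gene per ~6 kb
ssrs_in_sample = expected_features(sampled_kb, 5.0)      # one SSR locus per ~5 kb

print(f"mean sequence per site: {mean_site_kb:.0f} kb")
print(f"expected genes in sample: {genes_in_sample:.0f}")
print(f"expected SSR loci in sample: {ssrs_in_sample:.0f}")
```

On these figures each site contributes about 35 kb on average, and the sample as a whole would be expected to contain roughly 290 genes and 350 SSR loci.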
Peach is a prime candidate for complete sequencing among Rosaceae models, because it has a comparatively small genome and close structural genomic relationship to many other important fruit crops (Dirlewanger et al., 2004b). The full genome sequence of peach will provide information on upstream regions of genes (promoters and enhancers) that are important for gene regulation, aid in identifying genes not discovered through EST sequencing projects, and provide a reference genome for comparative genomics in Rosaceae. The combination of the intrinsic advantages of peach as a model organism, the advent and application of high-throughput technologies for genetic and genomic analyses, and ongoing research collaborations is generating a wealth of genomic information. An initiative to sequence the peach genome was announced by the Joint Genome Institute at the 2007

FUNCTIONAL GENOMICS
Complete sequencing of several plant genomes, including Arabidopsis and rice, has identified tens of thousands of genes. About one-half are of unknown function. A major challenge for genomics is to identify the functions of unassigned genes and use this knowledge to improve economically important agronomic and quality traits in major crops.

ESTs
EST sequencing projects are under way for several rosaceous species. ESTs enable gene and marker discovery, aid genome annotation and gene structure identification, guide single nucleotide polymorphism characterization, and aid proteome analysis (Nagaraj et al., 2007). Over 400,000 ESTs are currently available for Rosaceae species via the National Center for Biotechnology Information (NCBI) dbEST and the GDR (www.rosaceae.org). With National Science Foundation (award no. 0321701; Schuyler Korban, principal investigator) funding, over 104,000 5′ end apple EST sequences have been generated, and a 40,000 unigene set has been identified. Another approximately 150,000 5′ end apple ESTs have been deposited in the NCBI by HortResearch in New Zealand (Newcomb et al., 2006). The combined publicly available apple ESTs add up to over 260,000 sequences, resulting in a 17,000 unigene set (NCBI, apple).
EST sequencing projects in peach are being conducted in several laboratories worldwide. Many of these projects have focused on fruit development, so other tissues are underrepresented in the database. All publicly available ESTs for peach have been downloaded from NCBI and housed in GDR. There are currently approximately 87,751 ESTs with a unigene set comprised of 23,721 different sequences. These ESTs have come predominantly from laboratories of the Prunus Genome Mapping consortium, supported by the USDA Initiative for Future Agricultural and Food Systems program (Albert Abbott), ESTree (Carlo Pozzi), and the Chile Consortium (Ariel Orellano).
Approximately 50,000 EST sequences from F. vesca and several thousand from Fragaria ×ananassa have been deposited in GenBank by several contributing laboratories. From F. vesca, cDNA libraries have sampled cold-, heat-, and salt-stressed seedlings (J.P. Slovin and P.D. Rabinowitz, unpublished data), and flower buds (R.L. Brese and T.M. Davis, unpublished data). These sequences have proven invaluable for annotation of Fragaria genomic sequences and as a source of candidate gene and random clones for use in reverse genetics investigations. Nevertheless, when EST sequence resources are considered, development of libraries and strawberry sequencing has lagged behind other major fruit crops and warrants further investment in genomic resource development. Cultivated strawberry represents a potential diverse source of alleles that may be directly relevant to selected traits of interest. Furthermore, an accounting of alleles may further illuminate the ancient diploid contributors to the octoploid strawberry genome.
The GDR development team has assembled and analyzed genera-specific unigenes for approximately 370,000 Rosaceae ESTs available at the GDR Web site (Jung et al., 2008). The analysis includes identification of putative functions by pairwise comparisons against protein datasets, with SSR and single nucleotide polymorphism detection. Sequence comparison with 14 of the major plant EST datasets suggests that up to 14% of the Rosaceae unigenes may be unique to the family. Many of these genes are being functionally characterized in strawberry (K. Folta, personal communication). The availability of full genome sequence is expected to verify these numbers and provide a more refined gene catalog of Rosaceae species.

Reverse and Forward Genetics
Two fundamentally different approaches are currently being used to elucidate gene function. A forward genetics approach begins with the observation of a unique phenotype and seeks to identify the gene(s) responsible for that phenotype. Conversely, reverse genetics begins with a candidate gene and describes the mutant phenotypes that result upon its disruption. Despite the fact that small genomes render forward genetics a potentially useful strategy for gene discovery, currently there are few forward-genetic populations available for rosaceous crops.
One approach to generating mutant phenotypes in plants is to inactivate a gene by inserting foreign DNA via Agrobacterium-mediated transformation (Krysan et al., 1999). This approach can be used for both forward and reverse genetics. An advantage of using insertional mutagenesis to disrupt gene function is that all mutations are tagged by a T-DNA sequence, enabling rapid direct identification of the gene related to the observed phenotype. In plants with completely sequenced genomes, identification of a short sequence of genomic DNA flanking the T-DNA insertion allows unambiguous identification of the mutated gene and linkage of the insertion to a chromosomal location. For plants that have not been completely sequenced, molecular tools such as EST collections and BAC libraries may allow the successful cloning and characterizing of a gene of interest.
Whereas T-DNA mutants may produce loss-of-function phenotypes, comparable studies have utilized a gain-of-function approach through activation tagging (Kardailsky et al., 1999; Weigel et al., 2000). Here, a set of transcriptional enhancers (typically from the cauliflower mosaic virus 35S promoter/enhancer) is integrated into the genome using Agrobacterium-mediated transformation. Upon generally random integration, the enhancers greatly increase the basal transcription rates of nearby promoters. This approach is particularly useful in identifying gene function from members of multigene families where an effective knockout strategy is precluded by genetic redundancy.
F. vesca is an excellent candidate to generate a collection of T-DNA or transposon insertion or activation tagged mutants. With its small genome, F. vesca will require a minimal number of independent mutant lines to cover the complete genome. Based on the approximately 200-Mb genome size of F. vesca and assuming random T-DNA integration and an average gene length of 2 kb, we calculate that 255,000 or 391,000 independent T-DNA transformed lines must be produced to mutate any single gene with 95% or 99% probability, respectively (Krysan et al., 1999). Gene content and spacing in Fragaria are similar to those of Arabidopsis (K.M. Folta and T.M. Davis, unpublished data), suggesting that these goals are realistic. A collection composed of both T-DNA insertional mutants and AcDs activation tag lines is currently under development, with over 1,000 lines independently regenerated; these are now being characterized at Virginia Tech. Fewer activation tagged lines will be needed, because one tag may affect several kilobase pairs of gene space. Sequencing the flanking regions adjacent to T-DNA integrations will provide resources for reverse genetics and many markers for future mapping efforts, while revealing genes that affect important traits and interesting phenotypes.
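The line-count estimate above can be explored with a short calculation. The following is a minimal sketch, assuming the simple random-insertion model P = 1 - (1 - x/G)^n, where x is the target gene length, G the genome size, and n the number of independent insertions; the function name and the exact input values are illustrative assumptions rather than the parameters used by the authors. With inputs of exactly 200 Mb and 2 kb this model returns values on the order of 300,000 and 460,000 lines, so the figures quoted in the text presumably rest on somewhat different parameter values (for example, a smaller effective genome size or a longer average target).

```python
import math


def lines_needed(genome_bp: float, target_bp: float, probability: float) -> int:
    """Independent random insertions needed to hit a target at least once.

    Solves P = 1 - (1 - x/G)**n for n, assuming each insertion lands
    uniformly at random in the genome (an idealization).
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0.0 < target_bp < genome_bp:
        raise ValueError("target must be shorter than the genome")
    miss_per_insertion = 1.0 - target_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(miss_per_insertion))


if __name__ == "__main__":
    # Illustrative inputs: 200-Mb genome, 2-kb average gene (assumed values).
    for p in (0.95, 0.99):
        print(f"P = {p:.2f}: {lines_needed(200_000_000, 2_000, p):,} lines")
```

Because the required number of lines scales almost linearly with G/x, halving the genome size or doubling the effective target (as with activation tags that influence several kilobase pairs) roughly halves the population needed.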

RNA Interference
Gleave et al. (2008) computationally analyzed approximately 120,000 M. ×domestica 'Royal Gala' ESTs and identified 10 distinct sequences that could be classified as representatives of seven conserved plant microRNA (miRNA) families; secondary structure predictions revealed that these sequences possess the characteristic fold-back structures of precursor miRNAs. Three of these miRNAs were expressed constitutively in all tissues tested, whereas others showed more restricted patterns of expression. A candidate gene approach coupled with RNA interference (RNAi) silencing is being used to determine the function of apple ESTs in resistance to the bacterial fire blight disease (Norelli et al., 2007). Bioinformatics is used to identify publicly available apple ESTs either uniquely associated with fire blight-infected apple or similar to Arabidopsis ESTs associated with Pseudomonas syringae pv. tomato infection. An efficient transformation system for apple has been developed in which only a single EST-silencing gene is inserted per transgenic line to select for apple RNAi mutants. The system uses a multi-vector transformation approach and PCR primers developed for pHellsgate8-derived vectors that detect the presence of single or multiple EST-silencing insertions in an RNAi transgenic line and provide a sequencing template for determining the EST contained in the silencing insert. Additional candidate ESTs are currently being selected based upon cDNA suppression subtractive hybridization and cDNA-AFLP analyses (Norelli et al., 2007).
Neither identification of a mutant phenotype nor identification of a gene sequence alone will explain the molecular function of a gene. Modern functional genomics relies heavily on high throughput profiling methodologies like gene expression profiling, proteomics, metabolomics, and bioinformatics for detailed gene characterization. Most importantly, readily available transformation systems provide means for validating gene function in vivo.

Transcriptomics
Apple has the largest collection of ESTs and microarrays and will provide the foundation for rosaceous microarray and gene expression analyses. Several apple microarrays are currently available or under construction on different platforms, including Nimblegen, Invitrogen, and Affymetrix. Pichler et al. (2007) have demonstrated that field microarray experiments provide a viable approach for measuring seasonal changes in gene expression during apple bud development. Teamed with transgenic plants, custom apple arrays have also revealed that ethylene controls substantial production of flavor-associated volatiles, and other studies have investigated early fruit development (Newcomb et al., 2006; Lee et al., 2007). Some of the most fruitful early plant microarray experiments have been performed in strawberry. Using custom arrays, transcripts associated with ripening (Schwab et al., 2001), stress- and auxin-induced gene expression, volatile production (Aharoni et al., 2004), fruit development, and flavor (Aharoni et al., 2000) have been identified. These seminal studies represent well-designed and highly productive approaches that answered fundamental questions in plant biology. More importantly, they suggest that contemporary studies with larger probe sets may lead to important discoveries for a multitude of plant responses.

Proteomics
Proteomics studies in the family are limited to date. A proteomic approach was applied to study flesh browning in stored Conference pears (Pedreschi et al., 2007) and to identify novel isoforms of Pru av 1, the major cherry allergen in fruit (Reuter et al., 2005). Other studies using proteomic approaches to investigate the roles of dehydrins, among others, in cold temperature stress responses of peach and apple are ongoing (Renaut et al., 2006, 2008).

Metabolomics
Rosaceous plants are extremely rich in specialized metabolites, many of which have documented utility in human health and nutrition. Chemistry and biosynthetic pathways of flavonoids, anthocyanins, and phenolics have been extensively studied in numerous rosaceous fruits with targeted analytical assays. Recent developments in metabolomics allow for global analysis and interrogation of metabolic networks. An untargeted metabolomics approach based on Fourier transform ion cyclotron resonance mass spectrometry was used to study four consecutive stages of strawberry fruit development and identified novel information on the metabolic transition from immature to ripe fruit. Metabolomic profiles of apple fruit revealed changes provoked by UV-white light irradiation and cold storage in primary and secondary pathways associated with ethylene synthesis, acid metabolism, flavonoid pigment synthesis, and fruit texture (Rudell et al., 2008). Broader use of metabolomics in Rosaceae will speed the discovery of novel gene functions in primary and secondary metabolism and will provide comprehensive data sets necessary to model metabolic networks related to human health-promoting metabolites.

High-Throughput Transformation and Gene Function Validation
High-throughput profiling platforms and bioinformatics data mining generate numerous biological hypotheses about candidate gene functions that must be tested in subsequent experiments. These follow-up experiments rely upon the availability of a robust transformation protocol. Transformation systems have been developed in many rosaceous crops, including apple, peach, rose, and strawberry. Using degenerate primers designed from conserved regions of transcription factor genes, Ban et al. (2007) identified a transcription factor from apple skin responsible for red coloration; its function was verified in transient transformation of apple seedlings as well as in transgenic tobacco (Nicotiana tabacum). A similar candidate gene approach has been used to identify a transcription factor responsible for red flesh and foliage color in apple (Espley et al., 2007).

ROSACEAE COMPARATIVE GENOMICS
A phylogenetic view provides a refined basis for the inference of homology, orthology, and functional conservation with well-studied plant species. Common markers between maps constructed from intra- and interspecific populations of Prunus allow for comparisons among genomes of seven species, including peach, almond, apricot, cherry, P. cerasifera, P. davidiana, and P. ferganensis. All comparisons are essentially syntenic and colinear (Dirlewanger et al., 2004b), strongly suggesting that the Prunus genome is similar within the genus, in agreement with the patterns of frequent intercrossability and interspecific hybrid fertility known between many species of this genus. Only two major chromosomal rearrangements, both reciprocal translocations, have been reported. One was reported in the F2 of a cross between Garfi almond and the red leaf peach rootstock Nemared and involved linkage groups 6 and 8 (G6 and G8; Jauregui et al., 2001). This translocation was also found in the cross Akame × Juseitou, where Akame is also a red-leafed genotype. The other reciprocal translocation was found in a genotype of myrobalan plum (P. cerasifera) and affected G3 and G5 (Lambert et al., 2004). These results suggested that reciprocal translocations might occur and were maintained within a given Prunus species characterizing monophyletic transects (i.e. red-leafed peaches), although the majority of individuals within a species were within the general Prunus genome configuration.
Pear maps constructed by Dondini et al. (2004), Pierantoni et al. (2004), and Yamamoto et al. (2002) have sufficient SSRs to enable alignment of all pear and apple linkage groups. Map comparisons suggest that genome organization is conserved between apple and pear. A comparison between apple and peach genomes (Fig. 3) is possible based on 30 common loci (24 restriction fragment length polymorphisms [RFLPs] and six isozymes) found between the Texas × Earlygold Prunus map and the Prima × Fiesta Malus map (Dirlewanger et al., 2004a). Prunus linkage groups 3 and 4 (G3 and G4) have three or more markers in common with the Malus map; each Prunus group has detected a pair of homoeologous apple groups with the same marker order. Each of two additional Prunus groups (G2 and G7) has two markers in common with one Malus linkage group, and G1 has eight markers in common with Malus. G1 is the longest and most populated linkage group with markers in Texas × Earlygold and most other Prunus maps, suggesting that it corresponds with chromosome 1. Malus does not have a similar long chromosome. Four markers found in the upper part of Prunus G1 correspond to two homoeologous groups of Malus, and the four remaining markers map to a different Malus linkage group. These results indicate that the long Prunus chromosome may be split into two in Malus or in the original species that formed the Malus amphidiploid.
Results of the diploid strawberry map of F. vesca × F. nubicola indirectly suggest that both genomes are highly conserved, as markers mapped on their F2 generate the expected seven linkage groups. Findings by Sargent et al. (2006) indicate that linkage groups found in the octoploid strawberry map can be aligned into seven homoeologous groups, each with four linkage groups. Each homoeologous group corresponds to one of the diploid Fragaria linkage groups. A study between diploid Fragaria and tetraploid Rosa suggests that synteny conservation is high, but chromosomal rearrangements may have occurred between these two genera (markers from two Rosa linkage groups are located in one Fragaria linkage group; M. Rousseau, personal communication).
A comparison between the diploid strawberry and peach genomes based on more than 10 markers per Fragaria linkage group is currently in progress. Results suggest that these two distant genomes still conserve synteny but have undergone reshuffling since their divergence from a common ancestor. Synteny is revealed by considering the eight Prunus linkage groups and noting that five contain most or all markers from only one group of Fragaria and the other three include markers of two groups. Thus, reshuffling must have occurred to a large extent with various fission/fusion, translocation, and inversion events.
The Arabidopsis sequence has been compared with the map positions of: (1) RFLPs mapped in Texas × Earlygold, most of them detected with probes based on Rosaceae or Arabidopsis EST sequences (Dominguez et al., 2003); (2) peach ESTs anchored to the Prunus consensus map (Jung et al., 2006); (3) peach ESTs physically located in Prunus BAC contigs (Jung et al., 2006); (4) complete sequence of one and partial sequences of two peach BAC clones (Georgi et al., 2003); and (5) complete sequence of a single myrobalan plum (P. cerasifera) BAC clone (Claverie et al., 2004b).
In all cases, syntenic regions can be found, but these are limited to small DNA fragments. This suggests an overall view of partial genome conservation. The largest Prunus-Arabidopsis conserved region identified by Dominguez et al. (2003) comprised 25 cM in LG2 of Prunus corresponding to 5.4 Mb of Arabidopsis chromosome 5. These results suggest that comparative mapping for gene discovery outside of the family may not be feasible, underscoring the importance of complete genomic characterization of representative species in the family.
A novel marker concept based upon "gene pair" polymorphisms provides a promising approach to comparative genomics (Davis et al., 2008). This concept exploits the small genome sizes and the concomitantly small intergenic distances characteristic of rosaceous species. As demonstrated in F. vesca, the commonality of small (2-5 kb) intergenic distances facilitates the amplification of gene pair PCR products in which a forward primer is targeted to conserved coding sequence in one gene and the reverse primer is similarly targeted in an adjacent gene. By mining the abundance of polymorphism in a respective intergenic region, gene pair PCR products yield conveniently detectable polymorphisms of multiple types, enabling the respective loci to be utilized as robust anchor loci for comparative mapping in the various rosaceous species (Davis et al., 2008).

ROSACEAE BIOINFORMATICS AND DATA INTEGRATION
Several important bioinformatics resources have been developed for various rosaceous crops over the last few years. Sequence databases like ESTree db (www.itb.cnr.it/estree/) with EST information for peach (Lazzari et al., 2005), the F. vesca EST database (FVdbEST, http://www.vbi.vt.edu/~estap), and a candidate gene database (http://www.bioinfo.wsu.edu/gdr/projects/prunus/abbott/PP_LEa/) and transcript map for peach (Horn et al., 2005) provide important repositories for Rosaceae sequence information and have advanced genomics research in the family.
Figure 3. … (Joobeur et al., 1998) and Malus (Maliepaard et al., 1998). Prunus linkage groups are noted with G and Malus groups with L followed by a number. Only linkage groups with two or more common markers have been included. Markers in parentheses in Prunus correspond to approximate positions deduced from other maps. Incomplete linkage groups are indicated by two parallel oblique lines.
"Plant Databases: A Needs Assessment White Paper" stressed that access to curated, integrated, and searchable genome databases is essential to fully leverage the wealth of genetic, genomic, and molecular data generated worldwide by plant genome projects. Such data integration gateways have been created for many model species (e.g. The Arabidopsis Information Resource [Rhee et al., 2003]; Oryzabase: An Integrated Biological and Genome Information Database for Rice [Kurata and Yamazaki, 2006]; and MaizeGDB, the community database for maize genetics and genomics data [Lawrence et al., 2007]) and for plant families (e.g. Gramene, A Resource for Comparative Mapping in the Grasses [Jaiswal et al., 2006]; SGN, the Solanaceae Genomics Network; and LIS, the Legume Information System [Gonzales et al., 2005]).
The creation in 2003 of the federally funded GDR (www.bioinfo.wsu.edu/gdr) was an important step toward integrating all Rosaceae structural and functional genomics initiatives (Jung et al., 2004, 2008). Initiated in response to the growing availability of genomic data for peach, GDR now contains comprehensive data for the genetically anchored peach physical map, Rosaceae maps, markers and traits, and all publicly available Rosaceae ESTs. The ESTs are assembled to produce unigene sets of each genus and the entire Rosaceae. Other annotations include putative function, microsatellites, open reading frames, single nucleotide polymorphisms, plant and gene ontology terms, and anchored map position where applicable. Most of the published Rosaceae genetic maps can be viewed and compared through CMap (Ware et al., 2002), the comparative map viewer. The peach physical map can be viewed using WebFPC/WebChrom and also through an integrated GDR map viewer, which provides a graphical interface to the combined genetic, transcriptome, and physical mapping information categorized by taxonomy, tissue type, putative function, and gene ontology. The GDR datasets are available to download or search in batch mode using Web-based BLAST (Altschul et al., 1990) or FASTA (Pearson and Lipman, 1988) servers. Other Web-based tools include a sequence assembly server using CAP3 (Huang and Madan, 1999) and an SSR server for identifying microsatellites and primers in sequence data sets.
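To illustrate the kind of microsatellite identification mentioned above, the following is a minimal sketch of SSR detection in a nucleotide sequence using regular expressions. It is not the algorithm used by the GDR SSR server; the function names, the motif lengths (2 to 4 nt), and the minimum of five tandem repeats are hypothetical choices made only for illustration.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_repeats: int = 5, motif_lengths=(2, 3, 4)):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions; real SSR tools typically
    use motif-length-specific cutoffs and may tolerate imperfect repeats.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # reported, if at all, under its shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 6 + "CCG"  # made-up sequence for demonstration
    for start, motif, count in find_ssrs(example):
        print(f"({motif}){count} at position {start}")
```

Restricting hits to primitive motifs avoids reporting the same locus twice, for example once as (AT)n and again as (ATAT)n/2.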
The Rosaceae research community consistently uses GDR as a portal for disseminating and mining data, coordinating collaborations and workgroups, and sharing important news and publications. Usage figures show 218,409 visits and 2,070,880 pages accessed between August 1, 2006 and July 31, 2007. GDR will continue to curate, integrate, and visualize new Rosaceae genomics and genetics data as they become available. These will include genetic maps, markers and traits, phenotype and genotype data, the apple physical map, microarray data, and genome sequences from apple, peach, and strawberry.

ROSACEAE COMMUNITY EFFORT AND THE FUTURE OF ROSACEAE GENOMICS
The Rosaceae Genomics Initiative (RosIGI, http://www.bioinfo.wsu.edu/gdr/community/international/) was established to link and coordinate cross-Rosaceae genomics efforts internationally. These include EST and genome sequencing, genetic and physical mapping, molecular marker discovery, development of a microarray gene expression analysis platform, forward and reverse genetic tools, and high throughput gene validation. In the United States, these efforts are coordinated by the U.S. Rosaceae Genomics, Genetics, and Breeding Executive Committee. Rather than selecting a single model and focusing all resources on developing a full array of resources and tools for it, the Rosaceae community adopted an alternative strategy of developing family-wide resources and leveraging the most appropriate system to maximize efficiency. This strategy has been effective in other plant and microbial research communities (e.g. legume and Phytophthora). A white paper outlining long- and short-term strategies has been prepared by the community and is available at the GDR Web site (www.rosaceaewhitepaper.com).

BEYOND THE HORIZON
Genomics efforts in the Rosaceae family represent more than simply another set of organism sequences. The accelerating accumulation of genomics-level information in this particular family brings great advantages. Over the past 5 years, while various "revolutionary" genomics tools have come and gone, a validated set of proven microarray platforms, marker development methods, and high-throughput cloning systems has been established. These proven tools are ripe for application in questions applicable to rosaceous plant biology and crop production. New high-throughput sequencing techniques, means for quantitative gene expression analyses, and novel phenotyping platforms are in their infancy and will mature around economically useful species rather than model systems. The crop species within Rosaceae are well positioned to benefit from these emerging technologies.
Being a late arrival to the party has its advantages. The codified Rosaceae community can coalesce around standardized techniques, plant model systems, and computational tools residing in a central database, while strengthening comparative studies and fortifying meaningful meta-analyses. Most importantly, in commitment to the translational genomics paradigm, the Rosaceae genomics community is uniquely poised to incorporate breeders' and growers' input into genomics-level experimental design, bringing greater direct impact to basic science studies while resisting the allure of generating knowledge only for knowledge's sake. With versatile, relevant plant systems, emerging models and a concerted approach, ongoing studies in Rosaceae genomics will contribute to facets of basic plant science while bringing higher-quality products to the consumer with lower environmental impacts.
Received January 9, 2008; accepted May 13, 2008; published May 16, 2008.

LITERATURE CITED