A dynamic interface for capsaicinoid systems biology.

Capsaicinoids are the pungent alkaloids that give hot peppers (Capsicum spp.) their spiciness. While capsaicinoids are relatively simple molecules, much is unknown about their biosynthesis, which spans diverse metabolisms of essential amino acids, phenylpropanoids, benzenoids, and fatty acids. Pepper is not a model organism, but it has access to the resources developed in model plants through comparative approaches. To aid research in this system, we have implemented a comprehensive model of capsaicinoid biosynthesis and made it publicly available within the SolCyc database at the SOL Genomics Network (http://www.sgn.cornell.edu). As a preliminary test of this model, and to build its value as a resource, targeted transcripts were cloned as candidates for nearly all of the structural genes for capsaicinoid biosynthesis. In support of the role of these transcripts in capsaicinoid biosynthesis beyond correct spatial and temporal expression, their predicted subcellular localizations were compared against the biosynthetic model and experimentally determined compartmentalization in Arabidopsis (Arabidopsis thaliana). To enable their use in a positional candidate gene approach in the Solanaceae, these genes were genetically mapped in pepper. These data were integrated into the SOL Genomics Network, a clade-oriented database that incorporates community annotation of genes, enzymes, phenotypes, mutants, and genomic loci. Here, we describe the creation and integration of these resources as a holistic and dynamic model of the characteristic specialized metabolism of pepper.

The characteristic burning sensation produced by hot peppers (Capsicum spp.) is caused by capsaicinoids, alkaloid compounds that are synthesized and accumulate in pepper fruit (Nelson and Dawson, 1923). Beyond use as a vegetable or colorant, use as a spice is the major trait for which peppers are known and has driven the global dispersal of this New World crop (Govindarajan, 1985). Ecologically, a function of capsaicinoids in pepper is to deter frugivory by mammals by binding a thermoreceptor, TRPV1, found on nociceptive nerve fibers, thereby creating a sensation of burning pain (Caterina et al., 1997; Tewksbury and Nabhan, 2001). Additionally, capsaicinoids have been found to exhibit antimicrobial properties (Billing and Sherman, 1998; Tewksbury et al., 2008). A combinatorial biosynthesis is responsible for capsaicinoid diversity. The structure of capsaicin, the predominant form of the molecule, was solved in 1923 (Nelson and Dawson, 1923; Fig. 1, structure 1a). In 1968, Bennett and Kirby showed that a mixture of several forms of capsaicin (capsaicinoids) was found in hot peppers. These capsaicinoids were shown to share an aromatic portion (Fig. 1, structure 1) derived from Phe as per the contemporary understanding of the phenylpropanoid pathway (Bennett and Kirby, 1968). Leete and Louden and other groups demonstrated that variable acyl moieties (Fig. 1, structures a-i) were derived from catabolism of amino acids that were subsequently elongated by the fatty acid synthase (Leete and Louden, 1968; Kopp and Jurenitsch, 1981; Suzuki and Iwai, 1984; Markai et al., 2002), a biosynthesis similar to that known for the membrane lipids of Bacillus subtilis. Congeners are found where vanillyl alcohol (Fig. 1, structures 2a-2c) or coniferyl alcohol (Fig. 1, structures 3a and 3b) comprise the aromatic moiety of capsaicinoids in place of vanillylamine, rendering the molecule nonpungent (Kobata et al., 1998, 1999, 2008).
Variation in the acyl moiety affects the degree and quality of the pungent sensation (Todd et al., 1977; Krajewska and Powers, 1988; Walpole et al., 1993). Capsaicin and dihydrocapsaicin (Fig. 1, structures 1a and 1b) differ only by the saturation of the acyl moiety and are equally the most potent of the capsaicinoids; changes in the length or branching of the fatty acid from what is found in these two capsaicinoids decrease the potency of the molecule. Modified capsaicinoids have been reported where either the aromatic ring is glycosylated (Fig. 1, structures 6a and 6b) or the acyl group is oxidized (Fig. 1, structure 1j), yielding metabolites that are similar to detoxification products reported for capsaicinoids in animal systems (Ochi et al., 2003; Higashiguchi et al., 2006; Reilly and Yost, 2006). The aromatic rings can also be oxidatively coupled between pairs of compounds (Fig. 1, structures 4a and 5a), similar to the bonding found between lignin monomers (Diaz et al., 2004).
The SOL Genomics Network (SGN; http://www.sgn.cornell.edu) is a central hub that integrates genomic and biochemical pathway data for the Solanaceae research community. The SGN Web site houses both community-contributed and curated maps and sequence information as well as tools to help researchers link these genomic data to the phenome (Mueller et al., 2005). A component of this resource is SolCyc, a metabolic pathway database for the Solanaceae that helps researchers navigate small molecule metabolism by representing chemical structures, enzymes, cofactors, reactions, and pathways and linking them with the other SGN resources, such as loci, accessions, and phenotypes (Menda et al., 2008). SGN's clade-oriented approach integrates data from several related organisms, which provides the evolutionary context to study the origins of the biochemical variation in the background of conserved solanaceous genomes.
The creation of a metabolic resource for CapCyc, the Capsicum-specific database within SolCyc, is critical to structure biological studies of capsaicinoid biosynthesis around metabolic information that was scattered throughout the literature. While progress has been made in the identification and analysis of genes and proteins found in the historical conceptualization of capsaicinoid biosynthesis (Curry et al., 1999; Kim et al., 2001; Aluru et al., 2003; Lee et al., 2006), the overall capsaicinoid metabolic scheme has not been updated in light of revisions of related pathways in other organisms or considered in terms of its cellular context (Fig. 2). Results from heterologous systems are especially useful in Capsicum because it is not a model organism, as judged by the common criteria of generation time, genome size, and molecular resources. The SGN database aids the ability to capitalize on Capsicum being a member of the Solanaceae, the most developed comparative genetic system in the dicotyledonous plants. Comparative linkage maps (Livingstone et al., 1999), genetic marker sets (Wu et al., 2006), genes controlling quantitative traits (Zygier et al., 2005), proteomics data (Rose et al., 2004), microarrays (Alba et al., 2004), EST libraries, and the in-progress genome sequence are all resources being developed in tomato (Solanum lycopersicum) that can also be utilized by the Capsicum community.
In addition to genomic resources, there is also shared biochemistry among genera of the Solanaceae. Medium-length, branched-chain fatty acids are an unusual metabolite in plants but common in the Solanaceae. Most prominently, they are found as sugar esters in exudates from glandular trichomes that cover the aerial epidermis of plants of the Nicotiana, Datura, Petunia, and Solanum genera (Severson et al., 1985; King et al., 1987, 1990; King and Calhoun, 1988; Chortyk et al., 1997). Capsicum is not known to have glandular trichomes, but it has glandular regions in the fruit that accumulate branched fatty acids as capsaicinoids (Suzuki and Iwai, 1984; Zamski et al., 1987; Stewart et al., 2007; Fig. 2). Across the Solanaceae, branched-chain fatty acids are derived from branched-chain amino acids via two different elongation mechanisms. Nicotiana and Petunia extend these primers with acetyl-CoA in one-carbon increments (α-ketoacid elongation) by reiterative cycles of the Leu biosynthetic pathway (Kandra et al., 1990; Kroumova et al., 1994; Kroumova and Wagner, 2003; Slocombe et al., 2008). In contrast, Solanum, Datura, and Capsicum have been shown to elongate the same substrates but ...
Figure 1. Capsaicin is most commonly the most abundant capsaicinoid in hot peppers. Other capsaicinoids have been reported with aromatic group 1 paired with various acyl groups (a-j). Ester forms of the molecule have also been found (2a-2c, 3a, and 3b). Other reported capsaicinoids are the result of capsaicin being oxidized (4a and 5a), glycosylated (6a and 6b), or hydroxylated (1j), each of which reduces or eliminates pungency. These compounds are marked with asterisks. The discovery of capsaicinoids 1a to 1i has been reviewed by Suzuki and Iwai (1984). References to the other compounds are found in the text.
Here, we describe the creation of a comprehensive SolCyc pathway for capsaicinoid biosynthesis. This pathway integrates all information to date on the subbranches of capsaicinoid biosynthesis that are elucidated in pepper and other systems as a testable model for the production of capsaicinoids. As a preliminary test of this hypothesis, we cloned the predicted transcripts from pepper tissue that was actively synthesizing capsaicinoids. These sequences were integrated and annotated in the SGN database and made available as a comparative, freely accessible, and dynamic resource that will evolve in pace with progress in this area of research and be useful in comparative systems.

CapCyc Model
A testable model for capsaicinoid biosynthesis was developed based on the literature and metabolic databases that incorporates work in Capsicum, related genera, and model organisms (Fig. 2). The reactions and their subcellular localization form a metabolic hypothesis for capsaicinoid biosynthesis. This model combines phenylpropanoid and benzenoid metabolisms (Fig. 3) and medium-length, branched-chain fatty acid biosynthesis (Supplemental Fig. S1), which were classically considered as part of capsaicinoid biosynthesis, and further includes Phe (Supplemental Fig. S2) and branched-chain amino acid biosynthesis (Fig. 4;Suzuki and Iwai, 1984). Previously, amino acid synthesis was not included in the capsaicinoid biosynthetic pathway, because amino acids are considered "primary" metabolites and labeled amino acids fed into pepper stems are incorporated into capsaicinoids. However, these amino acids are not "transport" amino acids that are commonly found at high concentration in the phloem (Lea and Ireland, 1999). Furthermore, if we consider the abundance of fruit that can be produced by a single pepper plant and the high level of capsaicinoids that is possible per fruit in some accessions (more than 5% fruit dry weight; Bosland and Baral, 2007), it seems necessary to consider that the biosynthesis of these amino acids would be upregulated within the capsaicinoid-producing tissue in order to supply a sufficient amount of precursors. In view of the substantial amount of capsaicinoids synthesized during a short developmental period (Iwai et al., 1979), we posit that the model should include not just the substrates but also their production and the regeneration of cofactors. Numerous such considerations and assumptions were part of this process and are detailed in Supplemental Text S1.
Capsaicin and its precursor pathways were added to the SolCyc database (Fig. 5) by adding the pathway to the MetaCyc reference database (Caspi et al., 2006) using the Pathway Tools Pathway Editor functions. Pathway Tools allows visualization of pathways in great detail as well as a display of the integration of individual pathways within the larger metabolic network of an organism in a Pathway Genome Database (PGDB; Caspi et al., 2006; Fig. 5). At present, SolCyc has PGDBs for tomato (LycoCyc), potato (Solanum tuberosum; PotatoCyc), tobacco (Nicotiana tabacum; TobaccoCyc), Petunia (PetuniaCyc), and pepper (CapCyc), all derived from annotations of unigene sets (Mueller et al., 2005). PGDBs were initially computationally predicted based on sequence annotation and the MetaCyc reference database. In CapCyc, the phenylpropanoid pathway was already predicted from this template, but Val degradation, the source of the fatty acid moiety of capsaicinoid molecules, had to be manually annotated based on experimentally verified information from the literature along with other reaction steps that were unique or proposed to be unique to Capsicum based on the literature synthesis (Supplemental Text S1). Information gathered for each locus included literature citations, EC numbers, SGN unigene sequences, and Gene Ontology and Plant Ontology annotations.
Figure 2. ... (Stewart et al., 2007). B, Capsaicinoid biosynthetic pathway modeled in this cell. Chorismate is produced from the shikimate pathway (step 1), which is used to synthesize Phe in the plastid (step 2). In the cytoplasm, associated with the endoplasmic reticulum, Phe is converted to feruloyl-CoA by phenylpropanoid metabolism (step 3) and is likely converted to vanillylamine by an unknown enzyme in an unknown compartment (step 4). Pyruvate is the precursor to Val (step 5), which is exported to the mitochondria and catabolized to isobutyryl-CoA (step 6). Isobutyryl-CoA returns to the plastid, where it is elongated to 8-methylnonenoic acid by the fatty acid synthase (step 7). Export from the plastid is concomitant with formation of the CoA thioester. The location and mechanism of the final condensation and export of capsaicin out of the cell (step 8) are debated. The biosynthesis of capsaicin is shown; other capsaicinoids are formed through variation in steps 5 to 7, and ester forms of the molecule are likely formed by variation in steps 3 and 4.
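The compartmentalized model can be written down as a small data structure, which makes it easy to list the steps whose location is still open and the points where an intermediate must cross between organelles. The following is only an illustrative sketch: the `Step` class, the branch labels, and the function names are assumptions introduced here, not part of Pathway Tools or CapCyc. Step 1 is left without a compartment because the description of the modeled cell does not state one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    number: int
    description: str
    branch: str                 # hypothetical label: "aromatic", "acyl", or "final"
    compartment: Optional[str]  # None where no settled location is given


# Steps 1 to 8 of the modeled cell; wording is paraphrased for illustration.
CAPSAICIN_MODEL = [
    Step(1, "shikimate pathway to chorismate", "aromatic", None),
    Step(2, "chorismate to Phe", "aromatic", "plastid"),
    Step(3, "Phe to feruloyl-CoA", "aromatic", "cytoplasm"),
    Step(4, "feruloyl-CoA to vanillylamine", "aromatic", None),
    Step(5, "pyruvate to Val", "acyl", "plastid"),
    Step(6, "Val to isobutyryl-CoA", "acyl", "mitochondrion"),
    Step(7, "isobutyryl-CoA to 8-methylnonenoic acid", "acyl", "plastid"),
    Step(8, "condensation and export of capsaicin", "final", None),
]


def unresolved(steps: List[Step]) -> List[int]:
    """Step numbers with no assigned compartment."""
    return [s.number for s in steps if s.compartment is None]


def compartment_changes(steps: List[Step], branch: str) -> List[Tuple[int, int]]:
    """Pairs of consecutive located steps in one branch that sit in different compartments."""
    located = [s for s in steps if s.branch == branch and s.compartment is not None]
    return [
        (a.number, b.number)
        for a, b in zip(located, located[1:])
        if a.compartment != b.compartment
    ]


if __name__ == "__main__":
    print("Unresolved steps:", unresolved(CAPSAICIN_MODEL))
    print("Acyl branch transport points:", compartment_changes(CAPSAICIN_MODEL, "acyl"))
```

In this encoding the acyl branch changes compartment twice (plastid to mitochondrion, then back to the plastid), which matches the export of Val and the return of isobutyryl-CoA described for the model.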

Capsaicinoid Systems Biology
Plant Physiol. Vol. 150, 2009
... been shown to be highly expressed at 20 DPA and enriched in the placental dissepiment, coinciding with capsaicinoid accumulation (Aluru et al., 2003; Stewart et al., 2005). Recovery of transcripts predicted by our model from this tissue would be consistent with functions related to capsaicinoid biosynthesis.
Forty-two new transcripts were recovered, corresponding to candidates for 29 previously unaccounted enzymatic steps in this new model of capsaicinoid biosynthesis (Table I). BLASTX searches of the Arabidopsis (Arabidopsis thaliana) proteome identified homologs for each cloned transcript with greater than 70% amino acid identity, on average, which increased to almost 80% identity after removing the predicted targeting peptides from the alignment. Most cloned transcripts were integrated seamlessly with our model, but there were some exceptions. For two genes, ketoacyl-ACP synthase III and the enoyl-ACP reductase component of the fatty acid synthase (KasIII and ENR; Supplemental Fig. S1), pairs of distinct sequences were recovered. Single nucleotide polymorphisms were also observed in some transcripts, albeit infrequently. The coding sequences of cinnamic acid 4-hydroxylase and Phe ammonia-lyase (C4H and PAL; Fig. 3) were sequenced in their entirety here because previously they were only known as partial sequences from Capsicum chinense (Curry et al., 1999). Only the 3′ end of NADH-dependent Glu synthetase (NADH-GOGAT; Fig. 5; Supplemental Fig. S2) was obtained, and no transcripts were detected for Thr deaminase or dihydroxy acid dehydratase (TD and DHAD; Fig. 4).
Figure 4. Branched-chain amino acid metabolism leading to capsaicinoids. Amino acids are synthesized in the plastids and then transported to the mitochondria, where they are deaminated and decarboxylated prior to import back into the plastid to serve as primers for fatty acid biosynthesis. Cofactors are to the left of the reaction arrows and enzymes are to the right. α-KG, α-Ketoglutarate; HE-TPP, hydroxyethyl-thiamine pyrophosphate. Sources are as follows: http://www.kegg.com, Leete and Louden (1968), Kopp and Jurenitsch (1981), Walters and Steffens (1990), van der Hoeven and Steffens (2000), Diebold et al. (2002), Li et al. (2003), and Slocombe et al. (2008).
The plastid-encoded subunit of the acetyl-CoA carboxylase complex (ACCase βCT; Supplemental Fig. S1) was cloned from genomic DNA. Two candidates for the acyl-CoA synthetase (Supplemental Fig. S1) that exports fatty acids from the plastid were cloned, both of which were more similar to Arabidopsis proteins other than the major plastid acyl-CoA exporter (Schnurr et al., 2002). An array of alternative candidates has been described that will be investigated in further iterations of our model (Shockey et al., 2003). A second homolog was cloned of the putative aminotransferase that is a candidate for catalyzing the formation of vanillylamine from vanillin (Fig. 3). These candidates for vanillylamine synthesis are questionable; pAMT and pAMT2 are the most similar to POP2 in Arabidopsis (Fig. 3). POP2 is a γ-aminobutyric acid transaminase that is implicated in pollen tube growth and also in response to stress and nitrogen storage and mobilization (Palanivelu et al., 2003; Miyashita and Good, 2008). One homolog of arogenate dehydratase was cloned from pepper and was most similar to one of the Arabidopsis candidates for this activity, ADT6 (Supplemental Fig. S2).
These sequences were deposited in GenBank and annotated in the SGN database as new Capsicum unigenes. Loci added to the SGN database were manually curated using the community curation interfaces (Menda et al., 2008). Briefly, each locus was annotated and can be queried with locus symbol, name, description, or enzymatic activity. The subsequent building of CapCyc incorporated links to these new unigenes as an automated function of the database (Menda et al., 2008).
Table I (partial rows). PNT: n/a. Thr deaminase (TD): AT3G10050; other columns n/a. Acetolactate synthase. Dihydroxyacid dehydratase (DHAD): AT3G23940; other columns n/a.

Compartmentalization
The enzymes in our model and/or the reactions they catalyze are known to be restricted to certain organelles (Fig. 2). Previously, only EST sequences for the majority of these genes were available. The cloning of full-length transcripts allowed a secondary test of our model: the consistency of the modeled and predicted subcellular localizations of the enzymes determined using TargetP (Emanuelsson et al., 2007). In support of these predictions, the in silico localizations of the pepper proteins were compared with their Arabidopsis homologs (Table II), many of which have experimentally verified locations (Friso et al., 2004; Heazlewood et al., 2007). These predictions also revealed the length of the targeting peptide that is cleaved after organelle import and therefore is informative about the length of the mature, functional protein. The presence of peptides that would target proteins across the thylakoid membrane of the plastid from the stroma was ignored, as none of these enzymes would be expected to be found in the thylakoid lumen and the existence of thylakoids on this internal fruit organelle is unknown. There were rare notable discrepancies between modeled and sequence-predicted locations. BCAT, the branched-chain amino acid aminotransferase that is thought to be active in both the synthesis of amino acids in the plastid and their catabolism in the mitochondria, was predicted to be targeted to the mitochondria in pepper, while the closest Arabidopsis homolog is one of several copies in the genome and targeted to the plastid (Diebold et al., 2002). Conflicts between the model, TargetP prediction, and experimental evidence for the Arabidopsis homolog also exist for pAMT and the acyl-CoA synthetases.
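The three-way comparison described above (modeled compartment, sequence-based prediction for the pepper protein, and location of the Arabidopsis homolog) can be sketched as a simple consistency check. This is a minimal illustration under stated assumptions: it does not call TargetP, the record layout and function name are hypothetical, and only the BCAT entry reflects values given in the text. The second entry is a made-up placeholder included for contrast.

```python
from typing import Dict, List


def find_discrepancies(records: Dict[str, dict]) -> Dict[str, List[str]]:
    """Flag enzymes whose predicted location conflicts with the model or the homolog."""
    flagged = {}
    for name, rec in records.items():
        reasons = []
        if rec["predicted"] not in rec["modeled"]:
            reasons.append("prediction outside modeled compartments")
        if rec["homolog"] is not None and rec["homolog"] != rec["predicted"]:
            reasons.append("prediction differs from Arabidopsis homolog")
        if reasons:
            flagged[name] = reasons
    return flagged


records = {
    # From the text: modeled in plastid and mitochondria, predicted mitochondrial
    # in pepper, closest Arabidopsis homolog targeted to the plastid.
    "BCAT": {
        "modeled": {"plastid", "mitochondrion"},
        "predicted": "mitochondrion",
        "homolog": "plastid",
    },
    # Hypothetical placeholder with full agreement (not a real enzyme record).
    "EXAMPLE_ENZYME": {
        "modeled": {"plastid"},
        "predicted": "plastid",
        "homolog": "plastid",
    },
}

if __name__ == "__main__":
    for enzyme, reasons in find_discrepancies(records).items():
        print(enzyme, "->", "; ".join(reasons))
```

Keeping the modeled location as a set allows an enzyme such as BCAT, which is expected in two organelles, to be consistent with the model while still being flagged for disagreeing with its homolog.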

Genetic Linkage Mapping in Pepper
Cloned genes were placed on a Capsicum linkage map as RFLPs (Fig. 6; Table I). In combination with previous studies (Aluru et al., 2003; Blum et al., 2003; Stewart et al., 2005), these results account for the location of nearly the entire capsaicinoid biosynthetic model in the Capsicum genome. The majority of the markers on this map were amplified fragment length polymorphisms (AFLPs), but some shared markers, such as RFLPs, were included to allow the alignment of this map with other maps (http://www.sgn.cornell.edu). The total map length was 1,374 centimorgans, and the map consisted of 178 markers separated by an average distance of 7.7 centimorgans. The development of COSII markers in pepper will supersede AFLP and RFLP as an ideal way to link this map with the tomato genome and with other pepper maps (Wu et al., 2006, 2009). The majority of genes were detected as one- or two-copy loci scattered throughout the genome. This is likely an underestimate for several gene families, where paralogs are too divergent for RFLP probe cross-hybridization. For example, there are 272 cytochrome P450 genes in the Arabidopsis genome (Schuler and Werck-Reichhart, 2003), and the two members of this family implicated in capsaicinoid biosynthesis, C3H and C4H, were detected in pepper as one- and two-copy families, respectively. CCoAOMT loci were found as a linked pair of loci on the pepper pseudo-linkage group that combines chromosomes 1 and 8. Pseudolinkage between these chromosomes is a commonly observed phenomenon in interspecific maps in pepper involving crosses between C. annuum and C. chinense or C. frutescens (Livingstone et al., 1999; Wu et al., 2009). Other homologous loci were found to be linked: two ACS loci were found on chromosomes 1 and 8; HCT and AT3 (Pun1) are two members of the BAHD superfamily that are involved in capsaicinoid biosynthesis and are linked on chromosome 2.
Of the loci that could not be placed on the genetic map, the CCRb and 4CL loci were linked to each other but could not be associated with a pepper chromosome. Several candidates mapped to chromosomes 1 and 8, but one of the arms of chromosome 5 contains none of the candidates mapped to date.
Figure 6. Genetic map of capsaicinoid candidate genes in pepper. This genetic linkage map of pepper represents the 12 chromosomes and one linkage group that could not be assigned to a chromosome. Molecular markers are shown to the right of the chromosome, and candidate genes for capsaicinoid biosynthesis are underlined. Genes mapped in previous studies are shown to the left. Numbers to the left of the chromosomes indicate the centimorgan distances between adjacent markers. Gene names are abbreviated as in Table I.

CapCyc Evolution
SGN is a central hub for the Solanaceae research community. The creation of the CapCyc pathway within SGN provides a natural forum for information related to the elucidation of capsaicinoid biosynthesis in Capsicum. Many of the reaction steps in the capsaicinoid biosynthetic model have not been demonstrated experimentally in pepper but were instead inferred from Solanaceae or other species. In some metabolisms, such as Phe (Cho et al., 2007) and vanillin biosynthesis (Walton et al., 2003), gaps exist in the model where reaction steps are unknown.

Condensation and Export
Capsaicinoid synthase, the final enzyme of capsaicinoid biosynthesis, is yet to be identified. Assays for capsaicinoid synthase activity have been performed with both CoA-activated fatty acids and free fatty acids, with ATP, magnesium, and free CoA, ostensibly to supply substrates and cofactors to endogenous acyl-CoA synthetases present in crude extracts. The formation of capsaicinoids from the CoA-activated fatty acids was more than six times greater than with the free fatty acid, leading to an interpretation of a two-step reaction: first, the formation of the acyl-CoA, followed by transfer to vanillylamine. However, the study of a number of acyltransferase enzymes of the BAHD superfamily that transfer CoA-activated acyl groups to a variety of metabolites has suggested that capsaicinoids would be formed in this manner (St-Pierre and De Luca, 2000; Ma et al., 2005). The gene that is mutated in bell pepper, AT3, which results in a loss of pungency, was recently discovered and found to encode a member of this class of acyltransferase enzymes (Stewart et al., 2005). AT3 appears to be a hot spot for loss-of-pungency mutations in other species and therefore is believed to be critical in the evolution of pungency (Stewart et al., 2007). While AT3 is a candidate to encode for capsaicinoid synthase, peppers with loss-of-pungency mutations in AT3 have been reported to synthesize capsaicinoids when the immediate precursors, vanillylamine and branched-chain fatty acids, are supplied exogenously (Iwai et al., 1977). Another candidate had been reported, Csy1, but that report has since been retracted (Prasad et al., 2006, 2007). The final condensation enzyme is still an open question. The details of the location of the accumulation and the synthesis of capsaicinoids from vanillylamine and fatty acids have also been a matter of much debate. There is a widespread misconception that the site of capsaicinoid accumulation is the pepper seeds (Stewart et al., 2007).
Numerous lines of evidence clearly show that the placental dissepiment is the site of this accumulation and biosynthesis by differentiated cells that line the epidermis of this internal fruit structure (Zamski et al., 1987; Stewart et al., 2007; Fig. 2), but there are differing reports of the intracellular site of capsaicinoid synthesis and route of secretion (Suzuki and Iwai, 1984; Zamski et al., 1987). Once out of the cell, capsaicinoids accumulate underneath the cuticle in fluid-filled "blisters" (Stewart et al., 2007), where they are near or in contact with the seeds.

Candidate Genes and Strategies for Nonmodel Genomes
The size of the pepper genome is estimated to be approximately 3,000 Mbp. This is three times the size of the tomato genome and 20 times larger than the Arabidopsis genome (Arumuganathan and Earle, 1991). The gene content of these plants is similar, but the amount of intervening repetitive DNA found as heterochromatin is much increased (Van der Hoeven et al., 2002). A common and efficient approach to access genomic information in pepper is through the use of a positional candidate gene approach (Thorup et al., 2000). This employs cloned genes as candidates for phenotypes of interest and tests for association on genetic maps. Candidate gene approaches have the advantage of being very efficient, but they are also dependent on possession of sequences related to the biological process being studied. An advantage of working in an experimental system like the Solanaceae is the ability to use sequences or map positions in a comparative way to associate phenotypes with genes across genera. This candidate gene approach has been very successful in pepper, identifying the gene for loss of pungency in bell pepper and quantitative trait locus candidates for pungency using genes from pepper (Stewart et al., 2005, 2007) and identifying pepper genes that are associated with fruit color loci in tomato (Thorup et al., 2000) and other disease resistance and quality traits (Paran and van der Knaap, 2007). Our collection of pepper sequences related to capsaicinoid biosynthesis will be immediately valuable as candidates for pungency quantitative trait loci in existing pepper populations and will be a resource for discovering the molecular nature of those traits.
A limitation of our approach is that we only identified genes predicted by our model. In addition to not discovering novel genes, regulatory genes were not included in our study because there was no straightforward way to design primers for their cloning based on homology relationships. Future iterations of this model can include genes that were not predicted to be part of the metabolic pathways by the inclusion of genes that are up-regulated during capsaicinoid biosynthesis. Previously, the spatial and temporal expression pattern of capsaicinoids has been used to identify new candidate genes for this pathway, including regulatory factors (Curry et al., 1999;Kim et al., 2001;Aluru et al., 2003;Blum et al., 2003;Lee et al., 2006). A microarray approach in Solanum has already generated many candidates that are up-regulated in trichomes (Slocombe et al., 2008). These resources, and EST libraries in SGN, should be mined for relevance to capsaicinoid biosynthesis, but it should also be kept in mind that differences in mRNA abundance would not necessarily be found for genes involved in other processes or regulated at levels other than transcription.

Comparative Metabolism
The genus Solanum includes tomato and its wild relative S. pennellii, which, like pepper, produce fleshy fruit as a mode of seed dispersal. Genetic components that suggest the ability to produce capsaicinoids are present and expressed, but capsaicinoids are not. A meta-analysis of genes from the capsaicinoid biosynthetic model was conducted by determining the relative abundance of trichome-derived ESTs in tomato unigenes (Table I). Most of the capsaicinoid transcript candidates were found to have homologs in these libraries. For KasIIIa, FatB, and components of the pyruvate dehydrogenase and branched-chain α-ketoacid dehydrogenase complexes, a substantial portion of the members of the Solanum unigene derived from "trichome" libraries. Slocombe et al. (2008) corroborated this expression for a subset of these genes but were limited by the inclusion of less than half of these structural genes on the tomato microarray. The metabolites that are direct upstream precursors to the capsaicinoids, vanillic acid, and fatty acids are present in tomato (and crosses between tomato and S. pennellii), vanillin is present in the fruit, and the same medium-length, branched chain fatty acids found in capsaicinoids are abundant (specifically 8-methylnonanoic acid; Fig. 1, structure b; Lewinsohn et al., 2001; Moco et al., 2006). In the future, CapCyc-related sequences will be linked to tomato orthologs, allowing researchers to move between model organisms.

CONCLUSION
We have presented a unified model of the capsaicinoid biosynthetic pathway, the specialized metabolism that is characteristic of hot pepper. In support of this model, the predicted transcripts were cloned from pepper fruit tissue actively synthesizing capsaicinoids, anchored by homology to Arabidopsis loci (The Arabidopsis Information Resource; www.arabidopsis.org), and referenced to tomato unigenes (SGN; www.sgn.cornell.edu), effectively linking this information in GenBank with two other major plant-specific databases.
Because the subcellular compartmentalization of many of these enzymes is known in Arabidopsis, we were able to test whether the enzymes encoded by the pepper transcripts are predicted to be targeted to the proper organelle by homology and the presence of characteristic targeting peptides in the pepper sequences. Map locations were determined as a resource for a positional candidate gene approach. These data are dynamically integrated and accessible at the SGN Web site as a community resource.

Plant Materials
Seeds for Santa Fe Grande (Capsicum annuum), a pungent pepper cultivar that produces capsaicinoids, were generously provided by the Chile Pepper Institute (New Mexico State University, Las Cruces). C. annuum is the most cultivated pepper species and is the model pepper species used in SGN. Plants were grown in the Guterman Greenhouse Complex at Cornell University with supplemental lighting and standard greenhouse practices.

Bioinformatics
CapCyc was generated using Pathway Tools software (Karp et al., 2002), developed at SRI International, using MetaCyc (Caspi et al., 2006) as the reference database. Pathway Tools is a comprehensive software package to create, manipulate, edit, visualize, curate, and share pathway information that is written in Common Lisp; it has been applied to several hundred bacterial genomes and about a dozen different plant species. Pathway Tools can be run with a relational back end or based on flatfiles, the latter of which was used in this work. Pathway Tools can be downloaded from the SRI Web site at: http://biocyc.org/download.shtml. The PathoLogic module (Karp et al., 2002) of Pathway Tools generated a new PGDB using MetaCyc as a reference database and a C. annuum unigene set generated at SGN (Mueller et al., 2005). Annotations for the C. annuum unigenes were obtained by transferring annotations of Arabidopsis (Arabidopsis thaliana) loci (Berardini et al., 2004) as referenced in the AraCyc database (Mueller et al., 2003). Unigenes were selected that matched Arabidopsis sequences with an E value of less than 1 × 10⁻²⁰ in BLAST (Altschul et al., 1990). The Pathway Tools editor functionalities were used to validate and manually curate pathways specific to Capsicum. Annotations for the loci identified in this work were also added to the SGN locus database using the SGN community curation tools (Menda et al., 2008).
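The E value cutoff used for annotation transfer can be illustrated with a short filter over BLAST hits. This is a sketch only, not the pipeline used at SGN: it assumes hits are available in the common 12-column tab-delimited BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E value, bit score), and the unigene identifiers and hit values in the sample are invented for illustration.

```python
import csv
import io
from typing import Dict, Tuple

E_VALUE_CUTOFF = 1e-20  # hits must have an E value below this threshold


def best_hits(tabular_text: str, cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float]]:
    """Return the lowest-E-value subject per query among hits passing the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Invented sample rows; identifiers and scores are placeholders.
SAMPLE = "\n".join(
    "\t".join(fields)
    for fields in [
        ["unigene_A", "AT3G10050", "81.2", "400", "75", "0", "1", "400", "1", "400", "1e-120", "430"],
        ["unigene_A", "AT3G23940", "40.0", "100", "60", "0", "1", "100", "1", "100", "1e-25", "110"],
        ["unigene_B", "AT3G23940", "35.0", "90", "58", "1", "1", "90", "5", "94", "1e-05", "45"],
    ]
)

if __name__ == "__main__":
    for query, (subject, evalue) in best_hits(SAMPLE).items():
        print(query, subject, evalue)
```

In the sample, `unigene_B` receives no annotation because its only hit has an E value above the cutoff, while `unigene_A` is assigned its strongest match.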

Molecular Biology
Fruit were harvested at 20 DPA to coincide with peak capsaicinoid-related transcript expression, and the placental dissepiment was collected and frozen in liquid nitrogen. Total RNA was extracted using the Qiagen RNeasy kit according to the manufacturer's protocol. 3′ and 5′ cDNA libraries were generated using the Clontech Smart Race kit and used as the template for PCR amplification of overlapping candidate gene fragments. Primer sequences and RACE product lengths are listed as Supplemental Table S1. PCR products were gel purified (QiaQuick; Qiagen) and cloned into pCR 4 TOPO plasmid vectors prior to electroporation into TOP10 Escherichia coli (Invitrogen). Plasmid DNA was extracted for sequencing using the QiaPrep kit (Qiagen) and sequenced by the Cornell University Life Sciences Core Laboratories Center.

Prediction of Subcellular Localization
Toward determining the subcellular localization of the candidate enzymes for capsaicinoid biosynthesis, the predicted translations of cloned genes were submitted to TargetP and analyzed as described (Emanuelsson et al., 2007). Those sequences that were predicted to contain a plastid-targeting peptide were further analyzed by BLAST searches (http://plantrbp.uoregon.edu/) to identify a putatively orthologous protein whose subcellular localization was experimentally determined as described at http://ppdb.tc.cornell.edu (Friso et al., 2004) and http://www.mitoz.bcs.uwa.edu.au/applications/suba2/index.php (Heazlewood et al., 2007). Alignment between the predicted pepper proteins and the identified peptides of the mature protein provided a means to further evaluate the accuracy of the predicted targeting peptide cleavage site.

Genetic Mapping
A mapping population was constructed from an interspecific cross of line 1154 (C. annuum accession A44750157; Nijmegen Botanical Garden) and PI 152225 (C. chinense). An F2 population of 182 individuals was used to create the map, which consisted of 200 AFLP, RFLP, and COSII markers and candidate genes from the capsaicinoid biosynthetic pathway. The AFLP markers (performed by Nunhems Netherlands) were chosen as a subset of approximately 450 markers distributed throughout the genome. Procedures for RFLP and AFLP analyses and genetic mapping were described previously (Ben Chaim et al., 2001). The genetic map was constructed using MAPMAKER software (Lander et al., 1987). Map distances were computed with the Kosambi mapping function. Chromosome numbers were assigned to linkage groups based on known markers previously mapped to pepper chromosomes.
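For readers unfamiliar with the Kosambi mapping function, it converts a recombination fraction r between two markers into a map distance of 25 ln[(1 + 2r)/(1 - 2r)] centimorgans, which accounts for crossover interference. The sketch below is a standalone illustration of that formula and its inverse, not code from MAPMAKER; the function names are chosen here for clarity.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(cm: float) -> float:
    """Inverse Kosambi function: recombination fraction for a distance in centimorgans."""
    if cm < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(cm / 50.0)


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.25):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

At small recombination fractions the distance in centimorgans is close to 100r, and it grows faster than 100r as r approaches 0.5, which is why map distances rather than raw recombination fractions are summed along a linkage group.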

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Fatty acid elongation related to capsaicinoids.
Supplemental Table S1. RACE primers used for transcript cloning.
Supplemental Text S1. Biosynthetic model assumptions and considerations.