Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

Plants synthesize a wide variety of specialized ("secondary") chemicals that influence their interactions with the environment. Commercial interest in these plant natural products stems from their importance as pharmaceuticals, in plant aromas and flavors, and as feedstocks for industrial chemicals. The synthesis of these metabolites typically occurs in specialized structures such as laticifers (Hagel et al., 2008), resin ducts (Trapp and Croteau, 2001), or trichomes (Werker, 2000). Trichomes are specialized cells present on the surfaces of many plants and are capable of synthesizing and either storing or secreting large amounts of specialized metabolites (Schilmiller et al., 2008). These include compounds of interest to humans such as food flavors from mint (Mentha spp.) and basil (Ocimum basilicum; Rios-Estepa et al., 2008; Xie et al., 2008), the antimalarial drug artemisinin from Artemisia annua, and tetrahydrocannabinol in medical marijuana (Cannabis sativa; Flores-Sanchez and Verpoorte, 2008). Because trichomes are located on plant surfaces and produce biologically active metabolites, they can protect against a number of environmental stresses, including herbivores and pathogens (Werker, 2000; Wagner et al., 2004).
Unlike Arabidopsis, which only makes nonglandular trichomes (Wagner et al., 2004), cultivated tomato (Solanum lycopersicum) also contains glandular trichomes (Luckwill, 1943). There are two abundant types of glandular trichomes on tomato organs. Type I trichomes have a multicellular stalk with a single, small gland cell at the tip (Fig. 1A). Type VI trichomes have a unicellular stalk with a four-cell glandular head (Fig. 1, A and C). Together, these glands produce a variety of compounds, including terpenes, acyl sugars, and phenylpropanoid-derived metabolites (Slocombe et al., 2008; Besser et al., 2009; Schilmiller et al., 2009, 2010). In addition to documented chemical diversity, an increasing number of tomato genetic and genomic resources are becoming available, including an assembly of the tomato genome shotgun sequence (http://solgenomics.net/). These and other tools make tomato trichomes amenable to studies of glandular trichome function and the synthesis of specialized metabolites.
Proteomics techniques are another useful set of tools for the discovery of enzymes and pathways. These techniques are especially powerful in studies of enriched cell types. Two-dimensional gel electrophoresis and shotgun proteomics approaches were utilized for tomato pollen (Sheoran et al., 2007), elicitor-treated opium poppy (Papaver somniferum) cell culture (Zulak et al., 2009), guard cells of Arabidopsis (Arabidopsis thaliana) and Brassica napus (Zhao et al., 2008;Zhu et al., 2009), and soybean (Glycine max) root hairs (Brechenmacher et al., 2009). Laser capture microdissection combined with proteomics was used to analyze vascular bundles from Arabidopsis stems (Schad et al., 2005) and pericycle cells from maize (Zea mays) primary roots (Dembinsky et al., 2007). Proteomics analysis of isolated trichomes revealed insights into their metabolism and function. For example, several stress-related proteins were found in tobacco (Nicotiana tabacum) trichomes (Amme et al., 2005), and proteins involved in sulfur metabolism and detoxification were revealed in Arabidopsis trichomes (Wienkoop et al., 2004). An in-depth analysis of basil peltate trichomes resulted in the identification of 881 proteins across four different basil lines (Xie et al., 2008).
The lack of genome sequence information limits the ability to discover proteins representing new genes in nonmodel species (Carpentier et al., 2008). Although the number of plants with high-coverage genomic sequences is increasing at a fast rate, proteomics studies in many plants continue to rely on EST sequences (Faurobert et al., 2007;Alvarez et al., 2008;Xie et al., 2008;Zulak et al., 2009). Because ESTs are enriched in protein-coding sequences, they can be especially useful for analyzing proteomics data. However, most species have a relatively small amount of transcriptome sequence available. This problem is compounded by the fact that ESTs typically are derived from many different tissues, limiting the representation of sequences expressed in specific cell types. Technological advances now enable deep cDNA sequencing, permitting more complete representation of expressed genes compared with traditional Sanger sequencing of cDNA clones (Cheung et al., 2006;Weber et al., 2007;Lister et al., 2009;Rounsley and Last, 2010). The utility of combining proteomics and deep cDNA sequencing was recently demonstrated using isolated pea (Pisum sativum) chloroplast envelope protein and total leaf and hypocotyl cDNA sequencing using 454 technology (Bräutigam et al., 2008).
Our systematic studies on glandular trichome function (www.trichome.msu.edu; Schilmiller et al., 2009, 2010) led to a need for a comprehensive RNA and protein "parts list." A large trichome-specific EST sequence collection was generated using 454 pyrosequencing (http://bioinfo.bch.msu.edu/trichome_est; Schilmiller et al., 2009) and used to analyze shotgun proteomics data from isolated trichomes. The combined transcriptome and proteome data set was mined to identify genes and proteins expressed in trichomes. The utility of this information was demonstrated by characterization of a new sesquiterpene synthase that produces β-caryophyllene and α-humulene in tomato trichomes.
Figure 1. Tomato stem and petiole trichomes. A, Tomato stems and petioles are covered by multiple trichome types, including the glandular type I and type VI trichomes. B and C, Mixed-type trichome samples (B) were isolated by scraping frozen tissue in liquid nitrogen, and type VI gland samples (C) were isolated by glass bead abrasion and filtering with nylon mesh screens.

RESULTS
Sequencing and Assembly of Tomato cv M82 Trichome cDNA
Glandular trichomes express genes encoding proteins unique to these specialized cell types (Shepherd et al., 2005). Because trichomes represent a small fraction of cells on vegetative and reproductive structures, the existing collections of tomato cDNA sequences do not have strong representation of trichome-specific transcripts. We previously generated tomato mixed-type stem and petiole trichome EST assemblies by deep sequencing of cDNA by 454 pyrosequencing (Schilmiller et al., 2009). cDNA from isolated leaf type VI glands was sequenced to complement the mixed-type stem trichome library and enable searching for trichome type-specific transcripts and proteins. A combined total of 467,677 reads (244,655 from the mixed trichome library and 223,022 from the type VI library) were generated and submitted to the GenBank Short Read Archive (SRP000732). Reads from the combined libraries were assembled into 23,585 contigs with an average length of approximately 300 bp. BLASTX searches were conducted against the National Center for Biotechnology Information (NCBI) nonredundant protein database and a subset of this database containing only green plant sequences (viridiplantae, taxid:33090). Each contig was annotated according to the top BLASTX match (Supplemental Table S1), and the top 25 contigs with the highest number of assembled sequence reads are shown in Table I.
We sought to identify trichome-specific transcripts by looking for contigs with an overrepresentation of reads in the trichome 454 assembly. The Institute for Genomic Research (TIGR) transcript assemblies (TAs; http://plantta.jcvi.org/index.shtml; Childs et al., 2007) were used for this comparison because they were generated from 200,250 ESTs derived from a broad collection of tissues. A BLASTN match to a TIGR TA was found for 87% of the trichome contigs using an E-value cutoff of less than 1 × 10⁻¹⁰ (Supplemental Table S1). Because trichomes constitute a relatively small part of the tissues used to generate the TAs, trichome-specific ESTs are underrepresented in those libraries. The monoterpene biosynthetic enzyme Neryl Diphosphate Synthase1 (TA23792_4081) is an example of a trichome-enriched transcript: it represents 0.71% of the trichome ESTs from the combined libraries and only 0.005% of TA ESTs, or a 160-fold enrichment in trichomes (Schilmiller et al., 2009).
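The enrichment comparison described above amounts to dividing a transcript's share of one library by its share of another. The following is a minimal sketch of that arithmetic, not the authors' pipeline. The two library sizes are taken from the text, whereas the per-transcript read counts are hypothetical values chosen only to reproduce the quoted, rounded percentages. Because those percentages are rounded, a ratio recomputed from them need not match the published fold value exactly.

```python
def percent_of_library(read_count: int, library_size: int) -> float:
    """Return one transcript's share of a library's reads, in percent."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return 100.0 * read_count / library_size


def fold_enrichment(pct_target: float, pct_reference: float) -> float:
    """Ratio of a transcript's share in the target library to its share in the reference."""
    if pct_reference <= 0:
        raise ValueError("reference share must be positive to form a ratio")
    return pct_target / pct_reference


# Library sizes quoted in the text.
TRICHOME_READS = 467_677
TA_ESTS = 200_250

# Hypothetical per-transcript counts (assumptions, not values from the paper).
trichome_pct = percent_of_library(3_320, TRICHOME_READS)
ta_pct = percent_of_library(10, TA_ESTS)
enrichment = fold_enrichment(trichome_pct, ta_pct)
```

With unrounded read counts in place of the hypothetical ones, the same two functions would give the fold value for any contig of interest.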
A total of 3,052 trichome EST contigs did not have a BLASTN match to a TA, suggesting that our sequencing revealed other good candidates for trichome-enriched genes (Supplemental Table S2). Many of the contigs on the list of 3,052 are relatively short (2,668 have a length less than 300 bp) with few reads and are mathematically less likely to have a match with an E-value below 1 × 10⁻¹⁰. Out of the 3,052 contigs with no TA match, there were 651 contigs with a BLASTX match of E-value less than 1 × 10⁻¹⁰ in the nonredundant green plants (viridiplantae) protein database at GenBank (Supplemental Table S2). These contigs represent genes that are potentially trichome specific and have some related sequences with annotation that may help in determining their function.
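Selecting contigs that lack a TA match can be sketched as a filter over BLAST hits. The example below is an illustration only. It assumes 12-column tab-delimited BLAST output with the query identifier in the first column and the E-value in the eleventh, and the contig and TA identifiers in the sample data are hypothetical. The 1 × 10⁻¹⁰ cutoff is the one quoted in the text.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def contigs_with_hit(blast_tsv, cutoff=E_VALUE_CUTOFF):
    """Return query IDs having at least one hit with an E-value below the cutoff.

    Assumes tab-delimited rows: query ID in column 1, E-value in column 11.
    """
    matched = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < cutoff:
            matched.add(row[0])
    return matched


def contigs_without_hit(all_contigs, blast_tsv, cutoff=E_VALUE_CUTOFF):
    """Return the contigs that have no hit passing the cutoff."""
    return set(all_contigs) - contigs_with_hit(blast_tsv, cutoff)


# Hypothetical example data: three contigs, two hit rows.
example_hits = io.StringIO(
    "contigA\tTA_1\t98.5\t300\t4\t0\t1\t300\t10\t309\t2e-35\t150\n"
    "contigB\tTA_2\t85.0\t60\t9\t0\t1\t60\t5\t64\t1e-04\t40\n"
)
unmatched = contigs_without_hit(["contigA", "contigB", "contigC"], example_hits)
```

Here `contigB` counts as unmatched because its only hit is weaker than the cutoff, which mirrors the point made above that short contigs are less likely to reach a low E-value.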
To validate the comparison of trichome reads against the TAs experimentally, five genes with a high enrichment of trichome ESTs (Table II) were chosen for reverse transcription (RT)-PCR analysis (Fig. 2). A putative metallocarboxypeptidase inhibitor and a gene of unannotated function (TA24712) represent the class of genes that are very highly expressed in the trichome based on the number of 454 reads. A second category is represented by lipoxygenase C and hydroperoxide lyase (HPL). These together catalyze the production of volatile aldehydes and are known to be expressed in various tissues, including leaf, flower, and fruit (Howe et al., 2000;Chen et al., 2004), consistent with the presence of ESTs in the TA assemblies. Interestingly, these were not previously reported to be active in trichomes, yet our data show that they have trichome-enriched mRNA accumulation. We also analyzed a putative WRKY transcription factor gene that appears to be modestly expressed yet is enriched in trichome ESTs. RT-PCR was used to compare expression in isolated stem and petiole trichomes with that in stems and petioles with their trichomes removed (Fig.  2). These results confirmed that all five genes have relatively high expression in trichomes compared with the shaved tissue, despite their broad range of expression levels in trichomes.

Shotgun Proteomics Analysis of Mixed-Type Trichomes and Isolated Type VI Glands
The availability of a large number of tomato cDNA contigs from varied tissue types created a DNA sequence data set for use in shotgun proteomics analysis. Protein was purified from trichomes isolated from stem and petiole tissues of 3-week-old tomato plants, the same developmental stage used for cDNA sequencing (Fig. 1A). Mixed-type trichome samples, including stalk and gland cells free from detectable underlying epidermal pavement cells, were collected in bulk by gently abrading stem and petiole sections in liquid nitrogen (Fig. 1B). Type VI glands, which are four-cell clusters that are easily removed from the stalk cell, were size selected using nylon mesh screens and could be collected at greater than 98% purity based on microscopy (Fig. 1C). Mixed-type trichome and type VI gland protein samples were analyzed using a gel-enhanced liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach (Lee and Cooper, 2006).
A custom protein sequence database was created using translated sequences from the assembly of the M82 mixed-type and type VI trichome 454 reads to analyze the proteomics data. Mascot (www.matrixscience.com) searches were performed using the translated DNA sequence followed by validation using the Scaffold software package (www.proteomesoftware.com), which uses the PeptideProphet and ProteinProphet algorithms (Keller et al., 2002; Nesvizhskii et al., 2003). A stringent set of protein sequence validation criteria was employed to reduce the rate of false protein discovery: the protein identification threshold was set to 99% probability and a minimum of two peptides identified with at least 95% peptide probability.
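The validation criteria stated above can be expressed as a simple predicate. This sketch only illustrates the thresholds (99% protein probability, at least two peptides at 95% peptide probability or better). It is not how Scaffold is implemented, and the `ProteinHit` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    """Hypothetical record for one candidate protein identification."""
    accession: str
    protein_probability: float          # 0.0 to 1.0
    peptide_probabilities: List[float]  # one value per identified peptide


def passes_filter(hit: ProteinHit,
                  min_protein_prob: float = 0.99,
                  min_peptide_prob: float = 0.95,
                  min_peptides: int = 2) -> bool:
    """Apply the thresholds quoted in the text to a single candidate."""
    confident = [p for p in hit.peptide_probabilities if p >= min_peptide_prob]
    return (hit.protein_probability >= min_protein_prob
            and len(confident) >= min_peptides)


# Hypothetical candidates illustrating each way a hit can pass or fail.
candidates = [
    ProteinHit("contig_1", 0.999, [0.99, 0.97, 0.60]),  # passes
    ProteinHit("contig_2", 0.999, [0.99, 0.80]),        # only one confident peptide
    ProteinHit("contig_3", 0.950, [0.99, 0.99]),        # protein probability too low
]
accepted = [hit.accession for hit in candidates if passes_filter(hit)]
```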
A total of 1,973 proteins were identified from the combined total and type VI trichome data sets using the 454 sequence database (Supplemental Table S3). A significant number of these proteins are represented more than once due to disjointed contigs of the same gene. This is because the short reads from the 454 sequencer did not assemble as well as longer Sanger sequenced ESTs. A manual inspection was done for those contigs that had a BLASTN match to the same TA, and spectral counts were combined when multiple records represented the same protein. This resulted in condensing the list to 1,552 proteins identified using the 454 contigs (Supplemental Table S4). Of the 1,552 proteins identified, 1,360 were found in both samples, with 67 proteins only in the type VI trichome sample and 125 specific to the mixed-type preparation. The high degree of overlap between the two data sets is not surprising, as type VI glands make up a significant fraction of cells in the mixed-type trichome sample (Kang et al., 2010).
Figure 2. ... (−), and stem and petiole tissue with intact trichomes (+). Expression was analyzed for genes with ESTs enriched in trichome tissue compared with the TIGR TA ESTs. LoxC, Lipoxygenase C; MCPI, putative metallocarboxypeptidase inhibitor. Translation elongation factor-1α (EF-1α) was used as a constitutive control, and small subunit of Rubisco (ssRBC; TA17646_4081) was included as a control not expected to be highly expressed in trichome tissue. All reactions were done for 20 cycles except for WRKY, which was 25 cycles.
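The condensing of redundant contig-level identifications and the comparison between the two samples can be sketched as follows. In the study the merging was done by manual inspection, so this is only an illustration of the bookkeeping. The contig identifiers, TA identifiers, and spectral counts are hypothetical; the final line checks that the three categories reported in the text sum to the stated total.

```python
from collections import defaultdict


def merge_spectral_counts(contig_counts, contig_to_ta):
    """Sum spectral counts of contigs that map to the same transcript assembly (TA).

    Contigs without a TA assignment are kept under their own identifier.
    """
    merged = defaultdict(int)
    for contig, count in contig_counts.items():
        merged[contig_to_ta.get(contig, contig)] += count
    return dict(merged)


def partition_by_sample(sample_a, sample_b):
    """Split identifications into shared and sample-specific sets."""
    a, b = set(sample_a), set(sample_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical example: two contigs that represent the same TA are merged.
counts = {"contig_1": 12, "contig_2": 5, "contig_3": 7}
ta_map = {"contig_1": "TA_100", "contig_2": "TA_100"}
merged = merge_spectral_counts(counts, ta_map)

overlap = partition_by_sample(["TA_100", "contig_3"], ["TA_100"])

# Consistency check on the numbers reported in the text.
reported_total = 1_360 + 67 + 125
```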

Correlation of Trichome Chemistry with mRNA and Protein Accumulation
The phenylpropanoid pathway-derived compounds rutin and chlorogenic acid are detected in extracts from mixed-type trichomes, with higher MS signal intensity for rutin (mass-to-charge ratio [m/z] 609; Fig. 3A). Similar quantitative differences were recently reported for these compounds in tomato leaf trichomes (Kang et al., 2010). We tested the quality of the transcriptome and proteome data by quantifying the number of ESTs and spectral counts for the enzymes in these biosynthetic pathways. Spectra matching peptides for the first seven enzymes leading from Phe to rutin were found (Fig. 3B). In contrast, no peptides for the known enzymes of chlorogenic acid synthesis were observed when searching the proteomics data using the M82 trichome EST assemblies. Thus, the relative abundance of these metabolites in these tomato trichomes predicts our ability to detect the mRNAs and peptides encoding their biosynthetic enzymes.
Tomato trichomes accumulate polyphenol oxidase (PPO) protein and activity in addition to the production of phenolic compounds that are PPO substrates (Yu et al., 1992). These compounds are thought to be polymerized by PPO after a trichome gland is ruptured by an insect (Tingey, 1991). Tomato PPO is encoded by a seven-member gene family (Thipyapong et al., 1997). Of these, transcripts (454 reads) and proteins (MS/MS spectra) matching PPO-E and PPO-F were detected in the mixed-type trichome and type VI gland data (Supplemental Tables S1 and S3). This is consistent with in situ hybridization demonstrating the presence of PPO-E and PPO-F transcripts in type I and VI trichomes (Thipyapong et al., 1997).
Previous work on trichomes demonstrated that isopentenyl diphosphate and dimethylallyl diphosphate are synthesized primarily through the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway in plastids rather than via the mevalonate pathway in the cytosol (Iijima et al., 2004a; Xie et al., 2008). In the plastids of M82 tomato trichomes, isopentenyl diphosphate and dimethylallyl diphosphate are used by Neryl Diphosphate Synthase1 enzyme, leading to the synthesis of monoterpenes, the major volatile isoprenoid class detected in glands (Schilmiller et al., 2009). Consistent with the plastidic synthesis of trichome monoterpenes, ESTs and spectra matching the proteins of the MEP pathway are significantly more abundant than those of the mevalonate pathway (Fig. 4).
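Figure 4 reports protein abundance as normalized spectral abundance factors (NSAF). This section does not spell out the calculation, so the sketch below uses the commonly used definition as an assumption: each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios over all proteins in the sample. The protein names, counts, and lengths are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Compute NSAF values under the commonly used definition (an assumption here).

    NSAF_k = (SpC_k / L_k) / sum_i (SpC_i / L_i)
    """
    saf = {}
    for protein, count in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"length of {protein} must be positive")
        saf[protein] = count / length
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted, so NSAF is undefined")
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical proteins: equal spectral counts, different lengths.
values = nsaf({"protA": 10, "protB": 10}, {"protA": 100, "protB": 200})
```

Dividing by length before normalizing keeps long proteins, which yield more peptides, from appearing more abundant than short ones with the same spectral count.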

Using Combined Genomics, Proteomics, and Metabolite Data for Enzyme Discovery
The availability of EST, proteomics, and metabolite data allows for a detailed analysis of trichome biosynthetic pathways. As shown in Table III, a search of the trichome EST database revealed high-level expression of the gene for the previously characterized tomato sesquiterpene synthase SSTLE1 (AF279453; 0.6% of trichome ESTs). This gene encodes a protein that catalyzes the production of germacrene C in vitro (Colby et al., 1998;van der Hoeven et al., 2000). Mass spectra from both the mixed-type trichome and the type VI proteomics samples matched this contig (Table  III). Another contig (M01000000371) that is 91% identical to SSTLE1 at the nucleotide level was also found in the trichome EST database. Similar to SSTLE1, M01000000371 mRNA accumulated to a much higher level in trichomes than in nontrichome tissues; this gene represents only 0.0005% of the TA ESTs; however, it represents 0.11% of the trichome ESTs (Table III).
Previous studies showed that sesquiterpene production requires genes located on chromosome 6, and SSTLE1 was mapped to this region (van der Hoeven et al., 2000). A search of the tomato genome scaffold sequences (http://solgenomics.net/, version 1.0) using M01000000371 revealed a scaffold sequence containing this gene separated from the SSTLE1 sequence by approximately 38 kb. The presence of SSTLE1 along with other markers on the scaffold indicates that M01000000371 is also located on chromosome 6, within regions 6-2 and 6-2-2 in the tomato cv M82 × Solanum pennellii LA0716 introgression lines (Zamir, 1994, 1995). This region is known to control sesquiterpene production in tomato (van der Hoeven et al., 2000; Schilmiller et al., 2010).
While M01000000371 was well represented in the trichome EST database, it was surprising that no mass spectra matched this contig in the proteomics data. However, as shown in Table III, the majority of the reads in contig M01000000371 were from type VI gland cDNA prepared from leaf trichomes rather than stem. It was of interest to ask whether the observed difference in expression between stem and leaf trichomes resulted in altered terpene accumulation. In fact, the sesquiterpenes β-caryophyllene and α-humulene were nearly undetectable in stem type VI glands, whereas leaf type VI glands produced significant levels of these compounds (Table IV). In contrast, the levels of δ-elemene (the compound that is detected due to thermal degradation of germacrene C in the gas chromatograph) were the same for stem and leaf. These data strongly suggested that M01000000371 encodes a sesquiterpene synthase producing β-caryophyllene and α-humulene in leaf type VI glands.
Figure 4. EST abundance and proteomics-normalized spectral abundance factors (NSAF) for steps of the MEP and mevalonate pathways in type VI trichomes. EST counts (percentage of the total type VI ESTs) and NSAF for genes and proteins of the MEP pathway (A) are more abundant than those of the mevalonate pathway (B) in type VI trichomes. DXS, 1-Deoxy-D-xylulose-5-phosphate synthase; DXR, 1-deoxy-D-xylulose-5-phosphate reductoisomerase; IspD, 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase; IspE, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; IspF, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; IspG, 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase; IspH, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase; IDI, isopentenyl/dimethylallyl diphosphate isomerase; IPP, isopentenyl diphosphate; DMAPP, dimethylallyl diphosphate; HMGS, 3-hydroxy-3-methylglutaryl-CoA synthase; HMGR, HMG-CoA reductase; MVK, mevalonate kinase; PMVK, diphosphomevalonate kinase; MPD, diphosphomevalonate decarboxylase.
To test this hypothesis, a His-tagged version of the predicted open reading frame was expressed in Escherichia coli. The affinity-purified enzyme was incubated with E,E-farnesyl diphosphate (FPP), and the reaction products were analyzed by gas chromatography (GC)-MS. Two peaks were detected and identified as β-caryophyllene and α-humulene by comparison of retention times and mass spectra with authentic standards (Fig. 5; Supplemental Fig. S1). The in vitro reaction products were also identical to sesquiterpenes from analysis of tomato leaf-disc head space (Fig. 5; Supplemental Fig. S1). Together, these data show that this enzyme, which we designate as CAHS (for β-caryophyllene/α-humulene synthase; GenBank accession no. GU647162), is responsible for β-caryophyllene and α-humulene production in leaf type VI trichomes.

DISCUSSION
Plants make tens of thousands of structurally diverse specialized metabolites, with known biological roles ranging from plant stress tolerance to pharmaceutically important drugs. Production and storage of these compounds are often limited to specific subcellular compartments, specialized cells, or tissue types. This allows the metabolites to be sequestered (e.g. toxic latex in laticifers) and permits them to exert their biological functions where and when needed (e.g. hydrolysis of glucosinolates to toxic products by myrosinase upon tissue damage by chewing insects). However, these specialized cell types can be difficult to isolate and study. In contrast, secreting glandular trichomes are easy-to-isolate specialized epidermal appendages that make large amounts of specific classes of specialized metabolites. This makes them excellent tissues for systematic investigation of these biosynthetic pathways by analyzing metabolite accumulation and gene expression patterns. In this study, we took advantage of the ease of gland isolation to perform deep EST sequencing by 454 pyrosequencing and shotgun proteomics to comprehensively analyze gene expression in tomato type VI and mixed-type trichomes.
This deep analysis of the trichome transcriptome and proteome revealed details of several well-characterized pathways of specialized metabolism. For example, proteins were identified for all eight steps of the MEP pathway, which produces precursors for isoprenoid biosynthesis (Fig. 4). Similarly, cDNAs and proteins were found for nearly all known steps leading to the synthesis of rutin (Fig. 3). These examples validate the hypothesis that the EST and proteomics data should represent the majority of enzymes for pathways active in trichome cells. Thus, these data, coupled with metabolite profiling to reveal classes of compounds that accumulate in the glands, will be useful for discovering new enzymes and regulators of specialized metabolic pathways.
The EST and proteomics results revealed that the previously characterized enzymes of short-chain oxylipin biosynthesis are strongly expressed in tomato trichomes. These oxidized lipid products are known to be part of the "leafy green" smell associated with cut grass (Howe and Schilmiller, 2002). The mRNAs for the enzymes that produce aldehydes (lipoxygenase and HPL) are known to accumulate in several tomato tissues, including modest expression of HPL in whole stems (Heitz et al., 1997; Howe et al., 2000). Our results indicate that this low-level expression in whole stems is the result of strong trichome expression and weak or no expression in nontrichome stem tissue (Fig. 2). Synthesis of volatile aldehydes on the surface of the plant is consistent with the involvement of these oxylipins in interactions with insects and pathogens (Matsui, 2006).
Table IV. Terpene levels in type VI glands from leaf and stem tissue, which is the same developmental stage as used for EST sequencing and proteomics. Amounts shown are averages and SD from four replicates.
The combination of trichome EST and proteomics data analysis with targeted metabolite measurements revealed the function of a previously uncharacterized tomato trichome sesquiterpene synthase. Several lines of evidence led to the hypothesis that CAHS controls β-caryophyllene and α-humulene synthesis. First, β-caryophyllene and α-humulene are produced at significantly higher levels in leaf trichomes compared with stem trichomes, which correlates with differences in EST representation in the glands of these organs. Second, analysis of tomato genomic DNA assemblies revealed that CAHS is located on chromosome 6 in a region previously shown to control sesquiterpene production in tomato (van der Hoeven et al., 2000; Schilmiller et al., 2010). In vitro assays of the recombinant protein showed that it converts FPP to the sesquiterpenes β-caryophyllene and α-humulene in ratios similar to those found in leaf trichomes (Fig. 5; Supplemental Fig. S1).
Recently published results showed that virus-induced gene silencing using a SSTLE1 sequence resulted in reduction of accumulation of all sesquiterpenes, including β-caryophyllene and α-humulene (Besser et al., 2009). Our results are consistent with the hypothesis that the sequence used for the virus-induced gene silencing experiment resulted in suppression of both SSTLE1 and CAHS, presumably due to the high identity of the silencing sequence used (94% at the nucleotide level). We propose that tomato trichome expression of SSTLE1 normally results in germacrene accumulation, whereas CAHS is responsible for β-caryophyllene and α-humulene.
An important insight from this work is that there are differences in metabolite accumulation and gene expression of type VI glands between leaf and stem (Tables III and IV) despite the observation that they are morphologically indistinguishable. These differences indicate that the developmental state of the surrounding tissue influences gene expression and metabolite accumulation in gland cells. Thus, development of methods for analyzing the chemistry and gene expression profiles of individual gland cells will advance the discovery of metabolic pathways and physiological roles of these morphologically indistinguishable cell types. These results reinforce the importance of developing comprehensive data sets of mRNA, protein, and metabolite accumulation across tissues and developmental stages.
Identification of approximately 1,500 proteins from two shotgun proteomics experiments using trichome 454 EST assemblies confirms the power of combining deep mRNA and protein sequencing for plants lacking substantial genomics data resources. Improvements in assembly and annotation of the tomato genome will increase the utility of the proteomics data, for example, by revealing proteins encoded by low-abundance mRNAs. Our results also reinforce the value of combining mRNA and protein sequencing with metabolite analysis for the discovery of new metabolic enzymes. Finally, these deep sequence resources will be useful in gene model validation and annotation of the emerging tomato genome sequence.

Plant Growth and Trichome Isolation
Tomato (Solanum lycopersicum 'M82') seed was germinated on moistened filter paper and then transferred to Jiffy peat pots (Hummert). Plants were grown in a growth chamber maintained under 16 h of light (250 μE m⁻² s⁻¹) at 28°C and 8 h of dark at 20°C. Mixed-type trichomes from petioles and stems of 21-d-old plants were isolated by gently scraping tissue frozen in liquid nitrogen with a flat-end plastic spatula. Glandular heads from type VI trichomes were collected using a procedure modified from Gang et al. (2001). After removal of leaflets, petiole and stem tissue from 21-d-old plants was quickly collected into a 500-mL volume glass bottle containing ice-cold gland isolation buffer (50 mM Tris-Cl, pH 7.5, 200 mM sorbitol, 20 mM Suc, 10 mM KCl, 5 mM MgCl2, 0.5 mM K2HPO4, 5 mM succinic acid, 1 mM EGTA, and 14 mM β-mercaptoethanol) with 0.5-mm glass beads. Type VI glands were removed by shaking the bottle by hand two times for 30 s each with a 1-min period on ice between shakings. The material was filtered first through a 350-μm nylon mesh followed by a 100-μm mesh to remove stem and petiole tissue and glass beads. The type VI glands were collected onto a 40-μm mesh and washed with isolation buffer. To minimize collection time, the single-layer nylon filters were clamped in embroidery hoops and held one over the other using a ring stand to allow quick sequential filtering. After a brief spin in a microcentrifuge to concentrate gland cells, the buffer was removed using a pipette and the pellet was frozen in liquid nitrogen.

Metabolite Analysis
For analysis of total stem trichomes from 3-week-old M82 plants, stem sections (approximately 2 cm long) were dipped in 1 mL of acetonitrile:isopropanol:water (3:3:2, v/v) for 1 min with gentle rocking. LC-MS analysis of samples was performed as recently described (Gu et al., 2010). Terpenes were analyzed from type VI trichomes on stems and leaves of 3-week-old plants by picking glands using a pulled Pasteur pipette and dipping into tert-butyl methyl ether solvent containing 10 ng mL⁻¹ tetradecane as internal standard. Each sample consisted of 200 glands picked into 100 μL of solvent. Samples were analyzed by GC-MS by injection onto a DB-5 column (Agilent; 10 m length, 0.33 μm film thickness, 100 μm internal diameter) on a 6890 gas chromatograph (Agilent) coupled to a 5975b mass spectrometer (Agilent). Injector temperature was 280°C working in splitless mode. The GC temperature program was as follows: 40°C for 1 min, 30°C min⁻¹ to 90°C, 5°C min⁻¹ to 110°C, 40°C min⁻¹ to 165°C, 5°C min⁻¹ to 180°C, 40°C min⁻¹ to 320°C, hold for 2 min. The mass spectrometer was operated in selected ion monitoring mode for m/z 85 and 93. Amounts of compounds were normalized to a tetradecane internal standard and quantified using an external standard curve of γ-terpinene.
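The length of a GC run follows directly from the temperature program given above. The sketch below sums the hold times and the time spent on each ramp. It assumes ideal linear ramps with no equilibration or cool-down time, and the function name is illustrative rather than part of any instrument software.

```python
def program_runtime(initial_temp, initial_hold, ramps, final_hold):
    """Total run time in minutes for a multi-ramp oven program.

    ramps is a sequence of (rate in degrees C per min, target temperature in degrees C).
    Assumes ideal linear heating ramps with no equilibration time.
    """
    total = initial_hold + final_hold
    temp = initial_temp
    for rate, target in ramps:
        if rate <= 0 or target <= temp:
            raise ValueError("each ramp must heat at a positive rate")
        total += (target - temp) / rate
        temp = target
    return total


# Temperature program as stated in the text.
RAMPS = [(30, 90), (5, 110), (40, 165), (5, 180), (40, 320)]
runtime_min = program_runtime(initial_temp=40, initial_hold=1,
                              ramps=RAMPS, final_hold=2)
```

Under these assumptions the program takes roughly 16.5 min per injection, most of it in the two slow 5°C min⁻¹ segments.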

cDNA Library Construction and Sequencing
Frozen trichomes were ground to a powder in liquid nitrogen using a mortar and pestle. RNA was extracted using the RNeasy Plant mini kit (Qiagen) according to the manufacturer's instructions and including the on-column DNaseI treatment step. RNA quality was assessed using the Agilent 2100 Bioanalyzer RNA chip (Agilent Technologies). Preparation and sequencing of cDNA from the mixed-type trichome sample were described previously (Schilmiller et al., 2009). Preparation of cDNA from total RNA isolated from type VI glands was done using a slightly modified protocol from the SMART cDNA library construction kit (Clontech). First-strand cDNA was synthesized from 1 μg of total RNA as described in the kit protocol. For first-strand synthesis, a modified oligo(dT) primer (5′-TAGAGGCCGAGGCGGCCGACATGTTTTGTTTTTTTTTCTTTTTTTTTTVN-3′, where V = A, C, or G and N = A, T, C, or G) was used in a 10-μL reaction using SuperScript II reverse transcriptase (Invitrogen) and the SMART IV primer from the SMART cDNA library construction kit. Double-stranded cDNA was synthesized as described in the SMART cDNA kit protocol using the modified oligo(dT) primer and the 5′ PCR primer from the kit. Amplified cDNA was purified using Qiagen QIAquick PCR purification spin columns and pooled to give 5 μg of cDNA. Preparation of type VI gland trichome cDNA for sequencing using the GS-FLX sequencer was done by the Michigan State University Research Technology Support Facility according to the manufacturer's protocol. Reads from both the mixed-type trichome library and the type VI gland library were processed and trimmed to remove low-quality and primer sequences using SeqClean (Pertea et al., 2003). The cleaned reads were assembled using CAP3 (Huang and Madan, 1999) followed by a second round of CAP3 to cluster the first-pass contigs and singletons.
First-round CAP3 parameter settings for percentage match, overlap length, maximum overhang percentage, gap penalty, and base quality cutoff for clipping were -p 90, -o 50, -h 15, -g 2, and -c 17, respectively. For the second round of CAP3, -o was changed to 100 and all other parameters were left unchanged. A translated BLAST (BLASTX) search was then performed with the final contigs against the nonredundant and green plants databases at NCBI.
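The two assembly rounds can be sketched as command lines built from the parameter values above. This is a minimal illustration, not the authors' pipeline: the input file names are hypothetical, and it is assumed that the second-round input is a FASTA file combining the first-round contigs and singletons. The commands are only assembled and printed, not executed.

```python
import shlex

# Flag values quoted in the text. File names below are hypothetical.
ROUND1 = {"-p": 90, "-o": 50, "-h": 15, "-g": 2, "-c": 17}
# Second round: identical except for the overlap length cutoff.
ROUND2 = {**ROUND1, "-o": 100}


def cap3_command(fasta_path, params):
    """Build a CAP3 argument list: program, reads file, then options."""
    cmd = ["cap3", fasta_path]
    for flag, value in params.items():
        cmd += [flag, str(value)]
    return cmd


# Hypothetical inputs: cleaned reads for round 1; pooled contigs and
# singletons from round 1 for round 2.
first = cap3_command("cleaned_reads.fasta", ROUND1)
second = cap3_command("round1_contigs_and_singlets.fasta", ROUND2)

print(shlex.join(first))
print(shlex.join(second))
```

Passing either list to `subprocess.run` would launch the assembly, provided a `cap3` executable is on the path.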

Protein Extraction and Proteomics Analysis
Protein was extracted from frozen ground trichomes with warm (80°C) buffer consisting of 100 mM Tris-Cl (pH 6.8), 150 mM NaCl, 10% glycerol, 4% SDS, and 200 mM dithiothreitol supplemented with Complete Mini EDTA-free proteinase inhibitors (one tablet per 10 mL; Roche). Lipophilic contaminants that would interfere with electrophoresis were removed by chloroform-methanol extraction of the solubilized protein. Protein pellets from the chloroform-methanol extraction were resuspended in extraction buffer, and protein concentrations were determined using the Coomassie Plus (Pierce) protein assay. Trichome proteins (mixed type, 120 μg; type VI, 190 μg) were separated on a 4% to 15% one-dimensional SDS-PAGE gradient gel (18-well; Criterion; Bio-Rad). The lane was excised and cut into 12 4-mm slices followed by in-gel trypsin digestion (Shevchenko et al., 1996). The extracted peptides were then injected by a Waters nanoAcquity Sample Manager and loaded for 5 min onto a 5-μm particle, 180-μm × 20-mm Waters Symmetry C18 peptide trap at 4 μL min⁻¹ in 2% acetonitrile/0.1% formic acid. The bound peptides were then eluted onto a 1.7-μm particle, 100-μm × 100-mm Waters BEH C18 nanoAcquity column and eluted over 240 min with a gradient of 5% B to 35% B over a period of 205 min, ramping to 90% B from 205 to 215 min, and immediately back to 5% B using a Waters nanoAcquity ultra-HPLC system (buffer A = 99.9% water/0.1% formic acid, buffer B = 99.9% acetonitrile/0.1% formic acid) with a flow rate of 300 nL min⁻¹. Column eluate was introduced into a ThermoScientific linear ion trap Fourier transform ion cyclotron resonance mass spectrometer outfitted with a Thermo Nanospray I spray source. Survey scans were taken in the Fourier transform ion cyclotron resonance cell at 25,000 resolution at m/z 400, and the top 10 ions in each survey scan were subjected to automatic low-energy collision-induced dissociation in the linear ion trap.
The resulting MS/MS spectra were converted to peak lists using BioWorks Browser version 3.2 (ThermoFisher) and searched against an in-house-generated tomato cv M82 trichome cDNA database (http://bioinfo.bch.msu.edu/trichome_est) using the Mascot searching algorithm (version 2.2; Matrix Science). The database employed for this analysis contained the assembled M82 mixed-type trichome and type VI trichome reads. Searches allowed for the identification of carbamidomethyl Cys as a fixed peptide modification and Met oxidation as a variable modification. Up to two missed tryptic cleavages were allowed. Peptide tolerance was set to 10 ppm, and MS/MS tolerance to 0.8 Da. The Mascot output was then analyzed using Scaffold (Proteome Software), which calculates the probability of protein identification based on ProteinProphet and PeptideProphet algorithms (Keller et al., 2002; Nesvizhskii et al., 2003). The protein and peptide identification thresholds were set to 99% and 95% probability, respectively, with a minimum of two peptides per protein required. Proteomics data were deposited to the NCBI Peptidome database with the accession number PSE138. A normalized spectral abundance factor for each protein was calculated to adjust for protein length according to Paoletti et al. (2006).
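The normalized spectral abundance factor (NSAF) divides each protein's spectral count by its length and then normalizes by the sum of these ratios over all proteins in the sample. The sketch below illustrates that calculation only; the protein identifiers, spectral counts, and lengths are hypothetical, and it is assumed that length is measured in amino acids of the translated contig.

```python
def nsaf(spectral_counts, lengths):
    """Compute NSAF per protein.

    SAF_i  = spectral count_i / length_i
    NSAF_i = SAF_i / sum of SAF over all proteins in the sample
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Hypothetical example data (not from the article).
counts = {"contig_A": 30, "contig_B": 10, "contig_C": 5}
lengths = {"contig_A": 600, "contig_B": 100, "contig_C": 250}

for protein, value in nsaf(counts, lengths).items():
    print(f"{protein}: NSAF = {value:.4f}")
```

In this made-up example the short protein contig_B ranks highest despite having fewer spectra than contig_A, which is the length adjustment the normalization is meant to provide.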

RT-PCR Analysis
Total RNA was extracted as described above from mixed-type trichomes isolated from stem and petiole tissues, stem and petiole with intact trichomes, and stem and petiole tissue after removal of trichomes by rubbing frozen tissue by hand with a latex laboratory glove. RT of 1 μg of total RNA was performed using SuperScript II (Invitrogen) and an oligo(dT)12-18 primer. PCR amplification of select transcripts was performed using gene-specific primers (Supplemental Table S5) designed using GenBank or TIGR tomato TA (http://plantta.jcvi.org/) sequences. All reactions were performed for 20 cycles except for WRKY, where 25 cycles were used.

CAHS Expression and Enzyme Assay
The CAHS open reading frame (GenBank accession no. GU647162) was amplified using the forward primer 5'-ATGGCTAGCGCTTCTTCTTCTGCT-3', which introduces an NheI site, and the reverse primer 5'-AAGCTTTCATATTTCGACAGACTCA-3', which creates a HindIII site. The amplified fragment was subcloned into pGEM-T Easy (Promega) for sequence verification, followed by subcloning into the NheI and HindIII sites of the expression vector pET28b (Novagen), creating a fusion open reading frame with an N-terminal 6× His tag. The construct was mobilized into Escherichia coli Rosetta-gami 2 (Novagen) cells, and His-tagged recombinant CAHS was purified as follows. A 500-mL culture was grown in terrific broth medium supplemented with 12.5 μg mL⁻¹ tetracycline, 50 μg mL⁻¹ streptomycin, 34 μg mL⁻¹ chloramphenicol, and 15 μg mL⁻¹ kanamycin at 37°C with shaking at 200 rpm to an optical density at 600 nm of 0.4. Expression of CAHS was induced with 0.1 mM isopropylthio-β-galactoside, and the culture was grown for 20 h at 25°C with shaking at 120 rpm. Cells were harvested by centrifugation and resuspended in lysis buffer (50 mM HEPES, pH 8.0, 5% glycerol, 100 mM KCl, and 7.5 mM MgCl2) followed by sonication. His-tagged CAHS was purified from the cleared lysate by nickel affinity chromatography (Qiagen) according to the manufacturer's protocols. Protein measurements were performed using a bicinchoninic acid assay (Pierce) using bovine serum albumin as a standard.
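The restriction sites carried by the two cloning primers can be checked with a simple string search. This is a minimal sketch: the primer sequences are those given above (ignoring any line-break hyphen), the recognition sequences are the standard ones for NheI (GCTAGC) and HindIII (AAGCTT), and the function and variable names are illustrative.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NheI": "GCTAGC", "HindIII": "AAGCTT"}

# Primer sequences as given in the text, written 5' to 3'.
forward = "ATGGCTAGCGCTTCTTCTTCTGCT"
reverse = "AAGCTTTCATATTTCGACAGACTCA"


def find_sites(primer, sites=SITES):
    """Return {enzyme: 0-based start index} for each site found in the primer."""
    return {
        name: primer.find(seq)
        for name, seq in sites.items()
        if seq in primer
    }


print("forward:", find_sites(forward))
print("reverse:", find_sites(reverse))
```

Each primer carries only its intended site: NheI immediately after the ATG in the forward primer, and HindIII at the 5' end of the reverse primer.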
For terpene product identification, purified CAHS (2 μg) was incubated in a sealed glass vial in lysis buffer containing 10 mM FPP in a final volume of 100 μL at 30°C for 30 min. The gas phase of the glass vials was then extracted with a solid-phase microextraction fiber (polydimethylsiloxane/divinylbenzene; 65 μm; Supelco) for 5 min at 42°C and analyzed by GC-MS. The GC analysis was as described above with modifications as follows: inlet set at 250°C; and oven at 50°C for 0.5 min, 40°C min⁻¹ to 250°C, hold for 0.3 min. The mass spectrometer was set to scan m/z range 45 to 350. Blanks with buffer without enzyme, boiled enzyme controls, and commercial standards were also run to confirm the identity of peaks.
Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers GU647162 (CAHS), SRP000732 (454 sequences), and PSE138 (proteomics data).

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Mass spectra of sesquiterpene standards and reaction products.
Supplemental Table S1. List of assembled 454 contigs and associated annotation, closest TIGR TA BLASTN match, number of ESTs in TA, number of reads in 454 contig, percentage of total ESTs in each assembly, and ratio of percentage totals for each assembly.
Supplemental Table S2. List of the 3,052 contigs from 454 sequencing that did not have a significant BLASTN match to a TIGR TA (E-value < 1e-10).
Supplemental Table S3. List of 1,973 proteins identified from total and type VI proteins using the assembled 454 contigs.
Supplemental Table S4. List of 1,552 proteins after manually combining records where 454 contigs matched the same TA.
Supplemental Table S5. Primers used for expression analysis by RT-PCR.