ppdb: plant promoter database version 3.0

ppdb (http://ppdb.agr.gifu-u.ac.jp) is a plant promoter database that provides information on transcription start sites (TSSs), core promoter structure (TATA boxes, Initiators, Y Patches, GA and CA elements) and regulatory element groups (REGs) as putative and comprehensive transcriptional regulatory elements. Since the last report in this journal, the database has been updated in three areas to version 3.0. First, new genomes have been included in the database, and now ppdb provides information on Arabidopsis thaliana, rice, Physcomitrella patens and poplar. Second, new TSS tag data (34 million) from A. thaliana, determined by a high throughput sequencer, has been added to give a ∼200-fold increase in TSS data compared with version 1.0. This results in a much higher coverage of ∼27 000 A. thaliana genes and finer positioning of promoters even for genes with low expression levels. Third, microarray data-based predictions have been appended as REG annotations which inform their putative physiological roles.


INTRODUCTION
Gene regulation is a central part of morphogenesis and environmental adaptation in higher plants, and it is controlled by the promoter of each gene. An understanding of promoter structure is therefore crucial to understanding these fundamental processes in plants.
There are three aspects to promoter structure: (i) the position, direction and strength of the transcription start sites (TSSs) that indicate actual promoter position; (ii) the type and position of the core promoter elements such as TATA boxes and Initiators (Inrs) that are thought to be the major determinants of the direction and position of promoters and (iii) the type and position of transcriptional regulatory elements that are involved in gene regulation.
In our last report (1), we introduced the plant promoter database (ppdb), which provided promoter information about TSS clusters, core promoter elements [TATA boxes, Inrs, Y Patches, GA and CA elements (2,3)] and regulatory element groups [REGs, putative position-sensitive transcriptional regulatory elements that are extracted by local distribution of short sequences (LDSS) analysis (2)] as putative and comprehensive sets of transcriptional regulatory elements. The original version 1.0 of the database contained information on two plant species, Arabidopsis thaliana and rice.

MAJOR EXTENSIONS FROM VERSION 1.0
The major amendment in version 3.0 is the addition of the Physcomitrella patens and poplar genomes to the database. The sources used for the information of the four genomes, including A. thaliana and rice, are shown in Table 1. The promoter elements of the moss genome have been extracted by the LDSS method (2). During extraction, we noticed that considerable numbers of moss genes are driven by a similar type of promoter that is located within long terminal repeats. These promoters affect the extraction process due to tight sequence conservation that is not related to promoter function and for this reason they were excluded from the LDSS analysis. A. thaliana promoter elements have been applied to the poplar genome because the Brassicaceae and Malpighiales are phylogenetically close.
*To whom correspondence should be addressed. Tel: +81 58 293 2848; Email: yyy@gifu-u.ac.jp
A new function called 'Homologue Gene Search' has been added to facilitate the comparison of promoter structures of orthologous genes within a species or between different species. Orthologue groups have been determined by Gclust, a system that classifies orthologues according to the presence or absence of protein motifs (16).
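The idea behind grouping genes for promoter comparison can be sketched as follows. This is only an illustration under stated assumptions: the cluster IDs, gene IDs and function names below are hypothetical placeholders, and the code does not reproduce how Gclust assigns groups or how ppdb stores them.

```python
from collections import defaultdict

def group_by_cluster(assignments):
    """Group (gene_id, species, cluster_id) records by cluster ID.

    The record layout is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for gene_id, species, cluster_id in assignments:
        groups[cluster_id].append((species, gene_id))
    return dict(groups)

def homologues(gene_id, assignments):
    """Return the other members of the group that contains gene_id."""
    for members in group_by_cluster(assignments).values():
        if any(g == gene_id for _, g in members):
            return [(s, g) for s, g in members if g != gene_id]
    return []

# Hypothetical placeholder records, not real database entries.
EXAMPLE = [
    ("geneA_ath", "A. thaliana", "cluster_1"),
    ("geneA_osa", "rice", "cluster_1"),
    ("geneA_ppa", "P. patens", "cluster_1"),
    ("geneB_ath", "A. thaliana", "cluster_2"),
]

if __name__ == "__main__":
    print(homologues("geneA_ath", EXAMPLE))
```

Each returned gene would then be a candidate whose promoter page could be opened side by side with that of the query gene.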
New A. thaliana TSS data comprising 34 million tags, which corresponds to a 200-fold increase over the previous data, have been added (Figure 1). REG annotations have also been appended and show functional predictions based on microarray data of responses to plant hormones (AUX: auxin, BR: brassinosteroid, CK: cytokinin, ABA: abscisic acid, ET: ethylene, JA: jasmonic acid, SA: salicylic acid), responses to a hormone-like chemical (H2O2) and some environmental stress-related responses (drought, DREB1A overexpression) (7). Functional annotation of 53 of 308 REGs is now available in version 3.0 (Figure 2).

BROWSING PROMOTER STRUCTURE
The major function of ppdb is to give an indication of a possible promoter structure for each gene in a genome based on the established lists of LDSS-positive elements. The information can be directly called by the gene ID (e.g. AT1G67090 or Os01g0791600), or selected from a list of 'Keyword Search' or 'Homologue Gene Search'. Pages for individual genes show the following information: (i) DNA sequence, (ii) TSS distribution (direction and strength at a 1-bp resolution), (iii) core promoter structure and (iv) REG data.
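As a small aid for scripts that prepare gene IDs before a lookup, the two example IDs above can be told apart by their shape. The sketch below assumes that A. thaliana IDs follow the pattern of AT1G67090 and rice IDs follow the pattern of Os01g0791600; the regular expressions and the function name are assumptions made for illustration, and IDs for moss and poplar are not covered.

```python
import re

# Patterns inferred from the example IDs in the text (assumption):
# AT1G67090      -> "AT", chromosome (1-5, C or M), "G", five digits
# Os01g0791600   -> "Os", two-digit chromosome, "g", seven digits
ARABIDOPSIS_ID = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)
RICE_ID = re.compile(r"^Os\d{2}g\d{7}$", re.IGNORECASE)

def guess_species(gene_id):
    """Return a species label for a gene ID, or None if unrecognised."""
    gene_id = gene_id.strip()
    if ARABIDOPSIS_ID.match(gene_id):
        return "Arabidopsis thaliana"
    if RICE_ID.match(gene_id):
        return "rice"
    return None

if __name__ == "__main__":
    for example in ("AT1G67090", "Os01g0791600", "unknown"):
        print(example, guess_species(example))
```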
In the sequence window, promoter elements including REGs and core elements are highlighted in a position-dependent manner as the default setting. Note that promoters without any TSS information show no elements by default. To display the promoter elements of these genes, the 'Reliable' button should be clicked, which changes the state to 'All' (Figure 1, red arrow). This button is a toggle switch between 'Reliable' and 'All'. 'Reliable' is the default setting, in which only elements at appropriate positions relative to the peak TSS are detected. The 'All' setting removes the positional restriction on the display of promoter elements, allowing global detection. The sensitive area in the 'Reliable' mode for each element group is described on the front page of the database. The 'TSS tag distribution' columns in the 'Focused view' provide the expressional strength of each TSS. The expression value is the sum over six TSS tag libraries prepared from leaves, roots, inflorescences, etiolated seedlings, and shoots from low light-grown and high light-grown seedlings.
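The behaviour of the two display modes can be illustrated with a minimal sketch. It assumes a forward-strand gene, sums tag counts over libraries to find the peak TSS, and keeps an element in 'Reliable' mode only if its position relative to that peak falls inside a window. The window values, data layout and function names are hypothetical; the actual sensitive areas are those given on the front page of the database.

```python
from collections import Counter

# Hypothetical sensitive areas (bp relative to the peak TSS); not ppdb's values.
WINDOWS = {"TATA": (-45, -18), "Inr": (-2, 2), "REG": (-300, -20)}

def sum_tss_tags(libraries):
    """Sum tag counts per position over all libraries.

    libraries: {library_name: {position: tag_count}} (assumed layout).
    """
    total = Counter()
    for counts in libraries.values():
        total.update(counts)  # Counter.update adds counts
    return dict(total)

def peak_tss(total):
    """Position with the highest summed count (lowest position on ties)."""
    if not total:
        return None
    return min(total, key=lambda pos: (-total[pos], pos))

def visible_elements(elements, total, mode="Reliable", windows=WINDOWS):
    """Filter (kind, position) elements according to the display mode."""
    if mode == "All":
        return list(elements)  # no positional restriction
    peak = peak_tss(total)
    if peak is None:
        return []  # no TSS information: nothing shown by default
    kept = []
    for kind, pos in elements:
        low, high = windows[kind]
        if low <= pos - peak <= high:
            kept.append((kind, pos))
    return kept

if __name__ == "__main__":
    libs = {"leaf": {100: 3, 105: 1}, "root": {100: 2, 105: 1}}
    elems = [("TATA", 70), ("TATA", 10), ("Inr", 100)]
    total = sum_tss_tags(libs)
    print(visible_elements(elems, total, "Reliable"))
    print(visible_elements(elems, total, "All"))
```

With an empty tag table the 'Reliable' call returns nothing, which mirrors the case described above where switching to 'All' is needed to see any elements.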
The 'Core promoter information' table shows the presence or absence of core promoter elements (TATA boxes, Inrs, Y Patches, GA and CA elements).
The 'REG information' table shows a REG list together with the corresponding PPDB motifs (2,3) and PLACE motifs (6). REG sequences, as well as PPDB and PLACE motifs, are linked to other pages containing biological information. New REG annotations for A. thaliana obtained from predicted cis-regulatory elements based on microarray data (7) have been included (Figure 2). Selection of the 'All' button (Figure 1) adds another category, 'Not Reliable Promoter Summary', below 'Other Reliable Promoter Summary'. This category can be used when searching for regulatory elements (REGs) from wider regions or when there is no TSS information on the promoter of interest.

ADDITIONAL PAGES
A whole list of REGs for each of the genomes can be viewed by selecting a cell in the table of 'Index of Genes' at the top of the page. The lists present the relationships between REG ID, sequence, PPDB motifs, PLACE motifs and also functional annotations. Selection of a specific REG entry leads to 'Summary of the REG' and 'Entry Sequences' that show the whole gene lists containing the corresponding REG, together with gene annotations.