Brassica database (BRAD) version 2.0: integrating and mining Brassicaceae species genomic resources

The Brassica database (BRAD) was built initially to help users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released since its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences, are also provided. Furthermore, genome and annotation information has been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/


Introduction
Brassicaceae is a large eudicot family that includes the model plant Arabidopsis thaliana. The Brassicaceae family has a remarkable diversity of species, genetics and morphotypes, as well as scientific and economic importance. Brassicaceae species have become model systems for studies of polyploidy and evolution (1). The important genus Brassica of Brassicaceae contains many vegetable, condiment and oil species that account for about 12% of the world's edible vegetable oil production (http://faostat.fao.org/). U's triangle theory (2) has been applied to describe the relationships among six widely cultivated Brassica species, the diploids Brassica rapa (AA), B. nigra (BB) and B. oleracea (CC) and their allotetraploids B. juncea (AABB), B. napus (AACC) and B. carinata (BBCC). Of these, the B. rapa genome was the first to be sequenced in 2011 (3) and the original Brassica database was built based on it (4). BRAD version 1.0 (V1.0) provides B. rapa genome sequences and gene models, as well as all the syntenic and non-syntenic homologous gene pairs between B. rapa and A. thaliana. On all its pages, BRAD V1.0 incorporates a useful navigation dialog-window that provides links to every B. rapa and A. thaliana gene ID. The small navigation window directs users by integrating relevant resource links of the target gene. With the rapid development of next-generation sequencing technology and the dramatic decrease in cost, many Brassicaceae species have been sequenced or were planned to be sequenced after BRAD V1.0 was constructed. Recently, the genomes of B. rapa sister species, B. oleracea and B. napus, have been sequenced (5,6) and nine other Brassicaceae species have also been sequenced (7-13). These 13 Brassicaceae genome datasets are a valuable resource for genome and gene studies among the closely related Brassicaceae species.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

To help researchers and breeders use these recently released Brassicaceae genome sequences efficiently in scientific investigations and breeding applications, we have updated BRAD to version 2.0 (V2.0). BRAD V2.0 contains updated datasets and functions that include all syntenic gene pairs between A. thaliana and the other Brassicaceae species, more genome and gene sequences and gene annotations, as well as syntenic figures and genome visualization of all the incorporated Brassicaceae species in the Genome Browser (GBrowse) (14). BRAD V2.0 provides a comprehensive framework for comparative genomic analysis and studies of the evolution of gene function across Brassicaceae species, especially for the Brassica crops.

BRAD V2.0: feature updates

Overview of BRAD V2.0
In BRAD V1.0, datasets of genome and gene sequences, gene annotations, non-coding RNAs, transposable elements, genetic markers and linkage maps of B. rapa were provided (15,16). A navigation dialog-window for every gene of B. rapa and A. thaliana was provided to help users obtain all related information. Furthermore, BLAST and GBrowse tools (16) were embedded in BRAD for sequence alignment and for visualizing genomic elements, respectively. BRAD V1.0 has now been updated to V2.0 to include Brassicaceae genome sequences that have been released recently. In BRAD V2.0, a new section has been incorporated that shows genomic synteny and micro-fragmental synteny between any two Brassicaceae species. An alternative pairwise synteny plotting tool, the Generic Synteny Browser (GBrowse_syn) module (17) of GBrowse, has been included to visualize local synteny relationships among multiple genomes. Moreover, genome and gene sequences, gene annotations and syntenic and non-syntenic orthologs between A. thaliana and other Brassicaceae species have been integrated into different sections of BRAD V2.0.

Technical details
All genomic data were processed using the SynOrths tool (15) to generate genome and gene level synteny datasets. Then, syntenic figures were generated based on these synteny datasets and stored in a MySQL (18) database.
Genome sequences, gene models and the processed datasets, including all syntenic genes, gene annotation information and specific gene families, were all imported into MySQL, which enables multifaceted browsing and searching in BRAD. Furthermore, a standalone BLAST (19) service implemented in BRAD allows sequence searches against Brassicaceae genomes, protein-coding gene sequences and protein sequences. The GBrowse package, which is commonly used to visualize genomic datasets, remains in BRAD V2.0 to view bulk genomic elements of the Brassicaceae species. In addition, the syntenic datasets are not only provided as tabular results and pairwise-genome synteny images in the keyword search section, but are also visualized as a multiple genome synteny comparison in the GBrowse module GBrowse_syn.

BRAD stocks: Brassicaceae genomes
Statistics of the Brassicaceae genomic data, including genome sequences, predicted gene models, protein-coding gene sequences and protein sequences, are shown in Table 1. In total, about 4 Gb of data have been collected in BRAD V2.0. In addition to the original genome sequences and gene models, seven types of annotation for the predicted genes have been generated. The annotations have been sourced from the Swiss-Prot, TrEMBL (20), KEGG (Kyoto Encyclopedia of Genes and Genomes) (21), InterPro (22) and Gene Ontology (GO) (23) databases; syntenic genes and BLASTX alignments (best hit, e-value 1E-05) of Brassicaceae genes to the A. thaliana genome have also been included. The numbers of annotation records in these datasets for these species (excluding A. thaliana) are shown in Table 2. We used InterProScan (V48.0) (24), which includes 28 175 GO terms, to generate the InterPro domain and GO annotations. When InterProScan is updated, the GO annotations will also be updated in BRAD.
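The best-hit filtering mentioned above can be illustrated with a minimal Python sketch. It assumes BLAST tabular output in the standard 12-column layout (`-outfmt 6`), with the e-value in column 11 and the bit score in column 12. The text does not state which output format or tie-breaking rule was used, so the function name `best_hits`, the choice of bit score for ranking and the identifiers in the usage note are illustrative assumptions only.

```python
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} with one best hit per query.

    Assumes 12-column tab-separated BLAST output (qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).
    Ranking by bit score is an assumption, not the documented BRAD rule.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit is weaker than the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

For example, passing three rows in which a hypothetical query `geneA` has two hits below the cutoff and `geneB` has one hit with an e-value of 1e-3 would return a single entry for `geneA` (its higher-scoring hit) and no entry for `geneB`.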

Updated feature: genome synteny analysis
Genome synteny analysis provides information for studies into the evolution of genome and gene function among species. BRAD V1.0 provided syntenic gene pairs between B. rapa and A. thaliana so that the gene information of the well-studied model plant A. thaliana could be used to annotate B. rapa genes. In BRAD V2.0, whole-genome synteny relationships between A. thaliana genes and the genes of other Brassicaceae species have been generated and integrated. We obtained syntenic gene pairs that ranged from 17 800 between A. thaliana and Aethionema arabicum to 59 191 between A. thaliana and Camelina sativa (Table 3 and Supplementary Tables S1 and S2). The number of tandem gene arrays is shown in Table 4; most had syntenic counterparts in the A. thaliana genome. These datasets can be used to investigate genomic rearrangement history, share gene annotation information and investigate functional differentiation of orthologous genes among Brassicaceae species.
Brassica crops experienced a common and relatively recent (9-15 million years ago) whole-genome triplication event after three rounds of polyploidization (γ, β and α whole-genome duplication) in Brassicaceae (3,5,6,8,25). They have three subgenomes in their genomes compared with other Brassicaceae species. B. napus is the allotetraploid of B. rapa and B. oleracea, thus its genome is composed of six subgenomes. Additionally, C. sativa experienced an independent whole-genome triplication event that is more recent than the event in Brassica. Based on the rules that have been used to partition the three subgenomes of B. rapa (3,26), syntenic paralogous genes in the subgenomes of the four polyploid species mentioned above were separated and updated in BRAD V2.0. Syntenic gene pairs were plotted as dots on a two-dimensional figure, where the x and y axes denote the chromosomal positions of the genes in any two genomes. Continuously distributed syntenic genes in any two genomes generate dot plots with fragments of lines (Figure 2B). The dot-formed lines that are produced represent the chromosomal fragments and their different arrangements between two genomes. The ancestral genomic blocks (GBs) (27,28) of corresponding chromosomal fragments are also shown (Figure 2B).
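To make the dot-plot idea concrete, the following Python sketch converts syntenic gene pairs into (x, y) points and groups consecutive points into line-like fragments. It is an illustration under stated assumptions, not the plotting code used for BRAD: positions are taken as gene order indices rather than base-pair coordinates, fragments are found with a simple greedy rule, and the names `dotplot_points`, `collinear_runs` and `max_gap` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, int]


def dotplot_points(
    pairs: Iterable[Tuple[str, str]],
    order_a: Dict[str, int],
    order_b: Dict[str, int],
) -> List[Point]:
    """Map each syntenic pair (gene_a, gene_b) to (x, y) gene-order positions.

    Pairs with a gene missing from either position table are skipped.
    """
    return sorted(
        (order_a[a], order_b[b])
        for a, b in pairs
        if a in order_a and b in order_b
    )


def collinear_runs(points: Iterable[Point], max_gap: int = 1) -> List[List[Point]]:
    """Greedily group points into runs that advance steadily on both axes.

    A point extends the current run if x increases by 1..max_gap, y changes
    by 1..max_gap, and the y direction matches the run so far (an inverted
    fragment runs downwards). Interleaved fragments are not separated.
    """
    runs: List[List[Point]] = []
    for point in sorted(points):
        if runs:
            run = runs[-1]
            dx = point[0] - run[-1][0]
            dy = point[1] - run[-1][1]
            steady = 0 < dx <= max_gap and 0 < abs(dy) <= max_gap
            same_direction = len(run) == 1 or ((run[-1][1] - run[-2][1]) > 0) == (dy > 0)
            if steady and same_direction:
                run.append(point)
                continue
        runs.append([point])
    return runs
```

With this rule, a set of points such as (0,0), (1,1), (2,2), (5,9), (6,8), (7,7) yields two runs: one ascending fragment and one descending fragment, the latter corresponding to an inverted segment between the two genomes.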

Genome synteny resource guidelines
Mining syntenic genes
BRAD V2.0 has five main sections: Browse, Search, Tools, Download and Links. Placing the cursor over the Search section activates a drop-down menu. Clicking on the 'Syntenic gene' option (Figure 1A) opens the search syntenic genes page where checkboxes for 11 Brassicaceae species (B. napus contains the Brassica A and C subgenomes) allow users to choose their required searches; a syntenic gene search between A. thaliana and B. rapa is set as the default (Figure 1B). Next, users are required to provide a gene ID to search for syntenic genes among the selected species (Figure 1C). The number of genes flanking the syntenic genes can be selected from a drop-down list as 10, 20 or 50. The search is activated by clicking the 'GO' button. For example, by selecting B. oleracea and A. lyrata as the species, inputting Bra019255 as the gene ID, setting the number of flanking genes to 10 (the default) and clicking the GO button, the results are output in a table that appears below the search panel as shown in Figure 1D. The solid circles indicate genes. Information about a gene can be obtained by placing the cursor over a circle. Clicking on the solid circle opens a pop-up dialog-window in which navigation information for the target gene is displayed (Figure 1E). Clicking on a tandem symbol (two small dots following a gene symbol) displays the corresponding tandem gene array information at the bottom of the search page (Figure 1F).
Users can also input their own nucleotide sequences instead of gene IDs using the BLAST services (Blastn, Blastp, Blastx, tBlastn and tBlastx) provided under the Tools section in BRAD V2.0. The BLAST search page allows users to search against bulk data from different Brassicaceae sequence databases such as genomes, BACs, protein-coding genes, proteins and ESTs (Figure 1G). Users will obtain related gene IDs based on the BLAST alignments as output (Figure 1H).
The obtained gene IDs can be used as input for the search syntenic genes analysis described above ( Figure 1B-F). Furthermore, if a user's nucleotide sequences are not derived from gene regions, the user may still be able to obtain the location of their sequences in the genomes of certain species. This information can be used to retrieve the flanking sequences and elements, which can be visualized or downloaded from GBrowse under the Tools section in BRAD V2.0.
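The flanking-gene option of the syntenic gene search can be pictured with a short Python sketch. It assumes that the genes of one chromosome are available as a list in chromosomal order and that the window is simply truncated at chromosome ends; how BRAD handles these cases internally is not described, and the function name `flanking_window` and the gene IDs in the usage note are hypothetical.

```python
from typing import List


def flanking_window(gene_order: List[str], gene_id: str, n_flank: int = 10) -> List[str]:
    """Return the target gene with up to n_flank neighbours on each side.

    gene_order lists the gene IDs of one chromosome in positional order.
    Raises ValueError if gene_id is absent or n_flank is negative.
    """
    if n_flank < 0:
        raise ValueError("n_flank must be non-negative")
    index = gene_order.index(gene_id)  # ValueError if the gene is not found
    start = max(0, index - n_flank)    # truncate at the chromosome start
    return gene_order[start:index + n_flank + 1]
```

For a chromosome with 30 genes named `g0` to `g29`, requesting 10 flanking genes around `g15` returns the 21 genes from `g5` to `g25`, whereas the same request around `g2` returns only the 13 genes from `g0` to `g12`.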

Visualization of synteny analysis
A new 'Syntenic figure' function is available under the Search section in BRAD V2.0, which can be used to better illustrate the genomic synteny relationship between two Brassicaceae species. This function can be used to plot genomic synteny relationships as two-dimensional figures. One of the four ancestral species (A. thaliana, A. lyrata, C. rubella and S. parvula) can be selected for display on the y axis and one of eight other Brassicaceae species can be selected for display on the x axis by clicking the corresponding checkboxes. A total of 28 such figures are available (ignoring self-to-self plots). For example, if 'Ath' is chosen for the y axis and 'Aly' is chosen for the x axis, then by clicking the 'View' button (Figure 2A), users will obtain the image shown in Figure 2B. The lines formed by the red dots show the genomic synteny relationships between the two genome sequences. Clicking on any of the GB regions (shown in color-coded bars), such as GB 'A', opens a figure that shows detailed synteny information (Figure 2C). Clicking a dot, which represents a particular gene, on the GB figure will open the GBrowse_syn Web page (17) and show the 100-Kb genomic region flanking the clicked dot.
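As a rough illustration of how a 100-Kb view around a clicked gene might be derived, the Python sketch below builds a region string of the form `chromosome:start..end`. Several points are assumptions rather than documented BRAD behaviour: the window is taken as 100 Kb in total and centred on the gene midpoint, coordinates are 1-based, and the function name `flanking_region` and its parameters are hypothetical.

```python
from typing import Optional


def flanking_region(
    chrom: str,
    gene_start: int,
    gene_end: int,
    window: int = 100_000,
    chrom_length: Optional[int] = None,
) -> str:
    """Return 'chrom:start..end' for a window centred on the gene midpoint.

    The window is clipped to position 1 at the left edge and, if
    chrom_length is given, to the chromosome end at the right edge.
    """
    midpoint = (gene_start + gene_end) // 2
    half = window // 2
    start = max(1, midpoint - half)
    end = midpoint + half
    if chrom_length is not None:
        end = min(end, chrom_length)
    return f"{chrom}:{start}..{end}"
```

A gene spanning positions 200 000 to 202 000 on a chromosome labelled `chr1` would give the region `chr1:151000..251000` under these assumptions.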

Syntenic blocks analysis for multiple genome resources
The GBrowse_syn (16) [...] (Figure 3A) and clicking the 'Search' button next to the Landmark search box, a visualization of syntenic blocks for the multiple genomes is obtained (Figure 3B). The sequence of the target species (in this case A. lyrata) is shown in the middle of the graph as the reference genome, and the genomes being compared with the reference are displayed above and below it. Clicking on the track of a compared species changes it into the reference species and all others become the compared genomes. Furthermore, a link to the 'Syntenic gene' search section is provided for each gene icon shown on the graph of multiple genome syntenies.

[...] crop genetic data, whereas BolBase (B. oleracea Genome Database) is focused on genomic structure comparisons of the B. oleracea genome. Unlike these other databases, BRAD uses information from genomic studies and gene function studies in the model species A. thaliana to annotate the newly sequenced genomes of Brassica species. BRAD V2.0 is a substantially improved version of BRAD V1.0. In BRAD V2.0, more Brassicaceae genomes have been integrated, and comprehensive functional annotations of all the Brassicaceae gene models, genome and gene-level syntenic datasets and visualization tools have been provided. In addition, we have included a new application 'Syntenic figure' in the search section to allow users to view pairwise syntenic relationships between the Brassicaceae genomes in BRAD V2.0. We used the GBrowse_syn module to visualize multiple genome synteny. The inclusion of bulk Brassicaceae genome datasets and new applications makes BRAD V2.0 a user-friendly platform from which to conveniently retrieve genomic information from the genome to gene levels. The updated BRAD V2.0 will be a valuable resource for research into comparative genomics, plant evolution and molecular biology, as well as for breeders of Brassicaceae crops.