Abstract

Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

Introduction

Bread wheat (Triticum aestivum) is one of the most important crop plants worldwide. It occupies 17% (about one-sixth) of the world's crop acreage (Gupta et al. 2008), is a staple food for 35% of the world's population, and provides more calories and protein in the global diet than any other crop (www.idrc.ca/en/ev-31631-201-1-DO_TOPIC.html). Annual global wheat consumption has exceeded 600 Mt (http://www.fao.org), equivalent to about 100 kg per capita (Safar et al. 2010), and the demand for wheat production is predicted to grow by >40% by 2020 (Bhalla 2006).

While wheat is of great social and economic importance, it possesses a large and complex genome due to hexaploidy and a high proportion of repetitive DNA (Chantret et al. 2005, Paux et al. 2008, Wanjugi et al. 2009), making genomic analysis a significant challenge. The wheat genome sequence is currently unknown, limiting genomic-based crop improvement; however, efforts are underway throughout the world to sequence this genome. The International Wheat Genome Sequencing Consortium (http://www.wheatgenome.org/), which was established following a wheat genome sequencing workshop in November 2003 (Gill et al. 2004), is taking a BAC (bacterial artificial chromosome) by BAC approach which aims to deliver a complete high quality genome sequence by 2015. Hexaploid bread wheat was selected rather than the individual, ancestral diploid genomes because it is the species grown in 95% of the wheat-growing areas, and the ABD genomes of bread wheat do not correspond physically and functionally to the sum of the ancestral A (Triticum urartu), B (unknown species that are likely to be related to Aegilops speltoides) and D (Aegilops tauschii) genomes (Feuillet et al. 2011). The consortium has adopted a chromosome-based strategy to construct physical BAC clone maps and subsequently to sequence each of the 21 individual chromosomes (Dolezel et al. 2007). The first physical map of the largest wheat chromosome, 3B, has been produced (Paux et al. 2008) and its sequencing is ongoing (http://urgi.versailles.inra.fr/index.php/urgi/Projects/3BSeq).

As an alternative to the BAC by BAC approach, other groups are applying second-generation sequencing technologies to gain insights into this complex genome. A consortium from the UK produced 5× coverage of the bread wheat genome using Roche 454 technology (http://www.cerealsdb.uk.net/). While this is insufficient to produce a finished genome assembly, the data are a valuable resource for gene discovery and genetic variation analysis (Imelfort and Edwards 2009). A draft wheat genome assembly has also been produced from the donor species of the wheat D genome, A. tauschii (http://www.cshl.edu/genome/wheat) and for individual flow-sorted bread wheat chromosome arms (Berkman et al. 2011, Wicker et al. 2011). With the increasing volume of wheat genome data becoming available through such efforts, it is essential to provide resources that can integrate wheat-specific sequence information in a manner accessible to crop improvement researchers (Edwards and Batley 2010).

In recent years, the growth in genome information has led to a challenge for bioinformatics researchers to transform the vast quantities of data being produced into collective knowledge. As sequence availability has increased, data access, representation, analysis and visualization present significant challenges (Batley and Edwards 2009, Ning and Montgomery 2010). In this context, online databases for genome and genomic data are very much in demand.

This paper describes an integrated wheat genome data resource, WheatGenome.info, which provides a variety of web-based systems for access to wheat genome and genomic data to support applied crop research and crop improvement. Moreover, this interface includes links to wheat-related web-based data hosted at other research organizations. WheatGenome.info is available at http://www.wheatgenome.info/.

Database contents

The WheatGenome.info database integrates several main web-based systems. These include an annotated wheat genome viewer based on GBrowse2, searchable using keywords, genome location or sequence similarity through the BLAST portal; a CMap genetic and physical mapping database; TAGdb for searching wheat short read sequences; an annotated wheat expressed sequence tag (EST)-single nucleotide polymorphism (SNP) database; and a wheat genome Wiki.

A wheat genome viewer for annotated chromosome arm assemblies

The application of second-generation sequencing technology and advanced bioinformatics tools has enabled the assembly and annotation of the genes and low copy regions of isolated wheat chromosome arms, producing syntenic builds containing the majority of wheat genes (Berkman et al. 2011). Assemblies and syntenic builds for each of the group 7 chromosome arms have been produced (Table 1) and are hosted in a GBrowse2 database at wheatgenome.info for public access prior to publication. GBrowse2 is a user-friendly generic genome browser for genome sequence data and annotation (Donlin 2007, Arnaoudova et al. 2009). Each wheat chromosome arm has been annotated with predicted genes, Uniref90 gene similarities as well as homoeologous SNPs.

Table 1

Summary of wheat group 7 chromosome data currently available through TAGdb and GBrowse

Chromosome arm   Volume of data on TAGdb (Gbp)   Coverage (×)   Syntenic build version   Syntenic build size (Mbp)
7AS              17.66                           43.40          0.1                      2.78
7AL              20.56                           50.52          0.1                      4.15
7BS              19.68                           54.66          1.0                      6.61
7BL              15.10                           27.97          0.1                      2.28
7DS              27.67                           72.62          1.0                      7.94
7DL              26.53                           76.67          0.1                      7.24

As well as annotation keyword searches, a BLAST portal enables sequence similarity searches of assembled wheat chromosome arm data, with results displayed in the GBrowse2 viewer. A DNA or protein query sequence can be uploaded or pasted into the web-based form in FASTA format. The results are displayed in three sliding windows: the Overview window, Region window and Details window. The reference view can be dragged and zoomed. Several tracks of annotation are available, including Uniref90, Genes, Contigs, SNPs and Exons. All of these features can be expanded by clicking the associated plus button, and each feature provides a link to show the feature details (Fig. 1).

Fig. 1

Example of the detailed information for the wheat genome 7AS syntenic build from the reference view of GBrowse2. Several tracks of annotation are available, including Uniref90, Genes, Contigs, SNPs and Exons.

This GBrowse database enables the rapid dissemination of wheat chromosome arm sequence information prior to publication. In the absence of a finished wheat reference genome upon which to base crop improvement efforts, this tool represents the first opportunity for wheat researchers to interact with chromosome-scale gene-based sequence scaffolds in an intuitive and user-friendly manner. It allows for a more rigorous interrogation of genes surrounding a locus of interest than was previously possible in wheat, to assist the identification of the genomic basis of important traits. With the expansion of wheat genome sequencing activities by several groups internationally, this resource will increasingly provide access to wheat genome information for crop improvement research.

Wheat arms sequence data on TAGdb

TAGdb is an online database system designed to identify and visualize next-generation paired sequence tags that share identity with a submitted query sequence (Marshall et al. 2010). The TAGdb interface requests a FASTA format query sequence of up to 5,000 bp as well as a contact E-mail address, so users can retrieve previous query searches. Users can then select a variety of wheat short read data libraries. After starting the process, TAGdb sends an E-mail to the user stating that the job has started successfully and provides a link to the results web page. Once the search is complete, TAGdb sends a second E-mail to confirm completion, together with a link to the results. Two windows display an overview and zoomed region of the read alignments (Fig. 2); paired reads are connected by a line, with a blue rectangle confirming that the result conforms to the expected orientation and paired read distance. Matching reads, together with their matching or non-matching read pairs, are viewed as a table or can be downloaded as a multi-FASTA format file for further analysis.
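The two constraints described above, a single FASTA query of at most 5,000 bp and read pairs that conform to an expected orientation and distance, can be illustrated with a minimal Python sketch. This is not the TAGdb implementation: the function names, the `AlignedRead` type, the inward-facing (forward/reverse) orientation rule and the insert size bounds passed by the caller are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_QUERY_BP = 5000  # query length limit stated in the text


def parse_fasta_query(text: str) -> tuple[str, str]:
    """Return (header, sequence) for a single-record FASTA query.

    Hypothetical helper: raises ValueError if the input is not one FASTA
    record or if the sequence exceeds MAX_QUERY_BP.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("query must start with a FASTA header line ('>')")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one query sequence is expected")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("query sequence is empty")
    if len(sequence) > MAX_QUERY_BP:
        raise ValueError(
            f"query is {len(sequence)} bp; the limit is {MAX_QUERY_BP} bp"
        )
    return lines[0][1:], sequence


@dataclass
class AlignedRead:
    start: int     # leftmost aligned position on the query (1-based)
    end: int       # rightmost aligned position on the query (inclusive)
    forward: bool  # True if the read aligns to the forward strand


def pair_conforms(a: AlignedRead, b: AlignedRead,
                  min_insert: int, max_insert: int) -> bool:
    """Check a read pair against an assumed orientation and distance rule.

    Assumption: a conforming pair faces inwards (left read forward, right
    read reverse) and spans between min_insert and max_insert bp.
    """
    left, right = sorted((a, b), key=lambda r: r.start)
    facing_inwards = left.forward and not right.forward
    span = right.end - left.start + 1
    return facing_inwards and min_insert <= span <= max_insert
```

For example, `pair_conforms(AlignedRead(100, 175, True), AlignedRead(400, 475, False), 300, 500)` spans 376 bp with inward-facing reads and so would be treated as conforming under these assumed rules.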

Fig. 2

Screenshot of TAGdb showing the alignment of short reads from wheat variety Chinese Spring to a sample query sequence.

The key value of this tool is that it provides researchers with rapid yet simple access to the wheat genome sequence data being produced by new sequencing technologies. The identification of a large number of matching reads may enable the local assembly of the wheat genomic region. Where few reads are identified, read pairs may be used to PCR amplify and sequence the gene as well as genomic sequence flanking the matching query. Wheat TAGdb currently hosts whole-genome paired read libraries of wheat cultivar Chinese Spring, including specific data for the long and short arms of isolated chromosomes. Read lengths vary between 35 and 100 bp, with a range of insert sizes from 300 to 3,700 bp. Additional wheat short read data for different wheat varieties will be hosted on TAGdb as they become publicly available in the near future.
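The data volumes and coverage values listed for the group 7 chromosome arms in Table 1 can be related by simple arithmetic. The sketch below assumes that the Coverage column is fold coverage, that is, data volume divided by arm size; the function name is hypothetical and the values are copied from Table 1.

```python
# (volume of data on TAGdb in Gbp, coverage) per arm, copied from Table 1
TABLE_1 = {
    "7AS": (17.66, 43.40),
    "7AL": (20.56, 50.52),
    "7BS": (19.68, 54.66),
    "7BL": (15.10, 27.97),
    "7DS": (27.67, 72.62),
    "7DL": (26.53, 76.67),
}


def implied_arm_size_mbp(volume_gbp: float, coverage: float) -> float:
    """Arm size in Mbp implied by volume / coverage.

    Assumption: coverage is fold coverage of the chromosome arm.
    """
    return volume_gbp * 1000.0 / coverage


if __name__ == "__main__":
    for arm, (volume_gbp, coverage) in TABLE_1.items():
        size = implied_arm_size_mbp(volume_gbp, coverage)
        print(f"{arm}: {size:.0f} Mbp")
```

Under this assumption the 7AS values imply an arm of about 407 Mbp, which gives a quick consistency check when new libraries are added.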

Comparative wheat genome and genetic maps on CMap and CMap3D

CMap is a generic, extensible web-based comparative map viewer for displaying and comparing genetic and physical maps from any species (Youens-Clark et al. 2009). There are two main CMap databases of interest to wheat researchers. The most comprehensive is hosted within GrainGenes (Matthews et al. 2003, Carollo et al. 2005) and is linked from the wheatgenome.info front page. The wheatgenome.info installation of the CMap system aims to link specifically the assembled wheat chromosome arm information with the sequenced genomes of Brachypodium distachyon and rice, as well as a genetic map of the D genome donor of hexaploid wheat, A. tauschii. Bread wheat genome data include syntenic builds for chromosome arms 7DS and 7BS, with other chromosomes being added as an ongoing process.

A CMap summary interface provides links to the CMap viewer, administration, tutorial document, map search and feature search functionalities. When a main reference sequence is selected, users can add a physical sequence map or a genetic map as a second map. As genetic and physical maps become more abundant, their effective visualization becomes a challenge. CMap3D is a tool based on CMap for the visualization and comparison of multiple genome or genetic maps. It is available as a stand-alone client for Windows, OSX and Linux (Duran et al. 2010a). The comparative maps present each corresponding marker and the links between maps as a three-dimensional view (Fig. 3). CMap3D overcomes the limitations of comparing multiple adjacent aligned maps and provides a more user-friendly comparison of multiple genomes or genetic maps in three-dimensional space.

Fig. 3

An inter-species comparison between a physical map of the wheat 7DS syntenic build chromosome, a genetic map of Aegilops tauschii chromosome 7D and a physical map of the Brachypodium distachyon chromosome 3 (between 39 and 45 Mbp) using CMap3D.

Annotated wheat EST single nucleotide polymorphisms within autoSNPdb

Advances in second-generation sequencing technologies have greatly increased the scale and scope to interrogate genomes and uncover genetic variation. However, differentiating between sequence errors and real SNPs remains a challenge, particularly for large and complex genomes such as wheat (Duran et al. 2009c, Imelfort et al. 2009). An approach to improve polymorphism prediction accuracy includes deep sequencing and multiple measures of prediction confidence.

AutoSNPdb (Duran et al. 2009a, Duran et al. 2009b) is the latest version of SNP discovery software which started with autoSNP (Barker et al. 2003, Batley et al. 2003) and includes SNPServer (Savage et al. 2005). It provides an extensible and user-friendly graphical interface facilitating a variety of queries to identify SNPs related to specific genes or traits. This application processes multiple consensus sequences from multiple EST reads and identifies candidate SNPs using a series of Perl scripts.
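The general idea of calling candidate SNPs from redundant aligned reads can be sketched in a few lines of Python. This is a simplified illustration and not the autoSNPdb Perl scripts: the function name, the input representation (equal-length aligned read strings with '-' for gaps) and the threshold of two supporting reads are assumptions.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Report columns where a second allele is seen in enough reads.

    aligned_reads: equal-length strings from one contig alignment, with
    '-' marking gaps. Returns (position, major, minor, minor_count) tuples
    using 0-based positions. Requiring min_minor_count supporting reads is
    the redundancy filter: an allele seen in a single read is treated as a
    possible sequencing error.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    snps = []
    for pos in range(width):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(
            read[pos].upper()
            for read in aligned_reads
            if read[pos].upper() in "ACGT"
        )
        if len(counts) < 2:
            continue
        (major, _), (minor, minor_count) = counts.most_common(2)
        if minor_count >= min_minor_count:
            snps.append((pos, major, minor, minor_count))
    return snps


reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACCT",
]
print(candidate_snps(reads))
```

In this toy alignment the T/A difference at position 3 is supported by two reads and is reported, whereas the single C at position 6 is filtered out as a likely error.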

The current autoSNPdb application hosts data for important crops including rice, barley, Brassica and wheat (Duran et al. 2010b). Within wheat autoSNPdb, the accuracy of polymorphism detection has been improved by adopting the strategy of deep coverage sequencing of specific wheat cultivars. Wheat ESTs generated by Roche 454 second-generation sequencing have been assembled using MIRA, with the resulting assembly processed using autoSNPdb Perl scripts to identify SNPs. Wheat autoSNPdb provides a valuable resource of annotated genetic markers of wheat, which can be used for genetic diversity analysis, cultivar identification and high-resolution genetic map construction.

Wheat autoSNPdb can be searched using keywords, by similarity to a query sequence, or by selecting SNPs which differentiate between varieties. The results are listed as consensus contigs, each showing the consensus sequence with aligned reads and highlighted SNPs (Fig. 4). Full annotation of potential gene function is also displayed, and SNPs can also be searched based on homologous locations in the rice genome. We recommend viewing autoSNPdb in Mozilla Firefox, as Internet Explorer may not provide full functionality.

Fig. 4

The wheat autoSNPdb web interface displaying the wheat sequence assembly, with predicted SNPs as vertical bars.

Wheat genome Wiki

The Wiki hosted at wheatgenome.info aims to assist communication between international groups undertaking diverse wheat sequencing activities. The Wiki is based on MediaWiki (http://www.mediawiki.org), the popular free web-based Wiki software application which is also used by Wikipedia. This Wiki offers an economical and efficient way to communicate and collaborate, and any research group undertaking wheat genome sequencing is welcome to describe their activities on the Wiki, with secure access provided on request.

Conclusions and future direction

The wheatgenome.info system hosts a range of wheat genome information with unrestricted public access. Wheat genome sequencing is still in its infancy, and a complete high quality genome sequence is not expected until 2015 at the earliest. Meanwhile, the number and quality of draft genome assemblies are likely to increase, together with an increasing amount of genome information relating to different wheat cultivars and wild relatives. The wheatgenome.info resource provides researchers with early access to these genetic and genomic data, allowing them to compare query sequences with genomic data, identify genes at loci of interest, extract new genetic marker information, distinguish between homoeologous and varietal SNP markers, and access a hub for discussion on wheat genome sequencing activities beyond the current scope of the international consortium. The collation of this information within one place, together with links to external wheat genome resources, greatly assists researchers who wish to use this information to improve this valuable crop.

Funding

This work was supported by the Australian Research Council [Projects LP0882095, LP0883462 and DP0985953].

Acknowledgments

Support from the Australian Genome Research Facility (AGRF), the Queensland Cyber Infrastructure Foundation (QCIF), the Australian Partnership for Advanced Computing (APAC) and Queensland Facility for Advanced Bioinformatics (QFAB) is gratefully acknowledged.

Abbreviations

  • autoSNPdb: automatic annotated single nucleotide polymorphism database
  • BAC: bacterial artificial chromosome
  • BLAST: Basic Local Alignment Search Tool
  • EST: expressed sequence tag
  • GBrowse2: Generic Genome Browser version 2
  • NGS: next-generation sequencing
  • SNP: single nucleotide polymorphism
  • TAGdb: Tag database

References

Arnaoudova EG, Bowens PJ, Chui RG, Dinkins RD, Hesse U, Jaromczyk JW, et al. Visualizing and sharing results in bioinformatics projects: GBrowse and GenBank exports. BMC Bioinform., 2009, vol. 10.
Barker G, Batley J, O'Sullivan H, Edwards KJ, Edwards D. Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP. Bioinformatics, 2003, vol. 19 (pg. 421-422).
Batley J, Barker G, O'Sullivan H, Edwards KJ, Edwards D. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data. Plant Physiol., 2003, vol. 132 (pg. 84-91).
Batley J, Edwards D. Genome sequence data: management, storage, and visualization. Biotechniques, 2009, vol. 46 (pg. 333-336).
Berkman PJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EY, et al. Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol. J., 2011, vol. 9 (pg. 768-775).
Bhalla PL. Genetic engineering of wheat - current challenges and opportunities. Trends Biotechnol., 2006, vol. 24 (pg. 305-311).
Carollo V, Matthews DE, Lazo GR, Blake TK, Hummel DD, Lui N, et al. GrainGenes 2.0. An improved resource for the small-grains community. Plant Physiol., 2005, vol. 139 (pg. 643-651).
Chantret N, Salse J, Sabot F, Rahman S, Bellec A, Laubin B, et al. Molecular basis of evolutionary events that shaped the hardness locus in diploid and polyploid wheat species (Triticum and Aegilops). Plant Cell, 2005, vol. 17 (pg. 1033-1045).
Dolezel J, Kubalakova M, Paux E, Bartos J, Feuillet C. Chromosome-based genomics in the cereals. Chromosome Res., 2007, vol. 15 (pg. 51-66).
Donlin M. Using the Generic Genome Browser (GBrowse). Curr. Protoc. Bioinformatics, 2007, Chapter 9: 9.
Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J, et al. AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants. Nucleic Acids Res., 2009a, vol. 37 (pg. D951-D953).
Duran C, Appleby N, Vardy M, Imelfort M, Edwards D, Batley J. Single nucleotide polymorphism discovery in barley using autoSNPdb. Plant Biotechnol. J., 2009b, vol. 7 (pg. 326-333).
Duran C, Boskovic Z, Imelfort M, Batley J, Hamilton NA, Edwards D. CMap3D: a 3D visualization tool for comparative genetic maps. Bioinformatics, 2010a, vol. 26 (pg. 273-274).
Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, et al. Future tools for association mapping in crop plants. Genome, 2010b, vol. 53 (pg. 1017-1023).
Duran C, Edwards D, Batley J. Molecular marker discovery and genetic map visualisation. In Edwards D, Hanson D, Stajich J (eds), Applied Bioinformatics. New York, Springer, 2009c (pg. 165-189).
Edwards D, Batley J. Plant genome sequencing: applications for crop improvement. Plant Biotechnol. J., 2010, vol. 7 (pg. 1-8).
Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K. Crop genome sequencing: lessons and rationales. Trends Plant Sci., 2011, vol. 16 (pg. 77-88).
Gill BS, Appels R, Botha-Oberholster AM, Buell CR, Bennetzen JL, Chalhoub B, et al. A workshop report on wheat genome sequencing: International Genome Research on Wheat Consortium. Genetics, 2004, vol. 168 (pg. 1087-1096).
Gupta PK, Mir RR, Mohan A, Kumar J. Wheat genomics: present status and future prospects. Int. J. Plant Genomics, 2008, vol. 2008 pg. 896451.
Imelfort M, Duran C, Batley J, Edwards D. Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol. J., 2009, vol. 7 (pg. 312-317).
Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Brief. Bioinform., 2009, vol. 10 (pg. 609-618).
Marshall DJ, Hayward A, Eales D, Imelfort M, Stiller J, Berkman PJ, et al. Targeted identification of genomic regions using TAGdb. Plant Methods, 2010, vol. 6 pg. 19.
Matthews DE, Carollo VL, Lazo GR, Anderson OD. GrainGenes, the genome database for small-grain crops. Nucleic Acids Res., 2003, vol. 31 (pg. 183-186).
Ning Z, Montgomery S. Out of the sequencer and into the wiki as we face new challenges in genome informatics. Genome Biol., 2010, vol. 11 pg. 308.
Paux E, Sourdille P, Salse J, Saintenac C, Choulet F, Leroy P, et al. A physical map of the 1-gigabase bread wheat chromosome 3B. Science, 2008, vol. 322 (pg. 101-104).
Safar J, Simkova H, Kubalakova M, Cihalikova J, Suchankova P, Bartos J, et al. Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet. Genome Res., 2010, vol. 129 (pg. 211-223).
Savage D, Batley J, Erwin T, Logan E, Love CG, Lim GAC, et al. SNPServer: a real-time SNP discovery tool. Nucleic Acids Res., 2005, vol. 33 (pg. W493-W495).
Wanjugi H, Coleman-Derr D, Huo N, Kianian SF, Luo MC, Wu J, et al. Rapid development of PCR-based genome-specific repetitive DNA junction markers in wheat. Genome, 2009, vol. 52 (pg. 576-587).
Wicker T, Mayer KFX, Gundlach H, Martis M, Steuernagel B, Scholz U, et al. Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell, 2011, vol. 23 (pg. 1706-1718).
Youens-Clark K, Faga B, Yap IV, Stein L, Ware D. CMap 1.01: a comparative mapping application for the Internet. Bioinformatics, 2009, vol. 25 (pg. 3040-3042).