A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, and links to other public databases such as GenBank ( http://www.ncbi.nlm.nih.gov/ ) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/ ). Flexible search and data visualization interfaces enable easy access to the data via the internet ( https://gabi.rzpd.de/PoMaMo.html ). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker, sequence or genotype name, by sequence accession number, gene function, BLAST hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes.
Received August 13, 2004; Revised and Accepted September 17, 2004
Genome analysis in potato ( Solanum tuberosum ) started with the construction of molecular linkage maps for the complement of twelve chromosomes, based on restriction fragment length polymorphism (RFLP) markers ( 1 – 7 ). At the Max-Planck-Institute for Plant Breeding Research, more than 1000 RFLP loci have been mapped in different mapping populations. Random potato genomic PstI restriction fragments, potato ESTs and cloned genes of known function were used as marker probes. RFLP mapping of tomato sequences in potato and potato sequences in tomato revealed high co-linearity between the genomes of these two closely related Solanaceae species and connected the different RFLP maps ( 1 , 3 , 6 ). Comparative mapping also identified conserved linkage blocks between the potato and Arabidopsis genomes ( 5 ). DNA sequence information has been collected for most of the markers employed in various mapping experiments. More recently, information on single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (InDels) was generated at a number of loci, preferentially those linked to genes controlling pathogen resistance ( 8 ). BAC (bacterial artificial chromosome) insertions have been anchored to the genetic map ( 8 ) and several BAC insertions have been fully sequenced (unpublished data). These genome data were the basis for localizing factors within the potato genome that control agronomic characters relevant for cultivation and use of potato, such as disease resistance ( 9 ) and tuber quality traits ( 10 , 11 ).
To facilitate global access and use of these genome data, the PoMaMo (Potato Maps and More) database was constructed as a part of the GABI Primary Database (GabiPD), which is located at the RZPD German Resource Center for Genome Research (Berlin, Germany). GabiPD has been established as a central internet-based database within the German Plant Genome Project ‘GABI’ (Genomanalyse im biologischen System Pflanze), with the focus to collect and handle data from groups involved in GABI projects.
Here we report the basic structure and information content of this new database.
Twenty-four linkage maps, two for each potato chromosome, with altogether around 1000 mapped elements (RFLP loci or BAC clones), publication references, more than 2000 genomic and cDNA sequences from 30 different diploid and tetraploid potato genotypes, BLAST results, primer information for sequence amplification and more than 1600 SNP and InDel positions have been integrated into a single relational database, which is accessible via the internet.
The multi-level inheritance database schema allows the connection between different genomic data sets (for a detailed depiction of the PoMaMo database schema see http://gabi.rzpd.de/projects/Pomamo/PoMaMoDBSchema.shtml ). In this way, for example, marker sequences from different potato genotypes, information on variable nucleotide positions, and details about the chromosomal position of markers were merged. Mapped elements can be retrieved not only by the name of the element, but also by sequence name, putative gene function, literature reference, similarity to other annotated genes, or GenBank/SGN sequence accession number.
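The idea of reaching a mapped element through its linked sequence records can be illustrated with a small relational sketch. This is not the PoMaMo schema: the two tables, their columns, the function `find_elements` and all rows below are hypothetical placeholders, and SQLite stands in for the Oracle server used by GabiPD.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real schema is richer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mapped_element (
    element_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    chromosome  INTEGER,
    position_cm REAL
);
CREATE TABLE sequence (
    sequence_id       INTEGER PRIMARY KEY,
    element_id        INTEGER REFERENCES mapped_element(element_id),
    genotype          TEXT,
    accession         TEXT,
    putative_function TEXT
);
""")

# Invented placeholder rows, not real PoMaMo content.
conn.executemany(
    "INSERT INTO mapped_element VALUES (?, ?, ?, ?)",
    [(1, "MARKER_A", 1, 33.0), (2, "MARKER_B", 5, 12.5)],
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "GENOTYPE_X", "ACC0001", "putative kinase"),
        (2, 2, "GENOTYPE_X", "ACC0002", "disease resistance protein"),
        (3, 2, "GENOTYPE_Y", "ACC0003", "disease resistance protein"),
    ],
)


def find_elements(conn, *, accession=None, function_keyword=None):
    """Return (name, chromosome, position_cm) of elements whose sequences match."""
    query = """
        SELECT DISTINCT m.name, m.chromosome, m.position_cm
        FROM mapped_element AS m
        JOIN sequence AS s ON s.element_id = m.element_id
        WHERE (:acc IS NULL OR s.accession = :acc)
          AND (:kw IS NULL OR s.putative_function LIKE '%' || :kw || '%')
        ORDER BY m.name
    """
    return conn.execute(query, {"acc": accession, "kw": function_keyword}).fetchall()


by_accession = find_elements(conn, accession="ACC0001")
by_function = find_elements(conn, function_keyword="resistance")
```

Because the join runs through the sequence table, the same element is found whether the user starts from an accession number or from a gene function, and an element with sequences from several genotypes is still reported once.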
SEARCH AND VISUALIZATION INTERFACES
The PoMaMo start page is accessible via https://gabi.rzpd.de/PoMaMo.html . The search and visualization interface GreenCards can be called up directly from the PoMaMo entry page and enables the user to browse and search for a comprehensive set of potato genome data. GreenCards allows queries by genotype name (e.g. SR1), marker or sequence name (e.g. P1h3), keyword (i.e. function of annotated genes with a high similarity to the potato sequences, e.g. ‘resistance’) or sequence accession number (e.g. AJ487408). Wildcards can be used within the database searches. A GreenCards query using the keyword ‘resistance’, for example, provides the user with a list of search hits. All these hits can be selected for display on the web.
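How a wildcard query narrows a set of records can be sketched as follows. The wildcard syntax accepted by GreenCards is not specified here, so the sketch assumes shell-style patterns (`*` for any run of characters, `?` for one character); the function `wildcard_search` and the records are hypothetical.

```python
from fnmatch import fnmatchcase

# Invented placeholder records, not real PoMaMo entries.
RECORDS = [
    {"name": "MARKER_A1", "function": "putative kinase"},
    {"name": "MARKER_A2", "function": "disease resistance protein"},
    {"name": "CLONE_B7", "function": "resistance gene analogue"},
]


def wildcard_search(records, field, pattern):
    """Case-insensitive match of a shell-style pattern against one field."""
    pat = pattern.lower()
    return [r for r in records if fnmatchcase(str(r.get(field, "")).lower(), pat)]


name_hits = [r["name"] for r in wildcard_search(RECORDS, "name", "marker_a*")]
keyword_hits = [r["name"] for r in wildcard_search(RECORDS, "function", "*resistance*")]
```

Here `name_hits` holds the two markers whose names start with `MARKER_A`, and `keyword_hits` holds the two records whose annotated function contains the word "resistance".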
General information on the object, such as genotypic information, details about the clone library, and information on amplification or sequencing primers, is shown at the top ( Figure 1 ), and publication references are linked with PubMed ( http://www.ncbi.nlm.nih.gov/entrez/query.fcgi ). For each element that is mapped on a potato chromosome, the chromosome number and the position in centimorgans are displayed. Since GreenCards is connected with the map visualization interface YAMB, the whole chromosomal map can be called up from a GreenCards page with a mouse-click ( Figure 1 ). Sequence information is shown together with links to the corresponding GenBank entries via GenBank sequence accession numbers or, in the case of tomato markers (TG markers), to the SGN database. GreenCards also displays SNP and InDel positions as described in ( 8 ). The results of BLAST comparisons against the non-redundant database from GenBank and the Arabidopsis protein database from MIPS (Munich Information Center for Protein Sequences, http://mips.gsf.de/proj/thal/db/ ) are accessible: the BLAST results are shown in summary, and the whole BLAST report with a graphical overview of all hits can be called up with a mouse-click.
YAMB—YET ANOTHER MAP BROWSER
The Java servlet tool YAMB was written to display genetic or physical maps of the potato chromosomes. The maps are directly accessible from the PoMaMo start page or can be started from a GreenCards window as described above. The chromosomal positions of the mapped objects are read directly from the database and the maps are drawn dynamically, so that newly added elements appear in the current map at once. The chromosome maps constructed using populations BC916 2 and F1840 are shown in parallel, aligned and connected by marker loci mapped in both populations. The alignment shows that map distances between the same pairs of RFLP loci vary between the mapping populations. With a total length of 1044 cM, the F1840 map is longer than the BC916 2 map (922 cM). The differences observed can result from the different population sizes, the different parental genotypes and the different sets of markers used for map construction. The maps can be zoomed out to view all mapped elements on a given chromosome at once or zoomed in for a more detailed view. The maps are interactive: a mouse-click on an element of interest opens a GreenCards window, which displays all the information available for the element as described above. The literature referenced for each mapped element gives access to further genetic or biological data, for example whether a marker was used in mapping experiments for qualitative and quantitative resistance factors or in synteny studies between potato and tomato or between potato and Arabidopsis .
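The comparison of two maps through shared marker loci can be sketched in a few lines. This is an illustration only, not YAMB code: the function `compare_shared_intervals`, the marker names and the positions are hypothetical, and each map is reduced to a dictionary of marker name to position in cM on one chromosome.

```python
def compare_shared_intervals(map_a, map_b):
    """List the distance between consecutive shared markers in both maps.

    Shared markers are ordered by their position in map_a.
    Returns tuples (left, right, distance_in_a, distance_in_b).
    """
    shared = sorted(set(map_a) & set(map_b), key=lambda m: map_a[m])
    rows = []
    for left, right in zip(shared, shared[1:]):
        dist_a = abs(map_a[right] - map_a[left])
        dist_b = abs(map_b[right] - map_b[left])
        rows.append((left, right, dist_a, dist_b))
    return rows


# Invented positions (cM); "X" and "Y" are mapped in one population only.
population_1 = {"M1": 0.0, "X": 5.0, "M2": 10.0, "M3": 25.0}
population_2 = {"M1": 0.0, "Y": 3.0, "M2": 14.0, "M3": 20.0}

intervals = compare_shared_intervals(population_1, population_2)
```

With these placeholder values the interval M1 to M2 spans 10 cM in the first map and 14 cM in the second, while M2 to M3 spans 15 cM and 6 cM, which mirrors the kind of variation in map distances between populations described above.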
Molecular linkage maps
Potato RFLP maps have been constructed based on diploid mapping populations, which originated from crossing non-inbred, heterozygous parental clones. The principles and algorithms for linkage group construction in this material have been described ( 12 ). The mapping populations BC916 2 ( 2 – 4 ) and F1840 ( 3 , 5 , 13 ) consisted of 67 backcross and 92 F1 individuals, respectively. In each population and for each chromosome, a maternal and paternal linkage group and linkage groups based on markers shared between both parents have been constructed, which are connected and aligned with each other via allelic bridges between maternal and paternal RFLP alleles ( 4 , 12 ). For simplicity of display in the PoMaMo database, the linkage groups for each chromosome of each population were merged, based on the arithmetic mean of the genetic distances between pairs of RFLP marker loci that had allelic bridges between the parental linkage groups (anchor loci). The other markers (informative for one parent only or shared among both parents) were ordered according to their genetic distance from flanking anchor loci. Marker order within closely linked groups of markers (<5 cM) should therefore be considered ambiguous.
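The merging step can be made concrete with a short sketch. The averaging of anchor-to-anchor distances follows the description above; how the remaining markers are placed between flanking anchors is not specified in detail, so the proportional interpolation used here is an assumption, as are the function names and the example positions.

```python
def _place(pos, parent, anchors, merged):
    """Position a non-anchor marker relative to its flanking anchor loci."""
    first, last = anchors[0], anchors[-1]
    if pos <= parent[first]:
        # Outside the anchored range: keep the parental distance to the end anchor.
        return merged[first] - (parent[first] - pos)
    if pos >= parent[last]:
        return merged[last] + (pos - parent[last])
    for left, right in zip(anchors, anchors[1:]):
        if parent[left] <= pos <= parent[right]:
            span = parent[right] - parent[left]
            frac = (pos - parent[left]) / span if span else 0.0
            # Assumption: scale the parental offset to the merged interval.
            return merged[left] + frac * (merged[right] - merged[left])
    raise ValueError("anchor order differs between parental maps")


def merge_parental_maps(maternal, paternal):
    """Merge two parental linkage groups (marker -> cM) into one map."""
    anchors = sorted(
        set(maternal) & set(paternal),
        key=lambda m: (maternal[m] + paternal[m]) / 2,
    )
    if len(anchors) < 2:
        raise ValueError("need at least two anchor loci")
    merged = {anchors[0]: 0.0}
    for left, right in zip(anchors, anchors[1:]):
        d_mat = abs(maternal[right] - maternal[left])
        d_pat = abs(paternal[right] - paternal[left])
        # Arithmetic mean of the two parental distances between anchors.
        merged[right] = merged[left] + (d_mat + d_pat) / 2
    for parent in (maternal, paternal):
        for marker, pos in parent.items():
            if marker not in merged:
                merged[marker] = _place(pos, parent, anchors, merged)
    return merged


# Invented positions (cM); A1 to A3 are anchors, m1 and p1 are parent-specific.
maternal = {"A1": 0.0, "m1": 4.0, "A2": 10.0, "A3": 30.0}
paternal = {"A1": 0.0, "A2": 14.0, "p1": 21.0, "A3": 28.0}
merged_map = merge_parental_maps(maternal, paternal)
```

In this example the anchors land at 0, 12 and 29 cM, and the parent-specific markers fall between them. Since such markers are positioned only relative to the anchors, their order with respect to each other within a small interval is not reliable, which is the reason for the caution about closely linked groups stated above.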
Sequence analysis, SNP and InDel detection
Marker plasmids with potato genomic (GP markers) or potato leaf cDNA (CP markers) insertions were subjected to single-run sequencing from both ends using vector-specific primers. Potato tuber ESTs (S and P markers originating from the cultivars Saturna and Provita, respectively) were sequenced (single-run) from the 5′ end ( 5 ). Vector sequences were removed and overlapping forward and reverse sequences were assembled into single sequences. Custom sequencing was performed by the ADIS unit at the Max-Planck-Institute for Plant Breeding Research on ABI automated DNA sequencers (PE Biosystems, Foster City, CA, USA) using the dideoxy chain-termination sequencing method. SNP and InDel detection in diverse diploid and tetraploid potato genotypes was performed as described in ( 8 ). All sequences are also deposited in GenBank.
The PoMaMo database is part of the GABI Primary Database. The website, running on Apache Web Server (version 3.1)/Tomcat (version 4.1) ( http://www.apache.org/ ), is accessible via https://gabi.rzpd.de . The object-oriented database design was done using the data modelling application ER/Studio (version 5.5.2; Embarcadero, http://www.embarcadero.com/ ) and was implemented on a relational Oracle database (version 8.1.7; http://www.oracle.com ) running on Tru64Unix ( http://h30097.www3.hp.com/ ). Access to the database is provided by an object-oriented interface available in Perl and Java, which is generated automatically from the database schema. Data handling, i.e. insertions, updates and deletions, is done through this interface. Various Perl modules for processing special file formats (e.g. MS-EXCEL, GCG and CLUSTAL alignments, sequence data in FASTA, EMBL and other formats) were used. All potato sequence data are compared at regular intervals by BLASTX analysis ( 14 ) to the non-redundant (nr) protein database mirrored from GenBank and the annotated Arabidopsis protein database as available from MIPS. The BLAST results are integrated semi-automatically into the database. Oracle InterMedia Text, a text-management and analysis extension to the Oracle database server, was used to allow database searches (with wildcards) by keywords, gene functions, marker, clone, sequence or cultivar names, sequence accession numbers, publication references, etc. The web-accessible search and visualization interface GreenCards was written in Perl CGI. Clone, SNP, primer and sequence information, BLAST results, literature references linked to PubMed, and links to other databases, e.g. GenBank or SGN, are read directly from the database and visualized. The Java servlet tool YAMB interactively displays linkage maps of the potato chromosomes. The maps are built dynamically from the chromosomal position of each element as read from the database. GreenCards and YAMB are integrated with each other.
DISCUSSION AND CONCLUSION
The PoMaMo database currently harbours a comprehensive collection of potato genome data, such as sequences, SNP/InDel information and mapping data. It is the only database that contains information on SNPs and InDels from diploid and tetraploid potato genotypes. The object-oriented structure of the database makes it possible to extend the schema to new types of data, e.g. phenotypic data, and to merge phenotypic with genotypic data.
The GabiPD structure will also allow the quick integration of physical and function mapping data, gene expression and proteomics data, once these data become available for potato.
All data visualized via the web-accessible interfaces GreenCards and YAMB are selected directly from the database; in this way, data updates and newly integrated data are accessible at once. Because YAMB is linked with the text-based search and visualization interface GreenCards, it is easy to switch from the map display to the GreenCards view, which provides detailed information on objects of interest, and vice versa. Owing to the modular structure of GreenCards, the tool can be extended to upcoming data types, such as phenotypic information. YAMB allows the interactive and parallel depiction of two or more maps that are connected by identical markers. YAMB is, therefore, also a convenient tool for displaying synteny maps.
The sequence-tagged sites in the potato genome, as retrievable from the PoMaMo database, provide a resource for the mapping and marker-assisted selection of phenotypic traits in potato, tomato and other related species of the Solanaceae family. They can also be used for anchoring the potato genetic maps including function maps to physical maps of potato and other Solanaceae species and to whole-genome sequences of other plants, thereby connecting structural with functional genome analysis. The PoMaMo database can also function as one module in a global network of plant genome databases.
We thank Iris Bertram for graphical design of the PoMaMo web pages and Julio Cervantes for support in programming. PoMaMo was developed within the project GABI-Primary Database funded by the German Federal Ministry of Education and Research (grant: 0312272).
RZPD German Resource Centre for Genome Research GmbH, Berlin, Germany and Max-Planck-Institute for Plant Breeding Research, Cologne, Germany