OGDD (Olive Genetic Diversity Database): a microsatellite markers' genotypes database of worldwide olive trees for cultivar identification and virgin olive oil traceability

Olive (Olea europaea), whose importance is mainly due to its nutritional and health features, is one of the most economically significant oil-producing trees in the Mediterranean region. Unfortunately, the increasing market demand for virgin olive oil can result in its adulteration with less expensive oils, which is a serious problem for the public and for quality control evaluators of virgin olive oil. Therefore, to avoid fraud, olive cultivar identification and virgin olive oil authentication have become a major issue for producers, consumers and quality control in the olive chain. Presently, genetic traceability using SSR markers is a cost-effective and powerful technique that can be employed to resolve such problems. However, to identify the cultivar of an unknown monovarietal virgin olive oil, a reference system is necessary. Thus, an Olive Genetic Diversity Database (OGDD) (http://www.bioinfo-cbs.org/ogdd/) is presented in this work. It is a genetic, morphological and chemical database of worldwide olive trees and oils with a double function. Besides being a reference system for the identification of unknown olive or virgin olive oil cultivars based on their microsatellite allele size(s), it provides users with additional morphological and chemical information for each identified cultivar. Currently, OGDD is designed to enable users to easily retrieve and visualize biologically important information (SSR markers, and olive tree and oil characteristics of about 200 cultivars worldwide) using a set of efficient query interfaces and analysis tools. It can be accessed through a web service from any modern programming language using a simple hypertext transfer protocol call. The web site is implemented in Java, JavaScript, PHP, HTML and Apache, with all major browsers supported. Database URL: http://www.bioinfo-cbs.org/ogdd/


Introduction
Olive tree (Olea europaea L.) represents the most important oil-producing crop in the Mediterranean basin. Its cultivation covers over eight million hectares of land and about 70% of the produced olive oil is consumed. The olive tree is a diploid species (2n = 46) whose seeds are produced by cross-pollination (1,2) and which is able to survive for a long time. It is a glycophytic species that exhibits a high tolerance to drought and salt stresses compared with other fruit trees, which are generally salt sensitive.
Olive oil is the oil extracted from the fruit of Olea europaea L. using only mechanical methods or other physical procedures that do not alter the glyceric structure of the oil and therefore conserve its vitamins and other natural, healthy, high-value compounds. Virgin olive oil is recognized to have beneficial effects on health. In fact, it is able to reduce blood pressure and low-density lipoprotein cholesterol. It also has antioxidant and antimicrobial properties and has been associated with cancer prevention (3,4).
According to data from the Food and Agriculture Organization of the United Nations (FAO), the Mediterranean countries produce 90% of the world's olives, and the main olive producers are Spain, Italy, Greece, Turkey, Tunisia, Morocco, Syria and Portugal. Olive and olive oil have great commercial and economic importance in the Mediterranean region. In the last few years, olive production has expanded throughout the world and olive oil consumption has significantly increased, while consumers are becoming more informed and increasingly aware of the quality of the foods they buy and eat.
Virgin olive oils obtained from one genetic variety of olive or from different varieties are called monovarietal or coupage, respectively. Monovarietal virgin olive oils possess specific characteristics that are associated with the olive variety from which they are extracted. In addition, a high occurrence of mislabeling, homonyms and synonyms has been reported in olive (5). Therefore, it is very important to improve existing traceability systems or develop new ones that allow easy and accurate identification of cultivars and oils, so that the rich variability of olive can be properly managed. A well-documented traceability process and a tool for confirming authenticity are needed to control the quality of the virgin olive oil introduced in the market. Accordingly, great effort is being made to obtain a unique and unequivocal genetic profile for every cultivar using molecular markers, since the main chemical analyses of virgin olive oils from the same category but from diverse origins have limited significance. Indeed, given the high variability related to the environmental conditions of the different olive groups, their morphological characteristics and the analyses of the chemical composition of fatty acids and secondary metabolites are not able to supply reliable results for oil traceability (6-8). For this reason, genetic identity seems to be the most appropriate basis for identifying the variety from which the olive oil under study derives. The use of DNA-based technology in the field of food authenticity, particularly for olive oil, is gaining increasing attention.
Recently, the use of molecular markers, such as RAPDs (random amplified polymorphic DNA) (9), AFLPs (amplified fragment length polymorphisms) (10) and SSRs (simple sequence repeats), has been recommended to determine virgin olive oil origin and traceability (6,7,11,12). At present, microsatellites (SSRs) are the most relevant genetic markers used in olive cultivar characterization and virgin olive oil authentication thanks to their numerous properties. Indeed, SSR markers are multiallelic, codominant, highly polymorphic, widely distributed along plant genomes and easily amenable to PCR-based analyses. Moreover, they have great reproducibility and are currently the most reliable DNA profiling technique in forensic investigation (13). By carrying out SSR marker analysis, we can characterize the genetic profile of a monovarietal virgin olive oil by comparing the oil-derived pattern with those of the olive cultivars in a reference database. Thus, to efficiently identify and analyse unknown commercial virgin olive oils, the development of a database including information about olive cultivars and olive oils based on genetic data sets, particularly SSR markers, becomes necessary. A few online databases have been developed for the olive tree (Olea europaea L.), such as the Istrian olive database (14), which includes morphological and molecular data of some Istrian olive trees, and the OLEA databases (http://www.oleadb.it/) (15). The latter contain microsatellite (SSR) loci data of a wide set of olive cultivars and make it possible to query for cultivars corresponding to a specific data profile or to look for the variety identity when a profile is available. Another database that can be mentioned is the Olea EST database (16). It is a collection of Olea europaea L. ESTs and is constructed by first clustering EST reads to produce tentative consensus (TC) sequences and singletons (sESTs). The database annotates and classifies the unique transcripts found according to their biological functions. Other databases, such as the Global Invasive Species Database (17), remain classical and give a simple botanical and biological description of several tree species (such as Olea). Among all databases proposed in the literature, only the OLEA databases (15) seem to give effective molecular identification of olive cultivars, but they provide only the SSR marker size(s) of each olive cultivar without any other information. Therefore, it is necessary to make publicly available a new database for identifying an unknown olive tree or oil using SSR markers and providing an extended profile description of the displayed cultivar.
For this purpose, we generated a simple-format database of olive species called OGDD, for Olive Genetic Diversity Database, which currently contains many olive tree and oil characteristics (agro-morphological, chemical, genetic (SSR DNA band size(s)), ...). The database also integrates a computer application that provides a supplementary test comparing a user-provided SSR fingerprint with the profiles stored in the database. In this paper, we present the OGDD database, which is currently implemented on the website (http://www.bioinfo-cbs.org/ogdd/) and which can be considered as a reference system for evaluating data obtained from the analysis of unknown samples and for defining the origin and composition of virgin olive oil. This database can be used by all researchers and stakeholders interested in the olive oil field. In the near future, this database will be a useful platform for the traceability and authentication of olive oil.

Data extraction
The OGDD site allows anyone interested in the olive oil sector to simply and quickly retrieve the genetic profiles of known olive cultivars from around the world. The data included in OGDD derive from publications on the morphological, biochemical, sensory and molecular characteristics of olives cultivated worldwide (Table 1). For example, the genetic profiles of the different olive varieties involve the following SSR marker families: DCA (18), GAPU (19), UDO (20), EMO (21), IAS-oli (22,23) and IGP (24). Each marker family is composed of 2 (EMO) to 13 (IAS-oli) markers and, for each locus, the sizes of both alleles, expressed in base pairs, are provided. In the case of homozygosity, both values are equal.
In addition, each variety has a range of additional data such as country, region, morphological data represented by four photos (tree, leaf, fruit and kernel), acidity, taste, synonym, bibliographic references, oil content (%), the physicochemical composition and a matching score calculated to identify each new genetic profile. In fact, a user can compare the genetic profiles of the varieties presented in the database with the genetic profile that he/she has determined experimentally and can, therefore, identify the origin of his/her cultivar.
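As an illustration of the record structure described above, a minimal Python sketch of one cultivar entry might look as follows. The class and field names (Genotype, CultivarRecord, is_homozygous) are assumptions made for this example and are not taken from the OGDD implementation. The allele sizes are those of the first sample in the case examples below, which is reported to match Chetoui at 100%.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple


class Genotype(NamedTuple):
    """Sizes (in base pairs) of the two alleles at one SSR locus."""
    allele_1: int
    allele_2: int

    def is_homozygous(self) -> bool:
        # In the homozygous case both recorded values are equal.
        return self.allele_1 == self.allele_2


@dataclass
class CultivarRecord:
    """Hypothetical in-memory view of one cultivar entry (not the OGDD schema)."""
    name: str
    # Marker name (e.g. "DCA09") -> genotype at that locus.
    profile: Dict[str, Genotype] = field(default_factory=dict)


chetoui = CultivarRecord(
    name="Chetoui",
    profile={
        "DCA03": Genotype(232, 253),
        "DCA09": Genotype(194, 194),
        "GAPU71b": Genotype(144, 144),
    },
)

# List the loci at which the two allele sizes are equal.
homozygous_loci = [m for m, g in chetoui.profile.items() if g.is_homozygous()]
print(homozygous_loci)
```

Storing both allele sizes even for homozygous loci keeps every locus in the same two-value form, which simplifies later comparison between profiles.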

Data collection and classification
The OGDD data are organized into a database that gathers each type of data separately. The following information is collected for each olive cultivar: the country of origin, the growing region, the biological characteristics of the variety (pollination, climate requirements and resistance to diseases, pests and climates), morphological characteristics (tree, leaf, fruit and kernel), biochemical characteristics (acidity, oil content, polyphenols, tocopherols, pigments (chlorophylls, carotenes), ...), organoleptic characteristics (taste and aroma) and molecular data, particularly microsatellite (SSR) markers (Figure 1). For the microsatellite markers, the retrieved data are their names, the groups to which they belong and their allele sizes (in base pairs). OGDD is updated manually every year by the bioinformatics group of the Centre of Biotechnology of Sfax, Tunisia, by checking the SSR markers that are newly obtained by research teams. At present, the molecular marker data included in OGDD comprise only SSR markers. In a following step, other molecular markers such as SNP markers will be integrated. Updates include records and complete information concerning Olea europaea species.

Database construction and implementation
OGDD is a browser-independent web database built using MySQL (version 5.1.41) as the database management system. The OGDD web interface was constructed using HTML, JavaScript and PHP. The database is hosted on an Apache (version 2.2.14) web server running on a Linux operating system. The web tool is compatible with all major browsers including Firefox, Safari, Chrome and Internet Explorer. The architecture of OGDD is shown in Figure 2. The data in OGDD can be used to compute the similarity between an unknown virgin olive oil variety and the varieties existing in the database, based on SSR information.
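The actual OGDD schema is not described here. Purely to illustrate how cultivar and SSR data could be laid out relationally, the following sketch uses Python's built-in sqlite3 module as a stand-in for MySQL. All table and column names are hypothetical, and the allele sizes are those of the third sample in the case examples below, which is reported to match Arbequina at 100%.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL back end.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cultivar (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE genotype (
        cultivar_id INTEGER NOT NULL REFERENCES cultivar(id),
        marker      TEXT    NOT NULL,   -- e.g. 'DCA09'
        allele_1    INTEGER NOT NULL,   -- size in base pairs
        allele_2    INTEGER NOT NULL,   -- equals allele_1 when homozygous
        PRIMARY KEY (cultivar_id, marker)
    );
    """
)

conn.execute("INSERT INTO cultivar (id, name) VALUES (1, 'Arbequina')")
conn.executemany(
    "INSERT INTO genotype (cultivar_id, marker, allele_1, allele_2) "
    "VALUES (1, ?, ?, ?)",
    [("DCA03", 228, 240), ("DCA09", 183, 183), ("GAPU71b", 124, 144)],
)

# Retrieve the stored SSR profile of one cultivar by name.
rows = conn.execute(
    "SELECT g.marker, g.allele_1, g.allele_2 "
    "FROM genotype AS g JOIN cultivar AS c ON c.id = g.cultivar_id "
    "WHERE c.name = ? ORDER BY g.marker",
    ("Arbequina",),
).fetchall()
print(rows)
```

Keeping one row per cultivar and marker would let new markers be added without altering the table definition, which is one plausible way to accommodate the yearly updates mentioned above.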

Database use and access
The major goal of OGDD is to develop an approach toward the identification, traceability and authentication of virgin olive oil to protect the interests of both consumers and producers. The OGDD database can be accessed from any computer with web access; only user registration is required.

Algorithm development and score calculating
To help users identify the origin of the virgin olive oil genetic profile they have obtained, we implemented an algorithm that identifies the unknown variety from the data available in the database by computing the probability of homology between each stored variety and the profile entered by the user (Figure 3). The result is displayed as a percentage of homology. The scoring computation proceeds in four major steps. First, it introduces an intermediate variable X used to calculate the score(s) and initializes the scores to zero for all cultivars in the database. Second, for each variety, it initializes X to zero, selects the genetic profile of that variety and checks, for each of its markers, whether the marker has been selected by the user; if so, the user must have entered the allele sizes of that marker for the unknown cultivar. Third, it compares the allele sizes entered by the user with those of each variety in OGDD for the same marker, and X is incremented accordingly: by 1 if the two allele sizes are equal, by 0.5 when only one allele size is equal and by 0 when the two allele sizes are different. Finally, it calculates the score once all SSR markers have been compared. The score obtained by this method depends on both the similarity between the allele sizes and the number of SSR markers.
Calculate the score:
Score = X/N
where X is the sum of the elementary scores for each marker and N is the number of markers typed by the user. Select the varieties that have a score higher than 0.8. If no variety satisfies this condition, the result will be: 'Not found variety in this genetic profile'.
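The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the OGDD code: the function and type names are hypothetical, the marker names M1 and M2 and their allele sizes are invented for the example, and two points not specified in the text are assumed, namely that the order of the two alleles is ignored and that a marker absent from the reference profile contributes 0.

```python
from typing import Dict, Tuple

AllelePair = Tuple[int, int]       # two allele sizes in base pairs
Profile = Dict[str, AllelePair]    # marker name -> allele pair


def elementary_score(query: AllelePair, reference: AllelePair) -> float:
    """Return 1 if both allele sizes match, 0.5 if only one matches, else 0."""
    remaining = list(reference)
    matches = 0
    for allele in query:
        if allele in remaining:
            remaining.remove(allele)  # each reference allele is matched at most once
            matches += 1
    return matches / 2


def score(query: Profile, reference: Profile) -> float:
    """Score = X / N, where N is the number of markers typed by the user."""
    n = len(query)
    if n == 0:
        raise ValueError("the query must contain at least one marker")
    # Assumption: markers missing from the reference profile add nothing to X.
    x = sum(
        elementary_score(alleles, reference[marker])
        for marker, alleles in query.items()
        if marker in reference
    )
    return x / n


# Invented example: M1 matches on both alleles, M2 on one allele only.
query = {"M1": (100, 110), "M2": (150, 150)}
reference = {"M1": (110, 100), "M2": (150, 152)}
result = score(query, reference)
print(result)  # X = 1 + 0.5 = 1.5, N = 2, so Score = 0.75
```

With a score of 0.75, this invented cultivar would not be retained, since only varieties scoring higher than 0.8 are selected.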

OGDD web site interrogation
Once the parameters of the database are defined and the data are grouped, the database is created physically with PHP/MySQL.
Querying the database requires registration for a user account. The administrator has the option to accept or reject the user's access request. In addition, the administrator must authenticate to access the session and perform the functions described in this page. Once accepted, the user must log in, check the boxes to select markers, and enter the marker numbers and the allele sizes. The result is a homology probability of the unknown variety in question with the varieties of the database. In addition, each displayed result links to the details of the variety, namely the morphology of the tree, leaves and fruit, biochemical characteristics, oil organoleptic quality and allele sizes of all SSR markers currently in the database. Such a result is illustrated in Figure 4 for a sample query of our database. Finally, the user can edit or modify his/her search.

Case examples of unknown varieties
Based on the algorithm described above, we have developed software for databasing and managing SSR DNA fingerprint profiles. A query profile can be compared with all fingerprint profiles included in the database, resulting in a list of profiles sorted by decreasing similarity percentage. Only the results with similarity levels higher than 80% are displayed.
Three profile samples (SSR fingerprints of three unknown cultivars) have been separately tested in the OGDD database. The objective was to identify these three varieties accurately from their SSR fingerprint profiles. For each unknown cultivar, we selected three markers (DCA03, DCA09, GAPU71b). The allele sizes entered for these markers were, respectively, 232/253, 194/194, 144/144 for the first sample; 247/255, 194/206, 117/122 for the second sample and 228/240, 183/183, 124/144 for the third sample. The displayed results showed 100% similarity to the Chetoui, Picholine Languedoc and Arbequina cultivars, respectively. On clicking on details, the database displayed characteristics (such as morphological, agronomical and physicochemical data) for each resulting monovarietal virgin olive oil cultivar (Figure 4).
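The three queries above can be reproduced on a toy reference set with a short Python sketch. It is an illustration under stated assumptions and not the OGDD software: the function names are hypothetical, the reference set contains only the three cultivars named above, and their profiles at the three markers are inferred from the reported 100% matches. A strict comparison is used for the 80% cut-off, following the wording "higher than 80%".

```python
from collections import Counter
from typing import Dict, List, Tuple

Profile = Dict[str, Tuple[int, int]]  # marker name -> two allele sizes (bp)

# Toy reference set; profiles inferred from the 100% matches reported above.
REFERENCE: Dict[str, Profile] = {
    "Chetoui": {"DCA03": (232, 253), "DCA09": (194, 194), "GAPU71b": (144, 144)},
    "Picholine Languedoc": {"DCA03": (247, 255), "DCA09": (194, 206), "GAPU71b": (117, 122)},
    "Arbequina": {"DCA03": (228, 240), "DCA09": (183, 183), "GAPU71b": (124, 144)},
}


def similarity(query: Profile, reference: Profile) -> float:
    """Percentage similarity: 100 * X / N over the markers typed by the user."""
    x = 0.0
    for marker, alleles in query.items():
        # Multiset intersection counts the shared allele sizes (0, 1 or 2).
        shared = Counter(alleles) & Counter(reference.get(marker, ()))
        x += sum(shared.values()) / 2
    return 100.0 * x / len(query)


def identify(query: Profile, minimum: float = 80.0) -> List[Tuple[str, float]]:
    """Return (cultivar, similarity) pairs above `minimum`, best match first."""
    hits = [(name, similarity(query, ref)) for name, ref in REFERENCE.items()]
    hits = [hit for hit in hits if hit[1] > minimum]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


samples: List[Profile] = [
    {"DCA03": (232, 253), "DCA09": (194, 194), "GAPU71b": (144, 144)},
    {"DCA03": (247, 255), "DCA09": (194, 206), "GAPU71b": (117, 122)},
    {"DCA03": (228, 240), "DCA09": (183, 183), "GAPU71b": (124, 144)},
]
results = [identify(sample) for sample in samples]
for result in results:
    print(result)
```

In this toy set, each sample is retained for a single cultivar only, because its similarity to the other two cultivars falls well below the cut-off; for instance, the first sample shares only one DCA09 allele with Picholine Languedoc.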

Conclusion
OGDD, a new and comprehensive database, has been developed with a focus on the identification and authentication of the olive plant. Compared with the few other existing databases for olive species, OGDD has its own specific features and advantages. In fact, it provides a reference system for evaluating the data obtained from the analysis of unknown samples and for defining the varietal composition of virgin olive oil. It also contains a great deal of information for each cultivar identified by the user. This database can be used by all researchers and stakeholders interested in olive cultivar identification and virgin olive oil authentication. Up to now, OGDD searches are based only on SSR markers. However, its next update will extend the query to other molecular markers, such as SNPs, and will include more information on other olive cultivars.