Clinically relevant updates of the HbVar database of human hemoglobin variants and thalassemia mutations

Abstract
HbVar (http://globin.bx.psu.edu/hbvar) is a widely used locus-specific database (LSDB) launched 20 years ago by a multi-center academic effort to provide timely information on the numerous genomic variants leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Here, we report several advances for the database. We made clinically relevant updates to HbVar, implemented as additional querying options on the HbVar query page, that allow users to explore the clinical phenotype of compound heterozygous patients. We also substantially improved the HbVar front page, making comparative data querying, analysis and output more user-friendly. We continued to expand and enrich the regular data content, involving 1820 variants, 230 of which are new entries. Together, these additions, expansions and updates broaden the querying potential of HbVar and should improve its utility both for the globin research community and in the clinical setting.

The HbVar database of hemoglobin variants and thalassemia mutations is one of the oldest and most widely used locus-specific databases (LSDBs), not only in the globin field but also in the wider genetic database community. HbVar was launched 20 years ago, in 2001. It was built from earlier compilations of variants published in books (2,3), converting this information into a publicly available LSDB to provide timely information to interested users, e.g. the globin research community, patients and their parents, and providers of genetic services and counseling. HbVar was designed to allow regular data entry updates and corrections, as new hemoglobin variants and thalassemias continue to be discovered. Its comprehensive query interface gives easy access to the stored information, primarily for the research community, but HbVar is also an aid to physicians in diagnosis. Since its launch, HbVar has rapidly become an important data resource for the globin research community and is considered one of the premier LSDBs available to date (4).
Here, apart from the regular data content updates and corrections, we report important new updates to the structure and functionality of HbVar that aim both to extend the impact of the database beyond globin research to the clinical community and to facilitate data querying and output.

UPDATES TO EXISTING DATA
Since the launch of HbVar (5) and the previous database updates in 2004 (6), 2007 (7) and 2014 (8), the database curators have continually expanded HbVar with more than 230 additional entries and data corrections. Importantly, Dr. Philippe Joly (Hôpital Edouard Herriot, Unité de Pathologie Moléculaire du Globule Rouge, Lyon, France) and Dr. Serge Pissard (Mondor Institute of Biomedical Research, Department of Genetics, Creteil, France) have recently joined the HbVar team as data curators. To identify new hemoglobin variants and thalassemia mutations not previously documented in the database, we manually scanned articles in the specialized journal Hemoglobin, which frequently publishes such reports, and, where applicable, entered previously undocumented variants and additional information on existing variants into HbVar. We also benefit from continuous communication with the globin research community and independent researchers, who provide information and references that our curators use both to add novel variants to HbVar and to correct errors and inconsistencies in existing entries.

THE NEW HbVar HOME PAGE
To better present the data content, interrelated databases, recent updates and user statistics, the HbVar home page has been completely rebuilt. First, the HbVar logo has been redesigned to convey the original concept, as well as the notion of the Hb molecule, in a more vibrant manner. Second, the 'Query the database' functionality now occupies a more central position on the page than before, making it easier for end users to reach. We also included, in tabular format, links to important HbVar functionalities and features that are grouped in different rows in the […] (12)] customized to present data from HbVar and other resources; c) auxiliary information, such as the SNP coordinate converter (see below), reference sequences, and a widely used chart of the mass differences resulting from amino acid substitutions.
HbVar curators and contact information are provided at the end of the new HbVar home page.

CLINICALLY RELEVANT QUERY PAGE UPGRADES
The HbVar database has been regarded as a valuable resource in hemoglobin research since its establishment. Building on this, since its last update we have focused on clinically relevant features that make HbVar more useful to the clinical community as well. Below, we describe two new features that aim to help clinicians better exploit the wealth of information available in HbVar. Both features are self-explanatory, with a brief description at the top of each query window to guide the user.

Compound heterozygotes phenotype
Because many genomic variants yield different Hb variants and thalassemia mutations, most of them at high allele frequencies (6,9), compound heterozygotes with differing clinical features and laboratory findings are often encountered (13). Knowing the specific clinical features of a given combination of variants is crucial for an accurate diagnosis. For example, a common source of misdiagnosis is the co-inheritance of an HBB and an HBD gene variant, which can result in normal HbA2 levels: a δ-globin variant that lowers HbA2 can mask the raised HbA2 that is the usual hallmark of a β-thalassemia allele. Because HbA2 levels are normal, such cases can easily escape the physician's attention, yet identifying them can be of the utmost importance, especially in prenatal diagnosis.
We therefore developed a tool that allows HbVar users from the clinical community to explore the clinical features associated with combinations of globin gene alleles in compound heterozygotes (a total of 309 database entries). The compound heterozygotes phenotype tool (available at http://globin.bx.psu.edu/cgi-bin/hbvar/hematable) includes a large menu of clinical features from which the user can select by ticking the respective boxes (Figure 1). The selected features are included as columns in the table generated by clicking the 'Select columns' button. The first two columns of the table give the combination of globin gene alleles for all 309 HbVar entries with information on compound heterozygotes. The menu on the left side of the screen includes filters that allow the user to narrow down the query; the top filter is the associated variant, with the number of entries for each variant in brackets. For example, the user can select the entries in which Hb S is the associated allele, for which the query returns 38 results, and explore the previously selected clinical features in the table output. Each HbVar entry is a hyperlink to the respective HbVar entry page (Figure 1). The query output can also be exported as a .csv file. Lastly, the user can change the composition of the table by selecting new columns with the button at the top left corner of the page.
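The exported .csv file also lends itself to offline filtering. The following is a minimal Python sketch, assuming hypothetical headers 'Allele 1' and 'Allele 2' for the two allele columns (the actual export may use different names); it selects the rows in which a given allele such as Hb S appears and reproduces bracketed per-allele counts like those in the filter menu. The sample rows are placeholders, not HbVar data.

```python
import csv
import io
from collections import Counter

# Hypothetical headers for the two allele columns; check the real export.
ALLELE_COLUMNS = ("Allele 1", "Allele 2")

def rows_with_allele(csv_text, allele):
    """Return the rows in which `allele` appears in either allele column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if any((row.get(col) or "").strip() == allele for col in ALLELE_COLUMNS)]

def allele_counts(csv_text):
    """Count the rows mentioning each allele (a row counts once per allele)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        alleles = {(row.get(col) or "").strip() for col in ALLELE_COLUMNS}
        counts.update(a for a in alleles if a)
    return counts

# Placeholder rows for illustration only; not actual HbVar entries.
SAMPLE = """Allele 1,Allele 2,Clinical feature
Hb S,variant_A,placeholder
variant_B,Hb S,placeholder
variant_A,variant_B,placeholder
"""
```

Once the column names are adjusted to the real export, `rows_with_allele(text, "Hb S")` would be expected to return the same entries as the Hb S filter in the web interface.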

SNP coordinates converter
Because different numbering systems are used to describe a genomic position, there is often ambiguity about the position of a specific variant, especially among clinicians, who often need to assess the clinical information on a specific variant urgently. We therefore developed a tool that provides this positional information; specifically, it converts a position given in the common numbering system into the various other systems, such as the official Human Genome Organization (HUGO) genomic DNA-based description, the Human Genome Variation Society (HGVS) coding DNA reference sequence description, the DNA-based description using the GenBank reference sequences NG_000007.3 and NG_000006.1 and, lastly, the common protein-based description. This tool is available at http://globin.bx.psu.edu/cgi-bin/hbvar/coorSeqCheck.
In the demo query shown in Figure 2, the user selects a given position or range for a specific globin chain (in this case, the range between −50 and +50 for the delta globin chain, using the common DNA-based description). Clicking the 'Submit' button returns 12 HbVar entries and 11 dbSNP entries from the PSU Genome Browser, along with the equivalent descriptions of these positions in all other numbering systems, shown at the top of the page.
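The relationship between the common protein-based numbering and HGVS numbering can be illustrated with a small conversion. As a sketch, assuming (as for the mature globin chains) that the common numbering omits the initiator methionine while HGVS protein numbering counts it as residue 1, and that c.1 is the A of the ATG start codon, common codon n corresponds to HGVS residue n + 1 and to positions c.(3n+1) to c.(3n+3). The function name is hypothetical and not part of the HbVar tool.

```python
def common_codon_to_hgvs(codon: int) -> dict:
    """Map a common globin codon number (initiator Met not counted) to the
    HGVS protein position (Met = 1) and the c. positions of that codon.

    Hypothetical helper for illustration; not the HbVar converter itself.
    """
    if codon < 1:
        raise ValueError("common codon numbers start at 1")
    hgvs_codon = codon + 1          # HGVS counts the initiator methionine
    first_nt = 3 * hgvs_codon - 2   # c.1 is the A of the ATG start codon
    return {"protein_position": hgvs_codon,
            "cdna_start": first_nt,
            "cdna_end": first_nt + 2}
```

For example, Hb S (β6 Glu→Val in the common description) maps to HGVS residue 7 and to c.19 to c.21; the variant itself is HBB:c.20A>T (p.Glu7Val), the middle base of that codon.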
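In the common DNA-based description, positions in the 5' flanking region and 5' untranslated region (UTR) are counted from the cap site, whereas HGVS coding DNA numbering counts backwards from the ATG. The following is a minimal sketch of that conversion, assuming cap-relative numbering with +1 at the transcription start site and no position 0, and a 5'UTR length supplied by the caller: 50 nt for HBB, consistent with the β-thalassemia promoter mutation −28 A>G being HBB:c.-78A>G (no value is assumed here for HBD). Positions downstream of the 5'UTR would need exon/intron-aware mapping and are rejected. Both function names are hypothetical.

```python
def cap_to_cdna(pos: int, utr5_len: int) -> str:
    """Convert a cap-site-relative position (+1 = transcription start site,
    no position 0) in the 5' flanking region or 5'UTR to HGVS c. notation."""
    if pos == 0:
        raise ValueError("cap-relative numbering has no position 0")
    if pos > utr5_len:
        raise ValueError("positions past the 5'UTR need exon/intron-aware mapping")
    if pos > 0:
        cdna = pos - utr5_len - 1   # last 5'UTR base (+utr5_len) becomes c.-1
    else:
        cdna = pos - utr5_len       # cap -1 becomes c.-(utr5_len + 1)
    return f"c.{cdna}"

def cap_range_to_cdna(start: int, end: int, utr5_len: int) -> tuple:
    """Convert both ends of a cap-relative range, e.g. a -50 to +50 query."""
    return cap_to_cdna(start, utr5_len), cap_to_cdna(end, utr5_len)

# Assumption: HBB 5'UTR length, consistent with -28 A>G = HBB:c.-78A>G.
HBB_UTR5_LEN = 50
```

Under this assumption, a −50 to +50 range on HBB corresponds to c.-100 to c.-1.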

DATABASE ACCESS
Since their launch in January 2001, the HbVar database and associated resources at the Globin Gene Server [http://globin.bx.psu.edu], such as the online Syllabi, have been used regularly worldwide. HbVar is also frequently accessed via Facebook and from mobile devices. Users frequently contact the curators and other HbVar team members to submit new hemoglobin variants and/or thalassemia mutations, report missing information for existing mutants, identify inconsistencies and/or erroneous entries, and even propose collaborative projects related to HbVar data records.
Since the last update, as shown on the newly available 'User statistics' page (http://globin.bx.psu.edu/hbvar/usage_graphs.html), the number of annual users (based on unique IP addresses) now exceeds 15,000 for the query page and 8,000 for the Summary page. These figures illustrate the utility of HbVar for the globin research community.
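Counting annual users by unique IP address amounts to collecting the distinct addresses seen per page and per year. The following is a minimal sketch, assuming a hypothetical log already parsed into (date, page, IP) tuples; it is not the procedure behind the HbVar statistics page. Unique IPs only approximate users, since several users may share one address and one user may appear under several.

```python
from collections import defaultdict
from datetime import date

def annual_unique_ips(records):
    """Return {(year, page): number of distinct IP addresses}.

    `records` is an iterable of (access_date, page, ip) tuples; this log
    format is a hypothetical example, not the one used by HbVar.
    """
    ips_by_key = defaultdict(set)
    for access_date, page, ip in records:
        ips_by_key[(access_date.year, page)].add(ip)
    return {key: len(ips) for key, ips in ips_by_key.items()}

# Placeholder records (documentation-range IPs) for illustration only.
log = [
    (date(2020, 3, 1), "query", "192.0.2.1"),
    (date(2020, 3, 2), "query", "192.0.2.1"),   # repeat visit, counted once
    (date(2020, 5, 9), "query", "192.0.2.7"),
    (date(2020, 5, 9), "summary", "192.0.2.7"),
]
```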

FUTURE PROSPECTS
Since its inception, HbVar has become a key data resource for information about DNA variants leading to hemoglobinopathies and is still considered one of the most important existing LSDBs. Key factors that have contributed to its broad adoption and success are (a) its constant data updates and improvements, driven mostly by the long-term devotion and enthusiasm of the data curators and other researchers involved in this project, from both Europe and the US, (b) its dynamic data querying and visualization tools, in conjunction with the UCSC and PSU genome browsers, which are constantly being upgraded to become more user-friendly, and (c) its links with other stable and well-respected international databases. Together, these features have allowed HbVar to maintain a positive impact on the research community and to attract funding on a continuous basis, either dedicated to HbVar or through related projects. This is particularly important for keeping HbVar operational in an environment where dedicated funding for database development and curation is often very hard to secure, which frequently leads to the discontinuation of many useful databases.
To ensure continuous enrichment of HbVar data, we plan to implement a broader data search strategy that includes text-mining tools and other electronic search procedures. This will complement the existing close links to the scientific journal Hemoglobin and to other resources such as the Human Gene Mutation Database (www.hgmd.org.uk; (14)), in addition to the databases with which HbVar already has bidirectional links (7,8).
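At its simplest, electronic searching of the literature can start with pattern matching on HGVS-style notation. As an illustration only (a plain regular expression, not the text-mining tools HbVar plans to adopt), the sketch below extracts coding DNA substitutions such as c.20A>T or c.92+5G>C from free text and keeps those absent from a set of already curated descriptions; the function names and the `known` set are hypothetical.

```python
import re

# HGVS coding-DNA substitutions, including 5'UTR (c.-N), 3'UTR (c.*N)
# and intronic (c.N+M / c.N-M) positions.
CDNA_SUB = re.compile(
    r"\bc\.(?P<pos>[-*]?\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<alt>[ACGT])")

def find_cdna_substitutions(text):
    """Return 'c.<pos><ref>><alt>' strings in order of appearance."""
    return [f"c.{m['pos']}{m['ref']}>{m['alt']}" for m in CDNA_SUB.finditer(text)]

def new_candidates(text, known):
    """Return substitutions in `text` that are not in `known`, without duplicates."""
    found = []
    for desc in find_cdna_substitutions(text):
        if desc not in known and desc not in found:
            found.append(desc)
    return found
```

Candidates flagged this way would still need manual review by the curators, as with the current scanning of Hemoglobin.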
HbVar's recent emphasis on reaching clinicians, and not only researchers in the globin field, highlights its potential to make an impact in the clinical globin community as well. In particular, HbVar could become a focal point for collecting genotype and phenotype data from a very large number of hemoglobinopathy patients in registries and clinics worldwide. Similar to the CFTR2 project (www.cftr2.org; (15)), such a long-term effort would entail thorough contributions of genotype and clinical phenotype data, based on the already well-documented microattribution approach (16,17), allowing the identification of rare variants associated with disease. In the CFTR2 cohort of individuals with cystic fibrosis, 159 CFTR gene variants had an allele frequency of at least 0.01%. These variants were evaluated for both clinical severity and functional consequence, with 127 (80%) meeting both clinical and functional criteria consistent with disease. Assessment of disease penetrance in 2,188 fathers of individuals with cystic fibrosis enabled 12 of the remaining 32 variants to be classified as neutral, whereas the other 20 variants remained of indeterminate effect. The CFTR2 study illustrates that sourcing data directly from well-phenotyped subjects can address the gap in our ability to interpret clinically relevant genomic variation.