Hongxin He, Manhong Shi, Yuxin Lin, Chaoying Zhan, Rongrong Wu, Cheng Bi, Xingyun Liu, Shumin Ren, Bairong Shen, HFBD: a biomarker knowledge database for heart failure heterogeneity and personalized applications, Bioinformatics, Volume 37, Issue 23, December 2021, Pages 4534–4539, https://doi.org/10.1093/bioinformatics/btab470
Abstract
Heart failure (HF) is a cardiovascular disease with a high incidence around the world. Accumulating studies have focused on the identification of biomarkers for HF precision medicine. To understand HF heterogeneity and provide biomarker information for the personalized diagnosis and treatment of HF, a knowledge database that collects distributed, multi-level biomarker information is needed.
In this study, the HF biomarker knowledge database (HFBD) was established by manually collecting data and knowledge from the literature in PubMed. HFBD contains 2618 records and 868 HF biomarkers (731 single and 137 combined) extracted from 1237 original articles. The biomarkers were classified into proteins, RNAs, DNAs and others at the molecular, imaging, cellular and physiological levels. The biomarkers were annotated with biological, clinical and article information, as well as the experimental methods used for biomarker discovery. With its user-friendly interface, this knowledge database provides a unique resource for the systematic understanding of HF heterogeneity and for personalized diagnosis and treatment of HF in the era of precision medicine.
The platform is openly available at http://sysbio.org.cn/HFBD/.
1 Introduction
Heart failure (HF) is a common, costly, debilitating and potentially fatal syndrome with complex and varied pathophysiology (McMurray and Pfeffer, 2005; Mebazaa et al., 2014; Richardson et al., 2007). It is defined as a failure of the heart to supply the blood and oxygen required for the metabolic demands of the body (Dickstein et al., 2008; El Amrousy et al., 2017; McMurray et al., 2012). Its reported prevalence is 3–20 per 1000 individuals (0.3–2%) in the general population, approximately 1–2% of the adult population in developed countries, and as high as 60–100 per 1000 (6–10%) in people older than 65 years (Ceia et al., 2002; Mebazaa et al., 2014; Mosterd and Hoes, 2007; Redfield et al., 2003). The number of people with HF is still increasing, most likely as a result of population aging and the rising prevalence of diabetes, obesity, hypertension and atherosclerotic disease (Ohkuma et al., 2017; Shi et al., 2019). Common causes of HF include coronary artery disease (including previous myocardial infarction, or heart attack), high blood pressure, atrial fibrillation, valvular heart disease, excess alcohol use, infection and cardiomyopathy of unknown cause. All of these conditions cause HF by altering either the structure or the function of the heart (McMurray and Pfeffer, 2005; National Clinical Guideline Centre, 2010).
HF not only increases medical costs, mortality, morbidity and hospitalization rates but also reduces health-related quality of life and functional status (Bettencourt et al., 2004; Ohkuma et al., 2017; Wajner et al., 2017). HF prevention and management have therefore become an important global public health issue (Ohkuma et al., 2017). In general, HF is diagnosed on the basis of a physical examination; a history of signs and symptoms, which commonly include shortness of breath (which may wake the person at night), limited exercise capacity, excessive tiredness and leg swelling (Davie et al., 1997; Fonseca, 2006; National Clinical Guideline Centre, 2010); and cardiac imaging, such as chest X-ray (Hawkins et al., 2009; Thomas et al., 2002). Because these routine clinical methods still lead to some misdiagnoses and have other drawbacks, more efficient and effective ways to improve the accuracy of HF diagnosis and treatment are needed.
The concept of the biomarker was first introduced in 1989 and standardized in 2001 (Eleuteri and Di Stefano, 2012). A biomarker is a traceable substance that indicates a normal biological process, organ function, a particular disease state or the pharmacologic response to a therapeutic intervention (Siderowf et al., 2018). Biomarkers play an important role in the diagnosis, prognosis and treatment of diseases. The definition of a biomarker has also evolved with the development of modern biological and computational technologies (Lin et al., 2019; 2020; Qi et al., 2020; Shen et al., 2019), and biomarkers can also be metric measurements of dynamic physiological signals (Bai et al., 2017; Shi et al., 2019; 2020). Precision medicine and healthcare clearly need personalized biomarkers that account for the heterogeneity of diseases, including HF (Shen et al., 2017).
Continuous innovation in scientific research has led to a steady increase in the number of publications, making it difficult to find personalized biomarker information for a specific disease within large volumes of unstructured data. Several databases for different diseases and biomarkers have therefore been established to address this problem, such as the gastric cancer biomarkers knowledgebase (GCBKB) (Lee et al., 2007), the urinary protein biomarker database (UPB) (Shao et al., 2011), the epigenetic biomarker database for colorectal cancer (CRC-EBD) (Liu et al., 2020) and a knowledgebase for non-syndromic congenital heart disease-associated genetic variations (CHDGKB) (Yang et al., 2020), among others. Both researchers and clinicians can use these databases to understand disease heterogeneity systematically (Wu et al., 2020).
Echocardiography, electrocardiography, blood tests and chest radiography are all useful in determining the underlying causes of HF (National Clinical Guideline Centre, 2010). However, no database collects all of these data, so HF researchers and clinicians must spend a great deal of time and effort looking for the information they need. A database that integrates these data with the available HF biomarker information would therefore provide a practical and efficient tool for systems-level investigation of HF.
PubMed provides a large amount of information on biomarkers for the diagnosis, treatment and prognosis of HF; however, this information is unstructured, the terminology is not standardized, and the relationships between HF phenotypes and biomarkers cannot be easily identified. In the big data era, the large amount of available knowledge and data gives us the opportunity to build computational models, including artificial intelligence (AI) methods, to investigate the heterogeneity of human diseases. However, structured and well-annotated data are the first and vital step in model building. In addition, other resources were integrated into the knowledge database HFBD to annotate and standardize the collected data.
2 Materials and methods
2.1 Data collection
To ensure the reliability of the data, professional databases and knowledgebases were searched, a large number of articles were reviewed, and strict rules were applied to the selection and classification of the data. First, relevant original literature was gathered from PubMed using the following query: heart failure [Title] AND (biomarker* [Title] OR marker* [Title] OR indicat* [Title] OR predict* [Title]). On this basis, 3982 articles published between 1948 and 2019 were reviewed as the original data for building HFBD. Studies based on animal samples (611 articles) and reviews, meta-analyses, case reports, comments, letters and editorials (534 articles) were excluded, leaving 2837 original articles. The abstracts of these articles were carefully reviewed for relevant clinical information concerning diagnosis, prognosis and treatment. A total of 1237 original articles met all of the selection criteria. These articles formed the data source for HFBD and yielded 868 different HF biomarkers: 731 single and 137 combined biomarkers.
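For readers who want to rerun a comparable search, the sketch below builds a request URL for the NCBI E-utilities `esearch` endpoint from the query quoted above and re-derives the article counts of each screening step. It is a sketch under assumptions: the paper does not say the search was run through E-utilities, and restricting the publication date to 1948–2019 (`datetype=pdat`) as well as `retmax=10000` are choices made here. The helper names `esearch_url` and `screening_funnel` are hypothetical. Because PubMed content changes over time, a rerun will not return exactly 3982 articles.

```python
from urllib.parse import urlencode

# Title-field query exactly as reported in the text.
QUERY = ("heart failure [Title] AND (biomarker* [Title] OR marker* [Title] "
         "OR indicat* [Title] OR predict* [Title])")

def esearch_url(term: str, mindate: str = "1948", maxdate: str = "2019") -> str:
    """Build an E-utilities esearch URL (assumed access route, not the authors')."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date (assumption)
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": 10000,      # large enough for a few thousand PMIDs
        "retmode": "json",
    }
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(params)

def screening_funnel(retrieved: int, animal: int, non_original: int, included: int) -> dict:
    """Recompute the article counts after each exclusion step."""
    original = retrieved - animal - non_original
    if included > original:
        raise ValueError("included articles cannot exceed the screened pool")
    return {
        "retrieved": retrieved,
        "original_after_exclusion": original,
        "excluded_at_abstract_review": original - included,
        "included": included,
    }

print(esearch_url(QUERY))
print(screening_funnel(3982, 611, 534, 1237))
# {'retrieved': 3982, 'original_after_exclusion': 2837, 'excluded_at_abstract_review': 1600, 'included': 1237}
```

The funnel check confirms that the reported counts are internally consistent: 3982 − 611 − 534 = 2837, of which 1237 passed abstract review.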
2.2 Biomarker annotation
During data collection, four other databases were searched to distinguish the biological category of each biomarker, to unify the biomarker names curated from the publications and to annotate the biomarkers: the NCBI Gene database (https://www.ncbi.nlm.nih.gov/gene), the NCBI Protein database (https://www.ncbi.nlm.nih.gov/protein), the UniProtKB database (https://www.uniprot.org/uniprot) and the miRBase database (http://www.mirbase.org/). Wikipedia (https://en.wikipedia.org) was consulted when necessary.
Based on their biological structures, HF biomarkers in HFBD were classified as molecular, imaging, cellular and physiological biomarkers. Molecular biomarkers can be proteins, DNAs, RNAs or other types of molecules. RNA biomarker annotations are based on the NCBI Gene and miRBase databases, whereas protein biomarker annotations come from the NCBI Gene, NCBI Protein and UniProtKB databases. Wikipedia was consulted to annotate other types of biomarkers, such as imaging and physiological indicators.
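The category-to-source rule above amounts to a small lookup table. The sketch below encodes only what the text states; the function name `annotation_sources` and the category keys are hypothetical, and categories the text does not assign to a source (for example DNA or cellular biomarkers) return an empty tuple rather than a guessed source.

```python
# Annotation resources per biomarker category, as stated in the text.
ANNOTATION_SOURCES: dict[str, tuple[str, ...]] = {
    "RNA": ("NCBI Gene", "miRBase"),
    "protein": ("NCBI Gene", "NCBI Protein", "UniProtKB"),
    "imaging": ("Wikipedia",),
    "physiological": ("Wikipedia",),
}

def annotation_sources(category: str) -> tuple[str, ...]:
    """Return the resources used to annotate a category.

    An empty tuple means the text does not specify a source for it.
    """
    return ANNOTATION_SOURCES.get(category, ())

print(annotation_sources("protein"))  # ('NCBI Gene', 'NCBI Protein', 'UniProtKB')
```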
Many different terms are used to describe heart failure, and the symptoms, diagnosis and treatment methods associated with them usually differ. Clarifying the classification criteria for these terms and the relationships among them makes it easier to structure and annotate the biomarkers in HFBD. Based on the current European Society of Cardiology (ESC) HF clinical guideline, HF was classified as HFrEF (HF with reduced ejection fraction), with left ventricular ejection fraction (LVEF) <40%; HFmrEF (HF with mid-range ejection fraction), with LVEF between 40% and 49%; and HFpEF (HF with preserved ejection fraction), with LVEF ≥50%. HFpEF and HFrEF were previously referred to as diastolic and systolic HF, respectively.
Alternatively, based on its time course, HF can be classified as AHF (acute heart failure) or CHF (chronic heart failure). Patients who have had HF for some time are often said to have ‘CHF’, whereas patients with rapid onset or worsening of symptoms and/or signs of HF are described as having ‘AHF’. Within CHF, a treated patient whose signs and symptoms have remained generally unchanged for at least 1 month is said to have ‘stable heart failure’; if chronic stable HF deteriorates, the patient may be described as having ‘decompensated heart failure’. ‘Congestive heart failure’ is another common term, used for patients with AHF or CHF who show evidence of volume overload. Patients with CHF may also develop AHF: for example, AHF may be precipitated by extrinsic factors or caused by primary cardiac dysfunction in CHF patients, presenting as acute decompensation of CHF. Depending on a patient’s stage of illness, many or all of these terms may accurately be applied to the same patient at different times (Lam and Solomon, 2014; Lund, 2018; Ponikowski et al., 2016).
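The LVEF cut-offs described above can be expressed as a small classifier. This is a minimal sketch assuming LVEF is given as a percentage; the function name `classify_by_lvef` is hypothetical. It applies only the LVEF thresholds: the guideline's other diagnostic requirements (symptoms, signs and supporting findings) are not modelled, and treating a non-integer value such as 49.5% as HFmrEF is an assumption, since the guideline states the ranges in whole percentages.

```python
def classify_by_lvef(lvef: float) -> str:
    """Map a left ventricular ejection fraction (%) to an HF subtype label.

    HFrEF: LVEF < 40; HFmrEF: 40 <= LVEF < 50; HFpEF: LVEF >= 50.
    """
    if not 0 <= lvef <= 100:
        raise ValueError(f"LVEF must be a percentage in [0, 100], got {lvef}")
    if lvef < 40:
        return "HFrEF"
    if lvef < 50:
        return "HFmrEF"
    return "HFpEF"

for value in (25, 40, 49, 55):
    print(value, classify_by_lvef(value))
# 25 HFrEF
# 40 HFmrEF
# 49 HFmrEF
# 55 HFpEF
```

Using half-open intervals (`< 40`, `< 50`) leaves no gaps between the three classes, so every valid LVEF value receives exactly one label.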
To give users a comprehensive understanding of each HF biomarker, we carefully curated the following information for every biomarker entry: biomarker information, disease information, sample information, summary and reference.
Biomarker-Information: basic information about single and combined HF biomarkers, such as their full names and abbreviations.
Disease-Information: the clinical phenotype of HF, such as subtype, phase and stage.
Sample-Information: information about the human samples, including patient age, gender and race.
Summary: information on the experimental research and its results.
Reference: information about PubMed citations.
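The five-part entry structure listed above maps naturally onto nested records. The dataclasses below are a hypothetical sketch of such an entry; the class and field names are illustrative and are not the actual HFBD schema, and the PMID value in the usage line is a placeholder.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class BiomarkerInfo:
    full_name: str
    abbreviation: str
    combined: bool = False            # True for a combination of markers

@dataclass
class DiseaseInfo:
    subtype: Optional[str] = None     # e.g. "HFrEF" or "CHF"
    stage: Optional[str] = None       # e.g. an NYHA class

@dataclass
class SampleInfo:
    age: Optional[str] = None         # kept as text: studies report ranges or means
    gender: Optional[str] = None
    race: Optional[str] = None

@dataclass
class HFBDRecord:
    biomarker: BiomarkerInfo
    disease: DiseaseInfo
    sample: SampleInfo
    summary: str                      # experimental research and results
    pmid: str                         # reference: PubMed identifier

record = HFBDRecord(
    BiomarkerInfo("B-type natriuretic peptide", "BNP"),
    DiseaseInfo(subtype="CHF"),
    SampleInfo(),
    summary="placeholder summary",
    pmid="PLACEHOLDER",
)
print(asdict(record)["biomarker"])
# {'full_name': 'B-type natriuretic peptide', 'abbreviation': 'BNP', 'combined': False}
```

`asdict` flattens the nested dataclasses into plain dictionaries, which is convenient for exporting entries to tabular or JSON formats.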
2.3 Database implementation
A MySQL (5.0.11) server was used to store all of the data in a database. HTML, PHP (5.6.28) and JavaScript were used to build the website, which was deployed on a 64-bit Windows operating system with the Apache (2.4.23) HTTP server.
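To illustrate how biomarkers and their per-article records might be laid out relationally, the sketch below uses Python's built-in `sqlite3` in place of MySQL so that it runs without a server. The table and column names are hypothetical, and the assumption that each record row is one curated biomarker finding from one article is ours. The SQL shown is standard enough to port to MySQL with minor changes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE biomarker (
    id           INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL,
    abbreviation TEXT,
    category     TEXT,                 -- e.g. protein, RNA, imaging
    is_combined  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE record (
    id           INTEGER PRIMARY KEY,
    biomarker_id INTEGER NOT NULL REFERENCES biomarker(id),
    pmid         TEXT NOT NULL,        -- assumed: one finding from one article
    hf_type      TEXT,                 -- e.g. CHF, AHF, HFrEF
    application  TEXT                  -- diagnosis, prognosis or treatment
);
CREATE INDEX idx_record_biomarker ON record(biomarker_id);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one biomarker and two placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO biomarker VALUES (1, 'B-type natriuretic peptide', 'BNP', 'protein', 0)"
    )
    conn.executemany(
        "INSERT INTO record (biomarker_id, pmid, hf_type, application) VALUES (?, ?, ?, ?)",
        [(1, "PMID-A", "CHF", "prognosis"), (1, "PMID-B", "AHF", "diagnosis")],
    )
    return conn

def records_for(conn: sqlite3.Connection, abbreviation: str) -> list:
    """Join records to their biomarker and filter by abbreviation."""
    return conn.execute(
        """SELECT r.pmid, r.hf_type, r.application
           FROM record AS r JOIN biomarker AS b ON b.id = r.biomarker_id
           WHERE b.abbreviation = ?
           ORDER BY r.pmid""",
        (abbreviation,),
    ).fetchall()

print(records_for(build_demo_db(), "BNP"))
# [('PMID-A', 'CHF', 'prognosis'), ('PMID-B', 'AHF', 'diagnosis')]
```

Keeping biomarkers and records in separate tables lets one biomarker (such as BNP) appear in many records without duplicating its names and category.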
3 Results
3.1 Data statistics
HFBD includes 868 HF biomarkers (Fig. 1A). Based on their biological structures, the two largest categories of single HF biomarkers are molecular and imaging biomarkers (Fig. 1B), and most of the collected molecular biomarkers are proteins (Fig. 1C). Based on their clinical applications, the biomarkers can be used for diagnosis, prognosis and treatment (Fig. 1D and Table 1).
Fig. 1. Distribution of different HF biomarkers in the HFBD. (A) Combined and single HF biomarkers in the HFBD. (B) Distribution of single HF biomarkers at different levels in the HFBD. (C) Distribution of single molecular biomarkers in the HFBD. (D) Distribution of HF biomarkers (single + combined) in the HFBD based on their clinical applications.
Table 1. Distribution of HF biomarkers in the HFBD based on their applications
| Application | Single | Combined | Total |
|---|---|---|---|
| Prognosis | 463 | 95 | 558 |
| Diagnosis | 287 | 34 | 321 |
| Treatment | 138 | 10 | 148 |
| Diagnosis + Prognosis | 67 | 3 | 70 |
| Prognosis + Treatment | 39 | 0 | 39 |
| Diagnosis + Treatment | 15 | 2 | 17 |
| Diagnosis + Prognosis + Treatment | 11 | 0 | 11 |
Table 2 shows the distribution of biomarkers across the two HF classification schemes (time course and LVEF). Chronic heart failure is the HF phenotype with the most biomarkers (407) and the most PubMed citations (458).
Table 2. Distribution of HF biomarkers in the HFBD and their PubMed citations
| HF classification | Single biomarkers | Combined biomarkers | All biomarkers | PubMed citations |
|---|---|---|---|---|
| Time course of HF | | | | |
| Chronic heart failure | 346 | 61 | 407 | 458 |
| Acute heart failure | 108 | 21 | 129 | 118 |
| According to LVEF | | | | |
| HFrEF (LVEF < 40%) | 145 | 13 | 158 | 139 |
| HFpEF (LVEF ≥ 50%) | 80 | 12 | 92 | 70 |
| HFmrEF (LVEF: 40–49%) | 2 | 0 | 2 | 1 |
HFrEF, heart failure with reduced ejection fraction; HFpEF, heart failure with preserved ejection fraction; HFmrEF, heart failure with mid-range ejection fraction; LVEF, left ventricular ejection fraction.
We also analyzed the distribution of HF biomarker studies and the epidemiological characteristics of the studied populations. Figure 2A displays the geographical distribution of HF biomarker studies. In total, 1 056 437 participants aged 0–102 years were included in HFBD, comprising 629 228 males (60%) and 427 209 females (40%). The mean age of the studied population in HFBD is 63.87 years. Among the subjects in HFBD, the numbers of white (Caucasian), black, Hispanic, African American and Asian individuals are 189 113, 76 748, 13 071, 9495 and 3848, respectively. Plasma, serum and urine samples were used to detect HF biomarkers in 601, 480 and 40 HFBD records, respectively. The distribution of records across HF stages according to the New York Heart Association (NYHA) classification is presented in Figure 2B. Most studies examined several disease stages together.
Fig. 2. Statistics of biomarkers in HFBD. (A) The geographical distribution of HF biomarker studies. (B) Distribution of records in the HFBD based on the stages of HF.
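The population figures above can be checked with simple arithmetic. The short sketch below, with the hypothetical helper `share`, confirms that the male and female counts add up to the reported total and give the stated 60%/40% split (59.6% and 40.4% to one decimal). It also sums the five listed race groups, which together account for 292 275 participants; the remaining participants therefore belong to other groups or have no race recorded.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

MALES, FEMALES = 629_228, 427_209
RACE_COUNTS = {
    "white (Caucasian)": 189_113,
    "black": 76_748,
    "Hispanic": 13_071,
    "African American": 9_495,
    "Asian": 3_848,
}

total = MALES + FEMALES
race_total = sum(RACE_COUNTS.values())

print(total)                                                         # 1056437
print(round(share(MALES, total), 1), round(share(FEMALES, total), 1))  # 59.6 40.4
print(race_total)                                                    # 292275
```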
3.2 Web framework
The online HFBD consists of six parts: the ‘HOME’, ‘BIOMARKERS’, ‘DOCUMENT’, ‘SUBMISSION’, ‘DOWNLOAD’ and ‘ABOUT US’ pages. The ‘HOME’ and ‘DOCUMENT’ pages give a brief introduction to HF and relevant statistics on HF biomarkers, respectively. Database queries are run on the ‘BIOMARKERS’ page. Users can submit new data on HF biomarkers on the ‘SUBMISSION’ page and download all HFBD data on the ‘DOWNLOAD’ page. In addition, users can learn how to use HFBD on the ‘Help’ page and contact us via the ‘ABOUT US’ page. Figure 3 displays the pages described above.
Fig. 3. HFBD webpages. (A) Keyword search and the search results. (B) List search and the search results. (C) Advanced search and the search results.
3.3 Search in the HFBD
Three search methods, keyword, list and advanced search, are provided for finding HF biomarkers. In the keyword search, users find target HF biomarkers by typing a biomarker’s name into the search bar (Fig. 3A). In the list search, users choose from categories such as HF type, biological category of the biomarker, combined biomarkers and clinical application to retrieve the corresponding HF biomarkers (Fig. 3B). In the advanced search, six conditions are provided for users to extract the information they need (Fig. 3C). The search results (Fig. 3) are shown as a table with basic information such as the PubMed ID (PMID), the full name and abbreviation of each HF biomarker, its biological category, the HF type and its clinical application.
When users click a row in the results table, details of the HF biomarker are shown on a new page. Figure 4 shows an example for the HF biomarker ‘BNP’ (B-type natriuretic peptide). The information is grouped into five categories: a description of the biomarker, the disease (HF), the sample, a summary and the reference. Users can also click ‘NCBI_Gene’, ‘UniProt’, ‘miRBase’ and ‘PMID’ to access online information about the biomarker.
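The three search modes described above can be sketched as filters over a list of entries. This is an illustrative sketch, not the HFBD implementation (which is PHP over MySQL): the `Entry` fields, the case-insensitive substring matching of the keyword search and the AND semantics of the advanced search are all assumptions, and the sketch accepts any field rather than HFBD's six specific conditions, which the text does not list. The demo entries and PMIDs are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    pmid: str
    full_name: str
    abbreviation: str
    category: str        # biological category, e.g. "protein"
    hf_type: str         # e.g. "CHF", "HFrEF"
    application: str     # "diagnosis", "prognosis" or "treatment"
    combined: bool = False

def keyword_search(entries, keyword):
    """Case-insensitive substring match on full name or abbreviation."""
    k = keyword.casefold()
    return [e for e in entries
            if k in e.full_name.casefold() or k in e.abbreviation.casefold()]

def list_search(entries, attr, value):
    """Browse by a single category, e.g. list_search(entries, "hf_type", "AHF")."""
    return [e for e in entries if getattr(e, attr) == value]

def advanced_search(entries, **conditions):
    """AND-combine several field conditions; conditions set to None are ignored."""
    active = {a: v for a, v in conditions.items() if v is not None}
    return [e for e in entries
            if all(getattr(e, a) == v for a, v in active.items())]

entries = [
    Entry("PMID-1", "B-type natriuretic peptide", "BNP", "protein", "CHF", "prognosis"),
    Entry("PMID-2", "N-terminal pro-B-type natriuretic peptide", "NT-proBNP",
          "protein", "AHF", "diagnosis"),
    Entry("PMID-3", "Left ventricular ejection fraction", "LVEF",
          "imaging", "HFrEF", "prognosis"),
]

print([e.pmid for e in keyword_search(entries, "bnp")])   # ['PMID-1', 'PMID-2']
print([e.pmid for e in advanced_search(entries, category="protein",
                                       application="diagnosis", hf_type=None)])  # ['PMID-2']
```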
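The outbound links on a detail page can be generated from stored identifiers. In the sketch below, the NCBI Gene and PubMed URL patterns are the public ones, and the UniProt base follows the URL given in Section 2.2. miRBase is omitted because its entry-URL pattern should be checked against the current site. The function name `external_links` is hypothetical, and the identifiers in the usage line are placeholders.

```python
from typing import Optional

# URL templates for external resources linked from a biomarker page.
LINK_TEMPLATES = {
    "NCBI_Gene": "https://www.ncbi.nlm.nih.gov/gene/{}",
    "UniProt": "https://www.uniprot.org/uniprot/{}",
    "PMID": "https://pubmed.ncbi.nlm.nih.gov/{}/",
}

def external_links(ids: dict[str, Optional[str]]) -> dict[str, str]:
    """Turn {resource: identifier} into {resource: URL}.

    Missing identifiers and resources without a template are skipped.
    """
    return {
        name: LINK_TEMPLATES[name].format(identifier)
        for name, identifier in ids.items()
        if identifier and name in LINK_TEMPLATES
    }

links = external_links({"NCBI_Gene": "GENE_ID", "UniProt": None, "PMID": "PMID_PLACEHOLDER"})
print(links)
# {'NCBI_Gene': 'https://www.ncbi.nlm.nih.gov/gene/GENE_ID', 'PMID': 'https://pubmed.ncbi.nlm.nih.gov/PMID_PLACEHOLDER/'}
```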
3.4 Submission and download pages
A dedicated message module allows users to submit new data. First, users fill in the form on the submission page as required, providing biomarker information, disease information and the reference. Second, we read the corresponding literature and check the relevant data based on the information provided by the user. Finally, if a biomarker meets our selection criteria after verification, we add the qualified information to HFBD.
Users can download all HFBD data from the download page by clicking the download button and choosing a save location. The file is downloaded in standard *.xlsx format.
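The three-step submission workflow above can be modelled as a small state machine. In this hypothetical sketch, the status names and form field names are illustrative, and `meets_criteria` stands in for the curators' manual judgement after reading the cited article; it is not an automated check.

```python
from enum import Enum

class Status(Enum):
    INCOMPLETE = "incomplete"      # form rejected before curation
    UNDER_REVIEW = "under review"  # curators are reading the cited article
    ACCEPTED = "accepted"          # information added to the database
    REJECTED = "rejected"          # failed the selection criteria

# Hypothetical field names for the three parts of the form named in the text.
REQUIRED_FIELDS = ("biomarker", "disease", "reference")

def check_form(form: dict) -> tuple[Status, list[str]]:
    """Step 1: move to review only if every required field is a non-blank string."""
    missing = [f for f in REQUIRED_FIELDS
               if not isinstance(form.get(f), str) or not form[f].strip()]
    return (Status.INCOMPLETE if missing else Status.UNDER_REVIEW), missing

def curate(status: Status, meets_criteria: bool) -> Status:
    """Steps 2-3: record the curators' verdict for a submission under review."""
    if status is not Status.UNDER_REVIEW:
        return status
    return Status.ACCEPTED if meets_criteria else Status.REJECTED

status, missing = check_form({"biomarker": "BNP", "disease": "CHF", "reference": ""})
print(status, missing)  # Status.INCOMPLETE ['reference']
```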
4 Discussion
An increasing number of HF biomarkers are being identified as the number of original research articles on HF biomarkers in PubMed keeps growing. One advantage of HFBD is that it gathers useful biomedical information on HF biomarkers in a user-friendly database. In this study, a comprehensive and professional database for HF was constructed from 2618 records covering 868 different HF biomarkers (731 single and 137 combined) and relevant information from 1237 original articles published between 1948 and 2019. The biological categories of single HF biomarkers include proteins, DNAs, RNAs, other molecules, imaging, cellular and physiological markers. The molecular and cellular biomarkers collected in HFBD include only circulating ones. HFBD classifies HF biomarkers and provides the personalized information needed for further biomarker research and applications. Given the heterogeneity of HF (Francis et al., 2014; Lund, 2017), HFBD provides a systematic perspective on its complexity.
Compared with most existing databases, (i) HFBD collects multi-level biomarker data, from molecular (protein, RNA, DNA, etc.) to imaging, cellular and physiological data, and (ii) it includes clinical phenotyping and application information for diagnosis, prognosis and treatment. These features make the resource useful for multi-level genotype–phenotype integration and deep phenotyping in precision HF medicine. In addition, the full and abbreviated biomarker names have been checked and standardized against the NCBI Gene and UniProtKB databases. Introductory information and basic HF biomarker descriptions were assembled from Wikipedia. HFBD presents its data in clear, concise charts and tables in a user-friendly manner. Its functional modules are versatile: users can search for precise biomarker information in several ways, including keyword, list and advanced search. With the advanced search, users can enter information concerning biomarkers, HF and original articles to locate HF biomarkers that match their requirements. In the future, HFBD can be improved by incorporating more interaction with users. On the biomarker information page, users can follow the PMID to the original article and verify the accuracy of the information in HFBD. HFBD will support personalized medicine and healthcare for HF in several ways. (i) Because HFBD is structured and manually curated, it can be used as a dictionary for searching and comparing biomarkers; clinicians can search for HF-specific biomarkers for personalized diagnosis, treatment and prognosis. (ii) The structured information in HFBD could be converted into a knowledge graph, which could then be used to educate both clinicians and patients, for example through chatbot development.
(iii) With HFBD, clinicians and researchers can quickly and conveniently find a variety of biomarkers for the HF phenotype they are interested in, and select specific biomarkers for their ability to discriminate HF from other diseases. (iv) The relationships between biomarkers and clinical phenotypes can be investigated further to identify unknown HF subtypes for personalized diagnosis and treatment, and biomarkers at different levels can be combined and modeled to improve the accuracy of predictions of HF genesis and progression. At present, most databases or knowledgebases for cancers, neurodegenerative diseases and cardiovascular diseases cover a class of diseases; for example, a cancer database often includes information on prostate cancer, lung cancer, breast cancer, etc., much as the category ‘fruit’ covers apples, oranges and so on. This kind of database makes it possible to study both the themes common to different cancers and phenomena specific to a single cancer (such as prostate or lung cancer). Very few existing disease-specific databases or knowledgebases include clinical information. For precision medicine practice, a knowledge database for a concrete disease such as HF is therefore needed for future annotation and a systems-level understanding of heterogeneous HF.
Different studies have different inclusion and exclusion criteria, so research on a complex disease resembles the parable of the blind men and the elephant, in which each man perceives only one part of the animal. For example, differentially expressed genes for a specific disease obtained from different samples are often enriched in a few consensus pathways, although they are highly heterogeneous at the gene level (Chen et al., 2015; Wang et al., 2011). Collecting the information from the different ‘blind men’ is necessary to reconstruct the picture of ‘the elephant’ (a complex disease, here HF) and to understand its heterogeneity. HFBD can provide diverse information for clinicians, who face different situations and HF heterogeneity, and its concrete, structured information will help their personalized HF investigations. From a modeling perspective, a similarity-search model could be built to provide a personalized strategy for biomarker screening and application.
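The knowledge-graph idea in point (ii) can be illustrated by turning curated records into subject–predicate–object triples. This is a minimal sketch: the record keys, the predicate names (`has_category`, `used_for_<application>`, `reported_in`) and the placeholder PMIDs are hypothetical, and a production system would more likely use RDF or a graph database with a controlled vocabulary.

```python
from collections import defaultdict

def to_triples(records):
    """Convert curated records into a set of (subject, predicate, object) triples."""
    triples = set()
    for r in records:
        b = r["abbreviation"]
        triples.add((b, "has_category", r["category"]))
        triples.add((b, "used_for_" + r["application"], r["hf_type"]))
        triples.add((b, "reported_in", "PMID:" + r["pmid"]))
    return triples

def build_index(triples):
    """Index outgoing edges by subject, sorted for stable output."""
    index = defaultdict(list)
    for s, p, o in sorted(triples):
        index[s].append((p, o))
    return index

records = [
    {"abbreviation": "BNP", "category": "protein", "application": "prognosis",
     "hf_type": "CHF", "pmid": "PLACEHOLDER_1"},
    {"abbreviation": "BNP", "category": "protein", "application": "diagnosis",
     "hf_type": "AHF", "pmid": "PLACEHOLDER_2"},
]

triples = to_triples(records)
for predicate, obj in build_index(triples)["BNP"]:
    print("BNP", predicate, obj)
```

Because triples are stored in a set, facts repeated across records (here, BNP being a protein) collapse into a single edge.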
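One simple way to realize the similarity-search idea is to represent each study or patient profile as a set of features (biomarkers and phenotype labels) and rank stored profiles by Jaccard similarity to a query profile. This is a swapped-in illustrative technique, not a method described by the authors; the function names and demo profiles are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(query: set, profiles: dict, k: int = 3):
    """Return the k stored profiles most similar to the query (ties broken by name)."""
    scored = [(name, jaccard(query, feats)) for name, feats in profiles.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]

profiles = {
    "study_1": {"BNP", "HFrEF", "CHF"},
    "study_2": {"NT-proBNP", "HFpEF"},
    "study_3": {"BNP", "CHF"},
}
print(most_similar({"BNP", "CHF"}, profiles))
# [('study_3', 1.0), ('study_1', 0.6666666666666666), ('study_2', 0.0)]
```

Jaccard similarity ignores feature weights; a real model would likely weight biomarkers by evidence strength or use continuous measurements.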
5 Conclusion
HFBD is a comprehensive and professional knowledge database. Its data were extracted from PubMed and standardized manually, and it can provide structured, labeled data for future annotation of experimental observations and AI modeling. It provides HF subtype classifications, phenotype–biomarker relationships and patient information for modeling aimed at identifying the fine structure of HF for personalized or stratified medicine. Beyond collecting information from PubMed, HFBD integrates other resources, such as NCBI Gene, NCBI Protein, UniProt, miRBase and descriptions from Wikipedia, into the annotation of the collected data. HFBD provides a systematic view of the diversity and heterogeneity of HF. All the information about the ‘HF elephant’ from different ‘blind men’ could be integrated to reconstruct the ‘HF elephant’, that is, to build a systems biology model for accurate prediction of HF genesis and progression.
Acknowledgements
Author contributions
H.H. and M.S. developed the data collection and curation pipeline. Y.L., C.Z., R.W., C.B. and X.L. curated the database. H.H., S.R. and B.S. wrote and edited the manuscript. B.S. conceptualized and supervised the project. All authors contributed to the review of the article and provided comments.
Funding
This work was supported by the National Key Research and Development Program of China [2016YFC1306605], the National Natural Science Foundation of China [32070671] and the regional innovation cooperation between Sichuan and Guangxi Provinces [2020YFQ0019].
Conflict of Interest: none declared.
References
National Clinical Guideline Centre. (



