Dr AFC: drug repositioning through anti-fibrosis characteristic

Abstract Fibrosis is a key component in the pathogenic mechanism of a variety of diseases. These diseases involving fibrosis may share common mechanisms and therapeutic targets, and therefore common intervention strategies and medicines may be applicable for these diseases. For this reason, deliberately introducing anti-fibrosis characteristics into predictive modeling may lead to more success in drug repositioning. In this study, anti-fibrosis knowledge base was first built by collecting data from multiple resources. Both structural and biological profiles were then derived from the knowledge base and used for constructing machine learning models including Structural Profile Prediction Model (SPPM) and Biological Profile Prediction Model (BPPM). Three external public data sets were employed for validation purpose and further exploration of potential repositioning drugs in wider chemical space. The resulting SPPM and BPPM models achieve area under the receiver operating characteristic curve (area under the curve) of 0.879 and 0.972 in the training set, and 0.814 and 0.874 in the testing set. Additionally, our results also demonstrate that substantial amount of multi-targeting natural products possess notable anti-fibrosis characteristics and might serve as encouraging candidates in fibrosis treatment and drug repositioning. To leverage our methodology and findings, we developed repositioning prediction platform, drug repositioning based on anti-fibrosis characteristic that is freely accessible via https://www.biosino.org/drafc.


Introduction
Fibrosis is defined as the process of excessive accumulation of fibrous connective tissue in most tissues or organs, where normal cell injury, transdifferentiation and abnormal proliferation, and deposition of extracellular matrix can disrupt tissue function. In the new era of 21st century, the morbidity and mortality rates of various fibrotic diseases have increased progressively, bringing an enormous global health burden. In developed countries, fibroproliferative diseases are responsible for nearly 45% of deaths [1]. One of the well-known fibrotic diseases, idiopathic pulmonary fibrosis (IPF), has a poor prognosis within the 5-year survival rate less than 30% and median survival ranging from 3 to 5 years [2]. The situation of IPF patients is even worse than those with various types of cancers [3]. As data obtained by Clinical Practice Research Datalink revealed, the prevalence of IPF patients in board case definitions has doubled from 19.94 per 100 000 patients in 2000 to 38.82 per 100 000 patients in 2012, and a 80% increase in incidence was observed [4]. Another lifethreatening fibrotic disease, cardiac fibrosis, is one of the leading factors causing heart failure [5]. A research from 2008 to 2014 revealed that in 318 patients with systolic dysfunction, 78% had one type of myocardial fibrosis, whereas 25% had at least two types [6].
The polypharmacology of most anti-fibrosis drugs could improve therapeutic efficacy. Recent studies have found that, firstly, fibrosis is the common pathogenic process in most diseases. For example, there are multiple common cellular processes between lung cancer and IPF, including inflammation, cell apoptosis and tissue infiltration [7]. Secondly, fibrosisrelated processes have common mechanisms, targets and drugs [8,9]. A multi-organ fibrosis research discovered 90 common differentially expressed genes across lung, heart, liver and kidney. In the two most active gene networks generated by ingenuity pathway analysis, these genes play a key role in connective tissue disorders and genetic, skeletal and muscular disorders [10]. Similarly, another multi-organ fibrosis research also obtained 11 metzincin-related differentially expressed genes across heart, lung, liver, kidney and pancreas including THBS2, TIMP1, COL1A2, COL3A1, HYOU1, MMP2 and MMP7 [11]. Thirdly, fibrosis is a complicated pathological process involving multiple pathways, thus multi-target drugs are more appropriate for fibrosis-related diseases [9]. Different pathways interact and counter-interact with each other to establish a 'check-andbalance' system, for instance, the core regulators, transforming growth factor-β and connective tissue growth factor signaling pathways could collaborate to elicit pulmonary and renal fibrosis [12,13]. In summary, these evidences indicate that anti-fibrosis intervention strategies and medicines may be applicable for diseases that were not originally considered and targeted through targeting their common fibrosis-related mechanisms. Therefore, compounds that more specifically target anti-fibrosis could have greater potential of repositioning and are more applicable for drug repositioning research.
Drug repositioning or repurposing refers to the 'reuse of old drugs', recycling existing drugs for new medical indications. Compared with de novo drug discovery, drug repositioning has obvious advantages such that it could significantly shorten drug development periods, reduce laboratory cost and minimize potential safety risk. Nowadays, drug repositioning is one of the most productive strategies in drug development [14]. With the advancement of high-throughput sequencing technology and deep learning, various data-driven computational prediction and analytic models stand out [15,16], including similarity ensemble approach (SEA) [17] and connectivity map (CMap) [18]. SEA clusters ligands into sets and calculates the similarity scores between ligand sets from ligand topology [17]. CMap computes the similarity to 'signatures' deduced from compound-induced gene profiles to quantify the biological functional relationships between compounds. Moreover, the relationship between compounds and diseases could also be quantified in reversed manner [18]. However, with countless repositioning methods and algorithms developed [19][20][21], there are still no attempts hitherto in introducing anti-fibrosis characteristic into drug repositioning strategy.
For the first time, we built the anti-fibrosis knowledge base upon anti-fibrosis targeted research. Based on the knowledge base, two repositioning models, Structural Profile Prediction Model (SPPM) and Biological Profile Prediction Model (BPPM) were constructed to achieve high prediction accuracy. Centered on these two models, we then developed a repositioning computing platform, drug repositioning based on anti-fibrosis characteristic (Dr AFC), to accelerate the exploratory process of repositioning drugs and contribute to the cutting-edge study of its underlying mechanisms.

Anti-fibrosis knowledge base
Anti-fibrosis-related literatures were collected through key word query 'fibrosis AND target' from PubMed from January 1 2000 to October 31 2019. The compound-target interaction information on 'fibrosis' were collected from Comparative Toxicogenomics Database (CTD) [22] from January 1 2000 to October 31 2019. Anti-fibrosis trials were collected from ClinicalTrials.gov [23] from January 1 2000 to October 31 2019. Meanwhile, approved anti-fibrosis drugs from DrugBank (Version 5.1.3) were collected. Finally, anti-fibrosis treatments, targets and compoundtarget interactions were extracted and aggregated into the knowledge base.

Model construction
Structural and biological profiles of compounds were collected from DrugBank [24] and CMap (build02), respectively, and used for model construction. A total of 2408 approved drugs in Drug-Bank and 1223 compounds in the anti-fibrosis knowledge base served as the raw data for SPPM construction. In total, 6100 biological profiles (gene expression) of 1309 small molecules in CMap served as the raw data for BPPM.

Case studies
A total of 20 263 natural products from Traditional Chinese Medicine Integrated Database (TCMID) [25], 5968 DrugBank experimental drugs [24] and 5000 random compounds from ChEMBL [26] were collected as external validations and used for SPPM case studies. And external biological profiles from Gene Expression Omnibus database (GSE85871) that contains transcriptomics perturbation profiles of 105 natural products in Michigan Cancer Foundation-7 cell line were used for BPPM case studies.

Pre-processing of modeling data
In raw chemical structures (from DrugBank approved drugs and the anti-fibrosis knowledge base) and biological profiles (from CMap) data, compounds that exist in the anti-fibrosis knowledge base were labeled as positive candidates, whereas the rest were labeled as negative candidates. Then, chemical structures were converted into chemical fingerprints (166-bits MACCS keys) for processing chemical information in a fast and convenient way using RDKit [27]. As to biological profiles, Quantile Transformer was used to transform biological profiles into ranking orders to improve model generalizability and also made data sets from different batches and platforms more comparable.
One-class support vector machine (nu = 0.3) was performed to estimate sample quality, remove outliers and confirm final positive and negative samples. In total, 70% of final samples were used as training set for model selection and super-parameter determination, whereas the remainder as testing set for model validation.

Anti-fibrosis model construction and validation
Four different machine learning algorithms were selected for modeling on training set, including logistic regression with l1 and l2 penalties, decision tree, random forest and gradient boosting. Among them, method with highest precision and area under the curve (AUC) calculated by 5-fold cross-validation was selected for subsequent analysis. Iterative feature elimination (IFE) algorithm was performed to select optimal feature set through one-by-one feature deletion. Finally, SPPM and BPPM were constructed based on optimal modeling algorithm and feature set, and further validated by testing set. The details of model construction are shown in Supplementary Figure S1.

Drug repositioning mechanism analysis
Network-based inference approaches are wildly applied in the realm of drug repositioning [20,21]. Here, we infer the potential drug repositioning mechanism through compound-targetdisease network. Firstly, based on SPPM and BPPM, the repositioning characteristics of compounds were predicted through their structural or biological profiles, in which compounds with reposition score > 0.5 were considered as anti-fibrosis and having repositioning potential. Next, the anti-fibrosis characteristic and potential repositioning mechanisms of these candidates were explored on the basis of compound-target-disease corresponding information in the anti-fibrosis knowledge base. Similar compounds that may interact with same targets and diseases were calculated through Tanimoto similarity of chemical structural fingerprints or Spearman's rank correlation coefficient of biological profiles. Targets and disease information of compounds reported in previous researches were refined from the anti-fibrosis knowledge base to explore anti-fibrosis mechanism of compounds. Finally, the potential mechanisms among compounds in compound-target-disease network displayed in drug repositioning analysis were used to help propose feasible drug repositioning solutions.

Webserver construction of Dr AFC
Dr AFC was constructed through PostgreSQL database and Django framework. This platform serves as a practical tool for prediction of drug repositioning potential based on compound structures (via SPPM) and biological profiles (via BPPM) as well as displaying compound-target-disease network of drug repositioning mechanisms. Meanwhile, Dr AFC also integrated toolkits such as quantitative estimate of drug-likeness from Silicos-it [28], and similarity calculation and structure matching borrowed from RDkit to provide convenient web-based calculations for users.
The overall process is shown in Figure 1.

SPPM and BPPM show high performances for anti-fibrosis prediction
To construct the anti-fibrosis knowledge base, 7058 fibrosisrelated references from PubMed, 302 from CTD [22] and 2664 fibrosis-related trials from ClinicalTrials.gov [23] were collected through text mining. Finally, 1223 anti-fibrosis treatments (containing 902 small molecules, including 386 approved drugs from DrugBank), 1067 fibrosis-related targets, 3096 fibrosis-related records from references and 1787 from trials, and 507 anti-fibrosis compound-target interactions were obtained and integrated into anti-fibrosis knowledge base (Supplementary Figure S2). In modeling session, 2885 compound structures (from DrugBank approved drugs and anti-fibrosis knowledge base compounds) [24] and 6100 biological profiles (from CMap) were labeled as positive candidates or negative candidates depending on their anti-fibrosis characteristic in the antifibrosis knowledge base. After sanity check and outlier removal, 1701 compound structures and 2735 biological profiles were filtered out for model construction (Supplementary Table S1).
Four different machine learning classifiers were evaluated and compared to choose the most optimal modeling method (Supplementary Table S2). Gradient boosting was eventually selected according to its highest precision and AUC (structural profile: precision = 0.737, recall = 0.608, AUC = 0.839; biological profile: precision = 0.892, recall = 0.522, AUC = 0.912).
In the process of building SPPM and BPPM, we found that even a small number of features could reach certain stability and reasonably good performance (Supplementary Figure S3,  Figure 2B). Besides, several mapped genes were associated with fibrosis-related indications like retroperitoneal fibrosis, keloids, tissue adhesions and cicatrix.
Finally, SPPM and BPPM were built based on the most optimal modeling method and the selected small feature subset (top 38 features in SPPM and top 47 features in BPPM). In testing set, the average AUC for SPMM reaches 0.814 ( Figure 2C) (recall = 0.512, precision = 0.731), whereas the average AUC for BPMM reaches 0.874 (recall = 0.613, precision = 0.867) ( Figure 2D).

Anti-fibrosis drugs exhibit greater drug repositioning potential
We used SPPM to predict anti-fibrosis drugs from DrugBank experimental drugs. The comparative analysis was performed between the CTD compound-gene interactions of the predicted anti-fibrosis and non-anti-fibrosis drugs. The results show that the anti-fibrosis group accommodates stronger interactions, presumably more genetic effects thus greater repositioning potential ( Figure 3A).
In Drugbank experimental drugs, multiple drugs with great repositioning potential (related genes > 500 and diseases > 20, Figure 3B, Supplementary Table S3) were already developed for fibrotic diseases and other diseases. Quercetin was discovered to ameliorate liver fibrosis through regulating macrophage infiltration and polarization, and it could alleviate IPF through fibroblasts apoptosis [29,30]. Based on our results, we confirm that quercetin interacts with numerous genes and is strongly linked to multiple diseases (repositioning score = 0.856, related genes = 3938 and diseases = 150, Supplementary Table S3). Another natural compound from turmeric (extracted from turmeric plant), curcumin (repositioning score = 0.855, related genes = 903 and diseases = 138, Supplementary Table S3), is also believed for treating multiple fibrotic diseases. It could inhibit fibroblast proliferation and myofibroblast differentiation in IPF [31], whereas inhibit oxidative stress and exhibit antiinflammatory effect in liver fibrosis [32]. Apart from fibrosis, curcumin has also been applied for osteoarthritis and rheumatoid arthritis treatment [33,34]. Moreover, other drugs, such as resveratrol also has great repositioning potential (repositioning score = 0.821, Figure 3B).

Natural compounds are the better repositories for drug repositioning
To expand the resources of potential repositioning drugs and further explore the chemical space, we introduced two external molecule sets, natural products from TCMID [25] and random compounds from ChEMBL [26]. SPPM was used to predict the repositioning potential of compounds from both external molecule sets in a structural perspective. The results show that 35.42, 77.26 and 37.04% of compounds could be potentially repositioned from DrugBank experimental drugs, TCMID and ChEMBL, respectively. The reserves in natural products from TCMID are significantly higher than others, indicating that natural products are promising repositioning repositories and worth further investigations ( Figure 3C).
From a genetic perspective, BPPM was used to predict the repositioning potential of 105 natural products (GSE85871) using their gene profiles. The results show that a total of 66 natural products have anti-fibrosis characteristic and repositioning potential, including ginsenoside Re (repositioning score = 0.979), muscone (repositioning score = 0.974) and cinnamic acid (repositioning score = 0.948) (Supplementary Table S4). Among them, ginsenoside Re is potentially impacting HDAC2, HDAC9 and HMGCR and playing anti-fibrosis roles via 'inflammation', 'preventing collagen deposition' and 'targeting myeloperoxidase', as reported by drug repositioning mechanism analysis tools from Dr AFC ( Figure 3D). Ginsenoside Re is the extract of Panax ginseng, which exhibit protective effects in neural and systematic inflammations through inhibiting the interaction between LPS and TLR4 in macrophages [35]. It was reported to exert anti-fibrosis effect on cardiac fibrosis through downregulating the expression of p-Smad3, collagen I and reducing the augmentation of collagen fibers [36]. Apart from fibrosis, ginsenoside Re could alleviate inflammation through inhibiting (E) Drug repositioning mechanism analysis was implemented to infer the drug potential repositioning mechanism through relationships among similar compounds, fibrosis-related targets and diseases. myeloperoxidase activity [37] and decrease fat accumulation through inhibiting HMGCR and cholesterol biosynthesis [38]. Besides, other ginsenosides, like ginsenoside Rb1, ginsenoside Rc, ginsenoside Rb3, ginsenoside Rb2, ginsenoside Rd and ginsenoside Rg, also exhibit anti-fibrosis characteristic and repositioning potential (Supplementary Table S4).

Dr AFC webserver
Based on SPPM and BPPM, we constructed a computing platform for repositioning research purpose, named Dr AFC, the main function and workflow of which is shown in Figure 4. Through Dr AFC platform, anti-fibrosis and potential repositioning could be predicted from compound structures and/or biological profiles. The supported 'drug repositioning mechanism analysis' could infer the relationships among compounds, fibrosis-related targets and diseases, which help researcher understand pathology. Furthermore, drug-likeness estimation, chemical similarity calculation and structure matching were integrated into Dr AFC to provide useful information for drug development.

Drug repositioning analysis function
Dr AFC allows users to upload compound structures or compound-induced biological profiles for repositioning potential prediction. As shown in Figure 4B, Dr AFC accepts simplified molecular-input line-entry system (SMILES) strings of compound structures for SPPM prediction and accepts gene profiles with row names in the format of Affymetrix U133A probe ID, Entrez ID or gene symbol for BPPM prediction. Both methods support .txt, .csv and .xlsx files ( Figure 4C).
After the required files are uploaded, the webserver would perform corresponding prediction analysis automatically based on the files and display the output on the result page from three aspects ( Figure 4D): (1) basic part including compound ID, compound name, 2D compound sketch (only for SPPM) and SMILES string (only for SPPM). (2) Prediction part including repositioning scores of anti-fibrosis characteristic and repositioning potential prediction. The repositioning scores ranges from 0 to 1 and higher score indicates higher potential. If repositioning score is ≥0.5 (the probability of existing anti-fibrosis property is ≥0.5), the compound is defined as an anti-fibrosis and potential repositioning compound. (3) Drug repositioning mechanism analysis part. This analysis infers the potential anti-fibrosis and repositioning mechanisms of compound structures or biological profiles uploaded by users via our anti-fibrosis knowledge base. The reported potential mechanisms can provide biological indications and mechanistic verifications for drug repositioning efforts.

Other functions
Dr AFC also contains functionalities including drug-likeness estimation, chemical similarity calculation and structure matching tools. Users could upload their compounds in SMILES and perform these ligand-based calculations. Druglikeness estimation could evaluate and score the compound drug-likeness, ranging from 0 to 1 with higher score indicating higher potential for lead compound. Chemical similarity calculation and structure matching provide convenient ways for users to search compound with similar structures, same structures or substructures, supporting single compound calculation and simultaneous calculation for multiple compounds.

Discussion
Fibrosis is the common mechanism underlying diseases that has attracted global attention. Anti-fibrosis characteristic of a compound could essentially infer its great repositioning potential. However, the anti-fibrosis characteristic has not been thoroughly considered and extensively applied into the realm of drug discovery till now. In this study, we first bridge the gap by developing a platform that can provide intensive information conveniently on Dr AFC data (https://www.biosino.org/drafc). This in silico platform also provides a highly accurate way to generate data for rational drug design via combining the advanced machinelearning algorithm.
Dr AFC was built upon the anti-fibrosis knowledge base, which pioneered the curation and deployment of fibrosis-related studies over the past decade. Structural profile (SPPM) and biological profile (BPPM) that show extraordinary capabilities in drug repositioning prediction (with AUC 0.814 and 0.874, respectively) were integrated into Dr AFC. BPPM shows slightly higher performance than SPPM based on AUC metric. The possible reason could be that biological profile is more tolerant and could contain information reflecting an overall effect of compound in the body. Biological profile has shown its advantage in multiple repositioning algorithms previously, such as CMap [18], L1000CDS 2 [39] and MANTRA [40]. Besides, certain therapies without available structural profile such as biotech drugs or cocktail therapies could also be studied in repositioning research based on their biological profiles.
In BPPM, 47 biological markers exhibited strong prediction capabilities. These genes are directly or indirectly linked to various fibrotic diseases. Notably, ribosomal proteins including RPL30, MRPL15, RPL32, RPS3A, RPLP0, RPL7, RPL23A and RPL13A are the main part of these biological markers. Ribosomes serve as significant regulators in immune signaling pathways, tumorigenesis pathways, and cardiovascular and metabolic diseases [41,42]. For example, the expression of RPL30 is negatively correlated with carcinogenesis process in medulloblastoma that is usually accompanied by desmoplasia and could thus serve as a prognosis biomarker [43]. Besides, the over-activation of ribonucleic acid (RNA) polymerase in the biogenesis of ribosomes could enhance protein synthesis and decrease translation accuracy, in turn triggering cancers or exacerbating cancer processes [44]. Furthermore, some biological markers are associated with the spliceosome formation including RBM8A, HNRNPA3, SNRPG and DHX15. Spliceosome is a large molecular machine composed of five small nuclear ribonucleic acids and many other proteins, and serves as the catalyzer of pre-RNA introns that are crucial for protein expression and function. It has been reported to be closely associated with multiple diseases including cystic fibrosis and pulmonary fibrosis [45,46].
Based on external molecule sets, natural products are validated to have the most appealing anti-fibrosis characteristics and repositioning potential among chemicals from different sources. Natural products provide a wealth of valuable natural resources for modern medicine and are seen as promising and popular candidates for drug repositioning studies [47]. Their natural scaffold novelty, structural complexity, abundant stereochemistry and 'metabolite-likeness' mainly account for their broad-spectrum of biological activities [48,49]. The multitargeting and synergistic effects of natural products exhibit great advantages in treating diseases undergoing sophisticated mechanisms, such as fibrosis [50]. Our studies show that natural products such as ginsenoside have great anti-fibrosis characteristic and repositioning potential and should give top priority to consider repositioned drug discovery. Additionally, the natural products in Drugbank experimental drugs, such as quercetin, curcumin and resveratrol, also highlight their strong repositioning capabilities. Therefore, natural products are a fruitful and promising source for future drug development studies.

Conclusion
In summary, based on anti-fibrosis characteristics, we constructed two predictive repositioning models, SPPM and BPPM, which predict the anti-fibrosis characteristics and repositioning potential from compound structures and/or compound-induced biological profiles. SPPM and BPPM take the advantage of therapeutic commonality and universality of fibrotic diseases, and dramatically increase the success rate of drug repositioning predications. This study not only established a highly efficient strategy of predicting repositioning, but also developed a convenient and user-friendly computing platform, Dr AFC (https:// www.biosino.org/drafc), for studying fibrosis mechanisms and drug repositioning.

Key Points
• Fibrosis is the common mechanism of diseases, which could be applied in drug repositioning.
• We developed a convenient and user-friendly computing platform, Dr AFC, for studying fibrosis mechanisms and drug repositioning.
• Dr AFC shows high performance on both crossvalidation and external validation, which demonstrates its potential applications in drug discovery.
• Natural compounds proved to be the better repositories for drug repositioning.

Supplementary Data
Supplementary data are available online at https://academi c.oup.com/bib. University at Buffalo Community of Excellence in Genome, Environment and Microbiome (to L.Z.).