-
PDF
- Split View
-
Views
-
Cite
Cite
Momoko Hamano, Toru Nakamura, Ryoku Ito, Yuki Shimada, Michio Iwata, Jun-ichi Takeshita, Ryohei Eguchi, Yoshihiro Yamanishi, DIRECTEUR: transcriptome-based prediction of small molecules that replace transcription factors for direct cell conversion, Bioinformatics, Volume 40, Issue 2, February 2024, btae048, https://doi.org/10.1093/bioinformatics/btae048
- Share Icon Share
Abstract
Direct reprogramming (DR) is a process that directly converts somatic cells to target cells. Although DR via small molecules is safer than using transcription factors (TFs) in terms of avoidance of tumorigenic risk, the determination of DR-inducing small molecules is challenging.
Here we present a novel in silico method, DIRECTEUR, to predict small molecules that replace TFs for DR. We extracted DR-characteristic genes using transcriptome profiles of cells in which DR was induced by TFs, and performed a variant of simulated annealing to explore small molecule combinations with similar gene expression patterns with DR-inducing TFs. We applied DIRECTEUR to predicting combinations of small molecules that convert fibroblasts into neurons or cardiomyocytes, and were able to reproduce experimentally verified and functionally related molecules inducing the corresponding conversions. The proposed method is expected to be useful for practical applications in regenerative medicine.
The code and data are available at the following link: https://github.com/HamanoLaboratory/DIRECTEUR.git.
1 Introduction
Cell therapy, the transplantation of normal cells to a damaged organ, is a promising approach for regenerative medicine. Induced pluripotent stem cells (iPSCs) are generally used for establishing target cells (i.e. healthy cells required for treatment) from somatic cells. Direct reprogramming (DR) is an alternative technique for converting somatic cells (source cells) to target cells via a single-step conversion without the use of iPSCs (Xu et al. 2015, Volarevic et al. 2018, Wang et al. 2021). Because cells converted by DR are less damaged by genomic mutations, the cancerization risk after cell transplantation is expected to be significantly reduced (Davis et al. 1987, Ferber et al. 2000). Thus, DR-based cell generation is suitable for the treatment of damaged tissues (Grath and Dai 2019). A previous study reported that mouse embryonic fibroblasts (MEFs) can be converted to myoblasts by overexpressing the transcription factor (TF) MyoD (Davis et al. 1987), and several methods of TF-based DR induction have since been developed using experimental (Ieda et al. 2010, Caiazzo et al. 2011, Sekiya and Suzuki 2011) and computational approaches (Cahan et al. 2014, D’Alessio et al. 2015, Rackham et al. 2016, Jung et al. 2021). Nevertheless, because of the use of viral vectors to overexpress TF-coded genes, a risk of tumorigenesis caused by viral insertional mutagenesis remains for TF-induced DR (Isomi et al. 2021).
To reduce the risk of tumorigenesis caused by gene insertions, induction of DR by small molecules such as drugs or chemical agents has been proposed (Federation et al. 2014, Qin et al. 2017, Xie et al. 2017). Small molecules have the potential to induce DR by regulating (i.e. activating or inhibiting) multiple signaling pathways, modifying epigenetic patterns (e.g. histone acetylation and methylation), and manipulating the expression level of TF-coded genes (Han et al. 2017). Small molecule combinations have been successfully used to convert somatic cells into cardiomyocytes (Fu et al. 2015) and neurons (Li et al. 2015). Furthermore, DR induced by small molecules enables in vivo reprogramming without cell transplantation. Notably, in vitro and in vivo experiments have allowed the conversion of cardiac fibroblasts into cardiomyocytes in the heart (Mohamed et al. 2017, Sadahiro and Ieda 2021), and in vivo experiment on mouse brain successfully converted astrocytes into neurons in situ (Ma et al. 2021, Guo et al. 2014). However, as more studies have focused on TF-induced DR, the small molecule combinations required for inducing DR in specific tissues is still poorly known. Thus, the development of a novel method for in silico identification of small molecules combination for DR induction is a challenging issue.
Because of cost and expense, identifying the optimal combination of small molecules for DR is difficult using in vitro experiments alone. There is a previous work on the optimization of small molecule combinations that encompass the biological pathways identified from proteins targeted by known DR-inducing small molecules (Nakamura et al. 2022). However, the previous method cannot be applied to cell conversions for which DR-inducing small molecules have not been experimentally identified yet. A regression-based method with the biological pathway profiles was proposed, but the gene expression alteration during the DR induction was not taken into account (Napolitano et al. 2021).
In this study, we developed a novel computational method, DIRECTEUR (DIrect REprogramming by Chemical with TranscriptomE Utilization for Regenerative medicine), to identify new combinations of small molecules for DR from approved drugs by integrating heterogeneous transcriptome data. Using large-scale small molecule-induced transcriptome profiles and TF-induced transcriptome profiles acquired from TF-based DR experiments, we performed a variant of simulated annealing to explore small molecule combinations with similar gene expression patterns with DR-inducing TFs. We applied DIRECTEUR to predicting combinations of small molecules that convert fibroblasts into neurons or cardiomyocytes, and were able to reproduce experimentally verified and functionally related molecules inducing the corresponding conversions. The proposed method is expected to be useful for practical applications in regenerative medicine.
2 Materials and methods
2.1 Construction of target cell-specific transcriptome profiles
Transcriptome profiles of target cells induced by DR through the overexpression of TFs in mouse fibroblasts were acquired. More precisely, we focused on the DR of mouse embryonic fibroblasts into neurons and cardiomyocytes. Transcriptome profiles of fibroblasts (source cells) and neurons (target cells) were acquired from GSE22292 (Caiazzo et al. 2011) in the Gene Expression Omnibus Database (GEO) (Barrett et al. 2007). Similarly, transcriptome profiles of fibroblasts (source cells) and cardiomyocytes (target cells) were acquired from GSE27174 (Ieda et al. 2010). Gene expression values of the source cells and target cells were compared, and a transcriptome profile comprising log2 ratios was constructed. The number of genes was 25023 in the neuron-specific transcriptome profile and 24947 in the cardiomyocyte-specific transcriptome profile.
2.2 Construction of TF-induced transcriptome profiles
Transcriptome profiles in response to gene overexpression of TFs for DR were acquired. Ascl1, Lmx1a, and Nurr1 are TFs that induce neurons from fibroblasts (Caiazzo et al. 2011), and Tbx3, Mef2c, and Gata4 are TFs that induce cardiomyocytes from fibroblasts (Ieda et al. 2010). Thus, transcriptome profiles in response to gene overexpression of Lmx1a, Nurr1, Tbx3, and Mef2c were acquired from the L1000 mRNA profiling assay in the LINCS project (Subramanian et al. 2017) (https://lincsproject.org). Transcriptome profiles of the L1000 mRNA profiling assay were acquired from GEO (GSE70138 and GSE92742). The dataset consists of a total of 591855 transcriptome profiles, in which gene expression values were measured using 93 human cell lines with various perturbations. When there were multiple transcriptome profiles of the same gene, they were averaged. A transcriptome profile with each TF-coding gene overexpression was defined as where p is the number of landmark genes (p = 978).
We constructed a TF-induced transcriptome profile for each cell type by an average of transcriptome profiles with TF-coding gene overexpression, which was defined as . In the case of neurons, transcriptome profiles with gene overexpression of Lmx1a and Nurr1 were averaged to construct . In the case of cardiomyocytes, transcriptome profiles of Tbx3 and Mef2c were averaged to construct .
2.3 Construction of small molecule-induced transcriptome profiles
The number of small molecules was 21182. We obtained 312596 small molecule-treatment profiles (denoted as “trt_cp”) from the L1000 mRNA profiling assay. For each small molecule, the corresponding International Chemical Identifier code (InChIKey) was also obtained from GEO. For each small molecule, we represented a small molecule-induced transcriptome profile by a feature vector defined as , where p is the number of genes (p = 978). Of these, 1486 small molecules were registered in KEGG (Kanehisa et al. 2009), and 36 small molecules were experimentally confirmed to induce DR (Sizykh et al. 2021).
2.4 DR-characteristic transcriptome signatures
To identify DR-characteristic gene expression signatures, we extracted common patterns between target cell-specific transcriptome profiles and TF-induced transcriptome profiles by calculating the correlation coefficients between them. Since not all genes are involved in DR, we removed genes that did not contribute to the common gene expression patterns and selected genes that constituted the DR-characteristic gene expression patterns as DR-characteristic genes.
To calculate the correlation coefficient for organism-different transcriptome profiles, genes were converted to gene orthologs in the target cell-specific transcriptome profile (derived from mice) and the TF-induced transcriptome profile (derived from humans). Here, the target cell-specific transcriptome profiles consisting of common gene orthologs were defined as and the TF-induced transcriptome profile was defined as , where was the number of common orthologs ( = 594).
To identify commonly altered genes across the target cell-specific transcriptome profile and TF-induced transcriptome profiles, correlation coefficients between and were calculated to select highly correlated orthologs. The threshold values of the correlation coefficient were set at 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, and 1.0. In the case of the correlation coefficient of 1, the number of orthologs was less than the number of the overexpressed TFs (=3); hence, DR-characteristic gene expression patterns in neurons and cardiomyocytes were not identified at the threshold of 1. We constructed the target cell-specific transcriptome profile that consisted of common orthologs based on the correlation coefficient analysis, defined it as = , and referred to it as DR-characteristic transcriptome signature. Here, s is the number of correlated orthologs selected on the basis of correlation coefficient analysis. We also constructed small molecule-induced transcriptome signature that consisted of only correlated orthologs on the basis of correlation coefficient analysis from small molecule-induced transcriptome profiles.
2.5 Transcriptome-based prediction of the combination of DR-inducing small molecules by simulated annealing (DIRECTEUR)
We formulate a combinatorial optimization problem to identify the optimal combination of small molecules for DR, and propose a novel combinatorial optimization algorithm for solving it on the basis of simulated annealing in order to identify the optimal combination of small molecules. We attempted to predict a new combination of small compounds that induce DR from a candidate set of 1486 approved drugs and 36 known DR-inducing small molecules.
In this study, simulated annealing (Kirkpatrick et al. 1983, Černý 1985), which is one of the local search methods in meta-heuristics, was applied to acquire the solution to the combinatorial optimization problem. We propose the following algorithm:
Step 1: Arbitrarily choose k satisfying and randomly select a set of k small molecules as the initial state. In other words, generate an n-dimensional 0–1 vector with k elements of 1 and (n − k) elements of 0. Then, calculate the objective function .
Step 2: Randomly select whether to increase the number of small molecules by one to generate a neighborhood solution or to decrease the number of small molecules by one to generate a neighborhood solution.
1) In the case of increasing by one:
Randomly select one of the small molecules that have not yet been selected and add it to the candidate combinations. In other words, randomly select one of the elements of that is 0 and change it to 1, and set the n-dimensional 0–1 vector as the neighborhood solution .
2) In the case of decreasing by one:
Randomly select one of the small molecules from the already selected small molecules and remove it from the candidate combination. In other words, randomly select one of the elements of that is 1 and change it to 0 and set the n-dimensional 0–1 vector as the neighborhood solution .Then, the objective function is calculated.
Step 3: If , set with probability 1. Otherwise, with the probability exp(−(−)⋅iteration/).
Step 4: If a predetermined fixed time has elapsed, let be the final solution. Otherwise, return to Step 2.
Here, the number of iterations was within 1000000. We defined the highest objective function score as the prediction score. Note that in the original annealing method, the end temperature was set as the end condition of the algorithm, but the calculation time was set as the end condition of the algorithm in this study.
2.6 Evaluation of biological functions of small molecules target protein groups
To acquire the list of target proteins regulated by predicted small molecules for DR, we collected a set of interactions between small molecules and proteins. These interaction data were acquired from public databases such as ChEMBL (Gaulton et al. 2012), MATADOR (Günther et al. 2008), DrugBank (Knox et al. 2011), BindingDB (Liu et al. 2007), KEGG DRUG (Kanehisa et al. 2009), Psychoactive Drug Screening Program Ki (PDSP-Ki) (Roth et al. 2000), and Therapeutic Target Database (Qin et al. 2014). This dataset included 1522 small molecules and 2470 proteins, along with 18650 interactions.
GO and KEGG pathway enrichment analyses were performed to elucidate the biological functions of the target proteins of the predicted small molecules and highly correlated genes. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for GO analysis of the target proteins of predicted small molecules (Dennis et al. 2003). The top three GO terms in the annotation clusters that ranked in the Functional Annotation Clustering function with statistical significance (P <0.05) were extracted. The enrichment P-values of all extracted GO terms were calculated using DAVID.
2.7 Visualization of the PPA network of a group of target proteins of DR-inducing small molecules
To elucidate the association network among target proteins of predicted small molecules, the protein-protein association (PPA) network was constructed from molecular association network data stored in the Search Tool for the Retrieval of Interacting Genes (STRING) Database (Szklarczyk et al. 2015) (https://www.string-db.org/). In STRING, protein–protein associations are based on evidence such as experiments, databases, co-expression, neighborhood, gene fusion, and co-occurrence. We used a dataset of 11944806 protein–protein associations of Mus Musculus in the STRING database that involved 21291 proteins. We extracted the protein–protein association network involving target proteins of the predicted small molecules and visualized it with Cytoscape (Shannon et al. 2003).
The degree of centrality was calculated using Cytoscape to detect nodes with high centrality and depicted by the degree of centrality as the size of the node. The association between proteins was depicted by the “combined score” of STRING as the thickness of the edge. To demonstrate whether the gene expression levels of the target proteins were altered by DR induction, the ratio of expression value of target cells against source cells was depicted by the color of the node using the target cell-specific transcriptome profile. Red indicates that gene expression was upregulated, blue indicates that the gene expression was downregulated, and gray indicates that the gene expression level was unknown.
2.8 Evaluation of small molecules that induce DR
To evaluate the proportion of known DR-inducing small molecules or small molecules with similar functions to known DR-inducing ones against the number of all small molecules in the predicted combination, the statistical significance was evaluated by Fisher’s exact probability test.
3 Results
3.1 Overview of the proposed method for predicting small molecules for DR
To identify DR-characteristic gene expression patterns, we prepared two types of transcriptome profiles. First, we constructed a set of target cell-specific transcriptome profiles by calculating the expression ratio from the transcriptome profiles of the source cells and the target cells (Fig. 1a). Second, we constructed a TF-induced transcriptome profile, which is the sum of transcriptome profiles in response to the overexpression of genes of DR-inducing TFs (Fig. 1b). Next, we selected correlated genes that have similar expression patterns across the target cell-specific transcriptome profile and TF-induced transcriptome profile, and referred to them as DR-characteristic genes. Finally, we constructed a DR-characteristic transcriptome signature that was defined as the target cell-specific transcriptome profile consisting of only the selected DR-characteristic genes (Fig. 1c).

Overview of our proposed method to predict the combination of small molecules that induce DR. Target cell-specific transcriptome profiles for neurons and cardiomyocytes, TF-induced transcriptome profiles, and small molecule-induced transcriptome profiles were constructed. Next, DR-characteristic transcriptome signatures that reflect gene expression patterns of TF-based DR were identified. A new combination of small molecules for DR was predicted by a variant of simulated annealing-based optimization algorithm.
To identify combinations of small molecules that induce DR, we prepared a set of small molecule-induced transcriptome profiles (Fig. 1d), which included approved drugs. We performed a variant of simulated annealing to explore small molecule combinations on the basis of the correlation between the DR-characteristic transcriptome signature and the sum of small molecule-induced transcriptome signatures (Fig. 1e). In this study, we demonstrated the usefulness of the proposed method through applications of DR of fibroblasts into neurons and cardiomyocytes.
3.2 Identification of DR-characteristic transcriptome signatures
To identify DR-characteristic gene expression signatures, we extracted common patterns between target cell-specific transcriptome profiles and TF-induced transcriptome profiles by calculating the correlation coefficients between them. Since not all genes are involved in DR, we removed genes that did not contribute to the common gene expression patterns and selected genes that constituted the DR-characteristic gene expression patterns as DR-characteristic genes. Changing the threshold values of the correlation coefficients little by little, we prepared DR-characteristic genes at different levels. We defined the target-specific transcriptome profile consisting of the selected DR-characteristic genes as the DR-characteristic transcriptome signature. We constructed 11 levels of DR-characteristic transcriptome signatures with correlation coefficient thresholds of 0, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9 and 0.95.
Reportedly, the DR of fibroblasts into neurons is induced by three TFs, Ascl1, Lmx1a, and Nurr1 (Caiazzo et al. 2011). Therefore, a DRneuron-characteristic transcriptome signature was constructed on the basis of the neuron-specific transcriptome profile and these three TFs-induced transcriptome profiles. For each correlation coefficient, the number of DRneuron-characteristic genes was 594, 441, 411, 381, 351, 322, 290, 255, 219, 179, and 135, respectively (Supplementary Fig. S1a and Supplementary Table S1).
Reportedly, the DR of fibroblasts into cardiomyocytes is also induced by three TFs, namely, Tbx3, Mef2c, and Gata4 (Ieda et al. 2010). Using the same procedure as per the neurons, the number of DRcardio-characteristic genes for each correlation coefficient threshold was 594, 418, 398, 377, 357, 333, 308, 279, 242, 198, and 141, respectively (Supplementary Fig. S1b and Supplementary Table S1).
We first focused on DRneuron-characteristic genes of neurons with a correlation coefficient of 0.95 (Supplementary Table S2). To understand the biological functions of these genes, we performed enrichment analysis through gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway (Kanehisa et al. 2009). Multiple GO terms associated with “transcription” were detected (Supplementary Fig. S2a). We then performed the same analyses on DRcardio-characteristic genes (Supplementary Fig. S2b). Here, multiple GO terms associated with apoptosis were detected. The term “cell cycle” was enriched in both GO terms and KEGG pathways. Remarkably, “negative regulation of ERK1 and ERK2 cascade,” one of the MAPK signaling cascades associated with cardiac reprogramming, was enriched. These results suggest that DR-characteristic genes of cardiomyocytes are involved in these biological pathways to induce reprogramming of fibroblasts into neurons and cardiomyocytes.
3.3 Determination of optimal conditions for predicting small molecules for DR
We applied our method on DR-characteristic transcriptome signatures at certain correlation coefficients to identify the combination of small molecules that induces DR of neurons and cardiomyocytes. We identified a combination of small molecules that approximate the DR-characteristic transcriptome signature by the sum of the small molecule-induced transcriptome signatures by simulated annealing. Then, we calculated the number of predicted small molecules and prediction scores at each correlation coefficient (Fig. 2).

Number of predicted small molecules and the prediction score for the induction of neurons (a) and cardiomyocytes (b). Upper panels show the number of predicted small molecules at each correlation coefficient between target cell-specific and TF-induced transcriptome profiles. Blue bar indicates the number of known DR-inducing small molecules, light blue bar indicates the number of small molecules with similar functions to the known DR-inducing small molecules, and gray bar indicates the number of newly predicted small molecules. The lower panels show the prediction score of the objective function at each correlation coefficient.
To determine the small molecules that could drive DR of fibroblast into neurons and cardiomyocytes, we predicted small molecules based on the score of the objective function at each correlation coefficient. Six to ten candidate small molecules were predicted at each correlation coefficient (Fig. 2a and b, Supplementary Table S3a and b). The small molecule list included known DR-inducing small molecules that have been experimentally proven to induce DR and small molecules that have similar biological functions to known ones. The prediction score increased with the correlation coefficient and reached a maximum at a correlation coefficient of 0.95. These results suggest that a correlation coefficient of 0.95 is suitable for predicting the combination of small molecules that convert fibroblasts into neurons and cardiomyocytes.
3.4 Identification of the combination of small molecules that induces DR
Using DR-characteristic gene expression signature at the 0.95 correlation coefficient, we predicted the combination of small molecules that induce neurons and cardiomyocytes (Tables 1 and 2). Among predicted combinations of 10 small molecules for DR of neurons, TTNPB (arotinoid acid), romidepsin, and etoposide corresponded to known DR-inducing small molecules (Park et al. 2011, Katz et al. 2013, Kim et al. 2020) (Table 1). We evaluated the ratio of known DR-inducing small molecules among the predicted small molecules by Fisher’s exact test, and confirmed that the known DR-inducing small molecules were significantly present in the predicted small molecules (P =0.004503). Additionally, metaraminol and zafirlukast, regulators of neuroactive ligand–receptor interaction, were included among the candidates. These results suggest the potential involvement of DR-inducing small molecules in the regulation of neuronal function.
List of the small molecules in the predicted combinations for DR of fibroblasts into neurons.
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
TTNPB∗ | Potent RAR agonist | ||
Romidepsin∗ | Antineoplastic | Histone deacetylase inhibitor | Cell cycle, Notch signaling pathway, Pathways in cancer |
Etoposide∗ | Antineoplastic | Topoisomerase II inhibitor | |
Ibrutinib | Antineoplastic | Bruton's tyrosine kinase inhibitor | B cell receptor signaling pathway |
Turofexorate isopropyl | Antidyslipidemia | Farnesoid X receptor (FXR) agonist | |
Bufexamac | Anti-inflammatory | COX inhibitor | Arachidonic acid metabolism |
Metaraminol | Antihypotensive | alpha-Adrenergic receptor agonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction, Adrenergic signaling in cardiomyocytes |
Sulfachlorpyridazine | Antibacterial | Folate biosynthesis | |
Zafirlukast | Antiasthmatic | Leukotriene receptor antagonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction |
Ceforanide | Antibacterial | Cell wall biosynthesis inhibitor | Peptidoglycan biosynthesis |
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
TTNPB∗ | Potent RAR agonist | ||
Romidepsin∗ | Antineoplastic | Histone deacetylase inhibitor | Cell cycle, Notch signaling pathway, Pathways in cancer |
Etoposide∗ | Antineoplastic | Topoisomerase II inhibitor | |
Ibrutinib | Antineoplastic | Bruton's tyrosine kinase inhibitor | B cell receptor signaling pathway |
Turofexorate isopropyl | Antidyslipidemia | Farnesoid X receptor (FXR) agonist | |
Bufexamac | Anti-inflammatory | COX inhibitor | Arachidonic acid metabolism |
Metaraminol | Antihypotensive | alpha-Adrenergic receptor agonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction, Adrenergic signaling in cardiomyocytes |
Sulfachlorpyridazine | Antibacterial | Folate biosynthesis | |
Zafirlukast | Antiasthmatic | Leukotriene receptor antagonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction |
Ceforanide | Antibacterial | Cell wall biosynthesis inhibitor | Peptidoglycan biosynthesis |
Known DR-inducing molecules are highlighted in bold and asterisk. Small molecules with similar functions to known DR-inducing molecules are highlighted in bold.
List of the small molecules in the predicted combinations for DR of fibroblasts into neurons.
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
TTNPB∗ | Potent RAR agonist | ||
Romidepsin∗ | Antineoplastic | Histone deacetylase inhibitor | Cell cycle, Notch signaling pathway, Pathways in cancer |
Etoposide∗ | Antineoplastic | Topoisomerase II inhibitor | |
Ibrutinib | Antineoplastic | Bruton's tyrosine kinase inhibitor | B cell receptor signaling pathway |
Turofexorate isopropyl | Antidyslipidemia | Farnesoid X receptor (FXR) agonist | |
Bufexamac | Anti-inflammatory | COX inhibitor | Arachidonic acid metabolism |
Metaraminol | Antihypotensive | alpha-Adrenergic receptor agonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction, Adrenergic signaling in cardiomyocytes |
Sulfachlorpyridazine | Antibacterial | Folate biosynthesis | |
Zafirlukast | Antiasthmatic | Leukotriene receptor antagonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction |
Ceforanide | Antibacterial | Cell wall biosynthesis inhibitor | Peptidoglycan biosynthesis |
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
TTNPB∗ | Potent RAR agonist | ||
Romidepsin∗ | Antineoplastic | Histone deacetylase inhibitor | Cell cycle, Notch signaling pathway, Pathways in cancer |
Etoposide∗ | Antineoplastic | Topoisomerase II inhibitor | |
Ibrutinib | Antineoplastic | Bruton's tyrosine kinase inhibitor | B cell receptor signaling pathway |
Turofexorate isopropyl | Antidyslipidemia | Farnesoid X receptor (FXR) agonist | |
Bufexamac | Anti-inflammatory | COX inhibitor | Arachidonic acid metabolism |
Metaraminol | Antihypotensive | alpha-Adrenergic receptor agonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction, Adrenergic signaling in cardiomyocytes |
Sulfachlorpyridazine | Antibacterial | Folate biosynthesis | |
Zafirlukast | Antiasthmatic | Leukotriene receptor antagonist | Calcium signaling pathway, Neuroactive ligand-receptor interaction |
Ceforanide | Antibacterial | Cell wall biosynthesis inhibitor | Peptidoglycan biosynthesis |
Known DR-inducing molecules are highlighted in bold and asterisk. Small molecules with similar functions to known DR-inducing molecules are highlighted in bold.
List of the small molecules in the predicted combinations for DR of fibroblasts into cardiomyocytes.a
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
I-BET151∗ | BET bromodomain inhibitor | ||
Binimetinib | Antineoplastic | Mitogen-activated extracellular signal-regulated kinase (MEK) inhibitor | MAPK signaling pathway, Pathways in cancer, Melanoma |
Pepstatin | Antineoplastic | Pepsin inhibitor | |
Calcipotriene | Antipsoriatic | Vitamin D receptor agonist | Endocrine and other factor-regulated calcium reabsorption, Mineral absorption |
Gamolenic acid | Analgesic | Anti-inflammatory | |
Nifenazone | Analgesic | Antirheumatic | |
Adenosine phosphate | Supplement (nutrient) | Neuroactive ligand-receptor interaction | |
Filgotinib | Anti-inflammatory | Janus kinase (JAK) inhibitor | JAK-STAT signaling pathway, Th1 and Th2 cell differentiation, Th17 cell differentiation |
Tiratricol | Replenisher (thyroid hormone) | Neuroactive ligand-receptor interaction, Thyroid hormone signaling pathway | |
Glemanserin | Antianxiety | Neuroactive ligand-receptor interaction, Serotonergic synapse |
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
I-BET151∗ | BET bromodomain inhibitor | ||
Binimetinib | Antineoplastic | Mitogen-activated extracellular signal-regulated kinase (MEK) inhibitor | MAPK signaling pathway, Pathways in cancer, Melanoma |
Pepstatin | Antineoplastic | Pepsin inhibitor | |
Calcipotriene | Antipsoriatic | Vitamin D receptor agonist | Endocrine and other factor-regulated calcium reabsorption, Mineral absorption |
Gamolenic acid | Analgesic | Anti-inflammatory | |
Nifenazone | Analgesic | Antirheumatic | |
Adenosine phosphate | Supplement (nutrient) | Neuroactive ligand-receptor interaction | |
Filgotinib | Anti-inflammatory | Janus kinase (JAK) inhibitor | JAK-STAT signaling pathway, Th1 and Th2 cell differentiation, Th17 cell differentiation |
Tiratricol | Replenisher (thyroid hormone) | Neuroactive ligand-receptor interaction, Thyroid hormone signaling pathway | |
Glemanserin | Antianxiety | Neuroactive ligand-receptor interaction, Serotonergic synapse |
Known DR-inducing molecules are highlighted in bold and asterisk. Small molecules with similar functions to known DR-inducing molecules are highlighted in bold.
List of the small molecules in the predicted combinations for DR of fibroblasts into cardiomyocytes.a
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
I-BET151∗ | BET bromodomain inhibitor | ||
Binimetinib | Antineoplastic | Mitogen-activated extracellular signal-regulated kinase (MEK) inhibitor | MAPK signaling pathway, Pathways in cancer, Melanoma |
Pepstatin | Antineoplastic | Pepsin inhibitor | |
Calcipotriene | Antipsoriatic | Vitamin D receptor agonist | Endocrine and other factor-regulated calcium reabsorption, Mineral absorption |
Gamolenic acid | Analgesic | Anti-inflammatory | |
Nifenazone | Analgesic | Antirheumatic | |
Adenosine phosphate | Supplement (nutrient) | Neuroactive ligand-receptor interaction | |
Filgotinib | Anti-inflammatory | Janus kinase (JAK) inhibitor | JAK-STAT signaling pathway, Th1 and Th2 cell differentiation, Th17 cell differentiation |
Tiratricol | Replenisher (thyroid hormone) | Neuroactive ligand-receptor interaction, Thyroid hormone signaling pathway | |
Glemanserin | Antianxiety | Neuroactive ligand-receptor interaction, Serotonergic synapse |
Small molecule name . | Efficacy . | Function . | Pathway . |
---|---|---|---|
I-BET151∗ | BET bromodomain inhibitor | ||
Binimetinib | Antineoplastic | Mitogen-activated extracellular signal-regulated kinase (MEK) inhibitor | MAPK signaling pathway, Pathways in cancer, Melanoma |
Pepstatin | Antineoplastic | Pepsin inhibitor | |
Calcipotriene | Antipsoriatic | Vitamin D receptor agonist | Endocrine and other factor-regulated calcium reabsorption, Mineral absorption |
Gamolenic acid | Analgesic | Anti-inflammatory | |
Nifenazone | Analgesic | Antirheumatic | |
Adenosine phosphate | Supplement (nutrient) | Neuroactive ligand-receptor interaction | |
Filgotinib | Anti-inflammatory | Janus kinase (JAK) inhibitor | JAK-STAT signaling pathway, Th1 and Th2 cell differentiation, Th17 cell differentiation |
Tiratricol | Replenisher (thyroid hormone) | Neuroactive ligand-receptor interaction, Thyroid hormone signaling pathway | |
Glemanserin | Antianxiety | Neuroactive ligand-receptor interaction, Serotonergic synapse |
Known DR-inducing molecules are highlighted in bold and asterisk. Small molecules with similar functions to known DR-inducing molecules are highlighted in bold.
We next looked at the predicted combinations of 10 small molecules for DR of cardiomyocytes (Table 2). I-BET151 is a known DR-inducing small molecule, and binimetinib and filgotinib are small molecules with similar functions to known DR-inducing small molecules (Van Rompaey et al. 2013, Woodfield et al. 2016, Qin et al. 2017). Known DR-inducing small molecules and small molecules with similar functions to known DR-inducing small molecules were significantly present in the predicted small molecules (P =0.004503; Fisher’s exact test). Binimetinib and figotinib are known to regulate MAPK and JAK-STAT signaling (Van Rompaey et al. 2013, Woodfield et al. 2016), which have been reported to be target pathways of DR of cardiomyocytes. Another identified small molecule, gamolenic acid, acts similarly to anti-inflammatory drugs (Kapoor and Huang 2006), and it has been reported to promote DR of cardiomyocytes (Muraoka et al. 2019). Altogether, these results suggest that small molecules predicted with our proposed method are involved in multiple biological processes related with DR of cardiomyocytes.
The results with different T values (the number of small molecules to be selected) are shown in Supplementary Table S4. When the value of T was small (e.g. T = 1), no known DR-inducing small molecules were identified.
3.5 Molecular mechanisms of the predicted combination of small molecules for DR
To determine the active pathway of the small molecule combination, we examined the molecular mechanisms of the predicted combination of small molecules for DR of neurons and cardiomyocytes. We detected the target proteins of the predicted small molecules in the compound–protein interactome data and evaluated their biological functions and protein–protein associations.
We detected 75 target proteins for 10 small molecules (Table 1) in the combination predicted for neuronal induction (Supplementary Table S6a). We performed GO and KEGG pathway enrichment analysis on these target proteins (Fig. 3a). Multiple GO terms associated with “protein phosphorylation” were detected, implying that some signaling pathways were activated by the combination of small molecules via protein phosphorylation (Fig. 3a top). ERK1 and ERK2 cascades, common target signaling pathways of cell differentiation and cell reprogramming (Girardi et al. 2019, Thomas et al. 2020), were detected (Fig 3a top). Moreover, pathways involved in neural activities such as “neuroactive ligand–receptor interaction” and “calcium signaling” were detected in the KEGG pathway (Fig. 3a bottom). These results suggest that the predicted combination of small molecules may induce DR by jointly regulating multiple pathways of neuronal function. Next, we looked at the combination of 10 small molecules (Table 2) predicted for cardiomyocyte induction and obtained 66 target proteins (Supplementary Table S6b). GO analysis of the target proteins (Fig. 3b) detected terms such as “protein phosphorylation” and “positive regulation of ERK1 and ERK2 cascade” (Fig. 3b top), it is similar to neurons-induction linked target proteins. KEGG pathway enrichment analysis detected “hypertrophic cardiomyopathy,” associated with myocardial dysfunction, and “cAMP signaling pathway,” the second messenger that activates intracellular signaling transduction (Fig. 3b bottom). In previous experimental works on small molecule-based DR, Forskolin was often used for induction of several cell types and responsible for activating intracellular signaling by increasing cAMP (Xie et al. 2017, Yuan et al. 2020). It implies that small molecules in the predicted combination may include candidates that replace forskolin. These results suggest that the predicted combination of small molecules is likely to activate intracellular signaling transduction, resulting in the DR of cardiomyocytes.

Gene ontology and KEGG pathway enrichment analyses for target proteins of predicted small molecules that induce DR from fibroblasts to neurons and cardiomyocytes. (a) Top 10 GO terms and KEGG pathways of 75 target proteins of predicted small molecules for neurons. (b) Top 10 GO terms and KEGG pathways of 66 target proteins of predicted small molecules for cardiomyocytes.
To elucidate the association between target proteins of small molecules in the predicted combination, we examined the target proteins on the protein–protein association (PPA) network (Fig. 4). Mapk1, Src, and Fyn were identified as highly central nodes in the PPA network targeted by the neuron-inducing small molecules (Fig. 4a). Mapk1 is a component protein of the MAPK signaling pathway (Wei and Liu 2002). Src and Fyn are tyrosine–protein kinases of the Src family, and both have been reported to regulate neuronal development through activation of neuronal signaling pathways (Trepanier et al. 2012). Therefore, these results suggest that target proteins of predicted small molecules for DR of neurons are involved in neuronal development and functions. We then performed the same analysis on target proteins of small molecules in the combination predicted for cardiomyocytes (Fig. 4b). As a small network of tightly interacted nodes, multiple proteins of AMP-activated protein kinase (AMPK) and the AMPK catalytic subunits (Prkaa1, Prkaa2, Prkab1, Prkab2, Prkag1, Prkag2, and Prkag3) were detected. AMPK are the target proteins of adenosine phosphate, which is activated by increasing cAMP and phosphorylation of PKA (Aslam and Ladilov 2022), and widely regulates cellular functions such as autophagy, metabolism, cytoskeletal signaling autophagy, metabolism, and cytoskeletal signaling (Mihaylova and Shaw 2011). Autophagy and metabolic regulation have been reported as target biological processes that induce DR and cell reprogramming (Xie et al. 2017), suggesting that the combination of small molecules targeting AMPK may promote DR. Map2k1 and Map2k2, which are component proteins of the MAPK signaling pathway (Kim and Choi 2010), were detected as highly central target proteins in the network. Additionally, IL6, a cytokine that induces inflammatory responses and crosstalks with MAPK signaling and JAK-STAT signaling (Kagan et al. 2017), was detected as a highly central target protein. Gamolenic acid, an anti-inflammatory drug, has been identified as a suppressor of inflammatory response (Muraoka et al. 2019) for DR of cardiomyocytes. Gene expression levels of IL6, IL1b, IL18, and Tnf, which are regulators of the inflammatory response (Qin et al. 2017), were decreased during the DR of cardiomyocytes. In fact, small molecules in the predicted combination included small molecules that target inflammatory response-regulating proteins. These results suggest that the suppression of inflammatory responses is likely to enhance the DR of cardiomyocytes.

Protein–protein association networks of target proteins of the predicted small molecules that induce DR for neurons (a) and cardiomyocytes (b) from fibroblasts. The size of the nodes indicates the degree of centrality, and the thickness of the edges indicates the protein–protein association scores. The expression ratio is depicted by the color of the node using the target cell-specific transcriptome profile. Red node indicates upregulation of the gene expression level, blue node indicates downregulation of the gene expression level, and gray node indicates that the gene expression level is unknown.
4 Discussion and conclusion
In this study, we developed a novel method, DIRECTUER, to replace DR-inducing TFs by DR-inducing small molecules for any direct cell conversions from the integration of TF-inducing transcriptome and small molecule-induced transcriptome data. To the best of our knowledge, this is the first in silico method for predicting the combination of small molecules to induce DR based on perturbed transcriptome data. The novelty of the proposed method lies in the extraction of DR-characteristic gene expression patterns, in the development of a variant of simulated annealing to explore the optimal combination of small molecules, and in the applicability to any types of direct cell conversions whose TFs are known. The proposed method enables us to identify the optimal combination of small molecules for DR from a huge number of candidate small molecules and is expected to be useful for promoting research in regenerative medicine.
To induce DR, multiple biological functions must be regulated cooperatively (Qin et al. 2017). Previous studies reported that DR is induced by regulating multiple biological functions such as signaling pathways, histone modifications, and other biological processes (e.g. autophagy and metabolic processes) (Qin et al. 2017, Xie et al. 2017), by altering gene expression in source cells. Additionally, it has been shown that suppression of inflammatory responses enhances cell reprogramming (Muraoka et al. 2019). In the case of DR of neurons, we identified a candidate combination of TTNPB, romidepsin, etoposide, and bufexamac (an anti-inflammatory drug). TTNPB regulates signaling pathways (Qin et al. 2017, Yuan et al. 2020), romidepsin and etoposide regulate histone modifications (Bertino and Otterson 2011, Montecucco et al. 2015), and Bufexamac (an anti-inflammatory drug) promotes histone modifications (Bantscheff et al. 2011) (Supplementary Fig. S3a). In the case of DR for cardiomyocytes, a candidate combination of I-BET151, binimetinib and filgotinib (two anti-inflammatory drugs) was identified. I-BET151 regulates histone modification (Qin et al. 2017, Yuan et al. 2020), and binimetinib and filgotinib inhibit MAPK signaling and JAK-STAT signaling, respectively (Van Rompaey et al. 2013, Woodfield et al. 2016) (Supplementary Fig. S3b). The predicted combinations of small molecules could induce DR by regulating multiple biological processes that are essential for the induction of DR.
Most of the previous studies on DR have focused on experimentally identifying TFs that induce DR for various cell types (Horisawa and Suzuki 2020). However, the risk of tumorigenesis caused by gene insertions is a serious problem, induction of DR by small molecules such as drugs or chemical agents is desired (Federation et al. 2014, Qin et al. 2017, Xie et al. 2017, Yuan et al. 2020). DR-inducing small molecules have not been identified yet for most cell types. For example, HNF4α, Foxa3, Gata6 and Cdx2 were identified as DR-inducing TFs for intestinal cells (Miura and Suzuki 2017), but the DR-inducing small molecules have not been established. The proposed method can predict the optimal combination of small molecules on the basis of the list of TFs that are known to induce DR as soon as TF-induced transcriptome profiles are available; thus, the proposed method can be applied to any cell types whenever its DR-inducing TFs have been identified. Hence, the proposed method is expected to be useful for replacing DR-inducing TFs by DR-inducing small molecules, thus reducing the tumorigenic risk due to mutations.
In our proposed method, a combinatorial optimization algorithm was performed to search for combinations of small molecules that induce DR. This method enables the reduction of experimental costs for identifying candidate small molecules from an infinite number of possible small molecules. This was made possible through the restriction of the total number of predictable small molecules by a sub-objective function designed in this study. Nevertheless, there remain parameters that will need to be improved. A related previous method is a regression-based approach (Napolitano et al. 2021). There are differences between our proposed method and the previous method. First, the previous method used gene expression data representing each cell type, whereas we used gene expression data during the process of inducing DR. The prediction using our proposed method is performed based on the gene expression profile that reflects the transcriptomic characteristics of DR. Second, the previous method transformed gene expression profiles into pathway-based profiles for prediction, whereas we directly used element values from the gene expression profiles for prediction. A possible research direction would be to take into account optimal experimental conditions such as molecule concentrations and duration time for small molecules.
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
None declared.
Funding
This work was supported by a grant from JSPS KAKENHI [18K19930, 20H05797, 21K18327]; and The Okawa Foundation for Information and Telecommunications.