Toru Nakamura, Michio Iwata, Momoko Hamano, Ryohei Eguchi, Jun-ichi Takeshita, Yoshihiro Yamanishi, Small compound-based direct cell conversion with combinatorial optimization of pathway regulations, Bioinformatics, Volume 38, Issue Supplement_2, September 2022, Pages ii99–ii105, https://doi.org/10.1093/bioinformatics/btac475
Abstract
Direct cell conversion, also called direct reprogramming (DR), is an innovative technology that converts source cells directly into target cells without passing through induced pluripotent stem cells. Using small compounds (e.g. drugs) for DR can help avoid the carcinogenic risk associated with gene transfection. However, experimentally identifying effective small compounds remains challenging because the number of possible combinations grows explosively.
In this article, we present a new computational method, COMPRENDRE (combinatorial optimization of pathway regulations for direct reprograming), to elucidate the mechanism of small compound-based DR and predict new combinations of small compounds for DR. We estimated the potential target proteins of DR-inducing small compounds and identified a set of target pathways involved in DR. We identified multiple DR-related pathways that have not previously been reported to be involved in inducing neurons or cardiomyocytes from fibroblasts. To overcome the problem of combinatorial explosion, we developed a variant of a simulated annealing algorithm that identifies the best set of compounds for regulating DR-related pathways. The proposed method predicted new DR-inducing candidate combinations with fewer compounds and successfully reproduced experimentally verified compounds that induce the direct conversion of fibroblasts into neurons or cardiomyocytes. The proposed method is expected to be useful for practical applications in regenerative medicine.
The code supporting the current study is available at http://labo.bio.kyutech.ac.jp/~yamani/comprendre.
Supplementary data are available at Bioinformatics online.
1 Introduction
Direct cell conversion, also called direct reprogramming (DR), is an innovative technology that converts source cells directly into target cells without passing through induced pluripotent stem cells (iPSCs) (Horisawa and Suzuki, 2020; Takahashi and Yamanaka, 2006; Xu et al., 2015). Obtaining target cells through iPSCs requires two steps: iPSCs are first generated from source cells and then converted into target cells, which makes the process prolonged. In contrast, DR can produce target cells in less time because source cells are converted directly, without an iPSC intermediate. However, the mainstream method for cell conversion is gene transfection using viral vectors, which carries a risk of virus-induced gene insertion and mutation that can result in cancer (Assou et al., 2018; Grath and Dai, 2019). Although DR requires fewer transfected genes than iPSC generation (Firas et al., 2015), a risk of carcinogenesis remains (Gam et al., 2019).
To overcome the inherent problems of gene transfection-based DR, the use of small compounds for DR has been proposed (Qin et al., 2017; Xie et al., 2017). For example, fibroblasts can be converted into neurons using six small compounds: CHIR99021 (an activator of the Wnt signaling pathway), SB431542 [an inhibitor of the transforming growth factor beta (TGFβ) signaling pathway], LDN193189 (an inhibitor of the bone morphogenetic protein type I receptor ALK2/3), PD0325901 [an inhibitor of the mitogen-activated protein kinase/extracellular signal-regulated kinase (MAPK/ERK) signaling pathway], pifithrin-α (an inhibitor of p53) and forskolin [an activator of protein kinase A (PKA)] (Dai et al., 2015). Small compound-based DR does not require viral vectors, so the associated carcinogenic risk can be avoided. Several experimental studies of small compound-based DR to neurons and cardiomyocytes have been conducted; however, the mechanism of small compound-based DR remains unclear (Yuan et al., 2020).
To date, DR-inducing small-compound candidates have been identified experimentally without considering the underlying molecular mechanism. Such identification relies on exhaustive screening, which is costly and time-consuming. Understanding the mechanism of DR is therefore crucial for moving toward the rational design of small-compound combinations. Another issue is that the DR-inducing small compounds reported to date convert cells with low efficiency (Xie et al., 2017). A remaining challenge is to identify smaller sets of compounds that improve conversion efficiency. Computational approaches are expected to play key roles in DR research. Indeed, several computational methods have already been developed to predict the transcription factors that induce DR (Guerrero-Ramirez et al., 2018; Rackham et al., 2016; Ronquist et al., 2017). However, computational methods for predicting small compounds that induce DR are still lacking.
In this study, we present a novel computational method, COMPRENDRE (combinatorial optimization of pathway regulations for direct reprograming), to elucidate the mechanism of small compound-based DR and predict new combinations of small compounds for DR. To characterize this mechanism, we estimated the potential target proteins of DR-inducing small compounds and identified a set of target pathways involved in DR. We also developed an algorithm that predicts new combinations of DR-inducing small compounds by combinatorially optimizing the regulation of DR-related pathways. Finally, we predicted new combinations of compounds that induce DR from fibroblasts to neurons or cardiomyocytes.
2 Materials and methods
2.1 Overview of the proposed method for small compound-based DR
We established a novel computational framework with which to elucidate the mechanism of small compound-based DR and predict new combinations of small compounds for DR. An overview of our proposed method is summarized in Figure 1a and b.

Overview of the proposed method. (a) Identification of DR-related biological pathways. To characterize the activity of known combinations of DR compounds at the pathway level, we curated the known target proteins of the compounds. Lists of target pathways were constructed by pathway enrichment analysis of the target proteins; pathways detected at a significance level of P ≤ 0.05 were selected as DR-related target pathways. (b) Combinatorial optimization of DR-related pathway regulation. We predicted new combinations of small compounds for DR induction using simulated annealing; the combination of small compounds that covers the most DR-related target pathways was predicted as a new candidate combination. (c, d) Compounds in the combinations that induce DR from fibroblasts to (c) neurons and (d) cardiomyocytes. Rows indicate compound set numbers; columns indicate compound names.
First, target proteins of DR-inducing small compounds were estimated based on small compound–protein interactome data. Next, pathway enrichment analysis of the target proteins was performed to identify DR-related target pathways (Fig. 1a). If the pathways regulated by compounds in the predicted combinations were similar to those regulated by known combinations, the set of small compounds was predicted to be a candidate set for DR induction (Fig. 1b). We optimized compound combinations that can regulate a larger number of DR-related target pathways using a variant of simulated annealing (Černý, 1985; Kirkpatrick et al., 1983).
We focused on DR from fibroblasts to neurons or cardiomyocytes because many cases of DR to these cell types have been reported (Han et al., 2021; Testa et al., 2021). We manually curated known combinations of compounds for DR to neurons from six previous studies (Dai et al., 2015; He et al., 2015; Hu et al., 2015; Li et al., 2015; Wan et al., 2018; Xie et al., 2018). Each combination was reported in the corresponding study to induce DR from fibroblasts to neurons; these combinations are denoted by compound set numbers 1–6 (Fig. 1c). Similarly, for DR to cardiomyocytes, we curated known combinations from seven previous studies (Fu et al., 2015; Han et al., 2017; Huang et al., 2018; Ifkovits et al., 2014; Jayawardena et al., 2014; Testa et al., 2020; Xie et al., 2017); these combinations are denoted by compound set numbers 7–13 (Fig. 1d). With a view to clinical applications, we also predicted candidate combinations of approved drugs.
2.2 Known small compounds for DR induction
The combinations for DR to neurons are denoted by compound set numbers 1–6 (Dai et al., 2015; He et al., 2015; Hu et al., 2015; Li et al., 2015; Wan et al., 2018; Xie et al., 2018; Fig. 1c), whereas those for DR to cardiomyocytes are denoted by compound set numbers 7–13 (Fu et al., 2015; Han et al., 2017; Huang et al., 2018; Ifkovits et al., 2014; Jayawardena et al., 2014; Testa et al., 2020; Xie et al., 2017; Fig. 1d). For example, set 4 is a combination of four small compounds that induces DR from fibroblasts to neurons (Xie et al., 2018). Its members are forskolin, a PKA activator; CHIR99021, a glycogen synthase kinase 3 inhibitor; I-BET151, a bromodomain and extraterminal domain inhibitor; and ISX9, a neurogenic agent.
2.3 Interactome data: interactions between small compounds and their target proteins
Interactions between small compounds and their target proteins were obtained from seven public databases: ChEMBL (Gaulton et al., 2012), MATADOR (Günther et al., 2008), DrugBank (Knox et al., 2011), Psychoactive Drug Screening Program Ki (PDSP-Ki) (Roth et al., 2000), Kyoto Encyclopedia of Genes and Genomes (KEGG) DRUG (Kanehisa et al., 2009), BindingDB (Liu et al., 2007) and Therapeutic Target Database (Qin et al., 2014). From these small compound–protein interactome data, we extracted 35 366 compound–protein interactions involving 4956 drugs and 3062 target proteins. We refer to this dataset as the ‘drug–protein interactome data’.
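The merge described above amounts to a set union over compound–protein pairs, as the sketch below shows. It is a minimal illustration, not the authors' pipeline. It assumes that each database has already been exported as (compound_id, protein_id) pairs in a shared identifier namespace, which is the hard part in practice. The function `merge_interactomes` and the toy identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (compound_id, protein_id); IDs assumed pre-normalized


def merge_interactomes(sources: Dict[str, Iterable[Pair]]
                       ) -> Tuple[Dict[str, Set[str]], int, int, int]:
    """Union interaction pairs across databases, dropping duplicates.

    Returns (targets per compound, #unique pairs, #compounds, #proteins).
    """
    pairs: Set[Pair] = set()
    for db_pairs in sources.values():
        pairs.update(db_pairs)  # the same pair reported by two DBs counts once
    targets: Dict[str, Set[str]] = defaultdict(set)
    for compound, protein in pairs:
        targets[compound].add(protein)
    n_proteins = len({protein for _, protein in pairs})
    return dict(targets), len(pairs), len(targets), n_proteins
```

Applied to the full set of sources, the three counts would correspond to the 35 366 interactions, 4956 drugs and 3062 proteins reported above.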
2.4 Network analysis for identifying DR-related pathways regulated by small compounds
To identify the pathways regulated by small compounds, we performed pathway enrichment analysis of the proteins targeted by DR-inducing small compounds, using 205 biological pathways from KEGG.
P-values were corrected for multiple testing by controlling the false discovery rate (FDR) (Benjamini et al., 1995). Pathways that remained significantly enriched after correction (P ≤ 0.05) were predicted to be DR-related pathways.
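The enrichment step can be sketched as shown below. The text above does not state which test was used, so the one-sided hypergeometric (over-representation) test here is an assumption, although it is a common choice. Using all interactome target proteins as the background `universe` is also an assumption. The Benjamini–Hochberg step-up adjustment corresponds to the FDR procedure cited above. All function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background, K pathway members, n targets)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic, one final division


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted P-values, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_pathways(targets: Set[str], pathways: Dict[str, Set[str]],
                      universe: Set[str], alpha: float = 0.05) -> Dict[str, float]:
    """Return {pathway: adjusted P} for pathways with adjusted P <= alpha."""
    hits = targets & universe
    names = list(pathways)
    pvals = []
    for name in names:
        members = pathways[name] & universe
        k = len(hits & members)
        pvals.append(hypergeom_sf(k, len(universe), len(members), len(hits)))
    qvals = benjamini_hochberg(pvals)
    return {name: q for name, q in zip(names, qvals) if q <= alpha}
```

Here `targets` would be the pooled target proteins of one known compound combination, and `pathways` would map each of the 205 KEGG pathways to its member proteins.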
2.5 Combinatorial optimization of DR-related pathway regulations to predict new combinations of small compounds for DR (COMPRENDRE)
The jth element of the pathway-regulation vector of a candidate combination is 1 if pathway j is significantly regulated (P < 0.05) by the k selected compounds, and 0 otherwise.
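Section 3.2 and Fig. 3 describe the total objective as the sum of three subobjective functions, so its maximum is 3.0. Each subobjective takes values in 0.0–1.0, and they evaluate coverage of the target pathways, exclusion of non-target pathways and the number of compounds. Their exact formulas cannot be recovered from this text. The sketch below therefore uses hypothetical ratios to show how a 0–1 regulation vector could feed such a score: overlap/target pathways, overlap/regulated pathways and min(1, k_known/k). The `significant` callable is a placeholder for an enrichment test such as the one described in Section 2.4.

```python
from typing import Callable, Dict, Iterable, List, Set


def regulation_vector(selected: Iterable[str], compound_targets: Dict[str, Set[str]],
                      pathway_order: List[str],
                      significant: Callable[[Set[str]], Set[str]]) -> List[int]:
    """y_j = 1 if pathway j is significantly regulated by the pooled targets, else 0."""
    pooled: Set[str] = set()
    for compound in selected:
        pooled |= compound_targets.get(compound, set())
    hits = significant(pooled)  # placeholder: e.g. enrichment test with FDR control
    return [1 if p in hits else 0 for p in pathway_order]


def objective(y_pred: List[int], y_known: List[int], k_pred: int, k_known: int) -> float:
    """Hypothetical total score = coverage + specificity + compactness, each in [0, 1]."""
    overlap = sum(a & b for a, b in zip(y_pred, y_known))
    coverage = overlap / sum(y_known) if sum(y_known) else 0.0     # target pathways hit
    specificity = overlap / sum(y_pred) if sum(y_pred) else 0.0    # no off-target pathways
    compactness = min(1.0, k_known / k_pred) if k_pred else 0.0    # not more compounds
    return coverage + specificity + compactness
```

Under these assumed definitions, a predicted combination that regulates exactly the target pathways with no more compounds than the known set scores 3.0.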
In this study, we applied a variant of the simulated annealing algorithm, a metaheuristic local search method, to obtain a good solution to the combinatorial optimization problem (Černý, 1985; Kirkpatrick et al., 1983). ‘Local search method’ is a general term for methods that select a neighboring solution of the current solution and move the current solution to it under certain conditions. This operation is repeated until a termination condition is satisfied. Simulated annealing is a local search method that, with a certain probability, allows the current solution to move to a neighboring solution even if the neighbor has a worse objective value. We propose the following algorithm:
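For reference, the standard Metropolis acceptance rule of simulated annealing, written for maximization, is given below. This is the textbook form (Kirkpatrick et al., 1983). The acceptance probability and cooling schedule actually used in Step 3 of COMPRENDRE cannot be recovered from this text, and the geometric cooling shown is only one common choice. The symbols x (current 0–1 vector), x′ (neighborhood solution), f (objective function), T (temperature) and α (cooling factor) are introduced here for illustration.

```latex
P(\mathbf{x} \leftarrow \mathbf{x}') =
\begin{cases}
  1, & f(\mathbf{x}') \ge f(\mathbf{x}),\\[4pt]
  \exp\!\left(\dfrac{f(\mathbf{x}') - f(\mathbf{x})}{T}\right), & f(\mathbf{x}') < f(\mathbf{x}),
\end{cases}
\qquad T \leftarrow \alpha T,\quad 0 < \alpha < 1 .
```

Because the exponent is negative whenever the neighbor is worse, such moves are accepted with a probability below 1, and that probability shrinks as T decreases.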
Step 1: Choose an arbitrary k satisfying 1 ≤ k ≤ n, and randomly select a set of k small compounds as the initial state. In other words, generate an n-dimensional 0–1 vector with k elements equal to 1 and n − k elements equal to 0. Then, generate an n-dimensional 0–1 vector for the set of known compounds, whose elements are 1 for the known compounds and 0 otherwise. Finally, calculate the objective function.
Step 2: Generate a neighborhood solution by randomly choosing either to increase or to decrease the number of selected small compounds by one.
In the case of increasing by one:
Randomly select one of the small compounds that have not yet been selected and add it to the candidate combination. In other words, randomly select one element of the current solution vector that is 0, change it to 1 and take the resulting n-dimensional 0–1 vector as the neighborhood solution.
In the case of decreasing by one:
Randomly select one of the already selected small compounds and remove it from the candidate combination. In other words, randomly select one element of the current solution vector that is 1, change it to 0 and take the resulting n-dimensional 0–1 vector as the neighborhood solution.
Then, calculate the objective function for the neighborhood solution.
Step 3: If the objective value of the neighborhood solution is greater than or equal to that of the current solution, replace the current solution with the neighborhood solution with probability 1. Otherwise, replace it with the acceptance probability of simulated annealing, which is smaller than 1.
Step 4: If a predetermined time has elapsed, terminate and output the final solution. Otherwise, return to Step 2.
Here, n is the number of small compounds used in the prediction (4956). The calculation time was set to 10 h, and the number of iterations did not exceed 1 000 000. The highest objective function value obtained was defined as the prediction score. In the original simulated annealing method, the algorithm terminates when the temperature reaches a predefined final value. In this study, the calculation time was used as the termination condition instead.
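Steps 1–4 can be sketched in Python as follows. This is a minimal, hedged illustration, not the released COMPRENDRE code. Compounds are represented by indices 0..n−1, so a set of selected indices is equivalent to the 0–1 vector, and the caller supplies the objective as `score`. The Metropolis acceptance rule with geometric cooling (`t0`, `alpha`) is an assumed choice. The 10 h limit and the 1 000 000 iteration cap mirror the settings above. The best value seen is returned as the prediction score.

```python
import math
import random
import time
from typing import Callable, FrozenSet, Optional, Set, Tuple


def anneal_combination(n: int, score: Callable[[FrozenSet[int]], float], k_init: int,
                       t0: float = 1.0, alpha: float = 0.9999,
                       time_limit_s: float = 10 * 3600, max_iter: int = 1_000_000,
                       seed: Optional[int] = None) -> Tuple[Set[int], float]:
    """Add/remove-one simulated annealing over compound sets (requires n >= 2)."""
    rng = random.Random(seed)
    # Step 1: random initial combination of k_init compounds.
    current = set(rng.sample(range(n), k_init))
    cur_val = score(frozenset(current))
    best, best_val = set(current), cur_val
    temp = t0
    start = time.monotonic()
    for _ in range(max_iter):
        # Step 4: stop on wall-clock time (or the iteration cap), not on temperature.
        if time.monotonic() - start >= time_limit_s:
            break
        # Step 2: add one unselected compound or remove one selected compound;
        # never remove the last compound, never add when all are selected.
        neighbor = set(current)
        if len(neighbor) == 1 or (len(neighbor) < n and rng.random() < 0.5):
            neighbor.add(rng.choice([i for i in range(n) if i not in neighbor]))
        else:
            neighbor.remove(rng.choice(sorted(neighbor)))
        val = score(frozenset(neighbor))
        # Step 3: accept non-worsening moves; accept worse ones with exp(delta / T).
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            current, cur_val = neighbor, val
            if cur_val > best_val:
                best, best_val = set(current), cur_val
        # Assumed geometric cooling, floored so the division above stays defined.
        temp = max(temp * alpha, 1e-12)
    return best, best_val
```

As a toy check, `anneal_combination(20, lambda s: -len(s ^ {1, 2, 3}), k_init=5, max_iter=20000, seed=0)` uses the objective −|S Δ {1, 2, 3}| on n = 20 compounds and should recover {1, 2, 3}. With real data, `score` would wrap the pathway enrichment and objective function described in this section.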
2.6 Chemical similarity score
3 Results
3.1 DR-related biological pathways can be identified using network analysis of the target proteins of DR-inducing compounds
To elucidate the molecular mechanism by which DR-inducing small compounds act, we identified candidate DR-related pathways using network analysis of the proteins targeted by these compounds.
For DR from fibroblasts to neurons, we identified 90 target pathways, including the MAPK, TGFβ, Hedgehog and Wnt signaling pathways, which are known to be important in the conversion to neurons (Qin et al., 2017; Vasan et al., 2021; Yuan et al., 2020; Fig. 2a). The identification of target pathways depends on the number of known target proteins of the compounds. As a result, compound set 5 had fewer target pathways than the other compound sets. Additionally, lysine degradation was identified only for compound set 5, implying that the small compounds in this set act through regulatory mechanisms different from those of the other sets. In the five combinations other than set 5, the γ-aminobutyric acid (GABA)ergic synapse, cholinergic synapse and glutamatergic synapse pathways, each associated with neural function, were identified. This suggests that the proposed pathway analysis identifies relevant pathways associated with DR to neurons.

Target pathways identified using known combinations of compounds. (a, b) Target pathways for DR from fibroblasts to (a) neurons and (b) cardiomyocytes. Rows show the names of the detected biological pathways (P ≤ 0.05). Columns indicate the set numbers. Each pathway is divided according to its functional category. Pathways indicated by asterisks are known pathways for DR
For DR from fibroblasts to cardiomyocytes, we identified 102 target pathways, including the TGFβ, Janus kinase–signal transducer and activator of transcription and PI3K-Akt signaling pathways. These pathways are fundamentally involved in conversion to cardiomyocytes (Garry et al., 2022; Qin et al., 2017; Snyder et al., 2010; Thomas et al., 2020; Yamakawa and Ieda, 2021; Yuan et al., 2020; Fig. 2b). The pathway regulation patterns for conversion to neurons and to cardiomyocytes were similar, with some differences. For example, metabolism of carbohydrates, lipids, cofactors and vitamins tended to be regulated in conversion to cardiomyocytes. Immune system pathways tended to be regulated in compound set 8 only. This finding is relevant because inflammation and immune responses hinder the cardiac reprogramming process (Yamakawa and Ieda, 2021), and a previous study demonstrated that an anti-inflammatory compound promoted DR from fibroblasts to cardiomyocytes (Muraoka et al., 2019). The Rap1 pathway, which regulates cardiomyocyte differentiation and reprogramming (Thomas et al., 2020), was identified in six of the seven combinations (all except set 7). Overall, we identified both known and new biological pathways for DR.
3.2 Performance evaluation of the proposed method for the prediction of DR-inducing compound combinations
To evaluate the performance of the proposed method, we applied the variant of the simulated annealing algorithm to predict new combinations of small compounds that induce DR for each DR case. As the number of optimization iterations increased, the value of the total objective function and the values of its three subobjective functions increased (Fig. 3a–d). The total objective function converged to its theoretical maximum of 3.0 (Fig. 3a). The subobjective function that evaluates coverage of the target pathways indicated that the pathways controlled by the predicted combination approached the target pathways of the known combination (Fig. 3b). The subobjective function that evaluates the inclusion of non-target pathways indicated that the pathways controlled by the predicted combination were limited to those of the known combination (Fig. 3c). Finally, the subobjective function for the number of compounds indicated that the predicted combination contained no more compounds than the known combination (Fig. 3d). These results suggest that the simulated annealing variant can effectively identify high-scoring combinations of new small compounds for the given objective function.

Performance evaluation of the proposed method. (a) Output of the total objective function during optimization. The function takes a value of 0.0–3.0. (b) Output of the subobjective function that evaluates whether the target pathways are covered. (c) Output of the subobjective function that evaluates whether non-target pathways are included. (d) Output of the subobjective function that evaluates the number of compounds in the predicted combination. These scores take a value of 0.0–1.0. The trajectory for DR from fibroblasts to neurons (compound set 4) is shown in this figure. The vertical axis shows the value of each function or score. The horizontal axis shows the number of optimization iterations. (e,f) Distributions of prediction scores for DR from fibroblasts to (e) neurons and (f) cardiomyocytes. The prediction score is the value of the total objective function for the best combination in each prediction. Each boxplot shows the distribution of scores over 10 trials in which new combinations of small compounds were predicted. The vertical axes show the prediction scores of the new combinations of small compounds.
In the prediction of combinations that induce DR from fibroblasts to neurons, new combinations with a prediction score (i.e. the highest objective function value) ≥2.7 were found for all compound sets (Fig. 3e). In sets 4 and 5, the prediction score reached the theoretical maximum of 3.0 in all 10 predictions. In the predictions for DR from fibroblasts to cardiomyocytes, new combinations with a prediction score ≥2.7 were found for five of the seven compound sets (Fig. 3f). The remaining sets, sets 7 and 8, consisted of SB431542 and JAK inhibitor I, respectively. These compounds are not approved drugs and were therefore not included in the search space of the proposed method. Because it is difficult to control the same pathways with other compounds, the prediction scores for these sets were not high. In compound sets 9 and 11, the prediction score also reached 3.0. Thus, by applying the simulated annealing method to search for optimal combinations of small compounds for DR, we predicted new compound combinations that control the same pathways as those targeted by known compound combinations.
3.3 Newly predicted combinations of small compounds that induce DR
Finally, we comprehensively predicted new combinations of small compounds for DR from fibroblasts to neurons and from fibroblasts to cardiomyocytes (Supplementary Figs S1–S13).
As an alternative to the known combination of small compounds that induces DR from fibroblasts to neurons (forskolin, CHIR99021, I-BET151 and ISX9), the predicted combination consisted of forskolin and amantadine hydrochloride [a neuropsychiatric agent involved in hippocampal neurogenesis (Raupp-Barcaro et al., 2021)] (Fig. 4a). Thus, the predicted combination contained fewer compounds than the known combination, and forskolin was included in both. The remaining compounds in the known and predicted combinations differed and were structurally dissimilar. Furthermore, these compounds shared no target proteins. This suggests that the proposed method can predict new combinations that could not be found on the basis of shared target proteins. According to the proposed method, amantadine hydrochloride in the predicted combination is expected to exert the same pathway regulation as ISX9, CHIR99021 and I-BET151 in the known combination.

Newly predicted combinations of small compounds. (a) Associations between known compounds and predicted compounds for DR from fibroblasts to neurons (compound set 4). (b) Associations between known compounds and predicted compounds for DR from fibroblasts to cardiomyocytes (compound set 11). Columns of the confusion matrix represent predicted compounds, whereas rows represent known compounds. Blue-colored bars indicate the chemical similarity score. Green-colored bars indicate whether the compounds have a common target protein {0, 1}. The number of target proteins of each compound is shown in brackets. The table shows the numbers of target proteins and pathways identified for the known combination and the prediction score for the predicted combination. (c) Comparison of the numbers of compounds in known and predicted combinations for DR from fibroblasts to neurons. (d) Comparison of the numbers of compounds in known and predicted combinations for DR from fibroblasts to cardiomyocytes. Blue and orange bars indicate the number of compounds in the known combination and small compounds in the predicted combination, respectively. The number of small compounds is the average for 10 predictions. Error bars in both bar charts indicate the standard deviation. (e,f) The frequency of small compounds in predicted combinations for DR from fibroblasts to (e) neurons and (f) cardiomyocytes. The horizontal axes show the top 10 small compounds in the predicted combinations (among 10 predictions for all sets). The vertical axes show the number of compounds in all predictions. Small compounds shown in bold are known compounds for DR
As an alternative to the known combination of small compounds for DR from fibroblasts to cardiomyocytes (CHIR99021, RepSox, forskolin, valproic acid, parnate, TTNPB and rolipram), the predicted combination consisted of forskolin, valproic acid, apixaban (a cardiovascular agent) and escitalopram (a neuropsychiatric agent) (Fig. 4b). Forskolin and valproic acid were included in both combinations. This suggests that the other predicted compounds, apixaban and escitalopram, regulate the same pathways as CHIR99021, RepSox, parnate, TTNPB and rolipram. Some pairs of compounds shared target proteins. For example, CHIR99021 and valproic acid shared at least one target protein, yet they were dissimilar in chemical structure (similarity score = 0.03). This suggests that the method does not depend on the chemical structural similarity of compounds.
We repeated the prediction 10 times for each compound set and compared the numbers of small compounds in the known and predicted combinations (Fig. 4c and d). For DR to both neurons and cardiomyocytes, we identified novel compound combinations with fewer compounds that regulate DR-related pathways. Forskolin and valproic acid, which are commonly used for DR, tended to be included in the novel compound combinations (Fig. 4e and f). Because forskolin and valproic acid are important factors in DR induction, their recurrence supports the plausibility of the predicted combinations. Taken together, these results suggest that the activity of the compounds in known combinations can be covered combinatorially by fewer compounds.
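The definition of the chemical similarity score (Section 2.6) cannot be recovered from this text. A widely used measure of structural similarity is the Tanimoto coefficient between binary fingerprints. The sketch below assumes that measure and represents each fingerprint as the set of its "on" bit positions. Both the choice of Tanimoto and this fingerprint encoding are assumptions that the source does not confirm, and `tanimoto` is a hypothetical helper.

```python
from typing import Iterable


def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for sets of 'on' fingerprint bits.

    Returns 0.0 when both fingerprints are empty (a convention, not a standard).
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

For example, `tanimoto({1, 2, 3}, {3, 4, 5, 6})` returns 1/6 ≈ 0.17. A score near 0.03, like the one reported for CHIR99021 and valproic acid, would mean the two fingerprints share almost no bits.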
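The per-set summaries in Fig. 4c–f come down to simple counting: the mean and standard deviation of combination size over 10 predictions, and the most frequent compounds. The sketch below assumes that each prediction is available as a list of compound names. The function `summarize_predictions` is hypothetical, and the example names are taken from the combinations discussed above.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def summarize_predictions(predictions: List[Iterable[str]], top: int = 10
                          ) -> Tuple[float, float, List[Tuple[str, int]]]:
    """Mean and SD of combination size, plus the `top` most frequent compounds."""
    combos = [set(p) for p in predictions]
    sizes = [len(c) for c in combos]
    sd = stdev(sizes) if len(sizes) > 1 else 0.0  # sample SD, as in error bars
    freq = Counter(name for c in combos for name in c)
    return float(mean(sizes)), sd, freq.most_common(top)
```

For three toy predictions, {forskolin, valproic acid}, {forskolin, apixaban, escitalopram} and {forskolin}, the function returns a mean size of 2.0, an SD of 1.0 and forskolin as the most frequent compound (3 occurrences).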
4 Discussion
In this study, we developed a new computational method, COMPRENDRE, to predict combinations of small compounds that induce DR. We showed its usefulness for small compound-based DR from fibroblasts to neurons or cardiomyocytes. The proposed method identified new target pathways for DR and, using a variant of a simulated annealing algorithm, predicted new compound combinations that regulate these pathways. The originality of the proposed method lies in three features: it predicts combinations with fewer compounds, it focuses on the pathway regulation of compounds, and it predicts new compounds that could not previously be found using target protein or chemical structure similarity. Thus, the proposed method is expected to be useful for the computational identification of DR-inducing small compounds. Although we applied it here to DR from fibroblasts to neurons or cardiomyocytes, the method could be applied to other cell conversions if information is available on the target proteins or regulatory pathways of the compounds that induce the conversion.
The predicted combinations frequently included forskolin (a labdane diterpene) and valproic acid (an anticonvulsant). Both have previously been used for DR to many cell types, including cardiomyocytes (Fu et al., 2015), neurons (Hu et al., 2015) and hepatocytes (Guo et al., 2017), suggesting that they may play a general role in small compound-based DR across cell types. We also predicted alternative small compounds with the same function as known small compounds. For example, the predicted combinations for neurons and cardiomyocytes frequently included colforsin daropate hydrochloride, a water-soluble analog of forskolin (Hayashida et al., 2001). Additionally, galunisertib (LY2157299), a TGFβ receptor I (TβRI) inhibitor, was frequently predicted for DR to neurons. RepSox, another TβRI inhibitor, is a key small compound in many types of cell conversion (Cao et al., 2017; Fu et al., 2015; Hu et al., 2015). These observations support the relevance of the predicted compounds to DR induction.
We optimized the regulation of potential DR-related biological pathways via the target proteins of DR-inducing small compounds. Certain biological pathways are known to be regulated during cell conversion (Qin et al., 2017); however, it is important to distinguish between activating and inhibitory regulation of these pathways. For example, the Wnt signaling pathway is reportedly activated by CHIR99021 during DR from mouse fibroblasts to neural progenitor cells (Ladewig et al., 2012). In contrast, the MAPK/ERK signaling pathway was shown to be inhibited by PD0325901 during DR from human fibroblasts to neurons (Dai et al., 2015). The proposed method may be limited by the lack of information on the functional type (activation or inhibition) of compound–target interactions. If accurate information on the activation or inhibition of all target proteins were available, activated and inhibited pathways could be identified separately. Therefore, the prediction performance of the proposed method could be improved by explicitly considering whether each compound activates or inhibits DR-related pathways.
Funding
This paper was published as part of a special issue financially supported by ECCB2022. This work was supported by the JSPS KAKENHI Grant Numbers 18K19930, 20H05797 and 21K18327.
Conflict of Interest: none declared.
Data Availability
All data are incorporated into the article and its online supplementary material.