-
PDF
- Split View
-
Views
-
Cite
Cite
Yutong Li, Pedro Henrique da Costa Avelar, Xinyue Chen, Li Zhang, Min Wu, Sophia Tsoka, CLigOpt: controllable ligand design through target-specific optimization, Bioinformatics, Volume 40, Issue Supplement_2, September 2024, Pages ii62–ii69, https://doi.org/10.1093/bioinformatics/btae396
- Share Icon Share
Abstract
A key challenge in deep generative models for molecular design is to navigate random sampling of the vast molecular space, and produce promising molecules that strike a balance across multiple chemical criteria. Fragment-based drug design (FBDD), using fragments as starting points, is an effective way to constrain chemical space and improve generation of biologically active molecules. Furthermore, optimization approaches are often implemented with generative models to search through chemical space, and identify promising samples which satisfy specific properties. Controllable FBDD has promising potential in efficient target-specific ligand design.
We propose a controllable FBDD model, CLigOpt, which can generate molecules with desired properties from a given fragment pair. CLigOpt is a variational autoencoder-based model which utilizes co-embeddings of node and edge features to fully mine information from molecular graphs, as well as a multi-objective Controllable Generation Module to generate molecules under property controls. CLigOpt achieves consistently strong performance in generating structurally and chemically valid molecules, as evaluated across six metrics. Applicability is illustrated through ligand candidates for hDHFR and it is shown that the proportion of feasible active molecules from the generated set is increased by 10%. Molecular docking and synthesizability prediction tasks are conducted to prioritize generated molecules to derive potential lead compounds.
The source code is available via https://github.com/yutongLi1997/CLigOpt-Controllable-Ligand-Design-through-Target-Specific-Optimisation.
1 Introduction
The development of novel drugs is dependent on the efficient discovery of novel molecules (Bilodeau et al. 2022b). A critical challenge in this process is the difficulty in navigating the immense molecular space. Traditional methods, such as high-throughput screening, are inefficient as they require large resources and only result in a small number of hits (Zeng et al. 2022). Attention has been drawn to powerful generative deep learning techniques for efficient drug design due to their ability to learn a distribution for a given dataset and produce novel synthetic data from it. Deep generative models can explore chemical space efficiently through formulating a mapping between desired properties and corresponding molecular structures (Bilodeau et al. 2022a). Attributed to the success of syntactic pattern recognition in language modelling tasks, early molecular generative models are designed on the basis of SMILES, a line notation that describes chemical structures using ASCII strings. However, SMILES representation is limited by its intrinsic instability in describing molecular structural information, as structurally similar molecules can result in dissimilar SMILES strings (Pang et al. 2024).
Graph-based generative models are considered particularly suitable for molecular generation, as molecular structures are naturally well-represented by graphs (Cheng et al. 2021). With success in applications of generative models and graph neural networks (GNNs), many popular generative architectures such as variational autoencoder (VAE) (Liu et al. 2018), flow-based models (Jin et al. 2023), are adapted to graph data structures and applied on molecule generation tasks. A common generation strategy starts with a latent representation with no atoms and bonds, building molecular graphs through autoregressive or one-shot generation. However, such generation strategy utilizes random sampling to introduce novelty in the generated molecular structures. As molecule generation is extremely strict with precise validity and biological constraints, the basic graph-based molecular generation is unreliable and requires further control, which leads to the adoption of chemistry-aware constrained generation steps (e.g. Liu et al. 2018).
Fragment-based drug design (FBDD) approaches, which generate molecules conditioned on pre-existing sub-molecular units, constitute effective means to further constrain chemical space and improve generation of biologically active compounds (Mukaidaisi et al. 2022). Compared to random sampling schemes widely applied in exploring novel structures, using chemically meaningful fragments in molecular generation keeps search space chemically relevant and reduces the number of generation steps needed for frequent molecular motifs (Maziarz et al. 2021). Moreover, FBDD strategies can utilize information obtained from structure–activity relationship analysis, and can preserve such useful biological activity properties in the synthesized molecules. FBDD strategies have been introduced and successfully applied in fragment-to-lead studies via different techniques, such as scaffold hopping (Imrie et al. 2021), linker/PROTAC design (Imrie et al. 2020), and R-group optimization (Jin et al. 2023).
Optimization approaches are often integrated with generative models to navigate through the search space and identify promising initial seeds which can generate molecules with specified properties. Such optimization is usually implemented through heuristic optimization techniques, such as, Bayesian optimization (BO) (Gómez-Bombarelli et al. 2018), or reinforcement learning (RL) (Popova et al. 2018). However, these methods require additional computation while balancing exploration and exploitation within the explored space. For instance, BO may struggle with high-dimensional search spaces as the number of features increases and the search space becomes increasingly sparse, thereby requiring more time and training data to find promising regions to sample. RL is limited in multi-objective optimization, as it requires large volume of data and involves a lot of computation. Whereas for molecular generation problem, the more conditions added to optimization objective, the less suitable molecules are for policy training. Therefore, controlling multiple attributes in molecular design efficiently remains challenging.
To generate molecules that preserve chemically meaningful fragments and combine multiple attributes relating to different properties, we propose a controllable graph-based deep generative model for fragment linking, CLigOpt, towards target-specific ligand generation. CLigOpt takes two substructures with desired chemical functions and generates a linker that connect them, resulting in a valid molecule. CLigOpt incorporates a Controllable Generation Module (CGM), which evaluates an initial sample with respect to its chemical properties before generation, so that linkers satisfy multiple chemical property requirements. Our model exhibits consistent and strong performance across six evaluation metrics in both structural feasibility assessment and chemical usefulness. A case study on designing inhibitors for hDHFR is used to demonstrate applicability, and the quality of generated inhibitors is assessed via molecular docking simulations and retrosynthesis reaction estimation. The top prioritized molecules are provided as potential leads for hDHFR inhibition, together with the predicted retrosynthesis reactions.
The main contributions of our work are as follows: (i) a VAE-based model for linker design, where co-embeddings of nodes and edges are utilized to improve the information learned from molecular graphs, is developed and evaluated, (ii) a multi-objective controllable molecular generation module, CGM, is introduced to generate promising target-specific ligands, (iii) comparisons with state-of-the-art methods and multiple metrics are reported to evaluate results of the generative process.
2 Materials and methods
2.1 Data
The processed and filtered ZINC (Sterling and Irwin 2015) and CASF (Su et al. 2018) datasets were used as previously described (Imrie et al. 2020) for general-purpose model training and testing (41 74 997 examples from ZINC for training, 400 examples from ZINC and 309 examples from CASF for testing). Each example consisted of two fragments and a linker with size between 3 and 12 atoms. For target-specific linker design, 814 inhibitors and their binding affinities (pIC50) were obtained from ChEMBL (Gaulton et al. 2017), targeting human dihydrofolate reductase (hDHFR). The inhibitors were processed and fragmented following the procedure described in Hussain and Rea (2010), resulting in 1576 fragment pairs, 80% of which were used for training affinity predictors, and 20% selected for target-specific controllable generation.
2.2 Model framework

A diagrammatic representation of CLigOptCensNet. (A) An overview of the CensNet encoder, two fragments and a complete molecule are encoded into latent space, from where an initial representation for decoder is formed through the weighted sum of the average fragment representation and a sample is how the model Encoder encodes two molecules into the latent space, which is then decoded by our Decoder. (B) An overview of CensNet decoder, where the initial representation is applied through a sequential edge prediction process joining the two fragments. (C) The controllable generation module (CGM) where our trained encoder is used to encode molecules into the latent space to fit an explicit density model and attribute classifiers, and then sample from the fitted density model and attribute classifiers before decoding each point into the final molecule.
2.2.1 Encoder
The workflow of encoder is shown in Fig. 1A. Two models [GCN (Kipf and Welling 2016) and CensNet (Jiang et al. 2023)] are employed as encoders and their ability to learn representations from hidden graph embeddings is investigated. Specifically, GCN is one of the most commonly used GNN architectures and its convolutions can be seen as message-passing through the node adjacency matrix, using only edge information to determine which nodes are adjacent to update node features. CensNet defines two types of layers, i.e. node and edge layer, which employ convolutions on line graphs, enabling the roles of nodes and edges to switch. Through convolution operations, co-embeddings of nodes and edges can be learned. Owing to this alternative updating rule, CensNet allows embedding nodes and edges to latent feature space simultaneously, as well as mutual enrichment over node and edge information.
2.2.2 Decoder
Our training and generative procedures are explained in Algorithms S1 and S2. The decoder utilizes a similar sequential generation scheme inspired by CGVAE (Liu et al. 2018) where GNNs are applied to predict edge connection through a bond-by-bond manner (see Fig. 1B). We utilize the same graph embedding network engaged in Encoder (GCN and CensNet) to learn an informative representation of the generated graph. Networks are improved by adding layer normalization and dropout unit for each layer to reduce the impact of internal co-variate shift and prevent overfitting.
The reconstructed molecule is then formed by and .
2.2.3 Training
2.3 Target-specific controllable generation
2.3.1 Controllable generation module
We adapt CLaSS as our CGM aiming to generate molecules with high drug-likeness, high synthetic accessibility (SA) and high binding affinity to a target protein. A schematic representation of our module is illustrated in Fig. 1C. A Gaussian mixture density model is fitted over known molecules to estimate the posterior distribution of latent space z. From there, we design three classifiers in parallel, which take in a given latent embedding z and a pair of fragments encoded by CLigOpt, to evaluate the quantitative estimate of druglikeness (QED), SA score, and the negative logarithm of IC50 (pIC50) respectively, aligning with the aforementioned three standards. Rejection sampling then operates through sampling from the fitted density model. CGM evaluates whether an initial sample is likely to satisfy certain standards on selected attributes through the previously mentioned classifiers, and thus decides whether to proceed with generation with this seed or not.
3 Results
The performance of CLigOpt was measured in different aspects of molecular generation, i.e. (i) fragment linking and forming plausible molecules, assessed through benchmark linker design (benchmark evaluation on test sets) and (ii) generating target-specific drug-like molecules (controllable target-specific ligand generation), measured through a case study of generating lead candidates for hDHFR. Generated molecules were then prioritized and binding potential of the generated molecule set was evaluated via docking (evaluation via molecular docking). Finally, synthesizability via specific reaction steps of promising molecules are indicated (synthesizability of generated molecules).
3.1 Benchmark evaluation on test sets
The ability of CLigOpt in generating reasonable molecules was evaluated on two benchmark datasets, namely ZINC and CASF, and compared with two baseline models for fragment linking, namely DeLinker (Imrie et al. 2020) and FFLOM (Jin et al. 2023). DeLinker is a VAE-based model that generates molecules in bond-by-bond manner, whereas FFLOM is a flow-based model that carries out one-shot generation of molecules. For each test set, 250 molecules were generated for each fragment pair, the quality of generated molecules was assessed on the basis of (i) structural usefulness via validity, novelty and uniqueness metrics, and (ii) chemical usefulness, i.e. SA score, ring aromaticity, and Pan-Assay Interference compounds (PAINS). The results are shown in Fig. 2 and Supplementary Table S1.

A radar plot showing the performance of our model as well as the two baselines. Although our model does not outperform the best model in each category, CLigOptCensNet achieves the best average performance.
3.1.1 Model comparison
CLigOpt shows comparative but more consistent performance across all metrics compared to baseline models on ZINC test set. All models achieve similar performance in terms of validity, ring aromaticity and PAINS. CLigOptCensNet attains the best overall performance with high scores in every metric, while CLigOptGCN performs slightly worse in uniqueness. While DeLinker underperforms in uniqueness and novelty (30.21% and 49.44% less molecules compared to CLigOptCensNet), FFLOM underperforms in SA filter (59.38% less molecules). Similar results are observed on CASF test set (see Supplementary Fig. S1), where CLigOpt outperforms DeLinker (58.42% on uniqueness, 48.58% on novelty) and FFLOM (49.34% on SA).
This indicates that DeLinker is able to generate molecules that are synthesizable and have good potential as drug candidates, but the generated set is repetitive and underperforms in novelty. On other hand, FFLOM can generate highly novel and unique molecules, however mostly difficult to synthesize. As a trade-off between these two cases, CLigOpt demonstrates competitive performance across all evaluation aspects.
3.2 Controllable target-specific ligand generation
A potential anticancer agent, hDHFR, is employed as target for a case study to investigate the ability of CGM to generate molecules with desired properties. QED, SA, and pIC50 were used as the optimization goals, aiming to generate molecular structures that fulfil pre-set conditions for three properties, (i) druglikeness, (ii) synthesizability, and (iii) binding activity to hDHFR. The three conditions correspond to three classifiers predicting from an initial sampling of the latent space, to evaluate whether CLigOpt can generate a molecule with QED > 0.6, SA < 3, and pIC50 > 6. The classifiers can be used together or independently to generate molecules with different property control. We take the test set of hDHFR inhibitors described in Section ‘Data’, and generate 10 molecules for each pair of fragments for random sampling and rejection sampling.
3.2.1 Ground truth property estimation
To evaluate the ability of CLigOpt in filtering out samples that did not meet the pre-set conditions, tools are employed to compute the true properties for the generated molecules as a guide. Due to concerns of introducing biases from overfitting, we utilize two pre-defined algorithms from rdkit (Bento et al. 2020) instead of neural networks, to compute the QED and SA score. Three machine learning approaches with diverse architectures were explored to predict pIC50: (i) transformer-CNN (Karpov et al. 2020), a SMILES-based Quantitative Structure–Activity Relationship (QSAR) model that learns molecular embeddings from a transformer encoder, and predicts the corresponding activity using CharNN, (ii) modSAR (Li et al. 2024, Cardoso-Silva et al., 2019a, 2019b), a two-stage QSAR model that involves detecting clusters of molecules, and applying mathematical optimization-based piecewise linear regression to link molecular structures to their biological activities, and (iii) DeepAffinity (Karimi et al. 2019), a semi-supervised deep learning model that utilizes a unified RNN and CNN model to predict binding affinity. We compare transformer-CNN and modSAR that are trained on a subset of hDHFR inhibitors, and DeepAffinity with the provided pre-trained parameters to select the most accurate model for predicting binding affinity.
Comparisons of the three models via the RMSE metric is shown in Supplementary Table S5. Transformer-CNN outperforms DeepAffinity and modSAR with RMSE = 0.82, and is selected to predict pIC50 for the generated molecules as a reference to evaluate the performance of CLaSS module. The results of the normalized fraction of molecules that meet the pre-set conditions, for a specific property as well as balancing all three properties, in a random generated set compared with CLaSS-generated set are shown in Table 1. Evaluation on how CLaSS performs in a single property with different thresholds is shown in Supplementary Tables S2–S4.
The proportion of molecules that pass certain filters when sampling randomly versus when sampling using the CLaSS accepted set.
. | QED > 0.6 . | SA < 3 . | pIC50 > 6 . | QED > 0.6 & SA <3 & pIC50 > 6 . |
---|---|---|---|---|
GCN | ||||
Random | 9.78% | 16.27% | 53.32% | 1.48% |
Accepted | 57.71% | 31.58% | 58.49% | 18.92% |
CensNet | ||||
Random | 11.17% | 13.23% | 56.99% | 1.33% |
Accepted | 49.47% | 22.07% | 62.78% | 10.35% |
. | QED > 0.6 . | SA < 3 . | pIC50 > 6 . | QED > 0.6 & SA <3 & pIC50 > 6 . |
---|---|---|---|---|
GCN | ||||
Random | 9.78% | 16.27% | 53.32% | 1.48% |
Accepted | 57.71% | 31.58% | 58.49% | 18.92% |
CensNet | ||||
Random | 11.17% | 13.23% | 56.99% | 1.33% |
Accepted | 49.47% | 22.07% | 62.78% | 10.35% |
More results for each individual property can be seen in Supplementary Tables S2–S4.
The proportion of molecules that pass certain filters when sampling randomly versus when sampling using the CLaSS accepted set.
. | QED > 0.6 . | SA < 3 . | pIC50 > 6 . | QED > 0.6 & SA <3 & pIC50 > 6 . |
---|---|---|---|---|
GCN | ||||
Random | 9.78% | 16.27% | 53.32% | 1.48% |
Accepted | 57.71% | 31.58% | 58.49% | 18.92% |
CensNet | ||||
Random | 11.17% | 13.23% | 56.99% | 1.33% |
Accepted | 49.47% | 22.07% | 62.78% | 10.35% |
. | QED > 0.6 . | SA < 3 . | pIC50 > 6 . | QED > 0.6 & SA <3 & pIC50 > 6 . |
---|---|---|---|---|
GCN | ||||
Random | 9.78% | 16.27% | 53.32% | 1.48% |
Accepted | 57.71% | 31.58% | 58.49% | 18.92% |
CensNet | ||||
Random | 11.17% | 13.23% | 56.99% | 1.33% |
Accepted | 49.47% | 22.07% | 62.78% | 10.35% |
More results for each individual property can be seen in Supplementary Tables S2–S4.
3.2.2 Controllable generation
Results show that CGM excels in efficiently generating more useful molecules. Table 1 and Supplementary Tables S2–S4 report higher percentage of molecules meeting the pre-set conditions in CLaSS-generated set compared to the randomly sampled set, indicating that CLigOpt is able to identify the latent embeddings that result in molecules with desired properties. Supplementary Tables S2–S4 illustrate that, for QED and SA control, CLigOpt can generate a molecule set that contains three times more qualified molecules than random sampling. CLigOpt is less effective in pIC50 control, yet still better than random sampling. CLigOpt is also capable of multi-objective control, generating molecules that satisfy requirements for different goals. Moreover, the accepted set maintains a certain level of similarity to the original set in terms of QED and LogP (see Supplementary Fig. S2).
3.3 Evaluation via molecular docking
AutoDock docking (Trott and Olson 2010) was employed, as developed by DeepChem (Ramsundar et al. 2019), to further evaluate the binding of CLigOpt generated molecules. Molecular docking is conducted with the 3D structure of hDHFR on the CLaSS-generated molecules with 10 highest-predicted pIC50. We compare the binding free energy (BFE) of the generated molecules with 10 hDHFR inhibitors with highest binding affinities. Table 2 shows the results of average (E) and minimum (Min) BFE of these molecules, with respect to their binding affinities. Results indicate that CLigOpt is able to generate promising molecules that bind the druggable pocket of the 3D target structure with similar BFEs compared to known hDHFR inhibitors.
. | Pred. aff. . | . | Min. . |
---|---|---|---|
ChEMBL | 8.970 | −10.168 | |
GCN | 6.671 | −9.930 | |
CensNet | 7.058 | −9.732 |
. | Pred. aff. . | . | Min. . |
---|---|---|---|
ChEMBL | 8.970 | −10.168 | |
GCN | 6.671 | −9.930 | |
CensNet | 7.058 | −9.732 |
. | Pred. aff. . | . | Min. . |
---|---|---|---|
ChEMBL | 8.970 | −10.168 | |
GCN | 6.671 | −9.930 | |
CensNet | 7.058 | −9.732 |
. | Pred. aff. . | . | Min. . |
---|---|---|---|
ChEMBL | 8.970 | −10.168 | |
GCN | 6.671 | −9.930 | |
CensNet | 7.058 | −9.732 |
3.4 Synthesizability of generated molecules
The number of reaction steps required to complete the synthesis of a given molecule provides an indication of its complexity with respect to commercially available materials. We randomly selected 100 molecules from CLigOptGCN and CLigOptCensNet generated set obtained in controllable target-specific ligand generation, as well as the ChEMBL set as a baseline, to compute and compare the percentages of synthesizable molecules within each set. The retrosynthetic routes for each molecule were assessed using a molecular transformer-based model (Schwaller et al. 2019a, 2019b), and it was observed that the generated molecules have higher successful retrosynthesis prediction than the ChEMBL set, with success rate of 84% molecules from the ChEMBL set, 96% from both generated sets. The distribution of the number of steps needed for each set is shown in Fig. 3.

A histogram showing the proportion of molecules that could have a retrosynthetic path predicted by the model in Schwaller et al. (2019b). Random samples of 100 molecules from each of our models and from 100 random molecules drawn from ChEMBL were used.
Overall, compared to the baseline set, CLigOpt generated molecules exhibit promising results in synthesizability regarding retrosynthetic analysis. Most molecules can be synthesized within four reaction steps. CLigOptCensNet-generated molecules show better synthesizability, managing most retrosynthetic routes in 1 or 3 steps. ChEMBL and CLigOptGCN generated molecules show similar results in the amount of molecules that can be made within two steps, both achieving a peak at three steps. In comparison, more CLigOptGCN generated molecules require an additional step to synthesize than the ChEMBL molecules. We prioritize molecules from the two generated sets according to their QED, SA score, and pIC50, and provide the predicted retrosynthesis route for the best prioritized molecule in CLigOptCensNet in Fig. 4.

The two-step predicted retrosynthesis route for the best prioritized molecule (confidence: 0.92) in CLigOptCensNet. The molecule was prioritized according to its QED (0.655439), pIC50 (7.231879), and SA (2.643538).
4 Conclusion
In this work, we introduce, CLigOpt, a pipeline for efficient optimized molecule generation that contains a CGM, in which the generative process is restricted via rejection sampling. CLigOpt shows advantages over existing optimization approaches in terms of reduced computational complexity, as it does not require heavy pre-training, surrogate model fitting, or policy learning. From comparing the quality of CLigOpt generated molecules with and without the property control module, we show that CLigOpt is an effective yet simple sampling scheme that improves the generation performance significantly and can handle multi-objective criteria well.
The evaluation of results is based on a variety of metrics that reflect the chemical and structural properties of the generated molecules. CLigOpt obtains consistent and strong performance across all six metrics. Performance of CLigOpt is evaluated with two encoders, GCN and CensNet. CLigOptCensNet shows slightly better performance than CLigOptGCN in novelty, uniqueness, and ring aromaticity, suggesting that employing co-embedding for nodes and edges provides more information than sole node embedding, and can also improve performance for the ensuing generative model.
The applicability of CLigOpt is illustrated via a case study on generating hDHFR ligands, evaluated through binding free energy simulation and retrosynthesis prediction. Generated molecules show similar binding compared to known hDHFR inhibitors, indicating the potential of CLigOpt to generate promising lead candidates. Future work can focus on a set of systematic optimization objectives to further restrict and optimize searching molecular space.
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
None declared.
Funding
Y.L. and X.C. are funded by the China Scholarship Council. P.H.d.C.A. acknowledges funding through KCL/A*STAR (to M.W. and S.T.). M.W. is partially supported by A*STAR’s Decentralised Gap funding [I23D1AG081]. S.T. acknowledges funding from the British Skin Foundation [006/R/22] and the UK Royal Society [IES\R2\222084]. This paper was published as part of a supplement financially supported by ECCB2024.
Data availability
Details for data related to this article can be found in: https://github.com/yutongLi1997/CLigOpt-Controllable-Ligand-Design-through-Target-Specific-Optimisation.