Revolutionizing GPCR–ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery

Abstract G-protein coupled receptors (GPCRs), crucial in various diseases, are targeted of over 40% of approved drugs. However, the reliable acquisition of experimental GPCRs structures is hindered by their lipid-embedded conformations. Traditional protein–ligand interaction models falter in GPCR–drug interactions, caused by limited and low-quality structures. Generalized models, trained on soluble protein–ligand pairs, are also inadequate. To address these issues, we developed two models, DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR–ligand interaction data, leveraging graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physical–chemical and spatial information. In independent tests, DeepGPCR_BC surpassed Autodock Vina and Schrödinger Dock with an area under the curve of 0.72, accuracy of 0.68 and true positive rate of 0.73, whereas DeepGPCR_RG demonstrated a Pearson correlation of 0.39 and root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), yielding promising results with three (F545-1970, K297-0698, S948-0241) out of eight candidates. Furthermore, we also successfully obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.


Introduction
G protein-coupled receptors (GPCRs) are a crucial family of membrane proteins that play a critical role in signal transduction, regulating numerous physiological processes in humans, such as neurotransmission, secretion, cellular differentiation, growth, inf lammation and more [1][2][3].Many diseases are associated with GPCRs, and approximately one-third of approved drugs are designed to interact with GPCRs [4].However, many diseases associated with GPCRs still lack approved drugs that can effectively modulate them, thus underscoring the enormous potential of GPCRs as novel targets for disease curing.Unfortunately, the limited availability of GPCR-ligand structures (below 500) [5] and the difficulty in obtaining reliable GPCR structures because of their embedding in the lipid membrane pose significant challenges for developing structure-based GPCRligand prediction models.It is worth noting that the unique properties of GPCRs compared to soluble proteins in proteinligand binding further complicate the modeling process.
In recent years, deep learning has emerged as a powerful tool for predicting protein-ligand interaction, thanks to the availability of large protein-ligand datasets like PDBbind [6] and BindingDB [7].Numerous generalized protein-ligand prediction models have been developed, including structure complex-based models like DeepBindBC [8], 3D fusion model [9], PointNet [10], PointTransformer [10] and pafnucy [11], as well as non-structure complexbased models like dense fully connected neural network (DFCNN) [12], DeepLPI [13], DeepDTAF [14], CAPLA [15] and GraphDTA [16].However, GPCR proteins are embedded in the lipid membrane, meaning their physical-chemical environment differs from other soluble proteins [17].As a result, the characteristics of GPCRligand interactions are distinctly different from those of other protein-ligand interactions, and in some cases, the interaction rules are opposite.Currently available generalized protein-ligand models have poor performance on GPCR-ligand prediction, as highlighted in recent research [12].For example, the DFCNN model's virtual screening over GPCR is inferior to other types of proteins and sometimes even demonstrates worse performance than random predictions [12].Hence, there is a necessity to develop a GPCR-ligand-specific model that exclusively trains on the GPCRligand dataset and implicitly accounts for the lipid effect.Directly training such a model with the limited availability of GPCRligand structures is not feasible in the current scenario.Moreover, generalized models prove to be ineffective for GPCR due to the distinctive binding properties of GPCR-ligand interactions.
Therefore, developing a model that can accurately predict GPCR-ligand interactions is highly attractive and necessary.However, the available three-dimensional (3D) structures for GPCRligand complexes are limited (<500).At the same time, most of the current protein-ligand predictions are based on learning experimental obtained 3D protein-ligand complexes, such as pafnucy [11], DeepBindBC [8] and DeepBindRG [18].Even worse, the quality of GPCR-ligand is poor because of its difficulty in crystallizing; its native state is in lipid membranes with f lexible conformation.Some protein-ligand prediction models do not rely on protein-ligand 3D complexes, such as DFCNN [19], but DFCNN lacks spatial information about the binding pocket, which may be critical for those GPCR-ligand binding.Since the GPCR-ligand interaction pattern differs from other types of proteins, it would be impractical to use transfer learning for the GPCR-ligand interaction from other types of protein-ligand 3D datasets.Conversely, GPCR-ligand binding information is abundant; for instance, the GLASS (GPCR-Ligand Association) database has collected more than 100 000 protein-ligand pairs with affinity information.Considering the above, developing a non-structurebased GPCR-ligand interaction model would be an optimal option.Here, we focused on training GPCR-ligand interaction models by implementing the protein pocket and ligand information with the graph representation.By adopting this methodology, the model achieves independence from the 3D protein-ligand complexes.
In this study, we gathered GPCR-compound pairs from the GLASS database [20] and focused on the extracted GPCR structures and their known pockets.Finally, we employed these GPCR-ligand pairs to train and validate our models and obtained DeepGPCR_BC and DeepGPCR_RG for binary classification and affinity prediction.Additionally, we used a separate test set of eight GPCRs with known active compounds, using modeled structures and predicted pockets.Our two models demonstrated high accuracy, indicating their ability to screen potential drugs for GPCRs even when experimental structures are unavailable.In performance metrics, both DeepGPCR_BC and DeepGPCR_RG showed clear advantages over Schrödinger and vina docking methods in terms of performance.We integrated our computational models with docking techniques to identify possible active agents targeting Q9HC97 (GPR35) and GLP-1R.Two distinct strategies were employed: Strategy A involved the use of DeepGCPR_BC, DeepGPCR_RG and Schrödinger docking, while Strategy B utilized DeepGPCR_RG in conjunction with Schrödinger docking.From the eight candidates chosen through Strategy A for experimental validation, three demonstrated activities against GPR35 (compounds F545-1970, K297-0698 and S948-0241).For GLP-1R, Strategy A yielded five active compounds out of 12 candidates tested, and Strategy B produced one active compound from three tested.Additionally, molecular dynamics (MD) simulations were conducted to investigate the binding dynamics and atomistic interactions of the three active GPR35 compounds.The findings underscore the potential of our method GPCR-targeted drug screening, particularly when integrated with docking and MD simulation techniques.

Data collection
The GPCR-ligand pairs for this study were sourced from the GLASS database [20], with ligand molecules in SMILES format converted to 3DSDF format using the RDKit tool [21].For DeepGPCR_BC and DeepGPCR_RG, we have different treatments as follows.

DeepGPCR_BC
GPCR-ligand pairs with binding affinities characterized by IC50, Ki or Kd values smaller than 4 nM were considered positive data, indicating strong binding affinity.In contrast, pairs with IC50, Ki or Kd values larger than 4000 nM were considered negative data, indicating weak or non-binding.We have selected cutoff values of 4 and 4000 nM for several important reasons.Firstly, these thresholds assist in maintaining a balanced distribution between our positive and negative datasets.Secondly, our objective is to pinpoint ligands with robust binding affinities to GPCRs, as these are likely to signify substantial biological interactions.Thirdly, ligands with affinities falling between 4 and 4000 nM often exhibit weaker binding, making it challenging to accurately classify them as binders or non-binders.The corresponding Protein Data Bank (PDB) structure for each GPCR UniProt ID was retrieved by ID Mapping [22] and downloaded from the PDB database [23].The representative PDB structure with a known ligand was selected for each GPCR.Residues within 0.6 nm of the known ligand were extracted as the protein pocket.

DeepGPCR_RG
Binding affinities of GPCR-ligand pairs are used as training labels.The corresponding structures for each GPCR uniport ID were retrieved from the Alphafold database [24].The predicted ligands were obtained by the COFACTOR [25], with selection criteria favoring those closest to the N-terminus of the GPCR.The N-terminus of a GPCR, typically extracellular, often contains the native ligand binding region.This region is responsible for recognizing and binding specific ligands, such as neurotransmitters or hormones, which activate the receptor and initiate signaling pathways within the cell.The extracellular N-terminus may also play a role in receptor activation and stabilization.The residues within 0.8 nm of the predicted ligand were extracted as the protein pocket.
Utilizing AlphaFold-predicted GPCR structures offers a significant advantage by ensuring most data include structural information, thereby increasing quantity of available data.

Data preparation
To prepare the data, we first transformed the protein pocket into a graph representation with the backbone carbon coordinates as the residue positions, designating residues as nodes and contacting residue pairs as edges with a cutoff set as 0.5 nm.The protein pocket was defined based on a cutoff from the known ligand, retaining any residue whose atoms fell within this cutoff as part of the pocket.Cutoff values were set at 0.6 nm for DeepGPCR_BC, in line with DeepBindGCN_BC [26] and at 0.8 nm for DeepGPCR_RG, mirroring DeepBindGCN_RG [26].Subsequently, we created a feature vector for each node using a 30-dimensional molecular vector trained using mol2vec [27].We used a parallel approach for the ligands, converting it into a graph representation with atoms as nodes and bonds as edges.We described each atom node with a one-hot-like representation, similar to the methodology implemented in GraphDTA [16].

Training, test1 dataset, test2 dataset and extra testing set
The data training, test1, test2 and extra test set are for DeepG-PCR_BC and DeepGPCR_RG (Table 1).

DeepGPCR_BC
Our study comprised 35 000 training datasets and 3531 test1 sets, as shown in Figure S1.Although the protein-compound pairs in the test1 set were not included in the training dataset, the protein has been found in the training set, indicating that the test1 dataset is not fully independent.For a more accurate evaluation, 1668 GPCR-compound pairs were chosen for the test2 set, ensuring full independence from the training dataset.All GPCRs lacking structural information were excluded from the training, test1 and test2 datasets.In the test1 set and validation, while the specific GPCR-ligand pairs were not found in the training, the GPCR may have formed pairs with other ligands.It is likely that the model has effectively learned this GPCR feature potentially leading to high-accuracy predictions compared to those instances involving GPCR proteins not previously encountered in the training dataset.The test2 set posed a greater challenge, as not only GPCR-ligand pairs but also the GPCRs themselves were not in the training set, mirroring real-world scenarios with novel GPCRs.However, predicting the structure and pocket of these novel GPCRs could significantly expand the size and diversity of the testing set.This would be particularly meaningful for applications, where the structure and pocket of GPCRs are often identified.We selected the 16 largest protein-related datasets to create an extra test set containing 9987 protein-ligand pairs.We downloaded AlphaFold2 [28] predicted proteins from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/).Ligand cofactors were modeled using COFACTOR [29].COFACTOR's ligand-binding prediction involves identifying functional homologies from a non-redundant set of BioLiP templates [30].Next, ligands are superposed to the predicted binding sites.Finally, the consensus binding sites are obtained by classified all ligands superposed to the query structure.In this work, we use COFACTOR within the local version of I-TASSER with the protein PDB file as the input.We then extracted residues within 0.6 nm of the selected ligand to form a pocket.

DeepGPCR_RG
For DeepGPCR_RG, we compiled 186 000 training datasets and 8408 test sets (Table 1).In analogue to DeepGPCR_BC, we selected the 16 largest protein-related datasets as an extra test set, comprising 51 525 protein-ligand pairs.The availability of Alphafold2 modeled structures, more abundant than GPCR structures from the PDB database, results in a larger number of GPCR proteins in the DeepGPCR_RG dataset compared to DeepGPCR_BC.Furthermore, unlike DeepGPCR_BC, DeepGPCR_RG does not exclude the protein-ligand pairs with binding affinities between 4 and 4000 nM.Consequently, the total data used to train DeepGPCR_RG is significantly larger, encompassing 186 000 pairs compared to DeepGPCR_BC's, which are 35 000 (Table 1).

Model construction
Our model structure consists of two inputs, drug-target pair and one output structure (Fig. 1A and C).The ligand and pocket information are fed into two separate layers of a graph network.The outputs of the two graph networks are then merged into fully connected layers, culminating in a single node as the final output.For DeepGPCR_BC, the sigmoid activation function for binary prediction was employed, returning values between 0 and 1.Conversely, for DeepGPCR_RG, linear function for affinity prediction was implemented, yielding real values.The ReLU activation function was chosen as the activation function for each layer, except for the final node in the neural network.A dropout operation, with a rate of 0.2, was used following the pocket GCN layer, ligand GCN layer and after the second merge layer.

Model training
The torch_geometric module generated input data and constructed the graph neural network.The input data were saved in PyTorch InMemoryDataset format.The PyTorch was operated for doing training.For DeepGPCR_BC, BCELoss (Binary Cross Entropy Loss) was selected as the loss function with the Adam optimizer.Similarly, for DeepGPCR_RG, MSEloss (mean squared error loss) was chosen as the loss function, also employing the Adam optimizer.The learning rate was 0.0005, and the total number of epochs was set to 2000.The model was saved at every

Virtual screening
We performed virtual screening against the ChemDiv database using the DeepGPCR_BC, DeepGPCR_RG and Schrödinger docking with Q9HC97 (GPR35) and GLP_1R serving as representative examples.The Chemdiv database, provided by ChemDiv company (https://www.chemdiv.com/),contains approximately 1 500 000 compounds, most of which are purchasable from ChemDiv Company.For this purpose, the GPR35 sequence was sourced from UniProt, and 3D structure models were constructed using AlphaFold2.The binding sites were predicted using COFACTOR, with the selection of the binding site candidate nearest to the N-terminal of the protein.The structure of GLP_1R was from PDB structure (PDBID:7s15 [31]); the binding site was determined by the known ligand 82L within the PDB structure.We used the trained models as the core component during the screening and developed custom scripts.For each protein, multiple binding site candidates were considered and the screening process utilizing the selected binding site.The ChemDiv database was screened using the DeepGPCR_BC, DeepGPCR_RG and Schrödinger docking.The Schrödinger docking procedures are outlined in Supplementary Materials Section 1.

Performance metrics for binary classification and regression model
The

Dynamic mass redistribution assay
When the cells approached 90% conf luence, they were seeded in 384-well biosensor plates with a density of 1.5 × 10 4 CHO-GPR35 or 2.5 × 10 4 HEK293-GLP-1R cells/well and cultured for 24 h.The culture medium in the 384-well biosensor plates was replaced with 30 μl of Hank's balanced salt solution (1 × HBSS) and then further incubated inside the system for 1 h before measurement.For the dynamic mass redistribution (DMR) agonism assay, a 2-min baseline was first established, followed by adding compounds using the multi-channel pipette, and the compoundtriggered DMR responses were recorded for approximately 1 h.Subsequently, the baseline was re-established, zaprinast (a known GPR35 agonist) at a fixed concentration (100 nM) was added and the DMR responses induced were recorded for 1 h, while taspoglutide (a known GLP-1R agonist) at a fixed concentration (1 μM) was added, and the DMR responses induced were recorded for 90 min.For DMR antagonism assay, cells were initially treated with either an antagonist or compound for 1 h in GPR35 and 90 min in GLP-1R assays, respectively.Afterward, the baseline was reestablished, followed by adding zaprinast at a fixed concentration (100 nM) and taspoglutide at a fixed concentration (1 μM), and then monitoring the DMR responses induced by zaprinast for 1 h and taspoglutide for 90 min.All DMR responses were backgroundcorrected.

Results and discussion
The construction workf low of our two models is illustrated in Fig. 1.We represented pockets and ligands as graph representations for input and employed the graph convolutional network along with fully connected layers to train the model (Fig. 1B).The binary classifier model generates output values ranging from 0 to 1, wherein values closer to 0 signifying weak or no binding, while values closer to 1 denoting strong binding interactions.For the regression model, the output is a numerical value, where larger values correspond to stronger binding interactions.For performance evaluation, we employed 16 GPCR-related datasets as an extra test set.However, due to the unavailability of experimental 3D structures and binding pocket information for these GPCRs, we resorted to obtaining the AlphaFold-predicted 3D structures [33] and utilizing COFACTOR [34] to predict the ligand binding pocket (Fig. 1B).

Performance of DeepGPCR_BC model
We assessed the effectiveness of our DeepGPCR_BC model on both the training and testing datasets and summarized the results in Table S1 and Figure S2.Our findings revealed that the model's performance converged and remained relatively stable after approximately 100 epochs.Our model demonstrated excellent performance on the test set1 with an AUC of 0.97, TPR of 0.90, precision of 0.90, accuracy of 0.92 and MCC of 0.84 at epoch 2000.
However, the high performance on the test1 dataset may be due to the inclusion of its protein in the training set, implying that the model had fully learned its pocket feature during training.Therefore, this dataset is not entirely independent of the training set.To assess the model's ability to generalize the non-trained protein pockets, its performance on the test2 dataset was evaluated.The performance metrics in test set2 at epoch 2000 revealed a decrease, with AUC, TPR, precision, accuracy and MCC, dropping to 0.72, 0.46, 0.46, 0.70 and 0.16, respectively (Table S2).Since the test set2 only contains one protein (P29274), it provides a limited scope for fully evaluating the model's reliability; we afterwards test the performance on a more diversified and fully independent extra dataset.
The selected proteins with the predicted pockets and the performance are shown in Fig. 2 and Table 2, representative.The model achieved metrics 0.72, 0.73, 0.73, 0.68, 0.37 and 0.64 for all the datasets for AUC, TPR, precision, accuracy, MCC and F1 score, respectively.Upon evaluating individual protein performance, the model demonstrated optimal results for P20309 and P35372.Notably, DeepGPCR_BC outperforms the Schrödinger's Glide and AutoDock Vina docking as detailed in Tables 3 and S3.It should be noted that the DeepGPCR_BC score is the output score with a range between 0 to 1.In contrast, the Schrödinger and Autodock Vina score represent the predicted binding affinity of the ligand to the receptor in a continuous unit, kcal/mol.Here, we set −6 kcal/mol as the cutoff distance, the values larger than -6 kcal/mol were assigned a value of 0, indicating nonbind, and those scores less than and equal to −6 kcal/mol were given a value of 1, indicating able to bind.In this way, we can evaluate their performance with evaluation metrics of AUC, TPR, precision, accuracy and MCC.The Schrödinger dock yielded poor performance, with an AUC of 0.45, TPR of 0.29, precision of 0.43, accuracy of 0.45, MCC of -0.10 and F1 score of 0.35 across all datasets.The poor performance of the Schrödinger dock could be attributed to the use of generalized protein datasets primarily comprising soluble proteins, which have significantly different physical-chemical properties than GPCR proteins.Similarly, Autodock Vina also exhibited poor performance, with an AUC of 0.49, TPR of 0.78, precision of 0.50, accuracy of 0.49, MCC of −0.02 and F1 score of 0.61 for the entire dataset (Table S3).For a fair comparison with Autodock, we adopt cutoff values of −5 kcal/mol and −7 kcal/mol, as illustrated in Tables S4 and S5.However, these results demonstrate significantly poorer performance compared to DeepGPCR_BC.Regarding the training data, a significant improvement in performance was observed between the 200th and 2000th epoch.The RMSE decreased from 0.72 to 0.64, showing a reduction in the variance of the prediction errors.The Pearson correlation showed a slight increase from 0.84 to 0.87, signifying a strong linear relationship between the actual and predicted outputs.The performance on the test data remained relatively stable across different epochs (Table S6).Indicating that although the model's performance on training data improved with successive epochs, its ability to generalize to unseen data did not significantly improve over the observed period.Furthermore, DeepGPCR_RG's performance was assessed on an additional test set composed of data.This model's performance on the additional test set was subsequently compared to that of Schrödinger and Vina docking methods on the same datasets.Performance measures for DeepGPCR_RG on the extra test datasets are depicted in Table 4. Overall, DeepGPCR_RG showed average RMSE and MSE values of 1.34 and 1.80, respectively.Pearson's and Spearman's correlation coefficients averaged at 0.39 and 0.35, respectively, across all datasets.This suggests a moderate monotonic and a linear relationship between actual and predicted outputs.We have computed the percentage deviation (Prmsd) from the experimental data using the following formula:

Performance of DeepGPCR_RG model
where The computed percentage deviation (Prmsd) is 0.19 when using all the data, shown in Table 4.
In contrast, Schrödinger's docking method showed higher average RMSE and MSE values, illustrating a larger discrepancy between the predicted and actual values.Average Pearson and Spearman correlation coefficients were both negative −0.04, in Table S7, suggesting that the Schrödinger's ability in predicting dataset values was weaker than to DeepGPCR_RG.Lastly, the performance of the Vina docking method as shown in Table S8, also demonstrated higher average RMSE and MSE values.Like Schrödinger's docking method, both Pearson's and Spearman's correlation coefficients are negative (average −0.06 and −0.07, respectively).
In summary, on the additional test datasets, DeepGPCR_RG was superior to the other two methods, demonstrating lower error rates and better correlation coefficients.This illustrates that DeepGPCR_RG enhanced prediction accuracy and its stronger alignment with actual values compared to Schrödinger and Vina docking methods.

Screening against target Q9HC97 (GPR35) by DeepGPCR_BC, DeepGCPR_RG and Schrödinger
Q9HC97 was chosen to demonstrate the applications of our models in screening potential therapeutic compounds.GPR35 has been identified as a potential target for various diseases [35].The screening procedures are shown in Fig. 3. Using our DeepGPCR_BC and DeepGPCR_RG models, we screened 102 592 candidates with a DeepGPCR_BC score ≥ 0.999 and a DeepGPCR_RG score ≥ 9 for Q9HC97.We also calculated the Schrödinger score for those candidates.The DeepGPCR_BC score ≥ 0.999, DeepGPCR_RG ≥ 9 and Schrödinger score ≤ -6.7 Kcal/mol (Table 5) and those candidates are selected for final experimental validation.
To further investigate the reliability of using DeepGPCR_RG and Schrödinger's software independently, we conducted experimental validation over another candidate list by using DeepGPCR_RG and Schrödinger (DeepGPCR_RG ≥ 10, Schrödinger score ≤−6.35 Kcal/mol) (Table S9).Together with Table 5, a total of 12 candidates were forwarded for experimental validation; the structures of those compounds are shown in Figure S3.

Screening against target GLP-1R by DeepGPCR_BC, DeepGCPR_RG and Schrödinger
We selected GLP-1R to further validate our model's capability in screening potential therapeutic compounds.GLP-1R, identified as a potential target for various diseases including cancer, was our focus.The screening procedures are shown in Figure S4.Using our DeepGPCR_BC and DeepGPCR_RG models, we screened 158 candidates with a DeepGPCR_BC score ≥ 0.999 and a DeepG-PCR_RG score ≥ 9.5 for GLP-1R.We also calculated Schrödinger score for these candidates.Selection criteria were set at a DeepGPCR_BC score ≥ 0.999, DeepGPCR_RG ≥ 9.5 and Schrödinger score ≤ −8.7 Kcal/mol (Table 6).For further investigation, the standalone reliability of DeepGPCR_RG and Schrödinger's software, we select three based on a DeepGPCR_RG ≥ 10, Schrödinger score ≤ −6.35 kcal/mol) (Table S10).Combining the results from with Table 6, a total of 15 candidates were selected for experimental validation; the structures of those compounds are shown in Figure S5.

Characteristics of selective candidates on GPR35
We applied the GPR35 overexpressed CHO cells to assess the activity of selective candidates as well as zaprinast, an endogenous ligand of GPR35.Among the 12 candidates, S948-0241, K297-0698 and F545-1970 exhibit desensitization effects on the GPR35 receptor when stimulated by the agonist zaprinast, and they do not show a significant increase in DMR signal in CHO-GPR35 cells, indicating that they possess GPR35 receptor antagonistic activity, with relatively weak activity and IC50 values around 30-80 μM (Fig. 4A-D).E014-0043, C301-4662, V026-2353 and G219-1242 only exhibit desensitization effects on the GPR35 receptor when stimulated at high concentrations with poor desensitization effects, and they do not show a significant increase in DMR signal in CHO-GPR35 cells, indicating that they have weak GPR35 receptor antagonistic activity (Fig. 4E-H).E146-0380, D103-0816, L311-0042, V011-4148 and E216-0947 do not exhibit desensitization effects or only show weak desensitization effects when stimulated by the agonist zaprinast on the GPR35 receptor, indicating an absence of GPR35 receptor activity (Fig. 4I-K).These results suggest that 3 out of 12 candidates possess significant selective antagonistic activity targeting GPR35.

Activity validation of selective candidates on GLP-1R
Next, we utilized the GLP-1R overexpressed HEK293 to test the activity of selective candidates for GLP-1R, along with taspoglutide, a positive control as a GLP-1 agonist.Among the 15 selective candidates, V005-2405 exhibits desensitization effects on the GLP-1R receptor when stimulated by the agonist taspoglutide.This effect is dose-dependent, with an IC50 value of 9.60 μM (Fig. 5A  and B).Additionally, V005-2405 does not induce a DMR signal, indicating that V005-2405 possesses GLP1R receptor antagonistic activity.C700-1841 and G764-0921 exhibit desensitization effects on GLP-1R when stimulated by the agonist taspoglutide (partial desensitization, approximately 40% inhibition) (Fig. 5C and D).This effect is dose-dependent, and C700-1841 and G764-0921 do not induce a DMR signal, indicating that they possess partial GLP-1R antagonistic activity.S954-5266 only exhibit desensitization effects on GLP-1R when exposed to high concentrations at 200 μM of the agonist taspoglutide.At other concentrations, it does not inhibit taspoglutide activity and induce a DMR signal.This suggests that S954-5266 has GLP1R receptor antagonistic activity at 200 μM (Fig. 5E).V009-0856 and V027-3795 exhibit desensitization effects on GLP-1R when exposed to high concentration with 200 μM of the agonist taspoglutide.However, they either do not induce a DMR signal or exhibit a weak signal in GLP-1R-HEK293 cells, indicating that these two compounds have GLP-1R receptor antagonistic activity at 200 μM (Fig. 5F and G).The remaining compounds either do not have GLP-1R activity or exhibit weak GLP-1R activity (Fig. 5H-P).These findings indicate that 6 out of the 15 candidates exhibit noteworthy selective antagonistic activity against GLP-1R.

Detailed analysis of the GPR35 with those identified active compounds
In Fig. 6, we present the docking interactions of GPR35 with three active molecules and a known active control molecule in both  3D and two-dimensional (2D) representations.The interaction between GPR35 and K297-0698 is shown in Fig. 6A.The primary interactions between the control compounds and proteins are characterized by hydrophobic, electrostatic and polar forces.TYR259, LEU97 and PRO176 predominantly engage in hydrophobic interactions with the cyclohexane moiety of compound K297-0698.The positive charged ARG151 establishes a robust π -cation interaction with the pyrimidine ring of compound K297-0698, while the charged interaction between ARG240 and the chlorine atom of the compound facilitates the formation of hydrogen bonds between the oxygen atom in the structure of PHE163 and the oxygen atom of the amide bond in compound K297-0698.The interactions between GPR35 and F545-1970, S948-0241 and zaprinast are depicted in Fig. 6B-D, respectively.Further details regarding their specific interactions can be found in Supplementary Material Section 3.

Detailed analysis of the GLP-R1 with those identified active compounds
In Fig. 7, we present the docking interactions of GLP-1R pocket with three active molecules in detailed 3D representations.The overall binding view and 2D interaction plot can be found in Figure S6.Fig. 7A shows the interactions between GLP-1R and C700-1841, highlighting that hydrophobic interactions are dominant.Hydrophobic residues, including TRP4, LEU188, TRP174 and LEU172 are closely contacted with the C700-1841.The interactions between GLP-1R and G764-0921 are shown in Fig. 7B.The compound and protein predominantly engage in hydrophobic, electrostatic and polar interactions.TRP174, PHE352 and PHE201 are primarily involved in hydrophobic interactions.Notably, PHE201 establishes π -π interactions with the benzene-like ring of the compound.Polar and charge residues CYS197 and LYS168 also have electrostatic and polar interaction with the compounds.The interactions between GLP_1R and S954-5266 are shown in Fig. 7C; residues GLN192, LEU172, TRP 174, TRP 4, SER2 and GLU109 have close contact.Consequently, the interaction between the compound and the protein is structurally characterized by hydration.
The V005-2405 contacted with GLP-1R significantly, due to its larger size, shown in Fig. 7D, has larger interactions with many hydrophobic residues such as LEU, TRP and PHE.Some polar residues, GLN192, CYS267 and THR269, also contributed to the binding interactions.The charge residue LYS168 may also interact with the carbon atom of trif luoromethyl of the compound.V009-0856 has formed hydrophobic and polar interactions with GLP_1R, shown in Fig. 7E.Hydrophobic residues, such as LEU3, TRP4 and PHE352, formed interaction with the compound.The polar residue SER2 is in close contact with the compound and may form hydrogen bonding with the f luorine atom on the benzene ring of the compound.
V027-3795 has formed hydrophobic and polar interactions with GLP-1R, shown in Fig. 7F.The residues containing a benzene ring, such as TRP4, TRP174, PHE201 and PHE352, and polar residues CYS267 and GLN192 have closely contacted with the compound.Notably, TRP174 has formed a π -π interaction with the cyclohexane benzene-like ring of the compound.
In summary, these six active compounds have similar binding style with the binding site, and the interactions are mostly dominated by the hydrophobic interaction, especially the πrelated interactions.The most frequently involved interacting residues, TRP174 and TRP4, reveal their importance upon interactions.

Comparing the performance of DeepGPCR_BC with other deep learning-based protein-ligand binary prediction methods
To compare the DeepGPCR_BC with other generalized proteinligand binary prediction methods, we compared the performance of DFCNN [12,19,36] and DeepBindGCN_BC [26] on the 16 extra test cases.Although previously, we have reported that DFCNN has good performance in drug screening tasks for many solvable proteins, we found that it has notably poor performance in most of those 16 GPCR cases, listed in Table S11.In analogue to DeepBindGCN_BC, which shares a similar model architecture but is trained on the PDBbind database primarily consisting of soluble proteins, also showed poor performance in these 16 GPCR cases, listed in Table S12.The underwhelming performance of DFCNN and DeepBindGCN_BC strongly underscores the critical importance of training models like DeepGPCR_BC on GPCR-ligand pairs for creating effective GPCR-ligand prediction models.
In parallel, we compared DeepGPCR_RG with the previously DeepBindGCN_RG, a graph convolutional network model trained on the PDBbind database.In order to compare it with the  specialized DeepGCN_RG model developed for GPCR, we tested its performance on additional data (Table S13).As anticipated, DeepBindGCN_RG, having been trained on a large number of water-soluble proteins and few GPCRs, performed significantly inferior performance compared to DeepGPCR_RG on GPCR targets (Table S13) within the extra test set.

Problem encountered when using DeepGPCR_BC and Schrödinger
Using DeepGPCR_BC and Schrödinger for small molecule screening offers certain advantages, a major limitation arises due to the extensive list of potential candidates.When thousands of predicted values are close to 1, it becomes difficult to select candidate molecules within a smaller range.Although clustering might be attempted to address this issue, the class centers may not fully capture the diverse attributes of other molecules within the same class, which could lead to the loss of effective molecules.To show this issue, we performed a small molecule screening of three proteins (O14626, O95800 and Q9HC97) using only DeepGPCR_BC and Schrödinger.We found that relying solely on DeepGPCR_BC and Schrödinger for small molecule screening results in a large data set, making it difficult to identify candidate molecules within a narrow range.Therefore, in practical applications, a more refined screening method should be adopted and combined with other tools for comprehensive analysis to ensure optimal screening performance.Detailed screening processes and results are provided in Supplementary Material Section 4, Table S14 and Figure S7.

Importance of combining DeepGPCR_BC and DeepGPCR_RG
As described in the Results and Discussion section, in order to further investigate the reliability of DeepGPCR_RG and Schrödinger's software, we conducted experimental validation on a separate list of GPR35 candidates using DeepGPCR_RG and Schrödinger (Deep-GPCR_RG > 10, Schrödinger score ≥ −6.35 kcal/mol) (Table S9).Remarkably, none of the six compounds screened illustrated any activity, depicted in Fig. 5.This outcome strongly underscores the significance of integrating both DeepGPCR_BC and DeepGPCR_RG for effective screening of GPR35.Therefore, our findings reinforce the necessity of a comprehensive approach that synergizes multiple tools and methodologies in the process of compound screening, significantly enhancing the likelihood of identifying active compounds.

Evaluating the reliability of Alphafold2 predicted GPCR structure
Since the GPCRs of the 16 extra test cases were all modeled by Alphafold2, it is necessary to check the accuracy of the predictions of Alphafold2 relative to the correct GPCR structure, especially the reliability of the pocket region.Here, we selected 62 cases with known PDB structures and share a sequence identity larger than 0.83 with our target sequence.We used the TMalign tool [37] to evaluate the aligned RMSD between the Alphafold2 predicted structures and the PDB database counterparts (Table S15).The results indicated that most GPCRs align closely with experimental structures, exhibiting a low average RMSD of 2.19 Å, which suggests a high degree of accuracy in Alphafold2's predictions.

The potential role of MD simulations in analyzing interactions
MD simulation for the GPCR with membrane lipids and ligands is relatively complicated and not suitable for large-scale screening.However, they can be particularly valuable in late-stage screening or analysis.To explore the accurate interaction details of GPR35 with the three known active compounds, (F545-1970, K297-0698 and S948-0241), we carried MD simulations and metadynamics.The comprehensive simulation procedures are outlined in Supplementary Material Section 5. We observed a consistently stable binding during the MD simulations.The analysis result of MD and metadynamics simulation for GPR35 binding with the three compounds is shown in Figure S8.The calculated RMSD values along the 40 ns simulation time are relatively small, stable at around 0.2 nm, for K297-0698 and S948-0241, shown in Figure S8B, indicating quite stable binding.There are slight larger RMSD f luctuations for F545-1970, around 0.2 ∼ 0.6 nm, but compensated with a higher number of hydrogen bonds formed during the 40 ns simulation time, shown in Figure S8B.The calculated binding free energy landscape by metadynamics further supports the propensity of these three compounds to bind (Figure S8C).However, the simulation duration was relative short, and techniques like funnel metadynamics may be more suitable for free energy calculations.It is worth noting that an in-depth exploration of MD-related methods was beyond the scope of this study.

Future improvements
The DeepGPCR_BC and DeepGPCR_RG models, which predict GPCR-ligand interactions, can be enhanced in several ways.Firstly, incorporating more training data is crucial, as many protein-ligand pairs were excluded from the training set due to unknown ligand-binding pockets.Developing accurate methods to identify GPCR pockets would enable the inclusion of these data, expanding the training set and enhancing screening accuracy, especially for GPCR targets lacking known pockets.Secondly, diversifying input representations, such as incorporating pretrained information like mol2vec or Unimol for ligand representation, or utilizing molecule fragments as nodes with their corresponding molecular vectors as features, could improve model performance.Thirdly, utilizing more sophisticated model architectures, like adding attention layers, as explored by others in ligand property prediction, could further enhance performance.
Given the challenge of predicting long loop regions in GPCRs, we strongly advise users to truncate these loops during docking and MD simulations to reduce potential inaccuracies.Although our model primarily focuses on the pocket region, excluding loop areas, we recommend excising any long loop segments that may extend into the pocket region due to inaccuracies before pocket extraction.Looking ahead, as accurate prediction of GPCRligand complex structures becomes feasible, incorporating interface information could lead to improved accuracy in predicting GPCR-ligand interactions.
In the screening process, the Schrödinger scores were calculated in GPR35 screening.Results in Table 3 strongly indicating its performance of the GPCR-ligand prediction task is highly inaccurate.Therefore, it should be noted that the Schrödinger score has less reference value on the prediction, but valuable in generating GPRC-ligand complexes for MD simulations, and a more accurate affinity prediction method should be developed.Additionally, a more comprehensive pipeline could be constructed to categorize high-potential candidates step-by-step, for example, integrating methods such as MD simulations and metadynamics to identify reliable candidates.Finally, experimental validations will be essential in assessing the usefulness of such predictions.
Deep-learning-based methods usually demonstrating the black-box property, in fact, considering the interpretability of the DeepGPCR_BC and DeepGPCR_RG models in specific problemsolving is extremely important and guide to the potential drug candidate optimization.Here, we recommend two possible solutions to relieve the black-box property of DeepGPCR_BC and DeepGPCR_RG.First, the GCN model has an advantage in interpreting each node's contributions to the prediction.In other words, the atoms in ligands or pocket residues contributions can be revealed; then, the hot spot atom or residue can be detected.Users are able to achieve and visualize the atom contributions by RdKit tools.Researchers have provided several methods [39] with relevant scripts (https://github.com/biomed-AI/MolRep).The second way is to combine the DeepGPCR_BC and DeepGPCR_RG with the docking tools or MD simulation tools to further explore the atomic binding details and compare the binding pose with the known GPCR-drug complexes.Employing MD simulations to ensure the binding stability and applying funnel metadynamics [40] methods to calculate the binding free energy of interested protein-ligand pairs obtained by DeepGPCR_BC and DeepGPCR_RG are recommended.However, such a method is relatively time and resource-consuming, and GPCR simulation needs to incorporate a large number of lipids and solvents; the simulations are especially complicated.
The performance of test2 and additional test sets showed a decrease, highlighting the difficulty of generalizing to entirely new protein-ligand interactions.It's expected that deep learning models face challenges with fully independent data.The decline in performance on these independent datasets emphasizes the need for continued efforts to enhance model generalization in our future work.Strategies to achieve this include expanding the training dataset to encompass a wider range of proteinligand interactions and optimizing model parameters to prevent overfitting.We are exploring various avenues for future improvements, such as integrating attention mechanisms and incorporating diverse input features like unimodal properties to enrich our model's understanding and predictive capabilities.We will also investigate the feasibility and performance implications of using AlphaFold-predicted models for the binary classification task (DeepGPCR_BC) in our future research endeavors.Additionally, we aim to develop web servers for DeepGPCR_BC and DeepGPCR_RG to facilitate discrimination and prediction, enhancing accessibility for users.In future studies, we plan to use different classes of GPCRs as separate test sets to validate our model, which could provide insights into class-specific performance.Furthermore, identifying potential molecules specific to each class of GPCRs could be valuable for drug discovery efforts.
We also assessed models using cutoff values of 5 and 4000 nM to define the positive and negative datasets (affinity ≤ 5 as positive and affinity ≥ 3000 as negative).This evaluation will help determine the optimal cutoff for the current method.We utilized two models for this assessment: (1) the first model, which continues to utilize known PDB structures, ensuring reliability based on validated databases and (2) the second model, which utilizes predicted protein structures and predicted binding sites, expanding the diversity of our data sources and potentially enhancing our understanding of model robustness with unconventional data sources.The performance on the test set of both models showed convergence around 300 epochs, as depicted in Figure S9.The DeepGPCR_BC_5_3000_V1's performance on the independent test set of 16 protein cases slightly worse than that of DeepGPCR_BC, as shown in Supplementary Table S16.The performance of Deep-GPCR_BC_5_3000_V2 on an independent test set of 16 protein cases significantly surpasses that of DeepGPCR_BC, as demonstrated in Supplemental Table S17.This improvement strongly suggests that incorporating modeled structures, which greatly expand the dataset size, can significantly enhance the model's accuracy.

Conclusion
The GPCR is a critical drug target; the traditional protein-ligand interaction prediction software, however, is unsuitable for GPCR drug virtual screening due to its unique properties and environment.To narrow this gap, a specific GPCR-ligand interaction model has been developed in this work by learning the underlying interacting rules between the GPCR-ligand system, with inputting the GPCR pocket and ligand separately.The model employs graph representations denoting the protein pocket and ligand, with residues or atoms as nodes and contacting residues or bonds as edges.Each pocket and ligand input are submitted into the graphic neural network, and the final output is merged.The model has been trained on a huge GPCR dataset with no available GPCRligand complex structures.This advantage is significant over traditional docking or generalized models trained on mostly non-GPCR-ligand datasets.This model fully utilizes the spatial and physical-chemical features of GPCR pockets and known ligands, making it more suitable for screening active compounds for GPCR.The DeepGPCR_BC model has achieved an average of 0.72, 0.73 and 0.73 for AUC, TPR (recall) and precision, respectively.In the 16 fully independent test sets, our model exhibits a significantly superior performance compared to the scores from Schrödinger and Autodock Vina, with a cutoff of −6 kcal/mol.Notably, 9 out of the 16 cases achieved an AUC greater than 0.7, and 4 cases achieved an AUC above 0.8.It should be noted that a few cases have not performed well, partly due to the challenges of pocket identification.Also, the performance of our model can be further enhanced when more GPCRs with identified ligand pockets are available.The DeepGPCR_RG model has achieved an average 0.39 Pearson correlation, much better performance than Schrödinger and Autodock Vina, also superior than our previous DeepBindGCN_RG, in the 16-protein-related extra GPCR test dataset.Additionally, the development of an affinity prediction model would help narrow the candidate list and aid in obtaining more reliable and high-affinity candidates.Most importantly, we have built a screening pipeline using DeepGPCR models as core components and successfully applied to two important GPCR therapeutic target GPR35 and GLP_R1, resulting in three active compounds out of 8 candidates for GPR35 by strategy A, five active compounds out of 12 candidates for GLP_R1 by strategy A and 1 out of 3 candidates by Strategy B. Overall, the DeepGPCR_BC and DeepGPCR_RG models provide promising advancement in the field of GPCR drug discovery, facilitating the identification of novel GPCR drugs and enhancing the precision of GPCR drug virtual screening.

Figure 1 .
Figure 1.The workf low of DeepGPCR_BC and DeepGPCR_RG models construction and evaluation.(A) illustrates the input graphic representation, model architecture, output and performance evaluation metrics.(B) depicts the process of obtaining GPCR-compound pairs and a 3D binding pocket of GPCR.(C) The detailed layer information of the GCN model.(D) brief ly introduces the training label and the usage of the predicted label during application.

Figure 2 .
Figure 2. The modeled protein for the extra test set and the modeled ligands determines the pockets.Out of the 16 test cases, 6 achieved an AUC larger than 0.7, and 10 achieved an AUC larger than 0.6.It should be noted that none of the 16 proteins were included in the training set.

Figure 3 .
Figure 3. GPR35 screening pipeline and identification of active compounds.(A) Schematic representation of the stepwise screening process leading to the discovery of three active molecules.(B) Conformation of last frame from the MD simulation and 2D protein-ligand interaction diagrams for the identified active compounds.

Figure 6 .
Figure 6.The docking interactions of GPR35 with three active molecules and a known active control molecule in both 3D and 2D representations.(A) Residue-specific interactions between pocket residues and compound K297-0698.Interaction diagram of pocket residues with compound K297-0698.(B) Residue-specific interactions between pocket residues and compound F545-1970.Interaction diagram of pocket residues with compound F545-1970.(C) Residue-specific interactions between pocket residues and compound S948-0241.Interaction diagram of pocket residues with compound S948-0241.(D) Residue-specific interactions between pocket residues and known active compound zaprinast.

Figure 7 .
Figure 7.The predicted interactions of GLP_R1 with six active molecules from docking.(A) Residue-specific interactions between pocket residues and compound C700-1841.(B) Residue-specific interactions between pocket residues and compound G764-0921.(C) Residue-specific interactions between pocket residues and compound S954-5266.(D) Residue-specific interactions between pocket residues and known active compound V005-2405.(E) Residue-specific interactions between pocket residues and known active compound V009-0856.(F) Residue-specific interactions between pocket residues and known active compound V027-3795.

Table 1 .
The number of samples in training and several test sets of DeepGPCR_BC/RG performance metrics for the binary classification model are AUC (area under the ROC curve, where ROC stands for Receiver Operating Characteristic), TPR (true positive rate), Precision, Accuracy, MCC (Matthews correlation coefficient) and F1 score.The performance metrics for the regression model are RMSE (root mean squared error), MSE (mean squared error), Pearson (Pearson correlation coefficient), Spearman (Spearman correlation coefficient) and C-index (concordance index, CI).Detailed descriptions of this metrics are provided in Supplementary Materials Section 2.

Table 2 .
The performance of the DeepGPCR_RG model across different training epochs is detailed in TableS6.The training was performed over 2000 iterations, with performance metrics including RMSE, MSE, Pearson's correlation coefficient and Spearman's rank correlation coefficient recorded at intervals of every 200 epochs.DeepGPCR_BC performance on an extra dataset with modeled GPCR protein and predicted pocket

Table 3 .
The Schrödinger docking performance on an extra dataset with modeled GPCR protein and predicted pocket.We used −6 Kcal/mol as the cutoff, those scores > −6 Kcal/mol were assigned a value of 0 (indicating non-bind), and those scores ≤ −6 Kcal/mol were assigned a value of 1 (indicating able to bind)

Table 4 .
The DeepGPCR_RG performance on regression model's extra test datasets