Prediction of bitterness based on modular designed graph neural network

Abstract
Motivation: Bitterness plays a pivotal role in our ability to identify and avoid harmful substances in food. As one of the five basic tastes, it constitutes a critical component of our sensory experience. However, the reliance on human tasting for discerning flavors is costly, making in silico prediction of bitterness a more practical alternative.
Results: In this study, we introduce Graph Neural Networks (GNNs) for bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), that surpasses conventional GNNs in tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC value of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the acclaimed RDKFP-MLP predictor, which achieved AUC values of 0.86 and 0.85. We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The BitterGNNs predictor, built on GNNs, offers accurate bitterness predictions, enhancing the efficacy of bitterness prediction, aiding the development of advanced food-testing methodology, and deepening our understanding of the origins of bitterness.
Availability and implementation: TastePD is available at https://www.tastepd.com; all code is available at https://github.com/heyigacu/BitterGNN.


Introduction
Humans can perceive five major distinct tastes: sweet, bitter, sour, salty, and umami (Witt 2019). Bitterness is often associated with recognizing and avoiding the consumption of toxic foods (Wooding et al. 2021). However, not all bitter tastes are harmful; some bitter compounds promote health, as seen in vegetables (Drewnowski and Gomez-Carneros 2000) and clinical drugs (Mennella et al. 2013), promoting appetite, increasing food flavor, and preventing overconsumption. Therefore, determining whether a compound is bitter is crucial in the screening process for sweeteners and bitterants. Since humans cannot taste all studied compounds, predicting bitterness using computational methods is particularly important.
In silico approaches for predicting bitterants are primarily divided into structure-based and ligand-based methods. However, the diversity of physicochemical properties and structural characteristics of the molecules studied, along with the complexity and ambiguity of receptor structures, make predicting bitterness challenging. One solution to this problem involves traditional machine learning (ML) techniques, including AdaBoost (AB) (Freund and Schapire 1997), support vector machines (SVM), and random forests (RF). BitterSweetForest, a random forest model based on molecular fingerprints, achieved 95% accuracy and 0.98 AUC in a cross-validation study (Banerjee and Preissner 2018). BitterX, which utilizes the SVM method, demonstrates good performance in predicting bitterness from chemical descriptors (Huang et al. 2016), but the dataset for BitterX training was small and its website is not suitable for large-scale prediction. Dagan-Wiener et al. combined physical and chemical descriptors with an AdaBoost-based ML classifier for bitterness prediction (Dagan-Wiener et al. 2017), but many of the descriptors do not provide useful information for interpreting structural features. Another solution to this problem is deep learning (DL) approaches. Bo et al. conducted bitterness prediction using a multi-layer perceptron (MLP) based on molecular descriptors and molecular fingerprints, respectively (Bo et al. 2022). Additionally, a method was proposed to convert molecules into images and classify them as bitterants or not using a convolutional neural network (CNN). The fingerprint-based MLP performed best, but the authors did not provide their code. Bitter peptide prediction using natural language processing (NLP) is sequence-based and not suitable for structure-based prediction of small molecules (Charoenkwan et al. 2021).
Moreover, most deep learning flavor prediction work focuses on sweetener prediction, making it necessary to develop deep learning models for bitterant prediction. Graph neural networks are gaining popularity for molecular property prediction and classification. GCN (Kipf and Welling 2017), GAT (Veličković 2023), and GATv2 (Brody et al. 2022) were initially applied to node classification of graphs, without considering edge information. GCN assumes that all nodes contribute equally to the representation of a node, which may not always be the case. These methods can be simply transformed into whole-graph classification, for example using DGL-LifeSci (https://lifesci.dgl.ai/index.html). Molecular classification based on graph convolutional networks is a whole-graph classification problem, with nodes as atoms and edges as chemical bonds; graphs and molecules thus have a natural structural correspondence. NFP first applied graph convolutional networks to molecular fingerprints; it did not require fixed-size molecular input and simply aggregated the information of edges and nodes (Duvenaud et al. 2015). Weave employed a hybrid learning approach for edges and nodes, guaranteeing isomorphism invariance and proposing a molecular layer representation with less information loss in the readout function (Kearnes et al. 2016). Later, neural fingerprints made significant progress in predicting quantum chemical properties of molecules, for example SchNet (Schütt et al. 2017), MPNN (Gilmer et al. 2017), and MGCN (Lu et al. 2019). To improve graph neural networks, GIN (Xu et al. 2019, Hu et al. 2020a) introduced a new embedding-learning approach, OGB (Hu et al. 2020b) virtualized nodes, and PAGTN (Chen et al. 2019) proposed new attention mechanisms. It was later suggested that AttentiveFP (AFP) could capture atom information more effectively than Weave and NFP (Xiong et al. 2020). GraphSAGE (Hamilton, Ying, and Leskovec 2017) can generate node embeddings for previously unseen data, making it capable of handling dynamic graphs. It also uses a sampling strategy to control the number of neighbors, which helps it scale to large graphs; however, the sampling strategy can cause information loss and may not work well on graphs where a node's full neighborhood is essential for its representation. There are also proposals for 3D fingerprint learning (Wang et al. 2020), but complex inputs may increase computational costs. It is therefore worth considering whether combining multiple methods could yield a better graph neural network.
In this study, we propose a new graph neural network generation strategy and identify an excellent GNN model, HGNN. Using HGNN and other well-performing GNNs, we built outstanding bitterness predictors and systematically analyzed the properties of bitterants. This work provides a methodological reference for graph neural network design and for deep learning-based molecular taste recognition.

Flow chart
Our workflow is shown in Fig. 1. First, we design a GNN module with edge attention and merge it into the basic GCN architecture (Fig. 1C) together with three other traditional GNN modules, GAT, Weave, and MPNN (Fig. 1B), to obtain a GNN containing four basic GNN modules (Fig. 1D). Each module has a state of presence or absence and a fixed position, thus generating 16 (2 × 2 × 2 × 2) sub-models. The performance of the 16 sub-models and five other popular GNNs is compared on public datasets (Fig. 1E), and the best four GNNs are selected as the deep learning models for bitterness prediction. The trained bitterness predictors, called BitterGNNs, are then compared against other existing bitterness predictors (Fig. 1F).
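The combinatorial generation of the 16 sub-models can be sketched as follows; the module names here are illustrative placeholders, not the identifiers used in the BitterGNN code:

```python
from itertools import product

# Hypothetical names for the four GNN modules (a, b, c, d in Fig. 1B/D).
MODULES = ["weave", "node_attention", "edge_attention", "mpnn"]

def enumerate_submodels():
    """Enumerate all 2^4 = 16 presence/absence combinations of the four
    modules; each combination defines one sub-model of the architecture."""
    submodels = []
    for mask in product([True, False], repeat=len(MODULES)):
        active = [m for m, present in zip(MODULES, mask) if present]
        submodels.append(active)
    return submodels

submodels = enumerate_submodels()
print(len(submodels))  # 16, from all four modules present down to none
```

The first combination (all four modules present) corresponds to Model 1 (HGNN) in the results, and the empty combination degenerates to the bare GCN framework.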

Modular design of GNN
The structure of the modular design is shown in Fig. 1B-D. A learnable attention vector a_e^T ∈ R^(1×d') is used, where ∥ denotes vector concatenation (implemented here as summation). The attention scores are normalized across all neighboring edges using a softmax function; edge attention then computes a weighted average of the transformed features of the neighboring edges (followed by a non-linearity σ) as the new representation of node i, using the normalized attention coefficients. Canonical atom and bond features (Li et al. 2021) are used for node and edge embedding in all GNNs, including one-hot encodings of 74 atom features and 12 bond features (Supplementary Table S1).
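The equations for edge attention were lost in extraction; the following is a hedged reconstruction following the GAT-style formulation the text describes (score, softmax normalization, weighted aggregation). The symbols s_ij, α_ij, W, and the LeakyReLU scoring non-linearity are our notation and should be checked against the original article:

```latex
% Unnormalized attention score for edge j incident to node i,
% with learnable attention vector a_e and weight matrix W
s_{ij} = \mathrm{LeakyReLU}\!\left( \mathbf{a}_e^{T} \, \mathbf{W} \mathbf{e}_{ij} \right)

% Normalization across all neighbouring edges of node i (softmax)
\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(s_{ik})}

% New representation of node i: weighted average of transformed
% neighbouring-edge features, followed by the non-linearity \sigma
\mathbf{h}_i' = \sigma\!\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} \, \mathbf{W} \mathbf{e}_{ij} \right)
```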
Each of the four modules can be present or absent: when a module is present, the passed message is learned by it; when absent, the message is passed through directly. Therefore, 16 models can be generated according to the presence status of the four modules; the sub-module status and meaning of these models can be seen in Table 1. The code is available at https://github.com/heyigacu/BitterGNN.
The 8 public datasets (Table 2) were used to evaluate our 16 GNN models and five other recent GNNs (WLN (Morris et al. 2018), PAGTN, AttentiveFP, Graph Transformer, and GraphSAGE). We evaluated the performance of all 21 models using 5-fold cross-validation under the same parameters (Supplementary Table S2), repeated 5 times. Assessment metrics for regression and classification are provided in Supplementary Part S1. We selected three regression metrics (R2, MAE, RMSE) and five classification metrics (ACC, AP, F1, MCC, AUC) for evaluation. Since different indicators have their own evaluation characteristics, we adopted a comprehensive evaluation method: ranking the models on each measure (in reverse order, so the best model receives the highest rank) and then taking the average. The higher the score, the better the performance of the model.
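The comprehensive scoring described above can be sketched as a minimal numpy routine, under the assumption that "ranking in reverse order" means the best model on each metric receives the highest rank; the function and variable names are ours, not from the BitterGNN code:

```python
import numpy as np

def comprehensive_score(metric_table, higher_is_better):
    """Average-rank comprehensive score: for each metric, models are ranked
    worst-to-best (best model gets the highest rank number), then ranks are
    averaged across metrics. A higher score means a better overall model."""
    metric_table = np.asarray(metric_table, dtype=float)
    n_models, n_metrics = metric_table.shape
    ranks = np.zeros_like(metric_table)
    for j in range(n_metrics):
        col = metric_table[:, j]
        # Sort worst-first so the position in the order equals the rank
        order = np.argsort(col) if higher_is_better[j] else np.argsort(-col)
        for rank, idx in enumerate(order, start=1):
            ranks[idx, j] = rank
    return ranks.mean(axis=1)

# Toy example: 3 models scored on AUC (higher better) and RMSE (lower better)
scores = comprehensive_score([[0.90, 0.10],
                              [0.80, 0.20],
                              [0.70, 0.30]],
                             higher_is_better=[True, False])
print(scores)  # model 0 ranks best on both metrics -> highest score
```

Averaging ranks rather than raw values keeps metrics on different scales (e.g. MCC vs. RMSE) from dominating the comparison.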

Comparison of bitterness predictors
To test the performance of the GNN-based bitterness predictors, we compared them with other mainstream predictors for bitterness (Fig. 1F). We rewrote the code for the CNN and MLP to predict bitter/non-bitter and bitter/sweet in Python, following the original article as closely as possible (Bo et al. 2022). Similarly, we re-implemented BitterPredict, which is based on AdaBoost and requires molecular descriptors generated by Schrödinger's QikProp module as input. BitterX and BitterSweetForest are web-based tools that allow bitterness prediction at http://mdl.shsmu.edu.cn/BitterX/ and https://insilico-cyp.charite.de/VirtualTaste/, respectively.
To obtain a common molecular set as input for all comparison methods, we cleaned the original bitter dataset (Fig. 2).

Post analysis
Factor analysis was performed using the Python-based Factor_Analysis module. We calculated 195 molecular descriptors (see Supplementary Table S3) for all molecules in the bitter dataset. Bartlett's sphericity test and the Kaiser-Meyer-Olkin (KMO) test were then performed on the descriptor features of the bitter/sweet dataset and the bitter/non-bitter dataset, respectively. To ensure smooth matrix decomposition during the calculation, we removed some features, following the rule that no feature should have more than 900 zero values.
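For reference, both adequacy tests can be computed from the correlation matrix alone. The following is a minimal numpy/scipy sketch using the standard formulas, not the exact Factor_Analysis code used in the study:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 is that the correlation matrix is
    the identity (variables uncorrelated, so factoring is inappropriate).
    A tiny P-value therefore supports running factor analysis."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2.0
    return chi2, stats.chi2.sf(chi2, dof)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy, computed
    from simple vs. partial correlations (values near 1 are adequate)."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
    partial = -inv_R / d              # partial correlation matrix
    R = R.copy()
    np.fill_diagonal(R, 0.0)          # exclude self-correlations
    np.fill_diagonal(partial, 0.0)
    r2, p2 = (R ** 2).sum(), (partial ** 2).sum()
    return r2 / (r2 + p2)
```

In practice the factor_analyzer package exposes equivalent helpers, but the explicit formulas make clear why near-collinear descriptor columns (the motivation for the zero-value filter above) destabilize the computation: both tests require a well-conditioned, invertible correlation matrix.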
Next, the number of common factors was determined for each of the two datasets, the analysis model was established using maximum-variance (varimax) factor rotation, and the variance contribution rate and component matrix were calculated. Finally, a t-test was performed on the features whose component matrix value was greater than 0.8.
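The screening-then-testing step can be sketched as follows. This is a hypothetical helper, assuming `loadings` is the descriptors × factors component matrix and that Welch's (unequal-variance) t-test is acceptable; the original work does not specify the t-test variant:

```python
import numpy as np
from scipy import stats

def screen_and_test(loadings, X_class_a, X_class_b, names, threshold=0.8):
    """Keep descriptors whose maximum absolute loading on any common factor
    exceeds `threshold`, then t-test each kept descriptor between the two
    classes (e.g. bitter vs. sweet). Returns {descriptor name: p-value}."""
    keep = np.where(np.abs(loadings).max(axis=1) > threshold)[0]
    results = {}
    for i in keep:
        t, p = stats.ttest_ind(X_class_a[:, i], X_class_b[:, i],
                               equal_var=False)  # Welch's t-test
        results[names[i]] = p
    return results
```

Descriptors with p below the chosen significance level (e.g. 0.05) are the ones reported as having significantly different distributions between classes.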
OpenBabel (O'Boyle et al. 2011) was first used to convert all the bitterants from SMILES to PDBQT format, and then AutoDock Vina (Trott and Olson 2010) was used to dock them to the bitter receptor TAS2R46. Next, we carried out correlation analysis between the 195 descriptors and the affinity to the bitter receptor; a Pearson correlation coefficient greater than 0.63 was considered a strong correlation.
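The descriptor-affinity correlation screen might look like this; the descriptor names in the example are illustrative placeholders, and only the 0.63 threshold comes from the text:

```python
import numpy as np
from scipy import stats

def strongly_correlated(descriptors, affinity, names, threshold=0.63):
    """Pearson correlation of each descriptor column against the docking
    affinity; |r| above `threshold` is treated as a strong correlation,
    following the cut-off used in the text."""
    strong = {}
    for j, name in enumerate(names):
        r, _ = stats.pearsonr(descriptors[:, j], affinity)
        if abs(r) > threshold:
            strong[name] = r
    return strong
```

A usage sketch: with a descriptor matrix whose first column tracks affinity and a second column of noise, only the first should pass the threshold.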

Performance of HGNN
As can be seen from Fig. 3, Model 1 has the best performance among the 16 sub-models for classification problems (here we use the comprehensive score, i.e., the average rank of the regression or classification evaluation indicators over several datasets; the higher the rank, the higher the score), though it is not as good as WLN, PAGTN, and GraphSAGE. Model 1 integrates Weave, node attention, edge attention, and MPNN, giving it stronger learning ability; we call it the Hybrid Graph Neural Network (HGNN). According to our design, every four of the 16 models form a cycle, and the later the cycle, the simpler the model and the weaker its learning ability. Note that the C4 model here is just a simple implementation of Graph Transformer, without positional encoding or pre-training. We now review how HGNN transmits messages from one node to another. First, the initial node information passes through a linear layer (n2n) and then through the node attention layer, while the adjacent edge information is aggregated with a linear layer (e2n) and the edge attention layer; the learned node and adjacent edge information are then combined through a linear layer (un). Finally, this message is selectively forgotten or remembered through a GRU layer together with the initial node information, and passed to the target node. At the same time, the edge information is also updated: the initial node information after a linear layer (n2e) is added to the edge information after a linear layer (e2e), and the updated edge is then obtained through a linear layer (ue). As for why HGNN achieves better performance than the four basic models, we believe that the combination of models has greater complexity, giving it stronger robustness in the face of more complex data, which was confirmed in the ablation experiment formed by these 16 combined models. Finally, we selected the four GNNs with the best classification performance as the models for bitterness prediction.
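The node-update path described above can be sketched in numpy as follows. This is a simplified single-node illustration with random placeholder weights and a toy attention pooling, not the actual HGNN implementation; only the layer names n2n, e2n, and un and the GRU gating follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden width (placeholder)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def W():
    return rng.normal(scale=0.1, size=(d, d))

# Random placeholder weights for the named linear layers
W_n2n, W_e2n = W(), W()
W_un = np.concatenate([W(), W()], axis=0)   # un combines [node msg, edge msg]
Wz, Uz, Wr, Ur, Wh, Uh = W(), W(), W(), W(), W(), W()

def attention_pool(feats):
    """Toy softmax attention over a set of feature vectors, standing in for
    the node/edge attention layers (real scoring is learned)."""
    scores = feats.sum(axis=1)
    a = np.exp(scores - scores.max())
    a = a / a.sum()
    return (a[:, None] * feats).sum(axis=0)

def hgnn_node_update(h_i, neighbor_nodes, incident_edges):
    """One HGNN message-passing step for node i, per the description:
    node path : n2n linear -> node attention over neighbours
    edge path : e2n linear -> edge attention over incident edges
    combine   : un linear over the two aggregated messages
    update    : GRU cell gating against the initial node state h_i."""
    node_msg = attention_pool(neighbor_nodes @ W_n2n)
    edge_msg = attention_pool(incident_edges @ W_e2n)
    m = np.concatenate([node_msg, edge_msg]) @ W_un
    # Standard GRU cell: selectively forget/remember the initial state
    z = sigmoid(m @ Wz + h_i @ Uz)
    r = sigmoid(m @ Wr + h_i @ Ur)
    h_tilde = np.tanh(m @ Wh + (r * h_i) @ Uh)
    return (1.0 - z) * h_i + z * h_tilde

h_i = rng.normal(size=d)
out = hgnn_node_update(h_i, rng.normal(size=(3, d)), rng.normal(size=(3, d)))
print(out.shape)  # (8,)
```

The GRU gate is what lets the update "selectively forget or remember" the initial node information, as the text puts it: z near 0 keeps the old state, z near 1 adopts the new message.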

Performance of bitter prediction based on GNN
The comparison results of mainstream predictors for bitter/non-bitter and bitter/sweet prediction are shown in Table 3. In bitter/non-bitter prediction, AttentiveFP, GraphSAGE, and WLN perform well (AUC > 0.85) on the external test dataset. BoMLP with RDKFP as input performs best (AUC = 0.86) among the comparison methods. In contrast, HGNN performed less well, with accuracy and AUC values of 0.81 and 0.82, respectively. The worst performer is BitterX, presumably because of the insufficient training set of the website version. Traditional machine learning methods such as BitterPredict and BoMLP with descriptors as input also perform poorly, potentially because the number of descriptors is too small to extract the critical properties of the molecule and to cope with noise. These results demonstrate that GNNs with atom and bond information as node and edge feature embeddings are sufficient to identify and classify the key information of the molecule.
However, in the bitter/sweet test, the four GNN predictors and BoMLP with RDKFP as input maintained similar performance, with AUC values of about 0.86. The worst performers were BoCNN and BoMLP with descriptors as input; presumably, the image size of the molecule set by BoCNN is too small. In addition, the performance of HGNN on the bitter/sweet dataset is higher than on the bitter/non-bitter dataset, which is caused by the more heterogeneous characteristics of the non-bitter data and indicates that the generalization ability of HGNN needs to be strengthened.

Factor analysis
An adequacy test of the bitter/sweet data was conducted, revealing a P-value of .00 for Bartlett's test and a KMO test value of 0.92. These results indicate that the data were suitable for factor analysis. The first four common factors accounted for 76.98% of the total variance. The original 68 descriptors were reduced to 18 after applying a filter on values greater than the threshold of 0.8 in the component matrix (refer to Fig. 4B). Figure 4C illustrates that the majority of the descriptor variables contribute to the first component, with D44 and D52 contributing to the second component, D9 to the third, and D49 and D53 to the fourth. A t-test on the 18 descriptors of the bitter/sweet dataset, as depicted in Fig. 4D, revealed significant distribution differences for 14 descriptors. These differences were primarily found in hydrogen bonding (NumHDonors and NHOHCount), molecular polarization (SMR_VSA and TPSA), hydrophobic and hydrophilic interactions (SlogP_VSA), and electric charge (PEOE_VSA, EState_VSA9, and MinAbsPartialCharge). Hence, molecules with smaller interacting surface areas, such as those dominated by charge and polarity interactions, may be more likely to elicit a sweet taste. Most notable is the indicator of lipophilicity, SLogP, which suggests that bitter molecules may have stronger hydrophobic interactions than sweet molecules (Di Pizio et al. 2019).
The bitter/non-bitter dataset yielded a P-value of 10 × 10⁻⁸ for Bartlett's test and a KMO test value of 0.92, indicating that this dataset was also suitable for factor analysis. As depicted in Fig. 5A, the selection of four common factors for subsequent factor analysis was considered appropriate, with these factors accounting for 76.55% of the total variance. A threshold of 0.8 was applied to screen the 68 descriptors in the component matrix (Fig. 5B), which resulted in 17 descriptors passing the threshold. The majority of these descriptors contribute to the first component, with the exceptions of D44 and D52 contributing to the second component, and D49 and D53 to the fourth. A t-test on these descriptors revealed that only 11 descriptors showed significant differences. This number was lower than that of the bitter/sweet dataset, which could potentially be attributed to the broader distribution of the non-bitter data. Figure 6A displays the molecular docking result of the well-known bitterant caffeine (Poole and Tordoff 2017) with the broad-spectrum bitter receptor TAS2R46 (Xu et al. 2022). The molecular docking results of bitter and non-bitter substances with TAS2R46, as determined by AutoDock Vina, revealed that TAS2R46's affinity for these substances does not follow a normal distribution (Fig. 6B). The P-value of 2.91e-44 from the Mann-Whitney test indicates that the affinity of bitter agents for TAS2R46 is significantly higher than that of non-bitter agents (P < .05), which is in accordance with our expectations. Further correlation analysis between bitterant descriptors and affinity (Fig. 6C) revealed that the count of heavy atoms, count of rings, BertzCT, Chi1, and molecular relative mass have a strong positive correlation with TAS2R46 affinity. Consequently, molecules with a larger mass, more complex shape, and more valence-electron information may exhibit higher affinity to TAS2R46. We hypothesize that molecules with a larger mass and greater complexity are potentially more toxic, and that the evolution of the bitter taste receptor in humans functions to deter the ingestion of such substances.
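Because the affinity distributions are non-normal, a rank-based Mann-Whitney U test is the appropriate comparison. The outline below reproduces that step with scipy; the affinity arrays here are synthetic placeholders, not the actual Vina scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical Vina scores in kcal/mol (more negative = stronger binding)
bitter_affinity = rng.normal(loc=-7.5, scale=1.0, size=200)
nonbitter_affinity = rng.normal(loc=-6.0, scale=1.0, size=200)

# Mann-Whitney U compares the two distributions without assuming
# normality; alternative="less" tests whether bitter scores are
# stochastically lower (i.e. stronger binding to TAS2R46).
u, p = stats.mannwhitneyu(bitter_affinity, nonbitter_affinity,
                          alternative="less")
print(p < 0.05)  # True for these synthetic samples
```

The one-sided alternative matches the hypothesis in the text (bitterants bind TAS2R46 more strongly); a two-sided test would simply double the P-value here.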

Discussion
Graph neural networks hold significant advantages in molecular property prediction and classification. To obtain interpretability for HGNN, we explored how the model learns molecular features. The probability that caffeine is predicted as bitter by HGNN is 96%, so we examined how HGNN learns bitterness characteristics from the molecular structure of caffeine. As shown in Fig. 7A, the initial node embedding features of the caffeine molecule can be divided into two groups based on correlation (one group consisting of atoms 0, 12, and 13, the other of atoms 4, 5, 6, and 9). At the first graph convolution layer, we only find that atoms 2, 7, and 10 show a decrease in correlation with other atoms; but at the second layer, all atoms except atoms 7 and 10 show a strong connection, and these two oxygen atoms are almost cut off from the rest. We therefore conclude that these two carbonyl oxygen atoms play an important role in the bitterness of caffeine. In addition, we explored the attention mechanism of HGNN. We output the last-layer attention coefficients (Fig. 7B) of HGNN's node attention with caffeine as input, and find that the attention coefficient matrix over neighbor atoms is not completely symmetric, indicating that the attention from neighbors (horizontal axis) and the attention to neighbors (vertical axis) are not consistent; the atom receiving the highest attention from its neighbors is atom 6 (N). In edge attention (Fig. 7C), more edges, such as N1-C2 and C2=N3, are recognized by the attention mechanism.
To explore the bitterness-related chemical groups that HGNN learned from bitter molecules beyond caffeine, we used a depth-first search (DFS) algorithm to traverse the molecular graph, searching for continuous paths of atom pairs with HGNN attention score > 0.5. We call these continuous attention substructures, and they may play an important role in HGNN's recognition of bitterness. A total of 8370 substructures were extracted from 1383 bitter molecules, and 88 key substructures were obtained after removing duplicates (Fig. 8); more than half of them were hydroxyl-bearing carbon atoms, indicating that hydroxyl groups received a large share of the attention placed on carbon atoms.
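The continuous-attention-substructure search can be sketched as a DFS that only walks bonds whose attention exceeds the threshold and records maximal paths. The graph representation below (adjacency dict plus per-bond attention scores) is our own simplification, not the BitterGNN code, which operates on RDKit molecular graphs:

```python
def continuous_attention_paths(adjacency, attention, threshold=0.5):
    """Depth-first search over a molecular graph, extending a path only
    along bonds whose attention score exceeds `threshold`. Returns the
    maximal simple paths found ('continuous attention substructures').
    adjacency: dict node -> set of neighbour nodes
    attention: dict frozenset({u, v}) -> attention score of bond (u, v)"""
    paths = []

    def dfs(node, visited, path):
        extended = False
        for nxt in adjacency[node]:
            bond = frozenset((node, nxt))
            if nxt not in visited and attention.get(bond, 0.0) > threshold:
                dfs(nxt, visited | {nxt}, path + [nxt])
                extended = True
        if not extended and len(path) > 1:  # maximal path of >= 2 atoms
            paths.append(tuple(path))

    for start in adjacency:
        dfs(start, {start}, [start])
    return paths
```

In a full pipeline, each returned atom-index path would then be converted to a SMILES fragment (e.g. with RDKit) and deduplicated to obtain the key substructures.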
Building on this novel approach to taste prediction, the tastes of natural product molecules from COCONUT (https://coconut.naturalproducts.net/) were successfully predicted. After removing 351 erroneous molecules, 103 762 of the remaining 406 919 molecules were predicted to be bitter and 303 157 to be non-bitter. To understand the scale of this achievement, consider that an experienced taster evaluating 100 substances per day would require nearly eleven years to complete this task, not accounting for the difficulty of acquiring the molecules and their potential toxicity. Hence, this model marks a significant advancement in the field, accelerating taste prediction and serving as a valuable tool for both taste research and flavor development. By using the simple, modularized model HGNN, which can compete with the latest GNN models, this research offers fresh insights into how straightforward algorithms and architectures can tackle complex tasks. The way the modules work together demonstrates the potential of modular designs to yield results greater than the sum of their parts, showing a cumulative effect where 1 + 1 > 2 and inviting further exploration of novel applications of established algorithms. It provides a novel way to design deep learning models.
For the first time, this modular design and the latest GNN models have been applied to the prediction of bitter and sweet tastes, achieving an accuracy of 0.86, almost on par with the best-performing model in this field, which has an accuracy of 0.88. This innovative approach advances the taste prediction domain and represents a significant contribution to our understanding of this intricate biological system.
The TastePD website, developed alongside the model, currently hosts a bitter-sweet taste predictor and a complementary database. Efforts are underway to expand this into a multi-flavor predictor, aiming to provide a comprehensive tool for taste prediction research; we welcome readers to follow our progress. This development will greatly enhance the accessibility and applicability of taste prediction models, deepening our comprehensive understanding of taste.
Regarding practical considerations such as individual variance in taste perception thresholds, the relationship between flavor intensity and concentration, and the dynamic nature of datasets in real-world settings, we are hopeful that with an increase in training data, the performance of the taste prediction model will continue to improve, further contributing to the exploration of the fascinating world of taste.

Conclusion
Through a novel modular design method, we obtained a new graph neural network, HGNN, which is superior to traditional GNNs such as Weave, GAT, and MPNN. Based on HGNN and three other well-performing GNNs, BitterGNNs were developed, which match or even exceed the performance of the best-known method, RDKFP-MLP. Through factor analysis and correlation analysis, we determined that molecules with greater interacting surface areas, mass, and shape complexity are associated with bitterness. In addition, we developed a server, TastePD, where the models and datasets are deployed.
Fig. 1B-D. The algorithms of GATv2 (Veličković 2023), MPNN (Schütt et al. 2017), and Weave (Duvenaud et al. 2015) are given in their corresponding articles; here we give the algorithm for edge attention:

Figure 1 .
Figure 1. Overview of the study. (A) The process of transmitting information in the molecular graph. (B) The four basic GNN modules, showing how the embedding feature of the source node (src) is transferred to the destination node (dst); linear layers are colored purple, e2n represents edge-to-node, and likewise n2n, n2e, and e2e represent node-to-node, node-to-edge, and edge-to-edge, respectively. (C) The basic GCN framework. (D) The four GNN modules added to the GCN framework; a, b, c, and d correspond to the four modules in Fig. 1B, and each module can be absent or present, yielding 16 combined models. (E) The eight public datasets. (F) Existing models for bitterness prediction.
with the filter rules shown in Fig. 2. The detailed data cleaning rules are as follows: 1) molecules are uniformly converted into canonical SMILES and deduplicated; 2) molecules containing metal ions are deleted; 3) molecules with fewer than 2 heavy atoms are deleted; 4) molecules that Schrödinger (https://www.schrodinger.com) or RDKit cannot read are deleted. After data cleaning, we paired the bitter/sweet and bitter/non-bitter datasets and randomly divided each into training and test sets at a 9:1 ratio. To prevent contamination of the test set, all hyperparameters were determined by optimizing the mean AUC of 5-fold cross-validation on the training set. All the training parameters of the GNNs and compared models are shown in Supplementary Table S2.

Figure 2.
Figure 2. The data cleaning and splitting of the bitter dataset.

Figure 3 .
Figure 3. The 21 models tested on 8 public datasets. (A) Five classification metrics (ACC, AP, F1, MCC, AUC) of the models tested on 4 classification datasets. (B) Three regression metrics (R2, MAE, RMSE) of the models tested on 4 regression datasets. (C) Comprehensive regression and classification scores of the models.

Figure 4 .
Figure 4. Factor analysis for 68 descriptors of the bitter/sweet dataset. (A) Eigenvalues for the 68 descriptors of the bitter/sweet dataset. (B) Component matrix of the first 4 components for the 68 descriptors. (C) Values of the screened 18 descriptors for the 4 primary components. (D) T-test of the 18 descriptors on the bitter/sweet dataset.

Figure 5 .
Figure 5. Factor analysis for 68 descriptors of the bitter/non-bitter dataset. (A) Eigenvalues for the 68 descriptors of the bitter/non-bitter dataset. (B) Component matrix of the first 4 components for the 68 descriptors. (C) Values of the screened 17 descriptors for the 4 primary components. (D) T-test of the 17 descriptors on the bitter/non-bitter dataset.

Figure 6.
Figure 6. (A) Molecular docking of caffeine with TAS2R46. (B) The distribution difference in affinity to TAS2R46 between bitterants and non-bitterants. (C) Pearson correlation coefficients of count of heavy atoms, count of rings, BertzCT, Chi1, and molecular relative mass with affinity to TAS2R46.

Figure 7 .
Figure 7. (A) The correlation matrix of the atomic features of the caffeine molecule learned by HGNN as the graph convolution layers deepen; the caffeine molecule shows two sets of atoms with internal correlation > 0.3, marked in green and blue, respectively. (B) Node attention and (C) edge attention in the last layer of HGNN with caffeine as input for bitterness prediction; bonds with an attention value > 0.7 are marked in gray.

Figure 8 .
Figure 8. The 88 key substructures found in bitter molecules; below each substructure are its SMILES and number of occurrences.

Table 1 .
The modules and meanings of the 16 models.

Table 2 .
Information on the 8 public datasets.

Table 3 .
Comparison results of predictors for bitter/non-bitter and bitter/sweet prediction.