A neighborhood-regularization method leveraging multiview data for predicting the frequency of drug–side effects

Abstract. Motivation: A critical issue in drug benefit-risk assessment is determining the frequency of side effects, which is done through randomized controlled trials. Computationally predicted frequencies of drug side effects can be used to guide these trials effectively. However, predicting drug side effect frequencies is challenging, and thus only a few studies address this problem. Results: In this work, we propose a neighborhood-regularization method (NRFSE) that leverages multiview data on drugs and side effects to predict the frequency of side effects. First, we adopt a class-weighted non-negative matrix factorization to decompose the drug–side effect frequency matrix, in which a Gaussian likelihood is used to model unknown drug–side effect pairs. Second, we design a multiview neighborhood regularization that integrates three drug attributes and two side effect attributes, so that the most similar drugs and the most similar side effects have similar latent signatures. The regularization adaptively determines the weights of the different attributes. We conduct extensive experiments on one benchmark dataset, and NRFSE improves the prediction performance compared with five state-of-the-art approaches. An independent test set of post-marketing side effects further validates the effectiveness of NRFSE. Availability and implementation: Source code and datasets are available at https://github.com/linwang1982/NRFSE or https://codeocean.com/capsule/4741497/tree/v1.


Solving algorithm for the optimization model
We propose the following optimization model for predicting the frequency of drug side effects, in which the view weights are constrained to sum to 1 and to be positive (e.g. $h_q > 0$).
We use an iterative update algorithm to solve our model. Specifically, the drug-view weights (p = 1, …, x) and the side-effect-view weights $h_q$ (q = 1, …, y) are first fixed, and the multiplicative update procedure is applied to update U and V. The update formulas for U and V are designed according to the Karush-Kuhn-Tucker (KKT) dual complementarity condition, where $U_0$ and $V_0$ represent the matrices before updating, and U and V are the matrices after updating.
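The multiplicative update procedure can be illustrated with a minimal sketch. It assumes a plain weighted non-negative matrix factorization that minimizes the weighted squared error between R and U V^T. It is not the NRFSE update rule, which additionally contains the neighborhood-regularization terms and the class weights defined in the paper. The toy matrix `R`, the weight value 0.05 for unobserved pairs, and all function names are hypothetical.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def hadamard(A, B):
    # Element-wise product of two equally shaped matrices.
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def weighted_loss(R, W, U, V):
    # sum_ij W_ij * (R_ij - (U V^T)_ij)^2
    P = matmul(U, transpose(V))
    return sum(w * (r - p) ** 2
               for r_row, w_row, p_row in zip(R, W, P)
               for r, w, p in zip(r_row, w_row, p_row))

def scale(X, num, den, eps=1e-12):
    # Multiplicative rule: X <- X * num / den (eps avoids division by zero).
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def multiplicative_step(R, W, U, V):
    WR = hadamard(W, R)
    # Update U with V fixed.
    WP = hadamard(W, matmul(U, transpose(V)))
    U = scale(U, matmul(WR, V), matmul(WP, V))
    # Update V with the new U fixed.
    WP = hadamard(W, matmul(U, transpose(V)))
    V = scale(V, matmul(transpose(WR), U), matmul(transpose(WP), U))
    return U, V

def factorize(R, W, rank=2, iters=200, seed=0):
    rng = random.Random(seed)
    # Strictly positive initialization keeps every entry non-negative.
    U = [[rng.random() + 0.1 for _ in range(rank)] for _ in R]
    V = [[rng.random() + 0.1 for _ in range(rank)] for _ in R[0]]
    for _ in range(iters):
        U, V = multiplicative_step(R, W, U, V)
    return U, V

# Hypothetical toy data: rows are drugs, columns are side effects,
# entries are frequency classes (0 = unobserved pair).
R = [[4, 0, 3, 0],
     [5, 2, 0, 1],
     [0, 1, 4, 0]]
# Assumed weighting: observed pairs get weight 1, unobserved pairs 0.05.
W = [[1.0 if r > 0 else 0.05 for r in row] for row in R]

U, V = factorize(R, W)
print(round(weighted_loss(R, W, U, V), 4))
```

Because each factor is only ever multiplied by a ratio of non-negative quantities, U and V stay non-negative without an explicit projection step, which is the main appeal of this family of updates.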
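Then, we fix U and V and optimize the drug-view weights (p = 1, …, x) and the side-effect-view weights $h_q$ (q = 1, …, y).
Since the two sets of weights are independent, we solve for their optimal solutions separately. The Lagrange function for the drug-view weights adds a multiplier term that enforces the sum-to-one constraint; combining its stationarity condition with that constraint yields the optimal weights. Similarly, we can obtain the optimal solution for $h_q$.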

Sensitivity analysis of hyperparameters
Here, we present the sensitivity analysis results for the remaining hyperparameters of NRFSE under the CV1 setting: the regularization coefficients for drugs and side effects, the latent feature dimension for drugs and side effects, the number of neighbors for drugs and side effects ($k_1$), and the number of neighbors for new drugs and new side effects ($k_2$). We conducted 10-fold cross-validation for each parameter under the CV1 setting, calculated five metrics for each fold (AUC, AUPR, MAE, RMSE and PCC), and drew a boxplot of the 10 fold results for each metric ('+' represents the mean value). Among the five metrics, AUC and AUPR measure the ability of NRFSE to predict drug–side effect associations, while MAE, RMSE and PCC measure its ability to predict drug–side effect frequencies. When testing each hyperparameter, we fixed the other hyperparameters at their default values: both regularization coefficients equal to 2, a latent feature dimension of 200, $k_1 = 20$ and $k_2 = 10$.

1) Regularization coefficients for drugs and side effects
We selected both coefficients from {0, 1, 2, 3, 4}, where 0 represents no regularization constraint and larger values impose a stronger constraint. Fig. S1 and S2 show the sensitivity analysis results for the drug coefficient and the side-effect coefficient, respectively. They illustrate that the results with regularization constraints are better than those without. Although setting the coefficients to 1 gives better association prediction performance than setting them to 2, it gives worse frequency prediction performance. Setting the coefficients to 2 also generally performs better than setting them to 3 or 4.

3) $k_1$-nearest neighbors of drugs and side effects
We selected $k_1$ from {5, 10, 15, 20, 25}. Fig. S4 shows the sensitivity analysis results for $k_1$. As $k_1$ increases, the association prediction performance of NRFSE tends to decrease, while its frequency prediction performance tends to improve. Therefore, $k_1 = 20$ was selected to balance the association prediction and frequency prediction performance of NRFSE.

Fig. S4. The sensitivity analysis for $k_1$-nearest neighbors for drugs and side effects.
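The three frequency-prediction metrics used in this analysis (MAE, RMSE and PCC) and a 10-fold split can be computed as in the following sketch. It uses only the standard definitions of these metrics; the helper names and the shuffling-based fold assignment are assumptions for illustration and are not taken from the NRFSE source code.

```python
import math
import random

def mae(y_true, y_pred):
    # Mean absolute error.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pcc(y_true, y_pred):
    # Pearson correlation coefficient (undefined if either input is constant).
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def k_fold_indices(n, k=10, seed=0):
    # Shuffle the n sample indices and deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Hypothetical frequency classes and predictions for a handful of pairs.
y_true = [1, 2, 3, 4, 5]
y_pred = [1.5, 2.0, 2.5, 4.5, 4.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), pcc(y_true, y_pred))
```

In a sensitivity analysis of this kind, each fold from `k_fold_indices` would serve once as the held-out set, giving 10 values per metric for the boxplot.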

4) $k_2$-nearest neighbors of new drugs and new side effects
We selected $k_2$ from {1, 5, 10, 15, 20}. For new drugs or new side effects, the embedding learned directly by the model is inaccurate because the corresponding drug–side effect frequency terms are missing, so we obtain the embedding as a weighted average over the $k_2$-nearest neighbors of the corresponding drug or side effect. Fig. S5 and S6 show the sensitivity analysis under the CV1 and CV2 settings, respectively. Because new drugs or new side effects rarely occur under the CV1 setting, the value of $k_2$ does not affect the prediction results, as shown in Fig. S5. Under the CV2 setting, however, where new drugs or new side effects are present, the overall prediction performance of NRFSE first increases and then decreases as $k_2$ grows, as shown in Fig. S6. When $k_2 = 10$, the association prediction performance of NRFSE is better than with the other options, and the frequency prediction performance is also strong.
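The neighbor-based embedding for a new drug or side effect can be sketched as follows. The text specifies only a weighted average over the $k_2$ nearest neighbors; using the similarity scores themselves as weights is an assumption made here, and the function name and toy values are hypothetical.

```python
def knn_embedding(similarities, embeddings, k):
    """Embed a new item as the similarity-weighted mean of its k nearest neighbors.

    similarities[j] is the similarity between the new item and known item j;
    embeddings[j] is the learned latent vector of known item j.
    """
    # Indices of the k most similar known items.
    top = sorted(range(len(similarities)),
                 key=lambda j: similarities[j], reverse=True)[:k]
    total = sum(similarities[j] for j in top)
    dim = len(embeddings[0])
    if total == 0:
        # Fall back to an unweighted mean when all similarities are zero.
        return [sum(embeddings[j][d] for j in top) / len(top) for d in range(dim)]
    return [sum(similarities[j] * embeddings[j][d] for j in top) / total
            for d in range(dim)]

# Hypothetical example: three known drugs with 2-dimensional embeddings.
sims = [0.9, 0.1, 0.5]
known = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(knn_embedding(sims, known, k=2))
```

With k = 1 the new item simply copies its single nearest neighbor, while a large k pulls the embedding toward the global average, which is consistent with performance first rising and then falling as $k_2$ grows.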

Fig. S1. The sensitivity analysis for the regularization coefficient for drugs.

Fig. S3. The sensitivity analysis for the latent feature dimension of drugs and side effects.

Fig. S5. The sensitivity analysis for $k_2$-nearest neighbors for new drugs and new side effects under the CV1 setting.

Fig. S6. The sensitivity analysis for $k_2$-nearest neighbors for new drugs and new side effects under the CV2 setting.