SPIN: sex-specific and pathway-based interpretable neural network for sexual dimorphism analysis

Abstract Sexual dimorphism in prevalence, severity and genetic susceptibility exists for most common diseases. However, most genetic and clinical outcome studies are designed in a sex-combined framework that considers sex as a covariate. A few sex-specific studies have analyzed males and females separately, but this design fails to identify gene-by-sex interactions. Here, we propose a novel unified, biologically interpretable deep learning-based framework (named SPIN) for sexual dimorphism analysis. We demonstrate that SPIN significantly improved the C-index by up to 23.6% in TCGA cancer datasets, and it was further validated using asthma datasets. In addition, SPIN identifies sex-specific and sex-shared risk loci that are often missed in previous sex-combined or sex-separate analyses. We also show that SPIN is interpretable for explaining how biological pathways contribute to sexual dimorphism and improve risk prediction at the individual level, which can support the development of precision medicine tailored to a specific individual's characteristics.

that unaffected the minimization of the loss on each epoch until convergence. We also used dropout and L2 regularization. Additionally, we applied class weights to the objective function to avoid bias toward the majority class in imbalanced data. SPIN involves the following hyperparameters: (1) learning rate, (2) dropout rate, and (3) weight decay (L2 penalty). We used ReLU as the activation function, Kaiming uniform (a.k.a. He) initialization as the initializer, and Adam as the optimizer. With the training and validation sets, the optimal hyperparameters are determined heuristically in each experiment using grid search. The optimal model is then applied to the testing set to assess the C-index for survival analysis and the AUC for risk prediction.
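The class weighting described above can be illustrated with a minimal sketch. The inverse-frequency scheme and the function names `balanced_class_weights` and `weighted_bce` are assumptions made for illustration; the text does not state which weighting formula SPIN uses.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    This particular scheme is an assumption, not taken from the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_bce(probs, labels, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, normalized by the total weight."""
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        norm += w
    return total / norm


# Toy imbalanced data: three majority-class samples, one minority-class sample.
labels = [0, 0, 0, 1]
probs = [0.1, 0.2, 0.1, 0.3]  # hypothetical predicted probabilities of class 1

weights = balanced_class_weights(labels)
loss = weighted_bce(probs, labels, weights)
unweighted = weighted_bce(probs, labels, {0: 1.0, 1: 1.0})
```

In this toy case the poorly predicted minority sample contributes more to the weighted loss than to the unweighted one, which is the effect the class weights are meant to achieve. Deep learning frameworks expose comparable options, for example the `weight` argument of `torch.nn.CrossEntropyLoss` in PyTorch.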

S4. Global interpretation analysis
We compute importance scores for each gene/pathway by interpreting the parameters of the optimized SPIN. The importance score of a gene/pathway reflects how much that gene/pathway contributes to the predicted outcome. The partial derivative of the predictive function with respect to a given gene/pathway value measures how the prediction changes as that value changes. We define the importance score of each node in a layer by accumulating gradients along the path from that node to the model output. Specifically, the partial derivatives are calculated as follows. For pathways, the gradient is propagated through the matrix of sparse connections between the sex-specific pathway layers and the hidden layer. For genes, it is additionally propagated through the matrices of connections between genes and pathways, which are regulated by biological prior knowledge. Supplementary Figure 3 shows an architecture diagram of SPIN in which each layer, weight, and gradient calculation is notated. For the global interpretable model, we applied the averaged optimal parameters (i.e., weights and nodes) of the optimized SPIN models from ten experiments to capture robust, significant biological factors. We generated a real-valued matrix reflecting the gene or pathway importance.
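The idea of accumulating gradients along the path from a node to the output can be sketched on a toy network. This is a simplified illustration, not the SPIN architecture: it assumes a single pathway layer with ReLU activation and a linear output, and the names `W_gp`, `mask`, and `w_out` are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def forward(x, W_gp, mask, w_out):
    """genes -> pathway layer (masked weights, ReLU) -> scalar output."""
    z = [sum(W_gp[k][j] * mask[k][j] * x[j] for j in range(len(x)))
         for k in range(len(W_gp))]
    h = [relu(v) for v in z]
    return sum(w_out[k] * h[k] for k in range(len(h))), z


def importance(x, W_gp, mask, w_out):
    """Chain-rule gradients of the output w.r.t. pathway nodes and genes."""
    _, z = forward(x, W_gp, mask, w_out)
    # d f / d h_k: gradient at pathway node k (output layer is linear here).
    grad_pathway = list(w_out)
    # d f / d x_j = sum_k (d f / d h_k) * relu'(z_k) * W_kj * mask_kj
    grad_gene = [
        sum(grad_pathway[k] * (1.0 if z[k] > 0.0 else 0.0)
            * W_gp[k][j] * mask[k][j] for k in range(len(z)))
        for j in range(len(x))
    ]
    return grad_pathway, grad_gene


# Hypothetical toy setup: 3 genes, 2 pathways; mask encodes gene membership.
mask = [[1, 1, 0], [0, 1, 1]]
W_gp = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]
w_out = [0.8, -1.2]
x = [1.0, 0.2, -0.5]

grad_pathway, grad_gene = importance(x, W_gp, mask, w_out)


def finite_difference(j, eps=1e-6):
    """Numerical check of the gene-level gradient for gene j."""
    hi = list(x); hi[j] += eps
    lo = list(x); lo[j] -= eps
    return (forward(hi, W_gp, mask, w_out)[0]
            - forward(lo, W_gp, mask, w_out)[0]) / (2 * eps)
```

The mask zeroes out gene-pathway connections that are absent from the prior knowledge, so a gene only receives gradient through the pathways it belongs to. The finite-difference helper is included only as a sanity check of the analytic gradients.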
The statistical significance of the importance score distribution of each gene/pathway is tested using a one-sample t-test under the null hypothesis of zero mean ($H_0: \mu = 0$). We assume that a zero partial derivative for a gene/pathway means no impact on the SPIN prediction. The t-test is performed separately for the male and female groups to prevent canceling effects that may occur when a gene/pathway acts in opposite directions in males and females.
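The canceling effect mentioned above can be seen in a small numerical sketch. The scores below are made-up values for a single hypothetical gene, and `one_sample_t` is an illustrative helper that returns only the t statistic.

```python
import math
import statistics


def one_sample_t(scores, mu0=0.0):
    """One-sample t statistic for H0: mean == mu0."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1)
    return (mean - mu0) / (sd / math.sqrt(n))


# Hypothetical importance scores of one gene, acting in opposite directions.
male = [0.9, 1.1, 1.0, 1.2, 0.8]
female = [-1.0, -0.9, -1.1, -1.2, -0.8]

t_male = one_sample_t(male)
t_female = one_sample_t(female)
t_pooled = one_sample_t(male + female)  # pooled mean is zero, so the signal vanishes
```

Tested separately, both groups show a large statistic with opposite signs, whereas pooling the two groups drives the mean, and therefore the statistic, to zero. Converting the statistic to a p-value requires the t distribution with n - 1 degrees of freedom, for which a routine such as `scipy.stats.ttest_1samp` could be used.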
Then, the resulting p-values are corrected for multiple testing using a false-discovery rate (FDR)-controlling method (the Benjamini-Hochberg (BH) procedure) at a preset family-wise error rate (FWER) threshold.

We conducted the gene-sex interaction analysis using a conventional linear statistical method. For survival analysis, a Cox-PH regression model was fitted on gene expression, sex, and their interaction. For risk score prediction, we used a logistic regression model as a linear combination of gene expression, sex, and their interaction. To be specific,

$$\operatorname{logit} P(y_{g,i} = 1) = \beta_0 + \beta_1 x_{g,i} + \beta_2 s_i + \beta_3 x_{g,i} s_i,$$

where $y_{g,i}$ is the target label for gene $g$ and sample $i$, $x_{g,i}$ is sample $i$'s expression level for gene $g$, and $s_i$ is sample $i$'s value for sex. Based on this representation, we consider that a gene-sex interaction effect on asthma exists if the learned coefficient $\beta_3$ is statistically significant (after FDR correction over all genes).
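Both the importance-score tests and the interaction tests rely on the Benjamini-Hochberg correction. A minimal sketch of the standard BH adjustment is shown here; the p-values are invented for illustration, and the function name `benjamini_hochberg` is not from the paper.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
pvals = [0.001, 0.04, 0.03, 0.20]
q = benjamini_hochberg(pvals)
significant = [qi <= 0.05 for qi in q]  # 0.05 is an illustrative threshold only
```

A gene is then called significant when its adjusted value falls below the chosen threshold. Library routines such as `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` provide the same adjustment.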

S5. Local interpretation analysis
Compared to the global interpretation analysis that identifies 'what' significant features (i.e., genes/pathways) are involved in a biological system from the whole population, the local interpretation analysis scrutinizes each individual sample on 'how' the features contribute to the target outcomes.
Specifically, the local interpretation analysis reveals how the features have positive (or negative) impacts on a particular prediction using Shapley Additive Explanations (SHAP), which is based on a concept from game theory called the Shapley value [29]. SHAP assigns to a feature $i$ a value $\phi_i$ that quantifies the feature's effect on the prediction $f$ as follows:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!} \left( f(S \cup \{i\}) - f(S) \right),$$

where $N$ is the set of all $n$ features, $S$ is a subset of features that excludes $i$, $(f(S \cup \{i\}) - f(S))$ represents the difference in predictions when the feature $i$ is included and excluded, and $\frac{|S|!\,(n-|S|-1)!}{n!}$ represents the weighting for the marginal contributions. The SHAP values of all features, together with the expected (base) value of the model, sum to the predicted probability of a sample (a.k.a. the local additivity property of SHAP), so that the SHAP value of each feature reflects its relative impact on the prediction.
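The Shapley formula and its additivity property can be checked by brute force on a tiny example. This sketch enumerates all feature subsets exactly, which is only feasible for a handful of features; the toy `model`, the input `x`, and the zero `baseline` used to represent excluded features are all assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n):
    """Exact Shapley values for n features; value_fn maps a frozenset to a number."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for subsets S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model with one additive term and one interaction term.
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # value used for features excluded from S


def model(v):
    return 0.5 * v[0] + v[1] * v[2]


def value_fn(subset):
    v = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(v)


phi = shapley_values(value_fn, len(x))
base_value = value_fn(frozenset())
prediction = value_fn(frozenset(range(len(x))))
```

Here the base value plus the sum of `phi` recovers the prediction, and the interaction term is split equally between the two features involved. Exact enumeration grows exponentially with the number of features, which is why approximations are used for deep models.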
In our local interpretation analysis, we utilized a SHAP explanation model (i.e., DeepExplainer) that approximates SHAP values for deep learning models, applying DeepExplainer to the intermediate layer (the pathway layer in SPIN) to obtain SHAP values of the pathways. We note that we used DeepExplainer separately for males and females, since the architecture of SPIN contains male- and female-specific pathway layers. We observed various patterns of pathway contributions among individual samples, which cannot be identified by the global interpretation analysis. For instance, one of the widely known asthma-related pathways, the Jak-STAT signaling pathway [14,24], shows high influence on the risk of asthma in the global interpretation analysis (the right side in Fig. 3B). However, the second female asthma patient in Fig. 4A shows that the Jak-STAT signaling pathway might not have a significant influence on the risk of asthma. Instead, Endocytosis was detected as the most influential pathway for this patient.

1 Chr is the corresponding chromosome.
2 Score is the larger of the male and female importance scores from our global interpretation.
3 Interaction p-value is the p-value of the interaction variable in the conventional statistical approach.

Among the sex-shared genes, MAPK8, an autophagy-related gene, is a protective factor for GBM survival [20]. AKT3 influences tumor suppressive function in GBM and activates DNA repair and resistance to radiation and chemotherapy in GBM [22]. Among the male-specific genes, MAP3K1 is more highly expressed in peripheral infiltrating GBM cells than in normal tissues [27]. IFNG contributes to the anti-tumor immune response in the GBM microenvironment [11]. Among the female-specific genes, NRAS is upregulated in the glioma pathways [23]. High expression of PLCG1 is associated with tumor progression and poor survival in LGG. Additionally, PLCG1 is reported to influence the growth, migration, and invasiveness of LGG cells [13]. The TSC2 mutation found in peripheral blood is recognized in glioblastoma as well as in glioblastoma-derived cells [19]. Among the sex-shared pathways, the MAPK signaling pathway and the p53 signaling pathway are reported as significant in GBM. Hyperactivation of the MAPK signaling pathway plays a key role in GBM, and glioma progression can be restricted by inhibiting the MAPK signaling pathway [1]. The p53 signaling pathway suppresses the activity of the enzyme ubiquitin specific peptidase 7 (USP7) in glioma [26].

Supplementary
For the male-enriched pathway, the Notch signaling pathway is crucially downregulated in hypocretin-1-treated GBM cells, such that inhibition of the Notch signaling pathway allows hypocretin-1 to exert its antitumor effect on GBM [8]. For the female-enriched pathway, dysregulation of key proteins associated with the Spliceosome may affect resistance to Temozolomide (TMZ), a reason why GBM treatment fails, which is related to the prognosis of GBM patients [25].

Among the sex-shared genes, PIK3R1 is related to asthma severity, with higher expression in peripheral blood mononuclear cells [15]. HLA-G is an immunomodulatory factor in asthma [12]. In the 3'UTR segment of the HLA-G gene, the variation sites +3010C/G and +3142C/G are separately implicated in asthma severity [2]. SNPs in IKBKB increase children's susceptibility to wheezing, which may subsequently result in asthma [5]. Among the male-specific genes, TGFB1 is reported in asthma in connection with airway inflammation, remodeling and cytokines [4], and with decline in lung function [10]. MAPK1 suppresses Th2 inflammation in airway epithelial cells in allergic asthma [21]. IL1B upregulation appears in neutrophilic asthma [17], and high expression of IL1B related to neutrophilic inflammation is involved in inferior lung function and increased chronic obstructive pulmonary disease (COPD) severity [3]. Among the female-specific genes, ALDH3A1 is identified as a potential marker for predicting the diagnosis of difficult-to-control asthma [18]. ITGB4 deficiency activates airway inflammation, which is a significant trigger for bipolar disorder (BD)-like behavior during asthma pathogenesis [7].

Among the sex-shared pathways, pyrroloquinoline quinone (PQQ) mitigates allergic airway inflammation in mice by regulating the JAK-STAT signaling pathway. PQQ is a potential therapeutic agent for inflammatory diseases, including asthma [14]. The JAK-STAT signaling pathway allows exposure to perfluorooctanesulfonate (PFOS) and perfluorooctanoate (PFOA) to induce airway inflammation in asthma.

The JAK-STAT6 signaling pathway, a key member of JAK-STAT, is engaged in several asthma stages [24]. The activity of the enzyme arginase in the Arginine and proline metabolism pathway, which is relevant to asthma pathogenesis, is increased in the serum of asthmatic individuals [16]. For the male-enriched pathway, Panax Notoginseng Saponins R1 (PNS-R1) alleviates Dexamethasone (Dex)-induced apoptosis in bronchial epithelial cells by inhibiting the mitochondrial Apoptosis pathway, highlighting the potential of PNS-R1 in asthma treatment [28]. For the female-enriched pathway, the Ubiquitin mediated proteolysis pathway might contribute to the development of asthma and to credible approaches for asthma diagnosis and treatment in the future [9]. Upregulated expression of hypomethylated genes in Ubiquitin mediated proteolysis may explain the increased abundance or phenotypic differences of monocytes in asthma [6].

Table 3. The performance comparisons of asthma datasets.

Table 6. The top-ranked genes of GSE172367

Table 7. The top-ranked pathways of GSE172367