Abstract

Prostate cancer (PCa) imposes a huge public health burden on men. A growing number of conventional observational studies report associations of multiple circulating proteins with PCa risk. However, the existing findings may be subject to the inherent biases of conventional epidemiologic studies. To better characterize these associations, we evaluated associations of genetically predicted concentrations of plasma proteins with PCa risk. We developed comprehensive genetic prediction models for protein levels in plasma. After testing 1308 proteins in 79 194 cases and 61 112 controls of European ancestry included in the BPC3, CAPS, CRUK, PEGASUS, and PRACTICAL consortia, 24 proteins showed significant associations with PCa risk, including 16 previously reported proteins and eight novel proteins. Of them, 14 proteins showed negative associations and 10 showed positive associations with PCa risk. For 18 of the identified proteins, potentially functional somatic changes in the encoding genes were detected in PCa patients in The Cancer Genome Atlas (TCGA). Genes encoding these proteins were significantly involved in cancer-related pathways. We further identified drugs targeting the identified proteins, which may serve as candidates for drug repurposing for treating PCa. In conclusion, this study identifies novel protein biomarker candidates for PCa risk, which may provide new perspectives on the etiology of PCa and improve its therapeutic strategies.

Introduction

Prostate cancer (PCa) is the second most frequently diagnosed malignancy and the fifth leading cause of cancer-related mortality among males worldwide [1]. In 2023, an estimated 288 300 new PCa cases and 34 700 PCa deaths occurred in the United States, making it the malignancy with the highest incidence (29%) and the second highest mortality (11%) among men [2]. Survival drops sharply once PCa has reached a metastatic stage, whereas it is higher when the cancer is diagnosed at a localized stage [3]. Prostate-specific antigen (PSA) screening is a significant advance for early PCa diagnosis [4]. Nevertheless, the use of PSA remains controversial given its low specificity, frequent false-negative results [5,6], unclear utility for reducing mortality in some populations [7,8], and the risk of PCa overdiagnosis [9].

Therefore, there is a critical need to identify additional biomarkers for improving risk assessment of PCa. Several protein biomarkers measured in serum have been reported to be potentially associated with PCa risk, such as IL6, KLK11, and EPCA [10–13]. However, results from previous research have been inconsistent, and most studies involved relatively small sample sizes and few protein candidates. Additionally, these studies were potentially subject to selection bias and to residual and unmeasured confounding, owing to the traditional epidemiologic study design with classical measurement of exposure [14].

An alternative study design is to use genetic instruments to assess the associations between genetically predicted protein levels and PCa risk [15]. In a previous study, we leveraged a large set of protein quantitative trait loci (pQTLs) as instrumental variables to evaluate the associations of genetically predicted protein levels with PCa risk, and identified 31 such proteins [16]. For specific proteins, however, the proportion of variance in blood protein levels explained by GWAS-identified pQTLs can be relatively small. A design that uses only pQTLs as instruments may therefore be underpowered for certain proteins. Another design, which builds a comprehensive genetic prediction model for each protein to aggregate the predictive value of multiple SNPs, may better capture the genetically regulated component of blood protein levels and thus further enhance statistical power [11]. In this study, we investigated the associations between genetically predicted blood protein levels and PCa risk using comprehensive protein genetic prediction models as instruments. We further generated a list of drugs targeting the identified proteins, which may serve as candidates for drug repurposing for PCa.

Results

Plasma protein genetic prediction models

The overall design of this study is displayed in Fig. 1. Genetic prediction models were established for 1864 proteins, of which 1674 showed a prediction performance of R² ≥ 0.01 in internal cross-validation. Proteins predicted well in INTERVAL subcohort1 also tended to be predicted well in subcohort2 in external validation analyses (Supplementary Fig. 1). The correlation coefficient for R² of the 1674 proteins in the two subcohorts was 0.98. Of them, prediction models for 1308 proteins assayed using 1389 SOMAmers also showed a correlation coefficient ≥ 0.01 between the predicted levels and measured levels in external validation using INTERVAL subcohort2 data.
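
The two-stage selection described above can be illustrated with a minimal Python sketch. It assumes that both the internal cross-validation step and the external validation step apply the same 0.01 cut-off to a per-protein performance value, as reported in the R² columns of Table 1. The class name, the function name, and the two `HYPO_*` entries are hypothetical and serve only as an illustration; they are not part of the study's pipeline.

```python
from dataclasses import dataclass

R2_THRESHOLD = 0.01  # cut-off stated in the text for both validation steps


@dataclass
class ModelPerformance:
    protein: str
    internal_cv_r2: float  # cross-validation performance in subcohort1
    external_r2: float     # validation performance in subcohort2


def passes_validation(model: ModelPerformance, threshold: float = R2_THRESHOLD) -> bool:
    """Keep a model only if it meets the threshold in both validation steps."""
    return model.internal_cv_r2 >= threshold and model.external_r2 >= threshold


models = [
    ModelPerformance("ATF6A", 0.02, 0.01),    # values as reported in Table 1
    ModelPerformance("PSP-94", 0.46, 0.46),   # values as reported in Table 1
    ModelPerformance("HYPO_A", 0.005, 0.02),  # hypothetical: fails internal cross-validation
    ModelPerformance("HYPO_B", 0.03, 0.004),  # hypothetical: fails external validation
]

retained = [m.protein for m in models if passes_validation(m)]
print(retained)
```

Only models that clear both steps would be carried forward to the association analysis under these assumptions.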

Figure 1. The overall design of this study.

Associations of predicted protein levels in plasma with PCa risk

We observed that the genetically predicted levels of 24 proteins were associated with PCa risk at a False Discovery Rate (FDR) ≤ 0.05 (Fig. 2). Of these, nine prediction models included only trans-predicting SNPs, six included only cis-predicting SNPs, and the other nine included both cis- and trans-predicting SNPs (Fig. 2 and Tables 1 and 2). Three of the proteins (PSP-94, DcR3, and Angiostatin) further reached P-value ≤ 5 × 10⁻⁸ (Tables 1 and 2). Sixteen of the proteins were identified in our prior study using pQTLs as instruments [16] (Table 1). The other eight proteins are novel and reported here for the first time (Table 2). Among them, five are encoded by genes located more than 500 kb away from GWAS/fine-mapping identified PCa risk variants, and three are encoded by genes (ARL3, B3GNT8, and PLG) located within 500 kb of PCa risk variants. Of the 15 proteins reported in our previous study using pQTLs as instruments but not in the current study, nine showed an association with PCa risk at P-value < 0.05 in the current study (Supplementary Table 1). Among the other six proteins, three (ARFP2, GPC6, and PIM) had no cis- or trans-SNP associated at our pre-defined threshold, so no prediction models could be established. The internal cross-validation performance (R²) of the prediction model for ZHX3 in subcohort1 was smaller than 0.01, so it was excluded from further analysis. For the two remaining proteins, GSTP1 and WISP-3, the performance of the prediction models in external validation in subcohort2 was less than 0.01, so they were not evaluated in the downstream association analyses.
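
As a rough illustration of how the Z scores, P-values, and FDR values in Tables 1 and 2 relate, the Python sketch below converts a Z score into a two-sided P-value under a standard normal distribution and applies a Benjamini-Hochberg adjustment. The use of Benjamini-Hochberg is an assumption, since this section does not name the FDR procedure, and both function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P-value for a Z score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# ATF6A is reported with Z = -4.01 and P = 6.05 x 10^-5 in Table 1;
# the value printed here may differ slightly because the Z score is rounded.
print(two_sided_p(-4.01))

# Toy input for the adjustment step (hypothetical P-values, not study data).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice the adjustment would be applied to the full vector of P-values from all tested protein models, not to the 24 reported proteins alone.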

In our study, an inverse association between predicted protein abundance and PCa risk was detected for 14 proteins, including Laminin; Fas ligand, soluble; IGF-II receptor; Cathepsin S; Angiostatin; and PSP-94 (Z score ranging from −3.38 to −25.6). Conversely, an association between higher predicted protein levels and increased PCa risk was identified for DOCK9, MICB, REG4, TPST1, B3GN8, GRIA4, PDE4D, TIP39, IL-21, and SPINT2 (Z score ranging from 3.36 to 4.81). We further assessed the associations between measured levels of the identified proteins and known PCa risk variants in the INTERVAL subcohort 1 data. Levels of 11 of the 24 identified proteins were significantly associated (FDR < 0.05) with at least one known PCa risk SNP (Supplementary Table 2) [17–20].

Figure 2. Manhattan plot of 24 identified proteins associated with PCa risk. The horizontal dotted line indicates the significance threshold at FDR ≤ 0.05. Circles located above the line represent proteins identified in our previous paper using pQTLs as instruments, and triangles represent novel proteins identified in the current study.

Robustness analyses

We first re-established protein prediction models for the 24 proteins identified in the main analyses using subcohort 2 data. Prediction models for 19 of them were successfully established using this new subcohort data, and models for the remaining five proteins could not be developed owing to a lack of SNPs associated at FDR < 0.05 in cis-regions and P-value < 5 × 10⁻⁸ in trans-regions. Of the 19 proteins with prediction models established, 16 (16/19 = 84.21%) showed an association with PCa risk at P-value < 0.05 with consistent directions (Supplementary Table 3). In the Summary data-based Mendelian Randomization (SMR) sensitivity analysis, 16 (16/24 = 66.67%) proteins were suggested to have likely causal effects on PCa risk with a consistent effect direction at P-value < 0.05 and P_HEIDI ≥ 0.01 (Supplementary Table 4). Based on the two-stage constrained maximum likelihood (2ScML) sensitivity analysis, 19 (19/24 = 79.17%) proteins could be replicated (Supplementary Table 5). Overall, all 24 proteins identified in our main analyses could be replicated by at least one of the examined approaches except for DcR3. Notably, eight proteins (B3GN8, Cathepsin S, KDEL2, Laminin, MED-1, MICB, PSP-94, and TPST1) could be validated with all three sensitivity analysis methods. These results support the robustness of our findings.

Somatic level variants of genes encoding associated proteins in PCa patients

We evaluated nonsynonymous somatic variants in prostate tumor tissue and tumor-adjacent normal tissue obtained from 499 TCGA prostate adenocarcinoma (TCGA-PRAD) patients. Nonsynonymous somatic variants were observed in at least one patient for 18 of the 24 genes encoding associated proteins (Supplementary Table 6). This proportion (18/24 = 75%) is significantly higher (enrichment P-value < 0.00001) than the proportion among genes encoding the 1308 proteins tested in the association analyses (data available for 1303; 665/1303 = 51.04%) and the proportion among genes across the whole genome (18 768/56 602 = 33.16%). Among these somatic variants, missense mutations were detected in 13 genes and frameshift deletions were observed in two genes (FASLG and ZNF175) (Supplementary Table 6). As a comparison, when focusing on patients with another randomly selected cancer type, TCGA-UVM, only two of the 24 assessed genes (NCF2 and KDELC2) had somatic variants (2/24 = 8.33%). Among the genes encoding the 1308 tested proteins, 83 (83/1308 = 6.35%) had potentially functional somatic variants in TCGA-UVM. This observation indicates that genes encoding the PCa-associated proteins identified in our study were enriched for somatic variants compared with the background.
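
The proportions quoted above can be recomputed with a short Python sketch. The source does not state which enrichment test was used, so the one-sided exact binomial tail shown here is an assumption chosen purely for illustration, and its P-values need not match the reported enrichment P-value. The function names are hypothetical.

```python
from math import comb


def proportion(k: int, n: int) -> float:
    """Fraction of genes carrying at least one nonsynonymous somatic variant."""
    return k / n


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): one-sided exact enrichment tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


observed_k, observed_n = 18, 24               # genes encoding the associated proteins
background_tested = proportion(665, 1303)     # genes encoding the tested proteins
background_genome = proportion(18768, 56602)  # genes across the whole genome

print(f"observed: {proportion(observed_k, observed_n):.2%}")
print(f"tested-protein background: {background_tested:.2%}")
print(f"whole-genome background: {background_genome:.2%}")

# Assumed test: probability of seeing 18 or more of 24 genes at each background rate.
p_vs_tested = binomial_upper_tail(observed_k, observed_n, background_tested)
p_vs_genome = binomial_upper_tail(observed_k, observed_n, background_genome)
print(p_vs_tested, p_vs_genome)
```

A hypergeometric or Fisher's exact test would be an equally reasonable choice when the 24 genes are treated as a subset of the background set.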

Ingenuity Pathway Analysis (IPA) and Protein-Protein Interaction (PPI) analysis results

To investigate functional protein–protein association networks, interactions among the 24 identified proteins were assessed using the STRING database with K-means clustering into three clusters (Fig. 3). PLG and FASLG, which have previously been reported to be related to PCa [21,22], had the highest node degree (N = 2) in the network. In the IPA diseases and biological functions analysis, 268 functions were enriched for the genes encoding our identified proteins at P-value < 0.05, of which four remained significant after Bonferroni correction. Among them, we identified five categories related to PCa in the diseases and biological functions analysis (Fig. 3 and Supplementary Table 7). For example, 12 genes (MICB, FASLG, LAMC1, MED1, PDE4D, MSMB, GRIA4, TNFRSF6B, ATF6, PLG, IGF2R, and SPINT2) were found to play important roles in PCa (P-value = 0.03). In addition, we identified a significantly enriched network (cancer, neurological disease, organismal injury, and abnormalities) with a P-value of 3.92 × 10⁻⁵ (Supplementary Table 8). The most significant upstream regulators in the current study were GFAP, SPI1, CD3, A2M, and ADAM9 (Supplementary Table 9). Among them, SPI1 was identified as an upstream regulator of three genes encoding PCa-associated proteins (CTSS, FASLG, and NCF2). As an ETS family transcription factor, SPI1 has been implicated in the regulation of ACP3, which encodes prostate-specific acid phosphatase (PAP) [23]. Furthermore, ADAM9 was recognized as an upstream regulator of REG4, which contributes to the survival and progression of PCa cells [24].

Table 1

Sixteen protein-PCa associations for proteins already identified in our previous research using pQTLs as instruments.

| Protein | SOMAmer ID | Protein full name | Protein-encoding gene | Modeling method | Num of predicting SNPs in model | Cis predicting SNPs (b) | Trans predicting SNPs | Model internal cross-validation R² | Model external validation R² | pQTL R² in subcohort1 (a) | pQTL R² in subcohort2 (a) | Z score | P-value | P-value false discovery rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATF6A | ATF6.11277.23.3 | Cyclic AMP-dependent transcription factor ATF-6 alpha | ATF6 | lasso | 4 | 0 | 4 | 0.02 | 0.01 | 0.04 | 0.04 | −4.01 | 6.05 × 10⁻⁵ | 8.37 × 10⁻³ |
| Cathepsin S | CTSS.3181.50.2 | Cathepsin S | CTSS | top1 | 1 | 1 | 0 | 0.11 | 0.13 | 0.11 | 0.13 | −5.15 | 2.56 × 10⁻⁷ | 8.85 × 10⁻⁵ |
| GRIA4 | GRIA4.10760.107.3 | Glutamate receptor 4 | GRIA4 | enet | 54 | 0 | 54 | 0.09 | 0.12 | 0.06 | 0.09 | 3.89 | 1.00 × 10⁻⁴ | 0.01 |
| IGF-II receptor | IGF2R.3676.15.3 | Cation-independent mannose-6-phosphate receptor | IGF2R | enet | 66 | 63 | 3 | 0.34 | 0.38 | 0.18 | 0.20 | −3.39 | 6.95 × 10⁻⁴ | 0.04 |
| IL-21 | IL21.7124.18.3 | Interleukin-21 | IL21 | enet | 61 | 0 | 61 | 0.10 | 0.11 | 0.13 | 0.09 | 4.37 | 1.23 × 10⁻⁵ | 2.84 × 10⁻³ |
| KDEL2 | KDELC2.8296.117.3 | KDEL motif-containing protein 2 | KDELC2 | enet | 56 | 54 | 2 | 0.04 | 0.05 | 0.09 | 0.07 | −4.27 | 1.96 × 10⁻⁵ | 3.87 × 10⁻³ |
| Laminin | LAMA1.LAMB1.LAMC1.2728.62.2 | Laminin | LAMC1 | enet | 62 | 15 | 47 | 0.08 | 0.05 | 0.06 | 0.06 | −3.38 | 7.22 × 10⁻⁴ | 0.04 |
| MICB | MICB.5102.55.3 | MHC class I polypeptide-related sequence B | MICB | enet | 73 | 47 | 26 | 0.21 | 0.21 | 0.13 | 0.11 | 3.38 | 7.24 × 10⁻⁴ | 0.04 |
| PSP-94 | MSMB.10620.21.3 | Beta-microseminoprotein | MSMB | lasso | 14 | 8 | 6 | 0.46 | 0.46 | 0.47 | 0.48 | −25.6 | 3.26 × 10⁻¹⁴⁴ | 4.51 × 10⁻¹⁴¹ |
| NCF-2 | NCF2.10047.12.3 | Neutrophil cytosol factor 2 | NCF2 | enet | 89 | 0 | 89 | 0.18 | 0.22 | 0.15 | 0.16 | −3.54 | 3.97 × 10⁻⁴ | 0.03 |
| PDE4D | PDE4D.5255.22.3 | cAMP-specific 3′,5′-cyclic phosphodiesterase 4D | PDE4D | enet | 88 | 0 | 88 | 0.09 | 0.11 | 0.07 | 0.08 | 3.93 | 8.56 × 10⁻⁵ | 0.01 |
| TIP39 | PTH2.7257.18.3 | Tuberoinfundibular peptide of 39 residues | PTH2 | enet | 20 | 0 | 20 | 0.02 | 0.01 | 0.02 | 0.02 | 4.15 | 3.26 × 10⁻⁵ | 5.64 × 10⁻³ |
| SPINT2 | SPINT2.2843.13.2 | Kunitz-type protease inhibitor 2 | SPINT2 | enet | 73 | 72 | 1 | 0.36 | 0.39 | 0.36 | 0.39 | 4.81 | 1.54 × 10⁻⁶ | 4.26 × 10⁻⁴ |
| DcR3 | TNFRSF6B.5070.76.3 | Tumor necrosis factor receptor superfamily member 6B | TNFRSF6B | top1 | 1 | 1 | 0 | 0.01 | 0.01 | 0.02 | 0.02 | −8.97 | 3.07 × 10⁻¹⁹ | 1.42 × 10⁻¹⁶ |
| TPST1 | TPST1.7928.183.3 | Protein-tyrosine sulfotransferase 1 | TPST1 | lasso | 4 | 4 | 0 | 0.02 | 0.03 | 0.02 | 0.03 | 3.65 | 2.58 × 10⁻⁴ | 0.02 |
| ZN175 | ZNF175.12716.3.3 | Zinc finger protein 175 | ZNF175 | enet | 52 | 0 | 52 | 0.05 | 0.03 | 0.09 | 0.06 | −3.88 | 1.03 × 10⁻⁴ | 0.01 |

(a) Sentinel variant IDs were derived from a previous study by Sun et al. [55].

(b) SNPs within 1 Mb of the TSS of the target protein.

Table 2

Eight protein-PCa associations for novel proteins which have not been previously reported.

| Protein | SOMAmer ID | Protein full name | Protein-encoding gene | Modeling method | Num of predicting SNPs in model | Cis predicting SNPs (b) | Trans predicting SNPs | Model internal cross-validation R² | Model external validation R² | pQTL R² in subcohort1 (a) | pQTL R² in subcohort2 (a) | Z score | P-value | P-value false discovery rate | Distance of gene to closest risk SNP (kb) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARL3 | ARL3.12571.14.3 | ADP-ribosylation factor-like protein 3 | ARL3 | enet | 41 | 15 | 26 | 0.13 | 0.12 | 0.11 | 0.11 | −3.49 | 4.75 × 10⁻⁴ | 0.04 | 18.39 |
| B3GN8 | B3GNT8.9297.12.3 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 8 | B3GNT8 | lasso | 7 | 7 | 0 | 0.14 | 0.14 | 0.14 | 0.14 | 3.84 | 1.25 × 10⁻⁴ | 0.01 | 50.95 |
| CC134 | CCDC134.5587.3.3 | Coiled-coil domain-containing protein 134 | CCDC134 | lasso | 9 | 2 | 7 | 0.02 | 0.01 | NA | NA | −4.03 | 5.62 × 10⁻⁵ | 0.01 | 1271.57 |
| DOCK9 | DOCK9.14002.18.3 | Dedicator of cytokinesis protein 9 | DOCK9 | lasso | 5 | 0 | 5 | 0.04 | 0.03 | 0.04 | 0.03 | 3.35 | 8.06 × 10⁻⁴ | 0.05 | 25449.81 |
| Fas ligand, soluble | FASLG.3052.8.2 | Tumor necrosis factor ligand superfamily member 6, soluble form | FASLG | blup | 1317 | 0 | 1317 | 0.02 | 0.02 | 0.02 | 0.02 | −3.38 | 7.34 × 10⁻⁴ | 0.04 | 5492.30 |
| MED-1 | MED1.5470.69.2 | Mediator of RNA polymerase II transcription subunit 1 | MED1 | enet | 15 | 0 | 15 | 0.10 | 0.11 | 0.09 | 0.09 | −3.45 | 5.62 × 10⁻⁴ | 0.04 | 1456.97 |
| Angiostatin | PLG.3710.49.2 | Angiostatin | PLG | lasso | 13 | 13 | 0 | 0.04 | 0.06 | 0.04 | 0.04 | −9.75 | 1.77 × 10⁻²² | 1.22 × 10⁻¹⁹ | 241.20 |
| REG4 | REG4.11102.22.3 | Regenerating islet-derived protein 4 | REG4 | enet | 24 | 24 | 0 | 0.02 | 0.02 | 0.02 | 0.03 | 3.63 | 2.88 × 10⁻⁴ | 0.02 | 30304.20 |

(a) Sentinel variant IDs were derived from a previous study by Sun et al. [55].

(b) SNPs within 1 Mb of the TSS of the target protein.

Figure 3. PPI network and canonical pathways of the identified proteins associated with PCa risk. Network nodes represent proteins; edge thickness is proportional to the evidence for the PPI; and dashed lines represent the interaction among clusters. The enrichment of canonical pathways was determined using IPA software.

Candidate drugs targeting identified proteins

Based on interrogation of OpenTargets, 15 of the associated proteins were supported as relevant to PCa, defined as having an annotated “overallAssociationScore” greater than zero for PCa-related outcomes (Table 3 and Supplementary Table 10). Of them, three proteins were also targets of existing drugs that are approved for treating specific human conditions (Table 3). Our work thus indicates potential drug repurposing opportunities for these drug targets in other indications.

Table 3

Ten drug repurposing opportunities of three identified proteins.

| Protein | Protein full name | Protein-encoding gene | OpenTargets information (overall association score) | DrugBank ID | Drug name | Molecular action | Molecular docking score (a) |
|---|---|---|---|---|---|---|---|
| Cathepsin S | Cathepsin S | CTSS | 0.149 | DB12010 | Fostamatinib | inhibitor | −9 |
| IGF-II receptor | Cation-independent mannose-6-phosphate receptor | IGF2R | 0.111 | DB14751 | Mecasermin rinfabate | NA | |
| | | | | DB01277 | Mecasermin | NA | |
| | | | | DB13173 | Cerliponase alfa | ligand | |
| | | | | DB16099 | Avalglucosidase alfa | ligand | |
| PDE4D | cAMP-specific 3′,5′-cyclic phosphodiesterase 4D | PDE4D | 0.037 | DB00651 | Dyphylline | inhibitor | −6.8 |
| | | | | DB01656 | Roflumilast | inhibitor | −8 |
| | | | | DB05219 | Crisaborole | inhibitor | |
| | | | | DB01088 | Iloprost | inducer | −7.3 |
| | | | | DB00131 | Adenosine phosphate | product of | −7.3 |

(a) A score of ≤ −7 represents a good interaction between the protein and the corresponding drug agent. Empty cells indicate that the corresponding drug structure could not be downloaded, so the analysis could not be performed.
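
The docking threshold in the footnote above can be applied to the scores listed in Table 3 with a minimal Python sketch. The variable and function names are hypothetical, and a missing score is encoded as `None` on the assumption that an empty cell means no docking analysis was run.

```python
from typing import Optional

GOOD_DOCKING_THRESHOLD = -7.0  # scores at or below this value count as a good interaction

# (protein, drug) -> docking score as listed in Table 3; None marks a missing cell.
docking_scores: dict[tuple[str, str], Optional[float]] = {
    ("Cathepsin S", "Fostamatinib"): -9.0,
    ("PDE4D", "Dyphylline"): -6.8,
    ("PDE4D", "Roflumilast"): -8.0,
    ("PDE4D", "Crisaborole"): None,
    ("PDE4D", "Iloprost"): -7.3,
    ("PDE4D", "Adenosine phosphate"): -7.3,
}


def has_good_interaction(score: Optional[float],
                         threshold: float = GOOD_DOCKING_THRESHOLD) -> bool:
    """A pair qualifies only if a score exists and is at or below the threshold."""
    return score is not None and score <= threshold


good_pairs = sorted(pair for pair, score in docking_scores.items()
                    if has_good_interaction(score))
for protein, drug in good_pairs:
    print(f"{protein}: {drug}")
```

Under this reading, Dyphylline (−6.8) falls just short of the threshold and Crisaborole cannot be classified.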

Discussion

Our study aimed to identify novel protein biomarkers whose genetically predicted abundances in blood are associated with PCa risk, using protein genetic prediction models as instruments. In this large study, we identified 24 proteins that demonstrated statistically significant associations with PCa risk after FDR correction, including eight novel proteins and 16 proteins reported in our previous study using pQTLs as instruments. In previous work [16], measured levels of 11 proteins have been reported to be associated with PCa risk, namely, Cathepsin S; IGF-II receptor; MICB; PSP-94; PDE4D; SPINT2; MED-1; Angiostatin; REG4; Fas ligand, soluble; and DOCK9 [25–35]. Among them, seven showed consistent directions of association between protein abundance and PCa risk in the present work and published studies.

We compared, in subcohort2 of INTERVAL, the prediction performance of the models we established for the 24 PCa-associated proteins identified in this analysis with the proportion of protein level variance captured by the reported pQTLs used in our earlier work [16]. We found that for most proteins, the external validation R² of the prediction models was higher than the R² captured by known pQTLs [30] (Tables 1 and 2, Supplementary Table 1 and Supplementary Fig. 2). These results suggest that the current approach of using comprehensive genetic prediction models improves power for detecting protein-disease associations because a higher proportion of protein variance can be captured. Our study provides novel information to improve the understanding of the genetics and etiology of PCa and generates a list of novel proteins for risk assessment of PCa, the most common malignancy among men in most countries around the world. Here, we were able to observe significant protein-PCa associations for 16 proteins already identified in our previous research using pQTLs as instruments [16]. In vitro/in vivo studies and human studies have suggested that some of the genes encoding these proteins may play an important role in prostate tumorigenesis. For instance, LAMC1 encodes the γ1 subunit of laminin [36]. The interaction of cancer cells with laminin was identified as a key event in tumorigenicity and metastasis [37]. LAMC1 was upregulated in PCa, and silencing LAMC1 markedly inhibited migration and invasion of PCa cells [38–40]. TNFRSF6B encodes a member of the TNF receptor superfamily. In PCa, TNF-mediated prosurvival signaling is the predominant pathway that leads to cell survival and resistance to therapy [41]. SPINT2/HAI-2 has been reported to be a potential inhibitor of matriptase and hepsin through biochemical and genetic analysis [42,43]. In PCa, a significant decrease of SPINT2 protein can promote enhanced proteolytic activity and accelerate tumor progression [30,44,45]. These prior studies support the potential role of these genes in prostate carcinogenesis.

Of note, we identified associations for eight novel proteins. For three of them, namely ARL3, B3GN8, and Angiostatin, the encoding genes are located at GWAS-identified PCa risk loci. For the other five novel proteins, the encoding genes are located more than 500 kb away from any PCa risk variant reported in GWAS or fine-mapping studies. Several of these novel proteins have also been found to potentially play functional roles in PCa development. For example, the FAS receptor can trigger apoptosis by interacting with its ligand (FASLG), and genetic alteration of the FAS/FASLG signaling pathway could induce tumorigenesis through immune escape [46]. MED1 is a key component of the Mediator complex, and inhibition of CDK7-directed MED1 phosphorylation led to tumor growth inhibition in castration-resistant PCa [47]. MED1 overexpression and proliferation are associated with ERK and AKT signaling in PCa [31]. REG4 has been suggested as a candidate marker of PCa metastasis or hormone-refractory growth [33]. Plasminogen (PLG) activator inhibitor type 1 expressed by human prostate carcinoma cells can inhibit tumor growth, angiogenesis, and metastasis [48]. Compared with healthy donors, the plasma of PCa patients contains increased levels of IgG capable of binding to PLG [49]. In our IPA analysis, PLG was involved in three PCa-related categories in the diseases and biological functions analysis and in two networks annotated with cellular survival and development functions (Supplementary Tables 4 and 5), suggesting an important role of this gene.

Prior transcriptome-wide association studies (TWAS) have investigated associations between genetically predicted gene expression and PCa risk. Our proteome-wide association study (PWAS) uses a different design that focuses on predicted protein levels, which are known to be closer than RNA-level expression to human diseases and phenotypes. Because of the limited number of proteins captured by currently available proteome measurement platforms, we were able to assess only a few thousand proteins, fewer than 10% of the genes that can be assessed in TWAS. Six of the genes encoding the identified proteins (ARL3, CTSS, IGF2R, MICB, SPINT2, and TNFRSF6B) were also significantly associated with PCa risk in published TWAS [50–52]. Interestingly, for IGF2R and MICB, the associations of mRNA levels and of protein levels with PCa risk were inconsistent. This suggests that protein-PCa associations and transcript-PCa associations are largely distinct, with only a small proportion shared. It also highlights the importance of conducting comprehensive PWAS such as ours to better characterize disease-related proteins and improve the understanding of disease etiology.

Based on drug repurposing analyses, we prioritized several drugs that may serve as promising candidates for treating PCa, such as fostamatinib targeting Cathepsin S, and roflumilast, iloprost, and adenosine phosphate targeting PDE4D. Previous research has indicated potential effects of these drugs on PCa. For example, fostamatinib has been shown to effectively inhibit the proliferation of PCa cells [53], and roflumilast can reduce the viability of PCa cells [54]. Future studies in humans are needed to evaluate the potential utility of such drugs in treating PCa.

The sample size of our main association analysis was large, providing high statistical power to detect protein-PCa associations. In addition, the use of genetic instruments reduces biases such as selection bias and potential confounding, and avoids the potential influence of reverse causation. Compared with the design using pQTLs as instruments, the current design using comprehensive protein genetic prediction models should provide improved power for detecting associations. Notably, for the nine proteins that were reported in our previous work using pQTLs as instruments but did not reach FDR significance in the current study, prediction models were built and their associations with PCa risk showed P-values < 0.05 in the present study (Supplementary Table 1). Reassuringly, the directions of association between predicted protein levels and PCa risk observed in the current work were concordant with those in the previous study [16]. These results provide further assurance of the robustness of our findings.

Potential limitations of our study also need to be recognized. Our findings may be subject to potential pleiotropic effects, which limits the ability to draw causal inferences. Further mechanistic investigation is needed to establish potential causal relationships. The current work focuses on PCa risk; thus, the identified proteins may play a role in PCa etiology. Future research focusing on PCa aggressiveness, a more clinically relevant outcome, is needed, with the aim of developing effective strategies for predicting the lethal form of this common cancer.

Conclusion

In summary, we identified 24 proteins whose genetically predicted circulating levels are associated with PCa risk. The identified proteins could serve as promising candidates in future investigations. An in-depth investigation of these proteins will provide novel evidence on the mechanistic networks underlying prostate carcinogenesis.

Materials and Methods

Protein genetic prediction model development and validation

To develop the protein genetic prediction models, we analyzed genome and plasma proteome data from two subcohorts (1 and 2) of the INTERVAL study. The relative protein abundances within each subcohort were normalized by rank-inverse normalization. The sample information has been described previously in detail [55]. Briefly, the participants were generally healthy and of European ancestry. The relative concentrations of 3622 plasma proteins were measured using the SOMAscan assay. Quality control (QC) was performed at the sample and SOMAmer levels based on replicate calibrator samples and control aptamers. A total of 3283 SOMAmers mapping to 2994 unique proteins were included for further analysis. Genotypes for about 830 000 variants were obtained using the Affymetrix Axiom UK Biobank genotyping array. Standard data processing was performed according to the original publications [55,56]. SHAPEIT3 and the Sanger Imputation Server were then used to phase and impute SNPs using a combined 1000 Genomes Phase 3-UK10K reference panel. From a total of 87 696 888 imputed variants, SNPs were selected using the following criteria: (1) imputation quality of at least 0.7, (2) minor allele frequency of at least 5%, (3) missing rate <5%, and (4) presence in the 1000 Genomes reference panel for European populations. Finally, 4 662 360 high-quality variants passed these criteria and were retained for use.
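The four SNP selection criteria can be sketched as a simple filter. This is a minimal illustration rather than the pipeline used in the study: the `Variant` record, its field names, and the example values are hypothetical, and the 5% threshold is read here as a minor allele frequency.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary; field names are illustrative only."""
    rsid: str
    info: float          # imputation quality score
    maf: float           # minor allele frequency (assumed reading of the 5% cutoff)
    missing_rate: float  # fraction of samples with a missing genotype
    in_1kg_eur: bool     # present in the 1000 Genomes European reference panel


def passes_qc(v, min_info=0.7, min_maf=0.05, max_missing=0.05):
    """Apply criteria (1)-(4) from the text to one variant."""
    return (
        v.info >= min_info
        and v.maf >= min_maf
        and v.missing_rate < max_missing
        and v.in_1kg_eur
    )


# Made-up example variants: one passes, the others each fail one criterion.
variants = [
    Variant("rs_pass", 0.95, 0.20, 0.01, True),
    Variant("rs_low_info", 0.60, 0.20, 0.01, True),
    Variant("rs_rare", 0.95, 0.01, 0.01, True),
    Variant("rs_missing", 0.95, 0.20, 0.08, True),
    Variant("rs_not_in_panel", 0.95, 0.20, 0.01, False),
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```

In practice such filters are usually applied with dedicated genotype tools; the sketch only makes the thresholds and their directions (inclusive or strict) explicit.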

The protein profiles in subcohort1 (N = 2481) were preprocessed by log2 transformation and adjusted for age, sex, the first three principal components, and the duration between blood draw and processing. The TWAS/FUSION framework [57] was used to establish genetic prediction models based on the rank-inverse normalized residuals of each protein of interest.
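The rank-inverse normalization step can be illustrated as follows. This is a sketch under stated assumptions: the covariate adjustment is taken as already done (the input is a list of residuals), ties receive their average rank, and the Blom offset (c = 3/8) is assumed because the text does not specify one. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to standard-normal quantiles of their ranks.

    Ties share their average rank; `c` is the rank offset (Blom's 3/8 assumed).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Made-up residuals after log2 transformation and covariate adjustment.
residuals = [0.3, -1.2, 2.5, 0.3, -0.4]
z = rank_inverse_normal(residuals)
print([round(x, 3) for x in z])
```

The transformed values depend only on the ordering of the residuals, which is why the procedure is robust to skewed protein abundance distributions.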

We determined potentially associated SNPs in both cis- and trans-regions for each protein of interest. Cis-regions were defined as within 1 Mb of the transcriptional start site (TSS) of the gene encoding the protein of interest. An FDR < 0.05 was used to determine significantly associated SNPs in cis-regions, and a P-value < 5 × 10⁻⁸ was used to determine associated SNPs in trans-regions. We then established protein prediction models using strand-unambiguous SNPs located near (within 100 kb of) such potentially associated SNPs as potential predictors. We implemented four statistical methods to construct the prediction models, namely Best Linear Unbiased Predictor (BLUP), Least Absolute Shrinkage and Selection Operator (LASSO), elastic net (enet), and top1 [58]. The cross-validation P-value represents the statistical significance of the linear regression model that assesses the relationship between predicted and measured levels. For each protein of interest, the model with the most significant cross-validation P-value among those developed using the four methods was used for downstream analysis. External validation was conducted using the subcohort2 (N = 820) dataset. In brief, the established genetic prediction models were applied to the genetic data of subcohort2 to generate genetically predicted levels for each protein of interest, which were compared with the measured levels. Protein prediction models with a model prediction R² ≥ 0.01 in subcohort1 and a correlation coefficient ≥ 0.01 (predicted vs measured protein levels) in subcohort2 were retained for the downstream association analysis.
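The model selection and retention rules described above can be sketched as two small functions. The function names, the dictionary layout, and the numbers are hypothetical, and the subcohort1 R² is assumed here to be the cross-validation R² reported for the selected model.

```python
def select_best_model(cv_results):
    """Return the method with the most significant cross-validation P-value.

    cv_results maps a method name to a (cv_r2, cv_pvalue) tuple.
    """
    return min(cv_results, key=lambda method: cv_results[method][1])


def retain_model(model_r2, external_corr, min_r2=0.01, min_corr=0.01):
    """Keep a model only if it meets both performance thresholds from the text."""
    return model_r2 >= min_r2 and external_corr >= min_corr


# Made-up cross-validation results for one protein in subcohort1.
cv_results = {
    "blup": (0.020, 1e-4),
    "lasso": (0.035, 2e-7),
    "enet": (0.034, 5e-7),
    "top1": (0.030, 1e-6),
}
best = select_best_model(cv_results)
# external_corr stands for the predicted-vs-measured correlation in subcohort2.
keep = retain_model(cv_results[best][0], external_corr=0.12)
print(best, keep)
```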

Associations of genetically predicted protein levels with PCa risk

To estimate the associations between genetically predicted protein concentrations and PCa risk, the validated protein prediction models were applied to summary statistics from a large GWAS meta-analysis of PCa risk involving 79 194 PCa cases and 61 112 controls of European ancestry [20,51,52,59,60]. The protein-PCa risk associations were determined using the TWAS/FUSION framework, which leverages the correlation between SNPs included in the prediction models, estimated from 1000 Genomes Project phase III data of European ancestry [57]. Associations between predicted protein levels and PCa risk with an FDR ≤ 0.05 were considered statistically significant.
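For orientation, the summary-statistics association test of the TWAS/FUSION approach [57] combines SNP-level GWAS z-scores with the model weights and the SNP correlation (LD) matrix as z = Σ wᵢzᵢ / sqrt(wᵀRw). The sketch below illustrates that statistic together with a false discovery rate adjustment. The Benjamini-Hochberg procedure is an assumption here, since the text does not name the FDR method, and all function names and numbers are illustrative.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Association z-score from model weights, GWAS z-scores, and an LD matrix."""
    m = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided P-value for a standard-normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up two-SNP model with LD correlation 0.3 between the SNPs.
z = twas_z([0.5, -0.2], [3.0, -1.0], [[1.0, 0.3], [0.3, 1.0]])
adj = bh_adjust([0.001, 0.04, 0.03, 0.5])
significant = [p <= 0.05 for p in adj]
print(round(z, 3), two_sided_p(z), significant)
```

Accounting for LD in the denominator matters because correlated predictors would otherwise inflate the variance estimate of the weighted sum incorrectly.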

Robustness of association results

To investigate the robustness of the observed associations, we performed three sensitivity analyses. First, focusing on the identified associated proteins, we trained their prediction models using data from INTERVAL subcohort2 and reran the association analyses with PCa risk. The same model development method as described above was used. Second, we conducted SMR analyses (software v.1.3.0) [61] to test whether the associated proteins showed consistent effects using this approach. For this analysis, PLINK was used to determine pQTLs in the subcohort1 dataset using linear regression. We then applied SMR to the pQTL results and the PCa GWAS summary statistics. The heterogeneity in dependent instruments (HEIDI) test was used to distinguish pleiotropy from linkage, and P_HEIDI ≥ 0.01 was used as the threshold for passing the HEIDI test. Third, we conducted 2ScML analysis [62] to examine the protein-PCa associations. For this analysis, valid instruments were selected using constrained maximum likelihood, and a minimum of three predictors within the prediction model was required. When the best prediction model for a protein of interest had fewer than three or more than 500 predictors, we used the SNPs from the next best prediction model. Associations showing a consistent effect direction and a nominal P-value < 0.05 were considered replicated.
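The replication rule and the predictor-count rule for the 2ScML analysis can be written down compactly. This is a sketch with hypothetical function names and inputs. Walking further down a ranked list of models, rather than stopping at the next best one, is an assumption that generalizes the rule stated in the text.

```python
def is_replicated(primary_effect, sensitivity_effect, sensitivity_p, alpha=0.05):
    """Replicated if the effect direction matches and the nominal P is below alpha."""
    same_direction = primary_effect * sensitivity_effect > 0
    return same_direction and sensitivity_p < alpha


def pick_model_for_2scml(ranked_models, min_snps=3, max_snps=500):
    """Return the first model, best first, whose predictor count is usable.

    ranked_models is a list of (method, n_predictors) tuples ordered by
    cross-validation significance. Returns None if no model qualifies.
    """
    for method, n_predictors in ranked_models:
        if min_snps <= n_predictors <= max_snps:
            return method
    return None


# Made-up inputs: the best model has too few predictors, so the next one is used.
chosen = pick_model_for_2scml([("lasso", 2), ("enet", 15), ("blup", 900)])
replicated = is_replicated(primary_effect=-0.21, sensitivity_effect=-0.15,
                           sensitivity_p=0.012)
print(chosen, replicated)
```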

Somatic variants of genes encoding associated proteins

For each of the genes encoding the proteins associated with PCa risk, we evaluated nonsynonymous somatic variants (missense, splice site, nonstop, nonsense, frameshift, in-frame, and translation start site mutations) in prostate tumor vs tumor-adjacent normal tissues from patients with PCa included in the TCGA dataset. The somatic variants in each TCGA-PRAD patient were determined using the MuTect2-processed mutation data deposited in the GDC data portal [63]. This analysis was performed using the “TCGAbiolinks” R package [64]. We compared the proportion of genes encoding the identified proteins that carried such variants with the corresponding proportions for genes encoding the 1308 proteins tested in the association analyses and for genes across the whole genome. We also evaluated somatic changes in TCGA uveal melanoma (TCGA-UVM) patients as a negative control. The proportions were compared using the z-test calculator on the Social Science Statistics website (https://www.socscistatistics.com/tests/ztest/default2.aspx).
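A comparison of two proportions of this kind can be reproduced with a pooled two-proportion z-test. The sketch below assumes that this is the test behind the cited online calculator. The function name is hypothetical and the counts are illustrative placeholders, not results from the study.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Placeholder counts: genes with somatic variants in a small gene set
# versus a larger background set.
z, p = two_proportion_z(18, 24, 600, 1308)
print(round(z, 2), p)
```

With a gene set as small as a few dozen genes, the normal approximation is rough, so an exact test could be considered as a cross-check.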

IPA and PPI analysis

IPA was performed to assess enriched pathways, networks, and molecular functions of the genes encoding PCa-associated proteins. The detailed methodology of this tool has been described elsewhere [65]. In brief, an “enrichment” score (Fisher exact test P-value) that measures the overlap between observed and predicted regulated gene sets was generated for each tested gene set. Genes encoding the 1308 proteins tested for association with PCa risk were set as the background. The most significant pathways and functions with an enrichment P-value less than 0.05 were reported. We also assessed the PPI network using the STRING database (version 11.5) at a confidence level of 0.400 [66], and the proteins were classified into three clusters using the K-means algorithm.
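The enrichment score can be illustrated with a right-tailed Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. This sketch assumes a right-tailed test and is not the IPA implementation. The function name and the overlap counts are hypothetical, while the background size of 1308 follows the text.

```python
from math import comb


def enrichment_p(overlap, list_size, pathway_size, background_size):
    """Probability of observing at least `overlap` pathway genes in the list.

    Genes are drawn without replacement from a background of
    `background_size` genes, of which `pathway_size` belong to the pathway.
    """
    upper = min(list_size, pathway_size)
    favorable = sum(
        comb(pathway_size, i) * comb(background_size - pathway_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(background_size, list_size)


# Illustrative numbers: 24 associated genes, a pathway of 50 genes,
# 4 genes in common, and the 1308 tested genes as background.
p_enrich = enrichment_p(4, 24, 50, 1308)
print(p_enrich)
```

Restricting the background to the tested genes, as the text describes, avoids overstating enrichment that merely reflects which proteins the assay platform covers.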

Drug repurposing analysis

For the identified associated proteins, we first evaluated whether there was evidence supporting their potential relevance to PCa using OpenTargets [67]. Specifically, proteins showing a positive overallAssociationScore with prostate cancer-related outcomes were retained. Focusing on those showing potential relevance, we then mined evidence on drugs targeting them using the DrugBank database [68]. Briefly, we searched for the proteins of interest and assessed whether any drugs had a group annotation indicating “approved”, “approved, investigational”, “approved, experimental”, etc. Drugs with a group annotation of only “experimental” were not considered. Furthermore, we conducted molecular docking analysis for the identified proteins and their corresponding candidate drugs [69]. The binding affinity scores (kcal/mol) were calculated for each protein-drug pair.
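The drug filtering rule can be sketched as a check on group annotations. Requiring the label "approved" is one reading of the criterion and is an assumption, since the text does not state how annotations other than those listed were handled. The function name and drug names are hypothetical, and the input is a plain dictionary rather than a DrugBank data structure.

```python
def is_candidate(groups):
    """Keep a drug if its group annotations include 'approved' (assumed rule)."""
    normalized = {g.strip().lower() for g in groups}
    return "approved" in normalized


# Hypothetical drugs with illustrative group annotations.
drug_groups = {
    "drug_a": ["approved"],
    "drug_b": ["approved", "investigational"],
    "drug_c": ["Approved", " experimental"],
    "drug_d": ["experimental"],
}
candidates = sorted(name for name, groups in drug_groups.items()
                    if is_candidate(groups))
print(candidates)
```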

Acknowledgements

We thank The PRACTICAL, CRUK, BPC3, CAPS, and PEGASUS consortia for making the PCa GWAS summary statistics publicly available. The authors also would like to thank all of the individuals for their participation in the parent PRACTICAL studies and all the researchers, clinicians, technicians, and administrative staff for their contribution to the studies. The Prostate cancer genome-wide association analyses are supported by the Canadian Institutes of Health Research, European Commission’s Seventh Framework Programme grant agreement n° 223175 (HEALTH-F2-2009-223175), Cancer Research UK Grants C5047/A7357, C1287/A10118, C1287/A16563, C5047/A3354, C5047/A10692, C16913/A6135, and The National Institute of Health (NIH) Cancer Post-Cancer GWAS initiative grant: No. 1 U19 CA 148537-01 (the GAME-ON initiative). We would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now PCUK), The Orchid Cancer Appeal, Rosetrees Trust, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. The Prostate Cancer Program of Cancer Council Victoria also acknowledge grant support from The National Health and Medical Research Council, Australia (126402, 209057, 251533, 396414, 450104, 504700, 504702, 504715, 623204, 940394, 614296), VicHealth, Cancer Council Victoria, The Prostate Cancer Foundation of Australia, The Whitten Foundation, PricewaterhouseCoopers, and Tattersall’s. EAO, DMK, and EMK acknowledge the Intramural Program of the National Human Genome Research Institute for their support. 
Genotyping of the OncoArray was funded by the US National Institutes of Health (NIH) [U19 CA 148537 for ELucidating Loci Involved in Prostate cancer SuscEptibility (ELLIPSE) project and X01HG007492 to the Center for Inherited Disease Research (CIDR) under contract number HHSN268201200008I] and by Cancer Research UK grant A8197/A16565. Additional analytic support was provided by NIH NCI U01 CA188392 (PI: Schumacher). Funding for the iCOGS infrastructure came from: the European Community’s Seventh Framework Programme under grant agreement n° 223175 (HEALTH-F2-2009-223175) (COGS), Cancer Research UK (C1287/A10118, C1287/A 10710, C12292/A11174, C1281/A12014, C5047/A8384, C5047/A15007, C5047/A10692, C8197/A16565), the National Institutes of Health (CA128978, CA128813) and Post-Cancer GWAS initiative (1U19 CA148537, 1U19 CA148065 and 1U19 CA148112—the GAME-ON initiative), the Department of Defense (W81XWH-10-1-0341), the Canadian Institutes of Health Research (CIHR) for the CIHR Team in Familial Risks of Breast Cancer, Komen Foundation for the Cure, the Breast Cancer Research Foundation, and the Ovarian Cancer Research Fund. The BPC3 was supported by the U.S. National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233 to D.J.H., U01-CA98710 to S.M.G., U01-CA98216 to E.R., and U01-CA98758 to B.E.H., and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics). CAPS GWAS study was supported by the Swedish Cancer Foundation (grant no 09-0677, 11-484, 12-823), the Cancer Risk Prediction Center (CRisP; www.crispcenter.org), a Linneus Centre (Contract ID 70867902) financed by the Swedish Research Council, Swedish Research Council (grant no K2010-70X-20430-04-3, 2014-2269). PEGASUS was supported by the Intramural Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health. 
Participants in the INTERVAL randomised controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (www.nhsbt.nhs.uk), which has supported field work and other elements of the trial. DNA extraction and genotyping were co-funded by the National Institute for Health Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk) and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014) [*]. The academic coordinating centre for INTERVAL was supported by core funding from the: NIHR Blood and Transplant Research Unit in Donor Health and Genomics (NIHR BTRU-2014-10024), UK Medical Research Council (MR/L003120/1), British Heart Foundation (SP/09/002; RG/13/13/30194; RG/18/13/33946) and NIHR Cambridge BRC (BRC-1215-20014) [*]. A complete list of the investigators and contributors to the INTERVAL trial is provided in reference [**]. The academic coordinating centre would like to thank blood donor centre staff and blood donors for participating in the INTERVAL trial. *The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. **Di Angelantonio E, Thompson SG, Kaptoge SK, Moore C, Walker M, Armitage J, Ouwehand WH, Roberts DJ, Danesh J, INTERVAL Trial Group. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors. Lancet. 2017 Nov 25;390(10110):2360-2371.

Author contributions

L.W. conceived the study. C.W. and J.Z. contributed to the study design and prediction model building. S.L., H.Z., and D.H.G. performed model building and statistical analyses. J.Z. performed the drug repurposing curation. M.A.A. performed the molecular docking analysis. H.Z. contributed to the bioinformatics and pathway analyses. S.L. prepared the figures. L.W. and H.Z. wrote the first version of the manuscript, and J.Z. and S.L. significantly revised and/or wrote additional content. S.L., P.S., T.L., S.F., H.-W.D., H.Y., A.B., and H.Y. contributed to manuscript revision and/or INTERVAL data management. All authors have reviewed and approved the final manuscript.

Conflict of Interest statement

The authors declare that they have no conflict of interest.

Funding

This research is supported by the University of Hawaii Cancer Center. Lang Wu is also supported by a V Foundation V Scholar Award and R01CA263494-01A1. Chong Wu is supported by NIA 1R03AG070669 and NCI R01CA263494-01A1. Jingjing Zhu was supported by an NCI T32 Postdoctoral Fellowship (T32 CA229110: Multidisciplinary Training in Ethnic Diversity and Cancer Disparities).

Data Availability

Full association results of this study are available from the corresponding author upon request. For the INTERVAL SomaLogic study, the individual-level genotype and protein data, and full summary association results from the genetic analysis, are available through the European Genome-phenome Archive (accession number EGAS00001002555). The summary statistics of genome-wide association studies of prostate cancer in the PRACTICAL consortium are available at http://practical.icr.ac.uk/blog/?page_id=8164. Genetic prediction models for the 24 significantly associated proteins are provided in Supplementary Table 11. The models for other proteins can be requested from the corresponding author.

References

1.

Sung
,
H.
,
Ferlay
,
J.
,
Siegel
,
R.L.
et al. (
2021
)
Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries
.
CA Cancer J Clin
,
71
,
209
249
.

2.

Siegel
,
R.L.
,
Miller
,
K.D.
,
Wagle
,
N.S.
et al. (
2023
)
Cancer statistics, 2023
.
CA Cancer J Clin
,
73
,
17
48
.

3.

Gaudreau
,
P.-O.
,
Stagg
,
J.
,
Soulières
,
D.
et al. (
2016
)
The present and future of biomarkers in prostate cancer: proteomics, genomics, and immunology advancements
.
Biomark Cancer
,
8
,
15
33
.

4.

Sardana
,
G.
,
Dowell
,
B.
and
Diamandis
,
E.P.
(
2008
)
Emerging biomarkers for the diagnosis and prognosis of prostate cancer
.
Clin Chem
,
54
,
1951
1960
.

5.

David Crawford
,
E.
,
Ventii
,
K.
and
Shore
,
N.D.
(
2014
)
New biomarkers in prostate cancer
.
ONCOLOGY (United States)
,
28
,
135
142
.

6.

Stephan
,
C.
,
Rittenhouse
,
H.
,
Hu
,
X.
et al. (
2014
)
Prostate-specific antigen (PSA) screening and new biomarkers for prostate cancer (PCa)
.
EJIFCC
,
25
,
55
78
.

7.

Schröder
,
F.H.
,
Hugosson
,
J.
,
Roobol
,
M.J.
et al. (
2014
)
Screening and prostate cancer mortality: results of the European randomised study of screening for prostate cancer (ERSPC) at 13 years of follow-up
.
Lancet
,
384
,
2027
2035
.

8.

Perron
,
L.
,
Moore
,
L.
,
Bairati
,
I.
et al. (
2002
)
PSA screening and prostate cancer mortality
.
CMAJ
,
166
,
586
591
.

9.

Draisma
,
G.
,
Etzioni
,
R.
,
Tsodikov
,
A.
et al. (
2009
)
Lead time and overdiagnosis in prostate-specific antigen screening: importance of methods and context
.
J Natl Cancer Inst
,
101
,
374
383
.

10.

Nakashima
,
J.
,
Tachibana
,
M.
,
Horiguchi
,
Y.
et al. (
2000
)
Serum interleukin 6 as a prognostic factor in patients with prostate cancer
.
Clin Cancer Res
,
6
,
2702
2706
.

11.

Stephan
,
C.
,
Meyer
,
H.A.
,
Cammann
,
H.
et al. (
2006
)
Improved prostate cancer detection with a human kallikrein 11 and percentage free PSA-based artificial neural network
.
Biol Chem
,
387
,
801
805
.

12.

Uetsuki
,
H.
,
Tsunemori
,
H.
,
Taoka
,
R.
et al. (
2005
)
Expression of a novel biomarker, EPCA, in adenocarcinomas and precancerous lesions in the prostate
.
J Urol
,
174
,
514
518
.

13.

Paul
,
B.
,
Dhir
,
R.
,
Landsittel
,
D.
et al. (
2005
)
Detection of prostate cancer with a blood-based assay for early prostate cancer antigen
.
Cancer Res
,
65
,
4097
4100
.

14.

Burgess
,
S.
,
Small
,
D.S.
and
Thompson
,
S.G.
(
2017
)
A review of instrumental variable estimators for Mendelian randomization
.
Stat Methods Med Res
,
26
,
2333
2355
.

15.

Farashi
,
S.
,
Kryza
,
T.
,
Clements
,
J.
et al. (
2019
)
Post-GWAS in prostate cancer: from genetic association to biological contribution
.
Nat Rev Cancer
,
19
,
46
59
.

16.

Wu
,
L.
,
Shu
,
X.
,
Bao
,
J.
et al. (
2019
)
Analysis of over 140,000 European descendants identifies genetically predicted blood protein biomarkers associated with prostate cancer risk
.
Cancer Res
,
79
,
4592
4598
.

17.

Conti
,
D.V.
,
Darst
,
B.F.
,
Moss
,
L.C.
et al. (
2021
)
Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction
.
Nat Genet
,
53
,
65
75
.

18.

Eeles
,
R.A.
,
Al Olama
,
A.A.
,
Benlloch
,
S.
et al. (
2013
)
Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array
.
Nat Genet
,
45
,
385
391
.

19.

Al Olama
,
A.A.
,
Kote-Jarai
,
Z.
,
Berndt
,
S.I.
et al. (
2014
)
A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer
.
Nat Genet
,
46
,
1103
1109
.

20.

Schumacher
,
F.R.
,
Al Olama
,
A.A.
,
Berndt
,
S.I.
et al. (
2018
)
Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci
.
Nat Genet
,
50
,
928
936
.

21.

Goufmana
,
E.I.
,
Iakovlev
,
V.N.
,
Tikhonov
,
N.B.
et al. (
2015
)
Quantification of autoantibodies to plasminogen in plasma of patients with cancer
.
Cancer Biomark
,
15
,
281
287
.

22.

Liu
,
Q.-Y.
,
Rubin
,
M.A.
,
Omene
,
C.
et al. (
1998
)
Fas ligand is constitutively secreted by prostate cancer cells in vitro
.
Clin Cancer Res
,
4
,
1803
1811
.

23.

Han
,
H.
,
Lee
,
H.H.
,
Choi
,
K.
et al. (
2021
)
Prostate epithelial genes define therapy-relevant prostate cancer molecular subtype
.
Prostate Cancer Prostatic Dis
,
24
,
1080
1092
.

24.

Sung
,
S.-Y.
,
Kubo
,
H.
,
Shigemura
,
K.
et al. (
2006
)
Oxidative stress induces ADAM9 protein expression in human prostate cancer cells
.
Cancer Res
,
66
,
9519
9526
.

25.

Lindahl
,
C.
,
Simonsson
,
M.
,
Bergh
,
A.
et al. (
2009
)
Increased levels of macrophage-secreted cathepsin S during prostate cancer progression in TRAMP mice and patients
.
Cancer Genomics Proteomics
,
6
,
149
159
.

26.

Bouffard
,
E.
,
Mauriello Jimenez
,
C.
,
El Cheikh
,
K.
et al. (
2019
)
Efficient photodynamic therapy of prostate cancer cells through an improved targeting of the cation-independent mannose 6-phosphate receptor
.
Int J Mol Sci
,
20
,
2809
.

27.

Liu
,
G.
,
Lu
,
S.
,
Wang
,
X.
et al. (
2013
)
Perturbation of NK cell peripheral homeostasis accelerates prostate carcinoma metastasis
.
J Clin Invest
,
123
,
4410
4422
.

28.

Haiman
,
C.A.
,
Stram
,
D.O.
,
Vickers
,
A.J.
et al. (
2013
)
Levels of beta-microseminoprotein in blood and risk of prostate cancer in multiple populations
.
J Natl Cancer Inst
,
105
,
237
243
.

29.

Rahrmann
,
E.P.
,
Collier
,
L.S.
,
Knutson
,
T.P.
et al. (
2009
)
Identification of PDE4D as a proliferation promoting factor in prostate cancer using a sleeping beauty transposon-based somatic mutagenesis screen
.
Cancer Res
,
69
,
4388
4397
.

30.

Pereira
,
M.S.
,
de
Almeida
,
G.C.
,
Pinto
,
F.
et al. (
2016
)
SPINT2 deregulation in prostate carcinoma
.
J Histochem Cytochem
,
64
,
32
41
.

31.

Jin
,
F.
,
Irshad
,
S.
,
Yu
,
W.
et al. (
2013
)
ERK and AKT Signaling drive MED1 overexpression in prostate cancer in association with elevated proliferation and tumorigenicity
.
Mol Cancer Res
,
11
,
736
747
.

32.

Kawahara
,
R.
,
Recuero
,
S.
,
Nogueira
,
F.C.S.
et al. (
2019
)
Tissue proteome signatures associated with five grades of prostate cancer and benign prostatic hyperplasia
.
Proteomics
,
19
,
1900174
.

33.

Gu
,
Z.
,
Rubin
,
M.A.
,
Yang
,
Y.
et al. (
2005
)
Reg IV: a promising marker of hormone refractory metastatic prostate cancer
.
Clin Cancer Res
,
11
,
2237
2243
.

34.

Hyer
,
M.L.
,
Sudarshan
,
S.
,
Schwartz
,
D.A.
et al. (
2003
)
Quantification and characterization of the bystander effect in prostate cancer cells following adenovirus-mediated FasL expression
.
Cancer Gene Ther
,
10
,
330
339
.

35.

Alkhateeb
,
A.
,
Rezaeian
,
I.
,
Singireddy
,
S.
et al. (
2019
)
Transcriptomics signature from next-generation sequencing data reveals new transcriptomic biomarkers related to prostate cancer
.
Cancer Inform
,
18
,
1176935119835522
.

36.

Smyth
,
N.
,
Vatansever
,
S.H.
,
Murray
,
P.
et al. (
1999
)
Absence of basement membranes after targeting the LAMC1 gene results in embryonic lethality due to failure of endoderm differentiation
.
J Cell Biol
,
144
,
151
160
.

37.

Givant-Horwitz
,
V.
,
Davidson
,
B.
and
Reich
,
R.
(
2005
)
Laminin-induced signaling in tumor cells
.
Cancer Lett
,
223
,
1
10
.

38.

Sprenger
,
C.C.T.
,
Drivdahl
,
R.H.
,
Woodke
,
L.B.
et al. (
2008
)
Senescence-induced alterations of laminin chain expression modulate tumorigenicity of prostate cancer cells
.
Neoplasia
,
10
,
1350
1361
.

39.

Pasqualini
,
L.
,
Bu
,
H.
,
Puhr
,
M.
et al. (
2015
)
miR-22 and miR-29a are members of the androgen receptor cistrome modulating LAMC1 and Mcl-1 in prostate cancer
.
Mol Endocrinol
,
29
,
1037
1054
.

40.

Nishikawa
,
R.
,
Goto
,
Y.
,
Kojima
,
S.
et al. (
2014
)
Tumor-suppressive microRNA-29s inhibit cancer cell migration and invasion via targeting LAMC1 in prostate cancer
.
Int J Oncol
,
45
,
401
410
.

41.

Srinivasan
,
S.
,
Kumar
,
R.
,
Koduru
,
S.
et al. (
2010
)
Inhibiting TNF-mediated signaling: a novel therapeutic paradigm for androgen independent prostate cancer
.
Apoptosis
,
15
,
153
161
.

42.

Kirchhofer
,
D.
,
Peek
,
M.
,
Lipari
,
M.T.
et al. (
2005
)
Hepsin activates pro-hepatocyte growth factor and is inhibited by hepatocyte growth factor activator inhibitor-1B (HAI-1B) and HAI-2
.
FEBS Lett
,
579
,
1945
1950
.

43.

Szabo
,
R.
,
Hobson
,
J.P.
,
List
,
K.
et al. (
2008
)
Potent inhibition and global co-localization implicate the transmembrane Kunitz-type serine protease inhibitor hepatocyte growth factor activator inhibitor-2 in the regulation of epithelial matriptase activity
.
J Biol Chem
,
283
,
29495
29504
.

44.

Tsai
,
C.-H.
,
Teng
,
C.-H.
,
Tu
,
Y.-T.
et al. (
2014
)
HAI-2 suppresses the invasive growth and metastasis of prostate cancer through regulation of matriptase
.
Oncogene
,
33
,
4643
4652
.

45.

Bergum
,
C.
and
List
,
K.
(
2010
)
Loss of the matriptase inhibitor HAI-2 during prostate cancer progression
.
Prostate
,
70
,
1422
1428
.

46.

Lei
,
D.
,
Sturgis
,
E.M.
,
Wang
,
L.-E.
et al. (
2010
)
FAS and FASLG genetic variants and risk for second primary malignancy in patients with squamous cell carcinoma of the head and NeckFAS and FASLG polymorphisms and second primary malignancies
.
Cancer Epidemiol Biomark Prev
,
19
,
1484
1491
.

47.

Ur Rasool
,
R.
,
Natesan
,
R.
,
Deng
,
Q.
et al. (
2019
)
CDK7 inhibition suppresses castration-resistant prostate cancer through MED1 inactivation
.
Cancer Discov
,
9
,
1538
1555
.

48.

Soff
,
G.A.
,
Sanderowitz
,
J.
,
Gately
,
S.
et al. (
1995
)
Expression of plasminogen activator inhibitor type 1 by human prostate carcinoma cells inhibits primary tumor growth, tumor-associated angiogenesis, and metastasis to lung and liver in an athymic mouse model
.
J Clin Invest
,
96
,
2593
2600
.

49.

Lokshin
,
A.
,
Mikhaleva
,
L.M.
,
Goufman
,
E.I.
et al. (
2021
)
Proteolyzed variant of IgG with free C-terminal lysine as a biomarker of prostate cancer
.
Biology (Basel)
,
10
,
817
.

50.

Mancuso
,
N.
,
Gayther
,
S.
,
Gusev
,
A.
et al. (
2018
)
Large-scale transcriptome-wide association study identifies new prostate cancer risk regions
.
Nat Commun
,
9
,
4079
.

51.

Liu
,
D.
,
Zhu
,
J.
,
Zhou
,
D.
et al. (
2022
)
A transcriptome-wide association study identifies novel candidate susceptibility genes for prostate cancer risk
.
Int J Cancer
,
150
,
80
90
.

52.

Wu
,
L.
,
Wang
,
J.
,
Cai
,
Q.
et al. (
2019
)
Identification of novel susceptibility loci and genes for prostate cancer risk: a transcriptome-wide association study in over 140,000 European descendants
.
Cancer Res
,
79
,
3192
3204
.

53. Wang, J., Wang, L., Chen, S. et al. (2020) PKMYT1 is associated with prostate cancer malignancy and may serve as a therapeutic target. Gene, 744, 144608.

54. Abdel-Wahab, B.A., Walbi, I.A., Albarqi, H.A. et al. (2021) Roflumilast protects from cisplatin-induced testicular toxicity in male rats and enhances its cytotoxicity in prostate cancer cell line. Role of NF-κB-p65, cAMP/PKA and Nrf2/HO-1, NQO1 signaling. Food Chem Toxicol, 151, 112133.

55. Sun, B.B., Maranville, J.C., Peters, J.E. et al. (2018) Genomic atlas of the human plasma proteome. Nature, 558, 73–79.

56. Astle, W.J., Elding, H., Jiang, T. et al. (2016) The allelic landscape of human blood cell trait variation and links to common complex disease. Cell, 167, 1415–1429.

57. Gusev, A., Ko, A., Shi, H. et al. (2016) Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet, 48, 245–252.

58. Zhong, H., Liu, S., Zhu, J. et al. (2023) Associations between genetically predicted levels of blood metabolites and pancreatic cancer risk. Int J Cancer, 153, 103–110.

59. Liu, D., Zhu, J., Zhao, T. et al. (2021) Associations between genetically predicted plasma N-Glycans and prostate cancer risk: analysis of over 140,000 European descendants. Pharmgenomics Pers Med, 14, 1211–1220.

60. Wu, C., Zhu, J., King, A. et al. (2021) Novel strategy for disease risk prediction incorporating predicted gene expression and DNA methylation data: a multi-phased study of prostate cancer. Cancer Commun, 41, 1387–1397.

61. Zhu, Z., Zhang, F., Hu, H. et al. (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet, 48, 481–487.

62. Xue, H., Shen, X. and Pan, W. (2023) Causal inference in transcriptome-wide association studies with invalid instruments and GWAS summary data. J Am Stat Assoc, 0, 1–13.

63. Benjamin, D., Sato, T., Cibulskis, K. et al. (2019) Calling somatic SNVs and indels with Mutect2. bioRxiv, 861054.

64. Mounir, M., Lucchetta, M., Silva, T.C. et al. (2019) New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEX. PLoS Comput Biol, 15, e1006701.

65. Krämer, A., Green, J., Pollard, J., Jr. et al. (2014) Causal analysis approaches in ingenuity pathway analysis. Bioinformatics, 30, 523–530.

66. Szklarczyk, D., Gable, A.L., Nastou, K.C. et al. (2021) The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res, 49, D605–D612.

67. Koscielny, G., An, P., Carvalho-Silva, D. et al. (2017) Open targets: a platform for therapeutic target identification and validation. Nucleic Acids Res, 45, D985–D994.

68. Wishart, D.S., Knox, C., Guo, A.C. et al. (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res, 34, D668–D672.

69. Alam, M.A., Shen, H. and Deng, H.-W. (2022) A robust kernel machine regression towards biomarker selection in multi-omics datasets of osteoporosis for drug discovery. arXiv preprint arXiv:2201.05060.

Author notes

Hua Zhong, Jingjing Zhu, and Shuai Liu contributed equally to this work and are co-first authors.

Chong Wu and Lang Wu jointly supervised this work and are co-senior authors.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)