Circulating proteins and risk of pancreatic cancer: a case-subcohort study among Chinese adults

Abstract Background Pancreatic cancer has a very poor prognosis. Biomarkers that may help predict or diagnose pancreatic cancer may lead to earlier diagnosis and improved survival. Methods The prospective China Kadoorie Biobank (CKB) recruited 512 891 adults aged 30–79 years during 2004–08, recording 702 incident cases of pancreatic cancer during 9 years of follow-up. We conducted a case-subcohort study measuring 92 proteins in 610 cases and a subcohort of 623 individuals, using the OLINK immuno-oncology panel in stored baseline plasma samples. Cox regression with the Prentice pseudo-partial likelihood was used to estimate adjusted hazard ratios (HRs) for risk of pancreatic cancer by protein levels. Results Among 1233 individuals (including 610 cases), several chemokines, interleukins, growth factors and membrane proteins were associated with risk of pancreatic cancer, with adjusted HRs per 1 standard deviation (SD) of 0.86 to 1.86, including monocyte chemotactic protein 3 (MCP3/CCL7) {1.29 [95% CI (confidence interval) (1.10, 1.51)]}, angiopoietin-2 (ANGPT2) [1.27 (1.10, 1.48)], interleukin-18 (IL18) [1.24 (1.07, 1.43)] and interleukin-6 (IL6) [1.21 (1.06, 1.38)]. Associations between some proteins [e.g. matrix metalloproteinase-7 (MMP7), hepatocyte growth factor (HGF) and tumour necrosis factor receptor superfamily member 9 (TNFRSF9)] and risk of pancreatic cancer were time-varying, with higher levels associated with higher short-term risk. Within the first year, the discriminatory ability of a model with known risk factors (age, age squared, sex, region, smoking, alcohol, education, diabetes and family history of cancer) was increased when several proteins were incorporated (weighted C-statistic changed from 0.85 to 0.99; P for difference = 4.5 × 10⁻⁵), although only a small increase in discrimination (0.77 to 0.79, P = 0.04) was achieved for long-term risk. Conclusions Several plasma proteins were associated with subsequent diagnosis of pancreatic cancer. The potential clinical utility of these biomarkers warrants further investigation.


Introduction
Pancreatic cancer has a 5-year survival of 5-10% and a median survival of 4-6 months. 1 Most patients are diagnosed at a late stage when surgical resection is not possible and treatment options are limited. 2 This is mainly due to patients developing symptoms late in the course of disease, symptoms being non-specific, 3 lack of effective screening tools, and challenges in diagnosis, 4 which is currently based mainly on computed tomography (CT) and/or magnetic resonance imaging (MRI) with magnetic resonance cholangiopancreatography (MRCP), or biopsy or fine-needle aspiration using endoscopic ultrasound (EUS). 5 Non-invasive tests of predictive utility therefore have the potential to transform patient care.
The aetiology of pancreatic cancer remains poorly understood, although several risk factors have been identified, such as diabetes, chronic pancreatitis, smoking, family history of certain cancers and some germline mutations, adiposity, alcohol consumption, gallstones, dietary factors and some chronic infections. 1,6-10 Inflammation plays an important role in pancreatic carcinogenesis. 11,12 Precursor lesions exist but many are undetectable by imaging. 13 However, pancreatic intraepithelial neoplasia (PanIN) lesions may secrete factors that modify their microenvironment. 14 Although some risk factors, signs and symptoms can help identify individuals at high risk, predicting risk of pancreatic cancer is challenging. A few biomarkers have been identified, carbohydrate antigen 19-9 (CA 19-9) being the most well established, but their discriminatory ability is limited and they are not recommended for screening asymptomatic individuals. 15,16 Other tumour markers and proteins have been studied but they have not been shown to substantially improve on the sensitivity and specificity of CA 19-9 alone. 17 A compendium of secreted proteins overexpressed in pancreatic cancer has been published 18 and such blood-based biomarkers may have a role in predicting or diagnosing the disease. In this case-subcohort study within the China Kadoorie Biobank (CKB), we aimed to examine the prospective associations of >90 protein biomarkers with development of pancreatic cancer and to assess the extent to which they could help predict risk of a future diagnosis.

Study population
The CKB is a prospective cohort study of 512 891 Chinese adults aged 30-79 years who were recruited from 10 geographically defined localities (five urban and five rural) in China during 2004-08. 19 Ethics approval from the Oxford University Tropical Research Ethics Committee, the Chinese Centre for Disease Control and Prevention (CDC) Ethical Review Committee and the local CDC of each study area was obtained, and all participants provided written informed consent.

Case-subcohort study of pancreatic cancer
We designed a case-subcohort study to examine the associations of proteins with risk of pancreatic cancer. All 700 pancreatic cancer cases (ICD-10 C25) that accumulated until 1 January 2016 and had an available plasma sample were included. A subcohort of 700 participants was sampled using simple random sampling from a randomly selected subset of the baseline cohort.

Measurement of protein biomarkers
We used the OLINK immuno-oncology panel, which applies proximity extension assay (PEA) technology to obtain normalized protein expression (NPX) values for 92 proteins. These proteins are involved in tumour immunity, chemotaxis, vascular and tissue remodelling, apoptosis and autophagy (Supplementary Table S1).

Statistical analysis
In total, plasma samples of 1397 participants were assayed. Participants with a history of cancer at baseline (n = 21) were excluded from the main analyses. Moreover, 145 samples with either a quality control warning or precipitation (partly overlapping with those with prior cancer) were also excluded, leaving 1233 individuals (610 cases and 623 subcohort members) for the main analyses.
The associations between proteins and risk of pancreatic cancer were assessed using Cox proportional hazards models, using the Prentice pseudo-partial likelihood. 20 Models in the main analysis were stratified by region and adjusted for age, age squared, sex, smoking, alcohol drinking, educational attainment, diabetes, and time since last meal, and time in study was used as the time scale.
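To make the risk-set construction behind the Prentice pseudo-partial likelihood concrete, the sketch below evaluates the log pseudo-likelihood for a single covariate. It is a minimal illustration under simplifying assumptions, not the analysis code used in this study: the function name, argument names and toy data are hypothetical, and stratification by region, covariate adjustment and special handling of tied event times are all omitted.

```python
import math


def prentice_log_pseudolikelihood(beta, times, events, in_subcohort, x):
    """Log pseudo-partial likelihood for one covariate (Prentice weighting).

    At each case's event time the risk set holds the subcohort members still
    under follow-up plus the case itself; a case sampled outside the
    subcohort never appears in any other risk set.
    """
    n = len(times)
    loglik = 0.0
    for i in range(n):
        if not events[i]:
            continue
        t_i = times[i]
        # Subcohort members still at risk at t_i.
        risk = [j for j in range(n) if in_subcohort[j] and times[j] >= t_i]
        if not in_subcohort[i]:
            risk.append(i)  # non-subcohort case enters only at its own event time
        denom = sum(math.exp(beta * x[j]) for j in risk)
        loglik += beta * x[i] - math.log(denom)
    return loglik


# Hypothetical toy data: one case outside the subcohort, three subcohort members.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 0]
in_subcohort = [False, True, True, True]
x = [1.0, 0.0, 1.0, 0.0]
print(prentice_log_pseudolikelihood(0.0, times, events, in_subcohort, x))
```

In practice the estimate of the log HR is the value of beta that maximizes this function, and robust variance estimation is needed because the risk sets are built from a sample rather than the full cohort.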
Proteins were standardized (i.e. values of each marker were divided by its standard deviation) in analyses where they were treated as continuous variables. For each marker, adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) per 1 standard deviation (SD) increase in protein expression were estimated. The shape of the associations was assessed by splitting protein values into groups at their quartiles and additionally by using splines (penalized splines with four degrees of freedom). The plausibility of the proportional hazards assumption was assessed using plots of scaled Schoenfeld residuals and the associated chi-square tests. 21,22 We explored time dependence of associations by examining whether associations varied by the number of years between blood collection and time at risk (four groups: <1, 1 to <2, 2 to <5, ≥5 years) and by including an interaction with log(time + 0.01).
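One way to write the interaction model with log(time + 0.01) is sketched below. The notation is an assumption introduced for illustration: x denotes the standardized protein level, z the adjustment covariates, t the time in study, and β, γ and θ the corresponding coefficients; stratification of the baseline hazard by region is omitted for brevity.

```latex
\[
h(t \mid x, z) = h_0(t)\,\exp\{\beta x + \gamma\, x \log(t + 0.01) + \theta^{\top} z\},
\qquad
\mathrm{HR}(t) = \exp\{\beta + \gamma \log(t + 0.01)\}.
\]
```

Under this parameterization, HR(t) is the hazard ratio per 1 SD higher protein level at time t. A negative γ corresponds to an association that is strongest shortly after blood collection and attenuates with longer follow-up, and γ = 0 recovers the proportional hazards model.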
We interpreted P-values <0.05 as providing some evidence of an association. In addition, transformed P-values (−log P) were plotted against their expected values based on the Rényi decomposition, 23 and adjusted P-values were calculated using the false discovery rate correction of Benjamini and Hochberg 24 to aid interpretation.
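The Benjamini and Hochberg adjustment can be sketched in a few lines. The function name and the example P-values below are hypothetical and serve only to illustrate the step-up calculation; they are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four proteins.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is the smallest false discovery rate at which the corresponding protein would still be declared associated, which is why the values are capped at 1 and forced to be non-decreasing in the raw P-value.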
Multivariable models with several proteins were fitted using the approach of Cox and Battey. 25 Discrimination of risk prediction models was assessed using a weighted C-index. 26 Further details are given in Supplementary Methods, available as Supplementary data at IJE online.
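As a rough illustration of how a weighted concordance statistic can be computed from case-subcohort data, the sketch below counts each usable pair with the product of the two participants' weights. The weighting scheme shown (weight 1 for cases and the inverse of the subcohort sampling fraction for non-cases) is an assumption made for this example, as are the function name and the toy data; the estimator actually used is the one described in reference 26 and the Supplementary Methods.

```python
def weighted_c_index(times, events, risk_scores, weights):
    """Weighted concordance: each usable pair counts with weight w_i * w_j."""
    concordant = 0.0
    total = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier member of a usable pair must be a case
        for j in range(n):
            if times[j] <= times[i]:
                continue  # j must outlast case i (this also skips j == i)
            w = weights[i] * weights[j]
            total += w
            if risk_scores[i] > risk_scores[j]:
                concordant += w
            elif risk_scores[i] == risk_scores[j]:
                concordant += 0.5 * w  # ties in predicted risk count as half
    return concordant / total


# Hypothetical toy data: two cases (weight 1) and two non-cases
# assumed to represent 10 cohort members each.
times = [1, 2, 3, 4]
events = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.5, 0.1]
print(weighted_c_index(times, events, scores, [1, 1, 10, 10]))
```

Up-weighting non-cases aims to recover the concordance that would be seen in the full cohort, where non-cases vastly outnumber cases; without weights, the statistic describes the artificially case-enriched sample instead.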

Characteristics of individuals in the case-subcohort study
Among the 1233 participants included in the main analysis, pancreatic cancer cases were older at study baseline than subcohort participants [mean age 60.3 (SD 9.0) vs 52.1 (10.5) years]. The proportion of females was lower among cases than in the subcohort (50.6% vs 60.9%), but the proportions living in urban regions and levels of adiposity were similar. Moreover, cases were more likely to have smoked regularly, to have consumed alcohol regularly, to have rated their health as poor and to have diabetes at baseline (13.6% vs 6.3%). Among pancreatic cancer cases, the median time from study entry to diagnosis was 5.3 years [interquartile range (IQR) 4.3, range 0.05 to 11.1] and mean age at diagnosis was 66.0 (SD 8.9) years (Table 1).
Associations were similar when models were only adjusted for age and sex and stratified by region (Supplementary Figure S3 and Supplementary Table S5, available as Supplementary data at IJE online). When dichotomizing the 13 proteins for which ≥500 individuals had values below LOD into less than and greater than or equal to LOD, tumour necrosis factor (TNF) was associated with a higher risk of pancreatic cancer [HR = 2.36, 95% CI (1.31, 4.25); P = 0.0043]. The findings were otherwise in concordance with the analysis treating them as continuous variables, although associations tended to be less precisely estimated (Supplementary Figure S4, available as Supplementary data at IJE online). Supplementary Figure S5, available as Supplementary data at IJE online, shows associations of proteins per SD higher NPX by protein class. Among the chemokines, MCP3 was most strongly associated with pancreatic cancer risk. C-C motif chemokines showed a trend towards a positive association, whereas most C-X-C motif chemokines showed no evidence of association. Among interleukins, IL2, IL6 and IL18 were positively associated and IL4 was inversely associated with risk. A few members of the TNF(R) superfamily, two growth factors (ANGPT2 and HGF) and two enzymes (GZMA and HO1) were positively associated with risk. Of the membrane proteins, CD4 and CD8A were positively associated with risk. When examining the shape of associations for proteins identified as being associated with pancreatic cancer risk among proteins with <500 individuals with values below LOD (Figure 3), the associations appear monotonic and broadly consistent with a linear increase in risk.
Moreover, some of the proteins not found to be significantly associated with risk of pancreatic cancer in the analysis of linear associations show monotonic trends with risk (Supplementary Figures S6 and S7). For proteins with evidence of time-varying associations, HRs were higher in the first few years of follow-up and attenuated afterwards, as expected. Among pancreatic cancer cases, 39 (6.4%) were diagnosed within a year from baseline (i.e. when blood was collected), 47 (7.7%) at 1 to less than 2 years, 199 (32.6%) at 2 to less than 5 years and 325 (53.3%) at 5 years or more after blood collection. Models including an interaction of the protein with a function of time showed that for proteins showing evidence of a time-varying association, the HR was initially greater than 1 and decreased with log time, except for IL1α, for which there was initially an inverse association which attenuated over time (Supplementary Table S6). When exploring the time dependence of associations by examining whether HRs varied by the number of years between blood collection and

Subgroup analyses
Analysis of subgroups showed a few differences in associations by age (Supplementary Figure S12

Sensitivity analysis
Results were similar when age was used as the underlying time scale with delayed entry at age at baseline (Supplementary Figure S17, Supplementary Tables S7 and S8, available as Supplementary data at IJE online), and when individuals with a history of cancer or with samples with quality control (QC) warnings or precipitation were included (Supplementary Table S9, available as Supplementary data at IJE online).

Multivariable analyses for risk prediction
Sets of proteins identified using the Cox-Battey approach largely overlapped with the proteins identified in analyses where proteins were fitted one at a time (Supplementary Table S10, available as Supplementary data at IJE online). MCP3 and ANGPT2 were identified in all subsets.
Adding proteins to a model with established risk factors (age, age squared, sex, region, smoking, alcohol, education, diabetes and family history of cancer) led to small increases in discriminatory ability. Adding ANGPT2 and MCP3 yielded a small increase in the weighted C statistic, from 0.   (Table 2). Adding squared terms for all proteins yielded a C of 0.990 (se 0.007), but this model may be unstable due to the large number of explanatory variables and relatively small number of events.

Discussion
In this case-subcohort study of Chinese adults, several protein biomarkers were shown to be associated with pancreatic cancer risk, including chemokines, interleukins, growth factors, enzymes and membrane proteins, with most showing a dose-response association. Some of the associations varied over follow-up time, suggesting that the associated risks may be elevated in the years preceding diagnosis and these proteins may therefore have potential utility in predicting short-term risk. Multivariable analyses showed that adding these protein markers to conventional risk factors may lead to some improvement in discrimination when predicting pancreatic cancer risk, particularly in the short term.
Figure 3 Adjusted hazard ratios for pancreatic cancer associated with selected proteins by normalized protein expression split at quartiles. Proteins were split at tertiles when quartiles were not unique. Models were adjusted for age, age squared, sex, smoking status, alcohol drinking, education, diabetes, and time since last meal, and stratified by region. Time in study was used as the time scale. The boxes are HRs and the vertical lines 95% CIs. The area of the box is inversely proportional to the variance of the logHR. The number above the box is the HR. MCP3/CCL7: monocyte chemotactic protein 3; ANGPT2: angiopoietin-2; IL18: interleukin-18; IL6: interleukin-6; LAMP3: lysosome-associated membrane glycoprotein 3; CCL3: C-C motif chemokine 3; CD4: T cell surface glycoprotein; CD8A: T cell surface glycoprotein CD8 alpha chain; HO1: haeme oxygenase 1; HGF: hepatocyte growth factor; GZMA: granzyme A; CRTAM: cytotoxic and regulatory T cell molecule.
Some of the markers that we have found to be associated with a higher risk throughout follow-up have been shown to be implicated in pancreatic disease. For example, MCP3/CCL7, IL4 and IL3 have been previously shown to be involved in the tumour microenvironment of pancreatic ductal adenocarcinoma and play a complex role in the regulation of tumour-promoting inflammation. 27 ANGPT2 is a vascular growth factor involved in angiogenesis, one of the main hallmarks of cancer. 28 It has been considered as a target for antiangiogenic therapy 29 and shown to be secreted by hepatocellular carcinoma exosomes, small extracellular vesicles which are involved in the communication between cells. 30 The ANGPT2 gene has been shown to be mutated in pancreatic neuroendocrine tumours in Asian patients. 31 Furthermore, ANGPT2-TIE2 signalling has been shown to be involved in tumour resistance to anti-VEGFA therapy 32 and in metastasis of neuroendocrine tumours. 33 In addition, prior studies have implicated IL18 in pancreatitis and pancreatic cancer 34 and higher serum levels of IL18 have been shown to be associated with prognosis in pancreatic adenocarcinoma patients. 35 Another interleukin, IL6, has also been implicated in pancreatic cancer and has been shown to be associated with a poorer prognosis 36 and disease progression, 37 and its receptor is being explored as a potential drug target for the disease. 38 However, a nested case-control study within the EPIC cohort found no evidence of an association of IL6 with risk of pancreatic cancer, but found weak evidence of associations for members of the TNF superfamily. 39 Similarly, a pooled analysis of five prospective cohort studies involving 470 pancreatic cancer cases found no evidence of an association of IL6, C-reactive protein (CRP) or TNFα receptor 2 with pancreatic cancer risk. 40 The difference in findings for IL6 compared with the present study may be due to the association being driven by higher levels of IL6 in the time preceding diagnosis. CCL3 and other CC chemokines have complex roles in the tumour microenvironment. 41 LAMP3 has not been previously studied in relation to pancreatic cancer, but lysosome-associated membrane proteins are involved in autophagy and have been proposed to have functions in tumour progression and metastatic spread. 42
Figure 4 Adjusted hazard ratios for pancreatic cancer within the first and second year since study entry per standard deviation higher normalized protein expression. Models were adjusted for age, age squared, sex, smoking status, alcohol drinking, education, diabetes, and time since last meal, and stratified by region. Time in study was used as the time scale. The boxes are HRs and the horizontal lines 95% CIs. The area of the box is inversely proportional to the variance of the logHR. During the first and second years there were 39 and 47 cases, respectively.
Among markers found to be associated with short-term risk in the present study, MMP7 had the greatest magnitude of association. MMP7 is involved in the injury response of mucosal epithelia and the degradation of extracellular matrix components and has been previously shown to be overexpressed in pancreatic ductal adenocarcinoma and its precursors, PanIN and intraductal papillary mucinous neoplasms, with MMP7 changes apparent even in intermediate-grade PanIN. 43 In cancer, the programmed death 1 (PD-1) protein binds the ligands PD-L1 and PD-L2 to attenuate T cell receptor signalling, thus allowing the tumour to evade the cytotoxic T cell response. 44 PD-L1 is one of the main targets of immune checkpoint inhibitors and pembrolizumab, an anti-PD-1 monoclonal antibody, is effective in some pancreatic cancers with DNA mismatch repair deficiencies. 45 This pathway is considered as a potential target for the development of immunotherapy for pancreatic cancer. 46 NCR1 is one of the activating receptors of natural killer cells and has been considered as a target to make the immune system recognize cancer cells. 47
When we combined proteins with conventional risk factors, only small increases in discrimination were achieved, but when restricting analyses to the first year of follow-up the increase was substantial, suggesting potential utility of these biomarkers for short-term prediction. Early detection, even if just a few months to years prior to conventional diagnosis, may be beneficial to patients and facilitate surgical resection. Further examination of the potential utility of the markers identified and mechanisms underlying these associations is warranted. Such markers may be used in combination with other risk factors to develop risk prediction models in order to identify individuals at an increased risk of pancreatic cancer who may benefit from screening or surveillance programmes. Future studies are required to assess whether these markers are useful for longitudinal surveillance of high-risk individuals, or as diagnostic biomarkers, to help distinguish pancreatic cancer from differential diagnoses in symptomatic individuals, perhaps in combination with existing biomarkers such as CA19-9, potentially complementing other diagnostic modalities.
The differences between the markers associated with short-term and with long-term risk are likely due to changes in protein levels in the presence of yet undiagnosed pancreatic cancer or the presence of precursor lesions. Whether markers associated with long-term risk are causally related to risk of pancreatic cancer or are only markers of a long natural history of the disease needs to be assessed in further studies, employing genetic epidemiological studies such as Mendelian randomization. 48
The primary strength of the study is its prospective design; the use of blood samples drawn before diagnosis of pancreatic cancer allows the identification of biomarkers present up to several years before its diagnosis. The study also has limitations. First, even though our study includes a relatively large number of incident cases of pancreatic cancer, given the relatively low incidence rate of this cancer, the sample size might not be large enough to identify some associations of more modest magnitude, in particular when investigating time-varying relationships. Second, although the majority of these cancers are likely to be pancreatic ductal adenocarcinoma, 49 we do not have detailed information on histological subtypes or on stage at diagnosis for all cases. Third, we only measured 92 proteins, which is a small proportion of the proteome and does not include CA19-9. Fourth, we did not have data to independently validate our findings. However, a recent paper 50 used the same panel in a case-control study of patients with pancreatic ductal adenocarcinoma (PDAC), patients with premalignant conditions and healthy controls, and identified markers which were associated with PDAC which largely overlapped with our findings (Supplementary Table S11).
In summary, we have identified a number of protein biomarkers that are associated with future risk of pancreatic cancer and a set of proteins which are associated with higher short-term risk. Future studies are warranted to replicate our findings and assess the potential utility of proteins in predicting the risk of pancreatic cancer, among both unselected and high-risk individuals, and in aiding the diagnostic process. Moreover, future studies could assess larger panels of proteins, as identifying more proteins associated with risk may improve our ability to predict future risk of pancreatic cancer. Additionally, our findings may provide motivation to characterize the mechanistic roles these proteins may play in the development and progression of pancreatic cancer, and future studies are needed to assess whether they could represent therapeutic targets.

Ethics approval
Ethics approval from the Oxford University Tropical Research Ethics Committee, the Chinese Centre for Disease Control and Prevention (CDC) Ethical Review Committee and the local CDC of each study area was obtained, and all participants provided written informed consent.

Data availability
The China Kadoorie Biobank (CKB) is a global resource for the investigation of lifestyle, environmental, blood biochemical and genetic factors as determinants of common diseases. The CKB study group is committed to making the cohort data available to the scientific community in China, the UK and worldwide to advance knowledge about the causes, prevention and treatment of disease. For detailed information on what data are currently available to open access users and how to apply for it, visit: [http://www.ckbiobank.org/site/Data+Access]. Researchers who are interested in obtaining from the China Kadoorie Biobank study the raw data that underlie this paper should contact [ckbaccess@ndph.ox.ac.uk]. A research proposal will be requested to ensure that any analysis is performed by bona fide researchers and (where data are not currently available to open access researchers) that analysis is restricted to the topic covered in this paper.