A Plasma-Derived Protein-Metabolite Multiplexed Panel for Early-Stage Pancreatic Cancer

Abstract
Background: We applied a training and testing approach to develop and validate a plasma metabolite panel for the detection of early-stage pancreatic ductal adenocarcinoma (PDAC) alone and in combination with a previously validated protein panel for early-stage PDAC.
Methods: A comprehensive metabolomics platform was initially applied to plasmas collected from 20 PDAC cases and 80 controls. Candidate markers were filtered based on a second independent cohort that included nine invasive intraductal papillary mucinous neoplasm cases and 51 benign pancreatic cysts. Blinded validation of the resulting metabolite panel was performed in an independent test cohort consisting of 39 resectable PDAC cases and 82 matched healthy controls. The additive value of combining the metabolite panel with a previously validated protein panel was evaluated.
Results: Five metabolites (acetylspermidine, diacetylspermine, an indole-derivative, and two lysophosphatidylcholines) were selected as a panel based on filtering criteria. A combination rule was developed for distinguishing between PDAC and healthy controls using the Training Set. In the blinded validation study with early-stage PDAC samples and controls, the five metabolites yielded areas under the curve (AUCs) ranging from 0.726 to 0.842, and the combined metabolite model yielded an AUC of 0.892 (95% confidence interval [CI] = 0.828 to 0.956). Performance was further statistically significantly improved by combining the metabolite panel with a previously validated protein marker panel consisting of CA19-9, LRG1, and TIMP1 (AUC = 0.924, 95% CI = 0.864 to 0.983, comparison DeLong test one-sided P = .02).
Conclusions: A metabolite panel in combination with CA19-9, TIMP1, and LRG1 exhibited substantially improved performance in the detection of early-stage PDAC compared with a protein panel alone.

time points were performed in triplicate or quadruplicate. Blank samples containing media only were included and collected at T0 and T6. The 6-hour samples were used to count cell numbers for data normalization. Once all the media samples were collected, the tubes were centrifuged at 2000 × g for 10 min to remove residual debris, and the supernatants were transferred to 1.5 mL tubes (Eppendorf) and stored at -80°C until use for metabolomics analysis.

Primary Metabolites and Biogenic Amines
Plasma metabolites were extracted from pre-aliquoted EDTA plasma (10 µL) with 30 µL of LCMS grade methanol (ThermoFisher) in a 96-well microplate (Eppendorf). Plates were heat sealed, vortexed for 5 min at 750 rpm, and centrifuged at 2000 × g for 10 minutes at room temperature. The supernatant (10 µL) was carefully transferred to a 96-well plate, leaving behind the precipitated protein. The supernatant was further diluted with 10 µL of 100 mM ammonium formate, pH 3. For Hydrophilic Interaction Liquid Chromatography (HILIC) analysis, the samples were diluted with 60 µL LCMS grade acetonitrile (ThermoFisher), whereas samples for C18 analysis were diluted with 60 µL water (GenPure ultrapure water system, ThermoFisher). Each sample solution was transferred to a 384-well microplate (Eppendorf) for LCMS analysis.
For cell lysates, 100 µL (3:1 isopropanol:ultrapure water) was aliquoted into two 300 µL, 96-well plates (Eppendorf) and evaporated to dryness under vacuum. The samples were then reconstituted as follows: for the HILIC assays, the dried samples were dissolved in 65 µL of ACN (Fisher Scientific):100 mM ammonium formate, pH 3 (9:1), whereas for the reversed-phase C18 assays, the dried samples were dissolved in 65 µL of H2O:100 mM ammonium formate, pH 3 (9:1). The samples were then spun down to remove any insoluble material and transferred to a 384-well plate for high-throughput analysis using LCMS.
Frozen media samples were thawed on ice and 30 µL was transferred to a 96-well microplate (Eppendorf) containing 30 µL of 100 mM ammonium formate, pH 3.0. The microplates were heat sealed, vortexed for 5 min at 750 rpm, and centrifuged at 2000 × g for 10 minutes at room temperature. For Hydrophilic Interaction Liquid Chromatography (HILIC) analysis, 25 µL of sample was transferred to a new 96-well microplate containing 75 µL acetonitrile, whereas samples for C18 analysis were transferred to a new 96-well microplate containing 75 µL water (GenPure ultrapure water system, ThermoFisher). Each sample solution was transferred to a 384-well microplate (Eppendorf) for LCMS analysis.
For each batch, samples were randomized, and matrix-matched reference quality controls and batch-specific pooled quality controls were included.
Quaternary solvent system mobile phases were (A) 0.1% formic acid in water, (B) 0.1% formic acid in acetonitrile and (D) 100 mM ammonium formate, pH 3. Samples were separated using the following gradient profiles. For the HILIC separation, a starting gradient of 95% B and 5% D was increased linearly to 70% A, 25% B and 5% D over a 5 min period at a 0.4 mL/min flow rate, followed by a 1 min isocratic gradient at 100% A at a 0.4 mL/min flow rate. For the C18 separation, the chromatography gradient was as follows: starting conditions of 100% A, with a linear increase to final conditions of 5% A, 95% B, followed by an isocratic gradient at 95% B, 5% D for 1 min.

Untargeted Analysis of Complex Lipids
For the lipidomic assay, untargeted metabolomics analysis was conducted on a Waters Acquity™ UPLC system coupled to a Xevo G2-XS quadrupole time-of-flight (qTOF) mass spectrometer.

Mass Spectrometry Data Acquisition
Mass spectrometry data was acquired in sensitivity, positive and negative electrospray ionization Da (negative) for lockspray correction and scans were performed at 0.5 min. The injection volume for each sample was 3 µL, unless otherwise specified. The acquisition was carried out with instrument auto gain control to optimize instrument sensitivity over the sample acquisition time.
Pooled quality control samples were analyzed after a defined number of samples to assess replicate precision and allow LOESS correction by injection order. Additional data was captured using the MSe function for pooled quality control samples.

Data Processing
Peak picking and retention time alignment of LC-MS and MSe data were performed using Progenesis QI (Nonlinear, Waters). Data processing and peak annotations were performed using an in-house automated pipeline. Annotations were determined by matching accurate mass and retention times using customized libraries created from authentic standards and/or by matching experimental tandem mass spectrometry data against the NIST MSMS, LipidBlast or HMDB v3 theoretical fragmentations. To correct for injection order drift, each feature was normalized using data from repeat injections of quality control samples collected every 10 injections throughout the run sequence. Measurement data were smoothed by Locally Weighted Scatterplot Smoothing (LOESS) signal correction (QC-RLSC) as previously described [1]. Only detected features exhibiting a relative standard deviation (RSD) of less than 30% in quality control samples were considered for further statistical analysis. To reduce data matrix complexity, annotated features with multiple adducts or acquisition mode repeats were collapsed to one representative unique feature. Features were selected based on replicate precision (RSD < 30%), intensity and best isotope similarity matching to theoretical isotope distributions. To adjust for inter-site variability, each metabolite was standardized by median-centering the metabolite value for each sample to the median value of the healthy controls.
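The RSD filter and the median-centering step can be illustrated with a minimal Python sketch. The function names, the toy intensities, and the choice to centre by dividing by the control median (rather than subtracting it on a log scale) are assumptions made for illustration only; they are not taken from the in-house pipeline.

```python
from statistics import mean, median, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

def filter_by_qc_rsd(qc_intensities, threshold=30.0):
    """Return the feature names whose RSD across pooled-QC injections is below threshold."""
    return [name for name, vals in qc_intensities.items()
            if rsd_percent(vals) < threshold]

def center_to_control_median(sample_values, control_values):
    """Express each sample relative to the median of the healthy controls.

    Assumption: centering is done by division on the raw scale.
    """
    ref = median(control_values)
    return [v / ref for v in sample_values]

# Hypothetical pooled-QC intensities for two features.
qc = {
    "feature_A": [100.0, 105.0, 95.0, 102.0],   # precise: RSD of a few percent
    "feature_B": [100.0, 10.0, 250.0, 40.0],    # imprecise: RSD above 100%
}
kept = filter_by_qc_rsd(qc)

# Hypothetical metabolite values; the control median here is 2.0.
centered = center_to_control_median([2.0, 4.0, 8.0], [1.0, 2.0, 3.0])
```

With these toy values only `feature_A` passes the filter, and the centered values are expressed as multiples of the control median.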

Enzyme-Linked Immunosorbent Assay
Plasma protein concentrations for CA19-9, LRG1 and TIMP1 were determined as previously described [2]. For all ELISA experiments, each sample was assayed in duplicate and the absorbance or chemiluminescence was measured with a SpectraMax M5 microplate reader (Molecular Devices, Sunnyvale, CA). An internal control sample was run on every plate, and each sample value was divided by the mean value of the internal control on the same plate to correct for inter-plate variability.
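The plate-wise correction can be sketched as follows. This is a minimal illustration, assuming that duplicate wells are averaged before division by the plate's mean internal-control reading; the function name and the readings are hypothetical.

```python
from statistics import mean

def normalize_plate(sample_readings, control_readings):
    """Divide each sample's reading by the mean internal-control reading of its plate.

    Assumption: the duplicate wells of a sample are averaged first.
    """
    ctrl = mean(control_readings)
    return {sid: mean(wells) / ctrl for sid, wells in sample_readings.items()}

# Hypothetical absorbance readings from one plate (duplicate wells per sample).
plate = {"S1": [0.80, 0.84], "S2": [1.60, 1.68]}
normalized = normalize_plate(plate, control_readings=[0.40, 0.42])
```

Because every plate is scaled by its own control, values from different plates become comparable as ratios to the same reference sample.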

Statistical Analyses
The AUC corresponding to the individual performance of each biomarker was estimated as the area under the empirical estimator of the receiver operating characteristic (ROC) curve. The standard error (SE) and the corresponding 95% confidence intervals presented for the individual performance of each biomarker were based on a bootstrap procedure in which we drew 1000 bootstrap samples, resampling with replacement separately for the controls and the diseased. For the markers LPC(18:0), LPC(20:3) and indole-3-lactate we took into account the inverse directionality, since these markers tend to exhibit higher measurements in the controls than in the cancer-related samples. Our model building is based on a logistic regression model with the logit link function. The estimated AUC of the proposed metabolite panel (0.9034) was derived by applying the empirical estimator to the linear combination that corresponds to this model. The 95% confidence interval we report for the metabolite panel AUC (0.8180-0.9889) takes into account the fact that the coefficients of the underlying logistic regression model are estimated, and hence exhibit variability: we used the bootstrap with 1000 iterations, re-estimating the coefficients of the model in every bootstrap iteration in order to provide proper inference. Below we present the bootstrap scheme we employed to take this variability into account. If we denote by nx the sample size of the controls and by ny the sample size of the cases, the bootstrap-based algorithm we used to derive the CIs for the logistic regression-based panel during training is the following:
Step 1: Sample with replacement nx healthy individuals and ny diseased individuals. Collect for these sampled individuals all scores for each marker that contributes to our marker panel.
Step 2: Based on the samples drawn in Step 1, re-fit the logistic regression model by using the binary disease status as a response and the biomarker scores as covariates.
Step 3: Using the linear predictor of the model fitted in Step 2, generate the overall score for each drawn individual and calculate the AUC.
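Step 4: Repeat Steps 1 to 3 1000 times to obtain 1000 bootstrap AUC estimates.
Step 5: The SE of the original AUC estimate for our data set, denoted $\widehat{AUC}$, is given by the standard deviation of the bootstrap AUC estimates. The corresponding p-value is then calculated based on $Z = \dfrac{\widehat{AUC} - 0.5}{SE(\widehat{AUC})} \sim N(0,1)$.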
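The bootstrap steps above can be sketched in Python. This is a minimal, standard-library illustration and not the authors' code: the logistic model is fitted with a plain gradient-ascent routine (`fit_logistic`, a stand-in for any logistic regression fitter), the SE is taken as the sample standard deviation of the bootstrap AUCs, and the data are synthetic. All function names and parameters are assumptions, and the demonstration uses 100 iterations instead of 1000 purely to keep it fast.

```python
import math
import random

def empirical_auc(controls, cases):
    """Empirical (Mann-Whitney) AUC: P(case > control), ties counted as 1/2."""
    wins = sum((y > x) + 0.5 * (y == x) for x in controls for y in cases)
    return wins / (len(controls) * len(cases))

def fit_logistic(X, y, steps=100, lr=0.5):
    """Gradient-ascent logistic regression (logit link). Returns [intercept, b1, ...]."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, label in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            resid = label - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def panel_auc(controls, cases):
    """Steps 2 and 3: re-fit the model, score each individual, compute the AUC."""
    beta = fit_logistic(controls + cases, [0] * len(controls) + [1] * len(cases))
    score = lambda row: beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return empirical_auc([score(r) for r in controls], [score(r) for r in cases])

def bootstrap_panel(controls, cases, n_boot=1000, seed=0):
    """Return (AUC, SE, Wald 95% CI, one-sided p-value against AUC = 0.5)."""
    rng = random.Random(seed)
    auc_hat = panel_auc(controls, cases)
    boots = []
    for _ in range(n_boot):
        bx = rng.choices(controls, k=len(controls))  # Step 1: resample controls
        by = rng.choices(cases, k=len(cases))        # Step 1: resample cases
        boots.append(panel_auc(bx, by))              # Steps 2-3 on the resample
    m = sum(boots) / n_boot
    se = math.sqrt(sum((b - m) ** 2 for b in boots) / (n_boot - 1))
    z = (auc_hat - 0.5) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))            # upper tail of N(0, 1)
    return auc_hat, se, (auc_hat - 1.96 * se, auc_hat + 1.96 * se), p

# Synthetic two-marker data (hypothetical): cases are shifted upwards.
rng = random.Random(1)
controls = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
cases = [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(20)]
auc_hat, se, ci, p_value = bootstrap_panel(controls, cases, n_boot=100)
```

Re-fitting inside the loop is what lets the interval reflect the variability of the estimated coefficients; scoring every resample with fixed coefficients would generally give a narrower interval.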
The protein-metabolite multiplexed panel, i.e. the combination of the two underlying panels (one for the proteins and one for the metabolites), was developed by treating those two panels as two composite markers with their respective coefficients held fixed. The two composite markers were then combined using a logistic regression model with the logit link function. The p-values are calculated based on the approximate normality of the 1000 bootstrap-based AUC estimates [3]. Wald-type CIs are employed using $(\widehat{AUC} - 1.96 \cdot SE,\ \widehat{AUC} + 1.96 \cdot SE)$ and the corresponding one-tailed p-value is provided.
Regarding testing whether an AUC is significant when focusing on an individual marker in the training set, we explored a simpler bootstrap scheme, in which no regression model is involved. More specifically, we consider:
Regarding the comparison of two AUCs, we need to take into account the underlying correlation. Our bootstrap scheme in this case is of the following form:
Step 1: Sample with replacement nx healthy individuals and ny diseased individuals.
The resulting test statistic is
$$Z = \frac{\widehat{AUC}^{(2)} - \widehat{AUC}^{(1)}}{\sqrt{\mathrm{Var}(\widehat{AUC}^{(1)}) + \mathrm{Var}(\widehat{AUC}^{(2)}) - 2\,\mathrm{Cov}(\widehat{AUC}^{(2)}, \widehat{AUC}^{(1)})}} \sim N(0,1),$$
where $\mathrm{Var}(\widehat{AUC}^{(k)})$ =
* P values were calculated using a two-sided Wilcoxon rank-sum test.
† Values in parentheses (e.g. (18:0)) represent the number of carbon atoms (i.e. 18) and number of double bonds (0) in the fatty acyl chain of the respective lipid.
‡ Designation between o- and p- refers to whether the respective ether lipid is a plasmanyl- or plasmenyl-phospholipid, respectively.
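The correlated comparison of two AUCs described in the Statistical Analyses section can be sketched as follows. This is a minimal illustration under stated assumptions: both markers are measured on the same individuals, the variances and the covariance are estimated from paired bootstrap replicates (the same resample is used for both markers), and the data are synthetic. It illustrates the bootstrap-based Z statistic, not the DeLong test, and the function names are hypothetical.

```python
import math
import random

def empirical_auc(controls, cases):
    """Empirical (Mann-Whitney) AUC: P(case > control), ties counted as 1/2."""
    wins = sum((y > x) + 0.5 * (y == x) for x in controls for y in cases)
    return wins / (len(controls) * len(cases))

def both_aucs(controls, cases):
    """AUC of marker 1 and of marker 2; each row is a (marker1, marker2) pair."""
    return tuple(
        empirical_auc([r[k] for r in controls], [r[k] for r in cases])
        for k in (0, 1)
    )

def compare_aucs(controls, cases, n_boot=1000, seed=0):
    """Return (AUC1, AUC2, Z, one-sided p-value for AUC2 > AUC1)."""
    rng = random.Random(seed)
    a1, a2 = both_aucs(controls, cases)
    b1, b2 = [], []
    for _ in range(n_boot):
        # Resample individuals, keeping both marker values of each person together
        bx = rng.choices(controls, k=len(controls))
        by = rng.choices(cases, k=len(cases))
        u, v = both_aucs(bx, by)
        b1.append(u)
        b2.append(v)
    m1, m2 = sum(b1) / n_boot, sum(b2) / n_boot
    var1 = sum((u - m1) ** 2 for u in b1) / (n_boot - 1)
    var2 = sum((v - m2) ** 2 for v in b2) / (n_boot - 1)
    cov = sum((u - m1) * (v - m2) for u, v in zip(b1, b2)) / (n_boot - 1)
    z = (a2 - a1) / math.sqrt(var1 + var2 - 2.0 * cov)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return a1, a2, z, p

# Synthetic correlated markers (hypothetical): marker 2 is marker 1 plus noise.
rng = random.Random(2)
def person(shift):
    m1 = rng.gauss(shift, 1)
    return (m1, m1 + rng.gauss(shift / 2, 1))

controls = [person(0.0) for _ in range(40)]
cases = [person(1.0) for _ in range(30)]
auc1, auc2, z_stat, p_value = compare_aucs(controls, cases, n_boot=200)
```

Resampling whole individuals is what carries the correlation between the two markers into the covariance term; bootstrapping each marker separately would treat the two AUCs as independent.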