Chemical signature of colorectal cancer: case–control study for profiling the breath print

Background: Effective screening for colorectal cancer can reduce mortality by early detection of tumours and colonic polyps. An altered pattern of volatile organic compounds (VOCs) in exhaled breath has been proposed as a potential non‐invasive diagnostic tool for detection of cancer. The aim of this study was to evaluate the reliability of breath‐testing for colorectal cancer screening and early diagnosis using an advanced breath sampler.
Methods: The exhaled breath of patients with colorectal cancer and non‐cancer controls with negative findings on colonoscopy was collected using the ReCIVA® Breath Sampler. This portable device is able to capture the alveolar breath fraction without environmental contamination. VOCs were desorbed thermally and analysed by gas chromatography–mass spectrometry. The discriminatory ability of VOCs in detecting colorectal cancer was evaluated by receiver operating characteristic (ROC) curve analysis for each VOC, followed by cross‐validation by the leave‐one‐out method, and by applying stepwise logistic regression analysis.
Results: The study included 83 patients with colorectal cancer and 90 non‐cancer controls. Fourteen VOCs were found to have significant discriminatory ability in detecting patients with colorectal cancer. The model with the diagnosis of cancer versus no cancer resulted in a statistically significant likelihood of discrimination of 173·45 (P < 0·001), with an area under the ROC curve of 0·979. Cross‐validation of the model resulted in a true predictive value for colorectal cancer of 93 per cent overall. Reliability of the breath analysis was maintained irrespective of cancer stage.
Conclusion: This study demonstrated that analysis of exhaled VOCs can discriminate patients with colorectal cancer from those without. This finding may eventually lead to the creation of a smart online sensory device, capable of providing a binary answer (cancer/no cancer) and directing to further screening.


Introduction
Colorectal cancer is amongst the four biggest cancer killers in the European Union 1 , and the second most frequent cause of cancer death in Italy 2 . Given the encouraging data with selected endoscopic screening and early polypectomy with regard to cancer prevention and survival advantage 3 , there is an ongoing search within national health programmes for effective and reliable screening methods. Recently, breath-testing has been pursued as an option, and a range of volatile organic compounds (VOCs) present in the exhaled breath of patients with cancer have been identified 4 . Several groups have reported specific VOCs for cancer diagnosis, not only for colorectal cancer 4–8 but also for cancers of the lung 9,10 , stomach 11,12 , pancreas 13 and breast 14 ; each cancer type is associated with a unique breath print.
Currently, a wide variety of VOC candidates have been identified in colorectal cancer, probably reflecting the use of varying substrates and analytical platforms 15 . In 2013, the present authors' group published the first study 4 outlining the potential of breath analysis for colorectal cancer screening, using a customized breath sampler and gas chromatography-mass spectrometry (GC-MS) analysis. That study identified a pattern of 15 specific VOCs that had discriminatory capacity between patients with colorectal cancer and controls. Since then, other groups 6,7,16 have independently published a variety of different VOC breath patterns in colorectal cancer, using similar methodologies.
The aim of the present case-control study was to evaluate whether the addition of a new advanced breath sampler could specifically capture only the alveolar breath fraction and exclude environmental contaminants effectively.

Methods
The study was approved by the ethics committee of the Azienda Ospedaliero-Universitaria Policlinico, Bari, Italy, and performed in compliance with the Declaration of Helsinki; it was registered at https://clinicaltrials.gov (identifier NCT04217083). All of the patients and non-cancer controls recruited provided written informed consent before breath-testing.
The study recruited patients of any sex or age undergoing curative surgery for histologically proven adenocarcinoma of the colon or rectum (of any clinical stage) and a comparative group of non-cancer controls who had negative findings on colonoscopy performed within the previous 3 years. The control group was recruited from patients who had undergone colonoscopy for occult or overt rectal bleeding, or as part of diagnostic workup for chronic constipation. Patients excluded from analysis were those who had any history of another type of cancer, previous endoscopic removal of colonic polyps or a history of familial adenomatous polyposis or Lynch syndrome, and a history of active inflammatory disorder or liver disease. Patients with diverticulosis, chronic constipation or haemorrhoids were included in the analysis, as were those who were smokers or who had diabetes and/or hypertension.
A dedicated electronic database was established to analyse the data. To test the reliability of the pattern of VOCs in patients with colorectal cancer, patients were divided according to clinical cancer stage 17 into two main staging subgroups: early (stage I-II) and advanced (stage III-IV) stage disease.
Exhaled breath was collected with the ReCIVA® Breath Sampler (Owlstone Medical, Cambridge, UK). The device was connected to a breath-sampling kit (mask and sorbent tubes), ensuring reproducible collection of VOCs during real-time monitoring of the patient's breathing. The exhaled breath of each patient was captured into four carbon tubes containing a mixed formulation of Tenax® and Carbograph® (Markes International, Llantrisant, UK) capable of retaining a range of carbon compounds (from C4 to C30). The apparatus comprised infrared carbon dioxide detection with pressure sensors, permitting the selection of different volumes and fractions of the exhaled breath. A mask manufactured from medical grade silicone, which included a high-efficiency, low-resistance bacterial filter, was fixed on to the device before each sampling. This was connected to a medical air canister via a plastic pressure reducer, set to 15 litres/min.
A USB cable connected the ReCIVA® Breath Sampler to a laptop installed with breath-sampling software (Owlstone Medical), designed to ensure accurate monitoring of breathing air pressures (partial pressure of carbon dioxide). All subjects were fasted for at least 4 h before breath sampling. Sampling was always performed in the same room, aerated for 30 min before each procedure. Patients were instructed to keep the mask securely adhered to their face and to breathe normally the air released by the medical air canister. After a 60-s ReCIVA® device washout with pure air (purity 99⋅99 per cent; SOL Group, Monza, Italy), patient breath was collected for 10 min under control of the dedicated PC software. At the completion of sampling the sorbent tubes were removed, covered with a plastic cap, and delivered to the chemistry department within 24 h for GC-MS analysis.
Once collected in the ReCIVA® sorbent tubes, VOCs were processed by a thermal desorber (TurboMatrix™ 350; PerkinElmer, Waltham, Massachusetts, USA) fed with air (purity 99⋅988 per cent; SOL Group, Monza, Italy) and nitrogen (purity 99⋅999 per cent; Sapio Group, Taranto, Italy). Products were connected directly to a heated transfer line and then to GC-MS (TRACE GC Ultra; Thermo Scientific, Waltham, Massachusetts, USA). Each tube was heated at 250 °C for 20 min. Desorbed VOCs were then transferred at 200 °C in splitless mode by the chromatograph injector using helium (purity 99⋅999 per cent; Sapio Group) as a carrier gas at a linear velocity of 0⋅6 cm/s. Separation and quantification of the desorbed VOCs was performed by coupling the chromatograph with a quadrupole mass spectrometer (ISQ™; Thermo Scientific) using a capillary column of 30 m × 0⋅25-mm internal diameter with a 1⋅4-μm film thickness (DB-624 UI; Agilent Technologies, Santa Clara, California, USA). The oven temperature was held at 40 °C for 5 min, ramped at 6 °C per min to 160 °C (held for 10 min), ramped at 6 °C per min to 200 °C (held for 15 min), and finally ramped at 6 °C per min to 220 °C (held for 5 min). Temperatures reached at the transfer line and the quadrupole ion source were 280 °C and 220 °C respectively. Mass spectrometry was performed at 70 eV electron impact ionization energy in full-scan mode, with a scan range of 40-250 atomic mass units. The Xcalibur™ software (Thermo Scientific) allowed acquisition and analysis of the data. In an effort to exclude extraneous contamination, on each sampling day three ReCIVA® steel tubes containing room air were sample-tested before commencement of the breath sampling.
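The oven programme described above fixes the length of each chromatographic run. As a minimal sketch, the arithmetic can be checked as follows; the step values are taken from the text, while the tuple layout and the function name `total_run_time` are illustrative assumptions rather than part of the instrument software.

```python
# Oven programme from the text: (target temperature in deg C, ramp in deg C/min, hold in min).
# A ramp of None marks the initial isothermal step (no heating phase before the hold).
OVEN_PROGRAMME = [
    (40, None, 5),
    (160, 6, 10),
    (200, 6, 15),
    (220, 6, 5),
]


def total_run_time(programme):
    """Return the total oven programme time in minutes (ramps plus holds)."""
    total = 0.0
    previous_temp = None
    for target, ramp, hold in programme:
        if ramp is not None:
            # Time needed to climb from the previous plateau to the new target.
            total += (target - previous_temp) / ramp
        total += hold
        previous_temp = target
    return total


print(f"Total oven programme time: {total_run_time(OVEN_PROGRAMME):.1f} min")
```

Under these settings the ramps take 20, about 6⋅7 and about 3⋅3 min, and the holds add 35 min, so each run lasts roughly 65 min excluding desorption and cool-down.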
A preliminary study on the first 20 patients was performed to ascertain whether significant qualitative or quantitative differences in the chromatogram were present, desorbing and analysing each one of the four Tenax® tubes; no significant differences were found in the peaks among the four tubes in terms of intensity and resolution. However, to reproduce the same experimental conditions for each patient as far as possible, it was decided always to analyse the tube positioned in the same housing (position B) of the ReCIVA® unit. The other three tubes were also desorbed and analysed, and the chromatograms were saved and stored for control. Individual VOCs were identified by injecting 1 μl of the working solution and calibrating as 50- or 100-μg/ml aliquots with authenticated standards using the MS database of the National Institute of Standards and Technology. Stock solutions (20 mg/ml) of each volatile compound (purity at least 97 per cent; Sigma-Aldrich, Milan, Italy) were prepared in methanol (purity at least 98 per cent; Sigma-Aldrich), stored at 8 °C, and diluted to prepare working solutions.
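The quantities in this paragraph are linked by simple unit arithmetic. The sketch below, with hypothetical helper names, shows the mass of analyte delivered by a 1-μl injection of a 50- or 100-μg/ml working solution, and the dilution factor needed to reach those concentrations from the 20 mg/ml stock. It is an illustration of the unit conversions only, not a description of the laboratory's actual dilution scheme.

```python
def mass_injected_ng(concentration_ug_per_ml, volume_ul):
    """Mass of analyte in ng; 1 ug/ml is numerically equal to 1 ng/ul."""
    return concentration_ug_per_ml * volume_ul


def dilution_factor(stock_mg_per_ml, working_ug_per_ml):
    """Fold dilution needed to go from a stock (mg/ml) to a working solution (ug/ml)."""
    return stock_mg_per_ml * 1000 / working_ug_per_ml


for working in (50, 100):
    print(
        f"{working} ug/ml working solution: "
        f"{mass_injected_ng(working, 1)} ng per 1-ul injection, "
        f"{dilution_factor(20, working):.0f}-fold dilution of the 20 mg/ml stock"
    )
```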
Targeted analysis was performed of putative colorectal cancer VOCs by depositing 50 ng of each analytical standard into a ReCIVA® sorbent tube. The tubes were analysed at random and in triplicate on the same day, as part of a reliability testing for all GC-MS runs. Proposed ReCIVA® GC-MS methods were tested by linear regression analysis using standardized solutions of suitable concentration, plotting the peak area against the concentration of analyte. After adding 1 μl of a standard solution containing the analytes at appropriate levels into the tubes, estimates were made of linearity, reproducibility, the limit of detection (LOD) and the limit of quantification (LOQ), the last two taken as threefold and tenfold the basal signal-to-noise ratio respectively.
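The calibration step above can be sketched as an ordinary least-squares fit of peak area against analyte amount, with LOD and LOQ taken as the amounts giving three and ten times the baseline noise. The calibration points and the noise value below are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def detection_limits(baseline_noise, slope):
    """Return (LOD, LOQ) as the amounts giving 3x and 10x the baseline noise."""
    return 3 * baseline_noise / slope, 10 * baseline_noise / slope


# Hypothetical calibration points: amount on tube (ng) and integrated peak area.
amounts_ng = [10, 25, 50, 100]
peak_areas = [2000, 5000, 10000, 20000]

slope, intercept, r_squared = fit_line(amounts_ng, peak_areas)
lod, loq = detection_limits(baseline_noise=100, slope=slope)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
print(f"LOD={lod:.2f} ng, LOQ={loq:.2f} ng")
```

Real calibration data would not be perfectly linear, so the coefficient of determination serves as the linearity check.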

Statistical analysis
Quantitative variables approaching a normal distribution were reported as mean(s.d.) values. Comparison between patients with colorectal cancer and controls was done with Student's t test for independent samples. Sex and other categorical variables were reported as counts and percentages, with comparisons between independent groups performed with the χ² test.
To assess the accuracy of the discriminatory capacity of VOCs for the detection of patients with colorectal cancer, the analysis was conducted in two stages. The first stage constructed a receiver operating characteristic (ROC) curve (with 95 per cent c.i., sensitivity and specificity) for each VOC and with the establishment of an individual cut-off value for discrimination of colorectal cancer. The second stage involved cross-validation using the 'leave-one-out' method, and applying logistic regression with the dependent variable being the status of the patients. The estimates were therefore classified as cancer versus no cancer, with each VOC acting as an independent variable in the analysis.
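A minimal sketch of the two stages for a single VOC is given below. Several choices are assumptions made for illustration and are not stated in the text: the cut-off is chosen by maximizing Youden's index, higher values are taken to indicate cancer, and the leave-one-out step re-derives a simple threshold instead of refitting a logistic regression. The peak-area values are hypothetical.

```python
def auc(cases, controls):
    """Area under the ROC curve: probability that a case exceeds a control (ties count 0.5)."""
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_cutoff(cases, controls):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J; 'cancer' if value >= cutoff."""
    best = None
    for cutoff in sorted(set(cases) | set(controls)):
        sensitivity = sum(v >= cutoff for v in cases) / len(cases)
        specificity = sum(v < cutoff for v in controls) / len(controls)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


def leave_one_out_accuracy(cases, controls):
    """Hold out each subject in turn, re-derive the cut-off from the rest, classify the held-out one."""
    labelled = [(v, 1) for v in cases] + [(v, 0) for v in controls]
    correct = 0
    for i, (value, label) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        train_cases = [v for v, lab in rest if lab == 1]
        train_controls = [v for v, lab in rest if lab == 0]
        cutoff, _, _ = best_cutoff(train_cases, train_controls)
        predicted = 1 if value >= cutoff else 0
        correct += predicted == label
    return correct / len(labelled)


# Hypothetical peak-area percentages for one VOC.
cancer_values = [0.8, 1.2, 1.5, 2.0]
control_values = [0.3, 0.5, 0.9, 1.0]

print("AUC:", auc(cancer_values, control_values))
print("cut-off, sensitivity, specificity:", best_cutoff(cancer_values, control_values))
print("leave-one-out accuracy:", leave_one_out_accuracy(cancer_values, control_values))
```

For a VOC that is lower in patients with cancer, the same functions apply with the sign of the values inverted; an AUC below 0⋅5 signals that the assumed direction is reversed.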
To evaluate the effect of multiple VOCs, multivariable logistic regression analysis was performed with stepwise selection of independent variables. Initially, all VOCs were dichotomized in accordance with their threshold, as had been determined in the univariable analysis.

Results
The […] (7) months before the start of the study. Table 1 shows the overall patient demographics and co-morbidities. The groups were matched for sex, but the mean age of patients with colorectal cancer was significantly higher than that of the controls (P < 0⋅001). Hypertension was the only co-morbidity that was more common in the patients with colorectal cancer than in the control group (P = 0⋅033).
GC-MS analysis showed that 64 compounds were present in at least 50 per cent of colorectal cancer or non-cancer breath samples collected from the ReCIVA® system and absent from the corresponding room air samples. To identify VOCs able to discriminate between patients with cancer and those with no cancer, a non-parametric Mann-Whitney U test was performed; this showed that levels of 38 of the compounds were significantly different in the two groups.
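The first selection step described above can be sketched as a simple prevalence filter. The sketch assumes that each sample is represented as the set of compound names detected in it, and that "at least 50 per cent" is applied to either group separately; both are interpretations, and the compound lists shown are hypothetical.

```python
def candidate_compounds(cancer_samples, control_samples, room_air, min_fraction=0.5):
    """Keep compounds seen in >= min_fraction of either group and never seen in room air."""

    def fraction(samples, name):
        # Share of samples in which the compound was detected.
        return sum(name in sample for sample in samples) / len(samples)

    all_names = set().union(*cancer_samples, *control_samples)
    return sorted(
        name
        for name in all_names
        if name not in room_air
        and (
            fraction(cancer_samples, name) >= min_fraction
            or fraction(control_samples, name) >= min_fraction
        )
    )


# Hypothetical detections: one set of compound names per breath sample.
cancer = [{"nonanal", "ethylbenzene"}, {"nonanal"}]
controls = [{"ethylbenzene", "limonene"}, {"limonene", "acetone"}]
room_air = {"limonene"}

print(candidate_compounds(cancer, controls, room_air))
```

Compounds that survive this filter would then be compared between groups, for example with the Mann-Whitney U test named in the text.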
Using the relevant ROC curves for sensitivity and specificity and true predictive value with a cross-validation univariable analysis, the VOCs were ordered from the most to the least accurate (by AUC and sensitivity for discriminating colorectal cancer from non-cancer) (Table 2). Fourteen VOCs were statistically significant in discriminating patients with colorectal cancer in univariable analysis. The highest AUC values were noted for tetradecane, ethylbenzene and methylbenzene, octanal, nonanal and decanal. To evaluate the predictive value of the association of the diagnosis of colorectal cancer with particular VOCs, multivariable logistic regression analysis using discrete modelling was performed against age and colorectal cancer stage. Table 3 shows the analysis using the model A approach, comparing cancer and no cancer diagnosis. With this approach there was a significant likelihood of association (173⋅45, P < 0⋅001; R² = 0⋅642, AUC = 0⋅979) (Fig. 1b). Cross-validation of this model confirmed a 90 per cent rate of true positives, a 93 per cent rate of true negatives and a predictive value of 93 per cent. This selection procedure retained age group as well as the VOCs tetradecane, ethylbenzene, methylbenzene, acetic acid and 5,9-undecadien-2-one, 6,10-dimethyl (E), all of which were retained in the list of the top 14 VOCs from the univariable analysis.
Fig. 1 Multivariable models: a with no adjustment for age (likelihood = 151⋅03, P < 0⋅001, area under the ROC curve (AUC) = 0⋅963); b with adjustment for age class (model A) (likelihood = 173⋅45, P < 0⋅001, AUC = 0⋅979); c for detection of early-stage cancer (model B) (likelihood = 113⋅63, P < 0⋅001, AUC = 0⋅985); d for detection of advanced cancer (model C) (likelihood = 88⋅51, P < 0⋅001, AUC = 0⋅933).
Table 4 shows application of the multinomial model B; there was a significant likelihood of discrimination of early-stage colorectal cancer compared with control samples (likelihood = 113⋅63, P < 0⋅001; R² = 0⋅603, AUC = 0⋅985) (Fig. 1c). The VOCs entered into the model B format had already been selected from the model A analysis; they included tetradecane, ethylbenzene, methylbenzene, 5,9-undecadien-2-one and tridecane. The model B analysis produced several further informative markers, including benzaldehyde, dodecane and butyl hydroxy toluene. Table 5 shows application of model C, comparing advanced colorectal cancer with no cancer. There was a significant likelihood of discrimination (likelihood = 88⋅51, P < 0⋅001; R² = 0⋅496, AUC = 0⋅933) (Fig. 1d). Significant VOCs identified with the model C format included tetradecane, ethylbenzene, methylbenzene and benzaldehyde. Although benzoic acid did not reach significance in the univariable analysis in discriminating patients with colorectal cancer, it showed high specificity (81 per cent) in the model C assessment. Cross-validation of the three models used is shown in Table 6. Model B resulted in the correct classification in 86 per cent of patients with early-stage colorectal cancer and in 94 per cent of controls, with an overall positive predictive value (PPV) of 86 per cent. Model C showed a correct classification in 71 per cent of patients with advanced colorectal cancer and 97 per cent of controls, with an overall PPV of 91 per cent.
*Cut-off values for volatile organic compounds are expressed as peak area percentage, and listed in order of decreasing accuracy for colorectal cancer discrimination. Likelihood: 113⋅63, P < 0⋅001.
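The cross-validation figures quoted for the models (correct classification of patients, correct classification of controls, and PPV) all derive from a two-by-two table of predicted against true status. The sketch below shows the standard definitions; the counts are hypothetical and are not the study data, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # patients with cancer correctly classified
        "specificity": tn / (tn + fp),  # controls correctly classified
        "ppv": tp / (tp + fp),          # positive calls that are true cancers
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=45, fn=5, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV depends on the ratio of cases to controls, values obtained in a case-control sample would not carry over directly to a screening population with much lower prevalence.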
In the analytical validation and threshold quantification of VOCs, ethylbenzene, methylbenzene and tetradecane were the only three compounds identified as statistically relevant in discriminating patients with colorectal cancer in all univariable and multivariable models. To quantify these three VOCs, linear regression analysis of the peak area versus analyte concentration was done. For this purpose, standard mixtures with concentration ranges that were dependent on the individual compound were added directly to virgin ReCIVA® sorbent tubes. Table 7 gives the analytical validation parameters of the GC-MS method on standard reagents of the three compounds.
Threshold concentration values in the breath for the three selected substances (above or below which a patient could be affected by colorectal cancer with a 95 per cent c.i. of 90 per cent) were calculated as less than or equal to the LOQ value for ethylbenzene, greater than the LOD value for methylbenzene, and 53 ng or less for tetradecane.

Discussion
Analysis of exhaled VOCs for the detection of early-stage cancers would appear to be an attractive screening option because a non-invasive approach such as a breath test is well tolerated. Despite an encouraging meta-analysis 18 reporting an overall high accuracy for exhaled VOCs in cancer diagnosis, there is currently no technical standardization of either breath collection or analysis. This study has identified specific exhaled VOCs as putative candidates in the discrimination of patients with colorectal cancer from controls. In addition, the reliability of the methodology appears to be independent of disease stage, highlighting its potential value as a diagnostic or screening tool.
Comparing the results of the univariable analysis with the three models obtained by multivariable logistic regression, three VOCs (ethylbenzene, methylbenzene and tetradecane) were found consistently to have a highly significant discriminatory ability in detecting patients with colorectal cancer.
In a previous study 4 a customized breath sampler and a different GC-MS column were employed and, in comparison with the present study, only a few VOCs with discriminatory capacity for colorectal cancer were identified. Differences between the findings of these studies are probably technical, as in the present study only the alveolar breath fraction was measured, with exclusion of extraneous contaminants. The use of Tedlar® bags (Sigma-Aldrich) for breath collection in the earlier study 4 may also have provided an opportunity for extraneous VOC contamination.
There are other potential reasons for the differences in these results, including the use of different capillary columns and oven profiles, as well as a different spectrometer. There has also been an important change in the statistical analysis used since the first report, from a probabilistic neural network to a ROC sensitivity curve. The present lack of standardization in breath collection, VOC measurement and statistical analysis probably contributes to inconsistencies in the VOC profiles identified by different groups, and presently limits the value of exhaled VOC patterns as a mass screening tool for diagnosis of colorectal cancer 18 . In this regard, another report 19 identified a pattern of six VOCs capable of discriminating patients with colorectal cancer from those without cancer. In that report, breath samples were obtained via a simple mouthpiece containing a filter cartridge on the inspiratory port, collected into a 750-ml polyvinyl fluoride sampling bag and then analysed by a GC-MS (GC6890N; MS-5975; Agilent Technologies), and the statistical analysis was performed using a principal components analysis (PCA) algorithm. Despite these important methodological differences, however, three of the VOCs identified in that study 19 matched the present VOC pattern.
Another study 6 , published in 2014, compared the breath of 20 patients with colorectal cancer with that of 20 non-cancer controls, and selected nine VOCs as discriminators for colorectal cancer, but only one VOC from that study, dodecane, was also isolated as a relevant VOC in the present study. A GC-MS analytical platform was similarly used; however, their settings were different to the present ones and PCA was used for their statistical evaluation 6 . In 2016, Amal and colleagues 7 identified another pattern of exhaled VOCs in a cohort of 65 patients with colorectal cancer and 122 controls; acetone and ethyl acetate were found in higher concentration, and ethanol and 4-methyl octane in comparatively lower concentration, in the patients with cancer. Although the discriminatory value of ethanol in this study should be considered, there were methodological differences with the present analysis: the alveolar breath fraction was analysed using a two-bed ORBO™ 420 Tenax® TA sorption tube and a GCMS-QP2010 (Shimadzu, Kyoto, Japan) and an SLB®-5ms capillary column (Sigma-Aldrich) 7 .
Another recent study 20 used a different, sophisticated, analytical platform, selected ion flow tube mass spectrometry (SIFT-MS), which permits real-time measurement of trace concentrations of gases in humidified air samples. Even though there was no attempt to isolate the alveolar fraction of air, the authors specifically identified a single VOC, propanal, as a significant biomarker with high sensitivity (96 per cent) and specificity (76 per cent) for detection of colorectal cancer. That study 20 documented a specific aldehyde molecule (one of the shorter volatile aldehydes and a structural isomer of acetone) that is difficult to detect with a standard GC-MS platform. Propanal has also been implicated in the detection of other diseases, such as coeliac disease and bile acid diarrhoea 21,22 .
The biological significance of the VOCs involved in carcinogenesis is extremely difficult to explain and would require further studies on the biochemical pathways behind their production. However, it seems clear that some of them, such as ethylbenzene and tetradecane, correlate significantly with the diagnosis of colorectal cancer, irrespective of stage; it may be speculated that their metabolic pathways are closely linked to the process of carcinogenesis, at least in the colon.
Although the search for a single specific colorectal cancer biomarker represents the 'holy grail' in VOC detection, and reflects a discretely measurable metabolic derangement within colorectal cancer cells 23 , cancer cells still use the same metabolic pathways as normal cells, although in quantitatively different ways. Hence, it may be more realistic to expect changes in the relative quantities of VOCs in colorectal cancer, rather than to isolate particular 'cancer-specific' VOCs. Most studies concerning colorectal cancer detection have identified a patterned VOC signature, with VOCs such as nonanal and 1,3-dimethylbenzene repeatedly detected as significant in a range of reports. It is also expected that discrepancies in the VOC profiles of patients with colorectal cancer could, at least in part, be the result of gastrointestinal dysbiosis, whereby complex interaction of the gut microbiome with the local inflammatory response is implicated in the transformation from normal to neoplastic tissue 24 . In a recently published study 25 , in which the VOCs exhaled by seven patients with colorectal cancer were compared with those produced from their own cancer tissue (the latter sampled ex vivo after surgery by headspace solid-phase microextraction), concentrations of benzaldehyde and indole were significantly higher in cancerous tissue than in normal colonic mucosa, whereas ethylbenzene production was reduced in the cancer tissue. This finding suggests that normal colonic mucosa and cancer tissue can produce similar VOC patterns, even though their individual fingerprints may differ.
A limitation of this study is that patients and controls were recruited by a single centre, and came from the same geographical area with similar diet and environmental exposure; this could have a role in the pattern of exhaled VOCs. Therefore, a multicentre international trial on this topic is awaited. Currently, the equipment needed is expensive, and the data require complicated analysis, limiting the utility of this approach as a mass colorectal cancer screening tool. Some groups 16,26,27 have reported the use of commercial or purpose-designed electronic (e) noses to discriminate patients with colorectal cancer from those with advanced adenomas, as well as from non-cancer controls. These results have, however, been mixed, as each chemical sensor used in an e-nose can be activated by several VOCs belonging to the same chemical class of compound.
Confirmation of the pattern of VOCs identified in this paper in a multicentre international study may support the next step in the clinical application of breath sampling in colorectal cancer screening, which is to create a smart, inexpensive, online sensory device with reproducibly high discrimination for colorectal cancer. The binary result should be capable of directing a further screening investigation, such as colonoscopy.