University of Groningen Urine steroid metabolomics as a novel tool for detection of recurrent adrenocortical carcinoma

Abstract
Context: Urine steroid metabolomics, combining mass spectrometry-based steroid profiling and machine learning, has been described as a novel diagnostic tool for detection of adrenocortical carcinoma (ACC).
Objective, Design, Setting: This proof-of-concept study evaluated the performance of urine steroid metabolomics as a tool for postoperative recurrence detection after microscopically complete (R0) resection of ACC.
Patients and Methods: 135 patients from 14 clinical centers provided postoperative urine samples, which were analyzed by gas chromatography–mass spectrometry. We assessed the utility of these urine steroid profiles in detecting ACC recurrence, either when interpreted by expert clinicians or when analyzed by random forest, a machine learning-based classifier. Radiological recurrence detection served as the reference standard.
Results: Imaging detected recurrent disease in 42 of 135 patients; 32 had provided pre- and post-recurrence urine samples. 39 patients remained disease-free for ≥3 years. The urine “steroid fingerprint” at recurrence resembled that observed before R0 resection in the majority of cases. Review of longitudinally collected urine steroid profiles by 3 blinded experts detected recurrence by the time of radiological diagnosis in 50% to 72% of cases, improving to 69% to 92%, if a preoperative urine steroid result was available. Recurrence detection by steroid profiling preceded detection by imaging by more than 2 months in 22% to 39% of patients. Specificities varied considerably, ranging from 61% to 97%. The computational classifier detected ACC recurrence with superior accuracy (sensitivity = specificity = 81%).
Conclusion: Urine steroid metabolomics is a promising tool for postoperative recurrence detection in ACC; availability of a preoperative urine considerably improves the ability to detect ACC recurrence.

Adrenocortical carcinoma (ACC) is a rare and aggressive malignancy (1,2). Disease recurrence rates are high, even in patients with microscopically complete (R0) resection (3,4). Therefore, vigilant surveillance of all operated patients by regular cross-sectional imaging for several years is essential to facilitate early intervention in case of recurrence (5-7). Although the optimal surveillance protocol has yet to be established, a common approach involves 3-month CT scans (thorax, abdomen, pelvis) in the first 2 postoperative years, 6-month CT scans in the next 3 years and, thereafter, annual scans until 10 years postoperatively (8). This is associated with considerable costs, repeated radiation exposure, and frequent diagnostic ambiguity in early stages of recurrent/metastatic disease (9). Early detection of disease recurrence is important, as it may allow radical revision surgery in cases of limited metastatic disease volume or timely initiation of cytotoxic chemotherapy, potentially improving survival (5,7,10-13). The number of metastatic sites at diagnosis of recurrent disease and time from surgery to detection of recurrence have been shown to be independent prognostic factors (10,14).
Most ACCs are biochemically active, typically presenting a steroidogenic pattern dominated by steroid precursor metabolites rather than end products of steroidogenesis (15). This pattern has been attributed to the relative dedifferentiation of malignant cells (15,16). Most of these steroid precursors, which represent intermediate steps along the 3 major adrenocortical steroid biosynthetic pathways, are not measured by routine clinical biochemistry. Analysis of 24-hour urine collections by gas chromatography–mass spectrometry (GC-MS), however, can identify and quantify the metabolites of the large majority of adrenal-derived steroids, providing a truly comprehensive steroid profiling tool (15). This allows the detection of minute changes in steroidogenesis and the illumination of all intermediate steps that tend to be perturbed in the setting of adrenocortical malignancy. Recent retrospective studies revealed the capacity of urinary steroid profiling to distinguish ACC from benign adrenal tumors. In 2011, our group analyzed steroid metabolite profiles in 24-hour collections from 102 patients with benign adrenocortical adenomas and 56 patients with ACC by GC-MS (15). Machine learning-based analysis of the steroid data identified a distinct malignant steroid "fingerprint" for ACC and could differentiate benign from malignant adrenal tumors with a sensitivity and specificity of 90% (15). Using GC-MS, 95% of ACCs showed evidence of steroid excess, while routine biochemistry only indicated steroid excess in 73% (15). Two subsequent retrospective studies also employing GC-MS produced similar results, albeit in smaller cohorts and without the use of machine learning analysis (16,17).
In this study, we evaluated the diagnostic performance of urine steroid metabolomics, the combination of mass spectrometry-based steroid profiling and data analysis by machine learning-based algorithms, in the postoperative surveillance of ACC patients following microscopically complete (R0) tumor resection. We assessed the performance of this approach in the detection of disease recurrence, also comparing direct interpretation of steroid profiles by clinical experts to fully automated, machine learning-based analysis of the steroid metabolome.

Patients and clinical protocol
Serial postoperative 24-hour urine samples were collected from patients with histologically confirmed ACC, who had undergone microscopically complete (R0) tumor resection in 14 clinical specialist referral centers participating in the European Network for the Study of Adrenal Tumors (ENS@T; www.ensat.org), with approval of local ethical review boards and after obtaining written informed patient consent. Participating countries included the United Kingdom (Birmingham), Germany (Würzburg, Munich, Berlin), France (Paris), Italy (Florence, Turin), Greece (Athens), the Republic of Ireland (Dublin, Galway), Poland (Warsaw), Croatia (Zagreb), and Portugal (Coimbra, Lisbon). Urine samples were collected between 2007 and 2016. Inclusion criteria were defined as (i) histologically confirmed diagnosis of ACC, (ii) complete (R0) tumor resection, and (iii) provision of at least 1 postoperative 24-hour urine sample when disease-free, that is, before any radiological evidence of disease recurrence (as assessed by computed tomography [CT] thorax, abdomen, and pelvis) and within 2 years from surgery. Participating centers were prompted to provide urine samples every 3 months, but actual frequency of provided samples did not constitute an exclusion criterion as long as at least 1 postoperative sample had been provided at a time with no evidence of disease recurrence on surveillance imaging.
ACC recurrence had to be confirmed by 1 of the following: (i) emergence of new lesions on cross-sectional imaging (CT, magnetic resonance imaging), which either enlarge on follow-up scans or regress in response to systemic antitumor therapy; (ii) emergence of enhancing lesions on positron emission tomography or positron emission tomography CT scans; or (iii) histological evidence of recurrent/metastatic ACC from percutaneous biopsy or revision surgery.

Biochemical analysis
Measurement of 24-hour urinary steroid metabolite excretion was carried out by GC-MS, as described in detail previously (15). In brief, free and conjugated steroids were extracted from 1 mL urine by solid-phase extraction. Steroid conjugates were enzymatically hydrolyzed, re-extracted, and chemically derivatized to form methyloxime trimethyl silyl ethers. GC-MS was carried out on an Agilent 5975 instrument operating in selected-ion-monitoring mode to achieve sensitive and specific detection and quantification of 19 selected steroid metabolites (Table 1) comprising 8 of the 9 previously described "malignant steroid fingerprint" metabolites indicative of ACC (15). We did not include glucocorticoid metabolites as these are uninterpretable in mitotane-treated ACC patients, who all receive high-dose glucocorticoid replacement while also being subject to the strong induction of the cortisol-metabolizing enzyme CYP3A4 by mitotane (18).

Clinical expert review of steroid profiles
Three clinical experts with extensive experience in adrenal disease (I.B., M.O.R., W.A.) were provided with longitudinally collected postoperative urinary steroid profiles from patients who either (i) developed disease recurrence ("recurrence cohort") or (ii) remained recurrence-free over a follow-up period of at least 3 years, which we considered our "recurrence negative" cohort, as the chances of ACC recurrence past this time-point are low (19). Provision of at least 1 sample at a "disease-free" state was an essential inclusion criterion for this study; therefore, all included recurred patients had provided at least 2 postoperative urine samples (one pre- and one post-recurrence). Similarly, we only included patients from the "recurrence-free" cohort who had provided at least 2 postoperative urine samples for this study part. Preoperative steroid profiles were provided when available.
The 3 assessors were blinded to clinical and radiological information other than basic patient demographics (age, sex) and were only provided with a previously established steroid metabolite reference range derived from a healthy adult control cohort (age range 20-81 years; 77 women, 54 men). The clinical experts were asked to identify the first urine indicative of a recurrence (or state "no recurrence" in patients that they considered as non-recurred), taking into account differences of the steroid profiles to those observed in healthy controls and the previously observed "malignant steroid fingerprint" in patients with a primary ACC tumor in situ (15).
Recurrence detection by the clinical experts was considered successful only if based on interpretation of the steroid profile in a urine sample collected before or at the time of the first radiological detection of recurrent disease. This means that late biochemical detection in relation to imaging did not count as true positive for the purposes of sensitivity calculations.
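The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sensitivity`, the `count_late` switch, and all dates are hypothetical and do not come from the study data.

```python
from datetime import date


def sensitivity(cases, count_late=False):
    """Fraction of recurred patients whose recurrence was flagged on urine.

    cases: list of (flagged_date, imaging_date) pairs, where flagged_date is
    the collection date of the first urine a reviewer called a recurrence
    (None if the reviewer never called one) and imaging_date is the date of
    first radiological detection.
    """
    hits = 0
    for flagged, imaging in cases:
        if flagged is None:
            continue  # recurrence never called: false negative
        # A call counts only if made before or at the time of imaging,
        # unless late detections are explicitly accepted.
        if flagged <= imaging or count_late:
            hits += 1
    return hits / len(cases)


# Hypothetical recurred patients (dates invented for illustration).
cases = [
    (date(2012, 1, 10), date(2012, 4, 2)),  # flagged before imaging
    (date(2013, 6, 1), date(2013, 6, 1)),   # flagged at the time of imaging
    (date(2014, 9, 15), date(2014, 7, 1)),  # flagged only after imaging
    (None, date(2015, 3, 3)),               # never flagged
]
print(sensitivity(cases))                   # 0.5
print(sensitivity(cases, count_late=True))  # 0.75
```

With the strict rule, the third hypothetical patient counts as a false negative; relaxing the rule moves that patient to the true positives.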

Machine learning-based data analysis
Supervised machine learning was used to create an approach for automatic separation of recurrent from nonrecurrent patients (20). The machine learning algorithm was developed by presenting the results of the 19 steroid markers measured by GC-MS in a given 24-hour urine, and the corresponding output, that is, a "yes" or "no" answer to the question of whether an ACC recurrence had been radiologically detected at the time of urine collection. From these "training" examples, the algorithm learned to generalize by finding patterns in the steroid data and use them to provide an output answer when the output is not known.
We used machine learning to approach 2 separate 2-class classification problems. First, we considered the differentiation of all 215 urine samples collected in the 39 non-recurred patients from all 76 urine samples collected post-recurrence in the 32 recurred patients. Second, to test the ability of our approach for very early detection, we aimed to differentiate all non-recurred samples against the first urine sample collected in each recurrent patient at the time of first radiological detection of ACC recurrence (35 samples, as 3 of 32 patients had 2 recurrences).
Random forests were used as the machine learning classifier (21,22). The random forest is a classification framework based on the concept of decision trees. It builds a forest of many decision trees to create a strong classifier that is resistant to noise and overtraining. Another favorable property of random forests is that they give insight into the importance of features, which we exploited to inspect the contribution and relevance of each steroid metabolite in the classification problem. For all experiments, including training and validation of the random forest prediction models, we used Matlab 2015a, specifically the TreeBagger class (included in the Statistics and Machine Learning Toolbox) (Matlab documentation, 2018). To estimate predictor importance, the parameter that controls computation of predictor importance was set to "on" (this parameter is called "oobvarimp" in Matlab 2015a). The number of decision trees used to obtain the results was 128, which provided an optimal trade-off between speed and performance. Tenfold cross-validation was used to estimate the classifier's predictive quality. To account for the differences in the number of samples between the healthy and the recurrence classes, the validation procedure was repeated 50 times for randomized splits of the data. In each run, the healthy class was randomly subsampled to make sure that both classes had an equal number of samples.
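The resampling scheme described above (50 repeats, subsampling of the larger class, tenfold cross-validation) can be illustrated with a short Python sketch. It is not the Matlab TreeBagger code used in the study: the function name `balanced_cv_splits` is hypothetical, the classifier itself is omitted, and the sketch splits by sample because the text does not state whether samples from the same patient were kept in the same fold.

```python
import random


def balanced_cv_splits(labels, n_folds=10, n_repeats=50, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated, class-balanced k-fold CV.

    labels: list of 0/1 class labels, one per urine sample.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_repeats):
        # Randomly subsample the larger class to the size of the smaller one.
        kept = rng.sample(majority, len(minority))
        small = minority[:]
        rng.shuffle(small)
        # Deal each class round-robin into folds so every fold stays balanced.
        folds = [[] for _ in range(n_folds)]
        for group in (small, kept):
            for j, idx in enumerate(group):
                folds[j % n_folds].append(idx)
        for k in range(n_folds):
            test = folds[k]
            train = [i for f, fold in enumerate(folds) if f != k for i in fold]
            yield train, test


# Class sizes of the first classification problem: 76 post-recurrence
# samples (label 1) and 215 samples from non-recurred patients (label 0).
labels = [1] * 76 + [0] * 215
splits = list(balanced_cv_splits(labels))
print(len(splits))  # 500 train/test pairs (50 repeats x 10 folds)
```

Each pair would be passed to a classifier of choice; averaging the per-fold results over all repeats gives the cross-validated performance estimate.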

Statistical analysis
Data analysis and graphic representation were completed using GraphPad Prism Software Version 8. Data are summarized as median (interquartile range) values unless otherwise stated. Sensitivities and specificities are accompanied by 95% confidence intervals (95% CI), derived using the Wilson/Brown method (23,24).
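For readers who want to reproduce such intervals outside GraphPad Prism, the following Python sketch computes a Wilson score interval. It assumes that the Wilson/Brown method corresponds to the standard Wilson score interval without continuity correction; the function name `wilson_interval` and the example count (24 of 32) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Return (lower, upper) Wilson score bounds for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical count: 24 detections among 32 recurred patients (75%).
low, high = wilson_interval(24, 32)
print(f"{low:.0%} to {high:.0%}")  # prints: 58% to 87%
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 and behaves well for proportions close to 0% or 100%, which matters for small cohorts.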

Patient characteristics
We recruited 135 patients (50 men, 85 women) who had undergone complete (R0) resection of a histologically confirmed ACC and provided at least one 24-hour urine sample while considered disease-free according to their most recent clinical and radiological assessment and no later than 2 years postoperatively (Fig. 1). Median age at diagnosis was 49 years (range 18-80 years).
During a median follow-up period of 32 months (interquartile range 15-48 months), 42 of 135 patients (31%) developed disease recurrence; of these, 10 had to be excluded from the analysis as they had not provided 24-hour urines after the detection of recurrence.
Of the cohort of patients who remained disease-free postoperatively, 39 were clinically and radiologically followed for more than 3 years; as ACC recurrence presenting beyond that time frame is rare (19), we defined those 39 patients as the "recurrence-free" cohort for the purposes of this study. Relevant clinical details of both cohorts are summarized in Table 2. The remaining 54 patients without radiological evidence of recurrence but postoperative follow-up <3 years were excluded from further analysis, as they were still at high risk of potentially harboring minimal recurrent disease which had yet to manifest radiologically.
The 39 patients of the "recurrence-free" cohort provided a median of 4 (range 1-24) postoperative 24-hour urine samples. In the "recurrence" cohort, the patients collected a median of 5 (range 2-35) postoperative 24-hour urine samples; 13 of the 32 patients had also collected a preoperative 24-hour urine sample, facilitating the comparison of steroid profiles observed at diagnosis of the primary tumor and at detection of ACC recurrence. All samples provided by the recurred patients are depicted in Fig. 2A, plotted against time after surgery. Single-organ involvement at recurrence detection was diagnosed in 26 of the 32 recurred patients; the remaining 6 had disease affecting more than 1 organ (Table 2). We classified 15 of the 32 recurrences as "high volume" at the time of the first abnormal imaging, defined as at least 1 solid-organ lesion ≥1 cm, and 12 as "low volume"; 5 were indeterminate due to incomplete imaging information.

Table 2. Demographics and clinical characteristics of the "recurrence" cohort of patients with disease recurrence and at least one post-recurrence urine (n = 32) and the "recurrence-free" cohort (patients disease-free after ≥3 years of follow-up; n = 39). Where data are not available for the full cohort, number of patients with available data is provided as denominator.
Characteristic | Recurrence cohort (n = 32) | Recurrence-free cohort (n = 39)
Number of organs involved in recurrence | 1 (n = 27); 2 (n = 6); 3 (n = 3) | N/A
Location of recurrences | Lung (n = 22); Liver (n = 10); Lymph nodes (n = 5); Local recurrence (n = 4); Bone, spleen, omentum, pleura (each n = 1) | N/A

Longitudinal urine steroid profiling
We hypothesized that the development of radiologically detectable recurrent or metastatic disease would be heralded by an increase in one or more adrenal steroid metabolites excreted in 24-hour urine. Such changes were indeed observed; indicative example cases are shown in heat-map format in Fig. 2B.
An important question here was whether the "malignant steroid fingerprint" observed at baseline (ie, in the preoperative urine at the time of first diagnosis of ACC) represents an inherent characteristic of the individual ACC that is largely preserved upon disease recurrence. We found that this was indeed the case, with re-emergence of steroid metabolites at recurrence mostly identical to those found increased at baseline in the vast majority of patients (Fig. 3A). The overall 6 most increased steroids comprised the 11-deoxycortisol metabolite, tetrahydro-11-deoxycortisol (THS); the 11-deoxycorticosterone metabolite, tetrahydrodeoxycorticosterone; the pregnenolone and 17-hydroxypregnenolone metabolites, 5-pregnenediol and 5-pregnenetriol; and the progesterone and 17-hydroxyprogesterone metabolites, pregnanediol and pregnanetriol (Fig. 3B). The magnitude of steroid marker elevation, however, was substantially smaller upon disease recurrence than in primary ACCs (Table 3), as expected in view of the major differences in disease volume between primary tumor and ACC recurrence (median maximum diameter 92 vs. 11 mm, respectively; Table 2).
The diagnostic performance of the steroid profile review was not altered by adjuvant mitotane treatment (Fig. 4B). Whether the tumor was found to be hormonally active or not at baseline (on clinical biochemistry) also did not appear to affect the diagnostic performance of the urine steroid metabolome on recurrence (data not shown).
The proportion of non-recurred patients in whom recurrences were incorrectly detected by reviewing clinicians varied considerably across the assessors (false positive rates 23% [95% CI 11-39%], 3% [95% CI 0-16%], and 39% [95% CI 24-56%] for clinicians 1, 2, and 3, respectively; Fig. 4C). The effect of the availability of a preoperative urine sample on the specificity of detection could not be meaningfully assessed as only 5 non-recurred patients had provided a preoperative sample.
Tumor volume at the time of first abnormal surveillance imaging was a second clinical factor that showed a tendency toward affecting clinician ability to detect recurrence ( 1-3, respectively).

(15). Expressed as fold change in comparison to the upper limit of normal (ULN) referring to a healthy adult control cohort. We compared steroid excretion in the preoperative samples collected with the primary tumor in situ to the first urine samples collected after radiological recurrence detection (= 1st post-recurrence sample) in the 13 patients with ACC recurrence who provided both pre- and postoperative urine samples. Abbreviations: THS, tetrahydro-11-deoxycortisol; 5-PD, 5-pregnenediol; PD, pregnanediol; PT, pregnanetriol; 5-PT, 5-pregnenetriol; THDOC, tetrahydrodeoxycorticosterone; Etio, etiocholanolone; 5α-THA, 5α-tetrahydro-11-dehydrocorticosterone.
Of note, a considerable proportion of correct recurrence detections (ranging from 22%-39% for the 3 experts) were made based on urine collections that predated the first radiological evidence of recurrence by more than 2 months (Fig. 4D). Only a small number of recurrences were detected later by urine steroid profile interpretation than radiological detection. If late detections were accepted as positive, the overall sensitivities of the clinicians would improve to 75% (95% CI 58-87%), 56% (95% CI 39-72%), and 81% (95% CI 65-91%) for clinicians 1-3, respectively.

Computational analysis of steroid data
Machine learning-based analysis of the urine steroid profile data by random forests was able to distinguish post-recurrence urine samples (n = 76) provided by the 32 recurred patients from postoperative urine samples (n = 215) provided by the 39 non-recurred patients with high accuracy (85%; area under the receiver operating characteristic curve (AUROC) 0.89, 95% CI 0.86-0.91; sensitivity = specificity = 81%) (Fig. 5A).
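As a reminder of what the AUROC measures, the following Python sketch computes it as the probability that a randomly chosen post-recurrence sample receives a higher classifier score than a randomly chosen sample from a non-recurred patient (ties count as one half). The function name `auroc` and all scores are hypothetical and are not taken from the study.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5   # ties contribute one half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (e.g., fraction of trees voting "recurrence").
recurrence_scores = [0.9, 0.8, 0.4]
non_recurred_scores = [0.7, 0.3, 0.2, 0.1]
print(round(auroc(recurrence_scores, non_recurred_scores), 3))  # 0.917
```

In this toy example, 11 of the 12 possible pairs are ranked correctly, giving an AUROC of 11/12. The pairwise form is quadratic in the number of samples, which is unproblematic for data sets of a few hundred samples.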
The machine learning analysis determined the 11-deoxycortisol metabolite THS as the single most important steroid metabolite in differentiating post-recurrence urine samples from samples provided by non-recurred patients, followed by the mineralocorticoid precursor metabolite tetrahydrocorticosterone, the pregnenolone metabolite pregnenediol and the androgen metabolite etiocholanolone (Fig. 5B).

Discussion
In this study, we explored the utility of urinary steroid profiling as a novel diagnostic tool for recurrence detection in patients with microscopically complete (R0) resection of ACC. Results show that analysis of the steroid profiling data by a machine learning-based algorithm represents a highly promising noninvasive and radiation-free tool. Once validated prospectively, this would be a useful addition to the current imaging-focused follow-up protocols, expediting scans in patients with suspicious biochemistry and informing discussions in cases with ambiguous imaging results.
Urinary steroid profiling in conjunction with machine learning-based data analysis, also termed urine steroid metabolomics, has already yielded highly promising results in several retrospective studies in patients with primary adrenal masses, where it was employed to differentiate ACCs from benign adrenal tumors (15,16). In the distinct clinical setting of postoperative patient surveillance after resection of ACC, the use of urine steroid profiling has only been reported in a few cases (25,26), but has never been systematically investigated.
In the present study, we included 135 adult patients with microscopically complete R0 resection of ACC recruited from 14 centers associated with the European Network for the Study of Adrenal Tumors (ENSAT). Of the 81 patients who completed 3 years of postoperative surveillance, 42 (52%) recurred, a rate that is similar to previous retrospective studies (19,27), which suggests that our patient cohort is representative of ACC patients routinely seen in clinical practice.
An important finding of our study is that there were substantial similarities between the steroid profiles of recurrent ACCs and their respective steroid profiles collected preoperatively, with the primary tumor in situ. Indeed, half of all recurrent ACCs shared 4 or 5 of their "top 6" most elevated steroid metabolites with their primary tumor of origin. Most of the "malignant steroid biomarkers" that were identified in our 2011 study on primary adrenal tumors (15) were also highly relevant in the context of recurrent disease, comprising the "top 6" increased metabolites detected in urines collected from patients with ACC recurrences. Consequently, expert clinicians had improved ability to detect recurrence if a preoperative urine steroid profile was available. This emphasizes the importance of preoperative, baseline sample collection to facilitate personalized management in patients with ACC, a rare cancer in which baseline tissue and blood collection is increasingly becoming routine to support individualized diagnosis and therapy (28).
We assessed the diagnostic potential of urinary steroid profiling as a recurrence surveillance tool using two approaches: (i) an "expert review" approach and (ii) automatic recurrence detection by computational analysis of steroid data using a machine learning-based algorithm. On retrospective, blinded assessment of serial 24-hour urine collections, clinicians were able to detect recurrence by the time of its first radiological manifestation with high sensitivity in cases where a preoperative urine sample was available. In patients who were only able to contribute postoperative urine samples, the ability of clinicians to detect recurrence was substantially lower.
Adjuvant mitotane did not compromise the diagnostic performance of reviewing clinicians, despite the drug's well-documented ability to inhibit steroidogenesis (18). Mitotane interferes with adrenal steroidogenesis in a number of ways, including (i) overall suppression of steroidogenesis resulting in lower excretion values for all steroid metabolites, (ii) increased glucocorticoid breakdown by induction of CYP3A4, necessitating high-dose hydrocortisone replacement, and (iii) 5α-reductase inhibition, leading to a decrease in 5α-reduced steroids (18). Although mitotane appeared to blunt the magnitude of the increases in ACC-specific steroid biomarkers in recurred patients, it also suppressed the random sample-to-sample variability, which can be diagnostically helpful. We systematically excluded glucocorticoid metabolites from the steroid analysis, as these would be compromised both by mitotane-induced changes in glucocorticoid metabolism and exogenous hydrocortisone replacement.
We applied a machine learning-based approach to the urinary steroid profiling data to detect recurrent ACC in an automated and defined fashion. The biochemical complexity of steroidogenesis with multiple substrates, products, and pathways, in combination with the small underlying disease volumes in the setting of recurrent malignancy, renders individual biomarkers diagnostically insufficient. Machine learning-based approaches are ideally suited to systematically evaluate the wealth of data provided by multisteroid profiling in an objective and reproducible fashion, as already demonstrated in the differential diagnosis of adrenal incidentalomas (15). Our classifier could distinguish recurred samples from samples provided by non-recurred patients with considerable accuracy. THS was the most important indicator of malignancy, reflecting the pattern of inefficient steroidogenesis in ACC that emerged in previous studies on detection of ACC in patients with adrenal masses (15,17,29). Indeed, all but one of the 6 steroids that were identified by random forest as most differentiating between recurrence and non-recurrence are contained in the previously described "malignant steroid fingerprint" in ACC (15). It should be noted that, unlike assessing clinicians, the computational classifier did not take into account the dynamic longitudinal changes in steroid metabolites in individual patients but judged every sample on its own.
To our knowledge, this is the first study systematically exploring the diagnostic potential of urine steroid profiling in the postoperative monitoring of ACC patients. Strengths of our study include the large cohort size and the application of computational analysis to meet the demands of the multivariable GC-MS steroid datasets. The limitations of our work pertain to the paucity of preoperative samples in the majority of patients, the variable frequency of postoperative sample collections and the fact that the machine learning classifier has not been validated on an additional data set. We also did not systematically compare the results of routine biochemical analysis of serum steroids to the 24-hour urine analysis by GC-MS; however, we previously demonstrated that routine serum biochemistry only identified abnormalities in 73% of ACC patients (n = 47), while urine steroid metabolomics by GC-MS found abnormalities in 95% (15).
Against this background, and despite the generally small disease volume in the recurred patients in comparison to patients presenting with large primary tumors (15), our approach yielded very promising diagnostic results. Our data indicate that availability of a preoperative urine and, thus, of the preoperative "steroid fingerprint" considerably improves the likelihood of recurrence detection and, therefore, the preservation of a preoperative 24-hour urine sample should be routinely considered, in addition to preservation of serum, plasma, and tissue, to facilitate precision medicine.
In conclusion, we demonstrated that urine steroid metabolomics, that is, the combination of mass spectrometry-based steroid profiling with machine learning-based steroid data analysis, is superior to interpretation of steroid profile results by individual experts. Following potential further refinement of this algorithm, this diagnostic approach should be taken forward to be assessed against radiological disease detection in a prospective test validation study with systematic collection of pre- and postoperative urines in defined intervals. This will also allow for systematic comparison of serum and 24-hour urine steroid profiles and ideally utilize high-throughput technology, such as liquid chromatography–tandem mass spectrometry or, as recently published (30), high-resolution accurate mass spectrometry, both of which are highly suitable for rollout of urine steroid metabolomics into the routine clinical context.