An Anatomical Site and Genetic-Based Prognostic Model for Patients With Nuclear Protein in Testis (NUT) Midline Carcinoma: Analysis of 124 Patients.

Abstract
Background: NUT midline carcinoma, renamed NUT carcinoma (NC), is an aggressive squamous cancer defined by rearrangement of the NUTM1 gene. Although a subset of patients can be cured, for the majority of patients the prognosis is grim. We sought to classify patients into risk groups based on molecular and clinicopathologic factors at the time of diagnosis.
Methods: Clinicopathologic variables and survival outcomes were extracted for a total of 141 NC patients from the NUT midline carcinoma Registry using questionnaires and medical records. Translocation type was identified by molecular analyses. Survival tree regression analysis was performed to determine risk factors associated with overall survival (OS).
Results: For 141 patients, the median age at diagnosis was 23.6 years. Fifty-one percent had thoracic origin compared with 49% nonthoracic sites (41% head and neck, 6% bone or soft tissue, 1% other). The median OS was 6.5 months (95% confidence interval [CI] = 5.8 to 9.1 months). Most patients had the BRD4-NUTM1 fusion (78%), followed by BRD3-NUTM1 (15%) and NSD3-NUTM1 (6%). Survival tree regression identified three statistically distinct risk groups among 124 patients classified by anatomical site and genetics: group A is nonthoracic primary, BRD3-, or NSD3-NUT (n = 12, median OS = 36.5 months, 95% CI = 12.5 months to not reported); group B is nonthoracic primary, BRD4-NUT (n = 45, median OS = 10 months, 95% CI = 7 to 14.6 months); and group C is thoracic primary (n = 67, median OS = 4.4 months, 95% CI = 3.5 to 5.6 months). Only groups A and B had long-term (≥3 years, n = 12) survivors.
Conclusions: We identify three risk groups defined by anatomic site and NUT fusion type. Nonthoracic primary with non-BRD4-NUT fusion confers the best prognosis, followed by nonthoracic primary with BRD4-NUT. Thoracic NC patients, regardless of the NUT fusion, have the worst survival.


Archer® FusionPlex®
Archer FusionPlex analysis was performed in the Center for Integrated Diagnostics (CID) in the Department of Pathology at the Massachusetts General Hospital for select cases in which the fusion partner of NUTM1 was not identified by FISH or cytogenetics. An Anchored Multiplex PCR (AMP) assay was performed with the Archer® FusionPlex® Solid Tumor Kit (ArcherDX, Boulder, CO) to detect targeted fusion transcripts by next-generation sequencing (NGS) [3]. Total nucleic acid was isolated from FFPE sections after histological review for tumor enrichment. The total nucleic acid was reverse transcribed with random hexamers, followed by second-strand synthesis to create double-stranded complementary DNA (cDNA). The double-stranded cDNA was end-repaired, adenylated, and ligated with a half-functional adapter. Two hemi-nested PCR reactions using the Archer® FusionPlex® Solid Tumor Kit primers were performed to create a fully functional sequencing library that targets specific genes (exons) listed previously [4] and validated for clinical reporting, including, among multiple other genes, BRD3 (exons 9-12), BRD4 (exons 10, 11), and NUTM1 (exon 3). Illumina NextSeq 2 × 150 bp paired-end sequencing results were aligned to the hg19 human genome reference using bwa-mem [5]. A laboratory-developed algorithm was used for fusion transcript detection and annotation [4]. The integrity of the input nucleic acid and the technical performance of the assay were assessed with a qualitative reverse transcription qPCR assay and by assessing the DNA/RNA content in the sequencing results. The assay is validated for samples showing 5% or higher tumor cellularity.

Next-generation (OncoPanel) targeted sequencing
OncoPanel molecular profiling of FFPE sections of tumors was performed for patients who received routine care at the Dana-Farber Cancer Institute (DFCI) and had consented to an IRB-approved, institute-wide research protocol. DNA extracted from FFPE tissue was subjected to targeted exon hybrid capture (Agilent, Santa Clara, CA) and NGS using an Illumina HiSeq 2500 (Illumina, San Diego, CA). Exons of 447 cancer-associated genes were interrogated for mutations and copy number variations, and 191 introns across 60 genes were examined for structural rearrangements, using the targeted sequencing platform developed at Brigham and Women's Hospital (BWH) [6].
Bioinformatic detection of single nucleotide variants and small indels was performed using MuTect and GATK software. RobustCNV (an internally developed tool) was used for copy number analysis, and BreaKmer was used to detect large structural variations, as described [7,8].

Simulation study to determine the sample size needed to validate the risk classification model
We performed computer simulations to determine the minimum sample size of a prospectively collected NC patient cohort needed to validate our proposed risk classification model with 80% power.
We used the observed characteristics of the prognostic factors and overall survival outcomes in our current cohort of N=141 patients to simulate "mock" validation datasets. We then reconstructed the proposed risk classification model in each simulated dataset: we first dichotomized patients by primary tumor site, and then further dichotomized non-thoracic patients by NUT translocation. We performed Cox proportional-hazards regression at each of the two "splits" in the survival tree. If both Cox models were statistically significant (p<0.05), we declared our risk classification model "validated" in this simulated dataset. Conversely, if either Cox model was not statistically significant, we declared our risk classification model not validated in this simulated dataset. We then repeated this process in 10,000 independently simulated datasets.
The power to validate the risk classification model is the proportion of 10,000 simulation replicates where the risk classification model was validated. We assessed the validation power across 18 different sample sizes for the validation cohort, ranging from N=100 to 200.
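The simulation logic described above can be illustrated with a minimal sketch. It is not the procedure used in the study and rests on several assumptions: survival times are drawn from exponential distributions with the group medians reported in the abstract, group proportions follow the 124 classified patients, follow-up is censored at a hypothetical 36 months, the two-sample log-rank test stands in for the Cox model at each split, and far fewer than 10,000 replicates are run. All function and variable names are hypothetical.

```python
import math
import random

# Median OS in months and group sizes, taken from the abstract.
MEDIAN_OS = {"A": 36.5, "B": 10.0, "C": 4.4}
GROUP_SIZE = {"A": 12, "B": 45, "C": 67}


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank p-value (chi-square, 1 df) comparing two groups."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    data.sort(key=lambda row: row[0])
    at_risk = [len(times_a), len(times_b)]
    obs_minus_exp, variance = 0.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths, removed = [0, 0], [0, 0]
        while i < len(data) and data[i][0] == t:  # handle tied times together
            _, event, group = data[i]
            removed[group] += 1
            deaths[group] += event
            i += 1
        n, d = sum(at_risk), sum(deaths)
        if d > 0 and n > 1:
            frac_a = at_risk[0] / n
            obs_minus_exp += deaths[0] - d * frac_a
            variance += d * frac_a * (1 - frac_a) * (n - d) / (n - 1)
        at_risk[0] -= removed[0]
        at_risk[1] -= removed[1]
    if variance == 0:
        return 1.0  # no information, e.g. one group is empty
    chi2 = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def simulate_cohort(n, rng, follow_up=36.0):
    """Draw n patients as (group, time, event); follow_up is an assumption."""
    groups = rng.choices(list(GROUP_SIZE), weights=list(GROUP_SIZE.values()), k=n)
    cohort = []
    for g in groups:
        t = rng.expovariate(math.log(2) / MEDIAN_OS[g])  # assumed exponential
        cohort.append((g, min(t, follow_up), int(t <= follow_up)))
    return cohort


def is_validated(cohort, alpha=0.05):
    """Both tree splits must be significant for the model to count as validated."""
    def subset(keys):
        rows = [(t, e) for g, t, e in cohort if g in keys]
        return [t for t, _ in rows], [e for _, e in rows]
    # Split 1: thoracic (C) versus non-thoracic (A and B).
    p_site = logrank_p(*subset({"C"}), *subset({"A", "B"}))
    # Split 2: within non-thoracic, non-BRD4 (A) versus BRD4 (B).
    p_fusion = logrank_p(*subset({"A"}), *subset({"B"}))
    return p_site < alpha and p_fusion < alpha


def validation_power(n, n_rep=500, seed=0):
    """Proportion of simulated cohorts of size n in which the model validates."""
    rng = random.Random(seed)
    return sum(is_validated(simulate_cohort(n, rng)) for _ in range(n_rep)) / n_rep
```

Calling `validation_power(n)` over a grid of candidate sample sizes and taking the smallest `n` whose estimate reaches 0.80 mirrors the approach described in the text; the estimates depend on the assumed survival distribution and censoring pattern.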

Details for generating the risk classification model using survival tree regression
Using survival tree regression analysis, we selected the risk factor with the strongest statistical association (i.e., lowest p-value) to recursively dichotomize the patients into risk groups. We first dichotomized patients by primary tumor site, which was the risk factor with the strongest univariate association in the full cohort (HR=3.4 [95% CI=2.2-5.1]; p<0.0001; Table 2). Within the subset of patients with thoracic primaries (N=67), none of the remaining risk factors were significantly associated with OS (p>0.07; Supplementary Table S1); thus, the thoracic patient subset was not further dichotomized. For the subset of patients with non-thoracic primaries (N=64), we further dichotomized patients by gene fusion, which was the only significant risk factor in this subset (HR=2.5 [95% CI=1-6]; p=0.042). Since no further risk factors were significant within the gene fusion subsets, patients were not further dichotomized.
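The resulting two-split tree can be written as a simple decision rule. The sketch below is illustrative only: the function name, the string labels for site and fusion partner, and the choice to raise an error for fusion partners other than BRD3, BRD4, and NSD3 are assumptions, since the text does not say how other or unknown partners were handled.

```python
def risk_group(primary_site: str, fusion_partner: str) -> str:
    """Assign risk group A, B, or C from primary site and NUTM1 fusion partner."""
    # Split 1: thoracic primaries form group C regardless of fusion partner.
    if primary_site.strip().lower() == "thoracic":
        return "C"
    # Split 2: non-thoracic primaries are divided by fusion partner.
    partner = fusion_partner.strip().upper()
    if partner == "BRD4":
        return "B"
    if partner in {"BRD3", "NSD3"}:
        return "A"
    # Assumption: other or unknown partners are left unclassified.
    raise ValueError(f"unclassified fusion partner: {fusion_partner!r}")
```

For example, `risk_group("head and neck", "BRD3")` returns `"A"`, while any thoracic primary returns `"C"`.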