TP53 mutations, tetraploidy and homologous recombination repair defects in early stage high-grade serous ovarian cancer

To determine early somatic changes in high-grade serous ovarian cancer (HGSOC), we performed whole genome sequencing on a rare collection of 16 low stage HGSOCs. The majority showed extensive structural alterations (one had an ultramutated profile), exhibited high levels of p53 immunoreactivity, and harboured a TP53 mutation, deletion or inactivation. BRCA1 and BRCA2 mutations were observed in two tumors, with nine showing evidence of a homologous recombination (HR) defect. Combined analysis with The Cancer Genome Atlas (TCGA) indicated that low and late stage HGSOCs have similar mutation and copy number profiles. We also found evidence that deleterious TP53 mutations are the earliest events, followed by deletions or loss of heterozygosity (LOH) of chromosomes carrying TP53, BRCA1 or BRCA2. Inactivation of HR appears to be an early event, as 62.5% of tumours showed a LOH pattern suggestive of HR defects. Three tumours with the highest ploidy had little genome-wide LOH, yet one of these had a homozygous somatic frame-shift BRCA2 mutation, suggesting that some carcinomas begin as tetraploid then descend into diploidy accompanied by genome-wide LOH. Lastly, we found evidence that structural variants (SV) cluster in HGSOC, but are absent in one ultramutated tumor, providing insights into the pathogenesis of low stage HGSOC.


INTRODUCTION
Worldwide, ovarian cancer has been estimated to affect 225 500 women and claim 140 200 lives annually (1). The majority of ovarian cancers are of epithelial origin and consist of four major morphological subtypes: serous, endometrioid, clear cell and mucinous (2). Low-grade serous and mucinous carcinomas may develop in stepwise fashion from adenomas to carcinomas, while clear cell and endometrioid carcinomas often arise from endometriosis. In contrast, high-grade serous (HGS) carcinomas develop from an undefined precursor lesion and may progress rapidly without obvious intermediate steps (3). Due to this rapid progression, as well as the lack of specific symptoms and effective early detection methods, HGS ovarian carcinomas are the most lethal subtype, being primarily diagnosed at advanced stages (4). Consequently, early stage HGS ovarian carcinoma is rare (5). In order to elucidate early events in this lethal disease, determine whether the pathogenesis of early stage disease is the same or different from more advanced disease, and accelerate the development of genome-based biomarkers for early detection, we analyzed whole cancer genomes for 16 low stage HGS ovarian carcinomas from the Mayo Clinic and cancer exomes from 28 low stage and 316 late stage HGS carcinomas from TCGA.

Mayo Clinic patients and sequencing
We characterized the whole genomes and transcriptomes from 16 low stage HGS ovarian cancer cases seen at Mayo Clinic (Supplementary Tables S1 and S13). Fresh frozen carcinomas and patient-matched blood samples were used to generate DNA paired end libraries that were sequenced using 100 bp reads on an Illumina GAIIx platform and for genotyping on Illumina Human660W-Quad BeadChip arrays. RNA sequencing was performed on 14 samples using RNA from fresh frozen cancer tissue on the Illumina GAIIx using 50 bp paired end reads. Formalin-fixed paraffin-embedded carcinomas were used for immunohistochemical staining.

TCGA data
In additional somatic mutation analyses, we utilized exome sequence data from 28 low stage HGS carcinomas and 316 late stage HGS carcinomas from the TCGA (6). For additional copy number (CN) analysis, we utilized 28 low stage HGS carcinomas and 446 late stage HGS carcinomas with access-level 3 segmented data from Affymetrix 6.0 single nucleotide polymorphism (SNP) arrays (downloaded from tcga-data.nci.nih.gov).

Bioinformatic sequence analysis
To call DNA sequence variants using single samples, genotypes were initially called on carcinoma and germline samples separately using the Casava 1.7 pipeline. Those calls were compared to the genotype calls from the Illumina Human660W-Quad BeadChip. For somatic paired variant calling, reads were aligned with the bwa (7) aligner, then realigned as paired samples using the IndelRealigner in GATK v0.6 (8). Somatic variants were called with Somatic-Sniper (SS) v1.0 (9) for single nucleotide variant (SNV) calls and the GATK Somatic Indel detector in GATK v0.6 (8) for insertions/deletions (indels). Variants were annotated using the TREAT workflow (10). Variants were then filtered, requiring a minimum somatic score of 15 for SNVs, no overlap with a SNP/indel in dbSNP132 (with a minor allele frequency (MAF) of 1% or more), at least two mutant alleles in the carcinoma (with quality of at least Q20), with one forward and one reverse read, a maximum Fisher strand bias of 40, no more than one mutant read in the germline (or 1% of coverage, whichever is higher) and no more than two highly homologous regions in the human genome (85% homology using blat) for a region with 50 nucleotides on either side of the variant. To identify significant somatic mutations (i.e. enriched beyond regional background mutation rates), we used the background mutation rate computed with the bmr and smg tools of the GenomeMuSiC package v0.4 (11). To count transcript levels in RNA sequence data, 50 bp paired end reads were aligned against hg19 using Tophat 1.4 (12), with the option to align first to known genes with bowtie 1; counts were produced using htseq 0.5.3 (13).
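The SNV filtering criteria listed above can be expressed as a single predicate. The following is a minimal sketch, not the pipeline used in the study: the `SnvCall` record and all of its field names are hypothetical and do not correspond to SomaticSniper, GATK or TREAT output columns, and counting the blat hits as "homologous regions" is an assumption about how that criterion was applied.

```python
from dataclasses import dataclass


@dataclass
class SnvCall:
    # Hypothetical record; field names are illustrative only.
    somatic_score: int      # somatic score of the SNV call
    dbsnp_maf: float        # MAF of an overlapping dbSNP132 entry, 0.0 if none
    tumor_alt_fwd: int      # Q20+ mutant reads on the forward strand (tumor)
    tumor_alt_rev: int      # Q20+ mutant reads on the reverse strand (tumor)
    fisher_strand: float    # Fisher strand bias score
    germline_alt: int       # mutant reads seen in the germline sample
    germline_depth: int     # total germline coverage at the site
    homologous_hits: int    # regions with >=85% homology for the +/-50 nt window


def passes_filters(c: SnvCall) -> bool:
    """Apply the thresholds stated in the text to one candidate SNV."""
    # "no more than one mutant read in the germline (or 1% of coverage,
    # whichever is higher)"
    max_germline_alt = max(1.0, 0.01 * c.germline_depth)
    return (
        c.somatic_score >= 15
        and c.dbsnp_maf < 0.01
        and c.tumor_alt_fwd + c.tumor_alt_rev >= 2
        and c.tumor_alt_fwd >= 1
        and c.tumor_alt_rev >= 1
        and c.fisher_strand <= 40
        and c.germline_alt <= max_germline_alt
        and c.homologous_hits <= 2
    )
```

Keeping each threshold as a separate clause makes it straightforward to report which criterion rejected a given variant if the predicate is later split up.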

Technical validation of Mayo Clinic somatic mutations
Five carcinomas were sequenced on a standardized NextGen sequencing cancer panel (Supplementary Table S2) at key genes of interest; all carcinomas with TP53 and BRCA2 mutations were confirmed by this approach. Candidate somatic or deleterious germline mutations were filtered from the standardized NextGen panel results by removing common SNPs (MAF 2% or higher in dbSNP 137) as long as they did not have known deleterious alleles in dbSNP or a locus-specific database (14). Remaining variants are reported in Supplementary Table S3. All the somatic variants reported by NextGen sequencing were also included in the filtered set of somatic variants from the whole genome sequencing.

Mutation validation with Sanger sequencing
Sanger sequencing of regions containing somatic mutations in the 20 most frequently mutated genes among 16 tumors was performed using dye termination chemistry (Big Dye Terminator with the model 3730xl sequencer; Applied Biosystems, CA). Primer sets for polymerase chain reaction (PCR) were designed using the design tool Oligo 6.0. PCR was carried out using AmpliTaq Gold DNA Polymerase (Applied Biosystems) based on the standard protocol. Amplicons from target genes in 16 tumors were amplified (Supplementary Table S4). After PCR, amplicons were treated with the ExoSAP-IT (USB Corp, OH) to degrade unincorporated PCR primers and deoxynucleotide triphosphates. The cleaned products were mixed with 5 pmol of the forward or reverse PCR primers for sequencing. DNA sequence variants were identified using Mutation Surveyor software.

Mutation validation with RNA sequencing and qRT-PCR data
Fourteen out of the 16 Mayo Clinic cancer samples were also subjected to RNA-sequencing analysis as a secondary confirmation of somatic variant calls. TopHat 1.4 was used for alignment to the human reference genome (hg19) and variant reporting was done using GATK at the locations of DNA mutations. Additionally, for TP53, manual review of the variants was performed using Integrative Genomics Viewer (IGV). Validation provided by RNA was used to estimate sensitivity and specificity of the DNA somatic calls. Quantitative Reverse Transcriptase PCR (qRT-PCR) was also used to validate mutations.
Nucleic Acids Research, 2015, Vol. 43, No. 14 6947

Sensitivity and specificity of somatic variant calling pipeline
Because the variants that were Sanger sequenced were generated by an independent pipeline, we estimated the sensitivity/specificity of our calling strategy. Estimating the truth for variants using Sanger sequencing results will be a pessimistic evaluation because Sanger sequencing is not sensitive for heterozygous mutations in tumors with low tumor content or with low percentage subclones. A number of mutations have convincing 'high confidence NextGen' variants (mapping quality 30, base quality 30, no overlap with repeats, genome duplications or homologous regions) with negative results in Sanger sequencing but a mean mutant allele frequency (MMAF) of only 31%, perhaps because they were in subclones. Half of those high-confidence variants had confirmatory RNASeq alleles, indicating that perhaps they are below the percentage detection threshold of Sanger sequencing. Depending on how the true positive mutations are defined, our sensitivity/specificity can be estimated as 76%/36% (strictly trusting Sanger results), 84%/55% (high confidence NextGen + RNASeq) and 94%/100% (trusting NextGen results over negatives in Sanger sequencing). Only three variants (6%) that ended up validating were filtered out by our filtering strategy. The number of indels that were Sanger sequenced was too small to evaluate sensitivity/specificity, but the sensitivity should be poor with standard GATK parameters (which require the indel to be present in at least 30% of reads).

p53 immunohistochemistry
Sections of Mayo Clinic tissues were deparaffinized, rehydrated and submitted to antigen retrieval by a steamer for 25 min in target retrieval solution (Dako, Carpinteria, CA, USA). Endogenous peroxide was diminished with 3% H2O2 for 30 min. Slides were blocked in protein block solution for 30 min and then blocked with avidin and biotin for 10 min each, followed by overnight incubation with 1:50 diluted anti-p53 antibody (M7001, Clone DO-7, DAKO) at 4 °C. Sections were then incubated with biotinylated universal link for 15 min and streptavidin-horseradish peroxidase for 25 min at 25 °C. Slides were developed in diaminobenzidine and counterstained with hematoxylin as previously described (15).

Copy number estimation
The CN in cancer genomes from the Mayo Clinic patients was estimated using two approaches. First, whole genome CN was computed using a novel algorithm (patternCNV), which uses depth of coverage to compute CN variation despite local coverage variation by learning the pattern of this coverage bias (16). PatternCNV first summarizes coverage variation across multiple germline samples to evaluate the local coverage and its variation. Somatic CN values were computed for each sample as the difference from the reference pattern in windows of 5000 nucleotides. Regions of CN were computed using the DNAcopy package by weighting each bin inversely to the variance of the pattern. Second, Illumina 650 genotyping array data were used. We used the paired option in Illumina GenomeStudio and extracted the log relative ratio (LRR) and the B-allele frequency of the germline and tumor DNA. The genoCN (17) R package was used to detect somatic CN alterations (CNAs), requiring a minimum of 20 probes per CNA interval. The genoCNA function also computationally estimates the tumor purity, defined as the percentage of cancer cells in the tissue specimen (Supplementary Table S5), based on the levels of the CN segments.

Ploidy and LOH estimation
The WaveCNV (18) segmenter was used to compute loss of heterozygosity (LOH) and ploidy genome wide using the pattern of alleles and CN. As input to the tool, we provided (i) a Variant Call Format (VCF) file containing the number of reads carrying alleles for germline variants that were called using GATK 2.6 in discovery mode for paired samples, (ii) CN segments obtained from PatternCNV and (iii) known tumor purity from cellularity measurements. The program outputs ploidy, CN per segment and LOH region calls.

CN comparisons
To minimize false detection of CNAs, we limited analysis to regions previously reported to be recurrent in TCGA carcinomas (19,20). The aim was to find CN regions differing in prevalence between low stage and late stage cases. In the first step, we used data sets consisting of SNP-based CN estimation (15 Mayo Clinic low stage cases excluding a detected ultramutated carcinoma; 28 TCGA low stage cases; 446 TCGA late stage cases) and WGS-based CN estimation (15 Mayo Clinic low stage cases excluding a detected ultramutated carcinoma). Given the kth region and a CN cutoff, we computed the CN occurrence frequency $f_{k,\text{late}}$ in late stage cases and checked whether $f_{k,\text{TCGA-early}}$, $f_{k,\text{Mayo-early-SNP}}$ and $f_{k,\text{Mayo-early-WGS}}$ differed consistently and significantly from $f_{k,\text{late}}$, based on a binomial test. Array comparative genomic hybridization (aCGH) data for TCGA serous carcinomas were downloaded as level 3 data. Segmented log2 ratios were then averaged over each CN region of interest. Average log2 ratios were compared between the low and late stage carcinomas using a t-test.
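The frequency comparison above can be illustrated with an exact binomial test in which the late stage frequency serves as the null probability for an early stage cohort. This is a sketch under stated assumptions: the text does not say whether the test was one- or two-sided, so a two-sided test is shown, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k altered cases out of n at frequency p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one (small relative tolerance for floating-point ties).
    """
    observed = binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= observed * (1 + 1e-7):
            total += prob
    return min(1.0, total)


def early_differs_from_late(k_early: int, n_early: int, f_late: float,
                            alpha: float = 0.05) -> bool:
    """True if the early stage count is unlikely under the late stage frequency."""
    return binom_test_two_sided(k_early, n_early, f_late) < alpha
```

For a region to be reported, such a test would need to agree in direction and significance across all early stage data sets, as described above.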

Temporal relationship between somatic mutation and CN in Mayo Clinic patients
For every gene with recurrent somatic mutations (more than one mutation observed in Mayo Clinic low stage HGS cases), we identified somatic SNVs (deleterious or benign) within 1 Mb of each gene. These somatic SNVs were filtered to have at least 15 reads in the tumor (so that the frequency of mutations can be estimated accurately), no mutant allele in the germline, no overlap with a dbSNP SNP and germline coverage of at least 10. The somatic frequency for that range was defined as the weighted mean of individual SNV frequencies, weighted by one minus the probability that the variant could be a SNP given the germline coverage. The CN was a segment of constant value for the selected region, but was computed as the mean across all somatic SNVs. To correct for low purity (percentage of tumor in specimen), we divided the log2 ratio by the global tumor purity estimated from all variants. We expected the MMAF $m$ for a deletion in a clone present in a fraction $c_i$ of the tumor in a sample with purity $x$ to be

$$m = \frac{x c_i}{2(1 - c_i x) + x c_i}.$$

If all mutations in that region occurred before deletion events, $m$ should be given by this formula. If some mutations occurred after deletion events, $m$ will be reduced by the fraction $f_i$ of the clone for sample $i$ carrying the mutation (Supplementary Figure S5). With each purity-corrected CNA = $c_i x$, we computed a mean estimator $F$ for the low stage/late stage factor. We estimated the minimum and maximum confidence range using leave-one-out cross validation. In Supplementary Figure S1, the red line is the data fit, while the blue lines are the fit from the leave-one-out cross validation data sets. A one-sided Wilcoxon test of $f_i - 1$ is used to estimate the significance of the results when $F < 1$.
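The expected MMAF for a mutation that precedes a single-copy deletion can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and treating the per-sample factor as the ratio of observed to expected MMAF is an assumption made for this example rather than the estimator used in the study.

```python
def expected_mmaf(purity: float, clone_fraction: float) -> float:
    """Expected mutant allele frequency if the mutation precedes the deletion.

    Cells carrying the deletion (fraction purity * clone_fraction of the
    specimen) contribute one mutant copy each; all other cells contribute
    two copies to the denominator.
    """
    deleted = purity * clone_fraction
    return deleted / (2 * (1 - deleted) + deleted)


def reduction_factor(observed_mmaf: float, purity: float,
                     clone_fraction: float) -> float:
    """Observed / expected MMAF (assumed reading of the per-sample factor).

    Values near 1 are consistent with mutations arising before the
    deletion; values below 1 suggest some mutations arose afterwards.
    """
    return observed_mmaf / expected_mmaf(purity, clone_fraction)
```

For a pure sample in which every cancer cell carries the deletion, `expected_mmaf(1.0, 1.0)` evaluates to 1, since only the mutant allele remains.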

Kataegis analysis
Mutations were additionally filtered to remove potential mapping artifacts by excluding any mutation for which the 100 bp region surrounding it maps to more than three locations in the genome at 90% identity using the Basic Local Alignment Search Tool. A graph was then generated that plots the distance between consecutive mutations in each sample. Pink and black lines at the bottom alternate in color between chromosomes (even and odd chromosomes), and vertical dotted green lines highlight chromosome boundaries.
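The quantity plotted in such a graph, the distance between consecutive mutations, can be computed as follows. This is a minimal sketch; the input representation as `(chromosome, position)` tuples and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def intermutation_distances(mutations):
    """Return (chrom, pos, distance_to_previous) for each mutation.

    `mutations` is an iterable of (chrom, pos) tuples for one sample.
    Distances are computed within each chromosome after sorting by
    position; the first mutation on a chromosome has no predecessor
    and is omitted.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in mutations:
        by_chrom[chrom].append(pos)

    distances = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        for previous, current in zip(positions, positions[1:]):
            distances.append((chrom, current, current - previous))
    return distances
```

Runs of unusually small distances on one chromosome are what a kataegis-like event would look like in this representation.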

HR-deficient signatures
The signature from (21) was used to create a heatmap. Because the signature was created on a different platform, we only selected genes with the top 50% of expression in our data set and a ratio of standard deviation to median expression in the top 50%. Gene counts were divided by the total gene counts of each sample, and the median was subtracted prior to dividing by the standard deviation. Hierarchical clustering of the samples was performed in R using the heatmap function with the Euclidean distance metric. Results are shown in Supplementary Figure S2.
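The normalization described above (library-size scaling followed by median centering and scaling by the standard deviation) can be sketched as below. Several details are assumptions: centering is done per gene across samples, the population standard deviation is used, and the dictionary-of-lists input format and function name are hypothetical.

```python
from statistics import median, pstdev


def normalize_counts(counts):
    """Scale, center and standardize gene counts.

    `counts` maps gene -> list of raw counts, one entry per sample, with
    the same sample order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Total gene counts per sample (library size).
    totals = [sum(values[s] for values in counts.values())
              for s in range(n_samples)]

    normalized = {}
    for gene, values in counts.items():
        scaled = [v / totals[s] for s, v in enumerate(values)]
        center = median(scaled)
        spread = pstdev(scaled)
        # Genes with no variation carry no clustering signal; set to 0.
        normalized[gene] = [(v - center) / spread if spread > 0 else 0.0
                            for v in scaled]
    return normalized
```

The resulting matrix is the kind of input that hierarchical clustering with a Euclidean distance metric would operate on.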

Structural variant analyses
Structural variants (SVs) were called using CREST 1.0 (22). SVs whose ends were within 10 bp of each other were marked as duplicates and counted only once. SVs within 1 Mb of each other were considered clustered.
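The duplicate-marking and clustering rules above can be sketched as follows. This is an illustration under assumptions, not CREST functionality: each SV is represented as a hypothetical tuple `(chrom_a, pos_a, chrom_b, pos_b)`, two SVs are treated as duplicates when both breakpoints lie within 10 bp on the same chromosomes, and an SV counts as clustered when any of its breakpoints lies within 1 Mb of a breakpoint of another SV.

```python
def same_event(a, b, tol=10):
    """True if both breakpoints of a and b agree within `tol` bp."""
    return (a[0] == b[0] and a[2] == b[2]
            and abs(a[1] - b[1]) <= tol and abs(a[3] - b[3]) <= tol)


def dedupe(svs, tol=10):
    """Keep the first SV of every group of near-identical calls."""
    kept = []
    for sv in svs:
        if not any(same_event(sv, k, tol) for k in kept):
            kept.append(sv)
    return kept


def near(a, b, window=1_000_000):
    """True if any breakpoint of a is within `window` bp of one of b."""
    ends_a = [(a[0], a[1]), (a[2], a[3])]
    ends_b = [(b[0], b[1]), (b[2], b[3])]
    return any(ca == cb and abs(pa - pb) <= window
               for ca, pa in ends_a for cb, pb in ends_b)


def clustered_fraction(svs, window=1_000_000):
    """Fraction of SVs with at least one other SV within `window` bp."""
    if not svs:
        return 0.0
    clustered = sum(
        1 for i, a in enumerate(svs)
        if any(near(a, b, window) for j, b in enumerate(svs) if i != j)
    )
    return clustered / len(svs)
```

The pairwise comparison is quadratic in the number of SVs, which is acceptable for per-sample call sets of modest size; sorting breakpoints by position would be the natural refinement for larger inputs.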

RESULTS
From 16 low stage HGS ovarian cancer cases treated at the Mayo Clinic, we generated an average of 46-fold coverage of the haploid genomes (tumor and matched germline DNA) by whole genome sequencing. Approximately 93% of the exomes and 86% of the genomes were covered by at least 10 reads in both tumor and germline DNA (Supplementary Table S1). Concordance between sequencing SNVs and SNP array genotypes was greater than 97% (Supplementary Table S6). In total, we identified 138 767 somatic single nucleotide mutations among 15 low stage HGS carcinomas (Table 1) and 1 087 366 somatic single nucleotide mutations in an ultramutated low stage cancer. The ultramutated stage IC HGS carcinoma was obtained from a patient not exposed to prior chemotherapy or radiation therapy and without any previous cancer diagnosis. This ultramutated carcinoma had a POLE V411L mutation previously implicated in ultramutated endometrial (23,24) and colorectal carcinomas (25). Excluding this carcinoma, there were a total of 949 coding or splice-site somatic mutations among the remaining 15 low stage HGS ovarian carcinomas. From this point forward, analyses excluded the ultramutated carcinoma unless otherwise stated.
Among the 15 low stage HGS cases, there were a greater number of non-synonymous than synonymous somatic mutations (paired t-test, P < 1.1 × 10^−4) (Figure 1A), with a mean ratio of 2.89; this was also true for the ultramutated cancer. C to T (C>T) transitions were the most common mutations (27%) and were enriched relative to the mean number of mutations per type (P = 7.0 × 10^−8). The least common mutations were T>G transversions (8%) (Supplementary Table S7), which were rarer than expected by chance (P = 8.0 × 10^−6). Enrichment of C>T at CpG sites was different in exonic versus non-exonic regions. Within the exonic regions, C>T substitutions were significantly higher at CpG sites than at non-CpG sites (paired t-test, P = 2.5 × 10^−4) (Figure 1B). High rates of C>T substitutions at CpG sites were also observed in previous tumor exome studies (26,27); the deamination of methylated Cs has been suggested as a possible mechanism of C>T substitutions. In contrast, in non-exonic regions, C>T substitution rates at CpG and non-CpG sites were similar (P = 0.19) (Figure 1C), consistent with global hypomethylation in cancers (28).
Most chromosomes showed similar mutation rates, and all showed greater transition than transversion rates (Figure 2A). Chromosomes 17, 19 and 22 showed significantly higher genomic (exonic + non-exonic) transition/transversion ratios compared to the average (paired t-test, P-values 0.03, 0.048 and 0.007, respectively) (Figure 2B), which may reflect the fact that these chromosomes have the highest GC content. These chromosomes also had the lowest non-exonic mutation rates (Figure 2C). Chromosome X had significantly lower transition/transversion ratios than the genome-wide average (P = 0.0198) (Figure 2B). The ultramutated cancer (Patient D) also had more transitions than transversions (Supplementary Table S7). Genome wide, the mutation rate was significantly higher in the non-exonic regions compared to the exome (paired t-test, P = 2.0 × 10^−17) (Figure 2D), suggesting some selection against mutations in exonic regions. Recent studies have uncovered the presence of hypermutated regions, termed kataegis, in some cancer genomes (29). Therefore, we investigated evidence for kataegis among the 16 low stage HGS ovarian carcinomas. We found evidence of a potential kataegis event in chromosome 19 in Patient I, a stage I ovarian cancer (Figure 3A). In addition, we found a novel chromosome-wide hypermutational event with enriched T>C substitutions in chromosome 6 in the cancer from Patient C with stage IB disease (Figure 3B). This observation is novel because previous kataegis events were associated with either C>T or C>G substitutions (29).
Figure 1 (caption fragment, panels B and C). Mutation rate for selected types of mutation (indicating nucleotide before mutated nucleotide, nucleotide to mutate and possible product of mutation) for each cancer, divided by the number of mutational contexts for each mutation type, in order to detect possible biases for mutational processes (e.g. CpG->(G/A) has two contexts) in exonic regions (B) and non-exonic regions (C). Note that C>T somatic substitutions are significantly higher at CpG sites than at non-CpG sites in coding regions (B) but not in non-coding regions (C). A>mut is the mutation rate of A to any other substitutions.
Excluding the ultramutated cancer, the coding sequences of 38 genes were mutated in more than one of the 15 low stage HGS ovarian carcinomas and were above the expected background mutation rates corrected for gene size and specific mutation rate (Table 2). Manual curation of mapped sequences using IGV (30,31) revealed that the TP53 gene was mutated in all but one of the Mayo Clinic low stage HGS ovarian carcinomas (Table 3) and at high mutant frequency when results in these carcinomas were combined with the 28 low stage HGS carcinomas of TCGA (Supplementary Table S8). A high frequency of mutations was found throughout the TP53 coding regions as well as at known hotspots, consistent with the mutation profiles of tumor suppressor genes (26,32). Consistent with the presence of TP53 mutations, high levels of p53 staining were observed in all but two cases analyzed (Table 3 and Supplementary Figure S3). These data are consistent with results of TP53 studies in late stage (6,33) and low stage HGS ovarian carcinomas (34). Combined with the Mayo Clinic data, results collectively support the hypothesis that mutations in TP53 represent the earliest documented genetic lesions in HGS ovarian cancer (34). For a subset of 14 patients with RNA sequence data (Table 3), the mutant allele was observed at a higher frequency than the wild type allele, either through deletion of the wild type allele or through epigenetic silencing of the other allele by mechanisms that were not further characterized here. The cancer from Patient H is the only one with no mutation in TP53; this cancer, however, had a mutation in the CHEK2 gene (p.V246L), although the functional significance of this somatic mutation is not known.
Figure 2 (caption fragment). Median is indicated by red dotted line. Note that mutation rates in non-coding regions are consistently higher than mutation rates within coding regions.
In addition to TP53, there were additional alterations in genes frequently mutated in late stage HGS (Table 4). In particular, our low stage cases contained potentially deleterious somatic variants of unknown significance (VUSs) in BRCA2 (Patient F: P2612S and Patient M: K2017I) and CSMD3 (Patient K: P1653Q). Moreover, one case (Patient P) had a germline frameshift mutation in exon 13 of BRCA2 along with somatic loss of the nonmutated allele in the cancer. Patients C, D, F and G also have germline VUSs in BRCA2. For Patients F and M, with somatic BRCA2 mutations, the mutant BRCA2 allele was homozygous (Table 4), as often occurs in functional mutation of tumor suppressor genes; however, both variants are VUSs. In addition, Patient N acquired a homozygous deleterious somatic mutation in BRCA1 (frameshift in exon 17, rs80357572; 36 out of 43 reads carried the mutation), and the ultramutated carcinoma (Patient D) acquired a VUS in BRCA1 (T231N). Patient B had a germline VUS in BRCA1 (E1167K, rs80356923), but with predicted deleterious effects. Results described by the TCGA suggest that many HGS ovarian carcinomas have both extensive changes in CN and defects in the homologous recombination (HR) pathway (6). Subsequent studies have suggested that LOH changes correlate with HR defects (35). Therefore, we summarized CN and LOH changes and examined evidence of defects in HR. The number of long LOH regions (>15 Mb), called the homologous recombination deficiency score (HRD score), in cancer genomes has been shown to correlate with the deficiencies in HR (35). We determined this HRD score using the WaveCNV package to estimate LOH and also determined an HRD-like score (counting CN decrease instead of LOH). High HRD scores (>15) were found in 10 out of the 16 low stage HGS carcinomas. The six carcinomas with low HRD scores were patients with high predicted ploidy and the ultramutated cancer for Patient D (Table 4 and Supplementary Table S5).
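As described above, the HRD score is a count of long LOH regions. The sketch below counts segments longer than 15 Mb from a list of LOH calls; the `(chrom, start, end)` segment format and the function name are assumptions, and any further exclusion rules applied by WaveCNV or by the original HRD score definition (35) are not modeled here.

```python
def hrd_score(loh_segments, min_length=15_000_000):
    """Count LOH segments longer than `min_length` base pairs.

    `loh_segments` is an iterable of (chrom, start, end) tuples with
    coordinates in base pairs.
    """
    return sum(1 for _chrom, start, end in loh_segments
               if end - start > min_length)
```

An HRD-like score based on CN decrease could be computed with the same function by passing CN-loss segments instead of LOH segments.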
The majority of carcinomas with high HRD scores (HRD score ≥ 15) contained alterations in HR genes, suggesting potential defects in the HR pathway (Table 4). These defects included a somatic homozygous frameshift mutation in BRCA1 (Patient N), somatic homozygous mutations in BRCA2 (Patients F and M), a known deleterious (36) RAD51C mutation (Patient L), CN loss associated with BRCA1, BRCA2 and RAD51D (Patient J), and CN loss associated with BRCA1, RAD51C and RAD51D (Patient I). Patient P had a germline frameshift mutation in BRCA2 that became homozygous in the carcinoma, suggestive of a potentially deleterious mutation. Interestingly, this carcinoma has a low HRD score. In carcinomas from Patients A, G and M with high HRD scores, BRCA1 and/or BRCA2 mRNA were markedly downregulated by RNA-sequencing analysis (Table 4) and qRT-PCR (data not shown). Thus, it appears that decreased expression, like deleterious mutations, might contribute to genomic instability, as manifested by a high HRD score, in these low stage HGS ovarian carcinomas.
Carcinomas from three patients (Patients E, O and P), including a patient with homozygous frameshift mutation in BRCA2 (Patient P), had relatively low HRD scores (< 10) and low genome-wide LOH burden (<0.10). The estimated CN in these carcinomas was comparable to tetraploidy (WaveCNV > 3.5, Supplementary Table S5).
To explore additional evidence for HR deficiency, we conducted hierarchical clustering of RNAseq data using a signature of genes differentially expressed between HR-deficient and HR-competent cell lines (21). Carcinomas with confirmed deleterious mutations in HR genes clustered closely with those with LOH in multiple HR genes (Supplementary Figure S2). For example, carcinomas from Patients C and G appear to cluster together and both have high HRD score, low BRCA1 expression and BRCA1 LOH. In addition, Patient G has a germline BRCA2 P451Q VUS that became homozygous in the carcinoma. Carcinomas from Patients M and N cluster together and both have high HRD score and homozygous somatic mutations in BRCA2 and BRCA1, respectively. Carcinomas from Patients A, B, O, E and K form another cluster: those from Patients A and B have high HRD scores, the carcinoma from Patient K is just below the cutoff (HRD score < 15) and those from Patients O and E have low HRD scores but high ploidy. Carcinomas from Patients P, I and L form another group with mixed HRD scores. This latter group includes Patient P, with a low HRD score and a homozygous frameshift mutation in BRCA2, and Patient I, with a high HRD score. Using CN data from microarrays, we identified three recurrent regions of CNA where the frequency of alteration appeared to differ between low and late stage HGS ovarian carcinomas (TERT, SKP2 and PRIM2). However, only the PRIM2 region showed a consistent CN difference between low stage and late stage cases (Supplementary Figure S4) when aCGH CN data from TCGA were used as a technical validation (P = 1.1 × 10^−11). PRIM2 also exhibited significant changes in mRNA levels with a significant expression-CN correlation (Figure 4).
Table 4. Evidence suggestive of defects in the homologous recombination repair pathway
The MMAF versus purity-corrected CN frequency can be used to infer the sequence of genetic alterations in the evolution of cancer (Supplementary Figure S5). In Supplementary Figure S1, we provide the MMAF versus purity-corrected CN in 15 patients. We computed the factor F (see Materials and Methods section) for three key genes using carcinomas with deletions with log2 ratio ≤ −0.3 and one or more somatic mutations. In Supplementary Figure S5, an F factor of 1 indicates that mutations arose before the CN event. Considering any somatic mutation, we found that BRCA2 somatic mutations (chr 13) occurred before CN events with F = 0.98 [0.85-1.16] (P_wilcox = 0.125), based on four carcinomas. However, TP53 (chr 17), with F = 0.62 [0.55-0.83] (P_wilcox = 0.01) from seven carcinomas, and BRCA1 (chr 17), with F = 0.63 [0.61-0.68] (P_wilcox = 0.007) from eight carcinomas, did not show the same pattern. This suggests that the accumulation of somatic mutations in BRCA2 occurred prior to the deletion of chromosome 13 but that multiple somatic mutations occurred on chromosome 17 (TP53 and BRCA1) after the CN deletion. However, in Supplementary Figure S1D, we limited the analysis to deleterious TP53 mutations only. In this case, we found F = 1.10 [1.03-1.11] with seven carcinomas, suggesting that the deleterious TP53 mutations predate the deletion of the other functional allele. Overall, this analysis indicates that a heterozygous somatic mutation in TP53 is an early event which, in some samples, is followed by a chromosome 17 deletion resulting in TP53 inactivation at a relatively early point in the progression of the carcinoma. In contrast, chromosome 13 deletions are a relatively later event in the evolution of HGS ovarian cancer in this study.
Supplementary Figure S6 shows the genome-wide distribution of LOH for all samples, with the relative number of samples with an LOH event plotted in blue at the bottom of the figure. We observed that LOH was most common in chromosome 17 (14 samples, 54-70 M), followed by 4q31-35 (11 samples), 22q12 (12 samples), 9q33 (11 samples), 8p23 (10 samples), 18q (10 samples), X (10 samples), 1p (nine samples) and 13p (nine samples). Supplementary Figure S7 shows a clustered genome-wide LOH heatmap and Supplementary Figure S8 shows the LOH heatmaps for the chromosomes with the greatest LOH. Despite the widespread LOH in most samples, carcinomas from Patients E, O and P, which had high mean ploidy, had low LOH as indicated above.
Finally, to identify somatic structural variants (SVs) such as large indels and interchromosomal translocations, we applied the CREST bioinformatics tool and characterized somatic SVs uncovered in these samples. Results shown in Supplementary Table S9 indicate heterogeneity in SV burden in these tumors. Interestingly, the ultramutated carcinoma from Patient D has no detectable SV in addition to little CNV. Genes affected by recurrent translocation in at least two samples are shown in Supplementary Table S10. DAVID Bioinformatics Analysis (37) of functional annotation of genes affected by recurrent SVs in these samples indicates cell projection, morphogenesis and GTPase regulator activities as biological functions affected by these genes (Supplementary Table S11). It is striking that 43% of the SV events (Supplementary Table S12) cluster in groups of SVs (separated by 1 Mb or less). The paucity of kataegis events but presence of clustered SVs in these samples raises the possibility that clustered SVs might be a preferential outcome of showers of DNA damage in ovarian carcinomas with repair defects.

DISCUSSION
Using a rare set of low stage HGS ovarian carcinomas from the Mayo Clinic, we report the first whole genome sequencing in ovarian cancer and identify early mutational events in this disease. Integrated analysis of whole genome sequences, RNA sequences and CN changes produced five major findings: (i) frequent mutations and LOH in TP53, (ii) PRIM2 loss as a frequent genetic alteration, (iii) frequent HR defects and genome-wide CNA, (iv) clustering of structural variants in the genome and (v) polyploidy in certain low stage HGS ovarian cancers.

TP53 mutations and LOH in low stage HGS ovarian cancer
Our analysis indicates that TP53 is the most frequently mutated gene in low stage HGS ovarian cancer. This observation, taken together with previous studies indicating that TP53 mutations are ubiquitous in late stage HGS ovarian cancer (6,33), highlights TP53 mutations as the pathognomonic feature of HGS ovarian cancer. As this patient collection consisted of low stage HGS carcinomas, this finding lends further support to prior studies of early/low stage HGS, which showed somatic TP53 mutations in serous tubal intraepithelial carcinomas from women with germline mutations in BRCA1 or BRCA2 (38). Our results also indicate that deleterious TP53 mutations precede CN deletion events that eliminated the wild type copy of TP53. Collectively, these results suggest that the most frequent sequence of events leading to tumorigenesis in HGS ovarian cancer is an early somatic mutation in TP53 followed by genome-wide LOH that often includes CN loss in chromosomes 13 or 17.
Detailed analysis of TP53 mutations in our study led to three additional observations. First, we identified loss of the wild type copy via deletion or LOH in 10 out of 16 low stage HGS carcinomas with mutant TP53. Second, in the carcinomas without allelic loss of wild type TP53, expression analysis showed inactivation of the transcriptional activity of the wild type copy in our RNA sequence data for four more carcinomas. Third, one cancer without any TP53 mutation (Patient H) had the smallest number of deleterious somatic mutations (four missense, one nonsense and one splice-site mutation), yet had a relatively high HRD score. Interestingly, this carcinoma also had no mutations in the high frequency driver genes found by TCGA, namely, NF1, CDK12, FAT3, GABRA6, BRCA1 and RB1. It did, however, have a somatic CHEK2 mutation, which might well account for the genomic instability manifested by the relatively high HRD score. Interestingly, Patient H is also the only patient with stage I ovarian carcinoma who experienced recurrence and died from the disease (Supplementary Tables S1 and S13).
In this series of low stage HGS ovarian carcinomas, we observed four TP53 mutations (R273C, R248W, R248Q and R175H) that were previously described as 'neomorphic' (gain of function, GOF) changes (39). The frequency of neomorphic mutations in the Mayo Clinic carcinomas was similar to that reported in the TCGA set of primarily late stage carcinomas (P-value 1.0, t-test). Some of these neomorphic mutations have been shown to produce metastatic tumors in mouse model systems. For example, R172H (equivalent to R175H in human) mutant mice produce highly metastatic tumors (40,41). It is therefore interesting that a neomorphic TP53 mutation, R175H, was found in an early stage ovarian cancer (Patient C) that was stage IB at diagnosis. It is possible that the incidental finding of some of these early stage carcinomas may have limited their metastatic potential. In addition, the metastatic behavior of the R172H mutant in mice is host- and dose-dependent (40,41), and host factors may have limited the metastatic potential of R175H in this particular case of HGS ovarian carcinoma. Consistent with the role of GOF TP53 mutants in promoting metastatic potential, Kang et al. observed that patients with GOF TP53 mutations in the TCGA cohort were more likely to develop distant metastasis (42). However, no significant association between GOF TP53 mutations and clinical outcome was observed (42). These results are also consistent with prior studies by Ahmed et al., who found no significant association between two types of TP53 mutations (missense versus non-missense) and clinical outcome (33).
The other three patients with putative neomorphic TP53 mutations were diagnosed with stage II carcinomas. Two of them (Patients K and O) had low HRD scores and no defects in HR genes. Since HR deficiency is associated with better response to platinum-based chemotherapy and PARP inhibitors (6,43,44), it would be important to investigate the extent to which some of the neomorphic TP53 mutations are associated with HR proficiency and whether these mutants are associated with poor response to platinum-based chemotherapy or PARP inhibitors.

PRIM2 loss as a frequent alteration in early/low stage HGS ovarian cancer
Our results showed that the PRIM2 gene was more frequently deleted in early stage than in late stage HGS ovarian cancer. PRIM2 encodes DNA primase polypeptide 2, which is involved in DNA replication and synthesizes Okazaki fragments to seed DNA replication starting points. It is possible that lower levels of this enzyme facilitate structural rearrangements by disrupting DNA replication. Studies in the Saccharomyces cerevisiae model system indicate that defects in lagging-strand replication are repaired by a RAD51-dependent mechanism (45). Therefore, both PRIM2 loss and RAD51 downregulation in HGS cancer may result in utilization of an error-prone repair pathway (e.g. non-homologous end joining) and may contribute to the extensive rearrangements found in HGS ovarian cancer. Interestingly, both PRIM2 loss and CHEK2 mutation are observed in the cancer from Patient H, which lacks any TP53 mutation. The CNAs observed in this cancer are comparable to those in other early stage HGS ovarian carcinomas with TP53 mutations and HR defects, suggesting the possibility that PRIM2 and CHEK2 loss in Patient H may contribute to comparable levels of structural alterations.

HR defects and genome-wide LOH
The pattern of LOH found in early stage HGS ovarian carcinomas is consistent with the pattern previously reported in late stage disease (19). Wang et al. reported chromosomes 17 and 13 as the two chromosomes most frequently exhibiting large-scale LOH. While our results replicate the earlier findings for chromosome 17 (LOH in 14 samples), chromosome 13 LOH occurred in only nine samples, and other regions were more frequently subject to LOH [22q12 (12 samples), 4q31-35 (11 samples), 9q33 (11 samples), 8p23 (10 samples), 18q (10 samples) and X (10 samples)]. This lower frequency of chromosome 13 LOH compared to chromosome 17 reinforces our thesis that chromosome 13 LOH may be a later event. Although this conclusion is limited by the small sample size, it is consistent with our joint analysis of mutations and deletions on chromosome 13, which showed deletions to be late events.

SVs and clustered damage
In contrast to a number of cancers, where clusters of point mutations called kataegis events are observed, few such events were observed in the low stage HGS ovarian cancers examined here. On the other hand, many of the SVs detected in the low stage ovarian cancer samples occurred in clusters. These results suggest that DNA damage might have occurred in clusters in these tumors but, perhaps because of differences in repair capacities or anomalous function of the topoisomerase genes, led to clusters of double-strand breaks and subsequent SVs in early ovarian cancer, with lower rates of kataegis events than observed in other cancers (46).

Polyploidy in carcinomas with minimal LOH
The Mayo Clinic carcinomas with high HRD scores showed LOH alteration, consistent with Wang's 'Cluster HiA' (High LOH) carcinomas with HR defects and good response to platinum therapy (19). The Mayo Clinic carcinomas with low levels of LOH had low HRD scores and were similar to Wang's cluster Lo carcinomas, which were less responsive to platinum therapies (19), presumably due to a functional HR pathway. Our observations also indicate that carcinomas with low LOH are associated with polyploidy, which has also been implicated in cisplatin resistance (47) and associated with worse prognosis (48).
We are the first to show the co-existence of polyploidy and low genomic LOH in HGS ovarian carcinoma. Co-existence of polyploidy and functional HR in these samples would be consistent with prior studies in yeast indicating the synthetic lethality of tetraploidy and HR defects (49). However, there is other evidence that tetraploid tumors can have defective HR (via BRCA1 mutation) at the same rate as diploid tumors (50). It is also possible that the tetraploid tumors are not able to display a high HRD score despite the presence of an HR defect.
Previous studies in mice identified a possible mechanism of ovarian epithelial tumorigenesis via a transition to a tetraploid state (51) followed by aneuploidy changes as long as TP53 mutations were present. Carcinomas from Patients O, P and E may be in the earliest stages of this transformation and may not yet have undergone the transition toward the diploid state with a high frequency of LOH, despite strong evidence of HR inactivation in Patient P (homozygous frameshift mutation of BRCA2). Thus, some of those tetraploid tumors may still have a defective HR pathway despite exhibiting a low HRD score. Nevertheless, many tetraploid tumors may still be HR competent, which would explain the higher recurrence rate of low LOH tumors reported in (19). This hypothesis is also supported by the clustering of carcinomas from Patients O, P and E with other samples with high HRD scores based on the gene expression signature previously described (21).
Lastly, we found one ultramutated carcinoma (Patient D), which initially appeared completely different from those of the other patients. This cancer had a POLE V411L mutation, which has been characterized as one of the hotspot mutations in ultramutated endometrial and colorectal carcinomas (25,52). This patient had minimal levels of SV, CNV and LOH and thus the same pattern as seen in other carcinomas with a high rate of somatic SNVs (53). For this cancer, it is likely that the POLE mutation contributes to an ultramutated phenotype with low levels of structural alterations.

CONCLUSIONS
Although our sample size is limited due to the rarity of low stage HGS ovarian carcinomas, the overall genomic landscape of these carcinomas is remarkably similar to that of late stage HGS ovarian carcinomas in mutation spectrum, burden and structural rearrangement. These results, which are consistent with the earlier observation by Kobel et al. that early and late stage HGS carcinomas are indistinguishable based on immunohistochemical analysis of 21 biomarkers (54), extend these findings to the whole-genome level.
While TP53 mutation is a key early event, we have not identified an event that would lead to a high rate of somatic mutations to initially inactivate TP53. Additional studies evaluating candidate mechanisms, such as APOBEC3B (46), are needed to understand the initial carcinogenic events that lead to inactivation of TP53. Our data, which show that mutations in TP53 may be driver events in ovarian carcinogenesis, provide additional impetus for efforts to restore the function of TP53 or exploit its deficit in patients with HGS ovarian cancer. The present results also indicate that chromosome 13 events (CN changes or LOH) are later than chromosome 17 events. We have also identified one early stage HGS ovarian cancer with an ultramutated phenotype, but LOH analysis has revealed that this ultramutated phenotype is superimposed on a background of low-level CN and LOH events consistent with many of the other low stage HGS ovarian carcinomas, indicating that this is also a late event. In addition, we identified several carcinomas with low LOH and high estimated ploidy, which is consistent with the model in which some HGS ovarian cancers are initially tetraploid and become nearly diploid due to LOH.
Despite heterogeneity in somatic sequence variations, CNAs and structural variations, none of these somatic genotypes appears to be associated with clinical outcome. Only stage appears to be strongly linked to patient survival, as six of eight patients with stage I HGS ovarian carcinomas are still alive (median follow-up 101 months), whereas only two of eight with stage II disease are still alive (median follow-up 79 months). Collectively, these observations provide new insight into the biology and likely pathogenesis of early stage HGS ovarian cancer.