Acetaldehyde makes a distinct mutation signature in single-stranded DNA

Abstract Acetaldehyde (AA), a by-product of ethanol metabolism, is acutely toxic due to its ability to react with various biological molecules including DNA and proteins, which can greatly impede key processes such as replication and transcription and lead to DNA damage. As such AA is classified as a group 1 carcinogen by the International Agency for Research on Cancer (IARC). Previous in vitro studies have shown that AA generates bulky adducts on DNA, with signature guanine-centered (GG→TT) mutations. However, due to its weak mutagenicity, short chemical half-life, and the absence of powerful genetic assays, there is considerable variability in reporting the mutagenic effects of AA in vivo. Here, we used an established yeast genetic reporter system and demonstrate that AA treatment is highly mutagenic to cells and leads to strand-biased mutations on guanines (G→T) at a high frequency on single stranded DNA (ssDNA). We further demonstrate that AA-derived mutations occur through lesion bypass on ssDNA by the translesion polymerase Polζ. Finally, we describe a unique mutation signature for AA, which we then identify in several whole-genome and -exome sequenced cancers, particularly those associated with alcohol consumption. Our study proposes a key mechanism underlying carcinogenesis by acetaldehyde—mutagenesis of single-stranded DNA.


INTRODUCTION
Alcohol consumption is associated with a variety of cancers and is among the leading causes of mortality in humans. One of the key metabolites driving alcohol toxicity is acetaldehyde (AA), which is generated by the oxidation of ethanol. Additionally, AA can be obtained from a variety of other foods and beverages, as well as tobacco smoke (1). In vivo, AA is oxidized to acetate through a series of NAD+-dependent aldehyde dehydrogenases ((2) and reviewed in (3)). However, dysfunction of AA-detoxifying mechanisms has severe cytotoxic and mutagenic outcomes, which can lead to carcinogenesis. Free AA is highly reactive towards key biomolecules including DNA and protein, which can inhibit cellular processes and contribute to carcinogenesis (4,5). Based on its toxic properties, the International Agency on Research on Cancer classifies alcohol consumption-associated AA as a group I carcinogen (6).
The genotoxic effects of AA have been explored in several studies. Previous in vitro studies have demonstrated that AA can react with guanine residues on DNA and form bulky adducts (reviewed in (7)). Examples include the well-studied adduct N 2 -ethyl-2 -deoxyguanosine (N 2 -Et-dG) (8), which has been often used in in vitro studies to analyze the effects of AA-mediated DNA damage (9)(10)(11). Earlier studies of AA-mediated genotoxicity have heavily relied on either adducted plasmids, modified oligonucleotides, or single marker plasmid systems. For example, in human embryonic kidney cells, N 2 -Et-dG are associated with GC→TA transversions (12). N 2 -Et-dG has been shown to stall replication (12), as well as transcription (13) Additionally, other AA-derived DNA adducts such as ␣-Sand ␣-R-methyl-␥hydroxy-1, N 2 -propano-2 -deoxyguanosine (CrPdG) have been shown to induce DNA intra-strand crosslinks in adjacent guanine residues, leading to signature GG→TT transversions in single-and double-stranded plasmid DNA, as well as human fibroblast cell lines (14). Acetaldehyde has been shown to induce chromosomal-scale DNA damage in CHO cells deficient in homologous recombination and nucleotide excision repair (15). Studies in both budding yeast and fission yeast have genetically demonstrated the induction of DNA repair pathways by AA treatment (16,17). Using Xenopus egg extracts, it was shown that AA-derived DNA inter-strand crosslinks retard replication fork progression and lead to increased mutation frequency (18). In colorectal cancer cell lines, elimination of tumor-suppressor HR genes results in AA hypersensitivity (19), further highlighting AA-induced DNA damage and the ensuing cellular responses that sense such damage.
However, a diagnostic and specific mutagenic signature of AA exposure in vivo has proven elusive. Mutation signatures, which are the characteristic patterns of single and double base substitutions associated with discrete mutagens and metabolic processes, help understand the mutagenesis mechanisms leading up to cancer development. Advances in computational analyses of large-scale sequencing data has revealed the patterns of somatic mutations for thousands of whole-genome and-exome sequenced cancers by several groups (20)(21)(22)(23). Therefore, identification of a discrete AA-associated mutation signature would serve as a critical predictor of carcinogenesis especially in alcoholassociated cancers. Previous studies investigating ethanol mutagenicity in yeast were similarly inconclusive as to the role of AA in mediating mutagenesis as AA was not found to be mutagenic in these yeast strains (24). Treatment of induced pluripotent stem cells (IPSCs) with AA led to a profound DNA damage response, however, no increase in mutagenesis was seen in these cells (25). In esophageal carcinoma patients, an alcohol-associated T→C mutation signatures has been described; although the mutation signature correlated with smoking and drinking, the signature was not specific for AA-induced mutations (26). Overall, while AA is demonstrably mutagenic, there is a lack of consensus among different studies as to its precise mutation signature and spectrum in vivo. An obvious commonality that could reconcile the above studies is that most in vivo analyses of AA mutagenicity is conducted in DNA-repair proficient backgrounds. As a result, low mutagenicity combined with efficient repair could easily confound an AA-specific mutation signature. Discrepancies between the induction of a DNA damage response, increased genome-instability and carcinogenesis associated with AA and the lack of detectable mutagenesis by AA necessitates sensitive models to unambiguously detect the mutations induced by AA.
In the present study, we investigate the mutation spectrum and signature of acetaldehyde using a sensitive yeast reporter assay. We show that AA strongly mutagenizes single-stranded DNA in vivo and can induce strand biased mutations at an elevated frequency. AA-derived mutations depend on translesion synthesis by DNA polymerase zeta (Pol ). Using whole-genome sequencing, we determined the mutation spectrum of AA and describe its trinucleotide mutation signature. Finally, we identified cancers among the whole-genome-sequenced cohorts from the Pan-cancer Analysis of Whole Genomes (PCAWG) (23) and wholeexome-sequenced cohorts from the International Cancer Genome Consortium (ICGC) (27) carrying the novel AA mutation signature and show that it positively correlates with a previously identified AA-associated mutation signature. Our work provides a novel and diagnostic signature of acetaldehyde exposure in yeast and cancers.

Yeast strains
Strains used in this study are derivatives of CG379 with the genotype MATα his7-2 leu2-3,112 trp1-289, cdc13-1. The genes CAN1, URA3, ADE2 and LYS2 were deleted from their original loci and reintroduced as the array lys2::ADE2-URA3-CAN1 at the left de novo telomere arm of Chromosome 5, as described earlier (28). RAD1, REV3 and RAD30 were deleted using KANMX. The strain carrying the rev1-AA allele was the same as described earlier (29). All the strains and the primers used in the study are listed in Supplementary Table S1.

Acetaldehyde-induced mutagenesis
Yeast strains carrying the cdc13-1 temperature sensitive (ts) allele were grown as described earlier (28). Briefly, cultures of the cdc13-1 strains were grown at 23 • C for 72 h until saturation. Roughly 10 7 cells were inoculated into fresh YPD and grown with shaking at 37 • C for 6 h in Erlenmeyer flasks to induce resection at telomeres. Cells were monitored for G2 arrest by microscopy, at which point >95% cells arrested as large double buds. Thereafter, cells were harvested by centrifugation, washed three times with sterile water and resuspended in water in 15ml conical tubes. Acetaldehyde was added to samples at a final concentration of 0.2%, and samples were incubated alongside the control samples (without AA) at 37 • C in a rotary shaker for 24 h. To minimize AA evaporation during the experiment: (i) AA was kept on ice prior to addition, (ii) prechilled pipette tips were used to dispense AA, (iii) tubes were filled to the top, leaving very little headspace during rotary incubation and (iv) tubes were sealed with parafilm. Appropriate dilutions of cells were plated on complete synthetic complete (SC) media (MP Biomedicals) to measure viability and SC-Arginine plates containing 60mg/ml canavanine (Sigma) and 20mg/ml adenine to isolate Can R Ademutants (red colonies). All plates were incubated at 23 • C for 5-7 days until colonies appeared. Mutants were tested for Chromosome V left arm loss by replica plating cells on SC-Uracil to select for loss of the URA3 gene as described earlier (28). Genomic DNA was isolated from independent Can R Ademutants for whole genome sequencing.

DNA sequencing
Genomic DNA was isolated from yeast strains using the Zymo YeastStar genomic DNA isolation kit (Genesee Scientific) using the manufacturer's recommended protocol. DNA concentrations were measured via Qubit (Invitrogen) and diluted to approximately 10 ng/ul for library preparation. Diluted DNA was used for fragmentation on Covaris LE 220 system. The KAPA Hyper prep system (Roche) for library preparation with each sample acquiring a unique dual index adapter. After pooling all instances, the Illumina NovaSeq sequencing system for analysis. Sequencing reads were aligned to the reference genome ySR127 (30) using BWA-mem (31) and duplicate reads were removed using Picard tools (http://broadinstitute.github.io/picard/). Single nucleotide variants (SNVs) were identified using VarScan2 (32), using a variant allele frequency filter of 90%. Unique SNVs were by identified by comparing AA-treated samples with untreated parent strains serving as matched normal and removing duplicates.

Mutation spectrum and signature analysis
Mutation analysis was done as previously described (29). Mutations within 30 kb of the chromosome ends were classified as sub-telomeric and those lying beyond were binned as mid-chromosomal. Mutation spectra was plotted as pyrimidine changes, taking into consideration reverse complements for each base change. SomaticSignatures (33) was used to plot mutation spectra across 96 channels to account for all possible base substitutions. Strandbiased mutations were evaluated based on whether the SNVs were located on ssDNA generated upon telomere uncapping and resection. Mutations per isolate were calculated by plotting SNV as a function of the total number of strains used per treatment condition. PLogo (https://plogo. uconn.edu/) was used to infer the mutation signature of AA treatment, by evaluating the statistical probability of over-/underrepresentation of residues in the ±1 trinucleotide context of the mutated residue compared to the background sequence. For a particular substitution, reverse complements were taken into consideration to perform the PLogo calculation.

Mutation enrichment and mutation load analysis
Mutation enrichment and mutation loads were calculated based on (28,29,34) using Trinucleotide Mutation Signatures (TriMS) whereby the number of a given substitution in a specific trinucleotide context is compared against the total number of the given substitution genome-wide, as well the incidence of the mutated residue within the ± 20 nucleotide context of the mutation. The following calculation was used: Enri chment gCn→A = Mutati ons gCn→A X Context C Mutati ons C→A X Context gcn A one-sided Fisher's Exact test was used to calculate the P-values of enrichment of the given mutation signature in each sample and in the total yeast samples. Mutation loads for a given signature were calculated with a minimum enrichment probability of >1 and a Bonferroni corrected Pvalue of ≥0.05, using the following equation:

Mutational analysis in cancers
Somatic mutation load and enrichment were calculated for a given signature using mutation data from de-duplicated somatic SNV calls from different donors in whole-genomesequenced cancers from PCAWG (23) and whole-exomesequenced cancers from ICGC data portal (27). For cancer samples carrying an enrichment of the AA mutation signature of ≥1 (Bonferroni-corrected P-value of ≤0.05), transcriptional strand bias of mutations was calculated with BEDTools (35) intersect, using hg19 as the reference genome (UCSC Table Browser, (36)) and a goodness of fit test was performed to test the statistical significance of the ratios of mutations on transcribed versus non-transcribed strands. Mutation signature correlations were performed using simple linear regression and significance of correlations were estimated based on Pearson's r coefficient.

Acetaldehyde treatment results in elevated mutation frequency in yeast cells
A primer on the yeast mutation reporter system. Single stranded DNA (ssDNA) can be exposed during several key steps of DNA metabolism including DNA replication and transcription, and is highly susceptible to mutagens, which makes it an ideal template to uncover mutation profiles of even weakly mutagenic agents. To this end, we used a previously developed genetic reporter assay using the budding yeast Saccharomyces cerevisiae, whereby the strain has a temperature-sensitive (ts) cdc13-1 allele, and has additionally been engineered to carry the ADE2, CAN1 and URA3 genes near the de novo telomere of the left arm of Chromosome 5 after being removed from their original chromosomal loci ( Figure 1A, (28)). At the non-permissive temperature (37 • C), the cdc13-1 allele causes telomere uncapping, resulting in extensive resection at chromosome ends and the production of large tracts of ssDNA (37). These tracts extend up to 30 kb from telomere ends and span the above reporter genes placed within the sub-telomeric regions (38,39). Additionally, induction of the ts allele causes yeast to arrest in G2 as large buds (40). Treatment of cells at this stage with a mutagen permits accrual of lesions within the exposed ssDNA and shifting of the strains to the permissive temperature (23 • C) hereafter results in resynthesis of the second strand with the mutagenic bypass of the lesions and produce selectable mutations (28). Due to the single-stranded nature of DNA, excision repair pathways cannot remove the lesions allowing us to analyze the signature of lesion formation in ssDNA. Mutations within the CAN1 and ADE2 reporter genes are selectable. CAN1 mutations render yeast cells resistant to the arginine mimicking chemical canavanine (Can R ), whereas mutations in ADE2 fail to synthesize adenine and when placed on low adenine media, appear as red or reddish-pink colonies (Ade -). Clustering of mutations within these reporters produce Can R Ademutants which are visually quantified to calculate the mutation frequency associated with the given mutagen. This system has been successfully employed to test the mutagenicity of multiple agents, including the APOBEC family of cytidine deaminases (28,41), and alkylating agents methyl-methanosulfonate (MMS) and ethylmethanosulfonate (EMS) (29,42).
Acetaldehyde is mutagenic on ssDNA in yeast. We used the above reporter system to test if AA can induce mutations in yeast. To test this, we induced the formation of ssDNA in the cdc13-1 allele-carrying strains via shifting yeast cells to non-permissible temperature (37 • C). At this temperature, the telomeres are uncapped, which allows strains to accumulate ssDNA via large scale end resection at the telomeres. The yeast cells are arrested in the G2-phase of the cell cycle due to checkpoint activation. We subsequently treated G2arrested yeast cells with AA. AA is extremely volatile, with a melting temperature of 18 • C; therefore, to minimize loss due to evaporation, we added chilled AA to cultures and additionally filled culture tubes to capacity to minimize AA oxidation during incubation. The strains were incubated with AA in a rotating incubator to allow the yeast cells to

23°C, telomere re-capping and dsDNA synthesis, TLS mediated lesion bypass and mutations
Acetaldehyde -+ -+ -+ -+ -+ Figure 1. Testing the mutagenicity of acetaldehyde in yeast. (A) Schematic of the yeast reporter system. As described previously (28), the assay utilizes a yeast strain harboring the cdc13-1 allele and sub-telomeric reporters CAN1, ADE2 and URA3 on chromosome V (Materials and Methods). Temperature shift (37 • C) followed by AA addition produces lesions (orange triangles) on guanine residues within ssDNA within reporter genes, which upon restoration of permissive temperature (23 • C) would be erroneously bypassed by translesion synthesis, resulting in 'clustered' Can R Ademutations within the reporters (solid red circles). Grey ovals-telomere protection complex, solid black circle-centromere. (B) Acetaldehyde mutation frequency estimates. Frequency of Can R Adeisolates after 24h AA exposure in the indicated strains. Minus (-) indicates water-treated controls whereas plus (+) indicates AA treatment. Data represent median frequencies with 95% CI. Asterisks represent P-value <0.05 based on an unpaired t-test (untreated v acetaldehyde-treated wildtype = ***(P-value = 0.0007), acetaldehyde-treated wildtype vs Δrev3 = ** (P-value = 0.005). Plotting and analysis were performed using Prism (v 9.3.1, GraphPad Software, LLC). stay in suspension in the media and to allow maximum exposure to AA in the media. We did not notice any appreciable reduction in the viability of cells treated with 0.2% AA, compared to water-only control samples (Supplementary Figure S1, Supplementary Table S2). When strains were maintained at the permissive growth temperature (23 • C) prior to addition of AA, we did not observe such an increase in mutagenesis compared to strains treated with AA post-cdc13-1 induction at 37 • C (Supplementary Table S2).
This demonstrates that AA-induced mutagenesis is dependent on the induction of ssDNA. AA treatment for 24 h led to a >30-fold increase in mutation frequency (median mutation frequency 5.30E-05) over strains treated with water (median frequency 1.40E-06), Figure 1B). Since AA is highly volatile at 37 • C, we also determined the mutation frequencies of yeast treated with AA at 4 • C for 24 h. These strains were first incubated at 37 • C to ensure telomere uncapping and resection led to the forma-tion of ssDNA and then acetaldehyde was added, and the cultures were incubated at 4 • C. AA was found to be equally mutagenic at this temperature as 37 • C as such, all further experiments were conducted at 37 • C (Supplementary Table  S2).

Acetaldehyde mutagenesis relies on translesion synthesis
Since cells were treated with AA after the induction of the formation of ssDNA, the mutations were likely due to bypass of lesions during the resynthesis of the second strand to restore DNA to its double-stranded form. Such mutations should be independent of DNA repair by nucleotide excision repair (NER), which requires a complementary strand (43). We deleted RAD1 to abolish NER in strains and saw that there was no statistically significant change in the mutation frequency upon treatment with AA (Median mutation frequency 1.85E-05) ( Figure 1B, Supplementary Table S2), indicating that NER does not function to repair lesions in ssDNA.
In yeast the two major TLS pathways rely upon either DNA polymerase eta (Pol ) which is involved in errorfree bypass of lesions, or the error prone DNA polymerase zeta (Pol ). The latter additionally involves Rev1, which in addition to being a structural component of Pol can independently catalyze bypass of certain types of lesions (44)(45)(46). To test which of the above pathways contributes to the elevated mutation frequency observed with AA, we deleted RAD30 which codes for the catalytic subunit of the Pol polymerase. There was no reduction in mutation frequency from AA treatment in Δrad30 strains compared to the wildtype strains (Median mutation frequency 2.35E-05) ( Figure 1B, Supplementary Table S2). Next, we deleted REV3 in our strains, which encodes the catalytic subunit of Pol . In the Δrev3 background, AA treatment caused a 13-fold reduction in mutation frequencies compared to the wildtype strains ( Figure 1B, Supplementary Table S2). Lastly, we assessed the effect of Rev1 (29). The catalyticallyinactive rev1-AA mutant (D647A and E648A, (45) did not alter the mutation frequency with AA treatment (Figure 1B,  Supplementary Table S2). Our results strongly suggest that erroneous bypass of lesions in ssDNA by Pol underlie the mutagenicity of AA.

Acetaldehyde predominantly makes G→T transversions on ssDNA
We next asked if we can identify the mutation spectrum associated with exposure to acetaldehyde. For this, we isolated genomic DNA from 115 Can R Ademutant colonies obtained from AA treatment and performed whole genome sequencing. AA treatment leads to the formation of a lesion on ssDNA whose erroneous bypass further leads to mutations culminating as Can R Adeisolates. However, Can R Adeisolates can also arise from the loss of the chromosomal arm carrying the reporters. To rule out this possibility, we additionally tested the Can R Adeisolates derived from AA treatment for the presence of the URA3 gene within the reporter and only selected Can R Ade -Ura + isolates in the AA treatment cohort for further analysis (Supplementary Table S3).
To identify mutations specific to AA treatment, we sequenced and analyzed 27 isolates derived from control (water) treatment in parallel. Compared to mutagen treated samples, control treatments do not result in the appearance of a high frequency of red Can R Ade -(red) colonies, therefore we sequenced a randomized mixture of Can R (white) and Can R Ade -(red) colonies (Supplementary Table S2). Unlike samples treated with water, among the 504 unique mutations identified in AA-treated samples, C→A (G→T) transversions were the predominant SNVs (41%, 15% for water), along with a smaller percentage of C→T (G→A) changes (6.7%, 33% for water) (mutation density for cumulative C→A changes per isolate, water = 0.59, AA = 1.79), Figure 2, Supplementary Table  S4). Our observations are in accordance with prior studies showing guanines to be the primary target of AA-induced lesions.
In our assay, 5 →3 resection from chromosome ends would render the bottom strand of the left arm and top strand of the right chromosomal arm single stranded. Accordingly, we asked if AA preferentially mutates a specific base on single stranded DNA. Upon comparing the locations of the observed mutations to the reference strand, we observed that C→A changes were pre-dominant on the left arms of chromosomes (indicating bottom singlestrand G lesions) and conversely saw G→T changes on the right chromosome arms (indicating top single-strand G lesions) ( Figure 2B). We estimated the distance of mutations from the telomeres and observed that most mutations (453/504, Supplementary Table S4) Table S4), indicating that AA has a propensity to damage ssDNA regardless of a selection bias. Our results confirm that in our assay system, telomereproximal single-stranded DNA is the preferred template for AA-induced mutagenesis.
Overall, our analysis reveals a distinct mutation spectrum for AA, whereby guanines are strongly preferred over other bases and the ensuing G→T (C→A) transversions constitute the major mutation type. We did not observe enough indels in our samples for signature analysis (Supplementary Table S4). Nevertheless, our data agrees with prior in vitro studies showing that the primary target of AA induced DNA damage is guanine residues (8,14,47).

Mutation signature of acetaldehyde in single stranded DNA
We next determined the proportion of mutations falling within all the 96 possible single base substitutions within trinucleotide mutational motifs, consisting of a central base and the flanking -1 and + 1 residues (33). When plotted as cumulative pyrimidine changes (i.e. mutated base along with the complementary mutated base) there was a marked enrichment of C→A mutations when the -1 base was a guanine ( Figure 3A, Supplementary Table S5). The contribution of the remaining signature motifs remained at the baseline level, strongly suggesting that the major mutation sig-  nature of AA is centered around mutated guanine residues (showing as C→A changes). To confirm the above observation, we used pLogo to check the proportion of over-and under-represented nucleotides flanking the C→A (or G→T) change. We noticed a strong over-representation for a guanine in the -1-position flanking the mutated cytosine ( Figure 3B, Supplementary Table S6 conversely represented to show mutated base as a pyrimidine), yielding a gCn→gAn (nGc→nTc) signature for AA exposure. In comparison, no statistical enrichment was observed for this signature in the water-treated control samples ( Figure 3B). We further used the knowledge-based pipeline we termed TriMS (Trinucleotide Mutation Signature) to determine if this mutation signature is enriched in our samples. TriMS is based on a previously described pipeline (34) and is broadly customizable to any oligonucleotide-centered mutational motif. We calculated the enrichment of the gCn→gAn change versus all C→A changes genome-wide, with an enrichment ≥ 1 and a Benjamini-Hochberg-corrected Pvalue ≤ 0.05 deemed statistically significant. In the AAtreated samples, the gCn→gAn signature was highly enriched (3.8, BH-corrected P-value 2.09E-22) compared to control samples (1.15 BH-corrected P-value 0.65829269) ( Figure 3C, Supplementary Table S7).

Acetaldehyde-derived mutation signature can be found in alcohol-associated human cancers
Alcohol consumption is unequivocally associated with the risk of malignancy for a variety of cancer types, including cancers of the digestive tract, colorectal cancer as well as hepatocellular carcinomas (48). Since AA concentrations are likely elevated in alcohol-associated cancers, we sought to detect the AA mutation signature in published cancer datasets. In addition, AA is present in tobacco smoke (49), thus we also analyzed lung cancers for enrichment of the AA mutation signature.
We analyzed >9000 whole-exome sequenced cancers spanning 24 cancer types from ICGC and >1600 wholegenome sequenced cancers across 15 cancer types from the PCAWG database. We identified a significant enrichment of the gCn→gAn (nGc→nTc) signature in various samples amongst both cohorts (Table 1, Supplementary Table S8). In the whole-exome sequenced lung cancer samples, we detected signature enrichment at a low frequency ( Table 1, 5/1001 samples with enrichment >1 for LUAD and LUSC). In liver cancers, 27 samples across four different ICGC cancer cohorts (LICA, LICA-FR, LIHC, LIRI-JP) had a significant gCn→gAn enrichment (Table 1). Of note, the LICA cohort had a high proportion of samples enriched for the AA-associated mutation signature (85/400 samples), with a mean gCn→gAn mutation density exceeding 5000 per exome (Table 1). Within the whole-genome sequenced PCAWG datasets, we were able to identify samples from esophageal carcinoma (ESCA) and head-andneck cancer (HNSCC) that had a significant enrichment of the gCn→gAn mutation signature ((>1, BH-corrected P-value ≤ 0.05), Table 1, Supplementary Table S10). Like the ICGC samples, various whole-genome sequenced hepatocellular carcinoma, as well as stomach adenocarcinoma samples demonstrated a statistically significant enrichment of the AA-associated mutation signature gCn→gAn ( Table 1). Importantly, all the cancers where an enrichment for the gCn→gAn mutation signature was observed are associated with alcohol and/or smoking, leading us to posit that samples in these types of cancer likely experience exposure to acetaldehyde. In contrast, we did not observe an enrichment for the AA signature in non-alcohol/smoking associated cancer types, including reproductive, neurological, renal, or hematological cancers (Tables 1, S12, S13).
AA exposure is typically associated with an increase in CC→AA (or GG→TT) transversions (14,50). We hypothesized since gCn→gAn mutations are also induced by AA, CC→AA mutation loads should correlate with the gCn→gAn mutations in AA-associated cancer types. While a low number of GG→TT (CC→AA) substitutions precluded such a comparison in yeast, we analyzed this correlation in cancers from Table 1 that displayed a statistical enrichment of the gCn→gAn mutation signature and had an associated increase in median gCn→gAn mutations. The analyzed cancer cohorts displayed a positive correlation between the cumulative CC→AA double base substitutions and the cumulative gCn→gAn mutation loads in the same samples for both ICGC and PCAWG cancer cohorts ( Figure 4, Tables S9, S11). These data indicate that the gCn→gAn mutations and the GG→TT mutations likely rose from the same source mutagen.
Lastly, we looked for clinical correlations between alcohol/smoking and the presence of the AA-associated mutation signature. The absence of comprehensive clinical data for the ICGC samples and most of the PCAWG samples, particularly with regards to alcohol and tobacco history, precluded analysis of correlations between signature enrichment and patient exposure. However, within the PCAWG esophageal cancer (ESCA) dataset, five samples that were enriched for the gCn→gAn signature also had associated clinical data with all samples demonstrating significant enrichment belonging to patients with a known history of either tobacco usage or alcohol consumption (Supplementary Figureure S2, Supplementary Table S14).

The acetaldehyde mutation signature displays transcriptional strand bias in human cancers
During transcription, the template strand (transcribed strand) remains associated with the newly synthesized RNA molecule, and lesions on this strand are capable of stalling RNA polymerase, which leads to the recruitment of transcription-coupled repair machinery such as TC-NER (51,52). Conversely, the coding strand (non-transcribed strand) is rendered single stranded, which makes it particularly susceptible to transcription-associated mutagenesis (53). Many cancers display a strand asymmetry of mutations, for example those associated with UV exposure, smoking, alkylating agents, or oxidative damage (29,54,55). Exploring such mutational biases can greatly contribute to the overall understanding of the mutational processes that drive carcinogenesis. To this end, we asked if the cancers displaying significant enrichment for the gCn→gAn (nGc→nTc) mutation signature have a transcriptionally biased mutation spectrum. For whole-exome sequenced, liver- Table 1. List of cancers analyzed for the gCn→gAn mutation signature. 15 cancer types were analyzed from the PCAWG consortium of whole-genome sequenced cancers. 24 cancer types were analyzed from the ICGC consortium of whole-exome sequenced cancers. Mean mutation load of the combined gCn→gAn mutations within genomes/exomes were calculated for samples with a statistical enrichment of the gCn→gAn mutation signature (≥1) with a Benjamini-Hoechberg corrected P-value of ≤0.05

Cancer
Alcohol associated cancers from ICGC, there was strong bias for nGc→nTc to occur on the non-transcribed strand ( Figure  5A, Supplementary Table S15). Interestingly, we did not observe a statistically significant bias for lung-associated cancers ( Figure 5A, Supplementary Table S15). Similarly in whole-genome sequenced PCAWG cancers datasets, significantly more nGc→nTc mutations were found associated with the non-transcribed strand in liver and stomachassociated cancers, but not in upper respiratory tractassociated ESCA and HNSCC ( Figure 5B, Supplementary  Table S15). Our results suggest that ssDNA formed during transcription is a likely source of AA-induced DNA damage in cancers. However, we cannot rule out the possibility that TC-NER also functions to abrogate AA-adducts in the transcribed strand. Such activity of TC-NER would also result in the observed transcriptional-strand bias for AAinduced mutagenesis in cancers.

DISCUSSION
In the present study, we demonstrated that acetaldehyde (AA) exposure generates strand-biased, guanine centered mutations upon damage in ssDNA in yeast. We showed that AA treatment is highly mutagenic on ssDNA and observe a preponderance of C→A (G→T) single nucleotide polymorphisms. We surmise the observed mutations likely arise from lesion bypass by the Pol polymerase associated with error-prone translesion synthesis. We deciphered a distinct gCn→gAn (nGc→nTc) mutation signature for AA in yeast. Importantly, we were able to detect an enrichment of the AA-associated signature in various alcohol-associated cancer genomes, indicating that the mutation signature identified in yeast is diagnostic of AA-exposure in cancers. Previously GG→TT changes have been ascribed to AAinduced DNA damage in in vitro studies (14,50). Interestingly, we did not see an enrichment in GG→TT changes or other double base substitutions in yeast strains treated with AA. The difference in the signature of AA-induced mutagenesis likely reflects the type of lesions formed in doublestranded plasmid DNA in vitro as compared to the lesions in ssDNA in vivo. Two molecules of AA can react with guanines to form ␣-Sand ␣-R-methyl-␥ -hydroxy-1, N2propano-2 -deoxyguanosine (CrPdG) (56). When the openring forms of CrPdG react with each other, inter-or intrastrand crosslinks may be formed (14,57,58). Mutagenic bypass of these crosslinks leads to GG→TT changes--double base substitutions classically associated with AA exposure (14,18,22). On the other hand, various studies have demonstrated that AA also forms lesions on single guanine moieties, with N2-ethylidene-dG being the most commonly derived adduct (59), and bypass of such lesions results in single base substitutions at guanine residues (9,10,12). It is possible that AA predominantly forms such mutagenic lesions on single guanine residues in ssDNA in yeast, leading to G→T single base substitutions. Because the mutation spectrum of AA in our assay bears similarities to prior studies and is found to correlate with the previously identified AA-signature in cancers, it strongly suggests that the novel signature we describe (nGc→nTc) that we observe is AAspecific. However, our current assay design does not allow us to identify the specific adduct(s) responsible for the mutagenic lesions associated with AA exposure.
Furthermore, studies in fission yeast have demonstrated that AA exposure leads to the activation of various DNA repair pathways including nucleotide excision repair, base excision repair and homologous recombination (17). As such, it is likely that most of the studies aimed at understanding the mutation signatures of AA were unable to detect single base substitutions, as the N2-ethylidene-dG lesion could have been efficiently removed by these DNA repair pathways. Because excision repair pathways cannot function on ssDNA, we are likely able to enrich and reliably detect single base substitutions associated with erroneous bypass of this lesion in our system.
Based on prior reports of a link between ethanol consumption and oxidative stress (60), it is reasonable to assume that AA treatment by itself may also impart oxidative stress. Consequently, it is possible that AA-induced oxidative stress makes a non-trivial contribution to the observed mutagenesis in our assay. However, in yeast, induction of oxidative stress has been shown to produce a distinct C→T mutation signature in ssDNA (55). However, in our yeast samples, we observe a much lower number of sub-telomeric C→T mutations compared to C→A changes with AA treatment (Supplementary Table S4). Although we cannot fully rule out AA-induced oxidative stress contributing to the observed C→A mutagenesis, our data suggest that this most likely is not the primary mechanism of AAassociated mutagenesis.
Mutagenic bypass of DNA lesions requires the activity of translesion polymerases. Studies on alcohol-associated cancers identified mutation signature associated with the TLS polymerase Pol (61). Additionally, alcohol-induced mutagenesis in budding yeast and AA sensitivity in fission yeast was found to be dependent on the activity of translesion polymerases (17,24). Further, removal of Rev1 in cell-free assays has been shown to impact the mutagenicity of AAderived interstrand crosslinks (18). In agreement with the above, we observed that TLS was required for AA-induced mutagenesis. Abolishing REV3 led to a drastic reduction in AA-induced mutation frequency (Figure 1, Supplementary  Table S2), indicating that Pol is essential for the mutagenic bypass of AA-induced lesions in ssDNA in budding yeast.
We detected an enrichment of the AA-associated mutation signature (gCn→gAn) in several samples from alcoholassociated cancers in the ICGC and PCAWG datasets; however, no significant enrichment for the AA-associated mutation signature was seen in any of the several other wholegenome and -exome sequenced cancer datasets, including reproductive cancers, skin malignancies, neurological cancers as well as urothelial cancers (Supplementary Tables S12, S13). As such, it is reasonable to argue that chronic alcohol exposure leads to higher AA-induced genomic damage, especially in the form of lesions on ssDNA and signif-icantly contributes to carcinogenesis. The observed correlation between the smoking/drinking status and AA signature enrichment for esophageal carcinoma samples further substantiates this argument (Supplementary Figure  S2). However, the lack of comprehensive clinical data for different cancer types prevents better statistical analysis of the correlation between chronic alcohol consumption and AA-induced mutation signature in these cancers. Notably, in cohorts from ICGC and PCAWG cancers associated with alcohol consumption, we observed a remarkable correlation between an increase in CC→AA double base substitutions and elevated gCn→gAn mutations (Figure 4). Our data underline the specificity of the acetaldehyde-associated mutation signature and suggests that in vivo, AA mutagenesis likely occurs in a repair-, and template-dependent manner, with differential lesions on ssDNA vs dsDNA resulting in varying mutation outcomes.
Previously, a T→C mutation signature (Signature E4/SBS Signature 16) was ascribed to alcohol and smoking in esophageal carcinoma (26,62), however the molecular etiology of the signature was unknown. Similarly, previous studies have described a smoking associated signature in COSMIC (SBS Signature 4), and tobacco chewing (SBS Signature 29), and reactive oxygen species (ROS) (SBS Signature 18) (22), which have a similar mutation pattern to that we observed for AA, predominantly C→A changes. However, unlike our analysis, there was no clear trinucleotide mutational motif in these studies. As such, these signatures of the mutations from other etiologies and are not diagnostic of AA exposure.
For both PCAWG and ICGC cancers analyzed in our study, we observe a transcriptional strand bias for the AA mutation signature; however, this bias is more pronounced in cancers typically associated with heavy alcohol consumption, mainly liver and/or gastro-intestinal tumors ( Figure  5). Surprisingly, we see either no strand bias (LUSC, ESCA, HNSCC) or a small degree of bias (LUAD) in cancers of the upper respiratory tract, even though these tissue types are sites of primary exposure to ethanol. One possibility is that most of consumed ethanol is metabolized in hepatocytes, which ensures a higher probability of exposure to AA in hepatic and surrounding tissues (63,64). Also, oral AA levels are influenced by a gamut of factors including beverage type, tobacco smoking history, oral hygiene, and metabolism via the oral microbiome (65)(66)(67)(68). The variability in AA exposure on oral and upper respiratory tissues could alter the genome-wide distribution, accumulation and/or spectra of mutations associated with AA exposure in these tissues. On the other hand, oral and upperrespiratory tract tissues are also exposed to a wide variety of other mutagens including tobacco smoke which can lead to an accumulation of lesions and mutations in guanines leading the characteristic C→A (G→T) changes (22,69). Such overlapping mutations may confound the analysis of the contribution of AA-induced mutations in these samples.

CONCLUSION
Environmental aldehydes represent a growing class of toxic agents that are linked with an increasing risk of many human ailments, including neurodegenerative disease, car-diopulmonary diseases, and aging. Due to similarities in their physicochemical properties, environmental and endogenous aldehydes can not only act synergistically but also can cross-react to produce amplified genotoxic effects (70). The variability in reported AA-associated mutagenesis from past studies in model systems and in alcohol-associated cancers suggest that the AA mutation spectrum might be governed by multiple factors, including specific genomic contexts, the replication/transcriptional status, DNA repair proficiency, and perhaps epigenetic modifications. Understanding the molecular mechanisms underpinning aldehyde toxicity would go a long way in determining the risks associated with exposure to addictive agents such as alcohol and tobacco smoke and devising appropriate therapeutic strategies.

DATA AVAILABILITY
The yeast strains used in the study are available upon request. Raw FASTQ sequence files from whole-genome sequencing of yeast samples have been deposited to the Sequence Read Archives (SRA) database under BioProject ID PRJNA817061. Sequence for the reference yeast genome used in this study (ySR127) is accessible on GenBank (CP011547-CP011563). Source code for TriMS is available on GitHub (https://github.com/nataliesaini11/TriMS).