Short sequence motif dynamics in the SARS-CoV-2 genome suggest a role for cytosine deamination in CpG reduction

Dear Editor, The apolipoprotein B editing complex (APOBEC) protein family members are host antiviral enzymes known for catalyzing cytosine to uracil (C>U) deamination in foreign single-stranded DNA (ssDNA) and RNA (ssRNA) (Blanc and Davidson, 2010; Salter and Smith, 2018). Enzymatic target motifs for most of the APOBEC enzymes have been experimentally identified, among which the most common ones are 5′-[T/U]C-3′ and 5′-CC-3′ for DNA/RNA substrates (Salter and Smith, 2018; McDaniel et al., 2020). It was recently suggested that SARS-CoV-2 undergoes genome editing by host-dependent RNA-editing proteins such as APOBEC (Di Giorgio et al., 2020; Rice et al., 2020; Simmonds, 2020; Schmidt et al., 2021).
Given the large amount of available data and the relatively low mutation rate of the SARS-CoV-2 virus (Rambaut et al., 2020), we aimed to monitor its genomic evolution on a very brief time scale during the COVID-19 pandemic. Here, we demonstrate progressive C>U substitutions in the SARS-CoV-2 genome within a timeframe of 5 months. We highlight the role of C>U substitutions in the reduction of 5′-UCG-3′ motifs and hypothesize that this progressive decrease is driven by host APOBEC activity.
We aligned 22164 SARS-CoV-2 genomes from the GISAID database to the reference genome and observed a total of 9210 single-nucleotide changes, with C>U being the most abundant (Figure 1A; Supplementary Text, Figures S1 and S2, and Table S1). Over a period of 5 months, we found a steady and substantial increase in C>U substitutions (Figure 1B), almost half of them synonymous (Supplementary Text and Figure S3), but no comparable increase in other changes (Supplementary Figure S4). One potential driver behind the increase in C>U changes could be the recently proposed APOBEC-mediated viral RNA editing (Di Giorgio et al., 2020; Simmonds, 2020; Supplementary Text).
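To make the counting step concrete, the following is a minimal sketch of how single-nucleotide changes could be tallied by type once a genome has been aligned to the reference. It is not the pipeline used in this study: the function name `count_substitutions` and the toy sequences are hypothetical, and the sketch assumes the two sequences are already aligned to equal length and written in the RNA alphabet.

```python
from collections import Counter

def count_substitutions(reference: str, sample: str) -> Counter:
    """Count single-nucleotide changes between two aligned RNA sequences.

    Hypothetical helper for illustration; assumes equal-length, pre-aligned input.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGU")
    changes = Counter()
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        # Skip gaps and ambiguous bases (e.g. '-' or 'N'); count only true mismatches.
        if ref_base in valid and alt_base in valid and ref_base != alt_base:
            changes[f"{ref_base}>{alt_base}"] += 1
    return changes

# Toy sequences (not study data): two cytosines are replaced by uracil.
reference = "AUCGUUCGAC"
sample = "AUUGUUCGAU"
print(count_substitutions(reference, sample))
```

Summing such counters over all genomes sampled on a given day would give a per-day profile of change types of the kind that could be tracked over time.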
Since APOBEC3 family members display a preference for RNA in open conformation as opposed to RNA forming secondary structures (McDaniel et al., 2020), we calculated the folding potential of all genomic sites that include C>U substitutions (Figure 1C). Positions with C>U changes are more often located in regions with low potential for forming secondary RNA structures. These observations are in agreement with the notion that members of the APOBEC family are the main drivers of cytosine deamination in SARS-CoV-2 (Di Giorgio et al., 2020; Simmonds, 2020). We searched for possible APOBEC genetic footprints (5′-UC-3′>5′-UU-3′) in viral dinucleotide frequencies (Supplementary Figure S5). Among all dinucleotides, UpC showed the largest decrease, while UpU showed the largest increase, which is consistent with APOBEC activity (Supplementary Text). When analyzing the context of genomic sites undergoing C>U changes, we noticed an enrichment for 5′-UCG-3′ motifs (Supplementary Table S2). To assess the contribution of C>U changes to CpG loss, we examined the dynamics of [A/C/G/U]CG trinucleotides over time (Figure 1D). The progressive change (1% over a 5-month period) of 5′-UCG-3′ to 5′-UUG-3′ is most striking when supported by a larger number of genomes (Days 70–115), whereas no such pattern is observed for the other trinucleotides (Figure 1D). The association between cytosine deamination and CpG loss is further underlined by the rapid, progressive increase in 5′-UCG-3′>5′-UUG-3′ changes compared to other 5′-UC[A/C/U]-3′ motifs (Supplementary Figure S6). The genomic region with the highest percentage of 5′-UCG-3′ loss is located in ORF1 (Supplementary Text and Figure S7). No apparent progression of 5′-UCG-3′ loss over time is observed on the negative strand, suggesting that the action of APOBEC on the negative strand of SARS-CoV-2 is limited compared to that on the positive strand (Supplementary Figure S8).
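The motif analysis described above can be illustrated with a short sketch that counts [A/C/G/U]CG trinucleotides in a sequence and records the reference trinucleotide context of each C>U change. The function names `ncg_counts` and `c_to_u_contexts` and the toy sequences are assumptions made for illustration only; they do not reproduce the actual analysis code or data.

```python
from collections import Counter

def ncg_counts(genome: str) -> Counter:
    """Count overlapping [A/C/G/U]CG trinucleotides in an RNA sequence."""
    genome = genome.upper()
    counts = Counter()
    for i in range(len(genome) - 2):
        tri = genome[i:i + 3]
        # Keep only trinucleotides that end in the CpG dinucleotide.
        if tri[1:] == "CG" and tri[0] in "ACGU":
            counts[tri] += 1
    return counts

def c_to_u_contexts(reference: str, sample: str) -> Counter:
    """Tally the reference trinucleotide centered on each C>U change.

    Assumes the two sequences are already aligned; the first and last
    positions are skipped because they lack a full trinucleotide context.
    """
    reference, sample = reference.upper(), sample.upper()
    contexts = Counter()
    for i in range(1, min(len(reference), len(sample)) - 1):
        if reference[i] == "C" and sample[i] == "U":
            contexts[reference[i - 1:i + 2]] += 1
    return contexts

# Toy sequences (not study data).
print(ncg_counts("AUCGACGUCGCCG"))
print(c_to_u_contexts("AUCGAUCA", "AUUGAUUA"))
```

Running `ncg_counts` on genomes grouped by sampling day, and normalizing by the reference counts, would give one way to follow each NCG trinucleotide over time.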

The zinc-finger antiviral protein (ZAP) selectively binds to viral CpG regions, resulting in viral RNA degradation (Takata et al., 2017). Previous studies reported that the reduced number of CpG motifs in HIV and other viruses played an important role in viral replication inside the host cell, allowing the virus to escape ZAP activity (Takata et al., 2017). Similarly, a stronger suppression of CpGs is observed in SARS-CoV-2 compared to other coronaviruses (Digard et al., 2020). Given the high expression levels of APOBEC and ZAP genes in COVID-19 patients (Blanco-Melo et al., 2020), the direct interaction of APOBEC with viral RNA (Schmidt et al., 2021), and our observations, we hypothesize that, as a consequence of APOBEC-mediated RNA editing, the SARS-CoV-2 genome may escape host cell ZAP activity. Both APOBEC and ZAP are interferon-induced genes that act preferentially on ssRNA in open conformation (Luo et al., 2020; McDaniel et al., 2020). Initially, APOBEC and ZAP may have overlapping preferred target motifs (Figure 1E). The catalytic activity of APOBEC on 5′-UC-3′ leads to cytosine deamination, which destroys ZAP's specific acting site (5′-CG-3′). The conversion of C>U allows viral RNA to escape from ZAP-mediated RNA destruction. Therefore, C>U editing is more likely to become fixed at UCG positions because of the selective advantage it confers in evading ZAP-mediated degradation.
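The overlap between the two target motifs can be sketched on a toy sequence: deaminating the cytosine of a 5′-UCG-3′ motif removes one CpG, whereas deaminating a 5′-UC-3′ cytosine outside a CpG leaves the CpG count unchanged. The helper names (`count_cpg`, `uc_sites`, `deaminate`) and the sequence are hypothetical and serve only to illustrate the proposed model, not to simulate enzyme behavior.

```python
def count_cpg(rna: str) -> int:
    """Number of CpG dinucleotides (candidate ZAP binding sites)."""
    return rna.upper().count("CG")

def uc_sites(rna: str) -> list[int]:
    """0-based positions of cytosines preceded by U (5'-UC-3' motifs)."""
    rna = rna.upper()
    return [i for i in range(1, len(rna)) if rna[i - 1:i + 1] == "UC"]

def deaminate(rna: str, position: int) -> str:
    """Return the sequence with the cytosine at `position` converted to uracil."""
    rna = rna.upper()
    if rna[position] != "C":
        raise ValueError("deamination requires a cytosine at the given position")
    return rna[:position] + "U" + rna[position + 1:]

# Toy sequence (not study data): UCG at positions 1-3, ACG at 5-7, UCA at 9-11.
rna = "AUCGGACGAUCA"
for site in uc_sites(rna):
    edited = deaminate(rna, site)
    # Compare the number of CpG sites before and after editing each 5'-UC-3' cytosine.
    print(site, edited, count_cpg(rna), "->", count_cpg(edited))
```

In this toy case only the edit inside the UCG motif lowers the CpG count, which is the situation in which the model predicts a selective advantage.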
A recent study hypothesized that both ZAP and APOBEC provide selective pressure that drives the adaptation of SARS-CoV-2 to its host (Wei et al., 2020). Here, we propose one potential mechanism that contributes to CpG reduction in SARS-CoV-2.
In summary, our phylogeny-free approach, together with other recent studies, strongly supports the proposed model, which merits future experimental validation. To our knowledge, this is the first study linking the dynamics of viral genome mutation to two known host molecular defense mechanisms, the APOBEC and ZAP proteins.
[Supplementary material is available at Journal of Molecular Cell Biology online. The data underlying this work are available in GISAID, at https://gisaid.org. The ID numbers of genomes used are provided in Supplementary Table S1. We thank all laboratories that have contributed sequences to the GISAID database and Zhadyra Yerkesh for her comments and helpful discussions. This work was supported by funding from the King Abdullah University of Science and Technology (KAUST) R3T initiative. Work in A.P.'s laboratory is supported by the KAUST Faculty Baseline