The eccDNA Replicon: A heritable, extra-nuclear vehicle that enables gene amplification and glyphosate resistance in Amaranthus palmeri

Gene copy number variation is a predominant mechanism by which organisms respond to selective pressures in nature, which often results in unbalanced structural variations that perpetuate as adaptations to sustain life. However, the underlying mechanisms that give rise to gene proliferation are poorly understood. Here, we show a unique result of genomic plasticity in Amaranthus palmeri, a massive, ~400-kb extrachromosomal circular DNA (eccDNA) that harbors the 5-enoylpyruvylshikimate-3-phosphate synthase (EPSPS) gene and 58 other encoded genes whose functions traverse detoxification, replication, recombination, transposition, tethering, and transport. Gene expression analysis under glyphosate stress showed transcription of 41 of the 59 genes, with high expression of EPSPS, aminotransferase, zinc-finger, and several uncharacterized proteins. The genomic architecture of the eccDNA replicon comprises a complex arrangement of repeat sequences and mobile genetic elements interspersed among arrays of clustered palindromes that may be crucial for stability, DNA duplication and tethering, and/or a means of nuclear integration of the adjacent and intervening sequences. Comparative analysis of orthologous genes in grain amaranth (Amaranthus hypochondriacus) and water-hemp (Amaranthus tuberculatus) suggests that higher order chromatin interactions contribute to the genomic origins of the Amaranthus palmeri eccDNA replicon structure.

One-sentence summary: The eccDNA replicon is a large extra-nuclear circular DNA that is composed of a sophisticated repetitive structure and harbors EPSPS and several other genes that are transcribed during glyphosate stress.

Short Title: The eccDNA replicon enables glyphosate resistance in Amaranthus palmeri

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Christopher A. Saski (saski@clemson.edu).

INTRODUCTION
McClintock (1984) stated that "a sensing mechanism must be present in plants when experiencing unfavorable conditions to alert the cell to imminent danger and to set in motion the orderly sequence of events that will mitigate this danger." Amplification of genes and gene clusters is one such stress avoidance mechanism: it alters physiology and is an intriguing example of genomic plasticity that imparts genetic diversity, which can rapidly lead to adaptation. This phenomenon is conserved across kingdoms, and these amplified genes are often incorporated and maintained as extrachromosomal circular DNAs (eccDNAs). EccDNAs are found in healthy and tumor human cells (Kumar et al., 2017; Moller et al., 2018), in cancer cell lines (Vanloon et al., 1994), and in a variety of human diseases, with limited reports in plants and other eukaryotic organisms (Lanciano et al., 2017). Recent research has reported that eccDNAs harbor bona fide oncogenes with massive expression levels driven by increased copy numbers (Wu et al., 2019). That study also demonstrated that eccDNAs in cancer contain a highly accessible chromatin structure compared with chromosomal DNA and enable ultra-long-range chromatin contacts (Wu et al., 2019). Reported eccDNAs vary in size from a few hundred base pairs to megabases and include double minutes (DM) and ring chromosomes (Storlazzi et al., 2010; Turner et al., 2017; Koo et al., 2018a; Koo et al., 2018b). The genesis and topological architecture of eccDNAs are not well understood; however, the formation of small eccDNAs (<2 kb) is thought to result from intramolecular recombination between adjacent repeats (telomeric, centromeric, satellite, etc.) initiated by a double-strand break (Mukherjee and Storici, 2012) and propagated using the breakage-fusion-bridge cycle (McClintock, 1941). Larger eccDNAs (more than 100 kb), such as those found in human tumors, most likely arise from a random mutational process involving all parts of the genome (Vonhoff et al., 1992). In yeast, some larger eccDNAs may have the capacity to self-replicate (Moller et al., 2015), and up to eighty percent of studied yeast eccDNAs contain consensus sequences for autonomous replication origins, which may partially explain their maintenance (Moller et al., 2015).

In plants, the establishment and persistence of weedy and invasive species have long been under investigation to understand the elements that contribute to their dominance (Thompson and Lumaret, 1992; Chandi et al., 2013). In Amaranthus palmeri, amplification of the gene encoding 5-enoylpyruvylshikimate-3-phosphate synthase (EPSPS) and its product, EPSP synthase, confers resistance to the herbicide glyphosate. The EPSPS gene may become amplified 40- to 100-fold in highly resistant populations (Gaines et al., 2010). Amplification of the EPSPS gene and gene product ameliorates the unbalanced or unregulated metabolic changes, such as shikimate accumulation and loss of aromatic amino acids, associated with glyphosate activity in sensitive plants (Gaines et al., 2010; Sammons and Gaines, 2014). Previous low-resolution FISH analysis of glyphosate-resistant A. palmeri showed that the amplified EPSPS gene is distributed among many chromosomes, suggesting a transposon-based mechanism of mobility; EPSP synthase activity was also elevated (Gaines et al., 2010). Partial sequencing of the genomic landscape flanking the EPSPS gene revealed a large contiguous sequence of 297 kb that was [...] pachytene chromosomes also revealed a chromosome tethering mechanism for inclusion in daughter cells during cell division, rather than complete genome integration.
Here, we present the reference sequence of the eccDNA replicon, its unique genomic content and structural organization, experimental verification of autonomous replication, and a putative mechanism of genome persistence. We anticipate these findings will lead to a better understanding of adaptive evolution and, perhaps, toward the development of stable artificial plant chromosomes carrying agronomically useful traits.

The eccDNA replicon contains 59 predicted protein coding gene sequences, including the EPSPS gene (Supplemental Table 1 [...]).

Many of the eccDNA replicon encoded genes have predicted functional protein domains that may endow the critical cellular processes necessary for stress avoidance, maintenance, stability, replication, and tethering of the eccDNA replicon. These processes include DNA transport and mobility, molecule sequestration, hormonal control, DNA replication and repair, heat shock, transcription regulation, DNA unwinding, and nuclease activity (Supplemental [...]). [...] which may be indicative of parallel genomic recombination features. AP_R.00g000496 encodes a helicase domain, which is implicated in DNA replication and unwinding. AP_R.00g000450 contains a zinc-binding reverse transcriptase domain and an integrase catalytic core, which are characteristic of a retroviral mechanism to integrate viral DNA into the host. This integrase is also found in various transposase proteins and is a member of the ribonuclease H-like superfamily involved in replication, homologous recombination, DNA repair, transposition, and RNA interference (Supplemental Table 1). In the glyphosate-resistant (GR) biotype with an assembled eccDNA replicon, the most transcriptionally active genes at 24 hours post glyphosate treatment are a number of uncharacterized proteins, EPSPS (AP_R.00g000210), a zinc finger nuclease (AP_R.00g000492), and several genes with plant mobile domains (Figure 1B). [...] Table 2).
The most transcriptionally active genes include 2 uncharacterized proteins, followed by the EPSPS gene (Supplemental Table 2). Additional genes include 6 other uncharacterized protein sequences and 2 aminotransferase-like genes that are mostly expressed in the GR biotype. Interestingly, at 4 hours after treatment, the gene expression profiles of glyphosate-treated resistant and sensitive plants were indistinguishable from those of their water controls (Supplemental Figure 1 and Supplemental Table 2). At 24 hours after glyphosate exposure, 11 genes were upregulated at least 2-fold in the GR biotype compared with the glyphosate-sensitive biotype (Supplemental Table 3). These genes include 7 uncharacterized proteins, 2 aminotransferase-like genes, EPSPS, and a protein with a zinc finger SWIM domain (Supplemental Table 3).
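The 2-fold criterion used above to call genes upregulated in the GR biotype can be illustrated with a minimal Python sketch. It assumes normalized expression values per gene for each biotype and a pseudocount to avoid division by zero; the function name, the pseudocount, and the example values are hypothetical and are not taken from the study, whose statistical procedure is not described in this section.

```python
def upregulated_genes(resistant, sensitive, min_fold=2.0, pseudocount=1.0):
    """Return (gene, fold) pairs where resistant/sensitive expression >= min_fold.

    `resistant` and `sensitive` map gene IDs to normalized expression values.
    The pseudocount is an illustrative assumption, not the study's method.
    """
    hits = []
    for gene, r_value in resistant.items():
        s_value = sensitive.get(gene, 0.0)
        fold = (r_value + pseudocount) / (s_value + pseudocount)
        if fold >= min_fold:
            hits.append((gene, fold))
    # Report the strongest changes first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical values for illustration only (not data from the study).
resistant = {"gene_a": 99.0, "gene_b": 10.0, "gene_c": 5.0}
sensitive = {"gene_a": 9.0, "gene_b": 9.0, "gene_c": 1.0}

for gene, fold in upregulated_genes(resistant, sensitive):
    print(gene, round(fold, 2))
```

A real analysis would normally add replicate-aware statistics on top of a fold-change cutoff; the sketch shows only the threshold itself.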

Epstein-Barr virus has been reported to anchor to the host genome through interaction of the encoded Epstein-Barr nuclear antigen 1 (EBNA1) with the cellular protein Eukaryotic rRNA processing protein (EBP2), forming a protein-protein interaction, or by directly associating with chromatin via an AT-hook motif that binds to A/T-rich sequences on metaphase chromosomes (Wu et al., 2000; Sears et al., 2004). In Rhadinoviruses, a role has been suggested for the latency-associated [...] protein that interacts with the helicase protein (E1 helicase) for replication (Masterson et al., 1998) and facilitates genome association through interaction within the N-terminal transactivation domain (Skiadopoulos and McBride, 1998). Computational analysis of the eccDNA replicon revealed several genes that may function in the tethering mechanism. AP_R.00g000496 contains 2 core AT-hook motifs (GRP) and also encodes a zinc finger SWIM domain, which is recognized to bind DNA, proteins, and/or lipid structures (Supplemental Table 1). The optimal binding sequences of the core AT-hook are AAAT and AATT, which, when bound together, form a concave DNA conformation for tight binding (Reeves, 2000). In the eccDNA replicon, there are 143 (AAAT)2 and 186 (AATT)2 motifs (Figure 2A).

[...] water-hemp pseudochromosomes, respectively. The spatial topology of these ortholog matches was distributed across most of the pseudochromosomes in the assemblies (Figure 4 and Supplemental Table 4). Eleven of the RBHs between the eccDNA replicon and the two Amaranthus genomes were located in a collinear fashion on the same scaffolds in the same order [...] Table 4). Moreover, this cluster of genes is separated in the middle by a gene that resides on a completely different pseudochromosome (scaffold 3) and is annotated as a NAC (No [...] Table 4).
These results suggest that the eccDNA replicon coding sequences likely originate from multiple genomic regions in Amaranthus palmeri, which supports the hypothesis of intra-genomic recombination during the formation of the eccDNA replicon.

The eccDNA replicon is a massive eccDNA vehicle for gene amplification, trait expression, maintenance and transfer of genomic information. The genomic origin is unknown, but it is likely a result of mobile element activation and extensive genome shuffling, possibly involving higher order chromatin interactions influenced by xenobiotic pressures. It has various functional modalities for integration, stability and maintenance to ensure genomic persistence. Furthermore, because of the functional implications of the putative genes in the eccDNA replicon, the presence of this unit could cause a general increase in abiotic stress resilience or, perhaps, an increased disposition to adapt. This vehicle may afford new directions in breeding and biotechnology through a deeper understanding of its origin and function. Further investigations will be required to ascertain the essential components of eccDNA replicon proliferation and its compatibility in other species.

[...]

BAC library construction, partial tile path isolation, sequencing and analysis were described previously (Molin et al., 2017). Two additional BAC clones, 08H14 and 01G15, were identified by chromosome walking, using hybridization with overgo probes designed from unique distal sequence on the terminal ends of the EPSPS cassette (clones 03A06 and 13C09). These two BAC clones were harvested and sequenced using Pacific Biosciences RSII sequencing to a depth greater than 100X, as described in Molin et al. (2017). Raw single molecule sequence was self-corrected using the CANU Celera assembler (Koren et al., 2017) with corOutCoverage=1000 to increase the output of corrected sequences.
BAC end sequences were determined using standard Sanger sequencing methods, aligned to the reference assemblies with Phrap, and opened in Consed (Gordon et al., 1998) for editing. BAC overlaps were identified using CrossMatch (Gordon et al., 1998) [...] were prepared using the Circos plotting toolset (Krzywinski et al., 2009).
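The AT-hook binding motif tallies reported for the eccDNA replicon, (AAAT)2 and (AATT)2, could be reproduced with a short script along the following lines. This is a minimal sketch rather than the procedure used in the study: it assumes the motifs are searched as the exact strings AAATAAAT and AATTAATT on one strand, that overlapping matches are each counted, and that matches spanning the junction of the circular sequence are included. The function name and toy sequences are hypothetical.

```python
def count_circular(sequence, motif):
    """Count (possibly overlapping) occurrences of `motif` in a circular sequence."""
    seq = sequence.upper()
    motif = motif.upper()
    if not seq or not motif or len(motif) > len(seq):
        return 0
    # Append the first len(motif) - 1 bases so that matches spanning the
    # junction of the circle are found, without counting any start twice.
    extended = seq + seq[: len(motif) - 1]
    count = 0
    start = extended.find(motif)
    while start != -1:
        count += 1
        start = extended.find(motif, start + 1)
    return count


# Toy sequences for illustration only (not the eccDNA replicon sequence).
AT_HOOK_MOTIFS = {"(AAAT)2": "AAATAAAT", "(AATT)2": "AATTAATT"}
toy = "GGAAATAAATAAATGGAATTAATTGG"
for label, motif in AT_HOOK_MOTIFS.items():
    print(label, count_circular(toy, motif))
```

Whether overlapping or reverse-strand matches should count is a design choice that changes the totals, so any comparison with the published numbers would need the same convention.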

RNAseq
Plants for RNAseq were grown to the two-true-leaf stage under previously described greenhouse conditions, at which time they were transplanted into 8 cm × 8 cm × 7 cm pots containing the same potting mix. Thereafter, plants were watered as needed and fertilized once, two weeks after transplanting, with a water-soluble fertilizer (Miracle-Gro, Scotts Miracle-Gro Products, Inc., Marysville, OH). When seedlings reached the six-leaf stage, they were sprayed with water, water plus surfactant, or water plus surfactant plus glyphosate using the spray chamber. The surfactant was 0.5% v/v Tween 20, and glyphosate was applied at 0.42 kg ai ha⁻¹ after neutralization with 0.1 N KOH solution. Leaves from the third and fourth nodes were harvested for RNA extraction at 0, 4 and 24 hours after treatment. Plants were held for two weeks post leaf harvest to verify survival following glyphosate treatment.

Total RNA was harvested at 4 and 24 hours in biological triplicates using the RNeasy plant mini kit (Qiagen). Purified RNA was verified for intactness on a Bioanalyzer 2100 (Agilent), subjected to stranded mRNA-seq using standard TruSeq procedures, and sequenced to a target depth of at least 15M reads per sample. Raw sequence data was preprocessed for adapter and [...] (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)).
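The comparative analysis refers to RBHs between eccDNA replicon genes and the two Amaranthus genomes. Assuming RBH denotes reciprocal best hits, the idea can be sketched in Python as follows. The input format (a mapping from query and subject IDs to an alignment score), the function names, and the gene IDs are hypothetical; this is not the MCscan pipeline linked above, only an illustration of the reciprocity test.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) pairs to an alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical IDs and scores for illustration only.
forward = {("ecc1", "chrA_g1"): 200, ("ecc1", "chrA_g2"): 150, ("ecc2", "chrA_g2"): 90}
reverse = {("chrA_g1", "ecc1"): 210, ("chrA_g2", "ecc1"): 160, ("chrA_g2", "ecc2"): 80}
print(reciprocal_best_hits(forward, reverse))
```

In this toy example, ecc2 is not reported because its best hit, chrA_g2, matches ecc1 better in the reverse direction.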

Accession Numbers
Sequence data from this article can be found in the EMBL/GenBank data libraries under BioProject ID PRJNA413471; Submission MT025716.

Author Contributions and Acknowledgements
Author contributions
W.M. and C.S. designed, conducted, analyzed, and interpreted the eccDNA replicon capture, sequencing, and analysis components of the study, which include BAC screening, isolation, sequencing, assembly, annotation, and data analysis. W.M. and C.S. also prepared the initial