Reverse transcriptases prime DNA synthesis

Abstract The discovery of reverse transcriptases (RTs) challenged the central dogma by establishing that genetic information can also flow from RNA to DNA. Although they act as DNA polymerases, RTs are distantly related to replicases that also possess de novo primase activity. Here we identify that CRISPR associated RTs (CARTs) directly prime DNA synthesis on both RNA and DNA. We demonstrate that RT-dependent priming is utilized by some CRISPR-Cas complexes to synthesise new spacers and integrate these into CRISPR arrays. Expanding our analyses, we show that primer synthesis activity is conserved in representatives of other major RT classes, including group II intron RT, telomerase and retroviruses. Together, these findings establish a conserved innate ability of RTs to catalyse de novo DNA primer synthesis, independently of accessory domains or alternative priming mechanisms, which likely plays important roles in a wide variety of biological pathways.


INTRODUCTION
The replication of genetic information is essential for the propagation of all organisms, and this role is performed by replicative polymerases, which efficiently duplicate strands of nucleic acids and give rise to two copies of the genome. Polymerase activities are also required for other genome maintenance pathways, including DNA repair and damage tolerance. Replicative polymerases can be divided into two broad classes, depending on their template preference, namely DNA-dependent and RNA-dependent polymerases. While cellular organisms possess genomes composed of DNA, many viruses encode their genetic information within RNA. Over half a century ago, Baltimore and Temin independently discovered that eukaryotic RNA retroviruses encode RNA-dependent DNA polymerases called reverse transcriptases (RTs) involved in viral genome replication (1, 2).
Reverse transcriptases represent a key evolutionary stage in nucleic acid replication as organisms transitioned away from RNA- to DNA-centric genomes (3, 4). RTs belong to a major lineage of replicases, which includes viral RNA-dependent RNA polymerases (RdRPs) and Primase-Polymerases (Prim-Pols). In common with other polymerases, the catalytic 'palm' subdomains evolved from an ancestral RNA Recognition Motif-like (RRM-like) fold, containing advantageously placed active site residues that nucleate metal-dependent catalysis (5-13). While most RdRPs and Prim-Pols retained the ability to both prime and extend on RNA and/or DNA templates, presumably inherited from ancestral progenitors, RTs appeared to be an exception, having apparently lost this innate primase activity.
It has been recognised that RTs also play diverse roles in cells, including retrotransposition, telomere maintenance and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) adaptation. The general mechanisms of DNA synthesis by RTs have been extensively studied and, although a number of unorthodox RT-specific priming mechanisms have been reported (14), including endonuclease-produced 3' termini and tRNAs that act as surrogate primers, the putative priming mechanism(s) of other RTs remain to be elucidated. Given the near universal usage of de novo primer synthesis to initiate DNA replication, it is conspicuous that an equivalent mechanism has not been identified within RT-dependent replicative complexes.
A highly divergent group of RTs, predominantly related to group II intron RTs (GIIiRT), are genetically associated with a subset of Type III CRISPR-Cas operons and reportedly play roles in spacer acquisition from RNA (15-20). In this study, we investigated the activities of CRISPR-associated reverse transcriptases (CARTs) and discovered a previously unidentified DNA priming mechanism. We show how this primer synthesis activity is utilised to facilitate the integration of spacers, originating from RNA, into CRISPR arrays. We also expand our study to other major phylogenetic groups of RTs and establish the conservation of de novo primer synthesis across the RT superfamily.

Bioinformatics
Sequences of RTs previously identified to associate with CRISPR were extracted from their respective original publications (18, 21, 22). Sequences were aligned using MUSCLE (23) and a phylogenetic tree was built using FastTree 2 (24). The RT domain type was inferred using the myRT tool (25).

Cloning, expression and purification of recombinant proteins
A table of plasmids used in this study is provided in Supplementary Table S1. The sequences of all cloned genes were codon optimised for expression in Escherichia coli BL21 using IDT's Codon Optimization Tool. A table of recombinant protein expression and purification conditions is provided in Supplementary Table S2. In general, the BL21 Star (DE3) E. coli strain (Invitrogen) was used for protein expression in TB medium (Molecular Dimensions, MD12-104-01), with induction by 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), either as overnight expression at 16 °C or 3-h expression at 37 °C. The cells were disrupted by sonication and centrifuged at 50 000 g for 30 min; proteins were purified by the methods indicated in Supplementary Table S2 and flash-frozen in liquid nitrogen as 50% glycerol stocks. Two µg of the purified proteins used in this study were run on SDS-PAGE and Coomassie stained (Supplementary Figure S1).

Polymerase assay
All standard polymerase reactions were assayed in a 20 µl volume, containing 10 mM buffer, 10 mM divalent metal cation, 0.2 mM dNTP mix, 50 nM FAM-labelled primer strand annealed with a template strand (oPK404-407, Supplementary Table S3) and 50 nM protein. Detailed reaction conditions for each assay are given in Supplementary Table S4. After the indicated time, the reactions were stopped with an equal volume of CTAB buffer (200 µM cetyltrimethylammonium bromide (CTAB), 30 mM (NH4)2SO4, 25 mM ethylenediaminetetraacetic acid (EDTA)) and centrifuged at 16 000 g to precipitate nucleic acids. The supernatant was aspirated and the pellet dissolved in sample loading buffer (92.5% formamide, 25 mM EDTA, 0.5% Ficoll 400), boiled at 95 °C for 3 min, loaded onto a 10% TBE-urea PAGE gel (10% acrylamide/bisacrylamide, 19:1, 8 M urea and 1× Tris/borate/EDTA (TBE)) and resolved for 90 min at 25 W. Gels were imaged on a FLA 5100 scanner (FujiFilm) and the final images were adjusted in FIJI (26) using the linear range.

Fluorescence polarization assay
Equilibrium binding reactions were assayed in a 50 µl volume, containing 10 mM Tris pH 7.5, 10 mM MnCl2, 0.05% Tween-20, 50 nM FAM-labelled primer strand annealed with a template strand (oPK404-407, Supplementary Table S3) and increasing protein concentrations, up to the value at which the protein started to influence parallel fluorescence. Fluorescence polarization was measured on a CLARIOstar (BMG Labtech) with filter settings (482-16/LP 504/530-40) and the results were normalized by subtracting background fluorescence polarization. Data were fitted with the Hill-Langmuir equation and plotted using Python.
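The fitting routine is not specified in the text, so the following is only a minimal sketch of one way to estimate Hill-Langmuir parameters in Python. It assumes the polarization signal has already been rescaled to a fraction bound between 0 and 1, and it uses a linearised Hill plot with ordinary least squares rather than a non-linear fit. The function names and the synthetic data are hypothetical.

```python
import math


def hill_langmuir(conc, kd, n):
    """Fraction of probe bound at protein concentration `conc`."""
    return conc**n / (kd**n + conc**n)


def fit_hill_plot(concs, fractions):
    """Estimate (kd, n) by least squares on the linearised Hill plot.

    log(theta / (1 - theta)) = n * log(conc) - n * log(kd)
    Points with theta outside (0, 1) are skipped (the logit is undefined).
    """
    xs, ys = [], []
    for conc, theta in zip(concs, fractions):
        if conc > 0 and 0 < theta < 1:
            xs.append(math.log(conc))
            ys.append(math.log(theta / (1 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n * mean_x    # intercept = -n * log(kd)
    kd = math.exp(-intercept / n)
    return kd, n


# Synthetic, noise-free demo data (assumed values, in nM)
concs = [10, 30, 100, 300, 1000, 3000]
fractions = [hill_langmuir(c, 250.0, 1.5) for c in concs]
kd_est, n_est = fit_hill_plot(concs, fractions)
```

The linearised fit weights points near saturation heavily, so with noisy data a non-linear least-squares fit of the untransformed curve is usually the safer choice.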

Gel-based primase assay
All standard gel-based primase reactions were assayed in a 20 µl volume, containing 10 mM buffer, 10 mM divalent metal cation, 1:10 ratio of labelled and unlabelled nucleotide (Supplementary Table S3 (26) using the linear range. For quantification, the signal in each lane, excluding the signal of the fluorescently labelled mononucleotide, was measured using FIJI (26), and the background signal of the no-protein control was subtracted from each protein sample lane from the same gel. The signal of the priming products of HsPrimPol was used as a standard (100%) for other samples from the same gel. Python was used to calculate the mean and SD and to create the bar graph.
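The quantification described above (background subtraction, then normalisation to the standard lane on the same gel) can be sketched as follows. This is an illustration only: the function name, the dictionary keys and the lane intensities are made up, and the sample standard deviation is assumed for "SD".

```python
from statistics import mean, stdev


def relative_activity(sample_signal, no_protein_signal, standard_signal):
    """Background-subtract a lane and express it as % of the standard lane."""
    corrected_sample = sample_signal - no_protein_signal
    corrected_standard = standard_signal - no_protein_signal
    if corrected_standard <= 0:
        raise ValueError("standard lane must exceed the no-protein background")
    return 100.0 * corrected_sample / corrected_standard


# One dictionary per gel (hypothetical lane intensities, arbitrary units)
gels = [
    {"no_protein": 100.0, "standard": 1100.0, "sample": 600.0},
    {"no_protein": 200.0, "standard": 1200.0, "sample": 800.0},
    {"no_protein": 150.0, "standard": 1150.0, "sample": 850.0},
]

# Each replicate is normalised against the controls from its own gel
percentages = [
    relative_activity(g["sample"], g["no_protein"], g["standard"]) for g in gels
]
avg = mean(percentages)
sd = stdev(percentages)  # sample SD (n - 1 denominator)
```

Normalising within each gel before averaging keeps gel-to-gel differences in exposure out of the reported mean and SD.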

Intercalating fluorescent dye-based primase assay
The basic principle of this assay exploits the use of 3' end phosphorylated templates and the GelRed (Biotium) intercalating fluorescent dye to detect dsDNA formation. All standard reactions were assayed in a 50 µl volume, containing the reaction components indicated in Supplementary Table S6. Fluorescence intensity was measured on a CLARIOstar (BMG Labtech) with filter settings (545-20/600-20) over the course of 15-30 min. The time course was plotted in Python as an average of three independent reactions with standard error highlighted (CI 95). The first-order reaction rates were compared after a linear regression fit of the linear portion of the curves and statistical analysis using Python. The data for the Michaelis constant (Km) of dGTP were fitted as slopes of the linear regression fit of the reaction rates with the Hill-Langmuir equation and plotted using Python. Statistical analysis was performed using the statannotations Python package with a two-tailed independent t-test.
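As an illustration of extracting a reaction rate from the linear portion of a fluorescence time course, the sketch below fits a straight line by ordinary least squares within a user-chosen time window. How the linear window was selected is not stated in the text, so the window bounds here are an assumption, as are the function names and the synthetic trace.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    if len(xs) < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_rate(times, signals, t_start, t_end):
    """Slope of the trace restricted to the window [t_start, t_end]."""
    window = [(t, s) for t, s in zip(times, signals) if t_start <= t <= t_end]
    slope, _ = linear_fit([t for t, _ in window], [s for _, s in window])
    return slope


# Synthetic trace: linear rise for 5 min, then a plateau (arbitrary units)
times = list(range(11))
signals = [5 + 2 * t if t <= 5 else 15 for t in times]
rate = initial_rate(times, signals, t_start=0, t_end=5)
rate_whole_trace = initial_rate(times, signals, t_start=0, t_end=10)
```

Fitting the whole trace, plateau included, underestimates the rate, which is why the fit is restricted to the linear portion.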

Pyrophosphate luminescence-based primase assay
The basic principle of this coupled assay combines the release of pyrophosphate (PPi) upon phosphodiester-bond formation with the luminescent detection of PPi by the PPiLight kit (Lonza). All standard reactions were assayed at room temperature in a 50 µl volume, containing 10 mM Tris pH 7.5, 1 mM MnCl2, 1 µM oMZ13, 10 nM protein and varying dGTP concentrations. The data for the Michaelis-Menten constant of dGTP were fitted as slopes of the linear regression fit of the reaction rates using the Hill-Langmuir equation and plotted using Python.
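For reference, a common way to write the Hill-Langmuir relationship between reaction rate and dGTP concentration is given below. The symbols (v for the measured rate, V_max for the maximal rate, K_0.5 for the half-saturating dGTP concentration and n for the Hill coefficient) are notation assumed here and do not come from the source.

```latex
v = \frac{V_{\max}\,[\mathrm{dGTP}]^{n}}{K_{0.5}^{\,n} + [\mathrm{dGTP}]^{n}}
```

With n = 1 this reduces to the Michaelis-Menten equation, with K_0.5 equal to K_m; n > 1 indicates positive co-operativity in substrate binding.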

DNA strand-displacement assay
Strand displacement assays involving gapped DNA substrates were performed as previously reported (22, 27). In brief, substrates were formed by annealing oligonucleotides oNB1 and oNB2 (22, 27) with oNB8-11 (Supplementary Table S3). 50 nM substrate was mixed with 100 µM dNTPs and 50 nM CaCART or 100 nM MmCART. The reactions were incubated at 37 °C for 30 min, processed as per the polymerase assay samples, loaded onto a 10% TBE-urea PAGE gel and resolved for 90 min at 25 W. Gels were imaged on a FLA 5100 scanner (FujiFilm) and the images were adjusted in FIJI (26) using the linear range.

In vitro primed prespacer integration assay
10 µM MmCas6-CART-Cas1 and 20 µM MmCas2 were premixed in protein dilution buffer, and 20 µl of the premixed protein was added to the 200 µl reaction (final volume) containing 50 mM Bis-Tris propane pH 7, 10 mM MnCl2, 10 mM TCEP, 10 µM γ-phosphate FAM-labelled GTP (FAM-γGTP), 100 µM dTTP, dGTP and dCTP, and 1 µM short ssRNA template (oKZ535). The reaction was incubated at 37 °C for 15 min before addition of CRISPR array B substrate (25 nM final). The reaction was incubated at 37 °C for 60 min before stopping by addition of 4 µl of Proteinase K (0.8 U/µl, NEB) and incubated for another 30 min at 37 °C. 25 µl of the reaction (INPUT fraction) was mixed with 25 µl of CTAB buffer and spun for 10 min at 16 000 g at room temperature. The pellet was resuspended in 20 µl of sample loading buffer (92.5% formamide, 25 mM EDTA, 0.5% Ficoll 400) before loading on urea-PAGE. 8.75 µl of 200 mM PMSF was added to the remaining 175 µl of the reaction to inactivate Proteinase K. 2 µl of the reaction was mixed with 8 µl of H2O and boiled for 10 min, and 1 µl was used in a 10 µl PCR reaction. 70 µl of magnetic streptavidin beads (Merck, Roche, 11641786001) and 200 µl of protein dilution buffer containing 25 mM EDTA were added to the remaining reaction, followed by incubation for 15 min at room temperature. The magnetic beads were washed 3 times with 500 µl of protein dilution buffer, resuspended in 100 µl of water and 200 µl of DNA Cleanup Binding Buffer (Monarch® PCR & DNA Cleanup Kit, NEB) and incubated for 5 min at room temperature before addition of 600 µl of 100% EtOH. The sample was loaded onto Monarch® DNA cleanup columns following the NEB oligonucleotide cleanup protocol. The samples were eluted with 20 µl of sample loading buffer (BOUND fraction). All samples (input and bound fractions) were boiled for 3 min and resolved on 10% urea-PAGE (19:1 acrylamide/bisacrylamide, 8 M urea) in 1× TBE for 75 min at 25 W. Gels were imaged on a FLA 5100 scanner (FujiFilm) and images adjusted in FIJI (26) in the linear range.
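The dilution steps in this protocol can be checked with a small worked calculation. This is a sketch under stated assumptions: the protein concentrations are taken to be micromolar and to refer to the premix, volumes are taken to be microlitres, and the PMSF final volume is taken as the sum of the two volumes. The helper name is hypothetical.

```python
def final_concentration(stock_conc, added_volume, final_volume):
    """Solve C1 * V1 = C2 * V2 for C2; the result has the units of stock_conc."""
    if final_volume <= 0 or not 0 <= added_volume <= final_volume:
        raise ValueError("volumes must satisfy 0 <= added_volume <= final_volume")
    return stock_conc * added_volume / final_volume


# 20 ul of premix (assumed 10 uM Cas6-CART-Cas1, 20 uM Cas2) in a 200 ul reaction
cas6_cart_cas1_uM = final_concentration(10.0, 20.0, 200.0)
cas2_uM = final_concentration(20.0, 20.0, 200.0)

# 8.75 ul of 200 mM PMSF added to the remaining 175 ul of reaction
pmsf_mM = final_concentration(200.0, 8.75, 175.0 + 8.75)
```

Under these assumptions the reaction contains 1 µM MmCas6-CART-Cas1 and 2 µM MmCas2, and PMSF ends up at roughly 9.5 mM.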

PCR, cloning and sequencing of in vitro primed prespacer integration assay products
Phusion polymerase (NEB) in combination with GC buffer was used to amplify Cas1-Cas2 integrated products in the CRISPR array using primers oKZ536 and oKZ537, which include adaptors for NEBuilder HiFi DNA assembly cloning into HindIII-digested pUC19. Primer oKZ537 binds to the repeat-spacer end of the CRISPR array and oKZ537 binds only if the integrated DNA strand is a priming product of RT using RNA template oKZ535. PCR conditions: Ta, 61 °C; extension time, 10 s; 20 cycles. The PCR product was resolved on a 2% agarose gel in 1× TAE containing ethidium bromide. Gel images were adjusted in FIJI (26) using the linear range.
7128 Nucleic Acids Research, 2023, Vol. 51, No. 14
After gel extraction, the PCR product was cloned into HindIII-digested pUC19 using the NEBuilder HiFi DNA assembly cloning kit (NEB). The cloning products were transformed into E. coli, and plasmids from 22 single colonies were isolated and sent for Sanger sequencing. The sequencing results were analysed using QIAGEN CLC Main Workbench. For the alignment, the sequence of pUC19 was omitted. For simplicity, 5 sequenced samples were omitted in Figure 4E; however, all samples are shown in Supplementary Figure S4.

Preparation of DNA/RNA substrates
Sequences and modifications of the synthetic oligonucleotides used are shown in Supplementary Table S2. Overhang and gapped substrates for polymerase and strand-displacement assays were prepared by mixing equimolar amounts of the corresponding oligonucleotides in buffer containing 10 mM Tris pH 7.5 and 50 mM NaCl, heating at 95 °C for 3 min and cooling slowly to room temperature.
CRISPR array A was prepared by PCR amplification in 200 µl using pKZ223 plasmid as a template, primers oKZ500 and oKZ501, Phusion polymerase and GC buffer (Ta, 58 °C; extension time, 15 s; 40 cycles). The products were resolved on 10% native PAGE (37.5:1 acrylamide/bis-acrylamide) in 1× TBE for 1 h at 120 V; the band containing labelled PCR product was cut out and eluted from the crushed gel into water by diffusion for 1 h at room temperature. The DNA was precipitated with ethanol and resuspended in water.
CRISPR array B was prepared by PCR amplification in 400 µl using pKZ223 plasmid as a template, primers oKZ515 and oKZ503, Phusion polymerase and GC buffer (Ta, 58 °C; extension time, 15 s; 40 cycles). The PCR product was precipitated with ethanol, resuspended in 25 µl of H2O and then loaded on DyeEx 2.0 (Qiagen) to remove free nucleotides. Eluted DNA was 3' labelled with Cy5-ddCTP (Jena Bioscience, NU-850-CY5) using Terminal Deoxynucleotidyl Transferase (TdT) (Thermo Fisher Scientific, EP0161) in an 80 µl reaction containing DNA, 0.1 mM Cy5-ddCTP, 15 µl of 5× reaction buffer and 140 U of TdT, which was incubated for 2 h at 37 °C. The products were resolved on 10% native PAGE (37.5:1 acrylamide/bis-acrylamide) in 1× TBE for 75 min at 120 V; the band containing labelled DNA fragment was cut out and eluted from the crushed gel into water by diffusion for 1 h at room temperature. The DNA was ethanol precipitated and resuspended in 200 µl of water.

Crystallisation, data collection and structure determination
Crystal screening experiments were set up with 500 mM CaCART-CAPP RT domain (aa 1-204) and matrix screens (Molecular Dimensions, Hampton Research) using the sitting-drop vapour-diffusion method, with equal volumes of protein solution and reservoir buffer. The crystal was grown in 20% PEG 3350 and 0.2 M sodium tartrate dibasic dihydrate, and cryoprotected in the mother liquor with 25% PEG 400. Diffraction data were collected at beamline I24 of Diamond Light Source (Didcot, UK).

DNA polymerase activities of CaCART-CAPP
A subset of CARTs are fused with CRISPR-associated Prim-Pols (CAPPs). CAPPs possess both primer synthesis and extension activities, implicated in spacer acquisition in Type III CRISPR-Cas systems (22).

DNA primase activities of CaCART-CAPP
Given that the PP domains of CAPPs also possess de novo primer synthesis activities, implicated in CRISPR-Cas adaptation processes (22), we next analysed such activities for CaCART-CAPP (FL) and observed that it also exhibited robust primase activity. Although the RT-PP fragment was primase proficient, the PP domain alone was unexpectedly not, correlating with its lack of polymerase activity. In contrast, the RT domain alone possessed the striking capacity to perform DNA primer synthesis, at a level comparable to that of FL and RT-PP. We previously showed that PP domains of CAPPs require a ribonucleotide, with a strong preference for purines, to initiate primer synthesis (22). To determine if this is also the case for CARTs, we characterised the substrate requirements for the primase activity of the RT domain of CaCART-CAPP. The initiation of primer synthesis by the RT domain is not directly dependent on NTPs (Figure 2F), but NTPs can be incorporated in the first position and are always followed by dNTPs, enabling us to specifically detect primer synthesis products using γ-phosphate FAM-labelled GTP in gel-based assays (Figure 2G). Analysing different DNA template sequences for their ability to promote primer synthesis revealed a clear preference for the initiation of primer synthesis at cytosines, incorporating guanosines to begin the synthesis of a new DNA strand (Figure 2H). A minimum of two cytosines in the template sequence was sufficient to act as a potent initiation site for primer synthesis (Figure 2I). The affinity for dGTP on a homopolymeric DNA template (C20) was approximated by measuring its Km with two independent methods: first, by detecting dsDNA formation with a DNA intercalating dye (Figure 2J); and second, by detecting the release of pyrophosphate upon phosphodiester bond formation (Figure 2K), which also identified a strong co-operativity of dGTP (Hill coefficient ≥ 2), as expected for dinucleotide formation. In conclusion, the RT domain of CaCART-CAPP is a bona fide DNA primase and polymerase that is active on both DNA and RNA templates, initiates de novo synthesis with NTPs or dNTPs, and has a strong preference for priming on templates containing a CC sequence.
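The template preference described here (initiation at runs of at least two cytosines) suggests a simple way to scan a template for candidate initiation sites. The helper below is a hypothetical illustration, not part of the study: it only locates cytosine runs and does not model sequence context, secondary structure or enzyme kinetics.

```python
def find_initiation_sites(template, min_run=2, base="C"):
    """Return (start, length) for every maximal run of `base` that is at
    least `min_run` long. Positions are 0-based from the start of the string.
    """
    sites = []
    run_start = None
    # A sentinel character closes any run that reaches the end of the template
    for i, nt in enumerate(template.upper() + "#"):
        if nt == base:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_run:
                sites.append((run_start, i - run_start))
            run_start = None
    return sites


template = "TTTCCTTTCTTTCCCTT"  # made-up sequence for demonstration
cc_sites = find_initiation_sites(template)               # runs of >= 2 C
ccc_sites = find_initiation_sites(template, min_run=3)   # runs of >= 3 C
```

The `min_run` parameter lets the same scan be reused for enzymes with a stricter requirement.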

Conservation of catalytic activities in CART proteins
Known Thus, we hypothesize the inactive PP domain is likely a common feature of these multidomain proteins, and the RT domain has replaced its activities. Together, these results establish a prototypical activity profile of CARTs, which utilize a broad range of substrates for the initiation of de novo DNA synthesis, and its extension, to potentially facilitate their putative role(s) in CRISPR adaptation.

Primase activities of CART facilitate integration into CRISPR arrays
To determine the biological relevance of the primase activities of the RT domains, we investigated the adaptation stage of type III CRISPR complexes involving CARTs, as a functional model for understanding their biological role(s). CRISPR-Cas operons containing CARTs can integrate new spacers originating from RNA sources into CRISPR arrays, and mutation of the CART domain active site abolishes this activity in vivo (15-20). To explain these in vivo observations at the molecular level and understand the specific role(s) of CART primase activity in the spacer acquisition step of naïve CRISPR adaptation, we conducted in vitro spacer integration assays with a purified Marinomonas mediterranea (Mm) integrase complex composed of MmCas6-CART-Cas1 and MmCas2. To optimize reaction conditions for these assays, we first analysed the enzymatic properties of MmCas6-CART-Cas1. This protein was capable of DNA extension activity with magnesium and manganese (Supplementary Figure S2B). It was proficient at primer extension in the presence of dNTPs on every nucleic acid substrate combination (Supplementary Figure S2C) but exhibited very limited strand-displacement activity (Supplementary Figure S2D). The primer synthesis activity was not dependent on the presence of NTPs (Supplementary Figure S2E) and was only supported by manganese (Supplementary Figure S2F). The initiation of de novo synthesis was most efficient on templates containing a CC sequence (Supplementary Figure S2G). The RT domain catalytic mutant (D532N, D533N) was inactive. Although RT-dependent priming was most efficient on CC sequences in vitro, other factors may influence the sequence preference of the Mm RT domain in vivo, e.g. cellular concentrations of nucleotides, sequence context and secondary structures, which could significantly change or abolish the RT's bias for specific RNA template sequences. Therefore, it is not surprising that no preference for protospacer sequence, including 15 bp of flanking sequence on each protospacer, was observed (16) (note: spacers in the CRISPR array are usually derived from fragments of viral or plasmid genetic information termed protospacers). However, we cannot exclude the possibility that the synthesized prespacers, derived from protospacers, are significantly processed before their integration into the CRISPR array and that this information has therefore been lost in the sequencing analyses. Nonetheless, the Mm RT domain's clear preference for CC sequences in vitro was taken advantage of when designing RNA templates for the modified prespacer integration assays, described below. Next, we utilized a 5' FAM-labelled Mm CRISPR array and a variety of 5' Cy5/Cy3-labelled DNA or RNA substrates (Supplementary Figure S3A) to visualize products of prespacer integration assays. Under conditions with manganese and without dNTPs, we observed integration of ssDNA, dsDNA and the DNA strand of an RNA-DNA heteroduplex into the Mm CRISPR array by the MmCas6-CART-Cas1-Cas2 complex. However, RNA integration under the same conditions, either stand-alone or in a heteroduplex with DNA, was very inefficient (Supplementary Figure S3B), suggesting that DNA synthesis is requisite for integration of RNA-derived prespacers (16). In assay conditions with magnesium and with or without dNTPs, integration was observed only for ssDNA (Supplementary Figure S3C), and in the presence of dNTPs a small fraction of ssDNA and ssRNA prespacers was extended.
However, both ssDNA and ssRNA were integrated in the presence of manganese and dNTPs (16) (Supplementary Figure S3D, S3E) and a significant fraction of both prespacers was extended. The RT domain catalytic mutant (D532N, D533N; RT M) did not extend these protospacers, indicating that the RT domain was responsible for the nonspecific extension of ssDNA and ssRNA prespacers. This enabled the integration of ssRNA prespacers, extended by dNTPs, into the Mm CRISPR array (Supplementary Figure S3E) and showed that the MmCas6-CART-Cas1-MmCas2 complex requires deoxynucleosides on the 3' end of the prespacer for integration. It was previously suggested that the RT domain extends the 3' end of the 'nicked' CRISPR array DNA strand and copies the repeat and integrated ssRNA (16), which could be attributed to strand displacement synthesis. However, we did not observe extension of the 3' end of the 'nicked' CRISPR array DNA strands (Supplementary Figure S3C), possibly due to the limited strand displacement activity of MmCas6-CART-Cas1 (Supplementary Figure S2D), consistent with activities found in other Cas6-CART-Cas1 proteins (19). These findings indicate that MmCas6-CART-Cas1 is unlikely to be involved in strand-displacement synthesis after integration of ssRNA into CRISPR arrays, as was previously proposed (16).
However, the MmCas6-CART-Cas1-MmCas2 complex could potentially utilize its RT-dependent primase activity to synthesise a DNA prespacer, originating from ssRNA, and integrate it into the CRISPR array. To test this hypothesis, we used a modification of the prespacer integration assay (Figure 4A, B), in which the MmCas6-CART-Cas1-MmCas2 complex was preincubated with ssRNA template, nucleotides (dCTP, dGTP, dTTP and γ-phosphate FAM-labelled GTP (FAM-γGTP)) and manganese to allow de novo synthesis of 5' end FAM-labelled DNA. To prevent non-specific extension and integration, the ssRNA template was chain terminated with an inverted dTTP on the 3' end. The Mm CRISPR array, biotin-labelled on the 5' end of its leader sequence and chain terminated and Cy5-labelled on both 3' ends (Cy5-dideoxycytidine) (Figure 4A), was added to initiate the prespacer integration. After the integration, the biotinylated CRISPR array was bound to streptavidin beads to selectively purify de novo synthesized FAM-labelled DNA prespacers integrated into Mm CRISPR arrays, and products were separated on denaturing gels (Figure 4 No integration was observed with the Cas1 mutant (Figure 4C). Together, these results establish that the Cas1-dependent integration into the CRISPR array of de novo synthesized DNA strands copied from RNA templates is dependent on the RT primase activity.
These results were validated by PCR analyses to confirm that products from the CRISPR integration assays represent bona fide RNA-derived sequences (Figure 4D). Primers complementary to the 3' end of the RNA template and the 3' spacer end of the Mm CRISPR array were used to amplify the integration products (Figure 4D, top panel). Notably, only reactions containing ssRNA template, Mm CRISPR array and MmCas6-CART-Cas1 WT produced amplicons of the expected size (∼127 nt), confirming the specific integration of de novo synthesized DNA (Figure 4D, bottom panel). These amplicons were not observed with mutant proteins (RT M and Cas1 M) or when the CRISPR array was omitted. Sub-cloning of integration products after PCR amplification and sequencing of individual colonies confirmed that the consensus sequence of the spacers integrated into the leader-repeat junction of the Mm CRISPR array was complementary to the RNA template (Figure 4E, Supplementary Figure S4). The new spacers integrated into the Mm CRISPR array varied in length and sequence, which correlates with the variable length of the de novo synthesized products (Figure 4C). Together, these findings establish that the primase activity of the RT domain of MmCas6-CART-Cas1 is required to reverse transcribe ssRNA templates into new DNA strands, which are then efficiently integrated into the Mm CRISPR arrays by Cas1-Cas2.

Crystal structure of the RT domain of CaCART-CAPP
To better understand the architecture of CARTs, we elucidated the crystal structure of the RT domain of CaCART-CAPP at 1.63 Å (Figure 5A), crystallised in space group P3₂ with 3 monomers in the asymmetric unit (Table 1). The architecture of the RT catalytic core is composed of a 'palm' subdomain with an RNA Recognition Motif-like (RRM-like) fold, which contains catalytic acidic residues bound to a single metal ion (manganese coordinated by D73, I74 and D154), and a 'fingers' subdomain, adhering to the canonical 'hand' analogy for replicative polymerases. The N-terminal part of the 'fingers' subdomain is highly dynamic, reflected by the high B-factors in the modelled residues (average of 122.0 Å² for atoms in the first 20 residues, compared to 40.4 Å² for the rest of the monomer) (Figure 5B). The structure of the CaCART-CAPP RT domain was elucidated by molecular replacement using a bacterial Geobacillus stearothermophilus GsI-IIC intron RT (GsRT) from the GIIiRT protein family (31). Both structures are remarkably similar, with a root-mean-square deviation (RMSD) of 2.1 Å across 190 aligned residues (Figure 5C). The catalytic core of GIIiRTs contains three major extensions, which differentiate them from retroviral RTs, including an N-terminal extension (NTE), Motif 2 extension (RT2e) and Motif 3 extension (RT3a) (31). The CaCART-CAPP RT domain lacks an NTE, which likely contributes to the higher flexibility of the N-terminus. The alignment of selected members of the RT protein superfamily shows the conserved RT motifs (Figure 5D) among different phylogenetic branches (Figure 5E).
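A comparison of mean B-factors between the N-terminal residues and the rest of a chain, as quoted above, can be reproduced from a coordinate file with a short script. The sketch below is an assumption-laden illustration: it reads fixed-width PDB-format ATOM records, treats the "first 20 residues" as the 20 lowest residue numbers of one chain (so it assumes contiguous numbering), and uses synthetic records rather than the deposited structure. All function names are hypothetical.

```python
from statistics import mean


def make_atom_line(serial, res_seq, b_factor):
    """Build a minimal fixed-width ATOM record (demo data only)."""
    return "ATOM  %5d  CA  ALA A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, res_seq, 0.0, 0.0, 0.0, 1.0, b_factor)


def mean_b_factors(pdb_lines, chain="A", n_terminal=20):
    """Mean B-factor of atoms in the first `n_terminal` residues of a chain
    versus all remaining atoms of that chain."""
    atoms = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain:
            res_seq = int(line[22:26])     # columns 23-26: residue number
            b_factor = float(line[60:66])  # columns 61-66: temperature factor
            atoms.append((res_seq, b_factor))
    if not atoms:
        raise ValueError("no ATOM records found for chain " + chain)
    cutoff = min(r for r, _ in atoms) + n_terminal
    head = [b for r, b in atoms if r < cutoff]
    tail = [b for r, b in atoms if r >= cutoff]
    if not tail:
        raise ValueError("chain has no residues beyond the N-terminal window")
    return mean(head), mean(tail)


# Synthetic chain: 30 residues, flexible N-terminus (made-up B-factors)
demo = [make_atom_line(i, i, 120.0 if i <= 20 else 40.0) for i in range(1, 31)]
n_term_mean, rest_mean = mean_b_factors(demo)
```

For a real structure the same function can be applied per chain to the lines of the coordinate file, although an established structural biology library would handle alternate conformations and insertion codes more robustly.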

Conservation of DNA priming in other RT superfamily members
The close sequence and structural relationships of the core catalytic residues prompted us to investigate possible primase activities among other major phylogenetic branches of the RT superfamily. GIIiRTs comprise the largest RT family in prokaryotes and play major roles in the life cycle of mobile genetic elements (MGEs), from which they arise. We selected GsRT, a prototypical standalone GIIiRT that operates as part of a group II intron retrotransposon (37). Full-length GsRT preferentially extends DNA primers on DNA and RNA templates with dNTPs (Figure 6A, Supplementary Figure S5A), but extension with NTPs is less efficient (Supplementary Figure S5B). Similar to CART primase activities, GsRT displays robust dinucleotide synthesis with a preference for RNA templates (Figure 6B), which results in weaker DNA-dependent priming (Figure 6C) when compared to RNA-dependent priming (Figure 6D). The initiation of de novo DNA synthesis is not dependent on the presence of NTPs (Supplementary Figure S5C). GsRT can efficiently form different dinucleotides, showing a broader substrate specificity for the templated initiation of synthesis than observed for the CaCART-CAPP RT domain (Supplementary Figure S5D). The primase activity is supported by manganese and cobalt (Supplementary Figure S5E) at temperatures up to 50 °C (Supplementary Figure S5F). The RT domain catalytic mutant (D223N, D224N) did not possess any activity. In contrast to CARTs, GsRT exhibited a much stronger preference for RNA substrates. We hypothesise that, given that the life cycle of group II introns involves reverse transcription of an intron RNA during retrotransposition, DNA-dependent synthesis may not be required for its physiological functions.
Other mobile group II introns encode RTs that contain an additional endonuclease domain (EN), proposed to produce 3' ends that act as initiation primers for their extension activity. We purified the RT domain of LlLtrA (aa 1-472) from Lactococcus lactis (Ll) (38) lacking the EN domain and observed that, besides its primer extension activity (Figure 6E), it also possesses RNA-dependent primer synthesis activity (Figure 6F, G), which was not observable in the RT domain catalytic mutant (D308N, D309N). Such de novo DNA synthesis activity can provide an alternative, or even complementary, mode of priming that assists in the replication of MGEs.
The budding yeast (Saccharomyces cerevisiae) Ty3 LTR-retrotransposon (ScTy3) encodes an RT required for its replication, which is more divergent when compared to bacterial RTs. To determine if it exhibits catalytic activities similar to CARTs or GIIiRTs, we purified the RT domain of ScTy3 (aa 1-339) (ScTy3 RT) with a C-terminal MBP fusion.

We observed DNA-dependent DNA polymerase activity (Figure 6H), utilizing magnesium, manganese or cobalt for primer extension (Supplementary Figure S6A). We observed extension of DNA, but not RNA, primers (Supplementary Figure S6B). This is contrary to a previous report (39), which may be due to the absence of a nucleocapsid protein (NCp9) to facilitate extension of RNA primers (40, 41). Similar to GsRT, ScTy3 RT displays dinucleotide synthesis ( Retroviral RTs were next investigated, choosing to study the p66-p51 heterodimer from human immunodeficiency virus (HIV RT). Initially, we did not observe any significant primase activity in gel-based primase assays. However, it was later discovered that this resulted from an inability of HIV RT to incorporate fluorescently labelled nucleotides during primer synthesis. To circumvent this issue, we used radiolabelled dGTP in gel-based primase assays and observed robust de novo DNA primer synthesis by HIV RT (Figure 7A). The primase activity of HIV RT was unexpectedly stronger on homopolymeric ssDNA than on ssRNA (Supplementary Figure S7A). This led to further investigation of its substrate preferences using an intercalating fluorescent dye-based primase assay. HIV RT initiated de novo DNA synthesis only on ssDNA containing at least three consecutive cytosines (Figure 7B), which notably correlates with the three cytosines in the tRNA primer binding region of the viral RNA (42). The affinity for dGTP on a homopolymeric ssDNA template (C20T20) was approximated by measuring its Km, which was in the sub-millimolar range (Figure 7C). Further studies are required to establish whether de novo primer synthesis occurs during viral replication in vivo.
Next, we compared the primase activities of all RTs from this study with full-length human PrimPol protein (HsPrimPol), a member of the Prim-Pol superfamily with established primase activities (43, 44). The results from two different primase assays are shown here. First, a gel-based primase assay ( Figure S7C). Similarly, no primase activity of Klenow fragment was detected in the second assay (Figure 7F). All tested RTs showed levels of primase activity comparable to HsPrimPol in the gel-based assay (Figure 7D, E). We additionally show the primase activity of the Fusicatenibacter saccharivorans FsCART-Cas1 RT domain, to exemplify conservation of such activity in a protein domain fusion other than those shown previously. Note that LlLtrA RT and HIV RT were excluded from the comparison, as LlLtrA RT showed no detectable primase activity on ssDNA templates (Figure 6F) and HIV RT does not incorporate FAM-labelled nucleotides. In the intercalating fluorescent dye-based primase assay, most RTs showed primase activities either comparable to HsPrimPol or at least significantly higher than their catalytic mutants (Figure 7F, Supplementary Figure S7D). We attribute the lower observable activities of some RTs to the limitation of the second assay, which can only use ssDNA substrates, and to some RTs having much stronger activities on ssRNA, e.g. LlLtrA RT and ScTy3 RT. In summary, the primase activities of most RTs are comparable with a known exemplar primase, indicating that these observations are not artefactual, and thus