PAM recognition by miniature CRISPR-Cas12f nucleases triggers programmable double-stranded DNA target cleavage.

Abstract In recent years, CRISPR-associated (Cas) nucleases have revolutionized the genome editing field. Being guided by an RNA to cleave double-stranded (ds) DNA targets near a short sequence termed a protospacer adjacent motif (PAM), Cas9 and Cas12 offer unprecedented flexibility, however, more compact versions would simplify delivery and extend application. Here, we present a collection of 10 exceptionally compact (422–603 amino acids) CRISPR–Cas12f nucleases that recognize and cleave dsDNA in a PAM dependent manner. Categorized as class 2 type V-F, they originate from the previously identified Cas14 family and distantly related type V-U3 Cas proteins found in bacteria. Using biochemical methods, we demonstrate that a 5′ T- or C-rich PAM sequence triggers dsDNA target cleavage. Based on this discovery, we evaluated whether they can protect against invading dsDNA in Escherichia coli and find that some but not all can. Altogether, our findings show that miniature Cas12f nucleases can protect against invading dsDNA like much larger class 2 CRISPR effectors and have the potential to be harnessed as programmable nucleases for genome editing.


INTRODUCTION
Clustered regularly interspaced short palindromic repeat (CRISPR) associated (Cas) microbial defense systems protect their hosts against foreign nucleic acid invasion (1)(2)(3). Utilizing small guide RNAs (gRNAs) transcribed from a CRISPR locus, accessory Cas proteins are directed to si-lence invading foreign RNA and DNA (2,4,5). Based on the number and composition of proteins involved in nucleic acid interference, CRISPR-Cas systems are categorized into distinct classes 1-2 and types I-VI (2,6). Class 2 systems encode a single effector protein for interference and are further subdivided into types, II, V and VI (7). Cas9 (type II) and Cas12 (type V) proteins have been shown to cleave invading double-stranded (ds) DNA, single-stranded (ss) DNA and ssRNA (8)(9)(10)(11)(12)(13). To recognize and cleave a dsDNA target, both Cas9 and Cas12 require a short sequence, termed the protospacer adjacent motif (PAM), in the vicinity of a DNA sequence targeted by the gRNA (8,9,14). Over the past several years, these endonucleases have been adopted as robust genome editing and transcriptome manipulation tools (15)(16)(17)(18). Although both nucleases have been widely used, the size of Cas9 and Cas12 provides constraints on cellular delivery that may limit certain applications, including therapeutics (19,20).
Recently, small CRISPR-associated effector proteins (Cas12f) belonging to the type V-F subtype have been identified through the mining of sequence databases (6,7,13,21). The locus architecture for some of these systems includes genes (cas1, cas2 and cas4) that encode an adaptation module ( Figure 1). The presence of these in combination with functional assessments for a single nuclease were previously used to denote a subset of Cas12f proteins as Cas14 (21). More recent classifications have grouped this family into Cas12f1 (Cas14a and type V-U3), Cas12f2 (Cas14b) and Cas12f3 (Cas14c, type V-U2 and U4) (6). In this study, we use the updated nomenclature and also keep the Cas14 designation to bridge the different naming conventions.
The majority of Cas12f proteins (from V-U3 and Cas14 families) are nearly half the size of the smallest Cas9 or Cas12 nucleases ( Figure 1) (2,6,7,21 orthologs, the N-terminal half of Cas12f proteins differs significantly in length accounting for most of the size difference between the two groups ( Figure 1 and Supplementary Figure S1). Due to this and their similarity to transposaseassociated TnpB proteins, it is hypothesized that they are remnants or intermediates of type V CRISPR-Cas system evolution and are incapable of forming the protein architecture required for dsDNA target recognition and cleavage (7,21). To our knowledge, functional characterization of Cas12f proteins are limited to a single protein identified from an uncultured archaeon (Un1), initially named Cas14a1 and renamed here to Un1Cas12f1, which was shown to exclusively target and cleave ssDNA in a PAMindependent manner (21). Additionally, target binding and cleavage by Un1Cas12f1 triggered collateral nuclease activity that manifested as trans-acting non-specific ssDNA degradation (21), a feature that seems to be largely shared across the type V family, although Cas12g was reported to initially target ssRNA and then indiscriminately degrade both ssDNA and ssRNA (12,13).
Here, using biochemical assays, we report that miniature Cas12f effectors (from Cas14 and type V-U3 families), like most other Cas12 proteins, are able to cleave dsDNA targets if a 5 PAM sequence is present in the vicinity of the guide RNA target. For the Cas12f proteins from the Cas14 family, this primarily consists of 5 T-rich sequences while type V-U3 PAM recognition includes both C-and T-rich motifs. Additionally, we demonstrate that some type V-U3 bacterial nucleases (but none of the Cas14 proteins tested) function as bona fide CRISPR-Cas systems in E. coli cells to protect against invading dsDNA despite their small size. These findings provide the first experimental evidence that miniature Cas12f effectors are programmable dsDNA nucleases and lay the foundation for the adoption of these proteins as genome editing tools.

Engineering CRISPR-Cas12f systems to target a PAM library
CRISPR-Cas12f systems were modified to target the 7 bp randomized PAM library described previously (22). This was accomplished by replacing the native CRISPR array with three repeat:spacer:repeat units that encoded a spacer (33-39 nt depending on the average spacer length observed in the respective Cas12f system) capable of complementing to a sequence (anti-sense strand) immediately 3 of the region of PAM randomization. The engineered CRISPR-Cas12f systems were then synthesized (GenScript) and cloned into a modified pET-duet1 (MilliporeSigma) or pA-CYC184 (NEB) plasmid. For the CRISPR-Cas12f1 system initially named Cas14a1 (21) and renamed here as Un1Cas12f1, the pLBH531 MBP-Cas14a1 plasmid (gift from Jennifer Doudna, Addgene plasmid #112500) was used. The sequences of the Cas12f proteins are listed in Supplementary Data S1 file and links to the plasmid sequences encoding the Cas12f-CRISPR systems engineered to target the PAM library are provided in Supplementary Table S2.

RNA synthesis
Un1Cas12f1 (Cas14a1) single guide RNAs (sgRNA) were produced by in vitro transcription using TranscriptAid T7 High Yield Transcription Kit (Thermo Fisher Scientific) and purified using GeneJET RNA Purification Kit (Thermo Fisher Scientific). Templates for T7 transcription were generated by PCR using overlapping oligonucleotides, altogether, containing a T7 promoter at the proximal end followed by the sgRNA sequence. Sequences of the sgRNAs used in our study are available in Supplementary Data S1 file.

Detecting Cas12f dsDNA cleavage and PAM recognition
Plasmid DNA targets were cleaved with Cas12f ribonucleoprotein (RNP) complexes produced from the modified locus or by combining Escherichia coli lysate containing Un1Cas12f1 (Cas14a1) protein with T7 transcribed sgRNA (20 nt spacer). First, E. coli DH5␣ or ArcticExpress (DE3) cells were transformed with CRISPR-Cas12f encoding plasmids (pACYC or pET-duet1 and pLBH531, respectively) and cultures grown in LB broth (30 ml) supplemented with either chloramphenicol (25 g/ml) (pA-CYC plasmids) or ampicillin (100 g/ml) (pET-duet1 and pLBH531 plasmids). Next, for plasmids with a T7 promoter (pET-duet1 and pLBH531 plasmids), expression was induced with 0.5 mM IPTG when cultures reached OD 600 of 0.5 and incubated overnight at 16 • C. Cells (from 10 ml) were collected by centrifugation and re-suspended in 1 ml of lysis buffer (20 mM phosphate, pH 7.0, 0.5 M NaCl, 5% (v/v) glycerol) supplemented with 10 l PMSF (final conc. 2 mM) and lysed by sonication. Cell debris was removed by centrifugation. 10 l of the obtained supernatant containing RNPs was used directly in the digestion experiments. For Un1Cas12f1, 20 l of clarified supernatant was combined with 1 l of RiboLock RNase Inhibitor (Thermo Fisher Scientific) and 2 g of sgRNA and allowed to complex with the clarified lysate as described below.
Cas12f RNP complexes were used to cleave either the 7 bp randomized PAM library or a plasmid containing a fixed PAM and gRNA target. Briefly, 10 l of Cas12f-gRNA RNP containing lysate was mixed with 1 g of PAM library or 1 g of plasmid containing a single PAM and gRNA target in 100 l of reaction buffer (10 mM Tris-HCl, pH 7.5 at 37 • C, 100 mM NaCl, 1 mM DTT and 10 mM MgCl 2 ). After 1 h incubation at 37 • C, DNA ends were repaired by adding 1 l of T4 DNA polymerase (Thermo Fisher Scientific) and 1 l of 10 mM dNTP mix (Thermo Fisher Scientific) and incubating the reaction for 20 min at 11 • C. The reaction was then inactivated by heating it up to 75 • C for 10 min and 3 -dA overhangs added by incubating the reaction mixture with 1 l of DreamTaq polymerase (Thermo Fisher Scientific) and 1 l of 10 mM dATP (Thermo Fisher Scientific) for 30 min at 72 • C. Additionally, RNA was removed by incubation for 15 min at 37 • C with 1 l of RNase A (Thermo Fisher Scientific). Following purification with a GeneJet PCR Purification column (Thermo Fisher Scientific), the end repaired cleavage products (100 ng) were ligated with a double-stranded DNA adapter containing a 3 -dT overhang (100 ng) for 1 h at 22 • C using T4 DNA ligase (Thermo Fisher Scientific). After ligation, cleavage products were PCR amplified appending sequences required for deep sequencing and subjected to Illumina sequencing (22,23).
Double-stranded DNA target cleavage was evaluated by examining the unique junction generated by target cleavage and adapter ligation in deep sequence reads. This was accomplished by first generating a collection of sequences representing all possible outcomes of dsDNA cleavage and adapter ligation within the target region. For example, cleavage and adapter ligation at just after the 21st position of the target would produce the following sequence (5 -CCGCTCTTCCG ATCTGCCGGCGACGTTGGGTCAACT-3 ) where the adapter and target sequences comprise 5 -CCGCTCT TCCGATCT-3 and 5 -GCCGGCGACGTTGGGTCAA CT-3 , respectively. The frequency of the resulting sequences was then tabulated using a custom Perl script (provided at https://github.com/cortevaCRISPR/Cas12f-InformaticsTools.git) and compared to negative controls (experiments setup without functional Cas12f complexes) to identify target cleavage.
Evidence of PAM recognition was evaluated as described previously (22,23). Briefly, the sequence of the protospacer adapter ligation exhibiting an elevated frequency in the previous step was used in combination with a 10 bp sequence 5 of the 7 bp PAM region to identify reads that supported dsDNA cleavage. Once identified, the intervening 7 bp PAM sequence was isolated by trimming away the 5 and 3 flanking sequences using a custom Perl script (provided at https://github.com/ cortevaCRISPR/Cas12f-InformaticsTools.git) and the frequency of the extracted PAM sequences normalized to the original PAM library to account for inherent biases using the following formula. Following normalization, a position frequency matrix (PFM) (24) was calculated and compared to negative controls (experiments setup without functional Cas12f complexes) to look for biases in nucleotide composition as a function of PAM position. Biases were considered signifi-cant and indicative of PAM recognition if they deviated by >2.5-fold from the negative control. Analyses were limited to the top 10% most frequent PAMs to reduce the impact of background noise resulting from non-specific cleavage coming from other components in the E. coli cell lysate mixtures.

Expression and purification of an Un1Cas12f1 (Cas14a1) protein
Un1Cas12f1 (Cas14a1) protein was expressed in E. coli BL21(DE3) strain from the pLBH531 MBP-Cas14a1 plasmid (gift from Jennifer Doudna, Addgene plasmid #112500). Un1Cas12f1 D326A and Un1Cas12f1 D510A expression plasmids were engineered from pLBH531 using Phusion Site-Directed Mutagenesis Kit (Thermo Fisher Scientific). Escherichia coli cells were grown in LB broth supplemented with ampicillin (100 g/ml) at 37 • C. After culturing to an OD 600 of 0.5, temperature was decreased to 16 • C and expression induced with 0.5 mM IPTG. After 16 h cells were pelleted, re-suspended in loading buffer (20 mM Tris-HCl, pH 8.0 at 25 • C, 1.5 M NaCl, 5 mM 2-mercaptoethanol, 10 mM imidazole, 2 mM PMSF, 5% (v/v) glycerol) and disrupted by sonication. Cell debris was removed by centrifugation. The supernatant was loaded on the Ni 2+ -charged HiTrap chelating HP column (GE Healthcare) and eluted with a linear gradient of increasing imidazole concentration (from 10 to 500 mM) in 20 mM Tris-HCl, pH 8.0 at 25 • C, 0.5 M NaCl, 5 mM 2-mercaptoethanol buffer. The fractions containing Un1Cas12f1 variants were pooled and subsequently loaded on HiTrap heparin HP column (GE Healthcare) for elution using a linear gradient of increasing NaCl concentration (from 0.1 to 1.5 M). The fractions containing the protein of interest were pooled and the 10×His-MBP-tag was cleaved by overnight incubation with TEV protease at 4 • C. To remove the cleaved 10×His-MBP-tag and TEV protease, reaction mixtures were loaded onto a HiTrap heparin HP 5 column (GE Healthcare) for elution using a linear gradient of increasing NaCl concentration (from 0.1 to 1.5 M). Next, the elution from the HiTrap heparin column was loaded on a MBPTrap column (GE Healthcare) and the Un1Cas12f1 proteins were collected in the flow-through. The collected fractions with Un1Cas12f1 were then dialyzed against 20 mM Tris-HCl, pH 8.0 at 25 • C, 500 mM NaCl, 2 mM DTT and 50% (v/v) glycerol and stored at -20 • C. The sequences of the Un1Cas12f1 proteins are listed in Supplementary Data S1 file.

DNA substrate generation
Plasmid DNA substrates were generated by cloning oligoduplexes assembled after annealing complementary Nucleic Acids Research, 2020, Vol. 48, No. 9 5019 oligonucleotides (Metabion) into pUC18 plasmid over HindIII (Thermo Fisher Scientific) and EcoRI (Thermo Fisher Scientific) restriction sites. The sequences of the inserts are listed in Supplementary Data S1 file and the links to the plasmid sequences are provided in Supplementary  Table S2.
To generate radiolabeled DNA substrates, the 5 -ends of oligonucleotides were radiolabeled using T4 PNK (Thermo Fisher Scientific) and [␥ -33P]ATP (PerkinElmer). Duplexes were made by annealing two oligonucleotides with complementary sequences at 95 • C following slow cooling to room temperature. A radioactive label was introduced at the 5end of individual DNA strands before annealing with the unlabeled strands. The sequences of the oligoduplexes are provided in Supplementary Data S1 file.

DNA substrate cleavage assay
Plasmid DNA cleavage reactions were initiated by mixing plasmid DNA with Un1Cas12f1 (Cas14a1) RNP complex at 46 • C. The final reaction mixture typically contained 3 nM plasmid DNA, 100 nM Un1Cas12f1 RNP complex in 2.5 mM Tris-HCl, pH 7.5 at 37 • C, 25 mM NaCl, 0.25 mM DTT and 10 mM MgCl 2 reaction buffer. Aliquots were removed at timed intervals (30 min if not indicated differently) and mixed with 3× loading dye solution (0.01% Bromophenol Blue and 75 mM EDTA in 50% (v/v) glycerol) and reaction products were analyzed by agarose gel electrophoresis and ethidium bromide staining.

M13 cleavage assay
M13 ssDNA cleavage reactions were initiated by mixing M13 ssDNA (New England Biolabs) and DNA activator with Un1Cas12f1 (Cas14a1) RNP complex at 46 • C. Cleavage assays were conducted in 2.5 mM Tris-HCl, pH 7.5 at 37 • C, 25 mM NaCl, 0.25 mM DTT and 10 mM MgCl 2 . The final reaction mixture contained 5 nM M13 ssDNA, 100 nM ssDNA or dsDNA activator and 100 nM Un1Cas12f1 RNP. The reaction was initiated by addition of Un1Cas12f1 RNP complex and was quenched at timed intervals (0, 5, 15, 30, 60 and 90 min) by mixing with 3× loading dye solution (0.01% Bromophenol Blue and 75 mM EDTA in 50% (v/v) glycerol). Products were separated on an agarose gel and stained with SYBR Gold (Thermo Fisher Scientific). The sequences of the activators are listed in Supplementary Data S1 file.

Plasmid interference assay
Plasmid interference assays were performed in E. coli Arctic Express (DE3) strain bearing Cas12f systems (plasmids encoding CRISPR-Cas12f systems are listed in Supplementary Table S2). For Un1Cas12f1 (Cas14a1), E. coli BL21 (DE3) strain was transformed with pGB53 plasmid, which was engineered from the pLBH545 Tet-Cas14a1 Locus plasmid (gift from Jennifer Doudna, Addgene plasmid #112501) by removing tracrRNA and CRISPR array with Bsp1407I and AvrII and adding sgRNA encoding sequence with T7 promoter, HDV ribozyme and terminator sequences. The cells were grown at 37 • C to an OD 600 of ∼0.5 and electroporated with 100 ng of low copy number pSC101 target plasmids obtained by cloning oligoduplexes over EcoRI and XhoI or EcoRI and NheI restriction sites into pTHSSe 1 (gift from Christopher Voigt, Addgene plasmid #109233) or pSG4K5 (gift from Xiao Wang, Addgene plasmid #74492) plasmids, respectively (the sequences of the inserts are listed in Supplementary Data S1 file and the links to the plasmid sequences are provided in Supplementary Table S2). The co-transformed cells were further diluted by serial 10× fold dilutions and grown at 37 • C for 16-20 h on plates containing inductor and antibiotics. For Un1Cas12f1 -AHT (50 ng/ml), IPTG (0.5 mM) and chloramphenicol (30 g/ml); Cas12f2 from Micrarchaeota archaeon (Mi1) previously named Cas14b4 (21)gentamycin (10 g/ml) and carbenicillin (100 g/ml); for all other Cas12f proteins -IPTG (0.5 mM), gentamycin (10 g/ml) and carbenicillin (100 g/ml) were used.

DNA targeting requirements by Cas12f proteins
To our knowledge, just a single Cas12f protein, Un1Cas12f1 (Cas14a1), has been experimentally characterized where it was shown to only cleave ssDNA targets in a PAMindependent manner (21). In our experimentation, shared structural features with other type V effectors capable of dsDNA target cleavage (Supplementary Figure S1) motivated us to evaluate Cas12f family proteins for dsDNA cleavage activity. To establish DNA cleavage requirements, we initially tested a Cas12f2 protein from Micrarchaeota archaeon (Mi1) previously named Cas14b4 (Supplementary Table S1) (21). First, spacers capable of targeting a randomized PAM plasmid library (22) were incorporated into its CRISPR array. The resulting modified CRISPR-Mi1Cas12f2 locus was synthesized, cloned into a low copy plasmid, and transformed into E. coli. A PAM determination assay (22,23) was adapted to test the ability of the Mi1Cas12f2 to recognize and cleave a dsDNA target in vitro (Figure 2A). This was accomplished by combining clarified lysate containing Mi1Cas12f2 protein and gRNAs expressed from the reengineered locus with the PAM library. Next, DNA breaks were captured by doublestranded adapter ligation, enriched by PCR, and deep sequenced as described previously (22,23). DNA cleavage occurring in the target sequence was evaluated by scanning regions in the protospacer for elevated frequencies of adapter ligation relative to negative controls (experiments using lysate from E. coli not transformed with the Mi1Cas12f2  Figure S2B). To confirm the PAM sequence and the dsDNA cleavage position, a plasmid was constructed containing a target adjacent to the identified 5 -TTAT-3 PAM sequence and subjected to cell lysate cleavage experiments. To increase Mi1Cas12f2 concentration in the cell lysate, a higher copy number DNA expression plasmid equipped with an inducible T7 promoter was also utilized. Sequencing of the target plasmid cleavage products confirmed cleavage at the 21st position (Supplementary Figure  S2C). Reactions using deletion variants further confirmed that Mi1Cas12f2 was the sole endonuclease required for the observed dsDNA target recognition and cleavage activity (Supplementary Figure S2C).
Four additional Cas12f proteins from the Cas14 family (21) were also evaluated. These consisted of two Cas12f1 proteins from uncultured archaeons (Un1 and Un2 which were previously named Cas14a1 and Cas14a3, respectively) and two Cas12f2 proteins from Micrarchaeota archaeon (Mi2) and Aureabacteria bacterium (Au) ( Supplementary  Table S1). First, for Un2Cas12f1, Mi2Cas12f2, and Au-Cas12f2, T7 inducible expression plasmids were synthesized (Supplementary Table S2). They contained a minimal locus: the cas12f gene, sequence encoding a putative tracrRNA between the nuclease gene and CRISPR repeats, and a CRISPR array modified to target the PAM library. Then, similar to Mi1Cas12f2 experimentation, E. coli lysate from cells expressing the Cas12f nuclease and guide RNAs was combined with a randomized PAM library. Cleavage products were then captured and analyzed as described for Mi1Cas12f2 (Figure 2A). For Un1Cas12f1, a modified approach was used. Here, an in vitro transcribed single guide RNA (sgRNA) capable of targeting the PAM library was combined with E. coli lysate containing Un1Cas12f1 protein and then assayed for dsDNA target recognition and cleavage as described above. Similarly to Mi1Cas12f2, a cleavage signal distal to the PAM region was detected for all proteins. For Cas12f2 nucleases, this occurred just after the 21st protospacer position (for AuCas12f2, a second cleavage site was also observed after the 23rd position) while for Cas12f1 proteins the signal was found after the 24th position (Supplementary Figures S3A, S4A, S5A and S6A). Analysis of the sequences supporting cleavage yielded 5 Trich PAM recognition like that recovered for Mi1Cas12f2 ( Figure 2B and Supplementary Figures S2B, S3B, S4B, S5B and S6B).
Next, to further explore the DNA cleavage requirements of miniature CRISPR-Cas effectors, we sought to evaluate the dsDNA cleavage activity of proteins from an uncharacterized putative group of class 2 CRISPR-Cas systems, type V-U3, which lack the proteins involved in adaptation (2,7). First, PSI-BLAST searches were performed to identify a group of CRISPR-associated proteins primarily from bacteria, in particular, lineages of Clostridia and Bacilli, that belonged to the type V-U3 family. They contained a conserved C-terminal tri-split RuvC domain similar to other Cas12 nucleases and a short variable N-terminal sequence as observed for the Cas12f proteins from Cas14 family. Next, five members were selected for functional characteri-  Table S1). In general, they were chosen to represent the uncovered diversity and ranged in size between 422-497 amino acids. As described for Cas12f proteins above, minimal CRISPR loci containing the nuclease gene, putative tracrRNA encoding region and CRISPR array reprogrammed to target the PAM library were then synthesized, cloned into an inducible expression plasmid (Supplementary Table S2) and examined for dsDNA target recognition and cleavage. As shown in Supplementary Figures S7A, S8A, S9A, S10A and S11A, all produced cleavage around the 24th position 3 of the PAM region. For nucleases from Parageobacillus thermoglucosidasius (Pt) and Acidibacillus sulfuroxidans (As), secondary cleavage signals (>5% of all reads) were also recovered either before or after the 24th position. Like Cas12f proteins from Cas14 family, type V-U3 nucleases cleaved the dsDNA library in a 5 PAM-dependent manner and altogether expanded miniature Cas nucleases PAM diversity to encompass not only T-rich but also C-rich motifs ( Figure 2C and Supplementary  Figures S7B, S8B, S9B, S10B and S11B).

Biochemical characterization of Un1Cas12f1 (Cas14a1) mediated dsDNA cleavage
Since a sgRNA solution was available for Un1Cas12f1 (Cas14a1) (21) we further probed the Un1Cas12f1 protein for programmable dsDNA cleavage using purified components. First, to decipher optimal reaction conditions, the effect of sgRNA spacer length, temperature, salt concentration and divalent metal ions were evaluated on Un1Cas12f1 dsDNA cleavage activity ( Supplementary Figure S12). Experiments revealed that Un1Cas12f1 ribonucleoprotein (RNP) complex is a Mg 2+ -dependent endonuclease that functions best in low salt concentrations (5-25 mM) and is active in a wide temperature range ∼37-50 • C, with a temperature optimum of ∼46 • C (Supplemen-tary Figure S12). Furthermore, sgRNA spacers of around 20 nt supported the most robust dsDNA cleavage activity (Supplementary Figure S12). Under optimized reaction conditions, supercoiled (SC) plasmid DNA containing a target sequence flanked by an Un1Cas12f1 PAM (5 -TTTA-3 ) was completely converted to a linear form (FLL) indicating the formation of a dsDNA break ( Figure 3A). Additionally, cleavage of linear DNA yielded DNA fragments of expected size, further validating Un1Cas12f1 mediated dsDNA break formation ( Figure 3A). Next, we confirmed that Un1Cas12f1 requires both PAM and sgRNA recognition to cleave a dsDNA target (Supplementary Figure S13). Finally, alanine substitution of conserved RuvC active site residues (21) abolished cutting activity, confirming that the RuvC domain is essential for the observed dsDNA cleavage activity ( Figure 3B).
The type of dsDNA break generated by Un1Cas12f1 was examined next. Using run-off sequencing, we observed that Un1Cas12f1 makes 5 staggered overhanging DNA cutsites. Cleavage predominantly occurred centered around positions 20-24 bp in respect to PAM sequence ( Figure  3C) and was independent of spacer length or plasmid topology (Supplementary Figure S14). The cleavage pattern of Un1Cas12f1 was also assessed on synthetic doublestranded oligodeoxynucleotides. As illustrated in Figure 3D and Supplementary Figure S15, a 5 staggered cut pattern, albeit with less strictly defined cleavage positions than observed with larger DNA fragments was seen.
Next we investigated if the non-specific ssDNA degradation activity of Un1Cas12f1 could be induced not just by ssDNA targets (21) but also by dsDNA targets. First, the ability of Un1Cas12f1 to indiscriminately degrade singlestranded M13 DNA in the presence of a ssDNA target without a PAM was confirmed ( Figure 3E). Then, a dsDNA target containing a 5 PAM and sgRNA target for Un1Cas12f1 was also tested for its ability to trigger non-selective ssDNA degradation. As shown in Figure 3E, the trans-acting ss-DNase activity of Un1Cas12f1 was activated by both ss-DNA and dsDNA targets, similar to observations made for Cas12a (12). Additionally, in the absence of a target, the Un1Cas12f1 RNP complex non-selectively degraded singlestranded M13 DNA ( Figure 3E) as well as synthetic singlestranded DNA oligodeoxynucleotides ( Supplementary Figure S15) albeit at much slower rates suggesting that it also possesses a non-specific single-stranded DNA nuclease activity.

Cas12f mediated plasmid DNA interference in E. coli
We next tested if CRISPR-Cas12f systems (from Cas14 and type V-U3 protein effector families) can be programmed to target and cleave invading dsDNA in a heterologous E. coli host. First, an E. coli plasmid DNA interference assay was adopted (25,26) using a minimal Cas12f CRISPR locus modified to target the incoming low copy number plasmid DNA. To assess transformation efficiency, each experiment was serially diluted by 10× and compared with controls (experiments performed with a plasmid that does not contain a target site). Surprisingly, none of the Cas12f nucleases from Cas14 protein family interfered with plasmid DNA transformation as evidenced by similar recovery of resis-  tant colonies compared to controls ( Figure 4A). These findings, however, are in agreement with a previous study which showed that Un1Cas12f1 (Cas14a1) was incapable of depleting PAM plasmid libraries in a heterologous E. coli host and presumably due to this reason failed to detect PAMdependent dsDNA cleavage (21). In contrast, type V-U3 effectors from A. sulfuroxidans (As) and Syntrophomonas palmitatica (Sp) both produced notable levels of plasmid interference and slight interference was also observable for nucleases from P. thermoglucosidasius (Pt) and Ruminococcus sp. (Ru) ( Figure 4B).

DISCUSSION
Contrary to previous hypotheses suggesting that Cas12f proteins from Cas14 and type V-U3 CRISPR families lack the ability to defend against invading dsDNA (7,21), we illustrate that despite their miniature size some of these nucleases have this capacity. Here, using biochemical approaches, we provide evidence demonstrating that these compact proteins are programmable enzymes capable of introducing targeted dsDNA breaks like larger effectors, Cas9 and Cas12. First, we uncovered PAM-dependent dsDNA cleavage activity similar to other type V interference proteins (13,14,27) for 10 Cas12f family members. Then, we purified the nuclease and sgRNA for Un1Cas12f1 (Cas14a1) and reconstituted its nuclease activity in vitro confirming that both PAM and sgRNA recognition are required for dsDNA target cleavage. Additionally, we observed that ssDNA collateral nuclease activity was triggered not only by ssDNA targets (21) but also by dsDNA targets, a feature shared by most other type V family members (12,13). Finally, using plasmid DNA interference assays, we report that like shown previously (21), Cas12f effectors from the Cas14 family failed to protect against invading dsDNA in E. coli, while even more compact CRISPR-associated type V-U3 nucleases, AsCas12f1 (422 aa) and SpCas12f1 (497 aa), efficiently interfered with plasmid transformation. Taken together, this confirms that at least some Cas12f effectors function like Cas9 and Cas12 nucleases to recognize, cleave, and protect against dsDNA invaders in a heterologous host. In all, our results significantly improve our understanding of novel CRISPR-Cas systems and pave the way for the adoption of programmable miniature nucleases for genome editing applications.