CRISPR-Cas systems are widespread accessory elements across bacterial and archaeal plasmids

Abstract Many prokaryotes encode CRISPR-Cas systems as immune protection against mobile genetic elements (MGEs), yet a number of MGEs also harbor CRISPR-Cas components. With a few exceptions, CRISPR-Cas loci encoded on MGEs are uncharted and a comprehensive analysis of their distribution, prevalence, diversity, and function is lacking. Here, we systematically investigated CRISPR-Cas loci across the largest curated collection of natural bacterial and archaeal plasmids. CRISPR-Cas loci are widely but heterogeneously distributed across plasmids and, in comparison to host chromosomes, their mean prevalence per Mbp is higher and their distribution is distinct. Furthermore, the spacer content of plasmid CRISPRs exhibits a strong targeting bias towards other plasmids, while chromosomal arrays are enriched with virus-targeting spacers. These contrasting targeting preferences highlight the genetic independence of plasmids and suggest a major role for mediating plasmid-plasmid conflicts. Altogether, CRISPR-Cas are frequent accessory components of many plasmids, which is an overlooked phenomenon that possibly facilitates their dissemination across microbiomes.


INTRODUCTION
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated (cas) genes encode adaptive immune systems that provide prokaryotes with sequencespecific protection against viruses, plasmids, and other mobile genetic elements (MGEs) (1). These systems consist of two main components: (i) a CRISPR array, which is a DNA memory bank composed of sequences derived from previous infections by MGEs, and (ii) cas genes that encode the protein machinery that is necessary for the three stages of immunity (adaptation, RNA biogenesis and interference) (2). Briefly, during adaptation, short sequence fragments from the genomes of invading MGEs are integrated at the CRISPR leader end as new 'spacers' flanked directly by repeats in the array. Biogenesis involves expression of the CRISPR array as a long transcript (pre-crRNA) and its subsequent processing into mature CRISPR RNAs (crRNAs), each corresponding to a single spacer. Finally, during interference, the mature crRNAs are coupled with one or multiple Cas proteins in search of a complementary sequence (protospacer), leading to the nuclease-dependent degradation of target nucleic acids.
CRISPR-Cas systems are broadly distributed across the genomes of about 42% of bacteria and 85% of archaea (3). Despite the aforementioned commonalities, these systems display remarkable diversity in their mechanisms of action and in the phylogeny of their components. They are divided into two major classes, six types and more than 45 subtypes on the basis of the distinct architectures and the organization of their effector modules (3,4). Previous work has focused primarily on investigating the canonical adaptive immune functions of CRISPR-Cas systems, their distributions across prokaryotic lineages, and their numerous biotechnological applications (5,6). Although much less attention has been paid to their presence and function in MGEs, recent research demonstrates that CRISPR-Cas loci are encoded by different types of MGEs (7). Several viruses, transposons, and plasmids have been shown to carry CRISPR-Cas components that perform different roles, including participating in inter-MGE warfare (4,(8)(9)(10)(11)(12), RNA-guided DNA transposition (13)(14)(15) and in anti-defense functions (7,16).
Plasmids are extrachromosomal, self-replicating MGEs that are ubiquitous across microbiomes on Earth. They are known to shape the ecology and evolution of microbial communities by, for example, promoting horizontal gene transfer (HGT) between taxa (17,18). Although the fates of plasmids are linked to those of their microbial hosts, plasmids and host chromosomes are subject to distinct selective constraints and follow different evolutionary trajectories (19,20). Despite the beneficial traits that some plasmids provide to their hosts under certain conditions (e.g. antibiotic or heavy metal resistance), they can also impose a physiological burden. Thus, plasmid-host relationships are often dynamic and, depending on the ecological context, extend from parasitic to mutualistic (19). Epitomizing the existence of plasmid-host conflicts, a fraction of chromosomal CRISPR spacers typically match plasmids (21)(22)(23). Furthermore, several studies have reported experimental evidence for strong CRISPR-based anti-plasmid immunity (24)(25)(26). In turn, many plasmids carry Anti-CRISPR proteins that block host CRISPR-Cas targeting (27,28).
Even though some plasmids have been reported to encode CRISPR-Cas loci (7,(29)(30)(31)(32)(33)(34)(35)(36) their incidence, diversity, distribution and function(s) remain largely unstudied. Type IV CRISPR-Cas systems, in particular, are found almost exclusively on plasmids (3,7,37) and recent work indicates that they participate in plasmid-plasmid competition dynamics (4,11). Furthermore, a study analyzing CRISPR-Cas systems across a large subset of prokaryotic genomes identified several plasmid-encoded CRISPR-Cas loci, whereas very few were encoded by associated (pro)phages (34). Here, we undertook the first systematic investigation of CRISPR-Cas contents across publicly available bacterial and archaeal plasmid data. We focused on analysing their prevalence, distribution and diversity, and investigated their CRISPR array spacer contents to infer their biological functions.

Software and code availability
Scripts for downloading data and reproducing all analyses are available at https://github.com/Russel88/ CRISPRCas on Plasmids. Analyses were made with a combination of shell, python 3, and R 3.6.3 scripting. Plots were made with ggplot2, heatmaps with pheatmap, phylogenetic trees with iTOL (38), and networks with gephi (39).

Dataset construction
A total of 27 939 complete bacterial plasmid sequences were downloaded from PLSDB 2020 11 19 (https://ccbmicrobe.cs.uni-saarland.de/plsdb) (40), together with their associated metadata (40). A total of 253 manually curated archaeal plasmids were downloaded from NCBI RefSeq on 6 January 2020. Plasmid-host chromosome associations were determined through the NCBI assembly information, for which only sequences annotated as 'chromosome' were included as host sequences. Using this approach, we were able to assign a host for 21 974 of the plasmids. The number of archaeal plasmids selected is relatively low because few archaeal plasmids have been characterised and sequenced. We used GTDBtk v1.4.1 (41) to re-annotate the taxonomy of the host of each plasmid in a common phylogenomic framework. To filter out redundant plasmids, they were dereplicated using dRep version 3.1.0 (42) with the following parameters: 90% ANI cut-off for primary clustering, 95% ANI cut-off for secondary clustering and a total coverage of 90%, with fastANI (43) as secondary clustering algorithm. Size was the only criterion used to choose the plasmid to include in each cluster, such that the largest plasmid (or random among these given ties) was picked among the clustered plasmids. Dereplication resulted in a total of 17 828 plasmids, out of which 13 265 could be associated with known prokaryotic hosts.

Identification of CRISPR loci
Detection of CRISPR arrays was carried out by using CRISPRCasFinder 4.2.17 (44), coupled to an optimized algorithm for false-positive array removal (Supplementary Figure S1) and an additional analysis for finding CRISPR loci that are commonly missed by this algorithm. Briefly, high confidence arrays predicted by CRISPRCasFinder (evidence level 4) were automatically kept. The remaining arrays were binned into a 'quarantine list' if they were found to clear a series of conservative manually-curated parameter cutoffs: (i) calculated average CRISPR repeat conservation across the array >70%, (ii) spacer conservation <50%, (iii) standard error of the mean of the array's spacer lengths <3 and (iv) array does not overlap with an open reading frame (ORF) with a prediction confidence of at least 90% (45). Putative arrays from the quarantined list were rescued for further analysis if they were found within 1 kb to a predicted cas gene or matched (95% coverage and 95% identity) with any previously defined high confidence CRISPR repeat: CRISPRCasFinder evidence level 4 or archived in CRISPRCasdb (46). This upgrade reduced the rate of detection of false positive CRISPRs, most of which constitute short repetitive genomic regions that are erroneously selected by CRISPRCasFinder (47), and which are more common on plasmids (e.g. iterons and tandem transposonassociated repeats) (48)(49)(50). High confidence CRISPR repeats (see above) were then BLASTed (task: blastn-short, 95% coverage and 95% identity) to a database in which the CRISPR loci that were already detected were masked and any matches within 100 bp were clustered into arrays. Arrays with less than three repeats were excluded from all analyses.

Identification and typing of cas loci
The prediction and classification (at the subtype or variant level) of cas operons was carried out by CRISPRCasTyper  (51). CRISPR arrays closer than 10 kb to the nearest cas operon were considered to be linked; the 10 kb cutoff was based on an analysis of the distribution of distances of CRISPR arrays to the closest cas operon (Supplementary Figure S2). Furthermore, we used CRISPR-repeat similarity information to type arrays that were not found linked to cas operons. These distant arrays (>10 kb from the nearest cas operon) were considered associated with a cas operon if the direct repeat sequence was at least 85% identical to the direct repeat sequence of an array adjacent to that cas operon (Supplementary Figure S3). When possible, CRISPR-Cas systems annotated as 'Ambiguous' were manually subtyped. The identified CRISPR-Cas loci on plasmids, plasmid-associated host chromosomes and related information are found in Supplementary Datasheet S1.

Indicator analysis
Enrichment of certain CRISPR-Cas subtypes on either plasmids or host chromosomes was investigated with an indicator species analysis, using the indicspecies R package.
For the comparison between all plasmids and chromosomes the IndVal.g statistic was used, which controls for difference in group sizes. For the direct comparison between plasmids and hosts chromosomes, where both carry CRISPR-Cas, the IndVal statistic was used. Statistical significance was determined by permutation (n = 9999) and a Bonferroni adjusted P-value threshold of 0.05 was used.

Plasmid conjugative transfer and incompatibility group prediction
The conjugative transfer functions and incompatibility (Inc) typing of all plasmids in PLSDB was predicted with MOB-suite v3.0.1 using mob typer function (52) using default parameters.

Spacer-protospacer match analysis
The genomic regions where CRISPR arrays were identified on plasmids (including CRISPR arrays with two repeats, which were otherwise excluded from the analyses) were masked in order to avoid false positive matches to spacers in arrays. Furthermore, for matches to plasmids only matches to high confidence ORFs were included, also to rule out any matches to possibly undetected CRISPR arrays. Spacers from orphan arrays whose consensus repeat could not be typed by repeatTyper from CRISPRCasTyper (https://typer.crispr.dk, model version 2021 03 (51)) were excluded from the spacer analysis to avoid any bias stemming from possible false positive arrays in this group. Viral genomes were obtained from the IMG/VR v3 (2020-10-12 5.1, (53)) only including those annotated as 'Reference', which includes 39 296 viral genomes. Spacer sequences from plasmids and plasmid-associated host chromosomes were aligned against the masked dereplicated plasmid database and the virus database using FASTA 36.3.8e (54). Alignments were filtered using an e-value cutoff of 0.05. To reduce redundancy bias, spacers were only counted once, no matter the absolute number of matches.
Networks were visualized in gephi with layout generated by a combination of OpenOrd and Noverlap algorithms. For calculating taxonomic confinement of spacerprotospacer matches between plasmids, each pair of plasmids connected by at least one spacer-protospacer match was counted as one matching pair. Cross-targeting plasmids were included as two separate plasmid pairs. Confinement was calculated as the number of matches found exclusively within a specific taxonomic rank, such that each plasmidplasmid pair was only counted once. For estimating confinement of random spacer-protospacer matching, the taxonomic annotations were permuted among the plasmidplasmid pairs with observed spacer-protospacer matches. This was repeated 100 times and the median number of matches was used as an estimate of confinement for hypothetically random matches. For estimating targeting bias towards conjugative versus non-conjugative plasmids each unique spacer was counted with a weight of 1 with the targeting bias proportional to the number of matches to conjugative and non-conjugative plasmids, respectively. For example, a spacer matching four conjugative plasmids and one non-conjugative plasmids is counted as 0.8 for conjugative matches and 0.2 for non-conjugative matches. The spacer-protospacer matches identified for plasmid and associated host chromosome-derived CRISPR array contents are found in Supplementary Datasheet S2.

CRISPR-Cas systems are common on plasmids
We scanned the largest curated collection of complete wildtype bacterial (27 939) and archaeal (253) plasmid genomes in search of CRISPR and cas loci. To reduce the confounding effect of sequencing biases, we removed identical or highly similar plasmids from further analyses. This resulted in a non-redundant dataset of 17 608 bacterial and 220 archaeal plasmid sequences, spanning 30 phyla and 771 genera. For a total of 13 265 non-redundant plasmids, we were able to collect the corresponding set of host chromosome sequences (n = 6979). Overall, our survey identified a total of 338 complete and 313 putatively incomplete loci (207 orphan CRISPR arrays and 106 orphan cas), indicating that ∼3% of sequenced plasmids naturally carry one or more CRISPR and/or cas loci ( Figure 1A, top). This contrasts with the much higher incidence we found on the plasmidassociated host chromosomes, which amounted to 42.3% (42% in bacteria and 63% in archaea). However, since chromosomes are substantially larger than plasmids, we corrected their incidence to genome sequence length (per Mbp) (55). Strikingly, we found that CRISPR-Cas components are on average more prevalent across plasmid sequences ( Figure 1A, bottom), suggesting a selective advantage for many plasmids to carry these systems.
Whereas most detected loci represent complete CRISPR-Cas systems, solitary (orphan) CRISPR arrays and cas operons were also commonly identified. These putatively incomplete systems are more frequent on plasmids than chromosomes ( Figure 1A). Intriguingly, the average lengths of orphan arrays are significantly smaller than cas-associated CRISPRs (on average 39% shorter, P < 2e-16, negativebinomial generalized linear model; Supplementary  other' represents systems that could not be unambiguously assigned (e.g. co-localised/hybrid systems and orphan-untyped components). 'I-F T' refers to the transposon-associated subtype I-F variant and subtype IV-A is subdivided into its known variants (IV-A1 to A3). Total counts per CRISPR-Cas type are summarised on the left. The horizontal discontinuous line (instead of continuous green) indicates that the particular subtypes are not described in the current CRISPR-Cas classification. (C) Prevalence and distribution of CRISPR-Cas loci in plasmid genomes (green) and in plasmid-host chromosomes (purple) across taxa (at class level). Prevalence is expressed as the percentage of replicons carrying CRISPR-Cas. The lane of circles reflects the abundance of publicly available plasmid genome sequences for each plasmid-host class. Taxa are displayed at the tips of a neighbour-joining tree that is built on the basis of the median cophenetic distance (from the whole-genome GTDB phylogeny) between the different classes and rooted by the archaeal clade. The taxonomic differences in prevalence of the different CRISPR-Cas subtypes are displayed as a heat map with log2 ratios between plasmid and chromosome prevalence. Only taxa for which at least 10 non-redundant plasmids are characterised are displayed. Figure S4A), which may reflect the importance of neighboring adaptation modules (cas1-2) for array expansion and maintenance. Furthermore, we found a less frequent association of plasmid-encoded systems with adaptation modules in plasmids compared to chromosomes (36% vs. 88%, respectively; Supplementary Figure S5), yet no significant difference in the array sizes of cas-associated CRISPRs. Although the reasons for the lack of adaptation modules are poorly understood, it is a characteristic feature of many other MGE-encoded CRISPR-Cas systems (e.g. carried by phages and transposons) that is thought to be compensated via in trans use of chromosomally-encoded adaptation machinery (4,7,10,13). Finally, we observed that host chromosomes tend to carry more CRISPR arrays than plasmids; 68% of chromosomes encoding CRISPR have more than 1 array, in contrast to 36% of plasmids (Supplementary Figure S4B). Together, our results underscore a pervasive acquisition of CRISPR-Cas components by plasmids and considerable differences in the composition of plasmid-and chromosome-encoded systems.

Plasmid CRISPR-Cas subtype diversity is rich and distinct from chromosomes
We then sought to investigate the diversity of CRISPR-Cas systems across plasmid genomes. Our analysis revealed a broad range of plasmid-encoded subtypes and marked differences in their abundances ( Figure 1B). Except for type VI, representatives of all CRISPR-Cas types were identified in plasmids. Overall, Class 1 systems dominate the plasmid landscape (e.g. subtypes I-E, I-B, III-B and IV-A3), whereas Class 2 systems are poorly represented, with the notable exception of subtype V-F.
Next, we explored whether the subtype distributions on plasmids differed from those found across plasmidassociated host chromosomes. Inspection of the distribution and prevalence of CRISPR-Cas subtypes on chromosomes revealed notable differences ( Supplementary Figures S6 and S7). An indicator analysis (see Materials and Methods for details) showed that IV-A3, V-F, IV-B, III-B and IV-A1 are significantly enriched subtypes for plasmid genomes when comparing all plasmids and their associated host chromosomes. A direct comparison, including only plasmid-chromosome pairs where both have CRISPR-Cas components, showed that IV-A3 is enriched on plasmids and I-D, V-J and I-F are relatively more prevalent on chromosomes (Supplementary Figure S7). Furthermore, our analyses revealed that the higher abundance of orphan cas loci on plasmids ( Figure 1A) is largely driven by the type IV-B systems which, consistent with previous reports (4,7), are primarily encoded on plasmids and lack CRISPR arrays ( Figure 1B and Supplementary Figure S7). Although relatively infrequent, we found that some individual plasmids carry multiple CRISPR-Cas systems (44 out of 385 cascontaining loci) (Supplementary Figure S8). Among these, combinations involving type I were most common, primarily paired with type III, IV and V, which may reflect functional compatibility between systems and, possibly, synergistic effects (56,57).
We next examined the diversity of CRISPR-Cas systems on plasmids across taxa to determine the possible influ-ence of host phylogeny on their prevalence and subtype distributions. In agreement with previous surveys across prokaryotic genomes (3,58), our analysis revealed that the abundance of CRISPR-Cas is highly variable across host taxonomy ( Figure 1C and Supplementary Figure S9). For instance, while CRISPR-Cas incidence on plasmids from Rhodothermia, Deinococci and Clostridia lies between 19 and 27%, in other taxa their incidence is very low or even zero. Strikingly, the prevalence and diversity of CRISPR-Cas subtypes on plasmids correlates poorly with their abundance across the chromosomes of plasmid-host taxa ( Figure  1C), even when directly comparing the pool of plasmid-host chromosome pairs where both the plasmid and associated host chromosome carry CRISPR-Cas ( Supplementary Figure S9). These results show distinct CRISPR-Cas compositions for plasmids and their associated host chromosomes, a pattern that likely results from the genetic autonomy of plasmids.
It is noteworthy that most available sequenced plasmids are harbored by members of Gammaproteobacteria, Bacilli and Alphaproteobacteria ( Figure 1C), which together represent 84% of all plasmids with a known host. It is therefore important to consider our results in light of this strong inherent database bias, which results from traditionally higher sampling and sequencing rates of cultivable and clinically relevant microbes (59,60). Consequently, given the comparatively rare occurrence of plasmid-encoded CRISPR-Cas in these dominant taxa ( Figure 1C), the calculated averaged prevalence for all plasmid-encoded CRISPR-Cas systems (∼3%) is predicted to be an underestimate of their true representation across environments. Taken together, our results indicate that plasmid-encoded CRISPR-Cas loci are frequent in nature and do not simply mirror those found in their host chromosomes, thereby highlighting the influence of distinct selective pressures that promote the recruitment and retention of specific CRISPR-Cas subtypes on plasmids versus chromosomes.

Plasmids contribute to the horizontal dissemination of CRISPR-Cas
The recently proposed bacterial pan-immune model is based on the idea that defense systems are frequently lost and acquired by community members through HGT (61). Therefore, we investigated whether there is a link between plasmid conjugative transmissibility and CRISPR-Cas presence. We specifically focused on proteobacterial plasmids, since high confidence predictions for conjugative transmissibility are limited to this phylum (59,60) and because proteobacterial plasmids dominate the dataset (62% of all non-redundant plasmid genomes).
We detected an enrichment of conjugative transfer functions within plasmids carrying CRISPR-Cas components (over 47%: average of complete systems and orphan loci; Figure 2A), a higher proportion than for plasmids not encoding CRISPR or cas (∼36%; Fisher's exact test: Pvalue = 5.9e-05; odds-ratio = 2.23). These results support the notion that conjugative plasmids facilitate HGT of CRISPR-Cas systems in the environment and, given the remarkably broad transfer ranges of some proteobacterial plasmids (e.g. IncQ, IncP, IncH and IncN) (  possibly also across distantly related taxa. Less is known about plasmid-transfer modes outside Proteobacteria and their impact on gene exchange networks (59,60). For instance, many plasmids in Gram-positive bacteria transfer via conjugation but their transfer machinery is poorly characterized, thus rendering mobility predictions based on sequence data unreliable (59,66) and highlighting that conjugative plasmids are likely underestimated in our database. Moreover, it is expected that many non-conjugative plasmids transfer horizontally through alternative mechanisms, e.g., via transformation (67), mobilization (68), transduction (69,70), and outer membrane vesicles (71). Therefore, our results underpin the idea that plasmids are major contributors to the active dissemination of CRISPR-Cas systems across microbiomes.

CRISPR-Cas systems are enriched on plasmids of larger sizes
We then sought to examine other biological characteristics of the plasmids and searched especially for common or distinctive patterns shared by CRISPR-Cas-encoding plasmids. We focused on exploring the link between plasmid genome size and the presence of CRISPR-Cas mod-ules. In contrast to the collection of non-CRISPR-Casencoding plasmids--which displayed the previously reported bimodal size distribution (59,60)--plasmids carrying CRISPR-Cas components exhibited unimodal distributions, with the peak shifted towards larger genome sizes (180-250 kb on average) ( Figure 2B). Given the relatively large sizes of CRISPR-Cas systems, a bias towards larger genomes is unsurprising and possibly stems from size-related constraints associated with certain plasmid life history strategies. Larger plasmids allocate considerable portions of their genomes to transfer, stabilization and accessory modules that enhance their persistence (17). This is congruent with the observed enrichment of CRISPR-Cas systems on conjugative plasmids ( Figure  2A), which are known to be relatively large and show a unimodal size distribution centered around 250 kb (59). Similar genomic streamlining dynamics appear to extend to other MGEs, including phages, where complete CRISPR-Cas systems have been reported in huge phages (>500 kb) (10) but rarely occur in the more common, smaller-sized (pro)viral genomes (7,34). In conclusion, our data show that CRISPR-Cas systems are important components of many plasmid accessory repertoires, and are more frequently associated with plasmids of larger sizes.

Highly uneven distribution of CRISPR-Cas across plasmid Incompatibility groups
Next, we examined whether CRISPR-Cas systems in plasmids have short-lived associations or whether we could identify signs of retention by specific plasmid lineages. To this end, a common plasmid classification scheme types plasmids into incompatibility (Inc) groups and is deeply rooted in plasmid eco-evolutionary dynamics, i.e. based on the observation that plasmids sharing replication or partitioning components cannot stably propagate within a given cell host lineage (72). We therefore investigated the distribution and prevalence of CRISPR-Cas-containing plasmids across the Inc-typeable fraction of non-redundant plasmids, which corresponds to 29% of all plasmids (98% of which have a host belonging to Proteobacteria) ( Figure 2C).
Overall, we found that only a reduced number of Inc types (15/50) include plasmids carrying CRISPR-Cas ( Figure 2C and Supplementary Figure S10). Most CRISPR-Cas-encoding plasmids are distinctively concentrated within specific Inc families (e.g. IncH), underscoring the patchy distribution of CRISPR-Cas components across plasmids. Importantly, Inc families are used to infer a degree of genetic relatedness (phylogeny) and ecological cohesiveness, thus typically grouping plasmids that exhibit comparable backbone architecture, host range breadth, propagation mechanism, etc. (59,73). Therefore, our results indicate that some CRISPR-Cas systems are acquired by specific plasmid lineages (i.e. groups of plasmids sharing similar ecological strategies, niches and a related evolutionary trajectory) and are thus maintained stably through evolutionary timescales, presumably due to their adaptive benefits.

Plasmid spacer contents reveal a robust plasmid-targeting bias
We then focused on understanding the possible function(s) of plasmid-encoded CRISPR-Cas systems. CRISPR arrays are uniquely suited to provide ecological and biological insights; the origins of many spacer sequences can be backtracked, providing valuable clues about the functions of CRISPR-Cas and their selective benefits (22,(74)(75)(76). It has been considered that the primary role of chromosomeencoded CRISPR-Cas systems is to protect cells against viruses (22,58,77). This raised the question as to whether plasmid-encoded CRISPR-Cas components reinforce this function, especially given that many plasmids encode genes that enhance the fitness of their hosts against diverse environmental threats (e.g. antimicrobial resistance) (17,78).
All spacer sequences (n = 11 080) were extracted from the bacterial and archaeal plasmid-encoded CRISPR arrays and searched against comprehensive virus and plasmid sequence datasets (Materials and Methods). For comparison, analogous searches were performed with the collection of spacers originating from: (i) the host chromosomes associated with the plasmids in this study (a total of 96 870 spacers) and (ii) plasmid-host chromosome pairs where both the plasmid and associated host chromosome carry at least one CRISPR array (4816 plasmid spacers and 10 315 chromosomal spacers). Only a limited fraction of spacers yielded significant matches to protospacer sequences (plasmids: 11.1%; hosts: 12.9%), consistent with previous studies (4,21,22,75,(79)(80)(81). This is ascribed to a combination of factors, including the paucity of mobilome sequences across public databases and the high mutation rates of MGE protospacers, presumably to escape CRISPR-Cas targeting (21,22,82).
Subsequently, we examined the origins of these protospacer targets. Strikingly, a larger fraction of plasmid spacers matched sequences from other plasmids (66%), while a substantially smaller fraction matched viruses (27%) (Figure 3A). In contrast, the spacer contents originating from plasmid-host chromosomes revealed the opposite trend: a larger proportion of spacers matched viral sequences compared to plasmids (62% and 24%, respectively) (Figure 3B; Supplementary Figure S11)-consistent with a primary antiviral role of chromosomal CRISPR-Cas systems (12,22,58,76,77,83). Importantly, a more direct examination of plasmid-host chromosome pairs (limited to comparisons where both parties carry at least one CRISPR-Cas system) revealed an analogous targeting trend (Supplementary Figure S12A and B).
The abundance of plasmid spacers targeting other plasmids raised the question of whether the reported plasmidtargeting preference of type IV CRISPR-Cas systems (4) could be driving this trend, especially given the abundance of type IV spacers within our dataset (12% of plasmid spacers, yielding 48% of the spacers with any match) (Supplementary Figure S13A). However, we found that the plasmid-targeting bias also held true for the majority of other plasmid-encoded CRISPR-Cas subtypes/variants ( Figure 3C and Supplementary Figure S12C). In contrast, chromosomal spacers maintained a virus-targeting preference, regardless of CRISPR-Cas subtype ( Figure 3C and Supplementary Figure S12C). Furthermore, we found that the plasmid-to-plasmid versus chromosome-to-virus targeting patterns are maintained across the different taxa, implying the existence of a robust biological underpinning of this phenomenon ( Figure 3D and Supplementary Figure S12D). Nevertheless, the plasmid-encoded CRISPRs from certain underrepresented taxa ( Figure 3D and Supplementary Figure S12D) appear to be enriched with virustargeting spacers (e.g. Rhodothernia and Cyanobacteria), suggesting that enhancement of antiviral host immunity could still be an important evolved strategy for some groups of plasmids.

A reticulated web of CRISPR-based plasmid-plasmid targeting
The identification of extensive plasmid-plasmid targeting provides a practical framework for investigating plasmid eco-evolutionary dynamics and offers a unique opportunity to gain insights into HGT routes. This prompted us to build a global network of plasmid-plasmid interactions based on the linkage information provided by the CRISPR-targeting data ( Figure 4A). The corresponding directed graph consists of de-replicated plasmid genomes (nodes), connected by the predicted spacer-protospacer matches (edges). Overall, our analyses revealed a network with a pronounced modular structure, where a reduced number of densely  connected clusters accrue the majority of plasmids, and links between clusters are very sparse. A highly visible trend across the targeting network is the clustering of plasmids according to host taxonomy, with the two largest clusters consisting of plasmids from either Enterobacteriales or Bacillales. However, generalisations based on such a trend should be made with caution and viewed in the context of the his-torical sequencing bias towards plasmids from cultivable and/or clinical strains. For example, inferring that plasmid targeting is a distinctive phenomenon among Enterobacteriales or Bacillales plasmids could be an inaccurate assumption, since plasmids carrying CRISPR-Cas are relatively rare in these taxa ( Figure 1C), despite their sequences comprising the overwhelming majority of sequenced representation of plasmid-plasmid targeting network colored at the host order level. Nodes correspond to individual plasmids and edges represent CRISPR-Cas targeting, based on predicted spacer-protospacer matches. Large and small nodes indicate the presence or absence of CRISPR-Cas in the plasmid, respectively. Edge thickness is proportional to the number of spacer-protospacer matches between plasmid pairs. The phylogeny in the legend is based on the median cophenetic distance from the GTDB whole-genome phylogenies, with the tree inferred by neighbor-joining. 'Other' indicates plasmids without a known host, a host with a different taxonomy than those displayed, or with a host with unspecific taxonomy. An expanded view of plasmid-plasmid targeting within the class Gammaproteobacteria can be found in Supplementary Figure S14. (B) Taxonomic confinement of plasmid-plasmid CRISPR-Cas targeting predictions. The percentage distribution of spacer-protospacer matches restricted within the different plasmid-host taxonomic levels (observed) is compared to their distribution when the taxonomic labels are randomly permuted among the set of targeting plasmid-plasmid pairs (random). (C) Proportion of conjugative plasmids targeted by plasmid and plasmid-host chromosomal spacers. The incidence of conjugative plasmids across PLSDB is shown below. This assessment is restricted to Proteobacterial plasmids. plasmids ( Figure 1C). As more accurate sampling and sequence representation of plasmid diversity becomes available, a more clear understanding of plasmid-plasmid targeting will emerge.
Notably, the clustering analysis demonstrates a pronounced inverse relationship between the number of plasmid connections and the phylogenetic hierarchy of the cognate bacterial hosts. Whereas targeting between plasmids within a single species, genus and family account for the bulk of all predictions (∼42%, 28% and 28%, respectively), matches confined to higher taxa comprise less than 2% ( Figure 4B). Indeed, a closer examination of the plasmidplasmid targeting network in Gammaproteobacteria revealed abundant links between plasmids from different genera (Supplementary Figure S14). These results underscore that taxonomic boundaries represent a major hurdle for plasmid dissemination. Indeed, although some plasmids are able to transfer between distantly-related taxa, their longterm evolutionary host range is primarily constrained to a narrower group of phylogenetically related hosts (73,84). Furthermore, acquisition of spacers from plasmids sharing a similar host range is expected to be more frequent due to the conceivably higher rates of encounters within cells. From a CRISPR-targeting standpoint, spacer retention is also likely influenced by the selective advantage they can provide in plasmid-plasmid competition dynamics.
Given the self-transmissible properties of conjugative plasmids, we wondered whether their effective spread through bacterial populations could render them common targets for CRISPR-Cas compared to non-conjugative plasmids. In support of this, we observed an overrepresentation of plasmid spacers predicted to target conjugative plasmids ( Figure 4C). This may indicate that conjugative invasion is detrimental to plasmids already established in a cell. This is consistent with previous reports of plasmid-encoded mechanisms specifically directed towards preventing the entry of conjugative plasmids (e.g. fertility inhibition strategies and entry exclusion systems) (85). Interestingly, we found that chromosomal spacers showed a relative underrepresentation of spacers targeting conjugative plasmids, suggesting that this type of plasmids may be less detrimental to bacteria, possibly owing to the fitness benefits associated with the adaptive gene cargos that they frequently carry (17,59).

DISCUSSION
The study of CRISPR-Cas biology has primarily focused on chromosomally-encoded systems and their adaptive antiviral functions in bacteria and archaea. While recent work has started to uncover the common association of CRISPR-Cas systems with diverse MGEs and the importance of this phenomenon for CRISPR-Cas ecology and evolution (7), their recruitment by plasmids has remained largely unexplored. Here, we present the first comprehensive analysis of CRISPR-Cas systems across the largest curated dataset of wildtype bacterial and archeal plasmids. We show that CRISPR-Cas components are pervasive accessory components of many plasmids and span a broad diversity of systems, including subtype representatives covering five out of the six known types. Interestingly, we found that certain plasmids carry multiple CRISPR-Cas systems ( Figure 5A). The incidence of plasmid-encoded systems is highly uneven across taxa--ranging from 0 to 30%, but averaging at ∼3%--and the subtype diversity does not simply reflect the CRISPR-Cas contents found in the chromosomes of their host. Our results thus underscore the genetic independence of plasmids and the influence of distinct evolutionary pressures in the acquisition and retention of CRISPR-Cas on plasmids versus their associated host chromosomes.
Intriguingly, putatively incomplete loci were more abundant on plasmids than chromosomes, although less abundant than previously reported (34). It has been suggested that orphan CRISPR arrays and cas loci may be remnants of decaying CRISPR-Cas systems (58). Their relatively higher occurrence on plasmids could indicate that CRISPR-Cas systems erode faster on plasmids, or that orphan components are recruited and/or selectively maintained to perform important, but as yet unknown, biological functions. Orphan CRISPR arrays could, for instance, employ host Cas machinery in trans (86,87) or facilitate plasmid chromosome integration via recombination between plasmid and host-encoded CRISPRs (88,89). On the other hand, the higher proportion of orphan components may be an artefact of CRISPR-Cas prediction tools unable to detect a conceivably greater diversity of uncharted (sub)types across plasmids. Indeed, novel subtypes have recently been identified on diverse MGEs (4,7,10,13).
The observed enrichment of conjugative functions across CRISPR-Cas-encoding plasmids, together with the expected underestimation of transmissible plasmids in our database (i.e. due to unreliable bioinformatic prediction methods and unknown plasmid mobility mechanisms), suggest an active contribution of plasmids to the conspicuous dissemination of CRISPR-Cas systems across microbiomes. These results are in agreement with the proposed bacterial pan-immune concept, where defense systems are continually lost and (re)gained by bacteria through HGT mechanisms (61), and further consistent with the common observation of restriction-modification and toxin-antitoxin systems on plasmids (90,91).
Notably, we found that plasmid-encoded CRISPR arrays tend to carry a larger fraction of spacers predicted to target other plasmids, while plasmid-host chromosomeencoded systems show the commonly observed targeting bias towards viruses. This contrasting targeting preference was consistently observed across taxa and the different CRISPR-Cas subtypes, indicating that plasmids may primarily exploit CRISPR-Cas systems to target other plasmids, and thus likely play a less dominant role in host protection against viral predators ( Figure 5B). These observations extend the hypothesis that the main function of plasmid-encoded type IV CRISPR-Cas systems is to eliminate plasmid competitors (4,11). Interestingly, we found a number of cases of plasmid cross-targeting pairs (26 in total, 4 de-replicated), where CRISPR-Cas-encoding plasmids are predicted to target each other upon crossing paths in a host cell ( Figure 5C). We also found 29 examples of de-replicated plasmids predicted to target other plasmids within the same cell ( Figure 5D), which could indicate the presence of counter-defense strategies to avoid targeting, such as plasmid-encoded anti-CRISPRs (Acrs) (27,28). Although we failed to identify known Acrs across the coresiding targeted plasmids, recent work describes an analogous co-evolutionary arms race between a conjugative island-encoded I-C CRISPR-Cas system and diverse MGEencoded Acrs in Pseudomonas aeruginosa (92).
Together, our results are consistent with previous reports of the co-option of CRISPR-Cas systems, or components thereof, by different MGEs for waging inter-MGE conflicts. For example, the ICP1 Vibrio cholerae phage encodes a I-F CRISPR-Cas system to restrict the phage satellite PLE, a MGE that parasitizes ICP1 (8,16). Additionally, some giant phages and other viruses carry either complete CRISPR-Cas systems or 'mini-arrays' that might contribute to interviral conflicts (7,9,10,93). Our findings thus support the 'guns for hire' concept (94), whereby CRISPR-Cas systems are continually repurposed by different genetic entities. Because similar entities are expected to compete more strongly due to niche overlap (e.g. space and resources), it is not surprising to observe CRISPR-Cas driven inter-viral and inter-plasmid conflicts. Moreover, the higher proportions of virus-derived chromosomal spacers found here, and earlier, illustrate how viruses exert a stronger selection on hosts than plasmids do. Indeed, while viruses often kill their host cell, plasmids tend to only affect fitness -and can be beneficial under certain conditions. Together, these results suggest that retention of CRISPR spacer content is primarily shaped by the selective advantage single spacers confer on the genetic entities carrying them and to a lesser extent by any possible biases inherent to the spacer acquisition and targeting mechanisms.
More broadly, the implications of our findings have practical applications beyond CRISPR-Cas biology. Plasmid sequences may hide an uncharted diversity of CRISPR-Cas systems with promising biotechnological applica-tions, e.g. in genome engineering. Furthermore, plasmidderived CRISPRs can be exploited to determine information about a plasmid's direct relationships with other elements across evolutionary timescales. When available, spacer-protospacer match prediction data could comprise an added layer of information during retrospective plasmid host-range inference analyses, similar to how chromosomal CRISPR contents are leveraged for bioinformatic deconvolution of virus-host associations (81,(95)(96)(97)(98). Furthermore, the distinctive spacer acquisition bias at the leader end of most CRISPR arrays (82,99) suggests a promising resource for extracting chronological information about plasmid dissemination routes. Such analyses may become particularly valuable in the study of clinically relevant plasmids (e.g. those carrying antibiotic resistance or virulence determinants), for which plasmid typing and epidemiological tracking are crucial but currently difficult to infer through sequence analyses alone (64,73,100,101).
Overall, CRISPR-Cas systems constitute powerful barriers against MGE-mediated HGT in microbial communities. While the investigation of CRISPR-Cas biology has focused on chromosomally-encoded systems, our work uncovers their pervasive association with plasmids across a broad phylogenetic breadth, where they appear to play a major role in mediating plasmid-plasmid conflicts. We anticipate that MGE-MGE warfare likely constitutes an important, yet largely overlooked, factor influencing the dynamics of gene flow across microbiomes.

DATA AVAILABILITY
Scripts for downloading data and reproducing all analyses are available at https://github.com/Russel88/ CRISPRCas on Plasmids.