Reappraisal of the DNA phosphorothioate modification machinery: uncovering neglected functional modalities and identification of new counter-invader defense systems

Abstract The DndABCDE systems catalysing the unusual phosphorothioate (PT) DNA backbone modification, and the DndFGH systems, which restrict invasive DNA, have enigmatic and paradoxical features. Using comparative genomics and sequence-structure analyses, we show that the DndABCDE module is commonly functionally decoupled from the DndFGH module. However, the modification gene-neighborhoods encode other nucleases, potentially acting as the actual restriction components or suicide effectors limiting propagation of the selfish elements. The modification module's core consists of a coevolving gene-pair encoding the DNA-scanning apparatus – a DndD/CxC-clade ABC ATPase and DndE with two ribbon-helix-helix (MetJ/Arc) DNA-binding domains. Diversification of DndE’s DNA-binding interface suggests a multiplicity of target specificities. Additionally, many systems feature DNA cytosine methylase genes instead of PT modification, indicating the DndDE core can recruit other nucleobase modifications. We show that DndFGH is a distinct counter-invader system with several previously uncharacterized domains, including a nucleotide kinase. These likely trigger its restriction endonuclease domain in response to multiple stimuli, like nucleotides, while blocking protective modifications by invader methylases. Remarkably, different DndH variants contain a HerA/FtsK ATPase domain acquired from multiple sources, including cellular genome-segregation systems and mobile elements. Thus, we uncovered novel HerA/FtsK-dependent defense systems that might intercept invasive DNA during replication, conjugation, or packaging.

The DndABCDE operon codes for the components of a complex that includes: (i) DndA, a pyridoxal 5 -phosphate (PLP)-dependent cysteine desulfurase that extracts sulfur from cysteine, acting as a sulfur donor ( 17 ).(ii) DndC belongs to the PP-loop ATP pyrophosphatase family and features a 4Fe-4S iron-sulfur center coordinated with three cysteines.Recent research has claimed that DndC binds S-adenosylmethionine (S-AdoMet), which is used for a radical SAM reaction, via its 4Fe-4S cluster ( 15 ,18 ).However, it should be noted that DndC is evolutionarily unrelated to the radical SAM superfamily of enzymes, and the use of S-AdoMet as a substrate is unprecedented in the PP-loop superfamily.DndC possesses a distinct dyad of cysteines, which are likely to receive the sulfur abstracted from the donor by DndA and transfer it into the DNA backbone.The exact interplay between the demonstrated ATP-pyrophosphatase activity of DndC (e.g. in adenylation of the target) and the proposed S-AdoMet substrate in the transfer of the sulfur necessitates further investigation.(iii) DndB, formerly classified within the ParB superfamily of enzymes, is characterized by members exhibiting both nuclease and NTPase activity.( 19 ).Based on wet-lab studies, it has been claimed that DndB functions as an ATP-dependent negative transcriptional regulator, preventing excessive PT incorporation across the self-genome ( 20 ,21 ).However, the possibility of DndB having an additional regulatory role as an NTPase that plays a direct role in limiting PT incorporation or as a nuclease that facilitates access to DNA for the modification machinery, or as a co-effector protecting the system through nucleotide-degradation cannot be ruled out.
The remaining two genes, DndD and DndE, always occur as a linked gene dyad; previous studies have often mischaracterized DndD as an AAA+ ATPase ( 14 ,22 ) and DndE as an uncharacterized globular α+ β domain capable of nonspecific nuclease activity independent of PT ( 23 ).However, we had previously shown that DndD is an ABC-type ATPase belonging to the coiled-coil assemblage of the ABC superfamily, and DndE is a DNA-binding protein containing two RHH repeats ( 24 ).In our earlier study, we also identified a previously unknown sister clade to DndD, named the CxC ABC-ATPase, based on the presence of a 'CxC' type Zn-hook at the apex of the coiled-coil, which is absent in DndD.Like DndD, CxC ABC-ATPases are universally linked to DndE, although these DndEs show substantial sequence diversity compared to their counterparts linked to the canonical DndD.The CxC-ABC ATPases are at the core of anti-invader conflict systems that encompass the recently described PbeABCD systems ( 25 ).The CxC-DndE pair is also coupled in conserved gene neighborhoods to other genes suggestive of restriction and DNA mod-ification activities, hinting at additional biochemical diversity in the Dnd system ( 24 ).
Earlier studies on the proposed three-gene restriction module DndFGH suggested that it is capable of preferentially binding to PT-modified consensus sites in self-DNA, such as 5 -GPTAAC-3 or 5 -GPTTTC-3 , preventing it from targeting nearby unmodified consensus sites in the genome ( 16 ).This binding to PT-modified self-DNA is also said to reduce the AT-Pase, translocase, and nicking activities of DndFGH, thereby protecting self-DNA ( 16 ).The fact that only 10-15% of the GAAC / GTTC consensus sites in the self-genome were identified as PT-modified (using single-molecule real-time (SMRT) sequencing analysis) leaves the open question of how other unmodified consensus sites are left untargeted by DndFGH ( 26 ).Previous studies have shown that the deletion of any gene within the conserved DndFGH cluster significantly impairs their restriction ability, whereas the deletion of modification genes results in phage induction, upregulation of DNA repair pathways, and SOS response genes ( 27 ).However, these prior studies do not present a coherent picture of the action of the DndFGH complex with respect to its domain architecture and evolutionary history.Recent structural studies suggested that DndF contains a P-loop NTPase domain at its Nterminus, DndG has an HTH domain in its C-terminus, and DndH comprises a REase coupled to a RecA-like domain at its C-terminus ( 16 ).However, our initial analysis indicated that these provide an inaccurate account of the domain architectures of the large DndFGH complex, thereby leading to a poor understanding of its biochemistry and mode of action.
Thus, preliminary investigations suggested that there are several unsettled questions regarding the Dnd system as a whole: (i) Is there a phyletic decoupling of DndABCDE and DndFGH, and how do the individual modules function when they occur by themselves?(ii) Why is the modification component often linked to other endonucleases, and do they play a restriction role on their own, if any? (iii) What are the roles of the previously unrecognized domains in the functions of the DndFGH complex?(iv) How do these domains dissociate from PT-modified self-DNA to target and degrade unmodified invasive DNA?In this study, we address these questions through in-depth sequence-structure and comparative genomics analyses, with special attention to previously unrecognized and unannotated domains of the Dnd system.
Consequently, we show that the extended phyletic spread strongly supports the decoupling of the modification and proposed restriction components of the Dnd system in numerous bacterial and archaeal taxa.This indicates the absence of strong selective pressure to retain them together, raising questions about their functional ties.Our systematic delineation of all the domains encoded by DndFGH, along with careful analysis of their conserved features, provides strong evidence that they function as a distinctive prokaryotic immune system on their own.DndFGH independently encompasses domains for restriction, DNA-binding, and ATP-dependent translocase activities.Notably, we show that DndH / DptH has 'captured' HerA / FtsK-like DNA translocase domains from diverse cellular and mobile element genomes.Reconstruction of gene neighborhoods also reveals a variety of endonuclease domains coupled to the Dnd modification module in numerous taxa.This suggests potential alternative restriction capabilities in this system.Together, we present a revised understanding of the functions of DndABCDE, the DndFGH and other systems sharing certain diagnostic domains with them.

Sequence analysis
Sequence profile searches were conducted against the NCBI non-redundant (nr) database using the PSI-BLAST ( 28 ) ( RRID: SCR_001010) and JACKHMMER programs ( 29 ) ( RRID: SCR_005305), creating PSSM and HMM-based profiles at each iteration.Distant and borderline homologs retrieved from these iterative searches, displaying subtle sequence synapomorphies (shared derived characters) with the query, were used as separate search seeds for independent BLASTP ( 30 ) ( RRID: SCR_001010) searches against the NCBI NR database.The BLASTCLUST program ( ftp:// ftp.ncbi.nih.gov/blast/ documents/ blastclust.html ) ( RRID:SCR _ 016641 ; version: 2.2.26) was employed to cluster the dataset into homolog groups and remove nearly identical sequences.The length of pairwise alignments (L) and the bit-score (S) were adjusted based on the desired level of clustering.To detect and verify remote homologs, sensitive HMMbased profile-profile searches were performed using the HHpred program ( 31 ,32 ) ( RRID: SCR_010276).Profile-profile searches were conducted against HMMs derived from PDB ( 33 ), Pfam models ( 34 ) and a custom database of diverse domains maintained by the Aravind group.HHblits ( 35 ) against UniRef30 were used as the default method for MSA generation in HHpred, along with other default parameters.Multiple sequence alignments (MSA) were constructed using Kalign (V2) ( 36 ) ( RRID: SCR_011810) and MAFFT ( 37 ), with manual adjustments based on profile-profile searches against PDB, structural alignments, and predicted 3D structure models (see below).

Structure analysis
Structure similarity searches were performed using the DaliLite program ( 38 ) ( RRID: SCR_003047) against the PDB database, clustered at 75% sequence identity.The JPred (V4) program ( 39 ) ( RRID: SCR_016504) was utilized to predict secondary structures from the generated multiple sequence alignments (MSA).Each MSA served as a reference model to obtain a 3D structure using AlphaFold2 (AF2) ( 40 ,41 ).The predicted secondary structure boundaries from Jpred4 were compared with the AF2 models for verification.AF2-predicted 3D structures were generated separately for each domain identified in this study and used for additional structure similarity searches with the DaliLite program against the PDB database.Structure similarity trees were constructed based on Z -scores obtained from an all-vs-all search of the compared structures using average linkage clustering.The figures showing structural visualization and 3D structure rendering were created using PyMOL ( http://www.pymol.org ) ( RRID: SCR_000305).

Comparative genomics and phylogenetic analysis
Complete taxonomic lineages for all identified sequences in this study were obtained from the NCBI taxonomy database.Contextual information from prokaryotic gene neighborhoods was extracted using a Perl script that retrieves the upstream and downstream genes of the query or anchor gene of interest.The script utilizes GenBank genome files corresponding to unique GenBank genome assemblies (GCA IDs) to retrieve the gene neighborhoods.The protein products were clustered using BLASTCLUST to identify conserved gene neighborhoods based on conservation across different taxa.
Each cluster was annotated separately using the aforementioned sequence-structure analysis, and their domain architectures were delineated.Smaller or divergent clusters that remained unclustered as singlets were merged with larger clusters if there was supporting evidence, such as shared sequence motifs, shared structural synapomorphies, reciprocal BLAST search results, and / or shared genome context associations.The annotated cluster file was then mapped to the corresponding genes to reconstruct the overall gene neighborhoods.Additional filters were applied to identify valid neighborhoods for further analysis, including nucleotide distance constraints (typically 100 nucleotides), conservation of gene directionality within the neighborhood, and presence in more than one phylum.
Phylogenetic relationships were inferred using an approximate maximum likelihood (ML) method implemented in the FastTree program ( 42 ).Local support values were estimated accordingly.To improve the accuracy of topology, the number of rounds of minimum-evolution subtree-prune-regraft (SPR) moves in FastTree was increased to 4 (-spr 4).The options '-mlacc' and '-slownni' were used to make the ML nearestneighbor interchanges (NNIs) more exhaustive.Phylogenetic tree topologies were also derived using ML methods based on the edge-linked partition model implemented in the IQ-TREE software ( 43 ,44 ).Branch supports were obtained using the ultrafast bootstrap method (1000 replicates) in IQ-TREE ( 45 ).Relationships and clustering of protein families and subfamilies were also determined by computing an all-against-all pairwise sequence similarities as implemented in the CLANS software ( 46 ).

Entropy analysis
Position-wise Shannon entropy (H) for a given multiplesequence alignment was calculated using the following equation: where P is the fraction of residues of amino acid type, i and M is the number of amino acid types.The Shannon entropy value for the i th position in the alignment ranges from 0 (only one residue at that position) to 4.32 (all 20 residues equally represented at that position) (47)(48)(49).Analysis of the entropy values which were thus derived was performed using the R language.

Results and discussion
Multi-pronged sequence-structure-based searches revise and extend the phyletic spread of the

DndABCDE and DndFGH systems
To comprehensively determine the phyletic patterns of all eight Dnd genes of both the modification (DndABCDE) and restriction (DndFGH) modules and revise the previous appraisals of the same, we performed iterative sequence searches using the PSI-BLAST and JACKHMMER programs against the NCBI nr database.We did this using all specific domains and proteins previously characterized in the Dnd system as search seeds.All searches were run until convergence or stopped when PSSMs and HMM profiles built in subsequent iterations recovered generic homologous sequences (see methods).The sequences obtained through these searches were clustered using BLASTCLUST, and one representative from each cluster was utilized as seeds for additional searches.Sequences retrieved with borderline statistical significance (evalues between 0.01 and 0.001) that displayed subtle sequence features shared with the query were utilized as seeds for reciprocal searches against the NCBI nr database.These searches broadened the horizon of retrieved homologs of each of the Dnd system proteins.The procedure was repeated until no more clearly delineable members of the family could be added to the collection.For example, a search initiated with DndE exemplar WP_011991365.1 recovered divergent members, and these subsequently served as starting points to recover even more distant versions of DndE (see sections below).Similarly, searches initiated with the previously characterized version of DndF (Genbank accession WP_228949113.1)recovered numerous previously unidentified versions of DndF, and further transitive searches recovered additional DndFs that displayed an array of additional domain fusions in their N-termini (See sections below).This also recapitulated our earlier identification of the CxC type ABC-ATPases, which, when compared to its sister clade DndD , displays considerable sequence heterogeneity and differs from the former in possessing a Zn-hook typified by a CxC motif at the apex of the coiled-coil ( 24 ).
We then systematically confirmed through three steps: (i) clustering or aligning the newly identified and divergent candidates with the canonical or previously classified members of the family and scanning them for sequence synapomorphies to confirm membership.(ii) looking for conserved secondary structural elements through sensitive HMM-HMMbased alignments against the PDB database using the HHpred and directly aligning the three-dimensional structures predicted using AlphaFold2 (AF2); (iii) extracting conserved gene neighborhood information for all newly identified candidates and comparing them to known Dnd systems (see methods).All sequences confirmed using the above steps were utilized for further analysis, while those that did not qualify were discarded as false positives.We then made a phyletic assessment based on the unique NCBI tax IDs at the species level rather than the number of genomes in which the Dnd genes were found.Thereby, we avoided the inflation of counts due to multiple genome assemblies in the database for each taxon.

Phyletic patterns of dnd genes
In this study, we found that the genes belonging to the Dnd systems are present in 5228 taxa across both bacteria and archaea (Figure 1 ).Out of these, the components of the Dnd modification and related systems were found in 4287 taxa, while the DndFGH components were recovered only in 1798 taxa.The total number of taxa in which all eight Dnd genes (DndABCDE + DndFGH) are present corresponds to only a fraction of 7% (316 taxa) of all recovered taxa.In the Dnd modification operon, the two genes DndD / CxC ABC-ATPase and DndE form an 'obligate pair' representing its conserved core (see sections below).Among the 4287 taxa where the modification component is present, DndD / CxC and DndE are encoded in 4018 and 4006 taxa, respectively ( ∼93%).Careful inspection of the remaining ∼250 taxa indicated that the failure to recover DndD / CxC and DndE is likely due to the fragmentary nature or poor quality of the corresponding genome assemblies rather than genuine absence.This observation is further supported by the absolute presence of DndD / CxC and DndE in all higher-order taxa (phyla / lineages), as the ATP-dependent Dnd modification component would be nonfunctional without the ABC-ATPase and its conserved 'DNAscanning' operonic partner in DndE (see sections below).As previously reported ( 50 ), the other components are underrepresented, with DndC, DndB and DndA occurring in 3010 (70%), 2227 (51%) and 1384 (32%) taxa, respectively.A part of this absence, especially of DndA, might be explained by its substitution by 'housekeeping' homologs of the cysteine desulfurase (IscS, NifS, SufS) ( 15 ).Indeed, IscS has been shown to be a functional substitute for DndA in in vitro reconstitution of the PT modification system ( 18 ).However, none of the other widely conserved PP-loop ATPases (ThiI, TilS, CysD) are close (51)(52)(53)(54), biochemically equivalent homologs of DndC, raising the possibility that systems lacking this enzyme might rely on an alternative DNA modification or mechanism of actions.
Although the DndFGH module is present in only 1728 (33%) of all analyzed / recovered taxa (5228), it exhibits a contrasting conservation pattern (Figure 1 ).Unlike the modification component, where two or more genes are underrepresented, the tripartite operonic association in the form of Dnd-FGH is invariably present in almost all taxa in which Dnd-FGH was recovered.Out of the 1798 taxa, DndF is present in all (100%), while DndG and DndH were recovered in 1732 (96%) and 1688 taxa (93%), respectively.The absence of DndG and DndH in only a few taxa is again primarily attributable to incomplete genome assemblies (See Methods and Supplementary Data S1 ).This indicates a strong selection to retain all three genes of the restriction module, as they function as a large multi-domain complex where each domain plays an indispensable and specific role (see below).For the first time, we uncover the DndFGH system in numerous archaeal taxa, dispelling two earlier views: first, the restriction component is found only in bacteria and not in archaea ( 25 ); second, DndFGH does not co-occur with the so-called Pbe-ABCD system ( 25 ) (see below for further discussion).
We then compared our appraisal of the Dnd genes with datasets from previous reports and carefully reviewed the overall presence / absence of Dnd genes, correcting instances of faulty overestimation of their distribution in the previous studies ( Supplementary Data S1 ).For instance, while the cysteine desulfurase IscS might potentially substitute for DndA, it is a distinct gene with a clear housekeeping role in cysteine metabolism.Furthermore, IscS genes do not occur in operonic association with other Dnd genes in the genome.Thus, including it in the estimates inflated the overall phyletic distribution of the Dnd system in previous studies ( 50 ), as IscS is widespread throughout bacteria.Likewise, we observed an overestimation in the phyletic spread of DndH, resulting from the incorporation of numerous false-positive homologs that solely featured the common C-terminal P-loop ATPase domain.These homologs were devoid of any of the specific Nterminal domains characteristic of DndH ( 50 ) (see sections below explaining the domain architectures).Thus, our screen, which accounts for these problems, identified Dnd genes in 40 bacterial and 11 archaeal phyla, including 16 phyla (4 archaea, 12 bacteria) that were not previously reported.At the genus level, we have identified Dnd genes in 440 genera (401 bacterial and 39 archaeal genera) that were not reported in any previous study.In summary, we have revised the previous phyletic distributions and expanded the overall presence of the Dnd system.All DndE homologs are typified by two ribbon-helix-helix (MetJ-arc) DNA-binding domains While we had previously shown that the DndE family is characterized by two tandem copies of the DNA-binding ribbonhelix-helix (RHH) domains ( 24 ), recent studies in the Dnd field see DndE as a single globular α+ β domain comprising two strands and five helices or as an ordinary helix-turnhelix (HTH) domain ( 23 ,55 ).The latter interpretations of these proteins have led to a suboptimal understanding of the structural basis for the DndE function ( 23 , 55 , 56 ).To remedy this, in the current work, we present an in-depth analysis of this family.The RHH domain is a variant of the classical tri-helical HTH domain where the first helix has transmogrified into an N-terminal β-strand, resulting in a 'strand-helix-helix' configuration ( 57 ) (Figure 2 A, B).Consequently, single RHH domains obligately homodimerize via their β-strand through the formation of an antiparallel βsheet.Unlike conventional HTH domains, this two-stranded sheet forms the primary DNA-binding interface via insertion into the major groove (57)(58)(59).By analyzing over eight thousand DndE sequences identified in this study, we validate that all DndEs contain two tandem copies of the RHH domain in the same polypeptide.We further confirmed this through the analysis of previously known structures and all AF2-predicted structures of DndE.To differentiate between the two units, we denote the secondary structures as 'S1a-H1a-H2a' and 'S1b-H1b-H2b,' where the subscripts 'a' and 'b' indicate the first and second RHH units, respectively, and letters 'S' and 'H' refer to strands and helices (Figure 2 C-G).The two RHH units in DndE are connected by a long variable linker, between 10 and 20 residues in length (Figure 2 C-G, Supplementary Data S2 -S3 ).The DndEs are also characterized by an elongated H2b that forms the stabilizing core of the structure.Further, most DndEs possess a C-terminal helical extension beyond H2b, which potentially facilitates proteinprotein interactions with its obligate ABC ATPase partner (Figure 2 C-G).
While most proteins have a single RHH domain per polypeptide (e.g.Arc, MetJ, CopG) ( 58 ), structural comparisons revealed that the duplicated architecture followed by a C-terminal helical extension seen in the DndE family is also present in the F-plasmid transfer factor TraY (Figure 2 H).TraY binds to the origin of transfer (oriT) to facilitate plasmid DNA-nicking by the plasmid-encoded endoDNase prior to conjugative transfer ( 60 ,61 ).This might be paralleled in the proposed role of DndE in DNA nicking and its preference for binding to nicked DNA ( 23 ).However, our analysis indicates that DndE does not conserve any residues suggestive of enzymatic activity and is unlikely to catalyze endonuclease activity by itself.Consistent with this, the currently proposed PT modification catalyzed by purified DndCDE appears to bypass nicking of the DNA ( 18 ,62 ).The presence of two tandem copies of the RHH domain in DndE enables it to function as a monomer, adopting a three-dimensional configuration and structural equivalence to dimers of the canonical RHH superfamily members with a single DNA-binding domain per polypeptide.Keeping with this, TraY also exists as a monomer in solution and binds DNA in this configuration ( 58 , 60 , 61 ).Based on these observations, we propose that one copy of DndE associates with its ABC ATPase partner and recognizes the target sequences akin to TraY binding oriT sites.Since the ABC ATPases themselves obligately dimerize via their coiledcoil inserts, the minimal functional unit can be reconstructed as containing two DndE and two ABC ATPase subunits similar to the Smc-ScpB / Kleisin complexes ( 63 ,64 ).

The di ver sification of DndE parallels their operonic pairing with the two ABC-ATPase clades
Based on our tracing of sequence-structure synapomorphies of all identified DndEs, we found that the DndEs can be broadly categorized into two major clades, which are perfectly congruent with their genomic association with the DndD and CxC clade ABC-ATPases.Accordingly, we term the DndEs associated with the DndD ABC-ATPases as 'clade-1' and those paired with the CxC clade ABC-ATPases as 'clade-2' (Figures 2 C-F, 3 ).At the sequence level, clade-1 and clade-2 DndEs can be distinguished by multiple prominent sequence signatures (Figure 2 I, Supplementary Data S2 and S3 ).At the structural level, both clade-1 and clade-2 DndEs are distinguished by the lengths of specific loops and helices.For instance, in clade-1, the loop connecting the helices H1a and H2a ranges between 3-5 residues, whereas in clade-2, the same loop is an average of 15 residues and displays a flexible conformation (Figures 2 C-F, I, Supplementary Data S2 and S3 ).Similarly, the loop connecting H1b and H2b is shorter in clade-1 (average 7 residues) and much longer in clade-2 (average 14 residues).The length of the terminal extension beyond H2b, including the C-terminal helix, is relatively short in clade-1 (17 residues overall with a C-terminal helix of 10 residues) compared to clade-2 (29 residues overall and a C-terminal helix of at least 19 residues) (Figure 2 C-F, I, and Supplementary Data S2 and S3 ).This suggests that the clade-1 DndE is much more compact and closer to the ancestral state than the clade-2 DndE with multiple long flexible regions.
The DndEs of each of the two major clades, clade-1 and clade-2, can also be further classified into subclades displaying specific synapomorphies, and each of these displays high sequence heterogeneity as observed in the Shannon-entropy analysis ( 47-49 ,65 ) (Figure 2 I-K).BLAST-score-based clustering, followed by a thorough visual inspection of the alignments of each cluster, allowed us to categorize clades 1 and 2 into 6 subclades each, referred to as clade-1.1 to 1.6 and clade-2.1 to 2.6 (Figure 2 I-K).We tested the validity of this classification by two independent analyses.First, we used a representative subset of all DndE sequences from each subclade and performed maximum-likelihood phylogenetic analysis using the IQ-TREE software (Figure 3 A).Second, we performed clustering using the CLANS software with the BLO-SUM62 matrix (Figure 3 B-C).The tree topology completely recapitulated our original subclade classification, with clades 1.1 to 1.6 and clades 2.1 to 2.6 forming well-supported groups in the tree (Figure 3 A).Similarly, the pairwise 'one-to-one' alignment-based approach, as implemented in the CLANS software ( 46 ), reproduced an identical clustering of DndEs as the inferred tree topology (Figure 3 B, C).
We next asked if the coupled ABC ATPases also exhibit congruent evolutionary trajectories at the subclade level.We addressed this using our gene neighborhood collection to obtain the corresponding DndD and CxC clade ABC ATPases that occur in tandem in the genome with the DndEs.These were then subjected to both phylogenetic tree construction and score matrix-based clustering analysis.Both the inferred tree topology and CLANS-based approach clustered the ABC ATPases into two major clades: the DndD and CxC clades, each with distinct subclades precisely corresponding to the  DndE subclades (Figure 3 A-C).These observations strongly support our above proposal that the DndE and ABC-ATPases (DndD / CxC) function in a DNA-manipulating complex similar to other members of the ABC superfamily.It also suggests that these two components form the functional core of these systems, to which the other regulatory, modification and restriction components are recruited, namely DndA, DndB and DndC in the canonical Dnd systems and the HKD + SF2helicase, DNA methylase and the SP in the CXC-ABC ATPasebased systems.

The DNA-binding interface of the DndE family displays high sequence variability suggesting a role in target specificity
To understand the DNA-binding interface of DndE, we systematically surveyed the literature to gather information on the DNA-contacting positions and residues of all characterized RHH domains.While most of these positions are primarily located in the antiparallel sheet, a few residues from the second helix of each RHH subunit also contact DNA as in the ancestral HTH domain ( 59 , 60 , 66-68 ).The primary residues engaging the major groove from the antiparallel sheet are a conserved arginine and / or a lysine, which simultaneously establish electrostatic contacts with the phosphate backbone of the DNA and engage in cationπ interactions with the bases, exhibiting a preference for guanine and adenine, respectively ( 59 ).Furthermore, polar residues, e.g.asparagine and glutamine (such as in MetJ and Arc) from the antiparallel sheet form hydrogen bonds with the acceptor / donor groups of adenine or guanine ( 59 ,66-69 ).However, unlike classical RHH homodimers, where both strands are identical, the two RHH domains of DndE are divergent, allowing for a greater asymmetry in the bound target.In line with this observation, one of the characterized PT modification sites takes the form GAAC / GTTC, with the modification occurring between G-A and G-T on the respective strands.We found striking sequence variability in the DNA binding interface of the DndE family (Figure 2 I-K, Supplementary Data S2 and S3 ).First, there is no universally conserved residue across all DndEs in both copies of the RHH domain.However, a motif with a dyad of basic / polar residues separated by a hydrophobic residue is observed in S1a of at least six DndE subclades from both clade-1 and clade-2.The second of these polar positions tends to be an arginine in at least 85% of the exemplars with this motif, and collectively, at least one positively charged residue (mainly arginine) tends to be retained in S1a (Figure 2 I, Supplementary Data S2 and S3 ).Thus, the first RHH domain might at least recapitulate one of the pivotal DNA interactions seen in the classical RHH domain transcription factors.S1b tends to be more divergent than S1a.However, several subclades show their own private conserved motifs with multiple charged or other hydrophilic residues that are suitably positioned to contact DNA.In most cases, at least one polar residue (usually R, K, N, Q, D, T) tends to be directed in a manner favoring major groove interactions.Additionally, both S1a and S1b feature partially conserved aromatic residues that are likely to engage in ππ interactions with the base.On the whole this is compatible with the purine-preference observed in the target sites of previously studied RHH domains and is consistent with both the de novo and the subsequent modification on the opposite strand occurring at sites with a guanine in currently characterized PT systems ( 15 ,18 ).Nevertheless, the overall observed diversity in the DNA-binding interface of the DndE family implies that different subclades recognize distinct DNA motifs (Figure 2 I, Supplementary Data S2 and S3 ).As a corollary, this would indicate that the target motifs for PT and other modifications are likely to differ across the subclades of the systems, comparable to the target motifs seen in R-M systems ( 3 , 4 , 70-72 ).
Duplication of the ABC ATPase-DndE pair, association with new functional partners, and subsequent re-coupling drove di ver sification of dnd modification and related systems Based on the comprehensive evaluation of the gene neighborhoods, we made two notable observations (Figure 4 A, B).First, the so-called PbeABCD system, recently claimed to be an alternative for the DndFGH system, is the same as the CxC ABC ATPase-anchored system with its cognate DndE, which we had previously reported ( 24 ).Second, these are frequently located in genomes alongside the canonical Dnd systems anchored on the DndD-DndE pairs and the associated PT modification genes, resulting in a 'mix and match' of one or more Dnd systems with the CxC ABC ATPase-anchored systems (Figure 4 A: i-iv).
In earlier work, we demonstrated that the CxC ABC ATPase-centric systems, in addition to its cognate DndE gene, include a distinct set of genes encoding: (i) an endoDNase of the HKD superfamily fused to a superfamily-2 (SF2) helicase module that resembles the restriction component of the Type-I / III-like ATP-dependent R-M systems ( 73 ,74 ); (ii) a rapidly evolving, mostly disordered small protein (SP) that we earlier predicted to function as a negative regulatory component or an interactor with viral triggers (Figure 4 A: i-iii); (iii) some of these neighborhoods additionally code for DNA cytosine-5 and / or circularly permuted cytosine-N4 methylases ( 24); (vi) The systems with methylases often feature a gene coding for an MTaX1 domain potentially involved in modified DNA-discrimination that is also found in several R-M systems ( 24 ) (Figure 4 A: iv).Parallel to our report of the CxC ABC ATPase-centric systems, Xiong and colleagues described a subset of these linked to the canonical Dnd genes as a new immunity system named PbeABCD ( 25 ) (Figure 4 B).We herein clarify that PbeA is the aforementioned HKD en-doDNase fused to a SF2 helicase module, PbeB is the rapidly evolving small protein, and PbeD is the DndE cognate.PbeC is the CxC-type ABC-ATPase that was misidentified in that study as an SMC-type ABC-ATPase (Figure 4 B).It should be noted that instead of the Zn-hook found in the CxC-ABC, the SMC-type ABC-ATPases are typified by a hinge domain at the apex of the coiled-coil, and the average length of their coiledcoil region (800-1000 residues) is almost twice the length of the coiled-coil segments usually found in DndD / CxC ABC-ATPase clades (400 residues) ( 24 ).
While the canonical DndD-containing systems are more widespread than the CxC ABC ATPase-containing systems, our survey identifies those systems centered on DndD and the CxC-type ABC ATPase, each with its cognate DndE homolog, occur tandemly in the genomes of 936 bacterial taxa (spread across most bacterial clades) and 41 archaeal taxa (from across 4 major archaeal clades) (Figure 4 A, C, D).Together, these constitute 22% of all taxa in which a Dnd modification system is present.Interestingly, we also found adjacent DndDE pairs occurring in the opposite orientation, with one of the pairs usually additionally containing DndB.These are found across 140 taxa from various bacterial phyla and one archaeal taxon.Notably, none of these taxa contains an adjacent system centered on a CxC-ABC ATPase.In 39 taxa distinct from the above, we found that the duplicated DndBDE / DndDE gene clusters occur along with a CxC ABC ATPase-DndE pair in between them (see Supplementary Data S1 and Figure 4 A,  D).These observations point to functional cooperation between distinct DndBDE / DndDE and CxC ABC-ATPase anchored systems.
The above observations suggest that, in addition to dissemination via lateral transfer across prokaryotes, these systems have undergone duplication and genomic re-association on more than one occasion: (i) an ancient duplication event involving an ancestral gene that encoded an ABC ATPase with a coiled-coil insert coupled with a DndE-like gene.This led to the formation of the core DndD and CxC systems.(ii) These systems then independently associated with distinct genes for the small protein, endonuclease and / or modification enzymes and the ParB superfamily enzyme DndB, resulting in the separate CxC-and DndD-centric systems.(iii) The DndBDE system also underwent a separate duplication event, resulting in pairs of this system oriented in opposite directions, with one copy losing the DndB component.(iv) In some instances, a CxC ABC ATPase system was inserted within the duplicated DndBDE / DndDE systems.
Further, our observations call for a reinterpretation of the possible mode of cooperative action of the canonical DndD and CxC-anchored systems.Contrary to previous reports, our analysis reveals that in many instances, the DndFGH system is encoded by the very same genomes with a DndCDEA coupled to the CxC-centric (the so-called PbeABCD) system ( Supplementary Data S1 and Figure 4 B).Conversely, as noted above, the CxC-anchored systems (PbeABCD) also occur independently of the DndD-centric system in numerous bacterial taxa.In these cases, they might occur with or without a co-occurring DndFGH operon in the same genome.These observations raise serious questions regarding the proposal that the so-called PbeABCD (CxC-based system) functions as a replacement restriction module for DndFGH.It was also proposed that the so-called PbeABCD module (the CxC-centric systems) operate via an enigmatic mechanism wherein no direct cleavage or degradation of the viral DNA occurs ( 25 ).However, we observed that the vast majority of these systems contain an HKD-DNase fused to the SF2 helicase (PbeA), which argues for the cleavage of DNA.Hence, we propose that nucleic acid cleavage is indeed part of the action of the systems.This could restrict the virus via two alternative processes: first, through an attack on viral DNA that has gone undetected.In this proposal, the modification component (PT or methylation), found in a large fraction of these systems could facilitate self-nonself discrimination for specifically targeting viral DNA.Alternatively, the observed restriction of the virus could proceed via cleavage of self DNA by the associated endonuclease, resulting in cell-suicide or apoptosis, thereby preventing its proliferation at the population level.In support of the latter model, we point to the small, mostly disordered protein (cognates of PbeB) that could act as a negative regulator of the system that is released by cleavage or modification by viral enzymes, unleashing the endoDNase component.Indeed, in both these processes, the ABC-ATPase-DndE pair could scan the DNA for specific target sequences.We present further evidence for the role of both DNA modifications and suicide effector actions in these systems below.

Gene-neighborhoods centered on the ABC-ATPase-DndE core incorporate di ver se nucleases
Our comprehensive collection of the gene neighborhoods centered on the ABC-ATPase-DndE core revealed the incorporation of an underappreciated diversity of nucleases alongside the Dnd modification components (Figure 4 C, Supplementary Dataset S1 ).Interestingly, these include en-doDNases on one hand, viz., HKD-endoDNase, HNH, multiple REase families (e.g. the previously defined NERD), and URI (the UvrC-Intron homing endonuclease superfamily, overlapping with Pfam: GIY-YIG) and endoRNases on the other, viz., HEPN and PIN (75)(76)(77).A subset of the en-doDNases tends to primarily occur immediately downstream of the Dnd modification genes.These endoDNases are typically the aforementioned HKD fused to an SF2-helicase module, an HNH endoDNase domain fused to N-terminal winged HTH (wHTH) and SAD(SRA) domains, and a REase domain similarly fused to a HARE-HTH domain which recognizes DNA with modified bases ( 78 ) (Figure 4 C).The HNH nucleases fused to SAD(SRA) domains were originally predicted to recognize epigenetic modifications in DNA, such as 5-hydroxymethylcytosine (5hmC) ( 10 ).This paradigm is illustrated by TagI, a SAD(SRA)+HNH family restriction enzyme targeting modified DNA ( 79 ).In the version found in the Dnd systems, the above three-domain HNH protein has been proposed to specifically recognize PT modification via its so-called N-terminal sulfur binding domain (SBD) ( 80 ).Rather than being a novel domain, the SBD corresponds to the wHTH domain that we identified.While the SAD(SRA) domain might augment DNA-binding by the SBD, we suspect that it might also additionally recognize other modifications, such as 5mC or 5hmC specific to the viral DNA.We propose a similar modification-discriminating role for the HARE-HTH fused to the REase domain found in several of these systems (Figure 4 C).
In contrast to these endoDNases, the linked genes coding for URI, certain standalone REase domains (e.g.NERD), and the endoRNases tend to be encoded at the boundaries of the core systems and sometimes on the opposite strand.Thus, we propose two distinct roles for the Dnd-associated endonucleases: first, the endoDNases fused to additional modified DNAbinding domains or the SF2-helicase module are likely to discriminate modified DNA versus unmodified DNA and function as bona fide restriction enzymes.Second, the standalone endoDNases and endoRNases are likely to serve as backups that act as suicide effectors that indiscriminately target DNA or RNA upon failure of the main Dnd system due to inhibitory mechanisms deployed by the invasive elements ( 77 , 81 , 82 ).Collectively, at least one endonuclease is present in 3416 out of 4287 unique taxa (80%) in which the modification component was identified (Figure 4 D, Supplementary Dataset S1 ).Among these 3416 taxa, only 806 possess the DndFGH module, raising the possibility that a subset of these endonucleases from the modification neighborhoods serves as the genuine alternative restriction components.Consistent with this, among the 698 taxa where the wHTH + SAD + HNH was encoded in the Dnd locus, only 197 possess the DndFGH module (Figure 4 D).The simplest proposal for the action of these restriction components would be that they discriminate between host DNA modified by PT or 5-methyl / N4-methyl cytosine marks and invader DNA with no modifications or its own virus-specific modifications to cleave the latter .However , more complex modes of action, such as suicide action on self-DNA triggered by detection of the virus, are also possible, as proposed above, for systems with the HKD endoDNase fused to an SF2 helicase module.

Apprehending the organizational and functional logic of the DndFGH system
Given the above observations suggesting that the DndFGH system is frequently decoupled from the Dnd modification system, we next sought to understand better the origin and function of this system.Our preliminary analysis also indicated that despite recent structural studies, there are several lacunae in understanding the architecture of the proteins in this system.Hence, we conducted a comprehensive sequencestructure analysis of these proteins to properly characterize their domain architectures, variability and evolutionary histories.We first consider DndG because it tends to retain a constant, relatively simple architecture in all exemplars of the system, followed by DndF and DndH, which are larger proteins with unusual architectural features or variability.

DndG
A recent structural study reported a C-terminal wHTH domain in DndG Esc heric hia coli B7A (residues 373-441) ( 16 ).However, systematic analysis of DndG from 1732 distinct bacterial taxa revealed that it is typically 450-500 residues long, leaving a substantial unaccounted N-terminal region.Through structure comparisons, profile-profile searches and structure prediction using AF2, we show that the C-terminal wHTH is preceded by two additional divergent N-terminal wHTH domains.Whereas the second and third wHTH domains feature a 3-stranded wing, the first domain shows a relatively rare 4-stranded wing.The second domain is also further decorated by a bihelical insert after the first core helix.Together, the wHTH domains of DndG fold into a compact 3bladed structure (Figure 5 A).Interestingly, outside of the Dnd-FGH systems, a cognate triple wHTH module is also found in other restriction and DNA-modification systems.First, a version of this module is fused to the REase domain in the FokI restriction enzyme, where it functions as a DNA-binding domain ( 83 ,84 ).Second, it is also present in the SspB subunit of the variant PT restriction-modification system, sspABCDE ( 22 ,24 ).In that system, sspA, sspD, and sspE are equivalents of DndA, DndC, and DndB, respectively, whereas the Cdc6 / Orc clade AAA+ ATPase, sspC, replaces the DndD / CXC ABC-ATPase of the Dnd system ( 22 ,24 ).Notably, this system does not have a DndE cognate, with its place evidently taken by sspB.Hence, as in the above two cases, DndG likely functions as a DNA-binding target-recognition and / or modificationdiscriminating subunit of the DndFGH system.

DndF
We observed that DndF proteins contain a conserved twodomain core, which on occasion, in several different prokaryotes, might be extended via the fusion of biochemically diverse N-terminal domains (see below).The first of the two domains of the conserved core is a derived P-loop kinase that presents several unusual features: (i) it starts with a four-helical Nterminal extension with conserved charged residues.(ii) While the Walker A motif assumes a canonical form, the Walker B motif has an asparagine in lieu of the conserved aspartate.Interestingly, there is a conserved ND motif at the end of strand-2 of the core P-loop kinase.Such a configuration with an acidic residue at the end of strand-2 but not in the Walker B motif is seen in certain other P-loop kinases, such as the shikimate, gluconate, and dephospho-CoA kinases ( 85 ).(iii) The core strand-4 of the P-loop kinase is preceded by a helix-strand insert, with the strand replacing the ancestral strand 5 of the domain, albeit in an antiparallel configuration.(iv) The most remarkable feature of this P-loop kinase is a highly augmented lid subdomain ( 85 ) that is 200 residues in average length and comprised of 12-13 helices (Figure 5 B, C and D).Some versions exhibit an additional Zn-chelating insert between the first and second helices of the lid ( Supplementary Data S2 -S3 ).Nevertheless, this subdomain retains the 3D configuration and a conserved arginine crucial for binding the base of the ATP typical of the P-loop kinases ( 85 ).This augmented lid further interacts with the above-mentioned N-terminal four-helical extension to form a large, exposed surface and is marked by multiple conserved residues.This suggests that it is favorably positioned to play an additional role in protein-protein interactions within the DndFGH complex.
The second core domain of DndF is an α+ β domain that does not recover any related domains in structure similarity searches with DALI and Foldseek.However, an inspection of its structure revealed that it might have emerged via the duplication and augmentation of a relatively simple precursor: we found that the core of the domain contains two copies of a unit made up of a bihelical hairpin followed by a βhairpin.The insertion of a 3-stranded β-meander between the two copies of the said unit expands this core, giving it a complex form with two distinct β-sheets.The two bihelical units and the β-hairpins protrude out from the P-loop kinase domain to present an extended exposed surface together with the large lid subdomain (Figure 5 B, D and E).The α+ β domain is fast-evolving and has no well-conserved residues indicative of a role in catalysis.However, its extensive exposed surface, positioned adjacent to part of the lid subdomain, suggests that it is favorably positioned for interactions with other similarly fast-evolving proteins, potentially from the invader.The variable N-terminal domains fused to the above core in several exemplars of DndF include: (i) the endoRNase HEPN; (ii) a REase domain of the NERD family; (iii) a S / T / Y kinase domain, typically occurring as tandem inactive and active copies; (iv) a GCN5-like acetyltransferase (GNAT) domain (annotated as DUF4338 in Pfam); (v) a wHTH; (vi) a Zn-ribbon.In some examples, multiple such domains, for instance, a wHTH, NERD, and inactive and active S / T / Y-kinase domains, are successively fused to the conserved core of DndF (Figure 5 B and F, Supplementary Data S2 -S3 ).
We interpret this architectural template of DndF in functional terms thus: while it is not certain whether the Nterminal P-loop kinase domain of DndF is catalytically active or inactive on account of the asparagine in the Walker B motif, its intact Walker A motif, together with the geometry of the Walker B, strand-2 polar residues and the lid arginine, indicate that it is definitely capable of binding a nucleotide.Several bacteriophages evade host restriction by incorporating modified nucleobases ( 86 ,87 ) derived from nucleotides modified by the action of phage enzymes, such as members of the thymidylate synthase superfamily, during DNA synthesis ( 88 ).Conversely, there are no endogenous signalingnucleotide-generating enzymes encoded by any of the Dnd systems.Hence, we predict that it potentially serves as a sensor for phage-derived modified nucleotides.If it is active, it might further phosphorylate these nucleotides and prevent its use by the phage.Further, the fast-evolving α+ β domain, together with the lid subdomain of the P-loop kinase, might directly interact with invader macromolecules.Therefore, we propose that a possible aspect of the activation of restriction by the DndFGH system is the sensing of invader-derived nucleotides by the DndF subunit.Under this scenario, the occasionally fused variable N-terminal domains are likely to function as co-effectors, as in other anti-invader conflict systems ( 81 ,89-91 ), which are brought into play if the main restriction activity were to be counteracted by inhibitory systems deployed by the selfish element.These could potentially evince their action by suicidally targeting self-molecules, including DNA (NERD), RNA (HEPN), or proteins (S / T / Y kinase or GNAT).

DndH
This subunit is again a large protein with a mean length of 1700 residues.Prior studies focusing on the DndFGH system had suggested that the C-terminal region of DndH contains a REase endoDNase followed by a RecA-like ATPase domain ( 16 ).However, our analysis indicated that the latter is not a member of the RecA clade; rather, it is unequivocally a HerA / FtsK-like ATPase (see below) with rather different functional implications.Moreover, the N-terminal part of 900 residues remains unaccounted for.Hence, we conducted a systematic analysis of the DndH proteins and established that they all possess a constant multidomain core of eight tandemly arranged globular domains that we outline below in order from the N-to the C-terminus (Figure 6 A).
(i) Through DALI and profile-profile searches, we established that the N-terminal-most domain of DndH is a version of the second P-loop domain of the SWI2 / SNF2-type SF2 helicase modules (Figure 6 B).Both SF1 and SF2 helicases are comprised of an N-terminal canonical P-loop ATPase domain that binds and hydrolyzes ATP and a second C-terminal Ploop ATPase domain that lacks the Walker A and B motifs but contributes two arginine fingers that aid catalysis ( 92 ,93 ).Given that this domain in DndH corresponds to the second P-loop domain of the SWI2 / SNF2-type SF2 modules, it lacks the characteristic ATP-binding signatures.Through sequence comparisons, we established that it also lacks the arginine fingers.However, we found that it contains a distinct, highly conserved arginine that lies precisely at the previously established nucleic-acid-binding interface of SF2 helicases.Hence, we posit that this inactive SWI2 / SNF2-type P-loop domain binds DNA.
(ii) The above domain is immediately followed by a fastevolving HTH domain augmented by an N-terminal helical bundle.This domain is predicted to extend the DNA-binding interface formed by the previous domain (Figure 6 B).
(iii) These are followed by two copies of an all β-domain.DALI searches with these domains recovered carbohydratebinding β-sandwich domains from glycosylhydrolases, e.g. the two β-sandwich domains of β-galactosidase (PDB: 1XC6).In light of their role in glycosylhydrolases, they could similarly bind the sugar in the backbone of DNA ( 94 ) or alternatively recognize proteins in the manner of β-sandwiches such as the immunoglobulin domain ( 95 ) (Figure 6 C).
(iv) Beyond the above is a rapidly evolving target recognition domain (TRD) (Figure 6 D) associated with the DNA adenine methyltransferase (DAM) of Type II RM systems ( 96 ,97 ).Consistent with this, we recovered equivalent domains from M.BpuSI (Type II-G) ( 98 ) and M.BspD6I (Type II-T) ( 99 ) as top hits (Figure 6 E).In conventional R-M systems, this domain plays a key role in the recognition of their target DNA sequence motifs ( 3 , 4 , 70 , 71 ).
(v) The next module in the polypeptide is a DAM domain with a second TRD nested within it.A close examination of the DAM domain showed that while it retains the classic Rossmann fold morphology of a SAM-dependent methyltransferase, the characteristic sequence motif of the S-AdoMetbinding Rossmann loop between strand-1 and the following helix and the catalytic [SDN]PP[FYW] motif after strand-4 are completely eroded ( 10 ).Thus, it can be definitively seen as a catalytically inactive version.The nested TRD is inserted into the DAM domain after strand-4.A similar insert has been observed in the DAM domains of the so-called γ-group ( 96 ,100 ), suggesting that this inactive DAM domain of DndH has been derived from those DAMs.Additionally, structural modeling using AF2 showed that the N-terminal TRD and the inserted TRD are spatially close to each other, forming a cohesive interaction interface (Figure 6 E, F).
(vi) The REase domain of DndH, which occurs after the DAM module, is similar to the eponymous domain from type-II restriction enzymes (e.g.BpuSI) and retains all key residues necessary for endoDNase activity (Figure 6 E, G).
(vii) The C-terminal-most domain of DndH is a P-loop NT-Pase domain, which consistently recovered members of the HerA-FtsK clade of ATPase domains in sequence searches with PSI-BLAST as well as structure-similarity searches with DALI.Notably, it contains a Q as the sensor-1 residue at the end of the core strand-4 and an arginine finger just upstream of this strand.Contrary to previous claims, these observations irrefutably establish this domain as a member of the HerA / FtsK clade ( 101 ) rather than the RecA clade, which does not show any of these features (Figure 6 H).Remarkably, we found that the HerA / FtsK ATPase domains at the Ctermini of different DndH proteins come from several distinct clades of this class of domains, some of which are rather distant from each other (see below).The identification of this domain as HerA / FtsK suggests that it might function as a DNA translocase or pump, unlike a RecA-like ATPase, which is involved in recombination ( 101 ).
While the above multi-domain architecture of DndH is retained in most prokaryotes, we found a larger variant in certain gammaproteobacteria that features a σ-factor-related HTH domain inserted between the REase and the HerA-FtsK domains (Figure 6 A).

DndH is a member of a broader array of previously unrecognized 'HerA-FtsK capture' systems
We next explored the implications of the striking finding that despite their conserved architecture, different DndH proteins display widely divergent HerA / FtsK domains belonging to multiple distinct clades.Phylogenetic analysis revealed that the HerA / FtsK domains from DndH proteins belong to six distinct clades, which also include homologs outside of Dnd-FGH systems.Thus, the HerA / FtsK domains of DndH proteins are dispersed throughout a radiation of these domains rather than forming a single clade.We observe the following relationships for the HerA / FtsK domains found in different DndH proteins (Figure 7 , Supplementary Dataset S1 ): clade-1 (red) additionally includes certain HerA-FtsK-like proteins encoded by a mobile operon, likely originating from a transposon, where they are linked or fused to a D5-type SF3 AAA+ helicase (see below); clade-2 (purple) also contains the highly conserved cellular FtsK involved in chromosome segregation ( 101 ); clade-3 (blue) further includes a group of HerA / FtsK domains that are found in a subset of Dnd PTmodification systems ( 24 ).Here, they occur adjacent to genes for a DndD-related ABC ATPase and are fused to DndE in the same polypeptide; clade-4 (grey) and clade-5 (yellow) additionally feature a class of HerA / FtsK domains coupled with an Orc / CDC6 family AAA+ ATPase ( 102 ,103 ) (see below); clade-6 (green) also contains a distinctive class of HerA-like ATPases found in certain eukaryotic elements (see below).This indicates that the DndH proteins have evolved through the repeated 'capture' of different versions of the HerA / FtsK domain from a diversity of sources ranging from the cellular DNA segregation system to disparate mobile gene neighborhoods (Figure 7 ).The HerA / FtsK domains from clades 1, 5, and 6 led us to a mélange of interesting mobile gene neighborhoods that display features indicative of both mobile elements and anti-selfish element defense systems.
Outside of DndH, the clade-1 HerA / FtsKs are predominantly found in mobile operons from firmicutes.We observed that the totality of HerA / FtsK domains found in these mobile neighborhoods themselves belong to two distinct branches, with all DndH-associated versions being nested within just one of the two branches.This suggests that the Clade-1 HerA / FtsKs in DndH were captured from a specific branch of these mobile operons.These operons additionally code for a D5-type SF3 helicase, a Xis-type HTH, a tyrosine (Y)recombinase, and sometimes a γ/ δ-type resolvase (Figure 7 ).The D5-type SF3 helicase is often fused to a C-terminal TcpKlike wHTH domain (PDB: 5VFY-B) that has been shown to bind DNA during conjugation in other systems ( 104 ).Similarly, the Xis-type HTH recognizes recombination sites in other transposons ( 103 ).Outside of these mobile operons, a closely related D5-like helicase is also found fused to Toprim primases, Primpols, phage P4-type RepBprime primases, and family A DNA polymerase with the proofreading 3 → 5 ex-oDNase domain ( 105 ,106 ).Further, the D5 SF3 helicase may also be coupled in an operon with a class II GATase domain that might function as a cell-wall peptidase during DNA transfer ( 107 ).These observations indicate that the D5-type SF3 helicase is a hallmark of multiple conjugative mobile elements.Interestingly, whereas the D5-helicase linked to the primases and polymerases is invariably active, the versions fused to the HerA / FtsK domains tend to be inactive.This suggests that while the former are functional mobile elements, the latter are potentially transposons domesticated for a role in counterinvader conflict ( Supplementary Data S1 ).
Similarly, certain versions of clades 2 and 4 and clade-5 HerA / FtsK domains led us to a distinct class of mobile gene neighborhoods where they are coupled to a specific version of the Orc / CDC6 family of AAA+ ATPases (Figure 7 ).These neighborhoods usually encode the following core domains: (i) an Orc / CDC6 ATPase; (iii) an inactive REase (iREase); (ii) an active REase belonging to either of two distinct families, one of which is also found in the AbiJ antiphage systems ( 72 ,77 ); (iv) a HerA / FtsK; (v) a wHTH.While these domains are combined in various arrangements in single or multiple polypeptides, the iREase domain is invariably inserted between the core AAA+ ATPase and the C-terminal wHTH characteristic of the Orc / CDC6 family ( 103 ).A subset of these operons also displays a further gene for a rapidly evolving protein with 2-4 wHTH domains related to DndG / DptG ( 16 ), sspB ( 22 ) and the FokI DNA-binding domain ( 83 ,84 ) (see above), while others may feature a 3 → 5 exonuclease domain fused to the HerA / FtsK component (Figure 7 ).From the perspective of these gene neighborhoods, one observes a similar phenomenon of the 'capture' of HerA / FtsK domains from different sources.For instance, one set of HerA / FtsK domains in these systems are fused to a N-terminal FtsK α domain that is otherwise only found in certain DndH proteins (clade 2, above) and the classical cellular FtsK and not in other HerA / FtsKs ( 101 ).Therefore, this version was captured from the cellular versions involved in chromosome segregation.Conversely, a subset of these neighborhoods encodes another version of the HerA / FtsK domain that lacks a fusion to the FtsK α domain but has a 4-helical bundle inserted into it, evidently indicating its acquisition from a distinct source (Figure 7 ).Independently of the above HerA / FtsK association, related Orc / Cdc6 genes are also found in further mobile gene neighborhoods where they are coupled in operons with either genes coding for (i) a REase domain fused to the modified DNA-binding ASCH domain and a DndG-like triple wHTH protein or (ii) a second Orc / CDC6 protein with an intact Walker A but a disrupted Walker B and a KAP P-loop NT-Pase previously implicated in anti-phage immunity ( 108 ).The active Orc / CDC6 copy from these neighborhoods are also often fused to a STAND NTPase (e.g.NACHT) or pentapeptide repeats or a TIR domain that possesses NADase activity ( 109 ).This web of associations to domains that have been implicated previously in anti-selfish element immunity strongly supports a similar role for these systems ( Supplementary Data S1 ).
Lastly, clade-6 also contains HerA / FtsK domains found in a group of novel genomic elements that are widespread in fungi (Ascomycota, Basidiomycota, Chytridiomycota) and other eukaryotes ( Carpediemonas , Kipferlia , some plants, stramenopiles, haptophytes, Capsaspora ) (Figure 7 ).These are characterized by a 4-helical insert in the HerA / FtsK P-loop NTPase domain and a peculiar C-terminal extension with a conserved motif (hxhxxRxRhsxDsGx[ST] signature), which is retained even in the cognate versions captured by DndH.AF2 modeling suggests that this extension projects out from the core and is evidently available for interaction with some other protein.Most eukaryotic members of this clade are characterized by an N-terminal extension that appears to be derived from a wHTH and 1-2 further C-terminal wHTH domains.Given that these versions show no fusions / genomic linkages to nuclease domains, they are likely to mediate their function (such as mobility) by interacting with some conserved cellular component via the above-mentioned C-terminal extension ( Supplementary Data S1 ).

Functional and evolutionary implications
While previous studies have postulated that DndABCDE and DndFGH together constitute a unique PT modification and restriction system, a systematic survey of those results together with our current analysis raise serious questions about their functional coupling.Instead, our results favor a scenario wherein the two, form distinct and self-contained antiinvader conflict systems that might cooperate on occasion.Indeed, multiple studies from the growing diversity of such conflict systems, including R-M, CRISPR / Cas, SMODS and the NAD + -targeting apparatus, point to a degree of cooperation between them, wherein the failure of one system might be backed up by another system that is otherwise capable of independent action ( 5 , 9 , 90 , 110 ).

A revised model for the action of DndABCDE and associated nucleases
While our comprehensive survey indicated a high degree of decoupling between the DndABCDE and DndFGH modules (Figure 8 A, Supplementary Data S1 ), it recovered a strong gene-neighborhood association between the Dnd modification modules and several other endoDNases.One of these is the wHTH + SAD + HNH protein, wherein the wHTH domain corresponds to the domain previously shown to bind PT in DNA ( 80 ).Similarly, we found other endoDNases with potential modified DNA discriminant modules that could serve as restriction components that go with the modification module.Hence, we reconstruct a scenario wherein: (i) DndD / CxC ABC ATPase and DndE form a complex akin to the wellstudied ABC ATPase DNA manipulation complexes seen in chromosome dynamics and DNA repair / recombination ( 24 ).The different versions of DndE co-evolving with the cognate DndD / CxC ABC ATPase could account for diversity in the modification target motifs.DndE likely scans the self-genome in conjunction with ABC ATPase-catalyzed DNA-looping to recognize such sites.In the sspABCDE systems, where the ABC ATPase is replaced by an ORC / CDC6 clade AAA+ AT-Pase, the scanning likely occurs through an alternative mechanism.Akin to other ORC / CDC6-type ATPases, these are predicted to form toroidal rings around DNA duplexes and translocate along them together with the triple-wHTH SspB recognition component.(ii) Once DndE and DndD / CxC ABC complex bind to a target site, it either triggers the associated PT-modification apparatus (DndA and DndC) to incorporate sulfur into the DNA backbone or potentially signals its modification status.(iii) In the restriction step, the reader modules fused to the endoDNases, such as the PT-DNA recognizing wHTH domain, discriminate self from non-self DNA and trigger invader restriction through DNA cleavage (Figure 8 B).
Overlayed on this core process are several distinct regulatory and / or backup systems.First, 52% of all taxa with the Dnd modification system also code for the ParB superfamily protein DndB.While it has been proposed to be an ATP-binding transcription factor, other studies on ParB proteins in conflict suggest that they might act as antiviral effectors that degrade nucleotides or as nucleases ( 19 , 111 , 112 ).Indeed, DndB could also act similarly by guarding against failure either by targeting phage-generated modified nucleotides or suicidally by limiting nucleotides like ATP in the general pool.Further, it could also serve as a negative regulator of the modification system by limiting ATP needed for PT incorporation.Second, as we noted above, several of these systems also code for standalone RNases and DNases that could act as suicide effectors in the event of failure of the main restriction system (Figures 4 A, C, 8B ).Our investigations also raise a further question -whether the DndD and CXC-ABC-ATPasecentered systems exclusively facilitate PT modification.We found that PT modification genes are not found in association with the DndD-DndE core in numerous taxa.While it has been proposed that housekeeping homologs of the cysteine desulfurase (IscS, NifS, SufS) could substitute for DndA ( 15 ), there are no functionally equivalent, close PP-loop ATPases to take the place of DndC ( 18 ,51-54 ).Hence, it is possible that some of these systems mediate entirely different modifications.At least in the case of the CXC ABC-ATPase-centered systems, we detected a frequent coupling with cytosine methylases ( 24 ) suggesting that modified cytosines could substitute for PT as in conventional R-M systems (Figures 4 A: iv, 8 B).

DndFGH constitutes a standalone counter-invader defense system
Our comprehensive dissection of the domain architectures of the three universal components of this system allows for a revision of the possible model of its action (Figure 8 C).This analysis revealed several unprecedented features of this system: (i) Taken together, the three subunits field a large number of distinct DNA binding domains, viz., a triad of wHTH domains (DndG), a C-terminal domain of the SWI2 / SNF2 SF2 helicase module, a further HTH and two TRD domains (DndH).This unusual feature indicates that the system forms several distinct concomitant or successive contacts with DNA, some of which might be sequence-specific.(ii) Together, the system displays 3 enzymatic domains, namely a P-loop kinase (DndF), a REase and a HerA / FtsK class P-loop ATPase (DndH).While the P-loop kinase domain is rather derived, it possesses an intact NTP-binding site suggesting that, in the least, it can act as a sensor of nucleotides.The recognition that the HerA / FtsK ATPase domain of DndH is captured from various distinct systems featuring such ATPases is one of the most striking findings of this study.Such ATPases play a role not only in the segregation of the cellular chromosome and plasmids but also in pumping the DNA of various conjugative selfish elements and phages with internal membranes ( 101 ).Hence, given that they have been repeatedly acquired from various sources in different DndH proteins, distinct versions of this system might be able to target different plasmids and other selfish elements depending on the type of HerA / FtsK domain it carries.(iii) Despite having an active REase domain, it is notable that the DAM domain in this system is catalytically inactive.This suggests that rather than modifying the target sites, it could prevent modifications of such sites by methylases encoded in selfish elements by binding as a dominant negative.Finally, as with several other conflict systems, versions of this system display a previously unnoticed repertoire of variable backup effectors that might be activated in the event of system failure (Figure 8 C).
Based on the above, we propose that the DndFGH system adopts a multi-pronged sensory approach that allows it to deploy restriction activity of the REase domain despite lacking an inbuilt modification module.Key to this are the multiple DNA-binding domains that evidently mediate specific target recognition.Notably, the triple-wHTH DndG, which is related to SspB from the sspABCDE DNA-modification systems, might explain how a subset of DndFGH systems can use PT-modified DNA as a discriminant.In addition to DNA recognition, the P-loop kinase domain is predicted to enable the sensing of modified nucleotides.Further, the specific recognition / localization to invader molecules could occur via the α+ β domain in DndF and the two β-sandwich domains of DndH.Thus, we suggest that this system precisely homes in on invader DNA through a combination of macromolecular determinants and soluble signals of infection.Further speci-ficity in terms of the location of action might be provided by the HerA / FtsK domain -given the roles of the homologous versions captured by these systems, it likely intercepts invader DNA that is segregating during cell division (plasmids) or being pumped into recipient cells (conjugative elements) or packaged into capsids (phages).Finally, earlier studies have indicated that several phages and conjugative elements encode their own DAM methylases that are delivered alongside the invading DNA and modify it to evade host restriction (113)(114)(115).The inactive DAM domain from DndH could interfere with such an evasion mechanism by acting as a dominant negative that prevents the action of the invader methylases by rendering its target sites inaccessible (Figure 8 C).
'HerA / FtsK capture systems' are a more general theme in antiinvader conflict systems A striking result from this study is that DndH is one of several distinct systems that exhibit a similar phenomenon of incorporating HerA / FtsK domains from evolutionarily distinct sources (Figure 7 ).As in DndFGH systems, we propose that different versions of the HerA / FtsK domain from these systems, along with frequently associated HTH domains, allow the localization of the defense apparatus to specific invader elements even as they are engaged in the process of genome segregation, transfer, or packaging.Further, the HerA / FtsK domain might help translocate invader DNA through pumping action to bring suitable sites on the targeted genome to the restriction apparatus.Like DndH, these new systems feature active endoDNase domains that could act as their restriction component.Notably, the first of these newly recovered systems contains an inactive D5-type SF3 helicase domain that is reminiscent of the inactive SWI2 / SNF2 C-terminal P-loop domain in DndH.In both these cases, these inactive derivatives of helicases could help bind replication intermediates of the invasive elements that frequently form during transfer-coupled replication.In the second of these systems, the HerA / FtsK domain is coupled to AAA+ ATPases of the Orc / CDC6 family.Several studies in the past few years have revealed that beyond their role in the replication of cellular genomes, members of the Orc / CDC6 family are the mainstay of various biological conflict systems ( 103 ).On one hand, these include ATPase domains / subunits of transposases (e.g.TN7 and Mu) and a widespread class of enzymatic effectors (the eukaryotic CR-effectors and their prokaryotic relatives) derived from them that tend to combine the CDC6 / ORC1 clade NTPase domain with a C-terminal REase domain.On the other hand, this family also includes the STAND NTPases that are major players in immunity across the Tree of Life ( 102 ).We suggest that in the systems combining the ORC / Cdc6 and HerA / FtsK ATPase domains, the former might help the restriction apparatus localize to specific sites on the DNA, even as the latter intercepts the selfish element during segregation or transfer.

Conclusions
While the Dnd systems have been the subject of intense scrutiny over the past several years, there remain several open questions regarding their mode of action and role in defense against selfish elements.Furthermore, inaccurate claims have been made regarding the constituent protein domains of components of these systems.We addressed these issues, thereby providing an improved understanding of them.Our work argues for a high degree of functional decoupling between the DndABCDE modification module and the previously implicated DndFGH systems.Instead, we present evidence that the modification module more frequently associates in the same gene neighborhood with other endoDNase domains fused to DNA-binding domains, which might serve as the actual restriction components.We also present evidence for the frequent coupling of multiple copies of Dnd systems centered on the ABC ATPase and DndE-like dyad in the same genome.Several of these systems, especially those centered on the CXC ABC ATPase, lack neighboring genes for the usual PT modification enzymes.Instead, they display one or more cytosine methylases, suggesting that a subset of these systems might facilitate modifications other than PT.Furthermore, we definitively establish that all DndEs are composed of two RHH domains, dispelling earlier structural and functional misconceptions regarding this protein.We also present evidence for the diversification of DndE and its coevolution with its DndD / CxC ABC ATPase partner.This allowed us to propose a model where it acts analogously to the Kleisin or ScpB DNA-binding component observed in ABC ATPases mediating chromosome dynamics, enabling different Dnd sys-tems to recognize and / or modify highly divergent sites on the self-DNA.
We demonstrate that DndFGH constitutes a separate counter-invader system and identify several previously undetected domains in its subunits.This leads us to postulate that it receives multiple sensory inputs, both in the form of soluble nucleotides and invader macromolecules, to specifically deploy its REase domain despite lacking an inherent modification component.Moreover, we show for the first time that DndH contains a HerA / FtsK ATPase domain, which, in different versions of this protein, is acquired from diverse sources featuring the aforementioned domain.We term this phenomenon 'HerA / FtsK capture' and demonstrate that DndH is just one of several potential anti-invader systems incorporating HerA / FtsK ATPase domains from different sources.
Among the key testable hypothesis with potential for promising results is an investigation of the proposed 'cladespecific' DNA target sites for DndD / CxC-DndE pairs using assays to query DNA-binding specificity.Similar tests for DNA binding activity based on the domains identified here could also help test the recognition and discrimination of unmodified versus modified DNA, as well has the specificity of the domains for different modifications such a PT, 5mC or 5hmC.Regarding the DndFGH system, our results would help guide experiments to establish the specificity of the multiple DNA-binding domains in the overall complex and also test the hypotheses regarding the role of the nucleotide kinase domain in DndF and of inactive DAMs as dominant negative regulators interfering with invader methylases.The identification of the capture of HerA / FtsK domains in DndH and other systems not only opens new vistas to explore the activity of these ATPases but also suggests the use of subcellular localization studies to explore the role of invadergenome-interception during chromosome segregation, transfer or packaging.Hence, we hope that this comprehensive analysis will facilitate future wet-lab investigations not only of Dnd but also the broader spectrum of anti-invader systems discussed here.

Figure 1 .
Figure 1.Updated phyletic distribution of Dnd systems.The heatmap shows the overall phyletic distribution of Dnd genes where the bo x es are coloured based on the number of unique NCBI tax IDs at the species le v el against each phylum or taxonomic group shown on the Y-axis.The panel on the right shows the range and the respective colour code.The individual Dnd genes and their occurrence percentage are shown at the top.DndD and CxC ABC-ATPase distributions are presented separately.

Figure 2 .
Figure 2. Sequence-Str uct ure synapomorphies of DndE clades.( A-D ) R epresentativ e str uct ure topology diagrams sho wing k e y features: (A) a simple tri-helical HTH; (B) a canonical RHH MetJ / Arc-like dimer containing two RHH domains; (C) clade-1 DndE; (D) clade-2 DndE.Key residues on the core stabilizing helix and the antiparallel strands are depicted.( E-H ) R epresentativ e 3D str uct ures showing key str uct ural synapomorphies: (E) clade-1 DndE; (F) clade-2 DndE; (G) ball and stick representation of k e y residues forming the DNA binding interface and interactions between the strands and the helices in DndE; (H) superimposition of DndE and F-plasmid transfer factor TraY showing identical 3D str uct ural topology; ( I ) multiple sequence alignment showing consensus representative sequence of each DndE subclade where the background color code represents the mean column-wise Shannon entrop y v alues of the corresponding position obtained from separate multiple alignments for each subclade.The entropy of the o v erall alignment reflects the high sequence heterogeneity and f ast-e v olving aspects of DndEs.K e y residues that are moderately conserved within each subclade are color-coded based on their properties: positively charged (green); negatively charged (red); polar (light blue); hydrophobic (yellow); and small (grey).( J , K ) Subclade-wise entropy plots of DndEs.Shannon entropy data, computed using the regular 20 amino acid alphabets, are depicted above the zero line in orange shades.Shannon entropy data, computed using a reduced 8-residue alphabet (based on chemical properties), are displa y ed belo w the z ero line in blue shades.High entropy in both alphabets indicates potential positive selection for amino acids with diverse chemical characteristics at those positions.

Figure 3 .
Figure 3. Classification and congruent e v olutionary trajectories of the DndE and DndD / CxC pairs.( A ) Maximum-likelihood tree topologies of DndE and DndD-CxC are shown as mirror trees.The tree on the left shows the grouping of DndE sequences into two major clades (Clade-1 and Clade-2) and multiple subclades.Lik e wise, the tree topology on the right clearly distinguishes the operonically linked DndD and CxC into two major clades and multiple subclades, corresponding to Clade-1 and Clade-2 DndEs, respectively.Percentage bootstrap values obtained using 1000 replicates are shown for each major node.( B , C ) CLANS-based clustering using the pairwise 'one-to-one' alignment-based approach precisely reproduces the grouping of DndE and DndD / CxC, respectively, into major clades and subclades as shown in the phylogenetic tree topologies.

Figure 4 .
Figure 4. Gene-neighborhoods and phyletic distribution of operonically linked DndDE and CxC-DndE anchored systems and the rectified annotation of PbeABCD system.( A : i-iii) Representative gene-neighborhoods showing the tandem or immediately adjacent occurrences of (i) Dnd systems centered on DndDE and CxC-DndE core; (ii) DndDE pairs and other Dnd genes without the CxC-DndE pair; (iii) DndDE pairs and associated Dnd genes occurring along with the CxC-DndE anchored system.Gene neighborhoods are shown as box arrows, with the direction of each arrow pointing to the orientation of individual genes.The domain names and the domain architectures of each gene product are shown within bo x arro ws and are colored accordingly.The dotted lines at the bottom mark the boundaries of the DndDE (red) and CxC-DndE (blue) anchored systems.Each domain is assigned a unique color, with nuclease domains highlighted in the same color for clear distinction and prominent visibility within the neighborhoods.SP in the gene-neighborhoods denotes the small disordered protein (A: iv) R epresentativ e gene-neighborhoods showing the occurrence of cytosine methylases with the CxC-DndE anchored systems.( B ) Rectified annotation of the so-called pbeABCD system.Gene-neighborhoods were reconstructed using the same assembly as utilized in the earlier study and are mapped to the CxC-centered system.( C ) Representative gene neighborhoods showing the occurrence of diverse nucleases within the gene neighborhoods of the Dnd modification system.D) The phyletic distribution of various contextual connections and occurrences of Dnd modification systems across different phyla or tax onomic groups.T he Y-axis sho ws different categories: (i) the complete DndABCDE operon; the DndBDE triad while lacking DndA and DndC; only the DndDE dyad being present without other PT modification genes.The 'Duplication' category includes instances where DndD and CxC anchored systems are found sequentially within the same genome, as indicated in panel A. The 'Nucleases' category encompasses all occurrences of multiple distinct nucleases associated with the Dnd modification systems.As shown in Figure 1 , the bo x es are color-coded to represent the number of unique NCBI tax IDs at the species le v el within each ph ylum or tax onomic group.

Figure 5 .
Figure 5. Str uct ural feat ures and domain architect ures of DndG and DndF.( A , B ) 3D str uct ure and domain architect ures of DndG and DndF, respectively.( C-E ) str uct ural topology diagrams sho wing k e y features of individual domains present in the DndF.( F ) Ph yletic distribution of multiple domain fusions occurring N-terminal to the P-loop kinase domain of DndF.As shown in Figure 1 , the bo x es are color-coded based on the number of unique NCBI tax IDs at the species le v el against each phylum or taxonomic group.

Figure 6 .
Figure 6.Str uct ural feat ures and domain architect ures of DndH.( A ) Overall 3D str uct ure and representative domain architect ures of DndH showing all individual domains.(B-D) 3D str uct ure and str uct ure topology diagrams showing key features of inactive C-terminal domain of SWI2 / SNF2-type SF2 helicase module fused to an HTH domain ( B ), β-sandwich domains ( C ), and target recognition domain ( D ). ( E ) 3D str uct ure of TRD + iMtase + REase triad and its comparison with equivalent domains from BpuSI (Type II-G).(F-H) 3D str uct ure and str uct ure topology diagrams showing key features of inactive Mtase ( F ), REase ( G ) and HerA / FtsK ATPase ( H ).

Figure 7 .
Figure 7. Diversity and functional radiation of HerA / FtsK capture systems.Maximum-likelihood tree topology anchored on HerA / FtsK domain of DndH informs six distinct and well-supported clades.HerA / FtsK Clades 1-6 are colored separately, and corresponding bootstrap values obtained from 10 0 0 replicates are shown as percentages on major nodes.The ancestral counterparts of six DndH HerA / FtsK clades f ormed w ell-defined groups (highlighted as o v al) and are clustered basal to the corresponding DndH HerA / FtsK clades with high confidence support.T he conte xtual connections of the selected members of these ancestral counterparts are shown adjacent to each clade.The selected members are shown with filled circles with respective color codes of the cluster.As shown in Figure 2 , gene neighborhoods are shown as box arrows, and the domain architectures of each gene product are shown within box arrows and are colored accordingly.

Figure 8 .
Figure 8. Phyletic decoupling, new functional models, and the overall summary of DndABCDE with associated nucleases and DndFGH systems.( A ) Phyletic distribution showing different contextual occurrences and decoupling of the Dnd systems.As shown in Figure 1 , the boxes are color-coded based on the number of unique NCBI tax IDs at the species le v el against each phylum or taxonomic group.( B ) Schematic diagram showing the functional mechanisms of the newly proposed and h ypothesiz ed model of DndABCDE and associated nucleases incorporating both the modification and restriction components.( C ) Schematic diagram showing the hypothesized functional mechanisms of DndFGH operating as a standalone counter-invader defense system.