Noncanonical prokaryotic X family DNA polymerases lack polymerase activity and act as exonucleases

Abstract
The X family polymerases (PolXs) are specialized DNA polymerases that are found in all domains of life. While the main representatives of eukaryotic PolXs, which have dedicated functions in DNA repair, have been studied in detail, the functions and diversity of prokaryotic PolXs have remained largely unexplored. Here, by combining a comprehensive bioinformatic analysis of prokaryotic PolXs with biochemical experiments on selected recombinant enzymes, we reveal a previously unrecognized group of PolXs that appear to lack DNA polymerase activity. These noncanonical PolXs contain substitutions of the key catalytic residues and deletions in their polymerase and dNTP binding sites in the palm and fingers domains, but, like canonical PolXs, they contain functional nuclease domains. We demonstrate that representative noncanonical PolXs from the genus Deinococcus are indeed inactive as DNA polymerases but are highly efficient as 3′-5′ exonucleases. We show that both canonical and noncanonical PolXs are often encoded together with components of the non-homologous end joining pathway and may therefore participate in double-strand break repair, suggesting evolutionary conservation of this PolX function. This is a remarkable example of polymerases that have lost their main polymerase activity but retain accessory functions in DNA processing and repair.

Eukaryotic PolXs take part in base excision repair (BER) and in double-strand break (DSB) repair during nonhomologous end joining (NHEJ) and V(D)J recombination. The in vitro activities of these polymerases, which are likely important for their in vivo functions, include gap-filling DNA polymerase and dRP-lyase activities (found in Pol β, Pol λ and Pol μ) (Figure 1A) and template-independent polymerase and end-bridging activities (found in TdT, Pol λ and Pol μ) (3,27–30). Similarly to Pol β, prokaryotic PolXs have gap-filling activity and likely 5′-dRP-lyase activity (23,25,31,32). In addition, prokaryotic PolXs were reported to possess a 3′-5′ exonuclease activity that may play a role in DNA replication and repair, and an AP-endonuclease activity potentially involved in the BER pathway (Figure 1A) (23,25,33,34).
The characteristic feature of PolX polymerases is the conserved Pol β fold of their catalytic palm domain, which belongs to the ancient superfamily of nucleotidyltransferases (a variant of this fold is also found in the C family of DNA polymerases) (21,35,36). Like other DNA polymerases, all PolXs form a common "right hand" structure, whose central polymerase part consists of the thumb, palm and fingers domains involved in DNA and dNTP binding and catalysis (Figure 1B). In both eukaryotic and prokaryotic PolXs, these domains are preceded by an N-terminal '8 kDa' domain responsible for the 5′-dRP-lyase activity (31,37). Eukaryotic Pol λ, Pol μ and TdT additionally contain a BRCA1 C-terminal (BRCT) domain, which is involved in interactions with other proteins during NHEJ and V(D)J recombination (Figure 1B) (14,38–40). In contrast, all studied prokaryotic PolXs contain a C-terminal polymerase and histidinol phosphatase (PHP) domain, which is responsible for the 3′-5′ exonuclease and AP-endonuclease activities and is absent in eukaryotic PolXs (Figure 1B and D) (23,25,33,41). During DNA polymerization, the N-terminal, thumb, palm and fingers domains embrace the DNA substrate, bending it and positioning the template nucleotide in the active site, as seen in the complexes of human Pol β, Pol and T. thermophilus PolX with gapped DNA substrates (Figure 1C, D and E) (32,42,43).
Despite the large number of sequenced prokaryotic genomes, the diversity of bacterial and archaeal PolXs has remained largely uninvestigated. The biological functions of PolXs in prokaryotes, their potential roles in various DNA repair pathways, and their interaction partners remain mostly unknown. In this study, we present an in-depth bioinformatic analysis of prokaryotic PolXs and interpret it in the context of the structural and biochemical data available for bacterial polymerases of this family. We show that most prokaryotic PolXs share a common domain architecture, but a significant fraction of them have a noncanonical active site structure and probably lack polymerization activity. At the same time, the noncanonical PolXs contain the highly conserved PHP exonuclease domain and the predicted N-terminal lyase domain. Our findings are corroborated by biochemical analysis of two noncanonical polymerases from Deinococcus species. We reveal a possible association of PolX genes with components of the non-homologous end joining pathway and propose that prokaryotic PolXs likely have accessory functions in DNA repair beyond DNA synthesis.

Sequence and genomic analyses of prokaryotic PolXs and phylogenetic trees
Protein sequences, genomic sequences and annotations of prokaryotic genomes were downloaded from the NCBI FTP site in December 2019. Prokaryotic PolX sequences were searched for using the B. subtilis, T. thermophilus and D. radiodurans PolXs as queries. Homologs of known PolX proteins were identified with the PSI-BLAST and DELTA-BLAST programs from the NCBI-BLAST package, v.2.6.0. The search was performed with five iterations, yielding 6362 unique PolX sequences from 6239 genomes. To construct a non-redundant, representative sequence set for phylogenetic and sequence analysis, the PolX sequences were clustered with UCLUST 4.2 (44) at a sequence identity threshold of 95%, resulting in a collection of 2935 unique sequences from 2846 genomes. The sequences were aligned with MAFFT v.7.450 (--genafpair --maxiterate 1000) (45). The resulting alignment is provided in the Supplementary Dataset (2935 polX.fasta).
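The redundancy-reduction step can be illustrated with a simplified greedy centroid clustering. This is not UCLUST: it is a minimal sketch that assumes the sequences are already aligned to a common length and measures identity only over columns where neither sequence has a gap; the function names `pairwise_identity` and `greedy_cluster` are hypothetical. Each sequence, visited from longest to shortest, joins the first representative it matches at or above the threshold (0.95, as in the text) or becomes a new representative.

```python
from typing import Dict, List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns are ignored in this simplified identity measure
        compared += 1
        identical += x == y
    return identical / compared if compared else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy centroid clustering of pre-aligned sequences at a fixed identity threshold.

    Sequences are visited from longest to shortest (ungapped length). Each one joins the
    first existing centroid it matches at >= threshold identity; otherwise it becomes a
    new centroid. Keys of the returned dict are the centroids (representatives).
    """
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for name in order:
        for centroid in clusters:
            if pairwise_identity(seqs[name], seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


toy = {"a": "MKVLD-AA", "b": "MKVLD-AS", "c": "MRILEGTS"}
print(greedy_cluster(toy, threshold=0.8))
```

In this toy example, sequences a and b share 6 of 7 compared residues (about 86% identity), so they merge at a threshold of 0.8 but remain separate representatives at 0.95.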
To identify domains and functionally important residues, the sequences of human Pol β, T. thermophilus PolX and B. subtilis PolX were used as references. The alignment columns corresponding to positions of interest were collected and analyzed in R. To estimate the lengths of PolX domains, the parts of the alignment corresponding to Pol β residues 178-279 (palm) and 272-314 (fingers) were extracted and analyzed using the R packages Biostrings, seqinr, ape, protr and tidyr. Structural modeling was performed with the Colab version of AlphaFold2; visualizations were made with PyMOL (https://pymol.org/) and VMD (46,47).
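Estimating domain lengths from the alignment amounts to translating reference residue numbers (here, Pol β numbering) into alignment columns and counting the non-gap characters of each sequence within those columns. The authors did this in R; the Python sketch below only illustrates the idea, and the helpers `ref_to_columns` and `domain_length` are hypothetical names. It assumes the reference row is numbered consecutively from `first_residue`.

```python
from typing import Dict


def ref_to_columns(ref_aligned: str, first_residue: int = 1) -> Dict[int, int]:
    """Map reference residue numbers to 0-based columns of the aligned reference row."""
    mapping: Dict[int, int] = {}
    residue = first_residue - 1
    for col, ch in enumerate(ref_aligned):
        if ch != "-":
            residue += 1
            mapping[residue] = col
    return mapping


def domain_length(seq_aligned: str, ref_map: Dict[int, int], first: int, last: int) -> int:
    """Residues (non-gap characters) of seq_aligned in the columns spanned by
    reference residues first..last, inclusive."""
    lo, hi = ref_map[first], ref_map[last]
    return sum(1 for ch in seq_aligned[lo:hi + 1] if ch != "-")


ref_map = ref_to_columns("MK-LVDE")              # toy reference row
print(domain_length("MKALV-E", ref_map, 2, 5))   # residues of a toy sequence in ref 2..5
```

With a real alignment, `domain_length(row, ref_to_columns(pol_beta_row), 178, 279)` would give the palm length of `row` under the boundaries stated above, where `row` and `pol_beta_row` are hypothetical variables holding aligned sequences.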
The phylogenetic tree shown in Figure 2 was constructed from an alignment of 1433 unique PolX sequences obtained by further clustering the non-redundant collection of 2935 sequences with MMseqs2 (version 13.45111) at a sequence identity threshold of 80% (46–48). Positions containing >50% gaps were removed from the alignment with trimAl v1.2 (49). The tree was built with IQ-TREE version 2.1.4-beta and visualized with iTOL (https://itol.embl.de/shared/p5XbdEu6WXeO) (50). Local support values were calculated with the IQ-TREE ultrafast bootstrap test with 1000 replicates (51). The alignment of 1433 PolXs and the resulting tree in machine-readable format are provided in the Supplementary Dataset (1433 polX aln trim.fasta and 1433 polX tree.contree).
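The gap-based column filter can be sketched as follows. This is an illustrative re-implementation of the >50% gap criterion described above, not trimAl itself, and `drop_gappy_columns` is a hypothetical name. Columns in which exactly half of the sequences are gapped are kept, because the criterion removes only columns with more than 50% gaps.

```python
from typing import List


def drop_gappy_columns(alignment: List[str], max_gap_fraction: float = 0.5) -> List[str]:
    """Remove columns in which more than max_gap_fraction of the sequences have a gap ('-')."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = [
        col for col in range(width)
        if sum(s[col] == "-" for s in alignment) / n <= max_gap_fraction
    ]
    return ["".join(s[col] for col in keep) for s in alignment]


print(drop_gappy_columns(["AC-GT", "A--GT", "ACT-T", "A---T"]))
```

In this toy alignment only the third column (gapped in three of four sequences) is removed.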
For analysis of the genomic neighborhoods of PolX genes, only complete genomes corresponding to the non-redundant collection of PolXs were selected. In total, we identified 2620 complete genomes corresponding to 1249 non-redundant PolX sequences (1074 canonical and 175 altered). For each genome, we selected genes within five genes upstream and five genes downstream of the PolX gene that were co-directional with it and located within 300 bp of each other. The resulting databases were analyzed in R, and the number of genomes containing each Pfam family was calculated separately for canonical and altered PolXs from bacteria of various taxonomic classes. Classes with fewer than four genomes and Pfam families with a frequency below 10% in the class were excluded from the resulting table (Supplementary Table S4).
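One reading of the neighborhood rule is sketched below: starting from the PolX gene, walk outward in each direction, keeping up to five genes per side as long as each gene is on the same strand as PolX and lies no more than 300 bp from the previously kept gene. The exact stopping rule used by the authors is not specified, so this contiguity interpretation is an assumption, as are the hypothetical `Gene` record and `neighborhood` function. Because both sides use the same limit, the sketch does not need to distinguish upstream from downstream for minus-strand genes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"


def neighborhood(genes: List[Gene], polx_index: int,
                 max_genes: int = 5, max_gap: int = 300) -> List[Gene]:
    """Co-directional neighbours of genes[polx_index] on one contig.

    `genes` must be sorted by start coordinate. Moving away from the PolX gene on
    each side, a neighbour is kept while it is on the PolX strand and separated by
    at most `max_gap` bp from the previously kept gene; at most `max_genes` genes
    are kept per side.
    """
    polx = genes[polx_index]
    selected: List[Gene] = []
    for step in (-1, 1):  # left side, then right side
        prev, i, taken = polx, polx_index + step, 0
        while 0 <= i < len(genes) and taken < max_genes:
            g = genes[i]
            # intergenic distance between g and the previously kept gene
            gap = g.start - prev.end - 1 if step == 1 else prev.start - g.end - 1
            if g.strand != polx.strand or gap > max_gap:
                break
            selected.append(g)
            prev, i, taken = g, i + step, taken + 1
    return sorted(selected, key=lambda gene: gene.start)


contig = [
    Gene("A", 1, 900, "+"), Gene("B", 1000, 1800, "+"),
    Gene("polX", 1900, 3600, "+"), Gene("D", 3700, 4500, "+"),
    Gene("E", 5000, 5600, "+"), Gene("F", 5650, 6000, "-"),
]
print([g.name for g in neighborhood(contig, polx_index=2)])
```

Here gene E is excluded because it lies 499 bp from D, and F would be excluded in any case because it is on the opposite strand.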

Cloning, expression and purification of PolXs from Deinococci and B. subtilis
The PolX gene of D. radiodurans (GenBank ID WP_010887112.1) was amplified from D. radiodurans genomic DNA using primers Dra NdeI 5′-CATTAGCATATGACCCTGCCGCCCGACGC and Dra SalI 5′-TATCTAGTCGACTTATGCACGGTCCGCCGGGCCG and cloned into pET28a between the NdeI and SalI sites. The PolX gene of Deinococcus gobiensis (GenBank ID AFD24201.1) was codon-optimized and obtained by custom gene synthesis from IDT (as two overlapping gBlocks Gene Fragments). The synthetic gene was amplified using primers Dgo F 5′-CTTTAAGAAGGAGATATACATATGCATCAC and Dgo R 5′-CTTGTCGACGGAGCTCGAATTCGTCATTACCCAGCGTTTGCACGTGC and cloned in the same way. The PolX gene of B. subtilis (GenBank ID WP_063335829.1) was amplified from B. subtilis genomic DNA using primers Bsu NdeI 5′-cggcagccatATGCATAAAAAAGATATTATCCGGC and Bsu SalI 5′-gcttgTCGACTTAATCGTTGCGCTTCAGAAAT and cloned in the same way. Mutations in the polymerase catalytic triad of D. radiodurans PolX (E199A/E234A) and B. subtilis PolX (D193A) and in the PHP active site of D. radiodurans PolX (H332A/H334A) were introduced into the expression plasmids by Kunkel mutagenesis. All plasmid clones were verified by Sanger sequencing.
Nucleic Acids Research, 2022, Vol. 50, No. 11 6401
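As a quick sanity check on the cloning design, the primer sequences above can be scanned for the NdeI (CATATG) and SalI (GTCGAC) recognition sites used for insertion into pET28a. The sketch below is illustrative only; it assumes that any spaces or hyphens in the printed primers are layout artifacts and removes them, and the names `PRIMERS`, `normalize` and `find_sites` are hypothetical.

```python
import re
from typing import Dict

# Recognition sequences of the two enzymes used for cloning into pET28a.
# Both sites are palindromic, so they read the same on either strand.
SITES = {"NdeI": "CATATG", "SalI": "GTCGAC"}

# Primer sequences as given in the Methods, written 5'->3'.
PRIMERS = {
    "Dra NdeI": "CATTAGCATATGACCCTGCCGCCCGACGC",
    "Dra SalI": "TATCTAGTCGACTTATGCACGGTCCGCCGGGCCG",
    "Dgo F": "CTTTAAGAAGGAGATATACATATGCATCAC",
    "Dgo R": "CTTGTCGACGGAGCTCGAATTCGTCATTACCCAGCGTTTGCACGTGC",
    "Bsu NdeI": "cggcagccatATGCATAAAAAAGATATTATCCGGC",
    "Bsu SalI": "gcttgTCGACTTAATCGTTGCGCTTCAGAAAT",
}


def normalize(seq: str) -> str:
    """Upper-case the primer and drop anything that is not A/C/G/T (spaces, hyphens)."""
    return re.sub(r"[^ACGT]", "", seq.upper())


def find_sites(seq: str) -> Dict[str, int]:
    """0-based position of the first occurrence of each recognition site in the primer."""
    s = normalize(seq)
    return {enzyme: s.find(site) for enzyme, site in SITES.items() if site in s}


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

Each forward primer should report an NdeI site and each reverse primer a SalI site; because both sites are palindromic, searching the primer as written is sufficient regardless of its orientation.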
E. coli BL21(DE3) cells were transformed with the PolX plasmids, and several colonies were inoculated into 1 L of LB medium with 100 μg/ml ampicillin, grown at 37 °C to OD600 ∼0.5 and chilled on ice for 30 min. IPTG was added to 0.05 mM, and the culture was grown at 16 °C overnight. The cells were pelleted by centrifugation, resuspended in buffer containing 30 mM HEPES-KOH pH 7.4, 1 M KCl, 10 mM K2HPO4, 5% glycerol, 4 mM β-mercaptoethanol and 2 mM PMSF, and lysed with a high-pressure homogenizer at 4 °C. The lysate was cleared by centrifugation at 15 000 rpm in a Hitachi CR22N centrifuge at 4 °C and loaded onto a 1 ml Ni-Sepharose column (GE Healthcare) equilibrated with buffer containing 30 mM HEPES-KOH pH 7.4, 1 M KCl, 10 mM K2HPO4 and 5% glycerol. The column was washed with the same buffer containing 20 mM imidazole, and PolX was eluted with the same buffer containing 300 mM imidazole. Fractions containing PolX were pooled, diluted tenfold with the same buffer without KCl and loaded onto a 1 ml Heparin-Sepharose column (GE Healthcare) equilibrated with the same buffer containing 80 mM KCl. PolX was eluted with a KCl gradient from 80 to 800 mM (40 ml). Fractions containing PolX were pooled, diluted fivefold with the same buffer without KCl and loaded onto a 1 ml MonoQ column (GE Healthcare) equilibrated with the same buffer containing 50 mM KCl. PolX was eluted with a KCl gradient from 80 to 800 mM; fractions containing PolX were pooled, aliquoted, frozen in liquid nitrogen and stored at -80 °C. The purity of the samples was at least 98% based on SDS-PAGE analysis (Supplementary Figure S3A). The stability of the B. subtilis and D. radiodurans PolXs (1.6 and 0.3 mg/ml, respectively) in phosphate buffer, pH 7.5, was measured by thermal unfolding on a Tycho NT.6 instrument (NanoTemper Technologies, Germany).
Circular dichroism spectra of the same PolX preparations were measured on a Chirascan CD spectrometer (Applied Photophysics) with a 2 nm bandwidth at 22 °C (Supplementary Figure S3B).

In vitro analysis of PolX activities
To obtain DNA substrates for analysis of the DNA polymerase and exonuclease activities of PolXs (Figure 4B), 5′-32P-labeled primer (400 nM) and unlabeled template (440 nM) oligonucleotides were annealed in 100 mM KCl (5 min incubation at 70 °C followed by cooling to 25 °C at ∼1 °C/min). For assembly of gapped substrates, a third, downstream non-template oligonucleotide carrying a 5′-P or 5′-OH group was added (440 nM).
PolXs were first incubated for 5 min at 30 °C in the reaction buffer containing 30 mM HEPES-KOH pH 7.

Phylogenetic and structural diversity of prokaryotic PolXs
To analyze the diversity of prokaryotic PolXs, we searched the NCBI RefSeq genomic database for PolX sequences based on homology with previously studied bacterial PolXs. In total, we identified 6362 PolX sequences in about 13% of bacterial and 31% of archaeal complete genomes. For further analysis, we used a non-redundant collection of PolX sequences with <95% identity, which contained 2935 unique polymerases, 2639 from bacteria and 296 from archaea (Figure 2A). The number of polymerases found in different bacterial and archaeal classes was highly uneven, partly because the numbers of sequenced genomes differ greatly between phyla (Figure 2C). Class Bacilli (734 sequences), which contains many sequenced genomes of important human pathogens and commensals, was the most abundant among Bacteria; other abundant classes included bacteria from the human microbiome. Class Halobacteria/Haloarchaea (224 sequences), which contains common laboratory models, was the most abundant among Archaea; 288 of the 296 archaeal sequences belonged to the phylum Euryarchaeota.
To identify key structural and functional motifs of PolXs, we performed a multiple sequence alignment of the PolX sequences and defined the boundaries of individual protein domains using T. thermophilus, B. subtilis and human PolXs as references (see Materials and Methods) (Supplementary Dataset). The mean length of all prokaryotic PolXs is 573.1 ± 23.7 amino acid residues and the median is 573 residues; only 10 sequences are shorter than 400 residues and only 9 are longer than 700 residues, indicating that the collection consists largely of full-length PolXs (Figure 2D). The overall domain arrangement is well conserved in prokaryotic PolXs, and most of them contain five structurally distinct domains, from the N- to the C-terminus: the 8 kDa dRP-lyase, thumb, palm, fingers and 3′-5′ exonuclease PHP domains (Figure 1B). However, we revealed significant variation in the structure of the DNA polymerase domains involved in catalysis, as described in detail below.
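The length statistics quoted above can be computed from a list of sequence lengths with Python's standard `statistics` module. This is a minimal sketch: it assumes that the ± value is a sample standard deviation (the text does not say), and `length_summary` and its cutoff parameters are hypothetical names mirroring the 400- and 700-residue thresholds.

```python
import statistics
from typing import Dict, Iterable


def length_summary(lengths: Iterable[int], short_cutoff: int = 400,
                   long_cutoff: int = 700) -> Dict[str, float]:
    """Mean, sample SD, median and counts of unusually short or long sequences."""
    values = list(lengths)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # assumes the reported +/- value is an SD
        "median": statistics.median(values),
        "n_short": sum(v < short_cutoff for v in values),
        "n_long": sum(v > long_cutoff for v in values),
    }


print(length_summary([380, 560, 573, 580, 720]))  # toy lengths, not data from the study
```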
Analysis of the maximum likelihood phylogenetic tree built from the amino acid alignment of PolXs showed that some bacterial phyla are split and interleaved in the PolX tree (Figure 2A). In particular, a substantial number of PolX sequences from Deinococcus, Proteobacteria and Firmicutes are closely related to PolXs from Bacteroidetes, while the remaining PolXs from these phyla form monophyletic groups or are related to PolXs from other phyla (Figure 2A). These data suggest horizontal transfer of PolX genes between bacterial phyla. The largest group of archaeal PolXs has a monophyletic origin and is distantly related to bacterial PolXs from Actinobacteria (Figure 2A). In addition, several smaller groups of archaeal PolXs are found in other branches of the PolX tree, interleaved with bacterial sequences. This indicates that some archaea may have acquired the PolX gene by horizontal transfer from bacteria, which is not uncommon in archaea in general and in Haloarchaea in particular (53,54).

Noncanonical PolXs have an altered catalytic site in the palm domain
The palm domain of PolX belongs to the Pol β-like nucleotidyltransferase superfamily (21) and has an αβαββαβββ topology, in which five β strands form a single mixed β sheet carrying three conserved acidic residues (usually three aspartates) on adjacent β strands; these residues bind the catalytic metal ions (Figure 3A, B). Most prokaryotic PolXs (2164; 72.5% in our dataset) contain three acidic residues (aspartate or, rarely, glutamate) at the corresponding positions and probably retain DNA polymerase activity (Figure 2B). We classify these PolXs as canonical polymerases. Surprisingly, besides the prevailing PolX variants with the canonical catalytic triad, we identified a group of polymerases (809; 27.5%) that partially or totally lack the conserved acidic residues in the polymerase active site. We classify these polymerases as altered or noncanonical PolXs (Figures 2 and 3A, B). Variations of the active site motif in the noncanonical polymerases include substitutions of one, two or all three aspartate residues with uncharged or even positively charged residues, and comprise 302 unique variants (Figure 2B, Supplementary Table S1, Figure 3A). The substituting residues include, among others, lysine, arginine, threonine, valine and alanine, and in most cases they substantially change the electrostatic environment of the active site region. This diversity suggests that the initial inactivation of the polymerase site in noncanonical polymerases was followed by additional substitutions in the now non-functional active site, generating many triad variants.
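The canonical/noncanonical split described above reduces to a simple rule on the three residues read from the catalytic-triad columns: canonical if all three are acidic (D or E), noncanonical otherwise. The sketch below is a hypothetical illustration of that rule (`classify_triad` and `noncanonical_variants` are not from the authors' code); a gap character at a triad position is treated as noncanonical.

```python
from collections import Counter
from typing import Iterable

ACIDIC = frozenset("DE")


def classify_triad(triad: str) -> str:
    """'canonical' if all three catalytic positions carry D or E, otherwise 'noncanonical'."""
    if len(triad) != 3:
        raise ValueError("expected the three residues of the catalytic triad")
    return "canonical" if all(aa in ACIDIC for aa in triad.upper()) else "noncanonical"


def noncanonical_variants(triads: Iterable[str]) -> Counter:
    """Count each distinct noncanonical triad, e.g. to estimate the number of unique variants."""
    return Counter(t.upper() for t in triads if classify_triad(t) == "noncanonical")


# Triads mentioned in the text: D. radiodurans (AEE), D. gobiensis (ARE), M. wenxiniae (DAR)
for triad in ["DDD", "DDE", "AEE", "ARE", "DAR"]:
    print(triad, classify_triad(triad))
```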
Noncanonical polymerases with altered catalytic triads are found in different bacterial phyla, but most of them form a single cluster on the phylogenetic tree and likely have a monophyletic origin (Figure 2A). This cluster is formed mainly by PolXs from the Bacteroidetes and Deinococcus-Thermus phyla, and most PolXs from these phyla are noncanonical. Interestingly, in the class Deinococci, which includes the previously studied PolXs from D. radiodurans and T. thermophilus, altered polymerases belong mostly to the genera Meiothermus and Deinococcus, whereas canonical polymerases belong mostly to the genus Thermus. The main cluster of noncanonical polymerases also contains several PolXs from other phyla, including Proteobacteria, as well as two polymerases from Acidobacteria and Rhodothermaeota. The phylogenetic relatedness of most noncanonical polymerases suggests a common evolutionary origin, while the presence of related PolX variants in unrelated bacterial lineages indicates horizontal transfer, as for canonical PolXs (19).
In addition, several smaller groups of noncanonical PolXs, separate from the main cluster, are found in bacteria and archaea (Figure 2A). In bacteria, many of these polymerases are found in Firmicutes. Among Archaea, 11 of 12 altered polymerases (from the non-redundant collection of PolX sequences) belong to the class Methanomicrobia and also form a clade separate from the majority of noncanonical PolXs. This suggests that noncanonical PolXs with substitutions in the polymerase active site have likely arisen independently several times during evolution. Further research, including analysis of additional PolX sequences from many underrepresented prokaryotic phyla, is needed to fully understand the origin and evolution of noncanonical PolXs.
In addition to substitutions of the catalytic residues, many noncanonical PolXs bear deletions in the palm domain (Figures 2A and 3A). The palm domain length is nearly constant in canonical polymerases, with a mean domain size of 80.93 residues (95% CI 80.85-81.01) (Supplementary Figure S1). The overall protein length of noncanonical PolXs is also shifted toward smaller values relative to canonical polymerases (Figure 2D). A particular group of noncanonical PolXs containing the shortest palm domains, with deletions of 13-26 amino acids (mean domain size of 56.6 residues [95% CI 56.3-57.0]), clusters together on the PolX tree (red sector in the main cluster of noncanonical PolXs in the palm ring in Figure 2A, highlighted with a dashed line in Supplementary Figure S1). Most noncanonical polymerases with truncated palm domains (<77 amino acids) belong to the classes Alphaproteobacteria (86, the group with the shortest palm variants, 54% of truncated PolXs), Deinococci (23, 14.5%) and Saprospiria (13, 8.2%) (Figure 2A).
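The palm-length statistics can be sketched in the same spirit. The method used for the confidence intervals is not stated, so the sketch assumes a normal approximation (mean ± 1.96·SD/√n); the truncation cutoff of 77 residues is taken from the text, while `mean_ci95` and `truncated_share_by_class` are hypothetical helpers operating on per-sequence palm lengths and class labels.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def mean_ci95(values: List[float]) -> Tuple[float, float, float]:
    """Mean with a normal-approximation 95% CI: mean +/- 1.96 * SD / sqrt(n)."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def truncated_share_by_class(palm_lengths: Dict[str, int], taxon_class: Dict[str, str],
                             cutoff: int = 77) -> Dict[str, float]:
    """Fraction of all truncated palms (length < cutoff) contributed by each class."""
    truncated = [pid for pid, length in palm_lengths.items() if length < cutoff]
    if not truncated:
        return {}
    counts = Counter(taxon_class[pid] for pid in truncated)
    return {cls: n / len(truncated) for cls, n in counts.items()}


# Toy input, not data from the study
palms = {"p1": 56, "p2": 60, "p3": 81, "p4": 70}
classes = {"p1": "Alphaproteobacteria", "p2": "Alphaproteobacteria",
           "p3": "Bacilli", "p4": "Deinococci"}
print(mean_ci95([80, 81, 82, 81]))
print(truncated_share_by_class(palms, classes))
```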
In the two solved structures of prokaryotic PolXs, from T. thermophilus and D. radiodurans, the palm domain adopts the classical Pol β nucleotidyltransferase fold (Figure 3B, C) (28,32,35,55–57). The overall organization of the T. thermophilus PolX active site is very similar to that of human Pol β, whereas D. radiodurans PolX differs significantly and contains an altered catalytic triad, AEE (56). Furthermore, D. radiodurans PolX has a 7-amino-acid deletion relative to T. thermophilus PolX, which significantly shortens β strand E (Figure 3B and C). A structural model of Mesorhizobium wenxiniae PolX, which contains a DAR triad in the catalytic site, reveals an even more drastic deletion (26 residues) in the palm domain, in particular of α helix M and β strands E and F (Figure 3B and C).
It should be noted that the amino acid context of the substituted triad residues is well conserved in noncanonical PolXs, which allows their unambiguous identification in most sequences (Figure 3A, Supplementary Dataset). Furthermore, the substitutions and deletions in noncanonical PolXs are unlikely to be sequencing artifacts, since they occur specifically in the palm and fingers domains but not in other parts of PolX (see below). Moreover, most noncanonical PolXs cluster together on the phylogenetic tree, suggesting an evolutionary relationship (Figure 2A). In addition, we Sanger-sequenced the PolX genes of several bacterial species from our laboratory collection, including Flectobacillus major, Belliella baltica and Pedobacter insulae, all of which encode PolXs with altered catalytic triads and truncated domains. In all cases, the reported changes were present in the sequences, confirming the correctness of the PolX sequences deposited in the genomic database.
The importance of the dNTP binding residues for catalysis was also confirmed in studies of prokaryotic PolXs. In particular, substitutions of residues corresponding to Pol β N279 in PolXs from B. subtilis and T. thermophilus (N263 and S266, respectively) significantly affected nucleotide incorporation (32,55,68). Interestingly, both B. subtilis and T. thermophilus PolXs contain a lysine residue in place of Pol β D276 (K260 and K273, respectively). This residue was proposed to stabilize the incoming nucleotide, and its substitutions lowered the affinity of prokaryotic PolXs for dNTP substrates (32,68). The presence of a lysine at this position may explain the ability of T. thermophilus PolX to form a stable complex with dNTP in the absence of DNA and may favor an unusual mechanism of nucleotide incorporation in which dNTP binding precedes DNA binding (32). Notably, a basic residue at this position is also present in TdT and Pol but not in Pol and viral ASFV PolX, all of which can also bind dNTP in the absence of DNA and are capable of non-templated DNA synthesis, suggesting that this residue is not the sole determinant of such interactions (20,69–72).
Our analysis demonstrated that the asparagine residue corresponding to Pol β N279 is highly conserved in canonical PolXs (92.87% N) and is substituted by a hydrophobic residue in almost all altered polymerases (Supplementary Table S2). The basic residue at the position corresponding to Pol β D276 is also conserved among canonical polymerases (K 84.8%, R 6.8%), suggesting that most of them use a similar mechanism of dNTP binding. In contrast, this residue is not conserved in noncanonical polymerases (K 7.6%, R 4.2%) and is often substituted with E (24.1%; including PolX from D. radiodurans), A (21.5%) or P (14.9%) (Supplementary Table S2). Furthermore, residues corresponding to the cis-peptide bond motif 274-GS-275 in the fingers and residue R183 in the palm domain of Pol β are highly conserved in canonical prokaryotic PolXs (97.6% and 100%, respectively), suggesting that these polymerases preserve a functional conformation of the fingers domain during catalysis. In contrast, the GS motif is much less conserved in noncanonical polymerases (42%) and is often substituted with GN, AS or AA. Finally, the steric gate motif is found in most canonical PolXs (YF in 54.3% and HF in 40% of sequences) but is not conserved at all in altered polymerases (Supplementary Table S2). Together, the lack of conservation of key dNTP binding site residues in noncanonical polymerases suggests that they have an impaired ability to coordinate incoming nucleotides in the active site.
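Conservation figures such as "92.87% N" or the GS motif frequencies are residue tallies at one alignment column (or a few columns read together) over a set of aligned sequences, computed separately for canonical and noncanonical PolXs. A minimal sketch, assuming the column indices have already been mapped from Pol β numbering; the helper name `motif_frequencies` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def motif_frequencies(alignment: List[str], columns: Sequence[int]) -> Dict[str, float]:
    """Percentage of sequences carrying each residue combination at the given columns."""
    counts = Counter("".join(seq[c] for c in columns) for seq in alignment)
    total = len(alignment)
    return {motif: 100.0 * n / total for motif, n in counts.most_common()}


toy_aln = ["GSAN", "GSAN", "GNAL", "AAAN"]       # toy aligned sequences
print(motif_frequencies(toy_aln, [0, 1]))        # two-column motif, e.g. a GS-like pair
print(motif_frequencies(toy_aln, [3]))           # single column, e.g. an N279-like position
```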
Many altered polymerases also have a truncated fingers domain compared with canonical PolXs (Figure 2A, Supplementary Figure S1). The deletions can remove up to three successive β strands and part of an α helix, as revealed by structural modeling of PolXs from Meiothermus silvanus and its relatives (Figure 3D). In the DNA complexes of T. thermophilus PolX and Pol β, this part of the fingers domain interacts with the template DNA strand, and its absence in noncanonical polymerases may affect their interactions with DNA (Figure 3D). Truncation of the fingers domain often accompanies deletions in the palm domain. In total, we found 55 such 'double-truncated' polymerases (palm < 77 amino acid residues, fingers < 55 residues) among the 2935 non-redundant PolX sequences, all of which also have altered catalytic triads. Double-truncated polymerases are abundant in the phyla Bacteroidetes (29 PolX variants) and Deinococcus-Thermus (17 PolX variants).
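The 'double-truncated' criterion combines the two length cutoffs given above (palm < 77 and fingers < 55 residues). The sketch below tallies such sequences by phylum; the `PolXRecord` type and both functions are hypothetical, and the domain lengths are assumed to have been measured beforehand from the alignment.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PolXRecord:
    seq_id: str
    phylum: str
    palm_len: int     # residues in the palm-domain columns
    fingers_len: int  # residues in the fingers-domain columns


def is_double_truncated(r: PolXRecord, palm_cutoff: int = 77, fingers_cutoff: int = 55) -> bool:
    """True if both the palm and the fingers domain fall below their cutoffs."""
    return r.palm_len < palm_cutoff and r.fingers_len < fingers_cutoff


def double_truncated_by_phylum(records: Iterable[PolXRecord]) -> Counter:
    """Number of double-truncated sequences per phylum."""
    return Counter(r.phylum for r in records if is_double_truncated(r))


# Toy records, not data from the study
recs = [
    PolXRecord("x1", "Bacteroidetes", 57, 40),
    PolXRecord("x2", "Bacteroidetes", 81, 60),
    PolXRecord("x3", "Deinococcus-Thermus", 70, 50),
    PolXRecord("x4", "Firmicutes", 70, 60),
]
print(double_truncated_by_phylum(recs))
```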
Overall, these results indicate that the noncanonical polymerases have a degraded active site with multiple substitutions and deletions in both the palm and fingers domains involved in catalysis.

High conservation of the nuclease domains in noncanonical PolXs
The C-terminal exonuclease PHP domain is specific to prokaryotic PolXs and absent from eukaryotic PolXs (Figure 1B). The PHP domain is also found as an additional domain in prokaryotic C family DNA polymerases and as a standalone domain in histidinol phosphatases (22). In replicative C family DNA polymerases, it can be inactive owing to substitutions of catalytic residues (Pol III in E. coli) or active (Pol C in B. subtilis, M. tuberculosis and T. thermophilus), in which case it provides proofreading activity during DNA replication (73–76).
The PHP domain of prokaryotic PolXs was reported to have 3′-5′ exonuclease, AP-endonuclease, 3′-phosphodiesterase and 3′-phosphatase activities (23,25,33,68,77). The 3′-5′ exonuclease activity of prokaryotic PolXs was observed on single-stranded DNA as well as on primer/template substrates and was shown to be modulated by the secondary structure of the DNA substrate (23–25,34). All catalytic activities of the PHP domain depend on the same metal-chelating (Mn2+-dependent) active site, which is formed by four motifs with nine conserved residues (the HHHEHHEDH consensus) that coordinate divalent metal cofactors (Figure 4A, Supplementary Figure S2). Structural comparisons of the PHP domains from the canonical T. thermophilus and altered D. radiodurans PolXs and a modeled structure of M. wenxiniae PolX revealed almost no differences in the positions of the active site residues, which similarly coordinate two or three divalent cations (Figure 4A) (32,56).
The PHP domains of most bacterial and archaeal PolXs included in our analysis retain 8-9 conserved residues in the active site (Supplementary Figure S2). In both types of PolXs, the most abundant motif is the HHHEHHEDH consensus (88% of canonical and 98% of altered polymerases, including D. radiodurans PolX, among the 2935 non-redundant PolX sequences). Variations of this motif are more common in canonical polymerases and include HHPERHEDQ (2.7%), HHRERHEDC (2.3%), QHHEHHEDH (1.3%, including T. thermophilus PolX) and HHRERHEDM (1%). Together, the data indicate that the active site of the PHP domain is extremely conserved in prokaryotic PolXs, suggesting that its activities are important in both types of polymerases.
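The statement that most PHP domains retain 8-9 of the nine conserved residues can be made concrete by scoring each observed motif against the HHHEHHEDH consensus position by position. The sketch is illustrative and assumes the nine residues have already been extracted from the aligned active-site columns; `retained_residues` and `retention_profile` are hypothetical names. For example, the QHHEHHEDH variant scores 8, whereas HHPERHEDQ scores 6.

```python
from collections import Counter
from typing import Dict, Iterable

CONSENSUS = "HHHEHHEDH"  # nine PHP active-site residues, in motif order


def retained_residues(motif: str, consensus: str = CONSENSUS) -> int:
    """Number of positions at which an observed motif matches the consensus."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have the same length")
    return sum(a == b for a, b in zip(motif.upper(), consensus))


def retention_profile(motifs: Iterable[str]) -> Dict[int, int]:
    """Histogram: number of retained consensus residues -> number of sequences."""
    return dict(sorted(Counter(retained_residues(m) for m in motifs).items()))


# Motif variants reported in the text
for m in ["HHHEHHEDH", "QHHEHHEDH", "HHPERHEDQ", "HHRERHEDC", "HHRERHEDM"]:
    print(m, retained_residues(m))
```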
In addition to the conserved C-terminal PHP domain, most prokaryotic PolXs also contain an intact N-terminal (8 kDa) domain (Figure 1B). In eukaryotic PolXs, this domain, together with the thumb, participates in the binding of gapped DNA substrates and contains the residues responsible for dRP-lyase activity (3). In Pol β, it plays a key role in the processing of gapped DNA and directly recognizes the 5′-P or 5′-dRP groups of a gap or nick (67,78). The 8 kDa and thumb domains are well conserved in both canonical and noncanonical PolXs, with a full-length 8 kDa domain found in 95.3% of all sequences (Supplementary Dataset), suggesting that most of them may retain the dRP-lyase activity potentially important for their functions in DNA repair.

Functional analysis of noncanonical PolXs from Deinococcus species
Substitutions in the catalytic triad and changes in other parts of the palm and fingers domains suggest a loss of DNA polymerization activity in noncanonical PolXs. Indeed, the aspartate triad is essential for metal ion coordination and catalytic activity in the Pol β superfamily of nucleotidyltransferases (35,36,42,79,80). Similarly, even single substitutions in the catalytic triad of other polymerases dramatically decrease the rate of DNA polymerization (81). Surprisingly, template-dependent polymerase activity was previously reported for recombinant PolX from D. radiodurans, a noncanonical PolX with an AEE triad containing alanine and two glutamates instead of aspartates (31). However, no metal ions are bound in the active site in the published structure of D. radiodurans PolX, suggesting that the substitutions impair catalytic metal binding by this PolX (Figures 1E and 3C) (56).
To study the spectrum of activities of noncanonical PolXs, we purified and analyzed recombinant PolXs from D. radiodurans and D. gobiensis. Like D. radiodurans PolX, the latter polymerase contains a noncanonical triad (ARE), in which all three aspartates are substituted with other residues (Figure 3A). While the wild-type D. radiodurans PolX gene was expressed successfully in E. coli, the D. gobiensis PolX gene was codon-optimized to increase its expression (see Materials and Methods). To avoid contamination with cellular polymerases or nucleases, we performed three chromatographic steps during PolX purification, namely Ni2+-chelating, heparin affinity and anion exchange chromatography, which yielded highly pure PolX preparations (Supplementary Figure S3A). In addition to the wild-type enzymes, we obtained a mutant variant of D. radiodurans PolX with alanine substitutions of the two glutamate residues in its active site (E199A/E234A). As a canonical polymerase control, we also expressed and purified B. subtilis PolX and its mutant variant with a single amino acid substitution in the catalytic triad (D203A). To confirm that noncanonical D. radiodurans PolX has a native conformation, we measured its circular dichroism spectrum and found that it is highly similar to that of B. subtilis PolX and to the spectrum predicted from the α-helix and β-strand content of the D. radiodurans PolX structure (Supplementary Figure S3B) (82). Furthermore, measurement of the denaturation temperatures (Td) of these polymerases showed that D. radiodurans PolX is even more thermostable than B. subtilis PolX (Td of 83 and 61.6 °C, respectively), suggesting that it forms a stable structure.
The activities of D. radiodurans, D. gobiensis and B. subtilis PolXs were tested on primer-template or gapped DNA substrates (Figure 4B) in the presence of dNTPs and Mg2+ or Mn2+ ions. Noncanonical D. radiodurans PolX was unable to extend the primer in the presence of Mg2+ on any of the tested templates at either low or high polymerase (20 nM or 1 μM) or dNTP (10 or 200 μM) concentrations (Figure 4C, lanes 1-12; Supplementary Figure S4A, lanes 1-12; Supplementary Figure S4E, lanes 1-12). As expected, mutant D. radiodurans PolX with alanine substitutions in the active site was also inactive in these assays (Figure 4C, lanes 1-12; Supplementary Figure S4B, lanes 1-12). Similarly, D. gobiensis PolX had no polymerase activity (Figure 4E, lanes 1-3). Previously, short-patch (one-nucleotide) DNA extension by D. radiodurans PolX was detected in the presence of a large excess of unlabeled DNA substrate added together with dNTPs (to prevent multiple rounds of enzyme dissociation/association) (31). However, we could not detect any DNA polymerase activity under these conditions with our PolX samples (Supplementary Figure S4F). This suggests that the previously observed activity might have resulted from contamination of the PolX preparations with other DNA polymerase(s) (31).
In the presence of Mn²⁺, both noncanonical deinococcal PolXs revealed robust 3′-5′ exonuclease activity on all types of substrates, rapidly shortening the 5′-labeled primer (Figure 4C, lanes 13-24; Figure 4E, lanes 4-6). In comparison, only a low level of exonuclease activity was observed in the presence of Mg²⁺ (lanes 11-12 in Figure 4C,E and Supplementary Figure S4A, B). Titration experiments demonstrated that the optimal Mn²⁺ concentration for this activity was between 3 and 10 mM, whereas no efficient cleavage was observed at any tested Mg²⁺ concentration (Figure 4I). In agreement with our observations, previously investigated bacterial PolXs, including PolX from D. radiodurans, were shown to possess Mn²⁺-dependent exonuclease activities (23,25,33,41,83). To confirm that this activity depends on the PHP domain, we obtained a mutant variant of D. radiodurans PolX with alanine substitutions of two of the nine PHP active-site residues involved in Mn²⁺ binding (H332A/H334A). As expected, the mutant PolX lacked exonuclease activity in the presence of either Mg²⁺ or Mn²⁺ (Figure 4F). At the same time, alanine substitutions in the polymerase active site did not affect the exonuclease activity (Figure 4D, lanes 13-24).
For comparison, we tested the activities of B. subtilis PolX under the same conditions. This enzyme efficiently extended DNA in the presence of both Mg²⁺ and Mn²⁺. With Mg²⁺, the major reaction product at low PolX concentration corresponded to the addition of a single nucleotide to the primer 3′-end (Figure 4G, lanes 1-12), and the primer was extended further at high PolX concentration (Supplementary Figure S4C, lanes 1-12). In the presence of Mn²⁺, B. subtilis PolX revealed highly efficient 3′-5′ exonuclease activity, which competed with primer extension (Figure 4G; Supplementary Figure S4C, lanes 13-24). The B. subtilis PolX mutant with a substitution in the polymerase active site (D203A) completely lost its polymerase activity but remained active as an exonuclease (Figure 4H). This confirms that the PHP domain, and not the polymerase active site, is responsible for the 3′-5′ exonuclease activity in both canonical and noncanonical PolX polymerases.

Association of prokaryotic PolX genes with components of DNA repair pathways
Eukaryotic PolXs, including Pol μ and Pol λ, participate in the NHEJ pathway by performing limited DNA synthesis at gaps during DSB repair (84). Many bacteria encode components of the NHEJ pathway, including the Ku protein, homologous to eukaryotic Ku, and Ligase D (LigD), a multifunctional factor with ligase, polymerase and sometimes nuclease (phosphoesterase) activities (85). To find out whether prokaryotic PolXs might be connected to DSB repair, we analyzed the presence of PolX, Ku and LigD in fully sequenced bacterial genomes (24973 genomes in total). To avoid biases in the frequencies of PolXs and NHEJ components caused by the highly uneven numbers of sequenced genomes in different lineages (see the first section of Results), we generated a non-redundant sample of 2826 representative genomes containing 452 PolX variants (332 canonical and 114 altered), using a previously described genome clustering algorithm (52). This algorithm selects representative genomes based on their sequence diversity rather than taxonomy, thus helping to smooth out possible taxonomic biases.
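The clustering algorithm of ref. (52) is not reproduced here. To illustrate the general idea of choosing representatives by sequence diversity rather than taxonomy, the sketch below assumes a simple greedy scheme: a genome becomes a new representative only if its k-mer Jaccard similarity to every representative chosen so far is below a threshold. The function names, k-mer size, threshold and toy sequences are hypothetical and do not reflect the method or parameters used in this study.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_representatives(genomes: Dict[str, str], threshold: float = 0.8,
                           k: int = 4) -> List[str]:
    """Hypothetical greedy selection: keep a genome only if it is less similar
    than `threshold` to every representative kept so far (input order matters)."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    reps: List[str] = []
    for name in genomes:
        if all(jaccard(profiles[name], profiles[r]) < threshold for r in reps):
            reps.append(name)
    return reps


# Toy input (hypothetical sequences): g2 differs from g1 by one terminal base.
toy = {
    "g1": "ATGCGTACGTTAGCATGCAAGT",
    "g2": "ATGCGTACGTTAGCATGCAAGA",
    "g3": "TTTTGGGGCCCCAAAATTTTGG",
}
print(greedy_representatives(toy))  # ['g1', 'g3']
```

Because the pass is greedy, the input order decides which member of a cluster is retained; this is one of several simplifications relative to published clustering methods.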
We then analyzed co-occurrences of PolX and NHEJ genes in the non-redundant sample of genomes (Figure 5). For comparison, a similar analysis was also performed for the complete set of sequenced genomes (Supplementary Figure S5A). We defined functional LigD variants as those containing both ligase and polymerase domains in the same protein, and looked separately for LigD variants with or without the nuclease domain. Genomes that contained Ku together with either of these two LigD variants were classified as encoding the NHEJ pathway with or without the associated nuclease activity (NHEJ+Nuc and NHEJ-Nuc, respectively). Genomes that contained Ku or LigD alone, or lacked both proteins, were classified as lacking the NHEJ pathway.
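The classification rule above can be expressed as a small function. This is a minimal sketch: domain annotations are represented as sets of hypothetical labels ("ligase", "polymerase", "nuclease"), and a genome carrying both a nuclease-containing and a nuclease-lacking functional LigD is assumed, for illustration only, to be assigned to NHEJ+Nuc, a case the text does not specify.

```python
from typing import AbstractSet, Iterable

# Hypothetical domain labels standing in for real domain annotations.
LIGASE, POLYMERASE, NUCLEASE = "ligase", "polymerase", "nuclease"


def classify_nhej(has_ku: bool, ligd_proteins: Iterable[AbstractSet[str]]) -> str:
    """Assign a genome to 'NHEJ+Nuc', 'NHEJ-Nuc' or 'no NHEJ'.

    Each element of `ligd_proteins` is the set of domain labels found in one
    LigD-like protein encoded by the genome.
    """
    # A LigD counts as functional only if ligase and polymerase domains co-occur.
    functional = [d for d in ligd_proteins if {LIGASE, POLYMERASE} <= d]
    if not has_ku or not functional:
        return "no NHEJ"  # Ku alone, LigD alone, or neither
    # Assumption: any functional LigD with a nuclease domain makes the genome NHEJ+Nuc.
    if any(NUCLEASE in d for d in functional):
        return "NHEJ+Nuc"
    return "NHEJ-Nuc"


print(classify_nhej(True, [frozenset({"ligase", "polymerase"})]))  # NHEJ-Nuc
```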
We found that 72.8% of the non-redundant genomes (2058 out of 2826) lack the canonical NHEJ pathway; most of them lack both Ku and LigD, and some contain only Ku or only LigD. Among these genomes, canonical and altered PolXs are present in 10.3% and 3.0%, respectively. The absolute numbers of genomes in each group are indicated in Figure 5A (all phyla). A further 20.6% of the genomes (582 out of 2826) belong to the NHEJ+Nuc group and 6.6% (186 out of 2826) to the NHEJ-Nuc group. In the NHEJ+Nuc group, canonical and altered PolXs are present in 7.7% and 7.2% of the genomes, respectively, which is comparable to the NHEJ-minus genomes (Figure 5A). In contrast, in the NHEJ-Nuc group they are present in 43% and 5.9% of the genomes, respectively. Thus, the NHEJ-Nuc genomes are enriched in both canonical and noncanonical PolX variants, and in total about half of the NHEJ-Nuc genomes contain PolXs (compared with 13.3% of NHEJ-minus genomes) (Figure 5A). Analysis of the complete set of sequenced genomes gives similar results, although with slightly different proportions of genomes in each group (Supplementary Figure S5A). This suggests that the nuclease activity of PolXs might compensate for the absence of the nuclease domain in the LigD proteins encoded by NHEJ-Nuc genomes.
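The group sizes and combined PolX frequencies quoted above can be re-derived with a few lines of arithmetic. The sketch below only recombines numbers stated in the text; summing the canonical and altered percentages to obtain "any PolX" assumes that no genome within a group carries both types, which is a simplification (some genomes do carry both).

```python
TOTAL = 2826

# Genome counts per NHEJ group in the non-redundant sample (values from the text).
groups = {"no NHEJ": 2058, "NHEJ+Nuc": 582, "NHEJ-Nuc": 186}
assert sum(groups.values()) == TOTAL  # the three groups partition the sample

# Percentage of all non-redundant genomes falling into each group.
shares = {name: round(100 * n / TOTAL, 1) for name, n in groups.items()}

# Reported % of genomes with (canonical, altered) PolXs within each group.
polx = {"no NHEJ": (10.3, 3.0), "NHEJ+Nuc": (7.7, 7.2), "NHEJ-Nuc": (43.0, 5.9)}

# Simplifying assumption: no genome carries both types, so percentages add.
any_polx = {name: round(c + a, 1) for name, (c, a) in polx.items()}

print(shares)    # {'no NHEJ': 72.8, 'NHEJ+Nuc': 20.6, 'NHEJ-Nuc': 6.6}
print(any_polx)  # {'no NHEJ': 13.3, 'NHEJ+Nuc': 14.9, 'NHEJ-Nuc': 48.9}
```

The 48.9% for NHEJ-Nuc genomes corresponds to "about half", against 13.3% for NHEJ-minus genomes.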
The distribution of NHEJ pathways and PolX variants is uneven among bacterial phyla (Figure 5A, B; Supplementary Figure S5A). We therefore analyzed PolX frequencies separately in three bacterial phyla with many sequenced PolX-encoding genomes: Actinobacteria, Firmicutes and Bacteroidetes (Figure 2A). Firmicutes encode almost exclusively the NHEJ-Nuc pathway (found in 24% of non-redundant genomes) and canonical PolX variants (30% of genomes). The NHEJ-Nuc pathway is strongly enriched in genomes containing canonical PolXs (46.6%, compared with 12.2% of genomes lacking PolXs) (Figure 5A). Although inactive PolX variants are much less common, they are almost always associated with the NHEJ-Nuc pathway (11 out of 12 genomes encoding altered PolXs) (Figure 5A). Actinobacteria often encode either the NHEJ+Nuc pathway (41% of sequenced genomes) or the NHEJ-Nuc pathway (16% of genomes). Only a small fraction of genomes in this phylum encode PolXs (6%), and almost all identified PolX variants are canonical (Figure 5A, B). The proportion of genomes encoding the NHEJ+Nuc pathway is similar among genomes lacking or containing PolXs. However, the NHEJ-Nuc pathway is found much more frequently in genomes encoding PolXs, suggesting a functional association (Figure 5A). Bacteroidetes mainly encode the NHEJ+Nuc pathway (found in 28% of sequenced genomes) and noncanonical PolX variants (also in 28% of genomes). In this phylum, PolXs are also strongly associated with the NHEJ pathway (∼49% of PolX-containing genomes encode NHEJ versus ∼19% of genomes lacking PolXs) (Figure 5A). Thus, PolX genes are often co-encoded with NHEJ genes in different bacterial phyla, but the type of association between particular PolX and NHEJ groups is specific to individual phyla.
To estimate the statistical significance of these associations, we compared the observed frequencies of PolX and NHEJ genes with the distributions expected under random association and calculated the corresponding P-values using Pearson's χ² test of independence. The genomic association of PolXs with NHEJ was highly statistically significant (P = 2.3 × 10⁻¹⁰ for co-occurrence of both PolX variants with both NHEJ pathways). Highly significant non-random associations were also observed when the two types of PolXs were considered independently, either for all genomes or for individual bacterial phyla (Supplementary Figure S6).
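The contingency tables behind these P-values are given in Supplementary Figure S6 and are not reproduced here. As an illustration of the test itself, the sketch below implements Pearson's χ² test of independence in plain Python on a hypothetical 2×2 toy table (not data from this study). The upper-tail probability is obtained from the regularized upper incomplete gamma function Q(df/2, χ²/2), built up from Q(1, x) = e⁻ˣ or Q(1/2, x) = erfc(√x) with the recurrence Q(a+1, x) = Q(a, x) + xᵃe⁻ˣ/Γ(a+1).

```python
import math
from typing import List, Tuple


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for a chi-square variable, integer df >= 1."""
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Start from Q(1, x) for even df or Q(1/2, x) for odd df,
    # where Q is the regularized upper incomplete gamma function.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    # Step up with Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1) until a = df / 2.
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def pearson_chi2(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence without continuity correction.

    Assumes every row and column total is non-zero."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical table: rows = PolX present / absent, columns = NHEJ present / absent.
stat, df, p = pearson_chi2([[20, 10], [10, 20]])
print(f"chi2 = {stat:.3f}, df = {df}, P = {p:.4g}")
```

Where SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should return the same statistic; by default that function applies Yates' continuity correction when the table has one degree of freedom.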
To test whether the observed enrichment of NHEJ pathways in PolX-encoding genomes could be simply explained by the larger size of these genomes, we compared genome lengths depending on the presence of NHEJ and PolX in the same sample of non-redundant genomes. NHEJ-containing genomes are indeed on average larger than genomes without NHEJ (Supplementary Figure S5B) (86). However, a similar trend was observed both for genomes lacking PolXs and for genomes encoding them. Moreover, among genomes containing the NHEJ-Nuc pathway, those with PolXs were even somewhat smaller than those lacking PolXs (Supplementary Figure S5B). It is therefore unlikely that the genomic association of PolXs and NHEJ pathways is a nonspecific consequence of the increased length of such genomes.
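The logic of this control is to stratify genomes by NHEJ status and then compare PolX-encoding and PolX-lacking genomes within each stratum: if genome size alone drove the association, PolX-encoding genomes would be consistently larger in every stratum. A minimal sketch of the stratified averaging, using hypothetical toy records rather than data from this study:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_length_by_group(
    records: Iterable[Tuple[str, bool, float]]
) -> Dict[Tuple[str, bool], float]:
    """Average genome length for each (NHEJ status, PolX present) combination."""
    lengths = defaultdict(list)
    for nhej_status, has_polx, length_mb in records:
        lengths[(nhej_status, has_polx)].append(length_mb)
    return {key: mean(vals) for key, vals in lengths.items()}


# Hypothetical records: (NHEJ group, PolX encoded?, genome length in Mb).
toy = [
    ("no NHEJ", False, 2.0), ("no NHEJ", False, 3.0), ("no NHEJ", True, 2.5),
    ("NHEJ-Nuc", False, 5.0), ("NHEJ-Nuc", True, 4.0), ("NHEJ-Nuc", True, 4.5),
]
means = mean_length_by_group(toy)
print(means[("NHEJ-Nuc", True)])  # 4.25
```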
To better understand possible biological functions of prokaryotic PolXs, we also analyzed the operon structures of canonical and altered PolXs and identified the most common Pfam domains enriched in proteins encoded in the genomic neighborhood of the PolX genes (see Materials and Methods). The gene neighborhood of the polymerases was specific to each investigated class of organisms (Supplementary Figure S7 and Supplementary Table S3). However, some of the detected genetic associations may suggest a functional connection between PolX and nucleic acid processing. Remarkably, in the class Alphaproteobacteria the most abundant operon neighbor of noncanonical PolXs is LigD, supporting a functional connection between PolX and NHEJ (Supplementary Figure S7 and Supplementary Table S3) (87)(88)(89). In the classes Bacilli and Clostridia, the most frequent operon neighbors of both canonical and noncanonical PolXs include the protein ZapA, which participates in Z-ring formation and synchronizes cell division with chromosome segregation (90)(91)(92)(93), and the MutS2 nuclease, which may be involved in the processing of recombination intermediates and in natural transformation in B. subtilis (94,95). The PolX operons in Bacilli also often contain the pore-forming protein Colicin V, suggesting that together these proteins may promote gene exchange between bacteria. Furthermore, a common gene neighbor of altered PolXs in Deinococci is a stand-alone PHP domain, which might provide additional nuclease activities during DNA repair. Finally, canonical PolXs in Archaea are strongly associated with an ATP-dependent DNA ligase in the class Methanomicrobia and with a Mut7-C domain-containing RNase in the class Halobacteria (96) (Supplementary Table S3). Overall, this analysis suggests that no universal genetic associations are characteristic of PolX operons, but some of them may have dedicated functions in nucleic acid processing and genomic DNA repair.

Figure 5. (A) The numbers of genomes in each group are indicated. (B) Co-occurrence of PolXs with NHEJ genes in the non-redundant set of bacterial genomes, shown on a phylogenetic tree generated for the 452 PolX sequences found in these genomes (the tree topology differs from Figure 2A because far fewer PolX sequences were analyzed; this tree illustrates only the diversity of PolX and NHEJ combinations in different phyla, not the evolution of PolXs). Rings: 1, phylum (color code as in Figure 2A); 2, catalytic triad status (green, canonical; ochre, altered); 3, NHEJ status (light green, no NHEJ; orange, NHEJ+Nuc; violet, NHEJ-Nuc). Green dots on the nodes correspond to bootstrap values of 98-100.
Intriguingly, we also found a substantial number of genomes containing more than one PolX gene (121 genomes among the 6239 genomes containing 6362 unique PolX sequences, and 87 genomes among the 2846 genomes containing 2935 non-redundant PolX variants) (Supplementary Table S4). Most of them contained two PolX genes, and two Bacteroidetes genomes contained three. Half of them contained canonical and noncanonical PolX genes at the same time (Supplementary Table S4). These variants were likely acquired independently via horizontal gene transfer, since the majority of such polymerase pairs are located in very distant clades of the PolX tree. This suggests that canonical and altered polymerases might perform different functions in these bacterial species.
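Counting genomes with several PolX genes, and those that combine both triad types, amounts to a single pass over a genome-to-PolX mapping. The sketch below assumes a hypothetical input in which each genome maps to the list of triad statuses ("canonical" or "altered") of its PolX genes; the genome names are invented for illustration.

```python
from typing import Dict, List, Tuple


def multi_polx_summary(polx_by_genome: Dict[str, List[str]]) -> Tuple[int, int]:
    """Return (genomes with >1 PolX, of those, genomes with both canonical and altered PolX)."""
    multi = {g: types for g, types in polx_by_genome.items() if len(types) > 1}
    mixed = sum(1 for types in multi.values() if {"canonical", "altered"} <= set(types))
    return len(multi), mixed


# Hypothetical genomes and the triad status of each PolX gene they carry.
toy = {
    "genomeA": ["canonical"],
    "genomeB": ["canonical", "altered"],
    "genomeC": ["altered", "altered", "canonical"],
    "genomeD": ["canonical", "canonical"],
}
print(multi_polx_summary(toy))  # (3, 2)
```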

DISCUSSION
Our analysis of prokaryotic PolXs showed that they are much more diverse than their eukaryotic counterparts and form several clades, including canonical Polβ-like polymerases and highly divergent noncanonical PolX polymerases. Characteristic features of noncanonical PolXs include (i) substitutions of the catalytic triad residues in the polymerase active site in the palm domain, (ii) deletions in the palm domain, (iii) substitutions of conserved residues of the dNTP binding site at the interface of the palm and fingers domains and (iv) deletions in the fingers domain (Figure 1B). Since the aspartate triad is essential for catalytic activity in Polβ and its relatives (35,36,42,79,80), noncanonical PolXs are unlikely to possess DNA polymerase activity. Furthermore, alterations in the palm and fingers domains of these PolXs often accompany each other, suggesting that the mutated elements are no longer important for polymerase activity. Indeed, our analysis of two noncanonical PolXs from D. radiodurans and D. gobiensis demonstrated that they are inactive as DNA polymerases.
Despite dramatic changes in the structure of the polymerase active site, noncanonical PolXs do not have specific alterations in the N-terminal dRP-lyase and C-terminal PHP exonuclease domains. In particular, the majority of prokaryotic PolXs contain a highly conserved PHP domain with predicted exonuclease activity. Indeed, we demonstrated that noncanonical PolXs from D. radiodurans and D. gobiensis have a high level of Mn²⁺-dependent 3′-5′ exonuclease activity, which is abrogated by mutations of conserved residues involved in catalytic metal binding in the PHP domain. Since the PHP domain has a broad range of activities toward 3′-ends and AP sites in DNA substrates of various structures (23,33,97), both canonical and altered PolXs might participate in the sanitization of primer 3′-ends during DNA replication and break repair, and possibly in the processing of DNA intermediates during BER.
In the complex of T. thermophilus PolX with DNA, the PHP domain is remote from the DNA substrate, and the mode of its interaction with DNA during the exonucleolytic reaction remains unknown (Figure 1D). Interestingly, the available structure of the noncanonical PolX from D. radiodurans shows significant conformational changes in comparison with T. thermophilus PolX (Figure 1E). Such changes may be important for switching between PolX activities, but their functional role remains to be established. The coordination of the various activities in canonical PolXs, as well as the role of the polymerase domains in the nuclease activity of noncanonical PolXs, will be important questions for further studies.
Most noncanonical PolXs, including those found in the Bacteroidetes and Deinococcus-Thermus phyla, form a single group on the phylogenetic tree, suggesting a common evolutionary origin (Figure 2A). While more sophisticated analysis is needed to understand the exact evolutionary origins of altered PolXs, the presence of related PolXs in unrelated bacterial phyla suggests their horizontal transfer between prokaryotic species. Notably, noncanonical PolXs can also be present together with canonical PolXs in some genomes, suggesting that cooperation between the two types of PolX may be beneficial for the host species.
Bacterial genomes encoding either canonical or noncanonical PolXs are enriched in genes encoding the main components of bacterial NHEJ, the Ku protein and the multifunctional ligase LigD (87,88). Interestingly, genomes with altered PolXs encode NHEJ pathways even more frequently than genomes with canonical PolXs, suggesting that the polymerase activity of PolX may not be important for NHEJ. Furthermore, both canonical and altered PolXs can be associated with the NHEJ-Nuc pathway, in which LigD lacks the nuclease domain involved in processing of DNA ends. In this case, the exonuclease activity of PolX might compensate for the absence of nuclease activity in LigD during NHEJ. At the same time, altered PolXs are also frequently found in genomes with the NHEJ+Nuc pathway, indicating that they might still have a role in DNA repair even when LigD provides nuclease activity.
Analysis of the genomic neighborhood of prokaryotic PolXs also reveals their association with nucleic acid processing enzymes in some bacterial classes, including LigD in Alphaproteobacteria, suggesting possible functional cooperation. Therefore, PolX family polymerases may be generally involved in double-strand break repair, in particular NHEJ, in both bacteria and eukaryotes (27), suggesting that this PolX function may have first appeared in the prokaryotic world. Intriguingly, however, the polymerase activity of PolX is apparently not important for bacterial NHEJ, possibly because it is provided by the polymerase domain of LigD. In contrast, PolXs involved in eukaryotic NHEJ lack the exonuclease domain, which is obligatorily present in prokaryotic PolXs, and cooperate with additional exonucleases (98).
Recent analysis revealed another example of an inactive DNA polymerase, ImuB from the Y family, which forms a multisubunit complex with DnaE2, a homolog of the Pol III alpha subunit, and with ImuA, a RecA homolog, and interacts with the processivity clamp (99)(100)(101). This complex was proposed to act as a mutasome owing to the error-prone catalytic activity of DnaE2, with ImuB serving as an organizing subunit. By comparison, inactive PolX polymerases may both play architectural and DNA-binding roles during nonhomologous end joining and directly contribute to DNA processing.
Available data from in vivo experiments, while very limited, suggest that PolXs may have different functions in different bacteria. Thus, PolXs from B. subtilis and T. thermophilus were proposed to participate in BER, while PolX from D. radiodurans was shown to be important for radioresistance and genome recovery after γ-irradiation, but not to take part in the BER or nucleotide excision repair pathways (26,33,34,102,103). In D. radiodurans, PolX and the SbcCD nuclease, an evolutionarily conserved structure-specific nuclease involved in processing of double-strand breaks, were shown to play complementary roles during post-irradiation repair, suggesting their involvement in the same repair pathway (103). While the biological functions of most prokaryotic PolXs remain to be established, we hypothesize that the main role of both canonical and noncanonical polymerases may be in the processing of DNA intermediates during DNA repair rather than in DNA synthesis. Investigation of their cellular roles, including their proposed participation in the NHEJ pathway, and of their interplay with other DNA repair pathways will be an important goal of future research.

DATA AVAILABILITY
All primary data are available from the corresponding authors upon request.