Exploring Protein Space: From Hydrolase to Ligase by Substitution

Abstract
The understanding of how proteins evolve to perform novel functions has long been sought by biologists. In this regard, two homologous bacterial enzymes, PafA and Dop, pose an insightful case study, as both rely on similar mechanistic properties, yet catalyze different reactions. PafA conjugates a small protein tag to target proteins, whereas Dop removes the tag by hydrolysis. Given that both enzymes present a similar fold and high sequence similarity, we sought to identify the differences in the amino acid sequence and folding responsible for each distinct activity. We tackled this question using analysis of sequence–function relationships, and identified a set of uniquely conserved residues in each enzyme. Reciprocal mutagenesis of the hydrolase, Dop, completely abolished the native activity, at the same time yielding a catalytically active ligase. Based on the available Dop and PafA crystal structures, this change of activity required a conformational change of a critical loop in the vicinity of the active site. We identified the conserved positions essential for stabilization of the alternative loop conformation, and tracked alternative mutational pathways that lead to a change in activity. Remarkably, all these pathways were combined in the evolution of PafA and Dop, despite their redundant effect on activity. Overall, we identified the residues and structural elements in PafA and Dop responsible for their activity differences. This analysis delineated, in molecular terms, the changes required for the emergence of a new catalytic function from a preexisting one.


Introduction
The concept of "protein space" was introduced in 1970 by John Maynard Smith (Maynard Smith 1970) in an attempt to settle the apparent contradiction between evolution by natural selection and the complex nature of the gene-encoded protein (Salisbury 1969). Clearly, for enzymes to evolve and new functions to emerge, changes to the amino acid sequence must take place. However, proteins are of inherently restricted evolvability, as they are only marginally stable (ΔΔG unfolding ~5 to 10 kcal/mol) (DePristo et al. 2005), and about one third of random mutations in proteins have severe effects on their function (>90% loss of activity) (Camps et al. 2007). For natural selection to act as a driving force for molecular evolution, the enzyme catalytic activity must be retained at some level, as an inactive enzyme is a dead end for natural selection. Hence, protein space represents the continuous network of viable sequence combinations connected via a stepwise mutational process. The mutational trajectory along which protein evolution occurs, while retaining catalytic activity and stability, is complex, given the stochastic nature of mutation and the vast sequence space of proteins. Function-altering mutations are often destabilizing, and additional mutations are required to compensate for this effect. Furthermore, the effect of mutation is not simply additive and could be epistatic in nature; namely, the same mutation could be neutral, beneficial, or deleterious, depending on the context of the protein sequence. Thus, interactions between mutations pose severe restrictions over evolutionary trajectories (Camps et al. 2007; Kaltenbach and Tokuriki 2014).
Although understanding evolution at the molecular level is a central goal in modern biology, studying evolution involves inherent difficulties, as tracking past events always involves some level of uncertainty. Most research in this field is conducted synthetically, in vitro, using directed evolution, with kinetic parameters such as kcat or Km used as a proxy for organism fitness. Here we describe the evolutionary relationship between two homologous enzymes, Dop and PafA, and demonstrate in molecular detail the changes required for the emergence of a new catalytic function from a preexisting one. Dop and PafA pose an insightful case study, as both rely on similar mechanistic properties, yet catalyze distinct reactions (Striebel et al. 2009; Özcelik et al. 2012). PafA catalyzes the ligation of a small protein tag termed Pup (Prokaryotic ubiquitin-like protein) to target protein substrates (Guth et al. 2011); Dop removes the tag by hydrolysis of the isopeptide bond between Pup and the target protein (fig. 1A) (Burns et al. 2010). Together, they form the pupylation pathway, a conserved pathway in species belonging to the phyla Actinobacteria and Nitrospira (Iyer et al. 2008). In Mycobacterium tuberculosis, pupylation is coupled to regulated protein degradation by the bacterial proteasome, and is essential for virulence of this pathogen (Darwin 2003). In the nonpathogenic model organism Mycobacterium smegmatis, the Pup-proteasome system (PPS) plays an important physiological role under nitrogen starvation conditions (Elharar et al. 2014). Since Dop and PafA are the products of natural evolution, they form an advantageous, bona fide experimental system to explore protein space and test the effect of mutation on protein stability, function, and fitness, both biochemically and in the context of the living cell.

Article. Open Access. © The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Mol. Biol. Evol. 38(3):761-776 doi:10.1093/molbev/msaa215 Advance Access publication September 1, 2020

The M. smegmatis Dop and PafA share 37% identity and 65% similarity; both belong to the carboxylate-amine ligase superfamily and share the glutamine synthetase (GS) fold (fig. 1B) (Iyer et al. 2008; Özcelik et al. 2012). Although PafA and Dop clearly had a common ancestor, they present distinct activities with no detectable promiscuous activities (Striebel et al. 2009). In other words, PafA does not perform deamidation or depupylation, whereas Dop cannot pupylate substrates. Very much like GS, PafA catalyzes a two-step reaction in which ATP is used in the first step to phosphorylate a γ-glutamyl group, thereby facilitating conjugation to an amine group in the second step, alongside the release of a free phosphate. Specifically, PafA phosphorylates the Pup C-terminal glutamate in the first step, and proceeds to conjugate this activated Pup form to the ε-amino group of a target protein lysine (fig. 1C) (Guth et al. 2011). In mycobacteria and some other species, Pup is translated with a C-terminal glutamine (PupQ) rather than a glutamate (PupE) (Pearce et al. 2008). In these cases, Dop is responsible for deamidation of PupQ, leading to the formation of PupE (fig. 1A) (Striebel et al. 2009). Only then can PafA conjugate PupE to target substrates. Via the same mechanism, Dop can also depupylate an already pupylated protein (fig. 1C), albeit more slowly than it catalyzes deamidation (Elharar et al. 2016; Hecht et al. 2018). [...]
Fig. 1 (caption, partially recovered). [...] (Barandun et al. 2013). Dop and PafA are homologous enzymes that present high structural similarity. Two distinctive differences between the two enzymes are the presence of the Dop-loop in Dop but not in PafA, and the region of the alpha-loop, where an alpha-helix is formed in PafA and a loop in Dop. The illustrated Dop-loop segment was added for visualization purposes only. The active site groove is indicated by gold, and ATP is shown in black. (C) Dop and PafA belong to the carboxylate-amine ligase superfamily. Both GS and PafA employ a two-step catalytic mechanism, where ATP is used in the first step to phosphorylate a γ-glutamyl group, followed by ligation of the amine group of a lysine residue (PafA) or ammonia (GS) in the second step. In contrast, Dop hydrolyzes an amide bond using ADP and Pi. X denotes either hydrogen or target protein for deamidation and depupylation, respectively. [...] (Bolten et al. 2017).
Although Dop and PafA present a similar fold, their structures differ significantly in two regions. A region of ~40 amino acids, termed the Dop-loop, is conserved in Dop, but is absent in PafA orthologs (Özcelik et al. 2012). The second noticeable structural difference between PafA and Dop lies in a region which we termed the "alpha-loop," as this region forms an alpha-helix in PafA, in contrast to a loop in Dop (fig. 1B). Although the Dop-loop and the alpha-loop clearly differentiate between Dop and PafA, they are not essential for catalysis, and switching either of them between the enzymes did not lead to a change in activity (Özcelik et al. 2012). It was later found that the alpha-loop is important for PafA interaction with pupylation targets (Regev et al. 2016), whereas the Dop-loop was found to allosterically inhibit Dop depupylation activity (Hecht et al. 2020).
Here, we sought to identify the critical differences in amino acid sequence and folding responsible for each distinct activity. We tackled this question initially via analysis of sequence–function relationships, and identified a set of uniquely conserved residues in each enzyme. A follow-up reciprocal mutagenesis of Dop completely abolished the native hydrolase activity, and at the same time yielded a catalytically active Pup-ligase. Mutational analysis, combined with the available structural information, indicated that the alpha-loop conformation is a critical factor that controls the protein function. Further analysis revealed conserved residues to be essential for stabilization of the alternative conformation required for a change in activity, rather than affecting the catalytic mechanism directly. Remarkably, a combinatorial mutant library of the identified residues uncovered multiple mutational paths, each enabling the change of function to occur. Overall, this study highlights, in molecular terms, the changes required for the emergence of a new catalytic function from a preexisting one.

Evolutionary Relationship between Dop and PafA
To give some insight into the evolutionary history of Dop and PafA, a phylogenetic analysis was performed. Initially, taxa bearing Dop and PafA homologous sequences were identified via alignment of the M. smegmatis strain MC2 155 Dop and PafA sequences against the refseq_protein database using BLASTP searches. The analysis confirmed that Dop and PafA are largely conserved across the Actinobacteria and Nitrospirae phyla. Homologous sequences of one or both proteins were also detected very sporadically in a few draft genomes within other phyla, such as the candidate division NC10, Armatimonadetes, Verrucomicrobia, Nitrospinae, Firmicutes, and Proteobacteria. A single copy of a homolog to both Dop and PafA was identified in some Planctomycetes species and further used as an external group for construction of a maximum likelihood phylogenetic tree. To obtain a reliable tree, we used the highest quality sequences that also represent the maximum diversity of bacteria having a complete pupylation pathway. We thus selected only complete genomes of the RefSeq database (https://www.ncbi.nlm.nih.gov, last accessed September 1, 2020) available in February 2019. Given the massive number of genomes available, the data set was reduced by randomly selecting only one genome per family for Actinobacteria, and one per species for the other phyla. The resulting tree built with Dop and PafA sequences indicated that Dop and PafA form two distinct and statistically well-supported clusters that originated from an ancient duplication event (fig. 2). The Planctomycetes paralogous proteins share about 29-31% identity with both Actinobacteria and Nitrospirae PafA and Dop proteins, and their sequence partially aligns with the Dop-loop (MAFFT alignment in supplementary file 1, Supplementary Material online). The data further suggest, given the sporadic co-occurrence of the Pup-ligases and depupylases in phyla other than Actinobacteria, and the current vision of the tree of Bacteria (Hug et al. 2016), that the full pupylation pathway emerged in Actinobacteria and was later horizontally transferred to the ancestor at the origin of the Nitrospirae phylum and to other phyla like Nitrospinae and Proteobacteria.
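The data set reduction described above (one randomly chosen genome per Actinobacteria family, one per species elsewhere) can be illustrated with a minimal sketch. This is not the authors' pipeline: the record fields (`accession`, `phylum`, `family`, `species`), the function name `reduce_dataset`, the seed, and the toy records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def reduce_dataset(genomes, seed=2019):
    """Keep one randomly chosen genome per Actinobacteria family and,
    for every other phylum, one per species."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for genome in genomes:
        # Group Actinobacteria by family, all other phyla by species.
        level = "family" if genome["phylum"] == "Actinobacteria" else "species"
        groups[(genome["phylum"], genome[level])].append(genome)
    # Iterate over sorted keys so the result does not depend on input order.
    return [rng.choice(groups[key]) for key in sorted(groups)]


# Hypothetical records, for illustration only.
genomes = [
    {"accession": "toy_1", "phylum": "Actinobacteria", "family": "family_A", "species": "species_1"},
    {"accession": "toy_2", "phylum": "Actinobacteria", "family": "family_A", "species": "species_2"},
    {"accession": "toy_3", "phylum": "Nitrospirae", "family": "family_B", "species": "species_3"},
    {"accession": "toy_4", "phylum": "Nitrospirae", "family": "family_B", "species": "species_3"},
    {"accession": "toy_5", "phylum": "Nitrospirae", "family": "family_B", "species": "species_4"},
]

selected = reduce_dataset(genomes)
print([g["accession"] for g in selected])
```

Fixing the seed is a design choice for reproducibility; the text does not state how the random selection was seeded.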

Identification of Residues Responsible for an Activity Change
To find the residues responsible for the catalytic differences between PafA and Dop, we sought to identify uniquely conserved positions in each enzyme. These were defined as positions conserved in one enzyme but not in the other, or differently conserved in both. We analyzed 2,689 protein sequences belonging to the Pup-ligase/deamidase family, and generated a sequence similarity network (SSN) to categorize each sequence as either a Pup-ligase or a deamidase. Examples of such residues are the GhExE (h, hydrophobic; x, any residue) ATP-binding motif and additional residues that were previously shown to be involved in catalysis (Iyer [...]
Next, reciprocal mutagenesis was performed on the M. smegmatis PafA and Dop. As PafA mutagenesis destabilized the enzyme, we describe here the mutational analysis performed on Dop. To simplify the analysis, uniquely conserved residues that were not located in close proximity to the active site cradle (>20 Å) were filtered out, leaving 20 positions in Dop that were selected for reciprocal mutagenesis (fig. 4A and table 1). These included nine out of the ten shared positions of both enzymes, eight PafA uniquely conserved positions including one insertion, and one Dop uniquely conserved position. In addition, two positions in the alpha-loop region that were not highly conserved in PafA and Dop were nevertheless chosen for mutagenesis to maintain secondary structure integrity.
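The definition of uniquely conserved positions given above can be made concrete with a small sketch that compares per-column conservation between two groups of aligned sequences. The 90% conservation threshold, the function names, and the toy alignments are assumptions for illustration; the text does not state the cutoff or the software actually used.

```python
from collections import Counter


def column_consensus(sequences, column, threshold=0.9):
    """Return the dominant residue of an alignment column if its
    frequency reaches the threshold, otherwise None (gaps never count)."""
    counts = Counter(seq[column] for seq in sequences)
    residue, count = counts.most_common(1)[0]
    if residue != "-" and count / len(sequences) >= threshold:
        return residue
    return None


def classify_positions(ligase_aln, deamidase_aln, threshold=0.9):
    """Label each column that is conserved in at least one group."""
    labels = {}
    for col in range(len(ligase_aln[0])):
        lig = column_consensus(ligase_aln, col, threshold)
        dea = column_consensus(deamidase_aln, col, threshold)
        if lig and dea:
            labels[col] = "shared" if lig == dea else "differently conserved"
        elif lig:
            labels[col] = "ligase only"
        elif dea:
            labels[col] = "deamidase only"
    return labels


# Toy alignments (hypothetical sequences, equal length, same columns).
ligase_aln = ["GAEKT", "GAEKS", "GAEKT"]
deamidase_aln = ["GAERA", "GLERC", "GVERD"]

labels = classify_positions(ligase_aln, deamidase_aln)
print(labels)
```

In this toy case, columns labeled "ligase only", "deamidase only", or "differently conserved" correspond to the uniquely conserved positions in the sense used in the text, whereas "shared" columns are conserved identically in both groups.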
Three mutants were designed. The first mutant, Dopα, included only a substitution of the alpha-loop region, comprising nine amino acid substitutions (fig. 4A and B). The second mutant, Dop2PafA, included mutations of 11 positions outside the alpha-loop region; and the third mutant, Dop2PafAα, contained all 20 reciprocal mutations. These mutants were initially designed without the Dop-loop, as this region is not essential for Dop catalytic activity (Özcelik et al. 2012; Hecht et al. 2020). Accordingly, a 37 amino acid deletion, which completely removed the loop, was performed while generating the mutants. Eventually, however, the Dopα mutant did contain the Dop-loop, as deletion of this loop destabilized the mutant, rendering it insoluble. The three mutant proteins were expressed in Escherichia coli and purified to homogeneity for in vitro depupylation and pupylation assays. For these assays FabD, a bona fide substrate, and its pupylated form, Pup-FabD, were used. As FabD and Pup-FabD migrate differently in SDS-PAGE, gel-based assays readily detected pupylation and depupylation in our experimental system. A wild type PafA and a Dop mutant lacking the Dop-loop (Dop ΔDop-loop) were used as controls. We found that the Dopα mutant depupylated Pup-FabD as well as Dop ΔDop-loop, and did not exhibit any pupylation activity (fig. 4B). This result indicated that substitution of only the alpha-loop region is insufficient for an activity change. The Dop2PafA mutant was able to depupylate Pup-FabD, although poorly as compared with Dop ΔDop-loop, and was not able to pupylate FabD. Clearly, the eleven point mutations did not convert Dop into a Pup-ligase. However, when these eleven mutations were combined with the alpha-loop mutations to yield Dop2PafAα, the mutant lost its native depupylation activity and functioned as a catalytically active Pup-ligase (fig. 4B). Remarkably, 20 mutations were sufficient to completely abolish Dop native activity and to change its catalytic activity from a hydrolase to a ligase.

The Dop-Loop Contributes to the Change of Function
The mutational analysis described in figure 4B did not account for the possibility that, although the Dop-loop is not essential for Dop activities, its deletion nevertheless contributed to the change of function. This flaw resulted from our inability to purify a Dopα mutant lacking the Dop-loop (Dopα ΔDop-loop) owing to protein solubility problems. To circumvent this problem, we sought to perform pupylation assays in E. coli [...] (fig. 5A). As PupE was co-expressed with each tested enzyme, the pupylome (i.e., the pool of pupylated proteins in the cell) levels could be monitored via western blots using antibodies against Pup. As expected, a pupylome was detected upon PafA expression, but not upon expression of wild type Dop. The Dop2PafAα mutant produced a pupylome level comparable with that of wild type PafA, whereas the Dopα mutant produced very low pupylation levels. This is consistent with the lack of pupylation observed for the Dopα mutant in vitro (fig. 4B). Importantly, Dopα Dop-loop GS generated a higher level of pupylome, whereas deletion of the whole Dop-loop (Dopα ΔDop-loop) resulted in an even higher pupylome level. Clearly, the Dopα mutant lacking the Dop-loop, with no addition of supporting mutations, was able to perform pupylation in vivo. In other words, the replacement of the alpha-loop region in Dop, combined with the Dop-loop deletion, was sufficient for a change in function to occur. However, this mutant presented lower pupylome levels in comparison with the Dop2PafAα mutant, the original mutant that includes 11 supporting mutations in addition to the alpha-loop replacement and the Dop-loop deletion. Therefore, the supporting mutations, although not essential for a change in activity, contributed to the conversion of a depupylase to a Pup-ligase.
Realizing that the presence of the Dop-loop can inhibit a change in activity, we sought to compare the in vitro activity of Dop2PafAα with a similar mutant that also presents the Dop-loop. To avoid solubility problems, we attempted mutagenesis of the Dop ortholog from Acidothermus cellulolyticus (Dop Ac), the ortholog for which a crystal structure is available. Previously, mutational analysis indicated that transplantation of the PafA alpha-loop into Dop Ac did not lead to an activity change (Özcelik et al. 2012). Here, a Dop2PafAα Ac mutant was generated, presenting an intact Dop-loop and all the additional 11 supporting mutations (fig. 5B and table 1). The Dop2PafAα Ac mutant was purified, and its pupylation and depupylation activities were tested in vitro. We found that this mutant could pupylate FabD, albeit very slowly, emphasizing the contribution of the additional supporting mutations to a change in function (fig. 5B). Interestingly, the Dop2PafAα Ac mutant also retained some depupylation activity, as it was able to depupylate Pup-FabD.
To further understand the Dop-loop contribution to the functional differences between PafA and Dop, a PupQ deamidation reaction was performed. The product of the deamidation reaction is PupE, and the two Pup variants migrate slightly differently in SDS-PAGE, thus allowing detection of PupQ deamidation. Although wild type Dop catalyzed PupQ deamidation within a few minutes, no PupE accumulation was observed using the Dop2PafAα Ac mutant even after 3 h (fig. 5C). At the same time, Dop2PafAα Ac, in contrast to [...]
[...] residues that are identical in both enzymes. Specifically, two threonines and an arginine are highly conserved in both enzymes, and are perfectly aligned in the sequences of Dop and PafA, yet these residues are spatially arranged differently in the two enzymes, owing to the different conformation of the alpha-loop region (fig. 6A and B). In Dop, these residues clearly face the active site, and are potentially involved in catalysis. In PafA, these residues point away from the active site. To test their role in PafA, the two threonines and the arginine were mutated to alanines for activity measurements in vitro. The single threonine to alanine mutants (PafA T183A, PafA T184A) were found to be active, yet catalyzed FabD pupylation considerably more slowly than wild type PafA (fig. 6C). The double mutant, PafA T183A,T184A, was found to be even less active, and no activity could be detected for the arginine to alanine mutant, PafA R193A. These results indicate that those alpha-loop residues that are conserved and identical in PafA and Dop are also functionally important, despite their different geometric arrangement in the two enzymes. As our data indicate that the alpha-loop is a discriminatory factor that must be altered for an activity change to be achieved, it follows that the alpha-loop conformation, rather than the identity of its functional residues, is a prime factor that differentiates between PafA and Dop.

Multiple Distinct Mutational Paths Support a Change of Function
Replacement of the alpha-loop resulted in an activity change when combined with supporting mutations that were deduced based on position conservation analysis in PafA and Dop (figs. 3A and 4B). To determine which of the supporting mutations are indeed essential and responsible for the change in activity, a series of Dop2PafAα mutants was created, each presenting a single reversion back to the native state. As some of the mutants proved to be unstable to an extent where it was impossible to express and purify them for in vitro activity assays, in vivo analysis in M. smegmatis was carried out. Each Dop mutant was expressed from a plasmid in a pafA deletion strain, and the pupylome levels were monitored via western blots using antibodies against Pup. As PafA is the sole Pup-ligase, pupylome accumulation in these strains attested to a Pup-ligase activity of the expressed Dop mutants. To assess the expression levels of the Dop mutants, we relied on a polyhistidine tag present at the N-terminus of each Dop mutant, and performed western blots using antibodies specific for this tag. An empty vector, and vectors expressing wild type Dop and PafA, were used as controls.
Fig. 6 (caption, partially recovered). [...] The residues referred to in the text are surrounded by a yellow square and marked by an asterisk. In addition, a sequence logo of the alpha-loop is presented for each enzyme (Crooks 2004). Polar, green; neutral, purple; basic, blue; acidic, red; hydrophobic, black. (C) FabD (10 μM) pupylation by wild type PafA, PafA T183A, PafA T184A, PafA T183A,T184A, and PafA R193A (1 mM each) and PupE (20 mM). Samples were removed at the indicated time points for SDS-PAGE analysis, followed by CBB staining.
As expected, no pupylation was observed in the negative controls (empty vector, Dop), whereas a high level of pupylation was evident in the clone expressing wild type PafA (fig. 7A). Dop2PafAα was well expressed in M. smegmatis, and gave rise to a clear pupylome, albeit at levels lower than those observed upon PafA expression. In contrast, most of the single-reversion mutants were poorly expressed, suggesting that these reversions destabilized the Dop2PafAα mutant. This is consistent with the idea that most of the mutations originally included in Dop2PafAα were stabilizing mutations that were not necessarily required for catalysis per se. Only one reversion mutant, Ala104Pro, exhibited both expression and activity levels higher than the parental mutant, Dop2PafAα (fig. 7A). Two mutants, Phe85Ile and Glu212Val, lost their pupylation activity to an extent where pupylomes were undetectable. However, since these mutants presented low expression levels, it was difficult to determine whether these positions are functionally important for pupylation. Previous studies did not point to the respective positions in PafA, Phe47 and Glu177, as being functionally important. To further explore the functional importance of these positions in PafA catalysis, reciprocal mutagenesis was performed in the wild type context. Specifically, Phe47 in PafA was mutated to isoleucine, and Glu177 was mutated to valine. The two resulting mutants, PafA F47I and PafA E177V, were purified and their activity was tested in vitro in a FabD pupylation assay, in comparison with wild type PafA. The pupylation activity of both mutants was significantly lower than that of wild type PafA (fig. 7B). These results suggest that these positions are functionally important in PafA, and are consistent with their conservation in PafA orthologs (fig. 3A).
To determine the minimal set of supporting mutations that can support a change in activity, we created a combinatorial mutant library using a Dop variant carrying the alpha-loop replacement as a backbone for addition of mutations. This backbone also lacked the Dop-loop, as in the previous mutational analysis performed in M. smegmatis (fig. 7A). As 11 positions were mutated alongside the alpha-loop region in Dop2PafAα, and as each position can accommodate either a PafA or Dop residue, there are 2^11 = 2,048 possible combinations of supporting mutations. To simplify the analysis, the supporting mutations were divided into five different segments, with each segment presenting either the Dop or PafA sequence (fig. 8A). Accordingly, a total of 2^5 = 32 mutants were generated, and their activity was tested in vivo. This time, the [...] (fig. 8B and table 2). Western blots using antibodies against Pup and Dop were performed to assess the levels of the pupylomes and of the expressed Dop mutants, respectively. Noticeably, no strong correlation was observed between the mutant Dop expression level and its Pup-ligase activity. This was evident also from the in vivo assay presented in figure 7A. Clearly, an enzyme's stability and its activity are not tightly linked in protein space. Of the 32 mutants tested, some combinations of mutations resulted in an activity level substantially lower than that observed for the Dopα ΔDop-loop backbone (no. 31). For instance, mutants number 17 and 19 presented very weak pupylation activity (fig. 8B and table 2). At the other extreme, four mutants generated pupylome levels comparable with those observed for Dop2PafAα, and included the smallest number of supporting mutations (fig. 8B and table 2). These four mutants are no. 7 (S27A, V31F, VHA to LVGS), no. 8 (VHA to LVGS, I85F), no. 9 (S27A, V31F, I85F), and no. 10 (S27A, V31P, S450D). Each included mutations across two segments, suggesting that mutation of only one segment could not effectively support a change in activity. Importantly, the results indicate that alternative mutational paths can support a change in function. Indeed, the four mutants did not share a specific mutation in common, but rather presented different combinations, with each effectively supporting a change in function. This analysis demonstrates that multiple mutational paths were combined in the evolution of PafA and Dop, despite their redundant effect on activity.
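The library sizes quoted above follow from simple enumeration: each unit (a position or a segment) takes one of two states, Dop or PafA. The sketch below reproduces the counts. The segment labels are hypothetical placeholders, and the enumeration order is not meant to match the mutant numbering of table 2.

```python
from itertools import product

N_POSITIONS = 11  # supporting positions mutated alongside the alpha-loop
# Hypothetical labels for the five segments; the real boundaries are in fig. 8A.
SEGMENTS = ("segment_1", "segment_2", "segment_3", "segment_4", "segment_5")


def enumerate_variants(units):
    """Return every assignment of 'Dop' or 'PafA' to each unit."""
    return [
        dict(zip(units, states))
        for states in product(("Dop", "PafA"), repeat=len(units))
    ]


per_position = enumerate_variants(tuple(range(N_POSITIONS)))
per_segment = enumerate_variants(SEGMENTS)

# Variants in which exactly two of the five segments carry the PafA sequence.
two_segment = [v for v in per_segment if list(v.values()).count("PafA") == 2]

print(len(per_position), len(per_segment), len(two_segment))  # 2048 32 10
```

Grouping the 11 positions into five segments reduces the library from 2,048 to 32 constructs, at the cost of not resolving individual positions within a segment.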

Discussion
Dop and PafA are close homologs that catalyze opposite reactions. One is a hydrolase; the other a ligase (Striebel et al. 2009; Özcelik et al. 2012). Here, we were able to identify the conserved residues in Dop and PafA that are responsible for the functional differences between these enzymes. By generating Dop2PafAα, we converted Dop into a Pup-ligase, whereas the intermediate mutants between Dop and Dop2PafAα maintained their depupylation activity (fig. 9). This suggests that along the mutational pathway of an enzyme, a catalytic change can occur following a mutational threshold, namely after a critical number of mutations have accumulated, rather than gradually. Our attempts to convert PafA to a hydrolase via reciprocal mutagenesis were not successful. This implies that the changes that were sufficient for a change in Dop activity are not simply reciprocal, and additional or different changes must be made to transform PafA into a hydrolase.
Dop and PafA evolved from duplication of a gene encoding an ancestral enzyme. According to the current view of protein evolution, it is most likely that the ancestral protein was promiscuous, and that the specific pupylation and depupylation activities evolved by sub-functionalization (Conant and Wolfe 2008). Since PafA catalyzes an activity that is essential for the pupylation pathway function, it is more likely that the ancestor had a Pup-ligase activity and presented a promiscuous Dop-like activity. This view of Dop and PafA evolution is also consistent with their belonging to the GS fold, or more specifically to the carboxylate-amine ligase superfamily. Other members of the superfamily include classical GS and two families of γ-glutamyl-cysteine synthetases (GCS1 and GCS2) (Iyer et al. 2008, 2009). However, the Dop catalytic mechanism diverged from enzymes in the superfamily in two major aspects. First, although Dop does bind and use ATP for the first step of the reaction to generate an acyl-phosphate intermediate, it uses the resulting ADP and Pi for multiple catalytic cycles (fig. 1C) (Bolten et al. 2017). Although this process is still unclear, our results suggest the involvement of the conserved residues located at the Dop-loop in the unusual catalytic mechanism utilized by Dop. Second, the use of a water molecule instead of an amine group as a nucleophile, in the second part of the reaction, is unique and not known in other members of the superfamily. When considering the known enzymatic mechanisms for hydrolysis of an amide bond (as in proteolysis), Dop stands out as an unusual amidase. At first glance, such an unusual solution for catalysis of a widespread hydrolytic process may seem odd. However, when considering the evolutionary lineage of Dop, modifying an existing scaffold that already binds Pup stands to reason.
It appears that most of the mutations required for the change in Dop function were necessary for the mutant protein stability, rather than catalysis. Accordingly, single position reversions performed on Dop2PafAα resulted in most cases in reduced expression levels, which we attribute to reduced stability. From the structural and biochemical point of view, our results demonstrate that although the region of the alpha-loop contains catalytic residues that are highly conserved in both enzymes, a conformational change must take place to convey an activity change. Although structural information on the alpha-loop in the Dop2PafAα mutant is currently unavailable, deduction from the available Dop and PafA crystal structures in combination with our biochemical and mutational analysis led us to propose that the mutations in Dop2PafAα indeed resulted in a structural change of the alpha-loop conformation. Changing the region of the alpha-loop alone is not sufficient for that change to take place, and it must be accompanied by additional point mutations, supposedly to stabilize the needed conformation, demonstrating an epistatic effect between the alpha-loop residues and the supporting mutations. When the supporting mutations were added combinatorially, we found that a minimum of three out of the eleven mutations are required to support a change of function, and that different distinct mutational paths enabled the change, demonstrating a higher than expected probability of change. All of the supporting mutation positions were highly conserved in PafA; however, based on our results, not all of them are needed to support a Pup-ligase activity. At most, one would expect some of these positions to show a coevolution relationship rather than being fully conserved. Hence, it seems that multiple mutational paths were combined in PafA evolution. This could be considered beneficial in terms of evolvability; however, it is not clear what the selective pressure for this kind of redundancy could be, or how general this phenomenon is in protein evolution.
This study demonstrates the changes required in protein space for a new catalytic activity to evolve from a preexisting one. We identified a secondary conserved network of positions that are responsible for the change in activity, and by doing so explored the evolutionary consequences of the complex interplay that takes place between catalytic residues and the "static" protein scaffold that accommodates them. We conclude this discussion with a few sentences from the original paper that introduced the concept of protein space: "Some questions about molecular evolution can be formulated more clearly in terms of a protein space. For example: (i) Are all existing proteins part of the same continuous network, and if so, have they all been reached from a single starting point? (ii) How often, if ever, has evolution passed through a nonfunctional sequence?" (Maynard Smith 1970).

Bacterial Strains and Growth Conditions
Mycobacterium smegmatis mc²155 (wild type and mutants) cultures were grown in Middlebrook 7H9 broth containing 0.05% (v/v) Tween-80 and 0.4% (v/v) glycerol at 30 °C. Solid medium was prepared using Middlebrook 7H10 supplemented with 0.4% glycerol. Escherichia coli ER2566 (New England Biolabs) was used for all cloning procedures and was grown using typical procedures in LB broth and plates at 37 °C. For the M. smegmatis in vivo pupylation assay, plasmid pMV206 (Stover et al. 1991) was used for cloning and expression of wild type PafA, Dop, and Dop mutants in a M. smegmatis Δpaf strain, under the transcriptional control of the hsp60 promoter. Cultures harboring pMV206 were grown with kanamycin (10 mg/ml). For pupylation assays in E. coli, plasmid pBAD24 (Guzman et al. 1995) was used to express Pup E under the control of the arabinose operon, and plasmid pCL1920 (Lerner and Inouye 1990) was used to express wild type PafA, Dop, and Dop mutants under the control of the lac promoter-operator. Cultures harboring pBAD24 and pCL1920 were grown with ampicillin (100 mg/ml) and spectinomycin (50 mg/ml), respectively.

Phylogenetic Analysis
The hmmsearch program of the HMMER 3.2.1 software (Eddy 2011; Mistry et al. 2013) and the hidden Markov model (HMM) profiles TIGR03688 and TIGR03686 available in the TIGRFAM database (Haft 2003) were initially used to extract Dop and PafA orthologous proteins, respectively. However, we later observed incongruencies in alignments and concluded that the profiles were not discriminative enough to clearly distinguish the two paralogs. We thus built HMM profiles in this study with the hmmbuild program using Dop and PafA sequences of model organisms. These 20-30 sequences, aligned using the MAFFT v7.313 software (Katoh and Standley 2013), represent several phyla and were unambiguously annotated using the MicroScope annotation platform as Dop or PafA (Vallenet et al. 2009, 2017). Built HMM profiles and alignments are given in supplementary materials. For each genome, only the most significant hit was retained, setting an expectation E value threshold of 1e−100. One copy of Dop and PafA was recovered from each genome, aligned using MAFFT with the L-INS-i option for higher accuracy, and trimmed with the Gblocks software with less stringent parameters (Castresana 2000).
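The hit-selection rule described above (retain only the most significant hit per genome, subject to an E value threshold) can be sketched as follows. This is a minimal illustration operating on already-parsed hits; the `Hit` record, the function name, and the example identifiers are assumptions made here and do not reproduce the scripts used in this study.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    genome: str    # genome identifier (hypothetical naming)
    protein: str   # protein identifier (hypothetical naming)
    evalue: float  # full-sequence E value reported by the search


def best_hit_per_genome(hits: Iterable[Hit], max_evalue: float = 1e-100) -> Dict[str, Hit]:
    """Keep, for each genome, the hit with the lowest E value at or below max_evalue."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # not significant enough under the chosen threshold
        current = best.get(hit.genome)
        if current is None or hit.evalue < current.evalue:
            best[hit.genome] = hit
    return best


# Invented example values, for illustration only.
example_hits = [
    Hit("genome_1", "prot_a", 1e-150),
    Hit("genome_1", "prot_b", 1e-120),
    Hit("genome_2", "prot_c", 1e-50),
]
selected = best_hit_per_genome(example_hits)
print(selected)
```

Relaxing `max_evalue` (for example to 1e-10) admits more divergent homologs while still returning at most one protein per genome.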
From Hydrolase to Ligase. doi:10.1093/molbev/msaa215 MBE
A maximum-likelihood tree was built with the IQ-TREE software (Nguyen et al. 2015) and the model LG+F+R5 for describing amino-acid evolution, selected using ModelFinder (Kalyaanamoorthy et al. 2017) and the BIC criterion; 200 replicates of a nonparametric bootstrap approach were conducted to test the robustness of the tree topology. All known proteins in the γ-glutamyl-cysteine synthetase families were too divergent to be used here as an external group. Lowering the expectation E-value threshold to 1e−10, we detected a single copy of a paralogous protein close enough to both Dop and PafA in some Planctomycetes species. This set of single copy PafA-/Dop-related proteins was used as an external group to attest to the duplication event and the ancestry of the indels of Dop and PafA.
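For readers unfamiliar with BIC-based model selection, the criterion is BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood; the candidate with the lowest BIC is preferred. The sketch below shows the arithmetic only. The model names, log-likelihoods, and parameter counts are placeholders, not values obtained in this study, and the sketch does not reproduce ModelFinder itself.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def select_by_bic(candidates: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the name of the candidate model with the lowest BIC.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Placeholder candidates: names and numbers are invented for illustration.
toy_candidates = {
    "model_A": (-1000.0, 10),
    "model_B": (-990.0, 30),
}
chosen = select_by_bic(toy_candidates, n_sites=500)
print(chosen)
```

Here the richer model fits slightly better but is penalized for its extra parameters, so the simpler placeholder model is chosen.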

Protein Expression and Purification
All proteins used in this study were recombinant M. smegmatis proteins, unless stated otherwise. For Pup purification, pup was cloned into plasmid pSH21 in fusion with the DNA encoding human titin-I27 and a TEV protease recognition sequence (His₆-I27-TEV-Pup). Expression was at 30 °C, and Ni²⁺-NTA purification was carried out according to a standard protocol. Following TEV cleavage, a buffer exchange step was carried out, and the His₆-I27-TEV portion of the chimera was removed by loading the solution onto a Ni²⁺-NTA column. The flow-through was collected, and Pup was further purified on a C18 reverse phase column, lyophilized, and resuspended in 50 mM Hepes, pH 7.5, 50 mM NaCl.
All Dop variants were expressed in E. coli strain ER2566 from plasmid pET11a (with a C-terminal polyhistidine tag) or from plasmid pSH21 (N-terminal polyhistidine tag) under the transcriptional control of the T7 promoter. Following induction with IPTG, the cultures were incubated overnight at 18 °C. Cells were lysed by sonication, and purification using Ni²⁺-NTA-agarose (Qiagen) was carried out according to a standard protocol, except that for purification of M. smegmatis Dop variants, buffers contained 10% glycerol (v/v). A second size exclusion chromatography purification step relied on a Superdex 200 column (GE Healthcare). For the M. smegmatis Dop variants, the buffer used for purification contained 50 mM Hepes, pH 7.5, 150 mM KCl, 20 mM MgCl₂, 10% (v/v) glycerol and 1 mM DTT. For purification of A. cellulolyticus Dop, the buffer contained 50 mM Hepes, pH 8.0, 300 mM NaCl, 20 mM MgCl₂, and 1 mM DTT.
PafA carried an N-terminal polyhistidine tag and was expressed at 30 °C in E. coli strain ER2566 from plasmid pSH21 under the transcriptional control of the T7 promoter. Cells were lysed by sonication, and purification using Ni²⁺-NTA-agarose (Qiagen) was carried out according to a standard protocol. Ni²⁺-NTA purification buffers contained 10% glycerol (v/v). As a subsequent purification step, a Superdex 200 size exclusion column (GE Healthcare) equilibrated with 50 mM Hepes, pH 7.5, 500 mM NaCl, and 10% (v/v) glycerol was used. The same procedure was used for PanB purification, except that the Superdex 200 size exclusion column (GE Healthcare) was equilibrated with 50 mM Hepes, pH 7.5, 150 mM NaCl, and 10% (v/v) glycerol. For IdeR purification, the same procedure was used, with a buffer of 50 mM Hepes, pH 7.5, 150 mM NaCl for Superdex 200 size exclusion column (GE Healthcare) equilibration. N-terminal polyhistidine tagged M. tuberculosis FabD that presents arginine substitutions of lysines 35, 122, and 291 was cloned following the same protocol used for IdeR purification.
For generation and purification of pupylated PanB, IdeR, and FabD, a C. glutamicum PafA (cgPafA) was used that presents an N-terminal polyhistidine tag followed by a TEV protease sequence. cgPafA was purified using the same protocol that was used for purification of M. smegmatis PafA, except that following elution from the Ni²⁺-NTA beads, the imidazole in the buffer was removed via a buffer exchange step using a PD10 column (GE Healthcare), and the TEV protease was added at a TEV/PafA ratio of 1:100 (w/w). Following a 6-h incubation, the protein solution was loaded onto a prewashed Ni²⁺-NTA column, and the cgPafA-containing flow-through was collected and loaded onto a Superdex 200 column (GE Healthcare) prewashed with a buffer containing 50 mM Hepes pH 7.5, 500 mM NaCl, and 1 mM DTT. PanB, IdeR, and FabD were expressed and purified as described above. However, following elution from the Ni²⁺-NTA beads, the buffers were exchanged using PD10 columns (GE Healthcare) into pupylation buffers. For IdeR and FabD, a pupylation buffer lacking glycerol was used. Next, cgPafA and Pup E were added to a final concentration of 2.5 and 200 µM, respectively. Following a 6-h incubation at 30 °C, standard Ni²⁺-NTA purifications were performed to remove cgPafA and Pup E, as these proteins lack a polyhistidine tag. The eluted pupylated proteins were further purified by size-exclusion chromatography using a Superdex 200 column (GE Healthcare) prewashed with a buffer containing 25 mM Hepes pH 7.5 and 300 mM NaCl. For PanB, the buffer also contained glycerol (10% v/v).

Multiple Site-Directed Mutagenesis
The QuikChange Lightning Multi Site-Directed Mutagenesis kit (Agilent Technologies) was used to create the 32 Dop combinatorial mutants.

Activity Assays
The buffer used for all in vitro reactions contained 50 mM Hepes (pH 7.5), 20 mM MgCl₂, 150 mM KCl, 1 mM DTT, and 10% (v/v) glycerol. Samples were analyzed by electrophoresis on a 12% polyacrylamide Bis-Tris gel followed by Coomassie brilliant blue (CBB) staining. Pupylation, depupylation and deamidation assays were performed in a buffer containing ATP (2 mM) at 30 °C.
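As a worked example of preparing the reaction buffer listed above, the volume of each stock solution follows from C1·V1 = C2·V2. The final concentrations below are those stated in the text; the stock concentrations and the 10 ml batch size are assumptions chosen for illustration and should be replaced with the stocks actually at hand.

```python
from typing import Dict

# Final concentrations from the text (mM, except glycerol in % v/v).
TARGETS = {"Hepes": 50.0, "MgCl2": 20.0, "KCl": 150.0, "DTT": 1.0, "glycerol": 10.0}

# Hypothetical stock concentrations, in the same units as the matching target.
STOCKS = {"Hepes": 1000.0, "MgCl2": 1000.0, "KCl": 2000.0, "DTT": 1000.0, "glycerol": 100.0}


def stock_volumes(final_volume_ml: float,
                  targets: Dict[str, float],
                  stocks: Dict[str, float]) -> Dict[str, float]:
    """Return the ml of each stock (and of water) needed, using C1*V1 = C2*V2."""
    volumes = {name: final_volume_ml * targets[name] / stocks[name] for name in targets}
    water = final_volume_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


recipe = stock_volumes(10.0, TARGETS, STOCKS)
for component, ml in recipe.items():
    print(f"{component}: {ml:.2f} ml")
```

The pH adjustment of the Hepes stock and the addition of ATP for the assays are not handled by this sketch.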
For in vivo activity assays, E. coli cultures harboring plasmids pBAD24 and pCL1920 were grown overnight (~20 h) at 30 °C in 5 ml of auto induction media LB broth base (FORMEDIUM) supplemented with 1% glycerol (v/v) and 0.2% arabinose (v/v). Escherichia coli and M. smegmatis lysates were prepared by sonication of cell pellets in microcentrifuge tubes containing 0.5 ml of 1 mM Tris-HCl, pH 8.0, 1 mM EDTA. Cell debris was removed by centrifugation (18,000 × g, 4 °C) for 10 min. Protein content in each sample was determined using Pierce BCA protein assay kit (Thermo Scientific). Equal protein amounts were loaded onto SDS-PAGE for electrophoretic separation, followed by transfer onto PVDF membranes and immuno-detection using standard procedures. As a final step after completion of immunodetection, probed membranes were stained by CBB to verify equal loading and transfer of proteins in each lane.
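Loading equal protein amounts amounts to dividing the desired mass by each lysate concentration measured in the BCA assay. The sketch below illustrates that calculation; the sample names, concentrations, and the 20 µg target are invented for illustration, since the text does not state the amount loaded per lane.

```python
from typing import Dict


def loading_volumes_ul(conc_mg_per_ml: Dict[str, float], protein_ug: float) -> Dict[str, float]:
    """Return the lysate volume (µl) containing protein_ug of protein for each sample.

    A concentration in mg/ml is numerically equal to µg/µl, so volume = mass / concentration.
    """
    volumes = {}
    for sample, conc in conc_mg_per_ml.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for sample {sample!r}")
        volumes[sample] = protein_ug / conc
    return volumes


# Hypothetical BCA readings (mg/ml) and a hypothetical 20 µg target per lane.
bca_readings = {"lysate_1": 2.0, "lysate_2": 4.0}
lane_volumes = loading_volumes_ul(bca_readings, protein_ug=20.0)
print(lane_volumes)
```

The final CBB staining of the membrane described above remains the experimental check that the calculated volumes did deliver equal loading.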

Structural Alignment
Molecular graphics and analyses were performed with the UCSF Chimera package (Pettersen et al. 2004).

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.