Domain acquisition by class I aminoacyl-tRNA synthetase urzymes coordinated the catalytic functions of HVGH and KMSKS motifs

Abstract Leucyl-tRNA synthetase (LeuRS) is a Class I aminoacyl-tRNA synthetase (aaRS) that synthesizes leucyl-tRNALeu for codon-directed protein synthesis. Two signature sequences, HxGH and KMSKS, help stabilize transition states for amino acid activation and tRNA aminoacylation by all Class I aaRS. Separate alanine mutants of each signature, together with the double mutant, behave in opposite ways in Pyrococcus horikoshii LeuRS and the 129-residue urzyme ancestral model generated from it (LeuAC). Free energy coupling terms, Δ(ΔG‡), for both reactions are large and favourable for LeuRS, but unfavourable for LeuAC. Single turnover assays with 32Pα-ATP show correspondingly different internal products. These results implicate domain motion in catalysis by full-length LeuRS. The distributed thermodynamic cycle of mutational changes authenticates LeuAC urzyme catalysis far more convincingly than do single point mutations. Most importantly, the evolutionary gain of function induced by acquiring the anticodon-binding domain (ABD) and multiple insertion modules in the catalytic domain appears to be the coordination of the catalytic functions of the HxGH and KMSKS signature sequences. The implication that backbone elements of secondary structures achieve a major portion of the overall transition-state stabilization by LeuAC is also consistent with coevolution of the genetic code and the metabolic pathways necessary to produce histidine and lysine sidechains.


INTRODUCTION
Class I aminoacyl-tRNA synthetases (aaRS) are one of two enzyme superfamilies that, together with their cognate tRNAs, translate the genetic code (1). They perform that function by activating cognate amino acids at the expense of the two high-energy bonds in ATP and transferring the activated aminoacyl group from the 5′ position of AMP to the tRNA 3′-terminal ribose. Eleven contemporary Class I aaRS families are responsible for the hydrophobic amino acids (isoleucine, valine, leucine, methionine), together with the larger homologs of the acidic (glutamate), amide (glutamine), and aromatic (tryptophan, tyrosine) amino acids, as well as arginine and cysteine. Some archaea and bacteria also possess a Class I lysyl-tRNA synthetase (2), which in most organisms is a Class II enzyme. All Class I aaRS share the same three domains (Figure 1): a Rossmann dinucleotide-binding catalytic domain, variable-length insertions between the two crossover connections of the Rossmann fold (CP) (3, 4) including an embedded editing domain in larger Class I enzymes, and an idiosyncratic anticodon-binding domain, ABD. The active site within the Rossmann fold is highly conserved across the superfamily (5). Moreover, that domain has been re-constructed without any of the other domains, and it retains all three functions (amino acid activation, retention of the activated amino acid, and transfer to cognate tRNA) necessary to implement a rudimentary form of coding. For these reasons, we have argued that such constructs, called 'urzymes' (6), are excellent experimental models for studying the emergence and early evolutionary history of ancestral aaRS (7-10) and genetic coding generally.
Figure 1. [...] (11), an insertion module before the second crossover; C2, C-terminal fragment of the urzyme (pyrophosphate binding); ABD, anticodon-binding domain; CTD, C-terminal domain that binds to the tRNALeu variable loop. (B) Three-dimensional cartoon based on PDB ID 1WC2 showing the LeuAC urzyme (surface) embedded into the full-length enzyme and the arrangement of modules acquired during the specialization of LeuRS from other Class I aaRS. A76 of the tRNALeu acceptor stem inserts into the active sites of both LeuRS and LeuAC, bisecting the protozyme (blue) and urzyme part C (C2), the second half of the urzyme. Insertions into a loop connecting the protozyme and C2 are all nested, one within the next, forming the complete CP. In contrast, C-terminal additions are serial.
Here, we utilize the leucyl-tRNA synthetase from Pyrococcus horikoshii, LeuRS (11), and its urzyme, LeuAC (12), to examine the evolutionary gain of function associated with acquiring the CP and ABD domains. We note that as P. horikoshii expresses the archaeal-like LeuRS, which is monomeric, LeuAC is also very likely monomeric. The location of its post-transfer editing domain within the extended connecting peptide (CP) insertion differs from that in eubacterial LeuRS (13). Construction of the LeuAC urzyme entailed removing the entire extended CP domain, which includes many acquired insertion modules (Douglas, J., Bouckaert, R., Carter, C.W.J. and Wills, P. (2023) Enzymic recognition of amino acids drove the evolution of primordial genetic codes. Research Square DOI 10.21203/rs.3.rs-2924681/v1), as well as the ABD and C-terminal domains (Figure 1). LeuAC is thus homologous in that respect to the corresponding urzyme excerpted from Geobacillus stearothermophilus, TrpAC (6, 14). However, the mass deleted from the catalytic domain (377 amino acids) is ∼five times more than was deleted (74 amino acids) from the TrpRS catalytic domain. That missing mass includes insertion modules that others have suggested were essential to aminoacylation (15). LeuAC required a modest number of sequence changes on the surface to stabilize the removal of the Connecting Peptide. Supplementary Figure S1 in the Supplementary Information for (12) summarizes these sequence changes.
Two consensus sequence motifs present in the Rossmann fold (HIGH and KMSKS) have been implicated in multiple ways in catalysis of both amino acid activation and acyl transfer to tRNA in many Class I aaRS. Their conservation across the superfamily has been an evolutionary curiosity, suggesting that they are central to catalysis. However, both histidine and lysine require complex biosynthetic pathways, and thus are unlikely to have been available for the earliest stages of codon-dependent translation (16-19).
The unexpectedly modest impact of mutating lysine residues to alanine in the LeuAC urzyme KMSKS signature (12) motivated us to examine the catalytic roles of both signatures in both LeuRS and LeuAC. An obvious way to do this is with a factorial design in which both histidines in the HxGH signature are also mutated to alanine separately and in combination. That combinatorial mutagenesis measures the energetic coupling between the two signatures. To our knowledge, and despite a landmark combinatorial mutagenesis of the KFGKT sequence in TyrRS (20-23), no one has examined the coordinated behavior of the two Class I catalytic signatures, even in a full-length Class I aaRS.
Comparing that coupling in urzyme and full-length enzyme maps changes in the evolutionary time domain, hence measures the catalytic gain of function from integrating the domains deleted in the urzyme.
The comparison reinforces previous suggestions (24-27) that full-length Class I aaRS benefit from a significant intramolecular cooperativity that is absent from the corresponding urzymes. We identify side chain packing interactions between the hydrophobic side chains in each catalytic signature and a conserved cluster of nonpolar side chains in the anticodon-binding domain. Embedding the valine and methionine side chains of the LeuRS HVGH and KMSKS sequences into that cluster provides a structural rationale for their coordination during the catalytic cycle, consistent with their catalytic coupling in the full-length enzyme.
The anti-coupling between HVGH and KMSKS signatures in the urzyme has important implications for the emergence and early evolution of polypeptide catalysts. Surprisingly, the AVGA variant of the LeuAC urzyme benefits from both the wild-type KMSKS signature and unfavorable coupling energy; it is thus the most active LeuAC variant and is actually more active at aminoacylation than the corresponding mutant of full-length LeuRS. Catalysis by aaRS urzymes does not require specific amino acid side chains that are highly conserved in the active sites of all Class I aaRS, thus significantly broadening the sequence space for primordial catalysts and creating substantive experimental support for Wong's proposal that the genetic code coevolved with the amino acid biosynthetic pathways (16, 17).
Finally, the consistent response to combinatorial mutational changes evident from the thermodynamic cycle analysis, and the provision of a structural rationale for differential intramolecular coupling in the full-length enzyme, both also decisively confirm the authenticity of catalysis by aaRS urzymes.

Expression and purification of LeuRS and LeuAC
The gene for Pyrococcus horikoshii (Ph) LeuRS was synthesized by Gene Universal and expressed from pET-11a in BL21-CodonPlus (DE3)-RIPL (Agilent). Cells were grown at 37 °C and induced with 300 µM IPTG for 4 h, then harvested and stored overnight at -20 °C. The cell pellet was resuspended in 1× Ni-NTA buffer (20 mM Tris, pH 8.0, 300 mM NaCl, 10 mM imidazole, 5 mM β-ME) plus cOmplete protease inhibitor (Roche) and lysed by three 15k psi passes on an Avestin EmulsiFlex. Cell debris was pelleted at 4 °C for 30 min at 20k rpm. The soluble fraction was heated at 80 °C for 30 min to denature native Escherichia coli proteins. The heated cell extract was then pelleted, and the soluble material was loaded onto an equilibrated Ni-NTA column. The column was washed with three volumes of 1× Ni-NTA buffer, then protein was eluted in a stepwise fashion with imidazole concentrations of 40, 80, 100, 200, and 500 mM. The fractions containing the protein of interest were pooled and dialyzed overnight against 200 mM HEPES, pH 7.4, 450 mM NaCl, 100 mM KCl, 10 mM β-ME. The following day the dialyzed protein was concentrated, brought to 50% glycerol and stored at -20 °C.
LeuAC was expressed as an MBP fusion from pMAL-c2x in BL21Star(DE3) (Invitrogen). Cells were grown, induced, harvested, and lysed similarly to Ph LeuRS, with the distinct difference of being resuspended in Optimal Buffer (20 mM Tris, pH 7.4, 1 mM EDTA, 5 mM β-ME, 17.5% glycerol, 0.1% NP40, 33 mM (NH4)2SO4, 1.25% glycine, 300 mM guanidine hydrochloride) plus cOmplete protease inhibitor (Roche). LeuAC crude extract was then pelleted at 4 °C for 30 min at 15k rpm to remove insoluble material. The extract was then diluted 1:4 with Optimal Buffer and loaded onto equilibrated Amylose FF resin (Cytiva). The resin was washed with five column volumes of buffer and the protein was eluted with 10 mM maltose in Optimal Buffer. Fractions containing protein were concentrated, brought to 50% glycerol and stored at -20 °C. All protein concentrations were determined using the Pierce™ Detergent-Compatible Bradford Assay Kit (Thermo Scientific). Experimental assays were performed either with the intact MBP-LeuAC fusion protein or with samples cleaved by tobacco etch virus (TEV) protease, purified as described (28). Purity and cleavage efficiency were determined by running samples on PROTEAN® TGX (Bio-Rad) gels.

Single turnover active-site titration assays
Active-site titration assays were performed as described (29, 30). 3 µM of protein was added to 1× reaction mix (50 mM HEPES, pH 7.5, 10 mM MgCl2, 5 µM ATP, 50 mM amino acid, 1 mM DTT, inorganic pyrophosphatase, and α-labeled [32P] ATP) to start the reaction at 37 °C for LeuAC and on ice (slightly above 0 °C) for LeuRS. The α labeling position allowed us to follow time courses of ADP and AMP production, as well as of ATP consumption. Timepoints were quenched in 0.4 M sodium acetate, 0.1% SDS and kept on ice until all points had been collected. Quenched samples were spotted on TLC plates, developed in 850 mM Tris, pH 8.0, dried, exposed for varying amounts of time to a phosphor image screen, and visualized with a Typhoon Scanner (Cytiva). ImageJ (31) was used to quantitate intensities of each nucleotide. The time-dependent loss (ATP) or de novo appearance (ADP, AMP) of the three adenine nucleotide phosphates was fitted using the nonlinear regression module of JMP16PRO™ (SAS Institute, Cary NC) to equation (1) (29), where k_chem is the first-order rate constant, k_cat is the rate of turnover, A is the amplitude of the first-order process or the size of the 'burst', and C is an offset. The burst size is proportional to the number of active molecules in the added catalyst that are responsible for producing a product in the first round of catalysis. Eq. (1) was also used for convenience to fit the appearance of nucleotide products, AMP and ADP. In that case the roles of A and C are inverted, C being the burst size and A the offset, and both rate constants have the opposite sign.
Nucleic Acids Research, 2023, Vol. 51, No. 15 8073
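The burst analysis described above can be illustrated with a minimal sketch. Because equation (1) itself is not reproduced in this text, the sketch assumes the conventional burst form y(t) = A·exp(-k_chem·t) + k_cat·t + C, with a negative linear slope standing in for ATP consumption; that functional form, the sign convention, the function names and all numerical values are illustrative assumptions, not the fitted model or data from this work.

```python
import math

def burst_curve(t, A, k_chem, k_cat, C):
    """Assumed burst form: y(t) = A*exp(-k_chem*t) + k_cat*t + C."""
    return A * math.exp(-k_chem * t) + k_cat * t + C

def late_slope_and_intercept(times, values):
    """Ordinary least-squares line through late (post-burst) time points."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

# Hypothetical parameters in arbitrary units, chosen only for illustration.
A, K_CHEM, K_CAT, C = 1.0, 2.0, -0.01, 4.0

# Once the exponential phase has decayed, the curve is a straight line
# whose slope is k_cat and whose intercept is the offset C.
late_t = [10.0, 12.0, 14.0, 16.0]
late_y = [burst_curve(t, A, K_CHEM, K_CAT, C) for t in late_t]
slope, intercept = late_slope_and_intercept(late_t, late_y)

# Under this form, the burst amplitude is the gap at t = 0 between the
# full curve and the extrapolated steady-state line.
burst_estimate = burst_curve(0.0, A, K_CHEM, K_CAT, C) - intercept
```

In practice all four parameters would be fitted simultaneously by nonlinear regression, as described in the text; the linear extrapolation here only shows how A, C and k_cat relate geometrically under the assumed form.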

Burst size estimation from fitted parameters in equation (1)
For

tRNA Leu aminoacylation assays
A plasmid encoding the P. horikoshii tRNALeu (TAG codon) was synthesized by Integrated DNA Technologies and used as template for PCR amplification of the tRNA and upstream T7 promoter and downstream Hepatitis Delta Virus (HDV) ribozyme. The PCR product was used directly as template for T7 transcription. Following a 4-hour transcription at 37 °C the RNA was cycled five times (90 °C for 1 min, 60 °C for 2 min, 25 °C for 2 min) to increase the cleavage by HDV. The tRNA was purified by urea PAGE and crush and soak extraction. The tRNA 2′,3′-cyclic phosphate was removed by treatment with T4 PNK (New England Biolabs) following the manufacturer's protocol. The tRNA was then phenol chloroform isoamyl alcohol extracted, filter concentrated, aliquoted, and stored at -20 °C. We determined the active fraction of tRNALeu by following extended acylation assays using full-length P. horikoshii LeuRS until they reached a plateau. That plateau value was 0.38, which we used to compute tRNALeu concentrations in assays with both LeuRS and LeuAC.
Aminoacylations were performed in 50 mM HEPES, pH 7.5, 10 mM MgCl2, 20 mM KCl, 5 mM DTT with indicated amounts of ATP and amino acids. Desired amounts of unlabeled tRNA (mixed with [α-32P] A76-labeled tRNA for assays by LeuAC) were heated in 30 mM HEPES, pH 7.5, 30 mM KCl to 90 °C for 2 min. The tRNA was then cooled linearly (1 °C per 30 s) until it reached 80 °C, when MgCl2 was added to a final concentration of 10 mM. The tRNA continued to cool linearly until it reached 20 °C.

Data processing and statistical analysis
Phosphorimaging screens of TLC plates were densitometered using ImageJ (31). Data were transferred to JMP16PRO™ via Microsoft Excel (version 16.49), after intermediate calculations. We fitted active-site titration curves to Eq. (1) using the JMP16PRO™ nonlinear fitting module. R² values were all > 0.97 and most were > 0.99.
Factorial design matrices in Supplementary Tables S1 and S2 were processed using the Fit Model multiple regression analysis module of JMP16PRO™, using an appropriate form of equation (2):
$$Y_{obs} = \beta_0 + \sum_i \beta_i P_i + \sum_{i<j} \beta_{ij} P_i P_j + \varepsilon \qquad (2)$$
where Y_obs is a dependent variable, usually an experimental observation, β_0 is a constant derived from the average value of Y_obs, β_i and β_ij are coefficients to be fitted, P_i,j are independent predictor variables from the design matrix, and ε is a residual to be minimized. All rates were converted to free energies of activation, ΔG‡ = -RT ln(k), before regression analysis because free energies are additive, whereas rates are multiplicative. For example, the activation free energy for the first-order decay rate in single-turnover experiments is ΔG‡_kchem. Multiple regression analyses of factorial designs exploit the replication inherent in the full collection of experiments to estimate experimental variances on the basis of t-test P-values, in contrast to presenting error bars that show the variance of individual data points. Analyses reported here also entail triplicate experimental replicates, which improve the associated analysis of variance.
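The reason for converting rates to free energies before regression can be seen in a short sketch of the relation ΔG‡ = -RT ln(k) stated above. The temperature (310.15 K, i.e. 37 °C), the reference rate, and the five-fold factor are hypothetical values chosen for illustration; only the formula is taken from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def activation_free_energy(k, temp_kelvin):
    """Delta G-double-dagger = -RT ln(k), in kcal/mol."""
    return -R_KCAL * temp_kelvin * math.log(k)

T = 310.15      # 37 degrees C, assumed for this illustration
k_ref = 0.02    # hypothetical reference rate constant (1/s)
fold = 5.0      # hypothetical fold change in rate

# A single five-fold rate increase lowers the free energy by about 1 kcal/mol.
ddg_one = activation_free_energy(fold * k_ref, T) - activation_free_energy(k_ref, T)

# Two successive five-fold increases multiply the rate by 25,
# but their free energy contributions simply add.
ddg_two = activation_free_energy(fold * fold * k_ref, T) - activation_free_energy(k_ref, T)
```

Because the differences add, a linear model such as Eq. (2) can apportion them among main effects and interactions, which would not be possible with the rates themselves.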

RESULTS
The experimental design matrix for this work is the 2³ factorial design that tests all possible combinations of HVGH → AVGA and KMSKS → AMSAS mutations in both full-length LeuRS and the LeuAC urzyme. The eight variants were constructed, expressed, and purified as described previously (12) and in MATERIALS AND METHODS. We assayed all variants for leucine activation by single turnover active-site titrations (Supplementary Table S1) and for aminoacylation of tRNALeu in triplicate (Supplementary Table S3). Active-site titration experiments provided suitable estimates of burst sizes for all variants of both LeuAC and LeuRS. First-order rate constants for LeuRS variants were too fast at room temperature to fit reliably. However, reactions on ice (at ∼0 °C) were sufficiently slow to permit corresponding estimation of the mutational impact on leucine activation by LeuRS variants. The first section describes those results.
We then describe aminoacylation rates measured at saturating concentrations of leucine (12) and the highest achievable tRNALeu concentrations. The next section compares mutational impacts on leucine activation and tRNALeu aminoacylation by both LeuAC and LeuRS. A subsequent section places those results into the context of the evolutionary gain of function in evolving LeuRS from LeuAC. Finally, we provide a structural rationale for the enhanced energetic coupling between the two signatures in the fully evolved enzyme.

Single turnover experiments give consistent first-order rates for changes in ATP, ADP, and AMP for both LeuAC and LeuRS
Single turnover kinetic experiments, often called active-site titrations (AST), use substrate-level amounts of enzyme. They measure the size of the burst corresponding to the first round of catalysis and are therefore the method of choice for estimating fractional activities of enzyme variants (29, 33) and for normalizing enzyme concentrations when computing k_cat values.
AST assays performed here to estimate the active fractions in all variants provide supplemental evidence that mutational impacts are comparable in both leucine activation and aminoacylation by LeuAC. We uncovered an unexpected phenomenon when we followed the time courses for all three adenine nucleotides in AST assays of aaRS urzymes (Figure 2). ATP consumption is accompanied by ADP production (12), in addition to the expected products, aminoacyl-5′-AMP and AMP, which is produced by product release and hydrolysis. Prior to analyzing kinetic data for the thermodynamic cycles presented here, we studied these curves in detail to assure a consistent interpretation.
Quantitation of the LeuAC active-site titration curves for the three adenine nucleotides (  Table S2).
LeuAC variant-specific column vectors in Figure 2B are quite strictly parallel; all six correlation coefficients have R² values > 0.99. Thus, although combinatorial mutagenesis changes the active fractions, it has no effect on the relative magnitudes of the four different n-values. Strict linear dependence means that, for the purpose of normalizing the active fractions of different variants in analyzing aminoacylation rates, any of the n-values would be equally suitable. For the present purpose, we chose to normalize acylation rates using the n_T values, to estimate the minimum rate enhancements of acylation by full-length LeuRS, relative to LeuAC.
The preceding analysis provides additional circumstantial evidence supporting our earlier proposal that ADP production arises from successive phosphorylations of bound leucyl-5′-AMP. That proposal was originally based on two observations (12). (i) AMP production exhibits a first-order rate early in the titration, followed by a sharp increase in the steady-state rate (see Supplementary Figure S2(c) in (12)). (ii) The 3D structure of the LeuAC active-site configuration provides alternative ATP-binding sites that place the γ-phosphate group into suitable positions to phosphorylate the α-phosphate of the bound adenylate first to ADP and then to ATP (see Figure 9 of (12)). If those two phosphorylation reactions regenerated ATP, it would promote bound Leu-5′-AMP to a transition-state-like configuration. That would be consistent with the relatively independent burst sizes and relative proportions of ADP and AMP.
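The use of burst sizes to normalize turnover numbers can be sketched as follows. This is a generic illustration of the principle, assuming the active fraction is simply the burst amplitude divided by the total enzyme concentration; the function names and all numbers are hypothetical and are not values from this work.

```python
def active_fraction(burst_amplitude_um, total_enzyme_um):
    """Fraction of added enzyme that completes the first catalytic round."""
    return burst_amplitude_um / total_enzyme_um

def normalized_kcat(steady_state_rate_um_per_s, total_enzyme_um, fraction_active):
    """Turnover number per active molecule rather than per added molecule."""
    return steady_state_rate_um_per_s / (total_enzyme_um * fraction_active)

# Hypothetical example: 3 uM enzyme giving a 0.6 uM burst.
fraction = active_fraction(0.6, 3.0)

# A steady-state rate of 0.03 uM/s then corresponds to a k_cat that is
# five-fold higher than the value computed from total protein alone.
kcat_active = normalized_kcat(0.03, 3.0, fraction)
kcat_uncorrected = 0.03 / 3.0
```

Without this correction, variants that differ mainly in the proportion of correctly folded molecules would appear to differ in intrinsic catalytic rate.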

Combinatorial mutants of HIGH and KMSKS catalytic signatures in LeuRS reduce its catalytic proficiency for tRNALeu acylation to that of LeuAC and its mutational variants
Supplementary Table S4 summarizes the experimental data for aminoacylation of tRNALeu. A histogram of transition-state stabilization free energies for tRNALeu aminoacylation, ΔG‡_kacyl, for all eight variants (Figure 3) reveals that the two catalytic signatures make profound contributions to the catalytic proficiency of the full-length enzyme. The distributions in the upper right of Figure 3B illustrate that ΔG‡ measurements for all LeuRS mutational variants have values comparable to those of the wild-type urzyme LeuAC and its corresponding mutations. Rather than forming a separate distribution in the neighborhood of the WT LeuRS, as observed for LeuAC variants, any mutation compromises all LeuRS variants to levels of the LeuAC urzyme and its combinatorial mutants. Active-site mutations therefore have severe effects on the full-length enzyme, but substantially smaller effects on the urzyme. Thus, neither the wild-type urzyme active site nor any full-length LeuRS variant can achieve a rate enhancement comparable to that of the wild-type full-length enzyme.
AST assays (Figure 2) furnish values of the internal first-order rate constant for first-round catalysis of leucine activation. Using these values, we can compare the impacts of mutation on the activation reaction. Figure 3A compares transition-state stabilization free energies of the four LeuAC variants in the two successive steps of tRNALeu biosynthesis. Qualitatively, the main difference between them is that all variants have lower ΔG‡ for activation than for aminoacylation. The AVGA mutant is markedly faster than any of the others, and the AMSAS variant is slowest by a small margin over the double mutant. These differences are easier to appreciate when transformed by estimating parameters for the corresponding thermodynamic cycle [Eq. (2)].

Thermodynamic cycle analysis emphasizes the gain of function resulting from ABD and CP domain acquisition
The factorial design matrices (Supplementary Tables S1, S2) enable us to attribute a magnitude and statistical significance to the free energy contributions to transition-state stabilization from the intrinsic effects of each signature sequence and the synergy between them. For this purpose we use a construct known as a thermodynamic cycle, advocated by Jencks (34) and popularized by Horovitz and Fersht (35). As originally formulated, construction of a thermodynamic cycle entails computing differences observed in activation free energies, Δ(ΔG‡), at each corner of the implicit polygon of the factorial design. For a double mutant cycle, that polygon is a square with the WT enzyme, the two single mutants, and the double mutant at its corners (see Supplementary §II and Supplementary Figure S1).
We showed (26) that in practice it is more convenient to use multiple regression methods to estimate the coefficients in Eq. (2). When values of the experimental error, ε, are small and the regression model explains a high proportion (≥0.9) of the variation in experimental data points, a thermodynamic cycle is equivalent to changing the coordinate system of the experimental free energies in Figure 3. Supplementary §II examines this equivalence in more detail, and Supplementary §III outlines how regression coefficients for main effects and lower-order interactions depend on where along the highest-order interaction they are evaluated.
The regression model for the thermodynamic cycle for ΔG‡_kchem from first-order rates, for all three nucleotides either consumed or produced in amino acid activation (Figure 4), reveals several notable details. First, the distribution of Studentized residuals associated with the contributing data points (the prediction error, ε, divided by its adjusted standard error) lies between -4 and 4, so there are no outliers. Second, there are five significant predictors listed in the table. Their Student t-test probabilities are all ≤0.02, hence are statistically significant. Third, because neither ATP nor ADP is a significant predictor of the first-order rate, there is no significant difference between rates for ATP consumption and ADP formation for the four variants. AMP production, on the other hand, is significantly different, and the AMP*HVGH β-coefficient means that it depends on whether the HxGH sequence is native or mutant. We return to this point after considering mutational effects on tRNALeu acylation.
First-order rate constants from AST, k_chem, are independent of enzyme concentrations, and so do not require normalization for active fraction before converting to free energies for multiple regression analysis to construct thermodynamic cycles. Figure 4 highlights the statistical quality of the regression model (Figure 5 omits the corresponding models). Figure 5 displays histograms of the β-coefficients for the acylation reaction in kcal/mol, with error bars denoting the 95% confidence levels. Arrows in Figure 5B form cycles showing that the net free energy change of the successive steps totals zero. The difference between left and right vertical arrows must equal that between top and bottom arrows, and is equal to the free energy, Δ(ΔG‡), for the HVGH*KMSKS interaction.
The most striking feature in Figure 4 is that samples at the extremes (the black dots = WT; the blue dots = AVGA) are opposite of what might be expected. The AVGA mutant is, by ∼1 kcal/mol, the most active catalyst in leucine activation and the WT variant is least active. Moreover, although the sign of the β_KMSKS coefficient is negative, signifying that the wild-type lysine residues favor catalysis, the synergy between it and the HIGH sequence, β_HVGH*KMSKS, is positive. Thus, when it comes to stabilizing the transition state for the internally catalyzed reaction, the two signatures actually work against one another. That, and the positive β_HIGH value, account for the superiority of the AVGA variant, which is ∼5 times more active than wild-type LeuAC. Combinatorial mutagenesis has comparable impacts on leucine activation and tRNALeu aminoacylation by LeuAC.
The KMSKS and HVGH*KMSKS effects behave very differently in full-length LeuRS. The inversion of these effects from left to right represents catalytic consequences of adding the anticodon-binding domain and CP (as well as CP2) to the LeuAC urzyme. That consequence is a substantial increase in transition-state stabilization for both reactions in LeuRS by enforcing coupled behavior on the two catalytic signatures, as discussed further in the next section and the DISCUSSION.
Thermodynamic cycle parameters built from ΔG‡_kchem values from time courses of the three nucleotides plus acylation illustrate how mutations affect all four rate constants (Figure 6). The main effects of wild-type HVGH and KMSKS signatures and their two-way interaction are essentially identical in the LeuAC urzyme (Figure 6A, C). Mutational changes also impact both steps of the reaction by full-length LeuRS consistently (Figure 6B, C). Those for AMP production and tRNALeu aminoacylation by LeuRS are nearly colinear (Figure 6B). However, an important distinction for LeuRS emerges in Figure 6C. Only AMP production by full-length LeuRS has the same sensitivity to mutations as does aminoacylation. In contrast, ATP consumption and ADP production exhibit similar mutational profiles that both respond favorably to the AMSAS mutation (i.e. ΔG‡_kchem > 0).
[Figure 6 caption, partial] [...] Figure S6 provides a scatterplot matrix of the full set of these values. (D) Burst sizes for the three reactions measured by AST for full-length LeuRS. The last two columns give the percentages of the total ATP consumption as ADP and AMP. Note that wild-type LeuRS produces mostly AMP, consistent with the correlation between AMP production and acylation in C.
Products of the internal first-order reactions in the first round of catalysis also distribute differently in full-length LeuRS and the LeuAC urzyme. The ratio, M/D, of AMP to ADP produced (Supplementary Figure S4A; R² = 0.93; P ≤ 0.0001) is inversely proportional to the presence of the native KMSKS sequence. Thus, increased ATP consumption associated with LeuAC variants lacking the wild-type lysine residues is also associated with lower amounts of the expected product, AMP. In contrast, the M/D ratio in full-length LeuRS variants depends fully on both main effects (HVGH and KMSKS) and on their two-way interaction (Supplementary Figure S4B; R² = 0.99; P < 0.0001), in a manner similar to the dependence of the activation free energies, ΔG‡_kchem. Figure 6C shows that contributions of the HVGH and KMSKS signatures and their two-way interactions to ATP consumption and ADP production by LeuRS are distinctly different from those contributing to AMP production and tRNA aminoacylation. In fact, their mutational profiles are closely anticorrelated to those for AMP production and acylation, as well as those for all four reactions by LeuAC (⟨R²⟩ = 0.85 ± 0.1; see Supplementary Figure S5). Together with thermodynamic cycles for rates of AMP production by LeuAC and LeuRS (Supplementary Figure S3), these observations suggest an overriding distinction between catalysis of aminoacylation by full-length LeuRS and all 7 remaining variants illustrated in Figure 3B. Deletion of any contributor to the wild-type full-length LeuRS mechanism (either the various domains missing in the urzyme or either or both signature sequences) changes the catalytic mechanism in a similar underlying manner (see again the distribution in the upper right of Figure 3B).
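The corner-by-corner construction of a double mutant cycle described above can be sketched in a few lines. The four activation free energies below are hypothetical numbers in kcal/mol, not measurements from this work, and the sign convention chosen for the coupling term is an assumption; what matters is that the same value is obtained from either pair of opposite edges of the square.

```python
def coupling_free_energy(dg_wt, dg_mut_a, dg_mut_b, dg_double):
    """Coupling term from opposite edges of the double-mutant square.

    Compares the effect of mutating signature A when signature B is
    already mutated with the effect of mutating A when B is intact.
    """
    effect_a_with_b_intact = dg_mut_a - dg_wt
    effect_a_with_b_mutated = dg_double - dg_mut_b
    return effect_a_with_b_mutated - effect_a_with_b_intact

# Hypothetical activation free energies (kcal/mol) at the four corners.
DG_WT, DG_AVGA, DG_AMSAS, DG_DOUBLE = 10.0, 12.0, 11.5, 12.5

coupling_via_a = coupling_free_energy(DG_WT, DG_AVGA, DG_AMSAS, DG_DOUBLE)

# Swapping the roles of the two signatures traverses the other pair of
# edges and must give the same coupling energy.
coupling_via_b = coupling_free_energy(DG_WT, DG_AMSAS, DG_AVGA, DG_DOUBLE)
```

A coupling term of zero would mean the two mutations act additively; any nonzero value indicates that the contribution of one signature depends on the state of the other.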
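The regression route to the same coefficients can be sketched for a balanced two-factor design. This is a minimal illustration, assuming predictors coded +1 (wild type) and -1 (mutant) and equal replication at every corner, in which case the least-squares coefficients of Eq. (2) reduce to simple averages of signed responses. The coding, the function name and the replicate values are assumptions for illustration and do not reproduce the JMP analysis.

```python
def fit_two_factor(design, responses):
    """Least-squares coefficients for Y = b0 + b1*P1 + b2*P2 + b12*P1*P2.

    Valid only for a balanced design with predictors coded +1/-1, where
    the columns of the design matrix are mutually orthogonal.
    """
    n = len(responses)
    b0 = sum(responses) / n
    b1 = sum(p1 * y for (p1, _), y in zip(design, responses)) / n
    b2 = sum(p2 * y for (_, p2), y in zip(design, responses)) / n
    b12 = sum(p1 * p2 * y for (p1, p2), y in zip(design, responses)) / n
    return b0, b1, b2, b12

# Hypothetical triplicate free energies (kcal/mol) at each corner.
# Tuple order: (HVGH state, KMSKS state), +1 = wild type, -1 = alanine mutant.
corners = {
    (+1, +1): [10.0, 10.1, 9.9],    # wild type
    (-1, +1): [12.0, 12.1, 11.9],   # AVGA
    (+1, -1): [11.5, 11.6, 11.4],   # AMSAS
    (-1, -1): [12.5, 12.6, 12.4],   # double mutant
}
design = [key for key, reps in corners.items() for _ in reps]
responses = [y for reps in corners.values() for y in reps]

b0, b_hvgh, b_kmsks, b_inter = fit_two_factor(design, responses)

# With +1/-1 coding the interaction coefficient is one quarter of the
# corner-difference coupling term of the double-mutant square.
coupling_from_regression = 4.0 * b_inter
```

With 0/1 coding instead, the interaction coefficient would equal the corner-difference coupling term directly and the main effects would be evaluated at one corner rather than at the center of the design, which is one simple way to see why main-effect coefficients depend on where they are evaluated.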

Nonpolar sidechains V51 and M651 anchor both HVGH and KMSKS signatures securely into the ABD
It is worth noting that although only minimal structural data are available for any aaRS urzymes (36), neither of the hydrophobic residues in the two signature sequences appears from crystal structures of full-length Class I aaRS to be engaged in any significant nonpolar packing interactions within LeuAC itself (Figure 7B). By contrast, a well-developed packing network of nonpolar amino acids, including the two hydrophobic side chains together with residues from the ABD in full-length LeuRS, anchors the valine and methionine residues within that domain (Figure 7). Delaunay tessellation and likelihood scoring (37, 38) identified a homologous network in TrpRS (39), and was used here to identify residues detailed in Figure 7C. Similar networks can be identified in all full-length Class I aaRS. They appear to be both necessary and sufficient to coordinate the behavior of the two catalytic signatures, as shown schematically in Figure 7D.

DISCUSSION
The role of cooperative interactions between active-site residues in enzyme catalysis is a recurring question that is seldom addressed directly by experimentation (40), despite well-established protocols for thermodynamic cycle analysis of combinatorial mutants (35). The thermodynamic cycle is a revealing linear transformation of the experimental transition-state stabilization free energies, ΔG‡, which as noted elsewhere are additive. The alternative coordinate system highlights the magnitude and sign of individual and energetically coupled contributions to catalysis.

Use of 32Pα-ATP in active-site titrations reveals hidden mechanistic microheterogeneity
Introducing the use of 32Pα-ATP in active-site titration assays has expanded our appreciation of the complexity of first-order events in the first round of catalysis. The ability to track all three adenine nucleotides reveals unexpected production of small amounts of ADP even by full-length LeuRS and especially in LeuRS active-site variants (Supplementary Table S2). Indeed, as discussed in RESULTS in other contexts, loss either of the additional domains or of the intact catalytic apparatus in the active site is sufficient to dramatically alter the flow of chemical free energy during amino acid activation.
The equivalence between ATP consumption and the sum of ADP and AMP burst sizes, combined with the similarity of first-order rates for the three processes by LeuAC, implicates similar catalytic machinery in both ADP and AMP production. The KMSKS sequence in full-length Class I aaRS is involved with transient binding of the PPi leaving group of ATP (see, for example, the effects of removing PPi from the free energy surface traversed during catalysis by TrpRS (41)). A more extensive discussion of this question is in Supplementary §IV.

Emergent coupling deepens evidence for the authenticity of urzyme catalysis
We cannot overlook the obvious conclusion that the contrast between the uncoupled behavior of the two catalytic signatures in LeuAC and their strong coupling by the ABD and CP in full-length LeuRS reinforces the authenticity of both LeuAC (12) and TrpAC (25) catalysis. The evolution of intramolecular coupling shown here therefore materially strengthens the argument that aaRS urzymes represent valid experimental models for the ancestral assignment catalysts that originally enabled nature to mine the immense functional diversity represented by proteins (7).

The LeuAC studies complement and extend earlier thermodynamic cycle analyses of TyrRS and TrpRS mechanisms
In the present case the coupling between catalytic signatures might, in principle, arise solely from the sequence differences far from the active site on the newly created molecular surface of LeuAC, rather than from the removal of the CP and anticodon-binding domains. We cannot envision any practical way to rule this possibility out, as the native sequences lead to insoluble aggregates of LeuAC. Prior studies comparing TrpAC with TrpRS substantially strengthen confidence in the interpretation outlined here, which implicates domain movement and intramolecular communication in the enhanced function of the full-length enzyme.
Class I aminoacyl-tRNA synthetase enzymes (aaRS) afford several examples of how thermodynamic cycle analysis probes structure-function relationships. First and Fersht showed that the effects of mutating residues in the KMSKS loop were localized quite specifically along the reaction profile to destabilize the ground-state pre-transition state and stabilize the transition state for tyrosine activation by TyrRS (20). The KMSKS signature breaks the synergistic binding of amino acid and ATP in the pre-transition state (20-23).
Destabilization by the KMSKS signature of the transition states for both activation and aminoacylation in LeuRS appears to be inconsistent with that observation. A plausible rationale for this apparent discrepancy is that the behavior in TyrRS also depends on interaction between the two signatures. That interpretation is especially appealing because LeuAC exhibits the opposite dependence: the wild-type KMSKS sequence itself destabilizes both transition states (Figure 5). Its catalytic contribution to both reactions becomes favorable only by virtue of the two-way interaction.
A packing motif (42) called the D1 switch (39) links the two β-strands and α-helix of the first crossover connection to the N-terminal β-strand of the second crossover connection of the Rossmann dinucleotide-binding domain, creating a conformational transition state that mediates shear during relative domain motion of the ABD and CP and imposing discrete pre-transition and products conformations in TrpRS (41, 43, 44) and likely in other Class I aaRS. Combinatorial mutagenesis of four residues from the D1 switch, assayed with both Mg²⁺ and Mn²⁺, established a five-dimensional thermodynamic cycle. That cycle revealed that the packing motif, which is ∼20 Å from the active-site metal, is nevertheless coupled by -6.2 kcal/mol to the catalytic function of the metal (26). That contribution nearly equals the full catalytic contribution, -6.4 kcal/mol, of the metal measured by assaying metal-free TrpRS (45). A complementary, modular thermodynamic cycle consisting of TrpRS, its urzyme TrpAC, and TrpAC plus either the CP insert (i.e. the intact catalytic domain) or the ABD showed that CP and the ABD each alone actually reduced the activity of TrpAC in aminoacylation of tRNATrp, but that both together restored full activity, a contribution of ∼5 kcal/mol, nearly the entire contribution of Mg²⁺ (25). Finally, both the combinatorial mutagenesis (24) and modular cycles (25) also implicate the corresponding energetic coupling in amino acid specificity.
Coupling energies between D1 switch residues and Mg²⁺ in transition-state stabilization are comparable to those between the TrpRS ABD and CP domains (24-27). The minimum action path connecting successive structures along the TrpRS structural reaction profile implies that repacking D1 switch residues in the conformational transition state is rate-limiting between pre-transition state and products domain configurations (41, 43, 44). These fragmentary experimental vignettes consistently suggest that a central function of the ABD and CP domains is to impose coordinated motion on components of the active site, such that they all assemble simultaneously into a catalytically active, discriminating configuration (24, 25).

Experimental exploration of the time domain
The thermodynamic cycles comparing LeuRS to its putative ancestral urzyme form in Figures 3-5 extend experimentation into the evolutionary time domain (46-53). In this manner we identify quantitatively the explicit gain of function induced in Class I leucyl-tRNA synthetase by the acquisition of the connecting peptide 1 (CP) insertion and anticodon-binding (ABD) domains. The new data reported here reinforce the picture outlined in previous paragraphs. In LeuAC, the absence of the two larger domains leaves the two catalytic signatures uncoordinated, contributing in unexpected ways to transition-state stabilization. In full-length LeuRS, on the other hand, although neither signature alone can enhance transition-state stabilization, their combination, in the context of the two additional domains, furnishes substantial catalytically productive synergy. That we have now identified in LeuRS, one of the largest Class I aaRS, much the same phenomena we observed for TrpRS, the smallest, suggests generality in the superfamily. The superfamily-wide conservation of the 'enforcer' packing motif illustrated in Figure 7C, in which hydrophobic clusters of high SNAPP potential (37, 38) anchor the sole hydrophobic side chains of each catalytic signature, coordinating their motion, reinforces that generality.

Successively acquired CP and anticodon-binding domains may have had distinct, less obvious selective advantages
We have argued that aaRS of both Classes began as 46-residue polypeptides called 'protozymes' that accelerated the rate of amino acid activation ∼10⁶-fold (54). That hypothesis gained support from validation by Tamura's group (55). Thus, it is possible that the HxGH signature has roots in the protozyme. Class I aaRS urzymes, however, are more than twice as big (∼130 residues), and the KMSKS signature must have roots in that subsequent stage. The selective advantage of the KMSKS signature likely contributed to the acquisition of the second half of urzymes, which accelerated the rate of activation by 10³-fold over that for the protozyme.
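For orientation, the fold accelerations quoted above can be converted into transition-state stabilization free energies using the standard relation between rate ratio and free energy. The worked values below are an illustration only: the temperature of 298 K is an assumption and is not taken from the assay conditions.

```latex
\begin{align*}
\Delta\Delta G^{\ddagger} &= -RT \ln\!\left(\frac{k_{\mathrm{cat}}}{k_{\mathrm{uncat}}}\right),
  \qquad RT \approx 0.592\ \mathrm{kcal/mol}\ \text{at}\ T = 298\ \mathrm{K}\ \text{(assumed)} \\
\text{protozyme, } 10^{6}\text{-fold:} \quad
  \Delta\Delta G^{\ddagger} &\approx -0.592 \times \ln 10^{6} \approx -8.2\ \mathrm{kcal/mol} \\
\text{urzyme over protozyme, } 10^{3}\text{-fold:} \quad
  \Delta\Delta G^{\ddagger} &\approx -0.592 \times \ln 10^{3} \approx -4.1\ \mathrm{kcal/mol} \\
\text{urzyme overall, } 10^{9}\text{-fold:} \quad
  \Delta\Delta G^{\ddagger} &\approx -8.2 + (-4.1) \approx -12.3\ \mathrm{kcal/mol}
\end{align*}
```

Because free energies add where rate ratios multiply, the two successive accelerations combine by simple summation.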
As noted previously (25), neither CP nor the ABD alone improved either the specificity or the catalytic rate enhancement of the TrpRS urzyme. This prompts the question of how nature selected either domain except in the unlikely case that the urzyme acquired both simultaneously. The AST assays performed to assess the fraction of active catalysts in each sample suggest a possible resolution of that conundrum. The proportion of total ATP usefully converted to AMP production by LeuAC (0.21) is roughly 25% of that (0.8; Figure 6D) observed for LeuRS. If the CP domain stabilizes closed, productive forms of the urzyme so that it uses ATP more efficiently, that would certainly have given it a selective advantage, even without imposing the coupling between the two signatures. This hypothesis is experimentally testable.

Significant rate acceleration by early polypeptide catalysts did not require sophisticated amino acid side chains
The LeuAC AVGA variant (<ΔG‡> = 1.9 kcal/mol) is quantitatively more active than the same variant of the full-length LeuRS (<ΔG‡> = 2.1 kcal/mol). That brings us to what is perhaps the most consequential implication from the data in Figures 2-4. The overall rate enhancements observed for amino acid activation by aaRS urzymes excerpted from two Class I (TrpRS (6, 7, 14), LeuRS (12)) and one Class II aaRS (HisRS (10, 25)) represent ∼60% of the overall transition-state stabilization free energy observed for the corresponding full-length aaRS. How much of that must we attribute to the precise orientation of specific amino acid side chains in the active site?
Our data suggest that the surprising answer to that question is: apparently very little, if any of it. Indeed, the superiority of the AVGA LeuAC mutant means that the two histidine residues in that highly conserved signature actually reduce the rate of aminoacylation unless they are coupled to the KMSKS sequence by the additional domains. Moreover, the AMSAS mutant is almost as active as the WT LeuAC. These observations are consistent with the previous mutational analysis of the TrpAC urzyme, where mutation of an aspartate at the N-terminus of the C-terminal α-helix of the second crossover connection in the Rossmann fold reduces activity by ∼200-fold in TyrRS but increases activity ∼25-fold in the TrpRS urzyme.
We previously noted that secondary structural (i.e. backbone) interactions in Class I and II urzymes could account for tRNA (56, 57) and amino acid (58) substrate selection (7). Together with our previous evidence that the TrpAC urzyme may be a catalytically active molten globule (36), these results substantiate the possibility that substantial catalytic rate enhancement does not require precise orientation of specialized amino acid side chains. It now seems that primordial protein catalysts achieved high rate enhancements and a modicum of substrate specificity largely via physical properties of the active-site pocket that depend on backbone interactions and do not require specific amino acid side chains.
That conclusion lends a strong experimental basis for Wong's coevolution theory for the emergence of the genetic code and its coevolution with metabolic pathways necessary to synthesize amino acids that were not produced in abundance by prebiotic geochemistry (16, 17).

DATA AVAILABILITY
JMP is a product of the SAS corporation, Research Triangle, NC, USA. All rate data are provided in the Supplementary Tables, are accessible from the Protein Data Bank, or are available upon request from the author.