Structural basis for the evolution of cyclic phosphodiesterase activity in the U6 snRNA exoribonuclease Usb1

Abstract
U6 snRNA undergoes post-transcriptional 3′ end modification prior to incorporation into the active site of spliceosomes. The responsible exoribonuclease is Usb1, which removes nucleotides from the 3′ end of U6 and, in humans, leaves a 2′,3′ cyclic phosphate that is recognized by the Lsm2–8 complex. Saccharomyces cerevisiae Usb1 has additional 2′,3′ cyclic phosphodiesterase (CPDase) activity, which converts the cyclic phosphate into a 3′ phosphate group. Here we investigate the molecular basis for the evolution of Usb1 CPDase activity. We examine the structure and function of Usb1 from Kluyveromyces marxianus, which shares 25 and 19% sequence identity with the S. cerevisiae and Homo sapiens orthologs of Usb1, respectively. We show that the K. marxianus Usb1 enzyme has CPDase activity and determine its structure, both free and bound to the substrate analog uridine 5′-monophosphate. We find that the origin of CPDase activity is related to a loop structure that is conserved in yeast and forms a distinct penultimate (n – 1) nucleotide binding site. These data provide structural and mechanistic insight into the evolutionary divergence of Usb1 catalysis.


INTRODUCTION
Usb1 is a member of the 2H phosphodiesterase superfamily of enzymes, which contain two vicinal active site HxS/T motifs that are essential for catalysis. The 2H superfamily can be sub-divided into HxT and HxS enzymes, the latter of which contains Usb1. 2H superfamily enzymes act on myriad RNAs and nucleotide substrates, and are capable of many catalytic activities, including 2′,5′ RNA ligase or nuclease, 2′,5′- or 3′,5′-phosphodiesterase, and 1″,2″-cyclic or 2′,3′-cyclic phosphodiesterase (CPDase) activities (1). These divergent activities are all thought to utilize two catalytic histidines within the central HxS/T motifs that act as a general acid and base, while the serine or threonine residues help coordinate substrates and assist in transition state stabilization (2-5). Although the overall sequence conservation among 2H enzymes is rather low, all crystal structures of family members determined thus far display a characteristic fold with conserved terminal and transit lobes and the HxS/T motifs centrally positioned in a substrate binding cleft (3,5-15).
Several Usb1 crystal structures have been determined from human and S. cerevisiae (5,11,14). Despite sharing low sequence identity (<20%), the human and S. cerevisiae Usb1 enzymes have highly similar overall structures. Structures of HsUsb1 bound to nucleotides and intact RNA, in combination with kinetic analyses and QM/MM simulations, have revealed insights into its catalytic mechanism (5,14). However, the mechanism of CPDase activity in the S. cerevisiae enzyme remains less well understood. No structures of ScUsb1 bound to nucleotides or RNA have been determined, nor are there any known structures for a 2H superfamily enzyme with 2′-CPDase activity bound to substrate. It is also not known if ScUsb1 is unique, or if other orthologs of Usb1 harbor CPDase activity. The presence of active site HxS motifs alone cannot explain CPDase activity, as these regions in ScUsb1 and HsUsb1 are superimposable within 0.4 Å r.m.s.d. for all 32 atoms of the histidines and serines in the HxS motifs (14). Thus, the molecular basis for 2′-CPDase activity and the divergent activities of Usb1 enzymes cannot be explained by existing structural data.
Figure 1. In vitro characterization of K. marxianus Usb1 activity. (A) Schematic cartoon of the catalytic mechanism of Usb1. Usb1 is a 3′-5′ exoribonuclease that leaves a 2′,3′-cyclic phosphate. Some orthologs of Usb1 have additional CPDase activity that hydrolyzes the cyclic phosphodiester linkage to yield a 3′-monophosphate group. (B) The exoribonuclease activity of K. marxianus Usb1. Use of a minimal substrate shows that KmUsb1 is indeed an exoribonuclease (lanes 2-5). The ribonuclease activity is completely blocked by 2′-deoxy modification of the n – 1 uridine (lanes 6-9). 2′-Deoxy modification of the n – 2 uridine results in single cleavage products (lanes 10-13) that allow facile monitoring of enzymatic activity. (C) Close-up view of products in lanes 4 and 5 of Figure 1B. Cleavage products with two types of 3′-end phosphates are observed at higher concentrations of K. marxianus enzyme relative to substrate. (D) 3′-end phosphoryl formation by Usb1. A single-cleavage substrate (5′-UAUUUdUUU-3′) was incubated with 0-10 μM KmUsb1 and products (n – 1 P, lanes 2 and 3) were then treated with CIP (lanes 4 and 5) or PNK (lanes 6 and 7). The shifted band (n – 1 OH) is generated by removal of 3′-end phosphoryl groups by either CIP or PNK, the former of which can only remove non-cyclic phosphates. (E) Comparison of products in the presence of 1.0 and 10 μM enzyme. Product (%) in lanes 4 and 5 shown in Figure 1D was quantified from fluorescence intensity and indicates the relative CPDase activity. (F) Comparison of K. marxianus and human Usb1 activities. Reaction mixtures contained 1 μM unmodified substrate (UAUUUUUU) and either 10 μM KmUsb1 (lanes 2-6) or HsUsb1 (lanes 8-12).
U6 snRNA is a central active site component of the spliceosome, a large macromolecular machine that catalyzes precursor messenger RNA splicing (20,21). Nascent U6 transcripts, which are synthesized by RNA polymerase III, have heterogeneous polyuridine tails with 2′,3′-cis-diols (22,23). Processing by Usb1 facilitates the recruitment of U6 to the spliceosome by increasing the binding affinity of the Lsm2–8 complex (14,24,25). In particular, the C-terminus of the Lsm8 protein interacts with the 3′ end of U6. We previously reported that the C-terminal sequences of Lsm8 from various organisms fall into two families, and hypothesized that they evolved to recognize cyclic phosphates and 3′ phosphates (25). The Lsm8 proteins from S. cerevisiae and related yeast have an extended lysine-rich C-terminal sequence, which interacts electrostatically with the negatively charged 3′ phosphate group. In contrast, the Lsm8 proteins that recognize a cyclic phosphate lack this C-terminal extension. We therefore further hypothesized that Lsm8 sequence alignments can be used to predict U6 3′ chemistry, and by extension Usb1 activity, in vivo. For example, alignment of the putative Lsm8 ortholog from the ascomycetous yeast Kluyveromyces marxianus shows that its C-terminus is similar to that of S. cerevisiae, suggesting the presence of a 3′ phosphate on its U6 snRNA and therefore CPDase activity in the corresponding ortholog of Usb1 (hereafter, KmUsb1) (Supplementary Figure S1).
Molecular clock measurements estimate that humans and yeast diverged from a common ancestor approximately 1 billion years ago, whereas K. marxianus and S. cerevisiae shared a common ancestor 10⁸ years ago (26). In order to provide further information on the structure and divergent catalytic mechanisms of Usb1 orthologs, we have characterized the in vitro activity and structure of the KmUsb1 enzyme. Our in vitro activity assays show that KmUsb1 shares properties of both ScUsb1 and HsUsb1. For example, KmUsb1 displays distributive 3′-5′ exoribonuclease processing that can progressively shorten the U6 snRNA 3′ end, which is similar to the human enzyme but not ScUsb1. On the other hand, the enzyme has measurable CPDase activity similar to ScUsb1 but not HsUsb1. In all three orthologs, Usb1 is strongly inhibited by a terminal 3′ phosphate.
We report crystal structures of KmUsb1, both free and bound to the substrate analog uridine 5′-monophosphate. The overall architecture is similar to the human and S. cerevisiae enzymes, and the substrate analog occupies the terminal nucleotide ('n') binding pocket in a manner that is very similar to HsUsb1. Structural comparisons and activity assays lead us to hypothesize that the origin of CPDase activity is related to the n – 1 nucleotide binding site, for which both yeast species possess a similar loop architecture that is absent in the human enzyme.

MATERIALS AND METHODS
Sample preparation
The DNA sequence coding for full-length K. marxianus Usb1 was codon optimized and synthesized by Integrated DNA Technologies for heterologous overexpression in Escherichia coli (Supplementary Data). The Usb1 open reading frame was cloned into the NdeI and BamHI sites of a pET3a plasmid carrying the coding sequence of an octahistidine tag, maltose binding protein and TEV protease cleavage site upstream of the NdeI site (27,28). The crystallizable N-terminal deletion mutant (lacking the first 58 residues) was constructed in the same way. Alanine substitution mutations were introduced by inverse PCR with primers designed for the target DNA sequence. All constructs were expressed and purified by the following protocol. Escherichia coli BL21(DE3)pLysS competent cells (Invitrogen) were transformed with the above plasmids and grown in terrific broth medium supplemented with 1% glycerol, 100 μg/ml ampicillin and 30 μg/ml chloramphenicol. Protein expression was induced at an OD600 of ∼2-3 by addition of 1 mM IPTG, and cells were then grown at 16 °C overnight. Cells were harvested by centrifugation and resuspended in immobilized metal affinity chromatography (IMAC) buffer (50 mM HEPES, pH 7.4, 500 mM NaCl, 25 mM imidazole, 1 mM TCEP-HCl and 10% glycerol) supplemented with DNase I, lysozyme and protease inhibitors, then lysed by sonication. The soluble fraction obtained by centrifugation was purified via Ni-NTA agarose resin (QIAGEN), and then eluted with IMAC buffer containing 500 mM imidazole. The eluate was dialyzed against IEX buffer (20 mM HEPES, pH 7.4, 100 mM NaCl, 1 mM TCEP-HCl and 10% glycerol) supplemented with ∼1 mg of TEV protease at 4 °C overnight. Subsequent purification was performed with amylose resin (New England Biolabs) for removal of maltose binding protein. The resultant fraction was directly applied to a HiTrap Heparin-HP column (GE Healthcare) and eluted with a linear gradient from 50 to 600 mM NaCl in IEX buffer. For crystallization, the final product was further dialyzed against buffer (100 mM sodium phosphate, pH 6.8, 100 mM NaCl and 1 mM TCEP-HCl), concentrated to 24 mg/ml with Amicon 10 kDa spin filters (Millipore), and then stored at −80 °C until use.

Crystallization
Crystallization screening for N-terminally truncated K. marxianus Usb1 was performed by vapor diffusion in sitting drops at 4 °C using commercial crystal screens (JCSGplus, MIDAS, Morpheus, PACT premier from Molecular Dimensions, Index HT, PEGRx HT from Hampton Research, and the Cryos Suite from Qiagen). Within two months, crystals were found in 1 M potassium sodium tartrate, 100 mM HEPES, pH 7.4 and 30% glycerol. This condition was subsequently optimized by hanging-drop vapor diffusion with 1 μl protein and 1 μl reservoir solution (1.2 M potassium sodium tartrate, 100 mM HEPES, pH 7.5, 21% glycerol) at 4 °C for 1 week. Initial phases could not be obtained by molecular replacement with either of the known human or S. cerevisiae Usb1 structures. Phase determination was therefore accomplished by single isomorphous replacement with anomalous scattering using a uranyl acetate derivative (29). In order to prepare the heavy atom derivative, the above crystals were transferred into a fresh drop (1.2 M potassium sodium tartrate, 100 mM HEPES, pH 7.5, 21% glycerol, 100 mM NaCl and 1 mM TCEP-HCl) saturated with uranyl acetate, and allowed to incubate at 4 °C for two days. Nine uranium atoms were estimated to reside in the asymmetric unit based on clear density in an anomalous difference map. For the native structure, crystals were generated with 1 μl protein and 1 μl reservoir solution (1.2 M potassium sodium tartrate, 100 mM HEPES, pH 7.5, 30% glycerol). Several glycerol molecules are observed at the active site of the native structure, which likely prevented generating a co-crystal structure of KmUsb1 bound to uridine 5′-monophosphate (5′-UMP) (Supplementary Figure S2A and B). The subsequent trials were thus performed by a modified soaking method in which the native crystals were transferred into a fresh drop containing 1.2 M potassium sodium tartrate, 100 mM HEPES, pH 7.5, 30% PEG 400 and 10 mM 5′-UMP, and incubated at 4 °C overnight. Many partial fragments of PEG 400, instead of glycerol molecules, were visible in the final structure, but do not interfere with accommodation of 5′-UMP in the active site.

Data collection and refinement
Diffraction data were collected at 100 K on beamline 24-ID-E or 21-ID-F of the Advanced Photon Source using a PILATUS 6MF or Rayonix MX300 detector, respectively. Data were indexed, integrated and scaled with XDS (30) and AIMLESS (31). The initial phases were determined using the SHELXC/D/E pipeline (32) as implemented in HKL2Map (Supplementary Figure S3) (33). Automated model building was accomplished in RESOLVE and PHENIX (34,35). Initial phases for the native structure were determined by molecular replacement using the best model from the uranium-bound structure. The diffraction data for the UMP-bound crystals exhibited anisotropy, and were therefore subjected to ellipsoidal truncation and scaling by the STARANISO server (36). The subsequent refinement was performed via iterative rounds of manual model building in COOT (37) and automated refinement with individual isotropic B-factors, TLS and NCS restraints in PHENIX and REFMAC (38,39). Ligand occupancies were refined in PHENIX. Simulated annealing omit maps were generated in PHENIX, and used for final appraisal of the bound ligands. All figures were generated with PyMOL (http://www.pymol.org/).

Biochemical assays
K. marxianus U6 nucleotides 104-110 with an additional nucleotide at the 3′ end were utilized as substrate RNAs for K. marxianus Usb1. The sequence of K. marxianus U6 was inferred from homology to the close U6 homologs in Kluyveromyces lactis and S. cerevisiae, with 94 and 89% sequence identity, respectively. The RNA was labeled with 6-carboxyfluorescein at the 5′ end (5′-FAM) and modified with four different nucleotides at the 3′ end to observe the nucleotide specificity. In order to ensure single cleavage of these substrates, 2′-deoxyuridine was incorporated at the antepenultimate position. All substrate RNAs were purchased from Integrated DNA Technologies (see Supplementary Data) and were gel purified prior to use.
The assay conditions were first optimized over a wide range of pH values (4.0-10.0) using a buffer containing 4 mM each of acetate, bis-tris, HEPES, Tris and CHES. Assays were performed at 37 °C for 30 min using an equal concentration of substrate RNA and enzyme (Supplementary Figure S4). Unless stated otherwise, the following assays were conducted at 37 °C for 30 min with an optimized buffer (20 mM bis-tris, pH 6.5, 100 mM NaCl, 1 mM EDTA, 1 mM TCEP-HCl) containing 1 μM RNA and 0-10 μM Usb1. To determine the 3′-end phosphate modification left by Usb1, reaction products were treated with 10 units of CIP or PNK under optimum buffer conditions as indicated by the manufacturer (New England Biolabs). To compare products shortened by Usb1 orthologs, 1 μM RNA (FAM-UAUUUUUU) was incubated with 10 μM of either the human or the K. marxianus enzyme at 37 °C for the indicated times (5).
The independent exoribonuclease activity was monitored under the same conditions as above with a slight modification. Briefly, 1 μM RNA was incubated with 1 μM Usb1 variants at 37 °C for 10 min, followed by 3′-end dephosphorylation using either calf-intestinal alkaline phosphatase (CIP) or T4 polynucleotide kinase (PNK) to confirm that CPDase activity is undetectable. To prepare substrate for monitoring CPDase activity, 50 μM RNA (FAM-UAUUUdUUA) was incubated with 5 μM HsUsb1 at 37 °C for 1 h with an optimized buffer (5). The single-cleavage product (FAM-UAUUUdUU>p) was isolated by denaturing gel extraction and purified with a HiTrap Q column (GE Healthcare). The quality of the resulting product was confirmed by CIP and PNK treatment (Supplementary Figure S5). The CPDase activity was observed in an optimized buffer containing 1 μM substrate and 20 μM Usb1 variants at 37 °C for 1 h, followed by 3′-end dephosphorylation using CIP.
The marker RNA (-OH) was generated by partial hydrolysis of unmodified RNA (FAM-UAUUUUUU) at 95 °C for 10 min in 50 mM sodium carbonate (pH 9.2) and 1 mM EDTA. Reaction progress was determined by denaturing electrophoresis on 20% (19:1) polyacrylamide gels containing 8 M urea, 100 mM tris-borate and 1 mM EDTA, with subsequent visualization of fluorescent substrates and products on a Typhoon FLA 9000 (GE Healthcare).

RESULTS
Characterization of K. marxianus Usb1 activity
A candidate ortholog of Usb1 in K. marxianus was first identified by searching for sequences that harbor two HxS motifs and share similarity with confirmed orthologs of the enzyme. An uncharacterized open reading frame from K. marxianus (YLR132C) was identified with low sequence similarity to human and S. cerevisiae Usb1 (18.9% and 24.7% sequence identity, respectively) (Supplementary Figure S6). We cloned the open reading frame corresponding to this protein and expressed it in E. coli. We then used a minimal substrate corresponding to the 3′-end region of K. marxianus U6 snRNA to show that the enzyme is indeed an exoribonuclease (Figure 1B). Modified versions of this substrate, with either a penultimate (n – 1) or antepenultimate (n – 2) 2′-deoxyuridine, show that KmUsb1 requires an attacking 2′ OH group for the cleavage reaction and that the enzyme lacks endoribonuclease activity, as do other characterized orthologs of Usb1 (Figure 1A and B) (5,14). When 1.0 μM substrate is incubated in the presence of equimolar or ten-fold excess of KmUsb1 enzyme, multiple products of heterogeneous length are observed (Figure 1B). This behavior is highly similar to the human enzyme, but differs from S. cerevisiae Usb1, which predominantly removes a single nucleotide (5,14). The human enzyme is catalytically inefficient, with a kcat/Km = 10² M⁻¹ s⁻¹ for a single cleavage event (5). Our measurements of KmUsb1 activity indicate that it is also a catalytically inefficient enzyme in vitro. Interestingly, when the enzyme is in excess, the n – 1 and n – 2 products have faster electrophoretic mobility relative to the products obtained with a lower concentration of enzyme, and the n – 3 and n – 4 products are doublets (Figure 1C, compare lanes 4 and 5). We hypothesized that the faster mobility products are terminal 3′ phosphates, which have more negative charge due to a second ionization constant near pH 7.
We tested this hypothesis by reacting the single cleavage substrate (n – 2 deoxyuridine) with KmUsb1 and then treating the products with the enzymes CIP and PNK (Figure 1D and E). CIP can hydrolyze terminal but not cyclic phosphates, whereas PNK can hydrolyze both. After half an hour of processing with either 1 or 10 μM enzyme, the single cleavage substrate appears to be fully processed (Figure 1D, lanes 2 and 3). However, at 1 μM enzyme the products have mostly cyclic phosphates and are resistant to CIP treatment (86% cyclic phosphate versus 14% terminal phosphate; Figure 1E). At the higher concentration of enzyme, more CPDase activity is observed, as the amount of terminal phosphate product more than doubles (Figure 1D and E). The observed behavior is consistent with distributive processing in which the exoribonuclease activity is more efficient than the CPDase activity, with the enzyme dissociating after the first exoribonuclease step and then rebinding to hydrolyze the cyclic phosphate product. The presence of measurable CPDase activity is consistent with alignment of Lsm8 sequences (Supplementary Figure S1). We also compared K. marxianus and human Usb1 activities by time-course experiments in which 1 μM unmodified RNA was processed with 10 μM enzyme (Figure 1F). The n – 1, n – 2 and n – 3 products shortened by KmUsb1 are still detectable after a two-hour incubation, whereas the corresponding products are undetectable for HsUsb1 (Figure 1F, lanes 6 vs. 12). This difference suggests that the products with monophosphates inhibit further exoribonuclease activity of KmUsb1, as observed previously for ScUsb1. Indeed, we find that KmUsb1 is completely inhibited by a 3′ phosphate (Supplementary Figure S7). Thus, inhibition by a 3′ phosphate is a common property of all tested Usb1 orthologs, including HsUsb1 (11), ScUsb1 (14) and KmUsb1. Distributive processing by KmUsb1 is thus diminished by the slow accumulation of inhibitory 3′ phosphate products, which are not produced by HsUsb1.
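The CIP-based readout of relative CPDase activity described above can be illustrated with a short calculation. This is a minimal sketch and not the authors' analysis procedure: the function name and the band intensities are hypothetical, chosen only so that the result matches the reported 14% terminal and 86% cyclic phosphate split at the lower enzyme concentration. It assumes that the CIP-shifted band (n – 1 OH) reports products that carried a terminal 3′ phosphate, and that the unshifted band (n – 1 P) reports CIP-resistant cyclic phosphate.

```python
def phosphate_fractions(shifted_intensity, unshifted_intensity):
    """Return (terminal %, cyclic %) for one CIP-treated lane.

    shifted_intensity:   fluorescence of the CIP-shifted band (assumed to
                         derive from terminal 3' phosphate products)
    unshifted_intensity: fluorescence of the CIP-resistant band (assumed to
                         derive from 2',3'-cyclic phosphate products)
    """
    total = shifted_intensity + unshifted_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    terminal_pct = 100.0 * shifted_intensity / total
    return terminal_pct, 100.0 - terminal_pct


# Hypothetical intensities (arbitrary units), for illustration only.
terminal, cyclic = phosphate_fractions(140.0, 860.0)
print(f"terminal phosphate: {terminal:.0f}%, cyclic phosphate: {cyclic:.0f}%")
```

Because both bands come from the same lane, the ratio is independent of loading differences between lanes, which is why a per-lane percentage is a convenient proxy for relative CPDase activity.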

Structure of K. marxianus Usb1
To determine the structure of KmUsb1, crystallization trials for full-length protein (residues 1-275) were performed using several commercial crystal screens, but no crystals could be obtained. Subsequent trials were performed with an N-terminally truncated Usb1 lacking the first 58 amino acids (leaving residues 59-275; Supplementary Figure S6), a construct designed to mimic the crystallizable S. cerevisiae and human orthologs of Usb1 (5,11,14). The structure of KmUsb1 was determined by single isomorphous replacement with anomalous scattering (SIRAS) methods (Supplementary Figure S3) using a uranyl acetate heavy atom derivative (Table 1). The asymmetric unit contains three subunits (Supplementary Figure S2A), which are essentially identical in structure, with an average alpha carbon pairwise r.m.s.d. of 0.63 Å for residues 61-275.
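For readers unfamiliar with the metric, an average pairwise r.m.s.d. over three subunits can be sketched as below. This is an illustrative example under stated assumptions, not the procedure used for the reported value: the function names and coordinates are hypothetical, and the chains are assumed to be already superposed and to list matching alpha carbons in the same order. The superposition itself, normally done by structure-analysis software, is not shown.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) tuples; assumes the two sets are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_pairwise_rmsd(chains):
    """Average r.m.s.d. over every unordered pair of chains."""
    pairs = list(combinations(chains, 2))
    if not pairs:
        raise ValueError("at least two chains are required")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Hypothetical two-atom 'chains' standing in for three subunits.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
chain_b = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0)]
chain_c = [(0.0, 0.5, 0.0), (3.5, 0.0, 0.0)]
print(f"mean pairwise r.m.s.d.: {mean_pairwise_rmsd([chain_a, chain_b, chain_c]):.2f}")
```

With three subunits there are three unordered pairs, so the reported average is the mean of three pairwise values.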
Consistent with previously determined Usb1 structures, KmUsb1 retains the canonical 2H fold, with two conserved HxS motifs buried in the central cleft between the transit and terminal lobes in a very narrow channel that is just wide enough to accommodate single-stranded, but not double-stranded, RNA (Supplementary Figure S2C). In addition, two glycerol molecules are bound in the catalytic pocket as high-occupancy ligands (refined occupancies are between 0.87 and 1.0), where one of the expected catalytic residues (His211) forms a hydrogen bond to a hydroxyl group of a glycerol molecule (Supplementary Figure S2B).

Coordination of uridine 5′-monophosphate in the active site
Based on the HsUsb1 co-crystal structure (14), we hypothesized that uridine 5′-monophosphate (5′-UMP) would act as a substrate analog by occupying the terminal nucleotide binding pocket ('n pocket'), with its 'scissile' phosphate positioned in the active site. We therefore soaked KmUsb1 crystals in a solution containing UMP and PEG 400 at 4 °C overnight to remove competing glycerol molecules in the active site (Supplementary Figure S2A and B). The resulting co-crystal structure of KmUsb1 bound to 5′-UMP was determined at a maximum resolution of 1.84 Å (Figure 2; Supplementary Figure S8; Table 1). As hypothesized, the 5′-UMP substrate analog is bound in the n pocket, with its phosphate moiety positioned between the symmetrical HxS motifs of the active site (Figure 2B; Supplementary Figure S8). The protein structure changes little upon binding the substrate analog, with an r.m.s.d. of 0.3 Å for all alpha-carbon atoms between the free and UMP-bound structures. H211 is positioned 3.5 Å away from the 5′ oxygen of the phosphate moiety of UMP and therefore is likely to function as the catalytic acid during the first cleavage step, analogous to previous structural and mechanistic data for HsUsb1 (5). We note that in the KmUsb1 structure, H211 is not perfectly positioned for catalysis and forms a hydrogen bond to a non-bridging oxygen of 5′-UMP (Figure 2B). The O4 oxygen of UMP forms a hydrogen bond to the sidechain nitrogen of Asn209 (Figure 2B). Tyr162 forms a T-shaped stacking interaction (40) with the uracil nucleobase and also stacks against the C-terminal catalytic histidine (Figure 2B).
The overall K. marxianus, S. cerevisiae and human Usb1 structures are highly similar, although KmUsb1 more closely resembles ScUsb1 (the r.m.s.d. values are 3.1 Å for 182 aligned alpha carbons between KmUsb1 and ScUsb1, and 4.1 Å for 159 aligned alpha carbons between KmUsb1 and HsUsb1). The two HxS motifs in the active sites are all in nearly identical conformations, regardless of the presence of a bound ligand (Supplementary Figure S9: the all-atom r.m.s.d. values of the histidines and serines in the HxS motifs are within 0.08-0.40 Å). The active site of KmUsb1 is similar to that of ScUsb1, as both have a 3₁₀-helix loop architecture in the n nucleotide binding pocket (residues 207-209 in KmUsb1) which is absent in HsUsb1 (Figure 3A-C; Supplementary Figure S6B). An asparagine residue (Asn209) interacts with the nucleobase via a hydrogen bond to the uracil O4 (Figures 2B and 3B).
We measured the extent of processing for substrate RNAs that terminate in four different nucleotides. KmUsb1 favors RNA substrates with a terminal uridine, with weaker activity towards guanosine, followed by adenosine and cytidine (U>G>A>C) (Figure 3G). The nucleotide preference of KmUsb1 differs from that of the human ortholog, which is more reactive towards terminal adenosines (A>U>G>C) (5).
Previously determined co-crystal structures of 2H proteins bound to RNA also have an aromatic residue in the active site that stacks with a nucleobase (4-6,9,10,14). In HsUsb1, a stacking interaction between Tyr202 and the terminal nucleotide is important for Usb1 catalysis (Figure 3C) (5). However, KmUsb1 and ScUsb1 lack a corresponding aromatic residue. In KmUsb1, there is a glutamine instead of tyrosine (Gln203), the side chain of which is not visible and is therefore presumed disordered in the co-crystal structure (Figure 3B). Other notable differences include the 5′-UMP sugar pucker, which is C2′-endo and C3′-endo in the KmUsb1-5′-UMP and HsUsb1-5′-UMP structures, respectively (Figure 3B, C, E and F). Additionally, the nucleobase orientation differs slightly between the two structures, owing in part to the different modes of recognition involving H-bonding vs. stacking (Figure 3B, C, E and F). These differences in coordinating the n nucleobase likely contribute to nucleotide specificity among Usb1 orthologs.
We constructed a structure-based sequence alignment and identified three residues that are fully conserved among Usb1 proteins in addition to the catalytic residues (Supplementary Figure S6B). A universally conserved tyrosine residue stacks against the C-terminal catalytic histidine and nucleobase in the nucleotide-bound structures (Figure 3D-F). We tested the functional importance of this tyrosine in KmUsb1 by mutating it to alanine. The resulting Y162A mutant shows a significant reduction in catalytic activity, confirming that this highly conserved tyrosine is important for catalysis (Figure 3H).

The n – 1 nucleotide loop architecture
A major difference in the Usb1 structures corresponds to the n – 1 binding pocket ('n – 1 loop') (Figure 4A-C). In the structures of HsUsb1 bound to UUUU and UUUA (5), the n – 1 nucleobase is coordinated via hydrogen bonds with the backbone of Val118 in the n – 1 loop (Figure 4C; Supplementary Figure S6B). In contrast, this loop is much larger in both KmUsb1 and ScUsb1, has an overall positive charge and includes a 3₁₀-helix structure (Figure 4A and B; Supplementary Figures S6B and S10). The yeast-specific architecture suggests that the coordination mechanism of the n – 1 nucleotide is similar in both organisms, and differs from human Usb1. A lysine and a proline residue are conserved in the yeast enzymes (Lys114 and Pro115 in KmUsb1: Figure 4A and B; Supplementary Figure S6B). We tested to what extent these residues affect Usb1 catalysis using alanine substitutions. These mutations resulted in slower kinetics compared to wild type Usb1 (Figure 4D), with the greatest defect observed for K114A (Figure 4D, lanes 7 and 8). We hypothesize that the lysine is involved in contacting the U6 substrate and the proline is important for forming the n – 1 loop architecture. Unlike the shorter n – 1 loop in HsUsb1 that can accommodate only the penultimate nucleotide (Figure 4C), the extended loop in yeast may be capable of interacting with additional nucleotides (Figure 4A and B). We therefore wondered if the fungal n – 1 loop structure is important for the second step CPDase activity shared by ScUsb1 and KmUsb1.
We independently measured the first (exoribonuclease) and second (CPDase) steps for the wild type and mutant KmUsb1. The mutations were made to the n pocket (Y162A: Figure 3E) or to the n – 1 pocket (K114A: Figure 4B). By using a single-cleavage substrate and a relatively short incubation time (10 min), the exoribonuclease step can be isolated and the product contains almost exclusively a 2′,3′-cyclic phosphate rather than a 3′ phosphate, as confirmed by CIP and PNK treatment (Figure 5A, lanes 4 and 5). Both mutants exhibit a significant defect in exoribonuclease activity relative to wild type Usb1. Y162A has a greater defect than K114A, indicating the importance of stacking on the terminal nucleobase during the first catalytic step (Figures 3E, H and 5A). We also compared second step CPDase activity by utilizing a substrate terminating in a 2′,3′-cyclic phosphate and containing a 2′-deoxy modification at the n – 2 position to prevent further processing (Figure 5B; Supplementary Figure S5). In this assay, the defects are reversed: K114A hydrolyzes the cyclic phosphate less well than the Y162A mutant (Figure 5B and C). These results can be explained by the model depicted in Figure 5D. During the first step of catalysis, the terminal n nucleotide occupies the n pocket, corresponding to the UMP-bound crystal structures (Figure 3B and E). After the first step in catalysis, the cleaved terminal n nucleotide dissociates from the n pocket. Interactions between the cyclic phosphate product and the n – 1 loop promote second step CPDase activity (Figure 5D).

DISCUSSION
Although Usb1 enzymes have significantly diverged throughout evolution, KmUsb1 is more closely related to ScUsb1 in both sequence and structure than it is to HsUsb1. It is therefore interesting that KmUsb1 can process the 3′ end of U6 into progressively shorter products, similar to HsUsb1 but not ScUsb1. Progressive processing of U6 snRNA by Usb1 requires the presence of an RNA end with either a 3′ hydroxyl or a 2′,3′ cyclic phosphate (11,14). In contrast, a 3′ phosphate substrate cannot be processed by KmUsb1, ScUsb1 or HsUsb1 (11,14). In the case of KmUsb1, a 2′,3′ cyclic phosphate on the substrate persists due to weak CPDase activity, whereas in HsUsb1 it persists due to a complete lack of detectable CPDase activity (5,14). In contrast, ScUsb1 has efficient CPDase activity, resulting in a two-step catalytic mechanism that produces a 3′ phosphate, which halts the enzyme from further processing U6 snRNA and leads predominantly to cleavage of a single nucleotide (14). In general, there appear to be several factors that redundantly prevent U6 snRNA from being over-processed by Usb1. First, Usb1 is an intrinsically inefficient enzyme, ∼10⁶-fold slower than RNase A (5). Additionally, the secondary structure of U6 snRNA and/or interactions with other spliceosomal components are likely to further block Usb1 from over-processing (5).
Although KmUsb1 has weak CPDase activity, its ability to hydrolyze cyclic phosphates into 3′ monophosphates is consistent with our phylogenetic prediction based on the C-terminal sequences of Lsm8 (Supplementary Figure S1) (25). These data strongly suggest a co-evolutionary relationship between Usb1 and Lsm8. Our in vitro assays reveal a distributive processing mechanism in which the cyclic phosphate product can dissociate after the first catalytic step and rebind for hydrolysis into the 3′ phosphate (Figures 1C and 5D). In our model, the cyclic phosphate-containing RNA substrate binds the active site through interactions involving the upstream nucleotides and the n – 1 loop (Figure 5D). In such a scenario, the two catalytic histidines would switch catalytic roles during cyclic phosphate hydrolysis (Figure 1A), as suggested for other CPDase enzymes (41). The second catalytic step for KmUsb1 is clearly inefficient for the isolated enzyme and substrate in vitro, as evidenced by the accumulation of cyclic phosphate intermediates.
RNase A also accumulates cyclic phosphate intermediates due to a second catalytic step that is slower than the first step (42). We hypothesize that hydrolysis of the cyclic phosphate may be more efficient in vivo, especially if Usb1 is actively recruited to U6 snRNA through protein-protein interactions. In support of this hypothesis, high-throughput studies have identified many potential interaction partners for Usb1 (43)(44)(45)(46). Some of these potential partners include known spliceosomal proteins. Consistent with this observation, it has been previously observed that mammalian U6 snRNA associated with purified spliceosomes is shortened during splicing (47). In K. marxianus, recruitment of Usb1 to U6 snRNA is also likely required to promote cyclic phosphate hydrolysis due to the poor catalytic efficiency observed here.
There is a striking structural similarity between the n – 1 loops of ScUsb1 and KmUsb1 (Figure 4A and B). This loop is proximal to where the RNA chain would extend away from the enzyme and is likely to interact with U6. Indeed, the HsUsb1 n – 1 loop interacts with the n – 1 nucleotide through main chain interactions (Figure 4C), even though the loop itself is only a single amino acid (Supplementary Figure S6B). It is also striking that the yeast n – 1 loop is rich in positively charged amino acids that may assist in positioning U6 snRNA into the active site of the enzyme (Figure 5D; Supplementary Figure S10A and B). It is therefore likely that this yeast-specific loop architecture evolved to assist in substrate binding, particularly for CPDase activity (Figure 5D). We hypothesize that the Usb1 CPDase activity may have evolved in yeast to inhibit over-processing of U6 snRNA, whereas other organisms such as humans have evolved orthogonal terminal uridylyl transferases that can extend and potentially 'repair' over-processed U6 snRNA (48).

DATA AVAILABILITY
Coordinates and structure factors have been deposited in the Protein Data Bank with accession codes 6PFQ and 6PGL.