Crystal structure of RNase H3–substrate complex reveals parallel evolution of RNA/DNA hybrid recognition

RNases H participate in the replication and maintenance of genomic DNA. RNase H1 cleaves the RNA strand of RNA/DNA hybrids, and RNase H2 in addition hydrolyzes the RNA residue of RNA–DNA junctions. RNase H3 is structurally closely related to RNases H2, but its biochemical properties are similar to those of type 1 enzymes. Its unique N-terminal substrate-binding domain (N-domain) is related to TATA-binding protein. Here, we report the first crystal structure of RNase H3 in complex with its RNA/DNA substrate. Just like RNases H1, the type 3 enzyme recognizes the 2′-OH groups of the RNA strand and detects the DNA strand by binding a phosphate group and inducing a B-form conformation. Moreover, the N-domain recognizes RNA and DNA in a manner that is highly similar to the hybrid-binding domain of RNases H1. Our structure demonstrates a remarkable example of parallel evolution of the elements used in the specific recognition of RNA and DNA.


Protein expression and purification
Synthetic genes that encode Thermovibrio ammonificans (Ta), Thermocrinis albus (Tha), and Thermotoga lettingae (Tl) RNase H3 were purchased from Epoch Life Science and subcloned into a pET28 expression vector that carries an N-terminal His-tag and a SUMO-tag that is removable by SENP protease. BamHI and XhoI restriction sites were used for cloning, which results in two N-terminal residues after SENP cleavage: Serine 0 and initiator Methionine 1. Mutagenesis of the constructs was performed using the QuikChange protocol (Stratagene) or inside-out polymerase chain reaction (PCR). For expression of the isolated Ta-RNase H3 N-domain (residues 1-70), a stop codon was introduced into the expression construct for the full-length protein.
For expression, the vectors were transformed into Escherichia coli BL21 Star cells (Ta-RNase H3 and Tha-RNase H3) or RIL cells (Tl-RNase H3). Protein expression was induced overnight with 0.4 mM isopropyl 1-thio-β-D-galactopyranoside at 30°C. The bacterial cells were then suspended in 40 mM NaH2PO4 (pH 7.0), 100 mM NaCl, and 5% glycerol with the addition of a mixture of protease inhibitors and incubated on ice in the presence of 1 mg/ml lysozyme. After sonication, the cleared lysate was applied to a HisTrap column (GE Healthcare) equilibrated with 10 mM imidazole, 40 mM NaH2PO4, 0.5 M NaCl, and 5% glycerol. After a wash step with 60 mM imidazole, the protein was eluted with 300 mM imidazole. The eluted fraction was dialyzed overnight against 10 mM imidazole, 40 mM NaH2PO4, 0.5 M NaCl, and 5% glycerol in the presence of SENP protease. The sample was then reapplied to the HisTrap column under the same conditions. The cleaved protein did not bind to the resin and was collected in the flow-through fraction. The pure protein was concentrated to 25-56 mg/ml and stored in 20 mM Tris (pH 8.0), 100 mM NaCl, 5% glycerol, 0.5 mM EDTA, and 1 mM DTT.
Ta-RNase H3 N-domain expression was performed in Escherichia coli BL21 Star cells and induced overnight with 0.4 mM isopropyl 1-thio-β-D-galactopyranoside at 16°C. Two-step purification on a HisTrap column was performed as described for the full-length protein. The collected fraction was concentrated and applied to a Superdex 75 column equilibrated with 20 mM Tris (pH 8.0), 100 mM NaCl, 5% glycerol, 0.5 mM EDTA, and 1 mM DTT. Fractions containing the pure protein were pooled and concentrated to 21.5 mg/ml.

Diffraction data collection and structure determination
The diffraction data for all of the crystals were collected at the Berliner Elektronenspeicherring-Gesellschaft für Synchrotronstrahlung (BESSY) synchrotron at beamline MX-14.1 on a Pilatus 6M detector at 100 K (24). The data for the native and Au-soaked crystals were collected at wavelengths of 0.91841 Å and 1.04024 Å, respectively. The datasets were processed and scaled using XDS (25). The crystals belonged to the P212121 space group and contained one protein-substrate complex in the asymmetric unit.
The structure was solved with Phenix AutoSol using the SAD method and the dataset collected with Ta-RNase H3-Au crystals. Homology models of the catalytic and N-domains were generated with SWISS-MODEL (http://swissmodel.expasy.org/) using the A. aeolicus RNase H3 structure (PDB ID: 3VN5), and C-α traces of those models were manually docked into the electron density maps. The nucleic acid model was manually built into the structure in Coot. Refinement of the structure was performed with Phenix (26) interspersed with rounds of manual corrections in Coot (27). After preliminary refinement of the model using the dataset collected for Ta-RNase H3-Au crystals, the structure was further refined against the diffraction data collected for the native Ta-RNase H3 crystals, resulting in better resolution and statistics. Five percent of the reflections were used to calculate Rfree. In the final model, all of the residues reside in the allowed regions of the Ramachandran plot. The structure contains one molecule of glycerol and seven sulfate ions. Structural analyses and figure preparation were performed using PyMol (www.pymol.org/).
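The role of the 5% test set can be illustrated with a minimal sketch of how the working and free R-factors are defined. It uses the standard definition R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs| with a single linear scale factor k, which is a simplifying assumption; refinement programs typically apply more elaborate scaling. The function names, the random seed, and the synthetic amplitudes are hypothetical and serve only as an illustration.

```python
import numpy as np


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|), with one linear scale k.

    The single scale factor k is a simplifying assumption for this sketch.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = f_obs.sum() / f_calc.sum()
    return np.abs(f_obs - k * f_calc).sum() / f_obs.sum()


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Return a boolean mask that flags a random `fraction` of reflections as the test set."""
    rng = np.random.default_rng(seed)
    n_free = max(1, int(round(fraction * n_reflections)))
    mask = np.zeros(n_reflections, dtype=bool)
    mask[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return mask


def r_work_r_free(f_obs, f_calc, free_mask):
    """Compute R separately over working reflections and held-out test reflections."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    r_work = r_factor(f_obs[~free_mask], f_calc[~free_mask])
    r_free = r_factor(f_obs[free_mask], f_calc[free_mask])
    return r_work, r_free


# Synthetic amplitudes (not data from this study), used only to exercise the functions.
rng = np.random.default_rng(1)
f_calc_demo = rng.uniform(10.0, 100.0, size=2000)
f_obs_demo = f_calc_demo * (1.0 + rng.normal(0.0, 0.2, size=2000))
free_demo = split_free_set(len(f_obs_demo), fraction=0.05)
r_work_demo, r_free_demo = r_work_r_free(f_obs_demo, f_calc_demo, free_demo)
print(f"R-work = {r_work_demo:.3f}, R-free = {r_free_demo:.3f}")
```

Because the test reflections are excluded from refinement, Rfree serves as a cross-validation statistic that is less sensitive to overfitting than the working R-factor.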
Crystal packing interactions are shown in Supplementary Figure 3. They involve stacking of the blunt ends of the hybrids, which nearly form a pseudohelix going through the crystal. Additional protein-protein crystal contacts are between helix H (Lys249) and helix B (Glu129). Very weak contacts between the RNA strand of the hybrid and the symmetry-related protein molecules are mediated by Arg232 and Lys258. All the observed crystal contacts are far from the substrate interface and therefore do not seem to affect protein-nucleic acid interactions.

Electrophoretic mobility shift assay
The substrate specificity of the isolated N-domain of Ta-RNase H3 was determined using a 24-mer RNA/DNA hybrid, dsDNA, and dsRNA of the same sequence, each labeled with TAMRA at the 3' end of one of the strands. The substrate concentration was 1 µM, and the protein was added at a 1:1, 2.5:1, 5:1, 10:1, or 20:1 molar ratio to the substrate. Reactions were incubated at room temperature for 30 min in a buffer that contained 100 mM NaCl, 20 mM Tris (pH 8.0), 5% glycerol, 1 mM DTT, 0.5 mM EDTA, and 100 µg/ml BSA. The samples were resolved on a 10% native polyacrylamide gel in 1× TAE buffer at 4°C and visualized by fluorescence readout.
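The titration series above corresponds to a simple calculation, sketched below. The 1 µM substrate concentration and the molar ratios are taken from the text; the protein stock concentration and the reaction volume are hypothetical values chosen for illustration, because they are not stated here.

```python
SUBSTRATE_UM = 1.0                  # substrate concentration from the text (µM)
MOLAR_RATIOS = [1, 2.5, 5, 10, 20]  # protein:substrate ratios from the text


def protein_concentrations(substrate_um, ratios):
    """Final protein concentration (µM) for each protein:substrate molar ratio."""
    return [substrate_um * ratio for ratio in ratios]


def stock_volume_ul(final_um, stock_um, reaction_ul):
    """Volume of protein stock to add, from C1 * V1 = C2 * V2."""
    return final_um * reaction_ul / stock_um


# Hypothetical values (not given in the text): 200 µM protein stock, 10 µl reactions.
STOCK_UM = 200.0
REACTION_UL = 10.0

for ratio, conc in zip(MOLAR_RATIOS, protein_concentrations(SUBSTRATE_UM, MOLAR_RATIOS)):
    volume = stock_volume_ul(conc, STOCK_UM, REACTION_UL)
    print(f"{ratio}:1 -> {conc} µM protein, {volume:.3f} µl of stock")
```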

Comparison of the N-domain and TBP
The main difference between the N-domain and TBP is that the former has a monopartite structure, whereas the latter is composed of two similar halves (TBP-domains). A single domain of the human TBP can be superimposed on the N-domain of Ta-RNase H3 with an rmsd of 2.7 Å over 30 C-α atoms (Supplementary Fig. 6a). The highest structural similarity is observed for the three β-strands that are common to the two domains (1*, 2*, and 3* in the N-domain), whereas the positions of the two helices present in the fold are less conserved. Because the role of the TBP-domains in the TATA-binding protein is the sequence-specific binding of dsDNA in the promoter region, they establish both interactions with the DNA backbone (e.g., Arg294 and Ser307) and base-specific contacts (e.g., Phe288 and Phe305; Supplementary Fig. 6b). A few interactions are conserved between Ta-RNase H3 and TBP, including DNA backbone phosphate-binding by Ser45 (Ser307 in human TBP) and a stacking interaction that discriminates against 2'-OH by Tyr43 (Phe305). Despite these similarities, a large difference is observed in terms of nucleic acid conformation. The dsDNA bound by the TBP-domain retains Watson-Crick base-pairing but is largely unwound, with minor groove width values that oscillate between 11 and 13 Å (Fig. 2a), whereas the hybrid bound by the N-domain preserves the A-form-like conformation.
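For reference, an rmsd over matched C-α atoms after optimal superposition can be computed with the Kabsch algorithm, sketched below. This is a generic illustration and not necessarily the procedure used for the comparison above; it assumes the matched atom pairs are already known and supplied as two N×3 coordinate arrays in the same order. The function name and the synthetic coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two matched N x 3 coordinate sets after optimal rigid superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("both inputs must be N x 3 arrays with matched rows")

    # Remove translation by centering both sets on their centroids.
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    covariance = a_centered.T @ b_centered
    u, _, vt = np.linalg.svd(covariance)

    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate set A onto set B and measure the residual deviation.
    a_rotated = a_centered @ rotation.T
    return float(np.sqrt(((a_rotated - b_centered) ** 2).sum(axis=1).mean()))


# Synthetic example (not coordinates from this study): 30 points, rotated and shifted.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = points @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(points, moved))  # close to zero for a pure rigid-body motion
```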

Additional potential interactions with the non-cleaved strand
When the structures of Tm-RNase H2 (14) and Ta-RNase H3 are compared, the trajectory of the non-cleaved strand downstream from the phosphate-binding pocket differs. In RNase H3, the DNA strand leaves the protein surface and does not form any further contacts with the protein (Fig. 2b, Supplementary Fig. 7). In the Tm-RNase H2 structure, this region of the DNA passes much closer to the protein and forms additional contacts with Phe83. In the Ta-RNase H3 structure, the region corresponding to the vicinity of Phe83 binds a sulfate ion through the side chains of Asn154, Arg155, Asn184 and Arg185, which strongly suggests that this site would also bind the nucleic acid backbone if a trajectory similar to the one observed in RNase H2 is possible. We attribute the difference in the trajectory of the non-cleaved strand to the different nature of the cleaved strand (RNA in the RNase H3 structure and DNA with a single ribonucleotide in the RNase H2 structure), which affects the overall geometry of the double helix. We therefore assume that, for substrates with only one or several ribonucleotides in the cleaved strand, Asn154, Arg155, Asn184 and Arg185 would bind a phosphate group downstream from the phosphate-binding pocket.

The cleavage products were analyzed on 20% Tris borate EDTA-urea polyacrylamide gels. The size of the products was estimated by comparison with the marker indicated as M (products of alkaline hydrolysis of the cleaved strand). The schematic representation of the chimeric substrates shows DNA residues in blue and RNA residues in red. The observed cleavage sites are indicated with arrows, and the fluorescent label position is indicated with a yellow star. (b) Cleavage of model substrates by