Structural basis of Mfd-dependent transcription termination

Abstract Mfd-dependent transcription termination plays an important role in transcription-coupled DNA repair, transcription-replication conflict resolution, and antimicrobial resistance development. Despite extensive studies, the molecular mechanism of Mfd-dependent transcription termination in bacteria remains unclear, with several long-standing puzzles. How Mfd is activated by stalled RNA polymerase (RNAP) and how activated Mfd translocates along the DNA are unknown. Here, we report the single-particle cryo-electron microscopy structures of T. thermophilus Mfd-RNAP complex with and without ATPγS. The structures reveal that Mfd undergoes profound conformational changes upon activation, contacts the RNAP β1 domain and its clamp, and pries open the RNAP clamp. These structures provide a foundation for future studies aimed at dissecting the precise mechanism of Mfd-dependent transcription termination and pave the way for rational drug design targeting Mfd for the purpose of tackling the antimicrobial resistance crisis.


INTRODUCTION
Mfd is a highly conserved ATP-dependent DNA translocase in bacteria. It recognizes stalled RNA polymerase (RNAP) and removes it from DNA, leading to Mfd-dependent transcription termination. Mfd-dependent transcription termination has several physiological roles.
The best-characterized function of Mfd-dependent transcription termination is transcription-coupled DNA repair (TCR), which is a sub-pathway of nucleotide excision repair (NER) (1,2). In TCR, Mfd binds to RNAP stalled at the lesion site, displaces RNAP, and recruits NER machinery to the lesion site (3,4).
Mfd-dependent transcription termination has also been suggested to play important roles other than TCR. For example, the replisome and RNA polymerase translocate along the same DNA template, often in opposite directions. These processes routinely interfere with each other and lead to catastrophic effects on genome stability and cell viability (5). Mfd may resolve conflicts between DNA replication and transcription by removing RNAP stalled at the replication fork, facilitating unimpeded replication, and thus reducing possible DNA damage (6,7).
Recent studies have shown that Mfd promotes antibiotic resistance in diverse bacterial species, including Mycobacterium tuberculosis, by increasing mutation rates (8,9). It is proposed that Mfd-dependent transcription termination leads to mutagenic DNA repair through error-prone gap filling. Compared with wild-type strains, mfd-deficient strains develop antibiotic resistance much more slowly and to a lower level. Therefore, combining an Mfd inhibitor with antibiotics may prevent the evolution of antimicrobial resistance.
Mfd can be functionally dissected into an N-terminal region (NTR), an RNAP interacting domain (RID), a translocation module (TM), and a C-terminal domain (CTD) (Figure 1A). Biochemical, biophysical and structural analyses have uncovered some aspects of the mechanism of Mfd-dependent transcription termination. First, the interaction between the RID and the RNAP β1 domain has been identified to be essential for the function of Mfd (10-13). Second, Mfd can rescue backtracked RNAP by promoting forward translocation via ATP hydrolysis (10). Third, Mfd simultaneously interacts with RNAP via the RID and with DNA via the TM, allowing its translocase activity to generate positive torque on the DNA, thereby overwinding the transcription bubble and disrupting the transcription elongation complex (TEC) (12,14-21).
In the crystal structures of Mfd (11), the conformation of the TM is incompatible with DNA binding and ATP hydrolysis, and the determinants of NTR, which are responsible for recruiting NER machinery, are masked by the CTD. Genetic and biochemical studies suggested that large conformational changes are expected upon Mfd activation (17-19,22,23). To determine the exact conformational changes upon Mfd activation, we solved the single-particle cryo-electron microscopy (cryo-EM) structures of Mfd-RNAP complexes with and without ATPγS. The structures trap the active conformation of Mfd and define the protein-protein and protein-DNA interactions that mediate Mfd-dependent transcription termination.

Protein expression and purification
Escherichia coli RNAP-σ70 holoenzyme was purified and assembled as previously described (24). T. thermophilus RNAP core enzyme and T. thermophilus RNAP-σA holoenzyme were purified and assembled as reported (25). NusG was purified as reported (26).
Escherichia coli strain BL21(DE3) (Invitrogen, Inc.) was transformed with plasmid pET28a-NH-EcoMfd encoding N-hexahistidine-tagged E. coli Mfd under the control of the T7 promoter. Single colonies of the resulting transformants were used to inoculate 1 l LB broth containing 50 µg/ml kanamycin, and cultures were incubated at 37 °C with shaking until OD600 = 0.6. Protein expression was induced by addition of IPTG to 1 mM, and cultures were incubated for 4 h at 30 °C. Cells were then harvested by centrifugation (5000 rpm; 10 min at 4 °C), resuspended in 20 ml buffer A (50 mM Tris-HCl, pH 8.0, 0.1 M NaCl, 5% glycerol) and lysed using a JN-02C cell disrupter (JNBIO, Inc.). The lysate was centrifuged (12 000 rpm; 45 min at 4 °C), and the supernatant was loaded onto a 2 ml column of Ni-NTA agarose (Qiagen, Inc.) equilibrated with buffer A. The column was washed with 10 ml buffer A containing 0.04 M imidazole and eluted with 10 ml buffer A containing 0.5 M imidazole. The sample was further purified by anion-exchange chromatography on a Mono Q 10/100 GL column (GE Healthcare, Inc.; 160 ml linear gradient of 0.1-1 M NaCl in buffer A). Fractions containing E. coli Mfd were pooled and stored at -80 °C. E. coli Mfd derivatives were expressed and purified in the same way as the wild-type protein. Yields were ∼10 mg/l, and purities were >95%.
Escherichia coli strain BL21(DE3) (Invitrogen, Inc.) was transformed with plasmid pET28a-NH-TthMfd encoding N-hexahistidine-tagged T. thermophilus Mfd under the control of the T7 promoter. Single colonies of the resulting transformants were used to inoculate 1 l LB broth containing 50 µg/ml kanamycin, and cultures were incubated at 37 °C with shaking until OD600 = 0.6. Protein expression was induced by addition of IPTG to 1 mM, and cultures were incubated for 4 h at 30 °C. Cells were then harvested by centrifugation (5000 rpm; 10 min at 4 °C), resuspended in 20 ml buffer B (50 mM Tris-HCl, pH 8.0, 0.2 M NaCl, 5% glycerol) and lysed using a JN-02C cell disrupter (JNBIO, Inc.). The lysate was centrifuged (12 000 rpm; 45 min at 4 °C), and the supernatant was loaded onto a 2 ml column of Ni-NTA agarose (Qiagen, Inc.) equilibrated with buffer B. The column was washed with 10 ml buffer B containing 0.1 M imidazole and eluted with 10 ml buffer B containing 0.5 M imidazole. The eluate was loaded onto a 5 ml column of HiTrap Heparin HP (GE Healthcare, Inc.) equilibrated in buffer B and eluted with a 100 ml linear gradient of 0.2-1 M NaCl in buffer B. Fractions containing T. thermophilus Mfd were pooled and stored at -80 °C. T. thermophilus Mfd derivatives were expressed and purified in the same way as the wild-type protein.

ATPase activity assay
ATPase activity assays were performed in a 96-well microplate format using a commercial kit (MAK113, Sigma-Aldrich, Inc.). Reaction mixtures (40 µl) contained: 5 µM T. thermophilus Mfd, 0.4-2 mM ATP, 50 mM Tris-HCl (pH 7.9), 0.1 M KCl, 10 mM MgCl2, 1 mM DTT and 5% glycerol. Reaction mixtures were incubated for 60 min at 37 °C. Then 200 µl reagent (MAK113A, Sigma-Aldrich, Inc.) was added to terminate the enzyme reaction and generate the colorimetric product. The absorbance at 620 nm was measured using a SpectraMax M5 microplate reader (Molecular Devices, Inc.). Phosphate standard solution was used to calculate the extinction coefficient of the colorimetric product. Less than 2% of the ATP was consumed, ensuring that the initial velocity was measured.
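The quantification described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes a linear colorimetric response, and the standard-curve points, the sample absorbance, and all function names are hypothetical. It converts an absorbance at 620 nm into released phosphate using a phosphate standard curve, then checks the stated condition that less than 2% of the ATP was consumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def phosphate_uM(a620, slope, intercept):
    """Convert an absorbance reading into phosphate concentration (µM)."""
    return (a620 - intercept) / slope


def fraction_atp_consumed(pi_uM, atp_mM):
    """Fraction of the starting ATP that was hydrolyzed (1 Pi per ATP)."""
    return pi_uM / (atp_mM * 1000.0)


def initial_velocity_uM_per_min(pi_uM, minutes=60.0):
    """Average rate over the incubation; valid only if little ATP is consumed."""
    return pi_uM / minutes


# Hypothetical phosphate standards (µM) and their A620 readings.
STANDARDS_UM = [0.0, 10.0, 20.0, 40.0]
STANDARDS_A620 = [0.05, 0.15, 0.25, 0.45]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS_UM, STANDARDS_A620)
    pi = phosphate_uM(0.11, slope, intercept)        # hypothetical sample reading
    consumed = fraction_atp_consumed(pi, atp_mM=0.4)  # lowest ATP concentration used
    print(f"Pi released: {pi:.2f} µM, ATP consumed: {consumed:.1%}")
    print(f"v0: {initial_velocity_uM_per_min(pi):.3f} µM/min")
```

Checking the consumed fraction at the lowest ATP concentration is the most stringent case, because the same amount of released phosphate represents a larger share of the substrate there.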
T. thermophilus RNAP displacement assay was per-  The reaction mixtures were applied to 5% polyacrylamide slab gels (29:1 acrylamide/bisacrylamide), electrophoresed in 90 mM Tris-borate, pH 8.0 and 0.2 mM EDTA, stained with 4S Red Plus Nucleic Acid Stain (Sangon Biotech, Inc.) according to the procedure of the manufacturer.

Assembly of Mfd-dependent transcription termination complex
DNA oligonucleotides and RNA oligonucleotide (sequences in Figure 1B) (Sangon Biotech, Inc.) were dissolved in nuclease-free water to ∼1 mM and stored at -80 °C. Template strand DNA, nontemplate strand DNA, and RNA were annealed at a 1:1:1 ratio in 10 mM Tris-HCl, pH 7.9, 0.

Cryo-EM grid preparation
Immediately before freezing, 8 mM CHAPSO was added to the sample. C-flat grids (CF-1.2/1.3-4C; Protochips, Inc.) were glow-discharged for 60 s at 15 mA prior to the application of 3 µl of the complex, then plunge-frozen in liquid ethane using a Vitrobot (FEI, Inc.) with 95% chamber humidity at 10 °C.

Cryo-EM data acquisition and processing
In the preliminary experiment, some data of the E. coli Mfd complex with ATPγS were collected, but there was no density for Mfd in the final map. We therefore turned to determining the structure of the T. thermophilus Mfd complex. The grids were imaged using a 300 kV Titan Krios (FEI, Inc.) equipped with a K2 Summit direct electron detector (Gatan, Inc.). Images were recorded with SerialEM (27) in counting mode with a physical pixel size of 1.307 Å and a defocus range of 1.5-2.5 µm. In total, 4266 and 4035 images were recorded for MTC ATPγS and MTC apo, respectively. Data were collected with a dose rate of 10 e/pixel/s. Images were recorded with a 10 s exposure and 0.25 s subframes to give a total dose of 59 e/Å². Subframes were aligned and summed using MotionCor2 (28). The contrast transfer function was estimated for each summed image using CTFFIND4 (29). From the summed images, 661 783 particles (MTC ATPγS) and 1 149 168 particles (MTC apo) were auto-picked, extracted with a box size of 200 pixels, and subjected to 2D classification in RELION (30). Poorly populated classes were removed. The remaining particles were 3D classified in RELION using a map of E. coli TEC (EMD-8585) (31) low-pass filtered to 40 Å resolution as a reference. The best-resolved classes were 3D auto-refined and post-processed in RELION. The final numbers of particles are 60 650 (MTC ATPγS), 24 037 (MTC apo) and 558 003 (TEC).
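The exposure parameters quoted above are mutually consistent, as the short check below illustrates. It is a sketch using only the values stated in the text (10 e/pixel/s, 10 s exposure, 0.25 s subframes, 1.307 Å pixel size); the function names are hypothetical.

```python
def total_dose_e_per_A2(dose_rate_e_per_pixel_s, exposure_s, pixel_size_A):
    """Total electron dose per square angstrom accumulated over the exposure."""
    electrons_per_pixel = dose_rate_e_per_pixel_s * exposure_s
    pixel_area_A2 = pixel_size_A ** 2
    return electrons_per_pixel / pixel_area_A2


def number_of_subframes(exposure_s, subframe_s):
    """Number of subframes recorded per exposure."""
    return round(exposure_s / subframe_s)


if __name__ == "__main__":
    dose = total_dose_e_per_A2(10.0, 10.0, 1.307)
    frames = number_of_subframes(10.0, 0.25)
    print(f"total dose: {dose:.1f} e/A^2 over {frames} subframes")
    print(f"dose per subframe: {dose / frames:.2f} e/A^2")
```

The result, about 58.5 e/Å², rounds to the reported 59 e/Å² and is spread over 40 subframes.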

Cryo-EM model building and refinement
The homology model of T. thermophilus Mfd was generated on the Phyre2 server (32). The model of the RNAP core enzyme from the structure of T. thermophilus RPo (PDB 4G7H) (33) and the homology model of T. thermophilus Mfd were fitted into the cryo-EM density map using Chimera (34). The model of nucleic acids was built manually in Coot (35). The coordinates were real-space refined with secondary structure restraints in Phenix (36).
Fluorescence polarization, $P$, was calculated using the equation: $P = (I_{VV} - I_{VH}) / (I_{VV} + I_{VH})$, where $I_{VV}$ and $I_{VH}$ are fluorescence intensities with the excitation polarizer at the vertical position and the emission polarizer at, respectively, the vertical position and the horizontal position. Equilibrium dissociation constants, $K_D$, were extracted by non-linear regression using the equation: $P = P_f + (P_b - P_f)\,[M] / (K_D + [M])$, where $P$ is the fluorescence polarization at a given concentration of Mfd, $P_f$ is the fluorescence polarization for free 6-FAM-labeled DNA scaffold, $P_b$ is the fluorescence polarization for bound 6-FAM-labeled DNA scaffold, and $[M]$ is the concentration of Mfd or Mfd derivative.
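A minimal sketch of this kind of analysis is given below. It assumes a simple one-site hyperbolic binding model, P = P_f + (P_b - P_f)[M]/(K_D + [M]), which matches the symbols defined above but is an assumption here rather than a confirmed reproduction of the authors' fitting procedure. The titration values and function names are hypothetical. For a fixed K_D the model is linear in P_f and P_b, so the sketch solves those two by least squares and searches over K_D with a golden-section search on a log scale.

```python
import math


def polarization(i_vv, i_vh):
    """Fluorescence polarization from vertical and horizontal emission intensities."""
    return (i_vv - i_vh) / (i_vv + i_vh)


def bound_fraction(m, kd):
    """Fraction of DNA bound at protein concentration m (same units as kd)."""
    return m / (kd + m)


def fit_linear_terms(conc, pol, kd):
    """For a fixed K_D, least-squares P_f and P_b. Returns (p_f, p_b, sse)."""
    theta = [bound_fraction(m, kd) for m in conc]
    n = len(conc)
    st = sum(theta)
    stt = sum(t * t for t in theta)
    sp = sum(pol)
    stp = sum(t * p for t, p in zip(theta, pol))
    # Model: P = a + b * theta, with a = P_f and b = P_b - P_f.
    b = (n * stp - st * sp) / (n * stt - st * st)
    a = (sp - b * st) / n
    sse = sum((a + b * t - p) ** 2 for t, p in zip(theta, pol))
    return a, a + b, sse


def fit_kd(conc, pol, kd_min=1e-3, kd_max=1e3, iterations=60):
    """Golden-section search for the K_D minimizing the sum of squared errors."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    for _ in range(iterations):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if fit_linear_terms(conc, pol, 10 ** c)[2] < fit_linear_terms(conc, pol, 10 ** d)[2]:
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    kd = 10 ** ((lo + hi) / 2.0)
    p_f, p_b, _ = fit_linear_terms(conc, pol, kd)
    return kd, p_f, p_b


# Hypothetical titration: protein concentrations (µM) and noiseless model data.
CONC_UM = [0.0, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
TRUE_KD, TRUE_PF, TRUE_PB = 0.8, 0.05, 0.25
POL = [TRUE_PF + (TRUE_PB - TRUE_PF) * bound_fraction(m, TRUE_KD) for m in CONC_UM]

if __name__ == "__main__":
    kd, p_f, p_b = fit_kd(CONC_UM, POL)
    print(f"K_D = {kd:.3f} µM, P_f = {p_f:.3f}, P_b = {p_b:.3f}")
```

With real, noisy data a dedicated non-linear regression routine with parameter uncertainties would normally be preferred; the search bounds on K_D here are arbitrary and should bracket the titration range.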

Overall structures of Mfd-dependent transcription termination complex
Because the interaction between E. coli RNAP and Mfd is transient, the initial attempt to determine the structure of the E. coli Mfd-dependent transcription termination complex (MTC) failed. In the preliminary experiment, some data of E. coli MTC with ATPγS were collected, but there was no density for Mfd in the final map. We therefore turned to determining the structure of T. thermophilus MTC. The RID and the TM are highly conserved between E. coli Mfd and T. thermophilus Mfd (Supplementary Figure S1). The ATPase activity assay verified that T. thermophilus Mfd hydrolyzed ATP, while substitution of a conserved Walker B residue, E572, impaired ATP hydrolysis, but not ATP binding (Supplementary Figure S2A, B) (17). Electrophoretic mobility shift assay (EMSA) confirmed that T. thermophilus Mfd displaced RNAP stalled by NTP starvation, while the E572 substitution derivative, which is deficient in ATP hydrolysis, failed to displace stalled RNAP (Supplementary Figure S2C, D).
To obtain the structure of T. thermophilus MTC, we modified the scaffold, which has been used for structure determination of a TEC (31), by extending the upstream dsDNA from 6 bp to 40 bp (Figure 1B), which is necessary and sufficient for Mfd to function (10,17). The cryo-EM structures of MTC with and without ATPγS (MTC ATPγS and MTC apo) were determined at 4.1 and 5.0 Å, respectively (Figure 1C and D, Supplementary Figures S3-S9, Supplementary Table S1). As expected, a clear density feature for ATPγS is observed in MTC ATPγS (Figure 1C). The conformations of RNAP in both structures are similar, with a root-mean-square deviation (RMSD) of 0.85 Å (2954 Cα atoms aligned). Although full-length Mfd was used, cryo-EM densities for only the RID and the TM were observed (Supplementary Figure S7). The RID binds to the RNAP β1 domain, while the TM binds to the upstream dsDNA and the clamp. Another class from the dataset without ATPγS was determined at 3.1 Å, lacked the density for Mfd, and turned out to be a regular TEC (Figure 1E, Supplementary Figures S5, S7 and S8, Supplementary Table S1).

Mfd undergoes profound conformational changes upon activation
The TM, composed of translocation domain 1 (TD1) and translocation domain 2 (TD2), contains the characteristic motifs that identify Mfd as a RecG-like SF2 helicase (Supplementary Figure S1). The TRG (translocation in RecG) motif from TD2 is highly conserved among RecG-like SF2 helicases. Its antiparallel helical hairpin conformation is critical for coupling nucleotide hydrolysis to duplex translocation (16,37).
The structures of Mfd from different species (E. coli, M. smegmatis, M. tuberculosis and T. thermophilus) have been solved in complex with different partners (nucleotide, DNA, and RNAP) using different methods (crystallography and cryo-EM) (Figure 2) (11,13,17,19,38,39). The most striking differences among these structures are the conformational change of the TRG motif and the repositioning of the RID. The two helices of the TRG motif are almost perpendicular to each other in all structures without DNA, while they are antiparallel in all structures with DNA, hinting that DNA is required for the active conformation of the TRG motif. The RID is connected to the TM through a long helix, the relay helix (RH). The RH is in a similar orientation and the RID is in a similar position in all structures without RNAP. Compared with the cryo-EM structure of Mfd in the absence of RNAP, the RH rotates ∼45° and the RID translates ∼70 Å in MTC ATPγS and MTC apo, indicating that RNAP induces domain repositioning.

The upstream dsDNA binds to a positively charged groove of TM
The upstream dsDNA bends ∼40° relative to its orientation in TEC and binds to a positively charged groove between TD1 and TD2 (Figure 3A, B). There is a kink of 29° in MTC ATPγS and 36° in MTC apo at position -16 (Supplementary Figure S10), which is consistent with the single-molecule observation (15). Alanine substitution of a conserved basic residue (K739 in E. coli Mfd), which is positioned near the DNA backbone, shows defects in the DNA binding assay and the RNAP displacement assay (Figure 3C, D), verifying that the cryo-EM structures are biologically relevant. Furthermore, alanine substitutions of R685 and N817 in E. coli Mfd, which are positioned near the DNA backbone as well, showed severely impaired binding affinity to DNA in a previous report (40).

TD2 undergoes a rotation relative to TD1 upon ATPγS binding
Compared to MTC apo, TD2 rotates by ∼33° toward TD1 in MTC ATPγS (Figure 3E and Supplementary Figure S9B Figure 3E), suggesting that the step size of Mfd is 1 bp.

Mfd contacts the RNAP β1 domain and the clamp
The interactions between Mfd and RNAP are essentially the same in MTC with and without ATPγS. The interactions in MTC ATPγS will be discussed in the following sections due to its superior resolution.
The RID binds to the RNAP β1 domain with a buried surface area of ∼579 Å² (Figure 4A and Supplementary Figure S9C). The interface has been genetically, biochemically, and structurally characterized (11-13). The structure of the RID and the RNAP β1 domain in MTC ATPγS is superimposable on the crystal structure of the RID complexed with the RNAP β1 domain (Figure 4B) (13), indicating that the RID makes a similar set of interactions in both structures. Consistently, substitutions of interface residues disrupt the Mfd-RNAP interaction and cause defects in the RNAP release activity of Mfd (11,12).
TD2 binds to the clamp with a buried surface area of ∼580 Å² (Figure 4C and Supplementary Figure S9D). Specifically, it interacts with the evolutionarily conserved clamp helices and rudder. Transcription initiation factor σ binds to the clamp helices with high affinity (Figure 4D) (33), which would block access to the clamp helices. Therefore, transcript release by Mfd is inhibited by σ (10). NusG binds to the clamp helices as well (41). Owing to its lower affinity, NusG does not interfere with Mfd in the RNAP displacement assay (Supplementary Figure S11), which is expected considering that the clamp helices of most elongating RNAPs are pre-occupied by NusG in vivo (42).

The clamp is open in Mfd-dependent transcription termination complex
The RNAP is like a crab claw with two pincers (43). The clamp, a mobile structural module that makes up much of one pincer, undergoes swing motions that open the active center cleft to allow entry of the nucleic acid scaffold during initiation or that close the cleft around the nucleic acid scaffold to enable processive elongation (44-47). Compared with the structure of TEC, the clamp in MTC rotates open by ∼14° and the DNA-RNA hybrid becomes disordered due to the loss of interaction between the hybrid and the active center cleft (Figure 5A).
Can Mfd bind to the clamp and the upstream dsDNA in the same way as in MTC if the clamp is closed? To answer this question, the structure of TEC was used as a reference to superimpose the structures of MTC via α-carbon atoms of the clamp, revealing severe clashes between the upstream dsDNA and the RNAP β1 domain (Figure 5B). Therefore, Mfd would not be able to bind to the clamp and the upstream dsDNA in the same way as in MTC if the clamp is closed.

DISCUSSION
In this work, we determined the cryo-EM structures of the Mfd-dependent transcription termination complex with and without ATPγS, revealing the precise mechanism of Mfd activation and translocation. Combining this work with previous studies, we propose the following model (Figure 6). In the absence of DNA, the two helices of the TRG motif adopt the perpendicular configuration, and the UvrA binding determinant is sequestered by the CTD (11,17,18). After binding to DNA, the two helices of the TRG motif turn into the antiparallel configuration and couple ATP hydrolysis to duplex translocation (11,39). If Mfd encounters a backtracked RNAP, the RID binds to the RNAP β1 domain (Figure 6A). In the meantime, Mfd translocates on the upstream dsDNA and pushes RNAP forward via ATP hydrolysis. As soon as the 3′ end of the RNA aligns with the active center, transcription resumes. Because the rate of transcription elongation (∼14 bp/s) is faster than the rate of Mfd translocation (∼7 bp/s) (40), RNAP moves forward and leaves Mfd behind. During this process, the UvrA binding determinant of Mfd remains sequestered by the CTD, so the NER machinery will not be recruited.
If Mfd encounters an RNAP stalled by DNA damage, the RID binds to the RNAP β1 domain (Figure 6B). In the meantime, the TM translocates on the upstream dsDNA via ATP hydrolysis. However, RNAP cannot move forward due to the DNA damage. The RID cannot move forward either, due to its interaction with the RNAP. Therefore, the RID translates a long distance relative to the TM after several cycles of ATP hydrolysis. When the TM steps into the clamp of RNAP, it pushes against and pries open the clamp. During this process, ATP hydrolysis drives profound conformational changes of Mfd, including the translation of the RID and the exposure of the UvrA binding determinant. Therefore, although Mfd is capable of binding both paused RNAP and RNAP stalled by DNA damage, only in the case of DNA damage can Mfd complete the conformational change to expose the UvrA binding determinant and recruit NER machinery. This model is consistent with the observation that ATP hydrolysis is required for Mfd activation (15). This model is also consistent with the proposal that Mfd kinetically discriminates stalled RNAP from backtracked RNAP (15,40,48).
A closed clamp is critical for the processivity of transcription elongation. Even when RNAP is paused by backtracking or an RNA hairpin in the RNA exit channel, the clamp remains closed (49-51). In contrast, the clamp is open in MTC structures. The open clamp loses its interaction with the DNA-RNA hybrid and probably aids the dissociation of the DNA-RNA hybrid.
Rad26, a Swi2/Snf2 family helicase, is among the first proteins to be recruited to Pol II during the initiation of S. cerevisiae TCR. The cryo-EM structure of Rad26 in complex with Pol II shows that Rad26 binds upstream of Pol II and translocates toward Pol II, suggesting Rad26 may play a role similar to that of Mfd (52). However, structural analysis reveals divergences between Rad26 and Mfd (Supplementary Figure S12). First, the clamp is closed and the transcription bubble is ordered in the Rad26-Pol II complex, while Mfd pries open the clamp and disrupts the contacts between the transcription bubble and RNAP. Second, Rad26 bends the upstream dsDNA by ∼80°, while Mfd has a less dramatic impact on the conformation of the upstream dsDNA. Third, the second-largest subunit of Pol II is the major contact site of Rad26, while Mfd contacts both the largest and the second-largest subunits of the polymerase.
Besides its role in TCR, Mfd is proposed to be an 'evolvability factor' that promotes mutagenesis and is required for rapid resistance development to antibiotics (8,9). Therefore, Mfd may be an ideal target for 'anti-evolution' drugs that inhibit antimicrobial resistance development. The structures of Mfd in action provide a basis for rational drug design targeting Mfd. For example, because the conformational change of the TRG motif is critical for the function of Mfd, inhibitors might be designed to lock the conformation of the TRG motif.
Figure 6. Proposed model of Mfd-dependent rescue and Mfd-dependent transcription termination. (A) If Mfd encounters a backtracked RNAP, the RID binds to the RNAP and the TM binds to the upstream dsDNA. Mfd and RNAP translocate forward together via ATP hydrolysis. When the 3′ end of the RNA aligns with the active center, transcription resumes as long as NTPs are available. (B) If Mfd encounters an RNAP stalled by DNA damage, the RID binds to the RNAP and the TM binds to the upstream dsDNA. The TM translocates forward via ATP hydrolysis, but the RNAP and the RID stay stationary due to the DNA damage. Eventually, the TM steps into and pries open the clamp, which will facilitate the dissociation of DNA-RNA hybrid. During this process, the UvrA binding determinant of Mfd gets exposed.

DATA AVAILABILITY
The accession numbers for the cryo-EM density maps reported in this paper are Electron Microscopy Data Bank: EMD-30117 (MTC apo), EMD-30118 (MTC ATPγS), and EMD-30119 (TEC). The accession numbers for the atomic coordinates reported in this paper are Protein Data Bank: 6M6A (MTC apo), 6M6B (MTC ATPγS), and 6M6C (TEC).

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.