An archaeal transcription factor EnfR with a novel ‘eighth note’ fold controls hydrogen production of a hyperthermophilic archaeon Thermococcus onnurineus NA1

Abstract Thermococcus onnurineus NA1, a hyperthermophilic carboxydotrophic archaeon, produces H2 through CO oxidation catalyzed by proteins encoded in a carbon monoxide dehydrogenase (CODH) gene cluster. TON_1525, which carries a DNA-binding helix-turn-helix (HTH) motif, is a putative repressor of the codh gene cluster. The T55I mutation in TON_1525 led to enhanced H2 production accompanied by increased expression of genes in the codh cluster. Here, we demonstrate that TON_1525 is a dimer. Monomeric TON_1525 adopts a novel fold resembling an 'eighth note' symbol (hence the name 'eighth note' fold regulator, EnfR), and the dimerization mode of EnfR is unique in that it has no counterpart among structures in the Protein Data Bank. Footprinting and gel shift assays showed that dimeric EnfR binds to a 36-bp pseudo-palindromic inverted repeat in the promoter region of the codh gene cluster; this is supported by an in silico EnfR/DNA complex model and by mutational studies revealing that the N-terminal loops, as well as the HTH motifs, participate in DNA recognition. The DNA-binding affinity of the T55I mutant was ∼15-fold lower, owing to a conformational change in the N-terminal loops. In addition, transcriptome analysis suggested that EnfR could regulate diverse metabolic processes besides H2 production.


INTRODUCTION
Fossil fuels such as coal, oil and natural gas have been the major energy sources, accounting for almost 80% of the total energy supply for decades (1, 2). However, owing to the air pollution and global warming caused by their combustion, fossil fuel consumption is declining globally and the demand for alternative renewable energy sources is soaring (3). According to a roadmap to 2050 reported by the International Renewable Energy Agency (IRENA), the share of renewable energy would reach 65% in 2050, whereas the use of fossil fuels would fall to one-third of present levels (4). Hydrogen is an attractive candidate as a future fuel for several reasons. It is a non-toxic, clean and efficient energy source in that the major byproduct of hydrogen combustion is pure water, and its energy content per unit mass is three times higher than that of traditional fossil fuels (5). Therefore, global demand for hydrogen energy is growing, and the development of hydrogen-fueling infrastructure for production, storage and utilization is a great research challenge (6-8).
Almost 96% of hydrogen is produced from natural gas (48%), petroleum (30%) and coal (18%) (9). Besides H2 production from nonrenewable fossil sources, there are ways to produce H2 from renewable sources such as wind, solar and biomass (10). Microbial H2 production is also promising because it is an environmentally friendly and less energy-intensive way to produce H2. Notably, microbes can exploit industrial wastes, including carbon monoxide (CO) or organic matter, to produce H2 (11-13). In particular, biohydrogen production using thermophiles at high temperatures (≥60 °C) has numerous advantages, such as high metabolic activity leading to enhanced product formation rates (14), less contamination by H2-consuming microorganisms (15), no need for intensive cooling during fermentation, and decreased density, surface tension and viscosity of the culture broth (16). Thermococcales, Thermotogales, Desulfurococcales and Clostridium species have been reported to produce H2 from organic substrates at high temperatures (17-19).
Thermococcus onnurineus NA1, isolated from a deep-sea hydrothermal vent, is a sulfur-reducing hyperthermophilic carboxydotrophic archaeon (20). It requires elemental sulfur as a terminal electron acceptor for heterotrophic growth on peptides or an amino acid mixture and grows optimally at 80 °C and pH 8.5. Interestingly, T. onnurineus NA1 grows on diverse substrates, including CO and formate, to produce H2 (21). The H2 productivity of T. onnurineus NA1 is higher than that of other hyperthermophilic archaea such as Pyrococcus furiosus and Desulfurococcus amylolyticus (18, 21, 22). For CO utilization, the unique codh gene cluster in T. onnurineus NA1, which encodes CO dehydrogenase, hydrogenase and a Na+/H+ antiporter, catalyzes the anaerobic oxidation of CO to CO2 through the water-gas shift reaction: CO + H2O → H2 + CO2 (ΔG° = −20 kJ/mol) (23).
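To make the thermodynamic driving force of the water-gas shift reaction concrete, a standard free-energy change can be converted into an equilibrium constant through K = exp(−ΔG°/RT). The sketch below assumes that the −20 kJ/mol value refers to standard conditions at 25 °C (298.15 K); it ignores actual gas partial pressures and the 80 °C growth temperature, so it only indicates that the reaction lies well toward H2 and CO2.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def equilibrium_constant(delta_g_kj_per_mol: float, temp_k: float) -> float:
    """K = exp(-dG / RT) for a standard free-energy change given in kJ/mol."""
    return math.exp(-delta_g_kj_per_mol * 1000.0 / (R * temp_k))


# ASSUMPTION: the quoted -20 kJ/mol is a standard value referenced to 25 °C.
k_298 = equilibrium_constant(-20.0, 298.15)
print(f"K at 298.15 K is about {k_298:.0f}")
```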
In a previous study, adaptive evolution was carried out through serial transfer of T. onnurineus NA1 in CO medium to enhance its H2 production (24). After the 156th serial transfer, cell density, CO consumption rate and H2 production rate had increased 2.8-, 6.5- and 5.9-fold, respectively. Although mutations occurred in the genes of ten proteins during serial transfer (24), the CO consumption rate and H2 production rate of the mutant strain MC11, harboring a Thr → Ile replacement at position 55 in the TON_1525 protein, were comparable to those of the 156-serial-transferred strain (156T). Consistently, the expression of proteins essential for CO-dependent H2 production was increased in both the 156T and MC11 strains compared with the wild-type (24).
Sequence analysis revealed that TON_1525 is a Tfx DNA-binding protein. According to NCBI's Gene resources, a total of 288 microorganisms have Tfx DNA-binding proteins, most of which (∼98%) are found in archaea, including Thermococcales, Methanococcales and Halobacteriales. Almost all archaeal Tfx DNA-binding proteins share a basic DNA-binding helix-turn-helix (HTH) motif (pI range = 8.8-11.0) in their N-terminal region and an acidic C-terminal domain (pI range = 4.6-6.8). The amino acid sequence of the HTH motif resembles that of the bacterial RNA polymerase (RNAP) σ70 region 4 domain, which recognizes specific transcription-initiation sites in the promoter region and recruits RNAP to start transcription (25). The presence of the DNA-recognizing HTH motif, together with experimental data showing that two Tfx family members (TON_1525 and the MTH0916 protein from Methanobacterium thermoautotrophicum) bind to promoter regions (24, 26), strongly suggests that Tfx family members are transcription regulators. Nevertheless, basic characteristics of Tfx family members, such as their molecular shape, oligomerization state, target DNA sequence and DNA-recognition mode, have not been reported. In this study, we revealed that TON_1525 adopts a fold resembling an eighth note symbol (hence named EnfR) and that the unique X-shaped dimeric structure of EnfR has no structural homolog. In addition, the target DNA sequence was determined in the promoter region of the codh gene cluster, and the DNA-binding mode of EnfR was elucidated based on structural and mutational analyses. We also suggest that EnfR may be involved in the transcriptional regulation of diverse metabolic processes in T. onnurineus NA1. Our study thus provides a platform for future research on Tfx family members and for understanding how the energy metabolism of the hyperthermophilic archaeon T. onnurineus NA1 is controlled at the transcriptional level.

Protein preparation
The T. onnurineus NA1 TON_1525 gene encoding residues 1-146 was synthesized with an in-frame, non-cleavable C-terminal His6-tag and inserted at the NdeI and NotI sites of the expression vector pET24a(+) (Novagen, USA). The T55I substitution was generated by site-directed mutagenesis using the wild-type gene as a template. The wild-type and mutant constructs were each transformed into Escherichia coli Rosetta (DE3). The transformed cells were grown to an optical density at 600 nm (OD600) of ∼0.6 in Luria-Bertani medium at 37 °C, and expression of the wild-type and mutant proteins was induced with 0.5 mM isopropyl β-D-thiogalactopyranoside. After 4 h of induction at 37 °C, the cells were harvested, resuspended in a buffer of 10 mM Tris (pH 7.4), 500 mM NaCl and 5 mM β-mercaptoethanol, and disrupted by sonication. The crude lysate was centrifuged at 10,000 × g for 30 min, and the resulting supernatant was heated at 75 °C for 10 min. The lysate was centrifuged again at 10,000 × g for 1 h, and the resulting supernatant was loaded onto a nickel-nitrilotriacetic acid (Ni-NTA) column (GE Healthcare, USA). The protein fraction eluted from the Ni-NTA column was concentrated and loaded onto a Superdex 75 HR 16/600 column (GE Healthcare, USA). The eluate from gel filtration was concentrated to ∼10 mg/ml in 20 mM Tris (pH 7.4), 500 mM NaCl and 2 mM DTT for crystallization.

Crystallization, X-ray diffraction experiment and structure determination
The microbatch crystallization method was employed to grow crystals under oil (27). Crystals of wild-type TON_1525 were obtained at 22 °C by mixing 1 µl of protein solution with an equal volume of a mother liquor consisting of 30% polyethylene glycol (PEG) 400, 100 mM HEPES (pH 7.5) and 200 mM NaCl. The T55I mutant was crystallized in the same way with a mother liquor of 30% PEG 400, 100 mM HEPES (pH 7.5) and 200 mM ammonium sulfate. A 2.8 Å resolution data set of the wild-type and a 3.4 Å resolution data set of the T55I mutant were collected at beamline 17A of the Photon Factory, Tsukuba, Japan. Both data sets were integrated and scaled with XDS (28). Crystals of the wild-type belonged to space group P4₃2₁2 with cell parameters a = b = 102.7 Å and c = 90.4 Å, corresponding to two monomers in the asymmetric unit. Crystals of the T55I mutant, which belonged to space group P4₃2₁2 with cell parameters a = b = 123.2 Å and c = 82.7 Å, contain two monomers in the asymmetric unit (Supplementary Table S1).
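As a quick consistency check on the asymmetric-unit contents, the Matthews coefficient relates unit-cell volume to the protein mass it holds: V_M = V_cell / (Z × n × M), where Z is the number of asymmetric units per cell (8 for P4₃2₁2), n the number of monomers per asymmetric unit and M the monomer mass in Da. The sketch below uses the wild-type cell dimensions given above; the 17,500 Da monomer mass is a hypothetical round figure for 146 residues plus a His6 tag, not a value reported here, and 1 − 1.23/V_M is the conventional solvent-content approximation.

```python
"""Matthews coefficient (V_M) and solvent content for a tetragonal crystal."""

# Space group P4(3)2(1)2 (No. 96) has 8 asymmetric units per unit cell.
ASYM_UNITS_P43212 = 8


def tetragonal_cell_volume(a: float, c: float) -> float:
    """Unit-cell volume in Å^3 for a tetragonal cell (a = b, all angles 90°)."""
    return a * a * c


def matthews_coefficient(cell_volume: float, n_asym: int,
                         monomers_per_asu: int, monomer_mass_da: float) -> float:
    """V_M in Å^3/Da: cell volume divided by the total protein mass in the cell."""
    return cell_volume / (n_asym * monomers_per_asu * monomer_mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent content using the conventional 1 - 1.23 / V_M relation."""
    return 1.0 - 1.23 / vm


# Wild-type cell from the text: a = b = 102.7 Å, c = 90.4 Å, two monomers per ASU.
volume = tetragonal_cell_volume(102.7, 90.4)
# HYPOTHETICAL monomer mass; replace with the value computed for the actual construct.
vm = matthews_coefficient(volume, ASYM_UNITS_P43212, 2, 17_500.0)
print(f"V_cell = {volume:.0f} Å^3, V_M = {vm:.2f} Å^3/Da, solvent ≈ {solvent_fraction(vm):.0%}")
```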
To solve the structures of the wild-type and the T55I mutant by molecular replacement (MR), we used the MTH0916 structure (PDB code: 1NR3) as a search model because MTH0916 is the only Tfx family member whose structure is available. However, our MR trials using the programs MOLREP and PHASER were unsuccessful. TON_1525 has no methionine residue except for the N-terminal initiator methionine, which is usually disordered in protein structures, so the native protein is not suitable for selenomethionine incorporation for de novo phasing. Therefore, leucine at position 83 was replaced by methionine, together with an additional C135S substitution, in the T55I mutant. The C135S substitution was introduced because a protein band corresponding to the dimeric size of TON_1525 was observed on non-reducing SDS-PAGE during protein purification; we mutated Cys135 to serine to prevent intermolecular disulfide bonds, and after this Cys → Ser replacement the dimeric band disappeared. For selenomethionine labeling of the triple mutant (T55I/L83M/C135S), the methionine auxotroph E. coli B834 (DE3) (Novagen) strain was used as the host. The triple mutant harboring selenomethionine at position 83 was purified in the same way as the wild-type and concentrated to ∼10 mg/ml for crystallization.
Crystals of the triple mutant were obtained in the same way as the wild-type with a mother liquor consisting of 16% PEG 400, 100 mM Tris (pH 8.6) and 200 mM ammonium sulfate. A single-wavelength (0.9796 Å) anomalous diffraction (SAD) data set was collected to 3.00 Å resolution at beamline 5C of the Pohang Accelerator Laboratory and processed with HKL2000 (29). Experimental SAD phasing and initial model building were performed with PHENIX (30). The initial model of the triple mutant was subjected to automated model building with BUCCANEER (31) and then further refined to R/Rfree of 0.251/0.267 using COOT (32) and PHENIX. The crystal structures of the wild-type and the T55I mutant were determined by MR with the triple mutant structure as a search model. MR solutions from PHASER were manually adjusted in COOT and refined with PHENIX. Several rounds of refinement and manual refitting yielded final models of the wild-type (R/Rfree = 0.222/0.244) and the T55I mutant (R/Rfree = 0.236/0.275). The final model of the wild-type contains residues 4-142 of chain A and residues 4-139 of chain B, whereas that of the T55I mutant consists of residues 4-144 of chain A and residues 2-139 of chain B. Ramachandran plots indicate that 97.1% (wild-type) and 91.6% (T55I mutant) of non-glycine residues lie in the most favored regions, and all others lie in the additionally allowed regions.
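The R and Rfree values above measure agreement between observed and model-derived structure-factor amplitudes, R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, with Rfree computed the same way over a test set of reflections withheld from refinement. The sketch below is a textbook version of that definition for illustration; refinement programs such as PHENIX apply more elaborate scaling (including bulk-solvent corrections), so it will not reproduce their numbers, and the toy amplitudes are invented.

```python
from typing import Sequence, Tuple


def scale_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Least-squares scale k minimizing sum((|Fo| - k|Fc|)^2)."""
    num = sum(abs(o) * abs(c) for o, c in zip(f_obs, f_calc))
    den = sum(abs(c) ** 2 for c in f_calc)
    return num / den


def r_value(f_obs: Sequence[float], f_calc: Sequence[float], k: float) -> float:
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|)."""
    return sum(abs(abs(o) - k * abs(c)) for o, c in zip(f_obs, f_calc)) / sum(abs(o) for o in f_obs)


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_free: Sequence[bool]) -> Tuple[float, float]:
    """Split reflections by test-set flag; the scale is fitted on the working set only."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, is_free) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, is_free) if f]
    k = scale_factor([o for o, _ in work], [c for _, c in work])
    r_work = r_value([o for o, _ in work], [c for _, c in work], k)
    r_free = r_value([o for o, _ in free], [c for _, c in free], k)
    return r_work, r_free


# Invented amplitudes: the last two reflections are flagged as the free set.
fo = [120.0, 85.0, 60.0, 42.0, 30.0, 95.0]
fc = [110.0, 90.0, 55.0, 45.0, 26.0, 80.0]
flags = [False, False, False, False, True, True]
print(r_work_r_free(fo, fc, flags))
```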

Analytical ultracentrifugation
Sedimentation velocity of wild-type EnfR and all the mutants was measured by analytical ultracentrifugation at a wavelength of 280 nm using a ProteomeLab XL-A centrifuge (Beckman Coulter, Brea, CA, USA) equipped with an AN-60 Ti rotor at 20 °C. Sedimentation data at 42,000 rpm were collected at 5-min intervals, giving a total of 130 scans. Sedimentation velocity data were analyzed using SEDFIT. The density and viscosity of the buffer solution at 20 °C were determined to be 1.019 g/ml and 0.010503 P, respectively. The partial specific volume was calculated as 0.7454-0.7466 ml/g at 20 °C.
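Sedimentation coefficients depend on the solvent, so they are conventionally reported after standardization to water at 20 °C: s20,w = s_obs × (η/η_w) × (1 − v̄ρ_w)/(1 − v̄ρ). The sketch below plugs in the buffer density (1.019 g/ml), viscosity (0.010503 P) and a partial specific volume within the 0.7454-0.7466 ml/g range given above. The water constants are standard literature values at 20 °C, and the observed s value is hypothetical because none is reported in this section.

```python
# Reference properties of pure water at 20 °C.
WATER_DENSITY_G_PER_ML = 0.99823
WATER_VISCOSITY_POISE = 0.01002


def s20w(s_obs: float, vbar: float, rho: float, eta: float,
         rho_w: float = WATER_DENSITY_G_PER_ML,
         eta_w: float = WATER_VISCOSITY_POISE) -> float:
    """Standardize an observed sedimentation coefficient (measured at 20 °C) to water."""
    viscosity_term = eta / eta_w
    buoyancy_term = (1.0 - vbar * rho_w) / (1.0 - vbar * rho)
    return s_obs * viscosity_term * buoyancy_term


# Buffer values from the text; vbar taken from the reported range; s_obs is HYPOTHETICAL.
buffer_rho, buffer_eta, vbar = 1.019, 0.010503, 0.746
s_observed = 2.5  # Svedberg units, illustrative only
print(f"s20,w = {s20w(s_observed, vbar, buffer_rho, buffer_eta):.3f} S")
```

Because the buffer is denser and more viscous than water, the standardized value comes out slightly larger than the observed one.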

DNase I protection assay
For the DNase I protection assay, a DNA probe covering the 150-bp promoter region of the codh gene cluster (24) was amplified by PCR using 6-carboxyfluorescein (6-FAM)-labeled 1017 150 F and unlabeled 1017 150 R as primers (Supplementary Table S2). The labeled DNA probe (200 ng) was incubated with purified EnfR protein for 10 min at 80 °C in a 20 µl reaction mixture containing 1× binding buffer (20 mM Tris-HCl pH 7.5, 200 mM KCl, 1 mM EDTA and 5% glycerol). DNase I digestion of the protein-DNA complex followed previously described procedures (33). The resulting DNA fragments were precipitated with ethanol, eluted in nuclease-free water and analyzed using an ABI 3730xl DNA analyzer (Applied Biosystems, Foster City, CA) with Peak Scanner™ Software v1.0 (Applied Biosystems).
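Protected positions in a capillary-electrophoresis footprint are typically identified by comparing fragment peak areas with and without protein, after normalizing both traces to peaks outside the footprint. The sketch below illustrates that comparison; the peak areas, the reference positions and the 0.5 cutoff are hypothetical choices, and the code does not describe how Peak Scanner itself processes the data.

```python
from typing import Dict, Iterable, List


def protection_ratios(free: Dict[int, float], bound: Dict[int, float],
                      reference: Iterable[int]) -> Dict[int, float]:
    """Ratio of normalized peak areas (with protein / without protein) per position.

    Both traces are scaled so that the summed area of the reference (unprotected)
    positions is equal, correcting for differences in loading and digestion.
    """
    ref = list(reference)
    scale = sum(free[p] for p in ref) / sum(bound[p] for p in ref)
    return {p: bound.get(p, 0.0) * scale / area for p, area in free.items() if area > 0}


def protected_positions(ratios: Dict[int, float], cutoff: float = 0.5) -> List[int]:
    """Positions whose normalized peak area falls below `cutoff` of the free-DNA control."""
    return sorted(p for p, r in ratios.items() if r < cutoff)


# Hypothetical peak areas indexed by position relative to the reference site.
free_dna = {-60: 100.0, -44: 90.0, -30: 110.0, -21: 95.0, 0: 105.0, 10: 100.0}
with_enfr = {-60: 52.0, -44: 8.0, -30: 12.0, -21: 5.0, 0: 9.0, 10: 48.0}
ratios = protection_ratios(free_dna, with_enfr, reference=[-60, 10])
print(protected_positions(ratios))
```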

Molecular docking
An in silico model of EnfR in complex with the 36-bp target DNA was built using the HADDOCK v2.4 server. For molecular docking, we defined the α2-turn-α3 region (residues 24-51) of EnfR as active residues predicted to be involved in DNA binding; the first three N-terminal residues, whose conformations differ from those in the T55I mutant structure, were also selected as active residues in each monomer. For the DNA molecule, active residues were selected based on the sequence 5′-AATCTTTTTGTTTACATT-3′, which was determined as the minimal binding region in the DNase I protection assay.
Passive residues were automatically selected by the server. Among the generated models, the top-ranked model was selected to gain insight into the DNA-binding mode of EnfR.

Electrophoretic mobility shift assay (EMSA)
All protein mutants were generated by site-directed mutagenesis using the wild-type TON_1525 (EnfR) gene as a template (Cosmogenetech, Republic of Korea). Mutant proteins were prepared by the same procedure as described in Protein preparation. To analyze interactions between EnfR and the target DNA sequence, 5′-Cyanine 5 (Cy5)-labeled 36-bp probes (5 nM) were incubated with different protein concentrations. To verify EnfR binding to the promoter regions of differentially expressed genes (DEGs), DNA probes (10 nM) of the TON_1582, TON_0537 and TON_1563 genes were amplified by PCR using 5′-Cy5-labeled forward primers (1582 F, 0537/8 F and 1563 F) and unlabeled reverse primers (1582 R, 0537/8 R and 1563 R) (Supplementary Table S4). Proteins were incubated with the probes at 37 °C for 20 min in a buffer containing 10 mM Tris (pH 7.4), 200 mM KCl, 2 mM DTT and 0.2 mM EDTA. After incubation, the reaction mixtures were separated by electrophoresis on a 6% polyacrylamide gel. The gels were analyzed with a ChemiDoc MP imaging system (Bio-Rad). Band intensities were quantified with ImageJ and used to compare the DNA-binding affinities of the wild-type and the T55I mutant.
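Band intensities from EMSA gels are commonly converted into a fraction of probe bound and fitted to a binding isotherm so that affinities, such as those of the wild-type and T55I proteins, can be compared. A minimal sketch follows, assuming a single-site hyperbola θ = [P]/(Kd + [P]), which is only reasonable when protein is in large excess over the 5 nM probe. The grid-search fit and the synthetic titration are illustrative and are not the analysis performed in the study.

```python
import math
from typing import Sequence


def fraction_bound(bound_intensity: float, free_intensity: float) -> float:
    """Fraction of probe shifted, from background-corrected band intensities."""
    return bound_intensity / (bound_intensity + free_intensity)


def single_site(protein: float, kd: float) -> float:
    """Hyperbolic isotherm; assumes free protein ≈ total protein (protein in excess)."""
    return protein / (kd + protein)


def fit_kd(protein: Sequence[float], theta: Sequence[float],
           kd_min: float = 1e-2, kd_max: float = 1e5, steps: int = 4000) -> float:
    """Least-squares Kd by a log-spaced grid search (same units as `protein`)."""
    best_kd, best_sse = kd_min, math.inf
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        sse = sum((t - single_site(p, kd)) ** 2 for p, t in zip(protein, theta))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration (nM) generated from Kd = 50 nM, for illustration only.
conc = [5, 10, 25, 50, 100, 250, 500, 1000]
theta = [single_site(p, 50.0) for p in conc]
kd_fit = fit_kd(conc, theta)
print(f"fitted Kd ≈ {kd_fit:.1f} nM")
```

The ratio of fitted Kd values (mutant over wild-type) then expresses the fold change in affinity.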

Strains, culture conditions and analytical methods
T. onnurineus NA1 (KCTC 10859) was routinely cultured in modified medium 1 (MM1) supplemented with 1 bar of 100% CO (MM1 + CO) in serum bottles under anaerobic conditions at 80 °C (34). The MC11 mutant strain harboring the T55I substitution in EnfR was previously constructed (24) and was used in this study. For cultivation of the wild-type and MC11 strains in the bioreactor, 100% CO was supplied at a flow rate of 120 ml/min, and the pH was controlled at 6.1-6.2 using 5 N NaOH containing 3.5% NaCl. Cell growth was monitored by OD600, and H2 production was analyzed as previously described (35).

Transcriptome analysis
The wild-type and MC11 strains were grown to the exponential growth phase in MM1 + CO medium, and three independent cultures were used for RNA extraction. Total RNA was extracted from the cells using Trizol reagent (Invitrogen, Carlsbad, CA) as previously described (24). The quality and quantity of the RNA were assessed by RNA integrity number (RIN) (36) and RNA electropherograms produced with an Agilent 2100 Bioanalyzer (Palo Alto, USA) according to the manufacturer's instructions. Ten micrograms of RNA from samples with RIN values ranging from 8.4 to 9.7 for the wild-type and MC11 strains were used for further experiments. Sequencing library generation and sequencing on a HiSeq 2500 (Illumina, San Diego, USA) were performed by CJ Bioscience (Seoul, Republic of Korea). Quality-filtered reads were mapped against a reference genome sequence (BioProject accession number PRJNA59043) (https://www.ncbi.nlm.nih.gov/bioproject) by Gibiome (Seongnam, Republic of Korea). Relative transcript abundance was measured as fragments per kilobase per million mapped reads (FPKM) (37). RNA sequence data were submitted to the NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) under accession code GSE200806.
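FPKM normalizes a gene's fragment count by both its length and the sequencing depth: FPKM = fragments × 10⁹ / (gene length in bp × total mapped fragments). The sketch below implements that definition plus a log2 fold change between two strains. The counts, gene length and pseudocount of 1 are hypothetical, and the study's actual pipeline may differ, for instance in how replicates and statistics are handled.

```python
import math


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_mutant: float, fpkm_wild_type: float, pseudocount: float = 1.0) -> float:
    """log2((mutant + pc) / (wild-type + pc)); the pseudocount avoids division by zero."""
    return math.log2((fpkm_mutant + pseudocount) / (fpkm_wild_type + pseudocount))


# Hypothetical counts for one 1,200-bp gene in two libraries of different depth.
wt = fpkm(fragments=3_000, gene_length_bp=1_200, total_mapped_fragments=10_000_000)
mc11 = fpkm(fragments=9_000, gene_length_bp=1_200, total_mapped_fragments=12_000_000)
print(f"WT {wt:.1f}, MC11 {mc11:.1f}, log2FC {log2_fold_change(mc11, wt):.2f}")
```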

Western blotting analysis
Polyclonal antibodies were raised in rabbits immunized with each purified protein at Ab Frontier (Seoul, Republic of Korea) and verified by Western blotting. Western blotting analysis was conducted as described previously (13).

Isothermal titration calorimetry (ITC)
ITC experiments were conducted using a VP-ITC instrument from Malvern Panalytical (Malvern, UK) at 25 °C. The wild-type, the T55I mutant and the 36-bp target DNA were dissolved in 10 mM Tris-HCl buffer (pH 7.4) containing 200 mM KCl, 1 mM tris(2-carboxyethyl)phosphine (TCEP) and 0.2 mM EDTA. The solutions were degassed for 3 min before being loaded into the ITC instrument. The concentration of the wild-type and the T55I mutant in the cell was 25 µM, while the concentration of the 36-bp target DNA in the syringe was 380 µM. Titration experiments consisting of 25 injections in total were performed with continuous stirring at 307 rpm. The injection volume was 2 µl for the first injection, to minimize the effects of bubbles, and 11 µl for the remaining injections. The initial delay and the reference power were set to 1200 s and 10 µcal/s, respectively. The 36-bp target DNA solution was titrated into buffer alone to measure the heat of dilution. Binding isotherms were displayed after subtraction of the dilution heat and baseline correction, and were fitted with the one-set-of-sites binding model in MicroCal Origin 7.0 software.
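Once the one-set-of-sites fit returns an association constant Ka, a stoichiometry n and an enthalpy ΔH, the remaining thermodynamic parameters follow from ΔG = −RT ln Ka and −TΔS = ΔG − ΔH, and the Wiseman parameter c = n·Ka·[M]cell indicates whether the isotherm shape can define Ka reliably (a commonly quoted working range is about 10 to 500). The fitted values below are hypothetical; only the 25 °C temperature and the 25 µM cell concentration come from the text.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def itc_thermodynamics(ka_per_m: float, delta_h_kj: float, temp_k: float = 298.15) -> dict:
    """Derive Kd, ΔG and -TΔS (kJ/mol) from a fitted Ka (M^-1) and ΔH (kJ/mol)."""
    delta_g = -R * temp_k * math.log(ka_per_m) / 1000.0
    return {
        "Kd_M": 1.0 / ka_per_m,
        "dG_kJ_per_mol": delta_g,
        "minus_TdS_kJ_per_mol": delta_g - delta_h_kj,
    }


def wiseman_c(n_sites: float, ka_per_m: float, cell_conc_m: float) -> float:
    """Wiseman c-value: n * Ka * [macromolecule in cell]."""
    return n_sites * ka_per_m * cell_conc_m


# HYPOTHETICAL fit results; 25 °C and the 25 µM cell concentration are from the text.
fit = itc_thermodynamics(ka_per_m=1.0e6, delta_h_kj=-40.0)
c = wiseman_c(n_sites=1.0, ka_per_m=1.0e6, cell_conc_m=25e-6)
print(fit, f"c = {c:.0f}")
```

By the same relation, a ∼15-fold drop in affinity corresponds to ΔΔG = RT ln 15 ≈ 6.7 kJ/mol at 25 °C.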

Luciferase reporter assay
A luciferase reporter assay to monitor the specific binding of EnfR to the target DNA sequence was designed using E. coli as a heterologous bacterial host (38). To construct EnfR-producing plasmids, wild-type and mutant TON_1525 (EnfR) genes in the expression plasmids were subcloned into pJK1113, a plasmid containing the arabinose-inducible promoter PBAD (Supplementary Table S5) (39). The resulting plasmids were designated the pJH EnfR series. To construct synthetic promoters repressed by EnfR, we engineered two constitutive bacterial promoters (J23117 and J23110) (40) to carry the EnfR-binding site either immediately downstream of, or partially overlapping, the -10 box for bacterial RNAP (see 'The target DNA sequence and the implication of the N-terminal loop in DNA recognition' in the Results and Discussion section). DNA fragments containing these engineered sequences (Supplementary Table S6) were chemically synthesized and cloned into the plasmid pBBR lux, which contains a promoterless luxCDABE operon (Supplementary Table S5) (41). The resulting reporter plasmids were designated pJH road-blocking (EnfR binds downstream of the -10 box) or pJH steric hindrance (EnfR binds to the -10 box, thereby interfering with RNAP binding). Finally, to produce reporter strains, each pJH EnfR plasmid was co-transformed with pJH road-blocking or pJH steric hindrance into E. coli DH5α cells (Supplementary Table S5).
For the luciferase reporter assay, overnight cultures of the reporter strains were inoculated into fresh LB medium supplemented with appropriate antibiotics and L-arabinose for EnfR induction. When the cells reached exponential phase (OD600 of 0.5), 200 µl of the culture was transferred into a well of a Nunc™ 96-well white/clear-bottom plate (Thermo Fisher Scientific, MA). After further incubation at 37 °C for 3 h, OD600 and cellular luminescence were measured using a Spark™ microplate reader (Tecan, Switzerland). The relative luminescence unit (RLU) was derived by dividing cellular luminescence by OD600. The relative luminescence level was then calculated by normalizing the RLU of each sample to that of the EnfR-uninduced sample: relative luminescence level (%) = (RLU of the sample / RLU of the EnfR-uninduced sample) × 100.
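The normalization described above translates directly into code. The sketch below assumes background-subtracted luminescence and OD600 readings from the plate reader; the readings are invented, and averaging over replicate wells is left out.

```python
def rlu(luminescence: float, od600: float) -> float:
    """Relative luminescence unit: cellular luminescence divided by OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return luminescence / od600


def relative_luminescence_level(sample_lum: float, sample_od: float,
                                uninduced_lum: float, uninduced_od: float) -> float:
    """Percent of the EnfR-uninduced control: RLU(sample) / RLU(uninduced) × 100."""
    return rlu(sample_lum, sample_od) / rlu(uninduced_lum, uninduced_od) * 100.0


# Invented readings for an induced and an uninduced culture.
level = relative_luminescence_level(sample_lum=42_000, sample_od=0.60,
                                    uninduced_lum=100_000, uninduced_od=0.50)
print(f"relative luminescence level ≈ {level:.1f}%")
```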

TON_1525 is a homodimer
Transcription regulators are commonly homo-oligomers; we therefore examined the oligomeric state of TON_1525 by sedimentation velocity analytical ultracentrifugation (SV-AUC) (Figure 1). The major species, accounting for 99.98% of the total protein peak area, had an estimated molecular weight of 33,800 Da, approximately twice the expected mass of the His6-tagged monomer. We therefore conclude that TON_1525 exists as a homodimer in solution.
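The step from an AUC molecular weight to an oligomeric state is a ratio against the expected monomer mass. In the sketch below, the monomer mass is estimated from residue count with an average residue mass of about 110 Da; both that average and the eight tag-derived residues are rough assumptions rather than values from the text, so the result is only an approximate check.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rough average residue mass; an assumption


def estimated_monomer_mass(n_residues: int) -> float:
    """Crude monomer mass estimated from residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA


def oligomeric_state(observed_mw: float, monomer_mw: float) -> tuple:
    """Return (nearest integer oligomer number, raw mass ratio)."""
    ratio = observed_mw / monomer_mw
    return round(ratio), ratio


# 146 native residues plus an ASSUMED 8 tag-derived residues (linker + His6).
monomer = estimated_monomer_mass(146 + 8)
n, ratio = oligomeric_state(33_800, monomer)
print(f"monomer ≈ {monomer:.0f} Da, ratio {ratio:.2f} -> {n}-mer")
```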

TON_1525 adopts an 'eighth note' symbol-like fold that is assembled into an X-shaped dimer
We determined the 2.8 Å resolution crystal structure of wild-type TON_1525 (Supplementary Table S1). There are two monomers in the asymmetric unit of wild-type TON_1525 crystals, which is compatible with our finding that TON_1525 is a dimer in solution. The two monomers are virtually identical; their structures superpose with a root mean square (r.m.s.) deviation of 0.490 Å for all Cα atoms. The monomeric structure of TON_1525 resembles an 'eighth note (♪)' symbol in that it can be divided into three parts: a flag, a straight stem and a note-head (Figure 2A) (hereafter referred to as the 'eighth note' fold regulator, EnfR). The flag part contains the two N-terminal helices α1 and α2 (residues 1-34), the stem part is a long α3 helix (residues 35-62), and the note-head corresponds to the C-terminal α+β domain (residues 63-146), in which a three-stranded antiparallel β-sheet (↑β1-↓β2-↑β3) is sandwiched between the stem helix (α3) on one face and two C-terminal helices (α4 and α5) on the other (Figure 2A). The C-terminal α+β note-head domain has a βA-αA-αB-βB-βC topology (the alphabetic subscripts indicate the order of the secondary structural elements represented by Greek letters) (Supplementary Figure S1). This domain is unique in that most proteins with a β-α-α-β-β topology in the α+β class defined by the Structural Classification of Proteins (SCOP) (42) show a βA-αA-αB-βC-βB topology, in which the order of the second and third β-strands is reversed compared with EnfR.
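The 0.490 Å figure is a Cα r.m.s. deviation after optimal superposition. A minimal sketch of the standard Kabsch procedure (centering, SVD of the covariance matrix, reflection correction) is shown below using NumPy. It assumes the two coordinate arrays are already paired atom-for-atom, whereas structure-comparison programs also perform the residue correspondence; the test coordinates are random.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between paired N×3 coordinate sets after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate P onto Q and compute the RMSD over all paired atoms.
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Random "Cα" coordinates, rotated 30° about z and translated: RMSD should be ~0.
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(50, 3))
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = coords @ rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))
```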
Dimerization of EnfR is mediated mainly by the flag and stem helices, with a contact area of ∼2300 Å², covering one-third of the molecular surface of a monomer. The long stem helices from the two monomers form antiparallel X-shaped cross interactions (Figure 2B). Ala52 and Ile56 from one monomer form a hydrophobic core with the same residues from the other monomer at the central intersection, where the molecular 2-fold axis is located (Figure 2C). Three hydrophobic residues (Ile58, Trp59 and Ile62) at one end of the stem helix in one monomer make extensive contacts with a hydrophobic patch lined by Leu7, Ile12, Leu15, Arg16 (via the three-carbon aliphatic portion of its side chain), Leu46 and Ile49 at the periphery of the core (Figure 2B). To verify the importance of these hydrophobic interactions for the dimeric conformation, an EnfR mutant harboring three point mutations (A52W, I58G and I62G) was designed. Trp was introduced at position 52 in the hydrophobic core to create steric clashes, and the two Ile → Gly mutations were intended to reduce contacts with the hydrophobic patch. As expected, SV-AUC revealed that 72.3% of the total protein peak area corresponded to a monomeric species (Supplementary Figure S2). In addition to hydrophobic contacts, electrostatic interactions contribute in part to dimerization. The acidic EEFDE segment (residues 138-142) of the note-head α+β domain in one monomer interacts with the basic side of α1 harboring Lys13, Arg16 and Arg20 in the other monomer (Figure 2C).
Mycobacterium tuberculosis RslA-bound RNA polymerase sigma factor σL is the closest structural homolog of monomeric EnfR identified by the Dali server (Supplementary Table S7) (43). However, σL shows only low structural similarity, limited to the α2-turn-α3 HTH motif of EnfR (Dali Z-score of 8.5, r.m.s. deviation of 1.8 Å) (Supplementary Figure S3 and Table S7). In addition, the dimerization mode of EnfR is not observed in any structural homolog with a Dali Z-score > 6. The unique tertiary and quaternary structures of EnfR, resembling an eighth note, clearly indicate that the Tfx family is structurally distinct from other transcription factor families.
In addition to DNA-binding domains (DBDs), transcription factors generally have extra domains that mediate oligomerization and/or recognize molecular signals such as small molecules or proteins. For example, TrmB from Thermococcales and Halobacterium salinarum contains an N-terminal DBD and an effector-binding domain (EBD). Sugars such as sucrose and maltotriose bind to the EBD and induce a conformational change in TrmB that controls the transcription of genes related to sugar metabolism (44, 45). In EnfR, the C-terminal α+β note-head domain corresponds to the extra domain. To gain insight into the functional role of the note-head domain, we searched for its structural homologs using the Dali server (Supplementary Table S8). A part of E. coli 4-amino-4-deoxychorismate lyase (PDB code: 1I2L) was identified as the closest structural homolog, with a Z-score of 6.1 and an r.m.s. deviation of 3.2 Å for 73 matching Cα atoms. Structural inspection revealed that the homologous region in the lyase is involved in dimerization, suggesting that the note-head domain of EnfR is unlikely to function as an EBD.
The absence of an EBD, which would sense molecular signals and induce structural alteration, suggests that an antirepressor/repressor (Ant/Rep) system might regulate the DNA-binding activity of EnfR. Ant is generally known to bind to the DNA-recognition sites of Rep to prevent formation of the Rep-operator complex (46). There are also novel types of Ant that bind not to DNA-binding sites but to other regions to inactivate Rep (47). In these ways, Ant disrupts the Rep-operator complex, allowing the transcriptional machinery to initiate gene expression. Unfortunately, no Ant of EnfR is known. Consequently, how the negative regulation of the codh gene cluster by EnfR is relieved remains a challenge for future work.

The target DNA sequence and the implication of the N-terminal loop in DNA recognition
To elucidate the DNA-recognition mode of EnfR, we first mapped the EnfR-binding sequence in the codh promoter region using the DNase I protection assay. In a previous study, purified EnfR was shown to bind specifically to a 150-bp DNA probe containing the promoter region of the codh gene cluster (24). When the same DNA probe was incubated with purified EnfR, a region extending from -44 to +3 (centered at -21 relative to the translation start site of TON_1017, a gene expressed polycistronically with the codh gene TON_1018) (13, 48) was protected from DNase I digestion (Figure 3A). Inspection of this sequence revealed a 36-bp pseudo-palindromic inverted repeat (5′-cCaTCtTAaAATcTttttgtttAcATTaTAcGAgGt-3′), within which positions 10-27 (AATCTTTTTGTTTACATT) form the 18-bp minimal binding region. Given the dimeric structure of EnfR, this pseudo-palindromic sequence is highly likely to be the target site for EnfR binding.
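The inverted-repeat symmetry of the 36-bp target can be checked programmatically: position i pairs with position 37 − i, and a pair is palindromic when the two bases are Watson-Crick complements. The sketch below applies that check to the sequence above and extracts the 18-bp minimal region. The matched positions coincide with the uppercase letters of the mixed-case sequence, which suggests (an inference, not stated in the text) that the case pattern marks symmetric positions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# 36-bp EnfR target from the codh promoter, with the original mixed case preserved.
TARGET = "cCaTCtTAaAATcTttttgtttAcATTaTAcGAgGt"


def palindromic_positions(seq: str) -> list:
    """1-based positions i (first half) whose base complements the base at len + 1 - i."""
    s = seq.upper()
    n = len(s)
    return [i + 1 for i in range(n // 2) if COMPLEMENT[s[i]] == s[n - 1 - i]]


def minimal_region(seq: str, start: int, end: int) -> str:
    """Extract a 1-based, inclusive sub-sequence in uppercase."""
    return seq[start - 1:end].upper()


matches = palindromic_positions(TARGET)
print(len(TARGET), matches)
print(minimal_region(TARGET, 10, 27))
```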
Next, we built an in silico model of the EnfR/DNA complex to reveal the DNA-recognition mode of EnfR in structural terms. Among the structural homologs identified by the Dali server (Supplementary Table S7), there were several RNAP σ70-like factors, which are necessary for transcription initiation in bacteria. Consistently, Pfam analysis indicated that the α2-turn-α3 HTH motif (residues 7-49) in EnfR corresponds to the region 4 helix-turn-helix (r4-HTH) motif of RNAP σ70 (Figure 3B). The α2-turn-α3 HTH motif of EnfR superposes well onto the r4-HTH motif of the Thermus aquaticus σA fragment, with an r.m.s. deviation of 0.587 Å for Cα atoms of residues 7-49 (Supplementary Figure S4). Because the r4-HTH motif is responsible for DNA recognition (49), the α2-turn-α3 HTH motif is likely the DNA-binding site in EnfR. In the dimeric structure, the two α2-turn-α3 HTH motifs are ∼35.8 Å apart (measured between the Cα atoms of Glu42). Since the spacing between two successive major grooves is ∼36 Å in B-form DNA, the two HTH motifs appear suitably arranged to recognize the target DNA sequence. On this basis, the complex model was generated using HADDOCK v2.4 (see the Materials and Methods section).
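The geometric argument above compares the Glu42-Glu42 Cα distance with the length of one B-DNA helical turn, roughly 10.5 bp × 3.4 Å ≈ 35.7 Å, so that two HTH motifs one turn apart can engage successive major grooves on the same face of the DNA. The sketch below computes both quantities; the Cα coordinates are placeholders chosen to be 35.8 Å apart, not values read from the model.

```python
import math

BP_PER_TURN_B_DNA = 10.5     # approximate helical repeat of B-DNA
RISE_PER_BP_ANGSTROM = 3.4   # approximate axial rise per base pair


def helical_repeat_length() -> float:
    """Approximate axial length of one B-DNA turn in Å."""
    return BP_PER_TURN_B_DNA * RISE_PER_BP_ANGSTROM


def ca_distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two Cα positions in Å."""
    return math.dist(a, b)


# PLACEHOLDER coordinates; read the real Glu42 Cα positions from the model file.
glu42_chain_a = (0.0, 0.0, 0.0)
glu42_chain_b = (35.8, 0.0, 0.0)
spacing = ca_distance(glu42_chain_a, glu42_chain_b)
print(f"HTH spacing {spacing:.1f} Å vs one B-DNA turn {helical_repeat_length():.1f} Å")
```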
The resulting complex model is reasonable in terms of electrostatic complementarity and favorable interactions between EnfR and DNA (Figure 3C and D). As shown in Figure 3C, the positive electrostatic potential of the DNA-binding interface of EnfR is well suited to accommodate the negatively charged sugar-phosphate backbone of DNA. The N-terminal loop (residues 1-6, hereafter referred to as the N-6 region) of each monomer is positioned to contact the minor groove (Figure 3C). In addition, the N-terminal part of the stem helix (α3) in the HTH motifs fits into the major groove, with three arginine residues (Arg35, Arg43 and Arg44) interacting with DNA. Arg43 and Arg44 extend their side chains toward the bases in the major groove, and Arg35 contacts the phosphate backbone of DNA (Figure 3D). Notably, the guanidino group of arginine is well known to interact with both the bases and the backbone of DNA in protein-DNA complexes (50).
To verify the in silico complex model, we made four mutant proteins: R35A, R43A, R44A and a mutant with the N-terminal six residues deleted (ΔN1-6). The three arginine residues in the HTH motif (Arg35, Arg43 and Arg44) and the N-terminal six residues are directly involved in DNA binding in the complex model. Point mutations of the solvent-exposed arginine residues and deletion of the N-6 region did not affect the dimeric nature of EnfR, as verified by AUC experiments in which all the mutants were shown to be dimers (Supplementary Figure S5). The DNA-binding activities of the four mutant proteins (R35A, R43A, R44A and ΔN1-6) were analyzed by electrophoretic mobility shift assay (EMSA) with the 36-bp target DNA sequence. The DNA-binding activities of the R35A and R43A mutants were drastically decreased, and the R44A mutant appeared to lose DNA-binding activity altogether (Figure 4). These observations, together with the DNA-binding defect of the ΔN1-6 mutant (Figure 4), demonstrate that the N-6 region and the HTH motif take part in DNA recognition, as suggested by the in silico complex model.
For in vivo validation of the in vitro mutational studies, we developed two reporter systems (Figure 5A and B) (40):

(i) a road-blocking system, in which the target DNA sequence is located between a bacterial constitutive promoter (J23117) and the transcriptional start site of the luciferase (lux) operon;

(ii) a steric hindrance system, in which the target sequence partially overlaps another bacterial constitutive promoter (J20110).

In vivo, binding of EnfR to the target sequence would interfere with the progression of bacterial RNAP in the road-blocking system and with its binding in the steric hindrance system. Indeed, when EnfR was induced with arabinose (Figure 5A and B), luminescence levels were significantly reduced in both systems in an arabinose concentration-dependent manner. When the wild-type EnfR was induced with 0.01% (v/v) arabinose in either system, the relative luminescence level was about 35% of that under the uninduced condition. When R35A, R43A, R44A or ΔN1-6 was induced, the relative luminescence levels ranged from 55 to 85% (Figure 5C and D). This indicates that all the tested mutants are defective in DNA binding, as shown by EMSA, and that the three arginine residues and the N-6 region contribute to DNA binding.
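The relative luminescence values quoted above are ratios to the uninduced control. The minimal sketch below shows that normalization. It assumes replicate luminescence readings that have already been normalized as the assay requires, for example to cell density, which the text does not specify. The helper names and replicate values are hypothetical. The second helper converts a relative level into fold repression. On that scale, the reported ∼35% for wild-type EnfR corresponds to roughly 2.9-fold repression, and the 55-85% range for the mutants to roughly 1.2- to 1.8-fold.

```python
from statistics import mean


def relative_luminescence(induced: list[float], uninduced: list[float]) -> float:
    """Mean induced signal as a percentage of the mean uninduced signal."""
    baseline = mean(uninduced)
    if baseline <= 0:
        raise ValueError("uninduced signal must be positive")
    return 100.0 * mean(induced) / baseline


def fold_repression(relative_percent: float) -> float:
    """Fold repression implied by a relative level (100% = no repression)."""
    if relative_percent <= 0:
        raise ValueError("relative level must be positive")
    return 100.0 / relative_percent


# Hypothetical triplicate readings (arbitrary units).
uninduced = [1000.0, 980.0, 1020.0]
induced = [350.0, 340.0, 360.0]
level = relative_luminescence(induced, uninduced)
print(f"relative luminescence: {level:.0f}%")
print(f"fold repression: {fold_repression(level):.1f}")
```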
Furthermore, to examine the effect of base substitutions on EnfR binding, four base pairs with palindromic relationships in the target DNA sequence were changed: A10-T27 → C10-G27, A11-T26 → C11-G26, T12-A25 → G12-C25 and T14-A23 → G14-C23 (Supplementary Figure S6A and Table S3). The influence of each substitution on EnfR binding was then examined by EMSA. Among the four, the T14G-A23C substitution caused the most pronounced reduction in EnfR binding (Supplementary Figure S6B), which was confirmed by the luciferase reporter assay. With the T14G-A23C-substituted promoter, the relative luminescence level remained at ∼85% and ∼65% of the uninduced control after the wild-type EnfR was induced with 0.01% and 0.1% (v/v) arabinose, respectively (Figure 5E). In contrast, with the original target sequence, the relative luminescence level decreased to about 39% and 22% at 0.01% and 0.1% (v/v) arabinose, respectively (Figure 5B). Thus, the T14G-A23C substitution reduced the repression activity of EnfR. Collectively, these results indicate that EnfR binds more weakly to the target DNA sequence harboring the T14G-A23C substitution.
The road-blocking system was not used in this base-substitution experiment because the T14G-A23C substitution abolished expression of the lux operon. This was probably due to the formation of a stable secondary structure in the 5′ untranslated region of the transcript.

Nucleic Acids Research, 2023, Vol. 51, No. 18

Figure 5 (partial caption). … and their repressor function was examined via luciferase reporter gene assay using the road-blocking (C) or steric hindrance (D) system. (E) EnfR-mediated repression was significantly dampened if the EnfR-binding sequence in the promoter was mutated (T14G-A23C). WT indicates the wild-type EnfR. For A, B and E, a schematic diagram of each promoter is shown above the graph. Statistical significance was determined by ordinary one-way ANOVA with multiple comparisons (A, B and E) or by multiple t-tests (C and D) (ns, not significant; ****P < 0.0001; **P < 0.01; *P < 0.05). EnfRB, EnfR-binding site; EnfRB mt, EnfR-binding site with mutation.
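The four substituted base pairs follow a simple numbering rule. In each pair (A10-T27, A11-T26, T12-A25, T14-A23) the two positions sum to 37, so position i of the 36-bp site is mirrored by position 37 − i. Each substitution replaces both bases while keeping them complementary. The Python sketch below makes this explicit. The 36-bp sequence in the example is a hypothetical, perfectly palindromic stand-in; the actual EnfR site is given in the Supplementary Data. The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mirror_position(i: int, length: int = 36) -> int:
    """1-based partner of position i in an inverted repeat of the given length."""
    return length + 1 - i


def complementary_pairs(seq: str) -> list[tuple[int, int]]:
    """1-based (i, mirror) pairs whose bases are complementary."""
    n = len(seq)
    return [
        (i, mirror_position(i, n))
        for i in range(1, n // 2 + 1)
        if seq[mirror_position(i, n) - 1] == seq[i - 1].translate(COMPLEMENT)
    ]


def substitute_pair(seq: str, i: int, new_base: str) -> str:
    """Replace base i with new_base and its mirror base with the complement."""
    j = mirror_position(i, len(seq))
    bases = list(seq)
    bases[i - 1] = new_base
    bases[j - 1] = new_base.translate(COMPLEMENT)
    return "".join(bases)


# Hypothetical perfect inverted repeat with A10, A11, T12 and T14 as in the text.
half = "GCGCGCGCGAATGTCCGG"
site = half + reverse_complement(half)
mutant = substitute_pair(site, 14, "G")  # analogous to the T14G-A23C substitution
print([mirror_position(i) for i in (10, 11, 12, 14)])
print(mutant[13], mutant[22], len(complementary_pairs(mutant)))
```

Counting complementary mirror pairs gives a crude palindrome score. Applied to the real pseudo-palindromic site, it would return fewer than 18 pairs.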

Structural explanation for the defect of the T55I mutant in DNA binding
The transcription level of genes in the codh cluster was reported to be affected by the T55I mutation in EnfR (24). To examine the influence of the T55I mutation on the DNA-binding activity of EnfR, EMSA was performed, titrating the Cy5-labelled target DNA with increasing concentrations of the wild-type or T55I mutant protein. In both cases, a single retarded band was observed, and its intensity increased in a concentration-dependent manner (Figure 6B). This indicates that only one EnfR dimer binds to the target sequence. Remarkably, the T55I mutation appeared to impair DNA binding: the retarded band was weaker with the T55I mutant than with the wild-type (Figure 6). Consistently, when the T55I mutant was induced in the luciferase reporter assay, the luminescence level was higher than with the wild-type (Figure 5C and D). To quantify the negative effect of the T55I mutation, we determined the dissociation constants (KD) of the wild-type and the T55I mutant for the target DNA sequence by isothermal titration calorimetry (ITC). The KD values of the wild-type and the T55I mutant were 61.8 ± 20.5 and 948.0 ± 130.0 nM, respectively. The DNA-binding affinity of the mutant is therefore much lower (∼15-fold) than that of the wild-type. The stoichiometry parameters (N) were 0.94 ± 0.01 for the wild-type and 0.80 ± 0.01 for the T55I mutant, which confirmed that one EnfR dimer binds to the 36-bp target DNA sequence (Figure 7).
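The ∼15-fold figure follows directly from the two ITC dissociation constants. The short Python sketch below computes the ratio together with a first-order propagated uncertainty. It assumes that the ± values are independent standard errors or deviations, which the text does not state explicitly, so the uncertainty should be read only as a rough estimate.

```python
import math


def ratio_with_error(a: float, sa: float, b: float, sb: float) -> tuple[float, float]:
    """Return a / b and its first-order propagated uncertainty.

    Assumes the uncertainties sa and sb are independent (uncorrelated).
    """
    r = a / b
    return r, abs(r) * math.sqrt((sa / a) ** 2 + (sb / b) ** 2)


kd_wt, sd_wt = 61.8, 20.5          # nM, wild-type EnfR (ITC values from the text)
kd_t55i, sd_t55i = 948.0, 130.0    # nM, T55I mutant (ITC values from the text)

fold, fold_err = ratio_with_error(kd_t55i, sd_t55i, kd_wt, sd_wt)
print(f"KD(T55I) / KD(WT) = {fold:.1f} ± {fold_err:.1f}")
```

With these inputs the ratio is about 15.3. The propagated uncertainty (about 5.5) is dominated by the larger relative error of the wild-type KD.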
To gain structural insight into the reduced DNA-binding activity of the T55I mutant, we determined its crystal structure at 3.4 Å resolution (Supplementary Table S1). The overall structures of the monomeric and dimeric T55I mutant are virtually identical to those of the wild-type, with r.m.s. deviations of 0.827 and 0.941 Å, respectively, for all Cα atoms of the monomers and dimers (Figure 8). Despite this overall resemblance, the T55I mutant differs markedly in the orientation and conformation of the N-6 region, which is engaged in DNA recognition. In the wild-type dimer, the side-chain hydroxyl group of Thr55 in one monomer is hydrogen-bonded to the backbone NH group of Phe6 in the other monomer. This interaction directs the N-6 region to extend out of the main body (Figure 8). The benzyl side chains of Phe6 in the N-6 regions of the two monomers form a stacking interaction that stabilizes this conformation. In contrast, in the T55I mutant, the N-6 region runs in a different direction because the isoleucine at position 55 cannot form this hydrogen bond (Figure 8). In this conformation, the two phenylalanine residues cannot form stacking interactions.
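The r.m.s. deviations above are computed over Cα atoms after the two structures are superposed. The text does not name the software used. As an illustration only, the sketch below implements the standard Kabsch superposition with NumPy. It assumes two (N, 3) arrays of Cα coordinates already matched residue by residue, and the function and variable names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = P0.T @ Q0
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Usage: a rotated and translated copy should superpose with ~0 Å RMSD.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(100, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(f"RMSD after superposition: {kabsch_rmsd(P, Q):.3f} Å")
```

The determinant check matters: without it, a mirror-image fit could report an artificially low deviation.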
To elucidate the structural determinants of the dramatic effect of the T55I mutation on DNA-binding activity, we built a T55I mutant/DNA complex model. We did so by replacing the wild-type structure with the T55I mutant structure in the in silico EnfR/DNA complex model. In the resulting model, the N-6 region of the mutant clearly clashes with DNA (Figure 6A). We therefore conclude that the conformation of the N-6 region induced by the T55I mutation is incompatible with DNA binding, which accounts for the reduced DNA-binding activity of the mutant.
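A steric clash of this kind is typically identified by finding protein and DNA atoms that lie closer than a plausible contact distance. The text does not describe how the clash was assessed. The following Python sketch only illustrates the idea with a brute-force distance check. The 2.5 Å heavy-atom cutoff, the atom labels and the coordinates are all hypothetical. Dedicated modelling tools often use atom-type-specific van der Waals radii instead of a single cutoff.

```python
import math
from typing import Iterable, NamedTuple


class Atom(NamedTuple):
    label: str
    x: float
    y: float
    z: float


def find_clashes(protein: Iterable[Atom], dna: Iterable[Atom],
                 cutoff: float = 2.5) -> list[tuple[str, str, float]]:
    """Return (protein_atom, dna_atom, distance) for pairs closer than cutoff (Å).

    The 2.5 Å default is a hypothetical threshold chosen for illustration.
    """
    dna_atoms = list(dna)
    clashes = []
    for a in protein:
        for b in dna_atoms:
            d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            if d < cutoff:
                clashes.append((a.label, b.label, d))
    return clashes


# Hypothetical coordinates: one protein atom overlaps a DNA atom, one does not.
protein = [Atom("prot_atom_1", 0.0, 0.0, 0.0), Atom("prot_atom_2", 10.0, 0.0, 0.0)]
dna = [Atom("dna_atom_1", 1.5, 0.0, 0.0)]
for p_label, d_label, dist in find_clashes(protein, dna):
    print(f"{p_label} - {d_label}: {dist:.2f} Å")
```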

EnfR is a repressor regulating the expression of the codh gene cluster
The archaeal core promoter region contains two consensus sequences: the A/T-rich TATA box and the purine-rich B recognition element (BRE) (25). Transcription factor B (TFB) and TATA-binding protein (TBP) bind to the BRE and the TATA box, respectively. They then recruit RNAP to form the pre-initiation complex (PIC) and start transcription (51). The 36-bp target DNA sequence identified by the DNase I protection assay contains a TATA box-like sequence, TTATA, located 15-19 bp upstream of the translation start site of the codh gene cluster. In other words, EnfR binds to the TATA box-like region. Notably, transcriptional repressors in archaea usually bind to sequences downstream of the BRE, including TATA-box regions, to block PIC formation (52). Several lines of evidence therefore point the same way: the sequence analysis of the codh promoter region, the reduced affinity of the T55I mutant for the target DNA sequence, and the increased codh expression in the T55I mutant strain (24). Together, they suggest that EnfR acts as a transcriptional repressor that controls the expression level of the codh gene cluster.
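The upstream position of the TATA box-like element can be located with a simple string scan. In the sketch below, the input is the sequence immediately upstream of the translation start site, so the base at its 3′ end is 1 bp upstream. The example sequence is hypothetical, constructed only so that TTATA lies 15-19 bp upstream as described for the codh promoter. The real promoter sequence is not reproduced here.

```python
def motif_upstream_ranges(upstream: str, motif: str) -> list[tuple[int, int]]:
    """Return (nearest, farthest) distances in bp upstream of the start site
    for every (possibly overlapping) occurrence of motif.

    `upstream` is written 5'->3' and ends with the base immediately
    preceding the translation start codon (distance 1).
    """
    n, m = len(upstream), len(motif)
    hits = []
    pos = upstream.find(motif)
    while pos != -1:
        farthest = n - pos          # distance of the motif's first (5') base
        nearest = farthest - m + 1  # distance of the motif's last (3') base
        hits.append((nearest, farthest))
        pos = upstream.find(motif, pos + 1)
    return hits


# Hypothetical 23-bp upstream sequence with TTATA placed 15-19 bp upstream.
upstream = "GGGG" + "TTATA" + "C" * 14
print(motif_upstream_ranges(upstream, "TTATA"))
```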

Changes in diverse gene expression in the MC11 strain promote cell growth and H2 production under carboxydotrophic conditions
According to our previous reports (24, 35), the T55I mutation in EnfR (156T strain) and the deletion of EnfR (Δ1525 strain) significantly increased CO oxidation activity and the H2 production rate, along with increased expression of the codh gene cluster. We have also shown that the Δ1525 deletion changed the expression of various other genes (35). To further explore the inherent role of EnfR, we performed comparative transcriptome analyses between the wild-type and the MC11 strain harboring the T55I mutation. Sequencing reads were mapped to 1914 protein-coding genes. Of these, 101 (5.3%) were significantly (≥2-fold) up-regulated and 77 (4.0%) down-regulated in the MC11 strain (Supplementary Tables S9 and S10). The expression of the gene encoding EnfR (TON_1525) also increased 2.9-fold, suggesting possible self-regulation. The differentially expressed genes (DEGs) were assigned to multiple archaeal clusters of orthologous genes (arCOG) categories (Figure 9A).
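A fold-change cutoff such as the ≥2-fold criterion used here is symmetric on a ratio scale: a gene counts as down-regulated when its MC11/wild-type ratio is ≤0.5. The Python sketch below applies only that cutoff; the statistical test behind "significantly" is not described in this section and is not reproduced. The example uses fold changes quoted in the text: 2.9-fold for TON_1525, 1.4-fold for frhA and a 21.5-fold decrease for mfh2. It also reproduces the percentages of up- and down-regulated genes among the 1914 mapped genes. The helper names are hypothetical.

```python
def classify_degs(ratios: dict[str, float],
                  cutoff: float = 2.0) -> tuple[set[str], set[str]]:
    """Split genes into up- and down-regulated sets by a fold-change cutoff.

    `ratios` maps gene -> expression ratio (MC11 / wild-type). Only the
    fold-change criterion is applied; no significance test is included.
    """
    up = {g for g, r in ratios.items() if r >= cutoff}
    down = {g for g, r in ratios.items() if r <= 1.0 / cutoff}
    return up, down


def percent(count: int, total: int) -> float:
    return 100.0 * count / total


ratios = {"TON_1525": 2.9, "frhA": 1.4, "mfh2": 1 / 21.5}
up, down = classify_degs(ratios)
print(sorted(up), sorted(down))  # frhA (1.4-fold) falls below the cutoff
print(f"{percent(101, 1914):.1f}% up, {percent(77, 1914):.1f}% down")
```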
Groups P (inorganic ion transport and metabolism) and C (energy production and conversion) had high proportions of up-regulated DEGs (18.7 and 11.3%, respectively). On the other hand, groups C (energy production and conversion), I (lipid transport and metabolism) and V (defense mechanisms) showed high proportions of down-regulated DEGs (16.7, 11.8 and 10%, respectively). Notably, group C showed a mixed pattern, with high proportions of both up- and down-regulated DEGs. In hyperthermophilic archaea such as Pyrococcus and Thermococcus, hydrogenases play an important role in energy metabolism by generating an electrochemical ion gradient coupled to H2 production, or in H2 metabolism by disposing of reducing equivalents (34, 53, 54). We therefore analyzed the expression changes of the genes encoding them in more detail. Transcriptome analyses showed that many hydrogenase genes and related dehydrogenase genes were up-regulated in the MC11 strain (Figure 9B):

- frhA, encoding the Frh hydrogenase α subunit (1.4-fold);
- mbh, encoding membrane-bound hydrogenase (2.2-fold);
- mch, encoding membrane-bound carbon monoxide-dependent hydrogenase (2.3-fold);
- sulf1, encoding soluble hydrogenase Sulf1 (1.2-fold);
- codh, encoding carbon monoxide dehydrogenase (2.4-fold).

In contrast, the expression level of the sulf2 gene encoding soluble hydrogenase Sulf2 decreased to 0.7-fold (data not shown), and mfh2, encoding membrane-bound formate-dependent hydrogenase, was significantly down-regulated (21.5-fold) in the MC11 strain (Figure 9B). Expression changes at the transcriptional level were consistent with the protein levels determined by Western blotting (Figure 9C).

Figure 9 (partial caption). … The number indicates the proportion of genes that show a significant (≥2-fold) increase (filled bars) or decrease (empty bars) in each group. The one-letter codes for arCOG categories are as follows: P, Inorganic ion transport and metabolism; C, Energy production and conversion; U, Intracellular trafficking, secretion and vesicular transport; S, Function unknown; R, General function prediction only; Q, Secondary metabolite biosynthesis, transport and catabolism; E, Amino acid transport and metabolism; H, Coenzyme transport and metabolism; L, Replication, recombination and repair; T, Signal transduction mechanisms; J, Translation, ribosomal structure and biogenesis; G, Carbohydrate transport and metabolism; O, Posttranslational modification, protein turnover and chaperones; N, Cell motility; K, Transcription; F, Nucleotide transport and metabolism; V, Defense mechanisms; M, Cell wall/membrane/envelope biogenesis; I, Lipid transport and metabolism. (B, C) Expression profile of (de)hydrogenases from transcriptome analysis (B) and Western blotting analysis (C). The fold change represents the ratio of the expression level of each gene, determined from triplicate analyses, between the wild-type and the MC11 strain. (D) Cell growth and H2 production of the wild-type (closed circles) and MC11 (open circles) strains with 100% CO. The data for the wild-type strain were adapted from our previous report (35). Error bars indicate the standard deviations of three independent cultivations of the MC11 strain in this study.
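The per-category percentages in Figure 9A can be computed as the share of genes in each arCOG category that pass the fold-change cutoff. The Python sketch below assumes that the denominator is the number of genes assigned to each category and that every gene carries a single category. Both are simplifying assumptions, and the gene identifiers and category assignments in the example are hypothetical.

```python
from collections import Counter


def category_proportions(gene_category: dict[str, str],
                         degs: set[str]) -> dict[str, float]:
    """Percentage of genes in each arCOG category that are in `degs`."""
    totals = Counter(gene_category.values())
    hits = Counter(gene_category[g] for g in degs if g in gene_category)
    return {cat: 100.0 * hits[cat] / n for cat, n in sorted(totals.items())}


# Hypothetical assignments: two category-C genes and three category-P genes.
gene_category = {"g1": "C", "g2": "C", "g3": "P", "g4": "P", "g5": "P"}
up_regulated = {"g1", "g3"}
for cat, pct in category_proportions(gene_category, up_regulated).items():
    print(f"{cat}: {pct:.1f}%")
```

Running the function once with the up-regulated set and once with the down-regulated set gives the two bar series (filled and empty) for each category.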
Previously, a resting cell assay revealed enhanced CO oxidation activity of the MC11 strain (24). Given the up-regulation of codh and mch expression, we expected carboxydotrophic growth of the MC11 strain to be improved. Indeed, bioreactor experiments showed significantly enhanced cell growth and H2 production by the MC11 strain (Figure 9D). The maximum cell density and H2 production rate increased 2.5- and 2.8-fold, respectively, compared with those of the wild-type strain.
In this study, whole-transcriptome analysis revealed the potential of EnfR to control the expression of a broad array of genes at the transcriptional level. The sulfur-responsive regulator SurR and the Thermococcales glycolytic regulator Tgr/TrmBL1 (55, 56) are well-known global regulators in Thermococcales species. SurR controls diverse genes associated with energy conservation, and Tgr/TrmBL1 controls the central metabolic pathways of glycolysis and gluconeogenesis (56, 57). It is not yet clear whether EnfR has comparable pleiotropic effects on cellular processes. However, the large number of DEGs across a wide range of arCOGs implies that EnfR might regulate diverse genes and thereby influence various metabolic processes. To investigate whether EnfR binds to the promoter regions of DEGs, we selected two up-regulated genes and one down-regulated gene. The up-regulated genes were TON_1582 and TON_0537, which encode a Na+/H+ antiporter and the sulfhydrogenase β subunit SulfI, respectively. The down-regulated gene was TON_1563, which encodes formate dehydrogenase. In EMSA experiments (Supplementary Figure S7), retarded bands for the promoter DNA of the three genes were observed in the presence of EnfR. These bands disappeared when unlabeled (cold) competitor DNA was added, showing that EnfR binds specifically to the three promoter regions. To determine whether EnfR acts as a global regulator, further studies are needed. These should include chromatin immunoprecipitation sequencing and analysis of phenotypic changes of EnfR mutants under various culture conditions.

DATA AVAILABILITY
The atomic coordinates and structure factors of the final models have been deposited in the Protein Data Bank with ID codes 8HNO (wild-type) and 8HNP (T55I mutant).
RNA sequence data were submitted to the NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) with accession code GSE200806.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.