DUF2285 is a novel helix-turn-helix domain variant that orchestrates both activation and antiactivation of conjugative element transfer in proteobacteria

Abstract Horizontal gene transfer is tightly regulated in bacteria. Often only a fraction of cells become donors even when regulation of horizontal transfer is coordinated at the cell population level by quorum sensing. Here, we reveal the widespread ‘domain of unknown function’ DUF2285 represents an ‘extended-turn’ variant of the helix-turn-helix domain that participates in both transcriptional activation and antiactivation to initiate or inhibit horizontal gene transfer. Transfer of the integrative and conjugative element ICEMlSymR7A is controlled by the DUF2285-containing transcriptional activator FseA. One side of the DUF2285 domain of FseA has a positively charged surface which is required for DNA binding, while the opposite side makes critical interdomain contacts with the N-terminal FseA DUF6499 domain. The QseM protein is an antiactivator of FseA and is composed of a DUF2285 domain with a negative surface charge. While QseM lacks the DUF6499 domain, it can bind the FseA DUF6499 domain and prevent transcriptional activation by FseA. DUF2285-domain proteins are encoded on mobile elements throughout the proteobacteria, suggesting regulation of gene transfer by DUF2285 domains is a widespread phenomenon. These findings provide a striking example of how antagonistic domain paralogues have evolved to provide robust molecular control over the initiation of horizontal gene transfer.

by chromosomal reintegration in both host and recipient. Traits encoded by well-characterized ICEs include antibiotic resistance, catabolism of xenobiotic compounds, and determinants of pathogenesis and symbiosis (2, 3).
The symbiosis island of the soil bacterium Mesorhizobium japonicum R7A (ICEMlSymR7A) is a 502-kb ICE that transfers to non-symbiotic Mesorhizobium sp. strains, rendering them capable of forming a nitrogen-fixing symbiosis with legume hosts (4, 5). ICEMlSymR7A integrates into the 3′-end of the chromosomal phe-tRNA gene through the action of the site-specific recombinase IntS. Subsequent transfer requires the expression of the recombination directionality factor RdfS, which stimulates IntS to catalyse excision of ICEMlSymR7A (6, 7). Excision and transfer of ICEMlSymR7A is stimulated by the quorum-sensing (QS) regulator TraR, which in the presence of N-acyl homoserine lactone signalling molecules activates expression of the ICEMlSymR7A transcriptional activator FseA (7, 8). FseA then activates expression of RdfS to initiate excision of ICEMlSymR7A (9).
The FseA protein is encoded by two overlapping open reading frames, msi172 and msi171. During translation of the msi172-msi171 mRNA, a low-frequency +1 programmed ribosomal frameshift (PRF) fuses the msi172 and msi171 coding sequences to produce the FseA polypeptide (9). FseA contains two domains of unknown function (DUF), the N-terminal DUF6499 and the C-terminal DUF2285. Genes encoding FseA-like proteins and the conserved +1 PRF site are found throughout the γ-, β- and α-proteobacteria, but are frequently misannotated or unannotated (7). While FseA shares only weak primary sequence similarities with known DNA-binding proteins, genetic experiments indicate FseA is likely a direct activator of the rdfS promoter (PrdfS) that may bind a conserved inverted repeat (IR) sequence present immediately upstream of the PrdfS -35 element (9).
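The effect of a +1 frameshift on the translated product can be illustrated with a minimal sketch. The toy DNA sequence, the shift position and the function names below are hypothetical and are not taken from msi172-msi171; the sketch only assumes the standard genetic code and models the +1 shift as the ribosome skipping a single nucleotide.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from position 0 until the first stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_plus1_prf(dna, shift_index):
    """Translate as if the nucleotide at shift_index were skipped (+1 frameshift)."""
    return translate(dna[:shift_index] + dna[shift_index + 1:])


# Hypothetical toy sequence: an in-frame TGA stop codon follows the third codon.
toy = "ATGGCTTGG" + "T" + "GAATTTCTGTAA"
zero_frame = translate(toy)              # terminates at the in-frame stop
fused = translate_plus1_prf(toy, 9)      # reads through into the +1 frame
print(zero_frame, fused)
```

In this toy case the zero-frame product ends at the stop codon, whereas skipping one nucleotide yields a longer fused polypeptide, which is the same logic used when a frameshift site is edited to annotate a single open reading frame.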
Although ICEMlSymR7A excision and horizontal transfer can be activated by QS and FseA, only a minority of cells in R7A populations respond to N-acyl homoserine lactone and participate as donors of ICEMlSymR7A. In R7A populations most cells are inhibited for QS and ICEMlSymR7A excision and transfer by the antiactivator protein QseM (8). The remaining cells are repressed for qseM transcription by a bistable epigenetic switch, which allows for a small proportion of the population to participate in QS and initiate ICEMlSymR7A excision and transfer (10). QseM contains a lone DUF2285 domain that shares ∼18% amino acid identity with the FseA DUF2285 domain. Bacterial two-hybrid experiments show QseM directly interacts with the msi172-encoded portion of FseA (composed of DUF6499) and, independently, with TraR in the presence of N-acyl homoserine lactone (8, 9). In summary, QseM, through its dual-target antiactivation of TraR and FseA, is the critical factor determining the ability of cells to become epigenetically activated for QS and ICEMlSymR7A transfer.
Here, we show that purified MBP-tagged FseA forms a homodimer in solution and binds to DNA containing the IR region upstream of PrdfS. The entire IR and the inverted orientation of its repeats are critical for FseA-dependent transcriptional activation. Computational prediction of the FseA structure suggests that the DUF2285 domain folds into a distinct variant of the DNA-binding helix-turn-helix (HTH) domain that deviates from the canonical HTH domain by containing an 'extended turn' motif. The FseA DUF2285 domain is also predicted to make core interdomain contacts with α-helix two of the FseA DUF6499 domain. Conserved residues in both DUF domains are critical for activation of PrdfS, and residues that make up a positively charged surface of the DUF2285 domain are critical for DNA binding. We determined the structure of QseM by nuclear magnetic resonance (NMR), revealing that monomeric QseM also contains an extended-turn variant of the HTH domain akin to the FseA DUF2285 domain prediction. QseM binds α-helix two of the FseA DUF6499 domain and likely mimics the key contacts formed between the FseA DUF2285 domain and DUF6499 domain. QseM has an overall negatively charged surface and is unable to bind PrdfS DNA. Therefore, QseM appears to have evolved to become an antiactivator of FseA that has lost DNA-binding ability but retained the ability to bind the DUF6499 domain α-helix two of FseA.

Strains, plasmids, and growth media
Mesorhizobium japonicum and Escherichia coli strains used in this study are listed in Supplementary Table S1. Plasmids used in this study are listed in Supplementary Table S2. Bacterial strains were cultured as previously described (6, 7, 11). Where appropriate, media were supplemented with antibiotics at the following concentrations: ampicillin (Ap) 100 µg/ml, chloramphenicol (Cm) 50 µg/ml, kanamycin (Km) 50 µg/ml, gentamicin (Gm) 50 µg/ml (E. coli) and 25 µg/ml (M. japonicum), tetracycline (Tc) 10 or 15 µg/ml (E. coli) and 2 µg/ml (M. japonicum). Medium used to grow E. coli ST18 was supplemented with 50 µg/ml of 5-aminolevulinic acid.

Cloning
DNA cloning was carried out using Gibson assembly (New England BioLabs) according to the manufacturer's instructions. Gene mutations or truncations were generated with synthesized DNA oligonucleotides or using PCR. PCR-based mutagenesis was carried out with DNA primers incorporating mismatched base pairs compared to the wild-type gene template DNA. For truncations, DNA primers bound to template sequence sites that excluded either upstream or downstream sequence in the amplified product. PCR amplification of DNA for cloning was carried out using Phusion DNA polymerase (New England BioLabs) according to the manufacturer's instructions. All constructs were confirmed using Sanger sequencing (Massey University Genome Service). Conjugation of plasmids from E. coli ST18 to M. japonicum R7ANS was performed by biparental spot mating as previously described (12).

β-Galactosidase assays
Broths inoculated from single colonies of M. japonicum R7ANS cells (R7A cured of ICEMlSymR7A, thereby avoiding possible interference from ICE genes) carrying pSDZ-PrdfS-lacZ or derivatives, or pSDZ-PrdfS-lacZ and pPR3G or its derivatives, were grown for ∼72 h. One hundred microliters of culture was inoculated into fresh medium with or without 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and grown for 18-20 h. Cell density was estimated by absorbance at OD600, and cells were analysed for β-galactosidase expression using ortho-nitrophenyl-β-galactoside as previously described (13).
Nucleic Acids Research, 2023, Vol. 51, No. 13 6843

Protein expression and purification
6H-MBP-FseA and 6H-MBP-FseA R247A-R248A were expressed from pETM-41 in E. coli strain NiCo31(DE3). An overnight LB culture containing Km was used to inoculate 500 ml of LB containing Km, and the culture grown at 37 °C to an OD600 of ∼0.3. The temperature was reduced to 18 °C and the culture further grown to an OD600 of 0.6, at which point IPTG was added to a final concentration of 1 mM. After shaking overnight at 180 rpm, cells were harvested by centrifugation. Cell pellets were resuspended in binding buffer (50 mM Na2HPO4/NaH2PO4 (combined to final pH of 6.35), 10% (v/v) glycerol, 500 mM NaCl, 20 mM imidazole), and supplemented with one cOmplete EDTA-free Protease Inhibitor Cocktail tablet (Roche) and 20 µg/ml DNaseI before lysis by five cycles through a French Press (Homogenising Systems) at 10 000 psi. Soluble lysate was separated by centrifugation at 4 °C for 30-45 min at 15 000 × g and then loaded onto a 1 ml HisTrap FF column (GE Healthcare) pre-equilibrated with binding buffer using an ÄKTA pure chromatography system (GE Healthcare). Recombinant protein was eluted using a linear imidazole gradient to 100% elution buffer (50 mM Na2HPO4/NaH2PO4 (combined to final pH of 6.35 or 7.5 for 6H-MBP-FseA and 6H-MBP-FseA R247A-R248A, respectively), 10% (v/v) glycerol, 500 mM NaCl, 500 mM imidazole). Purified recombinant protein was pooled and centrifuged at 20 000 × g for 5 min at 4 °C before further purification by size exclusion chromatography (SEC) using a HiLoad 16/600 Superdex 200 column (GE Healthcare) pre-equilibrated with SEC buffer (50 mM Na2HPO4/NaH2PO4 (combined to final pH of 6.35), 10% (v/v) glycerol, 500 mM NaCl). Fractions containing purified protein were pooled and stored at -80 °C in 20-50 µl aliquots until use. 6H-MBP-FseA was stored at 0.73 µM final concentration. 6H-MBP-FseA R247A-R248A was concentrated to 30 µM using a Vivaspin 6 MWCO 10 000 (Cytiva) column pre-equilibrated with SEC buffer before storage.
For use in electrophoretic mobility shift assays (EMSA), 6H-QseM was expressed and purified following the method above except that pETM-11 was used as the host vector, all buffers were at pH 7.5, and a Superdex 75 Increase 10/300 GL column (GE Healthcare) was used for SEC.
For use in NMR experiments, 6H-QseM was expressed from pQE80 in E. coli strain BL21(DE3)pLysS. A 5-ml overnight culture was grown at 37 °C with Ap. The culture was used to inoculate 1 l of M9 minimal medium containing Ap, which was grown at 37 °C with shaking for 12-24 h with induction by 0.2 mM IPTG. M9 minimal medium with 0.02 M 13C-glucose and 9.3 mM 15NH4Cl was used to express 6H-QseM with the 13C and 15N labels necessary for many of the multidimensional NMR acquisitions. Cells were harvested by centrifugation at 4 °C for 20 min at 10 000 × g and resuspended in NMR binding buffer (100 mM Na2HPO4/NaH2PO4 (combined to final pH of 7.5), 300 mM NaCl, 100 mM imidazole, 5-10% (v/v) glycerol). Cells were lysed using a Cell Disruptor CF (Constant Systems, UK) at 20 000 psi. The lysate was centrifuged at 4 °C for 45 min at 15 000 × g, then passed through a 0.2 µm filter. Filtered lysate was loaded onto a 5-ml HisTrap HP column (GE Healthcare) pre-equilibrated with NMR binding buffer using a peristaltic pump (Bio-Rad) at a flow rate of 1-2 ml/min. 6H-QseM was purified using an ÄKTA pure chromatography system, and a linear imidazole gradient to 100% elution buffer (100 mM Na2HPO4/NaH2PO4 (combined to final pH of 7.5), 300 mM NaCl, 800 mM imidazole, 5-10% (v/v) glycerol). SEC was performed with a Superdex 200 16/600 column (GE Healthcare) pre-equilibrated with NMR SEC buffer (10 mM NaH2PO4, 20 mM NaCl (pH 7.5)). Protein was concentrated to 1-2 mg/ml using centrifugal filtration tubes (GE Healthcare, Millipore) prior to storage at -80 °C in 200-300 µl aliquots.

Electrophoretic mobility shift assays
PCR amplification of DNA for EMSAs was carried out using Phusion DNA polymerase (New England BioLabs) and the primers listed in Supplementary Table S3. For the synthesis of fluorescent PrdfS DNA, 5′-IRDye700-tagged primers and a template of 1 ng/µl of a pure 510-bp DNA fragment amplified from pSDZ-PrdfS were used in the PCR program: 98 °C for 30 s; 35 cycles of 98 °C for 10 s, 68 °C for 15 s, then 72 °C for 10 s; 72 °C for 5 min. Glycerol was added to the product at 15-20% (v/v), followed by purification by TAE agarose (3% (w/v)) gel electrophoresis (2 h at 65 V).
EMSA reactions with 6H-MBP-FseA alone were carried out in 10 µl volumes containing 10 mM Na2HPO4/NaH2PO4 (combined to final pH of 6.35), 220 mM NaCl, 6% (v/v) glycerol, 1 mM DTT, 0.01 µg/µl poly(dI.dC), 0.1 µg/µl herring sperm DNA, 5 nM fluorescent DNA probe, 0.1-0.19 µg/µl BSA, and denoted purified protein concentrations. Where appropriate, excess unlabelled DNA was added to a final concentration of 260 nM and pre-incubated with protein for 30 min at 28 °C prior to adding the fluorescent PrdfS DNA. Binding reactions were incubated at 28 °C for 30 min. Samples were loaded onto a 4% (v/v) polyacrylamide gel (19:1 acrylamide/bis solution (Bio-Rad), 0.01% (v/v) TEMED, 0.02% (v/v) of 10% ammonium persulfate, 0.5 × TBE (45 mM Tris, 45 mM boric acid, and 1.25 mM EDTA (pH 8.3))) that was pre-run for at least 30 min. Gel electrophoresis was performed at 100 V for 50 min and fluorescent DNA imaged at 700 nm using an Odyssey Fc imaging system (LI-COR Biosciences) with Image Studio (version 5.2) (LI-COR Biosciences). Image Studio Lite (version 5.2) was used to quantitate protein-bound fluorescent DNA. The KD was determined with the ratio of bound to unbound DNA from three independent replicates using the non-linear regression analysis 'specific binding with Hill slope' in GraphPad Prism (version 9.1.2). Co-purified fluorescent PrdfS DNA that remained equally unbound at each 6H-MBP-FseA concentration was excluded from the analysis.
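The 'specific binding with Hill slope' model has the form Y = Bmax·X^h / (KD^h + X^h). The following minimal sketch shows how such a model can be fitted to binding data. It is not the GraphPad Prism procedure: a coarse grid search stands in for non-linear regression, Bmax is fixed at 1, the response is expressed as fraction of probe bound, and the data points are synthetic values generated from an assumed KD of 30 nM and Hill slope of 1. All function names are hypothetical.

```python
def hill_fraction_bound(conc_nm, kd_nm, hill=1.0, bmax=1.0):
    """Specific binding with Hill slope: Y = Bmax * X^h / (Kd^h + X^h)."""
    if conc_nm == 0:
        return 0.0
    return bmax * conc_nm ** hill / (kd_nm ** hill + conc_nm ** hill)


def fit_kd_grid(concs_nm, fractions, kd_grid, hill_grid):
    """Return the (Kd, h) grid point with the smallest sum of squared residuals."""
    best = None
    for kd in kd_grid:
        for h in hill_grid:
            sse = sum((hill_fraction_bound(x, kd, h) - y) ** 2
                      for x, y in zip(concs_nm, fractions))
            if best is None or sse < best[0]:
                best = (sse, kd, h)
    return best[1], best[2]


# Synthetic, noise-free data (illustrative only, not measured values).
concs = [1, 3, 10, 30, 100, 300]                      # protein concentration, nM
fracs = [hill_fraction_bound(x, 30.0, 1.0) for x in concs]

kd_grid = [float(k) for k in range(5, 101, 5)]        # candidate Kd values, nM
hill_grid = [0.5, 1.0, 1.5, 2.0]                      # candidate Hill slopes
kd_fit, h_fit = fit_kd_grid(concs, fracs, kd_grid, hill_grid)
print(kd_fit, h_fit)
```

With real, noisy replicates a proper least-squares optimiser would be preferable to a grid, and Bmax would normally be fitted as well.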

Compilation of the FseA homologue database
FseA homologue sequences were acquired using PSI-BLAST, where searches were performed with FseA against non-redundant protein translations (GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS, accessed 18/02/2021). Searches were performed independently in α-, β- and γ-proteobacteria and excluding all three Classes, yielding 5894 (six iterations), 753 (seven iterations), 1019 (four iterations) and 266 sequences (nine iterations), respectively. This resulted in an initial combined database of 7932 sequences. For FseA matches that contained only the DUF2285 domain, the corresponding DNA locus was inspected for the presence of an upstream PRF site, misannotated start/stop codons and the presence of an upstream DUF6499 domain. DUF6499 domains were identified through the presence of an upstream encoded 'AWEFLRRN' sequence motif characteristic of the DUF6499 domain. The frameshift site in each DNA locus was edited to produce an open reading frame and corresponding full-length FseA polypeptide containing both DUF6499 and DUF2285 domains. Shorter QseM-like proteins were distinguished from FseA-like activator protein sequences by their lack of an upstream encoded 'AWEFLRRN'-like motif. All FseA search matches that retrieved a lone DUF6499 domain were found to be encoded upstream of one of the DUF2285 domain matches identified in the above search and so were removed to avoid duplication. Large protein sequences of more than 400 amino acids were removed, as were sequences that did not contain a distinct 'AWEFLRRN' motif after sequence alignment using Clustal Omega. Lastly, sequences containing ambiguous amino acids (i.e. 'X') were removed. Overall, ∼59% of sequences of the starting database met the parameters above and coded for an identifiable 'AWEFLRRN' motif, making them homologues of FseA.
The final sequences were aligned in Clustal Omega with the parameters: clustalo -i Fasta.txt --full -o MSA.fasta --wrap=10000 --output-order=tree-order --iterations 6 --max-guidetree-iterations=6 --max-hmm-iterations=6.
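The sequence-level filters described above (length, ambiguous residues, motif presence) can be sketched as follows. This is an illustration under stated assumptions rather than the curation pipeline itself: the motif test here is a simple sliding-window comparison that tolerates a hypothetical maximum of two mismatches, whereas in this study motifs were assessed after alignment with Clustal Omega. The example records are invented. The final lines check the database sizes quoted in the text (7932 starting sequences and 4709 retained homologues).

```python
MOTIF = "AWEFLRRN"
MAX_LENGTH = 400  # sequences of more than 400 amino acids are removed


def has_motif_like(seq, motif=MOTIF, max_mismatches=2):
    """True if any window of len(motif) differs from the motif at <= max_mismatches sites."""
    k = len(motif)
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], motif))
        if mismatches <= max_mismatches:
            return True
    return False


def keep_sequence(seq):
    """Apply the three filters: length, ambiguous residues ('X'), motif presence."""
    return len(seq) <= MAX_LENGTH and "X" not in seq and has_motif_like(seq)


def filter_homologues(records):
    """records maps an identifier to a protein sequence; returns the retained subset."""
    return {name: seq for name, seq in records.items() if keep_sequence(seq)}


# Invented example records (not real database entries).
records = {
    "fseA_like": "MSDQ" + "AWEFLRRN" + "PAYRAD",
    "variant_motif": "MTE" + "AWQFLRRN" + "GK",   # one mismatch to the motif
    "qseM_like": "MSKVDEPWSDSLLT",                # no motif
    "ambiguous": "MAWEFLRRNXG",                   # contains 'X'
    "too_long": "AWEFLRRN" + "G" * 400,           # 408 residues
}
kept = filter_homologues(records)
print(sorted(kept))

# Consistency check on the counts reported in the text.
search_hits = [5894, 753, 1019, 266]
total = sum(search_hits)
retained_percent = round(4709 / total * 100)
print(total, retained_percent)
```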
A curated multiple sequence alignment (MSA) supplemented the trRefineRosetta and RoseTTAFold predictions. No template was detected at the time of modelling with trRefineRosetta.
The QseM structure was predicted with AF2 through ColabFold. Prediction of coevolving pairs of FseA amino acids (Supplementary Table S4) was carried out using the GREMLIN (19) webserver (http://gremlin.bakerlab.org/) with the curated FseA homologue database using the settings: generate MSA with HHblits with E-value 1E-10 and 0 iterations; filter MSA with coverage 75 and remove gaps 50.

Bacterial two-hybrid assays
In vivo QseM-FseA interactions were detected using the Bacteriomatch II Two-Hybrid System (Agilent) as previously described (8), with the following changes: screening medium contained 6.8% (w/v) Na2HPO4, 3% (w/v) KH2PO4, 0.05% (w/v) NaCl, 0.1% (w/v) NH4Cl; Cm and Tc were added to the final concentration of 25 µg/ml and 12.5 µg/ml, respectively; LB was used as the recovery growth medium after electrotransformation, and no 3-oxo-C6-HSL was added. Protein-protein interaction was detected by growth on selective medium containing 5 mM 3-amino-1,2,4-triazole. Plasmid co-transformation efficiency was determined by growth on non-selective medium. Relative interaction strength was quantified in CFU/ml by the number of colonies growing on selective medium compared to non-selective medium. Biological replicates were performed with three technical replicates.
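One way to express this comparison numerically is as the ratio of CFU/ml on selective medium to CFU/ml on non-selective medium, averaged over technical replicates. The sketch below assumes that interpretation; the colony counts, dilution factors, plated volume and function names are hypothetical and are not taken from the experiments.

```python
from statistics import mean


def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Convert a plate count to CFU/ml for a given dilution factor and plated volume."""
    return colonies * dilution_factor / plated_volume_ml


def relative_interaction(selective_counts, nonselective_counts,
                         dilution_sel, dilution_non, plated_volume_ml=0.1):
    """Mean CFU/ml on selective medium divided by mean CFU/ml on non-selective medium."""
    sel = mean([cfu_per_ml(c, dilution_sel, plated_volume_ml) for c in selective_counts])
    non = mean([cfu_per_ml(c, dilution_non, plated_volume_ml) for c in nonselective_counts])
    if non == 0:
        raise ValueError("no growth on non-selective medium; ratio is undefined")
    return sel / non


# Hypothetical counts from three technical replicates.
ratio = relative_interaction(
    selective_counts=[42, 38, 40], nonselective_counts=[85, 80, 75],
    dilution_sel=1, dilution_non=1000,
)
print(ratio)
```

Normalising to the non-selective count corrects for differences in co-transformation efficiency between plasmid pairs.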

NMR spectroscopy
Purified 15N/13C-labelled 6H-QseM (250 µM) was prepared in 20 mM NaCl, 10 mM NaH2PO4, and 10% (v/v) D2O for the acquisition of most spectra. For the acquisition of HCCH-TOCSY and 13C NOESY-HSQC spectra, the purified 15N/13C-labelled 6H-QseM (250 µM) in 20 mM NaCl, 10 mM NaH2PO4 was lyophilized (Martin Christ, Alpha 3-4 LSCbasic) at room temperature for ∼12 h, then resuspended in the equivalent volume of D2O to maintain the buffer concentration. All samples were spiked with 4,4-dimethyl-4-silapentane-1-sulfonic acid (150-300 µM) as a chemical shift standard prior to NMR experiments. Spectra were acquired on a Bruker Avance III 600 or 800 MHz spectrometer at 298 K with a cryogenic TXI probe (600 MHz) or a cryogenic TCI probe (800 MHz).
CYANA was provided with a list of 680 chemical shift assignments and 535 15N NOESY peaks, 694 13C NOESY peaks, 94 13C aromatic NOESY peaks, and 1488 2D NOESY peaks. Initial structure calculations were performed using a family of 100 structures each running for 10 000 steps of torsion angle dynamics. The 20 lowest energy structures after completion of the CYANA run were then used for subsequent water refinement. Refinement was performed using the RECOORD protocol (22) using 500 annealing runs. The 100 lowest energy structures after annealing were then refined in water and the resulting 20 lowest energy structures were used to form the final family of structures (Supplementary Table S5). The final family structure residues are Ramachandran favoured (76%) and allowed (24%), with one outlier (His33). This set of models has been submitted to the PDB under record number 7UQT.

Small angle X-ray scattering (SAXS)
Size-exclusion chromatography-coupled synchrotron small angle X-ray scattering (SEC-SY-SAXS) data were collected on purified 6H-QseM using the SAXS/WAXS beamline at the Australian Synchrotron (23, 24). Purified 6H-QseM (50 µl at 10 mg/ml) was injected into a Superdex 200 Increase 10/300 GL (GE Healthcare) pre-equilibrated with buffer (10 mM NaH2PO4, 20 mM NaCl, 2% (v/v) glycerol (pH 7.5)) mounted on a Shimadzu HPLC system with a constant flow rate of 0.25 ml/min at 295 K. A 1-second continuous data-frame was collected using a Pilatus3 S 2M detector at a distance of 1.6 m. Data reduction and background subtraction were carried out using SCATTERBRAIN, and data processed using ATSAS (version 2.8.4) software (25, 26). The P(r), Porod volume and maximum dimension (Dmax) were calculated by GNOM (27). The ab initio SAXS envelope was generated using DAMMIN (28). SAXS data have been deposited to the SASBDB (29) under the accession code SASDNM8.

Size exclusion chromatography coupled to multi-angle light scattering (SEC-MALS)
SEC-MALS experiments were carried out using a Superdex Increase 10/300 GL column (GE Healthcare) attached to a Viscotek GPCmax VE 2001 solvent/sample module (Malvern) coupled to a Viscotek 305 TDA detector array (Malvern) at room temperature. Two hundred µl samples of purified 6H-MBP-FseA (1 mg/ml), 6H-QseM (1 mg/ml), a mixture of 6H-MBP-FseA and 6H-QseM (0.5 mg/ml of each protein), or BSA (1 mg/ml) standards in SEC buffer (pH 6.35) were applied to the size-exclusion column pre-equilibrated with SEC buffer at a flow rate of 0.2 ml/min. The refractive index, UV absorbance and left and right-angle light scattering of the eluent were constantly monitored. OmniSEC (version 5.10) (Malvern) was used to analyse the SEC profiles and to calculate molecular weight averages and dispersity using calibration settings derived from the average of five BSA standards.

FseA binds the IR region of PrdfS to activate transcription
FseA activates transcription downstream of a conserved IR DNA sequence adjacent to the rdfS promoter -35 element (9). The IR of PrdfS is composed of two inverted hexamers separated by 16 bp, with each hexamer containing two highly conserved central nucleobases (Figure 1A, B). To establish the role of the IR in PrdfS activation, variants of the PrdfS sequence were constructed and then cloned upstream of a promoterless lacZ gene in plasmid pSDZ-fseA-6H. This vector also contained a genetically fused copy of fseA-6H (frameshift between msi172-msi171 removed; sequence encoding 6H added at 3′-end), which is under control of the leaky IPTG-inducible lac promoter. Each cloned PrdfS variant was then tested for activity in β-galactosidase reporter assays in M. japonicum strain R7ANS (which lacks ICEMlSymR7A) in the presence and absence of IPTG-induced fseA-6H expression. PrdfS variants that contained a truncated IR region showed no β-galactosidase activity, confirming the IR is required for FseA activation of PrdfS (Figure 1C). Single nucleotide changes made to either IR hexamer showed either little or no difference (50-80%) in β-galactosidase activity compared to the wild-type sequence, and no difference in activity was observed for single nucleotide changes made to the sequence between the hexamers (Figure 1C). Variants with one of either hexamer sequence in the reverse orientation showed no β-galactosidase activity, indicating the orientation of each IR hexamer was critical for activation (Figure 1C). Together, these results demonstrate that the IR of PrdfS facilitates transcriptional activation by FseA and suggest FseA may bind the IR to activate PrdfS.
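The IR architecture described here (two hexamers in inverted orientation separated by 16 bp) can be searched for computationally. The sketch below is an illustration only: the hexamer and spacer sequences are invented and do not correspond to the PrdfS IR, and the function names are hypothetical. An inverted repeat is defined as a downstream arm that equals the reverse complement of the upstream arm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=6, spacer=16, max_mismatches=0):
    """Find (start, left_arm, right_arm) where right_arm matches revcomp(left_arm)."""
    hits = []
    span = 2 * arm + spacer
    for i in range(len(seq) - span + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + spacer:i + span]
        mismatches = sum(a != b for a, b in zip(reverse_complement(left), right))
        if mismatches <= max_mismatches:
            hits.append((i, left, right))
    return hits


# Invented sequences: hexamer TTGACA, a 16-bp spacer, then the second arm.
inverted = "GG" + "TTGACA" + "A" * 16 + reverse_complement("TTGACA") + "CC"
direct = "GG" + "TTGACA" + "A" * 16 + "TTGACA" + "CC"

hits_inverted = find_inverted_repeats(inverted)
hits_direct = find_inverted_repeats(direct)
print(hits_inverted, hits_direct)
```

The second toy sequence mirrors the orientation variants tested above: once one arm is no longer the reverse complement of the other, the inverted repeat is no longer detected.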
To confirm FseA bound the IR DNA, we expressed FseA as an N-terminal 6H-maltose-binding-protein (MBP) fusion protein. 6H-MBP-FseA was purified from E. coli by Ni2+ affinity chromatography followed by size exclusion chromatography (SEC). Elution of 6H-MBP-FseA in SEC indicated a molecular mass of ∼158 kDa (theoretical monomer mass, ∼75 kDa), suggesting that the protein was a dimer in solution (Supplementary Figure S2A). Although the MBP tag of 6H-MBP-FseA contained a cleavable site, it was not removed because purifications of untagged FseA yielded insoluble aggregates. To investigate whether the MBP tag altered the ability of FseA to activate PrdfS, 6H-MBP-FseA was tested for transcriptional activation of PrdfS. Sequence encoding 6H-MBP-FseA was cloned under the control of the IPTG-inducible lac promoter in plasmid pSDZ-PrdfS. In the absence of IPTG induction, the 6H-MBP-FseA plasmid induced β-galactosidase activity to a level ∼3-fold higher than the plasmid carrying fseA-6H (Supplementary Figure S3), suggesting that the MBP tag [...] Figure S3). These results confirmed that the MBP tag did not decrease the transcriptional activity of FseA in the 6H-MBP-FseA fusion.
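The dimer assignment follows from the ratio of the apparent SEC mass to the theoretical monomer mass. A minimal sketch of that arithmetic, using the two values quoted above, is given below; the function name is hypothetical, and the calculation ignores the fact that SEC elution also depends on molecular shape.

```python
def subunit_ratio(apparent_kda, monomer_kda):
    """Ratio of apparent mass in solution to the theoretical monomer mass."""
    return apparent_kda / monomer_kda


ratio = subunit_ratio(158, 75)     # values quoted in the text, in kDa
nearest_state = round(ratio)       # nearest whole number of subunits
print(f"{ratio:.2f} -> {nearest_state} subunits")
```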
To assess FseA DNA-binding, electrophoretic mobility shift assays (EMSAs) were performed using purified 6H-MBP-FseA together with a 71-bp dsDNA oligonucleotide containing the conserved IR sequence present upstream of PrdfS (Figure 1A). 6H-MBP-FseA produced a single, discrete shift in the migration of the IR DNA and bound with an approximate KD of 30 nM (Figure 1D). No shift of the labelled IR DNA was observed when reactions included excess unlabelled IR DNA (S-comp, Figure 1D), whereas the shift was unaffected by the addition of excess DNA amplified from the fseA gene (NS-comp, Figure 1D). Thus, FseA specifically bound the IR region upstream of PrdfS, which hereafter we refer to as the FseA box.

The FseA DUF2285 is a variant of the HTH DNA-binding domain
The structure prediction tool trRefineRosetta was used to generate ab initio structure predictions for FseA based on coevolving residues inferred from custom sequence alignments. Since fseA homologues are often encoded on two separate open reading frames (DUF6499 and DUF2285 that through a +1 PRF generate a single protein), the polypeptide sequences of FseA homologues are frequently misannotated: DUF6499 can be unannotated or annotated with stop codons following the +1 PRF site and DUF2285 domains are generally annotated with incorrect start codons due to the upstream +1 PRF site. Therefore, we manually curated a database of FseA coding sequences, correcting for the +1 frameshift. We identified 4709 unique FseA homologues from throughout the proteobacteria, ∼61% of which were encoded by two open reading frames separated by the conserved +1 PRF site. The FseA homologues were aligned for use in trRefineRosetta (Figure 2A). The quality score (local distance difference test = 0.75) gave confidence in the overall fold of the predicted FseA structure. Subsequent to this work, the crystal structure of the distantly related RovC protein (PDB 6XZ5), which activates expression of a Type VI secretion system in Yersinia spp., was published (30) and the structure prediction tools RoseTTAFold and AlphaFold2 (AF2) became available. FseA models obtained with each tool were highly similar to the trRefineRosetta model (Figure 2B), and the AF2 model (Figure 3A) [...] The FseA model contains three structured domains (demarcated using SWORD (31)), with a disordered sequence at its N-terminus and a short linker-like sequence joining the 'middle' and DUF2285 domains (Figure 3A). The N-terminal DUF6499 domain contains three α-helices across residues 10-94, herein termed α1-α3 (Figure 3B).
The α2 helix contains the highly conserved 'AWEFLRRN' sequence motif (residues 31-38) (Figure 2A), which serves as a central structural component that interconnects the DUF6499 domain and the C-terminal DUF2285 domain: Glu33 positions the side chain of Arg37 such that it interacts with Asp210 of the DUF2285 domain and, together with Arg36, contributes to the placement of the Trp32 side chain that makes hydrophobic contacts in the core of FseA (Figure 3D). α1, which is not present in RovC, makes additional contacts with the DUF2285 domain. The middle domain, which exhibits low amino-acid sequence conservation (Figure 2A), spans residues 95-193 and contains an anti-parallel β-sheet (β-strands 5-9) and an α-helix (α4) that are positioned at the periphery of the structure (Figure 3A, B). The C-terminal DUF2285 domain (residues 194-266), which is highly conserved (Figure 2A), contains five α-helices (Figure 3A, B). The α5 and α9 helices form a cleft that directly interacts with the α2 helix of the DUF6499 domain (Figure 3A). α5 also contacts α1 of the DUF6499 domain through a distinct face.
To test if residues within and around the highly conserved α2 helix were important for FseA function, alanine substitutions were constructed in the α2 region and the resulting alleles were cloned and assayed for their ability to activate PrdfS. Almost all mutant proteins were unable to activate PrdfS (Supplementary Figure S5). Even when the same substitutions were constructed in MBP-tagged FseA and induced with IPTG, the highest activation observed was less than 20% of the wild-type (Figure 4, Supplementary Figure S6). These observations confirm that α2 is critical for function and support its role in maintaining FseA tertiary structure. DALI (32) was used to compare the predicted FseA DUF2285 structure with structures in the Protein Data Bank (PDB). This revealed that DUF2285 does indeed share structural similarity with DNA-binding HTH domains (Figure 3C), including those present in sigma factors such as SigL (PDB 3HUG (33)), transcriptional activators such as HetR (PDB 4IZZ (34)), and quorum-sensing transcriptional activators such as CviR (PDB 3QP6 (35)). However, the FseA DUF2285 domain deviates significantly from the canonical HTH by containing additional sequence within the turn of the HTH motif, which extends the length of its turn (Figure 3C). To clarify comparisons with the helices of canonical HTH domains (H1, H2, H3) and those of QseM, we refer to FseA helices α5-α9 as H1, H2, H2b, H3 and H4 in the following text (Figure 3B, C).
Within the extended turn between helices H2 and H3 of the FseA DUF2285 domain is a short α-helix (H2b) orientated perpendicular to H1 and H3 (Figure 3C). The C-terminus of H2b places a 79% conserved tryptophan residue Trp235 in a hydrophobic pocket formed by alanine, valine, and leucine residues from H2, H2b and H3, respectively (Supplementary Figure S9). The DUF2285 domain forms an extensive positively charged surface (Figure 5B(i)), with a net positive charge of +7 (15-6-1-1; Arg-Asp-Glu-COO− (C-terminus)), which is consistent with a role in DNA binding (36).
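The net charge quoted above is a simple tally of charged groups (15 Arg, 6 Asp, 1 Glu and the C-terminal carboxylate). A minimal sketch of such a tally is shown below. It assumes fully charged Arg, Lys, Asp and Glu side chains and neutral His, and it uses an invented string with the same residue composition rather than the FseA DUF2285 sequence; the function name and its parameters are hypothetical.

```python
def net_charge(seq, free_c_terminus=True, free_n_terminus=False):
    """Integer net charge: (Arg + Lys) - (Asp + Glu), adjusted for free termini."""
    positive = sum(aa in "RK" for aa in seq)
    negative = sum(aa in "DE" for aa in seq)
    charge = positive - negative
    if free_c_terminus:
        charge -= 1   # C-terminal carboxylate
    if free_n_terminus:
        charge += 1   # N-terminal amine
    return charge


# Invented composition matching the tally in the text: 15 Arg, 6 Asp, 1 Glu.
composition_only = "R" * 15 + "D" * 6 + "E" + "G" * 5
charge = net_charge(composition_only)
print(charge)
```

The N-terminus is treated as not free here because the DUF2285 domain sits at the C-terminal end of a larger polypeptide.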
To test if the solvent-exposed positively charged residues in the FseA DUF2285 domain were required for transcriptional activation by FseA, alanine substitution mutants of 6H-MBP-FseA were tested in vivo for their ability to activate expression from PrdfS. These mutants were tested in the 6H-MBP fusion only so that any destabilising effects of the mutations could be minimised by the added stability/solubility of MBP. Singly substituted proteins with R243A, R247A and R248A showed PrdfS transcriptional activation of less than 10% of wild-type 6H-MBP-FseA without, and 40% with, IPTG-induced 6H-MBP-FseA expression (Figure 4A, B). 6H-MBP-FseA containing a double substitution, R247A and R248A, showed no transcriptional activation from PrdfS even under IPTG-induced conditions (Figure 4A). Purified 6H-MBP-FseA R247A-R248A exhibited an apparent molecular mass of ∼160 kDa in SEC experiments (Supplementary Figure S2B), indicating oligomerisation was unaffected by the substitutions. EMSAs carried out using 6H-MBP-FseA R247A-R248A revealed that the mutated protein exhibited greatly reduced binding affinity to the FseA box, with 3 µM protein concentration only shifting ∼60% of the DNA (Supplementary Figure S11). We next investigated the Trp235 residue present at the end of H2b and central to the hydrophobic pocket formed between H2, H2b and H3. A W235A substitution abolished transcriptional activation by FseA in vivo (Figure 4A, B), consistent with its high conservation in FseA homologues and possible key role in the hydrophobic pocket of the domain. In summary, purified FseA is a dimeric DNA-binding protein that interacts with DNA containing the FseA box that is upstream of PrdfS, with positively charged residues in the HTH-like DUF2285 domain being required for DNA binding.

QseM is a HTH variant akin to the FseA DUF2285 domain but lacks a positively charged surface and H2b
The QseM polypeptide contains a DUF2285 domain with 18% amino-acid identity to that of the FseA DUF2285 domain. To gain insight into the structure of QseM, we undertook solution small-angle X-ray scattering (SAXS) on purified 6H-QseM. The data (Supplementary Figure S12) indicated 6H-QseM was an ellipsoidal globular monomer with a molecular mass in solution of 10 400 Da, close to the theoretical mass of 10 960 Da. The molecular dimensions (Rg 16 Å and Dmax 57 Å) and the shape of the pair-distribution function were commensurate with a prolate ellipsoid, and Kratky analysis indicated a substantially ordered structure.
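The monomer assignment follows from comparing the SAXS-derived mass with the theoretical monomer mass. The sketch below illustrates that comparison using only the two masses quoted above; the variable names and the rounding rule for estimating subunit number are illustrative assumptions, not the procedure used in the study.

```python
# Masses quoted in the text (Da).
saxs_mass_da = 10_400         # mass in solution estimated from SAXS
theoretical_mass_da = 10_960  # theoretical mass of one 6H-QseM chain

# Ratio close to 1 implies a monomer; close to 2 would imply a dimer.
ratio = saxs_mass_da / theoretical_mass_da

# Illustrative assumption: nearest whole number of chains, at least one.
estimated_subunits = max(1, round(ratio))

# Percentage by which the SAXS mass falls short of the theoretical mass.
deviation_pct = 100 * (theoretical_mass_da - saxs_mass_da) / theoretical_mass_da

print(f"mass ratio: {ratio:.2f}")
print(f"estimated subunits: {estimated_subunits}")
print(f"deviation from theoretical mass: {deviation_pct:.1f}%")
```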
We next determined the three-dimensional structure of 6H-QseM using solution NMR spectroscopy. All or most atoms of the residues Gln6-Val9, Ser12, Leu15-Arg53, Pro55-Trp71 and Leu80-Arg83, encompassing 78% of the QseM sequence, were assigned. Only the polypeptide backbone atoms were assigned for the residues Val5, Trp11, Glu54 and Met72. No atoms of the residues Ser1-Lys4, Asp7, Glu8, Pro10, Asp13, Ser14 and Val73-Lys79 were assigned. Residues 6-74 of QseM form an ordered trihelical arrangement, whereas both termini (residues 1-5 and 75-83) are fully or partially disordered. The ensemble of the 20 lowest energy models overlaid with an RMSD of 0.36 Å across the backbone heavy atoms of residues 6-72 (Supplementary Figure S10A). CRYSOL (25) comparison of the NMR structure to the SAXS data for QseM yields a very good fit (χ² = 0.30, where 0.25 represents an ideal fit for Australian Synchrotron data). Taken together, the solution structure measurements both support the observation that QseM is a globular monomer in solution. The NMR structure and data have been deposited in the PDB with the code 7UQT. A single representative structure was used for further analysis (Figure 5A).
6H-QseM exhibits three α-helices (H1, residues 17-34; H2, residues 39-46; H3, residues 55-71) with the H3 helix forming the backbone of the structure that the other two helices cross at close to 90°, creating a hydrophobic core (Figure 5A). The H1 helix is curved along its length with an overall deviation of around 40 degrees, allowing it to partially wrap around H3. The structure of QseM is akin to that of the FseA DUF2285 domain, as it is also comprised of a HTH-like fold that contains an extended turn between H2 and H3 (Supplementary Figure S10C). Compared to the FseA DUF2285, the solution structure of QseM lacks H2b and H4. While lacking structure in the H4 region, this region of QseM contains the highly conserved GY sequence motif at the turn directly preceding H4 that is present in other DUF2285 domains (Figure 2A). QseM is more compact than the FseA DUF2285 due to the bend in H1. Notably, QseM lacks the extensive positively charged surface present in the FseA DUF2285 that is involved in DNA binding (Figure 5B). We also generated models of QseM using AF2. The AF2 QseM model agreed closely with the NMR structure (RMSD 1.71 Å over residues 15-71) (Figure 5C), except that the predictions indicated the presence of the H4 helix at the C-terminus of QseM. The AF2 prediction, together with the dearth of chemical shift assignments in this region (e.g. for residues Thr74-Gly76, Lys78 and Glu79) and the presence of a corresponding H4 helix in FseA and RovC, suggest the existence of a conformational exchange process at the C-terminus of QseM, perhaps involving the folding-unfolding of a short H4 helix.

QseM is unlikely to bind DNA and is unable to inhibit FseA in vitro
The lack of the extensive positively charged surface on QseM suggested that it may lack DNA-binding activity. Indeed, we observed no binding by QseM to the FseA box in EMSAs (Supplementary Figure S13A). We also failed to observe a complex between 6H-QseM and dimeric 6H-MBP-FseA in SEC experiments and did not observe any difference in 6H-MBP-FseA binding to the FseA box when co-incubated with excess 6H-QseM in EMSAs (Supplementary Figure S13), despite QseM binding Msi172 (composed of DUF6499; Figure 3B) and FseA in bacterial two-hybrid assays. This suggests that either QseM cannot bind mature 6H-MBP-tagged FseA dimers, or that the 6H and 6H-MBP tags interfered with binding in vitro. To investigate whether the protein tags interfered with binding, a plasmid expressing 6H-QseM from the constitutive nptII promoter was introduced into R7ANS carrying plasmid pSDZ-fseA-6H-PrdfS. Expression of 6H-QseM completely blocked FseA-6H-dependent activation of PrdfS both in the presence and absence of IPTG induction, suggesting that the 6H-tag of QseM or FseA did not interfere with their interaction (Supplementary Figure S14). Likewise, the activity of 6H-MBP-FseA was completely blocked by 6H-QseM in the absence of IPTG and partially reduced when induced with IPTG (Supplementary Figure S14), confirming that 6H-QseM was able to inhibit 6H-MBP-FseA in vivo. Therefore, the 6H and 6H-MBP tags do not prevent QseM-FseA interactions in vivo. Taken together, the lack of QseM DNA-binding in EMSAs and a paucity of surface-exposed positively charged amino acids on QseM H3 suggests that QseM antiactivation of FseA does not require interaction with DNA. Also, while tagged FseA and QseM appear to interact in vivo, no interaction was observed with tagged homomeric FseA and QseM in vitro, indicating that QseM cannot access its binding site in mature FseA dimers.

QseM H2 and H3 are not required for binding or antiactivation of FseA, while H1 and the C-terminus are essential
To identify regions of QseM key to its interaction with FseA in vivo, alanine-scanning mutagenesis was carried out for QseM. Alanine substitution variants were tested for their ability to inhibit FseA-6H activation of PrdfS using β-galactosidase assays, and a selection of these variants were tested further for interactions with FseA in bacterial two-hybrid assays. Given that QseM lacks a positively charged surface in the likely DNA-binding region, we wondered if exposed residues surrounding the H2-H3 region were important for FseA antiactivation. QseM variants carrying alanine substitutions of solvent-exposed charged residues in H2 and at the base of H3 (Glu41, Glu56, Arg57 and Arg59), corresponding with residues in FseA required for activation of PrdfS and DNA binding, were therefore constructed. The QseM mutants exhibited wild-type-like FseA antiactivation and variants tested in bacterial two-hybrid assays (Glu56, Arg57 and Arg59) interacted similarly to wild-type 6H-QseM (Figure 4C, D). These results suggest antiactivation of FseA did not involve the H2-H3 region of QseM, consistent with the QseM H2-H3 quasi-DNA-binding region not being required for antiactivation of FseA-dependent transcription.
Mutants of QseM containing alanine substitutions in residues Tyr18, Asp19, Tyr26, Leu29, Leu30, His65 and Leu66, which form interhelical contacts within the QseM structure, were each reduced in their antiactivation of FseA (Figure 4C, D), consistent with a role in maintaining QseM structure. In contrast, mutants with alanine substitutions in exposed residues bordering the structural core (Asp7, His22, Glu39, Glu41, Pro51, Glu56 and Arg59) were similar to wild-type QseM in their ability to repress transcriptional activation by FseA-6H (Figure 4C, D). However, alanine substitutions of solvent-exposed residues Arg28 and Asp31 of H1 and C-terminal residues Arg68, Gly70 and Trp71 impaired or abolished antiactivation of FseA-6H (Figure 4C, D). Bacterial two-hybrid assays revealed that these same substitutions (apart from R28A) also reduced QseM binding to FseA (Figure 4C, D). These results suggest that both the solvent-exposed side of H1 and the very C-terminus of QseM (corresponding to H4 of the DUF2285 domain in FseA) play an essential role in the binding and antiactivation of FseA.

QseM binds the DUF6499 domain of FseA
Previously reported bacterial two-hybrid assays indicated the N-terminal portion of FseA containing DUF6499 was sufficient for interaction with QseM (9). To further delineate regions of FseA required for QseM interaction, we constructed a series of FseA N- and C-terminal truncations (Supplementary Figure S15). Truncated FseA lacking both the middle and DUF2285 domains (FseA 1-85) interacted with QseM as strongly as wild-type FseA in bacterial two-hybrid assays (Supplementary Figure S7A), confirming our previous observations. Further truncation (FseA 1-55) reduced QseM interaction to ∼40% the strength of wild-type, suggesting that this truncation bordered residues that are critical for QseM binding (Supplementary Figure S7A). Truncation of the N-terminal 15 amino acids of FseA (FseA  ) had no effect on QseM binding, while truncations FseA  and FseA 25-266 exhibited severely reduced QseM interaction (Supplementary Figure S7A). This delineated FseA amino acids 15-55, which contain helices α1 and α2, as being necessary and sufficient for QseM binding.  (Figure 2A) and, together with α1, makes the majority of interdomain contacts formed between the FseA N-terminus and the DUF2285 domain in FseA structure predictions.
The DUF6499 α2 helix forms a hydrophobic pocket with helices H1 and H4 of the DUF2285 domain. The side chain of Phe34 of α2 is positioned in the centre of this hydrophobic pocket, whilst the nearby Trp32 and Arg36 residues are predicted to protrude away in the opposite direction (Figure 4B(iii)). We hypothesised that a Phe34 alanine substitution mutant might show reduced QseM interaction due to a loss of hydrophobic contacts with QseM, while mutants carrying alanine substitutions of Trp32 and Arg36 might show wild-type-like QseM interactions. Indeed, bacterial two-hybrid assays revealed reduced interaction of the F34A FseA mutant with QseM, whilst mutant FseA proteins carrying alanine substitutions of Trp32 or Arg36 exhibited wild-type-like QseM interactions (Figure 4A, B). Together, these observations support the role of a hydrophobic pocket forming between the FseA DUF6499 domain and QseM during the QseM-FseA binding interaction and confirm that FseA residues Trp32 and Arg36 are not involved in the interaction with QseM.
The highly conserved DUF6499 α2 residue Arg37 is strongly co-evolving with Asp210 (in DUF2285 H1) in FseA homologues (GREMLIN; Supplementary Table S4) and the two residues are in contact in all FseA structure predictions and the RovC crystal structure (RovC Arg16 and Asp193) (Figure 3D). Alanine substitutions in either FseA Arg37 or Asp210 abolished the ability of the mutant proteins to activate transcription from PrdfS (Figure 4A, B), supporting the importance of this contact in the activity of FseA. The QseM residue Asp31 of H1, which abolishes QseM activity when mutated (Figure 4C, D), is reciprocal to Asp210 of the FseA DUF2285 domain. We hypothesised that Asp31 of QseM might interact with Arg37 of FseA. Indeed, FseA carrying an alanine substitution in the Arg37 residue showed near zero interaction with QseM in bacterial two-hybrid assays (Figure 4A, B), making it probable that a salt-bridge forms between Arg37 of FseA and Asp31 of QseM. Together, these results are consistent with a model wherein the FseA DUF6499 α2 helix interacts with QseM H1 and C-terminus in a mechanism that is analogous to its interaction with the FseA DUF2285.
To visualize the potential interaction interface between QseM and the FseA DUF6499 domain, AF2 was used to produce a model of the FseA N-terminal domain fused to QseM (FseA residues 1-198 and QseM residues 17-83), such that the QseM DUF2285 domain replaced that of FseA. The resulting AF2 model placed QseM helices H1 and H4 cradling the FseA α2 helix as expected (Supplementary Figure S16A). This model placed QseM Asp31 (corresponding with Asp210 in the FseA DUF2285 domain) in contact with FseA Arg37, consistent with our prediction that these residues form a salt-bridge during the interaction of QseM and FseA. The AF2 model also placed H1 residues of QseM in contact with α1 of DUF6499. Residues in this interface notable for their conservation include Arg28 of QseM and Tyr19 of FseA. Alanine substitution of QseM Arg28 abolished repression of FseA activation of PrdfS but did not reduce binding to FseA in bacterial two-hybrid assays (Figure 4C, D), and alanine substitution of Tyr19 of FseA showed a minor decrease in binding to QseM (Figure 4A, B). These data suggest the interaction of QseM with FseA α1 is not essential for FseA-QseM binding but is required for a productive antiactivation interaction with FseA.
Rigid-body docking simulations (ClusPro) were performed with a truncated FseA structure (amino acids 9-184; N-terminus trimmed, DUF2285 domain removed) and either the QseM NMR structure (amino acids 11-83) or AF2 model. Interestingly, docking simulations of truncated FseA with the AF2-predicted QseM model, which contains H4, closely approximated the FseA DUF2285-DUF6499 interaction (Supplementary Figure S16B). The H4 helix is present in the C-terminus of all DUF2285 domain structure predictions and in RovC, whereas the C-terminus of the QseM NMR structure is disordered. It is possible that the C-terminus of QseM forms a more structured helical (H4) region upon FseA binding. QseM residues directly preceding the putative fourth helix, such as Arg68 and Trp71, are highly conserved amongst QseM homologues and are critical for FseA antiactivation and binding (Figure 4C, D), supporting a role for this region in the QseM-FseA interaction.
In summary, we confirmed that solvent-exposed QseM residues of H1 and the C-terminus are required for effective binding and antiactivation of FseA and that residues in the FseA DUF6499 α2 helix are involved in binding QseM. Together, these observations suggest that QseM likely makes the same contacts with the DUF6499 domain as the FseA DUF2285 domain does.

DISCUSSION
The DUF2285-domain proteins FseA and QseM are master regulators of ICEMlSymR7A transfer and likely control the transfer of numerous mobile genetic elements present throughout the proteobacteria. In this work, we show that the DUF2285 domain represents a previously unrecognized variant of the HTH motif. Purified 6H-MBP-FseA is dimeric in solution and binds the FseA box upstream of PrdfS, consistent with the function of FseA as a transcriptional activator. Structural predictions followed by mutational analyses revealed that FseA likely adopts a similar fold to the Yersinia pestis transcriptional activator RovC, and that the FseA DUF2285 domain exhibits an extensive positively charged surface that is critical for its transcriptional activation and DNA-binding functions. The NMR structure of the antiactivator QseM revealed that it comprises an HTH-like domain similar to the FseA DUF2285 domain, with disordered N- and C-termini. Despite the similarity of QseM to the DUF2285 of FseA, QseM has an overall negatively charged surface and is unable to bind the FseA box. Transcriptional activation and bacterial two-hybrid assays carried out with mutated and truncated FseA/QseM proteins revealed that QseM achieves antiactivation of FseA by binding its N-terminal DUF6499 domain. Both FseA and QseM DUF2285 domains are predicted to similarly contact the highly conserved α2 helix of the DUF6499 domain. FseA substitution mutants in either the DUF2285 or DUF6499 domain that were predicted to disrupt this interaction destroyed the ability of the mutant FseA protein to activate transcription, while corresponding DUF2285 substitutions in QseM prevented it from antiactivating FseA. Therefore, DUF6499 constitutes a critical structural component of FseA and represents the binding target of QseM.
The archetypical HTH domain forms a trihelical arrangement in which H2 and H3 are separated by a short, almost universally conserved 'turn' that is poor at tolerating insertions (36). In contrast, both FseA and QseM DUF2285 domains contain a substantial insertion in the turn which, in the case of FseA, includes an additional short helix, termed here H2b. Other HTH-domain proteins with extended turn motifs between H2 and H3 include the chitin sensor protein ChiS from Vibrio cholerae (37), and the Q antiterminator of lambdoid phages (38). The extended H2-H3 motifs of Q and ChiS function to enhance DNA interactions, making it possible that residues in the extended turn of the FseA DUF2285 make DNA interactions. When compared to DNA-bound HTH domains that were detected in DALI searches, the FseA H2b appears to clash with DNA bases (Supplementary Figure S17A); however, comparisons to some HTHs not detected in DALI searches, such as the DNA-bound winged-HTH of the transcriptional response regulator KdpE (PDB 4KNY (39)), placed H2b in a position that does not clash with DNA and, importantly, placed key DNA-binding H3 residues in the DNA major groove (Supplementary Figure S17B). Taken together, it is possible that H2b residues make DNA contacts that stabilize DUF2285-DNA binding and help to position key H3 residues in proximity of DNA major groove nucleobases.
The FseA box is notable because of the distance between the centre of each IR hexamer, which spans approximately two DNA turns (22 base pairs). We observed no evidence of additional binding events in our EMSAs that might suggest sequential binding of two sites by separate molecules. Therefore, individual DUF2285 domains in a single FseA dimer are likely to bind a single hexamer sequence. Modelling of FseA dimers based on an AF2-predicted homodimeric FseA model produced a plausible protein-DNA complex with individual DUF2285 domain H3s positioned in proximity of a hexamer of the FseA-box IR (Figure 6A). Overall, we propose that the extended turn and extensive positively charged surface of the DUF2285 domain function to stabilize FseA dimer binding over the span of the 28-base-pair FseA box.
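The relationship between the 22-base-pair hexamer spacing, the "two DNA turns" and the 28-base-pair box length can be checked with a short calculation. This is an illustrative sketch only: the helical repeat of about 10.5 base pairs per turn is the standard textbook value for B-form DNA and is an assumption here, not a measurement from this study, and the variable names are hypothetical.

```python
HEXAMER_BP = 6            # each inverted-repeat (IR) half-site is a hexamer
CENTRE_TO_CENTRE_BP = 22  # spacing between hexamer centres, from the text
BP_PER_TURN = 10.5        # assumed helical repeat of B-form DNA

# Number of helical turns separating the two hexamer centres.
turns_between_centres = CENTRE_TO_CENTRE_BP / BP_PER_TURN

# Full box length: centre-to-centre spacing plus half a hexamer at each end.
box_length_bp = CENTRE_TO_CENTRE_BP + HEXAMER_BP

print(f"turns between hexamer centres: {turns_between_centres:.2f}")
print(f"FseA box length: {box_length_bp} bp")
```

Under this assumption the spacing is close to a whole number of turns, which would place both hexamers on roughly the same face of the helix, in keeping with a single dimer contacting both half-sites.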
FseA homologues are widespread throughout proteobacteria. Their representation and, relatedly, the coexistence of DUF2285 and DUF6499 domains is notably underestimated due to the presence of a +1 PRF. In our curated dataset, around 61% of FseA homologues require a +1 PRF, and are thus not annotated as intact proteins by high-throughput genome annotation pipelines. We have not detected any case where a bona fide DUF6499 domain is not adjoined to a DUF2285 domain. There is a knock-on effect of this segregation of apparent open reading frames in protein domain prediction tools (e.g. Pfam (40)), which may be unable to detect structural motifs in what are, in practical terms, highly conserved single proteins. It is also possible that the DUF2285 domain was not initially identified as a variant of the HTH domain because of the unusual sequence distance between H2 and H3. Having shown here that the DUF2285 domain is a novel HTH domain variant, we propose that DUF2285 should be included in the Pfam HTH superfamily and that FseA-like variants be named Helix-extended-Turn-Helix domain-containing proteins (HeTH) to better distinguish this family of proteins. Overall, these observations suggest that other DUFs may be variants of well-characterized domains and that developments in ab initio contact and structure prediction may allow other DUFs to be more readily linked to existing structural families.
QseM likely evolved from a gene duplication of an FseA ancestor that retained DUF6499 binding while losing the ability to bind DNA and homo-oligomerize. While most HTH domains bind to DNA, some have evolved to mediate protein-protein interactions or make structural units in enzymatic complexes (41-43). We propose a model in which QseM prevents FseA from adopting its native conformation required for transcriptional activation by forming a QseM-DUF2285–FseA-DUF6499 heterodimer, wherein QseM mimics the FseA DUF2285 domain (Figure 6B).
Protein structure prediction and comparison with the RovC crystal structure suggest that the DUF2285 domain C-terminus contains a short helix (H4), which is disordered in the QseM NMR structure. Disorder-to-order transitions are a well-described feature of regulatory networks. For example, the interaction domains of the p160 and p300 hormone receptor coactivators are intrinsically unstructured; however, upon interaction they form a structured trihelical arrangement (44). It is thus likely that the QseM C-terminus is disordered in isolation but adopts an α-helical structure when interacting with FseA. The inherent flexibility of this region may allow it to contribute to more diverse protein-protein interactions, such as the currently structurally uncharacterized interaction between QseM and the unrelated quorum-sensing transcriptional activator, TraR. QseM may bind and sequester FseA while FseA is partially unfolded, or prior to the formation of transcriptionally active dimers. This suggestion is based on the preliminary observations that purified 6H-QseM does not bind dimeric 6H-MBP-FseA or inhibit its DNA-binding activity in vitro, despite 6H-QseM being capable of antiactivating 6H-MBP-FseA in vivo. It is also possible that in vivo QseM acts early in the life of FseA, perhaps binding to the DUF6499 domain during its translation. The positioning of the PRF that results in the fusion of the DUF6499 domain to the remainder of FseA is curious because the highly conserved adjacent WGL sequence is encoded by a ribosomal binding site-like mRNA sequence (UGGGGG). This sequence may stall translation, allowing the nascent FseA DUF6499 domain to bind to QseM prior to translation of the remainder of FseA. Given that overexpression of FseA is bacteriostatic in M. japonicum R7A (mediated through induction of PrdfS (6, 7)), acute negative regulation of functional FseA is essential for R7A cells to survive. This selection pressure has likely led to the evolution of the observed multi-layered transcriptional, translational, and post-translational repression of FseA through QseM antiactivation of TraR, the +1 PRF and QseM antiactivation of FseA, respectively.
In summary, we show that the DUF2285 and DUF6499 domains form an interacting pair. While both domains are commonly found within a single FseA-family protein that is capable of transcriptional activation, 'loss of function' QseM-family variants containing only the DUF2285 domain are capable of binding and inhibiting the conjoined activator. The detection and determination of function of the DUF2285 and DUF6499 domains had been obscured in genomic analyses but has been resolved here by a comprehensive structure-function analysis.

DATA AVAILABILITY
The QseM structure and data have been deposited in the PDB with the code 7UQT. SAXS data have been deposited to the SASBDB under the accession code SASDNM8.