Structural insights into CodY activation and DNA recognition

Abstract
Virulence factors enable pathogenic bacteria to infect host cells, establish infection, and contribute to disease progression. In Gram-positive pathogens such as Staphylococcus aureus (Sa) and Enterococcus faecalis (Ef), the pleiotropic transcription factor CodY plays a key role in integrating metabolism and virulence factor expression. However, to date, the structural mechanisms of CodY activation and DNA recognition are not understood. Here, we report the crystal structures of CodY from Sa and Ef in their ligand-free form and their ligand-bound form complexed with DNA. Binding of the ligands, branched-chain amino acids (BCAAs) and GTP, induces conformational changes in the form of helical shifts that propagate to the homodimer interface and reorient the linker helices and DNA-binding domains. DNA binding is mediated by a non-canonical recognition mechanism dictated by DNA shape readout. Furthermore, two CodY dimers bind to two overlapping binding sites in a highly cooperative manner facilitated by cross-dimer interactions and minor groove deformation. Our structural and biochemical data explain how CodY can bind a wide range of substrates, a hallmark of many pleiotropic transcription factors. These data contribute to a better understanding of the mechanisms underlying virulence activation in important human pathogens.


INTRODUCTION
Staphylococcus aureus (Sa) and Enterococcus faecalis (Ef) are two Gram-positive bacteria that can reside in humans commensally or as pathogens able to cause life-threatening infections. The transition between their commensal and pathogenic states is controlled by a complex network of transcription factors, which adjust gene expression and ensure adaptation to diverse host environments. CodY is a pleiotropic, global transcription factor found in almost all low G+C Gram-positive bacteria, where it regulates the transcription of several hundred genes. The majority of CodY target genes encode proteins involved in metabolic pathways; however, CodY also directly or in-

The key role of CodY in the transition between commensal and pathogenic states warrants a detailed understanding of the molecular mechanisms underlying CodY-dependent gene regulation. Structural analysis of free and ligand-bound CodY from Bacillus subtilis (Bs), B. cereus (Bc) and Sa has provided important insights into the mechanism of BCAA and GTP binding and ligand-induced domain movements (15, 18). However, the link between ligand binding and DNA binding remains poorly understood and, above all, the basis of CodY target-site recognition and cooperativity is not known. Here, we report biochemical data and X-ray structures of CodY from Sa and Ef (referred to as SaCodY and EfCodY) in ligand-free form and in complex with a 30-nt DNA consensus sequence comprising two 15-nt binding sequences with a 6-nt overlap. Our data illuminate the ligand-induced structural changes that govern CodY activity and explain how cooperative DNA binding enables two CodY dimers to recognize overlapping binding sites of highly variable sequence. Our analysis highlights mechanistic similarities and differences between CodY proteins and provides a stepping-stone for therapeutic targeting of the virulence regulator CodY.

Cloning, expression and purification
The full-length SaCodY and EfCodY coding sequences (UniProt Q2FHI3 and Q834K5) were PCR-amplified from genomic DNA and ligated into a pETHis 1a vector (26) using the NcoI and Acc65I restriction sites. Mutations in the coding sequences were generated using overlap extension PCR. All constructs were verified by DNA sequencing. For CodY overexpression, the pETHis 1a-CodY vectors were transformed into Escherichia coli strain BL21(DE3)pLysS. An overnight culture of the transformed strain in LB medium was diluted 1/100 and cultured at 37 °C to an OD600 of 0.4-0.6. At this density, the temperature was reduced to 18 °C and CodY expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). After IPTG addition, the cells were cultured for a further ∼16 h at 18 °C before harvest. After cell lysis by sonication, CodY was captured from the supernatant on Ni-NTA (Qiagen). After elution with imidazole, CodY was incubated for ∼12 h at 4 °C with Tobacco Etch Virus (TEV) protease (100:1 molar ratio) to cleave the N-terminal His6-tag. The cleavage reaction, dialyzed with a 15-kDa cutoff membrane against imidazole-free buffer, was reloaded onto Ni-NTA to capture uncleaved CodY and the His6-tagged TEV protease. The flow-through contained cleaved, full-length CodY with a glycine and an alanine residue (remnants of the TEV cleavage site) preceding the start methionine residue. CodY was further purified by size-exclusion chromatography on a Superdex S200 column (Cytiva). Note that the NaCl concentration was kept at a minimum of 200 mM throughout the CodY purification procedure to prevent CodY aggregation. SaCodY was concentrated to 360 µM (the given molarities refer to CodY protomer concentrations) in the Superdex S200 buffer containing 20 mM Tris-HCl pH 8, 200 mM NaCl. EfCodY was concentrated to 690 µM in the Superdex S200 buffer containing 20 mM Tris-HCl pH 8, 500 mM NaCl.

Nucleic Acids Research, 2023, Vol. 51, No. 14

Bio-layer interferometry
5'-Biotinylated DNA oligos (Eurofins) were annealed with non-biotinylated complementary DNA oligos by mixing them 1:1.1, heating the mixture for 5 min in boiling water, and cooling slowly. 100 nM dsDNA, containing a 3-nt single-stranded overhang at the biotinylated 5'-end, was captured on Streptavidin (SA) biosensors of the Octet system (Sartorius). The loaded biosensors were subsequently incubated at 25 °C and a shaking speed of 1000 rpm with different concentrations of CodY in a buffer containing 20 mM Tris-HCl pH 8, 150 mM NaCl, 2 mM MgCl2 and 0.1% NP-40. In separate experiments, each performed in independent triplicates, 2 mM GTP and 10 mM of either Ile, Leu or Val were added to the buffer singly or in combination. The times for the baseline, association and dissociation steps were 300, 600 and 600 s, respectively. CodY-DNA binding and dissociation were measured in real time. After control and reference subtraction, as well as baseline alignment, kinetic parameters were analysed using the Octet Analysis Studio Software. The binding curve data of twofold CodY dilution series and a 2:1 interaction model for global curve fitting were used to determine the dissociation constant (KD). The KD at equilibrium was calculated with the steady state equation:
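The steady-state expression itself is not reproduced in this text. As a hedged illustration only, the Python sketch below assumes the standard single-site isotherm, R_eq = R_max * C / (KD + C), and estimates KD from the equilibrium responses of a twofold dilution series by a simple grid search. The function names, the concentration range and the synthetic data are hypothetical and are not taken from the Octet Analysis Studio software or from the measurements reported here.

```python
def steady_state_response(conc, kd, rmax):
    """Equilibrium response for a single-site isotherm (assumed model)."""
    return rmax * conc / (kd + conc)


def twofold_series(top_conc, n_points):
    """Twofold dilution series, highest concentration first."""
    return [top_conc / (2 ** i) for i in range(n_points)]


def fit_kd(concs, responses, kd_min=1e-3, kd_max=1e3, n_grid=6001):
    """Least-squares estimate of (KD, Rmax) by a log-spaced grid search over KD.

    For each trial KD the best Rmax has a closed form, because the
    model is linear in Rmax.
    """
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        occupancy = [c / (kd + c) for c in concs]
        rmax = (sum(f * r for f, r in zip(occupancy, responses))
                / sum(f * f for f in occupancy))
        sse = sum((r - rmax * f) ** 2 for f, r in zip(occupancy, responses))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Synthetic, noise-free data (hypothetical values, arbitrary units).
concs = twofold_series(800.0, 7)
data = [steady_state_response(c, 50.0, 1.2) for c in concs]
kd_est, rmax_est = fit_kd(concs, data)
print(f"KD ~ {kd_est:.1f}, Rmax ~ {rmax_est:.2f}")
```

The grid search is chosen here only to keep the sketch free of external dependencies; a nonlinear least-squares routine would normally be used, and a 2:1 binding model would need additional terms.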

Mass photometry
Mass photometry analysis was performed using a Refeyn OneMP mass photometer (Refeyn Ltd). Movies of 6000 frames (60 s) were acquired using AcquireMP software with default settings. Briefly, contrast-to-mass calibration was performed using a native marker protein standard mixture composed of eight proteins from 20 to 1200 kDa (NativeMark Unstained Protein Standard, Thermo Fisher) in phosphate-buffered saline (PBS). Prior to the measurements, the objective was focused on the glass-buffer interface with 8 µl of 20 mM Tris-HCl pH 8, 200 mM NaCl. Movies were acquired after addition of 8 µl of 100 nM CodY protein in the same buffer. The recorded data were processed using DiscoverMP software (Refeyn Ltd). The data were plotted as mass distribution histograms, and the distribution peaks were fitted with Gaussian functions to obtain the average molecular mass.
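To illustrate how a peak in a mass distribution can be turned into an oligomeric-state assignment, the sketch below estimates the centre and width of a well-isolated peak from the events inside a mass window and divides the centre by a protomer mass. This is a simplified stand-in for the Gaussian fitting done in DiscoverMP, not a reproduction of it; the event masses, the windows and the 29 kDa protomer mass are hypothetical placeholders.

```python
import statistics


def gaussian_peak(masses_kda, lo, hi):
    """Mean and standard deviation of events inside [lo, hi] kDa.

    For a well-isolated peak these approximate the centre and width
    that a Gaussian fit to the histogram would return.
    """
    window = [m for m in masses_kda if lo <= m <= hi]
    return statistics.mean(window), statistics.stdev(window)


def oligomeric_state(peak_mass_kda, protomer_mass_kda):
    """Nearest whole number of protomers for a measured peak mass."""
    return max(1, round(peak_mass_kda / protomer_mass_kda))


# Hypothetical single-molecule events (kDa) forming two peaks.
events = [27, 29, 30, 28, 31, 58, 60, 57, 61, 59]
PROTOMER_KDA = 29.0  # placeholder value, not taken from the text

for lo, hi in [(20, 40), (45, 75)]:
    centre, width = gaussian_peak(events, lo, hi)
    n = oligomeric_state(centre, PROTOMER_KDA)
    print(f"{lo}-{hi} kDa: centre {centre:.1f} kDa, sd {width:.1f}, {n}-mer")
```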

Size-exclusion chromatography coupled to multi-angle light scattering (SEC-MALS)
SEC-MALS was performed using an ÄKTA pure system (GE Healthcare) coupled to a miniDAWN TREOS II detector and an OptiLab T-rEX online refractive index detector (Wyatt Technology). The absolute molar mass was calculated by analysing the scattering data with the ASTRA analysis software package, version 7.2.2.10 (Wyatt Technology). Bovine serum albumin (BSA) was used for calibration, and proteins were separated on a Superdex 75 10/300 analytical column (GE Healthcare) at a flow rate of 0.4 ml/min. CodY (200 µl of 35 µM) was injected and eluted in 20 mM Tris-HCl pH 8, 200 mM NaCl. The refractive index increment was set at 17.66 M for EfCodY and 5.37 M for SaCodY, and the extinction coefficient for ultraviolet detection at 280 nm was calculated from the primary sequences.

Crystallization
Crystals were grown by sitting-drop vapour diffusion at 18 °C and appeared within 2-10 days. For crystallization of ligand-free CodY, SaCodY (270 µM) in 150 mM NaCl was mixed 1:1 with 0.2 M ammonium sulphate, 0.1 M sodium acetate pH 4.6, and 25% (w/v) polyethylene glycol (PEG) 4000. Crystals were cryo-protected by a brief soak in mother liquor supplemented with 35% (w/v) PEG 4000. EfCodY (600 µM) in 450 mM sodium chloride was mixed 1.3:1 with 0.2 M ammonium phosphate, 2.5% ethanol and 23% (w/v) PEG 3350. Crystals were cryo-protected by a brief soak in mother liquor supplemented with 35% (w/v) PEG 3350. For crystallization of the CodY-DNA complexes, the DNAs were prepared as follows: DNA oligos (Eurofins) were dissolved in 10 mM Tris-HCl pH 8, 50 mM NaCl, and 1 mM ethylenediaminetetraacetic acid (EDTA) to a final concentration of 0.4 mM. Complementary DNA oligos were mixed 1:1 and annealed by placing the mixture in boiling water for 5 min, followed by slow cooling. The CodY-DNA complexes were prepared by mixing CodY with DNA at a molar ratio of 4:1 to final concentrations of 140 µM and 35 µM (2 mM GTP, 20 mM Ile) for the Sa-complex, and 200 µM and 60 µM (4 mM Leu) for the Ef-complex. The mixtures, with final NaCl concentrations of 150 mM, were incubated for at least 4 h at room temperature (RT). Several DNAs from 28 to 32 base pairs (bp) in length with different 5'- and 3'-termini (blunt-ended or sticky-ended) containing sequence-optimized overlapping binding sites (24) were used for crystallization trials. Diffracting crystals for both SaCodY and EfCodY were obtained with a blunt-ended 30-bp DNA (5'-GATAATTTTCAGAATTTTCAGAAAATTTAG-3'; CodY consensus sequence is highlighted in italics). For this, the SaCodY-DNA complex was mixed 1.5:1 with 0.15 M ammonium sulphate, 0.1 M MES pH 5.4, and 25.5% (w/v) PEG 4000. Crystals were cryo-protected by a brief soak in mother liquor supplemented with 35% (w/v) PEG 4000. The EfCodY-DNA complex was mixed 2:1 with 0.01 M cobalt chloride, 0.01 M manganese chloride, 0.1 M sodium acetate pH 4.6, and 1 M 1,6-hexanediol. Crystals were cryo-protected by a brief soak in mother liquor supplemented with 20% glycerol.
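The complex preparation described above reduces to a dilution calculation (C1 x V1 = C2 x V2 for each component). The sketch below works through the Sa-complex as an example. The stock concentrations are assumptions for illustration: 360 µM for SaCodY (the concentration reached after purification) and 200 µM for the annealed duplex (inferred from mixing two 0.4 mM oligos 1:1). The 100 µl total volume is arbitrary, and the volume taken up by GTP and Ile additions is ignored.

```python
def mix_volumes(total_ul, finals_um, stocks_um):
    """Volume (µl) of each stock needed to reach the requested final
    concentrations in total_ul, plus the buffer volume filling the rest."""
    volumes = {
        name: total_ul * finals_um[name] / stocks_um[name]
        for name in finals_um
    }
    buffer_ul = total_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("stocks are too dilute for the requested finals")
    return volumes, buffer_ul


stocks = {"CodY": 360.0, "DNA": 200.0}  # µM, assumed stock concentrations
finals = {"CodY": 140.0, "DNA": 35.0}   # µM, Sa-complex values from the text

volumes, buffer_ul = mix_volumes(100.0, finals, stocks)
ratio = finals["CodY"] / finals["DNA"]   # protomer:duplex molar ratio

for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} µl")
print(f"buffer: {buffer_ul:.1f} µl, molar ratio {ratio:.1f}:1")
```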

Structure determination, model building and refinement
Diffraction data were collected at 100 K at the MAX IV synchrotron in Lund (SaCodY, beamline BioMAX, λ = 0.9762 Å) and the European Synchrotron Radiation Facility in Grenoble (SaCodY-Ile-GTP-DNA, beamline ID23-2, λ = 0.8731 Å; EfCodY, ID30B, λ = 0.9763 Å; and EfCodY-Leu-DNA, ID30B, λ = 0.9116 Å). Diffraction images were processed with XDS (27) and scaled and merged using AIMLESS from the CCP4 software suite (28). All structures were determined by molecular replacement with the program PHASER from the PHENIX program suite (29) using the ligand-bound SaCodY structure, PDB code 5ey1 (15), as the initial search model. The atomic models were manually built using the program COOT (30) and refined with PHENIX Refine (29) using non-crystallographic symmetry (NCS) restraints (31). Each chain of the pseudopalindromic DNA is numbered from -2 to +27. Note that crystal packing interactions in the SaCodY- and EfCodY-DNA complexes did not involve linker helices and DBD domains. Interface residues and nucleotides were well defined in the electron density, but at the present resolution, we were unable to confidently resolve interfacial water molecules. Data collection and refinement statistics are shown in Supplementary Table S1. The diffraction data of the SaCodY-Ile-GTP complex with DNA were processed and refined in space group P6122. The asymmetric unit consisted of one SaCodY dimer and one ssDNA comprising both DNA strands, refined with half occupancy. The asymmetric unit of the crystal and the biological assembly of the SaCodY-Ile-GTP complex with DNA are shown in Supplementary Figure S1. POLDER maps (32) were used to verify Ile and GTP binding to protomers A and B in the SaCodY-DNA complex structure, and Leu binding to protomers A and B in the EfCodY-DNA complex structure. For all ligands, the polder map was likely to show the omitted atoms (CC(1,3) values > 0.75; Supplementary Figure S2). GTP binding was better defined in the electron density of protomers A and C than in protomers B and D. The following residues could not be modelled due to lack of density, suggesting that these residues are flexible: ligand-free EfCodY, Lys260 in chain A, Val23-Glu26 in chain C, and Asn182-Lys260 in chain C (furthermore, almost all side chains within the DBD domain of protomer B were poorly defined, resulting in a high number of RSRZ outliers); EfCodY-Leu-DNA, Asn18-Val23 in chain C. The DNA inter-strand phosphate-phosphate distances were calculated using the 3DNA program (33). Figures were prepared with ICM browser (https://www.molsoft.com) and CCP4mg (34).
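For readers more used to photon energies than wavelengths, the beamline settings listed above can be converted with E = hc/λ. The short sketch below does this for the four datasets; the constant hc = 12.398419843 keV·Å is the standard value, while the function name and dictionary labels are illustrative only.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant x speed of light, keV·Å


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in Å."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths (Å) as stated for each dataset.
datasets = {
    "SaCodY": 0.9762,
    "SaCodY-Ile-GTP-DNA": 0.8731,
    "EfCodY": 0.9763,
    "EfCodY-Leu-DNA": 0.9116,
}

for name, wavelength in datasets.items():
    energy = photon_energy_kev(wavelength)
    print(f"{name}: {wavelength} Å -> {energy:.2f} keV")
```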

Leucine promotes EfCodY DNA binding
To gain a better understanding of how ligand binding promotes DNA binding, we used bio-layer interferometry. For this, biotinylated DNA substrates were immobilised on streptavidin biosensors and incubated with CodY in solution to analyse the affinity of the interaction. As a DNA substrate, we used the sequence of the well-characterized overlapping CodY-binding sites in the hutP operator of Bs (24). For comparison, we analysed the affinity of both SaCodY and EfCodY in the absence and presence of the ligands BCAAs and GTP (Figure 1). The analysis showed that SaCodY was activated by Ile (and to a lesser extent by Leu) and GTP as expected, but only if both were present simultaneously. Furthermore, we found that, in contrast to SaCodY, only Leu activated DNA binding of EfCodY. Consistent with prior predictions based on sequence homology (15, 16), GTP did not activate EfCodY. Together with Leu, GTP reduced activation, presumably by interacting with the protein and preventing Leu-induced conformational changes.

Ligand-free SaCodY is monomeric and ligand-free EfCodY is dimeric
Next, we solved the crystal structures of ligand-free SaCodY and EfCodY. Both structures are similar to previously reported CodY structures in that they have physically separated GAF and DBD domains connected by the extended linker helix (15, 18). Unexpectedly, however, the structures revealed a previously undescribed monomeric form of SaCodY and a dimeric EfCodY (Figure 2A). Mass photometry analysis of ligand-free SaCodY and EfCodY in solution supported the oligomerization states observed in our crystal structures. Nevertheless, ligand-free SaCodY can also form dimers, as confirmed with SEC-MALS (Figure 2B, C). This suggests that a monomer-dimer equilibrium may be a factor in the regulation of SaCodY activity. The previously reported crystal structures of ligand-free BsCodY (18) and BcCodY (15) are tetrameric, but we found no evidence for tetrameric forms of SaCodY and EfCodY in our mass photometry or SEC-MALS experiments. This is in line with prior suggestions that the tetrameric form in the crystal lattice of ligand-free BsCodY is caused by the high CodY concentrations used for crystallization. In other words, the tetramer interface is probably a crystallization artefact (18).
The GAF domains of the ligand-free EfCodY protomers pack with close to perfect 2-fold symmetry. However, this symmetry breaks down in the C-terminal part of the linker helices as well as the DBD domains, which all assume different orientations in the four protomers in the asymmetric unit. Furthermore, for protomer chain C, parts of the linker helix and the DBD domain could not be modelled (residues 181-260). This indicates that these regions are flexible in ligand-free EfCodY, consistent with previous structures of CodY (15). The asymmetric unit of the ligand-free SaCodY crystals contains two well-structured monomers with their GAF and DBD domains tightly packed head-to-tail (Supplementary Figure S3A, B). Interestingly, the buried surface area in the head-to-tail packing of two ligand-free SaCodY monomers in the crystal is similar in size to the buried surface area of two protomers in the Ile/GTP-bound dimer (∼1600 Å², PDB code 5ey0 (15)). Therefore, it is possible that the dimer observed in the SEC-MALS experiments of ligand-free SaCodY (Figure 2C) is the non-biological head-to-tail dimer observed in the crystals. To determine whether the DBD domain of ligand-free SaCodY is flexible in solution, we used hydrogen-deuterium exchange mass spectrometry, which showed that the DBD domain is indeed flexible (Supplementary Table S2 and Supplementary Figure S3C, D).

Ligand binding induces a conformational change that is propagated to the linker helix
The GAF domain consists of a five-stranded β-sheet (S1-S5), packed on the inner face against a 3-helix bundle (H1, H2, and the N-terminal part of the linker helix). The outer face of the β-sheet is packed against a more irregular structure of two long loops comprising α-helices H3 and H4. In the structure of the Ile/GTP-bound SaCodY, the GTP is located close to the dimer interface. Two residues from the linker helix and six residues from three regions, namely the loops connecting H1-H2 (motif 1) and S1-S2 (motif 2) and the H3-H4 linker (motif 3), interact tightly with the GTP molecule (15). In contrast, the Ile binding site is distant from the dimer interface; it is located on the outer face of the β-sheet, where two long loops including α-helices H3 and H4 wrap around the Ile molecule (15, 18, 35).
To elucidate the structural changes induced in the GAF domain by Ile and GTP, we compared the ligand-free and ligand-bound SaGAF domain structures. The most apparent structural change occurs on the outer face of the β-sheet, in the region between β-strands S2 and S3, which includes helix H3. Here, Ile binding induces a large shift of the entire helix H3 (residues Gln60-Glu68) (Figure 3A, Supplementary Figure S4). The ∼14 Å movement of the Cα atom of Arg61 from its position on the surface in the ligand-free structure exemplifies the magnitude of this shift in the ligand-bound structure (Figure 3A). In more detail, in the absence of ligands, Arg69 in the H3-H4 linker extends its side chain into the vacant Ile binding site, making a hydrogen bond to the main-chain carbonyl group of Val97.
Following ligand binding and the H3-helix shift, Arg61 (located at the H3 N-terminus) occupies the position equivalent to that of Arg69. There it forms a salt bridge with the carboxylate group of the bound Ile molecule (Figure 3A). Meanwhile, Arg69 moves ∼11 Å towards the S1-S2 β-hairpin. Together with His70 and Ile71, it forms a β-sheet interaction with Gly46, Lys47 and Ile48 in the S1-S2 loop. These two regions, the S1-S2 loop and the H3-H4 linker, compose the GTP-binding motifs 2 and 3, respectively (15), and together form the binding pocket for the GTP phosphates. It appears that the induced β-sheet interactions adjust residues Ser43, Arg44, Arg45 and Lys47 in motif 2 to allow direct phosphate interactions and stabilize the large displacement of motif 3. This displacement positions His70 sufficiently close to form a water-mediated phosphate interaction.
The guanine moiety of GTP makes Watson-Crick-like interactions with the main-chain nitrogen atom of Phe24, the main-chain oxygen atom of Val22, and the side-chain carboxyl group of Glu153. Furthermore, the guanine base is stabilized by stacking on the side chain of Phe24. Residues Val22 and Phe24 reside in the H1-H2 loop region, which composes the GTP-binding motif 1 (15). In the ligand-free structure, elevated B-factors suggest that the H1-H2 loop region (residues Leu14-Phe24) has increased flexibility. The formation of the Watson-Crick-like interactions of the guanine base of GTP induces a conformational change that stabilizes the H1-H2 loop (Figure 3A). The loop backbone now hydrogen bonds with the side chain of the linker-helix residue Glu153. Moreover, residues Leu14, Gln15 and Lys16 become α-helical, extending H1 by one turn and forming hydrogen bonds with residues in protomer B (e.g. with Thr148 in the linker helix, and with Gly118, Gly119 and Gly120 in the S3-S4 β-hairpin loop). Superimposition of the monomeric ligand-free and dimeric activated SaGAF domains shows that residues Gln15-Lys18 in the ligand-free domain cause steric clashes with the linker helix and S3-S4 loop of protomer B. This may explain the ... Figure S5).
In conclusion, our data suggest that (i) Ile binding induces structural changes in the GTP-binding motifs 2 and 3, which enable the GTP phosphate interactions necessary for efficient GTP binding, and (ii) the guanine base of the bound GTP nucleotide induces the structural changes in motif 1 that are then propagated to the protomer-protomer interface. This implies that Ile and GTP binding are structurally linked and that Ile and GTP act synergistically. Consistent with this, the affinity of SaCodY for the hutP operator sequence showed that efficient DNA-binding activity is indeed dependent on the presence of both ligands (Figure 1).

Activated SaCodY and EfCodY are structurally similar
We determined the crystal structure of the ligand-free form of EfCodY; however, we did not manage to obtain diffracting crystals of Leu-bound EfCodY. Fortunately, the crystal structure of the EfCodY-DNA complex, presented in more detail below, had Leu-bound GAF domains. The crystal structure of the SaCodY-DNA complex, also presented below, had Ile/GTP-bound GAF domains. The Ile/GTP-bound GAF domains of SaCodY in its free and DNA-bound forms are essentially identical, demonstrating that the GAF domains do not undergo a structural change upon DBD domain-DNA interaction. This allowed us to characterize Leu activation of EfCodY by analysing the GAF domains in the ligand-free and the Leu/DNA-bound EfCodY, and to compare Leu activation of EfCodY with Ile/GTP activation of SaCodY.
The structures of the activated GAF domains in Leu-bound EfCodY and Ile/GTP-bound SaCodY are remarkably similar. However, there are noticeable differences in the structural changes leading to the activated GAF domains. In ligand-free EfCodY, residues Leu60-Gln78 (Glu55-Ser73 in SaCodY; the SaCodY numbering is shifted by -5 relative to EfCodY) form an extended loop, with a few loop residues involved in crystal contacts. The fold of the loop is identical in all four protomers even though the crystal contacts are not, suggesting that the loop structure in the crystal also represents the structure in solution. The most apparent difference from the loop structure in SaCodY is the absence of helix H3 (Supplementary Figure S6). Nevertheless, Lys74 (Arg69 in SaCodY) extends into the vacant Leu binding pocket. Leu binding induces H3 formation, associated with a large displacement of loop residues Lys74, Lys75 and Phe76. These residues now reside in the H3-H4 linker, forming a β-sheet interaction with Gly51, Asp52 and Leu53 in the S1-S2 loop, similar to what we observed in SaCodY (Figure 3B). Arg66 moves into the position of Lys74 and occupies the position equivalent to Arg61 in Ile/GTP-activated SaCodY, forming a salt bridge with the carboxylate group of the bound Leu molecule. The H1-H2 loop, which in EfCodY contains an insertion of four additional residues (25-AlaGluLeuPro-28), also changes its conformation upon Leu binding, including breaking of the hydrogen bond between the loop backbone and Gln158 in the linker helix (the homologue of Glu153 in SaCodY). Moreover, similar to what we observed in SaCodY, residues Gln16, Lys17 and Asn18 become α-helical. These residues are in contact with the S3-S4 loop and linker helix of protomer B in both the ligand-free and Leu-bound EfCodY dimer. The H1 extension therefore triggers a concerted movement of the S3-S4 loop and linker helix, leading to a strikingly different 4-helix bundle arrangement at the GAF dimer interface (Figure 4). Importantly, Lys74 in the H3-H4 linker, stabilized by the β-sheet, has now moved close enough to the H1-H2 loop to form a direct salt bridge with the insertion residue Glu26. In EfCodY, the structural changes in the BCAA-binding site are thus directly coupled to changes in the H1-H2 loop, as opposed to via GTP in SaCodY. Consistent with this notion, DNA binding in EfCodY can be activated by Leu alone (Figure 1). To further confirm the importance of Lys74 for EfCodY activation, we analysed a Lys74Ala mutant. Indeed, the mutant binds Leu with an affinity comparable to that of the wild type but has greatly reduced DNA-binding activity (Supplementary Figure S7). This argues for a crucial role of Lys74 in propagating the structural changes in the Leu-binding site to the H1-H2 loop and protomer-protomer interface.
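Because the comparison above switches repeatedly between EfCodY and SaCodY numbering, a small lookup can help when reading the two structures side by side. The sketch below applies the -5 shift stated in the text and checks it against the homologous residue pairs mentioned in this article. It assumes that the constant shift holds for every residue C-terminal to the EfCodY H1-H2 loop insertion (residues 25-28); residues at or before the insertion are deliberately left unmapped, and the function name is illustrative.

```python
EF_INSERTION_END = 28   # last residue of the EfCodY 25-AlaGluLeuPro-28 insertion
EF_TO_SA_SHIFT = -5     # numbering shift stated in the text


def ef_to_sa(ef_resnum):
    """SaCodY residue number homologous to an EfCodY residue number.

    Returns None for residues at or before the H1-H2 loop insertion,
    where the constant shift is not assumed to apply.
    """
    if ef_resnum <= EF_INSERTION_END:
        return None
    return ef_resnum + EF_TO_SA_SHIFT


# (EfCodY, SaCodY) residue numbers paired in the text.
PAIRS_FROM_TEXT = [
    (60, 55), (78, 73),     # loop boundaries Leu60-Gln78 / Glu55-Ser73
    (66, 61),               # Arg66 / Arg61
    (74, 69),               # Lys74 / Arg69
    (158, 153),             # Gln158 / Glu153
    (180, 175), (250, 245)  # DBD superposition range
]

for ef, sa in PAIRS_FROM_TEXT:
    print(f"EfCodY {ef} -> SaCodY {ef_to_sa(ef)} (text: {sa})")
```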

SaCodY and EfCodY in complex with DNA
To understand the structural basis of DNA recognition and cooperativity in CodY, we determined crystal structures of SaCodY and EfCodY in complex with DNA. To create high-affinity CodY binding sites for this purpose, we designed DNAs that contained overlapping binding sites with optimized sequence similarity to the consensus sequence (Figure 5A). We obtained crystals of a 30-bp DNA bound to SaCodY and EfCodY and determined the structures at a resolution of 3-3.2 Å. In the crystals, two CodY dimers bind to a single DNA duplex. The SaCodY-DNA complex was co-crystallized with Ile and GTP (Figure 5B). Both ligands are clearly visible in the electron density of the four independent protomers. The EfCodY complex was co-crystallized in the presence of Leu, yet Leu bound only to protomers A and B (dimer 1) (Figure 5C, Supplementary Figure S2). In protomers C and D (dimer 2), residues Glu58-Lys75, which are involved in the H3-helix shift, had overall weaker electron density compared to ligand-free EfCodY. Furthermore, the H1-H2 loop defined by residues Asn19-Glu26 had weaker electron density in all four protomers.

Description of CodY dimers bound to DNA
In the CodY-DNA complex, the SaCodY and EfCodY dimers adopt a dumbbell-shaped structure in which the dimerized GAF and DBD domains form two physically separated lobes connected by the extended linker helices (Figure 5). The DNA-bound CodY dimer is stabilized by polar and hydrophobic interactions between the CodY protomers, comprising residues in both the GAF and DBD lobes as well as the linker helices. The dimer interface buries ∼1900 Å² of solvent-accessible surface area, with the GAF lobe accounting for ∼67% of the interface and the DBD lobe for ∼22%. In the GAF lobe, H1 and the linker helix form the 4-helix bundle with a mainly hydrophobic protomer-protomer interface. In the DBD lobe, the C-terminal ends of the recognition helices are wedged in between the C-terminal ends of the linker helices. Mainly polar interactions are formed between recognition helix residues from both protomers, assembling an unusual recognition helix dimer, and between recognition helix residues of one protomer and linker helix residues of the other protomer (Figure 6). Residues in the entire CodY protomer-protomer interface are highly conserved, most notably in the DBD domains, where the recognition helix interface residues are invariant in all CodY proteins (Supplementary Figure S8).

Spatial arrangement of CodY dimers bound to DNA
A prominent feature of the DNA complex structure is the relative spatial arrangement of the two CodY dimers. The dimers are related by a 60° rotation and a 30-Å translation along the DNA, and bind to one side of the DNA helix. The spatial relationship of the two dimers of SaCodY is defined by a crystallographic axis. In the Ef-complex, the DBD domains of the two dimers are related by a 60° rotation as in the Sa-complex; however, their GAF domains are related by a ∼45° rotation due to a shift in the position of the N-terminal parts of the linker helices (Figure 5). This dimer arrangement is stabilised not only by interactions of each dimer with the DNA, but also by direct interactions between the two dimers, referred to as cross-dimer interactions. A cross-dimer interaction is formed at the DBD domains, where the S7-S8 β-hairpin of one dimer and the H6 N-terminus of the other dimer reciprocally interact with each other, covering an interface of ∼380 Å² of solvent-accessible surface area. The interface is rigid and essentially the same for the Sa- and Ef-complexes. Interactions are mediated by residues Arg238, Leu240 and Tyr246 (S7-S8 β-hairpin) and 184-LeuSerTyrSer-187 (H6 N-terminus, EfCodY numbering) (Figure 7A). These residues are strictly conserved in all CodY proteins except for Tyr246, which is also found as Phe or His. As noted above, we found no experimental evidence (by either mass photometry or SEC-MALS) that SaCodY or EfCodY forms tetramers in the absence of DNA. This argues that the stable cross-dimer interaction seen in the crystal is DNA-dependent.
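The reported arrangement of the two dimers can be compared with what idealized B-DNA geometry would predict for two 15-nt sites that overlap by 6 nt, that is, sites whose start positions are 9 bp apart. The sketch below assumes textbook B-DNA parameters (3.4 Å rise per bp and 10.5 bp per turn); these values are assumptions, not measurements from the crystal structures, and the DNA in the complex is described as deformed, so only rough agreement should be expected.

```python
RISE_PER_BP = 3.4    # Å per base pair, idealized B-DNA (assumption)
BP_PER_TURN = 10.5   # base pairs per helical turn, idealized B-DNA (assumption)


def site_offset(site_len, overlap):
    """Distance in bp between the start positions of two overlapping sites."""
    return site_len - overlap


def relative_placement(offset_bp):
    """Translation (Å) and signed rotation (degrees, in the range -180 to 180)
    relating two proteins bound offset_bp apart on an ideal B-DNA helix."""
    translation = offset_bp * RISE_PER_BP
    rotation = (offset_bp * 360.0 / BP_PER_TURN) % 360.0
    if rotation > 180.0:
        rotation -= 360.0
    return translation, rotation


offset = site_offset(15, 6)
translation, rotation = relative_placement(offset)
print(f"offset {offset} bp: {translation:.1f} Å, {rotation:.1f} degrees")
```

Under these assumptions the translation comes out close to the reported 30 Å, and the magnitude of the rotation is of the same order as the reported 60°.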

Cross-dimer interactions are required for cooperative DNA binding
To assess the functional relevance of the DBD-DBD cross-dimer interaction for cooperativity, we replaced residues Tyr186, Arg238, Leu240 and Tyr246 in EfCodY with Ala and evaluated the quadruple mutant for DNA affinity. The EfCodY mutant is still dimeric in solution (mass photometry and SEC-MALS data not shown); however, bio-layer interferometry showed that this mutant was unable to bind to the 24-nt sequence-optimized overlapping sites (Supplementary Figure S9). This failure to bind DNA could arise because the mutated residues not only mediate cross-dimer interactions but also promote DNA binding, a notion supported by the close proximity of the mutated residues to the protein-DNA interface. Therefore, we used a different approach and reduced the binding site overlap from 6 nt to 5 nt and 4 nt. We expected that the altered angle and increased distance between the two dimers would prevent cross-dimer interactions. Affinity measurements showed that EfCodY binds sequence-optimized 25-nt and 26-nt overlapping sites with low affinities, comparable to its affinity for a single site (Figure 7B). This indicates that lateral stabilization of CodY dimers on DNA through cross-dimer interactions and tetramerization is required for the cooperative binding of CodY to DNA.
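The site lengths quoted above follow directly from the overlap: two 15-nt sites sharing n nucleotides span 30 - n nt, and their start positions are 15 - n bp apart. The sketch below tabulates this for the three overlaps tested and adds, per extra base pair of offset, the idealized B-DNA increments in distance and angle. The 3.4 Å rise and 10.5 bp per turn are textbook assumptions rather than values from this study, and the function and key names are illustrative.

```python
SITE_LEN = 15                  # nt per CodY binding site
RISE_PER_BP = 3.4              # Å, idealized B-DNA (assumption)
TWIST_PER_BP = 360.0 / 10.5    # degrees, idealized B-DNA (assumption)


def overlapping_sites(overlap, site_len=SITE_LEN):
    """Total length and start-to-start offset of two overlapping sites."""
    return {
        "overlap": overlap,
        "total_length": 2 * site_len - overlap,
        "offset": site_len - overlap,
    }


reference = overlapping_sites(6)   # arrangement used for crystallization

for overlap in (6, 5, 4):
    info = overlapping_sites(overlap)
    extra_bp = info["offset"] - reference["offset"]
    print(
        f"{overlap}-nt overlap: {info['total_length']}-nt site, "
        f"offset {info['offset']} bp, "
        f"+{extra_bp * RISE_PER_BP:.1f} Å and "
        f"+{extra_bp * TWIST_PER_BP:.1f} degrees vs the 6-nt overlap"
    )
```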

Non-canonical DNA binding by CodY
The two symmetrically positioned winged helix-turn-helix (wHTH) motifs in each CodY dimer bind to the 15-nt binding site such that each motif recognizes one half-site. The dimerised recognition helices angle sharply away from the DNA axis and insert their N-terminal ends centrally into the major groove. The S7-S8 β-hairpins extend to each side and insert the wings into the adjacent minor grooves at the 5'- and 3'-ends of the binding site. Consequently, in the DNA containing the 24-nt overlapping sites, the two wings in the central shared half-site (nt 9-16) insert into the minor grooves next to the centre of each other's site. Note that, although CodY contains a canonical wHTH motif (36), CodY-wHTH-DNA binding is non-canonical in that dimerized recognition helices insert into a single major groove. This is in contrast to canonical wHTH binding, in which separate recognition helices insert singly into consecutive major grooves (Supplementary Figure S10). A recognition-helix-dimerized form of the wHTH motif bound to DNA, similar to that of CodY, has been reported for FadR (37) and TubR (38). The DNA-interacting residues are conserved in all CodY proteins, and SaCodY and EfCodY form essentially identical interactions with the DNA (superimposing DBD residues 175-245 in SaCodY on the homologous residues 180-250 in EfCodY gives RMSD values of 0.4-0.5 Å). The two CodY dimers interact with all 24 nucleotide pairs of the overlapping binding sites, burying ∼3500 Å² of solvent-accessible surface area. DNA interactions are primarily mediated by hydrogen bonds between residues and the sugar-phosphate backbone of the DNA, indicative of an indirect or shape readout mechanism. As will be discussed, only two base-specific hydrogen bonds, namely with residues Ser215 and Met237 (numbering from here on corresponds to SaCodY), are formed between each protomer and the half-site DNA (see Figure 8 for a schematic diagram).

Interactions between DBD domain and DNA
What follows is a detailed description of the interactions between protomer B of SaCodY and the 5′ half-site DNA (Figure 9). The position of the recognition helix in the major groove is stabilized by hydrogen bonds of recognition helix residues Thr213 and Arg222 to backbone phosphates of T5, T6 and T17' located on opposite sides of the major groove (the prime sign indicates template strand nt). The recognition helix position allows the side chains of Val216 and Val218 to make hydrophobic contacts with the methyl group C7 of the thymine bases T6 and T17'. Ser215, located at the recognition helix N-terminus, forms the only base-specific hydrogen bond between the recognition helix and major groove bases. The side chain hydroxyl group projects towards the DNA major groove floor and forms a hydrogen bond with the carbonyl oxygen atom O4 of T6 or O6 of G18'. Absence of electron density for the Ser215 side chains indicates that the side chain can adopt different rotamer conformations that allow switching between alternative hydrogen bonds with bases T6 or G18'. Interestingly, the rotamers potentially allow for hydrogen bonds also with bases G6 and T18', and weaker hydrogen bonds with G7 and T7. Therefore, in addition to the consensus T6C7/G18'A19' base-pair step, T6N7/N18'A19', N6C7/G18'N19', G6N7/N18'C19', N6A7/T18'N19' and (N6G7/C18'N19', N6T7/A18'N19') base-pair steps could be accommodated without significant loss of binding specificity (N represents any base; base-pair steps in parentheses indicate weak hydrogen bonds with Ser215). In agreement, the C7-G18' base pair is frequently missing from actual CodY-binding sites, but its absence does not prevent efficient regulation of the corresponding genes by CodY (39).

Figure 9. CodY-DNA recognition. Ribbon representation of SaCodY protomer A (blue) interacting with major and minor grooves of the consensus binding site. Interacting residues are depicted in stick representation. Hydrogen bonds between residues and DNA backbone as well as between Ser215 and Met237 and DNA bases in major and minor groove are indicated. LH indicates linker helix and RH recognition helix. For clarity, the ribbon representation for residues 244-260 is excluded.
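The Ser215 base-pair-step rules described in this subsection can be written as a small lookup. The Python sketch below is a literal encoding of the listed patterns using the top-strand bases at positions 6 and 7 (the template-strand bases 19' and 18' follow by Watson-Crick pairing). The function name and the "strong"/"weak" labels are illustrative assumptions, and the sketch makes no statement about measured affinities.

```python
# Top-strand bases that, according to the text, allow a Ser215 hydrogen bond.
STRONG_AT_6 = {"T", "G"}  # T6 (consensus) or G6 contacted directly
STRONG_AT_7 = {"C", "A"}  # contacted via the paired template base G18' or T18'
WEAK_AT_7 = {"G", "T"}    # G7 or T7: weaker hydrogen bonds


def classify_step(step: str) -> str:
    """Classify the top-strand dinucleotide at positions 6-7 as 'strong' or 'weak'."""
    step = step.upper()
    if len(step) != 2 or any(base not in "ACGT" for base in step):
        raise ValueError("step must be two bases from A, C, G, T")
    n6, n7 = step
    if n6 in STRONG_AT_6 or n7 in STRONG_AT_7:
        return "strong"
    # Every remaining step has G or T at position 7 (weak contact in the text).
    assert n7 in WEAK_AT_7
    return "weak"


if __name__ == "__main__":
    for step in ("TC", "GA", "CA", "AG", "CT"):
        print(step, classify_step(step))
```

Under this encoding every possible step at positions 6-7 is at least weakly accommodated, which fits the limited base specificity described for Ser215.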
Two α-helix dipole interactions with the DNA backbone further stabilize the HTH unit in the major groove (Figure 9). H7 packs its partially positively charged N-terminus against the backbone phosphate at position T4. The T4 phosphate is deeply buried in a cleft formed by the H7 N-terminus and the S7-S8 β-hairpin, located in the major and minor grooves, respectively, and is hydrogen bonded to the main-chain N-cap nitrogen atoms of Ala203 and Ser204 (H7 N-terminus) and the hydroxyl group of Thr240 (S8). On the opposite side of the major groove, H6 packs its N-terminus against the DNA backbone phosphate at position C16'. The C16' phosphate is hydrogen bonded to the main-chain N-cap nitrogen atom of Ser182 as well as the side-chain hydroxyl groups of Ser180 and Ser182. These residues reside in the invariant 179-Leu, Ser, Tyr, Ser-182 sequence, mediating the cross-dimer interactions. In the shared half-site, the C16/C16' backbones are wedged between the H6 N-terminus and S7-S8 β-hairpin of the two dimers. This suggests that the H6-DNA contacts may play a role in the proper positioning of Tyr181 for cross-dimer stacking interactions with Phe241 in β-strand S8 (Tyr246 in EfCodY) (Figure 7A).
The wing element of the wHTH motif is formed by the short turn between β-strands S7 and S8 comprising residues Leu235, Gly236 and Met237. The wing penetrates deeply into the AT minor groove at position T1-A3/T24'-A22' and, in the shared half-site, is positioned between the H6 and H7 N-termini of the two dimers. The embedded main chain of Gly236, as well as the main and side chains of Met237, extend over the shape of the minor groove floor. Note that AT minor grooves are usually recognized by positively charged Arg residues (or less frequently Lys residues) to complement the local shape and enhanced negative electrostatic potential (40). Met237 also forms one base-specific hydrogen bond with its main chain nitrogen atom to the carbonyl oxygen atom O2 of T24' (Figure 9). Thus, in the shared half-site, the T15 and T15' bases hydrogen bond with both dimers; the minor groove base edge interacts with Met237 from one dimer and the major groove base edge interacts with Ser215 from the other. In agreement, sequence logos of the 24-nt overlapping sites from different species show that T15 and T15' are almost completely conserved (20, 21). In line with the structural and base frequency data, we found that mutation of the T to A or G strongly reduces DNA binding of SaCodY and EfCodY (Supplementary Figure S11).

Binding of CodY locally distorts the double helix
Globally, the DNA in the complex adopts a slightly bent B-form structure; however, locally, the DNA deviates from the canonical B-form. Strikingly, the minor grooves along the three AT sequences (positions A1-T6, A10-T14, A19-T24) are unusually narrow. Along three base pairs of the two peripheral AT sequences (T3-T5, T20-T22), the minor groove widths average 9.5-9.6 Å (P-P distance; compared to 12.0 Å in the canonical B-form) with a minimum of 9.0 Å. The minor groove along the two CAG sequences is moderately widened (maximum 13.8 Å), causing a strong oscillation in minor groove widths (Figure 10). Notably, the minor groove compression for the central AT sequence, which is bound by two wings, is less severe (average 10.4 Å, minimum 9.4 Å), indicating that the wing insertion widens an intrinsically narrow minor groove. An important feature of CodY target sites thus appears to be the narrow shape and deformability of the minor grooves.
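To make the size of these deviations explicit, the short Python sketch below compares the widths quoted in this section with the canonical B-form value. It only restates numbers given in the text; the dictionary labels and function names are illustrative, and no per-base-pair data from Figure 10 are assumed.

```python
CANONICAL_WIDTH = 12.0  # Å, P-P distance for canonical B-form (from the text)

# Minor groove widths in Å quoted in the text; keys are illustrative labels.
REPORTED_WIDTHS = {
    "peripheral AT, minimum": 9.0,
    "central AT, average": 10.4,
    "central AT, minimum": 9.4,
    "CAG, maximum": 13.8,
}


def deviation(width: float, reference: float = CANONICAL_WIDTH) -> float:
    """Signed deviation in Å; negative values mean a narrower groove."""
    return round(width - reference, 1)


def oscillation_range(widths) -> float:
    """Difference between the widest and narrowest reported value, in Å."""
    values = list(widths)
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    for label, width in REPORTED_WIDTHS.items():
        print(f"{label}: {width} Å ({deviation(width):+.1f} Å vs. B-form)")
    print("range:", oscillation_range(REPORTED_WIDTHS.values()), "Å")
```

With these values, the reported widths span about 4.8 Å between the narrowest AT position and the widest CAG position.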

CodY activation: communication between ligand-bound GAF domains and DBD domains
Because the GAF domains are physically separated from the DBD domains, ligand binding must be communicated via the linker helices. When comparing the available CodY structures, we observed different linker helix orientations that allow some positional flexibility of the DBD domains, which adopt different orientations relative to one another and to the GAF domains. Moreover, our structural comparison suggested that the different linker helix orientations originate from linker helix packing differences within the GAF domains. Our data indicate that, within the GAF domain, ligand-induced conformational changes are propagated via the H1-H2 loop to H1 and the N-terminal part of the linker helices. This leads to a different packing of the linker helix in the 4-helix bundle that forms the protomer-protomer interface (Figure 4). In agreement, superimposition of the linker helix structures shows that the linker helix orientations in activated CodY (BsCodY (18, 35), SaCodY (15), and DNA-bound SaCodY and EfCodY presented here) significantly differ from those in inactive CodY (BsCodY (18, 41), and EfCodY presented here) (Figure 11A). Notably, in the orientation of activated CodY, the invariant Arg167 residues from both linker helices form stacking interactions with each other. This is not seen in inactive CodY. We speculate that the stacking interactions of Arg167 help the recognition helices to wedge in between the linker helices and to form the active protomer-protomer interface. In agreement, we found that an Arg167Ala mutant had strongly reduced DNA affinity (Figure 11B). However, CodY-DNA recognition is more than a simple docking of preformed DBD domains onto the DNA. This is shown by the activated free SaCodY structure (15), in which the symmetric DBD-DBD interface (including the recognition helix dimer) is not yet formed. Moreover, the DBD domains in the activated free form retain significant flexibility, showing that the DBD dimer folds into a highly ordered lobe only upon DNA binding.

DISCUSSION
Sa and Ef are opportunistic pathogens that pose a major risk to human health, a risk increased by the rising prevalence of antimicrobial-resistant Sa and Ef strains. Hence, there is an urgent need to better understand the mechanism by which these bacteria switch from their commensal to pathogenic state. A growing body of evidence suggests that CodY plays a critical role in disease development by reorganizing metabolism and activating virulence gene expression (2-4).
The mechanism by which CodY-DNA binding is activated can differ for CodYs from different bacterial genera. Differences include the GTP responsiveness, the nature of the BCAA activator, and the oligomeric state of the ligand-free form. Mechanistically, it is proposed that ligand binding causes a reorientation of the linker helices and thus primes the DBD domains for DNA binding (15, 18). Exactly how ligand binding leads to changes in linker helix orientation, however, remained puzzling. Our biochemical and structural data of ligand-free and ligand-bound forms of SaCodY and EfCodY now allow for a more detailed understanding of BCAA/GTP signalling and, interestingly, reveal mechanistic differences and similarities (Figure 12).
In both SaCodY and EfCodY, ligand binding induces a large rearrangement at the BCAA-binding site, causing critical residues to move and form β-sheet interactions with S1-S2 loop residues. Furthermore, for both SaCodY and EfCodY, ligand binding causes a conformational change of the H1-H2 loop associated with a C-terminal extension of H1 by 1-2 turns; this conformational change has also been observed in BsCodY (35). Conformational changes at the BCAA-binding site are thus propagated to the protomer-protomer interface composed of the 4-helix bundle of H1 and linker helices that defines the relative linker helix orientations. In SaCodY, the conformational changes at the Ile-binding site are propagated to the H1-H2 loop via GTP, explaining why Ile and GTP activate DNA binding synergistically, an observation also reported for BsCodY (13). In EfCodY, a salt bridge can directly link the conformational changes at the Leu-binding site to changes in the H1-H2 loop, explaining why Leu alone can activate EfCodY. Interestingly, the salt bridge in EfCodY is made possible by the 4-amino acid insertion in the H1-H2 loop. This insertion closes the distance between the loop and the H3-H4 linker by ∼8 Å. Notably, CodY from Enterococci, Streptococci and Lactococci that do not bind GTP all contain Lys74, Glu26 (or Asp) and a 3-4 amino acid insertion in the H1-H2 loop (16). This raises the possibility that the direct interaction is conserved in CodY from these genera and that the allosteric control of CodY activity by GTP was gained or lost during bacterial evolution. The reported crystal structures of ligand-free GTP-responsive BsCodY and BcCodY (15, 18) are intermediates between inactive and active states, i.e. with the H3 shift having already occurred. Whether the H3 shift is induced by CodY tetramerization during crystallization or indicates a somewhat different mechanism in Bacilli has yet to be addressed. We nevertheless expect CodY to be able to adopt intermediate and activated states even in the absence of ligands, with an equilibrium shift towards the activated state upon ligand binding.

Figure 12. (A) In SaCodY, Ile binding triggers a movement of H3 that brings the H3-H4 linker and S1-S2 loop together (β-strand-β-strand formation) and allows GTP binding. When bound, the guanine base of GTP interacts with the H1-H2 loop, which causes a C-terminal extension of H1 by 1-2 turns. At this stage, the DBD domains are less flexible than in their ligand-free form but primed for DNA binding. (B) In EfCodY, Leu binding triggers formation of H3 that brings the H3-H4 linker and S1-S2 loop together (β-strand-β-strand formation). The β-sheet in turn interacts with the four-residue extended H1-H2 loop, which causes a C-terminal extension of H1 by 1-2 turns. As for the DBD domains in activated SaCodY shown in (A), we expect the DBD domains in activated EfCodY to be primed for DNA binding. Arrows indicate interactions.
General rules governing CodY-DNA recognition remain poorly understood because CodY-binding sites display a remarkably high degree of sequence variation. In fact, due to the sequence variation, it was proposed early on that CodY recognizes a specific feature of DNA structure rather than a specific DNA sequence (42). Here, we present the structural basis for the interaction of CodY with overlapping binding sites. The structures revealed that CodY-DNA recognition is mainly driven by local shape readout of a narrow and deformed A-T rich minor groove. Only two hydrogen bonds between each wHTH motif and half-site DNA contribute to base specificity. The conserved nature of the interface residues, as well as the essentially identical DNA interactions of SaCodY and EfCodY, suggests that the DNA recognition mechanism is common to all CodY proteins.
Genome-wide analyses of CodY-binding sites indicate that CodY-dependent genes largely rely on overlapping binding sites for regulation (20-23). Because of the larger protein-DNA interface in overlapping sites, CodY binds overlapping sites with increased affinity and specificity, which can compensate for weaker and less specific interactions at the individual sites. Individual sites in many overlapping sites of CodY target genes, which often contain four or more mismatches (24), may not bind CodY by themselves, but they do in the context of the overlapping site. For example, the overlapping sites in the hutP operator of Bs contain two and four mismatches, and both sites are required for CodY-mediated regulation in vivo (24). With bio-layer interferometry we could show that CodY binds efficiently only to the overlapping site but not to the single sites (Supplementary Figure S12).
However, the affinity of CodY for overlapping sites depends not only on its intrinsic affinities for each individual site but also on cooperativity (21, 24). Cooperativity may play an important role in gene regulation by enhancing the responsiveness of target genes towards small changes in activated CodY levels. The presented structures revealed that two CodY dimers assemble onto the overlapping binding sites such that cross-dimer interactions are formed, resulting in DNA-dependent CodY tetramerization. We suggest that binding of the first CodY dimer provides the protein contact surface required to promote binding of the second dimer. Nonetheless, we think that DNA structure also plays an active role here. This notion is supported by the different minor groove widths of the central and peripheral half-sites. In the central, shared half-site, the minor grooves are widened at the cross-dimer interactions, thereby linking DNA deformation to cross-dimer interactions.
The DNA-bound CodY tetramer allows assembly of even higher-order CodY oligomers through cross-dimer interactions with additional dimers on either side of the tetramer. The assembly of such higher-order CodY oligomers in vivo is supported by extended CodY-protected regions in regulatory regions predicted to contain more than two overlapping binding sites (24, 39). For example, DNase I footprinting in Bs identified a CodY-protected region in the ureA operator that contains three overlapping binding sites with 5, 2 and 3 mismatches, respectively, to the consensus sequence (24). Here, we showed with electromobility shift assays that this sequence can indeed assemble a DNA complex containing three SaCodY dimers (Supplementary Figure S13A). A structural model of this complex is shown in Supplementary Figure S13B.
In summary, binding to overlapping sites with less stringent sequence requirements, together with a recognition mechanism largely dictated by shape readout, explains why CodY can recognize target sequences that substantially deviate from the consensus sequence. Binding of CodY to longer sites with little base specificity provides a potent strategy to facilitate binding site overlaps. These would be necessary in regulatory regions that serve multiple functions, e.g. overlapping operators and promoters or overlapping binding sites for different transcription factors. Overlapping binding sites for different transcription factors allow operators to effectively integrate signals from discrete signal transduction pathways. There are several examples known to date where CodY-binding sites overlap with binding sites of other transcription factors, e.g. PutR and AbrB in Bs (43, 44). The mechanistic insights into the activity of CodY presented here will guide efforts to uncover the full regulatory potential, a critical stepping-stone in understanding the lifestyle switch to pathogenicity in several important human pathogens.
Other global transcription factors use strategies similar to those of CodY. For example, members of the LysR and IclR families use tetramers to recognize two less-conserved binding sites (45, 46). However, unlike CodY, these proteins are tetramers in their free form. For the global repressor Fur, an 'overlapping-dimer binding model' has been proposed in which Fur cooperatively binds and oligomerizes on operator DNA containing overlapping binding sites (47, 48). Thus, our data presented here reveal mechanistic insights that may be relevant to many other bacterial transcription factors and contribute to our general understanding of how these control bacterial behaviour.

DATA AVAILABILITY
The atomic coordinates and the structure factors have been deposited with the Protein Data Bank (49) (PDB codes 8c7o for ligand-free SaCodY, 8c7s for SaCodY-Ile-GTP-DNA, 8c7t for ligand-free EfCodY, and 8c7u for EfCodY-Leu-DNA).

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.