Investigation of DNA sequence recognition by a streptomycete MarR family transcriptional regulator through surface plasmon resonance and X-ray crystallography

Consistent with their complex lifestyles and rich secondary metabolite profiles, the genomes of streptomycetes encode a plethora of transcription factors, the vast majority of which are uncharacterized. Herein, we use surface plasmon resonance (SPR) to identify and delineate putative operator sites for SCO3205, a MarR family transcriptional regulator from Streptomyces coelicolor that is well represented in sequenced actinomycete genomes. In particular, we use a novel SPR footprinting approach that exploits indirect ligand capture to vastly extend the lifetime of a standard streptavidin SPR chip. We define two operator sites upstream of sco3205, and a pseudopalindromic consensus sequence derived from these enables further potential operator sites to be identified in the S. coelicolor genome. We evaluate each of these through SPR and test the importance of the conserved bases within the consensus sequence. Informed by these results, we determine the crystal structure of a SCO3205-DNA complex at 2.8 Å resolution, enabling molecular level rationalization of the SPR data. Taken together, our observations support a DNA recognition mechanism involving both direct and indirect sequence readout.

response should return to the level prior to step 1). Buffer was flowed over both FC ref and FC test for a further 90 s.
Cycles 1-3 were repeated in triplicate for a range of protein concentrations spanning either side of the expected K D (estimated from preliminary experiments). When the experiment was completed, the chip was removed from the instrument and stored in buffer at 4°C until required again.

SPR data analysis
All sensorgrams were analysed using Biacore T200 BiaEvaluation software version 1.0 (GE Healthcare). The data were then plotted using Microsoft Excel.

Normalizing responses due to protein binding in SPR experiments
In order to readily compare the results from the SPR experiments, the responses recorded due to protein binding were normalized as described below.
The theoretical maximum response, R max, for SCO3205 (the "analyte") binding to the test DNA (the "ligand") was calculated using the formula:
theoretical R max = (mol. mass analyte / mol. mass ligand) × (response for ligand capture) × (stoichiometry)
However, when the ligand is DNA, it has been suggested that the result needs to be multiplied by a factor of 0.78 because the response associated with nucleic acid binding to the surface is not the same as that for a protein of equivalent mass (10,12). This correction was made in all the R max calculations in the present study.
Moreover, all calculations were made assuming a stoichiometry of one SCO3205 dimer binding to one ds oligomer of test DNA. Thus, for this system:
theoretical R max = (mol. mass SCO3205 dimer / mol. mass test DNA) × DNA captured × 1 × 0.78
Then, the percentage of R max measured upon protein binding is calculated as follows:
% of R max measured = (measured R / theoretical R max) × 100
For example, in the intergenic screening, for fragment 5, replicate 1 at 100 nM SCO3205 (data highlighted in Supplementary Tables S2 and S3), the measured response exceeded 100% of the theoretical R max. This figure of greater than 100% of R max could be attributed to a small amount of additional non-specific binding giving rise to more than one protein dimer binding per immobilized DNA duplex. Alternatively, it could be due to an underestimation of the theoretical R max. Indeed, it has been suggested that the 0.78 correction factor is not necessary (52), which would give rise to the following adjusted values:
theoretical R max = (41046/23787) × 420.1 × 1 = 724.9 RU
% of R max measured = (655.8/724.9) × 100 = 90.5%
This would indicate that full 1:1 binding has not been achieved in this experiment. However, for the purposes of the analysis presented herein, the absolute values of these normalized responses are not crucial. Instead, it is their relative values that enable us to determine which of the intergenic screening fragments contain putative operator sites, and which of the footprinting truncated oligomers define the borders of these operator sites.
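The normalization described above can be sketched in a few lines of Python. This is an illustrative calculation only, not the authors' analysis script: the function and variable names are hypothetical, and the input values are the fragment 5 figures quoted in the text (SCO3205 dimer mass 41046, test DNA mass 23787, 420.1 RU of DNA captured, 655.8 RU measured).

```python
# Illustrative sketch of the R max normalization; names are hypothetical.
DNA_CORRECTION = 0.78  # factor suggested for DNA ligands (refs 10,12 in the text)


def theoretical_rmax(analyte_mass, ligand_mass, ligand_captured_ru,
                     stoichiometry=1, correction=DNA_CORRECTION):
    """Theoretical maximum response (RU) for analyte binding to captured ligand."""
    return (analyte_mass / ligand_mass) * ligand_captured_ru * stoichiometry * correction


def percent_of_rmax(measured_ru, rmax_ru):
    """Measured response expressed as a percentage of the theoretical R max."""
    return measured_ru / rmax_ru * 100


# Fragment 5, replicate 1, 100 nM SCO3205 (values quoted in the text).
dimer_mass = 41046    # SCO3205 dimer
duplex_mass = 23787   # ds test DNA including the annealed linker
dna_captured = 420.1  # RU of test DNA captured
measured = 655.8      # RU measured upon protein binding

for label, corr in (("with 0.78 correction", DNA_CORRECTION),
                    ("without correction", 1.0)):
    rmax = theoretical_rmax(dimer_mass, duplex_mass, dna_captured, correction=corr)
    pct = percent_of_rmax(measured, rmax)
    print(f"{label}: R max = {rmax:.1f} RU, {pct:.1f}% of R max")
```

Without the correction this reproduces the adjusted values given above (724.9 RU and 90.5%). With the correction, the same inputs give roughly 565.4 RU and 116.0%; these two figures are computed here from the stated formula rather than quoted from the source.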
Nevertheless, normalized response values approaching 200%, for example, could be indicative of specific 2:1 binding, i.e. two protein dimers binding per ds DNA oligomer.

Oligomer  Length (nt)  Sequence  Mol. mass  Mol. mass of duplex
ReDCaT linker  20  Biotin-gcaggaggacgtagggtagg  8718  -
ReDCaT linker complement  20  cctaccctacgtcctcctgc  5925  14642
fragment_1_F  29  GCGCTCCCGGACGGCGGCGGCCATGGCTA  8925  -
fragment_1_R  49  TAGCCATGGCCGCCGCCGTCCGGGAGCGCcctaccctacgtcctcctgc  14872  23797
fragment_2_F  29  CGGACGGCGGCGGCCATGGCTACTCCAAT  8908  -
fragment_2_R  49  ATTGGAGTAGCCATGGCCGCCGCCGTCCGcctaccctacgtcctcctgc  14886  23794
fragment_3_F  29  CGGCGGCCATGGCTACTCCAATACTTGAA  8866  -
fragment_3_R  49  TTCAAGTATTGGAGTAGCCATGGCCGCCGcctaccctacgtcctcctgc  14925  23791
fragment_4_F  29  CATGGCTACTCCAATACTTGAACTCTCAA  8785  -
fragment_4_R  49  TTGAGAGTTCAAGTATTGGAGTAGCCATGcctaccctacgtcctcctgc  15003  23788
fragment_5_F  29  ACTCCAATACTTGAACTCTCAATCTTTAC  8735  -
fragment_5_R  49  GTAAAGATTGAGAGTTCAAGTATTGGAGTcctaccctacgtcctcctgc  15052  23787
fragment_6_F  29  TACTTGAACTCTCAATCTTTACGTGCCGT  8797  -
fragment_6_R  49  ACGGCACGTAAAGATTGAGAGTTCAAGTAcctaccctacgtcctcctgc  14991  23788
fragment_7_F  29  ACTCTCAATCTTTACGTGCCGTCAATCTA  8766  -
fragment_7_R  49  TAGATTGACGGCACGTAAAGATTGAGAGTcctaccctacgtcctcctgc  15022  23788
fragment_8_F  29  ATCTTTACGTGCCGTCAATCTACGCCGAT  8807  -
fragment_8_R  49  ATCGGCGTAGATTGACGGCACGTAAAGATcctaccctacgtcctcctgc  14982  23789
fragment_9_F  29  CGTGCCGTCAATCTACGCCGATTTTGTTT  8829  -
fragment_9_R  49  AAACAAAATCGGCGTAGATTGACGGCACGcctaccctacgtcctcctgc  14961  23790
fragment_10_F  29  TCAATCTACGCCGATTTTGTTTAATGTTC  8827  -
fragment_10_R  49  GAACATTAAACAAAATCGGCGTAGATTGAcctaccctacgtcctcctgc  14959  23786
fragment_11_F  29  ACGCCGATTTTGTTTAATGTTCAAGGAAC  8911  -
fragment_11_R  49  GTTCCTTGAACATTAAACAAAATCGGCGTcctaccctacgtcctcctgc  14876  23787
fragment_12_F  29  TTTTGTTTAATGTTCAAGGAACCGTCTCG  8892  -
fragment_12_R  49  CGAGACGGTTCCTTGAACATTAAACAAAAcctaccctacgtcctcctgc  14895  23787
fragment_13_F  29  TAATGTTCAAGGAACCGTCTCGTACAGTG  8921  -
fragment_13_R  49  CACTGTACGAGACGGTTCCTTGAACATTAcctaccctacgtcctcctgc  14868  23789
fragment_14_F  29  CAAGGAACCGTCTCGTACAGTGGGACACA  8925  -
fragment_14_R  49  TGTGTCCCACTGTACGAGACGGTTCCTTGcctaccctacgtcctcctgc  14867  23792

The sequences of the linker, and its complement, are shown in lower case. Data highlighted in red were used in the Supplementary Methods. F = forward strand; R = reverse strand.

Data highlighted in red were used in the Supplementary Methods, which explains how "theoretical R max" and "% of R max measured" were calculated.

The sequence of the linker complement is shown in lower case. An explanation of how "theoretical R max" and "% of R max measured" were calculated is given in the Supplementary Methods.

*O 3205 _RH gave an anomalously low normalized maximum response, which we are unable to explain.

The sequence numbering is relative to the 22-mer used to determine the crystal structure (see Figure 5).
is the ith observation of reflection hkl, 〈I(hkl)〉 is the weighted average intensity for all observations i of reflection hkl and N is the number of observations of reflection hkl.
d CC½ is the correlation coefficient between intensities taken from random halves of the data set.
e The data set was split into "working" and "free" sets consisting of 95 and 5% of the data, respectively. The free set was not used for refinement.
f The R-factors R work and R free are calculated as follows: R = ∑(|F obs − F calc|)/∑|F obs| × 100, where F obs and F calc are the observed and calculated structure factor amplitudes, respectively.
g As calculated using MolProbity (53).

Figure S2. The 119 nt sequence of the sco3204-sco3205 intergenic region was fragmented using a Perl script, termed POOP (Perl Overlapping Oligo Producer). Part of the program output is shown above. Each fragment oligomer is 29 nt long and overlaps with its neighbour(s) by 22 nt. The 3' adenine of fragment 14 actually corresponds to the first nt of the start codon for sco3205. POOP also produces a text file containing all the required oligomer fragments in a format suitable for ordering for synthesis, including the complement to the ReDCaT linker (see Supplementary Table S2), which was attached to the 3' ends of the reverse strands in all cases.
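The fragmentation scheme described in this caption can be sketched as follows. This is a Python illustration of the idea only, not the POOP Perl script itself: the function names are hypothetical, and the only values taken from the source are the fragment length (29 nt), the overlap (22 nt) and the lower-case ReDCaT linker complement sequence.

```python
# Illustrative sketch of overlapping-oligo generation; not the POOP script.
LINKER_COMPLEMENT = "cctaccctacgtcctcctgc"  # ReDCaT linker complement (lower case)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlapping_oligos(region, length=29, overlap=22):
    """Tile `region` with oligos of `length` nt that overlap by `overlap` nt.

    Returns (name, sequence) pairs. Each reverse strand carries the linker
    complement at its 3' end, as described for the reverse strands above.
    """
    region = region.upper()
    step = length - overlap  # 7 nt shift between neighbouring fragments
    oligos = []
    starts = range(0, len(region) - length + 1, step)
    for number, start in enumerate(starts, start=1):
        forward = region[start:start + length]
        reverse = reverse_complement(forward) + LINKER_COMPLEMENT
        oligos.append((f"fragment_{number}_F", forward))
        oligos.append((f"fragment_{number}_R", reverse))
    return oligos


# First 36 nt covered by fragments 1 and 2 of the oligomer table.
demo_region = "GCGCTCCCGGACGGCGGCGGCCATGGCTACTCCAAT"
for name, sequence in overlapping_oligos(demo_region):
    print(name, len(sequence), sequence)
```

With a 7 nt step, a 120 nt input (the 119 nt intergenic region plus the first nt of the start codon) yields 14 fragments, which is consistent with the fragment numbering in the caption.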
POOP is available as part of this submission (Supplementary Program 1) and should run on any Unix operating system.

Methods). In each case, the test DNA oligomers were 29 bp in length. SCO3205 binding was measured at concentrations of 10 nM (pale blue), 50 nM (mid blue) and 100 nM (dark blue). All measurements were carried out in duplicate. The oligomer sequences and the SPR data are given in Supplementary Tables S2 and S3. (B) Selected sensorgrams showing protein binding and dissociation phases for a non-binding sequence (fragment 2), a partial hit (fragment 4) and a full hit (fragment 6), each at both 10 nM and 100 nM SCO3205 concentration. Note that the relatively short injection time used in the screening prevented the 10 nM injection of SCO3205 over fragment 6 from saturating.

Supplementary Table S12. For each, the wild-type sequence is shown in blue. All the mutated sequences are shown in grey. (B) The percentage of SCO3205 still bound after the dissociation phase (coloured blue for the wild-type replicates, and grey for mutated sequences) and after each 1 M NaCl wash (yellow and orange, respectively). The bars for each sequence have been overlaid. The sequence numbering is relative to the 22-mer used to determine the crystal structure (see Figure 5).

[Figure S9 key: residues interacting with DNA; β/γ-turns; β-hairpins; secondary structure elements (β-strands, α-helices and 3₁₀-helices).]
Figure S9. Secondary structure analysis of protein chain A from the SCO3205-DNA complex as output by the PDBSUM server (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) (54).

[Figure S10 panels: MosR, SlyA, OhrR, SCO3205.]
Figure S10. Cartoon stereoview representations of the SCO3205-DNA complex together with other known MFR-DNA complexes (displayed in the same relative orientation using the same colour scheme as for Figure 7). Also shown as grey dots are the helical axes for each DNA duplex as determined using CURVES+ (56).

Interactions between the wing, in particular Arg98, and the minor groove. (C) Indirect interactions between the C-terminal tail and the phosphate backbone: Asp158 interacts through Arg32, and the C-terminal residue, Arg163, interacts through Arg72 via its carboxyl group.

Figure S13. Widths of major (red) and minor (green) grooves for one DNA duplex in the asymmetric unit (ASU) of the SCO3205-DNA complex, as calculated using the CURVES+ server (http://gbiopbil.ibcp.fr/cgi/Curves_plus/) (56). The dashed lines show the corresponding widths in ideal B-form DNA. Very similar plots were obtained using the second DNA duplex in the ASU (data not shown).