Directed co-evolution of interacting protein–peptide pairs by compartmentalized two-hybrid replication (C2HR)

Abstract Directed evolution methodologies benefit from read-outs quantitatively linking genotype to phenotype. We therefore devised a method that couples protein–peptide interactions to the dynamic read-out provided by an engineered DNA polymerase. Fusion of a processivity clamp protein to a thermostable nucleic acid polymerase enables polymerase activity and DNA amplification in otherwise prohibitive high-salt buffers. Here, we recapitulate this phenotype by indirectly coupling the Sso7d processivity clamp to Taq DNA polymerase via respective fusion to a high affinity and thermostable interacting protein–peptide pair. Escherichia coli cells co-expressing protein–peptide pairs can directly be used in polymerase chain reactions to determine relative interaction strengths by the measurement of amplicon yields. Conditional polymerase activity is further used to link genotype to phenotype of interacting protein–peptide pairs co-expressed in E. coli using the compartmentalized self-replication directed evolution platform. We validate this approach, termed compartmentalized two-hybrid replication, by selecting for high-affinity peptides that bind two model protein partners: SpyCatcher and the large fragment of NanoLuc luciferase. We further demonstrate directed co-evolution by randomizing both protein and peptide components of the SpyCatcher–SpyTag pair and co-selecting for functionally interacting variants.


INTRODUCTION
Cellular biology is governed by a complex network of protein-protein interactions (PPIs). In many cases, the principal interacting component of one protein in a binary complex presents as a short, often alpha helical region, that retains binding affinity in the form of a discrete peptide (1,2). This knowledge can guide development of both small molecule and peptidic antagonists towards therapeutic targets and protein biosensors (3)(4)(5)(6). Protein engineering can further derive novel peptide-protein pairs by splitting compliant proteins into interacting components. This approach has yielded robust tools for biosensing, imaging and targeted protein conjugation (7)(8)(9). Methodologies that disclose new PPIs, modulate affinities of known PPIs, and select for novel peptide/protein binders are therefore important tools for proteomics, drug discovery, target validation and biotechnology applications. To this end, a suite of 'N-hybrid' platforms including the prototypical yeast two-hybrid (Y2H) selection methodology have been developed and successfully implemented over the years (10)(11)(12)(13). These couple in vivo protein-protein interactions to co-localization of two protein domains required for signal generation, typically a component of the transcriptional machinery and a DNA-binding protein. Despite the widespread success of conventional in vivo two-hybrid platforms, certain limitations remain. Efficient nuclear import of the fusion proteins is often a prerequisite for read-out, and reliance on cell viability along with use of mesophilic reporter proteins limits use for co-selection of thermostability, a desired feature in many downstream applications of evolved proteins. The protein-fragment complementation assay (PCA) is a related genetic in vivo method, wherein a PPI leads to reconstitution of an otherwise nonfunctional split transducing/reporter protein (14)(15)(16)(17). 
PCAs can provide dynamic read-outs and are often amenable to high-throughput screening campaigns. They can, however, be prone to background issues due to spontaneous reassembly of the split fragments that is independent of fusion partner interaction. As with two-hybrid approaches, most PCAs employ mesophilic reporters, again restricting their use in co-selection of thermostability.
The compartmentalized self-replication (CSR) directed evolution platform was originally developed to select for thermostable nucleic acid polymerase variants with improved functionality (18). CSR entails clonal encapsulation of bacteria expressing a library of polymerase variants into the aqueous compartments of a heat-stable emulsion. Subsequent thermal cycling permits amplification of a polymerase gene only by the particular enzyme it encodes, quantitatively linking activity of constituent library members to the copy number of their respective genes. This dynamic feature enables rapid selection of novel polymerases with desired properties such as improved thermostability, tolerance for non-natural bases, and resistance to inhibitors (19)(20)(21)(22). CSR has been further modified to permit selection of other enzymes by coupling their activities to polymerase read-out (18,23). Here, we show that a high-affinity and thermostable peptide-protein interaction can also be coupled to DNA polymerase function, thus enabling read-out of the encoding genes by CSR. This is achieved by expression of candidate peptides/proteins as respective fusions to Taq polymerase and the Sso7d processivity clamp. Peptide-protein interaction brings Sso7d into close proximity with Taq polymerase, allowing DNA amplification at otherwise prohibitively high salt concentrations. This approach, termed compartmentalized two-hybrid replication (C2HR), is used in selections employing well-characterized high-affinity protein-peptide pairs (SpyCatcher-SpyTag and the large/small fragments of split NanoLuc luciferase). C2HR also permits co-evolution of interacting protein-peptide pairs, as exemplified by co-randomization of SpyCatcher and SpyTag and selection for interacting variants.
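The genotype-phenotype linkage that compartmentalization provides can be illustrated with a toy model. The sketch below is not part of the published method: the per-cycle efficiency values, the cell counts and the cycle number are illustrative assumptions, and real emulsion PCR is far less idealized. It only shows why a gene is enriched when it is amplified solely by the protein it encodes, whereas pooled activity in bulk solution leaves the input ratio unchanged.

```python
def amplify(cells, cycles, compartmentalized):
    """Toy model: return gene copy numbers after thermal cycling.

    cells maps genotype -> (cell_count, activity), where activity is a
    hypothetical per-cycle amplification efficiency between 0 and 1.
    """
    total_cells = sum(n for n, _ in cells.values())
    # In bulk, every template is exposed to the pooled (average) activity.
    pooled = sum(n * a for n, a in cells.values()) / total_cells
    copies = {}
    for genotype, (n, activity) in cells.items():
        # In emulsion, a gene is copied only by the activity it encodes.
        efficiency = activity if compartmentalized else pooled
        copies[genotype] = n * (1 + efficiency) ** cycles
    return copies


def fraction(copies, genotype):
    """Fraction of all gene copies that belong to one genotype."""
    return copies[genotype] / sum(copies.values())


# Assumed input: 1 interacting cell per 100 non-interacting cells, 10 cycles.
cells = {"interacting": (1, 1.0), "non_interacting": (100, 0.0)}
emulsion = amplify(cells, cycles=10, compartmentalized=True)
bulk = amplify(cells, cycles=10, compartmentalized=False)

print(round(fraction(emulsion, "interacting"), 3))  # 0.911
print(round(fraction(bulk, "interacting"), 3))      # 0.01
```

Under these assumptions the interacting genotype rises from about 1% of the input to about 91% of the output in emulsion, while in bulk its share is unchanged.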

Materials
Oligonucleotides and genes were from Integrated DNA Technologies; restriction enzymes, T4 polynucleotide kinase and T4 DNA ligase were from NEB; Pfu DNA polymerase (Agilent Technologies) and Taq DNA polymerase (Bioline) were used for DNA amplification. Nucleic acid purification kits were from Qiagen and chemicals from Sigma. Electrocompetent TG1 and BL21 cells were obtained from Lucigen.
The large fragment of split NanoLuc luciferase was amplified using primers NanoBigDUET-F and NanoBigHHH-R and the product cloned into Sso7d Stoffel pETDuet-1 to create Sso7d-NB Stoffel pETDuet-1. A series of complementary primer pairs were annealed to form oligo duplexes which were cloned into this vector to get Sso7d-NB NS1/2/3/4/5/6-Stoffel pETDuet-1 for test selection.

Polymerase activity assays
Constructs expressing Stoffel, HhH-Stoffel fusion protein and Sso7d-Stoffel were transformed into Escherichia coli BL21 (DE3) competent cells. Cells expressing HhH-Stoffel were grown in LB medium with glucose (10 mM) and induced for 3 h at 37 °C with 1 mM isopropyl-β-D-thiogalactoside (IPTG). Cells expressing Sso7d-Stoffel were grown in LB medium with glucose (10 mM) and induced with 1 mM IPTG at the temperatures and for the durations described in the text. About 1 ml of culture was then harvested by centrifugation, washed twice with phosphate buffered saline (PBS) and resuspended in 50 µl of PBS. About 2 µl of cell suspension was used for PCR (95 °C for 5 min, 25

Compartmentalized self-replication (CSR) selections
CSR reactions were essentially carried out as previously described (18). All expressor cells were grown in LB medium with glucose (10 mM

Sequence analysis
Amplicons generated by C2HR were adapted by PCR using primers SpyT-F19 and SpyT-R19 (Lib 1) and SpyT-F17 and SpyT-R17 (Lib 2) and sequencing carried out using the NextSeq Illumina platform (DNA Link, Korea). Data extraction/analysis was carried out using Python scripts developed in the p53 Laboratory.
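The analysis scripts themselves are not described, so the following is only a minimal sketch of one way such read counting could be done. The flanking sequences, the function names and the example reads are hypothetical and chosen for illustration. Each read is searched for a constant left flank, the randomized codons that follow are translated with the standard genetic code, and identical peptide motifs are tallied.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_motif(read, left_flank, right_flank, n_codons):
    """Translate the n_codons between the two flanks, or return None."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = start + 3 * n_codons
    # Require the constant right flank immediately after the random region.
    if not read.startswith(right_flank, end):
        return None
    codons = (read[i:i + 3] for i in range(start, end, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def count_motifs(reads, left_flank, right_flank, n_codons):
    """Count translated motifs across all reads that contain both flanks."""
    motifs = (extract_motif(r, left_flank, right_flank, n_codons) for r in reads)
    return Counter(m for m in motifs if m is not None)


# Hypothetical flanks and reads, for illustration only.
LEFT, RIGHT = "GCCCAC", "GACGCC"
reads = [
    LEFT + "ATTGTGATGGTG" + RIGHT,  # encodes IVMV
    LEFT + "ATTGTGATGGTG" + RIGHT,  # encodes IVMV
    LEFT + "CTGGTGTTTGTG" + RIGHT,  # encodes LVFV
    "NNNNNNNN",                     # no flanks, ignored
]
counts = count_motifs(reads, LEFT, RIGHT, n_codons=4)
print(counts.most_common())
```

Ranking motifs by `counts.most_common()` gives the abundance order used when reporting the position of a given motif among the selected sequences.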

Protein expression and purification
The Sso7D-SpyC construct was cloned with an N-terminal 6×His-tag and transformed into E. coli BL21(DE3) (Invitrogen) competent cells. These were grown in LB medium with glucose (10 mM) at 37 °C and induced at OD600 ∼ 0.6 at 25 °C with 1 mM IPTG and incubated overnight. Cells were then harvested by centrifugation, and the cell pellet was resuspended in binding buffer (50 mM Tris pH 8, 500 mM NaCl, 20 mM imidazole) and sonicated. The cell lysate was heated at 65 °C for 15 min before clarification by centrifugation. The clarified cell lysate was applied to a 1 ml His-TrapFF column (GE Healthcare) pre-equilibrated in binding buffer and bound protein was eluted using a linear gradient (0-100%) in elution buffer ( Protein purity as assessed by SDS-PAGE was ∼95%, and the protein was concentrated using an Amicon-Ultra (3 kDa MWCO) concentrator (Millipore). The SpyTag-Stoffel construct was cloned with a C-terminal 6×His-tag and transformed into E. coli BL21(DE3) (Invitrogen) competent cells. These were grown in LB medium at 37 °C and induced at OD600 ∼ 0.6 at 30 °C with 0.5 mM IPTG and incubated overnight. Cells were then harvested by centrifugation, and the cell pellet was resuspended in binding buffer (50 mM Tris pH 8, 500 mM NaCl, 20 mM imidazole) and sonicated. The cell lysate was then heated at 65 °C for 15 min and clarified by centrifugation. The clarified cell lysate was applied to a 1 ml His-TrapFF column (GE Healthcare) pre-equilibrated in binding buffer and bound protein was eluted using a gradient elution (0-100%) in elution buffer (50 mM Tris pH 8, 500 mM NaCl, 1 M imidazole) over 50 column volumes. The fractions containing the protein were pooled, buffer exchanged into 50 mM Tris pH 8, 150 mM NaCl, 1 mM DTT and run on a size exclusion HiLoad 16/600 Superdex S200 column. Fractions were pooled and protein purity as assessed by SDS-PAGE was ∼95%. The protein was concentrated using an Amicon-Ultra (10 kDa MWCO) concentrator (Millipore).
Activity assays were carried out by co-incubating purified proteins (Sso7D-SpyC and SpyTag-Stoffel, 5 µM each) at room temperature for 30 min in buffer comprising 50 mM Tris pH 8, 150 mM NaCl. About 1 µl of the reaction mixture was subjected to polymerase activity assays as described above.

Pull-down assay
Biotin-labelled peptides (100 µM) were incubated with streptavidin beads (50 µl) for 2 h in PBS at room temperature, followed by three washes with PBS + 0.1% (v/v) Tween 20. Beads were next incubated at 4 °C overnight with 500 µM of Sso7d-SpyCatcher protein, followed by three washes with PBS + 0.1% (v/v) Tween 20 and then three washes with PBS. Bound protein was eluted by boiling in SDS buffer prior to analysis by SDS-PAGE.

Coupled polymerase read-out of protein-peptide interactions using model interactants
We first assayed the polymerase activity of the Stoffel fragment of Taq DNA polymerase (amino acids 293-832) (24) fused to either the Sso7d or Topoisomerase V HhH processivity domains. As previously reported (25,26), both domains facilitated PCR amplification in higher salt concentrations (>50 mM KCl) that inhibited the non-chimeric Stoffel fragment (Figure 1).
The SpyCatcher-SpyTag protein-peptide pair associates with relatively high affinity to form a complex with exceptional stability due to interlinking isopeptide bond formation (7). Sso7d-SpyCatcher and SpyTag-Stoffel fusion proteins were co-expressed in E. coli and polymerase activity assayed by adding cells directly to other standard PCR components and carrying out thermal cycling in buffer with increasing salt concentrations. Covalent association between SpyCatcher and bound SpyTag peptide resulted in an Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein competent for PCR in high-salt buffer (Figure 2A). Control reactions omitting either one or both of the SpyCatcher/SpyTag components did not show any DNA amplification. SDS-PAGE analysis of cell lysates used in PCR confirmed formation of the thermostable Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein (Figure 2B). Similar results were obtained using purified protein components (Figure 2C). Only the reaction comprising Sso7d-SpyCatcher and SpyTag-Stoffel proteins yielded PCR amplicons in high-salt buffer, with formation of the Sso7d-SpyCatcher-SpyTag-Stoffel fusion protein again confirmed by SDS-PAGE. We next replaced the SpyCatcher and SpyTag components with the noncovalently interacting large and small peptide fragments of split NanoLuc luciferase (NB and NS, respectively) (8). A series of small peptide fragments with wide-ranging affinities for the large fragment (Kd values of 0.7 to 1.9 × 10⁵ nM) were fused to Stoffel and individually co-expressed with the Sso7d-large fragment chimera. PCR analysis directly using expressor cells showed a positive high-salt buffer read-out for peptide variants with affinities ≤ 180 nM for the large NanoLuc fragment (Figure 3A,B and Supplementary Figure S1). Furthermore, amplicon yields correlated with the reported affinities of the small fragment peptides (8), with maximal polymerase activity observed for the highest affinity peptide (NS1, Kd = 0.7 nM).
We additionally assayed activity of the same panel of expressor cells for amplification of a larger 1545 bp fragment using normal buffer conditions but reduced annealing and extension times (15 and 10 s, respectively) during thermal cycling. Under these conditions, the same subset of salt-tolerant expressor cells yielded the correct amplicon (Figure 3C). Notably, all expressor cells yielded the larger amplicon when longer annealing and extension times (30 and 120 s) were used. Similar results were obtained using recombinant Sso7d-SpyCatcher and SpyTag-Stoffel proteins. Generation of the larger amplicon using a shorter extension time (15 s) only occurred when both proteins were present whilst SpyTag-Stoffel alone was able to yield amplicon using longer (120 s) extension time (Supplementary Figure S2). Therefore, high-affinity interactions can also be assessed using fast cycling conditions that require processivity gains attendant on Sso7d co-localization with polymerase to generate signal.

Model selections for interacting proteins and peptides using the compartmentalized self-replication (CSR) platform
The dynamic read-out of the reporter polymerase was next evaluated in the CSR platform. A test selection was carried out using E. coli cells co-expressing either Sso7d-NB + Stoffel or Sso7d-NB + NS1-Stoffel (Figure 4). Cells were mixed at different ratios prior to emulsification and thermocycling in high-salt buffer using a primer pair common to both expression constructs flanking the NS1 cassette. In the absence of emulsification, the Sso7d-NB-NS1-Stoffel complex amplified from both expression plasmid templates as expected (Figure 5). In contrast, C2HR enabled clonal amplification/enrichment of the NS1 cassette in plasmids expressing NS1-Stoffel (upper arrowed band) over those expressing Stoffel only (lower arrowed band). This is readily apparent at the 1:100 ratio of cells, with selection for the NS1 gene cassette occurring only when C2HR is used. The panel of cells co-expressing Sso7d-NB and NS-Stoffel variants (Figure 3) were next combined equally and one round of C2HR carried out. Analysis of only 10 selectants indicated preferential enrichment for the high-affinity NS1 variant (Kd = 0.7 nM, 5/10 selectants) followed by the next highest affinity variant, NS5 (Kd = 3.4 nM, 3/10 selectants). The other two selectants encoded the lower affinity NS2 variant. Together, these experiments confirm that C2HR can select for high-affinity interacting protein pairs.
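The enrichment implied by the 10 sequenced selectants can be expressed relative to the input pool. The short calculation below is an illustrative sketch rather than an analysis from the study. It assumes that the panel comprised six variants (NS1 to NS6, as named in the cloning description) mixed at equal frequency, and the function name is hypothetical. With only 10 selectants the estimates are necessarily coarse.

```python
def enrichment(selected_counts, input_fraction):
    """Observed frequency among selectants divided by the input frequency."""
    total = sum(selected_counts.values())
    return {
        name: (n / total) / input_fraction
        for name, n in selected_counts.items()
    }


# Selectant counts reported in the text (10 clones sequenced).
selectants = {"NS1": 5, "NS5": 3, "NS2": 2}

# Assumption: six variants combined equally, so each starts at 1/6.
panel_size = 6
factors = enrichment(selectants, 1 / panel_size)

print({name: round(f, 1) for name, f in factors.items()})
# {'NS1': 3.0, 'NS5': 1.8, 'NS2': 1.2}
```

Variants absent from the 10 selectants have an observed enrichment of zero at this sampling depth, which does not rule out low-level recovery.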

Selection for functional SpyTag peptide variants using compartmentalized two-hybrid replication (C2HR)
We next created a library of SpyTag-Stoffel variants wherein the hydrophobic 'IVMV' motif in SpyTag essential for high-affinity interaction with SpyCatcher (Figure 8A) (7) was randomized. This library (Lib 1) was co-expressed in E. coli along with Sso7d-SpyCatcher prior to encapsulation in emulsion compartments containing oligonucleotide primers flanking the randomized region of SpyTag along with other requisite PCR components (dNTPs, high-salt buffer). Ten rounds of thermal cycling were carried out to facilitate clonal amplification of genes encoding functional SpyTag core motifs, following which amplicons were harvested and sequenced en masse. Sequencing identified 96,400 unique peptide sequences with an average read number of 168. The wild-type 'IVMV' motif was the 161st most abundant (16,443 reads), indicating positive enrichment (Supporting Data File S2). Consensus motifs (27) highlighting positional residue frequencies differed notably between 20 random sequences from the naïve library and the 20 sequences most enriched by selection, with the latter showing a stronger preference for hydrophobic residues (Figure 6A). Further consensus sequence analysis of the top 500 abundant motifs identified the endogenous 'IVMV' motif, and highlighted tolerance for other bulky hydrophobic residues in place of the isoleucine and methionine residues (Figure 6B). These pack into a hydrophobic groove in SpyCatcher and are essential for high-affinity interaction (Figure 8A) (28). Higher sequence variation was tolerated at both valine positions in the motif, again consistent with structural data showing that these residues project away from the SpyCatcher hydrophobic pocket and contribute less to productive binding interactions.
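The positional frequencies underlying a consensus motif can be tabulated directly from a list of equal-length motifs. The sketch below is illustrative only: the example motifs are made up, the function names are hypothetical, and the set of residues treated as hydrophobic is an assumption that could reasonably be drawn differently. It is not the logo-generation tool cited in the text.

```python
from collections import Counter

# Assumption: residues counted as hydrophobic for this illustration.
HYDROPHOBIC = set("AVILMFWY")


def position_frequencies(motifs):
    """Return one {residue: frequency} dict per motif position."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    columns = []
    for pos in range(length):
        counts = Counter(m[pos] for m in motifs)
        columns.append({aa: n / len(motifs) for aa, n in counts.items()})
    return columns


def hydrophobic_fraction(motifs):
    """Fraction of hydrophobic residues at each position."""
    return [
        sum(f for aa, f in column.items() if aa in HYDROPHOBIC)
        for column in position_frequencies(motifs)
    ]


# Made-up motifs standing in for a set of enriched sequences.
example = ["IVMV", "LVMA", "IAFV", "FVLT"]
print(position_frequencies(example)[0])  # {'I': 0.5, 'L': 0.25, 'F': 0.25}
print(hydrophobic_fraction(example))     # [1.0, 1.0, 1.0, 0.75]
```

Comparing such per-position fractions between naïve and selected sequence sets gives a simple numerical counterpart to the visual comparison of sequence logos.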
We next carried out a further single round selection, this time randomizing the three residues either side of the core 'IVMVD' motif of SpyTag (Lib 2). The obligate aspartic acid residue in this motif forms the isopeptide bond with lysine 31 in SpyCatcher. Sequencing yielded 160,415 unique peptide sequences with an average read number of 91 (Supporting Data File S2). Endogenous SpyTag with 'GAH' and 'AYK' flanking motifs was the 52nd most abundant peptide (26,269 reads), again indicating positive selection by C2HR. Notably, no clear consensus motif emerged upon analysis of the top 500 enriched sequences, signifying a higher degree of redundancy for residues flanking the SpyTag 'IVMVD' core motif (Figure 6B). This was confirmed by analysis of the top 10 enriched flanking motifs for SpyCatcher binding. All showed a positive, covalent interaction with SpyCatcher as judged by high-salt PCR and SDS-PAGE analysis (Figure 7). We further synthesized biotinylated peptides encoding SpyTag and the top Lib 2 selected variant (STL2: SFDIVMVDHVS) and assayed pull-down of a recombinant target protein (Sso7d-SpyCatcher). As before, the variant showed comparable activity to SpyTag, pulling down a similar amount of the SpyCatcher fusion protein (Supplementary Figure S3). In both the Lib 1 and Lib 2 selections, wild-type SpyTag was present among the top 0.5% most abundant peptide sequences selected after one round, giving an indication of the cut-off threshold for future selections.
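The statement that wild-type SpyTag fell within the top 0.5% of sequences in both selections can be checked from the reported ranks, unique sequence counts and read numbers. The following is a simple arithmetic sketch using only those figures; the function names are illustrative.

```python
def rank_percent(rank, n_unique):
    """Rank expressed as a percentage of all unique sequences."""
    return 100 * rank / n_unique


def fold_over_mean(reads, mean_reads):
    """Read count relative to the average read number per sequence."""
    return reads / mean_reads


# Lib 1: wild-type 'IVMV' ranked 161st of 96,400 unique sequences.
lib1_percent = rank_percent(161, 96_400)
lib1_fold = fold_over_mean(16_443, 168)

# Lib 2: wild-type flanks ranked 52nd of 160,415 unique sequences.
lib2_percent = rank_percent(52, 160_415)
lib2_fold = fold_over_mean(26_269, 91)

print(f"Lib 1: top {lib1_percent:.3f}%, {lib1_fold:.1f}x mean reads")
# Lib 1: top 0.167%, 97.9x mean reads
print(f"Lib 2: top {lib2_percent:.3f}%, {lib2_fold:.1f}x mean reads")
# Lib 2: top 0.032%, 288.7x mean reads
```

Both ranks lie well inside the 0.5% threshold, and in each case the wild-type sequence was read roughly two orders of magnitude more often than the average selected sequence.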

Co-evolution of an interacting protein-peptide pair using C2HR
We next investigated co-evolution of both peptide and an interacting partner using C2HR. The isoleucine residue in the core 'IVMV' motif of SpyTag packs into a discrete hydrophobic pocket lined by phenylalanines 75 and 92 of SpyCatcher (Figure 8A). These three residues were simultaneously randomized to cover all amino acid combinations and C2HR selection carried out. In contrast to previous selections, the primer pair was chosen to generate amplicons co-encoding interacting SpyCatcher and SpyTag variants during the emulsion PCR phase. We additionally carried out selections using uninduced cells, relying on T7 promoter leakiness to reduce protein levels and potentially increase selection pressure.
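The size of a library that covers all amino acid combinations at three positions follows from simple combinatorics. The sketch below is illustrative: the randomization scheme is not stated in the text, so the NNK codon scheme (32 codons per position) used here is an assumption, as is the idealization that all variants are equally represented. The coverage estimate uses the standard relation N = ln(1 - P) / ln(1 - 1/V) for the number of clones N needed so that a given one of V equiprobable variants is present with probability P.

```python
import math


def amino_acid_diversity(n_positions, alphabet=20):
    """Number of distinct amino acid combinations at n randomized positions."""
    return alphabet ** n_positions


def clones_for_coverage(n_variants, probability):
    """Clones needed so that a given variant is sampled with this probability.

    Assumes all variants are equally likely, which real libraries are not.
    """
    return math.ceil(math.log(1 - probability) / math.log(1 - 1 / n_variants))


protein_variants = amino_acid_diversity(3)  # three co-randomized residues
codon_variants = 32 ** 3                    # assumption: NNK codons

print(protein_variants)                            # 8000
print(codon_variants)                              # 32768
print(clones_for_coverage(codon_variants, 0.95))   # roughly 98,000 clones
```

By the same counting, the four-residue Lib 1 core motif has 160,000 possible amino acid sequences and the six flanking residues of Lib 2 have 64 million, which places the co-randomized library at the small end of the libraries described here.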
After one round of selection using induced cells, 1 out of the 42 selectants analyzed comprised the endogenous FF/I residues at the randomized SpyCatcher/SpyTag positions. Other combinations that were enriched included IY/W, LF/Y and FF/P (2 out of 42 selectants for each). Consensus sequence analysis of all 42 selectants further highlighted preference for hydrophobic residues at the three randomized positions (Figure 8B). In particular, clear selection for the endogenous phenylalanine residues in SpyCatcher was observed. No clear consensus emerged from analysis of 52 random sequences from the unselected library, although there was some inherent bias for phenylalanine and leucine at codon 92 of SpyCatcher. A second round of selection did not lead to enrichment of any specific motif, but clearly enriched for bulky hydrophobic residues at the randomized positions. In the absence of induction, the FF/I motif was not observed in any of the selectants analyzed in the first round. It was, however, enriched after the second round (4/47 selectants). As with induced C2HR conditions, clear selection for bulky hydrophobic residues was also observed. The aggregate consensus for all sequences enriched during both rounds (Figure 8B, top right) further emphasizes this preference.
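One way to gauge whether recovering the endogenous FF/I combination reflects selection is to ask how often it would appear by chance. The calculation below is a rough sketch, not an analysis from the study. It assumes an unbiased library in which each of the 8000 amino acid combinations is equally likely, which the observed bias for phenylalanine and leucine at codon 92 shows is not strictly true, so the probabilities should be read as order-of-magnitude guides only.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent draws."""
    return sum(
        comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumption: uniform library over 20**3 amino acid combinations.
p_ffi = 1 / 20 ** 3

# Round 1, induced cells: FF/I seen in at least 1 of 42 selectants.
p_round1 = prob_at_least(1, 42, p_ffi)

# Round 2, uninduced cells: FF/I seen in at least 4 of 47 selectants.
p_round2 = prob_at_least(4, 47, p_ffi)

print(f"{p_round1:.4f}")  # about 0.005
print(f"{p_round2:.1e}")  # about 4e-11
```

Under this assumption, a single FF/I clone among 42 would arise by chance in roughly 1 experiment in 200, whereas four among 47 would be vanishingly unlikely without enrichment.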

DISCUSSION
We have described facile detection of protein-peptide interactions through coupling to enzymatic activity of a thermostable nucleic acid polymerase. Whilst we have exemplified this using the Stoffel fragment of Taq polymerase, evolutionary conservation of protein-nucleic acid interaction mechanisms (29,30) suggests that other families and classes of polymerase (e.g., DNA/RNA dependent RNA polymerase) could potentially be configured to work in C2HR. As shown, E. coli cells co-expressing a protein-peptide pair can be added directly into a PCR tube and interaction validated by assessing amplicon yield after thermal cycling. While end-point PCR was used to validate interactions, more quantitative read-outs could be obtained using real-time PCR. Emulsion PCR and other single molecule detection methodologies (31-33) could possibly also be adapted for absolute (i.e., digital) quantification of interacting pairs.
We have further transposed the interaction assay into the CSR directed evolution platform to select for peptide binders using two model interacting peptide-protein pairs. The highest affinity variant of a peptide fragment of split NanoLuc luciferase was readily enriched from a test pool of described peptides with wide-ranging affinities. As next exemplified using the interacting SpyCatcher-SpyTag pair, a much larger repertoire of candidate peptides was interrogated through a single round of C2HR and deep sequencing to rapidly identify binders with a hydrophobic consensus peptide motif comprising the endogenous SpyTag core sequence. Given the irreversibility of the SpyCatcher-SpyTag interaction, it is likely that selection for improved SpyTag variants will require additional selection pressure. This could be introduced by further reducing substrate levels through tighter control of intracellular expression and/or co-expression of competing substrates. Another option is to express SpyTag variants fused to the Stoffel fragment intracellularly, and titrate levels of recombinant Sso7d-SpyCatcher added during C2HR emulsification of cells. We have also shown directed co-evolution by selection of interacting protein-peptide pairs from a focused co-randomized library. Here, we varied the key isoleucine in SpyTag along with the two phenylalanines in the SpyCatcher hydrophobic cleft that it packs against. Selection yielded the endogenous SpyCatcher and SpyTag residues, and highlighted functional degeneracy, with many other combinations of hydrophobic residues being tolerated. This plasticity has previously been exploited to yield orthogonal SpyCatcher-SpyTag pairs through mutagenesis of the same residue set by conventional screening (34). Further C2HR selections incorporating competitor substrates and deep sequencing could therefore yield many more orthogonal protein-peptide pairs with wide-ranging applications (34)(35)(36).
Additionally, use of faster cycling conditions to score for productive interactions (Figure 3C and Supplementary Figure S2) would obviate the use of higher salt concentrations during selections, and could be employed to select for protein-peptide interactions under more physiologically relevant conditions. Of pertinent interest would be the study and co-evolution of interactant pairs in clinically relevant virus-host systems (37,38).
The protein-peptide pairs used in this study are inherently thermostable, a pre-requisite for polymerase read-out of interactions by thermal cycling. The SpyCatcher-SpyTag pair has a reported Tm of 85.4 °C, while the large fragment of NanoLuc has a Tm of 54 °C (8,39). The thermal stability of both is likely further elevated through binding to high-affinity peptides and by fusion to the highly thermostable Sso7d so as to enable read-out using thermal cycling at consistently elevated temperatures. This thermostability requirement could be exploited to select for thermostabilizing mutations in peptide-binding proteins (e.g.,

The two underlined phenylalanine residues in SpyCatcher and the underlined isoleucine in SpyTag were randomized prior to selection. The corresponding positions of these residues (purple, pink and yellow respectively) in the binary complex are shown on the right (adapted from 4MLI) (28). (B) Consensus sequence logos for naïve and library selectants after one or two rounds of C2HR. Frequency of endogenous (FF/I) and other enriched motifs indicated. Top right logo denotes aggregate consensus for all round 1 and round 2 sequences. n = 52 (naïve), 42 (R1 induced, R1 uninduced), 55 (R2 induced), 47 (R2 uninduced) and 186 (all R1 and R2 selectants). Error bars represent ± SD.