Biofoundry-assisted expression and characterization of plant proteins

Abstract Many goals in synthetic biology, including the elucidation and refactoring of biosynthetic pathways and the engineering of regulatory circuits and networks, require knowledge of protein function. In plants, the prevalence of large gene families means it can be particularly challenging to link specific functions to individual proteins. However, protein characterization has remained a technical bottleneck, often requiring significant effort to optimize expression and purification protocols. To leverage the ability of biofoundries to accelerate design–built–test–learn cycles, we present a workflow for automated DNA assembly and cell-free expression of plant proteins that accelerates optimization and enables rapid screening of enzyme activity. First, we developed a phytobrick-compatible Golden Gate DNA assembly toolbox containing plasmid acceptors for cell-free expression using Escherichiacoli or wheat germ lysates as well as a set of N- and C-terminal tag parts for detection, purification and improved expression/folding. We next optimized automated assembly of miniaturized cell-free reactions using an acoustic liquid handling platform and then compared tag configurations to identify those that increase expression. We additionally developed a luciferase-based system for rapid quantification that requires a minimal 11–amino acid tag and demonstrate facile removal of tags following synthesis. Finally, we show that several functional assays can be performed with cell-free protein synthesis reactions without the need for protein purification. Together, the combination of automated assembly of DNA parts and cell-free expression reactions should significantly increase the throughput of experiments to test and understand plant protein function and enable the direct reuse of DNA parts in downstream plant engineering workflows.


Introduction
Plant synthetic biology endeavors to apply principles of abstraction, modularity and standardization to engineer plants for useful purposes (1,2). In support of this goal, the plant community adopted a common syntax, commonly known as the 'phytobrick' standard, that defines the features of DNA parts (3,4). For expression in plants, phytobricks can be assembled into synthetic genetic circuits using a number of plasmid toolkits including MoClo (5), Loop (6), GoldenBraid (7) and Mobius (8). These systems use Type IIS restriction endonucleases to direct onepot digestion-ligation assembly reactions, known as Golden Gate (9), that are easily parallelized and miniaturized for automation (10,11). Parallel DNA assembly has become an enabling technology for biofoundries, which specialize in automating the designbuild-test-learn (DBTL) cycle that underpins synthetic biology (12)(13)(14).
Plants have long been exploited as sources of bioactive and high-value natural products (15), and recently, applications of model-informed synthetic biology approaches have led to sophisticated engineering of crop traits including biomass and responses to the environment (16)(17)(18). There is also growing interest and investment in the use of plants, particularly Nicotiana benthamiana, as photosynthetic platforms for the production of recombinant proteins and small molecules for industry and medicine (19)(20)(21), including rapid-response vaccines (22-25). For many applications, methods to rapidly characterize protein functions are essential. This particularly applies to understanding the specific functions of members of large protein families such as decorating enzymes and transcription factors (TFs). However, for plant proteins, this is frequently a bottleneck, with researchers often investing considerable time and effort to identify permissive constructs and conditions to obtain useful yields before lengthy protocols to purify recombinant proteins for in vitro assays.
Cell-free protein synthesis (CFPS) is an established tool for rapid in vitro protein production that combines a DNA template, energy source, amino acids, nucleotide triphosphates (NTPs) and excess cofactors, along with a crude lysate containing the translational machinery (26, 27). The source lysate from Escherichia coli is highly active, easy to manipulate and can produce a broad range of proteins (28) including enzymes (29, 30), antibodies (31), glycoproteins (32, 33), as well as proteins containing non-canonical amino acids (34-36). It is also possible to use plant-based lysates from wheat germ (37), Arabidopsis (38) and BY-2 tobacco cells (39). Cell-free expression can minimize 'build' times in the DBTL cycle (40) and has been used to prototype metabolic pathways (29,30,41,42) and characterize several natural product biosynthesis pathways (43,44) including cyanobacterial alkaloids (45) and antibiotic peptides (46). Furthermore, CFPS is amenable to automation and miniaturization (47). The use of automation platforms that utilize acoustic energy to transfer reagents has been shown to reduce operator error and variability in CFPS reactions (48), facilitate active learning-guided optimization of reaction conditions (49) and generally increase the throughput of experiments (50)(51)(52)(53)(54)(55).
In this work, we have developed molecular tools and automated biofoundry workflows to enable cell-free expression of plant proteins ( Figure 1A). We first developed a phytobrickcompatible plasmid toolkit including (i) plasmid acceptors containing regulatory elements for T7-driven E. coli CFPS and the commercial TNT SP6-Coupled Wheat Germ Expression System (Promega), (ii) affinity tags for purification or detection and (iii) a suite of tags for improving the yields of soluble protein. We then use the toolkit to optimize the expression of a range of plant proteins including enzymes and TFs. Finally, we demonstrate the functional activity of the cell-free expressed proteins. Together, our tools and workflows enable the rapid optimization of expression conditions and allow immediate progression to characterization assays, significantly increasing the scale and throughput of experimentation.

DNA assembly
Acceptor plasmids for CFPS (see Table 1) were designed as 'terminal acceptors' and are not designed for subsequent assembly into plasmids containing multiple transcriptional units; each contains promoter and terminator sequences specific to the expression system being utilized. Acceptors for E. coli CFPS were based on pJL1 (Addgene #69496) and some include an N-terminal Expression Tag (NET) that encodes Met-Glu-Lys-Lys-Ile (MEKKI) shown to enhance the expression of some proteins (29). Acceptors for wheat germ expression were based on pEU-Gm23 (Addgene #53738), which uses the pEU backbone (56) containing the E01 translational enhancer (57).
New level 0 DNA parts (i.e. N-terminal tags, coding sequences (CDSs), C-terminal tags) were either synthesized (Twist Bioscience, San Francisco, CA) or amplified by PCR with overhangs containing BpiI (BbsI) recognition sites and assembled into pUAP1 (Addgene #63674) to create parts compatible with the phytobrick standard (3). We included a number of N-terminal tags previously shown to improve translation, solubility and folding (58,59): the S-tag sequence pancreatic ribonuclease A (KETAAAKFERQHMDS) typically used for quantitation/purification but anecdotally suggested to improve protein solubility (60); a small ubiquitin-related modifier (SUMO) sequence known to increase soluble expression (61) derived from pDest-Sumo (Addgene #106980) (62); a TrxA sequence from E. coli and derived from pET32a-TRXtag (Addgene #11516) (63); a glutathione-S-transferase (GST) (64) and thrombin sequence derived from pGEX-2T (GE Healthcare, Chicago, IL), and a maltose-binding protein (MBP) (65) and Factor Xa sequence derived from pMAL-c5X vector (New England Biolabs, Ipswich, MA) with a V313A mutation to be consistent with the native E. coli sequence. We also included N-terminal and C-terminal HiBiT tags (Promega, Madison, WI) for quantification using luminescence, a green fluorescence protein (GFP) derived from pJL1 (Addgene #69496) and a C-terminal twin-strep-tag (66). Finally, we included a CDS part encoding the tobacco etch virus (TEV) protease containing a S219V mutation reported to have less autolysis (67). All DNA assembly reactions were performed in 20 µl (manual) or 2 µl (automated) reaction volumes as previously described (10) and were verified by sequencing. All plasmids used in this study are described in the supplementary information (Supplementary Table S1) along with nucleotide sequences for L0 DNA parts encoding plant proteins (Supplementary  Table S2). Reactions were assembled in a final volume of 2 µl using a Labcyte Echo® 550 (Beckman Coulter, Brea, CA) via a two-step strategy. Initially, 30-50 µl of each plasmid (or water) was distributed into an Echo® Qualified 384-Well Polypropylene Source Microplate (384PP, P-05525). Subsequently, 65 µl of a master mix containing all remaining CFPS reagents (with lysate added just before distributing) was aliquoted to fresh wells of the same source plate. Next, the Labcyte Echo® Plate Reformat software v1.6.4 was used to direct the distribution of 1735 nl to each well of a FrameS-tar® 384-Well PCR Plate (4ti-0384/C; 4titude, Wotton, UK) (i.e. the destination plate) using 384PP_AQ_SP2 as a sample plate-type setting. Each source well can support 20-21 reactions. Then, the Labcyte Echo® Cherry Pick software v1.6.4 (guided by a .csv file) used the 384PP_AQ_BP2 sample plate-type setting to direct the distribution of 265 nl containing 26.6 ng plasmid (equivalent to a final concentration of 13.3 ng/µl) with remaining volume water to all destination wells. The destination plate was then centrifuged briefly at room temperature, covered with adhesive aluminum foil and incubated at 30 • C for 20 h (lid temperature 37 • C) in a Mastercycler pro384 vapo.protect thermocycler (Eppendorf, Hamberg, Germany). Scaled-up (15 µl) CFPS reactions were performed in 1.5 ml DNA LoBind® microcentrifuge tubes (0030108051; Eppendorf). To quantify soluble/total protein, reactions were centrifuged for 10 min at 4 • C at 21 000 × g to pellet insoluble protein.
All wheat germ cell-free expressions used the TNT® SP6 High-Yield Wheat Germ Protein Expression System L3260/L3261 (Promega) according to manufacturers' instructions. Plasmids were added at a final concentration of 80 ng/µl, and assembled reactions were incubated at 25 • C for 20 h.

Protein quantification
To compare relative protein concentration of proteins containing GFP, 2 µl of CFPS reaction was diluted with 198 µl of 10 mM Tris-HCl pH 7.5 in a black 96-well plate (655076 PS medium binding; Greiner Bio-One Vilvoorde, Belgium). After a double orbital shake for 10 s at 300 rpm, fluorescence was measured (excitation 470 nm, emission 515 nm) using a CLARIOstar plate reader (BMG Labtech Gmbh, Ortenberg, Germany).
Absolute protein concentration of HiBiT-tagged proteins was measured using the Nano-Glo® HiBiT Extracellular Detection System N2420 (Promega). CFPS reactions were diluted 10 4 -10 6 -fold using 1× PLB + PI buffer (generated by mixing 2 ml of 5× Passive Lysis Buffer E1941 (Promega) with 8 ml water plus one cOm-plete™ Mini EDTA-free 11836170001 (Roche, Basel, Switzerland) protease inhibitor tablet). A standard curve of HiBiT Control Protein N3010 (Promega) was diluted in PLB + PI at concentrations ranging from 0.01 to 2 nM. Luminescence reactions were mixed in a white 96-well plate (Greiner Bio-One 655075 PS medium binding) by combined 40 µl of diluted protein (CFPS reaction or standard protein) with 10 µl of luminescence reagent master mix (containing 10 µl of LgBiT protein, 20 µl of 50× substrate and 970 µl of HiBiT buffer). Luminescence values were recorded in a CLARIOstar plate reader every 2.33 min for an hour to ensure the signal is stable over time with the value at 18 min typically used for sample comparison. CFPS reactions assembled on the Labcyte Echo® 550 were diluted 10 2 -10 5 -fold using 10 mM Tris-HCl pH 7.5 and frozen at −20 • C; 4 µl was later thawed and added to 36 µl of PLB + PI along with 10 µl luminescence reagent master mix for measurement.

Purification of His-tagged UGT73 C5
Plasmid pEPQDCB0093 was transformed into BL21(DE3) E. coli cells. A starter culture of 50 ml of lysogeny broth (LB) media containing 100 µg/ml carbenicillin was inoculated with 2 ml of saturated overnight culture. Cells were grown at 37 • C (250 rpm) until OD 600 ∼ 1.5-3 and used to inoculate 1000 ml of 2YT media containing 100 µg/ml carbenicillin to a calculated initial OD 600 of 0.037. Cells were further grown at 37 • C (250 rpm) until OD 600 = 0.6 and transferred to a 18 • C shaking incubator and allowed to cool. After 1 h at 18 • C (250 rpm), 0.5 mM isopropyl ß-D-1thiogalactopyranoside (IPTG) was added to induce protein expression, and cells were incubated at 18 • C (250 rpm) overnight. Cells were pelleted by centrifugation at 8000 × g for 15 min at 4 • C and resuspended in 1× phosphate buffered saline (PBS). Cells were centrifuged again at 4000 × g for 15 min at 4 • C, supernatant poured off and flash frozen on liquid nitrogen for storage at −80 • C. After thawing, cells were resuspended in 50 ml Buffer A (50 mM Tris-HCl pH 8, 50 mM glycine, 500 mM NaCl, 5% (v/v) glycerol, 20 mM imidazole) supplemented with 10 mg lysozyme (Sigma, 62 971) and one tablet of cOmplete™ EDTA-free protease inhibitor (Roche, 11873580001). Cells were lysed by a single run through a Cell Disruption System CF1 (Constant Systems Limited, Daventry, UK) cell disruptor at 26 kpsi and centrifuged at 40 000 × g for 30 min at 4 • C to remove debris. Protein was purified using an AKTA Pure HPLC (GE Healthcare) fitted with a HisTrap HP 5 ml column (GE Healthcare) equilibrated with Buffer A. Samples were step-eluted using Buffer B (50 mM Tris-HCl pH 8, 50 mM glycine, 500 mM NaCl, 5% (v/v) glycerol, 500 mM imidazole). Eluted protein was further purified on a HiLoadTM 16/600 SuperdexTM 200 column (GE Healthcare) and eluted with Buffer A4 (20 mM HEPES, 150 mM NaCl, pH 7.5). Protein was concentrated using a Vivaspin 500 column (Sartorius, Goettingen, Germany) column following the manufacturer's instructions.
For detection of geraniol and nepetalactol glycosides, 2 µl of quenched in vitro reaction was injected and separated on a Waters UPLC with an Acquity BEH C18, 1.7-µm (2.1 × 50 mm) column (40 • C) at a flow rate of 0.6 ml/min. Mobile phase A was 0.1% formic acid and mobile phase B was acetonitrile. A linear gradient from 5% B to 60% B in 5.5 min and 60-100% B in 0.5 min was applied for compound separation followed by 100% B for 1 min. The flow was returned to 5% B for 2.5 min to re-equilibrate prior to the next injection. Eluting compounds were subjected to positive electrospray ionization (ESI) and analyzed on a Waters Xevo TQ-s (QqQ) using optimized source conditions: cone voltage 30 eV, capillary  Approximately 0.1 g of NaCl was added, and the terpenoids were extracted by addition of 500 µl tert-butyl methyl ether. Compounds were analyzed using an HP 6890 gas chromatograph with a 5973 MSD (Hewlett Packard/Agilent, Santa Clara, CA) using 1 µl injection, a Zebron™ ZB-5HT Inferno™ capillary column (30 m × 0.25 mm × 0.10 µm + 5 m Guardian) with split vent and helium carrier gas at a constant flow of 1.0 ml/min. The inlet temperature was 200 • C, and the initial column temperature was held at 40 • C for 0.5 min, increased at 25 • C/min to 70 • C, increased at 3 • C/min to 120 • C and finally increased at 50 • C/min to 200 • C, which was maintained for 3 more minutes. Mass spectra of relevant peaks were compared with NIST database standards to identify chrysanthemol and lavandulol.

A modular type IIs cloning toolbox for cell-free protein synthesis
To enable a pipeline for cell-free expression of plant proteins, we first built a suite of plasmid vectors amenable to miniaturization and automation and able to utilize DNA parts in the phytobrick standard ( Figure 1A). We built plasmid toolkits for two cell-free systems: the E. coli-based PANOx-SP (29, 68, 69) (driven by a T7 promoter) and the Promega TNT® SP6 High-Yield Wheat Germ Protein Expression System (driven by the SP6 promoter) (37). Using high copy number plasmid backbones, we constructed a suite of 10 different acceptor plasmids with minimal regulatory sequences and strong terminators for both systems, each capable of utilizing Level 0 phytobrick N-tag, CDS and C-tag parts ( Figure 1B, Table 1). All plasmids have been deposited in the Addgene plasmid repository.
To expand the utility of the DNA assembly toolbox, we made a library of parts encoding N-terminal and C-terminal tags for purification, detection and improved expression. Facile detection of proteins is important for high-throughput CFPS applications, and although sfGFP was optimized to reduce interference with the folding of its fusion partners, bulky fluorescent protein tags are not always tolerated. To accurately measure protein expression without a GFP fusion or using expensive radiolabeled amino acids, we adapted the HiBiT system sold by Promega. This requires only a minimal 11-amino acid tag to be genetically encoded. Following protein expression, an 18-kDa engineered polypeptide with high affinity for the tag is added, resulting in a complex with luciferase activity proportional to the level of HiBiT-tagged protein. To assess the ability of HiBiT to quantify protein in CFPS reactions, we measured the E. coli cell-free expression of sfGFP fused with a C-terminal HiBiT tag (pEPQDKN0248) to be 41.8 ± 3.0 µM (1140 ± 84 µg/ml) (Figure 2A, second bar from top), which is consistent with other PANOx-SP systems using BL21(DE3) (69,72). The HiBiT tag works for quantification on both the N-and C-terminus of a target protein; however, we found that inclusion on the N-terminus reduced cell-free protein yield when measuring relative GFP fluorescence (Figure 2A). Several fusion tags have been shown to improve translation, solubility and folding of recombinant proteins; however, it is rarely obvious which expression tag is optimal for a given protein. We therefore included a number of different tags in the toolkit. We initially assessed the functionality of S-tag, SUMO, thioredoxin (TRX), GST and MBP tags by assembling them with the sfGFP CDS and doing E. coli CFPS reactions (Figure 2A). We found that many N-terminal tags appeared to express well, although the larger tags (such as MBP) reduced expression. As expected, they did not increase the expression of sfGFP, for which the CFPS reaction conditions have been optimized (34, 69).
As many downstream protein applications require the removal of tags that may interfere with, for example, enzyme functionality, we developed a low-cost method for tag cleavage. First, we expressed the TEV protease S219V using CFPS. Subsequently, this reaction was mixed with the reaction containing the target protein containing the cleavage site (Supplementary Figure . Values in panels A and C represent averages (n = 3), and error bars represent 1 SD. S1). Mixing the two reactions overnight at a 1:130 w/w ratio of protease:target (∼1:2 v/v ratio of cell-free reactions) showed efficient cleavage when assayed by western blot using an anti-GFP antibody ( Figure 2B) and did not affect enzyme activity (Supplementary Figure S1C).

Sequential improvement of automated cell-free protein synthesis
To enable the progression of high-throughput combinatorial experiments, for example, to select tags that enable the best yields for individual proteins, we established an automated workflow for low-volume CFPS. Our aim was to achieve consistent results across replicates and to ensure that the yields obtained from low-volume reactions correlated with those from largescale reactions, thus demonstrating that conditions established using biofoundries are transferable. To maximize consistency, we pre-mixed all cell-free reaction components into a master mix, which was distributed to all wells of a 384-well plate to which the plasmid template was subsequently added. We first found that calibration of plate type settings was required to enable nanovolumes of master mix to be accurately transferred by the acoustic droplet technology of the Labcyte Echo (Supplementary Figure S2A and B). We then optimized the number of destination wells to which the reaction mix was transferred from each source well to reduce the number of failed reactions (Supplementary Figure S2C). Finally, to reduce the time taken for 384 CFPS reactions to be assembled while maximizing the consistency and yield of protein between reactions, we compared four different reaction volumes (Supplementary Figure S2D) as well software protocols that monitor reagent transfer (Supplementary Figure S2E).
In summary, we found that a single specific plate setting (384PP_AQ_SP2) was essential for consistent distribution and that expression levels obtained from 2000-nl reactions were the most consistent and produced the largest amount of protein. Expression of GFP fusion protein was found to be consistent across all wells of a 384-well plate, and the yields obtained correlated well (R 2 = 0.902) with assembly by a manual pipette (15 µl) ( Figure 2C). This suggests that automated reaction assembly is an appropriate method for quantifying and comparing expression from different DNA templates.

Selecting optimal configurations for expression of plant proteins
To test our DNA and cell-free assembly workflow on non-model CDSs, we first selected a range of plant UDP glycosyltransferases (UGTs). UGTs are a superfamily of enzymes that catalyze the addition of glycosyl groups, including many plant specialized metabolites. Although UGTs can easily be identified from their primary sequence, functional assays are typically required to identify which family members are active on a specific substrate. We included UGT73C5 from Arabidopsis thaliana, together with 10 uncharacterized UGTs from N. benthamiana. We constructed eight different versions of 11 plant UGTs (88 total plasmids), assembled 2000-nl cell-free reactions and measured total protein level using the HiBiT assay ( Figure 3A). In general, we observed that SUMO and S-tags improved the expression of most UGTs; however, several did not express well in any context and may require further optimization or an alternate expression strategy. The GST fusion produced high protein levels for only a handful of UGTs (UGT74T6). Finally, we manually assembled 15-µl reactions for a dozen different UGT-encoding plasmids and again obtained a strong correlation with the low-volume automated reaction (Supplementary Figure S3).
After demonstrating that biofoundry protocols can be used to select optimal construct configurations, we proceeded to test the expression of different classes of proteins. We first tested additional enzymes. CFPS has previously been demonstrated for monoterpene synthases that catalyze a regular head-to-tail 1-4 linkage, for example, geranyl diphosphate synthase or farnesyl diphosphate synthase (29). We successfully used CFPS to express chrysanthemol diphosphate synthase (CcCPPase) from Chrysanthemum cinerariaefolium (Supplementary Figure S4), which catalyzes an irregular, non-head-to-tail 1-2 linkage between molecules of DMAPP and also hydrolyzes the diphosphate moiety to produce chrysanthemol that contains a cyclopropane ring and is a precursor to pyrethrins, a widely used class of plant pesticides (73).
We then tested the expression of plant TF proteins, which are of great interest to plant synthetic biologists aiming to engineer complex gene regulatory networks known to control important agricultural traits or to link the synthetic circuits to endogenous processes. Most plants contain large TF families, and many of their cognate DNA sequences remain unknown. We selected two TGA-family TFs from Arabidopsis thaliana (TGA) and three TFs from Triticum aestivum (bread wheat). All five TFs were expressed using E. coli CFPS with five different N-terminal configurations ( Figure 3B). The NET and NET-6xHis tags were the most consistent across multiple TFs, but the S-tag proved remarkably helpful in expressing TGA2. To further assess the cell-free expression of TFs from wheat, we wanted to compare E. coli CFPS with expression using TNT SP6 High-Yield Wheat Germ Protein Expression System (Promega). As a first test, we first assembled GFP into three different plasmid acceptors containing different combinations of terminator and antibiotic resistance sequences and found that folA terminator from the pEU vector outperformed the T7 terminator (Supplementary Figure S5). We then cloned the three LUX protein CDSs (along with LHYA and NAMA1) into the optimal wheat germ acceptor and measured soluble and total protein expression using the HiBiT assay. While both systems produced (C) Cell-free-expressed AtTGA2 binds a 60-bp DNA probe containing its cognate-binding site but does not impede the mobility of a probe with randomized sequence.
folded protein, the wheat germ kit generally produced higher levels of soluble protein with the exception of NAMA1, only expressed in the E. coli system ( Figure 3C).

Cell-free expressed proteins are functionally active
Having identified conditions from which we could obtain a good yield of several proteins, we wanted to test if the proteins obtained were functionally active and whether assays could be conducted without extensive purification protocols. To test UGT enzymatic activity, we expressed UGT73C5 in 15-µl reactions and incubated the reactions with UDP-glucose along with geraniol or cis-trans-nepetalactol. Reactions were quenched at 1 h (Supplementary Figure S6) and then analyzed by liquid chromatography-mass spectrometry (LC-MS), which monitored possible glucoside-substrate adducts ( Supplementary Figures S7  and S8). Cell-free reactions expressing UGT73C5 in three different tag configurations produced glucosides of both substrates with peaks matching the retention time of glucosides produced by purified UGT73C5 and not present in the GFP negative control ( Figure 4A). The cell-free reactions, including GFP negative control, contain additional peaks likely derived from unspecified compounds in the E. coli lysate. We were also able to detect the synthesis of chrysanthemol by gas chromatography-mass spectrometry (GC-MS) after incubating cell-free reactions expressing CcCPPase with DMAPP ( Figure 4B), obtaining the expected fragmentation pattern (Supplementary Figure S9).
To test the binding ability of the cell-free expressed protein to DNA, CFPS-derived AtTGA2 was incubated with a DNA sequence probe containing a known binding site from the cauliflower mosaic virus (CaMV) 35S promoter (74) and DNA-protein interactions visualized by EMSA. Cell-free expressed TGA2 successfully bound to the probe containing its binding site but not the random control sequence (Figure 4C), suggesting that the DNA-binding domain of the cell-free expressed TGA2 was correctly folded and functional.

Discussion
Biofoundries provide a powerful combination of automation platforms and synthetic biology approaches, including the application of engineering principles and computational modeling and analysis to aid the design and evaluation of large datasets. Consequently, they can significantly increase the scale and throughput for experiments testing a given biological problem or question (12)(13)(14). Building on earlier works in which we established automation-compliant DNA assembly tools and protocols for the assembly of constructs for engineering plants systems (3,5,10), we have developed a toolbox for CFPS compatible with DNA parts used by the plant community ( Figure 1, Table 1). This enabled us to obtain useful yields of both enzymes and regulatory proteins, progressing directly to characterization experiments. The resulting toolbox contains 37 plasmids, including acceptors for T7-driven E. coli CFPS as well as commercial wheat germ protein expression. While other Golden Gate assembly systems could potentially be used for E. coli cell-free expression, they are customized for specific applications (75)(76)(77) and are incompatible with DNA parts in the Phytobrick standard. While, for some proteins, optimal expression levels might be obtained by system-specific codon-optimization, optimal expression is generally unnecessary for rapid characterization and is outweighed by the ability to reuse a large number of parts in prototyping experiments.
Over the past few decades, CFPS has been successfully used to express and prototype bacterial proteins including DNA regulatory elements (50,(77)(78)(79), metabolic sensors (80,81) and ribosomal peptides (82). It has also been applied to metabolic pathways from a range of organisms, mainly to prototype biosynthesis for production in microbial chassis (29,30,41,45,83,84). More recently, the advantages of this technology have been applied to enabling the expression of proteins that can be challenging to express, such as cytotoxic proteins and glycoproteins (26,28,32,33,85,86). Automated liquid handling has the general advantage of increasing throughput, reducing reaction volumes and improving repeatability. These advantages have already been applied to assembling the components of CFPS reactions (47)(48)(49)(50)(51)(52)(53)(54). In this study, we coupled automated nanoscale DNA assembly workflows to CFPS to progress high-throughput experiments that enable the selection of optimal configurations for the expression of multiple plant proteins. We show how software settings, reaction volumes and assembly time are important for consistent cell-free expression and optimize multiple parameters to obtain consistent, repeatable yields from low-volume reactions (Figure 2, Supplementary Figure S2). Our workflows were aided by the use of the high-affinity, 11-amino acid HiBiT tag (87), which is able to complex with a 18-kDa engineered NanoLuc polypeptide (88) and luminesce proportional to the level of HiBiT-tagged protein.
We found that this worked well as a C-terminal tag but reduced yields when fused to the N-terminus of sfGFP (Figure 2), perhaps because the HiBiT amino acid sequence is not optimal for early translational elongation (89). Although sfGFP has been optimized to improve folding (90), the minimal size of the HiBiT tag may be less likely to inhibit protein function and accessibility. However, alternatives may be required for proteins in which the C-terminus needs to be folded inside the protein or must be freely available for activity or post-translational modifications. To assist this, we demonstrate that tags can be removed after synthesis by simply incubating with a parallel reaction expressing TEV protease ( Figure 2C).
We then applied the twin capabilities of automated highthroughput DNA assembly and low-volume CFPS to rapidly select the tag configurations that resulted in useful levels of expression of different plant proteins. Although we did not expect to find a single configuration that was generally applicable to multiple types of proteins, there was surprisingly little consistency between even closely related proteins. Tags that significantly improved the yields of some family members resulted in little or no expression of others ( Figure 3A). This demonstrates the utility of automated high-throughput and low-volume screens to select optimal configurations.
Cell-free expression has yet to be widely applied in plant science but has had an impact in the development of next-generation sequencing techniques such as DNA affinity purification sequencing. In this technique, wheat germ-based cell-free expression systems have been used to produce affinity tagged TFs that are used to capture genomic DNA enabling the identification of TFbinding sites (91,92). However, when applied to genome-scale collections of TFs, insufficient expression was obtained for several hundred proteins (91). We expressed TGA TFs, known to regulate the expression of defense-related genes (93)(94)(95)(96)(97) as well as the CaMV 35S promoter (74,(98)(99)(100)(101)(102). We also expressed wheat homologs of TFs that regulate circadian rhythms in Arabidopsis including homologs of LUX ARRHYTHMO (LUX) from each of the three wheat sub-genomes (103). The E. coli lysate system provides substantial cost benefits (20-fold less than wheat germ, see Supplementary Figure S10), and although better yields were obtained with the wheat-germ system for three TFs of wheat origin, the E. coli system proved beneficial two additional TFs (LHYA and NAMA1) ( Figure 3C). We do not, however, expect success with all protein classes and did not, for example, attempt the synthesis of plant proteins that are widely known to express poorly in non-eukaryotes such as cytochrome P450s and oxidosqualine cyclases, which in their native cells are localized to lipid droplets or embedded into the endoplasmic reticulum (104,105).
Plants synthesize a diverse array of metabolites that contribute to adaptation to ecological niches, serving as attractants for beneficial organisms and providing defense against biotic and abiotic agents (106). Metabolic profiling has been widely applied to assess this diversity, identifying a number of molecules that have been leveraged for use in industry and medicine. However, the genetic basis of biosynthesis for many molecules remains unknown. With many genomes now sequenced and many more underway, a major challenge is to assign the function to sequence, which is particularly challenging for large enzyme families. Sequence similarity allows the classification of enzymes into large and complex superfamilies, but the highly specific reactions, substrates and products that enzymes catalyze remain difficult to predict. For example, there is interest in characterizing the substrate specificity of plant glycosyltransferases as it has been observed that some metabolites are over-glycosylated by native enzymes present in N. benthamiana hindering yield (107)(108)(109)(110). A significant advantage of cell-free expression was the ability to progress directly to functional analysis without timeconsuming cell-disruption and purification protocols. We were able to express and demonstrate the expected activity of an Arabidopsis UGT that has previously been leveraged for overproduction of triterpene saponins (111)(112)(113). This workflow is ideally suited to enzymes from plant or microbial secondary metabolism. However, enzymes present in the source lysate may interfere with metabolic reactions of the expressed enzyme, particularly if a cognate of the enzyme of interest is present in the E. coli or wheat germ lysate (114). To alleviate this challenge, users could add purification steps to their cell-free expression protocol or utilize source strains deficient in the enzyme of interest (115). Previous studies have shown that some cell-free expressed enzymes exhibit equivalent activity to enzymes produced in cells and subsequently purified (116); however, kinetic characterization experiments will be compromised if lysate proteins that utilize similar substrates or products to the enzyme of interest are present.
The approaches described in this manuscript can be applied to a range of diverse proteins. High-throughput, low-cost experimental pipelines will be useful both for directly assigning function to novel sequences and also for the creation of large datasets that can train machine learning algorithms to predict the characteristics such as specificity from the primary sequence (117,118). Further, automated reaction assembly may be a helpful approach for adjusting cell-free reaction conditions unique to a given protein of interest (49,85).

Supplementary data
Supplementary data are available at SYNBIO Online.

Data availability
Plasmids are available from Addgene (#162281-162317). Nucleotide sequences and corresponding accession numbers are provided in the supplementary data.

Funding
The authors acknowledge the support of the Biotechnology and Biological Sciences Research Council (BBSRC), and the Engineering and Physical Sciences Research Council (EPSRC), part of UK Research and Innovation; this research was funded by BBSRC Core Strategic Programme Grant (Genomes to Food Security) BB/CSP1720/1; National Capability Grant BBS/E/T/000PR9815 (Earlham Biofoundry); BB/P010490/1, an industrial partnership award with Leaf Expression Systems; BB/R021554; BBSRC/EPSRC OpenPlant Synthetic Biology Research Centre Grant (BB/ L014130/1). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.