Identification of plant enhancers and their constituent elements by STARR-seq in tobacco leaves

Genetic engineering of cis-regulatory elements in crop plants is a promising strategy to ensure food security. However, such engineering is currently hindered by our limited knowledge of plant cis-regulatory elements. Here, we adapted STARR-seq, a technology for the high-throughput identification of enhancers, for use in transiently transformed tobacco leaves. We demonstrate that the optimal placement in the reporter construct of enhancer sequences from a plant virus, pea and wheat was just upstream of a minimal promoter, and that none of these four known enhancers was active in the 3′-UTR of the reporter gene. The optimized assay sensitively identified small DNA regions containing each of the four enhancers, including two whose activity was stimulated by light. Furthermore, we coupled the assay to saturation mutagenesis to pinpoint functional regions within an enhancer, which we recombined to create synthetic enhancers. Our results describe an approach to define enhancer properties that can be performed in potentially any plant species or tissue transformable by Agrobacterium and that can use regulatory DNA derived from any plant genome.

One-sentence summary: We developed a high-throughput assay in transiently transformed tobacco leaves that can identify enhancers, characterize their functional elements and detect condition-specific enhancer activity.


In a time of climate change and increasing human population, crop plants with higher yields and improved response to abiotic stresses will be required to ensure food security. As many of the beneficial traits in domesticated crops are caused by mutations in cis-regulatory elements, especially enhancers, genetic engineering of such elements is a promising strategy for improving crops (Swinnen et al., 2016; Scheben [...]

Here, we established a STARR-seq assay that uses transient expression of STARR-seq libraries in tobacco leaves. This assay bypasses the need for a species-specific protoplasting protocol and instead relies on efficient Agrobacterium-mediated transformation. Among species that are amenable to transformation with Agrobacteria, tobacco combines fast and robust growth with convenient transformation by syringe-infiltration of intact leaves. As transcription factors are highly conserved among plant species (Lehti-Shiu et al., 2017; Wilhelmsson et al., 2017), the versatile tobacco system can serve as a proxy for many plant species, including crops. We optimized the placement of the enhancer candidates to provide an optimal dynamic range and performed proof-of-principle experiments to demonstrate that the assay can detect enhancers and characterize the underlying functional elements. Furthermore, we show that our in planta assay is capable of detecting light-dependent changes of the transcriptional activity of known light-sensitive enhancers.

The positioning of enhancers strongly impacts their activity in tobacco STARR-seq

Transient expression in tobacco leaves is a well-established method for reporter assays. We tested whether STARR-seq, a massively parallel reporter assay to identify active cis-regulatory elements, could be performed by transient expression of libraries in tobacco. We created a reporter construct with a green fluorescent protein (GFP) gene under control of the Cauliflower mosaic virus 35S minimal promoter and the 35S core enhancer (Fang et al., 1989; Benfey et al., 1990) (subdomains A1 and B1-3). Agrobacterium tumefaciens cells harboring this construct were used to transiently transform leaves of 3-4 week old tobacco (Nicotiana benthamiana) plants. After two days, the resulting mRNAs were extracted from the transformed leaves and analyzed by next generation sequencing (Figure 1A).

To ensure a wide dynamic range of the assay, we systematically analyzed the position- and orientation-dependency of the 35S enhancer (Figure 1A). We used a more generalized version of STARR-seq in which we placed a barcode in the GFP open reading frame. This barcode is linked to the corresponding enhancer variant by next generation sequencing and serves as a readout for the activity of the variant. For each variant, we used 5-10 constructs with different barcodes. This barcode redundancy helps to mitigate potential effects that an individual barcode might have on the transcript level. As expected, the 35S enhancer was active in either orientation and both up- and downstream of the reporter gene (Figure 1B). Similar to previous observations (Fang et al., 1989), the activity of the 35S enhancer was lower when present downstream of the gene as compared to upstream of the minimal promoter. In contrast to the mammalian system, when placed in the 3′-UTR, the enhancer had almost no activity. Addition of a second copy of the enhancer in the 'downstream' and 'distal upstream' positions led on average to a 70% increase in transcript levels as compared to a single enhancer, while a second copy in the 'upstream' position increased transcript levels by only 30% (Figure 1B). These observations suggest that the transcriptional activation caused by a single 35S enhancer directly upstream of the minimal promoter is already close to the maximum level detectable in our assay.

We observed the strongest activation of transcription with the enhancer immediately upstream of the minimal 35S promoter, and lower levels when the enhancer was placed about 1.5 kb away from the promoter as in the 'downstream' and 'distal upstream' constructs (Figures 1A and 1B). To characterize the distance-activity relationship, we inserted the 35S enhancer at different positions within a 2 kb spacer upstream of the minimal promoter (Figure 1C). Enhancer activity was strongest immediately upstream of the promoter. However, enhancer activity was greatly reduced by 500 bp or more of spacer between the enhancer and promoter (Figure 1C), consistent with a previously described distance-dependent decrease of 35S enhancer activity (Odell et al., 1988).

To test if the observed position-dependency is unique to the 35S enhancer, we assayed three additional enhancers derived from the Pisum sativum AB80 and rbcS-E9 genes and the wheat Cab-1 gene (Simpson et al., 1986; Fluhr et al., 1986; Nagy et al., 1987; Giuliano et al., 1988; Fejes et al., 1990; Argüello et al., 1992; Gotor et al., 1993). Similar to the 35S enhancer, these enhancers were orientation-independent and most active immediately upstream of the promoter, and they did not activate transcription when placed in the 3′-UTR (Figure 1D).
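The barcode-based readout described in this section can be illustrated with a short sketch. It is not the pipeline used in the study: the function name, the normalization to library totals, the minimum-count filter, and the use of the median across barcodes are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def variant_enrichment(barcode_to_variant, rna_counts, dna_counts, min_count=5):
    """Return log2(RNA/DNA) per variant, aggregated over its barcodes.

    barcode_to_variant: dict mapping barcode -> variant name
    rna_counts, dna_counts: dicts mapping barcode -> read count
    min_count: hypothetical filter; barcodes below it in either library are skipped
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    per_variant = defaultdict(list)
    for barcode, variant in barcode_to_variant.items():
        rna = rna_counts.get(barcode, 0)
        dna = dna_counts.get(barcode, 0)
        if rna < min_count or dna < min_count:
            continue  # too few reads to give a stable ratio
        # Normalize each count to its library size before taking the ratio.
        ratio = (rna / rna_total) / (dna / dna_total)
        per_variant[variant].append(math.log2(ratio))
    # The median across barcodes damps the effect of any single barcode.
    return {variant: median(values) for variant, values in per_variant.items()}
```

Aggregating over several barcodes per variant is what makes the redundancy useful: a barcode that happens to alter transcript levels shifts only one of the values entering the summary.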

The 35S enhancer is not active in the transcribed region
Although previous STARR-seq studies placed candidate enhancer fragments in the 3′-UTR (Arnold et al., 2013; Sun et al., 2019; Ricci et al., 2019), enhancers in this position were not active in our system. To test if the lack of enhancer activity in the 3′-UTR is specific to our assay in transiently transformed tobacco leaves or a more general feature of enhancers in plants, we performed STARR-seq in maize (Zea mays L. cultivar B73) protoplasts (Figure 2A). The results with maize protoplasts were qualitatively similar to those from the assay in tobacco leaves. The 35S enhancer was most active upstream of the minimal promoter, and its activity was greatly reduced when placed in the 3′-UTR (Figure 2B). Quantitatively, the activity of the 35S enhancer in the upstream position was lower in the maize protoplasts compared to that observed in tobacco leaves. However, the activity of the 35S enhancer in the 3′-UTR position was slightly higher in maize protoplasts than in tobacco leaves (compare Figures 1B and 2B).

To explain the low activity of the 35S enhancer in the 3′-UTR, we hypothesized that such an mRNA could be degraded by nonsense-mediated decay, as long 3′-UTRs can subject mRNAs to this decay pathway (Kertész et al., 2006). To test whether the 35S enhancer in the 3′-UTR destabilizes the mRNA by promoting nonsense-mediated decay, we inserted the unstructured region from the Turnip crinkle virus 3′-UTR, shown to reduce nonsense-mediated decay (May et al., 2018), in between the stop codon and the enhancer. However, insertion of this region further reduced transcript levels when the 35S enhancer was placed in the 3′-UTR (Figures 3A and 3B). We next asked whether insertion of the 35S enhancer in an intron, which would also be transcribed, could confer transcriptional activation, but found that it did not (Figures 3A and 3C). Furthermore, combining an upstream AB80 enhancer with a 35S enhancer within the 3′-UTR transcribed region considerably reduced transcription compared to that from the AB80 enhancer alone (Figure 3D). Taken together, these findings demonstrate that the 35S enhancer residing within the transcribed region is not active in our system. Therefore, for subsequent experiments, we placed the enhancer fragments directly upstream of the minimal promoter, barcoding the reporter amplicons to enable detection by RNA-seq. A similar approach with a barcode in the transcript was used in previous studies of enhancers in human cells (Kwasnieski et al., 2012; Inoue et al., 2019).

The tobacco STARR-seq assay can detect enhancer fragments and their light-dependency

The AB80, Cab-1 and rbcS-E9 enhancers are activated by light (Simpson et al., 1986; Nagy et al., 1987; Fluhr et al., 1986). We tested the light-dependency of these enhancers in our assay system by placing the transformed plants in the dark prior to mRNA extraction. The AB80 and Cab-1 enhancers demonstrated decreased activity in the dark. Although the activity of the rbcS-E9 enhancer also showed a response to light, in this case the activity was higher in the dark (Figure 4) [...]

Next, we tested if the assay could detect enhancer signatures among randomly fragmented DNA sequences from a plasmid containing embedded enhancers. We constructed a plasmid harboring the 35S, AB80, Cab-1, and rbcS-E9 enhancers. We fragmented the plasmid using Tn5 transposase and inserted the fragments upstream of the 35S minimal promoter to generate a fragment library for use in the STARR-seq assay (Figure 5A). This fragment library consisted of approximately 6,200 fragments linked to a total of ~50,000 barcodes. About 40,000 (80%) of these barcodes were recovered with at least 5 counts from the extracted mRNAs. The STARR-seq assay identified the known enhancers as the regions with highest enrichment values (Figure 5B). As expected, the orientation in which the fragments were cloned into the STARR-seq plasmid did not affect their enrichment (Supplemental Figure 1A). This result confirms that the fragments act as enhancers instead of as autonomous promoters, whose activity would be orientation-dependent. The assay was highly reproducible, with good correlation across replicates for [...] in this study (Spearman's ρ ≥ 0.6 for barcodes and ≥ 0.7 for fragments or variants, Supplemental Table 1).

We also used the fragment library in a STARR-seq experiment with plants kept in the dark prior to mRNA extraction to test for light-dependency. We observed the expected changes in enrichment (Figure 4), with the AB80 and Cab-1 enhancers less active and the rbcS-E9 enhancer more active in the light-deprived plants (Figures 5B and 5C). We conclude that the STARR-seq assay established in this study can identify enhancers in a condition-specific manner.
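Replicate agreement of the kind reported above (Spearman's ρ) can be computed as the Pearson correlation of ranks. The following is a minimal standard-library sketch; the function names are hypothetical, and it makes no claim about how the correlations in the study were actually calculated.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation between two equally long lists of enrichments."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the calculation, the statistic is insensitive to whether enrichment is expressed on a linear or a log2 scale.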

The tobacco STARR-seq assay can pinpoint functional enhancer elements
To further reveal individual elements of enhancers, we repeated the screen with a second library (5,700 fragments with a total of 73,000 barcodes, more than 95% of which were recovered from the mRNA) that contained shorter fragments (median length 84 bp vs. 191 bp in the initial library, Figure 5D). As these shorter fragments were, on average, well below the size of the full-length enhancers, they are unlikely to contain all the elements required for maximum activity. The shorter fragments split the peaks of the AB80 and Cab-1 enhancers into two subpeaks, suggesting that these enhancers contain at least two independent functional elements. The sole functional element of the rbcS-E9 enhancer resided in the 3′ half of the tested region (Figures 5D and 5E).

Having established the capacity of the assay to distinguish enhancer subdomains, we tested its suitability for conducting saturation mutagenesis of cis-regulatory elements. To do so, we array-synthesized all possible single nucleotide substitution, deletion, and insertion variants of the minimal promoter and of the 35S enhancer as two separate variant pools, and subjected the two pools to STARR-seq. Approximately 98% of all possible variants were linked to at least one barcode in the input library, and mRNAs corresponding to over 99% of these were recovered from the tobacco leaves. We first assayed the activity of variants of a 46 bp region containing the 35S minimal promoter, in constructs with and without an enhancer. The effects of the individual mutations were similar in both contexts (Supplemental Figure 2). As expected, mutations that disrupt the TATA box (positions 16-22) had a strong negative impact on promoter activity, while most others had a weak effect or no effect ([...] deleterious. This region, previously implicated in enhancer activity, can be bound by the tobacco activation sequence factor 1 (ASF-1), a complex containing the bZIP transcription factor TGA2.2 (Fang et al., 1989; Benfey et al., 1990; Lam et al., 1989; Niggeweg et al., 2000). Similarly, we observed mutational sensitivity of the 35S enhancer in positions 95-115, which contain a binding site for the bHLH transcription factor complex ASF-2 (Lam and Chua, 1989). A third mutation-sensitive region in positions 7-28 is predicted to be bound by ERF and TCP transcription factors.
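To make the size of such a saturation mutagenesis pool concrete, the sketch below enumerates every single-nucleotide substitution, deletion, and insertion of an input sequence. The function name and the variant labels are hypothetical, and collapsing variants that yield the same sequence is an assumption for illustration; the study's actual library design is not described in this section.

```python
def single_nucleotide_variants(seq):
    """Map each distinct variant sequence of `seq` to a label for one edit producing it.

    Assumes `seq` is an uppercase DNA string over A, C, G, T. Labels use
    1-based positions; an insertion label gives the number of bases
    preceding the inserted nucleotide.
    """
    variants = {}
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                # Substitution of the base at position i+1.
                variants.setdefault(seq[:i] + alt + seq[i + 1:], f"sub_{i + 1}{ref}>{alt}")
        # Deletion of the base at position i+1; deletions inside a
        # homopolymer run give the same sequence and are stored once.
        variants.setdefault(seq[:i] + seq[i + 1:], f"del_{i + 1}{ref}")
    for i in range(len(seq) + 1):
        for alt in "ACGT":
            # Insertion of one base after the first i bases.
            variants.setdefault(seq[:i] + alt + seq[i:], f"ins_{i}_{alt}")
    return variants
```

For a sequence of length n there are at most 3n substitutions, n deletions, and 4(n + 1) insertions; the number of distinct sequences is lower because inserting a base next to an identical base, or deleting within a run of identical bases, produces duplicates.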

Enhancer fragments can be combined to build synthetic enhancers
To demonstrate that these mutation-sensitive regions possess enhancer activity, we split the enhancer into three fragments that span positions 1-30 (A), 60-105 (B), and 106-140 (C) (Figure 6D). These fragments were cloned in one to four copies on average, in random order, and the enhancer activity of the resulting constructs was determined. We identified 100 different constructs linked to a total of 29,000 barcodes, 95% of which were present in the extracted mRNAs. Fragments A and C alone were sufficient to activate transcription, while fragment B was active only in the presence of a second fragment (Figure 6E). In line with our observations from the enhancer mutagenesis, fragment C had the highest activity. The greater the number of fragments in a construct, the higher its activity. However, even four fragments combined did not reach the level of transcription achieved with the full-length enhancer, indicating that the sequences excluded from the A, B and C fragments contribute to enhancer activity, either directly or by providing the correct spacing for the fragments (Figure 6E). Although spacing may play a role in enhancer activity, the order of the fragments had only weak effects (Supplemental Figure 3). Taken together, we demonstrate that this assay can identify functional enhancer elements that can be recombined to create synthetic enhancers of varying strength.
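The space of fragment combinations can be enumerated with a few lines of code. This sketch assumes that constructs differ only in which fragments they contain and in what order, ignoring fragment orientation and any linker sequence; the function name is hypothetical.

```python
from itertools import product

# Fragment coordinates within the 35S enhancer, as given in the text.
FRAGMENTS = {"A": (1, 30), "B": (60, 105), "C": (106, 140)}


def synthetic_constructs(names=("A", "B", "C"), max_copies=4):
    """List every ordered arrangement of 1..max_copies fragments."""
    return [
        "".join(combo)
        for n_fragments in range(1, max_copies + 1)
        for combo in product(names, repeat=n_fragments)
    ]


constructs = synthetic_constructs()
print(len(constructs))  # 3 + 9 + 27 + 81 ordered arrangements
```

Under these assumptions there are 120 possible arrangements, so the 100 constructs identified in the experiment would cover most, but not all, of this space.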

In this study, we developed a massively parallel reporter assay in tobacco plants that can identify DNA regions with enhancer or promoter activity and can dissect these regions to characterize functional sequences with single nucleotide resolution. The assay does not depend on efficient protoplasting and transformation protocols, which have been established only for a limited number of species and tissues. Furthermore, in contrast to protoplasts, the in planta system is more robust and can be exposed to a variety of environmental conditions to detect condition-specific cis-regulatory elements. Indeed, our tobacco STARR-seq assay can detect enhancer light-dependency. Such condition-specific cis-regulatory elements could play important roles in future genetic engineering efforts to help plants adapt to a rapidly changing environment.

We observed in our experiments that the tested enhancers were not active when placed in the transcribed region. Other studies have shown that plant genes can contain elements in their transcribed region, especially in the first intron, that drastically increase their expression (Callis et al., 1987; Rose and Last, 1997; Rose, 2004; Samadder et al., 2008; Laxa et al., 2016; Laxa, 2017). However, the increased expression levels could have been the result of enhanced transcription or translation, improved mRNA processing, export, or stability, or a combination of these mechanisms. Few studies have dissected these potential mechanisms, and these have generally found that enhanced transcription played no role, or only a relatively small role, in the overall expression increase (Rose and Last, 1997; Samadder et al., 2008; Laxa et al., 2016). The apparent absence of strong transcriptional enhancers in the transcribed region of plant genes could be due to any of several reasons. The constraints placed on such regions to enable efficient mRNA processing and translation might not be compatible with the requirements for enhancers. Alternatively, strong binding of transcription factors within the transcribed region could inhibit transcription by physically blocking the RNA polymerase. Future studies will be required to address this issue in plants.

When we compared the activity of the 35S enhancer in transiently transformed tobacco leaves to its activity in maize protoplasts, we observed a general trend of high activity upstream of the minimal promoter and low activity within the 3′-UTR, but the levels differed between the two systems. Previous studies have reported that 35S promoter constructs encompassing the 35S minimal promoter and enhancer are more active in dicots like tobacco than in monocots such as maize and rice (Christensen et al., 1992; Bruce et al., 1989). In agreement with these studies, we detected higher activity of the 35S enhancer upstream of the minimal promoter in tobacco compared to maize. In contrast, the maize system led to more 35S enhancer activity than the tobacco system when the enhancer was inserted into the 3′-UTR, a possible effect of species-specific differences in the tolerance of an enhancer within the transcribed region. Consistent with these species differences, effects of intron-mediated enhancement of gene expression are stronger in monocots than dicots (Samadder et al., 2008). Alternatively, the physical state of the reporter construct-containing DNA could influence enhancer activity. In maize protoplasts, the reporter is [...] Wilhelmsson et al., 2017), the enhancer elements identified in tobacco leaves will likely be active in many other plant species. Furthermore, the STARR-seq assay described herein can potentially be performed in any species or tissue that can be transiently transformed by Agrobacteria. Apart from enhancers and promoters, the assay can likely be adapted to screen for silencers and insulators, cis-regulatory elements that are known from animals but have, so far, not been detected in plants.

Taken together, we describe a plant STARR-seq assay that is applicable to enhancer screens for any plant species to analyze plant gene regulation and to identify promising building blocks for future genetic engineering efforts. The data generated by these screens and subsequent saturation mutagenesis will enable deep learning approaches to identify defining characteristics of plant enhancers.

Plasmid construction and library creation
The STARR-seq plasmids used herein are based on the pGreen plasmid (Hellens et al., 2000). In their T-DNA region, they harbor a phosphinothricin resistance gene (BlpR) and the GFP reporter construct [...]

(A) A plasmid harboring the indicated enhancers was fragmented. The fragments were inserted in the upstream position of the STARR-seq construct and their activity was measured by the STARR-seq assay.
(B) Plants were grown for two days in normal light/dark cycles (light, black line) or completely in the dark (dark, blue line) prior to mRNA extraction. The log2(enrichment) of RNA expression over input of all fragments at each position was averaged.
(C) Light-dependency, log2(enrichment_light / enrichment_dark), was determined for each base of the original plasmid.
(D) The STARR-seq assay was performed with plasmid fragment libraries with different fragment length distributions (see inset), and log2(enrichment) for each fragment library is shown across the whole plasmid.
(E) log2(enrichment) obtained from the library with shorter fragments is shown in more detail for regions of interest. Positions in the original plasmid that contain enhancers are shaded in gray.
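The per-position averaging in panel (B) and the light-dependency track in panel (C) can be sketched as follows. The function names, the 0-based half-open coordinates, and the choice to derive light-dependency from the two averaged tracks are assumptions for illustration, not a description of the analysis code used in the study.

```python
def per_base_mean(fragments, plasmid_length):
    """Average log2(enrichment) of all fragments covering each position.

    fragments: iterable of (start, end, log2_enrichment) with 0-based,
    half-open coordinates. Positions covered by no fragment yield None.
    """
    totals = [0.0] * plasmid_length
    counts = [0] * plasmid_length
    for start, end, value in fragments:
        for pos in range(max(start, 0), min(end, plasmid_length)):
            totals[pos] += value
            counts[pos] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


def light_dependency(light_track, dark_track):
    """Per-base log2(light/dark), i.e. the difference of two log2 tracks."""
    return [
        light - dark if light is not None and dark is not None else None
        for light, dark in zip(light_track, dark_track)
    ]
```

Because both tracks are already on a log2 scale, the ratio in panel (C) reduces to a subtraction: log2(a / b) = log2(a) - log2(b).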