A Mutation in the Drosophila melanogaster eve Stripe 2 Minimal Enhancer Is Buffered by Flanking Sequences

Enhancers are DNA sequences composed of transcription factor binding sites that drive complex patterns of gene expression in space and time. Until recently, studying enhancers in their genomic context was technically challenging. Therefore, minimal enhancers, the shortest pieces of DNA that can drive an expression pattern that resembles a gene’s endogenous pattern, are often used to study features of enhancer function. However, evidence suggests that some enhancers require sequences outside the minimal enhancer to maintain function under environmental perturbations. We hypothesized that these additional sequences also prevent misexpression caused by a transcription factor binding site mutation within a minimal enhancer. Using the Drosophila melanogaster even-skipped stripe 2 enhancer as a case study, we tested the effect of a Giant binding site mutation (gt-2) on the expression patterns driven by minimal and extended enhancer reporter constructs. We found that, in contrast to the misexpression caused by the gt-2 binding site deletion in the minimal enhancer, the same gt-2 binding site deletion in the extended enhancer did not have an effect on expression. The buffering of expression levels, but not expression pattern, is partially explained by an additional Giant binding site outside the minimal enhancer. Deleting the gt-2 binding site in the endogenous locus had no significant effect on stripe 2 expression. Our results indicate that rules derived from mutating enhancer reporter constructs may not represent what occurs in the endogenous context.

sites (Ney et al. 1990;Arnosti et al. 1996;Ma et al. 2000;Milewski et al. 2004;Crocker and Stern 2017). With the advent of high-throughput DNA synthesis and sequencing, this approach has been extended to study the effects of large numbers of enhancer variants in massively parallel reporter assays (Patwardhan et al. 2009;Melnikov et al. 2012;Inoue and Ahituv 2015;White 2015). An important but often unstated assumption of this approach is that enhancers are modular, so that minimal enhancer reporter measurements can be used to decipher regulatory genetic variation in the intact genome. In other words, mutations would behave identically in an isolated enhancer and in the genome. Here, we set out to test this assumption directly.
There are several observations that enhancer function, particularly as defined by a minimal enhancer, may not be strictly modular (Spitz and Furlong 2012;Lim et al. 2018). When measured quantitatively, the expression patterns driven by some enhancer reporters do not precisely match the endogenous pattern (Staller et al. 2015). In many loci, the paradigm of a single enhancer driving expression in a single tissue is often an oversimplification. For example, in some loci, minimal enhancers cannot be identified for a given expression pattern, and many genes are controlled by seemingly redundant shadow enhancers (Barolo 2012;Sabarís et al. 2019). Furthermore, enhancer boundaries defined by DNAse accessibility and histone marks often do not match minimal enhancer boundaries defined by activity in reporters (Kwasnieski et al. 2014;Henriques et al. 2018). In some cases, the minimal enhancer is sufficient for an animal's viability under ideal conditions, but sequences outside of the minimal enhancer are required for viability when the animal is exposed to temperature perturbations (Ludwig et al. 2011). Together, these examples highlight that while minimal enhancer regions can approximate the expression patterns of a gene, sometimes very closely, quantitative measurements of these regions' activities can reveal their inability to recapitulate the nuances of gene regulation in the endogenous context.
In this work, we directly test the assumption that the misexpression caused by a mutation in a minimal enhancer reporter construct will also be observed when the same mutation is found in the genome. We compared the changes in gene expression caused by a mutation in three versions of an enhancer: 1) a minimal enhancer in a reporter, 2) an extended enhancer that contains the minimal enhancer plus flanking sequences in a reporter, and 3) the enhancer in the endogenous locus. If the minimal enhancer truly represents a modular functional enhancer unit, the effects of the mutation on gene expression will be the same in each of these contexts. If not, the effects caused by the mutation will differ.
We use the well-studied Drosophila melanogaster even-skipped (eve) stripe 2 enhancer as our case study for several reasons (Goto et al. 1989;Small et al. 1992). Eve encodes a homeodomain transcription factor essential for proper segment formation in Drosophila, and five well-characterized enhancers drive its seven-stripe expression pattern in the blastoderm embryo ( Figure 1A). To understand the mechanism of eve stripe 2 enhancer function, classic experiments mutated transcription factor binding sites in minimal enhancer reporter constructs, resulting in a set of variants with known effects that we can test in an extended enhancer construct and in the endogenous locus (Small et al. 1992;Arnosti et al. 1996). Subsequent experiments showed that, while the eve stripe 2 minimal enhancer is sufficient for an animal's viability in D. melanogaster, the sequences outside of the minimal enhancer are required to drive robust patterns of gene expression when the animal is exposed to temperature perturbations (Ludwig et al. 2011), or to drive a proper stripe in other species (Crocker and Stern 2017). Together, these experiments indicate that the minimal enhancer does not recapitulate the complete transcriptional control of eve stripe 2. The Drosophila blastoderm embryo also provides technical advantages; we can readily incorporate reporter constructs, make genomic mutations, and measure levels and patterns of gene expression at cellular resolution (Luengo Hendriks et al. 2006;Wunderlich et al. 2014). This allows us to measure potentially subtle differences in expression patterns and levels driven by different enhancer variants.
We hypothesized that a transcription factor binding site deletion will have its maximum effect on gene expression when found in a minimal enhancer, while its effects will be reduced, or buffered, when found in the extended enhancer and in the endogenous locus due to the contributions of additional regulatory DNA sequences. We tested our hypothesis and found that the effects of a Giant TF binding site deletion on gene expression are indeed buffered in the extended eve stripe 2 enhancer and in the endogenous locus. This buffering is partially explained by an additional Giant binding site in the sequence outside the eve stripe 2 minimal enhancer. These results imply that we cannot always extrapolate the effects of naturally or experimentally induced enhancer mutations in minimal reporters to extended sequences or to the endogenous intact locus. We discuss implications of our results for studying the functional consequences of regulatory sequence variation.

Enhancer sequences and mutations in reporter constructs
Each of the eve stripe 2 enhancer sequences was cloned into a pBfY plasmid containing an eve basal promoter-lacZ fusion gene, the mini-white marker, and an attB integration site. The enhancer sequences are located immediately upstream of the eve basal promoter. All constructs were integrated by Genetic Services, Inc. into the attP2 docking site of the Drosophila melanogaster y[1], w[67c23] line. We followed the mini-white eye marker as we conducted crosses to make the transgenic fly lines homozygous.
The 484 base pair (bp) wild-type minimal (minWT) enhancer sequence was defined by Small and colleagues (Small et al. 1992). MinDgt-2 is the minWT enhancer with a 43 bp deletion of the giant-2 (gt-2) binding site as described in (Small et al. 1992). The wild-type extended (extWT) enhancer is the minWT sequence plus the 50 bp upstream and 264 bp downstream flanking sequences present in the eve locus. The boundaries of the extWT enhancer are two conserved blocks of 18 and 26 bp on the 3′ and 5′ ends of the enhancer (Ludwig et al. 1998). The extDgt-2 enhancer consists of the extWT enhancer with the same gt-2 binding site deletion as in minDgt-2.
To computationally predict additional Gt sites in the extended enhancer, we used PATSER and three different Gt position weight matrices (PWMs) generated with data from yeast one-hybrid, DNA footprinting, and SELEX assays (Hertz and Stormo 1999;Noyes et al. 2008;Li et al. 2011;Schroeder et al. 2011). A common Gt binding site, which we named gt-4, was found in the downstream flanking sequence of the extended enhancer using all three PWMs with a p-value of 0.001 ( Figure S1). Because of overlaps with other predicted binding sites, the gt-4 binding site was mutated by changing five nucleotides in extDgt-2 to create the extDgt-2,Dgt-4 enhancer.
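The scanning step can be illustrated with a short sketch. This is not PATSER and does not reproduce its p-value calculation; it only shows the general idea of scoring each window of a sequence against a log-odds matrix. The count matrix, the pseudocount, the uniform background, and the test sequence are all invented for illustration and are not a real Gt PWM.

```python
import math

# Hypothetical 4-position count matrix (one dict per position).
# These counts are invented for illustration and are NOT a real Giant PWM.
TOY_COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency and a fixed pseudocount.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Score every forward-strand window; return a list of (start, score)."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


pwm = log_odds_matrix(TOY_COUNTS)
start, score = best_hit("TTTTACGTTTTT", pwm)  # sequence must contain only A, C, G, T
```

A site shared by several matrices, as gt-4 was, would correspond to a window that scores above threshold under each matrix; a full analysis would also scan the reverse strand.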
The minWT-sp1 and minWT-sp2 enhancers consist of the minWT enhancer and two different 264 bp downstream spacer sequences, sp1 and sp2. Each of these sequences are about half of a 500 bp lacZ sequence from which we removed high affinity binding sites for Bicoid, Hunchback, Giant, and Kruppel, using a PATSER p-value of 0.003. The minDgt-2-sp1 enhancer is composed of minDgt-2 and sp1. MinDgt-2-sp1+gt-4 is the minDgt-2-sp1 enhancer containing the additional gt-4 binding site that we identified, located in the position where it is found in the extended enhancer. File S1 contains the sequences of all the enhancers that were tested in reporter constructs.
Figure 1 The effect of the gt-2 binding site mutation is buffered in the eve stripe 2 extended enhancer. (A) Eve is expressed as a pattern of seven stripes along the anterior-posterior axis of the Drosophila melanogaster blastoderm, and this pattern is driven by five enhancers. Here we show a visual rendering of even-skipped expression as measured in (Fowlkes et al. 2008). (B) We generated transgenic reporter fly lines with the wild-type minimal (minWT) and extended (extWT) eve stripe 2 enhancers, and we measured lacZ expression in embryos using in situ hybridization. (C) A representative image of a minWT reporter embryo stained for lacZ and a normalization gene, hkb, is shown. The image shown is a maximum intensity projection. (D) We plotted lacZ levels in a lateral strip of cells along the AP axis (as shown in C) for the minWT (dark gray) and extWT (yellow) enhancers measured in a single stain, with the shading showing the standard error of the mean. The extWT enhancer drives a higher peak level of expression. (E) We calculated the ratio of peak lacZ expression levels (black dots in D) driven by the extWT and minWT enhancers in five different stains (open circles). The average ratio of the five stains is represented by a closed circle. The extWT enhancer drives 1.45 times higher expression than the minWT (p(extWT/minWT = 1) = 0.037, one-sample t-test). (F) We show the average boundary positions of the lacZ expression pattern. Error bars show standard error of the mean boundary positions of the expression pattern. The extWT enhancer drives a wider pattern of expression (yellow shading) than the minWT enhancer (gray shading), with the anterior border of the stripe lying 1.6 cell widths more anterior than the minWT enhancer pattern. (G) The transcription factor Giant (Gt) is expressed as a broad band anterior to eve stripe 2 and represses eve, establishing the anterior boundary of stripe 2. We have included a visual rendering of Giant protein and even-skipped mRNA during early nuclear cycle 14 of the blastoderm stage as measured in (Fowlkes et al. 2008). (H, I) We characterized the expression patterns and levels driven by the minimal (minDgt-2, top panels) and extended (extDgt-2, bottom panels) enhancers with a gt-2 binding site deletion. In the minDgt-2 enhancer, the gt-2 deletion causes an anterior shift in the anterior boundary of the expression pattern and an increase in expression level (p(minDgt-2/minWT = 1) = 0.0018, one-sample t-test). In the extended enhancer, the gt-2 deletion causes a very slight shift in the anterior boundary and no significant change in peak expression level (p(extDgt-2/extWT = 1) = 0.45, one-sample t-test; p(extDgt-2/extWT > minDgt-2/minWT) = 0.0032, one-sided, two-sample t-test with unequal variances).

Endogenous eve giant-2 deletion using the CRISPR system
Briefly, gRNAs (5′-TCTAACTCGAAAGTGAAACGAGG-3′ and 5′-ATTCCGTCTAAATGAAAGTATGG-3′) adjacent to the gt-2 binding site were cloned into pU6-BbsI-chiRNA. A ScarlessDsRed selection cassette (https://flycrispr.org/scarless-gene-editing/) was used with 500 bp homology arms flanking the gRNA cut sites in the eve stripe 2 enhancer. These plasmids were injected into y

In situ hybridization and imaging
We collected and fixed 0-4 hr old embryos grown at 25°, and we stained them using in situ hybridization as in (Luengo Hendriks et al. 2006;Wunderlich et al. 2014). We incubated the embryos at 56° for two days with DNP-labeled probes for hkb and DIG-labeled probes for ftz. Transgenic reporter embryos were also incubated with a DNP-labeled probe for lacZ, and the WT eve locus and Dgt-2 eve locus CRISPR embryos were incubated with a DNP-labeled probe for eve.
Hkb probes were used to normalize lacZ expression levels between the different transgenic reporter lines. The DIG probes were detected with anti-DIG-HRP antibody (Roche, Indianapolis, IN) and a coumarin-tyramide color reaction (Perkin-Elmer, Waltham, MA), and the DNP probes were detected afterward with anti-DNP-HRP (Perkin-Elmer) antibody and a Cy3-tyramide color reaction (Perkin-Elmer). Embryos were treated with RNAse and nuclei were stained with Sytox green. We mounted the embryos in DePex (Electron Microscopy Sciences, Hatfield, PA), using a bridge of #1 slide coverslips to avoid embryo morphology disruption. Reporter embryos from the early blastoderm stage (4-10% membrane invagination, roughly 10-20 min after the start of the blastoderm stage) were imaged, and CRISPR embryos from early blastoderm stage (9-15% membrane invagination, roughly 15-25 min after the start of the blastoderm stage) were imaged. We used 2-photon laser scanning microscopy to obtain z-stacks of each embryo on a LSM 710 with a plan-apochromat 20X 0.8 NA objective. Representative images are shown in Figure S2. Each stack was converted into a PointCloud, a text file that includes the location and levels of gene expression for each nucleus (Luengo Hendriks et al. 2006).

Data analysis of eve stripe 2 reporter constructs
To normalize the lacZ levels in the reporter embryos, we divided the lacZ signal by the 95% quantile of hkb expression in the posterior 10% of each embryo (Wunderlich et al. 2014). We expect the lacZ and hkb levels to be correlated within a transgenic line. To verify this, we ran a regression of the 99% quantile lacZ value from each embryo and the 95% quantile hkb value. Cook's distance was used to discard influential outliers (on average, 26.5% of analyzed embryos) (Wunderlich et al. 2014). To avoid extraneous sources of noise in the normalization, we only compared lacZ levels between embryos with the same genetic background and stained in the same in situ hybridization experiment.
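The core of this normalization can be sketched as follows. This is a minimal illustration, not the PointCloud toolbox code: the dictionary keys (`ap`, `lacZ`, `hkb`), the convention that `ap` runs from 0 at the anterior tip to 1 at the posterior tip, and the linear-interpolation quantile are all assumptions made for the example. The regression and Cook's distance outlier filter are not shown.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def normalize_lacz(nuclei, posterior_fraction=0.10, q=0.95):
    """Divide each nucleus's lacZ value by a posterior hkb quantile.

    Each nucleus is a dict with hypothetical keys 'ap' (0 = anterior tip,
    1 = posterior tip), 'lacZ', and 'hkb'. The scale factor is the
    q-quantile of hkb among nuclei in the posterior-most fraction.
    """
    cutoff = 1.0 - posterior_fraction
    posterior_hkb = [n["hkb"] for n in nuclei if n["ap"] >= cutoff]
    scale = quantile(posterior_hkb, q)
    return [n["lacZ"] / scale for n in nuclei]
```

Using a high quantile rather than the maximum makes the scale factor less sensitive to a single unusually bright nucleus.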
To calculate the average lacZ expression levels along the anterior-posterior (AP) axis in each transgenic line, we used the extractpattern command in the PointCloud toolbox. This command divides the embryo into 16 strips around the dorso-ventral (DV) axis of the embryo, and for each strip, calculates the mean expression level in 100 bins along the AP axis. We averaged the strips along the right and left lateral sides of the embryos and subtracted the minimum value along the axis to remove background noise.
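The binning and background subtraction can be illustrated with a simplified sketch. It is not the extractpattern command: it handles a single strip rather than 16 DV strips, and the input format (a list of `(ap_position, expression)` pairs with positions in [0, 1]) is an assumption made for the example.

```python
def ap_profile(nuclei, n_bins=100):
    """Mean expression per AP bin, with the lowest bin mean subtracted.

    `nuclei` is a list of (ap_position, expression) pairs for one strip,
    with ap_position in [0, 1]. Bins that contain no nuclei are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for ap, value in nuclei:
        index = min(int(ap * n_bins), n_bins - 1)  # ap == 1.0 goes in the last bin
        sums[index] += value
        counts[index] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    background = min(m for m in means if m is not None)
    return [m - background if m is not None else None for m in means]
```

Averaging the left and right lateral strips, as described above, would amount to calling this once per strip and averaging the two profiles bin by bin before subtracting the minimum.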
We calculated the peak average lacZ expression level within the eve stripe 2 region for each transgenic line in each in situ experiment separately. We then calculated the ratio between the peak average lacZ expression levels of two transgenic lines stained in the same in situ experiment. Ratios were calculated for each stain and the average ratio from multiple stains was determined (see Figure S3 for details of stain numbers and sample sizes). To compare ratios to 1, we used one-sample t-tests. To compare two different ratios to each other, we used two-sample t-tests with unequal variances.
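The two test statistics named above can be written out explicitly. The sketch below computes only the t statistics and degrees of freedom using the Python standard library; converting them to p-values requires a t distribution (for example from a statistics package), which is not shown. The function names are hypothetical, and any per-stain ratios passed in would come from the measurements, not from this example.

```python
import math
from statistics import mean, variance


def one_sample_t(ratios, mu=1.0):
    """t statistic and degrees of freedom for H0: mean(ratios) == mu."""
    n = len(ratios)
    standard_error = math.sqrt(variance(ratios) / n)
    return (mean(ratios) - mu) / standard_error, n - 1


def welch_t(a, b):
    """Welch's unequal-variance t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Each input list holds one ratio per stain, so the sample size in these tests is the number of stains rather than the number of embryos.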
The boundaries of eve stripe 2 expression were defined as the inflection point of the lacZ expression levels. Since the boundaries of lacZ expression should not change between stains, plots with the average boundaries of lacZ expression in each transgenic line were made with embryos pooled from multiple stains (see Figure S3 for number of embryos measured for each genotype). The cell length differences were calculated by determining the average position of the boundary across the DV axis of the embryos analyzed. One cell length is approximately equivalent to one percent of the embryo length.
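One simple way to locate such inflection points, shown here as an assumption-laden sketch rather than the toolbox's method, is to take the steepest rise and steepest fall of the binned profile. The sketch assumes the profile has already been cropped to the stripe 2 region, contains a single peak, and has no empty bins.

```python
def stripe_boundaries(profile):
    """Return (anterior, posterior) boundaries of a single-peaked profile.

    The inflection points are approximated by the largest and the most
    negative first differences between neighboring bins. Positions are
    reported in bin units, at the midpoint between the two bins involved.
    """
    diffs = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
    rise = max(range(len(diffs)), key=lambda i: diffs[i])
    fall = min(range(len(diffs)), key=lambda i: diffs[i])
    return rise + 0.5, fall + 0.5
```

With 100 bins along the AP axis, one bin corresponds to one percent of embryo length, which is roughly one cell length as noted above.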

Data analysis of endogenous eve stripe 2 giant-2 deletion
Briefly, we normalized to eve stripe 1 cellular expression to compare eve levels in the eve[eveS2Dgt-2] embryos and the control (Fowlkes et al. 2008). As described above, using the extractpattern command from the PointCloud toolbox, we found an averaged lateral trace across both sides of the embryo. The peak average eve expression for each stripe was normalized to the peak average expression of eve stripe 1. We performed a comparison of stripe levels between conditions using a two-sided rank sum test.
The boundaries of the eve stripes were defined as above using extractpattern and, for a given embryo, eight boundary positions on the left and right lateral sides were averaged. Plots with the average boundary of eve stripe 2 in the eve[eveS2Dgt-2] vs. control were made with embryos pooled from different stains. To compare boundaries between the two genotypes, a Mann-Whitney U Test was used, with the factors being one of the eight dorso-ventral positions along both lateral sides of the embryo and the embryo genotype. The p-value was corrected using a Bonferroni adjustment and reported for the genotype factor effect.
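For readers unfamiliar with these tests, the sketch below shows a rank sum (Mann-Whitney U) statistic with a two-sided normal-approximation p-value, followed by a Bonferroni adjustment. It is an illustration only: it omits tie and continuity corrections, the number of tests used in the correction is left as a parameter because it is an assumption of the example, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

The normal approximation is rough for small samples; an exact test from a statistics package would be preferable in practice.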

Data availability
All transgenic and CRISPR fly lines are available upon request. Supplemental files are available at FigShare. File S1 contains the sequences for all enhancer constructs, and Files S2 and S3 have binding site locations for the diagrams in Figure S6. Figure S1 has a depiction of all the predicted Gt binding sites in the eve stripe 2 enhancer. Figure S2 has representative images for all the genotypes analyzed. Figure S3 contains all ratios presented in Figures 1-3 in one plot. Figure S4 has details on the normalization used for the CRISPR fly data analysis. Figure S5 has the expression patterns for the other eve stripes in the eve[eveS2Dgt-2] CRISPR flies. Figure S6 contains the enhancer sequence of the eve[eveS2Dgt-2] locus as well as a map of the predicted binding sites. Figure S7 contains a multi-species comparison of the eve stripe 2 enhancer. Table S1 describes all the individual embryos analyzed in this project, and File S4 contains the PointCloud files for each embryo, which include the positions of all the nuclei in each embryo and the expression values therein. Supplemental material available at figshare: https://doi.org/10.25387/g3.13010030.

RESULTS
The minimal and extended eve stripe 2 enhancers drive different patterns and levels of expression
To test the effects of mutations in the minimal and extended eve stripe 2 enhancer on expression, we began by characterizing the wild-type (WT) expression patterns driven by the previously-defined minimal (minWT) and extended (extWT) enhancers (Figure 1B-F). The minimal enhancer is 484 bp and was identified as the smallest piece sufficient to drive expression in the region of stripe 2 (Small et al. 1992). The extended enhancer boundaries were chosen as the two conserved blocks of 18 and 26 bp on the 3′ and 5′ sides of the minimal enhancer, resulting in a 798 bp piece (Ludwig et al. 1998). We generated transgenic animals with lacZ reporter constructs inserted into the same location of the genome, and we measured lacZ expression using in situ hybridization and a co-stain for normalization (Wunderlich et al. 2014). Embryos in the first quarter of nuclear cycle 14 (nc-14) were analyzed because our normalization technique is most accurate during this time period (Wunderlich et al. 2014). Moreover, key eve regulators, including Giant, are expressed by this time (Petkova et al. 2019). The stripe driven by the extended enhancer is wider: its anterior boundary is 1.6 cell widths more anterior than that of the minimal enhancer (Figure 1F). In addition, the peak lacZ expression driven by the extWT enhancer is 1.45 times higher than that driven by the minWT enhancer (p-value = 0.037, one-sample t-test comparing extWT/minWT ratio to 1; Figure 1D, E).

The gt-2 transcription factor binding site deletion is buffered in the extended enhancer
To test the effect of mutations in the minimal and extended enhancers, we looked to the literature to find a known sequence mutation that had a measurable effect on expression in the minimal enhancer. Previous work identified three footprinted binding sites within the minimal enhancer for the repressor Giant (Gt), which is expressed anterior of eve stripe 2 (Small et al. 1992) (Figure 1G). A minimal enhancer with a deletion of one of these binding sites, gt-2, drives higher and broader anterior expression than the WT enhancer (Arnosti et al. 1996).
We chose to focus our work on gt-2 instead of the other Giant binding sites for two reasons: (1) we wanted to mutate only one TF binding site to best simulate natural population variation and (2) deletion of gt-2 resulted in the greatest effect on eve stripe 2 expression (Arnosti et al. 1996). We created reporters with the same deletion of gt-2 as in Arnosti et al. 1996 in the minimal and extended enhancers (Figure 1H) and measured the effect of the deletion on both expression levels and patterns. Consistent with previous results, we found that minDgt-2 drives 1.67 times the expression of the minWT enhancer (p-value = 0.0018, one-sample t-test comparing minDgt-2/minWT ratio to 1, in Figure 1I, top), and a pattern that is expanded 1.7 cell widths to the anterior (Figure 1H, top). Notably, in the Arnosti et al. (1996) study, the authors observed a large anterior expansion in eve stripe 2 when gt-2 was deleted in mid-blastoderm embryos. In early blastoderm embryos, we observe a more modest anterior expansion. The more modest expansion is likely because we are collecting data when Gt levels are lower and prior to eve expression refinement, when eve stripe 2 shifts to the posterior (Petkova et al. 2019).
In contrast, the expression level driven by the extDgt-2 enhancer is not significantly different from the extWT enhancer (p-value = 0.45, one-sample t-test comparing extDgt-2/extWT ratio to 1; Figure 1I, bottom), and the pattern is expanded by only 0.9 cell widths (Figure 1H, bottom). The minDgt-2/minWT expression ratio is also significantly larger than the extDgt-2/extWT ratio (p-value = 0.0032, one-sided, two-sample t-test with unequal variances comparing minDgt-2/minWT to extDgt-2/extWT), indicating that the deletion has a much larger effect on the expression level driven by the minimal enhancer than by the extended enhancer. Together, these results indicate that the effect of the gt-2 binding site deletion is buffered in the extended enhancer.

Distance from the promoter reduces expression levels and does not explain buffering
The minimal and extended enhancers differ from one another in the flanking sequences. These flanks may contribute to buffering in two primary ways: 1) the flanks may contain TF binding sites or other specific sequence elements, and 2) the flanks increase the distance of the minimal piece from the promoter.
In the minWT constructs the enhancer is 38 bp from the promoter, whereas in the extWT constructs the same minWT sequence is located 302 bp away from the promoter. To test if this change in distance contributes to the differences in expression of the two constructs, we inserted two different 264 bp spacer sequences (sp1 and sp2) into the minWT reporters, to make the constructs minWT-sp1 and minWT-sp2 (Figure 2A). The two distinct spacers, sp1 and sp2, are lacZ sequences from which high affinity binding sites for the best known regulators involved in eve stripe 2 expression have been removed. For both spacers, increasing the distance of the minWT sequence from the promoter significantly reduces expression levels (sp1: p-value = 3.6e-4; sp2: p-value = 5.0e-4, one-sample t-tests comparing each ratio to 1; Figure 2C), while only minimally affecting the AP positioning. The anterior and posterior boundaries of the minWT-sp1 are shifted to the posterior part of the embryo by 1.4 and 1.3 cell lengths, respectively, when compared to minWT (Figure 2B). The anterior and posterior boundaries of minWT-sp2 are shifted to the posterior by 1.0 and 1.1 cell lengths, respectively, when compared to minWT (Figure 2B). These data demonstrate that the level of expression driven by minWT is influenced by enhancer-promoter distance.
To test if promoter-enhancer distance explains the buffering of the gt-2 deletion, we made a construct with the minDgt-2 enhancer separated from the promoter by sp1, minDgt-2-sp1, and compared it to minWT-sp1 (Figure 2D). If the distance from the promoter contributes to the buffering effect, the expression ratio of minDgt-2-sp1/minWT-sp1 would be smaller than that of minDgt-2/minWT, and the spatial pattern between minWT-sp1 and minDgt-2-sp1 would be more similar than between minWT and minDgt-2. In fact, the opposite is true: the ratio is larger (p-value = 0.0025, one-sided, two-sample t-test with unequal variances comparing minDgt-2-sp1/minWT-sp1 to minDgt-2/minWT), and the spatial pattern is different (Figure 2E, F). This finding is surprising and suggests a change in regulatory information integration when both gt-2 is deleted and the enhancer-promoter distance is increased (see Discussion). Together, these results indicate that the relative distance of the core 484 bp to the promoter does not contribute to the buffering in the extended piece.

An additional Gt binding site in the flanking sequence partially explains the buffering
Since promoter-enhancer distance does not explain the buffering of the extended enhancer, the buffering must be due to differences in the sequence content of the minimal and extended enhancers. We hypothesized that there might be additional Gt binding sites in the flanks of the extended enhancer that explain the observed buffering of the gt-2 deletion. We scanned these flanking regions with three existing Gt position weight matrices (PWMs) and found one binding site downstream of the minWT sequence that was common to all the PWMs, which we call gt-4. We suspected this site was most likely to be bound in vivo (see Materials and Methods and Figure S1). We mutated the common site to make the extDgt-2,Dgt-4 construct (Figure 3A). If this common site contributes to buffering, we would expect that the extDgt-2,Dgt-4 construct would drive higher expression levels and a wider stripe than the extWT construct. The extDgt-2,Dgt-4 enhancer drives a pattern with an anterior boundary that is not significantly different from the extDgt-2 enhancer (Figure 3B). However, compared to the peak expression levels driven by the extWT enhancer, the extDgt-2,Dgt-4 enhancer drives 1.2 times more expression (p-value = 0.065, one-sample t-test comparing extDgt-2,Dgt-4/extWT ratio to 1) (Figure 3C). Because the peak expression ratio of extDgt-2,Dgt-4/extWT is between that of minDgt-2/minWT and extDgt-2/extWT, this result suggests that the additional gt-4 binding site is partially responsible for buffering the effect of the gt-2 deletion on expression levels (Figure S1). However, since the extDgt-2 and extDgt-2,Dgt-4 enhancers drive virtually the same expression pattern, this binding site is not responsible for buffering the effect of gt-2 deletion on expression pattern.
Therefore, this additional gt-4 binding site can only partially explain why the extended enhancer can buffer the effect of the gt-2 deletion. Additional Gt binding sites, other TF binding sites, or other functional sequences in the extended enhancer sequence flanks may be responsible for the unexplained buffering (see Discussion).

Adding a Gt binding site to the minimal enhancer is not sufficient to buffer a Gt mutation
Since the additional gt-4 site is necessary to partially buffer the gt-2 deletion, we wanted to test whether it was also sufficient. We inserted the additional gt-4 binding site into the spacer of the minDgt-2-sp1 construct in the same position as gt-4 in the extWT construct to make the minDgt-2-sp1+gt-4 construct (Figure 3D). We compared its expression to the minWT-sp1 and the minDgt-2-sp1 constructs. If the additional gt-4 site is sufficient to buffer the gt-2 deletion, we would expect that the minDgt-2-sp1+gt-4 would drive lower expression levels than minDgt-2-sp1 and a similar expression pattern to the minWT-sp1 construct. We found that the peak expression ratio of minDgt-2-sp1+gt-4/minWT-sp1 is on average lower, but not significantly different from the minDgt-2-sp1/minWT-sp1 ratio, indicating that this binding site alone is not sufficient to buffer the gt-2 deletion (p-value = 0.17, one-sided, two-sample t-test with unequal variances) (Figure 3F). The expression patterns driven by minDgt-2-sp1+gt-4 and minDgt-2-sp1 are also very similar, though there is a slight posterior shift of the anterior boundary in the minDgt-2-sp1+gt-4 construct (Figure 3E). It is possible that this gt-4 binding site needs its original context to function properly, which may be due to the importance of binding site flanks on DNA shape (Rohs et al. 2010;Li and Eisen 2018), or other, unknown requirements.
Figure 2 Distance from the promoter reduces eve stripe 2 expression levels and is not sufficient to explain the buffering. (A) To test if distance from the promoter contributes to buffering the gt-2 deletion, we used two different 264 bp spacer sequences (sp1 and sp2) to make two constructs, minWT-sp1 and minWT-sp2. (B) We find that moving the minimal enhancer away from the promoter slightly shifts the boundaries of the stripe to the posterior. Error bars show standard error of the mean boundary positions of the expression pattern. (C) A comparison of peak expression levels shows that moving the minimal enhancer away from the promoter reduces peak expression levels in both spacer constructs (p(minWT-sp1/minWT > 1) = 3.6e-4, p(minWT-sp2/minWT > 1) = 5.0e-4; one-sided, one-sample t-test; p(minWT-sp1/minWT > minWT-sp2/minWT) = 0.18; one-sided, two-sample t-test with unequal variances). (D) We tested if distance from the promoter is sufficient to explain the gt-2 site deletion buffering in the extended enhancer by introducing the gt-2 site deletion into the minWT-sp1 construct, minΔgt-2-sp1. (E) The minΔgt-2-sp1 construct drives an expression pattern that is dramatically shifted to the anterior, indicating that the spacer cannot buffer the gt-2 binding site deletion's effect on expression pattern. (F) The minΔgt-2-sp1/minWT-sp1 peak expression ratio is significantly larger than the minΔgt-2/minWT ratio, indicating that the gt-2 deletion has a more dramatic effect in the minΔgt-2-sp1 construct and that increasing distance from the promoter does not buffer the effects of the gt-2 deletion (p(minΔgt-2-sp1/minWT-sp1 = 1) = 9.3e-4, one-sample t-test; p(minΔgt-2-sp1/minWT-sp1 < minΔgt-2/minWT) = 0.0025, one-sided, two-sample t-test with unequal variances).

The gt-2 transcription factor binding site mutation is buffered in the endogenous locus
To test whether the gt-2 deletion can be buffered in the intact locus, as it is in the extended enhancer, we used CRISPR editing to generate flies homozygous for the same gt-2 deletion in the endogenous eve locus, which we called the Δgt-2 eve locus (Figure 4A, Figure S6). We then measured eve expression patterns and levels using in situ hybridization in the Δgt-2 eve locus embryos and WT eve locus embryos (see Methods for details). To measure expression levels in eve stripe 2, we internally normalized to the levels of eve stripe 1, which is the first eve stripe to be expressed in this developmental stage (Figure S4; see Methods for details). The expression levels of eve stripe 2 in embryos with the Δgt-2 eve locus are not significantly different from those in embryos with the WT eve locus (p-value = 0.1007, Mann-Whitney U-test with Bonferroni correction; Figure 4C). The eve stripe 2 patterns driven by the WT eve locus and the Δgt-2 eve locus are also not significantly different (Figure 4B). This suggests that the gt-2 deletion in the endogenous eve stripe 2 enhancer is buffered: expression levels and boundary position in the Δgt-2 eve locus embryos are not significantly different from the WT eve locus embryos, in agreement with the observations made in the extended enhancer. Interestingly, we observed differences between Δgt-2 eve locus and WT eve locus embryos in other stripes of the eve pattern (Figure S5). There are differences in the expression levels of eve stripes 5 and 6, and in the patterns of eve stripe 4 (Figure S5). We speculate that the differences might be due to the effects of the genetic backgrounds of Δgt-2 eve locus and WT eve locus embryos (see Methods), which is consistent with previous findings of background effects (Lott et al. 2007; Kalay et al. 2020). Altogether, these results suggest that the effect of a specific mutation in the eve stripe 2 minimal reporter construct is not recapitulated when tested in the endogenous enhancer context.
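The comparison described in this section (stripe 2 levels normalized to stripe 1 within each embryo, then compared between genotypes with a Mann-Whitney U-test and a Bonferroni adjustment) can be sketched roughly as follows. This is an illustrative sketch only, not the analysis pipeline used in the study: the function names, the embryo values, and the number of tests are hypothetical, and the p-value uses a normal approximation without continuity or tie correction.

```python
import math


def normalize_to_stripe1(stripe2_peak, stripe1_peak):
    """Express a stripe 2 peak relative to the stripe 1 peak of the same embryo."""
    return stripe2_peak / stripe1_peak


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b, n_tests=1):
    """Two-sided Mann-Whitney U-test using a normal approximation.

    Returns the U statistic for sample_a and a Bonferroni-adjusted p-value
    (raw p-value multiplied by n_tests, capped at 1).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, min(1.0, p_two_sided * n_tests)


# Hypothetical (stripe 2, stripe 1) raw peak intensities; not data from the paper.
wt_embryos = [(0.80, 1.00), (0.90, 1.10), (0.70, 0.95), (0.85, 1.05)]
mutant_embryos = [(0.95, 1.00), (0.90, 1.00), (1.00, 1.10), (0.80, 0.90)]

wt_levels = [normalize_to_stripe1(s2, s1) for s2, s1 in wt_embryos]
mutant_levels = [normalize_to_stripe1(s2, s1) for s2, s1 in mutant_embryos]

# n_tests=3 is an arbitrary illustrative number of comparisons.
u_stat, p_adjusted = mann_whitney_u(mutant_levels, wt_levels, n_tests=3)
print(u_stat, p_adjusted)
```

For sample sizes as small as those in a single imaging experiment, an exact test from a statistics library would normally be preferred over the normal approximation used here.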

DISCUSSION
The desire to define discrete minimal sequences that are sufficient to drive gene expression patterns emerged from a combination of the technical limitations imposed upon early studies and the resulting "founder fallacy" (Halfon 2019), cementing the first discovered examples of enhancers into generalizations. Understanding and acknowledging the ways in which the activity of minimal enhancers in reporter constructs differs from the activity of the same sequences within the endogenous locus will help us understand gene regulatory logic at a genome scale, as well as regulatory variation and evolution. At the same time, it reaffirms the important contributions that reporter constructs can still make to deciphering the mechanisms of transcription (Catarino and Stark 2018).
Using one of the textbook examples of an enhancer, eve stripe 2, we have shown that deletion of a key TF binding site for Gt has significant effects on the expression driven by the minimal enhancer sequence, but not when this minimal enhancer is modestly extended, nor when the same binding site is removed from the endogenous locus. Furthermore, we identified an additional Gt binding site found outside the minimal enhancer that contributes to buffering the effect of this mutation.
Figure 3 An additional Gt binding site partially explains the buffering. (A) We found an additional predicted Gt binding site outside the minimal enhancer sequence but within the extended enhancer, which we called gt-4. A reporter construct, extΔgt-2,Δgt-4, testing for the necessity of the additional Gt binding site was made by mutating the predicted gt-4 binding site. (B) The average position of the lacZ anterior boundaries was nearly identical for the extΔgt-2 and extΔgt-2,Δgt-4 constructs, indicating that eliminating the additional gt-4 binding site does not affect buffering of the gt-2 deletion on expression pattern. Error bars show standard error of the mean boundary positions of the expression pattern. (C) If the additional gt-4 site were necessary and sufficient for the buffering, the extΔgt-2,Δgt-4/extWT ratio would be higher than 1 and very similar to the minΔgt-2/minWT ratio. If the additional gt-4 binding site were not necessary at all, the extΔgt-2,Δgt-4/extWT ratio would be similar to 1 and to the extΔgt-2/extWT ratio. The results suggest that the additional gt-4 site explains only some of the buffering of the gt-2 deletion on expression level (p(extΔgt-2,Δgt-4/extWT = 1) = 0.065, one-sample t-test; p(extΔgt-2,Δgt-4/extWT < extΔgt-2/extWT) = 0.052, one-sided, two-sample t-test with unequal variances). (D) We tested the sufficiency of the additional gt-4 binding site by making a construct with the minΔgt-2-sp1 element and inserting the additional gt-4 binding site in the same position relative to the promoter as in the extended enhancer (minΔgt-2-sp1+gt-4). (E) The additional gt-4 binding site shifts the anterior boundary of expression slightly to the posterior, when compared to the pattern driven by minΔgt-2-sp1. (F) The peak minΔgt-2-sp1+gt-4/minWT-sp1 ratio is lower than, but not significantly different from, the minΔgt-2-sp1/minWT-sp1 ratio, indicating that this gt-4 binding site is not sufficient to explain the buffering of expression level in the extended enhancer (p(minΔgt-2-sp1+gt-4/minWT-sp1 = 1) = 0.23, one-sample t-test; p(minΔgt-2-sp1+gt-4/minWT-sp1 > minΔgt-2-sp1/minWT-sp1) = 0.17, one-sided, two-sample t-test with unequal variances).
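The figure legends report one-sample t-tests of a peak expression ratio against 1 and two-sample t-tests with unequal variances between two ratios. As a rough illustration of what those statistics compute, the sketch below derives the t statistic and degrees of freedom for both cases. It assumes that each embryo contributes one ratio value, which is a simplification made here rather than a description of the authors' procedure, and the function names are hypothetical. Converting the statistic to a p-value requires the t distribution, which is left to a statistics library.

```python
import math
from statistics import mean, variance


def one_sample_t(sample, expected=1.0):
    """t statistic and degrees of freedom for testing mean(sample) == expected."""
    n = len(sample)
    standard_error = math.sqrt(variance(sample) / n)
    return (mean(sample) - expected) / standard_error, n - 1


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two samples are not assumed to share a variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of each mean
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-embryo peak ratios; not data from the paper.
ratios_a = [1.30, 1.45, 1.20, 1.50]
ratios_b = [1.05, 0.95, 1.10, 1.00]
print(one_sample_t(ratios_a, expected=1.0))
print(welch_t(ratios_a, ratios_b))
```

A one-sided p-value would then be read from the upper or lower tail of the t distribution with the returned degrees of freedom, depending on the direction of the hypothesis.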
Given that there were no characterized Gt binding sites in the region flanking the minimal enhancer, it was somewhat unexpected that the effect of the gt-2 deletion would be buffered in the extended enhancer (Ludwig et al. 2011). However, finding all transcription factor binding sites remains a challenge (Keilwagen et al. 2019), which may explain why we cannot fully account for the gt-2 deletion buffering in the extended enhancer. Gt's binding preference has been measured using several techniques, which all yield different sequence motifs (Noyes et al. 2008; Li et al. 2011; Schroeder et al. 2011). We searched for Gt binding sites with three different sequence motifs, and we found and mutated a single high-affinity binding site predicted by all three motifs, gt-4 (Figure S1). The gt-4 site is conserved across multiple Drosophila species (Figure S7). There are additional predicted Gt binding sites and other conserved regions within these extended sequences that may be contributing to the buffering and to enhancer function.
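To make the idea of a site "predicted by all three motifs" concrete, the following sketch scans a sequence with several position weight matrices and keeps only the positions that every matrix calls. It is a toy illustration under stated assumptions: the count matrices, the sequence, and the threshold are invented and are not the published Gt motifs, only the forward strand is scanned, and all motifs are assumed to have the same width and alignment, which real motif models generally do not.

```python
import math

BASES = "ACGT"


def toy_counts(consensus, favored, other):
    """Build a hypothetical count matrix that favors one base per position."""
    return [{b: (favored if b == base else other) for b in BASES} for base in consensus]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return matrix


def scan(sequence, matrix, threshold):
    """Return start positions whose forward-strand window scores >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def shared_hits(sequence, matrices, threshold):
    """Return positions called by every motif model."""
    hit_sets = [set(scan(sequence, m, threshold)) for m in matrices]
    return sorted(set.intersection(*hit_sets))


# Two invented motif models for the same 4 bp consensus, differing in specificity.
motif_a = log_odds_matrix(toy_counts("ACGT", favored=10, other=0))
motif_b = log_odds_matrix(toy_counts("ACGT", favored=8, other=1))
sequence = "TTACGTTT"  # hypothetical sequence

print(shared_hits(sequence, [motif_a, motif_b], threshold=4.0))
```

Requiring agreement between models trades sensitivity for specificity, which fits the observation in the text that additional predicted sites remain untested.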
We do not understand why the minimal spacer constructs that include the gt-2 deletion show a large anterior shift of the posterior boundary of the expression pattern. The shift is not observed in the wild-type minimal spacer constructs or in the minΔgt-2 or extΔgt-2 constructs, so it is not due to the spacer sequence or to the gt-2 deletion individually. We hypothesize that there is a specific promoter-enhancer interaction that occurs when both the spacer and the gt-2 deletion are present, but we cannot speculate on the precise underlying cause of this interaction.
This simple case study illustrates clearly that the effects of mutations, as measured in minimal enhancer sequences, cannot be simply extrapolated to larger enhancer regions or to the enhancer in its endogenous context in the genome. These results provide additional evidence challenging the idea that enhancers are strictly modular and that they have defined boundaries (Evans et al. 2012; Halfon 2019; Sabarís et al. 2019). Experiments using minimal enhancer reporter constructs have been extremely valuable for identifying genetic interactions and mechanisms of transcriptional control, e.g., activator/repressor balance and short- and long-range repression (Arnosti et al. 1996; Kulkarni and Arnosti 2005; Vincent et al. 2018). However, as more high-throughput methods are developed to test the effect of mutations in small to medium-size enhancer fragments, we need to be cautious in interpreting these results (Inoue and Ahituv 2015). A mutation that has dramatic effects on expression when made in a minimal enhancer may have no effect when made in the genome of an organism, which has implications for how we interpret naturally occurring sequence variation in the context of human disease and evolution.
To test the effects of mutations definitively, reporter construct experiments need to be complemented with manipulations of the endogenous enhancer sequences. Due to the CRISPR revolution, these types of experiments are becoming increasingly feasible (Zhou et al. 2014; Kvon et al. 2016; Rogers et al. 2017), and methods are being developed to use high-throughput CRISPR experiments to identify and perturb enhancers, as reviewed in Lopes et al. (2016) and Catarino and Stark (2018). These experiments will provide the data to attack the challenge of modeling the function of increasingly large pieces of the genome simultaneously, which is ultimately required to predict how variation in enhancer sequences affects gene expression.

Figure 4 CRISPR deletion of the gt-2 binding site in endogenous eve locus does not change stripe 2 expression. (A) Using a scarless-CRISPR method, we removed the gt-2 binding site endogenously. (B) The boundaries of eve stripe 2 are not significantly different between the Δgt-2 eve locus embryos and the WT locus embryos (Mann-Whitney U-test). Error bars show standard error of the mean boundary positions of the expression pattern. This indicates that the boundary of eve stripe 2 in the endogenous context is buffered against the removal of gt-2. (C) Normalized peak expression levels of eve stripe 2 did not change significantly in the Δgt-2 eve locus vs. control (P = 0.10, Mann-Whitney U-test with Bonferroni adjustment, 8). The ratio of Δgt-2 eve locus to WT eve locus equals 1.18. Filled circles represent mean expression level and open circles are eve peak expression for each individual embryo analyzed (Δgt-2 eve locus: n = 11, WT eve locus: n = 11).

ACKNOWLEDGMENTS
We thank all the members of the DePace lab for helpful discussions and suggestions on the manuscript. The research reported in this publication was funded by the Harvard GSAS Research Scholar Initiative (to F.L.R.), NIH grants K99/R00 HD073191 and R01 HD095246 (to Z.W.), the NSF Graduate Research Fellowship