FlyORF-TaDa allows rapid generation of new lines for in vivo cell-type-specific profiling of protein–DNA interactions in Drosophila melanogaster

Abstract Targeted DamID (TaDa) is an increasingly popular method of generating cell-type-specific DNA-binding profiles in vivo. Although sensitive and versatile, TaDa requires the generation of new transgenic fly lines for every protein that is profiled, which is both time-consuming and costly. Here, we describe the FlyORF-TaDa system for converting an existing FlyORF library of inducible open reading frames (ORFs) to TaDa lines via a genetic cross, with recombinant progeny easily identifiable by eye color. Profiling the binding of the H3K36me3-associated chromatin protein MRG15 in larval neural stem cells using both FlyORF-TaDa and conventional TaDa demonstrates that new lines generated using this system provide accurate and highly reproducible DamID-binding profiles. Our data further show that MRG15 binds to a subset of active chromatin domains in vivo. Courtesy of the large coverage of the FlyORF library, the FlyORF-TaDa system enables the easy creation of TaDa lines for 74% of all transcription factors and chromatin-modifying proteins within the Drosophila genome.


Introduction
Characterizing the specific protein-DNA interactions that underlie gene expression is essential for understanding the biology of any given tissue. The last decade has seen a change in thinking regarding the action of transcription factors (TFs), proteins that bind to enhancer and promoter regions of genes and modify the level of gene expression. In particular, it is now established that TFs work in complex groups or communities to control gene expression (for review, see Reiter et al. 2017). A key recent finding has been that the majority of TFs can act as either an activator or as a repressor, with their function determined by the surrounding TF community (Stampfel et al. 2015). As such, the ability to profile modules of TFs is critical to understanding regulatory function.
A major challenge in undertaking the systematic profiling of TF binding within a cell is the availability of reagents. For ChIP-seq, a lack of appropriate antibodies is a considerable impediment, combined with the difficulty of profiling TFs that are not directly bound to DNA. These issues are solved by DamID, a technique in which the TF of interest is expressed as a fusion protein with Escherichia coli DNA adenine methylase (Dam) and the resulting enriched adenine methylation surrounding the TF-binding sites is profiled. DamID effectively profiles all TFs coming into proximity with DNA and requires no antibodies for profiling (for review, see Aughey et al. 2019).
Targeted DamID (TaDa) allows DamID to be applied in a cell-type-specific manner to profile cellular transcriptional machinery (Southall et al. 2013), chromatin-modifying proteins (Marshall and Brand 2017), nuclear structural proteins, and TFs (Doupé et al. 2018). The technique allows the binding profile of any protein associated with DNA to be mapped in living organisms without the concern of fixation-induced artifacts, from very small amounts of material (10,000 cells being enough to generate high-quality profiles) and without the need for cell-sorting (Marshall et al. 2016). Profiling is performed in vivo using the GAL4/UAS system (Brand and Perrimon 1993) to provide cell-type specificity. As the GAL4/UAS system drives strong expression of transgenes, and high levels of Dam can be toxic in eukaryotic cells, TaDa uses a bicistronic transcript to greatly reduce the translation of Dam-fusion proteins (Southall et al. 2013). In this system, a long primary open reading frame (ORF) is separated from the Dam-fusion ORF by two stop codons and a frame shift, with translation of the latter arising through low rates of spontaneous ribosomal re-initiation. To generate DNA-binding profiles, GAL4 driver lines are crossed to lines carrying TaDa Dam-fusion proteins. However, the time and costs of generating the transgenic fly lines required for TaDa profiling are considerable.
The FlyORF library of fly lines (Bischof et al. 2013) contains inducible ORFs that cover 74% (560/757) of all known and predicted TFs and chromatin-associated proteins in the Drosophila genome (Bischof et al. 2013, 2014). The library was designed with cassette exchange features, presenting the possibility of converting existing FlyORF lines into TaDa lines to profile DNA binding via a simple cross. Here, we describe FlyORF-TaDa, a new system that allows the conversion of FlyORF lines to TaDa lines via Flippase-mediated cassette exchange. FlyORF-TaDa permits the rapid and easy generation of new TF-binding profiles without cloning or the creation of transgenic animals, with recombinants easily identified by eye color and fluorescence.
pTaDaG2-MRG15 was generated by inserting a synthetic gBlock (IDT) containing the sequence of MRG15-RA into the pTaDaG2 vector cut with XbaI/XhoI. All plasmids were sequence-verified via Sanger sequencing (ABI). Plasmid maps were generated using SnapGene software (Insightful Science).

FlyORF cassette exchange
Homozygous hs-FlpD5;FlyORF-TaDa virgin females were crossed to males from FlyORF lines. Progeny (third instar, 96 h after larval hatching) were heat shocked at 37 °C for 60 min. After eclosion, F1 male flies were crossed to TM3/TM6B virgin females. F2 males and virgin females with TM6B and exhibiting the correct eye phenotype (w-;3xP3-dsRed2+) were crossed to establish a balanced stock.

Targeted DamID
Appropriate lines (for conventional TaDa: TaDaG2-MRG15 and TaDaG2-Dam; for FlyORF-TaDa: FlyORF-TaDa-MRG15 and FlyORF-TaDa lines) were crossed to worniu-GAL4;tub-GAL80ts virgin females in cages. Embryos were collected on apple juice agar plates with yeast over a 4-h collection window and grown at 18 °C for 2 days. Newly hatched larvae were transferred to food plates for a further 5 days at 18 °C, before shifting to 29 °C for 24 h.
Larval brains were dissected in PBS and processed for DamID-seq as previously described (Marshall et al. 2016; Marshall and Brand 2017) with the following modifications. Briefly, DNA was extracted using a Quick-DNA Miniprep plus kit (Zymo), digested with DpnI (NEB) overnight, and cleaned up with a PCR purification kit (Macherey-Nagel); DamID adaptors were then ligated, and the DNA was digested with DpnII (NEB) for 2 h and amplified via PCR using MyTaq DNA polymerase (Bioline).

Bioinformatic analyses
DamID-binding profiles were generated from NGS reads using damidseq_pipeline (Marshall and Brand 2015) and visualized using pyGenomeTracks (Ramírez et al. 2018). Peaks were called using a three-state hidden Markov model via the hmm.peak.caller R script (freely available from https://github.com/owenjm/hmm.peak.caller).
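As an illustration of the kind of normalization that underlies such binding profiles, the following minimal Python sketch expresses a Dam-fusion sample relative to a Dam-only control as a per-bin log2 ratio. It is not the damidseq_pipeline code: the function name, the depth scaling, the pseudocount, and the toy read counts are all assumptions made for the example.

```python
import math


def log2_ratio_profile(fusion_counts, dam_counts, pseudocount=1.0):
    """Return per-bin log2(Dam-fusion / Dam-only) after depth scaling.

    Hypothetical helper for illustration only; the bins could be GATC
    fragments or fixed-width windows.
    """
    if len(fusion_counts) != len(dam_counts):
        raise ValueError("profiles must cover the same bins")
    fusion_total = sum(fusion_counts)
    dam_total = sum(dam_counts)
    if fusion_total == 0 or dam_total == 0:
        raise ValueError("each sample needs at least one read")
    # Scale the Dam-only sample to the sequencing depth of the fusion sample.
    scale = fusion_total / dam_total
    # The pseudocount (an assumption) avoids division by zero in empty bins.
    return [
        math.log2((f + pseudocount) / (d * scale + pseudocount))
        for f, d in zip(fusion_counts, dam_counts)
    ]


# Toy read counts for four bins (invented numbers).
fusion = [40, 10, 80, 20]
dam_only = [20, 20, 20, 40]
profile = log2_ratio_profile(fusion, dam_only)
```

Positive values mark bins where the fusion protein methylates more than untethered Dam, and negative values mark bins where it methylates less.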
Heatmaps were generated via the ComplexHeatmap R package (Gu et al. 2016).
Gene ontology (GO) enrichment analyses were performed using the clusterProfiler R package with Bonferroni-Holm-adjusted P-values; enrichmap plots were generated by limiting GO terms to <1000 genes and using the simplify() function to remove redundancy (Yu et al. 2012).
MRG15 enrichment by chromatin state (significance and odds ratio) was assessed via Fisher's exact test with an alternative hypothesis of "greater", for contingency tables of genomic coverage of MRG15 vs genomic coverage of the chromatin state in neural stem cells (NSCs). All P-values were Bonferroni-Holm adjusted. All other plots were generated using R (R Core Team, 2020).
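The enrichment test described above can be sketched in Python using only the standard library. This is an illustrative reimplementation and not the R code used for the analysis: the function names, the chromatin state labels, and the bin counts in the contingency tables are hypothetical.

```python
import math
from fractions import Fraction


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P-value) for the alternative hypothesis
    that the true odds ratio is greater than 1.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    # Sum hypergeometric terms for tables at least as extreme as observed.
    numer = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    p_value = float(Fraction(numer, denom))
    odds_ratio = math.inf if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, p_value


def holm_adjust(p_values):
    """Bonferroni-Holm step-down adjustment, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses not yet rejected, and keep
        # the adjusted values monotone along the sorted order.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical counts of genomic bins per state:
# (bound & in state, bound & outside, unbound & in state, unbound & outside)
tables = {
    "state_A": (80, 120, 20, 780),
    "state_B": (30, 170, 70, 730),
    "state_C": (25, 175, 75, 725),
}
results = {name: fisher_exact_greater(*t) for name, t in tables.items()}
adjusted_p = holm_adjust([results[name][1] for name in tables])
log_odds = {name: math.log(results[name][0]) for name in tables}
```

Exact integer arithmetic keeps the sketch dependency-free, but it is slow for counts at base-pair resolution; a statistics library would be the practical choice there.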

Data availability
The sequence of pFlyORF-TaDa is available under GenBank accession number MT733231, and the plasmid DNA is available upon request. DamID-seq data and processed bedgraph and analysis files have been deposited in NCBI GEO, accession number GSE159632. The w;+;TaDaG2-MRG15, w;+;TaDaG2-Dam, and w;+;FlyORF-TaDa-MRG15 fly lines are available upon request; the w;hs-FlpD5;FlyORF-TaDa and w;+;FlyORF-TaDa lines have been deposited with the Bloomington Drosophila Stock Center.

Design of the FlyORF-TaDa system
The FlyORF library of inducible ORFs incorporates an FRT5 site immediately upstream of each ORF, allowing Flippase-mediated exchange with an upstream donor cassette in the same genomic insertion site (Bischof et al. 2013). To convert these lines to TaDa lines, FlyORF-TaDa provides the UAS-inducible bicistronic transcript of TaDa vectors upstream of an FRT5 site as the donor cassette. FlyORF-TaDa uses the TaDaG2 version of the TaDa bicistronic transcript, in which the primary ORF is myristoylated GFP, providing the advantage of easily detectable fluorescent labeling of cells being profiled within experimental samples (Delandre et al. 2020).
The bicistronic cassette is followed by an FRT5 site in frame with Dam (Figure 1A and Supplementary Figure S1A). Without recombination, a stop codon positioned directly after the FRT5 site allows the FlyORF-TaDa line to be used as a Dam-only control for DamID signal normalization (Supplementary Figure S1B). Following recombination with a FlyORF line, the FRT5 site together with the Gateway cloning residual attB1 site present as part of the FlyORF library construction (Bischof et al. 2013) becomes a 23-amino-acid protein linker region (Supplementary Figure S1C).
To prevent the need to PCR-screen progeny, the vector was additionally designed with a 3xP3-dsRed2 eye marker upstream of the TaDa cassette, and the mini-white eye marker downstream of the FRT5 site (Figure 1A and Supplementary Figure S1A). While both parental strains used for a conversion are w+, upon Flippase-mediated strand exchange, successful recombinants will be w-,dsRed2+, making screening of recombinants by eye color fast and straightforward. [We note that the FlyORF lines contain a 3xP3-RFP marker as part of the landing site (removed in the FlyORF-TaDa lines), making screening for recombinants on eye fluorescence alone difficult, although we observe significantly more intense eye fluorescence from the 3xP3-dsRed2 marker.] The pFlyORF-TaDa vector was inserted into ZH-86FB, the same insertion site used by the FlyORF library, and the resulting line was crossed with a heat-shock-inducible Flippase (hs-FlpD5) line to yield an hs-FlpD5;FlyORF-TaDa donor line. TaDa alleles are generated by crossing any FlyORF line to this donor line, in conjunction with a heat-shock of progeny during the larval phase. Crossing the resulting adult males to virgin females with chromosome 3 balancers yields a typical recombination frequency of 50% (combined totals for two separate FlyORF-TaDa conversions: 7 vials with at least one recombinant/14 total vials scored with a minimum of 50 progeny). An illustration of the crossing scheme is shown in Supplementary Figure S2.

Profiling of MRG15 binding in neural stem cells using the FlyORF-TaDa system
To determine whether FlyORF-TaDa lines generated through the system faithfully profiled binding in a cell-type-specific manner, we obtained binding profiles for the chromatin-binding protein MRG15 in NSCs. We obtained profiles from a FlyORF-TaDa-MRG15 line and compared these to MRG15 profiles generated using conventional TaDa, driving expression in both cases with the NSC-specific driver, worniu-GAL4. Excellent correlation between samples was observed throughout (Figure 1B), both between individual replicates (Supplementary Figure S3) and when comparing the average profiles from the two systems (Figure 1B). In particular, the two biological replicates generated using the FlyORF-TaDa system showed extremely high reproducibility (Pearson's correlation between replicates, 0.95; Supplementary Figure S3).
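For readers wishing to run the same kind of reproducibility check on their own replicates, a minimal Python sketch of Pearson's correlation between two binned profiles is given here. The function name and the toy profile values are assumptions for illustration, and the sketch presumes both replicates have been summarized over the same genomic bins.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must share the same bins (at least two)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances are left unnormalized; the factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy log2-ratio profiles for two replicates (invented values).
replicate_1 = [0.8, -0.2, 1.5, -1.1, 0.3]
replicate_2 = [0.7, -0.4, 1.6, -0.9, 0.1]
r = pearson_r(replicate_1, replicate_2)
```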
MRG15 is a protein associated with the H3K36me3 histone mark (Zhang et al. 2006), which in turn is associated with transcribed exons (Kolasinska-Zwierz et al. 2009; Kharchenko et al. 2011). The protein is also a key component of the "Yellow" active chromatin state described in flies by Filion et al. (2010), a study that reduced chromatin configurations in Drosophila to five broad classification types, or colors. In concordance with these observations, we found MRG15 bound at previously published Yellow chromatin domains in NSCs (Marshall and Brand 2017) (Figure 2A), with MRG15-bound peaks covering 80% (9.8 Mb of 12.2 Mb) of Yellow chromatin. While MRG15 occupancy was significantly enriched to some degree over all three active chromatin states in NSCs, by far the greatest enrichment was observed over Yellow chromatin [Figure 2B; log_e θ (Yellow chromatin) = 2.91, Fisher's exact test].
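A coverage figure of this kind (the fraction of a chromatin state overlapped by bound peaks) can be computed from two interval sets, as in the following minimal Python sketch. The function names and coordinates are hypothetical; intervals are assumed to be half-open (start, end) pairs on a single chromosome, so a genome-wide figure would sum the result over chromosomes.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_bp(peaks, domains):
    """Total number of bases shared by two interval sets."""
    a, b = merge_intervals(peaks), merge_intervals(domains)
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Invented coordinates on one chromosome.
peaks = [(0, 100), (150, 300)]
domains = [(50, 200), (250, 400)]
domain_bp = sum(end - start for start, end in merge_intervals(domains))
fraction_covered = overlap_bp(peaks, domains) / domain_bp
```

The same overlap counts, together with the uncovered remainder of the genome, supply the four cells of the contingency table used for the enrichment test.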
Genes covered by bound peaks were enriched for GO functions associated with metabolism (Figure 3 and Supplementary Figure S4), a known association for genes covered by Yellow chromatin, both in cell lines (Filion et al. 2010; van Bemmel et al. 2013) and in NSCs (Marshall and Brand 2017). Importantly, we also observed an enrichment for genes involved in nervous system development expressed in NSCs (Figure 3), consistent with occupancy over transcribed exons and indicating that the FlyORF-TaDa system can faithfully profile cell-type-specific protein binding in vivo.

Limitations
Although versatile, some potential limitations of the FlyORF-TaDa system should be considered when using the converted lines for profiling. In particular, the FlyORF lines that are convertible to TaDa lines (i.e. lines that include an FRT5 site) all incorporate a long (22-aa) flexible linker combined with 3xHA tags at the C-terminus. There is some evidence that the presence of HA tags may disrupt some aspects of protein function in unusual cases. For the FlyORF library, the presence of these tags was shown (albeit in overexpression experiments) to cause a gain of function in 11% of lines, and a failure to cause a phenotype in 22% of lines, when compared to overexpression of the untagged variant (Bischof et al. 2013). A related study of GFP- and FLAG-tagged proteins from the FlyFos library of lines suggested that 33% of tagged lines were unable to rescue the corresponding deficiency, although this failure was ascribed to the lack of long-range regulatory elements of complex developmental TFs rather than the presence of tags themselves (Sarov et al. 2016). It is unknown whether in any of these cases protein-DNA binding was affected. Nevertheless, it is possible that in a minority of cases the presence of the HA tags may lead to unrepresentative binding profiles.
Another consideration is that the FlyORF-TaDa system only generates fusions with Dam fused to the N-terminus of the ORF protein (again via a long linker generated via the FRT5 site). It remains unclear as to whether Dam can disrupt TF binding in some circumstances, although previous studies have shown that Dam-fusion proteins yield binding profiles broadly comparable to ChIP-seq data obtained for the same protein (Cheetham et al. 2018;Tosti et al. 2018;Szczesnik et al. 2019).
Notably, the FlyORF libraries also include an FRT2 site immediately 3′ of the ORF. Both the removal of the C-terminal linker and 3xHA tags, and the generation of C-terminal Dam-fusion proteins, would in the future be possible with different donor lines designed on a similar basis to the system presented here.

Figure caption: GO term enrichment for genes covering MRG15-bound peaks identifies terms for both metabolism and developmental neurogenesis (nonredundant GO terms covering <1000 genes illustrated). Metabolism-related terms are colored in orange; developmental terms are colored in red. All terms illustrated were significantly enriched (all adjusted P-values <10^-28).

Conclusion
The FlyORF-TaDa system places fast and straightforward cell-type-specific profiling of TF binding within the reach of any fly lab, allowing the profiling of 74% of all TFs and chromatin-associated proteins via a simple genetic cross. Establishment of new recombinant lines from the parental FlyORF stocks is achieved in two generations, without the need for cloning, sequencing or transgenesis. The donor w;hs-FlpD5;FlyORF-TaDa and Dam-only control w;+;FlyORF-TaDa lines have been deposited with, and will be available from, the Bloomington Drosophila Stock Center (BDSC).
With three quarters of all TFs covered by donor FlyORF lines, comprehensive wide-scale cell-type-specific profiling of TF-binding networks can be achieved. TaDa is ideally suited to profiling such networks without introducing the variability of different fixation and antibody isolation steps found in alternative methods such as ChIP-seq. The FlyORF-TaDa system now eliminates a major obstacle to this approach by dramatically reducing the time and costs needed to generate the lines required to profile TF networks. To further this aim, we have successfully generated 17 new FlyORF-TaDa lines using the system and are creating a library of converted TF and chromatin factor lines that will be deposited in stages at the BDSC. We anticipate that this resource will prove highly useful to the Drosophila transcription, chromatin, and developmental biology communities.