ZNF146/OZF and ZNF507 target LINE-1 sequences

Abstract Repetitive sequences including transposable elements and transposon-derived fragments account for nearly half of the human genome. While transposition-competent transposable elements must be repressed to maintain genomic stability, mutated and fragmented transposable elements comprising the bulk of repetitive sequences can also contribute to regulation of host gene expression and broader genome organization. Here, we analyzed published ChIP-seq data sets to identify proteins broadly enriched on transposable elements in the human genome. We show 2 of the proteins identified, C2H2 zinc finger-containing proteins ZNF146 (also known as OZF) and ZNF507, are targeted to distinct sites within LINE-1 ORF2 at thousands of locations in the genome. ZNF146 binding sites are found at old and young LINE-1 elements. In contrast, ZNF507 preferentially binds at young LINE-1 sequences correlated to sequence changes in LINE-1 elements at ZNF507’s binding site. To gain further insight into ZNF146 and ZNF507 function, we disrupt their expression in HEK293 cells using CRISPR/Cas9 and perform RNA sequencing, finding modest gene expression changes in cells where ZNF507 has been disrupted. We further identify a physical interaction between ZNF507 and PRMT5, suggesting ZNF507 may target arginine methylation activity to LINE-1 sequences.


Introduction
In the wake of whole-genome sequencing, it has become clear that only a small portion of DNA codes for protein in higher organisms. While only about 1-2% of the roughly 3 billion bases in the human genome are protein-coding, an astounding 40-50% is derived from mobile repetitive sequences collectively known as transposable elements (TEs). Mobilization of TEs, sometimes referred to as "jumping genes," poses a threat to host genome stability and is known to sometimes cause disease (Kazazian and Moran 2017). In recent years, however, TEs and TE-derived sequences have been implicated as drivers of gene expression patterning, genome organization, development, and evolution in higher organisms (Rebollo et al. 2012;Hall et al. 2014;Lu et al. 2021).
TEs multiply and spread in host genomes by "cut-and-paste" transposition of their DNA sequence or through an RNA intermediate (retrotransposition). Ultimately, to preserve genomic integrity TEs are largely inactivated by various mechanisms encoded by the host genome. Relatively few mobile TEs persist in the human genome, primarily consisting of a hundred or so long interspersed nuclear elements (LINE-1 or L1) and SVA (short interspersed nuclear element-VNTR-Alu) repeats that require L1 encoded reverse transcriptase activity to mobilize (Brouha et al. 2003;Wang et al. 2005). Instead, most TE DNA in the genome consists of degenerate remnants of prior mobilization events that have been truncated or mutated and no longer harbor the activity necessary to "jump." Transposition-competent TEs are recognized by DNA-binding proteins and generally inactivated at the level of transcription, perhaps most notably by KRAB-containing zinc finger proteins that recruit repressive epigenetic modifications to chromatin and by DNA methylation (Yang and Wang 2016;Yang et al. 2017). TEs sometimes, however, escape this repression. In humans, L1s mobilize in the germline and are a major source of inheritable structural variation between individuals (Ewing and Kazazian 2010;Mir et al. 2015). Recent studies have revealed that L1 elements are also activated in specific contexts outside of the germline, such as stress, cancer, and neural development, with diseaseamplifying or functional implications (Coufal et al. 2009;Burns 2017;Faulkner and Garcia-Perez 2017;Kazazian and Moran 2017).
Irrespective of their mobility, abundant TE sequences are nonrandomly distributed in host genomes and contribute to development and gene expression patterning through recognition by DNA-binding proteins (Manuelidis and Ward 1984;Korenberg and Rykowski 1988;Bourque et al. 2008;Rebollo et al. 2012). For instance, HERV-H/MERV-L TEs are known to be expressed early during embryogenesis and are important regulators of pluripotency and embryogenesis by regulating the expression of neighboring genes, a process driven by sequence-specific recognition of TE DNA by transcription factors (TFs) (Kigami et al. 2003;Wang et al. 2014;Robbez-Masson and Rowe 2015). L1 elements are also known to be highly expressed in the early mouse embryo and contribute to proper embryonic development (Fadloun et al. 2013;Jachowicz et al. 2017).
While several DNA-binding proteins are known to target TEs, the complex picture of TF binding to TE sequences remains incomplete and the functional significance of these interactions on cell or organism biology is understudied, and largely unknown. Here, we analyze published ChIP-seq data sets to identify proteins that broadly recognize TE-derived sequences in the human genome. We focus on 2 poorly studied zinc finger proteins, ZNF146 (also known as OZF) and ZNF507, and characterize their recognition motifs within L1 repeats. Interestingly, we find ZNF146 is a highly conserved protein that retains targeting to thousands of relatively old L1 sequences in the genome. In contrast, ZNF507 was only observed at relatively young L1 sequences. We go on to functionally test the impact depletion of these proteins has on the transcriptome of HEK293 cells. Finally, we identify an interaction between ZNF507 and PRMT5 by coimmunoprecipitation and mass spectrometry, yielding potential insight into the function of ZNF507 in the cell.

TE enrichment analysis
All TF ChIP-seq experiments with a corresponding IDR threshold peak file were downloaded from ENCODE in March 2020 (hg19 only). For each experiment, peaks were shuffled 10 times using BEDtools shuffle excluding ENCODE blacklist regions and -noOverlapping -maxTries 1000 parameters (Quinlan and Hall 2010). Frequency of intersection between experimental or shuffled peaks and Repeatmasker repeat classes were calculated using BEDtools intersect and the -u parameter. Heatmaps were graphed in R.

LINE-1 enrichment analysis
Meta-analysis of ChIP enrichment at LINE-1 elements was performed using unique reads (fold change over control bigwig files downloaded from ENCODE). Primary alignments were also downloaded from ENCODE and coverage files of fold change over control coverage were also generated without filtering for uniqueness using deepTools bamCoverage and the -normalizeTo1X parameters (Ramirez et al. 2014). Meta-analysis was performed at full-length L1 elements annotated by L1Base or binned by subfamily from Repeatmasker (Penzkofer et al. 2017). Mappability tracks were downloaded from the UCSC table browser.
Coverage of peaks aligned to the L1PA1 consensus was performed by extracting reads (ENCODE) mapping to L1 elements using BEDtools intersect. Aligned sequences were then retrieved using BEDtools getfasta. Fasta sequences were then directly aligned to the L1PA1 consensus sequence using bwa mem and the -B 1 -O 1 -d 1 -T 1 -r 0.1 -t 12 -k 10 parameters (Li and Durbin 2009). Coverage was then calculated using Samtools depth. Signal over input was then calculated and graphed in R using ggplot2.

Motif discovery and scanning
Motif discovery was performed on ENCODE peak DNA sequences using MEME software and order-0 background, classic discovery mode, 0 or 1 occurrence, 13mer parameters (Bailey and Elkan 1994). Motif occurrences in the genome were found using the top identified motif for each ChIP experiment and FIMO software, filtered by the indicated P-value cutoff (Grant et al. 2011).

Zinc finger identification and multiple sequence alignments
Protein sequences were downloaded for alignment from UniProt. L1 sequences were downloaded from Dfam. Zinc fingers were predicted and annotated using online software (http://zf.prince ton.edu/, last accessed September 2021) (Persikov et al. 2009). Multiple sequence alignments were performed using ClustalX and Jalview with additional manual curation to focus on zinc finger residues and presentation.
Western blots were performed using antibodies against ZNF146 (Novus Biologicals) or ZNF507 (Thermo Fisher). RNA was isolated from cells disrupted with CRISPR/Cas9 in biological duplicate using Trizol RNA extraction reagent. RNA was treated with TURBO DNase (Thermo Fisher) to remove contaminating DNA and cleaned up using RNeasy Mini (Qiagen) kits according to manufacturer recommendations. RNA sequencing libraries were prepared using a stand-specific kit RNA-seq kit with ribosome depletion (KAPA Biosystems) and NEBNext Adapters and Multiplex Oligos for Illumina. Paired-end 50 bp sequencing was performed on the Illumina HiSeq 2000 platform (UMMS Core Facilities).
Reads were pre-processed and clipped using the Fastx Toolkit and those mapping to ribosomal RNA were removed prior to subsequent mapping with Bowtie. Nonribosomal reads were then mapped to the human genome (hg19) using TopHat2 software with -library-type fr-firststrand -no-coverage-search parameters and Ensembl (ver82) gene model annotations. Differential expression analysis was performed using count matrices generated by featureCounts and DeSeq2 (Liao et al. 2014). Associated graphs were generated using ggplot2 within R. The associated fastq files and count matrices for this experiment are available at GEO: GSE172285.
For immunofluorescence analysis, HEK293 cells were grown on coverslips in 6 well dishes and transfected with pcDNA3.1-3XFLAG-ZNF507 plasmid using lipofectamine reagent (Thermo Fisher). After 36 h cells were fixed with 4% paraformaldehyde in PBS for 10 min at room temperature. Coverslips were then incubated with a 1:100 dilution of M2 anti-FLAG antibody (Sigma) in 1% BSA, 1X PBS at 37 C for 1 h, then washed, and immunodetected using 1:500 dilution of conjugated secondary antibody, in 1X PBS with 1% BSA. Nuclei were visualized using Axiovert 200 microscope equipped with a 100X PlanApo objective (NA 1.4) and Chroma 83000 multibandpass dichroic and emission filter set (Bratteboro). Images were captured using a cooled charge-coupled device (CCD) camera (200 series, Photometrics). Images were minimally corrected for brightness and contrast using standard practices to best represent signals observed by eye using Zen (Zeiss) software. In cells with signal, 3xFLAG-ZNF507 consistently was localized to the nucleus. No appreciable signal was observed for cells transfected with empty vector control.
Immunoprecipitation of 3xFLAG-ZNF507 and associated proteins was performed by transfecting approximately 40 million HEK293 cells with either 3x-FLAG-ZNF507 expression plasmid or an empty vector control. After 36 h, cells were dislodged from flasks in ice cold 1X PBS and harvested in 10 ml falcon tubes. Cells were washed twice in ice cold PBS before lysis for 5 min on ice in 1 ml Lysis Buffer [10 mM HEPES pH 7.5, 1.5 mM MgCl2, 10 mM NaCl, 0.075% NP-40, 1 mM PMSF, 1X EDTA-free protease inhibitor cocktail (Roche)]. Crude nuclear pellets were then harvested by centrifugation at 4 C for 3 min at 1,000 Â g and removal of supernatant. Nuclei were then suspended in 1 ml Nuclear Extraction Buffer [20 mM Tris-HCl pH7.5, 500 mM NaCl, 1.5 mM MgCl2, 25% glycerol, 1 mM PMSF, 1X EDTA-free protease inhibitor cocktail (Roche)], vortexed briefly, and incubated 15 min on ice. Insoluble DNA and debris were then cleared from the nuclear extract by centrifugation at 4 C for 10 min at 12,000 Â g. To the supernatant an equal amount of Dilution buffer [20 mM Tris-HCl pH 7.5, 1 mM EDTA, 0.2% Triton-X 100, 1 mM PMSF, 1X EDTA-free protease inhibitor cocktail (Roche)] was added to lower salt concentration before immunoprecipitation.
Nuclear extracts were pre-cleared by incubation with 40 ml Protein A dynabeads (Sigma) for 1 h at 4 C. After capture of beads, 40 ml anti-FLAG M2 magnetic beads (Sigma) were added to each nuclear extract and incubated 6 h at 4 C. Beads and coimmunoprecipitated proteins were then captured and washed twice for 5 min with rotation at 4 C with Wash buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl) containing 0.1% Triton X-100 and once without detergent. Proteins were eluted twice using 50 ml of 3X-FLAG peptide at a concentration of 500 ng/ml and incubated with rotation at 4 C for 30 min for each elution.
Laemmli sample buffer (4X) was added to each elution to 1X final concentration and samples were incubated for 15 min at 80 C. Samples were then briefly run on the same 4%-20% Criterion Stain Free Tris-HCl Protein Gel separated by several lanes so that the dye front was 2 cm below each well. Lanes were cut out slightly above the dye to the well bottom for LC-MS/MS analysis (UMMS Mass Spectrometry Core Facility). Mascot was set up to search SwissProt_Human and Scaffold (Proteome Software Inc.) was used to validate MS/MS based peptide and protein identifications.

Identification of DNA-binding proteins that target TEs
In order to identify proteins that broadly target TEs we developed a strategy to test chromatin immunoprecipitation with sequencing (ChIP-seq) data sets for enrichment on TEs using a simple peak intersection strategy (Fig. 1a). For this study, we analyzed all ENCODE transcription factor (TF) ChIP-seq experiments (as of March 2020, see Supplementary Table 1) for which peaks had been identified, spanning a large number of transcription factors, cell types, and sequencing methodologies (ENCODE Project Consortium 2012; Imbeault et al. 2017). Peaks were then intersected with Repeatmasker annotated repeats for the most abundant TE classes in the human genome. Frequency of intersection between peaks for each experiment was then compared to frequency of intersection with background models generated by random shuffling to generate a ratio of observed elements relative to what would be expected by random chance (Fig. 1b).
While most peak/TE comparisons indicated TF binding to TEs occurred less frequently than expected by random chance (Fig. 1c), 361 experiments indicated peak enrichment of greater than 2-fold (observed/expected), 267 of which were enriched for a single TE class (Fig. 1d). Interestingly, enrichment was most frequently (>100 experiments) observed for ERVK transposons. We were additionally struck with the lack of TFs identified by this approach targeting LINE-1 elements. Just 2 experiments, ChIP-seq for GFP-ZNF146 in HEK293 and GFP-ZNF507 in A529 cells demonstrated 2-fold enrichment for LINE-1 elements. This enrichment was specific for LINE-1 as these proteins were not associated with other TE classes (Fig. 1e).
ZNF146 (also known as OZF or Only Zinc Fingers) and ZNF507 genes are both found on chromosome 19 and encode proteins with 10 and 9 C2H2 zinc finger motifs. Neither protein has any other easily identifiable domains (Fig. 1f), and apparently lack KRAB domains frequently found in zinc finger proteins known to silence TEs (Yang et al. 2017). Zinc finger motifs can contribute to protein-protein and protein-RNA interactions, but are best studied for their contribution to transcription factor binding to specific DNA sequences (Laity et al. 2001). ZNF146/OZF was previously identified as being overexpressed in certain cancers (Ferbus et al. 1999(Ferbus et al. , 2003. Still, neither ZNF146 or ZNF507 have been extensively studied and little is known about their function despite being predicted to be widely expressed in human cells (Uhlen et al. 2015(Uhlen et al. , 2017. LINE-1 sequences are widely distributed and highly abundant in the human genome. Most L1 sequences are truncated and degenerate, with varying divergence from transposon-competent consensus sequences. There are estimated to be just 100 transposition-competent L1 elements, 4,000 mutated but fulllength elements, and several hundred thousand truncated L1 fragments in the human genome (Brouha et al. 2003). Because of LINE-1 prevalence in the human genome, the small number and specificity of TFs identified by this approach, and the understudied nature of the 2 proteins identified, we decided to characterize the targeting of ZNF146 and ZNF507 to L1 elements in greater detail.

ZNF146 and ZNF507 target LINE-1 ORF2
The above analyses indicated ZNF146 and ZNF507 preferentially bind to LINE-1 sequences. In order to verify these observations, we performed additional in-depth analysis of ENCODE's ChIP-seq experiments. Consistent with L1 targeting, peaks for both ZNF146 and ZNF507 ChIP-seq experiments were found mostly (81% and 76% of peaks, respectively) within annotated L1 elements (Fig. 2a). Despite high frequency of targeting to L1 elements for both proteins, ZNF146 peaks and ZNF507 peaks themselves did not frequently coincide (Fig. 2b), suggesting they target different sequences within L1 elements.
To test this idea we performed meta-analysis of ChIP-seq coverage for both proteins at full-length L1 elements (Fig. 2c). Due to their repetitive nature, full-length L1 elements are often difficult to analyze by sequencing methods, however for these experiments ENCODE performed 100 bp paired-end sequencing, allowing for relatively high mappability at disrupted (degenerate or mutated), full-length elements (Fig. 2c, left panels). In contrast, intact (retaining coding potential) L1s had very low mappability (Fig. 2c, right panels).
Using uniquely mapping reads, concentrated enrichment of ChIP-seq signal was observed for both ZNF146 and ZNF507 at disrupted, full-length L1s, with ZNF507 enrichment most prominent toward the 5' end and ZNF146 toward the 3' end of these elements. Using primary reads (which allow for multimapping reads to be randomly assigned to 1 location), similar enrichments could be observed on intact L1 elements. Hence while it is not possible TcMar-Tigger log 2 enrichment (obs/exp) ZF 292aa 957aa (f) Fig. 1. Identification of TFs that bind transposable elements. a) Schematic of ENCODE TF ChIP-seq peak intersection with Repeatmasker annotated TEs. b) Heatmap of TF ChIP-seq enrichment on abundant TE classes calculated as the log 2 ratio of observed (obs) peak/TE intersections relative to simulated shuffled peaks (expected, exp). c) Density of enrichment across all ChIP-seq experiment peak and TE class comparisons. d) Bar chart of the number of TF ChIP-seq experiments demonstrating 2-fold enrichment binned by TE class. e) Bar chart of peak enrichment on TE classes for GFP-ZNF146 ChIP-seq in HEK293 cells and GFP-ZNF507 in K562 cells. f) Schematic representation of ZNF146 and ZNF507 proteins and the distribution of their C2H2 zinc finger domains.  Fig. 2. ZNF146 and ZNF507 target LINE-1 sequences. a) Pie chart representing the percentage of peaks in the indicated ChIP-seq experiments overlapping annotated LINE-1 elements. b) Venn diagram of peak interval intersections between the indicated ChIP-seq experiments. c) Meta-analysis of ChIP-seq enrichment at full-length disrupted (n ¼ 13,092) or intact (ORF2 or ORF1 and ORF2) LINE-1 elements (n ¼ 245) normalized to 5 kb. Below is the 100mer mappability score for the same regions (lower scores indicate lower mappability) d, e) Coverage of L1-mapping reads directly mapped to LINE-1 (human specific, PA1) consensus sequence. Below is a schematic of full-length LINE-1. f, g) Enriched motifs (see Materials and Methods) and corresponding LINE-1 (human specific, PA1) consensus sequence determined from ZNF146 and ZNF507 ChIP-seq experiments, respectively. h, i) Heatmap of ChIP-seq enrichment centered around occurrences of identified motifs (FIMO, P < 5e-6) found in annotated LINE-1 elements in their genomic context. On the right is the theoretical mappability of 100 bp reads in the same regions.
to ascertain whether ZNF146 and ZNF507 bind at individual, intact elements, these proteins likely target intact L1s as well.
We then resolved ZNF146 and ZNF507 binding by calculating coverage after performing local alignment of peak sequences to the L1PA1 (human specific) consensus sequence. LINE-1 transposons code for 2 proteins, ORF1p and ORF2p, both of which are required for transposition of LINE-1 (Moran et al. 1996). ORF1 encodes a chaperone while ORF2 encodes a reverse transcriptase essential for transposition of not only L1, but other nonautonomous TEs as well (Mathias et al. 1991;Kolosha and Martin 1997;Dewannieux et al. 2003;Ostertag et al. 2003). ZNF146 and ZNF507 peaks specifically aligned to the 3' and 5' ends of the ORF2 coding region of L1, respectively, with no enrichment within ORF1 (Fig. 2, d and e).
We further identified likely binding motifs within L1 ORF2 by performing motif discovery (Fig. 2, f and g). The most significant motif identified for each ChIP-seq experiment closely matched sequences in the L1PA1 consensus sequence at positions of highest ChIP-seq enrichment (Fig. 2, d-g). We then located all instances of the determined motifs for ZNF146 and ZNF507 in L1 elements genome-wide and performed meta-analysis of ChIP-seq signal at these motifs. As expected, robust ChIP-seq signal was found centered around these motifs for both proteins (Fig. 2, h and i). Thus, targeting of ZNF146 and ZNF507 to L1 elements appears to be genuine and occurs at opposite ends of the ORF2 coding region.

ZNF507 preferentially binds at young LINE-1 sequences
While ChIP-seq signal enrichment was observed for ZNF146 at nearly all mappable instances of its predicted binding sites (Fig. 2h), ZNF507 ChIP-seq signal was enriched at a fraction of its motif sites, suggesting that ZNF507 may only bind to a subset of L1 elements. LINE-1 sequences in the genome have been classified based on diagnostic sequence variants. Comparative genomics and expected sequence divergence over time have further allowed for approximate aging of L1 subfamilies containing these shared sequence variants from youngest (L1PA1, human specific) to oldest (L1PA17) primate-specific LINE-1 elements (Smit et al. 1995;Khan et al. 2006;Giordano et al. 2007;Konkel et al. 2010).
To determine whether ZNF146 or ZNF507 might differentially recognize L1 subfamilies we binned motif instances in LINE-1 elements by subfamily, and performed meta-analysis of ChIP-seq signals around these motifs. As expected from unbinned metaanalysis (Fig. 2h), ZNF146 ChIP-seq enrichment was observed at nearly all predicted mappable binding sites in all subfamilies (Fig. 3a, upper panels). Because the youngest subfamilies (L1PA1 and L1PA2) are less uniquely mappable, binding to those families could not be reliably determined. We note, however, that those families contain the same conserved binding motif and enrichment can be observed at young, intact elements when including multimapping reads (Fig. 2c), suggesting ZNF146 likely binds at these sequences as well.
In contrast, we observed strong enrichment of ZNF507 only in young subfamilies, particularly in L1PA3-L1PA8 (Fig. 3a, lower  panels). Again, while binding at L1PA1 and L1PA2 sequences could not be definitively determined due to mappability, binding of ZNF507 was observed at intact L1 sequences when allowing for multimapping (Fig. 2c), suggesting ZNF507 likely binds at these elements as well. Enrichment was essentially absent or rarely observed for ZNF507 in older subfamilies, particularly PA12-PA17. L1 classifications are determined largely by 3' UTR sequence variations (as above) but are sometimes classified using upstream ORF2 sequences (L1P1, youngest, to L1P4, oldest). ZNF507 enrichment was clearly observed at motifs in L1P1, L1P2, and L1P3 subfamilies, but essentially absent at motifs in L1P4 elements (Fig. 3b). Interestingly the L1P4 consensus sequence deviates from younger subfamilies within the core motif we identified as well as 2 other nearby residues, perhaps explaining the lack of binding observed for ZNF507 at these elements (Fig. 3c).
Zinc finger proteins and TEs are in some cases thought to have co-evolved in an "evolutionary arms race" of natural selection (Jacobs et al. 2014;Cosby et al. 2019). Since ZNF507 binding was observed only at young L1 subfamilies, we wondered if this specificity change could be related to amino acid substitutions in ZNF507 zinc fingers. Interestingly, multiple sequence alignment of ZNF507 ZFs revealed 3 separate residues in ZNF507 ZFs that are highly conserved among higher primates (apes and monkeys) but were not found in lower primates and other mammals (Fig. 4a). The timing of these substitutions appeared to closely parallel changes in L1 sequence and differences in ZNF507 binding (compare timeline in Fig. 4b to Fig. 3, a-c), collectively suggesting a relationship between the amino acid substitutions in ZNF507 ZFs and L1 targeting specificity.

Impact of ZNF146 and ZNF507 depletion on a somatic cell transcriptome
The above results demonstrate ZNF146 and ZNF507 target thousands of L1 sequences in the genome. Binding at these regions could potentially influence L1 or endogenous gene expression. We therefore disrupted ZNF146 and ZNF507 expression in human embryonic kidney cells (HEK293) using CRISPR/Cas9 to determine what impact depleting these proteins can have on the transcriptome of a somatic cell line.
Cells were transiently transfected with Cas9 and sgRNA expressing plasmids targeting either gene or a nonempty vector control (Ran et al. 2013). After transient selection and outgrowth, whole-cell extracts were analyzed by western blot to confirm target gene disruption. Figure 5a shows ZNF146 and ZNF507 were essentially undetectable in Cas9-targeted cells (Fig. 5a), which had no obvious growth or morphology defects. We then isolated RNA and performed transcriptomic analysis by paired-end RNA sequencing following ribosome depletion.
First, we confirmed Cas9 targeting by aligning RNA-seq reads to ZNF146 and ZNF507 cDNA reference sequences using Tophat2 and BWA aligners to rescue unmapped reads as previously described (Deininger et al. 2017). We then compared the coverage ratio of reads marked as having insertions or deletions (indels) during alignment vs. reads without indels. For both ZNF146 and ZNF507 indels were frequently detected at sgRNA target sites that were not observed in control cells, consistent with efficient disruption and protein depletion observed by western blot (Fig. 5b).
We then assessed whether we could detect an increase in RNA originating from full-length L1 elements. Full-length and intact L1 elements are known to be heavily repressed in most cell lines. Because of this and their repetitive nature, the vast majority of full-length elements were not detected using uniquely mapping RNA-seq reads (see histogram of counts at or near zero in Fig. 5c). For those that were detectable we did not observe any that were differentially expressed (P < 0.05) in cell populations in which ZNF146 or ZNF507 were disrupted (Fig. 5c).
Although intact LINE-1 elements are themselves highly unmappable by unique reads, their expression can reliably be detected in RNA-seq experiments by quantifying reads mapping to immediately downstream regions which become expressed as a result of read-through transcription (Philippe et al. 2016). Similar to what we observed within L1 elements, few RNA-seq reads mapped downstream of intact L1 elements, consistent with their general repression (Fig. 5d). For those downstream regions for which we did observe low-level expression, transcripts from these regions were not differentially expressed when ZNF146 or ZNF507 were disrupted. Although we cannot rule out increased levels of rare or stochastic L1 activation, these results indicate L1 elements are not broadly activated in response to ZNF146 and ZNF507 depletion in HEK293 cells.
ZNF146 and ZNF507 ChIP-seq peaks are not restricted to intergenic regions as we also observed many peaks for both proteins in regulatory and genic regions including at annotated enhancers and promoters (Fig. 6a). In addition, a large numbers of peaks (>50%) for both ZNF146 and ZNF507 were located in introns.
Targeting to these regulatory and genic regions could potentially influence endogenous gene expression.
We therefore assessed what impact, if any, loss of ZNF146 or ZNF507 had on gene expression. Remarkably, despite observed binding peaks at over 40,000 sites in ChIP-seq data, we did not observe differential expression of any protein-coding or lncRNA (P < 0.05) in cells depleted of ZNF146 in HEK293 cells (Fig. 6b). Interestingly, however, ZNF507 disruption resulted in the differential expression of 30 protein-coding (24 up and 6 downregulated) and 2 lncRNA (1 up and 1 downregulated) transcripts, indicating depletion of ZNF507 influenced nonretroviral gene expression (Fig. 6c, Table 1). Two differentially expressed genes had ZNF507 ChIP-seq peaks within 10 kb of their transcription start sites and twelve genes had peaks located within 100 kb, a common distance cutoff for cis regulatory elements (Vijayabaskar (c) Fig. 3. ZNF507 does not bind at old L1PA elements. a) Heatmap of ChIP-seq enrichment centered around occurrences of identified motifs (FIMO, P < 5e-6) 6 1 Kb after intersection and binning by LINE-1 subfamily. Below the ChIP-seq heatmaps (blue) is the theoretical mappability of 100 bp reads for the same regions (grayscale). b) Heatmaps as above but for LINE-1 subfamilies binned by their ORF2-derived classification. Relationship between the ORF2 and 3'UTR classifications is given in parentheses. c) Multiple sequence alignment of L1 subfamily consensus sequences. Positions are derived from the L1PA1 consensus sequence. Position of the ZNF507 ChIP seq binding motif identified by MEME is indicated above the alignment.

Identification of a physical interaction between ZNF507 and PRMT5
Transcription factors frequently regulate gene expression by recruiting chromatin modifying complexes to regulatory sequences. Hence, we wondered whether ZNF507 might associate with a known chromatin modifying enzyme. To test this idea, we expressed 3xFLAG-tagged ZNF507 in HEK293 cells by transient transfection (Fig. 7a) and performed co-immunoprecipitation followed by unbiased interaction discovery by liquid chromatography with tandem mass spectrometry (LC-MS/MS). FLAG-tagged ZNF507 localized as expected to the nucleus in HEK293 cells (Fig. 7b). Immunoprecipitation (IP) was performed on cells expressing 3xFLAG-ZNF507 as well as an empty vector control using anti-FLAG beads. Proteins were eluted with 3xFLAG peptide and partially resolved by SDS-PAGE prior to proteomic analysis by LC-MS/MS. As expected, bait protein, ZNF507, had the highest peptide count in the 3xFLAG-ZNF507 IP with far more peptides in this sample compared to empty vector control ( Table 2). The second most abundant protein (by peptide count) was protein arginine methyltransferase 5 (PRMT5), again with far more peptides detected in the 3xFLAG-ZNF507 IP than for the EV control. Interestingly, PRMT5 is a known transcriptional and posttranscriptional regulator of gene expression with various protein targets including core histones (Dacwag et al. 2007;Lacroix et al. 2008;Bedford and Clarke 2009;Tee et al. 2010;LeBlanc et al. 2016). These results suggest ZNF507 physically associates with PRMT5 and may recruit arginine methylation to L1 sequences to regulate L1 or endogenous gene expression (see model, Fig. 7c).

Discussion
About half of the human genome is comprised of repeat sequences that may serve underexplored roles in coordinated genome regulation. Unearthing proteins that target repetitive sequences may therefore serve as a launching pad for greater understanding of such processes. Over the past decade thousands of next generation data sets have been produced and made publicly available, most of which were processed without repeat sequences in mind.

GFP-ZNF507
(K562) ChIP-seq peaks Fig. 6. Disruption of ZNF507 leads to modest changes in gene expression. a) Pie chart of ChIP-seq peak intersection with genomic features. Promoters and enhancers are PHAMTOM-annotated. Genic features are RefSeq-annotated. b, c) Comparison of RNA-sequencing depth in fragments per kilobase per million mapped reads (FPKM) mapping to protein-coding transcripts (top) or lncRNAs (bottom). ZNF146 (left) or ZNF507 (right) disrupted cells are compared to EV control. Dotted gray lines mark 1.5-fold change. Transcripts which were differentially expressed (P < 0.05) are colored blue.
Taking a repeat-centric approach to mining these data sets may facilitate better understanding of the repeat genome and its influence on the biology of higher organisms.
There are many potential approaches to studying repeat enrichment in next generation sequencing data sets (Teissandier et al. 2019). Here, we took a simple peak intersection approach and analyzed publish ENCODE data sets to identify proteins specifically enriched on repetitive transposon-derived sequences. We identified over 200 TF ChIP-seq peak sets enriched on specific transposon sequences more than would be expected by random chance. Strikingly, many DNA-binding proteins appear to specifically target ERV-K transposons, for reasons we cannot readily explain. We then focused on a prominent enrichment of 2 putative L1-targeting proteins, ZNF146/OZF and ZNF507, and characterized their relationship to L1 sequences.
Other groups have previously taken similar approaches to identify proteins enriched on repetitive elements in ChIP-seq data sets, either using related peak intersection approaches or taking additional steps to extract information from reads mapping to multiple places in the genome (Criscione et al. 2014;Jacobs et al. 2014;Schmitges et al. 2016;Barazandeh et al. 2018;Soraya Shehata 2020). Schmitges et al. used a similar approach to that taken here to calculate the percentage of peaks for ENCODE ChIP-seq data sets with TEs and independently identified ZNF146 as being enriched on L1 elements. During the preparation of this manuscript, ZNF507 was also recently identified by Shehata et al. as binding at L1 elements while performing large-scale analysis of ChIP-seq data sets.
In comparison to some other studies, particularly Barazandeh et al. and Schmitges et al. referenced above using similar data sets, relatively few DNA binding proteins were found to specifically recognize L1 repeats. These studies took a slightly different approach, using simple peak intersection frequencies rather than enrichment over shuffled background models to identify interactors. This difference likely caused our approach to under identify genuine TE-targeting proteins and highlight only those with highfrequency binding in the genome. Less frequent binders (for instance those that bind in the often-truncated 5'-end of L1 elements) would likely be missed by the approach taken here as these peak interactions would fail to enrich over background shuffling models when compared to high-abundance and highly fragmented elements such as L1.
Here, we focus on 2 such proteins, ZNF146 and ZNF507 targeting to L1 sequences and find these proteins target different regions of the ORF2-encoding region of L1 DNA sequences in the human genome. ZNF146 targets both "old" and "young" L1 near the extreme 3' end of ORF2. In contrast, ZNF507 was found to bind the 5' end of L1 ORF2 and was only observed at younger elements. Thus, our findings add to the knowledge base of proteins that specifically recognize L1 elements in the genome including other L1-targeting proteins shown to differentiate between L1 subfamilies (Jacobs et al. 2014). We show that changes in the amino acid composition of ZNF507 zinc fingers during primate evolution paralleled changes to L1 DNA sequence, indicating ZNF507 and L1 sequence may have co-evolved during this time period or perhaps that ZNF507 may have recently acquired affinity for L1 repeats.
Surprisingly little is known about the functions of either ZNF146/OZF or ZNF507 despite being widely expressed in human tissue (Uhlen et al. 2015(Uhlen et al. , 2017. Several years ago ZNF146 was To account for additional peaks that would likely not be detected due to mapability, distances were also calculated for FIMO-predicted ZNF507 binding motifs in LINE-1 PA1, PA2, PA3, and PA4 subfamilies. When this distance was smaller it is indicated in parentheses. shown to be overexpressed in various cancers, though the contribution or molecular mechanisms related to ZNF146/OZF in oncogenic transformation remains opaque (Ferbus et al. 1999(Ferbus et al. , 2003Antoine et al. 2005b;Ma et al. 2020;Zhu et al. 2021). ZNF146 was shown to associate with telomeric protein Rap1 and thus may have a role in telomere maintenance (Antoine et al. 2005a). Recently, ZNF146 was additionally shown to associate with replisome subunits and influence replication fork progression under conditions of DNA replication stress (Feu et al. 2020). Future studies are needed to determine how LINE-1 targeting and these potential roles in regulating genomic integrity at telomeres and during replication are related.
Although we did not observe broad upregulation of L1 elements or identify any gene expression changes in HEK293 cells lacking ZNF146, it remains possible that ZNF146 has important roles for regulating transcription or epigenetic modifications in heterochromatic regions, which are enriched for L1 sequences. If so, their contribution to regulation may be obscured by DNA methylation or other redundant silencing mechanisms, analogous to the maintenance of inactive-X heterochromatin despite removal of XIST RNA (Csankovszki et al. 1999). Similarly, ZNF146 could have a more essential role in specific developmental states, as it was recently identified as a potential regulator of the naïve primate pluripotency network (Boroviak et al. 2018). Since LINE-1 elements are known to be activated and regulate chromatin dynamics during early embryogenesis (Fadloun et al. 2013;Jachowicz et al. 2017), it's tempting to speculate ZNF146/OZF is either involved in this activation or important for subsequent repression of LINE-1 elements post-activation. Future studies may address these potential connections between ZNF146, LINE-1 elements, pluripotency, and early development.
ZNF507 has been identified in genomic studies as being a potential risk locus for neural developmental disorders (Talkowski et 7. Identification of an interaction between ZNF507 and PMRT5. a) Schematic of approach to identify ZNF507-interacting proteins. HEK293 cells were transiently transfected with an 3xFLAG tagged ZNF507 expression plasmid or an empty vector control. After 48 h proteins were extracted and subjected to immunoprecipitation using anti-FLAG beads. Proteins were eluted by competition with 3xFLAG peptide before limited short gel electrophoresis and analysis by mass spectrometry (LC-MS/MS). b) Immunofluorescence analysis of HEK293 cells expressing 3xFLAG-ZNF507. All interphase cells observed expressing 3xFLAG-ZNF507 had nuclear localization. No appreciable signal was observed for cells transfected with empty vector 3xFLAG control plasmid. c) Model for LINE-1 binding by ZNF146 and ZNF507 and potential functions. ZNF146 and ZNF507 bind at thousands of sites in the genome as a result of LINE-1 transposition. Binding by either protein may then recruit various activities to these regions, such as the potential for ZNF507 to recruit protein arginine methylation activity through interaction with PRMT5. These activities may then contribute to context or cell-type dependent regulation of LINE-1 elements or nearby genes. Schematics created with BioRender.com.
potential relationship, to our knowledge ZNF507 had not been the subject of directed investigation. Our experiments here demonstrate ZNF507 targets L1 "young" elements and indicate ZNF507 depletion influences gene expression, potentially through interaction with PRMT5. Our transcriptomic analysis was performed in an embryonic kidney cell line (HEK293), which may not be the most relevant cell type to study ZNF507 function. Future experiments depleting ZNF507 in relevant neural cell types or neural organoids are needed to determine whether ZNF507 acts as a transcriptional regulator in these contexts. Identifying these targets could be key in determining what impact, if any, ZNF507 has on various brain disorders. PRMT5 was recently shown to be a critical repressor of LINE-1 elements in primordial germ cells and early embryonic development (Kim et al. 2014). In this context, PRMT5-mediated methylation of histones is required to repress L1 expression during this unusual period of global reduction in DNA methylation. To our knowledge, how PRMT5 is targeted to LINE-1 sequences is largely unknown. Here, we identify a potential interaction between PRMT5 and ZNF507. Although this relationship requires further validation, results here indicate ZNF507's primary function may be to recruit PRMT5-mediated arginine methyltransferase activity to chromatin in order to regulate L1 or endogenous gene expression during development.

Data availability
Strains and plasmids are available upon request. Gene expression data are available at GEO with the accession number: GSE172285.
Supplementary material is available at G3 online.
Conditional deletion of Xist disrupts histone macroH2A Results represent the most abundant proteins (by number of peptides observed) after LC-MS/MS analysis of immunoprecipitations from HEK293 cells transfected with either 3xFLAG-ZNF507 or EV control expression plasmids as indicated. Common contaminants: keratins, heat shock proteins, tubulins, DSPN, FLNA, TBB5, T complex were excluded from this table.