ONECUT transcription factors induce neuronal characteristics and remodel chromatin accessibility

Abstract Remodeling of chromatin accessibility is necessary for successful reprogramming of fibroblasts to neurons. However, it is still not fully known which transcription factors can induce a neuronal chromatin accessibility profile when overexpressed in fibroblasts. To identify such transcription factors, we used ATAC-sequencing to generate differential chromatin accessibility profiles between human fibroblasts and iNeurons, an in vitro neuronal model system obtained by overexpression of Neurog2 in induced pluripotent stem cells (iPSCs). We found that the ONECUT transcription factor sequence motif was strongly associated with differential chromatin accessibility between iNeurons and fibroblasts. All three ONECUT transcription factors associated with this motif (ONECUT1, ONECUT2 and ONECUT3) induced a neuron-like morphology and expression of neuronal genes within two days of overexpression in fibroblasts. We observed widespread remodeling of chromatin accessibility; in particular, we found that chromatin regions that contain the ONECUT motif were in- or lowly accessible in fibroblasts and became accessible after the overexpression of ONECUT1, ONECUT2 or ONECUT3. There was substantial overlap with iNeurons, still, many regions that gained accessibility following ONECUT overexpression were not accessible in iNeurons. Our study highlights both the potential and challenges of ONECUT-based direct neuronal reprogramming.

. Images used for staining quantification All images used for the staining quantification. Per condition 3 coverslips were quantified, with 4-5 images per condition. On the images, TUBB3 is in red, ONECUT is in green and dapi is in blue. All images were taken with the same exposure times.    Figure S11. GO-terms associated with downregulated genes and overlap with expressed GTEX genes. a. Quantile normalized log2(FPKM+1) expression for genes associated with respectively the GO-terms 'axon development', 'trans-synaptic signaling' and 'cell morphogenesis involved in neuron differentiation'. The top five genes with the highest fold change are annotated for both upregulated (green) and downregulated genes (red). b. GO-terms associated with differentially downregulated genes. Similar GO-terms were discarded based of information content in the GO-term graph (see Methods). For the five most significant GO-terms associated with respectively OC1-down, OC2-down, OC3-down and iN-Fib-down, the odds ratios are plotted for all differentially expressed gene sets (Fig. 6b). c. Quantile normalized log2(FPKM+1) gene expression distribution for GTEX samples. The blue line indicates the threshold above which genes are expressed (panel d) and the red line the threshold above which genes are highly expressed (Fig. 6h). Figure S12. Enrichment of differential CRs near differential genes in the same condition a. Enrichment within 1 kb of transcription start sites (TSSs). The figure shows distributions for the number of differential CRs (OC1/2/3-up or OC1/2/3-down) within 1 kb of the TSS of 1000 genes randomly sampled 5000 times. In each subpanel, distributions are plotted for three different gene sets from which the 1000 genes are sampled: 1) All GENCODE v.27 genes, 2) OC-1/2/3-up genes and 3) OC1/2/3-down genes. b. Enrichment within 10 kb of transcription start sites (TSSs). Identical to panel a, but with a 10 kb distance. c. Enrichment within 100 kb of transcription start sites (TSSs). Identical to panel a, but with a 100 kb distance.

Cell culture
The human skin fibroblast lines were obtained from three anonymized healthy individuals and were stored at the Radboud University Medical Center Human Genetics biobank (Table   S1). The use of the fibroblast lines was in accordance with the regulations of the Ethical

Lentivirus production
In this study, multiple lentiviral transfer vectors were used (Table S2). In addition to the transfer vectors, we used two lentiviral packaging vectors for lentivirus production, psPAX2 (Addgene #12260) and pMD2.G (Addgene #12259).Lentivirus was produced by cotransfecting HEK293T cells with psPAX2, pMD2.G and the transfer vector using calcium phosphate co-precipitation. The HEK293T were incubated at 37 °C and 5% CO2 for 5-8 hours, after which the medium was replaced. 48 hours later, supernatant of the spent culture medium was collected. This supernatant was either first concentrated using an ultracentrifuge or directly stored at -80°C.

iNeuron differentiation
iNeuron differentiation was performed as described previously (3). Briefly, rtTA/Neurog2positive iPSC lines were differentiated to iNeurons via doxycyclin-dependent Neurog2 overexpression over a period of three weeks (4). On day 21 after induction, cells were isolated for ATAC-seq and RNA-seq.

Validation experiments
The validation experiments consisted of overexpressing OC1/2/3 in human adult skin fibroblasts and were performed as follows. On day -2 (two days before induction of OC1/2/3 expression), cultured fibroblasts were detached using 0.25% trypsin, counted and resuspended in fibroblast medium. 20.000 fibroblasts were plated in 1ml fibroblast medium in each well of a twelve wells plate (Corning).
On day -1, cells were transduced with either only the Bclxl, OC1, OC2 or OC3 vector or the Bclxl vector in combination with the OC1, OC2 or OC3 vector. Transduction was performed in fresh fibroblast medium in the presence of 8ug/ml polybrene (Sigma-Aldrich). On day 0, 2 and 4, the medium was refreshed for medium containing 2ug/ml doxycycline (Sigma-Aldrich) to induce expression of the OC1/2/3 transgene. This medium also contained 2ug/ml puromycin (Sigma-Aldrich) for the Bclxl conditions and 8ug/ml blasticidin (Sigma-Aldrich) for the OC1/2/3 conditions. Morphology was assessed on day 0-5 using brightfield microscopy. One representative picture was taken for each well, with (at least) triplicates for each condition.
For ATAC-seq and RNA-seq, cells were isolated on day 2. We should note that we stained for the overexpressed OC1/2/3 using a polyclonal antibody raised against a ONECUT3 epitope highly conserved in ONECUT1 (89% of a.a.

Multiplicity of infection
sequence identical to epitope) and ONECUT2 (also 89% of a.a. sequence identical to epitope).
The affinity of the antibody might not be the same for ONECUT1, ONECUT2 and ONECUT3.
The staining intensities for OC1+Bclxl, OC2+Bclxl and OC3+Bclxl should therefore not be compared with each other, but only with the Bclxl condition.

ATAC-sequencing
iPSCs were harvested using accutase (Sigma-Aldrich) and fibroblasts using 0.25% trypsin (Gibco). Approximately 50,000 cells were collected and washed once with ice-cold PBS. Cells were centrifuged at 300 g and the supernatant was removed. iNeurons were not harvested, μL proteinase K (10 mg/mL) and 2 μL 5 % SDS. DNA was purified using normal-phase 2x AMPure bead purification (Beckman Coulter). The purified DNA was PCR amplified in 8 PCR cycles, followed by reverse-phase 0.55x AMPure bead purification (Beckman Coulter) and QIAquick column purification (Qiagen). The size-selected, purified PCR product was PCR amplified for another 8 PCR cycles, followed by QIAquick column purification (Qiagen). The fragment length distribution was determined using TapeStation (Agilent) and library concentration was quantified using KAPA library quantification kit (KAPA Biosystems).

RNA-sequencing
RNA was isolated with the RNeasy Mini kit (Qiagen) according to the manufacturer's instructions. The RNA isolation included on-column DNA digestion. Before the RNA isolation, iPSCs and fibroblasts were lysed after harvesting the cells, whereas the iNeurons were lysed in situ (as described for ATAC-seq). RNA-seq library preparation was performed with the SMARTer Stranded Total RNA Sample Prep Kit (low input mammalian) (Clontech), according to the manufacturer's instructions. 50 ng of total RNA was depleted for rRNA using the RiboGone method included in the kit. The resulting rRNA-depleted RNA input was less than 10 ng and we followed the additional steps in the protocol of the manufacturer for this amount of input material. 12 amplification cycles were used for the PCR. The fragment length distribution was determined using TapeStation (Agilent) and the last bead size selection of the protocol (with 1.0x beads) was repeated once more in case the libraries contained PCR product with sizes smaller than 100 bp. The library concentration was quantified using KAPA library quantification kit (KAPA Biosystems). Sequencing was performed on an Illumina NextSeq 500 using HighOutput kit v2 for 75 cycles (paired-end 2x43 bp).

Processing of sequencing data
ATAC-seq reads were aligned to a combined human-rat reference genome (respectively hg19 and rn6) using BWA sampe (7) with default parameters. All rn6-mapping reads and all h19mapping reads with a mapping quality ≤ 40 were removed from the bam file (Number of mapped reads in Table S3). Peak calling was performed on the bam files using the MACS2 algorithm (8) (parameters: -f BAMPE -g hs --nomodel -q 0.001). Note that for the peak calling the two replicates of each cell line were taken together. Union CR sets were defined by combining all the peaks called for the different ATAC-seq samples and merging overlapping peaks to one peak. Quantification of the read counts on the union CR sets was performed using the R package GenomicAlignments (9) function summarizeOverlaps (parameters: mode = "Union", ignore.strand = TRUE, inter.feature = TRUE, singleEnd = FALSE, fragments = FALSE).
For RNA-seq, the first three bases of all forward reads were removed using the trimmomatic software (10) (parameters: SE HEADCROP:3), as recommended by the manufacturer of the SMARTer Stranded RNA Sample Prep Kit. Reads were mapped to the combined hg19-rn6 reference genome using STAR (11) with default parameters. All rn6mapping reads and all h19-mapping reads with a mapping quality other than 255 were removed from the bam file (Number of mapped reads in Table S3). htseq-count (12) was used for quantifying the gene expression (parameters: --order=pos --stranded=yes --mode=union --type=exon --idattr=gene_id). For this, hg19 transcript definitions from GENCODE 27 (13) were used.
The resulting count tables for the ATAC-seq and RNA-seq (containing respectively the number of fragments per CR and gene) were used directly for differential analysis with

Union CR sets
We used three different union CR sets: 1) iNeuron-Fibroblast union: A reduced union CR set made by combining the iNeuron and fibroblast MACS2 peaksets and merging overlapping peaks to one peak.
3) DNase union: A union set of CRs with DNase I accessibility in at least one of 127 3) Comparing the DNase union CR set with the OC1-up, OC2-up and OC3-up CR sets to identify enriched motifs in the different CR sets. The OC1-up, OC2-up and OC3-up CR sets were grouped into three different sets: CRs with a ONECUT motif score > 7.5, CRs with a ONECUT motif score < 7.5 and CRs with a ONECUT motif score < 2.5. We used the following parameters: gimme maelstrom input.txt hg19 output.
The motifs used in this study are supplied by GimmeMotifs and are motifs formed by clustering motifs in the cis-bp database (1) on similarity.

Differential chromatin accessibility and gene expression analysis
Both differential chromatin accessibility and differential gene expression were determined using DESeq2 (14). As input we used count values from the ATAC-seq/RNA-seq for CRs/genes.
In all differential analyses with DESeq2, we used the design: design ~ cell line + cell type, to take into account both cell line and cell type. We considered CRs/genes differentially upregulated if the Benjamini-Hochberg-adjusted p-value < 0.01 and the log2(fold change) >
Using these criteria, we defined the following sets of differential information content in the GO-term graph. We used the GOstats function geneIdUniverse to obtain the genes (EnsemblIDs) annotated with each GO-term.

Regression analysis
We We used a and counted the number of CRs/genes in the different intersections. These numbers were plotted as a Venn diagram using the http://eulerr.co/ web application (17). Areas are scaled (close to) proportional to the number of CRs.
Jaccard indices were determined using the following formula: Overlap percentages were determined using the following formula:

Fluff plots
We used the command line tool fluff (18)

ONECUT-motif-OCRs comparison with ENCODE/ROADMAP
First, we adjusted the size of each CR in the ATAC union set and DNase union set (center CR±100bp) to prevent any bias due to CR size. We used GimmeMotifs (no background) to determine the ONECUT motif score for each CR in the ATAC union set and DNase union set and assigned the CRs with a motif score > 7.5 as ONECUT-motif-CRs (FPR = 0.01, determined using the command "gimme background -gc").
We  Using the GTEX annotation file 'GTEx_Data_V6_Annotations_SampleAttributesDS.txt' (downloaded from https://www.gtexportal.org/home/datasets), we calculated a mean odds ratio and standard deviation for each tissue type within the 8555 GTEX samples.

Enrichment induced CRs near induced genes
For each of OC1, OC2 and OC3, we performed the following analysis.
Three, we plotted the different counts for the 5000 permutations as a densityplot, always comparing the different gene sets in one plot.

Histone modifications and motifs associated with sets of differentially accessible CRs
We determined the coverage on the DNase union set for 25 different histone modifications, each one assayed in up to five ENCODE/ROADMAP fibroblast samples (E017, E055, E056, E126 and E128; Table S7). In addition, we called motif scores for 580 clustered TF motifs based on the cis-bp database (1) using GimmeMotifs (15). Both the histone modification coverage and motif scores were quantile normalized to a normal distribution with mean 0 and standard deviation 1.
We used bedtools (parameters: intersect -a -b -c) to overlap each differentially more accessible CR set (OC1-up, OC2-up, OC3-up, iN-Fib-up) with the DNase union CR set. We intersected the iN-Fib-up and OC1/2/3-up CR sets to get both the CRs shared and specific for the different CR sets. We used a threshold requiring the ONECUT motif score > 7.5 to select the ONECUT-motif-CRs (FPR=0.01, based on the background set) within the DNase union set.