Mechanism of REST/NRSF regulation of clustered protocadherin α genes

Abstract Repressor element-1 silencing transcription factor (REST) or neuron-restrictive silencer factor (NRSF) is a zinc-finger (ZF) containing transcriptional repressor that recognizes thousands of neuron-restrictive silencer elements (NRSEs) in mammalian genomes. How REST/NRSF regulates gene expression remains incompletely understood. Here, we investigate the binding pattern and regulation mechanism of REST/NRSF in the clustered protocadherin (PCDH) genes. We find that REST/NRSF directionally forms base-specific interactions with NRSEs via tandem ZFs in an anti-parallel manner but with striking conformational changes. In addition, REST/NRSF recruitment to the HS5–1 enhancer leads to the decrease of long-range enhancer-promoter interactions and downregulation of the clustered PCDHα genes. Thus, REST/NRSF represses PCDHα gene expression through directional binding to a repertoire of NRSEs within the distal enhancer and variable target genes.

REST/NRSF contains a central DNA-binding domain with eight tandem C2H2 ZFs and two repressor domains residing in the amino and carboxyl termini, respectively (Figure 1A) (2,3,15). REST/NRSF has been shown to bind to thousands of NRSEs, which can be divided into three groups: canonical, noncanonical, and half-site only motifs (16,17). Intriguingly, canonical and noncanonical NRSEs contain very different gap sizes between the left- and right-half sites. ZF domains are small DNA-recognition units that are usually organized in tandem, and there are >800 genes encoding ZF transcription factors in the human genome (18-21). The DNA-recognition mechanism by which these ZF transcription factors bind to distinct groups of a vast number of genomic sites via conformational switching of ZF domains is largely unknown.
[Figure 1 caption, panels E-G] EMSA experiments using REST/NRSF with a site 'a' NRSE probe of each member of the alternate PCDHα (E), a site 'b' NRSE probe of each PCDHγa (F), or a site 'c' NRSE probe of each member of the alternate PCDHγ (G), using mock as a control (Ctr). Supershifted bands were detected with a specific antibody (Ab) against the human MYC tag fused to the C-terminus of REST/NRSF.
The clustered protocadherin (PCDH) genes encode a large number of cell-surface cadherin-like adhesion proteins which are thought to function as neuron identity codes in brain wiring and neuron discrimination (22-25). There are 53 highly similar clustered PCDH genes organized into three closely linked gene clusters (PCDHα, PCDHβ and PCDHγ) in the human 5q31 chromosomal region (Figure 1B) (22). The genomic organizations of PCDHα and PCDHγ are similar in that each contains more than a dozen variable exons that can be spliced to a single set of three downstream constant exons. The variable exons can be grouped into alternate and C-type exons based on their locations and similarities (Figure 1B) (22). The PCDHβ cluster contains only variable exons and no constant exon (22). Each variable exon has its own promoter (26,27). Two super-enhancers, one composed of HS7 (DNase I hypersensitive site 7) and HS5-1, the other composed of HS7L (HS7-like), HS5-1L (HS5-1-like) and HS18-22, regulate the expression of the PCDH α and βγ clusters, respectively (Figure 1B) (25,28-33). CTCF (CCCTC-binding factor) is a master regulator of clustered PCDH genes (25,31,33-38). There are tandem arrays of forward CTCF-binding sites (CBSs) associated with the PCDH promoters and of reverse CBSs associated with the super-enhancers (Figure 1B) (31,33). Through CTCF/cohesin-mediated topological chromatin interactions between enhancers and promoters, clustered PCDH genes are stochastically and unbiasedly expressed (33,38).
By contrast, REST/NRSF has been shown to repress expression of the clustered PCDH genes (29,39). However, the mechanism by which REST/NRSF represses these clustered genes remains unknown.
Here, we identified NRSEs in every alternate exon of the human PCDH α and γ clusters, as well as in each C-type gene. By systematic EMSA (electrophoretic mobility shift assay) experiments, in conjunction with computational molecular dynamics, we found that REST/NRSF recognizes NRSEs via tandem ZF domains in a directional and flexible manner. Moreover, through genetic experiments we found that REST/NRSF inhibits long-distance chromatin contacts between the PCDHα HS5-1 enhancer and its target promoters by preventing CTCF binding. Thus, REST/NRSF regulates the neural-specific expression of the clustered PCDHα genes by modifying chromatin structures.

Animals
C57BL/6 and ICR mouse strains were housed at 23 °C on a 12/12 h light-dark cycle (7:00-19:00) in an SPF (specific pathogen free) facility. The zygotes were obtained from the oviducts of superovulated female C57BL/6 mice mated with C57BL/6 stud male mice. All experiments were carried out in accordance with the protocol approved by the Institutional Animal Care and Use Committee of Shanghai Jiao Tong University (protocol#: 1602029).

Plasmid construction
To generate plasmids for in-vitro expression of REST/NRSF, the coding sequences of REST/NRSF ZF1-8 were amplified by PCR using cDNA as templates and cloned into the pTNT vector (Promega) between the EcoRI and XbaI sites, with a MYC tag at the 3′ terminus. A series of ZF-deleted and ZF-mutated REST/NRSF-expressing plasmids were constructed by PCR using the ZF1-8 encoding region as templates and ligated into the pTNT vector between the EcoRI and XbaI sites, also with a MYC tag at the 3′ terminus.
To prepare plasmids for generating probes used in EMSA experiments, the sequences containing putative NRSE motifs were amplified by PCR from the human genomic DNA and then subcloned into the pGEM-T Easy vector (Promega). To prepare plasmids for generating the mutated probes, the mutated sequences were constructed by PCR from the wild-type plasmids and then ligated into the pGEM-T Easy vector.
The plasmids for REST/NRSF knockdown and for GFP control were constructed by ligating annealed primer pairs into the pLKO.1 vector (Addgene) between the EcoRI and AgeI sites. Plasmids for sgRNA expression were constructed by inserting annealed primer pairs into a BsaI-linearized pGL3 vector under the control of the U6 promoter. All constructs were confirmed by Sanger sequencing and all primers are shown in Supplementary Table S1.

Lentivirus packaging and infection
The pLKO.1 plasmids for REST/NRSF knockdown were co-transfected into HEK293T cells with the psPAX2 and pMD2.G helper plasmids (Addgene) using Lipofectamine 2000 (Life Technologies) to produce lentiviral particles. HEC-1-B cells and HEK293T cells at 70-90% confluency were infected with the virus in the presence of 8 μg/ml polybrene (Sigma). Puromycin (Sigma) was added at a final concentration of 2 μg/ml to select the infected cells. The medium was replaced with fresh puromycin-containing medium every other day. Cells were collected for assays at day 5 post infection. After the cells were harvested by lysis with RIPA lysis buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% Triton X-100, 1% sodium deoxycholate, 0.1% SDS, 1 mM PMSF), the expression levels of REST/NRSF were measured by western blot.

Western blot
Proteins were denatured and separated by SDS-PAGE and transferred to nitrocellulose membranes. These membranes were then incubated with mouse anti-MYC (Millipore), rabbit anti-REST/NRSF (Millipore), or rabbit anti-β-actin antibody (Abcam). Finally, the membranes were incubated with anti-mouse or anti-rabbit secondary antibodies and scanned using the Odyssey System (LI-COR Biosciences).

Electrophoretic mobility shift assays (EMSA)
EMSA experiments were performed as described (31,36) with some modifications. Proteins used for EMSA experiments were synthesized in vitro from pTNT plasmids using the TNT T7 Quick Coupled Transcription/Translation System (Promega) according to the manufacturer's protocol. Briefly, ZF-deleted or ZF-mutated pTNT plasmids were mixed gently with TNT T7 Quick Master Mix, methionine, and T7 TNT PCR Enhancer by pipetting, and then incubated at 30 °C for 60-90 min.
Probes were generated by PCR with high-fidelity polymerase using 5′ biotin-labeled primers from template-containing plasmids and gel-purified. The primers used are listed in Supplementary Table S1. Probe concentration was measured with NanoDrop (Thermo). Each binding reaction contained equimolar amounts of the biotin-labeled probes. Protein concentration was determined by western blot and each binding reaction contained the same amount of protein.
EMSA was performed with the LightShift Chemiluminescent EMSA Kit (Thermo) according to the manufacturer's manual. Briefly, the in-vitro-synthesized proteins were precleared with the binding buffer containing 10 mM Tris-HCl, 250 mM KCl, 2.5 mM MgCl2, 0.1 mM ZnSO4, 1 mM DTT, 0.1% NP-40, 50 ng/μl poly(dI-dC), and 2.5% (v/v) glycerol on ice for 20 min. Fifty fmol of biotin-labeled probes were then added and the reactions were incubated at room temperature for 20 min. For the supershift experiments, one μg of anti-MYC antibody was added to the binding reaction and incubated at room temperature for another 20 min. The binding reactions were electrophoresed on 5% nondenaturing polyacrylamide gels in ice-cold 0.5× TBE buffer (pH 8.0) and transferred to a nylon membrane. After crosslinking under UV light for 10 min, the membranes were blocked in the Blocking Buffer for 15 min with gentle shaking and then incubated in the Stabilized Streptavidin-Horseradish Peroxidase Conjugate solution for 15 min with gentle shaking. After a brief rinse with 1× washing buffer, the membrane was washed four times for 5 min each in 1× washing buffer with gentle shaking. Then, the membrane was incubated in substrate equilibration buffer for 5 min with gentle shaking, followed by incubation in the substrate working solution for 5 min without shaking and exposure using the ChemiDoc XRS+ System (Bio-Rad).

ChIP-nexus
ChIP-nexus experiments were performed as described (40) with some modifications. Briefly, ∼2 × 10⁷ cells were crosslinked with 1% formaldehyde for 10 min at room temperature, followed by quenching the crosslinking with glycine at a final concentration of 0.125 M, and then spun down. Subsequently, cell pellets were lysed twice with ice-cold ChIP buffer (10 mM Tris-HCl, pH 7.5, 0.15 M NaCl, 1% Triton X-100, 1 mM EDTA, 0.1% SDS, 0.1% sodium deoxycholate, 1× protease inhibitors) by incubating at 4 °C for 10 min with slow rotation. Nuclei were spun down and then resuspended in 700 μl of the ChIP buffer. After incubating on ice for 10 min, the samples were sonicated using a Bioruptor Sonicator on high power (30 s on/30 s off) for 30 min to fragment DNA to sizes ranging from 100 to 10000 bp. The sheared chromatin solutions were immunoprecipitated with a specific antibody against REST/NRSF (Millipore, 4 μg for each reaction) by slow rotation at 4 °C overnight. The next day, antibody-precipitated complexes were incubated with 50 μl of Protein A/G Magnetic beads (Thermo) at 4 °C for another 3 h. The chromatin-enriched magnetic beads were then washed with the washing buffer A (10 mM TE, 0.1% Triton X-100), washing buffer B (150 mM NaCl, 20 mM Tris-HCl, pH 8.0, 5 mM EDTA, 5.2% sucrose, 1.0% Triton X-100, 0.2% SDS), washing buffer C (250 mM NaCl, 5 mM Tris-HCl, pH 8.0, 25 mM HEPES, 0.5% Triton X-100, 0.05% sodium deoxycholate, 0.5 mM EDTA), washing buffer D (250 mM LiCl, 0.5% NP-40, 10 mM Tris-HCl, pH 8.0, 0.5% sodium deoxycholate, 10 mM EDTA), and finally the Tris buffer (10 mM Tris-HCl, pH 7.5). Washing volumes were 1 ml per sample. For each wash, after the washing buffer was added, the tubes were briefly inverted by hand to resuspend the beads. For all the following incubations, the beads were resuspended every ∼15 min by gently tapping the tubes.
The DNA-complex-coated beads were incubated with the end-repair enzyme mixture (NEB) at 20 °C for 30 min to repair the DNA ends and then with Klenow exo- (NEB) in the NEB buffer 2 containing 0.2 mM dATP at 37 °C for 30 min for dA tailing. The samples were then ligated with the annealed Nexus adaptors (Nex adaptor UBamHI: 5′ phosphate GATCG GAAGA GCACA CGTCT GGATC CACGA CGCTC TTCC, Nex adaptor Barcode BamHI: 5′ phosphate TCAGA GTCGA GATCG GAAGA GCGTC GTGGA TCCAG ACGTG TGCTC TTCCG ATCT) with 2× Blunt/TA ligase master mix (NEB) at 25 °C for 1 h. The adaptors contained a pair of sequences for library amplification, a BamHI site for later linearization, and a nine-nucleotide barcode containing five random bases and four fixed bases. Subsequently, the samples were treated with Klenow exo- at 37 °C for 30 min to fill in the ends of the adaptors and then trimmed with T4 DNA polymerase (NEB) at 12 °C for 5 min. The blunt-ended DNA was then treated with Lambda Exonuclease (NEB) at 37 °C for 60 min with constant rotation to digest one strand of the double-stranded DNA in the 5′ to 3′ direction until encountering a cross-linked protein. The samples were then digested by RecJf exonuclease (NEB) for 60 min at 37 °C to degrade the single-stranded DNA from the 5′-end and washed three times with the RIPA buffer (50 mM HEPES, pH 7.5, 1 mM EDTA, 0.7% sodium deoxycholate, 1% NP-40, 0.5 M LiCl).
The DNA-protein complex was eluted with 200 μl of elution buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, 1% SDS) at 65 °C, shaken at 1000 rpm for 30 min, and then reverse-crosslinked at 65 °C overnight. The next day, after incubating the samples with 2 μl of RNase A (Thermo) at 37 °C for 2 h and then with 8 μl of proteinase K (NEB) at 55 °C for 4 h, ssDNA was extracted with phenol:chloroform:isopentanol (25:24:1, v/v), and precipitated with 2.5× (v/v) of ethanol, 1/10 (v/v) of 3 M sodium acetate and 1.5 μl of glycogen (20 mg/ml, Thermo). The DNA pellet was resuspended in 10 μl of nuclease-free water.
After being denatured at 95 °C for 5 min, the ssDNA was circularized with CircLigase (Epicentre) at 60 °C for 1 h. The samples were annealed with the BamHI cut oligo (GAAGA GCGTC GTGGA TCCAG ACGTG) in the digesting buffer (Thermo) in a thermocycler using the annealing program (95 °C for 1 min, slowly cooled down at 1% ramp to 25 °C for 1 min, 25 °C for 30 min, hold at 4 °C) and then digested with BamHI (Thermo) at 37 °C for 30 min. The linearized DNA was precipitated using ethanol and sodium acetate, then resuspended in 20 μl of water. The DNA was amplified by Q5 polymerase (NEB) with the Illumina primers using the PCR program (98 °C for 30 s, 16 cycles of 98 °C for 10 s and 65 °C for 75 s, 65 °C for 5 min, hold at 4 °C). The PCR products with sizes from 150 bp to 300 bp were extracted from a 2% agarose gel using the MinElute Gel Extraction Kit (Qiagen).
ChIP-nexus library DNA was sequenced on an Illumina HiSeq X Ten platform, and reads were filtered for the presence of the fixed barcode CTGA starting from the sixth position of reads. The random and fixed barcode sequences were then removed. Adaptor sequences at the 3′ end of reads were then trimmed using the Cutadapt tool. The trimmed reads were aligned to the human genome (hg19) using Bowtie2, and then analyzed with the MACS2, MACE and MEME suites. The heatmaps were generated with the R package pheatmap.
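The barcode-filtering step can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `filter_and_trim`, the in-memory representation of reads as (sequence, quality) pairs, and the choice to return the random barcode alongside each trimmed read are all assumptions made for illustration.

```python
FIXED_BARCODE = "CTGA"  # fixed bases, as stated in the text
RANDOM_LEN = 5          # five random bases precede the fixed bases
BARCODE_LEN = RANDOM_LEN + len(FIXED_BARCODE)  # nine-nucleotide barcode


def filter_and_trim(reads):
    """Keep reads carrying CTGA at positions 6-9 (1-based) and strip the barcode.

    `reads` is an iterable of (sequence, quality) string pairs (hypothetical
    input format). Yields (random_barcode, trimmed_sequence, trimmed_quality).
    """
    for seq, qual in reads:
        if len(seq) <= BARCODE_LEN:
            continue  # nothing would remain after barcode removal
        if seq[RANDOM_LEN:BARCODE_LEN] != FIXED_BARCODE:
            continue  # fixed barcode absent at the expected position
        yield seq[:RANDOM_LEN], seq[BARCODE_LEN:], qual[BARCODE_LEN:]


# Toy reads: the first carries the fixed barcode, the second does not.
example_reads = [
    ("ACGTTCTGAGGGCCCAAATTT", "I" * 21),
    ("ACGTTCTGGGGGCCCAAATTT", "I" * 21),
]
kept = list(filter_and_trim(example_reads))
```

In this sketch only the first toy read is retained, and its first nine bases are removed before any adaptor trimming or alignment would take place.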

RNA-seq
RNA-seq experiments were performed as previously described (31,33). Briefly, total RNA was extracted from cultured cells or mouse tissues using TRIzol Reagent (Ambion). RNA-seq libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB) according to the manufacturer's protocol. mRNA was enriched from 1 μg of total RNA using NEBNext oligo(dT) magnetic beads, and fragmented by heating at 95 °C for 10 min. After reverse transcription of the first-strand cDNA and synthesis of the second-strand cDNA, cDNA was purified using 1.8× AMPure XP beads (Beckman). The purified cDNA was end-repaired and then ligated with NEBNext Adaptors, followed by treatment with the USER enzyme (NEB). The ligated product was purified using the AMPure XP beads (Beckman). The purified cDNA product was then amplified with the Illumina primers by the Q5 enzyme (NEB) using the PCR program (98 °C for 30 s, 14 cycles of 98 °C for 10 s and 65 °C for 75 s, 65 °C for 5 min, hold at 4 °C). RNA-seq libraries were sequenced on an Illumina HiSeq X Ten platform, and reads were aligned to the human or mouse genome using TopHat (v2.0.14). The expression levels were calculated using the Cufflinks software (v2.2.1) with default parameters. All RNA-seq experiments were performed with at least two biological replicates.

Screening single-cell CRISPR clones by DNA-fragment editing
The generation of the CRISPR single-cell clones by DNA-fragment editing was performed as previously described (41,42). Briefly, HEC-1-B or HEK293T cells at ∼80% confluency were transfected with Cas9 plasmids (0.6 μg) and two HS5-1-NRSE-targeted sgRNA-expressing plasmids (1.2 μg) by Lipofectamine 2000 (Life Technologies) in a 12-well plate. Two days post transfection, puromycin (Sigma) was added at a final concentration of 2 μg/ml. Four days later, cells were changed to fresh medium without puromycin and cultured for another 2 days. The cells were then diluted and plated into 96-well plates with approximately one cell per well. Two weeks later, single-cell clones were marked manually under a microscope and screened for targeted deletion by PCR. At least two single-cell clones for each deletion were obtained. We screened a total of 133 single-cell clones, and 4 homozygous clones were obtained and analyzed. Single-cell clones for each editing were confirmed by Sanger sequencing and the screening primers are shown in Supplementary Table S1.
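The plating and screening numbers can be put in context with a short calculation. The sketch below assumes, purely for illustration, that the number of cells per well follows a Poisson distribution with a mean of one; this model is not stated in the protocol. It also computes the fraction of homozygous clones among those screened.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing exactly k cells in a well when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


mean_cells_per_well = 1.0  # "approximately one cell per well" (Poisson model assumed)
p_empty = poisson_pmf(0, mean_cells_per_well)
p_single = poisson_pmf(1, mean_cells_per_well)
p_multiple = 1.0 - p_empty - p_single

# Screening outcome reported in the text.
clones_screened = 133
homozygous_clones = 4
homozygous_fraction = homozygous_clones / clones_screened
```

Under this assumption roughly 37% of wells receive exactly one cell and about 26% receive more than one, which is consistent with the need to inspect wells under a microscope before screening. The reported outcome corresponds to about 3% homozygous clones among those screened.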

Generation of HS5-1-NRSE-deleted mice by CRISPR DNA-fragment editing
The generation of CRISPR mice was performed as previously described (33,41). Briefly, Cas9 mRNA and a pair of sgRNAs targeting the NRSE sequences were injected into zygotes. Cas9 mRNA for mouse zygote injection was in-vitro transcribed from the XbaI-linearized Cas9 plasmid with a T7 promoter using mMESSAGE mMACHINE T7 Ultra Kit (Life Technologies). The sgRNAs were in-vitro transcribed from PCR products with the MEGAshortscript Kit (Life Technologies). The PCR products were amplified with a forward primer containing a T7 promoter followed by targeting sequences and a common reverse primer. Then, the transcribed RNA was purified with the MEGAclear Kit (Life Technologies) and eluted with the TE buffer (0.2 mM EDTA).
Zygotes obtained from the oviducts of super-ovulated C57BL/6 mice were injected with Cas9 mRNAs (100 ng/μl) and a pair of sgRNAs (50 ng/μl each). After culturing with KSOM medium (Millipore) for 0.5 h at 37 °C in a 5% CO2 incubator, the injected embryos were transplanted into the oviducts of pseudo-pregnant ICR foster mothers. The F0 mice with HS5-1 NRSE deletion were maintained and crossed to obtain F1 mice. F1 mice were genotyped again for heterozygous deletion. Heterozygous mice were then crossed to obtain homozygous mice. The wild-type littermates were used as controls.
For genotyping, mouse tails were lysed with 40 μl of the alkaline lysis buffer (25 mM NaOH and 0.2 mM disodium EDTA, pH 12.0) at 98 °C for 40 min, and then neutralized with an equal volume of the neutralizing buffer (40 mM Tris-HCl, pH 5.0). One μl of the solution containing genomic DNA was used as the template for PCR in a total volume of 20 μl to screen for NRSE deletion with specific primers (Supplementary Table S1).

ChIP-seq
ChIP was performed as previously described (31,33) with some modifications. Briefly, the P0 mouse tissues were dispersed by 0.0125% (w/v) (for brain) or 0.0625% (w/v) (for kidney) collagenase (Sigma) treatment in DMEM supplemented with 10% (v/v) FBS for 20 min at 37 °C by rotating at 700 rpm. Cells were then filtered through a 100-μm cell strainer (Corning) to obtain a single-cell suspension.
About 5 × 10⁶ cells were cross-linked with 1% formaldehyde for 10 min at room temperature, followed by quenching the crosslinking with glycine at a final concentration of 0.125 M, and then spun down. Subsequently, cell pellets were lysed twice with the ChIP buffer (10 mM Tris-HCl, pH 7.5, 0.15 M NaCl, 1% Triton X-100, 1 mM EDTA, 0.1% SDS, 0.1% sodium deoxycholate, 1× protease inhibitors) by incubating at 4 °C for 10 min with slow rotation. Nuclei were spun down and resuspended in 400 μl of the ChIP buffer (for human cells) or ChIP buffer with 0.4% SDS (for cells from mouse tissues), and then sonicated using the Vibracell ultrasonic processor (Sonics) (25% maximum, a train of 20 s on and 20 s off for 15 cycles). The sonicated solution was diluted with the ChIP buffer (1:5), pre-cleared with 40 μl of Protein A agarose beads (Millipore), and then immunoprecipitated with specific antibodies against REST/NRSF (Millipore, 4 μg for each reaction), CTCF (Millipore, 2.5 μg for each reaction), H3K4me3 (Millipore, 2.5 μg for each reaction), and H3K27ac (Abcam, 2.5 μg for each reaction). The protein-DNA complexes were enriched with 40 μl of Protein A/G magnetic beads (Thermo) by incubating at 4 °C for another 3 h with slow rotation, and washed once with 1 ml of ChIP buffer, once with 1 ml of ChIP buffer with 0.4 M NaCl, once with ChIP buffer without NaCl, and once with LiCl buffer (250 mM LiCl, 5% NP-40, 0.5% sodium deoxycholate), each by incubating at 4 °C for 10 min with slow rotation. Then, the samples were eluted with elution buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, 1% SDS) at 65 °C, shaken at 1000 rpm for 30 min, and then reverse-crosslinked by heating at 65 °C overnight. The DNA was purified with phenol-chloroform and precipitated with ethanol, and then used for library preparation.
Libraries were prepared according to the manual of the Universal DNA Library Prep Kit for Illumina V2 (Vazyme). Briefly, DNA was end-repaired using End Prep Mix 3, ligated with index-containing adaptors using rapid DNA ligase in the rapid ligation buffer (66 mM Tris-HCl, pH 7.6, 20 mM MgCl2, 2 mM DTT, 2 mM ATP, 7.5% PEG 6000), and amplified using primers and amplification mix to generate libraries. The library fragments of ∼350 bp (insert plus adaptor and PCR primer sequences) were selected and isolated with Agencourt AMPure XP beads (Beckman). Library DNA was sequenced on an Illumina HiSeq X Ten platform. The reads were sorted by indexes, aligned to hg19 or mm9 using Bowtie2, and peak-called using MACS2. All of the ChIP-seq experiments were performed with at least two biological replicates.

Quantitative high-resolution chromosome conformation capture copy (QHR-4C)
QHR-4C experiments were performed as previously described (33) with some modifications. Briefly, cells were harvested from mice as described in ChIP-seq experiments. Then, ∼2 million cells were crosslinked with 1% formaldehyde for 10 min at room temperature and the crosslinking reaction was quenched by glycine at a final concentration of 0.125 M. After being spun down and washed twice with 1× PBS, cell pellets were lysed twice with cold lysis buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 1% Triton X-100, 5 mM EDTA, 0.5% NP-40, 1× protease inhibitors). The pellet was resuspended in 225 μl of water, 30 μl of 10× DpnII buffer, and 7.5 μl of 10% SDS and incubated at 37 °C for 1 h with constant shaking at 900 rpm. After quenching the SDS by adding 37.5 μl of 20% Triton X-100 and incubating at 37 °C for 1 h with shaking at 900 rpm, the nuclei were digested in situ with 10 μl of DpnII (10 unit/μl) at 37 °C with shaking at 900 rpm overnight. After enzyme inactivation at 65 °C for 20 min, nuclei were centrifuged at 2500 × g for 5 min, resuspended and ligated in 358 μl of water, 40 μl of 10× T4 ligation buffer, and 2 μl of T4 DNA ligase (400 unit/μl) overnight at 16 °C. The ligated product was then reverse-crosslinked by adding 10 μl of proteinase K (10 mg/ml) and heating at 65 °C for 4 h, followed by adding 2 μl of RNase A (NEB) and incubating at 37 °C for 45 min. DNA was then extracted using phenol-chloroform, and 2 μl of glycogen (20 mg/ml) was added to facilitate DNA precipitation. The precipitated DNA was dissolved in 50 μl of water, and then sonicated to 200-600 bp using the Bioruptor system with a low energy setting at a train of 30 s on and 30 s off for 12 cycles.
With the fragmented DNA as a template, a linear amplification step was applied using a 5′ biotin-tagged primer (Supplementary Table S1) complementary to the viewpoint fragment in a 100-μl PCR reaction. The PCR product was denatured at 95 °C for 5 min and immediately chilled on ice to obtain ssDNA. The ssDNA was enriched and purified with Streptavidin Magnetic Beads (Thermo) according to the manufacturer's instructions, and then ligated with annealed adaptors in 45 μl of ligation mixture (20 μl of DNA-coated beads, 4.5 μl of 10× T4 ligation buffer, 10 μl of 30% PEG 8000, 1 μl of 50 μM adaptor, 0.9 μl of T4 DNA ligase, 8.6 μl of water). The beads were then washed twice with the B/W buffer (5 mM Tris-HCl, 1 M NaCl, 0.5 mM EDTA, pH 7.5) to remove the free adaptors. The DNA-coated beads were then resuspended in 10 μl of water. With the DNA on beads as a template, the QHR-4C libraries were generated by PCR amplification with a pair of primers (Supplementary Table S1). Finally, the libraries were purified with a PCR purification kit (Qiagen) and sequenced on an Illumina HiSeq X Ten platform. Reads were sorted using indexes and barcodes, mapped to the mouse (NCBI37/mm9) or human (GRCh37/hg19) genomes using Bowtie2, and analyzed using the r3Cseq package (version 1.20) in R (version 3.3.3). All of the QHR-4C experiments were performed with at least two biological replicates.

Structural modeling and molecular dynamics (MD) simulations
Structural modeling and MD simulations were performed as previously described (43). Briefly, we first built the models of REST-HS5-1-NRSE and REST-PCDHγa6-NRSE based on the crystal structure of the ZF protein PR/SET domain 9 (PRDM9) in complex with DNA (PDB id: 5v3g). The REST/NRSF structure was created using the homology modeling strategy and DNA was modeled by directly replacing the DNA nucleobases in 5v3g with those of HS5-1-NRSE or PCDHγa6-NRSE. In particular, the orientation of the REST-ZF6 in REST-HS5-1-NRSE with respect to DNA was constructed according to the formerly captured CTCF-ZF8 structure that displays a non-DNA-penetration conformation (44,45).
Then, the above two DNA-bound REST/NRSF complexes were subjected to energy minimization, followed by MD simulations. The protein and DNA were described using the AMBER force field ff14SB, with the bsc1 corrections used for the DNA nucleotides. Each solute was immersed in a cubic box filled with TIP3P waters, and the appropriate number of Na+ ions were added to neutralize the whole system. The final systems include a total of 49652 and 68845 atoms for REST-PCDHγa6-NRSE and REST-HS5-1-NRSE, respectively. Energy minimization was conducted using the steepest descent and conjugate gradient methods. Then, each complex was heated from 0 to 310 K within 200-ps MD simulations, followed by 200-ps equilibrium MD simulations with all of the solute heavy atoms constrained. Finally, we performed three parallel 50-ns production MD simulations, each initiated from different velocities. The temperature was controlled by the Langevin thermostat at 310 K and the SHAKE algorithm was used to constrain the bond lengths involving hydrogen atoms. The non-bonded cutoff distance was set to 12 Å, and the long-range electrostatic interactions were treated by the particle mesh Ewald (PME) method. All the MD simulations were conducted using the AMBER 2018 package.
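For orientation, the production settings stated above can be written as an AMBER `&cntrl` input. The sketch below renders such an input from Python; it is not the input file used in this study. The 2-fs time step, the Langevin collision frequency (`gamma_ln`), the constant-volume periodic box (`ntb=1`), the output frequencies, and the use of `irest=0` with a random seed (`ig=-1`) to draw fresh velocities are assumptions not given in the text, and the helper name `production_mdin` is hypothetical.

```python
def production_mdin(total_ns=50.0, dt_ps=0.002, temp_k=310.0,
                    cutoff_angstrom=12.0, gamma_ln=1.0):
    """Render an illustrative AMBER &cntrl block for one production run.

    Stated in the text: 50 ns, 310 K Langevin thermostat, SHAKE on bonds
    involving hydrogen, 12 A cutoff, PME (AMBER's default for periodic boxes).
    Assumed here: dt_ps, gamma_ln, ntb=1, ntpr/ntwx, and fresh velocities.
    """
    nstlim = round(total_ns * 1000.0 / dt_ps)  # number of MD steps
    return f"""production MD (illustrative sketch)
 &cntrl
  imin=0, irest=0, ntx=1,
  tempi={temp_k}, ig=-1,
  nstlim={nstlim}, dt={dt_ps},
  ntc=2, ntf=2,
  cut={cutoff_angstrom},
  ntb=1,
  ntt=3, gamma_ln={gamma_ln}, temp0={temp_k},
  ntpr=5000, ntwx=5000,
 /
"""


mdin_text = production_mdin()
```

With the assumed 2-fs step, a 50-ns run corresponds to 25 million steps. Running the same input three times with `ig=-1` would give three trajectories that start from different random velocities, which is one way to set up parallel replicas.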

DNA methylation analysis
About 5 × 10⁵ cells were harvested from mice and crosslinked with formaldehyde as described above in the ChIP-seq experiments. The harvested cells were then lysed and reverse-crosslinked by adding 400 μl of 1× TE buffer containing 5 μl of proteinase K (10 mg/ml) and heating at 65 °C for 4 h. The genomic DNA was extracted from the lysed solution with phenol:chloroform:isopentanol (25:24:1, v/v) and resuspended in 40 μl of nuclease-free water. Then bisulfite conversions of the DNA were performed using the EpiTect Plus DNA Bisulfite Kit (Qiagen) according to the manual. Briefly, DNA was converted with bisulfite solution in the thermal cycler under the following conditions (95 °C for 5 min, 60 °C for 25 min, 95 °C for 5 min, 60 °C for 1 h 5 min, 95 °C for 5 min, 60 °C for 2 h 55 min). Then, the converted DNA was desulfonated and cleaned up with the MinElute DNA spin column, and amplified using the bisulfite sequencing primers (Supplementary Table S1) with GoTaq G2 Green Master Mix (Promega) under the following conditions (95 °C for 10 min, 38 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s, and a final extension at 72 °C for 5 min). The PCR products were gel-purified using the MinElute Gel Extraction Kit (Qiagen) and ligated into the pGEM-T Easy vector (Promega) for Sanger sequencing. The results were analyzed using a quantification tool for methylation analysis (QUMA) (46).
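The principle behind the quantification of bisulfite sequencing clones can be sketched as follows. This is a simplified illustration and not the QUMA implementation: it assumes that each clone sequence is already aligned to the unconverted reference without gaps and on the same strand, and the function names are hypothetical. After conversion, a cytosine retained at a CpG is scored as methylated and a thymine as unmethylated.

```python
def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_calls(reference, clone):
    """Score each CpG of one clone: True (C kept), False (C read as T), None (other)."""
    if len(clone) != len(reference):
        raise ValueError("clone must be aligned to the reference without gaps")
    calls = []
    for i in cpg_positions(reference):
        base = clone[i].upper()
        calls.append(True if base == "C" else False if base == "T" else None)
    return calls


def percent_methylation(reference, clones):
    """Percentage of methylated CpGs over all informative CpG calls."""
    methylated = informative = 0
    for clone in clones:
        for call in methylation_calls(reference, clone):
            if call is None:
                continue  # sequencing error or mismatch; not counted
            informative += 1
            methylated += int(call)
    return 100.0 * methylated / informative if informative else float("nan")


# Toy data: a reference with two CpGs and two sequenced clones.
toy_reference = "ATCGTTCGAA"
toy_clones = ["ATCGTTTGAA", "ATTGTTTGAA"]
toy_percent = percent_methylation(toy_reference, toy_clones)
```

In the toy data, one of the four CpG calls retains a cytosine, giving 25% methylation. A dedicated tool additionally performs the alignment, checks conversion efficiency, and applies statistical tests, none of which are covered by this sketch.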

A repertoire of NRSEs in clustered PCDH genes
To investigate mechanisms of neural-specific expression of the clustered PCDH genes in the brain, we performed REST/NRSF ChIP-nexus experiments in the model cell lines neuroblastoma SK-N-SH and endometrial carcinoma HEC-1-B (33,36,38) and found 5276 and 9681 REST/NRSF sites, respectively (Figure 1C). In the clustered PCDH loci, we found a repertoire of numerous REST/NRSF sites within variable regions of the PCDH α and γ, but not β, clusters (Figure 1B, Supplementary Figure S1A). In particular, there are two REST/NRSF sites in the variable exons of PCDHγa (Supplementary Figure S1B). Sequence analyses identified three conserved REST/NRSF sites, designated as 'a', 'b' and 'c', within the PCDH variable exons (Figure 1D, Supplementary Figures S2 and S3). Specifically, site 'a' in PCDHα, site 'b' in PCDHγa, and site 'c' in the PCDHγ alternate exons match the REST/NRSF consensus motif (Supplementary Figures S2 and S3) (16).
To investigate the recognition mechanisms of these PCDH sites, we performed comprehensive EMSA experiments using recombinant REST/NRSF proteins. For site 'a', we designed a repertoire of probes for each alternate member of the PCDHα cluster and three probes in the same location for members of the PCDHβγ clusters as negative controls. EMSA experiments revealed that site 'a' in the PCDHα cluster, but not in the PCDH β or γ cluster, is recognized by REST/NRSF in vitro. Specifically, there is a shifted band with probes for each PCDHα alternate exon and a supershifted band when incubated with a specific antibody (Figure 1E). Similarly, for site 'b', only members of the PCDHγa subfamily (except PCDHγa9) are recognized by REST/NRSF (Figure 1F). Consistent with the lack of binding to the PCDHγa9 probe (Figure 1F), the first and fourth positions of the PCDHγa9 site have been mutated during evolution from the consensus 'G' and 'C' to the non-consensus 'A' and 'T' nucleotides, respectively (Supplementary Figure S3A). For site 'c', members of both the PCDHγa and PCDHγb subfamilies are recognized by REST/NRSF in vitro (Figure 1G). These data suggest that there is one NRSE in each alternate exon of PCDHα (site 'a') and PCDHγb (site 'c'), two NRSEs in each alternate exon of PCDHγa (sites 'b' and 'c', except PCDHγa9), and no NRSE in the exons of the PCDHβ cluster. Consistently, each variable exon of the PCDHβ cluster is an independent gene whereas every variable exon of the PCDHα and PCDHγa clusters is alternatively spliced to the respective constant exons. Together, these observations suggest that members of the PCDHβ cluster may be regulated differently from PCDHαγ.
The five C-type PCDH exons are distinct from alternate exons both in their encoded protein sequences and regulatory mechanisms (22,36,38,47,48). We performed EMSA experiments for each member of the C-type exons and found that each contains a REST/NRSF site (Supplementary Figure S4A and B). We also systematically examined candidate recognition sites in the super-enhancers of the PCDH α and βγ clusters by EMSA experiments and found that HS5-1 and HS7 each contains a REST/NRSF site (Supplementary Figure S4C and D). However, in the super-enhancer of the PCDHβγ clusters, we could not find any authentic REST/NRSF site (Supplementary Figure S4C and E).
In summary, in the PCDHα cluster, both the variable exons and the downstream super-enhancer contain NRSEs. In the PCDHβγ clusters, by contrast, only PCDHγ contains NRSEs, whereas neither PCDHβ nor the downstream super-enhancer does, consistent with the neural-specific expression of PCDHα, but not PCDHβ or PCDHγ, genes (Supplementary Figure S4F) (49).

Directional REST/NRSF binding to PCDH NRSEs
To investigate REST/NRSF DNA-recognition mechanisms, we generated a set of truncated REST/NRSFs through the sequential deletion of ZF domains from either the N- or C-terminus (Figure 2A-C). We found that, for the series of C-terminal deletion mutants, truncation up to ZF6 (ZF1-5) does not perturb REST/NRSF binding, but truncation up to ZF5 (ZF1-4) abolishes REST/NRSF binding to the noncanonical HS5-1 site, suggesting that ZF5 plays an important role in the recognition of the HS5-1 site (Figure 2D). The same result was also observed using the right-half only site of PCDHα8, suggesting that ZF5 is also essential for the recognition of the right-half site (Figure 2E). For the series of N-terminal deletion mutants, truncation up to ZF4 (ZF5-8) does not perturb REST/NRSF binding to the noncanonical HS5-1 site (Figure 2D), but remarkably results in a significant decrease in binding levels to the right-half only site of PCDHα8 (Figure 2E), suggesting that ZF4 probably recognizes nucleotides within the right-half site. This is consistent with previous studies on a C-terminal truncated form of REST/NRSF (50). Finally, we tested the binding ability of ZF-truncated REST/NRSF to two canonical sites and found that ZF5 is also required for their binding (Supplementary Figure S5A).
To further investigate the differential roles of ZF domains in NRSE recognition, we substituted the two Cys residues of each REST/NRSF tandem ZF domain with two Arg residues (Figure 2F-I). Substitution of Cys in ZF1 or ZF2 with Arg residues results in a significant decrease in REST/NRSF binding to the right-half only site of PCDHα8, but does not perturb binding to the noncanonical and canonical NRSEs (Figure 2J and K, Supplementary Figure S5B), suggesting that ZF1 and ZF2 are required for binding to the right-half only site but are dispensable for REST/NRSF binding to the noncanonical and canonical NRSEs with both left- and right-half sites. In addition, substitution of Cys in ZF3, ZF4 or ZF5 with Arg residues affects REST/NRSF binding to all NRSEs tested, including noncanonical, canonical, and right-half only sites (Figure 2J and K, Supplementary Figure S5B). This suggests that ZF3-5 recognize the right-half sites. Finally, substitution of Cys in ZF7 or ZF8 with Arg residues does not affect REST/NRSF binding to the right-half only site of PCDHα8, but results in a significant decrease in REST/NRSF binding levels to the noncanonical and canonical NRSEs (Figure 2J and K, Supplementary Figure S5B), suggesting that ZF7-8 recognize nucleotides within the left-half site. Together, these REST/NRSF truncation and mutation experiments suggest that ZF3-5 recognize the right-half site while ZF7-8 recognize the left-half site.
To further investigate the directionality of REST/NRSF DNA recognition, we sequentially mutated triple nucleotides (Mut1, Mut2 and Mut3) of the right-half only site of PCDHα8 and found that these mutations abolish the binding of the C-terminal truncated REST/NRSF (Figure 2L). For the N-terminal truncated proteins, Mut3 abolishes the binding of ZF2-8 and ZF3-8, but not ZF4-8, suggesting that ZF3 binds to the three nucleotides corresponding to Mut3 (Figure 2L). Considering the essential role of ZF4 and ZF5 in binding to the right-half site (Figure 2D and E, J and K), we conclude that they recognize the triple nucleotides corresponding to Mut2 and Mut1, respectively.
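The assignments above can be summarized in a short sketch. It assumes that the three mutated triplets lie in the order Mut1, Mut2, Mut3 along the probe as written; that ordering, and every function and variable name below, is an assumption made for illustration. Under that assumption, pairing ZF5, ZF4 and ZF3 with Mut1, Mut2 and Mut3 means that successive fingers from the N- to the C-terminus read successive triplets in the opposite direction along the probe.

```python
# Illustrative only: pair zinc fingers with probe triplets in antiparallel order.
# Triplet labels follow the Mut1-Mut3 naming of the right-half only site; their
# left-to-right order along the probe is an assumption of this sketch.

def antiparallel_assignment(fingers_n_to_c, triplets_in_probe_order):
    """Map each finger (listed N- to C-terminus) to one triplet, reading the
    triplets in reverse probe order."""
    if len(fingers_n_to_c) != len(triplets_in_probe_order):
        raise ValueError("need exactly one triplet per finger")
    return dict(zip(fingers_n_to_c, reversed(triplets_in_probe_order)))


right_half = antiparallel_assignment(
    ["ZF3", "ZF4", "ZF5"],     # protein order, N- to C-terminus
    ["Mut1", "Mut2", "Mut3"],  # assumed order along the probe
)
print(right_half)
```

The resulting mapping reproduces the conclusions drawn from the EMSA data (ZF3 with Mut3, ZF4 with Mut2, ZF5 with Mut1); it is a bookkeeping aid, not a binding model.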
We next generated a series of mutated NRSE probes of the PCDH HS5-1 noncanonical site with sequential substitution of triple nucleotides (Figure 2M) and performed comprehensive EMSA experiments using REST/NRSF with sequential ZF deletions. Similar to the right-half only site, EMSA experiments demonstrated that mutations of the right-half (Mut1-3), but not left-half (Mut4-6), of the noncanonical site affect REST/NRSF binding only if it contains ZF1-5 (Figure 2N). By contrast, mutations of the left-half (Mut4-6), but not right-half (Mut1-3), of the noncanonical site appear to affect REST/NRSF binding only if it contains ZF5-8 (Figure 2O), again suggesting that ZF3-5 bind to the right-half site while ZF7-8 bind to the left-half site of NRSEs. Moreover, we generated an additional series of triple-nucleotide mutations of the left-half site with a different phase (Mut7-10) (Figure 2M). Remarkably, we found that Mut8 and Mut9 almost completely abolish REST/NRSF binding, whereas Mut7 and Mut10 have much less effect (Figure 2P), suggesting that the six nucleotides (CAGCAC) corresponding to Mut8 and Mut9 play a major role. In conjunction with the fact that ZF6 mutation does not appear to affect REST/NRSF binding (Figure 2J, Supplementary Figure S5B), we conclude that the six conserved nucleotides of the left-half site are recognized by ZF7-8. Finally, we mutated sequences downstream of the Mut3 site (Mut11 and Mut12) and found that they do not affect REST/NRSF binding, suggesting that ZF2 and ZF1 have no major role in binding the right-half only site of PCDHα8 (Supplementary Figure S5C). Taken together, we conclude that REST/NRSF base-specifically recognizes DNA duplexes in an antiparallel manner via tandem ZF3-8 (Figure 2Q).
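The idea of scanning a site with triple-nucleotide substitutions in two different phases can be illustrated with a minimal sketch. The substitution rule used here (A<->C and G<->T transversions), the helper name `triplet_scan` and the example sequence are all assumptions for illustration; they are not the probes or the substitution scheme used in the experiments.

```python
# Hypothetical helper: build triple-nucleotide substitution probes in a chosen frame.
# The substitution rule (A<->C, G<->T) and the example sequence are assumptions
# for illustration; they are not the probes used in the EMSA experiments.

TRANSVERSION = str.maketrans("ACGT", "CATG")


def triplet_scan(seq, phase=0):
    """Return (start, mutant) pairs, one per complete triplet beginning at `phase`."""
    seq = seq.upper()
    mutants = []
    for start in range(phase, len(seq) - 2, 3):
        block = seq[start:start + 3].translate(TRANSVERSION)
        mutants.append((start, seq[:start] + block + seq[start + 3:]))
    return mutants


example = "AACCGGTTA"  # placeholder sequence, not a real NRSE

# Frame 0 tiles the sequence from its first base; frame 1 shifts every block by one base.
for phase in (0, 1):
    for start, mutant in triplet_scan(example, phase=phase):
        print(phase, start, mutant)
```

Shifting the frame changes which neighboring bases are mutated together, which is why a second phase can narrow down the bases that matter most within a half site.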

ChIP-nexus peaks with multiple NRSE motifs
We previously found that some CTCF peaks contain more than one CBS element (51). We also noted that some REST/NRSF peaks contain more than one NRSE. For example, one peak in the first exon of the neural protocadherin CELSR3 gene contains four tandem half-site motifs in the configuration of left-11bp-right-9bp-left-11bp-right (Figure 3A). These four tandem half-site motifs could function either as four half-site only NRSEs or as two noncanonical NRSEs separated by 9 bp (Figure 3A). These sites have previously been shown to play an important role in the regulation of neural expression of the protocadherin CELSR3 gene (52).
To investigate how REST/NRSF binds to these composite tandem sites, we prepared two probes each matching one of the hypothetical noncanonical NRSEs as well as a probe containing all four tandem sites (Figure 3B-D). EMSA experiments showed only a single shifted band for either the left or right noncanonical NRSE (Figure 3B and C); by contrast, we observed two shifted bands for the probe containing all four tandem sites (Figure 3D). Two similarly shifted bands were also observed for the PCDHγa tandem site (Figure 3E, Supplementary Figure S3). Taken together, these observations suggest that some REST/NRSF ChIP-nexus peaks contain more than one NRSE in tandem and that each NRSE is bound by one REST/NRSF protein molecule in only one orientation (Figure 3F).

Distinct ZF6 conformations for canonical and noncanonical NRSEs
Analysis of REST/NRSF binding sites in the ChIP-nexus data revealed that the gaps between left- and right-half sites are either 2 bp in canonical NRSEs or 7-9 bp in noncanonical NRSEs but, intriguingly, never 3-6 bp (Figure 4A). To investigate how REST/NRSF tolerates variable gap lengths in NRSE motifs, we performed molecular dynamics (MD) simulations using ZF3-8 and the site 'c' canonical NRSE motif of PCDHγa6 with a 2-bp gap (Supplementary Figure S3C and D). We found that these six ZFs wrap around the DNA duplex with the N-terminus of each ZF α-helix inserted into the major groove and that this binding complex remains stable during the MD simulations (Figure 4B), even though EMSA experiments demonstrated that ZF6 is not essential for binding to canonical NRSEs (Supplementary Figure S5A and B). We then performed MD simulations using ZF3-8 and the noncanonical NRSE motif of HS5-1 with an 8-bp gap. We found that ZF3-5 and ZF7-8 wrap around the DNA duplex in the major groove and form base-specific contacts; however, in this case ZF6 flips out of the DNA major groove and remains parallel to the DNA axis throughout the MD simulation to span the 8-bp gap (Figure 4C), consistent with no alteration of binding upon ZF6 mutation (Figure 2J). Thus, REST/NRSF recognizes canonical and noncanonical NRSEs with remarkable conformational changes.
We noted that in the 3232 canonical NRSE motifs with 2-bp gaps, the ninth position is a highly conserved 'C' (Figure 4A); however, this 'C' is barely conserved in the noncanonical NRSE motifs with variable large gaps of 7-9 bp (Figure 4A). Nevertheless, our EMSA experiments demonstrated that ZF7-8 and ZF3-5 contact the left- and right-half sites, respectively, of both canonical and noncanonical NRSEs in a base-specific manner (Figures 1-3, Supplementary Figure S5). To investigate why the ninth position 'C' is more conserved in canonical than noncanonical NRSEs, we simulated base recognition by ZF6 using MD in the two different conformations. We found that, in the canonical NRSEs, the side chain of Arg349 in ZF6 can form stable hydrogen bonds with the base 'G' in the complementary strand (Figure 4B, Supplementary Figure S5D). In the noncanonical NRSEs, however, ZF6 is not a reader of any NRSE base but is positioned parallel to the DNA axis to span the much larger gap (Figure 4C). This explains why the ninth position 'C' of canonical NRSEs is much more conserved than that of the noncanonical NRSEs (Figure 4A).
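The gap-size rule described in this section can be written down as a small lookup, shown here as a hedged sketch. The function name, the labels and the treatment of gap sizes outside the reported ranges (returned as "unclassified") are assumptions of this example; only the 2-bp, 3-6-bp and 7-9-bp categories and the two ZF6 conformations come from the text.

```python
# Sketch of the gap-size rule described in the text. The function name, labels
# and the "unclassified" fallback are illustrative assumptions.

def classify_nrse_gap(gap_bp):
    """Return (motif_class, zf6_state) for a gap between left- and right-half sites."""
    if not isinstance(gap_bp, int) or gap_bp < 0:
        raise ValueError("gap_bp must be a non-negative integer")
    if gap_bp == 2:
        return ("canonical", "ZF6 in the major groove")
    if 3 <= gap_bp <= 6:
        return ("not observed", None)
    if 7 <= gap_bp <= 9:
        return ("noncanonical", "ZF6 flipped out, parallel to the DNA axis")
    return ("unclassified", None)


for gap in (2, 4, 8):
    print(gap, classify_nrse_gap(gap))
```

Keeping the ZF6 state next to the motif class makes explicit that the two NRSE classes differ in the conformation of a single finger rather than in the half-site readers ZF3-5 and ZF7-8.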

REST/NRSF inhibits long-distance enhancer-promoter contacts
Most NRSEs are located distal from promoters (16,17). To investigate how these NRSEs regulate gene expression, we focused on the HS5-1 noncanonical site and perturbed either the REST/NRSF protein or the NRSE cis-element. We first knocked down REST/NRSF using three different shRNAs and found that REST/NRSF knockdown results in a significant increase in the expression levels of the PCDHα6 and PCDHα12 genes in HEC-1-B cells (Figure 5A and B). In addition, we found that deposition of the active chromatin marks H3K4me3 and H3K27ac is also significantly increased in both the HS5-1 enhancer and PCDHα promoters (Figure 5C-E), suggesting either that REST/NRSF regulates PCDHα expression through epigenetic modifications or that the deposition of H3K4me3 and H3K27ac is a consequence of transcription upon REST/NRSF knockdown (53). Because the expression of PCDHα genes depends on their long-distance chromatin interactions with the distal enhancer (36), we asked whether the 3D chromatin architecture of the locus is altered. To this end, we performed quantitative high-resolution chromosome conformation capture followed by next-generation sequencing (QHR-4C) experiments (33) with either HS5-1 or PCDHα12 as a viewpoint. Remarkably, we found that there is a subtle but reproducible increase in long-distance chromatin interactions between the HS5-1 enhancer and its target PCDHα genes upon REST/NRSF knockdown (Figure 5F and G), which explains the increased expression levels of the PCDHα genes in HEC-1-B cells (Figure 5B). Similar effects were also observed in HEK293T cells upon REST/NRSF knockdown (Supplementary Figure S6A-C).
We then deleted the NRSE within the HS5-1 enhancer in HEC-1-B and HEK293T cells by screening single-cell CRISPR clones through DNA-fragment editing (41,42). We obtained two single-cell homozygous clones with NRSE deletion in each cell line (Supplementary Figure S6D-G). RNA-seq revealed that deletion of the HS5-1 NRSE results in a significant increase in expression levels of the PCDHα6 and PCDHα12 genes (Figure 6A, Supplementary Figure S6H). In addition, the deposition of H3K4me3 and H3K27ac histone marks is also significantly increased in both the HS5-1 enhancer and PCDHα target promoters in these single-cell CRISPR clones (Figure 6B-D, Supplementary Figure S6I-K). Similar to the REST/NRSF knockdown, deletion of the HS5-1 NRSE also results in a significant increase in long-distance chromatin interactions between the HS5-1 enhancer and PCDHα target promoters (Figure 6E and F). Together, these data suggest that HS5-1-bound REST/NRSF suppresses the expression of PCDHα genes and may indirectly modulate histone tails and long-distance chromatin interactions.

NRSE deletion reshapes Pcdhα 3D chromatin structure in vivo
To see whether HS5-1-bound REST/NRSF modulates histone tails and chromatin architectures in vivo, we deleted the HS5-1 NRSE in mice through CRISPR pronuclear injection with dual sgRNAs (Figure 7A, Supplementary Figure S7A and B). ChIP-seq showed that the binding of REST/NRSF to HS5-1 is abolished upon NRSE deletion (Supplementary Figure S7C). We first confirmed that there are significantly higher levels of REST/NRSF expression in kidney than in cortical tissues by Western blot and RNA-seq (Figure 7B and C). Interestingly, there is a significant increase in expression levels of members of the Pcdhα cluster in kidney but not in cortical tissues upon deletion of the HS5-1 NRSE (Figure 7D and E). In addition, the deposition of the active histone mark H3K4me3 is significantly increased in HS5-1 and in members of the Pcdhα cluster but not in Pcdhβ1 in kidney tissues (Figure 7F). To investigate why the distal sites were affected, we performed CTCF ChIP-seq experiments and found that there is a significant increase in CTCF enrichment in HS5-1 and in members of the Pcdhα cluster in kidney tissues (Figure 7G, Supplementary Figure S7D and E). In addition, there is a significant decrease in CpG methylation in kidney but not cortical tissues (Supplementary Figure S8). Finally, QHR-4C experiments demonstrated that there is a significant increase in long-distance chromatin interactions between the distal HS5-1 enhancer and Pcdhα target genes upon NRSE deletion (Figure 7H and I). Considering the conserved organization of the Pcdh locus as well as the conserved locations of CTCF (31) and REST/NRSF sites between mice and humans (Supplementary Figure S9), these data suggest that REST/NRSF represses expression of Pcdhα target genes and may indirectly modulate higher-order chromatin structures (Figure 7J).

DISCUSSION
The clustered PCDHs participate in a wide variety of neurodevelopmental processes such as dendritic self-avoidance, axonal even spacing and tiling, spine morphogenesis and synaptogenesis, and neuronal migration and survival (23,54). Through stochastic and combinatorial expression, the clustered PCDH genes encode countless assemblies of cell-surface molecules to endow each neuron with a unique identity code (33,38,47,(55)(56)(57)(58). Thus, the spatiotemporal expression patterns of the clustered PCDH genes must be precisely regulated during brain development. REST/NRSF is a key repressor of neural genes in neuronal progenitors and its derepression is central for neurogenesis and neuronal differentiation (2,3,8). Here, we found that REST/NRSF recognizes diverse PCDH NRSEs in an antiparallel manner via base-specific contacts with tandem ZF domains. In addition, MD simulations revealed that REST/NRSF accommodates the different gap sizes of canonical and noncanonical NRSEs by adopting distinct conformations for ZF6. Finally, through genetic approaches, we demonstrated that enhancer-bound REST/NRSF represses the expression of distant PCDHα genes and may modulate higher-order chromatin structures.
The stochastic expression of clustered PCDHα genes is achieved through promoter choice determined by CTCF/cohesin-mediated chromatin interactions between the HS5-1 enhancer and its target promoters (29,31,36,38). In the PCDHβγ clusters, topological chromatin interactions between tandem CTCF sites determine the balanced expression of members of the PCDHβγ genes (33). In contrast, how PCDH genes are silenced in neural progenitors and non-neural tissues remains largely unknown. We showed here that the binding of REST/NRSF to the HS5-1 enhancer inhibits the expression of the PCDHα genes through CTCF/cohesin-mediated chromatin interactions. We identified NRSEs in each member of the PCDHα and PCDHγ gene clusters. These sites may be related to the repression of nonchosen members of the clustered PCDHα or PCDHγ genes in single cells in the brain because REST/NRSF has been shown to directly repress neural gene expression (6,59,60).
REST/NRSF binds to diverse NRSEs in vivo with distinct and hierarchical affinities according to their sequence variations (16,61,62). It is known that ZF domains are important for REST/NRSF binding to DNA duplexes (16,50); however, it has been puzzling why there exist two major classes of canonical and noncanonical NRSEs with distinct gap sizes in the human genome (16). By a combination of EMSA experiments and molecular dynamics simulations, we found that, in the case of canonical NRSEs, each ZF domain of tandem ZF3-8 is inserted into the major groove of DNA duplexes and recognizes the standard three base pairs (Figure 4B). By contrast, in the case of noncanonical NRSEs, ZF7-8 and ZF3-5 are inserted into the major grooves of the left- and right-half sites, respectively. However, in this case, ZF6 is flipped out of the major groove and positioned nearly parallel to the axis of the DNA duplexes, serving as a spacer element to tolerate variable distances between the two half sites. Thus, ZF6 functions as a bridge-like spacer for the flexible gaps connecting the left- and right-half sites of the noncanonical NRSE motifs (Figure 4C). This explains the long-standing mystery of how REST/NRSF recognizes both canonical and noncanonical classes of NRSE motifs across the entire human genome (16).
MD simulations demonstrated that, for the canonical NRSEs, Arg349 of ZF6 forms base-specific contacts with the 'G' base of the complementary strand at the ninth position (Figure 4B, Supplementary Figure S5D), explaining the observed conservation of the 'C' nucleotide at the ninth position of canonical NRSE motifs (Figure 4A). For the noncanonical NRSEs, however, once ZF6 has flipped out of the major groove, it cannot form base-specific contacts with any nucleobase. Instead, the flexible linkers between ZF5 and ZF6, and between ZF6 and ZF7, allow REST/NRSF to tolerate variable gap sizes between the left- and right-half sites of noncanonical NRSE motifs (Figure 4C). Finally, because of the biophysical hindrance of the flipped ZF6, the gap distance between the left- and right-half sites cannot be 3-6 bp, consistent with the fact that the flexible gaps of the noncanonical NRSEs are never shorter than 7 bp in the human genome (Figure 4). Together, our data provide significant insights into the mechanisms by which REST/NRSF recognizes diverse genomic DNA sites and regulates gene expression.

DATA AVAILABILITY
High-throughput sequencing files (ChIP-nexus, ChIP-seq, RNA-seq and QHR-4C) have been deposited into the NCBI Gene Expression Omnibus (GEO) database with the accession number GSE150254.