Epigenetic reprogramming of a distal developmental enhancer cluster drives SOX2 overexpression in breast and lung adenocarcinoma

Abstract
Enhancer reprogramming has been proposed as a key source of transcriptional dysregulation during tumorigenesis, but the molecular mechanisms underlying this process remain unclear. Here, we identify an enhancer cluster required for normal development that is aberrantly activated in breast and lung adenocarcinoma. Deletion of the SRR124–134 cluster disrupts expression of the SOX2 oncogene, dysregulates genome-wide transcription and chromatin accessibility and reduces the ability of cancer cells to form colonies in vitro. Analysis of primary tumors reveals a correlation between chromatin accessibility at this cluster and SOX2 overexpression in breast and lung cancer patients. We demonstrate that FOXA1 is an activator and NFIB is a repressor of SRR124–134 activity and SOX2 transcription in cancer cells, revealing a co-opting of the regulatory mechanisms involved in early development. Notably, we show that the conserved SRR124 and SRR134 regions are essential during mouse development, where homozygous deletion results in the lethal failure of esophageal–tracheal separation. These findings provide insights into how developmental enhancers can be reprogrammed during tumorigenesis and underscore the importance of understanding enhancer dynamics during development and disease.


INTRODUCTION
Developmental enhancers are commissioned during early embryogenesis, as transcription factors progressively restrict the epigenome by repressing regulatory regions associated with pluripotency (1, 2) and activating enhancers that control the expression of lineage-specific developmental genes (3-5). This establishes a cell type-specific epigenetic regulatory 'memory' that maintains cell lineage commitment and reinforces transcriptional programs (6). As cells mature and development ends, developmental enhancers are decommissioned, and the enhancer landscape becomes highly restrictive and developmentally stable (6). This landscape, however, becomes profoundly disturbed during tumorigenesis, as cancer cells aberrantly acquire euchromatin features at regions near oncogenes (7, 8) that are often associated with earlier stages of cell lineage specification (6). This 'enhancer reprogramming' has been proposed to produce a dysfunctional state that causes widespread abnormal gene expression and cellular plasticity (9-13). Although the misactivation of enhancers has been suggested as a major source of transcriptional dysregulation (reviewed in 14, 15), it remains largely unclear how this mechanism unfolds during cancer progression. To study this process, we evaluated cis-regulatory elements involved in driving transcription during normal development and disease.
SRY-box transcription factor 2 (SOX2) is a pioneer transcription factor that is required for pluripotency maintenance in embryonic stem cells (16, 17), is involved in reprogramming differentiated cells into induced pluripotent stem cells in mammals (18-20) and acts as an oncogene in several different types of cancer (reviewed in 21, 22). Later in development, SOX2 is also required for tissue morphogenesis and homeostasis of the brain (23), eyes (24), esophagus (25), inner ear (26), lungs (27), skin (28), stomach (29), taste buds (30) and trachea (31) in both human and mouse. In these tissues, SOX2 expression is precisely regulated in space and time at critical stages of development, although in most cases the cis-regulatory regions that mediate this precision remain unknown. For example, proper levels of SOX2 expression are required during early development for the complete separation of the anterior foregut into the esophagus and trachea in mice (25, 32, 33) and in humans (34-36), as disruption of SOX2 expression leads to an abnormal developmental condition known as esophageal atresia with distal tracheoesophageal fistula (EA/TEF) (reviewed in 37, 38). After the anterior foregut is properly separated in mice, Sox2 is expressed along the gut from the esophagus to the stomach (25, 29), and throughout the trachea, bronchi and upper portion of the lungs in the developing airways (31). Proper branching morphogenesis at the tips of the lungs, however, requires temporary down-regulation of Sox2, followed by reactivation after lung bud establishment (27). Sox2 also retains an essential function in multiple mature epithelial tissues, where it is highly expressed in proliferative and self-renewing adult stem cells that replace terminally differentiated cells within the epithelium of the brain, bronchi, esophagus, stomach and trachea (29, 31, 39, 40). The expression of Sox2, however, becomes repressed as stem cells differentiate in these tissues (39).
Here, we investigated the mechanisms underlying SOX2 overexpression in cancer. We found that, in breast and lung adenocarcinoma, SOX2 is driven by a novel developmental enhancer cluster we termed SRR124-134, rather than by the previously identified SRR1, SRR2 or SCR enhancers. This distal cluster contains two regions located 124 and 134 kb downstream of the SOX2 promoter that drive transcription in breast and lung adenocarcinoma cells. Deletion of this cluster results in significant SOX2 down-regulation, leading to genome-wide changes in chromatin accessibility and a globally disrupted transcriptome. The SRR124-134 cluster is highly accessible in most breast and lung patient tumors, where chromatin accessibility at these regions correlates with SOX2 overexpression and is regulated positively by FOXA1 and negatively by NFIB. Finally, we found that both SRR124 and SRR134 are highly conserved in the mouse and are essential for postnatal survival, as homozygous deletion of their homologous regions results in lethal EA/TEF. These findings provide a clear example of how different types of cancer cells reprogram enhancers that were decommissioned during development to drive the expression of oncogenes during tumorigenesis.

Genome editing
Guide RNA (gRNA) sequences were designed using Benchling. To minimize the risk of unwanted off-target mutations, we selected only gRNAs with no off-target sites carrying fewer than 3 bp mismatches. Pairs of gRNA plasmids were constructed by inserting a 20 bp target sequence (Supplementary Table S1) into an empty gRNA cloning vector (a gift from George Church; Addgene plasmid #41824) (74) containing either the miRFP670 (Addgene plasmid #163748) or the tagBFP (Addgene plasmid #163747) fluorescent marker. Plasmids were sequenced to confirm correct insertion. Both gRNA vectors (1 μg each) were co-transfected with 3 μg of pCas9 GFP (a gift from Kiran Musunuru; Addgene plasmid #44719) (75) using Neon electroporation (Life Technologies). At 72 h after transfection, cells were sorted by fluorescence-activated cell sorting (FACS) to select cells that contained all three plasmids. Sorted tagBFP+/GFP+/miRFP670+ cells were grown as a bulk population and serially diluted into individual wells to generate isogenic populations. Once fully grown, each well was screened by polymerase chain reaction (PCR) to confirm the deletion (Supplementary Table S2). Enhancer-deleted cells are available to the research community upon request.
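The off-target selection rule above can be illustrated with a minimal Python sketch. It assumes the candidate off-target sites for each 20 nt spacer have already been enumerated (Benchling performs the genome-wide search in practice), ignores PAM context and position-specific mismatch weighting, and uses made-up sequences; the function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of mismatched bases between two equal-length DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def passes_off_target_filter(spacer, off_target_sites, min_mismatches=3):
    """Keep a spacer only if every off-target site differs by >= min_mismatches bases.

    Equivalently, reject any spacer with an off-target site carrying fewer than
    3 mismatches (0, 1 or 2), mirroring the selection rule described above.
    """
    return all(mismatches(spacer, site) >= min_mismatches for site in off_target_sites)


# Hypothetical 20 nt spacer and two hypothetical off-target sites
spacer = "GACTTCGATCGGATCCAAGT"
distant_site = "GACTTCGATCGGATCCTTCA"  # 4 mismatches: acceptable
close_site = "GACTTCGATCGGATCCAACA"    # 2 mismatches: disqualifies the spacer

keep_a = passes_off_target_filter(spacer, [distant_site])              # True
keep_b = passes_off_target_filter(spacer, [distant_site, close_site])  # False
```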

Gene tagging
SOX2 was tagged with a P2A-tagBFP sequence in both alleles using clustered regularly interspaced short palindromic repeats (CRISPR)-mediated homology-directed repair (HDR) (76). This strategy results in the expression of a single transcript that is translated into two separate proteins owing to ribosomal skipping (77). Briefly, we designed a gRNA that targets the 3′ end of the SOX2 stop codon (Supplementary Table S1, Addgene plasmid #163752). We then amplified ∼800 bp homology arms upstream and downstream of the gRNA target sequence using high-fidelity Phusion Polymerase. We purposely avoided amplifying the SOX2 promoter sequence to reduce the likelihood of random integrations in the genome. Both homology arms were then joined to each end of a P2A-tagBFP sequence using Gibson assembly. Flanking primers containing the gRNA target sequence were used to reamplify SOX2-P2A-tagBFP and add gRNA targets at both ends of the fragment; this approach allows excision of the HDR sequence from the backbone plasmid once inside the cell (78). Finally, the full HDR sequence was inserted into a pJET1.2 (Thermo Scientific) backbone, midiprepped and sequenced (Addgene #163751). A 3 μg aliquot of HDR template was then co-transfected with 1 μg of hCas9 (a gift from George Church; Addgene plasmid #41815) (74) and 1 μg of gRNA plasmid using Neon electroporation (Life Technologies). One week after transfection, tagBFP+ cells were FACS sorted as a bulk population. Sorted cells were grown for a further 2 weeks, and single tagBFP+ cells were isolated to generate isogenic populations. Once fully grown, each clone was screened by PCR and sequenced to confirm homozygous integration of P2A-tagBFP into the SOX2 locus (Supplementary Table S2). MCF-7 SOX2-P2A-tagBFP cells are available to the research community upon request.
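The layout of the HDR donor described above (gRNA site, left arm, P2A-tagBFP, right arm, gRNA site) can be summarized as a simple string assembly. This is a hypothetical Python sketch: the sequences are placeholders, the 700-900 bp arm-length window around the ∼800 bp target and the in-frame length check are illustrative assumptions, and the real construct was built by PCR and Gibson assembly rather than in silico.

```python
def assemble_hdr_donor(grna_site, left_arm, p2a_tagbfp, right_arm,
                       min_arm=700, max_arm=900):
    """Return grna_site + left_arm + p2a_tagbfp + right_arm + grna_site.

    The flanking gRNA sites let Cas9 cut the donor out of the plasmid backbone
    inside the cell; the arms provide ~800 bp of homology on each side.
    """
    for label, arm in (("left", left_arm), ("right", right_arm)):
        if not min_arm <= len(arm) <= max_arm:
            raise ValueError(f"{label} homology arm is {len(arm)} bp; expected ~800 bp")
    # Assumed check: an insert whose length is a multiple of 3 keeps downstream
    # codons in frame when placed at a codon boundary.
    if len(p2a_tagbfp) % 3 != 0:
        raise ValueError("P2A-tagBFP insert length is not a multiple of 3")
    return grna_site + left_arm + p2a_tagbfp + right_arm + grna_site


# Placeholder sequences (not the real SOX2 arms or P2A-tagBFP cassette)
site = "GACTTCGATCGGATCCAAGTAGG"  # hypothetical 20 nt target + NGG PAM
left = "A" * 800
insert = "ATG" * 256              # hypothetical 768 bp in-frame cassette
right = "T" * 800
donor = assemble_hdr_donor(site, left, insert, right)
```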

Luciferase assay
Luciferase activity was measured using the Dual-Luciferase Reporter Assay (Promega #E1960), which relies on the co-transfection of two plasmids: pGL4.23 (firefly luciferase, luc2) and pGL4.75 (Renilla luciferase). Assayed plasmids were constructed by subcloning into the empty pGL4.23 vector, which contains a minimal promoter (minP). SRR124, SRR134, SRR1, SRR2 and hSCR were PCR amplified (primers are given in Supplementary Table S3) from MCF-7 genomic DNA using high-fidelity Phusion Polymerase and inserted in the forward orientation downstream of the luc2 gene at the NotI restriction site. Constructs were sequenced to confirm correct insertions. JASPAR2022 (79) was used to detect FOXA1 (GTAAACA) and NFIB (TGGCAnnnnGCCAA) motifs in the SRR134 sequence. Only motifs with a score of ≥80% were analyzed further. Bases within each motif were mutated until its score fell below 80%, without affecting co-occurring motifs or creating novel binding sites. In total, four FOXA1 motifs and two NFIB motifs were mutated (Supplementary Table S4). Engineered sequences were ordered as gene blocks (Eurofins) and inserted into pGL4.23 in the forward orientation. Constructs were sequenced to confirm correct insertions.
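The ≥80% score criterion and the iterative motif mutation can be illustrated with a position weight matrix (PWM). This Python sketch assumes a JASPAR-style relative score, (score - min)/(max - min), computed from a log-odds PWM against a uniform background. The toy count matrix built around the GTAAACA consensus is hypothetical and is not the JASPAR2022 FOXA1 matrix, and the greedy mutation step does not check co-occurring motifs or newly created sites as the procedure above did.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def relative_score(pwm, site):
    """Relative score: 0 for the worst possible site, 1 for the best (uppercase ACGT only)."""
    score = sum(col[base] for col, base in zip(pwm, site))
    worst = sum(min(col.values()) for col in pwm)
    best = sum(max(col.values()) for col in pwm)
    return (score - worst) / (best - worst)


def scan(pwm, sequence, threshold=0.8):
    """Report (position, strand, score) for windows scoring >= threshold on either strand."""
    width, hits = len(pwm), []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            s = relative_score(pwm, site)
            if s >= threshold:
                hits.append((i, strand, s))
    return hits


def disrupt_motif(pwm, site, threshold=0.8):
    """Greedily apply the substitution that lowers the score most until it drops below threshold."""
    site = list(site)
    while relative_score(pwm, site) >= threshold:
        loss, pos, base = max(
            (col[site[i]] - min(col.values()), i, min(col, key=col.get))
            for i, col in enumerate(pwm)
        )
        if loss <= 0:
            break  # no single substitution can lower the score further
        site[pos] = base
    return "".join(site)


# Hypothetical toy matrix: the consensus base gets 17 counts, every other base 1
consensus = "GTAAACA"
pwm = log_odds_pwm([{b: (17 if b == c else 1) for b in BASES} for c in consensus])
```

With this toy matrix each mismatch costs one seventh of the score range, so a single substitution leaves a site at about 86% and two substitutions are needed to fall below 80%.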
Cells were plated in 96-well plates at 2 × 10⁴ cells per well, with four technical replicates. After 24 h, 200 ng of a 50:1 mixture of enhancer vector and pGL4.75 was transfected using Lipofectamine 3000 (0.05 μl of Lipofectamine per 1 μl of Opti-MEM). For transcription factor overexpression analysis, 200 ng of a 50:10:1 mixture of enhancer vector, expression plasmid and pGL4.75 was transfected. At 48 h after transfection, cells were lysed in 1× Passive Lysis Buffer and stored at −80 °C until all five biological replicates were completed. Luciferase activity was measured in a Fluoroskan Ascent FL plate reader. Enhancer activity was calculated by normalizing the firefly signal from pGL4.23 to the Renilla signal from pGL4.75.
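The DNA amounts implied by the transfection mixtures, and the firefly/Renilla normalization, can be made explicit with a short Python sketch. Averaging the ratio across technical replicate wells and expressing activity as a log2 fold change over the empty pGL4.23 (minP) vector are assumptions added for illustration, and the luminescence readings are hypothetical.

```python
import math
from statistics import mean


def split_by_ratio(total_ng, ratio):
    """Split a total DNA mass (ng) according to a mixing ratio, e.g. 50:1 or 50:10:1."""
    parts = sum(ratio)
    return [total_ng * r / parts for r in ratio]


def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio across technical replicate wells."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return mean(f / r for f, r in zip(firefly, renilla))


def log2_fold_over_control(activity, control_activity):
    """log2 ratio of normalized activity versus a control (assumed: empty minP vector)."""
    return math.log2(activity / control_activity)


enhancer_ng, renilla_ng = split_by_ratio(200, [50, 1])     # ~196.08 ng and ~3.92 ng
enh_ng, tf_ng, ren_ng = split_by_ratio(200, [50, 10, 1])  # ~163.93, ~32.79 and ~3.28 ng

# Hypothetical readings from four technical replicate wells
construct = normalized_activity([8000, 8400, 7600, 8000], [100, 105, 95, 100])
empty_minp = normalized_activity([1000, 1050, 950, 1000], [100, 105, 95, 100])
```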

Colony formation assay
MCF-7 and PC-9 cells were seeded at low density (2,000 cells/well) into 6-well plates in triplicate for each cell line. Culture medium was renewed every 3 days. After 12 days, cells were fixed with 3.7% paraformaldehyde for 10 min and stained with 0.5% crystal violet for 20 min to quantify the number of colonies formed. The crystal violet stain was then eluted with 10% acetic acid, and absorbance was measured at 570 nm to evaluate cell proliferation. Each 6-well plate was considered one biological replicate, and the experiment was repeated five times for each cell line (n = 5).

Transcriptome analysis
Total RNA was isolated from wild-type (WT; ENH+/+) and enhancer-deleted (ENH-/-) cell lines using the RNeasy kit. Genomic DNA was digested with TURBO DNase. A 500-2,000 ng aliquot of total RNA was used in a reverse transcription reaction with random primers. cDNA was diluted in H2O and amplified by quantitative PCR (qPCR) using SYBR Select Mix (primers are given in Supplementary Table S5). Amplicons were sequenced to confirm primer specificity. Gene expression was normalized to PUM1 (80-82).
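The text states only that expression was normalized to PUM1. A common way to do this is the comparative Ct (2^-ΔΔCt) method, which assumes near-100% amplification efficiency for both primer pairs. The Python sketch below applies it to hypothetical Ct values, with ENH+/+ cells as the calibrator; the choice of method is an assumption, not a description of the authors' calculation.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, target_ct_calibrator, ref_ct_calibrator):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values; the reference gene
    (here PUM1) corrects for input differences, and the calibrator sample
    (here ENH+/+ cells) sets the baseline to 1.
    """
    delta_sample = mean(target_ct) - mean(ref_ct)
    delta_calibrator = mean(target_ct_calibrator) - mean(ref_ct_calibrator)
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical SOX2 and PUM1 Ct values
fold = relative_expression(
    target_ct=[28.0, 28.1, 27.9],             # SOX2 in ENH-/- cells
    ref_ct=[22.0, 22.0, 22.0],                # PUM1 in ENH-/- cells
    target_ct_calibrator=[24.0, 24.0, 24.0],  # SOX2 in ENH+/+ cells
    ref_ct_calibrator=[22.0, 22.0, 22.0],     # PUM1 in ENH+/+ cells
)
# ddCt = (28 - 22) - (24 - 22) = 4, so fold = 2**-4 = 0.0625
```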
Cancer patient transcriptome data were obtained from TCGA (90) using the TCGAbiolinks package (91). The overall survival Kaplan-Meier (KM) plot (92) was generated using clinical information from TCGA (93). Tumor transcriptome data were compared with normal tissue using DESeq2. RNA-seq reads were normalized to library size using DESeq2 (88) and transformed to a log2 scale (log2 counts). Differential gene expression was considered significant if |log2FC| > 1 and Q < 0.01.
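DESeq2 normalizes for library size with median-of-ratios size factors. The pure-Python sketch below re-implements only that step, plus the significance rule stated above, to make the thresholds concrete. The log2(count + 1) transform and the toy count matrix are assumptions; in practice DESeq2 itself performs these steps.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[s][g] is the count of gene g in sample s."""
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        # Genes with a zero in any sample are excluded (their geometric mean is zero)
        log_geo_means.append(None if 0 in values else sum(map(math.log, values)) / len(values))
    factors = []
    for sample in counts:
        ratios = [math.log(sample[g]) - lg for g, lg in enumerate(log_geo_means) if lg is not None]
        factors.append(math.exp(median(ratios)))
    return factors


def log2_normalized(counts, factors, pseudocount=1.0):
    """log2(count / size factor + pseudocount) for every sample and gene."""
    return [[math.log2(c / f + pseudocount) for c in sample] for sample, f in zip(counts, factors)]


def is_differentially_expressed(log2fc, q, fc_cutoff=1.0, q_cutoff=0.01):
    """Significance rule used in the text: |log2FC| > 1 and Q < 0.01."""
    return abs(log2fc) > fc_cutoff and q < q_cutoff


# Toy matrix: the second library was sequenced twice as deeply as the first
toy = [[10, 20, 30], [20, 40, 60]]
sf = size_factors(toy)           # approximately [0.707, 1.414]
norm = log2_normalized(toy, sf)  # both rows become identical after normalization
```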
Gene set enrichment analysis (GSEA) was performed by ranking genes according to their log2FC in ENH-/- versus ENH+/+ MCF-7 cells. The ranked list was then analyzed using the GSEA function from the clusterProfiler package (94) against the MSigDB GO term collection (C5), with a significance threshold of FDR-adjusted Q < 0.05.
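To make the ranking-based analysis concrete, the Python sketch below computes the weighted running-sum enrichment score (ES) on which GSEA is built (weight exponent p = 1). It is illustrative only: clusterProfiler's GSEA function additionally normalizes the ES against a permutation null to obtain the NES and Q values, which is not reproduced here, and the gene names and log2FC values are hypothetical.

```python
def rank_by_log2fc(log2fc):
    """Order genes from most up-regulated to most down-regulated (ENH-/- vs ENH+/+)."""
    ranked = sorted(log2fc.items(), key=lambda item: item[1], reverse=True)
    return [g for g, _ in ranked], [v for _, v in ranked]


def enrichment_score(genes, stats, gene_set, p=1.0):
    """Maximum deviation from zero of the weighted running sum.

    Hits step up in proportion to |stat|**p; misses step down uniformly.
    A negative ES means the set is concentrated among down-regulated genes.
    """
    hits = [g in gene_set for g in genes]
    n_miss = len(genes) - sum(hits)
    hit_weight = sum(abs(s) ** p for s, h in zip(stats, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running, extreme = 0.0, 0.0
    for s, h in zip(stats, hits):
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranking of four genes
genes, stats = rank_by_log2fc({"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0})
es_down = enrichment_score(genes, stats, {"g3", "g4"})  # -1.0
es_up = enrichment_score(genes, stats, {"g1", "g2"})    # approximately 1.0
```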

Chromatin accessibility analysis
Cells were grown in three separate wells (n = 3), and 50,000 cells were sent to the Princess Margaret Genomics Centre for ATAC-seq library preparation using the Omni-ATAC protocol (95). ATAC-seq libraries were sequenced as 50 bp paired-end reads on the Illumina NovaSeq 6000 platform. Read quality was checked with FastQC, and reads were trimmed using fastp and mapped to the human genome (GRCh38/hg38) using STAR 2.7. Narrow peaks were called using Genrich (https://github.com/jsh58/Genrich). Differential chromatin accessibility analysis was performed using DiffBind (96). ATAC-seq peaks with |log2FC| > 1 and FDR-adjusted Q < 0.01 were considered significantly changed. Correlation heatmaps were generated using DiffBind. A signal enrichment plot was prepared using ngs.plot (89), with genes separated into three categories according to their expression levels in our ENH+/+ MCF-7 RNA-seq data.
Transcription factor footprint analysis was performed using TOBIAS (97) with standard settings. For this analysis, replicates (n = 3) were merged into a single BAM file for each condition, and motifs with |log2FC| > 0.1 and FDR-adjusted Q < 0.01 were considered significantly enriched in each condition. Motif enrichment at differential ATAC-seq peaks was analyzed using HOMER (98). ATAC-seq peaks were assigned to their closest gene within ±1 Mb of its promoter using ChIPpeakAnno (99).
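The peak-to-gene assignment rule (closest gene within ±1 Mb) can be sketched as a binary search over sorted transcription start sites. This Python version is a simplified stand-in for ChIPpeakAnno, not its API: it assumes a single chromosome, uses each gene's TSS as its promoter position and the peak midpoint as the peak position, and the coordinates and gene names are hypothetical.

```python
import bisect


def nearest_gene(peak_start, peak_end, tss_sorted, max_distance=1_000_000):
    """Return the gene whose TSS is closest to the peak midpoint, or None beyond max_distance.

    tss_sorted: list of (tss_position, gene_name) on the peak's chromosome,
    sorted by position.
    """
    if not tss_sorted:
        return None
    midpoint = (peak_start + peak_end) // 2
    positions = [pos for pos, _ in tss_sorted]
    i = bisect.bisect_left(positions, midpoint)
    # Only the neighbours on either side of the insertion point can be closest
    candidates = [tss_sorted[j] for j in (i - 1, i) if 0 <= j < len(tss_sorted)]
    pos, gene = min(candidates, key=lambda item: abs(item[0] - midpoint))
    return gene if abs(pos - midpoint) <= max_distance else None


# Hypothetical TSS positions on one chromosome
tss = [(100_000, "GENE_A"), (2_000_000, "GENE_B")]
```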
Cancer patient ATAC-seq data were obtained from TCGA (100). DNase-seq data from developing human tissues were obtained from ENCODE (Supplementary Table S6) (85, 86). Reads were quantified at the RAB7A (pRAB7A), OR5K1 (pOR5K1) and SOX2 (pSOX2) promoters, together with the SRR1, SRR2, SRR124, SRR134, hSCR and desert regions, using a 1500 bp window centered on the core of each region (genomic coordinates of each region are given in Supplementary Table S7). Reads were normalized to library size [reads per million (RPM)] and transformed to a log2 scale (log2 RPM) using a custom script (https://github.com/luisabatti/BAMquantify). The average log2 RPM of each region was compared with that of the OR5K1 promoter using Dunn's test with Holm correction. Correlations were calculated using Pearson's correlation test and considered significant if FDR-adjusted Q < 0.05. Chromatin accessibility at the SRR124 and SRR134 regions was considered low if log …
ATAC-seq data from developing mouse lung and stomach tissues were obtained from ENCODE (Supplementary Table S6) (85) and others (101). Conserved mouse regulatory regions were lifted over from the human build (GRCh38/hg38) to the mouse build (GRCm38/mm10) using UCSC liftOver (102). The number of mapped reads was calculated at the Egf (pEgf), Olfr266 (pOlfr266) and Sox2 (pSox2) promoters, together with the mouse mSRR1, mSRR2, mSRR96, mSRR102, mSCR and desert regions, using a 1500 bp window at each location (genomic coordinates are given in Supplementary Table S8). The log2-transformed RPM (log2 RPM) of each region was compared with that of the negative-control Olfr266 promoter using Dunn's test with Holm correction.
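The window-based quantification and the multiple-testing step can be sketched in Python as follows. This is not the BAMquantify script: counting reads from BAM files is omitted (the read count in each window is passed in directly), and returning -inf for an empty window is an assumption. Holm's step-down correction is shown because it is the adjustment applied to the Dunn's test p-values; the example p-values are hypothetical.

```python
import math


def centered_window(core_position, width=1500):
    """Half-open [start, end) window of `width` bp centred on a region's core."""
    half = width // 2
    return core_position - half, core_position + half


def log2_rpm(reads_in_window, total_mapped_reads):
    """Reads per million mapped reads, on a log2 scale (-inf if the window is empty)."""
    rpm = reads_in_window * 1_000_000 / total_mapped_reads
    return math.log2(rpm) if rpm > 0 else float("-inf")


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is multiplied by (m - rank)
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical Dunn's test p-values for three region-versus-pOR5K1 comparisons
adjusted = holm_adjust([0.01, 0.04, 0.03])  # approximately [0.03, 0.06, 0.06]
```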

ChIP-seq analysis
ChIP-seq data for transcription factors and histone modifications were obtained from ENCODE (85) (Supplementary Table S6) and others (106-108) (Supplementary Table S9). H3K4me1 and H3K27ac tracks were normalized to input and library size (log2 RPM). Histone modification ChIP-seq tracks and transcription factor ChIP-seq peaks were uploaded to the UCSC Genome Browser (102) for visualization. Normalized H3K4me1 and H3K27ac reads were quantified, and differences in normalized signal were calculated using DiffBind. Peaks with |log2FC| > 1 and Q < 0.01 were considered significantly changed.
Overlapping ChIP-seq and ATAC-seq peaks were analyzed using ChIPpeakAnno (99). The significance of the overlap was assessed with a hypergeometric test, in which the total number of possible peak positions was approximated as the genome size divided by the median peak size.
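The hypergeometric overlap test can be written out explicitly. In this Python sketch, the genome is treated as genome_size / median_peak_size equally sized slots, one peak set occupies K of them and the other set draws n; the p-value is the probability of observing at least the reported number of overlaps by chance. Log-gamma terms avoid huge integers at genome scale. Treating peaks as independent, equally sized slots is a simplifying assumption, and the example numbers are hypothetical.

```python
import math


def _log_comb(n, k):
    """log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def overlap_pvalue(n_overlap, n_peaks_a, n_peaks_b, genome_size, median_peak_size):
    """P(X >= n_overlap) for X ~ Hypergeometric(N slots, K peaks in set A, n peaks in set B)."""
    N = genome_size // median_peak_size  # number of peak-sized slots in the genome
    K, n = n_peaks_a, n_peaks_b
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("peak counts cannot exceed the number of slots")
    log_total = _log_comb(N, n)
    lowest = max(n_overlap, n - (N - K), 0)  # fewer than n-(N-K) overlaps is impossible
    p = sum(
        math.exp(_log_comb(K, k) + _log_comb(N - K, n - k) - log_total)
        for k in range(lowest, min(K, n) + 1)
    )
    return min(1.0, p)


# Small hypothetical example: 20 slots, 5 peaks in set A, 6 in set B, 3 shared
p_small = overlap_pvalue(3, 5, 6, 2000, 100)
```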

Mouse line construction
Our mSRR96-102 knockout mouse line (C57BL/6J; Chr3 SRR124-SRR134 del) was ordered from and generated by The Centre for Phenogenomics (TCP) model production core in Toronto, ON. The protocol for generating the mouse line has been described previously (109). Briefly, C57BL/6J zygotes were collected from superovulated, mated and plugged female mice at 0.5 days post-coitum. Zygotes were electroporated with CRISPR-associated protein 9 (Cas9) ribonucleoprotein (RNP) complexes (gRNA sequences are given in Supplementary Table S1) and transferred into pseudopregnant female recipients within 3-4 hours of electroporation. Newborn pups (potential founders) were screened by endpoint PCR and sequenced to confirm allelic mSRR96-102 deletions (Supplementary Table S2). One heterozygous mSRR96-102 founder (mENH+/-) was then backcrossed twice to the parental strain to reduce the probability of off-target mutation segregation and to confirm germline transmission. Off-target mutagenesis by Cas9 is rare in mouse embryos using this protocol (110). Neither of the two gRNAs used for the mSRR96-102 deletion had any predicted off-target sites with fewer than 3 bp mismatches. Furthermore, no off-target hits were found within exonic regions on chromosome 3, where Sox2 is located. Potential changes in chromosomal copy number were also ruled out by real-time PCR.
Once the mouse line was established and the mSRR96-102 deletion was fully confirmed and sequenced in the N1 offspring, mENH+/- mice were intercrossed, and the number of live pups of each genotype (mENH+/+, mENH+/-, mENH-/-) was assessed at weaning (P21). The observed number of live pups of each genotype was then compared with the expected Mendelian ratio of 1:2:1 (mENH+/+:mENH+/-:mENH-/-) using a chi-squared test. Once the lethality of the homozygous deletion was confirmed at weaning, E18.5 littermate embryos generated from new mENH+/- crosses were collected for further histological analyses.
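The Mendelian-ratio comparison is a chi-squared goodness-of-fit test with two degrees of freedom, for which the p-value has the closed form exp(-χ²/2). The Python sketch below uses hypothetical pup counts chosen only to show the calculation; they are not the counts observed in this study.

```python
import math


def mendelian_chi2(observed, ratio=(1, 2, 1)):
    """Chi-squared goodness-of-fit test of genotype counts against a Mendelian ratio.

    observed: counts for (mENH+/+, mENH+/-, mENH-/-).
    Returns (chi-squared statistic, p-value). The closed-form p-value is only
    valid for three genotype classes (2 degrees of freedom).
    """
    if len(observed) != 3 or len(ratio) != 3:
        raise ValueError("expected three genotype classes")
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.exp(-chi2 / 2)  # chi-squared survival function for df = 2


# Hypothetical litters totalling 60 weaned pups with no homozygous survivors
chi2, p = mendelian_chi2((20, 40, 0))  # expected 15:30:15, chi2 = 20.0, p ~ 4.5e-5
```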
All procedures involving animals were performed in compliance with the Animals for Research Act of Ontario and the Guidelines of the Canadian Council on Animal Care. The TCP Animal Care Committee reviewed and approved all procedures conducted on animals at the facility. Sperm from male mENH+/- mice has been cryopreserved at the Canadian Mouse Mutant Repository (CMMR) and is available upon request.

Histological analyses
A total of 46 embryos were collected at E18.5, fixed in 4% paraformaldehyde and genotyped. Fifteen embryos (Supplementary Table S10), five of each genotype (mENH+/+, mENH+/-, mENH-/-), were randomly selected, processed and embedded in paraffin for sectioning and further analysis. Tissue sections were collected at 4 μm thickness, roughly at the start of the thymus. Sections were prepared by the Pathology Core at TCP. Tissue sections were stained with hematoxylin and eosin (H&E) using an auto-stainer to ensure batch consistency. Slides were scanned using a Hamamatsu Nanozoomer slide scanner at ×20 magnification. For immunohistochemistry, E18.5 embryo cross-sections were subjected to heat-induced epitope retrieval with Tris-EDTA (pH 9.0) for 10 min, followed by quenching of endogenous peroxidase with Bloxall reagent (Vector). Non-specific antibody binding was blocked with 2.5% normal horse serum (Vector), followed by incubation for 1 hour with rabbit anti-SOX2 (Abcam, ab92494, 1:500). After washes, sections were incubated for 30 min with ImmPRESS anti-rabbit horseradish peroxidase (HRP; Vector), followed by 3,3′-diaminobenzidine (DAB) reagent, and counterstained with Mayer's hematoxylin.
For immunofluorescence staining, E18.5 embryo cross-sections were collected onto charged slides and baked at 60 °C for 30 min. Tissue sections were subjected to heat-induced epitope retrieval with citrate buffer (pH 6.0) for 10 min. Non-specific antibody binding was blocked with Protein Block Serum-Free (Dako) for 10 min, followed by overnight incubation at 4 °C in a primary antibody cocktail (rabbit anti-NKX2.1, Abcam ab76013 at 1:200; rat anti-SOX2, Thermo Fisher Scientific 14-9811-80 at 1:100). After washes with TBS-T, sections were incubated for 1 hour with a cocktail of Alexa Fluor-conjugated secondary antibodies at 1:200 (goat anti-rabbit IgG AF488, Thermo Fisher Scientific A32731; goat anti-rat IgG AF647, Thermo Fisher Scientific A21247), followed by counterstaining with 4′,6-diamidino-2-phenylindole (DAPI). Slides were scanned using an Olympus VS-120 slide scanner, and all dark-field and fluorescent images were acquired with a Hamamatsu ORCA-R2 C10600 digital camera.

Two regions downstream of SOX2 gain enhancer features in cancer cells
SOX2 overexpression occurs in multiple types of cancer (reviewed in 21, 22). To examine which cancer types have the highest levels of SOX2 up-regulation, we performed differential expression analysis, calculating the log2FC of SOX2 transcription in 21 TCGA primary solid tumor types (see Supplementary Table S11 for cancer type abbreviations) compared with normal tissue samples (90). We found that BRCA (log2FC = 3.31), COAD (log2FC = 1.38), GBM (log2FC = 2.05), LIHC (log2FC = 3.22), LUAD (log2FC = 1.36) and LUSC (log2FC = 4.91) tumors had the greatest SOX2 up-regulation (log2FC > 1; FDR-adjusted Q < 0.01; Figure 1A; Supplementary Table S12). As a negative control, we ran the same analysis using the housekeeping gene PUM1 (81) and found no cancer types with significant up-regulation of this gene (Supplementary Figure S1A; Supplementary Table S13).
Alongside gains in chromatin features, another characteristic of active enhancers is the binding of numerous (>10) transcription factors (121-123). Chromatin immunoprecipitation sequencing (ChIP-seq) data from ENCODE (85) on 117 transcription factors revealed 48 different factors present at the SRR124-134 cluster in MCF-7 cells, 47 of which were present at SRR134 (Figure 1D). Transcription factors bound at both SRR124 and SRR134 include CEBPB, CREB1, FOXA1, FOXM1, NFIB, NR2F2, TCF12 and ZNF217. An additional feature of distal enhancers is that they contact their target genes through long-range chromatin interactions (124, 125). We analyzed Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) data from MCF-7 cells (126) and found two relevant RNA polymerase II (RNAPII)-mediated chromatin interactions: one between the SOX2 gene and SRR134, and one between SRR124 and SRR134 (Figure 1E). Beyond the interactions with SOX2, we also identified long-range interactions between SRR124 and the upstream long non-coding RNA (lncRNA) SOX2-OT (∼665 kb away), between SRR134 and the downstream lncRNA LINC01206 (∼150 kb away), and between SRR134 and the upstream RSRC1 gene (∼23 Mb away) (Supplementary Table S18). In addition to MCF-7 cells, we found that the H520 (LUSC), PC-9 (LUAD) and T47D (luminal A BRCA) cancer cell lines, which display varying levels of SOX2 expression (Supplementary Figure S1E), also gained substantial enhancer features at SRR124 and SRR134 compared with normal tissue (Figure 1E) (85, 106, 108, 127). Together, these data suggest that SRR124 and SRR134 could be active enhancers driving SOX2 transcription in BRCA, LUAD and LUSC.

The SRR124-134 cluster is essential for SOX2 expression in BRCA and LUAD cells
To assess SRR124 and SRR134 enhancer activity alongside the embryonic-associated SRR1, SRR2 and hSCR regions, we used a reporter vector containing the firefly luciferase gene under the control of a minimal promoter (minP, pGL4.23). We transfected each enhancer construct into the BRCA (MCF-7, T47D), LUAD (PC-9) and …

Figure 1 caption (partial). … (90). RNA-seq reads were normalized to library size using DESeq2 (88). Error bars: SD. Significance analysis by Dunn's test (180) with Holm correction (181). (C) 1500 bp genomic regions within ±1 Mb of the SOX2 transcription start site (TSS) that gained enhancer features in MCF-7 cells (85) compared with normal breast epithelium (86). Regions that gained both ATAC-seq and H3K27ac ChIP-seq signal above our threshold (log2FC > 1, dashed line) are highlighted in pink. Each region is labeled according to its distance in kilobases from the SOX2 promoter (pSOX2, bold). (D) ChIP-seq signal for H3K4me1 and H3K27ac, ATAC-seq signal and transcription factor ChIP-seq peaks at the SRR124-134 cluster in MCF-7 cells. Datasets are from ENCODE (85). (E) UCSC Genome Browser (102) display of H3K4me1 and H3K27ac ChIP-seq signal, DNase-seq and ATAC-seq chromatin accessibility signal, and ChIA-PET RNA polymerase II (RNAPII) interactions around the SOX2 gene in breast (normal tissue and two BRCA cancer cell lines) and lung (normal tissue, one LUAD and one LUSC cancer cell line) samples (85, 106, 108, 127). Relevant RNAPII interactions (between SRR124 and SRR134, and between SRR134 and pSOX2) are highlighted in maroon.

SOX2 regulates pathways associated with epithelium development in luminal A BRCA
Given the established role of SOX2 in regulating proliferation and differentiation pathways in other epithelial cells (40, 130), we further investigated the molecular function of SOX2 in luminal A BRCA cells by leveraging our SOX2-depleted ENH-/- MCF-7 cell model. GSEA showed a significant (FDR-adjusted Q < 0.05) depletion of multiple epithelium-associated processes within the transcriptome of ENH-/- MCF-7 cells, as indicated by negative normalized enrichment scores (NES < 0; Supplementary Table S20). These processes included epidermis development (NES = -1.93; Q = 0.001; Figure 3A), epithelial cell differentiation (NES = -1.67; Q = 0.007; Figure 3B) and cornification (NES = -2.11; Q = 0.006; Figure 3C). Cornification is the process of terminal differentiation of epidermal cells, wherein these cells undergo a specialized form of programmed cell death to produce a layer of flattened, dead cells with a high keratin content (reviewed in 131). This suggests that SOX2 has a pivotal role in regulating epithelial development and differentiation pathways in luminal A BRCA cells.
Expression of neither GRHL2 nor RUNX2, however, was significantly affected by SOX2 down-regulation in ENH-/- MCF-7 cells (-1 ≤ log2FC ≤ 1; Supplementary Table S19), suggesting that these genes are not directly regulated by SOX2 at the transcriptional level but may interact with it at the protein level.

The SRR124-134 cluster is associated with SOX2 overexpression in primary tumors
With the confirmation that the SRR124-134 cluster drives SOX2 overexpression in BRCA and LUAD cell lines, we investigated chromatin accessibility at this enhancer cluster within primary tumors isolated from cancer patients. By analyzing the pan-cancer ATAC-seq dataset from TCGA (100), we found that SRR124 and SRR134 are most accessible in LUSC, LUAD, BRCA, bladder carcinoma (BLCA), stomach adenocarcinoma (STAD) and uterine endometrial carcinoma (UCEC) patient tumors (Figure 4A). We also quantified the ATAC-seq signal at six other regions: the SOX2 embryonic-associated enhancers (SRR1, SRR2 and hSCR), the SOX2 promoter (pSOX2), a gene regulatory desert with no enhancer features located between the SOX2 gene and the SRR124-134 cluster (desert), and the promoter of the housekeeping gene RAB7A (pRAB7A, positive control). We then compared the chromatin accessibility levels at each of these regions with those at the promoter of the repressed olfactory gene OR5K1 (pOR5K1, negative control). Both SRR124 and SRR134 showed significantly increased (P < 0.05, Holm-adjusted Dunn's test) chromatin accessibility compared with pOR5K1 in BLCA (SRR124 P = 0.014; SRR134 P = 1.52 × 10 …
One potential explanation for increased chromatin accessibility could be locus amplification. While LUSC tumors had high levels of chromatin accessibility, probably related to previously described SOX2 amplifications (58, 59, 111, 112), most patient tumors showed no evidence of locus amplifications extending to the SRR124-134 cluster, as evidenced by the lack of significant (P > 0.05) accessibility at the intermediate desert region. In contrast, the SRR124-134 cluster displayed a consistent pattern of accessible chromatin across multiple cancer types: BLCA, BRCA, LUAD, LUSC, STAD and UCEC (Figure 4C). GBM and LGG tumors lacked accessible chromatin at this cluster but displayed increased chromatin accessibility at the SRR1 and SRR2 enhancers (Supplementary Figure S4A; Supplementary Table S26), which is consistent with evidence that SRR1 and SRR2 drive SOX2 expression in the neural lineage (23, 71, 135).
Next, we reasoned that an accessible SRR124-134 cluster drives SOX2 transcription within patient tumors. If this were the case, we expected to find a positive and significant correlation between chromatin accessibility at this enhancer cluster and at pSOX2. Indeed, we found that the majority of BRCA (58%), LUAD (82%) and LUSC (69%) tumors have concurrent accessibility (log2 RPM > 0) at pSOX2, SRR124 and SRR134. Patient tumors also showed a significant (P < 0.05) correlation (Pearson R) between the accessible chromatin signal at pSOX2 and at both SRR124 and SRR134 in BRCA and LUAD (Figure 4D). LUSC tumors showed a significant correlation between accessible chromatin at pSOX2 and SRR124, but not at SRR134 (Figure 4D). As a negative control, we measured the correlation between chromatin accessibility at pSOX2 and at the SOX2 desert region and found no significant (P > 0.05) correlation in any of these cancer types (Supplementary Figure S4B). We also conducted a similar analysis after segregating BRCA tumors into luminal A, luminal B, HER2+ and basal-like subtypes (100, 117). Interestingly, both luminal A and luminal B tumors showed a significant (P < 0.05) correlation between enhancer accessibility and pSOX2 accessibility, whereas the correlation was weaker in HER2+ tumors (Supplementary Figure S4C). Basal-like tumors, on the other hand, displayed no accessible chromatin at either SRR124 or SRR134. These results support a strong association between increased accessibility at the SRR124-134 cluster and both luminal BRCA subtypes and LUAD. Finally, by separating BRCA, LUAD and LUSC patient tumors according to their chromatin accessibility at SRR124 and SRR134, we found that tumors with the most accessible chromatin at each of these regions also significantly (P < 0.05, t-test) overexpress SOX2 compared with tumors with low chromatin accessibility at these regions (Figure 4E; Supplementary Table S27). Together, these data are consistent with a model in which increased chromatin accessibility at the SRR124-134 cluster drives SOX2 overexpression in breast and lung patient tumors.
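The per-tumor summaries above can be sketched in Python: the fraction of tumors with concurrent accessibility (log2 RPM > 0 at pSOX2, SRR124 and SRR134), Pearson's R between two accessibility vectors, and Benjamini-Hochberg adjustment of a batch of correlation p-values (the FDR-adjusted Q used in the methods). The input values are hypothetical, and p-values for R would come from the usual t-distribution test (for example, scipy.stats.pearsonr), which is not re-implemented here.

```python
import math
from statistics import mean


def fraction_concurrent(tumors):
    """Fraction of tumors with log2 RPM > 0 at every region (pSOX2, SRR124, SRR134)."""
    return sum(all(v > 0 for v in regions) for regions in tumors) / len(tumors)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (Q values), in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (pSOX2, SRR124, SRR134) log2 RPM values for four tumors
tumors = [(1.2, 0.8, 1.5), (0.9, -0.3, 0.4), (2.1, 1.1, 0.2), (-0.5, 0.6, 0.9)]
frac = fraction_concurrent(tumors)  # 0.5
```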

FOXA1 and NFIB are upstream regulators of the SRR124-134 cluster
Given the evidence that the SRR124-134 cluster drives SOX2 overexpression in patient tumors, we investigated which transcription factors regulate this cluster in BRCA, LUAD and LUSC tumors from TCGA (90, 100). From a comprehensive list of 1622 human transcription factors (136), we found 115 transcription factors whose expression significantly correlated (FDR-adjusted Q < 0.05) with chromatin accessibility at SRR124 and 90 transcription factors whose expression correlated with accessibility at SRR134 (Figure 5A; Supplementary Table S28). From this list, we focused our investigation on FOXA1 and NFIB, both of which bind SRR124 and SRR134 in ChIP-seq data from MCF-7 cells (85).
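Screening 1622 transcription factors requires correcting for multiple testing before applying the Q < 0.05 cutoff. The text does not name the FDR procedure, so the sketch below assumes the Benjamini-Hochberg method, which is a common default. The factor names and P values are hypothetical placeholders.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted q values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def significant_factors(names, pvalues, alpha=0.05):
    """Names whose BH-adjusted q value falls below `alpha`."""
    q = benjamini_hochberg(pvalues)
    return [name for name, qi in zip(names, q) if qi < alpha]


# Hypothetical per-factor correlation P values (illustrative only).
names = ["TF1", "TF2", "TF3", "TF4"]
pvals = [0.001, 0.004, 0.03, 0.5]
print(significant_factors(names, pvals))  # ['TF1', 'TF2', 'TF3']
```

Recent SciPy releases also provide `scipy.stats.false_discovery_control` for the same adjustment.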
To assess the influence of these transcription factors on enhancer activity, we overexpressed either FOXA1 or NFIB in H520, MCF-7, PC-9 and T47D cells and compared SRR124 and SRR134 enhancer activity, measured by luciferase reporter assay, with that of cells transfected with an empty vector (mock). Although FOXA1 and NFIB are highly expressed endogenously in MCF-7 and T47D cells but not in H520 and PC-9 cells (Supplementary Figure S5A), overexpression of FOXA1 significantly increased (log2 FC > 1; P < 0.05, Tukey's test) the enhancer activity of both SRR124 and SRR134 in all four cell lines, whereas NFIB overexpression significantly decreased (log2 FC < −1; P < 0.05) SRR124 and SRR134 enhancer activity in the H520, MCF-7 and T47D cell lines (Figure 5F). These results indicate that FOXA1 increases SRR124-134 activity, whereas NFIB represses the enhancer activity of this cluster.
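The fold-change thresholds used above can be made concrete with a short sketch. It assumes the luciferase readings are already normalized (the text does not state the normalization, for example to a co-transfected control reporter) and applies a symmetric |log2 FC| > 1 cutoff. The triplicate values and helper names are hypothetical. Statistical significance is a separate step: the text uses Tukey's test, which SciPy exposes as `scipy.stats.tukey_hsd`.

```python
import math
from statistics import mean


def log2_fold_change(treated, mock):
    """log2 ratio of mean normalized reporter activity (treated vs mock)."""
    return math.log2(mean(treated) / mean(mock))


def classify(lfc, threshold=1.0):
    """Label a change using a symmetric |log2 FC| > threshold cutoff."""
    if lfc > threshold:
        return "increased"
    if lfc < -threshold:
        return "decreased"
    return "unchanged"


# Hypothetical normalized luciferase readings (triplicates, not study data).
mock = [0.5, 1.0, 1.5]
foxa1 = [3.0, 4.0, 5.0]
nfib = [0.25, 0.25, 0.25]

for label, values in (("FOXA1", foxa1), ("NFIB", nfib)):
    lfc = log2_fold_change(values, mock)
    print(label, lfc, classify(lfc))
```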
To assess the importance of FOXA1 and NFIB motifs in modulating enhancer activity, we analyzed the SRR134 sequence using the JASPAR2022 motif database (79) and mutated the FOXA1 (GTAAACA) or NFIB (TGGCAnnnnGCCAA) motifs to eliminate binding. Mutation of the FOXA1 motif abolished SRR134 enhancer activity, measured by luciferase reporter assay, compared with the WT SRR134 sequence in MCF-7 (P = 1.53 × 10^−5, Tukey's test), PC-9 (P = 1 × 10^−2) and T47D (P = 4.48 × 10^−6) cells, whereas the NFIB-mutated construct showed no significant change (P > 0.05) in enhancer activity (Figure 5G). These findings underscore the pivotal role of the FOXA1 motif in maintaining SRR134 activity. Loss of the NFIB motif, by contrast, did not measurably alter SRR134 activity in this reporter context, which is consistent with NFIB acting as a negative regulator whose removal has little additional effect when enhancer activity is already elevated.
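Locating the consensus sites before mutating them can be sketched as a simple string scan on both strands. This is a swapped-in, simplified technique: JASPAR scanning normally scores position weight matrices (for example with Biopython's `Bio.motifs` module), whereas the sketch below only matches the consensus strings quoted above, treating `n` as any base. The test sequences are hypothetical, not the SRR134 sequence.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits for a consensus motif on both strands.

    'N' in the motif matches any base; starts are 0-based forward-strand
    coordinates. A lookahead is used so that overlapping hits are reported.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, site in (("+", motif), ("-", reverse_complement(motif))):
        pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
        hits.extend((m.start(), strand) for m in pattern.finditer(seq))
    return sorted(hits)


FOXA1_SITE = "GTAAACA"
NFIB_SITE = "TGGCAnnnnGCCAA"

# Hypothetical test sequences, not the real SRR134 sequence.
print(find_motif("CCGTAAACATTTGTTTACGG", FOXA1_SITE))  # [(2, '+'), (11, '-')]
print(find_motif("AATGGCAACGTGCCAATT", NFIB_SITE))     # [(2, '+')]
```

A mutated construct would be expected to return no hits for the disrupted motif, which gives a quick sanity check on the designed sequence.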
With evidence that these two transcription factors modulate SRR124-134 activity, we investigated their effects on SOX2 transcription. We used CRISPR-mediated homology-directed repair (HDR) to create an MCF-7 cell line in which the SOX2 gene is tagged with a 2A self-cleaving peptide (P2A) followed by a blue fluorescent protein (tagBFP). This cell line, MCF-7 SOX2-P2A-tagBFP, allows rapid visualization of SOX2 transcriptional changes by measuring tagBFP signal using FACS. To validate this model, we sorted cells within the top 10% (BFP+ve) and bottom 10% (BFP−ve) of tagBFP signal (Supplementary Figure S5B). BFP+ve cells showed a significant (P = 4.25 × 10^−5, paired t-test) increase in SOX2 expression and significantly up-regulated transcription of enhancer RNA (eRNA) at SRR124 (P = 1.54 × 10 […] 5I). This confirms the repressive effect of NFIB on SOX2 expression and illustrates a potential mechanism upstream of SOX2 that modulates chromatin accessibility at the SRR124-134 cluster and, in turn, SOX2 transcription in cancer cells.
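The top-10% and bottom-10% sorting gates described above amount to percentile cutoffs on the tagBFP intensity distribution. The sketch below is a hypothetical illustration of that idea only: real sort gates are drawn in the cytometer's acquisition software, and the 100 evenly spaced intensities are invented. The percentile uses linear interpolation, as `numpy.percentile` does by default.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation (q in [0, 100])."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def gate_extremes(signal, fraction=0.10):
    """Return (BFP-ve, BFP+ve) events: bottom and top `fraction` of signal."""
    low_cut = percentile(signal, 100 * fraction)
    high_cut = percentile(signal, 100 * (1 - fraction))
    bfp_neg = [s for s in signal if s <= low_cut]
    bfp_pos = [s for s in signal if s >= high_cut]
    return bfp_neg, bfp_pos


# Hypothetical tagBFP intensities for 100 events.
events = list(range(1, 101))
neg, pos = gate_extremes(events)
print(len(neg), len(pos), max(neg), min(pos))  # 10 10 10 91
```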
Since critical developmental genes are often controlled by enhancers that are highly conserved across species (138, 139), we hypothesized that the SRR124-134 cluster might regulate SOX2 expression during the development of other species. By analyzing PhyloP conservation scores (102, 103), we found that both SRR124 and SRR134 contain a highly conserved core sequence that is preserved across mammals, birds, reptiles and amphibians (Figure 6C). After aligning and comparing enhancer sequences between humans and mice, we found that the core sequences at both SRR124 and SRR134 are highly conserved (>80%) in the mouse genome (Supplementary Figure S6A). We termed these homologous regions mSRR96 (96 kb downstream of the mouse Sox2 promoter; homologous to human SRR124) and mSRR102 (102 kb downstream of the mouse Sox2 promoter; homologous to human SRR134). Enhancer feature analysis in developing mouse lung and stomach tissues (85, 101) showed that both mSRR96 and mSRR102 display increased chromatin accessibility and H3K27ac signal throughout development, from embryonic day 14.5 (E14.5) to postnatal week 8 (Figure 6D). Interestingly, mSRR96 and mSRR102 display higher ATAC-seq and H3K27ac signal at later developmental stages in the lungs but at earlier stages in the stomach, suggesting a distinct spatiotemporal contribution of this homologous cluster to Sox2 expression during the development of these tissues in the mouse. Furthermore, ATAC-seq quantification showed that both mSRR96 (lung P = 5.54 × 10^−5; stomach P = 2.37 × 10^−4; Holm-adjusted Dunn's test) and mSRR102 (lung P = 1.27 × 10^−3; stomach P = 0.046) are significantly more accessible than the repressed promoter of the olfactory gene Olfr266 (pOlfr266, negative control) during development of the mouse lungs and stomach (Supplementary Figure S6B; Supplementary Table S32). Together, these results suggest a SOX2 regulatory mechanism conserved across multiple species and support a model in which the SRR124 and SRR134 enhancers and their homologs regulate SOX2 expression during the development of the digestive and respiratory systems.
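The ">80%" conservation figure comes from comparing aligned human and mouse core sequences. The text does not say how identity was computed, and definitions differ mainly in how gaps are counted. The sketch below assumes one common convention (matches divided by ungapped alignment columns) and takes a pre-computed pairwise alignment as input; the short aligned fragments are hypothetical, not the real SRR124 or mSRR96 sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned human/mouse core fragments (not the real sequences).
human = "ACGT-ACGTA"
mouse = "ACGTTACCTA"
pid = percent_identity(human, mouse)
print(round(pid, 1), pid > 80)  # 88.9 True
```

Counting gap columns in the denominator instead would give a lower value (8 of 10, or 80%, for the same fragments), so the convention should be stated alongside any reported threshold.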
To assess the contribution of the mSRR96 and mSRR102 regions to mouse development, we generated a C57BL/6J knockout line carrying a deletion spanning the mSRR96-102 enhancer cluster (mENH) (Figure 6E). We crossed animals carrying a heterozygous mSRR96-102 deletion (mENH+/−) and determined the number of pups of each genotype alive at weaning (P21). We found a significant (P = 1.13 × 10^−4, chi-squared test) deviation from the expected Mendelian ratio, with no homozygous (mENH−/−) mice alive at weaning (Figure 6F), demonstrating that the mSRR96-102 enhancer cluster is crucial for survival in the mouse. To investigate the phenotype resulting from homozygous deletion of the mSRR96-102 cluster, we collected E18.5 littermate embryos and prepared cross-sections at the level of the thymus from five animals of each genotype (mENH+/+, mENH+/− and mENH−/−) (Figure 6G). Similar to other studies that interfered with Sox2 expression during development (25, 32, 33), we found that all five mENH−/− embryos developed EA/TEF, in which the esophagus and trachea fail to separate during embryonic development (Figure 6H; Supplementary Figure S6C). In contrast, mENH+/+ and mENH+/− embryos displayed normal development of the esophageal and tracheal tissues. Immunohistochemistry revealed a complete absence of SOX2 protein within the EA/TEF tissue of mENH−/− embryos, whereas mENH+/+ and mENH+/− embryos showed high levels of SOX2 protein in both the esophageal and tracheal tubes (Figure 6I). Finally, immunofluorescence staining for NKX2.1, a transcription factor associated with the inner epithelium of the respiratory tract (140), showed high protein levels within the inner layer of the EA/TEF tissue in mENH−/− embryos, indicating that this aberrant tissue resembles a tracheal-like structure lacking SOX2 (Supplementary Figure S7A). Together, these results demonstrate that mSRR96 and mSRR102 are required to drive Sox2 expression during the development and separation of the esophagus and trachea.
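The deviation from Mendelian inheritance reported above is a chi-squared goodness-of-fit test of weaning genotype counts against the 1:2:1 ratio expected from a heterozygous intercross. The sketch below uses hypothetical counts (the study's counts are in Figure 6F) and relies on the exact identity that, for 2 degrees of freedom, the chi-squared survival function is exp(−x/2). For other numbers of classes, `scipy.stats.chisquare(f_obs, f_exp=...)` handles the general case.

```python
import math


def mendelian_chi_square(observed, ratios=(1, 2, 1)):
    """Chi-squared goodness-of-fit of genotype counts to an expected ratio.

    Returns (statistic, P value). The closed-form P value used here is only
    valid for three genotype classes, i.e. 2 degrees of freedom.
    """
    if len(observed) != 3 or len(ratios) != 3:
        raise ValueError("expects counts for +/+, +/- and -/- genotypes")
    total, weight = sum(observed), sum(ratios)
    expected = [total * r / weight for r in ratios]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    return chi2, math.exp(-chi2 / 2)


# Hypothetical weaning counts (+/+, +/-, -/-); the study's counts are in Figure 6F.
stat, p = mendelian_chi_square((10, 20, 0))
print(round(stat, 2), round(p, 5))  # 10.0 0.00674
```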

DISCUSSION
Our findings reveal that the SRR124-134 enhancer cluster (mSRR96-102 in the mouse) is essential for Sox2 expression in the developing digestive and respiratory systems and is required for separation of the esophagus and trachea during mouse development. When embryogenesis is complete, Sox2 expression is down-regulated in most differentiated cell types as its developmental enhancers are decommissioned. We propose that aberrant up-regulation of the pioneer factor FOXA1 recommissions both SRR124 and SRR134 in tumor cells, driving SOX2 overexpression in breast and lung adenocarcinoma. SOX2 itself acts as a pioneer transcription factor throughout development, and we found that increased levels of this protein further reprogram the chromatin landscape of cancer cells, binding at multiple regulatory regions, increasing chromatin accessibility and driving up-regulation of genes associated with epithelium development. Previous studies have underscored the indispensable role of SOX2 in both preserving gene expression patterns and orchestrating long-range chromatin interactions in neural stem cells (141), where SOX2 acts as a master regulator (23, 142). Considering our observation that loss of SOX2 expression leads to a genome-wide reduction in chromatin accessibility and transcription, our results position SOX2 as a central agent in the aberrant activation of gene regulatory pathways that ultimately support a tumor-initiating phenotype in breast and lung adenocarcinomas.
Our discovery that enhancers involved in the development of the digestive and respiratory systems are reprogrammed to support SOX2 up-regulation during tumorigenesis is in line with previous observations that tumor-initiating cells acquire a less differentiated phenotype (143-146). It is more surprising, however, that the SOX2 gene is regulated by common enhancers in both breast and lung adenocarcinoma cells, as enhancers are usually highly tissue specific (6, 138, 139, 147). Our observation that FOXA1 expression is significantly correlated with chromatin accessibility at the SRR124-134 cluster and increases the transcriptional output of the SRR124 and SRR134 enhancers provides a mechanistic link between breast and lung developmental programs and cancer progression. FOXA1 is directly involved in the branching morphogenesis of the epithelium in breast (148, 149) and lung (150, 151) tissues, where SOX2 also plays an important role (27, 60). Overexpression of FOXA1 (6, 9, 10, 13, 152-154) and of SOX2 (55, 66, 155) has each been independently linked to the activation of transcriptional programs associated with multiple types of cancer.
Therefore, we propose that FOXA1 is one of the key factors responsible for reprogramming the SRR124-134 cluster in cancer, which in turn drives SOX2 overexpression in breast and lung tumors. Intriguingly, however, we were unable to detect a further increase in SOX2 expression in MCF-7 cells overexpressing FOXA1, despite observing up-regulation of SRR124 and SRR134 activity in the luciferase assay. Because FOXA1 is already highly expressed in MCF-7 cells, where SOX2 transcription is also high, we reason that exogenous FOXA1 may be unable to increase SOX2 expression further. Furthermore, our use of BFP as a fluorescent reporter of SOX2 transcription may have limited our ability to detect small changes in gene expression compared with the more sensitive luciferase reporter. Because mutation of the FOXA1 motif disrupted SRR134 enhancer activity, and this motif is shared among other members of the forkhead box (FOX) transcription factor family (156), it also remains possible that other FOX proteins are involved in activating the SRR124-134 cluster. For example, FOXM1, which also binds both SRR124 and SRR134 in MCF-7 cells, has been associated with poor patient outcomes when overexpressed in multiple types of cancer (157).
In addition to the activating role of FOXA1, we identified NFIB as a negative regulator of SOX2 expression through inhibition of SRR124-134 activity. NFIB is normally required for the development of multiple tissues (reviewed in 158), including the brain and lungs (159-161), tissues in which SOX2 expression is also tightly regulated (27, 142). In the lungs, NFIB is essential for promoting the maturation and differentiation of progenitor cells (159, 160). This is in stark contrast to SOX2, which inhibits the differentiation of lung cells (27). Interestingly, NFIB seems to have paradoxical roles in cancer, acting both as a tumor suppressor and as an oncogene in different tissues (162). Among its tumor suppressor activities, NFIB acts as a barrier to skin carcinoma progression (163), and its down-regulation is associated with dedifferentiation and aggressiveness in LUAD (164). Conversely, SOX2 promotes skin (66) and lung (165) cancer progression. As an oncogene, NFIB promotes cell proliferation and metastasis in STAD (166), a cancer in which SOX2 down-regulation is associated with poor patient outcomes (167-169). Given this inverse relationship between SOX2 and NFIB across multiple tissues, we propose that NFIB normally acts as a suppressor of SRR124-134 activity and SOX2 expression during the differentiation of progenitor cells, and that down-regulation of NFIB expression results in SOX2 overexpression during breast and lung tumorigenesis.
We initially hypothesized that SRR1 and SRR2 (70, 71, 170), and/or the SCR (72, 73), might be recommissioned during cancer progression, as stem cell-related enhancers have been shown to acquire enhancer features in tumorigenic cells (171). Although other studies have proposed the activation of either SRR1 (42, 69) or SRR2 (172, 173) as the main driver of SOX2 overexpression in BRCA, we found no evidence of this mechanism and instead identified the SRR124-134 cluster as the main driver of SOX2 expression in BRCA and LUAD. Our patient tumor analysis did show that GBM and LGG were the only cancer types with a unique and consistent pattern of accessible chromatin at SRR1 and SRR2, which is probably related to glioma cells assuming a neural stem cell-like identity to sustain high levels of cell proliferation in the brain (62). In fact, SRR2 deletion was shown to down-regulate SOX2 and reduce cell proliferation in GBM cells (174), highlighting the specificity of enhancer usage across tumor types. Similarly, our observation that PC-9 LUAD cells depend on SRR124-134 for SOX2 transcription, whereas SRR124-134 is dispensable in H520 LUSC cells, underscores these tumor type-specific regulatory mechanisms. LUSC tumors frequently amplify the SOX2 locus (58, 59, 111, 112), whereas LUAD tumors do not (175), indicating that different mechanisms are involved in genome dysregulation in these two subtypes of lung cancer. Indeed, FOXA1 expression was lowest in H520 cells among the four cell lines tested, which may explain the diminished transcriptional activity of the SRR124-134 cluster in this cell line. Interestingly, a further downstream enhancer cluster located ∼55 kb away from SRR124-134 exhibits high H3K27ac signal and is co-amplified with SOX2 in H520 cells and other LUSC cell lines (112), revealing an alternative mechanism that could sustain SOX2 overexpression in the absence of SRR124-134 activity in certain types of LUSC but not in LUAD.
Enhancer clusters often contain individual enhancers with partially redundant functions (128, 176, 177). Our analyses positioned SRR134 as the most potent enhancer within the SRR124-134 cluster. This is not surprising, since SRR134 also shows more transcription factor binding in MCF-7 cells, a key feature associated with enhancer activity (123). However, while SRR124 and SRR134 display similar chromatin accessibility in MCF-7 cells, PC-9 cells showed much greater accessibility at SRR134, whereas T47D and H520 cells showed a more accessible SRR124 region. Given that SOX2 expression is higher in MCF-7, T47D and H520 cells than in PC-9 cells, we postulate that simultaneous activation of both SRR124 and SRR134 may be crucial for optimal SOX2 transcription. Another distinguishing feature of these enhancers is the exclusive binding of CTCF at SRR124. CTCF is a transcription factor involved in chromatin architecture and distal enhancer-promoter loop formation at some loci (178, 179). Based on these findings, we propose that SRR124 acts as a tether between pSOX2 and SRR134, the latter functioning as a docking region for multiple transcription factors that ultimately drive SOX2 overexpression. Therefore, when both enhancers are accessible, chromatin dynamics may facilitate enhanced interactions between pSOX2 and the entire SRR124-134 cluster, ultimately elevating SOX2 transcription.
Deletion of mSRR96-102, the mouse homolog of the human SRR124-134 cluster, resulted in EA/TEF, which is also observed in human cases with heterozygous SOX2 mutations (34-36). A recent study showed that insertion of a CTCF insulation cluster downstream of the Sox2 gene, but upstream of mSRR96-102, disrupts Sox2 expression, impairs separation of the esophagus and trachea and results in perinatal lethality due to EA/TEF in the mouse (33). This was of particular interest for understanding enhancer functional nuances during development, since the SCR, which is required for Sox2 transcription at implantation, can partially overcome the insulator effect of this insertion. The authors proposed that enhancer density might explain the EA/TEF phenotype, as chromatin features suggested that enhancers in the developing lung and stomach tissues might be spread over a 400 kb domain (33). However, the fact that a 6 kb deletion removing the mSRR96-102 cluster is sufficient to cause EA/TEF argues against this explanation. Instead, we propose that differences in each cell type's sensitivity to Sox2 dosage underlie the apparently differing ability of CTCF to block distal enhancers. This proposal is based on two observations: in humans, heterozygous SOX2 mutations are linked with anophthalmia-esophageal-genital syndrome (34-36); in mice, hypomorphic Sox2 alleles display similar eye phenotypes (24) and EA/TEF (25, 32). Together, these observations suggest that in both species, cells of the peri-implantation embryo are less sensitive to reduced Sox2 dosage than cells of the developing airways and digestive system, which would explain the aberrant phenotypes observed at term. Overall, our findings illustrate how cis-regulatory regions can similarly drive gene expression in both normal and diseased contexts and serve as a prime example of how decommissioned developmental enhancers may be reprogrammed during tumorigenesis. That a digestive/respiratory-associated enhancer cluster drives gene expression in a non-native context such as BRCA remains intriguing and reinforces a model in which tumorigenic cells often revert to a progenitor-like state that combines cis-regulatory features of progenitor cells from multiple developing lineages (6). This 'dys-differentiation' mechanism seems to be centered around the overexpression of a few key development-associated pioneer transcription factors such as FOXA1 and SOX2. Identifying additional mechanisms that regulate the reprogramming of these enhancers could lead to new approaches to target tumor-initiating cells that depend on SOX2 overexpression.

DATA AVAILABILITY
Sequencing and processed data files were submitted to the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) repository (GSE132344).

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.