Chromatin regulatory dynamics of early human small intestinal development using a directed differentiation model

Abstract The establishment of the small intestinal (SI) lineage during human embryogenesis ensures functional integrity of the intestine after birth. The chromatin dynamics that drive SI lineage formation and regional patterning in humans are essentially unknown. To fill this knowledge void, we apply a cutting-edge genomic technology to a state-of-the-art human model of early SI development. Specifically, we leverage chromatin run-on sequencing (ChRO-seq) to define the landscape of active promoters, enhancers and gene bodies across distinct stages of directed differentiation of human pluripotent stem cells into SI spheroids with regional specification. Through comprehensive ChRO-seq analysis we identify candidate stage-specific chromatin activity states, novel markers and enhancer hotspots during the directed differentiation. Moreover, we propose a detailed transcriptional network associated with SI lineage formation or regional patterning. Our ChRO-seq analyses uncover a previously undescribed pattern of enhancer activity and transcription at HOX gene loci underlying SI regional patterning. We also validated this unique HOX dynamics by the analysis of single cell RNA-seq data from human fetal SI. Overall, the results lead to a new proposed working model for the regulatory underpinnings of human SI development, thereby adding a novel dimension to the literature that has relied almost exclusively on non-human models.


INTRODUCTION
The embryonic development of the small intestine (SI) is critical for a fetus to thrive and grow. The genetic programming of SI cell identity and regional specification during early development is fundamental to ensure proper SI morphogenesis and maturation with specialized functions, including nutrient digestion and absorption, energy balance and pathogen defense. Several seminal studies have identified important molecular regulators associated with gut development (1)(2)(3)(4)(5), and more recent studies have leveraged advanced genomic technologies (e.g. single cell RNA-seq and bulk ATAC-seq) to provide insights at a more granular level into gut development (6)(7)(8)(9)(10)(11); however, this work relies almost exclusively on animal models. Studies of human SI development have been few (7,12,13), due in large part to limited access to primary human fetal tissues. Initial human SI lineage formation from the endodermal germ layer, which occurs much prior to the developmental time points at which human fetal tissues are generally available, is essentially uncharacterized. In this study, we sought to fill this important knowledge gap by profiling for the first time the chromatin regulatory dynamics and transcriptional programs that are associated with human SI lineage formation and initial regional patterning.
Robust temporal and spatial regulation of gene expression is fundamental to all biological processes including development (14,15). Transcriptional programs are precisely controlled by promoters and distal cis-regulatory regions known as enhancers. Enhancers harbor binding sites for transcription factors (TFs), activate long-range gene transcription, and are often cell-type specific (16,17). RNA polymerases are known to be recruited to active enhancers, generating divergent short transcripts (also known as enhancer RNA or eRNAs) (18,19). Recently, an approach called chromatin run-on sequencing (ChRO-seq) (20) was developed for genome-wide identification of promoters, active enhancers, and actively transcribed gene bodies in a single assay. ChRO-seq represents the newest generation of nascent RNA sequencing technologies, and overcomes several limitations of previous versions including global run-on sequencing (GRO-seq) (21) and precision run-on sequencing (PRO-seq) (22). ChRO-seq was very recently successfully applied to archived solid tumor tissues (20,23).
Advances in the directed differentiation of human pluripotent stem cells (hPSCs), including human embryonic stem cells (hESCs), provide a powerful strategy for studying the early developmental events of human SI that would be nearly impossible otherwise (24,25). The multi-step process of generating human 'gut-in-a-dish' from hPSCs recapitulates many aspects of in vivo SI development (26)(27)(28), including induction of the endodermal fate, formation of SI lineage (SI spheroids) with regional identities, and maturation into three-dimensional human intestinal organoids (HIOs) that exhibit molecular, structural and functional features similar to those of the human fetal SI (13,29). While the protocol for directed differentiation of hPSCs into SI spheroids has been established, the molecular mechanisms governing this process remains minimally characterized. To reveal temporal dynamics of the chromatin state during early human SI development and regional patterning, we performed ChRO-seq across the following stages: hPSC, definitive endoderm (DE), duodenal (proximal SI) spheroid and ileal (distal SI) spheroid. This study provides the firstever view of the changing chromatin regulatory landscape, defines stage-specific enhancer hotspots and identifies key TF networks that underlie the acquisition of SI identity and the initiation of ileal regional patterning. Moreover, we uncover previously undescribed dynamics at HOX gene loci associated with these events. Additional analysis of single cell RNA-seq (scRNA-seq) of human fetal SI tissue underscores the unique HOX gene patterns, although these results are subject to the caveat that these tissues reflect developmental time points much later than what we have the opportunity to analyze in the directed differentiation model. Overall, this study offers an unprecedented resource that serves as a springboard from which the research community can develop and test targeted hypotheses about key regulatory hotspots and molecular drivers of early events of human SI development.

Directed differentiation of hESCs
Differentiation of H9 hESCs and organoids was performed as previously published, with minor modifications (27,30). Briefly, the endoderm was generated by treatment of activin A (100 ng/ml) for 3 consecutive days in Roswell Park Memorial Institute 1640 (RPMI-1640) media supplemented with 0% (v/v), 0.2% (v/v) and 2.0% (v/v) Hyclone defined fetal bovine serum (dFBS). The endoderm cultures then received daily treatments of FGF4 (500 ng/ml) and CHIR99021 (2 M) for next 10 days. The intestinal spheroids representing fetal duodenum and ileum were collected on day 5 and day 10, respectively (28). Subsets of spheroids collected at day 5 and day 10 were then cultured in Matrigel with the previously defined intestine growth media (28) for 28 days in order to mature into organoids. The resulting organoids were then prepared for cell sorting to purify EPCAM+ cell population (epithelial fraction) and the sorted cells were processed for RNA-seq library preparation. The cells generated at the stages of hESC, endoderm, duodenal spheroids and ileal spheroids were subject for ChRO-seq and RNA-seq library preparation.

Fluorescence activated cell sorting (FACS)
Methods for organoid dissociation into single cells followed by selection of the epithelial component with fluorescence activated cell sorting, were based on previously described procedures (31). All solutions, including overnight pretreatment of the organoid cultures, contained 10 M Y27632 (Tocris). Matrigel was digested for 30 min with cold 4 mM ethylenediaminetetraacetic acid (EDTA)-DPBS and organoids were washed 4× with cold DPBS. Structures were enzymatically dissociated into single cells using the Tumor Dissociation Kit (human) (Miltenyi Biotec) with a gentleMACS™ Octo Dissociator (with heaters; Miltenyi Biotec) for 50 min at 37 • C. The cell suspension was then washed with 0.5% bovine serum albumin (BSA)-2 mM EDTA-DPBS over a succession of cell strainers, 100 m, 40 m (Corning) and 20 m (CellTrics) and centrifuged for 5 min at 500 × g. Cells were labeled with EpCAM phycoerythrin (PE)-conjugated antibody (BioLegend) and an EpCAM isotype-PE control (BioLegend), and were sorted in 0.1% BSA 2 mM EDTA-DPBS on a MoFlo Astrios 1 (Beckman Coulter; Brea, CA, USA) instrument at the University of Michigan BRCF Flow Cytometry Core facility. Events were first selected with lightscatter and doublet discrimination gating, followed by exclusion of dead cells using 1 M DAPI dilactate (Molecular Probes). EpCAM-PE(+)/DAPI(−) cells were sorted into cold Advanced DMEM/F12 (Invitrogen). Collected cells were reanalyzed for a purity-check and showed greater than 89% viable and 98% EpCAM-PE(+) events. Cells were pelleted at 500 × g for 5 min and flash-frozen for subsequent RNA isolation.

Procr-mGFP-2A-LacZ mouse line and staining
The Procr-mGFP-2A-LacZ mouse line was generated by knocking in a cassette of mGFP-2A-LacZ behind start codon of the Procr gene (32). To perform X-gal staining, embryos were isolated and washed in cold PBS followed by incubation in ice-cold fixative (30 min up to E10.5, or 50 mins for E11.5-E13.5) on a rocking platform. Fixative contains 37% formaldehyde, 25% glutaraldehyde, 10% NP-40 all dissolved in PBS. The fixative was removed and the whole embryo washed twice in PBS for 20 min at room temperature on a rocking platform. The ␤-galactosidase substrate (5 mM K3FE(CN)6, 5 mM K4Fe(CN)6·3H2O, 2 mM MgCl2, 0.02% NP40, 0.1% sodium deoxycholate and 1 mg ml-1 X-gal in PBS) was then added and the tissues incubated in the dark overnight at room temperature.
The substrate was removed and the tissues washed twice in PBS for 20 min at room temperature on a rocking platform. Embryos were serially dehydrated using glycerol, and was stored in 80% glycerol at 4 • C. Whole-embryo images were captured using Olympus SZX16.

Chromatin isolation
The chromatin isolation for ChRO-seq library preparation was performed as previously described (20,33

Total RNA isolation, mRNA-seq library and sequencing
Total RNA was isolated using the Total Purification kit (Norgen Biotek, Thorold, ON, Canada). High Capacity RNA to cDNA kit (Life Technologies, Grand Island, NY, USA) was used for reverse transcription of RNA. Libraries were generated using the NEBNext Ultra II Directional Library Prep Kit (New England Biolabs, Ipswich, MA, USA) and subjected to sequencing (single-end 92×) on the NextSeq500 platform (Illumina) at the Cornell University Biotechnology Resource Center. At least 80M reads per sample were acquired.

Mapping sequencing reads
In the ChRO-seq studies, the publicly available pipeline (35) was used to align ChRO-seq reads. Since the libraries were prepared using adapters that contained a moleculespecific unique identifier (first 6 bp sequenced), the PCR duplicates were first removed using PRINSEQ lite. Adapters were trimmed from the 3 end of remaining reads using cutadapt with a 10% error rate. Reads were mapped with the Burrows-Wheeler Aligner (BWA) to the human reference genome hg38 plus a single copy of the Pol I ribosomal RNA transcription unit (GenBank U13369.1). The location of active RNA polymerase was represented by a single base that denotes the 3 end of the nascent RNA, which corresponds to the position on the 5 end of each sequenced read. Mapped reads were converted to bigwig format using BedTools and the bedGraphToBigWig program in the Kent Source software package. For visualization purpose, bigwig files from identical stages were merged and normalized to a total signal of 1 × 10 6 . In the RNA-seq studies, reads were mapped to human genome hg38 using STAR (v2.5.3a) (36) and transcript quantification was performed using Salmon (v0.6.0) (37) with GENCODE release 25 transcript annotations. The expression (RNA-seq) levels of genes were normalized using DESeq2 (38). All the samples except an Ile HIO sample had <80% mapping rates. Although the Ile HIO sample had an unfavorable mapping rate, it was able to show elevated levels of Ile-associated regional markers compared to the Duo HIO sample (Supplementary Figure  S3A).

ChRO-seq quantification of gene loci and promoter transcription activity
Gene definitions were obtained from GENCODE v25 annotations. To quantify transcription activity of gene loci, ChRO-seq signals present in annotated gene bodies were used with exclusion of reads within 500 base downstream of transcription start site (TSS) to avoid bias generated by the RNA polymerase pausing at the promoters. Genes with gen body <1000 base were excluded from all the gene body related analysis, given that genes with short gene bodies are likely biased when excluding the pause peak. The ChRO-seq reads were normalized by the length of gene bodies to transcripts per million (TPM). To determine promoter activity of genes, stranded ChRO-seq signals at the promoter proximal region (500 bps upstream and 200 bps downstream of TSS) were used. The promoter regions that have significant changing patterns were defined using the following criteria: average TPM > 5 and padj < 0.05 across all the stages by likelihood ratio test (DESeq2). The promoter regions that have significant changing patterns and are annotated as 'protein-coding' gene type were subject to clustering analysis ('degPatterns' in R platform).

Differential expression and pathway analyses of genes
The differential analysis of gene bodies in ChRO-seq data was performed using DESeq2 package (38). For all the analyses except stage-specific marker analysis, the normalized levels of transcription or expression, foldchange and the statistic filtering were based on the DESeq2 analysis including only the two stages in a comparison. The pathway enrichment analyses with subsets of genes were assessed using Enrichr (39).

Identification of stage-specific marker genes
First, genes that have ChRO-seq signal in gene bodies (excluding the first 500 bps downstream of the TSS) greater than 50 TPM in the stage of interest are identified and filtered according to padj < 0.2, P < 0.05, fold change > 1.5 compared to all the other stages (Wald test; DESeq2). Second, genes that have RNA-seq signal > 100 base mean units are identified and filtered according to padj < 0.2, P < 0.05, fold change > 1.5 compared to all the other stages (Wald test; DESeq2). Third, the output from these two analyses are intersected to arrive at final gene lists for stage-specific markers. To identify genes that label SI spheroids irrespective of regional identity, analyses were done using the same criteria except the fold change criterion. The genes that have short gene bodies (<1000 bp) or annotated under the pseudogene category are excluded from this analysis.

TRE identification, annotation, categorization and comparison with other datasets
The active transcriptional regulatory elements (TREs) were identified by dREG tool (35). The annotation of the identified TREs was defined using annotatePeaks.pl function (genome = hg38) of HOMER package (40) based on GENCODE v25 annotations. To compare the TRE landscape of SI spheroids with developing and mature SI in humans, DNase-seq datasets from the Roadmap Epigenomics Project were used. Specifically, first, the files E085-DNase.hotspot.fdr0.01.peaks.v2.bed (fetal SI) and E109-DNase.hotspot.fdr0.01.peaks.v2.bed (adult SI) were downloaded in hg19, converted to hg38 using the USCS liftOver tool and used to define DNase-based open chromatin regions unique to fetal and adult SI using the bedtools intersect function (at least 1 base overlapping). The fetal and adult SI-specific DNase open chromatin regions were further intersected with TREs present in SI spheroids using bedtools intersect function (at least 1 base overlapping). To assign active TREs as promoters, the TREs with at least 1 base overlapping with the window of −1000 base and +200 base of annotated TSSs were defined as proximal TREs, or promoters. The rest of the active TREs were defined as distal TREs, or enhancers.

Stage-specific TRE density analysis
The stage-specific TREs were defined as TREs of which the ChRO-seq intensity is significantly higher in a stage of interest relative to a comparative stage (padj < 0.05 and log 2 fold change > 2.5 by DESeq2) (38). The ChRO-seq intensity was determined by the sum of the unnormalized ChROseq reads from both strands within a TRE region. The density of stage-specific TREs was determined by the number of TREs present within the window of +100 and −100 kb around the TSS for all the genes which are actively transcribed in the stage of interest (TPM > 50 in the ChRO-seq study). The genes which are associated with stage-specific TREs are defined using the following criteria: (i) genes have density of stage-specific TRE > 0, (ii) genes are actively transcribed (TPM > 50 in the ChRO-seq) and expressed (base mean > 100 in the RNA-seq) in the stage of interest and (iii) the genes are significantly uptranscribed and upregulated (padj < 0.2, P < 0.05, fold change > 1.5 in both ChRO-seq and RNA-seq by DESeq2) in the stage of interest relative to a comparison stage.

Identification of de novo stage-specific enhancer hotspots and the associated genes
The stage-specific active distal TREs (enhancers) were used in the enhancer hotspot analysis. The enhancer hotspots in this study were identified by the criteria similar (with slight modifications) to the studies describing 'super-enhancers' (41,42). Briefly, the stage-specific enhancers in proximity of distance (<12.5 kb) were stitched. For each of the stagespecific stitched enhancers, the transcription activity was determined by the sum of un-normalized ChRO-seq signals from both strands of each individual enhancer. To further identify stage-specific enhancer hotspots, a tangent line was applied the stage-specific stitched enhancers and they were ranked based on their transcriptional activities in a plot. The ones above the tangent line in the analysis were defined as stage-specific enhancer hotpots. To identify the genes which are associated with stage-specific stitched enhancers, a given stage-specific stitched enhancer is assigned to the gene of which the transcription in the gene body is active (TPM > 50 in the matching stage) and the TSS is closest to the border of the enhancer hotspot region. To assess overlap between the enhancer hotspots defined in our study with regions of H3K27ac enrichment, we downloaded datasets of H3K27ac peaks from primary human fetal SI (E085-H3K27ac.gappedPeak) from the Roadmap Epigenomics Project. The peak coordinates were first converted from hg19 to hg38 using the UCSC liftOver tool, and then intersected with the enhancer hotspot regions using the bedtools intersect function.

Transcription factor binding motif enrichment analysis
HOMER tool (40) was used to determine enrichment of sites corresponding to known motifs with stage-specific TREs (relative to a comparative stage). More specifically, we used function findMotifsGenome.pl (genome = hg38 and size = given) and the TREs which are shared or unique to the comparative stage were used as background.

Transcription factor cistrome analysis
For a TF of interest, the binding motifs present in the stagespecific TREs were identified and the density of the motif was determined by the number of motifs within the window of +100 and −100 kb around the TSS for all the genes which are actively transcribed in the stage of interest (TPM > 50). The cistrome of a TF is defined using the following criteria: (i) genes have motif density > 0, (ii) genes are actively transcribed (TPM > 50 in the ChRO-seq) and expressed (base mean > 500 in the RNA-seq) in the stage of interest, and (iii) the genes are significantly uptranscribed and upregulated (padj < 0.2, P < 0.05, fold change > 1.5 in both ChRO-seq and RNA-seq by DESeq2) in the stage of interest relative to a comparison stage.

Single cell RNA-seq analysis
The scRNA-seq data of primary human fetal tissues was generated in recent studies (13,43). Briefly, tissue samples were processed into single cell suspension detailed in (13,43). As described in (13), the scRNA-seq dataset of human developing gastrointestinal organs and multiple gestational time points were generated by 10× Genomics platform and mapped to hg19 using default alignment parameters provided by Cell Ranger. Seurat (v3.1) package (44) was applied to perform the downstream analysis. Cells with >5% mitochondrial transcript proportion and fewer than 1000 unique detected genes were excluded. Ribosomal genes, mitochondrial genes and genes located on sex chromosomes were also removed before data integration into a Seurat object. The cell type annotation of the dataset was pre-defined and detailed in (13). In the present study, we focused on the data of matched human fetal duodenum and ileum samples (week 11, 14 and 19 of gestation).

Statistics
All the padj and P-values presented in this study were determined by Wald test (DESeq2), unless otherwise specifically noted.

ChRO-seq reveals temporal dynamics of nascent transcription at gene loci in the directed differentiation model of human developing SI
The directed differentiation of hPSCs (here we used H9 hESCs) into SI spheroids with regional specification was carried out as previously described (26,28) ( Figure 1A). The patterned SI spheroids represent the primitive form of the SI comprised of mainly stem/progenitor cells (45), which can give rise to different mature epithelial cell types after prolonged organoid culture, or following transplantation into a murine host (46). In this study, we included the four stages of this model: hESC, DE, duodenum (Duo) spheroid and ileal (Ile) spheroid ( Figure 1A). We carried out ChROseq in order to characterize the dynamics of genome-wide nascent transcriptional activity toward the goal of defining the changing landscape of promoters, enhancers and active gene bodies across the following sequential events: DE fate induction (from hESC to DE), SI lineage formation (from DE to Duo spheroid), and initial ileal regional patterning (from Duo to Ile spheroid) ( Figure 1A).
To assess nascent transcription of genes, we first analyzed ChRO-seq signal within the body of annotated genes. ChRO-seq data in the first 500 bp downstream of the TSS, was excluded to remove signal contributed by RNA polymerase pausing. Hierarchical clustering analysis of the gene transcriptional profiles demonstrates clean stratification of the different stages ( Figure 1B). The signal at gene bodies encoding TFs that are well-established markers of particular stages (SOX2, GATA6 and CDX2) indicate the expected specificity ( Figure 1C-D). Also, the very minimal expression of the mesenchymal marker FOXF1 ( Figure 1C-D), as well as markers of other organ lineages (Supplementary Figure S1A and B), demonstrate that the Duo and Ile spheroids are indeed dominated by early-stage SI epithelial cells. ChRO-seq can also be used to assess transcriptional activity of short, non-coding regions (e.g. microRNAs). We observed robust ChRO-seq signal within the miR-302-607 cluster, previously reported to be enriched in embryonic stem cells (47), in only the stages of hESC and DE and not in Duo or Ile spheroid (Supplementary Figure S1C). At other microRNA loci, such as miR-10a, we detected an enrichment of ChRO-seq signal in SI spheroids relative to hESC or DE (Supplementary Figure S1D).
Next we sought to identify from the ChRO-seq data differentially transcribed genes during each stage transition of this model (log 2 fold change > 1, average TPM > 25, padj < 0.2, P < 0.05 by Wald test; DESeq2) ( Figure 1E). The analyses identified 1886 differentially transcribed genes (1328 up; 558 down) during DE formation (DE versus hESC), 4024 differentially transcribed genes (2546 up; 1478 down) during SI lineage formation (Duo spheroids versus DE) and 362 differentially transcribed genes (182 up; 180 down) during ileal patterning event (Ile versus Duo spheroids) (Figure 1E). We also performed RNA-seq on the same stages to define differentially expressed genes and the Gene Ontology (GO) term enrichment analysis between differentially transcribed and differentially expressed genes are in general agreement (Supplementary Figure S2).
The same ChRO-seq data also affords the unique opportunity to evaluate promoter activity. Therefore, we next developed an approach to analyze promoter dynamics across all of the stages (Supplementary Figure S1E). We identified 9550 genes, the majority of which are protein-coding genes (n = 7651), which exhibit significantly changing patterns of promoter activity across different stages (padj < 0.05 by likelihood ratio test; DESeq2) (Supplementary Figure S1E). Notably, a subset of these protein-coding genes has low promoter activity in both hESC and DE but high promoter activity in both Duo and Ile spheroids (n = 2992). These include BEX5 and PROCR ( Figure 1F), which were recently observed to have robust expression in primary human fetal SI at early developmental time points (7,12). Analysis of the matched RNA-seq data, which showed a dramatic rise in BEX5 and PROCR mRNA levels during formation of SI spheroids from DE ( Figure 1G), confirm the findings from the promoter analysis. We also validated the upregulation of PROCR in SI spheroids by RT-qPCR with an independent batch of samples ( Figure 1H) and demonstrated by in situ hybridization staining a robust expression of Procr in the midgut region of Procr-mGFP-2A-LacZ mice (32) at E9.5 and E10.5 ( Figure 1I), indicating the conserved early presence of PROCR in the developing SI. Overall, these findings demonstrate that hPSC-derived SI spheroids recapitulate features of early developing SI and therefore represent the state-of-the-art model for further study of human SI lineage formation.

Identification of transcriptional markers of specific stages in the directed differentiation model of human developing SI
ChRO-seq signal at gene bodies (TPM > 50 in at least one stage) reveals the changing patterns of transcription across stages ( Figure 2A). We sought to identify transcriptional markers of each stage in this model. It is important to note that previously validated SI regional markers using human fetal duodenum and ileum ranging 14-19 weeks of gestation (28) are not sufficient to distinguish Duo from Ile spheroids, although they are sufficient to distinguish Duo from Ile HIOs (derived after culturing the spheroids in Matrigel for 28 days) (Supplementary Figure S3A). This likely reflects the fact that spheroids, which are newly differentiated, represent the early gut lineage whereas the HIOs represent a more mature state. Here, by leveraging ChRO-seq signal at gene bodies as well as RNA-seq data, we were able to define gene sets that serve as early markers of SI lineage formation and ileal regional patterning at the levels of both transcription and steady-state expression.
We first demonstrated that ChRO-seq and RNA-seq signal for genes are reasonably well-correlated (Pearson correlation coefficient = 0.73) ( Figure 2B and Supplementary Figure S3B-E), indicating that nascent transcription profiles globally reflect steady state expression profiles in this model system (albeit not perfectly, as expected due to other layers of gene regulation such as those at the posttranscriptional level). To identify genes that label a specific stage at the levels of both transcription and steady-state expression, we developed a bioinformatic pipeline to integrate ChRO-seq and RNA-seq data ('Materials and Methods' section) and performed this integrative analysis with genes that are actively transcribed (TPM > 50) in at least one stage. We identified genes that are significantly elevated according to both transcription and steady-state expression in each stage relative to all other stages: hESC (n = 239), DE (n = 149), SI spheroid irrespective of regional identity (n = 88), Duo spheroid (n = 9) and Ile spheroid (n = 40) ( Figure 2C and Supplementary Data 2). The stage-specific genes we identified include well-established markers, such as POU5F1 and SOX2 for hESCs ( Figure 2D) and NODAL, SOX17, EOMES and LEFTY2 for DE ( Figure 2E). Notably, while these known endoderm markers are highly transcribed in the stage of DE, markers of mesendoderm (e.g. TBXT and PAX6) are lowly transcribed in the DE stage (< 5 TPM; data not shown), suggesting that the DE cells used in this study are indeed highly enriched in cells with DE identity, instead of mesendoderm cells that still have dual potential for endoderm and mesoderm. Furthermore, consistent with the literature, CDX2 was identified to faithfully label SI spheroids irrespective of regional identity ( Figure  2F). We defined additional stage-specific markers, many of which are previously undescribed: HES4, FOXA1, WFDC2 and several HOXB family members for SI spheroids irrespective of regional identity; SP5 and SALL1 for Duo spheroids; and FJX1, IGF2, as well as several HOXA/C/D members for Ile spheroids (Figure 2F and G). We also identified several long, non-coding RNA (lncRNA) and antisense transcript markers: LINC00543 for SI spheroids irrespective of regional identity, and EVX1-AS for Ile spheroids ( Figure 2F and G). Notably, while many genes were defined as markers of SI spheroids irrespective of regional identity, the analysis identified much fewer genes as Duospecific markers compared than Ile-specific markers (Figure 2C), which may suggest that Ile regional specification is mainly driven by gain of additional driver genes rather than loss of genes contributing to Duo regional identity.

Transcriptional regulatory element profiles reveal notable rewiring of chromatin activity during directed differentiation
Active TREs, including promoters and enhancers, are identified in ChRO-seq data by the hallmark feature of short bi-directional transcription ( Figure 3A). To identify active TREs across the entire genome, we employed dREG (35), which was developed specifically for this purpose. Using this tool, we identified a total of 125863 active TREs across all four stages included in this study ( Figure 3B). The length distribution of these active TREs is consistent with what has been reported previously (48) (Supplementary Figure  S4A) and the vast majority of the active TREs, as expected, are located in intergenic regions, introns, and annotated TSSs (Supplementary Figure S4B). To specifically evaluate whether the TRE landscape of SI spheroids is more similar to developing as opposed to mature SI in humans, we leveraged data on DNase-accessible regions in both fetal and adult human SI available from the Roadmap Epigenomics Project. We first defined DNase-accessible regions unique to fetal or adult tissue ('Materials and Methods' section) and then intersected those with active TREs present in Duo and/or Ile spheroids. We found that 12541 out of 37892 (∼33%) open chromatin loci specific to fetal human SI overlap with TREs present in Duo and/or Ile spheroids, whereas only 7504 out of 45242 (∼16%) open chromatin loci specific to adult human SI overlap with TREs present in Duo and/or Ile spheroids ( Figure 3C). To provide a specific example of a TRE defined in this model that captures an important regulatory region associated with human fetal SI, we turned to the very recently identified intestinal critical region (ICR). The ICR represents an important regulatory element of the new gene PERCC1, which exhibits transient expression in the SI during development and has been linked to congenital diarrheal disorders in humans (49). We identified a TRE that is highly overlapping with the ICR and active only in SI spheroids, especially in Duo spheroids ( Figure 3D and E). Taken together, these observations demonstrate that TREs in hPSC-derived SI spheroids are able to capture important features of gene regulatory landscapes that are associated with the developing human SI.
We next categorized the identified active TREs into 49855 proximal and 76008 distal TREs, which from here on in we refer to as promoters and enhancers, respectively ( Figure  3A). Unsupervised hierarchical clustering analyses reveal that enhancer profiles stratify different stages more accurately and clearly than promoters ( Figure 3F and G), consistent with the notion that enhancer signature is the most cell-type specific (14,16). To further elucidate the dynamics of active TRE profiles across the directed differentiation process, we identified active TREs that are unchanged, gained (enhanced), or lost (suppressed) during the transition to DE ( Figure 3H), SI lineage formation ( Figure 3I), and ileal patterning ( Figure 3J) (log2FC > 2.5, padj < 0.05 by Wald test; DESeq2). We found that a much greater proportion of enhancers, relative to promoters, exhibit significantly altered levels of activity (either gain or loss) across all stage transitions. In contrast, enhancers and promoters were roughly equally represented among those TREs that were unchanged in activity across stage transitions ( Figure  3H-J).

Identification of genes associated with high a density of nearby enhancers that emerge during SI lineage formation or ileal patterning
Next we developed an analysis pipeline to further investigate the enhancers and associated genes that emerge during different stages of the directed differentiation ( Figure 4A). We focused on 'Duo-specific enhancers' and 'Ile-specific enhancers', which are the enhancers that gain activity during Duo and Ile spheroid formation, respectively ( Figure 3I and J). We then determined the density of stage-specific enhancers for every actively transcribed gene (TPM > 50 in the stage of interest) by counting stage-specific enhancers within a 200 kb window centered on the TSS ( Figure 4A). The genes that are associated with the stage-specific enhancers and also have significant activation in both ChROseq and RNA-seq are eventually defined as stage-specific enhancer linked genes ( Figure 4A). As defined in Figure 3, we identified a total of 2787 Duospecific and 328 Ile-specific enhancers ( Figure 4B and C). We confirmed that genes associated with a greater number of Duo-specific enhancers also exhibit greater increases in transcription in Duo spheroids relative to DE ( Figure 4D and E). Similarly, genes associated with a greater number of Ile-specific enhancers also exhibit greater increases in tran-scription in Ile relative to Duo spheroids ( Figure 4I   Regarding the transition from DE to Duo spheroids, we identified 117 genes that are associated with at least one Duo-specific enhancer and are significantly increased in both transcription (ChRO-seq) and steady-state expression (RNA-seq) ( Figure 4F). Among these, as expected, CDX2 is one of the top genes ranked by the density of Duospecific enhancers ( Figure 4G). Notably, even more highly ranked are the HOXB family members ( Figure 4H). Regarding ileal regional patterning, we identified 35 genes that are associated with at least one Ile-specific enhancer and are significantly increased in both transcription and steady state-expression in Ile relevant to Duo spheroids ( Figure  4K). Among these, the genes associated with the greatest number of Ile-specific enhancers include members of the HOXA/C/D clusters, FJX1 CSRNP1 and HOTTIP (Figure 4L and M), several of which were also identified as Ile spheroid-specific marker genes ( Figure 2G). These analyses together reveal previously undescribed temporal dynamics of the HOX cluster genes during the events of SI lineage formation followed by ileal regional patterning: first, the HOXB cluster is activated during Duo spheroid formation (likely by nearby Duo-specific enhancers), then subsequently the other three HOX clusters are activated during ileal patterning (likely by nearby Ile-specific enhancers) ( Figure 4H and M).

Identification of genes associated with enhancer 'hotspots' that emerge SI lineage formation or ileal patterning
It has been shown in previous studies that dense clusters of highly active enhancers (which we refer to as 'hotspots') occur nearby to genes that are especially critical for defining cell identity and status (41)(42)50). We sought to define for the first time the changing landscape of enhancer hotspots in this human model of SI development. To accomplish this, we adapted a previously described methodology (41,42), which requires ChIP-seq based datasets, to work with ChRO-seq data instead ( Figure 5A). By implementing this analysis pipeline with the Duo-and Ile-specific enhancers (defined in Figure 4B and C, respectively), we were able to identify enhancer hotspots and the associated genes relevant to SI lineage formation and ileal patterning, respectively.
To identify enhancer hotspots that emerge during SI lineage formation and ileal patterning, we clustered Duoand Ile-specific enhancers, respectively, into 'stitched enhancers' (Supplementary Data 3). Among the 2389 stitched enhancers formed by Duo-specific enhancers (relative to DE), we identified 145 that exhibit strong enough transcriptional activity to be designated as Duo-specific 'enhancer hotspots' (Figure 5B). Among the 305 stitched enhancers formed by Ile-specific enhancers (relative to Duo spheroids), we defined 29 Ile-specific enhancer hotspots ( Figure 5C). Similar analyses were performed with hESCand DE-specific enhancers (Supplementary Figure S6A and E). Given that H3K27ac ChIP-seq signal has been successfully leveraged in previous studies to identify enhancer cluster regions (42), we compared the stage-specific enhancer hotspots with a published database of H3K27ac peaks in primary human fetal SI (from the Roadmap Epigenomics Project) ( Figure 5D). We observed that nearly 50% of Duo-and 55% of Ile-specific enhancer hotspots substantially overlap (at least half of the length) with regions of H3K27ac enrichment, whereas this is the case for only 21% of hESC-or 26% of DE-specific enhancer hotspots. (Figure 5D). This suggests that Duo-and Ile-specific enhancer hotspots defined in the hPSC-HIO model resemble, though of course are not identical to, active enhancer regions in primary human fetal SI. Moreover, we found that the active enhancers defined by our ChRO-seq analysis provide higher resolution in defining the boundaries of each individual enhancer within a given enhancer hotspot region, compared to the narrow, stringent peaks defined by H3K27ac activity (Supplementary Figure S7A and B).
Next, we performed an analysis to identify the genes likely associated with Duo-and Ile-specific enhancer hotspots. We assigned each of the Duo-specific stitched enhancers (including non-hotspots and hotspots) to its nearest active gene (TPM > 50 in ChRO-seq in Duo spheroids) within a 100 kb window on either end of the boundaries of the stitched enhancer. We found that the overall increase in gene body transcription is significantly greater for the set of genes associated with Duo-specific enhancer hotspots compared to those associated with stitched enhancers that are not hotspots ( Figure 5E). We also assigned each of the Ile-specific stitched enhancers (including non-hotspots and hotspots) to its nearest active gene (TPM > 50 in ChRO-seq in Ile spheroids) using the same criteria. Similar to the observation made for the transition from DE to Duo spheroids, the genes nearest to Ile-specific enhancer hotspots exhibit a significantly greater increase in transcription compared to those associated with non-hotspot stitched enhancers ( Figure 5F).
These findings are consistent with the notion that enhancer hotspots exert particularly strong effects on transcription. Among the 73 genes nearest to Duo-specific enhancer hotspots, 25 exhibit highly significant increases in both transcription (ChRO-seq) and steady state expression (RNA-seq) in Duo spheroids relative to DE ( Figure  5D and E). Many of these were defined earlier as SI or Duo-specific markers (e.g. CDX2, FOXA1, HOXB cluster, SALL1 and LINC00543) (Figure 2; Supplementary Figure  S7A). Among the 22 genes nearest to Ile-specific enhancer hotspots, 12 exhibit highly significant increases in both transcription (ChRO-seq) and steady-state expression (RNAseq) in Ile relative to Duo spheroids ( Figure 5H). These genes include HAND2, HOXC/D family members, FJX1, DLX5, CSRNP1 and PITX1 ( Figure 5I and Supplementary Figure S7B), many of which were also identified earlier as Ile-specific markers (Figure 2). Similar analyses were performed for the hESC and DE stages and the results are summarized in Supplementary Figure S6.

Identification of candidate TF drivers and their networks relevant to SI lineage formation or ileal patterning
To discover putative key TF drivers and their candidate cistromes (genome-wide set of TF targets) in this model, we first used the tool HOMER to determine motifs enriched in each set of stage-specific enhancers and then identified genes associated enhancers that harbor specific motifs of interest ( Figure 6A). Expectedly, HOMER analyses  of hESC-and DE-specific enhancers revealed binding sites of POU5F1 and SMAD2 to be the most enriched motifs, respectively (Supplementary Figure S8A-D). We next performed motif enrichment analyses on Duo-( Figure 6B) and Ile-specific enhancers ( Figure 6D).
As expected, among the 2787 Duo-specific enhancers relative to non-Duo specific enhancers ( Figure 6B), we detected significant enrichment for the binding motifs of CDX2 and many forkhead box TFs ( Figure 6C). We also performed the same analysis with Duo-specific enhancer hotspots relative to non-hotspots ( Figure 6B) and observed significant enrichment for binding motifs of SP5 and RXR ( Figure 6C), the former of which is defined as a Duo-specific gene marker ( Figure 2G), but neither of which have been associated previously with early SI development. Genes that are associated with Duo-specific enhancers harboring motifs of one or more of these TFs and that are significantly increased in transcription (ChRO-seq) and steady-state expression (RNA-seq) are shown in Figure 6F (see full gene list in Supplementary Figure S8E). The TF cistrome analysis suggests that CDX2 is a prominent transcriptional activator of the HOXB cluster ( Figure 6F), which is consistent with previous observations (8). Moreover, this analysis reveals candidate TF activators of the gene CDX2. While Nucleic Acids Research, 2021, Vol. 49, No. 2 739 CDX2 locus has a CDX2 motif in its promoter region in Duo-spheroids (data not shown) and is thus likely subject to autoregulation as previously described (51), Duo-specific enhancers that are associated with the CDX2 locus do not harbor any binding motifs for CDX2 and instead have binding motifs for other TFs such as FOXA1 and FOXO3 (Figure 6F). More importantly, through this analysis we were able to identify candidate TF drivers of SI/Duo spheroid marker genes that do not appear to be a part of the CDX2 cistrome. For example, WFDC2 and SALL1 (defined earlier as a marker gene of SI and Duo spheroids respectively) are associated with FOXA1 and FOXO3 motifs; SP5 (defined earlier as a marker gene of Duo spheroid) is associated with the TCF3 motif ( Figure 6F).
We next performed a similar analysis for ileal regional patterning. Among the 328 Ile-specific enhancers relative to non-Ile specific enhancers ( Figure 6D), we detected significant enrichment for the binding motifs of several key TF families including CDX2, HOXC9, HOXA11 and PITX1 ( Figure 6E), the last of which is itself associated with Ilespecific enhancer hotspots ( Figure 5I). We also performed the same analysis with Ile-specific enhancer hotspots relative to non-hotspots but did not observed significant enrichment for binding motifs (data not shown). Genes that are associated with Ile-specific enhancers harboring motifs of one or more of these TFs and that are significantly increased in transcription (ChRO-seq) and steady-state expression (RNA-seq) are shown in Figure 6G. This analysis led to several new observations. Firstly, CDX2 likely plays a role in both Duo spheroid formation and ileal regional patterning, but apparently through distinct cistromes ( Figure 6F and G). Notably, the activation of HOXD and HOXC (especially HOXC8 and 9) loci is associated with Ile-specific TREs harboring the CDX2 motif ( Figure 6E). While the HOXB cluster is induced in Duo spheroids, it remains highly transcribed in Ile spheroids, likely due to the fact that the nearby TREs with CDX2 motifs that are activated during Duo spheroid formation are retained during ileal regional patterning ( Figure 4H, and M). Secondly, we found that the activation of HOXA, C and D families are associated with overlapping but distinct TF regulators. Specifically, while HOXA11 seems to act on all four HOX clusters, PITX1 is associated only with the HOXA cluster and HOXC9 is preferentially associated with the HOXD cluster ( Figure 6G). Finally, this analysis also reveals other TF cistrome networks that do not target HOX genes during ileal patterning ( Figure 6G). For example, CSRNP1, FJX1 and ID1 loci are associated with motifs of AP-1 (e.g. JUNB and FOSL2) and/or ATF factors ( Figure 6G).

The HOX cluster dynamics observed in the directed differentiation model of SI is present in the early developmental stages of primary human fetal SI tissue
Across various analyses in this study we repeatedly observed a previously undescribed dynamics of HOX gene activation across SI lineage formation to ileal patterning. Notably, ChRO-seq data reveal strong transcriptional activation of the HOXB cluster (particularly HOXB1) upon acquisition of SI identity (Duo spheroids) and the sequential transcriptional activation of members of HOXA, C, D clusters during ileal regional patterning (Ile spheroids) (Figure 7A), which as we showed previously is also reflected in the pattern of emergence of nearby enhancers and enhancer hotspots (Figures 4 and 5). This unique HOX cluster dynamics is largely reflected at the level of steady-state expression (RNA-seq) also ( Figure 7B), except for the 3 -most HOXA genes (e.g. HOXA1, A2) and 5 -end HOXB genes (e.g. HOXB7, B9) ( Figure 7A and B), which may be subject to post-transcriptional regulation.
To determine whether the differences in HOX gene expression between early Duo and Ile extends beyond this in vitro directed differentiation model, we mined the single cell RNA-seq (scRNA-seq) dataset of primary human Duo and Ile tissue at very early stages of fetal development (13,43). Specifically, we focused on matched Duo and Ile fetal tissue at week 11, 14 and 19 of gestation. It is important to note that these developmental time points, though considered as early/mid gestation stage, represent time points much later than what is reflected in our much earlier stage SI spheroids and therefore exhibited weaker overall expression of HOX genes. Since the SI spheroids generated from the directed differentiation method are mainly comprised of epithelial cells but also contain some mesenchymal cells, we analyzed both cell types from the scRNA-seq data, together and separately, in order to examine the HOX dynamics. At week 11 of gestation, we found that the HOXB cluster is active in both human fetal Duo and Ile cells (epithelium plus mesenchyme), with HOXB1 particularly enriched in Duo cells and 5 HOX genes of all four clusters enriched in Ile cells ( Figure 7C). This distinct HOX pattern between primary human fetal Duo and Ile tissues is consistent with our observations in the Duo and Ile spheroids generated from the directed differentiation method. Interestingly, this signal is present only at week 11 and dissipates as the developmental process proceeds (week 14 and 19) ( Figure 7C). The scRNA-seq analysis also offers the new insight that the enrichment of HOXB1 in primary fetal Duo tissue is mainly driven by the mesenchymal fraction, whereas the activation of 5 HOX genes of all four clusters in Ile tissue occurs in both epithelial and mesenchymal cells ( Figure 7C). Overall, this analysis suggests that the Duo and Ile spheroids generated from the directed differentiation model appear to capture the region-specific HOX features that are transiently present in the developing human SI.
Beyond the finding of the previously undescribed temporal HOX cluster patterns highlighted here, the ChRO-seq analyses in this study provide a comprehensive characterization of chromatin regulatory dynamics, enhancer activity profiles, and transcriptional programs that are associated with human SI lineage formation and regional patterning. A working model is provided in Figure 7D.

DISCUSSION
ChRO-seq is capable of high-resolution definition (approximately single nucleotide resolution) of regulatory elements, which offers the opportunity to more precisely define the boundary of promoters and enhancers. ChRO-seq also captures active regulatory elements, as opposed to open chromatin regions which represent candidate regulatory elements. By leveraging ChRO-seq, we generated for the first time ever comprehensive chromatin regulatory landscapes across the different stages of directed differentiation from hPSC to SI spheroids with regional specification. Specifically, we defined: (i) early marker genes that label SI lineage formation and regional specification, (ii) the map of active regulatory elements (promoters and enhancers) as well as enhancer hotspots associated with SI lineage formation and ileal patterning and (iii) candidate key TF drivers and their putative cistromes relevant to these critical developmental events. However, in these analyses we also observed cases where genes near an emerging enhancer or enhancer hotspot do not exhibit changes in transcriptional activity and expressional levels, which may indicate the presence of very long-range chromatin interactions beyond the conservative distance criterion used in this study (100 kb window from TSSs). Recent seminal studies leveraging Hi-C technology observed that chromatin interactions are often in the range of 200 kb but could be up to 500 kb or more in the human genome (52,53). Nonetheless, the findings presented in our work motivate future experiments to investigate the functional role of candidate regulators and assess the sufficiency and necessity of the associated enhancers and enhancer hotspots for controlling transcriptional networks relevant to early human SI development and patterning.
We identified for the first time HES4 as marker gene of SI spheroids irrespective of regional identity in the context of gut development; one likely reason for this is that it is absent in the genome of the mouse (54), the model organism most used previously to study gut development. Moreover, the marker genes that we found to specifically label Duo identity include SP5 and SALL1, the latter of which has been linked to Townes-Brocks syndrome with developmental malformation of multiple organs including the limb, kidney and gastrointestinal tract (55,56). While SP5, SALL1 and other marker genes of Duo identity were reported to enhance Wnt signaling (57,58), we identified a distinct set of marker genes that specifically label Ile identity (e.g. HAND2, FJX1 and CSRNP1) that are also known to be downstream of Wnt signaling. Fjx1 has been shown to function through the Wnt/planar cell polarity pathway to determine anterior-posterior axis in Drosophila (59,60) and is expressed in the epithelium of murine developing gut (61). Csrnp1 is expressed in neural crest progenitors in a Wnt1/␤-catenin-dependent manner and acts as a critical effector of Wnt signaling for driving neural crest formation (62). Together, these observations indicate that different sets of Wnt signaling effector genes are likely induced upon different durations of exposure to the Wnt-activating supplements during directed differentiation. Future investigation of the functional roles of these genes in the context of early gut development is warranted. The present study also uncovers the associated enhancers (or enhancer hotspots) and TF motifs of these marker genes, which may elucidate the potential mechanisms through which these genes are regulated during the establishment of SI identity and regional specification.
Although HOX genes have been reported to play important roles in proximal-distal axis patterning in the gut through a so-called 'spatial collinearity' mechanism, our observations in the directed differentiation model of human SI development suggest even greater complexity to HOX genes dynamics during small intestinal formation and regionalization. The spatial collinearity of HOX genes describes the correspondence between HOX paralogues along the chromosome (3 to 5 ) and their expression pattern along the anterior-posterior axis. Indeed, consistent with early literature in the mouse developing gut (63), our data shows higher expression of 3 HOX genes in Duo spheroids and higher expression of 5 HOX genes in Ile spheroids. Notably, though, our data additionally reveals that the formation of Duo and Ile spheroids are associated with the activation of distinct families of HOX genes. Specifically, we found that the activation of the HOXB cluster (especially HOXB1-4) first occurs during the formation of Duo spheroids and the other HOX clusters (HOXA, C and D) are activated subsequently during the formation of Ile spheroids. Through mining of a novel scRNA-seq dataset (13,43), we showed that this distinct HOX cluster behavior between Duo and Ile regions is present in primary human fetal SI tissue. Therefore, this distinct HOX cluster behavior we observed in vitro likely resembles SI regional specification in vivo in humans. Notably, a recent scRNA-seq study done in the digestive tract of E8.75 mouse embryos showed strong expression of Hoxb1 and Hoxb2 around the proximal midgut region (from where the duodenum is derived) in a pseudo anteriorposterior space (6). The same mouse study also indicated higher expression of several Hoxc and Hoxd genes in the distal digestive tract. Furthermore, the mouse model lacking Hoxd3 was reported to have severe narrowing of the proximal colon and retention of gut luminal content in the terminal region of the SI (ileum), providing in vivo evidence for the role of a 3 Hoxd gene in regulating the development of the distal intestinal region. Using hPSC-derived SI spheroids, as well as in human primary fetal tissues, our study is able to show activation of distinct HOX clusters during early SI formation and regional specification. More importantly, through ChRO-seq analyses, we also provided additional insights into HOX biology by elucidating candidate TF drivers of specific HOX genes (and associated enhancers and enhancer hotspots), which merits detailed investigation.
Our ChRO-seq analysis mapped the key transcriptional networks associated with SI lineage formation or regional patterning. For example, we found that binding sites of FOXA1 and FOXO3 are enriched in active enhancers that emerge only during SI identity acquisition, including those nearby CDX2, which encodes the well-known master TF regulator of SI development. Notably, the FOXA1 gene locus is actively up-transcribed during the transition from definitive endoderm to SI spheroids and is associated with an enhancer hotspot that emerges during this process. FOXA1 has been reported to regulate allocation of murine SI secretory lineage (64) and proper gene expression in murine intestinal epithelium (65). Recently, dysregulation of intestinal FOXA1 also has been linked to necrotizing enterocolitis in infants (66). The observations made in this study as well as previous reports together underscores the functional importance of FOXA1 in human gut development. Also, our analyses suggest that CDX2 is involved in not only in initial SI lineage establishment but also the subsequent event of ileal specification, but evidently through distinct cistromes. One specific example is the association between the HOXC9 locus and the nearby Ile-specific active enhancers containing CDX2 binding motifs, which is consistent with a previous study showing a remarkable downregulation of Hoxc9 in distal but not in proximal intestine of Cdx2 null mice (2). Recent evidence also demonstrates that CDX2 regulates developing and fully mature gut through targeting distinct gene sets in mice (8) and that CDX2 is involved in patterning both intestinal epithelial and mesenchymal cells in the hPSC-based directed differentiation model (13). Our ChRO-seq findings pertaining to CDX2 may further elucidate how CDX2 functions through a stageor cell type-dependent manner during early human gut development. Beyond CDX2, other candidate TFs associated with the establishment of ileal identity are worthy of future investigation. One notable example is PITX1, the ortholog of which has been reported previously in mouse models to be expressed in the distal SI region and to be involved in cecum budding during murine development (67). However, PITX1 has not been previously appreciated in the context of ileal regional specification.
The use of the state-of-art in vitro model of human SI development is a critical feature of this study given the lack of other existing platforms or resources that are readily available for the study of very early developmental time points pertaining to SI lineage formation and regional patterning. It is also important given the growing appreciation for cross-species differences in developmental processes at the molecular level (68)(69)(70)(71). Furthermore, the genome-wide characterization of the chromatin regulatory landscape by ChRO-seq in this model system generates valuable translational knowledge for a better understanding of the transcriptional programming underlying early SI development in humans and motivates future functional studies of candidate TFs and enhancers in driving human SI development. Our analysis pipeline can also be applied to other hPSCderived organoid systems to gain insights into dynamic transcriptional programming and chromatin status during early stages of human organogenesis. Most importantly, the identification of the candidate drivers in the present study may ultimately improve methods of generating therapeutic replacement SI as well as molecular therapies for children and possibly adult patients.

DATA AVAILABILITY
Raw sequencing data of ChRO-seq and bulk RNA-seq are available through GEO accession GSE142316. Raw single cell RNA-seq data of human fetal duodenum is available through ArrayExpress (E-MTAB-9489) (13,43). Raw single cell RNA-seq data of human fetal ileum is available through ArrayExpress (MTAB-9906).