-
PDF
- Split View
-
Views
-
Cite
Cite
Margaret Brown, Anne Dodd, Fang Shi, Emily Greenwood, Sini Nagpal, Vasantha L Kolachala, Subra Kugathasan, Greg Gibson, Concordant B and T Cell Heterogeneity Inferred from the multi-omic Landscape of Peripheral Blood Mononuclear Cells in a Crohn’s Disease Cohort, Journal of Crohn's and Colitis, Volume 18, Issue 12, December 2024, Pages 1939–1956, https://doi.org/10.1093/ecco-jcc/jjae055
- Share Icon Share
Abstract
Crohn’s disease is characterised by inflammation in the gastrointestinal tract due to a combination of genetic, immune, and environmental factors. Transcriptomic and epigenomic profiling of intestinal tissue of Crohn’s disease patients have revealed valuable insights into pathology, but have not been conducted jointly on less invasive peripheral blood mononuclear cells [PBMCs]. Furthermore, the heterogeneous responses to treatments among individuals with Crohn’s disease imply hidden diversity of pathological mechanisms.
We employed single nucleus multi-omic analysis, integrating both snRNA-seq and snATAC-seq of PBMCs with a variety of open source, bioinformatics applications.
Our findings reveal a diverse range of transcriptional signatures among individuals, highlighting the heterogeneity in PBMC profiles. Nevertheless, striking concordance between three heterogeneous groups was observed across B cells and T cells. Differential gene regulatory mechanisms partially explain these profiles, notably including a signature involving TGFß signalling in two individuals with Crohn’s disease. A mutation mapped to a transcription factor binding site within a differentially accessible peak associated with the expression of this pathway, with implications for a personalised approach to understanding disease pathology.
This study highlights how multi-omic analysis can reveal common regulatory mechanisms that underlie heterogeneity of PBMC profiles, one of which may be specific to inflammatory disease.

1. Introduction
Crohn’s disease is a chronic gastrointestinal disease caused by interactions of genetic, immune and environmental factors.1 Genome-wide association studies [GWAS] have identified over 240 loci associated with inflammatory bowel disease [IBD], with many associating with increased Crohn’s disease risk.2 GWAS have provided insight into genetic associations of Crohn’s disease, and transcriptomic studies provide further insight into biological mechanisms associated with Crohn’s disease pathology. Such studies have focused on cells in the intestine, neglecting peripheral blood despite the contribution of immune cells to inflammation and evidence that circulating peripheral blood profiles have distinct differences in individuals with Crohn’s disease.3,4 The largest single-cell RNA sequencing [scRNA-seq] investigation of Crohn’s disease to date explored immune dysregulation and identified transcriptional and cellular composition differences within and among immune cells across compartments of intestinal tissue, and related these to disease severity.5 Another single-cell transcriptomic analysis contrasted individuals with ulcerative colitis and Crohn’s disease,6 detecting compositional differences in the peripheral blood immune cell compartment, confirming our own observations performed with bulk RNA-sequencing.6,7 However, it is not clear whether these differences are specific to Crohn’s disease, or are enriched in a variety of autoinflammatory conditions.
Cellular and molecular heterogeneity are hallmark characteristics of Crohn’s disease and pose a challenge for understanding the mechanisms of pathology, contributing to variability in disease location, severity, progression, and treatment response.8 This heterogeneity has been observed within both gene expression and chromatin accessibility of individuals with Crohn’s disease within colonic tissue, suggesting distinct molecular signatures contributing to Crohn’s disease heterogeneity.9 Similarly, PBMC profiles from scRNA-seq analysis of individuals with systemic lupus erythematosus [SLE] identified transcriptional signatures that stratified subpopulations of SLE patients.10 Similarly, the first scRNA-seq study of ileal biopsies from Crohn’s disease patients identified cell modules within inflamed tissue correlated with varied responses to anti-tumour necrosis factor [TNF] therapy, demonstrating the utility of single-cell genomics in understanding variability of treatment response in Crohn’s disease.11 Whereas several transcriptomic studies of Crohn’s disease have noted inter-individual variability, they have not jointly considered transcriptional and epigenetic heterogeneity. Inherent variability underlies the idea of personalised medicine, the notion that individual-specific profiles can be used to guide treatment decisions, potentially optimising clinical responses.
The majority of existing studies are among individuals of European or East Asian continental ancestry, and it is well acknowledged that there is inadequate representation of individuals with African American ancestry in biomedical genomic research.12 The few studies including African American participants have generally been performed with small cohorts.13 This underrepresentation highlights the critical need for diverse and inclusive studies for comprehensive understanding of Crohn’s disease across populations. Although some comparative studies of Crohn’s disease between European and African American ancestries indicate few differences in disease presentation, others revealed distinct genetic landscapes with respects to ancestry.14,15 Notably, Mo et al. [2020] reported differential ileal gene expression between African American and European ancestry IBD cases, engaging multiple pathways associated with adverse disease progression.16
We present a pioneering study profiling the multi-omic landscape of circulating immune cells from Crohn’s disease and healthy individuals of African American ancestry.17 Our analysis addresses three hypotheses: 1] there is a systemic difference in the PBMC profiles of Crohn’s disease and non-IBD donors; 2] gene expression heterogeneity is randomly assorted across B cell and T cell compartments of the adaptive immune system; and 3] multi-omic analysis can discern gene regulatory networks [GRNs] underlying subsets of PBMC transcriptomes. Broadly, we reject both the first and second hypotheses, finding that for the most part Crohn’s disease and non-IBD profiles intermingle, but also noting that there are three or possibly four profiles that strongly co-cluster between the B and T cell compartments. Joint snRNA-seq [single-nucleus RNA-seq] and single-nucleus assay for transposase accessible chromatic sequencing [snATAC-seq] analysis on the same cells using the 10x Genomic multi-ome assay18 identify distinct immune signatures and GRNs within different subsets of individuals, contributing to a new perspective of autoimmune and inflammatory disease pathogenesis.
The resulting data serve as a valuable resource for further investigations of molecular mechanisms underlying autoinflammatory disease, particularly in an underrepresented population. We identified a subset of Crohn’s disease patients characterised by an expansion of activated immune cells, alongside an increase of TGFß signalling gene expression and chromatin accessibility. This begins to document how chromatin accessibility regulates heterogeneous patterns of immune cell gene expression. The diverse profiles are mostly shared by Crohn’s disease patients and healthy donors, contrary to naïve expectations of a consistent signature of disease, suggesting that altered immune function needs to be evaluated against a background of normal heterogeneity. The discovery of two divergent autoinflammatory profiles has implications for personalised medicine if it turns out to be possible to moderate these with different targeted immunomodulatory approaches.
2. Materials and Methods
2.1. Patient samples and genotyping
A total of 59 samples were genotyped on the H3Africa array19 and annotated against the hg19 human genome. These included 42 individuals with Crohn’s disease, 14 healthy individuals, and three individuals whose status was unknown at the time of analysis. Of those 59 individuals, 19 were used for single-nucleus multi-ome analysis, and an additional 11 multi-ome samples lacking genotype data were also included. Due to limited sample availability, we were unable to obtain genotypes and single-nucleus multiple data for all samples. Patient metadata are available in Supplementary Table 1.
2.2. Single-cell library preparation, sequencing, and alignment
Whole blood [approximately 6–8 ml] was collected in BD Vacutainer CPT Tubes [Becton, Dickinson, NJ, Cat #362753] using standard venepuncture techniques. To ensure proper barrier formation, the tube was inverted and centrifuged at room temperature within 2 h of collection at a speed of 1600 RCF [relative centrifugal force] for 30 min. Serum was aspirated and a buffy layer containing mononuclear cells was collected, washed, and centrifuged [300 RCF, 5 min] to get the cell pellet. Total cell count and viability were assessed using the Luna FX7 automated cell counter and Acridine Orange Propidium Iodide stain [AOPI, Cat# F23001] [Logo Biosystems, VI]. PBMCs were immediately cryopreserved in freezing media containing 90% fetal bovine serum [FBS] and 10% dimethyl sulphoxide [DMSO] and stored in liquid nitrogen until processed for multi-ome.
Cryopreserved PBMCs were thawed at 37°C for 1–2 min and washed with RPMI and 10% FBS media. After thawing, 500K-1 million cells were used to isolate nuclei using the 10X genomics demonstrated protocol [CG000365 Revision A]. A lysis time of 4 min was used for all samples to achieve optimal lysis of the cells [<5–10% viability]. Viability and nuclear membrane quality was assessed using AOPI and light microscope. We targeted 9000 nuclei per barcode by pooling two or three samples with 3000–4500 nuclei per sample. We proceeded to Chromium Next GEM Single Cell multi-ome ATAC + Gene Expression protocol according to manufacturer’s instructions [10X Genomics, part # 1000283].
Libraries were made using the Dual Index Kit TT set A [part # 1000215] and sequenced at the Georgia Tech Molecular Evolution Core. The libraries generated for Sample_1, Sample_2, Pool_2 and Pool_3 [eight individuals total] were first run on a NextSeq Mid Output [150 cycles], and re-run on a NextSeq High Output kit for greater sequencing depth. All other libraries were run on NovaSeq S1 kits [100 cycles]. The read configuration was 28/10/10/90 for gene expression, and 50/8/24/49 for ATAC in accordance with 10X Genomics’s recommendations. The fastq files were aligned using cellranger-arcv2.0, with the refdata-cellranger-arc-GRCh38-2020-A-2.0.0 reference.20 The cellranger output summary statistics are reported in Supplementary Table 2.
2.3. Multi-ome quality control and data processing
Code used for the methods and figure generation is on our GitHub repository: [https://github.com/GibsonLab-GT/scmulti-ome-Crohns-Disease]. Gene expression matrices generated by cellranger were assessed with Seurat [v4.3.0],21,22 and ATAC fragment matrices from cellranger were analysed using ArchR [v1.0.2].23 Quality control for the snRNA-seq data included removing cells with fewer than 300 or more than 3500 genes detected, as well as with more than 20% of their transcripts being mitochondrial genes characteristic of dying, dead, or uncharacteristically stressed cells [Supplementary Figure 1A, B]. For snRNA-seq analysis, each cell’s gene expression was normalised using the Seurat ‘LogNormalize’ function, adjusting the total expression of each cell to log counts per 10 000. For visualisation purposes, the abundance mean expression was shifted across cells to 0 and the variance scaled to 1.
For snATAC-seq data, cells with a transcription start site [TSS] enrichment score below 8 or above 30 were removed to control the uniformity of chromatin accessibility signals. Cells with less than 3 or greater than 4.75 log10[nFragments] detected were removed to filter out possible empty droplets and multiplets [Supplementary Figure 1C, D]. After applying these quality control measures, 72 457 nuclei remained for downstream analysis.
2.4. Clustering and peak calling
We conducted a stability analysis to identify the most stable set of clusters generated with different parameter settings by computing the Rand Index with the R package Dune. The most stable set of clusters was observed by jointly clustering on snRNA-seq data and snRNA-seq data together, and testing several parameters to determine the most stable set of clusters [iterations = 5, resolution = c[0.2, 0.4, 0.5, 0.8], variable features = 25000, dim = 1:20] [Supplementary Figure 2A]. Batch effect correction was performed with Harmony [corCutOff = 0.5].24 Clustering was then performed on the corrected data [addClusters, resolution = 0.5] followed by UMAP dimensionality reduction [nNeighbors = 30, minDist = 0.5]. We clustered on snATAC-seq data [same parameters as joint clustering] and snRNA-seq data [with Seurat and Harmony, dims 1:20 and res 0.75] separately to confirm clustering on the jointly generated data, leading to identification of the most stable set of clusters using the Rand index [Supplementary Figure 2B]. For cell type annotation we used both gene expression and chromatin accessibility data, focusing on cell type specific genes. We confirmed that the same cell populations could be identified based on clustering from snRNA-seq data [Supplementary Figure 3]. Using the snATAC-seq data, peak calling was performed on pseudobulk group coverages of cell type clusters per sample [see 2.5. Deconvolution of pooled samples]. This was accomplished using addGroupCoverages, followed by peak calling with MACS2 [addReproduciblePeakSet in ArchR].25,26
2.5. Deconvolution of pooled samples
A total of 31 samples were obtained, with three processed as individual samples and the remaining 29 samples processed in pools. One individual was sampled twice and treated as a single individual for the analysis. The 29 pooled samples were divided into 10 pools consisting of either two or three individuals per pool. Of these, four had genotypes available for all samples [Pools 2, 5, 7, 8], and the remainder had only some samples genotyped. Fully genotyped pools were deconvoluted using demuxlet [after converting to hg38 using liftOver], which used the genotype data to determine which cells came from each individual.27,28 The partially genotyped pools were deconvoluted using freemuxlet, which performs variant calling from Cellranger bam files to identify cells originating from the same individual. We compared the called variants from freemuxlet [after converting the VCF files from hg38 to hg19 using liftOver] to the genotyped individuals’ genotype data and determined the identity of the remaining individuals based on their self-reported sex [Supplementary Figure 4]. Two samples from Pool 4 could not be deconvoluted [no genotype data, same phenotypic sex]. Pool 6 Donor 3 was the same individual as Pool 3 Donor 3, determined by comparing all three Pool 6 donors’ freemuxlet called variants with Pool 3 Donor 3 to finish deconvolution of Pool 6. Of the 31 samples, two lacked metadata but were included in the clustering analysis before being excluded from downstream analysis.
2.6. Unsupervised hierarchical clustering
Pseudobulk transcriptomic data from B and T cells were used to identify subsets of individuals with similar profiles. For B and T cells, separately, raw count data were converted to counts per million [CPM] after summing the counts for all cells of a population for each donor and dividing by the total number of reads per donor. This was converted to CPM by multiplying by a scale factor of 1 000 000, and then log2 transformed by computing log2[CPM + 1]. Hierarchical clustering was performed for B and T cells on the top 5000 expressed genes using Ward’s clustering method. The proportions of B cell subpopulations and T cell subpopulations associated with each hierarchical cluster were evaluated by analysis of variance [ANOVA] using the R package speckle.29 We investigated batch having an impact on cluster assignments using ANOVA, yielding p-values of 0.13 and 0.08 within the B and T cell compartments, respectively, suggesting batch effects did not appear to significantly contribute to the hierarchical clusters. We investigated potential effects of unique molecule identifier [UMI] counts driving the cluster assignments with ANOVA, yielding p-values of 0.23 and 0.87 for B and T cell compartments, respectively, suggesting UMI counts did not impact hierarchical clusters. This analysis was replicated in the Onek1k dataset,30 with hierarchical clusters shown in Supplementary Figure 5A and proportions of subpopulations in the B and T cell compartments shown in Supplementary Figure 5B. To compare subclusters identified in the Onek1k dataset with our multi-ome clusters, we assessed PC1 scores from principal component analysis [PCA] of the top 50 DEGs for B cell and T cell subsets from our Crohn’s dataset using Onek1k, followed by ANOVA and Tukey tests for evaluation of significance across groups, with PC1 score distributions depicted in Supplementary Figure 5C. As a negative control, PC1 scores were also performed on permuted data. To assess robustness of the clustering, we also performed Combat normalisation using the sva package.31 Pseudobulk logCPM data from above was processed separately for the 26 B cell samples and 28 T cell samples plus one duplicate. Since healthy Donor 1 was processed independently, we grouped those samples with one of the other batches, resulting in three batches of eight to ten samples each.
2.7. Identifying differential features, enriched pathways, and ligand-receptor interactions
Differentially expressed genes [DEGs] were identified using the MAST R package which implements a two-part, generalised linear model. The number of UMIs from the snRNA-seq data was used as a covariate.32 DEGs were determined by expression in at least 10% of cells from either test group, had an absolute log2FC value greater than or equal to 0.25, and passed a Bonferroni adjusted p-value of at most 0.05. For differentially accessible regions [DARs], getMarkerFeatures[] from ArchR was used on MACS2 identified peaks [testMethod = binomial], controlling for TSSEnrichment and log10[nFragments]. DARs meeting the following criteria were considered: a false-discovery rate [FDR] less than or equal to 0.05 and an absolute log2FC value greater than or equal to 0.25.
Pathways enriched in DEGs were identified using ToppFun, which is part of the ToppGene Suite.33 Upregulated pathways were identified using DEGs with log2FC greater than 0.25, and downregulated pathways were identified using DEGs with log2FC less than -0.25. Queries were performed against multiple databases, including BioCyc, Reactome, KEGG, Pathway Interaction Database, GenMAPP, Panther DB, Pathway Ontology, SMPDB, and MSigDB C2 Biocarta [v7.1]. Bonferroni correction was applied for comparison control, with a 0.05 significance threshold. Ligand-receptor interactions between cells from Crohn’s disease Patients 4 and 12 were investigated using the R package CellChat [v1.6.0].34 The CellChat human database was used to compare highly expressed genes against potentially interacting ligand-receptor pairs, with default parameters for computing communication probabilities between interacting cell groups.
2.8. Predicted transcriptional regulation analysis
Predicted transcriptional regulatory activity was determined by linking DEGs and DARs. For each DEG, we defined a cis-regulatory region extending 500 kb upstream of the transcription start site [TSS] and downstream 100 kb of the transcription end site [TES]. This choice of distances was based on previous expression quantitative trait loci [eQTL] fine mapping results, in addition to previous work by Chen et al. identifying transcription factor regulatory ranges.35,36 To establish associations between gene expression and chromatin accessibility, we examined DARs that overlapped with the defined cis-regulatory regions. Gene coordinates with their TSS and TES coordinates were obtained from GENCODE release version 41 human reference GRCh38.p13.37 Differentially expressed transcription factors [DETFs] were identified from the DEG list with focus placed on upregulated transcription factors. Emphasis was placed on DARs with increased accessibility to infer potential transcription factor binding. The R package TFBSTools [min.score = 80%] was used to scan each DAR sequence for potential binding sites for DETFs with a positional weight matrix from the JASPAR2020 database.38,39 Figures were generated with circos plots using the circlize R package.40
3. Results
3.1. Multi-omic profiling of PBMC reveals 18 distinct cell types
To investigate gene regulatory mechanisms [GRNs] of circulating immune cells of Crohn’s disease, we sampled PBMC from 19 individuals with Crohn’s disease and nine healthy control individuals, all of African American ancestry. We performed snRNA-seq and snATAC-seq multi-omic profiling of the same cells in pooled assays and genotyped the individuals for sample deconvolution [see Methods, Figure 1A]. After applying quality control measures to snRNA-seq and snATAC-seq data [see Methods, Supplementary Figure 1A–D] 72 457 cells were retained for downstream analysis, averaging 2588 immune cells per individual. We performed unsupervised clustering using both transcriptomic and epigenetic features jointly, resulting in 18 distinct cell populations visualised in the UMAP projection in Figure 1B [see Methods]. Clustering stability was assessed by clustering both transcriptomic and epigenetic data evaluated jointly and separately, and these robustness measures are reported in Supplementary Figure 2 [see Methods].
![Workflow and identification of single nucleus clusters. A] General multi-ome workflow from sample generation to library preparation. PBMC were partitioned with 10x Genomics followed by snRNA-seq and snATAC-seq library generation from the same cell. B] UMAP projection of 18 cell types obtained from jointly clustering on snRNA-seq and snATAC-seq features. C] Stacked barplots showing the proportion of peak types obtained from peak calling with MACS2 for each cell type. D] Dot plot showing the gene expression [top] and chromatin accessibility [bottom] of the top gene markers identified for each cell population. PBMC, peripheral blood mononuclear cells; Tcm, central memory T cell; GdT, gamma delta T cell; MAIT Cells, mucosal associated invariant T cells; NK cells, natural killer cells.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/ecco-jcc/18/12/10.1093_ecco-jcc_jjae055/1/m_jjae055_fig1.jpeg?Expires=1747886363&Signature=jRi-JWwfgN7YWXyIw~7IGW5k30ef3sY~03j9UCfa9pheLQ8HjWe9ZzB0dKIAUbVCvu6B4Fn9UqpWTVay4ESnkAgPrdVmh5si2xM~LPndPTIEh9WT8yfiN1bedsdWWus29Q2pPc27WKuAOoTg~lZR8r9Ww3Mb3ze1WXvN~s4kYhIds~-XqXQ7SpCMuN~qW9S6bDu03daD9R~0CRpddKy3r0VX6q3Hpf10WczWNiGHDvexuDuNyi5ICARxWS~6awXZQHwexXSMgOoEyvSlYmMIUOpd9moZtxc0DEfQ1aYpmQh~KIKb2iQJlC69lYnzimIACxQ8O6Rrd8Nwxmx~30wdXw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Workflow and identification of single nucleus clusters. A] General multi-ome workflow from sample generation to library preparation. PBMC were partitioned with 10x Genomics followed by snRNA-seq and snATAC-seq library generation from the same cell. B] UMAP projection of 18 cell types obtained from jointly clustering on snRNA-seq and snATAC-seq features. C] Stacked barplots showing the proportion of peak types obtained from peak calling with MACS2 for each cell type. D] Dot plot showing the gene expression [top] and chromatin accessibility [bottom] of the top gene markers identified for each cell population. PBMC, peripheral blood mononuclear cells; Tcm, central memory T cell; GdT, gamma delta T cell; MAIT Cells, mucosal associated invariant T cells; NK cells, natural killer cells.
The immune cell types within PBMC were annotated with respect to canonical gene markers examined in both the gene expression and chromatin accessibility data. Within the monocyte compartment, we identified CD14 + monocytes, FCGR3A + monocytes, and a proinflammatory monocyte population expressing high levels of CD86 and ITGAX which encodes CD11c, an integrin observed to be expressed during an inflammatory response.41 Both conventional and plasmacytoid dendritic cells were observed in small numbers, in addition to four B cell populations labelled as transitional naive B cells, resting naive B cells, an activated B cell population expressing high levels of CD83 and ZEB1, and antibody-secreting plasma cells.42 The T cell compartment consisted of seven populations, including naive CD4 + T cells, central memory T cells [Tcm], gamma delta T cells [GdT], Th1/Th17 cells, mucosal-associated invariant T [MAIT] cells, CD8 + cytotoxic T cells, and an interferon-positive T cell population marked by elevated expression of interferon response genes including STAT1, ISG15, and IL12RB2.43–45
Gene markers used to identify cell populations were observed in the gene expression data and the chromatin accessibility data. ATAC-seq peak calling performed with MACS2 identified 519 220 peaks, of which 51.4% were observed in intronic regions, 35.1% in distal regions, 7.0% in exonic regions, and 6.5% in promoter regions. These proportions were consistent across all cell types [Figure 1C]. A total of 7060 DEGs were detected using the Wilcoxon rank sum test implemented using the FindAllMarkers[] function in Seurat v4, 50 of which are plotted in Figure 1D. Transcriptomic enrichments exhibited greater cell type specificity compared with chromatin accessibility, which primarily distinguished higher-level myeloid, B cell, and T cell compartments. Whereas more refined sub-clusters can be identified in the snRNAseq data, subsequent analyses focus on the cell clusters identified by joint profiling.
3.2. Transcriptomic profiles of adaptive immune cells reveal heterogeneity in PBMC of healthy donors and donors with Crohn’s disease
To understand heterogeneity of the PBMC samples, we performed hierarchical clustering on the transcriptomic profiles of the adaptive immune cells. Raw counts for total B and T cells were converted to counts per million [CPM] as pseudobulk profiles per donor, and Ward’s clustering was performed on the top 5000 expressed genes depicted on Figure 2A [see Methods]. Four clusters were observed for each compartment, labelled Groups 1 through 4 for the B cells and A through D for the T cells. Remarkably, similar grouping of donors [~80% concordance] was observed in both the B and T cell compartments. Two Crohn’s disease individuals, Samples 4 and 12, exhibited the most distinct transcriptomic profiles in both the B and T cell pseudobulk profiles, but the other three groups contained a mixture of healthy and Crohn’s disease samples, indicating that they are not disease-specific.
The samples were processed in three batches, although this did not make a significant impact on overall expression profiles [see Methods]. To ensure robustness, we used Combat31 to adjust for three batches in the pseudobulk profiles of the B and T lymphocyte compartments separately. This collapsed the four observed clusters in the unadjusted dataset to three by splitting B cell Group 3/T cell Group D [Supplementary Figure 6]. Crohn’s Sample 17 clustered with the two outlier samples in both compartments. Notably, there was almost complete co-clustering at the level of three groups, with 25 of the 26 B cell samples in the same group as their corresponding T cell sample. Since batch sizes are small, with approximately 10 samples each, there is inevitable confounding biology with batch, and Combat may overfit the technical effect. To confirm this, we performed Combat normalisation after five rounds of permuting batch labels. The results were similar, though intermediate between the unadjusted and true batch-adjusted datasets; 17 of 26 samples consistently co-clustered across all permutations, with most disagreements related to B cell Group 3 merging with T cell Group B instead of D. We conclude co-clustering is robustly supported at the level of three groups, the smallest of which is only seen in the Crohn’s disease samples, and there is less support for a possible further subdivision within one of the larger groups. Our further analyses nevertheless proceed assuming four clusters.
Cell type proportions per sample are shown in Figure 2B, showing all cell types present in most samples. However, Crohn’s Individuals 4 and 12 exhibited notable expansion of three cell populations: proinflammatory monocytes, activated B cells, and Th1/Th17 cells, suggesting unique biology in these patients. Visually, there is some correspondence between Group 2/B and cell proportions, but no clear differentiation of Groups 3/D and 4/C. To assess whether subpopulations were driving the hierarchical clustering of donor pseudobulk groups, we performed ANOVA on the proportions of cell subpopulations within each group [see Methods]. In the B cell compartment, activated B cells were significantly expanded in Group 1 [p = 3 × 10-8, Supplementary Figure 7A]. Excluding Group 1, ANOVA detected significant differences in plasma cell and activated B cell proportions [p = 0.01 and 0.03, respectively], and post hoc comparison showed elevated plasma cell proportions specifically in Group 2. It should also be noted that pseudobulk profiles may be influenced by elevated UMI counts observed in plasma cells. Within the T cell compartment, Th1/Th17 cells were significantly expanded in Group A [p = 4 × 10-16, Supplementary Figure 7B], and analyses excluding Group A suggest that IFN responsive T cells were elevated in Group B and Tcm were replaced by CD8 cytotoxic T cells in Group D [p = 0.02 and 0.04, respectively]. However, the notable variability among individuals within groups results in modest significance values, suggesting that cell type proportions alone do not dictate pseudobulk profile group assignments.
Regardless of the underlying cause of the correlated clustering of individuals’ transcriptomes into distinct groups, we hypothesised that these individuals have distinct gene regulatory mechanisms that correlate with these differentiated groups. We identified DEGs for each group within the B and T cell compartments [see Methods]. Within the B cells, each group exhibited distinct DEGs, including many transcription factors, suggesting distinct regulatory mechanisms within the groups. Group 1 had the most DEGs with 1581 genes [including 142 transcription factors], Group 2 had 954 DEGs [including 61 transcription factors], Group 3 had the fewest DEGs with 275 genes [including 33 transcription factors], and Group 4 had 884 DEGs [including 60 transcription factors]. Expression of the top upregulated transcription factors per group is shown in Figure 2C. Notably, Group 1 upregulated ZEB1 and NF-Kappa B complex transcription factors [NFKB1, REL, RELB], which have been shown previously to be associated with inflammation and immune cell activation.46,47 Group 2 upregulated plasma cell specific transcription factors PRDM1 and XBP1, likely due to increased plasma cell numbers in Crohn’s Samples 7 and 8.48 Group 3 shared many DETFs [differentially expressed transcription factors] with Group 1, suggesting similar regulatory biology between these groups, including upregulation of BACH1 and HIF1A, which have been associated with oxidative stress.49,50 Group 4 B cells exhibited the fewest upregulated transcription factors [n = 5], including LYL1 and KLF2, both implicated in controlling cellular homeostasis and proliferation.51,52
![Heterogeneity of adaptive immune cells. A] Pseudobulk clustering of B and T cells on the top 5000 expressed genes per population identifies four distinct subsets of individuals with ~80% concordance across B and T cell compartments. Concordant groups are colour coordinated, with Group 1 B cells being concordant with Group A T cells, Group 2 with Group B, Group 3 with Group D and Group 4 with Group C. B] Stacked bar plots showing the proportions of cell types per individual. C] Dot plots showing the gene expression of the top differentially expressed transcription factors identified for each group within the B cell [top] and T cell [bottom] compartments. D] Pathway enrichment shown as heatmaps for each group within the B [left] and T [right] compartments for each group. CPM, counts per million.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/ecco-jcc/18/12/10.1093_ecco-jcc_jjae055/1/m_jjae055_fig2.jpeg?Expires=1747886363&Signature=XMkKubE7j9GzXcxQZb3rWs-znPUMjOtW~n6ZRbsdzGQ0PzhCB7q~IxS6EwcYm3HFCLjlUlPz3gBXxFeSiAw6CjM4FTf-CJZsML4v8PghqlPc53267ZthcslJRi1QOI-R1bIzb~Nb~66PmrV8h~zRA8jz~ibOS9kg3JhvzmIGG-jOXgqk06uSJcuzq-02DU1mCEEIkeelVM0W5nTaXnmuqD8x1Dh4vMdNEX2s-g4W-25~elwQ7z5giOQL-x7obLIsOs3nRWs~mh679wUpT~74uJh0ufH~Ediru~q7rPb~7tqD4IwlETFZwkLzW4Bl65GZw3GjQAT0yFAyZYsV1Fk7cg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Heterogeneity of adaptive immune cells. A] Pseudobulk clustering of B and T cells on the top 5000 expressed genes per population identifies four distinct subsets of individuals with ~80% concordance across B and T cell compartments. Concordant groups are colour coordinated, with Group 1 B cells being concordant with Group A T cells, Group 2 with Group B, Group 3 with Group D and Group 4 with Group C. B] Stacked bar plots showing the proportions of cell types per individual. C] Dot plots showing the gene expression of the top differentially expressed transcription factors identified for each group within the B cell [top] and T cell [bottom] compartments. D] Pathway enrichment shown as heatmaps for each group within the B [left] and T [right] compartments for each group. CPM, counts per million.
Similarly, distinct DEG profiles were observed within the T cell groups, with some overlapping DETFs allowing establishment of concordance with the B cell groups. Group A [concordant with Group 1 B cells] was also observed to be the most distinct, with 2241 DEGs [including 183 transcription factors]. Group B [concordant with Group 2 B cells] had 916 DEGs [51 transcription factors], Group C [concordant with Group 4 B cells] had 600 DEGs [including 43 transcription factors, 41 of which are downregulated], and Group D [concordant with Group 3 B cells] had the fewest DEGs with 363 genes [including 33 transcription factors]. Expression of the top upregulated transcription factors per group is shown in Figure 2C. Like the B cells, distinct DETFs were observed within each donor group, suggesting distinct gene regulatory mechanisms contributing to heterogeneous transcriptomic profiles. Group A T cells shared five of the top 10 DETFs with their B cell compartment analogue, upregulating ZEB1 and NF-Kappa B complex transcription factors, suggesting increased inflammation and immune cell activation. Group C only upregulated two transcription factors, KLF2 and LEF1. Group D shared similar DETF gene expression with Group A, as well as Groups 1 and 3, including REL, JUNB, and JUND, implying inflammation and immune cell activation.53
Gene ontology enrichment-based pathway analysis on DEGs for each subset of donors in the B and T cell compartments [see Methods, Figure 2D] confirmed shared biological mechanisms across compartments. B cell Group 1 and T cell Group A differentially upregulate TGFß, a critical factor for cellular development, proliferation, and differentiation, and [relevant to the context of Crohn’s disease] fibrotic progression.54,55 These donors were also observed to differentially regulate genes within the NF-Kappa B signalling and MAPK signalling pathways in B and T cells. Group 2 donors within the B cell compartment upregulated genes involved in interferon signalling and EGF/EGFR signalling, which were also enriched in upregulated genes for concordant T cell Group B. Group 3 B cells and concordant Group 4 T cells exhibited lower-level upregulation of similar pathways as Group 1, including TGF-beta, NF-Kappa B, and MAPK signalling. These pathway enrichments in each group’s DEGs further suggest these individuals are heterogeneous with distinct biological programmes reflected by their transcriptomic profiles.
3.3. Mapping chromatin accessibility and transcription factor activity identifies differential gene regulatory networks in donor groups
Our multi-omic dataset of snRNA-seq and snATAC-seq data obtained from the same cells provides the opportunity to characterise differential GRNs. We identified DEGs and differentially accessible regions [DARs] in the chromatin associated with each group within both the B and T cell compartments [see Methods]. DARs were linked with genes by identifying overlapping cis-regulatory regions for each DEG. Potential transcription factor binding sites were identified within those DARs, revealing candidate transcription factors with regulatory potential for the target DEG [see Methods, Figure 3A]. Focusing on transcription factors, we summarised this data in circos plots for each group [Figure 3B–E] with rays indicating potential positive [red] or negative [blue] interactions. Interactions initiated by the TF binding open chromatin in the vicinity of a target TF are indicated by the broken second circle, and potentially responsive interactions are shown by extension of the ray to the outer circle.
![Gene regulatory network analysis. A] Rationale for linking transcription factors with genes. B–E] Circos plots depicting potential regulatory relationships for DETFs in B cells for Group A [B], Group B [C], Group C [D] and Group D [E]. Log2FC is the average log2 fold change of the gene expression of the transcription factors for each group from differential testing. Rays indicate potential interactions among transcription factors. Interactions initiated by the TF binding open chromatin in the vicinity of a target TF are indicated by the broken second circle, and potentially responsive interactions are shown by extension of the ray to the outer circle. TSS, transcription start site; TES, transcription end site.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/ecco-jcc/18/12/10.1093_ecco-jcc_jjae055/1/m_jjae055_fig3.jpeg?Expires=1747886363&Signature=t0VaLzidbDdrenz4N~kH8PtXQ8KEs5WCh8WI3G6JBFi-RhMziQbilvmOrfNqxaQQN0kxPNouPO2vGZbT8TLDLd5cimbbtSCj-jjWecfI5w8iJmUmDugL-LfKiu3q6w7yZit266xs7PtkPDBqgWpsWZzBX5k-VTCYGZQyaZWSDVQXTrEuQpYUjISX3Jl6FTezOtIDvw1uOBsdG5T9r~QoSXrnkZsWxpE9tm5jn3rNmxa6mmLRlXccWne~HaCMQf7v~K4pvi19~1BbtHKk3JKT91Iu2CRklZAHiK7tcvMUftWLkO99NG7AGmFBMdrnZiiruw4KysuAvGgYwW-QxgqFFg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Gene regulatory network analysis. A] Rationale for linking transcription factors with genes. B–E] Circos plots depicting potential regulatory relationships for DETFs in B cells for Group A [B], Group B [C], Group C [D] and Group D [E]. Log2FC is the average log2 fold change of the gene expression of the transcription factors for each group from differential testing. Rays indicate potential interactions among transcription factors. Interactions initiated by the TF binding open chromatin in the vicinity of a target TF are indicated by the broken second circle, and potentially responsive interactions are shown by extension of the ray to the outer circle. TSS, transcription start site; TES, transcription end site.
The transcription factor networks reveal different regulatory architectures among the four groups of donors, both in density of interactions and the involvement of specific DETFs. Within the B cell compartment, we identified 2000 potential regulatory relationships in Group 1, 281 in Group 2, 239 in Group 3, and 131 in Group 4. In Group 1, TCFL5, NRF1, ZEB1, and NFE2L2 were observed among the TFs with the most potential gene regulatory activity. Group 2 had the most potential transcriptional regulatory activity, with PRDM1 potentially targeting 35 DETF targets and itself receiving possible input from 13 regulators [including itself]. The gene PRDM1 encodes Blimp-1, a plasma cell master regulator, which aligns with higher proportions of plasma cells in this group.48,56 Within Group 3, BACH1 exhibited the most potential transcriptional regulatory activity, with 24 potential DETF targets and 14 potential regulators. BACH1 is reported to work with BACH2 in germinal centre regulation and B cell activation.48 The differential transcriptional regulatory profile for Group 4 was dominated by repression [n = 42 downregulated DETFs with potential gene regulatory relationships with DETFs], mediated primarily by KLF2, SPIB, and TFEC, all known to be essential for maintenance of B cell homeostasis and development.52,57,58
One DEG of interest encodes ZEB1, the highest differentially expressed transcription factor in the Crohn’s specific subset within both the B and T cell compartments. ZEB1 has been described as a master transcriptional regulator of the epithelial-mesenchymal transition in colorectal cancer [CRC] patients.59 In the B cell compartment, ZEB1 was potentially regulated by 24 DETFs, and within the T cell compartment there were 15 potential regulators of it. Notably, we found ZEB1 binding motifs in DARs within the cis-regulatory region of ZEB1 itself in both the B and T cell compartments, suggesting that ZEB1 may self-activate. An expanded positive feedback loop may be engaged, as potential ZEB1 transcription factor binding sites are observed in DARs within cis-regulatory regions for NFKB1, REL, FOXK1, FOXK2, MAX, GABPA, and NRF1, all of which harbour potential transcription factor binding sites in DARs within the cis-regulatory region of ZEB1.
3.4. Chromatin reprogramming in activated immune cells of two Crohn’s disease samples is associated with increased TGFß-signalling
As we observed a distinct TGFß signalling signature in a subset of our Crohn’s disease samples across multiple cell types, we investigated the nature of the differential regulation of this pathway. We conducted CellChat analysis of receptor-ligand interactions for Group 1 Crohn’s donors to determine which cell populations had potential to interact with each other [see Methods].34Figure 4A shows an interaction network highlighting the prospective engagement of multiple cell populations expanded in Crohn’s Group 1 samples, including activated B cells, proinflammatory monocytes, TGFB1 + NK cells, GdT cells, and Th1/Th17 cells], in cell-to-cell communication. Gene expression of the DEGs for Crohn’s Group 1 monocytes, B cells, T cells, and NK cells that are enriched in the TGFß signalling pathway are shown in Figure 4B, in comparison with healthy control cells. Notably, TGFB1 was differentially upregulated in Crohn’s Group 1 B cells and T cells, TGFBR1 was found to be differentially upregulated in Crohn’s Group 1 B cells, and TGFBR2 was found to be differentially upregulated in Crohn’s Group A T cells and monocytes compared with healthy controls.
Differentially expressed genes and accessible chromatin related to TGFß signalling in PBMC samples for Group 1 Crohn’s donors. A] Chord diagram illustrating the cellular crosstalk based on the gene expression of ligand and receptor pairs from the TGFß signalling pathway across cell type populations in two Crohn’s disease donors who differentially upregulate TGFß gene. B] Gene expression of genes from the TGFß signalling pathway which were found to be differentially expressed in Crohn’s immune cells versus healthy immune cells. C] Chromatin accessibility depicted as track plots and gene expression shown as log normalised counts on violin plots for ligand and receptor genes within the TGFß signalling pathway. Peaks highlighted in blue were found to be differentially accessible. D] Putative links of Crohn’s specific differentially expressed transcription factors [DETFs] and TGFß ligand/receptor pair genes via differentially accessible chromatin [DARs]. Links indicate a potential binding site of a DETF within a DAR which overlaps the cis-regulatory region of the target gene. PBMC, peripheral blood mononuclear cells.
Next, we asked whether chromatin accessibility at TGFß ligand and receptor genes is differentiated between Group 1 and all healthy controls. DARs were identified for Crohn’s Group 1 cells by comparing peaks identified from MACS2 in the snATAC-seq data of these donors relative to healthy controls [see Methods]. Several DARs were found in proximity to TGFB1, TGFBR1, and TGFBR2 loci across several cell populations as shown in Figure 4C, identifying candidate regulatory regions responsible for the T cell activation in this pair of Crohn’s disease patients. These DARs contained potential binding sites for several transcription factors identified in Figure 3B. Different numbers of DARs were observed in each gene and cell type, namely 5, 8, and 11 in TGFB1, TGFBR1, and TGFBR2, respectively, in monocytes; 3, 5, and 0 in the B cell compartment; and 6, 1, and 4 in the T cell compartment. Some of these transcription factors have been shown previously by ChIP-seq experiments to target TGFß ligand/receptors in [including FOSL2, JUND, NFKB1, SMAD3], but their specific modes of regulation, whether activation or repression, remain largely unknown.60–62 Whereas it is unlikely that all potential interactions defined by this in silico analysis are functional, our results suggest a complex transcriptional regulatory programme associated with increased TGFß signalling in a small subset of patients with Crohn’s disease within circulating immune cells.
3.5. A mutation mapped to TFBS in Crohn’s specific accessible chromatin suggests mechanisms for disruption of gene regulation
A subset of our multi-ome samples was genotyped using the H3Africa array which includes common single nucleotide polymorphisms [SNPs] observed in multiple African populations.19 We identified 2579 variants overlapping Crohn’s specific DARs containing potential DETF binding sites within DEG cis-regulatory regions in the B cell compartment, and 1886 in the T cell compartment. The two Group 1 Crohn’s disease samples have 11 variants located within these DARs not observed in the healthy donors within the B cells, and 17 in the T cells. One of these variants, rs78857247 C > A, is located within a DAR 67 kb downstream of the DEG SMAD4 [Figure 5A]. SMAD4 is known to become activated by TGFß signalling in cancer and acts as a tumour suppressor that inhibits epithelial cell proliferation.63,64 According to the ALFA database of dbGaP SNPs, the minor allele frequency in African Americans is 0.02, compared with 0.11 in Europeans, but it is 0.17 [5 of 30 genotyped alleles] in the Crohn’s disease multi-ome subset and absent from the six healthy controls.65 This is highly significant enrichment even for a small sample, but likely fortuitous since the odds ratio for rs78857247 in the largest African American IBD GWAS study is 1.01, and is well below the standard genome-wide significance threshold.66 Since the two Group 1 individuals are a heterozygote and homozygote for this rare variant, it may be a risk factor for the development of their aberrant inflammatory profile.
![Potential disruption of SMAD4 gene regulation. A] Differentially accessible chromatin within Crohn’s disease B cells overlaps potential binding sites for differentially expressed transcription factors [DETFs] ZEB1 and SP3, which may be affected by SNP rs78857247 observed in our Crohn’s disease samples. B] SMAD4 eQTL observed in eQTLGen, in which rs78857247 is located. Blue SNPs represent those which are fine mapped to be within the credible interval. SNP, single nucleotide polymorphism.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/ecco-jcc/18/12/10.1093_ecco-jcc_jjae055/1/m_jjae055_fig5.jpeg?Expires=1747886363&Signature=YPtYtmF6Ky6UzIoSn55M~OCiGesjg4pLcrIzGAN~6bA5iPgiz892H8~h3laLOL8tV4U8NeDGGlqyjJzsquk8YL~R-yEGszP09cuSEOw8BX82Ni6oN81fB6vU7WG-mBLPMsoRTkf4ClMReSoz4l7MYBr3Rpfwhs3ancd4nGgZOYncaAK7l3D4vRzTNecgF-4O7hNHa1uyK4IcAmN7L8br-D4q67b0zPUBg2LfseEbOydBdXkwwDBVK1BOMPstEGcMIXTtYvLWCT~NcsHjWxMLjOAaKPbFR7nPF8YzlzKfBf6-DgrfELSSMKrnuiXDxmPWaBAZciat4Oz8vsYrkv76Lw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Potential disruption of SMAD4 gene regulation. A] Differentially accessible chromatin within Crohn’s disease B cells overlaps potential binding sites for differentially expressed transcription factors [DETFs] ZEB1 and SP3, which may be affected by SNP rs78857247 observed in our Crohn’s disease samples. B] SMAD4 eQTL observed in eQTLGen, in which rs78857247 is located. Blue SNPs represent those which are fine mapped to be within the credible interval. SNP, single nucleotide polymorphism.
The location of rs78857247 overlaps a potential transcription factor binding site for two differentially upregulated transcription factors of the Group 1 donors, ZEB1 and SP3. The alternative minor variant involves a cytosine to adenine switch at a position which does not appear to substantially depend on a cytosine for ZEB1, whereas it does for SP3 [Figure 5A]. With this observation, we hypothesise that the A substitution reduces SP3 binding affinity specifically. The transcription factor SP3 recruits histone deacetylases to repress expression of a proximal gene target, even if the binding site is located downstream of the gene body.67 Reduction of SP3 binding affinity may allow ZEB1 to outcompete SP3, facilitating SMAD4 transcription. Supporting this hypothesis, the eQTLGen Consortium reports a SMAD4 eQTL that overlaps with this variant and the associated DAR [Figure 5B]. Furthermore, our own fine mapping showed that rs78857247 overlaps a Bayesian credible set of SNPs within this eQTL.68
4. Discussion
Crohn’s disease is a complex gastrointestinal illness influenced by genetic, immune, and environmental factors. Previous studies have not adequately included individuals with African American ancestries, who remain underrepresented in biomedical genomic research. Single nucleus transcriptomic and epigenetic studies on Crohn’s disease have also focused on cells from colonic tissue, despite evidence of differences within circulating immune cells. Our study aimed to address these gaps by profiling the multi-omic landscape of PBMC in 28 individuals, using both snRNA-seq and snATAC-seq data, allowing us for the first time to describe heterogeneity of GRNs within B and T cell compartments of Crohn’s disease and healthy control subjects. We identify an unexpectedly high concordance between the pseudobulk gene expression profiles of the B and T cell compartments and identify two individuals with a disease-specific activated immune cell profile driven by altered TGFß signalling.
Large-scale PBMC profiling at the single-cell level has been used for characterising overall differences between sample types, although studies are beginning to emphasise the heterogeneity of profiles.30,69 At least eight clusters of individual blood profiles have been documented in systemic lupus erythematosus [SLE], characterised by gene expression within the B, T, and myeloid compartments.10 Clinical outcomes vary among these profiles, suggesting individuals with distinct PBMC characteristics may exhibit varying symptoms. Similarly, we have shown that complex malaria is accompanied by multiple profiles of peripheral blood expression changes.70 Crohn’s disease is known for its transcriptional diversity in the colon, and numerous DEGs related to immunity correlate with penetrating rather than stricturing disease progression, overlaying considerable heterogeneity.71
Despite our relatively small sample of 28 individuals, our analysis identified four sub-clusters of PBMC profiles that to our knowledge are for the first time shown to be surprisingly concordant between the B and T cell compartments. Collapsing single-nucleus expression into pseudobulk transcriptomes revealed 80% of donor samples align with the same group in both cell types, surpassing the 30% expected by chance. This robust result holds true to different hierarchical clustering algorithms or inclusion criteria, despite minor switches in group membership for a handful of samples. This highlights the significance of understanding shared and distinctive aspects of B and T cell biology in Crohn’s disease, providing insights into coordinated responses of these cell types in disease pathogenesis. One group consists of two outlier Crohn’s disease samples with an aberrant profile, and another includes two individuals with elevated plasma cell counts. However, there is considerable interindividual variability in the abundance of B and T cell subsets within each group, suggesting gene expression and cellular proportions jointly drive clustering. Reanalysis of the OneK1K dataset comprising scRNA-seq of PBMC from ~1000 healthy Australian adults, summarised in Supplementary Figure 5 and described in Methods, demonstrates similar concordance of B and T cells [approximately 62%] between three major clusters of subjects, further supporting our findings. Due to study batch effects, we were unable to combine our dataset with Onek1k data; however we observe similarities in the major groups, and demonstrate that PC1 scores of the Crohn’s multi-ome group DEGs for B and T cells separate the groups within the Onek1k B and T cells [Supplementary Figure 5C]. This replication of concordance in a larger cohort further highlights interindividual variability of PBMC transcriptomes and the importance of confronting this heterogeneity in future studies.
The observed profiles may be due to intrinsic variability in the activity of key transcription factors shared between compartments or to the response of transcriptional networks to cytokine communication among immune cells. A combination of both mechanisms is also likely. Shared DETFs across the groups, including ZEB1, NFKB1, and SKIL in Groups 1/A, KLF2 in Groups 4/C, and JUNB in Groups 3/D, support the intrinsic transcription factor hypothesis. Individual-level analysis confirms these DETFs tend to exhibit similar expression levels within groups [Supplementary Figure 8]. However, these patterns might reflect a transcriptional response to the cytokine milieu, as immune profiles tend to return to an individuals’ normal state after perturbation.72,73 This dual perspective on the potential drivers of group profiles underlines complex interplay between intrinsic and extrinsic cellular heterogeneity and dynamic responses to environmental stimuli within the immune system. The profiles are independent of other known immune subtypes, suggesting the presence of crosstalk or regulatory mechanisms that remain to be characterised. Although concordance is not 100%, the heterogeneity of immune profiles within human blood is notable for implications in health and disease.
We focused our evaluation of GRNs in the total B and T cell compartments, despite expected differences in subpopulations of B and T cells. This decision was made for two reasons. First, ATAC markers show less cell type discrimination across cell types than DEGs [Figure 1D], and aggregating cells within each compartment provided greater statistical power since accessible chromatin was used to evaluate enrichment of transcription factor binding sites. Second, our main interest was in detecting regulatory networks contributing to heterogeneous groups into which individuals cluster. Evaluating GRNs in multiple cell types may offer more detailed networks, but this could overwhelm the higher-level signatures operating throughout and across compartments. By prioritising a holistic view of GRNs, our study captures overarching regulatory mechanisms that shape the immune landscape, providing a foundation for further investigations into the nuanced roles of distinct cell populations within the B and T cell compartments.
Our interpretation of the GRN represented in Figure 3 does not imply that there are only four networks, or that each individual exclusively engages a single network across all B or T cells. This is evident in Group 2, where only two individuals with Crohn’s disease have an excess of plasma cells, in addition to other unique DETF profiles observed in different individuals [Supplementary Figure 8A]. For the remaining groups, most individuals displayed differential expression of each DETFs. These results suggest attractor states in B and T cells characterised by specific suites of transcription factors regulating one another to shape the overall transcriptional profile. This provides a framework for understanding the stable yet dynamic nature of immune cell transcriptional profiles, contributing to our comprehension of regulatory dynamics underlying immune responses. The TGFß receptor-ligand co-activation network in Group A is an example of one potential feedback circuit, and our analysis suggests the primary contributors of this signature in the B and T cell compartments were from activated B cells and Th1/Th17 cells. The complexity and redundancy of these networks are notable, with each DETF potentially binding to numerous targets regulated by multiple transcription factors. The complexity of these networks and fluidity of gene expression may be a hallmark of immune responses. Boolean networks can help elucidate these dynamic transitions as they model switch-like behaviours of genes being turned on or off in response to environmental stimuli,74 allowing for exploration of the different regulatory states that circulating immune cells can assume in healthy individuals and those with Crohn’s disease. These findings are only reported from observations in the peripheral blood, limiting the scope of what can be inferred about environmental stimuli and activity in other tissue sites. Further research should address whether establishment of these GRNs is a response to the cytokine milieu or is from the regulatory logic due to polymorphism, which bias the dominance of specific transcription factors within an individual’s compartment.75,76
Two Crohn’s disease patients exhibited a PBMC profile distinct from the other 26 samples, which is not observed in the OneK1K dataset, suggesting potential association with inflammatory disease. This consists of elevated proportions of proinflammatory monocytes, activated B cells, and Th1/Th17-biased T cells, all with a signature of TGFß activation. No clinical metadata including treatment, response, age, and sex, provide an explanation for why these two donors have similar transcriptional profiles, suggesting the observations are due to similar inherent biological signatures. Although the potential therapeutic impact of targeting TGFß signalling is uncertain, this finding provides an example of the potential clinical benefits of single-nucleus omics in personalised medicine. A tantalising feature of these two individuals is that they both carry a polymorphism predicted to disrupt SP3 binding, which may induce SMAD4, an activator of the TGFß pathway. For an SNP with an allele frequency of 2% in African Americans, seeing it in three of the four alleles of these two patients is highly unexpected. We observed two other Crohn’s individuals to be carriers without having the inflamed profile, so although it is not sufficient to cause it, it may be a predisposing factor. The variant is not associated with Crohn’s disease in African Americans,77 and is not African American-specific [11%, in European ancestry cohorts]. Although this variant is not linked to Crohn’s disease in African Americans, future studies should evaluate whether it is associated with systemic inflammatory disease, specifically in African ancestry individuals. Our results illuminate potential regulatory mechanisms of increased TGFß signalling, but we acknowledge the exploratory nature of our analysis, and the functional relevance of all identified interactions requires experimental validation. Profiling other immune conditions is required to establish whether it is specific to IBD.
Limitations of this study include small sample size, restricted metadata, absence of a multi-omic replication cohort, and restriction of sampling to a single time point from one tissue. Whereas the majority of the Crohn’s disease patient samples clustered with three groups of healthy controls, further investigations with larger cohorts are necessary to evaluate whether membership of one or the other group is a risk factor for inflammatory disease or adverse outcomes. Paired tissue samples, such as lymphoid or intestine, would enhance the relevance of circulating immune cells in the peripheral blood to Crohn’s disease pathology and interactions between cell types. Obtaining metadata for clinical observations of inflammation severity is additionally necessary to identify associations of biological signatures and inflammation, reflected in the transcriptome, epigenome, and clinical presentation. Longitudinal profiling would be valuable to assess stability of transcriptome profiles over time, although previous studies using bulk RNA-seq and flow cytometry suggest individuals maintain stable profiles over decades.73 Inclusion of snATAC-seq data would also provide insights into the role of feedback loops in establishing GRNs.
In conclusion, this study investigated the multi-omic landscape of PBMC profiles in 28 individuals, revealing heterogeneous gene regulatory mechanisms in healthy individuals and individuals with Crohn’s disease. This allowed for characterisation of GRNs and heterogeneity in the B and T cell compartments, revealing four sub-clusters of PBMC profiles with concordance between the B and T cells. This identified a subset of Crohn’s disease individuals with a distinct PBMC profile characterised by an activated immune cell profile related to TFGß signalling supported by differential gene expression and chromatin accessibility. These findings emphasie the need for the further investigation of TFGß signalling within PBMCs of individuals with Crohn’s disease, along with exploring broader heterogeneity of Crohn’s disease and PBMC profiles of individuals. The focus on heterogeneity contributes to our understanding of Crohn’s disease and underscores the broader implications of immune system diversity in health and disease. This research has particular relevance in understanding the diverse mechanisms of health and disease, leading toward development of personalised medicine approaches.
Funding
This project was funded by the National Institute of Diabetes and Digestive and Kidney Diseases [grant number R01 DK087694] awarded to SK and GG.
Conflict of Interest
The authors declare no conflicts of interest.
Author Contributions
MB and GG prepared the presentation of the work, performed the bioinformatic pipelines, and wrote the drafts of the manuscript. SK provided the blood samples for the experiments. AD and VK conducted the assays and experiments. AD and FS wrote the Methods section of the manuscript. AD also provided the metadata for the patients. FS performed the sequencing of the libraries. SN performed the initial quality control for the transcriptome samples. EG identified the eQTL overlap and helped generate figures. GG and SK formulated the overall research plan, acquired financial support for the project, and edited the manuscript.
Acknowledgements
We thank all donors for their participation, as well as David Cutler, Emory School of Medicine, Atlanta, Georgia for correspondence on the African American IBD GWAS.
Data Availability
All snATAC-seq and snRNA-seq data are available through the Gene Expression Omnibus through accession GSE244831.