SEanalysis 2.0: a comprehensive super-enhancer regulatory network analysis tool for human and mouse

Abstract Super-enhancers (SEs) play an essential regulatory role in various biological processes and diseases through their specific interaction with transcription factors (TFs). Here, we present the release of SEanalysis 2.0 (http://licpathway.net/SEanalysis), an updated version of the SEanalysis web server for the comprehensive analyses of transcriptional regulatory networks formed by SEs, pathways, TFs, and genes. The current version added mouse SEs and further expanded the scale of human SEs, documenting 1 167 518 human SEs from 1739 samples and 550 226 mouse SEs from 931 samples. The SE-related samples in SEanalysis 2.0 were more than five times that in version 1.0, which significantly improved the ability of original SE-related network analyses (‘pathway downstream analysis’, ‘upstream regulatory analysis’ and ‘genomic region annotation’) for understanding context-specific gene regulation. Furthermore, we designed two novel analysis models, ‘TF regulatory analysis’ and ‘Sample comparative analysis’ for supporting more comprehensive analyses of SE regulatory networks driven by TFs. Further, the risk SNPs were annotated to the SE regions to provide potential SE-related disease/trait information. Hence, we believe that SEanalysis 2.0 has significantly expanded the data and analytical capabilities of SEs, which helps researchers in an in-depth understanding of the regulatory mechanisms of SEs.

was found to change the distribution of SEs via TGF-␤ signaling to orchestrate the functional drift of fibroblasts into the myofibroblastic phenotype ( 3 ). The Wnt signaling pathway can activate the expression of a canonical cancer dri v er MYC by collaborating with oncogenic SEs in colon cancer ( 5 ). An emerging study found that a group of patients with mutations of SE of the TAL1 oncogene exhibited a poor prognosis regardless of the le v el of oncogene d ysregula tion ( 6 ). This stud y demonstra ted tha t the mechanism of SEmediated oncogene dysregulation was critical for clinically distinct patient subgroups, and further emphasized the future of SE targeting thera py ( 6 ). Obviousl y, the complex networks formed by the functional interplay among pathways, TFs, SEs and genes are particularly critical in the study of regulatory mechanisms of many biological processes. Thus, we de v eloped SEanalysis ( 7 ) for downstr eam and upstr eam regulatory network analysis of SEs in 2019.
Exploring SEs has gradually re v ealed their mechanism of action in comple x networ ks. A large number of studies confirmed that TFs played a crucial role in the regulation of SEs. For example, the hormone-stimulated glucocorticoid receptor ( GR ) can bind to the upstream superenhancer region of oncogene DDIT4 to promote the interaction between GR and DDIT4 through the formation of a chromatin loop ( 8 ). Ther efor e, r e v ealing the regulatory mechanism of TFs in SE-associated regulatory networks by dissecting pathway-TF-SE-gene interactions in certain biological processes is of great significance. Further, studies showed that dynamically changing SEs influenced by SNPs were often key factors in promoting cell state changes compared with other SEs. For example, a B-cell-restricted SE, which was in contact with the SNP locus associated with systemic lupus erythematosus (SLE) and was mediated by STA T3 , was associa ted with B-cell deregula tion in SLE ( 9 ). Hence, se v eral studies focused on the clues of marker gene d ysregula tion, especially the upstream SEs and their TFs and SNPs.
To further meet the need of r esear chers and elucidate the regulatory mechanisms of SEs-associated network, we developed SEanal ysis 2.0, w hich significantl y expanded the data and analytical capabilities of SEs ( Figure 1 ). Currently, SEanalysis 2.0 documents 1 717 744 SEs from 2670 samples, including 1739 human samples and 931 mouse samples. Importantly, we added two new analyses ('TF regulatory analysis' and 'Sample comparati v e analysis'), and significantly extended three original analysis functions for understanding context-specific gene r egulation. Furthermor e, we provided annotation information on risk SNPs to link SEs with diseases / traits. Hence, SEanalysis 2.0 not only expanded species and large-scale data, but also facilitated more comprehensi v e anal yses, w hich might further promote the understanding of the biological mechanisms of epigenomic network regulation.

Data expansion
Super -enhancers. Compared with SEanal ysis 1.0, SEanalysis 2.0 provided significant data improvement. We added 931 mouse SE sets and extended 1198 human SE sets from the H3K27ac ChIP-seq data of the SEdb 2.0 database de v eloped by our group. Currently, the SE-related ChIP-seq samples in SEanalysis 2.0 ar e mor e than fiv e times than those in SEanalysis 1.0 (Supplementary Table S1). Specifically, the raw H3K27ac ChIP-seq data were collected from the NCBI GEO / SRA ( 10 ), ENCODE ( 11 ), Roadmap ( 11 , 12 ), Genomics of Gene Regulation Project (GGR) ( 11 ) and National Genomics Data Center Genome Sequence Archi v e (NGDC GSA) ( 13 , 14 ). In particular, the raw sequencing r eads wer e processed via optimized SE identification workflo w (Bo wtie2-MACS2-ROSE) and the r efer ence genomes were updated to hg38 and mm10. Currentl y, SEanal ysis 2.0 includes 1 167 518 human SEs from 1739 samples involving ∼180 cells / tissues and 550 226 mouse SEs from 931 samples involving ∼110 cells / tissues. SE-tar g et g enes. In SEanalysis 1.0, four dif ferent stra tegies were used to annotate SE-associated genes: closest acti v e genes ( 15 ), overlapping genes, proximal genes and the closest genes ( 16 ). Besides these original strategies, SEanal-ysis2.0 also provided two new target gene identification strategies named 'JEME' and 'Prestige'. The 'JEME' strategy constructed enhancer-target networks in 935 samples described in ( 17 ). The 'Prestige' strategy proposed in a previous study ( 18 ) connected the enhancers with their gene targets in 13 samples. We obtained these enhancer-gene relationships to assign target genes to our SEs. The histone modifications were strongly associated with gene expression, such as acti v e promoters were enriched by H3K27ac (19)(20)(21). Thus, we calculated the H3K27ac signal density in gene regions as the activity scores of each gene using deep-Tools (Supplementary Figure S1A, Supplementary Material) ( 22 ). Briefly, we first normalized H3K27ac ChIP-seq da ta with dif ferent sequencing depths using deepTools bam-Coverage with parameter '-normalizeUsing RPKM' to allow for unbiased comparisons of signal intensities. Then, the deepTools computeMatrix was used for extracting the activity scores for each gene with the following parameters 'scale-regions -p 10 -beforeRegionStartLength 3000 -regionBodyLength 5000 -afterRegionStartLength 3000 -skipZeros'.
T r anscription f actor data. We further extended the TF ChIP-seq data and the motif data, which were rapidly accumulated in recent years, to obtain a more comprehensi v e TF-SE relationship. For the TF ChIP-seq data, we looked at the corresponding databases, including ReMap 2022 ( 23 ), Cistrome ( 24 ), ChIP-Atlas 2021 ( 25 ), GTRD ( 26 ) and EN-CODE ( 11 ), all of which added large amounts of highquality TF ChIP-seq data to their updated versions. We obtained 10 710 human TF ChIP-seq samples and 1051 mouse TF ChIP-seq samples involving 1468 human TFs and 446 mouse TFs. Then, we identified TF-SE regulatory relationships based on the TF-binding sites in constituent enhancers of SE regions of corresponding cell / tissue types using BEDTools (v2.25.0) ( 27 ). We also predicted TF-SE regula tory rela tionships based on motif scanning using FIMO software with a P -value threshold of 1e-5 ( 28 , 29 ). For motif data, we collected 3680 human TF motifs of 869 TFs from HOCOMOCO v11 ( 30 ), JASPAR ( 31 ), Jolma2013 ( 32 ), Homeodomains ( 33 ), UniPROBE ( 34 ), and Wei2010 ( 35 ). HOCOMOCO is a new data source added in SEanalysis 2.0, which is a widely used motif source. Further, we added 742 mouse TF motifs of 568 TFs from HOCOMOCO v11 and UniPROBE. In order to reduce the limitations of motifbased prediction, we supported stricter FIMO threshold settings and calculated the number of TF binding sites in SE region to help filter TF-SE relationships.
Risk SNP annotation. We collected risk SNP information from the GWAS Catalog ( 36 ), which contained risk SNP locations, rsID and related diseases / traits. Then, we filtered risk SNPs lacking location information and obtained 449 062 risk SNPs related to diseases / traits. Finally, we used BEDTools to annotate these risk SNPs to the SE regions when the SNP locations overlapped with the constituent enhancers of SE regions (Supplementary Material, Figur e S1B). Furthermor e, we further calculated the number of risk SNPs related to each disease / trait for each sample. These annotation details of risk SNPs were displayed on the w e b server using interactive charts.
Sequence conservation. We first obtained phastCons scores for human and mouse sequences from the UCSC browser, which wer e measur ed based on multiple alignments of 99 vertebrate genomes to the human genome and 59 vertebrate genomes to the mouse genome, respecti v ely (Supplementary Material). Then, we used the bigwigAver-ageOverBed tool to calculate the conservation of each SE ( 37 ).

Enhanced functions in SEanalysis 2.0
In SEanalysis 2.0, the H3K27ac samples were significantly e xpanded. Notab l y, SEanal ysis 2.0 included 1739 human samples and 931 mouse samples. Human SEs were increased from about 330 000 to 1 167 518 and 550 226 mouse SEs were added. Meanwhile, the human TFs increased from 1044 to 1700, and 755 mouse TFs were added. With increasing data, the SE-associated regulatory network covered a larger amount of regulatory information about SEs, TFs , potential pathways , and genes. Thus , the original three analysis functions in SEanalysis 1.0 (pathway downstream analysis, upstr eam r egulatory analysis, and genomic region annotation) were significantly improved in SEanalysis 2.0. Furthermore, we added the gene activity score and risk SNP annota tion informa tion to the original analyses, thus further improving the interpretability of analysis results in SEanalysis 2.0.
We de v eloped a ne w analysis function as 'TF regulatory analysis' for supporting mor e compr ehensi v e analyses of SE regulatory networks driven by TFs, so as to further understand the context-specific TF regulation (Supplementary Figure S2A). The 'TF regulatory analysis' helped users discover the tissues / cells regulated by TFs of interest through SEs, and further elucidate TF-related functions and potential biological mechanisms in the specific tissue / cell. Specificall y, SEanal ysis 2.0 first determined a scope of TFs in each sample based on the 'FIMO Threshold' with the input of TFs of interest and the setting of the enrichment significance P -value, SE-gene linking strategies, and FIMO threshold. Subsequently, we performed the hypergeometric test between the input TFs and TF set of each sample to identify the enriched samples. After the enrichment analysis r esults wer e filter ed through the pr e-set significance P -value threshold', the significantly enriched samples and their information were displayed in the result table. The result table  included 'Sample ID', 'Species', 'Tissue type', 'Biosample name', 'Annotated TF', 'Annotated TF number', 'ALL TF number', 'P value' and 'q value'. Next, the users could further select up to two samples of interest and click on the 'Submit' button to obtain the detailed regulatory information and visualization, including regulatory network, risk SNP annotation, gene activity score and statistics information. Among these, the regulatory network is formed by annota ted TFs, pa thways containing these TFs, SEs bound by TFs, and SE-associated genes. For each sample, the relationships between SEs and annotated TFs were built through ChIP-seq data and motif scanning under a pre-set 'FIMO Threshold'. SEs and their target genes were linked using 'SE-Gene linking strategy'. We established pathway-TF relationship if the TF was a component of the pathway. Finally, these r elationships wer e merged to construct r egulatory networks. Furthermor e, we calculated the number of SEs bound by each annotated TF and displayed it in the histogram. The activity score of genes related to SEs bound by TFs was also visualized. We performed risk SNP annotation for each SE. The number of risk SNPs for each disease / trait associated with SEs was shown in the bar chart.
SEs are usually considered as cell / tissue-specific DNA regulatory elements. We added 'Sample comparati v e analysis' function to explore the roles of differential SEs and common SEs between two samples of interest (Supplementary Figure S2B). The users could select two SE samples of interest filtering by 'Species', 'Tissue Type' and 'Sample Name'. Further, the users could choose multiple thresholds, including 'FIMO Threshold' and 'SE-Gene Linking Strategies'. Next, we will compare the genomic regions of SEs between the two selected samples, these non-overlapping regions as specific SEs and overlapping region as common SEs of each sample (Supplementary Figure S2B). The 'Sample comparison analysis' returned the detailed regulatory network information of common / specific SEs in the two samples, which will contribute to evaluate the different regulatory roles of these SEs. The output result included the following: (i) the detailed information of the selected SE samples; (ii) the table and visualization of corresponding regulation networks of common / specific SEs in two samples; (iii) the gene activity score of common / specific SE-target genes; (iv) the topology of each node in the network, including degree, closeness, betweenness and page rank and (v) the ratio of risk SNPs within the SE region for each disease / trait.

Implementation
The current version of the SEanalysis w e bsite runs on a Linux-based Tomcat w e b server 8.5.78 ( http://tomcat. apache.org/ ). All data in this program is stored in the rela tional da tabase MySQL 5.7.16 ( http://www.mysql.com ). We built the project using Spring Boot 2.7.0 framework ( https://spring.io/projects/spring-boot ). The SEanalysis w e b interface was designed and built using Bootstrap W524 Nucleic Acids Research, 2023, Vol. 51, Web Server issue v5.1.3 ( https://getbootstrap.com/ ) and jQuery v3.6.0 ( http: //jquery.com ). We used ECharts ( http://echarts.baidu.com ) as a graphical visualization frame wor k, and JBrowse ( https: //jbro wse.or g/jb2/ ) as the genome browser frame wor k. We proposed using a modern w e b browser that supported the HTML5 standard such as Firefox, Google Chrome and Edge for the best display.

CASE STUDY
To explore the mechanism of action of leukemic cell marker TFs mediated by SEs, we downloaded 58 marker TFs of cancer cells in blood tissue from TF-Marker ( 38 ) database as input for the TF regulatory analysis, including SP1 and FLI1 , etc. (Figure 2 A). The parament settings were as follows: Species: Human; Fimo threshold: 1e-09; Enrichment Threshold: 0.05 and SE-Gene Linking Strategies: Closest acti v e. First, SEanalysis 2.0 performed enrichment analysis based on the input TFs and TFs binding to SEs in each sample after clicking the 'Analysis' button. The enrichment analysis results were displayed in the table, including 'Sample ID', 'Species', 'Tissue type', 'Biosample name', 'Annotated TF', 'Annotated TF number', 'ALL TF number', 'P value' and 'q value' (Figure 2 A). A total of 81 samples were significantly enriched. Among these, 63 (72%) enriched samples were related to blood, bone marrow, haematopoietic, and lymphoid tissues. Further, 16 of the remaining 18 samples were cancer cells / tissues, including breast cancer and renal clear cell adenocarcinoma (Figure 2 B). This result suggested that these TFs, the markers of cancer cells, might potentially impact the de v elopment and progression of cancer by regulating SEs.
The leukaemia cell sample (Sample 02 0337; MV411; P value = 0.00185) and the non-cancer sample (Sample 02 1329; CD34 + cells; P value = 0.00113) ranked highly as second and first among all enriched samples (Figure 2 A). The number of annotated TFs in the two samples was 11 and 7, respecti v ely. Ne xt, we selected these two samples and clicked the 'Submit' button to further explore the regulatory mechanism of TFs. We first constructed the transcriptional regulatory network formed by these annotated TFs in the corresponding sample, including the pathway containing these TFs, SEs bound by these TFs, and SE-related genes (Figure 2 C). This network was visualized interacti v ely and could also be viewed in the table. The node size r epr esented its degree in the network. The number of SEs bound by TFs was displayed in a histogram (Figure 2 D). We found that most TFs bound more SEs in cancer cells than in non-cancer cells. For example, TF RUNX1 with the highest degree regulated 606 SEs in the leukaemia cell line MV411 and only 54 SEs in noncancer cells. In contrast, TF GA TA1 regula ted only one SE in MV411 cells, but it regulated 61 SEs in non-cancer cells. Meanwhile, we used limma ( 39 ) to obtain the differential expression information based on GEPIA2 ( 40 ) tool on acute myeloid leukemia (LAML) data from TCGA. The result showed tha t R UNX1 and GA TA1 were significantly dif ferentially expressed genes. RUNX1 is an ov er-e xpressed gene (log 2 (FC): 2.686; P value: 4.73e-74), whereas GATA1 is an under-expressed gene (log 2 (FC): -4.762; P value: 5.41e-62). Furthermore, the annotation result of SNP showed that these SE regions tended to have SNPs related to blood cell processes and other traits. We provided a histogram to display the SNP number of each disease / trait in the MV411 sample. Among the top 10 diseases / traits, 7 were related to blood cells, including White blood cell count and Monocyte count (Figure 2 E).
Finall y, we anal yzed 529 genes associated with SEs occupied by 11 annotated TFs. Notably, these genes exhibited significantly higher activity scores compared with other genes in the leukaemia cell line MV411 (Figure 2 F). Next, we further analyzed these SE-target genes in LAML using GEPIA2. As expected, the scatter chart indicated that target genes of SEs were significantly correlated with TFs binding to SEs (Pearson r = 0.7, P value = 0) (Figure 2 G). The box diagram and heatmap showed that the e xpression le v els of these genes were significantly higher in LAML compared with normal samples. More importantl y, the survival anal ysis showed that the high expression of these genes was associated with poor overall survival (Figure 2 G).
The afor ementioned r esults indica ted tha t R UNX1 and other marker TFs caused the abnormal transcription of downstream genes in diseases by regulating SEs, and affected the disease process and survival. At the same time, specific disease / trait-related SNPs played an important role in activating and inhibiting SEs. Consistent with our results, a previous study showed that the SE in intragenic of RUNX1 was editing-outed will r epr essed RUNX1 , further inhibited cell growth and induced death in AML cells expressing mtRUNX1 ( 41 ). This proved the effecti v e analytical ability of SEanalysis 2.0 tools. Furthermore, SEanalysis 2.0 also re v ealed the nov el regulatory mechanism of master TF. Another study showed that GATA1 maintained the expression of the KIT receptor in human erythr oid pr ogenitors by binding to a stage-specific SE ( 42 ). Se v eral studies further showed that the mutation of KIT was significantly associated with a poor prognosis ( 43 ). The regulatory mechanism of GATA1 and SE in leukaemia was still unclear. Our analysis results hinted that the dysregulation of GATA1 during leukemia progression was likely associated with its upstream SE (Figure 2 D).

SUMMARY
The complex network mediated by SEs and associated TFs underlies lineage identity ( 44 ). SEanalysis has been de v eloped to provide SE-associated regulatory network analyses. A large number of recent studies focused on the clinical role of SE-mediated molecular mechanisms affecting oncogene d ysregula tion. TFs and SNPs play an important role in the action of SEs. They can activate or inhibit the SEs and affect their upstream and downstream regulatory relationships. Meanwhile, increasing evidence indica tes tha t SEs can be considered potential drug targets. To further advance mechanistic r esear ch, we de v eloped SEanalysis 2.0 for more comprehensi v e and fle xib le analyses for SE-related regulatory networks. SEanalysis 2.0 had two major improvements: data extension and enhanced anal ysis functions. SEanal ysis 2.0 currently offers 1739 human samples and 931 mouse samples, which are at least 5 times as many as in SEanalysis 1.0. Meanwhile, TF ChIP-seq data and motifs are further  collected to cover more TFs. We added > 600 human TFs and newly added 755 mouse TFs. The explosion of SEs and TFs has made the regulatory relationship coverage more comprehensi v e, thus significantly improving the ability of three SE-related network analysis tools in the original version 1.0 ('pathway downstream analysis', 'upstr eam r egulatory analysis' and 'genomic region annotation'). We also added two analytical tools: 'TF regulatory analysis' and 'Sample comparati v e analysis' for supporting mor e compr ehensi v e analyses of SE regulatory networ ks dri v en by TFs. The urgent need to elucidate the regulatory mechanisms of TFs mediated by SEs promotes the establishment of 'TF regulatory analysis' which can facilitate a comprehensi v e analysis of SE-dri v en TF regulatory networ ks. The effecti v eness of TF regulatory analysis was supported by the analysis of leukaemia samples. Furthermore, SEs can orchestrate cell type-specific gene regulation. The regulatory network analysis for special / common SEs in two samples is essential for understanding context-specific gene regulation. Thus, we also de v eloped the 'Sample comparati v e analysis' to help interpret cell type-specific regulatory roles of these SEs. The cell type-specific regulation of SEs is often associated with important biological processes and diseases. Accordingly, we further provided the annotation and statistics of risk SNPs, which could link SEs with diseases / traits. The enrichment of risk SNPs in these SE regions could provide information on mechanisms underlying diseases / traits at a cell type-specific le v el.
Hence, we belie v ed that SEanalysis 2.0 significantly expanded the data and analytical capabilities of SEs. SEanalysis 2.0 helped better explore the key role of SEs in molecular mechanisms underlying disease occurrence and cell biological processes, thus providing a more in-depth understanding of SEs for r esear chers. Notably, although we have used a large amount of ChIP-seq data to construct SE-TF rela tionships, ChIP-seq da ta cannot cover more TFs or cells / tissues. We extended SE-TF relationships using the motif-based method. Howe v er, the motif-based method has certain limitations. For example, some TFs with similar motifs are likely to be identified together increasing the number of non-functional hits. Obviously, it is necessary to continue to expand the TF-SE relationships identified from ChIP-seq da ta. With the accumula tion of ChIP-seq da ta, we will continue to update the SEanalysis.

DA T A A V AILABILITY
SEanalysis 2.0 is freely available without registration or login at http://licpathway.net/SEanalysis .