3D-GNOME 3.0: a three-dimensional genome modelling engine for analysing changes of promoter-enhancer contacts in the human genome

Abstract In the current update, we added a feature for analysing changes in spatial distances between promoters and enhancers in chromatin 3D model ensembles. We updated our datasets with the novel in situ CTCF and RNAPII ChIA-PET chromatin loops obtained from the GM12878 cell line mapped to the GRCh38 genome assembly, and extended the 1000 Genomes SVs dataset. To handle the new datasets, we applied GPU acceleration to the modelling engine, which gives a speed-up of 30× versus the previous versions. To improve visualisation and data analysis, we embedded the IGV tool for viewing ChIA-PET arcs with additional gene and SV annotations. For 3D model visualisation, we added a new viewer, NGL, in which we provide colouring by gene and enhancer location. The models are downloadable in mmCIF and XYZ formats. The web server is hosted, and performs its calculations, on DGX A100 GPU servers that provide optimal performance with multitasking. The 3D-GNOME 3.0 web server provides unique insights into the topological mechanisms of human variation at the population scale, with a substantial speed-up, and is freely available at https://3dgnome.mini.pw.edu.pl/.


INTRODUCTION
One of the primary challenges in human genetics, precision medicine, and evolutionary biology is deciphering gene expression regulation and understanding the transcriptional effects of genome variation (1-3). The three-dimensional organisation of chromatin and the spatial proximity between enhancers and gene promoters have been shown to impact gene expression significantly (4-7). Additionally, structural variants (SVs) that alter chromatin structure can profoundly affect gene regulation (8). Genomic studies indicate that SVs can directly impact the interactions between the promoter and enhancer regions of the chromatin (9, 10), and understanding these effects could lead to the development of new therapeutic targets and diagnostic tools (11, 12). Thus, developing an accessible tool to investigate these complex interactions is essential for advancing the understanding of gene expression regulation and genome variation. This paper proposes a new version of the 3D-GNOME web server (13, 14) that provides tools for comparing different 3D structures of the genome (Figure 1). It enables the analysis of changes in the modelled distance distribution between enhancers and gene promoters in the GM12878 cell line, with genomic rearrangements based on structural variants (SVs) provided by the 1000 Genomes Project (15). Furthermore, although the primary focus of the 3D-GNOME web server is to analyse large-scale genome rearrangements, such as structural variants, it is also capable of handling small changes such as insertions or deletions of a few nucleotides (indels), particularly if they have an impact on chromatin contacts. By offering differential contact sets from various samples, the system allows a broader range of structural polymorphisms to be analysed. In both cases, the server allows users to provide custom variants and chromatin interaction datasets.
The previous version of the 3D-GNOME web server (2.0) (13) was developed by incorporating structural variants to model 3D structural changes in chromatin, based on long-read ChIA-PET GRCh37 data and SVs from the 1000 Genomes Project. After updating both the 3D data (to in situ ChIA-PET on GRCh38) and the SV dataset from 1kGP, CPU-based computation turned out to be too slow. Therefore, to handle the new dense datasets, we extended the 3D-GNOME modelling engine (which is described in detail in (16)) with GPU acceleration. Moreover, because of the up to 30× speed-up now achieved, we can create ensembles of 3D models of reference and reordered genomes to study the changes in the promoter-enhancer modelled distance distribution, providing a more specific picture for understanding gene expression.
Apart from the major changes described above, we moved the web server to new GPU servers with A100 graphics cards, giving users fast results. Moreover, to make the results more comprehensible, we changed the loop viewer to IGV (17) and the model viewer to NGL (18). Also, for better visualisation of the modelling results, models are coloured based on gene promoter, gene body and enhancer location.
Nucleic Acids Research, 2023, Vol. 51, Web Server issue
As far as we are aware, only a limited number of web servers offer the ability to generate chromatin 3D models together with detailed genomic feature analysis (19, 20). However, none of these web servers provides the option to calculate changes in spatial distances between enhancers and genes caused by structural variants in different human populations by generating full ensembles of chromatin 3D models based on high-resolution ChIA-PET data, which is why we consider our new feature unique. This new release makes it possible to analyse the potential impact of genome spatial changes on gene activity, allowing for a deeper understanding of gene regulation and cellular processes.

New datasets
In the previous version of the 3D-GNOME web server, the modelling of chromatin structure was based on long-read ChIA-PET data, including CTCF and RNAPII chromatin interactions of the GM12878 cell line mapped onto the GRCh37 reference genome, as well as structural variants from 2502 samples from the 1000 Genomes Project release 3, also mapped onto GRCh37.
In the current version, we have replaced the previous dataset of CTCF and RNAPII interactions in the GM12878 cell line, which was obtained from long-read ChIA-PET (21, 22), with high-resolution data from in situ ChIA-PET (23), which was mapped onto the GRCh38 reference genome. The new dataset provides substantially more chromatin interactions with higher confidence and offers a more comprehensive and accurate view of the genome's spatial architecture in the GM12878 cell line. As a result of this new dataset, the quality of chromatin 3D models generated using 3D-GNOME has also improved.
The structural variant dataset (15) has been updated, with the previous GRCh37 version replaced by a GRCh38 version. The number of samples has been expanded to 3202 by including 30× high-coverage data from the NYGC on GRCh38. These updates provide a more comprehensive and accurate representation of chromatin structure, enabling further analysis and understanding of its impact on gene expression.

GPU-accelerated modelling engine
We have implemented GPU acceleration in our modelling engine, which is based on the Simulated Annealing Monte Carlo method, to address the significant increase in calculation time when analysing ensembles of chromatin 3D models built from much larger datasets of chromatin 3D contacts. As a result, we have achieved a 30× speed-up compared to the previous version. To facilitate subsequent analysis, we have converted the models from hcm, the 3D-GNOME native format, to the XYZ and mmCIF formats, which can handle models with many more beads than the PDB format.
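To illustrate the general idea of Simulated Annealing Monte Carlo on a bead model, the following is a minimal CPU-only sketch, not the 3D-GNOME engine itself. The energy function (a harmonic chain term plus a pull between contacting beads), the parameter names and the cooling schedule are all assumptions chosen for illustration; the actual engine and its GPU implementation are described in (16). The sketch also shows how a set of bead coordinates can be written in the plain XYZ text layout.

```python
import math
import random


def energy(coords, contacts, bond_length=1.0):
    """Toy energy (assumed, not the 3D-GNOME one): chain term plus contact term."""
    total = 0.0
    for i in range(len(coords) - 1):
        # Keep consecutive beads close to the target bond length.
        total += (math.dist(coords[i], coords[i + 1]) - bond_length) ** 2
    for i, j in contacts:
        # Pull beads that share a chromatin contact towards each other.
        total += math.dist(coords[i], coords[j]) ** 2
    return total


def anneal(n_beads, contacts, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    coords = [(float(i), 0.0, 0.0) for i in range(n_beads)]  # straight chain
    current = energy(coords, contacts)
    cooling = (t_end / t_start) ** (1.0 / steps)
    temperature = t_start
    for _ in range(steps):
        k = rng.randrange(n_beads)
        old = coords[k]
        # Propose a small random displacement of one bead.
        coords[k] = tuple(c + rng.uniform(-0.5, 0.5) for c in old)
        candidate = energy(coords, contacts)
        delta = candidate - current
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        else:
            coords[k] = old
        temperature *= cooling
    return coords, current


def to_xyz(coords, comment="toy chromatin model"):
    """Serialise beads as XYZ text: bead count, comment line, one bead per line."""
    lines = [str(len(coords)), comment]
    lines += [f"C {x:.3f} {y:.3f} {z:.3f}" for x, y, z in coords]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    model, final_energy = anneal(10, contacts=[(0, 9)])
    print(f"final energy: {final_energy:.3f}")
    print(to_xyz(model))
```

Running `anneal` repeatedly with different `seed` values gives a set of independent models, which is the sense in which an ensemble is used in the analysis described later.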

Updated web server architecture
The primary modelling task is performed on the Eden cluster, an in-house heterogeneous computing cluster equipped with Nvidia DGX A100 nodes. The Eden cluster is controlled by the Slurm (24) queuing system, which is deployed at the Faculty of Mathematics and Information Science at Warsaw University of Technology. The 3D-GNOME web interface runs in an LXC container in a Proxmox environment.
When a user submits a modelling request, the Flask web server executes a sequence of tasks, including validating the data, saving the data in a location shared with the Eden cluster, creating a database entry for the new task, and passing the task identifier to a concurrently running GNU Parallel process (25). GNU Parallel runs a Python script with a pipeline that performs local data pre-processing and then sends a request to run the modelling on the cluster.
Communication between the container and the Slurm controller is done through a REST API. The pipeline process periodically checks the status of the Slurm task, and when it receives information about the completion of the computation, it performs post-processing and updates the database entry. Once the modelling is complete, the user can view the results by refreshing the page.
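The polling step of such a pipeline can be sketched as follows. This is only an illustration under stated assumptions: `get_state`, `postprocess` and `update_db` are hypothetical callables standing in for the REST request to the Slurm controller, the post-processing step and the database update, none of which are specified in the text. The state names are standard Slurm job states.

```python
import time

# Slurm job states after which no further polling is needed.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def wait_for_job(get_state, job_id, poll_seconds=30.0, max_polls=1000,
                 sleep=time.sleep):
    """Poll a job until it reaches a terminal state and return that state.

    get_state is a hypothetical callable (job_id -> state string) that would
    wrap the actual REST request to the Slurm controller.
    """
    for _ in range(max_polls):
        state = get_state(job_id)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def run_pipeline(job_id, get_state, postprocess, update_db, **poll_options):
    """Wait for the job, post-process on success, then record the outcome."""
    state = wait_for_job(get_state, job_id, **poll_options)
    if state == "COMPLETED":
        postprocess(job_id)
    update_db(job_id, state)
    return state


if __name__ == "__main__":
    # Simulated job that is pending, then running, then done.
    fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
    outcome = run_pipeline(
        "demo-job",
        get_state=lambda job_id: next(fake_states),
        postprocess=lambda job_id: print("post-processing", job_id),
        update_db=lambda job_id, state: print("database:", job_id, state),
        poll_seconds=0.0,
    )
    print(outcome)
```

Passing the status lookup in as a function keeps the loop independent of the transport, so the same logic can be exercised without a cluster.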

Ensemble analysis
A key feature of the current update is the ability to analyse changes in spatial distances between gene promoters and enhancers caused by structural variants. This involves generating multiple chromatin 3D models for a specific chromatin region, both for the reference chromatin contact pattern and for the pattern affected by the SVs. Genes (GRCh38) and enhancers (based on Enhancer Atlas 2.0 (26), lifted over to GRCh38) are mapped onto each model, and the Euclidean distance between each gene and enhancer is calculated. The distance measure is specific to the 3D-GNOME engine, so the key factor for analysis is a change in the distance distribution, as demonstrated in Sadowski et al. (27). To test the significance of the change in distance distribution, we use the Mann-Whitney U test with a P-value threshold of 0.05. This analysis provides insights into the impact of SVs on gene regulation by identifying changes in the spatial proximity between gene promoters and enhancers.
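The comparison described above can be sketched in a few lines. This is a minimal illustration, not the server's code: models are assumed to be lists of (x, y, z) bead coordinates, `promoter_bead` and `enhancer_bead` are hypothetical bead indices, and the Mann-Whitney U test is computed here with a normal approximation (tie and continuity corrections included), which is one common way to obtain a two-sided P-value. The text does not say which variant of the test the server uses.

```python
import math


def pair_distances(models, promoter_bead, enhancer_bead):
    """Euclidean promoter-enhancer distance in every model of an ensemble."""
    return [math.dist(m[promoter_bead], m[enhancer_bead]) for m in models]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of the first sample, P-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u, min(math.erfc(z / math.sqrt(2.0)), 1.0)


def is_significant(reference, variant, alpha=0.05):
    """Apply the 0.05 threshold quoted in the text."""
    return mann_whitney_u(reference, variant)[1] < alpha
```

For example, calling `pair_distances` once on the reference ensemble and once on the variant ensemble, and passing both lists to `is_significant`, reports whether the distance distribution for that gene-enhancer pair shifted at the 0.05 level.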

Input
In the request form, as in the previous version, the user may use prepared datasets for GM12878 chromatin interactions, set the region of interest and the 3D modelling parameters, and choose the sample ID of structural variation from the 1000 Genomes Project database (15). It is also possible to upload chromatin interactions in BEDPE format or SVs in VCF format (VCFv4.2).
In the current version, we have added to the form a checkbox that runs the ensemble analysis, together with an option to set the number of models in the ensemble.
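As an illustration of the uploaded interaction format, the sketch below reads BEDPE text and keeps the loops that fall inside a region of interest. It assumes the usual tab-separated BEDPE layout in which the first six columns are chrom1, start1, end1, chrom2, start2 and end2; any further columns (for example a name or score) are kept untouched. The function and type names are hypothetical, and the exact validation rules the server applies are not described in the text.

```python
from typing import List, NamedTuple, Tuple


class Loop(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    extra: Tuple[str, ...]  # any columns after the six required ones


def parse_bedpe(text: str) -> List[Loop]:
    """Parse tab-separated BEDPE text, skipping blank, comment and header lines."""
    loops = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"line {line_no}: expected at least 6 columns")
        try:
            start1, end1 = int(fields[1]), int(fields[2])
            start2, end2 = int(fields[4]), int(fields[5])
        except ValueError:
            raise ValueError(f"line {line_no}: coordinates must be integers")
        if start1 > end1 or start2 > end2:
            raise ValueError(f"line {line_no}: start is greater than end")
        loops.append(Loop(fields[0], start1, end1, fields[3], start2, end2,
                          tuple(fields[6:])))
    return loops


def loops_in_region(loops: List[Loop], chrom: str, start: int, end: int) -> List[Loop]:
    """Keep loops whose two anchors both lie inside the region of interest."""
    return [
        loop for loop in loops
        if loop.chrom1 == chrom and loop.chrom2 == chrom
        and start <= loop.start1 and loop.end1 <= end
        and start <= loop.start2 and loop.end2 <= end
    ]
```

Rejecting malformed rows with the line number in the message makes it easier for a user to correct an upload before modelling starts.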

Output
The 3D-GNOME web server presents the new results in a fully responsive table and a boxplot generator for visualising the distribution of gene-enhancer distances. In addition, the web server has been updated with new tools for data visualisation, building on the functionality of the previous version.
Genome browser. We have integrated the Integrative Genomics Viewer (IGV) (17) as a genome browser, providing an alternative to presenting arc diagrams in static PNG format as in the previous version (Figure 2A). IGV is a highly responsive tool that allows users to visualise and manipulate genomic tracks, such as chromatin contact arcs, and gene, enhancer, and structural variation annotations for reference and variant samples. All data are displayed on the GRCh38 genome assembly. One notable feature of IGV is its ability to save results in a vector file format (SVG), which makes it easy to present results outside the web server.
3D viewer. We have integrated NGL (18), a modern and interactive molecular visualisation tool, to present chromatin 3D structures dynamically and intuitively. NGL enables users to explore and interact with 3D models generated by 3D-GNOME, allowing them to adjust the view, zoom in and out, and rotate the structures to better understand the spatial relationships between genes and enhancers. Users can investigate the impact of structural variation on the 3D organisation of the genome by displaying the reference and variant 3D models in two separate 3D viewers, coloured by the genes and enhancers mapped onto them (Figure 2B).
Promoter-enhancer distance comparison. We present the results of comparisons of promoter-enhancer distances in a responsive table generated using the Bootstrap package (Figure 2C). The table displays the genes, gene types (including pseudogenes), enhancers with an enhancer score, the average gene-enhancer distance in the ensemble for the reference and variant structures, the differences between these two ensembles, and the P-values for the significance of those differences. Users can search, sort, and filter the results by column. Furthermore, we have added an option to generate a distribution boxplot for a selected region (Figure 2D). The user may select rows with gene-enhancer pairs using checkboxes and use the 'Generate distance boxplots' button to submit the task. The task is then transferred asynchronously to Flask using Ajax, the boxplots are calculated, and they are drawn using the Seaborn package (28).
Finally, after the page refreshes automatically, the boxplots are shown on the results web page. The distance boxplots are displayed below the table and can also be downloaded from the download section.
Download section. The download section now includes the entire generated ensemble of models in mmCIF and XYZ format for manual analysis using common tools for visualising 3D structures, such as UCSF Chimera. Additionally, a TSV file with the results of the distance analysis is provided, including gene IDs, gene and enhancer coordinates, average distances in the ensemble, and the results of the Mann-Whitney U test of distribution changes (P-value and test statistic). Each gene-enhancer distance boxplot generated by clicking the 'Generate boxplots' button is also included in the output file folder.

CONCLUSIONS AND FUTURE PLANS
This latest update to the 3D-GNOME web server provides an advanced tool for analysing modelled distance changes between enhancers and gene promoters. This is a valuable resource for exploring the impact of 3D chromatin structure on gene transcription and regulation. The new version offers significantly improved speed and efficiency due to GPU acceleration and the Eden cluster architecture, enabling faster and more efficient chromatin modelling and analysis. We have also added new tools, including the NGL Viewer and the IGV genome browser, which enhance the user experience by providing an intuitive and visually appealing way to analyse data.
In the near future, we plan to extend our datasets of chromatin interactions by including additional cell lines, such as H1ESC, HFFC6 and WTC11, as well as new structural variants from the Simons Diversity Projects for modern humans and archaic populations, such as Neanderthals and Denisovans. Including these archaic populations will provide a unique opportunity to investigate the evolution of chromatin structure and its impact on gene regulation across different populations, shedding new light on the history and diversity of our species. We also plan to add new input formats and datasets, such as Hi-C data, which are already standard in the scientific community. To facilitate this, we plan to implement chromatin loop calling software in the web server, which is necessary for converting native Hi-C data for 3D-GNOME modelling. We will also add new annotation tracks to the IGV genome browser, such as cell line-specific H3K27Ac marks, and colour these genomic features on 3D models to improve accessibility and facilitate better analysis of the complex interactions between them.