Mol* Volumes and Segmentations: visualization and interpretation of cell imaging data alongside macromolecular structure data and biological annotations

Abstract Segmentation helps interpret imaging data in a biological context. With the development of powerful tools for automated segmentation, public repositories for imaging data have added support for sharing and visualizing segmentations, creating the need for interactive web-based visualization of 3D volume segmentations. To address the ongoing challenge of integrating and visualizing multimodal data, we developed Mol* Volumes and Segmentations (Mol*VS), which enables the interactive, web-based visualization of cellular imaging data supported by macromolecular data and biological annotations. Mol*VS is fully integrated into Mol* Viewer, which is already used for visualization by several public repositories. All EMDB and EMPIAR entries with segmentation datasets are accessible via Mol*VS, which supports the visualization of data from a wide range of electron and light microscopy experiments. Additionally, users can run a local instance of Mol*VS to visualize and share custom datasets in generic or application-specific formats including volumes in .ccp4, .mrc, and .map, and segmentations in EMDB-SFF .hff, Amira .am, iMod .mod, and Segger .seg. Mol*VS is open source and freely available at https://molstarvolseg.ncbr.muni.cz/.


INTRODUCTION
Segmentation, which is the decomposition of a two-dimensional (2D) or three-dimensional (3D) image into regions that can be associated with defined objects, has been recognized as the bridge to interpreting microscopy data in a biological context. While medical imaging has traditionally relied heavily on manual segmentation, advancements in segmentation tools (1, 2), data format definition (3), and data sharing pipelines (4) have led to a substantial increase in the number of depositions of volume segmentation data from cell and molecular imaging experiments to public repositories (5, 6). Whereas many desktop applications provide segmentation functionality and visualization options (7–12), there is very limited support for the visualization of 3D volume segmentations in public repositories.
For example, the Cell Image Library (CIL) (13) has integrated CDeep3M (11) for performing segmentation tasks on CIL entries, but it only provides visualization of 2D slices. The Brain Observatory Storage Service and Database (BossDB) (14) has integrated a Neuroglancer interface (https://github.com/google/neuroglancer) that facilitates seamless zooming through superimposed 2D slices, but does not allow the real-time examination of 3D volumes. Similarly, the Electron Microscopy Data Bank (EMDB) (15) provides 3D visualization of densities, but segments are only defined on the 2D slices. Other repositories such as Electron Microscopy Public Image Archive (EMPIAR) (6) and Image Data Resources (IDR) (16) do not provide 3D visualization. Therefore, there is a need for accessible, web-based visualization of 3D volume segmentations, especially in the context of complex structural data and annotations linked across different databases.
We previously introduced Mol* as a library of tools for the visualization and analysis of macromolecular data (17). Mol* has become a large collaborative project, and its associated web-based 3D viewer Mol* Viewer (18) has been fully incorporated into the public interfaces of PDBe (19), RCSB PDB (20), and AlphaFold Protein Structure Database (21), enabling real-time visualization and interrogation of 3D models and related macromolecular data for millions of users.
Here, we introduce Mol* Volumes & Segmentations (Mol*VS) (https://molstarvolseg.ncbr.muni.cz/), a free tool based on Mol* and dedicated to the real-time visualization of large-scale volumetric data from cryo-EM, light microscopy, volume-EM, and other imaging experiments, as well as their segmentations and annotations for biological context. Mol*VS provides seamless access to all curated segmentation datasets available in EMDB and EMPIAR. Additionally, Mol*VS can be run locally to support the visualization and sharing of custom datasets containing volumetric segmentation data in several formats. Mol*VS is an open-source project hosted on GitHub (https://github.com/molstar/molstar-volseg).
• Displaying biological context annotations for each volume or mesh segment.
• Supplementing the volumetric and segmentation data with macromolecular coordinates.
• Concomitant display of different segmentations of the same dataset to facilitate visual comparison.
• Streaming data according to visualization needs.
Instructions on how to use Mol*VS are available at its web page.

Architecture and implementation
Mol*VS processes volumetric and segmentation data and delivers it to a dedicated Mol* Viewer VS extension, so that even very large datasets can be visualized with low latency. Mol*VS has four major components, namely a preprocessor module (written in Python), an internal database with preprocessed data, a server module that queries the internal database (written in Python), and a client module (written in TypeScript) that requests and interprets the data received so that it can be displayed (Figure 1).

Workflow
The Mol*VS workflow (Figure 1) is not exposed to users but is designed to ensure seamless integration within the Mol* environment, effectively providing web-based concomitant and interactive visualization of 3D volumes of cells, organelles, and molecules, together with volume segmentations and their annotations, irrespective of the size of the original dataset.
Input processing. Input processing is only needed when the internal database has to be updated by adding, removing, or changing entries. By default, the preprocessor module of Mol*VS takes two inputs for each entry: a volume or mesh segmentation file in EMDB-SFF format (.hff) and a 3D map file (.map, .mrc, .ccp4) from the EM reconstruction. Segmentation input can also be provided in other formats (Amira .am, iMod .mod, Segger .seg), which are internally converted to EMDB-SFF using the EMDB-SFF Toolkit (https://sfftk.readthedocs.io/en/latest/toolkit.html) integrated in Mol*VS. While the default workflow is optimized for EM data, Mol*VS also has experimental support for OME-NGFF input to facilitate the visualization of light microscopy data (not shown in Figure 1). The preprocessor module of Mol*VS converts the input into an internal format (Zarr, https://zarr.readthedocs.io), which is essentially a set of chunked, compressed, N-dimensional arrays. These preprocessed data are stored in the Mol*VS internal database in both original and downsampled forms, together with precomputed statistics, metadata, and internal biological annotations for each dataset (in JSON format).
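The idea of storing each volume "in both original and downsampled forms" can be illustrated with a minimal sketch. It assumes simple 2×2×2 block averaging on a nested-list volume. The function names `downsample_by_two` and `build_pyramid`, the `min_size` parameter, and the averaging kernel itself are hypothetical and chosen for illustration only. The actual preprocessor works on Zarr arrays and may use a different downsampling scheme.

```python
from typing import Dict, List

Volume = List[List[List[float]]]  # indexed as volume[z][y][x]


def downsample_by_two(volume: Volume) -> Volume:
    """Halve each axis by averaging non-overlapping 2x2x2 blocks.

    Trailing voxels on odd-sized axes are dropped (an assumption made for brevity).
    """
    nz, ny, nx = len(volume) // 2, len(volume[0]) // 2, len(volume[0][0]) // 2
    result: Volume = []
    for z in range(nz):
        plane = []
        for y in range(ny):
            row = []
            for x in range(nx):
                # Collect the eight voxels of the current 2x2x2 block.
                block = [
                    volume[2 * z + dz][2 * y + dy][2 * x + dx]
                    for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)
                ]
                row.append(sum(block) / 8.0)
            plane.append(row)
        result.append(plane)
    return result


def build_pyramid(volume: Volume, min_size: int = 2) -> Dict[int, Volume]:
    """Return {downsampling_factor: volume}, starting with the original (factor 1)."""
    pyramid = {1: volume}
    factor, current = 1, volume
    # Keep halving while every axis of the next level stays at least `min_size` voxels long.
    while min(len(current), len(current[0]), len(current[0][0])) // 2 >= min_size:
        current = downsample_by_two(current)
        factor *= 2
        pyramid[factor] = current
    return pyramid
```

Keeping every level precomputed trades storage for response time: a request for a large region can be answered from a coarse level without resampling the original data on the fly.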
Data delivery. Whenever data are requested by the Mol*VS client module, the server module performs a metadata request of the internal database and sends metadata to the client. Metadata is then used by the client to prepare the query for volumetric and segmentation data. Then, the client module sends the appropriate query for volumetric and segmentation data, specifying the volume/segmentation region and maximum acceptable size of the data. The server module decides the appropriate downsampling level based on the client request, queries the internal database, packs the requested volumetric and segmentation data into BinaryCIF format (26), and delivers them back to the client module.
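The server-side choice of downsampling level can be sketched as follows. This is a hypothetical illustration, not the Mol*VS server code: it assumes the request carries an axis-aligned box in original voxel coordinates and a voxel budget, and that the finest level fitting the budget is preferred. The names `voxels_in_box`, `choose_downsampling`, `max_voxels`, and `available_factors` are invented for this example.

```python
from math import ceil
from typing import Sequence, Tuple

# (min corner, max corner) of the requested region, in original voxel coordinates
Box = Tuple[Tuple[int, int, int], Tuple[int, int, int]]


def voxels_in_box(box: Box, factor: int) -> int:
    """Approximate voxel count of `box` after downsampling each axis by `factor`."""
    lo, hi = box
    count = 1
    for a, b in zip(lo, hi):
        count *= max(1, ceil((b - a) / factor))
    return count


def choose_downsampling(box: Box, max_voxels: int, available_factors: Sequence[int]) -> int:
    """Pick the finest available level whose voxel count fits within `max_voxels`.

    Falls back to the coarsest available level if none fits.
    """
    factors = sorted(available_factors)
    for factor in factors:
        if voxels_in_box(box, factor) <= max_voxels:
            return factor
    return factors[-1]
```

Under these assumptions, a request for a full 512³ region with a budget of ten million voxels is served at factor 4 (128³ voxels), whereas a 100³ sub-region fits at full resolution. This is how a bounded response size can coexist with detailed views when the user zooms in.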
Visualization. Visualization is handled by the client module (the Mol* Viewer VS extension). Upon receiving data from the server module, the Mol* Viewer VS extension unpacks it, allowing Mol* Viewer to create a state tree with volume and segmentation data. The corresponding entities are rendered on the 3D canvas (Figure 2). Segment annotations are displayed via the same mechanisms that support annotations for amino acid residues and protein chains. The concomitant display of related entries from different public repositories effectively allows different types of data to be examined within the same biological context (Figure 3A). Different segmentations of the same dataset are stored in separate entries of the Mol*VS internal database, which can be displayed concomitantly to facilitate comparison (Figure 3B).

Local instance of Mol*VS
While visualization is the focus of most end users, other categories of users can also benefit from Mol*VS. In particular, institutes and consortia who wish to share volume or mesh segmentation data with a private or public user community can host a local instance of Mol*VS. This way, their users can visualize the data in a web browser, without having to download anything. A step-by-step tutorial with all technical information needed to host Mol*VS locally is available in its GitHub repository.

Database coverage
The Mol*VS internal database covers all EMDB and EMPIAR entries with segmentation data, and is updated periodically. Additionally, we provide entries derived from EMDB, BioImage Archive, and IDR datasets to showcase its support for application-specific segmentation formats and facilitate comparison. A full description of the contents of the internal database is available in the Mol*VS documentation (see its web page). Individual users and platforms providing access to cell imaging data can freely host a local instance of Mol*VS and fill the internal database with any content supported by the preprocessor module.

Limitations and outlook
While recent years have seen an increase in the deposition of volume segmentation data, this trend is still in its infancy. Therefore, some of the source data may contain errors or may not be visualized well by Mol*VS when default settings are applied. Users can resolve or mitigate such issues by following the instructions listed in the documentation (see Mol*VS web page), and can report issues or give suggestions via the Mol*VS GitHub repository. Furthermore, Mol*VS currently provides only experimental support for the OME-NGFF format, as showcased in entry idr-6001240 of the internal database (Figure 3E), because this format is still in active development.
Limitations related to source data availability and format standardization affect our ability to optimize Mol*VS at this time. Nonetheless, we believe that EMDB-SFF and OME-NGFF will become the standard formats for sharing and storing EM and light microscopy data, respectively. Therefore, we are fully committed to adjusting and extending the Mol*VS support for these formats as needed. In fact, we are actively cooperating with teams from EMDB, EMPIAR and BioImage Archive (31) to ensure that Mol*VS always contains the latest segmentation data available in these primary sources, and that all updates to the data format are adequately supported. We are confident that, by facilitating the web-based visualization of segmentation data and annotations, Mol*VS will promote the deposition of segmentation data in public repositories.

Figure 3. (A) [...] (27)). (B) Concomitant display of two segmentation outcomes (manual and automated) for a dataset from cryo-EM combined with individual-particle electron tomography imaging of human plasma lipoproteins (in purple and azure) in complex with a monoclonal antibody (in green) (EMD-9094 (28)). (C) Mitochondrial reticulum in murine skeletal muscle imaged using ion beam scanning EM, where segmentation distinguishes mitochondria from other cellular structures (e.g., blood vessels highlighted in dark red) (EMPIAR-10070 (29)). (D) HeLa cells imaged using confocal microscopy (EMPIAR-10819, not published yet). In the image, the cell and background is highlighted in dark blue while the surrounding environment is green. (E) Nuclear segmentation of mouse blastocysts imaged using confocal microscopy (image 6001240 from dataset idr0062 (30)).

CONCLUSION
Mol*VS (https://molstarvolseg.ncbr.muni.cz/) is a powerful web application for the interactive visualization of volumetric and segmentation data supported by macromolecular data and biological annotations. Volume data may originate from various imaging experiments, from cryo-EM to classical light microscopy. Segmentation data may be provided in generic (EMDB-SFF .hff, OME-NGFF) or application-specific formats (Amira .am, iMod .mod, or Segger .seg). Both volumetric and mesh segmentations are supported. Multiple segmentations for the same dataset can be compared easily. Data streaming allows interactive visualization, irrespective of the size of the original dataset. Mol*VS facilitates the visualization of all EMDB and EMPIAR entries with segmentation data, but users may also run a local instance of Mol*VS to visualize and share custom datasets.

DATA AVAILABILITY
The Mol*VS web server and its documentation are accessible at https://molstarvolseg.ncbr.muni.cz/. The current version of the source code is in the Supplementary Data, while the most recent version is available at https://github.com/molstar/molstar-volseg.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.