ChemMaps.com v2.0: exploring the environmental chemical universe

ABSTRACT
Access to computationally based visualization tools to navigate chemical space has become more important due to the increasing size and diversity of publicly accessible databases, associated compendiums of high-throughput screening (HTS) results, and other descriptor and effects data. However, application of these techniques requires advanced programming skills that are beyond the capabilities of many stakeholders. Here we report the development of the second version of the ChemMaps.com webserver (https://sandbox.ntp.niehs.nih.gov/chemmaps/) focused on environmental chemical space. The chemical space of ChemMaps.com v2.0, released in 2022, now includes approximately one million environmental chemicals from the EPA Distributed Structure-Searchable Toxicity (DSSTox) inventory. ChemMaps.com v2.0 incorporates mapping of HTS assay data from the U.S. federal Tox21 research collaboration program, which includes results from around 2000 assays tested on up to 10 000 chemicals. As a case example, we showcase chemical space navigation for perfluorooctanoic acid (PFOA), a member of the per- and polyfluoroalkyl substances (PFAS) chemical family, which is of significant concern for its potential effects on human health and the environment.


INTRODUCTION
Navigating chemical space easily remains a challenge because of the rapid growth of chemical databases and the multiple steps required to represent multidimensional data in a navigable space. Chemography, defined as the field of navigating chemical space (1,2), faces scientific cyberinfrastructure challenges in improving navigation tools, as well as a need to leverage cheminformatics approaches to define the space, which typically rely upon complex projection techniques (3). Typically, a chemical space is defined based on a large set of chemicals projected into two or three dimensions, where relative distances between chemicals are a function of their similarity.
Navigating chemical space has several established applications in drug discovery. It can be used to identify analogs (4), guide drug optimization (5) or, more broadly, expand the drug space by observing and interpreting chemical similarity in a large chemical universe (6–8). However, little attention has been given to navigating the environmental chemical space, i.e. the chemical space defined using chemicals found in the environment, such as industrial chemicals, pesticides, food additives, personal care product ingredients and contaminants of emerging concern. Navigating within this space could have major applications in regulatory decision frameworks, to identify structural analogues for data-poor chemicals that could support risk assessments; in research projects involving non-targeted analysis, to define chemicals that may be present in the exposome (9); or in read-across, where similar chemical properties are used to fill data gaps for chemicals of interest (10), to name a few.
In 2018, we developed ChemMaps.com, a web-based tool inspired by Google Maps, to navigate the chemical space (11). We focused the first version of the tool on the drug space by projecting the DrugBank database (12), which included approved drugs and drugs-in-development. Subsequently, we developed an initial environmental chemical space using the U.S. EPA Toxic Substances Control Act (TSCA) inventory (https://www.epa.gov/tsca-inventory) of >40 000 chemicals. Here we present the development of ChemMaps.com v2.0, which substantially extends the scope of the first version to incorporate the U.S. Environmental Protection Agency's Distributed Structure-Searchable Toxicity (DSSTox) database (13); with over 1 million chemicals, this is the world's largest publicly available curated structural database for environmental chemicals. In ChemMaps.com v2.0, we have also mapped the rich in vitro assay data sets from the Tox21 and ToxCast high-throughput screening programs, covering thousands of cellular and molecular targets relevant to toxicological modes of action (14). In addition to the expansion of the chemical space and annotated data, ChemMaps.com v2.0 provides new functionalities based on user-identified needs.

Expansion of chemical space
Originally, ChemMaps.com was developed to navigate two maps, called the DrugMap and the EnvMap, computed respectively from (i) drugs and drugs-in-development available in the DrugBank database (12) and (ii) chemicals included in the TSCA inventory. ChemMaps.com v2.0 now includes a vastly extended environmental chemical universe divided into three maps. The largest map, called DSSToxMap, was developed by incorporating the U.S. Environmental Protection Agency's Distributed Structure-Searchable Toxicity (DSSTox) database, the world's largest publicly available curated database for environmental chemicals with over 1 million structures (15). The DSSToxMap includes all the chemicals in all of the other maps available in ChemMaps.com. We developed two additional sub-maps: PFASMap, with all 14 629 identified per- and polyfluorinated substances (PFAS) structures downloaded from the EPA chemical dashboard (13,16), and Tox21Map, with the 8236 chemicals that were tested in the Tox21 and ToxCast high-throughput screening programs (17). These two subgroups of chemicals were chosen, respectively, for their prevalence in the environment and the increasing concern over their ecological and human health impacts, and for their wealth of data on mechanistically informative targets. We updated the DrugMap with the latest version of DrugBank (v5.1.10, release 2023-01-04).
Maps were computed using an updated version of the approach originally developed in ChemMaps v1.0 (11), i.e. using a set of non-correlated and informative (non-null descriptor variance) 1D, 2D and 3D molecular descriptors computed using RDKit (version 2021). Chemicals are projected into the space using a combination of two principal component analyses: one calculated from the 1D and 2D descriptors, which gives the first two dimensions, and one calculated from the 3D descriptors, which gives the third dimension of the space. We developed a Python library called CompDesc (v1.0.3), made available to compute molecular descriptors using RDKit (https://test.pypi.org/project/CompDesc/).
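The projection described above can be illustrated with a minimal NumPy sketch. It is not the ChemMaps.com or CompDesc implementation: the function names, the correlation cutoff of 0.9, the greedy filtering order and the standardization step are all assumptions made for illustration, and the descriptor matrices are assumed to have been computed beforehand (for example with RDKit).

```python
import numpy as np

def drop_uninformative(X, corr_threshold=0.9):
    """Drop zero-variance columns, then greedily drop any column that is
    highly correlated with a column already kept (threshold is an assumption)."""
    X = np.asarray(X, dtype=float)
    X = X[:, X.std(axis=0) > 0]                      # informative descriptors only
    corr = np.atleast_2d(np.abs(np.corrcoef(X, rowvar=False)))
    selected = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < corr_threshold for k in selected):
            selected.append(j)
    return X[:, selected]

def pca_scores(X, n_components):
    """Standardize the columns and return scores on the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)  # rows of vt are the PC axes
    return Z @ vt[:n_components].T

def project_3d(desc_1d2d, desc_3d, corr_threshold=0.9):
    """x, y from a PCA on 1D/2D descriptors; z from a separate PCA on 3D descriptors."""
    xy = pca_scores(drop_uninformative(desc_1d2d, corr_threshold), 2)
    z = pca_scores(drop_uninformative(desc_3d, corr_threshold), 1)
    return np.hstack([xy, z])
```

Each row of the returned array is one chemical's (x, y, z) position; running two separate analyses keeps the third axis driven only by 3D shape descriptors, as the text describes.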

Chemical feature projections
Chemicals are represented in the space by a star or a planet, depending on the level of information available. For the DrugMap, stars are used to represent approved and withdrawn drugs and planets are used to represent drugs-in-development. For the environmental chemical maps (DSSToxMap, PFASMap and Tox21Map), stars are used for chemicals that have an oral acute toxicity class available according to the United Nations Globally Harmonized System of Classification and Labelling of Chemicals (UN GHS) (18,19). In addition to the shape representation, up to five features can be chosen by users and used to color chemicals on the maps. DrugMap features include experimental physicochemical properties available in the DrugBank database; DSSToxMap, PFASMap and Tox21Map features include predicted physicochemical properties computed using OPERA v2.8 (20).

Chemical bioactivities
We have mapped the rich in vitro assay data sets from the Tox21 and ToxCast high-throughput screening programs, covering thousands of cellular and molecular targets, to the Tox21Map in 3D chemical space. Users can select specific assays, or groups of assays based on their target, to view chemical activities and the most active assays per chemical on the Tox21Map. We also provide an interactive spreadsheet to navigate the data and the option to project onto each chemical its most potent activity, as represented by the lowest AC50 and the corresponding assay target. We used the curated version of the HTS data processed using the US EPA's tcpl R package (21) and the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) Integrated Chemical Environment curation and annotation workflow (22–24).
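As an illustration of the "most potent activity" summary, the sketch below reduces a list of assay results to the lowest AC50, its assay, and the number of active assays per chemical. The record layout, the function name and the convention that an inactive result carries `None` are assumptions for this example, not the schema used by ChemMaps.com or tcpl.

```python
from collections import defaultdict

def summarize_activity(records):
    """records: iterable of (chemical_id, assay_name, ac50) tuples.
    ac50 is None when the chemical was inactive in that assay (assumed convention)."""
    best = {}                      # chemical_id -> (assay_name, lowest AC50 so far)
    n_active = defaultdict(int)    # chemical_id -> number of active assays
    for chem, assay, ac50 in records:
        if ac50 is None:
            continue               # inactive results do not contribute
        n_active[chem] += 1
        if chem not in best or ac50 < best[chem][1]:
            best[chem] = (assay, ac50)
    return {
        chem: {
            "most_potent_assay": assay,
            "lowest_ac50": ac50,
            "active_assays": n_active[chem],
        }
        for chem, (assay, ac50) in best.items()
    }

# Hypothetical records, for illustration only
example = [
    ("chem_A", "assay_1", 12.0),
    ("chem_A", "assay_2", 0.5),
    ("chem_A", "assay_3", None),
    ("chem_B", "assay_1", None),
]
summary = summarize_activity(example)
```

Chemicals with no active assay, such as `chem_B` here, are simply absent from the summary.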

User-defined chemicals
In ChemMaps.com v2.0, users can upload up to 100 of their own chemicals to project onto a chosen map. Input options include SMILES format, CASRN ID or DTXSID (unique structural identifier from US EPA's DSSTox database). Chemicals are prepared and projected on the fly in three steps to allow users to control and refine their input. To save computational time, each uploaded chemical is saved internally in our database. On the map, uploaded chemicals are represented using a rocket and are assigned an ID that can be used in the search bar.
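A rough sketch of how the three input types could be told apart before preparation is given below. The function names and returned labels are hypothetical and the webserver's actual parsing is not described here; the CASRN check-digit rule is the standard one (digits weighted 1, 2, 3, ... from the right, summed modulo 10). Anything that is neither a DTXSID nor a CASRN is only labeled as a candidate SMILES, since validating it would require a cheminformatics toolkit such as RDKit.

```python
import re

DTXSID_RE = re.compile(r"DTXSID\d+")
CASRN_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")
MAX_CHEMICALS = 100  # upload limit stated in the text

def casrn_checksum_ok(casrn):
    """Validate the CAS Registry Number check digit."""
    m = CASRN_RE.fullmatch(casrn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))

def classify_identifier(token):
    """Return a label for one input token (labels are illustrative)."""
    token = token.strip()
    if DTXSID_RE.fullmatch(token):
        return "DTXSID"
    if CASRN_RE.fullmatch(token):
        return "CASRN" if casrn_checksum_ok(token) else "invalid CASRN"
    return "SMILES (unvalidated)"

def classify_upload(lines):
    """Classify every non-empty line, enforcing the 100-chemical limit."""
    tokens = [ln.strip() for ln in lines if ln.strip()]
    if len(tokens) > MAX_CHEMICALS:
        raise ValueError(f"at most {MAX_CHEMICALS} chemicals per upload")
    return [(t, classify_identifier(t)) for t in tokens]
```

For example, `classify_identifier("DTXSID8031865")` and `classify_identifier("335-67-1")` (the DTXSID and CASRN of PFOA) are recognized as identifiers, while `"CCO"` falls through to the SMILES branch.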

Webserver navigation
Since all information and coordinates of the molecules are pre-computed, browsing does not require computational skills. The DSSToxMap, which includes more than 1 million chemicals, is loaded by subset, each comprising around 10 000 chemicals centered on the chemicals of interest, i.e. preselected chemicals or those uploaded by users. ChemMaps.com v2.0 was developed to work on commonly used web browsers, was tested on Firefox 111.0.1, Chrome 105.0.5195.102 and Edge 104.0.1293.70, and requires the WebGL JavaScript API as a dependency.

Webserver development
The webserver was developed using Django in Python 3.9 on a Linux server. Data are stored in a PostgreSQL database that holds the molecular descriptors, coordinates and 20 closest neighbors for each chemical, along with the corresponding prepared structure. More than one million chemical entries are included in the database.
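The idea of pre-computing the closest neighbors from map coordinates can be sketched as follows. This brute-force version is an assumption for illustration only: the function name and the use of Euclidean distance on the projected coordinates are not confirmed by the text, and at the scale of a million chemicals a spatial index or a database-side query would be needed instead of an all-pairs loop.

```python
import heapq
import math

def nearest_neighbors(coords, k=20):
    """coords: dict mapping chemical ID -> (x, y, z) map coordinates.
    Returns a dict mapping each ID to its k closest IDs, nearest first."""
    neighbors = {}
    for cid, point in coords.items():
        # Distance from this chemical to every other chemical on the map
        others = (
            (math.dist(point, other_point), other_id)
            for other_id, other_point in coords.items()
            if other_id != cid
        )
        neighbors[cid] = [other_id for _, other_id in heapq.nsmallest(k, others)]
    return neighbors

# Hypothetical coordinates, for illustration only
toy_map = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (3, 0, 0), "D": (7, 0, 0)}
toy_neighbors = nearest_neighbors(toy_map, k=2)
```

Storing the result once means that browsing a chemical's neighborhood becomes a simple lookup rather than a distance computation.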

Application for PFOA
Here we demonstrate the use of ChemMaps.com v2.0 to explore the chemical space around perfluorooctanoic acid (PFOA). PFOA belongs to the per- and polyfluoroalkyl substances (PFAS) chemical family, also called 'forever chemicals'. These chemicals contain at least one polyfluoroalkyl chain that gives them particular resistance properties, and they are used in consumer products and industry (25). PFAS, and particularly PFOA, are high-concern chemicals since they are found in the blood of >97% of Americans, and there is strong emerging evidence that they can contribute to a variety of adverse health effects, including altered immune and thyroid function, liver disease and cancer (26).
Nucleic Acids Research, 2023, Vol. 51, Web Server issue W81
First, we searched for PFOA (DTXSID8031865) on the DSSToxMap (Figure 1). PFOA does not have a measured oral acute toxicity value, but its ammonium form, ammonium perfluorooctanoate, has been studied and is classified as GHS 4 (harmful if inhaled). The chemical space around PFOA is an area where the density and diversity of chemicals are low. Most of the direct neighbors of PFOA are also PFAS chemicals. A variety of analysis features may be projected onto the space. For example, PFOA and its closest neighbors (up to 20) fail the Lipinski rule by one property, which makes them more likely to be absorbed by the body and be bioaccumulative. We then explored PFOA on the PFASMap, which includes only PFAS chemicals (Figure 2). Most of its neighbors on the DSSToxMap are also its neighbors on the PFASMap, and we can examine whether neighboring chemicals are predicted to be androgen receptor antagonists or estrogen receptor agonists, which can help to identify endocrine disruption effects of these chemicals (27,28) (Figure 2). The neighborhood of PFOA includes only a few chemicals with an acute toxicity GHS classification (most in class 4 or 5), demonstrating that a broad range of these 'forever chemicals' remain untested in traditional toxicity studies. Finally, we explored the neighborhood of PFOA on the Tox21Map, where the lowest AC50 and the count of active assays from the Tox21 high-throughput screening program are mapped for each chemical (Figure 3). PFOA is reported active in 75 assays from the Tox21/ToxCast program and is most active in an assay that targets transthyretin (TTR) (CCTE_GLTED_hTTR_dn), with an AC50 of 0.43 μM. TTR is one of the major transport proteins responsible for binding thyroid hormones and transporting them to the necessary tissues.
A detailed view of available toxicological information and specific assay results by chemical can be explored on the US EPA CompTox Chemicals Dashboard (13), which is linked from ChemMaps v2.0 via the chemical ID in the chemical information panel. Looking specifically at the five closest chemicals in the neighborhood of PFOA, we noticed that surrounding chemicals are most active in assays that impact crucial pathways such as steroid hormone metabolism (NVS_ADME_hCYP2C9), xenobiotic metabolism (LTEA_HepaRG_CYP2B6_up), hepatic metabolism (CCTE_Deisenroth_AIME_384WELL_CTox_Inactive_dn) and functional neural network activity (CCTE_Shafer_MEA_dev_spike_duration_mean_dn). Exploring this space using this visual tool gives potential activity clues for assays and endpoints that have not yet been tested, and can assist read-across analyses where similar chemical profiles can fill testing gaps.
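The Lipinski feature mentioned for the DSSToxMap can be illustrated with a short sketch that counts rule-of-five violations from property values. The cutoffs are the standard rule-of-five thresholds; the function name and the example property values are hypothetical, and how ChemMaps.com derives this feature is not described in the text.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of rule-of-five criteria that a chemical violates."""
    checks = {
        "molecular weight > 500 Da": mol_weight > 500,
        "logP > 5": logp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

# Hypothetical property values, for illustration only
one_violation = lipinski_violations(mol_weight=350.0, logp=5.6, h_donors=1, h_acceptors=4)
no_violation = lipinski_violations(mol_weight=300.0, logp=2.0, h_donors=1, h_acceptors=3)
```

A chemical that "fails by one property" in the sense used above would return a list of length one.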

DISCUSSION
The expansion of the ChemMaps.com v2.0 space to an unprecedented number of environmental chemicals, and the ability to project user-defined chemicals, have tremendous utility for quickly visualizing and exploring chemicals of potential concern for impacts on human health and the environment, and for guiding future research directions. The ChemMaps.com v2.0 website is free and open to all users, has no login requirement, and is available at https://sandbox.ntp.niehs.nih.gov/chemmaps/.