Atomic Charge Calculator II: web-based tool for the calculation of partial atomic charges

ABSTRACT
Partial atomic charges serve as a simple model for the electrostatic distribution of a molecule that drives its interactions with its surroundings. Since partial atomic charges are frequently used in computational chemistry, chemoinformatics and bioinformatics, many computational approaches for calculating them have been introduced. The most applicable are fast and reasonably accurate empirical charge calculation approaches. Here, we introduce Atomic Charge Calculator II (ACC II), a web application that enables the calculation of partial atomic charges via all the main empirical approaches and for all types of molecules. ACC II implements 17 empirical charge calculation methods, including the highly cited (QEq, EEM), the recently published (EQeq, EQeq+C), and the old but still often used (PEOE). ACC II enables the fast calculation of charges even for large macromolecular structures. The web server also offers charge visualization, courtesy of the powerful LiteMol viewer. The calculation setup of ACC II is very straightforward and enables the quick calculation of high-quality partial charges. The application is available at https://acc2.ncbr.muni.cz.


INTRODUCTION
Partial atomic charges are real numbers that model the distribution of charge density in a molecule. Thus, they provide clues to the chemical behaviour of molecules, even though they are not physically observable entities. The concept of partial atomic charges was first used in chemistry for explaining reactivity (1,2). Partial atomic charges were also adopted by computational chemistry (e.g. applications in molecular dynamics, docking, or conformational searches) (3)(4)(5) and they also became popular in chemoinformatics (e.g. descriptors for QSAR and QSPR modelling, or virtual screening) (6)(7)(8) and bioinformatics (e.g. similarity searches, or the study of mechanisms and effects connected with certain chemical actions) (9,10). Many computational approaches have been introduced for calculating them. The most reliable method for partial charge calculation is an application of quantum mechanics (QM), as recently reviewed in Cho et al. (11). Because QM methods are generally time-demanding, quicker non-QM empirical charge calculation approaches were developed. Specifically, the non-QM empirical methods do not consider individual electrons (and/or basis functions) in the calculations, but work at the level of atoms. These approaches can be divided into conformationally independent, which are based on the 2D structure (so-called 2D methods; e.g. Gasteiger and Marsili's PEOE (12), MPEOE (13), KCM (14), or DENR (15)), and conformationally dependent, which are calculated from the 3D structure (so-called 3D methods; e.g. EEM (16), QEq (3), or EQeq (17)). Since non-QM empirical methods are often parameterized towards QM methods, their accuracy is comparable (see e.g. (18,19)), and due to the calculation speed, they are also applicable for biomacromolecules (10). For this reason, several non-QM empirical methods are frequently used and highly cited (e.g. PEOE > 3000 citations, EEM > 700 citations, QEq > 2000 citations), as was discovered in a literature search of the Web of Science database (https://www.webofknowledge.com/, 'All Databases' dataset) that we carried out on 31 March 2020. Complete results of the literature search are shown in the Supplementary Table S1.
The practical utilization of non-QM empirical charge calculation approaches brings with it three challenges:
1. Most non-QM empirical approaches are only described within an article, and their implementation is not available to the community. Only a few of the non-  (21), EEM-SOLVER (22), and ABEEM-SOLVER (22)), and only one non-QM empirical method (the EEM) is accessible as a web application (Atomic Charge Calculator (ACC) (23)). Despite the limited functionality of ACC, it became frequently used (∼2000 accesses per year).
2. The non-QM empirical approaches use parameters taken from physicochemical constants or QM charges. These parameters, however, are usually only optimized for specific types of molecules and not generally applicable. Therefore many parameter sets have been published, and their limitations are not easily accessible information.
3. Even though non-QM empirical approaches are much faster than QM methods, the time complexity of the conformationally dependent approaches is often O(N^3), where N is the number of atoms in the molecule, due to solving the system of linear equations. For their application on larger molecular systems (e.g. biomacromolecules), sophisticated complexity reduction algorithms (e.g. cutoff and cover methods (23)) have to be integrated.
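To illustrate where the cubic cost in the third challenge comes from, the following is a minimal sketch of an EEM-style calculation in plain Python. It assumes the commonly cited EEM form, A_i + B_i*q_i + kappa * sum_{j != i} q_j / R_ij = chi for every atom i, together with the constraint that the charges sum to the total molecular charge. The function names and all parameter values are hypothetical placeholders; they are not taken from any published parameter set or from the ACC II source code.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.

    The three nested loops are the source of the O(n^3) running time.
    """
    n = len(b)
    # Work on an augmented copy so the caller's data stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def eem_charges(coords, a_params, b_params, kappa, total_charge=0.0):
    """Return EEM-style charges for atoms at `coords` (hypothetical helper).

    Unknowns are the N charges plus the equalized electronegativity chi,
    so the system has N + 1 rows and columns.
    """
    n = len(coords)
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = b_params[i]
            else:
                matrix[i][j] = kappa / math.dist(coords[i], coords[j])
        matrix[i][n] = -1.0          # coefficient of the unknown chi
        rhs[i] = -a_params[i]
    for j in range(n):
        matrix[n][j] = 1.0           # charge conservation row
    rhs[n] = total_charge
    return solve_linear(matrix, rhs)[:n]


# Two atoms with made-up parameters, 1 distance unit apart.
charges = eem_charges([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                      a_params=[2.0, 3.0], b_params=[1.0, 1.0], kappa=0.5)
print(charges)
```

Because the dense matrix couples every atom with every other atom, a direct solve scales cubically with the number of atoms, which is why reduction schemes such as the cutoff and cover methods matter for biomacromolecules.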
In this article, we address all these challenges, and we provide Atomic Charge Calculator II, an update to ACC, which includes these key innovations:
• Implementation of 17 charge calculation approaches, including the highly cited (QEq, EEM), the recently published (EQeq, EQeq+C), and the old but still often used (PEOE). Where applicable, the approaches that involve solving linear equation systems use cutoff and cover methods (23) for the fast processing of large macromolecular structures. If the approaches utilize parameters, all the published parameter sets were collected from the literature and integrated into ACC II. The list of all parameters included in ACC II is available in the Supplementary Table S2.
• The visualization of charges uses the powerful LiteMol viewer (24), which offers several viewing options as well as the manipulation of computation results.
• The calculation setup is very straightforward and enables the quick calculation of high-quality charges.

DESCRIPTION OF THE WEB SERVER
ACC II is an interactive web application for the calculation of partial atomic charges via non-QM empirical charge calculation approaches and for the visualization of these charges. ACC II is composed of a frontend and a backend. The frontend is a modern web application written in JavaScript using the Bootstrap library. Its first function is to read the user input that consists of molecular structure(s) and computation settings (e.g. one of the charge calculation methods that are integrated into the backend). Its second purpose is to present the output, i.e. calculated charges. These charges are available as downloadable data files, and can also be visualized via the LiteMol viewer, which is part of the ACC II frontend. The backend is a Python Flask application. All the computations of charges are carried out by the core C++ application, which integrates 17 non-QM empirical charge calculation approaches. It also includes the implementation of cutoff and cover methods (23) for the fast solving of linear equation systems. All three parts of ACC II are available on GitHub under the MIT license: the frontend and backend are available at https://github.com/krab1k/AtomicChargeCalculator2, while the core is available at https://github.com/krab1k/ChargeFW2.

Non-QM empirical charge calculation approaches in ACC II
ACC II integrates nine conformationally independent (2D) methods (PEOE (12) (34)). An overview of these methods is depicted in Figure 1, which also shows relationships between the methods (i.e. if two methods are connected by a line, the upper is a successor of the lower), their division into 2D and 3D approaches, and the year of their publication. The principles and theoretical basis of all these methods (including their quality criteria) are described in the Short description of the methods in the Supplementary data.

ACC II workflow
The procedure for using the ACC II application involves six steps: (i) uploading the structure(s), (ii) internal validation, (iii) selecting the non-QM empirical method and its parameter set, (iv) executing the selected method, (v) visualizing the computed charges, (vi) downloading the computed charges.
Nucleic Acids Research, 2020, Vol. 48, Web Server issue W593
(i) Uploading the structure(s). The first step is to upload the molecular structure for which the charges will be calculated. The structure can be provided in SDF, MOL2, PDB, or mmCIF file formats. ACC II is also able to accept more than one structure. In this case, the accepted formats are the same, but the input files have to be compressed into one input archive, which can be in zip or tar.gz format.
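The accepted inputs described in step (i) can be sketched as a small classifier. This is an illustrative assumption only: it decides by file extension, the extension-to-format mapping is hypothetical, and it does not reflect how ACC II actually inspects uploaded files.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the formats named in the text.
STRUCTURE_EXTENSIONS = {
    ".sdf": "SDF",
    ".mol2": "MOL2",
    ".pdb": "PDB",
    ".cif": "mmCIF",
    ".mmcif": "mmCIF",
}
# Archive types that may bundle several structure files.
ARCHIVE_SUFFIXES = (".zip", ".tar.gz")


def classify_upload(filename):
    """Return ("archive", kind) or ("structure", format) for a file name."""
    name = filename.lower()
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            return ("archive", suffix.lstrip("."))
    fmt = STRUCTURE_EXTENSIONS.get(Path(name).suffix)
    if fmt is None:
        raise ValueError(f"unsupported input file: {filename}")
    return ("structure", fmt)


print(classify_upload("1f16.pdb"))
print(classify_upload("ligands.tar.gz"))
```

A single structure file is classified directly, while several structures would first be packed into one zip or tar.gz archive, as the text requires.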
(ii) Internal validation. The input files are then validated. Specifically, ACC II tests whether the files are in one of the supported formats and whether they contain the necessary information for the description of a molecular structure (i.e. coordinates of the atoms, definition of bonds in the case of small molecules). If the input files do not pass the validation procedure, the user is informed about detected problems. The most common one is that the input file does not conform to the standard of a particular file format.
(iii) Selecting the non-QM empirical method and its parameter set. After reading the input molecule(s), ACC II first detects non-QM empirical methods which can be executed on the user's data. Specifically, each non-QM empirical approach has one or more parameter sets. Some non-QM empirical approaches use just the tabular values of physicochemical constants as parameters. But most of the methods require more complex parameter sets, also containing parameters for individual elements (e.g. if the parameter set is focused on proteins, it contains parameters for C, H, O, N, and S, but it can lack parameters for Cl, Br, I, Si, etc.). Such a method can only be executed on a molecule composed of elements that are parameterized in at least one parameter set belonging to the method. Note that the methods that do not need parameters for individual elements cover every molecule. If a set of molecules is provided by a user, they can only use those approaches which cover all of the input molecules. Please note that it makes no sense to use multiple parameter sets on a single set of input molecules, since the computed charges will not be comparable. Methods that can be used for the specific input data set are further denoted as 'applicable methods'.
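The notion of 'applicable methods' can be sketched as a set-coverage check. The sketch below is a simplification under stated assumptions: parameter sets are keyed by element symbol only, a method is kept if at least one single parameter set covers every element occurring in the whole input, and the method and parameter set names are invented for illustration.

```python
def applicable_methods(molecules, methods):
    """Return {method: [usable parameter sets]} for the given input.

    molecules: iterable of sets of element symbols, one set per molecule.
    methods:   {method name: {parameter set name: set of elements or None}},
               where None marks a set that needs no per-element parameters.
    """
    # One parameter set must serve the whole input, so collect all elements.
    required = set()
    for elements in molecules:
        required |= set(elements)

    result = {}
    for method, parameter_sets in methods.items():
        usable = [name for name, covered in parameter_sets.items()
                  if covered is None or required <= covered]
        if usable:
            result[method] = usable
    return result


# Hypothetical methods and parameter sets.
methods = {
    "method_A": {
        "protein_set": {"C", "H", "O", "N", "S"},
        "organic_set": {"C", "H", "O", "N", "S", "Cl", "Br"},
    },
    "method_B": {"constants_only": None},
    "method_C": {"protein_only": {"C", "H", "O", "N", "S"}},
}
molecules = [{"C", "H", "O"}, {"C", "H", "Cl"}]
print(applicable_methods(molecules, methods))
```

In this example the chlorine atom rules out every protein-oriented parameter set, so method_C drops out entirely, while the method that relies only on tabulated constants remains applicable to any input.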
The users have two ways of selecting a non-QM empirical charge calculation approach from the applicable methods: they can use the automatic setup via the 'Compute charges' button or select a method themselves by pressing the 'Setup computation' button.
If the users select the calculation method themselves, they can not only choose the method, but also its parameter set where applicable (e.g. if more than one parameter set was published). On the 'Computation settings' web page, each approach and each parameter set are supplemented by a reference to the publication in which it was described.
If the user prefers the automatic setup, ACC II selects the approach that was documented as being the most suited from the applicable methods. Details about the selection process of the most suited charge calculation method are described in the ACC II online documentation (https://acc2.ncbr.muni.cz/static/manual.pdf). If the approach has more than one parameter set, the one that is the most suitable for the specific input molecules is selected (e.g. a parameter set specialized for drug-like molecules is used for small organic molecules). The list of parameter sets included in ACC II is available in the Supplementary Table S2.
[Figure 3 caption, partially recovered: An activator is marked with a blue oval, the C domain is marked with a green oval. The C domain of activated BAX is depolarized: it is mainly white or whitish in colour. This depolarization causes the C domain to be released, penetrate the mitochondrial membrane and initiate apoptosis. The partial atomic charges were calculated by EEM.]
(iv) Executing the selected method. The selected charge calculation approach is executed on the backend for each input molecular structure. Each computation on the backend has two inputs: a user-provided molecular structure and the selected parameter set. If the approach integrates cutoff and cover methods for solving a linear equation system and the molecular structure has more than 20 000 but fewer than 80 000 atoms, the cutoff method is utilised. If the number of atoms is equal to or higher than 80 000, the cover method is utilised.
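The size thresholds in step (iv) can be written down as a short decision function. The thresholds come from the text; the function name, the returned labels, and the assumption that structures with at most 20 000 atoms are solved without any reduction are illustrative guesses rather than documented ACC II behaviour.

```python
CUTOFF_THRESHOLD = 20_000   # above this, the cutoff method is used
COVER_THRESHOLD = 80_000    # at or above this, the cover method is used


def select_solver(atom_count, supports_cutoff_cover=True):
    """Pick a solving strategy for a structure with `atom_count` atoms."""
    if not supports_cutoff_cover or atom_count <= CUTOFF_THRESHOLD:
        # Assumption: small structures use the full linear system.
        return "full"
    if atom_count < COVER_THRESHOLD:
        return "cutoff"
    return "cover"


for n in (5_000, 20_001, 79_999, 80_000):
    print(n, select_solver(n))
```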
(v) Visualizing the computed charges. The calculated charges can be presented to the user via the LiteMol viewer that is integrated into ACC II (specifically, in its 'Computation results' web page). Three visualization models can be used: balls and sticks model, cartoon model, and surface model. All three visualization models can be coloured using the values of the computed charges. In the balls and sticks model (see Figure 2), the balls are coloured directly according to the partial atomic charge values. In the cartoon model (see Figure 3), the helices, sheets and tubes between them are divided into regions that represent individual amino acids. Each region is coloured according to the sum of atomic charges that belongs to a particular amino acid. In the surface model (see Figure 4B), the surface is divided into parts that belong to individual surface atoms. Each part of the surface is then coloured according to the partial atomic charge of the atom it represents.
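For the cartoon model, the per-residue value is the sum of the charges of the atoms in that residue. A minimal sketch of this aggregation follows; the tuple layout (chain, residue number, charge) and the charge values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def residue_charges(atoms):
    """Sum atomic charges per residue.

    atoms: iterable of (chain_id, residue_number, charge) tuples.
    Returns {(chain_id, residue_number): summed charge}.
    """
    totals = defaultdict(float)
    for chain_id, residue_number, charge in atoms:
        totals[(chain_id, residue_number)] += charge
    return dict(totals)


# Made-up charges for three atoms in two residues of chain A.
atoms = [("A", 1, -0.5), ("A", 1, 0.25), ("A", 2, 0.125)]
print(residue_charges(atoms))
```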
The colour scale spans from blue through white to red. Negative charges are red (the more negative the value of the charge, the more intense the colour) and positive charges are blue (the more positive the value of the charge, the more intense the colour). The closer the value of the charge is to zero, the closer its colour is to white. A user can select a relative colour scale that spans from the lowest to the highest charge value in the visualized structure, or an absolute colour scale that spans from one user-defined value to another.
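The colouring rule can be sketched as a mapping from a charge to an RGB triple. The linear interpolation, the assumption that zero always maps to white, and the function names are illustrative choices; the actual mapping used by LiteMol in ACC II may differ.

```python
def relative_scale(charges):
    """Bounds of the relative colour scale: lowest and highest charge."""
    return min(charges), max(charges)


def charge_to_rgb(charge, lowest, highest):
    """Map a charge to (R, G, B): negative is red, zero white, positive blue.

    `lowest` and `highest` are the scale bounds, either from
    relative_scale() or user-defined for an absolute scale.
    """
    if charge < 0 and lowest < 0:
        t = min(charge / lowest, 1.0)      # 0 at zero, 1 at the lower bound
        fade = round(255 * (1 - t))
        return (255, fade, fade)           # white fading to red
    if charge > 0 and highest > 0:
        t = min(charge / highest, 1.0)     # 0 at zero, 1 at the upper bound
        fade = round(255 * (1 - t))
        return (fade, fade, 255)           # white fading to blue
    return (255, 255, 255)                 # zero charge (or degenerate scale)


# Made-up charges, coloured on the relative scale.
charges = [-0.8, -0.2, 0.0, 0.4]
low, high = relative_scale(charges)
for q in charges:
    print(q, charge_to_rgb(q, low, high))
```

Charges beyond the chosen bounds are clamped to the most intense colour, which matters mainly for the absolute scale, where the user-defined range may be narrower than the range of computed charges.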
(vi) Downloading the computed charges. Partial atomic charges calculated using ACC II can be downloaded in PQR, MOL2, and plaintext file formats. ACC II provides one ZIP file containing charges for all input molecules in relevant output formats (PQR for proteins, MOL2 for small molecules, plaintext for both).

Limitations
ACC II currently has a few limitations: It includes only non-QM empirical charge calculation methods (not QM) that are fully automated (no hand-tuning required) and whose methodology is sufficiently described in their publications. The size of the input file is limited to 10 MB. The cartoon visualization model is only available when the input file is in the PDB or mmCIF format (i.e. formats containing information about amino acids and other residues). Non-QM empirical approaches that require parameters for individual atoms can only process molecules for which at least one of their parameter sets covers these molecules. The cutoff and cover methods for the fast solving of linear equation systems are integrated into EEM, SFKEEM, QEq, SMP/QEq, EQeq, EQeq+C. Other methods that involve linear equation systems employ different schemes for which the cutoff and cover methods are not directly applicable. Due to a limitation of LiteMol, the computation result cannot be visualized when an input molecule was provided in an mmCIF file that lacks the '_atom_sites' record.

RESULTS AND DISCUSSION
We provide three examples which demonstrate possible uses of the ACC II web application. The interactive form of these examples is presented on the ACC II webpage. Files with structures from these examples are available in the Supplementary data.

Example I -dissociating hydrogens from phenols
In the first example, we show a charge calculation for seven phenolic drug compounds (see Table 1), described in DrugBank. We obtained their structures from the PubChem database and calculated their partial atomic charges using ACC II (automatic setup). The results of the calculation are available on the ACC II web page. In this example, we would also like to provide a preview of charge utilization, namely an application in the field of acid dissociation study. Acid dissociation is a reaction in which a molecule releases a hydrogen atom. The ability of the molecule to release the hydrogen is described by its acid dissociation constant (Ka) and its negative logarithm (pKa). The relation between the charge of the dissociating hydrogen and pKa is well known and often used for pKa prediction (6,7,35). In our example, we focused on this relation. The dissociating hydrogen is a part of the phenolic OH group. For each of our seven compounds, we obtained the pKa value (from (7)) and the charge on the phenolic hydrogen and summarized these values in Table 1. It can be seen from Table 1 that there is a clear dependence between pKa and the charge on the phenolic hydrogen. Specifically, the higher the pKa (dissociation requires higher pH), the lower the charge on the hydrogen. More details about the relationship between pKa and charges in phenols and its application can be found in (7).

Example II -apoptotic protein activation
In the second example, we would like to show an application of charges in protein research. Specifically, we focus on the apoptotic protein BAX in its inactive and activated forms. We obtained the structures from Protein Data Bank (the inactive form has PDB ID 1f16 and the active form PDB ID 2k7w) and calculated their partial atomic charges using ACC II (automatic setup). The BAX protein initiates apoptosis in the following way: its C domain (marked green in Figure 3) is released and penetrates the mitochondrial membrane (10). The release is enabled by an activator (Figure 3B, blue oval), which causes a redistribution of partial atomic charges, a depolarization of the C domain and discharge of electrostatic forces binding the domain (see Figure 3). Partial atomic charges provide us with a clue to understanding the BAX activation mechanism. This mechanism is described in detail in the article (10).

Example III -transmembrane protein
In the third example, we show a charge calculation for a large transmembrane protein, the nicotinic acetylcholine receptor. This receptor passes the cell membrane (see Figure 4A) and serves as an ion channel (36). We obtained its structure from Protein Data Bank (PDB ID: 2bg9), added missing hydrogens via WHAT IF (37) and calculated the partial atomic charges using ACC II (automatic setup). The visualization of partial charges on the surface (see Figure 4B) highlights the difference between the nonpolar transmembrane part (mostly white due to charge around zero) and the polar surface of the extracellular and cytoplasmic parts (with a mosaic of blue positive and red negative charges). This charge distribution agrees with the receptor membrane position reported in the literature (36).

CONCLUSION
In this article, we presented ACC II, a novel web application for the calculation of partial atomic charges using all the main non-QM empirical approaches and for all types of molecules including biomacromolecules. ACC II also allows the visualization of charges via three main charge visualization models. The web application is easy to use and is platform-independent. Viewing and manipulating the results is fully interactive. All results of ACC II can be downloaded in various formats (PQR, MOL2, and plaintext format). Documentation explaining the methodology and examples is provided on the webpage of ACC II. ACC II is freely available at https://acc2.ncbr.muni.cz with no login requirement.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.