Abstract

Previous studies have shown the porphobilinogen synthase (PBGS) zinc-binding mechanism and its conservation among the living cells. However, the precise molecular interaction of zinc with the active center of the enzyme is unknown. In particular, quantum chemistry techniques within the density functional theory (DFT) framework have been the key methodology to describe metalloproteins, when one is looking for a compromise between accuracy and computational feasibility. Considering this, we used DFT-based models within the molecular fractionation with conjugate caps scheme to evaluate the binding energy features of zinc interacting with the human PBGS. Besides, phylogenetic and clustering analyses were successfully employed in extracting useful information from protein sequences to identify groups of conserved residues that build the ions-binding site. Our results also report a conservative assessment of the relevant amino acids, as well as the benchmark analysis of the calculation models used. The most relevant intermolecular interactions in Zn2+–PBGS are due to the amino acids CYS0122, CYS0124, CYS0132, ASP0169, SER0168, ARG0221, HIS0131, ASP0120, GLY0133, VAL0121, ARG0209, and ARG0174. Among these residues, we highlighted ASP0120, GLY0133, HIS0131, SER0168, and ARG0209 by co-occurring in all clusters generated by unsupervised clustering analysis. On the other hand, the triple cysteines at 2.5 Å from zinc (CYS0122, CYS0124, and CYS0132) have the highest energy attraction and are absent in the taxa Viridiplantae, Sar, Rhodophyta, and some Bacteria. Additionally, the performance of the DFT-based models shows that the processing time-dependence is more associated with the choice of the basis set than the exchange–correlation functional.

Investigation of the catalytic-zinc binding mechanism of human porphobilinogen synthase (PBGS) through quantum chemistry techniques within the density functional theory (DFT) and Clustering and Phylogenetic analysis.
Graphical Abstract

Investigation of the catalytic-zinc binding mechanism of human porphobilinogen synthase (PBGS) through quantum chemistry techniques within the density functional theory (DFT) and Clustering and Phylogenetic analysis.

Significance to Metallomics

During the synthesis of porphobilinogen (PBG), the human porphobilinogen synthase (PBGS) enzyme requires zinc to play the chemical reaction. The amino acids closest to the metal provide the ideal environment for maintenance by enabling zinc to coordinate the asymmetric condensation between two chains of aminolevulinic acid. Previous investigations faced great difficulty to get reliable information about the relationship between zinc and the proteins at a low computational cost. This work reveals the variable energy levels between the PBGS residues and zinc ion. Structural, phylogenetic, and clustering analyses are also reported, as well as the performance of the density functional theory-based models used.

Introduction

Porphobilinogen synthase (PBGS; E.C. 4.2.1.24) is a protein indispensable for nearly all cellular life because it is the only enzyme able to synthesize porphobilinogen (PBG), a precursor of tetrapyrroles (e.g. porphyrins, corrins, chlorins), and an important molecule for the respiration, photosynthesis, and methanogenesis.1–3 Belonging to the Δ-aminolevulinic acid dehydratase (ALAD) family, PBGS is an enzyme highly conserved in both sequences (>35% for any two species and structure), being adapted to diverse niches and cell locations (e.g. cytosol, chloroplast, and apicoplast).2,4 The sequence of each human PBGS monomer is composed of 330 amino acids, which are part of two main domains: the αβ-barrel domain, with ∼300 residues, comprising the enzyme active site, and the N-terminal “arm” domain, with about 25 amino acids involved with multimeric interactions.2 The abnormal functioning of PBGS can lead to a profound loss in heme biosynthesis, resulting in porphyria diseases.5 Also, PBGS is considered a lead exposure biomarker6 capable of holding up to 99% of heavy metal in the bloodstream in cases of poisoning. The retention occurs through the replacement of zinc within the catalytic pocket.7 Two distinct zinc-binding sites have been described in mammalian PBGS: one being composed of histidine and cysteine amino acids (non-essential Zn), and the other one consisting mainly of three cysteine amino acids (catalytic Zn).8–10 The catalytic mechanism to form PBG is well established, occurring through the asymmetric condensation between two chains of α-aminolevulinic acid (ALA). Besides, the ε-amino groups of the lysines within the hydrophobic pocket (LYS0199 and LYS0252 in human PBGS) form a Schiff base (—CH = N—) with the C4 of the ALA molecules, originating, respectively, the side chain A and the side chain P (propionic) of PBG.11 While it happens, the thiolate groups (S–) from three cysteine residues (CYS0122, CYS0124, and CYS0132) bind to catalytic zinc, a water molecule, and P-ALA chain, providing a suitable geometry to the catalytic reaction12 (see Fig. 1).

Crystal structure representation of PBGS zinc-binding site complex and synthesis of PBG. (A) Conformation of human PBGS (PDB entry: 5HNR),12 with N and C representing the amino and carboxyl terminal regions; (B) tetrahedral zinc coordination formed by thiolate groups from CYS0122, CYS0124, and CYS0132, and a water molecule (H2O). Also shown a schematic illustration of the asymmetric condensation of two molecules of α-aminolevulinic acid (A-ALA and P-ALA) to form PBG.
Fig. 1

Crystal structure representation of PBGS zinc-binding site complex and synthesis of PBG. (A) Conformation of human PBGS (PDB entry: 5HNR),12 with N and C representing the amino and carboxyl terminal regions; (B) tetrahedral zinc coordination formed by thiolate groups from CYS0122, CYS0124, and CYS0132, and a water molecule (H2O). Also shown a schematic illustration of the asymmetric condensation of two molecules of α-aminolevulinic acid (A-ALA and P-ALA) to form PBG.

Among the various ions discovered in proteins, zinc (Zn2+) is the second most abundant in the human body serving an essential biological function.13 Bioinformatic analysis of the human genome suggests that up to 3000 proteins participate in zinc binding, which correspond to about 10% of all encoded proteins.14 For instance, zinc-dependent metalloproteins have been widely investigated as potential drug targets in several diseases ranging from cardiovascular disorders to cancers.13 The principles governing metal–protein complexes depend upon the physical and chemical properties of the metal ion (e.g. its charge and size) and the protein environment.13,15 Curiously, the plastic coordination of Zn2+ yields a varied biological use, which allows zinc to be found in all recognized classes of enzymes.2,16,17 In the protein environment, zinc can be accommodated by harder donors as oxygen and nitrogen, and also softer donors as sulfur.18 In this sense, some authors suggest that zinc plays an important physiological role in the protection of thiols groups against irreversible oxidation.18–20

Metalloproteins are challenging objects regarding the investigation of their chemical reactivity and metal–protein relationships that control the biological function using theoretical approaches. Clearly, adequate information about the energetics of protein–ion interactions is required for an understanding of these systems. Given that a significant component of the free energy of ion complexation involves both polarization and charge transfer effects, proper estimation of the interaction energies among the ion and the relevant ligands can be derived from quantum mechanical (QM) calculations.21,22 Knowing that the complexity of these biomolecules often requires a compromise between accuracy and feasibility, DFT has been widely employed to study metalloproteins with many atoms unable to be described by wave-function-based methods.23 For instance, many enzymatic complexes containing transition metal active sites, such as cytochrome P450, glycine N-methyl transferase, and deoxyribonucleotidase, have indeed been addressed and solved by using DFT approaches.24,25 Fragment-based approaches, including the molecular fractionation with conjugate caps (MFCC), make possible to study a large number of amino acids with low computational cost, enabling working on a linear processing scale while preserving the precise molecular description.26,27 Recently, numerical studies were carried out to check the performance of a new metal-MFCC approach, and the calculated potential energies and atomic forces of a series of metalloproteins with different metal ions yielding excellent agreement with the full system calculations at the M06-2X level.22

Additionally, there is a variety of in silico approaches such as phylogenetics28 and clustering analysis to extract useful biological information from sequences.29 Recently, Ibuot and collaborators made use of these tools to identify groups of conserved residues that build cation diffusion facilitator transporters from algae. Many metal ion transporter genes may be attractive targets for future applications of metal content biological engineering in plants or microorganisms.30

Taking into account that evolutionary and QM techniques can provide a more comprehensive understanding of the zinc-binding relationship in human PBGS, we intend here to report the intermolecular energy between catalytic zinc divalent metal ion and the amino acids of human PBGS using the MFCC scheme within a DFT framework, as well as the conservation of the most relevant residues. Our main goals are

  • to depict the energetic features that form the PBGS zinc-binding site,

  • to show how the results from QM and molecular evolution methods can be helpful for a better understanding of the protein–ligand binding mechanisms, and

  • to report the performance of the DFT-based methods employed in this work.

Materials and methods

Zn2+–PGBS complex data

To perform the in silico simulations, we used the X-ray crystallographic data of the human zinc-dependent metalloprotein (PDB ID: 5HNR) at a resolution of 2.83 Å.12 The crystal asymmetric unit contains two enzymes (A and B), both exhibiting differences in shape and amino acid composition, with the enzyme B showing a higher number of missing residues, while considering the enzyme A as our pattern of study. Afterward, we obtained the protonation state of the protein amino acids set up at physiological pH using the PROPKA 3.1 package.31 We withdrew the δ-amino valeric acid from crystal structure and added missing hydrogen atoms for amino acids according to the protonation study, as well as for water molecules. Then, heavy atoms were constrained, and hydrogen atoms were submitted to a classical geometry optimization using the Chemistry at Harvard Macromolecular Mechanics (CHARMm) force field, which has parameters specifically for organomolecules.32 We set the convergence tolerances to 10−5 kcal/mol (total energy variation), 0.001 kcal/mol Å (root mean square gradient), and 10−5 Å (maximum atomic displacement).

Molecular fractionation with conjugated caps

The interaction energy between the Zn2+ (ligand) and the PBGS protein (enzyme) was obtained throughout quantum mechanical calculations (see below) within the MFCC scheme. Proposed by Zhang and Zhang in 2003,33 the MFCC scheme has been widely used to calculate the interaction energy between protein–ligand and protein–protein complexes.26,34,35

Based on this scheme, a protein is decomposed into individual amino acid fragments by cutting it through its peptide bonds. To preserve the local chemical environment and comply with the valence requirements, a pair of conjugate caps is designed to saturate each fragment, and hydrogen atoms are added into the molecular caps to avoid dangling bonds. Here, we labeled the ligand as Zn, and the residue interacting with it as Ri, with i denoting the index of the ith amino acid residue. The cap C1−i (C1−i) is formed by the neighbor residue covalently bound to the amine (carboxyl) group of residue Ri along the protein chain, providing a better description of its electronic environment. In this way, the interaction energy between zinc and each amino acid residue of PGBS active site, IEZnRi, is calculated by
(1)
where the term E(Zn + C1−iRiC1+i) is the total energy of the system comprised by the zinc ion, the reference residue, and their respective caps. To single out the Zn2+–residue interaction energy, we subtracted from the first step the capped residue energy, E(C1−iRiC1+i), and the energy of a system formed by the Zn+2 ion and the caps, E(Zn + C1−iC1+i). Finally, notice that we removed the interaction energy between the caps twice when we subtracted the two terms above; to correct this, the E(C1−iC1+i) term was added.

Quantum calculations

After Zn2+–PBGS fragmentation, the structures generated are further submitted to QM calculations within the DFT framework by using the Gaussian 09 package,36 in a dual X5690, 3.47 GHz (24 nucleus), as well as designating seven cores to each MFCC subsystem calculation. The interaction energies of a system in terms of its electronic density of state, and the exchange–correlation electronic potential are the utmost idea of the DFT approach. So far, various approximations have been proposed leading to a bewildering variety of functionals that have been developed and used in computational studies. From the analysis of comparative studies of the interaction energies between Zn2+ and biologically relevant ligands using both classical and quantum mechanics methods,16,21 we selected three most used exchange–correlation functionals, namely B3LYP,37 B98,38 and M06.39

The electronic density of state of a molecular system is commonly represented by a Slater determinant of molecular orbitals, which, in turn, are constructed as a linear combination of atom-centered basis functions, also known as the basis set. To accurately describe the polarization of chemical bonds, additional polarization functions [the “P” in def2-SV(P), or the “*” in 6-311G*] are necessary. Hydrogen bonding requires polarization functions to be present on hydrogen atoms as well (the “P” in def2-TZVP, or the second “*” in 6-311G**). To adequately model the anions, it can become necessary to additionally use diffuse functions, which are better at describing the density in regions remote from the nucleus (the “D” in def2-TZVPD, or the “+” in 6-311+G*). Given that the size of the basis set represents a compromise between accuracy and computational overhead, three models are considered for the expansion of the electronic orbitals in this study: the cc-pVTZ of triple-zeta quality,40 the 6-311+G(d,p) that is triple-zeta quality in the valence space with double-zeta-quality polarization functions,39 and the LanL2DZ, a pseudo-potential of double zeta (DZ) quality and the overall combination of effective core potentials (ECPs) and valence basis set.41

Since fragmentation schemes withdrew the effect of surrounding amino acids and water molecules, two increasing values of the dielectric constant (ε = 10 and 40) were included by conductor-like polarizable continuum model42 to mimic the influence of the electrostatic environment surrounding the amino acid–ligand complex to improve the calculations.26

Finally, in order to investigate the total binding energy using the quantum chemistry approach based on fragments, one should consider all residues that interact appreciably with the Zn2+. To achieve this goal, we progressively considered more and more residues, until further addition of fragments yielded little change to the energy. We considered amino acid residues located within an imaginary sphere of radius r, centered at the Zn2+. The binding pocket radius r achieves convergence when the energy variation is smaller than 10% in more than three subsequent measurements.43

Evolutionary analyses

To gain insight into the conservation and diversity of the main amino acid residues of PBGS zinc-binding site, we performed a clustering analysis in all porphobilinogen synthase sequences available at the National Center for Biotechnology Information (NCBI) protein data bank. First, we searched for “aminolevulinic dehydratase” into the RefSeq database, returning 22443 entries in the GenPept format (downloaded on 25 May 2020). We succeeded in data cleaning and formatting to fasta using custom python 3.7 scripts. The final fasta file contained 13678 sequences. All codes are available at github.com/diegogotex/filogenia_ALAD. Next, we aligned all sequences using the MAFFT v7.3974444 software with default parameters. The alignment file was processed by TrimAL v.1.245,45 using the gappyout method in order to remove most gappy fraction of the columns from the alignment. Then, we used the CIAlign v.1.0.4 (github.com/schnamo/CIAlign) to remove sequences shorter than 75% of the total alignment positions. The final alignment contained 13576 sequences and 268 positions.

The phylogeny was inferred using the FastTree software,46 which calculates the support values for each split in the tree using the Shimodaira–Hasegawa test47 on the three alternate topologies. The tree was built under the general amino acid replacement matrix LG.48 Since the amount of sequences from bacteria represents 92% of the whole dataset, we inferred a second phylogeny where we kept all sequences from different taxa, and we only include 250 sequences from each different cluster, which were identified according to the zinc-binding domain. For the clustering analysis, we selected only the columns in the alignment matrix related to most relevant interaction residues involving the zinc atom with the metalloprotein.

Both alignments were imported into R49 and the pairwise distance between the sequences was calculated using the Fitch algorithm,50 implemented by the dist.alignment method in seqniR package.51 The distance matrices were then used as an input to identify groups by a non-supervised algorithm providers using an HCPC function available in the FactorMineR package52 and an automatic cluster detection. Finally, the sequences were separated according to their cluster and submitted to WebLogo3 software53 in order to better observe the residue conservation.

Results and discussion

Performance of the DFT-based methods

It is not easy to reproduce the correct spin multiplicity and estimate the possible charge delocalization that results from the chemical nature of metal complexes,54 which makes it hard to work in silico with metalloproteins. Therefore, the choice of the DFT functional and basis set is an essential step to provide the appropriate description of its molecular features. In the last two decades, a number of papers have been published showing the interaction energy between biologically relevant metals and amino acids or organic compounds that mimic biomolecules,55,56 but only a few of them compared the results obtained with different calculation method (see Ahlstrand et al.16,21 to cite just a few). Besides, there are no studies presenting the interaction energy and processing time of metal–protein through fragmentation methods for different DFT functionals and basis sets. Thus, we first intended to provide a comparison among three of the most employed functionals and basis sets used in the metalloprotein systems in the description of zinc–amino acid interaction energy aiming to present their feasibility. In this sense, we briefly reported the processing time of the calculations performed with the functionals B3LYP, M06, and B98; the basis sets 6-311+G(d,p), cc-pVTZ, and LanL2DZ; and two values of dielectric constant (ε = 10 and 40) within the MFCC scheme.

Taking all combinations into account, we obtained 1620 systems containing from 120 to 230 atoms, which are composed by each one of the 45 amino acids studied here (see Fig. 1), organized in the four subsystems depicted in the MFCC scheme (see Eq. 1), and nine simulation groups generated by using combinations of the functionals and the basis sets (see Tables S1–S9), i.e. the 1620 systems are a combination of 45 amino acids organized in four subsystems with interaction energy calculated using nine functional/basis set combinations (45 amino acids × 4 subsystems × 9 functional/basis set combinations = 1620 systems, where “×” represents the multiplication symbol). The data presented in the graphic panel (Fig. 2) explain the behavior concerning the computational experiments accomplished in this work. It should be mentioned that both dielectric constant values were considered in Fig. 2, i.e. the processing time obtained for each Zn+2-residue pair is also a sum of the time spent for the calculations using ε = 10 and ε = 40. At first glance, the graphic points to a proportional relationship between the processing time (the blue, orange, and green shadow areas at the bottom) and the number of atoms belonging to the systems (the black dots in the upper region of the figure), with almost no qualitative difference among the curves. By focusing on the exchange–correlation functional, one can see that the processing time is quite similar when the same basis set is applied. However, comparing the processing time for each individual Zn+2–residue pair, it can be seen that the Zn+2–ARG0221 pair showed the highest processing time variation of all systems, when we considered B3LYP and B98 functionals and 6-311+G(d,p) basis set (76999.3 s), and all other values were very lower than that, while the lowest difference was observed for the residue GLY0133 when the functionals B98 and M06, and LanL2DZ basis set (1588 s) were compared. It is important to reinforce that the processing time shown in Fig. 2 is a sum of the time spent to perform each one of the four steps of the MFCC plus two dielectric constants. Besides, this high difference (76999.3 s) only occurs in one Zn+2–residue pair and all other values were very lower than that. On the other hand, it is observed that calculations performed with LanL2DZ presented the lowest computational cost when compared to 6-311+G(d,p) and cc-pVTZ, respectively, independently of the DFT functional chosen, showing that the processing time was more sensitive to the basis set than the functional choice.

Graphic panel showing the 45 residues of the Zn2+–PBGS system. We have also depicted the processing time obtained from the binding interaction energies considering the three exchange–correlation functional (B3LYP, B98, M06) and basis sets [cc-pVTZ, 6-311+G(d,p), LanL2DZ]. The number of atoms is also shown.
Fig. 2

Graphic panel showing the 45 residues of the Zn2+–PBGS system. We have also depicted the processing time obtained from the binding interaction energies considering the three exchange–correlation functional (B3LYP, B98, M06) and basis sets [cc-pVTZ, 6-311+G(d,p), LanL2DZ]. The number of atoms is also shown.

As can be observed, the cc-pVTZ basis set was much more expensive, attracting some attention because it needed more computer time than the LanL2DZ and 6-311+G(d,p) ones. It was expected since the triple-zeta basis set cc-pVTZ is much larger than those of LanL2DZ and 6-311+G(d,p). Also, cc-pVTZ contains native polarization terms that provide the best electronic description among the basis sets used in this work. On the other hand, the ECP presented in the LanL2DZ pseudo-potential reduces the number of functions allowing a small computational cost.

In fact, calculations performed with LanL2DZ were compared with cc-pVTZ and 6-311+G(d,p) and showed similar results for geometry and energy features.57,58 In this sense, the LanL2DZ basis set has been extensively used in geometry optimization59 and single-point studies,60 as well as UV–vis absorption spectra61 and reactivity descriptors calculations62 of the metal complexes. Recently, Strodel and Coskuner-Weber summarized several studies that used LanL2DZ basis set for studying the binding between transition metal ions and disordered amyloid-β peptides in the pathogenesis of Alzheimer's disease, specifically resulting in impacts on the structures, thermodynamic properties, and aggregation propensities of monomeric and oligomeric Aβ in aqueous solution.63

Thus, the time consumed to perform the quantum-based MFCC through the calculation methods applied here is remarkably dependent on the basis set choice, with LanL2DZ showing the smaller processing time, obeying the following rank: B3LYP/LanL2DZ > B98/LanL2DZ > M06/LanL2DZ > B3LYP/6-311+G(d,p) > M06/6-311+G(d,p) > B98/6-311+G(d,p) > M06/cc-pVTZ > B3LYP/cc-pVTZ > B98/cc-pVTZ. Small differences can be seen when functionals are changed.

Energetic features of PBGS zinc-binding site

To take into account a full scenario, the quality and compliance of the results should be considered. Therefore, we also compared the energy feature of the systems studied, looking for the similarities and differences among the calculation methods applied in this work, but unfortunately, there is no experimental study to base our analysis. Thus, our main goal was to focus on the qualitative behavior of the outcomes. If one intends to understand the binding mechanism within a pocket of a protein, it is often not enough to consider only the residues nearest to the ligand. Then, to obtain the reliable stability in the total interaction energy, we performed a search for an optimal binding site radius, adding the individual interaction energies of those residues within a pocket radius r centered at the zinc, ranging from 2.5 to 10.5 Å, with a step increment of 0.5 Å.

We depicted in Figs 3 and 4 the total interaction energy between the zinc ion and the PBGS residues, taking into account two different scenarios. First, we showed the interaction energy profiles of the Zn2+–PBGS complex, considering the B98, B3LYP, and M06 functionals and the three basis sets for ε = 40 (see Fig. 3). From this analysis, we intended to observe whether by changing the combination between functional and basis set there would be any significant modification in the energy feature. Overall, one can observe that the qualitative behavior of the total interaction energy is quite similar among the calculation methods, with one peak in 5.5 Å and the other one (smaller) in 8.5 Å.

Binding interaction energy profiles of the metalloprotein Zn2+–PBGS for all three basis sets considering a dielectric function ε = 40 and the following hybrid functionals: (A) B98; (B) B3LYP; and (C) M06.
Fig. 3

Binding interaction energy profiles of the metalloprotein Zn2+–PBGS for all three basis sets considering a dielectric function ε = 40 and the following hybrid functionals: (A) B98; (B) B3LYP; and (C) M06.

Binding interaction energy profiles of the Zn2+–PBGS considering the B98 hybrid functional and LanL2DZ basis set. Different values of the dielectric constant ε were taken into account. Amino acid residues responsible for the regions of steepest negative (yellow and red) and positive (blue) variation are highlighted.
Fig. 4

Binding interaction energy profiles of the Zn2+–PBGS considering the B98 hybrid functional and LanL2DZ basis set. Different values of the dielectric constant ε were taken into account. Amino acid residues responsible for the regions of steepest negative (yellow and red) and positive (blue) variation are highlighted.

As one can see, the results obtained for LanL2DZ basis set presented the weakest energy values for all functionals followed by 6-311+G(d,p) and cc-pVTZ, respectively, which is equal to the order of processing time. On the other hand, combinations with B98 (Fig. 3A) functional showed the weakest interaction energy values [−149.37, −163.78, and −173.76 kcal/mol for LanL2DZ, 6-311+G(d,p), and cc-pVTZ, respectively] independently of the basis set chosen, except when it was in combination with cc-pVTZ (B98/cc-pVTZ), while B3LYP (Fig. 3B) functional presented the highest total interaction energy values (−158.65, −175.01, and −186.51 kcal/mol for LanL2DZ, 6-311+G(d,p), and cc-pVTZ, respectively). Besides, the energetic results obtained through the functionals B98 and B3LYP are strongly dependent on the basis sets, as one can see from the difference of −9.98 and −11.50 kcal/mol between B98/6-311+G(d,p) and B98/cc-pVTZ, and B3LYP/6-311 and B3LYP/cc-pVTZ, respectively. On the other hand, the energy values found for the M06 hybrid functional (Fig. 3C) are quite close for cc-pVTZ and 6-311+G(d,p) basis sets.

Since the energy feature was quite similar among all the combinations, with B98/LanL2DZ showing the lowest total interaction energy, and this combination was the second lowest processing time, B98 functional and LanL2DZ basis set was chosen as representative to depict the relevant interactions between Zn2+ and PBGS residues (all energy values are in the Supplementary material). It is not a random choice, but also consider previous data. Using MP2 and four DFT functionals, namely B3LYP, B98, TPSSh, and M06, Ahlstrand et al. calculated interaction energies and potential energy for small complexes of Zn2+ and amino acid mimics (acetate, methanethiolate, and imidazole) or water.16 They showed that B98 functional performs quite well to calculate the interaction energies between metal ions (Zn2+ and Cd2+) and biologically relevant ligands. Moreover, despite that meta and hybrid DFT calculations overestimated the magnitude of the polarization energy, a comparison between B3LYP and M06 reveals that the latter functional overestimates the polarization energy less than the former.

Fragmentation methods blur the surrounding effect over the zinc-residue quantum calculations. Therefore, different values of the dielectric constant have been used to introduce the polarization of the surrounding, reproducing the protein environment.26,34,35 Since metalloproteins are sensitive to variation in dielectric constant values,64 we have investigated what effect the choice of the dielectric constant has on the energetics, considering two different values, namely ε = 10 and ε = 40. Figure 4 presents a small linear difference of the binding energy profiles. Furthermore, it can be seen that the behavior of the energy curves is quite similar for different values of ε. It has been previously shown that the exact choice of the dielectric constant is less critical in the quantum chemical modeling of enzyme active sites, since the energy values will change, but the mechanism of the enzyme under consideration is quite similar.65 Thus, from now on, the results are only presented for ε = 40.

As we can see in Fig. 4, all total binding energy curves present a similar behavior, although the intensity of the energies varies according to the decrease in the dielectric constant. At first, the binding energy has an increase (r = 2.5 Å) due to the cysteine residues (CYS0122, CYS0124, and CYS0132) closer to the zinc. As r increases, attractions (repulsions) are caused by the residues highlighted in red (blue). Slight increases in the total binding energy are mainly due to SER0168, ASP0169 (r = 4.0 Å), and ASP0120 (r = 6.0 Å) and small depletions are caused by ARG0221, HIS0131 (r = 5.5 Å), ARG0209 (r = 7.0 Å), and ARG0174 (r = 8.5 Å). Stability of the systems is achieved at r = 8.5 Å (−221.28 and −149.04 kcal/mol for ε = 10 and ε = 40, respectively), and small shifts are seen for 8.5 Å < r < 10.5 Å.

Before considering the individual binding energy results, we should remark that the levels obtained here depend strongly on the value used for the dielectric constant. Although some authors found the value (ε = 40) as the best fit within an approach designed to predict the stability of the proteins, the same authors also suggested testing different values to improve the agreement between calculations and observations.34,35,66 Then, to infer the most relevant residues around the zinc ion, which contribute the most to the zinc–PBGS binding, we chose the B98 hybrid functional combined with LanL2DZ basis set for ε = 10 and ε = 40.

In the stable binding site radius of 10.5 Å, we have identified 45 residues of the PBGS metalloprotein exhibiting different interaction energies. The majority of the residues that show the smallest energy values (PHE0036, LEU0077, PHE0079, GLY0080, GLY0091, ALA0094, LEU0123, PRO125, GLY0130, LEU0134, PHE0141, LEU0150, ALA0166, PRO0167, MET0170, MET0171, PHE0208, ALA0212, and PRO0216) have low water affinity (non-polar), playing an essential structural role to form the hydrophobic environment of the protein. The PBGS active site includes the formation of a Schiff base, which requires a highly basic active site environment and deprotonated lysine amino groups.2 Accordingly, we set a neutral profile for LYS0199 (r = 5.0 Å) and LYS0252 (r = 7.5 Å) to perform the quantum calculations (see Table S1).

After performing the energy calculations, we highlight and compare the main interactions (in kcal/mol units) of the Zn2+–PBGS complex (see graphic panel in Fig. 5). The most important residues at the binding pocket that interact with the zinc atom in each radius are CYS0124 > CYS0132 > CYS0122 > ASP0120 > ASP0169 > SER0168 > GLY0133. On the other hand, the presence of zinc played repulsive interaction for ARG0221 > ARG0209 > ARG0174 > VAL0121 > HIS0131.

Graphic panel showing the most relevant residues that contribute to the Zn2+–PBGS complex, as well as the radius (in Å) where the residues are located.
Fig. 5

Graphic panel showing the most relevant residues that contribute to the Zn2+–PBGS complex, as well as the radius (in Å) where the residues are located.

Through in vitro-induced mutations, Jaffe and co-workers evaluated the substitution of the cysteine (CYS0122, CYS0124, and CYS0132) amino acids by alanine in the PBGS active site and observed a significant decrease in the enzymatic activity, verifying an important catalytic role for these residues.67 Afterward, the mutation C132R was reported in humans by Akagi et al. in 2005, who observed that the arginine residue induced erythrocyte PBGS activity <10% of the normal.8 The mutant residue is bulkier than cysteine, and its positively charged profile could hamper the polarization of environment needed for zinc binding.

According to Poole, among the 20 common amino acids, perhaps the most intriguing and functionally diverse is cysteine, one of the two residues containing a sulfur atom. Its thiol (or “sulfhydryl”) group is ionizable, which allows deprotonation generating a negatively charged thiolate group, boosting its reactivity.68 Once this thiol group is vulnerable to alkylation by electrophiles and oxidation by reactive oxygen and nitrogen species, it can lead to posttranslationally modified forms exhibiting significantly altered functions. In this sense, the cysteine residues play a physiological response for pH changes, and consequently, in the human PBGS activity.

In Fig. 5, we can see that the cysteine amino acid residues inside r = 2.5 Å show the strongest interaction energies (all values in kcal/mol for ε = 10 and ε = 40, respectively): CYS0122 (−66.85; −43.61), CYS0124 (−89.35; −62.27), and CYS0132 (−71.53; −46.59). The thiolate groups (S–) from deprotonated cysteines coordinate a high thiolate(S–)–zinc(Zn2+) electrostatic interaction (ion–ion) at 2.36 Å from zinc (see Fig. 6A). After discussing the great relevance of the thiolate groups from cysteine residues in the total binding energy, due to their high interaction energy (see Figs 3 and 4), now we can investigate how the Zn2+ influences within the protein environment focusing on the relevant amino acids identified through our quantum calculations.

Schematic illustration of the most important amino acids involved in the binding of PBGS with the Zn2+. The distance (in Å) in each structure indicates the coordinates along with a potential energy scan that was performed, and is indicated by colored dashed lines, pointing the type of interaction. (A) Attractive ion–ion interactions obtained by triple cysteines (CYS0122, CYS0124, and CYS0132) and Aspartates (ASP0120 and ASP0169) represented by yellow (strongest) and red colors, respectively; (B) repulsive ion–ion interactions obtained by ARG0209, ARG0221, and ARG0174 (blue color); and (C) attractive (GLY0133 and SER0168) and repulsive (HIS0131 and VAL0131) ion–dipole interaction represented by orange and green colors, respectively.
Fig. 6

Schematic illustration of the most important amino acids involved in the binding of PBGS with the Zn2+. The distance (in Å) in each structure indicates the coordinates along with a potential energy scan that was performed, and is indicated by colored dashed lines, pointing the type of interaction. (A) Attractive ion–ion interactions obtained by triple cysteines (CYS0122, CYS0124, and CYS0132) and Aspartates (ASP0120 and ASP0169) represented by yellow (strongest) and red colors, respectively; (B) repulsive ion–ion interactions obtained by ARG0209, ARG0221, and ARG0174 (blue color); and (C) attractive (GLY0133 and SER0168) and repulsive (HIS0131 and VAL0131) ion–dipole interaction represented by orange and green colors, respectively.

In order of magnitude, the following amino acids presented relevant influences on the interaction energy curves: ASP0120 and ASP0169 (ARG0221, ARG0209, and ARG0174) negatively (positively) charged (details in Figs 4 and 5). In fact, one can note that the ASP0169 (r = 4.0 Å) and ASP0120 (r = 6.0 Å) residues are responsible for the largest attractive contributions due to an important electrostatic interaction between Zn2+ and the ionized guanidine group (all energies in kcal/mol): −10.08; −2.62 and −10.58; and −2.79, respectively (see Fig. 6A). On the other hand, the charged arginines (ARG0221, ARG0209, and ARG0174) presented ionic repulsion from Zn2+ through their high-polar guanidinium group (CH5N3) (see Fig. 6B). ARG0221 (9.98; 2.46) is mainly responsible for the repulsion observed in r = 5.5 Å. The metal also interacts with ARG0209 (r = 7.0 Å) and ARG0174 (r = 8.5 Å) with a pronounced positive (repulsive) energy value of 7.32; 1.73 and 7.01; and 1.76, respectively. In the PBGS environment, amino acids can interact with the substrate (ALA) to provide the ideal alignment and consequently the enzymatic reaction. Arginines can be one of these essential amino acids coordinating the hydroxyl group of A-ALA chain to assume an ideal position favoring the asymmetric condensation between A-ALA and P-ALA.2,69

Figure 6C highlights the amino acids SER0168 (r = 4.0 Å), HIS0131 (r = 5.5 Å), GLY0133 (r = 6.5 Å), and VAL0121 (r = 6.5 Å). The side chain and oxygen backbone atoms (oxygen and nitrogen oxygen backbone atoms) of SER0168 (GLY0133) are the most relevant atoms to the attractive interactions with Zn2+, which are −2.48; −0.95 (1.69; −0.76). According to Akagi et al., PBGS structure with a deleterious mutation in site 133, such as G133R substitution, had significantly decreased enzyme activity.70 This residue is located in the coil region after CYS0132 and its side chain is composed by only one hydrogen playing possible a good flexibility of the zinc-binding site. Finally, HIS0131 and VAL0121 residues are buried in the PBGS core, a hydrophobic region with low solvent accessibility. The electron density distribution in the Zn-binding site can be observed in Fig. S1, where a high (low) electron density is represented in red (blue) color on the electrostatic potential isosurface, with color scales given at the right side. One can see the small positive-polarization points from hydrogen atoms in the electronic density of HIS0131 and VAL0121, leading to the small repulsive energy values obtained: 2.26; 1.60 and 2.69; and 2.13, respectively.

Molecular evolution analysis

In this section, we inspect the evolutionary conservation of the relevant residues from our simulations to better comprehend their presence in the PBGS structure.

As ancient and ubiquitous molecules, the amino acids have been required by evolution for a variety of purposes in living cells.71 To make use of the zinc plasticity, specific PGBS amino acids have been selected over time, building the zinc-binding site to coordinate the PBG synthesis.2,9 Previous studies have succeeded in applying clustering methods over amino acid sequences to identify groups of conserved residues into cation-binding domain regions.30 Here, we used a PCA followed by an unsupervised clustering approach to identify groups of conserved residues into all the PGBS residues involved in the zinc-binding pointed out in Fig. 6 (ASP0120, VAL0121, CYS0122, CYS0124, HIS0131, CYS0132, GLY0133, SER0168, ASP0169, ARG0174, ARG0209, and ARG0221) (see Fig. 7A).

Clustering analysis of the most energetically relevant residues to the zinc binding. (A) PCA representation of each PBGS sequence analyzed in the present study, clustered by the distance of the alignment region related to the zinc-binding relevant residues, according to the human PBGS coordinates (PDB: 5HNR). The dimensions 1 and 2, and Dim1 and Dim2, respectively, are related to 82.8% of the variation of the data. An unsupervised clustering analysis was applied over the PCA coordinates to identify four clusters of residues. (B) Logo visualization of the 12 relevant residues located at the position of zinc-binding relevant residues for human PBGS, for each of the four different clusters. (C) Phylogenetic analysis of all PBGS sequences available in NCBI protein database. The tree was constructed using the maximum likelihood method under LG model from the alignment of full-length amino acid sequence. The leaves in the phylogenetic tree are colored according to the following taxa: Amoebozoa, Apusozoa, Archeae, Bacteria, Cryptophycaea, Opisthokontha, Rhodophyta, Sar, and Viridiplantae. (D) Same topology of the phylogenetic tree as shown in panel C, but with all leaves colored according to the cluster to which the residues of the sequence belonged in panel B.
Fig. 7

Clustering analysis of the most energetically relevant residues to the zinc binding. (A) PCA representation of each PBGS sequence analyzed in the present study, clustered by the distance of the alignment region related to the zinc-binding relevant residues, according to the human PBGS coordinates (PDB: 5HNR). The dimensions 1 and 2, and Dim1 and Dim2, respectively, are related to 82.8% of the variation of the data. An unsupervised clustering analysis was applied over the PCA coordinates to identify four clusters of residues. (B) Logo visualization of the 12 relevant residues located at the position of zinc-binding relevant residues for human PBGS, for each of the four different clusters. (C) Phylogenetic analysis of all PBGS sequences available in NCBI protein database. The tree was constructed using the maximum likelihood method under LG model from the alignment of full-length amino acid sequence. The leaves in the phylogenetic tree are colored according to the following taxa: Amoebozoa, Apusozoa, Archeae, Bacteria, Cryptophycaea, Opisthokontha, Rhodophyta, Sar, and Viridiplantae. (D) Same topology of the phylogenetic tree as shown in panel C, but with all leaves colored according to the cluster to which the residues of the sequence belonged in panel B.

The analysis returned four clusters of amino acids (see Fig. 7B), where we can see the high conservation of the ASP0120, HIS0131, GLY0133, SER0168, and ARG0209 residues co-occurring in all clusters. Figure 7C shows the constructed phylogenetic tree containing 2397 sequences, being 1250 from bacteria and the other 1147 from different organisms, including Opisthokonta, Viridiplantae, Archeae, Amoebozoa, Rhodophyta, Cryptophyceae, Apusozoa, and Sar. Finally, Fig. 7D depicts the distribution of clusters inside the clades described through the phylogenetic tree (Fig. 7C).

It should be mentioned that most zinc-dependent PBGSs also contain an allosteric site for magnesium ion (Mg2+), except in Fungi and Metazoa kingdoms. On the other hand, PBGSs from plants, apicoplasts, and some prokaryotes lack active site cysteines (non-Zn utilizing) but have the putative catalytic magnesium site that stimulates physico-chemical changes to modulate the enzymatic activity.12,72 In this sense, the taxa Viridiplantae, Sar, Rhodophyta, and some Bacteria identified in cluster 1 (Fig. 7A) are non-Zn enzymes, presenting PBGS that requires magnesium to enable the catalytic mechanism. The high affinity of protoporphyrin IX for zinc is known to replace magnesium during chlorophyll maturation, which indicates an evolutionary pressure to reduce the utilization of Zn2+ by these organisms, in particular in PBGS.9,73 In the previous section, the quantum results (Fig. 5) depict the triple-cysteine residues obtaining high ion–ion attraction to zinc. Taking this into account, we observe that cluster 2 encompasses sequences of Archeae and Bacteria that can present significant zinc-binding affinity due to the cysteine residues (CYS0122 and CYS0132),9 as well as the last evolutionary step to zinc coordination by raising cysteine residue in site 0124 instead of acid aspartic. Cluster 3 (Amebozoa and the majority of Eukaryota including human sequence) and cluster 4 can bind Zn2+ with the highest affinity for having all three cysteines in their sequences.9 Among clusters (Fig. 7A), only site 0131 shares an equal probability for histidine and glutamine in cluster 1 due to the common codon change in the third position.74 On the other hand, the sites for VAL0121, ARG0174, and ASP0169 suffered considerable variation among the clusters, suggesting higher variability among the living cells. Despite the permutations, for site 0121, the size and chemical composition (hydrophobic profile) are greatly similar. This does not occur for sites 0174 and 0169.

It is interesting to observe (Fig. 7C) the scattering of the Bacteria individuals along the tree topology, which contrasts with Viridiplatae that composes a monophyletic clade. The Opisthokonta clade, a broad group of eukaryotes, is also a monophyletic group. We also colored the leaves on the tree according to the clusters previously identified and observed that mostly Opisthokonta sequences belong to a single cluster. It is not novel about the concentration of zinc-binding domain between the Metazoa and Fungi kingdoms,1,2 which becomes much more clear in our high-resolution phylogenetic tree in association with the clustering analysis. We noticed that most PGBS sequences from Opisthokonta clade carry the three conserved cysteines into the zinc interaction residues, along with some Bacteria and Archaea. Frère9 and collaborators (2005) argued that a possible route to create a Zn2+-free PBGS is by means of gene duplication and parallel evolution of two PGBS enzymes within one organism,1 which could explain the presence of Bacteria along all four different clusters. On the other hand, clusters 3 and 4 occur in most living organisms and contain all thiolate (S–) groups from cysteine residues. Taking into account the modifications of the sulfhydryl group in the physiological environment, the clusters containing all cysteine can be the most precise to modulate the zinc binding, and consequently, the catalytic activity.68

Our computational approaches, at the molecular level, revealed details about the relevant residues associated with the zinc-binding site of a crystallographic structure of human PBGS. Additionally, the clustering and phylogenetic analysis showed the conservation of the amino acids mentioned above among all living organisms, including non-zinc-dependent PBGS sequences. The cysteine residues CYS0122, CYS0124, and CYS0132 closest to the ion, which showed high ion–ion attraction energy, exist mainly in more complex organisms, and are absent in the most basal forms of life. Furthermore, the residues ASP0120, GLY0133, SER0168, and ARG0209 that presented low interaction energy occur in all clusters. These highly conserved residues were present in the PBGS structure even before its association with the zinc ion.9 Finally, presenting low binding energy, the residues VAL0121, ARG0174, and ASP0169 were the only ones that showed variability between clusters.

Presently, in silico approaches play a significant role in unraveling the rich physical chemistry properties of metalloproteins.23 Although there are several specific force fields for modeling metalloproteins systems using classical molecular mechanics, the reliability of their results is not always guaranteed since they have a limited accuracy.75,76 In this sense, the quantum description using the MFCC scheme could help computational chemists to explore the ligand–metalloproteins and protein–metalloprotein complexes employing DFT-based models to identify important residues in each system. Additionally, bioinformatics techniques to mine the data generated from quantum calculations could be a good ally to investigate them in the face of protein variability, which allows extracting detailed information about the nature of the proteins.

The use of pseudo-potentials for high-throughput DFT calculations has been a good choice when it comes to exploring data of organometallic systems, as well as for hard computational approaches to find increasingly better approximations of molecular behavior. For instance, sophisticated techniques such as machine learning have been using DFT data for the development of computational models suitable to molecular science.77 The success of this approach depends not only on the initial precise set up but also in the quantity of the input data in the system78 and, therefore, benchmark analysis focused on generating large amounts of reliable data is also useful to drive researchers to improve algorithms as well as to support biological investigations.

Taking into account that the LanL2DZ basis set has a lower computational cost than cc-pVTZ or 6-311+G(d,p), it is valuable for cases where it is necessary to obtain large amounts of data on a zinc-dependent metalloprotein. Considering the importance of biotechnological and toxicological studies, the data originated by the LanL2DZ basis set are of great value because this model was developed to identify the valence electrons of transition metals and can describe systems containing heavy, light, and transition atoms.41 Due to the high affinity for metals, PBGS is listed as a relevant biomarker of environmental and occupational poisoning by heavy metals such as Cu2+, Ag+, Hg2+, Al3+, Ga3+, In3+, Ti3+, Sn2+, Pb2+, and Bi3+, among others, which inhibits PBGS by replacing the active site metal cofactor.3 Then, the use of LanL2DZ can be helpful to study the toxicity of heavy metals in PBGS, as well as in other macromolecules. Recently, LanL2DZ has been extensively used in geometry optimization59 and single-point studies,60 as well as UV–vis absorption spectra61 and reactivity descriptor calculations62 of the metal complexes.

Conclusions

In summary, our results reinforce the usefulness of computational approaches to understand the complexity of the biological systems mainly when associated with transition metals. Through interaction energy data obtained by DFT-based calculations, we demonstrated the most important residues involved in PBGS zinc-binding site, namely: CYS0122, CYS0124, CYS0132, ASP0169, SER0168, ARG0221, HIS0131, ASP0120, GLY0133, VAL0121, ARG0209, and ARG0174. The well-known thiolate groups from CYS0122, CYS0124, and CYS0132 obtained the strongest binding affinity with zinc, and the complementary residues presented less intensity with ion evidencing the wealth of details of results from the approaches employed. Our description corresponds to the molecular frame of the PBGS crystallographic structure 5HNR, which can be replicated in future investigations with PBGS as well as other proteins under different circumstances of its nature.

The molecular evolution analysis showed that among the living cells, the PBGS amino acids ASP0120, HIS0131, GLY0133, SER0168, and ARG0209 showed high conservancy, and VAL0121, ARG0174, and ASP0169 presented the lower occurrence. The alliance with the quantum description brought a new glance into the PBGS zinc-binding site. The techniques consider the energy data from amino acids interacting with zinc and their variability into the living organisms. In this sense, we suggest that the application of these self-complementary approaches plays a better understanding of the protein binding probabilities.

Finally, extracting a large amount of reliable data from metalloprotein structures in a reasonable time is a challenge for computational biology. As an additional contribution, we find that the use of the LanL2DZ basis set in combination with the various hybrid functional considered here yields results with both good precision and low computational cost, with a performance ∼27 (8) times faster than the cc-pVTZ [6-311+G(d,p)] one, suggesting that it can be used to promptly generate large quantities of data to test machine learning simulation models.

Acknowledgments

We would like to thank the Núcleo de Processamento de Alto Desempenho of the Universidade Federal do Rio Grande do Norte—NPAD/UFRN for allowing us to access their computer facilities.

Funding

This work was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPq and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—CAPES.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability

The data underlying this article are available in the article and in its online supplementary material.

References

1.

Jaffe
E. K.
,
An unusual phylogenetic variation in the metal ion binding sites of porphobilinogen synthase
,
Chem. Biol.
,
2003
,
10
(
1
),
25
34
.

2.

Jaffe
E. K.
,
The remarkable character of porphobilinogen synthase
,
Acc. Chem. Res.
,
2016
,
49
(
11
),
2509
2517
.

3.

Rocha
J. B. T.
,
R. A.
,
Saraiva
S. C.
,
Garcia
F. S.
,
Gravina
C. W.
,
Nogueira
,
Aminolevulinate dehydratase (δ-ALA-D) as marker protein of intoxication with metals and other pro-oxidant situations
,
Toxicol. Res.
,
2012
,
1
(
2
),
85
102
.

4.

Jaffe
E. K.
,
The porphobilinogen synthase catalyzed reaction mechanism
,
Bioorg. Chem.
,
2004
,
32
(
5
),
316
325
.

5.

Kerr
B. T.
,
Ochs-Balcom
H. M.
,
López
P.
,
García-Vargas
G. G.
,
Rosado
J. L.
,
Cebrián
M. E.
,
Kordas
K.
,
Effects of ALAD genotype on the relationship between lead exposure and anthropometry in a cohort of Mexican children
,
Environ. Res.
,
2019
,
170
,
65
72
.

6.

La-Llave-León
O.
,
Méndez-Hernández
E. M.
,
Castellanos-Juárez
F. X.
,
Esquivel-Rodríguez
E.
,
Vázquez-Alaniz
F.
,
Sandoval-Carrillo
A.
,
García-Vargas
G.
,
Duarte-Sustaita
J.
,
Candelas-Rangel
J. L.
,
Salas-Pacheco
J. M.
,
Association between blood lead levels and delta-aminolevulinic acid dehydratase in pregnant women
,
Int. J. Environ. Res. Public Health
,
2017
,
14
(
4
),
432
.

7.

Jaffe
E. K.
,
Stith
L.
,
ALAD porphyria is a conformational disease
,
Am. J. Hum. Genet.
,
2007
,
80
(
2
),
329
337
.

8.

Akagi
R.
,
Kato
N.
,
Inoue
R.
,
Anderson
K. E.
,
Jaffe
E. K.
,
Sassa
S.
,
δ-Aminolevulinate dehydratase (ALAD) porphyria: the first case in North America with two novel ALAD mutations
,
Mol. Genet. Metab.
,
2006
,
87
(
4
),
329
336
.

9.

Frère
F.
,
Reents
H.
,
Schubert
W.-D.
,
Heinz
D. W.
,
Jahn
D.
,
Tracking the evolution of porphobilinogen synthase metal dependence in vitro
,
J. Mol. Biol.
,
2005
,
345
(
5
),
1059
1070
.

10.

Spencer
P.
,
Jordan
P. M.
,
Investigation of the nature of the two metal-binding sites in 5-aminolaevulinic acid dehydratase from Escherichia coli
,
Biochem. J.
,
1994
,
300
(
2
),
373
381
.

11.

Erdtman
E.
,
Bushnell
E. A.
,
Gauld
J. W.
,
Eriksson
L. A.
,
Computational insights into the mechanism of porphobilinogen synthase
,
J. Phys. Chem. B
,
2010
,
114
(
50
),
16860
16870
.

12.

Mills-Davies
N.
,
Butler
D.
,
Norton
E.
,
Thompson
D.
,
Sarwar
M.
,
Guo
J.
,
Gill
R.
,
Azim
N.
,
Coker
A.
,
Wood
S. P.
,
Erskine
P. T.
,
Coates
L.
,
Cooper
J. B.
,
Rashid
N.
,
Akhtar
M.
,
Shoolingin-Jordan
P. M.
,
Structural studies of substrate and product complexes of 5-aminolaevulinic acid dehydratase from humans, Escherichia coli and the hyperthermophile Pyrobaculumcalidifontis
,
Acta Crystallogr. D Struct. Biol.
,
2017
,
73
(
1
),
9
21
.

13.

Zhang
J.
,
Yang
W.
,
Piquemal
J. P.
,
Ren
P.
,
Modeling structural coordination and ligand binding in zinc proteins with a polarizable potential
,
J. Chem. Theory Comput.
,
2012
,
8
(
4
),
1314
1324
.

14.

Kochanczyk
T.
,
Drozd
A.
,
Krężel
A.
,
Relationship between the architecture of zinc coordination and zinc binding affinity in proteins: insights into zinc regulation
,
Metallomics
,
2015
,
7
(
2
),
244
257
.

15.

Haas
K. L.
,
Franz
K. J.
,
Application of metal coordination chemistry to explore and manipulate cell biology
,
Chem. Rev.
,
2009
,
109
(
10
),
4921
4960
.

16.

Ahlstrand
E.
,
Spångberg
D.
,
Hermansson
K.
,
Friedman
R.
,
Interaction energies between metal ions (Zn2+ and Cd2+) and biologically relevant ligands
,
Int. J. Quantum Chem.
,
2013
,
113
(
23
),
2554
2562
.

17.

Zhu
T.
,
Xiao
X.
,
Ji
C.
,
Zhang
J. Z. H.
,
A new quantum calibrated force field for zinc–protein complex
,
J. Chem. Theory Comput.
,
2013
,
9
(
3
),
1788
1798
.

18.

Kurian
R.
,
Bruce
M. R.
,
Bruce
A. E.
,
Amar
F. G.
,
The influence of zinc (II) on thioredoxin/glutathione disulfide exchange: QM/MM studies to explore how zinc (II) accelerates exchange in higher dielectric environments
,
Metallomics
,
2015
,
7
(
8
),
1265
1273
.

19.

Maret
W.
,
Vallee
B. L.
,
Thiolate ligands in metallothionein confer redox activity on zinc clusters
,
Proc. Natl. Acad. Sci. USA
,
1998
,
95
(
7
),
3478
3482
.

20.

Maret
W.
,
Analyzing free zinc (II) ion concentrations in cell biology with fluorescent chelating molecules
,
Metallomics
,
2015
,
7
(
2
),
202
211
.

21.

Ahlstrand
E.
,
Hermansson
K.
,
Friedman
R.
,
Interaction energies in complexes of Zn and amino acids: a comparison of abinitio and force field based calculations
,
J. Phys. Chem. A
,
2017
,
121
(
13
),
2643
2654
.

22.

Xu
M.
,
He
X.
,
Zhu
T.
,
Zhang
J. Z.
,
A fragment quantum mechanical method for metalloproteins
,
J. Chem. Theory Comput.
,
2019
,
15
(
2
),
1430
1439
.

23.

Stiebritz
M. T.
,
Hu
Y.
,
Computational methods for modeling metalloproteins
,
Methods Mol. Biol.
,
2019
,
1876
,
245
266
.

24.

Himo
F.
,
Quantum chemical modeling of enzyme active sites and reaction mechanisms
,
Theor. Chem. Acc.
,
2006
,
116
(
1–3
),
232
240
.

25.

Lonsdale
R.
,
Ranaghan
K. E.
,
Mulholland
A. J.
,
Computational enzymology
,
Chem. Commun.
,
2010
,
46
(
14
),
2354
2372
.

26.

Lima Neto
J. X.
,
Bezerra
K. S.
,
Barbosa
E. D.
,
Oliveira
J. I.
,
Manzoni
V.
,
Soares-Rachetti
V. P.
,
Albuquerque
E. L.
,
Fulco
U. L.
,
Exploring the binding mechanism of GABAB receptor agonists and antagonists through insilico simulations
,
J. Chem. Inf. Model.
,
2019
,
60
(
2
),
1005
1018
.

27.

Gordon
M. S.
,
Fedorov
D. G.
,
Pruitt
S. R.
,
Slipchenko
L. V.
,
Fragmentation methods: a route to accurate calculations on large systems
,
Chem. Rev.
,
2012
,
112
(
1
),
632
672
.

28.

Ziemert
N.
,
Jensen
P. R.
,
Phylogenetic approaches to natural product structure prediction
,
Methods Enzymol.
,
2012
,
517
,
161
182
.

29.

Chen
Y.
,
Reilly
K. D.
,
Sprague
A. P.
,
Guan
Z.
,
SEQOPTICS: a protein sequence clustering system
,
BMC Bioinformatics
,
2006
,
7
(
4
),
S10
.

30.

Ibuot
A.
,
Dean
A. P.
,
Pittman
J. K.
,
Multi-genomic analysis of the cation diffusion facilitator transporters from algae
,
Metallomics
,
2020
,
12
(
4
),
617
630
.

31.

Olsson
M. H.
,
Søndergaard
C. R.
,
Rostkowski
M.
,
Jensen
J. H.
,
PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions
,
J. Chem. Theory Comput.
,
2011
,
7
(
2
),
525
537
.

32.

Momany
F. A.
,
Rone
R.
,
Validation of the general-purpose QUANTA®3.2/CHARMm® force field
,
J. Comput. Chem.
,
1992
,
13
(
7
),
888
900
.

33.

Zhang
D. W.
,
Zhang
J. Z. H.
,
Molecular fractionation with conjugate caps for full quantum mechanical calculation of protein–molecule interaction energy
,
J. Chem. Phys.
,
2003
,
119
(
7
),
3599
3605
.

34.

Bezerra
K. S.
,
Fulco
U. L.
,
Esmaile
S. C.
,
Lima Neto
J. X.
,
Machado
L. D.
,
Freire
V. N.
,
Albuquerque
E. L.
,
Oliveira
J. I.
,
Ribosomal RNA-aminoglycoside hygromycin B interaction energy calculation within a density functional theory framework
,
J. Phys. Chem. B
,
2019
,
123
(
30
),
6421
6429
.

35.

Campos
D. M. O.
,
Bezerra
K. S.
,
Esmaile
S. C.
,
Fulco
U. L.
,
Albuquerque
E. L.
,
Oliveira
J. I. N.
,
Intermolecular interactions of cn-716 and acyl-KR-aldehyde dipeptide inhibitors against Zika virus
,
Phys. Chem. Chem. Phys.
,
2020
,
22
(
27
),
15683
15695
.

36.

Frisch
M. J.
,
Trucks
G. W.
,
Schlegel
H. B.
,
Scuseria
G. E.
,
Robb
M. A.
,
Cheeseman
J. R.
,
Scalmani
G.
,
Barone
V.
,
Mennucci
B.
,
Petersson
G. A.
,
Nakatsuji
H.
,
Caricato
M.
,
Li
X.
,
Hratchian
H. P.
,
Izmaylov
A. F.
,
Bloino
J.
,
Zheng
G.
,
Sonnenberg
J. L.
,
Hada
M.
,
Ehara
M.
,
Toyota
K.
,
Fukuda
R.
,
Hasegawa
J.
,
Ishida
M.
,
Nakajima
T.
,
Honda
Y.
,
Kitao
O.
,
Nakai
H.
,
Vreven
T.
,
Montgomery
J. A.
,
Peralta
J. E.
,
Ogliaro
F.
,
Bearpark
M.
,
Heyd
J. J.
,
Brothers
E.
,
Kudin
K. N.
,
Staroverov
V. N.
,
Kobayashi
R.
,
Normand
J.
,
Raghavachari
K.
,
Rendell
A.
,
Burant
J. C.
,
Iyengar
S. S.
,
Tomasi
J.
,
Cossi
M.
,
Rega
N.
,
Millam
J. M.
,
Klene
M.
,
Knox
J. E.
,
Cross
J. B.
,
Bakken
V.
,
Adamo
C.
,
Jaramillo
J.
,
Gomperts
R.
,
Stratmann
R. E.
,
Yazyev
O.
,
Austin
A. J.
,
Cammi
R.
,
Pomelli
C.
,
Ochterski
J. W.
,
Martin
R. L.
,
Morokuma
K.
,
Zakrzewski
V. G.
,
Voth
G. A.
,
Salvador
P.
,
Dannenberg
J. J.
,
Dapprich
S.
,
Daniels
A. D.
,
Farkas
O.
,
Foresman
J. B.
,
Ortiz
J. V.
,
Cioslowski
J.
,
Fox
D. J.
,
Gaussian 09
,
Gaussian, Inc.
,
Wallingford, CT
,
2009
.

37.

Becke
A. D.
,
Density-functional thermochemistry. III. The role of exact exchange
,
J. Chem. Phys.
,
1993
,
98
(
7
),
5648
5652
.

38.

Schmider
H. L.
,
Becke
A. D.
,
Optimized density functionals from the extended G2 test set
,
J. Chem. Phys.
,
1998
,
108
(
23
),
9624
9631
.

39.

Zhao
Y.
,
Truhlar
D. G.
,
The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals
,
Theor. Chem. Acc.
,
2008
,
120
(
1–3
),
215
241
.

40.

Kendall
R. A.
,
Dunning
T. H.
,
Harrison
R. J.
,
Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions
,
J. Chem. Phys.
,
1992
,
96
(
9
),
6796
6806
.

41.

Hay
P. J.
,
Wadt
W. R.
,
Abinitio effective core potentials for molecular calculations. Potentials for the transition metal atoms Sc to Hg
,
J. Chem. Phys.
,
1985
,
82
(
1
),
270
283
.

42.

Cossi
M.
,
Rega
N.
,
Scalmani
G.
,
Barone
V.
,
Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model
,
J. Comput. Chem.
,
2003
,
24
(
6
),
669
681
.

43.

Neto
J. X. L.
,
Bezerra
K. S.
,
Manso
D. N.
,
Mota
K. B.
,
Oliveira
J. I.
,
Albuquerque
E. L.
,
Caetano
E. W.
,
Freire
V. N.
,
Fulco
U. L.
,
Energetic description of cilengitide bound to integrin
,
New J. Chem.
,
2017
,
41
(
19
),
11405
11412
.

44.

Yamada
K. D.
,
Tomii
K.
,
Katoh
K.
,
Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees
,
Bioinformatics
,
2016
,
32
(
21
),
3246
3251
.

45.

Capella-Gutiérrez
S.
,
Silla-Martínez
J. M.
,
Gabaldón
T.
,
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses
,
Bioinformatics
,
2009
,
25
(
15
),
1972
1973
.

46.

Liu
X.
,
Deng
Y.
,
Ni
Y.
,
Li
Z.
,
FastTree: a hardware KD-tree construction acceleration engine for real-time ray tracing
, In:
2015 Design, Automation & Test in Europe Conference & Exhibition (DATE)
,
Grenoble, France
,
2015
, pp.
1595
1598
.

47.

Shimodaira
H.
,
Hasegawa
M.
,
Multiple comparisons of log-likelihoods with applications to phylogenetic inference
,
Mol. Biol. Evol.
,
1999
,
16
(
8
),
1114
1116
.

48.

Le
S. Q.
,
Gascuel
O.
,
An improved general amino acid replacement matrix
,
Mol. Biol. Evol.
,
2008
,
25
(
7
),
1307
1320
.

49.

Team
R. C.
,
R: A Language and Environment for Statistical Computing
, Vienna, Austria,
2013
.

50.

Fitch
W. M.
,
An improved method of testing for evolutionary homology
,
J. Mol. Biol.
,
1966
,
16
(
1
),
9
16
.

51.

Charif
D.
,
Lobry
J. R.
,
SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis
, in
Structural Approaches to Sequence Evolution
,
Springer
,
Berlin, Heidelberg
,
2007
, pp.
207
232
.

52.

S.
,
Josse
J.
,
Husson
F.
,
FactoMineR: an R package for multivariate analysis
,
J. Stat. Softw.
,
2008
,
25
(
1
),
1
18
.

53.

Crooks
G. E.
,
Hon
G.
,
Chandonia
J.-M.
,
Brenner
S. E.
,
WebLogo: a sequence logo generator
,
Genome Res.
,
2004
,
14
(
6
),
1188
1190
.

54.

Radoń
M.
,
Revisiting the role of exact exchange in DFT spin-state energetics of transition metal complexes
,
Phys. Chem. Chem. Phys.
,
2014
,
16
(
28
),
14479
14488
.

55.

Mehandzhiyski
A. Y.
,
Riccardi
E.
,
van Erp
T. S.
,
Koch
H.
,
Åstrand
P.-O.
,
Trinh
T. T.
,
Grimes
B. A.
,
Density functional theory study on the interactions of metal ions with long chain deprotonated carboxylic acids
,
J. Phys. Chem. A
,
2015
,
119
(
40
),
10195
10203
.

56.

Deepak
R. K.
,
Chandrakar
B.
,
Sankararamakrishnan
R.
,
Comparison of metal-binding strength between methionine and cysteine residues: implications for the design of metal-binding motifs in proteins
,
Biophys. Chem.
,
2017
,
224
,
32
39
.

57.

Erhardt
S.
,
Jaime
E.
,
Weston
J.
,
A water sluice is generated in the active site of bovine lens leucine aminopeptidase
,
J. Am. Chem. Soc.
,
2005
,
127
(
11
),
3654
3655
.

58.

Drici
N.
,
Krallafa
M. A.
,
Effect of mutation on the stabilization energy of HIV-1 zinc fingers: a hybrid local self-consistent field/molecular mechanics investigation
,
J. Biol. Inorg. Chem.
,
2017
,
22
(
1
),
109
119
.

59.

Mondal
S.
,
Chakraborty
M.
,
Mondal
A.
,
Pakhira
B.
,
Mukhopadhyay
S. K.
,
Banik
A.
,
Sengupta
S.
,
Chattopadhyay
S. K.
,
Crystal structure, spectroscopic, DNA binding studies and DFT calculations of a Zn (II) complex
,
New J. Chem.
,
2019
,
43
(
14
),
5466
5474
.

60.

Ghosh
B.
,
Adak
P.
,
Naskar
S.
,
Pakhira
B.
,
Mitra
P.
,
Chattopadhyay
S. K.
,
Ruthenium (II/III) complexes of redox non-innocent bis (thiosemicarbazone) ligands: synthesis, X-ray crystal structures, electrochemical, DNA binding and DFT studies
,
Polyhedron
,
2017
,
131
,
74
85
.

61.

Cárdenas-Jirón
G. I.
,
Barboza
C. A.
,
López
R.
,
Menéndez
M. I.
,
Theoretical study on the electronic excitations of a porphyrin-polypyridyl ruthenium (II) photosensitizer
,
J. Phys. Chem. A
,
2011
,
115
(
43
),
11988
11997
.

62.

Rahmouni
N. T.
,
el Houda Bensiradj
N.
,
Megatli
S. A.
,
Djebbar
S.
,
Baitich
O. B.
,
New mixed amino acids complexes of iron (III) and zinc (II) with isonitrosoacetophenone: synthesis, spectral characterization, DFT study and anticancer activity
,
Spectrochim. Acta A
,
2019
,
213
(
5
),
235
248
.

63.

Strodel
B.
,
Coskuner-Weber
O.
,
Transition metal ion interactions with disordered amyloid-β peptides in the pathogenesis of Alzheimer's disease: insights from computational chemistry studies
,
J. Chem. Inf. Model.
,
2019
,
59
(
5
),
1782
1805
.

64.

Dudev
T.
,
Lim
C.
,
Metal binding in proteins: the effect of the dielectric medium
,
J. Phys. Chem. B
,
2000
,
104
(
15
),
3692
3694
.

65.

Chen
S.-L.
,
Fang
W.-H.
,
Himo
F.
,
Technical aspects of quantum chemical modeling of enzymatic reactions: the case of phosphotriesterase
,
Theor. Chem. Acc.
,
2008
,
120
(
4–6
),
515
522
.

66.

Vicatos
S.
,
Roca
M.
,
Warshel
A.
,
Effective approach for calculations of absolute stability of proteins using focused dielectric constants
,
Proteins Struct. Funct. Bioinf.
,
2009
,
77
(
3
),
670
684
.

67.

Jaffe
E. K.
,
Martins
J.
,
Li
J.
,
Kervinen
J.
,
Dunbrack
R. L.
,
The molecular mechanism of lead inhibition of human porphobilinogen synthase
,
J. Biol. Chem.
,
2001
,
276
(
2
),
1531
1537
.

68.

Baez
N. O. D.
,
Reisz
J. A.
,
Furdui
C. M.
,
Mass spectrometry in studies of protein thiol chemistry and signaling: opportunities and caveats
,
Free Radic. Biol. Med.
,
2015
,
80
,
191
211
.

69.

Jaffe
E. K.
,
Lawrence
S. H.
,
Allostery and the dynamic oligomerization of porphobilinogen synthase
,
Arch. Biochem. Biophys.
,
2012
,
519
(
2
),
144
153
.

70.

Akagi
R.
,
Nishitani
C.
,
Harigae
H.
,
Horie
Y.
,
Garbaczewski
L.
,
Hassoun
A.
,
Mercelis
R.
,
Verstraeten
L.
,
Sassa
S.
,
Molecular analysis of δ-aminolevulinate dehydratase deficiency in a patient with an unusual late-onset porphyria
,
Blood
,
2000
,
96
(
10
),
3618
3623
.

71.

Kitadai
N.
,
Maruyama
S.
,
Origins of building blocks of life: a review
,
Geosci. Front.
,
2018
,
9
(
4
),
1117
1153
.

72.

Chauhan
S.
,
O'Brian
M.
,
Bradyrhizobiumjaponicum delta-aminolevulinic acid dehydratase is essential for symbiosis with soybean and contains a novel metal-binding domain
,
J. Bacteriol.
,
1993
,
175
(
22
),
7222
7227
.

73.

Jaffe
E. K.
,
Porphobilinogen synthase, the first source of heme's asymmetry
,
J. Bioenerg. Biomembr.
,
1995
,
27
(
2
),
169
179
.

74.

Singer
G. A.
,
Hickey
D. A.
,
Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content
,
Gene
,
2003
,
317
,
39
47
.

75.

Jing
Y. Q.
,
Han
K. L.
,
Quantum mechanical effect in protein–ligand interaction
,
Expert Opin. Drug Discovery
,
2010
,
5
(
1
),
33
49
.

76.

Ryde
U.
,
Soderhjelm
P.
,
Ligand-binding affinity estimates supported by quantum-mechanical methods
,
Chem. Rev.
,
2016
,
116
(
9
),
5520
5566
.

77.

Butler
K. T.
,
Davies
D. W.
,
Cartwright
H.
,
Isayev
O.
,
Walsh
A.
,
Machine learning for molecular and materials science
,
Nature
,
2018
,
559
(
7715
),
547
555
.

78.

Schleder
G. R.
,
Padilha
A. C. M.
,
Acosta
C. M.
,
Costa
M.
,
Fazzio
A.
,
From DFT to machine learning: recent approaches to materials science—a review
,
J. Phys. Mater.
,
2019
,
2
(
3
),
032001
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)