-
PDF
- Split View
-
Views
-
Cite
Cite
Weixiao Cheng, Jon A Doering, Carlie LaLone, Carla Ng, Integrative Computational Approaches to Inform Relative Bioaccumulation Potential of Per- and Polyfluoroalkyl Substances Across Species, Toxicological Sciences, Volume 180, Issue 2, April 2021, Pages 212–223, https://doi.org/10.1093/toxsci/kfab004
Close - Share Icon Share
Abstract
Predictive toxicology is increasingly reliant on innovative computational methods to address pressing questions in chemicals assessment. Of importance is the evaluation of contaminant impact differences across species to inform ecosystem protection and identify appropriate model species for human toxicity studies. Here we evaluated 2 complementary tools to predict cross-species differences in binding affinity between per- and polyfluoroalkyl substances (PFAS) and the liver fatty acid-binding protein (LFABP): the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool and molecular dynamics (MD). SeqAPASS determined that the structure of human LFABP, a key determinant of PFAS bioaccumulation, was conserved in the majority of vertebrate species, indicating these species would have similar PFAS bioaccumulation potentials. Level 3 SeqAPASS evaluation identified several potentially destabilizing amino acid differences across species, which were generally supported by DUET stability change predictions. Nine single-residue mutations and 7 whole species sequences were selected for MD evaluation. One mutation (F50V for PFNA) showed a statistically significant difference with stronger affinity than wild-type human LFABP. Predicted binding affinities for 9 different PFAS across 7 species showed human, rat, chicken, and rainbow trout had similar binding affinities to one another for each PFAS, whereas Japanese medaka and fathead minnow had significantly weaker LFABP-binding affinity for some PFAS. Based on these analyses, the combined use of SeqAPASS and MD provides rapid screening for potential species differences with deeper structural insight. This approach can be easily extended to other important biological receptors and potential ligands.
Per- and polyfluoroalkyl substances (PFAS) encompass a class of potentially thousands of short-chained, long-chained, and branched organofluorine structures (eg, perfluorooctane sulfonate [PFOS] and perfluorooctanoic acid [PFOA]) (Wang et al., 2017). These synthetic chemicals have been used in numerous industrial applications and consumer products including fire-fighting foams, as stain and oil repellents, in lubricants, apparels, upholstery, etc. (Buck et al., 2011; Glüge et al., 2020). Their widespread use and persistence in the environment, as well as their ability to bioaccumulate have been recognized globally, making PFAS of research interest, particularly in understanding potential toxicities across species. Because of the ubiquitous nature of PFAS, they have been measured in tissues from species as diverse as whales, birds, fish, and even invertebrates, covering the range of trophic levels (Burkhard and Elonen, 2019). Reproductive toxicity, neurotoxicity, hepatotoxicity, immunotoxicity, and modulation of metabolism have been reported as adverse effects of PFAS exposure in mammals and possibly other organisms (Sunderland et al., 2019). However, considering the variety of PFAS, limited studies are available to understand their adverse effects, with even less known regarding effects across taxa.
Notably, some PFAS have high bioaccumulation potential in animals. Studies have found that many long-chain PFAS (perfluoroalkyl carboxylic acids [PFCA] with ≥ 7 perfluorinated carbons and perfluoroalkane sulfonic acids [PFSA] with ≥6 perfluorinated carbons) accumulate in blood, liver, and kidney (Ng and Hungerbühler, 2014); their biological half-lives were estimated to be several years for humans (Olsen et al., 2007) and several days for male rats (Kennedy et al., 2004; Kim et al., 2016; Kudo and Kawashima, 2003). The underlying molecular mechanisms for PFAS bioaccumulation potential are closely related to protein-PFAS interactions. PFAS have high binding affinity to serum albumin and to liver-type fatty acid-binding proteins (LFABP) in liver and kidney tissues, making those tissues important accumulation media (Han et al., 2003; Sheng et al., 2018; Woodcroft et al., 2010). Moreover, cellular transport of PFAS is most likely controlled by both passive diffusion and active transport facilitated by transporter proteins (Weaver et al., 2010; Yang et al., 2009, 2010), which impacts species-specific elimination half-lives. By explicitly considering key proteins and transporters, we developed physiologically based toxicokinetic models that successfully simulated the toxicokinetics and tissue distribution of PFAS in rat and fish (Cheng and Ng, 2017; Ng and Hungerbühler, 2013). Both experimental studies and modeling results demonstrate that protein-PFAS interactions play essential roles in determining PFAS bioaccumulation potential in animals. Thus, estimates of the strengths of these interactions can be used as proxies to evaluate their bioaccumulation potential.
Because of the large number of PFAS involved and potential species of interest, with limited resources for testing, it is not feasible to assess the bioaccumulation potential for all PFAS and all species through laboratory experiments. In silico approaches, on the other hand, hold great promise for hazard and risk assessment. Here, we propose an integrative in silico approach to inform PFAS bioaccumulation potential across different species from the perspective of the sequence, structure, and function of a critical protein, LFABP. There are 2 main components included in our approach. The first is the U.S. Environmental Protection Agency’s Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; https://seqapass.epa.gov/seqapass/; v4.0), a web-based tool used for cross-species extrapolation for chemical toxicity based on assessment of protein sequence and structural similarity (LaLone et al., 2016). It consists of 3 levels of sequence evaluation, ranging from the whole primary amino acid sequence alignment to conserved functional domain and individual residue alignments, with increasing levels of complexity (LaLone et al., 2016). The second is a multistep molecular modeling workflow that combines homology modeling, molecular docking, and molecular dynamics (MD) to estimate the protein-binding affinities for different PFAS, which can provide additional structural insights. Homology modeling can predict protein structures based on sequences and is used to construct the 3-dimensional structure for proteins that have no structural information available (Kelley et al., 2015). Molecular docking, a powerful tool to predict the conformation of ligands bound to proteins (Trott and Olson, 2010), is then used to generate protein-PFAS complex structures. Taking the initial complex structure as input, MD simulation was used to estimate the protein binding affinity for each PFAS. Our previous study has shown that the MD approach can predict the relative protein-binding affinity of different PFAS structures in a fast and reliable way (Cheng and Ng, 2018).
To demonstrate the utility of this integrative in silico approach, we focused on LFABP as the protein proxy for bioaccumulation assessment in this study. LFABP is a well-studied protein thought to explain the high accumulation of PFAS in liver tissue, and there are many available experimental binding affinity data for different PFAS, which we previously used for evaluation of the multistep molecular modeling workflow (Cheng and Ng, 2018). Here, we considered a total of 9 PFAS with different functional head groups and fluorinated carbon chain lengths including 6 PFCAs (ie, PFBA, PFPA, PFHxA, PFHpA, PFOA, and PFNA) and 3 PFSAs (ie, PFBS, PFHxS, and PFOS); the chemical structures and other information for those PFAS are summarized in Table 1 and in Supplementary Table 1.
Structural Information for the 9 PFAS Included in This Study
| Name . | Acronym . | CAS Number . | Carbon Chain Length . | 2D Structures . |
|---|---|---|---|---|
| Perfluorobutanoic acid | PFBA | 375-22-4 | 4 | ![]() |
| Perfluoropentanoic acid | PFPA | 2706-90-3 | 5 | ![]() |
| Perfluorohexanoic acid | PFHxA | 307-24-4 | 6 | ![]() |
| Perfluoroheptanoic acid | PFHpA | 375-85-9 | 7 | ![]() |
| Perfluorooctanoic acid | PFOA | 335-67-1 | 8 | ![]() |
| Perfluorononanoic acid | PFNA | 375-95-1 | 9 | ![]() |
| Perfluorobutane sulfonate | PFBS | 375-73-5 | 4 | ![]() |
| Perfluorohexane sulfonate | PFHxS | 355-46-4 | 6 | ![]() |
| Perfluorooctane sulfonate | PFOS | 1763-23-1 | 8 | ![]() |
| Name . | Acronym . | CAS Number . | Carbon Chain Length . | 2D Structures . |
|---|---|---|---|---|
| Perfluorobutanoic acid | PFBA | 375-22-4 | 4 | ![]() |
| Perfluoropentanoic acid | PFPA | 2706-90-3 | 5 | ![]() |
| Perfluorohexanoic acid | PFHxA | 307-24-4 | 6 | ![]() |
| Perfluoroheptanoic acid | PFHpA | 375-85-9 | 7 | ![]() |
| Perfluorooctanoic acid | PFOA | 335-67-1 | 8 | ![]() |
| Perfluorononanoic acid | PFNA | 375-95-1 | 9 | ![]() |
| Perfluorobutane sulfonate | PFBS | 375-73-5 | 4 | ![]() |
| Perfluorohexane sulfonate | PFHxS | 355-46-4 | 6 | ![]() |
| Perfluorooctane sulfonate | PFOS | 1763-23-1 | 8 | ![]() |
Structural Information for the 9 PFAS Included in This Study
| Name . | Acronym . | CAS Number . | Carbon Chain Length . | 2D Structures . |
|---|---|---|---|---|
| Perfluorobutanoic acid | PFBA | 375-22-4 | 4 | ![]() |
| Perfluoropentanoic acid | PFPA | 2706-90-3 | 5 | ![]() |
| Perfluorohexanoic acid | PFHxA | 307-24-4 | 6 | ![]() |
| Perfluoroheptanoic acid | PFHpA | 375-85-9 | 7 | ![]() |
| Perfluorooctanoic acid | PFOA | 335-67-1 | 8 | ![]() |
| Perfluorononanoic acid | PFNA | 375-95-1 | 9 | ![]() |
| Perfluorobutane sulfonate | PFBS | 375-73-5 | 4 | ![]() |
| Perfluorohexane sulfonate | PFHxS | 355-46-4 | 6 | ![]() |
| Perfluorooctane sulfonate | PFOS | 1763-23-1 | 8 | ![]() |
| Name . | Acronym . | CAS Number . | Carbon Chain Length . | 2D Structures . |
|---|---|---|---|---|
| Perfluorobutanoic acid | PFBA | 375-22-4 | 4 | ![]() |
| Perfluoropentanoic acid | PFPA | 2706-90-3 | 5 | ![]() |
| Perfluorohexanoic acid | PFHxA | 307-24-4 | 6 | ![]() |
| Perfluoroheptanoic acid | PFHpA | 375-85-9 | 7 | ![]() |
| Perfluorooctanoic acid | PFOA | 335-67-1 | 8 | ![]() |
| Perfluorononanoic acid | PFNA | 375-95-1 | 9 | ![]() |
| Perfluorobutane sulfonate | PFBS | 375-73-5 | 4 | ![]() |
| Perfluorohexane sulfonate | PFHxS | 355-46-4 | 6 | ![]() |
| Perfluorooctane sulfonate | PFOS | 1763-23-1 | 8 | ![]() |
MATERIALS AND METHODS
SeqAPASS Workflow to Predict PFAS Susceptibility Across Species
The U.S. Environmental Protection Agency’s SeqAPASS (https://seqapass.epa.gov/seqapass/; v4.0 data version 4) tool was used to evaluate protein conservation and predict bioaccumulation potential across species for different PFAS (LaLone et al., 2016). The SeqAPASS evaluation was also used to predict which critical amino acid residues could be influencing potential differences in bioaccumulation across species by comparing amino acid side chain classification and molecular weight among species-specific amino acid substitutions (Doering et al., 2018). The query protein used for evaluating primary amino acid sequence conservation using SeqAPASS Level 1 was the human LFABP (NCBI Reference Sequence accession NP_001434). No functional domains were identified as specific hits in NCBI’s Conserved Domains database (Marchler-Bauer et al., 2015), therefore, no SeqAPASS Level 2 runs were submitted. Previous computational studies employing molecular docking and MD simulations were reviewed to evaluate the interactions between human and Norway rat LFABP with PFAS. A number of amino acids were identified as important because they formed hydrogen bonds with the PFAS and/or had large energy contributions. These hydrogen-bond-forming amino acids were therefore identified as critical amino acids for evaluation in SeqAPASS Level 3 and were used to predict conservation of the LFABP sequence across species. These previous studies also indicate that there are differences in binding affinity between human and rat (Cheng and Ng, 2018). To further explore these species’ differences and to identify other potentially important amino acids, the human LFABP was aligned with the Norway rat LFABP (NCBI Reference Sequence accession NP_036688.1) using NCBI Constraint-based Multiple Alignment Tool, where 22 of the 127 amino acids were not exact matches (Supplementary Figure 1). All 22 individual amino acids were evaluated in SeqAPASS Level 3, with only 6 amino acids predicted to impact binding considering human and rat LFABP, specifically positions 48, 50, 54, 81, 97, and 104 (using human LFABP as the template). Conservation of these 6 amino acid positions was then evaluated further across vertebrate LFABP sequences using SeqAPASS Level 3, which compares amino acid side chain classification and molecular weight (differences of >30 g/mol) to predict cross-species conservation in protein-chemical interaction (Doering et al., 2018). Amino acid differences in these positions were identified across taxa and submitted to DUET (http://biosig.unimelb.edu.au/duet/stability), which is a web-tool for predicting the effects of mutations on protein stability (Pires et al., 2014). DUET was used to predict whether the changes in amino acids (representing those seen in other species) altered stability by submitting the crystal structure for human LFABP as input with specified mutations.
Multistep Molecular Modeling to Predict Protein Binding Affinity
Multistep molecular modeling workflow
The multistep molecular modeling workflow was developed based on our previous MD modeling to estimate the LFABP binding affinity for different PFAS. As shown in Figure 1, the workflow consists of 4 major steps: curation of structures, molecular docking, MD, and molecular mechanics combined with Poisson-Boltzmann surface area (MM-PBSA) calculation.
Molecular structures were either the 3-dimensional crystal structures obtained from the Protein Data Bank (PDB, https://www.rcsb.org/) or were constructed by the homology modeling tool Phyre2 for LFABP across different species. In the PDB, we chose protein structures based on their resolution (the higher, the better) and completeness of key residues. Phyre2 was used to construct the 3-dimensional structure because it is one of the most popular protein structure prediction servers and very user-friendly (Kelley et al., 2015). We selected the structure with the highest confidence as the output of Phyre2. All protein structures constructed for this study have a confidence of 100%. The 3-dimensional structures for PFOA and PFOS were obtained from the PDB (PDB codes 5JID and 4E99, respectively), and for other PFAS ligands were constructed using Avogadro (v1.2.0) (Hanwell et al., 2012). For each ligand generated in Avogadro, molecular mechanics-based geometry optimization was performed to ensure a realistic rendition of the molecule (Hanwell et al., 2012). To evaluate the performance of Avogadro in generating PFAS structures when experimental ones are not available, the 3-dimensional structures of PFOA and PFOS were also created and compared with those obtained from the PDB. As shown in Supplementary Figure 2, the root-mean-square deviation (RMSD) between those 2 versions of structures is 0.897 Å for PFOA and 0.692 Å for PFOS. Those values demonstrate the effectiveness of using Avogadro to generate 3-dimensional structures for PFAS (Morris and Lim-Wilby, 2008).
Next, the molecular docking tool Autodock Vina (v1.1.2) (Trott and Olson, 2010) was employed to generate the initial structures of protein-PFAS complexes following the same procedure we developed in our previous study (Cheng and Ng, 2018). The outputs from the docking experiments include binding free energies and docking poses for each protein-PFAS pair. The top 3 strongest binding modes (ie, the lowest energies) were selected as initial structures for MD simulations.
The MD simulations of all protein-PFAS complexes were performed with the AMBER 14 suite (Case et al., 2014), as described in our previous study (Cheng and Ng, 2018). Briefly, the protein-PFAS complex system was first explicitly solvated in a cubic box of TIP3P water molecules (Jorgensen et al., 1983), with the addition of Na+ or Cl− counterions to neutralize the systems. Next, the whole system was subject to a series of processes to mimic experimental conditions. Those processes include: (1) 3500 cycles minimization; (2) heating with constant volume and temperature for 20 ps from 0 to 300 K; (3) adjusting density to 1 g/cm3 at constant pressure (1 bar) for 100 ps; (4) equilibration with constant pressure (1 bar) and temperature (300 K) for 2 ns; (5) production with constant pressure (1 bar) and temperature (300 K) for 24 ns. Each phase generates trajectories containing coordinates and velocity information of the molecular system, which can be used to calculate the free energy of binding (ΔGbind).
Single-residue mutation effects
SeqAPASS Level 3 results identified a number of key amino acid residues differing between human and other species, and those residues include F50, A54, T81, T93, and N97. To further evaluate how the mutation of those residues would affect protein binding affinity to PFAS, the multistep molecular modeling was used to estimate the binding affinity of mutant human LFABP for PFAS. Based on SeqAPASS Level 3 results, we considered a total of 8 single amino acid substitutions including F50V, F50I, F50L, A54T, T81A, T81G, T93A, and N97G (here, F50V means residue at position 50 mutated from F to V, and so on for other mutations), and 1 double amino acid substitution 5093 which indicates 2 sequential mutations F50V, followed by T93A. The 3-dimensional structures of those mutant human LFABPs were generated using DUET and then fed into the molecular modeling workflow. The calculated binding affinities for those mutations were compared with the wild-type human LFABP to assess the mutation effects on protein binding affinity for PFAS. Finally, to evaluate whether this methodology works, experimental data from the Sheng et al. (2016) study was used to compare with the model results. In the Sheng et al. study, the binding affinities of wild-type human LFABP and its variants (ie, S39G, M74G, N111D, and R122G) for PFOA and PFNA were determined by fluorescence displacement and isothermal titration calorimetry experiments. To compare, the binding affinities for the S39G, M74G, N111D, and R122G mutations were predicted using the molecular modeling workflow.
Whole protein cross-species effects
Based on the SeqAPASS Level 1 results, a total of 7 different species (ie, human, rat, chicken, zebrafish, rainbow trout, Japanese medaka, and fathead minnow) were selected to further examine differences in protein binding affinity across species using the full LFABP gene sequence for each species and 9 different PFAS. The 3-dimensional crystal structures were obtained from the PDB website for human LFABP (PDB code: 3STM) and rat LFABP (PDB code: 1LFO). For other LFABPs, Phyre2 (Kelley et al., 2015) was used to construct 3-dimensional structures. The protein sequences used to build the 3-dimensional structures are shown in Supplementary Figure 3. Those molecular structures were then used to predict ΔGbind values for all LFABP-PFAS pairs from the molecular modeling workflow.
Data analysis
For each LFABP-PFAS pair, a total of 9 ΔGbind values were generated from the molecular modeling workflow. Those values were calculated based on the trajectories from 9 independent simulation phases and thus can be considered as random samples. One-way ANOVA was conducted to test for significant differences among the 7 different species of LFABP for the 9 tested PFAS ligands. In addition, multiple comparisons with Tukey’s test was performed to identify which groups are significantly different from each other for both single-residue mutation effects and whole protein cross-species effects. The Python package SciPy (https://scipy.org/index.html) and statsmodels (https://github.com/statsmodels/statsmodels) were used for ANOVA and Tukey’s test, respectively; and both tests were conducted based on the 9 different ΔGbind values for each LFABP-PFAS pair.
RESULTS
SeqAPASS Level 1: Conservation of Primary Amino Acid Sequence
The SeqAPASS Level 1 primary amino acid sequence comparison of human LFABP resulted in the alignment of sequences from 347 species (62 of which were identified as ortholog candidates), across 21 taxonomic groups representing vertebrates, invertebrates, and fungi. Results from SeqAPASS predict that LFABP is conserved in 302 species from Mammalia, Aves, Lepidosauria, Testudines, Crocodylia, Amphibia, Actinopteri, and Chondrichthyes taxonomic groups, whereas 43 invertebrates and fungi lack conservation of the protein (Figure 2; Supplementary SeqAPASS Output 1). Therefore, SeqAPASS Level 1 provides a line of evidence that vertebrates would share similar bioaccumulation potential to PFAS as humans.
SeqAPASS (v4.0) results illustrating the percent similarity across species compared with human (Homo sapiens) liver fatty acid-binding protein (LFABP) aligning primary amino acid sequences. The green open circle (○) represents the human LFABP and black filled circle (•) represents the species with the highest percent similarity within the specified taxonomic group. Red dots represent ortholog candidates. The top and bottom of each box represent the 75th and 25th percentiles, respectively. The top and bottom whiskers extend up to 1.5 times the interquartile range. The mean and median values for each taxonomic group are represented by horizontal thick and thin black lines on the box, respectively. The dashed line indicates the cut-off for predictions of LFABP conservation with those taxonomic groups above or crossing through the cut-off predicted to have similar bioaccumulation potential for PFAS as human and those below likely to be different.
SeqAPASS Level 3: Conservation of Critical Individual Amino Acid Residues
In evaluating critical amino acid residues identified from previous work as involved in hydrogen bonding interactions with PFAS, the SeqAPASS Level 3 prediction aligned 277 vertebrate species (removing hypothetical, partial, and nonmatching annotated proteins from Level 1) to the human LFABP as the template, identifying only 42 species predicted to differ in bioaccumulation potential from that of humans (Supplementary SeqAPASS Output 1). Specifically, those that were predicted to differ included 8 species from Mammalia, 17 from Aves, 1 from Lepidosauria, 1 from Amphibia, 14 from Actinopteri (including common model organisms in ecotoxicology Japanese medaka and fathead minnow), and 1 from Chondrichthyes (Supplementary SeqAPASS Output 1). Therefore, the LFABP for the majority of vertebrate species was conserved and predicted to have similar bioaccumulation potential to PFAS as that of humans.
SeqAPASS Level 3: Identification of Other Potential Critical Amino Acids Across Species
Because Level 3 of the SeqAPASS tool can be used to both extrapolate from known critical amino acids and develop research hypotheses by identifying potential amino acid differences across species that may help explain differences in susceptibility, the SeqAPASS Level 3 evaluation was expanded via comparison of human and rat sequences and then further across taxa (Supplementary Data 1). The SeqAPASS Level 3 results identified other potentially critical individual amino acid residues in human LFABP—namely, phenylalanine (F50), alanine (A54), threonine (T81), threonine (T93), and asparagine (N97)—that were compared with other vertebrate species identifying 4 species groups (Table 2). Most primates, ruminants, and whales/dolphins aligned identical amino acids to the human. Rodents and other mammals, fish, amphibians, and testudines have amino acid substitutions aligning with human at positions 50, 54, 81, and 97. Aves, Lepidosauria, and Chondrichthyes have amino acid substitutions aligning with each critical position identified in humans, whereas Crocodylia has amino acid substitutions aligning with human positions 54, 93, and 97. Interestingly, the zebrafish, a common model organism, aligned an alanine at human position T93, whereas all other fish species aligned either a Threonine (no difference from human) or Valine.
Amino Acid Differences Across Species Compared With Human LFABP That Predict Different Bioaccumulation Potential
| Human Amino Acid Position . | Type 1 Primates, Ruminants, Whales/Dolphins . | Type 2 Rodents and Other Mammals, Fish, Amphibians, Testudines . | Type 3 Aves, Lepidosauria Chondrichthyes . | Type 4 Crocodylia . | SeqAPASS Level 3 Prediction of Similar to Human LFABP Template . | Mutation in DUET . | Stability Change from DUET (ΔΔG, kcal/mol) . |
|---|---|---|---|---|---|---|---|
| 50 | Phenylalanine (F) | Phenylalanine | Yes | ||||
Valine (V) Isoleucine (I) Leucine (L) | Valine (V) Isoleucine (I) Leucine (L) | No No No | F50V F50I F50L | −1.196 (destabilizing) −0.808 (destabilizing) −0.893 (destabilizing) | |||
| 54 | Alanine (A) | Yes | |||||
| Threonine (T) | Threonine | Threonine | No | A54T | −0.195 (destabilizing) | ||
| 81 | Threonine (T) | Threonine | Yes No |
T81A |
−0.749 (destabilizing) | ||
Alanine (A) Glycine (G) | Alanine | No | T81G | −0.023 (destabilizing) | |||
| 93 | Threonine (T) | Threonine Valine |
Alanine | Yes Yes No |
T93V T93A |
0.031 (stabilizing) −1.004 (destabilizing) | |
| 97 | Asparagine (N) | Yes | |||||
| Glycine | Glycine | Glycine | No | N97G | 0.521 (stabilizing) |
| Human Amino Acid Position . | Type 1 Primates, Ruminants, Whales/Dolphins . | Type 2 Rodents and Other Mammals, Fish, Amphibians, Testudines . | Type 3 Aves, Lepidosauria Chondrichthyes . | Type 4 Crocodylia . | SeqAPASS Level 3 Prediction of Similar to Human LFABP Template . | Mutation in DUET . | Stability Change from DUET (ΔΔG, kcal/mol) . |
|---|---|---|---|---|---|---|---|
| 50 | Phenylalanine (F) | Phenylalanine | Yes | ||||
Valine (V) Isoleucine (I) Leucine (L) | Valine (V) Isoleucine (I) Leucine (L) | No No No | F50V F50I F50L | −1.196 (destabilizing) −0.808 (destabilizing) −0.893 (destabilizing) | |||
| 54 | Alanine (A) | Yes | |||||
| Threonine (T) | Threonine | Threonine | No | A54T | −0.195 (destabilizing) | ||
| 81 | Threonine (T) | Threonine | Yes No |
T81A |
−0.749 (destabilizing) | ||
Alanine (A) Glycine (G) | Alanine | No | T81G | −0.023 (destabilizing) | |||
| 93 | Threonine (T) | Threonine Valine |
Alanine | Yes Yes No |
T93V T93A |
0.031 (stabilizing) −1.004 (destabilizing) | |
| 97 | Asparagine (N) | Yes | |||||
| Glycine | Glycine | Glycine | No | N97G | 0.521 (stabilizing) |
Amino Acid Differences Across Species Compared With Human LFABP That Predict Different Bioaccumulation Potential
| Human Amino Acid Position . | Type 1 Primates, Ruminants, Whales/Dolphins . | Type 2 Rodents and Other Mammals, Fish, Amphibians, Testudines . | Type 3 Aves, Lepidosauria Chondrichthyes . | Type 4 Crocodylia . | SeqAPASS Level 3 Prediction of Similar to Human LFABP Template . | Mutation in DUET . | Stability Change from DUET (ΔΔG, kcal/mol) . |
|---|---|---|---|---|---|---|---|
| 50 | Phenylalanine (F) | Phenylalanine | Yes | ||||
Valine (V) Isoleucine (I) Leucine (L) | Valine (V) Isoleucine (I) Leucine (L) | No No No | F50V F50I F50L | −1.196 (destabilizing) −0.808 (destabilizing) −0.893 (destabilizing) | |||
| 54 | Alanine (A) | Yes | |||||
| Threonine (T) | Threonine | Threonine | No | A54T | −0.195 (destabilizing) | ||
| 81 | Threonine (T) | Threonine | Yes No |
T81A |
−0.749 (destabilizing) | ||
Alanine (A) Glycine (G) | Alanine | No | T81G | −0.023 (destabilizing) | |||
| 93 | Threonine (T) | Threonine Valine |
Alanine | Yes Yes No |
T93V T93A |
0.031 (stabilizing) −1.004 (destabilizing) | |
| 97 | Asparagine (N) | Yes | |||||
| Glycine | Glycine | Glycine | No | N97G | 0.521 (stabilizing) |
| Human Amino Acid Position . | Type 1 Primates, Ruminants, Whales/Dolphins . | Type 2 Rodents and Other Mammals, Fish, Amphibians, Testudines . | Type 3 Aves, Lepidosauria Chondrichthyes . | Type 4 Crocodylia . | SeqAPASS Level 3 Prediction of Similar to Human LFABP Template . | Mutation in DUET . | Stability Change from DUET (ΔΔG, kcal/mol) . |
|---|---|---|---|---|---|---|---|
| 50 | Phenylalanine (F) | Phenylalanine | Yes | ||||
Valine (V) Isoleucine (I) Leucine (L) | Valine (V) Isoleucine (I) Leucine (L) | No No No | F50V F50I F50L | −1.196 (destabilizing) −0.808 (destabilizing) −0.893 (destabilizing) | |||
| 54 | Alanine (A) | Yes | |||||
| Threonine (T) | Threonine | Threonine | No | A54T | −0.195 (destabilizing) | ||
| 81 | Threonine (T) | Threonine | Yes No |
T81A |
−0.749 (destabilizing) | ||
Alanine (A) Glycine (G) | Alanine | No | T81G | −0.023 (destabilizing) | |||
| 93 | Threonine (T) | Threonine Valine |
Alanine | Yes Yes No |
T93V T93A |
0.031 (stabilizing) −1.004 (destabilizing) | |
| 97 | Asparagine (N) | Yes | |||||
| Glycine | Glycine | Glycine | No | N97G | 0.521 (stabilizing) |
These potentially important amino acid differences observed across species by SeqAPASS were also evaluated in DUET for predicted changes in protein stability. In instances where amino acid substitutions were predicted by SeqAPASS to lead to differences in species ability to interact with PFAS, DUET provided another line of evidence that mutations to the human LFABP based on known cross-species substitutions destabilized the protein (Table 2). The only result from SeqAPASS (sequence-based prediction) that differed from DUET (structure-based prediction) was the N97G substitution which was predicted as different from human in SeqAPASS but stabilizing from DUET. This difference suggests that structural consideration can add unique information to the cross-species comparison of changes in amino acids.
Effect of Single-Residue Mutations on Protein Binding Affinity by MD
To evaluate the effectiveness of the MD workflow for predicting the effects of mutation on protein binding affinity, we compared our model predictions to experimental observations by Sheng et al. (2016). In that study, fluorescence displacement assays showed that after mutations S39G and M74G, the binding affinities for PFOA and PFNA were comparable to wild-type human LFABP (no substantial change), whereas after N111D or R122G mutations, the binding of both PFOA and PFNA to human LFABP were not detected (loss of binding). Isothermal titration calorimetry experiments in the same study indicated similar results, except that the R122G mutation did not cause a significant change to the human LFABP binding affinity for either PFOA or PFNA. Our molecular modeling results (Figure 3) showed a significant decrease in binding affinity after the R122G mutation for both PFOA and PFNA, whereas S39G and M74G mutations did not change the binding affinity significantly for either PFOA or PFNA. Finally, the N111D mutation caused a significant decrease in predicted binding affinity for PFOA, but not for PFNA (Figure 3). This shows good agreement between the experimental data and the prediction results and demonstrates the capability of our molecular modeling workflow to predict the mutation effects on protein binding affinity for PFAS.
Multiple comparisons (Tukey test) between wild-type human LFABP and single mutated LFABP mean free energy (ΔG) measures for PFOA and PFNA. Mutations (eg, S39G, meaning residue at position 39 was mutated from S to G, similar for other mutations) were selected for comparison with experimental data from the Sheng et al. study. Blue is wild-type human LFABP; red indicates significant difference (p .05); gray means no significant difference from wild-type (p > .05). The error bars represent the 95% confidence interval for each dataset.
Figure 4 shows the comparison between wild-type human LFABP and mutated LFABP selected based on SeqAPASS results. As indicated, there is no significant difference (p > .05) between wild-type human LFABP and all mutations for both PFOA and PFNA, except the single mutation F50V for PFNA, which has a significantly stronger binding affinity than wild-type human LFABP (p ≤ .05).
Multiple comparisons between wild-type human LFABP and mutated LFABP (Tukey test) for PFOA and PFNA, those mutations (T93F means residue at position 93 mutated from T to F, similar for other mutations) were selected based on SeqAPASS Level 3 results. Blue is wild-type human LFABP; red indicates significant difference (p ≤ .05); gray means no significant difference (p > .05). The error bars represent the 95% confidence interval for each dataset.
Cross-Species Effects on Protein Binding Affinity Using Full Sequences
The ΔGbind value and its 5 energy terms (ie, van der Waals, electrostatic, polar and nonpolar solvation energy, and entropy) were generated from the multistep molecular modeling workflow for each LFABP-PFAS complex using full gene sequences for human, rat, chicken, rainbow trout, zebrafish, Japanese medaka, and fathead minnow (Supplementary Tables 2–8 and Supplementary Figs. 3–6). The average ΔGbind values over the 9 tested PFAS ligands for human, rat, chicken, and rainbow trout are smaller than −8.0 kcal/mol. This is a substantially lower value (ie, stronger binding affinity) than that predicted for Japanese medaka and fathead minnow (average ΔGbind values larger than or equal to −5.25 kcal/mol, Table 3). The binding affinity for zebrafish is between these 2 groups, with an average ΔGbind value of −6.44 kcal/mol. A 1-way ANOVA shows there is a significant difference across 7 species for all 9 tested PFAS in terms of their LFABP-binding affinity, with p values ranging from 1.02E-10 to .021 (Table 4).
The Maximum, Minimum, and Mean Predicted Free Energy of Binding (ΔGbind, kcal/mol) Over All Tested PFAS Ligands for 7 Different Species of LFABP
| LFABPs . | Max ΔGbind . | Min ΔGbind . | Mean ΔGbind . |
|---|---|---|---|
| Human | −4.39 | −13.99 | −8.89 |
| Rat | −4.85 | −10.34 | −8.07 |
| Chicken | −4.89 | −12.99 | −9.20 |
| Zebrafish | −3.13 | −10.89 | −6.44 |
| Rainbow trout | −2.01 | −16.24 | −8.46 |
| Japanese medaka | 2.96 | −12.99 | −3.87 |
| Fathead minnow | 1.02 | −10.93 | −5.25 |
| LFABPs . | Max ΔGbind . | Min ΔGbind . | Mean ΔGbind . |
|---|---|---|---|
| Human | −4.39 | −13.99 | −8.89 |
| Rat | −4.85 | −10.34 | −8.07 |
| Chicken | −4.89 | −12.99 | −9.20 |
| Zebrafish | −3.13 | −10.89 | −6.44 |
| Rainbow trout | −2.01 | −16.24 | −8.46 |
| Japanese medaka | 2.96 | −12.99 | −3.87 |
| Fathead minnow | 1.02 | −10.93 | −5.25 |
The Maximum, Minimum, and Mean Predicted Free Energy of Binding (ΔGbind, kcal/mol) Over All Tested PFAS Ligands for 7 Different Species of LFABP
| LFABPs . | Max ΔGbind . | Min ΔGbind . | Mean ΔGbind . |
|---|---|---|---|
| Human | −4.39 | −13.99 | −8.89 |
| Rat | −4.85 | −10.34 | −8.07 |
| Chicken | −4.89 | −12.99 | −9.20 |
| Zebrafish | −3.13 | −10.89 | −6.44 |
| Rainbow trout | −2.01 | −16.24 | −8.46 |
| Japanese medaka | 2.96 | −12.99 | −3.87 |
| Fathead minnow | 1.02 | −10.93 | −5.25 |
| LFABPs . | Max ΔGbind . | Min ΔGbind . | Mean ΔGbind . |
|---|---|---|---|
| Human | −4.39 | −13.99 | −8.89 |
| Rat | −4.85 | −10.34 | −8.07 |
| Chicken | −4.89 | −12.99 | −9.20 |
| Zebrafish | −3.13 | −10.89 | −6.44 |
| Rainbow trout | −2.01 | −16.24 | −8.46 |
| Japanese medaka | 2.96 | −12.99 | −3.87 |
| Fathead minnow | 1.02 | −10.93 | −5.25 |
ANOVA Test Between 7 Different Species of LFABP (Human, Rat, Chicken, and Zebrafish, Rainbow Trout, Japanese Medaka and Fathead Minnow) for Different Ligands
| Ligands . | F-Value . | p Value . |
|---|---|---|
| PFBA | 3.69 | .0032 |
| PFPA | 3.43 | .0053 |
| PFHxA | 5.02 | .00028 |
| PFHpA | 15.06 | 1.02E-10 |
| PFOA | 3.56 | .0042 |
| PFNA | 3.35 | .0061 |
| PFBS | 6.07 | 3.29E-05 |
| PFHxS | 2.68 | .021 |
| PFOS | 3.28 | .0064 |
| Ligands . | F-Value . | p Value . |
|---|---|---|
| PFBA | 3.69 | .0032 |
| PFPA | 3.43 | .0053 |
| PFHxA | 5.02 | .00028 |
| PFHpA | 15.06 | 1.02E-10 |
| PFOA | 3.56 | .0042 |
| PFNA | 3.35 | .0061 |
| PFBS | 6.07 | 3.29E-05 |
| PFHxS | 2.68 | .021 |
| PFOS | 3.28 | .0064 |
ANOVA Test Between 7 Different Species of LFABP (Human, Rat, Chicken, and Zebrafish, Rainbow Trout, Japanese Medaka and Fathead Minnow) for Different Ligands
| Ligands . | F-Value . | p Value . |
|---|---|---|
| PFBA | 3.69 | .0032 |
| PFPA | 3.43 | .0053 |
| PFHxA | 5.02 | .00028 |
| PFHpA | 15.06 | 1.02E-10 |
| PFOA | 3.56 | .0042 |
| PFNA | 3.35 | .0061 |
| PFBS | 6.07 | 3.29E-05 |
| PFHxS | 2.68 | .021 |
| PFOS | 3.28 | .0064 |
| Ligands . | F-Value . | p Value . |
|---|---|---|
| PFBA | 3.69 | .0032 |
| PFPA | 3.43 | .0053 |
| PFHxA | 5.02 | .00028 |
| PFHpA | 15.06 | 1.02E-10 |
| PFOA | 3.56 | .0042 |
| PFNA | 3.35 | .0061 |
| PFBS | 6.07 | 3.29E-05 |
| PFHxS | 2.68 | .021 |
| PFOS | 3.28 | .0064 |
The multiple comparison Tukey test between human LFABP and the LFABPs for the other 6 species shows that Japanese medaka has significantly weaker LFABP binding affinity compared with human for all PFAS ligands (p ≤ .05) except PFHxA, PFOA, and PFNA (Figure 5). Fathead minnow also shows significantly weaker LFABP-binding affinity than human for PFBS and PFHxS (p ≤ .05), whereas LFABP of other species all indicate comparable binding affinity to human LFABP (p > .05) for all PFAS.
Multiple comparison (Tukey test) between human LFABP and other LFABPs for different PFAS. Blue is human LFABP; red indicates significant difference (p ≤ .05); gray means no difference from human wild type (p > .05). The error bars represent the 95% confidence interval for each dataset.
The analysis of each specific energy term of ΔGbind shows that for all LFABPs of different species, electrostatic interaction makes the most significant contribution to ΔGbind for PFAS, although most of the electrostatic interaction is compensated by the polar solvation energy. The nonpolar solvation energies are very small and remain stable among ligands, thus making minor contribution to the binding free energy (Supplementary Tables 2–8). Finally, as indicated in Supplementary Figures 4–7, a strong negative relationship is observed between van der Waals and carbon chain length for all LFABPs of different species: as carbon chain length increases, van der Waals decreases. The entropy component also shows a similar trend, but the correlation is weaker than van der Waals.
Finally, a correlation analysis was performed between ΔGbind and carbon chain length. As indicated in Figure 6, in all LFABP systems, a quite strong negative relationship was observed for both LFABP versus PFCAs and LFABP versus PFSAs, with the correlation coefficient ranging from −0.64 to −1.0.
Distribution of ΔGbind for different PFAS-LFABP complexes across species. The correlation coefficient between the mean of ΔGbind and PFAS chain length ranges between −0.64 and −1.00, indicating that binding affinity increases with chain length (a more negative ΔGbind means stronger affinity, and is exponentially related to the dissociation constant, Kd).
DISCUSSION
In this study, we demonstrated how to combine SeqAPASS with a multistep molecular modeling workflow to inform PFAS bioaccumulation potential across species. SeqAPASS informs PFAS bioaccumulation potential based on the analysis of protein sequence conservation. The SeqAPASS evaluation for LFABP conservation in this example provides a prediction that identifies species as likely or unlikely to have similar bioaccumulation potential to human. By assessing protein sequence similarity at different levels, SeqAPASS predicted similar PFAS bioaccumulation potential between humans and the majority of vertebrates when considering those critical amino acid residues that form hydrogen bonds with PFAS, with the exception of a subset of vertebrates that include the Japanese medaka and fathead minnow.
Further, the SeqAPASS tool can be used for hypothesis generation via the identification of differences in amino acids across taxa. Therefore, to followup from previous studies identifying differences among human and rat predicted binding affinities to PFAS, SeqAPASS Level 3 identified a number of potential critical individual residues (ie, F50, A54, T81, T93, and N97) that differed between human LFABP and LFABPs in other species. Based on side-chain classification and molecular weight as a surrogate for size, those residue differences among species groups provide insights as to which residues are likely to influence interactions with PFAS and therefore change the bioaccumulation potential. Additional evidence that these individual amino acid mutations found across species may be important was generated using the DUET tool to mutate those single amino acids in the human structure to represent residues seen in other species, where destabilization of the protein was found. Therefore, to more fully understand whether these predicted amino acid differences determined based primarily on sequence translate to changes in binding affinity based on structure the multistep molecular modeling workflow was employed. By evaluating changes in individual amino acids and double mutations at the structural level using MD, insights were gained regarding which amino acids may or may not be important for PFAS-LFABP interaction. In considering the five amino acid differences identified by SeqAPASS, only F50V mutation led to significant differences in binding to PFNA, showing stronger binding affinity than the wild-type human LFABP (Figure 4), therefore demonstrating the utility of the combined methods for hypothesis generation. However, the molecular modeling workflow also showed that in most cases, single or double mutation of these residues which distinguish one species from another did not significantly change their binding affinity (Figure 4), which suggests that these sole residue differences may not significantly affect LFABP-binding affinity for PFAS.
Building upon and advancing the sequence-based predictions, the multistep molecular modeling workflow provides additional PFAS bioaccumulation potential information from the perspective of protein function (eg, binding affinity) from evaluation of the full sequences. By estimating PFAS-binding affinity to structural models of LFABP across different species (ie, human, rat, chicken, zebrafish, rainbow trout, Japanese medaka, and fathead minnow), our workflow revealed that human LFABP has comparable PFAS binding affinity to all other vertebrate species evaluated, except Japanese medaka and fathead minnow. The LFABP of those 2 fish species indicated significantly weaker binding affinities than human for some PFAS ligands (Table 3 and Figure 5). A closer look at the binding modes of PFAS bound to human, Japanese medaka, and fathead minnow LFABP shows that the close contact residues are very similar across those species for different PFAS, but the positions of these residues are quite different between human and the 2 fish species (eg, SER124 vs. SER52, Supplementary Table 9 and Supplementary Figure 8). It seems that the position of key residues, which seem to drive the position of ligand binding, can cause significantly different binding affinities between humans and the 2 fish species. Because the identity, not the position, of close-contact residues is conserved (ie, the residue is a serine in both cases in the example above), the specific amino acids are implicated in facilitating certain key interactions (eg, hydrogen bonding). At the same time, when the position of ligands is closer to the bottom of the LFABP binding pocket, the binding affinity also tends to be stronger (Supplementary Figure 8). Thus, we conclude that when the position of key residues facilitate binding in a region of the protein that is more energetically favorable (eg, increases hydrophobic contacts), stronger binding affinities result. However, these observations should be tempered with an acknowledgment that molecular simulations have a degree of uncertainty and variations in the predictions of exact binding conformations can and do occur from simulation to simulation.
Given the SeqAPASS and molecular modeling results, human, rat, chicken, zebrafish, or rainbow trout seem to be better representative species of the higher range of vertebrate bioaccumulation potential of PFAS than Japanese medaka and fathead minnow. It is worth pointing out that this conclusion is based on the interaction between PFAS and LFABP. Other proteins such as serum albumin and membrane transporters also play important roles in determining the bioaccumulation behavior of PFAS (Cheng and Ng, 2017) and should be included in future work.
In addition to offering fast and reliable estimation of protein-binding affinity for PFAS across different species, which significantly expands the cross-species evaluation beyond yes or no predictions of bioaccumulation potential generated by SeqAPASS, the inclusion of MD in this multistep molecular modeling workflow provides valuable insights into how the chemical structures of PFAS influence their protein-binding behavior. For example, the ΔGbind results for different species showed that a strong negative correlation exists between the carbon chain length of PFAS and their LFABP binding affinity. A further examination of each individual energy component of ΔGbind indicates that this strong correlation is largely because the van der Waals interactions and entropy change, both of which have a close relationship with carbon chain length (Supplementary Figs. 4–7). The tools evaluated in this study (SeqAPASS, DUET, and MD) are highly complementary, with unique strengths and weaknesses. The major strength of SeqAPASS lies in its ability to rapidly predict bioaccumulation potential for hundreds of species and further screen for potentially important differences across hundreds of proteins. Its main limitation is that the type of difference observed across sequences may not necessarily translate to a susceptibility difference based on structural details of a protein/receptor (eg, whether that difference lies within a ligand-binding domain or results in a large enough structural shift). This weakness can be overcome by coupling to a structure-based tool such as MD. Although requiring considerably more time and computational resources than SeqAPASS, MD provides valuable structural insight into key differences across species. Finally, the differences observed in our single-residue mutations compared with whole-species sequences indicates that the additional structural information provided by comparing “wild-type” proteins among species can be valuable, particularly in determining whether a “model species” is indeed a good model for predicting bioaccumulation potential or toxicity in humans.
Finally, it is worth pointing out that beyond this investigation of PFAS-LFABP interaction, the combined SeqAPASS and molecular modeling workflow can be applied to other ligand-protein pairs to provide insights into biological distribution, tissue-specific bioaccumulation driven by protein binding, and/or toxicity based on specific receptor interactions for any chemical and biological target of interest. For example, studies have shown PFAS could cause developmental and immunotoxic effects via pathways that may depend on their interaction with peroxisome proliferator-activated receptors (PPARs) or may be independent of these receptors, and this dependence may differ by species (eg, humans vs rodents) (Abbott et al., 2012; DeWitt et al., 2009; Szilagyi et al., 2020). Our workflow could be employed to examine the interactions between PPARs and PFAS in different species to understand relative strengths of interaction. To generalize the workflow to other ligand-protein pairs, only the protein sequences and the 3-dimensional crystal structures for the proteins and ligands are required as input data. Moreover, it is useful to have some experimental data (eg, protein binding affinity for ligands) for model evaluation.
SUPPLEMENTARY DATA
Supplementary data are available at Toxicological Sciences online.
ACKNOWLEDGMENTS
This article has been reviewed in accordance with the requirements of the U.S. Environmental Protection Agency (USEPA) Office of Research and Development. However, the recommendations and conclusions made herein do not represent USEPA policy. Mention of products or trade names does not indicate endorsement by the USEPA.
FUNDING
National Science Foundation (1845336).
DECLARATION OF CONFLICTING INTERESTS
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
REFERENCES
Author notes
Carlie LaLone and Carla Ng contributed equally to this study.















Comments