Véronique Hourdel, Stevenn Volant, Darragh P. O’Brien, Alexandre Chenal, Julia Chamot-Rooke, Marie-Agnès Dillies, Sébastien Brier, MEMHDX: an interactive tool to expedite the statistical validation and visualization of large HDX-MS datasets, Bioinformatics, Volume 32, Issue 22, 15 November 2016, Pages 3413–3419, https://doi.org/10.1093/bioinformatics/btw420
Motivation: With the continued improvement of requisite mass spectrometers and UHPLC systems, Hydrogen/Deuterium eXchange Mass Spectrometry (HDX-MS) workflows are rapidly evolving towards the investigation of more challenging biological systems, including large protein complexes and membrane proteins. The analysis of such extensive systems results in very large HDX-MS datasets for which specific analysis tools are required to speed up data validation and interpretation.
Results: We introduce a web application and a new R-package named ‘MEMHDX’ to help users analyze, validate and visualize large HDX-MS datasets. MEMHDX is composed of two elements. A statistical tool aids in the validation of the results by applying a mixed-effects model for each peptide, in each experimental condition, and at each time point, taking into account the time dependency of the HDX reaction and number of independent replicates. Two adjusted P-values are generated per peptide, one for the ‘Change in dynamics’ and one for the ‘Magnitude of ΔD’, and are used to classify the data by means of a ‘Logit’ representation. A user-friendly interface developed with Shiny by RStudio facilitates the use of the package. This interactive tool allows the user to easily and rapidly validate, visualize and compare the relative deuterium incorporation on the amino acid sequence and 3D structure, providing both spatial and temporal information.
Availability and Implementation: MEMHDX is freely available as a web tool at the project home page http://memhdx.c3bi.pasteur.fr
Contact: marie-agnes.dillies@pasteur.fr or sebastien.brier@pasteur.fr
Supplementary information: Supplementary data is available at Bioinformatics online.
1 Introduction
Hydrogen Deuterium eXchange followed by Mass Spectrometry (HDX-MS) is a biophysical tool in structural biology capable of probing protein/ligand interactions, conformational changes and protein folding and dynamics (Konermann et al., 2011; Tsutsui and Wintrode, 2007; Wales and Engen, 2006). Despite an increased number of applications (Jaswal, 2013; Pirrone et al., 2015), the expansion of the technology has been slowed by its intrinsic technical and analytical complexity (i.e. digestion at pH 2.5 and rapid HPLC separation at 0 °C). With recent advancements in sample preparation robotics, refrigerated ultra-high performance liquid chromatography systems (Venable et al., 2012; Wales et al., 2008) and high-resolution mass spectrometers, HDX-MS has surged in popularity and is now emerging from the academic benchtop to the expanse of the pharmaceutical sector (Campobasso and Huddler, 2015; Huang and Chen, 2014; Majumdar et al., 2015; Marciano et al., 2014; Wei et al., 2014).
The use of improved HDX-MS workflows enables the structural analysis of larger protein systems, such as antigen-antibody complexes (> 180 kDa) or integral membrane proteins, in a more routine way (Bertoldi et al., 2016; Chung et al., 2011; Faleri et al., 2014; Malito et al., 2013, 2014). The characterization of such systems results in very complex HDX-MS datasets, for which specific non-commercial analytical software (e.g. HXExpress, HeXicon, ExMS, HDXFinder, etc.), as well as commercial platforms (e.g. DynamX and HDExaminer), have been developed (Guttman et al., 2013; Hamuro et al., 2003; Kan et al., 2011; Kreshuk et al., 2011; Lindner et al., 2014; Lou et al., 2010; Miller et al., 2012; Weis et al., 2006). Such software extracts deuterium incorporation information from (un)processed raw m/z data files and produces deuterium uptake curves that can pinpoint regions of interest on protein 3D structures. However, many of these tools do not integrate statistical approaches and instead use the absolute difference of deuterium uptake to evaluate the significance of differences between conditions.
Hydra (or Mass Spec Studio) was the first standalone application to introduce Student’s t-test and P-values to statistically evaluate the difference across HDX-MS experiments (Rey et al., 2014; Slysz et al., 2009). Alternatively, Houde et al. (2011) calculated confidence limits based on the experimental uncertainty of measuring deuterium uptake across replicates. In this context, two distinct confidence limits were calculated manually using either the differences of deuterium uptake or the summed value of HDX differences measured for each peptide, in each condition, and at each time point. Lastly, HDX Workbench includes a two-tailed t-test and Tukey multiple comparison procedure for the statistical cross-comparison of two or more datasets (Pascal et al., 2009; Pascal et al., 2007, 2012).
Despite significant efforts to design requisite statistical tools, the aforementioned software solutions are suitable to analyze HDX-MS data at one time point only, failing to account for the time dependency of the HDX-MS reaction. Thus, Liu et al. (2011) proposed a multiple regression or ANCOVA model. By deriving a statistical test based on the model parameters, they evaluated the significant difference between two groups under comparison, for all peptides in the dataset, and across all independent replicates.
Expanding on this, we introduce a web application named ‘MEMHDX’ (Mixed-Effects Model for HDX experiments) to aid in the rapid statistical validation and visualization of large HDX-MS datasets. MEMHDX uses a linear mixed-effects model where replicates are considered as random effects. This accounts for both the time dependency and the variability across replicates. Moreover, instead of testing the variation in global deuterium exchange between two experimental conditions, we propose to calculate two individual P-values for each peptide. First, the difference between conditions is measured (P-value for the ‘Magnitude of Delta Deuterium (ΔD)’), followed by the evolution of the deuterium uptake behavior over the time course of the experiment (P-value for the ‘Change in dynamics’). MEMHDX, therefore, allows the clustering of each peptide in the dataset based on these two respective P-values. A user-friendly interface developed with Shiny by RStudio facilitates the use of the application, allowing the user to easily visualize and compare the relative deuterium incorporation across the entire protein sequence and 3D structure. As a test system, we used the receptor-binding Repeat-in-ToXin (RTX) domain of CyaA produced by Bordetella pertussis, the causative agent of whooping cough, to pinpoint regions undergoing structural and conformational changes upon calcium binding (O'Brien et al., 2015; Sotomayor-Perez et al., 2015).
2 Methods
Fig. 1. The MEMHDX strategy. A .csv file containing deuterium uptake values for all identified peptides is exported after raw data extraction and analysis by dedicated HDX-MS software (e.g. DynamX). The .csv file is uploaded to MEMHDX, where a linear mixed-effects model is applied to statistically validate the dataset. Main results are displayed on a ‘Logit’ representation (for data clustering and validation) and visualized using a global summary plot and the 3D structure, where available. A user-friendly Shiny interface facilitates the use of the application. (Color version of this figure is available at Bioinformatics online.)
2.1 The MEMHDX R-package
2.1.1 The mixed-effects model
2.1.2 The statistical inference
The ‘nlme’ R package (http://CRAN.R-project.org/package=nlme) was used to estimate the vector of fixed effects (β) and the random effects (u_r). The restricted maximum-likelihood method was used to fit the linear mixed-effects model. To evaluate the statistical significance of the condition and the interaction on the rate of deuterium incorporation of the protein, two P-values were calculated per peptide, using two individual Wald tests. The condition-associated P-value (hereafter referred to as P-value_Magnitude_of_Delta_D) tests the null hypothesis that there is no difference in deuterium uptake between conditions 1 and 2. The interaction-associated P-value (hereafter referred to as P-value_Change_in_dynamics) tests the null hypothesis that there is no change in the deuterium uptake behavior between conditions 1 and 2, and takes into account the time dependency of the HDX reaction (see Comment 1 of Supplementary material). A multiple testing procedure was applied to adjust the significance level of each Wald test. In this work, the false discovery rate (FDR) criterion was used instead of the classical familywise error rate to achieve higher statistical power (Benjamini and Hochberg, 1995).
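The FDR adjustment cited above (Benjamini and Hochberg, 1995) can be illustrated with a minimal sketch. It is written in Python rather than R, it is not taken from the MEMHDX source code, and the function name and the example P-values are hypothetical. It computes the standard step-up adjusted P-values that would then be compared with the chosen significance threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that the adjusted values
    # never increase as the raw P-values decrease (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw Wald-test P-values for four peptides.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

In this setting the adjustment would presumably be run once per test, i.e. once over all ‘Magnitude of ΔD’ P-values and once over all ‘Change in dynamics’ P-values.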
2.2 User interface and data output
2.2.1 The ‘start analysis’ window
To facilitate the use of the MEMHDX R-package, a user-friendly interactive web interface was developed with Shiny by RStudio (http://shiny.rstudio.com). The ‘Start analysis’ window (Supplementary Fig. S1A) contains a short reminder of the main variables required in the input .csv file, i.e. the sequence and the position of each peptide in the protein, charge state z, extracted centroid m/z values for each time point and replicate and in each condition, exposure time (min), number of replicates, and the maximum number of exchangeable amide hydrogens (MaxUptake) that could be theoretically replaced into the peptide. MaxUptake corresponds to the number of amino acid residues contained in the peptide, minus the number of proline residues and minus one for the N-terminus that back-exchanges too rapidly to be measured (Englander and Kallenbach, 1983). Once the .csv file is loaded, the system informs the user of any missing variables. Missing centroid m/z values are allowed; in such a scenario, they are automatically replaced by the mean value across all replicates.
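The two input rules described above can be sketched as follows. This is an illustration in Python, not the MEMHDX implementation: the function names and example values are hypothetical, and it assumes that a missing centroid is replaced by the mean of the replicates measured for the same peptide, condition and time point.

```python
from statistics import mean


def max_uptake(sequence):
    """Residue count, minus prolines, minus one for the N-terminus."""
    return len(sequence) - sequence.count("P") - 1


def fill_missing(centroids):
    """Replace missing centroids (None) with the mean of the measured ones."""
    measured = [c for c in centroids if c is not None]
    if not measured:
        raise ValueError("no measured replicate to average")
    replacement = mean(measured)
    return [replacement if c is None else c for c in centroids]


# Hypothetical peptide: 9 residues, 2 of which are prolines.
print(max_uptake("GSPLAVPTE"))
# Hypothetical triplicate with the second centroid m/z value missing.
print(fill_missing([650.0, None, 651.0]))
```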
2.2.2 Data output
Once processing is complete, the ‘HDX-MS Results’ panel appears below the ‘Start analysis’ window. This is composed of six independent tabs, namely Raw Data, Peptide Plot, Logit Plot, Global Visualization, 3D Structure and Summary.
The ‘Raw Data’ tab lists all the variables included in the .csv file. The ‘Peptide Plot’ tab (Supplementary Fig. S1B1) allows the user to check the quality of the fitted model for each peptide and to evaluate the reproducibility across replicates. The ‘Logit Plot’ tab summarizes the main statistical results (Supplementary Fig. S1B2). This is divided into three sections. The ‘plot options’ section is used to adjust the options of the plot (point size, maximum distance in pixels) and the export. The center area displays the ‘Logit’ representation, where each dot represents one individual peptide. The Logit function of the two P-values defines the position of each peptide in the graph. The red lines correspond to the statistical significance threshold for the ‘Magnitude of ΔD’ (vertical line) and for the ‘Change in dynamics’ (horizontal line), thus dividing the ‘Logit’ plot into four regions (Supplementary Fig. S1B2): peptides statistically significant only for the ‘Change in dynamics’ or only for the ‘Magnitude of ΔD’ are located in the ‘a’ and the ‘b’ region, respectively; peptides statistically significant for both criteria are clustered in the ‘c’ region; statistically non-significant peptides are displayed in the ‘d’ region. The ‘Logit’ viewing mode of MEMHDX provides a rapid and effective way to discriminate statistically significant from non-significant peptides and to classify each based on its HDX behavior. The final section appears at the bottom of the ‘Logit’ screen after selection of an individual peptide in the ‘Logit’ representation (not shown). MEMHDX automatically displays a summary table containing the mean incorporation (in Da) per time point and state, and the associated ‘Logit’ values. This summary table can be exported in .csv format. The identity of each peptide (i.e. sequence, position and deuterium uptake curves) appears below the ‘Logit’ plot when the mouse cursor is hovered over a dot.
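The four-region classification can be sketched in a few lines of Python. This is an illustration only, not the MEMHDX code: it assumes the usual logit definition, log(p / (1 − p)), a strict comparison against the threshold, and a default threshold of 1% as used in the figures. The function names are hypothetical.

```python
import math


def logit(p):
    """Logit transform log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def classify(p_magnitude, p_dynamics, alpha=0.01):
    """Assign a peptide to region 'a', 'b', 'c' or 'd' of the Logit plot."""
    sig_magnitude = p_magnitude < alpha
    sig_dynamics = p_dynamics < alpha
    if sig_magnitude and sig_dynamics:
        return "c"  # significant for both P-values
    if sig_dynamics:
        return "a"  # significant for 'Change in dynamics' only
    if sig_magnitude:
        return "b"  # significant for 'Magnitude of ΔD' only
    return "d"      # not significant


# Hypothetical adjusted P-values for one peptide.
p_mag, p_dyn = 0.001, 0.5
# Assumed axes: Magnitude of ΔD on x (vertical threshold line),
# Change in dynamics on y (horizontal threshold line).
point = (logit(p_mag), logit(p_dyn))
print(point, classify(p_mag, p_dyn))
```

Because the logit is strictly increasing, comparing P-values with the threshold is equivalent to comparing plot coordinates with the red lines.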
The ‘Global Visualization’ and the ‘3D Structure’ tabs are dedicated to the visualization of the HDX data and each contain two sections (Supplementary Figs S1B3 and S1B4). The left section allows the user to navigate through the different parameter options. The right section of the ‘Global Visualization’ tab summarizes the deuterium uptake behavior observed in the two states (Supplementary Fig. S1B3, upper and middle charts). The relative fractional uptake (RFU) values calculated by MEMHDX are plotted as a function of peptide position. In addition, the lower chart plots the difference in RFU between states 1 and 2, providing a more quantitative assessment of the difference between states. Statistically significant peptides are highlighted in gray. The right section of the ‘3D Structure’ tab displays the mapping of the HDX results on the crystal structure of the protein (Supplementary Fig. S1B4). Finally, the ‘Summary’ tab recaps the different parameters used by MEMHDX to perform the statistical analysis and includes an export option for both the raw data and statistically relevant results (not shown).
3 Results and discussion
MEMHDX was developed to complement the HDX-MS pipeline commercialized by Waters Corporation. The software is fully compatible with the main output generated by DynamX (i.e. cluster data), but has been designed to handle data from any HDX-MS platform, as long as the input file is structured in .csv format and with the appropriate architecture, i.e., columns listed in Panel 1 of Supplementary Figure S1A.
MEMHDX was evaluated using our recently published differential HDX-MS dataset generated with the C-terminal Repeat-in-ToXin Domain (RD, 701 residues) of the CyaA toxin (O'Brien et al., 2015). Briefly, we used HDX to identify and locate secondary structural elements in the intrinsically disordered Apo-RD protein, and followed its transition to a more compact and folded state upon calcium binding. In total, 13 experimental time points were selected, and all data were collected in triplicate.
3.1 Pre-processing and dataset quality control
A total of 602 peptic peptides of RD were identified by ProteinLynX Global Server (Waters Corporation) using the default search parameters, of which 198 remained after filtering by DynamX. Following this, 162 were selected for HDX analysis and covered 98.4% of the RD sequence (Supplementary Fig. S2). Considering 162 peptides, 1 charge state, 3 replicates, 14 time points (including the unlabeled control) and 2 conditions, we have up to 13 608 unique data points in the complete test dataset. This demonstrates the complexity of datasets acquired using modern HDX-MS pipelines, the processing and interpretation of which is extremely challenging and time consuming by traditional, manual means.
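The dataset size quoted above follows directly from the experimental design, as this short Python check shows (the variable names are illustrative):

```python
# Design of the RD test dataset as described in the text.
peptides = 162
charge_states = 1
replicates = 3
time_points = 14  # 13 labeling times plus the unlabeled control
conditions = 2

data_points = peptides * charge_states * replicates * time_points * conditions
print(data_points)
```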
The current version of MEMHDX can only handle one unique charge state per peptide. A pre-processing step is required by the user to select the most appropriate charge state for analysis. The quality determination of the dataset is directly accomplished by MEMHDX. A Box and Whisker plot displaying the agreement across replicates is automatically generated (Supplementary Fig. S1B1). For each peptide, the calculated deuterium uptake values (Y_{a,i,r,t}) are averaged across all time points in each respective state. This comparison is possible as each replicate is considered as an independent variable. A scoring function (log-likelihood) is determined per peptide to easily control the quality of the fitting. MEMHDX includes a filtering option to sort each peptide according to this score and to remove those with a low fitting quality from the analysis.
3.2 Statistical analysis and peptide clustering
Fig. 2. Example of statistical results generated with MEMHDX. (A) ‘Logit’ plot obtained with RD. The effect of calcium binding on the deuterium uptake behavior was measured on 162 RD peptides. Each dot corresponds to one unique peptide. Peptides are classified and color-coded based on their respective HDX behavior: peptides showing dynamic events in both the Apo- and Holo-state are colored in gray; peptides only dynamic in the Apo- or Holo-state are colored in red and blue; non-dynamic peptides in both states are colored in green. The statistical significance threshold was set to 1%. (B) Deuterium uptake curves for selected peptides in the Apo- (open circles) and Holo- (filled squares) states. The position of each in the ‘Logit’ plot is also reported. (Color version of this figure is available at Bioinformatics online.)
Fig. 3. HDX-MS results visualized by MEMHDX. (A) Global visualization plots of RD in both the Apo- and Holo-state. The relative fractional uptake values are plotted as a function of peptide position. This representation allows the user to visualize the deuterium uptake behavior of each peptide across the entire protein sequence. (B) Fractional uptake difference plot showing the variations of deuterium uptake between Apo- and Holo-RD. A high uptake difference value corresponds to a large calcium-induced protective effect, while a low value is indicative of a weak effect. Significant peptides are highlighted in gray. The statistical significance threshold was set to 1%. (C and D) Mapping of the HDX results on the model of RD using the 3D-structural tool of MEMHDX. RD is shown as a cartoon or in a space-filling model. Regions experiencing deuterium uptake or dynamic changes upon calcium binding are colored in cyan; regions with no change are colored in red. (Color version of this figure is available at Bioinformatics online.)
3.3 Global and 3D visualization
MEMHDX integrates two visualization tools to facilitate the interpretation of the HDX results. The ‘global visualization’ tool contains two distinct plots. The relative fractional uptake plot enables the user to directly visualize and compare the relative deuterium uptake and the exchange behavior of each peptide across the entire protein sequence, providing both spatial and temporal information (Fig. 3A). Using this representation, the regions of the RD protein containing residual structural elements in the Apo-state or acquiring secondary structures in the Holo-state are easily identified. The second plot displays the difference in relative fractional uptake between the two states (Fig. 3B). A high uptake difference indicates that the peptide incorporates more deuterium in state 1 than in state 2, and vice versa. This is the case for peptide 203–209 of RD, which incorporates more deuterium in the Apo-state than in the Holo-state, indicative of a calcium-induced protective effect (Fig. 3B). In addition, the statistical results are reported on the differential uptake plot. Statistically non-significant peptides are shown in white, whereas those with significant P-values are highlighted in gray.
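The difference plot can be illustrated with a short Python sketch. The definition used here, relative fractional uptake as mean deuterium uptake (in Da) divided by MaxUptake, is an assumption, since this section does not give the exact formula. The function names and numbers are hypothetical and are not measured RD values.

```python
def relative_fractional_uptake(uptake_da, max_uptake):
    """Assumed definition: mean uptake (Da) divided by MaxUptake."""
    return uptake_da / max_uptake


def rfu_difference(uptake_state1, uptake_state2, max_uptake):
    """RFU of state 1 minus RFU of state 2 for one peptide and time point."""
    return (relative_fractional_uptake(uptake_state1, max_uptake)
            - relative_fractional_uptake(uptake_state2, max_uptake))


# Hypothetical peptide with MaxUptake = 6: 3.0 Da in state 1, 1.5 Da in state 2.
difference = rfu_difference(3.0, 1.5, 6)
print(difference)
```

A positive value means more deuterium is incorporated in state 1 than in state 2, which for an Apo versus Holo comparison would correspond to a protective effect in the Holo-state.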
The second visualization tool allows the mapping of the HDX results on the 3D structure of the protein (Fig. 3C and D). The user can either upload a PDB file or enter the Protein Data Bank identifier of the protein. In the latter case, MEMHDX automatically retrieves and displays the structure from the Protein Data Bank archive. Differential exchange behaviors are color-coded to distinguish modified from unmodified regions (Fig. 3C and D).
4 Conclusions
MEMHDX is a statistical tool designed to aid in the analysis of large HDX-MS datasets. The initial concept of the software was to complement the HDX-MS solution provided by Waters Corporation, which is, to date, the only complete automated pipeline commercially available. The use of two distinct P-values in the handling of HDX-MS data introduces a novel way to interpret and classify HDX results. This was successfully demonstrated here with the analysis of the RD protein from the adenylate cyclase toxin. The validation of the complete RD dataset was significantly expedited, taking only 1 day with MEMHDX compared with up to 2 weeks when performed manually. The current version of the software allows the comparison of two conditions using a single charge state per peptide. As a future perspective, the software will be enhanced to allow the comparison of multiple conditions using multiple charge states.
Acknowledgements
We thank Christophe Malabat for providing the virtual machine, Amine Ghozlane for his help with the 3D visualization tool and Diogo Borges Lima for his contribution to the on-line tutorial.
Funding
This work was supported by the Institut Pasteur (Projet Transversal de Recherche, PTR#451 and PasteurInnov TransCyaA 2015), the Centre national de la Recherche Scientifique (CNRS UMR 3528, Biologie Structurale des Processus Cellulaires et Maladies Infectieuses; CNRS USR 3756) and funding from the Investissements d’Avenir through the CACSICE project.
Conflict of interest: none declared.
Author notes
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.



