MSModDetector: a tool for detecting mass shifts and post-translational modifications in individual ion mass spectrometry data

Abstract

Motivation: Post-translational modifications (PTMs) on proteins regulate protein structures and functions. A single protein molecule can possess multiple modification sites that can accommodate various PTM types. This leads to a variety of different patterns, or combinations of PTMs, on that protein, and different PTM patterns can give rise to distinct biological functions. Top-down mass spectrometry (MS) has proven to be a useful tool for studying multiple PTMs on the same protein molecule. Because it measures the mass of intact proteins, even PTMs at distant sites can be assigned to the same protein molecule, and the number of PTMs attached to a single protein can be determined.

Results: We developed a Python module called MSModDetector that studies PTM patterns from individual ion mass spectrometry (I2MS) data. I2MS is an intact protein mass spectrometry approach that generates true mass spectra without the need to infer charge states. The algorithm first detects and quantifies mass shifts for a protein of interest and then infers potential PTM patterns using linear programming. The algorithm is evaluated on simulated I2MS data and on experimental I2MS data for the tumor suppressor protein p53. We show that MSModDetector is a useful tool for comparing a protein's PTM pattern landscape across different conditions. An improved analysis of PTM patterns will enable a deeper understanding of PTM-regulated cellular processes.

Availability and implementation: The source code is available at https://github.com/marjanfaizi/MSModDetector.
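The abstract states that PTM patterns are inferred from each detected mass shift by linear programming. One plausible integer-programming formulation is sketched below. It is consistent with the two objectives named in the supplementary captions ("min ptm" and "min both") and with the 36 ppm mass tolerance. The symbols, the per-type upper bounds u_j, the weight λ, and the choice to express the tolerance relative to the intact protein mass M are all assumptions and are not taken from MSModDetector's source code.

```latex
% \Delta m: observed mass shift; m_j: mass of PTM type j; x_j: number of PTMs of type j
% \tau: mass tolerance (e.g. 36 ppm); M: intact protein mass (assumed reference mass)
% u_j: assumed upper bound per PTM type; \lambda > 0: assumed weight on the mass error
\begin{aligned}
\text{min ptm:}\quad & \min_{x}\ \sum_{j} x_j \\
& \text{s.t.}\ -\tau M \le \sum_{j} m_j x_j - \Delta m \le \tau M,
  \qquad x_j \in \{0, 1, \dots, u_j\} \\[4pt]
\text{min both:}\quad & \min_{x,\,e}\ \sum_{j} x_j + \lambda e \\
& \text{s.t.}\ e \ge \sum_{j} m_j x_j - \Delta m,\quad
  e \ge \Delta m - \sum_{j} m_j x_j,\quad \text{plus the constraints above}
\end{aligned}
```

Under this assumption, τ = 36 ppm and M ≈ 43.6 kDa give an absolute tolerance τM of about 1.57 Da. The two inequalities on e are the standard linearization of e ≥ |Σ m_j x_j − Δm|.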


Figure S5: Noise and error distributions in I2MS data from endogenous p53. Data were obtained from MCF7 cells under two different conditions (Nutlin-3a only and UV radiation), with two replicates each. Peaks from a mass range where no signal is observed (44.6 to 46 kDa) are used to obtain the distribution of the basal noise (A).
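For simulation, the panel A basal-noise distribution can be turned into a sampling model. The sketch below fits a beta distribution to noise intensities rescaled into (0, 1) using the method of moments, then draws new noise values from the fit. The supplement does not state how its distributions were fitted, so the estimator, the rescaling constant `scale`, and the function names `fit_beta_moments` and `sample_basal_noise` are illustrative assumptions.

```python
import random
import statistics


def fit_beta_moments(values):
    """Method-of-moments estimate of Beta(alpha, beta) for samples in (0, 1).

    Assumes 0 < variance < mean * (1 - mean), which holds for
    non-degenerate samples drawn from (0, 1).
    """
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    total = mean * (1.0 - mean) / var - 1.0  # estimate of alpha + beta
    return mean * total, (1.0 - mean) * total


def sample_basal_noise(alpha, beta, scale, n, rng=None):
    """Draw n basal-noise intensities from Beta(alpha, beta), multiplied by
    `scale` (hypothetical constant used to rescale intensities into (0, 1))."""
    if rng is None:
        rng = random.Random()
    return [scale * rng.betavariate(alpha, beta) for _ in range(n)]
```

In practice, you would divide the observed noise-peak intensities by a constant slightly larger than their maximum, fit, and then pass the same constant as `scale` when sampling.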
The mass spectrum region between 43.6 kDa and 44.6 kDa is used to determine the horizontal and vertical errors (B, C). Mass differences between peaks spaced approximately 1 Da apart are used to estimate the horizontal error (B). To calculate the vertical error, Gaussian distributions are fitted to the mass spectrum, and the relative deviations of the peaks from the fit are used for the distribution of the vertical error (C). Purple lines represent fitted beta distributions.

For every detected mass shift, the linear program with the objective to minimize the number of PTMs is solved, and the solution space is explored for a maximum mass tolerance of 36 ppm. The number of possible PTM pattern combinations is shown for each mass shift. The following PTM types are considered for the pattern combinations: phosphorylation (Ph), acetylation (Ac), methylation (Me1), di-methylation (Me2), tri-methylation (Me3), phosphate (Ph-OH), oxidation (Ox), cysteinylation (Cys), sodium adduct (Na).

Table S4: Top 3 PTM pattern predictions for the I2MS data of endogenous p53 under Nutlin-3a and UV conditions. PTM pattern predictions are shown for two different objective functions: "min both" minimizes both the number of PTMs and the error between observed and inferred mass shifts, whereas "min ptm" minimizes only the number of PTMs. The following PTM types are considered for the pattern combinations: phosphorylation (Ph), acetylation (Ac), methylation (Me1), di-methylation (Me2), tri-methylation (Me3), phosphate (Ph-OH), oxidation (Ox), cysteinylation (Cys), sodium adduct (Na).
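The captions above describe exploring the solution space within a 36 ppm tolerance and ranking candidate patterns under two objectives. A small sketch of that idea follows. It exhaustively enumerates PTM combinations (a brute-force stand-in for iterating through linear-program solutions), keeps those within tolerance, and ranks them. The mass table is a hypothetical subset of the listed PTM types using standard monoisotopic mass deltas. The tolerance is assumed to be relative to the intact protein mass. Ranking "min_both" lexicographically (PTM count first, then mass error) is an assumption standing in for the combined objective.

```python
from itertools import product

# Hypothetical subset of PTM types with monoisotopic mass deltas in Da;
# MSModDetector's own PTM table may differ.
PTM_MASSES = {
    "Ph": 79.966331,   # phosphorylation
    "Ac": 42.010565,   # acetylation
    "Me1": 14.015650,  # methylation
    "Ox": 15.994915,   # oxidation
}

# Ranking keys named after the "min ptm" / "min both" objectives.
OBJECTIVES = {
    "min_ptm": lambda hit: hit["n_ptms"],
    "min_both": lambda hit: (hit["n_ptms"], hit["error"]),
}


def candidate_patterns(mass_shift, protein_mass, ppm=36.0, max_per_ptm=5):
    """Enumerate all PTM combinations whose summed mass lies within `ppm`
    (relative to protein_mass) of the observed mass shift."""
    tol = protein_mass * ppm * 1e-6  # convert ppm tolerance to Da
    names = list(PTM_MASSES)
    hits = []
    for counts in product(range(max_per_ptm + 1), repeat=len(names)):
        inferred = sum(c * PTM_MASSES[n] for c, n in zip(counts, names))
        error = abs(inferred - mass_shift)
        if error <= tol:
            hits.append({"pattern": dict(zip(names, counts)),
                         "n_ptms": sum(counts), "error": error})
    return hits


def top_k(hits, k=3, objective="min_ptm"):
    """Return the k best candidates under the chosen ranking objective."""
    if objective not in OBJECTIVES:
        raise ValueError(f"unknown objective: {objective}")
    return sorted(hits, key=OBJECTIVES[objective])[:k]
```

For example, a shift equal to one phosphorylation plus one acetylation yields several candidates within about 1.57 Da on a 43.65 kDa protein. Ranked by PTM count, Ph + Ac comes first. The number of candidates grows quickly with the tolerance and with the number of PTM types, which is why several top solutions are worth inspecting.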

Figure S6: Impact of noise and error on the algorithm's prediction of phosphorylation patterns on p53. Theoretical mass spectra of manually generated p53 phosphorylation patterns, including basal noise and horizontal and vertical error, are generated 100 times, and each value depicted is the average of the 100 simulations. The mass tolerance is set to 36 ppm. The phosphorylation pattern data set contains 7 PTM patterns (see Supplementary Table S1). On the left, the number of detected mass shifts is displayed for all combinations of noise and error, together with how well their predicted abundances match the observed abundances. On the right, the average value of the correct PTM pattern predictions is shown for three different objective functions.
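The simulation in Figure S6 perturbs theoretical spectra with basal noise and horizontal and vertical error, then averages the results over 100 runs. A minimal sketch of that loop follows. Several pieces are illustrative assumptions: the Gaussian-shaped peak envelope with 1 Da spacing, treating basal noise as an additive intensity on each peak, and the function names `theoretical_spectrum`, `perturb`, and `average_over_runs`. The error models are passed in as callables, so fitted beta distributions or any other samplers can be plugged in.

```python
import math
import random
import statistics


def theoretical_spectrum(center, sigma, height, spacing=1.0, half_width=20):
    """Hypothetical peak list: peaks every `spacing` Da around `center`,
    with heights following a Gaussian envelope of width `sigma` (Da)."""
    masses = [center + i * spacing for i in range(-half_width, half_width + 1)]
    intensities = [height * math.exp(-0.5 * ((m - center) / sigma) ** 2)
                   for m in masses]
    return masses, intensities


def perturb(masses, intensities, horiz_err, vert_err, basal_noise, rng):
    """Return a perturbed copy of the spectrum.

    horiz_err(rng): mass offset in Da added to each peak position.
    vert_err(rng): relative intensity deviation applied to each peak.
    basal_noise(rng): additive noise intensity; results are clipped at 0.
    """
    noisy_masses = [m + horiz_err(rng) for m in masses]
    noisy_intensities = [
        max(0.0, i * (1.0 + vert_err(rng)) + basal_noise(rng))
        for i in intensities
    ]
    return noisy_masses, noisy_intensities


def average_over_runs(metric, n_runs=100, seed=0):
    """Average metric(rng) over n_runs simulations using one seeded RNG."""
    rng = random.Random(seed)
    return statistics.fmean([metric(rng) for _ in range(n_runs)])
```

In a full evaluation, `metric` would run mass-shift detection and PTM inference on each perturbed spectrum and return, for example, the number of detected mass shifts.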

Figure S7: Evaluation of PTM pattern prediction using the objective of minimizing the number of PTMs. Basal noise and vertical and horizontal error are added to the complex PTM pattern mass spectrum, and simulations are run 100 times. The mass tolerance is set to 36 ppm. We observe that iterating through the solution space of the linear program increases the number of correct PTM pattern predictions. Considering only the first optimal solution (top panel), MSModDetector successfully predicts 3 of the 18 PTM patterns. A PTM pattern for a given mass shift counts as successfully predicted if it is predicted correctly in at least 75% of the 100 simulations. Considering the top 3 optimal solutions for each mass shift, 6 PTM patterns are successfully predicted, and this number increases to 7 when the top 10 optimal solutions are considered.
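The success criterion above asks whether the true pattern is predicted correctly in at least 75% of the 100 simulations when the top k solutions are considered. It can be written down directly. In this sketch, each run contributes a ranked list of candidate patterns, however those are produced. A run in which the mass shift was not detected is represented as an empty list. The function names and the dictionary pattern representation are hypothetical.

```python
def success_rate(ranked_per_run, true_pattern, k):
    """Fraction of simulation runs whose top-k ranked predictions contain
    the true PTM pattern. An undetected mass shift is an empty list."""
    if not ranked_per_run:
        raise ValueError("need at least one simulation run")
    hits = sum(1 for ranked in ranked_per_run if true_pattern in ranked[:k])
    return hits / len(ranked_per_run)


def successfully_predicted(ranked_per_run, true_pattern, k, threshold=0.75):
    """Criterion from the caption: correct in at least `threshold` of runs."""
    return success_rate(ranked_per_run, true_pattern, k) >= threshold
```

Because a larger k can only add matches, the success rate is non-decreasing in k. This is consistent with the caption's observation that considering more optimal solutions increases the number of successfully predicted patterns.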
[Figure panels (objective: min both): Top 1, Top 3, Top 5 and Top 10 PTM pattern predictions; legend: correct PTM pattern, incorrect PTM pattern, mass shift not detected.]

Figure S8 :Figure S9 :
Figure S8: Evaluation of PTM pattern prediction using the objective of minimizing both the number of PTMs and the error between observed and inferred mass shifts. Basal noise and vertical and horizontal error are added to the complex PTM pattern mass spectrum, and simulations are run 100 times. The mass tolerance is set to 36 ppm. As in Supplementary Fig. S7, we observe that iterating through the solution space of the linear program increases the number of correct PTM pattern predictions. Considering only the first optimal solution (top panel), MSModDetector successfully predicts 5 of the 18 PTM patterns. A PTM pattern for a given mass shift counts as successfully predicted if it is predicted correctly in at least 75% of the 100 simulations. Considering the top 3 optimal solutions for each mass shift, 6 PTM patterns are successfully predicted, and this number increases to 7 when the top 5 or top 10 optimal solutions are considered.