Cheetah-MS: a web server to model protein complexes using tandem cross-linking mass spectrometry data

Abstract
Summary: Protein–protein interactions (PPIs) are central to many biological processes but are difficult to characterize, especially in complex, unfractionated samples. Chemical cross-linking combined with mass spectrometry (MS) and computational modeling is gaining recognition as a viable tool for studying protein interactions. Here, we introduce Cheetah-MS, a web server for predicting PPIs in complex sample mixtures. It combines the capability and sensitivity of MS for analyzing complex samples with the power and resolution of protein–protein docking. It predicts the quaternary structure of the PPI of interest from tandem MS data (MS/MS, also called MS2). Combining MS analysis with modeling increases sensitivity and, importantly, makes the results easier to interpret.
Availability and implementation: Cheetah-MS is freely available as a web server at https://www.txms.org.


Introduction
Cross-linking mass spectrometry (XL-MS) is a powerful technique for measuring protein–protein interactions (PPIs) directly in complex samples (O'Reilly et al., 2018). Bifunctional reagents covalently link two specific residues while the proteins are in their native state. The proteins are then enzymatically digested, producing many peptides, some of which are joined by the reagent. The length of the cross-linker arm sets an upper bound on the distance between the two cross-linked amino acids, and this information is used to identify and characterize the PPI. If enough cross-linked peptides are identified, macromolecular modeling tools such as Rosetta (Koehler et al., 2020) can build a structural model. Here, we present Cheetah-MS, a web server based on our previously published method, targeted chemical cross-linking MS (TX-MS), which deeply integrates protein structure modeling and chemical XL-MS (Hauri et al., 2019). Cheetah-MS converges quickly to a solution because it iteratively samples models and filters them with XL peptides, which reduces the number of sampled decoys by an order of magnitude. Cheetah-MS supports tandem MS/MS data acquired with non-cleavable reagents (DSS/BS3, DSG and EGS) and can detect up to 12 post-translational modifications (PTMs).
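Because the supported reagents are non-cleavable, a cross-linked peptide pair reaches the mass spectrometer as a single precursor whose mass is the sum of both peptides plus the linker. The sketch below illustrates that bookkeeping for the three reagent types named above. It is not Cheetah-MS code: the function names are hypothetical, only unmodified peptides are handled (no PTMs), and the linker masses are the standard added compositions (spacer minus the two hydrogens lost on amide formation).

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per linear peptide (N-terminal H + C-terminal OH)
PROTON = 1.007276   # proton mass, used to convert neutral mass to m/z

# Mass added when a non-cleavable linker bridges two amines (assumed values:
# acyl spacer composition minus the two H atoms lost on amide formation).
LINKER_MASS = {
    "DSS/BS3": 138.06808,  # C8H10O2
    "DSG": 96.02113,       # C5H4O2
    "EGS": 226.04774,      # C10H10O6
}


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def crosslink_mass(pep_a: str, pep_b: str, linker: str) -> float:
    """Neutral monoisotopic mass of two peptides joined by one linker."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) + LINKER_MASS[linker]


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ precursor ion."""
    return (neutral_mass + charge * PROTON) / charge
```

For example, `precursor_mz(crosslink_mass("LKR", "AKE", "DSS/BS3"), 3)` gives the m/z at which a triply charged DSS-linked pair of these two (hypothetical) peptides would be expected; a filter over the spectra can then compare observed precursors against such values.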
input PDBs, recognize the chains, retrieve the sequences and combine the two PDBs into a starting conformational model. XLgenerator provides a complete list of all theoretical XLs without applying a distance cutoff. This list is then passed to Taxlink for MS/MS data analysis. If the input file is not already in Mascot Generic Format (MGF), msconvert from ProteoWizard (Kessner et al., 2008) converts the input mzML file to MGF. The MGF file is then filtered against the XL list from the previous step, so that only spectra containing a monoisotopic mass-to-charge ratio (m/z) of interest are kept in the filtered file. For each XL, a set of fragment ions is generated, and the filtered spectra are searched for this fragment pattern. In the modeling core, the XLs selected by Taxlink are used to score a set of docking models generated by Megadock v4.0 (Ohue et al., 2014) (2000 models in all runs), and the top-scoring models are retained. Finally, the model that supports the largest number of XLs is selected for visualization in the output.
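The final selection step, choosing the docking model that supports the most XLs, can be sketched as below. This is an illustration under stated assumptions rather than the Cheetah-MS scoring function: each model is reduced to a hypothetical mapping from (chain, residue number) to a Cα coordinate, an XL counts as supported when its Cα–Cα distance is within a cutoff, and the 30 Å default is an assumed placeholder for the user-set modeling cutoff.

```python
import math
from typing import Dict, List, Tuple

Residue = Tuple[str, int]           # hypothetical key: (chain ID, residue number)
Coord = Tuple[float, float, float]  # C-alpha coordinate in angstroms
Crosslink = Tuple[Residue, Residue]


def satisfied_xls(model: Dict[Residue, Coord],
                  xls: List[Crosslink],
                  max_dist: float) -> int:
    """Count XLs whose C-alpha/C-alpha distance is at most max_dist."""
    count = 0
    for res_a, res_b in xls:
        # Skip XLs that refer to residues missing from this model.
        if res_a in model and res_b in model:
            if math.dist(model[res_a], model[res_b]) <= max_dist:
                count += 1
    return count


def rank_models(models: Dict[str, Dict[Residue, Coord]],
                xls: List[Crosslink],
                max_dist: float = 30.0) -> List[Tuple[str, int]]:
    """Return (model name, supported XL count) pairs, best model first.

    max_dist = 30.0 is an assumed default, not a Cheetah-MS setting.
    Ties keep the input order because Python's sort is stable.
    """
    scores = [(name, satisfied_xls(coords, xls, max_dist))
              for name, coords in models.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Counting satisfied restraints is the simplest criterion matching "supports the largest number of XLs"; a real pipeline may combine it with the docking score or use side-chain distances instead of Cα.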
To run Cheetah-MS, users provide two PDB files and one MS/MS data file (mzML, or MGF if already converted) containing the XL-MS data. Advanced options include the cross-linking agent, the PTM(s) of interest, the number of final models, the cutoff threshold for modeling, the delta windows for precursor and product ion detection, and the intensity threshold used to remove background noise in the MS/MS data analysis.
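To show how the precursor and product ion windows and the intensity threshold interact, here is a minimal sketch of spectrum filtering and fragment matching. The units (parts per million), the `precursor_mz` dictionary key standing in for an MGF `PEPMASS` value, and all function names are assumptions for illustration; the theoretical fragment m/z list is taken as given rather than generated.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if observed m/z lies within tol_ppm parts per million of theoretical."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def select_spectra(spectra: List[dict], target_mz: float,
                   tol_ppm: float) -> List[dict]:
    """Keep spectra whose precursor m/z matches target_mz (precursor window)."""
    return [s for s in spectra
            if within_ppm(s["precursor_mz"], target_mz, tol_ppm)]


def matched_fragments(peaks: List[Peak], fragment_mz: List[float],
                      tol_ppm: float, min_intensity: float) -> List[float]:
    """Theoretical fragments explained by a peak above the noise threshold.

    Peaks below min_intensity are discarded first (background removal);
    each remaining peak is compared with every theoretical fragment using
    the product ion window tol_ppm.
    """
    signal = [mz for mz, intensity in peaks if intensity >= min_intensity]
    return [frag for frag in fragment_mz
            if any(within_ppm(mz, frag, tol_ppm) for mz in signal)]
```

The fraction of theoretical fragments returned by `matched_fragments` is one simple way to judge whether a spectrum supports an XL; raising `min_intensity` trades sensitivity for fewer noise matches.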
After a job is submitted, its status page shows the job identifier at the top and the processing time of each submodule below. When the workflow finishes, the best-scoring model is visualized with the NGL viewer (Rose et al., 2018) together with a data analysis report in a Jupyter Notebook. The report is designed both to let users assess the results quickly and to be downloaded and extended for deeper, often project-specific, analyses.

Results and applicability
Cheetah-MS has been applied in several case studies as the core MS/MS analysis component of the TX-MS approach. Table 1 lists the published studies in which Cheetah-MS was used for MS/MS data analysis. To test the workflow in the web server context, we also reconstructed the interactions of the Streptococcus pyogenes M1 protein with two human plasma proteins, fibrinogen and albumin, using MS/MS samples obtained from recombinant M1 protein and purified human plasma fibrinogen and albumin. This analysis identified 27 XLs between M1 and fibrinogen and 10 XLs between M1 and albumin. The detected XLs and resulting models yield the same binding interfaces as the initial study (details on the web server manual page).