Yaojun Wang, Dongbo Bu, Chuncui Huang, Hui Wang, Jinyu Zhou, Junchuan Dong, Weiyi Pan, Jingwei Zhang, Qi Zhang, Yan Li, Shiwei Sun, Best-first search guided multistage mass spectrometry-based glycan identification, Bioinformatics, Volume 35, Issue 17, September 2019, Pages 2991–2997, https://doi.org/10.1093/bioinformatics/btz056
Abstract
Glycan identification has long been hampered by the complicated branching patterns and numerous isomeric structures of glycans. Multistage mass spectrometry (MSn) is a promising glycan identification technique because it generates multiple levels of fragments of a glycan, which can be explored to deduce the branching pattern of the glycan and to distinguish it from other candidates with identical mass. However, automatic glycan identification remains a challenge, since guiding an MSn instrument to generate spectra still relies mainly on operator expertise.
Here, we propose a novel method, named bestFSA, based on a best-first search algorithm, to guide the process of spectrum production in glycan identification using MSn. BestFSA selects the most appropriate peaks for the next round of experiments and completes the identification in as few experimental rounds as possible. Our analysis of seven representative glycans shows that bestFSA distinguishes the actual glycans correctly and efficiently, suggesting that it could be used in practical glycan identification. The combination of MSn technology with bestFSA should greatly facilitate the automatic identification of glycan branching patterns, with significantly improved identification sensitivity and reduced time and cost of MSn experiments.
Supplementary data are available at Bioinformatics online.
1 Introduction
A large number of technologies have been proposed for glycan identification, among which mass spectrometry is one of the most specific and sensitive techniques without requirement of glycan standards (Reinhold et al., 2013; Smit et al., 2015; Zhang et al., 2016). Similar to peptide identification, the existing MS-based methods explore MS1 or MS2 information for glycan identification (Dwek et al., 1995; Hänsler et al., 1995; Malhotra et al., 1995; Rademacher et al., 1994; Youings et al., 1996).
However, MS1 or MS2 cannot provide sufficient information to elucidate the complicated branching patterns of glycans, which limits the existing approaches to glycan identification (Ashline et al., 2017). There are two reasons. First, identifying highly branched glycans requires more information than identifying linear peptides (Gińdzieńska-Sieśkiewicz et al., 2016; Pekelharing et al., 1988). Second, glycans have far fewer distinct mass components than peptides, so their spectra contain fewer informative points than peptide spectra. In contrast to MS1 and MS2, multiple-stage MS experiments (MSn) can provide various insights into a glycan structure (Ashline et al., 2005, 2007, 2017; Reinhold et al., 2013; Sun et al., 2018; Zaia, 2008). Briefly, MSn refers to the concatenation of MS experiments, where a peak is selected from the existing spectra as the precursor ion and fed into the mass spectrometer for another round of fragmentation, and the ions produced are reported in the form of a mass spectrum. MSn cleaves the glycan molecule into much smaller fragments and thus has the potential to reveal more detailed information about the glycan.
As described above, most existing bioinformatics studies of glycan identification focus on the analysis of MS2 data, while tools that use MS3 or higher-stage mass spectra are relatively rare (Goldberg et al., 2005; Hu et al., 2015; Kameyama et al., 2005; Reinhold et al., 2013; Tang et al., 2005; Yu et al., 2013). Given that the MSn technique provides far more detailed information on glycan structures, such tools are needed.
The process of using MSn to identify glycans begins with MS2. Peaks in MS2 can be selected for a new round of experiment to produce new spectra. If more rounds of experiments are needed, any existing peak that has not yet been fragmented can be selected for the next round. This process continues until the sample runs out. A more efficient strategy is to check after each round of experiment and to stop as soon as the spectra acquired so far can single out the actual glycan.
The automatic identification of glycan branching patterns poses great challenges to the MSn technique, both in guiding MSn spectra generation and in identifying the branching pattern. Specifically, the successful application of the MSn technique relies heavily on the order in which spectra are produced, because in real scenarios the glycan sample is usually available only in trace amounts, so additional rounds of experiment risk exhausting the sample before the identification is complete. For the sake of efficiency and labor cost, it is also desirable to complete the identification with the fewest rounds of experiments. Kameyama et al. (2005) reported a strategy that selects peaks for the next round of experiment by comparing the signal intensity profiles of the analyte spectra with a library of observational mass spectra acquired from structurally defined glycans prepared using glycosyl transferases. This strategy relies on a large number of standards to construct the mass spectra database, and it is infeasible to acquire reference mass spectra for all glycans in libraries.
We regard the process of producing MSn spectra as the traversal of a spectra tree. In this tree, each node is a spectrum, and each edge connects a spectrum to its parent, i.e. the spectrum that contains its precursor peak. Selecting a peak to produce a new spectrum means traversing a new node (the new spectrum). The cost of every edge is 1, corresponding to one round of experiment.
Finding a spectra generation process that completes the identification with the fewest rounds of experiments is similar to traversing the spectra tree in the fewest steps to reach a predefined target. The choice of precursor ion at every round of experiment is decisive for the result.
There are two common strategies for peak selection. The first is manual selection, which is highly non-trivial and relies heavily on the expertise of MS operators (Ashline et al., 2005; Lapadula et al., 2005; Zhang et al., 2005). The second is to select peaks by intensity: the peak with the highest intensity is selected first. Based on this strategy, two tree-traversal methods can be used to guide peak selection in MSn. The first is the breadth-first search (BFS) algorithm. Starting from MS2, the peaks in MS2 are fragmented one by one in order of intensity to obtain their MS3 spectra, then all peaks in the MS3 spectra are fragmented to obtain MS4 spectra, and so on until MS5. The second is the depth-first search (DFS) algorithm. It begins with MS2 and selects the highest-intensity peak in MS2 to produce MS3, then selects the highest peak in MS3, and so on. The process continues until MS5 and then backtracks.
Manual selection requires considerable expertise and time, whereas product-ion spectra generated on the basis of intensity may not be structurally informative and may therefore require many rounds of experiment to single out the actual glycan. New strategies that guide the MSn experiment to complete the identification as efficiently as possible are important for the popularization of the MSn technique in glycomics.
Here, we propose to solve the peak selection problem using a best-first search algorithm (bestFSA) by defining a virtual target state and the distance from each peak to this target state. The algorithm, also named GIPS (Glycan Intelligent Peak Selection), was designed to minimize the number of MSn experiment rounds required to single out the actual glycan from the candidates. We evaluated the algorithm on seven glycan standards. The experimental results show that our algorithm correctly distinguished the actual glycans for all standard samples within very few rounds of MSn experiments. This suggests that bestFSA can guide identification experiments very efficiently and greatly facilitate automatic glycan identification by MSn.
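The spectra tree described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `SpectrumNode`, its fields and the helper `rounds_used` are hypothetical, and the product-ion m/z values in the example are invented. The root is taken to be the MS2 spectrum, so every node corresponds to one round of experiment (MS1 not counted).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpectrumNode:
    """One node of the spectra tree: a spectrum plus the peak that produced it."""
    stage: int                     # MS stage, e.g. 2 for MS2
    precursor_mz: Optional[float]  # peak that was fragmented to obtain this spectrum
    peaks: List[float]             # m/z values observed in this spectrum
    children: List["SpectrumNode"] = field(default_factory=list)

    def fragment(self, precursor_mz: float, product_peaks: List[float]) -> "SpectrumNode":
        """Attach the spectrum obtained by fragmenting one peak of this node."""
        if precursor_mz not in self.peaks:
            raise ValueError("precursor must be a peak of the parent spectrum")
        child = SpectrumNode(self.stage + 1, precursor_mz, product_peaks)
        self.children.append(child)
        return child


def rounds_used(root: SpectrumNode) -> int:
    """Each node (the MS2 root included) costs one round of experiment."""
    return 1 + sum(rounds_used(child) for child in root.children)


# Invented example: MS2 -> MS3 -> MS4, i.e. three rounds of experiment.
ms2 = SpectrumNode(stage=2, precursor_mz=1783.0, peaks=[1506.0, 1084.0])
ms3 = ms2.fragment(1084.0, [866.0, 648.0])   # hypothetical product ions
ms4 = ms3.fragment(866.0, [430.0])           # hypothetical product ions
total_rounds = rounds_used(ms2)
```

Counting nodes rather than edges below MS2 keeps the tally consistent with the convention that MS1 does not count as a round.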
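The difference between the two intensity-based orders can be made concrete on a toy fragmentation tree. The sketch below is an assumption-laden illustration: the peak names, intensities and the `TOY` table are invented, and within one MS level the BFS variant shown here keeps the order of the parent peaks, which is one possible reading of the description above.

```python
from collections import deque

# Toy fragmentation tree: peak name -> (intensity, product peaks).
# All names and intensities are invented for illustration.
TOY = {
    "a": (90, ["a1", "a2"]),
    "b": (60, ["b1"]),
    "a1": (70, []),
    "a2": (40, []),
    "b1": (80, []),
}
MS2_PEAKS = ["a", "b"]


def by_intensity(peaks):
    """Sort peaks from highest to lowest intensity."""
    return sorted(peaks, key=lambda p: TOY[p][0], reverse=True)


def bfs_order(ms2_peaks):
    """Fragment every peak of one MS level before moving to the next level."""
    order, queue = [], deque(by_intensity(ms2_peaks))
    while queue:
        peak = queue.popleft()
        order.append(peak)
        queue.extend(by_intensity(TOY[peak][1]))
    return order


def dfs_order(ms2_peaks):
    """Always follow the most intense product peak first, then backtrack."""
    order, stack = [], list(reversed(by_intensity(ms2_peaks)))
    while stack:
        peak = stack.pop()
        order.append(peak)
        stack.extend(reversed(by_intensity(TOY[peak][1])))
    return order


bfs = bfs_order(MS2_PEAKS)
dfs = dfs_order(MS2_PEAKS)
```

Neither order looks at what a peak could reveal about the candidates, which is the gap that a distance-guided selection is meant to close.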
2 Materials and methods
2.1 Scoring function based on spectrum-tree
Calculating the probability that each candidate glycan is the actual one from the MSn spectra poses the following challenges:
how to integrate the information from all experimental spectra of a sample to assign the actual glycan;
how to transform the information to an appropriate form such that we can set a universal termination threshold for reliable identification.
The scores are thus normalized into real values that sum up to 1, making it possible to set a universal termination threshold for reliable identification. Although a termination threshold of 0.50 can be sufficient, 0.70 is used to allow robust and reliable identification in our work. The score function has another advantage in its additivity property, i.e. when we get a new spectrum , can be calculated as , which will greatly simplify the calculation process.
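The normalization and the termination test can be sketched as follows. The raw scores are invented, and the functions `normalize` and `confident_hit` are hypothetical helpers; the scoring function itself is not reproduced here, only the step that turns non-negative scores into values summing to 1 and compares the best one with the threshold of 0.70.

```python
def normalize(scores):
    """Scale non-negative raw scores so that they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: score / total for name, score in scores.items()}


def confident_hit(probabilities, threshold=0.70):
    """Return the best candidate if its probability exceeds the threshold, else None."""
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > threshold else None


# Invented raw scores for three candidates.
raw = {"G1": 6.0, "G2": 1.0, "G3": 1.0}
probs = normalize(raw)
hit = confident_hit(probs)

# Twelve equally scored candidates: no candidate is confident yet.
uniform = normalize({f"G{i}": 1.0 for i in range(1, 13)})
no_hit = confident_hit(uniform)
```

Because the normalized values always sum to 1, the same threshold can be applied whatever the number of candidates.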
2.2 BestFSA
The process of the MSn experiment is like a tree-traversal problem. During the experiment, the spectrum-tree can be expanded further by any of the peaks in existing spectra. Each peak can be seen as a next feasible step of tree-traversal, and traversal of a peak means the peak is selected to execute a new round of experiment and get a new spectrum.
The desired spectrum-tree should be able to single out the actual glycan while having the fewest nodes. Best-first search for tree traversal is well suited to this problem (Dechter and Pearl, 1985).
Best-first search is a search strategy which explores a graph by gradually expanding the most promising node chosen according to how close the end of a path is to a solution. The path which is judged to be closest to a solution is extended first.
In order to apply best-first-search algorithms to this problem, we need to first resolve two issues:
2.2.1 Target state
In most applications of best-first search, there is a target node. In this problem, however, our aim is not to reach a certain spectrum but to obtain the smallest spectrum-tree that can distinguish the actual glycan from the other candidates. We take the optimal outcome as the target state: its probability vector is a one-hot vector, in which one value is 1 and the others are 0. The entropy of this state is 0. Note that we cannot always reach this target state, but we can use it to guide the search.
2.2.2 Distance
Defining the distance poses two difficulties. First, an effective score is needed to measure the distance of a peak to the target state using its experimental product-ion spectrum. In this work, the distance measures how much extra information is still needed from subsequent experiments to reach the target state after the peak’s spectrum has been added to the spectrum-tree. After updating the candidates’ probabilities according to the existing experimental spectrum-tree, we use the entropy of the probability vector to measure how much information still has to be collected. Intuitively, the more uniformly the probability is distributed, the more information is still needed in upcoming rounds of experiments. A peak is given a short distance if its product-ion spectrum has the potential to enlarge the differences among the candidates’ probabilities. In other words, a short distance implies that the fragment ions that label this peak differ significantly among the candidates.
Second, in a real scenario the experimental spectrum of a peak is not yet available when its distance to the target state needs to be measured. Thus, a statistical estimation method is used to estimate the distance of each existing peak by simulating all of its possible product-ion spectra. The generation of possible spectra is described in detail in Section 2.2.3. The distance of every possible mass spectrum is calculated as the entropy of the updated probability vector after virtually adding the possible spectrum to the existing spectrum-tree. The mean distance over all possible spectra is then used as the expected distance of the peak to the target state.
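The entropy used as a distance can be illustrated with a few lines of code. The logarithm base is not stated in the text, so base 2 (bits) is an assumption here, and the function name `entropy` is only illustrative.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


target_state = entropy([1.0, 0.0, 0.0])   # one-hot vector: nothing left to learn
uniform_12 = entropy([1 / 12] * 12)       # twelve equally likely candidates
skewed = entropy([0.75, 0.25])
even = entropy([0.5, 0.5])
```

The one-hot target state has entropy 0, a uniform vector over twelve candidates has the largest possible entropy for twelve outcomes, and a skewed vector lies closer to the target than an even one.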
The details of the distance computation are described in Algorithm 1.
Input: the existing spectrum-tree Tree0 and the candidate set C = {G1, ..., Gn}
for each peak pj in the existing spectra do
    Set distance0 = 0
    Initialize an empty set A
    for every candidate Gk in the candidate set C do
        Label pj with Gk and add all ions of Gk that can explain pj to set A
    end for
    Initialize an empty set T
    for every ion Im in A do
        Enumerate all possible combinations of the theoretical peaks of Im, each as a theoretical spectrum, and add them to set T
    end for
    for every possible spectrum tl in T do
        Add tl to Tree0 separately and update the probability vector of all candidates
        Compute the entropy of the resulting vector and add it to distance0
    end for
    Set the expected distance of pj to distance0 / |T|
end for
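The distance estimation can be sketched in Python for a single peak. Several simplifications are assumptions of this sketch and not part of the method as described: each candidate contributes one explaining ion, given directly as a set of theoretical product peaks; the probability update is a stand-in with invented `match` and `miss` factors, not the scoring function of Section 2.1; and entropy is taken in bits.

```python
import math
from itertools import combinations


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def update(prior, theoretical, spectrum, match=0.9, miss=0.1):
    """Stand-in update (hypothetical): reward candidates that explain each peak."""
    weights = {}
    for cand, p in prior.items():
        w = p
        for peak in spectrum:
            w *= match if peak in theoretical[cand] else miss
        weights[cand] = w
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def possible_spectra(theoretical):
    """Set T: every subset of every candidate's theoretical product peaks."""
    spectra = set()
    for peaks in theoretical.values():
        ordered = sorted(peaks)
        for size in range(len(ordered) + 1):
            spectra.update(combinations(ordered, size))
    return spectra


def expected_distance(prior, theoretical):
    """Mean entropy of the updated probabilities over all simulated spectra."""
    spectra = possible_spectra(theoretical)
    total = sum(entropy(update(prior, theoretical, s).values()) for s in spectra)
    return total / len(spectra)


prior = {"G1": 0.5, "G2": 0.5}
# A peak whose fragments look the same for both candidates cannot separate them.
uninformative = expected_distance(prior, {"G1": {100, 200}, "G2": {100, 200}})
# A peak whose fragments differ between candidates has a shorter expected distance.
informative = expected_distance(prior, {"G1": {100, 200}, "G2": {100, 300}})
```

In this toy case the uninformative peak keeps the full one bit of uncertainty, while the informative peak is expected to reduce it, so a best-first selection would prefer the latter.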
2.2.3 Enumeration
When an appropriate peak is to be selected as the precursor ion, its experimental spectrum is not yet available; thus, the distance of this peak to the target state cannot be computed directly, and the distance must be estimated by considering all possible spectra of the peak. The first step is therefore to construct, by computer simulation, a set of possible spectra for the peak to be evaluated, pj. The word ‘possible’ has two meanings. First, for every candidate glycan, the fragment ion (or ions) that can explain the peak pj are listed, and for each of these possible fragments we simulate its fragmentation and list all of its peaks. Second, since each peak may or may not appear in the final product-ion spectrum, we enumerate all combinations of these possible peaks, which yields a large set of simulated spectra. If there are n possible peaks, there are 2^n possible spectra. This suggests a combinatorial explosion as n increases, but in a real implementation most of the peaks are shared by different fragments, and according to formula (1) such peaks have no effect on the probabilities of the candidates. Other peaks are shared by the same group of fragments and therefore have the same effect on the probabilities of the candidates. Taking all this into consideration, the number of simulated spectra can be reduced greatly. The simulated spectra from all candidates are merged to form the complete set of possible spectra of the peak to be evaluated.
Having defined the target state and the distance, we developed a new method, named bestFSA, based on best-first search to guide the MSn experiment process.
The details of the algorithm are described in Algorithm 2:
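The size of the enumeration, and the effect of discarding peaks that every fragment shares, can be illustrated as follows. The fragment names and m/z values are invented, and the helper `informative_peaks` covers only the first reduction mentioned above (peaks shared by all fragments); grouping peaks that have the same effect is not attempted in this sketch.

```python
from itertools import chain, combinations


def all_subsets(peaks):
    """Every present/absent combination of the given peaks (2^n subsets)."""
    ordered = sorted(peaks)
    return list(chain.from_iterable(
        combinations(ordered, size) for size in range(len(ordered) + 1)
    ))


def informative_peaks(fragment_peaks):
    """Drop peaks predicted by every fragment; they cannot separate candidates."""
    shared = set.intersection(*fragment_peaks.values())
    union = set.union(*fragment_peaks.values())
    return union - shared


# Invented theoretical peaks of two fragments that could explain the same precursor.
fragment_peaks = {
    "f1": {100, 200, 300, 400},
    "f2": {100, 200, 300, 500},
}
every_peak = set.union(*fragment_peaks.values())
naive = all_subsets(every_peak)                           # 5 peaks
reduced = all_subsets(informative_peaks(fragment_peaks))  # 2 peaks
```

With five possible peaks the naive enumeration has 32 spectra, whereas only the two peaks that differ between the fragments need to be varied, leaving four.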
Initialize the mass spectrum-tree with only one root node that corresponds to the input MS2 spectrum
Initialize the active ion set A with the peaks of the MS2 spectrum
while A is not empty do
    For each peak pj in A, calculate its expected distance to the target state
    Extract the peak pmin with the minimal expected distance
    Inject the ion pmin into the mass spectrometer to acquire its experimental spectrum, denoted as Snew
    Add Snew as a child node of the spectrum that pmin belongs to
    if the highest probability exceeds the predefined threshold then
        return the glycan with the highest likelihood in C
    else
        Delete pmin from A
        Add the peaks in Snew into A
    end if
end while
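The loop of Algorithm 2 can be sketched with the instrument and the scoring passed in as callables. Everything named here is hypothetical: `best_first_identify`, its parameters, and the toy tables that stand in for the distance estimate, the mass spectrometer and the probability update. The toy numbers are chosen only so that the run ends after three rounds.

```python
def best_first_identify(ms2_peaks, prior, expected_distance, acquire, update,
                        threshold=0.70):
    """Best-first peak selection; returns (glycan, probability, rounds used)."""
    probs = dict(prior)
    active = list(ms2_peaks)       # active set A, seeded with the MS2 peaks
    rounds = 1                     # the MS2 round itself
    while active:
        # Distances are re-estimated every round because probabilities change.
        p_min = min(active, key=lambda peak: expected_distance(peak, probs))
        spectrum = acquire(p_min)  # one more round on the instrument
        rounds += 1
        probs = update(probs, p_min, spectrum)
        best = max(probs, key=probs.get)
        if probs[best] > threshold:
            return best, probs[best], rounds
        active.remove(p_min)
        active.extend(spectrum)
    return None, max(probs.values()), rounds


# Toy stand-ins (all values invented).
TOY_DISTANCE = {"p1": 0.97, "p2": 1.40, "p3": 0.90, "p4": 1.10}
TOY_PRODUCTS = {"p1": ["p3", "p4"], "p2": [], "p3": [], "p4": []}
TOY_POSTERIOR = {"p1": {"G1": 0.55, "G2": 0.45}, "p3": {"G1": 0.75, "G2": 0.25}}

result = best_first_identify(
    ms2_peaks=["p1", "p2"],
    prior={"G1": 0.5, "G2": 0.5},
    expected_distance=lambda peak, probs: TOY_DISTANCE[peak],
    acquire=lambda peak: TOY_PRODUCTS[peak],
    update=lambda probs, peak, spectrum: TOY_POSTERIOR.get(peak, probs),
)
```

A plain `min` over the active set is used instead of a priority queue because every distance has to be recomputed after each update, so cached priorities would go stale.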
3 Results
3.1 Reference glycan database
There has been a considerable increase in the number of glycan databases since 2000, e.g. KEGG GLYCAN (Hashimoto et al., 2006), GlycomeDB (Ranzinger et al., 2011) and EUROCarbDB (Al Jadda et al., 2015). We selected the most widely used and well-documented one, CarbBank (Doubet et al., 1989) (also known as CCSD), developed by the Complex Carbohydrate Research Center, University of Georgia (Athens), which consists of 7837 glycan structures.
3.2 Materials and reagents
To demonstrate the feasibility of this strategy, seven glycans were purchased from Elicityl (Crolles, France), as shown in Table 1. Permethylation and purification were performed as previously reported (Schiel et al., 2013). Briefly, methyl iodide was added to 2 nmol standard glycans released in the presence of a slurry mixture of dimethyl sulfoxide/sodium hydroxide (DMSO/NaOH), and the sample was then agitated on an automatic shaker at room temperature for 20 min. Next, the reaction was quenched by adding water, and the permethylated glycans were extracted twice with chloroform and then washed four times with water. The chloroform layer was dried in a centrifugal vacuum concentrator. Finally, the glycans were purified on a Sep-Pak C18 cartridge and then dried.
Man-5D1 | Man-6 | Man-7D3 | Bi-AntiA2 | Hybrid-Octa | NGA3 | NGA4 |
---|---|---|---|---|---|---|
3.3 Acquisition of MSn spectra
Permethylated glycan standards were analyzed on an Axima MALDI Resonance Mass Spectrometer with a QIT-TOF configuration (Shimadzu). A nitrogen laser was used to irradiate samples at 337 nm, with an average of 200 shots accumulated. Permethylated glycan standards dissolved in methanol were applied to a μfocus MALDI plate target (900 μm, 384 circles, HST). A matrix solution (0.5 μL) of 2,5-dihydroxybenzoic acid (20 mg/mL) in a mixture of methanol/water (1:1) containing 0.1% trifluoroacetic acid and 1 mM NaCl was added to the plate and mixed with the samples. The mixture was air dried at room temperature before analysis. Among the four different resolution settings (FWHM 70, 250, 500 and 1000) for precursor isolation, the window at FWHM 500 with a width of 3–5 mass units was considered appropriate and used for the present study.
The product-ion spectrum acquired at each stage was introduced into our program as an mzXML file (Shimadzu) for peak evaluation, using a signal-to-noise ratio of 3:1 as the filtering parameter. Candidate glycans were extracted from CarbBank. The probabilities and distances were calculated, and the results were manually fed back to the mass spectrometer data system. The node (peak) with the shortest distance to the target state was selected (traversed) to execute the next round of experiment, until the probability of one candidate glycan exceeded the predefined threshold of 0.70.
For MSn using the depth-first-search and breadth-first-search, peaks were selected in the order of intensity.
3.4 Outlines of the strategy
In order to test the performance of our algorithm, we performed identification for seven standard glycan samples, including high-mannose-type glycans, complex-type glycans and hybrid glycans.
First, we used Man6 as a concrete example to explain the running process of the bestFSA approach. As shown in Figure 1, the identification process of Man6 contains the following three steps:
The MS1 spectrum of the Man6 sample exhibited a significant peak with a mass of 1783, and a total of 12 candidate glycans with the same mass (the actual glycan Man6 and 11 false-positive candidates) were extracted from the CarbBank glycan database. Initially, all 12 candidate glycans were assigned an identical probability of 1/12, since we had no prior knowledge regarding the actual glycan.
Based on MS2, we updated the probabilities of these candidate glycans. None of the probabilities reached the predefined threshold of 0.7, and thus MS3 was required. Unlike MS2, which uses the only peak MNa+ in the MS1 spectrum as the precursor, MS3 has multiple peaks to choose from, since multiple peaks were present in the MS2 spectrum (Fig. 1).
At this stage, BFS and DFS selected the highest peak in the MS2 spectrum, mass 1506, as the precursor ion for the next round of experiment. Instead, bestFSA selected the peak with the shortest distance. Distance is a calculated measure of a peak’s ability to produce a product-ion spectrum containing distinctive structural information that can be used to differentiate the glycan from other isomers. The smaller the value, the more distinctive structural information the peak can produce. The peak with mass 1084 (distance = 0.97) had the shortest expected distance to the target state and was selected as the precursor ion to yield the MS3 spectrum.
For BFS and DFS, the probabilities of all candidates were updated based on the existing spectrum-tree, MS1, MS2 and MS3 (precursor mass = 1506), and there was still no candidate whose probability exceeded 0.7. BFS then selected the second-highest peak in MS2, and DFS selected the highest peak in MS3, to execute the next round of experiment.
For bestFSA, the probabilities of all candidates were updated again based on the existing spectrum-tree, MS1, MS2 and MS3 (precursor mass = 1084), and there was still no candidate whose probability exceeded 0.7. Thus bestFSA proceeded to calculate the distance to the target state of each peak in the MS2 and MS3 spectra except 1084 in MS2, and selected the peak in MS3 with the shortest distance (0.90) to yield the MS4 spectrum.
Based on the updated spectrum-tree (MS1, MS2, MS3 and MS4), the probability of G1 increased to 0.75, which indicates a sufficiently confident identification. The experiment stopped and reported G1 as the identification result. G1 is in fact Man6, the actual glycan of the sample, so the identification was successful.
Using bestFSA, we needed three rounds of experiments (not including MS1) to pick the actual glycan out.
3.5 Uniqueness of the bestFSA
To demonstrate the feasibility of this strategy, we compared bestFSA with DFS and BFS using seven glycans as representatives. Table 2 summarizes the results of these experiments.
For Hybrid-Octa and NGA3, all three algorithms selected the same peaks in MS3, so they all needed two rounds of experiments. For Bi-AntiA2, MS2 was enough for the identification and peak selection was not needed.
For high-mannose-type glycan samples and complex-type glycan samples, bestFSA successfully identified the Man-5D1, Man-6, Man-7D3, NGA3, NGA4 and Bi-AntiA2 samples with high confidence using at most three rounds of experiments. DFS could identify all seven glycans too. BFS identified six but failed at Man-6. However, both of them used more rounds of experiments.
For Hybrid-Octa (mass: 1824), best-first search correctly reported the actual glycan with a high confidence of 0.82 using two rounds of experiments (MS2 and MS3); in contrast, BFS and DFS incorrectly reported another glycan as the prediction.
Name | Num. of cand. | BFS Prob. | BFS Rds | DFS Prob. | DFS Rds | bestFSA Prob. | bestFSA Rds |
---|---|---|---|---|---|---|---|
Man-5D1 | 12 | 0.75 | 4 | 0.75 | 10 | 0.75 | 2 |
Man-6 | 12 | 0.10 | 6 | 0.89 | 9 | 0.75 | 3 |
Man-7D3 | 9 | 0.93 | 5 | 0.73 | 14 | 0.93 | 3 |
Bi-AntiA2 | 2 | 0.90 | 1 | 0.90 | 1 | 0.90 | 1 |
Hybrid-Octa | 9 | 0.10 | 2 | 0.10 | 2 | 0.82 | 2 |
NGA3 | 7 | 0.89 | 2 | 0.89 | 2 | 0.92 | 2 |
NGA4 | 5 | 0.95 | 3 | 0.72 | 4 | 0.73 | 2 |
Note: The bestFSA successfully identified all seven glycan standards within a maximum of three rounds (shown in bold) of MSn experiments (MS1 did not count). DFS and BFS generally required more rounds of MSn than bestFSA.
These results clearly demonstrate the advantage of the proposed bestFSA over BFS and DFS in terms of experimental efficiency, time saving and reduced sample consumption.
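As a quick check on the comparison, the figures in Table 2 can be tallied directly. The sketch below only sums the reported rounds and counts, per method, how many final probabilities lie above the 0.70 threshold; the variable names are illustrative, and a probability above the threshold does not by itself say whether the reported glycan was the correct one.

```python
# (probability, rounds) per method, copied from Table 2.
TABLE2 = {
    "Man-5D1":     {"BFS": (0.75, 4), "DFS": (0.75, 10), "bestFSA": (0.75, 2)},
    "Man-6":       {"BFS": (0.10, 6), "DFS": (0.89, 9),  "bestFSA": (0.75, 3)},
    "Man-7D3":     {"BFS": (0.93, 5), "DFS": (0.73, 14), "bestFSA": (0.93, 3)},
    "Bi-AntiA2":   {"BFS": (0.90, 1), "DFS": (0.90, 1),  "bestFSA": (0.90, 1)},
    "Hybrid-Octa": {"BFS": (0.10, 2), "DFS": (0.10, 2),  "bestFSA": (0.82, 2)},
    "NGA3":        {"BFS": (0.89, 2), "DFS": (0.89, 2),  "bestFSA": (0.92, 2)},
    "NGA4":        {"BFS": (0.95, 3), "DFS": (0.72, 4),  "bestFSA": (0.73, 2)},
}
METHODS = ["BFS", "DFS", "bestFSA"]
THRESHOLD = 0.70

# Total rounds of experiment over the seven standards.
total_rounds = {
    m: sum(row[m][1] for row in TABLE2.values()) for m in METHODS
}
# Number of standards whose final probability exceeds the threshold.
above_threshold = {
    m: sum(1 for row in TABLE2.values() if row[m][0] > THRESHOLD) for m in METHODS
}
```

Summed over the seven standards this gives 23 rounds for BFS, 42 for DFS and 15 for bestFSA, with 5, 6 and 7 final probabilities above the threshold, respectively.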
We also examined the reproducibility of the results and possible factors which may affect the bestFSA results. The complete identification procedure, including the acquisition of MSn spectra guided by bestFSA, for seven samples was repeated at least twice (four or five times for selected samples). The results were reproducible. For the present work, a mass tolerance was set at 0.5 although a value between 0.5 and 1.0 does not affect the results, and a medium/standard (fwhm 250) mass resolution was used for optimum sensitivity although other resolution settings gave similar results. The collision energy was between 100 and 400 mV, and within this range the relative intensities of fragment ions varied, but the identification results remained largely the same (Supplementary Table S3–S5).
4 Conclusion and discussion
Glycan branching patterns, although important, cannot be easily elucidated by most of the existing glycan profiling methods. We developed a strategy based on bestFSA to provide an automatic, sensitive and rapid way to identify glycan branching patterns. BestFSA reduces the expertise required to perform MS experiments by recommending to MS operators the peaks that are most likely to single out the actual glycan.
For the glycan samples used in this study, the branching patterns were successfully identified using 10–50 ng of sample within 5–10 min, using the information of at most 4 MSn spectra (no more than three rounds of MSn experiments). Compared with breadth-first search and depth-first search, bestFSA significantly shortened the analysis time and reduced sample consumption by more than 1–2 orders of magnitude.
The approach is better suited to static MALDI-MS because of its iterative nature. Calculation must be performed before the next-stage acquisition, and repeated scanning of existing experimental spectra is required, which makes the approach incompatible with dynamic Liquid Chromatography Mass Spectrometry (LC-MS).
It is also important to recognize another limitation of our approach, namely in the identification of the linkage information of glycans. The underlying reason is that all of our experiments were performed using MALDI mass spectrometry in the positive-ion mode. In this mode, the major ions generated are of the B, C, Y and Z types, which cannot provide the linkage information between monosaccharides. To distinguish linkages, A and X ions, which carry the information of cross-ring fragments, are necessary. Therefore, a possible way to identify the linkage sites is to use alternative instrument settings that generate A and X ions, for example the negative-ion mode or an ETD/ECD fragmentation instrument. By incorporating this additional fragment ion information, bestFSA may have the potential to deduce the linkage sites of glycans.
It should also be pointed out that bestFSA is a database search approach, and thus the identification results are limited to the glycans recorded in the database used in the study. Extending the database to include more glycans will broaden the application range of our approach. In principle, a de novo approach has the potential to identify glycans not recorded in a database, but the existing algorithms are slow and prone to error.
Overall, our bestFSA can greatly facilitate the automatic identification of glycan branching patterns, and the basic idea can be extended to the identification of other important molecules, such as lipids and metabolic molecules.
Funding
This work has been supported by the National Key Research and Development Program of China (2018YFC0910405), the National Natural Science Foundation of China (31671369, 31600650, 31770775) and International Partnership Program of Chinese Academy of Sciences (No. 153311KYSB20150012).
Author contributions
SS and DB conceived the study and designed the computational model, YL and CH designed the mass spectrometry methodology. CH and JZ established the MALDI-MSn and glycan structural analysis procedure and performed and analyzed the mass spectral data. YW, HW, JD, WP, QZ and JZ implemented the bestFSA approach. All authors discussed the results and commented on the manuscript.
Conflict of Interest: none declared.
References