Marc Kirchner, Bernhard Y. Renard, Ullrich Köthe, Darryl J. Pappin, Fred A. Hamprecht, Hanno Steen, Judith A. J. Steen, Computational protein profile similarity screening for quantitative mass spectrometry experiments, Bioinformatics, Volume 26, Issue 1, January 2010, Pages 77–83, https://doi.org/10.1093/bioinformatics/btp607
Abstract
Motivation: The qualitative and quantitative characterization of protein abundance profiles over a series of time points or a set of environmental conditions is becoming increasingly important. Using isobaric mass tagging experiments, mass spectrometry-based quantitative proteomics delivers accurate peptide abundance profiles for relative quantitation. Associated data analysis workflows need to provide tailored statistical treatment that (i) takes the correlation structure of the normalized peptide abundance profiles into account and (ii) allows inference of protein-level similarity. We introduce a suitable distance measure for relative abundance profiles, derive a statistical test for equality and propose a protein-level representation of peptide-level measurements. This yields a workflow that delivers a similarity ranking of protein abundance profiles with respect to a defined reference. All procedures operate on the true correlation structure that underlies the measurements. This optimizes statistical power and delivers more intuitive and efficient results than existing methods that do not take this structure into account.
Results: We use protein profile similarity screening to identify candidate proteins whose abundances are post-transcriptionally controlled by the Anaphase Promoting Complex/Cyclosome (APC/C), a specific E3 ubiquitin ligase that is a master regulator of the cell cycle. Results are compared with an established protein correlation profiling method. The proposed procedure yields a 50.9-fold enrichment of co-regulated protein candidates and a 2.5-fold improvement over the previous method.
Availability: A MATLAB toolbox is available from http://hci.iwr.uni-heidelberg.de/mip/proteomics.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
Current global quantitative proteomics experiments provide time-resolved insight into the dynamic behavior of cellular processes at the protein level and reflect the immediate status of a cell more directly than, e.g., transcriptional studies, which cannot capture post-transcriptional regulation. In this context, the quantitative and qualitative characterization of protein expression-level profiles over a series of time points or a set of environmental conditions is becoming increasingly important. Quantitative mass spectrometry (MS) is the method of choice to directly identify, quantitate and characterize hundreds or thousands of proteins simultaneously, delivering accurate peptide abundance profiles that yield relative quantitative information (Bantscheff et al., 2007; Ong and Mann, 2005).
However, given the large numbers of proteins in these studies, biochemical validation of every protein observed in such experiments is not feasible. It is hence desirable to develop computational screening procedures that can rank proteins based on their similarity to the abundance profile of a reference protein over a time course or a set of conditions. Although observing similar protein abundance profiles cannot prove specific biochemical properties, the associated ranking can yield a valuable enrichment of protein groups associated with the same or similar cellular processes and provide a criterion for the prioritization of biological validation experiments, i.e. a testable shortlist of candidate proteins (Andersen et al., 2003; Foster et al., 2006).
Quantitative MS methods provide direct information about abundance levels of endogenous proteins (Bantscheff et al., 2007; Ong and Mann, 2005). Quantitative MS is thus a method of choice for the comprehensive differential analysis of protein abundance profiles, which vary with time and/or experimental conditions (Bürckstümmer et al., 2006; Fields and Song, 1989; Puig et al., 2001; Rigaut et al., 1999; Ross et al., 2004; Selbach and Mann, 2006; Tedford et al., 2008; Thompson et al., 2003; White, 2008). Multiplexed isobaric mass tagging (IMT) approaches or multiplexed metabolic labeling allow for time-resolved protein abundance measurements for thousands of proteins simultaneously, overcoming the need for tedious individual protein testing. Recent computational analyses (Hill et al., 2008; Oberg et al., 2008) have provided means of statistical evaluation of differential abundances between IMT labels but have not focused on the statistical concepts necessary to compare peptide and protein profiles. Protein correlation profiling (PCP) is a heuristic protein profile screening approach that has been developed in the context of tracking interacting proteins over fractions of a sucrose gradient. It has successfully been used for large-scale proteomic characterization of the various human organelles (Andersen et al., 2003; Foster et al., 2006). Here, we use PCP as a de facto standard for performance comparison.
Our study introduces a protein profile similarity screening (PSS) procedure that utilizes abundance profiles from quantitative proteomics experiments using the IMT strategy. We investigate the statistical consequences of data normalization, which, if not accounted for, can jeopardize standard testing procedures. We establish the connection between IMT series and the analysis of compositional data (Aitchison, 1982, 1983, 1994) and introduce a novel approach to propagate quantitative profile information obtained from peptide measurements to the protein level. The proposed procedure creates a similarity-ranked shortlist of proteins in an automated and user-independent manner. Such shortlists are easily tested by biochemical assays and circumvent the laborious screening procedures that are currently used. The proposed method is evaluated on a biologically relevant example: we attempt to identify substrates of the E3 ubiquitin ligase Anaphase Promoting Complex/Cyclosome (APC/C), a master regulator of the cell cycle during the M and G1 phases (Peters, 2006). We are interested in proteins that are degraded during mitosis or G1 phase, using quantitative proteomics data from an IMT-based experiment measuring the relative protein abundance at four different points during the cell cycle specifically chosen to profile the activity of the APC/C. Conventional assignment of a particular substrate to its unique E3 ubiquitin ligase is a laborious task involving the screening of hundreds or thousands of cloned and expressed proteins in biochemical assays. Consequently, computational screening procedures that help to prioritize among the candidates can substantially reduce the biochemical effort.
Section 2 of the article provides all methodological details, and the proposed screening procedure is applied to real-world experimental data in Section 3. In Sections 4 and 5, we report and discuss results, suggesting that the proposed approach is indeed powerful: with only a few protein IMT abundance measurements, a set of well-known co-regulated proteins can be identified. Conclusions and perspectives are offered in Section 6.
2 METHODS
2.1 Workflow overview
We propose a novel procedure for the inference of protein abundance profile similarity from IMT analyses of proteomic time series experiments. Given a set of normalized IMT peptide reporter ion profiles (Fig. 1A), we apply a hierarchical clustering method (Fig. 1B) tailored to the statistical dependence structure that results from the normalization. The Dirichlet likelihood ratio test (DLRT) delivers a suitable cluster tree cutoff strategy and yields a data grouping on the peptide level. From there we construct protein signatures, representing the protein-wise peptide distribution over the clusters (Fig. 1C). The Mallows distance then provides a suitable measure for the inference of protein similarity (Fig. 1D). In the final step, proteins are ranked according to their profile similarity to one or more predefined marker proteins (Fig. 1E).

Fig. 1. Data analysis workflow for protein coregulation estimation: (A) IMT measurements yield sum-normalized quantitative peptide reporter ion profiles. (B) The reporter ion profiles are subjected to hierarchical clustering using an appropriate simplicial distance measure. The number of clusters is determined using a DLRT based on the observed peptide reporter ion profile distributions on the n-dimensional simplex. (C) Given the clustering, the quantitative measurements are grouped on the protein level, yielding a peptide cluster distribution for each protein. (D) The protein signatures are used to determine Mallows distances between proteins, taking into account the fact that the underlying clusters differ in their similarity. (E) The resulting distance matrix is subsequently evaluated to yield a shortlist of coregulation candidates.
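Step (B) needs a distance that respects the geometry of the simplex. As an illustration only, the following Python sketch assumes the Aitchison distance (the Euclidean distance between centered log-ratio transforms, a standard metric for compositional data) together with average linkage; both are our choices and not necessarily those of the MATLAB toolbox. The profiles are hypothetical sum-normalized 4-plex measurements, and the function returns the merge history that a DLRT-based cutoff would then cut.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def clr(p: Sequence[float]) -> List[float]:
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in p]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


def aitchison(p: Sequence[float], q: Sequence[float]) -> float:
    """Aitchison distance: Euclidean distance between the clr transforms."""
    return math.dist(clr(p), clr(q))


def average_linkage(profiles: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Naive agglomerative clustering with average linkage.

    Returns merges as (cluster_a, cluster_b, linkage_distance); leaves are
    0..n-1 and each merge creates the next free cluster id (n, n+1, ...).
    """
    n = len(profiles)
    d = {(i, j): aitchison(profiles[i], profiles[j]) for i, j in combinations(range(n), 2)}
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # average of all pairwise leaf distances between the two clusters
            pairs = [d[min(i, j), max(i, j)] for i in clusters[a] for j in clusters[b]]
            link = sum(pairs) / len(pairs)
            if best is None or link < best[2]:
                best = (a, b, link)
        a, b, link = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, link))
        next_id += 1
    return merges


# Hypothetical sum-normalized 4-plex profiles: two U-shaped, two flat.
profiles = [
    [0.40, 0.10, 0.15, 0.35],
    [0.38, 0.12, 0.14, 0.36],
    [0.25, 0.25, 0.26, 0.24],
    [0.24, 0.26, 0.25, 0.25],
]
merges = average_linkage(profiles)  # flat pair merges first, then the U-shaped pair
```

Working in clr coordinates makes the distance invariant to the overall scaling that the normalization removes, which is why it is a common default for compositions.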
2.2 Statistical properties of IMT time-series measurements
2.2.1 Isobaric mass tagging
IMT labels such as TMT and iTRAQ generally consist of three parts: a reactive group that binds to the peptide, a reporter group and a balancer group. Varying combinations of light and heavy isotopes in the reporter and balancer groups yield distinct reporter ion masses (four for the iTRAQ reagents used here) while keeping the overall mass constant (Ross et al., 2004; Thompson et al., 2003). For quantitation experiments, K labels are attached to N peptide species from K experimental conditions. In LC/MS analysis, the differentially tagged species have the same retention time and consequently form a single peptide isotope distribution in the MS parent spectrum. During fragmentation, the reporter/balancer/peptide compound breaks into three parts and yields K absolute reporter ion abundance measurements $\mathbf{x}=(x_1, x_2, \dots, x_K)^T$ for each of the N peptide species. For a given peptide, the vector $\mathbf{x}$ thus holds its reporter ion profile of observed abundances.
2.2.2 Normalization
An absolute reporter ion profile $\mathbf{x}$ is affected by variable interpeptide ionization efficiency (Song et al., 2008; Turck et al., 2007) and depends on the MS/MS sampling mode. Especially for data-dependent acquisition (DDA) schemes, MS/MS sampling depends on the sample complexity, and there is no guarantee that MS/MS quantitation is carried out at the apex of peptide elution. To remove these effects, peptide reporter ion profiles need to be normalized. Commonly applied approaches include reference and sum normalization, i.e. element-wise division by the abundance of a designated reporter ion or by the sum of all abundances, respectively. In both cases, the normalization eliminates 1 degree of freedom and imposes a covariance/dependency structure on the measurements $x_i$ (Supplementary Material). The following presentation studies the mathematically more tractable sum normalization. It yields normalized reporter ion abundance profiles $\mathbf{x}^*=(x^*_1, x^*_2, \dots, x^*_K)^T$, where $x^*_i = x_i / \sum_{j=1}^{K} x_j$. The loss of a degree of freedom is illustrated by the property that the relative intensity of any reporter ion $i$ can be recovered from the remaining normalized reporter ion intensities, i.e. $x^*_i = 1 - \sum_{j \neq i} x^*_j$.
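The following minimal Python sketch illustrates both normalizations and the lost degree of freedom on a hypothetical 4-plex reporter ion profile; the function names and abundance values are ours, for illustration only.

```python
from typing import List, Sequence


def sum_normalize(x: Sequence[float]) -> List[float]:
    """Sum normalization: x*_i = x_i / sum_j x_j."""
    total = sum(x)
    if total <= 0:
        raise ValueError("profile must have a positive total abundance")
    return [v / total for v in x]


def reference_normalize(x: Sequence[float], ref: int) -> List[float]:
    """Reference normalization: divide by the abundance of reporter ion `ref`."""
    if x[ref] <= 0:
        raise ValueError("reference reporter ion abundance must be positive")
    return [v / x[ref] for v in x]


def recover_component(x_star: Sequence[float], i: int) -> float:
    """The lost degree of freedom: x*_i = 1 - sum_{j != i} x*_j."""
    return 1.0 - sum(v for j, v in enumerate(x_star) if j != i)


# Hypothetical absolute 4-plex reporter ion abundances (114, 115, 116, 117).
x = [1200.0, 300.0, 450.0, 1050.0]
x_star = sum_normalize(x)          # [0.4, 0.1, 0.15, 0.35]
x_ref = reference_normalize(x, 0)  # [1.0, 0.25, 0.375, 0.875]
```

Multiplying all four abundances by a common factor, as a change in ionization efficiency or in the sampling point on the elution profile would, leaves x* unchanged, which is exactly the effect the normalization is meant to remove.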
2.3 Clustering peptides on the simplex
2.3.1 Hierarchical clustering on the simplex

2.3.2 Dirichlet likelihood ratio test

For the determination of statistical significance, we derive a likelihood ratio test (Casella and Berger, 2001) for the Dirichlet distribution. Assume we have two sets of observations $\mathcal{X}$ and $\mathcal{Y}$. We test whether the observations of the two groups stem from the same underlying Dirichlet distribution with parameter vector $\boldsymbol{\alpha}_{\mathcal{X}\cup\mathcal{Y}}$. In other words, we evaluate whether the null hypothesis $H_0: \boldsymbol{\alpha}_{\mathcal{X}} = \boldsymbol{\alpha}_{\mathcal{Y}}$ can be rejected in favor of the alternative hypothesis $H_1: \boldsymbol{\alpha}_{\mathcal{X}} \neq \boldsymbol{\alpha}_{\mathcal{Y}}$.
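As a rough, self-contained sketch of a Dirichlet likelihood ratio test, the Python code below fits maximum-likelihood Dirichlet parameters to X, Y and their union with Minka's fixed-point iteration and compares -2 log Λ with a χ² distribution with K degrees of freedom (the K additional parameters of H1), i.e. Wilks' asymptotic approximation. The fitting method, the degrees of freedom and all helper names are assumptions for illustration rather than the authors' implementation, and the simulated Dirichlet parameters are hypothetical.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: shift x above 6 by recurrence, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))

def trigamma(x):
    """psi'(x) for x > 0, computed the same way."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return acc + 1.0 / x + f / 2.0 + (f / x) * (1/6 - f * (1/30 - f * (1/42 - f / 30)))

def inv_digamma(y):
    """Solve psi(x) = y with Newton's method (starting point after Minka)."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(6):
        x -= (digamma(x) - y) / trigamma(x)
    return x

def fit_dirichlet(data, iters=2000, tol=1e-10):
    """Maximum-likelihood Dirichlet parameters via Minka's fixed-point iteration."""
    n, k = len(data), len(data[0])
    mean_log = [sum(math.log(p[j]) for p in data) / n for j in range(k)]
    mean = [sum(p[j] for p in data) / n for j in range(k)]
    m2 = sum(p[0] ** 2 for p in data) / n
    # Moment-based starting point: precision s from the first component's variance.
    s = (mean[0] - m2) / (m2 - mean[0] ** 2) if m2 > mean[0] ** 2 else 1.0
    alpha = [max(s, 1e-3) * m for m in mean]
    for _ in range(iters):
        psi_total = digamma(sum(alpha))
        new = [inv_digamma(psi_total + ml) for ml in mean_log]
        done = max(abs(a - b) for a, b in zip(new, alpha)) < tol
        alpha = new
        if done:
            break
    return alpha

def loglik(data, alpha):
    """Dirichlet log-likelihood of a list of compositions."""
    norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return sum(norm + sum((a - 1.0) * math.log(v) for a, v in zip(alpha, p)) for p in data)

def chi2_sf(x, df):
    """Chi-square upper tail P(X > x) for integer df, via closed forms."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total, term = math.erfc(math.sqrt(h)), math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df + 1) // 2):
        total += math.exp(-h) * term
        term *= h / (i + 0.5)
    return total

def dlrt(xs, ys):
    """Statistic -2 log(Lambda) for H0: alpha_X = alpha_Y and its asymptotic P-value."""
    pooled = list(xs) + list(ys)
    ll0 = loglik(pooled, fit_dirichlet(pooled))
    ll1 = loglik(xs, fit_dirichlet(xs)) + loglik(ys, fit_dirichlet(ys))
    stat = max(0.0, 2.0 * (ll1 - ll0))
    return stat, chi2_sf(stat, len(xs[0]))  # K degrees of freedom (assumed)

def sample_dirichlet(alpha, n, rng):
    """Draw n compositions by normalizing independent gamma variates."""
    draws = []
    for _ in range(n):
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        draws.append([v / sum(g) for v in g])
    return draws

rng = random.Random(7)
u_shaped = sample_dirichlet([8.0, 2.0, 2.0, 8.0], 30, rng)  # hypothetical parameters
flat = sample_dirichlet([5.0, 5.0, 5.0, 5.0], 30, rng)
stat, p = dlrt(u_shaped, flat)  # large statistic, P-value far below 0.01
```

The χ² reference distribution is only asymptotically valid; for very small clusters a permutation test over the pooled profiles would be a more conservative alternative.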






2.3.3 Adaptive thresholding for cluster determination
With the DLRT, a rigorous statistical testing scheme can be used to determine adaptive thresholds in the clustering tree: starting from the root, we conduct a DLRT for each cluster tree node. Given a predefined type-I error rate (alpha level, typically 0.05 or 0.01), we merge all leaves below a node into a single cluster if the P-value assigned to that node is larger than the alpha level. This implicitly determines the number of clusters, and the top-down scheme circumvents potential multiple testing issues intrinsically associated with bottom-up testing procedures (Benjamini and Hochberg, 1995).
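A minimal Python sketch of this top-down cut follows. It assumes a binary cluster tree encoded as nested pairs of peptide indices and a caller-supplied `p_value(left, right)` function standing in for the DLRT on the two child clusters; the encoding, the function names and the toy P-value are hypothetical.

```python
from typing import Callable, List, Tuple, Union

# A binary cluster tree: a leaf is a peptide index, an inner node a (left, right) pair.
Tree = Union[int, Tuple["Tree", "Tree"]]


def leaves(node: Tree) -> List[int]:
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def cut_tree(node: Tree, p_value: Callable[[List[int], List[int]], float],
             alpha: float = 0.01) -> List[List[int]]:
    """Top-down adaptive cut: test a node, starting at the root; if H0 is not
    rejected (P-value > alpha), all leaves below the node form one cluster,
    otherwise descend into both children."""
    if isinstance(node, int):
        return [[node]]
    left, right = node
    if p_value(leaves(left), leaves(right)) > alpha:
        return [leaves(node)]
    return cut_tree(left, p_value, alpha) + cut_tree(right, p_value, alpha)


# Hypothetical stand-in for the DLRT: peptides 0-1 share one profile type, 2-4 another.
label = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B"}


def toy_p_value(xs: List[int], ys: List[int]) -> float:
    return 1.0 if len({label[i] for i in xs + ys}) == 1 else 0.0


clusters = cut_tree(((0, 1), ((2, 3), 4)), toy_p_value)  # [[0, 1], [2, 3, 4]]
```

Because the descent stops at the first node whose test does not reject, nodes deeper in an accepted subtree are never tested.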
2.4 Estimating protein profile similarity
2.4.1 Protein signatures
To determine which proteins show similar reporter ion profiles over a set of K experiments, the quantitative peptide-level information needs to be aggregated. The DLRT-based peptide-level clustering identifies peptides with similar behavior and groups them into C clusters. We represent each of the P proteins observed in the MS/MS experiments by a C × 1 protein signature vector $\mathbf{s}_p$ with $p \in \{1, \dots, P\}$. The element $s_{pq}$ holds the fraction of the peptides observed for protein p that fall into cluster q. Using fractions rather than counts avoids a dependency on the absolute number of peptides that have been identified for a protein. In addition, the peptide cluster representation for proteins eliminates intracluster variance (which is then regarded as experimental noise) and serves as a data-dependent dimension reduction procedure, effectively projecting the protein onto the peptide clusters.
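As an illustration, the signature computation can be sketched in Python as follows, assuming each peptide maps to exactly one protein (shared peptides would need an additional rule); identifiers and cluster assignments are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def protein_signatures(peptide_protein: Dict[str, str], peptide_cluster: Dict[str, int],
                       n_clusters: int) -> Dict[str, List[float]]:
    """s_pq = fraction of protein p's peptides assigned to cluster q."""
    counts: Dict[str, Counter] = {}
    for peptide, protein in peptide_protein.items():
        counts.setdefault(protein, Counter())[peptide_cluster[peptide]] += 1
    signatures = {}
    for protein, c in counts.items():
        total = sum(c.values())
        signatures[protein] = [c[q] / total for q in range(n_clusters)]
    return signatures


# Hypothetical peptide-to-protein and peptide-to-cluster assignments (C = 3 clusters).
peptide_protein = {"pep1": "PROT_A", "pep2": "PROT_A", "pep3": "PROT_A", "pep4": "PROT_A",
                   "pep5": "PROT_B", "pep6": "PROT_B"}
peptide_cluster = {"pep1": 0, "pep2": 0, "pep3": 1, "pep4": 2, "pep5": 2, "pep6": 2}
sigs = protein_signatures(peptide_protein, peptide_cluster, 3)
# {'PROT_A': [0.5, 0.25, 0.25], 'PROT_B': [0.0, 0.0, 1.0]}
```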
The rationale behind this approach is that IMT peptide reporter ion profiles are susceptible to post-translational modification (PTM) effects: in the presence of PTMs, peptides of a protein may exhibit very diverse reporter ion profiles. Different types of reporter ion profiles aggregate in different clusters, and the distribution of peptides over these clusters yields a robust and versatile protein representation. Subsequent comparison of protein signatures then allows for the calculation of protein-level abundance profile similarity.
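Comparing two signatures requires a distance between distributions over clusters that accounts for how similar the clusters themselves are. For probability vectors such as the signatures, the Mallows distance of order k coincides with the k-Wasserstein (earth mover's) distance: the k-th root of the minimal cost of transporting one distribution onto the other when moving mass from cluster q to cluster q′ costs d_qq′^k. The Python sketch below solves this small transportation problem exactly by successive shortest paths. The ground distances d_qq′ between clusters (for example, distances between cluster centres), the default order k = 1 and all names are assumptions for illustration, not the authors' definition.

```python
import math
from typing import Sequence


def transport_cost(a: Sequence[float], b: Sequence[float],
                   cost: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Exact minimum-cost transport of histogram a onto histogram b (equal total
    mass) by successive shortest paths, using Bellman-Ford on the residual graph."""
    m, n = len(a), len(b)
    supply, demand = list(a), list(b)
    flow = [[0.0] * n for _ in range(m)]
    while max(supply) > eps and max(demand) > eps:
        # Nodes 0..m-1 are source bins, m..m+n-1 are target bins; every source bin
        # with remaining supply is a start node at distance 0.
        dist = [0.0 if s > eps else math.inf for s in supply] + [math.inf] * n
        prev = [None] * (m + n)
        for _ in range(m + n):
            changed = False
            for i in range(m):
                for j in range(n):
                    if dist[i] + cost[i][j] < dist[m + j] - eps:  # ship more from i to j
                        dist[m + j], prev[m + j], changed = dist[i] + cost[i][j], i, True
                    if flow[i][j] > eps and dist[m + j] - cost[i][j] < dist[i] - eps:
                        # residual arc j -> i: cancel part of the existing flow i -> j
                        dist[i], prev[i], changed = dist[m + j] - cost[i][j], m + j, True
            if not changed:
                break
        sink = min((j for j in range(n) if demand[j] > eps), key=lambda j: dist[m + j])
        path, node = [], m + sink
        while prev[node] is not None:  # walk back to the start node
            path.append((prev[node], node))
            node = prev[node]
        amount = min(supply[node], demand[sink])
        for u, v in path:
            if u >= m:  # a cancelling arc can carry at most the current flow
                amount = min(amount, flow[v][u - m])
        for u, v in path:
            if u < m:
                flow[u][v - m] += amount
            else:
                flow[v][u - m] -= amount
        supply[node] -= amount
        demand[sink] -= amount
    return sum(flow[i][j] * cost[i][j] for i in range(m) for j in range(n))


def mallows(s: Sequence[float], t: Sequence[float],
            ground: Sequence[Sequence[float]], order: float = 1.0) -> float:
    """Mallows distance of the given order between two protein signatures."""
    cost = [[d ** order for d in row] for row in ground]
    return transport_cost(s, t, cost) ** (1.0 / order)


# Hypothetical ground distances between C = 3 peptide clusters.
ground = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 1.0],
          [2.0, 1.0, 0.0]]
d_ab = mallows([0.5, 0.5, 0.0], [0.0, 0.5, 0.5], ground)
```

Unlike a bin-by-bin comparison of signatures, the transport formulation charges less for moving peptides between similar clusters than between dissimilar ones, which is the property the workflow overview attributes to the Mallows distance.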
2.4.2 Mallows distance





2.5 Identifying similar proteins
It is now possible to derive a shortlist of proteins that exhibit similar abundance profiles from the distance matrix $\mathbf{M}$. Given a known substrate protein p, the elements of the column vector $\mathbf{m}_p = (m_{1p}, m_{2p}, \dots, m_{Pp})^T$ are constrained to the interval [0, 1] and approximately follow a beta distribution. The parameters $\alpha_p$ and $\beta_p$ are estimated by maximum likelihood and subsequently allow the computation of a cutoff quantile q (generally the 0.01 or 0.05 quantile). All proteins t with a Mallows distance $m_{tp}$ below the quantile q are then included in the protein shortlist.
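The cutoff step can be sketched in pure Python as follows. To keep the example short, it uses method-of-moments estimates of the beta parameters in place of the maximum-likelihood fit described above, and inverts the beta distribution function by bisection; the protein names and distances are hypothetical.

```python
import math
import statistics


def fit_beta_moments(values):
    """Method-of-moments beta fit (a simplification standing in for the ML fit)."""
    m = statistics.mean(values)
    v = statistics.pvariance(values)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample moments are incompatible with a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, 300):
        # even and odd coefficients of the continued fraction
        for num in (k * (b - k) * x / ((a + 2 * k - 1) * (a + 2 * k)),
                    -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def beta_quantile(q, a, b):
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection; the CDF is monotone on [0, 1]
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shortlist(distances, reference, q=0.05):
    """Proteins whose distance to `reference` lies below the q-quantile of a fitted beta."""
    others = {t: d for t, d in distances.items() if t != reference}
    a, b = fit_beta_moments(list(others.values()))
    cutoff = beta_quantile(q, a, b)
    return sorted((t for t, d in others.items() if d < cutoff), key=others.get)


# Hypothetical Mallows distances of 100 proteins to a reference protein "REF".
distances = {f"P{i:03d}": (i + 0.5) / 100 for i in range(100)}
distances["REF"] = 0.0
hits = shortlist(distances, "REF", q=0.05)  # ['P000', 'P001', 'P002', 'P003', 'P004']
```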
3 EXPERIMENTS
We evaluated our method on an iTRAQ (a specific IMT strategy) MS experiment designed to profile APC/C activity. The APC/C is a highly specific ubiquitin ligase that marks its substrates for degradation by the 26S proteasome and thus controls entry into and exit from mitosis in the cell cycle.
The analysis attempts to elucidate APC/C substrate candidates from a full cell extract, based on the temporal protein abundance profile of the known APC/C substrate Cyclin-B1 (CCNB1) (King et al., 1995).
We compared the proposed workflow against PCP (Andersen et al., 2003), which calculates peptide-level χ2 distances based on a predefined set of marker proteins and takes peptide medians to infer protein-level dissimilarity. PCP has been used in a large-scale proteomic organelle mapping study (Foster et al., 2006).
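For comparison, a PCP-style score can be sketched as follows, assuming the χ² form Σ_i (x_i − r_i)²/r_i of a normalized peptide profile x against a marker profile r; this is our reading of the procedure, and the exact formulation in Andersen et al. (2003) may differ. Protein names and profiles are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def chi2_value(profile: Sequence[float], marker: Sequence[float]) -> float:
    """Chi-square-type dissimilarity of a normalized peptide profile to a marker profile."""
    return sum((x - r) ** 2 / r for x, r in zip(profile, marker))


def pcp_scores(peptides: Dict[str, List[Sequence[float]]],
               marker: Sequence[float]) -> Dict[str, float]:
    """Protein-level score: median of the peptide-level chi-square values."""
    return {protein: statistics.median(chi2_value(p, marker) for p in profiles)
            for protein, profiles in peptides.items()}


# Hypothetical marker profile and peptide profiles (sum-normalized 4-plex).
marker = [0.40, 0.10, 0.15, 0.35]
peptides = {
    "PROT_A": [[0.40, 0.10, 0.15, 0.35], [0.38, 0.12, 0.14, 0.36], [0.10, 0.40, 0.30, 0.20]],
    "PROT_B": [[0.25, 0.25, 0.25, 0.25]],
}
scores = pcp_scores(peptides, marker)
ranking = sorted(scores, key=scores.get)  # ['PROT_A', 'PROT_B']
```

The median makes the protein-level score robust to a single outlying peptide, as for PROT_A here.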
3.1 Experimental background
The data stem from lysates of HeLa S3 cells arrested at four time points in the cell cycle: prometaphase, M/G1, G1 and G1/S (Fig. 2). Over the selected time course cells divide, and the observed changes in protein abundance also reflect changes induced by APC/C activity, i.e. controlled protein degradation. The samples were digested with trypsin, iTRAQ-labeled, combined, fractionated first by SCX and then by reversed-phase liquid chromatography, and analyzed by MALDI-TOF/TOF MS (Applied Biosystems/MDS Sciex 4800 TOF/TOF). The iTRAQ reagents (Ross et al., 2004) consist of three parts: a reporter group with mass 114–117, a balance group with mass 28–31 and the amine-specific peptide reactive group (N-hydroxysuccinimide, NHS), targeting the peptide N-terminus and the ϵ-amino group of lysine. The overall mass of the reporter-balance combinations is kept constant (145 Da) using differential isotopic labeling with ¹³C, ¹⁵N and ¹⁸O. Peptide and protein identifications were performed using the Mascot search engine (Matrix Science, version 2.2.1) (Perkins et al., 1999) with a fully tryptic human database (IPI human, version 3.23) and a false positive rate of 4.1% at the peptide level. The iTRAQ reporter group abundances were extracted from the raw MALDI-TOF/TOF data, isotope-corrected and matched to identified peptides using DataExplorer (Applied Biosystems, Foster City, CA, USA). In addition, the quality of the spectra and/or identification matches was assessed by requiring a spectral quality score (SQS; Parker et al., 2004) above 1000.

Fig. 2. Experimental setup: HeLa S3 cells were arrested in different states of the cell cycle and lysed. Samples were digested, iTRAQ-labeled, combined and analyzed by LC-MS/MS. Reporter ion profiles were acquired by subsequent quantitation and normalization.
3.2 Computational analysis
The MS analysis yielded 19 619 MS/MS spectra with complete quantitative information and identified 2443 proteins based on two or more of the 16 785 unique peptides. All reporter ion profiles were sum-normalized and subjected to two computational analyses: (i) PSS as described in the previous section, with a DLRT significance level of 0.01, and (ii) PCP (Andersen et al., 2003). The resulting Mallows distances and χ² values were used to derive a ranked protein list for each method. In both cases, we selected CCNB1 as the reference and derived the top 1% shortlist of proteins whose protein-level abundance profiles are most similar to that of the reference.
4 RESULTS
Table 1 lists the ranks reported by PSS and PCP for the 10 known APC/C substrates observed in the acquired data (including the reference CCNB1) and for PRC1. See the Supplementary Material for detailed references concerning the biochemical validation of the respective proteins. The CCNB1 reference profile is reported with rank zero and excluded from all following statistics.
Description | PSS rank | PCP rank |
---|---|---|
CCNB1: G2/mitotic-specific cyclin-B1 | 10 | 10 |
TK1: Thymidine kinase cytosolic | 12 | 11 |
PRC1: Protein regulator of cytokinesis 1 | 16 | 11 |
TPX2: Targeting protein for Xklp2 | 17 | 54 |
NUSAP: Nucleolar/spindle-assoc. protein 1 | 12 | 623 |
PLK1: Serine/threonine-protein kinase | 24 | 28 |
CKAP2: Cytoskeleton-associated protein 2 | 399 | 624 |
AURKA: Serine/threonine-protein kinase 6 | 548 | 86 |
CDCA5: Sororin | 1565 | 1958 |
DNMT1: DNA methyltransferase 1 | 1598 | 876 |
GTSE1: G2 and S phase-expressed protein 1 | 1724 | 373 |
Confirmed proteins in top 1% ranks | 5/10 | 2/10 |
Ratio of confirmed proteins (q=1%) | 20.8% | 8.3% |
Enrichment factor (q=1%) | 50.9 | 20.4 |
The table displays the list of known (i.e. biochemically validated) APC/C substrates present in the sample. The entries are ordered by the ranking derived from computational PSS and annotated with the ranking delivered by PCP (Andersen et al., 2003). PSS identifies 5 of the 10 known coregulating proteins among the top 1% ranks whereas PCP identifies only two. PSS thus yields a 50.9-fold enrichment of CCNB1-coregulation candidates among the top 1% proteins in the shortlist and a 2.5-fold increase compared with PCP.
Figure 3 displays the normalized peptide reporter ion profiles (gray lines) for the same set of proteins along with the geometric means over the profiles of all associated peptides. The geometric means serve as a measure of (simplicial) central tendency and are suitable for visual comparison and discussion of the results. The high-ranking substrates (TK1, NUSAP, PLK1, TPX2) and PRC1 exhibit U-shaped tendencies similar to CCNB1, whereas the low-ranking AURKA, CDCA5, DNMT1 and GTSE1 show clearly different tendencies.

Fig. 3. Peptide reporter ion profile plots for all identified APC/C substrates in the sample: peptide reporter ion profiles are shown in gray; protein-wise geometric means are used as a measure of simplicial central tendency and shown in black. CCNB1 (upper left corner), the reference protein in the analysis, exhibits a U-shaped central tendency of peptide profiles, which is shared by the coregulating proteins reported by the proposed screening procedure at the 1% level as well as by CKAP2. In the bottom row, the observed peptide reporter ion profiles and strongly diverging central tendencies support the algorithmic finding that the data do not exhibit detectable coregulation for AURKA, CDCA5, DNMT1 and GTSE1.
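The protein-wise central tendency shown in Fig. 3 can be computed as sketched below, assuming the closed geometric mean: the component-wise geometric mean of the peptide profiles, rescaled to sum to one, which is the centre of a compositional sample in Aitchison's geometry. Whether the plotted means were rescaled is our assumption; the peptide profiles are hypothetical.

```python
import math
from typing import List, Sequence


def closed_geometric_mean(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise geometric mean of compositions, rescaled to sum to one."""
    n, k = len(profiles), len(profiles[0])
    g = [math.exp(sum(math.log(p[j]) for p in profiles) / n) for j in range(k)]
    total = sum(g)
    return [v / total for v in g]


# Hypothetical peptide profiles of one protein (sum-normalized 4-plex).
peptides = [[0.40, 0.10, 0.15, 0.35],
            [0.38, 0.12, 0.14, 0.36],
            [0.42, 0.09, 0.16, 0.33]]
centre = closed_geometric_mean(peptides)  # keeps the U shape of the individual profiles
```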
At the 1% cutoff, PSS reports five of the known APC/C substrates and PCP reports two. Both approaches report confident hits for PRC1, a mitotic spindle-associated microtubule binding and bundling protein that is essential for cell cleavage. Its tight regulation is necessary to maintain the spindle midzone and to guarantee microtubule interdigitation. For PRC1, a body of evidence indicates that it tightly co-regulates with CCNB1 and that it may indeed be an APC/C substrate (Jiang et al., 1998; Mollinari et al., 2002), although biological validation is still pending. For all following statistics, we included PRC1 in the list of known coregulating proteins.
The PSS results on the APC/C iTRAQ dataset yield a 50.9-fold enrichment of CCNB1-coregulated proteins compared with the original raw data: the proportion of CCNB1-coregulating proteins (i.e. APC/C substrate candidates) among the significant ranks is 5/24 = 20.8%, compared with 10/2443 = 0.41% in the original unranked data, giving an enrichment factor of (5/24)/(10/2443) ≈ 50.9. For PCP, two hits among the same 24 ranks correspond to a proportion of 2/24 = 8.3% and an enrichment factor of 20.4. The fraction of confirmed proteins present in the top 1% ranks is 5/10 = 50% for PSS and 2/10 = 20% for PCP.
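The enrichment factors follow from simple proportions, reproduced below in Python. The top 1% shortlist size is taken as ⌊0.01 × 2443⌋ = 24, which matches the denominator in 5/24; the rounding rule itself is our assumption.

```python
import math


def enrichment(hits: int, shortlist_size: int, known: int, total: int) -> float:
    """Share of known substrates in the shortlist divided by their share among all proteins."""
    return (hits / shortlist_size) / (known / total)


total_proteins = 2443                    # proteins identified in the experiment
known = 10                               # known substrates incl. PRC1, excluding the CCNB1 reference
top = math.floor(0.01 * total_proteins)  # top 1% ranks (rounding down is an assumption)
pss = enrichment(5, top, known, total_proteins)
pcp = enrichment(2, top, known, total_proteins)
print(top, round(pss, 1), round(pcp, 1), round(pss / pcp, 1))  # 24 50.9 20.4 2.5
```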
5 DISCUSSION
The top-ranked APC/C substrates comprise the biologically validated CCNB1, TK1, NUSAP, PLK1 and TPX2, together with the candidate PRC1. The peptide reporter ion profiles of the known APC/C substrates AURKA, CDCA5, DNMT1 and GTSE1, which were not reported as coregulation candidates at the 1% cutoff, show significant deviations from the CCNB1 reporter ion profiles (Fig. 3). The two observable peptide reporter ion profiles for CKAP2 exhibit a U-shape with higher starting and lower ending points than CCNB1. The cluster assignment of one of the peptide profiles is close to a CCNB1 cluster (data not shown). However, because only two reporter ion profiles are available, only half of the CKAP2 protein signature matched that of CCNB1; we assume that with better sequence coverage CKAP2 would be ranked closer to the top. In this context, limiting the approach to proteins with a minimum sequence coverage might be a worthwhile step to increase screening accuracy. In summary, the proteins that fall outside the top 1% ranks have protein signatures very different from the reference, which results in increased distance measures. This intuitive assessment of performance also highlights the difference between the distance measures used by PSS and PCP: PCP orders PLK1 and AURKA further to the top. This is a consequence of PCP's median-based aggregation of peptide-level distances; in particular in the case of AURKA, the median-based PCP delivers less intuitive results than PSS.
Based on the experiments conducted in this study, PSS shows promise for practical application: among the top 1% ranked proteins, the likelihood of finding a truly coregulating protein was 2.5 times higher with PSS than with PCP; given that screening experiments generally need to be followed up with labor-intensive biological validation, this is a significant difference.
6 CONCLUSIONS
The proposed data analysis procedure enables PSS from IMT experiments. The procedure introduces novel statistical methodology for the treatment of IMT abundance reporter ion profiles that takes into account the dependency structure inherently present in the measurements. It also introduces advances in exploratory data analysis that enable protein-level inference based on peptide-level measurements. The experimental results indicate that the methodology is sufficiently powerful to cope with practical requirements.
In addition, the protein signatures $\mathbf{s}_p$ record across which reporter ion profile clusters the peptides of a particular protein are distributed. This information can be used to assess whether different homologs of a protein are present in an experiment.
PSS identifies proteins with similar abundance profiles without the need for tailored biochemistry or high-effort experimental protocols. In particular, the method is applicable to full cell lysate measurements at endogenous protein levels. As a consequence, the method is unbiased. In practical application, similarity screening is carried out in a fully automated manner, requiring only a single, well-interpretable user-parameter (the DLRT significance level). The overall algorithmic setup merely assumes sum-normalized relative quantification measurements, and the underlying statistical methodology is thus applicable to a wide range of proteomic research questions.
Ultimate validation of substrate relationships has to be carried out in the biochemical domain. However, in the case of APC/C co-regulation, our findings indicate that high-confidence candidates reported by the proposed methodology are well-chosen candidates for biochemical validation.
Of particular importance for the proposed approach is the fact that each analysis step makes use of the correct metrics with respect to the underlying statistical dependency structures. Thus, the overall approach maintains statistical power and is able to generate usable results even with comparatively small sample sizes. The underlying methods, including the DLRT, can be applied to a wide field of use cases and PSS can be used as a drop-in replacement for PCP.
Future developments in time-resolved IMT experiments will likely include the ability to measure the sample under investigation at much better temporal resolution, providing a much more complete description of quantitative protein behavior and a significant increase in the amount of available discriminative information.
ACKNOWLEDGEMENTS
The authors would like to thank Michael Hanselmann, Xinghua Lou (Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Germany), Flavio Monigatti and Wiebke Timm (Department of Pathology, Children's Hospital, Boston, MA, USA) for comments, suggestions and fruitful discussions.
Funding: DFG (HA4364/2-1 to B.Y.R. and F.A.H.; KI1498/1-1 to M.K.); the Alexander von Humboldt-Foundation (3.1-DEU/1134241 to M.K.); the Children's Hospital Trust (to J.A.J.S. and H.S.); Harvard Medical School (Junior Faculty Award to J.A.J.S.).
Conflict of Interest: none declared.
REFERENCES
Author notes
† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First authors.
‡ The authors wish it to be known that, in their opinion, the last two authors should be regarded as joint Last authors.
Associate Editor: Burkhard Rost