Motivation: In recent years, the development of single-method fold-recognition servers has lagged behind that of consensus and multiple-template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques, which improve the prediction of secondary structure, backbone torsion angles and solvent accessible surface area.
Results: The new method called SPARKS-X was tested with the SALIGN benchmark for alignment accuracy, Lindahl and SCOP benchmarks for fold recognition, and CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X.
Availability: The method is available as a SPARKS-X server at http://sparks.informatics.iupui.edu/
Given a query protein sequence with unknown structure, the most reliable structure prediction technique is to recognize its matching structural folds from existing known structures with or without significant sequence similarity (called homology modeling and fold recognition, respectively). This approach is also known as template-based modeling. Template-based modeling becomes increasingly powerful because most popular structure folds (adopted by multiple sequences) are known (Dai and Zhou, 2011; Kihara and Skolnick, 2003; Zhang et al., 2006).
However, recognizing structurally similar folds in the absence of sequence similarity (fold recognition) is challenging, as revealed from the critical assessment of structure prediction (CASP) techniques. CASP experiments highlighted the importance of post-treatment of models predicted by individual fold recognition methods through the use of consensus predictions [For example, ROBETTA (Chivian et al., 2003), Pmodeller6 (Wallner et al., 2007), Fams-ace (Terashi et al., 2007), Phyre (Bennett-Lovsey et al., 2008)] and/or constrained template–fragment recombination and refinement [For example, Chunk-TASSER (Zhou et al., 2007a), I-TASSER (Zhang, 2007)]. The experiments also indicated a convergence of techniques that can be broadly characterized as mixing and matching of multiple fragments and templates (Bujnicki, 2006; Zhou et al., 2010). Examples of recently developed new methods include the combined use of fragment and template comparison (Zhou and Skolnick, 2010), non-linear scoring function from conditional random field model (Peng and Xu, 2009) and profile entropy (Peng and Xu, 2010), employment of predicted torsion angles (Wu and Zhang, 2008; Zhang et al., 2008) and a combined use of profile–profile alignment and pairwise and solvation potentials (Lobley et al., 2009).
We have developed a series of single fold recognition methods (SPARKS, SP2, SP3, SP4 and SP5) that are based on weighted matching of multiple profiles that include sequence profiles generated from multiple sequence alignment (Altschul et al., 1997), predicted versus actual secondary structures (Rost et al., 1997; Zhou and Zhou, 2004, 2005a), knowledge-based profile (single-body) score function (Zhou and Zhou, 2004), depth-dependent sequence profiles derived from template structures (Zhou and Zhou, 2005a), predicted versus actual solvent accessible surface area (Liu et al., 2007) and predicted versus actual dihedral angles (Zhang et al., 2008). Statistically significant improvement is observed for the accuracy and sensitivity of fold recognition as the number of matching profiles increases from 3 to 5 (Liu et al., 2007; Zhang et al., 2008; Zhou and Zhou, 2004, 2005a). In particular, SPARKS, SP3 and SP4 were ranked among the top performers for automatic servers in CASP 6 (Tress et al., 2005; Zhou and Zhou, 2005b) and CASP 7 (Battey et al., 2007; Liu et al., 2007) experiments.
One issue in the methods developed above is that matching predicted 1D profiles of the query sequence with actual profiles of templates is based on simple difference matrices. This approach does not account for the probability of errors in predicted 1D structural properties such as secondary structure, backbone torsion angles and solvent accessible surface area. In this article, we introduce energy terms based on the estimated probability of a match between predicted and actual 1D structural properties, a technique commonly used in fold recognition based on hidden Markov models (Hargbo and Elofsson, 1999). In addition, we take advantage of recently improved accuracy in predicted secondary structure [Q3 = 81–82% by SPINE X (E.Faraggi et al., submitted for publication)], torsion angles [SPINE X (Faraggi et al., 2009b), mean absolute error = 33° for ψ and 22° for ϕ] and solvent accessibility (ASA) [correlation coefficient of 0.74 between predicted and actual values, Real-SPINE 3.0 (Faraggi et al., 2009a)]. The resulting method is called SPARKS-X to distinguish it from the previous SP series of methods.
We tested SPARKS-X alignment accuracy, fold recognition and structure prediction by using several benchmarks, compared it to several state-of-the-art techniques and participated in the automatic server part of CASP (CASP 9). All results indicate that SPARKS-X is one of the best single-method fold recognition servers. The performance of the method can likely be further improved significantly by incorporating the techniques of multiple templates and refinement in model building that are employed in many other automatic servers.
2.1 Alignment score
The alignment score of SP5 for aligning query position $i$ with template position $j$ is given by Equation (1) (Zhang et al., 2008). The first term in Equation (1) is the profile–profile comparison between the sequence profile from the query sequence and that from the template sequence. $F_{\mathrm{seq}}^{\mathrm{query}}(i)$ is the sequence-derived frequency profile of the query sequence, and $M_{\mathrm{seq}}^{\mathrm{template}}(j)$ and $M_{\mathrm{seq}}^{\mathrm{query}}(i)$ are the sequence-derived log-odds profiles of the template sequence and of the query sequence, respectively. These sequence profiles are constructed by three iterations of PSIBLAST (Altschul et al., 1997) searching (E-value cutoff of 0.001) against the non-redundant (NR) sequence database, which was filtered to remove low-complexity regions, transmembrane regions and coiled-coil segments (Jones, 1999). The second term in Equation (1) compares the sequence profile from the query sequence and that derived from the template structure (sequence profiles that would ‘fit’ to the structure). $F_{\mathrm{struc}}^{\mathrm{template}}(j)$ is a depth-dependent sequence profile generated from the sequences of those structural fragments that are similar to 9-residue segment structures of the template (Zhou and Zhou, 2005). The third term in Equation (1) measures the difference $\Delta^{k}_{ij}$ between the predicted 1D structural properties of the query sequence and the actual properties of the template (three-state secondary structure, real-value solvent accessibility and real-value torsion angles).

Two changes were made from Equation (1) in SP5 to Equation (2) in SPARKS-X: the removal of the sequence profile derived from the template structure (designed sequences for templates) and the replacement of the simple difference $\Delta^{k}_{ij}$ by energy terms dependent on the predicted confidence, $E(SS_t(i) \mid SS_q(j), C_{SS,q}(j))$ and $E(\Delta^{k}_{ij} \mid C_{k,q}(j))$. Here, torsion angles ϕ and ψ are treated separately so that the maximum value of $k$ is 4.
We dropped the structure-derived sequence profile (the main novel feature in SP3) because we found that including this term no longer improves the results in our new formulation.

[…] (Faraggi et al., 2009b) for a native secondary structure $SS_t$, and the reference probability $P(SS_t)$ is the probability of secondary structure $SS_t$ in native proteins. For obtaining the probabilities, secondary structures were predicted by SPINE-X (E.Faraggi, submitted for publication), with three states for templates defined according to DSSP (Kabsch and Sander, 1983). $C_{SS,q}$ is evenly divided into eight discrete states.

[…] (Faraggi et al., 2009b). The differences for ϕ and ψ are evenly divided into 18 bins, and $C_{k,q}$ is evenly discretized into eight states. Real-value solvent accessibility is predicted by Real-SPINE 3 (Faraggi et al., 2009a). The difference values are divided into 20 states, and $C_{4,q}$ uses the 20 amino acid types to represent the prediction confidence. All energy terms were obtained from an NR dataset of 2479 proteins with length <500 amino acids from the original SPINE database [25% sequence identity cutoff, X-ray resolution lower than 3 Å and no unknown structural regions (Dor and Zhou, 2007)].
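A confidence-dependent energy term of this kind could be tabulated as in the following Python sketch, which estimates energies from counts of (native state, predicted state, confidence) observations. The log-odds form $-\ln[P(SS_t \mid SS_q, C)/P(SS_t)]$, the pseudocount and all function names are assumptions made for illustration; the text above specifies only the conditional and reference probabilities and the eight confidence states.

```python
import math
from collections import Counter

N_CONF_BINS = 8   # confidence is evenly divided into eight states (from the text)
STATES = "HEC"    # three-state secondary structure

def confidence_bin(conf, n_bins=N_CONF_BINS):
    """Map a confidence value in [0, 1] to an evenly spaced bin index."""
    return min(int(conf * n_bins), n_bins - 1)

def build_energy_table(samples, pseudocount=1.0):
    """Tabulate energies from (native_ss, predicted_ss, confidence) samples.

    Returns energy[(native, predicted, bin)] =
        -ln( P(native | predicted, bin) / P(native) ).
    The log-odds form and the pseudocount are illustrative assumptions.
    """
    joint, context, native = Counter(), Counter(), Counter()
    total = 0
    for ss_t, ss_q, conf in samples:
        b = confidence_bin(conf)
        joint[(ss_t, ss_q, b)] += 1
        context[(ss_q, b)] += 1
        native[ss_t] += 1
        total += 1

    energy = {}
    n_states = len(STATES)
    for ss_t in STATES:
        # Reference probability of the native state, smoothed.
        p_ref = (native[ss_t] + pseudocount) / (total + pseudocount * n_states)
        for ss_q in STATES:
            for b in range(N_CONF_BINS):
                # Conditional probability of the native state given the prediction.
                p_cond = (joint[(ss_t, ss_q, b)] + pseudocount) / (
                    context[(ss_q, b)] + pseudocount * n_states)
                energy[(ss_t, ss_q, b)] = -math.log(p_cond / p_ref)
    return energy
```

With such a table, a confident and usually correct prediction yields a negative (favorable) energy for the matching native state and a positive energy for a mismatching one, which is the behavior a simple difference matrix cannot express.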
2.2 Parameter training and template ranking
As in SP5, the Smith–Waterman alignment algorithm (Smith and Waterman, 1981) is used to optimize the score that matches the query profiles with template profiles. To reduce the number of parameters, we set $w_2 = w_3$ (equal weights for the two torsion angles). All weight parameters and two gap penalty parameters (gap opening $g_o$ and gap extension $g_e$) were trained on the Prosup structural alignment benchmark (Domingues et al., 2000). The parameters were trained using the Powell method (Press et al., 1992), repeated many times from different random seeds. The final parameters used are $w_1 = 1.04$, $w_2 = w_3 = 0.23$, $w_4 = 3.21$, $g_o = 10.2$, $g_e = 0.69$ and $s_{\mathrm{shift}} = -1.52$.
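For readers unfamiliar with the dynamic programming step, the following Python sketch computes the best local alignment score with affine gap penalties (the Gotoh variant of Smith–Waterman) from a precomputed position-pair score matrix. It is a minimal illustration, not the SPARKS-X implementation: the convention that higher scores are better, the gap cost of `gap_open + (n - 1) * gap_extend` for a gap of length `n`, and the function name are assumptions; only the default gap values come from the text.

```python
def smith_waterman_affine(score, gap_open=10.2, gap_extend=0.69):
    """Best local alignment score for score[i][j] (query i vs. template j).

    Affine gaps (assumed convention): a gap of length n costs
    gap_open + (n - 1) * gap_extend. Higher scores are better here.
    """
    n, m = len(score), len(score[0])
    neg = float("-inf")
    # H: best local score ending at cell (i, j);
    # E / F: best score ending at (i, j) with a gap along the row / column.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score[i - 1][j - 1]
            # The zero floor lets a local alignment restart anywhere.
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

In a profile-based method, `score[i][j]` would be the weighted sum of the profile and energy terms for the pair of positions; a traceback over `H`, `E` and `F` (omitted here) recovers the alignment itself.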
The templates are ranked by the greater of two Z-scores, which are calculated from the raw alignment score normalized by $L^{\alpha}$ or $l^{\alpha}$, where $L$ is the full alignment length, $l$ is the non-end-gap alignment length and $\alpha$ is a free parameter. The fractional exponent is introduced to mimic the fractional exponent employed in calculating domain–domain interactions (Zhou et al., 2007b). We find that $\alpha = 3/4$ yields a slightly improved ranking (0.4% in TM score of the built model for the SCOP-20 dataset, see below). This ranking method is the same as that used in SP3, SP4 and SP5 except that $\alpha = 1$ was used previously.
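The ranking rule can be sketched in Python as follows. The sketch assumes that a higher raw score is better, that each Z-score is computed over the normalized scores of all templates for one query, and that the population standard deviation is used; these choices and all names are illustrative assumptions, while $\alpha = 3/4$ comes from the text.

```python
import statistics

ALPHA = 0.75  # alpha = 3/4, as reported in the text

def z_scores(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def rank_templates(hits, alpha=ALPHA):
    """Rank templates by the greater of two length-normalized Z-scores.

    hits: list of (name, raw_score, full_length, non_end_gap_length).
    Returns (name, z) pairs sorted from best to worst.
    """
    by_full = z_scores([s / (L ** alpha) for _, s, L, _ in hits])
    by_core = z_scores([s / (l ** alpha) for _, s, _, l in hits])
    combined = [max(a, b) for a, b in zip(by_full, by_core)]
    order = sorted(range(len(hits)), key=lambda k: combined[k], reverse=True)
    return [(hits[k][0], combined[k]) for k in order]
```

Setting `alpha=1.0` in this sketch corresponds to the normalization used in the earlier SP3, SP4 and SP5 methods.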
2.3 CASP 9 template library and model building
An automatically updated template library is used for threading. When a new protein is added to the library, it is first divided into domains according to the ‘Author’ parameters in DDOMAIN (Zhou et al., 2007b). The resulting domains are compared to existing domains in the library. If the sequence identity is <40%, or the TM score [by TM align (Zhang and Skolnick, 2005)] between them is smaller than 0.5, the new domain and its chain are included in the library. The automatically updated library had 31 750 templates on July 15, 2010 at the completion of server predictions in CASP 9.
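The inclusion rule can be expressed as a small Python predicate. This is a sketch under one assumption that the text does not state explicitly: the condition is taken to hold against every existing domain, so a new domain is rejected only if some library domain is both at least 40% identical in sequence and structurally similar with a TM score of at least 0.5. The callables `seq_identity` and `tm_score` are hypothetical placeholders for a sequence alignment tool and TM align.

```python
def is_novel(new_domain, library, seq_identity, tm_score,
             identity_cutoff=0.40, tm_cutoff=0.5):
    """Return True if new_domain should be added to the template library.

    seq_identity(a, b) and tm_score(a, b) are caller-supplied functions
    (hypothetical here) returning values in [0, 1].
    """
    for existing in library:
        redundant = (seq_identity(new_domain, existing) >= identity_cutoff
                     and tm_score(new_domain, existing) >= tm_cutoff)
        if redundant:
            return False
    return True
```

Under this reading, a domain that is close in sequence to a library entry but adopts a different structure is still added, which keeps alternative conformations in the library.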
The model is built by modeller9v7 (Sali et al., 1995) using the alignment generated by SPARKS-X. When there are gaps of >30 residues at the termini, the program is called again to build a model for the missing parts of the region. After that, these different models are linked and steric clashes are removed by using the DFIRE potential functions (Yang and Zhou, 2008; Zhou and Zhou, 2002).
3.1 Alignment accuracy
As in SP3 and SP4, SPARKS-X was optimized by using the Prosup benchmark (Domingues et al., 2000) and tested on SALIGN (Marti-Renom et al., 2004). The Prosup benchmark, prepared by Sippl's group, consists of 127 pairs of proteins with alignments produced by the structural alignment program Prosup (Domingues et al., 2000). The SALIGN benchmark (Marti-Renom et al., 2004) contains 200 selected pairs, with an average pair sharing 20% sequence identity or less and 65% (or more) of structurally equivalent Cα atoms superposed with an rmsd of 3.5 Å (Marti-Renom et al., 2004). The reference alignment is the structural alignment produced by the TMalign program (Zhang and Skolnick, 2005) [i.e. TM overlap].
Table 1 shows the alignment accuracy of different methods given by different benchmarks. There is a consistent gradual improvement (1–2%) from SP3, SP4 to SP5 but a much larger improvement from SP5 to SPARKS-X (4–6%). This accuracy is comparable with the recently developed BoostThreader (Peng and Xu, 2009) or the new version of Raptor (Peng and Xu, 2010).
| SP3 (%) | SP4 (%) | SP5^a (%) | SP-X^b (%) | BT^a (%) | PX^a (%) |
^b SP-X: SPARKS-X, this work.
^c One-to-one match given by the method and Prosup.
^d Within four residues by the method and Prosup.
^e One-to-one match given by the method and TMalign.
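The two accuracy measures named in the footnotes (exact one-to-one match and match within four residues) can be computed as in the Python sketch below. Using the number of aligned pairs in the reference alignment as the denominator is an assumption, as are the function and argument names.

```python
def alignment_accuracy(predicted, reference, tolerance=0):
    """Fraction of reference-aligned query residues recovered by a method.

    predicted, reference: dicts mapping query index -> template index.
    tolerance=0 gives the one-to-one match; tolerance=4 counts a residue
    as correct if it is placed within four template positions of the
    reference. The denominator (reference pairs) is an assumption.
    """
    if not reference:
        return 0.0
    hits = sum(
        1 for q, t_ref in reference.items()
        if q in predicted and abs(predicted[q] - t_ref) <= tolerance
    )
    return hits / len(reference)
```

For example, an alignment that shifts one helix by a single residue scores zero for that helix under the one-to-one measure but full credit under the within-four measure, which is why both are reported.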
It is of interest to know the contribution to the overall accuracy of SPARKS-X made by individual terms in Equation (2). Table 2 compares the accuracy made by individual scoring terms by either adding to sequence profile [Position Specific Scoring Matrix (PSSM)] or removing from SPARKS-X. The results are obtained by training with the Prosup benchmark and testing with the SALIGN benchmark. It is clear that all three terms (secondary structure, torsion angles and ASA) contributed to the accuracy of alignment. Adding them to the PSSM increases the alignment accuracy while removing them from SPARKS-X decreases the accuracy. The contribution from ASA is the largest (5% adding to PSSM in SALIGN or 4% removing from SPARKS-X in SALIGN). Smaller but significant contributions are observed for secondary structure or torsion angles (3–4% for adding to PSSM and 0.4–1% for removing from SPARKS-X). The results from training and testing are consistent with each other.
| Prosup (%) | SALIGN | Prosup (%) | SALIGN |
| Method | 1–1^a | ≤4^b | 1–1^c (%) | Method | 1–1^a | ≤4^b | 1–1^c (%) |
^a One-to-one match given by the method and Prosup.
^b Within four residues by the method and Prosup.
^c One-to-one match given by the method and TMalign.
^d Using PSSM matrix from PSIBLAST only.
^f Using PSSM plus secondary structure, or ϕ/ψ, or ASA as noted.
^g Excluding secondary structure, or ϕ/ψ, or ASA as noted.
3.2 Testing fold recognition with Lindahl benchmark
The purpose of improving alignment is to increase the ability to recognize the correct structural fold of a query sequence from a template library. We employed the Lindahl benchmark to compare SPARKS-X with different methods. The benchmark is a large dataset of 976 proteins, with 555, 434 and 321 pairs of proteins in the same family, superfamily and fold, respectively (Lindahl and Elofsson, 2000). Here, the fold recognition sensitivity of each method is tested by aligning each protein with the remaining 975 proteins, and checking whether or not the method can recognize a member of the same family, superfamily or fold as the first-ranked or within the top five ranked templates. Thus, the benchmark tests both the modeling accuracy and the ranking methods for fold recognition.
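The Top 1 and Top 5 sensitivities described above could be computed as in the following Python sketch. Skipping queries that have no partner at the level of interest (so that they do not enter the denominator) is an assumption, and the data structures and names are hypothetical.

```python
def top_n_success_rate(rankings, same_class, n):
    """Fraction of queries with a same-class template among the top n hits.

    rankings:   dict query -> list of template ids, best first
                (the query itself excluded).
    same_class: dict query -> set of template ids that share the family,
                superfamily or fold of interest with the query.
    Queries with no same-class partner are skipped (assumed denominator).
    """
    evaluated = [q for q in rankings if same_class.get(q)]
    if not evaluated:
        return 0.0
    hits = sum(
        1 for q in evaluated
        if any(t in same_class[q] for t in rankings[q][:n])
    )
    return hits / len(evaluated)
```

Calling the function with `n=1` and `n=5` on the same rankings gives the two columns reported for each classification level.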
Table 3 shows the fraction of correctly recognized matches for proteins in the same family, superfamily, fold as the first ranked or within top five ranked templates given by various methods. Although many published methods have been applied to this benchmark (Kim et al., 2003; Shi et al., 2001; Xu et al., 2003; Zhou and Zhou, 2004), we only list the most recent ones (Cheng and Baldi, 2006; Liu et al., 2007; Peng and Xu, 2009; Zhou and Zhou, 2004, 2005a). This is because of the time-dependent nature of sequence databases for sequence profiles.
| Family (%) | Superfamily (%) | Fold (%) |
| Methods | Top 1 | Top 5 | Top 1 | Top 5 | Top 1 | Top 5 |
^a From Zhou and Zhou (2004).
^b The percentage in each cell is the fraction of correctly recognized matches of proteins in the same family, superfamily or fold as the first-ranked or within the top five ranked templates.
^c From Cheng and Baldi (2006).
^d From Zhou and Zhou (2005).
^e From Liu et al. (2007).
^f From Zhang et al. (2008).
^g From Peng and Xu (2009).
Table 3 indicates that the improvement over SP3, SP4 and SP5 in the success rate of fold recognition by SPARKS-X exists at all three levels (family, superfamily and fold) except for the Top 1 ranked model at the superfamily level, where the success rate is similar between SP5 (59.8%) and SPARKS-X (59.0%). The largest improvement over SP5 is observed at the fold level (7% absolute increase in Top 1 and 8% absolute increase for the best in Top 5). This is somewhat expected because the method was trained for remote homolog recognition (structurally similar proteins with <30% sequence identity in the Prosup benchmark). Compared with BoostThreader, SPARKS-X is less successful in homology detection (family and superfamily in Top 1) but more successful in fold recognition (2% improvement in Top 1 and 10% improvement in Top 5), as trained.
The above success rates of matching sequences within the same SCOP classification are based on somewhat subjective SCOP definition of family, superfamily and fold (Murzin et al., 1995). A more direct measurement of accuracy is to calculate the accuracy of the first-ranked model built from the fold recognition alignment. First, the model is built by transferring the Cα coordinates of the template structures to the aligned residues in the query sequence. Then, the constructed model is assessed by using the MaxSub score between the model and the known native structure. MaxSub score (Siew et al., 2000) between two structures is a measure of similarity between them with 0.0 indicating no similarity and 1.0 a perfect match. The value is calculated by searching for the largest subset of well-superimposed residues (≤3.5 Å). Table 4 reports the MaxSub scores for the models built by SP3, SP4, SP5 and SPARKS-X methods averaged over the number of proteins. Again SPARKS-X improves over SP5, SP4 and SP3 in all levels. The relative improvement of SPARKS-X over SP5 in MaxSub score is 12, 22 and 13% in family, superfamily and fold levels, respectively.
^a All 976 proteins.
^e The average MaxSub score for the first-ranked models.
3.3 Testing fold recognition with SCOP-20 dataset
We built a SCOP-20 dataset by using domains with sequence identity <20% and chain lengths >60 from SCOP 1.75. After removing domains with Cα atoms only, we obtained 6367 domains. We also compared our results with HHPRED (Soding et al., 2005) (version 1.5.1) and PRC (Madera, 2008) (version 1.5.6) because these two programs could be downloaded and installed on our local machine. The profiles of the domains for HHPRED were downloaded directly from HHPRED's web page (http://toolkit.tuebingen.mpg.de/hhpred). The profiles for PRC were generated from three iterations of PSIBLAST. For both predictors, default parameters were used. We would like to emphasize that we have only assessed PRC with the sequence profiles generated from PSIBLAST. Its performance may be different if other profiles are employed.
First, we tested the ability of HHPRED and SPARKS-X to recognize a match in the same family, same superfamily (after removing family members from the templates) and same fold (after further removing superfamily members) according to the SCOP definition within the top-N templates. Note that for a given search we removed the query protein from the template library. Figure 1 shows the success rates of recognizing at least one template within the same family, superfamily or fold as a function of the number (N) of top predicted matching templates. At the family and superfamily levels, HHPRED has a higher success rate than SPARKS-X based on the top 1–12 templates but a lower success rate afterwards. At the fold level, SPARKS-X has a consistently higher success rate than HHPRED and the difference becomes greater as more top templates are included. Similar results are observed in the ROC curve when the true positive rate is plotted as a function of the false positive rate (Fig. 2) for all pairs of the SCOP-20 dataset. Here, true positives denote the detection of templates within the same classification (family, superfamily or fold). The performance of SPARKS-X is consistently better than that of HHPRED at the fold level, while HHPRED has a higher true positive rate only at low false positive rates at the family and superfamily levels.
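An ROC curve of the kind described above can be traced from a list of scored query–template pairs, as in this Python sketch. It assumes that a higher score means a more confident match and it does not group tied scores into a single threshold; the function name and input format are hypothetical.

```python
def roc_points(scored_pairs):
    """Return (false positive rate, true positive rate) points.

    scored_pairs: list of (score, is_positive), where is_positive marks a
    pair in the same family, superfamily or fold. Pairs are visited from
    the highest score down; tied scores are not grouped (a simplification).
    """
    n_pos = sum(1 for _, positive in scored_pairs if positive)
    n_neg = len(scored_pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, positive in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

The same routine applies when positives are instead defined by structural similarity (for example, a TM score threshold), since only the `is_positive` labels change.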
To avoid the somewhat subjective definitions of family, superfamily and fold, another way to compare the ability to recognize structural similarity is to directly calculate the structural similarity between the target structure and the structure recognized, without actually building the model. Results for the average TM score between the query and the Top 1 template are shown in Table 5, where structural similarity is measured by TM align (Zhang and Skolnick, 2005). The table shows that the average TM score given by SPARKS-X is about 3% higher than that given by HHPRED when all templates are employed. The difference between the TM scores given by SPARKS-X and those given by HHPRED is larger if easily recognized templates are removed. SPARKS-X's average TM scores are 5, 14 and 16% higher than those given by HHPRED when templates from the same family, superfamily and fold are excluded from the template library. This result indicates that SPARKS-X has a higher ability than HHPRED or PRC to recognize structurally similar proteins regardless of whether they are in the same family, superfamily or fold. The results in Table 5 can be further illustrated by a ROC curve for all SCOP-20 templates (Fig. 3). The positives are defined as templates having a TM score >0.5 to the query structure by TM align (i.e. to test the ability to recognize a similar structure). The figure shows that the performance of SPARKS-X is consistently better than that of HHPRED at detecting structurally similar templates from all templates, without the same family members, and without the same family and superfamily members. The difference between the two methods is small at very low false positive rates (see the inset of Fig. 3) but increases significantly at low false positive rates. The difference between Figure 3 and Figure 2 arises because family and superfamily members in SCOP are defined according to sequence evolution origins, rather than structural similarity.
Our results suggest that using structural similarity is a more direct and accurate assessment of the performance of structure prediction techniques.
^a All templates in the dataset.
^b All templates except those belonging to the same family, or same superfamily, or same fold, as the query sequence.
The results reported in Table 5 and Figure 3 are based on direct structural comparison between target and template structures. A more common comparison is to measure the accuracy of the model built from the sequence–template alignment. We found that this further improves the performance of SPARKS-X relative to that of HHPRED/PRC because SPARKS-X uses local-global alignment while HHPRED and PRC are based on local alignment. As a result, SPARKS-X typically gives a longer alignment than HHPRED and PRC. This leads to improved scores for the models built. For example, the average TM scores of the Top 1 model from all templates for HHPRED and SPARKS-X are 0.476 and 0.517, respectively. This is a 9% improvement, rather than the 3% improvement based on structural alignment of target and template structures (Table 5). We also tested HHPRED with the option ‘-mact 0.05’ because this option leads to almost global alignment and better scoring models. Although it does not change the ability to recognize structurally similar proteins (Table 5), this option indeed increases the average TM score of the Top 1 model from 0.476 to 0.502, which is 3% rather than 9% behind SPARKS-X.
3.4 CASP9 blind prediction
SPARKS-X participated in the CASP 9 blind test and ranked #21 among automatic servers in SUM-Zscore, and #12 among independent groups (after removing redundant servers). The majority of the methods ranked ahead of SPARKS-X are consensus techniques, except HHPRED and RAPTORX. If the total TM score of the Top 5 models (http://zhanglab.ccmb.med.umich.edu/casp9/) is employed as the criterion, SPARKS-X is ranked #12 (#6 in groups). This comparison of Top 5 models is meaningful as all top servers except the HHPRED series submitted five models. Moreover, if ranked by TM score plus hydrogen bond score of the Top 1 model, SPARKS-X is ranked #6 (#5 in groups), behind only QUARK/Zhang-Server, ROSETTA, Seok-Server and GWS. This indicates that the models built by SPARKS-X have better hydrogen bonds than those of many servers. As a reference, our method can be compared to MUSTER (Wu and Zhang, 2008), an extension of the SP4 server (Liu et al., 2007) that incorporates torsion angles and hydrophobicity. The summed TM score of the SPARKS-X server is 5% higher than that of MUSTER.
We have reported a new fold recognition server called SPARKS-X that is significantly different from our previous versions in how the profile–profile matching score is obtained. Moreover, we also employed significantly improved secondary structure prediction, real value torsion angle prediction and solvent accessibility prediction. All these techniques made an improvement over our previous SP series possible. We found that predicted ASA contributes the most to the overall accuracy of SPARKS-X.
One interesting observation is that SPARKS-X performs significantly better than HHPRED in recognizing structurally similar proteins (3%) and in building better models (3%), based on the large SCOP-20 dataset and the latest version of HHPRED available on the web. On the other hand, the limited CASP 9 blind prediction suggests the opposite. The official average GDT score for 147 domains given by HHPREDB is 59.5, compared with 57.7 given by SPARKS-X (http://predictioncenter.org/casp9). This 3% improvement of HHPRED over SPARKS-X is likely due to the significantly more sophisticated model building techniques employed in the unreleased version of HHPRED, which uses distance restraints derived from multiple templates together with alignment confidence. Furthermore, SPARKS-X is only 8% behind the best automatic server, Zhang server, in official average GDT score (62.2). This 8% is likely due to the combined effect of consensus prediction from multiple fold recognition servers, the use of multiple templates and model refinement. This is an area of focus in our future work for further improving SPARKS.
We would like to thank Johannes Soding for helpful comments and for making HHPRED available and Martin Madera for making PRC available.
Funding: National Institutes of Health (grants R01 GM 085003 and 067168).
Conflict of Interest: none declared.