Motivation: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network which was trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also makes use of PSIPRED predicted secondary structure and bi-directional scoring in order to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment which are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input and it is now trained to learn the FSSP Z-score as a measurement of similarity between two proteins.
Results: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method.
To whom correspondence should be addressed.