Abstract

We present a statistical analysis of protein structures based on interatomic Cα distances. The overall distance distributions reflect in detail the content of sequence-specific substructures maintained by local interactions (such as α-helices) and by longer-range interactions (such as disulfide bridges and β-sheets). We also show that a volume scaling of the distances makes the distance distributions for protein chains of different lengths superimposable. Distance distributions were also calculated specifically for amino acids separated by a given number of residues. Specific features in these distributions are visible for sequence separations of up to 20 amino acid residues. A simple representation that preserves most of the information in the distance distributions was obtained using only six parameters. These parameters give rise to canonical distance intervals; coarse-grained distance constraints predicted by methods such as data-driven artificial neural networks should preferably be selected from these intervals. We discuss the use of the six parameters for determining or reconstructing 3-D protein structures.
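
As a rough illustration of the quantities described above, the following Python sketch computes Cα-Cα distances at a fixed sequence separation, along with a length-normalized set of all pairwise distances. The function names and the toy coordinates are hypothetical. The division by N^(1/3) is only one assumed form of the volume scaling; the abstract does not specify it.

```python
import numpy as np


def ca_distances_by_separation(coords, separation):
    """Distances between C-alpha atoms i and i + separation along one chain."""
    coords = np.asarray(coords, dtype=float)
    if separation < 1 or separation >= len(coords):
        raise ValueError("separation must be between 1 and len(coords) - 1")
    diffs = coords[separation:] - coords[:-separation]
    return np.linalg.norm(diffs, axis=1)


def volume_scaled_distances(coords):
    """All pairwise C-alpha distances divided by N**(1/3).

    ASSUMPTION: the scaling exponent 1/3 (volume roughly proportional to
    chain length N) is an illustrative choice, not taken from the source.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    i, j = np.triu_indices(n, k=1)  # every unordered pair with i < j
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    return dists / n ** (1.0 / 3.0)


# Toy input: a straight five-residue chain with 3.8 Angstrom spacing.
# Real input would be C-alpha coordinates parsed from a structure file.
toy = np.array([[3.8 * k, 0.0, 0.0] for k in range(5)])

sep2 = ca_distances_by_separation(toy, 2)
scaled = volume_scaled_distances(toy)

# A normalized histogram of the scaled distances approximates a distribution.
density, edges = np.histogram(scaled, bins=5, density=True)
```

Pooling `ca_distances_by_separation` over many chains for each separation from 1 to 20 would give separation-specific distributions of the kind the abstract refers to. The six-parameter representation itself is not reproduced here.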

Author notes

Present address: The Rockefeller University, Box 270, 1230 York Avenue, New York, NY 10021, USA