This paper presents an unsupervised inference method for determining higher-order structure from sequence data. The method is general, but here it is applied to nucleic acid sequences to determine the secondary (2-D) and tertiary (3-D) structure of the macromolecule. The method evaluates position-position interdependence within the sequences using an information measure known as expected mutual information. The expected mutual information is calculated for each pair of positions, and a chi-square test is used to screen for statistically significant position pairs. In calculating the expected mutual information, an unbiased probability estimator is used to overcome the problem of zero observations at conserved sites. A selection criterion based on known structural constraints is then applied to the most strongly interdependent position pairs, yielding the pairs most indicative of secondary and tertiary interactions. The method has been tested on tRNA and 5S rRNA sequences with very good results.
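
As a rough illustration of the pairwise scoring step described above, the following minimal sketch computes mutual information and a Pearson chi-square statistic for every pair of columns in a set of aligned sequences. It is not the paper's implementation: the function names (`mutual_information`, `chi_square`, `rank_pairs`) and the toy alignment are hypothetical, probabilities are plain observed frequencies rather than the unbiased estimator mentioned in the abstract (which is not specified here), and the structural selection criterion is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Plug-in mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi


def chi_square(col_i, col_j):
    """Pearson chi-square statistic and degrees of freedom for two columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    stat = 0.0
    for a in count_i:
        for b in count_j:
            expected = count_i[a] * count_j[b] / n
            # Counter returns 0 for symbol pairs that were never observed
            stat += (count_ij[(a, b)] - expected) ** 2 / expected
    dof = (len(count_i) - 1) * (len(count_j) - 1)
    return stat, dof


def rank_pairs(alignment):
    """Score every column pair; strongest interdependence first."""
    columns = list(zip(*alignment))
    scored = []
    for i, j in combinations(range(len(columns)), 2):
        mi = mutual_information(columns[i], columns[j])
        stat, dof = chi_square(columns[i], columns[j])
        scored.append((mi, stat, dof, i, j))
    return sorted(scored, reverse=True)


# Hypothetical toy alignment: columns 0 and 2 covary, column 1 is conserved.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]
for mi, stat, dof, i, j in rank_pairs(toy_alignment):
    print(f"positions ({i}, {j}): MI = {mi:.3f} bits, chi2 = {stat:.2f}, dof = {dof}")
```

In this sketch a pair would be screened by comparing its chi-square statistic with the critical value for its degrees of freedom at a chosen significance level. A fully conserved column gives zero degrees of freedom and zero mutual information, which illustrates why conserved sites need special handling in the estimator.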
