We present a systematic analysis of sequence motifs found in metazoan protein factors involved in constitutive pre-mRNA splicing and in alternative splicing regulation. Using profile analysis, we constructed a database enriched in protein sequences containing one or more presumptive copies of the RNA-recognition motif (RRM). We provide an accurate alignment of RRMs and structure-based criteria for identifying new RRMs, including many that lack the prototype RNP-1 submotif. We present a comprehensive table of 125 sequences containing 252 RRMs, including 22 previously unreported RRMs in 17 proteins. The presence of a putative RRM in these proteins, which are implicated in a variety of cellular processes, strongly suggests that their function involves binding to RNA. Unreported homologies in the RRM-enriched database to the metazoan SR family of splicing factors are described for an Arg-rich human nuclear protein and two yeast proteins (S. pombe mei2 and S. cerevisiae Npl3). We have rigorously tested the phylogenetic relationships of a large sample of RRMs. This analysis indicates that the RRM is an ancient conserved region (ACR) that has diversified by duplication of genes and intragenic domains. Statistical analyses and classification of repeated Arg-Ser (RS) and RGG domains in various protein splicing factors are presented.

Author notes

+ Balliol College, Oxford, OX1 3BJ, UK
§ New England Biolabs, Beverly, MA 01915, USA