In many species of the protist phylum Apicomplexa, ribosomal RNA (rRNA) gene copies are structurally and functionally heterogeneous, owing to distinct requirements for rRNA-expression patterns at different developmental stages. The genomic mechanisms underlying the maintenance of this system over long-term evolutionary history are unclear. Therefore, the aim of this study was to investigate what processes underlie the long-term evolution of apicomplexan 18S genes in representative species. The results show that these genes evolve according to a birth-and-death model under strong purifying selection, thereby explaining how divergent 18S genes are generated over time while continuing to maintain their ability to produce fully functional rRNAs. In addition, it was found that Cryptosporidium parvum undergoes a rapid form of birth-and-death evolution that may facilitate host-specific adaptation, including that of type I and II strains found in humans. This represents the first case in which an rRNA gene family has been found to evolve under the birth-and-death model.
In order to synthesize protein, both eukaryotic and prokaryotic cells possess ribosomes that consist of protein as well as RNA molecules. These RNA molecules are encoded by three to four different ribosomal RNA (rRNA) genes that are present in variable copy numbers in a given genome. In bacteria, there are three rRNA genes (23S, 16S, and 5S), which are arranged in units that are usually present in 10 copies or fewer and dispersed throughout the genome (reviewed in Liao 2000). Previous studies have shown the copies to be highly homogenized at the nucleotide level, although some exceptions do exist (Wang, Zhang, and Ramanan 1997; Yap, Zhang, and Wang 1999). In eukaryotes, three nuclear rRNA genes (28S, 18S, and 5.8S) are arranged in a unit that occurs as a tandem repeat, which in most species also contains a fourth rRNA gene, the 5S gene. However, in some species 5S gene copies are dispersed throughout the genome, as in Schizosaccharomyces pombe (Wood et al. 2002); are organized in one or more tandem arrays separate from the other nuclear rRNA genes, as in soybeans (Gottlob-McHugh et al. 1990); or show a combination of both arrangements, as in humans (Little and Braaten 1989).
Regardless of the organizational pattern, rRNA genes are usually repeated a few to several hundred times and are presumed by most researchers to be highly homogenized, owing to their concerted evolution. Under the concerted evolution model, mutations that arise in one member of a multigene family can spread to all other members through some sort of homogenization process other than purifying selection, such as gene conversion or unequal crossover (Brown, Wensink, and Jordan 1972; Zimmer et al. 1980; Dover and Coen 1981; Arnheim 1983; Li 1997; Ohta 1989, 2000). As a result, the members of the multigene family do not evolve independently. The notion that all rRNA genes (with the exception of those in organellar genomes) evolve in a concerted manner has in effect become dogma (Dover and Coen 1981; Hillis and Dixon 1991; Ohta 2000). Nevertheless, there are cases in which departures from the rRNA concerted evolution model have been identified. For example, there are two distinct repeat families of 18S rRNA genes in dugesiid flatworms (Carranza et al. 1996; Carranza, Baguñà, and Riutort 1999) and two distinct repeat families of 16S rRNA genes in certain actinomycetous bacteria (Wang, Zhang, and Ramanan 1997; Ueda et al. 1999). Similarly, many species of the microbial eukaryotic phylum Apicomplexa possess distinct rRNA gene “types.”
In apicomplexans, the structure and function of rRNA genes have been best studied in Plasmodium. Species within this genus possess functionally distinct rRNA “types” believed to be maintained in response to developmental constraints imposed by a multihost life cycle (Gunderson et al. 1987; McCutchan et al. 1988; Zhu et al. 1990; Rogers et al. 1996; Li et al. 1997; Gardner et al. 2002; Mercereau-Puijalon, Barale, and Bischoff 2002). Specifically, one rRNA type is expressed only during a particular growth stage of the organism, whereas another type is expressed at a different stage (Gunderson et al. 1987; Le Blancq et al. 1997; Mercereau-Puijalon, Barale, and Bischoff 2002). An important question that remains unanswered is how rRNA functional heterogeneity evolved and has subsequently been maintained over millions of years of apicomplexan evolutionary history. The question matters because the “rule” in all other eukaryotes is that rRNA genes are homogeneous (Coen, Strachan, and Dover 1982; Nei 1987; Li 1997; Graur and Li 2000). The departure of apicomplexan rRNA genes from this “rule” therefore indicates that they must be subject to distinct mechanisms of multigene family evolution. The purpose of this study is to examine these mechanisms in more detail by studying the evolutionary patterns of 18S rRNA gene diversification in representative apicomplexan species.
Materials and Methods
Nucleotide sequences for all five 18S rRNA genes of Plasmodium falciparum and their 5′ and 3′ flanking regions were extracted from this species' complete genome (Gardner et al. 2002). The 18S gene sequences from other Plasmodium species (Order Hemosporida) as well as species in the Order Eimeriida (represented by the genera Toxoplasma, Cryptosporidium, Neospora, Besnoitia, and Hammondia) were extracted from GenBank. Sequences from the Cryptosporidium parvum whole genome shotgun (Abrahamsen et al. 2004) were also used. The species names and sequence accession numbers are given in the resultant trees. Sequences were aligned using the computer program ClustalX (Thompson et al. 1997) and checked for errors by visual inspection. The computer program PAUP* 4.0 beta, version 10 (Swofford 2002) was used to reconstruct phylogenetic trees by using the neighbor-joining (NJ) (Saitou and Nei 1987) and maximum likelihood (ML) methods. The distances and models used included the Kimura (1980) two-parameter; Hasegawa, Kishino, and Yano (1985) (HKY); and Tamura and Nei (1993) models. Maximum likelihood trees were reconstructed using tree-bisection-reconnection (TBR) branch-swapping under a full heuristic search in which the starting tree was obtained by using the NJ method. Statistical reliability of internal branches was assessed using 1,500 bootstrap replicates with the NJ method and 100 bootstrap replicates with the ML method. The computer program TOPALi v 0.18 was used to assess the probability that nucleotide sequences are subject to recombination. This program implements the graphical and Bayesian approaches developed by McGuire, Wright, and Prentice (1997), McGuire and Wright (2000), and Husmeier and McGuire (2003).
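The Kimura (1980) two-parameter distance used above corrects the observed proportions of transitions (P) and transversions (Q) for multiple substitutions at the same site: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes this distance for one pair of aligned sequences. It also shows the column resampling that underlies nonparametric bootstrap replicates. It is an illustration under stated assumptions, not the PAUP* implementation. Gapped or ambiguous sites are simply skipped pairwise, and the function names are hypothetical.

```python
import math
import random

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Assumption: sites with a gap or an ambiguity code in either sequence
    are ignored (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; any purine<->pyrimidine change is a transversion
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def bootstrap_columns(alignment: dict, rng: random.Random) -> dict:
    """One nonparametric bootstrap replicate: resample alignment columns
    with replacement, keeping each column intact across all sequences."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

In a full analysis, each replicate would be passed to the tree-building step. The support for a clade is then the percentage of replicates in which that clade reappears; the analyses here used 1,500 NJ and 100 ML replicates.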
Multigene Family Models
In this study, patterns of apicomplexan 18S gene evolution are compared with the expectations of either the concerted evolution model (Brown, Wensink, and Jordan 1972; Zimmer et al. 1980; Arnheim 1983) or the birth-and-death model (Hughes and Nei 1989; Ota and Nei 1994; Nei, Gu, and Sitnikova 1997; Gu and Nei 1999; Rooney, Piontkivska, and Nei 2002). In the latter model, gene duplication gives rise to new genes, some of which persist in the genome for long periods, whereas others are lost through deletion events or degenerate into pseudogenes. Accordingly, under this model multigene family members evolve more or less independently and do not show high levels of nucleotide sequence homogeneity, except in the case of recently duplicated genes. Thus, in a phylogenetic analysis of genes from several closely related taxa, sequences will not show a within-species clustering pattern, except in the case of recent gene duplicates (fig. 1).
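The Python sketch below makes the birth-and-death expectation concrete with a deliberately simple, hypothetical discrete-time process. In each time step, every locus independently duplicates with probability `dup_rate` and is deleted with probability `loss_rate`. A lineage split then yields two species that evolve independently. The rates, step counts, and function names are illustrative assumptions, not estimates from the data. Pseudogenization and sequence divergence are not modeled.

```python
import itertools
import random


def evolve(family, steps, dup_rate, loss_rate, rng, ids, start_time):
    """Evolve a gene family for `steps` time units under a toy
    birth-and-death process.

    family: list of (locus_id, birth_time) tuples.
    ids: shared counter that hands out new locus ids.
    In each step a locus may spawn a duplicate (new id, current time)
    and, independently, may be deleted.
    """
    family = list(family)
    for t in range(start_time, start_time + steps):
        next_family = []
        for locus in family:
            if rng.random() < dup_rate:
                next_family.append((next(ids), t + 1))  # birth of a new copy
            if rng.random() >= loss_rate:
                next_family.append(locus)               # parent copy persists
        family = next_family
    return family


def simulate_two_species(n0=5, t_ancestor=50, t_split=50,
                         dup_rate=0.02, loss_rate=0.02, seed=1):
    """Run the toy process in an ancestor, split it into two species,
    and classify the loci present at the end."""
    rng = random.Random(seed)
    ids = itertools.count()
    ancestor = [(next(ids), 0) for _ in range(n0)]
    ancestor = evolve(ancestor, t_ancestor, dup_rate, loss_rate, rng, ids, 0)
    sp_a = evolve(ancestor, t_split, dup_rate, loss_rate, rng, ids, t_ancestor)
    sp_b = evolve(ancestor, t_split, dup_rate, loss_rate, rng, ids, t_ancestor)
    a_ids = {locus for locus, _ in sp_a}
    b_ids = {locus for locus, _ in sp_b}
    return {
        # orthologous loci retained in both species: expected to form
        # between-species clusters in a gene tree
        "shared": a_ids & b_ids,
        # ancestral loci lost in the other species, plus post-split duplicates
        "only_a": a_ids - b_ids,
        "only_b": b_ids - a_ids,
    }
```

When `dup_rate` equals `loss_rate`, the expected family size stays constant. Raising both rates together tends to shrink the `shared` set, which corresponds to the rapid-turnover regime discussed below for C. parvum.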
In contrast, concerted evolution is a form of nonindependent evolution through which a mutation that arises in one member of a multigene family spreads to all other members through a process of recombination such as gene conversion or unequal crossover (Brown, Wensink, and Jordan 1972; Zimmer et al. 1980; Arnheim 1983). As a result of concerted evolution, a multigene family will show the following: (1) high levels of nucleotide sequence identity between gene copies will be retained within species; (2) gene copies will diverge between species to a degree that depends on the time since the species last shared a common ancestor; and (3) all gene copies from the same species will cluster together in a phylogenetic tree to the exclusion of genes from other species (fig. 1). These patterns in part distinguish the effects of concerted evolution from those of purifying selection, which also constrains sequence evolution and can therefore act as a homogenizing force. The primary distinction between the two is that concerted evolution is a rapid process that creates species-specific gene clusters, whereas purifying selection maintains sequence identity beyond speciation events and therefore creates multispecies gene groups. In addition, purifying selection typically exerts distinct constraints on synonymous versus nonsynonymous nucleotide sites in protein-coding genes, whereas concerted evolution makes no such distinction (Rooney, Piontkivska, and Nei 2002).
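Expectation (3) can be screened crudely from a pairwise distance matrix before any tree is built. For each species, the hypothetical helper below asks whether every within-species distance is smaller than every distance from that species' copies to copies in other species. This is only a distance-based proxy for within-species clustering and does not replace a phylogenetic analysis. The toy distances in the usage lines are invented for illustration.

```python
from itertools import combinations


def species_cluster(dist, species_of):
    """Return {species: True/False/None} indicating whether that species'
    gene copies are all closer to each other than to any other species' copy.

    dist: dict mapping frozenset({copy1, copy2}) -> distance
    species_of: dict mapping copy name -> species name
    None means the check is undefined (only one copy, or only one species).
    """
    result = {}
    for sp in sorted(set(species_of.values())):
        members = [g for g, s in species_of.items() if s == sp]
        others = [g for g, s in species_of.items() if s != sp]
        within = [dist[frozenset(p)] for p in combinations(members, 2)]
        between = [dist[frozenset((a, b))] for a in members for b in others]
        if not within or not between:
            result[sp] = None
        else:
            result[sp] = max(within) < min(between)
    return result


def make_dist(pairs):
    """Build the distance dict from ((copy1, copy2), d) entries."""
    return {frozenset(p): d for p, d in pairs}


# Hypothetical toy data: two species, two gene copies each.
copies = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
# concerted-evolution-like: copies nearly identical within each species
concerted = make_dist([(("A1", "A2"), 0.001), (("B1", "B2"), 0.002),
                       (("A1", "B1"), 0.05), (("A1", "B2"), 0.05),
                       (("A2", "B1"), 0.05), (("A2", "B2"), 0.05)])
# birth-and-death-like: orthologous copies closest across species
birth_death = make_dist([(("A1", "A2"), 0.08), (("B1", "B2"), 0.08),
                         (("A1", "B1"), 0.002), (("A2", "B2"), 0.003),
                         (("A1", "B2"), 0.08), (("A2", "B1"), 0.08)])
print(species_cluster(concerted, copies))    # {'A': True, 'B': True}
print(species_cluster(birth_death, copies))  # {'A': False, 'B': False}
```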
Mode of Multigene Family Evolution
The pattern of 18S gene evolution in apicomplexans is marked by varying degrees of differentiation between sequences, which points to a dynamic level of gene duplication, turnover, and maintenance. These patterns are consistent with the birth-and-death model of multigene family evolution described above. For example, evidence of between-species clustering was found for many different species. In Plasmodium, this pattern is readily apparent with respect to Plasmodium vivax, Plasmodium cynomolgi, Plasmodium knowlesi, Plasmodium malariae, and Plasmodium fragile (fig. 2). Furthermore, in many instances the 18S sequences from the same species were observed to be quite divergent. For example, the Plasmodium berghei sequences differ from each other by 76 to 154 nucleotides (nucleotide distance d = 0.038 to 0.080) (fig. 2). Likewise, some C. parvum sequences are highly divergent, in some cases by as many as 126 nucleotides (d = 0.081) (fig. 3). Unsurprisingly, then, the Cryptosporidium 18S genes show a clearly discernible pattern of between-species clustering, as does the sarcocystid apicomplexan Toxoplasma gondii (fig. 3).
However, there are instances in which sequences from the same species cluster together or are identical (figs. 2 and 3). There are three possible explanations for this: (1) the sequences represent different genes that are homogenized through gene conversion; (2) the sequences represent different genes derived from a recent duplication event, in which case not enough time has elapsed for nucleotide differences to accumulate; or (3) the sequences represent allelic copies of the same gene. For some species, the sequences analyzed may be a mixture of paralogs and alleles. For example, in the case of P. vivax, the three large gene clusters correspond to at least three different paralogs. Yet there may also be allelic sequences within some clusters (e.g., multiple “El Salvador” or sequences within two of the clusters). Thus, it is difficult to choose any one of the aforementioned explanations to the complete exclusion of the others without the aid of genome mapping or sequence data. For this reason, an analysis of the 18S genes from the completed genome sequence of P. falciparum is invaluable. The exact number of 18S genes in that genome and their precise chromosomal locations are known, which removes any doubt that the sequences are distinct genes rather than allelic copies of the same gene. This allows a thorough examination of gene divergence and phylogeny for the 18S coding regions and their 5′ and 3′ flanking regions in this species. It also facilitates analyses designed to detect recombination owing to gene conversion or unequal crossover.
An examination of Kimura (1980) nucleotide distances (d) between P. falciparum genes shows that the 18S coding regions of the chromosome 11 and 13 genes differ by only one nucleotide (d = 0.001). These coding regions in turn differ from the chromosome 1 coding region by only five (d = 0.002) and six nucleotides (d = 0.003), respectively. All three of these coding regions are highly divergent (between 199 [d = 0.104] and 130 [d = 0.105] nucleotide differences) from the coding regions of the chromosome 5 and 7 genes, which show 25 nucleotide differences (d = 0.018) from each other. Thus, certain pairs of 18S genes appear to be homogenized. The question raised by this observation is whether the apparent homogenization is the result of recombination or instead reflects sequence conservation resulting from purifying selection.
To answer this question, the 5′ and 3′ flanking regions of each P. falciparum gene were analyzed. The 3′ flanking region corresponds to the internal transcribed spacer 1 (ITS1), which lies between the 18S and 5.8S genes. The divergence among the 5′ and 3′ flanking sequences greatly exceeds that among the 18S coding sequences. The chromosome 11 and 13 sequences differ by 60 nucleotides (d = 0.109) in the 5′ flanking region and by 3 nucleotides (d = 0.008) in the 3′ flanking region. These genes differ from the chromosome 1 gene by 185 (d = 0.427) and 191 (d = 0.438) nucleotides, respectively, in the 5′ flanking region and by 24 (d = 0.072) and 26 (d = 0.079) nucleotides, respectively, in the 3′ flanking region. The three genes differ even more from those on chromosomes 5 and 7. These comparisons show between 234 (d = 0.624) and 268 (d = 0.712) differences in the 5′ flanking region and between 53 (d = 0.186) and 62 (d = 0.224) in the 3′ flanking region. The chromosome 5 and 7 coding regions are relatively similar (d = 0.018), yet the genes differ by 30 nucleotides (d = 0.052) in the 5′ flanking region and by 16 nucleotides (d = 0.046) in the 3′ flanking region. As a whole, these divergence data strongly suggest that recombination does not influence 18S genes in P. falciparum. Had recombination been acting, it would have created a mosaic of distinct phylogenetic histories across these regions. The topologies of the trees reconstructed from the flanking and coding regions (fig. 4) would then differ from one another. They do not. Consistent with this, Bayesian analyses of recombination conducted with the computer program TOPALi found no evidence of recombination, either in a concatenation of all three regions or when each region was analyzed separately. In summary, these results suggest that recombination does not act on P. falciparum 18S genes or, if it does, that its effect is negligible.
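The topological argument above can be made quantitative with the Robinson-Foulds distance. This is the number of bipartitions (splits) found in one tree but not the other, so a value of 0 means the coding-region and flanking-region trees agree. For brevity, the sketch below encodes trees as nested Python tuples, which is an assumption; a real analysis would parse the Newick output of the tree-building program. The five-gene topology is one consistent with the coding-region distances reported above, and the flanking-region tree is given the same topology, as the text reports. The "mosaic" tree is a hypothetical example of what recombination between regions could produce. This is a topology comparison, not the Bayesian recombination test implemented in TOPALi.

```python
def leaves(tree):
    """Set of taxon names below a node (a str is a leaf, a tuple an internal node)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Nontrivial bipartitions of the tree, treated as unrooted.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same bipartition is recorded identically wherever the root falls.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        if 2 <= len(clade) <= len(taxa) - 2:
            found.add(clade if ref not in clade else taxa - clade)
        return clade

    walk(tree)
    return found


def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(t1) ^ splits(t2))


# Genes labelled by chromosome; topology consistent with the distances above.
coding = (("chr11", "chr13"), "chr1", ("chr5", "chr7"))
flanking = (("chr13", "chr11"), ("chr7", "chr5"), "chr1")  # same topology, reordered
mosaic = (("chr11", "chr5"), "chr1", ("chr13", "chr7"))    # hypothetical incongruent tree
print(robinson_foulds(coding, flanking))  # 0
print(robinson_foulds(coding, mosaic))    # 4
```

Splits are stored in a root-independent form, so the comparison does not depend on where the nested tuples happen to be rooted.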
Collectively, these results indicate (1) that the high level of nucleotide sequence identity observed between the coding regions of P. falciparum genes is due to purifying selection, (2) that gene conversion (for which no evidence was found here) has not been important over the evolutionary history of the P. falciparum 18S genes, and (3) that the duplication events giving rise to these genes are not recent. Thus, this study found no evidence of extensive homogenization of 18S genes owing to concerted evolution, either in this species (Enea and Corredor 1991; Corredor and Enea 1994) or in any other species examined here. Instead, the results support a model of birth-and-death evolution under strong purifying selection.
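The contrast behind conclusion (1) can be tabulated directly from the Kimura distances reported above. Only the P. falciparum gene pairs whose values are given individually are included; pairs reported only as ranges are omitted. The Python sketch below divides the coding-region distance by each flanking-region distance. Ratios well below 1 mean the coding regions have diverged far less than the sequences that surround them. This is a descriptive summary that assumes the flanking regions are less constrained, not a formal test of selection, and the dictionary layout is illustrative.

```python
# Kimura (1980) distances (d) reported in the text for pairs of
# P. falciparum 18S genes, labelled by chromosome.
REPORTED = {
    ("chr11", "chr13"): {"coding": 0.001, "5prime": 0.109, "3prime": 0.008},
    ("chr1", "chr11"): {"coding": 0.002, "5prime": 0.427, "3prime": 0.072},
    ("chr1", "chr13"): {"coding": 0.003, "5prime": 0.438, "3prime": 0.079},
    ("chr5", "chr7"): {"coding": 0.018, "5prime": 0.052, "3prime": 0.046},
}


def coding_to_flank_ratios(table):
    """Map each gene pair to (coding/5' flank, coding/3' flank) distance ratios."""
    return {
        pair: (d["coding"] / d["5prime"], d["coding"] / d["3prime"])
        for pair, d in table.items()
    }


for pair, (r5, r3) in coding_to_flank_ratios(REPORTED).items():
    print(f"{pair[0]} vs {pair[1]}: coding/5' = {r5:.3f}, coding/3' = {r3:.3f}")
```

Every ratio is below 0.4. The largest ratios (about 0.35 and 0.39) belong to the chromosome 5 and 7 pair.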
Rapid Locus Turnover in C. parvum
In most cases under the birth-and-death model, genes are shared between species for prolonged periods (Nei, Gu, and Sitnikova 1997; Takahashi, Rooney, and Nei 2000). Yet there are some exceptional cases in which rapid locus turnover occurs. The MHC class I genes of callitrichine New World monkeys (Cadavid et al. 1997) and the eosinophil-associated RNase genes of rodents (Zhang, Dyer, and Rosenberg 2000) are good examples of multigene families that experience rapid gene turnover due to birth-and-death evolution. In these cases, frequent gene duplication and loss have led to the creation of species-specific gene clusters. Consequently, few or no genes are shared between species.
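A minimal, hypothetical model shows why fast turnover erases shared genes. Suppose each ancestral locus is deleted independently at a constant rate μ (`loss_rate`) in each descendant lineage. A locus then survives for time t in one lineage with probability exp(-μt) and in both lineages with probability exp(-2μt). The expected number of loci still shared therefore falls exponentially with time since divergence, while every locus born after the split is species-specific by construction. The Python sketch below encodes that expectation. The constant-rate assumption and the function names are illustrative, not estimates for any real gene family.

```python
import math


def expected_shared_loci(n_ancestral, loss_rate, time_since_split):
    """Expected number of ancestral loci retained in BOTH descendant species,
    assuming independent deletion at a constant per-locus rate in each lineage."""
    if loss_rate < 0 or time_since_split < 0:
        raise ValueError("rates and times must be non-negative")
    return n_ancestral * math.exp(-2.0 * loss_rate * time_since_split)


def half_sharing_time(loss_rate):
    """Time after which half of the ancestral loci are expected to be shared."""
    return math.log(2.0) / (2.0 * loss_rate)


# Same divergence time, slow versus rapid locus turnover (arbitrary units).
for rate in (0.01, 0.1):
    print(rate, round(expected_shared_loci(5, rate, 20), 2))
```

Consider five ancestral loci and 20 time units. The expectation drops from about 3.35 shared loci at a loss rate of 0.01 to about 0.09 at 0.1, which is the "few or no genes shared" pattern described above.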
The results from this study suggest that 18S genes in some apicomplexans may also undergo rapid birth-and-death evolution. This is best shown through an analysis of C. parvum sequences. In several cases, it was possible to determine the host-species of the C. parvum strains that produced the 18S sequences (fig. 3). The sequences that correspond to the human host-species are separable into two classes known as type I and type II. These types produce distinct schizonts that are distinguishable on the basis of their role in reproduction: type I schizonts are involved in asexual reproduction, whereas type II schizonts are involved in sexual reproduction (reviewed in Laurent et al. 1999). In addition, the former have been found only in human hosts, whereas the latter have been found in humans as well as in other animals (Gibbons et al. 1998).
The phylogeny in figure 3 shows that type I and type II 18S sequences form two distinct clusters that are in turn distinct from the other C. parvum 18S sequences. These latter sequences come from strains with nonhuman host-species (listed in parentheses next to each taxon in fig. 3). This pattern of clustering by host-species is explainable by host-specific adaptation, which has been described for other kinds of C. parvum molecular and biochemical data (Xiao et al. 2002; Gibbons-Matthews and Prescott 2003). In such cases, a specific molecular or biochemical genotype is associated solely with a unique host-species. The reasons for host-specific adaptation are unclear, but they may have to do with antigenic variation and evasion of the host immune system. Whatever the causes, rapid birth-and-death evolution would certainly facilitate, if not enhance, the process of host-specific adaptation. Rapid gene duplication and loss would aid the acquisition of a set of genes optimally adapted to a particular host-species. It would also eventually ensure that genes particular to one C. parvum strain are not shared with other strains.
This hypothesis can be tested through a phylogenetic analysis of 18S genes from the recently completed genome of a C. parvum type II strain (Abrahamsen et al. 2004) in conjunction with the sequences of other strains. There are five 18S rRNA genes in C. parvum (Le Blancq et al. 1997), and five genes were reported in the published type II genome. Unfortunately, only a small fragment (approximately 125 bp) of one of those genes (cgd7_5535) was sequenced, while another (cgd2_1375) appears to have been misidentified as an 18S gene. That gene shares short stretches of sequence similarity with the other C. parvum sequences analyzed in this study. However, it is so divergent from the other 18S sequences that it cannot be aligned reliably. Furthermore, when the sequence was searched against GenBank using BLAST, the closest matches were human oncogenes. Thus, only three of the 18S genes from the complete type II genome are useful for our purposes. These three genome sequences cluster with other type II sequences (fig. 3) but separately from the type I sequences, indicating that the genes originated after the divergence of the respective strains from which they came. These results indicate that type I and type II genes undergo rapid turnover (i.e., rapid duplication and loss). It should be noted, however, that support for distinct type I and type II clades is not very high (bootstrap values of 66% and 60%, respectively). Yet this is not unexpected if the two groups separated recently. It should also be pointed out that the remaining two genes from the type II genome are not yet available for analysis. Until they are, it cannot be said that all genes from the type II genome cluster apart from the type I sequences. Clearly, more studies are needed to investigate this problem. Nevertheless, the results described above indicate that 18S genes in C. parvum undergo rapid duplication and loss.
The “rule” regarding rRNA gene copy diversification is that it does not occur within a species. All copies of a given rRNA gene must remain interchangeable to maintain the structural and functional homogeneity of the rRNA products that they encode (Coen, Strachan, and Dover 1982; Nei 1987; Li 1997; Graur and Li 2000). Consequently, it is commonly held that the existence of divergent gene copies would skew the rate of ribosome synthesis from its optimum and reduce fitness. Nevertheless, Plasmodium, Cryptosporidium, and other apicomplexans produce distinct rRNA “types.” These types differ in their expression pattern and in the regions that control mRNA decoding and translational termination (Gunderson et al. 1987; Le Blancq et al. 1997; Rogers et al. 1996; Li et al. 1997). It has been shown that differences between rRNA genes change the biology of P. falciparum at different stages of its development. These changes respond to the need for unique adaptation and immune-evasion strategies in different hosts (Velichutina et al. 1998). However, the evolutionary mechanisms that act on these genes remain to be elucidated. These genes clearly do not evolve in concert (Rogers et al. 1995), unlike the rRNA genes of virtually all other eukaryotes.
This study shows that apicomplexan 18S rRNA genes evolve according to a birth-and-death model under strong purifying selection. Purifying selection ensures that rRNA gene copies maintain their functional integrity in spite of their independent evolution from one another. The birth-and-death model produces differential rates of duplication and loss. These explain why some rRNA genes are shared between species and maintained for long periods of evolutionary time, while others appear to be recent gene duplicates or to have been lost from the genomes of other species (figs. 2 and 3). Because of this ongoing turnover through repeated duplication and loss, no single 18S gene copy is found in all apicomplexan species. Yet how rapid is the process of gene turnover among apicomplexan 18S rRNA genes?
The results presented here indicate that 18S gene turnover in C. parvum is fairly rapid and that this is probably tied to host-specific adaptation. It has been known for some time that there must be an underlying reason for the production of distinct C. parvum molecular and biochemical genotypes that have led to their classification as type I or type II (Gibbons-Matthews and Prescott 2003). What is unclear is whether type I and type II genotypes indicate that the cells from which the sequences come are distinct strains. The phylogenetic analyses in this study (fig. 3) indicate that they do represent distinct strains. Type I has been found only in humans, whereas type II has been found in both humans and nonhuman species. It is therefore likely that type I represents a uniquely human-adapted strain. This information should prove useful in epidemiological studies of both human and veterinary cryptosporidiosis.
The results concerning rapid gene turnover in C. parvum are particularly interesting in light of the observation that T. gondii 18S rRNA genes do not form species-specific clusters (fig. 3) despite being organized in a tandem array (Gagnon, Bourbeau, and Levesque 1996). This unusual situation warrants further study, because concerted evolution is supposed to be the rule among multigene family members arranged in tandem arrays (Nei 1987; Li 1997; Graur and Li 2000). Although it cannot be shown with the currently available data, this result may also indicate rapid locus turnover in different strains of this species. It will therefore be interesting to examine T. gondii 18S genes more thoroughly once this species' genome sequencing project is complete, provided that the rRNA gene cluster is sequenced.
Unfortunately, rRNA genes are not sequenced in their entirety in most genome projects if they are organized in large clusters. One reason for this is the practical difficulty of sequencing through a cluster of highly similar genes without inadvertently sequencing the same gene repeatedly. Another reason is the general presumption that all rRNA genes in a given genome are identical (or nearly so). This leads many to believe that rRNA clusters are uninteresting in terms of their molecular or genomic evolution and thus need not be sequenced in full. Nevertheless, this study and others (Carranza et al. 1996; Carranza, Baguñà, and Riutort 1999) show that the evolutionary genomics of rRNA genes is more complex than is currently assumed for many species. For instance, the 5S rRNA genes of many eukaryotes are organized in a tandem array, but in some species of fungi they are dispersed across the genome (e.g., Wood et al. 2002). In many of these species, the 5S genes appear to be divergent from one another (unpublished data), suggesting that they might also undergo birth-and-death evolution. Clearly, these observations and the findings of this study indicate that the evolutionary genomics of rRNA genes deserves closer scrutiny than it currently receives.
Laura Katz, Associate Editor
I thank L. Katz, C. P. Kurtzman, M. Nei, T. J. Ward, J. Zhang, and two anonymous reviewers for comments on the manuscript. The mention of firm names or trade products does not imply that they are endorsed or recommended by the U.S. Department of Agriculture over other firms or similar products not mentioned.