The complete nucleotide sequence of the mitochondrial genome of a very primitive unicellular red alga, Cyanidioschyzon merolae, has been determined. The mitochondrial genome of C.merolae contains 34 protein-coding genes, including unidentified open reading frames (ORFs): three subunits of cytochrome c oxidase, apocytochrome b, three subunits of F1F0-ATPase, seven subunits of NADH ubiquinone oxidoreductase, three subunits of succinate dehydrogenase, four proteins implicated in c-type cytochrome biogenesis, 11 ribosomal proteins and two unidentified ORFs. It also contains three rRNA genes and 25 tRNA genes. The G+C content of the genome is 27.2%, and genes are encoded on both strands. At 32 211 bp, the genome is comparatively small for a plant mitochondrial genome. It resembles plant mitochondrial genomes in gene content, because it contains several ribosomal protein genes and ORFs shared with other plant mitochondrial genomes. In contrast, it resembles animal mitochondrial genomes in organization, because it has very short intergenic regions and no introns. The gene set of this mitochondrial genome is a subset of that of Reclinomonas americana, a freshwater protozoan. These results suggest that plant mitochondria share a common ancestor with other mitochondria and that most genes were lost from the mitochondrial genome at a fairly early stage of plant evolution.
Mitochondria are thought to be derived from eubacterial endosymbionts because they have their own DNA and their own machinery for gene expression. The mitochondrial genomes of the three major kingdoms, Animalia, Eukaryomycota and Plantae, differ in character. Animal mitochondrial genomes are mostly 16–19 kb in size and contain no introns (1). The complete nucleotide sequences of many animal mitochondrial genomes have been determined (e.g. Homo sapiens and Xenopus laevis; 2,3). Fungal mitochondrial genomes range from 17 to 176 kb (4). They encode a few more genes than animal mitochondrial genomes, but the main reason for the variation in size is not differences in coding capacity but the sizes of introns and spacer regions. The size of plant mitochondrial genomes (in this paper, ‘plants’ refers to both plants and algae) is extremely variable, ranging from 16 to 2400 kb (5). Variation in gene content, molecular structure and the lengths of spacer regions and introns are the major characteristics of plant mitochondrial genomes. Among plants, complete mitochondrial genome sequences have been reported for several species: Arabidopsis thaliana (6), Chlamydomonas eugametos (7), Chlamydomonas reinhardtii (8), Chondrus crispus (9), Prototheca wickerhamii (10) and Marchantia polymorpha (11).
To shed light on the origin and relationships of plant mitochondria, the mitochondrial genome of a very primitive photosynthetic organism should be analyzed and compared with those from other sources. One good candidate for the most primitive alga is Cyanidioschyzon merolae. This alga is considered to be among the most primitive plants on the basis of observations such as its mode of cell division, the localization of its plastid nuclei and the small size of its nuclear genome (12,13). Molecular phylogenetic analyses also support the primitiveness of this alga (14,15). In this paper, we determined the complete nucleotide sequence of the mitochondrial genome of the unicellular red alga C.merolae, compared the genes it contains with those from other sources and discuss the evolution of mitochondrial genomes.
Materials and Methods
Growth conditions and isolation of mitochondrial DNA
The mitochondrial DNA was digested with the restriction endonucleases EcoRI and HindIII, and the resulting fragments were cloned into pBluescript II SK+ (Stratagene, CA) using Escherichia coli XL1-Blue as the host bacterium. Exonuclease III and mung bean nuclease digestion (Stratagene, CA) was used to create a series of overlapping deletions of each mitochondrial insert. Nucleotide sequences were determined on both strands by the chain termination method (18) with the Taq Dye Terminator Sequencing Kit (Applied Biosystems, CA).
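The fragments expected from complete EcoRI and HindIII digestion of a circular molecule can be predicted from its sequence. The Python sketch below is only an illustration under stated assumptions (complete digestion, a circular template, cut positions reported on the top strand); the enzyme table and function names are hypothetical and were not part of the workflow described here.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# EcoRI cuts G^AATTC and HindIII cuts A^AGCTT, i.e. after the first base of the site.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}


def cut_positions(seq: str, enzymes: list[str]) -> list[int]:
    """Return sorted 0-based top-strand cut positions in a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    positions = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Append the first len(site) - 1 bases so that sites spanning the origin are found.
        extended = seq + seq[: len(site) - 1]
        # The lookahead lets overlapping occurrences be reported.
        for m in re.finditer(f"(?={site})", extended):
            positions.add((m.start() + offset) % n)
    return sorted(positions)


def circular_fragments(seq: str, enzymes: list[str]) -> list[int]:
    """Sorted fragment lengths after complete digestion of a circular molecule."""
    cuts = cut_positions(seq, enzymes)
    n = len(seq)
    if not cuts:
        return [n]  # uncut circle
    # Distance from each cut to the next, wrapping past the origin; one cut linearizes the circle.
    return sorted(((cuts[(i + 1) % len(cuts)] - c) % n) or n for i, c in enumerate(cuts))
```

Because both recognition sequences are palindromic, scanning one strand finds every site; an enzyme with a non-palindromic site would also require a scan of the reverse complement.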
Computer-aided sequence analysis
Open reading frames (ORFs) and tRNA genes were detected with the DNASIS software package (Hitachi, Tokyo, Japan). Similarity searches of the putative ORFs and tRNA sequences against the GenBank and SwissProt databases were performed with the BLAST program (19) via the GenomeNet service. Similarity searches were also performed against the mitochondrial ORFs of Acanthamoeba castellanii (20), A.thaliana (6), C.crispus (9), M.polymorpha (11), P.wickerhamii (10) and Reclinomonas americana (21) with the FASTA program (22) for the Power Mac.
Construction of phylogenetic tree
A phylogenetic tree based on the amino acid sequence deduced from the nucleotide sequence of cox3 was constructed by the maximum likelihood method (23) with the PUZZLE program for the Power Mac (24) using the JTT model of sequence evolution (25) and 1000 puzzling steps. The following database entries of amino acid sequences were used in the calculation: A.castellanii, A.thaliana (Y08501), C.crispus (Z47547), C.merolae (this study), Cyanidium caldarium (Z48930), H.sapiens (D38112), M.polymorpha (M68929), P.wickerhamii (U02970) and R.americana (AF007261).
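PUZZLE infers trees by maximum likelihood under the JTT model, which is beyond a short example. As a much simpler illustration of how aligned cox3 protein sequences are turned into pairwise evolutionary distances, the sketch below computes the proportion of differing residues (p-distance) over gap-free columns and its Poisson correction, d = −ln(1 − p). This is a swapped-in distance method, not the method used for the tree in this study; the function names and the dictionary input format are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over aligned columns without gaps ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p); undefined for p >= 1."""
    return -math.log(1.0 - p)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """Poisson-corrected distances for every pair of taxa in an alignment."""
    return {
        (s, t): poisson_distance(p_distance(aln[s], aln[t]))
        for s, t in combinations(sorted(aln), 2)
    }
```

Distances of this kind could feed a distance method such as neighbor joining; they do not reproduce the likelihood-based quartet puzzling performed by PUZZLE.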
Results and Discussion
Physical characterization of the mitochondrial genome of C.merolae
The restriction map of the ∼32 kb mitochondrial genome of C.merolae strain 10D was described in a previous report (12). We cloned and sequenced the entire mitochondrial DNA. The DNA is a circular molecule of 32 211 bp, confirming the earlier results. The overall G+C content is 27.2%. This base composition is comparable with those of P.wickerhamii (25.8%), C.crispus (27.9%), R.americana (26.1%) and A.castellanii (29.4%), and lower than those of M.polymorpha (42.4%) and A.thaliana (44.8%).
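The G+C content quoted above is the fraction of G and C among all bases. A minimal Python sketch, assuming that ambiguous symbols such as N are excluded from the denominator (the text does not say how, or whether, any were handled):

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt
```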
Gene content and overall organization
The gene organization deduced from the nucleotide sequence is shown in Figure 1. The mitochondrial genome of C.merolae contains 34 protein-coding genes, three rRNA genes and 25 tRNA genes, which are summarized in Table 1. None of the genes contains an intron, whereas some of the mitochondrial genes of C.crispus, P.wickerhamii, M.polymorpha, A.thaliana, A.castellanii and R.americana contain one or more introns.
Spacer regions between adjacent genes are very short: intergenic regions account for only 4.5% of the genome, reflecting its high coding density. The absence of introns and the very high coding density are features shared with the mitochondrial genomes of animals (1).
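The intergenic fraction can be computed from gene coordinates by counting the positions covered by at least one gene, so that overlapping genes are not counted twice. The sketch below assumes 1-based inclusive coordinates and allows a gene to span the origin of the circular genome (end < start); the function names are hypothetical, and real coordinates would come from the annotation in Figure 1.

```python
def coding_coverage(genome_length: int, genes: list[tuple[int, int]]) -> int:
    """Number of positions covered by at least one gene on a circular genome.

    Each gene is (start, end), 1-based and inclusive; end < start means the
    gene runs through the origin.
    """
    covered = [False] * genome_length
    for start, end in genes:
        pos = start - 1  # convert to a 0-based index
        while True:
            covered[pos] = True
            if pos == end - 1:
                break
            pos = (pos + 1) % genome_length  # wrap around the origin
    return sum(covered)


def intergenic_fraction(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of the genome not covered by any gene."""
    return 1.0 - coding_coverage(genome_length, genes) / genome_length
```

For the C.merolae genome, an intergenic fraction of 4.5% corresponds to roughly 1450 of the 32 211 bp.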
There are five regions of overlapping genes: nad6-trnL(uag), yejR-yejW, yejV-rpl20, orf267-trnE(uuc) and rps12-trnA(ugc). The entire trnE gene lies within the coding region of orf267. Each overlap falls into one of two categories: (i) a protein gene and a tRNA gene on the same strand, or (ii) two genes on opposite strands.
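Once genes are annotated with strand and type, overlaps such as these can be found and sorted into the two categories automatically. A minimal sketch, assuming linear 1-based inclusive coordinates (genes spanning the origin are not handled) and a hypothetical Gene record; a gene lying entirely inside another, like trnE within orf267, is reported with an overlap equal to its own length.

```python
from itertools import combinations
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    kind: str    # "protein", "tRNA" or "rRNA"


def overlaps(genes):
    """Yield (name_a, name_b, overlap_bp, category) for every overlapping pair."""
    for a, b in combinations(genes, 2):
        bp = min(a.end, b.end) - max(a.start, b.start) + 1
        if bp <= 0:
            continue  # the two genes do not share any position
        if a.strand != b.strand:
            category = "opposite strands"
        elif {a.kind, b.kind} == {"protein", "tRNA"}:
            category = "protein/tRNA, same strand"
        else:
            category = "other"
        yield a.name, b.name, bp, category
```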
In the mitochondrial genome of C.merolae, genes and ORFs are arranged in four major gene clusters (Fig. 1). The largest cluster extends from orf267 to cox1 and contains genes for rRNAs, cytochrome oxidase subunits, succinate dehydrogenase subunits, tRNAs and others. The second largest extends from yejR to nad3 and contains many genes for NADH dehydrogenase subunits. The third extends from trnS(uga) to rpl20 and contains many ribosomal protein genes. The smallest cluster contains yejV and yejW, which had previously been found only in the R.americana mitochondrial genome. By comparison, genes and ORFs are encoded in two major clusters in C.crispus and P.wickerhamii, one cluster in A.castellanii and four clusters in R.americana.
In the mitochondrial genome of C.merolae, intergenic regions are short (<70 bp) except for three: those between cox1 and nad3 (107 bp), trnR and atp9 (118 bp) and rps3 and trnQ (94 bp) (Fig. 2). The cox1-nad3 and trnR-atp9 regions lie diametrically opposite each other on the circular genome, and each contains two putative stem-loop structures. The cox1-nad3 spacer, which is the junction of two clusters encoded on opposite strands, contains two stable stem-loops with free energies of −130.0 and −263.1 kJ/mol, respectively. The G+C content of this region is 38%, considerably higher than the average for the mitochondrial genome of C.merolae (27.2%). The trnR-atp9 spacer also contains two putative stem-loops (Fig. 2b), between which lies a 7 bp direct repeat (5′-TTAAACC-3′). The G+C content of this region is 18%, about average for a spacer region of the C.merolae mitochondrial genome. The free energies of these two stem-loops are −50.3 and −66.2 kJ/mol, respectively, i.e. less negative than those of the two cox1-nad3 stem-loops, so they are predicted to be less stable. No other stem-loop structures are predicted in the intergenic sequences. These observations suggest that these two regions may play a role in DNA replication or transcription.
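Candidate stem-loops of the kind described above correspond to inverted repeats separated by a short loop. The Python sketch below finds only perfect inverted repeats with a fixed stem length; it ignores mismatches, G-U pairs and bulges and computes no free energies, so it is a crude pre-screen and not the energy-based folding that yields values such as −130.0 kJ/mol. The stem and loop limits are illustrative assumptions, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.upper().translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 10):
    """Return (stem_start, loop_length) for perfect inverted repeats.

    A hit is seq[i:i+stem] followed, after min_loop..max_loop bases, by its
    reverse complement. Mismatches, G-U pairs and bulges are not allowed.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        arm = revcomp(seq[i:i + stem])  # sequence the second arm must match
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the second arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == arm:
                hits.append((i, loop))
    return hits
```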
The rps3-trnQ spacer region is the junction of two clusters encoded on opposite strands, namely the cluster from 5′-rps3 to rpl20-3′ and the cluster from 5′-orf267 to cox1-3′. This region contains no typical inverted repeat, but it does contain a purine-rich stretch of 20 bp (Fig. 2c, underlined sequence) comprising 13 guanines, six adenines and one cytosine.
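A purine-rich stretch such as the one in the rps3-trnQ spacer (19 purines in 20 bp) can be located with a sliding window. In this sketch the window size and threshold are illustrative choices matched to that example, not criteria taken from the study; the function name is hypothetical.

```python
def purine_rich_windows(seq: str, window: int = 20, min_purines: int = 19):
    """Return (start, purine_count) for every window with at least min_purines A/G."""
    seq = seq.upper()
    hits = []
    count = sum(b in "AG" for b in seq[:window])  # purines in the first window
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one base: add the entering base, drop the leaving one.
            count += (seq[start + window - 1] in "AG") - (seq[start - 1] in "AG")
        if count >= min_purines:
            hits.append((start, count))
    return hits
```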
Phylogenetic relationship of the mitochondrion
A phylogenetic tree was inferred from the deduced amino acid sequences of cox3 by the maximum likelihood method (23) (Fig. 3). We chose cox3 because it is known to yield a robust mitochondrial phylogeny (26). Cyanidioschyzon merolae falls within the rhodophyte branch together with C.caldarium and C.crispus. Chlorophytes (sensu lato, i.e. including land plants) and rhodophytes each form their own group. This result suggests that plant mitochondria are monophyletic.
Protein coding genes
We searched for ORFs longer than 60 codons beginning with an ATG or GTG codon and detected 34 ORFs. Genes were identified by similarity searches against the databases. All of the protein-coding ORFs start with ATG, and the sequence comparisons give no indication of RNA editing.
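The ORF criteria above (more than 60 codons, ATG or GTG start) can be expressed as a scan over the three reading frames of each strand. This sketch assumes the stop codons of the universal code (TAA, TAG, TGA), consistent with the code usage discussed in the tryptophan section, counts codons from the start codon up to but excluding the stop, and treats the sequence as linear, so ORFs crossing the origin of the circular genome would be missed. It is an illustration, not the DNASIS procedure used in this study.

```python
STOPS = {"TAA", "TAG", "TGA"}  # universal code: TGA is a stop, not Trp
STARTS = {"ATG", "GTG"}
COMP = str.maketrans("ACGT", "TGCA")


def find_orfs(seq: str, min_codons: int = 61):
    """Return (strand, start, end) for ORFs on both strands of a linear sequence.

    Coordinates are 0-based, half-open and refer to the strand searched (the
    reverse complement for "-"). An ORF runs from the first ATG/GTG after the
    previous in-frame stop to the next in-frame stop, which is included in the
    coordinates; min_codons counts codons excluding the stop.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", seq.translate(COMP)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i  # first start codon in this open stretch
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3))
                    start = None
    return results
```

The default min_codons=61 reads "longer than 60 codons" as at least 61 codons excluding the stop; whether the original search counted the stop codon is not stated.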
The mitochondrial genome of C.merolae contains a standard set of mitochondrial protein coding genes, namely the genes for subunits of cytochrome oxidase, apocytochrome b, ATP synthase and NADH dehydrogenase complexes. The genome also contains additional protein coding genes: genes involved in cytochrome c biogenesis, genes for succinate dehydrogenase subunits, genes for ribosomal proteins and two other ORFs homologous to other mitochondrial genes (Table 1).
A particular feature of the protein coding genes of C.merolae is the existence of several genes that are not included in the standard set of mitochondrial genes despite the small genome size: three genes for succinate dehydrogenase subunits (sdhB, sdhC and sdhD) and four genes for c-type cytochrome biogenesis (yejR, yejU, yejV and yejW). Among them, yejV (channel subunit of ABC transporter for cytochrome c1) and yejW (ATP-binding subunit of ABC transporter for cytochrome c1) have previously been identified as mitochondrial genes only in R.americana.
In the mitochondrial genome of C.merolae, several functionally related genes are clustered. The succinate dehydrogenase subunit genes sdhB and sdhC, the ATPase subunit genes atp6 and atp8, and the ABC transporter genes yejV and yejW each lie together in the same orientation. The genes for apocytochrome b (cytb) and the cytochrome oxidase subunits (cox1, cox2 and cox3) are on the same strand and close together. Among the genes for NADH dehydrogenase subunits, nad1, nad2, nad3, nad4 and nad5 are grouped together in the same transcriptional orientation, whereas nad4L and nad6 are encoded on the opposite strand, apart from the remaining nad genes.
The mitochondrial genome of C.merolae contains 11 ribosomal protein genes: rpl5, rpl6, rpl14, rpl16, rpl20, rps3, rps4, rps8, rps11, rps12 and rps14. All of them except rpl20 belong to the large ribosomal protein gene clusters that correspond to the str, S10, spc and α operons of E.coli. Eight of the 11 ribosomal protein genes form a single cluster in the mitochondrial genome of C.merolae. The rpl6, rps4 and rps12 genes are encoded on the opposite strand, apart from the main ribosomal protein gene cluster, which suggests that these genes were translocated during the evolution of the mitochondrial genome. The organization of the ribosomal protein gene clusters of E.coli and several mitochondrial genomes is compared in Figure 4.
In E.coli, 33 genes are present in the str, S10, spc and α clusters. The mitochondrial genome of R.americana contains 19 ribosomal protein genes that are present in these clusters; it lacks fus, rpl3, rpl4, rpl7, rpl15, rpl17, rpl22, rpl24, rpl29, rpl30, rpl36, rps5 and rps17, which are commonly found in the ribosomal protein gene clusters of bacteria. Cyanidioschyzon merolae additionally lacks rpl2, rps10, rps13 and rps1. In the rhodophyte lineage, C.crispus and Porphyra purpurea each contain only four ribosomal protein genes from these clusters. The rpl14, rpl5, rps14, rps8 and rpl6 genes, which are present in the mitochondrial genome of C.merolae, have been lost from the mitochondrial genomes of C.crispus and P.purpurea. Because rpl14, rpl5, rps14 and rps8 are contiguous in C.merolae, these four genes might have been lost from the mitochondrial genome in a single event during the evolution of the red algae.
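The argument that four contiguous genes may have been lost in a single event can be made explicit: if gene order is otherwise conserved and each deletion removes one contiguous segment, the minimum number of deletion events equals the number of maximal runs of lost genes. A sketch with a hypothetical function name and placeholder gene names (the real gene orders are shown in Figure 4); a circular gene order would additionally require merging a run at the end with one at the start.

```python
def loss_blocks(gene_order, lost):
    """Group lost genes into maximal runs that are adjacent in gene_order.

    Assuming gene order is otherwise conserved and each deletion removes one
    contiguous segment, len(result) is the minimum number of deletion events.
    """
    blocks, current = [], []
    for gene in gene_order:
        if gene in lost:
            current.append(gene)  # extend the current run of lost genes
        elif current:
            blocks.append(current)  # a retained gene closes the run
            current = []
    if current:
        blocks.append(current)
    return blocks
```

For example, in a hypothetical order g1 to g7 where g2 to g5 and g7 are lost, the result is two blocks, [g2, g3, g4, g5] and [g7], mirroring the case of four contiguous ribosomal protein genes plus one gene located elsewhere.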
In the chlorophyte lineage (sensu lato, as above), M.polymorpha contains 14 ribosomal protein genes and P.wickerhamii contains 11. These results suggest that ribosomal protein genes have been lost from mitochondrial DNA in several independent events. Genes appear to have been lost from the mitochondrial genome faster in rhodophytes than in chlorophytes.
Two ORFs are homologous to ORFs found in other plant mitochondrial genomes. orf171 is homologous to orf183 in M.polymorpha, orf183 in P.wickerhamii, orf183 in C.crispus and orf25 in angiosperms. orf267 is homologous to orf244 in M.polymorpha, orf234.1 in P.wickerhamii, orf262 in C.crispus and orfx in angiosperms.
The protein genes of C.merolae are compared in Table 2 with those of the completely sequenced mitochondrial genomes of C.crispus, P.wickerhamii, M.polymorpha, A.thaliana, A.castellanii, R.americana and H.sapiens. The gene set of the C.merolae mitochondrial genome is a subset of that of R.americana, suggesting that the mitochondria of C.merolae and R.americana share a common ancestor.
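Testing whether one gene set is a subset of another is a direct set operation. The sketch below encodes the 34 protein genes of C.merolae as named in this paper; the helper names are hypothetical, and a real comparison would take the other genomes' gene sets from Table 2. Homologous ORFs carry different names in different genomes (e.g. orf171 here and orf183 in M.polymorpha), so names would need to be harmonized before any comparison.

```python
# The 34 protein genes of C.merolae as named in this paper (cytb = apocytochrome b).
CMEROLAE = {
    "cox1", "cox2", "cox3", "cytb",
    "atp6", "atp8", "atp9",
    "nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6",
    "sdhB", "sdhC", "sdhD",
    "yejR", "yejU", "yejV", "yejW",
    "rpl5", "rpl6", "rpl14", "rpl16", "rpl20",
    "rps3", "rps4", "rps8", "rps11", "rps12", "rps14",
    "orf171", "orf267",
}


def compare_gene_sets(query: set[str], reference: set[str]) -> dict[str, set[str]]:
    """Split two gene sets into shared genes and genes unique to each."""
    return {
        "shared": query & reference,
        "query_only": query - reference,
        "reference_only": reference - query,
    }


def is_subset(query: set[str], reference: set[str]) -> bool:
    """True if every gene in query is also present in reference."""
    return query <= reference
```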
The atp1 gene is absent from the mitochondrial genomes of the rhodophyte lineage. It might have been lost from the mitochondrial genome after rhodophytes separated from chlorophytes.
The mitochondrial genome of P.wickerhamii contains fewer genes than that of M.polymorpha. Prototheca wickerhamii lacks the genes for succinate dehydrogenase subunits and for c-type cytochrome biogenesis, as do A.castellanii and H.sapiens. These genes appear to have been lost from the mitochondrial genome in several independent events during evolution.
The nad1, nad2, nad3, nad4, nad4L, nad5 and nad6 genes appear to belong to the standard set of mitochondrial genes, and C.merolae contains this standard set. Prototheca wickerhamii additionally contains nad7 and nad9; A.castellanii contains nad7, nad9 and nad11; M.polymorpha contains a nad7 pseudogene; and R.americana contains nad7, nad8, nad9, nad10 and nad11. Among rhodophytes, C.crispus likewise contains the standard set. These results suggest that nad7, nad8, nad9, nad10 and nad11 were transferred to the nucleus during the evolution of the rhodophyte lineage.
The yejV and yejW genes lie between two large transcriptional units, one containing the atp and nad genes and the other containing ribosomal protein genes. The yejV and yejW genes are absent from the mitochondrial genome of C.crispus, another member of the rhodophyte lineage.
Among rhodophytes, the mitochondrial genome of C.crispus uses UGA as a tryptophan codon in addition to UGG (9). In contrast, the mitochondrial genomes of P.wickerhamii and M.polymorpha use the universal code, in which only UGG codes for tryptophan. Alignment of the mitochondrially encoded proteins of C.merolae with homologous proteins from other sources shows that no modified code is used in this mitochondrial DNA. Cyanidium caldarium, which is closely related to C.merolae, also uses the universal code (27). These results suggest that UGA was a stop codon in the ancestral rhodophyte mitochondrial genome and was reassigned to tryptophan during evolution in the lineage leading to C.crispus.
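The logic of the code comparison can be sketched as follows: if a genome read UGA as tryptophan, conserved protein genes would contain in-frame TGA codons, and translating them with the universal code would produce internal stops. The Python sketch below translates with the standard code and with a variant that reads TGA as Trp (as in NCBI translation table 4); it is a complementary check under these assumptions, not the alignment-based analysis reported above, and the names are hypothetical.

```python
BASES = "TCAG"
# Amino acids in NCBI codon order (first, second, third base each iterating T, C, A, G).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Variant code that differs from the standard code only at TGA (index 14), read as Trp.
TGA_TRP = STANDARD[:14] + "W" + STANDARD[15:]


def translate(cds: str, table: str) -> str:
    """Translate complete codons of a DNA coding sequence with a 64-letter table."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        a, b, c = (BASES.index(x) for x in cds[i:i + 3])
        protein.append(table[16 * a + 4 * b + c])
    return "".join(protein)


def internal_stops(cds: str, table: str) -> int:
    """Count stop symbols ('*') before the final codon."""
    return translate(cds, table)[:-1].count("*")
```

A set of genes that gives no internal stops under the standard table, with tryptophan residues aligned only to TGG codons, is consistent with the universal code.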
Transfer RNA genes
Twenty-five tRNA genes are present in the mitochondrial genome of C.merolae (Fig. 5). Unlike mammalian mitochondrial tRNAs, but like plant mitochondrial tRNAs, all of them can be folded into the standard cloverleaf structure and show the normal pattern of invariant and semi-invariant residues. Four tRNAs deviate from this pattern, using the conventional numbering system (28): T14 in place of A14 and an A-U pair in place of the pyrimidine 11-purine 24 pair occur in trnG(ucc) and trnfM(cau); a pyrimidine replaces the usual purine at position 26 in trnW(cca); and A replaces the usual pyrimidine at position 32 in trnM(cau). The tryptophan tRNA has the anticodon 5′-CCA-3′ and is therefore expected to read UGG but not UGA.
This set of 25 tRNAs is not sufficient to translate all codons, even taking into account wobble pairing and possible modifications of the anticodons. At least one tRNA gene, trnT for ACN codons, remains to be identified for complete translation of the mitochondrial genetic information of C.merolae (Table 3). In potato and wheat, the mitochondrial tRNA for threonine is encoded in the nucleus (29), and trnT is also absent from the mitochondrial genome of C.crispus. In A.castellanii, Chlamydomonas reinhardtii, M.polymorpha and angiosperms, some of the mitochondrial tRNAs are of plastid origin (30). In C.merolae, the threonine tRNA, which is not encoded in the mitochondrial genome, may be imported from the cytosol or plastids, or generated from another tRNA by partial editing or post-transcriptional modification.
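Codon coverage by a set of tRNAs can be checked mechanically. The sketch below uses simplified wobble rules (G34 reads C/U, C34 reads G, U34 reads A/G, A34 reads U) and ignores anticodon modifications and the expanded reading of unmodified U34 in four-codon boxes, so it is more conservative than the analysis summarized in Table 3. Anticodons are written 5′→3′, as in trnW(cca); the rule table and function names are assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified wobble rules: anticodon base 34 -> codon third positions it reads.
WOBBLE = {"G": "CU", "C": "G", "U": "AG", "A": "U"}
# Standard code in UCAG order; '*' marks the stop codons UAA, UAG and UGA.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codons_read(anticodon: str) -> set[str]:
    """Codons decoded by a 5'->3' anticodon (positions 34, 35, 36)."""
    n34, n35, n36 = anticodon.upper()
    # Antiparallel pairing: 36 pairs with codon base 1, 35 with base 2, 34 wobbles at base 3.
    return {PAIR[n36] + PAIR[n35] + third for third in WOBBLE[n34]}


def unread_sense_codons(anticodons) -> set[str]:
    """Sense codons of the standard code that no anticodon in the set can read."""
    read = set().union(*(codons_read(a) for a in anticodons))
    sense = {"".join(c) for c, aa in zip(product(BASES, repeat=3), STANDARD) if aa != "*"}
    return sense - read
```

With these rules, an anticodon of the form 5′-NGU-3′ is needed to read the threonine codons ACN; if none is supplied, all four ACN codons remain in the unread set.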
Evolutionary implications from the comparison of mitochondrial genomes
Mitochondria are thought to have originated from an α-proteobacterium-like ancestor (30). The putatively most ‘primitive’ mitochondrial genome studied so far is that of R.americana (21), a freshwater protozoan. Its mitochondrial genome encodes a eubacterial-type RNA polymerase and contains several genes not known from other mitochondrial genomes. Acanthamoeba castellanii is an amoeboid protozoan and is also considered to be among the most primitive organisms. Although its mitochondrial genome contains fewer genes than that of R.americana, it shares many genes with the mitochondrial genomes of plants and algae, including C.merolae. Cyanidioschyzon merolae is thought to be among the most primitive plants, and its mitochondrial genome shares many genes with those of higher plants, A.castellanii and R.americana. These results suggest that the mitochondria of plants and animals have a common ancestor, from which their mitochondrial genomes have diverged.
In the rhodophyte lineage, the mitochondrial genome of C.crispus has ‘non-plant’ characteristics (27), such as small size and high coding density. The mitochondrial genome of C.merolae shares these two ‘non-plant’ characteristics.
Among plants and algae, the differences in gene content between C.merolae, P.wickerhamii, M.polymorpha and A.thaliana are rather small, suggesting that the mitochondrial genomes of algae and plants have a common ancestor. In the rhodophyte lineage, the mitochondrial genome of C.crispus contains fewer genes than that of C.merolae. These results suggest that genes were lost from the mitochondrial genome step by step during the evolution of rhodophytes. In the course of this process, UGA was reassigned from a stop codon to a tryptophan codon in the lineage leading to C.crispus. Changes in the genetic code have also been reported in the mitochondria of several green algae (31), suggesting that such changes occurred independently in various mitochondrial lineages.
This work was supported in part by a Grant-in-Aid for Scientific Research to N.O. (no. 085404) and to N.S. (no. 0845254) from the Ministry of Education, Science, Sports and Culture of Japan and a grant to N.O. from Funakoshi & Co. Ltd, Tokyo, Japan.