Abstract

RNase P is the endonuclease that removes 5′ leader sequences from tRNA precursors. In Eukarya, separate RNase P activities exist in the nucleus and mitochondria/plastids. Although all RNase P enzymes catalyze the same reaction, the different architectures found in Eukarya range from ribonucleoprotein (RNP) enzymes with a catalytic RNA and up to 10 protein subunits to single-subunit protein-only RNase P (PRORP) enzymes. Here, analysis of the phylogenetic distribution of RNP and PRORP enzymes in Eukarya revealed 1) a wealth of novel P RNAs in previously unexplored phylogenetic branches and 2) that PRORP enzymes are more widespread than previously appreciated, found in four of the five eukaryal supergroups, in the nuclei and/or organelles. Intriguingly, the occurrence of RNP RNase P and PRORP seems mutually exclusive in genetic compartments of modern Eukarya. Our comparative analysis provides a global picture of the evolution and diversification of RNase P throughout Eukarya.

RNase P is the endonuclease that removes 5′ leader sequences from tRNA precursors, an essential step in tRNA maturation (Lai et al. 2010; Liu and Altman 2010). The virtually ubiquitous enzyme independently originated at least twice in evolution with different architectures. Ribonucleoprotein (RNP) enzymes based on a catalytic RNA molecule (P RNA) represent the more ancient type that is found in all three domains of life. Although their RNA is structurally conserved, their protein partners are highly divergent with a single protein in Bacteria, 4–5 in Archaea, and up to 10 in eukaryal nuclei (Hartmann et al. 2009; Ellis and Brown 2010; Lai et al. 2010; Liu and Altman 2010; Walker et al. 2010). All known nuclear RNase P RNPs are composed of a P RNA of about 350 nt and a set of proteins, always including RPP21/RPR2, RPP29/POP4, RPP30/RPP1, POP5, POP1, RPP20/POP7, and RPP25/POP6 (Hartmann E and Hartmann RK 2003; Rosenblad et al. 2006; Walker et al. 2010). The reasons for the massive increase in the protein moiety of the enzyme in Eukarya as compared with Archaea or Bacteria are poorly understood and have been speculated to be related to added functionality of the eukaryal enzyme (Marvin and Engelke 2009a, 2009b; Jarrous and Gopalan 2010), although recent RNase P replacement experiments do not support such a notion (Weber et al. 2014). Studies of the prevalence of nuclear RNP RNase P subunits in eukaryal genomes are complicated by the presence of a related RNP, RNase MRP, exclusively found in Eukarya and involved in 5.8S rRNA maturation. This RNP enzyme is composed of a structurally related, but nonetheless distinguishable RNA, and a largely overlapping set of proteins (Jarrous and Gopalan 2010; Walker et al. 2010). In fact, it appears that RPP21 is the only protein not shared by the two RNPs, but consistently specific to RNase P.

A fundamentally different type of RNase P is composed of protein only (PROteinaceous RNase P, PRORP) and appears confined to the eukaryal domain. In its simplest form it consists of a single 60-kDa protein, but requires additional subunits in some cases, for example, two other protein components in human mitochondrial RNase P (Holzmann et al. 2008; Gobert et al. 2010, 2013; Gutmann et al. 2012; Taschner et al. 2012; Pinker et al. 2013). The two kinds of RNase P are highly similar in terms of substrate and cleavage specificity, and they were even found to be functionally exchangeable in Escherichia coli and Saccharomyces cerevisiae (Gobert et al. 2010; Taschner et al. 2012; Weber et al. 2014).

The discovery of protein-only RNase P (PRORP) enzymes in Eukarya revealed that the evolution of RNase P is more intriguing and complex than previously thought. This raises the questions of when PRORP appeared during evolution and whether there may still be evolutionary traces of its coexistence with RNP RNase P within the same cellular compartment. Where and how did such a coexistence lead to the divergent specialization and compartmentalization of the different RNase P enzymes? Here, we analyze and compare the prevalence and architectural type of both RNP and PRORP enzymes in Eukarya. We find that PRORP enzymes are widespread among eukaryal lineages and propose reasonable scenarios for the evolution of RNase P in Eukarya.

Results and Discussion

Incidence of Nuclear Ribonucleoprotein RNase P

Here we update the distribution of P RNA and RPP21 (the protein subunit not found in RNase MRP) in eukaryal nuclear genomes, based on previously published studies (Hartmann E and Hartmann RK 2003; Marquez et al. 2005; Piccinelli et al. 2005; Rosenblad et al. 2006) and analyses of newly available genome data, to determine the prevalence of nuclear RNP RNase P in the different branches of Eukarya. The results are summarized in table 1 and the inventory is detailed in supplementary table S1, Supplementary Material online (http://bioinf.pharmazie.uni-marburg.de/supplements/rnase_p_2015/, last accessed September 14, 2015). Notably, we identified a variety of novel P RNAs, including some from hitherto unexplored taxa.

Table 1.

Overview of the occurrence of RNP and PRORP RNase P enzymes in nuclei and organelles throughout the eukaryal tree.

Nucleus Encoded
Organelle Encoded
Nuclear RNase P
Organellar RNase P
RNP RNase P
PRORP
RNP RNase P
SupergroupsSubgroupsRepresentative speciesP RNAP proteinNuclearOrganellarP proteinP RNA
OpisthokontaHolozoaMetazoaAnimaliaBilateriaHomo sapiensnanamaC
RadiataAcropora digitiferanmC
PoriferaAmphimedon queenslandicann?
PlacozoaTrichoplax adhaerensnnmC
ChoanomonadaMonosiga brevicollisnmC
FilastereaCapsaspora owczarzakinmC
IchthyosporeaAmoebidium parasiticumnnmC
Sphaeroforma arcticanmC
NucletmyceaNucleariaNuclearia simplex
FungiMicrosporidiaEncephalitozoon cuniculinnC
ChytridiomycotaSpizellomyces punctatusn−n
BlastocladialesAllomyces macrogynusn
MucoromycotinaRhizopus oryzaennmC
MortierellaceaeMortierella verticillatannmC
DikaryaAscomycotaSaccharomyces cerevisiaenana(m)a(m)aC
BasidiomycotaPostia placentann(m)aC
AmoebozoaDiscoseaAcanthamoeba castellaniinn
ArchamoebaeEntamoeba disparnC
MyxogastriaPhysarum polycephalumn
DictyosteliaDictyostelium discoideumnana
ArchaeplastidaGlaucophytaCyanophora paradoxan′np
RhodophyceaeBangialesPorphyra purpureap
CyanidialesCyanidioschyzon merolaep
FlorideophycidaeChondrus crispusp
PorphyridiophyceaePorphyridium purpureumn′p
Chloroplastida (Viridiplantae)ChlorophytaTrebouxiophyceaeChlorella variabilisnm, pC
ChlorophyceaeChlamydomonas reinhardtiinm, pC
MamiellophyceaeOstreococcus taurinam, pm, pC
CharophytaStreptophytaArabidopsis thaliananam, paC
SARStramenopilesBlastocystisBlastocystis hominis?
LabyrinthulomycetesSchizochytrium aggregatumm
PelagophyceaeAureococcus anophagefferensnm, pC
EustigmatalesNannochloropsis gaditananm, pC
PeronosporomycetesPhytophthora sojaenmC
PhaeophyceaeEctocarpus siliculosusnm, pC
DiatomeaThalassiosira pseudonananm, pC
AlveolataProtalveolataPerkinsus marinus?
DinoflagellataKarenia brevis?
ApicomplexaHaemosporidiaPlasmodium falciparumnnaC
CiliophoraTetrahymena thermophilan
RhizariaCercozoaBigelowiella natansnm, pC
RetariaForaminiferaReticulomyxa filosa?
ExcavataMetamonadaFornicataDiplomonadidaGiardia lambliannC
ParabasaliaTrichomonas vaginalisnC
DiscobaJakobidaReclinomonas americanam
DiscicristataHeteroloboseaNaegleria gruberin
EuglenozoaEugleneaEuglena mutabilis?
TrypanosomatidaTrypanosoma bruceinamaC
Relation to supergroups unclearApusomonadidaThecamonas trahensnnmC
CryptophyceaeGuillardia thetanm, pC
HaptophytaPrymnesiophyceaeEmiliania huxleyim

n, m, p, and a indicate the identification of sequences in the respective phylogenetic subgroup and their predicted or experimentally verified localization to either the nucleus, mitochondria, plastids, or apicoplasts, respectively; (m), the corresponding genes are found in some mitochondrial genomes, but not in all species; ?, nuclear-encoded sequences for which localization predictions could not be obtained; superscript a, lineages for which RNase P enzymes were experimentally validated. ′ and −, P RNA candidates with some (′) or more severe (−) deviation from the consensus. Empty cells correspond to lineages where RNase P-related sequences could not be found. Gray cells correspond to lineages in which the organelles do not have a genome. Light gray cells correspond to lineages for which nuclear genome sequencing projects are not complete, although partial sequence information is available. Finally, C indicates the correlation between the predicted occurrence of a given type of enzyme and the absence of the other one (RNP or PRORP RNase P) in a specific lineage and/or compartment.

In brief, a P RNA and RPP21 are prevalent among the Holozoa subgroup of Opisthokonta. Within metazoans, P RNA candidates were newly identified in the more basal Placozoa, Porifera, and in radially symmetric animals (supplementary figs. S1–S5, Supplementary Material online). In Nucletmycea, P RNAs are identifiable in all branches except for Nuclearia. Among Amoebozoa, nuclear RNP RNase P is generally present. Relative to previous analyses (Marquez et al. 2005; Piccinelli et al. 2005), we predicted additional P RNAs and RPP21 homologs in Archamoebae and Dictyostelia. In contrast, within the photosynthetic supergroup Archaeplastida (plants and algae with chloroplasts of primary endosymbiotic origin), RNP RNase P appears absent from the nuclei of Chloroplastida. However, P RNAs are predicted in glaucophytes and in rhodophytes. In the SAR (Stramenopiles, Alveolata, Rhizaria) group, P RNA and RPP21 were not identified in Stramenopiles, consistent with previous studies (Hartmann E and Hartmann RK 2003; Piccinelli et al. 2005; Rosenblad et al. 2006), but were found in Ciliophora and Apicomplexa genomes (Alveolata). In Excavata, the occurrence of nuclear RNP RNase P is widespread, but appears to have been lost in Euglenozoa. In Haptophyta and Cryptophyceae, P RNA or RPP21 could not be identified; yet, genome information is scarce in these clades and it remains unclear whether this is due to the loss of RNP RNase P or to structurally highly deviant P RNA and RPP21 homologs.

Incidence of Organellar Ribonucleoprotein RNase P

Mitochondria (mt) and plastids (pls) possess their own genomes coding for a complete or partial set of tRNAs. They originated from primary endosymbiosis with an ancestral α-proteobacterium and a cyanobacterium, respectively, yet pls also derive from secondary or tertiary endosymbiosis in various groups. It is thus not surprising to find bacterial-like P RNAs still encoded in some organellar genomes. Organelle RNP RNase P, however, is particularly diverse (Rossmanith 2012) and P RNAs are highly degenerate in some cases (Seif et al. 2005). We have (re)analyzed the occurrence of mt and pls P RNAs in organelle genomes throughout Eukarya as well as the occurrence of RnpA and Rpm2, two proteins of organellar RNP RNase P. The comprehensive list of all identified organellar P RNAs and proteins is given in supplementary table S1, Supplementary Material online, and summarized in table 1.

In short, in the supergroup Opisthokonta, no P RNA gene was found in the mitochondrial genomes of Holozoa. Most mitochondrial P RNA genes were found in the fungal lineage, particularly in Saccharomycetaceae species. Among Archaeplastida, a patchy occurrence of P RNAs was found in organellar genomes of phylogenetically basal algae including Glaucophyta, Rhodophyceae, and Chlorophyta. No P RNA gene was found in Streptophyta. Most, if not all, pls-encoded P RNAs were found in primary photosynthetic Eukarya. In Excavata, P RNAs were only found in jakobid mtDNAs (Seif et al. 2006). Finally, in Amoebozoa and SAR, organellar P RNA appears to be scarce. All in all, organelle P RNA occurrence is patchy. In some phyla, the genes were either lost or their sequences have diverged to an extent that makes them undetectable by the recognition algorithms used here. Protein subunits of these enzymes are even more elusive. The subunits previously identified are bacterial-type RNase P proteins (RnpA) and a pentatricopeptide repeat (PPR) protein called Rpm2, both nuclear encoded and unrelated to PRORP. Within the fungal branch, Rpm2 was shown to be part of mitochondrial RNase P in S. cerevisiae (Morales et al. 1992; Daoud et al. 2012). Close Rpm2 homologs are only found in Saccharomycetales (supplementary fig. S6, Supplementary Material online). In Archaeplastida, no P protein of bacterial origin is encoded in any organellar genome, although rnpA-like genes are encoded in several nuclear genomes in Mamiellophyceae of the Chlorophyta subgroup (Lai et al. 2011), and these RnpA proteins are predicted to localize to organelles (supplementary table S2, Supplementary Material online). Our analysis and three-dimensional structure predictions revealed that these algal RnpAs are characterized by N- and C-terminal extensions not present in bacterial RnpAs (supplementary figs. S7 and Supplementary Data, Supplementary Material online). The function of these extensions is unknown, but they might be involved in specific contacts with algal organellar P RNAs or with yet unidentified proteins.

Incidence of Protein-Only RNase P

Our analyses confirm and substantiate previous observations that a number of eukaryal groups lack RNase P genes for a nuclear and/or organellar RNP enzyme. We thus performed a systematic analysis of the distribution and localization of putative PRORP enzymes to determine whether PRORP could be the RNase P in these lineages/compartments. As a prerequisite, we had to define robust features characterizing PRORP. Candidates were only considered as genuine PRORPs when their architecture included three elements: a specific C-terminal NYN (N4BP1, YacP-like Nuclease) metallonuclease domain presumably originating from the bacterial ribonuclease yacP (Anantharaman and Aravind 2006); an N-terminal α-superhelical domain containing PPR motifs (Small et al. 2004), as revealed by systematic structure predictions; and a bipartite zinc-binding module connecting the two main domains. Further signatures are present in specific phyla. Their occurrence might point to additionally acquired functions or interactions with phylum-specific proteins that remain to be identified (fig. 1).
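The architectural criteria above can be expressed as a simple filter. The following is a minimal sketch, not the procedure used in this study: it assumes that each candidate has already been annotated with domain labels and start positions. The `Domain` record and the labels `PPR`, `ZN1`, `ZN2`, and `NYN` (the last three standing for the two halves of the zinc-binding module and the nuclease domain) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str   # hypothetical label: "PPR", "ZN1", "ZN2", or "NYN"
    start: int  # 1-based start position in the protein sequence


# All four elements must be present for a candidate to qualify.
REQUIRED = {"PPR", "ZN1", "ZN2", "NYN"}


def is_putative_prorp(domains):
    """Return True if the annotation matches the architecture described in the text."""
    first_start = {}
    for d in domains:
        # Keep the most N-terminal occurrence of each domain type.
        first_start[d.name] = min(d.start, first_start.get(d.name, d.start))
    if not REQUIRED.issubset(first_start):
        return False
    # PPR motifs are N-terminal; the NYN domain is C-terminal.
    return first_start["PPR"] < first_start["NYN"]
```

A candidate lacking either half of the zinc-binding module, or carrying the nuclease domain upstream of the PPR motifs, is rejected by this filter. Phylum-specific signatures are deliberately not checked.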

Fig. 1.

Description of the conserved features defining PRORP proteins. (A) Schematic representation of the different domains of PRORP. Sequence logos of residue conservation for the subdomains involved in zinc binding, as well as for a plant-specific glycine-rich insertion and for a “hydrophobic domain” conserved in organisms that contain a plastid (or had contained a plastid) were generated with WebLogo 3. The number of sequences analyzed and the percentage of sequences originating from animals (Metazoa), plants (land plants), or other organisms (Chlorophyta, Stramenopiles, Alveolata, Cryptophyceae, Haptophyta, Rhizaria, Choanoflagellates, Filasterea, Ichthyosporea) are as follows from left to right: Plant-specific insertion: 169 sequences (100% land plants); N-terminal ½ Zn binding domain 1: 275 sequences (1/2 land plants, 1/3 Metazoa, 1/6 others); hydrophobic domain: 138 sequences (60% land plants); C-terminal ½ Zn binding domain 2: 249 sequences (1/3 land plants, 1/3 Metazoa, 1/3 others). OTS, organellar targeting signals (to mitochondria, plastids, or apicoplasts); NLS, nuclear localization signal. (B) Conserved residues present in the PRORP-defining NYN domain signatures, specified for different phyla. The positions of the eight residues constituting part 1 of the NYN signature of PRORP have been numbered as indicated above the first logo. Numbers between the conserved motifs indicate the distance range (in amino acids) that separates the motifs in the different PRORPs analyzed. (C) Three-dimensional structure predictions for N-terminal domains of representative PRORP proteins considered in this analysis. All the putative PRORPs have an α-superhelical domain consistent with the conserved fold of PPR proteins. N-terminal extremities are shown on the left, C-terminal ones on the right.

Based on these common features, we searched for putative PRORP genes in the three domains of life. We confirmed that PRORP proteins are Eukarya specific, exclusively encoded in nuclear genomes and widely distributed, that is, found in four of the five eukaryal supergroups. The full set of putative PRORPs is given in supplementary table S1, Supplementary Material online, and summarized in table 1. Briefly, among Opisthokonta, PRORPs are present in Metazoa and all the associated lineages (Choanomonada, Filasterea, and Ichthyosporea), but absent from fungi and associated lineages. No PRORP could be identified in the supergroup of Amoebozoa. Among Archaeplastida, PRORP was not found in the basal groups such as Glaucophyta and Rhodophyta, but was found in all Chlorophyta and Charophyta as single genes, while in Embryophyta, more than two PRORPs were typically found. In Spermatophyta, PRORP sequences can be subdivided into three evolutionarily distinct clusters that we term cluster I, II, and III (supplementary fig. S9, Supplementary Material online). Most of the species have three PRORPs with one representative of each cluster. However, the Brassicaceae (e.g., Arabidopsis) are an exception, because Arabidopsis PRORP2 and 3 both belong to cluster III. PRORPs are also found in the supergroup SAR: two to three PRORP proteins are encoded in all Stramenopiles. In Alveolata, no genes coding for PRORPs were found in ciliates, but a single gene could be identified in all Apicomplexa genomes. Among Excavata, PRORP is found in the sequenced genomes of some Discoba organisms but not in Metamonada. Although present in Euglenozoa, it is not identifiable in Heterolobosea.

To gain insight into the origin and distribution of PRORP, a phylogenetic analysis was performed. The results suggest an ancient origin of PRORP (supplementary fig. S10, Supplementary Material online). Still, in some instances PRORP might also have spread through horizontal gene transfer (HGT) events such as secondary and tertiary endosymbiosis. This might have happened, for example, in stramenopiles where, within individual species, multiple PRORPs cluster in evolutionarily distinct groups (supplementary fig. S10, Supplementary Material online).

Although the prevalence of PRORP in Eukarya could be established, understanding the distribution of RNP and PRORP in specific compartments requires knowledge of the precise subcellular localization of PRORPs in the respective lineages. To gain such information, we applied localization prediction tools to full-length PRORP sequences. The results are compiled in supplementary table S2, Supplementary Material online, and summarized in table 1. In short, in Opisthokonta, all animal PRORPs are mitochondrial. In green algae, single PRORP genes might encode both nuclear and organellar PRORPs expressed from alternative translation starts. In land plants, cluster III contains nuclear orthologs of PRORP, while cluster I and II PRORPs are predicted to be organellar. In other groups (SAR, Excavata, Cryptophyceae, or Haptophyta), multiple PRORPs can be targeted to mt and nuclei, or a single PRORP can be found in a specific compartment as, for example, in the apicoplast of apicomplexans. Overall, the predicted localizations confirm that PRORP proteins are not restricted to organelles as initially envisaged (Lai et al. 2010), and demonstrate that they are also widespread in nuclei.

Conclusions and Possible Scenarios for the Evolution of RNase P Distribution

In most instances our analyses revealed a correlation between the predicted occurrence of a given type of enzyme (RNP RNase P or PRORP) and the absence of the other one in a specific lineage and/or compartment. The most divergent examples are fungi, where RNP enzymes are active in both mt and nuclei while PRORP is absent, and Streptophyta or Trypanosomatida, where PRORPs are found in organelles and nuclei, whereas RNP genes are absent. Similar correlations are summarized in table 1 for all Eukarya groups.
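The correlation described here, marked C in table 1, amounts to an exclusive-or test per compartment. The sketch below is one illustrative reading of that flag, not the authors' code. The function names and the dictionary layout are assumptions made for this example.

```python
def exclusivity_flag(rnp_present, prorp_present):
    """Return "C" when exactly one enzyme type is predicted in a compartment, else ""."""
    return "C" if rnp_present != prorp_present else ""


def flag_compartments(lineage):
    """Flag each compartment of a lineage.

    `lineage` maps a compartment name to a (rnp_present, prorp_present) pair.
    """
    return {
        compartment: exclusivity_flag(rnp, prorp)
        for compartment, (rnp, prorp) in lineage.items()
    }


# Fungi as described in the text: RNP in nuclei and mitochondria, no PRORP.
fungi = {"nucleus": (True, False), "mitochondria": (True, False)}
print(flag_compartments(fungi))
```

A compartment in which both enzyme types, or neither, are predicted receives no flag, which is how a putative coexistence would stand out.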

Our analysis implies that PRORP might have evolved very early during eukaryal evolution, in an organism at the root of modern Eukarya (fig. 2), although its distribution points to some HGT events as well. It appears likely that the fusion of PPR, NYN, and all the features defining PRORP took place only once during evolution. The RNP and protein-only forms of RNase P thus probably coexisted in an early eukaryote, a functional redundancy that, however, might not have persisted in any organism to the present. We did not find solid evidence for this coexistence within the same compartment, although it cannot be ruled out for some Mamiellophyceae, where isoforms of PRORP might be targeted to both nuclei and organelles while RNP RNase P has been retained in organelles. RNP was kept in some organisms (fungi) or compartments (nucleus of metazoa), and protein-only enzymes were not retained there. In these organisms, RNPs might have gained additional functions that could not be provided by PRORP, for example, as observed in human nuclei with the requirement of RNP RNase P for the formation of RNA polymerase III initiation complexes (Serruya et al. 2015). In contrast, PRORP was kept in other organisms (some chlorophytes, streptophytes, trypanosomatids) or in specific compartments (nucleus of other chlorophytes and mt of metazoans) and RNPs were lost. Similarly, PRORPs targeted to organelles might have coexisted with RNP RNases P encoded in organellar genomes. P RNA genes might have been lost in the course of rearrangements of organellar genomes, consolidating PRORP as the RNase P enzyme in this compartment.

Fig. 2.

Distribution of RNP and PRORP RNase P enzymes in the eukaryal domain of life. Relations between eukaryal groups are schematically indicated according to Petersen et al. (2014). R and P indicate the occurrence of RNP and PRORP RNase P enzymes in the respective groups, based on the study presented here. Crossing out P or R indicates putative evolutionary events associated with the loss of PRORP or (nuclear) RNP RNase P. The question mark indicates an example where limited genomic data prevented conclusions as to the occurrence of the given enzyme type in the respective group. The diagram highlights how the distribution of RNase P seemingly involved multiple events of losses of either PRORP or RNP RNase P.

In animal and plant lineages, RNase P distribution followed two different routes. Unicellular organisms basal to Metazoa (Ichthyosporea, Filasterea, Choanomonada) seem to have retained PRORP proteins for mitochondrial RNase P function and this status was also preserved in all metazoan species. In contrast, unicellular organisms basal to Chlorophyta seem to have initially retained PRORP enzymes only for nuclear RNase P activity. Then, in more recent species of the Chloroplastida lineage, PRORP also took over the organellar RNase P function.

In conclusion, looking at the global picture, PRORP seems to have been an invasive enzyme since its origin, taking over the function of ancestral RNP RNase P in several eukaryal groups, in entire organisms, or in given cellular compartments. The evolutionary trend to replace RNP with PRORP becomes plausible if one considers its capability to instantly replace RNP enzymes in tRNA biogenesis, as experimentally demonstrated for the E. coli and yeast systems (Gobert et al. 2010; Taschner et al. 2012; Weber et al. 2014). This evolution may reflect a still ongoing transition from the RNA to the protein world.

Materials and Methods

Identification of Nuclear-Encoded RNase P RNAs

We identified P RNAs using Infernal (Nawrocki and Eddy 2013) with an E-value threshold of 1 × 108 based on the RFAM 12.0 (Nawrocki et al. 2015) models RF00009 (Nuclear RNase P) and RF01577 (Plasmodium RNase P). In addition, we used the tool Bcheck (Yusuf et al. 2010) with default parameters. The predictions were curated and assessed manually for their conserved core. This ensemble of methods also allows discriminating P RNAs from MRP RNAs.
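As an illustration of the thresholding step, the sketch below filters Infernal hits by E-value. It is a minimal example under stated assumptions, not the pipeline used here: it assumes the default whitespace-delimited `--tblout` layout of `cmsearch`, in which the bit score and E-value are the 15th and 16th fields, and the function name `filter_tblout` is hypothetical. The cutoff is supplied by the caller, and the value in the usage lines is an arbitrary placeholder.

```python
def filter_tblout(lines, max_evalue):
    """Yield (target, query, score, evalue) for hits with E-value <= max_evalue.

    Assumes the default cmsearch --tblout column order; comment lines start with '#'.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split into at most 18 fields; the last one is the free-text description.
        fields = line.split(None, 17)
        evalue = float(fields[15])
        if evalue <= max_evalue:
            yield fields[0], fields[2], float(fields[14]), evalue


example = [
    "#target name accession query name accession mdl ...",
    "chr1 - RNaseP_nuc RF00009 cm 1 303 1000 1320 + no 1 0.55 0.0 85.3 2.1e-15 ! example hit",
    "chr2 - RNaseP_nuc RF00009 cm 1 303 500 790 - no 1 0.48 0.0 12.4 0.5 ? weak hit",
]
# 1e-3 is a placeholder cutoff chosen only for this example.
hits = list(filter_tblout(example, max_evalue=1e-3))
```

Hits passing such a filter would still need the manual curation of the conserved core described above.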

Search for Homologs of the RNP RNase P-Specific Protein Subunit RPP21

We selected reference sequences from several sources: 1) the Rpr2 alignment provided by Rosenblad et al. (2006), 2) the seed alignment provided for the PFAM family PF04032 (RNase P Rpr2/Rpp21/SNM1 subunit domain) (Finn et al. 2011), and 3) WormBase version WS247 (Harris et al. 2010) gene Y37E11B.6 (rpp21). Reference domains were identified in these sequences, and a scoring algorithm based on regular expressions was implemented to detect homologs.
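
The general idea of scoring candidates with regular expressions can be sketched as follows. The motif patterns, weights, and function names below are hypothetical placeholders chosen for illustration; they are not the expressions or the scoring scheme used in this study, which are not specified here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical motif patterns with weights; illustrative only, NOT the
# patterns derived from the reference domains in the study.
MOTIFS: List[Tuple[str, "re.Pattern", float]] = [
    ("cys_pair", re.compile(r"C.{2}C"), 1.0),
    ("two_cys_pairs", re.compile(r"C.{2}C.{10,40}C.{2}C"), 2.0),
    ("basic_patch", re.compile(r"[KR]{2}.{0,3}[KR]"), 0.5),
]


def score_sequence(seq: str, motifs=MOTIFS) -> float:
    """Sum the weights of all motifs found at least once in the sequence."""
    seq = seq.upper()
    return sum(weight for _, pattern, weight in motifs if pattern.search(seq))


def rank_candidates(seqs: Dict[str, str], min_score: float) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above min_score, best first."""
    scored = [(name, score_sequence(seq)) for name, seq in seqs.items()]
    kept = [item for item in scored if item[1] >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "cand_a": "MKRKACPQC" + "G" * 12 + "CTKCH",
        "cand_b": "MAAAGGG",
    }
    print(rank_candidates(candidates, min_score=1.0))
```

Each motif contributes its weight at most once, so the score reflects how many distinct features are present rather than how often a single feature repeats.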

Identification of Rpm2p and Mitochondrial P RNAs in Fungi

The HMMER algorithm (Finn et al. 2011) as well as BLAST searches (Altschul et al. 1990) were used to retrieve proteins with homology to the Rpm2 domain as defined in PFAM (Finn et al. 2014). Putative rpm1 was retrieved from unannotated fungal mitochondrial genomes with RNAweasel (Gautheret and Lambert 2001).

PRORP Sequence Analysis and Structural Predictions

PRORP sequences were retrieved using the BLAST tool in NCBI (National Center for Biotechnology Information), Ensembl, Bogas, Phytozome, JGI, and Broad. The proteins were aligned using MUSCLE (Edgar 2004). The sequences of these domains were then retrieved and realigned with MUSCLE before using WebLogo 3 (Crooks et al. 2004) to highlight the conserved residues. Protein structures were predicted using the Phyre2 algorithm in the intensive modeling mode (Kelley and Sternberg 2009).
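
To make the notion of conserved residues in a sequence logo concrete, the sketch below computes the per-column information content (in bits) of a protein alignment, the quantity that determines stack height in a logo. It is a simplified illustration with hypothetical names: gaps are ignored, and no small-sample correction is applied, so values can differ from those reported by WebLogo.

```python
import math
from collections import Counter
from typing import List


def column_information(aligned: List[str], alphabet_size: int = 20) -> List[float]:
    """Information content (bits) of each alignment column, ignoring gaps.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies; no small-sample correction is applied.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned):
        residues = [r for r in column if r not in "-."]
        if not residues:
            info.append(0.0)  # gap-only column carries no information
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in Counter(residues).values()
        )
        info.append(max_bits - entropy)
    return info


if __name__ == "__main__":
    toy_alignment = ["DKA", "DRA", "DK-", "DRC"]  # synthetic example
    for position, bits in enumerate(column_information(toy_alignment), start=1):
        print(position, round(bits, 2))
```

A fully conserved column reaches the maximum of log2(20) bits, whereas a column split evenly between two residues loses exactly one bit.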

Subcellular Localization Predictions

Subcellular localizations were predicted for most proteins with TargetP, Predotar, and MultiLoc2 when applicable (Small et al. 2004; Emanuelsson et al. 2007; Blum et al. 2009). PredAlgo was used for PRORP sequences of green algae (Chlorophyta) (Tardif et al. 2012). PlasmoAP and PATS were used for Apicomplexa PRORP to determine whether they possess an apicoplast targeting peptide (Zuegge et al. 2001; Foth et al. 2003).

Phylogenetic Analyses of PRORP

Phylogenetic analyses of PRORP protein sequences were performed with the maximum-likelihood method with 100 bootstrap replicates (Dereeper et al. 2008).
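
The resampling behind bootstrap support values can be sketched as follows: each replicate is built by drawing alignment columns with replacement until the original alignment length is reached, and a tree would then be inferred from every replicate. The function name and the toy alignment are hypothetical, tree inference itself is omitted, and the cited platform performs these steps internally, so this is only an illustration of the principle.

```python
import random
from typing import Dict, List


def bootstrap_alignments(
    alignment: Dict[str, str], n_replicates: int = 100, seed: int = 0
) -> List[Dict[str, str]]:
    """Create bootstrap replicates by resampling alignment columns with replacement."""
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    rng = random.Random(seed)  # fixed seed makes the replicates reproducible
    replicates = []
    for _ in range(n_replicates):
        # the same column indices are applied to every sequence, so each
        # resampled column is an intact column of the original alignment
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][i] for i in columns) for name in names}
        )
    return replicates


if __name__ == "__main__":
    toy = {"seq1": "ACDE", "seq2": "ACDF", "seq3": "GCDE"}  # synthetic example
    reps = bootstrap_alignments(toy, n_replicates=100, seed=1)
    print(len(reps), reps[0])
```

The support for a branch is then the fraction of replicate trees that contain it, which is why the number of replicates (100 here) sets the resolution of the reported values.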

Acknowledgments

This work was supported by the “Centre National de la Recherche Scientifique,” the University of Strasbourg, the Medical University of Vienna, and the Philipps-University of Marburg. We thank Prof. B.F. Lang for critical discussion on the evolution of RNase P. This work was supported by Agence Nationale de la Recherche (grant PRO-RNase P, ANR 11 BSV8 008 01 to P.G.), LabEx consortium “MitoCross,” the German Research Foundation (grants HA 1672/17-1 and IRTG 1384 to R.K.H.), and the Austrian Science Fund (grant I299 to W.R.).

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410.

Anantharaman V, Aravind L. 2006. The NYN domains: novel predicted RNAses with a PIN domain-like fold. RNA Biol. 3:18–27.

Blum T, Briesemeister S, Kohlbacher O. 2009. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10:274.

Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Res. 14:1188–1190.

Daoud R, Forget L, Lang BF. 2012. Yeast mitochondrial RNase P, RNase Z and the RNA degradosome are part of a stable supercomplex. Nucleic Acids Res. 40:1728–1736.

Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, et al. 2008. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 36:W465–W469.

Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.

Ellis JC, Brown JW. 2010. The evolution of RNase P and its RNA. In: Liu F, Altman S, editors. Ribonuclease P. New York: Springer. p. 17–40.

Emanuelsson O, Brunak S, von Heijne G, Nielsen H. 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2:953–971.

Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42:D222–D230.

Finn RD, Clements J, Eddy SR. 2011. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39:W29–W37.

Foth BJ, Ralph SA, Tonkin CJ, Struck NS, Fraunholz M, Roos DS, Cowman AF, McFadden GI. 2003. Dissecting apicoplast targeting in the malaria parasite Plasmodium falciparum. Science 299:705–708.

Gautheret D, Lambert A. 2001. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 313:1003–1011.

Gobert A, Gutmann B, Taschner A, Gößringer M, Holzmann J, Hartmann RK, Rossmanith W, Giegé P. 2010. A single Arabidopsis organellar protein has RNase P activity. Nat Struct Mol Biol. 17:740–744.

Gobert A, Pinker F, Fuchsbauer O, Gutmann B, Boutin R, Roblin P, Sauter C, Giegé P. 2013. Structural insights into protein-only RNase P complexed with tRNA. Nat Commun. 4:1353.

Gutmann B, Gobert A, Giegé P. 2012. PRORP proteins support RNase P activity in both organelles and the nucleus in Arabidopsis. Genes Dev. 26:1022–1027.

Harris TW, Antoshechkin I, Bieri T, Blasiar D, Chan J, Chen WJ, De La Cruz N, Davis P, Duesbury M, Fang R, et al. 2010. WormBase: a comprehensive resource for nematode research. Nucleic Acids Res. 38:D463–D467.

Hartmann E, Hartmann RK. 2003. The enigma of ribonuclease P evolution. Trends Genet. 19:561–569.

Hartmann RK, Gößringer M, Späth B, Fischer S, Marchfelder A. 2009. The making of tRNAs and more—RNase P and tRNase Z. Prog Mol Biol Transl Sci. 85:319–368.

Holzmann J, Frank P, Loffler E, Bennett KL, Gerner C, Rossmanith W. 2008. RNase P without RNA: identification and functional reconstitution of the human mitochondrial tRNA processing enzyme. Cell 135:462–474.

Jarrous N, Gopalan V. 2010. Archaeal/eukaryal RNase P: subunits, functions and RNA diversification. Nucleic Acids Res. 38:7885–7894.

Kelley LA, Sternberg MJE. 2009. Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc. 4:363–371.

Lai LB, Bernal-Bayard P, Mohannath G, Lai SM, Gopalan V, Vioque A. 2011. A functional RNase P protein subunit of bacterial origin in some eukaryotes. Mol Genet Genomics. 286:359–369.

Lai LB, Vioque A, Kirsebom LA, Gopalan V. 2010. Unexpected diversity of RNase P, an ancient tRNA processing enzyme: challenges and prospects. FEBS Lett. 584:287–296.

Liu F, Altman S. 2010. Ribonuclease P. New York: Springer.

Marquez SM, Harris JK, Kelley ST, Brown JW, Dawson SC, Roberts EC, Pace NR. 2005. Structural implications of novel diversity in eucaryal RNase P RNA. RNA 11:739–751.

Marvin MC, Engelke DR. 2009a. Broadening the mission of an RNA enzyme. J Cell Biochem. 108:1244–1251.

Marvin MC, Engelke DR. 2009b. RNase P: increased versatility through protein complexity? RNA Biol. 6:40–42.

Morales MJ, Dang YL, Lou YC, Sulo P, Martin NC. 1992. A 105-kDa protein is required for yeast mitochondrial RNase P activity. Proc Natl Acad Sci U S A. 89:9875–9879.

Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, et al. 2015. Rfam 12.0: updates to the RNA families database. Nucleic Acids Res. 43:D130–D137.

Nawrocki EP, Eddy SR. 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29:2933–2935.

Petersen J, Ludewig AK, Michael V, Bunk B, Jarek M, Baurain D, Brinkmann H. 2014. Chromera velia, endosymbioses and the rhodoplex hypothesis—plastid evolution in cryptophytes, alveolates, stramenopiles, and haptophytes (CASH lineages). Genome Biol Evol. 6:666–684.

Piccinelli P, Rosenblad MA, Samuelsson T. 2005. Identification and analysis of ribonuclease P and MRP RNA in a broad range of eukaryotes. Nucleic Acids Res. 33:4485–4495.

Pinker F, Bonnard G, Gobert A, Gutmann B, Hammani K, Sauter C, Gegenheimer PA, Giegé P. 2013. PPR proteins shed a new light on RNase P biology. RNA Biol. 10:1457–1468.

Rosenblad MA, Lopez MD, Piccinelli P, Samuelsson T. 2006. Inventory and analysis of the protein subunits of the ribonucleases P and MRP provides further evidence of homology between the yeast and human enzymes. Nucleic Acids Res. 34:5145–5156.

Rossmanith W. 2012. Of P and Z: mitochondrial tRNA processing enzymes. Biochim Biophys Acta. 1819:1017–1026.

Seif E, Cadieux A, Lang BF. 2006. Hybrid E. coli—mitochondrial ribonuclease P RNAs are catalytically active. RNA 12:1661–1670.

Seif E, Leigh J, Liu Y, Roewer I, Forget L, Lang BF. 2005. Comparative mitochondrial genomics in zygomycetes: bacteria-like RNase P RNAs, mobile elements and a close source of the group I intron invasion in angiosperms. Nucleic Acids Res. 33:734–744.

Serruya R, Orlovetskie N, Reiner R, Dehtiar-Zilber Y, Wesolowski D, Altman S, Jarrous N. 2015. Human RNase P ribonucleoprotein is required for formation of initiation complexes of RNA polymerase III. Nucleic Acids Res. 43:5442–5450.

Small I, Peeters N, Legeai F, Lurin C. 2004. Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics 4:1581–1590.

Tardif M, Atteia A, Specht M, Cogne G, Rolland N, Brugiere S, Hippler M, Ferro M, Bruley C, Peltier G, et al. 2012. PredAlgo: a new subcellular localization prediction tool dedicated to green algae. Mol Biol Evol. 29:3625–3639.

Taschner A, Weber C, Buzet A, Hartmann RK, Hartig A, Rossmanith W. 2012. Nuclear RNase P of Trypanosoma brucei: a single protein in place of the multicomponent RNA-protein complex. Cell Reports 2:19–25.

Walker SC, Marvin MC, Engelke DR. 2010. Eukaryote RNase P and RNase MRP. In: Liu F, Altman S, editors. Ribonuclease P, protein reviews. Vol. 10. New York: Springer. p. 173–202.

Weber C, Hartig A, Hartmann RK, Rossmanith W. 2014. Playing RNase P evolution: swapping the RNA catalyst for a protein reveals functional uniformity of highly divergent enzyme forms. PLoS Genet. 10:e1004506.

Yusuf D, Marz M, Stadler PF, Hofacker IL. 2010. Bcheck: a wrapper tool for detecting RNase P RNA genes. BMC Genomics 11:432.

Zuegge J, Ralph S, Schmuker M, McFadden GI, Schneider G. 2001. Deciphering apicoplast targeting signals—feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene 280:19–26.

Author notes

Associate editor: Claus Wilke

Supplementary data