A Mitosome With Distinct Metabolism in the Uncultured Protist Parasite Paramikrocytos canceri (Rhizaria, Ascetosporea)

Abstract Ascetosporea are endoparasites of marine invertebrates that include economically important pathogens of aquaculture species. Owing to their often-minuscule cell sizes, strict intracellular lifestyle, lack of cultured representatives and minimal availability of molecular data, these unicellular parasites remain poorly studied. Here, we sequenced and assembled the genome and transcriptome of Paramikrocytos canceri, an endoparasite isolated from the European edible crab Cancer pagurus. Using bioinformatic predictions, we show that P. canceri likely possesses a mitochondrion-related organelle (MRO) with highly reduced metabolism, resembling the mitosomes of other parasites but with key differences. Like other mitosomes, this MRO is predicted to have reduced metabolic capacity and lack an organellar genome and function in iron–sulfur cluster (ISC) pathway-mediated Fe–S cluster biosynthesis. However, the MRO in P. canceri is uniquely predicted to produce ATP via a partial glycolytic pathway and synthesize phospholipids de novo through the CDP-DAG pathway. Heterologous gene expression confirmed that proteins from the ISC and CDP-DAG pathways retain mitochondrial targeting sequences that are recognized by yeast mitochondria. This represents a unique combination of metabolic pathways in an MRO, including the first reported case of a mitosome-like organelle able to synthesize phospholipids de novo. Some of these phospholipids, such as phosphatidylserine, are vital in other protist endoparasites that invade their host through apoptotic mimicry.


Introduction
Mitochondria are best known as the powerhouse organelles supplying eukaryotic cells with energy in the form of ATP via oxidative phosphorylation where oxygen serves as the terminal electron acceptor (Van der Giezen and Tovar 2005;Müller et al. 2012;Makiuchi and Nozaki 2014;Stairs et al. 2015;Roger et al. 2017;Gawryluk and Stairs 2021). Mitochondria harbor several other fundamental cellular metabolisms ranging from Fe-S cluster biogenesis to the synthesis of phospholipids, quinones, steroids, amino acid, and nucleotides (Van der Giezen and Tovar 2005;Santos et al. 2018). From these complex aerobic mitochondria, a diverse range of related organelles with reduced functions (mitochondrion-related organelles; MROs) have evolved across the tree of eukaryotic life in anaerobic and microaerophilic lineages of unicellular organisms (protists) (Müller et al. 2012;Makiuchi and Nozaki 2014;Stairs et al. 2015;Roger et al. 2017;Santos et al. 2018;Gawryluk and Stairs 2021). The functional diversity of MROs covers a continuum with no clear boundaries (Müller et al. 2012;Makiuchi and Nozaki 2014;Stairs et al. 2015;Roger et al. 2017;Santos et al. 2018;Gawryluk and Stairs 2021), but a few main "classes" have been proposed (Müller et al. 2012). Along the most reduced end of the spectrum are the genome-less hydrogenosomes that produce hydrogen, have lost the respiratory chain and oxidative phosphorylation, and generate ATP exclusively via substrate-level phosphorylation (Müller et al. 2012;Stairs et al. 2015;Leger et al. 2016;Stairs et al. 2021). Even more reduced are the mitosomes, which do not produce ATP, but continue to be involved in the biosynthesis of Fe-S clusters (Tovar et al. 2003;Goldberg et al. 2008;Mi-ichi et al. 2009;Jedelský et al. 2011) or sulfate activation (Mi-ichi et al. 2009). The near ubiquity of Fe-S cluster biosynthesis pathways even in the most reduced MROs has led to the suggestion that this function is one of the fundamental roles of the MROs, although other currently poorly characterized functions such as the production of formate and methylated folate species were recently observed (Zítek et al. 2022).
MROs have been best studied in parasitic lineages of medical importance, including the hydrogenosomes of trichomonads (e.g., Trichomonas vaginalis (Müer 1993), or the mitosomes of microsporidia (e.g., Encephalitozoon cuniculi (Katinka et al. 2001), diplomonads (e.g., Giardia intestinalis (Tovar et al. 2003)) and amoebozoa (e.g., Entamoeba histolytica (Mai et al. 1999)). However, the known functional diversity of MROs has greatly expanded in recent years from genomic investigations of hitherto enigmatic and often neglected parasitic (Burki et al. 2013;Santos et al. 2018;Mathur et al. 2021;Salomaki et al. 2021), commensal, as well as free-living protists (Stairs et al. 2014;Stairs et al. 2021, Zítek et al. 2022. MROs have been reported in nearly all major eukaryotic supergroups, demonstrating remarkable convergent adaptations to life in low-oxygen or host environments. Yet, the characterization of new MROs, especially from poorly sampled supergroups, remains an important task that will enable us to better understand the tremendous plasticity of mitochondrial functions and the parallel reductive evolution of these organelles. Ascetosporea is a group of poorly known unicellular parasites that belongs to Rhizaria, which is also of the least investigated supergroups of eukaryotes (Burki and Keeling 2014;Bass et al. 2019;Biard 2023). Ascetosporean parasites infect a vast range of aquatic invertebrates, including many commercially important shellfish species such as mussels and oysters, often causing economic losses and limiting trade due GBE to biosecurity legislation (Mourton et al. 1992;Hartikainen et al. 2013;Hartikainen et al. 2014). It is a relatively diverse group, with about 50 described species and several hundreds more lineages known only from environmental sequences. These parasites are challenging to study, because they are generally very small (<5 µm), intracellular, and not easily separable from host tissues. More importantly, no cultures are available, thereby limiting the scope of experiments that can be performed. Despite these difficulties, a handful of transcriptomes of Ascetosporea have been produced, including that of the Pacific oyster parasite Mikrocytos mackini by sequencing cDNA of semipurified isolates prepared from laboratory-infected hosts (Burki et al. 2013). The phylogenomic analysis of M. mackini revealed that this parasite is extremely fast-evolving, displaying the longest branch in a phylogenetic tree among all included eukaryotes. Moreover, M. mackini is predicted to possess a highly reduced MRO only functioning in protein folding and Fe-S cluster generation via the ISC pathway. This was the first report of an MRO in Rhizaria (Burki et al. 2013). Following this discovery, a less reduced hydrogen-producing mitochondrion was described in another rhizarian lineage, the microaerophilic and free-living cercomonad Brevimastigomonas motovehiculus (Gawryluk et al. 2016).
In this study, we hypothesized that other ascetosporean parasites have adapted to intracellular lifestyle by reducing mitochondrial functions. To expand our knowledge of the functional range of mitochondria in Ascetosporea, we analyzed the genetic complement of Paramikrocytos canceri, a close relative of M. mackini (Hartikainen et al. 2014). This parasite was discovered as the agent of an emerging disease in juvenile crabs in the UK, causing hypertrophy of the antennal gland and bladder of reproductively immature crabs (Bateman et al. 2011;Thrupp et al. 2013;Hartikainen et al. 2014). In addition to crabs, targeted molecular surveys showed that P. canceri is in fact present in a range of shoreline invertebrates, although its impact on these diverse hosts remains uncertain (Hartikainen et al. 2014). Microscopic examinations of infected crab tissue showed numerous uni-and multi-nucleate plasmodial cells invading the host cytoplasm. Transmission electron microscopy images revealed that the uninucleate stages are often found near the host's mitochondria, while the plasmodial stages contain an abundance of minuscule (approx. 0.5 nm in diameter), spherical double membrane-bounded organelle which were described as putative MROs of unknown functions (Hartikainen et al. 2014).
Here, we present genomic and transcriptomic data for P. canceri isolated from host-infected tissues collected directly from the environment. Reads corresponding to P. canceri were extracted from a heterogeneous mixture of host and microbiome using a comprehensive bioinformatic workflow and were assembled into the first ascetosporean genome with high completeness. We used this genome to reconstruct the energy metabolism of P. canceri and compare that to its close relative M. mackini. Altogether, we confirm the previous TEM observations of the presence of an MRO in P. canceri (Hartikainen et al. 2014), inferring similar functions as in the mitosome of M. mackini, which probably originated in their common ancestor. However, we find that this newly described organelle is predicted to perform at least two metabolic pathways that have not been described in other reduced mitosomes: de novo phospholipid synthesis and parts of glycolysis.

Results
Genome Sequencing, Assembly, and Quality of Paramikrocytos canceri Genomes and Transcriptome To investigate the energy and mitochondrial metabolism of P. canceri, we sequenced DNA and cDNA libraries generated from both heavily infected antennal glands of diseased juvenile crabs and claw muscle tissues isolated from a healthy individual (supplementary fig. S1, Supplementary Material online). The "healthy tissue"" DNA library was used to subtract the host reads from mixed reads by mapping the DNA library of the "diseased tissue"" to the assembly generated from the DNA reads of the healthy crab ( fig. 1-healthy.host_genome box). Based on the BlobTools workflow that uses GC content, read coverage, and taxonomic identity of the contigs obtained by blast searches (Laetsch and Blaxter 2017), we discarded the non-target reads (e.g., prokaryotes, fungi, crab) from the diseased tissue DNA library (fig. 1-START 1 of the workflow). The resulting dataset highly enriched in P. canceri reads was assembled to yield a first genome assembly of the parasite (Pcanceri_genome1) (fig. 1-START 1 of the workflow). This assembly was used as reference to retrieve P. canceri reads from a previously published metagenomic Illumina library of a diseased crab tissue (Hartikainen et al. 2014), and generate a second genome assembly (Pcanceri_ genome2) (fig. 1-START 2 of the workflow). Finally, the reads of the diseased tissue RNA library that mapped to the P. canceri reference genome were assembled in the P. canceri transcriptome ( fig. 1-START 3 of the workflow). All statistics for the intermediate and final assemblies are summarized in Table 1. The two P. canceri genome assemblies showed similar N50 (∼6800 bp), GC content (∼30%) and assembly size (∼13 Mb) (Table 1). According to BUSCO values, genome completeness was estimated to be 16.1% and 23.1% using the nucleotide and protein sequences from the gene models, respectively (Table 1;  Considering the lack of BUSCO protein sets for organisms related to P. canceri, together with the highly divergent nature of the genome (that reduces the detection sensitivity of similarity searches), these values more probably reflect the lack of publicly available data than very partial genomes. Comparably low values (19.4% and 30% using the nucleotide and protein sequences, respectively- Table 1) were also observed for the transcriptome of M. mackini (Table 1), or for published genomes of other protists (e.g., Monocercomonoides exilis: BUSCO score = 34%) (Karnkowska et al. 2016).
In addition to BUSCO estimates, we used three alternative methods to assess the P. canceri genome completeness. First, we looked at the fraction of reads that mapped onto the assembled genomes and the average read coverage for each assembly ( Table 1). The reference assembly of P. canceri (P. canceri_genome1) has a high read coverage (above 700X; Table 1), and the entire read library was integrated in the assembly (100% of reads mapped onto the assembled genome- Table 1). We further compared the k-mer frequency spectra of the P. canceri reference genome and its respective read library -Bioinformatic workflow to assemble the genomic and transcriptomic data of P. canceri. The letters in yellow circles indicate the methodology used for each step of the workflow, while the arrows show the direction of the workflow from 'start' to 'stop'. The Illumina libraries and the decontamination pipeline to obtain each of the assemblies used in this study are shown. START 1 depicts the workflow that generated the first P. canceri genome assembly (Pcanceri_genome1) from the diseased DNA HiSeq library that were generated for this study. In START 2, P. canceri reads were retrieved from the diseased DNA HiSeq library generated by Hartikainen et al 2014 (Hartikainen et al. 2014) and assembled in the second P. canceri genome assembly (Pcanceri_genome2). The transcriptome of P. canceri was assembled following the workflow in START 3 (Pcanceri_transcriptome).

Table 1
Assembly Statistics The assembly were deposited on ENA (European Nucleotide Archive Databases) and their accession numbers can be consulted in column 3. The average GC content, the size of the genome assemblies, and the N50 was calculated with Quast.
The transcriptome N50, the size of the transcriptome assemblies and number of transcripts were calculated with Trinity. The average read coverage for each assembly was calculated with Samtools as the number of mapped bases divided by the assembly size. The eukaryotic database euk_9 was used to calculate the BUSCO values. The functional annotation was predicted with BLAST suit v. 2.9.0+, InterProScan v.5.30-69.0, and eggNOG-mapper v2.
online). Most of the read k-mers were assembled once and only a very low fraction of rare read k-mers is missing from the P. canceri reference assembly (see black and red frequency spectra from supplementary fig. S2, panel B Supplementary Material online). The high read coverage and high proportion of read k-mers singularly assembled suggest that P. canceri genome is well recovered and not over assembled. As a second evaluation of genome completeness, we mapped P. canceri transcripts (only those that could be confidently attributed to P. canceri, i.e., not derived from host or other identifiable contaminants) to the P. canceri reference genome assembly. A total of 81.3% clean transcripts mapped back to the P. canceri reference genome. The remaining unmapped transcripts all have undetectable homology against the nr database (using an threshold of e-value < 1e-10) and consist mostly of short tandem simple repeats. The high proportion of mapped transcripts indicates a high recovery of genes for this assembly. As a last evaluation of assembly completeness, we used a set of 127 highly conserved marker genes previously identified in M. mackini (Burki et al. 2013) to search for homologues in the P. canceri genomes, and recovered all 127 genes. Using most of these genes for a concatenated phylogenomic analysis, we confirmed the sister relationship of P. canceri and M. mackini (supplementary fig. S3, Supplementary Material online). Taken together, these analyses indicate high completeness of both P. canceri assemblies despite the low BUSCO values and fairly high assembly fragmentation as a result of short-read sequencing.
Paramikrocytos canceri is Predicted to Have a Mitosome-like Organelle that Functions in Fe-S Cluster Synthesis, Glycolysis, and de novo Phospholipid Synthesis We used our genome and transcriptome assemblies to predict the mitochondrial proteome of P. canceri. Using sequence homology, hidden Markov models, phylogenetic analysis and targeting signal predictions, we found at least 14 proteins likely to function in the mitochondrial compartment ( fig. 2). We identified components of the ISC pathway for Fe-S cluster biosynthesis, chaperone proteins, and the ATP-binding cassette transporter ATM1 that carries the byproduct of the pathway (Fe 2+ ) into to the cytoplasm (fig. 2, and supplementary Table S1, Supplementary Material online). Like for mitosomes in other organisms, we did not detect genes encoding proteins for the electron transport chain, ATP synthase, mitochondrial genome maintenance, and the Krebs cycle. As previously reported in the analysis of M. mackini, we also did not identify any ATP transporters or TIM/TOM translocon proteins (Burki et al. 2013). This suggests that TIM/TOM proteins are absent in both species or too divergent to be detected with current homology-based approaches.
In addition to the ISC pathway common to most mitosomes, we found that the MRO in P. canceri likely harbors two other pathways not known in such reduced organelles. One is glycolysis, for which we identified genes encoding a complete cytosolic glycolysis pathway but, unexpectedly, found two enzymes predicted to contain an MTS ( fig. 2, supplementary Table S1, Supplementary Material online). The first enzyme is the pay-off phase cofactor-independent phosphoglycerate mutase (PGM), which is present in two copies in P. canceri with one copy containing an MTS. The sister species M. mackini also encodes two copies of pgm gene but neither contain a MTS. The second glycolytic enzyme predicted to be targeted to the MRO in P. canceri is pyruvate kinase (PK). Additionally, in both P. canceri genomes, we identified a D-lactate dehydrogenase (ldh)  The closest predicted neighboring gene to ldh on both scaffolds has no homology in GenBank, but a clear eukaryotic ribosomal protein (rpl30) is encoded on the scaffold in P. canceri_genome 1. Taken together, these observations suggest that in P. canceri, the last steps of the pay-off phase of glycolysis (from PGM to PK) and LDH (Nývltová et al. 2015;Glancy et al. 2020) may localize to the MRO and could generate ATP, reoxidized NAD+, and lactate, and that the mitochondrial ldh may have been acquired by lateral transfer possibly from Clostridia.
The third metabolic pathway likely functioning in the MRO of P. canceri is responsible for the de novo synthesis of phospholipids via the mitochondrial CDP-DAG pathway. In P. canceri (and M. mackini), we identified genes encoding three out of the five enzymes involved in the synthesis of phosphatidic acid, phosphatidylserine (PS), and phosphatidylethanolamine (PE), each with a predicted MTS ( fig. 2 and supplementary Table S1, Supplementary Material online); these three enzymes are lysophosphatidic acid-acyltransferase (LPCAT), phosphatidylserine decarboxylase 1 (PSD1), and phosphatidylserine synthase 2 (PSS 2). In contrast, we did not detect any genes encoding proteins necessary for the synthesis of the rest of the phospholipid through the CDP-DAG pathway, that is phosphatidylcholine (PC), cardiolipin (CL), and phosphatidylinositol (PI) ( fig. 2). The other main pathway for phospholipid synthesis in eukaryotes, the Kennedy pathway in the endoplasmic reticulum (ER) that uses exogenous phospholipid, was partial. Only one enzyme (choline/ ethanolamine phosphotransferase) was detected in P. canceri and M. mackini, suggesting that these parasites are not able to generate PC nor PE from exogenous choline or ethanolamine. Yet, we were able to identify four choline transporters (predicted to function in the Golgi apparatus, ER, and lysosome) which suggest that the parasite could import choline from the host and distribute choline throughout its cell ( fig. 2

Discussion
We report the first genome and transcriptome data for the parasite P. canceri from infected crabs collected directly from the host environment. Although poorly studied, these small, intracellular, and uncultivatable parasites are part of a large undescribed diversity of ascetosporean parasites of aquatic invertebrates that likely plays important roles in controlling natural populations (Hartikainen et al. 2014;Garcia et al. 2018). These data represent one of a very few genomic datasets for Ascetosporea as a whole and, to our knowledge, the most complete genome assembly for the group. Using phylogenomic analysis of 125 marker proteins, we demonstrate the sister relationship between M. mackini and P. canceri (supplementary fig. S3, Supplementary Material online). We further investigated the mitochondrial metabolism in P. canceri and found a very reduced set of functions indicative of an MRO supporting the available microscopic observations of small double membrane-bound organelles in this parasite (Hartikainen et al. 2014). This highly reduced MRO is predicted to function in Fe-S cluster synthesis via the ISC pathway, a characteristic shared amongst almost all mitosomes known to date. However, unlike other mitosomes, this organelle is predicted to (i) harbor part of the energy pay-off phase of glycolysis and (ii) synthesize phospholipids de novo through the CDP-DAG pathway ( fig. 2, supplementary Table S1, Supplementary Material online). We also show that at least one component of the CDP-DAG pathway retains a putative mitochondrial localization signal that can be recognized by yeast cells in a heterologous expression system ( fig. 3, supplementary fig. S6, Supplementary Material online). Collectively, these predicted features suggest that the MRO in P. canceri closely resembles a mitosome but with a distinct metabolism. This is the second example of a mitosome-like organelle in Rhizaria, following the description of a mitosome in the closely related parasite M. mackini (Burki et al. 2013).
The MRO in P. canceri is the first example of a mitosomelike organelle predicted to enclose a partial glycolysis. To our knowledge, such organelle-localized glycolysis has only been reported in functionally more "complex" MROs or aerobic mitochondria from other eukaryotes. In some stramenopiles, for example, all genes involved in the payoff phase of glycolysis exist in two copies, each with a predicted cytosolic and mitochondrial subcellular localization (Liaud et al. 2000;Abrahamian et al. 2017;Río Bártulos et al. 2018). Furthermore, the mitochondrial localized triphosphate isomerase (TPI) and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) of stramenopiles are fused into a single protein equipped with an MTS. Similar fusion proteins containing clear N-terminal extensions have also been identified in other eukaryotes such as the cercozoan Paulinella chromatophora and the apuzozoan Thecamonas trahens (Nakayama et al. 2012). A partially duplicated glycolytic pathway was also predicted in the cercozoan B. motovehiculus (Gawryluk et al. 2016) and the ciliate Tetrahymena thermophila (Smith et al. 2007). The localization of the terminal steps of glycolysis in a mitosome could represent a way to obtain the necessary ATP for organelle-localized functions (e.g., protein folding). Other mitosome-containing organisms have solved the issue of ATP import in various ways. In the diplomonad G. intestinalis, ATP is produced in the parasite's cytosol through an extended glycolytic pathway and the arginine dihydrolase pathway, and imported into the organelle via an unknown transporter (Tovar et al. 2003;Jedelský et al. 2011). In microsporidia, ATP is directly harvested from the host cytoplasm and transferred through nucleoside transporter proteins into the mitosome (Tsaousis et al. 2008). In Entamoeba histolytica, glycolysis-derived ATP is imported from the cytoplasm via an ATP/ADP transporter from the mitochondrial carrier family (Chan et al. 2005). As for M. mackini, we could not detect genes encoding putative ATP transporters in P. canceri and neither PGM nor PK contain a mitochondrial targeting signal. However, we note that both of the M. mackini gene models predicted to encode PGM and PK are incomplete (supplementary Table S1, Supplementary Material online). Nevertheless they form well supported clades with the P. canceri proteins predicted to function in the MRO (supplementary fig. S5, Supplementary Material online-PGM and PK phylogenies). Moreover, like P. canceri, M. mackini encodes two copies of PGM. This suggests that M. mackini may also have a part of glycolysis localized in the MRO.
Based on the above results, we hypothesize that the relocation of the ATP-yielding pyruvate kinase reaction to the mitosome would allow ATP to be synthesized in the organelle rather than relying solely on ATP import from cytoplasmic pathways. However, because we did not detect a mitochondrial enolase or obvious nucleotide transporters in P. canceri that would be necessary for importing ADP/ AMP for the pyruvate kinase reaction, it remains to be explained how the substrates for pyruvate kinase-phosphoenol pyruvate and ADP-are imported into the organelle. The mitosomes might also scavenge host-derived ATP, especially given microscopy evidence that show both P. canceri and M. mackini often found near host mitochondria (Burki et al. 2013;Hartikainen et al. 2014). Although not confirmed by the bioinformatic predictions, one of the identified ABC transporters (supplementary Table S1, Supplementary Material online) could be responsible for the host-derived ATP import into the parasite cells.
Another uncommon characteristic of the mitosome in P. canceri (and M. mackini) is the predicted capacity to synthesize phospholipids ( fig. 2). Phospholipids are essential components of cellular membranes (Carman and Han 2011;Kay and Fairn 2019;Acoba et al. 2020) and are vital for many metabolic processes such as apoptosis (Amara and Mercer 2015), the electron transport chain Calzada et al. 2019), and cellular signaling (Carman and Han 2011). In some intracellular parasites, phospholipids are also important for immune evasion, host cell invasion, and nutrient acquisition (Vial et al. 2003;Campbell et al. 2013;Ramakrishnan et al. 2013;Wanderley et al. 2020).
The biosynthesis of phospholipids and role of mitochondria therein have been best studied in model organisms such as yeast (Carman and Han 2011), but little is known of the contribution of MROs in parasites to these pathways. Experiments with radiolabeled phospholipids showed that phosphatidylethanolamine (PE) and PC localize to the mitosomes in the microsporidian E. cuniculi (El Alaoui et al. 2001) and the diplomonad G. intestinalis (Yichoy et al. 2010). Furthermore, the pss and psd genes from the CDP-DAG pathway were amplified by RT-PCR and sequenced in E. cuniculi (El Alaoui et al. 2001) and identified in the genomes of two other microsporidia (Campbell et al. 2013), although neither study specifies the subcellular location for their protein products. Collectively, these observations suggest that even though the role of MROs in phospholipid synthesis is understudied, MROs might play a more general role in phospholipid de novo synthesis, expanding the functional repertoire of reduced organelles such as mitosomes.
The possible involvement of mitosomes in de novo phospholipid synthesis in parasites is interesting because these compounds are critical for membrane structural integrity and cell signaling (Vance and Steenbergen 2005; Calianese and Birge 2020). Phosphatidylserine (PS), for example, signals the immune system to initialize phagocytosis and the anti-inflammatory response when expressed on the outer surface of apoptotic cells (El-Hani 2012; Amara and Mercer 2015; Wanderley et al. 2020). Some protistan endoparasites employ apoptotic mimicry to invade the host (Wanderley et al. 2020). By expressing high amounts of PS on its cell surface, such parasites will phenotypically look like a host apoptotic cell and will therefore be targeted for ingestion by phagocytes. Once phagocytosed, the parasite can propagate within the host cell without triggering the immune system. Invertebrates, including crabs, rely on their innate immune system which is constituted by cellular and humoral immune responses (Johnson 1987). Previous investigations showed that during P. canceri infection, host tubule cells that are infiltrated by P. canceri plasmodial cells become apoptotic. Yet, no immune response seems to be triggered in the host. The heavily infected crab haematocytes (known to be involved in the crab immune defense) and nephrocytes (mobile cells of the secretory system) are filled with unicellular stages of P. canceri (Thrupp et al. 2013). As other ascetosporean parasites show similar invasion patterns (Mourton et al. 1992;Stentiford et al. 2004;Elgharsalli et al. 2013), we hypothesize that P. canceri might also employ apoptotic mimicry to invade the host.
In conclusion, we used bioinformatic predictions and heterologous gene expression to investigate the energy and mitochondrial metabolisms in an emerging pathogen of animals (Hartikainen et al. 2014). Intracellular parasites like P. canceri are inherently difficult to study because they cannot be propagated in the lab and live in tight association with host tissue. We show that informative metabolic reconstruction of the parasite can be achieved from metagenomic data obtained from infected tissue, following a rigorous pipeline of in silico data sorting in combination with subcellular visualization of GFP fusion proteins heterologously expressed in yeast. This approach suggests a mitosome in P. canceri with a combination of features previously not described in other organisms with similar organelles. Importantly, our study also shows that despite the highly divergent nature of P. canceri proteins and mitosomal protein import machinery in general (no import machinery was found), P. canceri mitosomal proteins retain subcellular localization signals comparable to unrelated model organisms. Together, these methods represent promising avenues to further explore the biology of important yet poorly known uncultured parasites from natural host populations.

Samples Collection and Pathological Investigation
Healthy and diseased individuals of juvenile edible crab (Cancer pagurus) were collected from Newton's Cove in Weymouth, UK, on June 26, 2014. Crabs were dissected and those individuals showing the characteristic signs of P. canceri infections (a proliferated antennal gland presenting as yellow gelatinous tissue) were classified as diseased candidates. Infections with P. canceri were further confirmed by histology following the protocol from Hartikainen et al. 2014(Hartikainen et al. 2014) (supplementary fig. S1, Supplementary Material online).

Nucleic Acid Extraction, Library Preparation and Sequencing
Nucleic acids were extracted from a highly infected antennal gland of a juvenile C. pagurus (diseased tissue) and a claw muscle of a healthy C. pagurus (healthy tissue) (supplementary fig. S1, Supplementary Material online). Briefly, tissues were homogenized using a Fast prep FP120 machine (MP biomedicals) at the highest setting for 2 min followed by an overnight incubation at 56 °C. The samples were subjected to centrifugation at 9000 rpm for 2 min and 50 µl of the supernatant from each sample was added to 150 µl of G2 buffer of the Qiagen DNeasy Blood and tissue kit. Total DNA was extracted from the homogenate using the EZ1 DNA tissue kit (cat no 953034) and the EZ1 advanced Biorobot (Qiagen) following manufacturer's instructions. For RNA extraction, frozen tissue samples were washed three times in PBS (Phosphate-buffered saline; 10 mM) by serial centrifugation. Complementary DNA (cDNA) was synthesized using the SMARTer Stranded Total RNA-Seq Kit v2-Pico Input Mammalian (Takara Bio) following the manual instructions.

Read Decontamination, Assemblies, and Quality Control
FastQC reports and assembly statistics before and after decontamination are shown in supplementary Table S2 and figure S1, Supplementary Material online. For all libraries the read quality and adaptor content was inspected with FASTQC v. 0.11.8 (Andrews 2010). Adaptor sequences were removed and low-quality reads were trimmed with Trimmomatic v. 0.36 (Bolger et al. 2014) under the following settings: ILLUMINACLIP:2:30:10 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20 MINLEN:30. The program KAT v.2.3.4 (Mapleson et al. 2017) was used to compare the k-mer frequencies and GC content of the two DNA libraries produced in this study.
To retrieve P. canceri reads from DNA and cDNA libraries of diseased crab tissue an iterative bioinformatic workflow was implemented ( fig. 1). The workflow is divided into three parts accordingly: START 1: The host draft genome and transcriptome were assembled from the healthy tissue DNA and cDNA libraries using metaSpades v. 3.13.1 (k-mer auto option) (Bankevich et al. 2012) The reads of the diseased tissue DNA.HiSeq_2018 library were mapped on these assembly with BWA v.0.7.8 (Li and Durbin 2009). The unmapped reads were retrieved using Samtools v.1. 9 ) and assembled with SPAdes v. 3.13.1 (k-mer auto option) (Bankevich et al. 2012) resulting in a preliminary draft assembly of the diseased tissue (v. 1). The host and non-target contigs (e.g., metazoa, bacteria, fungi, plants) were identified based on megablast search against NCBI nucleotide database, BLAST v. 2.9.0+, visualized with Blobtools v.1.1.1 (Laetsch and Blaxter 2017) and removed based on Blobtools v.1.1.1 (Laetsch and Blaxter 2017) scripts, which resulted in a second draft assembly of the diseased tissue (v.2). The "diseased tissue DNA.HiSeq_2018 library" was re-mapped on the assembly v.2, the mapped reads were kept and re-assembled (Spades v. 3.13.1 (Bankevich et al. 2012), k = 87 option). The optimal k-mer threshold was determined with the KmerGenie v. 1.7039 (Chikhi and Medvedev 2014). The contigs of this third assembly (v.3) were queried using blastx searches (with Diamond v.0.9.29 (Buchfink et al. 2015)) against Diamond protein database (August 2020).

GBE
All contigs that had hits against bacterial, archaeal, metazoan, Viridiplantae and fungal sequences with a bitscore and query coverage greater than 50% and 10%, respectively, were removed creating the Pcanceri_genome1. START 2: The read profiles of the DNA library generated in a previous study (Hartikainen et al. 2014) (noted as diseased tissue DNA.MiSeq_2014 library) were mapped on the first P. canceri assembly using BWA v.0.7.8 (Li and Durbin 2009). The mapped reads were retrieved Samtools v.1.9 ) normalized with BBmap v.38.08 (Bushnell 2014) at an average depth of minimum 5 × and maximum 100 × and assembled with SPAdes v. 3.13.1 (k-mer auto option) (Bankevich et al. 2012). This resulted in the second P. canceri genome assembly: Pcanceri_genome2. START 3: The cDNA reads of the diseased tissue library (diseased tissue cDNA.MiSeq_2018) were mapped with STAR v.2.7.2b (Dobin et al. 2013) on to the Pcanceri_genome1, the mapped reads were retrieved (Samtools v.1.9) and de novo assembled with Trinity v. 2.9.1 (Grabherr et al. 2011) in P. canceri transcriptome: Pcanceri_transcriptome. Additionally, the reads of the cDNA libraries obtained from the healthy and diseased tissue were de novo assembled with Trinity v. 2.9.1 in two metatranscriptomes.
To assess the levels of contamination within each assembly, the contigs were queried with Blast 2.9.0+ (Altschul et al. 1990) against the NCBI nucleotide (nt) database and the P. canceri reference genome to retrieve the top 500 hits per query below an e-value threshold of 1e-20. All contigs that had more than 10% hit coverage received a taxonomic identity using Blobtools v. To assess the assembly size, contiguity and completeness statistics were generated with Quast v.4.5.4 (Alexey et al. 2013) for genomes, and TrinityStats.pl script of Trinity v. 2.9.1 (Grabherr et al. 2011) for transcriptomes. The assembly completeness was initially evaluated with BUSCO v. 3.0.2b (Simão et al. 2015) and Augustus v.3.2.3 (Stanke and Morgenstern 2005) searching both the nucleotide and protein sequences from the gene models, respectively, with the eukaryotes database (Table 1,   ) were used to evaluate: 1) the percentage of reads that were included in the assembly and 2) the average read coverage for each assembly (estimated as number of mapped bases divided by the assembly size). The k-mer frequency plot of the P.canceri_genome1 assembly were constructed with KAT v2.3.4 (Mapleson et al. 2017). The intron aware alignment program STAR v.2.7.9a with StarLong algorithm (Dobin et al. 2013) was used to map the metatranscriptome of the diseased tissue onto the healthy tissue genome and transcriptome assemblies and on the Pcanceri_genome1 assembly. To further assign taxonomic identity to the unmapped transcripts, we used BLAST v. 2.12.0+ (Altschul et al. 1990); to identify homologues on NCBI nucleotides (blastx) and protein (blastp) databases (January 2022) using an e-value of 1e-50 and 1e-25, respectively. To identify P. canceri transcripts that did not map in the previous steps due to length difference, RNA editing, or synthetic chimeras, we performed additional blastn searches against the Pcanceri_genome1 assembly (e-value 1e-50). Based on transcript taxonomic identity, GC content and DNA read coverage we were able to separate P. canceri transcripts from host and other non-targeted organisms. These transcripts were mapped with STAR v.2.7.9a with StarLong algorithm on Pcanceri_genome1 assembly. As a last evaluation of the genome completeness, we used 127 genes retrieved from M. mackini transcriptome by previous studies (Burki et al. 2013;Schön et al. 2021). We used BLAST v. 2.12.0+ (Altschul et al. 1990) to search for homologues in the predicted proteomes of both P. canceri genomes.

Gene Model Predictions, Functional Annotation, and Subcellular Localization
When comparing the open-reading frames in the transcriptomic data with their genomic counterparts, we failed to observe obvious spliceosomal introns. Therefore, Prodigal v. 2.6.3 (Hyatt et al. 2010), a bacterial gene prediction tool, was used to predict gene models and proteins for both P. canceri genome assemblies. TransDecoder v.5.3.0 (https://github.com/TransDecoder/TransDecoder) was used to identify candidate coding regions from all transcriptome assemblies generated in this study and the published transcriptome of M. mackini (Burki et al. 2013). Functional annotation of the predicted proteins was performed based on the following strategy. The predicted proteome was used as a query against the NCBI nr database (May 2020) to retrieve the top scoring hits (BLAST suite v. 2.9.0+). Interpro (IPR) domains were assigned using Interproscan v.5.30-69.0 (Jones et al. 2014). The online version of eggNOG-mapper v2 (Huerta-Cepas et al. 2017) was used for orthology assignments of the predicted proteins and K numbers were assigned on the GhostKoala web server (https://www. genome.jp/kegg/tool/map_pathway.html). The subcellular localization of each protein was determined with targetP v.

Mitochondrial Protein Predictions and Downstream Analyses
Several searching strategies were employed to identify putative mitochondrial proteins in P. canceri and M. mackini (Burki et al. 2013). The predicted proteomes of P. canceri and M. mackini were inspected for the presence of proteins encoding MRO-localized proteins using the functional annotation and subcellular localization determined in the previous section. The output of eggNOG-mapper and Interproscan were additionally searched for any components of the protein import machinery or mitochondrial carrier family proteins. Moreover, the predicted mitochondrial proteomes from Pygsuia biforma (Stairs et al. 2014), Blastocystis sp. (Abrahamian et al. 2017) and B. motovehiculus (Gawryluk et al. 2016 were used as query sequences against the predicted proteins from P. canceri and M. mackini using BLAST v.2.1.8 (Altschul et al. 1990). Any protein that was predicted to be mitochondrial related based on at least one software tool used above or had one mitochondrial subject sequence retrieved in the top 100 BLAST hits was further investigated for completeness, annotation and mitosomal provenance. First, the gene model's completeness was assessed by manually examining the query coverage to similar sequences via BLAST. Those P. canceri predicted proteins that did not match to any sequence with BLAST or only aligned with hypothetical proteins were not examined but can be found in supplementary  Table S1A and C, Supplementary Material online. Those predicted protein sequences with a methionine that align with the starting methionine of subject sequences were annotated as "complete". In case that some but not all components of a certain metabolic pathway were predicted to be present in P. canceri, the putative missing genes were further searched in the metagenomic and metatranscriptomic predicted proteomes. If these searches proved negative, the raw reads of each library were further investigated with Phylomagnet (Schön et al. 2020). This program employs a gene centric approach to retrieve and assemble genes of interest directly from a raw read library. None of the investigated genes could be recovered from the raw reads.

Phylogenetic Dataset Assembly
To confirm the taxonomic identity of the putative mitochondrial-related protein identified in P. canceri and eliminate the possibility of residual contamination, maximum-likelihood phylogenetic trees were constructed (supplementary fig. S5, Supplementary Material online). Except for the mitochondrial ABC transporter gene (atm1), the phylogenetic analysis workflow was performed as follows. All mitochondrial-related proteins identified in P. canceri were queried against the NCBI nr database (August, 2020) with BLAST v.2.1.9 (Altschul et al. 1990) using the BLASTP algorithm. The top 5,000 hits with an e-value less than 1e-10 (or 1e-5 if few hits were identified) were retrieved and clustered at 90% identity with CD-HIT v.4.8.1 (Edgar 2010). The predicted proteomes of M. mackini and C. pagurus were searched with BLASTP to retrieve homologous proteins. Lastly, a reciprocal BLASTP in all P. canceri predicted proteoms was performed. The sequences were aligned (Mafft v.7.407 (Katoh and Standley 2013), mafft-auto). The alignments were trimmed of ambiguous sites with (trimAL v.1.4.1 (Capella-Gutierrez et al. 2009), -automated1). The amino acid substitution model was determined with IQ-TREE2.1.6.5 using the default settings (Kalyaanamoorthy et al. 2017). Phylogenies and 1,000 ultrafast bootstrap trees with 1,000 SH-aLRT replicates were constructed with IQ-TREE2 v.1.6.5 (Minh et al. 2013). These initial phylogenies were visualized in FigTree v.1.4.4 and manually pruned to reduce the number of taxa. The reduced data set was aligned (Mafft v.7.407 (Katoh and Standley 2013), mafft-linsi). Removal of ambiguous sites, evaluation of amino acid substitution models, and phylogenetic reconstruction proceeded as above. For the putative atm1 transporter, a Hidden Markov Model profile for orthologous group KOG0057 (retrieved from EggNOG 5.0.0 (Huerta-Cepas et al. 2019) database) was used to retrieve the protein models of P. canceri and M. mackini using the default settings of with hmmsearch. The resulting hits were used as queries against the NCBI nr database (August 2020) as described above. This dataset was supplemented with atm1 sequences reported previously (Freibert et al. 2017). The proteins were aligned with hmmalign from HMMER v.3.2.1 (http://hmmer.org/) and the Atm1 phylogeny was performed as described above.
Cloning of ISCU, mPGM, and PSS Genes Full length iscu, mpgm, and pss P. canceri genes were commercially synthesized in pUC57 plasmids (Genewiz). The pDDGFP_LEUD plasmid (Addgene plasmid #58352) was used for expressing recombinant proteins in Saccharomyces cerevisiae (Parker and Newstead 2014). Plasmids containing genes of interest and the pDDGPD_LEU2D destination plasmid were recovered using the Nucleospin plasmid DNA purification kit (Macherey-Nagel) following the protocol provided by the manufacturer. Plasmids were linearized with 12 U/µl of SmaI (Promega) at 25 °C for 1 h. SmaI was inactivated at 70 °C for 15 min. Linearized plasmids were digested with 10 U/µl of BamHI (Promega) at 37 °C for 1 h and recipient pDDGPD_LEU2D was dephosphorylated with 1 µl of Thermosensitive Alkaline Phosphatase (TSAP, Promega) at 37 °C during the last 15 min of incubation with BamHI. Finally, TSAP and BamHI were inactivated at 74 °C for 15 min. Digestion products were ligated into dephosphorylated recipient plasmids at a 1:2 vector:insert ratio with 3 U of T4 DNA ligase (Promega) and at room temperature for 15 min according to the protocol of the manufacturer. Ligase was inactivated at 70 °C for 10 min. Exact protocols available upon request from the authors. P. canceri genes were cloned upstream of green fluorescent protein (gfp) encoded on the pDDGFP-LEU2D plasmid and transformed into Escherichia coli DH5α by standard methods and selected on synthetic defined media (LB plus 50 ug/ml ampicillin). Polymerase chain reactions (PCR) were performed to screen for colonies with gene insert using Gotaq Green master mix (Promega) and 10 µM of each primer (Forward primer: gal1 5′"-TCTGGGGTAATTAATCAGCG AAGCG-3′". Reverse primers: iscu 5′"-GCTCCCGCAGCC GAAAGTCTTAAA −3′", pss 5′"-AGGATGCGGCCTACTG AATGGACC-3′", mpgm 5′"-GGGGCATAGGATCCATGA CTCAATGGTGTGTA-3′" and gfp 5′"-AGTAGCGTCACCT TCACCTTCACC-3′") using standard thermocycling conditions (95 °C 5 min; 30 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min, final elongation temperature of 72 °C for 10 min). The insertion of P. canceri genes into the pDDGFP_LEU2D vector was confirmed by Sanger sequencing with gal1_forward and gfp_reverse primers (Eurofins).

Yeast Transformation and Heterologous Gene Expression
Wild-type yeast (W303-1A) were maintained in Synthetic Defined (SD; 0.067% yeast nitrogen base, 2% glucose, drop-out mix with or without uracil). pDDGFP-LEU2D-ISCU and pDDGFP-LEU2D-PSS were transformed into S. cerevisiae using the S. C. EasyComp Transformation Kit (Invitrogen) according to manufacturer"s protocol and selected on SD-ura; SD without uracil. For fluorescence microscopy, cells were grown on SD-ura + 2% glucose media overnight at 30 °C. A 100 µl aliquot of cells were transferred to SD-ura supplemented with 2% galactose and lacking glucose for the induction of the genes and incubated at 30 °C for 4-6 h. To stain mitochondria and nuclei, Mitotracker Red CMXRos (final concentration of 200 nM) was added to the cells for 20 min, followed by DAPI (2 µg/ml) for 10 min, respectively. After incubation, cells were washed with PBS and loaded in a 1% agarose pad. Cells were imaged on a Zeiss Axio Imager.z2 microscope, using a Plan-Apochromat 100×/1.40 Oil Ph 3 M27 immersion oil objective lens. Excitation wavelength: mCherry (Mitotracker) 587 nm, EGFP 488 nm, and DAPI 353 nm, emission wavelength: mCherry 610 nm, EGFP 509 nm, and DAPI 465 nm. Exposure times for each channel were: mCherry 100 ms, EGFP 500 ms, DAPI 170 ms. Images were processed using linear adjustments (e.g., brightness/contrast) and deconvolved using Zen. Expression of PSS resulted in changes to the yeast cell morphology; additional pictures are provided in supplementary figure S6, Supplementary Material online.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).