The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures.


Introduction
Mitochondrial (mt) genomes are considered characteristic of eukaryotic cells (Lang et al. 1999; Gray et al. 2004). Although eukaryotic mt genomes are believed to have arisen from alpha-proteobacteria, extant eukaryotes possess either linear or circularized mtDNA with varied and reduced gene content (Lang et al. 1999). For example, most metazoan mt genomes are 13-20 kb, compact, circular molecules, encoding 12-13 proteins, 24-25 transfer RNAs (tRNAs), and 2 ribosomal RNAs (rRNAs). On the other hand, linear mtDNAs with terminal repeats (putative telomeres) have also been found in many species, such as the yeast, Candida, and the ciliate, Tetrahymena (Lang et al. 1999; Rycovska et al. 2004).
The ciliates, Paramecium aurelia, Tetrahymena pyriformis, and Tetrahymena thermophila, have linear mt genomes of 40-47 kb, which contain approximately 50 genes (Gray et al. 2004). In contrast, only three protein-coding genes (cox1 [cytochrome oxidase subunit I], cox3 [cytochrome oxidase subunit III], and cob [cytochrome b]) and fragmented rRNAs (LSU, large subunit; SSU, small subunit) have been identified in mt genomes of apicomplexans and dinoflagellates (Feagin 1992; Nash et al. 2008; Vaidya and Mather 2009). Recent work on the mt genome of the malaria parasite, Plasmodium falciparum, found additional fragmented rRNAs and uncharacterized small RNAs (Feagin et al. 2012). Diverse linear mt genomes have been reported in apicomplexans (Vaidya and Mather 2009). For example, Plasmodium has mt genomes in tandemly repeated arrays with a unit length of approximately 6 kb. On the other hand, Babesia and Theileria have monomeric mt genomes (Hikosaka et al. 2012). Previous work on dinoflagellate mt genomes has suggested complex organization, with extensively recombined and fragmented gene copies (Waller and Jackson 2009). Fragmented mt genomes and/or transcripts have been reported in at least 25 dinoflagellate taxa (table 1). The foregoing studies have confirmed that dinoflagellate mtDNA includes cox1, cox3, cob, and fragmented rRNAs, and have detailed unusual mRNA characteristics (reviewed by Nash et al. 2008; Waller and Jackson 2009). Extensive RNA editing of the three protein-coding genes (reviewed in Lin et al. 2008) and trans-splicing of cox3 have been reported (Jackson et al. 2007; Imanian et al. 2012; Jackson and Waller 2013). However, transcripts from the basal dinoflagellates, Hematodinium sp. and Oxyrrhis marina, did not show RNA editing (Slamovits et al. 2007; Jackson et al. 2012). Trans-splicing of cox3 was not found in O. marina (Slamovits et al. 2007). Losses of canonical start and stop codons have also been suggested (Norman and Gray 1997; Jackson et al. 2012; reviewed in Nash et al. 2008).
On the other hand, analyses of noncoding sequences have been frustrated by high recombination rates in these genomes (Patron et al. 2005; reviewed in Waller and Jackson 2009). In addition, some reports have suggested that the total dinoflagellate mt genome size is likely to be large (Waller and Jackson 2009; Shoguchi et al. 2013), and the dinoflagellate mt genome is thought to be one of the most complex (Nash et al. 2008). For example, it is estimated that 85% of the mt genome in Amphidinium carterae is noncoding (Nash et al. 2007). Although inverted repeat (IR) elements in intergenic regions have been reported, functions of these elements are unknown (Waller and Jackson 2009). Thus, it has been assumed that each alveolate lineage developed a different mt genomic structure (Slamovits et al. 2007). Interestingly, recently reported mt genomes of colponemids, an early alveolate lineage, suggest that the ancestral alveolate genome encoded a typical mt gene set (Janouškovec et al. 2013).
Our previous work on the endosymbiotic dinoflagellate, Symbiodinium minutum, has confirmed the presence of unusual nuclear (Shoguchi et al. 2013) and plastid genomes. In addition, this species may possess high mt genome copy numbers (Shoguchi et al. 2013). In this study, by analyzing this wealth of sequence data, we characterized the Symbiodinium mt genome and transcriptomes, including many noncoding sequences, and we compared them with mt genomes of Plasmodium and other dinoflagellates. Assembly of fragmented DNA is technically difficult in general, but physical link information from fosmid end sequencing greatly aided mt genome assembly. Our analysis reveals noncoding sequences that have been conserved during myzozoan (apicomplexan and dinoflagellate) mt genome evolution. In addition, Symbiodinium is a large genus, classified into nine major clades (Coffroth and Santos 2005; Pochon et al. 2014); therefore, the complete Symbiodinium mt genome will be an important resource for studying populations and environmental adaptations using genomic approaches (Shinzato et al. 2014).

Results and Discussion
The De Novo Assembled mt Genome of S. minutum
To reconstruct the mt genome of S. minutum, 20 analyses using only high coverage Illumina paired-end reads (DNAseq) were performed (see also Materials and Methods). Two candidate mt contigs having more than 100× read coverage were obtained (19,577 and 291,416 bp) (accession numbers: LC002801 and LC002802) by 49-kmer assembly. Physical link information from fosmid paired-end sequences (FPES) supported these contigs (fig. 1A). In addition, joining of the 3′-end of the approximately 19-kb contig and the 5′-end of the approximately 291-kb contig was supported by FPES. BLAST (Basic Local Alignment Search Tool) searches showed that the approximately 19-kb contig contains the cox1 gene. The approximately 291-kb contig contains cob, cox3, and fragments of the LSU rRNA gene. Gene locations are explained in detail hereafter. Comparisons between the two contigs and the S. minutum genome assembly v1.0, using mapped FPES, showed that only scaffold 7473 (length: 15,538 bp) from genome assembly v1.0 (Shoguchi et al. 2013) was joined to the approximately 291-kb contig by more than 80 FPESs (fig. 1A). This suggested that nearly 40 kb of mtDNA had been identified. Estimation of the lengths of the two gaps was difficult. Accordingly, two bases (NN) were arbitrarily added between the two contigs and between the 291-kb contig and scaffold 7473 (see also fig. 1A). Comparison of the assembled mt genome with FPESs implies the presence of multiple recombinant mtDNA fragments, but our analysis suggests that S. minutum has a continuous mt genome of approximately 326 kb. Only simple repeats of fewer than 8 bp (~1.49%) and low-complexity sequences (~0.23%) were found in the mt genome assembly. The 49-bp repeats, which might be relevant to the assembly process, occurred fewer than four times in the approximately 326 kb.
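The approximately 326-kb figure can be checked from the lengths given above. The following is a minimal sketch, assuming the three pieces are simply concatenated with the two-base (NN) placeholder at each junction; the function and variable names are illustrative and do not come from the original analysis.

```python
# Lengths (bp) as reported in the text.
CONTIG_COX1 = 19_577      # ~19-kb contig containing cox1
CONTIG_MAIN = 291_416     # ~291-kb contig containing cob, cox3, LSU fragments
SCAFFOLD_7473 = 15_538    # scaffold 7473 from genome assembly v1.0
GAP_PLACEHOLDER = 2       # "NN" added at each junction of unknown length


def assembled_length(pieces, gap=GAP_PLACEHOLDER):
    """Length of pieces joined end to end with a fixed-size gap placeholder.

    Assumption: one placeholder per junction, i.e. len(pieces) - 1 gaps.
    """
    if not pieces:
        return 0
    return sum(pieces) + gap * (len(pieces) - 1)


total = assembled_length([CONTIG_COX1, CONTIG_MAIN, SCAFFOLD_7473])
print(total)  # 326535
```

Because the true gap lengths could not be estimated, this total is a lower bound on the length of the continuous molecule rather than an exact size.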

Transcriptomes of Symbiodinium mt Coding Genes
RNAseq reads were mapped onto the continuous genome (fig. 1B), revealing high coverage of cox1, cob, cox3, and the fragmented LSU gene (fig. 1B). Mapped data indicated the possibility of polycistronic expression. Mapping of reads with poly A or T in the 5′ sequence showed four major peaks for three protein-coding genes and the fragmented LSU. The highest peak is likely to be from the fragmented LSU gene, suggesting high expression and enhanced polyadenylation during RNA processing. Reads mapped from the transcription start site (TSS) library showed high coverage of multiple sites, suggesting multiple 5′ cleavage sites and transcripts with modified 5′-phosphate groups (fig. 1B). Symbiodinium minutum mt transcripts did not show evidence of RNA processing, such as the 5′ oligo(U) caps of O. marina mt transcripts (Slamovits et al. 2007).
RNA editing sites in transcripts of cox1, cob, and cox3 were investigated by comparing the assembled genome with transcripts. A-to-G editing was found at 61% of the 72 sites, showing conservation between dinoflagellates (table 2; Lin et al. 2008). In addition, patterns of RNA editing-mediated amino acid substitutions correspond to those in a previous report on another species (supplementary fig. S1, Supplementary Material online; Lin et al. 2008).
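The comparison between genomic and transcript sequences described above can be sketched as follows. This is an illustrative example only, assuming the two sequences are already aligned, of equal length, and free of gaps; the function names and the toy sequences are hypothetical and are not the data or the pipeline used in this study.

```python
from collections import Counter


def find_edit_sites(genomic, transcript):
    """Return (position, genomic_base, transcript_base) for each mismatch.

    Assumptions: sequences are pre-aligned and ungapped; positions are
    1-based; U in the transcript is treated as T for comparison.
    """
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    g_seq = genomic.upper()
    t_seq = transcript.upper().replace("U", "T")
    sites = []
    for pos, (g, t) in enumerate(zip(g_seq, t_seq), start=1):
        # Skip ambiguous bases such as N so they are not counted as edits.
        if g != t and g in "ACGT" and t in "ACGT":
            sites.append((pos, g, t))
    return sites


def edit_type_fractions(sites):
    """Fraction of edit sites belonging to each substitution type."""
    counts = Counter(f"{g}>{t}" for _, g, t in sites)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Hypothetical toy sequences, for illustration only.
dna = "ATGACCATAGGA"
rna = "ATGGCCGTAGGC"
sites = find_edit_sites(dna, rna)
print(sites)                       # [(4, 'A', 'G'), (7, 'A', 'G'), (12, 'A', 'C')]
print(edit_type_fractions(sites))  # A>G accounts for two of the three sites
```

In practice, mismatches would also need to be distinguished from sequencing errors and polymorphisms, for example by requiring support from multiple RNAseq reads.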
Another unusual feature of dinoflagellate mt genes is the lack of canonical start and stop codons to direct the initiation and termination of translation (Norman and Gray 2001; Jackson et al. 2012; reviewed in Nash et al. 2008). We characterized start and stop codons of S. minutum mt genes using manual alignments between genomic and transcriptomic sequences (supplementary fig. S2, Supplementary Material online). We found AUA (Ile) and AUU (Ile) at the 5′ end of cox1 and AUU (Ile) in cox3. These are candidate start codons, as mt genes in both ciliates and apicomplexans use AUA and AUU for this purpose (Feagin 1992; Edqvist et al. 2000).

Noncoding RNA Genes and Gene Map
In the apicomplexan P. falciparum, 39 RNA genes, including fragmented rRNA LSUs (15), SSUs (12), and uncharacterized small RNAs (12), have been identified (Feagin et al. 2012). These rRNA fragments are not arranged linearly, but synteny is conserved in Plasmodium (Vaidya and Mather 2009). It has been suggested that this fragmentation occurred in the common ancestor of apicomplexans and dinoflagellates (Slamovits et al. 2007; Jackson et al. 2012). To predict fragmented rRNAs in the S. minutum mt genome, the most similar regions from P. falciparum (Feagin et al. 2012) were surveyed and aligned (table 3). tRNA genes were not found in the S. minutum mt genome using tRNAscan-SE. So far, no studies of dinoflagellate or apicomplexan mtDNAs have identified any tRNA genes, suggesting that tRNAs are imported from the nuclear genome, as was reported for the apicomplexan, Toxoplasma gondii (Esseiva et al. 2004).
Interestingly, two LSU fragments, L4 and L5, map onto neighboring regions of the S. minutum mt genome with fewer than 100 bp between them (table 3). Comparisons of mt gene arrangements between Symbiodinium and Plasmodium showed only one microsyntenic region, in which S12 and S10 have the same gene arrangement (fig. 2 and supplementary fig. S4, Supplementary Material online), suggesting that the fragmentation occurred in the common ancestor of apicomplexans and dinoflagellates and that genome rearrangements in these lineages were very frequent. Thus, this basic information is valuable for possible functional analysis of dinoflagellate mt genomes.

Unknown Noncoding Regions and Possible Expansion in the Dinoflagellate Lineage
Our analysis confirms that noncoding sequences of the Symbiodinium mt genome have been expanded, raising the question as to where the expanded sequences originated. Enormous expansion of intergenic content in the mt genomes of seed plants (200-2,900 kb) has been reported (Mower et al. 2012), and repeated proliferation of "selfish" DNA has contributed overwhelmingly to these expansions (Chaw et al. 2008). Highly repetitive sequences were not found in the mt genome of S. minutum. In addition, when compared with other dinoflagellate mt sequences, the S. minutum mt sequences suggest additional RNA fragments or pseudogenes (supplementary fig. S5, Supplementary Material online), but these were not identified.
To find conserved secondary structures of potential RNA genes, each chopped 300-bp sequence from the mt genome was employed as a query sequence to perform RNA homology searches using Infernal (Nawrocki and Eddy 2013). Twenty-two sequences showed similarity to reported sequences in the Rfam database, including microRNAs and LSUs (Nawrocki et al. 2015), suggesting the presence of unknown RNA genes (supplementary table S1, Supplementary Material online). Unexpectedly, possible secondary structures for these genes included a stem of more than 20 bp (supplementary fig. S6, Supplementary Material online). Although we did not find sequence similarities to reported small IRs of dinoflagellate mtDNA (reviewed in Waller and Jackson 2009) in the S. minutum mtDNA, the result suggests that structures comprising IRs are conserved characters among dinoflagellate mtDNAs.
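The chopping step can be illustrated with a short sketch. It assumes a 300-bp window with the 100-bp overlap stated in the methods, so consecutive windows start 200 bp apart; keeping a shorter final window at the end of the sequence is an assumption made here, and the function name is hypothetical.

```python
def chop_windows(seq, window=300, overlap=100):
    """Split seq into overlapping windows, returning (start, subsequence) pairs.

    Assumptions: starts are 0-based; consecutive windows share `overlap`
    bases; a final window shorter than `window` is kept rather than dropped.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    windows = []
    start = 0
    while start < len(seq):
        windows.append((start, seq[start:start + window]))
        if start + window >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step
    return windows


# Toy 800-bp sequence, for illustration only.
toy = "ACGT" * 200
print([start for start, _ in chop_windows(toy)])  # [0, 200, 400, 600]
```

Each window could then be written to a FASTA record and used as a query for the homology search. The overlap reduces the chance that a short RNA gene is split across a window boundary and missed.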
Transcriptional control of the alveolate mt genome is not clear (Gray et al. 2004; Waller and Jackson 2009). Our RNAseq reads support the possibility of polycistronic expression. To detect conserved intergenic regions, similarities to the six intergenic sequences of P. falciparum and mtDNA sequences of dinoflagellates were surveyed. Interestingly, our analysis showed that intergenic sequences of P. falciparum have similarities to the mt genome of S. minutum at the same level as comparisons between rRNA sequences (fig. 2 and supplementary fig. S7, Supplementary Material online).
It is very interesting to examine how organelle genomes of dinoflagellates evolved different structures, given the large variety of structures between plastid and mt genomes of Symbiodinium. Although the plastid genome has undergone reconfiguration to a compact DNA minicircle (~1.8-3.0 kb) with its own regulatory regions (Mungpakdee
FIG. 1B.-High coverage reads are found on cox1, cox3, fragment E of the ribosomal LSU, and cob. Only reads with poly A or T (more than four) are shown in the middle graph, suggesting polyadenylated transcripts and potential 3′-ends. The lower graph displays reads from the TSS library, which is enriched for RNA with 5′ cap structures, indicating the presence of multiple 5′-ends.

Transcriptome Mapping
Transcriptome reads deposited at DRP000944 were mapped onto the assembled mt genome using Bowtie 2 (version 2.1.0) software (Langmead and Salzberg 2012). Detection of RNA editing for cob, cox1, and cox3 was performed in essentially the same manner as for plastid transcripts. Differences between DNA and RNA were detected by aligning transcriptome contigs to a scaffold. RNAseq reads were mapped onto mt transcriptome contigs using TopHat (Trapnell et al. 2009) and accuracy was confirmed. Reads from a TSS library (Yamashita et al. 2011; Shoguchi et al. 2013) were mapped using Bowtie 2, as described in Mungpakdee et al. (2014).

Data Analysis Software and Sequences Used for Comparisons
Repeats in the mt genome assembly were detected using RepeatMasker in default mode (http://www.repeatmasker.org). tRNAscan-SE (with default parameters in organellar mode) (Schattner et al. 2005) was used to find tRNA genes in the mt genome and transcriptome contigs. Sequence alignments between Symbiodinium and Plasmodium were performed using GENETYX-MAC version 17 and BLASTN. RNAfold (http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi; Zuker and Stiegler 1981) and CentroidFold (http://www.ncrna.org/centroidfold/; Sato et al. 2009) with initial settings were used for prediction of RNA secondary structure. We prepared chopped 300-bp sequences from the mt genome with 100-bp overlaps. RNA homology searches for each of the 300-bp sequences were performed using Infernal (INFERence of RNA ALignment; Nawrocki and Eddy 2013).
FIG. 2.-Mitochondrial gene order comparisons between S. minutum and P. falciparum. Genes from the S. minutum mt genome (~326 kb; upper) are joined to those of P. falciparum. The gene order of S10 and S12 was the same in mt genomes of both P. falciparum and S. minutum (aqua lines), showing minimal conservation of gene order. Sequence similarities from intergenic regions of the P. falciparum mt genome are indicated by orange lines. Details for the S. minutum mt genome map are shown in supplementary figure S4, Supplementary Material online.