High Variability of Mitochondrial Gene Order among Fungi

From their origin as an early alpha proteobacterial endosymbiont to their current state as cellular organelles, large-scale genomic reorganization has taken place in the mitochondria of all main eukaryotic lineages. So far, most studies have focused on plant and animal mitochondrial (mt) genomes (mtDNA), but fungi provide new opportunities to study highly differentiated mtDNAs. Here, we analyzed 38 complete fungal mt genomes to investigate the evolution of mtDNA gene order among fungi. In particular, we looked for evidence of nonhomologous intrachromosomal recombination and investigated the dynamics of gene rearrangements. We investigated the effect that introns, intronic open reading frames (ORFs), and repeats may have on gene order. Additionally, we asked whether the distribution of transfer RNAs (tRNAs) evolves independently to that of mt protein-coding genes. We found that fungal mt genomes display remarkable variation between and within the major fungal phyla in terms of gene order, genome size, composition of intergenic regions, and presence of repeats, introns, and associated ORFs. Our results support previous evidence for the presence of mt recombination in all fungal phyla, a process conspicuously lacking in most Metazoa. Overall, the patterns of rearrangements may be explained by the combined influences of recombination (i.e., most likely nonhomologous and intrachromosomal), accumulated repeats, especially at intergenic regions, and to a lesser extent, mobile element dynamics.


Introduction
Mitochondria play various essential roles in eukaryotic cells, particularly with respect to their primary functions in respiratory metabolism and energy production. From its origin as a proto-mitochondrion derived from an alpha-proteobacterium to its current state as a cellular organelle, major changes have occurred not only in terms of protein repertoire but also in the organization of the mitochondrial (mt) genome (Adams and Palmer 2003;Gabaldon and Huynen 2004). Previous studies have shown that subsequent to the endosymbiotic origin of mitochondria, a high percentage of the ancestral alpha-proteobacterial protein-coding genes were lost and replaced by proteins of different origins (Gabaldon and Huynen 2003;Gabaldon and Huynen 2007). The loss of ancestral bacterial genes resulted in significant genome size reduction (Bullerwell and Lang 2005). mt genome evolution differs remarkably among the major groups of eukaryotes. Comprehensive reviews are available about mt genome evolution in animals (Boore 1999), plants (Levings and Brown 1989;Palmer et al. 2000;Kitazaki and Kubo 2010), protists (Gray et al. 1998;Burger et al. 2003), and fungi (Paquin et al. 1997), as well as comparison among these lineages (Burger et al. 2003;Bullerwell and Gray 2004). Animal mt genomes are generally gene rich, practically intron less, and they have a high rate of DNA sequence evolution. Gene order tends to be conserved, especially within major phyla although they can be variable between them (Boore 1999). In the last few years, however, examples from different animal groups, in particular molluscs (Boore and Brown 1994;Yamazaki et al. 1997;Boore 1999;Kurabayashi and Ueshima 2000;Rawlings et al. 2001), have challenged this view showing that rearrangements can occur within animal mt genomes (Perseke et al. 2008;Bernt and Middendorf 2011). An important feature is that animal mtDNAs typically do not engage in recombination (but see Rokas et al. 2003; Barr et al. 2005). In contrast, plant mt genomes have sequence characteristics that are associated with more frequent recombination, including large intergenic regions that can house repeated DNA sequences and a variable number of introns and their associated intronic open reading frames (ORFs) (Palmer et al. 2000). Such repetitive genomic elements contribute to the significant increase of mt genome size in some plants (e.g., in the Silene genus, Sloan et al. 2012). Also, plant mt genomes have experienced frequent rearrangements, particularly in vascular plants, and they have higher gene order variability relative to animal mt genomes (Palmer et al. 2000;Kitazaki and Kubo 2010;Galtier 2011;Liu et al. 2011 and references therein). Interestingly, algal mt genomes do not show many rearrangements and are thus a group of plants retaining many characteristics of the ancestral eukaryotic mitochondria (Liu et al. 2011). Plant mt genomes tend to have rates of DNA sequence evolution that are lower than in animals (Palmer et al. 2000;Kitazaki and Kubo 2010). They can also perform extensive RNA editing, although this capability is variable between plant lineages (Hecht et al. 2011;Liu et al. 2011). Other, less-well-studied eukaryotes include the phylogenetically diverse and nonmonophyletic protists, whose mt genomes can display the most variation in organizations (Burger et al. 2003;Bullerwell and Gray 2004). Protist mt genomes can be either linear or circular, have multiple linear chromosomes transcribed separately, and vary dramatically in size (Burger et al. 2003;Vlcek et al. 2011). There does not seem to be a generalized tendency in terms of rate of mt evolution or capacity to recombine among protists and they exhibit variability in terms of gene order (Gray et al. 1998;Burger et al. 2003).
Fungal mt genomes have been less studied than their animal or plant counterparts, yet they hold great potential for illuminating the evolution of organellar genomes. The most evident feature is that, although gene content is largely conserved, their relative gene order is highly variable, both between and within the major fungal phyla (Paquin et al. 1997 and references in table 1). One difference between the largest two fungal phyla, Ascomycota and Basidiomycota, is that in most ascomycetes, genes are typically encoded on the same mtDNA strand, whereas in basidiomycetes, they can be encoded on both mtDNA strands (references in table 1). Another remarkable characteristic is that, although fungi are a lineage more closely allied with animals, mtDNAs in these organisms show signals of recombination, a characteristic that is more similar to plant mtDNAs. Also similar to plants, fungal mt genomes can have large intergenic regions including sequence repeats and introns. Interestingly, fungi have mostly group I introns, whereas plant mitochondria tend to possess preferentially group II introns (Lang et al. 2007). Intron numbers are highly variable in fungal mtDNA; for instance, although the mitochondrion of the ascomycete Podospora anserina has 15 group I introns and 1 group II intron, that of the basidiomycete Schizophyllum commune shows no introns at all (Specht et al. 1992;Paquin et al. 1997). In fact, mt genome size variation can be explained in large part by differences in the number and length of introns: intron length can range from a few bp up to 5 kb (Lang et al. 2007). Such fungal introns often display autonomous proliferation in mt genomes via homing endonucleases (HEs), proteins with DNA endonuclease activity that allows them to move easily in the genome by intron transfer, and site-specific integration or "homing" (Lazowska et al. 1980;Lambowitz and Perlman 1990;Pellenz et al. 2002).
The presence of repetitive DNA within mt genomes in the form of introns and their associated traits of self-splicing and insertion endonuclease activity may contribute to the structural dynamics of fungal mt genomes, eliciting changes in gene order, overdispersal of repetitive elements, and the introduction of new genes through horizontal gene transfer (HGT) (Vaughn et al. 1995;Ferandon et al. 2010). Moreover, the repeats accumulated in mt introns have been associated with increased recombination and deletions , a process that is frequently invoked to explain differences in mt gene order in fungi but that is remarkably absent in Metazoa (Saccone et al. 2002). Finally, another factor potentially contributing to gene order variation is the distribution of transfer RNAs (tRNAs), which display editing, excision, and integration capabilities, that allow them to change location within the genome and participate in HGT events (Tuller et al. 2011). Because changes in tRNA location are relatively rare events, tRNA location within fungal mt genomes has been used to study fungal evolution and phylogenetic signal (Cedergren and Lang 1985).
To date, a number of studies have provided insights into fungal mt genomes (see references in table 1); however, to our knowledge, there has not been a large-scale comparative analysis providing a broader picture of the evolution of fungal mt genomes, especially of the remarkable variability in gene order. Here, we therefore set out to investigate variation in gene order among fungal mt genomes, including basidiomycetes, ascomycetes, and fungi from early diverging lineages. Our species sampling provided the taxonomic depth and balance and established the context for a comprehensive analysis of gene order evolution in fungi. Basal fungal taxa, being highly divergent with respect to our sampled ascomycetes and basidiomycetes, were analyzed separately to obtain reliable alignments. We investigated possible causes of gene order variability, specifically, we assessed 1) evidence of recombination and the dynamics of gene rearrangements and 2) the role played by intergenic regions and their associated repeats, the number of introns, intronic ORFs, and tRNA distribution. Finally, we discuss how the combined roles of recombination, chromosomal rearrangements, insertion of introns, and associated mobile elements can contribute to the high variability of gene order and tRNA distribution among fungal mitochondria.

Phylogenetic Inference
To analyze the evolution of gene order through time and across the sampled species, we first reconstructed a phylogenetic tree to map the different gene orders in an evolutionary context. The two data sets defined in this study, the dikarya and the basal fungi data sets, were analyzed independently using the same methods. First, protein sequences of the orthologs shared by all sampled species, including protein-coding genes cox1, cox2, cox3, atp6, atp8, atp9, nad1, nad2 nad3, nad4, nad5, nad4L, and nad6, were aligned using a combination of six different alignment strategies (Muscle, Mafft, and dialignTX, in forward and reverse). These alignments were automatically trimmed with trimAl (Capella-Gutierrez et al. 2009) to remove poorly aligned regions based on the fraction of gaps (0.1) and the consistency across aligners (>0.16) before they were concatenated. Subsequent phylogenetic reconstruction combined neighbor joining and maximum likelihood (ML), using PhyML (Guindon et al. 2009) and RAxML v.7.2.6 (Stamatakis 2006). For the ML analyses, four substitution rate categories were used, estimating the gamma parameter and the fraction of invariable sites from the data. Support values were computed using an approximate likelihood ratio test. Bootstrap analysis was conducted with 100 resampling iterations. Once we inferred the tree, we estimated evolutionary rates with the r8s software v. 1.8 (Sanderson 2003). We used the global Langley and Fitch (LF) model, which estimates a single evolutionary rate across the whole tree (i.e., assuming a molecular clock), and the local LF models allowing for the estimation of local rates for groups of clades within the tree. The approximate ages for internal nodes were obtained using the divergence of basidiomycetes and ascomycetes (Taylor and Berbee 2006;Lucking et al. 2009), conservatively set at 500 Ma, and the whole-genome duplication event within the Saccharomyces clade (Wolfe and Shields 1997), set at 100 Ma. These two dates were used as calibration points. The evolutionary rates and estimated node ages were subsequently used to infer an approximate rate of rearrangements per clade.

Whole-Genome Alignments, Recombination, and Rearrangement Events
Because whole-genome alignment methods produce better results, the more similar the genomes are, we decided to align groups of mt genomes that are not too distant in terms of sequence identity. To identify which genomes could be aligned together, we built a composite likelihood distance matrix based on the concatenated alignment of protein-coding genes cox1, cox2, cox3, atp6, atp8, and atp9. We determined the Euclidian phylogenetic distance and used the hierarchical agglomerative clustering method available in R (R Development Core Team 2011), with h ¼ 0.4 to determine the groups of most closely related genomes that could be used for whole-genome alignment. With the dikarya data set, we obtained the nine following groups (hereafter referred to as fungal clusters): 1) "basidios1," including Tr. cingulata, Mo. perniciosa, S. commune, and Pl. ostreatus; 2) "basidios2," including T. indica, U. maydis, and C. neoformans; 3) "basidios3," including M. violaceum-Sl and Ph. pachyrhizi; 4) "sordariomycetes," including B. bassiana, Co. bassiana, Ch. thermophilum, P. anserina, F. oxysporum, G. zeae, L. muscarium, and Me. anisopliae; 5) "saccharomycetes1," including Ca. albicans, D. bruxellensis, De. hansenii, Mil. farinosa, O. angusta, and Pi. pastoris; 6) "saccharomycetes2," including Ca. glabrata, K. lactis, N. bacillisporus, and V. polyspora; 7) "dothideomycetes," including My. graminicola, Pel. membranacea, and Ph. nodorum; 8) "eurotiomycetes," including A. obtusum, Mi. canis, Pa. brasiliensis, and Pen. marneffei; and 9) "basals," including Al. macrogynus and Rhizophydium sp. Neither Schizos. japonicus nor Y. lipolytica could be reliably aligned with the other clusters so they were excluded from further analysis. The complete mt genomes in each cluster were aligned with Mauve v.2.3.1 (Darling et al. 2010) using the full alignment option. This general-purpose multiple sequence aligner is able to handle whole-genome alignments and has the advantage that it identifies syntenic blocks despite rearrangements and reversals. We further refined the alignments of the syntenic blocks using t-coffee (Notredame et al. 2000) and analyzed them with GRIMM v. 1.04 (Tesler 2002) and UniMoG (Hilker et al. 2012) to infer a minimal history of rearrangements among the aligned genomes. We assumed the Double-Cut and Join (DCJ), restricted DCJ, Hannenhalli and Pevzner (HP), inversion, and translocation models. These methods predict optimized rearrangement scenarios between pairs of gene order lists. GRIMM infers inversions and takes only lists of gene orders including the same number of genes, in other words, it does not take into account gene losses, whereas UniMoG does. The latter has the advantage that it extends the DCJ to include the HP, inversion, and translocation models. Finally, the syntenic blocks, the intergenic regions, and the individual one-to-one orthologs of all genomes were tested for recombination, the most likely mechanism explaining the observed gene order variability.
There are several methods available to detect different types and signals of recombination (Martin et al. 2011). In our case, we needed methods that could identify incongruent blocks of sequence along the alignments. We chose methods that look for incongruence in terms of patterns of sites, including RDP3 v.4.16 (Martin et al. 2010), PhiPack (Bruen et al. 2006), and Recco (Maydt and Lengauer 2006). However, as far as we know, there is no specific software for the detection of nonhomologous intrachromosomal recombination, so it is possible that the methods available do not perform optimally for identifying this type of event. Nevertheless, looking for evidence of intrachromosomal recombination is often coincident with identifying breakpoints, reversals, and translocations, so the rearrangements we inferred using GRIMM (Tesler 2002) and UniMoG (Hilker et al. 2012) were used as a proxy for the particular case of intrachromosomal recombination.

Gene Order Variability: Modeling Gene Order Evolution
Gene order can be studied directly by the comparison of the sequential order of mt genes described in their respective articles and/or genetic maps (see references in table 1). We used these data to investigate the most likely evolutionary scenarios: We estimated the gene order conservation (GOC) index as described in  and the branch-specific GOC described in Fischer et al. (2006). GOC is simply defined as the number of contiguous ortholog pairs that are in common between compared genomes, normalized by the number of shared orthologs. Conversely, gene order loss (GOL) is defined as 1-GOC. As described in , the empirical models defined in that study attempt to fit the loss of GOC with respect to time. Model 0, proposed by Tamames (2001), is the simplest model that approximates GOC to a sigmoidal curve described by GOC ¼ 2/1 + e at , where parameter a is adjusted by regression. Model 1 fits time dependence with a square root dependence, thus GOC ¼ 1 -ˇat. Model 2 considers that GOC decreases with time in a negative proportion to the square of the GOC at time t, hence 1/GOC ¼ at + 1. Finally,  proposes a probabilistic approach where the probability (P) of two genes staying together after t consecutive generations is given by P ¼ p t . Thus, in Model 3: GOC ¼ p t . Note that this expression assumes that P is the same for all genes, which is thus somewhat unrealistic. We decided to also use the approach described in Fischer et al. (2006), where a measure of GOL for each branch in the tree is obtained and is thus more specific than the previously described empirical models. Branch-specific GOL (bsGOL) scores are obtained by minimizing the sum, over all the possible pairwise comparisons at hand, of the squared differences between the frequency of the observed GOL events and the sum of the predicted branch-specific values. The following expression is minimized to obtain the bsGOL scores: where b i,j is a Boolean variable that specifies the branches that are relevant for the estimation of a particular bsGOL (i.e., 0 if it is not relevant and 1 if it is), x j is obtained by minimizing L and is the actual bsGOL value, and GOL i are the estimated values from the pairwise comparisons, in other words, GOL i ¼ 1 À GOC i (Fischer et al. 2006). We approximated the GOC and bsGOL models described above to determine which of these models best fitted the data. In an attempt to better understand the process of gene order shuffling, we conducted a test to verify whether the changes observed in gene order occur randomly or whether they suggest other forces at work: Briefly, for each genome, we listed the order of genes, made all possible pairwise comparisons of these lists, estimated the GOC score , shuffled randomly the gene order, and estimated a new GOC value. We repeated this procedure 100,000 times and compared the original GOC score to the distribution of the GOCs after shuffling. We applied the Bonferroni correction for multiple testing and determined the significance (P values) of the comparisons. This test would indicate whether GOC between a pair of genomes is significantly different from random.

Influence of tRNA Distribution, Intergenic, and Intronic Elements on Gene Order
To determine which genomic elements play a significant role in shaping mt gene order evolution, if any, we first obtained Fungal Mitochondrial Genomics the number of mt protein-coding genes, introns, intronic ORFs, and repeats in all our sampled genomes. We also assessed the distribution of tRNAs, which is variable across fungal mt genomes (i.e., they can be dispersed across the genome, as in the case of Schizos. japonicus, or present in a few interspersed clusters, as is often the case in sordariomycetes), relative to mt protein-coding genes. The number of protein-coding genes, introns, and their associated intronic ORFs, as well as the number and location of tRNAs, were obtained from the annotations available in GenBank. Subsequently, we looked for simple repeats using RepeatMasker (Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0.1996-2010; http://www.repeatmasker.org, last accessed February 18, 2014) and mreps (Kolpakov et al. 2003) for detecting tandem repeats across the whole genomes. We focused on finding repeats located within intergenic regions because we hypothesize that they may be more likely to affect gene order. Additionally, we asked whether tRNAs are significantly more clustered in genomes with high GOC (i.e., where gene order is conserved) compared with genomes with low GOC. For every taxon, we listed the mt protein-coding genes and tRNAs in order; for each ordered list, we counted the number of noncontiguous tRNAs, performed 100,000 random permutations and recounted the number of noncontiguous tRNAs each time; we compared the count in the original ordered list with the distribution obtained by the permutations; we chose a 5% threshold for the significance of tRNA clustering. Finally, we investigated the influence that the amount of introns, intronic ORFs, intergenic repeats, and the number of predicted rearrangements may have on gene order variability. To this end, we employed Pearson's w 2 test, Fisher's exact tests, and randomization tests of independence to determine the correlation between the different genomic elements (i.e., number of introns, intronic ORFs, and repeats) and the number of rearrangement events predicted per fungal cluster.

Results
Our sampling in the dikarya data set provided the necessary taxonomic context and depth for a comprehensive analysis of gene order evolution in a phylogenetic context. The mtDNA of basal fungi was analyzed separately to obtain reliable alignments.

Phylogenetic Analysis in the Dikarya
All the 38 species analyzed in the dikarya data set have the standard core set of mt protein-coding genes (atp6, atp8, atp9, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, cob, rnl, and a variable number of tRNAs). In addition to these genes, we found the atypical presence of rsp3 (encoding the ribosomal protein S3) in S. commune, Mo. perniciosa, Pl. ostreatus, Tr. cingulata, and M. lychnidisdioicae. We also found rnpB (encoding the RNA subunit of mt RNase P) in the mt genomes of M. lychnidis-dioicae, S. commune, and U. maydis. To our knowledge, rnpB has not been described in other basidiomycete mt genomes; it has so far only been recognized in mtDNAs of a few zygomycete and ascomycete fungi, two protists, and never in animals and plants (Seif et al. 2003(Seif et al. , 2005. The inferred phylogenetic tree including all 38 species in the dikarya data set ( fig. 1) is in agreement with previously published fungal phylogenies (Marcet-Houben and Gabaldon 2009;Ebersberger et al. 2012), and most internal nodes are well supported (i.e., >90%). The global LF model that estimates a single evolutionary rate across the whole tree, that is, assuming a molecular clock, predicted 1.65 Â 10 À3 substitutions per site per time unit (Myr) and the local LF model, that is, without assuming a molecular clock, estimated 1.51 Â 10 À3 for the basal group (Al. macrogynus and Rhizophydium sp. 136), 2.1 Â 10 À3 for the ascomycetes, and 1.35 Â 10 À3 for the basidiomycetes, suggesting a faster evolutionary rate in ascomycetes relative to the basidiomycetes and the basal fungi sampled in this study. This rate is lower than the reported average rates for mammals (i.e., about 33.88 Â 10 À9 /Y, that is approximately 3.4 Â 10 À2 /Myr; Nabholz et al. 2009) but higher than that of plant mt genomes (i.e., 0.34 Â 10 À9 /Y, that is 3.4 Â 10 À4 /Myr; Wolfe et al. 1987). These are only approximate comparisons, as we did not analyze population data.

Rearrangements and Recombination in the Dikarya
Despite the overall conservation of the standard set of mt genes, we found striking variation in gene order among fungal species in the dikarya data set. Noteworthy exceptions to this trend are the nad4L/nad5 and nad2/nad3 genes, which tend to be next to each other in most species. The overlap of the stop and initiation codons between the particular genes in these two pairs is the most likely cause for their contiguity, as it occurs in Pleurotus mtDNA (Wang et al. 2008).
Because nonhomologous, intrachromosomal recombination is known to cause chromosomal rearrangements Bi and Liu 1996;Lobachev et al. 1998;Rocha et al. 1999;Waldman et al. 1999;Phadnis et al. 2005;Odahara et al. 2009;, it could potentially explain the high gene order variability we observe in fungal mitochondria. We thus set out to detect recombination among the syntenic regions and whole-genome alignments in all the fungal clusters. We did find signals of recombination in most alignments but not unequivocal evidence of nonhomologous, intrachromosomal recombination, as other types of processes may have generated similar signals. To be conservative, we decided to keep only those results supported with high confidence (P value < 0.05), but in general, most signals did not have a high support. The average recombination rate was estimated as the number of predicted recombination events normalized by the average substitution rate FIG. 1.-ML phylogeny of our sampled taxa including the 38 species in the dikarya data set. The gene tree was inferred from a concatenated alignment of 14 single-copy orthologous genes (atp6, atp8, atp9, nad1-nad6, nad4L, cob, and cox1-cox3). RAxML v.7.2.6 (Stamatakis 2006) was used assuming the LG substitution matrix and default parameters. On the right side of each taxon name is a series of colored boxes representing the mt gene order according to GenBank annotation. Bootstrap support appears next to each node. bsGOL values are shown next to each species name, and they are estimated by minimizing the following expression: L ¼ Recombination was not detected for the basal fungal cluster with high confidence. It is noteworthy that most recombination detection programs lack power when looking for sporadic traces of recombination, as it is the case in mt genomes (Posada and Crandall 2001;Barr et al. 2005;Neiman and Taylor 2009). Arguably, a better approach for investigating the evolution of gene order due to nonhomologous, intrachromosomal recombination is to use estimates of gene rearrangements as a proxy, as both involve identifying breakpoints, inversions and translocations. We, therefore, compared the gene order lists to infer the rearrangements that occurred between all pairs of species within each of the fungal clusters in the dikarya data set. The average minimal rearrangement rates, estimated as the number of predicted rearrangement events normalized by the average substitution rate for each clade (per fungal cluster) were: 0.03 for basidiomycetes (all three basidiomycete clusters); 0.01 for sordariomycetes; 0.04 for saccharomycetes1; 0.02 for eurotiomycetes; 0.05 for dothideomycetes; 0.02 for saccharomycetes2; and 0.03 for basals. These results are consistent with the overall higher gene order variability observed in basidiomycetes, saccharomycetes2, followed by the sac-charomycetes1, and in contrast to what is observed in sordariomycetes, dothideomycetes, and eurotiomycetes.

Gene Order Variability in the Dikarya
In the dikarya data set, GOC and bsGOL scores, estimated by the methods of  and Fischer et al. (2006), did not exhibit good fits to the patristic (phylogenetic) pairwise distance with tested empirical models (supplementary table S1, Supplementary Material online; fig. 2). The goodness-of-fit scores obtained were Model 0 ¼ 27.25, Model 1 ¼ 23.25, Model 2 ¼ 21.2, and Model 3 ¼ 24.43. Following Fischer's approach (Fischer et al. 2006) to refine the models with estimates of bsGOL, gene order scores were observed to vary slightly among fungal clusters (table 2; bsGOL values on the right side of each species name in fig. 1). Nevertheless, the average bsGOL score per group captures the GOL trend differences among fungal clusters: the highest average GOL score (i.e., where GOL is greatest) is for the basidiomycetes, with 0.21, followed by the early diverging fungi (Al.  fig. 3). This is also consistent with older clades, with deeper ancestral nodes, having more rearranged genes (e.g., basidiomycetes have a higher average bsGOL value than sordariomycetes).

Influence of tRNA Distribution, Intergenic, and Intronic Elements on Gene Order in the Dikarya
Rearrangements of the fungal mt genomes were influenced by the sequence characteristics, in particular the amount of repetitive DNA elements at intergenic regions. The average bsGOL value, normalized by the number of fungal species in each cluster, did not display significant correlation with any of the numbers of genomic elements (i.e., with either the number of repeats, the number of introns and their associated intronic ORFs, or the number and location of tRNAs, data not shown). Also, correlations between rearrangements and the proportions of introns and intronic ORFs were not significant (supplementary table S2, Supplementary Material online). However, the number of rearrangement events was significantly correlated with the proportion of repeats at intergenic regions (table 3): Pearson's w 2 test (observed w 2 ¼ 1,158.37, df ¼ 5, P value < 0.0001, alpha ¼ 0.05), Fisher's exact tests (P value two-tailed < 0.0001, where parameter a is adjusted by regression and t is the patristic distance between the two compared taxa.  alpha ¼ 0.05), Wilk'sG2 test of independence (observed Wilk's G2 value ¼ 1,156.383, df ¼ 5, P value < 0.0001), and a randomization test of independence with 5,000 Monte Carlo simulations (observed w 2 : 1,158.37, df ¼ 5, P value < 0.0001, alpha ¼ 0.05). Together, these results are consistent with the hypothesis that repeats favor recombination events, thereby promoting rearrangements that change gene order Bi and Liu 1996;Lobachev et al. 1998;Rocha et al. 1999;Waldman et al. 1999;Phadnis et al. 2005;Odahara et al. 2009;. In general, the more intergenic repeats in fungal mt genomes, the more likely it was to observe rearrangements and, therefore, gene order variability. The randomization test assessing pairwise GOC distributions and shuffled distributions relative to the patristic (phylogenetic) distance showed that the pairs of species whose gene order has evolved significantly differently from random correspond to the well-conserved gene order sets of the sordariomycetes (figs. 1 and 4). According to our randomization test, the other pairs of species have seen their mt DNA gene order change more or less randomly. The random permutation test implemented showed that the groups of species with highly conserved gene order, such as sordariomycetes, also showed tRNAs significantly grouped together in a few separate clusters along the mt chromosome (shown with a spiral on the right side in fig. 1). On the contrary, in species with high gene order variability (e.g., basidiomycetes), tRNAs tended to be scattered along the chromosome, consistent with the idea of tRNAs being associated to transposable elements that contribute to the variability in their distribution (Devine and Boeke 1996;Hughes and Friedman 2004) and favor the reorganization in the mt genome. Despite the presence of a few species with low gene order variability and significantly clustered tRNAs (Y. lipolytica, S. commune, Pl. ostreatus, Mo. perniciosa, and Tr. cingulata), we nevertheless detected a trend for most species with conserved gene order to have significantly clustered tRNAs and species with low conservation order to have more scattered tRNAs.

Gene Order Variability in Basal Fungi
Basal fungal taxa, being highly divergent with respect to ascomycetes and basidiomycetes, were analyzed separately to obtain reliable alignments. On the basis of the similarity matrix obtained for the basal data set, we performed a clustering analysis (previously described for the dikarya data set) that resulted in three clusters of basal species that could be reliably aligned together. We thus aligned the Blastocladiomycetes: Al. macrogynus with Bl. emersonii, the Glomeromycetes: Gi. margarita with Gl. intraradices, and the Monoblepharidomycetes: Harpochytrium sp. together with H. curvatum and Monoblepharella sp.
No recombination events were reliably detected in any of the alignments of basal fungi. The evolutionary rates (substitutions per site per Myr) predicted by r8s assuming the NPRS model were: 8 Â 10 À4 ,1 for the Blastocladiales, 6 Â 10 À4 for the Monoblepharidales, and 1 Â 10 À3 for the Glomeromycota. The rearrangement rate per clade per Myr, as predicted by UniMoG and r8s were: 0.007 for the Blastocladiales, 0.02 for the Monoblepharidales, and 0.06 for the Glomeromycota.   , and none of the sampled variables including introns, intronic ORFs, tRNAs or repeats appeared to have an effect on gene order. However, given the low number of data points available for the correlation analysis, we take these results with caution, as there may be low detection power. We therefore suggest that additional basal fungi need to be available before stronger conclusions can be drawn about the proximal causes of gene order variability. Overall, these results suggest that the mitochondria in basal fungi have evolved with a faster evolutionary rate relative to ascomycetes and basidiomycetes. On the other hand, mtDNA in basal fungi show comparable rates of rearrangements (an average of 0.029 events/Myr) with respect to ascomycetes and basidiomyctes, with the notable exception of Blastocladiomycetes (which exhibit a lower rate by one order of magnitude). Lynch et al. (2006) have pointed out that nonselective forces such as drift and mutation may account for major differences in organelle genome structure among animals, plants, and unicellular eukaryotes. Mutation rates for mtDNA vary strikingly between these groups of organisms, with animals at the highest mutation rate spectrum and plants at the lowest. According to our results, fungal mtDNAs lie at intermediate mutation rates between animal and plant mtDNA. Important features are shared between fungal and plant mtDNA: lower substitution rates than in animal mitochondria, the presence of introns and associated mobile elements, higher noncoding DNA than in animal mt genomes, and the capability of recombination and the presence of recombination-associated DNA repair mechanisms. Galtier (2011) has suggested that life cycle, metabolic, and ageing (senescence) differences may explain these striking differences between animal and plant mtDNA. We argue that such differences could also explain the discrepancies with respect to animal mtDNA and the similarities with plant mt genomes, although these comparisons have not been specifically addressed, as far as we know.

Discussion
Here, we have shown that there is high variability in terms of mt gene order among fungi, both between and within the major phyla (i.e., basidiomycetes, ascomycetes, and early diverging fungi). The mt genomes of basidiomycetes are in general among the most rearranged groups (average bsGOL ¼ 0.21), but other groups defined in this study, in particular the saccharomycetes1 and saccharomycetes2, also show high variability (bsGOL ¼ 0.2 and 0.18, respectively). On the contrary, the sordariomycetes and the eurotiomycetes have highly conserved gene arrangements (bsGOL ¼ 0.1 and 0.14, respectively). Although GOL can occur rapidly within a clade, as seen with the pairwise GOC models, it does not appear to increase linearly with time. The average bsGOL scores are somewhat more powerful at detecting trends in GOL than the empirical models for pairwise GOC. This means that, even if it is not very strong, there is a phylogenetic component to gene order variability patterns. Moreover, bsGOL scores show a moderate correlation with branch lengths. This indicates that time contributes somewhat to GOL although there could be other confounding factors (e.g., gene loss in the saccharomycetes2 and Schizos. japonicus).
GOC/GOL models measure gene order variability through time but do not offer a mechanistic explanation of this process. We used a simple nonparametric randomization test to try to identify the propensity of particular fungal groups to have greater change in gene orders. What our test showed is that, except for the sordariomycetes, which have remarkably conserved gene order within the group, all other fungal clusters seem to have genes rearranged more or less randomly. One could think that sordariomycetes display a highly conserved gene order because they constitute a relatively young fungal group. Nevertheless, in the same time unit of a million years, they have the lowest rearrangement rate compared with the other fungal clusters. Time alone does not explain gene order changes.
FIG. 4.-Pairwise GOC values as a function of the phylogenetic (patristic) distance between them, for the dikarya data set. Here, we conducted a randomization test as follows: for each genome, we listed the order of genes, made all possible pairwise comparisons of these lists, estimated the GOC score , shuffled randomly the gene order, and estimated a new GOC value. We obtained 100,000 reshufflings and compared the original GOC to the distribution of the shuffled GOCs. We applied the Bonferroni correction for multiple testing and determined the significance (P values) of the comparisons. The red dots represent significant P values, which correspond to the group of sordariomycetes (in fig. 1, the clade grouping Chaetomium thermophilum, Podospora anserina, Gibberella zeae, F. oxysporum, Lecanicillium muscarium, Cordyceps bassiana, Beauveria bassiana, and Metarhizium anisopliae).
Among the genomic elements studied here, repeats at intergenic regions show the strongest correlation with gene order. According to theoretical studies, the accumulation of repeats, among other mtDNA structural features, seems to be driven mostly by drift and mutation pressure, which are in turn largely determined by population size dynamics (e.g., genome size reduction or bottlenecks (Lynch and Blanchard 1998;Lynch et al. 2006). Although intron-associated ORFs, in particular those encoding HEs, have a great potential to insert copies in different locations within the genome, thereby changing gene order, we did not observe a strong correlation with gene rearrangements. If they play a role in shaping gene order it appears to be less important than that of simple and tandem repeats, especially those repeats present at intergenic regions.
The distribution of tRNAs contributes to protein-coding gene order variation among fungi, as they themselves can change location (Perseke et al. 2008). tRNAs have been associated with breakpoints involved in nuclear chromosomal rearrangements (Di Rienzi et al. 2009), and our results about fungal mtDNAs are consistent with this observation. Species showing the highest gene order variability are those showing a scattered tRNA distribution (e.g., Schizos. japonicus), as opposed to less variable species, whose tRNAs tended to be clustered (e.g., in sordariomycetes). Another source of gene order variability is gene loss (e.g., due to transfers to the nucleus), which could be important in the saccharomycetes2 and Schizos. japonicus. Finally, although less frequent, HGTs can also contribute to gene order changes (e.g., Bergthorsson et al. 2004) but we did not investigate it here.
A commonly invoked mechanism to explain gene order changes is the "tandem-duplication-random-loss" model (Lavrov et al. 2002) that was first proposed to explain gene order in millipedes and suggested that the entire mtDNA was duplicated with a subsequent loss of gene blocks. In our case, this model could partly explain the gene order differences (only the loss and not the duplication) observed among sac-charomycetes2 and in Schizos. japonicus due to the gene loss of the NADH gene family (Gabaldon et al. 2005), as these losses necessarily resulted in gene order changes relative to the ancestral gene arrangement that included the NADH genes. The tandem-duplication-random-loss model, however, cannot account for inversions and transpositions, which may be better explained by nonhomologous, intrachromosomal recombination. We suggest that most of the observed gene rearrangements among fungal mtDNAs are very likely caused by this or a similar recombinational mechanism. Notably, recombination has been reported in vitro and in natural populations for fungal mt genomes (van Diepeningen et al. 2010).
Difficulties in detecting recombination based on sequence data can result from multiple factors, including the scale of the genomic regions involved, where analysis of adjacent nucleotides may fail to detect recombination occurring across large physical distances (Neiman and Taylor 2009) or where sequences are not divergent enough for software to detect them (at least two phylogenetically informative sites must exist to each side of the recombination breakpoint [Martin et al. 2011]). Also, the power to detect recombination depends on the effective population size, which in the case of mitochondria depends on the bottleneck levels attributable to mt transmission (Neiman and Taylor 2009). Finally, although in principle there is one homologous site per base available for homologous recombination, there are many more sites available for nonhomologous recombination. This is consistent with the latter type of recombination being more likely to promote gene order changes.
Ectopic recombination is often facilitated by the presence of repeats in both plant and fungal mtDNAs, and it can have serious detrimental effects, including disruption of coding frames or gene expression alteration (Galtier 2011). Different strategies to protect the genome from the negative effects of ectopic recombination have evolved and they are remarkably different in plant and animal mtDNAs (Galtier 2011): although animal mtDNAs avoid the accumulation of repeats and introns at the cost of a higher mutation rate (Lynch et al. 2006), plant mtDNAs have selected for efficient recombination-mediated DNA repair mechanisms, thus explaining the low mutation rate observed in plant mt genomes (Odahara et al. 2009;Davila et al. 2011). Moreover, efficient mismatch repair is often accompanied by gene conversion in plant mtDNA (Davila et al. 2011). In this study, we have not investigated recombination-associated DNA repair mechanisms in fungal mt genomes; it is nevertheless interesting to speculate whether fungi have selected for mtDNA repair mechanisms similar to those found in plants as defense against repeat accumulation. It is known, for instance, that in P. anserina the nuclear-encoded gene grisea protects mtDNA integrity from the deleterious effects of ectopic recombination (Belcour et al. 1991). It would be particularly interesting to test this hypothesis in other fungal mtDNAs such as those of the sordariomycetes that show evidence of recombination (Kouvelis et al. 2004;Ghikas et al. 2006;Pantou et al. 2008) and high GOC.
The evolution of gene order in fungal mitochondria, particularly in basidiomycetes, suggests a complex interplay of opposing evolutionary forces. Although mt genes tend to be conserved at the sequence level due to their importance in cellular metabolism, here we have shown that in fungal mtDNA gene order is relatively free to vary, and that this variation is probably largely due to recombination (most likely nonhomologous, intramolecular). Indeed, in most studies, the diversity of gene order in mitochondria is taken as evidence of effective recombination. Furthermore, some mtDNA sequence characteristics appear to contribute to gene order variability. In particular, repeats at intergenic sequences tend to increase the probability of recombination, both homologous and nonhomologous, thereby facilitating rearrangement events, in agreement with numerous previous