The Actinobacteria are found in aquatic and terrestrial habitats throughout the world and are among the most morphologically varied prokaryotes. They manufacture unusual compounds, utilize novel metabolic pathways, and contain unique genes. This diversity may suggest that the root of the tree of life could be within the Actinobacteria, although there is little or no convincing evidence for such a root. Here, using gene insertions and deletions found in the DNA gyrase, GyrA, and in the paralogous DNA topoisomerase, ParC, we present evidence that the root of life is outside the Actinobacteria.
The Actinobacteria, among the most morphologically diverse prokaryotes, are widely distributed in both terrestrial and aquatic ecosystems (Embley and Stackebrandt 1994). Actinobacteria employ varied metabolic mechanisms, although no photosynthetic members are known. They are primarily chemoheterotrophs, which either respire or ferment. Their oxygen tolerances vary from strictly aerobic, to facultatively anaerobic, to microaerophilic, or to strictly anaerobic. In addition to utilizing some unique biochemical pathways not found in other prokaryotes, they also synthesize many macromolecules absent from other organisms, such as unique cell wall peptidoglycans (Gokhale et al. 2007). Given their diverse morphological and biochemical repertoires (Embley and Stackebrandt 1994; Boone and Castenholz 2001; Garrity and Holt 2001), properties that might indicate a deep placement in the tree of life, we investigate whether the root of life is contained within the Actinobacteria.
Here, we use top-down rooting to probe the origins of the Actinobacteria. This method analyzes indels (insertions and deletions) that are present in ingroup genes but are absent in some or all paralogous outgroup genes. Indel-based rooting methods are related to traditional methods of sequence-based rooting (Dayhoff and Schwartz 1980; Gogarten et al. 1989; Iwabe et al. 1989) but exclude roots rather than directly reconstructing rooted trees (Rivera and Lake 1992; Baldauf and Palmer 1993; Lake et al. 2007). We apply the method to all available Actinobacterial, double-membrane prokaryotic, Firmicute, and Archaeal sequences. Together these 4 groups represent all known prokaryotic life (Boone and Castenholz 2001). Archaea are primarily extremophiles and include many hyperthermophiles; Firmicutes, formerly named the low-GC gram positives, contain organisms like clostridia and bacilli; and double-membrane prokaryotes are a speciose, possibly primitively photosynthetic taxon containing all prokaryotes surrounded by double membranes.
Top-down rooting has provided evidence for excluding the root from all but 4 regions of the tree of life (Skophammer et al. 2006; Lake et al. 2007; Skophammer et al. 2007). The 4 remaining locations are 1) on the branch leading to the double-membrane prokaryotes; 2) on the branch leading to the Actinobacteria; 3) on the branch leading to the clade of the Firmicutes and the Archaea; and 4) within the Actinobacteria. Applying top-down rooting to an indel present in the type II DNA topoisomerase (GyrA) (Gupta 1998) and to the paralogous topoisomerase IV (ParC) (Champoux 2001), we provide evidence that the root of the tree of life is excluded from within the Actinobacteria and, thereby, reduce the number of possible locations for the cenancestral root.
DNA topoisomerases are essential in eubacteria, archaea, and eukaryotes. They serve to relieve the topological strains encountered by a cell during replication, transcription, recombination, and chromatin remodeling. Type II DNA topoisomerases introduce double-strand breaks and are adenosine triphosphate dependent. They are further subdivided into type IIA topoisomerases, found in all domains of life, and type IIB topoisomerases, found only in Archaea. Gyrase and topoIV are well-documented paralogs in the type IIA family and exhibit extensive sequence similarity (Champoux 2001). The prokaryotic homologs of gyrase and topoIV are heterotetramers. Gyrase contains 4 subunits, 2 each of GyrA and GyrB. These are homologous to the 2 subunits of topoIV, ParC and ParE, respectively. Gyrase genes are ubiquitous, whereas topoIV genes are present within the Eubacteria but missing in the Archaea.
Upon comparing alignments of GyrA and ParC sequences, we confirmed that a 4 amino acid GyrA insert (Gupta 1998) is present in Actinobacterial gyrase sequences, between orthologous Escherichia coli positions 204 and 205, and absent in all other prokaryotic gyrase sequences. We report that this insert is missing in all eubacterial ParC sequences (ParC is absent in the Archaea) and analyze this information using top-down rooting. Representative sequences of these root-informative genes are summarized in table 1, and complete alignments of nearly 500 GyrA and ParC sequences are included in the supplementary analyses and data, sections S1 and S2 (Supplementary Material online), respectively.
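The comparison described above amounts to asking which aligned sequences carry residues, rather than gaps, in the columns spanned by the insert. The following is a minimal sketch of that check, not the procedure used in this study. The function name `insert_carriers`, the column coordinates, and the toy sequences are all hypothetical and invented for illustration; they are not taken from the GyrA or ParC alignments in the supplementary material.

```python
GAP = "-"

def insert_carriers(alignment, start, end):
    """Return sorted names of sequences with at least one residue
    (non-gap character) in alignment columns [start, end)."""
    return sorted(
        name
        for name, seq in alignment.items()
        if any(ch != GAP for ch in seq[start:end])
    )

# Toy, invented sequences (NOT real GyrA/ParC data).
# Columns 4-7 (0-based) hold a hypothetical 4-residue insert.
toy_alignment = {
    "Actinobacteria_GyrA": "MKTLAAAAGHWQ",
    "Firmicutes_GyrA":     "MKTL----GHWQ",
    "DoubleMembrane_GyrA": "MRTL----GHWQ",
    "Archaea_GyrA":        "MKSL----GHFQ",
    "Actinobacteria_ParC": "MKTV----GYWQ",
    "Firmicutes_ParC":     "MKTV----GYWE",
    "DoubleMembrane_ParC": "MRTV----GYWQ",
}

carriers = insert_carriers(toy_alignment, 4, 8)
print(carriers)  # ['Actinobacteria_GyrA']
```

In this toy layout only the Actinobacterial GyrA row carries the insert, mirroring the distribution reported in the text; a real analysis would read the alignment from a file and would need to map E. coli positions 204 and 205 onto alignment columns.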
Our rooting analyses are summarized in figure 1. For 4 taxa, there are 9 possible trees, corresponding to 4 crown groups, 4 stem groups, and 1 internal branch. The most parsimonious rooted trees for each of the 9 possible rootings are shown in figure 1. Note that the leaves of the unrooted trees are divided into 2 separate regions because the groups corresponding to the Actinobacteria (A), the double-membrane prokaryotes (D), the Firmicutes (F), and the Archaea (R) represent higher level phylogenetic clades rather than single sequences. Thus, the roots within the distal portions of the leaves (roots 1, 2, 8, and 9) are shown as 2 lines to represent the branching within these crown groups. The proximal portions of the leaves correspond to roots 3, 4, 5, 6, and 7. As shown by the large X in figure 1, the least parsimonious rooted tree (root 2, within the Actinobacteria) requires 3 changes, whereas roots 1 and 3–9 require only 2 changes. Others have suggested that GyrA has been transferred into the Archaea (Gadelle et al. 2003). Hence our rooting calculations assume the Archaeal GyrA genes are missing and uniformly eliminate root 2, the Actinobacterial root (for analyses, see supplementary sections S3 and S4, Supplementary Material online). Comparisons of indel distributions with GyrA gene trees showed no evidence for indel homoplasy (supplementary section S5, Supplementary Material online). Analyses of GyrA and ParC indel flanking sequences provide significant statistical support, P < 0.015, for excluding the root from the Actinobacteria (supplementary section S2, Supplementary Material online). Together these tests provide strong evidence for excluding an Actinobacterial root.
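The parsimony argument can be illustrated with a small Fitch-style count of indel changes on rooted gene trees. This is a simplified sketch under stated assumptions, not the authors' full procedure (supplementary sections S3 and S4). It assumes that the GyrA/ParC duplication predates the root, that the unrooted topology groups A with D and F with R, and that only insert gain or loss is counted. Leaf names such as `A_gyr` and the split of the Actinobacteria into `A1` and `A2` are hypothetical. Because the scoring is simplified, the absolute counts differ from the 3 and 2 changes of figure 1; only the one-change penalty for a root inside the Actinobacteria is illustrated.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested 2-tuples
    of leaf names. Returns (possible_root_states, number_of_changes)."""
    if isinstance(tree, str):          # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                         # children agree: no change needed here
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # children disagree: one change

def min_changes(tree, states):
    return fitch(tree, states)[1]

# State coding: 1 = insert present, 0 = insert absent.
# ParC is omitted for the Archaea (R), where the gene is absent.

# Root on the stem leading to the double-membrane prokaryotes (D).
states_outside = {"A_gyr": 1, "D_gyr": 0, "F_gyr": 0, "R_gyr": 0,
                  "A_par": 0, "D_par": 0, "F_par": 0}
root_on_D_stem = (("D_gyr", ("A_gyr", ("F_gyr", "R_gyr"))),
                  ("D_par", ("A_par", "F_par")))

# Root within the Actinobacteria: A is split into two lineages, A1 and A2.
states_inside = {"A1_gyr": 1, "A2_gyr": 1, "D_gyr": 0, "F_gyr": 0, "R_gyr": 0,
                 "A1_par": 0, "A2_par": 0, "D_par": 0, "F_par": 0}
root_within_A = (("A1_gyr", ("A2_gyr", ("D_gyr", ("F_gyr", "R_gyr")))),
                 ("A1_par", ("A2_par", ("D_par", "F_par"))))

print(min_changes(root_on_D_stem, states_outside))  # 1
print(min_changes(root_within_A, states_inside))    # 2
```

With the root outside the Actinobacteria, a single insertion in the Actinobacterial GyrA lineage explains the data. With the root inside the Actinobacteria, the insert must either arise twice or arise once and then be lost, which costs one additional change in this simplified count.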
Previous analyses of directed indels have excluded roots within the double-membrane prokaryotes, within the Archaea, on the segment connecting the eukaryotes to the double-membrane prokaryotes, on the segment connecting the eukaryotes to the Archaea, and within the Firmicute–Archaeal–eukaryotic clade (Skophammer et al. 2006; Lake et al. 2007; Skophammer et al. 2007). These excluded roots, plus the results presented here excluding an Actinobacterial root, are summarized on the tree of life and on the ring of life in figure 2, top and bottom, respectively. Shown for reference are a root within the clade of double-membrane prokaryotes (*), based on transition analyses (Cavalier-Smith 2006), and the classical root (X), based on sequence analyses of anciently duplicated gene paralogs (Gogarten et al. 1989; Iwabe et al. 1989). The 3 remaining roots are located on the branch (stem) leading to the double-membrane prokaryotes (root 1), on the branch leading to the Actinobacteria (root 2), and on the branch leading to the Firmicute/Archaeal clade (root 3). We hope that future indels will facilitate further testing of these roots.
Supplementary analyses and data are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
This study is supported by grants from the National Science Foundation (NSF) and the University of California, Los Angeles, National Aeronautics and Space Administration Astrobiology Institute to J.A.L. The authors J.A.S., C.W.H., and R.G.S. were supported by a Cell and Molecular Biology Training Grant from the National Institutes of Health (NIH), a Genomic Interpretation and Analysis Training Grant from NIH, and an Integrative Graduate Education and Research Traineeship training grant from NSF, respectively.