David E.K. Ferrier, The origin of the Hox/ParaHox genes, the Ghost Locus hypothesis and the complexity of the first animal, Briefings in Functional Genomics, Volume 15, Issue 5, September 2016, Pages 333–341, https://doi.org/10.1093/bfgp/elv056
Abstract
A key aim in evolutionary biology is to deduce ancestral states to better understand the evolutionary origins of clades of interest and the diversification process(es) that has/have elaborated them. These ancestral deductions can hit difficulties when undetected loss events are misinterpreted as ancestral absences. With the ever-increasing amounts of animal genomic sequence data, we are gaining a much clearer view of the preponderance of differential gene losses across animal lineages. This has become particularly clear with recent progress in our understanding of the origins of the Hox/ParaHox developmental control genes relative to the earliest branching lineages of the animal kingdom: the sponges (Porifera), comb jellies (Ctenophora) and placozoans (Placozoa). These reassessments of the diversity and complexity of developmental control genes in the earliest animal ancestors need to go hand-in-hand with complementary advances in comparative morphology, phylogenetics and palaeontology to clarify our understanding of the complexity of the last common ancestor of all animals. The field is currently undergoing a shift from the traditional consensus of a sponge-like animal ancestor from which morphological and molecular elaboration subsequently evolved, to a scenario of a more complex animal ancestor, with subsequent losses and simplifications in various lineages.
Introduction
Hox/ParaHox genes are a distinctive type of developmental control gene in animals, encoding transcription factors, which form a subset of the ANTP class of homeobox genes [ 1 , 2 ]. They are renowned for their clustered organization and for exhibiting Colinearity—gene order in the cluster corresponding to gene activation (which can take various forms: spatial, temporal, quantitative, virtual [ 3–7 ]), although this is not universal, and plenty of examples exist of Hox and ParaHox genes that are no longer clustered and, hence, do not exhibit Colinearity [ 8–10 ]. Despite this diversity in organization, or in some cases because of it, the Hox (and to a currently lesser extent, the ParaHox) genes have become established as a paradigm for studying gene regulation in gene clusters, as well as a major focus for animal evolutionary developmental biology (evo-devo) because of the integral roles that these genes have in constructing animal form and its diversity. The evolutionary origin of such genes is thus of implicit interest.
In the early days of molecular evo-devo, the Hox genes were proposed to have been a central element of the Zootype—a defining aspect of being an animal (metazoan) [ 11 ]. However, a closer look at the earliest branching lineages of the animal kingdom revealed that some lineages seem to lack Hox/ParaHox genes completely, and it was concluded that the origin of the Hox/ParaHox genes occurred some time after the origin of the animal kingdom [ 12–14 ]. Until recently, this was based on searches involving approaches such as polymerase chain reaction (PCR) and genomic or complementary DNA library screening, and so there was always the possibility that genes could have been missed because of technical reasons (rather than biological, i.e. absence from the genome). With the advent of whole genome sequencing and the rapid decrease in sequencing costs, along with improved bioinformatics capabilities and sequencing technologies, the search for Hox/ParaHox genes has shifted to searching through whole genome sequences. Genes can still be missed by such searches because of the inevitable gaps in draft genome sequences; however, these searches based on whole (or nearly whole) genome sequences are clearly much more thorough than the older PCR/library-screening approaches.
The earliest examples of whole genome sequences from ‘basal’ (early-branching) animal lineages reinforced the previous view that Hox/ParaHox genes were absent from the earliest stages of animal evolution (including poriferans and ctenophores). In this context, the name ParaHoxozoa was proposed to distinguish all of the animal phyla that contained Hox and/or ParaHox genes from the earliest lineages (Porifera, Ctenophora) that supposedly lacked these genes [ 14 ]. Note that terms like ‘basal lineages’ or ‘earliest-branching lineages’ do not imply that the extant organisms at the termini of these lineages are necessarily representative of the ancestral forms at the nodes from which these lineages evolved. An example that illustrates this point is that of the Ctenophora discussed below, for which extant, modern-day forms are likely to be different from the forms on the lineage stem soon after it split from the lineage giving rise to the rest of the animal phyla.
In what follows I will summarize how this has dramatically changed recently: a ParaHox gene has been discovered in poriferans and wider genome sequencing is enhancing our realization that differential gene loss is rife between animal lineages, producing Ghost Hox and ParaHox Loci for example [ 15 , 16 ]. This has important implications for our reconstructions of ancestral states for animals and consequently our understanding of the origin of the animal kingdom as a whole and what the first animal (or at least the last common ancestor of current animals) was like in terms of its genome content. Furthermore, if we can deduce the developmental control genes that were present and understand what these genes were doing and building (e.g. by comparison of the roles of these genes in extant animals), we will almost certainly have to revise our view of the appearance of this ancient ancestor.
The origin of the Hox/ParaHox gene clusters from a ProtoHox state
In 1998 Brooke et al . [ 17 ] revolutionized our understanding of the origin of the Hox genes following their discovery of the ParaHox cluster in amphioxus ( Branchiostoma floridae ). They hypothesized the ancestral existence of a ProtoHox cluster that gave rise to the Hox and ParaHox clusters via a whole-cluster duplication. Before the discovery of the ParaHox cluster, the consensus was that the Hox cluster was unique and had evolved via a series of tandem gene duplications in an ancestral animal. The Hox cluster has then been conserved in many present-day animals (but with various further lineage-specific changes, including extra gene duplications, losses and rearrangements in distinct species). However, the ProtoHox hypothesis separated the stage of initial tandem gene duplications from the origin of the Hox cluster by positing that a ProtoHox gene cluster was generated in the first instance and that this ProtoHox cluster then underwent a change that gave rise to both the Hox and ParaHox clusters simultaneously [ 13 , 17 , 18 ]. In the original formulation of the ProtoHox hypothesis, this transition from the ProtoHox state involved a whole-cluster duplication [ 17 ]. Subsequent variations of the ProtoHox hypothesis have debated whether this cluster duplication was a trans or cis duplication, or whether it was not a duplication at all but instead involved splitting a cluster into two [ 18–20 ].
Regardless of which of these specific scenarios reflects the truth, all involve relatively large-scale rearrangement events involving multiple Hox/ParaHox genes and associated neighbouring genes. Several types of duplications and rearrangements that are known to be prevalent in the evolution of genome architecture are consistent with scenarios of ProtoHox to Hox/ParaHox evolution. These include such things as whole genome or whole chromosome duplication, followed by gene losses around the new distinct Hox and ParaHox loci, so that distinctive neighbourhoods (in terms of gene content) result around the Hox versus ParaHox loci ( Figure 1 A). Alternatively, the ProtoHox cluster could have been within a segmental duplication, the vast majority of which occur intrachromosomally at their point of origin (at least in genomes of those extant animals that have been studied; reviewed in [ 21 ]). Or a ProtoHox cluster containing distinct precursors of both Hox and ParaHox genes could have been split by an inversion event. If these clusters were formed by either an intrachromosomal segmental duplication or an inversion, the newly separated Hox and ParaHox clusters would have initially existed on the same chromosome and, thus, be linked to the same neighbouring genes ( Figure 1 B). To arrive at the situation found in extant animals, with the Hox and ParaHox clusters located on different chromosomes and surrounded by distinctive gene neighbourhoods, a chromosome fission or an interchromosomal arm exchange could have occurred ( Figure 1 B). Such mutations again are common in modern-day animal genomes (e.g. [ 22 ]). 
In contrast to the first possible scenario (of whole genome or chromosome duplication, followed by differential gene losses around the Hox and ParaHox loci), the second scenario of separated Hox and ParaHox clusters on the same chromosome followed by an interchromosomal arm exchange or chromosome fission will immediately result in distinctive gene neighbourhoods around the Hox and ParaHox loci, without the need for differential gene losses. There are, of course, alternative routes to the Hox and ParaHox clusters being on different chromosomes and surrounded by distinctive gene neighbourhoods, but these alternative routes are less likely to have occurred, judging from the relative rates of the required mutational events when compared with those described above (as discussed in [ 15 ], and see below).
Figure 1. Likely potential routes to the origin of the Hox and ParaHox clusters on distinct chromosomes after evolution from a ProtoHox cluster. Black rectangles represent the homeobox genes of the ProtoHox, ParaHox and Hox clusters. Triangles represent genes that neighboured (were linked to) the ProtoHox cluster and then became Hox cluster neighbours, while ovals represent genes that neighboured (were linked to) the ProtoHox cluster and then became ParaHox cluster neighbours. Note, the ‘triangles’ and ‘ovals’ do not take on their specific Hox or ParaHox loci identities until the Hox and ParaHox clusters have arisen and separated onto distinct chromosomes, and any necessary gene losses have occurred to make the Hox and ParaHox neighbours largely mutually exclusive (shown as black triangles and ovals in the figure). Any paralogous genes that are affiliated with both Hox and ParaHox loci are not considered here (e.g. collagens and tyrosine kinase receptor genes [ 19 ]). ( A ) WGD = whole genome duplication, WCD = whole chromosome duplication, X = gene loss. ( B ) SD = segmental duplication, and ‘single cluster split*’ involves the number of homeobox genes in the ProtoHox cluster increasing to form the precursors to both the Hox and ParaHox gene families before the ProtoHox cluster is split. The Hox and ParaHox clusters are then likely to be linked at first, because the split is initially intrachromosomal, before some form of translocation distributes the clusters onto distinct chromosomes. Note, it is possible that a prior intrachromosomal inversion event is not required if the interchromosomal arm exchange happened to have a break precisely within the ProtoHox cluster. While this is formally possible, the likelihood intuitively seems to be low.
Alternatives to the ProtoHox cluster producing Hox and ParaHox clusters, and why these are less likely
Secondary coming together of homeobox genes
An alternative to the ProtoHox to Hox/ParaHox hypothesis is that the homeobox genes came together secondarily, to separately form distinct Hox and ParaHox clusters that both then evolved Colinearity, rather than being generated by a whole-cluster duplication or split. This view has never gained much traction, perhaps because the sequence relationships between the Hox and ParaHox genes imply that they did originate by duplications from each other. Because by far the most prevalent mode of gene duplication is tandem duplication (reviewed in [ 21 ]), the alternative of Hox and ParaHox genes coming together secondarily would most likely have required that the genes initially were adjacent, then separated and then came together again. This strikes most people as less parsimonious than the ProtoHox to Hox/ParaHox explanation, particularly when the independent evolution of Colinearity also needs to be invoked.
However, this alternative must be kept in mind in case it eventually turns out that there is a mechanistic basis that could have enabled ordered clusters to form secondarily and independently. Such mechanisms are not yet known. Note that such ordered clusters are distinct from the looser coming together of functionally related or co-regulated genes into distinct regions of some genomes [ 23 ] or the unordered association of genes into Topologically Associating Domains [ 24 ]. Nevertheless, there are some intriguing pieces of data that hint at the possible existence of mechanisms underpinning a secondary coming together into highly ordered clusters.
Secondary coming together of homeobox genes, at least along the same chromosome (i.e. intrachromosomally), appears to have happened in the case of the NK genes in distinct Drosophila lineages [ 25 ]. Also, this time interchromosomally, the Post1 Posterior Hox gene of Platynereis dumerilii has translocated out of the Hox gene cluster and off the Hox-bearing chromosome, and of all of the other chromosomes that it could have moved to ( P. dumerilii 2n = 28), it has secondarily become linked to the NK cluster genes [ 26 ]. Finally, Duboule [ 27 ] has argued that the vertebrate Hox clusters have been consolidated during evolution and secondarily became more highly ordered via the evolution of regulatory mechanisms that may well be specific to this group of animals. This uncertainty about the evolutionary dynamics affecting the distribution of duplicated genes across genomes needs to be addressed with wider taxon sampling and reconstruction of homeobox clusters within the context of their associated chromosome-scale neighbourhoods. In addition, a deeper understanding of the regulatory mechanisms operating across these homeobox gene clusters is required.
ProtoHox to Hox/ParaHox involving small-scale transposition
The types of chromosome rearrangements associated with hypotheses about the origin of the Hox and ParaHox clusters were outlined above, namely whole genome or chromosome duplications (followed by differential gene loss) or intrachromosomal duplications followed by chromosome arm exchange or fission. The plausibility and the prevalence of these types of mutations underpin the logic of the Ghost Locus hypothesis, whereby the genomic neighbourhoods of these homeobox genes can be used to trace their evolution [ 15 ]. Other types of gene duplications and translocations do, however, exist. But it is the much lower rates of these other mutations or their less-appropriate natures (when considering the overall biology of the Hox and ParaHox genes) that make them much less likely to have been involved in Hox/ParaHox evolution.
In the original formulation of the Ghost Locus hypothesis, Mendivil-Ramos and colleagues [ 15 ] (see also [ 21 ]) outlined the data that showed the much lower levels or inappropriateness of these alternative mutations, including retrotransposition and small-scale DNA-mediated interchromosomal translocations at the point of duplicate origin (see Supplementary Information in [ 15 ]). The data are inevitably drawn mainly from species for which high-quality genome sequences are available, which are reliably assembled to the chromosome scale to distinguish intrachromosomal from interchromosomal locations. This currently tends to restrict the analyses to conventional model systems like drosophilids, nematodes and some vertebrates like humans. In the intervening time since the original formulation of the Ghost Locus hypothesis, further data have provided additional support. A recent example is the analysis of duplications in the human genome [ 28 ].
In their analysis of young segmental duplications in humans, Bu and Katju [ 28 ] found that intrachromosomal duplications were clearly most prevalent, accounting for 100% of the youngest category, which presumably contains the duplications closest to their point of origin. Also, human segmental duplication sizes are larger than previously noted ([ 28 ]; see Supplementary Information of [ 15 ] and references therein), but are still on the small side when considering that duplication of a multigene array is, statistically speaking, most likely in the case of ProtoHox to Hox/ParaHox [ 29 ]. It is not clear, however, that duplication processes in humans are typical for animals as a whole [ 30 , 31 ] because of an apparent elevation of the activity of mobile elements at some point in primate evolution. This logic can, of course, be extended to the last common ancestor of all animals whose genome rearrangements may have occurred in a different way from present-day animals. It is to be hoped that improved sequencing technologies and assembly algorithms will increase the taxonomic range of well-assembled genomes so that more generally applicable rates for the various genome rearrangement mutations can be deduced to better assess the logic underlying ideas like the Ghost Locus hypothesis for Hox/ParaHox origins.
While we must obviously adopt the standard scientific approach of following the most likely or probable hypothesis, which in this case clearly involves the types of mutations invoked by the Ghost Locus hypothesis, it formally remains possible that an alternative route to the separation of Hox and ParaHox genes onto distinct chromosomes may have occurred. In this vein, there are intriguing instances of multigene interchromosomal translocations, such as the MEDEA transposable element in beetles, which includes several integral genes [ 32 ], or circular DNA translocations, which span several hundred kilobases in cattle or possibly move single genes like the vasa genes of Tilapia [ 33 , 34 ]. The prevalence of such phenomena, and their rates relative to, for example, intrachromosomal duplications, require further investigation before they can be reasonably invoked as a plausible route to the origin of the Hox and ParaHox clusters.
The sponge revolution
These plausible genome rearrangement scenarios, based on commonly occurring types of mutations, form the background to the discovery of the Hox and ParaHox ghost loci, in which distinct Hox and ParaHox neighbourhoods can be detected in an animal like the sponge Amphimedon queenslandica , even though these loci do not contain the Hox/ParaHox homeobox genes themselves [ 15 ]. The idea of ghost loci first arose from consideration of the placozoan, Trichoplax adhaerens , which contains a single Hox/ParaHox-like gene ( Trox-2 ) that was of debated affinity: it was either viewed as a ParaHox gene [ 35 ] or as a direct descendant of a ProtoHox state [ 36 , 37 ], this second hypothesis requiring subsequent asymmetric evolution of the Hox/ParaHox genes to account for the clear grouping of Trox-2 with the Gsx family of ParaHox genes. Because asymmetric evolution of duplicated genes is known to be common (e.g. [ 38–41 ]), the molecular phylogenetic trees struggle to distinguish between these two alternative and plausible hypotheses for the identity of Trox-2 .
An alternative approach to distinguishing whether Trox-2 was a ParaHox or ProtoHox gene was required. Mendivil-Ramos and colleagues [ 15 ] used a synteny approach based on the evolutionary scenario outlined above for the origin of the Hox and ParaHox clusters from the ProtoHox state, which entails common, large-scale (multigenic) chromosomal rearrangement mutational events that lead to distinctive gene neighbourhoods around the newly formed Hox and ParaHox clusters. Another key finding that enables this synteny approach is that gene neighbourhoods (synteny) are conserved in some lineages from these ancient evolutionary times, including the last common ancestor of cnidarians and bilaterians [ 42 ] and the last common ancestor of placozoans and bilaterians [ 43 ].
Based on the detectable (statistically significant) conserved synteny between T. adhaerens and the last common cnidarian-bilaterian ancestor, Mendivil-Ramos and colleagues [ 15 ] found that Trox-2 resided in a ParaHox neighbourhood, and hence, on the basis of synteny as well as the previous molecular phylogenetic trees, Trox-2 is a ParaHox gene. However, under the ProtoHox to Hox/ParaHox hypothesis, one expects that an animal with a ParaHox locus should also have a Hox locus, and so the lack of a Hox gene from T. adhaerens to complement the Trox-2 ParaHox gene was problematic. This was solved by using the Putative Ancestral Linkage groups (PALs) of the last common cnidarian-bilaterian ancestor generated by Putnam and colleagues [ 42 ], which contained a list of genes that neighboured the Hox cluster in this ancestor. Mendivil-Ramos and colleagues [ 15 ] could clearly detect orthologues of many of these last common cnidarian-bilaterian ancestor Hox neighbours in T. adhaerens . More importantly, these orthologues are clustered in a distinct region of the placozoan genome, which, even though it does not contain a Hox gene, is clearly homologous to the Hox neighbourhoods of the last common cnidarian-bilaterian ancestor and of extant cnidarians and bilaterians. The Hox neighbourhood of T. adhaerens is thus a Ghost Hox Locus.
With the discovery of the placozoan Ghost Hox Locus and lists of genes in Hox and ParaHox neighbourhoods, or PALs, of the last common ancestor of placozoans and cnidarians/bilaterians, the issue of the absence of Hox and ParaHox genes from sponges could be addressed afresh. In the first sponge genome to be fully sequenced, that of A. queenslandica [ 44 ], the orthologues of these Hox and ParaHox neighbours clustered into two distinct regions of the genome to statistically significant levels [ 15 ]. Amphimedon queenslandica thus appears to have distinct Ghost Hox and ParaHox loci.
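The statistical logic behind detecting a ghost locus can be illustrated with a short sketch. This is not the procedure used in [ 15 ], whose exact test is not described here; it assumes, purely for illustration, a one-sided hypergeometric test asking whether orthologues of ancestral Hox (or ParaHox) neighbours are over-represented on one scaffold relative to the genome as a whole. The function name, its parameters and all of the numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    population: genes in the genome with an assignable orthologue
    successes:  genes that are orthologues of ancestral Hox neighbours
    draws:      genes located on the scaffold being tested
    observed:   Hox-neighbour orthologues actually found on that scaffold
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    max_hits = min(successes, draws)
    # math.comb returns 0 when k > n, so impossible outcomes contribute nothing.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(max(observed, 0), max_hits + 1)
    )
    return tail / comb(population, draws)


# Invented numbers (assumption, not data from any genome): 1000 genes with
# orthologues, 40 of them ancestral Hox neighbours, and a 50-gene scaffold
# carrying 12 of those neighbours (about 2 would be expected by chance).
p_value = hypergeom_upper_tail(1000, 40, 50, 12)
print(f"P(at least 12 neighbours on the scaffold by chance) = {p_value:.2e}")
```

A small probability here would indicate a scaffold enriched for ancestral neighbours even though the homeobox genes themselves are absent, which is the signature of a ghost locus. A real analysis would also need to correct for testing many scaffolds.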
This Ghost Locus hypothesis, involving loss of Hox and ParaHox genes from a basal lineage like the sponge A. queenslandica , received a major boost with the subsequent discovery of different species of poriferans that had retained a ParaHox gene. The calcisponges Sycon ciliatum and Leucosolenia complicata were found to contain Cdx genes [ 16 ]. The principal line of evidence supporting this assignment of these calcisponge genes to the Cdx family came from molecular phylogenetic trees. When dealing with such trees, which are built from the 60 amino-acid homeodomain, node support values can be low because of the unavoidable use of such short sequences, particularly when including sequences from basal animal lineages. Nevertheless, the node support values in a number of the trees in Fortunato et al . [ 16 ] were at levels that are conventionally considered to be significant. Confidence in the assignment can be boosted, despite occasional low support values, when the affiliation between the sponge gene of interest and a known homeobox family arises repeatedly when trees are constructed with different phylogenetic approaches (with their inherent different assumptions and pitfalls), varied taxon sampling and when focusing on particular families that possess amino-acid residues that seem to be relatively distinctive for certain families. Finally, building trees with and without difficult-to-classify, long-branch sequences from basal lineages such as sponges and checking the stability of the topology with respect to the assignment of the gene of interest also helps to establish confidence in the classification. Fortunato and colleagues [ 16 ] performed all of these checks and consistently found that the calcisponge genes of interest grouped with the Cdx family.
Consistent with this Cdx identity, S. ciliatum also possesses distinct Hox and ParaHox neighbourhoods in its genome (the L. complicata genome assembly was not suited to the analysis). Also, intriguingly, the SAR1 gene is a close neighbour of both S. ciliatum Cdx and the ParaHox gene cluster of the cnidarian Nematostella vectensis [ 16 , 45 ], and another S. ciliatum Cdx neighbour, PPWD1 , is also a ParaHox neighbour in some bilaterians (including humans). Although there are not enough neighbours on the S. ciliatum Cdx scaffold to assess a ParaHox affinity statistically, they clearly do not contradict the Cdx classification, but instead provide some intriguing potential ParaHox associations. Clearly, future work involving improvements to the S. ciliatum and L. complicata assemblies could help to increase the number of known neighbours to the Cdx gene so that a valid statistical test can be performed, and further taxon sampling may find still further species that have retained ParaHox (or Hox) genes.
The origin of the Hox/ParaHox genes in their distinct genomic neighbourhoods has thus been pushed deeper in animal evolution than the divergence of the poriferan lineage from the rest of the animals. Adopting the consensus opinion at the time, that the Porifera were monophyletic and also were the basal-most lineage of animals [ 46 ], led to the conclusion that the last common ancestor of the animals thus had distinct Hox and ParaHox genes [ 15 , 16 ]. Determining whether these sponge data really do indicate the state in the last common ancestor of all animals, or instead lead us to the state of an ancestor one or two nodes above the origin of the animal kingdom, is intimately connected to the phylogeny of the basal lineages of the animal kingdom.
Phylogeny of the basal lineages of the animal kingdom
A recent major area of debate in the field of animal phylogeny has been about which of the Porifera or Ctenophora constitutes the basal-most, earliest-branching lineage in the animal kingdom [ 46–52 ]. The latest contribution to this debate, which represents the most thorough analysis so far with increased taxon sampling and careful testing for various possible sources of systematic error (such as the misleading signal contributed to many studies by the prevalent ribosomal proteins), robustly locates the Ctenophora as the basal-most animal lineage [ 51 ]. Whelan and colleagues [ 51 ] also resolved poriferans as monophyletic rather than paraphyletic, which has been another recent area of debate. With regard to the origin of the Hox/ParaHox genes, whether they arose before or after the origin of the Ctenophora remains to be seen ( Figure 2 ). A mention of a Cdx gene in the Supplementary Information of [ 50 ] requires closer examination to test its veracity, and hence establish whether the ProtoHox to Hox/ParaHox transition occurred after the divergence of the ctenophores (and before the origin of sponges) or instead occurred in the last common ancestor of all animals.
Figure 2. Animal phylogeny with origin of Hox and ParaHox specific to the animals (Metazoa) and before the origin of poriferans, but with ambiguity with regards to the ctenophore condition. ‘Ticks’ represent confirmed presence of Hox/ParaHox genes in at least some members of the phylum, while ‘X’ represents absence from choanoflagellates [ 15 ], the sister group to the Metazoa. Ghost cartoons represent the presence of ghost loci, and ‘?’ indicates the ambiguity about the ctenophore condition and, hence, the uncertainty about whether the ProtoHox to Hox/ParaHox transition happened before or after the divergence of the ctenophore lineage (see text for details). All animal pictures are taken from the Phylopic database ( http://phylopic.org ), except for the Sycon poriferan picture, which was constructed by the author and is derived from a photo from Maja Adamska ( http://biology.anu.edu.au/research/labs/adamska-lab-genomic-and-evolutionary-basis-animal-development ). The phylogeny is based on that of [ 51 ]. It should be noted, however, that the relative branching order of the ctenophore and poriferan lineages at the base of the animal phylogeny is still not resolved to the extent that there is a general consensus [ 53 ]. If the poriferan lineage is in fact the basal-most lineage, then the conclusions of Mendivil-Ramos et al . [ 15 ] and Fortunato et al . [ 16 ], that the last common ancestor of all extant animals had Hox/ParaHox genes in distinct loci, would stand.
Ctenophores as the basal animal lineage could well imply a simplification of morphology and range of cell types during poriferan evolution, including, for example, possible secondary loss of neurons and muscles [ 54–56 ], notwithstanding the alternative interpretation favoured by some authors that the ctenophores and eumetazoans evolved at least some of these cell types independently [ 50 , 52 ]. This potential secondary simplification or loss of cell types perhaps mirrors the extensive differential gene loss that is now being discovered across these lineages.
Differential gene loss is rife across basal animal lineages
Implicit in the ghost locus hypothesis is differential gene loss. In recent years, a greater appreciation for the importance of gene loss across animal lineages has developed, as a certain counterbalance to the prevalent view that deep homology and conservation of developmental control genes underlie much of animal evolution (e.g. [ 57–59 ]). While the deep conservation of developmental control genes certainly still holds to a large extent, it is clear that there is a much greater degree of evolutionary lability than was appreciated in the early days of molecular evo-devo [ 60 ].
Excellent reviews of the patterns of differential gene loss impacting on transcription factors are already available for basal animal lineages (including [ 61–64 ]). Differential gene loss is also seen in genes that encode proteins other than transcription factors, including signalling molecules, genes involved in nervous system activity, epithelial integrity and ‘immune response’ [ 65 ]. Non-coding genes, such as microRNAs, have also been observed to be secondarily lost from some basal lineages, such as the Placozoa [ 43 ]. In the Hox/ParaHox context, differential gene loss is also consistent with the remaining ParaHox gene in the placozoan T. adhaerens being from one family, Gsx, while the remaining ParaHox gene from sponges is from another family, Cdx.
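The inference from presence in a few lineages to presence in a deep ancestor, with loss elsewhere, can be made concrete with a small sketch. It assumes a Dollo-style rule (a gene family arises once and can afterwards only be lost) and a deliberately simplified tree that follows the branching order of [ 51 ] and omits Cnidaria for brevity. The function names and the toy presence sets are illustrative assumptions; only the sponge Cdx and placozoan Gsx entries follow the text above.

```python
def leaves(tree):
    """Return the set of taxon names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    tips = set()
    for child in tree:
        tips |= leaves(child)
    return tips


def origin_clade(tree, carriers):
    """Return the smallest clade containing every carrier taxon.

    Under a single-origin (Dollo-style) assumption, the gene must already
    have existed in the last common ancestor of this clade.
    """
    carriers = set(carriers)
    if not carriers or not carriers <= leaves(tree):
        raise ValueError("carriers must be a non-empty subset of the tree's taxa")
    if isinstance(tree, str):
        return tree
    for child in tree:
        if carriers <= leaves(child):
            return origin_clade(child, carriers)
    return tree


def lineages_implying_loss(tree, carriers):
    """Taxa inside the origin clade that lack the gene (loss or non-detection)."""
    return leaves(origin_clade(tree, carriers)) - set(carriers)


# Simplified toy tree (assumption): Ctenophora branching first, as in [51].
TOY_TREE = ("Ctenophora", ("Porifera", ("Placozoa", "Bilateria")))

cdx_carriers = {"Porifera", "Bilateria"}   # Cdx retained in calcisponges
gsx_carriers = {"Placozoa", "Bilateria"}   # Gsx (Trox-2) retained in T. adhaerens

print(origin_clade(TOY_TREE, cdx_carriers))
print(lineages_implying_loss(TOY_TREE, cdx_carriers))
print(lineages_implying_loss(TOY_TREE, gsx_carriers))
```

In this toy case the sponge Cdx gene alone places the family at the ancestor of sponges and all later-branching animals, and flags the placozoan lineage as having lost it. A taxon outside the origin clade, here Ctenophora, is left undetermined rather than scored as an ancestral absence, and an apparent absence may of course reflect incomplete genome data rather than true loss.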
On balance, there has clearly been extensive gene loss during the evolution of the basal animal lineages, and the last common ancestor of all animals had a much larger developmental gene repertoire than was widely appreciated until recently (e.g. [ 42 , 64 , 66 , 67 ]). A more extensive understanding of the gene content of non-bilaterian animals, particularly via wider taxon sampling, is clearly needed to properly understand the genome content of the last common animal ancestor.
A large diversity of developmental control genes implies a morphologically complex animal ancestor?
Complexity can often be a loaded or ambiguous term in evolutionary biology and needs precise definition to try to avoid confusion. In terms of complexity simply meaning a large number of different entities, clearly, the ancestral animal was much more complex than appreciated even just a few years ago if we consider developmental control genes themselves. What is less clear is how this genetic complexity relates, and related, to morphological complexity. Previous attempts, for example, have involved ‘measuring’ the numbers and variety of different cell types or tissue types [ 68 ].
It is becoming clear that we need to carefully reassess our views about the ‘simplicity’ of basal animal lineages. With improvements in microscopy, immunohistochemistry and histology, as well as closer examination of the expression of the large variety of genes being isolated, it is apparent that there is plenty of cryptic ‘hidden’ biology in basal animal lineages, and they are not as morphologically simple as previously believed [ 69 , 70 ]. Also, previous conclusions about the similarity and likely homology of cell types can be questioned when specifically examined. A particularly pertinent case in the present context of basal animal lineages is the long-assumed affinity between sponge choanocytes and the cells of the non-animal sister group, choanoflagellates. This homology is now being questioned [ 71 ]. Furthermore, despite the extensive secondary gene loss described above, sponges are proving to still be rather molecularly complex, with large gene numbers (for example, A. queenslandica estimates now exceed 40 000 genes [ 72 ]) and a diversity of long non-coding RNAs comparable with that of bilaterians [ 73 ]. Also, expansions of Taxon-Restricted Genes are just as prevalent in basal animal lineages as they are in bilaterians [ 74 , 75 ].
Additionally, the question mark in the subheading above is important. It is intended to highlight the uncertainty with which a diversity of developmental control genes can be equated to a diversity of developmental functions, which by extension could build complex morphologies composed of a diversity of cell types and structures. This uncertainty arises from the fact that we currently have a poor understanding of how developmental genes control the development of morphology in basal animal lineages. In large part, this is because of a traditional focus on a small number of model organisms in developmental biology, which did not include any examples of basal animal lineages. Also, the extensive pleiotropy of developmental genes in these model organisms, with each gene having roles in a wide variety of developmental contexts, makes deductions about ancestral gene functions particularly difficult. The recent development of functional genetic techniques such as TALEN and CRISPR/Cas9, which are gradually being brought to bear on basal lineage taxa, should help to address this problem in the near future [ 76 ]. Nevertheless, for the present it seems reasonable to conclude that the previously underappreciated number and diversity of developmental control genes in the last common ancestor of animals probably did equate to a complex, diverse set of functions that presumably did build complex morphologies.
This paucity of molecular description of morphological development (e.g. generation of cell type diversity) in basal animal lineages becomes less of an issue when we consider another valuable form of evidence in deducing the appearance of ancient ancestors, namely palaeontology. Here, great strides have also been made in recent years, with new preparatory techniques (e.g. serial grinding to build three-dimensional digital reconstructions [ 77 ] and X-ray computed tomography [ 78 ]) and, crucially, the discovery and description of new specimens. Ctenophore fossils are not so well known and the possible earliest, most ancient examples are controversial in their affinities (discussed in [ 69 ]). The most unambiguous ctenophore fossils are from the Lower Devonian [ 69 ], but potential Ediacaran ctenophores have also been found [ 79 ]. An important caveat to keep in mind in this context, however, is that extant ctenophore lineages seem to have diverged from each other relatively recently. The last common ctenophore ancestor is thus separated by a long evolutionary distance from the last common ancestor with other extant animal lineages ([ 69 ] and references therein). Much evolutionary change could thus have happened between the last common ancestor of all animals and the last common ancestor of extant ctenophores (including, for example, the evolution of skeletonization in some extinct ctenophores [ 80 ]), and the morphological differences between these two ancestors are far from clear. This perhaps warrants a reappraisal of the Ediacaran fossils (635–541 million years ago) and the likelihood of the last common ancestor of animals being present within this difficult-to-classify diverse group of organisms.
The classification of these Ediacaran organisms is noted for the controversy and range of opinions on their phylogenetic affinities, and their relationships with modern recognized phyla or even kingdoms. But a consensus seems to have emerged in recent years that at least some Ediacaran species are animals [ 81 , 82 ]. Some authors have hypothesized that some of the earliest Ediacaran forms (e.g. rangeomorphs) are stem ctenophores or cnidarians, but this is still debated ([ 81 ] and references therein), and some now prefer to treat rangeomorphs as organisms in their own right, without trying to shoe-horn them into a classification based on extant kingdoms and phyla [ 83–85 ]. These early Ediacarans, including but not restricted to the rangeomorphs, could well be the earliest animals, but with body plans that are rather distinct from modern-day metazoans. The complexity of these early Ediacarans and how this might relate to developmental gene diversity is wide open for debate and further research (e.g. [ 86 ]).
Summary
In conclusion, it is clear that great strides are being made in understanding the nature of the last common ancestor of animals on a number of fronts: genome sequencing, developmental gene characterization, functional genetics, phylogenetics and palaeontology. One of the key aspects of this improvement in our understanding is wider taxon sampling. The revolution in our understanding of the origin and subsequent evolution of the important Hox/ParaHox developmental control genes, which have long been a major focus in evo-devo, is a case that illustrates this importance of wide taxon sampling particularly well. The origin of the Hox/ParaHox genes occurred before the origin of the poriferans, clearly illustrating that the last common ancestor of sponges and the remaining Eumetazoa was more complex, molecularly speaking, than previously appreciated. Whether this ancestral state represents that of the last common ancestor of all animals will be determined by the rapidly progressing advances in phylogenetics, with the current indications being that we need a more extensive understanding of ctenophores. Advances in palaeontology will also be key, particularly with regard to a better understanding of the Ediacaran fossils, which are renowned for the controversy over their affinities (or lack of affinities) with modern-day lineages, and whether the origins of the animal kingdom can be distinguished amongst them. In the current context, it perhaps suffices to say that the morphological appearance and relative complexity of the first animal are far from clear, and a morphologically complex ancestor is perfectly plausible, even more so given the shift of the ctenophore lineage to a basal position in the animal tree and the continued discovery of cryptic complexity in extant basal lineages.
Continued wide taxon sampling and progress in all of the research fields listed above are significantly improving our chances of determining the complexity (both molecular and morphological) of the last common ancestor of animals, which was the starting point for the diversification of the entire animal kingdom. Darwin’s view that ‘from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved’ [ 87 ] may need to be slightly revised for the animals because they started from a not-so-simple beginning after all.
Key Points
Differential gene loss has had an important role in animal evolution.
Origin of the Hox/ParaHox developmental control genes occurred before the origin of the poriferan lineage.
Differential gene loss of Hox/ParaHox genes in basal lineages makes it difficult to determine whether these genes arose from the ProtoHox state in the last common ancestor of all animals, or slightly higher in the animal phylogeny.
The last common ancestor of animals was genetically, and possibly morphologically, more complex than previously appreciated.
The nature of the first animal can only be deduced by integrating comparative developmental biology, phylogenetics, evolutionary genomics and palaeontology.
Acknowledgements
The author would like to thank Olivia Mendivil Ramos, Maja Adamska and Sofia Fortunato for discussions that shaped the ideas outlined in this review, and Joanne Ferrier for comments on the manuscript. I would also like to thank the various colleagues who continue to query and challenge the hypotheses discussed here.
![Likely potential routes to the origin of the Hox and ParaHox clusters on distinct chromosomes after evolution from a ProtoHox cluster. Black rectangles represent the homeobox genes of the ProtoHox, ParaHox and Hox clusters. Triangles represent genes that neighboured (were linked to) the ProtoHox cluster and then became Hox cluster neighbours, while ovals represent genes that neighboured (were linked to) the ProtoHox cluster and then became ParaHox cluster neighbours. Note, the ‘triangles’ and ‘ovals’ do not take on their specific Hox or ParaHox loci identities until the Hox and ParaHox clusters have arisen and separated onto distinct chromosomes, and any necessary gene losses have occurred to make the Hox and ParaHox neighbours largely mutually exclusive (shown as black triangles and ovals in the figure). Any paralogous genes that are affiliated with both Hox and ParaHox loci are not considered here (e.g. collagens and tyrosine kinase receptor genes [ 19 ]). ( A ) WGD = whole genome duplication, WCD = whole chromosome duplication, X = gene loss. ( B ) SD = segmental duplication, and ‘single cluster split*’ involves the number of homeobox genes in the ProtoHox cluster increasing to form the precursors to both the Hox and ParaHox gene families before the ProtoHox cluster is split. This is likely to result in the Hox and ParaHox clusters first being linked, because the split is initially intrachromosomal, before some form of translocation distributes the clusters onto distinct chromosomes. Note, it is possible that a prior intrachromosomal inversion event is not required if the interchromosomal arm exchange happened to have a break precisely within the ProtoHox cluster.
While this is formally possible, the likelihood intuitively seems to be low.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bfg/15/5/10.1093_bfgp_elv056/2/m_elv056f1p.jpeg?Expires=1712751805&Signature=ToPuWzzW-5QxqZ0TeVAGdrfkWdu4q7tm5FQJvQG3-iY40N-kSkTJ7zPZEEiToJydUHRxuU5IbjUi7aohkbkd-06C4~rR2HdI8xY2fShsvsnmsXEhJSgZJb7~S1QyCp-JfZ~GxoEqUyqNEQ6UJrdr9Oxl6eX8ju~RYVs0lozpGuS5Eor4FYT58VDuJVkhDTyFopJfMCwiXAJcO4XkEeVqhmsAglZ~0u3Ssqzy1ad5HPeeggsSuiqw9h-ShSERbL6nDDQhohDdofZVdu2k6AvLLaW0ZRDn8DUITvU8mAo2YGJ072JHpybNGi8oTkNS6P7xTZ8T0HwcnSRqNo7~qHXtUg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
![Animal phylogeny with origin of Hox and ParaHox specific to the animals (Metazoa) and before the origin of poriferans, but with ambiguity with regard to the ctenophore condition. ‘Ticks’ represent confirmed presence of Hox/ParaHox genes in at least some members of the phylum, while ‘X’ represents absence from choanoflagellates [ 15 ], the sister group to the Metazoa. Ghost cartoons represent the presence of ghost loci, and ‘?’ indicates the ambiguity about the ctenophore condition and, hence, the uncertainty about whether the ProtoHox to Hox/ParaHox transition happened before or after the divergence of the ctenophore lineage (see text for details). All animal pictures are taken from the Phylopic database ( http://phylopic.org ), except for the Sycon poriferan picture, which was constructed by the author and is derived from a photo from Maja Adamska ( http://biology.anu.edu.au/research/labs/adamska-lab-genomic-and-evolutionary-basis-animal-development ). The phylogeny is based on that of [ 51 ]. It should be noted, however, that the relative branching order of the ctenophore and poriferan lineages at the base of the animal phylogeny is still not resolved to the extent that there is a general consensus [ 53 ]. If the poriferan lineage is in fact the basal-most lineage, then the conclusions of Mendivil-Ramos et al. [ 15 ] and Fortunato et al.
[ 16 ], that the last common ancestor of all extant animals had Hox/ParaHox genes in distinct loci, would stand.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bfg/15/5/10.1093_bfgp_elv056/2/m_elv056f2p.jpeg?Expires=1712751805&Signature=JvkKKv4vJpnRoKlpkQLEJmDWdNVUMdne7uyvHsKdMcQ-mfrR0XyFgpzLfYHZZCpoX8c6f-sy5SQHuPaNRweunPkAeh9OWSyIBaX6c7kLe2Zs4VOF-EsPX8nXby6zig1AOE7iIAkTRSy1iOISa8hvHF26nhSzj0nMaPnpvGFs1x4Hd-hUcFxF3QmAAruKVg3jE92cELSsL6VlHonSY9wFGD4cC4h-qdkTfptSFD45lJ6SX-ALNRDPg3UZgKNvZogrpXyphNIbFLGMSXRYtsBF~8HejWtWB6nqQzUQzgBCeZO07FoGPvpChQinEPt0wzXo4M0GEa9taM6-whmQz8V1Ew__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)