The ParaHox genes consist of 3 homeobox gene families, Gsx, Xlox, and Cdx, all of which have fundamental roles in development. Xlox (known as IPF1 or PDX1 in vertebrates), for example, is crucial for development of the vertebrate pancreas and is also involved in regulation of insulin expression. The invertebrate amphioxus has a gene cluster containing one gene from each of the gene families, whereas in all vertebrates examined to date there are additional copies resultant from ParaHox gene cluster duplications at the base of the vertebrate lineage. Extant vertebrates basal to bony and cartilaginous fish are central to the question of when and how these multiple genes arose in the vertebrate genome. Here, we report the mapping of a ParaHox gene cluster in 2 species of hagfishes. Unexpectedly, these basal vertebrates have lost a functional Xlox gene from this cluster, unlike every other vertebrate examined to date. Furthermore, our phylogenetic analyses suggest that hagfishes may have diverged from the vertebrate lineage before the duplications, which created the multiple ParaHox clusters in jawed vertebrates.
The ParaHox gene cluster was first discovered in the cephalochordate amphioxus (Brooke et al. 1998) and is composed of members of 3 Hox-related homeobox gene families: Gsx, Xlox, and Cdx. Studies of ParaHox genes in vertebrates and invertebrates show that they all have roles in developmental processes. Gsx genes, for example, have a role in brain patterning (Hsieh-Li et al. 1995; Valerius et al. 1995; Frobius and Seaver 2006), whereas Xlox is expressed in the invertebrate gut (Brooke et al. 1998; Frobius and Seaver 2006) and is involved in development of the vertebrate pancreas (Jonsson et al. 1994), and Cdx genes are involved in patterning of the posterior gut and neural tube (Moreno and Morata 1999; Reece-Hoyes et al. 2002).
Only one member of each ParaHox gene family is found in amphioxus, and these are located in a single cluster with Gsx adjacent to Xlox in the same orientation, followed by Cdx on the opposite strand. From the distribution of these gene families among protostomes and deuterostomes, it can be inferred that this gene complement and cluster organization are ancestral, dating at least to the basal bilaterian (Ferrier and Minguillon 2003). In humans, a ParaHox cluster is found on chromosome 13q12.2 (Pollard and Holland 2000; Ferrier et al. 2005); it contains the genes GSH1, IPF1 (also known as PDX1 or Xlox in other organisms; we prefer to use the name Xlox to refer to this gene family because the alternative names imply functions which are not always known), and CDX2, and although the intergenic distances differ, the order and orientation of the genes are identical to those of the single cluster of amphioxus. However, the human genome also contains paralogues of the Gsx and Cdx families (GSH2, CDX1, and CDX4) dispersed on 3 other chromosomes. The ParaHox genes of mouse, Xenopus, and some actinopterygian fishes show an equivalent genomic organization (Mulley et al. 2006). Studies of the chromosomal regions containing these genes have revealed that a single prevertebrate ParaHox gene cluster was duplicated en bloc, such that 4 ParaHox clusters were present in the genome by the time bony fish diverged (Prohaska and Stadler 2006) and indeed before cartilaginous fish diverged from the bony fish lineage (Mulley JF, Holland PWH, unpublished data). However, loss is prevalent among the ParaHox genes, and in all the vertebrate species studied to date, a maximum of one intact cluster remains.
The en bloc duplication of ParaHox gene clusters may be symptomatic of a larger scale duplication event. It has been hypothesized, and is becoming generally accepted, that 2 whole-genome duplications occurred on the vertebrate stem after the divergence of cephalochordates and urochordates and before the divergence of cartilaginous and bony fish (Holland et al. 1994; Venkatesh et al. 2007). Furthermore, it has been suggested that these genome duplications preceded and may even have been the cause of the innovations seen in vertebrate developmental patterning (Shimeld and Holland 2000). The timing of these duplications is therefore key to our understanding of vertebrate development and evolution. Ascertaining whether basal vertebrates have a characteristic vertebrate-like organization of homeobox genes will permit inferences about whether genome duplications can be associated with the appearance of the vertebrate body plan.
The most basal extant vertebrates, and therefore the animals of most interest for this problem, are the jawless fish, which comprise 2 groups: hagfish and lampreys. These animals share many morphological features with jawed vertebrates which are not seen in invertebrate chordates, such as the presence of cartilage and mineralized tissue and elaboration of brain segmentation and sensory placodes. However, many other characters are only present in jawed vertebrates, such as paired appendages, hinged jaws, and increased complexity of the axial skeleton (Shimeld and Holland 2000). Traditionally, hagfish and lampreys were thought to form a monophyletic sister clade to the jawed vertebrates, known as the Cyclostomi (Dumeril 1806). However, a number of morphological and physiological characteristics, which are found in all other vertebrates, including lampreys, are either rudimentary or completely absent in hagfishes. This has led to suggestions that the jawless fish are paraphyletic and that hagfish form a sister group to the true vertebrates (Forey 1984; Maisey 1986). Molecular phylogenetic studies tend to support the traditional monophyly view (Furlong and Holland 2002; Blair and Hedges 2005), which therefore implies that these primitive features are the result of degeneracy on the hagfish lineage.
We examined the ParaHox genes in 2 species of hagfishes and mapped their organization. We show that hagfishes have ParaHox gene linkage, but this is unlike any ParaHox cluster known to date. No functional Xlox gene is present in either species. This is the first evidence for the genomic organization of homeobox genes in the hagfish lineage. Furthermore, our phylogenetic analyses suggest that hagfishes diverged from the vertebrate lineage before the duplications which created the multiple ParaHox clusters in jawed vertebrates.
Materials and Methods
Isolation and Analysis of Gsx-Positive Bacterial Artificial Chromosomes from 2 Species of Hagfish
Polymerase chain reaction (PCR) was carried out on Eptatretus burgeri genomic DNA using degenerate homeobox primers (forward SOGSH: CAGCTCTTGGARCTNGARCGN; reverse SO2: CKNCKRTTYTGRAACCA), yielding a fragment that showed similarity to previously published vertebrate Gsx genes. This fragment was used to screen an E. burgeri genomic phage lambda library (Ishiguro et al. 1992). Positive clones were then digested, subcloned, and sequenced to provide a longer Gsx sequence to which specific primers could be designed. The specific primers (forward GSH2: TATTTCTCCAGGTTCTGACG, reverse GSH30: ATGGAGTATCGAGAAGCAGA) were used to screen PCR pools of an E. burgeri bacterial artificial chromosome (BAC) library (Suzuki et al. 2004). A resulting Gsx-positive BAC with coordinates EB7-10H was then shotgun sequenced and gaps closed by PCR.
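The degenerate primers above use IUPAC ambiguity codes (R, Y, K, N, and so on), so each primer stands for a pool of oligonucleotides. The following is a minimal illustrative sketch, not part of the original protocol, of how the pool size and a matching pattern can be derived for such a primer. The function names are hypothetical and chosen for this example only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def primer_pattern(primer):
    """Compile a regular expression matching every sequence in the primer pool."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


if __name__ == "__main__":
    # Primer sequences as given in the text (written 5' to 3').
    for name, seq in [("SOGSH", "CAGCTCTTGGARCTNGARCGN"),
                      ("SO2", "CKNCKRTTYTGRAACCA")]:
        print(name, degeneracy(seq), primer_pattern(seq).pattern)
```

Note that a reverse primer would need to be reverse-complemented before being matched against the forward strand; that step is omitted here for brevity.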
An arrayed BAC library of 55,296 clones of average size 118 kb was constructed from 14 ml of whole blood taken from multiple Myxine glutinosa individuals collected at Kristineberg Marine Research Station, Sweden. Library filters and clones are available from Amplicon Express, Pullman, WA (www.genomex.com). This library was screened at low stringency using an E. burgeri Gsx PCR fragment generated from the BAC EB7-10H using the previously described primers. Two Gsx-positive BACs were found, with coordinates MG090C08 and MG116H24; PCR and sequencing revealed that both contained the same Gsx sequence. MG090C08 was completely sequenced from shotgun subclones.
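As a rough guide to what a library of this size represents, the total cloned DNA and the chance of recovering a given locus can be estimated from the clone number and mean insert size. The sketch below is illustrative only: it uses the standard Clarke and Carbon approximation, and the genome size is a placeholder argument that must be supplied by the reader, because no genome size is stated in the text.

```python
def total_insert_kb(n_clones, mean_insert_kb):
    """Total cloned DNA in the library, in kb."""
    return n_clones * mean_insert_kb


def fold_coverage(n_clones, mean_insert_kb, genome_size_kb):
    """Library coverage expressed as genome equivalents."""
    return total_insert_kb(n_clones, mean_insert_kb) / genome_size_kb


def prob_locus_present(n_clones, mean_insert_kb, genome_size_kb):
    """Clarke-Carbon estimate of the probability that a locus is in the library.

    P = 1 - (1 - f)^N, where f is the insert size as a fraction of the genome
    and N is the number of clones.
    """
    fraction = mean_insert_kb / genome_size_kb
    return 1.0 - (1.0 - fraction) ** n_clones


if __name__ == "__main__":
    clones, insert_kb = 55296, 118           # values given in the text
    print(total_insert_kb(clones, insert_kb), "kb cloned in total")
    assumed_genome_kb = 3_000_000            # ASSUMPTION: placeholder, not from the text
    print(fold_coverage(clones, insert_kb, assumed_genome_kb))
    print(prob_locus_present(clones, insert_kb, assumed_genome_kb))
```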
The complete BAC sequences were analyzed using Blast and GenScan. The sequences contain several transposable elements and many simple sequence repeats but no putative genes (under the criteria of being predicted by GenScan and having strong Blast match to known genes) other than the ParaHox genes described. No Cdx gene sequence was found in the completely sequenced M. glutinosa BAC MG090C08. PCR screening with degenerate and hagfish-specific primers revealed that BAC MG116H24 also did not contain a Cdx gene.
Degenerate nested Xlox primers (forward JMXLOX1C: GACGACAACAAGMGNACNAGRAC; forward nested XLOX2: CAGCTGCTVGAGCTVGAGAA; reverse XLOX3: YTCCTCYTTYTTCCACTTCAT; reverse nested XSO2: GCGNCGRTTYTGGAACCAGAT) were used to screen M. glutinosa and E. burgeri genomic DNA and M. glutinosa cDNA from gut and bile duct. No positive results were achieved.
To confirm that the Xlox pseudogene was not a sequencing or cloning artifact, or derived from a mutant individual, primers were designed to the surrounding region of the BAC and used to amplify this region from total DNA isolated from a different specimen of M. glutinosa. The resulting fragment was sequenced and shown to contain the same pseudoXlox sequence with identical frameshifts and stop codons.
Hagfish ParaHox BAC sequences have been deposited in GenBank under the accession numbers EU122193 and EU122194.
Blast searches against GenBank were carried out using the deduced M. glutinosa and E. burgeri ParaHox coding sequences, and data sets of invertebrate and vertebrate Gsx and Cdx genes were constructed (accession numbers given in Supplementary Material online). Additionally, we searched the Petromyzon marinus (sea lamprey) genome traces, the publicly available expressed sequence tags, and the pre!Ensembl assembly (February 2007 release) for ParaHox genes. The genome sequence data were found to be incomplete for these gene families. For example, we did not find any representative of Gsx exon 1 and found only a single trace file for Gsx exon 2. Thus, we were unable to include the lamprey data in our further analyses.
The inferred amino acid sequences were aligned using ClustalW (Thompson et al. 1994). Neighbor-Joining (NJ) phylogenetic analysis of the amino acid alignment was carried out using PHYLIP (Felsenstein 2005) with the Jones-Taylor-Thornton (JTT) matrix and 1,000 bootstrap replicates. Maximum likelihood phylogenetic analysis of the amino acid alignment was carried out using PhyML (Guindon and Gascuel 2003) with the JTT matrix, 4 categories of gamma rate heterogeneity plus 1 invariable category, and 1,000 bootstrap replicates. The amphioxus Cdx gene was originally included but does not resolve well, distorting the tree toward biologically incorrect groupings and reducing support values across the whole phylogeny. The same is true of Ciona intestinalis Gsx, which additionally has an extremely long branch. Accordingly, neither was used in the trees shown. Both methods retrieved identical phylogenies for both data sets (with the exception that amphibian cad2 grouped with human and mouse Cdx1 in NJ analysis).
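For readers unfamiliar with bootstrap support, each replicate is built by resampling alignment columns with replacement and rebuilding the tree from the resampled alignment. The sketch below illustrates only the resampling step in generic form. It is not the PHYLIP or PhyML implementation, and the toy alignment and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` is a list of equal-length aligned sequences. Columns are
    sampled with replacement until the replicate has the original length.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in columns) for row in alignment]


if __name__ == "__main__":
    toy_alignment = ["MKRW", "MKRF", "MQRW"]   # hypothetical toy data
    rng = random.Random(0)                     # fixed seed for reproducibility
    replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
    print(replicates[0])
```

A tree would then be inferred from each replicate, and the support for a grouping reported as the fraction of replicate trees that contain it.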
To determine the organization of the hagfish ParaHox genes, we first used degenerate PCR to isolate a Gsx-like fragment from genomic DNA of the inshore hagfish E. burgeri. This fragment was used to screen an E. burgeri genomic phage lambda library (Ishiguro et al. 1992). Positive clones were sequenced and used to generate specific primers to screen PCR pools of an E. burgeri genomic BAC library (Suzuki et al. 2004). A Gsx-positive BAC was isolated and completely sequenced. Analysis of the 103-kb insert revealed one Gsx and one Cdx gene family member in tail-to-tail configuration 36 kb apart (fig. 1). The intron–exon structure and orientation of these genes is similar to that seen in the ParaHox gene cluster of other vertebrates and invertebrates, but unlike the clusters of other animals no Xlox-like sequence is present between these 2 genes. To test whether an Xlox gene was present elsewhere in the genome, degenerate nested Xlox primers (known to work on a variety of chordates) were used in PCR amplification using E. burgeri genomic DNA as a template. No positive results were achieved.
To learn more about the absence of Xlox from this cluster, we examined a second species of hagfish, the Atlantic hagfish M. glutinosa. A genomic BAC library was constructed and screened with an E. burgeri Gsx fragment; a 145-kb Gsx-positive BAC clone was isolated and completely sequenced. This BAC contains a M. glutinosa Gsx gene consisting of 2 exons with 90% nucleotide identity to the coding region of the E. burgeri gene. The Myxine and Eptatretus genera diverged 60–90 MYA (Kuraku and Kuratani 2006); this is roughly equivalent to the divergence time between human and mouse (Waterston et al. 2002). Human and mouse have on average 85% nucleotide identity across their coding regions (Waterston et al. 2002) suggesting that the orthology of these 2 hagfish Gsx genes is likely. To confirm this, maximum likelihood phylogenetic analysis of the Gsx family was carried out, which revealed that the Gsx genes from the 2 hagfish species group together to the exclusion of other vertebrates and thus share a common ancestor due to speciation rather than duplication (fig. 2). We therefore conclude that the Gsx genes sequenced from E. burgeri and M. glutinosa are orthologues and infer that their surrounding genomic regions are syntenic.
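The nucleotide identity figures quoted above are pairwise comparisons over aligned coding sequence. A minimal sketch of such a calculation is given below. It assumes the two sequences have already been aligned, it skips gapped columns (one of several possible conventions), and the example sequences are hypothetical rather than the hagfish Gsx data.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments differing at one of ten positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```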
No Cdx gene was found within the M. glutinosa BAC; however, the expected organization of the cluster and the intergenic distances of the E. burgeri cluster suggest that this gene would be found a short distance outside this BAC insert. Attempts at genomic walking from the Gsx-containing BAC were unsuccessful because of highly repetitive sequences.
Intriguingly, Blast searches revealed a sequence 11 kb downstream of the M. glutinosa Gsx gene, in the same orientation, with high similarity to the second exon of vertebrate Xlox genes, including the homeodomain. However, this sequence has numerous nonsynonymous changes and a frameshift between the regions that encode helices 1 and 2 (fig. 1) and is clearly an Xlox pseudogene. To confirm that this was not a sequencing or cloning artifact, or derived from a mutant individual, primers were designed to the surrounding region of the BAC and used to amplify this region from total DNA isolated from a different specimen of M. glutinosa. The resulting fragment was sequenced and shown to contain an identical pseudoXlox sequence. To test whether an Xlox gene was present elsewhere in the genome, degenerate nested Xlox primers were used in PCR amplification reactions on M. glutinosa genomic DNA and cDNA from gut and bile duct. No positive results were achieved.
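The reasoning that a frameshift disrupts the reading frame and exposes premature stop codons can be illustrated with a short translation sketch. The sequences below are hypothetical toy examples, not the hagfish pseudoXlox sequence, and the helper names are chosen for this illustration only. The standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string in the given frame; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(frame, len(dna) - 2, 3))


def stop_positions(protein):
    """Return the codon indices at which stop codons occur."""
    return [i for i, aa in enumerate(protein) if aa == "*"]


if __name__ == "__main__":
    intact = "ATGGCTAAGCTGACC"           # hypothetical open reading frame
    shifted = intact[:3] + intact[4:]    # delete one base to mimic a frameshift
    print(translate(intact), stop_positions(translate(intact)))
    print(translate(shifted), stop_positions(translate(shifted)))
```

In this toy case the single-base deletion changes every downstream codon and brings a stop codon into frame, which is the pattern expected of a degenerating coding sequence.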
One gene from each of the Cdx and Gsx ParaHox families was isolated in this study, and phylogenetic analysis can be used to determine their orthology to ParaHox genes from other species. If the genome duplications at the base of the vertebrates preceded the divergence of the hagfish lineage from the stem group, hagfishes would (like all bony vertebrates analyzed to date) have had an ancestral complement of 4 of each ParaHox gene. Under these circumstances, we would expect the Gsx and Cdx genes isolated in this study to fall into one of the known vertebrate-specific subfamilies resultant from these duplications. However, maximum likelihood phylogenetic analysis suggests that hagfish Gsx and Cdx diverged before the duplications that created multiple copies of Gsx and Cdx in the jawed vertebrates (fig. 2). This is also supported by the fact that we found only one Gsx and one Cdx gene in each species despite extensive screening.
Deciphering the organization of homeobox genes provides insight into vertebrate development and evolution. We have isolated and characterized ParaHox genes from the inshore hagfish E. burgeri and the Atlantic hagfish M. glutinosa. The Gsx and Cdx genes of E. burgeri are linked and show a typical ParaHox cluster organization, being found in tail–tail configuration. The absence of an Xlox gene in the E. burgeri ParaHox cluster is extremely unusual. Although genome sequencing has revealed that some invertebrate species such as Drosophila melanogaster and Caenorhabditis elegans lack an Xlox, they also lack any linkage of their ParaHox genes (Ferrier and Holland 2002). No other animal is known with a ParaHox cluster (Cdx–Gsx linkage) lacking the central Xlox.
To learn more about the absence of Xlox in E. burgeri, we examined the orthologous genomic region in M. glutinosa and discovered the remnants of an Xlox pseudogene close to Gsx in this species. Complete genome sequencing will provide a definitive answer, but our results suggest that Xlox was lost in both species by pseudogenization rather than translocation elsewhere in the genome. Furthermore, translocation involving inversion would have split the Gsx–Cdx linkage in Eptatretus. If ParaHox gene cluster organization has been retained in vertebrates and amphioxus by temporal colinearity of gene expression (Ferrier and Holland 2002) or by distant or shared enhancers (Mulley et al. 2006), then the spacing between the genes may be crucial. In this situation, it may be easier to lose a gene by sequence degeneration than by translocation as the latter is more likely to disrupt flanking genes, control elements, and intergenic spacing. The possibility remains that the ParaHox clusters have been retained purely by chance (e.g., Nadeau and Taylor 1984); however, recent work suggests that ancient retained gene linkages are usually due to functional constraints (Kikuta et al. 2007).
The presence of an Xlox pseudogene close to Gsx in Myxine suggests that the Xlox protein function was lost on the hagfish lineage, before the divergence of these 2 species 60–90 MYA. Consistent with this deduction, other studies have shown that, although loss is a stochastic process often involving deletion mutations, orthologous pseudogenes can be preserved for hundreds of millions of years of evolution (Zheng et al. 2007).
The Xlox gene of invertebrates, including amphioxus, is expressed in the gut (Brooke et al. 1998). In vertebrates, Xlox appears to have taken on a novel role during development of a vertebrate-specific organ, the pancreas, and is also crucial for insulin production. Lack of a functional Xlox (IPF1) gene leads to pancreas agenesis (Jonsson et al. 1994) and mutations can cause diabetes (Macfarlane et al. 1999). We postulate that the absence of Xlox in hagfish is correlated with absence of an equivalent to the vertebrate pancreas. The disseminated clusters of endocrine cells around the hagfish bile duct are retrograde by comparison to the discrete islet organs of lampreys and the extraintestinal pancreas of most jawed vertebrates (Youson and Al-Mahrouki 1999). Given that Xlox has a key role in insulin production, how can hagfish produce insulin in its absence? In mouse, a small number of “early” endocrine cells develop independently of Xlox (IPF1) expression and form small clusters, able to produce insulin and glucagon; these are a different cell lineage from those that appear later, express Xlox (IPF1), and become the islets of Langerhans (Larsson 1998). These early cells persist in the pancreatic rudiment in Xlox-deficient mice and are (at least transiently) able to express insulin and glucagon but do not form the structural and exocrine cells necessary for pancreatic development (Ahlgren et al. 1996). It has been suggested that the later islet lineage arose by superimposing the activity of Xlox and other factors on a basal enteroendocrine program (Kim and MacDonald 2002). Perhaps, in the absence of Xlox, hagfish have exploited this basal system for insulin production.
Our phylogenetic analyses suggest, but do not prove, that hagfish diverged from the vertebrate stem before the “2R” genome duplications that created the multiple Hox and ParaHox clusters and many other multigene families seen in jawed vertebrates. Because molecular phylogeny strongly suggests that hagfish form a clade with lampreys, the only other extant jawless fishes, this conclusion can also be extended to lampreys. Intriguingly, studies of cyclostome Hox genes suggest that both hagfish and lampreys have multiple Hox clusters (Force et al. 2002; Irvine et al. 2002; Stadler et al. 2004). In lampreys, it has been suggested that at least 2 and possibly all 3 or 4 clusters originate from independent duplications and therefore that the common ancestor of lampreys and gnathostomes had not undergone 1 of the 2 duplications that created the 4 Hox clusters found in jawed vertebrates and may have undergone neither (Fried et al. 2003). It is notable that phylogenetic analyses of hagfish and lamprey gene families have often been unable to reach a consensus on the relative timing of the 2 whole-genome duplications, relative to the lamprey and hagfish lineage (e.g., Escriva et al. 2002; Bridgham et al. 2006; Zhang and Cohn 2006). Thus, although our data are consistent with both genome duplications postdating the divergence of cyclostomes from the vertebrate lineage, we suggest that larger scale comparisons of genome organization will be necessary to resolve the timing of these events.
A table of accession numbers for the sequence data used in the phylogenetic analyses is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
This research was funded by Christ Church, Oxford, the John Fell OUP research fund, Biotechnology and Biological Sciences Research Council, and European Union FP6 Marine Genomics Europe (contract GOCE-04-505403). We would like to thank John Mulley for providing degenerate primers, Stephan Beck and Penny Coggill for sequencing support, Takashi Suzuki for supplying E. burgeri BAC clones, and Hiroshi Ishiguro for providing a hagfish phage library.