In most bilaterian organisms so far studied, Hox genes are organized in genomic clusters and determine development along the anteroposterior axis. It has been suggested that this clustering, together with spatial and temporal colinearity of gene expression, represents the ancestral condition. However, in organisms with derived modes of embryogenesis and lineage-dependent mechanisms for the determination of cell fate, temporal colinearity of expression can be lost and Hox cluster organization disrupted, as is the case for the ecdysozoans Drosophila melanogaster and Caenorhabditis elegans and the urochordates Ciona intestinalis and Oikopleura dioica. We sought to determine whether a lophotrochozoan, the platyhelminth parasite Schistosoma mansoni, possesses a conserved or disrupted Hox cluster. Using a polymerase chain reaction (PCR)–based strategy, we have cloned and characterized three novel S. mansoni genes encoding orthologues of Drosophila labial (SmHox1), deformed (SmHox4), and abdominal A (SmHox8), as well as the full-length coding sequence of the previously described Smox1, which we identify as an orthologue of fushi tarazu. Quantitative reverse transcriptase–PCR showed that the four genes were expressed at all life-cycle stages but that levels of expression were differentially regulated. Phylogenetic analysis and the conservation of “parapeptide” sequences C-terminal to the homeodomains of SmHox8 and Smox1 support the grouping of platyhelminths within the lophotrochozoan clade. However, Bacterial Artificial Chromosome (BAC) library screening followed by genome walking failed to reconstitute a cluster. The BAC clones containing Hox genes were sequenced, and in no case were other Hox genes found on the same clone. Moreover, the SmHox4 and SmHox8 genes contained single very large introns (>40 kbp) further indicating that the schistosome Hox cluster is highly extended. 
Localization of the Hox genes to chromosomes using fluorescence in situ hybridization showed that SmHox4 and SmHox8 are on the long arm of chromosome 4, whereas SmHox1 and Smox1 are on chromosome 3. In silico screening of the available genome sequences corroborated results of Southern blotting and BAC library screening that indicate that there are no paralogues of SmHox1, SmHox4, or SmHox8. The schistosome Hox cluster is therefore not duplicated, but is both dispersed and disintegrated in the genome.
Hox genes were first identified in Drosophila as grouped genes encoding members of a major class of transcription factors that regulate anterior-posterior patterning in the embryo and other aspects of development (see McGinnis and Krumlauf 1992 for review). Hox proteins are characterized by a highly conserved 60 amino acid sequence, the homeodomain (encoded by the homeobox), which corresponds to the DNA-binding domain of the molecule. Mammals were found to have the same clustered chromosomal organization of Hox genes, with four copies of the cluster. Moreover, both Drosophila and mammals show colinearity of the spatial expression of members of the cluster during development of the embryo; genes at the 3′ end of the cluster pattern the anterior end of the embryo, whereas 5′ genes pattern the posterior end. However, whereas mammalian Hox genes also show temporal colinearity of expression, those in Drosophila do not. The conservation of the order of orthologous Hox genes and their spatial expression suggested that they might provide a molecular representation of the body plan at an early stage of development common to bilaterally symmetrical metazoans and referred to as the zootype (Slack, Holland, and Graham 1993). This implies that the Hox gene cluster might have fulfilled this crucial developmental role in the common ancestor of all bilaterians. Clustering of Hox genes is indeed likely to be ancient (Holland 2001), and this may have its origin in the Cnidaria because Nematostella vectensis exhibits bilateral symmetry and possesses five Hox genes (three anterior-like and two posterior-like) which are expressed in a staggered manner along the anterior-posterior body axis (Finnerty et al. 2004). Moreover, restriction analysis and Southern blotting provided evidence for the existence of a Hox cluster in N. vectensis (Finnerty 2001), and the presence of both Hox and ParaHox genes (Finnerty and Martindale 1999) further supports the hypothesis that the Hox cluster originated before the emergence of cnidarians. An alternative hypothesis, however, is that the cluster arose in an acoeloid, the last common acoelomorph (Acoela + Nemertodermatida) flatworm ancestor (Bagunà and Riutort 2004) having a limited repertoire of Hox genes. In either case, successive gene duplications then generated the clusters in higher bilaterians (for a discussion of the possible mechanisms involved, see Garcia-Fernandez 2005).
Due to their central role in the development of the bilaterian body plan and the observed changes in gene content of the Hox cluster during the evolution of the metazoa, Hox gene phylogeny has been instrumental in recent advances in the resolution of the metazoan tree. The use of 18S rRNA sequences has led to the division of the bilaterians into three branches, the deuterostomes and two protostome clades, the ecdysozoans and the lophotrochozoans (or trochozoans) (Aguinaldo et al. 1997). Among the consequences of this new grouping, the rhabditophoran platyhelminths which were formerly considered to be an early offshoot of the metazoan tree were placed within the lophotrochozoan branch. Examination of the platyhelminth Hox genes supported this view (Balavoine 1997; de Rosa et al. 1999; Balavoine, de Rosa, and Adoutte 2002), and in particular, specific sequences present as signature residues within the homeodomain or as conserved peptides outside the homeodomain provided phylogenetic evidence for the division of the protostomes and the position of the Platyhelminthes within the Lophotrochozoa (Balavoine, de Rosa, and Adoutte 2002). One such sequence is the 11 residue “Ubd-A” parapeptide immediately following the homeodomain in ecdysozoan Ubx and abdominal A (Abd-A) and in trochozoan Lox2 and Lox4 which clearly distinguishes the two clades. Moreover, a nine-residue parapeptide is characteristic of trochozoan Lox5.
However, knowledge of lophotrochozoan and particularly of platyhelminth Hox cluster genes is only partial. Polymerase chain reaction (PCR)–based cloning has allowed the characterization of Hox genes from several free-living platyhelminths including Polycelis nigra (Balavoine and Telford 1995; Balavoine 1996), Girardia tigrina (Bayascas et al. 1997), Dugesia japonica (Orii et al. 1999), and Discocelis tigrina (Salo et al. 2001). Overall, these studies suggest that flatworms present, among the 11 ancestral Hox genes inferred for the trochozoans (Balavoine, de Rosa, and Adoutte 2002), at least the five anterior-most genes, two of the four central genes, and one of the two posterior genes, but so far no studies have been done to determine the localization of the Hox cluster or whether it is compact or dispersed in this group. We therefore sought to clone Hox cluster members from the platyhelminth parasite of humans, Schistosoma mansoni, to fully characterize their sequences and levels of expression throughout the parasite life cycle, and to use an existing BAC library (Le Paslier et al. 2000) to reconstitute the Hox cluster. One potential member of the cluster was previously described in S. mansoni. Smox1 is a possible orthologue of Drosophila Antennapedia (Antp) or fushi tarazu (ftz) and was characterized as a partial cDNA clone encoding a homeodomain truncated at the 3′ end (Webster and Mansour 1992). In this paper, we describe the characterization of the complete coding sequence of Smox1, which we define as an orthologue of Drosophila ftz, and of three new S. mansoni genes, orthologues of labial (lab), deformed (Dfd), and Abd-A (and of mammalian paralogue groups (PGs) 1, 4, and 8, respectively; Scott 1993). Chromosome walking, BAC sequencing, and localization using fluorescence in situ hybridization (FISH) show that the S. mansoni Hox cluster is both disintegrated and dispersed; Smox1 and SmHox1 are present on chromosome 3, whereas SmHox4 and SmHox8 are located on chromosome 4. The significance of these findings in relation to schistosome embryology and Hox cluster evolution is discussed.
Materials and Methods
A Puerto Rican strain of S. mansoni was maintained in Biomphalaria glabrata snails and golden hamsters (Mesocricetus auratus). Cercariae were released from infected snails and harvested on ice. They were then washed three times by resuspension in 30 ml of Hank's Balanced Salt Solution (Invitrogen, Cergy Pontoise, France) in a Corex tube (Corning S.A., Avon, France) and centrifuged for 10 min at 1,500 g. Adult worms were obtained by whole-body perfusion of 6-week infected hamsters (Smithers and Terry 1965). Eggs were obtained from the livers of infected hamsters and hatched out under light to obtain miracidia (Yoshino and Laursen 1995). Primary sporocysts were obtained after overnight axenic culture of miracidia as described (Yoshino and Laursen 1995). Parasite DNA was extracted from the free-living cercariae using standard methods (Sambrook and Russell 2001). Total RNA was extracted from all life-cycle stages using the guanidine thiocyanate/caesium chloride method (Chirgwin et al. 1979), and poly A+ RNA was purified on oligo-dT cellulose (Aviv and Leder 1972).
Novel homeodomain-containing genes from S. mansoni were cloned using a PCR-based strategy. For each reaction, 50–500 ng of cercarial genomic DNA or 100 ng of reverse-transcribed poly A+ RNA from adult worms or cercariae was used, and the degenerate oligonucleotides corresponding to conserved homeodomain sequences were as previously described (Balavoine and Telford 1995). PCR was carried out using a touchdown protocol (Don et al. 1991). Briefly, cDNA (2 μl) was amplified using 40 pmol of forward and reverse degenerate primers in a 50 μl total volume with 2.5 U of TaqGold DNA polymerase (Applied Biosystems, Courtaboeuf, France), the supplied buffer, and 2 mM MgCl2. After 10 min of denaturation at 95°C, four sets of five cycles were performed (95°C for 15 s; annealing for 30 s at 55°C, 50°C, 45°C, and 40°C in the successive sets; and 72°C for 1 min 30 s), followed by 25 cycles of 95°C for 15 s, 37°C for 30 s, and 72°C for 1 min 30 s in an Applied Biosystems 9700 thermocycler. Analysis of the product was carried out on 1% agarose gels in tris acetate-EDTA buffer stained with ethidium bromide. Fragments of interest were excised from the gel, purified on silica beads (Geneclean kit, BIO 101), and cloned into pCR 2.1-TOPO (Invitrogen). Plasmids from positive clones were prepared by alkaline lysis (Birnboim and Doly 1979). Sequencing was performed on an ABI 377 automated sequencer (Applied Biosystems) using methods and reagents of the supplier. The fragments obtained by PCR were then used to screen a lambda gt10 cDNA library (a gift from Ricardo de Mendonça; Talla et al. 1998). Probes were [α32P]-labeled using the Random Primers DNA Labelling System (Invitrogen). Hybridization with the probes was carried out as described (Sambrook and Russell 2001). After purification of positive clones, the inserts were sequenced. 
Full-length cDNA was obtained by performing 5′ and 3′ rapid amplification of cDNA ends (RACE) using cDNA prepared from adult worms with the SMART RACE kit (BD Clontech, Palo Alto, Calif.) as a template and primers derived from the sequence of the lambda gt10 clones (Table 1 in Supplementary Material online). The resulting sequences were deposited in GenBank with the accession numbers noted in Table 2 (Supplementary Material online).
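The touchdown program described above can be written out explicitly as a list of cycles. The following is only an illustrative sketch: the function name and the tuple layout are hypothetical conveniences, whereas the temperatures, durations, and cycle counts are those given in the text.

```python
def touchdown_schedule():
    """Return the cycling program as a list of (denature, anneal, extend) cycles.

    Each step is a (temperature in deg C, duration in s) pair. The values come
    from the Methods text; the data layout itself is a hypothetical choice.
    """
    denature = (95, 15)
    extend = (72, 90)  # 1 min 30 s
    cycles = []
    # Four touchdown sets of five cycles, annealing at 55, 50, 45, and 40 deg C.
    for anneal_temp in (55, 50, 45, 40):
        cycles.extend([(denature, (anneal_temp, 30), extend)] * 5)
    # Twenty-five further cycles with annealing at 37 deg C.
    cycles.extend([(denature, (37, 30), extend)] * 25)
    return cycles


schedule = touchdown_schedule()
# Number of cycles following the initial 10 min denaturation at 95 deg C.
print(len(schedule))  # prints 45
```

Under this reading, the program comprises 20 touchdown cycles followed by 25 low-stringency cycles.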
Northern and Southern Blotting
Electrophoresis of total RNA from larvae and adult worms (30 μg per lane) was carried out alongside RNA size markers (Ambion, Ltd., Huntingdon, UK) in a 1.0% agarose/3% formaldehyde gel (Lehrach et al. 1977) that was then blotted onto a Hybond N+ nylon membrane (Amersham, Saclay, France). Probes (cDNA for northern and Southern genomic blots or BAC clone DNA for Southern blots of BAC DNA) were [α32P]-labeled as above, and hybridization was carried out as described (Sambrook and Russell 2001). Genomic DNA (10 μg) or BAC clone DNA (1 μg) was digested using HindIII and separated on a 1% agarose gel in TAE buffer. After transfer to a Hybond N+ membrane, hybridization was carried out by standard methods (Sambrook and Russell 2001) with a probe radiolabeled as above. After stringent washes, blots were exposed overnight to X-Omat AR film (Kodak, Chalons-sur-Saône, France).
Reverse Transcriptase–Polymerase Chain Reaction
Reverse transcription of 5 μg of total RNA from each life-cycle stage was carried out using 40 pmoles of random hexamers (Promega) and the Superscript kit (Invitrogen). The resulting cDNA was then amplified in a 50 μl total volume with 10 mM Tris-HCl (pH 9.0), 50 mM KCl, 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM deoxynucleoside triphosphates, 2.5 U of Taq DNA polymerase (Promega, Charbonnières-les-Bains, France), and 30–40 pmoles of forward and reverse primers. Oligonucleotides used in this study are shown in Supplementary Table 1 (Supplementary Material online). After 3 min at 95°C, 25 cycles of 95°C for 15 s, 60°C for 30 s, and 72°C for 1 min were carried out. Analysis of the products was carried out on 1.2% agarose gels in TAE buffer. Quantification was carried out by removing aliquots of the polymerase reaction every four cycles starting at eight cycles, dot-blotting samples onto a charged nylon membrane, and hybridization exactly as previously described (Pereira et al. 1998). The quantity of product for the Hox genes after 24 amplification cycles was compared to the S. mansoni 28 kilodalton glutathione S-transferase (Sm28GST) product obtained after 16 cycles. Dot blots were scanned using a PhosphorImager (Molecular Dynamics [Amersham Biosciences Europe GmbH], Saclay, France), and results are expressed as the relative intensity of the mean integrated signal (three determinations) for the Hox genes compared to Sm28GST.
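The quantification scheme described above (aliquots taken every four cycles from cycle eight, and Hox signal expressed relative to Sm28GST) can be sketched as follows. The function names are hypothetical and the signal values are invented for illustration only; they are not data from this study. The ratio is computed exactly as stated in the text, without any correction for the different cycle numbers used for the Hox genes (24) and Sm28GST (16).

```python
from statistics import mean


def aliquot_cycles(total_cycles=25, first=8, step=4):
    """Cycle numbers at which aliquots are removed from the reaction."""
    return list(range(first, total_cycles + 1, step))


def relative_intensity(hox_signals, gst_signals):
    """Mean integrated Hox signal divided by the mean Sm28GST signal."""
    return mean(hox_signals) / mean(gst_signals)


print(aliquot_cycles())  # prints [8, 12, 16, 20, 24]

# Invented PhosphorImager readings (three determinations each).
hox_at_24_cycles = [120.0, 130.0, 110.0]
gst_at_16_cycles = [400.0, 380.0, 420.0]
ratio = relative_intensity(hox_at_24_cycles, gst_at_16_cycles)
print(round(ratio, 2))  # prints 0.3
```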
BAC Library Screening and Chromosomal FISH Mapping
Genomic DNA clones were obtained by screening an S. mansoni BAC library (Le Paslier et al. 2000) on high-density nylon filters, again using the cDNA insert as a probe. Hybridization was carried out as described previously (Le Paslier et al. 2000). Growth of BAC clones and BAC DNA preparations were as previously described (Le Paslier et al. 2000). The extremities of BAC inserts were sequenced using the BigDye Terminator Cycle Sequencing kit (Applied Biosystems) and the −21 and Rev M13 primers. Probes corresponding to BAC insert extremities were generated using PCR and used to rescreen the BAC library for overlapping clones as previously. Entire BAC inserts were sequenced using a shotgun strategy. FISH was performed on S. mansoni sporocyst metaphase chromosome spreads with BAC clones using techniques previously described (Hirai and LoVerde 1995; H. Hirai and Y. Hirai 2004).
Sequence Analysis and Phylogenetic Tree Construction
Accession numbers of the sequences included in the data set are listed in the legends of figures 1 and 2. Amino acid sequences were aligned with the use of the BioEdit v7.0.1 package (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), and phylogenetic inference was restricted to sites that could be unambiguously aligned (60 shared sites, corresponding to the homeodomain, were analyzed). Full-length alignments and sites used in the analysis are shown in Supplementary Material online (Fig. 1). Phylogenetic analysis of the data set was carried out using MrBayes v3_0b4 (Huelsenbeck and Ronquist 2001). Bayesian analysis was performed using the Jones-Taylor-Thornton (JTT) amino acid replacement model (Jones, Taylor, and Thornton 1992) + Γ (gamma distribution of rates and four rate categories) + I (proportion of invariant sites), with the proportion of invariant sites and the shape parameter alpha of the Γ distribution estimated from the data. Briefly, starting trees were random, four simultaneous Markov chains were run for 3 million generations, burn-in values were set at 80,000 generations (based on empirical values of stabilizing likelihoods), and trees were sampled every 100 generations. Bayesian posterior probabilities were calculated using a Markov chain Monte Carlo sampling approach (Green 1995) implemented in MrBayes v3_0b4. For comparison, the same data set was also analyzed using two other methods, PUZZLEBOOT (by Holder and Roger: http://www.tree-puzzle.de) and Neighbor-Joining (NEIGHBOR [PHYLIP package v3.62]). Tree-Puzzle v5.1 (Strimmer and von Haeseler 1996) was used to estimate the model of amino acid transition, proportion of invariant sites, and among-site rate variation categories under gamma or gamma plus invariant models. The model was variable time (Müller and Vingron 2000), and the estimated shape parameter alpha and the fraction of invariable sites were 0.43 and 0.12, respectively. 
Maximum likelihood–corrected distance matrices were calculated using Tree-Puzzle v5.1 and the shell script PUZZLEBOOT and then analyzed with NEIGHBOR (PHYLIP package v3.62). Support values for the distance tree were obtained by bootstrapping (Felsenstein 1985) 1,000 replicates with SEQBOOT implemented in the PHYLIP package v3.62. For the Neighbor-Joining analysis, the reliability of branches was evaluated by bootstrap with 10,000 replicates.
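As a rough check on the Bayesian sampling settings given above, the number of trees retained after burn-in can be worked out from the run length, the sampling interval, and the burn-in. This is a back-of-the-envelope sketch with a hypothetical function name; it ignores the starting tree, and the exact count reported by the software may differ by one depending on how generation zero is handled.

```python
def retained_samples(generations=3_000_000, sample_every=100,
                     burn_in_generations=80_000):
    """Sampled trees left after discarding those from the burn-in period."""
    total_sampled = generations // sample_every
    discarded = burn_in_generations // sample_every
    return total_sampled - discarded


print(retained_samples())  # prints 29200
```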
Screening of BAC Sequences and Genomic Data Mining
Public expressed sequence tags (ESTs) were extracted from GenBank (http://www.ncbi.nlm.nih.gov/dbEST/index.html) and pooled into different sets (S. mansoni sequences, nonvertebrate sequences). Protein sequences were extracted from Swissprot and TrEMBL (http://us.expasy.org/sprot/; S. mansoni sequences, nonvertebrate sequences, all others). There are currently 152,942 schistosome ESTs in the databases, and 138,259 of these have been assembled into 12,912 contigs and 20,753 singletons in The Institute for Genomic Research (TIGR) gene index database (http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=s_mansoni). Schistosoma mansoni ESTs were screened first, using Blast to compare ESTs and proteins against the BAC sequences. Alignments with a score better than 96% over a minimum of 100 bases were retained. Located genes were further checked against the TIGR S. mansoni gene index database (http://tigrblast.tigr.org/tgi/). No gene structure prediction programs were used as no training set was available.
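The retention criterion for alignments can be expressed as a simple filter. The sketch below assumes that the "score" refers to percent identity, which the text does not state explicitly; the `Hit` record, its field names, and the example hits are hypothetical and do not correspond to the output format of any particular Blast program.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical minimal record for one alignment."""
    query_id: str
    percent_identity: float
    alignment_length: int


def retain_hits(hits, min_identity=96.0, min_length=100):
    """Keep hits better than min_identity over at least min_length bases."""
    return [
        h for h in hits
        if h.percent_identity > min_identity and h.alignment_length >= min_length
    ]


# Invented example hits.
hits = [
    Hit("est_a", 99.1, 450),  # retained
    Hit("est_b", 97.0, 80),   # too short
    Hit("est_c", 92.5, 300),  # identity too low
]
print([h.query_id for h in retain_hits(hits)])  # prints ['est_a']
```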
For screening of genomic sequence data, individual shotgun sequencing reads and contigs available from the Sanger Centre (http://www.sanger.ac.uk/Projects/S_mansoni/) and TIGR (http://www.tigr.org/tdb/e2k1/sma1/) Web sites were subjected to TBlastN analysis using the Blast Stand-Alone Programme for Win32 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST-BLAST/) using libraries downloaded from ftp://ftp.sanger.ac.uk/pub/pathogens/Schistosoma/mansoni/genome, with the homeodomain peptide sequences encoded by genes cloned during this study. A final series of analyses was done at the Sanger Centre schistosome genome Blast site (http://www.sanger.ac.uk./cgi-bin/blast/submitblast/s_mansoni).
PCR Cloning of S. mansoni Hox Complex Genes
The use of degenerate primers based on highly conserved homeodomain peptide sequences in PCR reactions on cDNA or genomic DNA from S. mansoni generated a large number of distinct fragments, only three of which encoded peptide sequences with a high level of identity to homeodomains belonging to the Drosophila melanogaster (Dm) Antp-like Hox cluster. These sequences were homologous, respectively, to DmDfd/mammalian PG 4, DmAbd-A/PG 8, and Dmlab/PG 1 homeodomains. Screening of a cDNA library followed by 5′ and 3′ RACE was used to extend these sequences and allowed the isolation of cDNA sequences of 2,468 bp for SmHox4, 1,085 bp for SmHox1, and 3,248 bp for SmHox8. The sequences of SmHox4 and SmHox8 contain 5′ noncoding regions upstream of open reading frames encoding peptide sequences of 543 and 715 amino acids, respectively. Results of northern blotting (not shown) suggest that these two cDNA sequences are complete. The SmHox1 cDNA sequence encodes a 318 amino acid sequence that is continuous from the 5′ end. Moreover, northern blotting (not shown) suggests that the complete mRNA is approximately 3.1 kb in length. Repeated attempts to extend the sequence in the 5′ direction using RACE failed, possibly due to the very low levels of expression of this gene (see below). In addition, 5′ and 3′ RACE were used to extend the previously described partial sequence of Smox1 (Webster and Mansour 1992), leading to the characterization of the complete coding sequence. The cDNA sequence obtained (2,927 bp) encodes a 745 amino acid peptide sequence, which allowed us to complete the previously truncated homeodomain sequence.
Sequence Analysis and Phylogeny of the Homeodomains
Within the peptide sequences of the three novel S. mansoni Hox proteins as well as that of Smox1, only the homeodomain and its C-terminal extension (or parapeptide sequence) showed significant sequence conservation compared to available sequences from other species (fig. 1). All four show a high level of sequence conservation within the homeodomain compared to orthologues from other organisms. For instance, SmHox4 and SmHox1 show 85% and 83% identity, respectively, to DmDfd and Dmlab, and 87% and 82% identity, respectively, to the lophotrochozoan sequences, G. tigrina HoxA and Lingula anatina lab. Similarly, the SmHox8 homeodomain shows 88% identity to DmAbd-A and 92% identity to Pnox1a from the turbellarian P. nigra. In the case of SmHox8, key signature residues within the homeodomain sequence (shaded in fig. 1) and the presence of a conserved Ubd-A parapeptide C-terminal to the homeodomain (Balavoine, de Rosa, and Adoutte 2002) lend further support to the inclusion of the platyhelminths within the lophotrochozoan clade of protostomes. The Smox1 homeodomain (fig. 1D) showed the highest sequence identity to G. tigrina HoxE (93% identity) and HoxC (93%). Smox1 also has a parapeptide sequence C-terminal to the homeodomain which is very similar to those of lophotrochozoan Lox5 proteins (fig. 1D).
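Percent identity values such as those quoted above are obtained by counting identical residues over the aligned positions. A minimal sketch is given below, assuming two pre-aligned sequences of equal length with "-" marking gaps; the function name is hypothetical, gapped columns are simply skipped (other conventions exist), and the example peptides are invented toy strings rather than real homeodomain sequences.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore any column in which either sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy example: 9 of 10 aligned residues are identical.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))  # prints 90.0
```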
A phylogenetic tree of the Hox complex homeodomain sequences was constructed by Bayesian inference (MrBayes) and rooted with the Drosophila even-skipped homeodomain (fig. 2). The schistosome homeodomains cluster within their orthology groups and are generally most closely associated with other lophotrochozoan sequences, particularly platyhelminth sequences. All the major groupings are supported by high Bayesian Posterior Probability values (3 million generations). Thus, SmHox8 clusters with P. nigra Pnox1a and Pnox1b; SmHox4 with G. tigrina HoxA and insect Dfds; and SmHox1 with insect lab, vertebrate HoxB1, and P. nigra Pnox3. The Smox1 homeodomain clusters with the Lox5/ftz family and in particular with the GtDthoxC and E sequences from G. tigrina. Similar tree topologies were obtained using both PUZZLEBOOT and Neighbor-Joining analyses and supported the clustering of the schistosome homeodomains within the respective orthology groups (not shown).
Expression of Hox Genes During the S. mansoni Life Cycle
In order to determine whether the Hox complex genes were expressed at distinct periods of the schistosome life cycle, we carried out semiquantitative reverse transcriptase (RT)–PCR on RNA from the major stages (fig. 3). mRNA for the three Hox genes cloned during this study, as well as for Smox1, was detected at all life-cycle stages, but in markedly different amounts. SmHox1 was very weakly expressed at all life-cycle stages compared to the other genes (fig. 3A and C), whereas Smox1 was the most strongly expressed (fig. 3C). Interestingly, the four genes showed markedly different patterns of expression during the life cycle (fig. 3B). Whereas SmHox8, SmHox1, and Smox1 were all strongly expressed in eggs and miracidia compared to other stages, SmHox1 was much more weakly expressed in all the other stages. Although eggs and miracidia were analyzed separately in this study, the population of eggs isolated from livers of infected hamsters contained a mixture of immature and mature miracidial larvae. Development of the embryo commences just after egg deposition (Nez and Short 1957) and requires 11–12 days in host tissues in the case of Schistosoma japonicum (Ho and Yang 1979). Increased expression of mRNA in these stages may reflect a state of readiness for the transformation into the sporocyst within the intermediate host. SmHox8 was weakly expressed in cercariae and adult female worms, whereas the expression of Smox1 was more homogeneous throughout the life cycle. SmHox4, on the other hand, was weakly expressed in adult worms and eggs and most strongly expressed in cercariae. These results show that Hox gene expression in S. mansoni is developmentally regulated, but no conclusion can be drawn about whether it is coordinated.
Localization of Hox Genes in the S. mansoni Genome: Isolation of BAC Clones and Genome Walking
Southern blotting of S. mansoni DNA digested with HindIII with probes generated by PCR from the cDNA sequences of the four Hox genes under investigation demonstrated the presence of only one band (not shown), indicating that one copy of each gene was present in the genome. The same probes were used to screen the S. mansoni BAC library (Le Paslier et al. 2000). Between one (Smox1) and seven (SmHox1) clones were obtained for each gene (Fig. 1 in Supplementary Material online) in line with the observed eightfold genome coverage of the library. In order to determine whether the S. mansoni Hox complex is compact in the genome or, on the contrary, widely spread, the extremities of the inserts from selected clones corresponding to each Hox gene were sequenced and the sequences (see Table 2 in Supplementary Material online) used to generate new probes to allow genome walking. The BAC clones obtained are indicated in Supplementary Material online (Fig. 2), and none allowed us to “walk” from one Hox gene to another. This approach was limited to one round of walking in each direction due to the presence of gaps in the coverage of the BAC library and the high frequency of repetitive sequences in the BAC ends (El Sayed et al. 2004). Nevertheless, our results suggest that the schistosome Hox complex, if it exists, is unlike the mouse complexes which span only about 200 kbp and would resemble the more extended complexes of D. melanogaster or Caenorhabditis elegans.
In agreement with this hypothesis, probes generated from the 5′ and 3′ ends of SmHox8 hybridized to different sets of BAC clones, suggesting that the gene was very large. In order to determine the structure of this gene, the corresponding probes were hybridized to Southern blots of HindIII digests of the BAC DNA from two clones hybridizing with the 5′ probe (29E6 and 2B16) and two clones hybridizing with the 3′ probe (16E16 and 62C8), as well as a control clone not hybridizing with either probe (39E9). Results show that, as expected, the 5′ probe hybridized specifically to one 12-kbp fragment present in the digests of 29E6 and 2B16 (not shown) and that the 3′ probe hybridized specifically to two bands of approximately 1.7 and 1.9 kbp in clones 16E16 and 62C8 (not shown). A probe generated from one extremity of clone 16E16 hybridized with clone 2B16 (Fig. 1 in Supplementary Material online), but the latter does not contain the 3′ end of the SmHox8 gene. This suggested that clone 2B16 spans the gap between clones 29E6 and 16E16.
Structure of the Hox Genes: BAC Sequencing
As part of the Schistosome Genome Sequencing Project and as preparation for whole-genome sequencing, several BAC clones were sequenced using the shotgun method (sequences are available at ftp://ftp.sanger.ac.uk/pub/pathogens/Schistosoma/mansoni/BACs). Among these, the BAC clones 29E6 and 16E16 containing the SmHox8 gene, 1F8 containing the SmHox4 gene, and 54H2 containing the SmHox1 gene were fully sequenced. These sequences permit the definition of the organization of these three Hox genes (fig. 4). BAC 29E6 (92,068 nt) contains the 5′ section (nucleotides 1–2137) of the SmHox8 gene as a single exon spanning nucleotides 64486–66622, whereas BAC 16E16 (22,170 nt) contains the 3′ section, again as a single exon from nucleotides 19473–20547. The single intron therefore spans 44,919 bp on these two BACs, which do not overlap. Sequences corresponding to the extremities of BAC 2B16 (Table 2 in Supplementary Material online) are found as expected, respectively, on BAC 29E6, starting upstream of the SmHox8 gene at position 29416, and on BAC 16E16 (starting at position 18468), 5′ of the second exon of the SmHox8 gene. This confirms that BAC 2B16 spans the gap between the two other clones as proposed in Supplementary Material online (Fig. 1). In order to determine the size of the gap, PCR was carried out using BAC 2B16 as a template and oligonucleotides derived from the 3′ end of BAC 29E6 and the 5′ end of BAC 16E16 (Table 1 in Supplementary Material online). A unique band of about 450 bp was obtained (not shown) and because the amplified extremities of BACs 29E6 and 16E16 account for 447 bp, we conclude that these BACs are practically contiguous. This suggests that the single intron in the SmHox8 gene spans about 45 kbp. In the case of the SmHox4 gene on BAC 1F8, the full cDNA sequence is present on the same BAC, but split into a small 5′ exon (to nucleotide 334) and a larger 3′ exon separated by a very large intron (42,048 bp). 
Finally, the incomplete SmHox1 cDNA sequence is contained within a single exon on BAC 54H2. However, the open reading frame is interrupted in the genomic sequence 10 codons upstream of the 5′ end of the cDNA sequence, and although there is another open reading frame encoding 104 amino acids 106 bp upstream in the genomic sequence, it has so far proved impossible to link the two using RT-PCR. The nature of the 5′ end of the SmHox1 gene therefore remains unknown.
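The SmHox8 exon and intron sizes quoted above can be checked directly from the BAC coordinates. The sketch below assumes 1-based, inclusive coordinates, which the text does not state; the variable names are hypothetical. Under that assumption the intron sequence present on the two BACs sums to 44,918 nt, within one base of the 44,919 bp given in the text, a difference that presumably reflects the counting convention. The gap estimate is only approximate because the size of the PCR band was read from a gel.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


BAC_29E6_LENGTH = 92_068
EXON1_ON_29E6 = (64_486, 66_622)   # 5' exon of SmHox8
EXON2_ON_16E16 = (19_473, 20_547)  # 3' exon of SmHox8

exon1_length = span_length(*EXON1_ON_29E6)
exon2_length = span_length(*EXON2_ON_16E16)

# Intron sequence downstream of exon 1 on BAC 29E6 and upstream of exon 2 on 16E16.
intron_on_29e6 = BAC_29E6_LENGTH - EXON1_ON_29E6[1]
intron_on_16e16 = EXON2_ON_16E16[0] - 1
intron_on_bacs = intron_on_29e6 + intron_on_16e16

# Gap between the BACs: PCR band (about 450 bp) minus the 447 bp from the BAC ends.
gap_estimate = 450 - 447

print(exon1_length, exon2_length)             # prints 2137 1075
print(intron_on_29e6, intron_on_16e16)        # prints 25446 19472
print(intron_on_bacs)                         # prints 44918
print(intron_on_bacs + gap_estimate)          # prints 44921
```

The first exon length (2,137 nt) agrees with the cDNA coordinates 1–2137 given in the text, and the total is consistent with an intron of about 45 kbp.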
The BAC clone containing the Smox1 gene (31N21) was not completely sequenced, but the structure of the gene was determined by sequencing the BAC DNA using primers covering the entire coding sequence. The gene contains three introns (fig. 4A) the size of which has not been determined. The first intron is situated immediately 5′ of the homeodomain, and this intron position is highly conserved in Lox5 and ftz genes, supporting the orthology of these groups (Telford 2000). This further supports the identity of Smox1 as a Lox5/ftz orthologue. The two remaining introns are within the homeodomain-encoding sequence.
Because the S. mansoni Hox cluster appeared to be dispersed in the genome, we examined the sequences of the BAC clones containing Hox genes to determine the nature of the genetic material in their vicinity (see Materials and Methods). In particular, we looked for repetitive transposable elements and for genes encoding other proteins. Figure 5 contains maps of the four BAC clones showing these features. There is a clear difference between BAC 54H2, containing SmHox1, and the three other BACs. BACs 1F8, 29E6, and 16E16 contain multiple copies (partial or complete, not shown) of sequences corresponding to the Smα retroposon (Spotila et al. 1989). BACs 1F8 and 29E6 contain two partial copies, and the short BAC 16E16 one partial copy, of SR2 family non–long terminal repeat (non-LTR) retrotransposons (Drew et al. 1999). These BAC clones contained neither open reading frames with significant sequence identity to known proteins nor transcribed sequences corresponding to S. mansoni ESTs. In contrast, BAC 54H2 has a dual profile. The 5′ extremity of the BAC (the first 28,800 bp of the 146,983-bp insert) is similar to the other BACs containing Hox genes. Apart from the SmHox1 gene, this region has a mean GC content of 31.2% and contains one copy of the Smα retroposon. Concentrations of this retroposon are also visible in the FISH data below (Fig. 3b in Supplementary Material online), which show some signals (yellow) at paracentric regions in some chromosomes in addition to a main signal at the mid region of chromosome 3. There follows one complete copy of each of two LTR retrotransposons, Boudicca (Copeland et al. 2003) and Saci-3 (DeMarco et al. 2004). The remainder of the sequence (fig. 5) has a higher GC content (36.6%) and is gene rich. It contains at least three potential genes (pyruvate kinase, a gene with identity to the human T-complex protein 1, epsilon subunit, and a gene with identity to an insect intestinal mucin) and a number of other sequences represented among the ESTs. 
In addition, it contains nine fragments with significant identity to the SR2 family of non-LTR retrotransposons. The evident rupture in the characteristics of the genomic sequence following the LTR retrotransposons seems consistent with a recombination event.
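The contrast in base composition described for BAC 54H2 (a mean GC content of 31.2% in the 5′ region against 36.6% in the remainder) is the kind of transition that a windowed GC profile makes visible. The following is a minimal sketch only, not the procedure used in this study: the function names, the window size, and the toy sequence are illustrative assumptions, and the sequence is not derived from any BAC clone.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a DNA sequence, counting only A, C, G, and T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int, step: int):
    """Yield (start, GC%) for successive windows along the sequence."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, gc_content(seq[start:start + window])


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-richer stretch.
toy = "ATTATAATTGATTAACATTA" * 5 + "ATGCCGATGGCATTCGAGCA" * 5
profile = list(windowed_gc(toy, window=50, step=50))

for start, gc in profile:
    print(f"window starting at {start}: {gc:.1f}% GC")
```

On a real insert, a sustained shift in the profile would only suggest a compositional boundary; annotation of the flanking repeats and genes, as in figure 5, would still be needed to interpret it.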
Genome-Wide Search for Hox Gene Paralogues
In order to determine whether more than one copy of the Hox cluster was present in the schistosome genome, the available schistosome genome sequence data (individual shotgun reads and assembled sequences), representing approximately ninefold genome coverage (the genome size is approximately 270 Mb; Simpson, Sher, and McCutchan 1982), were screened using the homeodomain sequences of SmHox8, SmHox4, SmHox1, and Smox1 and the TBlastN program. Only one copy of each of the first three genes was found, but one paralogue was detected for Smox1. This genomic sequence (shisto3266f03.p1k) contains four short exons (71, 67, 17, and 187 bp) which, when assembled, encode the complete homeodomain (fig. 4B) as well as the parapeptide and some downstream sequence. The introns are also short; although they are of identical size (56 bp), their sequences are unrelated, and each contains the consensus GT and AG dinucleotides at the 5′ and 3′ ends, respectively. The alignment of the encoded peptide sequence to the Smox1 homeodomain shows 90% sequence identity, and one of the three intron positions is identical to that of the first intron within the homeodomain of Smox1 (fig. 4B), reinforcing the view that these genes are paralogues. This gene has therefore been named SmHox5. However, the parapeptide sequence (NFKSLNDPN) is less well conserved than that of Smox1 compared to other lophotrochozoan Lox5 sequences.
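The checks described above (summing exon lengths, confirming the GT and AG dinucleotides at intron boundaries, and computing identity between aligned homeodomains) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the exon lengths are those given in the text, but the function names and the toy intron and peptide strings are hypothetical and are not the SmHox5 or Smox1 sequences.

```python
EXON_LENGTHS_BP = [71, 67, 17, 187]  # exon sizes reported in the text


def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


total_coding_bp = sum(EXON_LENGTHS_BP)
print(total_coding_bp)  # 342 bp, i.e. 114 codons if read in a single frame

# Hypothetical toy inputs, for illustration only.
print(has_canonical_splice_sites("GTAAGTTTTTAG"))
print(percent_identity("RRGRQTYTRY", "RKGRQTYTRY"))
```

Assuming a canonical 60-residue homeodomain, 90% identity would correspond to 54 identical positions; the 342 bp of assembled exon sequence is more than the 180 bp needed to encode such a domain, consistent with the additional parapeptide and downstream sequence mentioned in the text.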
Localization of the S. mansoni Hox Genes on Chromosomes
The FISH technique was used to localize BAC clones containing S. mansoni Hox genes on metaphase chromosomes from sporocysts. As shown in figure 6 (black and white) and in the Supplementary Material online (Fig. 3, colour), SmHox8 and SmHox4 are unequivocally localized on chromosome 4 (fig. 6a, c, and d), whereas SmHox1 and Smox1 are both on the long arm of chromosome 3 (fig. 6b and e). This unexpected and unusual separation of Hox complex genes on different chromosomes was confirmed when four probes, corresponding to all four Hox genes, were used together and gave signals on both chromosomes 3 and 4 (fig. 6f). This localization also suggests that the order of genes within the Hox cluster differs from that found in arthropods and vertebrates because Smox1, an ftz/Lox5 orthologue, should be situated between SmHox4 and SmHox8. It is also possible that SmHox5, found by in silico screening of the genome database, is in this position on chromosome 4 and that Smox1 was translocated to chromosome 3 after the duplication event which gave rise to these paralogous genes. However, in this case it is difficult to suggest a mechanism for a separate translocation of SmHox1 to the same area of chromosome 3 in a different recombination event. It is hence more probable that the gene order of the cluster changed prior to a single recombination which transferred both SmHox1 and Smox1 to chromosome 3.
Discussion
In this first characterization of the genomic structure of a lophotrochozoan Hox cluster, we have shown that it is both dispersed on two different chromosomes and disintegrated, in that the order of the Hox genes seems to differ from that of arthropod or vertebrate clusters. We initially cloned and sequenced the cDNA of three new Hox cluster genes from the platyhelminth parasite, S. mansoni, as well as obtaining the complete coding sequence of the previously partially characterized Smox1 gene (Webster and Mansour 1992). Homeodomain peptide sequence alignments and phylogenetic analysis confirmed that the new genes encoded orthologues of mammalian/arthropod Hox8/Abd-A, Hox4/Dfd, and Hox1/lab and that Smox1 is an orthologue of Hox5/ftz and of lophotrochozoan Lox5. The alignments of the parapeptide sequences of SmHox8 and Smox1 confirmed the phylogenetic clustering of schistosome Hox genes with those of other lophotrochozoans. This further supports the classification of the Platyhelminthes within this grouping, although it is increasingly probable that the Acoela, previously grouped within the Platyhelminthes, are basal bilaterians (Ruiz-Trillo et al. 2002; Baguna and Riutort 2004).
Among the schistosome Hox cluster genes, the nature of Smox1 required definition. Smox1 was first characterized as a partial cDNA clone encoding a truncated homeodomain (Webster and Mansour 1992). The complete coding sequence we have obtained has allowed us to characterize this gene as an orthologue of arthropod ftz/lophotrochozoan Lox5. Telford (2000) has argued that the arthropod ftz gene, although not itself a homeotic gene, derives from a Hox gene orthologous to Lox5 and possibly lost its homeotic function due to redundancy of function with Scr. An alternative hypothesis (De Rosa et al. 1999) is that ftz and Lox5 are not orthologues but arose from separate duplications, leading to ftz and Antp in arthropods and to Lox5 and Lox7 in lophotrochozoans. However, subsequent phylogenetic studies of homeodomains clearly cluster arthropod ftz with Lox5, whereas Antp clusters with Lox7. Moreover, the consensus hexapeptide (F(F/Y)PWM(K/R)SYTD, N-terminal to the homeodomain) is present in both arthropod ftz (although it is not well conserved in Drosophila ftz) and lophotrochozoan Lox5, including Smox1. Finally, the conserved intron 5′ of the homeodomain in both ftz and Lox5 genes provides a further argument for orthology of these genes. Intriguingly, the full-length peptide sequence of Smox1 includes an LXXLL consensus sequence in the region N-terminal to the homeodomain. This sequence mediates the interaction of transcriptional coactivators with nuclear receptor AF-2 domains (Heery et al. 1997). It is present in Drosophila ftz and mediates its functional interaction with the Ftz-F1 nuclear receptor (Schwartz et al. 2001; Suzuki et al. 2001; Yussa et al. 2001). The functional interaction of a vertebrate orthologue of Ftz-F1, LRH-1, with a homeodomain cofactor (Steffensen et al. 2004) suggests that this type of mechanism may be conserved. Therefore, the intriguing possibility of a similar interaction between S. mansoni Ftz-F1 (de Mendonça et al. 2002; Bertin et al. 2004) and Smox1 is under investigation.
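The two consensus sequences discussed above, LXXLL and F(F/Y)PWM(K/R)SYTD, translate directly into regular expressions for scanning a peptide sequence. The sketch below is illustrative only: the function name and the toy peptide are hypothetical, and the peptide is not the Smox1 sequence.

```python
import re

# X in LXXLL stands for any residue; bracketed alternatives follow the consensus in the text.
LXXLL = r"L..LL"
HEXAPEPTIDE = r"F[FY]PWM[KR]SYTD"


def find_motifs(pattern: str, protein: str):
    """Return (start, match) for every occurrence, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(protein.upper())]


# Hypothetical toy peptide, for illustration only.
toy_peptide = "MSTLEALLQAFYPWMKSYTDGG"

print(find_motifs(LXXLL, toy_peptide))
print(find_motifs(HEXAPEPTIDE, toy_peptide))
```

A pattern match of this kind only flags a candidate site; whether an LXXLL sequence actually mediates an interaction with a nuclear receptor is an experimental question, as the text notes.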
Smox1 is the only gene among those we investigated for which a possible paralogue was found by screening the available genome sequence databases. The homeodomain sequence of SmHox5 obtained is complete and contains three introns, one of which is at a conserved position compared to one of the two introns within the Smox1 homeodomain. However, the 5′ end of the gene fragment terminates downstream of the intron position conserved within the Lox5 family (Telford 2000), and it is therefore not possible to determine whether it is present in SmHox5. Although the latter is clearly a paralogue of Smox1, the divergent gene structure (two nonconserved intron positions) and the divergent parapeptide sequence of SmHox5 seem to indicate that the duplication giving rise to the two genes must be relatively ancient.
The presence of two Lox5 paralogues raised the possibility of an overall duplication of the Hox cluster in schistosomes, which would explain the localization of Hox genes on two different chromosomes. All invertebrates so far studied, including the cephalochordate amphioxus and the urochordates Ciona intestinalis and Oikopleura dioica, have unique Hox clusters. Nevertheless, duplication of Hox genes has been observed, notably in the planarian D. tigrina in which the genes DtGtHoxC and DtGtHoxE were proposed to derive from a recent duplication event (Bayascas et al. 1997). Moreover, while these two genes were putatively considered to be orthologues of Drosophila Antp (Bayascas et al. 1997), we have shown by phylogenetic analysis (fig. 2) that they cluster with SmHox5, Smox1, and Drosophila ftz. In another planarian, P. nigra, there is a duplication of the Hox1 (Pnox2 and 3) and Hox8 (Pnox1a and 1b) genes (De Rosa et al. 1999). In addition, bryozoans, which are also lophotrochozoans, have a duplicated Hox4 gene (Passamaneck and Halanych 2004). However, exhaustive searches of the genomic databases failed to detect paralogous forms of SmHox1, 4, or 8, suggesting that the Hox5 duplication is specific to this gene. We therefore conclude that the localization of SmHox1 and Smox1 on chromosome 3, and of SmHox8 and SmHox4 on chromosome 4, shows that the schistosome Hox cluster is both dispersed and disintegrated.
Although most characterized Hox clusters have been shown to conserve the gene order seen in arthropod and mammalian clusters, there are notable exceptions. The Drosophila Hox cluster is split into the Antp and Bithorax clusters, which are both located on chromosome 3R, but are separated by 9.5 Mb (Aboobaker and Blaxter 2003). Similarly, in Bombyx mori, the lab and Hox2 genes, although present on the same chromosome as the rest of the Hox cluster, are widely separated from it (Yasukochi et al. 2004), in contrast to the intact Hox clusters in other insects such as Anopheles gambiae (Devenport, Blass, and Eggleston 2000). In the nematode C. elegans (Van Auken et al. 2000), the Hox cluster is also split and two genes, Ceh13 (lab/Hox1) and lin 39 (Scr/Lox20), are inverted compared to the arthropod and mammalian gene order. Again, however, the C. elegans cluster is situated on one chromosome (III). This is not the case for the Hox clusters of two urochordates. The O. dioica Hox genes (Seo et al. 2004) are all on separate BAC clones, despite the very compact (60–70 Mb) genome, suggesting that they are not all on the same chromosome. In C. intestinalis, the localization of the Hox genes to chromosomes using FISH (Ikuta et al. 2004) showed that seven of nine genes are located on a single chromosome, the others being located on one other chromosome. Moreover, among the seven genes that colocalized to one chromosome, the usual gene order was not maintained, with notably CiHox10 located between CiHox2–4 and CiHox5–6. In addition, the seven genes are spread out almost over the whole length of the chromosome (about 5 Mb). The Urochordata represent the earliest branch of the chordates, and in all other members of this phylum so far examined Hox gene order is strictly maintained. This suggests that the disruption observed is specific to the urochordate lineage.
It has been suggested (Ferrier and Holland 2002) that the integrity of the Hox cluster and gene order in different organisms are related to the colinearity of expression of the Hox genes. These authors hypothesize that the ancestral cluster was contiguous, with both spatial and temporal colinearity of gene expression. In lineages with derived modes of embryogenesis which are rapid and/or involve a low number of cells and mosaic development (as is the case in Drosophila, C. elegans, and the urochordates), temporal colinearity of expression has been lost, together with the constraints on cluster organization. In both C. intestinalis and O. dioica, despite the disruption of the Hox cluster, a degree of spatial colinearity of expression of some of their Hox genes has apparently been maintained (Ikuta et al. 2004; Seo et al. 2004), suggesting that it is not as dependent on cluster organization as is temporal colinearity.
Although schistosome embryo development has not been extensively studied, it has been shown that the fertilized ovum undergoes asymmetric cell division (Nez and Short 1957) and that the embryo contains relatively few cells. It is therefore probable that schistosomes share with other lophotrochozoans such as annelids (Shankland 1991) a lineage-dependent mechanism for determining cell fate. The conditions set out by Ferrier and Holland (2002) for the removal of constraints on Hox cluster organization are thus present. However, the data on Hox gene expression we obtained using RT-PCR are not adequate to determine whether or not colinearity of expression has been lost, and studies using in situ hybridization will be necessary to decide this point. High levels of Hox gene expression detected in the egg and in the miracidium suggest that these genes may be involved in embryo development. However, this is not the only schistosome life-cycle stage at which the determination of cell fate along the anteroposterior axis may be important, the other being the development of infective cercariae from secondary sporocysts within the snail intermediate host. All the Hox genes studied were expressed in sporocysts, but RT-PCR was performed on mRNA from primary sporocysts generated in vitro, which may explain the relatively low levels detected compared to eggs. Moreover, the high relative level of expression of SmHox4 mRNA in cercariae and the expression of Smox1 and SmHox8 in adult male worms probably indicate that these genes are also involved in processes other than embryo development.
Ferrier and Holland (2002) also suggested that the relaxation of constraints on Hox cluster organization would be followed by the invasion of the cluster by repetitive, transposable elements which would facilitate recombination events leading to disruption of the clusters. The presence of extremely large introns in two of the genes studied and the genome walking experiments show that the schistosome Hox cluster is spread out widely, similar to the Drosophila and C. elegans clusters and in contrast to vertebrate clusters. Whereas only fragments with sequence identity to members of the non-LTR retrotransposon SR2 family are present in the vicinity of SmHox8 and SmHox4, the presence of two complete copies of schistosome LTR retrotransposons downstream of the SmHox1 gene may be indicative of the role played by such elements in the dispersal of the schistosome Hox cluster. Moreover, these retrotransposons are followed by a transition from a gene-poor, AT-rich region, similar to that of the other BAC clones, to a gene-rich region with a higher GC content which may point to a translocation event which placed SmHox1 (as well as Smox1) on chromosome 3.
To conclude, we have shown that the Hox cluster of a lophotrochozoan, the platyhelminth parasite S. mansoni, is both dispersed and disintegrated with constituent genes present on two different chromosomes. This dispersion, as well as the nature of the genomic sequence in the vicinity of one of the putatively translocated genes, SmHox1, is consistent with a relaxation in constraints on cluster organization due to a lineage-dependent mode of embryogenesis. However, temporal colinearity of Hox gene expression and compact Hox clusters have so far only been described in the cephalochordate, amphioxus, and in vertebrates, but not in Cnidaria, Lophotrochozoa, or Ecdysozoa. In insects, ordered, but not compact, Hox clusters have been maintained along with spatial colinearity of expression. An alternative hypothesis to that of Ferrier and Holland (2002) is therefore that temporal colinearity of Hox gene expression was acquired in the chordate lineage and that the gene order in invertebrate Hox clusters was determined by other constraints.
Tables 1 and 2 and Figures 1–3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). DNA sequences obtained in the course of this study have been deposited in GenBank with the accession numbers noted in Table 2.
The work was supported by the Institut National de la Santé et de la Recherche Médicale (U547), the Institut Pasteur de Lille, the Centre National de la Recherche Scientifique, the Microbiology program of the Ministère de l'Education Nationale, de la Recherche et de la Technologie (MENRT), the Wellcome Trust Beowulf Programme, and the Japan Society for the Promotion of Science grant 13557021 (to H.H.). W.W. was supported by Inserm (Poste Vert). We thank Ricardo de Mendonça for his gift of the lambda gt10 S. mansoni cDNA library. We also thank the Sanger Institute (Hinxton, United Kingdom) and The Institute for Genomic Research (Rockville, Md.) for making available the draft S. mansoni genome sequences.
*Inserm U 547, Institut Pasteur de Lille, 1 rue du Professeur A. Calmette, 59019 Lille, France; †Primate Research Institute, Kyoto University, Inuyama, Aichi, Japan; ‡The Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom; §School of Biology, Institute for Research on Environment and Sustainability, Devonshire Building, University of Newcastle upon Tyne, United Kingdom; ∥Experimental Taxonomy Division, Department of Zoology, The Natural History Museum, London, United Kingdom; ¶Centro de Pesquisas Rene Rachou, Fiocruz, Belo Horizonte, Brazil; #Centre de Génétique Moléculaire, Centre National de la Recherche Scientifique, UPR 2167, 91190 Gif sur Yvette, France