Myosins constitute a superfamily of motor proteins that convert energy from ATP hydrolysis into mechanical movement along actin filaments. Phylogenetic analysis currently places myosins into 17 classes based on class-specific features of their conserved motor domain. Traditionally, the myosins have been divided into two groups depending on whether they form monomers or dimers. The conventional myosins of muscle and nonmuscle cells form class II. They are complex molecules of four light chains bound to two heavy chains that form bipolar filaments via interactions between their coiled-coil tails (type II). Class I myosins are smaller monomeric myosins referred to as unconventional myosins. At least 15 other classes of unconventional myosins are now known.
How many myosins are needed to ensure the proper development and function of eukaryotic organisms? Thus far, three types of myosins have been found in budding yeast, six in the nematode Caenorhabditis elegans, and at least 12 in humans. Here, we report on the identification and classification of Drosophila melanogaster myosins. Analysis of the Drosophila genome sequence identified 13 myosin genes. Phylogenetic analysis based on the sequence comparison of the myosin motor domains, as well as the presence of the class-specific domains, suggests that Drosophila myosins can be divided into nine major classes. Myosins belonging to previously described classes I, II, III, V, VI, and VII are present. Molecular and phylogenetic analysis indicates that the fruitfly genome contains at least five new myosins. Three of them fall into previously described myosin classes I, VII, and XV. Another myosin is a homolog of the mouse and human PDZ-containing myosins, forming the recently defined class XVIII myosins. PDZ domains are named after the postsynaptic density, disc-large, ZO-1 proteins in which they were first described. The fifth myosin shows a unique domain composition and a low homology to any of the existing classes. We propose that it be classified once similar myosins are identified in other species.
The past decade has seen a significant increase in research on myosins. A major effort has been put into finding novel members of this family of actin-based motor proteins. More than 16 classes of myosins have been discovered and characterized, and this number is still rising (Hodge and Cope 2000; Sellers 2000). These myosins are often referred to as unconventional (Mooseker and Cheney 1995). The total number of known myosin classes is 17 if the conventional two-headed filament-forming myosin II is included in the classification. Myosins have been identified in a wide variety of eukaryotic organisms. Some myosin classes are found in phylogenetically diverse organisms, whereas others, which have arisen later in evolution, have been found in only a single organism (Hodge and Cope 2000).
Current research concentrates on the functional analysis of these new types of myosins. A number of studies suggest that these motors play important roles in a variety of cellular functions, including organelle, RNA and protein transport, maintenance of the cell architecture, cell movements, and signal transduction (table 1).
All known myosins comprise an N-terminal head domain, a neck regulatory domain, and a specific carboxy-terminal tail domain (fig. 1) (Mooseker and Cheney 1995). The head or motor domain contains ATP- and actin-binding sites and is responsible for the mechanochemical properties of the protein (Gilbert and Mackey 2000). Myosins show an actin-stimulated Mg2+ ATPase activity, thus converting the energy stored in ATP into mechanical force (Volkmann and Hanein 2000). The latter is used to move the myosin molecules along the actin filaments or to translocate other molecules (Hasson and Mooseker 1995; Langford 1995).
The neck domain contains regulatory sites composed of IQ (isoleucine-glutamine) motifs, which are repeats of 23–30 aa (Mercer et al. 1991; Rhoads and Friedberg 1997). Each IQ motif provides a binding site for calmodulin or a related protein of the EF-hand family (Kawasaki, Nakayama, and Kretsinger 1998). EF-hand proteins have helix-loop-helix motifs in which the loop contains highly conserved residues that bind Ca2+ ions. The size of the neck domain varies from one to seven IQ tandem repeats. In addition, the neck is often the site of alternative splicing. This produces necks of variable length (variable number of IQ repeats), which are associated with the regulatory function. In general, calmodulin activates a diverse group of target cellular proteins when bound to Ca2+. Interestingly, most of the unconventional myosins carry IQ motifs that bind calmodulin with higher affinity in the absence of Ca2+.
After the neck domain, each myosin has a highly divergent tail domain. A subset of myosin tails has predicted coiled-coil α-helical domains, which promote the formation of dimers, a typical example being the two-headed conventional myosin II. Some other myosins lack coiled-coil domains but contain structural domains found in other proteins (table 1).
The classification of myosins is based on the sequence comparison of their core motor domains (myosin head), equivalent to amino acids 88–780 of chicken skeletal myosin II (Cope et al. 1996). The motor domain is highly conserved among all myosins, reflecting the high conservation of its function. However, motor domains have a number of class-specific features (characteristic inserts or substitutions), which might be important in defining the precise function of a given myosin. For further information see the Myosin home page at http://www.mrc-lmb.cam.ac.uk/myosin/myosin.html (Hodge and Cope 2000). Phylogenetic analysis of the tail domain sequences produces similar results, indicating that heads and tails have coevolved (Korn 2000).
Five myosin genes have been identified in yeast (Saccharomyces cerevisiae), falling into three classes: two class I myosins, one class II myosin, and two class V myosins (Brown 1997). It was suggested that the whole yeast genome had undergone a duplication in ancient times, followed by a number of modifications. As a result, a small fraction of the genes were retained in duplicate (most of them being deleted), thus explaining the loss of the second myosin II gene (Wolfe and Shields 1997). Saccharomyces cerevisiae is the organism with the lowest known number of myosin genes. This demonstrates that a eukaryote can function with a set of only three types of myosins.
Multicellular organisms express some 10–40 myosin genes encoding at least six types of myosins. It seems that multicellular organisms require a multitude of specialized myosins. This raises the question of how much functional redundancy exists between the classes and between the members of a given myosin class.
So far, 11 myosin genes have been identified in the slime mould Dictyostelium discoideum. Despite the fact that this is one of the simplest multicellular organisms, it expresses a diverse set of myosin genes (Soldati, Geissler, and Schwarz 1999). They encode at least six different classes of highly specific myosins. There are six class I myosins (MyoA, B, C, D, E, K, and probably MyoF) and one member each of class II (MhcA), class VII (MyoI) (Titus 1999), and class XI (MyoJ) (Hammer and Jung 1996). The highly divergent MyoM is still to be classified (Schwarz, Geissler, and Soldati 1999).
Fourteen myosin genes have been identified in the nematode Caenorhabditis elegans (Baker and Titus 1997). They encode two structurally distinct class I, six class II, one class V, two class VI, one class VII, and one class IX myosins. It was found that C. elegans has a highly divergent type of myosin, which is the founding and only member of class XII myosins.
The situation with vertebrates appears even more complex. They express some 40 myosin genes grouped into 12 classes. In humans, there are 8 class I, 16 class II, 2 class III, 3 class V, 1 class VI, 2 class VII, 2 class IX, 1 class X, 2 class XV, and 1 class XVI myosins (Hasson et al. 1996; Berg et al. 2000; Berg, Powell, and Cheney 2001). Recent studies have discovered two PDZ-containing myosins (Furusawa et al. 2000), as well as a novel unclassified myosin (Berg, Powell, and Cheney 2001). PDZ domains are named after the postsynaptic density, disc-large, ZO-1 proteins in which they were first described.
In Drosophila eight different myosin genes have been described thus far (figs. 1A, 2, and table 2). There are two class I myosins, members of subclass 3 (myosin IB) and subclass 4 (myosin IA) (Morgan et al. 1994; Mooseker and Cheney 1995; Morgan, Heinzelman, and Mooseker 1995). Only a single muscle myosin II gene was found in Drosophila (Hastings and Emerson 1991; Bernstein and Milligan 1997). It encodes more than 13 protein isoforms with complex temporal and spatial expression patterns (Bernstein and Milligan 1997; Zhang and Bernstein 2001). Drosophila has a second myosin II gene that encodes a cytoplasmic nonmuscle myosin (Kiehart and Feghali 1986; Mansfield et al. 1996). The founding member of class III myosins was discovered in Drosophila (Montell and Rubin 1988). The ninaC-III gene encodes two isoforms resulting from alternative RNA splicing. These differ in the composition of their C-terminal tails and show differential expression patterns (Porter et al. 1992; Li, Porter, and Montell 1998). A single myosin V gene, with at least two different splice forms, was identified in Drosophila (Bonafe and Sellers 1998; MacIver et al. 1998). Kellerman and Miller (1992) cloned a novel unconventional myosin from Drosophila, the first member of class VI myosins. The gene produces multiple protein isoforms, which are present throughout Drosophila development (Kellerman and Miller 1992; Mermall and Miller 1995; Deng, Leaper, and Bownes 1999; Hicks et al. 1999). Drosophila myosin VIIA was the first member of this class to be described (Cheney, Riley, and Mooseker 1993; Kiehart et al. 1998).
While this manuscript was in preparation, another paper dealing with Drosophila myosins was published (Yamashita, Sellers, and Anderson 2000). The data from the two research groups are complementary, with our manuscript focusing in detail on the molecular analysis and domain structure of the myosins.
Materials and Methods
Analysis of the Drosophila Genome—Analysis and Manipulation of Sequences
We used the completed D. melanogaster Genome Project sequence to determine the number of myosin-encoding genes in this species and to classify them. The new myosin genes were identified by comparing the Drosophila genome sequence with the conserved head (equivalent to amino acids 88–780) of the chicken skeletal myosin II. Comparison with the GenBank database was done using the BLASTP algorithm of the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST/), the Pôle Bio-Informatique Lyonnais server (http://pbil.univ-lyon1.fr/BLAST/blast.html), and the Berkeley Drosophila Genome Project BLAST (http://www.fruitfly.org/blast/). This search retrieved the eight previously described myosin genes as well as three new genes: Myo28B1 (AAF52536), Myo10A (AAF47980), and Mhcl (or Myo89B) (AAF55271). Subsequent detailed analysis was done using the conserved head or specific tails from all known classes of myosins to search the translated Drosophila genome sequence (TBLASTN on the NCBI server). These searches identified two other sequences highly similar to myosins: Myo29D (AAF52683, AF454348) and Myo95E (AAF56246, AF454350).
Domain Analysis—Multiple Sequence Alignment
The domain structure was predicted with the Simple Modular Architecture Research Tool (SMART) server at http://smart.embl-heidelberg.de/ (Schultz et al. 1998), the Pfam HMM database at http://pfam.wustl.edu/hmmsearch.shtml or http://www.sanger.ac.uk/Software/Pfam/search.shtml (The Sanger Centre), and the ProFile Scan Server of ISREC (Swiss Institute for Experimental Cancer Research) at http://www.isrec.isbsib.ch/software/PFSCAN_form.html. Alignments of the detected domains were performed with CLUSTAL W (Thompson, Higgins, and Gibson 1994), available from the Gene Jockey II software package distributed by Biosoft or from the web-based package at http://www2.ebi.ac.uk/clustalw/. Subsequently, the sequences were run on the web-based BoxShade Server (http://ludwig-sun1.unil.ch:8080/software/BOX_form.html) and manually adjusted in Microsoft Word 98.
RNA Extraction and RT-PCR
Total RNA from larvae and adult flies was isolated using the RNeasy Total RNA Kit (QIAGEN, Cat #: 74104). The RNA (4 μg) was primed with Oligo-p(dT)18 and reverse transcribed by Superscript™ II (GIBCO BRL), following the supplier's protocol. For subsequent DNA amplification, 2 μl out of 20 μl of the first-strand reaction mix was used. The PCRs were carried out with appropriate pairs of forward and reverse primers. The positions of the primers are indicated in figure 3.
Primers for Myo95E: EF1 (TGT TGC TCG CCA GCA CAT G), EF2 (ATG GAG CAG GAA ATC GGC AC), EF3 (CAT CAG CGG CCT TCC TGA AT), EF4 (GAA GTA CAT AAG CGC TGC CT), ER1 (AGG CAG CGC TTA TGT ACT TC), ER2 (ACA ATT ATC TCC ATG CGG TTC G), ER3 (ACG TAG ATG CCT GAA CTA TC).
Primers for Myo10A: AF1 (GCA GCA ATG AAT CAA CCG GT), AF2 (TGA TCT GGT CTG GTT CGA TC), AF3 (TCA GTG TCC AGA GGC ATG TG), AF4 (TGG AGT GGC GTG CCT GGA), AR1 (CAT CTT GTA CGG ATT CAC CG), AR2 (TCC GCA CGC GCA ACT TCC A), AR3 (CTT GCG GAA CTC CTG GAC A).
Primers for Mhcl: MF1 (CAA CTT TAT GAA GAA GAG CGC), MF2 (AAG GCA GCT AGT GAT CAG GC), MF3 (TCG CAT AGG ACC CAG CCA G), MF4 (ATG TGG TCG GAT AAA AGT GCA), MF5 (GCT CTC AGA TCG CAT TAT ACA G), MR1 (AGC TCG CAG ATG TCC TCG A), MR2 (ACA TGG AGA CAA CCT TCT CG), MR3 (TAT ACG GAC GCA GGC GAT AG), MR4 (TCT TCC AGA TCA CTG ATA GAG).
Primers for Myo29D: DF1 (ATC CGC ACA ACA TTC TGC AC), DF2 (ATG CAT CTT CAT CCA ACG AG), DF3 (AAT CAC AGC TTC AGC CAC AC), DF4 (GAG ACT GAT GCC TTC AAG CAC), DF5 (ACT TTG TGC GCT GCA TCC G), DR1 (GTC CCG ACA AGT GGA TCA G), DR2 (GTG CTT GAA GGC ATC AGT CTC), DR3 (AGG AAG AGT TGA ACA GAT GGA).
Primers for the genes located within the Mhcl gene: IF1 (TTA CCT CCA TAA ACC TGC GG), IF2 (GTC GCC GAG CCC GAA GAG), IF3 (AAC GTC GCG TTC GCA AGA GG), IR1 (AAC GAT TCG GAG GTG CAC G), IR2 (CTA GCT CTG CGA AGA TCT CA), IR3 (CGT TCA TGG CTG CTA GTA CG).
QIAGEN Taq and Stratagene Pfu Turbo® Polymerase in a 3:1 ratio were used in the PCRs. The QIAGEN PCR protocol designed to work with Q-Solution was followed. The PCRs were carried out as follows: one cycle at 94°C for 4 min; 35 cycles of 94°C for 40 s, 60°C for 40 s, and 72°C for 1.8 min per expected kilobase pair of the PCR product; and one cycle at 72°C for 10 min. The PCR products obtained were isolated from Tris-acetate/ethylenediaminetetra-acetate (TAE) gels, purified, and sequenced on a 373A automated DNA sequencer (ABI).
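The cycling conditions above scale the extension step with the expected product size. As a minimal sketch of that arithmetic, the following Python snippet builds the program for a given product length and sums the hold times. The function names and the 1.5-kb product length are hypothetical choices for illustration only; ramp times between temperatures are ignored.

```python
# Cycling parameters follow the protocol described in the text.
# The product length passed in below is a hypothetical example value.
EXTENSION_MIN_PER_KB = 1.8

def cycling_program(product_bp, cycles=35):
    """Return the program as (label, temperature in C, seconds, repeats)."""
    extension_s = round(EXTENSION_MIN_PER_KB * 60 * product_bp / 1000)
    return [
        ("initial denaturation", 94, 4 * 60, 1),
        ("denaturation", 94, 40, cycles),
        ("annealing", 60, 40, cycles),
        ("extension", 72, extension_s, cycles),
        ("final extension", 72, 10 * 60, 1),
    ]

def total_minutes(program):
    """Total hold time in minutes, ignoring ramping between steps."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60

program = cycling_program(product_bp=1500)  # hypothetical 1.5-kb product
for label, temp_c, seconds, repeats in program:
    print(f"{label:22s} {temp_c} C  {seconds:4d} s  x{repeats}")
print(f"total hold time: {total_minutes(program):.1f} min")
```

Because the extension step is repeated in every cycle, it dominates the run time for long products, which is worth keeping in mind when amplifying multi-kilobase myosin transcripts.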
The BLASTP and TBLASTN searches with the conserved chicken skeletal myosin II head against the completed Drosophila genome sequence retrieved the previously identified genes and five new myosin genes (fig. 1). We selected a limit of 30%–40% identity for a myosin to fall into a given myosin class, and 25%–30% was considered the lower threshold for a protein to be classified as a myosin. The new myosins (fig. 1B) were named according to their chromosome position (fig. 2). Interestingly, half of the myosin genes are located between polytene bands 27F–36A, thus forming a myosin hot spot on the left arm of the second chromosome.
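The identity limits above can be read as a simple two-level decision rule. The sketch below is one possible reading, not the procedure actually used: because the text gives ranges rather than single values, it assumes the lower bound of each range (30% and 25%) as the cutoff. The function name is hypothetical, and the example values are the best head-domain identities reported for the new myosins later in the text.

```python
CLASS_CUTOFF = 30.0   # assumption: lower bound of the 30%-40% range
MYOSIN_CUTOFF = 25.0  # assumption: lower bound of the 25%-30% range

def interpret_identity(percent_identity):
    """Map a best head-domain identity onto the two thresholds."""
    if percent_identity >= CLASS_CUTOFF:
        return "class member"
    if percent_identity >= MYOSIN_CUTOFF:
        return "myosin, class unresolved"
    return "not classified as a myosin"

# Best head-domain identities quoted in the Results for each new myosin.
best_hits = {"Myo95E": 33, "Myo28B": 61, "Myo10A": 47, "Mhcl": 41, "Myo29D": 29}
calls = {gene: interpret_identity(pid) for gene, pid in best_hits.items()}
for gene, call in calls.items():
    print(f"{gene}: {best_hits[gene]}% -> {call}")
```

Under these assumed cutoffs, Myo29D is the only new myosin that passes the myosin threshold without reaching the class threshold, which matches its treatment as an unclassified myosin.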
In the cases of Myo95E, Myo10A, Mhcl, and Myo29D, we did a detailed analysis of the molecular structure of the genes and the transcripts they produced. The main reason for this was that the predicted sequences for these genes encode proteins that produced low homology scores to other myosins. They were clearly myosins, containing all the conserved sequences and structural parts defining them as myosins, yet they showed no more than 15%–29% identity to other myosins. This implied either incorrectly predicted genes or incorrectly predicted splicing of the transcripts. Open reading frames (ORFs), 5′ untranslated regions (UTRs), 3′ UTRs, and the presence or absence of given motifs were tested by RT-PCR and subsequent sequencing of the products obtained. In these experiments we employed an ovarian Uni-ZAP XR® library produced in our lab, along with cDNAs produced by reverse transcription of RNAs from larvae and adult flies (see Materials and Methods). As a result, we determined a number of new myosin sequences and submitted them to the GenBank database. The accession numbers are as follows. Myo95E: AF454350, AF454351, and AF454352; Myo10A: AF454346 (represents a part of the first three exons including the 5′ UTR); Mhcl: AF454347 (represents a part of the first four exons including the sequence encoding the PDZ domain); Myo29D: AF454348 and AF454349. Myo28B was not subjected to detailed analysis because it was found to be almost identical to the other myosin VII (crinkled) from Drosophila at both the DNA and protein levels.
To examine the evolutionary relationships between members of the myosin family in Drosophila and other phylogenetically diverse species, we used two different phylogenetic methods. We applied Distance-matrix and Maximum-Parsimony methods (PROTDIST and PROTPARS from the PHYLIP package) to compare the conserved head domains. These methods were chosen because they tend to outperform other methods (i.e., show lower variance), such as Maximum Likelihood, when dealing with large data sets. The two programs produced trees with similar topology (see the unrooted consensus tree in fig. 4). Multiple sequence alignments were performed with CLUSTAL W without corrections for gaps or multiple substitutions. Excluding the positions with gaps would have omitted a significant proportion of the data, a problem that occurs when large numbers of input sequences are dealt with. CLUSTAL W (GCG software package) is provided by the Human Genome Mapping Project Resource Centre, Cambridge, at http://www.hgmp.mrc.ac.uk/ (Thompson, Higgins, and Gibson 1994). The reliability of the tree structure was checked by bootstrapping (1,000 trials) and by reordering the alignments randomly (bootstrapping was performed with SEQBOOT from the PHYLIP package). The tests produced trees with similar branching order. A consensus tree was produced by the CONSENSE program of the PHYLIP package, drawn with the TREEVIEW program (Page 1996), and then transferred to and manipulated with PowerPoint.
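For readers unfamiliar with bootstrapping an alignment, the idea is to resample alignment columns with replacement and rebuild a tree from each resampled data set. The Python sketch below illustrates only that resampling step. It is a simplified illustration, not the SEQBOOT implementation, and the three aligned fragments are hypothetical toy data rather than sequences from the myosin data set.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy aligned fragments of equal length (hypothetical, for illustration only).
toy = {
    "seqA": "GESGAGKTEN",
    "seqB": "GESGSGKTET",
    "seqC": "GESGAGKSEA",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(replicates[0])
```

Each replicate keeps the columns intact across all sequences, so the site patterns are preserved while their frequencies vary. A tree would then be inferred from every replicate and the trees summarized as a consensus.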
The protein sequences for the new Drosophila myosins are theoretical predictions. Myosins are large multiexon genes and are difficult to assemble with 100% accuracy from sequence data. There are also various isoforms of some myosins, which can lead to some misalignments; hence, it is unlikely that the tree shown perfectly reflects the evolution of the Drosophila myosins.
It is possible that some of the new Drosophila myosins could be pseudogenes. However, we found that the probability of this was low. Pseudogenes generally lack introns and are not transcribed into mRNA. We identified expressed sequence tags (ESTs) for all the myosin genes we predicted (the accession numbers for these are given in table 2), which confirmed their in vivo expression. The new myosins are described in detail below.
Myo95E (Myosin IC)
This myosin was not found during the initial searches of the fly genome. It was subsequently identified in the AE003746 genome scaffold. The predicted sequence for this gene is unusually short, resulting in only a 59-aa protein (accession number AAF56246). Detailed analysis showed that the gene was not predicted correctly by the Genome Project. The sequence for Myo95E was manually assembled, taking into account the reported sequence for this gene along with the identified ESTs (table 2), as well as the homology of the translated genomic DNA to other myosins (preserving the exon-intron spacing). To test our theoretical predictions, we used two splice-site prediction programs: the Neural Network at http://www.fruitfly.org/seq_tools/splice.html and the GENSCAN Server at MIT (http://genes.mit.edu/GENSCAN.html). Using the artificially assembled sequence, we were able to design primers, amplify a PCR product, and hence sequence the cDNA for Myo95E (see Materials and Methods). The resulting sequence differed from the predicted sequence, showing the presence of an unusually long exon 3 and variations in exon 4.
RT-PCR analysis showed that the gene produces at least three different transcripts expressed during oogenesis and the larval and adult stages (fig. 3). Two of the transcripts (I, AF454350; II, AF454351), which represent 5%–10% of the total amount of Myo95E transcripts, translate into two protein isoforms of 1,278 and 1,258 aa, respectively. Transcript I (5,285 bp) comprises all 16 exons of the gene. Transcript II (5,225 bp) lacks exon 12. The major third transcript (III, AF454352) has a longer exon 4 (a downstream extension of 16 bp, GTG CAC ATT ACC CAT T). This shifts the ORF and produces a stop codon TGA in exon 5. The third transcript (5,301 bp) encodes two putative truncated proteins. The first is a 464-aa sequence containing only the GESGAGKT conserved region from the P-loop of the myosin head domain. It does not contain the Switch-1 (NxxSSR) and Switch-2 (DxxGxE) regions, which together with the P-loop have been implicated in the hydrolysis of ATP. The second is an 864-aa protein containing a large part of the head domain, two IQ motifs, and a myosin tail domain. The head of this myosin form contains only the Switch-2 (DFYGFE) conserved sequence, which prevents it from hydrolyzing ATP, probably rendering the form inactive.
Homology searches with the protein sequence for the head domain showed 33% identity (53% similarity) to vertebrate brush border class I myosins. Three ESTs were found for Myo95E. A search with them revealed up to 37% identity to vertebrate brush border myosins and less to other myosin classes. Analysis of Myosin 95E with domain-scanning programs confirmed its structural similarity to class I myosins (fig. 1B). It was found to contain two IQ domains (928–974 aa), the second being poorly conserved (fig. 5), and a Basic Tail domain (974–1,278 aa for isoform I and 974–1,258 aa for isoform II) (fig. 1B and table 1). The latter is thought to be involved in membrane binding. Recent studies have shown that it can also bind to actin filaments (Lee et al. 1999; Liu, Brzeska, and Korn 2000). This brings the number of class I myosins in Drosophila to three; hence Myosin 95E was renamed Myosin IC. The Drosophila Myosin IC does differ from other myosins of class I. The unusually long exon 3 results in a 281-aa insertion into the head domain. This insertion contains a partial AAA domain, a conserved region that contains an ATP-binding site. So far, no other class I myosins have been identified that contain such an insertion.
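The effect of the Myo95E splice variants on the reading frame follows from simple arithmetic on the transcript lengths given above: a length change that is a multiple of 3 keeps the frame, and any other change shifts it. The following Python sketch checks this for transcripts II and III relative to transcript I. The function name is hypothetical, and the check assumes that the length differences lie entirely within the coding region, which the text implies but does not state outright.

```python
def frame_effect(length_change_bp):
    """Classify a length change inside a coding region.

    Returns ("in-frame", residues gained or lost) when the change is a
    multiple of 3, otherwise ("frameshift", None).
    """
    if length_change_bp % 3 == 0:
        return "in-frame", abs(length_change_bp) // 3
    return "frameshift", None

transcript_bp = {"I": 5285, "II": 5225, "III": 5301}  # lengths from the text
extension = "GTGCACATTACCCATT"  # exon 4 extension quoted in the text

delta_iii = transcript_bp["III"] - transcript_bp["I"]  # exon 4 extension
delta_ii = transcript_bp["II"] - transcript_bp["I"]    # loss of exon 12

print("III vs I:", delta_iii, "bp ->", frame_effect(delta_iii))
print("II vs I:", delta_ii, "bp ->", frame_effect(delta_ii))
```

The 16-bp extension is not divisible by 3, consistent with the shifted ORF and premature stop codon described for transcript III. The 60-bp difference for transcript II corresponds to 20 codons, which agrees with the 20-aa difference between the 1,278-aa and 1,258-aa isoforms.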
Myo28B (Myosin VIIB)
The amino acid sequence of the Myosin 28B head showed 61% identity (74% similarity) to ck, the Drosophila myosin VIIA. It also exhibited a very high identity of 58% (72% similarity) to Myosin VIIA from zebrafish and to other class VII myosins. Analysis of Myosin 28B revealed that it has four IQ motifs (753–845 aa), the third being poorly conserved (figs. 1B and 5). Two Myosin Tail Homology 4 (MyTH4) domains (1,070–1,246 and 1,681–1,826 aa), two FERM domains (1,246–1,454 and 1,826–2,039 aa), and one SH3 (Src homology 3) domain (1,561–1,626 aa) were identified (fig. 6A and B). The function of the MyTH4 domains is unknown. The FERM domain (the name stands for band 4.1, ezrin, radixin, moesin homology) is believed to be involved in linking cytoskeletal proteins to the membrane as well as in dimerization. Talin, merlin, and filopodin are other major members of the FERM superfamily. Talin deserves particular mention because the FERM domains of Myosin 28B exhibit the highest homology to the FERM domain of talins (fig. 7) and less homology to FERM domains from other members of the FERM family. The SH3 domain has been identified in many proteins involved in signal transduction. It is believed that SH3 domains mediate protein-protein interactions by binding to proline-rich domains. Other myosins, such as IV, X, and XV, also contain this motif. A short coiled-coil domain (849–908 aa) was predicted by the Paircoil program (fig. 8).
Myo10A (Myosin XV)
RT-PCR analysis was used to verify the exon composition and the exon lengths of Myo10A transcripts. There are at least two transcripts, which are expressed from the larval stage onward (fig. 3). The longer transcript (7,462 bp) consists of all five previously known exons and represents no more than 5% of the total amount of mRNA for this myosin. The shorter transcript (7,189 bp) lacks exon 2 and is expressed abundantly. The two transcripts translate into two protein isoforms of 2,424 and 2,333 aa, respectively. A BLAST search with the conserved head domain showed a significant identity of 47% (64% similarity) to mouse and human myosin XV. Myo10A is also related to human and mouse myosin VIIA with an identity of 42% (59% similarity). A specific N-terminal domain was identified in Myosin 10A (1–149 aa). It showed no similarity to the characteristic N-terminal domain found in other class XV myosins. The shorter protein isoform lacks this N-terminal domain. In the neck region three IQ domains (841–910 aa) were identified (fig. 5). Immediately after the IQ motifs there is a short coiled-coil region (919–946 aa) (fig. 8). Analysis of the tail revealed the presence of one MyTH4 domain (1,014–1,173 aa) (fig. 6), a glutamine-proline-alanine (QPA)-rich domain, a proline-rich domain (the borders of the QPA-rich and proline-rich domains were not clearly defined), a short transmembrane motif (2,194 aa) (fig. 1B), and a FERM domain (2,220–2,424 aa). The latter showed a very high identity of 40% (59% similarity) to the first and 13% identity (30% similarity) to the second FERM domain from mouse myosin XV and a limited similarity of 38% (17% identity) to Talin itself (fig. 7).
Mhcl (Myosin XVIII)
A partial sequence of this gene was submitted to the GenBank database by Biru (1999). It shows high similarity to the class II myosin heavy chains (Mhc) and was termed Myosin heavy chain–like. Subsequently, the full-length sequence for this myosin was published by Celera (Adams et al. 2000). The RT-PCR analysis we have undertaken shows that Mhcl has a very complex structure and expresses multiple transcripts throughout the Drosophila life cycle (fig. 3). Data from the available ESTs for Mhcl, PCR analysis, and a promoter-predicting program (http://www.fruitfly.org/seq_tools/promoter.html) suggest that the seven identified transcripts are transcribed from four putative promoters. Transcripts I (6,603 bp) and III (6,909 bp) are highly expressed. The other five identified transcripts are expressed at comparatively lower levels. Transcripts I and III are composed of 18 exons (the gene comprises 19 exons), lacking exon 15. The difference between them is that transcript III has the second intron spliced out, which introduces a stop codon in the ORF. One unusual feature of Mhcl is that it contains three other genes within it (fig. 3). The exons of these genes are not found in combination with any of the exons of the myosin gene. Importantly, there are ESTs for each of the three genes. One of the genes has been previously identified as phosphatidylserine-specific phospholipase A1 (CG4979). It is located in intron 9, and its reading frame is in the opposite direction to that of the Mhcl reading frame (EST: GH15759). The other two genes have ORFs in the same direction as the main (Mhcl) gene. The gene in intron 10 (ESTs: RE41368, RE44374) is a novel gene and does not show any significant homology to previously characterized genes. The gene in intron 12 (ESTs: LP08646, LP05315) shows a limited identity of 35% (46% similarity) to bovine synaptojanin 1 protein (synaptic inositol-1,4,5-trisphosphate 5-phosphatase 1).
The Mhcl transcripts translate into a set of protein isoforms—I: 2,200 aa; II: 2,139 aa; III: 439 aa; IV: 899 aa; V: 730 aa; VI: 512 aa; and VII: 479 aa. Only the isoforms produced by transcripts I and II contain the conserved motor domain. BLAST analysis showed that Myosin heavy chain–like is most closely related to mouse PDZ-Myosin (41% identity) and human PDZ-Myosin (40% identity). It showed a limited identity of 28%–29% to smooth muscle and nonmuscle myosins and less than 25% to skeletal muscle myosin. BLAST searches revealed that the tail domain shows the highest similarity, 32%, to PDZ-containing myosins and less than 25% similarity to other types of myosins. Interestingly, this myosin was not predicted to contain a PDZ domain. We used the DNA sequence encoding the mouse PDZ domain to search the genomic sequence (AE003711) surrounding Mhcl (approximately 10 kb in each direction). This detected a PDZ domain 4,810 bp upstream of the predicted start for the Mhcl mRNA (the domain was found in the borders of the predicted genomic DNA for this gene but has not been previously included in its ORF). The presence of the PDZ domain was verified by RT-PCR, and the sequence was submitted to the NCBI database (AF454347). A sequence alignment of the PDZ domains from Drosophila, mouse, and human is given in figure 6 . Interestingly, transcript III translates into a truncated protein containing only this PDZ domain (349–429 aa). PDZ domains are known to bind C-terminal or internal (non–C-terminal) polypeptides. Two IQ domains were found in Mhcl (1,379–1,428 aa) (figs. 1B and 5), although the second differed slightly from the consensus sequence. The Paircoil program predicted two coiled-coil domains at 1,439–1,549 and 1,616–2,121 aa (fig. 8 ).
Database searches suggested that the first three exons of Myo29D are probably part of a gene adjacent to the myosin. Subsequent RT-PCR showed that they splice together with the other three myosin exons to produce at least two different transcripts. Transcript I (4,402 bp) contains all six exons, whereas transcript II (2,530 bp) lacks exon 5. Transcripts I and II are present during all stages of Drosophila development, with transcript I being expressed at the higher level. Transcript II produces a truncated form of the protein, lacking most of the motor head domain. Analysis of the conserved head domain revealed that Myosin 29D is not similar to any of the known classes of myosins. It showed 29% identity (45% similarity) to myosin VII, X, and V from different species, which is sufficient for it to be considered a myosin but not enough to assign it to a given class. A search with the available ESTs for this myosin showed 36%–39% identity to myosin V from different species and 31%–36% to vertebrate myosin heavy chains (myosin II). Motif search programs detected in Myosin 29D a specific N-terminal extension (1–338 aa), two IQ motifs (1,089–1,136 aa), the second being poorly conserved, and a short transmembrane motif in the tail (1,144–1,279 aa) (fig. 1B).
The detailed search of the translated Drosophila genome sequence produced several low-score hits. Close analysis of these (searches up and downstream of the respective sequences) revealed that they are not true myosins. Despite exhibiting certain similarities to given parts of the myosins, none of them showed high homology to a larger part of the head domain. AE003112 showed similarity to the highly conserved region (GESGAGKT) from the P-loop of the head domain. Its polytene chromosome location remains undetermined. AE003614 (CG11199) showed homology to the myosin tail, containing coiled-coil and FERM domains. It is located on the chromosome arm 2L (27E), next to the ninaC gene. AE003495 (CG12047) also showed similarity to the myosin tail, containing coiled-coil and FERM domains. This is located on the X chromosome (12E1), close to the newly identified Myo10A.
The BLAST search retrieved one more myosin-like gene, CG15831, in the AE002795 genome scaffold. This is a single-exon gene, 219 bp long. The chromosome position of AE002795 has not been determined previously. BLAST reported similarity to a highly conserved part of the myosin head domain (LGVLDIFGFENFSHNSFEQLCINYTNEKLHKFFNH). We found that the DNA surrounding the gene shows no similarity to the myosin genes (at the DNA or protein level). This suggests either that AE002795 is incorrectly assembled in the genome or that CG15831 is a partial duplication of a myosin gene (possibly a part of Mhc [muscle myosin II] or didum [Myosin V], which produced the highest scores for this sequence, 59% and 54% identity, respectively).
The myosin family has grown significantly in the past decade and now encompasses more than 177 myosins. With a few exceptions, these myosins fall into 18 classes. Myosins are expressed in both prokaryotes (though these have not been well studied) and eukaryotes.
The annotation of the genome sequences for S. cerevisiae, Dictyostelium discoideum, Arabidopsis thaliana, C. elegans, Drosophila melanogaster, and Homo sapiens made it possible to identify the complete sets of myosin genes in these organisms. It appears that all eukaryotes have an essential set of three myosin genes, from classes I, II, and V, as well as a number of species-specific myosins. In Drosophila, 13 different myosin genes were identified. The fruitfly has three class I, two class II (one encoding muscle and one nonmuscle myosin), one class III, one class V, one class VI, two class VII, one class XV, and one class XVIII myosin genes, and one myosin gene that is yet to be classified. These classes of myosins have been found in a wide range of invertebrate and vertebrate animals. They have been shown to have roles in a variety of cell functions, including membrane trafficking, signal transduction, and maintenance of the cell architecture. Several new myosin genes were identified in the fruitfly. The genome data provided cDNA sequences for four of them (Myo28B, Myo10A, Mhcl, and Myo29D). The sequences and the ORFs for these myosins were verified by RT-PCR and sequencing. In addition, a fifth myosin was found (Myo95E). Using the fruitfly genomic sequence, we were able to predict and subsequently isolate and sequence this gene.
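The per-class gene counts listed above can be checked against the stated total with a short tally. The sketch below simply encodes the counts given in the text. The dictionary and function names are illustrative, and the key "unclassified" is a label chosen here for the myosin that has not yet been assigned a class.

```python
from typing import Dict

# Number of Drosophila myosin genes per class, as listed in the text.
MYOSIN_GENES_PER_CLASS: Dict[str, int] = {
    "I": 3,
    "II": 2,      # one muscle, one nonmuscle
    "III": 1,
    "V": 1,
    "VI": 1,
    "VII": 2,
    "XV": 1,
    "XVIII": 1,
    "unclassified": 1,
}


def total_genes(counts: Dict[str, int]) -> int:
    """Sum the gene counts over all groups."""
    return sum(counts.values())


def named_classes(counts: Dict[str, int]) -> int:
    """Count the groups that correspond to an established class."""
    return sum(1 for name in counts if name != "unclassified")


print(total_genes(MYOSIN_GENES_PER_CLASS), "genes in",
      named_classes(MYOSIN_GENES_PER_CLASS), "established classes plus",
      MYOSIN_GENES_PER_CLASS["unclassified"], "unclassified gene")
```

The tally agrees with the 13 genes reported: eight established classes plus one unclassified myosin, which together correspond to the nine major groups described in the introduction.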
Only two of the five newly identified genes in Drosophila fell directly into previously known classes: Myosin VIIB (Myo28B) and Myosin heavy chain–like (Mhcl). Myosin VIIB belongs to the well-studied class VII myosins, which have roles in membrane trafficking and stereocilia function.
Mhcl is a member of the recently defined class XVIII of myosins, which consists of only two other members, mouse and human PDZ-containing myosins. The PDZ domain, also called DHR (Dlg homologous region), is known to bind either C-terminal or internal (non–C-terminal) polypeptides. These domains have been identified in a broad range of signaling proteins from bacteria, yeast, plants, insects, and vertebrates. PDZ domains have been implicated in targeting signaling molecules to submembranous sites.
Myosin IC (Myo95E) is the third member of class I myosins in Drosophila. It contains the Basic Tail domain (TH1) specific to all class I myosins. TH1 is thought to bind to acidic phospholipids and actin filaments. Interestingly, this myosin contains an additional N-terminal insertion, which is similar to the AAA motif, a conserved region of about 220 aa that contains an ATP-binding site. This domain is inserted in the region of loop 1 of the motor domain, the other region associated with the hydrolysis site for ATP, and probably modulates its activity.
Myo10A is closely related to class XV myosins. This myosin has a short N-terminal extension, which differs from the N-terminal domain characteristic of vertebrate myosins of class XV. It also lacks the SH3 and the second MyTH4 domain found in other myosin XV tails. Instead, the Drosophila Myosin XV tail contains three additional motifs: a glycine-proline-glutamine–rich domain, a proline-rich domain, and a small transmembrane domain. Proline-rich sequences have been demonstrated to bind to SH3, a small 50-aa motif. SH3 domains have been identified in a wide variety of intracellular and membrane-associated proteins and are implicated in signal transduction, linking signals transmitted from the cell surface by protein tyrosine kinases to effector proteins located downstream on the hierarchical pathways.
Myosin 29D is a highly divergent member of the myosin superfamily. Presently, it forms a class of its own. This myosin contains an unusual N-terminal extension, which shows no homology to other proteins. It also has a small transmembrane domain in its tail rich in proline residues.
Drosophila melanogaster expresses many myosin genes. Apart from the essential myosins (classes I, II, and V), it also has myosins from classes III, VI, VII, XV, and XVIII, as well as a novel type of myosin. These new data should help in designing experiments to investigate the roles of these newly identified myosins in the cell histology and development of Drosophila.
Antony Dean, Reviewing Editor
Keywords: unconventional myosins evolution genome project
Address for correspondence and reprints: Mary Bownes, Institute of Cell and Molecular Biology, University of Edinburgh, King's Buildings, Edinburgh EH9 3JR, United Kingdom. firstname.lastname@example.org
This is a joint sequence from the first 72 amino acids of T21544 and from the 11th amino acid to the end of AAA97925.
The sequence for the human PDZ-containing myosin is truncated. This does not seem to affect its evolutionary position in the phylogenetic tree.
We are grateful to Neville Cobbe for help with the phylogenetic analysis programs, to Gillian Millburn from FlyBase for comments on the manuscript, and to Sheila Milne for help with the preparation of the manuscript. We are grateful to the Darwin Trust, MRC, BBSRC, and B'nei B'rith Scholarship Committee for their support.