Sox proteins belong to the HMG box superfamily of DNA-binding proteins and are found throughout the animal kingdom. They are involved in the regulation of such diverse developmental processes as germ layer formation, organ development and cell type specification. Hence, deletion or mutation of Sox proteins often results in developmental defects and congenital disease in humans. Sox proteins perform their function in a complex interplay with other transcription factors in a manner highly dependent on cell type and promoter context. They exhibit remarkable crosstalk and functional redundancy with one another.
Sox proteins entered the scientific stage less than a decade ago with the identification of Sry as the long elusive testis-determining factor Tdy/TDF, located on the Y chromosomes of mouse and man (1,2). Sry was found to contain a domain with similarity to the DNA-binding domains of the abundant non-histone chromosomal proteins HMG-1 and HMG-2. This so-called HMG domain is present in a large number of proteins which all belong to the HMG box superfamily. There are few amino acid positions within this 70–80 amino acid long domain which are conserved throughout the HMG box superfamily (3). Thus HMG domains can be highly diverse. However, in subgroups of the superfamily, strong conservation of HMG box sequences is observed.
Such a subgroup is the Sox protein family. Proteins are grouped into this family if they contain an HMG domain with strong amino acid similarity (usually >50%) to the HMG domain of Sry, which is also known as the Sry box. It is this box that gave the Sox protein family its name.
DNA-Binding and the HMG Domain of Sox Proteins
The HMG domain mediates DNA-binding of Sox proteins. Like other HMG box-containing proteins, Sox proteins have the ability to recognize specific DNA structures in vitro such as four-way junction DNAs of variable sequence (6,7), (CA)n repeats that can adopt such structures (8) or 1,2-d(GpG) cisplatin adducts (9). It is unclear whether this capacity is important for Sox protein function in vivo.
Sox proteins also bind to specific DNA sequences. Among HMG box proteins, this ability for sequence-specific DNA recognition is unique to Sox proteins and the distantly related TCF/LEF family (3). The consensus motif for Sox proteins has been defined as the heptameric sequence 5′-(A/T)(A/T)CAA(A/T)G-3′ (10).
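The consensus heptamer lends itself to a simple pattern search. The following Python sketch scans both strands of a DNA sequence for exact matches to 5′-(A/T)(A/T)CAA(A/T)G-3′. The function names and the test sequences are illustrative assumptions, and an exact-match scan is only a rough proxy for real binding, which also depends on flanking sequence and protein partners.

```python
import re

# Consensus heptamer from the text: 5'-(A/T)(A/T)CAA(A/T)G-3'.
# The lookahead lets overlapping sites be reported as well.
SOX_CONSENSUS = re.compile(r"(?=([AT][AT]CAA[AT]G))")
SITE_LENGTH = 7

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sox_sites(seq):
    """Return sorted (start, strand, site) tuples for consensus matches.

    start is the 0-based position of the leftmost base of the site on the
    input (plus) strand; site is always written 5'->3' on its own strand.
    """
    seq = seq.upper()
    hits = []
    for m in SOX_CONSENSUS.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    for m in SOX_CONSENSUS.finditer(rc):
        # Convert the minus-strand coordinate back to plus-strand numbering.
        start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

Scanning both strands matters because a site read as CATTGTT on the given strand is the same heptamer, AACAATG, on the complementary strand.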
The structure of the Sry-type HMG domain has been solved both on and off DNA (11,12). Like other HMG domains, it consists of three α-helices (I–III) that are arranged in a twisted L-shape, with the antiparallel helices I and II forming the long and helix III and its associated N-terminal extension forming the short arm of the L-shape. The overall structure is maintained by a hydrophobic core. The amino acids that constitute this core are as highly conserved among Sox proteins as the ones which provide the base-specific DNA contacts. Whereas the overall conformation of the HMG domain remains unaltered upon DNA binding, a large conformational change is induced into target DNA such that its minor groove follows the concave binding surface of the HMG domain perfectly. As a consequence, DNA bound by Sry or other Sox proteins has an overall 70–85° bend, a widened minor groove and is helically unwound relative to classical B-DNA (6,13–16).
The characteristic minor groove binding activity of Sox proteins (17) differs from the DNA binding of most other transcription factors, which mainly target the major groove. Thus binding of Sox proteins to DNA in close proximity to other transcription factors is sterically feasible. This and the aforementioned ability to bend DNA have led to the hypothesis that Sox proteins might perform part of their function as architectural proteins by organizing local chromatin structure and assembling other DNA-bound transcription factors into biologically active, sterically defined multiprotein complexes (18,19).
Sox proteins might also recruit proteins into these complexes which by themselves do not bind to DNA, but are linked to signal transduction pathways. Such a mechanism is suggested by analogy to the related TCF/LEF proteins which bind β-catenin upon Wnt signaling and thereby translate these extracellular signals into changes of gene expression (for a review see 20,21). Involvement in the Wnt signaling pathway is, however, specific for TCF/LEF proteins, as the β-catenin interaction domain is not conserved between TCF/LEF and Sox proteins.
NMR analyses of the HMG domains of Sox proteins have also shown that domain structure as well as sequence-specific DNA binding and bending activities impose severe constraints, resulting in a very limited number of permissive amino acid choices at numerous positions. These constraints are largely responsible for the high degree of sequence conservation within the HMG domain of Sox proteins which in turn allowed the rapid isolation of partial sequences for many members of the Sox family by a degenerate PCR approach (2,22,23).
Nomenclature of Sox Proteins
Contrary to other transcription factor families, such as the POU family (24,25), the names of individual members of the Sox protein family conform to general rules. All Sox proteins except Sry carry the term Sox in their name, an additional suffix and sometimes a prefix. For human proteins, all three letters are capitalized; for other species, only the first. The suffix usually consists of a number, less frequently of a letter. Numbers are assigned to Sox proteins consecutively in the order of their identification, with the count being now at 24. If a prefix is present in the name of a Sox protein, it generally indicates the species from which it was isolated. The use of such a prefix is not very common for Sox proteins identified in mammals, but is often used for other vertebrates. Most Sox proteins so far have been cloned from man, rodents, chicken, Xenopus and rainbow trout. Therefore, the most common prefixes used are ‘x’ for Xenopus laevis, ‘c’ for chicken (Gallus gallus) and ‘rt’ for rainbow trout (Oncorhynchus mykiss).
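These naming rules are regular enough to be written down as a small parser. The Python sketch below is an illustration only: the function name, the returned dictionary layout and the decision to return None for non-conforming names are assumptions made here, and only the three prefixes listed above are resolved to a species.

```python
import re

# Species prefixes named in the text; any other prefix is left unresolved.
PREFIX_SPECIES = {
    "x": "Xenopus laevis",
    "c": "Gallus gallus",
    "rt": "Oncorhynchus mykiss",
}

# Optional lower-case prefix, then "SOX" (human) or "Sox" (other species),
# then a suffix that is either a number or one or more capital letters.
SOX_NAME = re.compile(
    r"^(?P<prefix>[a-z]*)(?P<stem>SOX|Sox)(?P<suffix>\d+|[A-Z]+)$"
)


def parse_sox_name(name):
    """Split a Sox protein name into prefix, suffix and inferred species.

    Returns None for names that do not follow the general rules
    (Sry is the exception mentioned in the text).
    """
    m = SOX_NAME.match(name)
    if m is None:
        return None
    prefix = m.group("prefix")
    if m.group("stem") == "SOX":
        species = "Homo sapiens"  # fully capitalized stem marks human proteins
    else:
        species = PREFIX_SPECIES.get(prefix)  # None if no or unknown prefix
    return {"prefix": prefix, "suffix": m.group("suffix"), "species": species}
```

For example, parse_sox_name("rtSox23") resolves the prefix to rainbow trout, whereas an unprefixed name such as "Sox10" yields no species, reflecting the fact that mammalian names usually carry no prefix.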
It should, however, be pointed out that Sox proteins have additionally been identified in a number of vertebrates and invertebrates, including various marsupials, reptiles, ascidians and Drosophila melanogaster. A survey of genomic sequences from the nematode Caenorhabditis elegans furthermore revealed the presence of at least four putative genes for Sox proteins (Fig. 1), although experimental evidence for their expression is presently lacking. This indicates that Sox proteins are widely distributed throughout the animal kingdom. I was, however, unable to detect a Sox family member among HMG domain-containing proteins encoded in the yeast genome.
Classification of Sox Proteins
All Sox proteins that have been identified so far (including the ones from Drosophila and C.elegans) can be placed into one of seven subgroups, A–G (Table 1). Members of the same subgroup usually share a >80% amino acid identity within the HMG domain; members of different classes fall below this mark. Additionally, vertebrate members of the same subgroup share regions of significant homology even outside the HMG domain (Fig. 2). In some subgroups, regions flanking the HMG domain are highly conserved. This is for instance the case in classes B, C and E. Whereas the conserved region within group B is located C-terminal to the HMG box and consists of a short amino acid sequence motif, conserved amino acids in classes C and E are N-terminal of the HMG domain and in the case of subgroup E span 36 amino acids. Other conserved regions in several subgroups comprise the transactivation domains, which in the case of classes C and E are located at the extreme C-terminus (17,26–31).
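The >80% identity criterion can be made concrete with a short calculation. The Python sketch below computes percent identity between two HMG domain sequences that are assumed to be already aligned, gap-free and of equal length, and applies the threshold quoted above. The function names are hypothetical, and the sequences in the usage note are made-up toy strings, not real HMG domains.

```python
# Threshold quoted in the text for membership in the same subgroup.
SUBGROUP_THRESHOLD = 80.0


def percent_identity(a, b):
    """Percent of identical residues between two pre-aligned sequences.

    Both sequences must be gap-free and of equal length (an assumption
    of this sketch; real comparisons need a proper alignment first).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def likely_same_subgroup(a, b, threshold=SUBGROUP_THRESHOLD):
    """Apply the >80% HMG domain identity rule of thumb."""
    return percent_identity(a, b) > threshold
```

With the toy strings "ACDEFGHIKL" and "ACDEFGHIKV", nine of ten positions match, so the pair would pass the threshold. Since members of the same subgroup only "usually" exceed 80% identity, the result is a first-pass indication rather than a definitive assignment.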
Members of group D contain a leucine zipper motif in addition to an HMG domain. The distance between HMG domain and leucine zipper is ∼250 amino acids. It has been shown for Sox6/SoxLZ, rtSox23 and the long variant of Sox5 (L-Sox5) that the leucine zipper is capable of mediating homodimerization (32–34). For Sox6/SoxLZ and L-Sox5, heterodimerization has also been demonstrated (34). Unlike the leucine zipper of classic bZip transcription factors, the leucine zipper of group D Sox proteins is not adjacent to a basic region. Rather, the leucine zipper is always followed by a glutamine-rich region with which it forms a contiguous coiled-coil domain. As a consequence, dimerization does not lead to the creation of a new DNA-binding interface as is the case for bZip proteins. Rather, the influence of the leucine zipper on DNA binding seems to be indirect as leucine zipper-mediated dimerization strongly reduced the affinity of the dimer for single Sox consensus binding sites (32,33), while simultaneously increasing the ability to recognize two adjacent Sox binding sites (34). However, the leucine zipper does not only allow homodimerization or heterodimerization between group D Sox proteins, it also mediates interactions with other proteins (32). It is very tempting to speculate that not only the leucine zipper of group D Sox proteins, but also those regions found to be conserved in other classes of Sox proteins, exert their function by mediating interactions with other proteins.
Finally, members of the same group of Sox proteins tend to have similar genomic organizations in mammals. Mammalian genes for Sox proteins belonging to groups B and C are intronless, at least with respect to their coding region. As an aside, it deserves to be mentioned that this intronless character is not preserved in C.elegans Sox proteins belonging to these groups (Fig. 1). Open reading frames F40E10.2 and K08A8.2 for group B Sox proteins contain 2 and 4 introns, respectively, whereas 13 introns are found in the group C open reading frame C32E12.3.
In contrast to groups B and C, members of groups D–G possess an exon–intron structure in mammals (Fig. 3). The exact genomic structure has only been determined for a handful of mammalian Sox proteins. Within the open reading frame of group E Sox genes, the position of introns is conserved (35–37; M.Wegner, unpublished). In group D gene Sox5 (38), group E genes Sox9 and Sox10 and group F gene Sox17 (39), the HMG domain is interrupted by an intron. Interestingly, even the C.elegans group D open reading frame W01C8.2 carries the intron within the HMG domain at exactly the same position as the mammalian group member Sox5, arguing for a conservation of intron positions within each subgroup. Furthermore, the group F gene Sox17 also carries its intron at the same position within the HMG domain as the group D gene Sox5. In group E genes, however, the intron has moved by four amino acids to the C-terminal side of the HMG domain. A look at the Sox open reading frames from C.elegans furthermore reveals that the ones belonging to groups B and C also contain introns within the HMG domain, sometimes up to three (see K08A8.2 in Fig. 1). These introns are found at positions other than the ones in the mammalian Sox proteins. Under the assumption that all Sox proteins are derived from a single ancestor, this finding indicates that at least some of the introns were not present in this common ancestor and were introduced later during the course of evolution. Sox proteins thus seem to be a good example in support of the ‘insertional theory of introns’.
Even those Sox genes that contain introns are fairly small, with sizes <20 kb. They are usually transcribed into messages of small to moderate size (Table 1). From the known chromosomal localizations of Sox genes it has to be concluded that they are interspersed throughout the mammalian genome and are not organized in clusters.
Partially Cloned Sox Proteins
As already mentioned, most Sox proteins were originally identified not as full-length proteins, but as PCR-derived partial sequences corresponding to their HMG domains. Because of the high degree of sequence conservation within this part of the protein, PCR products sometimes differ from each other by only a limited number of bases. The deduced amino acid sequences differ even less.
Differences could indicate that PCR products are derived from highly related, but distinct, genes. Alternatively, they could be attributed to species-specific differences if different cDNA sources were used, or simply to PCR errors. Thus, it is generally difficult to make statements about the nature of a particular Sox protein other than a preliminary classification, as long as it is only known from its HMG domain.
Some of the originally identified Sox proteins (2,22,23), which have not yet been determined outside the HMG domain and still await cloning, are: group B Sox14 (DDBJ/EMBL/GenBank accession no. Z18963), group C Sox12 and Sox19 (accession nos Z18961, U70442 and X98368), group E Sox8 (accession no. Z18957), group F Sox7 (accession no. X65660) and group G Sox16 (accession no. L29084). It should be noted, however, that proteins highly related to the group E factor Sox8 and to the group F factor Sox7 have already been identified in other species with SoxP1 from rainbow trout (40) and xSox7 from Xenopus (41). Whether these proteins are indeed true orthologs will only become clear after identification of full-length clones for the mammalian factor.
Another problem arising from the original cloning by PCR is that identification was so rapid that the same name was given to PCR fragments of different identity. This has been a source of confusion.
Sox8 was originally assigned to a group E PCR fragment from mouse (DDBJ/EMBL/GenBank accession no. Z18957) (23) and a group F fragment from man (accession no. X65664) (22). Given the fact that the mouse homolog of the group F fragment has already been fully cloned as Sox18 (42), I suggest that the term Sox8 should exclusively be used for the group E protein, which is still to be cloned.
Sox9 has been used to describe a group E PCR fragment from mouse (accession no. Z18958) (23) and a group B fragment from man (accession no. X65665) (22). The group E protein is fully cloned in several species and has been extensively characterized both with regard to its expression and function (35,37,43). The group B protein, on the other hand, is still only known as a PCR fragment. Therefore, I propose to use Sox9 for the well-characterized group E protein. Once fully cloned, the human group B protein could be renamed.
Similarly, both a group E fragment from mouse (accession no. Z18959) (23) and a group B fragment from man (accession no. X65666) (22) were originally designated as Sox10. The group B fragment is highly similar, if not identical, to Sox3. The group E protein, on the other hand, has been recently identified in full by us and others and characterized as to its biological function (31,44,45). Thus, I suggest the term Sox10 to be used exclusively for the group E protein.
In the case of Sox12 the situation is even more complicated. The term has been used for a mouse group C (accession nos Z18961 and U70442) (23), a human group G (accession no. X73039) (46) and a Xenopus group D (accession no. D50552) (22) protein. The group D protein from Xenopus is the only one that has been fully cloned as xSox12 (47). The human group G fragment is identical to the recently cloned human Sox20 gene (48,49) and the mouse group C protein is still at large. Once cloned, it could either be given the name Sox12 and distinguished from xSox12 or, preferably, given a new name.
The same is true for Sox19 which was assigned to a group C PCR fragment from mouse (accession no. X98368) (49), but is also used to name the zebrafish group B Sox protein, zfSox19 (50). To avoid all complications, it would be most convenient to assign a new name to the group C factor once it has been cloned.
Finally, the term Sox21 has very recently been used to describe two fully cloned Sox proteins. One is identical to Sox10 (51), the other is a novel group B factor (52). Because one of the two is identical to Sox10, the term Sox21 should be exclusively used for the remaining group B factor.
Sox Gene Function
Between 15 and 20 different Sox proteins have already been identified in both mouse and man. If partially cloned Sox proteins and Sox proteins only known from other species are taken into account, it has to be assumed that the number of Sox proteins in any given vertebrate species will be >20. Given this high number, it is not surprising that most tissues and cell types express a Sox protein during at least one stage of their development (Table 2). Indeed, several studies have shown that a number of tissues (53–55) or cell types (27,34) express more than one Sox protein at certain times. Although it is conceivable that co-expressed Sox proteins perform different functions, especially when belonging to different subgroups and being distantly related, their recognition of similar, if not identical, DNA sequence motifs suggests that they might influence each other's activity or function redundantly. Functional redundancy is a recurring theme with Sox proteins. It also poses problems in the interpretation of gene deletion experiments or dominant-negative interference strategies. Nevertheless, these techniques combined with extensive analyses of developmental and adult expression patterns have helped in recent years to highlight some of the important biological roles of Sox proteins.
Sox proteins and sex determination
Sex determination is historically the classical domain of Sox protein function. Sry, the prototype of the Sox protein family, is located on the Y chromosome and is a decisive factor for male sex determination in mammals (1,2). Sry is expressed for a short period of time (10.5–12 days post-coitum in the mouse) in certain somatic cells of the genital ridge, triggering their differentiation into Sertoli cells and thereby initiating testis differentiation from the indifferent gonad (56–58). When Sry is expressed as a transgene, mice develop as males even with an XX karyotype (59). When Sry is deleted from the Y chromosome, chromosomally male mice adopt a female phenotype (2,60). Analogous observations have been made in human sex-reversed patients (see for example 61,62; for a full list of allelic variants see MIM entry 480000).
How Sry performs its function on the molecular level still remains a puzzle. Functional interaction with other proteins involved in sex determination such as Dax1 and SF1 is probably an important factor (63,64). Most of the SRY missense mutations identified in human patients localize to the HMG domain (65). Severe ones are usually de novo, those with moderate effects can be transmitted to male progeny. A significant number of these mutations interfere with the DNA-binding ability of the HMG domain or alter the bending characteristics (14). Other functions of the HMG domain might also be affected. Thus, it has been shown that the HMG domain of SRY and other Sox proteins contains the protein's nuclear localization signal (66,67). In analogy to other transcription factors, it might also be expected that the HMG domain engages in functionally important protein-protein interactions.
Outside the DNA-binding domain, Sry is highly divergent between species, which is otherwise uncommon for Sox proteins. In fact, Sry has been evolving so rapidly that significant differences can even be detected between various mouse strains (68). A transactivation domain has been mapped to the C-terminus of mouse Sry, which is not conserved (either on a structural or functional level) in the human protein (69). Conversely, human SRY interacts with SIP-1, a novel PDZ domain protein, via its C-terminal seven amino acids, which are not conserved in the mouse protein (70). Human SRY has recently been shown to be phosphorylated by protein kinase A or a kinase of similar substrate specificity at a serine residue in the N-terminal part and as a consequence to be modulated in its DNA-binding activity (71). Again, this serine residue is absent from mouse Sry. This lack of conservation outside the HMG domain might indicate that all important functions of Sry are mediated by the HMG domain. Alternatively, sequences outside the HMG domain might have to perform different functions in different species and therefore might have been subject to directional selection with species-specific adaptive divergence.
Sry must be a relatively recent addition to the mechanism of sex determination as it does not seem to exist in vertebrates other than mammals. Given the fact that Sry most closely resembles the group B protein Sox3, which is localized on the X chromosome, it has been proposed that Sry evolved from Sox3 (72). Although the developing central nervous system is the major site of Sox3 expression in mammals (55), Sox3 is also expressed in somatic cells of the early urogenital ridge, lending further support to a Sox3-derived origin of Sry. Unlike Sry, however, Sox3 is not only expressed in the male, but also in the female genital ridge.
Sry and its putative ancestor Sox3 are not the only Sox proteins present very early in the genital ridge. Another Sox protein is the group E protein Sox9. Like Sox3, Sox9 is not exclusively expressed during development in the genital ridge, but is also expressed in a number of other places, most prominently at sites of chondrogenesis (43; below).
Sox9 is initially expressed in both male and female genital ridges up until the onset of Sry expression. Following this event, Sox9 expression becomes restricted to the male gonad, where its expression follows differentiation of Sertoli cells so closely that it has been proposed to function as a critical Sertoli cell differentiation factor (73,74). Such a function is also supported by the fact that ∼75% of karyotypically male patients with heterozygous SOX9 mutations develop as intersexes or XY females (35,37). Thus, Sox9 is expressed in the same somatic cells that express Sry. However, whereas Sry expression is transient, Sox9 expression is maintained throughout further testis development. These and other observations have led to the assumption that, although originally turned on in the genital ridge in a Sry-independent manner, Sox9 must be under the control of Sry in mammals at the time of sex-determination. Contrary to Sry, Sox9 is an ancestral component of the vertebrate sex-determination pathway as is evident from its male-specific expression during the sex determination period in chicken genital ridges (73,74). In the chick embryonic gonad, Sox9 expression starts ∼1 day after expression of the anti-Müllerian hormone (AMH) gene (75). Therefore, Sox9 does not seem to be essential for AMH gene induction. Nevertheless, Sox9 might be involved in the maintenance of AMH gene expression, as the AMH promoter contains Sox9 binding sites (75) and is activated in tissue culture by a combination of Sox9 and SF-1 (76).
Whereas Sry, Sox3 and Sox9 are all restricted to somatic cells of the gonad, there is a second set of Sox proteins expressed in germ cells. The group F protein Sox17, for instance, is found in pre-meiotic spermatogonia (39). In contrast, group D proteins Sox5 and probably Sox6 are both restricted in adult mice to post-meiotic germ cells, with highest levels in round spermatids (77,78). The occurrence of Sox proteins in the female germline is less well analyzed. Sox2 transcripts have been found in oocytes (55). Maternal transcripts were also detected for XLS13A and XLS13B, the Xenopus orthologues to mammalian group C protein Sox11 (79), rtSox24 (26) and the zebrafish group B protein zfSox19 (80).
Sox proteins and early embryogenesis
An important role for Sox proteins during the initial stages of ontogeny can so far only be inferred from maternal expression of some family members. Evidence exists, however, for an involvement of Sox proteins in the following early stages. The best studied case in mammals is the group B protein Sox2. This Sox protein is transiently expressed in the inner cell mass and the epiblast of mouse blastocysts, before it appears later again in the forming neuroepithelium (55,81). Mouse embryos homozygous for a targeted deletion of Sox2 die around implantation (82). Sox2 expression during early embryogenesis exhibits a pattern broadly overlapping with that of the POU transcription factor Oct-3/4 (83). Both factors have been found to cooperate in the transcriptional regulation of genes required during early embryogenesis, such as FGF4 and osteopontin (81,83,84). Early embryonic expression of the FGF4 gene is conferred by a strong enhancer in exon 3 with multiple transcription factor binding sites. One of these sites is a recognition element for POU proteins, another a binding site for Sox proteins. The two sites are separated by 3 bp. The function of this enhancer is critically dependent on the synergistic interaction between Sox2 and Oct-3/4. Both factors have to be bound to their respective sites (81) and, additionally, engage in direct protein-protein interactions, thereby forming a ternary complex with exact stereospecific requirements (84). Because of these requirements, not every pair of Sox and POU proteins will function equally well together. Thus, it has been observed in tissue culture experiments that the group E protein Sox10 functions best with Tst-1/Oct6/SCIP and the group C protein Sox11 with Brn-1 or Brn-2 (27,44). This argues for the existence of a specific partner code between members of both families.
Early embryonic expression of osteopontin is driven by an enhancer in the first intron of the gene. As in the FGF4 enhancer, binding sites for both Sox2 and Oct-3/4 are present, although the exact configuration of both sites is different (83). As a consequence, the effects of Oct-3/4 and Sox2 on osteopontin gene expression strongly differ from their effects on FGF4 transcription. Binding of Oct-3/4 to the enhancer is alone sufficient for activation, whereas binding of Sox2 to its site causes gene repression. Thus, the same set of transcription factors can be used simultaneously for gene activation and repression depending on the gene-specific context. The context-dependent, combinatorial function of Sox2 and Oct-3/4 is probably an important principle of transcription factor function during early development.
On a sequence level, Sox2 is closely related to the Drosophila Sox protein Sox70D/fish-hook/Dichaete with a 42% overall identity on the amino acid level (85,86). Sequence similarity is paralleled by a conservation of biochemical properties such that ectopic expression of the mouse Sox2 rescued phenotypic aspects of Dichaete mutations in transgenic flies (87). There are also resemblances between the expression patterns of Sox2 and Sox70D/fish-hook/Dichaete. Sox70D/fish-hook/Dichaete is expressed in the early Drosophila embryo, before taking on a second role later during neurogenesis (85,86). Expression initiates in the syncytial blastoderm in the form of a broad circumferential band corresponding to the entire trunk region. This band splits into two stripes and additional expression starts in the procephalic region. During cellularization of the blastoderm, Sox70D/fish-hook/Dichaete expression domains are further refined into seven irregular stripes. During and following gastrulation Sox70D/fish-hook/Dichaete is involved in maintaining the proper expression of several pair-rule and segment polarity genes, with the effect on hairy, runt and even-skipped expression being direct (15,86). Because of this role, functional inactivation of Sox70D/fish-hook/Dichaete leads to severe segmentation defects, including loss or fusion of abdominal denticle belts and to a defective organization of head structures. The observed segmentation defects are, however, extremely variable, indicating that Sox70D/fish-hook/Dichaete might not be absolutely required for pair-rule gene expression, but rather might have an accessory role, probably through alterations of chromatin structure and synergistic interactions with other transcription factors such as the POU proteins Pdm-1 and Pdm-2 in the case of even-skipped expression (15,86).
Similarly, interaction with the POU protein Cf1a/drifter/ventral veinless has been postulated to explain Sox70D/fish-hook/Dichaete function during development of midline glia in later phases of development (87).
A third example of how Sox proteins influence early developmental processes is given by Xsox17α and Xsox17β, two highly related Xenopus proteins with sequence similarity to the group F mouse protein Sox17 (88). Expression of both proteins is first detected in the late blastula of Xenopus, is restricted to the presumptive endoderm and remains endoderm-specific throughout gastrulation and neurulation. Xsox17β becomes undetectable during tailbud stages, whereas Xsox17α continues to be expressed in endoderm-derived tissues. Both proteins have been found to act downstream from the endoderm-inducing activin signal. Their expression is dependent on the homeodomain protein Mixer during gastrulation, but not during the initial phases (89). Xsox17α and Xsox17β themselves seem to be involved in the regulation of such genes as endodermin, HNF1β, cerberus and Xhlbox8, as their ectopic expression in animal caps activates these endodermal markers. Importantly, injection of Xsox17 proteins or dominant-negative fusions between them and the engrailed repressor domain into whole embryos leads to significant disruptions of endoderm development. These data provide evidence that Xsox17α and Xsox17β function as early regulators of normal endoderm development.
Sox proteins and neural development
At late Xenopus blastula stages, not only Xsox17 proteins start to be expressed, but also SoxD. Contrary to Xsox17 proteins, however, SoxD is found in the prospective ectoderm (90). During gastrulation, expression levels increase, but at the same time become restricted to the dorsally located, prospective neuroectoderm. Later on, SoxD is widely expressed throughout neural plate and tube. SoxD expression can be induced by chordin and is suppressed by BMP-4 signaling. The importance of SoxD during neural development became obvious from the observation that microinjected SoxD mRNA induces ectopic formation of neural tissue in the Xenopus embryo, whereas microinjection of a dominant-negative form of SoxD interferes with proper formation of anterior neural structures. Thus, SoxD is an essential mediator of major neuralization pathways in Xenopus.
In addition to SoxD, Xenopus Sox2 is also induced by chordin and is widely expressed in the prospective neuroectoderm at the beginning of gastrulation (91). Contrary to SoxD, however, Sox2 did not exhibit any neuralizing activity upon microinjection of its mRNA into Xenopus embryos. Rather, Sox2 made the ectoderm responsive to extracellular signals such as bFGF. In agreement with this, experiments in the chick also showed that there is no strict correlation between Sox2 expression and the competence of a region for neural induction (92), thus assigning Sox2 a role in neural development downstream from SoxD.
Sox2 is widely expressed in the early neural plate and early neural tube of several species including Xenopus, chicken and mouse and thus is one of the earliest pan-neural markers (53,55,91,93). It is assumed that Sox2 strengthens the neural cell fate at this period and might help cells to acquire further specification. At later stages, Sox2 expression in the developing central nervous system becomes restricted to the neuroepithelial cells of the ventricular layer, which still divide and exhibit an immature phenotype. Cells that leave the ventricular layer lose Sox2 expression. Thus, Sox2 and markers of terminally differentiating cells are expressed in a mutually exclusive manner. In agreement with this, Sox2 expression is no longer detectable in the fully differentiated central nervous system of the adult.
As already mentioned, early neural development marks the second major period of Sox2 expression after the preceding presence in the preimplantation embryo. Whereas occurrence in the preimplantation embryo is unique among Sox proteins, Sox2 expression in the developing nervous system is highly similar to and strongly overlapping with that of the structurally related group B proteins Sox1 and Sox3 (55,82,93).
In the case of Sox1, it was possible to recapitulate this expression pattern in tissue culture (82). Upon retinoic acid induction of aggregated P19 cells to neuroectodermal derivatives, endogenous Sox1 is transiently induced and becomes downregulated before markers for the differentiated neuronal or glial phenotype are turned on. Importantly, induction of ectopic Sox1 expression in aggregated P19 cells was able to substitute for retinoic acid, showing that Sox1 must be an early response to neural inducing signals.
From the overlapping expression pattern of group B Sox proteins, it has been concluded that these proteins might perform similar functions in the developing nervous system and might be functionally redundant. Such an assumption is supported by the recent targeted deletion of Sox1 in mice, which exhibit a surprisingly mild central nervous system phenotype characterized only by spontaneous seizures (94).
Sox1, Sox2 and Sox3 are, however, not the only examples of possible functional redundancy between Sox proteins. Group C proteins Sox4, Sox11 and Sox22 also show overlapping expression in the developing central and peripheral nervous systems (17,27,95–98).
Among the group C proteins, Sox11 has been best analyzed for its expression in the nervous system. In the central nervous system, it is co-expressed in the neuroepithelium with Sox1, Sox2 and Sox3 on a low level and transiently up-regulated in cells that leave the neuroepithelium. As these cells simultaneously downregulate Sox2 and Sox3, a model has been postulated in which neural differentiation is characterized by an ordered switch from one group of Sox proteins to the next (53). Another interesting aspect of Sox11 concerns its expression in a subset of differentiating brain regions such as the cortical plate and the inferior colliculus during late embryonic development. This might indicate that Sox11 is not only important for the development of early neural precursors, but might also be involved in the differentiation of distinct neuronal subpopulations (27). Outside the nervous system, Sox11 seems to be primarily expressed at places of epithelio-mesenchymal interactions including somites, branchial arches, developing face and limbs (96).
Sox proteins and lens development
Sox1, Sox2 and Sox3 are also co-expressed during lens development in chicken. After initial induction of Sox2 and Sox3 in the prospective lens area of the surface ectoderm, Sox1 expression follows with a slight delay in the lens placode upon its invagination shortly after the onset of δ-crystallin gene expression (99). All three Sox proteins stimulate δ-crystallin gene expression. In the case of the δ1-crystallin gene, activation is mediated by direct binding of Sox proteins to the DC5 enhancer which is localized in the third intron (100).
In the mouse, Sox expression during lens development exhibits several marked differences as compared with chicken. Sox3, for instance, is not expressed at all in the developing lens and Sox2 becomes strongly down-regulated the moment Sox1 is turned on around 12.5 days post-coitum. From this time onwards, Sox1 is by far the predominant Sox protein in the developing lens (99). As a consequence, deletion of both Sox1 alleles in mice leads to a failure of lens fibre cells to elongate, ultimately resulting in microphthalmia and cataract (94). Most of the γ-crystallin genes, which like the δ-crystallin genes of birds are under the control of Sox proteins, are not activated in Sox1-deficient mice. Those that are, probably get activated by Sox2, but are turned off at a later point, indicating that maintenance of their expression also requires Sox1 function.
Sox proteins and the neural crest
Another Sox protein with strong expression in the nervous system is the group E protein Sox10 (31,44,45,101). Sox10 is first very broadly expressed in cells of the neural crest at the time of their emergence. Sox10 expression continues in neural crest cells that contribute to the forming peripheral nervous system and can be detected in the sensory, sympathetic and enteric ganglia as well as along nerves in a manner typical for cells of the Schwann cell lineage (44). Additional Sox10 expression has been detected in melanoblasts (45). Whereas expression in melanoblasts and in the enteric nervous system is transient, expression in all other structures of the peripheral nervous system continues into adulthood and seems to be confined at later times to glial cells (44). Differing from the peripheral nervous system, Sox10 expression in the central nervous system starts late (around day 13 post-coitum in the mouse), but then increases in strength, reaching maximal levels during adulthood. In the adult central nervous system, Sox10 expression seems to be largely confined to glia of the oligodendrocyte type (44).
In the spontaneous mouse mutant Dominant megacolon (Dom), Sox10 carries a frameshift mutation (45,102). This mutation leaves the HMG domain intact, but replaces the protein's normal C-terminal half with a structurally unrelated sequence. The resulting protein is functionally inactive (102). As a result of the mutation, Dom mice suffer from several neural crest defects which in the homozygous animal are characterized by a loss of neurons and glia in the peripheral nervous system and a complete lack of the enteric nervous system (45,102). In a large percentage of heterozygous animals, ganglia are absent from a variable part of the distal colon, leading to the formation of an aganglionic megacolon. Together with pigmentation defects of the skin, this megacolon is the characteristic feature of the Dom mouse. The phenotypic manifestation of the Sox10 mutation in the heterozygotes is variable and points to haploinsufficiency or to a dominant-negative mode of action.
Mutations in one SOX10 allele have also been found in patients who suffer from congenital aganglionic megacolon (Hirschsprung disease) associated with a combination of pigmentation defects and deafness (Waardenburg syndrome) (30,36; for a full list of allelic variants see MIM 602229). This so-called Waardenburg-Hirschsprung or Shah-Waardenburg syndrome closely resembles the phenotype of the Dom mouse both with respect to the extensive neural crest dysfunction and the phenotypic variability. One of the reasons for the observed phenotypic variability might be the role of Sox10 as a modulator of other transcription factors (44). While only possessing a relatively weak transactivation potential, Sox10 efficiently cooperates with co-expressed transcription factors such as Pax3 and Tst-1/Oct6/SCIP (30,44). Thus, it is conceivable that the Pax3 or Tst-1/Oct6/SCIP allelic variant present in a particular mouse strain or individual strongly influences the phenotypic manifestation of a given Sox10 mutation. Additionally, SOX10 mutations from patients exhibit differing degrees of functional inactivation in tissue culture experiments (30). Taken together, these results clearly prove a role for Sox10 at an early stage in neural crest development.
Sox proteins and chondrogenesis
Closely related to Sox10 is the group E protein Sox9. As already mentioned, Sox9 is expressed very early in the genital ridge and is likely to be involved in sex determination and testis development as a crucial Sertoli cell differentiation factor (73,74). Thus, mutation of one SOX9 allele in humans leads to autosomal sex reversal in the majority of affected chromosomally male individuals (35,37). Phenotypic manifestations are as variable as observed for SOX10.
Whereas sex reversal only affects karyotypically male patients with SOX9 mutations, all carriers are affected by severe skeletal malformations, including congenital bowing and angulation of the long bones, but also hypoplastic scapulae, dislocated hips and deformed pelvis. This autosomal dominant syndrome is known as campomelic dysplasia and affects ∼1 in 20 000 births (for a detailed description see MIM entry 114290). Mutations found in patients affected by campomelic dysplasia are often missense mutations within the HMG domain or nonsense mutations. These either abrogate DNA binding or remove the protein's C-terminal transactivation domain (29,35,37,103). Alternatively, the SOX9 locus is affected by translocations with breakpoints being localized 50–600 kb upstream of the transcriptional start site (35,37,104). Transgene experiments prove for at least some of these translocations that they alter the expression rate of SOX9 (105).
In addition to the various skeletal defects, campomelic dysplasia patients sometimes exhibit absence of the olfactory bulbs, heart and renal malformations, defects of the tracheopulmonary system, deafness or mental retardation. This variety of symptoms in humans reflects the expression pattern of Sox9 in mouse development where it is found in brain, otic vesicle, urogenital system, lung and heart (28,43). Most importantly, Sox9 is predominantly expressed in the mesenchymal condensations from which the skeleton develops. There it is expressed before and during cartilage deposition, first in the prechondrogenic precursors and later on in the proliferating and maturing chondrocytes. From this expression pattern and the phenotype of campomelic dysplasia patients it can be concluded that Sox9 plays an important role in chondrocyte development.
One aspect of its function has recently been revealed by the observation that Sox9 is largely co-expressed with collagen II (col2a1), the major extracellular matrix component of cartilage, during development of skeletogenic and a number of other tissues in the mouse (28,34). Additionally, Sox9 has been shown to bind to sites within the chondrocyte-specific enhancer present in intron 1 of the col2a1 gene and to activate this enhancer in tissue culture experiments (16,28,106). Mutation of these binding sites severely disturbs the chondrocyte-specific activity of this enhancer in vitro and in vivo. Additionally, expression of a Sox9 transgene under the control of the Hoxb2 promoter induces ectopic expression of both the endogenous col2a1 gene and a col2a1-lacZ transgene, indicating that col2a1 is probably a direct target of Sox9 (106). However, Sox9 activity alone is not sufficient to account for the full expression pattern of col2a1 and there is evidence that the col2a1 enhancer is targeted by other proteins, including L-Sox5 and Sox6 (34,107). These two Sox proteins are co-expressed with Sox9 at all chondrogenic sites of the mouse embryo and form with Sox9 a multiprotein complex on the col2a1 enhancer (34). As a result the col2a1 gene is activated by the combination of all three Sox proteins to a much higher extent than by Sox9 alone. A similar cooperative activation by L-Sox5, Sox6 and Sox9 has also been shown for a second chondrocyte marker, the aggrecan gene.
Sox proteins and haemopoiesis
Interaction with other proteins and functional redundancy also seem to be important determinants of Sox4 function. Sox4 is very widely expressed during embryogenesis in brain, gonads, lung, heart and thymus. It is a very prominent transcription factor in lymphocytes of the B- and T-cell lineages (17). Targeted deletion in the mouse led to embryonal lethality at day 14 post-coitum due to circulatory failure (97). This circulatory failure is caused by an impaired development of the Sox4-expressing endocardial ridges. As a consequence, the semilunar valves and the outlet portion of the muscular ventricular septum fail to develop. Using fetal liver cells from these knockout animals, it was also shown that there is a block in early B-cell development at the pro-B-cell stage, similar to that observed for Pax5 (108). This block was, however, not absolute as a small number of B-cells matured. Such leakiness is expected for proteins with an accessory rather than an autonomous function.
Although T-cells also express Sox4, there was only a mild effect of Sox4 deletion on T-cell development. Only upon close inspection was it possible to detect slightly lower expansion and maturation rates which prevent cells of Sox4-deficient animals from efficiently competing in competitive intrathymic reconstitution assays with wild-type thymocytes (109). Although being highly expressed in both the T- and B-cell lineages, Sox4 is thus essential for B-cell development, but only of minor importance for T-cell development.
Research on Sox proteins has made rapid advances over the last few years, with more and more family members being identified and shown to be intricately involved in regulation of development. The increasing number of cases in which mutations in Sox proteins are associated with human diseases further highlights the importance of this group of transcription factors. More will be learnt about the functions of Sox proteins from mouse models as many family members are currently subject to targeted deletion strategies. However, it will be equally important to understand their mode of action on a molecular level. Whether they function as transcriptional activators, modulators or architectural components will only become fully apparent with the identification of more target genes, interaction partners and the signals by which Sox proteins are activated. The coming years should prove to be exciting.
I thank Elisabeth Sock and Irm Hermans-Borgmeyer for critical reading of the manuscript. Research in my laboratory on Sox proteins is supported by the Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie and by grants from the Deutsche Forschungsgemeinschaft (We1326/5-2 and We1326/7-1).
Note Added in Proof
The putative translation product W01C8.2 described in Figure 1 has recently been identified as a C. elegans Sox protein necessary for establishing a functional vulval-uterine connection and has been named COG-2 [Hanna-Rose, W. and Han, M. (1999) COG-2, a Sox domain protein necessary for establishing a functional vulval-uterine connection in Caenorhabditis elegans. Development, 126, 169–179].