A significant proportion of bacteria express two or more chaperonin genes. Chaperonins are a group of molecular chaperones, defined by sequence similarity, required for the folding of some cellular proteins. Chaperonin monomers have a mass of c. 60 kDa, and are typically found as large protein complexes containing 14 subunits arranged in two rings. The mechanism of action of the Escherichia coli GroEL protein has been studied in great detail. It acts by binding to unfolded proteins and enabling them to fold in a protected environment where they do not interact with any other proteins. GroEL can assist the folding of many proteins of different sizes, sequences, and structures, and homologues from many different bacteria can functionally replace GroEL in E. coli. What then are the functions of multiple chaperonins? Do they provide a mechanism for cells to increase their general chaperoning ability, or have they become specialized to take on specific novel cellular roles? Here I will review the genetic, biochemical, and phylogenetic evidence that has a bearing on this question, and show that there is good evidence for at least some specificity of function in multiple chaperonin genes.
Introduction – role, structure, and mechanism of the Escherichia coli chaperonin GroEL
The groE genes of E. coli were the first chaperonin genes to be discovered. They were initially identified as genes that, when mutated, prevented the plating of several bacteriophages (Georgopoulos et al., 1972; Takano & Kakefuda, 1972). Many mutations in these genes resulted in a temperature-sensitive phenotype, implying that the GroE proteins had a fundamental cellular role. Two groE genes were identified, groEL and groES, encoding proteins of c. 60 and 12 kDa, respectively (Tilly et al., 1981). A major breakthrough was the discovery that the GroEL protein of E. coli shares significant amino-acid homology with a protein found in plant chloroplasts, which is involved in the assembly of ribulose bisphosphate carboxylase (Rubisco) (Hemmingsen et al., 1988). This suggested that conserved proteins might be needed to assist the assembly or folding of other proteins in all organisms. The generic name of molecular chaperones was proposed for these proteins, and the proteins with homology to E. coli GroEL and its plant equivalent were named chaperonins.
Several key experiments confirmed these early suggestions. It was shown using purified protein components that the in vitro refolding of a bacterial Rubisco from its unfolded state was significantly enhanced by the presence of GroEL, GroES, and ATP, all three components being required (Goloubinoff et al., 1989a). Moreover, the yield of folded bacterial Rubisco in E. coli could be dramatically enhanced by coexpression of GroEL and GroES (Goloubinoff et al., 1989b). The hypothesis that GroEL is likely to act on many protein substrates was supported by experiments that showed that temperature-sensitive mutations in a wide range of different genes could be suppressed by overexpression of GroEL and GroES, suggesting that these mutations caused defects in the protein-folding pathway that high levels of chaperonins could overcome (van Dyk et al., 1989). The current view is that GroEL interacts with a wide range of unfolded or partially folded proteins, and with the assistance of GroES (often referred to as a co-chaperonin), helps them to reach their fully folded and active state. Coimmunoprecipitation experiments combined with MS have identified c. 300 proteins as being substrates of GroEL, of which about 85 are obligate substrates, meaning that they are completely dependent on GroEL and GroES to reach their folded form (Houry et al., 1999; Kerner et al., 2005). As 13 of these are required for cell viability, the essential nature of the groEL and groES genes in E. coli is explained. Other experiments looking at the change in the proteome when groEL is inactivated in vivo have suggested that the range of substrates may be even larger (Chapman et al., 2006).
Why is chaperonin function necessary for some proteins to fold? Until the discovery of molecular chaperones, it was generally assumed that proteins in the cell folded spontaneously during and after translation, because it was known from in vitro studies that the folded form of the protein is more energetically stable than its unfolded forms, and that proteins will hence spontaneously (in the thermodynamic sense) reach this structure from their unfolded states (Anfinsen, 1973). However, there are good reasons why protein folding inside the cell is problematic. In particular, the concentrations of proteins and other solutes are much higher in the cell than those used in experiments that demonstrate spontaneous refolding of unfolded proteins in vitro, and at these high concentrations unfolded proteins have a greater likelihood of forming insoluble aggregates (van den Berg et al., 1999; Ellis, 2001; Ellis & Minton, 2006). There is also the fact that proteins will tend to unfold en masse when the growth temperature of an organism is raised, which could lead to widespread protein aggregation unless mechanisms exist to protect unfolded proteins while the temperature is high and to help them refold when normal conditions are restored. It is thus unsurprising that most chaperones, including the chaperonins, are induced by heat shock and by other treatments that increase the amount of unfolded protein present in a cell.
The mechanism by which chaperonins act has been very thoroughly studied and is extensively reviewed elsewhere (e.g. Thirumalai & Lorimer, 2001; Horwich et al., 2006, 2007), and only those aspects needed to understand why the presence of multiple chaperonin genes is unexpected and interesting are considered here. GroEL exists as a large protein complex with two rings, each made up of seven identical subunits. Each ring contains a central cavity with an approximate volume of 85 000 Å³. There is no connection between the two cavities (Braig et al., 1994). In the GroEL reaction cycle, protein substrates bind initially to a ring of hydrophobic contacts at the end of the complex (Fenton et al., 1994). Attempts have been made to classify actual GroEL substrates into particular structural types (Kerner et al., 2005), but as GroEL has the capacity to bind about half the proteins in the E. coli proteome if they are unfolded (Viitanen et al., 1992), there is little specificity in its ability to recognize proteins apart, presumably, from the need for exposed hydrophobic residues. ATP also binds to GroEL, probably after the polypeptide, as does GroES. This causes a significant structural change in the GroEL subunits that enlarges the central cavity (see Fig. 1) and displaces the bound substrates from the end of the complex into the central cavity (Weissman et al., 1995, 1996; Mayhew et al., 1996; Xu et al., 1997). The structural change in GroEL may in some cases cause the bound substrate protein to further unfold, which may be an important part of the GroEL mechanism, helping proteins that have misfolded into an incorrect conformation to reinitiate folding from a less-folded state (Thirumalai & Lorimer, 2001; Lin et al., 2008). Once in the cavity, the bound protein has a chance to fold without interacting with any other folding proteins. This is effectively the same as diluting a folding reaction to a concentration where aggregation is no longer a problem. 
The length of time the protein spends in the cavity is determined by the time (typically around 10 s) that it takes for GroEL to hydrolyse the bound ATP. Once this happens, the binding of another molecule of unfolded protein at the opposite (trans) end of the GroEL complex, together with ATP and GroES, leads to the discharge of the initially bound GroES (Rye et al., 1997, 1999). The first protein that was bound can now diffuse out of the cavity into bulk solution, although it may be recaptured (or even not released) by GroEL if it has not yet reached a nonaggregation prone form (Martin & Hartl, 1997). This reaction cycle is shown in Fig. 2.
Although the range and quality of experimental work that has been carried out on the E. coli GroEL/GroES system is impressive, there is always the danger of overgeneralization from this system to all bacterial chaperonin and co-chaperonin proteins. This is particularly the case when considering the multiple chaperonins that are the subject of this review, because these are likely to have arisen through either gene duplication or horizontal gene transfer, either of which allows for the subsequent evolution of a new role for one of the two copies, while the other maintains a housekeeping function in the cell. Such evolution of novel function is well documented for duplicated genes. That this is a possibility specifically for chaperonins has been shown by an elegant study conducted to see whether the E. coli GroEL protein could be altered by selection to improve its capacity to fold particular substrates (Wang et al., 2002). In this study, groEL and groES were mutagenized and variants with an enhanced ability to fold the green fluorescent protein (GFP) in vivo were selected, with cellular function being assured by the presence of an unmutated copy of the wild-type groEL gene. Variants that showed improved folding of GFP were duly found, but they were shown to have a reduced ability to function as general chaperones in the cell, indicating a conflict between the potential ability of GroEL to fold specific substrates and its cellular role in assisting the folding of a wide range of substrates. Such a conflict would be removed following a gene duplication. We therefore now consider what is known about chaperonin function in those organisms where multiple chaperonin genes are present, and the extent to which this knowledge goes beyond what is known about GroEL in E. coli.
The nomenclature of GroEL and GroES homologues can be confusing. Strictly speaking, the names GroEL and GroES should be reserved for the E. coli proteins, because they refer to the bacteriophage-plating phenotype discussed above. Proteins that subsequently turned out to be GroEL homologues were known in many bacteria before the function of GroEL was fully understood, and these were often called Hsp60 or Hsp65 proteins, or ‘common antigen’ based on the fact that these proteins are often highly immunogenic. A variety of other names for GroEL homologues have also been used by different authors. A unified nomenclature has been proposed whereby chaperonin proteins are referred to as Cpn proteins, with homologues of the GroEL chaperonin called Cpn60, and those of the co-chaperonin GroES called Cpn10 (Coates et al., 1993). From now on in this review, I will use the Cpn nomenclature, with multiple chaperonin genes being referred to as cpn60.1, cpn60.2, etc., but will cross-reference these to the published names given by authors where appropriate. As we learn more about the possible roles of multiple Cpn proteins in different organisms, this nomenclature in turn is likely to require further revision. It is important to note that the use of a number (e.g. cpn60.1) for a gene does not imply any particular relatedness to genes with the same number in different bacterial taxa, because as will be described below, phylogenetic analysis clearly shows that duplication of chaperonin genes has taken place independently several times.
The occurrence in bacteria of multiple genes for chaperonins
The cpn60 and cpn10 genes are found in nearly all bacteria, with a small number of mycoplasmas that lack both genes being the only exceptions. A survey of 669 complete bacterial genomes using tblastn reveals that nearly 30% contain two or more chaperonin genes. Some of these contain only one cpn10 gene, and some have more than one. The numbers of cpn60 genes for these 669 completed genomes are shown in Table 1, divided by groups (as defined in the NCBI taxonomy browser at http://www.ncbi.nlm.nih.gov/Taxonomy). The overall numbers of species found with different numbers of Cpn60 homologues are shown in Fig. 3. Some interesting points emerge from this table. First, it is clear that multiple chaperonins are not randomly distributed among bacterial groups: some groups have no examples or very few, whereas in others multiples are the rule rather than the exception. In two cases of groups with several complete assembled genomes available (Cyanobacteria and Chlamydiae) there are no examples of members of the group that have a single cpn60 gene. Second, the range of numbers of multiples also varies between groups. Thus, all Chlamydiae possess three cpn60 homologues, whereas among the Alphaproteobacteria, the numbers range from one to seven. These observations strongly support the hypothesis that selective pressure maintains the presence of multiple genes, because neither of these patterns would be likely to arise due to a process of random gene duplication and gene loss.
|NCBI group||Number of Cpn60 homologues|
Six hundred and sixty-four complete genomes on http://xbase.bham.ac.uk were searched using tblastn with a cut-off value of P=0.05. In cases where unexpected results were obtained, for example, where a single member of a species gave a different result to all the other members, other methods were used to check the result, and in a few cases this led to a correction of the value for the number of cpn60 genes/genome. For example, in one sequenced strain of Corynebacterium glutamicum, a cpn60 gene has been interrupted by an IS element, and this leads to a spurious extra match in the tblastn search.
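The counting step in a survey of this kind can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used for Table 1: it assumes that the tblastn results are available in the BLAST+ tabular format (-outfmt 6), treats the 0.05 cut-off as an E-value threshold, and uses invented hit data. The function name count_loci and the min_gap parameter are hypothetical. Hits are counted per subject sequence, so a genome with several replicons would need its counts summed.

```python
from collections import defaultdict

# Column order of BLAST+ tabular output (-outfmt 6). Assumed for this sketch;
# the source does not state which output format was used.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_loci(tabular_text, max_evalue=0.05, min_gap=0):
    """Count distinct hit loci per subject sequence.

    Hits on the same subject whose coordinates overlap, or whose gap is at
    most `min_gap` bases, are merged and counted as a single locus.
    """
    intervals = defaultdict(list)
    for line in tabular_text.strip().splitlines():
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["evalue"]) > max_evalue:
            continue  # discard weak matches
        # tblastn reports minus-strand hits with sstart > send
        start, end = sorted((int(row["sstart"]), int(row["send"])))
        intervals[row["sseqid"]].append((start, end))

    counts = {}
    for subject, spans in intervals.items():
        spans.sort()
        loci = 0
        current_end = None
        for start, end in spans:
            if current_end is None or start > current_end + min_gap:
                loci += 1            # a new, separate locus
                current_end = end
            else:
                current_end = max(current_end, end)  # extend current locus
        counts[subject] = loci
    return counts


# Hypothetical hits for one Cpn60 query against two made-up genomes.
EXAMPLE_ROWS = [
    ("Cpn60", "genomeA", "62.0", "540", "205", "0", "1", "540", "1000", "2620", "1e-180", "640"),
    ("Cpn60", "genomeA", "58.0", "538", "226", "0", "2", "539", "905000", "903387", "1e-170", "600"),
    ("Cpn60", "genomeB", "61.0", "300", "117", "0", "1", "300", "5000", "5900", "1e-90", "330"),
    ("Cpn60", "genomeB", "60.0", "240", "96", "0", "301", "540", "7400", "8120", "1e-70", "260"),
    ("Cpn60", "genomeB", "25.0", "60", "45", "0", "10", "70", "400000", "400180", "2.0", "30"),
]
EXAMPLE = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

print(count_loci(EXAMPLE))                # fragments in genomeB counted separately
print(count_loci(EXAMPLE, min_gap=2000))  # nearby fragments merged into one locus
```

With the default settings the two nearby fragments in the invented genomeB are counted as two loci, which is the kind of spurious extra match that an interrupted gene can produce. Allowing a gap merges them, but the same setting would also merge genuine tandem duplicates, so results of this kind still need checking by other methods.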
Why is the presence of multiple chaperonin genes surprising? As described above, GroEL binds a wide variety of cellular proteins with little evidence of structural or sequence specificity. In addition, in many cases tested, cpn60 homologues from other bacteria can function quite well in E. coli, showing that the spectra of proteins bound by Cpn60 proteins in other organisms must overlap considerably with those in E. coli. Thus, it might be predicted that a cell's need for chaperonin function could always be fulfilled by a Cpn60 protein encoded by a single gene. What then is the adaptive significance of multiple cpn60 genes? Two extreme possibilities exist. One is that the proteins all have the same role in the cell, and the presence of multiple copies is simply a way for the organism to regulate its total chaperoning capacity by placing different genes under the control of different conditions. If this is the case, a genetic analysis would be predicted to show that the genes are equivalent and interchangeable. The other is that the proteins have evolved to have quite distinct and nonoverlapping functions, in which case they would not be able to substitute for each other genetically, and significant differences in biochemical properties and modes of regulation would also be expected. The actual situation may be intermediate between these two extremes, with overlap of function but also some specialization or subfunctionalization.
Beyond this fundamental question lie a number of others, the answers to which have the potential to enhance both our understanding of chaperonins in particular and the nature of in vivo protein folding in general. For example, if there is indeed functional specialization of the proteins, what is the structural basis for this? When there are multiple genes in the cell, are they expressed under the same or different regulatory controls? If they are expressed in the cell at the same time, do they form mixed or separate complexes? And what is the evolutionary connection between the different homologues: is gene duplication a frequent event or are the multiple genes the descendants of an ancestral event? Answers to most of these questions are emerging from the studies described below.
Only a relatively small number of specific studies have been made of multiple cpn60 genes in different bacteria, and those that have been carried out suggest that the answers to the questions above may not be general ones – what is true for multiple cpn60 genes in one organism or taxonomic group is not necessarily true for others. This finding is supported by the fact that a phylogenetic analysis of 249 multiple chaperonins in bacteria showed that they have arisen from independent gene duplications in different lineages, and so have discrete histories of adaptation and selection (Goyal et al., 2006). One consequence of this is that bacteria with multiple chaperonin genes tend to cluster phylogenetically, although there are some exceptions to this. This review will thus focus on four different groups of bacteria where multiple chaperonins have been studied, and assess the extent to which the above questions have been answered in each case. The organization of the duplicated chaperonin genes in these four groups is shown in Fig. 4.
Multiple chaperonin genes in the Actinobacteria
We begin with the Actinobacteria, as this was the first group in which bacteria with more than one cpn60 gene were reported (Rinke de Wit et al., 1992; Kong et al., 1993), and as we shall see, they also provide the best evidence to date for a specialized function for one of the duplicated genes. The majority of the sequenced genomes of Actinobacteria have two or (in a few cases) three or four cpn60 genes. In all cases there is only one cpn10 homologue present. In those Actinobacteria where there is only one cpn60 gene, the cpn10 gene is always found separate from cpn60 on the genome, rather than being part of the same operon as is the case in other bacteria. Examples include Bifidobacterium longum, Clavibacter michiganensis, and Tropheryma whipplei (Maiwald et al., 2003). The most likely explanation for this is that the common ancestor of all present-day Actinobacteria had two cpn operons, one with a cpn10 gene and one without, arising perhaps from duplication of an ancestral cpn60 gene, and that one of the duplicated cpn60 genes has been lost in those Actinobacteria that have only one copy left. Phylogenetic analysis of the duplicated cpn60 genes from both completely and incompletely sequenced genomes shows clearly that they fall into two distinct clusters (Goyal et al., 2006). One of these genes, often referred to as cpn60.1, is the gene that is found in an operon with cpn10 and that is missing from some sequenced genomes. The second gene, cpn60.2, is always found without an adjacent cpn10 gene, and is present in all Actinobacteria. These observations strongly suggest that cpn60.2 is essential and cpn60.1 is nonessential, and this is supported by the genetic analyses discussed below.
A comparison of evolutionary rates of the duplicated genes in three Actinobacteria (Mycobacterium tuberculosis, Mycobacterium leprae and Streptomyces albus) revealed that the duplicated genes in the two mycobacteria evolved at very different rates, with the cpn60.1 genes evolving much faster, but that this was not the case with the Streptomyces genes (Hughes et al., 1993). This difference, which was seen only at nonsynonymous sites and hence represents selection at the protein level, was greatest in regions that had previously been shown to be common epitopes for immune recognition between different Cpn60 homologues, thus suggesting that exposure to the immune system may in part be responsible for these different evolutionary rates. The more rapid evolution of cpn60.1 may simply reflect stronger selective constraints on the essential cpn60.2 gene, or it may reflect a significance in the way in which these two proteins interact with the immune system or other components of an infected host. In this context, it is interesting to note that both Cpn60 proteins from M. tuberculosis can act as cytokines, stimulating the release of both proinflammatory and anti-inflammatory molecules (Qamra et al., 2005; Henderson et al., 2006). The extension of these evolutionary studies to a broader range of Cpn60 homologues in pathogens and nonpathogens would address this issue in a greater depth.
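The distinction between synonymous and nonsynonymous change that underlies this comparison can be illustrated with a short Python sketch. It is deliberately crude and is not the method used in the study cited above: it simply classifies each differing codon in a pair of aligned coding sequences according to whether the encoded amino acid changes, using the standard genetic code. A proper rate estimate would also normalize by the numbers of synonymous and nonsynonymous sites and correct for multiple substitutions. The function name and the example sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length. Codons
    containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in-frame, equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguity code: not classifiable
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid replaced
    return synonymous, nonsynonymous


# Invented example: GCT -> GCC keeps alanine, AAA -> AGA replaces Lys with Arg.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

A difference between two paralogues that shows up only in the second of the two counts is what is meant above by a difference seen only at nonsynonymous sites.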
Studies on heat shock responses have revealed some details about the regulation of expression of the multiple cpn60 genes in several Actinobacteria. In M. tuberculosis, both genes are heat shock inducible, as is the cpn10 gene (Stewart et al., 2002; Hu et al., 2008). All the cpn genes are negatively regulated by the HrcA repressor, which is a widely distributed bacterial repressor, generally involved in regulation of cpn60 genes. It does this by binding to a consensus sequence, found in one or more copies upstream of cpn operons, referred to as a CIRCE sequence (Narberhaus et al., 1999). Sequences with good matches to the CIRCE consensus are found in the presumptive promoters of both the cpn10-cpn60.1 genes and the cpn60.2 gene in M. tuberculosis. Despite this, studies using both microarrays and reverse transcriptase (RT)-PCR consistently show that the relative levels of induction of these genes are not the same: generally, cpn10 shows the highest fold induction, followed by cpn60.2 and then cpn60.1, thus implying a further level of transcriptional or post-transcriptional regulation. It has been suggested that although they are adjacent on the chromosome, the cpn10 and cpn60.1 genes in M. tuberculosis are transcribed from separate promoters (Kong et al., 1993). The highest level of induction of these genes was seen with heat shock, but increased expression was also seen when cells were exposed to other stresses including hyperosmolarity, starvation, and oxidative stress (Hu et al., 2008). No evidence of induction of either gene was seen when M. tuberculosis was grown in phagosomes, despite the presumably stressful environment that this represents (Schnappinger et al., 2003). It is interesting to note that M. leprae lacks a heat shock response, and although this is in part due to the lack of a functional SigE protein, the reason why the duplicated cpn genes in this organism are not induced by heat is not known because they are also predicted to be regulated by HrcA (Williams et al., 2007).
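Identifying candidate CIRCE sites upstream of a cpn operon is essentially a pattern search, and can be sketched as follows. The consensus used here (a 9-bp inverted repeat, TTAGCACTC and GAGTGCTAA, separated by a 9-bp spacer) is taken from the wider literature rather than from the studies discussed above, and should be treated as an assumption of the sketch, as should the mismatch threshold. The function name and the test sequence are invented.

```python
# Assumed CIRCE consensus: TTAGCACTC-N9-GAGTGCTAA (not stated in this review).
CIRCE_LEFT = "TTAGCACTC"
CIRCE_RIGHT = "GAGTGCTAA"
SPACER = 9


def find_circe_like(seq, max_mismatches=2):
    """Return (position, site, mismatches) for each CIRCE-like site in seq.

    The two arms are reverse complements of each other, so the element reads
    the same on both strands and only one strand needs to be scanned.
    """
    seq = seq.upper()
    arm = len(CIRCE_LEFT)
    site_len = arm + SPACER + len(CIRCE_RIGHT)
    hits = []
    for i in range(len(seq) - site_len + 1):
        left = seq[i:i + arm]
        right = seq[i + arm + SPACER:i + site_len]
        mismatches = (sum(a != b for a, b in zip(left, CIRCE_LEFT)) +
                      sum(a != b for a, b in zip(right, CIRCE_RIGHT)))
        if mismatches <= max_mismatches:
            hits.append((i, seq[i:i + site_len], mismatches))
    return hits


# Invented upstream region containing one perfect site at position 4.
upstream = "ACGT" + "TTAGCACTC" + "AAAAAAAAA" + "GAGTGCTAA" + "GGCC"
print(find_circe_like(upstream))
```

A match of this kind only predicts an HrcA binding site. As the differing induction levels described above show, the presence of a good match does not by itself determine how strongly a gene responds.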
In Corynebacterium glutamicum, both the cpn10-cpn60.1 operon and the cpn60.2 operon are heat shock induced, and both appear to be HrcA regulated as judged by the presence of CIRCE sites upstream (Barreiro et al., 2004). Additionally, cpn60.2 has a HAIR site in its promoter, which is a binding site for HspR, another common repressor of bacterial heat shock genes (Narberhaus et al., 1999), although there is no direct experimental evidence to support action by HspR in this particular case. A proteomic study on C. glutamicum has confirmed that the levels of both Cpn60.1 and Cpn60.2 are higher on heat shock (Barreiro et al., 2005), with Cpn60.1 being induced slightly more strongly than Cpn60.2.
In Streptomyces, regulation also appears to be through the HrcA/CIRCE system for both genes, as judged by the occurrence of CIRCE motifs upstream of both operons in Streptomyces coelicolor (Duchêne et al., 1994), Streptomyces lividans (de León et al., 1997) and S. albus (Grandvalet et al., 1998). In S. albus, this has been directly demonstrated by showing that deletion of hrcA leads to overexpression of both cpn operons (Grandvalet et al., 1998), although an earlier report suggested that post-transcriptional events may also have a role in the heat-induced induction of at least cpn60.1 (Servant et al., 1994). Significant differences in both the levels of expression and heat shock induction of the two cpn operons in S. lividans were reported (de León et al., 1997), with cpn60.2 being both the more highly expressed and strongly heat shock induced, reminiscent of the situation in M. tuberculosis. In summary, there are hints but no unequivocal evidence to suggest that the final levels of multiple chaperonin proteins may be determined by more than one mechanism in those Actinobacteria where this has been studied, but the basic mechanism of regulation always appears to be repression caused by binding of the HrcA repressor. A common but not universal finding is the higher expression and inducibility of the cpn60.2 gene relative to the cpn60.1 gene.
All the genetic evidence shows that the cpn60.2 gene is the housekeeping gene in Actinobacteria. First is the observation referred to above that in those Actinobacteria that have only one cpn60 gene, it is always the cpn60.2 gene that is present and the cpn60.1 gene that is lost. Secondly, knockout mutations in cpn60.1 have either been constructed successfully or found as spontaneous events in Mycobacterium smegmatis (Kim et al., 2003; Ojha et al., 2005), M. tuberculosis (Hu et al., 2008), C. glutamicum (Barreiro et al., 2005), and S. albus (Servant et al., 1994), but no mutations have been successfully constructed in cpn60.2 unless a complementing copy is already present in the cell.
An intriguing feature of the duplicated cpn60 genes in the Actinobacteria is that the essential Cpn60.2 protein always has at its C-terminal end a repeated motif containing glycine and methionine residues, referred to as a GGM repeat. This feature is seen in the large majority of Cpn60 proteins although its function is obscure (Brocchieri & Karlin, 2000). Actinobacterial Cpn60.1 proteins on the other hand lack this motif and have C-termini that are very variable but often strongly enriched in histidine residues (see Fig. 5). Again the functional significance of these is unknown at present, and this is an area where simple domain swap experiments may bear fruit. Although it might be expected that a detailed bioinformatics study of the aligned cpn60.1 and cpn60.2 genes would help reveal which parts of the proteins are responsible for their postulated different functions, no such study has been reported to date.
For the most part, phenotypes of the cpn60.1 knockout mutants of Actinobacteria have not been closely analysed or are reported to be insignificant. In two cases, however, strong and possibly related phenotypes are seen. Knockout of the cpn60.1 gene in M. tuberculosis leads to little obvious phenotype when cells are grown planktonically apart from a slightly increased temperature sensitivity. However, when used to inoculate either mice or guinea pigs, the cpn60.1 mutants completely lose the ability to induce the normal granulomatous proliferation seen in lung. This effect appeared to be associated with reduced cytokine expression in these animals (Hu et al., 2008). This may well be linked to the immunomodulatory effects of the Cpn60.1 protein referred to above. In the related organism M. smegmatis, loss of the cpn60.1 gene leads to a loss of the ability of the cells to form mature biofilms, an effect that was traced to an inability of these cells to synthesize mycolic acids of a particular chain length (Ojha et al., 2005). Pull-down experiments showed that as cells shift from planktonic to biofilm growth, the Cpn60.1 protein binds to the KasA protein (a component of the fatty acid synthase II complex, which is involved in mycolate biosynthesis) and another protein, SMEG4308. Cpn60.1 may have a role in converting KasA between different isoforms, which may in turn affect the balance of mycolate synthesized in these cells and hence their ability to form mature biofilms. This is the only clear example of a very specific interaction of a chaperonin encoded by a duplicated gene with particular substrate proteins in the cell.
Although it is the housekeeping chaperonin in Actinobacteria, the Cpn60.2 protein has other intriguing properties, particularly in the mycobacteria. In M. leprae, it is one of the most abundant proteins found associated with the cell wall (Marques et al., 1998), and it has also been shown to be associated with the cell wall in M. bovis BCG, although the majority of the protein is in the cytoplasm. No Cpn60.1 is found to be cell wall associated in these experiments. Moreover, Cpn60.2 is released, with an N-terminal truncation, into the supernatant of M. tuberculosis cultures, a process that requires the action of a novel envelope-associated protease called Rv2224c. Deletion of the gene encoding this protease significantly reduces the pathogenicity of M. tuberculosis (Rengarajan et al., 2008). It has also been reported that Cpn60.2 has a proteolytic activity towards oligopeptides, although whether this is also true of Cpn60.1 was not reported (Portaro et al., 2002).
The biochemistry of the chaperonins in Actinobacteria also reveals further unusual features. Preliminary work on the chaperonin proteins from S. lividans showed that they form oligomers with evidence of chaperone activity (Marco et al., 1992; de León et al., 1997), with a size and EM appearance consistent with single ring particles of E. coli GroEL. Intriguingly this activity appeared to be independent of a Cpn10 cochaperonin, although the assays were carried out on fractions in which the protein had not been fully purified. More unexpected was the finding that the Cpn60 proteins from mycobacteria appear not to exist as stable large oligomers. Early work on what subsequently proved to be the Cpn60.1 protein of M. bovis BCG showed that it had a subunit molecular mass of c. 65 kDa but purified as a complex of c. 250 kDa, corresponding roughly to a tetramer (de Bruyn et al., 1987). Moreover, both of the Cpn60 proteins from M. tuberculosis when purified formed only low molecular weight structures, probably monomers or dimers (Qamra et al., 2004). The Cpn60.2 protein from M. tuberculosis was successfully crystallized and shown to be in a dimer form in the crystal (Qamra & Mande, 2004). No in vitro ATP-dependent chaperone activity could be ascribed to these purified proteins, although intriguingly it has recently been reported that both Cpn60 proteins can potentiate the activity of the M. tuberculosis heat shock repressor HspR in an ATP-dependent manner in vitro (das Gupta et al., 2008), which points to a chaperone-like activity for these proteins and a role in heat shock regulation. The Cpn60.2 protein has been shown to complement loss of the essential E. coli groEL gene and can act with E. coli GroES (Hu et al., 2008), which is known to form a heptameric structure (Hunt et al., 1996). It has also been shown that the Cpn10 proteins from M. leprae and M. tuberculosis form complexes with sevenfold symmetry (Roberts et al., 2003). 
It is thus tempting to speculate that in vivo, mycobacterial and perhaps other actinobacterial Cpn60 proteins are assembled into functional heptameric complexes on a Cpn10 template, perhaps only transiently, but such a process has not been demonstrated in vitro. Cpn60.1 from M. tuberculosis fails to complement loss of groEL in E. coli, which again is consistent with it having evolved a new and distinct function such that it can no longer recognize the broad spectrum of substrates that must be recognized by Cpn60.2 (Hu et al., 2008). Also consistent with this finding is the fact that when the Cpn60.1 protein is purified from M. smegmatis, no Cpn60.2 is copurified, showing that these proteins do not form mixed oligomers (Ojha et al., 2005).
In summary, good evidence exists to support the hypothesis that the Cpn60 proteins in Actinobacteria have become diversified in function following gene duplication, and in at least one case (that of Cpn60.1 of M. smegmatis), this new function is associated with specific substrate proteins, KasA and SMEG4308. These proteins have features that are unusual for chaperonins, particularly in the mycobacteria where genetic evidence supports the existence of an oligomeric form but biochemical work has yet to identify it. Much work remains to be carried out in identifying the structural reasons for the diverse features of these chaperonin proteins, and there are many other examples of Actinobacteria with multiple cpn60 genes where a study of their role would be of great interest, including several that have three or four cpn60 genes.
Multiple chaperonin genes in root-nodulating Alphaproteobacteria
The most spectacular examples of multiple chaperonin genes in terms of number are those that occur in Alphaproteobacteria that nodulate roots and fix nitrogen. BLAST searches of their complete genomes, together with cloning and genetic analysis described below, have found that many of these bacteria contain several copies of cpn60 genes, with the record currently held by Bradyrhizobium japonicum USDA 110 at seven copies, the highest of any sequenced bacterial genome. It is striking that this high copy number for cpn60 genes correlates well with the root-nodulating, nitrogen-fixing phenotype. Thus B. japonicum, Sinorhizobium meliloti, Mesorhizobium loti, Rhizobium etli, and Rhizobium leguminosarum all have three or more cpn60 homologues, despite being phylogenetically fairly distinct (Fischer et al., 1993; Rusanganwa & Gupta, 1993; Wallington & Lund, 1994; Ogawa & Long, 1995; Kaneko et al., 2000a,b; Galibert et al., 2001; González et al., 2006; Young et al., 2006). Agrobacterium tumefaciens, which is more closely related to R. leguminosarum and S. meliloti than either is to B. japonicum, has only one (Wood et al., 2001). Intriguingly, the genomes of three nitrogen-fixing Frankia species all have four cpn60 homologues, the highest of any Actinobacteria (Normand et al., 2007); all four phylogenetically resemble those of other Actinobacteria (P.A. Lund, unpublished data). In addition, some members of the genera Burkholderia and Ralstonia (Betaproteobacteria, some of which can nodulate and fix nitrogen; Moulin et al., 2001) also contain multiple cpn60 homologues (up to five). Further work is needed to establish whether or not these multiples have a role in nodulation and nitrogen-fixing ability.
It is interesting to note that the Cpn60 protein in the free-living diazotrophic bacterium Klebsiella pneumoniae has a role in the assembly of the nitrogenase complex as well as in the folding of the regulatory protein NifA, which controls expression of nitrogenase genes (Govezensky et al., 1991).
The gene organization and location of the cpn60 multiples are interesting and suggest an association with some aspect of symbiosis. The majority of the duplicated cpn60 genes in these organisms have a cpn10 gene upstream, quite different from the situation in Actinobacteria. Some are found close to genes involved in some aspect of symbiosis, for example, the operon in B. japonicum containing cpn60.3 is in a cluster of genes involved in nitrogen fixation (Fischer et al., 1993), and good genetic evidence, discussed below, implicates the products of this operon in some aspect of nitrogen fixation. One of the five cpn operons in Mesorhizobium loti is in a ‘symbiotic island’, which contains many genes for both nodulation and nitrogen fixation (Kaneko et al., 2000b). One of the four cpn60 genes in R. leguminosarum 3841 is on pRL9, a large plasmid that also carries some of the genes needed for nitrogen fixation (Young et al., 2006), and three of the five cpn60 homologues in S. meliloti map to the two pSym plasmids, which also carry genes for nitrogen fixation and root nodulation (Galibert et al., 2001). Phylogenetically, the multiple genes in these different symbiotic bacteria are very similar to each other, and no clear separation by species can be seen when they are all compared (Goyal et al., 2006; Gould et al., 2007a), suggesting that multiple gene duplications and transfers have taken place in this clade.
The regulation of these genes is complex, and also shows distinct evidence of a link to nodulation and nitrogen fixation. Of the five cpn operons whose regulation has been studied in B. japonicum, one is expressed constitutively, two are under the control of an HrcA-CIRCE system and one is regulated by (and adjacent to) one of three rpoH homologues that are found in this organism, which encode alternative RNA polymerase σ factors (Fischer et al., 1993; Babst et al., 1996; Minder et al., 2000). The fifth, however, is strongly upregulated in bacteroids, and has been shown to be under the control of the NifA activator and σ54, which in rhizobia are responsible for the regulated expression of nitrogen fixation genes under low-oxygen conditions (Fischer et al., 1994). In S. meliloti, two of the multiple cpn operons are heat shock induced although only one of these is regulated by RpoH (Mitsui et al., 2004; Bittner & Oke, 2006); the other (cpn1) has a CIRCE sequence upstream and is thus likely to be HrcA regulated. (The cpn2 operon also has a CIRCE sequence upstream but interestingly is not heat shock regulated). One cpn60 gene (named groEL5 by the authors; renamed as groEL3 in the annotated genome sequence) was identified as being expressed specifically in root nodules in a genome-wide screen using a promoter trap (Oke & Long, 1999). In R. leguminosarum A34, the housekeeping cpn operon is regulated through an HrcA-CIRCE interaction. cpn2 is regulated by RpoH while cpn3 requires NifA for expression and is only expressed under anaerobic conditions, such as would occur in root nodules (Rodríguez-Quiñones et al., 2005; Gould et al., 2007b). Thus regulation of the genes in many cases suggests a link with nitrogen fixation.
The simplest model to explain the multiple cpn operons in these bacteria would be that one or more of the chaperonin proteins has become completely functionalized for some aspect of nitrogen fixation, such as assembly of the nitrogenase complex, by specifically assisting the folding of one or more of its components. Genetic analysis in B. japonicum provides the most compelling evidence to support the existence of a link between the expression of specific chaperonins, root nodulation and nitrogen fixation, but shows that this simple model is not correct. All individual knock-out mutants of the five cpn60 homologues that are expressed in the strain B. japonicum 110spc4 were still capable of forming root nodules and fixing nitrogen (Fischer et al., 1993). A subsequent study found that a double mutant in two of the cpn operons (referred to as groESL3 and groESL4 in the original study) reduced nitrogen fixation activity to <5% of wild type (Fischer et al., 1999), due to drastically reduced levels of the NifH and NifDK nitrogenase proteins in this mutant background. However, this defect could be partially suppressed by overexpression of two of the other cpn operons, or even by overexpression of the E. coli groESL operon, though full suppression was only seen by expressing one of the two deleted cpn operons. This implies a degree of specificity of chaperonin function, but shows that this specificity is not absolute.
Good evidence also exists in S. meliloti of a connection between specific chaperonin genes and the root-nodulating, nitrogen-fixing phenotype. One of the cpn60 homologues (referred to initially as groELc, now annotated as groEL1) in S. meliloti was found to be required for expression of the nod genes in this organism, as a spontaneously isolated Tn5 mutation in this gene formed late nodules that were incapable of fixing nitrogen (Ogawa & Long, 1995). This defect was attributed to the decreased expression of three activator proteins (NodD1, NodD3 and SyrM) in the mutant, and a direct association between this Cpn homologue and the NodD proteins was demonstrated by coimmunoprecipitation. In vitro experiments showed that E. coli GroESL could improve the ability of purified NodD proteins to bind to their recognition sequences (Yeh et al., 2002). As was found with the chaperonin-mediated defects in B. japonicum, overexpression of another S. meliloti cpn operon was able to suppress the defects of the groEL1 mutant, although interestingly E. coli groESL could not do this (Ogawa & Long, 1995). Again, this fits with a ‘partial specificity’ model.
A comprehensive study of cpn function in S. meliloti has established that all of the five cpn operons are individually dispensable, although expression of at least one of groESL1 and groESL2 is required for growth (Bittner et al., 2007). Of these, only groESL1 is required for nodule formation and nitrogen fixation. The ability of other cpn operons to complement loss of groESL1 has been tested in two cases: groESL2 can complement if overexpressed, but groESL3 cannot (Bittner & Oke, 2006). Thus this again looks like an example of partial specificity for particular functions within the multiple cpn operons. In R. leguminosarum, deletion of the cpn60.2 and cpn60.3 genes led to a c. 50% reduction in the ability of the strain to form nodules and fix nitrogen, again suggestive of a partial but not complete specialization of these chaperonins (P.A. Lund & J.A. Downie, unpublished data).
The extent to which homologues in the same organism would be able to cross-complement has also been tested in R. leguminosarum A34 (which has three complete cpn operons, unlike the genome strain, which has four), where it was shown that only one of the cpn60 genes (cpn60.1) was essential (Rodríguez-Quiñones et al., 2005). This, the most highly expressed of the three cpn operons, is thus the ‘housekeeping’ chaperonin. Loss of the cpn60.1 gene could be partially complemented by strong overexpression of the cpn60.3 gene, but full restoration of function was not seen with the strains being quite temperature sensitive. Overexpression of the Cpn60.2 protein could not be obtained (Gould et al., 2007b). These results support the hypothesis that Cpn60.1 provides the core chaperoning capacity of the cell, and the other chaperonins can only partly substitute for this, again arguing for some degree of specialization in function.
The likelihood that there are functional differences between duplicated Cpn60 proteins in rhizobia was further supported by biochemical analysis of the three Cpn60 proteins from R. leguminosarum A34. All three were expressed in E. coli and purified to homogeneity, and their biophysical and biochemical properties were compared. The stabilities of the proteins were different, as were their ATPase activities. Most strikingly, when their abilities to refold lactate dehydrogenase were compared, it was found that Cpn60.1 and Cpn60.2 were equally competent, and roughly as effective as GroEL, whereas Cpn60.3 displayed no activity with this substrate, thus showing in principle that homologous chaperonins from the same organism can differ in their ability to chaperone the refolding of a specific protein (George et al., 2004). A degree of specificity of complexes was also found when Cpn60.1 and Cpn60.3 were coexpressed in E. coli: although mixed complexes were formed at a low frequency, the two proteins strongly preferred to assemble into homo-oligomeric complexes.
To sum up what is known about the multiple chaperonin genes in rhizobia, good evidence suggests that the unusually high level of gene duplication in these organisms is associated with several aspects of their ability to nodulate roots and fix nitrogen, but there is no evidence for absolute specificity of function. It seems more likely that the different duplicated genes encode proteins that are partially adapted to function, perhaps with a range of novel substrates, in the root nodule, but that most of the proteins are sufficiently similar that their functions overlap.
Multiple chaperonin genes in the Cyanobacteria
All complete cyanobacterial genomes (as accessed from http://xBase.bham.ac.uk on 28 August, 2008) contain two or (rarely) three cpn60 homologues. In those cases with two homologues, only one has an upstream cpn10 homologue; in those with three, two of the three cpn60 genes have cpn10 genes upstream. The cpn60.2 genes (lacking an upstream cpn10 gene) have several GGM repeats at the C-terminus, many of which are very long when compared with those in other Cpn60 proteins (see Fig. 5). The cpn60.1 genes (in operons with cpn10) do not encode a long GGM tail, a situation highly reminiscent of the duplicated genes in Actinobacteria, although the C-terminal tails do contain a few Gly and Met residues, and do not show the long His tails seen in Actinobacteria. The cyanobacterial chaperonin genes cluster into two clear phylogenetic groups, suggesting that an ancient gene duplication event has occurred and become fixed, as in the Actinobacteria (Goyal et al., 2006). In the previous section, we saw how nodulation and nitrogen fixation often correlated with the presence of multiple chaperonin genes, both in the rhizobia and in other nodulating bacteria. As Cyanobacteria are photosynthetic, an obvious question to ask is whether such a correlation also exists with photosynthesis. The answer appears to be yes, although once again the correlation is not absolute: c. 75% of photosynthetic bacteria whose complete genomes are available have two or more chaperonin genes. The pattern of chaperonin genes in photosynthetic eukaryotes varies depending on the lineage of their descent; in some cases (the cryptomonads), homologues of both cyanobacterial genes are found, and it has been proposed on the basis of this and other related studies that all phototrophic eukaryotes may require two chaperonin genes (Wastl et al., 1999; Zauner et al., 2006).
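The comparison of C-terminal tails described above can be approximated computationally. The following is a minimal sketch, assuming that counting GGM triplets in the last 30 residues is an adequate proxy for tail length; the function name, the 30-residue window, and both toy sequences are hypothetical and are not taken from any real cyanobacterial protein.

```python
import re


def ggm_tail_repeats(protein_seq, tail_length=30):
    """Count non-overlapping GGM triplets in the last `tail_length` residues."""
    tail = protein_seq[-tail_length:]
    return len(re.findall("GGM", tail))


# Hypothetical toy C-terminal fragments, invented purely for illustration.
toy_tails = {
    "toy_cpn60.1": "AEKKDDAMGGMDF",          # short tail with a single GGM
    "toy_cpn60.2": "AEGGMGGMGGMGGMGGMGGMM",  # long GGM-repeat tail
}

counts = {name: ggm_tail_repeats(seq) for name, seq in toy_tails.items()}
for name, n in counts.items():
    print(name, n)
```

Applied to real Cpn60 sequences, a count of this kind would only flag candidates; whether a tail is functionally significant cannot be inferred from repeat number alone.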
Analysis of gene expression in Cyanobacteria supports the hypothesis that the multiple cpn60 genes may have evolved some specialized functions. The cpn60 gene, which is in an operon with cpn10 (referred to as groESL by the authors; referred to here as cpn60.1) in Synechocystis strain PCC 6803 is rapidly induced by heat shock, whereas the second cpn60 gene, which has no upstream cpn10, is much slower in its induction (Lehel et al., 1993; Kovács et al., 2001). When expression in the dark is compared with that in the light, a striking difference is found, in that the heat shock induction of cpn60.2 is effectively abolished whereas that of the cpn10-cpn60.1 operon is not (Glatz et al., 1997; Asadulghani et al., 2003). Heat shock induction of cpn60.2 is also abolished by the addition of an inhibitor that blocks electron flow from photosystem II to the plastoquinone pool, whereas cpn10-cpn60.1 is less affected. This implies the existence of a link between Cpn60 expression and the photosynthetic capability of the cell. Heat stress is known to inactivate photosynthesis in Cyanobacteria (Eriksson & Clarke, 1996), and thermotolerance (the enhanced ability following a sublethal heat shock to survive a lethal heat stress) does not develop in the dark, but a direct link with Cpn60 expression has not yet been demonstrated. The details of regulation of the heat shock response in Cyanobacteria are still under active study. Both cpn60.2 and cpn10-cpn60.1 have upstream CIRCE sequences in Synechocystis sp. PCC 6803 and the transcript initiates inside these sequences (Lehel et al., 1993; Glatz et al., 1997). All analysed cyanobacterial cpn operons where both a cpn10 and a cpn60 gene are present have CIRCE sequences upstream, but many of the cpn60.2 genes do not (Kojima & Nakamoto, 2007). Knocking out the hrcA gene in Synechocystis sp. PCC 6803 derepresses transcription of both genes, but further induction can still occur on heat shock, showing that other mechanisms of regulation must also be present (Nakamoto et al., 2003; Kojima & Nakamoto, 2007). The HrcA knockout shows higher thermotolerance and reduced thermobleaching of phycocyanin than the wild type, which may be attributable to the higher levels of Cpn60 protein that these cells contain: whether the protection is specific to one or the other of the two homologues has not been tested.
The ability of Cpn60 proteins from Cyanobacteria to functionally replace GroEL in E. coli has been tested for two species: Synechocystis sp. PCC 6803 (Kovács et al., 2001) and Synechococcus vulcanus (Furuki et al., 1996; Tanaka et al., 1997). In both cases, the cpn60.1 can complement well and the cpn60.2 poorly or not at all. This is distinct from the situation in Actinobacteria, where it is the orphan Cpn60 that can function in E. coli. Interestingly, then, it appears that the possession of a GGM tail is not the sole indication of which of the duplicated chaperonins is the one that has general chaperonin functions. The prediction from these complementation experiments would be that it is the cpn60.1 gene, with an upstream cpn10, that would be essential in Cyanobacteria, and it has been shown that a strain of Synechococcus elongatus PCC 7942 in which this gene is deleted could not be constructed (Sato et al., 2007). It has also been confirmed that the cpn60.2 gene in Thermosynechococcus elongatus can be deleted without effect under normal growth conditions, although it is required for growth at both heat shock and cold shock temperatures (Sato et al., 2008).
A link between differential expression of one of the cpn60 genes, thermotolerance, and photosynthesis has also been established for Anabaena L-31, which also expresses two chaperonin proteins, which both accumulate on heat shock (Rajaram & Apte, 2003, 2008). These strains show much better protection of photosynthesis after heat stress when they are grown under nitrogen-fixing conditions than when grown in nitrogen-supplemented media, and this correlates well with the presence or absence of cpn60.2 (the gene without an associated cpn10 homologue). Not only is cpn60.2 transcription repressed under heat shock in the presence of nitrate but also pre-existing Cpn60.2 protein is degraded. Recovery of photosynthetic activity in cells that have been grown in the presence of nitrate, heat stressed, and then returned to normal growth temperatures, correlates well with the reappearance of Cpn60.2. The role of Cpn60.2 in protecting photosynthesis and also nitrate reductase activity under heat shock conditions was confirmed by overexpressing this protein in Anabaena 7120. Whether the Cpn60.1 protein has a similar effect has not been reported. The possible role of the long GGM tail in membrane association is an obvious area for future study.
In conclusion, considerable indirect evidence supports the hypothesis that the duplicated cpn60 genes in Cyanobacteria have distinct functions, with the balance of evidence suggesting that the cpn60.2 protein is important for protecting aspects of the photosynthetic activity of these strains under heat shock conditions. What might the mechanism of such protection be? It has been suggested that chaperones in general and chaperonins in particular may have a role in maintaining membrane integrity under normal and stressed conditions (reviewed recently, Horváth et al., 2008), and it has been shown both for Anabaena PCC 7120 and for Synechocystis PCC 6803 that some of the chaperonin protein associates with thylakoid membranes (Jager & Bergman, 1991; Kovács et al., 1994); the two homologues were, however, not distinguished in these early studies. In the latter case, this association is heat shock induced. More work remains to be conducted on the distributions of the two homologues in Cyanobacteria and their possible specific roles in promoting photosystem stability.
Multiple chaperonin genes in the Chlamydia
All 13 sequenced chlamydial genomes (representing six different species of Chlamydia and Chlamydophila among them) have three cpn60 genes and one cpn10 gene (Karunakaran et al., 2003; P.A. Lund, unpublished data). Because of the lack of a suitable genetic system for studying these organisms, information on the roles of these genes is scantier than for the examples described above. These genes are of particular interest because considerable evidence implicates Cpn60 in some of the symptoms of chlamydial infection (e.g. Morrison et al., 1989; Peeling et al., 1998; Sasu et al., 2001). Analysis of the sequences of the three genes shows that the best match for each protein is always its orthologue from a different species (Karunakaran et al., 2003; McNally & Fares, 2007), strongly suggesting that the three genes were present in the common ancestor of all modern day chlamydial species. Cpn60.2 and Cpn60.3 are more related to each other than either is to Cpn60.1, suggesting either two gene duplication events (McNally & Fares, 2007) or possibly acquisition of one of the ancestral genes from another bacterium. These are in fact the most highly diverged Cpn60 proteins studied in any bacterium, with sequence identities between paralogues well below 20% in some cases. Indeed, some of the mutations that have occurred during the evolution of these genes would appear to rule out their functioning as classical chaperonins at all. For example, the sequence GDGTTT, which is extremely highly conserved in all prokaryotic and eukaryotic chaperonins and is important for ATP binding and hydrolysis, is mutated to GDGAKT in Cpn60.2 and ADGVIS in Cpn60.3 from Chlamydia trachomatis. This suggests that Cpn60.1 will prove to be the housekeeping chaperonin, and that the others may have diverged to take on other functions. Consistent with this, only cpn60.1 is capable of complementing a groEL mutant of E. coli (Karunakaran et al., 2003).
Interestingly, only the cpn60.1 genes encode a protein with any sign of a GGM motif in the C-terminus, although it is highly reduced compared with other Cpn60 homologues (see Fig. 5). A detailed bioinformatic analysis of the sequences in several different Chlamydia suggests that divergence of the paralogues to bind different protein substrates is quite likely, but the nature of these substrates is unknown (McNally & Fares, 2007).
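The divergence at the ATP-binding motif noted above can be made explicit with a position-by-position comparison. The sketch below uses only the three motif strings quoted in the text for C. trachomatis; the function and variable names are illustrative, and a residue-by-residue mismatch count is a crude measure that says nothing about which substitutions are functionally tolerated.

```python
# Motif strings as quoted in the text for Chlamydia trachomatis.
CONSENSUS = "GDGTTT"
VARIANTS = {"Cpn60.2": "GDGAKT", "Cpn60.3": "ADGVIS"}


def motif_differences(consensus, variant):
    """List (position, consensus residue, variant residue) for each mismatch.

    Positions are 1-based within the six-residue motif.
    """
    if len(consensus) != len(variant):
        raise ValueError("motifs must be the same length")
    return [
        (i, c, v)
        for i, (c, v) in enumerate(zip(consensus, variant), start=1)
        if c != v
    ]


differences = {
    name: motif_differences(CONSENSUS, seq) for name, seq in VARIANTS.items()
}
for name, diffs in differences.items():
    print(name, len(diffs), diffs)
```

On these strings, Cpn60.2 differs from the conserved motif at two of six positions and Cpn60.3 at four of six, which is in line with the suggestion that both paralogues have moved away from classical ATP-dependent chaperonin function.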
The regulation of the three genes has been studied, with intriguing results. All three genes are expressed throughout the life cycle of C. trachomatis in HeLa cells, with cpn60.1 being the most highly expressed. cpn60.1 was also the only one of the three genes that was induced when cells infected by C. trachomatis were heat shocked (Karunakaran et al., 2003). Expression of the cpn10 gene (which is upstream of the cpn60.1 gene) was also induced, and Western blot analysis of elementary bodies with antibodies specific for each paralogue also showed higher levels of expression of Cpn60.1. Consistent with this, the cpn60.1 operon has a CIRCE sequence upstream whereas the other two cpn60 genes do not (Tan et al., 1996; Karunakaran et al., 2003). Further studies have shown not only that the expression of the cpn60.1 operon is indeed regulated via HrcA, but that the Cpn60.1 protein has a direct role in assisting the ability of HrcA to act as a repressor at CIRCE sites, while the other two Cpn60 proteins showed no such ability (Wilson & Tan, 2004; Wilson et al., 2005). However, in a separate study using Hep2 cells, cpn60.3 was the most highly expressed, and in a monocyte system, which has features of persistence, Cpn60.2 was by far the most highly expressed (Gérard et al., 2004). Expression levels in synovial tissue from patients with chlamydial-induced reactive arthritis showed high expression of cpn60.2 and cpn60.1 (with cpn60.2 always higher), and almost no expression of cpn60.3. These data are consistent with a model where the three genes are independently and differentially regulated and play different roles during the course of the chlamydial life cycle.
This survey of four different groups of bacteria where multiple chaperonin genes are common provides strong circumstantial evidence, and more limited direct evidence, that a degree of subfunctionalization has taken place in the proteins encoded by these duplicated genes. In some cases (notably the rhizobia), the specialization of the proteins to carry out different functions is only partial, in that overexpression of other cpn60 genes from the same organism can partially suppress phenotypes caused by the loss of individual cpn60 genes. Whether this represents an example of evolution still being very much in progress is not clear. In other cases, the evidence is good for a greater degree of specialization of function, but even here the evidence is incomplete. The most convincing example is M. smegmatis, where a very clear phenotype (loss of biofilm formation) has been linked to association of one of the chaperonin proteins with a specific protein. Gene divergence is highest in the Chlamydia, but the lack of adequate genetic tools for studying these organisms means that the significance of this is hard to assess. As Table 1 shows, there are many other intriguing examples of additional bacterial genomes with multiple chaperonin genes present, but very little experimental work has been performed on them to date.
Much more remains to be done in this field. The wealth of information on multiple Cpn proteins present within large numbers of completed genome sequences has barely begun to be studied in depth with bioinformatic tools, and the genetic analysis of these multiples is currently restricted to a small number of bacteria. The construction of chimeras and site-directed mutants has the potential to map regions of chaperonins that may be important in understanding their specialized functions (and will also enable the testing of hypotheses that may emerge from bioinformatics analysis). It is important, too, that studies on phenotypes are not restricted by what is known about GroEL function in E. coli, and that the possibility that some duplicated chaperonins may take on entirely new cellular roles should always be entertained when designing whole organism studies. Data are already available showing that chaperonins can take on many diverse functions other than just protein folding: these include acting as a lectin (Benkirane et al., 1992), acting to stabilize membranes (Török et al., 1997), acting as neurotoxins (Yoshida et al., 2001), and having diverse roles in bacterial infection (Henderson et al., 2006). Ultimately, it is to be hoped that the specialization of chaperonin proteins that appears to be occurring in some bacteria with duplicated genes will be explicable at a structural level, and for this reason more effort will need to be expended on determining structures of chaperonins with novel functions. The possibility that such studies will reveal new aspects of chaperonin function should be attractive not only to chaperone aficionados, but also to all biologists interested in establishing the links between genome content and organismal function.
I am very grateful to Nick Loman for help with the database searches. Work in my laboratory on multiple chaperonin genes has been supported by the Biotechnology and Biological Sciences Research Council, the Wellcome Trust, NATO, and the Darwin Trust of Edinburgh.