Summary: Biofilms are complex microbial communities found at surfaces that are often associated with extracellular polysaccharides. Biofilm formation is a complex process that has only recently begun to be understood at the molecular level. We have identified a novel domain that we call the G5 domain (named after its conserved glycine residues), which is found in a variety of bacterial enzymes such as streptococcal IgA peptidases and various glycosyl hydrolases. The G5 domain is also found in the accumulation-associated protein (AAP), which is an important component in biofilm formation in Staphylococcus epidermidis. A common feature of the proteins containing G5 domains is N-acetylglucosamine binding, and we attribute this function to the G5 domain.
In nature, a large fraction of bacteria are found associated with surfaces. Bacteria attached to a surface form a biofilm (O'Toole et al., 2000; Sauer, 2003) that protects them from harsh physical conditions and forces. Biofilms also provide protection from attack by antibiotics and the immune system. The formation of biofilms occurs in several steps: initial adherence to a surface (adhesion) is followed by microcolony formation (growth and recruitment), leading to maturation (structures). Biofilms are also medically important since they lead to chronic infection of implants and recalcitrance to antimicrobial therapy. Here, we report a novel domain found in a wide variety of proteins, including some involved in biofilm formation, that may help to dissect the function of these proteins.
METHODS AND RESULTS
The G5 domain was first noticed as a family in the Pfam-B database. Pfam-B is an automatically generated clustering of proteins that is supplemental to the Pfam resource of protein family alignments and profile hidden Markov models (HMMs) (Bateman et al., 2004). Pfam-B is directly derived from PRODOM, and entry Pfam-B_921 corresponds to PRODOM entry PD005454 in the 2002.1 release (Corpet et al., 2000). This Pfam-B family contained 35 members found in a wide variety of different domain architectures in proteins associated with bacterial pathogenesis and antibiotic resistance. To detect further members of the family, HMMs were built from the Pfam-B alignment using the HMMER software (http://hmmer.wustl.edu/) in both global (ls) and local (fs) modes and iteratively searched against Swiss-Prot and TrEMBL until no new sequences were found. Inclusion E-value thresholds of 0.1 and 0.01 were set for global and local HMMs, respectively. This resulted in the detection of 90 copies of the domain, which was added to the Pfam database with accession number PF07501. PSI-BLAST searches were carried out from a number of starting points to validate the alignment and the membership of the original Pfam-B alignment.
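The iterate-until-convergence logic of the search described above can be illustrated with a minimal sketch. This is not the pipeline used for this work: the function names, the callback `search` (a hypothetical stand-in for "build an HMM from the current members and search the sequence database"), the inclusive comparison against the threshold and the toy E-values are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, Iterable, Set


def iterative_search(
    seed: Iterable[str],
    search: Callable[[Set[str]], Dict[str, float]],
    inclusion_evalue: float,
    max_rounds: int = 50,
) -> Set[str]:
    """Grow a family until a search round adds no new members.

    `search` is a hypothetical callback: given the current members it
    returns {sequence_id: E-value} for the hits of one search round.
    """
    members = set(seed)
    for _ in range(max_rounds):
        hits = search(members)
        # Assumption: a hit is included when its E-value is at or below the threshold.
        accepted = {sid for sid, evalue in hits.items() if evalue <= inclusion_evalue}
        new_members = accepted - members
        if not new_members:
            break  # converged: no new sequences were found
        members |= new_members
    return members


def toy_search(members: Set[str]) -> Dict[str, float]:
    """Made-up search: each member 'detects' fixed neighbours with invented E-values."""
    links = {
        "seqA": {"seqB": 1e-5},
        "seqB": {"seqC": 0.05},
        "seqC": {"seqD": 0.5},
    }
    hits: Dict[str, float] = {}
    for member in members:
        for sid, evalue in links.get(member, {}).items():
            hits[sid] = min(evalue, hits.get(sid, float("inf")))
    return hits


# Thresholds mirror the values quoted in the text (0.1 global, 0.01 local).
print(sorted(iterative_search({"seqA"}, toy_search, 0.1)))   # ['seqA', 'seqB', 'seqC']
print(sorted(iterative_search({"seqA"}, toy_search, 0.01)))  # ['seqA', 'seqB']
```

In this toy case the looser threshold recruits one additional sequence in a second round, whereas the stricter threshold converges after the first round; a real search would rebuild the model from a realigned family at each iteration.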
This region ranges in size from 76 residues in the beta-galactosidase protein from Streptococcus pneumoniae (Q8L3E9) to 84 residues in the methicillin-resistant surface protein from Staphylococcus aureus (P80544). The alignment of the G5 domain contains a few highly conserved residues (see the multiple sequence alignment shown in Figure 1). None of these conserved residues is of the polar type typically found in active sites; hence, it seems unlikely that this region has an enzymatic function. However, in nearly all cases the G5 domain is associated with a known enzymatic domain. Therefore, the G5 domain may confer localization or substrate specificity on the proteins in which it is found. Alternative functions could be allosteric regulation of the enzymatic domain or co-factor binding.
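As an illustration of how highly conserved alignment columns of the kind discussed above might be identified, the following is a minimal sketch. The function name, the 80% cut-off and the convention of counting gaps against conservation are assumptions, and the toy alignment is invented; it is not taken from the G5 alignment in Figure 1.

```python
from collections import Counter
from typing import List, Tuple


def conserved_columns(
    alignment: List[str], min_fraction: float = 0.8
) -> List[Tuple[int, str, float]]:
    """Return (column index, residue, fraction) for highly conserved columns.

    The fraction is computed over all sequences, so gaps ('-' or '.')
    count against conservation (an assumption of this sketch).
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(width):
        residues = [row[col] for row in alignment if row[col] not in "-."]
        if not residues:
            continue  # column consists only of gaps
        residue, count = Counter(residues).most_common(1)[0]
        fraction = count / len(alignment)
        if fraction >= min_fraction:
            conserved.append((col, residue, fraction))
    return conserved


# Invented toy alignment, for illustration only.
toy_alignment = [
    "AG-KG",
    "TGLKG",
    "SGLRG",
    "AGL-G",
]
print(conserved_columns(toy_alignment))  # [(1, 'G', 1.0), (4, 'G', 1.0)]
```

Here only the two invariant glycine columns pass the cut-off; the residue classes of such columns can then be inspected to judge whether they resemble a catalytic site.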
The Pls protein from S.aureus (Swiss: PLS_STAAU, P80544) contains five copies of the G5 domain (Fig. 2). The Pls protein is cleaved between residues 387 and 388 by plasmin (hence the name Pls, for plasmin-sensitive) (Savolainen et al., 2001). The pls gene is carried on a mobile genetic element (SCCmec type I) that contains the β-lactam resistance gene mecA and is associated with some methicillin-resistant S.aureus (MRSA) strains (Ito et al., 2001). Staphylococcus epidermidis also contains a number of proteins related to Pls (Hussain et al., 1997) that contain between four and seven G5 domains. These proteins have been called accumulation-associated proteins and they appear to have a role in biofilm formation. It has been suggested that these proteins bind to PIA, the β-1,6-linked glucosaminoglycan that is the key matrix component of biofilms in S.epidermidis. The biofilm-binding capacity of the Pls may therefore be crucial for the ability of S.epidermidis cells to be recruited and integrated into a maturing biofilm. As a corollary to this, the presence of the pls gene on the SCCmec type I element may enhance antibiotic resistance by promoting biofilm formation in some MRSA strains.
Some strains of S.aureus also contain a homologue of Pls, SasG, which contains seven G5 domains. In addition to the G5 domains, both Pls and SasG contain an N-terminal domain that has been demonstrated to be involved in host ligand binding (Roche et al., 2003). We have found that this region is likely to be a lectin domain. Adherence assays have demonstrated that both SasG and Pls promote binding to human desquamated nasal epithelial cells, and that it is the N-terminal domain that binds an as yet unidentified ligand on the epithelial cells (Roche et al., 2003). Recombinant protein studies have demonstrated that the G5 domain-containing region of SasG is not involved in adherence to the host molecule. In addition to its ability to promote host–cell interactions, the SasG protein also causes bacterial cell–cell interactions. Expression of sasG in recombinant Lactococcus lactis caused aggregation, whereas aggregation was absent in the wild-type cells lacking sasG (Roche et al., 2003).
IgA1 peptidases are important proteins for pathogens that colonize mucosal surfaces. To evade the host immune defences, these bacteria produce a peptidase that cleaves IgA1, the predominant immunoglobulin of mucosal surfaces. This activity has evolved independently multiple times, since IgA1 peptidases from different classes of bacteria are unrelated. The IgA1 peptidases from streptococci are metallopeptidases that belong to family M26 (Rawlings et al., 2002).
The G5 domain is found in a number of S.pneumoniae glycosyl hydrolase enzymes. These enzymes are thought to help the organism break down oligosaccharides in its environment to provide a carbon source. The β-N-acetylglucosaminidase known as StrH (Swiss: STRH_STRPN, Q8DRL6) contains two G5 domains as well as two domains of the glycosyl hydrolase 20 family, which catalyse the hydrolysis of terminal non-reducing N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides (Clarke et al., 1995).
EndoD (endo-β-N-acetylglucosaminidase) from S.pneumoniae cleaves the di-N-acetylchitobiose structure (a 1–4 linked disaccharide of N-acetylglucosamine) in N-linked oligosaccharides and generally acts on complex oligosaccharides after the removal of external sugars by β-galactosidase and β-N-acetylglucosaminidase. The catalytic activity of this protein resides in the N-terminal region, which belongs to glycosyl hydrolase family 85. Muramatsu et al. (2001) noted that the C-terminus of EndoD was related to S.pneumoniae β-galactosidase in the region that contains the G5 domain. The G5 domain seems likely to bind to the N-acetylglucosamine residues of di-N-acetylchitobiose, providing substrate specificity for the enzyme. In other proteins, this binding function could lead to a more general localization to the cell surface and provide an adhesive function.
The G5 domain is found at the C-terminus of some members of the VanW protein family. VanW is found as part of the vancomycin resistance gene cluster, but the function of these proteins has not yet been identified. Vancomycin is often seen as a drug of last resort because some bacterial strains have become resistant to so many other antibiotics. Vancomycin acts by binding to the peptide component of unlinked peptidoglycan and blocking the action of transpeptidase enzymes. If VanW does have a direct or indirect role in vancomycin resistance, then the potential function of the G5 domain in binding N-acetylglucosamine (NAG) could provide localization to the NAG component of peptidoglycan. VanW may bind to other vancomycin resistance proteins and help to localize them to unlinked peptidoglycan chains. The G5 domain-containing VanW proteins are specific to clostridial species and Deinococcus and are not found in the clinically important enterococcal strains.
The G5 domain is currently found widely distributed in both high- and low-GC Gram-positive eubacterial genomes. It is particularly expanded in the streptococcal genomes, with many of the domain architectures seen only in this group of bacteria. Based on the ability of AAP to bind PIA, as well as the appearance of G5 domains in StrH and EndoD, which act on N-acetylglucosamine-related substrates, we suggest that this domain may bind NAG. Thus, we suggest that the G5 domain is an important molecular component in a variety of biological situations, including cell wall degradation as well as biofilm formation.
Supplementary data for this paper are available on Bioinformatics online.
We would also like to thank Neil Rawlings and David Studholme for useful comments on the manuscript. A.B. is supported by the Wellcome Trust.