Two-component systems (TCSs) are common signal transduction systems, typically comprising paired histidine protein kinase (HK) and response regulator (RR) proteins. In many examples, it appears that RR and HK genes have fused, producing a “hybrid kinase.” We have characterized a set of prokaryotic genes encoding RRs, HKs, and hybrid kinases, allowing us to examine gene fusion and fission. The primary factors correlating with fusion rates are the presence of transmembrane helices in HKs and the presence of DNA-binding domains in RRs, features that require correct (and separate) spatial localization. In the absence of such features, fused genes are relatively abundant. The order of paired HK and RR genes and the nucleotide distance between encoded domains also correlate with apparent gene fusion rates. We propose that localization requirements and the relative positioning of encoded domains within TCS genes affect the function (and therefore retention) of hybrid kinases resulting from gene fusion.
Two-component systems (TCSs) comprise the majority of prokaryotic phosphotransfer signal transduction pathways and regulate a wide variety of cellular processes (Hoch and Silhavy 1995), suggesting that TCS pathway architecture represents a robust platform upon which evolution can act. An appreciation of the evolutionary forces that have acted upon TCSs will enhance our understanding of the properties of contemporary systems. To this end, we have characterized gross-level evolutionary changes acting on TCS genes: fusions and fissions.
Typical TCSs consist of a histidine protein kinase (HK) and a response regulator (RR) (fig. 1 A and B). Generally, HKs possess an N-terminal “input” domain, which perceives an environmental stimulus, and a C-terminal transmitter domain, which autophosphorylates upon stimulation. Input domains are often transmembrane (TM), perceiving extracellular signals. Typical RRs have a C-terminal effector/output domain and an N-terminal receiver domain, which is phosphorylated upon interaction with the transmitter domain of its partner HK. This alters the activity of the output domain, which often regulates transcription (Parkinson and Kofoid 1992). In bacteria, HK and RR genes encoding an entire TCS are typically found paired in the genome. However, in many cases the 2 genes have apparently fused, resulting in a “hybrid kinase” containing transmitter and receiver domains (fig. 1 C and D).
Methods
On 26 February 2007, 457 completed bacterial genomes were downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/. The presence of receiver and transmitter domains within annotated gene products was assessed as described previously (Cock and Whitworth 2007) using RPS-Blast (Marchler-Bauer et al. 2002). Isolated TCS genes had no other TCS gene within 5,000 nt; paired TCS genes were adjacent, on the same DNA strand, and had no other TCS gene within 5,000 nt. Interdomain distances were derived assuming that each domain spanned the region identified by an RPS-Blast hit. TM predictions were made using TMHMM v2.0 (Sonnhammer et al. 1998). The presence of DNA-binding domains was determined by RPS-Blast hits to a manually compiled list of PFAM domains (pfam00126, 00165, 00196, 00249, 00440, 00447, 00486, 01022, 01381, 02954, 04397, 04545, and 04967) using an expectation value cut-off of 10⁻⁴.
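Read literally, the neighborhood rules for isolated and paired TCS genes amount to a small classifier, sketched below in Python. It is an illustration under stated assumptions rather than the pipeline actually used: the `Gene` record, 1-based inclusive coordinates, reading "within 5,000 nt" as at most 5,000 nt of intervening sequence, reading "adjacent" as consecutive in gene order, and ignoring wrap-around on circular chromosomes are all our own choices, and the gene names are hypothetical.

```python
from dataclasses import dataclass

WINDOW_NT = 5_000  # neighborhood size from the Methods


@dataclass(frozen=True)
class Gene:
    name: str
    start: int    # 1-based, inclusive (assumed convention)
    end: int
    strand: str   # "+" or "-"
    is_tcs: bool  # encodes a receiver and/or transmitter domain


def gap(a: Gene, b: Gene) -> int:
    """Nucleotides lying between two genes; 0 if they abut or overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def classify(genes: list[Gene]) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (isolated TCS gene names, paired TCS gene names) for one replicon.

    Ignores wrap-around on circular chromosomes (a simplifying assumption).
    """
    genes = sorted(genes, key=lambda g: g.start)
    tcs = [g for g in genes if g.is_tcs]

    def has_tcs_neighbor(g: Gene, ignore: set[str]) -> bool:
        # True if any TCS gene not in `ignore` lies within WINDOW_NT of g.
        return any(o.name not in ignore and gap(g, o) <= WINDOW_NT for o in tcs)

    isolated = [g.name for g in tcs if not has_tcs_neighbor(g, {g.name})]

    pairs = []
    for a, b in zip(genes, genes[1:]):  # neighbors in gene order
        if a.is_tcs and b.is_tcs and a.strand == b.strand:
            both = {a.name, b.name}
            if not (has_tcs_neighbor(a, both) or has_tcs_neighbor(b, both)):
                pairs.append((a.name, b.name))
    return isolated, pairs


# Hypothetical replicon: an HK/RR pair, a non-TCS gene, and a distant hybrid kinase.
genes = [
    Gene("hk1", 1_000, 2_500, "+", True),
    Gene("rr1", 2_520, 3_200, "+", True),
    Gene("orf7", 3_300, 4_000, "+", False),
    Gene("hyb1", 20_000, 22_000, "-", True),
]
isolated, pairs = classify(genes)
print(isolated, pairs)  # ['hyb1'] [('hk1', 'rr1')]
```

Here the pair members are not themselves "isolated", because each lies within 5,000 nt of the other; a pair only qualifies when no third TCS gene is nearby.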
Results and Discussion
By comparing hybrid kinases with TCSs encoded by 2 separate genes, we have identified factors correlating with gene fusion. A total of 23,996 TCS genes were identified from the 457 completely sequenced bacterial genomes (Methods), a number similar to that expected by extrapolation from other studies (Koretke et al. 2000; Kim and Forst 2001; Zhang and Shi 2005). TCS genes were then classified as RRs, HKs, or hybrid kinases, and their gene organization was assessed. We then focused on “minimal TCSs,” which encode single transmitter and receiver domains (either as paired HK and RR genes or as isolated hybrid kinase genes). We use abbreviations to capture the relative order of the encoded transmitter and receiver domains of minimal TCSs and whether the TCS is encoded by 1 or 2 genes. Thus, T+R denotes a TCS with an HK gene upstream of an RR gene, R−T denotes a hybrid kinase with an N-terminal receiver domain, and RT describes a TCS with a receiver domain encoded upstream of a transmitter domain (independent of whether 1 or 2 genes are involved); R+T and T−R are defined analogously. Among the minimal TCSs, there were similar numbers of TR and RT systems (2,838 and 2,763, respectively); however, the proportion of single-gene systems (and therefore the apparent fusion rate) differed markedly between TR and RT geometries: 27% and 3%, respectively, were hybrid kinases (independence rejected by a chi-squared test, P = 1.0 × 10⁻¹³¹).
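The geometry notation and the apparent fusion rate can be made concrete with a short Python sketch. The function names and the `(first domain, number of genes)` input format are hypothetical, chosen only for illustration; the fusion rate is computed as the fraction of single-gene systems within each of the TR and RT geometries, as described above.

```python
from collections import Counter

MINUS = "\u2212"  # minus sign, as used in the T−R / R−T notation


def geometry_label(first: str, n_genes: int) -> str:
    """Label a minimal TCS, e.g. "T+R" (HK gene upstream of RR gene) or "R−T"
    (hybrid kinase with an N-terminal receiver domain).

    first: "T" if the transmitter domain is encoded first, "R" if the receiver is.
    n_genes: 2 for a paired HK + RR, 1 for a hybrid kinase.
    """
    if first not in ("T", "R") or n_genes not in (1, 2):
        raise ValueError("first must be 'T' or 'R'; n_genes must be 1 or 2")
    second = "R" if first == "T" else "T"
    joiner = "+" if n_genes == 2 else MINUS
    return f"{first}{joiner}{second}"


def fusion_rates(systems) -> dict:
    """Fraction of hybrid kinases (single-gene systems) within each geometry.

    systems: iterable of (first, n_genes) tuples, one per minimal TCS.
    Returns {"TR": fraction, "RT": fraction}; NaN if a geometry is absent.
    """
    counts = Counter(geometry_label(first, n) for first, n in systems)
    rates = {}
    for first, geometry in (("T", "TR"), ("R", "RT")):
        fused = counts[geometry_label(first, 1)]
        total = fused + counts[geometry_label(first, 2)]
        rates[geometry] = fused / total if total else float("nan")
    return rates


# Hypothetical input: 3 T+R pairs, 1 T−R hybrid, 4 R+T pairs.
systems = [("T", 2)] * 3 + [("T", 1)] + [("R", 2)] * 4
print(fusion_rates(systems))  # {'TR': 0.25, 'RT': 0.0}
```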
Two features important in TCS function that might affect TCS gene fusion are TM helices in input domains and DNA-binding domains, as these domains require separate spatial localization for function. TCSs were assessed for the presence of TM helices and DNA-binding domains (Methods). In all, 83% of minimal TCSs possessed TM helices and 25% contained DNA-binding domains (table 1). Most HKs possessed TM helices (4,183 of 4,739 [88%]), as did a large proportion of hybrid kinases (458 of 862 [53%]). The proportion of kinases with TM helices was much higher in R+T systems than in T+R systems (97% and 77%, respectively) and much lower in R−T systems than in T−R systems (1% and 60%, respectively). This suggests that there has been selective pressure against the formation and/or retention of R−T systems that possess TM helices (and, to a lesser extent, T−R systems). For RT systems, gene fusion would place the TM region in the middle of the protein rather than at its N-terminus, presumably interfering with correct membrane insertion. Indeed, our screens identified only 1 such protein, StyS of Xanthomonas axonopodis pv. citri (GI: 21244368).
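The per-geometry TM proportions quoted above come down to flagging each kinase with at least 1 predicted helix and tallying by geometry. The Python sketch below assumes TMHMM's one-line-per-protein output, with the protein id first and a `PredHel=N` field giving the number of predicted helices; that layout, the function names, and all protein ids and values are assumptions for illustration, not a reproduction of the original analysis.

```python
import re
from collections import defaultdict

PREDHEL = re.compile(r"\bPredHel=(\d+)")


def parse_tmhmm_short(lines) -> dict:
    """Map protein id -> number of predicted TM helices from one-line-per-protein
    TMHMM output (assumed layout: id first, then key=value fields incl. PredHel=N)."""
    helices = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = PREDHEL.search(line)
        if match:
            helices[line.split()[0]] = int(match.group(1))
    return helices


def tm_fraction_by_geometry(kinases, helices) -> dict:
    """kinases: iterable of (protein id, geometry label), one kinase per minimal TCS.
    Returns the fraction of kinases in each geometry with at least 1 predicted helix."""
    tally = defaultdict(lambda: [0, 0])  # geometry -> [with TM helices, total]
    for protein_id, geometry in kinases:
        tally[geometry][1] += 1
        if helices.get(protein_id, 0) > 0:
            tally[geometry][0] += 1
    return {g: with_tm / total for g, (with_tm, total) in tally.items()}


# Hypothetical TMHMM lines and kinases.
tmhmm_lines = [
    "hk_A\tlen=457\tExpAA=44.10\tFirst60=22.01\tPredHel=2\tTopology=i13-35o186-208i",
    "hk_B\tlen=380\tExpAA=0.52\tFirst60=0.10\tPredHel=0\tTopology=o",
    "hyb_C\tlen=812\tExpAA=21.30\tFirst60=20.90\tPredHel=1\tTopology=o10-32i",
]
helices = parse_tmhmm_short(tmhmm_lines)
kinases = [("hk_A", "T+R"), ("hk_B", "T+R"), ("hyb_C", "T\u2212R")]
print(tm_fraction_by_geometry(kinases, helices))  # {'T+R': 0.5, 'T−R': 1.0}
```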
| | T+R | T−R | R+T | R−T | Total |
|---|---|---|---|---|---|
| TM helices + DNA | 880 | 33 | 139 | 0 | 1,052 |
There also appears to be a relationship between TCS geometry and the presence of DNA-binding domains (table 1). Of the 4,739 RRs within minimal TCSs, 29% possess DNA-binding output domains. Few hybrid kinases were found to have DNA-binding domains (5% of T−R and 0% of R−T fusion proteins), and the same was true of R+T TCSs (6%); however, 57% of T+R systems had predicted DNA-binding RRs. Thus, it seems that hybrid kinases possessing DNA-binding domains are selected against, and that this selection is stronger for R−T hybrids than for T−R hybrids (0% of DNA-binding RT-geometry TCSs are hybrid kinases, compared with 3% of DNA-binding TR-geometry TCSs). In RT geometry, gene fusion would move the C-terminal DNA-binding domain of the RR into the center of the hybrid kinase, presumably rendering it nonfunctional.
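The DNA-binding criterion from the Methods (a hit to one of the listed PFAM families at or below an E-value of 10⁻⁴) can be expressed as a set lookup plus a threshold, then tallied per geometry. In this Python sketch, parsing of RPS-Blast output is not shown: each protein's hits are assumed to be already reduced to `(PFAM accession, E-value)` tuples. Treating the cut-off as inclusive is our assumption, and the function names and example hits are hypothetical.

```python
# PFAM families listed in the Methods as DNA-binding output domains.
DNA_BINDING_PFAM = frozenset("pfam" + n for n in (
    "00126", "00165", "00196", "00249", "00440", "00447", "00486",
    "01022", "01381", "02954", "04397", "04545", "04967",
))
EVALUE_CUTOFF = 1e-4  # expectation value cut-off from the Methods


def has_dna_binding(hits) -> bool:
    """hits: iterable of (PFAM accession, E-value) RPS-Blast hits for one protein.
    Counts a hit at or below the cut-off (inclusive boundary is an assumption)."""
    return any(acc in DNA_BINDING_PFAM and evalue <= EVALUE_CUTOFF
               for acc, evalue in hits)


def dna_fraction_by_geometry(systems) -> dict:
    """systems: iterable of (geometry label, hits for the RR or hybrid kinase)."""
    tally = {}
    for geometry, hits in systems:
        with_dna, total = tally.get(geometry, (0, 0))
        tally[geometry] = (with_dna + has_dna_binding(hits), total + 1)
    return {g: with_dna / total for g, (with_dna, total) in tally.items()}


# Hypothetical hits: the last protein's listed-family hit fails the E-value cut-off.
systems = [
    ("T+R", [("pfam00072", 3e-30), ("pfam00486", 2e-20)]),
    ("T+R", [("pfam00072", 1e-25)]),
    ("R\u2212T", [("pfam00072", 4e-28), ("pfam00196", 0.01)]),
]
print(dna_fraction_by_geometry(systems))  # {'T+R': 0.5, 'R−T': 0.0}
```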
The chi-squared test still rejects independence between gene order and the proportion of fused genes when TCSs with TM helices or with DNA-binding domains are excluded (P = 6.1 × 10⁻³ and 1.6 × 10⁻²³⁷, respectively). However, considering only those TCSs that lack both TM helices and DNA-binding domains, we see similar proportions of fused genes in TR and RT geometries (61% RT, 65% TR; P value for independence 0.416), implying that TM helices and DNA-binding domains are the main factors affecting the propensity for gene fusion. It also suggests that, in the absence of domains requiring specific spatial localization, evolution generates (and/or retains) fused gene products at a higher frequency than separated gene pairs, in agreement with more general studies (Snel et al. 2000; Kummerfeld and Teichmann 2005).
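The stratified comparison above is a 2 × 2 test of independence (geometry TR or RT versus fused or separate) on a filtered subset. The Python sketch below computes Pearson's statistic without a continuity correction; whether the original analysis applied a correction is not stated, so that choice is an assumption. For 1 degree of freedom the P value can be obtained from the standard library's `math.erfc`, avoiding a statistics package. The function names and the example counts are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test of independence for the table [[a, b], [c, d]].

    1 degree of freedom, no continuity correction (an assumption). Returns
    (statistic, P value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be nonzero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df the statistic is the square of a standard normal variable, so
    # P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fusion_table(systems) -> tuple[int, int, int, int]:
    """Counts (T−R, T+R, R−T, R+T) among TCSs lacking both TM helices and
    DNA-binding domains; systems: iterable of (geometry label, has_tm, has_dna)."""
    counts = dict.fromkeys(("T\u2212R", "T+R", "R\u2212T", "R+T"), 0)
    for geometry, has_tm, has_dna in systems:
        if not (has_tm or has_dna):
            counts[geometry] += 1
    return tuple(counts.values())


# Rows are (TR fused, TR separate) and (RT fused, RT separate); counts are hypothetical.
stat, p = chi2_2x2(10, 20, 30, 40)
print(round(stat, 3), round(p, 3))  # 0.794 0.373
```

In practice the four cells would come from `fusion_table(...)`, unpacked directly into `chi2_2x2`.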
We also investigated whether the genetic structure of minimal TCSs might be related to gene fusion rates. Figure 2 shows the separation (in nucleotides) between encoded transmitter and receiver domains for the 4 TCS geometries. Domains tend to be much closer to each other in TR than in RT systems, because most RRs possess C-terminal output domains and typical HKs have N-terminal input domains. A subpopulation of T−R systems was apparent with an average domain separation of ∼500 bp (fig. 2), ∼400 bp larger than the main population, suggesting the presence of an additional domain between the transmitter and receiver domains. A periodicity of ∼400 bp was also observed for the interdomain distances of RT systems (fig. 2), presumably reflecting whole numbers of intervening domains between the transmitter and receiver domains. Comparison of the domain separation distributions for T−R and T+R systems shows that domain separation is significantly larger for T−R than for T+R systems (fig. 2). Ten T−R TCSs, compared with 1,161 T+R systems, have an interdomain distance of <50 nt (distribution modes of 69 and 26 nt, respectively). This suggests that a linker region of minimal length is required for appropriate interaction between the receiver and transmitter domains in a T−R hybrid kinase.
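The interdomain distance, its mode, the count below 50 nt, and a binned histogram of the kind used to spot a ∼400 nt periodicity can be sketched in Python as follows. Each domain is assumed to be a `(start, end)` nucleotide span taken from an RPS-Blast hit, 1-based and inclusive, expressed along the direction of transcription; those conventions, the 100 nt bin width, the function names, and the example spans are all assumptions for illustration.

```python
from collections import Counter
from statistics import multimode


def interdomain_distance(upstream, downstream) -> int:
    """Nucleotides separating two encoded domains.

    Each domain is a (start, end) span in nt, 1-based and inclusive, measured along
    the direction of transcription (assumed); 0 if the spans touch or overlap.
    """
    return max(0, downstream[0] - upstream[1] - 1)


def summarize(distances, short_cutoff=50, bin_nt=100):
    """Return (modal distance(s), count below short_cutoff, binned histogram).

    The histogram maps bin start (nt) -> count; peaks spaced ~400 nt apart would
    be consistent with whole intervening domains.
    """
    modes = multimode(distances)
    n_short = sum(d < short_cutoff for d in distances)
    histogram = Counter((d // bin_nt) * bin_nt for d in distances)
    return modes, n_short, dict(sorted(histogram.items()))


# Hypothetical (upstream domain, downstream domain) spans for four TCSs.
spans = [
    ((1, 300), (370, 700)),
    ((1, 280), (350, 690)),
    ((1, 310), (337, 650)),
    ((1, 290), (800, 1_100)),
]
distances = [interdomain_distance(up, down) for up, down in spans]
print(distances)             # [69, 69, 26, 509]
print(summarize(distances))  # ([69], 1, {0: 3, 500: 1})
```

`statistics.multimode` is used rather than `mode` so that ties between equally common distances are reported rather than hidden.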
In summary, our findings suggest that TM helices and DNA-binding domains are the primary factors correlating with observed rates of TCS gene fusion. In the absence of such domains, there appears to be a general tendency toward the formation (and/or retention) of fused TCSs. A further consideration is the relative genetic distance between encoded transmitter and receiver domains, which appears to be related to apparent TCS gene fusion rates in a geometry-specific manner.
Eighty-nine percent of minimal TCSs contain at least 1 TM helix and/or DNA-binding domain, suggesting that the role of the majority of TCSs is to couple extracellular sensing with transcriptional responses. This ability is presumably removed by gene fusion events, which enforce a single cellular location upon the entire TCS. However, the fusion event also removes a diffusion-limited step in signal transduction (formation of the HK:RR complex), such that the resulting hybrid kinase would, if functional, exhibit an increase in signaling speed and efficiency. Such fused TCSs could be regarded as a step backwards toward one-component systems, which consist of single proteins directly coupling an input and output domain (Ulrich et al. 2005). However, the newly formed hybrid kinase would retain its phosphotransfer signaling mechanism, providing additional opportunities for modulation of signal transduction by extrinsic kinases and phosphatases.
P.J.A.C. was funded by an Engineering and Physical Sciences Research Council studentship. D.E.W. was funded by Biotechnology and Biological Sciences Research Council grants P16665 and BBD0039891.