Most classical integrases of prokaryotic genetic elements specify integration into tRNA or tmRNA genes. Sequences shared between element and host integration sites suggest that crossover can occur at any of three sublocations within a tRNA gene, two with flanking symmetry (anticodon-loop and T-loop tDNA) and the third at the asymmetric 3′ end of the gene. Integrase phylogeny matches this classification: integrase subfamilies use exclusively either the symmetric sublocations or the asymmetric sublocation, although tRNA genes of several different aminoacylation identities may be used within any subfamily. These two familial sublocation preferences imply two modes by which new integration site usage evolves. The tmRNA gene has been adopted as an integration site in both modes, and its distinctive structure imposes some constraints on proposed evolutionary mechanisms.
Received May 23, 2001; Revised November 9, 2001; Accepted November 14, 2001.
Genetic elements capable of horizontal transfer and integration into host chromosomes are key agents in the evolution of cellular genomes; bacterial pathogenicity in particular is shaped by integrative elements that can carry with them genes promoting virulence. Integration site specificity is determined primarily by the integrase enzyme typically encoded within elements of several types: temperate bacteriophages, integrative plasmids, pathogenicity islands and conjugative and mobilizable elements. Integrases have also been harnessed for stably inserting foreign genes into bacterial chromosomes in the construction of useful strains. It is therefore important to understand both the mechanism and evolution of integration site usage.
The longest studied integrase, from phage lambda, has the following prototypical properties (1): it (i) catalyzes integration and excision of a genetic element, (ii) has one highly preferred integration site in the host chromosome (attB), (iii) recombines segments of identical sequence within attB and the attP region in the element (2,3), and (iv) belongs to the tyrosine recombinase family. This large protein family, defined by sequence consensus in the C-terminal domain that includes invariant residues involved in catalysis (4,5), contains many members that do not fit the above criteria for a classical integrase; some are involved in resolution of chromosome or plasmid multimers, or shufflon or integron function, and yet other members recombine non-identical sites. Phylogenetic analysis of the tyrosine recombinases has revealed several subfamilies (4), but a robust phylogenetic tree has been difficult to construct, as might be expected for a functionally diverse protein family that often promotes mobility of its own genes.
The lambda integrase is heterobivalent; each molecule can simultaneously bind two types of DNA site, with partitioning among domains: the central domain binds the ‘core’-type site and the N-terminal domain binds the ‘arm’-type site (1). In both attP and attB, an inverted pair of core sites closely surrounds the 7-bp crossover segment where 5′ strands are exchanged. attB consists of little more than this arrangement (6); attP is much larger (>200 bp) because it also contains multiple arm sites, as well as sites for auxiliary proteins that bind and bend DNA. Bivalency allows integrase molecules to couple attB to attP during assembly of a catalytic complex, in which four integrase molecules act on the four DNA strands of the two crossover segments. There seems to be no particular sequence requirement within the crossover segment itself, but activity is severely depressed if the two att sites are not identical there (7). The pattern of core, arm and auxiliary sites in the attP–attB pair differs from that in the pair of hybrid sites attL and attR formed by integration, providing a basis for controlling the directionality (integration versus excision) of recombination.
Less well understood than how established sites are used is how new integration sites arise. Focusing on tRNA genes is useful, as they frequently serve as integration sites (8); indeed this survey shows that they do so in the majority of cases. The uniformity of tRNA genes thus provides a convenient format for comparing most integration sites. Usage of tRNA gene attBs is here sorted into four classes, according to sequence identity between element and host, which are presumed to use three different regions in tDNA as crossover sites. Two of these regions are marked by symmetry, employing tDNA for either the anticodon stem–loop or the T stem–loop. The third region, as originally noted in a study of the attB for phage mv4 (9), is located further 3′ and marked by asymmetry. A full switch to a new attB entails changes in both attP and the integrase gene, whose co-evolution is facilitated by their typically adjacent location in genetic elements. Accordingly, integrase phylogeny is compared with tRNA gene sublocation use, revealing that integrase subfamilies can mix the usage of the two symmetric regions, but that subfamilies using the asymmetric region use it exclusively. This correlation provokes discussion of the evolution of new integration site/integrase combinations.
USE OF tRNA AND tmRNA GENES AS INTEGRATION SITES
tRNA gene attB sites have been characterized at three levels. (i) In vitro analysis of integration at three tRNA gene attBs has identified 7-bp crossover segments where 5′ overhang strands are exchanged (10–12). (ii) Deletion analysis of five tRNA genes has defined minimal segments required for attB function (9–13); these are small (16–30 bp) relative to minimal attPs (>200 bp), and centered at identified crossover segments. (iii) Sequencing of integration site regions for the unintegrated and integrated states defines an uninterrupted block of sequence identity that includes known crossover segments. This identity block is essentially the portion of the tRNA gene that is displaced upon integration yet restored by a viral copy (Fig. 1).
Some previously identified attB sites (for Gamma, TPW22 and bIL286) are recognized here as tRNA or tmRNA genes, and the assignment of the phage T12 attB as a tRNA-Ser gene (15) is corrected; it is instead the tmRNA gene (ssrA) of Streptococcus pyogenes. tmRNA is a bacterial RNA with some structural similarity to tRNA (Fig. 2), but a different physiological role (see below, ‘tmRNA gene usage’). Revisiting regions surrounding integrase genes in completed genome projects, endpoints in tRNA genes can be newly proposed for putative cryptic elements in Pseudomonas aeruginosa, Bacillus subtilis, Deinococcus radiodurans, Sinorhizobium meliloti and Thermoplasma acidophilum chromosomes; these elements are given names such as Pae12G (see Table 1) that reflect the host species, size in kilobase pairs and the aminoacylation identity of the tRNA gene used. The B.subtilis element had been annotated previously as ϕ2 (16) without definition of its endpoints, and Pae12G contains the genome of the filamentous phage Pf1 (17) but with substantial additions on either side that include the integrase gene.
Examples of tRNA gene usage were considered redundant if the same tRNA gene was involved, and the pairwise distance score of the aligned integrase sequences (see Fig. 4 legend) was smaller than with any other integrase and less than one. In all, 61 unique examples of elements integrating specifically into prokaryotic tRNA or tmRNA genes are available (Table 1; Fig. 3). Five of these cases involve the tmRNA gene. Genes for Glu, Gln, His, Met, Trp and initiator tRNAs are absent from the list; it is not yet clear whether any of these are inherently unsuitable for use by genetic elements. Some integration events appear to damage the displaced portion of the original tRNA gene (Fig. 3, lines 38, 44, 60 and 61), making excision less favorable to the host because it would generate a non-functional gene; such behavior may be an adaptation of the associated integrase.
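The redundancy criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names, tRNA identities and distance values are invented, and "smaller than with any other integrase" is interpreted here as requiring that the two integrases be mutual nearest neighbors, which is one possible reading of the rule rather than the procedure actually used for Table 1.

```python
from itertools import combinations

def redundant_pairs(genes, dist, cutoff=1.0):
    """Return pairs of elements judged redundant.

    genes: element name -> tRNA gene identity (hypothetical labels).
    dist:  frozenset({name1, name2}) -> pairwise integrase distance score.
    A pair is redundant if both elements use the same tRNA gene, their
    distance is below `cutoff`, and that distance is smaller than the
    distance of either member to any other integrase (assumed reading).
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    pairs = []
    for a, b in combinations(sorted(genes), 2):
        if genes[a] != genes[b]:
            continue  # different tRNA genes are never redundant
        dab = d(a, b)
        others = [n for n in genes if n not in (a, b)]
        nearest = all(dab < d(a, c) and dab < d(b, c) for c in others)
        if dab < cutoff and nearest:
            pairs.append((a, b))
    return pairs

# Invented example data, for illustration only.
genes = {"elemA": "Arg", "elemB": "Arg", "elemC": "Arg", "elemD": "Leu"}
dist = {
    frozenset(("elemA", "elemB")): 0.4,
    frozenset(("elemA", "elemC")): 1.6,
    frozenset(("elemA", "elemD")): 2.1,
    frozenset(("elemB", "elemC")): 1.5,
    frozenset(("elemB", "elemD")): 2.0,
    frozenset(("elemC", "elemD")): 0.7,
}
print(redundant_pairs(genes, dist))
```

In this toy data set only elemA and elemB are flagged: elemC shares their tRNA gene but is too distant, and elemD is close to elemC but uses a different gene.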
To address the question of how frequently integrases of the tyrosine recombinase family use tRNA or tmRNA genes as attB sites, the literature was searched for non-redundant cases where integration specificity was especially well determined (Table 2). Fifty-eight cases were identified, and for 34 of these (59%) the attB is in a tRNA or tmRNA gene. The integrase subfamily with the largest number of unique attBs (LC3, found in Gram-positive hosts) uses tDNA rather infrequently, and if this group is excluded, 75% of the remaining attBs are in tDNAs or tmDNAs. A recent survey of integration sites for pathogenicity islands alone (mostly from proteobacterial hosts) similarly concluded that 75% are in tRNA or tmRNA genes (63). Another indication of the prevalence of tRNA gene attBs comes from Escherichia coli O157:H7, which among currently available genomes has by far the highest known count of apparently intact integrase genes; all are within islands or prophages not present in E.coli MG1655, and 12 of these 20 are flanked by tRNA or tmRNA genes (23). Despite going unnoticed until 1987 (44), tRNA gene usage predominates among prokaryotic elements.
INTEGRATION SITE SUBLOCATIONS WITHIN tRNA GENES
The sequence-identity block common to an attP–attB pair can be readily determined; it usually extends to, and sometimes well beyond, the 3′ end of the RNA gene (Fig. 1), which can be explained biologically by the need to retain function of the RNA gene after integration. Some of these identity blocks are quite long, implying an event during the evolution of new integration site/integrase combinations in which a segment of the host genome is captured by the attP of the element (64). The end of the identity block internal to the tRNA gene indicates with imprecision the location of strand crossover; identity might continue beyond the true crossover segment by one or a few positions, simply by chance, as a reflection of core site symmetry, or as a remnant from the original host DNA capture event. In Figure 3, tDNA attBs are ordered according to the gene-internal extent of the identity block, which suggests an organization into four classes: in classes IA and IB the identity block encompasses the anticodon loop; in class II it encompasses the T loop without extending into the variable region; in class III it is further 3′ and does not (or occasionally barely does) fully encompass the T loop. Class IB has only one member that provides the single exception to the rule that integrating elements replace the 3′ end of the gene they disrupt; it instead replaces the 5′ end of a tRNA gene (26).
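As a minimal sketch of how such an identity block might be located computationally, the longest uninterrupted run shared by an attP and an attB sequence can be found with the Python standard library. The function name and the two toy sequences are invented for illustration; real att regions would also require attention to strand orientation, which this sketch ignores.

```python
from difflib import SequenceMatcher

def identity_block(att_p: str, att_b: str):
    """Return (start in attP, start in attB, shared block).

    The block is the longest uninterrupted run of identical sequence;
    autojunk is disabled so that frequent bases are never discarded.
    """
    a, b = att_p.upper(), att_b.upper()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, a[m.a:m.a + m.size]

# Invented toy sequences sharing a 12-bp block.
att_b = "TTTTGGCCAATTGGCCAAAA"
att_p = "CACACGGCCAATTGGCCGTGTG"

print(identity_block(att_p, att_b))
```

As the text notes, the gene-internal end of a block found this way only approximates the crossover position, because identity can extend past the true crossover segment by chance.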
The three tDNA attBs where strand exchange has been examined all fall into class IA and the crossover segments map precisely to the 7 bp encoding the anticodon loop (10,12,33). It has been proposed that this coincidence reflects a preference of the associated integrases for symmetry of flanking segments (a known preference of lambda integrase), which is assured in DNA encoding stem–loop RNA (8,10). The identity block for the class IB attB, although arriving from the 5′ end, also encompasses the anticodon-loop tDNA. The proposal for class II is based on its clear discontinuity from class I and slight discontinuity from the distribution of class III, and moreover on the observation that it fully includes the T loop; class II may therefore reflect integrase preference for flank symmetry centered at the 7-bp T loop, as do classes IA and IB at the anticodon loop.
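The notion of flank symmetry around a 7-bp crossover segment can be made concrete with a short Python sketch. It counts positions at which the left flank matches the reverse complement of the right flank, which is the pattern expected for DNA encoding an RNA stem–loop. The function names, the default arm length of 5 bp and both test sequences are assumptions made for illustration, not values taken from this survey.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def flank_symmetry(seq: str, start: int, overlap: int = 7, arm: int = 5) -> int:
    """Count symmetric positions in the arms flanking a crossover segment.

    start:   0-based index of the first base of the crossover segment.
    overlap: length of the crossover segment (7 bp in the cases discussed).
    arm:     number of flanking bases compared on each side (assumed value).
    """
    if start - arm < 0 or start + overlap + arm > len(seq):
        raise ValueError("flanks extend beyond the sequence")
    left = seq[start - arm:start].upper()
    right = seq[start + overlap:start + overlap + arm]
    return sum(x == y for x, y in zip(left, reverse_complement(right)))

# Invented sequences: a perfect 5-bp stem around a 7-bp loop, and an
# asymmetric counterpart with the same left flank and loop.
stem_loop = "AAGCGGATTGAAAATCCGCTT"
asymmetric = "AAGCGGATTGAAAAGGGTATT"

print(flank_symmetry(stem_loop, 7), flank_symmetry(asymmetric, 7))
```

A score equal to the arm length indicates perfect dyad symmetry. G–U pairs in the RNA stem would lower the DNA-level score, so real stem–loop tDNA need not score perfectly.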
In contrast, asymmetry was the notable characteristic of the minimal form of the class III attB for phage mv4, suggesting a mode of integrase-attB recognition differing from the symmetry-based lambda model; it was moreover recognized that several additional attB sites are similarly positioned in an asymmetrical setting at the far 3′ end of tRNA genes, quite distant from the anticodon region attBs (9). Crossover segments have not yet been determined for any members of class III, but it can be noted that all their identity blocks contain the same 7-bp stretch corresponding to the last 3 nt of the T stem, abutting 4 nt of the acceptor stem, and that this stretch is also at the center of the minimal class III attB determined for phage mv4. Thus, the study of attP–attB identity blocks tentatively delineates three tDNA attB sublocations, two characterized by their symmetry, and the third by its asymmetry. Determining crossover segments for attBs outside of class IA will be necessary to ascertain whether T-loop tDNA is in fact used by class II, and whether class III is a consistent group using a single precise position in tDNA.
CORRELATION OF INTEGRASE PHYLOGENY WITH attB SUBLOCATION
Although the class II identity block ends appear distinct from those of class III, they might sceptically be viewed as a skewed tail of the latter distribution (bottom line of Fig. 3). The possibility that class II could use the symmetrical T-loop tDNA, which would be impossible for almost all of class III, gives more credence to the distinction. Separate classification receives further support from the phylogeny of the associated integrases. Several subfamilies of related integrases emerge clearly from phylogenetic analysis of the tyrosine recombinase family (4,28,53); the most comprehensive public database of integrase sequence alignments and subfamily assignments is the Tyrosine Recombinase Website (www.members.home.net/domespo/trhome.html). Despite such success, a complete phylogenetic history of the family has not been established. The relationships between subfamilies are mostly ambiguous, and many singleton integrases are too divergent to place in any subfamily. Moreover, only the catalytic domain has been used in these analyses; complete alignment of the other domains, responsible for DNA-binding specificity, has not yet been achieved. Still, if domain shuffling among integrase genes has been infrequent, the phylogeny of the catalytic domain may adequately track relationships among the specificity domains.
An alignment of the catalytic domain sequence from the 58 integrases of Figure 3 for which sequences are available was used for three types of phylogenetic analysis: Fitch-Margoliash, parsimony and quartet puzzling. Figure 4 summarizes the analyses on the framework of the Fitch-Margoliash tree. Most previously described subfamilies were supported by all analyses. A new subfamily emerged clearly, comprising the integrases from ϕ16-3 and the recently recognized Mlo45V, with some support for inclusion of the D3 integrase. All subfamilies use genes of multiple tRNA identities.
Some subfamilies (P22, CTX, 16-3, SSV) exclusively use one of the attB sublocations characterized by symmetry (classes IA or II), but others (P2 and FRAT) mix the use of these two sublocations. These mixed-use subfamilies contain members known to promote strand exchange precisely at the 7 bp encoding the anticodon loop, strengthening the hypothesis that class II sites similarly exchange strands at the 7 bp encoding the T loop.
The status of the pSE subfamily is of interest because it may include the class IB integrase from Mlo38S, which is unique in replacing the 5′ portion of the tRNA gene into which it integrates. In the current release of the Tyrosine Recombinase Website, the Mlo38S integrase is absent, but its closest known BLAST relative (52% identity, from a segment of Bradyrhizobium japonicum DNA with the same attR as Mlo38S) (26) is included in the pSE subfamily along with the integrases from pSE101, pSE211 and Sco14R. Here, Mlo38S integrase was included in the pSE subfamily by parsimony analysis (along with the pSAM2 integrase), but the four-member pSE subfamily node occurred in only 44% of the Fitch-Margoliash bootstrap trees. The least intensive algorithm, quartet puzzling, which generally provided less support for other subfamilies, did not even support the core of this subfamily, pSE101 together with pSE211. Additional related integrase sequences will be necessary to determine conclusively whether the Mlo38S integrase is part of the pSE subfamily; if so, it would be the only integrase subfamily involving hosts from more than one bacterial phylum, and would also link the unusual 5′ tDNA replacement by Mlo38S with the standard 3′ tDNA replacement by the other subfamily members.
Three subfamilies, P4, NBU and LC3, use exclusively the tDNA sublocation marked by asymmetry (class III). A pattern emerging from the phylogenetic analysis is that these latter subfamilies are segregated from those using exclusively the symmetrical sublocations (classes IA, IB or II, or mixtures). Some circularity should be admitted; for example, integrase phylogeny helped to sort the otherwise somewhat ambiguous attB of the element she (Fig. 3, line 38) into class III rather than class II; this in turn was used to tentatively sort Sme19T. What is not circular is that the ordered array of attB/attP identity blocks can be split (Fig. 3, class II/III border) so as to segregate the associated integrase subfamilies into two types. These two subfamily types, symmetry preferring and 3′ end preferring, imply two modes by which new integration site usage arises in tDNA.
Although the trees from the three analyses generally agreed on subfamily assignments, the rest of their branching patterns did not, and no nodes above the subfamily levels received significant levels of support by any analysis; therefore, such nodes in Figure 4 should be discounted and it is premature to discuss order or multiplicity for the emergence of sublocation symmetry preference.
Almost every integrase subfamily within the tyrosine recombinase family contains members using tRNA gene att sites (Table 2). The lambda subfamily can now be included in this group because the island 933M of the E.coli O157:H7 genome appears to use a class IA site in a tRNA gene (Table 1, line 7) and encode an early-branching member of the lambda subfamily (data not shown). The high frequency of tRNA gene use and its dispersion among integrase subfamilies suggest viewing the tDNA fraction of the genome as the true crucible where new site specificity evolves; non-tDNA sites may arise mainly from corruption of original tDNA site usage.
EVOLUTION OF NEW INTEGRATION SITE USAGE
Three general sorts of explanations, not mutually exclusive, can be proposed for how tRNA and tmRNA genes, which comprise <2% of bacterial genomes, come to serve so frequently as integration sites. (i) Drift: sequences similar to an established tDNA attB are most likely to be found in another tRNA gene. (ii) Selection: new site specificity can arise for virtually any chromosomal locus, but most sites do not serve the biology of genetic elements as well as do tRNA genes. (iii) Generic recognition: integrases recognize features that are generic for tRNA genes yet distinctive with respect to non-tDNA. How do the symmetry seeking and 3′ end seeking modes fit with these explanations?
The idea of sequence drift can be addressed first: sequences at a particular sublocation within tRNA genes may be so similar that a few mutations in attP can create a match to a new tRNA gene at the same sublocation. The best case in Figure 3 for this argument is the pair of elements pSE211 and pSE101, which function in the same host at class IA attB sites in different tRNA genes, using closely related integrases; the two anticodon stem–loop tDNAs have 16-bp blocks that are identical except for a 3-bp segment within the presumed crossover segment. With an integrase trained on one of these genes, a relatively small change in the attP might allow the element to switch and use the other tRNA gene. This sort of explanation does not seem broadly applicable, because the attB sequences used within integrase subfamilies do not generally appear closely related. For example, the class III region used by the LC3 subfamily varies from purine-rich to pyrimidine-rich among the tRNA genes recognized. However, there may have been pathways of gene switching that are obscured by the incompleteness of the data set.
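The kind of comparison made for the pSE211 and pSE101 sites can be sketched in Python as a search for the span of mismatches between two aligned blocks of equal length. The two 16-bp sequences below are invented stand-ins, not the actual tDNA sequences, and the function name is hypothetical.

```python
def mismatch_span(block_a: str, block_b: str):
    """Return (first, end) covering all mismatches between two aligned blocks.

    Positions are 0-based and `end` is exclusive, so end - first is the
    length of the smallest segment containing every difference.
    Returns None when the blocks are identical.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have equal length")
    positions = [
        i for i, (x, y) in enumerate(zip(block_a.upper(), block_b.upper()))
        if x != y
    ]
    if not positions:
        return None
    return positions[0], positions[-1] + 1

# Invented 16-bp blocks differing only in a 3-bp segment.
site_1 = "GGTCCTTTGAGACGGA"
site_2 = "GGTCCTTCATGACGGA"

print(mismatch_span(site_1, site_2))
```

A short span such as this one is the situation in which a few changes in attP could plausibly retarget an element to a different tRNA gene; most attB pairs within a subfamily would show far more scattered differences.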
Although the intact tRNA genes occasionally found within phage genomes are usually thought to improve decoding of phage mRNAs, they have also been proposed to play a role in the evolution of tRNA gene integration site usage through homologous recombination (68), but no specific scenarios have been described.
SELECTION FOR tRNA GENE INTEGRATION SITES
Two factors can be mentioned as favoring tRNA gene sites a priori over protein-coding genes, which typically comprise 85–90% of the bacterial genome. One factor is their reliability (68); combining data of Lynch (69) and Ochman et al. (70) allows the estimate that among bacteria, the sequence divergence rate per base pair for tRNA genes is from 4- to 9-fold (average, 6-fold) lower than for protein-coding genes. The stability of an attB sequence in tDNA may broaden host range or improve long-term survival prospects of a genetic element. A second factor favoring tRNA genes as integration sites is that their small size minimizes the amount of host DNA that must be captured in attP in order to restore the target gene upon integration. From most points within a protein-coding gene or operon, restoration would require the capture of an impractically large host fragment in attP.
Specific proposals have been made for benefits to molecular events in the life cycle of the genetic element from association with a tRNA gene. One is that an element may insert into a gene for a tRNA species decoding a codon that is more abundant in the element than in the host; this proposal has been applied to only one case (63). Another possible benefit could be transcriptional coupling of the integrated element to the tRNA gene, which would allow it to monitor the physiological state of the cell, as tRNA promoters are typically regulated by growth rate (71). However, there is not much regularity in the orientation of prophages to tRNA genes, and in half, or more, of the cases the attP carries with it an apparently strong rho-independent terminator that would act to sever this transcriptional connection (Table 1, columns ‘Orient. int’ and ‘Term.’). The tRNA gene setting might directly affect integrase function or the directionality of recombination in a way that is beneficial for genetic elements. It has been proposed that mature tRNAs occasionally hybridize to their own genes, commencing with the free 3′ CCA tail, and that the hybrid structure might somehow improve integrase action (72). The hypothesis has some problems: CCA tails added after transcription should not initiate hybridization to the 33% of attB tRNA genes that do not encode the tail (Fig. 3), and tDNA attBs are utilized efficiently (in vitro) by integrase in the absence of tRNA.
Whatever selective benefits there may be from tDNA location, coupling them with the typical integrase preference for symmetrical sites (8,10) may suffice to explain the familial use of the classes I and II attB sites. Familial use of the asymmetric class III sites is less obviously explained by selection alone; one possibility is that it may be more difficult to capture long DNA segments than short ones as 3′ gene ends, so that in an integrase subfamily with relaxed symmetry preference, we observe a statistical distribution of capture events that were as short as possible. This hypothesis would predict a heterogeneous pattern of positions for class III crossover segments, which makes their mapping more urgent.
HYPOTHESIS: GENERIC tDNA RECOGNITION
One interpretation of the familial use by integrases of the class III position, despite its lack of symmetry or sequence conservation, is that these integrases can actively recognize that particular sublocation within virtually any tRNA gene during the evolution of new integration site usage. Even the symmetry-preferring integrases could be directed generically to T and anticodon tDNA by some feature that marks tRNA genes and their sublocations.
What might mark tRNA gene sublocations so that integrases could find them? One model for tDNA marking would be the eukaryotic transcription factor IIIC which binds tRNA genes directly and specifically based on their conserved primary sequence features, found primarily at D and T stem–loop tDNA (73). No bacterial protein that acts likewise has been described.
Shape may accompany sequence as a marking principle. Following a proposal of Hou (72), transcription of tRNA genes (or post-transcriptional hybridization of mature tRNAs) could generate distinctive structures that would direct integrase to the DNA. One possibility is inspired by the behavior of RNA polymerase when it transcribes the primer for ColE1 plasmid replication. This RNA forms a special structure as it emerges from the polymerase, which at a certain point prevents the transcription bubble from collapsing behind the advancing polymerase, so that a persistent hybrid forms between transcript and template (74). This example may be a high-frequency form of behavior that occurs at low frequency for tRNA (or any) transcripts. Such template:transcript hybrids would be favored in a more negatively supercoiled domain of DNA, yet short-lived due to susceptibility to ribonuclease H. For a tRNA:tDNA hybrid, the opposite DNA strand would be free to present a distinctive pattern of both conserved primary sequence and secondary structure corresponding to the stem–loops of tRNA (Fig. 2). The strong bias observed in the attP capture of 3′ gene ends rather than 5′ ends might be partly established in the asymmetry of such a hybrid structure.
With such a hypothetical mechanism for reliably finding new tRNA genes as integration sites, benefits to the biology of genetic elements and their hosts arise from another feature of tRNA genes: their abundance in aggregate among genomes. This availability of many different tRNA genes in a host allows combinatorial acquisition of different genetic elements, without interference.
tmRNA GENE USAGE
The tmRNA gene has not been found in more than a single copy in any genome, yet it is used as attB as frequently as any tRNA gene, by members of four different integrase subfamilies at all three tDNA sublocations (Table 1). Its function is not that of a classical tRNA; although it is charged with alanine and the ribosome transfers that moiety to a nascent peptide, tmRNA does not read any codon (75). Rather, it is considered to solve problems arising from ribosomes that have stalled during translation (perhaps at rare codons or at the end of mRNAs that have no stop codon). tmRNA contains a reading frame that is translated, adding a peptide tag to the incomplete protein in the stalled ribosome, after transfer of its alanyl moiety and exchange with the troublesome mRNA. Translation continues to the stop codon in tmRNA, which rescues the stalled ribosome, and the tag targets the protein product for proteolysis. The tmRNA gene shows that prokaryotic genetic elements are not constrained to genes that provide classical tRNA function.
The distinctive structure of tmRNA (76) and its gene constrains hypothetical mechanisms of generic tDNA recognition. One of the arms of the tRNA L-shape, containing the acceptor stem and T stem–loop, probably forms similarly in tmRNA, while the other arm containing the D and anticodon stem–loops would have to differ in tmRNA (Fig. 2). No stem is apparent in the region equivalent to the D stem–loop, and the equivalent of the anticodon stem is elongated into an interrupted stem (P2) of ∼20 bp that is capped by a further-structured giant loop of ∼250 nt. Thus, at the 3′ end of the tmRNA gene, strict analogy to tRNA genes ends upstream of the T stem–loop. The features of the D stem–loop sequence that are conserved among tRNA genes cannot be found in tmRNA genes and should therefore not be included in any mechanism proposed to attract integrases.
One integrase of the P2 subfamily, another member of which is known to use the 7-bp anticodon loop tDNA as its crossover segment (10), exhibits apparent class IA usage of a tmRNA gene (Fig. 3, line 18) despite the absence of an equivalent to anticodon loop tDNA. However, dyad symmetry can still be noted (gray shading in Fig. 3) in the corresponding region of this tmRNA gene, which may satisfy integrase symmetry preference even though not expressed as stem–loop structures in the mature tmRNA. Selection of an imperfect mimic of anticodon stem–loop tDNA due to its position relative to true T stem–loop tDNA would tend to support the hypothesis of generic tDNA recognition. Similar dyads can be proposed for all the tmRNA genes known to serve as attBs, such that they still might form the same RNA–gene hybrid structure proposed for tRNAs (Fig. 2).
The high frequency at which tRNA genes are adopted as integration sites raises the question of how elements return to this class of genes. The striking outcome of this survey, that integrase subfamilies use characteristic sublocations within tRNA genes, poses a refined question: how do integrase clades direct the return to the same sublocation at many different tRNA genes? Although other explanations are possible, the correlation suggests that integrases may recognize some generic feature or form of tRNA genes before exercising their particular positional preference.
The evolution of new integration site usage may truly be beyond the reach of experimentation; its frequency is low, and it may depend on pre-marking of tRNA genes by events that occur at low frequency. Still, some accessible avenues should be explored further. (i) Crossover segments must be determined for more attBs, especially those of classes IB, II and III, for which there are as yet no data; the crossover segments will provide a better basis for attB positional classification than the identity blocks used here. (ii) Investigating mechanisms for integrases that use class III sites could reveal how they apparently function without the symmetry preferences that have been established for lambda and other integrases. (iii) It may become possible to align integrase sequences outside of the catalytic domain, and find mechanistically relevant correlations between tRNA gene sublocation and the integrase domains responsible for DNA specificity. (iv) The binding properties of integrases for tRNA, single-stranded tDNA, or tRNA–tDNA hybrids may prove interesting.
The tRNA gene habit is not obligate among integration systems (a few members of the resolvase family of site-specific recombinases are known to provide integrase function for genetic elements, all at non-tDNA sites), but it is exhibited by elements of both prokaryotes and eukaryotes. Eukaryotic retroelements that target tRNA genes target them (together with other genes for Pol III-transcribed RNAs) collectively, to sites outside the mature RNA-coding sequence, and the elements are found as numerous repeats within a genome (77,78); in the best-studied case, targeting is based on a simple principle in which the genes are marked by binding of a Pol III transcription factor (79). Prokaryotic integrases target particular tRNA genes, one at a time, but may have the ability to move on to new tRNA genes at breakthrough points in their evolutionary history. This system allows genetic elements, often promoting survival of their hosts, to accrue in a combinatorial fashion: a genome will not be flooded by any one element, but rather could harbor a large number of different elements; witness E.coli O157:H7. The prokaryotic and eukaryotic systems are overwhelmingly different (no homology relationships have been detected between their integrases), but may derive similar benefits from their integration specificity. tRNA genes are numerous, reliable, uniform as a class yet distinctive in relation to the rest of the genome and they can be used innocuously.
Integrases may preserve an ancient but still profitable strategy of association; in some views of early evolution, the translational apparatus developed prior to the use of DNA as the genomic material, such that tRNA genes would have been uniform and numerous in the earliest DNA genomes. Continued success of an integration-specificity principle could have led to competition among genetic elements for tRNA genes, driving a subdivision of the niche through segregation of sublocation usage.
NOTE ADDED IN PROOF
Wassarman et al. (80) show that the phage P2 integration site in the E.coli chromosome is at the 3′ end of the gene for a small RNA (of unknown function but conserved among enterobacteria), in the inverted repeat encoding its apparent rho-independent terminator. This is the only example of an attB in a gene encoding something other than tRNA, tmRNA or protein, and points to generality in the correspondence of integration sites with conserved portions at the 3′ ends of transcripts.
This work was supported by NIH grant GM59881.
Tel: +1 812 856 5697; Fax: +1 812 855 6705; Email: email@example.com
¹Integrase subfamily assignments from the Tyrosine Recombinase Website (www.members.home.net/domespo/trhome.html), adding the new subfamilies NBU (53), SSV (28) and 16-3 (Fig. 3); s, unclassified singleton; ?, integrase sequence unavailable; *, assigned here, integrase absent from website.
²Violet, Proteobacteria; cyan, Gram-positive bacteria.
³tRNA identity in one-letter amino acid code; Z, selenocysteinyl tRNA; X, tmRNA.
⁴Orientation to int gene: distance in base pairs from discriminator position (see Fig. 3) in attP tDNA to int (negative numbering when int upstream of tDNA); Nterm or Cterm, inside int at N-terminal or C-terminal end; > or <, same or opposite orientation for tRNA fragment and int gene.
⁵Apparent rho-independent terminator: distance from tDNA discriminator position (see Fig. 3) to last non-T stem nucleotide of terminator; ?, none found within 400 bp; ND, none found within available sequence <200 bp; NA, not applicable, archaeal host or 5′ end duplication.