Cyanobacterial tRNALeu (UAA) intron sequences from natural populations of Nostoc and other cyanobacteria were compared. Variation between the different introns was not randomly distributed but strongly restricted by the secondary and tertiary structure of the intron.
Although all Nostoc sequences examined shared high similarity, differences were observed in one stem-loop. This stem-loop could be divided into two classes, both built up from two base pairing heptanucleotide repeats. Size variation was primarily caused by different numbers of repeats, but some strains also contained additional sequences in this stem-loop not following the heptanucleotide repeat motif. Several sequences showing similarity with these additional sequences were identified in the Nostoc punctiforme genome. Furthermore, the regions flanking these sequences contained the same, or similar, heptanucleotide repeats as those flanking the corresponding sequences in the intron. It is proposed that both slipped strand mispairing during replication and homologous recombination among different loci in the genome are important processes causing variation between introns.
The coding sequences of many genes are interrupted by stretches of noncoding DNA termed introns. Introns are widespread genetic elements found in all major groups of organisms. Since they were discovered, their origin has been debated and different theories have been proposed (Darnell and Doolittle 1986; Bell-Pedersen et al. 1990).
Group-I and -II introns are two types of RNA enzymes that catalyze their own splicing by different mechanisms (Saldanha et al. 1993). Although both groups are widely distributed, group-I introns have a broader phylogenetic distribution, interrupting a wide variety of mitochondrial and plastid genes, nuclear rRNA genes, bacterial tRNA genes, and protein-coding genes of eukaryotic organisms and viruses. The ability of group-I introns to catalyze their own splicing depends on their highly conserved secondary and tertiary structures. Different group-I introns have relatively little sequence similarity, but all share a series of short conserved elements, P, Q, R, and S, known as the catalytic core (Cech 1988).
The tRNALeu (UAA) intron in cyanobacteria is a group-I intron and has been suggested to be of ancient origin because it interrupts, in all known cases, the tRNA gene at a conserved position in the anticodon triplet (between the second and third base). Moreover, the tRNALeu (UAA) introns from cyanobacteria and chloroplasts show high sequence similarity and are more closely related to each other than to any other known sequences, indicating an early origin. The fact that the intron is found in many, but not all, cyanobacterial and chloroplast lineages is best explained by multiple losses (Paquin et al. 1997; Besendahl et al. 2000). However, the evolutionary history of the tRNALeu (UAA) intron is still debated, and it has been suggested that the intron is of more recent origin and has been introduced through lateral transfer (Rudi and Jakobsen 1997, 1999). In this study we will discuss the evidence supporting these theories.
In a number of studies, we have used the tRNALeu (UAA) intron as a marker for diversity and specificity of cyanobacteria symbiotically associated with plants and fungi (Paulsrud and Lindblad 1998; Paulsrud, Rikkinen, and Lindblad 1998; Costa, Paulsrud, and Lindblad 1999; Paulsrud, Rikkinen, and Lindblad 2000; Costa et al. 2001; Paulsrud, Rikkinen, and Lindblad 2001). The symbioses we have been interested in all contain cyanobacteria of the Nostoc type as their symbiont. The tRNALeu (UAA) intron has been instrumental in revealing patterns of both diversity and spatial distribution of the cyanobacterial symbionts. Our previous studies provided us with a large number of sequences from closely related Nostoc strains from natural populations. Using these sequences, as well as other tRNALeu (UAA) intron sequences from the databases, it is possible to study evolutionary patterns in this genetic marker and thus to gain insights into its evolution.
Materials and Methods
Sequencing and Alignments
Amplification and sequencing of the tRNALeu (UAA) intron were performed as previously described (Paulsrud, Rikkinen, and Lindblad 2001).
All strains and GenBank accession numbers of the sequences used in the present study are listed in table 1. Strains are arranged according to Bergey's Manual of Systematic Bacteriology (Garrity 2001, pp. 473–599). All sequences were imported into MacVector™ 7.0 for alignment (using ClustalW) and manual manipulation. The alignment of the tRNALeu (UAA) intron sequences was adjusted manually to align homologous positions using the secondary structure model suggested for the Anabaena sp. PCC 7120 intron (Cech, Damberger, and Gutell 1994). Alignment of the regions shown as schematic hairpins in figure 1 was not attempted. Type-II intron sequences (Rudi and Jakobsen 1999), deposited in GenBank as cyanobacterial, were not considered in this study (see Discussion).
Sequences were used to construct a prediction of the secondary structure of the intron based on the model suggested for the Anabaena sp. PCC 7120 intron (Cech, Damberger, and Gutell 1994) (fig. 1).
RNA mfold Version 2.3 (http://bioinfo.math.rpi.edu/~mfold/rna/form1.cgi) was used to predict the hairpin structure formed by the first variable region of the intron in the two Nostoc symbionts Nos9 and Nos30 (fig. 3). Default settings were used, except for the temperature parameter, which was set to 20°C (Zuker 1989).
All figures were designed using Adobe Illustrator 9.0.
Results and Discussion
The alignment of cyanobacterial tRNALeu (UAA) introns reveals great sequence similarity (see alignment in Supplementary Material). Sequence variation is mostly confined to certain regions that, when the alignment is compared with the secondary structure predictions, are localized in some of the loops or hairpin structures. Figure 1 shows the 2D structure of the consensus sequences of the tRNALeu (UAA) intron from all Nostocs (fig. 1A) and from representatives of all five cyanobacterial subsections (fig. 1B). All size variation within the genus Nostoc can be found in the hairpin by structural element P6b. The length of this hairpin ranges from 78 to 127 nt in the Nostoc sequences (fig. 1A). The nature of the size variation in Nostoc will be discussed subsequently. When cyanobacteria from different subsections are considered, size variation is found in three additional regions in the structure (fig. 1B). Size variation in hairpin extensions must be taken into account when aligning introns of different lengths in order to obtain a correct alignment of homologous positions.
All 54 Nostoc strains share a highly conserved intron sequence with only a few variable positions. With one exception, a base pair in element P9, all these variable positions also show variability when strains from different subsections are compared. In addition to these positions, the comparison of cyanobacteria from different subsections reveals several additional variable positions, especially in element P2 (fig. 1B). However, the loop of element P2 (L2), which is known to be involved in tertiary interactions with element P8 (Cech, Damberger, and Gutell 1994), is conserved.
In cases where variation in single nucleotides occurs, the secondary structure of the intron is often retained. This can be seen both in cases where a change in one position is accompanied by a change in the base pairing strand, so that different sequences have either a G:C or an A:U base pair in this position, and in cases where base pairing of the G:U type allows changes to occur on one strand without disturbing the base pairing structure. In figure 1 we have labeled these degenerate bases where different sequences among those used to construct the consensus sequence show either different canonical base pairs (that is, either A:U or G:C) or differences between sequences involving G:U base pairing in combination with either A:U or G:C base pairing. As can be seen in figure 1, most of the degenerate positions in base pairing regions are also labeled to indicate the retained structure among different sequences. There are two possible explanations for this: (1) there may be a requirement for base pairing to retain the structure needed for autocatalysis of the intron, which would give a selection pressure for compensatory changes, and (2) the tendency for a higher mutation rate in unpaired or mispaired bases on structures formed by single-stranded DNA during, for example, transcription may also play an important role (Wright 2000).
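The two kinds of structure-preserving variation described above can be illustrated with a minimal Python sketch. It is not the procedure used to label figure 1; the function name, the category labels, and the input format (one 5'/3' base pair per sequence at a given paired position) are assumptions made here for illustration only.

```python
# Illustrative sketch only: labels one paired position of a stem according to
# the base pairs observed there in a set of aligned sequences.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_paired_position(pairs):
    """Classify the variation seen at one paired position.

    pairs: iterable of (base_5prime, base_3prime) tuples, one per sequence.
    DNA-style T is treated as U.
    """
    observed = {
        (a.upper().replace("T", "U"), b.upper().replace("T", "U"))
        for a, b in pairs
    }
    if any(p not in CANONICAL | WOBBLE for p in observed):
        return "unpaired in some sequences"
    if len(observed) == 1:
        return "invariant"
    if observed & WOBBLE:
        # A G:U pair occurs together with another pair, so a change on one
        # strand alone can be accommodated.
        return "wobble-mediated"
    # Only canonical pairs, but more than one kind: both strands changed.
    return "compensatory"


if __name__ == "__main__":
    print(classify_paired_position([("G", "C"), ("A", "U")]))
    print(classify_paired_position([("G", "C"), ("G", "U")]))
```

Applied column by column to the paired positions of a structure-based alignment, such a function would separate compensatory double changes from single changes tolerated through G:U pairing.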
Heptanucleotide Repeats as a Source of Variation
When all available Nostoc tRNALeu (UAA) intron sequences are compared, most differences are found in the first variable region of the intron. This region was found to consist of degenerate heptanucleotide repeats (fig. 2) that fold into a hairpin structure (fig. 3) allowing base pairing of the repeats. Figure 2 shows the different heptanucleotide repeats in this region for all Nostoc tRNALeu (UAA) intron sequences. Size variation in the variable region is caused by different numbers of repeats and, in some cases, by the insertion of other genetic elements not having the heptanucleotide repeats. These elements are represented as schematic hairpins in figure 2. For example, when comparing Nos28 and Nos29, an indel of a single repeat is responsible for their different sizes. On the other hand, when comparing Nos9 and Nos30, the size difference is caused by an additional sequence inserted within one repeat (fig. 3).
On the basis of the observation that different groups of degenerate heptanucleotide repeats could be found, the sequences were divided into two classes. In class 1, the region is built from two groups of repeats, one with the consensus sequence TDNGATT and the other, its pairing repeat, with AATYHAA. Nostoc punctiforme PCC 73102 (Nos6) can be found in this class. Class 2 is also built from two groups of repeats, but with the consensus sequences NNTGAGT and its base pairing repeat AACTCHN. Nostoc commune (Nos4) belongs to this class. Between the two sets of base pairing repeats, a central loop with different sequences in the two classes is found (fig. 2). Within the different groups of heptanucleotide repeats, several subgroups could be identified based on some repeats proving more common than others. These subgroups are also represented in figure 2.
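The consensus sequences above are written with IUPAC ambiguity codes (for example, D = A, G, or T; Y = C or T; H = A, C, or T; N = any base). As a rough illustration of how such degenerate motifs can be located in a sequence, the following Python sketch converts each consensus into a regular expression and counts matches per class. The function names and the toy sequence are hypothetical; the classification in this study was made by inspection of the aligned sequences, not with this code.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Consensus heptanucleotide repeats quoted in the text for the two classes.
CONSENSUS = {
    "class1": ("TDNGATT", "AATYHAA"),
    "class2": ("NNTGAGT", "AACTCHN"),
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(seq, motif):
    """Return 0-based start positions of all matches, overlapping included."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    dna = seq.upper().replace("U", "T")
    return [m.start() for m in pattern.finditer(dna)]


def count_by_class(seq):
    """Count matches to both consensus repeats of each class."""
    return {
        name: sum(len(find_motif(seq, motif)) for motif in pair)
        for name, pair in CONSENSUS.items()
    }


if __name__ == "__main__":
    toy = "TAAGATT" + "GGG" + "AATCAAA"  # made-up sequence, not real data
    print(find_motif(toy, "TDNGATT"))
    print(count_by_class(toy))
```

Because the class-2 consensus NNTGAGT begins with two fully degenerate positions, it matches any occurrence of TGAGT preceded by two bases, so raw match counts of this kind are only a coarse guide and would need to be read against the alignment.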
Increasing numbers of short-sequence DNA repeats are being identified in prokaryotes as more genomes become available and as algorithms that allow simple detection of repeats are developed. In cyanobacteria, too, several short-sequence repeats have been found, but their significance is still unknown (Mazel et al. 1990; Asayama et al. 1996; Lindberg, Hansel, and Lindblad 2000). A possible mechanism involved in their formation is slipped strand mispairing occurring in combination with inadequate DNA mismatch repair pathways (Strand et al. 1993). Also, the peculiar secondary structure of repetitive DNA allows mismatching of neighboring repeats, and depending on the strand orientation, repeats can be inserted or deleted during DNA duplication mediated by DNA polymerase (van Belkum et al. 1998). These mechanisms may be responsible for the variation observed in the number of repeats in the variable region of the Nostoc tRNALeu (UAA) intron. The fact that sequences from Nostoc strains are different in this region has some implications for the alignment of sequences based on secondary structure predictions. Although it is often necessary to use secondary structure predictions in order to align homologous positions, as is the case with these intron sequences, one must be cautious when aligning hypervariable regions with little sequence similarity but with shared structural features (Hancock and Vogler 2000). In this case, the Nostoc sequences could be divided into two classes with very different primary sequence but similar secondary structure in the variable region. To assume that the two classes are homologous based on their structural features would most likely be wrong.
Supporting this is the fact that this region, in more distantly related cyanobacteria, such as Anabaena cylindrica and Calothrix desertica PCC7102 (GenBank accession numbers U83251 and U83252, respectively), contains the same heptanucleotide repeat motifs as the Nostoc class-1 sequences.
Sequences that did not follow the heptanucleotide repeat motif formed an additional hairpin that could be found interrupting a single repeat (e.g., Nos30) or between two consecutive repeats (e.g., Nos20) (fig. 2). These additional sequences were of different lengths; five were 24 nucleotides, one was 48 (2 × 24), and the other two were 42 and 45 nucleotides long. Figure 3 shows the hairpin formed by the first variable regions of the tRNALeu (UAA) intron from two Nostocs. As can be seen, the two regions share a high level of similarity and differ only by the extra hairpin formed by the sequence not having the heptanucleotide repeat motif. The genome sequence of N. punctiforme ATCC 29133 (Version 08jun00, last updated on September 1, 2000; http://www.jgi.doe.gov/JGI_microbial/html/nostoc/nostoc_homepage.html) was searched using all these additional sequences. Several similar sequences were found, but only for the 24-nucleotide-long sequences (fig. 4). Most were situated in noncoding regions, but some were found interrupting ORFs. The contig numbers and the positions where these sequences were found are indicated in figure 4. As the genome sequence is incomplete, sequence annotations may change when a more detailed analysis is made available. It should be pointed out that we are comparing sequences from different symbiotic Nostocs with the genome sequence of N. punctiforme, which does not contain any additional sequences in the variable region of the intron. The sequences found in the genome sequence show great similarity to their intron counterparts. When the regions flanking these similar sequences were analyzed, heptanucleotide repeats similar or identical to those found in the intron sequences were identified (fig. 4).
An explanation for the insertion of these additional sequences, consistent with our observations, is a mechanism allowing the exchange of different DNA sequences between separate regions of the genome containing similar heptanucleotide repeat motifs. Such a mechanism would involve homologous recombination between different loci in the genome containing similar heptanucleotide repeat motifs.
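The kind of search described above, in which short additional sequences are used as queries against a genome to find similar but not necessarily identical copies, can be sketched as follows. The text does not state which search tool was used, so this is not the method of the study; it is a naive sliding-window mismatch count on both strands, with a hypothetical function name, a hypothetical mismatch threshold, and made-up test sequences.

```python
# Illustrative sketch only: brute-force approximate matching of a short query
# against a longer sequence, on both strands, allowing substitutions only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def approximate_hits(genome, query, max_mismatches=3):
    """Return sorted (start, strand, mismatches) tuples for every window of
    the genome within max_mismatches substitutions of the query or of its
    reverse complement. start is 0-based on the given (plus) strand."""
    genome = genome.upper()
    query = query.upper()
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(genome) - len(q) + 1):
            window = genome[start:start + len(q)]
            mismatches = sum(a != b for a, b in zip(window, q))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)


if __name__ == "__main__":
    toy_genome = "GG" + "AAACGCTT" + "GG" + "AAGGGTTT" + "GG"  # made up
    print(approximate_hits(toy_genome, "AAACCCTT", max_mismatches=1))
```

A scan of this kind is quadratic in the worst case and ignores insertions and deletions; for a real genome, an indexed local-alignment search would be the more practical choice. Flanking regions of each hit could then be inspected for heptanucleotide repeats.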
The question of the origin of introns has been controversial (Palmer and Logsdon 1991; Shub 1991). One view is that introns arose late in evolutionary history and invaded originally uninterrupted genes. Another view is that introns are ancient components of genomes, which have been lost in many cases. The apparent lack of introns in the eubacterial lineage was used as an argument for the intron late hypothesis. When, in 1990, the first eubacterial intron was discovered in the tRNALeu (UAA) gene of cyanobacteria (Kuhsel, Strickland, and Palmer 1990; Xu et al. 1990), the intron early hypothesis gained an important argument. Since then, several eubacterial tRNA introns (of both group I and group II) have been found in different eubacterial lineages (Reinhold-Hurek and Shub 1992; Paquin, Heinfling, and Shub 1999). The question of the origin of these rare eubacterial introns has remained controversial. The tRNALeu (UAA) intron has been important in this debate. In contrast to other eubacterial introns interrupting tRNA genes, the tRNALeu (UAA) intron has features expected of an evolutionarily old genetic element. The tRNALeu (UAA) introns from cyanobacteria and chloroplasts are more closely related to each other than to any other introns. Their position between the second and third base of the anticodon is conserved, which also implies evolutionary age. The absence of the intron in some cyanobacterial lineages and algal chloroplasts would then be explained by a later loss of the intron (Paquin, Heinfling, and Shub 1999; Besendahl et al. 2000).
Recently, however, data were presented that suggested a polyphyletic origin of tRNALeu (UAA) introns within the cyanobacterial radiation (Rudi and Jakobsen 1997, 1999). Several authors have also cited these studies when discussing the origin of the tRNALeu (UAA) intron (Paquin, Heinfling, and Shub 1999; Besendahl et al. 2000). The evidence presented for a complex origin of cyanobacterial tRNALeu (UAA) introns includes the following: (1) presence of two different types of tRNALeu introns, sometimes even in the same strain, one being of the type initially reported from cyanobacteria and chloroplasts and a second type more similar to other tRNA introns, and (2) lack of the intron in several lineages.
The interpretation that this implies a complex pattern of intron evolution is, however, not unproblematic. The two different types of tRNALeu introns were found to be situated in different types of tRNA genes. The new intron type was inserted into a new type of tRNA gene; this would therefore not be a case of lateral transfer of introns but rather a case of a different tRNA gene containing a different intron. When the conventional tRNALeu (UAA) gene contained introns, they were always of the well-known type. It is also worth noting that all those sequences were obtained as PCR products from cultures from the NIVA collection, which consists of unialgal, but not axenic, cultures (Rudi and Jakobsen 1997, 1999). The cyanobacterial origin of this second type of tRNA gene and intron thus remains to be shown by cloning of a larger genomic fragment containing both the new type of intron and clearly cyanobacterial genes. This is especially important because tRNALeu introns have recently been detected in other eubacteria such as Pseudomonas (GenBank accession number AY029760), which is a potential contaminant of cyanobacterial cultures. The many cyanobacterial genomes that are currently being sequenced will undoubtedly soon shed light on this controversy.
From the comparative analysis of tRNALeu (UAA) intron sequences in the databases we found interesting evolutionary patterns. Single nucleotide differences are, as expected, restricted by the secondary structure of the intron. This is seen in different ways: (1) different stem-loops contain different degrees of variation, and (2) much of the variation retains the secondary structure in such a way that nucleotides in base pairing positions only change to new base pairing nucleotides.
Size differences between closely related Nostoc sequences were found to be caused primarily by different numbers of copies of heptanucleotide repeats and, in some cases, by additional sequences. Slipped strand mispairing was suggested to cause the difference in number of repeats, and homologous recombination within the genome was suggested to give rise to the additional sequences.
See accompanying files for alignments and color versions of the figures.
Rodney Honeycutt, Reviewing Editor
Contributed equally to this study
Keywords: cyanobacteria tRNALeu intron evolution heptanucleotide repeat Nostoc
Address for correspondence and reprints: José-Luis Costa, Department of Physiological Botany, Evolutionary Biology Centre, Uppsala University, Villavägen 6, SE-752 36 Uppsala, Sweden. firstname.lastname@example.org.
This study was financially supported by Anna and Gunnar Vidfelt's Fond, Stiftelsen Oscar and Lili Lamms Minne (to P.P.) and the Swedish Natural Science Research Council (to P.L.).