Evaluation of primers and PCR conditions for the analysis of 16S rRNA genes from a natural environment

We investigated biases occurring in the polymerase chain reaction (PCR) amplification of 16S rRNA genes from an environmental sample, by comparing the clone libraries that we had previously prepared from the gut homogenate of the termite Reticulitermes speratus. We detected a significant increase in the expected number of phylotypes by lowering the annealing temperature, and a significant decrease in the proportion of clones belonging to the predominant group by raising the number of PCR cycles. We also found that the Bacteria-universal primer, 63F, introduced a seriously biased amplification caused by primer mismatches, in contrast to a previous report. These results, together with suggestions from previous studies using simplified model samples, will help us to recognize the limitations of PCR-based analysis.


Introduction
Polymerase chain reaction (PCR)-based analysis of 16S rRNA genes is a powerful and essential tool for studies of bacterial diversity, community structure, evolution and taxonomy. It has enabled us to detect and identify as-yet unculturable bacteria, and in recent years has led to an enormous increase in our knowledge of bacterial ecology and taxonomy. However, the PCR procedure intrinsically results in biases and errors, attributable to complex factors such as preferential annealing between primers and templates, self-annealing between PCR products, different copy numbers of the targeted genes and the formation of chimeric sequences [1–7]. For these reasons, we need to interpret PCR experimental data with great caution.
In order to estimate the effect of the parameters involved in PCR, such as the sequence of a primer, the annealing temperature and the number of PCR cycles, some studies have used a simplified model system with heterogeneous templates comprised of only 2–10 species [1,3–6]. These studies suggested that the annealing temperature should be low enough to suppress the amplification bias caused by primer mismatches, and that the number of PCR cycles should be reduced to minimize the effects of primer mismatches and self-annealing between PCR products. In addition, Marchesi et al. [8,9] proposed the new Bacteria-universal primer set, 63F-1387R (later modified to 1389R by Osborn et al. [10]), as an alternative to the generally used 27F-1392R and 27F-1492R primers. Whereas this new primer set contains mismatches against some bacterial groups, they observed that it greatly improved the efficiency of amplification, and amplified targets from a wider diversity of monocultured bacterial species, including mismatched targets, over 27F-1392R and 27F-1492R. Therefore, Marchesi et al. predicted that the new set would minimize amplification bias derived from the preferential annealing of primers. While all of these observational suggestions seem reasonable, it may be naive to assume that these predictions hold in a study using environmental samples, which contain hundreds of bacterial species, without first testing for amplification bias. However, there have yet to be any reports, except for a study using length heterogeneity analysis by PCR [11], on the effect of these factors when using environmental samples, as opposed to a simplified model system.
In our previous study [12], we investigated the phylogenetic diversity of gut bacteria from the termite Reticulitermes speratus, and accumulated data from 16S rDNA clone libraries, prepared with various PCR primer sets and conditions, in order to evaluate the effect of the primer sequences, annealing temperature and number of PCR cycles. Here, we report our evaluation of several Bacteria-universal primers for 16S rRNA genes, annealing temperatures and numbers of PCR cycles, using environmental samples.

16S rDNA clone libraries
In our previous study [12], we established 14 clone libraries of 16S rDNA from the gut homogenate of the termite R. speratus (Isoptera, family Rhinotermitidae), and sequenced 96 clones from each of the libraries (Table 1). In the present study, we also prepared an additional two 16S rDNA clone libraries from the same DNA sample used in the previous study [12]. The PCR was performed under the following conditions: 2 min initial denaturation at 95 °C; 24 cycles of denaturation (30 s at 95 °C), annealing (1 min at 55 °C), and extension (2 min at 72 °C); a final extension at 72 °C for 10 min. The primer sets 63F-1389R and T63F (5′-CAGGCCTAACACATGCAAGTT-3′)-1389R were used to amplify the region corresponding to 64–1388 in Escherichia coli (J01695). The other conditions and procedures were as described previously [12]. We randomly chose 420 and 159 clones from the new libraries, prepared using 63F-1389R and T63F-1389R, respectively, and analyzed them by restriction fragment length polymorphism (RFLP) using the restriction enzymes HhaI and MboI (Takara). The restriction fragments were electrophoresed on 2.5% agarose gels (classic type; Nacalai) and sorted according to their RFLP patterns. A representative clone of each RFLP type was sequenced and sorted into a phylotype using the criterion of 97% sequence identity as described previously [12]. All of the sequences were subjected to the Check-Chimera program on the Ribosomal Database Project (RDP) web site [13]. After identifying the sequences, using FASTA 3.0 [14], phylotypes that we did not detect in our previous study were analyzed phylogenetically as described previously [12]. The taxonomic affiliation was conducted using the criterion of 80% bootstrap confidence level.
The sequence data of the new phylotypes will appear in the DDBJ, EMBL and GenBank nucleotide sequence databases under accession numbers AB100456–99.

Statistical analysis
The diversity of the clone libraries was estimated by rarefaction analysis [15], using the software Analytic Rarefaction 1.3 (Steven M. Holland, University of Georgia), which is freely distributed at the web site http://www.uga.edu/strata/software/. Analysis was conducted according to the software's manual. The coverage [16] was calculated by the formula [1 − (n/N)], where n is the number of phylotypes represented by only one clone and N is the total number of clones. The calculation was conducted using the program LIBSHUFF [17] as described previously [12].
Table 1. 16S rDNA clone libraries prepared from the gut homogenate of the termite R. speratus in our previous study [12]
Column headings: Clone library (a); Primer set (b); Annealing temperature ( ...
a Ninety-six clones were analyzed from each of the clone libraries [12].
b The sequences and references were described previously [12].
c Concentration was calculated from A260 measured with a DU530 spectrophotometer (Beckman). The measurement was done after purification using the QIAquick PCR Purification kit (QIAGEN).
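The coverage formula [1 − (n/N)] given in the statistical analysis can be illustrated with a minimal Python sketch. The function name `good_coverage` and the example clone counts are hypothetical and are not taken from Table 1; the study itself used LIBSHUFF for this calculation.

```python
def good_coverage(clone_counts):
    """Coverage = 1 - n/N, where n is the number of phylotypes
    represented by exactly one clone and N is the total number of clones.

    clone_counts: number of clones observed for each phylotype.
    """
    total_clones = sum(clone_counts)
    if total_clones == 0:
        raise ValueError("the library contains no clones")
    singletons = sum(1 for count in clone_counts if count == 1)
    return 1 - singletons / total_clones


# Hypothetical library: four phylotypes, two of them seen only once.
example_counts = [3, 1, 1, 5]
coverage = good_coverage(example_counts)
print(f"coverage = {coverage:.2f}")
```

A library in which every phylotype is a singleton gives a coverage of 0, and a library with no singletons gives a coverage of 1.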

Evaluation of Bacteria-universal primers
In concordance with the findings of Marchesi et al. [8], 63F-1389R produced PCR products more efficiently than the primer 27F (Table 1). However, in contrast to Marchesi et al. [8], we found almost no spirochetal clones from libraries K, M and N, prepared using 63F-1389R, whereas they were the most abundant clones in the other libraries (Fig. 1). Even the predominant phylotype, Treponema Rs-D73 (AB088857) [12], was rarely found in libraries K, M and N. Since there were no or only a few mismatches against the dominant groups (spirochetes, Bacteroides, clostridia and Termite Group I (TG-I)), with primers 27F, 39F and 64F, it is unlikely that these primers amplified spirochetes preferentially as the predominant clones. Therefore, it is probable that this was caused by preferential annealing, due to six to eight mismatches between 63F and the corresponding region of the spirochetal sequences (Table 2). Considering the fact that 63F-1387R was able to amplify spirochetes well from monocultures [8], it should be noted that confirmation of a PCR primer set amplifying a target from a monoculture does not ensure unbiased amplification from heterogeneous templates.
From this point of view, a mismatch at the 3′-end between a PCR primer and its target seems more critical, even though Marchesi et al. [8] reported that 63F-1387R retained its competence when a mismatch at the 3′-end existed. In our study, no clones of Rs-D17 (AB089048) [12], one of the dominant TG-I phylotypes, were found from libraries K, M or N, whereas Rs-D17 clones were found from all of the libraries prepared using the other primer sets. Since the other dominant TG-I phylotypes, Rs-D95 (AB089049) and Rs-D43 (AB089050) [12], were found abundantly from K, M and N, the lack of Rs-D17 is solely attributable to the mismatch at the 3′-end, that is, between cytosine in 63F and thymine in Rs-D17 (Table 2). In addition, when we investigated 420 clones from the new clone library, prepared using 63F-1389R, we again found no Rs-D17 clones. On the other hand, when we used T63F, in which the 3′-end was converted from the cytosine of 63F to thymine, we found that 88 of the 159 analyzed clones were those of Rs-D17. The result indicates that a single mismatch at the 3′-end of a PCR primer can give a totally different picture of the bacterial community structure.
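The kind of primer-to-target comparison discussed above can be sketched in Python as follows. The T63F sequence is the one given in the text, and the 63F sequence is derived from it by restoring cytosine at the 3′-end, as the text describes. The target site `HYPOTHETICAL_RS_D17_SITE` is an assumption for illustration only: the text states only that Rs-D17 carries thymine at the position facing the 3′-end of the primer, so the remaining bases are simply copied from the primer. The function name is also hypothetical, and degenerate bases are not handled.

```python
PRIMER_63F = "CAGGCCTAACACATGCAAGTC"   # T63F with C restored at the 3'-end
PRIMER_T63F = "CAGGCCTAACACATGCAAGTT"  # sequence given in the text

# Assumption: only the base facing the primer 3'-end is known from the text.
HYPOTHETICAL_RS_D17_SITE = "CAGGCCTAACACATGCAAGTT"


def mismatch_positions_from_3prime(primer, site):
    """Return mismatch positions counted from the primer 3'-end
    (1 = the 3'-terminal base). Both sequences are written 5'->3'
    in the primer-sense orientation and must have equal length."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    length = len(primer)
    return sorted(
        length - index
        for index, (p_base, s_base) in enumerate(zip(primer, site))
        if p_base != s_base
    )


for name, primer in (("63F", PRIMER_63F), ("T63F", PRIMER_T63F)):
    positions = mismatch_positions_from_3prime(primer, HYPOTHETICAL_RS_D17_SITE)
    has_terminal_mismatch = 1 in positions
    print(name, "mismatches:", positions, "3'-end mismatch:", has_terminal_mismatch)
```

Counting positions from the 3′-end makes it easy to flag the terminal mismatch separately from internal ones, which is the distinction the results above turn on.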
A possible occurrence of preferential annealing due to mismatches was also found from the other primer sets. In libraries H and J, prepared using 41F-1389R, the number of clones related to Bacteroides was limited to as low as seven and three, respectively (Fig. 1). This is probably due to adenine at the fourth position from the 3′-end of 41F mismatching against guanine at the corresponding position in Bacteroides (Table 2). In the 480 clones from the five libraries A–E (Table 1), prepared using 27F-1389R, we found no clones of the δ-proteobacterial phylotype Rs-N31 (AB089104), even though it was one of the dominant phylotypes in most of the other clone libraries [12] (Fig. 1). It is possible that this was also caused by preferential annealing of 27F, although we cannot discuss it here due to a lack of information on the sequences in the corresponding region.
Fig. 1. ... Fig. 1 in our previous paper [12]). Ninety-six clones were analyzed for each clone library. The clones were affiliated with the taxonomic groups by phylogenetic analysis in our previous study [12].
These biased amplifications seem to be caused by primer mismatches of the forward primers, because no obvious difference was found between the libraries prepared using the reverse primers 1389R and 1492R (Fig. 1). Furthermore, the biased amplification with 63F or 41F was improved when the degenerate primers 64F and 39F were used, which were designed to match spirochetal and Bacteroides sequences (Fig. 1 and Table 2). These results suggest that we should not underestimate the effect of primer mismatches.
In spite of the inferiority of 63F-1389R as a Bacteria-universal primer set, it is still useful as long as we realize its characteristics. In the case of our study, the use of 63F-1389R enabled us to amplify a great diversity of non-spirochetal clones, which would not have been acquired using the other primers [12]. In the present study, we found a further 44 non-spirochetal clones from the new clone libraries. We obtained the new phylotypes of 34 clostridia (AB100461–94), one Enterococcus (AB100496), one Desulfosporosinus (AB100497), five Bacteroides (AB100456–60), one α-Proteobacterium (AB100498), one TM7 bacterium (AB100499) and one unidentified bacterium (AB100495) clustered with Rs-A23 (AB089057) and Rs-J96 (AB089068). Therefore, the total number of 16S rDNA phylotypes from the gut of R. speratus now becomes 312 (44+268). In addition, as reported by Marchesi et al. [8], 63F-1389R is a very effective primer set for amplifying monocultured bacteria; in our experience, it often worked very well even for samples where 27F-1392R or 27F-1492R was completely useless (data not shown).

Effect of annealing temperature and number of PCR cycles
For all of the primer sets, the number of phylotypes found from the clone libraries was the same or greater when the annealing temperature was lowered (Table 1). When comparing libraries K and N, prepared using 63F-1389R at 45 °C and 55 °C, respectively, a significant difference in the expected diversity of detectable phylotypes was demonstrated by rarefaction analysis (Fig. 2). The number of phylotypes found from library K was 66, much greater than the 50 in library N (Table 1). This is concordant with a report using a model system [4], which found that a lower annealing temperature tended to allow annealing at a mismatched site. However, the effect of the 10 °C shift-down of annealing temperature, from 55 °C to 45 °C, was limited to an increase of diversity within the range of less than a 0.15 nucleotide difference per site (Fig. 3). This means that the 10 °C shift-down of annealing temperature did not allow an increase of division-level diversity, as indeed shown by the absence of spirochetal clones in libraries K, M and N (Fig. 1). This implies that, when primer mismatches are severe, a considerable shift-down of annealing temperature cannot fully compensate for the mismatches.
Fig. 2. Rarefaction plots for clone libraries K, M and N, prepared using primers 63F and 1389R. Each clone library was generated from the PCR products, using 45, 50 or 55 °C as the annealing temperature [12]. The expected numbers of phylotypes are plotted with 95% confidence intervals. The expected numbers of phylotypes were significantly different (P < 0.05) between K and N, in the range 50 < (number of analyzed clones) < 96. Axes: Number of clones (x); Expected number of phylotypes (y).
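The expected number of phylotypes plotted in a rarefaction curve can be computed with the standard hypergeometric rarefaction formula, E(S_n) = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the number of clones of phylotype i and N is the total number of clones. The Python sketch below is an illustration under that assumption; it does not reproduce the Analytic Rarefaction 1.3 software used in this study and omits the confidence intervals. The function name and the example counts are hypothetical.

```python
from math import comb


def expected_phylotypes(clone_counts, subsample_size):
    """Expected number of phylotypes in a random subsample of
    `subsample_size` clones drawn without replacement from a library
    whose phylotypes have the given clone counts."""
    total_clones = sum(clone_counts)
    if not 0 <= subsample_size <= total_clones:
        raise ValueError("subsample size must be between 0 and the library size")
    all_subsamples = comb(total_clones, subsample_size)
    # comb(a, b) is 0 when b > a, so a phylotype that cannot be missed
    # contributes exactly 1 to the expectation.
    return sum(
        1 - comb(total_clones - count, subsample_size) / all_subsamples
        for count in clone_counts
    )


# Hypothetical library of 10 clones belonging to three phylotypes.
hypothetical_counts = [5, 3, 2]
for n in range(1, sum(hypothetical_counts) + 1):
    print(n, round(expected_phylotypes(hypothetical_counts, n), 3))
```

Evaluating the function over a range of subsample sizes gives the points of a rarefaction curve, which is how libraries of different composition can be compared at an equal number of analyzed clones.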
We also found an effect of reducing the PCR cycle number from 18 (libraries B and D) to 12 (libraries A and C) (Table 1). The abundance of spirochetal clones, the predominant group in these libraries, was greater in libraries A and C than in B and D (Fig. 1). The tendency was most obvious in the comparison between A and B, both prepared using the lower annealing temperature of 50 °C, where there was a significant increase from 42 to 57% in spirochetal abundance (Fisher's exact probability test, P = 0.0430), when the number of cycles was reduced. Suzuki and Giovannoni [1,11] reported that biased amplification occurred when the concentration of PCR products exceeded 2 mM, probably due to self-annealing between the PCR products. The effect was more pronounced as the concentration increased, resulting in a substantial decrease in the proportion of predominant products. Since the concentration of the PCR products in our study for libraries A and C was 1.0 mM after 12 cycles, and 4.2 and 4.0 mM for libraries B and D, after 18 cycles, respectively (Table 1), the difference in abundance of spirochetal clones is probably for the same reason. In this case, the frequency of clones observed in library A should be more representative of the original composition than that in library B. The smaller difference in spirochete abundance between C and D, compared to between A and B, may be attributed to the lower probability of self-annealing between heterogeneous spirochetal fragments at the higher annealing temperature.
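A comparison of this kind, spirochetal versus non-spirochetal clones in two libraries, can be tested with a two-sided Fisher's exact test on a 2×2 table. The Python sketch below implements the test with the standard library only. The clone counts are assumptions: the text reports only the percentages (42% and 57%) and that 96 clones were analyzed per library, so 40 and 55 spirochetal clones are used here as approximations and the resulting P value need not match the reported one exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * tolerance
    )


# Assumed counts (not given in the text): spirochetal / other clones.
library_18_cycles = (40, 56)  # about 42% of 96 clones
library_12_cycles = (55, 41)  # about 57% of 96 clones
p_value = fisher_exact_two_sided(*library_18_cycles, *library_12_cycles)
print(f"two-sided P = {p_value:.4f}")
```

An exact test is a reasonable choice here because each library contains only 96 clones, so large-sample approximations are less reliable.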
The decrease in spirochetal abundance in libraries B and D resulted in an increase in phylogenetic diversity of the clones. The numbers of phylotypes were 57 and 53 in libraries B and D, greater than 51 and 47 in libraries A and C, respectively (Table 1). The rarefaction plots indicated the greater expected diversity of the clones in the libraries prepared using 18 cycles than in those prepared using 12 cycles (Fig. 4), although the differences were significant (P < 0.05) only in the range from 90 to 96 analyzed clones. This tendency was parallel with the reports of Suzuki and Giovannoni, which found that the increase of PCR cycle number resulted in an overestimation of phylogenetic diversity in the samples [1,11].
These results suggest that a primer best matched with diverse bacterial groups, a lower annealing temperature and a decreased number of PCR cycles should be used to minimize amplification bias, as implied by the previous studies using model samples [1,4]. Although the PCR-based analysis of a microbial community cannot be free from biases and errors completely, the method is still essential to the studies of microbes. It is important that we continue to optimize PCR conditions to obtain the highest quality data.
Fig. 3. ... Each clone library was generated from the PCR products, using 45, 50 or 55 °C as the annealing temperature [12]. The evolutionary distance was calculated using the Jukes-Cantor model.
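For reference, the Jukes-Cantor model converts the observed proportion p of differing sites between two aligned sequences into an evolutionary distance d = −(3/4) ln(1 − (4/3) p). The Python sketch below shows this conversion. The function names and the two short aligned sequences are hypothetical, and gap handling is reduced to the simplest case (gapped columns are skipped), which is an assumption rather than the procedure used in this study.

```python
from math import log


def proportion_different(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if gap not in (x, y)]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(1 for x, y in compared if x != y) / len(compared)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance for an observed proportion p of differing
    sites; defined only for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    if p == 0:
        return 0.0
    return -0.75 * log(1 - (4 / 3) * p)


# Hypothetical aligned fragments, for illustration only.
fragment_a = "ACGTACGTACGTACGTACGT"
fragment_b = "ACGTACGAACGTACCTACGT"
p = proportion_different(fragment_a, fragment_b)
print(f"p = {p:.3f}, d = {jukes_cantor_distance(p):.4f}")
```

The correction is small for closely related sequences and grows as p approaches 0.75, where the model treats the sequences as saturated with substitutions.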