Phylogenetic and metagenomic analysis of Verrucomicrobia in former agricultural grassland soil.

The bacterial phylum Verrucomicrobia has a widespread distribution, and is known to be one of the most common and diverse phyla in soil habitats. However, members of this phylum have typically been recalcitrant to cultivation methods, hampering the study of this presumably important bacterial group. In this study, we examine the phylogenetic diversity of the Verrucomicrobia in a former agricultural field and gain access to genomic information via a metagenomic approach. We examined Verrucomicrobia-like 16S rRNA gene sequences recovered from general bacterial and phylum-specific libraries, revealing a dominance of subdivisions 1 and 2. A PCR-based screening method was developed to identify inserts containing verrucomicrobial 16S rRNA genes within a large-insert metagenomic library, and on screening of 28,800 clones, four fosmids were identified as containing verrucomicrobial genomic DNA. Full-length sequencing of fosmid inserts and gene annotation identified a total of 98 ORFs, representing a range of functions. No conservation of gene order was observed adjacent to the ribosomal operons. Fosmid inserts were further analyzed for tetranucleotide frequencies to identify remnants of past horizontal gene transfer events. The metagenomic approach utilized proved to be suitable for the recovery of verrucomicrobial genomic DNA, thereby providing a window into the genomes of members of this important, yet poorly characterized, bacterial phylum.


Introduction
Members of the phylum Verrucomicrobia are abundant in diverse habitats, including soil, water and sediments, as well as extreme environments such as in the Antarctic sample (Pearce et al., 2003), hot springs (60 1C) and low pH 2.0-2.5 habitats (Hou et al., 2008). The lifestyles of some Verrucomicrobia have also been linked to eukaryotic gut environments, including termites (Shinzato et al., 2005), sea cucumbers (Sakai et al., 2003) and humans , or as obligate endosymbionts of nematodes from the genus Xiphinema (Vandekerckhove et al., 2000). In addition, methanotrophs have been identified among the Verrucomicrobia (Dunfield et al., 2007;Pol et al., 2007). Approxi-mately 5% of all characterized 16S rRNA genes are affiliated with the Verrucomicrobia, and this phylum represents 1.2-11% of the total soil bacteria (Sangwan et al., 2004;Kielak et al., 2008). Although only recognized as a separate phylum in the past decade (Garrity & Holt, 2001), their large numbers and broad distribution in soils suggest that they are important members of terrestrial ecosystems.
Although Verrucomicrobia have commonly been recovered from diverse soil environments, relatively little is known about their diversity and distribution with respect to the environmental conditions present in different soilborne habitats. Soil management and nutritional regimes appear to influence Verrucomicrobia community structure (Buckley & Schmidt, 2001), and bacterial 16S rRNA gene library and quantitative PCR analyses have shown that Verrucomicrobia tend to be more dominant in the rhizosphere (Kielak et al., 2008). However, differences between the rhizospheres of different plants tended not to be significant (Kielak et al., 2008). Thus, it appears that Verrucomicrobia represent important rhizosphere colonizers, and yet virtually no information is available to date on the genotypic or the phenotypic characteristics of these bacteria. Furthermore, their importance in terrestrial ecosystems and potential as sources of natural products remains unexplored.
In this study, we therefore sought to gain an insight into the phylogenetic diversity of Verrucomicrobia in a former agricultural field and to develop and apply a cultivationindependent approach towards gaining access to genomic information from this phylum. To address the first of these goals, Verrucomicrobia-like 16S rRNA gene sequences were examined from the libraries recovered using general bacterial and Verrucomicrobia-specific PCR primers. In addition, a large-insert fosmid library (28 800 clones) was constructed from high-molecular-weight DNA isolated from the rhizosphere of Festuca ovina grown at the same field site. A PCRbased screening procedure was developed and implemented to identify clones containing inserts with Verrucomicrobialike 16S rRNA genes. Four inserts, positively determined to be of verrucomicrobial origin, were subsequently subjected to full-length DNA sequence analysis, followed by identification and annotation of ORFs. The genomic properties of verrucomicrobial genomic inserts were studied and patterns of oligonucleotide usage were examined to identify sites of potential horizontal gene transfer (HGT).

Field site and sampling
All soil samples were collected from a former arable field site near Ede (52104 0 N, 5145 0 E), the Netherlands, where an ecosystem restoration experiment was initiated after the 1995 harvest season within the European project 'Changing land usage: enhancement of biodiversity and ecosystem development' . Detailed characteristics of the soil and plant diversity treatment plots are provided in Van der Putten et al. (2000) and Bezemer & Van der Putten (2007).
Samples for the construction of general bacterial 16S rRNA gene libraries were derived from collective rhizosphere and bulk soil from each of the plant diversity treatments in the above experiment as described by Kielak et al. (2008). For the construction of Verrucomicrobiaspecific 16S rRNA gene libraries, rhizosphere soil samples were collected in August 2005 from F. ovina L. plants from within monoculture plots of this species, adjacent to the main plant diversity treatment experiment, as well as from Lotus corniculatus L. plants from various locations within the borders of the plots described above (Bezemer et al., 2006;Kielak et al., 2008).
Rhizosphere and bulk soil samples were taken as follows: roots were first separated and shaken to remove loosely adhering soil. Pieces of roots with the remaining adhering soil were cut and taken as a combined sample of rhizosphere and root surface (rhizoplane), further referred to as rhizosphere samples. Soil remaining behind after the removal of plant material was defined as bulk soil. All soil samples were frozen at 20 1C for approximately 2 weeks before DNA extraction.

DNA isolation
For the construction of all 16S rRNA gene libraries, DNA was isolated from 250 mg wet weight soil using the Ultra Clean TM DNA Isolation Kit, according to the manufacturer's specifications (MoBio Laboratories, Solana Beach, CA). For large-insert metagenomic library construction, high-molecular-weight DNA was extracted from 10 g of soil (wet weight) as described previously (Kielak et al., 2009). The construction of the metagenome library utilized the CopyControl Fosmid Library Production Kit (Epicentre, Madison, WI) according to the manufacturer's protocol. 5 0 -phosphorylated blunt-end DNAs were ligated into the pCC1FOS vector and transformed into Escherichia coli EPI300 (Epicentre). Recovered clones (28 800) were stored in 96-well plates at À 80 1C until use.

PCR-based cloning strategies
Bacterial 16S rRNA gene libraries were constructed as described by Kielak et al. (2008). For Verrucomicrobiaspecific libraries, a phylum-specific forward primer, Ver53F, was combined with a bacterial reverse primer, 1378r (Stevenson et al., 2004). The specificity of the primer was confirmed by probe match check against the Silva 16S rRNA gene database (Pruesse et al., 2007). Thermocycling consisted one 2-min denaturation step at 96 1C, followed by 30 cycles of 30 s at 92 1C; 60 s at 65 1C and 45 s at 68 1C. The terminal elongation step was extended by 10 min, and reactions were cooled to 10 1C upon completion.
PCR product size (approximately 1300 bp) and integrity were examined by standard agarose gel electrophoresis and ethidium bromide staining. PCR products were purified using the Qiagen PCR Purification Kit (Qiagen, Venlo, the Netherlands) and cloned into the pGEM-T vector, as per the manufacturer's protocol (Promega, Madison, WI), for transformation into E. coli JM109 competent cells using the pGEM-T Vector System. Fifteen positive transformants were selected randomly for each rhizosphere soil library, screened for inserts of proper length by PCR with the vector-encoded primers T7 (5 0 -TAATACGACTCACTATAGGG-3 0 ) and SP6 (5 0 -TATTTAGGTGACACTATAG-3 0 ) and sequenced using the same two primers (Macrogen, Seoul, South Korea). Given the virtual identity of Verrucomicrobia-specific PCR denaturing gradient gel electrophoresis community profiles from F. ovina L. and L. corniculatus L. rhizosphere samples (Kielak et al., 2008), and the low number of clones per library, these two libraries were treated as one phylumspecific library for all further analyses.

Selection of clones affiliated with the Verrucomicrobia
All clones derived from PCR-based cloning strategies were screened for potential chimera structures using CHIMERA-CHECK (http://rdp.cme.msu.edu) and BELLEROPHON (Huber et al., 2004) software. After the screening of libraries, 26 clones from the Verrucomicrobia-specific libraries and nine clones previously identified from the general bacterial libraries (Kielak et al., 2008) were selected for further analysis. All sequences were also examined by BLAST (Altschul et al., 1997) to identify the most closely related database sequences.

Screening of the metagenomic library for verrucomicrobial inserts
A total of 28 800 clones in the metagenome library were subjected to a PCR-based screening strategy using a seminested clone pooling strategy (Kielak et al., 2008(Kielak et al., , 2009). Briefly, clones were cultured overnight in 96-well plates, with the contents of four plates (384 clones) combined in a single plasmid extraction (QIAprep Spin Miniprep Kit; Qiagen). The resulting mixed templates were subjected to PCR with the pAF/1378R primers (Edwards et al., 1989), followed by primers Ver53F/1378R (Stevenson et al., 2004) as described above. If a PCR product was detected, clone pooling was reduced to a single plate. Pooling was further reduced to a single row of a culture plate and eventually a single clone for subsequent reactions that produced positive results. Four clones were identified of putative verrucomicrobial origin.

Fosmid clone sequencing and assembly
Four fosmid DNA preparations for each positive clone (designated clones 106, 118, 184 and 286) were performed using the Qiagen Midi Kit, and DNA concentrations were quantified using a Nanodrop (Thermo Scientific) instrument. We then pooled 2.5 mg from each fosmid DNA sample for a total of 10 mg DNA. The pooled fosmid DNA was subjected to pyrosequencing using a 1/2 FLX run on a Roche/454 Life Sciences Genome Sequencer. The resulting data were assembled using NEWBLER ASSEMBLY software v1.1.02.15 (Roche/454 Life Sciences). A total of 32 Mb of sequence data were assembled into 52 (500 kb) contigs. All large contigs were 'shredded' into 1000-bp fragments, overlapping one another by 100 bp. The resulting 192 'shreds' were reassembled with PHRED/PHRAP software, yielding 21 contigs, totalling 137 016 bp. The four fosmid clones were also end-sequenced using the vector-targeting primers T7 and T7 RV primer (5 0 -ATGACCATGATTACGCCAAG-3 0 ) to facilitate orientation of the fosmid assignment. Gap closure by primer walking was completed with the aid of 32 additional custom primers.

Phylogenetic analyses
All recovered Verrucomicrobium-like 16S rRNA gene sequences (four from metagenomic, nine from general bacterial and 26 from phylum-specific libraries) were aligned using the ARB software package (Ludwig et al., 2004) along with 68 sequences obtained from public databases, and the alignment was manually edited based on the secondary structure. Phylogenetic reconstruction was performed by neighborjoining (NJ), maximum likelihood (ML) and Bayesian inference (BI). MRMODELTEST 2.2 (Nylander, 2004) was used to determine the most appropriate nucleotide substitution model. The likelihood ratio test, as well as the Akaike information criterion, identified the general time-reversible model with invariable sites and the G-shaped distribution of substitution rates as the best-fitting model. The NJ and ML trees were constructed using the PHYLIP 3.67 package (Felsenstein, 2005) with the model parameters calculated by MRMODELTEST. The NJ tree was bootstrapped 1000 times, and the ML tree 100 times. The Bayesian tree was constructed using the MRBAYES 3.1.2 program (Ronquist & Huelsenbeck, 2003). Four independent computations, with random starting trees and Markov chains, were run for 3 000 000 generations with a sampling frequency of 100 generations. The first 600 000 generations were discarded as burn-in.

Comparative genomics and identification of potential sites of HGT
The gene orders within the recovered verrucomicrobial fosmid inserts and ORF similarities were compared with the eight publicly available verrucomicrobial draft genome sequences by pairwise TBLASTX using BLAST v2.2.18 (ftp://ftp. ncbi.nih.gov; Altschul et al., 1990) and ACT software for gene/ ORF visualization (Teeling et al., 2004;Carver et al., 2005). The e-value threshold for ORFs compared with these genomes was 10 À30 , because the diversity within the Verrucomicrobia phylum and the low number of available Verrucomicrobia genomes limit the chance for perfect match detection.
Searches for GC islands were performed using CPGFINDER (SoftBerry, http://linux1.softberry.com). For identification of potential HGT regions, the method described by Tamames & Moya (2008) was applied to examine tetranucleotide frequencies. Fosmid sequences were edited to include all ORFs after removal of rRNA operons, and the resulting sequences were cut into 300-bp fragments and tiled with an overlap of 290 bp for each subsequent fragment. Pearson's correlation coefficients of tetranucleotide frequencies for all fragments were calculated using TETRA software (Teeling et al., 2004), and correlation matrix visualization was performed in R (R statistical programming environment) using the 'GPLOTS' package.

Sequence accession numbers
The 16S rRNA gene sequences described in this study have been deposited in GenBank under accession numbers FJ822616-FJ822655. The accession numbers of fosmid clone sequences are as follows: clone 106, FJ872372; clone 118, FJ872373; clone 184, FJ872374; and clone 286, FJ872375.

Results and discussion
Several studies based on sequencing of 16S rRNA genes have suggested that Verrucomicrobia represent one of the more dominant phyla in soil environments (Janssen, 2006;Kielak et al., 2008). They typically represent between 2% and 8% of the total bacterial community, with the highest densities observed in the rhizosphere (Kielak et al., 2008). This relatively high abundance suggests that members of this phylum may play significant roles in the soil environment and have important interactions with plants. This study was therefore geared towards recovering information regarding the phylogenetic diversity of Verrucomicrobia in soil, as well as providing tools for gaining access to genomic information from this potentially important phylum.
Phylogenetic analyses (NJ, ML and BI) of 16S rRNA gene sequences recovered from three independent cloning approaches showed strong phylogenetic support for the established clustering system within the Verrucomicrobia (Hugenholtz et al., 1998;Schlesner et al., 2006) (NJ tree in Fig. 1; ML and BI trees not shown). Subdivision 4 was found to be the basal group in all phylogenetic trees, followed by subdivision 3. Subdivisions 1 and 2 appear to be sister groups (Fig. 1).
In agreement with other studies (Janssen, 2006), we identified subdivision 2 (Spartobacteria) to be the most dominantly detected cluster within all libraries, representing 18 of 27, six of eight and three of four clones for the Verrucomicrobia phylum-specific, general bacterial and metagenomic libraries, respectively. Subdivision 1 (Verrucomicrobiae) was also represented in all libraries with one identified metagenomic clone, two clones in the bacterialspecific library and four in the Verrucomicrobia-specific library. Subdivisions 3 and 4 were relatively rare and were only identified in the Verrucomicrobia phylum-specific libraries (three clones and one clone for subdivisions 3 and 4, respectively). Although all three cloning strategies yielded highly similar results, it is premature to state that the three  methods are truly similar in their recovery of sequences from across the breadth of the phylum, given the low number of clones examined. No clones closely related to subdivisions 5, 6 or 7 were identified in any of the libraries, and these subdivisions were therefore excluded from the presented phylogenetic analyses. The number of clones from the metagenomic library determined to contain genomic inserts of verrucomicrobial origin was rather low (four fosmid clones), but within the range (three to nine positive clones) that would be expected, given the library size (28 800 clones), size of the insert (32 kb), estimated average genome size (6000 kb), 16S rRNA gene copies per genome (1-3) and the relative abundance of Verrucomicrobia within the soil community sampled (4.9-5.7%) (Kielak et al., 2008). The average insert size per positive fosmid clone was 32 AE 1.5 kb (Table 1), yielding a total of approximately 130 kb of Verrucomicrobia genetic information. All four genome fragments revealed typical bacterial rrn operon organization, with 16S rRNA gene, followed by tRNA Ile , the 23S rRNA gene and then the 5S rRNA gene. In two clones (118 and 286), the 16S rRNA and tRNA Ile genes were separated by tRNA Ala .
The total G1C content of the fosmid inserts ranged from 60% to 62%, with a range from 61% to 64% for coding regions. These values are comparable to those found for verrucomicrobial genomes for which drafts are available, with G1C contents ranging from 52% for Ellin514 (http:// www.jgi.doe.gov) to 65% for O. terrae PB90-1 (NC010571). Methylacidiphilum infernorum V4 (NC010794) is a marked exception, with a G1C content of only 45%. Pearson's correlation coefficients for the four verrucomicrobial fosmid inserts were also calculated based on tetranucleotide usage patterns. Using this measure, clones 118 and 106 were most similar (r = 0.78), followed by the clone pair 118 and 286 (r = 0.75). The lowest correlation was between clones 106 and 286 (r = 0.62), while clone 184 was similarly distant to the other three clones (r = 0.64). The divergence in 16S rRNA genes followed a similar pattern, with the lowest dissimilarity (12%) between clones 118 and 106 and the highest dissimilarity (20%) between clones 184 and 286 ( Table 2).
The numbers of putative protein-encoding genes assigned for each clone varied between 22 ORFs (clone 184) and 28 ORFs (clone 118) ( Table 1). All ORFs were compared with the sequences available in public databases with special emphasis on the sequences available for the eighth Verrucomicrobia genome project currently underway. A total of 10% of the predicted genes, found predominantly within clones 118 and 286, showed low or no similarity to sequences deposited in public databases (Supporting Information, Tables S1 and S2). The number of ORFs with best matches to verrucomicrobial sequences ranged between 11 out of 22   (Table S1); 284 (Table S2); 184 (Table S3) and 106 (Table S4). The ribosomal operon illustrated in the figure in each fosmid was excluded from the analysis.
for clone 184 (Table S3) and 23 out of 26 for clone 106 (Table S4). For clone 184, the genome of C. flavus Ellin428 provided the best match for four of the 11 ORFs with verrucomicrobial best matches, whereas all 23 of the ORFs from clone 106 matched this genome. However, the 16S rRNA gene from clone 106 showed lower similarity to the Ellin428 strain compared with clone 118 (96% identity), which also showed several best BLAST hit matches with Proteobacteria, other verrucomicrobial strains and even Planctomycetes (Table S1).
Assignment of ORFS to COG functional categories revealed that all four clones encoded proteins involved in information storage/processing, metabolism and cellular processes/signalling, with clone 106 having the greatest proportion of COG hits belonging to metabolism or cellular processes and signalling. A large number of ORFs (between nine and 15 per clone) could not be classified into any COG category (Table S5).
The three fosmid clones affiliated with subdivision 2 (106, 118 and 286) were compared with the draft genome of Ellin428 by TBLASTX to examine patterns of gene synteny and homology. Homologous protein-encoding regions identified in these comparisons were no longer than 5 kb and contained a maximum of four ORFs (Fig. S1). Clone 184, affiliated with subgroup 1, contained no regions of conserved gene order compared with the other clones or available genomes. Thus, even though some of the recovered clones were very close to sequenced verrucomicrobial strains, based on 16S rRNA gene comparisons, large levels of divergence appear at the level of gene order and origin. Together, these observations highlight the fact that available sequence information represents but a small fraction of the depth of genomic information across the phylum as a whole.
Oligonucleotide frequencies within DNA sequences follow species-specific patterns (Nei & Gojobori, 1986;Sueoka, 1988;Karlin et al., 1998;Bellgard et al., 2001;Bentley & Parkhill, 2004). Such patterns carry an innate phylogenetic signal, and aberrant patterns of oligonucleotide usage may thus be indicative of regions recently acquired from foreign genetic sources (Pride et al., 2003;Abe et al., 2005). Upon analysis of tetranucleotide usage patterns (Tamames & Moya, 2008), clones 106 (Fig. 2a) and 184 (Fig. 2b) each appeared to contain one region of potential horizontal transfer. Several regions of aberrant tetranucleotide usage were observed in fosmid 286 (Fig. 2c), while no such regions were found in fosmid 118 (Fig. 2d). The ORFs identified within the regions of aberrant tetranucleotide usage typically exhibited only low levels of similarity to available database entries (29% identity, e-value = 7 Â 10 À45 for clone 106; 43% identity, e-value = 0.002 for clone 184; and 32% identity, e-value = 7.9 for clone 286). Interestingly, with the exception of fosmid 106, these regions corresponded with ORFs that did not yield verrucomicrobial BLAST matches. The largest potential HGT region detected (in clone 286) contained several genes (UvrB/UvrC protein, putative peptidoglycan peptidase and ankyrin) flanked by insertion sequence elements, an organization typical of mobile DNA regions. It is noteworthy that the putative ankyrin repeat belongs to a functionally diverse group of eukaryotic proteins, for which HGT has been implicated in its transfer to prokaryotic genomes (Bork, 1993).
The cultivation-independent approach used in this study allowed for genomic information to be recovered specifically from the Verrucomicrobia, but a large number of clones was required for the identification of a relatively small number of verrucomicrobial clones. Despite the recent advances in shotgun metagenomic analysis, it remains clear that metagenomes derived from highly diverse soil environments will not yield large genomic scaffolds (Tringe et al., 2005;Kowalchuk et al., 2007). Thus, directed approaches, such as the one used here, will continue to be necessary for the recovery of extended regions of verrucomicrobial genomic information from soil. Although this study has provided some insight into the genome content and genomic properties of some members of the phylum Verrucomicrobia, defining the ecological and evolutionary properties from relatively small genomic regions (32 kb) remains problematic. Much larger sample sizes will be necessary in order to retrieve sufficient information to make ecological inferences, and novel high-throughput screening methods will need to be used to achieve this goal (Park et al., 2008). Also, the approach utilized here is currently restricted to regions adjacent to rRNA operons. Now that verrucomicrobial genomic sequences are becoming available, detecting verrucomicrobial genomic fragments could be achieved by endsequencing of large insert libraries followed by sequence comparison and binning. Particularly exciting are methods of single-cell sequencing and microcultivation (Abulencia et al., 2006;Ingham et al., 2007;Podar et al., 2007), which could provide novel windows into the genomic diversity of this important bacterial phylum.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Fig. S1. Comparative genomic organization of fosmids in relation to Ellin428. Table S1. Identified ribosomal genes and predicted ORF of the verrucomicrobial genomic fragment 118. Table S2. Identified ribosomal genes and predicted ORF of the verrucomicrobial genomic fragment 286. Table S3. Identified ribosomal genes and predicted ORF of the verrucomicrobial genomic fragment 184. Table S4. Identified ribosomal genes and predicted ORF of the verrucomicrobial genomic fragment 106. Table S5. Number of ORFs of Verrucomicrobia genomic fragments (106, 118, 184, and 286) assigned to different COG functional categories.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.