-
PDF
- Split View
-
Views
-
Cite
Cite
Florian Humily and others, Development of a targeted metagenomic approach to study a genomic region involved in light harvesting in marine Synechococcus, FEMS Microbiology Ecology, Volume 88, Issue 2, April 2014, Pages 231–249, https://doi.org/10.1111/1574-6941.12285
Close - Share Icon Share
Abstract
Synechococcus, one of the most abundant cyanobacteria in marine ecosystems, displays a broad pigment diversity. However, the in situ distribution of pigment types remains largely unknown. In this study, we combined flow cytometry cell sorting, whole-genome amplification, and fosmid library construction to target a genomic region involved in light-harvesting complex (phycobilisome) biosynthesis and regulation. Synechococcus community composition and relative contamination by heterotrophic bacteria were assessed at each step of the pipeline using terminal restriction fragment length polymorphism targeting the petB and 16S rRNA genes, respectively. This approach allowed us to control biases inherent to each method and select reliable WGA products to construct a fosmid library from a natural sample collected off Roscoff (France). Sequencing of 25 fosmids containing the targeted region led to the assembly of whole or partial phycobilisome regions. Most contigs were assigned to clades I and IV consistent with the known dominance of these clades in temperate coastal waters. However, one of the fosmids contained genes distantly related to their orthologs in reference genomes, suggesting that it belonged to a novel phylogenetic clade. Altogether, this study provides novel insights into Synechococcus community structure and pigment type diversity at a representative coastal station of the English Channel.
Introduction
During the last decade, the development of culture-independent approaches has allowed scientists to make a leap forward in microbial ecology by showing that the genetic diversity of natural populations is strongly underrepresented in culture collections (Rusch et al., 2007; Pedrós-Alió, 2012). Indeed, the advent of high-throughput sequencing has granted scientists access to extensive gene content information on the different members of natural communities and even sometimes to entire metabolic pathways or whole genomes (Tringe & Rubin, 2005; Martín et al., 2006; Wooley et al., 2010). This approach proved particularly successful in environments with low complexity (Tyson et al., 2004; Cuadros-Orellana et al., 2007). However, environmental genomics (also called metagenomics) often fails to relate diversity information of specific microorganisms to their ecological functions in more complex environments, in which this link is often only accessible for the dominant members of the microbial community (Pedrós-Alió, 2012).
To characterize the gene repertoire of a particular group, reducing the sample complexity appears to be an appropriate strategy. Analysis of subcommunities exhibiting lower diversity, such as specific taxa or populations (Kalyuzhnaya et al., 2008) or even single cells (for a review, see Stepanauskas, 2012), is now possible by an approach termed ‘targeted enrichment’ (Hallam et al., 2006). The latter comprises focusing on a particular cell population, based on its specific phenotypic or spectral characteristics. In marine ecology, fluorescence-activated cell sorting by flow cytometry has become one of the most popular isolation methods, by allowing high-throughput cell separation based on various parameters such as light scatter and natural or induced fluorescence (Sekar et al., 2004; Stepanauskas & Sieracki, 2007; Woyke et al., 2009; Yoon et al., 2011). After cell or population enrichment, sequencing efforts can also be directed toward specific genes or metabolic pathways (i.e. targeted metagenomics), selected using different screening methods (for a review, see Suenaga, 2012). However, these targeted methodologies require working with low cell numbers, and thus low DNA amounts, often limiting for current sequencing procedures. This constraint can be circumvented by the use of whole-genome amplification (WGA), enabling acquisition of enough DNA for sequencing from a very low amount of DNA or cells (Abulencia et al., 2006; Podar et al., 2007; Chen et al., 2008; Lepère et al., 2011) or even single cells (Zhang et al., 2006; Rodrigue et al., 2009). Multiple displacement amplification (MDA) emerged as the reference technique, due to the high processivity and strand displacement activity of the phi29 enzyme (Dean et al., 2001). Although this technique is not devoid of biases, due to chimera formation (Lasken & Stockwell, 2007; Chen et al., 2008) or differential amplification of some taxa with regard to others (Abulencia et al., 2006), the combination of flow cell sorting and WGA led to significant advances in recent years. This notably includes the study of distribution, genetic diversity, and gene content of various uncultured phytoplankton groups (Cuvelier et al., 2010; Lepère et al., 2011; Mazard et al., 2011; Vaulot et al., 2012; Malmstrom et al., 2013) or the acquisition of environmental genomes, such as UCYN-A, starting from only 5000 flow-sorted cells (Tripp et al., 2010).
In this context, Synechococcus constitutes a particularly good candidate for targeted metagenomic studies as it is the second most abundant single-celled cyanobacterium in marine ecosystems (Scanlan et al., 2009) and plays a key role in the global carbon cycle (Li, 1994; Richardson & Jackson, 2007; Jardillier et al., 2010; Buitenhuis et al., 2012). This oxygenic photoautotroph, which is found in the upper lit layer across large environmental gradients from coastal waters to the open ocean, exhibits a high genetic diversity (Zwirglmaier et al., 2008; Huang et al., 2012; Mazard et al., 2012). Four clades (I to IV) are particularly abundant, with clades I and IV prevailing in nutrient-rich coastal waters, clade II in (sub)tropical waters, and clade III in oligotrophic regimes (Zwirglmaier et al., 2008; Mella-Flores et al., 2011; Mazard et al., 2012). Synechococcus also displays considerable pigment diversity (Six et al., 2007). Like most cyanobacteria, it collects light using phycobilisomes (PBS), that is, pigment–protein complexes composed of phycobiliproteins. Three main pigment types have been defined based on the major phycobiliprotein found in PBS rods: type 1 contains only phycocyanin (PC); type 2, PC and phycoerythrin I (PEI); and type 3, PC, PEI, and PEII (Six et al., 2007). The latter type has been subdivided into five subtypes based on the ratio of the two phycobilins, phycourobilin (PUB, Amax = 495 nm), or phycoerythrobilin (PEB, Amax = 545 nm), linked to these phycobiliproteins (Humily et al., 2013). This PUB to PEB ratio, generally assessed by the fluorescence excitation ratio of these two chromophores, can be low (3a), medium (3b), high (3c), or variable (3d-e). The latter two subtypes correspond to cells performing type IV chromatic acclimation (CA4) that are able to match their pigmentation with incident light (Palenik, 2001; Everroad & Wood, 2006; Humily et al., 2013). CA4 subtypes can be further distinguished, by adding a suffix (A or B), to indicate the occurrence of one of the two CA4 genomic island types, CA4-A or CA4-B, in their genomes. The seven pigment types/subtypes also differ in their content of genes involved in the synthesis and regulation of PBS rods, which are gathered into a specific genomic region ranging in size between 9 and 30 kbp (Six et al., 2007). Although this pigment diversity likely allows these organisms to colonize the large variety of light environments existing in the marine ecosystem, from green turbid coastal to blue oligotrophic waters (Olson et al., 1988, 1990; Partensky et al., 1999), the influence of pigmentation on the niche partitioning of Synechococcus spp. remains poorly understood.
In this study, we have developed a metagenomics approach to specifically target PBS genomic regions, the gene content of which can provide valuable information regarding the pigment types present in natural Synechococcus populations without prior knowledge of their spectral properties as well as about the evolutionary origin and regulation of PBS biosynthesis genes. This method, which consists of a unique combination of cell sorting by flow cytometry, WGA, and fosmid library construction, is actually applicable to retrieve any large DNA fragments from uncultured Synechococcus cells.
Materials and methods
Sample collection and cell sorting
A seawater surface sample (8 L) was collected on May 29th, 2012 at 1 m depth at the Service d'Observation en Milieu LITtoral (SOMLIT) Astan long-term station off Roscoff, France (latitude: 48°46′40″N, longitude: 3°56′15″W), using a Niskin bottle mounted on a CTD frame. Characteristics of the seawater at the sampling date were the following: temperature: 12.55 °C; salinity: 35.24 psu; and Chl a: 0.60 μg L−1. Nutrients were also measured according to standard protocols (Verdouw et al., 1978; Aminot & Kérouel, 2007): NH4+: 1.40 μM;
:1.10 μM;
: 0.17 μM;
: 0.22 μM. Within 2 h after being collected, the sample was prefiltered through 3-μm pore-size polycarbonate filters (Millipore, Molsheim, France) to remove large planktonic cells and particles, then concentrated (229-fold) by tangential flow filtration (TFF) on a 100 000 MWCO membrane, as previously described (Marie et al., 2010). Picophytoplankton cell concentrations were determined by flow cytometry using a FACS Canto II (Becton Dickinson, San Jose, CA), as detailed in the study by Marie et al. (1999).
Synechococcus cells were sorted using a FACS Aria flow cytometer (Becton Dickinson). To prevent sample contamination, the flow cytometer was cleaned by a succession of 2% detergent solution (Hellmanex® II, Hellma GmbH & Co. KG, Germany) in warm water, for at least 2 h followed by overnight rinsing with MilliQ water, as described by Stepanauskas & Sieracki (2007). The phosphate-buffered saline (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 2 mM KH2PO4, pH 7.5) used as sheath fluid (pressure of 70 psi) was prepared by dissolving combusted (4 h at 450 °C) salts in MilliQ water, UV-treated for 2 h, autoclaved at 121 °C for 20 min, and then filtered through a 0.2-μm pore-size Stericup® filter (Millipore) before transferring into a UV-treated tank. Prior to sorting, a 0.2-μm pore-size Sterivex® filter (Millipore) was connected on the sheath fluid line injection, just before the flow chamber.
A three-step sorting procedure was used to sort Synechococcus cells to reduce as much as possible the number of co-occurring heterotrophic cells (Supporting Information, Fig. S1). Cells were first sorted based on their natural autofluorescence using the ‘purity’ mode and by triggering the signal on both red chlorophyll fluorescence and side scatter. Then, two additional sorts were performed to screen out heterotrophic bacteria and picoeukaryotes after DNA staining for 15 min using a combination of two nucleic acid stains, SYTO-9 and SYTO-13, at a final concentration of 5 μM each (Invitrogen, Carlsbad, CA; del Giorgio et al., 1996; Lebaron et al., 1998; Gasol et al., 1999). Between each sort, the sample injection and flow chamber lines were cleaned for 10 min using an alkaline detergent (Bio-Rad, Hercules, CA) and rinsed for at least 10 min with autoclaved MilliQ water. Sorted cells were collected in UV-treated Eppendorf tubes, quickly spun down, and kept on ice until WGA or DNA extraction.
Whole-genome amplification
WGA reactions were carried out under a HEPA/UV3 PCR Workstation (UVP, Cambridge, UK) using the Genomiphi v2 kit (GE Healthcare, Waukesha, WI). Lysis (400 mM KOH, 100 mM DTT, 10 mM EDTA) and neutralization buffers (400 mM HCl, 600 mM Tris-HCl, pH 7.5) were filtered through a 0.2-μm pore-size MiniSart syringe filters (Sartorius Stedim Biotech GmbH, Goettingen, Germany) and UV-treated for 30 min just before WGA. All samples were denatured using a chemical procedure that proved to be more suitable than thermal denaturation to obtain large-size amplification products, necessary for fosmid library construction (Fig. S2). Approximately 2500 sorted Synechococcus cells (corresponding to 5 μL in our sorting conditions) were lysed by adding 1 μL of lysis buffer and incubated for 10 min on ice before adding 1 μL of neutralization buffer. WGA reactions were carried out in 24 μL final volume by adding sample buffer (7 μL), reaction buffer (9 μL), and phi29 enzyme (1 μL; GE Healthcare) and then incubated for 2.5 h at 30 °C before inactivating the enzyme for 5 min at 65 °C. Blank controls with sterile water and sheath fluid collected at the exit of the chamber were also taken for each experiment. For PCRs, WGA products were diluted 10-fold in sterile MilliQ water. WGA products and dilutions were stored at −20 °C until processing.
PCR amplification and T-RFLP screening
Terminal restriction fragment length polymorphism (T-RFLP) targeting either the 16S rRNA gene or petB genes was used to assess the purity of sorting and amplification biases. Genomic DNA was extracted from concentrated natural samples (c. 40 000 Synechococcus cells) and sorted samples (50–100 000 cells) using the DNA Blood and Tissue Kit (Qiagen, Courtaboeuf, France), following the conditions described by Balzano et al. (2012). PCRs were performed in a 25 μL total volume containing 1.25 U GoTaq polymerase (Promega, Madison, WI), 1× polymerase buffer, and 1.5 mM MgCl2. Primers and PCR conditions are listed in Table 0001. Forward primers 16S_27F and petBF were 5′-labeled with 6-carboxyfluorescein (6-FAM, Eurogentec, Seraing, Belgium) for fragment detection by T-RFLP. All amplifications were performed using a GeneAmp PCR system 9700 (Applied Biosystems, Carlsbad, CA), and the program consisted of an initial denaturation step of 5 min at 94 °C, followed by 30 cycles (30 s at 94 °C, 30 s at 55 °C, and 1 min at 72 °C), and a final extension step of 10 min at 72 °C (Table 0001). To assign 16S rRNA gene and petB terminal restriction fragments (T-RFs), Synechococcus sequences from natural populations and cultured isolates (Mazard et al., 2012) were used to generate T-RF databases by virtual in silico digestion using various restriction enzymes (Supporting Information, Tables S1 and S2). While for the 16S rRNA gene, MspI was used as suggested by Mazard et al. (2011), this database allowed us to identify HpaII/HinfII as the most appropriate set of restriction enzymes for petB, for which it provided eight distinct T-RFs for the whole database (Table S2). Both amplicon types were digested for 3 h at 37 °C, using the manufacturer's recommendations (New England Biolabs, Evry, France). After inactivation of the enzyme at 80 °C for 15 min, T-RFs were diluted in Hi-Di™ formamide (Applied Biosystems), separated using a 3130xl Genetic Analyzer and analyzed using the GeneMapper software (Applied Biosystems). Only T-RFs between 85 and 500 bp were included in the analysis.
List of primers used in this study
| Primer name | Sequence (5′ to 3′) | Targeted gene/operon | Use | Product | Amplicon length | Annealing temp. (°C) | dNTPs (μM) | Primers conc. (nM) | # cycles | References |
| 16S_27F | AGAGTTTGATCMTGGCTCAG | 16S rRNA gene | T-RFLP | 16S rRNA gene Eubacteria | 1448 | 55 | 100 | 400 | 30 | Field et al. (1998) |
| 16S_1492R | TACGGYTACCTTGTTACGACTT | Lane (1991) | ||||||||
| petBF | TACGACTGGTTCCAGGAACG | petB | T-RFLP/Pyrosequencing | b6 subunit of cytochrome b6/f | 597 | 55 | 250 | 1 μM | 30 | Mazard et al. (2012) |
| petBR | GAAGTGCATGAGCATGAA | 28–30 | Mazard et al. (2012) | |||||||
| mpeB_220F | TACACCAACCGCAARATGGC | mpeBA | Screening (hybridization/PCR) | PEII α- and β-subunits | 626 | 55 | 300 | 500 | 30 | This study |
| mpeA_263R | ACRAAGTCRCGCTTGCACTT | This study | ||||||||
| SynB1F | ATGCTCGACGCATTCTCCMG | cpeBA | Pyrosequencing | PEI α- and β-subunits | 635 | 55 | 250 | 500 | 30–35 | Everroad & Wood (2006) |
| SynA3R | CGGTGGTGAYGACGGAYTTCAT | Everroad & Wood (2006) | ||||||||
| c/rpcB_244F | TGCCTGCGCGACATGGAGATC | r/cpcBA | Pyrosequencing | PC α- and β-subunits | 523 | 50 | 250 | 500 | 30–40 | This study |
| SyncpcA-Rev | ATCTGGGTGGTGTAGGG | Screening (hybridization/PCR) | 30 | Haverkamp et al. (2008) | ||||||
| cpeC_187F | TTCAAGCTYGGTGARATCAG | cpeC | Screening (hybridization/PCR) | PE-associated linker | 645 | 55 | 300 | 500 | 35 | This study |
| cpeC_832R | TGAAYTGCTCNGAGAGYTTG | This study | ||||||||
| cpeU_224F | GATTTTGGTGGGARAGYAA | cpeU | Screening (PCR) | Putative phycobilin lyase | 228 | 55 | 30 | 500 | 30 | This study |
| cpeU_452R | ACAAACCARCAKCGYTCDAT | This study |
| Primer name | Sequence (5′ to 3′) | Targeted gene/operon | Use | Product | Amplicon length | Annealing temp. (°C) | dNTPs (μM) | Primers conc. (nM) | # cycles | References |
| 16S_27F | AGAGTTTGATCMTGGCTCAG | 16S rRNA gene | T-RFLP | 16S rRNA gene Eubacteria | 1448 | 55 | 100 | 400 | 30 | Field et al. (1998) |
| 16S_1492R | TACGGYTACCTTGTTACGACTT | Lane (1991) | ||||||||
| petBF | TACGACTGGTTCCAGGAACG | petB | T-RFLP/Pyrosequencing | b6 subunit of cytochrome b6/f | 597 | 55 | 250 | 1 μM | 30 | Mazard et al. (2012) |
| petBR | GAAGTGCATGAGCATGAA | 28–30 | Mazard et al. (2012) | |||||||
| mpeB_220F | TACACCAACCGCAARATGGC | mpeBA | Screening (hybridization/PCR) | PEII α- and β-subunits | 626 | 55 | 300 | 500 | 30 | This study |
| mpeA_263R | ACRAAGTCRCGCTTGCACTT | This study | ||||||||
| SynB1F | ATGCTCGACGCATTCTCCMG | cpeBA | Pyrosequencing | PEI α- and β-subunits | 635 | 55 | 250 | 500 | 30–35 | Everroad & Wood (2006) |
| SynA3R | CGGTGGTGAYGACGGAYTTCAT | Everroad & Wood (2006) | ||||||||
| c/rpcB_244F | TGCCTGCGCGACATGGAGATC | r/cpcBA | Pyrosequencing | PC α- and β-subunits | 523 | 50 | 250 | 500 | 30–40 | This study |
| SyncpcA-Rev | ATCTGGGTGGTGTAGGG | Screening (hybridization/PCR) | 30 | Haverkamp et al. (2008) | ||||||
| cpeC_187F | TTCAAGCTYGGTGARATCAG | cpeC | Screening (hybridization/PCR) | PE-associated linker | 645 | 55 | 300 | 500 | 35 | This study |
| cpeC_832R | TGAAYTGCTCNGAGAGYTTG | This study | ||||||||
| cpeU_224F | GATTTTGGTGGGARAGYAA | cpeU | Screening (PCR) | Putative phycobilin lyase | 228 | 55 | 30 | 500 | 30 | This study |
| cpeU_452R | ACAAACCARCAKCGYTCDAT | This study |
Amplicon length was determined on Synechococcus sp. WH8102 (GenBank accession number BX548020).
16S rRNA gene and petB forward primers used for T-RFLP were 5′-labeled with 6-carboxyfluorescein (6-FAM).
For pyrosequencing, the number of cycles was adjusted to obtain a faint band on an agarose gel.
List of primers used in this study
| Primer name | Sequence (5′ to 3′) | Targeted gene/operon | Use | Product | Amplicon length | Annealing temp. (°C) | dNTPs (μM) | Primers conc. (nM) | # cycles | References |
| 16S_27F | AGAGTTTGATCMTGGCTCAG | 16S rRNA gene | T-RFLP | 16S rRNA gene Eubacteria | 1448 | 55 | 100 | 400 | 30 | Field et al. (1998) |
| 16S_1492R | TACGGYTACCTTGTTACGACTT | Lane (1991) | ||||||||
| petBF | TACGACTGGTTCCAGGAACG | petB | T-RFLP/Pyrosequencing | b6 subunit of cytochrome b6/f | 597 | 55 | 250 | 1 μM | 30 | Mazard et al. (2012) |
| petBR | GAAGTGCATGAGCATGAA | 28–30 | Mazard et al. (2012) | |||||||
| mpeB_220F | TACACCAACCGCAARATGGC | mpeBA | Screening (hybridization/PCR) | PEII α- and β-subunits | 626 | 55 | 300 | 500 | 30 | This study |
| mpeA_263R | ACRAAGTCRCGCTTGCACTT | This study | ||||||||
| SynB1F | ATGCTCGACGCATTCTCCMG | cpeBA | Pyrosequencing | PEI α- and β-subunits | 635 | 55 | 250 | 500 | 30–35 | Everroad & Wood (2006) |
| SynA3R | CGGTGGTGAYGACGGAYTTCAT | Everroad & Wood (2006) | ||||||||
| c/rpcB_244F | TGCCTGCGCGACATGGAGATC | r/cpcBA | Pyrosequencing | PC α- and β-subunits | 523 | 50 | 250 | 500 | 30–40 | This study |
| SyncpcA-Rev | ATCTGGGTGGTGTAGGG | Screening (hybridization/PCR) | 30 | Haverkamp et al. (2008) | ||||||
| cpeC_187F | TTCAAGCTYGGTGARATCAG | cpeC | Screening (hybridization/PCR) | PE-associated linker | 645 | 55 | 300 | 500 | 35 | This study |
| cpeC_832R | TGAAYTGCTCNGAGAGYTTG | This study | ||||||||
| cpeU_224F | GATTTTGGTGGGARAGYAA | cpeU | Screening (PCR) | Putative phycobilin lyase | 228 | 55 | 30 | 500 | 30 | This study |
| cpeU_452R | ACAAACCARCAKCGYTCDAT | This study |
| Primer name | Sequence (5′ to 3′) | Targeted gene/operon | Use | Product | Amplicon length | Annealing temp. (°C) | dNTPs (μM) | Primers conc. (nM) | # cycles | References |
| 16S_27F | AGAGTTTGATCMTGGCTCAG | 16S rRNA gene | T-RFLP | 16S rRNA gene Eubacteria | 1448 | 55 | 100 | 400 | 30 | Field et al. (1998) |
| 16S_1492R | TACGGYTACCTTGTTACGACTT | Lane (1991) | ||||||||
| petBF | TACGACTGGTTCCAGGAACG | petB | T-RFLP/Pyrosequencing | b6 subunit of cytochrome b6/f | 597 | 55 | 250 | 1 μM | 30 | Mazard et al. (2012) |
| petBR | GAAGTGCATGAGCATGAA | 28–30 | Mazard et al. (2012) | |||||||
| mpeB_220F | TACACCAACCGCAARATGGC | mpeBA | Screening (hybridization/PCR) | PEII α- and β-subunits | 626 | 55 | 300 | 500 | 30 | This study |
| mpeA_263R | ACRAAGTCRCGCTTGCACTT | This study | ||||||||
| SynB1F | ATGCTCGACGCATTCTCCMG | cpeBA | Pyrosequencing | PEI α- and β-subunits | 635 | 55 | 250 | 500 | 30–35 | Everroad & Wood (2006) |
| SynA3R | CGGTGGTGAYGACGGAYTTCAT | Everroad & Wood (2006) | ||||||||
| c/rpcB_244F | TGCCTGCGCGACATGGAGATC | r/cpcBA | Pyrosequencing | PC α- and β-subunits | 523 | 50 | 250 | 500 | 30–40 | This study |
| SyncpcA-Rev | ATCTGGGTGGTGTAGGG | Screening (hybridization/PCR) | 30 | Haverkamp et al. (2008) | ||||||
| cpeC_187F | TTCAAGCTYGGTGARATCAG | cpeC | Screening (hybridization/PCR) | PE-associated linker | 645 | 55 | 300 | 500 | 35 | This study |
| cpeC_832R | TGAAYTGCTCNGAGAGYTTG | This study | ||||||||
| cpeU_224F | GATTTTGGTGGGARAGYAA | cpeU | Screening (PCR) | Putative phycobilin lyase | 228 | 55 | 30 | 500 | 30 | This study |
| cpeU_452R | ACAAACCARCAKCGYTCDAT | This study |
Amplicon length was determined on Synechococcus sp. WH8102 (GenBank accession number BX548020).
16S rRNA gene and petB forward primers used for T-RFLP were 5′-labeled with 6-carboxyfluorescein (6-FAM).
For pyrosequencing, the number of cycles was adjusted to obtain a faint band on an agarose gel.
Fosmid library construction
Twenty WGA products exhibiting no significant genetic diversity bias with regard to the initial sample, as checked by T-RFLP screening, were pooled for fosmid library construction. As previously suggested, DNA branching was reduced by cutting junctions using an enzymatic treatment with 1000 U of Nuclease S1 (Promega) for 1 h (Zhang et al., 2006; Neufeld et al., 2008). DNA was then purified by precipitation using SureClean reagent (Bioline France, Paris, France), washed using 70% ethanol, and rehydrated in sterile MilliQ water. The metagenomic library was constructed using the CopyControl™ HTP Fosmid Library Production Kit with pCC2FOS™ Vector (Epicentre, Madison). For the end-repair procedure, slight modifications to the standard protocol were made by adding 40 U of DNA polymerase I (New England Biolabs) to the enzymatic mix provided by the manufacturer and incubating the reaction at 25 °C for 3 h. The end-repaired DNA was separated on a 1% low melting point agarose gel (Invitrogen) in Tris Borate EDTA buffer 0.5×, at 200 V and a linearly ramped pulse time of 0.5–1.5 s for 20 h on a pulsed-field gel electrophoresis (PFGE) Chef DRII (Bio-Rad). Fragments between 25 and 55 kbp were cut out of the gel after staining the DNA ladder with ethidium bromide. DNA was purified after agarose digestion for 5 h at 45 °C in the presence of 10 U g−1 GELase (Epicentre) and then concentrated on 100-kDa Amicon centrifugal filter units (Millipore). About 300 ng DNA was added to a ligation reaction with the pCC2FOS™ vector and in vitro packaged according to the manufacturer's protocol. Once titrated, the entire phage suspension was used to infect an exponentially growing Escherichia coli EPI300-T1R culture. The infected metagenomic library was frozen in liquid nitrogen and stored at −80 °C in glycerol (20% final concentration) until screening. After thawing, the desired number of clones was plated on LB agar containing 12.5 μg mL−1 chloramphenicol for subsequent screening.
Fosmid library screening
Colony picking of about 55 000 clones and library screening on a high-density colony filter were performed at the ‘Centre National de Ressources Génomiques Végétales’ (Institut National de Recherche Agronomique, Toulouse, France), as described elsewhere (Gonthier et al., 2010). To increase the number of fosmids containing the PBS region and the average coverage, hybridizations were performed using probes targeting three loci situated at different locations of this region. This included the cpeC gene encoding the PEI linker polypeptide, the mpeBA operon encoding the PEII α and β subunits, and the rpcBA/cpcBA operon encoding the PC α and β subunits (see Fig. 0003 in Six et al., 2007). Probes were synthesized by PCR using the pool of WGA products from the Astan station and degenerate primers that were designed with Primaclade (Gadberry et al., 2005) and manually refined based on sequence alignments of these three loci (Table 0001). PCR products were purified from agarose gels using the Qiaquick Gel Purification kit (Qiagen) and quantified by the Quant-iT™ PicoGreen dsDNA assay (Invitrogen) according to the manufacturer's instructions using a Tecan microplate reader (Tecan, Männedorf, Switzerland). Amplicons were labeled with [α-32P]dCTP by random priming using Ready-To-Go DNA Labelling Beads and purified through Illustra™ ProbeQuant™ G-50 Micro Columns (GE Healthcare). Filter hybridization was performed with a pool of the three probes as described elsewhere (Sambrook & Russel, 2001). Filters were scanned and analyzed using a Storm 860 PhosphorImager (GE Healthcare).
Fosmid selection and sequencing
To select fosmids to be sequenced, positives clones selected by hybridization were secondarily screened by colony PCR targeting the three abovementioned loci as well as another gene of the PBS region, cpeU, encoding a putative PE phycobilin lyase (Six et al., 2007). After induction to high copy number, fosmid DNA was extracted using a NucleoSpin Plasmid Kit (Macherey-Nagel, Hoerdt, France) and quantified with a ND-1000 spectrophotometer (Thermo Scientific, Wilmington). To check the theoretical size and position of each insert within the PBS region, end-sequencing of PCR-positive clones was performed according to the CopyControl™ HTP Fosmid Library Production Kit instructions manual (Epicentre). The actual insert size was then determined by PFGE on 25 selected clones after fosmid DNA digestion by NotI as described above. These clones were pooled before generating a paired-end library, which was then sequenced using the Illumina technology on a MiSeq system (2 × 250-bp reads). This resulted in 6.9 × 106 paired-end reads with insert sizes ranging from 50 to 250 bp and with about 75% of the reads overlapping over at least 10 bp. Reads corresponding to the vector and the E. coli host were removed through mapping using the Geneious Workbench (Biomatters, Auckland, New Zealand). Remaining reads, corresponding to inserts, were then assembled using the clc assemblycell© software (CLCBio, Prismet, Denmark) followed by contig elongation using CAP3 (Huang & Madan, 1999).
454 pyrosequencing
To complement the T-RFLP analyses, petB, r/cpcBA, and cpeBA loci were sequenced by 454 pyrosequencing to assess the diversity of Synechococcus populations at each step of the metagenomic approach: after TFF, cell sorting, and WGA. Amplifications were conducted with ‘fusion’ primers comprising a 454 adaptor, a TCAG key used for amplicon sequencing, and a multiplex identifier (MID) tag for sample identification from a mixed pool. The cpeBA operon was amplified using the primer set SynB1F–SynA3R (Table 0001). Amplification of the r/cpcBA operon was performed using the reverse primer designed by Haverkamp et al. (2008) and a forward primer newly designed using Primaclade (Gadberry et al., 2005) on a manually refined multiple sequence alignment containing Synechococcus sequences from marine, brackish, and freshwater environments. This primer set was then tested on a selection of 14 marine Synechococcus strains, representative of the different pigment types (Six et al., 2007; Humily et al., 2013). PCRs were performed in triplicates, in 50 μL final volume following conditions described in Table 0001, using the minimum number of cycles to obtain a thin band on agarose gel to minimize PCR biases. Amplicons were pooled and purified using the NucleoSpin® Extract II kit (Macherey-Nagel) to obtain a final amount of at least 2 μg DNA. All samples were then pooled in equimolar amounts for 454 pyrosequencing at the Biogenouest® Genomics platform (Rennes, France) on a Roche GS-FLX system (454 Life Sciences, Roche, Brandford, CT) using an emPCR amplicon kit Lib-L according to the manufacturer's protocol.
Sequence treatment and phylogenetic analysis
Low-quality sequences, that is, lacking a key, exhibiting unexpected amplicon lengths (< 250 or > 600 bp) or containing more than 7% undetermined bases, were removed from the analysis. Sequences were denoised using acacia (v 1.52; Bragg et al., 2012) after trimming to 450 bp for petB and to 430 bp for the other loci. Intergenic sequences within the r/cpcB and r/cpcA operons were then removed before clustering using mothur (Schloss et al., 2009). To define an OTU, a cutoff value of 94% nt sequence identity was used for petB, as previously described (Mazard et al., 2012), and 95% for the other loci. OTUs defined by singletons were removed before calculation of rarefaction curves, diversity indices, and coverage values using mothur. Sequences were then used to generate multiple alignments using mafft (G-INS-I option, v 6.953; Katoh et al., 2005).
Taxonomic assignment of each OTU was made by phylogenetic analyses using (1) sequences from the same database as the one used for T-RF identification for petB, and (2) sequences retrieved from 42 public or unpublished marine Synechococcus/Cyanobium genomes for the r/cpcBA and cpeBA loci. Novel r/cpcBA and cpeBA sequences were deposited in GenBank under the accession numbers KF528763–KF528785 and KF528801–KF528826, respectively. Phylogenetic trees of R/CpcBA (340 aa positions) were performed by maximum likelihood using phyml (v3.0; Guindon et al., 2009), a LG+G model (Le & Gascuel, 2008), and a BIONJ starting tree and visualized using figtree v1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/). Robustness of inferred topology was supported by 1000 bootstraps resampling. For cpeBA, Bayesian inference was conducted on 999 nt positions using mrbayes (v3.1.2; Huelsenbeck & Ronquist, 2001) and started with a random tree, run for 2 million generations in four chains, sampling every 100 generations and burn-in of 5000 trees. For both loci, pyrotag sequences and a few additional partial sequences retrieved from GenBank were then aligned with the reference alignments and added to consensus trees using the ADD_BY_PARSIMONY algorithm implemented in ARB (Ludwig et al., 2004). Environmental gene sequences for petB, r/cpcBA, and cpeBA amplicons obtained in this study were deposited in the Sequence Read Archive (SRA) database under the following accession number: SRP028891.
Results and discussion
Although major breakthroughs have been achieved in our understanding of the composition and dynamics of marine microorganisms (Pedros-Alio, 2006; DeLong, 2009; Gilbert & Dupont, 2011), accessing the functional diversity of natural populations remains tricky. Indeed, the ability of an organism to perform a particular biochemical pathway generally requires the co-occurrence of several genes, which in prokaryotes are often organized as clusters. One way to deal with this problem is to sequence single amplified genomes (SAG). However, this approach currently only allows one to amplify and sequence partial genomes with no control on the sequenced genomic regions (Zhang et al., 2006; Rodrigue et al., 2009; Malmstrom et al., 2013) and requires a deep sequencing effort to access the diversity of biochemical pathways occurring within a population.
Here, we developed a metagenomic pipeline, combining flow cytometry cell sorting, WGA, and fosmid library construction, which allowed us not only to specifically target Synechococcus cells but also particular regions of their genomes (Fig. 0001). As this approach is not devoid of biases inherent to each method, we evaluated their effects at each step of the procedure by following both the genetic composition of the Synechococcus community and the relative abundance of contaminant heterotrophic bacteria by T-RFLP analyses and/or pyrosequencing. T-RFLP, a highly reproducible, low cost but only semiquantitative technique, was used to rapidly screen a high number of samples, and pyrosequencing was used to obtain a more thorough picture of the whole genetic and pigment diversity of samples.
Flowchart of the targeted metagenomic pipeline used in this study.
Efficiency of the combination of cell sorting and WGA to obtain large DNA amounts from natural Synechococcus populations
T-RFLP targeting the 16S rRNA gene was used to assess the proportion of co-occurring heterotrophic cells at different steps of the procedure: (1) in the initial sample concentrated by TFF, (2) in flow-sorted samples (including three technical replicates, named A to C, each comprising three consecutive sorts), and (3) in all samples obtained after WGA of the three final sorted samples (Fig. 0002a and c and Fig. S3). In silico analysis of a 16S rRNA gene Synechococcus reference database (cf. ) predicted Synechococcus T-RFs of 129, 491, and 492 bp, and, by extension, all other T-RFs were assumed to arise from heterotrophic contaminants (Table S1). Compared to these predictions, the sizes of Synechococcus T-RFs actually observed in sorted samples from the Astan station were 130 bp (Syn. T-RF I), 482–485 bp (Syn. T-RF II), and 487–489 bp (Syn. T-RF III; Fig. 0002a), and this assignment was independently confirmed by T-RFLP analyses of Synechococcus isolates (data not shown) and by petB sequencing of sorted samples from Astan (Fig. S3). Although somewhat large (up to 7 bp), such a drift between predicted and measured T-RF sizes has already been reported and was shown to increase with T-RF size (Kaplan & Kitts, 2003; Bukovská et al., 2010). The observed duplication of Syn. T-RF II and III may possibly be due to polymerase errors during PCR and/or during WGA. It is also worth noting that the flow cytometry setup used in this study (Exc 488 nm) was not suitable to detect PC-rich Synechococcus cells (pigment type 1), sometimes reported in coastal or low-salinity waters (Haverkamp et al., 2008). However, a complementary flow cytometric analysis of the initial Astan sample using dual lasers emitting at 488 and 635 nm (exciting PC) confirmed that only PE-containing Synechococcus cells were present in this sample (data not shown). As expected, no Prochlorococcus cells were detected since this genus is systematically absent from English Channel waters (N. Simon, pers. comm).
T-RFLP analysis of the diversity of Synechococcus and co-occurring heterotrophic bacteria at station Astan as assessed at different stages of the metagenomic pipeline by targeting the 16S rRNA (a and c) and petB (b and d) genes. (a) Composite profile of 16S rRNA gene T-RFs. (b) T-RFLP profile targeting petB for the initial sample concentrated by TFF. (c and d) Histograms of relative T-RF abundance of the Synechococcus and/or bacterial community composition based on T-RFLP targeting 16S rRNA (c) or petB (d) genes. Mean and standard deviations were calculated on the 47 WGA reactions (of 48) for which Synechococcus T-RFs were detected.
While the concentration of Synechococcus in the initial seawater sample was fairly low (c. 840 cells mL−1), virtually no residual co-occurring bacterial or eukaryotic cells could be detected after cell sorting in all three replicates, either by flow cytometry (< 2%; Fig. S1) or by T-RFLP using the 16S rRNA gene (Fig. 0002c). The three-step sorting procedure thus very efficiently removed all types of contaminants and was highly reproducible, as the three technical replicates gave similar results. While the first sorting step, based on natural fluorescence and side scatter, efficiently eliminated photosynthetic picoeukaryotes and large nonliving particles, the two following sorting steps, performed after DNA staining using a combination of SYTO-9 and SYTO-13, allowed us to almost completely remove heterotrophic bacteria. This absence of contamination shows that this protocol also efficiently eliminated any free DNA, often present at high concentrations in environmental samples especially after TFF that can lead to cell lysis (Rodrigue et al., 2009).
After the WGA step, some T-RF peaks assigned to contaminants in the initial sample re-appeared in variable proportion, that is, between 0 and 46% of the total bacterial T-RFs (Fig. 0002c and Fig. S3). In one of the WGA products, contaminants even represented all of the T-RFs present in the sample. As suggested in a previous study (Swan et al., 2011), a discrepancy in cell lysis efficiency between Synechococcus and their bacterial counterparts might explain this differential amplification. Although preliminary tests performed on cultured isolates showed that thermal lysis was slightly more efficient to break Synechococcus cells than chemical lysis (data not shown), we did not retain the former option, which generates smaller WGA products (Fig. S2). This differential amplification may also have caused heterotrophic bacterial DNA to be preferentially amplified over Synechococcus DNA during the WGA step. It is also worth noting that the relative proportion of contaminants was higher in WGA reactions performed on the second and third sorted replicates (sorts B and C), which chronologically were performed after sort A (Fig. 0002c). This may result from a lower integrity of Synechococcus cells and/or a higher sensitivity to DNA degradation in the TFF-concentrated sample that was kept on ice for several hours before cell sorting. Such modifications of the apparent global community structure as a result of WGA have already been reported in the literature and were attributed to either amplification of free contaminant DNA (Raghunathan et al., 2005; Kvist et al., 2007; Woyke et al., 2010) or differential amplification between taxa (Abulencia et al., 2006; Chen et al., 2008; Lepère et al., 2011; Blainey, 2013). The latter effect was suggested to result from various parameters. These include a difference in GC%, the presence of repeated regions (Pinard et al., 2006), different chromosome conformation (Schoenfeld et al., 2010), chromosomal breaks, relative cell abundance, the DNA amount of the various taxa present in the sample (Chen et al., 2008; Arakaki et al., 2010), and/or stochastic effects during initial priming and amplification (Rodrigue et al., 2009). In the present study, screening of the WGA products using T-RFLP targeting the 16S rRNA gene allowed us to circumvent this issue by selecting samples with very low contaminant load.
Effect of cell sorting and WGA on the apparent composition of Synechococcus communities
The aforementioned differential amplification of taxa during the WGA step may also affect the relative abundance of the different clades within the Synechococcus population. To estimate this potential bias, we developed a T-RFLP approach targeting the petB gene, encoding the cytochrome b6 subunit of the cytochrome b6/f complex, which provides a much finer taxonomic resolution of the genetic diversity within the marine Synechococcus radiation than 16S rRNA gene (Mazard et al., 2012). It must be noted that a given T-RF may correspond to different (sub)clades and that reciprocally some strains belonging to the same (sub)clade may have different T-RFs (Table S2), indicating that this approach does not perfectly reflect the composition of the Synechococcus population at the (sub)clade level. Still, variations in the T-RF pattern observed between the different stages of the metagenomic pipeline can be used for rapid screening of changes in the Synechococcus community composition between samples.
The set of restriction enzymes HpaII/HinfII used for the T-RFLP analyses of the Astan samples provided two T-RFs (125 and 234 bp), which were among the eight T-RFs predicted from in silico analysis of a database, including sequences from both Synechococcus isolates and environmental samples (Fig. 0002b and d; Table S2). These T-RF sizes were obtained from strains belonging to the four main Synechococcus clades dominating in oceanic regimes (I-IV; Zwirglmaier et al., 2008), with T-RF 125 bp encompassing strains from clades I, III, and IV and T-RF 234 bp encompassing strains from clades II, IV and also VIII, XVI and subcluster 5.3-I (Table S2). Although a moderate change was observed in the relative proportion of the two T-RFs in all three replicates after cell sorting, the 125 bp T-RF remained dominant in the sorted samples (Fig. 0002d). WGA also seemingly allowed us to preserve the diversity of the initial sample as both T-RFs were found in similar proportion after this step, despite some variability in the community composition between WGA products. In combination with 16S rRNA gene T-RFLP, these petB profiles helped us to select twenty WGA products, not only with a low contamination level by heterotrophic bacteria but also representative of the Synechococcus community present at the Astan site. One-third of this pool, equivalent to about 3.5 μg DNA, was used to construct the fosmid library.
Assessment of the quality of the fosmid library from the Astan station
The fosmid library prepared from the Astan station was seemingly representative of the genetic diversity occurring at this station, as it contained more than 500 000 clones, among which 10% were randomly screened by hybridization using a pool of three PBS probes targeting the c/rpcBA, mpeBA, and cpeC loci, then by PCR targeting the same genes as well as cpeU. 94% of the clones selected by hybridization contained at least one of the four genes screened by PCR (Table S3). Twenty-five clones were selected for sequencing based on their genetic diversity as assessed by blastn analyses of fosmid ends and by their location in the PBS region compared to reference Synechococcus genomes. This approach also revealed the presence of a few probable chimeric sequences, as shown by an incongruent predicted insert size with regard to PFGE size measurements, which were not retained for sequencing. Mapping of all reads onto a reference PBS region showed that there was a good and homogeneous coverage of the region (average: 28 641 ± 6409-fold, Fig. 0003a). Assembly of these reads followed by a contig elongation step (see ) led to 64 contigs with an average size of 4315 bp and an N50 of 7804 bp (Table S3). Among these, 41 contigs were located within the PBS gene region, including 7 exhibiting more than 98% nucleotide identity that were not included in the analysis. The 34 remaining PBS contigs had an average size of 5033 bp and an N50 of 11 113 bp and were deposited in GenBank under accession numbers KF846537–KF846570. Mapping of these contigs onto reference picocyanobacterial genomes then allowed us to examine the PBS gene diversity and genomic context and to assign these sequences to specific Synechococcus clades and pigment types (see below).
Partial or complete PBS regions retrieved from a natural population from the Astan station off Roscoff (France). (a) Read coverage using as a reference the PBS region from Synechococcus sp. CC9902. (b) Contigs or supercontigs assembled from sequencing of 25 fosmids and mapped onto the CC9902 genome. Colors represent the clades/subclades of the strain giving the best blastx hit among the 42 Synechococcus/Cyanobium genomes of the reference database, whereas the shading degree represents the percentage of identity of the environmental sequence to the closest reference genome (cf. insert legend). Blue indicates a best match to clade IV (CC9902 or BL107), red to subclade Ia (CC9311 or WH8020), and orange to subclade Ib (MVIR-18-1) strains. Highly conserved genes, mpeBA, cpeBA, and rpcBA, are shaded in gray. Environmental sequences from different contigs that exhibit more than 95% nucleotide identity are surrounded by blue frames. C, contigs obtained using the CLC-Assembly Cell; SC, supercontigs obtained by contig elongation using CAP3; misc., miscellaneous; PE, phycoerythrin; PC, phycocyanin.
Analysis of Synechococcus community structure and pigment type at the Astan station off Roscoff
The SOMLIT-Astan site is representative of permanently mixed waters of the Western Channel as no stratification occurs during the year (Sournia & Birrien, 1995). It is characterized by mesotrophic waters with moderate nitrate and phosphate concentrations and is weakly influenced by the Penzé estuary located near Roscoff. Picophytoplankton is dominated year-round by picoeukaryotes (Not et al., 2004), while the photosynthetic prokaryote fraction is exclusively represented by Synechococcus, which can reach abundances exceeding 104 cells mL−1 (N. Simon, pers. comm.). Sampling was performed in May, following the early spring bloom of Synechococcus, occurring in March.
Genetic diversity of the Synechococcus population was analyzed using pyrosequencing targeting the petB gene. After cleaning and denoising, at least 582 petB amplicons were obtained for each sample (Fig. 0004), which proved sufficient to cover the whole Synechococcus diversity, as attested by a Good's coverage index close or equal to 1 (Table 0002). Additionally, rarefaction curves were performed at 94% identity, a cutoff previously used by Mazard et al. (2012) and shown to be suitable for assignment of sequences at the subclade level (Fig. S4). Altogether, this led to the identification of 9 OTUs among the entire dataset, but only 4 were represented by more than 4 reads, suggesting that these rare OTUs might correspond to sequencing errors and that the curves were in fact saturated in all samples. It is also worth noting that based on several diversity estimators, including the SChao1 and Shannon–Weaver diversity indices (Table 0002; Shannon & Weaver, 1949; Chao et al., 2005), no significant modifications of the community structure occurred along the metagenomic pipeline (Fig. 0004a and Table 0002). The Synechococcus population at the Astan station at the sampling date appears largely dominated by clade I, with subclade Ib representing more than 96% of the total reads, clade Ia up to 3%, while the remaining community was composed of members of subclades Ic and IVa (Fig. 0004a). These results are consistent with observations made in the western Atlantic Ocean (Ahlgren & Rocap, 2012) or in cold North Atlantic mesotrophic waters where representatives of clade I are always dominating over clade IV (Huang et al., 2012; Mazard et al., 2012). However, this contrasts with abundance profiles found in California coastal waters (Tai & Palenik, 2009) or in Arctic, South Pacific, and eastern Atlantic Oceans where clade IV cells typically dominate (Zwirglmaier et al., 2008).
Description and statistical analyses of the pyrosequenced samples obtained during this study
| Gene/Operon | Sample | # Raw reads | # Cleaned reads | % Removed reads | # OTU | Good's coverage | SChao-1 | Shannon index |
| petB | TFF | 2119 | 1240 | 41 | 6 | 0.99 | 6.0 | 0.23 |
| Sort A | 2116 | 1438 | 32 | 7 | 1.00 | 7.5 | 0.18 | |
| Sort B | 2335 | 1450 | 38 | 6 | 1.00 | 4.0 | 0.18 | |
| Sort C | 3545 | 2232 | 37 | 6 | 1.00 | 6.0 | 0.16 | |
| Sort A + WGA | 1061 | 582 | 45 | 5 | 1.00 | 5.0 | 0.20 | |
| Sort B + WGA | 3388 | 1670 | 51 | 7 | 0.99 | 6.0 | 0.28 | |
| Sort C + WGA | 2457 | 1337 | 46 | 5 | 1.00 | 4.0 | 0.16 | |
| r/cpcBA | TFF | 6494 | 5572 | 14 | 13 | 0.99 | 14 | 1.55 |
| Sort A | 6361 | 5247 | 18 | 12 | 1.00 | 11 | 1.39 | |
| Sort B | 5259 | 4314 | 18 | 13 | 1.00 | 13 | 1.50 | |
| Sort C | 4704 | 3820 | 19 | 13 | 0.99 | 12 | 1.53 | |
| Sort A + WGA | 2694 | 2309 | 14 | 12 | 1.00 | 11 | 1.51 | |
| Sort B + WGA | 6456 | 5129 | 21 | 13 | 0.99 | 26 | 1.52 | |
| Sort C + WGA | 4890 | 3671 | 25 | 13 | 0.99 | 12 | 1.24 | |
| cpeBA | TFF | 6494 | 1195 | 82 | 8 | 1.00 | 9.0 | 0.98 |
| Sort A | 6361 | 1308 | 79 | 7 | 1.00 | 7.0 | 0.89 | |
| Sort B | 5259 | 1263 | 76 | 8 | 0.99 | 7.0 | 0.85 | |
| Sort C | 4704 | 447 | 90 | 6 | 1.00 | 7.0 | 0.94 | |
| Sort A + WGA | 2694 | 1269 | 53 | 9 | 1.00 | 7.0 | 1.01 | |
| Sort B + WGA | 6456 | 1812 | 72 | 9 | 1.00 | 10 | 1.41 | |
| Sort C + WGA | 4890 | 260 | 95 | 6 | 0.99 | 6.0 | 1.24 |
| Gene/Operon | Sample | # Raw reads | # Cleaned reads | % Removed reads | # OTU | Good's coverage | SChao-1 | Shannon index |
| petB | TFF | 2119 | 1240 | 41 | 6 | 0.99 | 6.0 | 0.23 |
| Sort A | 2116 | 1438 | 32 | 7 | 1.00 | 7.5 | 0.18 | |
| Sort B | 2335 | 1450 | 38 | 6 | 1.00 | 4.0 | 0.18 | |
| Sort C | 3545 | 2232 | 37 | 6 | 1.00 | 6.0 | 0.16 | |
| Sort A + WGA | 1061 | 582 | 45 | 5 | 1.00 | 5.0 | 0.20 | |
| Sort B + WGA | 3388 | 1670 | 51 | 7 | 0.99 | 6.0 | 0.28 | |
| Sort C + WGA | 2457 | 1337 | 46 | 5 | 1.00 | 4.0 | 0.16 | |
| r/cpcBA | TFF | 6494 | 5572 | 14 | 13 | 0.99 | 14 | 1.55 |
| Sort A | 6361 | 5247 | 18 | 12 | 1.00 | 11 | 1.39 | |
| Sort B | 5259 | 4314 | 18 | 13 | 1.00 | 13 | 1.50 | |
| Sort C | 4704 | 3820 | 19 | 13 | 0.99 | 12 | 1.53 | |
| Sort A + WGA | 2694 | 2309 | 14 | 12 | 1.00 | 11 | 1.51 | |
| Sort B + WGA | 6456 | 5129 | 21 | 13 | 0.99 | 26 | 1.52 | |
| Sort C + WGA | 4890 | 3671 | 25 | 13 | 0.99 | 12 | 1.24 | |
| cpeBA | TFF | 6494 | 1195 | 82 | 8 | 1.00 | 9.0 | 0.98 |
| Sort A | 6361 | 1308 | 79 | 7 | 1.00 | 7.0 | 0.89 | |
| Sort B | 5259 | 1263 | 76 | 8 | 0.99 | 7.0 | 0.85 | |
| Sort C | 4704 | 447 | 90 | 6 | 1.00 | 7.0 | 0.94 | |
| Sort A + WGA | 2694 | 1269 | 53 | 9 | 1.00 | 7.0 | 1.01 | |
| Sort B + WGA | 6456 | 1812 | 72 | 9 | 1.00 | 10 | 1.41 | |
| Sort C + WGA | 4890 | 260 | 95 | 6 | 0.99 | 6.0 | 1.24 |
Description and statistical analyses of the pyrosequenced samples obtained during this study
| Gene/Operon | Sample | # Raw reads | # Cleaned reads | % Removed reads | # OTU | Good's coverage | SChao-1 | Shannon index |
| petB | TFF | 2119 | 1240 | 41 | 6 | 0.99 | 6.0 | 0.23 |
| Sort A | 2116 | 1438 | 32 | 7 | 1.00 | 7.5 | 0.18 | |
| Sort B | 2335 | 1450 | 38 | 6 | 1.00 | 4.0 | 0.18 | |
| Sort C | 3545 | 2232 | 37 | 6 | 1.00 | 6.0 | 0.16 | |
| Sort A + WGA | 1061 | 582 | 45 | 5 | 1.00 | 5.0 | 0.20 | |
| Sort B + WGA | 3388 | 1670 | 51 | 7 | 0.99 | 6.0 | 0.28 | |
| Sort C + WGA | 2457 | 1337 | 46 | 5 | 1.00 | 4.0 | 0.16 | |
| r/cpcBA | TFF | 6494 | 5572 | 14 | 13 | 0.99 | 14 | 1.55 |
| Sort A | 6361 | 5247 | 18 | 12 | 1.00 | 11 | 1.39 | |
| Sort B | 5259 | 4314 | 18 | 13 | 1.00 | 13 | 1.50 | |
| Sort C | 4704 | 3820 | 19 | 13 | 0.99 | 12 | 1.53 | |
| Sort A + WGA | 2694 | 2309 | 14 | 12 | 1.00 | 11 | 1.51 | |
| Sort B + WGA | 6456 | 5129 | 21 | 13 | 0.99 | 26 | 1.52 | |
| Sort C + WGA | 4890 | 3671 | 25 | 13 | 0.99 | 12 | 1.24 | |
| cpeBA | TFF | 6494 | 1195 | 82 | 8 | 1.00 | 9.0 | 0.98 |
| Sort A | 6361 | 1308 | 79 | 7 | 1.00 | 7.0 | 0.89 | |
| Sort B | 5259 | 1263 | 76 | 8 | 0.99 | 7.0 | 0.85 | |
| Sort C | 4704 | 447 | 90 | 6 | 1.00 | 7.0 | 0.94 | |
| Sort A + WGA | 2694 | 1269 | 53 | 9 | 1.00 | 7.0 | 1.01 | |
| Sort B + WGA | 6456 | 1812 | 72 | 9 | 1.00 | 10 | 1.41 | |
| Sort C + WGA | 4890 | 260 | 95 | 6 | 0.99 | 6.0 | 1.24 |
| Gene/Operon | Sample | # Raw reads | # Cleaned reads | % Removed reads | # OTU | Good's coverage | SChao-1 | Shannon index |
| petB | TFF | 2119 | 1240 | 41 | 6 | 0.99 | 6.0 | 0.23 |
| Sort A | 2116 | 1438 | 32 | 7 | 1.00 | 7.5 | 0.18 | |
| Sort B | 2335 | 1450 | 38 | 6 | 1.00 | 4.0 | 0.18 | |
| Sort C | 3545 | 2232 | 37 | 6 | 1.00 | 6.0 | 0.16 | |
| Sort A + WGA | 1061 | 582 | 45 | 5 | 1.00 | 5.0 | 0.20 | |
| Sort B + WGA | 3388 | 1670 | 51 | 7 | 0.99 | 6.0 | 0.28 | |
| Sort C + WGA | 2457 | 1337 | 46 | 5 | 1.00 | 4.0 | 0.16 | |
| r/cpcBA | TFF | 6494 | 5572 | 14 | 13 | 0.99 | 14 | 1.55 |
| Sort A | 6361 | 5247 | 18 | 12 | 1.00 | 11 | 1.39 | |
| Sort B | 5259 | 4314 | 18 | 13 | 1.00 | 13 | 1.50 | |
| Sort C | 4704 | 3820 | 19 | 13 | 0.99 | 12 | 1.53 | |
| Sort A + WGA | 2694 | 2309 | 14 | 12 | 1.00 | 11 | 1.51 | |
| Sort B + WGA | 6456 | 5129 | 21 | 13 | 0.99 | 26 | 1.52 | |
| Sort C + WGA | 4890 | 3671 | 25 | 13 | 0.99 | 12 | 1.24 | |
| cpeBA | TFF | 6494 | 1195 | 82 | 8 | 1.00 | 9.0 | 0.98 |
| Sort A | 6361 | 1308 | 79 | 7 | 1.00 | 7.0 | 0.89 | |
| Sort B | 5259 | 1263 | 76 | 8 | 0.99 | 7.0 | 0.85 | |
| Sort C | 4704 | 447 | 90 | 6 | 1.00 | 7.0 | 0.94 | |
| Sort A + WGA | 2694 | 1269 | 53 | 9 | 1.00 | 7.0 | 1.01 | |
| Sort B + WGA | 6456 | 1812 | 72 | 9 | 1.00 | 10 | 1.41 | |
| Sort C + WGA | 4890 | 260 | 95 | 6 | 0.99 | 6.0 | 1.24 |
Synechococcus community structure determined at each step of the metagenomic pipeline by pyrosequencing targeting petB (a), r/cpcBA (b), and cpeBA (c) loci. OTUs were calculated using 94% and 95% similarity cutoff for petB and the two operons, respectively.
Pyrosequencing of the r/cpcBA and cpeBA operons (encoding α and β subunits of PC and PE I, respectively) was used to assess the pigment type diversity at the Astan station and to complement the analysis of PBS-containing fosmids. It must be noted that phylogenetic analyses made with these genes are not congruent with those based on core genes (including petB), likely due to lateral gene transfers that would have occurred during the evolution of these antenna genes (Six et al., 2007; Everroad & Wood, 2012). As for petB, Good's index indicated that the minimum number of sequences obtained for each operon, 3671 for r/cpcBA and 260 for cpeBA (Table 0002), was sufficient to cover properly the pigment diversity in all samples analyzed. It is noteworthy that the low number of cpeBA sequences is due to the fact that a high proportion (> 80%) of amplicons were discarded due to the nonspecific amplification of the mpeBA operon. Rarefaction curves led to the identification of 15 r/cpcBA OTUs and 11 cpeBA OTUs among the entire dataset, but only 12 and 8, respectively, were represented by more than 4 reads, indicating that the diversity of these samples was also well covered (Fig. S4). For both markers, the cell sorting step apparently did not alter the Synechococcus community composition in contrast to the WGA step, which led to a notable modification of the relative proportion of the different OTUs in most samples (Fig. 0004b and c). This seemingly higher variability with regard to petB might be due to the larger number of OTUs discriminated by these two markers, better highlighting the differential taxa amplification caused by WGA (Fig. 0004b and c). Yet, it is important to note that all dominant OTUs were still present in WGA samples.
Phylogenetic tree of r/cpcBA sequences from reference strains and environmental samples. The phylogenetic tree was built using maximum likelihood based on 340 aligned amino acids, 42 reference strains, and the LG+G model. Environmental sequences retrieved from public databases and from this study were added by ARB_Parsimony to the initial consensus tree shown in bold lines. Bold and red letters correspond to sequences obtained by pyrosequencing and fosmid library sequencing, respectively. Asterisks indicate the three most abundant OTUs in clone libraries sequenced by pyrosequencing. Gloeobacter violaceusPCC 7421 was used as an outgroup. Only bootstrap values > 70% are shown. Clades and/or pigment types were named following the nomenclature reported in previous studies (Crosbie et al., 2003; Six et al., 2007; Haverkamp et al., 2008; Humily et al., 2013).
We then performed phylogenetic analyses of these sequences together with laboratory strains of known pigmentation to try and assign OTUs obtained by pyrosequencing and contigs retrieved from fosmid sequencing to specific pigment types (Figs 0005 and 0006). The r/cpcBA tree resolved well both pigment types 1 or 2 (sensu Six et al., 2007), as strains exhibiting these pigmentations formed two well-defined clusters. In contrast, this marker was unable to distinguish between the different subtypes of pigment type 3. None of the r/cpcBA sequences obtained at the Astan station, either by pyrosequencing or from fosmid libraries, clustered with pigment types 1 or 2, suggesting that there were no representatives of these two pigment types in the Astan sample. The cpeBA tree brings complementary information as it splits members of pigment type 3 into distinct clusters. The first one gathered pigment types 3dA strains, that is, strains that are able to perform type IV chromatic acclimation (CA4), or related strains (e.g. 3aA) that possess a CA4-A genomic island involved in this process but display a different pigment phenotype (Humily et al., 2013). The second one encompassed pigment type 3c, that is, high-PUB strains, and 3dB, that is, CA4 strains possessing a CA4-B genomic island. Finally, strains branching at the base of these two clusters all belong to pigment type 3a, that is, low-PUB strains. Thus, the cpeBA phylogeny can be used to putatively assign the most abundant OTU at station Astan (cpeBA_OTU1 in Fig. 0004c) to type 3a and the second and third to type 3dA (or related strains), these abundant OTUs representing 63, 20, and 8% of the total cpeBA reads, respectively. Surprisingly, all cpeBA sequences retrieved from fosmid clones corresponded to pigment type 3dA and none to pigment type 3a, likely because primers used to screen this library were too specific given the set of reference genomes available and have selected out fosmids containing type 3a PBS regions. The new pyrosequencing data obtained for these marker genes in natural populations should allow us to refine these probes in the future.
Phylogenetic tree of cpeBA sequences from reference strains and environmental samples. The tree was built by Bayesian inference analysis based on 999 nt positions from 42 marine Synechococcus isolates. Environmental sequences retrieved in public databases and from this study were added by ARB_Parsimony to the initial consensus tree shown in bold lines. Bold and red letters correspond to sequences obtained by pyrosequencing and fosmid library sequencing, respectively. Asterisks indicate the three most abundant OTUs in clone libraries sequenced by pyrosequencing. Gloeobacter violaceusPCC 7421 was used as an outgroup. Only bootstrap values > 70% are shown. Clades and/or pigment types are named following the nomenclature reported in previous studies (Six et al., 2007; Haverkamp et al., 2008; Humily et al., 2013).
The metagenomic pipeline developed in this study (Fig. 0001) brought complementary information with regard to the pyrosequencing analysis, as it allowed us to sequence suites of PBS genes, while pyrosequencing can only be applied to the most highly conserved genes of the region, that is, those that were used as probes for fosmid screening (Fig. 0003). Mapping of all assembled contigs onto each reference genome showed that the majority of environmental PBS regions from the Astan station could be assigned to clades I and IV as expected from the pyrosequencing data. 24 contigs of 36, resulting from the assembly of about 64% of total reads, were attributable to clade IV (Fig. 0003b). Indeed, these contigs only comprised genes with highest identity/similarity to either Synechococcus sp. BL107 or Synechococcus sp. CC9902, the only sequenced clade IV strains so far, which are closely related (100% 16S rRNA gene identity, average nucleotide identity of 91.3%; Dufresne et al., 2008). A significant microdiversity was found all over the PBS region, suggesting that 4–6 clade IV genotypes co-occurred in this sample. It is noteworthy that the most variable genes of the region coded for phycobilin lyases (e.g. cpeS, cpeT, mpeU, cpeY, rpcEF) and uncharacterized proteins (e.g. unk2 and unk11). Additionally, about one-third of the reads were assembled into contigs that mapped onto two clade I reference genomes: MVIR-18-1 belonging to subclade Ia and CC9311 to subclade Ib. These contigs seemingly correspond to the two main environmental genotypes. The first one, for which we retrieved a complete PBS region, comprises alternating gene cassettes with highest percentages of identity with subclade Ia or Ib, while the second one mainly comprises genes related to subclade Ib. Interestingly, the remaining reads (c. 4%) could be assembled into one main 12.8-kb contig, which contains many genes very distant from all reference genomes available, and likely belong to a new environmental clade.
These contigs also provide important insights into the pigment type of the cells from which they originate. As expected from cpeBA phylogenetic analyses (Fig. 0006), most, if not all, of the contigs mapping onto clades I and IV can be assigned to pigment type 3dA (i.e. CA4-A) as these environmental PBS regions displayed a remarkable synteny with regard to reference genomes of strains belonging to this pigment type, apart from the 5′-end of these regions which presented more variability. In particular, they exhibited a succession of genes from mpeC to cpeF (previously called mpeV), which is typical of this pigment type (Six et al., 2007; Humily et al., 2013). By comparison, strains exhibiting pigment type 3c possess an unk10 gene before mpeU and do not possess cpeF, while pigment type 3a not only lack mpeC and mpeU but also have a different organization of the remaining genes of this region (i.e. cpeY-unk11-unk12-cpeF-cpeZ). The PBS region found in the unassigned clade (bottom of Fig. 0003b) is also original in this context. Indeed, gene content and organization of the covered region is identical to pigment type 3dA, except that it lacks the mpeU gene, a deletion previously observed only in MVIR-18-1, and that seemingly induces a strong alteration of the cell pigment content (i.e. low-PUB/PEB phenotype instead of a CA4 phenotype; Humily et al., 2013). By extension, it is possible that this new environmental genotype may exhibit the same pigmentation.
Conclusions and future applications
In this paper, we described an original metagenomic pipeline that we used to explore the diversity of Synechococcus pigment types but that can be applied to target any other region of the genome (Fig. 0001). This approach is particularly well suited for such a model organism because of its abundance in the field and its natural autofluorescence, which makes it easy to enumerate and sort by flow cytometry. The use of control molecular methods allowed us to circumvent most of the (often ignored) biases inherent to targeted metagenomics: T-RFLP to bypass the differential amplification of taxa included in the initial environmental and fosmid end-sequencing to avoid chimeras, both effects being mainly due to the WGA step. This strategy provided libraries qualitatively (though not quantitatively) representative of the genetic and functional diversity of the initial sample, as attested by pyrosequencing of petB, r/cpcBA, and cpeBA loci. Although phylogenetic analyses of the resulting sequences showed that there was a co-dominance of pigment type 3a and 3d in the Astan sample, which was maintained along the whole metagenomic pipeline, only fosmids of the latter pigment type were apparently selected for sequencing, while those of the former pigment type have most likely been counter-selected during our screening procedure. However, this issue could be avoided through the sequencing of a higher number of fosmids and/or the use of a mix of probes better representing the studied community.
The dominance of pigment type 3a at the Astan station is congruent with previous observations using dual-beam flow cytometry or spectrofluorimetry, which have shown the prevalence of low-PUB cells in coastal environments (Olson et al., 1990; Lantoine & Neveux, 1997; Wood et al., 1998). The latter cells were shown to almost always co-occur with high-PUB cells in these environments (Olson et al., 1990; Katano et al., 2007). Results from the present study suggest that these coastal high-PUB strains could actually be 3d (CA4) cells and not, as previously assumed, pigment type 3c, that is, cells with a fixed high-PUB pigmentation, as those were not observed in our sequence libraries. By extension, it is possible that pigment type 3c could rather be restricted to oceanic waters as these high-PUB cells most efficiently capture blue photons prevailing in these environments (Olson et al., 1990; Kirk, 1994).
To our knowledge, this is first study that shows the presence of CA4 strains in the natural environment. Indeed, due to their variable PUB to PEB ratio, they cannot be easily discriminated from pigment types with fixed pigment ratio (mainly 3a and 3c) by fluorescence-based approaches. Although a single environmental sample was analyzed here, the application of this molecular approach to a variety of trophic environments and depths should allow one to decipher the distribution and ecological importance of CA4 cells and more globally of all Synechococcus pigment types.
Acknowledgements
This work was supported by the collaborative program METASYN with Genoscope, the ‘Agence Nationale de la Recherche’ Microbial Genomics Program (PELICAN, ANR-09-GENM-030), UK Natural Environment Research Council grant (NE/I00985X/1), and the European Union's Seventh Framework Programs, MicroB3 and MaCuMBA (grant agreements 287589 and 311975, respectively). We thank Fabienne Rigaud-Jalabert for sample collection at the Astan station and Thierry Cariou (‘Service d'Observation en Milieu LITtoral (SOMLIT)’, INSU-CNRS, Station Biologique de Roscoff) for providing physicochemical parameters at the sampling site. We are most grateful to the Biogenouest® Genomics core facilities for their technical support, notably Gwenn Tanguy for genotyping operations in Roscoff and Alexandra Dheilly, Delphine Naquin, and Oscar Lima for pyrosequencing in Rennes. Hélène Bergès, Arnaud Bellec, and Joëlle Fourment from the CNRGV Platform are acknowledged for fosmid library screening. We are indebted to Dominique Boeuf for help with phylogenetic analyses.
References
Supporting Information
Additional Supporting Information may be found in the online version of the article:
Fig. S1. Flow cytometry analysis of the picophytoplankton community from the Astan station.
Fig. S3. Representative T-RFLP profiles at each step of the metagenomic pipeline.
Table S3. Characteristics of the fosmid library obtained at the Astan station.
Author notes
Editor: Patricia Sobecky






