Abstract
Streptococcus sinensis is a recently discovered human pathogen isolated from blood cultures of patients with infective endocarditis. Its phylogenetic position, as well as those of its closely related species, remains inconclusive when single genes were used for phylogenetic analysis. For example, S. sinensis branched out from members of the anginosus, mitis, and sanguinis groups in the 16S ribosomal RNA gene phylogenetic tree, but it was clustered with members of the anginosus and sanguinis groups when groEL gene sequences used for analysis. In this study, we sequenced the draft genome of S. sinensis and used a polyphasic approach, including concatenated genes, whole genomes, and matrix-assisted laser desorption ionization-time of flight mass spectrometry to analyze the phylogeny of S. sinensis. The size of the S. sinensis draft genome is 2.06 Mb, with GC content of 42.2%. Phylogenetic analysis using 50 concatenated genes or whole genomes revealed that S. sinensis formed a distinct cluster with Streptococcus oligofermentans and Streptococcus cristatus, and these three streptococci were clustered with the “sanguinis group.” As for phylogenetic analysis using hierarchical cluster analysis of the mass spectra of streptococci, S. sinensis also formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with the “mitis group.” On the basis of the findings, we propose a novel group, named “sinensis group,” to include S. sinensis, S. oligofermentans, and S. cristatus, in the Streptococcus genus. Our study also illustrates the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy.
Background
The genus Streptococcus currently comprises more than 90 species with some of them being important human pathogens causing significant morbidity and mortality globally. Traditional phenotypic classification of Streptococcus relies on their Lancefield group antigens and hemolytic properties, which divides the genus into two major groups, pyogenic and viridans groups (Sherman 1937). As a result of the widespread use of polymerase chain reaction and DNA sequencing in the last two decades, genotypic methods such as amplification and sequencing of universal gene targets represent an advanced method for bacterial classification and identification. Among the various studied gene targets, 16S ribosomal RNA (rRNA) gene has been the most widely used. Classification of Streptococcus based on the 16S rRNA gene sequences divided the genus into six major groups, namely anginosus, mitis, salivarius, mutans, bovis, and pyogenic groups (Kawamura et al. 1995). Subsequently, Facklam (2002) suggested the addition of sanguinis group to the existing groups based on the phenotypic properties, resulting in seven major groups. However, some studies still showed that 16S rRNA gene or other housekeeping genes failed to provide sufficient resolution and to delineate Streptococcus species into the same taxonomic groupings (Tapp et al. 2003; Glazunova et al. 2010; Park et al. 2010; Zbinden et al. 2011). Apart from these phenotypic and genotypic identification methods, a recently emerged technique, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), has been used for identification and cluster analysis of many bacterial species. The technique examines the profile of proteins detected directly from intact bacteria and involves pattern analysis of the mass spectra obtained within a mass range of 2,000–20,000 Da from whole cell protein fingerprinting using mathematical tools (Hsieh et al. 2008). Several studies have shown that MALDI-TOF MS is useful for identification and cluster analysis of different groups of medically important Streptococcus species (Dubois et al. 2013; Karpanoja et al. 2014).
In 2002, we reported the discovery of a novel species, Streptococcus sinensis, isolated from multiple blood cultures of a 42-year-old Chinese woman with infective endocarditis complicating her chronic rheumatic heart disease in Hong Kong (Woo et al. 2002). Two years later, we reported two other cases of infective endocarditis caused by S. sinensis with Lancefield F (Woo et al. 2004). Subsequently, additional cases reported from France and Italy suggested that the bacterium is an emerging pathogen of global importance (Uckay et al. 2007; Faibis et al. 2008). The oral cavity is the natural reservoir of S. sinensis (Woo et al. 2008). Phenotypically, the bacterium possessed characteristics from members of anginosus, sanguinis, and mitis groups, as different commercial kits gave different identities to different strains of S. sinensis (Woo et al. 2006). Genotypically, phylogenetic analysis using 16S rRNA gene sequences showed that S. sinensis was equally close to both the anginosus and mitis groups but was more closely related to sanguinis group when using GroEL sequences. Because of these inconclusive results, the phylogenetic position of S. sinensis remains uncertain.
In this study, we attempted to determine the exact phylogenetic position of S. sinensis using phylogenomic approach. We sequenced the first genome of S. sinensis type strain HKU4T and performed comparative genomic studies utilizing publicly available Streptococcus genomes. Phylogenomic analysis using the whole-genome sequences revealed a new phylogenetic clade, including S. sinensis, Streptococcus oligofermentans, and Streptococcus cristatus, which we propose it as “sinensis group.” Intergenomic distance analysis, phylogenetic analysis using concatenated sequences, and cluster analysis of MALDI-TOF MS were also performed.
Genome Sequencing of S. sinensis Type Strain HKU4T
The de novo assembly using 26.81 million paired-end reads generated 116 contigs with lengths ranging from 212 to 125,948 bases, giving a total genome size of 2.06 Mb with the average GC content of 42.2%. All contigs generated were submitted to the RAST version 4.0 (Rapid Annotation using Subsystem Technology) annotation server, resulting in 1,992 protein-coding sequences (CDSs), 4 rRNA operons, and 47 transfer RNA-encoding genes (supplementary table S1, Supplementary Material online). To facilitate the subsequent genomic analysis, 86 genome sequences of other Streptococcus species and that of Enterococcus faecalis were also submitted to RAST for annotation. Among the 86 genome sequences, 25 complete genome sequences and 5 draft genome sequences (Streptococcus australis, Streptococcus criceti, S. cristatus, Streptococcus dentisani, and Streptococcus tigurinus because only one partial genome sequence was available for each species) were used for subsystem classification (table 1). Each CDS in annotated genomes was grouped into different RAST subsystems based on the predicted functional role. Among the 1,992 CDSs in the S. sinensis HKU4T genome, 854 CDSs can be categorized into RAST subsystems (fig. 1A), in which 133 CDSs were classified into more than one category. Overall, majority of CDSs were classified into subsystems of carbohydrates (148 CDSs, 7.4%), protein metabolism (118 CDSs, 5.9%), DNA metabolism (111 CDSs, 5.6%), and amino acid and derivatives (110 CDSs, 5.5%) (fig. 1A). The remaining 1,138 (57.1%) CDSs could not be classified into any subsystems, in which 488 (24.5%) CDSs were only annotated as hypothetical proteins. Consistent to the results of previous studies (Olson et al. 2013), when we compared the distribution of CDSs in each subsystem of the S. sinensis genome with those of other Streptococcus genomes, all of them have a similar percentage of their genome dedicated to each subsystem (fig. 1B).
Distributions of predicted coding sequence function in the annotated genomes according to RAST subsystems. In (A), the number of CDSs of S. sinensis HKU4T in different subsystems is indicated in bracket. In (B), a total of 31 genomes of Streptococcus species, including S. sinensis and representatives from all major groups, were analyzed. Each column indicates the number of CDSs of each Streptococcus species in different subsystems showing in different color.
Distributions of predicted coding sequence function in the annotated genomes according to RAST subsystems. In (A), the number of CDSs of S. sinensis HKU4T in different subsystems is indicated in bracket. In (B), a total of 31 genomes of Streptococcus species, including S. sinensis and representatives from all major groups, were analyzed. Each column indicates the number of CDSs of each Streptococcus species in different subsystems showing in different color.
Genome Sequence Details and the Corresponding DDH Value with Streptococcus sinensis HKU4T
| Species | Strain | Status | GenBank Accession No. | DDH (Formula 1) |
|---|---|---|---|---|
| “Sinensis” group | ||||
| Streptococcus sinensis | HKU4T | Draft | JPEN00000000a,b | — |
| Streptococcus oligofermentans | AS 1.3089 | Finished | NC_021175.1a,b | 45.6 |
| Streptococcus cristatus | ATCC 51100 | Draft | AEVC01000001–AEVC01000031a,b | 44.3 |
| Sanguinis group | ||||
| Streptococcus sanguinis | SK36 | Finished | NC_009009a,b | 28.9 |
| Streptococcus sanguinis | SK678 | Draft | AEXA01000001–AEXA01000010 | — |
| Streptococcus gordonii | CH1 | Finished | NC_009785a,b | 23.2 |
| Anginosus group | ||||
| Streptococcus intermedius | C270 | Finished | NC_022237.1b | 17.1 |
| Streptococcus intermedius | JTH08 | Finished | NC_018073.1b | 16.7 |
| Streptococcus intermedius | B196 | Finished | NC_022246.1a,b | 16.7 |
| Streptococcus intermedius | F0413 | Draft | AFXO01000001–AFXO01000013 | — |
| Streptococcus constellatus subsp. pharyngis | C232 | Finished | NC_022236.1a,b | 16.8 |
| Streptococcus constellatus subsp. pharyngis | C818 | Finished | NC_022245.1b | 16.8 |
| Streptococcus constellatus subsp. pharyngis | C1050 | Finished | NC_022238.1b | 16.7 |
| Streptococcus constellatus subsp. pharyngis | SK1060 | Draft | BASX01000001–BASX01000066 | — |
| Streptococcus anginosus | C1051 | Finished | NC_022244.1a,b | 16.2 |
| Streptococcus anginosus | C238 | Finished | NC_022239.1b | 15.6 |
| Streptococcus anginosus | SK1138 | Draft | ALJO01000001–ALJO01000013 | — |
| Streptococcus anginosus subsp. whileyi | CCUG 39159 | Draft | AICP01000001–AICP01000083 | — |
| Mitis group | ||||
| Streptococcus oralis | Uo5 | Finished | NC_015291.1a,b | 16.1 |
| Streptococcus oralis | SK610 | Draft | AJKQ01000001–AJKQ01000031 | — |
| Streptococcus mitis | B6 | Finished | NC_013853a,b | 15.4 |
| Streptococcus pneumoniae | ATCC 700669 | Finished | NC_011900a,b | 15.4 |
| Streptococcus pneumoniae | D39 | Finished | NC_008533b | 15.4 |
| Streptococcus pneumoniae | JJA | Finished | NC_012466b | 15.4 |
| Streptococcus pneumoniae | Taiwan19F-14 | Finished | NC_012469b | 15.4 |
| Streptococcus pneumoniae | TIGR4 | Finished | NC_003028b | 15.4 |
| Streptococcus pneumoniae | G54 | Finished | NC_011072b | 15.3 |
| Streptococcus pneumoniae | P1031 | Finished | NC_012467b | 15.3 |
| Streptococcus pneumoniae | R6 | Finished | NC_003098b | 15.3 |
| Streptococcus pneumoniae | 70585 | Finished | NC_012468b | 15.2 |
| Streptococcus pneumoniae | CGSP14 | Finished | NC_010582b | 15.2 |
| Streptococcus pneumoniae | Hungary19A-6 | Finished | NC_010380b | 15.0 |
| Streptococcus parasanguinis | FW213 | Finished | NC_017905.1b | 15.2 |
| Streptococcus parasanguinis | ATCC 15912 | Finished | NC_015678.1a,b | 15.1 |
| Streptococcus pseudopneumoniae | IS7493 | Finished | NC_015875.1a,b | 14.8 |
| Streptococcus infantis | ATCC 700779 | Draft | AEVD01000001–AEVD01000031 | — |
| Streptococcus infantis | SK970 | Draft | AFUT01000001–AFUT01000009 | — |
| Streptococcus peroris | ATCC 700780 | Draft | AEVF01000001–AEVF01000017 | — |
| Streptococcus tigurinus | 1366 | Draft | AORX01000001–AORX01000014a | — |
| Streptococcus dentisani | 7746 | Draft | CAUJ01000001–CAUJ01000008a | — |
| Streptococcus australis | ATCC 700641 | Draft | AEQR01000001–AEQR01000027a | — |
| Pyogenic group | ||||
| Streptococcus agalactiae | NEM316 | Finished | NC_004368b | 13.5 |
| Streptococcus agalactiae | 09mas018883 | Finished | NC_021485.1a,b | 13.2 |
| Streptococcus agalactiae | 2603 V/R | Finished | NC_004116b | 13.2 |
| Streptococcus agalactiae | A909 | Finished | NC_007432b | 13.2 |
| Streptococcus dysgalactiae subsp. equisimilis | GGS_124 | Finished | NC_012891a,b | 13.3 |
| Streptococcus pyogenes | M1 GAS | Finished | NC_002737a,b | 13.3 |
| Streptococcus pyogenes | Manfredo | Finished | NC_009332b | 13.3 |
| Streptococcus pyogenes | MGAS10270 | Finished | NC_008022b | 13.3 |
| Streptococcus pyogenes | MGAS10394 | Finished | NC_006086b | 13.3 |
| Streptococcus pyogenes | MGAS10750 | Finished | NC_008024b | 13.3 |
| Streptococcus pyogenes | MGAS2096 | Finished | NC_008023b | 13.3 |
| Streptococcus pyogenes | MGAS315 | Finished | NC_004070b | 13.3 |
| Streptococcus pyogenes | MGAS5005 | Finished | NC_007297b | 13.3 |
| Streptococcus pyogenes | MGAS6180 | Finished | NC_007296b | 13.3 |
| Streptococcus pyogenes | MGAS8232 | Finished | NC_003485b | 13.3 |
| Streptococcus pyogenes | MGAS9429 | Finished | NC_008021b | 13.3 |
| Streptococcus pyogenes | NZ131 | Finished | NC_011375b | 13.3 |
| Streptococcus pyogenes | SSI-1 | Finished | NC_004606b | 13.3 |
| Streptococcus uberis | 0140J | Finished | NC_012004a,b | 13.2 |
| Streptococcus parauberis | KCTC 11537 | Finished | NC_015558.1a,b | 13.2 |
| Streptococcus iniae | SF1 | Finished | NC_021314.1a,b | 13.1 |
| Streptococcus equi subsp. zooepidemicus | MGCS10565 | Finished | NC_011134a,b | 13.1 |
| Streptococcus equi subsp. equi | 4047 | Finished | NC_012471b | 13.0 |
| Streptococcus equi subsp. zooepidemicus | H70 | Finished | NC_012470b | 13.0 |
| Streptococcus criceti | HS-6 | Draft | AEUV02000001–AEUV02000002a | — |
| Bovis group | ||||
| Streptococcus gallolyticus | UCN34 | Finished | NC_013798a,b | 13.3 |
| Streptococcus lutetiensis | 33 | Finished | NC_021900.1a,b | 13.3 |
| Streptococcus pasteurianus | ATCC 43144 | Finished | NC_015600.1a,b | 13.3 |
| Streptococcus equinus | ATCC 9812 | Draft | AEVB01000001–AEVB01000057 | — |
| Streptococcus infantarius | ATCC BAA-102 | Draft | ABJK02000001–ABJK02000022 | — |
| Mutans group | ||||
| Streptococcus mutans | NN2025 | Finished | NC_013928a,b | 13.3 |
| Streptococcus mutans | GS-5 | Finished | NC_018089b | 13.2 |
| Streptococcus mutans | LJ23 | Finished | NC_017768.1b | 13.2 |
| Streptococcus mutans | UA159 | Finished | NC_004350b | 13.2 |
| Salivarius group | ||||
| Streptococcus thermophilus | CNRZ1066 | Finished | NC_006449b | 13.8 |
| Streptococcus thermophilus | LMD-9 | Finished | NC_008532b | 13.8 |
| Streptococcus thermophilus | LMG 18311 | Finished | NC_006448b | 13.8 |
| Streptococcus thermophilus | JIM 8232 | Finished | NC_017581.1a,b | 13.7 |
| Streptococcus salivarius | JIM8777 | Finished | NC_017595.1b | 13.7 |
| Streptococcus salivarius | CCHSS3 | Finished | FR873481a,b | 13.6 |
| Streptococcus salivarius | SK126 | Draft | ACLO01000001–ACLO01000101 | — |
| No group | ||||
| Streptococcus suis | BM407 | Finished | NC_012926b | 13.7 |
| Streptococcus suis | P1/7 | Finished | NC_012925b | 13.7 |
| Streptococcus suis | 98HAH33 | Finished | NC_009443b | 13.6 |
| Streptococcus suis | 05ZYH33 | Finished | NC_009442a,b | 13.6 |
| Streptococcus suis | SC84 | Finished | NC_012924b | 13.6 |
| Outgroup | ||||
| Enterococcus faecalis | V583 | Finished | NC_004668.1b | 12.6 |
| Species | Strain | Status | GenBank Accession No. | DDH (Formula 1) |
|---|---|---|---|---|
| “Sinensis” group | ||||
| Streptococcus sinensis | HKU4T | Draft | JPEN00000000a,b | — |
| Streptococcus oligofermentans | AS 1.3089 | Finished | NC_021175.1a,b | 45.6 |
| Streptococcus cristatus | ATCC 51100 | Draft | AEVC01000001–AEVC01000031a,b | 44.3 |
| Sanguinis group | ||||
| Streptococcus sanguinis | SK36 | Finished | NC_009009a,b | 28.9 |
| Streptococcus sanguinis | SK678 | Draft | AEXA01000001–AEXA01000010 | — |
| Streptococcus gordonii | CH1 | Finished | NC_009785a,b | 23.2 |
| Anginosus group | ||||
| Streptococcus intermedius | C270 | Finished | NC_022237.1b | 17.1 |
| Streptococcus intermedius | JTH08 | Finished | NC_018073.1b | 16.7 |
| Streptococcus intermedius | B196 | Finished | NC_022246.1a,b | 16.7 |
| Streptococcus intermedius | F0413 | Draft | AFXO01000001–AFXO01000013 | — |
| Streptococcus constellatus subsp. pharyngis | C232 | Finished | NC_022236.1a,b | 16.8 |
| Streptococcus constellatus subsp. pharyngis | C818 | Finished | NC_022245.1b | 16.8 |
| Streptococcus constellatus subsp. pharyngis | C1050 | Finished | NC_022238.1b | 16.7 |
| Streptococcus constellatus subsp. pharyngis | SK1060 | Draft | BASX01000001–BASX01000066 | — |
| Streptococcus anginosus | C1051 | Finished | NC_022244.1a,b | 16.2 |
| Streptococcus anginosus | C238 | Finished | NC_022239.1b | 15.6 |
| Streptococcus anginosus | SK1138 | Draft | ALJO01000001–ALJO01000013 | — |
| Streptococcus anginosus subsp. whileyi | CCUG 39159 | Draft | AICP01000001–AICP01000083 | — |
| Mitis group | ||||
| Streptococcus oralis | Uo5 | Finished | NC_015291.1a,b | 16.1 |
| Streptococcus oralis | SK610 | Draft | AJKQ01000001–AJKQ01000031 | — |
| Streptococcus mitis | B6 | Finished | NC_013853a,b | 15.4 |
| Streptococcus pneumoniae | ATCC 700669 | Finished | NC_011900a,b | 15.4 |
| Streptococcus pneumoniae | D39 | Finished | NC_008533b | 15.4 |
| Streptococcus pneumoniae | JJA | Finished | NC_012466b | 15.4 |
| Streptococcus pneumoniae | Taiwan19F-14 | Finished | NC_012469b | 15.4 |
| Streptococcus pneumoniae | TIGR4 | Finished | NC_003028b | 15.4 |
| Streptococcus pneumoniae | G54 | Finished | NC_011072b | 15.3 |
| Streptococcus pneumoniae | P1031 | Finished | NC_012467b | 15.3 |
| Streptococcus pneumoniae | R6 | Finished | NC_003098b | 15.3 |
| Streptococcus pneumoniae | 70585 | Finished | NC_012468b | 15.2 |
| Streptococcus pneumoniae | CGSP14 | Finished | NC_010582b | 15.2 |
| Streptococcus pneumoniae | Hungary19A-6 | Finished | NC_010380b | 15.0 |
| Streptococcus parasanguinis | FW213 | Finished | NC_017905.1b | 15.2 |
| Streptococcus parasanguinis | ATCC 15912 | Finished | NC_015678.1a,b | 15.1 |
| Streptococcus pseudopneumoniae | IS7493 | Finished | NC_015875.1a,b | 14.8 |
| Streptococcus infantis | ATCC 700779 | Draft | AEVD01000001–AEVD01000031 | — |
| Streptococcus infantis | SK970 | Draft | AFUT01000001–AFUT01000009 | — |
| Streptococcus peroris | ATCC 700780 | Draft | AEVF01000001–AEVF01000017 | — |
| Streptococcus tigurinus | 1366 | Draft | AORX01000001–AORX01000014a | — |
| Streptococcus dentisani | 7746 | Draft | CAUJ01000001–CAUJ01000008a | — |
| Streptococcus australis | ATCC 700641 | Draft | AEQR01000001–AEQR01000027a | — |
| Pyogenic group | ||||
| Streptococcus agalactiae | NEM316 | Finished | NC_004368b | 13.5 |
| Streptococcus agalactiae | 09mas018883 | Finished | NC_021485.1a,b | 13.2 |
| Streptococcus agalactiae | 2603 V/R | Finished | NC_004116b | 13.2 |
| Streptococcus agalactiae | A909 | Finished | NC_007432b | 13.2 |
| Streptococcus dysgalactiae subsp. equisimilis | GGS_124 | Finished | NC_012891a,b | 13.3 |
| Streptococcus pyogenes | M1 GAS | Finished | NC_002737a,b | 13.3 |
| Streptococcus pyogenes | Manfredo | Finished | NC_009332b | 13.3 |
| Streptococcus pyogenes | MGAS10270 | Finished | NC_008022b | 13.3 |
| Streptococcus pyogenes | MGAS10394 | Finished | NC_006086b | 13.3 |
| Streptococcus pyogenes | MGAS10750 | Finished | NC_008024b | 13.3 |
| Streptococcus pyogenes | MGAS2096 | Finished | NC_008023b | 13.3 |
| Streptococcus pyogenes | MGAS315 | Finished | NC_004070b | 13.3 |
| Streptococcus pyogenes | MGAS5005 | Finished | NC_007297b | 13.3 |
| Streptococcus pyogenes | MGAS6180 | Finished | NC_007296b | 13.3 |
| Streptococcus pyogenes | MGAS8232 | Finished | NC_003485b | 13.3 |
| Streptococcus pyogenes | MGAS9429 | Finished | NC_008021b | 13.3 |
| Streptococcus pyogenes | NZ131 | Finished | NC_011375b | 13.3 |
| Streptococcus pyogenes | SSI-1 | Finished | NC_004606b | 13.3 |
| Streptococcus uberis | 0140J | Finished | NC_012004a,b | 13.2 |
| Streptococcus parauberis | KCTC 11537 | Finished | NC_015558.1a,b | 13.2 |
| Streptococcus iniae | SF1 | Finished | NC_021314.1a,b | 13.1 |
| Streptococcus equi subsp. zooepidemicus | MGCS10565 | Finished | NC_011134a,b | 13.1 |
| Streptococcus equi subsp. equi | 4047 | Finished | NC_012471b | 13.0 |
| Streptococcus equi subsp. zooepidemicus | H70 | Finished | NC_012470b | 13.0 |
| Streptococcus criceti | HS-6 | Draft | AEUV02000001–AEUV02000002a | — |
| Bovis group | ||||
| Streptococcus gallolyticus | UCN34 | Finished | NC_013798a,b | 13.3 |
| Streptococcus lutetiensis | 33 | Finished | NC_021900.1a,b | 13.3 |
| Streptococcus pasteurianus | ATCC 43144 | Finished | NC_015600.1a,b | 13.3 |
| Streptococcus equinus | ATCC 9812 | Draft | AEVB01000001–AEVB01000057 | — |
| Streptococcus infantarius | ATCC BAA-102 | Draft | ABJK02000001–ABJK02000022 | — |
| Mutans group | ||||
| Streptococcus mutans | NN2025 | Finished | NC_013928a,b | 13.3 |
| Streptococcus mutans | GS-5 | Finished | NC_018089b | 13.2 |
| Streptococcus mutans | LJ23 | Finished | NC_017768.1b | 13.2 |
| Streptococcus mutans | UA159 | Finished | NC_004350b | 13.2 |
| Salivarius group | ||||
| Streptococcus thermophilus | CNRZ1066 | Finished | NC_006449b | 13.8 |
| Streptococcus thermophilus | LMD-9 | Finished | NC_008532b | 13.8 |
| Streptococcus thermophilus | LMG 18311 | Finished | NC_006448b | 13.8 |
| Streptococcus thermophilus | JIM 8232 | Finished | NC_017581.1a,b | 13.7 |
| Streptococcus salivarius | JIM8777 | Finished | NC_017595.1b | 13.7 |
| Streptococcus salivarius | CCHSS3 | Finished | FR873481a,b | 13.6 |
| Streptococcus salivarius | SK126 | Draft | ACLO01000001–ACLO01000101 | — |
| No group | ||||
| Streptococcus suis | BM407 | Finished | NC_012926b | 13.7 |
| Streptococcus suis | P1/7 | Finished | NC_012925b | 13.7 |
| Streptococcus suis | 98HAH33 | Finished | NC_009443b | 13.6 |
| Streptococcus suis | 05ZYH33 | Finished | NC_009442a,b | 13.6 |
| Streptococcus suis | SC84 | Finished | NC_012924b | 13.6 |
| Outgroup | ||||
| Enterococcus faecalis | V583 | Finished | NC_004668.1b | 12.6 |
aGenome sequence submitted for subsystem classification using RAST 4.0.
bGenome sequence used to construct the whole genome tree.
Phylogenomic Analyses of S. sinensis and Other Streptococcus Species Reveals a Distinct Clade Comprising S. sinensis, S. oligofermentans, and S. cristatus
Phylogenetic analyses using sequences of single gene loci, 16S rRNA and groEL, extracted from 87 Streptococcus genomes showed that S. sinensis closely clustered with S. oligofermentans and S. cristatus, respectively (fig. 2A–C). However, the topology of these trees was not concordant, in which S. sinensis branched out from members of anginosus, mitis, and sanguinis groups when using 16S rRNA gene sequences (fig. 2A) but clustered with members of anginosus and sanguinis groups when using groEL sequences for analyses (fig. 2B and C). This inconclusive result suggested that phylogenetic analysis using single gene target has not added much to our understanding on the phylogeny of S. sinensis, as they only represent a very small portion of the bacterial genome.
Phylogenetic relationship among Streptococcus strains. Three phylogenetic trees were constructed, each using a different genetic locus for analysis. (A) 16S rRNA. (B) groEL. (C) GroEL. The trees were constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. A total of 1,583 nucleotide positions of the 16S rRNA gene, 1,709 nucleotide positions and 565 deduced amino acid positions of groEL from 88 genomes were included for analyses. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide/amino acid substitutions per site on the respective branch. Names and accession numbers are given as cited in GenBank in table 1.
Phylogenetic relationship among Streptococcus strains. Three phylogenetic trees were constructed, each using a different genetic locus for analysis. (A) 16S rRNA. (B) groEL. (C) GroEL. The trees were constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. A total of 1,583 nucleotide positions of the 16S rRNA gene, 1,709 nucleotide positions and 565 deduced amino acid positions of groEL from 88 genomes were included for analyses. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide/amino acid substitutions per site on the respective branch. Names and accession numbers are given as cited in GenBank in table 1.
In view of this problem, we attempted to use a phylogenomic approach, which based on the whole bacterial genome, to resolve the taxonomic ambiguity of S. sinensis. In bacterial taxonomy, whole genomic DNA–DNA hybridization (DDH) has been used to determine the genetic distance between two strains and 70% DDH was proposed as a criterion for delineating species (Wayne 1988). However, this method is not widely used because it is tedious and results depend on experimental conditions. Therefore, it is often difficult to compare results between different laboratories objectively. Recent advancement of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in silico genome-to-genome comparison. Among various sophisticated methods studied, a digital DDH method, GGDC 2.0, was shown to yield very good correlation with wet-lab DDH (Auch et al. 2010). In this study, with the availability of Streptococcus genome sequences, we were able to use GGDC 2.0 for intergenomic distance estimation, which allowed genome sequence comparison between two Streptococcus strains. The results showed that S. sinensis HKU4T shared 45.6% and 44.3% nucleotide identities to the genome sequence of S. oligofermentans AS 1.3089 (GenBank accession number NC_021175) and S. cristatus ATCC 51100 (GenBank accession number AEVC01000001–AEVC01000031) but only 23.2–28.9%, 15.6–17.1%, and 14.8–16.1% nucleotide identities to those from members of sanguinis (3 genomes), anginosus (12 genomes), and mitis (23 genomes) groups, respectively (table 1). The present DDH value between S. sinensis and S. oligofermentans was quite different from the results obtained in the previous study, in which the wet-lab DDH value between S. sinensis and S. oligofermentans was only 15%. We speculate that the discrepancy may due to the experimental error as DDH method is well-known not easily be made reproducible. Nevertheless, the present results using the entire genome sequences showed that S. sinensis was more closely related to S. oligofermentans and S. cristatus than members of sanguinis, anginosus, and mitis groups.
To further verify the phylogenetic position of S. sinensis among the genus Streptococcus, phylogenomic analysis utilizing the draft genome sequences of S. sinensis HKU4T and S. cristatus ATCC51100 as well as the available complete genome sequences of 69 Streptococcus species was performed and the results also supported that S. sinensis closely clustered with S. oligofermentans and S. cristatus, forming a unique clade (fig. 3A). Although this clade, comprising S. sinensis, S. oligofermentans, and S. cristatus, also clustered with members of sanguinis group, inclusion of this clade into the sanguinis group does not appear justified as they were connected by a relatively long branch and shared relatively low DDH value with S. sinensis. To confirm this result, we performed phylogenetic analysis using concatenated sequences of 50 ribosomal protein genes retrieved from 87 Streptococcus genomes including S. sinensis, representing 35 different species, and from one E. faecalis genome. These 50 ribosomal protein genes were chosen because they have been shown to be useful for phylogenetic delineation of bacterial species in previous studies (Jolley et al. 2012; Maiden et al. 2013) and their names were listed in supplementary table S2, Supplementary Material online. Results showed that the topologies of both trees were concordant and able to recover members of the seven taxonomic groups as described in previous studies (fig. 3A and B) (Kawamura et al. 1995; Facklam 2002). Consistently, the tree based on the concatenated sequences also revealed that S. sinensis, together with S. oligofermentans and S. cristatus, forming a distinct phylogenetic clade with a bootstrap value of 100%. It is worth noting that this unique clade also fell within the sanguinis group but the grouping was weakly supported by a low bootstrap value of 42%, meaning that it was not often clustered with members of sanguinis group (fig. 3B).
Phylogenetic tree constructed using draft genome sequences and concatenated nucleotide sequences of 50 ribosomal protein genes of S. sinensis HKU4T. In (A), the tree based on draft genome sequences was constructed by neighbor-joining method using GGDC distance (formula 1) and E. faecalis V583 as the root. In (B), the tree based on 50 ribosomal protein genes, showing the relationship of S. sinensis HKU4T to other Streptococcus species, was constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide substitutions per site on the respective branch. The gene names and accession numbers are given as cited in GenBank in table 1.
Phylogenetic tree constructed using draft genome sequences and concatenated nucleotide sequences of 50 ribosomal protein genes of S. sinensis HKU4T. In (A), the tree based on draft genome sequences was constructed by neighbor-joining method using GGDC distance (formula 1) and E. faecalis V583 as the root. In (B), the tree based on 50 ribosomal protein genes, showing the relationship of S. sinensis HKU4T to other Streptococcus species, was constructed by maximum-likelihood method using RAxML (version 7.3) and E. faecalis V583 as the root. Bootstrap values were calculated from 1,000 replicates. The scale bar corresponds to the mean number of nucleotide substitutions per site on the respective branch. The gene names and accession numbers are given as cited in GenBank in table 1.
The phylogenetic position of S. sinensis has been considered controversial due to its simultaneous possession of phenotypic characteristics from members of mitis, sanguinis, and anginosus groups (Woo et al. 2006). Genotypic methods such as 16S rRNA or GroEL sequence analyses also failed to reveal its exact taxonomic group. In this study, the genome data, offering superior discriminatory power than any phenotypic and genotypic methods, resolve the taxonomic ambiguity of S. sinensis. Based on these genomic evidences, we suggest the three species, S. sinensis, S. oligofermentans, and S. cristatus, forming a well-supported clade, might best be considered as a separate group that we tentatively propose it as the sinensis group.
MALDI-TOF MS Analysis Supports the Newly Proposed Sinensis Group
Dendrogram generated from hierarchical cluster analysis of MALDI-TOF MS showed that the MALDI-TOF MS spectrum of S. sinensis HKU4T and 28 nonduplicated Streptococcus species correctly recovered members of salivarius, mutans, bovis and pyogenic groups, and Streptococcus suis (fig. 4). Streptococcus sinensis consistently formed a distinct cluster with S. oligofermentans and S. cristatus, but these three streptococci were clustered with members belonging to “mitis group” (fig. 4). Notably, in contrast to the previous phylogenomic analysis using 50 concatenated genes or whole genomes, these three streptococci were clustered with “sanguinis group” (fig. 3A and B). On the basis of all these findings, these three streptococci should not be included into either sanguinis group or mitis group, but they should exist as a novel group, sinensis group in the genus Streptococcus.
Dendrogram generated from hierarchical cluster analysis of MALDI-TOF MS spectra of S. sinensis HKU4T and 28 isolates of other Streptococcus species. Distances are displayed in relative units.
Dendrogram generated from hierarchical cluster analysis of MALDI-TOF MS spectra of S. sinensis HKU4T and 28 isolates of other Streptococcus species. Distances are displayed in relative units.
Comparative Genomic Analyses of the Three Genomes of Sinensis Group
Based on BLAST (Basic Local Alignment Search Tool) analysis using 50% nucleotide identity as a threshold, we compared the genome sequence of S. sinensis HKU4T with those of S. oligofermentans and S. cristatus and found that 1,241 CDSs were shared among the three genomes (fig. 5). There were 222 CDSs only shared with S. oligofermentans and 63 CDSs only shared with S. cristatus (fig. 5). A total of 466 CDSs were uniquely found in S. sinensis HKU4T genome (fig. 5). Among these 466 unique CDSs, only 90 CDSs could be classified into RAST subsystems according to their predicted functional roles. For the remaining 376 CDSs, which did not belong to any subsystem, 256 CDSs were only annotated as hypothetical proteins. More in-depth analyses on these unique CDSs, especially those annotated as hypothetical proteins, may provide insights about the differences in metabolic capacity between S. sinensis and two other members of sinensis group.
Comparative genomic analysis among three members of sinensis group, including S. sinensis, S. oligofermentans, and S. cristatus. The total number of CDSs per genome is given as indicated. The overlapping sections indicate shared numbers of CDSs among genomes.
Comparative genomic analysis among three members of sinensis group, including S. sinensis, S. oligofermentans, and S. cristatus. The total number of CDSs per genome is given as indicated. The overlapping sections indicate shared numbers of CDSs among genomes.
Potential Virulence Factors in S. sinensis HKU4T
Previous studies showed that S. sinensis has been recovered from patients with infective endocarditis globally (Woo et al. 2002, 2004; Uckay et al. 2007; Faibis et al. 2008), suggesting that the bacteria may possess virulence factors to adhere and to colonize heart valves and to cause endocardial damage. The draft genome of S. sinensis HKU4T contains homologs of several virulence genes that are important for the pathogenesis of infective endocarditis. These include platelet activating factor, clumping factor B, collagen-binding protein, laminin-binding protein, and fibronectin/fibrinogen-binding protein which have been implicated in the pathogenesis of infective endocarditis by promoting bacterial adherence to endothelial tissues and triggering inflammatory responses (supplementary table S3, Supplementary Material online) (Abranches et al. 2011; Que and Moreillon 2011). Notably, two other members of the sinensis group, S. oligofermentans and S. cristatus, have also been isolated from patients with infective endocarditis (Matthys et al. 2006; Matta et al. 2009). Detailed comparative genomic studies on the S. sinensis genome and genomes of members of the sinensis group and those of the mitis group, including Streptococcus mitis, S. tigurinus, and Streptococcus oralis, may shed light on the ecology and biology of S. sinensis as well as pathogenesis of infective endocarditis caused by members belonging to this new phylogenetic clade.
Conclusions
In this study, we sequenced the first draft genome of S. sinensis type strain HKU4T and unambiguously determined the phylogenetic position of S. sinensis using phylogenomic approach. Phylogenomic and MALDI-TOF MS analysis revealed a distinct phylogenetic clade in the genus Streptococcus, which we proposed it as sinensis group, currently comprising three species, S. sinensis, S. oligofermentans, and S. cristatus. The present draft genome sequence has also allowed rapid exploration of potential virulence genes in S. sinensis. Our findings also illustrate the power of phylogenomic analyses for resolving ambiguities in bacterial taxonomy.
Materials and Methods
Bacterial Strains
A total of 29 nonduplicated Streptococcus strains, including S. sinensis HKU4T, were included in the MALDI-TOF MS analysis (supplementary table S4, Supplementary Material online). Streptococcus sinensis HKU4T was isolated from blood culture of a Chinese patient with infective endocarditis in Hong Kong, whereas the remaining strains were either purchased from the Collection of Institut Pasteur or Culture Collection, University of Göteborg (CCUG) (Woo et al. 2002).
Genome Sequencing, Assembly, and Annotation of S. sinensis Type Strain HKU4T
The draft genome sequence of S. sinensis HKU4T was determined by high-throughput sequencing with Illumina Hi-Seq 2500. Genomic DNA was extracted from overnight cultures (37 °C) grown on blood agars by genomic DNA purification kit (QIAgen, Hilden, Germany) as described previously (Woo et al. 2009; Tse et al. 2010). It was sequenced by 151-bp paired-end reads with mean library size of 350 bp. Sequencing errors were corrected by k-mer frequency spectrum analysis using SOAPec (http://soap.genomics.org.cn/about.html, last accessed October 22, 2014). De novo assembly was performed using SOAPdenovo2 (http://soap.genomics.org.cn/soapdenovo.html, last accessed October 22, 2014). Prediction of protein-coding regions and automatic functional annotation was performed using Glimmer3 and RAST server version 4.0 (Delcher et al. 2007; Aziz et al. 2008). Intergenomic distance between S. sinensis HKU4T and other Streptococcus species, including representatives from seven major groups, was calculated using GGDC 2.0 (http://ggdc.dsmz.de/distcalc2.php, last accessed October 22, 2014) (Auch et al. 2010). Streptococcus species included for the distance calculation was shown in table 1.
Genome Sequence Data
Details of the 88 genome sequences used in this study are shown in table 1. Despite the 18 draft genome sequences, including one from S. sinensis HKU4T which was sequenced to near completion as part of this study, the remaining 70 genome sequences were complete and downloaded from National Center for Biotechnology Information. Nucleotide sequences of 50 ribosomal protein genes (supplementary table S2, Supplementary Material online) and one copy of 16S rRNA and groEL genes were retrieved, respectively, from all 88 genomes (table 1).
Phylogenetic Characterization
The tree based on entire genome sequences was constructed by neighbor-joining method using GGDC distance (formula 1: length of all high-scoring segment pairs divided by total genome length) and E. faecalis as the root. The trees based on single gene loci, 16S rRNA and groEL gene (nucleotide and amino acid), were aligned by Geneious 7.1.5 (Biomatters Limited) and constructed, respectively, by maximum-likelihood method using RAxML (version 7.3). The tree based on the concatenated sequences of 50 ribosomal protein genes from 88 Streptococcus species, including 35 nonduplicated species, was constructed by maximum-likelihood method using RAxML (version 7.3). A total of 20,921 nucleotide positions were included in the analysis. Nucleotide sequences of corresponding homologs in E. faecalis were used as the outgroups where appropriate.
Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry
MALDI-TOF MS was performed as previously described with slight modifications (Tang et al. 2013; Lau et al. 2014). Twenty-nine nonduplicated Streptococcus strains, including S. sinensis HKU4T, were grown on sheep blood agar at 37 °C with 5% CO2 for 48 h. All isolates were analyzed by the ethanol formic acid extraction method using the same experimental conditions suggested by the Bruker system. Briefly, 1–3 colonies were suspended into 100 µl of HPLC grade water (Fluka, St Louis, MO). Then, 300 µl of absolute ethanol was added and incubated for 5 min at room temperature. Cell pellet was then air dried after centrifugation. The pellet was resuspended in 30 µl each of formic acid (Fluka) and acetonitrile (Sigma Aldrich, St Louis, MO). Each bacterial extract was spotted onto six spots of the MSP96 target plate after centrifugation. Samples were processed in the Bruker MicroFlex LT mass spectrometer (Bruker Daltonics, Bremen, Germany) with 1 μl of α-cyano 4-hydroxycinnimic acid matrix solution (Sigma Aldrich). Spectra were obtained with an accelerating voltage of 20 kV in linear mode and analyzed within m/z range 3,000–15,000 Da. Spectra were analyzed with MALDI Biotyper 3.1 and Reference Library V.3.1.2.0 (Bruker Daltonics). A mass spectrum profile (MSP) based on 24 separate determinations was created. The representative MSPs were then selected for hierarchical cluster analysis using MALDI Biotyper 3.1 (Bruker Daltonics) with default parameters (Ketterlinus et al. 2005), where distance values are relative and are always normalized to a maximum value of 1,000.
Acknowledgments
This work was partly supported by the Strategic Research Theme Fund and the Small Project Funding Scheme, The University of Hong Kong. The authors thank members of the Centre for Genomic Sciences, The University of Hong Kong, for their technical support in genome sequencing.



