Genetic Diversity of Composite Enterotoxigenic Staphylococcus epidermidis Pathogenicity Islands

Abstract
The only known elements encoding enterotoxins in coagulase-negative staphylococci are composite Staphylococcus epidermidis pathogenicity islands (SePIs), including SePI and S. epidermidis composite insertion (SeCI) regions. We investigated 1545 Staphylococcus spp. genomes using whole-genome MLST, and queried them for genes of the staphylococcal enterotoxin family and for 29 ORFs identified in the prototype SePI from S. epidermidis FRI909. Enterotoxin-encoding genes were identified in 97% of Staphylococcus aureus genomes, in one Staphylococcus argenteus genome, and in nine S. epidermidis genomes. All enterotoxigenic S. epidermidis strains carried a composite SePI encoding the sec and sel enterotoxin genes, and were assigned to a discrete wgMLST cluster that also contained genomes with incomplete islands located in the same region as the complete SePI in enterotoxigenic strains. Staphylococcus epidermidis strains without SeCI and SePI genes, and strains with complete SeCI and no SePI genes, were identified, but no strains were found to carry only SePI and not SeCI genes. The systematic differences between SePI and SeCI regions imply a lineage-specific pattern of inheritance and support independent acquisition of the two elements in S. epidermidis. We provide evidence of reticulate evolution of mobile elements that contain components with different putative ancestry, including composite SePI that contains genes found in other coagulase-negative staphylococci (SeCI), as well as in S. aureus (SePI-like elements). We conclude that SePI-associated elements present in nonenterotoxigenic S. epidermidis represent a scaffold associated with acquisition of virulence-associated genes. Gene exchange between S. aureus and S. epidermidis may promote emergence of new pathogenic S. epidermidis clones.


Introduction
Staphylococci are a worldwide cause of human and animal infections including life-threatening cases of bacteremia, wound infections, pyogenic lesions, and mastitis (Stefani and Goglio 2010; Contreras and Rodríguez 2011). Host colonization (Xu et al. 2015) and disease progression can be associated with a family of bacterial proteins known as staphylococcal enterotoxins (SEs). In particular, SEs have a well-established role in food poisoning due to the emetic properties of enterotoxins (Argudín et al. 2010) and diseases such as toxic shock syndrome (Pinchuk et al. 2010), associated with the ability of enterotoxins to stimulate extensive activation and proliferation of T cells (Hennekinne et al. 2012). Although much of the work on enterotoxins has focused on Staphylococcus aureus (Fisher et al. 2018), enterotoxin genes are being detected among coagulase-negative staphylococci (CoNS) (Madhusoodanan et al. 2011; Gazzola et al. 2013; Podkowik et al. 2016; Argemi 2018). These include sec and sel genes which are associated with emetic activity in S. aureus and have been implicated in numerous foodborne outbreaks (Jones et al. 2002; Kérouanton et al. 2007; Chiang et al. 2008).
In the EU, detection of staphylococcal enterotoxin C (SEC) is a part of routine examination of selected food products where S. aureus is identified. However, detection of S. aureus is a prerequisite for further examination of products; therefore, enterotoxins produced by CoNS would not be detected using current food safety protocols. Consequently, products containing CoNS enterotoxins are at present considered safe. Staphylococcus aureus pathogenicity islands (SaPIs) encoding the sec gene often also harbor the sel gene encoding another enterotoxin, SEL. This toxin was initially characterized in S. aureus from a bovine mastitis isolate with no emetic activity (Orwin et al. 2003), but recent research has shown that SEL can induce emesis, indicating that this toxin should also be taken into account as a potential food safety hazard (Omoe et al. 2013).
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
SaPIs are derived from temperate phages and encode numerous phage-like elements responsible for induction and excision from the genome. SaPIs can move between S. aureus strains and integrate at a number of specific locations in the genome of the pathogen. However, SaPIs do not encode the elements that would allow them to leave the bacterial cell and transfer independently. Therefore, horizontal gene transfer (HGT) of SaPIs requires the involvement of specific helper phages that are found in the environment or are encoded by SaPI-containing isolates. During this packaging process, SaPI DNA is incorporated into the phage particle, replacing phage DNA, allowing it to be transferred to new cells. This has led to descriptions of SaPIs as parasites of staphylococcal phages (Úbeda et al. 2005). Extending understanding of the genetic basis of SaPIs to CoNS, specifically S. epidermidis, has identified a SaPI ortholog that includes the genes encoding the staphylococcal enterotoxins, sec and sel (Madhusoodanan et al. 2011). SECepi was first identified as a SEC3 variant in strain FRI909, originally described as S. aureus and later reclassified as S. epidermidis. Some properties of SECepi were described (Chi et al. 2002) with subsequent assessment of stability and mitogenic activity (Nanoukon et al. 2018). This S. epidermidis pathogenicity island (SePI) is a composite of two discrete regions separated by direct repeat motifs: a 9.6 kb SaPI-like region and a 16.3 kb S. epidermidis composite insertion (SeCI). Typically, SaPIs contain conserved regions, such as the integrase gene and specific direct repeats, as well as large variable regions that can be acquired, modified, or even deleted. Since a number of SaPIs do not contain enterotoxin genes, they are sometimes regarded as ancestors of the enterotoxin-encoding elements in which toxin genes were acquired or lost through recombination (Subedi et al. 2007).
The availability of increasingly large isolate genome collections presents new opportunities for understanding the evolution of pathogenicity islands in species where they have been largely overlooked, such as S. epidermidis. In this study, we investigate the composition and diversity of SePIs in a large data set of CoNS genomes. Little is known about the structure and mobility of genetic elements carrying enterotoxin genes in CoNS, and efforts to mobilize the SePI element using existing protocols have not been successful (Madhusoodanan et al. 2011). Through comparison of hundreds of coagulase-negative and coagulase-positive genomes, we identify possible scenarios for the incorporation of the SeCI and SePI in the S. epidermidis genome and the events that led to SePIs with composite ancestral backgrounds.

Isolate Collection
Whole-genome sequences of 1545 Staphylococcus spp. strains, most of them from other studies (Méric et al. 2015; Richards et al. 2015; Yan et al. 2016; Murray et al. 2017; Post et al. 2017), were used in the study and archived in a public Bacterial Isolate Genome Sequence Database (BIGSdb) (Jolley and Maiden 2010).

Core Genome Characterization and Isolate Genealogies
A gene-by-gene approach and whole-genome MLST were implemented using the Genome Comparator module of the Bacterial Isolate Genome Sequence Database (BIGSdb) web platform (Jolley and Maiden 2010;Sheppard et al. 2012;
Only 10 out of 662 non-S. aureus genomes were found to carry genes encoding members of the staphylococcal enterotoxin family, including one coagulase-positive staphylococcal (CPS) isolate and nine CoNS. The eight genes forming egc, together with the selx gene, were found in the genome of the CPS isolate, S. argenteus MSHR1132. The sec and sel genes were found in the genomes of nine S. epidermidis strains. The composite SePI was first described in S. epidermidis FRI909 by Madhusoodanan et al. (2011), and then by Gazzola et al. (2013) in strain UC7032, Podkowik et al. (2016) in strain 4S, and Argemi (2018) in strains SE90 and SE95. SePI was also found in whole-genome sequences of four S. epidermidis isolates (803NLR2, 29, 34, and 65.2), the first of which was isolated from a human source (Méric et al. 2015), with the remaining three originating from food and newly sequenced for this study.

Lineage-Specific Acquisition of Enterotoxin Genes in Staphylococcus epidermidis
A neighbor-joining phylogeny of all 564 S. epidermidis isolates based on shared sequence in 2,058 core genes was constructed (fig. 2A). Three discrete clusters were observed in the S. epidermidis population. All nine enterotoxigenic S. epidermidis strains were assigned to a cluster of 65 strains (designated as cluster I), distant from the remaining 477 S. epidermidis strains grouped in cluster II and 22 strains in cluster III (fig. 2A). Cluster I corresponds to group B, cluster II to group A, and cluster III to group C, as proposed by Thomas et al. (2014). Group B includes almost 70% S. epidermidis isolates derived from healthy people and 27% from disease; group A contains 39% isolates from healthy people and 57% from disease; and group C contains 77% isolates from healthy people and 22% from disease (Thomas et al. 2014). In cluster I, S. epidermidis FRI909 was found to be closely related to SE95, while S. epidermidis 4S was most closely related to 803NLR2 and UC7032. Staphylococcus epidermidis 29, 34, and 65.2 form a group of closely related strains distant from the remaining toxigenic S. epidermidis. Staphylococcus epidermidis SE90 was found to be less closely related to the remaining toxigenic strains. All four S. epidermidis strains newly found to carry SePI, as well as the previously identified reference strains, were among the GC4 complex identified previously by Tolo et al. (2016). In this complex, one S. epidermidis culture was isolated from human carriage, two from infection, and three were designated as contaminants (Tolo et al. 2016).

Composite SePIs in Staphylococcus Genomes
We analyzed the gene content of 1545 Staphylococcus spp. isolates to identify 29 open reading frames (ORFs) previously identified in a prototype composite SePI from S. epidermidis FRI909 (Madhusoodanan et al. 2011) (table 3). A complete composite SePI (all 29 ORFs) was identified in five enterotoxigenic S. epidermidis strains (the prototype FRI909, 4S, 29, SE95, and 803NLR2). Several other S. epidermidis isolates were missing only a small number of the composite SePI gene homologs: isolates 34 and 65.2 were missing one ORF; SE90 was missing two ORFs; UC7032 was missing three ORFs. Cluster I encompassed all the enterotoxigenic S. epidermidis isolates, but also included isolates (n = 18) in which only a limited number (five or fewer) of composite SePI gene homologs were identified. No homologous genes from the composite SePI were identified in nonenterotoxigenic S. epidermidis strain SS_0406 in this cluster. In four isolates, only one or two gene homologs were found, all of which were from the SePI region of the island. There were five nonenterotoxigenic S. epidermidis isolates, closely related to the enterotoxigenic strains, that contained a complete SeCI region but only two ORFs from the SePI region of the composite island. Another five S. epidermidis isolates containing the complete SeCI region contained no gene homologs from the SePI part of the composite island. Visualization of the six nonenterotoxigenic S. epidermidis annotated genomes with Mauve (Darling et al. 2004) located the SeCI region in the same position in the genome as the composite SePI in toxigenic strains, between the SsrA gene and a cation transporter protein gene. Isolates carrying only the complete SePI part of the composite pathogenicity island, but not SeCI, were not identified among the analyzed S. epidermidis genomes (fig. 2A). Considerable parts of SeCI were also found to occur in a group of strains within S. epidermidis cluster III. These strains correspond to the GC3 complex as described previously (Tolo et al. 2016).
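The presence/absence patterns described above can be summarized with a small classifier. The following Python sketch is illustrative only and is not the pipeline used in this study: the function name, the category labels, and the input format are hypothetical, and the region boundaries (SePI component as ORFs 1-12, SeCI as ORFs 14-29) are an assumption borrowed from the alternative island structure proposed in the section on organization of composite SePIs.

```python
# Hypothetical region boundaries (assumption): SePI component = ORFs 1-12,
# SeCI = ORFs 14-29; ORF 13 is treated as belonging to neither region.
SEPI_ORFS = frozenset(range(1, 13))
SECI_ORFS = frozenset(range(14, 30))
ALL_ORFS = frozenset(range(1, 30))


def classify_island(present_orfs):
    """Label an isolate from the set of prototype ORF numbers detected in it."""
    present = frozenset(present_orfs) & ALL_ORFS
    if present == ALL_ORFS:
        return "complete composite SePI"
    if not present:
        return "no island ORFs"
    has_full_seci = SECI_ORFS <= present      # every SeCI ORF detected
    sepi_hits = present & SEPI_ORFS           # SePI-region ORFs detected
    if has_full_seci and not sepi_hits:
        return "complete SeCI, no SePI ORFs"
    if has_full_seci:
        return "complete SeCI, partial SePI"
    return "incomplete island"


# Toy input: an isolate with every SeCI ORF plus two SePI-region ORFs.
print(classify_island(list(range(14, 30)) + [2, 5]))
```

Requiring every ORF of a region to call it "complete" is the strictest possible rule; a real screen would likely need an identity or coverage threshold for calling each homolog present.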
We screened all genomes for the 29 S. epidermidis FRI909 prototype SePI ORFs and identified variation in the distribution of the genes among lineages. ORF 1 was present in 38% of S. epidermidis genomes, but completely absent from all other analyzed species (fig. 2). ORF 1 was also not identified in any of the nonenterotoxigenic isolates from S. epidermidis cluster I. ORF 2, encoding a homolog of a S. aureus SaPI integrase, was found in a small number of S. epidermidis genomes (55 isolates; 10%), but also in S. aureus (15 isolates; 2%) and other CoNS (8 isolates; 8%) genomes. Some ORFs (3, 8, and 9) were species-specific, found exclusively in the genomes of enterotoxigenic S. epidermidis. ORF 4 was core to all enterotoxigenic S. epidermidis, but limited to only two other nonenterotoxigenic S. epidermidis isolates, also from cluster I. Both ORFs 5 and 6 occurred in 8% of S. epidermidis genomes and a small number of other CoNS species (2% and 8%, respectively). ORFs 14, 15, 17, 18, and 20-24, frequently occurring in S. epidermidis cluster I, were also identified in 16 genomes belonging to a group of strains within S. epidermidis cluster III, also corresponding to the GC3 complex as described by Tolo et al. (2016).

Allelic Variation within Composite SePI Loci
We identified 51 S. epidermidis isolates (out of 564 genomes) carrying homologs of more than 30% of the composite SePI ORFs and examined allelic variation in these genes. Some ORFs were conserved between isolates. For example, no allelic variation was observed in SePI ORFs 1, 3, 9, and 12. SeCI ORFs 25, 26, and 28 were also conserved and were identified as single allelic variants in both enterotoxigenic and nontoxigenic genomes. Slight variation was observed in ORFs 8 and 10, where two allelic variants were identified. Allelic diversity in the remaining ORFs varied between isolates, and more than ten allelic variants were identified for ORFs 7, 11, 13, 18, and 20 (fig. 3). The patchwork architecture of the composite SePI suggests that genes within the island have different evolutionary histories (fig. 4). Terminase (ORF 7), transposases (ORFs 11 and 13), and hypothetical proteins (ORFs 18 and 20) showed the greatest genetic variability in the island.
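Counting allelic variants per ORF, as summarized above, amounts to counting distinct sequences at each locus. The Python sketch below shows one minimal way to do this; it assumes that an allele is defined by exact nucleotide identity (the usual convention in gene-by-gene typing, but an assumption here), and the function name, input layout, and toy sequences are hypothetical rather than data from this study.

```python
from collections import defaultdict


def count_alleles(sequences_by_isolate):
    """Return {orf_number: number of distinct sequences} across isolates.

    sequences_by_isolate maps isolate name -> {orf_number: nucleotide sequence};
    ORFs absent from an isolate are simply omitted from its inner dict.
    """
    alleles = defaultdict(set)
    for orf_to_seq in sequences_by_isolate.values():
        for orf, seq in orf_to_seq.items():
            alleles[orf].add(seq.upper())  # case-insensitive exact match
    return {orf: len(seqs) for orf, seqs in alleles.items()}


# Toy data (hypothetical): ORF 7 varies between isolates, ORF 9 is conserved.
toy = {
    "isolate_A": {7: "ATGAAA", 9: "ATGCCC"},
    "isolate_B": {7: "ATGAAG", 9: "ATGCCC"},
    "isolate_C": {7: "atgaaa", 9: "ATGCCC"},
}
print(count_alleles(toy))
```

An ORF with a count of one corresponds to a conserved locus in the sense used above, while higher counts flag the more variable loci.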

Organization of Composite SePIs
The gene order of each composite SePI was determined for all nine enterotoxigenic S. epidermidis isolates identified using flanking DR sequences and genome annotation details (fig. 5). Within these sequences, 30-35 ORFs were identified. Overall nucleotide identity between SePI sequences was between 99.43% and 99.99%, and composite SePI gene architecture was also similar. Similarity between the five sec/sel-carrying S. aureus SaPIs (SaPITokyo12571 carrying sec2, SaPIn1 carrying sec3, SaPImw2 carrying sec4, SaPIbov1 carrying the bovine sec variant, and SaPIov1 carrying the ovine sec variant) ranged from 49% to 82%. This indicates that the SePI family is more homogeneous than the SaPI family. Furthermore, the average nucleotide identity of SePIs to SaPIs ranged from 38.02% to 53.04%, and the number of homologous ORFs shared between composite SePIs and SaPIs is higher in the SePI region of the composite island than in the SeCI region (figs. 2 and 5).
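The identity values reported above depend on how identity is computed, which is not specified here. As a hedged illustration only, the following Python sketch computes percent nucleotide identity between two sequences that have already been aligned, ignoring any column that contains a gap; the function name, the gap-handling rule, and the toy sequences are assumptions, and other conventions (for example counting gaps as mismatches) would give different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not island sequences).
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Applying such a function to every pair of islands would yield the kind of identity range quoted above for the SePI and SaPI families.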
The location of DR2 sequences in the composite SePI of S. epidermidis FRI909 suggested that the SePI and SeCI components of the composite island overlap, having ORFs 11 and 12 in common (Madhusoodanan et al. 2011). Our analysis of S. epidermidis strains bearing composite SePI confirmed that the location of DR1 and DR2 motifs corresponds to that of S. epidermidis FRI909. Analysis of our data suggests an alternative structure of composite SePI in which the SePI component, flanked by DR1 motifs, begins at ORF 1 and extends to ORF 12. In turn, SeCI begins at ORF 14 and ends at ORF 29, flanked by DR2 motifs. ORFs 11 and 13 were commonly observed in S. epidermidis isolates outside of genotype cluster I (occurring in 24% and 31% of genomes, respectively). Furthermore, in the majority of genomes no further SeCI homologs were identified. In a number of S. epidermidis cluster I strains, ORF 11 was not present (n = 11). ORF 13 was also absent from a number of S. epidermidis cluster I strains (n = 24), including strains bearing complete SeCI. Allelic variation of ORFs 11 and 13 was greater than the average for SeCI ORFs (13 and 11 alleles, respectively). Gene trees could not be constructed for most loci within the SePI because gene sequences were identical in all isolates. Each of these genes was found in nine S. epidermidis genomes. Potential ancestral motifs of the SePI component of the composite island were present in a large number of S. epidermidis genomes. However, a different pattern of inheritance was observed for SeCI, where potential ancestral motifs were observed in only a limited number of genomes within S. epidermidis cluster III. The absence of ORF 13 in these isolates suggests that the occurrence of ORFs 11 and 13 is not dependent on the presence of SeCI in S. epidermidis genomes.

Discussion
Staphylococcal enterotoxins are an important cause of food poisoning, and S. aureus isolates are the main producers of these agents (Argudín et al. 2010). The possibility of enterotoxin production by CoNS was first considered in the early 1970s (Crass and Bergdoll 1986), with the first evidence of enterotoxin genes in CoNS much later (Madhusoodanan et al. 2011). Through comparison of a large collection of staphylococcal genomes, we identified loci with homology to known staphylococcal enterotoxin and superantigen-encoding genes. These homologous enterotoxin loci were infrequent outside S. aureus. This is consistent with PCR-based studies in which few (or no) staphylococcal superantigen-encoding genes were identified in CoNS of clinical (Madhusoodanan et al. 2011; Stach et al. 2015; Argemi 2018), animal (Naushad et al. 2019), or food origin (Gazzola et al. 2013; Podkowik et al. 2016).
[Figure caption, partial] ... Islands (SaPITokyo12571, SaPIn1, SaPImw2, SaPIbov, SaPIov1) containing different variants of the sec gene. Sequences of SePIs and SaPIs were extracted from whole genomes based on the flanking DR sequences and annotated using Prokka (Seemann 2014). BLAST alignments and visualization were performed using Easyfig 2.2.2 (Sullivan et al. 2011). Color gradient legend indicates identity of BLAST hits.
Our analyses indicate that few S. aureus genomes lack enterotoxin and superantigen-encoding genes and many isolates contain multiple superantigen-encoding genes. Enterotoxin and superantigen-encoding genes are often located on staphylococcal pathogenicity islands in S. aureus (Lindsay et al. 1998; Fitzgerald et al. 2001; Subedi et al. 2007) and these potentially mobile elements comprise a diverse collection of alleles. SaPIs were divided into six subclasses according to their att/int specificity, which could be up to 90% divergent at the nucleotide level, or different at >160 amino-acid residues (Subedi et al. 2007). This high level of genetic variability and the mosaic structure of S. aureus SaPIs suggest they are hot spots of acquisition and/or loss of toxin genes which may be recombination-mediated (Subedi et al. 2007). This is consistent with evidence that S. aureus MGEs can evolve through successive incorporation of DNA elements from non-S. aureus spp. (Copin et al. 2019).
It has been speculated that mobile elements have transferred from CoNS to S. aureus (Otto 2013a). CRISPR elements found in CoNS isolates determine incorporation of foreign DNA into the genome and may limit the acquisition of mobile genetic elements, including enterotoxin genes (Otto 2013b). Consequently, we might expect unidirectional acquisition of mobile elements from CoNS by S. aureus. This evolutionary scenario explains the acquisition of the SCCmec cassette, ACME elements, and sasX genes from CoNS (Wu et al. 1996; Kriegeskorte and Peters 2012). However, CRISPR elements are much less common in S. epidermidis and other staphylococci than previously thought, and there is recent evidence of frequent HGT and exchange of mobile genetic elements within and between staphylococcal species (Méric et al. 2015; Rossi et al. 2017). Staphylococcus aureus may even act as a source of mobile elements for CoNS, including pathogenicity island exchange, as demonstrated by the transduction of S. aureus SaPI to S. xylosus and S. epidermidis (Maiques et al. 2007; Chen and Novick 2009).
Our analysis of the composite SePI in S. epidermidis provides evidence of mobile elements with seemingly different ancestral backgrounds. Specifically, the island consists of regions containing genes that are significantly similar to, and often found in, other CoNS (SeCI), as well as S. aureus (SaPI-like) elements. This has important implications for food safety, as the sec and sel genes encoding emetic toxins were also identified in CoNS. The widespread presence of these SaPI genes in multiple S. aureus lineages suggests that they are mobile and able to cross the staphylococcal species boundary.
Some S. epidermidis isolates contained a complete composite SePI, while others carried allelic variants or were missing genes, with some evidence of mosaicism. Even within a specific S. epidermidis sequence cluster, closely related isolates varied in their enterotoxigenic gene content. Some S. epidermidis isolates in cluster I carried an intact composite SePI, while others lacked any homologous genes. Specific SePI genes were highly variable, while others were entirely conserved in S. epidermidis. The most variable loci were ORF 7, which encodes a terminase homolog, ORFs 11 and 13, which are transposase homologs, and ORFs 18 and 20, which encode unidentified proteins.
There were also differences in the putative ancestry of genes within the SePI and SeCI regions of the composite island. ORFs in the SePI-like region (ORFs 1, 2, 5-7, and 11) are frequently identified in S. epidermidis cluster II. Their high degree of nucleotide homology with genes in the SaPI suggests recent acquisition from S. aureus. Consistent with this, individual gene trees of these ORFs showed little evidence of population structure separating the two species. ORFs including 2, 5-7, 11, and 13 were very common in S. epidermidis and were also dispersed throughout other staphylococcal species. The transposase gene located between sec and sel in composite SePI was common in S. epidermidis and other CoNS species as well as in S. aureus. Diversity of SaPIs bearing the sec gene, as well as diversity of the sec sequences within S. aureus, was higher than in S. epidermidis. SEC3 and SECepi differ at 43 nucleotides, which translates to 12 amino acids. Analysis of amino acid identity of S. aureus SEC variants indicated a maximum of 5.2% difference within this subfamily and 4.5-9.7% between S. aureus and S. epidermidis SECs. The enhanced variability of SEC in S. aureus may imply that it was acquired by S. aureus earlier than by S. epidermidis, but the role of selection in influencing sequence variation is unclear. Interestingly, the sel gene is less divergent both within S. aureus SaPIs and between S. epidermidis and S. aureus, possibly indicating recent transfer or conservation of SEL compared with SEC. Our analyses do not address the direction of movement between and within staphylococcal species. It is possible that one or more other species may be involved in the movement of genes between S. aureus and S. epidermidis. Continuous growth of sequenced genome collections may help to identify the elements involved in movement of enterotoxin genes within staphylococci.
The systematic differences between SePI and SeCI regions imply a lineage-specific pattern of inheritance. However, the evolutionary scenario leading to distinct patterns of inheritance within the pathogenicity islands remains unknown. Analysis of a large number of staphylococcal genomes supported a scenario of independent acquisition of the two elements in S. epidermidis. The SeCI component was present in genomes of nontoxigenic S. epidermidis strains very closely related to the strains bearing the complete composite SePI. Within this group, SeCI has undergone further evolutionary change, since both complete and incomplete SeCI were identified within nontoxigenic strains. Moreover, within a cluster of strains closely related to toxigenic S. epidermidis, we identified strains with neither SeCI nor SePI composite island components. Furthermore, large motifs that could constitute part of SeCI were also identified in S. epidermidis (cluster III) that were more distantly related to enterotoxigenic strains. Our data suggest some degree of mosaic ancestry and interspecies gene flow, but genetic diversity between S. epidermidis and S. aureus is high and much of the inferred mosaicism is within species.
Our study suggests that complete enterotoxin pathogenicity islands are uncommon in staphylococcal species other than S. aureus, being identified only in S. epidermidis and S. argenteus in this study. Furthermore, they were restricted to certain S. epidermidis lineages and had a reduced repertoire of enterotoxin-encoding genes compared with S. aureus. We show that numerous nonenterotoxigenic S. epidermidis strains harbor incomplete composite SePIs and that these elements represent a scaffold potentially associated with the acquisition or loss of pathogenic elements, such as the sec and sel genes, through interspecies genetic exchange between S. aureus and S. epidermidis. Some S. epidermidis strains contained variable SePI alleles, consistent with dynamic evolution at these loci involving possible recombination events. Since populations of S. aureus and S. epidermidis can occupy overlapping ecological niches, gene flow may occur between these two species. This has potentially promoted the emergence of toxigenic S. epidermidis clones, and highlights the importance of ongoing surveillance to detect new recombinant lineages.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.