Bacteriophage-host interactions in Streptococcus thermophilus and their impact on co-evolutionary processes

Abstract
Bacteriophages (or phages) represent a persistent threat to the success and reliability of food fermentation processes. Recent reports of phages that infect Streptococcus thermophilus have highlighted the diversification of phages of this species. Phages of S. thermophilus typically exhibit a narrow host range, a feature that is suggestive of diverse receptor moieties being presented on the cell surface of the host. Cell wall polysaccharides, including rhamnose-glucose polysaccharides and exopolysaccharides, have been implicated in the initial interactions with several phages of this species. Following internalization of the phage genome, the host presents several defences, including CRISPR-Cas and restriction and modification systems, to limit phage proliferation. This review provides a current and holistic view of the interactions of phages and their S. thermophilus host cells and how these interactions have influenced the diversity and evolution of both entities.


Introduction
Streptococcus thermophilus is a lactic acid bacterium that has long been associated with the production of fermented dairy products, including yoghurt and Italian- and Swiss-style cheeses. A total of 180 species of streptococci are currently defined (https://lpsn.dsmz.de/genus/streptococcus, November 2022), and among these S. thermophilus remains the only known non-pathogenic member of this genus, being a member of the salivarius group of the viridans streptococci. The origins of this species are somewhat controversial, but it seems likely that it has emerged through a combination of genome decay of other viridans streptococcal species and horizontal gene transfer events (Delorme 2008). While strains of the species are inextricably linked to fermented dairy foods, recent studies have demonstrated their ecological association with plant material (Umamaheswari et al. 2014). This is consistent with a traditional Bulgarian approach to yoghurt production in which the branch of a native plant was used to inoculate boiled sheep's milk (Michaylova et al. 2007).
The identification of S. thermophilus and Lactobacillus bulgaricus in 1905 by the Bulgarian physician and microbiologist, Stamen Grigorov, incited scrutiny into the functions, activities, and metabolites of so-called 'beneficial microbes' and formed the foundations for the field of probiotic research (Lilly and Stillwell 1965). Streptococcus thermophilus is associated with the alleviation of lactose intolerance, the only probiotic claim that is currently recognized and accepted by regulatory bodies [Regulation (EC) No. 1924]. Furthermore, its long history of safe application in food fermentations and human consumption is among the major factors underpinning its Generally Recognized as Safe status.
The complete genomes of 83 S. thermophilus strains are currently available in the NCBI database (search date: November 1, 2022). The genomes of these strains are typically between 1.73 and 2.10 Mb (Table S1) and have an average G + C content of 39%. Recently, it has been proposed that there are two major clusters of S. thermophilus genomes (A and B) that may be differentiated based on gene gain or loss (Alexandraki et al. 2019). These two clusters can primarily be distinguished based on size, with Cluster A strains possessing genomes > 1.83 Mb and Cluster B strains possessing genomes below this threshold value. Of the 83 genomes currently available in public databases, 55 would be classified as Cluster A and 28 as Cluster B genotypes based on the genome size criterion (Table S1). This is in keeping with the findings of the comparison of 23 S. thermophilus genomes, in which ∼30% of strains were identified to have genomes of < 1.83 Mb (Alexandraki et al. 2019). Among these 23 analysed genomes, the percentage of pseudogenes was in the range of ∼9% to 14%.
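The genome size criterion described above can be expressed as a simple rule. The following is a minimal sketch, assuming that the 1.83 Mb threshold is the only criterion applied and that a genome of exactly 1.83 Mb (a case the text does not address) is assigned to Cluster B; the strain names and sizes are hypothetical and are not taken from Table S1.

```python
# Threshold from the text: Cluster A genomes are > 1.83 Mb, Cluster B genomes are smaller.
CLUSTER_THRESHOLD_MB = 1.83


def classify_genome(size_mb: float) -> str:
    """Return 'A' for genomes larger than the threshold, otherwise 'B'.

    Assumption: a genome of exactly 1.83 Mb is placed in Cluster B.
    """
    return "A" if size_mb > CLUSTER_THRESHOLD_MB else "B"


def tally_clusters(sizes_mb: dict) -> dict:
    """Count how many genomes fall into each size-based cluster."""
    counts = {"A": 0, "B": 0}
    for size in sizes_mb.values():
        counts[classify_genome(size)] += 1
    return counts


# Hypothetical strains and genome sizes (Mb), for illustration only.
example_sizes = {
    "strain_1": 1.95,
    "strain_2": 1.79,
    "strain_3": 2.05,
    "strain_4": 1.73,
}

print(tally_clusters(example_sizes))
```

Applied to the genome sizes listed in Table S1, a tally of this kind should correspond to the 55 Cluster A and 28 Cluster B genomes reported above, although size alone is only a proxy for the gene gain and loss that define the clusters.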
While it is suggested that the pan-genome of this species may soon be closed, there are genomic regions of variability that provide elasticity to strains of this species and constant evolution in response to the pressures that are present in their natural or industrial environs. Among these regions of genetic variability are those relating to the biosynthesis of exopolysaccharides (EPS), rhamnose-glucose polysaccharides (RGP), and the CRISPR-Cas loci. In this review, we discuss recent advances in defining the role of cell wall polysaccharides in host recognition by phages and/or phage exclusion. Furthermore, we provide detailed insights into the core functions associated with the biosynthesis of EPS and RGP and explore how the diversification of the gene clusters that encode these structures may influence the evolutionary pathway of phages infecting S. thermophilus.

EPS produced by S. thermophilus
The global starter culture market is currently valued at ∼$1.1 billion USD and is expected to increase to $1.5 billion USD by 2027 according to market research sources (BusinessWire report, October 2022). Among bacterial starter cultures, S. thermophilus is the second most widely exploited species applied in food fermentations (after Lactococcus lactis/cremoris); however, recent growth in demand for plant-based alternatives to dairy fermented products may reverse this order in the coming decades (Harper et al. 2022). To illustrate this, S. thermophilus is currently applied for the production of a range of yoghurt-like products based on plant-based substrates, including oat, soy, almond, and cashew, among others (Montemurro et al. 2021). The primary function underpinning its widespread application in these products is the ability of many strains of S. thermophilus to produce exopolysaccharide (EPS), which contributes to the texture and mouthfeel of the product. The yield of EPS from S. thermophilus strains ranges from 20 to 600 mg/L (Vaningelgem et al. 2004), values that can be influenced by culturing conditions, carbon source, and co-cultivation with (non-EPS-producing) strain(s) of the same or other bacterial or yeast species (Zisu and Shah 2003, Sørensen et al. 2022). Interestingly, the composition or combination of EPS types is a major contributor to the rheology, texture, and microstructural properties of yoghurt, rather than simply the amount of EPS that is produced (Folkenberg et al. 2006).
In addition to their technological properties, the EPSs of certain S. thermophilus strains have been demonstrated to facilitate bacterial adhesion to gastric mucosa and reduce the adhesion capacity of Helicobacter pylori, while they have also been reported to diminish the expression of pro-inflammatory markers (Marcial et al. 2017). Furthermore, the EPS of S. thermophilus ST538 was shown to enhance expression of interferon β, interleukin 6, and C-X-C motif chemokine 10 in response to activation of toll-like receptor 3 in porcine intestinal epitheliocytes, suggesting that the EPS of this strain contributes to defence against viral infections (Mizuno et al. 2020). The activities and applications of EPS produced by S. thermophilus are of continued interest to the ever-expanding probiotics market [for an extensive review on this topic, see (Sørensen et al. 2022)].
The EPS structures produced by S. thermophilus strains are typically heteropolysaccharides containing glucose, galactose, rhamnose, and on occasion N-acetylglucosamine, galactosamine, and fucose (Bubb et al. 1997; Low et al. 1998; Ricciardi et al. 2002; Szymczak et al. 2018; McDonnell et al. 2020; Jurášková et al. 2022). The gene clusters that encode EPS biosynthetic functions vary extensively in size and genetic composition (Bourgoin et al. 1999; Parlindungan et al. 2022). Despite the sequence diversity of these gene clusters, there are core functions that are conserved among their associated gene products, for example, EPS chain length regulation, sugar transfer, subunit polymerization, and membrane translocation (Fig. 1) (Parlindungan et al. 2022). Functional analysis of core genes associated with these loci has provided insights into the mechanisms by which these structures are constructed and regulated. EPS production varies depending on the sugar substrate that is present in the medium, and epsC and epsD have been shown to regulate the chain length of the final EPS structure in ASCC 1275 (Padmanabhan et al. 2020). The gene products of epsCD, which are proposed to form a membrane-located complex, also bear topological and sequence similarities to ABC transport systems, suggesting that they play a dual role in EPS transport and chain length control (Stingele et al. 1996). It is assumed that the transcription level of epsC (and likely epsD) influences chain length, while the availability of nucleotide sugars may also modulate chain length (Wang et al. 2020). Polymerization of the EPS repeating subunits is suggested to be a function of EpsJ, possibly through complexation with EpsC and EpsD (Stingele et al. 1996).
Recent analyses of eps loci of dairy streptococcal isolates identified 10 distinct genotypes (named A-J), and the glycosyltransferase-associated gene content largely accounts for their differentiation (Szymczak et al. 2019, Romero et al. 2020, Parlindungan et al. 2022). Furthermore, the presence of transposase-encoding genes in many of these clusters may be associated with their diversification/mobilization within or between strains as well as their (lack of) functionality in certain strains (Parlindungan et al. 2022).
The diversity of the eps loci of dairy streptococcal strains is likely also linked to their compositional and structural diversity, with possible downstream impacts on their interactions with other microorganisms. However, while there is an ever-increasing number of genome sequences available as well as several studies detailing the EPS composition and/or structure, the links between eps genotype and the composition and structure of the resulting EPS remain poorly explored and represent a critical question that could be transformative to functional food development and human health alike. Furthermore, since certain dairy streptococcal phages are known to recognize and bind to EPS, it is essential to gain improved insights into the structural diversity of the EPS produced by this species to identify their specific saccharidic receptors. This will improve risk evaluations by starter culture providers and ultimately facilitate increased consistency in dairy fermentations.

RGP produced by S. thermophilus
In addition to EPS, all S. thermophilus strains produce a cell wall polysaccharide, termed the RGP, that is closely associated with the cell surface. The RGP of streptococci is implicated in interactions with other organisms, cell morphology, and cell division processes (De et al. 2017, Bischer et al. 2020, Lavelle et al. 2022). The designation of these polysaccharides as RGPs is an artefact of the nomenclature of similar polysaccharides in pathogenic streptococci, in which these structures were primarily observed to contain rhamnose and glucose (Mistou et al. 2016).
Figure 1. Schematic depicting the general architecture of eps gene clusters in S. thermophilus and highlighting some of the major functional units within these clusters, including regulation (purple arrows); transport and/or chain length determination (indigo arrows); repeating subunit synthesis, including glycosyltransferases (green arrows); polymerization (yellow arrow); and modification functions (orange arrows). These gene clusters range in size from ∼15-30 kb. Note: * the nomenclature of the repeating subunit biosynthesis and modification genes may vary depending on the number of genes in an individual cluster, but they are typically named sequentially.
The RGPs of dairy streptococci have been found to contain N-acetylglucosamine, N-acetylgalactosamine, or galactose in addition to rhamnose and/or glucose, with strain-to-strain variation in the monosaccharide composition, the number of monosaccharides in the repeating subunit, and the extent of branching of these structures (Szymczak et al. 2018, Szymczak et al. 2019, Romero et al. 2020, Lavelle et al. 2022, Lavelle et al. 2022, Parlindungan et al. 2022). The 20-30 kb gene cluster that encodes the biosynthetic machinery for the RGP structure, termed the rgp locus, is responsible for the synthesis of two connected saccharidic components. The first of these constitutes the backbone structure, which is believed to be covalently linked to and embedded within the peptidoglycan layer, while the second represents a side-chain structure which is covalently linked to the RGP backbone, is exposed at the cell surface, and has been recognized as the receptor for certain dairy streptococcal phages (Lavelle et al. 2022). Based on hierarchical clustering analysis of the protein complement encoded by 78 S. thermophilus rgp loci, seven rgp genotypes were identified (rgp 1-7) and the corresponding RGP structure of strains representing five of these genotypes has been determined (Fig. 2) (Lavelle et al. 2022). The leftward end of the rgp locus is associated with the synthesis of the variable side-chain structure, while the rightward end of the cluster is associated with the synthesis of the backbone structure. While seven rgp genotypes (Rgp groups 1-7) have been discerned based on the overall gene content, these two distinct regions of the rgp gene cluster may be present in various combinations in different strains, with three distinct backbone genotypes (Bt) and five variable side-chain (Vt) genotypes discerned to date based on detailed analyses of these clusters (Lavelle et al. 2022). A two-step multiplex PCR has been established to facilitate the rapid classification of dairy streptococcal strains based on their Bt and Vt genotypes, which can be applied for the identification of strains with novel Bt and Vt combinations. It is proposed that the backbone part of the structure is embedded within the peptidoglycan layer while the variable side-chain is surface-exposed (Lavelle et al. 2022). Therefore, it is most likely that the surface-exposed side-chain is associated with phage binding. However, since only a small number of phage-host combinations have been studied in detail in this host species, it is important to evaluate the full extent of diversity of the RGP structures, and the multiplex PCR is a useful tool to rapidly predict and discern such diversity.
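To illustrate the combinatorial space implied by three backbone and five side-chain genotypes, the following sketch enumerates all theoretical Bt/Vt pairings and flags a typed strain as novel when its pairing has not been recorded before. It is an illustration only: the sequential labels Bt1-Bt3 and Vt1-Vt5 are assumed, the set of previously observed combinations is hypothetical, and the code does not model the multiplex PCR itself.

```python
from itertools import product

# Assumed sequential labels for the three backbone and five side-chain genotypes.
BACKBONE_TYPES = [f"Bt{i}" for i in range(1, 4)]
SIDE_CHAIN_TYPES = [f"Vt{i}" for i in range(1, 6)]

# Every theoretically possible backbone/side-chain pairing (3 x 5).
ALL_COMBINATIONS = set(product(BACKBONE_TYPES, SIDE_CHAIN_TYPES))


def is_novel(bt: str, vt: str, observed: set) -> bool:
    """Return True if a typed strain carries a pairing not recorded so far."""
    if (bt, vt) not in ALL_COMBINATIONS:
        raise ValueError(f"Unknown genotype pairing: {bt}/{vt}")
    return (bt, vt) not in observed


# Hypothetical record of pairings already seen in a strain collection.
observed_pairs = {("Bt1", "Vt1"), ("Bt1", "Vt4"), ("Bt2", "Vt3")}

print(len(ALL_COMBINATIONS))                    # size of the theoretical space
print(is_novel("Bt3", "Vt5", observed_pairs))   # pairing not in the hypothetical record
print(is_novel("Bt1", "Vt4", observed_pairs))   # pairing already recorded
```

Whether all 15 theoretical pairings are biologically viable is not addressed in the text; the enumeration simply bounds the number of combinations that PCR-based typing could distinguish.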
Knowledge regarding the diversity of these rgp clusters and their associated RGP structures may serve as a basis for the prediction of strains with distinct cell wall polysaccharide compositions, while it will also facilitate rational starter strain uses. Such rational starter strain applications, whether they be through selected (phage-insensitive) strain blends and/or strain rotation regimes, are all intended to reduce the risk of bacteriophage proliferation in food fermentations.
While there is considerable and continually emerging sequence data for this species, we still lack insights into the combinations of rgp and eps loci in strains of this species. The combination of these loci and their encoded structures may help in the selection of strains for starter culture regimes and mixed cultures to ensure the stability of fermentations and to reduce the risk of phage predation and proliferation in a given fermentation facility. Insights such as these will expand the potential for functional and technological developments in sustainable food production systems.

Bacteriophages of S. thermophilus
One of the most significant and persistent challenges to food fermentation is infection of starter cultures by bacteriophages (or phages). Phage infection of strains within a starter culture may impair growth and milk acidification rates and lead to product inconsistency, downgrading and, in severe cases, complete loss of product. Phages have been shown to be highly persistent in food production facilities over extended periods of time, and this is likely to be exacerbated by the repeated and intensive application of specific starter strains or starter culture blends and the aerosolization of phage particles (Rousseau and Moineau 2009; Verreault et al. 2011). Furthermore, phages infecting strains of S. thermophilus are widely reported to display a high tolerance to pasteurization and other thermal treatments, while chemical sanitizers applied in dairy processing plants may reduce the phage load, albeit in a phage- and sanitizer-dependent manner [for an extensive review on this subject, see (Marcó et al. 2019)]. However, research reports on these treatments are few and far between, and often lack a universal testing approach, e.g. testing in milk, whey, or rich medium backgrounds.
Streptococcus thermophilus-infecting phages have recently been classified into five genetically distinct groups (Philippe et al. 2020; Hanemaaijer et al. 2021). Members of all five groups possess long non-contractile tails and isometric capsids. The groups are termed the Moineauvirus (formerly termed the cos group), Brussowvirus (formerly termed the pac group), Vansinderenvirus (formerly termed the 5093 group), and 987 and P738 genera of the Aliceevansviridae family (formerly part of the Siphoviridae family). The Moineau- and Brussowviruses are the most prevalent in the industrial dairy fermentation context and, consequently, are the most intensely studied of the dairy streptococcal phages with respect to genome sequence analysis, genetic diversity, and interactions with their host (Romero et al. 2020). While all dairy streptococcal phages appear to have evolved from temperate ancestors, only (certain) members of the Brussowvirus genus are truly temperate phages (Neve et al. 2003; Arioli et al. 2018). Members of all other genera of dairy streptococcal phages are virulent. The genomes of dairy streptococcal phages are typically between 30 and 40 kb and exhibit a modular architecture with discrete modules containing genes encoding replication, morphogenesis, and lysis functions (and lysogeny-related functions in certain Brussowvirus phages) (Hanemaaijer et al. 2021). The genomes of these phages are highly plastic, with significant evidence of recombination within, between, and beyond (dairy) streptococcal phage groups (Hanemaaijer et al. 2021).
Several phages capable of infecting S. thermophilus recognize and bind to either RGP or EPS components on the cell surface of the cognate host (Table 1) (Szymczak et al. 2018; McDonnell et al. 2020; Lavelle et al. 2022). Members of the Brussowvirus genus have been shown to bind to (part of) the RGP structure (Szymczak et al. 2018; Lavelle et al. 2022), while members of the 987 genus recognize and bind to EPS components (Szymczak et al. 2018; McDonnell et al. 2020). The receptors for these phages have been identified through genome sequence analysis of phage-resistant derivatives of the host strains and, in some cases, through complementation of the observed mutations in trans (Table 1). Furthermore, the host range of many phages is well established, and since the genomes of many of the available strains are sequenced, it is possible to link the phage-encoded receptor binding protein (RBP) sequence phylogeny to the host eps/rgp genotype based on hierarchical clustering data (Szymczak et al. 2019). Using such a comparative genomics approach to link the phylogeny of phage-encoded RBPs to host-encoded rgp and eps genotypes, it is proposed that Moineauvirus phage RBP phylogeny correlates with host eps genotypes, thus implicating EPS as the receptor of these phages, although this has not been experimentally validated (Szymczak et al. 2019). While there are emerging data regarding the target moiety of these phages, there remains a significant knowledge gap with respect to the specific EPS or RGP (oligo)saccharides that are recognized and bound by these phages. Furthermore, a model for the biosynthesis of dairy streptococcal RGPs has been proposed (Lavelle et al. 2022), and experimental confirmation of this process will be transformative in predicting the biological functions and chemical structures of uncharacterized and newly emerging strains.
Figure note: The Bt1 polymer may or may not carry a glucose modification; the Vt4 side-chain may be a tri- or tetra-saccharide; and the V3 side-chain of the Rgp4 structure may be attached to the polyrhamnose core at differing linkage points.
Table 1 (fragment): (987) EPS (Szymczak et al. 2018)
The phage infection process commences with the binding of the phage to a cognate receptor on the cell surface and is mediated by the phage-encoded adhesion device, a multi-protein complex located at the distal end of the phage tail. A seminal study of dairy streptococcal phage-host interactions using the model phage DT1 20 years ago suggested that multiple phage proteins were involved in DT1-host interactions (Duplessis and Moineau 2001). This has been validated by recent analysis of a range of dairy streptococcal phages, in which multiple phage adhesion device proteins were shown to incorporate carbohydrate binding domains (CBDs) (Lavelle et al. 2020; Goulet et al. 2022). Adhesion devices typically incorporate (a portion of) the tail tape measure protein (TMP), distal tail protein (Dit), tail-associated lysin (Tal), RBP, and accessory proteins in some cases. Among these proteins, the Dit and Tal of several dairy streptococcal phages have been shown to incorporate extensions containing various CBDs. While, in principle, the RBP alone is sufficient to initiate interactions with the host, it is believed that the additional CBDs enhance the ability of the phage to gain proximity to its cognate host and to facilitate directed and specific contact between the RBP and the associated host-encoded receptor (Goulet et al. 2022). The interactions between dairy streptococcal phages and their hosts are highly specific, i.e. dairy streptococcal phages typically exhibit narrow host ranges, which is likely due to the diversity of the RGP and EPS structures that are presented on the cell surface. However, while binding to these structures represents the first step in the phage infection process, it is important to consider that the host presents a series of barriers to phage entry and proliferation that undoubtedly contribute additionally to the narrow host range of these phages.
Among the most significant developments in the field of phage-host interactions is the development of bioinformatics tools that provide detailed structure-function insights, including HHPred (Söding et al. 2005) and AlphaFold (Jumper et al. 2021). Structure predictions and domain searches using such resources have generated significant insights into the types of receptors that phages may recognise, as well as the possible conformations of the structures themselves, and the importance of such tools in this area is difficult to overstate. For example, in S. thermophilus phages, HHPred and AlphaFold2.0 identified the presence, type, and location of CBDs within adhesion device proteins, supporting the contention that protein-saccharidic interactions are employed by these phages (Lavelle et al. 2020, Goulet et al. 2022). Given that the alternative approach, i.e. X-ray crystallography, can be challenging and time-consuming, the application of bioinformatic tools such as these is expected to be transformative to the field.

Restriction-modification and CRISPR-Cas systems: the primary defences
Dairy streptococci are heavily reliant on two phage defence systems, i.e. clustered regularly interspaced short palindromic repeat (CRISPR) systems [and the CRISPR-associated (cas) genes] and restriction and modification (R/M) systems. The first description of CRISPR-Cas systems of S. thermophilus was reported in 2005 (Bolotin et al. 2005). Following their identification in this species, a significant number of studies have reported on the diversity, functionality, and prevalence of dairy streptococcal CRISPR-Cas systems (Barrangou et al. 2007, Horvath et al. 2008). There are two classes of CRISPR-Cas systems that differ in having either a multi-Cas protein complex or a single multi-domain protein that binds to CRISPR RNA (Class 1 and 2, respectively) (Makarova et al. 2020). Class 1 systems are further divided into three types (I, III, and IV), for which there are nine, six, and three variants described, respectively (Makarova et al. 2020). Class 2 systems are divided into three types (II, V, and VI) with four, seventeen, and five variants of each type, respectively. Genomes of S. thermophilus strains have been reported to harbour up to four CRISPR-Cas loci (named CRISPR1-4), and among these, CRISPR1 and 3 are described as Type II-A systems and are the most prevalent and active systems in this species (Hao et al. 2018). CRISPR2 is classified as a Type III-A system and appears to be degenerate, being non-functional in acquiring spacers in all strains studied to date. CRISPR4 is a member of the Type I-E systems and is found in a limited number of S. thermophilus genomes (Hao et al. 2018). Interestingly, it has been demonstrated that CRISPR-Cas systems are compatible with and complementary to R/M systems, and their combined activity increases phage resistance (Dupuis et al. 2013).
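The classification described above lends itself to a small lookup structure. The sketch below encodes only what is stated in the text (the class to which each type belongs and the subtype assigned to each of the four S. thermophilus loci); the function names and the example strain content are hypothetical.

```python
# Class membership of each CRISPR-Cas type, as stated in the text.
CLASS_BY_TYPE = {"I": 1, "III": 1, "IV": 1, "II": 2, "V": 2, "VI": 2}

# Subtype assigned to each S. thermophilus locus, as stated in the text.
LOCUS_SUBTYPE = {
    "CRISPR1": "II-A",
    "CRISPR2": "III-A",
    "CRISPR3": "II-A",
    "CRISPR4": "I-E",
}


def crispr_class(subtype: str) -> int:
    """Derive the class (1 or 2) from a subtype label such as 'II-A'."""
    system_type = subtype.split("-")[0]
    return CLASS_BY_TYPE[system_type]


def summarize_loci(loci_present: list) -> dict:
    """Map each locus found in a genome to its (subtype, class) pair."""
    return {
        locus: (LOCUS_SUBTYPE[locus], crispr_class(LOCUS_SUBTYPE[locus]))
        for locus in loci_present
    }


# Hypothetical strain carrying three of the four loci.
print(summarize_loci(["CRISPR1", "CRISPR2", "CRISPR3"]))
```

A summary of this kind makes explicit that the two most active loci (CRISPR1 and CRISPR3) are both Class 2 systems, whereas CRISPR2 and CRISPR4 belong to Class 1.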
R/M systems are classified into four types (I-IV) depending on their subunit composition and architecture (Roberts et al. 2015). A recent study analysing the genomes of 23 S. thermophilus strains observed that type I systems were prevalent in the majority of strains, with most harbouring at least one such system (Alexandraki et al. 2019). Several strains were also found to possess a type II system, and four of the 23 genomes harboured three such systems each. Eight of the 23 strains possess a type III R/M system, while almost half of the strains possess a type IV system (Alexandraki et al. 2019). Research pertaining to the activity of dairy streptococcal R/M systems is limited, although certain type II R/M systems have been demonstrated to be functional in this species (Guimont et al. 1993; Burrus et al. 2001; Dupuis et al. 2013).
Dairy streptococci do not typically harbour many (if any) plasmids. Since CRISPR-Cas systems are associated with the restriction of foreign DNA, including phage and/or plasmid DNA, the low abundance of plasmids in this species is likely attributable to the omnipresence of active CRISPR-Cas systems. A database (NCBI) search of complete genome sequences of 83 S. thermophilus strains suggests that 11 harbour at least one plasmid, and among these, three strains harbour two plasmids (Table S1). Lactococcal strains, on the other hand, typically harbour several plasmids and these are a rich source of diverse anti-phage defence systems, while the limited prevalence of plasmids in dairy streptococci would suggest that this species is not highly dependent on mobile elements such as plasmids for phage defences. The identified dairy streptococcal plasmids range in size from 3.3-14.1 kb, with the majority being ∼3.3-4.5 kb. The available annotations of these plasmid-related sequences do not provide significant functional information; however, several contain orphan type I R/M restriction, methyltransferase, or specificity subunits that may complement or expand the action of chromosomally encoded systems. Phages may respond to the R/M and CRISPR-Cas systems of dairy streptococci by acquiring methyltransferase- and/or anti-CRISPR (Acr)-encoding genes to circumvent the major obstacles presented by the host. To ascertain the prevalence and relatedness of streptococcal phage-encoded methyltransferases, the NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) tool was used to interrogate both unclassified (but which includes members of the P738, 987, and Vansinderenvirus species) and classified Siphoviridae, Brussowvirus, and Moineauvirus members infecting S. thermophilus. A total of 55 proteins were returned using all available search criteria containing the terms 'methyltransferase' or 'methylase'.
These methyltransferases appear to belong to six distinct groups based on the phylogenetic tree output of the above-mentioned search. Methyltransferase-encoding genes were identified among members of the Moineauvirus, Vansinderenvirus, and 987 groups, but seemingly not among members of the Brussowvirus species. Moreover, certain phages harbour more than one methyltransferase, which is suggestive of regular exposure to (distinct) R/M systems (Table 2).
Two Acr systems have been described in S. thermophilus phages, i.e. AcrIIA5 and AcrIIA6 (Hynes et al. 2018), and among these AcrIIA6 systems are reportedly present in 33% of evaluated virulent dairy streptococcal phage genomes. To ascertain the current prevalence and relatedness of streptococcal phage-encoded anti-CRISPR proteins, the NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) tool was used to interrogate both unclassified and classified Siphoviridae, Brussowvirus, and Moineauvirus phages infecting S. thermophilus. A total of 24 proteins were returned using all available search criteria containing the search string 'acr'. Three and eight phages were identified to harbour AcrIIA5 and AcrIIA6 systems, respectively, while a further 13 phages harbour so-called 'Acr-like' proteins based on the search terms used in this analysis. These systems are predominantly associated with Moineauvirus (16 phages) and Brussowvirus (four phages), while they also occur at a lower frequency among Vansinderenvirus and 987 phages (Table 2). Although the true prevalence may be greater (the estimate being limited by the quality of annotations), the presence of both methyltransferase and Acr systems highlights the common presence of such counter-defences among streptococcal phages and their historic engagement with such defence systems. It is also noteworthy that certain phages possess both Acr and methyltransferases (Table 2). For example, the Brussowvirus TP-J34 is predicted to possess AcrIIA6 and an N-6 DNA methylase; the Vansinderenvirus SW19 is predicted to encode a SAM-dependent DNA methyltransferase and an 'Acr-like' protein; and the 987 phage SW16 is predicted to encode a site-specific DNA methyltransferase and an 'Acr-like' protein.
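The keyword-based searches described above depend entirely on annotation text, which is why the reported prevalence is a lower bound. The sketch below illustrates this kind of filtering on a local list of annotation strings; it does not query NCBI Virus, the example annotations are hypothetical, and the matching rules (simple case-insensitive substring tests) are an assumption rather than the search logic of the tool.

```python
from collections import Counter

METHYLTRANSFERASE_TERMS = ("methyltransferase", "methylase")


def is_methyltransferase(annotation: str) -> bool:
    """True if the annotation mentions either methyltransferase search term."""
    text = annotation.lower()
    return any(term in text for term in METHYLTRANSFERASE_TERMS)


def classify_acr(annotation: str):
    """Assign an annotation to AcrIIA5, AcrIIA6, 'Acr-like', or None.

    Assumption: a bare 'acr' substring match is treated as 'Acr-like'; such a
    match can pick up unrelated words, so hits would need manual inspection.
    """
    text = annotation.lower()
    if "acriia5" in text:
        return "AcrIIA5"
    if "acriia6" in text:
        return "AcrIIA6"
    if "acr" in text:
        return "Acr-like"
    return None


# Hypothetical annotation strings, for illustration only.
annotations = [
    "AcrIIA6 family anti-CRISPR protein",
    "N-6 DNA methylase",
    "site-specific DNA methyltransferase",
    "Acr-like protein",
    "tail tape measure protein",
]

acr_counts = Counter(
    label for label in map(classify_acr, annotations) if label is not None
)
mtase_hits = [a for a in annotations if is_methyltransferase(a)]
print(acr_counts, len(mtase_hits))

# Consistency check on the counts reported in the text: 3 + 8 + 13 = 24 proteins.
reported = {"AcrIIA5": 3, "AcrIIA6": 8, "Acr-like": 13}
assert sum(reported.values()) == 24
```

Because proteins lacking any of these terms in their annotation are missed entirely, sequence- or structure-based searches would be needed to estimate the true prevalence.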

Contribution of prophages to the S. thermophilus resistome
Prophages are known to prevent secondary infection by phages possessing a homologous repressor (Davies et al. 2016). It has long been suggested that the incidence of lysogeny among dairy streptococci is low. To corroborate this notion, the genomes of 83 strains of S. thermophilus that are deposited in the NCBI database and are defined as complete (as of November 1, 2022) were analysed for the presence of prophages using PHAST (Zhou et al. 2011). This prediction provides suggestions of possible 'intact', 'questionable', and 'incomplete' prophage-encompassing DNA regions. All evaluated genomes were predicted to possess at least one incomplete prophage region; however, manual inspection of the (highly) conserved ∼10 kb region, based on BLASTn and Pfam analysis, indicated that it was unlikely to represent a phage. Similarly, 18 genomes were suggested to harbour questionable prophage-encoding regions. While some of these contained genes that may be of phage origin, they appeared to represent orphan genes rather than cryptic or satellite prophage regions. Finally, the genomes of eight strains were predicted to harbour one intact prophage (thus representing ∼9.5% of evaluated genomes) and among these (based on manual inspection and Pfam and BLASTn analysis), three appeared to be genuinely intact prophage regions with genome sizes of 47.8 and 48.7 kb (two strains), respectively (Table S1). However, although the remaining five predicted intact prophage genomes may not be complete, the presence of repressor-encoding genes may be associated with immunity against phages possessing homologous repressor-encoding genes (Johnson et al. 1981).
Beyond repressor-mediated immunity, certain prophages have been implicated in superinfection exclusion of heterologous phages via the small prophage-encoded lipoprotein Ltp (Sun et al. 2006; Ali et al. 2014). Interestingly, when ltp TP-J34 was expressed in Lactococcus, it was observed to provide protection against the lactococcal Skunavirus P008 (Sun et al. 2006). Of note, phage escape mutants capable of bypassing Ltp possess deletions in the TMP-encoding gene (Bebeacua et al. 2013). In agreement with this finding, proteome analysis of purified phage particles of the mutant phages identified that the TMP was significantly smaller (66 kDa) relative to that of the parent phage (75 kDa) (Bebeacua et al. 2013).
BLASTn searches for homologues of ltp TP-J34 identified eight such genes with >95% sequence identity over the full length of the gene (January 2023). These ltp homologues are present in the prophages of strains SK778 (TP-778L), DSM 20617, ATCC 19258, NCTC 12958, and NWC_2_1, as well as in the genomes of phages VS-2018a and SW18. Recently, a non-inducible prophage of S. thermophilus M17PTZA496 was also demonstrated to contribute to the host strain's resistome through the presence of a cI-like repressor and ltp (da Silva Duarte et al. 2018). These findings support the hypothesis that cryptic prophages and orphan phage-derived genes may contribute to the fitness of the host.
While prophage carriage may present possible benefits, the fitness costs should also be considered. For example, prophage carriage in S. thermophilus DSM 20617 was shown to reduce cell wall integrity and heat resistance, while simultaneously increasing adhesion to solid surfaces, a trait that is linked to peptidoglycan breaks (Arioli et al. 2018). In addition to the virulent and lysogenic states, it is proposed that phages may be present in cultures as chronic infections or exist in the so-called 'carrier state' either attached to the external surface of cells or as internalized DNA (Somerville et al. 2022). In these states, phages challenge the kill-the-winner concept and may persist stably over long periods of time within cultures or factories. Indeed, closely related phages that infect lactic acid bacterial starter cultures have been observed over periods of up to a decade in Irish and Canadian cheese factories (Moineau 2009, Lavelle et al. 2018). This persistence behaviour may lend itself to the fluctuating selection dynamic, which occurs when the fitness costs of developing phage resistance are unfavourable and leads to a (temporary) reduction in active phages against the host species/strain.

Challenges in the development of robust starter cultures for application in industry
Despite extensive counteractive efforts in the dairy industry, phages continue to be a major problem in these fermentations. Consequently, food producers have incorporated several lines of defence to mitigate this risk, including improved sanitation regimes, air filtration, staff movement control, and starter culture optimization and rotation. In dairy streptococci, the development of bacteriophage-insensitive mutants (BIMs) has traditionally been achieved through exposure of the starter culture (or individual strains) to whey containing phages that are problematic against the culture or strain. The generation of spontaneous BIMs of S. thermophilus is largely facilitated by the innate CRISPR-Cas systems and has been the approach of choice by many starter culture providers for decades as it is inexpensive, requires limited expertise and facilities, generally has a limited, if not completely negligible, impact on technological properties of strains, and is acceptable to regulatory authorities as a natural process (Mills et al. 2010; Chirico et al. 2014; Achigar et al. 2021). While this process allows companies to respond rapidly to emerging or persistent phage problems in a factory-specific manner, such CRISPR-derived BIMs may be rapidly overcome by evolved phages with single point mutations in the genomic regions that correspond to the acquired spacers (which are typically ∼30 bp in length) (Deveau et al. 2008). Consequently, longer-term solutions are required to ensure the stability of fermentations, possibly in combination with the development of spontaneous BIMs.
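The fragility of spacer-based immunity described above can be illustrated with a minimal Python sketch. It is a deliberately simplified toy model, not a description of the authors' methods or of the full CRISPR-Cas mechanism: the 30-nt sequence is invented for illustration, and the rule that interference requires a perfect spacer match is an assumption that ignores PAM recognition and position-dependent mismatch tolerance.

```python
def mismatches(spacer: str, protospacer: str) -> int:
    """Count positions at which two equal-length DNA strings differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must be the same length")
    return sum(a != b for a, b in zip(spacer, protospacer))


def is_targeted(spacer: str, protospacer: str) -> bool:
    """Toy rule (an assumption): interference requires a perfect match."""
    return mismatches(spacer, protospacer) == 0


# Hypothetical 30-nt spacer acquired by a BIM (not a real sequence).
SPACER = "ATGCGTACCTGAAGTCCGATTGACCGTAAG"

# The wild-type phage carries an identical target region; the escape
# mutant differs at a single position (index 15, C -> T).
wild_type = SPACER
escape_mutant = SPACER[:15] + "T" + SPACER[16:]

print(is_targeted(SPACER, wild_type))      # phage is blocked
print(is_targeted(SPACER, escape_mutant))  # phage escapes
```

Under this toy rule, one substitution in 30 nucleotides is enough to abolish targeting, which is why a single acquired spacer offers limited long-term protection.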
Knowledge of the saccharidic receptors required by dairy streptococcal phages may guide starter culture providers to select streptococcal strains with distinct rgp and/or eps genotypes in starter culture blends or rotation strategies. This requires detailed knowledge of the phage types that are present in fermentation facilities and the exact culture composition and formulation that was applied in the factory. This represents a considerable challenge for several reasons and cannot be generalized since each factory will have (i) individual production and cleaning practices, (ii) phage testing methods and depth, (iii) distinct relationships with starter culture providers/ingredient suppliers, and (iv) different business organizations. Furthermore, this is a long-term approach that requires considerable investment to establish the prevalent phage(s) and suitable strains to replace the sensitive strain and/or a rotation strategy that will reduce the risk of phage proliferation where producers prefer to retain the application of a phage-sensitive strain. Therefore, it is imperative that producers are well informed by starter culture providers to ensure that good production practices are adhered to in order to reduce the risk of phage infection and proliferation in industrial fermentations.
It is most likely that every fermentation facility possesses resident phages that persist in the factory environment and that certain phages are also transiently present across the lifetime of a factory. Companies may wish to project that their phage problems are limited or even absent; however, a universal understanding and acceptance of the omnipresence of phages will enable the development of tailored solutions and facilitate risk reduction, thereby ensuring the ongoing success of the sector. Historically, failed fermentations resulted in the disposal of large volumes of milk into effluent ponds or wastewater storage facilities, or in its application to land or use as animal feed, with considerable downstream environmental impacts (Campbell and Feldpausch 2022). Valorization of dairy waste and by-products is an emerging area of research and will have the dual impact of improving the sustainability of fermentation processes and removing waste and by-products from the production site to the valorization facility, with the downstream result of reducing phage populations in the production environs as well as the associated economic advantages (Russo et al. 2021, Naomi David et al. 2022, Carolin et al. 2023). Reports of non-CRISPR mediated BIM development in S. thermophilus are limited, probably due to the success of CRISPR-Cas systems in the industrial context. This has created a void regarding alternative strategies to generating BIMs in this species. RGP-mediated BIMs typically exhibit growth impairment and are consequently less likely to be suitable for industrial application (Lavelle et al. 2022). An alternative to generating RGP-based BIMs is the incorporation of strains with distinct rgp genotypes.
Since phages recognize specific saccharidic structures (associated with the rgp genotypes), strain blends or rotations based on different rgp genotypes reduce the risk of phage infection of more than one strain in that blend or rotation. Similarly, inclusion of diverse eps genotypes in a starter blend or rotation scheme could be applied to reduce the proliferation of phages that recognize EPS receptors. Knowledge of EPS diversity at both the genetic and structural level is limited to date and will likely be an area of research expansion in the coming years given its biotechnological and fundamental importance.
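As a rough illustration of genotype-based blend or rotation design, the following Python sketch greedily selects strains so that no two chosen strains share an rgp or an eps genotype. The strain names and genotype labels are hypothetical placeholders, and the selection rule is an assumption made for illustration; a real scheme would also have to weigh technological properties and the phage types present in a given factory.

```python
from typing import NamedTuple


class Strain(NamedTuple):
    name: str
    rgp: str  # hypothetical rgp genotype label
    eps: str  # hypothetical eps genotype label


def pick_distinct(strains: list[Strain], size: int) -> list[Strain]:
    """Greedily pick up to `size` strains sharing no rgp or eps genotype."""
    chosen: list[Strain] = []
    seen_rgp: set[str] = set()
    seen_eps: set[str] = set()
    for s in strains:
        if len(chosen) == size:
            break
        # Skip strains whose receptor genotype is already represented.
        if s.rgp in seen_rgp or s.eps in seen_eps:
            continue
        chosen.append(s)
        seen_rgp.add(s.rgp)
        seen_eps.add(s.eps)
    return chosen


# Invented candidate panel; labels do not correspond to real genotypes.
CANDIDATES = [
    Strain("strain_1", rgp="rgp_A", eps="eps_1"),
    Strain("strain_2", rgp="rgp_A", eps="eps_2"),
    Strain("strain_3", rgp="rgp_B", eps="eps_1"),
    Strain("strain_4", rgp="rgp_B", eps="eps_3"),
    Strain("strain_5", rgp="rgp_C", eps="eps_2"),
]

rotation = pick_distinct(CANDIDATES, size=3)
print([s.name for s in rotation])
```

The greedy pass depends on the input order and is not guaranteed to find the largest possible set of mutually distinct strains; it is only meant to make the selection criterion concrete.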
In dairy Lactococcus spp., conjugation of large plasmids that harbour phage-resistance systems, including abortive infection and R/M systems, is an approach that has been widely applied for the development of phage-resistant and robust cultures (Trotter et al. 2001, Fallico et al. 2012). As mentioned above, dairy streptococci harbour few, if any, plasmids and those that are present are typically small and non-conjugative. This, combined with the presence of CRISPR-Cas and R/M systems, has limited the adaptation of S. thermophilus strains by conjugation. In contrast to lactococci, however, natural competence may be exploited in certain strains of S. thermophilus (Fontaine et al. 2010a) to facilitate the introduction of genetic material. A challenge associated with the application of this approach is the strain-specific nature of the competence phenomenon. Despite reports of improved methods to induce natural competence in dairy streptococci, it is not applied widely for the adaptation of dairy streptococcal strains (Fontaine et al. 2010b). Therefore, while there is significant potential for the development of robust starter cultures, some practical, regulatory, and bio(techno)logical barriers and bottlenecks need to be overcome in the coming decade to facilitate timely responses to fermented food producers' needs.

Conclusion
The starter culture industry is in a period of major transformation as consumer demand for fermented dairy products and dairy alternative products continues to increase. This diversification of products combined with increased production volumes emphasizes the need for starter culture providers to be able to rapidly identify strains with the appropriate technological properties that can reduce phage proliferation issues and improve the predictability/reliability of food fermentation processes. Considerable advances have been made in our understanding of the diversity of S. thermophilus, its phages, and their interactome. However, as these advances have raised an equal or expanded number of questions, it is imperative that we develop a holistic view of dairy streptococci, the composition of their variable genome content and how this will shape the future of research in this species and its application in food fermentation. This should incorporate an overview of the combinations of rgp and eps genotypes that streptococci may possess in concert with the defence systems that they harbour to truly evaluate the various lines of defence that strains of this species employ and how these may be harnessed for the development of robust starter culture systems.

Supplementary data
Supplementary data is available at FEMSRE online.

Funding
This publication has emanated from research conducted with the financial support of Science Foundation Ireland under grant numbers 20/FFP-P/8664 and 12/RC/2273-P2. For the purpose of open access, we have applied a CC BY public copyright license to any author accepted manuscript version arising from this submission.