Multiple variants of the type VII secretion system in Gram-positive bacteria

Abstract
Type VII secretion systems (T7SS) are found in bacteria across the Bacillota and Actinomycetota phyla and have been well described in Staphylococcus aureus, Bacillus subtilis, and pathogenic mycobacteria. The T7SSs from Actinomycetota and Bacillota share two common components: a membrane-bound EccC/EssC ATPase and EsxA, a small helical hairpin protein of the WXG100 family. However, they also have additional phylum-specific components, and as a result they are termed the T7SSa (Actinomycetota) and T7SSb (Bacillota), respectively. Here, we identify additional organizations of the T7SS across these two phyla and describe eight additional T7SS subtypes, which we have named T7SSc–T7SSj. T7SSd is found exclusively in Actinomycetota, including the genera Olsenella and Bifidobacterium, whereas the other seven are found only in Bacillota. All of the novel subtypes contain the canonical ATPase (TsxC) and the WXG100-family protein (TsxA). Most of them also contain a small ubiquitin-like protein, TsxB, related to the T7SSb EsaB/YukD component. Protein kinases, phosphatases, and forkhead-associated (FHA) proteins are often encoded in the novel T7SS gene clusters. Candidate substrates of these novel T7SS subtypes include LXG-domain and RHS proteins. Predicted substrates are frequently encoded alongside genes for additional small WXG100-related proteins that we speculate serve as cosecretion partners. Collectively, our findings reveal unexpected diversity in the T7SS in Gram-positive bacteria.


Introduction
Protein secretion systems are ubiquitous in prokaryotes. In Gram-negative bacteria, there are at least 10 distinct secretion systems (Filloux 2022). Some of these, for example the type III secretion system, mediate translocation of substrate proteins directly across the cell envelope in a single step. Others are two-step pathways where the substrate is first exported to the periplasm by the general secretory (Sec) or twin arginine (Tat) transporters prior to passage across the outer membrane. Gram-positive bacteria generally have simpler cell envelope organizations and therefore lack the specialized systems found in Gram-negative bacteria (Filloux 2022).
In 2003, a novel protein secretion system was described in pathogenic mycobacteria and was termed the type VII secretion system (T7SS) (Hsu et al. 2003, Pym et al. 2003, Stanley et al. 2003). The T7SS localizes to the cytoplasmic (inner) membrane of mycobacteria and operates in parallel to Sec and Tat to mediate transport across this bilayer (Beckham et al. 2021, Bunduc et al. 2021). Some components of the T7SS had also been shown to be present in many Gram-positive Bacillota, including Staphylococcus aureus and Bacillus subtilis (Pallen 2002). The T7SS is best characterized in Mycobacterium tuberculosis, where it is found in five paralogous copies, termed ESX-1 to ESX-5 (Bitter et al. 2009). All five ESX systems comprise the membrane-bound components EccB, EccC, EccD, and the mycosin protease MycP (Fig. 1 A). A fifth membrane protein, EccE, is also found in all ESX systems except for some ESX-4 variants (Dumas et al. 2016, Bunduc et al. 2020, Lagune et al. 2021). ESX-4 is the most ancestral of the five ESX systems and is the only T7SS found in nonmycobacterial members of the Actinomycetota, including Nocardia and Gordonia (Dumas et al. 2016).
High-resolution structures of purified ESX-5 have been reported (Beckham et al. 2021, Bunduc et al. 2021). The complex has a hexameric arrangement and a mass in excess of 2.3 MDa. Six copies of EccC are located at the centre of the machinery, forming the secretion pore. EccC is an AAA+ ATPase related to the DNA translocase FtsK and has two transmembrane domains at its N-terminus followed by four nucleotide-binding domains (Famelis et al. 2019). EccC interacts with the EccB and EccD subunits, while EccE is located at the periphery of the complex (Beckham et al. 2017, 2021, Famelis et al. 2019, Poweleit et al. 2019, Bunduc et al. 2021). MycP, which is loosely associated with the machinery, forms a trimeric cap on the periplasmic side and may proteolytically process some substrates as they are secreted (Ohol et al. 2010, Bunduc et al. 2021) (Fig. 1 A). A second, cytoplasmic AAA+ ATPase of unknown function, EccA, is encoded alongside all ESX systems with the exception of ESX-4 (Gao et al. 2004, Converse and Cox 2005, Crosskey et al. 2020).
The T7SS of Bacillota such as S. aureus is distantly related to mycobacterial ESX, with the AAA+ ATPase being the only membrane component common to the two systems (Pallen 2002). This has led to the two systems being termed T7SSa (Actinomycetota) and T7SSb (Bacillota), respectively (Abdallah et al. 2007). In the T7SSb the ATPase is named EssC, and although it shares a similar architecture to EccC, it is larger due to the presence of two forkhead-associated (FHA) domains at the N-terminus (Tanaka et al. 2007, Zoltner et al. 2016). Three other membrane proteins, EssA, EssB, and EsaA, which are dissimilar to ESX components in sequence and structure, are essential components of the S. aureus and B. subtilis T7SS (Fig. 1 B) (Burts et al. 2005, Baptista et al. 2013, Huppert et al. 2014, Kneuper et al. 2014). EsaA has an extended extracellular domain that spans the cell wall, raising the possibility that the Bacillota T7SS may form a conduit for release of substrates at the cell surface (Sao-Jose et al. 2006, Klein et al. 2021). A small cytoplasmic protein with a ubiquitin fold, EsaB (YukD in B. subtilis), is a further essential component of the T7SSb (van den Ent and Löwe 2005, Kneuper et al. 2014, Casabona et al. 2017). An EsaB-like domain is also found at the C-terminus of the T7SSa component EccD (Famelis et al. 2019, Poweleit et al. 2019), suggesting that this domain is a common feature of T7SSs.
A further commonality between the T7SSa and T7SSb is the requirement for proteins of the WXG100 family for function (Burts et al. 2005, Rosenberg et al. 2015). These are small helical hairpins of ∼100 amino acids that are secreted as folded dimers by the T7SS (Renshaw et al. 2005, Sundaramoorthy et al. 2008, Sysoeva et al. 2014). A conserved Trp-Xaa-Gly motif is found at the hairpin hinge, giving the family its name (Pallen 2002). In the T7SSb, a single WXG100 protein, EsxA, is co-encoded with the secretion machinery and is essential for its activity (Huppert et al. 2014, Kneuper et al. 2014). In the T7SSa, WXG100 proteins usually occur in pairs, which heterodimerize (Renshaw et al. 2002). Some WXG100 proteins carry a C-terminal 'signal sequence' that binds in a pocket on the ATPase domains of EccC/EssC, controlling ATPase activity and promoting interaction between protomers (Champion et al. 2006, Rosenberg et al. 2015, Mietrach et al. 2020).
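The motif that gives the family its name can be illustrated with a naive sequence scan. The sketch below is an illustration only, not a method used in this study. It reports every Trp-Xaa-Gly occurrence in a protein sequence, and the helper `has_central_wxg` applies a hypothetical heuristic (our assumption): in a ∼100-residue helical hairpin, a motif located at the hinge should fall roughly mid-sequence. Real WXG100 assignment relies on homology searches, and W-X-G tripeptides also occur by chance in unrelated proteins.

```python
import re

# Trp-Xaa-Gly: W, any single residue, G. The lookahead makes overlapping
# occurrences (e.g. "WWGG" contains both "WWG" and "WGG") visible to finditer.
WXG_PATTERN = re.compile(r"(?=(W.G))")


def find_wxg_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every W-X-G in the sequence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in WXG_PATTERN.finditer(seq)]


def has_central_wxg(protein_seq: str, half_width: float = 0.25) -> bool:
    """Hypothetical heuristic: True if any W-X-G motif starts within the middle
    fraction of the sequence (0.5 +/- half_width of its length), where the
    hairpin hinge of a ~100-residue WXG100 protein would be expected to lie."""
    n = len(protein_seq)
    lo, hi = n * (0.5 - half_width), n * (0.5 + half_width)
    return any(lo <= pos <= hi for pos, _ in find_wxg_motifs(protein_seq))


if __name__ == "__main__":
    # Toy 100-residue sequence (not a real protein) with a motif at position 49.
    toy = "A" * 48 + "WNG" + "A" * 49
    print(find_wxg_motifs(toy))   # [(49, 'WNG')]
    print(has_central_wxg(toy))   # True
```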
Other substrates of the T7SSa are the PE-PPE protein families, which are named for the PE or PPE motifs carried at their N-termini and are encoded by highly expanded gene clusters in pathogenic mycobacteria (Cole et al. 1998, Abdallah et al. 2006). These are also heterodimers, of a PE and a PPE protein, which interact through their helical hairpin N-termini to form a four-helix bundle (Strong et al. 2006). A C-terminal signal sequence is present on the PE partner (Daleke et al. 2012). PE-PPE complexes bind dedicated EspG chaperones, which both stabilize the complex by shielding a hydrophobic patch at the tip of the helical bundle and play a role in targeting the substrates to the cognate ESX system for secretion (Ekiert and Cox 2014, Korotkova et al. 2014, Phan et al. 2017). Esp proteins are the third substrate family of the T7SSa. EspB adopts an N-terminal four-helix bundle fold that carries both a WXG motif and a signal sequence and is secreted without an apparent binding partner (Solomonson et al. 2015). EspK was recently identified as a chaperone for EspB and binds to the tip of the EspB helical bundle, in a fashion analogous to EspG (Gijsbers et al. 2023).
PE-PPE and Esp proteins are not substrates for the T7SSb. Instead, this system secretes proteins of the LXG family (Whitney et al. 2017, Yang et al. 2023). LXG proteins are related to WXG100 proteins but are longer, with a helical LXG domain encompassing the first ∼190 amino acids (Yang et al. 2023, Klein et al. 2024). LXG domains alone are not competent for secretion, and interact with small helical partner proteins, termed LXG-associated α-helical proteins (Laps), that are essential for export (Klein et al. 2022). Some Laps are proteins of the WXG100 family, whereas others share the same fold and similar size but lack the WXG motif (Klein et al. 2022, Yang et al. 2023). Structural analysis of LXG-Lap complexes reveals that they share striking similarity with the PE-PPE and Esp complexes of the T7SSa (Klein et al. 2024). Some LXG proteins, for example S. aureus EsaD, also require a chaperone, EsaE, for export (Cao et al. 2016). Despite sharing very low sequence identity with EspG, EsaE is predicted to adopt the same fold as EspG, and is also implicated in targeting EsaD to the T7SSb (Cao et al. 2016, Yang et al. 2023). Very recently, a second class of T7SSb substrate was identified. TslA is an antibacterial lipase toxin secreted by the S. aureus T7SS. TslA has a very unusual reverse arrangement of domains, with the helical LXG-like domain present at the C-terminus rather than the N-terminus. Two small Lap proteins interact with this C-terminal domain to generate a composite targeting sequence (Garrett et al. 2023). It is not yet known whether the T7SSa also secretes proteins with reverse domain organization.
During our analysis of the T7SSb, we noted that the Bacillota bacterium Bacillus anthracis, which has a functional T7SS, lacked genes for the EssA, EssB, and EsaA components. Instead, a different suite of conserved genes flanks essC. Despite the lack of these core components, the B. anthracis T7SS was still competent for the secretion of a WXG100 protein, EsxB (Garufi et al. 2008). This led us to investigate the diversity of EssC orthologues and their encoding gene clusters among Gram-positive bacteria. From this, we identify a further eight phylogenetically distinct EssC-like ATPases, each of which is encoded alongside a unique set of genes that are likely to code for further secretion system components. We propose that these represent T7SSc-T7SSj.

Methods
To identify orthologues of EssC across a diverse range of bacteria, the EssC orthologue from Brevibacillus brevis was used as the query for a BLASTp search against the RefSeq database (Altschul et al. 1990). An accession list generated from this BLASTp output was used to run FlaGs2 v1.1.2 (Saha et al. 2021). Further genetic neighbourhood analysis was performed following a similar pipeline, but with accession lists submitted to webFlaGs (Saha et al. 2021). Genetic neighbourhoods were displayed using Clinker (Gilchrist and Chooi 2021).
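The neighbourhood step delegated to FlaGs can be illustrated with a minimal sketch, which is not the FlaGs implementation. Given the annotated genes on one contig, it returns up to k genes on either side of the query essC, with "upstream" and "downstream" taken relative to the query gene's own strand. The `Gene` record, its fields, and the default k = 4 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus_tag: str  # hypothetical identifier, e.g. a RefSeq protein accession
    start: int      # 1-based, with start <= end on either strand
    end: int
    strand: int     # +1 or -1


def flanking_genes(genes, query_tag, k=4):
    """Return (upstream, downstream) lists of up to k genes each, defined
    relative to the query gene's own orientation. Upstream genes are listed
    farthest-to-nearest, downstream genes nearest-to-farthest. All genes are
    assumed to lie on the same contig."""
    ordered = sorted(genes, key=lambda g: g.start)
    for idx, gene in enumerate(ordered):
        if gene.locus_tag == query_tag:
            break
    else:
        raise ValueError(f"query {query_tag!r} not found")
    left = ordered[max(0, idx - k):idx]    # genes at lower coordinates
    right = ordered[idx + 1:idx + 1 + k]   # genes at higher coordinates
    if ordered[idx].strand == 1:
        return left, right
    # On the minus strand, "upstream" lies at higher coordinates.
    return right[::-1], left[::-1]
```

Defining the flanks by strand rather than by coordinate keeps gene order comparable between loci whose essC genes lie in opposite orientations, which is what makes neighbourhoods from different genomes line up when they are displayed side by side.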
Protein alignments were performed using MUSCLE v3.8.1551 (Edgar 2004) and visualized with boxshade (https://github.com/mdbaron42/pyBoxshade). For the analysis of EssC diversity across the novel T7SS subtypes, a representative sample of EssC orthologues was taken from the FlaGs2 output (Supplementary Data 1 and 2). These were aligned using MUSCLE, as described above, and MEGA X was used to build a maximum likelihood tree using the JTT matrix and 1000 bootstrap replicates (Kumar et al. 2018). Pairwise protein sequence alignments were performed using the EMBOSS needle pairwise alignment tool (EBLOSUM62 matrix) to obtain percentage sequence identity and similarity (Madeira et al. 2022).
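The identity and similarity percentages obtained from EMBOSS needle are, as we understand its standard report, ratios over the full length of the pairwise alignment, gap columns included. The sketch below recomputes such ratios from two already-aligned sequences. It is a stand-in, not a reimplementation of needle: needle counts a pair as similar when its EBLOSUM62 score is positive, whereas here the caller supplies an `is_similar` predicate, and the K/R predicate in the example is a hypothetical toy.

```python
def alignment_stats(aln_a, aln_b, is_similar=None, gap="-"):
    """Percent identity, similarity and gaps over the full alignment length,
    for two gapped sequences of equal length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    identical = similar = gaps = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            gaps += 1            # gap columns count towards the length only
        elif a == b:
            identical += 1
            similar += 1         # identical pairs are also counted as similar
        elif is_similar is not None and is_similar(a, b):
            similar += 1
    length = len(aln_a)
    return {
        "length": length,
        "identity_pct": 100.0 * identical / length,
        "similarity_pct": 100.0 * similar / length,
        "gaps_pct": 100.0 * gaps / length,
    }


if __name__ == "__main__":
    def k_r_similar(a, b):
        # Toy predicate: treat K/R as a similar (conservative) substitution.
        return {a, b} == {"K", "R"}

    print(alignment_stats("MKV-LA", "MRVQLA", is_similar=k_r_similar))
```

Because the denominator includes gap columns, two proteins that align well only over part of their length (as for the FHA-less B. anthracis EssC against full-length EssC) will show lower percentages than a local alignment of the shared region would.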
For the identification of domains and further analysis of proteins predicted to form the core components of each novel T7SS, amino acid sequences were submitted to a number of prediction tools. For domain and function prediction based on sequence homology, proteins were submitted to BLASTp (Altschul et al. 1990) and the InterPro server (Jones et al. 2014, Paysan-Lafosse et al. 2023). A predicted structure of each component was also obtained from AlphaFold2 version 1.5.5 (Mirdita et al. 2022), which was then submitted to FoldSeek (van Kempen et al. 2023) for domain and functional predictions based on structural similarity. To identify predicted signal peptides, sequences were submitted to SignalP 5.0 (Almagro Armenteros et al. 2019), and to identify predicted transmembrane helices, sequences were submitted to DeepTMHMM (Hallgren et al. 2022). A summary of the output from these analyses can be found in Supplementary Data 3, and the AlphaFold2 models for each of the predicted components are included in Supplementary Data 4. Structural model alignments were performed in ChimeraX v1.4 (Pettersen et al. 2021) using the matchmaker tool.
For the identification of esaB and esxA orthologues in T7SSd clusters, nucleotide profile hidden Markov models for these genes were generated from orthologues from each of the other T7SSs described here, aligned using MAFFT v7.489 (Katoh et al. 2002). A custom bash script (adapted from that used in Garrett et al. 2022) was used to extract copies of both esaB and esxA from T7SSd clusters (defined as the 10 000 nucleotides upstream and downstream of tsdC), using an E-value cutoff of 0.05. The script is available on GitHub (https://github.com/stephen-r-garrett/T7SS_variants).
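The locus definition used for the esaB/esxA search (10 000 nucleotides either side of tsdC, with hits retained at an E-value cutoff of 0.05) can be sketched in Python. This is not the published bash script: the 1-based inclusive coordinates, the clamping at contig ends, treating the cutoff as inclusive, and the (target, E-value) hit tuples are all our assumptions, and the profile-HMM search itself is left to a dedicated tool.

```python
FLANK_NT = 10_000   # nucleotides taken either side of tsdC
MAX_EVALUE = 0.05   # E-value cutoff; treated here as inclusive (an assumption)


def locus_window(contig_len, gene_start, gene_end, flank=FLANK_NT):
    """1-based inclusive bounds of the gene plus `flank` nt on each side,
    clamped so the window never runs off either end of the contig."""
    if not 1 <= gene_start <= gene_end <= contig_len:
        raise ValueError("gene coordinates must lie within the contig")
    return max(1, gene_start - flank), min(contig_len, gene_end + flank)


def extract_locus(contig_seq, gene_start, gene_end, flank=FLANK_NT):
    """Return the nucleotide sequence of the window around the gene."""
    lo, hi = locus_window(len(contig_seq), gene_start, gene_end, flank)
    return contig_seq[lo - 1:hi]   # 1-based inclusive -> Python slice


def filter_hits(hits, max_evalue=MAX_EVALUE):
    """Keep (target_id, evalue) hits at or below the cutoff."""
    return [(target, e) for target, e in hits if e <= max_evalue]
```

Searching a bounded window rather than the whole genome keeps the search focused on genes that could plausibly belong to the T7SSd locus; a separate genome-wide BLAST search, as described in the Results, covers orthologues encoded elsewhere.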

The identification of novel T7SS arrangements
A prior study had identified an essC gene within the putative T7SS gene cluster of B. anthracis (Garufi et al. 2008). However, when we aligned EssC from B. anthracis with EssC sequences from the well-characterized S. aureus and B. subtilis T7SSb systems, we noted that while they shared clear sequence homology, the B. anthracis protein unexpectedly lacked FHA domains at the N-terminus and so aligned only with residues 208-1495 of the B. subtilis protein (with 24.8% identity) and residues 187-1479 of S. aureus EssC (with 25.4% identity) (Fig. S1). Further analysis revealed that EssC proteins from some other Bacillota members, for example B. brevis, had high similarity (42.4% identity, 62.1% similarity) to B. anthracis EssC and also lacked FHA domains. Using the B. brevis EssC sequence, we performed an extensive BLAST search against the RefSeq database to identify diverse EssC variants, which were then used to perform gene neighbourhood analysis (Saha et al. 2021). This generated a phylogenetic tree for all EssC accessions, along with the genetic neighbourhood of each essC gene, ordered according to the tree (Supplementary Data 1 and 2). These data were used together to identify novel arrangements of the T7SS, allowing us to identify, in addition to the T7SSa and T7SSb, a further eight genetically distinct organizations of the T7SS.
To gain a clearer understanding of the genetic diversity of these novel T7SSs, a maximum likelihood phylogenetic tree was constructed using a subset of the EssC accessions from the genetic neighbourhood analysis in Supplementary Data 1 and 2, alongside T7SSa EccC and T7SSb EssC sequences (Fig. 2). As in the previous analysis, the EssC orthologues formed eight groups that were distinct from each other and from the orthologues in the T7SSa and T7SSb systems. From this and the analysis below, we propose the presence of an additional eight T7SS subtypes in Gram-positive bacteria, which we have named T7SSc-T7SSj. An alignment of the EccC/EssC orthologues from T7SSc to T7SSj alongside EccC (T7SSa) and EssC (T7SSb) is shown in Fig. S2, and all of the accessions we identified for EssC orthologues from these systems are included in Supplementary Data 5. Typical genetic arrangements for each new system are shown in Fig. 3.

T7SSc
The T7SSc appears to be widespread in many Gram-positive bacteria, including Paenibacillus, B. anthracis, and B. brevis, and after the T7SSa and T7SSb it had the most representative sequences in the RefSeq database. The T7SSc EssC sequences fall into two distinct clades; however, the flanking genes, which encode putative additional T7SSc components, are conserved (Supplementary Data 1 and 2). While the EssC sequences from the T7SSc are orthologous to EssC from the T7SSb, phylogenetically they cluster more closely with the EccC proteins from the T7SSa (Fig. 2). As mentioned above, T7SSc EssC lacks N-terminal FHA domains, which are also absent from the T7SSa EccC but present on the T7SSb EssC (Figs S1 and S2). We propose to name EssC/EccC components as Ts(x)C, where x refers to the specific T7SS subtype, i.e. the EccC component would be TsaC, EssC from the T7SSb would be TsbC, and so on (Table 1).

Figure legend (partial): Genes coding for components that are related between the different systems are shaded similarly. To simplify the nomenclature of the T7SSc-T7SSj systems, we propose to name the components as Ts(x), where 'x' refers to the letter associated with that system; for example, Tsc components are found in the T7SSc. The TsxA, TsxB, and TsxC components are (almost) universally conserved and are orthologues of EsxA, EsaB, and EccC/EssC, respectively. No TsxA-encoding gene is shown in the T7SSi gene cluster, as we were unable to identify a likely candidate (see text).
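The proposed Ts(x) scheme is regular enough to be expressed as a small helper, for example when labelling components programmatically in neighbourhood outputs. This is a hypothetical sketch: the function names are ours, and the optional numeric suffix follows the TseA2 usage in this paper.

```python
import re

SUBTYPES = "abcdefghij"   # T7SSa ... T7SSj
_NAME = re.compile(r"Ts([a-j])([A-Z][0-9]*)")


def component_name(subtype: str, component: str) -> str:
    """('c', 'C') -> 'TscC'. `component` is a capital letter, optionally
    followed by a number for additional paralogues such as TseA2."""
    if len(subtype) != 1 or subtype not in SUBTYPES:
        raise ValueError(f"unknown T7SS subtype {subtype!r}")
    if not re.fullmatch(r"[A-Z][0-9]*", component):
        raise ValueError(f"bad component label {component!r}")
    return f"Ts{subtype}{component}"


def parse_component_name(name: str) -> tuple[str, str]:
    """'TseA2' -> ('e', 'A2'); the inverse of component_name."""
    m = _NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Ts(x) component name: {name!r}")
    return m.group(1), m.group(2)
```

For example, `component_name("a", "C")` gives `TsaC` (EccC) and `component_name("b", "C")` gives `TsbC` (EssC), matching the mapping described above.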
Based on conservation of the genetic neighbourhood, we predict that there are seven core components associated with the T7SSc (Figs 3 and 4 A; Table 1). To determine the putative domain composition and subcellular arrangement of each of these components, results from BLASTp, SignalP-5.0, DeepTMHMM, and InterPro domain searches were collated (Supplementary Data 3). In addition, AlphaFold2 was used to obtain structural predictions (Supplementary Data 4), and these structural models were submitted to FoldSeek to obtain further predictions about possible domains present in these proteins.
Each T7SSc gene cluster has a single gene encoding a WXG100 protein orthologous to EsxA, which we have termed TscA. This is more akin to the T7SSb, which encodes a single WXG100 protein, EsxA, in its gene cluster, whereas T7SSa systems encode a pair of nonidentical WXG100 proteins. The T7SSc gene cluster also encodes a small protein, TscB, related to EsaB/YukD, a ubiquitin-like protein that is found as a discrete protein in the T7SSb and as a fused domain in the T7SSa (Fig. 4 A and B; Supplementary Data 3). Of the other likely T7SSc components, TscG has no recognizable domain predictions from either BLASTp or InterPro domain searches; however, FoldSeek searches using the AlphaFold2 structural prediction suggest that it shares homology with the chaperone protein EspG from the T7SSa (Supplementary Data 3, Table 1).
There are three further conserved components encoded in T7SSc gene clusters, which are not found in either the T7SSa or T7SSb. TscE is predicted to contain a von Willebrand factor type A (vWA) domain at its N-terminus and is likely localized to the cytoplasm due to the absence of any detectable signal peptide or transmembrane helices (Fig. 4 A and B; Supplementary Data 3). vWA domains often bind metal ions and frequently mediate protein-protein interactions (Whittaker and Hynes 2002). The other two components, TscD and TscF, carry a putative serine/threonine kinase domain and a protein phosphatase Mg2+- or Mn2+-dependent (PPM) phosphatase domain, respectively. While TscF is predicted to be localized to the cytoplasm, TscD is predicted to have at least one transmembrane helix, with the N-terminal kinase domain in the cytoplasm (Fig. 4 A and B; Supplementary Data 3). TscD shares a similar topology with the T7SSb component EssB, which also has a cytoplasmic pseudokinase domain (Zoltner et al. 2013, Tassinari et al. 2022). However, TscD shares no apparent sequence identity with EssB, and unlike EssB, its kinase active site is conserved. Analysis of the genome sequences of bacteria that encode the T7SSc failed to identify any further orthologues of T7SSa or T7SSb components, and we propose that the T7SSc comprises the seven core components TscA-TscG (Table 1).

Table 1. Summary of the components identified in the T7SSc-T7SSj systems and their distribution. * EspG (TsaG) is found in most T7SSa systems, with the exception of some ESX-4 subtypes (as indicated). † The EsaE (TsbG) chaperone is found only in some T7SSb systems. ? indicates that no clear TsxA-encoding gene could be identified among the T7SSi gene clusters.

T7SSd and T7SSe
All of the other EssC orthologues we identified clustered more closely with TsbC/EssC of the T7SSb than with TsaC/EccC of the T7SSa (Fig. 2). Of these, the two most distinct from the T7SSb are those we have assigned as TsdC and TseC, from the T7SSd and T7SSe, respectively. The T7SSd and T7SSe systems are closely related to one another, but the T7SSd is found only in the Actinomycetota phylum and the T7SSe only in Bacillota. Both TsdC and TseC have N-terminal FHA domains, as found on EssC/TsbC, but these share little sequence homology with the EssC FHA domains (typically around 17% identity; see Fig. S2). FHA domains often bind phosphothreonine, linking protein phosphorylation with the formation of protein complexes (Weiling et al. 2013). However, in the T7SSb, the FHA domains of EssC lack the conserved phosphothreonine-binding site (Fig. S2 and Table S1) and instead interact with the cytoplasmic pseudokinase domain of EssB, an interaction that is presumably not regulated by protein phosphorylation (Bobrovskyy et al. 2022, Tassinari et al. 2022). Interestingly, the phosphothreonine-binding consensus sequence in the FHA domains of TsdC and TseC appears to be well conserved (Figs S2 and S3, Table S1), raising the possibility that the assembly and/or activity of these systems is controlled by threonine phosphorylation.
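Assessing whether the phosphothreonine-binding consensus is conserved amounts to checking the residues found at key alignment positions. A minimal sketch, assuming each FHA domain has been aligned pairwise to a reference FHA domain: the key positions and allowed residues in the example are placeholders, since the residues actually scored in Fig. S2 and Table S1 are not reproduced here.

```python
def motif_conservation(aligned_ref, aligned_query, key_residues, gap="-"):
    """Score conservation of key residues in an aligned query.

    key_residues maps a 1-based, ungapped reference position to a string of
    acceptable residues (e.g. {12: "R", 30: "ST"}). Returns (n_conserved,
    n_scored, details), where details maps each position to the query residue
    found in that alignment column ('-' if the query has a gap there)."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Map ungapped reference positions to alignment columns.
    column_of = {}
    ref_pos = 0
    for col, residue in enumerate(aligned_ref):
        if residue != gap:
            ref_pos += 1
            column_of[ref_pos] = col
    details = {}
    conserved = 0
    for pos, allowed in key_residues.items():
        if pos not in column_of:
            raise ValueError(f"reference position {pos} is outside the alignment")
        found = aligned_query[column_of[pos]].upper()
        details[pos] = found
        if found in allowed.upper():
            conserved += 1
    return conserved, len(key_residues), details
```

Numbering positions on the ungapped reference, rather than by alignment column, lets the same key-residue definition be reused across alignments with different gap patterns, so that a "fully", "partially", or "poorly" conserved call for TsdC, TsfC, or TsiC can be compared against one reference.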
The T7SSd is predicted to be composed of only five core components, as represented by the gene cluster from Olsenella uli: TsdA, TsdC, TsdD, TsdH, and TsdI (Figs 3 and 5, Table 1). TsdA is a WXG100 protein with homology to other WXG100 family members (Fig. 5 A; Supplementary Data 3). As in the T7SSc, a predicted serine/threonine kinase, TsdD, is also associated with the T7SSd, and in this system it could potentially link phosphorylation events with TsdC FHA domain interactions. The AlphaFold structural models of TscD and TsdD align closely at their N-terminal kinase domains (Fig. 5 B); however, the C-terminal regions of these two proteins, while both predicted to be extensively helical, are quite divergent. This may reflect differences in the organization of the cell envelopes of these bacteria, which is where the C-terminal regions are predicted to reside. The T7SSd has two further probable components that have not been described in any of the T7SSa-c systems. The first of these is TsdH, composed of a single transmembrane helix and a short, likely unstructured cytoplasmic region (Fig. 5 A and C; Supplementary Data 3). The second of these novel components is TsdI. TsdI is a large protein (1654 amino acids for O. uli TsdI) that has an N-terminal Sec signal peptide and a […] and 4). We predict that TsdI may span the cell envelope akin to EsaA in the T7SSb (Klein et al. 2021), potentially forming a conduit for substrates and/or playing a role in target recognition. Surprisingly, no ubiquitin-like protein is associated with the T7SSd, even though an orthologue is associated with all other T7SSs (Fig. 3, annotated as gene 'B'). To determine whether the corresponding open reading frame (which would likely be very short) was present but not annotated, an HMM search was conducted using a custom script against every known T7SSd locus identified in our analysis. However, no ubiquitin-like orthologues were encoded at any T7SSd locus (Supplementary Data 1 and 2), and we were unable to identify an orthologue encoded elsewhere in the genome by BLAST search.
All of the components we identified as being present in the T7SSd are also found in the T7SSe. However, we consistently identified additional core components in the T7SSe that are not found in the T7SSd (Fig. 3; Supplementary Data 1 and 2). Moreover, since the T7SSd is found exclusively in Actinomycetota, whereas the T7SSe is found in Bacillota, we have assigned these as separate subsystems. In addition to the T7SSd-related components TseA, TseC, TseD, TseH, and TseI, the T7SSe systems always encode an additional WXG100 protein (TseA2) as well as the ubiquitin-like protein TseB. TseF, an orthologue of the TscF PPM phosphatase found in the T7SSc, is also found in the T7SSe (Fig. 6 A and B; Supplementary Data 3). Two further components, TseJ and TseK, are also encoded at all T7SSe loci. TseJ carries a DUF6382 domain at the N-terminus and an FHA domain at its C-terminus (Fig. S4) and is predicted to be a globular protein localized to the cytoplasm. TseK is predicted to have a single transmembrane helix, with a short unstructured stretch of amino acids facing the cytoplasm (Fig. 6 B; Supplementary Data 3).

Five further novel T7SS clusters are found in a narrow species range
The three novel T7SS organizations described above are present across a range of bacterial genera, allowing us to be relatively confident about their likely components based on gene clustering. However, in addition to these, we have identified a further five novel T7SS genetic arrangements (T7SSf-T7SSj). Each of these five is found within a single genus or a limited range of bacterial genera, and as a result, we are less confident in defining the likely core components of each.

T7SSf
The T7SSf is found in several species of Clostridia. The T7SSf comprises the same three core components found in most T7SSs described thus far: TsfA, a WXG100 protein; TsfB, a YukD-like ubiquitin orthologue; and TsfC, the FtsK-related membrane-bound ATPase with N-terminal FHA domains (Fig. 7 A; Supplementary Data 3). The phosphothreonine-binding motifs in the TsfC FHA domains are only partially conserved (Fig. S2 and Table S1). The T7SSf also contains TsfF, a predicted PPM phosphatase of the type found in the T7SSc and T7SSe, and a TsfJ component. The other T7SS systems […] mechanism of regulation) or whether their colocation is coincidental due to the limited genera in which this system is found.

T7SSh and T7SSi
T7SSh, which is also encoded next to Tad pilus genes, comprises five core components: TshA, TshB, TshC, TshG, and TshM (Fig. 9 A and B; Supplementary Data 3). TshA is a WXG100 protein, TshB a ubiquitin-related protein, and TshC the membrane-bound ATPase, which lacks N-terminal FHA domains. TshG is orthologous to the EspG chaperone from the T7SSa. TshM has no confidently predicted domain, but it does share some potential homology with serine/threonine kinases. However, it shows no sequence similarity with TscD, TsdD, or TseD (not shown). Moreover, TshM is predicted to have a single transmembrane helix at its C-terminus and lacks the extracellular domain found in other kinase components, such as TscD and TsdD (Fig. 9 B; Supplementary Data 3).
The T7SSi is, to date, found only in Vallitalea species. While superficially it appears similar to the T7SSh, sharing almost all of the same components, its core ATPase, TsiC, is phylogenetically distinct from that of the T7SSh (Supplementary Data 1 and 2). Moreover, TsiC has N-terminal FHA domains, unlike TshC, although these show poor conservation of the phosphothreonine recognition motifs (Fig. 2, Fig. S2). WXG100 proteins are encoded at the T7SSi locus, but we were not able to pinpoint which, if any, was likely to be TsxA. This is because in the other T7SS gene clusters tsxA is found in a defined position relative to the other core genes, whereas in the T7SSi clusters the WXG100 protein-encoding genes were more variably located, often clustering beside candidate substrate genes (Figs 3 and 10; Supplementary Data 1 and 2). TsiN is unique to the T7SSi and is predicted to be a LytR-like transcriptional regulator that may control expression of T7SSi genes (Fig. 10 A and B; Supplementary Data 3).

T7SSj
The T7SSj was identified in a small subset of Clostridium species and appears to represent a system similar to the T7SSd and T7SSe, sharing the orthologous components TsjA, TsjB, TsjC, TsjD, and TsjI (Figs 3 and 11 A and B; Supplementary Data 3). However, the core component, TsjC, is phylogenetically distinct from both TsdC and TseC, clustering more closely with the orthologous components from the T7SSf and T7SSb (Fig. 2; Supplementary Data 1 and 2), and we have therefore opted to classify the T7SSj as a unique system. TsjC has N-terminal FHA domains containing the conserved phosphothreonine recognition sequence. The C-terminal domain of TsjD is also distinct from the C-terminal domains of both TsdD and TseD and is predicted to contain a tetratricopeptide repeat domain (Fig. 11 A; Supplementary Data 3). Much like the T7SSe, the T7SSj also contains the TsjF, TsjJ, and TsjK components; however, it lacks an orthologue of TseH, which is found in both the T7SSd and T7SSe. Interestingly, we found that in some individual genome sequences of Clostridium felsineum (DSM_794-NZ_CP096980.1), Clostridium acetobutylicum (ATCC 55025), and C. acetobutylicum (ATCC 824-NC_003030.1), T7SSj and T7SSf loci co-occur, suggesting that some bacteria can produce more than one T7SS subtype.
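Finding genomes that carry more than one subtype, such as the T7SSj/T7SSf co-occurrence noted above, reduces to grouping classified loci by genome. A minimal sketch, assuming each locus has already been assigned a subtype; the genome identifiers in the example are placeholders, not the strains named above.

```python
from collections import defaultdict


def genomes_with_multiple_subtypes(loci):
    """loci: iterable of (genome_id, subtype) pairs, one per detected T7SS locus.
    Returns {genome_id: sorted tuple of subtypes} for genomes carrying more than
    one distinct subtype; repeated loci of the same subtype are not counted twice."""
    subtypes_by_genome = defaultdict(set)
    for genome_id, subtype in loci:
        subtypes_by_genome[genome_id].add(subtype)
    return {
        genome_id: tuple(sorted(subtypes))
        for genome_id, subtypes in subtypes_by_genome.items()
        if len(subtypes) > 1
    }


if __name__ == "__main__":
    toy_loci = [("genome1", "T7SSf"), ("genome1", "T7SSj"),
                ("genome2", "T7SSc"), ("genome2", "T7SSc")]
    print(genomes_with_multiple_subtypes(toy_loci))
    # {'genome1': ('T7SSf', 'T7SSj')}
```

Using a set per genome means that two copies of the same subtype (for example, duplicated or split assemblies) are not mistaken for co-occurrence of different subtypes.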

Candidate T7SS substrate proteins can be identified for all new T7SS variants
Substrates of the T7SSa and T7SSb systems are often encoded adjacent to the T7SS locus in bacterial genomes (e.g. Damen et al. 2020, Bowran and Palmer 2021). We therefore analysed the genetic loci of example strains encoding each of the novel T7SSs to determine whether we could identify candidate substrates. As shown in Fig. 12, likely substrate-encoding genes were present at the T7SS loci of some strains, as were genes for candidate immunity proteins. For the T7SSc, genes for a predicted His-Me finger (HNH) endonuclease toxin (WP_080489494.1) and an immunity protein are found at the T7SSc locus of B. anthracis strain MCCC1A02161, and genes for an Rhs protein, also with a potential C-terminal nuclease domain (WP_024632681.1), alongside a candidate immunity protein are found at the same locus in Paenibacillus sp. MAEPY1. An Rhs-domain protein with an unknown C-terminal toxin domain and a predicted cytoplasmic immunity protein are also encoded next to the T7SSd in O. uli strain DSM 7084. It should be noted that Rhs toxins have been genetically linked with the T7SSa and T7SSb (Bullen et al. 2022, Bowran et al. 2023), and these observations suggest that they are more generally associated with T7SSs. A gene for a smaller toxin-like protein, […]

Recently, it has been reported that the T7SSb can secrete a lipase that has a C-terminal helical targeting domain (Garrett et al. 2023). We observed similar 'reverse' lipase toxins encoded at the T7SSe gene cluster of Neobacillus sp. 114 (WP_284036497.1) and the T7SSg cluster of Paenibacillus aquistagni strain 11 (both WP_139829221.1 and WP_085499087.1), suggesting that the capacity to recognize C-terminal targeting domains is also likely to be a general feature of the T7SS. In the two examples of T7SSf gene clusters shown in Fig. 12, multiple candidate immunity genes are present downstream of the predicted substrates (WP_077893465.1 in C. felsineum strain DSM 794 and WP_010963371.1 in C. acetobutylicum strain DJ311), an arrangement classically observed for other polymorphic antibacterial toxins of the T7SSb (e.g. Klein et al. 2018, Garrett et al. 2022).
An NHN nuclease toxin (WP_019912159.1), alongside an SMI1/KNR4 family immunity protein, is encoded at the T7SSh gene cluster of Paenibacillus sp. HW567. Two likely toxins, WP_212693237.1, of unknown function, and WP_244971330.1, with a predicted C-terminal CdiA-related endonuclease domain, are encoded at the T7SSi locus of Vallitalea guaymasensis strain Ra1766G1, and a candidate immunity protein is encoded adjacent to each predicted toxin. Finally, an LXG domain-containing protein, WP_257675215.1, is encoded at the T7SSj locus of C. felsineum strain CUEA03 15.
Almost all of the candidate substrate genes identified at the novel T7SS loci were located next to one or two genes encoding small WXG100-related proteins. These were generally distinct from the TsxA/EsxA core component and likely represent Laps that interact with the helical targeting domains of their specific substrate partners, as seen for the antibacterial T7SSb toxins (Klein et al. 2022, 2024, Garrett et al. 2023, Yang et al. 2023).

Discussion
Here, we describe eight novel arrangements of the T7SS in Gram-positive bacteria, based on phylogenetic and gene neighbourhood analysis, which we have named T7SSc-T7SSj. While the majority of these novel systems are encoded by members of the Bacillota phylum, the T7SSd, like the T7SSa, is found in Actinomycetota. Each of the T7SSc-T7SSj systems contains an orthologue of the membrane-bound FtsK-related ATPase EccC/EssC with four predicted C-terminal NTPase domains. EssC from the T7SSb differs from EccC in the T7SSa by the presence of FHA domains at its N-terminus. Of the novel systems, the ATPases from the T7SSc, T7SSg, and T7SSh resemble EccC in lacking FHA domains, whereas FHA domains are present on the ATPases from the other five systems. However, the ATPases do not cluster phylogenetically according to the presence or absence of FHA domains. Furthermore, the FHA domain sequences show very low conservation (< 24% identity) between the different systems, probably because they mediate interactions with distinct protein partners.
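A value such as "< 24% identity" depends on how percent identity is defined, and the exact convention used for the FHA domain comparison is not stated here. As a minimal sketch, assuming the sequences have already been aligned (gaps written as '-') and that identity is scored only over columns where neither sequence has a gap, pairwise and maximum pairwise identity could be computed as below. The function names and the toy sequences are hypothetical and are not data from this study.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: only columns where neither sequence has a gap are scored;
    this is one of several conventions in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    # No ungapped columns means there is nothing to compare.
    return 100.0 * identical / compared if compared else 0.0


def max_pairwise_identity(aligned: dict[str, str]) -> float:
    """Highest percent identity over all pairs in a multiple alignment."""
    return max(
        percent_identity(aligned[x], aligned[y])
        for x, y in combinations(aligned, 2)
    )


# Hypothetical toy alignment (not FHA sequences from this study).
toy = {"seq1": "AC-GT", "seq2": "ACTGA", "seq3": "TTTTT"}
print(percent_identity(toy["seq1"], toy["seq2"]))  # 75.0
print(max_pairwise_identity(toy))  # 75.0
```

Other conventions, for example dividing by the length of the shorter ungapped sequence or by the full alignment length, give different values for the same alignment, so identity thresholds are only comparable when the same definition is used.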
Alongside the ATPase component, two small globular proteins are also (almost) universally conserved throughout the T7 systems. The first of these is EsxA/TsxA, a helical hairpin protein of the WXG100 family. EsxA is secreted by the T7SS, either as a homodimer by the T7SSb or as a heterodimer with a paralogous partner protein, EsxB, by the T7SSa. EsxA secretion is essential for the secretion of other T7SS substrates (Fortune et al. 2005, Kneuper et al. 2014). However, it is unknown whether EsxA activity is required both in the cytoplasm, to support secretion, and extracellularly, after secretion. A C-terminal sequence present on the EsxA homodimer and the EsxAB heterodimer interacts with the EssC/EccC ATPase domains, regulating their conformation and activity (Champion et al. 2006, Rosenberg et al. 2015, Mietrach et al. 2020). This indicates a critical role inside the cell during the secretion process, but there is also evidence that at least some EsxA proteins have effector functions following secretion, through formation of pores in target membranes (de Jonge et al. 2007, Conrad et al. 2017, Spencer et al. 2021, Tak et al. 2021). Of the eight novel systems identified, most resemble the T7SSb in encoding a single esxA orthologue at the T7SS gene cluster. The exceptions are the T7SSe and T7SSg, where a nonidentical pair of EsxA paralogues is found, akin to EsxA and EsxB of the T7SSa. Surprisingly, no clear esxA gene could be identified in the T7SS clusters of T7SSi-encoding strains. Given that EsxA proteins are essential for T7SS function, it would be unexpected for the T7SSi system to function without one or a pair of core WXG100 proteins. It should be noted, however, that there are very few examples of T7SSi-encoding strains in the RefSeq database, and the availability of further genome sequences may allow us to identify TsxA with more confidence in future.

Figure 12.
Strains that encode the T7SSc-T7SSj systems also variably encode candidate toxin substrates, immunity proteins and Laps at their T7SS loci. Clinker output showing system-representative loci with genes annotated. The loci are centred on the tsxC gene of each system. Two arrangements are shown for each of systems c, d, and f, to highlight the variation seen in substrate clusters.
The second small conserved protein is EsaB/TsxB. The structure of the EsaB orthologue from B. subtilis, YukD, has been solved; it adopts the same fold as ubiquitin but lacks the C-terminal Gly-Gly motif that is essential for conjugation of ubiquitin to target lysines (van den Ent and Löwe 2005). A ubiquitin-related protein, in each case lacking the C-terminal Gly-Gly motif, was identified for all of the novel systems except the T7SSd. EsaB is essential for the function of the T7SSb, but its precise role is unclear (Casabona et al. 2017). Unexpectedly, cryoEM analysis of a protomer of the T7SSa ESX-3 complex revealed a cytoplasmic domain with the same fold as EsaB at the C-terminus of the polytopic EccD protein (Famelis et al. 2019, Poweleit et al. 2019). In the assembled ESX-5 system, this EccD domain interacts with the first nucleotide-binding domain of EccC (Beckham et al. 2021), and it is therefore likely that the globular EsaB/TsxB proteins similarly interact with their cognate ATPases.
Across the eight novel systems, predicted kinases and/or phosphatases were often found, and three of the systems (T7SSe, T7SSf, and T7SSj) also had a separate FHA domain-containing component. In these three systems, the phosphothreonine recognition motif on the TsxC ATPase is also well conserved. This raises the possibility that some of the T7SS subtypes may be regulated by phosphorylation and dephosphorylation. Assembly and activity of the Gram-negative type VI secretion system (T6SS) is also regulated by threonine phosphorylation. In the Agrobacterium tumefaciens T6SS, phosphorylation of one of the T6SS membrane proteins by a membrane-bound kinase promotes its interaction with an FHA domain protein and activates the secretion system (Lin et al. 2014). Interestingly, kinase-dependent regulation of the T7SSb has also been reported in Enterococcus faecalis (Chatterjee et al. 2020, 2021). Membrane damage caused by incoming phage attack is detected by IreK, a membrane-bound serine-threonine kinase involved in cell envelope homeostasis, resulting in transcriptional activation of the T7SS locus (Kristich et al. 2007, Chatterjee et al. 2021).
EspG chaperones have been well characterized in the T7SSa system, and a related chaperone, EsaE, is found in some (but not all) T7SSb systems. A protein with predicted similarity to EspG is also encoded at the T7SSh and T7SSi loci. Each of the novel systems was also associated with substrate protein families related to those of the T7SSb, including LXG domain and RHS proteins, and lipases with a reverse domain arrangement. Most substrates we identified were encoded at loci with genes for small Lap-related partner proteins, pointing to a common mechanism for substrate secretion.
The T7SSg and T7SSh clusters are always encoded adjacent to genes for Tad pilus components. Tad pili are part of the type 4 pili superfamily and have roles in cell adherence, biofilm formation, and contact-dependent bacterial killing (Kachlany et al. 2001, Seef et al. 2021). It is not clear whether these pili are linked with the T7SSs, for example through a common form of regulation or through shared biological functions, or whether their genetic colocation is coincidental. This underlines one of the shortfalls of genetic neighbourhood analysis when relatively few genome sequences are available within a genus, and it remains possible that there are further T7 components for some of these systems that we have yet to identify.
In some instances, copies of both the T7SSf and T7SSj gene clusters were encoded within the same clostridial genome sequence. This is somewhat analogous to the multiple ESX paralogues seen in mycobacteria, although in the clostridial examples these systems have different repertoires of core components and should be considered orthologues rather than paralogues (Fig. 3). Interestingly, paralogous copies of the T7SSa ESX-4 have been found encoded on plasmids (Dumas et al. 2016, Newton-Foot et al. 2016). It is currently unclear whether this is also the case for other T7SS variants, and this will require further analysis when additional sequences are available.
Taken together, our analyses describe eight novel genetic arrangements of the T7SS. We anticipate that these findings will underpin further investigation into the diversity of the T7SS in Gram-positive bacteria.

Figure 1. Schematic representation of (left) the T7SSa and (right) the T7SSb systems. Core components that are common to the two systems are shaded in black. Note that EsaE is only found in some T7SSb systems.

Figure 2. Maximum likelihood tree from an amino acid alignment of a representative sample of EssC/EccC orthologues from each T7SS identified in Supplementary Data 1 and 2. The nodes are labelled with bootstrap values, based on 1000 iterations. The scale bar depicts evolutionary distance as the number of amino acid substitutions per site.

Figure 3. Genetic arrangements of the T7SS from the indicated organisms. Genes coding for components that are related between the different systems are shaded similarly. To simplify the nomenclature of the T7SSc-T7SSj systems, we propose to name the components Ts(x), where 'x' refers to the letter associated with that system; for example, Tsc components are found in the T7SSc. The TsxA, TsxB, and TsxC components are (almost) universally conserved and are orthologues of EsxA, EsaB, and EccC/EssC, respectively. No TsxA-encoding gene is shown in the T7SSi gene cluster as we were unable to identify a likely candidate (see text).

Figure 4. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSc.

Figure 5. (A) Predicted domain arrangement and (C) predicted subcellular location of components of the T7SSd. (B) Overlay of the structural models for the kinase domains of TscD and TseD. For the structural model, the RMSD for the kinase domain is 1.034 angstroms across 61 pruned atom pairs and 97.233 angstroms across all 514 pairs.

Figure 6. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSe.

Figure 8. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSg.

Figure 9. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSh.

Figure 10. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSi.

Figure 11. (A) Predicted domain arrangement and (B) predicted subcellular location of components of the T7SSj.