Characterization of the cluster MabR prophages of Mycobacterium abscessus and Mycobacterium chelonae

Abstract
Mycobacterium abscessus is an emerging pathogen of concern in cystic fibrosis and immunocompromised patients and is considered one of the most drug-resistant mycobacteria. The majority of clinical Mycobacterium abscessus isolates carry 1 or more prophages that are hypothesized to contribute to virulence and bacterial fitness. The prophage McProf was identified in the genome of the Bergey strain of Mycobacterium chelonae and is distinct from previously described prophages of Mycobacterium abscessus. The McProf genome increases intrinsic antibiotic resistance of Mycobacterium chelonae and drives expression of the intrinsic antibiotic resistance gene, whiB7, when superinfected by a second phage. The prevalence of McProf-like genomes was determined in sequenced mycobacterial genomes. Related prophage genomes were identified in the genomes of 25 clinical isolates of Mycobacterium abscessus and assigned to the novel cluster, MabR. They share less than 10% gene content with previously described prophages; however, they share features typical of prophages, including polymorphic toxin–immunity systems.


Introduction
Prophages are viral genomes integrated into bacterial genomes, and they contribute to the genetic diversity and virulence of many bacterial pathogens (Figueroa-Bossi et al. 2001; Brüssow et al. 2004; Fortier and Sekulovic 2013; Wang and Wood 2016; Costa et al. 2018; Fortier 2018). Clinically important nontuberculous mycobacteria (NTM), such as Mycobacterium abscessus, often cause drug-resistant infections and continue to be a significant public health burden (Nasiri et al. 2017). The majority of clinical NTM isolates carry prophage genomes that are enriched in genes that potentially promote bacterial fitness and virulence (Glickman et al. 2020; Dedrick et al. 2021).
The prophages of M. abscessus are highly diverse and distinct from the mycobacteriophage genomes in the Actinobacteriophage database of phagesdb.org (Russell and Hatfull 2017; Dedrick et al. 2021). Dedrick et al. (2021) identified 122 prophage sequences in 82 clinical isolates of M. abscessus, of which 67 were unique. These were sorted into 17 Mab clusters (MabA-MabQ) based on shared gene content (>35% shared genes) (Dedrick et al. 2021). Many of the prophages encode toxin/antitoxin and polymorphic toxin-immunity (PT-Imm) systems that are hypothesized to contribute to virulence (Zhang et al. 2012; Dedrick et al. 2021). We recently described a novel prophage genome, named McProf, in the genome of Mycobacterium chelonae (M. chelonae CCUG 47445 coordinates 1,521,426-1,589,648) that shares only 10% gene content with the Dedrick et al. prophages but encodes numerous genes expressed during lysogeny, including a PT-Imm system (Cushman et al. 2021).
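The cluster assignments above rest on shared gene content. As a minimal sketch, assuming genes have already been grouped into phamilies (phams) of homologs (for example with Phamerator), one common formulation of gene content similarity averages the fraction of each genome's phams that are shared with the other genome. Whether Dedrick et al. (2021) computed it exactly this way is an assumption here, and the pham labels are hypothetical.

```python
from __future__ import annotations


def gene_content_similarity(phams_a: set[str], phams_b: set[str]) -> float:
    """Average, over both genomes, of the fraction of phams shared with the other genome."""
    if not phams_a or not phams_b:
        raise ValueError("each genome must contain at least one pham")
    shared = len(phams_a & phams_b)
    return (shared / len(phams_a) + shared / len(phams_b)) / 2


def same_cluster(phams_a: set[str], phams_b: set[str], threshold: float = 0.35) -> bool:
    """True if two genomes exceed the shared-gene-content threshold (35% assumed)."""
    return gene_content_similarity(phams_a, phams_b) > threshold


# Hypothetical pham sets for two prophage genomes.
genome_a = {"pham1", "pham2", "pham3", "pham4"}
genome_b = {"pham1", "pham2", "pham5", "pham6", "pham7", "pham8"}
gcs = gene_content_similarity(genome_a, genome_b)
```

Here 2 of 4 and 2 of 6 phams are shared, so the score is (0.5 + 0.333)/2, about 0.42, which clears a 35% cutoff; a pair sharing less than 10% of their genes, as McProf does with the earlier Mab prophages, would not.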
McProf contributes to the intrinsic drug resistance of M. chelonae and increases expression of whiB7, the conserved mycobacterial regulator of intrinsic antibiotic resistance genes, when superinfected by a second mycobacteriophage (Cushman et al. 2021). Determining the prevalence of this novel prophage genome and its relationship with known prophage genomes is therefore important for understanding the role of prophages in mycobacterial fitness and virulence.
In this study, prophage genomes related to McProf were identified in 25 published genomes of M. abscessus and in 1 genome of Mycobacterium phlei. Their gene content was compared with that of the prophage genomes described by Dedrick et al. (2021), and they were sorted into a novel cluster, MabR (Dedrick et al. 2021). Here, we report 5 unique cluster MabR genomes: 4 M. abscessus prophages and the original M. chelonae prophage, McProf.

Materials and methods
Identification and extraction of prophage from mycobacterial genomes
Prophage genomes were identified within bacterial genome sequences (Arndt et al. 2016). Precise coordinates were determined by manual inspection of the prophage genomes and identification of the repeat sequences that flank each prophage genome and represent the common core of the attL/attR sites. Each prophage sequence was extracted with the identified attachment sites defining the genome ends. Prophages were named according to the strain in which they reside (i.e. prophiXXXX01-1), with suffixes used to denote multiple prophages in the same genome, as described by Dedrick et al. (2021).
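Taking the attachment sites as the genome ends can be sketched as follows, assuming the common attL/attR core has already been identified by manual inspection, occurs as an exact direct repeat on the same strand of one contig, and that a search window around the predicted prophage region is known. The function names are hypothetical; prophage_name simply formats the prophi[strain]-# naming convention described above.

```python
from __future__ import annotations

from typing import Optional


def extract_prophage(contig: str, core: str, start: int = 0,
                     end: Optional[int] = None) -> tuple[int, int, str]:
    """Return 1-based inclusive coordinates and the sequence spanning the leftmost
    (attL) and rightmost (attR) copies of the core within contig[start:end]."""
    end = len(contig) if end is None else end
    left = contig.find(core, start, end)    # attL core
    right = contig.rfind(core, start, end)  # attR core
    if left == -1 or left == right:
        raise ValueError("expected two copies of the attachment core in the window")
    stop = right + len(core)  # exclusive 0-based end, so attR is included
    return left + 1, stop, contig[left:stop]


def prophage_name(strain: str, index: int) -> str:
    """Name a prophage by its host strain and its order within that genome."""
    return f"prophi{strain}-{index}"


# Toy contig: 'ATTCC' stands in for a (much longer) attachment core.
toy_contig = "GGG" + "ATTCC" + "AAAATTTT" + "ATTCC" + "GG"
coords = extract_prophage(toy_contig, "ATTCC")
```

Keeping both core copies in the extracted sequence mirrors the choice of defining genome ends by the attachment sites; a real pipeline would also need to handle prophages split across contigs.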

Prophage genome annotation and comparative genomics
Prophage genes were predicted using Glimmer and GeneMark within DNA Master (http://cobamide2.bio.pitt.edu/) and PECAAN (https://discover.kbrinsgd.org/) (Delcher et al. 1999; Borodovsky et al. 2003). The start site for each gene was determined through manual inspection. Gene functions were predicted using the web-based tools HHpred and NCBI BLASTp (Altschul et al. 1990; Söding et al. 2005). Dot plots were constructed with Gepard using default settings (Krumsiek et al. 2007). The prophage network phylogeny, based on shared gene content, was created in SplitsTree (Huson 1998). Genome maps were created using Phamerator and the "Actino_Mab_Draft" database, version 19 (Cresawn et al. 2011). Integration sites were predicted by comparing the bacterial sequence flanking each prophage genome with that of M. abscessus ATCC 19977. Specific integration locations were then determined by probing the predicted integration region with the attL sequence of each prophage. Alignments with 100% sequence identity were considered to be core attB sites.
[…] (Table 2). The ends of the prophage genomes were determined by the left and right attachment sites flanking the prophage genomes (Table 2). Prophages were named by the strain from which they were extracted and the number of prophages identified in that strain: prophi[strain]-# (Table 2). McProf and the 4 McProf-like prophage genomes (prophiFSAT01-1, prophiFSIL01-1, prophiFSQJ01-1, and prophiFVLQ01-1) share less than 10% gene content with the M. abscessus prophages described by Dedrick et al. (2021) and were assigned to a novel cluster, MabR (Fig. 1a). Overall, the MabR prophages share high nucleotide similarity with one another (Fig. 1b).
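The final integration step, accepting only 100%-identity matches as core attB sites, can be approximated with an exact string search. This sketch scans both strands of a reference sequence (for example, the relevant region of M. abscessus ATCC 19977) for an attL core; it substitutes a plain exact match for whatever alignment tool was actually used, and the toy sequences are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_core_sites(host: str, core: str) -> list[tuple[int, int, str]]:
    """All exact (100% identity) matches of core in host, reported as 1-based
    inclusive top-strand coordinates plus the strand of the match."""
    host = host.upper()
    core = core.upper()
    queries = [("+", core)]
    rc = reverse_complement(core)
    if rc != core:  # a palindromic core would otherwise be reported twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        pos = host.find(query)
        while pos != -1:
            hits.append((pos + 1, pos + len(query), strand))
            pos = host.find(query, pos + 1)  # allow overlapping matches
    return sorted(hits)


toy_host = "CCCTTGACAGGGTGTCAAGG"
sites = find_core_sites(toy_host, "TTGACA")
```

An exact search is strict by design: any mismatch disqualifies a site, which matches the 100% identity criterion but will miss diverged copies that an aligner would report.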

Integration locations
The integration sites of the MabR prophages were determined and compared with those of the prophages described by Dedrick et al. (2021). The prophage genomes integrated into known M. abscessus attB sites, often at the 3′ end of tRNA genes (Table 3). Three prophage genomes, McProf, prophiFSAT01-1, and prophiFVLQ01-1, integrate into the 3′ end of a tRNA-Lys gene (attB-18), as described by Dedrick et al. (2021) (Fig. 2). prophiFSIL01-1 integrates into the 3′ end of a tRNA-Lys gene (attB-22), and prophiFSQJ01-1 integrates into Mab_0771c (attB-23), a predicted major transport protein. attB-23 was the only cluster MabR integration site identified within a protein-coding sequence. The attB core sequences and coordinates for each identified integration site are listed in Table 3.
Table footnote: Coordinates are given for each prophage in the host where it was first identified (e.g. prophiFSAT01-1 in the genome FSAT01). The contig number (C1, C2, etc.) is shown, followed by the coordinates within that contig. Coordinates are arranged attL to attR.
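Whether an attB site lies at the 3′ end of a tRNA gene or inside a protein-coding sequence (as for attB-23) can be checked by overlapping the site coordinates with the host annotation. In this sketch the Feature record, the 20-bp 3′-end window, and all coordinates are hypothetical assumptions, not values taken from Table 3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    kind: str    # e.g. "tRNA" or "CDS"
    start: int   # 1-based inclusive, top-strand coordinates
    end: int
    strand: str  # "+" or "-"


def classify_site(site_start: int, site_end: int, features: list[Feature],
                  window: int = 20) -> list[tuple[str, str, bool]]:
    """For every feature overlapping the site, report (name, kind, at_3prime),
    where at_3prime means the site touches the last `window` bp of the feature."""
    hits = []
    for f in features:
        if site_start > f.end or site_end < f.start:
            continue  # no overlap with this feature
        if f.strand == "+":
            at_3prime = site_end >= f.end - window + 1      # 3' end is f.end
        else:
            at_3prime = site_start <= f.start + window - 1  # 3' end is f.start
        hits.append((f.name, f.kind, at_3prime))
    return hits


# Hypothetical annotation of a host region.
features = [
    Feature("tRNA-Lys", "tRNA", 100, 175, "+"),
    Feature("cds_example", "CDS", 1000, 2200, "-"),
]
```

A site reported as overlapping a CDS with at_3prime False corresponds to the "within a protein-coding sequence" case; an empty result means the site is intergenic in the supplied annotation.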

Genomic organization of cluster MabR genomes
MabR prophages have very similar genome architectures and regions of conserved gene content (Fig. 3). The genomes are tightly packed, typical of mycobacteriophage genomes, and contain 98-102 genes across approximately 67 kb. The integration and immunity cassettes are located immediately adjacent to the left attachment site (attL). All MabR genomes share a rightward-transcribed tyrosine integrase (gp1), a gene of unknown function (gp2), and a leftward-transcribed phage repressor (gp3; superinfection immunity repressor) (Figs. 3, 4 and S1). The immunity repressor is distinct from the immunity repressors encoded by other Mab cluster prophages; however, it is a homolog of the immunity repressors found in the genomes of 5 cluster K2 mycobacteriophages: DismalFunk, DismalStressor, Findely, Marcoliusprime, and Milly. Cro and excise genes (gp4 and gp5) are transcribed divergently from the immunity repressor (Figs. 3 and 4). The early lytic genes that follow show some diversity across the 5 MabR genomes, particularly in prophiFSQJ01-1. The structural, assembly, and lysis cassette genes are highly conserved across MabR genomes. Between the lysis cassette and the right attachment site (attR) is a group of diverse genes that are most likely expressed during lysogeny (Fig. 3) (Dedrick et al. 2017; Cushman et al. 2021). Some of the genes shared across all MabR genomes are unique to the cluster and include a DNA polymerase III sliding clamp, an ADP-ribosyl glycohydrolase, a helix-turn-helix DNA binding domain protein, and an AAA-ATPase.
Figure caption (fragment): […] purple is the most similar and red is the least similar above a BLASTN E-value threshold of 10⁻⁵. The ruler represents the coordinates of the genome. Forward- and reverse-transcribed genes are shown as boxes above and below the ruler, respectively. Maps were generated using Phamerator and the database "Actino_Mab_Draft (version 20)" (Cresawn et al. 2011).
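The divergent arrangement of the repressor (gp3, leftward) and the Cro and excise genes (gp4 and gp5, rightward) can be detected from gene order and strand alone. In this minimal sketch, genes are listed left to right with '+' for rightward and '-' for leftward transcription; the strand given for gp2 is an assumption for illustration, since its orientation is not stated in the text.

```python
from __future__ import annotations


def divergent_pairs(genes: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Adjacent gene pairs, in left-to-right genome order, whose transcription
    points away from each other: a leftward ('-') gene followed by a rightward ('+') gene."""
    return [
        (left, right)
        for (left, left_strand), (right, right_strand) in zip(genes, genes[1:])
        if left_strand == "-" and right_strand == "+"
    ]


# Left end of a MabR genome as described in the text; gp2's strand is assumed.
left_end = [("gp1", "+"), ("gp2", "+"), ("gp3", "-"), ("gp4", "+"), ("gp5", "+")]
pairs = divergent_pairs(left_end)
```

The only divergent boundary in this layout falls between gp3 and gp4, the repressor/Cro switch region described above.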
Immediately adjacent to attR, all MabR prophage genomes encode a reverse-transcribed PT-Imm system that includes an ESAT6-like WXG-100 protein, a polymorphic toxin (PT), and a cognate immunity protein (Figs. 3 and 5). Dedrick et al. (2021) identified 21 distinct, modular PT-Imm systems across 50 M. abscessus prophages. These systems consist of a large PT, a cognate immunity protein that prevents self-toxicity, and at least 1 ESAT6-like WXG-100 protein. The cluster MabR genomes contain one of 2 types of PT-Imm systems (Figs. 3 and 5). The PT in the McProf and prophiFSIL01-1 genomes has an N-terminal WXG-100 domain and a C-terminal Tde-like DNase toxin domain (Ntox15, PF15604) (Ma et al. 2014; Cushman et al. 2021). Downstream is the Tdi-like immunity protein, with GAD-like and DUF1851 domains (Ma et al. 2014). This PT-Imm system is also found in the genome of prophiGD43A-5 (Fig. 6). Although the 3 PT genes carry the same Ntox15 domain, they share low sequence identity across the linker and WXG-100 domains. In the NCBI database, this PT-Imm system is also found in the genome of Mycobacterium phage phiT46-1 (accession number NC_054432.1) and in numerous mycobacterial species, including M. abscessus, Mycobacterium goodii, and Mycobacterium salmoniphilum.
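The modular PT architecture described above (an N-terminal WXG-100 domain plus a C-terminal toxin domain) can be checked from domain-search output. This sketch assumes domain hits, for example from HHpred, have been reduced to (name, start, end) residue coordinates, and assigns each hit to the N- or C-terminal half by its midpoint, a deliberate simplification; the protein length and hit coordinates are hypothetical.

```python
from __future__ import annotations


def domain_termini(length: int, hits: list[tuple[str, int, int]]) -> dict[str, list[str]]:
    """Assign each domain hit to the N- or C-terminal half of the protein by its midpoint."""
    termini: dict[str, list[str]] = {"N": [], "C": []}
    for name, start, end in sorted(hits, key=lambda hit: hit[1]):
        midpoint = (start + end) / 2
        termini["N" if midpoint <= length / 2 else "C"].append(name)
    return termini


def is_ntox15_pt(length: int, hits: list[tuple[str, int, int]]) -> bool:
    """True for the McProf-type PT layout: N-terminal WXG-100, C-terminal Ntox15."""
    termini = domain_termini(length, hits)
    return "WXG-100" in termini["N"] and "Ntox15" in termini["C"]


# Hypothetical domain hits for a 1,400-residue PT.
mcprof_like = [("WXG-100", 5, 95), ("Ntox15", 1250, 1390)]
no_toxin = [("WXG-100", 5, 95)]
```

Splitting at the midpoint ignores the variable linker between domains, which is acceptable for a yes/no layout check but not for mapping domain boundaries.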

Polymorphic toxin systems
The genomes of prophiFSAT01-1, prophiFSQJ01-1, and prophiFVLQ01-1 carry a gene cassette that is organized like a PT-Imm system and encodes an ESAT6-like WXG-100 protein (Fig. 5). However, we were unable to predict toxin or immunity domains. The presumed PT gene encodes an N-terminal WXG-100 domain but lacks an identifiable C-terminal toxin domain. Likewise, the downstream gene lacks domains known to be associated with immunity, such as SUKH or Imm (Zhang et al. 2012; Dedrick et al. 2021). This second type of PT-Imm system is also found in the cluster MabQ genome prophiGD79-1 (Fig. 6) (Dedrick et al. 2021).
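The two MabR PT-Imm arrangements can be told apart by whether the PT carries a recognizable toxin domain and whether the downstream gene carries an immunity domain. In this sketch the domain vocabularies are deliberately limited to the names mentioned in the text (Ntox15 for toxins; SUKH, Imm-family, GAD-like, and DUF1851 for immunity), and the returned labels are hypothetical, not nomenclature from the study.

```python
from __future__ import annotations

TOXIN_DOMAINS = {"Ntox15"}
IMMUNITY_DOMAINS = {"SUKH", "GAD-like", "DUF1851"}


def has_immunity_domain(domains: set[str]) -> bool:
    # Imm families are numbered (Imm1, Imm2, ...), so match them by prefix.
    return any(d in IMMUNITY_DOMAINS or d.startswith("Imm") for d in domains)


def classify_cassette(pt_domains: set[str], downstream_domains: set[str]) -> str:
    """Label a WXG-100-containing cassette by its predicted toxin/immunity content."""
    if "WXG-100" not in pt_domains:
        return "no WXG-100 PT"
    toxin = bool(pt_domains & TOXIN_DOMAINS)
    immunity = has_immunity_domain(downstream_domains)
    if toxin and immunity:
        return "toxin and immunity predicted"
    if not toxin and not immunity:
        return "no toxin or immunity predicted"
    return "partial"
```

With the domain content described above, the McProf and prophiFSIL01-1 cassettes would fall into the first category and the prophiFSAT01-1, prophiFSQJ01-1, and prophiFVLQ01-1 cassettes into the second.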

Discussion
The majority of bacterial pathogens carry prophages that are known to contribute to bacterial virulence and fitness (Figueroa-Bossi et al. 2001; Brüssow et al. 2004; Wang and Wood 2016). Prophages introduce novel genes into bacterial genomes that can produce phenotypes that are more competitive in bacterial populations (Brüssow et al. 2004; Wang and Wood 2016). The prophage McProf is found in the Bergey strain of M. chelonae (ATCC 35752) and increases bacterial resistance to aminoglycosides (Cushman et al. 2021). Although the McProf genome is distinct from the M. abscessus prophages described by Dedrick et al. (2021), it is clearly related to a novel subgroup of prophage genomes identified in clinical M. abscessus isolates and was therefore assigned to the novel cluster MabR.
Most MabR prophages were identified in the genomes of M. abscessus isolates, although a prophage genome sharing 100% nucleotide identity with McProf was identified in M. phlei. Of the 25 MabR genomes identified in M. abscessus strains, only 4 were unique, and each was typically found in isolates of the same geographic origin (Table 1). Strains of the same geographic origin also typically carried identical cohabiting prophages, suggesting that these bacterial strains are closely related.
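Reducing the 25 M. abscessus MabR genomes to 4 unique ones amounts to grouping identical sequences. As a minimal sketch, assuming each prophage was extracted attL to attR so that identical genomes share the same start, sequences can be keyed on a canonical form (the lexicographically smaller of the sequence and its reverse complement) and hashed. The names and toy sequences are hypothetical, and near-identical genomes would instead need an alignment-based identity cutoff.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seq: str) -> str:
    """Strand-independent form: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, seq.translate(COMPLEMENT)[::-1])


def group_identical(prophages: dict[str, str]) -> list[list[str]]:
    """Group prophage names whose sequences are identical on either strand."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in prophages.items():
        key = hashlib.sha256(canonical(seq).encode()).hexdigest()
        groups[key].append(name)
    return sorted(sorted(members) for members in groups.values())


toy = {"p1": "ACGTTG", "p2": "ACGTTG", "p3": "TTGAC", "p4": "GTCAA"}
groups = group_identical(toy)
```

The number of groups is the number of unique genomes; hashing keeps memory small when comparing many ~67-kb sequences, at the cost of detecting only exact identity.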
The MabR prophage genomes, although distinct in overall gene content, share a genome organization and some gene features that are typical of the prophages described by Dedrick et al. […]
Figure caption: Organization of MabR PT-Imm systems. MabR genomes are aligned at their PT-Imm systems, beginning at the 3′ end of the predicted immunity proteins. Genomes are displayed as described in Fig. 2 but are ordered so that the genomes most similar in this region are adjacent. Also shown are the motifs/domains found at the N- and C-termini of MabR PTs. All predicted PTs have a single WXG-100 motif at the N-terminus, while the C-terminus is variable. Note that gp99 in prophiFSIL01-1 and prophiFSAT01-1 has no predicted function and is included to show the relationship of the PT systems to the genome ends.
[…] Agrobacterium (Ma et al. 2014). It is not yet known whether mycobacterial prophage-encoded toxins are secreted, but it is hypothesized that the toxin dimerizes with the small WXG-100 protein (gp99 in McProf) via their WXG-100 domains and is secreted by a mycobacterial type VII secretion system (T7SS; Esx-3 or Esx-4) (Zhang et al. 2012; Cushman et al. 2021; Dedrick et al. 2021).
It is not yet clear whether the PT-Imm systems of the MabR prophages are important for bacterial fitness, but the presence of the McProf genome is known to increase M. chelonae resistance to aminoglycosides relative to a nonlysogenic strain (Cushman et al. 2021). The addition of a second prophage, the cluster G phage BPs, to this strain further increased aminoglycoside resistance and increased the expression of mycobacterial antibiotic resistance genes in the whiB7 regulon, including whiB7 itself (Sampson et al. 2009; Cushman et al. 2021). This large change in whiB7 expression and aminoglycoside resistance is driven by the presence of the McProf genome, as it is not observed in strains carrying the BPs prophage alone. Sixteen genes expressed from the McProf genome during lysogeny of M. chelonae could contribute to the altered whiB7 expression and increased aminoglycoside resistance (Cushman et al. 2021). Many of these genes are common across the MabR genomes, including the McProf PT-Imm cassette (gp97-99), gp91 and gp92, and gp85 and gp86 (Fig. 3). A better understanding of the functions of these genes and the roles they may play in mycobacterial fitness will improve our overall understanding of how prophages contribute to mycobacterial virulence.
Supplemental material is available at G3 online.