TADB 3.0: an updated database of bacterial toxin–antitoxin loci and associated mobile genetic elements

Abstract

TADB 3.0 (https://bioinfo-mml.sjtu.edu.cn/TADB3/) is an updated database that provides comprehensive information on bacterial types I to VIII toxin–antitoxin (TA) loci. Compared with the previous version, it introduces three major improvements. First, with the aid of text mining and manual curation, it records the details of 536 TA loci with experimental support, including 102, 403, 8, 14, 1, 1, 3 and 4 TA loci of types I to VIII, respectively. Second, by leveraging the upgraded TA prediction tool TAfinder 2.0 with a stringent strategy, TADB 3.0 collects 211 697 putative types I to VIII TA loci predicted in 34 789 completely sequenced prokaryotic genomes, providing researchers with a large-scale dataset for follow-up analysis and characterization. Third, based on their genomic locations, the relationships between 69 019 TA loci and 60 898 mobile genetic elements (MGEs) are visualized as interactive networks accessible through the user-friendly web page. With these updates, TADB 3.0 may provide improved in silico support for understanding the biological roles of TA pairs in prokaryotes and their functional associations with MGEs.

SUPPLEMENTARY DATA

Figure S1. Statistics module in TADB 3.0. (A) Interactive networks displaying the TA-MGE relationships. A link between a TA locus and an MGE indicates that the TA locus is located within the MGE or is flanked by ISs/transposons at an interval of <5 kb. Note that only experimentally validated TA loci are displayed. (B) Interactive networks displaying the homology relationships among different types of toxin proteins and antitoxin proteins. A link between two proteins indicates that the BLASTp-based H-value between them is greater than 0.36 (1,2). (C) Interactive pie charts showing the taxonomic distribution of in silico predicted TA loci. Users can browse the TA loci in a specific taxon by clicking on the corresponding pie chart region.
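The linking rule in panel (B) can be illustrated with a minimal Python sketch. It assumes the pairwise H-values have already been computed from BLASTp results; the function name, the protein identifiers and the numeric values are hypothetical and are not taken from TADB 3.0.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

H_VALUE_CUTOFF = 0.36  # cutoff stated in the caption


def build_homology_network(
    h_values: Dict[Tuple[str, str], float],
    cutoff: float = H_VALUE_CUTOFF,
) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map linking proteins whose H-value exceeds the cutoff."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for (protein_a, protein_b), h_value in h_values.items():
        # A link requires a strictly greater H-value; self-hits are ignored.
        if protein_a != protein_b and h_value > cutoff:
            network[protein_a].add(protein_b)
            network[protein_b].add(protein_a)
    return dict(network)


# Hypothetical identifiers and made-up H-values, for illustration only.
example_h_values = {
    ("RelE_1", "RelE_2"): 0.82,
    ("RelE_1", "MazF_1"): 0.12,
    ("MazF_1", "MazF_2"): 0.36,  # equal to the cutoff, so not linked
}
network = build_homology_network(example_h_values)
```

Proteins without any link above the cutoff do not appear in the returned map; whether the web interface draws such isolated nodes is not stated in the caption.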

Figure S2. Workflow of TAfinder 2.0, the prediction tool for types I to VIII TA loci. The updated TAfinder 2.0 predicts TA loci in four main steps. In the input section, two types of input are accepted: an annotated genome in GenBank format or an unannotated genome sequence in FASTA format. For an unannotated genome sequence, Prodigal is used to identify protein-coding sequences (CDSs) before sequence extraction (3). In the preprocess section, the extracted sequences are filtered by a user-defined maximum length (500 amino acids by default) and minimum length (30 amino acids by default). In the homology search section, the protein sequences are submitted to BLASTp and HMMER3 to search for protein homologues, while the genome sequence is submitted to BLASTn to identify RNA toxins and antitoxins. The E-value cutoff (0.01 by default) for BLAST and HMMER3 and the identity cutoff (30% by default) for BLAST are used to filter the results. In the TA pairing section, for types II to VII TA loci, the toxin hits and antitoxin hits must be located on the same strand, and a maximum intergenic distance (150 bp by default) is applied to identify the TA operon structure. For type I TA loci, the toxin gene and antitoxin RNA must be located on opposite DNA strands rather than forming an operon on the same strand. For type VIII TA loci, we took into consideration the two experimentally validated type VIII systems: the creTA loci have RNAs located either on the same strand or on opposite strands (4-6), whereas the SdsR-RyeA loci have the two RNAs located on opposite strands (7). Consequently, we predicted these two distinct type VIII TA loci based on their respective characteristics.
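The strand and distance rules of the TA pairing step can be sketched in Python as follows. This is an illustration under stated assumptions, not the TAfinder 2.0 source code: the `Hit` class, the function names and the coordinates are hypothetical, and reusing the 150 bp cutoff as a proximity window for opposite-strand (type I) pairs is an assumption that the caption does not confirm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    name: str
    role: str    # "toxin" or "antitoxin"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(a: Hit, b: Hit) -> int:
    """Bases between two features; negative when the features overlap."""
    first, second = sorted((a, b), key=lambda h: h.start)
    return second.start - first.end - 1


def pair_hits(hits: List[Hit], same_strand: bool = True,
              max_distance: int = 150) -> List[Tuple[Hit, Hit]]:
    """Pair toxin and antitoxin hits by strand rule and maximum intergenic distance.

    same_strand=True follows the rule for types II to VII;
    same_strand=False follows the opposite-strand rule for type I.
    Applying max_distance to type I pairs is an assumption of this sketch.
    """
    toxins = [h for h in hits if h.role == "toxin"]
    antitoxins = [h for h in hits if h.role == "antitoxin"]
    pairs = []
    for toxin in toxins:
        for antitoxin in antitoxins:
            if (toxin.strand == antitoxin.strand) != same_strand:
                continue
            if intergenic_distance(toxin, antitoxin) <= max_distance:
                pairs.append((toxin, antitoxin))
    return pairs


# Hypothetical hits on one replicon, for illustration only.
hits = [
    Hit("tox1", "toxin", 100, 400, "+"),
    Hit("anti1", "antitoxin", 450, 700, "+"),    # 49 bp downstream, same strand
    Hit("anti2", "antitoxin", 1000, 1200, "+"),  # 599 bp away, too far
    Hit("rna1", "antitoxin", 380, 440, "-"),     # overlapping, opposite strand
]
operon_pairs = pair_hits(hits, same_strand=True)
antisense_pairs = pair_hits(hits, same_strand=False)
```

With these example coordinates, the same-strand call keeps only the `tox1`/`anti1` pair and the opposite-strand call keeps only `tox1`/`rna1`.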

Figure S3. Prediction of types I to VIII TA loci and identification of TA-associated MGEs. TA loci and MGEs were predicted using TAfinder 2.0 and VRprofile2, respectively. To increase the specificity and reliability of the TA loci predictions, only toxin and antitoxin BLAST hits with an H-value > 0.36 (2) were kept. TA-associated MGEs are defined as MGEs harboring TA loci or ISs/transposons that flank TA loci at an interval of <5 kb.
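The definition of a TA-associated MGE can be expressed as a small coordinate check. The Python sketch below assumes that both features lie on the same replicon, that "harboring" means full containment of the TA locus, and that an IS/transposon overlapping a TA locus counts as a gap of 0; the `Region` class, the names and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    kind: str   # e.g. "TA", "plasmid", "prophage", "genomic island", "IS", "Tn"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


FLANKING_KINDS = {"IS", "Tn"}  # MGE kinds that may flank a TA locus
MAX_FLANK_GAP = 5000           # the <5 kb interval from the caption


def gap(a: Region, b: Region) -> int:
    """Bases separating two regions; 0 when they overlap or touch."""
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def is_ta_associated(ta: Region, mge: Region, max_gap: int = MAX_FLANK_GAP) -> bool:
    """True if the MGE harbors the TA locus, or is an IS/Tn within max_gap of it."""
    if mge.start <= ta.start and ta.end <= mge.end:
        return True
    if mge.kind in FLANKING_KINDS:
        return gap(ta, mge) < max_gap
    return False


# Hypothetical coordinates, for illustration only.
ta = Region("ta1", "TA", 10_000, 10_600)
prophage = Region("phi1", "prophage", 8_000, 20_000)
near_is = Region("IS_a", "IS", 12_000, 13_000)
far_is = Region("IS_b", "IS", 20_000, 21_000)
island = Region("GI_1", "genomic island", 11_000, 15_000)
```

In this example the prophage contains the TA locus and `IS_a` lies 1399 bp away, so both are associated; `IS_b` is too far, and the genomic island is neither a container nor an IS/transposon.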

Figure S4. Taxonomic distribution and TA family distribution of the in silico predicted types I to VIII TA loci among the prokaryotic organisms archived in TADB 3.0. Panels (A-H) display the distributions of types I to VIII loci, respectively. The upper bar plots display the number of strains of each species harboring each type of predicted TA loci. The number of TA loci of each type in each species is also displayed, both in total and per strain. The lower stacked bar plots display the TA family distribution of each type of TA system within each species. For each type of TA system, only the 20 species with the highest numbers of predicted TA loci are displayed.

Figure S5. The numbers of various types of MGEs associated with TA loci.

Figure S6. Distribution of the TA-MGE relationships among mobile genetic elements. Panels (A-H) show the distributions of TA families in various MGEs, including plasmids, genomic islands, prophages, ISs/transposons (Tn), IS cluster/Tn, integrative and conjugative elements (ICEs) and integrons (Supplementary Figure S4), respectively. For each MGE type, only the top 10 associated TA families are displayed. For each TA family, the number of TA-MGE relationships is displayed.

Table S1. Statistical summary of experimentally validated and in silico TA loci archived in TADB 3.0 compared to previous database versions.

Table S2. Statistical distribution of predicted TA-MGE relationships for each TA type and MGE type. Abbreviations: GI: genomic island; IS: insertion sequence; Tn: transposon; ICE: integrative and conjugative element. Note: The TA-MGE relationships were counted separately if one TA locus was associated with multiple MGEs or one MGE was associated with multiple TA loci.
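The counting rule in the note can be read as one count per distinct TA-MGE pair. The Python sketch below tallies such pairs by TA type and MGE type; the record layout, the function name and the identifiers are hypothetical, and the example records are made up rather than drawn from the table.

```python
from collections import Counter
from typing import Iterable, Tuple

# (TA locus ID, TA type, MGE ID, MGE type); the layout is an assumption of this sketch.
Relationship = Tuple[str, str, str, str]


def count_relationships(relationships: Iterable[Relationship]) -> Counter:
    """Count each distinct TA-MGE pair once, keyed by (TA type, MGE type)."""
    unique_pairs = set(relationships)  # drop exact duplicate records
    return Counter((ta_type, mge_type) for _, ta_type, _, mge_type in unique_pairs)


# Made-up records: ta1 is linked to two MGEs, and plasmid p1 to three TA loci.
records = [
    ("ta1", "II", "p1", "plasmid"),
    ("ta1", "II", "is1", "IS"),
    ("ta2", "II", "p1", "plasmid"),
    ("ta3", "I", "p1", "plasmid"),
    ("ta3", "I", "p1", "plasmid"),  # duplicate record, counted once
]
counts = count_relationships(records)
```

Under this reading, a TA locus associated with two MGEs contributes two relationships, and an MGE carrying three TA loci contributes three.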