SITVITBovis—a publicly available database and mapping tool to get an improved overview of animal and human cases caused by Mycobacterium bovis

Abstract
Limited data are available on bovine tuberculosis and the infections it can cause in humans and other mammals. We therefore constructed a publicly accessible database, SITVITBovis, that incorporates genotyping and epidemiological data on Mycobacterium bovis. It also includes limited data on Mycobacterium caprae (previously known as M. bovis subsp. caprae), which can infect both animals and humans. SITVITBovis incorporates data on 25 741 isolates corresponding to 60 countries of origin (75 countries of isolation). It reports a total of 1000 spoligotype patterns: 537 spoligotype international types (SITs, containing 25 278 clinical isolates) and 463 orphan patterns, allowing a wide overview of the geographic distribution of the various phylogenetic sublineages (BOV_1, BOV_2, BOV_3 and BOV_4-CAPRAE). The SIT identifiers of SITVITBovis were compared to the SB numbers of the Mbovis.org database to facilitate cross-checking between databases. SITVITBovis also contains limited information on mycobacterial interspersed repetitive units-variable number of tandem repeats, when available. Significant differences were observed when comparing the age and gender of patients with human isolates, as well as the various hosts. The database includes information on the regions where a strain was isolated as well as the hosts involved, making it possible to see geographic trends. SITVITBovis is publicly accessible at: http://www.pasteur-guadeloupe.fr:8081/SITVIT_Bovis. Finally, a second version is currently in progress to allow querying of associated whole-genome sequencing data.
Database URL: http://www.pasteur-guadeloupe.fr:8081/SITVIT_Bovis


Introduction
Bovine tuberculosis (bTB) is an infectious disease caused by Mycobacterium bovis, a member of the Mycobacterium tuberculosis complex (MTBC). M. bovis, along with M. caprae, infects a wide range of mammalian hosts, including humans. An estimated 140 000 new cases of zoonotic tuberculosis (TB) and 11 400 deaths occurred globally in 2019 due to M. bovis (1). Given the difficulty of distinguishing M. bovis from M. tuberculosis, these numbers possibly underestimate the number of M. bovis cases worldwide.
According to the World Organisation for Animal Health (http://www.oie.int/en/), isolates of M. bovis or M. caprae have been obtained from various animal species such as buffalos, bison, sheep, goats, horses, camels, pigs, wild boars, deer, antelopes, dogs, cats, foxes, minks, badgers, ferrets, rats, primates, llamas, kudu, eland, tapirs, elk, elephants, sitatungas, oryx, addax, rhinoceros, opossums, squirrels, otters, seals, hares, moles, raccoons, coyotes and several predatory felines including lions, tigers, leopards and lynx. Domestication of animals has long been common practice for mankind, and it seems to have had irreversible effects on the evolution of bTB (2). Note that some animals can be infected by their own MTBC species (e.g. Mycobacterium microti, Mycobacterium pinnipedii, Mycobacterium mungi, Dassie bacillus, Mycobacterium orygis and Mycobacterium leprae). Details are provided in Supplementary File 1/Supplementary Table S1. Genotyping methods such as spoligotyping (3) and mycobacterial interspersed repetitive units-variable number of tandem repeats (MIRU-VNTRs) typing (4) have proven effective in distinguishing strains belonging to the MTBC (5-7). Despite the limitations (such as homoplasy) of these polymerase chain reaction-based methods (8), they are widely used worldwide for MTBC family identification. Application of fingerprinting tools facilitates analysis of the molecular epidemiology of M. bovis in animal-to-human, human-to-human and even animal-to-animal transmission (9). In addition, analyses based on whole-genome sequencing (WGS) provide further detail and are becoming the norm in genomic epidemiology and evolutionary studies (10-12).
Development of worldwide or local databases dedicated to animal and human TB improves our global understanding of the evolution and propagation of the disease. The Mbovis.org database (www.Mbovis.org) (13) provides authoritative names for spoligotypes (SB numbers) of all MTBC isolates of animal origin. MycoDB.es (https://www.visavet.es/mycodb/) is another database focusing on zoonotic TB, hosted in Spain (14).
The purpose of this study is to describe the SITVITBovis database, which contains available genotyping information on 25 741 M. bovis or M. caprae isolates. In addition, preliminary data extracted from 188 raw sequence reads allowed us to link WGS data with classical genotyping data. By publicly releasing this multimarker SITVITBovis database, which incorporates user-friendly online tools and interfaces, we hope to support a concerted and coordinated response by the global research community to monitor and assess the global spread of bTB.

Ethics statement and collection of data
All the data (human and other mammalian isolates) were obtained from collaborating laboratories as described for the SITVIT2 database (15). Data were duly de-identified prior to database entry. SITVITBovis is an excerpt from the SITVIT2 database focusing on bTB isolates. It contains additional information on host (e.g. cattle, buffalo, deer, human, etc.) and SB numbers collected from the Mbovis.org database (13). Unlike the Mbovis.org database, which only provides spoligotypes, SITVITBovis provides available epidemiological data, such as information on host, as well as WGS and MIRU-VNTR data. Many isolates were obtained from the MycoDB.es study, accounting for more than 17 000 isolates at the time of the present study (16). Other isolates were obtained from other published studies (14, 17-20). The SITVITBovis database is intended to grow in the future and further updates will be applied, notably with the release of the upcoming SITVITEXTEND database. A user guide is provided online to facilitate navigation of the website.

Genotyping markers and WGS
The genotyping data included in SITVITBovis were similar to those previously described for the SITVIT2 database (15). The methods used were spoligotyping (3) and MIRU-VNTRs typing, comprising 5-locus exact tandem repeats (ETR-A to E) (21) and 12- and 15-loci MIRU formats (4). The order of MIRU loci is as follows: 12-loci MIRU patterns: MIRU 2, 4, 10, 16, 20, 23, 24, 26, 27, 31, […]. Spoligotype international type (SIT) indicates spoligotyping patterns found at least two times in the database; VNTR international type (VIT) indicates 5-locus ETR patterns found at least two times in the database; and 12- or 15-MIRU international type (12-MIT or 15-MIT) indicates 12- or 15-MIRU-VNTR patterns found at least two times in our database. The Mbovis.org website was used to extract the SB numbers matching some SIT numbers from the SITVITBovis database (Supplementary Table S2). bTB sublineages (BOV, BOV_1, BOV_2, BOV_3 and BOV_4-CAPRAE) have been previously described in the SITVIT classification using revised spoligotyping rules (7,15). Note that the BOV sublineage corresponds to the sublineage previously labeled BOV_Like in SITVITWEB. To make a link with WGS data, we searched for genomes identified as 'Mycobacterium bovis not BCG' deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), as suggested by a recent study (11). Then, an in-house pipeline (written in Perl) was used to link WGS data extracted from the European Nucleotide Archive with classical genotyping information and sequencing-based tools. A total of 188 sequence reads were available with easily findable country information. The following tools were used for the analyses: SpoTyping, MIRUReader, SPAdes, SpolLineages, TBProfiler and Fast-lineage-caller (22-27). A preliminary strain list with read accessions is available on the 'query' web page. Formal presentation and querying of WGS isolates will be available in an upcoming dedicated database, which will contain more isolates.
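The 'found at least two times' rule that separates shared types (SIT, VIT, MIT) from orphan patterns can be illustrated with a minimal sketch. The function name classify_patterns and the toy patterns below are hypothetical and are not taken from SITVITBovis; short strings stand in for real 43-spacer spoligotypes.

```python
from collections import Counter


def classify_patterns(isolate_patterns, min_shared=2):
    """Split genotype patterns into shared types and orphans.

    A pattern seen at least `min_shared` times is a shared type
    (the rule used for SIT, VIT and MIT); all others are orphans.
    """
    counts = Counter(isolate_patterns)
    shared = {p: n for p, n in counts.items() if n >= min_shared}
    orphans = sorted(p for p, n in counts.items() if n < min_shared)
    return shared, orphans


# Hypothetical toy data: one pattern string per isolate.
toy_isolates = ["1101", "1101", "1001", "1111", "1001", "0101"]
shared, orphans = classify_patterns(toy_isolates)
print("shared types:", shared)
print("orphan patterns:", orphans)
```

In this sketch a shared type is only counted; assigning a stable numeric identifier to each shared pattern, as the database does, would require a persistent lookup table.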

Computing approach for database construction
SITVITBovis is a publicly accessible website enabling the viewing of global spread of M. bovis isolates. It has been designed to work optimally with Google Chrome and Firefox. The web interface has been implemented using Java technology (Java Server Pages, Asynchronous JavaScript and Ajax), Google Code API and XML, under the integrated development environment Eclipse (https://www.eclipse.org/). Data were integrated within a MySQL database. Public access to the database is strictly on a read-only basis; therefore, no direct update of the database is allowed from this website. The web application is hosted and deployed on an Apache Tomcat Server (version 6).

Phylogenetic, statistical and bioinformatics analyses
Existing bioinformatics tools were used to perform the phylogenetic analysis. Minimum spanning trees (MSTs) based on spoligotypes or MIRU-VNTRs were drawn using BioNumerics software version 6.6 (Applied Maths, Sint-Martens-Latem, Belgium) or MLVA Compare version 1.03 software (GenoScreen; Lille, France). MSTs are undirected connected graphs which link all nodes (representing the isolates) together with the fewest possible linkages between nearest neighbors. Spoligoforests based on spoligotypes were drawn using the SpolTools software (28,29); they make it possible to describe and visualize the potential parent-to-descendant relationships among spoligotypes. In some cases, spoligoforests were colored and reshaped using GraphViz software (http://www.graphviz.org/). Unlike MSTs, spoligoforests are directed graphs in which patterns evolve only by loss of spacers. In these trees, nodes are not necessarily all connected: when two strains differ by too many changes, no edge links them. STATA software version 12 was used for descriptive and univariate analyses. Pearson's chi-square test and Fisher's exact test were used to compare different parameters, and P values of <0.05 were considered statistically significant. The V-DICE tool was used to compare the discriminatory power of typing methods, providing the Hunter-Gaston Diversity Index (HGDI) or Simpson's diversity index (30).
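To make the MST definition concrete, the following minimal sketch builds a minimum spanning tree over a few genotype profiles using plain Prim's algorithm with the Hamming distance (number of differing positions). This is an illustration only: it is not the BioNumerics or MLVA Compare implementation, which apply their own distance and tie-breaking rules, and the profile names and binary patterns are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Return MST edges (u, v, distance) using Prim's algorithm.

    `profiles` maps an isolate name to an equal-length pattern,
    e.g. a binary spoligotype string or a tuple of MIRU-VNTR alleles.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Look for the shortest edge leaving the current tree.
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        d, u, v = best
        in_tree.add(v)
        edges.append((u, v, d))
    return edges


# Hypothetical 7-position toy patterns (real spoligotypes have 43 spacers).
toy_profiles = {
    "A": "1111111",
    "B": "1111011",
    "C": "1110011",
    "D": "0111111",
}
mst_edges = minimum_spanning_tree(toy_profiles)
for u, v, d in mst_edges:
    print(f"{u} -- {v} (distance {d})")
```

A spoligoforest would differ from this sketch in that edges are directed from parent to descendant and are only allowed when the descendant can be derived by spacer loss.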

Worldwide diversity of bTB genotypes
The SITVITBovis database provides a broad overview of the geographic distribution of the various phylogenetic sublineages belonging to M. bovis (BOV_1, BOV_2, BOV_3 and BOV_4-CAPRAE). This web application also helps distinguish between strains isolated from humans and other mammals in order to better understand transmission pathways and other factors that could potentially be responsible for the global spread of bTB. In our study, among the 25 741 isolates, the BOV_1 sublineage was globally predominant, representing 67.7% (n = 17 427) of bTB clinical isolates, followed by BOV_2 (n = 3258 or 12.7%), BOV (n = 3157 or 12.3%), BOV_4-CAPRAE (n = 1735 or 6.7%) and BOV_3 (n = 164 or 0.6% of isolates). (i) BOV_1 was predominant (number of isolates ≥ 25) in Russia, Western and Southern Europe, the whole African continent (North, Middle, West, Austral and East), the Middle-East (Western Asia), Southern Asia and North and South America, with proportions between 43% and 99% (Figure 1). (ii) The BOV_2 sublineage was predominantly found in Northern Europe (84% of clinical isolates), Eastern Asia (68%), Australasia (54%), South America (35%), North America (25%), Central America (9%) and Austral Africa (8%) (Figure 1). (iii) The BOV_3 sublineage was rare in our study, being limited to East Asia (24%), North America (12%) and South America (4%). (iv) The BOV_4-CAPRAE lineage was predominantly found in Eastern Europe (representing 83% of isolates). This lineage represented 8% of clinical isolates in Western Europe and South America and 6% of the isolates in North Africa. (v) The BOV sublineage was predominantly found in Central America (59% of isolates), followed by East Africa (46% of strains), with proportions from 12% to 23% in North, West and Central Africa; North and South America; Western and Southern Europe; and Western and Southern Asia (Figure 1).
The database provided information on strains isolated from human hosts (n = 473 isolates) and other mammalian hosts (mainly cattle; n = 18 769 isolates). Most of the bTB isolates from animals were found in Southern Europe (predominantly in Spain), whereas available isolates from humans were mainly found in Western Europe (n = 149) and Northern Europe (n = 120), followed by Northern Asia or Russia (n = 108) (Figure 1). Among human isolates, the gender of patients was known for 301 isolates, and the global male/female sex ratio was 173/128 = 1.35. In countries where at least 10 human isolates were recorded, the highest sex ratio was observed in Southern Europe (male/female sex ratio = 3.09), as opposed to the lowest ratio in Western Europe (male/female sex ratio = 0.94). Regarding the distribution of lineages between human and animal hosts (Supplementary File 1/Supplementary Figure S1), the BOV_1 sublineage was globally the most common among human hosts (n = 388/473; 82.03%), followed by the BOV sublineage (n = 72/473; 15.22%). Considering the diverse sub-regions, the BOV sublineage, which predominated among animal isolates in Central America (around 88% of isolates), was also largely present among patients from Northern Europe (around 36% of isolates). Lastly, the BOV_3 sublineage was only marginally represented, in patients from Eastern Asia (n = 2) and South America (n = 1). Table 1 shows the distribution of predominant spoligotypes in SITVITBovis.
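The global sublineage proportions quoted above follow directly from the isolate counts. The short sketch below recomputes them; the counts are those given in the text, while the variable names are illustrative.

```python
# Isolate counts per sublineage, as reported in the text.
sublineage_counts = {
    "BOV_1": 17427,
    "BOV_2": 3258,
    "BOV": 3157,
    "BOV_4-CAPRAE": 1735,
    "BOV_3": 164,
}

total_isolates = sum(sublineage_counts.values())

# Percentage of all isolates assigned to each sublineage.
sublineage_shares = {
    name: 100.0 * n / total_isolates for name, n in sublineage_counts.items()
}

print("total isolates:", total_isolates)
for name, share in sublineage_shares.items():
    print(f"{name}: {share:.1f}%")
```

The five counts sum to the 25 741 isolates in the database, and the rounded shares match the percentages reported for each sublineage.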

Proposal of an international consensus schema based on MIRU-VNTR loci
Several studies have reported affordable and suitable sets of VNTR loci to discriminate bTB isolates (31)(32)(33). However, most of these studies have been based on specific countries. In SITVITBovis database, we have gathered data from various countries, providing a wider view of bTB diversity. According to pooled data in our database, an optimal consensus set could be based on 13 MIRU-VNTR loci showing reasonably high discriminatory power (HGDI>0.  Table 2).
In addition to the aforementioned VNTRs, we also recommend including QUB3232 (VNTR3232), as suggested earlier by other studies (31,32,34), to further increase the discriminatory power in bTB diversity analyses. (Table 3).
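The discriminatory power used to rank loci can be computed with the standard Hunter-Gaston formula, HGDI = 1 - [1/(N(N-1))] × Σ n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates carrying the j-th type. The sketch below is a minimal illustration and not the V-DICE implementation; the locus names and allele calls are hypothetical.

```python
from collections import Counter


def hgdi(types):
    """Hunter-Gaston Diversity Index for a list of type assignments.

    Returns the probability that two isolates drawn at random without
    replacement belong to different types.
    """
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("HGDI needs at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical allele calls (copy numbers) for six isolates at two loci.
toy_loci = {
    "locus_A": [2, 2, 2, 2, 3, 3],  # few alleles: low discrimination
    "locus_B": [1, 2, 3, 4, 5, 5],  # many alleles: high discrimination
}

# Rank loci from most to least discriminatory.
ranked_loci = sorted(toy_loci, key=lambda name: hgdi(toy_loci[name]), reverse=True)
for name in ranked_loci:
    print(f"{name}: HGDI = {hgdi(toy_loci[name]):.3f}")
```

Applied per locus across pooled isolates, such a ranking is one way to shortlist loci above a chosen HGDI threshold; the same function applies to whole spoligotype or MIRU-VNTR patterns when each pattern is treated as one type.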
Considering only the bTB strains isolated in Europe (north, south and west, for which most of the data were available), we noted a significant disparity in the distribution of age groups (P-value < 0.0001). In Western Europe (EURO-W), the age groups 0-20 years, 21-40 years and 41-60 years each accounted for a similar proportion of patients, around 20% (from 18 to 24%) of cases, whereas the age group >60 years represented about 35% of patients (Supplementary File 1/Supplementary Figure S2A). In contrast, a majority of patients in Southern Europe (EURO-S) belonged to the age group 21-40 years (51% of patients), whereas in Northern Europe (EURO-N) most patients belonged to the age group >60 years (81% of cases). These observations suggest that bovine tuberculosis primarily affects the working-age population in Southern Europe, whereas in Northern Europe it mainly concerns reactivation cases among senior patients.
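A comparison of age-group distributions between regions of this kind can be sketched with Pearson's chi-square test of independence. The sketch below is written in plain Python rather than STATA, which was used in the study, and the counts are hypothetical toy values, not SITVITBovis data. The P value helper uses the closed form that holds only for an even number of degrees of freedom, and the test assumes no row or column total is zero.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf_even_df(stat, df):
    """Upper-tail probability of the chi-square distribution for even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for positive even df")
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# HYPOTHETICAL counts: rows are regions, columns are the four age groups
# (0-20, 21-40, 41-60, >60 years). These are not values from the database.
toy_table = [
    [12, 15, 13, 20],
    [8, 30, 14, 8],
    [2, 4, 6, 48],
]
stat, df = chi_square_independence(toy_table)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, P = {p_value:.3g}")
```

With three regions and four age groups the test has (3 - 1) × (4 - 1) = 6 degrees of freedom, so the even-df closed form applies; a general-purpose statistics package would be needed for odd degrees of freedom or for Fisher's exact test on sparse tables.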

Discussion
The results of this study suggest substantial heterogeneity in the worldwide distribution of bTB isolates. Both marked differences and similarities were observed in the geographical distribution of the various bTB sublineages defined in our database, as well as of the various bTB genotypes.
The diversity, presence and/or absence of certain spoligotyping spacers have proven particularly useful for discriminating specific bTB isolates circulating worldwide (13,16). Our web-based tool provides a global overview of bTB genotypes (including spoligotypes, 5-locus ETRs and 12- and 15-loci MIRU-VNTRs). Using the SITVITBovis tool, users can visualize the geographical position (at sub-region, country or city level) of specific genotypes, in addition to available epidemiological data and drug resistance information, the latter being an important health issue in various regions of the world (35).
The MIRU-VNTR method increases the discriminatory power of spoligotyping; nonetheless, spoligotyping alone is still a reliable method to differentiate M. bovis isolates (Table 2). Several studies have focused on the diversity of MIRU-VNTRs and/or spoligotypes involved in bTB (31,32). However, the majority of these studies focused on limited geographical areas.
In our analysis, spoligotyping was globally a reasonable typing method to discriminate bTB isolates (with a global HGDI of 0.930). However, according to recent research, its discriminatory power could vary between countries (33). Despite an international consensus scheme of using MIRU-VNTR loci concomitantly with spoligotyping for improved characterization of MTBC (4-6, 15), only selected groups working on bTB isolates have used such an approach (31-34). Hence, the scarcity of MIRU-VNTR data in SITVITBovis should be addressed by adding new data in future updates. Nonetheless, with data on 25 741 isolates (75 countries of isolation corresponding to 60 countries of origin), SITVITBovis still incorporates valuable data from other resources. Unfortunately, information on host is not always provided for collected bTB isolates in the published literature, which hampers efforts to precisely calculate the real incidence of bTB among human vs. other animal hosts. Due to the lack of available information, we did not address the issue of clonal complexes of M. bovis marked by specific regions of difference (RDs) or single-nucleotide polymorphisms (SNPs) in the present study. Nevertheless, information on RDs may be obtained by querying the database and exporting the results into an Excel file. Last but not least, WGS and SNPs are frequently used today to decipher bTB transmission and identify the lineages involved (36,37). Studies using WGS already provide significant insights into bTB diversity, distribution and evolution (10-12) and may help identify hosts that could serve as reservoirs of bTB infection and spread. As an illustration, a preliminary table linking WGS and classical genotyping data for some bTB isolates is available (Supplementary Table S3).
As the genomic information becomes more easily accessible in conjunction with geographical origin and hosts, we plan to automatically extract from sequencing data a number of variables such as specific phylogenetic markers, drug resistance, genomic determinants of virulence/pathogenicity or other relevant information. This novel generation of database(s) will be able to link the newly generated information with the existing data on classical genotyping information vs. hosts and geographical mapping. The development of all these tools in future bTB genotyping databases would possibly allow anticipating bTB outbreaks among humans and animal hosts alike.

Conclusion and perspectives
In summary, our study provides a global overview of bTB distribution, highlighting potential relationships between bTB genotypes and affected hosts as well as other epidemiological features. SITVITBovis, cross-referenced with other existing resources (such as Mbovis.org), provides a tentative picture of bTB circulation in the world and therefore represents a non-negligible resource for monitoring bTB and for correlating the available epidemiological, macro-geographical and other data.
Future developments aim to enrich the database with WGS data in conjunction with geographical origin and hosts. Automated extraction of knowledge related to specific phylogenetic markers, drug resistance, genomic determinants of virulence/pathogenicity or other relevant information will soon make it possible to link the newly generated information with the existing data generated using classical genotyping tools. Such a strategy, combined with information on hosts and subsequent geographical mapping, will boost our efforts to generate a truly comprehensive snapshot of global bTB diversity. By helping to better identify, treat and control bTB, such a tool will be highly beneficial for medical and veterinary specialists as well as public health authorities.

Supplementary data
Supplementary data are available at Database Online.