Abstract

Summary

Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is widely used to genotype the Mycobacterium tuberculosis complex in epidemiological studies that track tuberculosis transmission. Long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore Technologies can produce reads long enough to cover the entire repeat region of each MIRU-VNTR locus, which was previously not possible with the short reads from Illumina high-throughput sequencing technologies. We thus developed MIRUReader for MIRU-VNTR typing directly from long sequence reads.

Availability and implementation

Source code and documentation for the MIRUReader program are freely available at https://github.com/phglab/MIRUReader.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is a polymerase chain reaction (PCR)-based method widely used to genotype the Mycobacterium tuberculosis complex (MTBC), which causes tuberculosis (TB). MTBC isolates with identical MIRU-VNTR profiles can be clustered to identify epidemiologically linked cases (Supply et al., 2006). Although whole-genome sequencing (WGS) has been shown to have higher resolution for cluster identification (Wyllie et al., 2018), a large database of distinct MTBC isolates with MIRU-VNTR genotypes has been collected worldwide over the past decade by TB researchers and national TB control programmes (Allix-Beguec et al., 2008; Lim et al., 2013). To facilitate comparison with and linking to these historical isolates (Chee et al., 2015), it is important that MIRU-VNTR genotypes can be determined from WGS data.

MIRU-profiler is a tool that performs MIRU-VNTR typing from complete genomes and draft assemblies (Rajwani et al., 2018). However, de novo genome assembly is usually computationally intensive and requires a significant amount of time. Sequencing platforms from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) can now generate long reads (1–100 kbp) (Ameur et al., 2019) that span the entire tandem repeats of each MIRU-VNTR locus. We thus developed MIRUReader, a software tool that rapidly determines MIRU-VNTR genotypes either directly from long reads or from assembled genomes, while taking the high sequencing error rates of long reads into account.

2 Implementation and datasets used

MIRUReader is written in Python and accepts sequencing reads in FASTQ or FASTA format. The primersearch program from EMBOSS (Rice et al., 2000) is used to scan the long reads for amplicons flanked by the PCR primers of each locus, allowing a maximum of 18% mismatch. For each MIRU-VNTR locus, zero, one or more amplicons may be identified. The repeat number to which each amplicon corresponds is deduced from the amplicon length and an allele calling table (Weniger et al., 2010), mimicking the laboratory protocol for determining the 24-locus MIRU-VNTR profile. Amplicons longer than 1828 bp are excluded from analysis. This yields a set of repeat numbers for each MIRU-VNTR locus, and the mode is assigned as the repeat number for that locus. If multiple modes exist, the mismatches in the primer sequence alignments reported by primersearch are analyzed, and the modal repeat number with the lowest total number of mismatches in the alignments is assigned to the locus. When the modal repeat numbers have equal total numbers of mismatches, the locus is assigned multiple repeat numbers. For example, locus MIRU2996 for sample MTB08 was assigned two repeat numbers (Supplementary Table S1). If no amplicon is detected, the locus is assigned ‘ND’. MIRUReader prints the 24-locus MIRU-VNTR pattern to the screen in a tab-delimited format that can be redirected to a text file and viewed in a spreadsheet program such as Excel.
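The allele calling rules described above can be illustrated with a minimal Python sketch. This is not the MIRUReader source code: the function names, the example allele calling table and the nearest-length lookup are assumptions made for illustration, as is the choice to report ‘ND’ when every amplicon exceeds the length cut-off. Only the 1828 bp cut-off, the use of the mode, the mismatch-based tie-break and the ‘ND’ label for loci without amplicons come from the text.

```python
from collections import Counter

MAX_AMPLICON_BP = 1828  # amplicons longer than this are excluded (value from the text)

# Hypothetical allele calling table for ONE locus: expected amplicon length (bp) -> repeat number.
# Real values would come from the table of Weniger et al. (2010); these are placeholders.
EXAMPLE_TABLE = {300: 1, 353: 2, 406: 3, 459: 4}


def length_to_repeat(length, table):
    """Return the repeat number whose expected amplicon length is closest to `length`.

    Nearest-length matching is an assumption; the text only says the repeat
    number is deduced from the amplicon length and an allele calling table.
    """
    closest = min(table, key=lambda expected: abs(expected - length))
    return table[closest]


def call_locus(amplicons, table):
    """Assign repeat number(s) to one locus.

    amplicons: list of (amplicon_length_bp, primer_mismatches) tuples.
    Returns 'ND', a single repeat number, or a sorted list of tied repeat numbers.
    """
    # Drop amplicons above the length cut-off.
    kept = [(length, mm) for length, mm in amplicons if length <= MAX_AMPLICON_BP]
    if not kept:
        return "ND"  # no usable amplicon for this locus

    # Convert every amplicon length into a repeat number.
    calls = [(length_to_repeat(length, table), mm) for length, mm in kept]

    # Find the most frequent repeat number(s).
    counts = Counter(repeat for repeat, _ in calls)
    top = max(counts.values())
    modes = [repeat for repeat, count in counts.items() if count == top]
    if len(modes) == 1:
        return modes[0]

    # Tie-break: prefer the modal repeat number with the fewest total primer mismatches.
    totals = {m: sum(mm for repeat, mm in calls if repeat == m) for m in modes}
    best = min(totals.values())
    winners = sorted(m for m in modes if totals[m] == best)
    return winners[0] if len(winners) == 1 else winners


if __name__ == "__main__":
    print(call_locus([(352, 1), (355, 0), (410, 0)], EXAMPLE_TABLE))  # clear mode
    print(call_locus([(352, 2), (405, 0)], EXAMPLE_TABLE))            # tie broken by mismatches
    print(call_locus([(352, 1), (405, 1)], EXAMPLE_TABLE))            # unresolved tie
    print(call_locus([], EXAMPLE_TABLE))                              # no amplicon
```

Returning a list for an unresolved tie mirrors the case in which a locus is reported with multiple repeat numbers.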

We compared MIRU-profiler and MIRUReader across three datasets. The first dataset consists of 17 samples sequenced using ONT MinION. MIRU-VNTR genotyping was performed using the Genoscreen MIRU-VNTR Quadraplex kit according to the manufacturer’s protocol. Two samples were excluded from analysis because their experimental MIRU-VNTR profiles were incomplete. Raw fast5 files were demultiplexed and base-called using Albacore (v2.3.3). The sequence reads were then demultiplexed again and adapter-trimmed using Porechop (v0.2.3, available from https://github.com/rrwick/Porechop). The filtered ONT reads were de novo assembled using Canu (v1.7.1) (Koren et al., 2017). The raw draft assemblies were polished with ONT reads using nanopolish (v0.10.2) (Loman et al., 2015) to improve consensus accuracy. The second dataset comprises six PacBio-sequenced samples and one ONT-sequenced sample, whose reads and genome assemblies were downloaded from the National Center for Biotechnology Information database. Experimental MIRU-VNTR profiles for these samples were obtained through literature review. The third dataset is the set of 17 genome assemblies presented in Table 1 of the MIRU-profiler manuscript (Rajwani et al., 2018).

3 Results and performance

For datasets 1 and 2, MIRUReader predicted 24-locus MIRU-VNTR profiles more accurately than MIRU-profiler run on assembled genomes. Judged against the experimental MIRU-VNTR results, 13 out of 15 (86.67%) samples in dataset 1 had their MIRU-VNTR profiles determined correctly by MIRUReader. MIRU-profiler correctly predicted only 5 (33.33%) based on polished genome assemblies, and none when uncorrected draft genome assemblies were used (Fig. 1; Supplementary Table S1). We observed a similar trend for dataset 2, in which 1 (14.29%) and 5 (71.43%) of the 7 samples had their MIRU-VNTR profiles determined accurately by MIRU-profiler and MIRUReader, respectively (Fig. 1; Supplementary Table S2). In dataset 3, MIRUReader obtained results identical to those of MIRU-profiler (default parameters) for 15 out of 17 (88.24%) samples using the published genome assemblies as input. In the two discordant samples (NC_008769 and NC_012207), we were unable to reproduce the reported MIRU-profiler results at five loci using the default parameters, whereas MIRUReader obtained accurate genotypes (Supplementary Table S3). Using the raw PacBio sequence reads from two samples (CP019613 and CP019610), however, MIRUReader gave discordant results at three loci (424, 580 and 1644), which were determined accurately when the downloaded genome assemblies were used as input to MIRUReader.
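As a small worked check of the sample-level accuracies quoted above, the following Python sketch recomputes the percentages from the reported counts. It assumes that a sample counts as correct only when its predicted call agrees with the experimental call at every locus; the function names and the dictionary-based profile representation are hypothetical and not taken from either tool.

```python
def is_concordant(predicted, experimental):
    """True when every locus in the experimental profile has the same call in the prediction.

    Both arguments are dicts mapping a locus name to its repeat number call
    (an assumed representation, used here only for illustration).
    """
    return all(predicted.get(locus) == call for locus, call in experimental.items())


def accuracy_percent(n_correct, n_total):
    """Percentage of samples with a fully concordant profile, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


# Counts as reported in the text: (correctly typed samples, total samples).
REPORTED = {
    "dataset 1, MIRUReader": (13, 15),
    "dataset 1, MIRU-profiler (polished assemblies)": (5, 15),
    "dataset 2, MIRU-profiler": (1, 7),
    "dataset 2, MIRUReader": (5, 7),
    "dataset 3, identical results": (15, 17),
}

if __name__ == "__main__":
    for label, (n_correct, n_total) in REPORTED.items():
        print(f"{label}: {n_correct}/{n_total} = {accuracy_percent(n_correct, n_total):.2f}%")
```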

Fig. 1. Accuracy of MIRU-profiler and MIRUReader in correctly predicting the 24-locus MIRU-VNTR genotypes using (a) ONT data from this study (n = 15); (b) publicly released PacBio/ONT reads and genome assemblies (n = 7)

MIRUReader is much faster than the MIRU-profiler approach because it does not require the additional steps of de novo genome assembly and polishing (Supplementary Fig. S1). Based on dataset 1, the MIRU-VNTR profiles can be obtained in about an hour using MIRUReader with only one computing thread. In contrast, the shortest analysis time for the MIRU-profiler approach was 160 min using 10 computing threads.

Overall, MIRUReader is an accurate and rapid tool that can perform in-silico typing of the standard 24-locus MIRU-VNTR genotypes for MTBC isolates directly from long sequencing reads.

Acknowledgements

The authors would like to thank the Central Tuberculosis Laboratory (CTBL) at the Singapore General Hospital for performing the ONT sequencing runs and both CTBL and STEP for providing the laboratory results of the 24-locus MIRU-VNTR of the samples sequenced.

Funding

This work was supported by the Singapore Infectious Diseases Initiative (SIDI/2014/003) and an NUS Startup Grant awarded to RTHO.

Conflict of Interest: none declared.

References

Allix-Beguec C. et al. (2008) Evaluation and strategy for use of MIRU-VNTRplus, a multifunctional database for online analysis of genotyping data and phylogenetic identification of Mycobacterium tuberculosis complex isolates. J. Clin. Microbiol., 46, 2692–2699.

Ameur A. et al. (2019) Single-molecule sequencing: towards clinical applications. Trends Biotechnol., 37, 72–85.

Chee C.B. et al. (2015) Multidrug-resistant tuberculosis outbreak in gaming centers, Singapore, 2012. Emerg. Infect. Dis., 21, 179–180.

Koren S. et al. (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res., 27, 722–736.

Lim L.K. et al. (2013) Molecular epidemiology of Mycobacterium tuberculosis complex in Singapore, 2006–2012. PLoS One, 8, e84487.

Loman N.J. et al. (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods, 12, 733–735.

Rajwani R. et al. (2018) MIRU-profiler: a rapid tool for determination of 24-loci MIRU-VNTR profiles from assembled genomes of Mycobacterium tuberculosis. PeerJ, 6, e5090.

Rice P. et al. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.

Supply P. et al. (2006) Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J. Clin. Microbiol., 44, 4498–4510.

Weniger T. et al. (2010) MIRU-VNTRplus: a web tool for polyphasic genotyping of Mycobacterium tuberculosis complex bacteria. Nucleic Acids Res., 38, W326–331.

Wyllie D.H. et al. (2018) A quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for identifying Mycobacterium tuberculosis transmission: a Prospective Observational Cohort Study. EBioMedicine, 34, 122–130.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Associate Editor: Inanc Birol