Real-time strain typing and analysis of antibiotic resistance potential using Nanopore MinION sequencing

Clinical pathogen sequencing has significant potential to drive informed treatment of patients with unknown bacterial infection. However, the lack of rapid sequencing technologies with concomitant analysis has impeded clinical adoption in infection diagnosis. Here we demonstrate that commercially-available Nanopore sequencing devices can identify bacterial species and strain information with less than one hour of sequencing time, initial drug-resistance profiles within 2 hours, and a complete resistance profile within 12 hours. We anticipate these devices and associated analysis methods may become useful clinical tools to guide appropriate therapy in time-critical clinical presentations such as bacteraemia and sepsis.


INTRODUCTION
High throughput sequencing (HTS) has recently become the frontier technology in genomics and has transformed genetic research 1,2. The pace of change in this field has been rapid, and several new sequencing instruments have been introduced to the market in recent years. One recent addition to the growing range of HTS platforms is the portable MinION sequencing device from Oxford Nanopore Technologies.
DNA sequencing with biological nanopores was proposed in the 1990s 3. However, only recently was a prototype nanopore sequencing device, the MinION, released by Oxford Nanopore Technologies.
The MinION sequencer measures the change in electrical current as single-stranded DNA passes through the nanopore, and the differences in electrical current are used to determine the nucleotide sequence of each DNA strand 4,5. Simultaneous base calling of multiple strands of DNA passing through multiple nanopores generates sequence fragments as the run proceeds, which permits real-time analysis of the sequence data.
In recent years HTS has become an integrative tool for infectious disease analysis 6,7. Several reports have emphasized the use of HTS methods to characterize clinical isolates, to study the spread of drug-resistant microorganisms and to investigate outbreaks of infection 8-10. However, there have been several hurdles to widespread adoption of HTS as the 'method-of-choice' in the clinic for determining the infectious agent and guiding patient treatment: a) lack of portability; b) high cost of the sequencing devices; and c) difficulty in obtaining actionable data within a few hours. The miniature, portable and low-cost MinION sequencer overcomes the first two of these hurdles, but the extent to which clinically actionable data can be obtained within a few hours using this device is currently unclear.
In this article we report real-time analysis and characterization of three Klebsiella pneumoniae strains via Nanopore sequencing (Figure 1). We demonstrate that we can determine the species and strain type of the sequenced sample within an hour of sequencing. Furthermore, we show that we can identify ~50% of the drug resistance genes present in a sample within 2 hours of sequencing, and the full drug resistance profile within 12 hours. We also show that Nanopore sequence data can be used for accurate Multi-Locus Sequence Typing (MLST) despite the relatively high base-calling error rates previously reported 11,12. Our findings support the potential use of Nanopore sequencing for real-time analysis of clinical samples for species detection and analysis of antibiotic resistance.

RESULTS
We sequenced the genomes of three Klebsiella pneumoniae strains, ATCC BAA-2146, ATCC 700603 and ATCC 13883, using the Nanopore MinION device. Nanopore sequence reads are classified into three types: template, complement and higher quality 2D reads (i.e. two-direction reads, which contain the computationally merged template and complement reads). Samples sequenced with the R7 flow cell yielded 12% 2D reads, and this proportion doubled with the improved R7.3 flow cell (Table 1). Although our sequence yields were lower, the read length and accuracy of the sequence data were similar to other user reports 11-14. We observed that the majority of data (greater than 75% for R7 and greater than 66% for R7.3) were generated in the first 16 hours of sequencing time (Figure 2).

bioRxiv preprint doi: https://doi.org/10.1101/019356; this version posted May 15, 2015. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Species Detection:
To illustrate the potential of rapid 'real-time' species detection with the MinION sequencer, we built a 'streaming' species detection computational pipeline. As each new read is sequenced, our pipeline updates an estimate of the proportion of DNA present in the sample which belongs to each of 2,785 bacterial genomes currently available in GenBank (http://www.ncbi.nlm.nih.gov/genbank/), as well as an estimate of the uncertainty in this proportion (see Methods).
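As an illustration of the streaming estimate described above, the following minimal Python sketch keeps a running count of the species each read aligns to and reports the current proportion together with an interval. It is not the authors' pipeline: the class name `StreamingSpeciesEstimator`, the helper `wilson_interval` and the toy read assignments are assumptions made for this example, and a per-species Wilson score interval is used here as a simple stand-in for the multinomial confidence intervals referred to in the Methods.

```python
import math
from collections import Counter


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total == 0:
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


class StreamingSpeciesEstimator:
    """Hypothetical helper: running species proportions, updated per read."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add_read(self, species):
        """Record the species that one newly sequenced read aligned to."""
        self.counts[species] += 1
        self.total += 1

    def estimates(self):
        """Return {species: (proportion, lower, upper)} for every species seen."""
        return {
            s: (c / self.total, *wilson_interval(c, self.total))
            for s, c in self.counts.items()
        }


# Toy stream of read assignments (illustrative only, not data from the paper).
est = StreamingSpeciesEstimator()
for hit in ["K. pneumoniae"] * 8 + ["K. variicola"] * 2:
    est.add_read(hit)

result = est.estimates()
for species, (prop, low, high) in result.items():
    print(f"{species}: {prop:.2f} ({low:.2f}-{high:.2f})")
```

Because only counts are stored, each new read costs a single increment, and the intervals narrow as reads accumulate.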
In all three sequenced samples, we successfully detected K. pneumoniae as the major species present in the isolate with 99% confidence. This was achieved with as few as 500 sequence reads, requiring less than 20 minutes of sequencing time (Figure 3). We also assessed our species detection method on an Escherichia coli (strain K12 MG1655) Nanopore MinION sequencing data set published by Quick et al 14. Our method successfully identified the species in their sequence data set with approximately the same amount of data required for our samples.
Interestingly, our pipeline identified approximately 20% of the ATCC 700603 sample (K. pneumoniae) as Klebsiella variicola. We downloaded the assembly of the ATCC 700603 strain (Accession ID NZ_AOGO00000000.1) and performed phylogenetic analysis with 5 K. variicola assemblies and 9 K. pneumoniae complete genomes available on GenBank (Supplementary Figure 1). We found that the K. pneumoniae ATCC 700603 strain was in fact an ancestor of all other K. pneumoniae and K. variicola strains with available sequencing data in GenBank. This explains the shared identity detected in the Nanopore sequencing data. Finally, when we included the ATCC 700603 assembly in the bacterial database, the species detection pipeline identified K. pneumoniae as the only species in the ATCC 700603 sample (Supplementary Figure 2).

Multi-locus Sequence Typing:
K. pneumoniae strains are conventionally typed using an MLST system (http://bigsdb.web.pasteur.fr/klebsiella/klebsiella.html), which requires accurate genotyping to distinguish the alleles of seven house-keeping genes 15. Previous reports indicating a high base-calling error rate 11,12 suggested that MLST typing may be challenging with MinION sequence data.
We developed a pipeline to carry out MLST typing using MinION sequence data, which first corrects errors in the raw sequence reads and subsequently combines information across multiple SNPs in a likelihood-based framework (see Methods). Table 2 presents the five highest-scoring types (by log-likelihood) for the three K. pneumoniae strains using Nanopore sequencing. In all three strains, the correct type had the highest score among the 1678 types available in the MLST database. However, we noticed that the typing system also output several other types with the same likelihood (i.e., types 751 and 864 for strain ATCC BAA-2146 and type 851 for strain ATCC 700603). We examined the profiles of these types and found that, for strain ATCC BAA-2146, types 751 and 864 differ from the correct type 11 by only one SNP out of a total of 3012 bases in the seven genes (see Supplementary Note 3). For strain ATCC 700603, type 851 differs from type 489 by two alleles (in genes phoE and tonB), but only one read mapped to each of these genes. These results suggest that a more accurate strain-typing methodology would consider all of the sequenced reads, rather than just those covering the seven house-keeping genes, so we further devised a method for strain typing based on the presence or absence of genes.
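To make the tie behaviour described above concrete, here is a small Python sketch of one way a likelihood-based type score could be assembled: each candidate type is scored by summing, over the seven genes, the log-likelihood of the allele that type carries. This is only a sketch under stated assumptions and not the authors' method. The gene names follow the Pasteur K. pneumoniae scheme, while the type labels (`ST_A`, `ST_B`, `ST_C`), the allele numbers and all log-likelihood values are hypothetical.

```python
# Seven house-keeping genes of the K. pneumoniae MLST scheme.
GENES = ["gapA", "infB", "mdh", "pgi", "phoE", "rpoB", "tonB"]


def score_types(profiles, allele_loglik):
    """Sum per-gene allele log-likelihoods for each candidate type."""
    return {
        st: sum(allele_loglik[g][alleles[g]] for g in GENES)
        for st, alleles in profiles.items()
    }


def top_types(scores, n=5):
    """Return the n best (type, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical per-gene log-likelihoods of each allele given the reads.
# Allele 1 fits the reads well; allele 2 fits poorly.
allele_loglik = {g: {1: -10.0, 2: -25.0} for g in GENES}
# With very few reads on tonB, both alleles are equally likely (assumed).
allele_loglik["tonB"] = {1: -3.0, 2: -3.0}

# Hypothetical allelic profiles for three candidate types.
profiles = {
    "ST_A": {g: 1 for g in GENES},
    "ST_B": {**{g: 1 for g in GENES}, "tonB": 2},
    "ST_C": {**{g: 1 for g in GENES}, "gapA": 2},
}

scores = score_types(profiles, allele_loglik)
ranked = top_types(scores)
best_score = ranked[0][1]
tied = sorted(st for st, s in scores.items() if s == best_score)
print(ranked)
print("types tied for best score:", tied)
```

In this toy case `ST_A` and `ST_B` tie because the only gene that separates them has too little coverage to discriminate its alleles, which mirrors the situation reported for phoE and tonB.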

Strain Detection by presence or absence of genes
We propose a novel 'real-time' strain typing method to identify the bacterial strain from the Nanopore sequence reads based on the presence or absence patterns of genes. We obtained the genome assemblies of 235 K. pneumoniae strains from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) and the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) and annotated them with Prokka 16 to create a gene database and a gene profile for each K. pneumoniae strain. Our pipeline identifies which genes in the database are present in each read (fully or partially) as it is generated by the nanopore sequencer, and uses this information to update the posterior probability of each of the 235 strains, as well as the 99% confidence intervals in this estimate (see Methods).
We applied this pipeline to the 3 sequenced K. pneumoniae strains (Figure 4). We successfully identified the corresponding strains from the sequence data with 99% confidence within 25 minutes of sequencing time and with as few as 500 sequencing reads.
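The posterior update described in this section can be sketched as a sequential Bayesian update over candidate strains. The Python example below is a minimal illustration, not the authors' implementation: the likelihood model (a gene is observed with probability `1 - eps` if the strain carries it and `eps` otherwise), the value of `eps`, the function name `update_log_posterior` and the toy gene profiles are all assumptions, and the confidence intervals mentioned in the text are not reproduced.

```python
import math


def update_log_posterior(log_post, strain_genes, observed_gene, eps=0.01):
    """Update log-posteriors over strains after observing one gene in a read.

    Assumed likelihood: probability (1 - eps) if the strain carries the gene,
    eps otherwise (eps loosely models alignment errors or contamination).
    """
    updated = {}
    for strain, lp in log_post.items():
        like = (1 - eps) if observed_gene in strain_genes[strain] else eps
        updated[strain] = lp + math.log(like)
    # Normalise with log-sum-exp so the posterior sums to 1.
    m = max(updated.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in updated.values()))
    return {s: v - log_norm for s, v in updated.items()}


# Hypothetical gene presence profiles for three candidate strains.
strain_genes = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}

# Start from a uniform prior and update as genes are seen in incoming reads.
log_post = {s: math.log(1 / len(strain_genes)) for s in strain_genes}
for gene in ["g1", "g2", "g3"]:
    log_post = update_log_posterior(log_post, strain_genes, gene)

posterior = {s: math.exp(v) for s, v in log_post.items()}
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

A gene shared by every strain (here `g1`) leaves the posterior unchanged, so discrimination comes from genes that are present in only some of the profiles.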

Antibiotic resistance detection:
The antibiotic resistance profiles of the three K. pneumoniae strains were characterised with Nanopore MinION sequencing data. We obtained antibiotic drug resistance genes from "The Comprehensive Antibiotic Resistance Database" (http://arpcard.mcmaster.ca/) 17, and grouped them into 38 different antibiotic classes based on the gene annotation. We applied a pipeline to detect antibiotic resistance genes from the sequencing data and to report the classes of these genes in real time (see Methods). The resulting antibiotic gene profiles were then validated against the gene profiles we obtained from the reference genome sequences of the respective strains. Table 3 shows the classes of antibiotic genes detected from Nanopore MinION sequencing of the three K. pneumoniae strains over time. For the NDM-1 producing K. pneumoniae strain ATCC BAA-2146 (17 antibiotic resistance classes based on the reference genome), the pipeline detected 18 classes after 12 hours of sequencing, with 94.12% sensitivity and 90.48% specificity. Almost 50% of the classes (8 out of 17) were detected after just 1 hour of sequencing. We observed similar performance for the K. pneumoniae type strain ATCC 13883, where 6 out of 8 classes were detected after 1 hour with one false positive (specificity 96.67%). After 12 hours of sequencing, all classes were detected, along with 2 false positives (sensitivity 100% and specificity 93.33%). For the multi-drug resistant K. pneumoniae strain ATCC 700603, the pipeline detected only 6 out of 9 classes, but without any false positives (sensitivity 66.67% and specificity 100%). The reduced sensitivity for this sample is most likely due to the low sequence yield (33 Mb of data in total).
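The sensitivity and specificity figures above follow from class-level counts over the 38 antibiotic classes. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the true positive and false positive counts for ATCC BAA-2146 (16 and 2) are inferred here from the reported 18 detected classes and 94.12% sensitivity rather than stated directly in the text.

```python
def sensitivity_specificity(tp, fp, n_reference, n_total_classes=38):
    """Class-level sensitivity and specificity.

    tp: detected classes that are in the reference profile
    fp: detected classes that are not in the reference profile
    n_reference: number of classes in the reference profile
    n_total_classes: number of classes considered (38 in the text)
    """
    fn = n_reference - tp
    tn = (n_total_classes - n_reference) - fp
    return tp / (tp + fn), tn / (tn + fp)


# (tp, fp, n_reference) after 12 hours of sequencing.
cases = {
    "ATCC BAA-2146": (16, 2, 17),  # counts inferred, see lead-in
    "ATCC 13883": (8, 2, 8),
    "ATCC 700603": (6, 0, 9),
}

results = {}
for strain, (tp, fp, n_ref) in cases.items():
    sens, spec = sensitivity_specificity(tp, fp, n_ref)
    results[strain] = (sens, spec)
    print(f"{strain}: sensitivity {sens:.2%}, specificity {spec:.2%}")
```

Note that specificity is computed over the classes absent from the reference profile (38 minus the number of reference classes), which is why two false positives cost more specificity for ATCC BAA-2146 than for ATCC 13883.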
To evaluate the overall performance of the antibiotic gene profile identification, we varied the stringency threshold to obtain different levels of sensitivity. Figure 5 shows the Receiver Operating Characteristic curves of antibiotic resistance gene class detection after 12 hours of sequencing of the three K. pneumoniae strains. In all three strains, the pipeline was able to detect 100% of the antibiotic resistance gene classes with a false positive rate of at most 23.80%. On the other hand, it detected over three quarters of the antibiotic resistance gene classes without any false positives. Notably, it attained a perfect identification result (AUC = 1.0) for sample ATCC 13883, which was sequenced with the improved R7.3 chemistry.

DISCUSSION
One of the major bottlenecks to routine establishment of whole genome pathogen sequencing in the clinic is the lack of a sequencing technology that can return definitive results within a few hours. In this paper we have demonstrated the potential of the Nanopore MinION sequencing device to return clinically actionable results on the pathogen species and strain with less than half an hour of sequencing time, and on the drug resistance profile, useful for therapeutic treatment, within a few hours. This is a major step forward for realising the promise of whole genome sequencing in the clinic. Bradley et al have also recently shown that they could accurately identify the drug resistance profile of a Mycobacterium tuberculosis sample from 8 hours of MinION sequencing using a de Bruijn graph approach 18 (bioRxiv preprint).
One of the major advantages of a whole-genome sequencing approach to drug resistance profiling is that it is not necessary to restrict the analysis to a limited panel of drug-resistance tests but it is possible to discover the complete drug resistance profile in a sample.By having a complete picture of the drug-resistance profile within a few hours, the clinician is able to design an antibiotic treatment regimen which is both much more likely to succeed and less likely to induce further antibiotic resistance.
Whole genome sequencing has further advantages that we have not explored here, but have been addressed by others, including the ability to track the progression of an outbreak. Traditional approaches to this problem require high confidence SNP calls, which are likely to become more viable as the Nanopore sequencing technology improves and as error-correction algorithms become more sophisticated.
Clearly, real-time Nanopore sequencing has a promising future in clinical pathogen sequencing.

METHODS

DNA extraction
Bacterial strains K. pneumoniae ATCC 13883, ATCC 700603 and ATCC BAA-2146 were obtained from the American Type Culture Collection (ATCC, USA). Bacterial cultures were grown overnight from a single colony at 37 °C with shaking (180 rpm). Whole cell DNA was extracted from the cultures using the DNeasy Blood and Tissue Kit (QIAGEN®, Cat # 69504) according to the bacterial DNA extraction protocol with enzymatic lysis pre-treatment.

MinION library preparation -R7
Library preparation was performed using the Genomic DNA Sequencing kit (SQK-MAPP-002) (Nanopore) according to the manufacturer's instructions. Briefly, 1 µg of genomic DNA was sheared to 10 kb fragment size using a Covaris g-TUBE. The sheared DNA was end repaired using the NEBNext End Repair Module (New England Biolabs) in a total volume of 100 µL and incubated at 20 °C for 30 minutes. The end repaired DNA was purified using 1x volume (100 µL) Agencourt Ampure XP beads (Beckman Coulter) according to the manufacturer's instructions. Purified end repair products were eluted in 42 µL of molecular grade water and dA-tailing was performed using the NEBNext dA-tailing module (New England Biolabs) in a total volume of 50 µL and incubated at 37 °C for 30 minutes. Ligation was performed using the reagents supplied by Nanopore and T4 DNA ligase from New England Biolabs. The dA-tailed DNA was mixed with 10 µL of adapter mix, 10 µL of HP adapter, 20 µL of 5x ligation buffer and 10 µL of T4 DNA ligase (20000 units per reaction) and incubated at room temperature for 10 minutes. The adapter-ligated DNA was purified using 0.4x volume (40 µL) Agencourt Ampure XP beads (Beckman Coulter) according to the manufacturer's instructions with slight modifications: the wash buffer and elution buffer supplied by Nanopore were used, and only a single wash was performed. Samples were eluted in 25 µL of elution buffer. The ligated DNA was mixed with 10 µL of tether and incubated at room temperature for 10 minutes. Finally, 15 µL of HP motor was added to the reaction and incubated at room temperature for 16 hours.

MinION library preparation -R7.3
For the R7.3 flow cells, an updated Genomic Sequencing kit (SQK-MAPP-003) (Nanopore) was used according to the manufacturer's instructions. Purified end repair products were eluted in 25 µL of molecular grade water and dA-tailing was performed in a total volume of 30 µL. The dA-tailed DNA was mixed with 10 µL of adapter mix, 10 µL of HP adapter and 50 µL of Blunt/TA ligase master mix (New England Biolabs) and incubated at room temperature for 10 minutes. The adapter-ligated DNA was purified using 0.4x volume (40 µL) Agencourt Ampure XP beads (Beckman Coulter) according to the manufacturer's instructions with slight modifications: the wash buffer and elution buffer supplied by Nanopore were used, and only a single wash was performed. Samples were eluted in 25 µL of elution buffer.

MinION Sequencing
For each sample, a new MinION flow cell (R7 or R7.3) was used for sequencing. The flow cell was inserted into the MinION device and, prior to sequencing, was primed twice using 150 µL of EP buffer, with a 10 minute incubation after each addition. The sequencing library mix was prepared by combining 6 µL of library with 140 µL of EP buffer and 4 µL of fuel mix. The library mix was loaded onto the MinION flow cell and the Genomic DNA 48 hour sequencing protocol was initiated in the MinKNOW software. The MinION flow cell was topped up with fresh library mix every 12 hours as required.

MinION data analysis
The sequence read data were base called with the Metrichor software using the workflow r7 2D version 1.12.
We developed the Nanopore Reader (npreader) program (available at https://github.com/mdcao/npReader) to convert base-called sequence data in fast5 format to fastq format. The npreader program also extracted the time at which each read was sequenced and used this information to sort the read sequences in the order they were produced. We streamed read data in this order to the analysis pipelines presented below and took measurements at every five minutes of sequencing data.
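The replay scheme described above (sort reads by the time they were produced, then take a measurement every five minutes) can be sketched in a few lines of Python. This is an illustrative sketch only and does not reflect how npreader is implemented: the `Read` class, its `seconds` field, the `checkpoints` function and the toy reads are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seconds: float  # assumed field: time since run start when the read was produced


def checkpoints(reads, interval_s=300):
    """Yield (elapsed_seconds, reads_so_far) at each interval boundary.

    Reads are first sorted by production time, so every checkpoint contains
    exactly the data that would have been available at that point in the run.
    """
    ordered = sorted(reads, key=lambda r: r.seconds)
    if not ordered:
        return
    last = ordered[-1].seconds
    seen = 0
    t = interval_s
    while True:
        while seen < len(ordered) and ordered[seen].seconds <= t:
            seen += 1
        yield t, ordered[:seen]
        if t >= last:
            break
        t += interval_s


# Toy reads in arbitrary order (illustrative only).
reads = [Read("r3", 310.0), Read("r1", 10.0), Read("r4", 650.0), Read("r2", 290.0)]
summary = [(t, len(batch)) for t, batch in checkpoints(reads)]
print(summary)
```

Each checkpoint passes the cumulative set of reads to downstream analysis, so a measurement at a given time reflects everything sequenced up to that point.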

Species typing
We downloaded the bacterial genome database on GenBank (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/, accessed 19 Nov 2014), which contained high quality genomes of 2785 bacterial strains from 1487 species (see Supplementary Spreadsheets). Our species typing pipeline streamed read data from npreader directly to BWA 19 (see Supplementary Note 1), which aligned the reads to the bacterial genome database. Output from BWA was streamed directly into our species typing pipeline, which calculated the proportion of reads aligned to each of these species. We used the MultinomialCI package in R 20 to calculate 95% confidence intervals in the value of this proportion.

Figure 1: Real-time analysis of Nanopore MinION sequence data. (a) Timeline for laboratory workflow and comprehensive bacterial genome analysis using Nanopore MinION sequencing. Species and strain information from a DNA sample can be determined within 3 to 4 hours, and the antibiotic resistance profile can be determined within 15 hours. (b) Real-time analysis pipeline of Nanopore MinION sequence data. All parts of the pipeline are processed simultaneously to allow for real-time analysis. Raw sequence reads from Nanopore are base called using the Metrichor program. Base-called reads, which are in fast5 format, are converted into fastq format using our Nanopore Reader program. Converted fastq reads are streamed into the species detection, strain typing and antibiotic resistance profile modules.

Figure 2: Sequencing yield over time for all 3 Klebsiella pneumoniae samples. (a) Number of reads over time and (b) number of base pairs sequenced over time.

Figure 4: Real-time identification of strain type from Nanopore MinION sequence reads on three different K. pneumoniae strains: (a) ATCC BAA-2146, (b) ATCC 700603 and (c) ATCC 13883. All three strains were successfully identified using the pipeline.

Figure 5: Receiver operating characteristic curves of antibiotic gene profile identification for three K. pneumoniae strains, ATCC BAA-2146, ATCC 700603 and ATCC 13883, after 12 hours of sequencing.

Table 2: Multi-locus strain-typing results for three K. pneumoniae strains. The top five probable MLST types are shown for each sample.

Table 3: Timeline of detecting antibiotic resistance gene classes from Nanopore MinION sequencing data on three K. pneumoniae strains, ATCC BAA-2146, ATCC 700603 and ATCC 13883. Number of antibiotic classes detected in the reference. TP: true positive; FP: false positive.