Abstract

Motivation

Oxford Nanopore MinION is a portable DNA sequencer that is marketed as a device that can be deployed anywhere. Current base callers, however, require a powerful GPU to analyze data produced by MinION in real time, which hampers field applications.

Results

We have developed a fast base caller, DeepNano-blitz, that can analyze the data stream from up to two MinION runs in real time using a common laptop CPU (i7-7700HQ), with no GPU requirements. The base caller settings allow trading accuracy for speed, and the results can be used for real-time run monitoring (e.g. sample composition, barcode balance or species identification) or for prefiltering of results for more detailed analysis (e.g. filtering out human DNA from human–pathogen runs).

Availability and implementation

DeepNano-blitz has been developed and tested on Linux and Intel processors and is available under MIT license at https://github.com/fmfi-compbio/deepnano-blitz.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

We introduce DeepNano-blitz, a very fast base caller for Oxford Nanopore MinION sequencers. MinION is a pocket-sized DNA sequencing machine which measures electric current as a DNA molecule passes through a nanopore. These electric signals are then translated into sequences using base-calling software [such as Guppy (Wick et al., 2019), Chiron (Teng et al., 2018) or WaveNano (Wang et al., 2018)] with about 10–15% error rate. MinIONs are extremely portable and therefore suitable for experiments in the field or deployment in a clinical setting; in fact, all equipment required to extract DNA, prepare sequencing libraries and perform the sequencing fits in a regular travel suitcase (Edwards et al., 2016).

Unfortunately, in the effort to increase accuracy, new versions of base callers have become slower over time. To keep up with the ∼1.5–2.0 million signal readouts per second coming from a successful MinION run using Guppy, one currently needs a powerful GPU with compute capability of at least 6.1. Such requirements are often not met by regular laptop computers, which are in other ways perfectly capable of running a MinION. Heavy GPU usage also implies high energy consumption, which hampers field deployment of MinION sequencing. Note that Guppy will run without GPU support, but base calling will then take significantly longer than the sequencing itself.

Here, we introduce a new base caller DeepNano-blitz, which can keep up with one or even two MinION sequencers on a single i7-7700HQ 4-core laptop CPU with no GPU, at the cost of slightly reduced accuracy. Our intended use is real-time monitoring of a sequencing run (such as to ascertain proportions of barcodes in a sample, examine the ratio of human versus pathogen DNA, select a minority of interesting reads for in-depth analysis, etc.), even though some of the data may need to be reanalyzed later with a more accurate tool.

2 Materials and methods

DeepNano-blitz is based on a bidirectional recurrent neural network [similar to those used in Guppy (Wick et al., 2019), Chiron (Teng et al., 2018) and DeepNano (Boža et al., 2017)], which is heavily optimized for performance (see Supplementary Fig. S1). The key ingredients that allow us to increase the speed are the use of smaller networks and the optimization of individual network components for efficient use of available CPU instructions.

Fig. 1. Comparison of accuracy versus speed. A single MinION can produce up to 1.5–2 M signal samples per second. Note that the x-axis is in log scale.

The median-normalized raw signal on the input is forked into two channels (identity and squared input). The preprocessing part of the network uses a temporal convolution, followed by temporal max-pooling with stride 3 and a tanh layer. The main part consists of four GRU layers in alternating directions. The last step uses a softmax layer to predict one of {–, A, C, G, T} for each position. The network is trained using the CTC loss (Graves et al., 2006). Decoding uses a beam search with a tunable beam size, dropping beams if the last-step probability is less than a tunable threshold. The details of the training and testing sets are given in Supplementary Section S1.
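The decoding step can be illustrated with a minimal CTC prefix beam search. This is a sketch in Python for readability, not the Rust implementation of DeepNano-blitz: the function name `ctc_beam_search` and the parameters `beam_size` and `threshold` are hypothetical, and the pruning rule (skipping any symbol whose probability at the current step falls below the threshold) is only one plausible reading of the thresholding described above.

```python
from collections import defaultdict

ALPHABET = "-ACGT"  # index 0 is the CTC blank symbol
BLANK = 0


def ctc_beam_search(probs, beam_size=5, threshold=0.01):
    """Decode per-step softmax rows (one probability per ALPHABET symbol).

    Each prefix stores two probabilities: the mass of alignments ending in
    a blank and the mass of alignments ending in a non-blank symbol.
    """
    beams = {"": (1.0, 0.0)}
    for row in probs:
        # Always keep the most likely symbol so the beam never becomes empty.
        keep = max(range(len(row)), key=row.__getitem__)
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            for k, p in enumerate(row):
                if p < threshold and k != keep:
                    continue  # assumed pruning rule: drop unlikely extensions
                if k == BLANK:
                    nxt[prefix][0] += total * p
                    continue
                base = ALPHABET[k]
                if prefix and prefix[-1] == base:
                    # A repeated symbol collapses into the same prefix unless
                    # the previous alignment step was a blank.
                    nxt[prefix][1] += p_non_blank * p
                    nxt[prefix + base][1] += p_blank * p
                else:
                    nxt[prefix + base][1] += total * p
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    return max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]


def one_hot_like(index, high=0.8, low=0.05):
    """Toy softmax row that puts most of the mass on one symbol."""
    return [high if k == index else low for k in range(len(ALPHABET))]


# Toy network output whose most likely alignment is A, A, blank, A, C.
example_probs = [one_hot_like(i) for i in (1, 1, 0, 1, 2)]
print(ctc_beam_search(example_probs, beam_size=5, threshold=0.01))  # prints "AAC"
```

A smaller beam or a higher threshold reduces the number of prefixes examined per step, which is the kind of accuracy-for-speed trade-off the tunable settings expose.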

To achieve greater speed, DeepNano-blitz is written in Rust and hand-optimized. We employ cache-aware memory layouts, fast approximations of the sigmoid and tanh functions (Mineiro, 2011) and the Intel MKL library for matrix multiplication. The implementation performs over 15 floating-point multiplications per CPU cycle, which is close to the architectural maximum.
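To give a sense of what a fast activation approximation looks like, the sketch below replaces the exponential in tanh with a clamped rational function and derives the sigmoid from it. This is an illustrative stand-in, not the bit-level approximations of Mineiro (2011) and not the code used in DeepNano-blitz; the function names `fast_tanh` and `fast_sigmoid` are hypothetical.

```python
import math


def fast_tanh(x: float) -> float:
    """Clamped rational approximation of tanh that avoids calling exp."""
    if x <= -3.0:
        return -1.0
    if x >= 3.0:
        return 1.0
    x2 = x * x
    # The expression equals exactly +/-1 at x = +/-3, so the clamp is continuous.
    return x * (27.0 + x2) / (27.0 + 9.0 * x2)


def fast_sigmoid(x: float) -> float:
    """Sigmoid via the identity sigmoid(x) = (1 + tanh(x / 2)) / 2."""
    return 0.5 * (1.0 + fast_tanh(0.5 * x))


# Measure the worst-case absolute error on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_tanh_error = max(abs(fast_tanh(x) - math.tanh(x)) for x in grid)
max_sigmoid_error = max(
    abs(fast_sigmoid(x) - 1.0 / (1.0 + math.exp(-x))) for x in grid
)
print(f"max |tanh error| = {max_tanh_error:.4f}")
print(f"max |sigmoid error| = {max_sigmoid_error:.4f}")
```

An approximation of this kind uses only multiplications, additions and one division, which vectorizes well; whether the resulting error is acceptable depends on how the network was trained.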

3 Results

To evaluate the speed and accuracy, we have used a benchmark dataset of R9.4.1 Klebsiella pneumoniae (Wick et al., 2019). The base calls were mapped to the reference using minimap2 (Li, 2016) and the read accuracy was computed as one minus the ratio of the alignment edit distance to the length of the base call. We report the median read accuracy. Figure 1 shows the comparison of various settings of DeepNano-blitz against Guppy 3.4.4 (CPU version). In the fastest mode, DeepNano-blitz runs over 100× faster than Guppy in high accuracy mode and ∼13× faster than Guppy in fast mode. The accuracy difference between the version that can keep up with 2 million readouts per second (width64-beam5) and Guppy in fast mode is <2 percentage points. For more complete results, including a human dataset (Jain et al., 2018), see Supplementary Table S2.
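The accuracy measure defined above can be written out as a short sketch. The function names and the toy numbers are hypothetical; in practice the edit distance would be taken from the alignment (for example from the NM tag that minimap2 reports), although how the authors extracted it is not stated here.

```python
from statistics import median


def read_accuracy(edit_distance: int, basecall_length: int) -> float:
    """Read accuracy = 1 - (alignment edit distance / length of the base call)."""
    if basecall_length <= 0:
        raise ValueError("basecall_length must be positive")
    return 1.0 - edit_distance / basecall_length


def median_read_accuracy(reads) -> float:
    """Median accuracy over (edit_distance, basecall_length) pairs."""
    return median(read_accuracy(dist, length) for dist, length in reads)


# Toy input: three reads of 1000 bases with 100, 150 and 80 edits.
toy_reads = [(100, 1000), (150, 1000), (80, 1000)]
print(median_read_accuracy(toy_reads))
```

Reporting the median rather than the mean keeps a small number of very poor reads from dominating the summary.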

A few percentage points in read accuracy can sometimes significantly impact the accuracy of downstream analysis. To assess this effect, we have analyzed the ZymoBIOMICS Microbial Community Standards dataset (Nicholls et al., 2019; Oxford Nanopore Technologies, 2019) with both DeepNano-blitz and Guppy 3.4.4. Each read was base called and then mapped to the reference genomes using minimap2 to estimate the composition of the sample. Guppy in high accuracy mode resulted in 94.5% of reads successfully mapped, while the fastest version of DeepNano-blitz mapped 90.7% of reads. There were no significant differences between the estimated proportions of reads (Supplementary Table S3).

We have also assessed the ability of DeepNano-blitz to monitor the balance of a barcoded sample, using a public dataset of 12 barcoded bacterial samples (Wick et al., 2018). After base calling, we have used guppy barcoder with standard settings to classify reads into barcodes. As expected, DeepNano-blitz results in lower recall (∼26% of reads classified as unknown barcode with the width64-beam5 setting versus 17% with Guppy in fast mode); yet, there is no significant difference in precision (∼96% with width64-beam5 versus 97% with Guppy in fast mode) and there are no significant differences in estimates of the barcode composition of the sample between DeepNano-blitz and Guppy 3.4.4 (Supplementary Table S4). Note that guppy barcoder is optimized for Guppy; with custom barcode recognition settings, the recall of DeepNano-blitz may be improved.
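The barcode statistics discussed above can be sketched as follows. The definitions are assumptions made for illustration: the classified fraction is the share of reads not labelled "unknown", precision is the share of classified reads whose barcode matches a known true label, and composition is the share of each barcode among classified reads. The function name `barcode_summary` and the label strings are hypothetical and do not correspond to guppy barcoder output.

```python
from collections import Counter


def barcode_summary(calls, truth, unknown_label="unknown"):
    """Return (classified_fraction, precision, composition) for barcode calls.

    calls and truth are equal-length sequences of barcode labels; a call
    equal to unknown_label means the read was left unclassified.
    """
    if len(calls) != len(truth) or not calls:
        raise ValueError("calls and truth must be non-empty and of equal length")
    assigned = [(c, t) for c, t in zip(calls, truth) if c != unknown_label]
    classified_fraction = len(assigned) / len(calls)
    if not assigned:
        return classified_fraction, 0.0, {}
    precision = sum(c == t for c, t in assigned) / len(assigned)
    counts = Counter(c for c, _ in assigned)
    composition = {barcode: n / len(assigned) for barcode, n in counts.items()}
    return classified_fraction, precision, composition


# Toy example with five reads and two barcodes.
toy_calls = ["bc01", "bc01", "bc02", "unknown", "bc02"]
toy_truth = ["bc01", "bc02", "bc02", "bc01", "bc02"]
fraction, precision, composition = barcode_summary(toy_calls, toy_truth)
print(fraction, precision, composition)
```

Under these definitions, a base caller can leave more reads unclassified while still giving a similar composition estimate, provided the unclassified reads are spread roughly evenly across barcodes.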

4 Discussion

DeepNano-blitz provides a fast alternative to Oxford Nanopore base callers for the analysis of MinION data. It allows trading accuracy for speed and enables real-time data analysis without requiring a powerful GPU. We believe that DeepNano-blitz enhances the ability to deploy MinION sequencing in the field and enables building custom analysis pipelines that monitor MinION sequencing runs in real time.

Funding

This research was supported in part by funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 872539, and by grants from the Slovak Research and Development Agency (APVV-18-0239) and VEGA (1/0458/18 to T.V. and 1/0463/20 to B.B.).

Conflict of Interest: none declared.

References

Boža V. et al. (2017) DeepNano: deep recurrent neural networks for base calling in MinION nanopore reads. PLoS One, 12, e0178751.

Edwards A. et al. (2016) Extreme metagenomics using nanopore DNA sequencing: a field report from Svalbard, 78°N. bioRxiv, doi: 10.1101/073965.

Graves A. et al. (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, ACM, New York, pp. 369–376.

Jain M. et al. (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol., 36, 338–345.

Li H. (2016) Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 32, 2103–2110.

Mineiro P. (2011) Fast approximate logarithm, exponential, power and inverse root. http://www.machinedlearnings.com/2011/06/fast-approximate-logarithm-exponential.html. (20 January 2020, date last accessed).

Nicholls S.M. et al. (2019) Ultra-deep, long-read nanopore sequencing of mock microbial community standards. Gigascience, 8, giz043.

Oxford Nanopore Technologies (2019) Zymo mock community data release 2019-02. https://github.com/nanoporetech/zymo-data. (20 January 2020, date last accessed).

Teng H. et al. (2018) Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. Gigascience, 7, giy037.

Wang S. et al. (2018) WaveNano: a signal-level nanopore base-caller via simultaneous prediction of nucleotide labels and move labels through bi-directional WaveNets. Quant. Biol., 6, 359–368.

Wick R.R. et al. (2018) Deepbinner: demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks. PLoS Comput. Biol., 14, e1006583.

Wick R.R. et al. (2019) Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol., 20, 129.

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Associate Editor: Inanc Birol
