Comparison of single-nucleotide variants identified by Illumina and Oxford Nanopore technologies in the context of a potential outbreak of Shiga toxin–producing Escherichia coli

Abstract
Background
We aimed to compare Illumina and Oxford Nanopore Technology (ONT) sequencing data from the 2 isolates of Shiga toxin–producing Escherichia coli (STEC) O157:H7. The aim was to determine whether the 2 technologies identified concordant single-nucleotide variants and whether they supported the same inference of relatedness.

Results
For the Illumina workflow, the time from DNA extraction to availability of results was ∼40 hours. With the ONT workflow, serotyping, Shiga toxin subtyping and variant identification were available within 7 hours. After optimization of the ONT variant filtering, methylated positions in the described 5-methylcytosine motif, CC(A/T)GG, accounted for 95% of the discrepant positions between the technologies on average. The technologies still disagreed at a few variants (6 and 7 differences for the 2 isolates), and both methodologies probably made some false calls at these positions.

Conclusions
Despite these discrepancies, the Illumina and ONT sequences from the same case were placed at the same phylogenetic location against a dense reference database of STEC O157:H7 genomes sequenced using the Illumina workflow. Robust single-nucleotide polymorphism (SNP) typing using MinION-based variant calling is possible, and we provide evidence that the 2 technologies can be used interchangeably to type STEC O157:H7 in a public health setting.
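The abstract attributes most ONT/Illumina discrepancies to 5-methylcytosine in the CC(A/T)GG motif. The sketch below shows one way to flag such positions. It assumes that the reference sequence is available as a string and that discrepant positions are 0-based offsets into it. VCF positions are 1-based and would need converting first. The function names `motif_positions` and `split_discrepant` are hypothetical, and this is not the filtering pipeline used in the study.

```python
import re
from typing import Iterable, List, Set, Tuple

# CC(A/T)GG, i.e. CCWGG in IUPAC notation. The motif is its own reverse
# complement, so scanning the forward strand also finds reverse-strand sites.
# Two CCWGG sites can never overlap, so non-overlapping finditer misses none.
MOTIF = re.compile(r"CC[AT]GG")


def motif_positions(reference: str) -> Set[int]:
    """Return the 0-based positions covered by any CC(A/T)GG site."""
    covered: Set[int] = set()
    for match in MOTIF.finditer(reference.upper()):
        covered.update(range(match.start(), match.end()))
    return covered


def split_discrepant(reference: str, positions: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Split discrepant positions into (inside a motif site, outside any site)."""
    positions = list(positions)  # allow generators, which would otherwise be consumed once
    covered = motif_positions(reference)
    inside = sorted(p for p in positions if p in covered)
    outside = sorted(p for p in positions if p not in covered)
    return inside, outside


# Toy example with an invented reference and invented discrepant positions.
ref = "AACCAGGTTTCCTGGAA"
inside, outside = split_discrepant(ref, [4, 8, 12])
print(inside, outside)  # [4, 12] [8]
```

In the toy reference, positions 4 and 12 lie inside the CCAGG and CCTGG sites and position 8 does not. The motif-associated fraction here would be `len(inside) / 3`.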


Single nucleotide polymorphism (SNP) typing offers an unprecedented level of strain discrimination and can be used to quantify the genetic relatedness between groups of genomes. In general, for clonal bacteria, fewer polymorphisms between a pair of strains mean less time since divergence from a common ancestor. Fewer polymorphisms therefore increase the likelihood that the strains come from the same source population. Variant detection for typing must consequently be accurate and highly specific, and it should concentrate on positions of neutral evolution. Only then can the sequence data be interpreted correctly within the epidemiological context of an outbreak. Different bioinformatics approaches to variant identification have previously been shown to vary in which variants they detect [7,8].
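The paragraph above describes quantifying relatedness by counting polymorphisms between pairs of genomes. The sketch below counts pairwise SNP differences across an existing whole-genome alignment. It makes two assumptions: the sequences are already aligned to a common reference, and only positions where both genomes carry an unambiguous base (A/C/G/T) are compared, with `N` or gaps treated as missing. It does not apply the filtering the text calls for, such as restricting to positions of neutral evolution. The sequence names and bases are invented for illustration.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID = set("ACGT")  # ambiguous or missing calls (N, -, IUPAC codes) are skipped


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID and y in VALID and x != y
    )


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of sequences, keyed by sorted names."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Invented toy alignment: one isolate sequenced twice, plus a second isolate.
alignment = {
    "s1_illumina": "ACGTTCGAAC",
    "s1_ont": "ACGTTCGANC",  # the N at position 8 is ignored, not counted
    "s2_illumina": "ACGATCGTAC",
}
for pair, d in pairwise_distances(alignment).items():
    print(pair, d)
```

The two reads of the same toy isolate differ only at a masked position, so their distance is 0. Each differs from the second isolate at 2 positions.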

Comparison of typing results generated by Illumina and ONT workflows
To assess whether real-time sequencing could enable earlier outbreak detection, we evaluated the timelines from DNA extraction to result generation for the Illumina and ONT workflows (Figure 1). We also plotted the relationship between yield, time and genome coverage (Figure 2). For the ONT workflow, the time from DNA extraction to completion of the sequencing run was 28 hours. The run yielded 0.45 Gbases for the isolate from Case A and 0.59 Gbases for the isolate from Case B. This corresponds to an equivalent coverage of the Sakai O157 STEC reference genome (5.4 Mb) of 81.29X for isolate A and 108.30X for isolate B. The average Phred quality score across all reads was 9.87 for Case A and 9.47 for Case B, or approximately 1 error every 10 bases. Base-calling and analysis were performed in real time. Serotyping, Shiga toxin subtyping and variant identification results were available 6 hours and 20 minutes into the 24-hour sequencing run. For the Illumina sequencing workflow, the time from DNA extraction to availability of results was just under 40 hours, assuming no breaks in the process (Figure 1).
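The coverage and error-rate figures in the paragraph above follow from two standard conversions. Equivalent coverage is total yield divided by genome length, which assumes every sequenced base maps to the reference. A Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below applies both conversions to the yields and mean scores reported for Case A and Case B. The helper names are hypothetical. With the rounded yields and a 5.4 Mb genome it gives roughly 83X and 109X rather than the reported 81.29X and 108.30X. The gap may reflect rounding of the published yields or the exact reference length used, but this has not been checked.

```python
import math
from typing import Sequence


def fold_coverage(yield_bases: float, genome_length_bases: float) -> float:
    """Equivalent (raw) coverage: total sequenced bases / reference length."""
    return yield_bases / genome_length_bases


def phred_to_error(q: float) -> float:
    """Error probability for Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_error_rate(qualities: Sequence[float]) -> float:
    """Mean of per-read error probabilities (not the error at the mean Q)."""
    return sum(phred_to_error(q) for q in qualities) / len(qualities)


GENOME = 5.4e6  # Sakai reference length, as rounded in the text
for label, gbases, q in [("Case A", 0.45, 9.87), ("Case B", 0.59, 9.47)]:
    cov = fold_coverage(gbases * 1e9, GENOME)
    p = phred_to_error(q)
    print(f"{label}: ~{cov:.1f}X, P(error) at mean Q = {p:.3f} (~1 in {1 / p:.1f} bases)")
```

The Phred-to-probability conversion is convex. The error probability implied by a mean Phred score is therefore lower than or equal to the mean of the per-read error probabilities. `mean_error_rate` gives the latter when per-read scores are available.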