NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy

Abstract
Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers.
Results: The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. Users do not need programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads.
Conclusions: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into the NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.

I'm glad you appreciated the opportunity to explore NanoGalaxy, and I want to thank you for recommending the publication. We have changed the manuscript according to your minor comments.
As far as I can tell, the primary publication for Galaxy is not directly cited, nor is the original 2005 paper cited (although on reflection, the 2018 paper would be a better citation). It would be a good idea to include a citation specific to Galaxy, as recommended in the NanoGalaxy 'How to Cite Galaxy' information: https://galaxyproject.org/citing-galaxy/#primary-publication The Galaxy citation would be best in the first sentence of the last paragraph of the introduction. According to that page, the citation for Galaxy Toolshed [reference #12 in the manuscript] should be Blankenberg et al., 2014.
Thanks again for pointing out this issue and for expanding the explanation. We have added the citation at the suggested location in the manuscript.
[Tried to set stopOnLowCoverage=0, but this option wouldn't be accepted] On further reflection, it would help if assembly tools would give a warning (especially for datasets > 100 Mb) about the time taken for the assembly step.
Thank you for trying our tools. We have added the following note to the assembly tools based on your feedback: "NOTE: This tool may take a long time depending on the size and characteristics of your dataset."
Given troubles with my own datasets, I tried to find / test out the "Use Case 1: Long Read Sequencing Analysis" workflow. Unfortunately, the manuscript doesn't include information about input datasets. It mentions "In this worked example", but I can't see a worked example anywhere. Earlier in the manuscript, I see a reference to the Galaxy training materials website, https://training.galaxyproject.org/, and eventually found (after searching for "nanopore") a Galaxy Training workflow with the title "Antibiotic resistance detection", with authors matching the authors of this manuscript. That title doesn't seem to be anywhere in the manuscript. It would be helpful if a direct link to this workflow were added to the paper, or at the very least a reference to the workflow title, "Antibiotic resistance detection": https://training.galaxyproject.org/training-material/topics/metagenomics/tutorials/plasmid-metagenomics-nanopore/tutorial.html
Indeed, the Galaxy training website is mentioned, but not the name of the related training. Therefore, the name of the training has been added to the manuscript.
After accessing and reading this workflow, I was able to run through the example successfully and carry out all operations:
* running nanoplot
* running all-vs-all minimap2 on the collected dataset
* generating an assembly with miniasm
* converting the resultant GFA to FASTA format
* remapping reads to the assembled genomes with minimap2
* consensus calling with Racon
* visualising miniasm assembly graph with Bandage
* assembling using unicycler with nanopore-only reads
Unfortunately, I couldn't find 'PlasFlow' or 'staramr' in the Tools search to attempt those steps. It would be a good idea to either add these to NanoGalaxy (in a way that they're visible by search), or create an alternative workflow that stops at the assembly step.
assembly, when using the parameters stated in the workflow. I expect that the inconsistency with toolsets will improve over time as more tool paths get more eyes and more use.
We hope that our changes improve the manuscript as expected.
All the best, Willem

Background
Short-read sequencing has become a routine technique within clinical diagnostics [1]. However, the short length of the reads obtained (150-300 bp) complicates the assembly of genomes, especially for highly repetitive regions, and the detection of structural variation [2,3,4]. Furthermore, even "state-of-the-art" algorithms cannot overcome the issues associated with genome mapping or assembly using short-read sequences. Importantly, advances in sequencing technology now allow long-read sequencing to be performed. The two prominent long-read sequencing platforms are nanopore sequencing by Oxford Nanopore Technologies and single-molecule real-time sequencing by PacBio [5,6]. These platforms generate sequence reads much longer than the classic short-read technologies (>10 kilobases on average), including long reads from single DNA molecules without the need for PCR amplification. Moreover, utilising these technologies, library preparation and sequencing may be performed outside of traditional research laboratories, with sequencing outputs generated in real time [7]. Protocols that require no PCR amplification also permit the direct detection of base modifications [8].
Analyzing the large amount of data generated by the short- and long-read sequencing technologies is a complex, multistep process that is computationally intensive and often requires bioinformatics expertise. Specifically, for each step in the analysis, a set of different tools or software may be needed. For example, de novo assembly is performed via a combination of multiple alignment, assembly and polishing tools, each utilizing its own input parameters. Such tools are typically executed from a UNIX command line and require extensive computational resources, adding to the complexity of the analysis process. Command-line based workflow managers such as Snakemake and Nextflow [9,10] can be used for analysing the data. However, these solutions require expertise in working from the command line. On the other hand, some web-based solutions have also been offered. For example, the EPI2ME platform offers a cloud-based solution with a web interface. The platform supports practical solutions for a limited set of application scenarios, and provides limited flexibility for configuring the underlying workflows. In contrast, the Galaxy platform combines a high degree of flexibility, similar to the command-line based workflow managers, with an accessible web interface.
The Galaxy platform reduces the data analysis complexity and implements a standardized and user-friendly interface that accommodates command-line tools and refined workflows complete with their dependencies [11]. The platform hosts a wide range of tools/software and is widely used for bioinformatics analysis within the biological science community [12,13]. Here we introduce the NanoGalaxy toolkit for analysing Nanopore long-read data. NanoGalaxy comprises a series of integrated Galaxy-based tools that enable researchers to generate powerful short- or long-read sequence assemblies for genomic and plasmid bioinformatics analyses. The NanoGalaxy toolkit is a user-friendly environment that can be utilized inside or outside of traditional research laboratories.

Findings

Tools
We have integrated a large collection of long-read sequence tools into the Galaxy platform as the NanoGalaxy toolkit, covering diverse applications for the analysis of long-read sequences (Table 1).
This toolkit is freely available from the Galaxy ToolShed, and has additionally been made available as a specialized GalaxyEU subdomain (https://nanopore.usegalaxy.eu).

Workflows
In order to increase the utility of this toolkit, we have developed a set of Galaxy workflows performing common analysis tasks using the tools in the NanoGalaxy toolkit.

Metagenomics taxonomic classification
The base quality of nanopore sequencing reads is constantly improving, making the actual assembly of reads more reliable. Further, the long reads generated by nanopore sequencing can be used to provide valuable information from metagenomics data, including taxonomic classifications.
Kraken2 is a k-mer based classification technique that can efficiently assign taxa to long reads and is resilient to the noisy nature of long-read data. The input reads for Kraken2 are compared to a database containing different classes and domains of life that is pre-indexed for algorithmic efficiency. Within the NanoGalaxy toolkit we provide a workflow for taxonomic classification using Kraken2, including the postprocessing of data and visualization of the results as interactive pie charts using the Krona tool [14].
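To illustrate the general idea of k-mer based classification, the following minimal Python sketch assigns a read to the reference with the most matching k-mers. It is a deliberate simplification and not Kraken2's actual algorithm, which relies on minimizers and lowest-common-ancestor assignment over a taxonomy. The reference sequences, taxon names and the choice of k = 8 are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most unambiguous k-mer hits, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # skip k-mers that are absent or shared between taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy references; real databases hold complete genomes.
references = {
    "taxon_A": "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG",
    "taxon_B": "TTGACCGGTTAACCGGTTGACACACGTGTGCATGC",
}
index = build_index(references, k=8)

# A read drawn from taxon_A with a single substitution (simulated sequencing error).
noisy_read = "ACGTACGTTAGCTAGCTCGGATCCGATCGATT"
print(classify(noisy_read, index, k=8))  # taxon_A
```

A single error only invalidates the k-mers that overlap it, so the remaining k-mers of a long read still vote for the correct taxon. This is one intuition for why k-mer approaches tolerate noisy long reads.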

Nanopolish tutorials
Nanopolish includes an extensive set of software tools for analysing nanopore long-read information at the raw signal level. Further, the accompanying Nanopolish documentation provides intuitive tutorials on common scenarios, such as variation analysis and base methylation calling from the raw and mapped signals (Loman et al. [15]). We have integrated Nanopolish and its tutorials into NanoGalaxy in the form of workflows that can be used by researchers to analyse and interpret common quality values for their data.

De novo assembly of genomes with highly repetitive regions
Compared to short reads, long-read data has the advantage of facilitating the assembly of large genomes that contain high numbers of repetitive elements. Schmid et al. utilised Flye and several other tools to generate a comprehensive assembly of the Pseudomonas koreensis genome, identifying that the genome has near-identical repeat pairs up to 70 kilobase pairs in length [16]. These workflows have also been integrated in the NanoGalaxy toolkit.

Worked example: Antimicrobial resistance
As a further illustration of the utility of the NanoGalaxy toolkit and workflows, we describe below a full end-to-end workflow within Galaxy. This analysis pipeline performs antimicrobial resistance detection in clinical samples. We describe this workflow in more detail in our training manual on the Galaxy Training materials repository (https://training.galaxyproject.org; Antibiotic resistance detection).

Background
According to the World Health Organization (WHO) and the Organisation for Economic Co-operation and Development (OECD), antimicrobial resistance (AMR) has become one of the biggest threats to global health, food security and economic development [17,18]. Approximately 50,000 lives per year are lost due to AMR infections within the USA and Europe [19] and AMR infections are expected to increase, reaching 10 million deaths per year by 2050 [20].
Further, the misuse of antibiotics in medical, veterinary and agricultural sectors continues to contribute to the alarming global rise in antibiotic-resistant infections, an increase that may ultimately lead to an era where common infections could once again be lethal. However, the (rapid) detection of antimicrobial-resistant pathogens and their resistances in diseases, food and the environment is a pillar by which increasing AMR could be detected, monitored and prevented.
Conventional methods for the identification of antimicrobial resistances involve microbial isolation (via culture) and phenotypic typing, which together can take a few days or weeks to complete [21]. Moreover, not all microbial species are amenable to laboratory-based culturing [22]. DNA-sequencing technologies may be utilised to screen the genomes of cultured microorganisms for the presence of antimicrobial resistance genes, which reduces the time to result. Currently, Illumina sequencing is most widely used, but this sequencing technology has difficulties in correctly identifying repetitive insertion sequences, which may flank horizontally acquired genes associated with AMR [23]. Nanopore long-read assemblies could improve the resolution of these repetitive regions.
In this worked example, the outcome of the NanoGalaxy pipeline was compared to the plasmid sequences recovered by Li et al. [31] (Table S1). The pipeline recovered 19 out of 21 plasmids, with an average identity of 97.76%. The number of detected resistance genes was higher than that found by Li et al. [31], which was expected as Staramr [27] includes the PointFinder (chromosomal point mutations) database [32] and current long-read sequencing may generate relatively high sequence error rates.
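For orientation, the recovery figure corresponds to roughly 90% of the reference plasmids. The short Python sketch below reproduces that arithmetic and shows one common way to compute a percent identity from a pairwise alignment. The manuscript does not state how the average identity was calculated, so the `percent_identity` definition (matching columns divided by alignment length) and the toy aligned strings are assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences carry the same base.

    Both inputs are rows of a pairwise alignment, with '-' marking gaps.
    Assumed definition: matches / alignment length * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Counts reported in the text: 19 of 21 plasmids recovered.
recovered, expected = 19, 21
recovery_rate = 100.0 * recovered / expected
print(f"{recovery_rate:.1f}% of plasmids recovered")  # 90.5% of plasmids recovered

# Hypothetical toy alignment with one gap and one mismatch over 10 columns.
print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```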

Use case 2: Combining short- and long-read sequencing
The previously described long-read assembly workflow rapidly assembles genomes. Since short-read sequencing platforms tend to have a higher accuracy at the single-nucleotide level, hybrid solutions that benefit from both short- and long-read data are of special interest. The NanoGalaxy toolkit includes a workflow that processes both long- and short-read sequences. In this respect, Unicycler was integrated into the NanoGalaxy toolkit in order to combine the best features of long- and short-read sequencing technologies. The workflow recommended by the Unicycler developers [33] includes: Trim Galore [34], Porechop [35] and Filtlong [36] for quality trimming; Unicycler [33] for de novo assembly; and Bandage [29] for plasmid visualization. These tools are available as stand-alone tools and combined in a NanoGalaxy workflow.
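For readers who want to relate the Galaxy workflow to its command-line equivalent, the sketch below assembles the corresponding shell commands in Python without executing them. It is an illustration under stated assumptions and not the NanoGalaxy implementation: the file names, the output directory layout, the Filtlong thresholds and the Trim Galore output names are all hypothetical, and the options should be checked against the documentation of the installed tool versions.

```python
import shlex


def hybrid_assembly_commands(r1, r2, long_reads, outdir="hybrid_out"):
    """Return the shell commands of a hybrid assembly, in order, without running them."""
    q = shlex.quote

    # Assumed Trim Galore output names when --basename is used with gzipped input.
    trimmed_r1 = f"{outdir}/sample_val_1.fq.gz"
    trimmed_r2 = f"{outdir}/sample_val_2.fq.gz"
    # Hypothetical intermediate file names for the long reads.
    chopped = f"{outdir}/long.chopped.fastq.gz"
    filtered = f"{outdir}/long.filtered.fastq.gz"
    assembly_dir = f"{outdir}/unicycler"

    return [
        # 1. Quality and adapter trimming of the paired short reads.
        f"trim_galore --paired --basename sample -o {q(outdir)} {q(r1)} {q(r2)}",
        # 2. Adapter removal from the nanopore reads.
        f"porechop -i {q(long_reads)} -o {q(chopped)}",
        # 3. Length and quality filtering of the nanopore reads (thresholds are examples).
        f"filtlong --min_length 1000 --keep_percent 90 {q(chopped)} | gzip > {q(filtered)}",
        # 4. Hybrid de novo assembly from short and long reads.
        f"unicycler -1 {q(trimmed_r1)} -2 {q(trimmed_r2)} -l {q(filtered)} -o {q(assembly_dir)}",
        # 5. Rendering of the assembly graph for visual inspection.
        f"Bandage image {q(assembly_dir + '/assembly.gfa')} {q(outdir + '/assembly.png')}",
    ]


commands = hybrid_assembly_commands("short_R1.fastq.gz", "short_R2.fastq.gz", "nanopore.fastq.gz")
for command in commands:
    print(command)
```

Keeping the commands as data rather than running them directly makes the order of the steps and the hand-over of files between tools explicit, which mirrors how a Galaxy workflow connects tool outputs to tool inputs.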
The assembly graphs shown in Fig. 1 compare the output of the NanoGalaxy toolkit with the results from Wick et al. [33]. The Illumina-only (short-read sequencing) graphs show no clear structure(s), whereas Nanopore-only (long-read sequencing) is able to generate the circularized structure expected of plasmids. The combination of both sequencing techniques gives the clearest view of the circular assemblage expected of plasmids, analogous to the results obtained by Wick et al. [33] (Figure 1). Note that different combinations of short- and long-read tools can be used individually, or combined, to generate personalized workflows.

Conclusion
In this work we covered some important aspects of long-read sequencing analysis with a special focus on ONT sequencing data. We aggregated commonly used tools into a single consistent, interoperable interface and presented solutions for metagenomic analysis and genome assembly. Other long-read sequencing data analysis tools have been developed or are currently under development; however, we have focused on the most established and widely used tools. Nevertheless, we expect that the toolkit will be further extended by the community, as NanoGalaxy is part of the open Galaxy platform and Galaxy community. Lastly, the majority of the integrated tools that support other technologies such as PacBio should also work inside Galaxy. However, here we have intensively tested the integrated tools for ONT data.

Methods

Implementation
The tools and workflows included in the NanoGalaxy toolkit enable non-bioinformatics-trained researchers to perform extensive genomics analysis using long-read sequence data, without the need for any coding skills. All tools and their dependencies are installed on the Galaxy platform and are managed by the Conda framework for dependency management. NanoGalaxy tools and their dependencies are available from the Bioconda Conda channel [37]. The Galaxy wrappers are developed openly on GitHub, utilizing the Travis continuous integration framework [38] for testing, and have been made available on the Galaxy ToolShed [13].

Training Materials
An online training manual for the AMR use case described in this publication, as well as a description of NanoGalaxy tools and end-to-end workflows, can be found on the Galaxy training materials website [39].

Future Work
The availability of long-read sequencing platforms and data analysis tools is relatively new, with improvements in technology and software continually being developed. As more tools become available these will need to be assembled into existing or new toolkits. Additionally, the future availability of toolkits such as NanoGalaxy will help popularise long-read sequencing, while making it accessible to non-bioinformatics-trained researchers of the future.

Table 2. The workflows described in this work are publicly available from the European Galaxy server, as well as published Galaxy histories with an example run of each of these workflows (Table 3).

Availability of supporting data and materials
The data presented here to illustrate the functionality of the tools were obtained from previous publications [40,31].

Competing Interests
The authors declare that they have no competing interests.

Funding
This project was made possible with the support of Support Casper and the Albert Ludwig University of Freiburg. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 825775.

Authors' Contributions
WdK, MM and SH contributed to toolkit development and writing of the manuscript. AH tested and evaluated the tools and suggested modifications, feature requests and user improvements. JH contributed to AMR tool and nanopore sequencing discussions and the writing of the manuscript. MvdB and SF contributed to the tool development. BG contributed to the tool development, manuscript writing and supervised the project. DM, RB, and AS supervised the project.
All authors approved the final version of the manuscript.

Availability of source code and requirements
• Project name: NanoGalaxy
• Project home page: https://nanopore.usegalaxy.eu
• Training Manual: https://training.galaxyproject.org/training-material/topics/metagenomics/tutorials/plasmid-metagenomics-nanopore/tutorial.html
• License: GNU GPL
• BiotoolsID: nanogalaxy
• RRID: SCR_018912

All developed Galaxy wrappers are available for installation from the Galaxy ToolShed (https://toolshed.g2.bx.psu.edu/). The corresponding code repositories for the tool wrappers are

Figure caption (fragment): [33]. The plasmid assembly graphs output created by Bandage [29] are shown to confirm that the workflow works as expected. The length distribution, total yield and N50 of the Oxford Nanopore Technologies (ONT) reads of each K. pneumoniae represent the input data.
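The read statistics named in the caption, total yield and N50, can be computed directly from a list of read lengths. The following Python sketch uses the usual definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases). The read lengths are hypothetical and do not correspond to the K. pneumoniae datasets.

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Reads are visited from longest to shortest; the N50 is the length of the
    read at which the running total first reaches half of the total yield.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths in bases.
read_lengths = [2000, 3000, 4000, 5000, 6000]
total_yield = sum(read_lengths)
print(total_yield, n50(read_lengths))  # 20000 5000
```

Unlike the mean read length, the N50 is weighted by bases and is therefore less affected by a large number of very short reads.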

Table columns: Category, Tool name