Edoardo Giussani, Alessandro Sartori, Angela Salomoni, Lara Cavicchio, Cristian de Battisti, Ambra Pastori, Maria Varotto, Bianca Zecchin, Joseph Hughes, Isabella Monne, Alice Fusaro, FluMut: a tool for mutation surveillance in highly pathogenic H5N1 genomes, Virus Evolution, Volume 11, Issue 1, 2025, veaf011, https://doi.org/10.1093/ve/veaf011
Abstract
Over the past century, Influenza A virus (IAV) has caused four of the five reported pandemics, all of which originated from viruses possessing genome segments of avian origin. The recent spread of highly pathogenic avian influenza (HPAI) viruses, particularly the clade 2.3.4.4b A(H5N1) subtype, has led to an alarming increase in mammalian infections, raising concerns about the potential for future pandemics. In response to this, we developed FluMut, an open-source, cross-platform tool designed to identify molecular markers with potential impacts on H5N1 virus phenotypes. FluMut leverages an up-to-date database, FluMutDB, to rapidly analyze thousands of nucleotide sequences, identifying mutations associated with host adaptation, increased virulence, and antiviral resistance. The tool is available both as a command-line interface and a user-friendly graphical interface, making it accessible to researchers with varying levels of computational expertise. FluMut provides comprehensive outputs, including tables of detected markers, their biological effects, and corresponding literature references. This tool fills a critical gap in the genomic surveillance of HPAI H5N1, facilitating real-time monitoring of viral evolution and aiding in the identification of mutations that may signal increased pandemic potential. Future updates will extend FluMut’s capabilities to other influenza subtypes.
Introduction
Since 1918, Influenza A virus (IAV) has been responsible for four of the five reported pandemics (Piret and Boivin 2021). IAV spillover events from birds to mammals, including humans, continue to be reported, and the number of reported mammalian infections has recently increased. This worrisome trend is mainly the consequence of the spread of the highly pathogenic avian influenza (HPAI) A(H5) viruses of clade 2.3.4.4b, a descendant of the A/goose/Guangdong/1/1996 (Gs/GD) lineage, which was first detected in China in 1996. Since late 2020, this clade has dramatically expanded its geographical distribution: it has been found on all continents except Oceania and remains well entrenched in a number of countries. Among all the co-circulating A(H5) subtypes, the A(H5N1) subtype quickly became the dominant one in wild and domestic birds and has been responsible for a large number of transmission events from birds to marine and terrestrial mammals, including some large outbreaks such as those in cats in Poland (Domańska-Blicharz et al. 2023) and Korea (Lee et al. 2024); in minks, foxes, and raccoon dogs reared in fur farms in Spain and Finland (Agüero et al. 2023, Kareinen et al. 2024); in marine mammals in North (Puryear et al. 2023) and South America (Plaza et al. 2024); and in dairy cattle in the USA (Nguyen et al. 2024). In several of these events, evidence of sustained mammal-to-mammal transmission and of spillback from mammals to birds has been reported (Nguyen et al. 2024, Rimondi et al. 2024).
These episodes, combined with the rapid evolution of IAV and its propensity to reassort, raise the risk that new variants with increased pandemic potential will emerge and spread. Mutations that increase the ability of the virus to replicate in mammalian cells (e.g. E627K, D701N, or T271A in the PB2 protein) have commonly been observed to emerge during A(H5N1) virus replication in mammalian species (Melidou et al. 2024).
In this scenario, genomic surveillance is crucial to monitor influenza virus evolution and promptly identify mutations associated with host adaptation, increased virulence in different hosts, resistance to antiviral drugs, or evasion of the host’s antiviral response. A limited number of tools are already available to identify molecular markers, such as FluSurver (https://flusurver.bii.a-star.edu.sg/, 5 February 2025, date last accessed) and Influenza-Mutation-Checker (https://github.com/dombyrne/Influenza-Mutation-Checker, 5 February 2025, date last accessed). Although extremely useful, these tools are not routinely updated with the most recently described mutations. Moreover, they are not always suited to the analysis of large datasets, or they may require nucleotide sequences to be translated into proteins beforehand. The need to rapidly and continuously monitor the zoonotic potential of the HPAI H5N1 viruses responsible for the ongoing epizootic, through the molecular analysis of thousands of HPAI H5N1 genome sequences, led us to develop FluMut (https://github.com/izsvenezie-virology/FluMut). This user-friendly tool is based on a constantly updated database that includes the most important zoonotic markers described in the literature. It allows rapid analysis of thousands of nucleotide sequences of the H5N1 subtype and provides an output that is easy to read and interpret.
Materials and methods
FluMut and FluMutGUI
FluMut is a command-line tool implemented in Python 3 using standard libraries, Click, openpyxl, and Biopython (Cock et al. 2009). Source code is freely available at https://github.com/izsvenezie-virology/FluMut. FluMut can be installed via PyPI and Bioconda. FluMutGUI (Fig. 1) is the graphical user interface (GUI) of FluMut; it is written in Python 3 using standard libraries, with PyQt5 providing the cross-platform GUI. Source code is freely available at https://github.com/izsvenezie-virology/FluMutGUI. FluMutGUI can be installed via an installer, available on GitHub, or via PyPI. For the Windows platform, it is also available as a stand-alone installer. A detailed explanation of how to install and use FluMut and FluMutGUI is available at https://izsvenezie-virology.github.io/FluMut/.

As input, FluMut takes a nucleotide FASTA file containing complete or partial H5N1 influenza virus sequences. By parsing the FASTA header with a customizable regular expression, FluMut assigns each sequence to a specific segment and sample. The program aligns each input sequence against a reference sequence using the PairwiseAligner class from Biopython (Cock et al. 2009). The reference sequence is a complete, annotated H5N1 sequence available on GISAID with accession ID EPI_ISL_16979821. From the resulting alignment, FluMut identifies the coding region and translates it into the amino acid sequence of each encoded protein. The translation is performed using a custom algorithm, written in Python and available at https://github.com/izsvenezie-virology/FluMut/blob/v.0.6.3/src/flumut/FluMut.py#L247. The algorithm accounts for frameshifts and degenerate bases, ensuring that all possible amino acids encoded by a degenerate codon are considered. Amino acid positions used for mutation detection are adjusted according to any gaps in the aligned reference sequence, which makes it possible to keep track of the correct position of each individual amino acid in the input sequences. Subsequently, FluMut searches for all mutations listed in the associated database (FluMutDB), described in the section below. Detected mutations are then used to query the database and obtain the list of markers.
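The handling of degenerate bases can be illustrated with a minimal sketch. This is not FluMut's own translation routine (that is the code at the URL given above); it only shows the general idea of expanding an IUPAC ambiguity code into every concrete codon it can represent and collecting the resulting amino acids. The names `IUPAC`, `CODON_TABLE`, and `translate_degenerate_codon` are chosen here for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the concrete bases each can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, built from the usual compact TCAG-ordered string.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO_ACIDS)
)


def translate_degenerate_codon(codon: str) -> set:
    """Return every amino acid ('*' for stop) a possibly degenerate codon can encode."""
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or any(base not in IUPAC for base in codon):
        raise ValueError(f"not a valid IUPAC codon: {codon!r}")
    # Expand each position into its possible bases and translate every combination.
    return {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[base] for base in codon))
    }


if __name__ == "__main__":
    # 'RAG' stands for AAG (K) or GAG (E).
    print(sorted(translate_degenerate_codon("RAG")))
```

Under this sketch, a codon read as `RAG` yields both E and K, so a mixed population at a position such as PB2 627 would report both residues rather than an unknown amino acid.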
We define a marker as a group of one or more amino acid mutations that have been demonstrated to have a phenotypic effect on virus properties. Specifically, we consider experimentally verified molecular markers associated with IAV host adaptation, virulence, changes in receptor binding, replicative capacity, and antiviral resistance.
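Because a marker is a group of one or more mutations, one natural way to detect it is a subset test: the marker is reported only when every mutation in its group is present in the sample. The sketch below assumes that rule; it is an illustration, not FluMut's implementation. The function name `find_markers`, the marker identifiers, and the grouping of mutations into `marker_3` are hypothetical; only the mutation names are taken from the Introduction.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical marker catalogue: each marker is a set of required mutations.
# The groupings are illustrative only and are not taken from FluMutDB.
MARKERS: Dict[str, FrozenSet[str]] = {
    "marker_1": frozenset({"PB2:E627K"}),
    "marker_2": frozenset({"PB2:D701N"}),
    "marker_3": frozenset({"PB2:E627K", "PB2:T271A"}),
}


def find_markers(
    sample_mutations: Iterable[str], markers: Dict[str, FrozenSet[str]]
) -> List[str]:
    """Return the markers whose mutations are all present in the sample."""
    present = set(sample_mutations)
    # frozenset <= set is a subset test: every required mutation must be present.
    return sorted(name for name, required in markers.items() if required <= present)


if __name__ == "__main__":
    print(find_markers({"PB2:E627K", "PB2:T271A"}, MARKERS))
```

A sample carrying only PB2:E627K would match `marker_1` but not `marker_3` under this rule, since the second mutation of that group is missing.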
FluMutDB
FluMutDB, available at https://github.com/izsvenezie-virology/FluMutDB, stores all the information FluMut needs to align sequences, extract coding sequences, and find mutations, together with the marker data needed to create the output. FluMutDB was created using the open-source database management system SQLite and is distributed as a Python package available on PyPI and Bioconda. The database schema follows the Third Normal Form (3NF) to avoid data anomalies. It is composed of 11 tables: 6 contain genome information (segments, proteins, annotated reference sequences, and mutations), and the other 5 contain information about markers (i.e. amino acid mutations composing the marker, biological effects, subtypes, and the literature sources) (Fig. 2).
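To make the normalized layout concrete, the following sketch builds a much smaller, hypothetical schema with Python's built-in `sqlite3` module. It is not the FluMutDB schema: the table and column names, the two example markers, and the helper functions `build_demo_db` and `effects_for_mutation` are assumptions made for illustration. It shows only the pattern of linking markers to mutations and to effects through junction tables, so that each fact is stored once.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the real 11-table FluMutDB layout).
SCHEMA = """
CREATE TABLE mutations (name TEXT PRIMARY KEY);
CREATE TABLE markers (id INTEGER PRIMARY KEY);
CREATE TABLE effects (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE markers_mutations (
    marker_id INTEGER NOT NULL REFERENCES markers(id),
    mutation_name TEXT NOT NULL REFERENCES mutations(name),
    PRIMARY KEY (marker_id, mutation_name)
);
CREATE TABLE markers_effects (
    marker_id INTEGER NOT NULL REFERENCES markers(id),
    effect_id INTEGER NOT NULL REFERENCES effects(id),
    PRIMARY KEY (marker_id, effect_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative single-mutation markers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mutations VALUES (?)", [("PB2:E627K",), ("PB2:D701N",)]
    )
    conn.executemany("INSERT INTO markers VALUES (?)", [(1,), (2,)])
    conn.execute(
        "INSERT INTO effects VALUES (1, 'increased replication in mammalian cells')"
    )
    conn.executemany(
        "INSERT INTO markers_mutations VALUES (?, ?)",
        [(1, "PB2:E627K"), (2, "PB2:D701N")],
    )
    conn.executemany("INSERT INTO markers_effects VALUES (?, ?)", [(1, 1), (2, 1)])
    return conn


def effects_for_mutation(conn: sqlite3.Connection, mutation: str) -> list:
    """Return (marker_id, effect_name) rows for markers that include the mutation."""
    cursor = conn.execute(
        """
        SELECT mm.marker_id, e.name
        FROM markers_mutations AS mm
        JOIN markers_effects AS me ON me.marker_id = mm.marker_id
        JOIN effects AS e ON e.id = me.effect_id
        WHERE mm.mutation_name = ?
        ORDER BY mm.marker_id
        """,
        (mutation,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(effects_for_mutation(demo, "PB2:E627K"))
```

The effect name is stored in a single row and shared by both markers through `markers_effects`, which is the kind of redundancy that normalization is meant to remove.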

FluMutDB includes all the markers described by Suttie et al. (2019), as well as additional markers associated with important biological effects, identified through a literature review covering the period from 2018 to September 2024 (see Markers_list.xlsx in supplementary materials). The database will be constantly updated with new markers described in novel studies. A comprehensive, up-to-date list of all markers present in FluMutDB is available at https://izsvenezie-virology.github.io/FluMut/docs/markers. Moreover, users are welcome and encouraged to report new markers of interest for inclusion in the database through GitHub issues in the FluMutDB repository.
Results
FluMut
FluMut and FluMutGUI are open-source, cross-platform tools created to identify markers with potential impact on biological characteristics from sequences of Influenza A viruses of the A(H5N1) subtype.
The FluMut input file consists of a nucleotide FASTA file containing complete or partial H5N1 influenza virus sequences. The main FluMut output is a table containing all markers detected in the input sequences along with their effects, the subtype on which the effect was demonstrated, and the bibliographic references. The secondary outputs of FluMut include a table listing all the literature available in the FluMut database and a table showing the amino acid present in each input sequence at every mutation position detected in at least one sample. The tables can be saved as plain text files or combined into a single Excel file (.xlsm extension) (see output_example.xlsm in supplementary material), containing one sheet for each data table (namely “Mutations,” “Markers,” and “Literature”). Additionally, the Excel output contains two extra tables that make the main output easier to read: “Markers per sample,” which lists the markers identified for each sample (Fig. 3), and “Sample per marker,” which lists the markers identified in the input sequences and the number of samples in which each marker is observed. The Excel output also offers some macros to help users navigate and filter data. An example input file (input_example.fa) and the output files generated from it by FluMut (output_example_markers.tsv, output_example_mutations.tsv, output_example_literature.tsv, and output_example.xlsm) are available in the supplementary material.

Figure 3. “Markers per sample” sheet of a typical output. The complete output can be found in the supplementary materials (output_example.xlsm).
Comparison with other tools
A few other tools are available to search for markers in custom sequences. The best-known programs are FluSurver (https://flusurver.bii.a-star.edu.sg/, 5 February 2025, date last accessed), a web-based tool with a graphical interface that is also integrated into GISAID, and the Influenza Mutation Checker (https://github.com/dombyrne/Influenza-Mutation-Checker, 5 February 2025, date last accessed). The analysis speed of FluMut is comparable to that of these programs. Table 1 shows the main differences among the three tools, which may help end users choose the most appropriate one based on the information and type of output needed, the number of sequences to analyze, and their bioinformatics skills. The strength of FluMut lies in its continuously updated mutation database and its versatility. It is suitable for pipeline integration while remaining user-friendly for less experienced users through its graphical interface.
Feature | FluMut | Influenza Mutation Checker | FluSurver |
---|---|---|---|
Last database update | 12/09/2024 | 30/09/2021 | Unknown |
Marker list available | ✔ | ✔ | ✖ |
Accepts nucleotide sequences | ✔ | ✖ | ✔ |
Indel detection | ✖ᵃ | ✔ | ✔ |
Degenerate codon handling | ✔ | ✖ | ✖ |
Command line interface | ✔ | ✔ | ✖ |
Graphical interface | ✔ | ✖ | ✔ |
GISAID integration | ✖ | ✖ | ✔ |
Analysis on multiple subtypes | ✖ᵃ | ✖ | ✔ |
ᵃ Feature under development.
Discussion
Real-time or near-real-time generation of complete genome sequence data from representative HPAI H5N1 positive samples is highly recommended by international agencies (WOAH 2021, OFFLU 2023). This practice is increasingly adopted by laboratories, as shown by the growing amount of H5N1 genetic data available in public databases. These data are essential to trace the spatio-temporal evolution of the virus, to identify the emergence of novel genotypes from reassortment events, and to identify mutations that may affect the biological characteristics of the virus, above all its zoonotic potential. Inspection of mutations is a crucial step in the genome analysis of avian influenza viruses, but it is extremely challenging and time-consuming because a complete and up-to-date inventory of mutations, and tools that allow their rapid identification in newly generated sequences, have been lacking. FluMut is intended to fill this gap. The command-line version can easily be incorporated into bioinformatics workflows (e.g. bash scripts, Nextflow, or Snakemake). Additionally, to facilitate its extensive use, we have developed and made available a graphical interface, which can be used by researchers with limited or no experience with the command line. The tool is highly flexible: it can analyze anywhere from a single sequence to thousands in a timely manner, without requiring consensus sequences to be translated into amino acid sequences, a task that is often demanding, especially for proteins such as NS2, M2, PA-X, and PB1-F2, which are encoded by alternative open reading frames.
At present, FluMut supports the analysis of influenza virus sequences of the H5N1 subtype only. Extension to all subtypes is currently under development.
Acknowledgements
We would like to thank Gianpiero Zamperin for his suggestions in the development of FluMut.
Supplementary data
Supplementary data are available at Virus Evolution online.
Conflict of interest:
None declared.
Funding
This work was partially supported by the FLU-SWITCH Era-Net ICRAD (grant agreement No. 862605), by EU funding under the NextGeneration EU-MUR PNRR Extended Partnership initiative on Emerging Infectious Diseases (Project No. PE00000007, INF-ACT), and by KAPPA-FLU HORIZON-CL6-2022-FARM2FORK-02-03 (grant agreement No. 101084171). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or REA. Neither the European Union nor the granting authority can be held responsible for them.
Data availability
Source code is freely available on GitHub.
FluMut complete documentation can be consulted at https://izsvenezie-virology.github.io/FluMut.
The examples provided in the supplementary materials were obtained using the complete genome sequences of three viruses, available on GISAID with accession IDs EPI_ISL_19496986 and EPI_ISL_19690438 and on NCBI GenBank with accession numbers PQ898029.1 to PQ898036.1.