Abstract

Over the past century, Influenza A virus (IAV) has caused four of the five reported pandemics, all of which originated from viruses possessing genome segments of avian origin. The recent spread of highly pathogenic avian influenza (HPAI) viruses, particularly the clade 2.3.4.4b A(H5N1) subtype, has led to an alarming increase in mammalian infections, raising concerns about the potential for future pandemics. In response to this, we developed FluMut, an open-source, cross-platform tool designed to identify molecular markers with potential impacts on H5N1 virus phenotypes. FluMut leverages an up-to-date database, FluMutDB, to rapidly analyze thousands of nucleotide sequences, identifying mutations associated with host adaptation, increased virulence, and antiviral resistance. The tool is available both as a command-line interface and a user-friendly graphical interface, making it accessible to researchers with varying levels of computational expertise. FluMut provides comprehensive outputs, including tables of detected markers, their biological effects, and corresponding literature references. This tool fills a critical gap in the genomic surveillance of HPAI H5N1, facilitating real-time monitoring of viral evolution and aiding in the identification of mutations that may signal increased pandemic potential. Future updates will extend FluMut’s capabilities to other influenza subtypes.

Introduction

Since 1918, influenza A virus (IAV) has been responsible for four of the five reported pandemics (Piret and Boivin 2021). Spillover of IAV from birds to mammals, including humans, continues to be reported, and the number of reports of mammalian infections has recently increased. This worrisome trend is mainly a consequence of the spread of highly pathogenic avian influenza (HPAI) A(H5) viruses of clade 2.3.4.4b, descendants of A/goose/Guangdong/1/1996 (Gs/GD), which was first detected in China in 1996. Since late 2020, this clade has dramatically expanded its geographical distribution: it has been found on all continents except Oceania and remains well entrenched in a number of countries. Among the co-circulating A(H5) subtypes, A(H5N1) quickly became dominant in wild and domestic birds and has been responsible for a large number of transmission events from birds to marine and terrestrial mammals, including several large outbreaks: in cats in Poland (Domańska-Blicharz et al. 2023) and South Korea (Lee et al. 2024); in minks, foxes, and raccoon dogs reared on fur farms in Spain and Finland (Agüero et al. 2023, Kareinen et al. 2024); in marine mammals in North (Puryear et al. 2023) and South America (Plaza et al. 2024); and in dairy cattle in the USA (Nguyen et al. 2024). In several of these events, evidence of sustained mammal-to-mammal transmission and of spillback from mammals to birds has been reported (Nguyen et al. 2024, Rimondi et al. 2024).

These episodes, combined with the rapid evolution of IAV and its propensity for reassortment, raise serious concerns about the emergence and spread of new variants with increased pandemic potential. Mutations that enhance viral replication in mammalian cells (e.g. E627K, D701N, or T271A in the PB2 protein) have commonly been observed during A(H5N1) replication in mammalian species (Melidou et al. 2024).

In this scenario, genomic surveillance is crucial to monitor influenza virus evolution and to promptly identify mutations associated with host adaptation, increased virulence in different hosts, resistance to antiviral drugs, or evasion of the host's antiviral response. A limited number of tools for identifying molecular markers are already available, such as FluSurver (https://flusurver.bii.a-star.edu.sg/, 5 February 2025, date last accessed) and Influenza-Mutation-Checker (https://github.com/dombyrne/Influenza-Mutation-Checker, 5 February 2025, date last accessed). Although extremely useful, these tools are not routinely updated with the most recently described mutations. Moreover, they are not always suited to the analysis of large datasets, or they may require nucleotide sequences to be translated into proteins first. The need to rapidly and continuously monitor the zoonotic potential of the HPAI H5N1 viruses responsible for the ongoing epizootic, through the molecular analysis of thousands of genome sequences, led us to develop FluMut (https://github.com/izsvenezie-virology/FluMut). This user-friendly tool is based on a constantly updated database that includes the most important zoonotic markers described in the literature; it allows the rapid analysis of thousands of H5N1 nucleotide sequences and produces output that is easy to read and interpret.

Materials and methods

FluMut and FluMutGUI

FluMut is a command-line tool implemented in Python 3 using the standard library, Click, openpyxl, and Biopython (Cock et al. 2009). Source code is freely available at https://github.com/izsvenezie-virology/FluMut. FluMut can be installed via PyPI and Bioconda. FluMutGUI (Fig. 1) is the graphical user interface (GUI) of FluMut; it is written in Python 3 using the standard library and PyQt5, which provides the cross-platform GUI. Source code is freely available at https://github.com/izsvenezie-virology/FluMutGUI. FluMutGUI can be installed via PyPI or with the installer available on GitHub; for Windows, a stand-alone installer is also provided. Detailed instructions on how to install and use FluMut and FluMutGUI are available at https://izsvenezie-virology.github.io/FluMut/.

Figure 1. FluMutGUI interface.

As input, FluMut takes a nucleotide FASTA file containing complete or partial H5N1 influenza virus sequences. By parsing each FASTA header with a customizable regular expression, FluMut assigns every sequence to a specific segment and sample. Each input sequence is then aligned against a reference sequence using the PairwiseAligner class from Biopython (Cock et al. 2009). The reference is a complete, annotated H5N1 genome available on GISAID with accession ID EPI_ISL_16979821. From the resulting alignment, FluMut identifies the coding regions and translates them into the amino acid sequence of each encoded protein. Translation is performed by a custom algorithm, written in Python and available at https://github.com/izsvenezie-virology/FluMut/blob/v.0.6.3/src/flumut/FluMut.py#L247. The algorithm accounts for frameshifts and degenerate bases, ensuring that all possible amino acids encoded by a degenerate codon are considered. Amino acid positions used for mutation detection are adjusted for any gaps introduced into the reference sequence by the alignment, so that each amino acid in the input sequences keeps its correct position. FluMut then searches for all mutations listed in the associated database (FluMutDB, described in the section below). The detected mutations are used to query the database and retrieve the list of markers.
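
The header-parsing, coordinate and translation steps above can be illustrated with a short, standard-library-only sketch. It is not FluMut's implementation (which lives in the linked FluMut.py and additionally handles frameshifts). It assumes that the pairwise alignment is already available as two gapped strings of equal length, that headers follow a hypothetical `<sample>_<segment>` convention, and that coding regions are given as 1-based reference coordinates; all function names are illustrative.

```python
"""Illustrative sketch: header parsing, reference projection, degenerate translation."""
import itertools
import re

# Hypothetical header convention "<sample>_<segment>"; FluMut lets users supply
# their own regular expression, so this pattern is only an example.
HEADER_RE = re.compile(r"(?P<sample>.+)_(?P<segment>[^_]+)")

# IUPAC nucleotide codes mapped to the unambiguous bases each may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_header(header: str) -> tuple[str, str]:
    """Split a FASTA header into (sample, segment) using HEADER_RE."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"header does not match pattern: {header!r}")
    return match["sample"], match["segment"]


def project_onto_reference(ref_aln: str, qry_aln: str) -> str:
    """Keep only alignment columns where the reference has a base.

    Query insertions (gaps in the reference) are dropped, so index i of the
    result corresponds to reference position i + 1; query deletions stay '-'.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    return "".join(q for r, q in zip(ref_aln, qry_aln) if r != "-")


def translate_codon(codon: str) -> set[str]:
    """Return every amino acid a possibly degenerate codon can encode.

    Raises KeyError for characters that are not IUPAC nucleotide codes.
    """
    choices = [IUPAC[base] for base in codon.upper().replace("U", "T")]
    return {CODON_TABLE["".join(combo)] for combo in itertools.product(*choices)}


def translate_cds(projected: str, start: int, end: int) -> list[set[str]]:
    """Translate reference positions start..end (1-based, inclusive).

    Codons containing a deletion are reported as {"-"}; a trailing partial
    codon is ignored.
    """
    cds = projected[start - 1:end]
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        protein.append({"-"} if "-" in codon else translate_codon(codon))
    return protein
```

Representing each residue as a set rather than a single letter keeps degenerate calls explicit: for instance, `translate_codon("RAG")` returns both `E` and `K`, so a mixed position such as PB2 627 reports the wild-type and mutant residues together.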

We define a marker as a group of one or more amino acid mutations that have been demonstrated to have a phenotypic effect on virus properties. Specifically, we consider experimentally verified molecular markers associated with IAV host adaptation, virulence, changes in receptor binding, replicative capacity, and antiviral resistance.
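
Under this definition, checking for a marker reduces to a set-inclusion test. The sketch below assumes, as an illustration rather than a documented FluMut rule, that a marker is reported only when every one of its mutations is found in the same sample. The `PROTEIN:MUTATION` notation and the marker names are hypothetical; the single PB2 substitutions are those cited in the Introduction, and the two-mutation marker is invented purely to show grouping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    mutations: frozenset[str]  # hypothetical "PROTEIN:WtPosMut" notation
    effect: str


# Illustrative entries only; not taken from FluMutDB.
MARKERS = [
    Marker("PB2 E627K", frozenset({"PB2:E627K"}), "increased replication in mammalian cells"),
    Marker("PB2 D701N", frozenset({"PB2:D701N"}), "increased replication in mammalian cells"),
    Marker("hypothetical combo", frozenset({"PB2:E627K", "PB2:D701N"}), "illustrative only"),
]


def detect_markers(found: set[str], markers: list[Marker]) -> list[str]:
    """Return the names of markers whose mutations are all present in `found`."""
    return [m.name for m in markers if m.mutations <= found]
```

With only `PB2:E627K` detected, just the first marker is returned; adding `PB2:D701N` also triggers the second and the combined marker.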

FluMutDB

FluMutDB, available at https://github.com/izsvenezie-virology/FluMutDB, stores all the information FluMut needs to align sequences, extract coding sequences, and find mutations, as well as all the marker data needed to create the output. FluMutDB was created with the open-source database management system SQLite and is distributed as a Python package available on PyPI and Bioconda. The database schema follows the third normal form (3NF) to avoid data anomalies. It is composed of 11 tables: 6 contain genome information (segments, proteins, annotated reference sequences, and mutations), and the other 5 contain information about markers (the amino acid mutations composing each marker, their biological effects, subtypes, and literature sources) (Fig. 2).
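
Because markers and mutations are related many-to-many, a normalized schema typically stores them in separate tables connected by a link table. The sketch below uses Python's built-in `sqlite3` module with three hypothetical tables (`mutations`, `markers`, `markers_mutations`); they are a simplification and not FluMutDB's actual 11 tables, whose layout is shown in Fig. 2. It also shows how a single query can return only the markers whose mutations were all found, a pattern often called relational division.

```python
import sqlite3

# Hypothetical, heavily simplified schema (not FluMutDB's real tables).
SCHEMA = """
CREATE TABLE mutations (
    name     TEXT PRIMARY KEY,   -- e.g. 'PB2:E627K'
    protein  TEXT NOT NULL,
    position INTEGER NOT NULL,
    alt_aa   TEXT NOT NULL
);
CREATE TABLE markers (
    id     INTEGER PRIMARY KEY,
    effect TEXT NOT NULL
);
-- Link table: one row per (marker, mutation) pair instead of a list column.
CREATE TABLE markers_mutations (
    marker_id     INTEGER NOT NULL REFERENCES markers(id),
    mutation_name TEXT NOT NULL REFERENCES mutations(name),
    PRIMARY KEY (marker_id, mutation_name)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two illustrative markers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mutations VALUES (?, ?, ?, ?)",
        [("PB2:E627K", "PB2", 627, "K"), ("PB2:D701N", "PB2", 701, "N")],
    )
    conn.executemany(
        "INSERT INTO markers VALUES (?, ?)",
        [(1, "increased replication in mammalian cells"), (2, "hypothetical combined marker")],
    )
    conn.executemany(
        "INSERT INTO markers_mutations VALUES (?, ?)",
        [(1, "PB2:E627K"), (2, "PB2:E627K"), (2, "PB2:D701N")],
    )
    return conn


def markers_for(conn: sqlite3.Connection, found: set[str]) -> list[int]:
    """Return the ids of markers whose every mutation is in `found`."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS found (name TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM found")
    conn.executemany("INSERT INTO found VALUES (?)", [(name,) for name in found])
    rows = conn.execute(
        """
        SELECT mm.marker_id
        FROM markers_mutations AS mm
        LEFT JOIN found AS f ON f.name = mm.mutation_name
        GROUP BY mm.marker_id
        HAVING COUNT(f.name) = COUNT(*)  -- no required mutation is missing
        ORDER BY mm.marker_id
        """
    ).fetchall()
    return [marker_id for (marker_id,) in rows]
```

The `LEFT JOIN` leaves `f.name` NULL for every required mutation that was not found, so comparing `COUNT(f.name)` with `COUNT(*)` keeps exactly the complete markers.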

Figure 2. FluMutDB schema. The schema was created with dbdiagram.io.

FluMutDB includes all the markers described by Suttie et al. (2019), as well as additional markers associated with important biological effects, identified through a literature review covering the period from 2018 to September 2024 (see Markers_list.xlsx in the supplementary material). The database will be updated regularly with markers described in new studies. A comprehensive, up-to-date list of all markers in FluMutDB is available at https://izsvenezie-virology.github.io/FluMut/docs/markers. Users are also welcome and encouraged to suggest new markers for inclusion in the database by opening GitHub issues in the FluMutDB repository.

Results

FluMut

FluMut and FluMutGUI are open-source, cross-platform tools designed to identify, in sequences of influenza A(H5N1) viruses, markers with a potential impact on the biological characteristics of the virus.

The FluMut input file is a nucleotide FASTA file containing complete or partial H5N1 influenza virus sequences. The main FluMut output is a table listing all markers detected in the input sequences, together with their effects, the subtype in which each effect was demonstrated, and the bibliographic references. The secondary outputs are a table listing all the literature in the FluMut database and a table showing, for every mutation position detected in at least one sample, the amino acid present in each input sequence. The tables can be saved as plain-text files or combined into a single Excel file (.xlsm extension; see output_example.xlsm in the supplementary material) containing one sheet per data table (“Mutations,” “Markers,” and “Literature”). The Excel output also contains two additional sheets that make the main output easier to read: “Markers per sample,” which lists the markers identified in each sample (Fig. 3), and “Sample per marker,” which lists the markers identified in the input sequences and the number of samples in which each marker is observed. The Excel file also includes macros to help users navigate and filter the data. An example input file (input_example.fa) and the corresponding output files generated by FluMut (output_example_markers.tsv, output_example_mutations.tsv, output_example_literature.tsv, and output_example.xlsm) are available in the supplementary material.
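
The “Markers per sample” and “Sample per marker” views are two groupings of the same sample/marker pairs, so they can be recomputed from a long-format table when post-processing results outside Excel. The sketch below assumes a tab-separated table with columns named `Sample` and `Marker`; these column names and the example values are hypothetical and may not match the headers of FluMut's actual TSV outputs.

```python
import csv
import io
from collections import defaultdict


def summarise(tsv_text: str) -> tuple[dict[str, list[str]], dict[str, int]]:
    """Group (Sample, Marker) rows both ways.

    Returns the sorted markers found in each sample and, for each marker,
    the number of distinct samples in which it was observed.
    """
    per_sample: dict[str, set[str]] = defaultdict(set)
    per_marker: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        per_sample[row["Sample"]].add(row["Marker"])
        per_marker[row["Marker"]].add(row["Sample"])
    markers_per_sample = {s: sorted(m) for s, m in per_sample.items()}
    samples_per_marker = {m: len(s) for m, s in per_marker.items()}
    return markers_per_sample, samples_per_marker


# Hypothetical long-format input: one row per (sample, marker) pair.
EXAMPLE = (
    "Sample\tMarker\n"
    "sample_1\tPB2:E627K\n"
    "sample_2\tPB2:E627K\n"
    "sample_2\tPB2:D701N\n"
)
```

Counting distinct samples through sets means that duplicated rows for the same sample do not inflate the per-marker totals.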

Figure 3. “Markers per sample” sheet of a typical output. The complete output can be found in the supplementary material (output_example.xlsm).

Comparison with other tools

A few other tools are available to search for markers in user-supplied sequences. The best known are FluSurver (https://flusurver.bii.a-star.edu.sg/, 5 February 2025, date last accessed), a web-based tool with a graphical interface that is also integrated into GISAID, and the Influenza Mutation Checker (https://github.com/dombyrne/Influenza-Mutation-Checker, 5 February 2025, date last accessed). The analysis speed of FluMut is comparable to that of these programs. Table 1 summarizes the main differences among the three tools, which may help end users choose the most appropriate one depending on the information and type of output needed, the number of sequences to analyze, and their bioinformatics skills. The strengths of FluMut are its continuously updated mutation database and its versatility: it is suitable for integration into pipelines while remaining accessible to less experienced users through its graphical interface.

Table 1. Main differences between FluMut, Influenza Mutation Checker, and FluSurver

Feature | FluMut | Influenza Mutation Checker | FluSurver
Last database update | 12/09/2024 | 30/09/2021 | Unknown
Marker list available | | |
Accepts nucleotide sequences | | |
Indel detection (a) | | |
Degenerate codon handling | | |
Command line interface | | |
Graphical interface | | |
GISAID integration | | |
Analysis on multiple subtypes (a) | | |

(a) Feature under development.


Discussion

Real-time or near-real-time generation of complete genome sequence data from representative HPAI H5N1-positive samples is strongly recommended by international agencies (WOAH 2021, OFFLU 2023). This practice is increasingly being adopted by laboratories, as shown by the growing amount of H5N1 genetic data available in public databases. These data are essential to trace the spatio-temporal evolution of the virus, to identify novel genotypes emerging from reassortment events, and to identify mutations that may affect the biological characteristics of the virus, above all its zoonotic potential. Inspection of mutations is a crucial step in the genome analysis of avian influenza viruses, but it is challenging and time-consuming because of the lack of a complete, up-to-date inventory of mutations and of tools that allow their rapid identification in newly generated sequences. FluMut aims to fill this gap. The command-line version can easily be incorporated into bioinformatics workflows (e.g. bash scripts, Nextflow, or Snakemake). Additionally, to encourage widespread use, we have developed and released a graphical interface that can be used by researchers with limited or no command-line experience. The tool is highly flexible, analyzing anything from a single sequence to thousands in a timely manner without requiring consensus sequences to be translated into amino acid sequences. Such translation is often demanding, especially for proteins such as NS2, M2, PA-X, and PB1-F2, which are expressed from spliced transcripts (NS2, M2), by ribosomal frameshifting (PA-X), or from an alternative open reading frame (PB1-F2).

At present, FluMut supports the analysis of influenza virus sequences of the H5N1 subtype only; extension to all subtypes is under development.

Acknowledgements

We would like to thank Gianpiero Zamperin for his suggestions in the development of FluMut.

Supplementary data

Supplementary data are available at Virus Evolution online.

Conflict of interest

None declared.

Funding

This work was partially supported by the FLU-SWITCH Era-Net ICRAD (grant agreement No. 862605), by EU funding under the NextGeneration EU-MUR PNRR Extended Partnership initiative on Emerging Infectious Diseases (Project No. PE00000007, INF-ACT), and by KAPPA-FLU HORIZON-CL6-2022-FARM2FORK-02-03 (grant agreement No. 101084171). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or REA. Neither the European Union nor the granting authority can be held responsible for them.

Data availability

Source code is freely available on GitHub.

  1. FluMut: https://github.com/izsvenezie-virology/FluMut

  2. FluMutGUI: https://github.com/izsvenezie-virology/FluMutGUI

  3. FluMutDB: https://github.com/izsvenezie-virology/FluMutDB

FluMut complete documentation can be consulted at https://izsvenezie-virology.github.io/FluMut.

The examples provided in the supplementary material were obtained using the complete genome sequences of three viruses available on GISAID (accession IDs EPI_ISL_19496986 and EPI_ISL_19690438) and on NCBI GenBank (accession numbers PQ898029.1 to PQ898036.1).

References

Agüero M, Monne I, Sánchez A et al. Authors’ Response: Highly Pathogenic Influenza A(H5N1) viruses in farmed mink outbreak contain a disrupted second sialic acid binding site in neuraminidase, similar to human influenza A viruses. Eurosurveillance 2023;28:2300109.

Cock PJA, Antao T, Chang JT et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009;25:1422–23.

Domańska-Blicharz K, Świętoń E, Świątalska A et al. Outbreak of highly pathogenic Avian Influenza A(H5N1) clade 2.3.4.4b virus in cats, Poland, June to July 2023. Eurosurveillance 2023;28:2300366.

Kareinen L, Tammiranta N, Kauppinen A et al. Highly Pathogenic Avian Influenza A(H5N1) virus infections on fur farms connected to mass mortalities of black-headed gulls, Finland, July to October 2023. Eurosurveillance 2024;29:2400063.

Lee K, Yeom M, Vu TTH et al. Characterization of highly pathogenic Avian Influenza A (H5N1) viruses isolated from cats in South Korea, 2023. Emerg Microbes Infect 2024;13:2290835.

European Food Safety Authority (EFSA), European Centre for Disease Prevention and Control (ECDC), Melidou A, Enkirch T, Willgert K et al. Drivers for a pandemic due to Avian Influenza and options for one health mitigation measures. EFSA J 2024;22:e8735.

Nguyen T-Q, Hutter C, Markin A et al. Emergence and interstate spread of highly pathogenic Avian Influenza A(H5N1) in dairy cattle. bioRxiv 2024.

OFFLU. OFFLU Annual Report 2023. 2023. https://www.offlu.org/wp-content/uploads/2024/02/OFFLU_Annual_Report_2023.pdf (3 March 2025, date last accessed).

Piret J, Boivin G. Pandemics throughout history. Front Microbiol 2021;11:631736.

Plaza PI, Gamarra-Toledo V, Rodríguez Euguí J et al. Pacific and Atlantic sea lion mortality caused by highly pathogenic Avian Influenza A(H5N1) in South America. Travel Med Infect Dis 2024;59:102712.

Puryear W, Sawatzki K, Hill N et al. Highly pathogenic Avian Influenza A(H5N1) virus outbreak in New England seals, United States. Emerg Infect Dis 2023;29:786–91.

Rimondi A, Vanstreels RET, Olivera V et al. Highly pathogenic Avian Influenza A(H5N1) viruses from multispecies outbreak, Argentina, August 2023. Emerg Infect Dis 2024;30:812–14.

Suttie A, Deng Y-M, Greenhill AR et al. Inventory of molecular markers affecting biological characteristics of Avian Influenza A viruses. Virus Genes 2019;55:739–68.

WOAH. Avian Influenza. 2021. https://www.woah.org/fileadmin/Home/eng/Health_standards/tahm/3.03.04_AI.pdf (3 March 2025, date last accessed).

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site–for further information please contact [email protected].
