Exploring utility of genomic epidemiology to trace origins of highly pathogenic influenza A/H7N9 in Guangdong

Abstract The first highly pathogenic (HP) influenza A/H7N9 was reported in Guangdong in January 2017. To investigate the emergence and spread of HP A/H7N9 in Guangdong province, we sequenced 297 viruses (58 HP A/H7N9, 19 low pathogenic (LP) A/H7N9, and 220 A/H9N2) during 2016–2017. Our analysis showed that during the fifth wave, three A/H7N9 lineages were co-circulating in Guangdong: the local LP Pearl River Delta (PRD) lineage (13%), the newly imported LP Yangtze River Delta (YRD) lineage (23%), and the HP YRD lineage (64%). Previously circulating YRD-lineage LP during the third wave evolved to the YRD-lineage HP A/H7N9 in Guangdong. All YRD-lineage LP detected during the fifth wave most likely originated from newly imported viruses into Guangdong. Genotype comparison of HP A/H7N9 suggests limited outward spread of HP A/H7N9 to other provinces. The distribution of HP A/H7N9 cleavage site variants on live poultry markets differed from that found in humans, suggesting a V1-type cleavage site may facilitate human infections.


Introduction
Since the first detection of low pathogenic (LP) A/H7N9 avian influenza in the Yangtze River Delta (YRD) region in China in 2013, five epidemic waves of human and poultry transmission took place in China (Li et al. 2014). An important risk factor for human A/H7N9 infection is a visit to a live poultry market (LPM) (Liu et al. 2014;Yu et al. 2014). Control of the A/H7N9 epizootic has been challenging due to extensive live poultry trading and absence of obvious clinical symptoms in infected birds (Spackman et al. 2015;Zhou et al. 2015). In January 2017, the first highly pathogenic (HP) A/H7N9 virus was identified in a patient with confirmed exposure to dead poultry in Guangdong province, China (Ke et al. 2017). Subsequently, similar viruses were reported in poultry sampled at LPMs in Guangdong (Lu et al. 2018).
In response to the first emergence of HP A/H7N9 in Guangdong province, we used data from our ongoing LPMs environmental surveillance program in the Guangdong province to genetically trace the source of these HP A/H7N9 viruses. In addition, we used genomic data to compare the molecular characteristics of the A/H7N9 viruses in humans, poultry, and the environment. We aim to have a thorough understanding of the origin, spread, and transmission of this novel virus in Guangdong province.

Influenza a surveillance at the live poultry markets
Influenza A surveillance was conducted at twenty-eight sentinel hospitals and LPMs in twenty-one prefecture-level cities in Guangdong province during the fifth wave of the A/H7N9 outbreak (November 2016-August 2017. In order to trace the origin of HP A/H7 in Guangdong, we retrospectively screened samples of LPMs collected since January 2016. Environmental samples of LPMs include swab specimens collecting from cages, chopping boards, drinking water, and feather removal machines. Additionally, oropharyngeal and cloacal swabs were taken from apparently healthy chickens, ducks, pigeons, francolins, geese, and quails on LPMs. We collected samples at one retail market and one wholesale market in each of twenty-one prefecturelevel cities. All samples were placed in viral transport medium and stored at À80 C until further processing.

Screening and virus isolation
Samples were first tested for the presence of influenza A RNA using real-time reverse transcription-PCR (rRT-PCR) assays targeting the matrix protein, followed by rRT-PCR assays specifically targeting the A/H7 and A/H9 genes in twenty-one local Centers for Disease Control and Prevention (CDC) and were further verified by the Guangdong Provincial CDC as described previously (Ke et al. 2014). HP A/H7N9 was detected with rRT-PCR amplification of a fragment of HA, as described previously (Jia et al. 2017). Samples positive for A/H7 and A/H9 were treated by antibiotics for 30 min and blindly passaged for two to three generations in 9-to 10-day-old embryonated chicken eggs for virus isolation. Allantoic fluid was collected 2 days after inoculation and tested for the presence of virus by a hemagglutination test using 1 per cent guinea pig erythrocytes or 0.5 per cent turkey erythrocytes. Then positive subtypes of H7 and H9 were further confirmed by rRT-PCR and processed for sequencing. In total, 251 HP and 70 LP A/H7N9 rRT-PCR positive samples were inoculated, yielding 83 and 24 isolates, respectively. Of these, fiftyeight and nineteenwere successfully sequenced. For A/H9N2 viruses, 863 positive samples were inoculated, yielding 360 isolates, and 220 were successfully sequenced.

Genome sequencing
Viral RNA was extracted from allantoic fluid samples using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany).
Extracted RNA was subjected to reverse transcription and amplification using the SuperScript V R III One-Step RT-PCR system (Thermo Fisher Waltham, USA). Genome sequencing of environmental A/H7 and A/H9 viruses was performed on the nextgeneration Ion PGM sequencing platform. Sequencing data were analyzed with CLC Genomics Workbench 7.5.1 software. Lowquality reads were trimmed using CLC trimmer with a quality limit set at 0.05. Trimmed reads were then de novo assembled in CLC using default parameters. Contigs with a coverage above 10 were extracted and annotated using a BLAST search against the NCBI non-redundant nucleotide database (Benson et al. 2014). The sequence with the highest nucleotide identity was downloaded as reference for read mapping. Consensus sequences for all segments were extracted from the mapping results, with at least 10Â coverage depth at each site.

Epidemiological data
Information on live poultry imports from other regions of China to Guangdong province was obtained from the local Ministry of Agriculture. According to the geographic distribution of twennty-nine Chinese provinces and cities, we classified them into six regions. Provinces Hebei, Heilongjiang, Jilin, Liaoning, Inner Mongolia, Shanxi, and Beijing city were grouped as northern regions. Provinces Hubei, Hunan, and Henan were grouped as central regions. Provinces Anhui, Fujian, Jiangsu, Jiangxi, Shandong, Zhejiang, and the city of Shanghai were grouped as eastern region. Hainan is considered the southern region. The north-western region included provinces Gansu, Ningxia, Qinghai, Shaanxi, Xinjiang and the south-western region included provinces of Guangxi, Guizhou, Sichuan, Tibet, Yunnan, and the city of Chongqing.

Phylogenetic analysis
Genome sequences from the present study were combined with all publicly available sequences of influenza A/H7N9 (China) and A/H9N2 (China) virus sequences available in GISAID and part of A/H9N2 (China) from NCBI. The reference sequence accession numbers are summarized in Supplementary Table S3. A total of 15,103 A/H7N9 sequences (HA ¼ 1,989, NA ¼ 2,173, PB2 ¼ 1,779, PB1 ¼ 1,620, PA ¼ 1,793, NP ¼ 1,913, M ¼ 2,027, NS ¼ 1,809) and 6950 A/H9N2 sequences (PB2 ¼ 1,159, PB1 ¼ 1,150, PA ¼ 1,162, NP ¼ 1,151, M ¼ 1,172, NS ¼ 1,140) were downloaded and analyzed. From the first to the fifth wave, a total of 1,371 complete genomes (1,301 of LP A/H7N9 and 70 of HP A/H7N9) were available in GISAID. We only used sequence data with known collection dates and sampling locations in our analyses. Sequences were aligned in MUSCLE v3.5 with default settings (Edgar 2004). The alignment was manually checked and corrected in BioEdit (version 7.0.5.2). The amino acid sequences of 58 HP A/H7N9 and 19 LP A/H7N9 viruses were translated by NCBI ORF Finder. Phylogenetic analysis was performed using IQ-TREE (Nguyen et al. 2015), using maximum-likelihood phylogenies under the GTRþF þ IþG4 nucleotide substitution model as the best predicted model. Clade nomenclature of eight segments was assigned as described previously (Lam et al. 2015). We defined a novel genotype of A/H7N9 virus if one of eight segments belonged to a different clade. Clades were further divided into sub-clades (1.1, 1.2, 2.1, 2.2, etc.) if at least three sequences were available and the average intra-sub-clade genetic distance was less than 3 per cent, as described previously (Lam et al. 2015).
We estimated molecular clock phylogenies by using the Bayesian Markov chain Monte Carlo approach implemented in BEAST version 1.8. We computed four independent Markov chain Monte Carlo runs of 1.5 Â 10 8 steps for each alignment and extracted a subset of 2,000 phylogenies from the posterior tree distribution, subsequently used as an empirical tree distribution for phylogeographic analyses. We computed maximum clade credibility trees for each dataset by using TreeAnnotator. We used the discrete phylogeographic method implemented in BEAST to investigate spatial dynamics of H7N9 virus lineages from seven regions in China. The seven regions were central China (Jiangxi, Hunan, and Hubei), southern China (Guangdong), Eastern China (Anhui, Jiangsu, Shandong, and Zhejiang), Southwestern China (Yunnan, Sichuan, Gansu, Guangxi, and Guizhou), Northern China (Hebei, Henan, Liaoning, and Jilin), Southeastern China (Fujian), Northwestern China (Xinjiang, Shanxi, and Ningxia).

Analysis of genetic markers for risk assessment
Protein sequences of fifty-eight HP A/H7N9 viruses in this study were compared with publicly available HP A/H7N9 sequences using BioEdit. Biologically relevant amino acid substitutions on protein HA, NA, NS1, M2, PA, PB1, and PB2 were compared as described in previous studies (Jiao et al. 2008;Ping et al. 2010;Mok et al. 2011Mok et al. , 2014Tada et al. 2011;Song et al. 2014;Bi et al. 2015;Xiao et al. 2016).

Influenza a surveillance at the live poultry markets
During the study period, 34,106 environmental swabs were tested, ranging from 572 to 3,790 per month during January 2016-August 2017 (Table 1). Surveillance data showed that influenza A virus was detected in 7.4-29.4 per cent of the monthly collected samples. A/H7 accounted for 1.0-36.8 per cent of all avian influenza A viruses (Table 1) (Table 1). We sequenced the complete genomes of 58 HP H7N9, 19 LP H7N9, and 220 H9N2 viruses isolated from LPMs and poultry from November 2016 to November 2017. These sequences have been deposited into the GISAID database with accession numbers listed in Supplementary Table S1.

Poultry trade
A total of 5.5 billion live birds were imported into Guangdong from different areas of China including twenty-nine provinces from the northwestern, southwestern, northern, central, eastern, and southern regions during the period January 2016-July 2018 (Supplementary Table S2). Specifically, live poultry from the south-west, central, eastern, and south of China accounted for 70 per cent of imported live poultry into Guangdong province. Those regions included eleven provinces, which were Yunnan (south-west), Sichuan (south-west), Guizhou (southwest), Guangxi (south-west), Hunan (central), Henan (central), Anhui (eastern), Shandong (eastern), Jiangsu (eastern), Zhejiang (eastern), and Hainan (south) ( Fig. 2A and B).

Analysis of a/H7N9 sequences
We compared the newly generated sequence data with all Chinese A/H7N9 HP and LP full genomes available in public   (Lam et al. 2015). The YRD viruses have expanded and have been further subdivided in clades and sub-clades, following nomenclature as described by Lam et al. (2015) (Fig. 3, Supplementary Fig. S1). The A/H7N9 viruses of the fifth wave clustered into two major clades, designated as Clade B and C. All LP YRD-lineage A/H7N9 from the fifth wave cluster into the C-2a, C-2b, and C-3 clades. HP YRD-   lineage A/H7N9 viruses clustered into Clade C-1. The ancestor of the HA gene of YRD-lineage HP A/H7N9 seems to have originated from third wave viruses that spread to Guangdong in early 2015 (Fig. 4, Supplementary Figs S1 and S3). We found that virus isolates from eastern China and Guangdong in the fourth wave were at the root of clade C-1, suggesting that this clade possibly originated from eastern China (Fig. 4, Supplementary  Fig. S1).
The comparison of genotypes identified a total of 180 genotypes (G1-G180) from the first wave to the fifth epidemic A/ H7N9 wave (Supplementary Fig. S2). A total of ffifty-nine genotypes were detected in the fifth wave in China, of which nineteen genotypes were HP ( Supplementary Fig. S2), and forty genotypes were LP, reflecting the widespread evolution of these viruses. Of the nineteen genotypes of HP, 15, 5, 2, and 2 genotypes were found in Guangdong, Eastern China, Central China, and other regions, respectively. The most prevalent genotype G108 (83/125; 66%) was detected across China ( Supplementary  Fig. S2). Twelve of the fourteen other genotypes found in our Guangdong dataset are unique to Guangdong ( Supplementary  Fig. S2). Genotypes 113, 114, 119, and 126 were uniquely detected in other regions of China. Most HP A/H7N9 genotypes in Guangdong reassorted with NA and six internal genes of newly introduced YRD-lineage LP, local PRD-lineage LP, and local A/H9N2 viruses (Fig. 4). Our genome analysis confirmed the diversity of A/H7N9 viruses and its continuing diversification by genetic reassortment.

Reassortment history
To clarify the mechanism behind the genotype diversity of LP and HP A/H7N9 in Guangdong, the internal genes of the viruses within each HA clade were further analyzed. During the fifth wave, three H7N9 lineages were co-circulating in Guangdong: the local LP PRD lineage (13%), the newly imported LP YRD lineage (23%), and HP YRD lineage (64%). Strikingly, in addition to the PRD-lineage LP viruses, a diverse range of YRD-lineage LP A/ H7N9 viruses were found during the fifth wave (Fig. 3), which strongly suggests that these were newly introduced from other provinces. Moreover, in the fifth wave, the six internal genes of PRD-lineage A/H7N9 viruses in Guangdong seem to be replaced by those of YRD-lineage A/H7N9 viruses (Fig. 3). Most viruses, including PRD-lineage A/H7N9 viruses, contained descendants of internal genes of YRD-lineage A/H7N9, while none of the local LPM and human A/H7N9 viruses inherited all six internal gene segments from PRD-lineage A/H7N9. In the beginning of the fifth wave, HP A/H7N9 viruses acquired PB2 and M gene segments from clade 2, and had PB1, PA, NP, and NS gene segments of clade 1 (Fig. 3, Supplementary Fig. S1). The later spread of this virus to different regions in Guangdong led to occasionally replacement of six internal genes by clade 1 PB2 or by clade 2 PB1, PA, NP, and NS genes, or by clade 3 PB1, NP, and NS genes. Thus, current HP A/H7N9 was a result of extensive reassortment of introduced YRD-lineage A/H7N9, local PRD-lineage A/H7N9, and A/H9N2 viruses (Fig. 3, Supplementary Fig. S1).  S 0 (0) 0 (0) 0 (0) 1 (6.3) 1 (1.7) (continued)
Amino acid substitutions of human and environmental HP A/H7N9 were analyzed and compared (H3 numbering), with an emphasis on variants V1 and V3. Many amino acid substitutions that may be relevant for risk assessment were found both in human and environmental HP A/H7N9, which include G186V on the HA gene, K526R, A588V, and K702R on the PB2 gene, V100A, A402S, and S409N on the PA gene, S31N of M2 gene, N205S on the NS1 gene, and T48A on the NS2 gene (Table 2). Moreover, some amino acid substitutions were exclusively found in human HP A/H7N9, including G186I, Q226H, and Q226L on the HA gene, E119V, R292K, and R292X on the NA gene, A199S, T271A, A588I, E627K, E627X, D701N, and D701X on the PB2 gene, I368L on the PB1 gene, V33I on the NP gene, P41A on the M1 gene, and P42S on the NS1 gene (Table 2).

Discussion
The fifth epidemic wave of avian influenza A/H7N9 started in September 2016 with a sudden increase of human cases in seven provinces (Su et al. 2017). The first HP A/H7N9 was detected retrospectively on four markets in Heyuan, Guangdong province in November 2016, before the notification of the first case of human HP A/H7N9 infection in December 2016 (Ke et al. 2017). HP A/H7N9 was later also identified in other patients in the PRD Region, as well as the eastern and northern areas of Guangdong.
Since the first detection of HP A/H7N9 in four cities of Guangdong province, it was detected in nineteen cities in Guangdong and eleven other provinces in China. Analysis based on the HA gene suggested that HP A/H7N9 originated from the Guangdong and eastern China. It seems that HP A/H7N9 viruses spread from Guangdong to other regions (Fig. 3). This might have been due to market closure, which may have led to a shift in the market through retailers selling their live poultry in other regions or neighboring provinces (Wu et al. 2016). Genotype comparison demonstrated that, although Guangdong province accounted for 79 per cent of HP genotypes in the fifth wave, most HP genotypes are unique to Guangdong province. This suggests limited outward spread of HP A/H7N9 viruses. The reverse was true for LP, for which genome analysis revealed novel imported H7N9 viruses into Guangdong. Further analysis of six internal genes showed that the internal genes of HP A/H7N9 viruses have multiple origins, with internal gene segments from YRD-lineage H7N9 viruses from fourteen different provinces across the whole wave, which is consistent with the massive poultry import into Guangdong (Li et al. 2018). These imported LP viruses seemed to have an advantage as almost all internal segments of local PRD-lineage LP/H7N9 viruses were replaced by those of YRD-lineage H7N9 viruses. Our study suggests that live poultry trade is a key driver of genetic heterogeneity of HP H7N9 and LP H7N9 in Guangdong.
Seven different cleavage site variants (V1-V7) were identified in Guangdong in the fifth wave outbreak (Qi et al. 2018;Quan et al. 2018;Shi et al. 2018). Comparing these cleavage site motifs between human A/H7 viruses and those detected on LPMs, showed a difference in distribution of variants between LPMs and humans, suggesting V1 cleavage site may increase the replication in poultry. The high viral load makes the virus more readily transmit from birds to humans. It has been described that different cleavage site motifs can contribute to variations in virulence in HP avian influenza viruses in different hosts (Zhang et al. 2012). However, further research is needed to determine their influence on human pathogenicity and virulence. In addition, we found several mutations that have been described as potentially increasing the risk of pathogenicity and transmission in humans. A G186V amino acid substitution in the HA gene was found in 100 per cent of environmental HP A/ H7N9 viruses in the present study, implying an increased viral affinity to the human-type receptor (Chen et al. 2016). Although mutations on E627K and D701N in PB2 gene were not found in environmental HP A/H7N9, mutations K526R and M535L were found in 77.5 per cent environmental HP A/H7N9 viruses, which indicated a potential enhanced virulence in mammalians (Chen et al. 2016). Mutations associated with host transmission (avian to human) were also found in the PB2 gene (12%) and the PA gene (74%-90%) in environmental HP A/H7N9. Moreover, 99 per cent of environment HP A/H7N9 acquired S31N on M2 gene, N205S on NS1, T48A on NS2, which reportedly reduces susceptibility to rimantadine and amantadine and alter the antiviral response in the hosts (Chen et al. 2016). The potential consequences of these different mutations of HP A/H7N9 virus need further investigation.
Our study was done retrospectively, but showed potential for early warning for vaccine escape variants or circulation of viruses with molecular markers for human infections or antiviral resistance (Bai 2019). With the development of next-generation sequencing, such public health translation of pathogen genomic data can become a new routine, provided costs of sequencing and analytical processes is developed for such environments (Aarestrup and Koopmans 2016;Grubaugh et al. 2019). This includes infrastructure for sharing of relevant data, even before public release (Amid 2019).

Supplementary data
Supplementary data are available at Virus Evolution online.

Data availiablility
The data used to support the findings of this study are available from the corresponding author upon request.

Authors contributors
R.B. is a postdoctoral researcher at the Guangdong Provincial Center for Disease Control and Prevention, Guangzhou, China, whose research focuses on the epidemiology, evolution, and transmission of avian influenza viruses in Guangdong province. R.S.S. is a PhD candidate at the Erasmus Medical Centre, Rotterdam, The Netherlands, whose research focuses on the risk-based surveillance of influenza viruses in the market chain.