Identification of TMPRSS2 as a Susceptibility Gene for Severe 2009 Pandemic A(H1N1) Influenza and A(H7N9) Influenza

Abstract The genetic predisposition to severe A(H1N1)2009 (A[H1N1]pdm09) influenza was evaluated in 409 patients, including 162 cases with severe infection and 247 controls with mild infection. We prioritized candidate variants based on the result of a pilot genome-wide association study and a lung expression quantitative trait locus data set. The GG genotype of rs2070788, a higher-expression variant of TMPRSS2, was a risk variant (odds ratio, 2.11; 95% confidence interval, 1.18–3.77; P = .01) to severe A(H1N1)pdm09 influenza. A potentially functional single-nucleotide polymorphism, rs383510, accommodated in a putative regulatory region was identified to tag rs2070788. Luciferase assay results showed the putative regulatory region was a functional element, in which rs383510 regulated TMPRSS2 expression in a genotype-specific manner. Notably, rs2070788 and rs383510 were significantly associated with the susceptibility to A(H7N9) influenza in 102 patients with A(H7N9) influenza and 106 healthy controls. Therefore, we demonstrate that genetic variants with higher TMPRSS2 expression confer higher risk to severe A(H1N1)pdm09 influenza. The same variants also increase susceptibility to human A(H7N9) influenza.

The emergence of influenza virus A(H1N1)2009 (A[H1N1]pdm09) in early 2009 caused the first influenza pandemic of the 21st century. The global mortality of A(H1N1)pdm09 infection has been estimated to be about 0.2 million [1]. In Hong Kong, as of October 2010, a total of 299 severe cases were documented [2].
After the 2009 pandemic, A(H1N1)pdm09 virus continued to circulate and caused major epidemics in North America and Eurasia [3,4]. About one-quarter to one-half of patients with severe A(H1N1)pdm09 influenza were previously healthy, without preexisting medical conditions that could enhance the risk, suggesting a host genetic predisposition to severe A(H1N1)pdm09 infection [5]. The World Health Organization, therefore, identified the study of host genetic factors on susceptibility and severity of influenza infection as a priority [6]. Genetic predisposition to severe A(H1N1)pdm09 influenza has been extensively studied in our laboratory and others [7-9]. We have demonstrated that genetic polymorphisms of surfactant protein B (SFTPB) and CD55 contribute to the severe illness of A(H1N1)pdm09 infection [7,8].
Four years after the 2009 pandemic A(H1N1) influenza, a novel reassortant avian influenza virus A(H7N9) was identified to cause human infection, which had affected 500 patients, with a case fatality rate of >30%, as of 19 January 2015 [10]. Moreover, the virus continues to be detected in animals and the environment in China [11], leading to the second and third waves of human A(H7N9) influenza outbreak in the winters of 2013-2014 and 2014-2015, respectively. Most patients with A(H7N9) influenza had significant exposure to infected poultry or a contaminated environment such as a live poultry market [12]. In China, a high proportion of the general population is assumed to be exposed to live poultry in both urban and rural settings. Therefore, we speculate that individuals with certain genetic predisposition are more likely to develop symptomatic A(H7N9) influenza.
We have conducted a small-scale genome-wide association study (GWAS), comparing the distribution of genetic variants in severe and mild cases of patients with A(H1N1)pdm09 influenza, aiming to identify the risk variants or susceptible genes predisposing to severe A(H1N1)pdm09 influenza. Instead of prioritizing the candidate variants with the association P value, we used the human lung expression quantitative trait locus (eQTL) data set based on a rational assumption that the genes differentially expressed in lung tissues would be more important in determining the severity of A(H1N1)pdm09 influenza in affected individuals.
We chose rs2070788, an intronic single nucleotide polymorphism (SNP) in transmembrane protease S2 (TMPRSS2) and a lung cis-eQTL for TMPRSS2, from the pilot GWAS. We verified the genetic association of rs2070788 with severe infection in a cohort of 409 patients with A(H1N1)pdm09 influenza, including 162 cases with severe illness and 247 controls with mild illness. We also replicated the genetic association of rs2070788 with the susceptibility to A(H7N9) influenza in a cohort of 102 patients with A(H7N9) influenza and 106 healthy controls. In addition, we performed functional studies to validate the expression variation underlying the genetic association.

Study Participants
The study participants in the A(H1N1)pdm09 cohort were described elsewhere [7,8]. Their clinical characteristics are specified in Table 1. The patients and controls in A(H7N9) cohort have also been described previously [13]. This study was approved by the institutional review board of The University of Hong Kong/Hospital Authority of Hong Kong and the First Affiliated Hospital, College of Medicine, Zhejiang University.

Genotyping Methods and Analysis
The genomic DNA samples from 42 patients with severe and 42 with mild A(H1N1)pdm09 influenza were genotyped using the Genome-Wide Human SNP Array 6.0 (Affymetrix), as described elsewhere [8]. A total of 561 844 SNPs remained after exclusion of SNPs with an allele frequency <5% or a genotyping missing rate >10%. Genomic DNA samples from an additional 325 patients with A(H1N1)pdm09 influenza were genotyped using the SEQUENOM MassArray Genotyping platform (available at Center for Genomic Sciences, The University of Hong Kong). The association P values of SNPs were generated using PLINK v1.07 software [14]. The rs2070788 genotyping data of patients with A(H7N9) influenza and controls were obtained from the previous study [13]. The genotyping results of rs383510 were obtained by local imputation using MACH v1.0 software [15], with genotype data from 286 East Asian individuals (97 Chinese Han in Beijing, 100 Southern Han Chinese, and 89 Japanese in Tokyo) from the 1000 Genomes Project, released in June 2011, as a reference panel.
Haploview software was used to analyze and visualize the linkage disequilibrium (LD) pattern of the variants of interest [16]. The in-house generated program IndelLDplot (https://sourceforge.net/projects/indelldplot/files) was used to search for functional variants in high LD with a user-specified variant by mining the publicly available 1000 Genomes and HapMap databases. IndelLDplot software can also annotate these high-LD variants with biological information, such as genomic regulatory features from the Ensembl and ENCODE databases.
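The SNP exclusion step can be illustrated with a minimal Python sketch. It is not the authors' pipeline; the genotype coding (0, 1, or 2 copies of one allele, with None for a missing call) and the function names are assumptions made for illustration only. Only the two thresholds, 5% allele frequency and 10% missing rate, come from the text.

```python
from typing import Optional, Sequence

def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Minor allele frequency from genotypes coded 0/1/2 (None = missing call)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1.0 - freq)            # report the rarer of the two alleles

def missing_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of samples without a genotype call."""
    if not genotypes:
        return 1.0
    return sum(g is None for g in genotypes) / len(genotypes)

def passes_qc(genotypes: Sequence[Optional[int]],
              min_maf: float = 0.05,
              max_missing: float = 0.10) -> bool:
    """Keep a SNP unless its allele frequency is <5% or its missing rate is >10%."""
    return (minor_allele_frequency(genotypes) >= min_maf
            and missing_rate(genotypes) <= max_missing)
```

A SNP is kept only if it passes both filters, which is how the two exclusion criteria are read here.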

Reporter Vector Construction, Transfection and Luciferase Assay
A putative regulatory element in the TMPRSS2 intronic region (chr21: 42 856 950-42 858 646; human genome assembly 19) accommodating rs383510 was amplified by polymerase chain reaction with the primer pair GCCGAGGTACCCCAAGAAATGT and CGGTAAGGTACCCCCTCCTGCC (each containing a KpnI digestion site, GGTACC) and cloned into the pGL3-Basic vector (Promega) to generate 2 luciferase vectors with genotype TT or CC at locus rs383510. The correctness of the constructs was verified by direct sequencing. The 2 reporter vectors and the unmodified pGL3-Basic vector were transfected into A549 cells, a cell line commonly used to model human respiratory epithelial cells, for luciferase assay, as described elsewhere [17].
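As a quick consistency check on the cloning design, the short Python sketch below locates the KpnI recognition sequence (GGTACC) in each primer and computes the length of the stated genomic interval. The primer sequences and coordinates are taken from the text; the helper names are hypothetical, and the interval is assumed to be 1-based and inclusive. The actual amplicon would be slightly longer because of the primer tails.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence

# Primer sequences and hg19 coordinates as given in the text.
FORWARD_PRIMER = "GCCGAGGTACCCCAAGAAATGT"
REVERSE_PRIMER = "CGGTAAGGTACCCCCTCCTGCC"
REGION_START, REGION_END = 42_856_950, 42_858_646

def site_position(primer: str, site: str = KPNI_SITE) -> int:
    """0-based index of the first occurrence of the site, or -1 if absent."""
    return primer.upper().find(site)

def region_length(start: int, end: int) -> int:
    """Length of an interval, assuming 1-based inclusive coordinates."""
    return end - start + 1

if __name__ == "__main__":
    for name, primer in (("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)):
        print(name, "KpnI site at index", site_position(primer))
    print("region length (bp):", region_length(REGION_START, REGION_END))
```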

Statistical Analysis
The linear regression analysis incorporated in PLINK was used to examine the correlation of a specific variant to the quantitative expression in human lung tissues, using an additive model to estimate the effect of one copy increment of the variant. The luciferase assay results were analyzed using Student t test.
Stepwise multivariate analysis and χ², Fisher exact, and Mann-Whitney U tests were used for data analysis. Differences were considered statistically significant at P ≤ .05.
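The additive model used for the expression analysis can be sketched as an ordinary least-squares fit of expression on allele count, where the slope estimates the change in expression per additional copy of the variant. The Python code below is a plain illustration under that assumption, not PLINK's implementation; the function name is hypothetical, and any input values would have to come from real data.

```python
from typing import Sequence, Tuple

def additive_slope(allele_counts: Sequence[int],
                   expression: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of expression on allele count (0, 1, 2).

    The slope is the estimated effect of one additional copy of the variant.
    """
    n = len(allele_counts)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(allele_counts) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("no genotype variation; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(allele_counts, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A positive slope for the G allele count of rs2070788 would correspond to the pattern described for the lung eQTL data, with expression increasing from AA to GA to GG.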

Genetic Variation of TMPRSS2 Associated With Disease Severity of A(H1N1)pdm09 Infection
We performed a small-scale GWAS in 42 patients with severe and 42 with mild A(H1N1)pdm09 influenza. A total of 28 566 SNPs showed differential allele distribution between the 2 groups (P < .05). These SNPs were intersected with lung cis-eQTLs at P < 10⁻⁸, generating a total of 1114 SNPs linked to the expression of 445 unique genes. Based on the mounting recognition of the role of TMPRSS2 in influenza virus propagation, we chose an intronic SNP in TMPRSS2, rs2070788, which was mapped as a lung cis-eQTL (P = 4.95 × 10⁻¹⁰) for TMPRSS2. We evaluated its genetic association with severe A(H1N1)pdm09 influenza in an extended cohort of 409 patients with A(H1N1)pdm09. The clinical characteristics of the patients are shown in Table 1.
The rs2070788 G allele was significantly overrepresented in severe compared with mild cases (P = .004; Table 2). The GG genotype conferred >2-fold higher vulnerability to severe infection under the recessive model (odds ratio, 2.11; P = .01). Because patient age, sex, and the presence of some predisposing conditions differed significantly between the severe and mild infection groups (Table 1), we performed multivariate regression analysis to exclude the effect of the confounding factors. The rs2070788 GG genotype remained an independent significant association signal for severe A(H1N1)pdm09 influenza (odds ratio, 2.15; P = .02).
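The prioritization step, intersecting nominally associated SNPs with lung cis-eQTLs, amounts to a filtered join on SNP identifier. A minimal Python sketch follows. The two thresholds (P < .05 and P < 10⁻⁸) come from the text; the dictionary layout and the function name are assumptions for illustration.

```python
from typing import Dict, Tuple

def prioritize(gwas_p: Dict[str, float],
               eqtl: Dict[str, Tuple[str, float]],
               gwas_cutoff: float = 0.05,
               eqtl_cutoff: float = 1e-8) -> Dict[str, str]:
    """Return {snp: gene} for SNPs passing both the GWAS and cis-eQTL cutoffs.

    gwas_p maps SNP id -> association P value from the pilot GWAS.
    eqtl maps SNP id -> (target gene, eQTL P value) from the lung data set.
    """
    return {
        snp: eqtl[snp][0]
        for snp, p in gwas_p.items()
        if p < gwas_cutoff and snp in eqtl and eqtl[snp][1] < eqtl_cutoff
    }
```

The result maps each retained SNP to the gene whose expression it is linked to, so the count of SNPs and the count of unique genes can both be read from it.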
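For readers unfamiliar with how an odds ratio under a recessive model is obtained, the Python sketch below computes an unadjusted odds ratio and a Wald-type 95% confidence interval from a 2 × 2 table (risk genotype vs all other genotypes, cases vs controls). It is a generic illustration: the source does not state which interval method was used, the function name is hypothetical, and the counts must be supplied from the actual genotype table.

```python
import math
from typing import Tuple

def odds_ratio_ci(case_risk: int, case_other: int,
                  control_risk: int, control_other: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval on the log scale.

    case_risk / control_risk: carriers of the risk genotype (e.g., GG).
    case_other / control_other: all other genotypes (e.g., GA + AA).
    """
    cells = (case_risk, case_other, control_risk, control_other)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper
```

The adjusted odds ratio reported after multivariate regression cannot be reproduced from a 2 × 2 table alone, because it depends on the covariates in the model.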
We profiled the TMPRSS2 expression in correlation to rs2070788 genotypes from the lung eQTLs data set [18]. The profile showed that individuals with GG genotype have the highest expression, GA heterozygotes have intermediate expression and AA homozygotes have the lowest expression of TMPRSS2 (Figure 1, left). Therefore, our results demonstrated that genetic variations of TMPRSS2 were associated with the predisposition to severe A(H1N1)pdm09 infection. The higher TMPRSS2 expression variant, rs2070788 GG genotype, was associated with higher susceptibility to severe illness in patients with A(H1N1)pdm09 influenza.

Identification of Putative Functional Variant
Based on lung eQTL analysis, rs2070788 genotypes were correlated with the differential expression level of TMPRSS2. However, there is currently little evidence implicating the functional basis of rs2070788 for TMPRSS2 expressional regulation. As shown in Figure 2A, rs2070788 does not reside in any known regulatory region that may affect the expression of TMPRSS2. Therefore, we proceeded to seek the functional variants in high LD with rs2070788 that caused the differential expression of this gene and generated the association signal of rs2070788 with severe A(H1N1)pdm09 infection. We identified 9 potentially functional variants in a 20-kb range upstream and downstream of TMPRSS2 using our in-house generated program IndelLDplot. We plotted the LD pattern of these potentially functional variants with rs2070788 using genotyping data of Chinese (Chinese Han in Beijing) and Asian populations from the 1000 Genomes Project. Four of 9 variants in the vicinity of TMPRSS2 (rs461194, rs2298662, rs4816720, and rs383510) showed high LD with rs2070788 (r² > 0.80) in the Chinese and Asian populations (Figure 2B).
According to data from ENCODE and Roadmap, 2 international collaborative projects to functionally annotate the human genome, rs383510 is situated in a regulatory region predicted to be a putative enhancer (Figure 2A). Moreover, among all the high-LD variants with rs2070788, rs383510 emerged as a lung cis-eQTL (P = 3.84 × 10⁻¹⁴; Figure 1, right), the strongest among all the eQTLs for the TMPRSS2 gene in human lung tissues. Therefore, we inferred that rs383510 might be the functional variant that dictated the differential TMPRSS2 expression and underlay the genetic association of rs2070788 with severe A(H1N1)pdm09 infection.
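The r² statistic used as the LD threshold above can be written out explicitly. The Python sketch below computes r² between 2 biallelic SNPs from phased haplotypes; this is a simplifying assumption, because tools such as Haploview estimate haplotype frequencies from unphased genotypes. The function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple

def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (a, b), where a and b are 0/1 indicators of the
    reference allele at SNP A and SNP B. With D = p_AB - p_A * p_B,
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B)).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic; r^2 is undefined")
    d = p_ab - p_a * p_b
    return d * d / denom
```

An r² close to 1 means that the genotype at one SNP almost fully predicts the genotype at the other, which is why a tagged variant such as rs383510 can carry the same association signal as rs2070788.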

Rs383510 T Allele Increasing Enhancer Activity in Luciferase Assay
Despite its identity as a lung cis-eQTL, we cannot directly extrapolate rs383510 as a functional regulatory variant for TMPRSS2 expression. According to the data of ENCODE and Roadmap, the DNA element harboring rs383510 is predicted to be a putative regulatory region with enhancer activity. To verify the prediction, we cloned the element into a luciferase reporter vector and transfected the reporter gene constructs into A549 cells for luciferase assay.
We demonstrated that the putative regulatory region was functional because the reporter vectors with the insertions displayed much higher luciferase activity than the unmodified vector (P = .006; Figure 3). Furthermore, the luciferase activity of rs383510 TT plasmid was significantly greater than that of CC plasmid (P = .02). Therefore, using a reporter gene assay, we identified a regulatory DNA element that can potentially govern TMPRSS2 expression, in which the rs383510 T variant exhibited a significantly higher transcriptional level than the C variant. The luciferase assay results suggested that rs383510 was not only correlated with differential TMPRSS2 expression but also functionally regulated its expression.
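The comparison between constructs can be sketched in Python as follows: firefly luminescence is normalized to the cotransfected Renilla control, and the normalized values for 2 constructs are compared with a pooled-variance Student t statistic. This is an illustration only; the function names are hypothetical, no measured values are implied, and the P value (which requires the t distribution) is left to a statistics package.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple

def relative_activity(firefly: Sequence[float],
                      renilla: Sequence[float]) -> List[float]:
    """Firefly-to-Renilla luminescence ratio for each replicate well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]

def student_t(sample_a: Sequence[float],
              sample_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student t statistic (equal variances) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least 2 replicates")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(
        pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t_stat, df
```

With triplicate transfections per construct, the test has 4 degrees of freedom, so the comparison is sensitive to replicate variability.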

Association of rs2070788 and rs383510 With Susceptibility to A(H7N9) Infection
Influenza A(H7N9) virus, similar to the A(H1N1)pdm09 virus, has a monobasic cleavage site in the viral hemagglutinin (HA) protein, which is activated by TMPRSS2 for viral replication and virus pathogenicity in vitro and in vivo [19,20]. Therefore, we assessed the association of rs2070788 and rs383510 with susceptibility to human A(H7N9) influenza in patients with A(H7N9) influenza and healthy controls. Remarkably, the association of rs2070788 was replicated in the A(H7N9) cohort (Tables 3 and 4). Carriers of the rs2070788 GG or GA genotype had approximately 2-fold higher susceptibility to A(H7N9) infection (P = .02). As expected, rs383510 generated a similar association signal. Collectively, we demonstrated the genetic association of rs2070788 and rs383510 with susceptibility to human A(H7N9) infection.

DISCUSSION
Although GWASs have been widely conducted to understand the genetic basis underlying human diseases in an unbiased manner, the functional basis of the genetic variations leading to disease susceptibility cannot be directly inferred from these studies. In addition, genetic predisposition to most human diseases involves many genes, each of which may make a modest contribution. In human infectious diseases, such as avian influenza, the limited sample size of affected patients represents an additional hurdle for the discovery of susceptibility or resistance genes via GWASs. The eQTLs are currently the most abundant and systematically surveyed class of functional consequence of genetic variation [18,21]. The local eQTLs (cis-eQTLs) tend to have a larger effect on gene expression than distant eQTLs (trans-eQTLs) [22]. The availability of systematically generated eQTLs has helped prioritize GWAS findings and facilitated the identification of causal genes [23].
In the current study, we initiated a small-scale GWAS to investigate the role of genetic variation in the susceptibility to severe A(H1N1)pdm09 influenza. To avoid possible false-positive findings due to the small sample size, we prioritized the candidate variants by their status as a lung cis-eQTL instead of their association P values. We focused on rs2070788 based on its role as a lung cis-eQTL. The genetic association was evaluated in an extended cohort of 409 patients with A(H1N1)pdm09 influenza. The rs2070788 GG genotype, corresponding to higher TMPRSS2 expression in human lung tissues, conferred >2-fold higher risk of severe A(H1N1)pdm09 influenza.
Based on its high LD with rs2070788, a SNP in a putative enhancer of TMPRSS2 and an even stronger lung cis-eQTL, rs383510 was identified as the potentially functional variant for the association with severe A(H1N1)pdm09 influenza. Because of the shared usage of TMPRSS2 for replication and pathogenicity of A(H1N1)pdm09 and A(H7N9) viruses, we assessed the genetic association of rs2070788 and rs383510 in another cohort of patients with A(H7N9) influenza and controls. Collectively, we demonstrated that genetic variants with higher TMPRSS2 expression in the lung tissue conferred a higher risk to severe A(H1N1)pdm09 infection. Notably, the same variants also increased susceptibility to A(H7N9) infection.
Influenza virus infection is initiated with the attachment of virus HA to the cellular receptor and the subsequent fusion of viral envelope and cellular endosomal membrane. Posttranslational cleavage of HA by host protease is a prerequisite for the fusion and hence for virus infectivity, tissue tropism, and virus pathogenicity. Most human influenza A virus strains, including H1N1 and H3N2, have a single arginine at the HA cleavage site, which is cleaved by trypsin-like proteases that are mainly present in the respiratory or gastrointestinal tract [24]. TMPRSS2 is the major host protease that cleaves and activates the HA of influenza viruses in the human respiratory tract [25]. The HAs of highly pathogenic avian influenza viruses, such as subtypes H5 and H7, are cleaved at the multibasic motif by ubiquitous cellular proteases, and these viruses cause lethal systemic infection [26]. Despite their avian virus origin, the novel A(H7N9) viruses possess a monobasic HA cleavage site [27], consistent with their low pathogenicity in poultry. TMPRSS2-deficient mice were highly tolerant to the lethal challenge of A(H1N1)pdm09 and A(H7N9) viruses, demonstrating the essential role of TMPRSS2 in the pathogenesis of A(H1N1)pdm09 and A(H7N9) virus infections [20].
Figure 3. Luciferase activity of the putative enhancer. The putative enhancer region containing rs383510 was inserted in a luciferase reporter vector. Two constructs representing rs383510 genotype TT and genotype CC, as well as unmodified vector pGL3-Basic, were transfected into A549 cells. A Renilla luciferase vector pGL4.70[hRluc] was cotransfected as an internal control, and luciferase assay was performed 32 hours after transfection. The relative luciferase activity (ratio of firefly to Renilla luminescence) is presented as the mean and SD of triplicated transfection in a representative experiment.
In this study, we found that the same variant of TMPRSS2 increased the risk to human A(H7N9) influenza and severe A(H1N1)pdm09 influenza. Clinically, the patients with severe A(H1N1)pdm09 influenza and most patients with A(H7N9) influenza displayed similar symptoms. Human A(H1N1)pdm09 influenza primarily presents as an upper respiratory tract infection [28], whereas most patients with A(H7N9) influenza had rapidly progressive pneumonia, manifesting as a lower respiratory tract infection [29]. The receptor-binding specificity may contribute to the distinct clinical presentations of A(H1N1)pdm09 and A(H7N9) infections.
Despite the dual receptor-binding specificity, A(H7N9) HA retains a preference for α2,3-linked sialic acid (α2,3-SA) receptor over α2,6-SA receptor [30,31]. In humans, α2,6-SA receptors are more abundantly expressed in the upper respiratory tract than α2,3-SA receptors, which are predominantly distributed in the lower respiratory tract [32]. Consistent with the receptor-binding pattern, A(H7N9) replication was more efficient in lung tissue than in the trachea [33]. Therefore, the preference of A(H7N9) virus binding to α2,3-SA receptor over the α2,6-SA counterpart and the robust viral replication in lung tissue may constitute the biological basis for the clinical manifestation of human A(H7N9) influenza as primarily a lower respiratory tract infection. Any risk factor, such as a high expression level of endogenous TMPRSS2, may robustly propagate the invading A(H7N9) virus and facilitate its spread to the lower respiratory tract, leading to symptomatic infection.
On the other hand, A(H1N1)pdm09 virus primarily binds the α2,6-SA receptor, and A(H1N1)pdm09 influenza manifests as an upper respiratory tract infection in most patients. In the presence of factors that facilitate viral growth and access to the lower respiratory tract, A(H1N1)pdm09 infection can spread downward from the upper respiratory tract to cause viral pneumonia and other complications, a severe outcome of A(H1N1)pdm09 influenza. A recent study clearly demonstrated that the volume of inoculum can affect disease severity of the A(H1N1)pdm09 infection in ferrets. The same dose of virus in a greater inoculation volume delivered the viruses more effectively to the lower respiratory tract and caused more severe disease in ferrets [34]. Therefore, the viral access to the lower respiratory tract and replication therein constitute the pathological basis of severe A(H1N1)pdm09 infection. Taken together, regardless of viral subtype, both severe A(H1N1)pdm09 influenza and A(H7N9) influenza are lower respiratory tract infections. In this study, we demonstrated that the higher TMPRSS2 expression variant, the rs2070788 G allele, conferred a higher risk of severe A(H1N1)pdm09 infection as well as higher susceptibility to A(H7N9) influenza.
Interestingly, in A(H7N9) influenza, the proportion of male patients was more than double that of female patients [35]. Androgen is known to positively regulate TMPRSS2 expression [36]. Therefore, further study might be conducted to compare TMPRSS2 levels in females and males and determine whether androgen-enhanced TMPRSS2 expression may increase the risk of A(H7N9) influenza in the latter. Apart from influenza viruses, TMPRSS2 is an important host protease for the entry of several coronaviruses causing human infections, such as human coronavirus 229E [37], severe acute respiratory syndrome coronavirus [38], and Middle East respiratory syndrome coronavirus [39], which is causing ongoing outbreaks in the Middle East. It would be interesting to learn whether the interindividual heterogeneity of TMPRSS2 expression may also affect susceptibility to infections with these coronaviruses.
The role of TMPRSS2 in the pathogenesis of A(H1N1)pdm09 and A(H7N9) has been clearly elucidated in mouse studies. In this study, we further demonstrated that the differential TMPRSS2 expression derived from genetic variation may substantially affect disease outcome in patients exposed to either of the 2 virus subtypes. The discovery of TMPRSS2 as a susceptibility marker for severe A(H1N1)pdm09 influenza and human A(H7N9) influenza will help identify high-risk individuals for better prophylactic and therapeutic intervention.