## Abstract

Several genome-wide association studies (GWAS) have demonstrated associations between genetic variants in the major histocompatibility complex (MHC) region and chronic hepatitis B (CHB) virus infection, but the disease-causing loci and underlying mechanisms remain unknown because of the complex linkage disequilibrium (LD) in this region. To systematically characterize MHC variation in relation to CHB infection, we fine mapped the MHC region in our existing GWAS data using SNP2HLA with the Pan-Asian panel as reference and identified four independent associations. HLA-DPβ1 amino acid positions 84–87, which drove the effects of the reported single nucleotide polymorphisms rs9277535 and rs3077, showed the most significant association (OR = 0.65, P = 2.03 × 10−8). Leu-15 of HLA-C, which accounted for the effect of rs3130542, independently increased the risk of CHB infection (OR = 1.61, P = 3.42 × 10−7). rs400488, an expression quantitative trait locus for HLA-J, and HLA-DRβ1*13, in perfect LD with glutamic acid at site 71, were newly identified as independently associated with CHB infection (OR = 1.84, P = 3.84 × 10−9; OR = 0.28, P = 6.27 × 10−7, respectively). HLA-DPβ1 positions 84–87 and HLA-DRβ1 position 71 mapped to pockets P1 and P4 of the antigen-binding groove, respectively, whereas HLA-C position 15 affected the signal peptide. Together, these four independent loci explain ∼6% of the phenotypic variance for CHB infection, accounting for 72.94% of that explained by known genetic variations. In summary, we fine mapped the MHC region and identified four loci that independently drove chronic HBV infection. The results provide a deeper understanding of the GWAS signals and identify additional susceptibility loci that were missed in previous association studies.

## Introduction

Hepatitis B virus (HBV) infection remains a serious public health problem, with >2 billion people affected worldwide (1). Although most infected individuals experience an asymptomatic, self-limiting infection, an estimated ∼350 million people have developed chronic hepatitis B (CHB) infection (1), which is the major risk factor for liver cirrhosis and hepatocellular carcinoma (HCC) (2,3). HBV prevalence is very high in sub-Saharan Africa and East Asia, and China is one of the endemic areas, with a hepatitis B surface antigen (HBsAg) prevalence of 5–7% (4).

The progression from HBV infection to CHB infection is influenced by multiple factors, including age, viral factors, environmental factors and host genetic makeup (5). Recently, genome-wide association studies (GWAS) in Chinese, Japanese and Korean populations have reported 13 single nucleotide polymorphisms (SNPs) associated with CHB infection (6–10). Eleven of these SNPs are located in the major histocompatibility complex (MHC) region, which spans ∼4 Mb on chromosome 6p21.3. The SNPs in HLA-DP/DQ (rs3077, rs9277535, rs2856718 and rs7453920) were first identified as associated with CHB infection by Kamatani et al. and were validated by multiple independent studies (8,11–13). In subsequent studies with larger sample sizes and more rigorous study designs, SNPs in HLA-DOA (rs378352) (7), HLA-C (rs3130542 and rs2853953) (6,7), CFB (rs12614) (7), NOTCH4 (rs422951) (7), TCF19 (rs1419881) (9) and EHMT2 (rs652888) (9) were also found to be associated with CHB infection. Although these results confirmed the association between the MHC region and CHB infection, the causal variants and potential mechanisms remained unclear because most of the reported SNPs were proxies without direct biological function.

HLA genes mainly encode cell surface proteins that display antigenic peptides to effector immune cells and regulate immune responses (14). Variations in amino acid residues at specific positions may alter the repertoire of presented peptides and thus influence the host immune response. Recently, several fine mapping studies of the MHC region have identified a number of amino acid residues or classical HLA alleles that contribute to the risk of immune-related diseases, such as celiac disease, Graves' disease, type 1 diabetes, rheumatoid arthritis, psoriasis vulgaris and idiopathic achalasia (14–16). However, no fine mapping study of the MHC region in CHB infection has been reported to date. In this study, we imputed HLA variants with SNP2HLA software from our pre-existing CHB GWAS data and explored the potential causal variants that drive predisposition to CHB in Han Chinese.

## Results

### Summary of the previously reported loci

Among the 13 CHB infection-related variants, three (rs3130542 at HLA-C, rs7453920 at HLA-DQB2 and rs4821116 at UBE2L3) were significant at the genome-wide level (P < 5 × 10−8) and were reported in our previous work with two-stage validation (6). Another five loci, including rs12614 at CFB, rs2856718 at HLA-DQB1-DQA2, rs3077 at HLA-DPA1, rs9277535 at HLA-DPB1 and rs1883832 at CD40, also showed P < 0.05. However, we did not confirm the associations of rs1419881 at TCF19, rs422951 at NOTCH4 and rs378352 at HLA-DOA with CHB infection, although the directions of association were consistent. rs2853953 was excluded from our analysis because of low imputation quality (Supplementary Material, Table S2).

### Association analysis of classical HLA alleles with CHB infection

As described above, a total of 170 classical HLA alleles were included in our analysis (Supplementary Material, Table S3). HLA-C*07 (OR = 1.56, P = 4.04 × 10−6) and HLA-DRB1*13 (OR = 0.31, P = 2.18 × 10−6) were significantly associated with CHB infection at our predefined significance threshold (9.18 × 10−6). HLA-C*07 was in high linkage disequilibrium (LD) (17) with our previously discovered SNP rs3130542 (r2 = 0.93, D′ = 0.99), whereas HLA-DRB1*13 was independent of all reported loci. Several classical alleles in HLA-DP/DQ/DR also showed suggestive associations with CHB risk (P between 6.40 × 10−5 and 1.79 × 10−4), but none reached our predefined significance level: HLA-DPA1*01 and HLA-DPA1*01:03 in LD with rs3077 (r2 = 1.00, D′ = 1.00), HLA-DPB1*02 in LD with rs9277535 (r2 = 0.51, D′ = 1.00), HLA-DQB1*03:02 in LD with rs7453920 (r2 = 0.34, D′ = 0.70) and HLA-DRB1*13:02 in LD with rs12614 (r2 = 0.37, D′ = 0.97).

### Association analysis of HLA amino acid polymorphisms with CHB infection

We then evaluated the associations of HLA amino acid polymorphisms with CHB risk and found that 45 amino acid polymorphisms or sites in three HLA genes (HLA-DPB1, HLA-C and HLA-DRB1) showed significant associations at P < 9.18 × 10−6 (Supplementary Material, Table S4). These amino acid polymorphisms were in high LD within their respective genes (Supplementary Material, Fig. S2) and were led by HLA-DPβ1 residues Gly84, Gly85, Pro86 and Met87 (OR = 0.66, P = 5.90 × 10−8), HLA-C residue Leu-15 (OR = 1.59, P = 2.80 × 10−7) and HLA-DRβ1 residue Glu71 (OR = 0.31, P = 2.35 × 10−6), respectively. The amino acid polymorphisms at sites 84–87 of HLA-DPβ1 were structurally adjacent and in perfect LD with each other (r2 = 1.00, D′ = 1.00). Leu-15 of HLA-C and Glu71 of HLA-DRβ1 were in high LD with the identified two-digit HLA alleles HLA-C*07 (r2 = 0.79, D′ = 0.98) and HLA-DRB1*13 (r2 = 0.99, D′ = 1.00), respectively. Additionally, two multiallelic amino acid sites, HLA-DRβ1 position 13 (Pomnibus = 1.59 × 10−6) and HLA-DRβ1 position 71 (Pomnibus = 5.98 × 10−6), were also significantly associated with CHB risk.

### Independent HLA associations drive CHB infection risk

Given the complexity and extensive LD of the MHC region, we used stepwise conditional analysis to identify independent variants that may drive CHB risk. The most significant variant in each step was included in the next model as a covariate until no variant reached the study-wide significance level (P < 9.18 × 10−6) (Supplementary Material, Table S5). After adjusting for age, sex and the significant principal component, HLA-DPβ1 amino acid positions 84–87 showed the most significant association (OR = 0.66, P = 5.89 × 10−8) (Fig. 1A, Supplementary Material, Table S6 and Fig. S3A). When conditioning on HLA-DPβ1 amino acid positions 84–87, we detected the second independent association at HLA-C amino acid position 15 (OR = 1.57, P = 7.48 × 10−7) (Fig. 1B, Supplementary Material, Table S6 and Fig. S3B). When conditioning on HLA-DPβ1 and HLA-C, we observed the third independent association at rs400488 (OR = 1.59, P = 3.91 × 10−6), which is located in the first exon of the long noncoding RNA HCG9 (Fig. 1C, Supplementary Material, Table S6 and Fig. S3C). When conditioning on HLA-DPβ1, HLA-C and HCG9, we found the fourth independent association at the classical two-digit HLA allele HLA-DRB1*13 (OR = 0.27, P = 4.64 × 10−7) (Fig. 1D, Supplementary Material, Table S6 and Fig. S3D). When conditioning on HLA-DPB1, HLA-C, HCG9 and HLA-DRB1, no variant in the MHC region met the study-wide significance threshold (Fig. 1E).
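The forward stepwise procedure described above can be sketched as a short loop. This is only an illustrative skeleton, not the analysis code used in the study: `p_value` is a hypothetical callback standing in for the refitted logistic regression, and the variant names and P values in the toy table are synthetic.

```python
from typing import Callable, List, Sequence, Tuple

# Study-wide threshold quoted in the text: 0.05 / (5375 + 74).
THRESHOLD = 9.18e-6


def forward_stepwise(
    variants: Sequence[str],
    p_value: Callable[[str, Tuple[str, ...]], float],
    threshold: float = THRESHOLD,
) -> List[Tuple[str, float]]:
    """Repeatedly add the most significant remaining variant as a covariate.

    `p_value(variant, conditioned_on)` is a hypothetical callback that would
    refit the association model for `variant` with the already selected
    variants (plus age, sex and the principal component) as covariates.
    """
    selected: List[Tuple[str, float]] = []
    remaining = list(variants)
    while remaining:
        conditioned_on = tuple(name for name, _ in selected)
        scored = [(p_value(v, conditioned_on), v) for v in remaining]
        best_p, best_variant = min(scored)
        if best_p >= threshold:
            break  # nothing left passes the study-wide threshold
        selected.append((best_variant, best_p))
        remaining.remove(best_variant)
    return selected


def toy_p_value(variant: str, conditioned_on: Tuple[str, ...]) -> float:
    # Synthetic numbers for illustration only; these are not study results.
    table = {
        ("v1", ()): 1e-8, ("v2", ()): 1e-7, ("v3", ()): 1e-3,
        ("v2", ("v1",)): 1e-6, ("v3", ("v1",)): 1e-2,
        ("v3", ("v1", "v2")): 0.5,
    }
    return table[(variant, conditioned_on)]


result = forward_stepwise(["v1", "v2", "v3"], toy_p_value)
print(result)
```

In this toy run, `v1` is selected first, `v2` remains significant after conditioning on `v1`, and the loop stops once `v3` no longer passes the threshold.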

Figure 1

Regional association plots of HLA loci independently associated with CHB infection. HLA variants, including SNPs, classical alleles and amino acid polymorphisms, were tested for association with CHB using the imputed allelic dosage (between 0 and 2). In each panel, the diamonds represent –log10 (P values) for the variants and the green line marks P = 9.18 × 10−6. (A) The strongest associations were at amino acid positions 84–87 (OR = 0.66; P = 5.89 × 10−8) in the HLA-DPβ1 locus, which were in perfect LD with each other (r2 = 1, D′ = 1). See Supplementary Material, Table S5 for associations of all markers. (B) After adjusting for amino acid positions 84–87 of HLA-DPβ1, the strongest independent signal was amino acid site 15 of HLA-C (OR = 1.57; P = 7.48 × 10−7). (C) After adjusting for HLA-DPB1 and HLA-C, the next most strongly associated variant was rs400488 in the long noncoding RNA HCG9 (OR = 1.59; P = 3.91 × 10−6). (D) The final independent association, after additionally conditioning on HCG9, was in HLA-DRB1, led by the classical HLA allele HLA-DRβ1*13 (OR = 0.27; P = 4.64 × 10−7), which was in strong LD with glutamic acid at site 71 (r2 = 0.99, D′ = 1). (E) After additionally conditioning on HLA-DRB1, we found no further independent associations in the MHC region.

We then fitted a multivariate regression model that incorporated these independent risk-associated HLA variants together with age, sex and the principal component (Table 1). We observed a decreased risk associated with HLA-DPβ1 positions 84–87 (OR = 0.65, P = 2.03 × 10−8) and an increased risk associated with HLA-C Leu-15 (OR = 1.61, P = 3.42 × 10−7). The A allele of rs400488 was associated with an increased risk of CHB infection (OR = 1.84, P = 3.84 × 10−9). Individuals carrying the classical HLA allele HLA-DRβ1*13, which was in strong LD with Glu71 of HLA-DRβ1, showed a decreased risk of CHB infection (OR = 0.28, P = 6.27 × 10−7).

Table 1.

Associations of the HLA variants with chronic HBV infection in Han Chinese

| HLA variants | Reference allele<sup>a</sup> | Effect allele<sup>b</sup> | EAF, cases (N = 951) | EAF, controls (N = 937) | OR (95% CI)<sup>c</sup> | P<sup>c</sup> |
| --- | --- | --- | --- | --- | --- | --- |
| HLA-DPβ1 amino acid positions 84–87 | Asn/Glu/Ala/Val | Gly/Gly/Pro/Met | 0.30 | 0.40 | 0.65 (0.56–0.75) | 2.03 × 10−8 |
| HLA-C amino acid position 15 | Ile | Leu | 0.23 | 0.16 | 1.61 (1.34–1.93) | 3.42 × 10−7 |
| rs400488 | | | 0.19 | 0.14 | 1.84 (1.50–2.25) | 3.84 × 10−9 |
| HLA_DRB1_13 | Absent | Present | 0.01 | 0.04 | 0.28 (0.17–0.47) | 6.27 × 10−7 |

aHLA-DPβ1 amino acids at positions 84, 85, 86 and 87 were in high LD (r2 = 1). The reference amino acid was aspartic acid at position 84, glutamic acid at position 85, alanine at position 86 and valine at position 87. The effect amino acid was glycine at position 84, glycine at position 85, proline at position 86 and methionine at position 87.

bEAF, effect allele frequency.

cObtained from the multivariate regression model that included age, sex, PCA and the amino acid polymorphisms, SNPs or HLA alleles.
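As a rough sanity check on Table 1, an unadjusted allelic odds ratio can be computed directly from the effect allele frequencies in cases and controls. The sketch below is an illustration added here, not the model used in the study: it ignores age, sex, the principal component and the other variants, so it should only approximate the direction and magnitude of the adjusted estimates.

```python
def allelic_odds_ratio(eaf_cases: float, eaf_controls: float) -> float:
    """Unadjusted allelic odds ratio from effect allele frequencies."""
    odds_cases = eaf_cases / (1.0 - eaf_cases)
    odds_controls = eaf_controls / (1.0 - eaf_controls)
    return odds_cases / odds_controls


# Effect allele frequencies (cases, controls) as listed in Table 1.
table1_eaf = {
    "HLA-DPB1 positions 84-87": (0.30, 0.40),
    "HLA-C position 15": (0.23, 0.16),
    "rs400488": (0.19, 0.14),
    "HLA-DRB1*13": (0.01, 0.04),
}

crude_or = {name: allelic_odds_ratio(*eaf) for name, eaf in table1_eaf.items()}
for name, value in crude_or.items():
    print(f"{name}: crude OR = {value:.2f}")
```

The crude values fall on the same side of 1 as the adjusted odds ratios in Table 1 (0.65, 1.61, 1.84 and 0.28); the remaining differences are expected because the frequencies are rounded and the table reports covariate-adjusted estimates.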

### LD analysis of the independent HLA variants with previous GWAS reported SNPs

To investigate the relationships between the four identified independent HLA variants and the SNPs reported by previous GWAS, we performed a systematic LD analysis in the MHC region (Supplementary Material, Table S7). HLA-DPβ1 amino acid positions 84–87 were in high LD with rs9277535 at HLA-DPB1 (r2 = 0.83, D′ = 1.00) and in moderate LD with rs3077 at HLA-DPA1 (r2 = 0.44, D′ = 0.75). When conditioning on amino acid positions 84–87 of HLA-DPβ1, the associations of rs9277535 and rs3077 were not significant (P = 0.323 and 0.603, respectively), whereas amino acid positions 84–87 of HLA-DPβ1 remained nominally significant when conditioning on rs9277535 (P = 0.002), rs3077 (P = 1.96 × 10−4) or both (P = 0.006). Leu-15 of HLA-C was in moderate LD with rs3130542, located downstream of HLA-C (r2 = 0.75, D′ = 0.99). The association of rs3130542 was not significant after conditioning on Leu-15 of HLA-C (P = 0.706), whereas Leu-15 of HLA-C was still associated with CHB risk after conditioning on rs3130542 (P = 0.033). These results suggest that the previously reported associations with rs9277535 and rs3077 probably reflect the primary risk conferred by the amino acid polymorphisms at sites 84–87 of HLA-DPβ1, and that rs3130542 might represent the effect of Leu-15 in HLA-C. In comparison, rs400488 at HCG9 and HLA-DRB1*13 were independent of the reported SNPs, suggesting that they are new susceptibility loci.
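Several pairs above combine D′ close to 1 with a much lower r2. A minimal sketch of the standard two-locus definitions shows how this arises when one allele is rarer than the other; the function name and the haplotype frequencies in the example are illustrative assumptions, not values from this study.

```python
from typing import Tuple


def ld_stats(p_a: float, p_b: float, p_ab: float) -> Tuple[float, float, float]:
    """Return (D, D', r2) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Toy frequencies: allele A (20%) always sits on a haplotype carrying B (40%).
d, d_prime, r2 = ld_stats(p_a=0.2, p_b=0.4, p_ab=0.2)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

In the toy case D′ reaches its maximum because A never occurs without B, yet r2 stays well below 1 because B also occurs without A. This is why a conditional analysis, rather than LD alone, is needed to decide which variant carries the signal.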

### Key amino acids are located in the peptide-contacting grooves and signal peptide

HLA-DPβ1 positions 84–87 and HLA-DRβ1 positions 13 and 71 are located in the peptide-contacting grooves of the respective HLA molecules (Fig. 2A and B). Changes at positions 84–87 of HLA-DP in pocket 1 and at positions 13 and 71 of HLA-DRβ1 in pocket 4 could alter the groove residues that bind the peptide, suggesting that they affect antigen-presentation ability or protein stability (14,18). In contrast, HLA-C position 15 is located in the leader peptide of the HLA class I histocompatibility antigen (Cw-7 alpha chain) rather than in the main chain of HLA-C (Fig. 2C), suggesting a functional contribution to protein localization and transmembrane transport.

Figure 2.

Three-dimensional ribbon models of the HLA proteins associated with CHB infection risk. The protein structures of HLA-DP, HLA-DR and HLA-C are based on Protein Data Bank (PDB) entries 3LQZ, 3PDO and 3BZF, respectively, which were prepared using Pymol version 1.7. Residues at the CHB risk-associated amino acid positions are highlighted as red spheres.

### rs400488 is probably an eQTL SNP of HLA-J

rs400488 is located in the first exon of the long noncoding RNA HCG9, whose function has not yet been determined. The genomic position of rs400488 also coincides with intronic regions of HLA-G and HLA-J. To explore the potential mechanism behind the association, we evaluated the association between the rs400488 genotype and the expression of HCG9, HLA-G and HLA-J in 55 Asian samples from TCGA (Supplementary Material, Fig. S4). rs400488 was significantly associated with the expression of HLA-J (P = 2.71 × 10−5).

### Variance explained by the independent HLA variants

Using the variants identified in our study and those reported by previous GWAS, we estimated the proportion of phenotypic variance explained under a liability threshold model, assuming a disease prevalence of 5, 6 or 7% (Table 2). The four independent loci identified in our study explained 5.4, 5.7 and 6.0% of the total phenotypic variance at prevalences of 5, 6 and 7%, respectively, whereas the 10 GWAS-reported SNPs in the MHC region accounted for only 3.6, 3.8 and 4.0%. Variants in the non-MHC regions explained an additional 1.8, 1.9 and 2.0% of the phenotypic variance. All of these variants together explained 7.5, 7.9 and 8.3% of the phenotypic variance at the respective prevalences. The four identified variants alone explained ∼72.94% of the phenotypic variance attributable to the known genetic variants (0.062 of 0.085 on the observed scale; Table 2).

Table 2.

Heritability estimates for our identified and GWAS reported variants in chronic HBV infection

| Model<sup>a</sup> | h2 (SE), observed scale | h2 (SE), liability scale, prevalence 5% | h2 (SE), liability scale, prevalence 6% | h2 (SE), liability scale, prevalence 7% |
| --- | --- | --- | --- | --- |
| The four independent HLA variants | 0.062 | 0.054 | 0.057 | 0.06 |
| SNPs in MHC region identified by GWAS (10 SNPs) | 0.041 | 0.036 | 0.038 | 0.04 |
| SNPs in non-MHC regions identified by GWAS (two SNPs) | 0.021 | 0.018 | 0.019 | 0.02 |
| Combined | 0.085 | 0.075 | 0.079 | 0.083 |

aVariants identified in our current analysis, and those in the MHC region or non-MHC regions reported in previous GWAS studies, were used to estimate the heritability variances of chronic HBV infection separately.
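The 72.94% figure quoted in the text can be recovered from Table 2 as the ratio of the variance explained by the four HLA variants to the combined estimate. The short check below is added for illustration and uses only the values printed in the table; the variable names are our own.

```python
# Estimates copied from Table 2 (observed scale, then liability scale
# at assumed prevalences of 5, 6 and 7%).
four_hla_variants = {"observed": 0.062, "5%": 0.054, "6%": 0.057, "7%": 0.06}
combined = {"observed": 0.085, "5%": 0.075, "6%": 0.079, "7%": 0.083}

# Share of the combined estimate that the four HLA variants capture.
shares = {scale: four_hla_variants[scale] / combined[scale] for scale in combined}
for scale, share in shares.items():
    print(f"{scale}: {share:.2%}")
```

The observed-scale ratio reproduces 72.94%, while the liability-scale ratios are slightly lower (about 72%), which is within the rounding of the tabulated values.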

## Discussion

Fine mapping of the MHC locus is very important for immune-related diseases in which GWAS have implicated MHC SNPs. Using genotype data from SNP arrays and reliable HLA reference panels, recent studies have achieved great success in celiac disease, Graves' disease, type 1 diabetes, rheumatoid arthritis, psoriasis vulgaris and idiopathic achalasia (14–16). In this study, we imputed the MHC region in our pre-existing GWAS data on CHB infection using the Pan-Asian reference panel and identified four loci that were independently associated with CHB infection. HLA-DPβ1 positions 84–87 and HLA-C amino acid position 15 accounted for the SNPs reported by previous GWAS, whereas rs400488 at HCG9 and HLA-DRB1*13 were newly identified loci that independently modulated CHB infection risk. Together, these four variants captured 72.94% of the phenotypic variance explained by all identified genetic loci.

Amino acid positions 84–87 of HLA-DPβ1 were highly correlated with the previously reported GWAS loci (rs9277535 and rs3077) but were independent of classical HLA-DP alleles. Conditional analysis suggested that the associations of the previously reported SNPs rs9277535 and rs3077 might derive from the primary risk conferred by these sites. Given the complete LD among these positions, individuals carried either the amino acid sequence NEAV or GGPM. Structural models of HLA-DP revealed that substitution with GGPM could alter the contact area between the HLA-DPα and β chains as well as part of pocket 1 of the peptide-contacting groove, which plays a crucial role in T-cell receptor binding to the MHC class II complex (18,19). Studies of HLA-DPβ1 amino acid 86 indicated that the variant modifies the MHC–peptide conformation by changing the way peptides bind to the groove rather than by changing peptide affinity (20). Polymorphisms at these amino acid positions have previously been implicated in susceptibility to follicular lymphoma and Hodgkin lymphoma (21). Thus, HLA-DPβ1 amino acid substitutions at positions 84–87 may be causal variants that modify susceptibility to CHB infection.

HLA-C amino acid position 15 was in strong LD with our previously reported GWAS SNP rs3130542 and with the identified classical two-digit HLA allele HLA-C*07. Another CHB GWAS in a Chinese population recently replicated the association with rs3130542 (7). In our study, rs3130542 was in strong LD with HLA-C*0702 (r2 = 0.91, D′ = 0.99) and HLA-C*07 (r2 = 0.93, D′ = 0.99). Conditional analysis further indicated that HLA-C amino acid position 15 might be the causal variant underlying this signal. The HLA-C gene contains eight exons: exon 1 encodes the leader peptide, exons 2 and 3 encode the alpha 1 and alpha 2 domains, exon 4 encodes the alpha 3 domain, exon 5 encodes the transmembrane region, and exons 6 and 7 encode the cytoplasmic tail. Amino acid position 15, which corresponds to rs2308527, is located in exon 1 of HLA-C and lies in the signal peptide, which directs the newly synthesized protein to the protein-conducting channels. Variants in signal peptides are strongly associated with the efficiency of protein secretion (22).

rs400488 at HCG9 was independent of all reported SNPs, suggesting that it is a new susceptibility locus for CHB. The genotype of rs400488 was re-imputed with IMPUTE2 using haplotype information from the 1000 Genomes Project (version 3) as reference, and the concordance rate was >99%. rs400488 has been reported to be associated with mean DNA methylation of HCG9 in postmortem brains from patients with bipolar disorder (23). Our expression quantitative trait locus (eQTL) analysis showed that rs400488 was marginally associated with the expression of HCG9 and significantly associated with the expression of HLA-J in liver tissue. HLA-J is possibly a transcribed pseudogene of HLA-A, and its function remains unclear.

The HLA allele HLA-DRB1*13, which is in perfect LD with Glu71 of HLA-DRβ1, has been linked to protection against persistent hepatitis B in an Iranian population and shows a strong association with HBV clearance in a Thai population (24,25). However, HLA-DRB1*13 is a low-frequency variant (MAF < 0.05) in Han Chinese and had not previously been associated with CHB infection in this population. Glu71 of HLA-DRβ1 is located in pocket 4 of the peptide-contacting groove, and this amino acid site has previously been linked to type 1 diabetes, multiple sclerosis, rheumatoid arthritis and follicular lymphoma (14,26). It should be pointed out that the multiallelic amino acid position 13 of HLA-DRB1, which was significantly associated with CHB risk in the primary analysis, was still associated with CHB infection after conditioning on the four identified variants (Pomnibus = 6.15 × 10−4). This suggests that there might be another signal in the HLA-DRB1 region that was missed because of our limited sample size.

In summary, we fine mapped the MHC region using existing GWAS data on CHB infection and identified four loci that independently drove chronic HBV infection. The results provide a deeper understanding of the GWAS signals and identify additional susceptibility loci that were missed because of the limitations of genotyping chips or strict quality control standards. However, the moderate sample size is the main limitation of this study, and further validation studies with larger sample sizes are warranted.

## Materials and Methods

### Study subjects and genotyping

We used our previously reported GWAS data on 951 chronic HBV carriers and 937 controls from East China (6). Cases had to satisfy the following criteria: (i) positive for HBsAg and HBV core antibody (HBcAb) and (ii) negative for antibody specific to hepatitis C virus (HCV). Controls were individuals who reported no HBV vaccination and had naturally cleared HBV, meeting the following criteria: (i) negative for HBsAg and for antibody specific to HCV and (ii) positive for hepatitis B surface antibody and HBcAb (Supplementary Material, Table S1). The mean ages of HBV carriers and controls were 49.6 and 50.8 years, respectively. The proportion of males was higher in HBV carriers than in controls (83.2 versus 52.6%). Each subject provided informed consent at recruitment, and the study conformed to the ethical guidelines of the 1975 Declaration of Helsinki as reflected in a priori approval by the institutional review board of Nanjing Medical University.

HBV carriers (cases) were genotyped using Illumina Human OmniExpress12v1 chips (731 442 SNPs), and controls were genotyped using OmniZhongHua chips (900 015 SNPs), which left 595 310 SNPs available on both chips for further analysis. During quality control, we excluded SNPs with a call rate of <95%, SNPs with a minor allele frequency (MAF) of <0.05 and SNPs that deviated from Hardy–Weinberg equilibrium (P < 1 × 10−5) in the controls. We also removed samples with genotype call rates of <95%, duplicates or probable relatives, and outlier samples identified by PCA. After quality control, a total of 490 610 SNPs in 951 cases and 937 controls were included in this analysis.
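The SNP-level filters above can be sketched as follows. The thresholds are those quoted in the text; the function names are illustrative, and the 1-df chi-square test for Hardy–Weinberg equilibrium is an assumption, since the text does not state which test was applied.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(call_rate: float, maf: float, hwe_p_controls: float) -> bool:
    """Apply the three SNP-level thresholds quoted in the text."""
    return call_rate >= 0.95 and maf >= 0.05 and hwe_p_controls >= 1e-5


# Toy genotype counts in controls (illustrative, not study data).
p_balanced = hwe_chi2_p(250, 500, 250)   # matches HWE expectations exactly
p_no_hets = hwe_chi2_p(500, 0, 500)      # no heterozygotes at all
print(passes_snp_qc(0.99, 0.50, p_balanced), passes_snp_qc(0.99, 0.50, p_no_hets))
```

The first toy marker is retained and the second is removed by the Hardy–Weinberg filter, even though both have the same call rate and allele frequency.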

### Imputation of HLA variants

Genotype data for SNPs located in the entire MHC region (3779 SNPs in the genomic region from 29 to 34 Mb on chromosome 6, NCBI Build 37) were first extracted from the existing CHB GWAS data. We then imputed two- and four-digit classical HLA alleles and amino acid polymorphisms for HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1, as well as additional SNPs that were not genotyped in the original GWAS data. The Pan-Asian reference panel, which includes individuals of Han Chinese, Southeast Asian Malay, Tamil Indian and Japanese ancestry, was used with SNP2HLA software (27–29). In post-imputation quality control, we excluded HLA variants with low imputation quality (INFO < 0.5) or with MAF < 0.01. For each variant, the INFO score was calculated as the ratio of the observed variance in dosage to the expected variance under Hardy–Weinberg equilibrium. Finally, a total of 5375 variants, including 4356 SNPs, 849 amino acid polymorphisms, 70 two-digit classical alleles and 100 four-digit alleles, were retained for association analysis (Supplementary Material, Fig. S1).
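The INFO score defined above can be illustrated with a minimal sketch. It assumes that the expected variance under Hardy–Weinberg equilibrium is 2p(1 − p), with p estimated as half the mean dosage, and that the population variance of the dosages is used; the exact estimator used by the imputation software may differ, and the function names and dosages are illustrative.

```python
from statistics import fmean, pvariance
from typing import Sequence


def info_score(dosages: Sequence[float]) -> float:
    """Observed dosage variance divided by the variance expected under HWE."""
    p = fmean(dosages) / 2.0           # estimated allele frequency
    expected = 2.0 * p * (1.0 - p)     # binomial variance of a diploid genotype
    if expected == 0:
        return 0.0                     # monomorphic: no information
    return pvariance(dosages) / expected


def passes_imputation_qc(info: float, maf: float) -> bool:
    """Post-imputation thresholds quoted in the text."""
    return info >= 0.5 and maf >= 0.01


# Toy dosages (illustrative): confident calls versus calls shrunk to the mean.
confident = [0.0] * 25 + [1.0] * 50 + [2.0] * 25
uncertain = [0.5] * 25 + [1.0] * 50 + [1.5] * 25
print(info_score(confident), info_score(uncertain))

# The retained variant classes add up to the total reported in the text.
assert 4356 + 849 + 70 + 100 == 5375
```

Both toy variants have the same allele frequency, but the shrunken dosages carry only a quarter of the expected variance, so that variant would be removed by the INFO < 0.5 filter.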

### Association analysis of HLA variants with CHB infection risk

The HLA variants were classified as biallelic SNPs, two- and four-digit biallelic classical HLA alleles, biallelic HLA amino acid polymorphisms for individual residues and multiallelic HLA amino acid polymorphisms for each amino acid site. For biallelic variants, a logistic regression model assuming additive effects of allele dosage on the log (odds) scale was used, adjusting for age, sex and the significant eigenvector (the first principal component), as in our previous GWAS. For the multiallelic amino acid polymorphisms at each site, the most frequent variant was set as the reference and excluded from the regression model, giving the following logistic regression model:

(1)
$\log(\mathrm{odds}_k)=\beta_0+\sum_{i=1}^{m-1}\beta_{1,i}x_{i,k}+\beta_2\mathrm{Age}_k+\beta_3\mathrm{Sex}_k+\beta_4\mathrm{PCA}_k$
where $\beta_0$ is the logistic regression intercept and $\beta_{1,i}$ is the additive effect size of the dosage $x_{i,k}$ of allele $i$ (6) in the $k$th subject, for a variant consisting of $m$ alleles. The coefficients $\beta_2$, $\beta_3$ and $\beta_4$ are the effect sizes of the age, sex and eigenvector covariates, respectively. The null model is represented by
(2)
$\log(\mathrm{odds}_k)=\beta_0+\beta_2\mathrm{Age}_k+\beta_3\mathrm{Sex}_k+\beta_4\mathrm{PCA}_k$

An omnibus P-value (Pomnibus) was obtained for each HLA amino acid position on the basis of a log-likelihood ratio test (3) by comparing the likelihood L0 of the null model (2) against the likelihood L1 of the fitted model including m-1 alleles at the site (1). The log-likelihood ratio test is represented by

(3)
$D=-2\ln\frac{L_0}{L_1},\quad D\sim\chi^2_{(m-1)}$
where D represents the deviance, which approximately follows a χ2 distribution with m − 1 degrees of freedom (16). Of the 5375 variants retained, 74 amino acid sites were multiallelic; we therefore set the study-wide significance threshold at P = 9.18 × 10−6 (0.05/(5375 + 74)). Stepwise conditional analysis, in which the most significant HLA variant was included as a covariate in the next model, was also used to identify independently associated loci in a forward stepwise manner on the basis of the study-wide significance threshold.
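The omnibus test and the significance threshold can be sketched with the standard library alone. The log-likelihoods in the example are made-up numbers, the function names are our own, and the closed-form chi-square tail used here (valid for integer degrees of freedom) is an implementation choice for this illustration, not a description of the software used in the study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series.
    chi = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * series


def omnibus_p(loglik_null: float, loglik_full: float, m: int) -> float:
    """Likelihood ratio test of a site with m alleles (m - 1 fitted terms)."""
    deviance = -2.0 * (loglik_null - loglik_full)  # D = -2 ln(L0 / L1)
    return chi2_sf(deviance, m - 1)


# Study-wide threshold: 5375 variants plus 74 multiallelic amino acid sites.
threshold = 0.05 / (5375 + 74)
print(f"threshold = {threshold:.2e}")

# Made-up log-likelihoods for a three-allele site (illustration only).
p_example = omnibus_p(loglik_null=-100.0, loglik_full=-97.0, m=3)
print(p_example < threshold)
```

Fitting the two models and extracting their log-likelihoods is left to whichever regression package is used; only the comparison step is shown here.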

### eQTL analysis

Genotype data, copy number variations (CNVs) and expectation–maximization normalized read counts in HCC were downloaded from TCGA in July 2014. A total of 55 Asian patients with matched genotype, CNV and gene expression data were included in our analysis. We first imputed the HLA variants with SNP2HLA using the Pan-Asian panel as reference, and then tested the associations between the SNPs and the (log-transformed) expression of the corresponding genes using a regression model adjusted for age, gender, population structure (the top 10 principal components) and CNVs across these genes.

### Variance explained

The phenotypic variances explained by specific groups of genetic variants were estimated using the fixed effects model from individual associations as described previously (30). Variants identified in our current analysis and those in the MHC region or non-MHC regions reported in previous GWAS studies were used to calculate the respective variances by assuming the prevalence of CHB infection to be 5, 6 or 7% (4).

## Supplementary Material

Supplementary Material is available at HMG online.

## Funding

This work was supported by the National Science Foundation for Distinguished Young Scholars of China (81225020), the National Key Basic Research Program For Youth (2013CB911400), the National Program for Support of Top-notch Young Professionals, the National Major S&T Projects (2009ZX10004-904, 2011ZX10004-902, 2013ZX10004-905), the Foundation for the Program for New Century Excellent Talents in University (NCET-10-0178), the Foundation for the Author of National Excellent Doctoral Dissertation (201081), the Jiangsu Specially-appointed Professor Project, the National Natural Science Foundation of China (31100895), the Outstanding Young Scholar Project of Fudan University, Jiangsu Province Clinical Science and Technology Projects (BL2012008), and the Priority Academic Program for the Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).

## Acknowledgements

We thank all participants who contributed to this work.

Conflict of Interest statement. None declared.

## References

1. Liaw Y.F., Chu C.M. (2009) Hepatitis B virus infection. Lancet, 373, 582–592.
2. McMahon B.J., Alward W.L., Hall D.B., Heyward W.L., Bender T.R., Francis D.P., Maynard J.E. (1985) Acute hepatitis B virus infection: relation of age to the clinical expression of disease and subsequent development of the carrier state. J. Infect. Dis., 151, 599–603.
3. Torre L.A., Bray F., Siegel R.L., Ferlay J., Lortet-Tieulent J., Jemal A. (2015) Global cancer statistics, 2012. CA Cancer J. Clin., 65, 87–108.
4. Ott J.J., Stevens G.A., Groeger J., Wiersma S.T. (2012) Global epidemiology of hepatitis B virus infection: new estimates of age-specific HBsAg seroprevalence and endemicity. Vaccine, 30, 2212–2219.
5. He Y.L., Zhao Y.R., Zhang S.L., Lin S.M. (2006) Host susceptibility to persistent hepatitis B virus infection. World J. Gastroenterol., 12, 4788–4793.
6. Hu Z., Liu Y., Zhai X., Dai J., Jin G., Wang L., Zhu L., Yang Y., Liu J., Chu M. et al. (2013) New loci associated with chronic hepatitis B virus infection in Han Chinese. Nat. Genet., 45, 1499–1503.
7. Du J., Zhu X., Xie C., Dai N., Gu Y., Zhu M., Wang C., Gao Y., Pan F., Ren C. et al. (2015) Telomere length, genetic variants and gastric cancer risk in a Chinese population. Carcinogenesis, 36, 963–970.
8. Kamatani Y., Wattanapokayakit S., Ochi H., Kawaguchi T., Takahashi A., Hosono N., Kubo M., Tsunoda T., Kamatani N., Kumada H. et al. (2009) A genome-wide association study identifies variants in the HLA-DP locus associated with chronic hepatitis B in Asians. Nat. Genet., 41, 591–595.
9. Kim Y.J., Kim H.Y., Lee J.H., Yu S.J., Yoon J.H., Lee H.S., Kim C.Y., Cheong J.Y., Cho S.W., Park N.H. et al. (2013) A genome-wide association study identified new variants associated with the risk of chronic hepatitis B. Hum. Mol. Genet., 22, 4233–4238.
10. Mbarek H., Ochi H., Urabe Y., Kumar V., Kubo M., Hosono N., Takahashi A., Kamatani Y., Miki D., Abe H. et al. (2011) A genome-wide association study of chronic hepatitis B identified novel risk locus in a Japanese population. Hum. Mol. Genet., 20, 3884–3892.
11. Hu L., Zhai X., Liu J., Chu M., Pan S., Jiang J., Zhang Y., Wang H., Chen J., Shen H. et al. (2012) Genetic variants in human leukocyte antigen/DP-DQ influence both hepatitis B virus clearance and hepatocellular carcinoma development. Hepatology, 55, 1426–1431.
12. Lam Y.F., Wong D.K., Seto W.K., To K.K., Hung I.F., Fung J., Lai C.L., Yuen M.F. (2014) HLA-DP and gamma-interferon receptor-2 gene variants and their association with viral hepatitis activity in chronic hepatitis B infection. J. Gastroenterol. Hepatol., 29, 533–539.
13. Cheng L., Sun X., Tan S., Tan W., Dan Y., Zhou Y., Mao Q., Deng G. (2014) Effect of HLA-DP and IL28B gene polymorphisms on response to interferon treatment in hepatitis B e-antigen seropositive chronic hepatitis B patients. Hepatol. Res., 44, 1000–1007.
14. Hu X., Deutsch A.J., Lenz T.L., Onengut-Gumuscu S., Han B., Chen W.M., Howson J.M., Todd J.A., de Bakker P.I., Rich S.S. et al. (2015) Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat. Genet., 47, 898–905.
15. Gutierrez-Achury J., Zhernakova A., Pulit S.L., Trynka G., Hunt K.A., Romanos J., Raychaudhuri S., van Heel D.A., Wijmenga C., de Bakker P.I. (2015) Fine mapping in the MHC region accounts for 18% additional genetic risk for celiac disease. Nat. Genet., 47, 577–578.
16. Okada Y., Momozawa Y., Ashikawa K., Kanai M., Matsuda K., Kamatani Y., Takahashi A., Kubo M. (2015) Construction of a population-specific HLA imputation reference panel and its application to Graves’ disease risk in Japanese. Nat. Genet., 47, 798–802.
17. Lenz T.L., Deutsch A.J., Han B., Hu X., Okada Y., Eyre S., Knapp M., Zhernakova A., Huizinga T.W., Abecasis G. et al. (2015) Widespread non-additive and interaction effects within HLA loci modulate the risk of autoimmune diseases. Nat. Genet., 47, 1085–1090.
18. Diaz G., Amicosante M., Jaraquemada D., Butler R.H., Guillen M.V., Sanchez M., Nombela C., Arroyo J. (2003) Functional analysis of HLA-DP polymorphism: a crucial role for DPbeta residues 9, 11, 35, 55, 56, 69 and 84-87 in T cell allorecognition and peptide binding. Int. Immunol., 15, 565–576.
19. Wang J.H., Reinherz E.L. (2002) Structural basis of T cell recognition of peptides bound to MHC molecules. Mol. Immunol., 38, 1039–1049.
20. Wu S., Gorski J. (1997) Polymorphism at beta 85 and not beta 86 of HLA-DR1 is predominantly responsible for restricting the nature of the anchor side chain: implication for concerted effects of class II MHC polymorphism. Int. Immunol., 9, 1495–1502.
21. Skibola C.F., Berndt S.I., Vijai J., Conde L., Wang Z., Yeager M., de Bakker P.I., Birmann B.M., Vajdic C.M., Foo J.N. et al. (2014) Genome-wide association study identifies five susceptibility loci for follicular lymphoma outside the HLA region. Am. J. Hum. Genet., 95, 462–471.
22. Kober L., Zehe C., Bode J. (2013) Optimized signal peptides for the development of high expressing CHO cell lines. Biotechnol. Bioeng., 110, 1164–1173.
23. Kaminsky Z., Tochigi M., Jia P., Pal M., Mill J., Kwan A., Ioshikhes I., Vincent J.B., Kennedy J.L., Strauss J. et al. (2012) A multi-tissue analysis identifies HLA complex group 9 gene methylation differences in bipolar disorder. Mol. Psychiatry, 17, 728–740.
24. Kummee P., Tangkijvanich P., Poovorawan Y., Hirankarn N. (2007) Association of HLA-DRB1*13 and TNF-alpha gene polymorphisms with clearance of chronic hepatitis B infection and risk of hepatocellular carcinoma in Thai population. J. Viral. Hepat., 14, 841–848.
25. Ramezani A., Hasanjani Roshan M.R., Kalantar E., Eslamifar A., Banifazl M., Taeb J., Aghakhani A., Gachkar L., Velayati A.A. (2008) Association of human leukocyte antigen polymorphism with outcomes of hepatitis B virus infection. J. Gastroenterol. Hepatol., 23, 1716–1721.
26. Greer J.M., Pender M.P. (2005) The presence of glutamic acid at positions 71 or 74 in pocket 4 of the HLA-DRbeta1 chain is associated with the clinical course of multiple sclerosis. J. Neurol. Neurosurg. Psychiatry, 76, 656–662.
27. Jia X., Han B., Onengut-Gumuscu S., Chen W.M., Concannon P.J., Rich S.S., Raychaudhuri S., de Bakker P.I. (2013) Imputing amino acid polymorphisms in human leukocyte antigens. PLoS ONE, 8, e64683.
28. Okada Y., Kim K., Han B., Pillai N.E., Ong R.T., Saw W.Y., Luo M., Jiang L., Yin J., Bang S.Y. et al. (2014) Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations. Hum. Mol. Genet., 23, 6916–6926.
29. Pillai N.E., Okada Y., Saw W.Y., Ong R.T., Wang X., Tantoso E., Xu W., Peterson T.A., Bielawny T., Ali M. et al. (2014) Predicting HLA alleles from high-resolution SNP data in three Southeast Asian populations. Hum. Mol. Genet., 23, 4443–4451.
30. Lee S.H., Goddard M.E., Wray N.R., Visscher P.M. (2012) A better coefficient of determination for genetic profile analysis. Genet. Epidemiol., 36, 214–224.

## Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.