-
PDF
- Split View
-
Views
-
Cite
Cite
Tarunveer S Ahluwalia, Bram P Prins, Mohammadreza Abdollahi, Nicola J Armstrong, Stella Aslibekyan, Lisa Bain, Barbara Jefferis, Jens Baumert, Marian Beekman, Yoav Ben-Shlomo, Joshua C Bis, Braxton D Mitchell, Eco de Geus, Graciela E Delgado, Diana Marek, Joel Eriksson, Eero Kajantie, Stavroula Kanoni, John P Kemp, Chen Lu, Riccardo E Marioni, Stela McLachlan, Yuri Milaneschi, Ilja M Nolte, Alexandros M Petrelis, Eleonora Porcu, Maria Sabater-Lleal, Elnaz Naderi, Ilkka Seppälä, Tina Shah, Gaurav Singhal, Marie Standl, Alexander Teumer, Anbupalam Thalamuthu, Elisabeth Thiering, Stella Trompet, Christie M Ballantyne, Emelia J Benjamin, Juan P Casas, Catherine Toben, George Dedoussis, Joris Deelen, Peter Durda, Jorgen Engmann, Mary F Feitosa, Harald Grallert, Ann Hammarstedt, Sarah E Harris, Georg Homuth, Jouke-Jan Hottenga, Sirpa Jalkanen, Yalda Jamshidi, Magdalene C Jawahar, Tine Jess, Mika Kivimaki, Marcus E Kleber, Jari Lahti, Yongmei Liu, Pedro Marques-Vidal, Dan Mellström, Simon P Mooijaart, Martina Müller-Nurasyid, Brenda Penninx, Joana A Revez, Peter Rossing, Katri Räikkönen, Naveed Sattar, Hubert Scharnagl, Bengt Sennblad, Angela Silveira, Beate St Pourcain, Nicholas J Timpson, Julian Trollor, CHARGE Inflammation Working Group, Jenny van Dongen, Diana Van Heemst, Sophie Visvikis-Siest, Peter Vollenweider, Uwe Völker, Melanie Waldenberger, Gonneke Willemsen, Delilah Zabaneh, Richard W Morris, Donna K Arnett, Bernhard T Baune, Dorret I Boomsma, Yen-Pei C Chang, Ian J Deary, Panos Deloukas, Johan G Eriksson, David M Evans, Manuel A Ferreira, Tom Gaunt, Vilmundur Gudnason, Anders Hamsten, Joachim Heinrich, Aroon Hingorani, Steve E Humphries, J Wouter Jukema, Wolfgang Koenig, Meena Kumari, Zoltan Kutalik, Deborah A Lawlor, Terho Lehtimäki, Winfried März, Karen A Mather, Silvia Naitza, Matthias Nauck, Claes Ohlsson, Jackie F Price, Olli Raitakari, Ken Rice, Perminder S Sachdev, Eline Slagboom, Thorkild I A Sørensen, Tim Spector, David Stacey, Maria G Stathopoulou, Toshiko Tanaka, S Goya Wannamethee, Peter Whincup, Jerome I Rotter, Abbas Dehghan, Eric Boerwinkle, Bruce M Psaty, Harold Snieder, Behrooz Z Alizadeh, Genome-wide association study of circulating interleukin 6 levels identifies novel loci, Human Molecular Genetics, Volume 30, Issue 5, 1 March 2021, Pages 393–409, https://doi.org/10.1093/hmg/ddab023
- Share Icon Share
Abstract
Interleukin 6 (IL-6) is a multifunctional cytokine with both pro- and anti-inflammatory properties with a heritability estimate of up to 61%. The circulating levels of IL-6 in blood have been associated with an increased risk of complex disease pathogenesis. We conducted a two-staged, discovery and replication meta genome-wide association study (GWAS) of circulating serum IL-6 levels comprising up to 67 428 (ndiscovery = 52 654 and nreplication = 14 774) individuals of European ancestry. The inverse variance fixed effects based discovery meta-analysis, followed by replication led to the identification of two independent loci, IL1F10/IL1RN rs6734238 on chromosome (Chr) 2q14, (Pcombined = 1.8 × 10−11), HLA-DRB1/DRB5 rs660895 on Chr6p21 (Pcombined = 1.5 × 10−10) in the combined meta-analyses of all samples. We also replicated the IL6R rs4537545 locus on Chr1q21 (Pcombined = 1.2 × 10−122). Our study identifies novel loci for circulating IL-6 levels uncovering new immunological and inflammatory pathways that may influence IL-6 pathobiology.
Introduction
Interleukin 6 (IL-6) is a multifunctional cytokine, which is involved in a wide range of immunomodulatory processes, from cellular migration and adhesion to proliferation and maturation (1,2). Interleukins are involved in immune cell differentiation and activation (3). IL-6 is synthesized by a variety of different immune cells such as monocytes (4), B cells (5) and T cells (6) and also non-immune cells such as epithelial and smooth muscle cells (7), adipocytes (8), endothelial cells (9) and osteoblasts (10).
Several factors have been implicated in circulating IL-6 levels. We have previously demonstrated that IL-6 levels decrease with age in children and increase with age in adults (11). Also, increased levels of IL-6 have been observed in various diseases, not surprisingly in autoimmune diseases such as rheumatoid arthritis (12) and systemic juvenile idiopathic arthritis (13), but also cardio-metabolic diseases like type 2 diabetes (14), heart failure, coronary heart disease (15) and atherosclerosis (16), as well in cancers (17), atopic dermatitis (18) and psychological disorders like depression (19). Due to its implications in the pathogenesis of different disorders, Il-6 has been used as an appropriate choice for drug targeting and used as a monitoring biomarker of disease progression and response to treatments (20). The most illustrious IL-6 inhibitor is tocilizumab (21), a monoclonal antibody binding the IL-6 receptor, which is already in use for treating patients with allergic asthma (22), and immune system disorders like rheumatoid arthritis (23) and systemic juvenile idiopathic arthritis (24), with high efficacy with some initial benefits towards respiratory illnesses like COVID-19 (25).
IL-6 baseline levels are heritable with estimates from twin studies ranging between 15 and 61% (26–29). However, efforts to identify genetic variants associated with levels of IL-6 constituted relatively small-scale GWAS (30–33) or sequencing-based candidate gene association studies (34). To date, variants in the IL-6 receptor gene (IL-6R) and the gene encoding histo-blood group ABO system transferase (ABO) have been identified as statistically significant for an association to IL6-levels. Also, the genetic risk score constructed of IL-6 variants identified in the study by Shah and colleagues explained up to 2% of the variation in IL-6 levels (33), leaving a substantial part of its heritability unexplained. These seemingly sparse results and limited findings could be due to limitations in the study power caused by low sample size or a great inter-individual variability of IL-6 levels. One may speculate a substantial increase in the study size by increasing the number of participants, which would very likely lead to the identification of additional variants explaining IL-6 levels (35–37).
The current study is the (till date) largest meta GWAS study including 67 428 individuals of European ancestry to identify genetic variants explaining the levels of circulating IL-6 and to understand underlying genetic mechanisms implicated in the pathophysiology of this cytokine.
Results
A total of 52 654 individuals of European descent from 26 cohorts were included in the discovery GWAS meta-analysis with up to 2 454 025 autosomal SNPs passing quality control. Four cohorts (ALSPAC, MONICA/KORA, NTR and SardiNIA) identified genome-wide significant associations in the ABO region, whereas none of the other 22 cohorts did, either individually or combined. These cohorts conditioned their results on their relevant top-SNP in ABO, the results of which were included in the discovery meta-analyses. The overall genomic control inflation factor (λGC after correction) at the discovery stage meta-analysis was 1.0.
We identified 94 variants that were genome-wide significantly (Pdiscovery < 5.0 × 10−8; Supplementary Material, Table S1) associated with IL-6 levels, representing two independent genetic loci on chromosomes 1q21 and 6p21. Two common SNPs (rs4537545 and rs660895), one per locus, Chr. 1q21 (IL6R), and Chr. 6p21 (HLA-DRB1/HLA-DRB5), showed the most significant association with IL-6 levels (index SNPs) and the third SNP (rs6734238) mapped on Chr. 2q14 (IL1F10/IL1RN) locus showed suggestive (5.0 × 10−8 < Pdiscovery < 1.0 × 10−5) association in addition to 5 other loci (LHFPL3, LZTS1, GPC5/GPC6, USP32/APPBP2, STAU1; Supplementary Material, Table S2). Manhattan and QQ plots have been depicted in Figures 1A and 1B.
The minor alleles of IL6R rs4537545*T (β = 0.091; Pdiscovery = 8.39 × 10−85), IL1F10/IL1RN rs6734238*G (β = 0.025; Pdiscovery = 1.45 × 10−7) and HLA-DRB1/5 rs660895*G (β = 0.036; Pdiscovery = 1.80 × 10−9) associated with increased circulating IL-6 levels (Table 1). Two additional genome-wide significant SNPs in the IL1R locus, rs11265618 (β = 0.047; Pdiscovery = 1.21 × 10−15) and rs10796927 (β = 0.034; Pdiscovery = 1.24 × 10−11), in low LD (r2 < 0.25) with the lead SNP rs4537545 were carried forward for replication, and later conditional analysis as they seemed potential candidates as independent signals.
Novel and replicated loci associated with circulating IL-6 levels at P < 5.0 × 10−8 in the combined GWAS meta analyses
Chr . | Lead SNP (rsID) . | BP (Hg19) . | Effect/Other allele . | EAF . | Beta (SE) . | Pdiscovery . | Preplication . | Pcombined . | Annotation . | Nearest Genes . | Phet (I2) . |
---|---|---|---|---|---|---|---|---|---|---|---|
Novel Loci | |||||||||||
2q14 | rs6734238 | 113 841 030 | G/A | 0.42 | 0.025 (0.005) | 1.45 × 10−7 | 3.24 × 10−5 | 1.84 × 10−11 | Intergenic | IL1F10/IL1RN | 0.03 (32) |
6p21 | rs660895 | 32 577 380 | G/A | 0.19 | 0.036 (0.006) | 1.80 × 10−9 | 3.38 × 10−2 | 1.55 × 10−10 | Intergenic | HLA-DRB5/DRB1 | 0.08 (26) |
Replicated Known Locus | |||||||||||
1q21 | rs4537545 | 154 418 879 | T/C | 0.39 | 0.091 (0.005) | 8.39 × 10−85 | 7.88 × 10−37 | 1.20 × 10−122 | Intronic | IL6R | <0.01 (72) |
Chr . | Lead SNP (rsID) . | BP (Hg19) . | Effect/Other allele . | EAF . | Beta (SE) . | Pdiscovery . | Preplication . | Pcombined . | Annotation . | Nearest Genes . | Phet (I2) . |
---|---|---|---|---|---|---|---|---|---|---|---|
Novel Loci | |||||||||||
2q14 | rs6734238 | 113 841 030 | G/A | 0.42 | 0.025 (0.005) | 1.45 × 10−7 | 3.24 × 10−5 | 1.84 × 10−11 | Intergenic | IL1F10/IL1RN | 0.03 (32) |
6p21 | rs660895 | 32 577 380 | G/A | 0.19 | 0.036 (0.006) | 1.80 × 10−9 | 3.38 × 10−2 | 1.55 × 10−10 | Intergenic | HLA-DRB5/DRB1 | 0.08 (26) |
Replicated Known Locus | |||||||||||
1q21 | rs4537545 | 154 418 879 | T/C | 0.39 | 0.091 (0.005) | 8.39 × 10−85 | 7.88 × 10−37 | 1.20 × 10−122 | Intronic | IL6R | <0.01 (72) |
Index SNPs that reached P < 5 × 10–8 in the combined analysis from each locus are reported here. Sample sizes: discovery cohorts, n = 52 654; replication cohorts, n = 14 774; combined, n = 67 428. The effect sizes (β) in the discovery phase, given for the effect allele. EAF: effect allele frequency; effect sizes and standard error (SE) values are based on natural log transformed IL6 (pg/ml) levels. Phet: combined meta-analysis heterogeneity P value; I2: heterogeneity measure.
Novel and replicated loci associated with circulating IL-6 levels at P < 5.0 × 10−8 in the combined GWAS meta analyses
Chr . | Lead SNP (rsID) . | BP (Hg19) . | Effect/Other allele . | EAF . | Beta (SE) . | Pdiscovery . | Preplication . | Pcombined . | Annotation . | Nearest Genes . | Phet (I2) . |
---|---|---|---|---|---|---|---|---|---|---|---|
Novel Loci | |||||||||||
2q14 | rs6734238 | 113 841 030 | G/A | 0.42 | 0.025 (0.005) | 1.45 × 10−7 | 3.24 × 10−5 | 1.84 × 10−11 | Intergenic | IL1F10/IL1RN | 0.03 (32) |
6p21 | rs660895 | 32 577 380 | G/A | 0.19 | 0.036 (0.006) | 1.80 × 10−9 | 3.38 × 10−2 | 1.55 × 10−10 | Intergenic | HLA-DRB5/DRB1 | 0.08 (26) |
Replicated Known Locus | |||||||||||
1q21 | rs4537545 | 154 418 879 | T/C | 0.39 | 0.091 (0.005) | 8.39 × 10−85 | 7.88 × 10−37 | 1.20 × 10−122 | Intronic | IL6R | <0.01 (72) |
Chr . | Lead SNP (rsID) . | BP (Hg19) . | Effect/Other allele . | EAF . | Beta (SE) . | Pdiscovery . | Preplication . | Pcombined . | Annotation . | Nearest Genes . | Phet (I2) . |
---|---|---|---|---|---|---|---|---|---|---|---|
Novel Loci | |||||||||||
2q14 | rs6734238 | 113 841 030 | G/A | 0.42 | 0.025 (0.005) | 1.45 × 10−7 | 3.24 × 10−5 | 1.84 × 10−11 | Intergenic | IL1F10/IL1RN | 0.03 (32) |
6p21 | rs660895 | 32 577 380 | G/A | 0.19 | 0.036 (0.006) | 1.80 × 10−9 | 3.38 × 10−2 | 1.55 × 10−10 | Intergenic | HLA-DRB5/DRB1 | 0.08 (26) |
Replicated Known Locus | |||||||||||
1q21 | rs4537545 | 154 418 879 | T/C | 0.39 | 0.091 (0.005) | 8.39 × 10−85 | 7.88 × 10−37 | 1.20 × 10−122 | Intronic | IL6R | <0.01 (72) |
Index SNPs that reached P < 5 × 10–8 in the combined analysis from each locus are reported here. Sample sizes: discovery cohorts, n = 52 654; replication cohorts, n = 14 774; combined, n = 67 428. The effect sizes (β) in the discovery phase, given for the effect allele. EAF: effect allele frequency; effect sizes and standard error (SE) values are based on natural log transformed IL6 (pg/ml) levels. Phet: combined meta-analysis heterogeneity P value; I2: heterogeneity measure.
Overall, 12 SNPs spanning over 9 loci at a Pdiscovery < 1 × 10−5 in the discovery GWAS meta-analyses were selected for the replication stage (Supplementary Material, Table S2). This included the three index SNPs, two additional SNPs from the 1q21 locus (GWS but in low LD, r2 < 0.25 with index SNP) plus an additional set of seven statistically suggestive SNPs with a P-value of 5 × 10−8 < P < 1 × 10−5 in the discovery meta-analyses (either in low LD, r2 < 0.25 with the index SNP or independent loci). Additionally, 3 SNPs as negative controls and 3 SNPs in LD (r2 > 0.25) with the Chr.1 index SNP, to control for possible genotyping errors of index SNP across replication cohorts, were also added to the replication list, yielding 18 SNPs for replication stage (Supplementary Material, Table S3).
Three loci including Chr.1q21 IL6R, Chr.6p21 HLA-DRB1/5 and Chr.2q14 ILF10/IL1RN replicated at Preplication < 0.05, reaching GWS; 1q21 rs4537545, Pcombined = 1.20 × 10−122; 6p21 rs660895, Pcombined = 1.55 × 10−10; and 2q14 rs6734238, Pcombined = 1.84 × 10−11 in the combined meta-analyses (Table 1 and Supplementary Material, Table S3). Locus zoom plots available in Figures 1C, 1D, and 1E. The two additional signals at Chr.1q21 IL6R locus were replicated at Preplication = 1.7 × 10−4 for rs11265618 and P = 0.03 for rs10796927, reaching Pcombined = 2.5 × 10−9 and Pcombined = 4.1 × 10−13, respectively (Supplementary Material, Table S3). The conditional analysis confirmed that rs11265618 and rs10796927 SNPs were not independent from (Supplementary Material, Table S4) but were driven by the index rs4537545 SNP.
In both, discovery and replication association analyses, the effect directionality was generally consistent across individual studies for GWS variants, while there was some evidence of borderline heterogeneity in one of the two novel loci (I2 (P value) < 0.05) during the discovery and combined meta-analysis (Table 1, Fig. 2). The imputation quality scores (r2) for the GWS (index) SNPs for each cohort (discovery and replication) are available in Supplementary Material, Table S5. The other seven SNPs that showed suggestive association in the discovery stage, and expectedly the negative control SNPs did not reach GWS in the combined meta-analyses (Supplementary Material, Table S3).
The three GWAS index SNPs when combined explained approximately 1.06% of the variance in circulating levels of IL-6 using data from the NESDA cohort. The phenotypic variance explained by all the common variants was estimated to be 4.45% using the SumVg method (38).
Replication of other known/suggestive loci for IL-6
IL6R was the only IL6 known locus that we replicate at GWS. IL1RN and HLA-DRB1, our primary findings have been reported as suggestive loci (1 × 10−6 < P < 1 × 10−4) by Shah et al. while some known/suggestive IL6 loci (ABO, BUD13, TRIB3 and SEZ6L) did not replicate (Pdiscovery > 0.05) in the current study.
SNP functionality
We looked up SNPs in LD with the index SNPs from the immunologically associated loci including IL-6R, rs4537545, 1q21; IL1F10, IL1RN, rs6734238, 2q14, intergenic; and HLA-DRB1/DRB5, rs660895, 6p21, intergenic. The search for functional/missense variants in high LD (r2 > 0.8) with the lead SNPs led to the identification of only one nonsynonymous rs2228145 SNP in LD (r2 = 0.95) with the rs4537545 index SNP from the IL6R locus. We used the Combined Annotation-Dependent Depletion (CADD) database to identify the functionality, i.e. deleterious, disease causal, pathogenicity, of rs2228145 in IL6R. CADD is an integrative annotation based on multiple genomic features scored into a single metric (39). IL6R missense the rs2228145 variant has a CADD score of 15.98 (https://cadd.gs.washington.edu).
Associations with other traits and gene expression data
Genome-wide significant associations between IL6-associated top SNPs and other traits, and gene expression, data were mined using the Pheno Scanner v2 database (accessed, October 2020).
GWAS-based IL1F10/IL1RN rs6734238*G allele has been associated with increased levels of serum C-reactive protein (CRP) and decreased fibrinogen levels, and blood cell traits in recent GWAS reports (40,41) (PMID:27863252; Supplementary Material, Table S6).
HLA-DRB1/DRB5 rs660895*G allele is associated with increased risk of rheumatoid arthritis (RA) in Europeans and Asians (42), IgA nephropathy in Asians (43), while the decreased risk of ulcerative colitis and inflammatory bowel disease (IBD) (44). IL-6R rs4537545*T allele has been associated with increased circulating CRP levels (45), a decreased risk of RA (42) in mixed ancestries, while an increased risk of diabetes and asthma from the UK Biobank Neale’s lab rapid GWAS (See Web Resources; Supplementary Material, Table S6). IL6R rs4537545T* allele is also associated with C-reactive protein, allergic disease, rheumatoid arthritis and coronary artery disease (Supplementary Material, Table S6).
Gene expression
IL1F10/IL1RN rs6734238 is associated with IL1F10/IL1RN expression levels in the skin, peripheral blood and whole blood (P < 5.0 × 10−8; Supplementary Material, Table S7). HLA-DRB1/DRB5 rs660895 has been associated with HLA-DRB1/DRB5/DRB6/DQB1/DQB2 expression levels in multiple tissues including peripheral blood, whole blood, monocytes, adipose tissue, thyroid, tibial artery, coronary artery, heart, lung, brain, colon, skeletal muscle, tibial nerve, skin and lymphoblastoid cell lines (P < 5.0 × 10−8; Supplementary Material, Table S7).IL6R rs4537545 SNP is also associated with IL6R expression levels in peripheral and whole blood (P < 5.0 × 10−8; Supplementary Material, Table S7).
Power estimates
Based on power calculator and assumptions mentioned under methods section, the estimated power for the 2 novel index SNPs was 98.3% rs6734238 (effect allele frequency, EAF: 0.42), and 76.9% rs660895 (EAF: 0.19), respectively.
Discussion
We performed the largest (to date) GWAS meta-analysis for circulating IL-6 levels, which includes 66 341 individuals of European ancestry. We identified three loci associated with levels of circulating of IL-6 in the general population amongst which two are novel (Chr6p21, and Chr2q14), located in/nearby genes (HLA-DRB1 and IL1RN/IL-38) with inflammatory roles explaining up to 1.06% variance.
The strongest associated SNP, interleukin 6 receptor (IL-6R) rs4537545 at the 1q21 locus, is in high LD (r2 = 0.95) with a missense IL-6R SNP rs2228145 (D358A) that results in an amino acid substitution at position 358 (Asp → Ala) on the extracellular domain of IL-6R and a high CADD score suggesting that the variant is pathogenic or functional or deleterious (among top 10% variants of the genome). The missense SNP is known to impair the responsiveness of cells targeted by IL-6 (46) by reducing IL-6R expression on cell surfaces (47), and increasing levels of soluble IL-6R in individuals homozygous for this mutation (48,49). Recently it has been demonstrated that increased levels of sIL-6R induced by this variant can be explained by ectodomain shedding off IL-6R, a mechanism in which membrane-associated proteins are rapidly converted into soluble effectors whereby simultaneously cell surface expression of the same protein is reduced (50). Increased levels of sIL-6R may act as a counter-balance to limit exaggerated IL-6 signaling and may explain the protective effect of the 358A allele for various cardiovascular diseases including coronary artery disease (CAD) (51–53), atrial fibrillation (54), lung function in asthmatics (55) and abdominal aortic aneurysm (56) as well as RA (57). However, in contrast with this finding, the IL-6-sIL-6R complex itself is capable of transducing IL-6 signaling to non-IL-6R expressing cells, known as trans-signaling (58), and it is this mechanism, as opposed to classic signaling, that is linked to chronic inflammatory disorders including IBD and RA (59). Blocking IL-6 signaling cascades can be achieved by using an IL-6R specific inhibitor in the form of a monoclonal antibody, tocilizumab, which is a widely used therapy in the treatment of RA. Several variants in IL-6R, including rs2228145, may assist in the prediction of patient response to tocilizumab in RA (60). The rs4537545*T allele that is associated with IL6 levels is known to associate with increased circulating CRP levels (61) and a decreased risk of RA (42) in studies comprising mixed ancestries. Moreover, this SNP has been associated with IL6R expression in peripheral blood, skin, brain and adipose tissue (Supplementary Material, Table S7). The causal involvement of IL-6 levels in disease remains to be elucidated, but a recent study using a Mendelian randomisation (MR) approach did demonstrate that by using this SNP as instrumental variable, modelling the effects of tocilizumab, that IL-6R signalling has a causal effect on CAD (52). On the other hand, pleiotropic nature of the IL-6R locus, influencing IL-6, CRP and fibrinogen levels, prohibits instrumental variable analysis and attribution of causality to one particular intermediate. Finally, several other genes encompass the 1q21 locus, including Src homology 2 domain containing E (SHE), and Tudor domain containing 10 (TDRD10), but have been ruled out to play a role at this locus (33).
At the identified chromosome 2 locus the lead SNP, rs6734238, is intergenic and has also been associated with circulating CRP and fibrinogen levels (40,41,62). The nearest genes to this locus are the interleukin 1 family member 10 (IL1F10, distance = 7.6 kb, currently known as IL-38) and interleukin 1 receptor antagonist (IL1RN, distance = 34.4 kb). IL1F10/IL-38 and IL1RN variants (rs6759676 and rs4251961) in partial LD with the lead SNP (r2LD:0.10 and 0.61) have been recently reported to be protective against the development of insulin resistance (63). This further supports the molecular mechanisms behind IL-6-mediated insulin secretion via glucagon-like peptide 1 (GLP-1) (64) contributing to type 2 diabetes (T2D) pathophysiology. For IL-6 specifically, it has been found that synthesis increases when dendritic cells are stimulated by bacterial lipopolysaccharides (LPS) in the presence of IL1F10 (65). IL-1RN is another member of the interleukin 1 cytokine family, with suggestive evidence for involvement in determining IL-6 levels in the blood. One study found significant associations of IL-1RN rs4251961 with plasma CRP and IL-6 levels, albeit not independently replicated and not genome-wide significant (P = 1 × 10−4 and P = 0.004) (66). Our lead SNP was not in high LD (r2 < 0.8) with variants in either neighboring genes and therefore in conjunction with its intergenic position, identifying a causal variant in this locus remains non-trivial.
The 6p21 rs660895, which was identified, resides within the HLA region, which forms one of the most complex genomic regions to study due to its large LD blocks and sequence diversity. This region has some population substructure in Europeans, which may have influenced the results; however, (1) each cohort population substructure adjustment was applied, followed by genomic correction for overall discovery stage meta-analyses. Thus, we reduced the chances that the population substructure may have had on this locus. The nearest genes to the index SNP, HLA-DRB1 (distance = 19.8 kb) and HLA-DQA1 (distance = 27.8 kb) are both histocompatibility complex genes encoding proteins that form cell surface complexes for certain immune system cells helping in antigen presentation to trigger an immune response. It is noteworthy that variations at this locus code for antigen-presenting complexes (APCs), which have been previously associated with diseases having a dysfunctional immune system; while we report for the first time that there exists also a strong association of this locus with circulating cytokine levels. Therefore, the association of this locus with the disease may corroborate through its effect through IL6 levels. One high-LD SNP (rs9272422, r2 = 0.82 with our index SNP, rs660895) residing in the promoter region of HLA-DQA1 support this hypothesis and has been identified previously for systemic lupus erythematosus (67) and ulcerative colitis (68). rs660895*G allele is associated with increased risk of RA in Europeans and Asians (42), IgA nephropathy in Asians (43), while the decreased risk of ulcerative colitis and IBD. (44)
Various studies aimed to identify genetic variation underlying levels of IL-6 (22–26) have found genome-wide significant associations in the IL-6R and ABO genes. The study performed by Shah and colleagues (33) found suggestive evidence (non-GWS; P = 3.8 × 10−6, respectively) for additional loci, including ABO, BUD13, TRIB3 and SEZ6L, none of which replicate in the current study (Pdiscovery > 0.05) indicating that these might be false-positive findings due to low sample size (~7800) or loci with sex-specific effects (associations based on women dominant population) or due to technical shortcomings with measurement assay (ABO locus).
It is surprising that even with increased statistical power (ndiscovery = 52 654; nreplication = 14 774) in the current study (compared to the previous IL6 GWAS) (33), we could identify three genetic loci (1q21, 2q14 and 6p21) accounting for ~1% of the genetic variance for circulating IL-6 levels. According to the current estimates, the heritability levels for IL6 levels range between 15 and 61%, suggesting that an enormous increase in sample sizes would be required to identify additional variants explaining this remaining heritability. Multiple explanations for this so-called missing heritability phenomenon have been proposed in the past, which can be sought in rare or low frequency coding variants as observed for a similar metabolic quantitative trait by us (69) or can be explained by non-additive effects, which may cause inflated estimates of heritability. Plausible evidence for other sources of unexplained heritability that have been found are epigenetic changes, and haplotypes of common SNPs.
Collectively, our results provided additional insights into the biology of circulating IL-6. We identify new loci, limited by common variants in the Hap Map Reference panel. Albeit this is comparable to the 1000 genomes reference panel (70) but narrower compared to some newly available panels that show greater variant coverage in numbers and frequency range. Future studies are recommended to aim for identification of additional common but also rare variants, by firstly using richer imputation panels, such as UK10K project or the Haplotype Reference Consortium, a strategy that holds great promise, and secondly by making use of genetically isolated populations. Thirdly, we would like to stress the importance of phenotype harmonization. As we identified genome-wide variants in the ABO locus, in four studies participating in the discovery, but not in the remaining 22 cohorts, there is a strong indication that this locus may be assay specific. However, a proper demonstration of this hypothesis would require further testing, including repeating the GWAS in ABO-positive cohorts using a different IL-6 assay. Indeed it is emerging that the ABO locus has pleiotropic effects on many different traits and diseases (71), which would suggest a more thorough analysis before disregarding this signal. Also, conventionally increasing sample sizes without correction for population substructures may raise heterogeneity within populations (72), likely concealing the SNPs that affect particular subgroups. Future specific studies should counter the widely held assumption of unconditional risk alleles of complex traits and focus on the importance of studying more homogenous subgroups to, for example, investigate the age-dependent effect of genetic variants (73,74). Here, while further exploring the pleiotropic effect of IL-6-related variants, we identified phenotypes differentially regulated by diverse variants in the 1q21 locus. Biologic systems are dynamic complex networks and are evolving through lifespan and investigating the interrelationships existing between phenotypes as well as between genetic variations and phenotypic variations has the potential for uncovering the complex mechanisms. This is the case here for IL-6 and tailored methodologies should be devoted to the study of such traits, hopefully resulting in clinically significant breakthroughs. Future collaborative efforts therefore should strive to use well-calibrated assays, z-standardized protocols for sample handling, and processing (75), though this will be difficult to achieve in practice. Lastly, we have attempted to perform formal association-based causal analysis to identify the likely causal loci, using the DEPICT approach; unfortunately, instead with only 2 novel GWS findings, our analyses were underpowered and thus not included. We also mine the gene expression and eQTL data for the identified SNPs using established databases; however, we were unable control for random co-localization signals or other confounders as we had limited access to summary level data. In conclusion, we identify two novel common genetic variants associated with circulating IL-6 levels that may influence the pathophysiology of complex cardio-metabolic, psychiatric and immunological traits, among individuals of European ancestry. This is a step further towards unravelling new biological pathways and potential therapeutic targets that can be developed for the IL-6-related disorders, while suggesting looking deeper into the genome for coding variants (rare and common) having larger individual effects (Figs 1 and 2).

Manhattan, QQ and LocusZoom plots of the discovery GWAS meta-analyses. (A) Manhattan plot showing the association of SNPs with IL-6. Loci coloured in red or blue, three in total, represent those for which the lead SNPs reached genome-wide significance (P = 5 × 10−8). Horizontal axis: relative genomic position of variants on the genome, vertical axis: −log10 P-value of each SNP; (B) Quantile–quantile plot for P-values obtained from the meta-analysis. The horizontal and vertical axes represents the expected distribution of −log10 (P-values) under the null hypothesis of no association, whereas the vertical axis shows the observed −log10 (P-values). The blue dashed line represents the null, and λGC value represents the genomic inflation factor lambda. Each data point represents the observed versus the expected P-value of a variant included in the association analyses; (C–E) Regional association plots for each of the three genome-wide significant loci, 1q21, 2q14 and 6p21, respectively. Pairwise LD (r2) with the lead SNP is indicated following a color-coded scale. Horizontal axis: relative genomic position of variants within the locus, vertical axis: −log10 P-value of each SNP.
Combined discovery and replication forest plots for the GWAS Index SNPs. Forrest plots for (A) IL6R rs4537545 (chr. 1q21), (B) IL1RN rs6734238 (chr. 2q14), (C) HLA-DRB5 rs660895 (chr. 6p21) with discovery, replication and combined effect estimates, 95% CI and weights based on the fixed effects inverse variance meta-analyses.
Material and Methods
Discovery stage
Study populations
The overall study design (Supplementary Material, Fig. S1) involved the discovery cohorts with 53 893 individuals. After overlapping individuals with available genotype and phenotype data, the discovery stage included 52 654 individuals from 26 cohorts of European ancestry listed under Supplementary Material, Table S8, described in Supplementary Text S1 and study summary characteristics in Supplementary Material, Table S9. Only population-based samples or healthy controls from case–control studies were included in the final analyses.
Serum IL-6 measurements
Each study typically collected venous blood samples stored below −80°C until the time of measurement using various types of immunoassays and expressed as pg/ml as presented in Supplementary Material, Table S10. The trait transformation and phenotype data quality control (QC) were presented by Supplementary Material, Text S3 (Supplementary Material, Text S3.1 and S3.2). In brief, participating cohorts have checked for the percentage of missingness in IL6 measurements and evaluated for indices of QC (Supplementary Material, Text S3.2), yielded the final number of participants with available validated IL6 levels, of whom those with available genotype data were included in the study as characterized in Supplementary Material, Table S9 and in Supplementary Material, Text S1 and S2.
Genotyping and imputation
Each participating cohort performed genome-wide genotyping using a variety of genotyping platforms and applied a predefined quality control (QC) of genotype data (Supplementary Material, Table S11) followed by performing imputation of non-genotyped genetic variants, on the backbone of haplotypes inferred from the Hapmap Phase II reference panel (NCBI Build 36), and using statistical software such as IMPUTE (75), MACH, Minimac (76) or BIMBAM (77) (Supplementary Material, Table S11). Each cohort was recommended a set of general SNP quality filters including MAF < 0.01; Hardy Wienberg Equilibrium (HWE) P ≤ 10−6; imputation quality r2 ≤ 0.3; and genotyping call rate <0.95 (Supplementary Material, Fig. S1). Once we received summary results from each participating study, we ran a series of QC checks. Firstly, these included a set standard checks, including the imputation quality filters (basis the imputation program used and/or r2imputation < 0.3 were excluded), and then checks for genomic inflation (quantile–quantile or QQ plots). We adapted filter thresholds per cohort to reduce any observed deviation from the null while missing SNP loss due to the QC process. Finally, ~2.45 million (2 454 025) common SNPs were part of the discovery meta-analysis.
Statistical methods
GWAS analysis
Each study conducted an independent GWAS analysis between SNPs and natural log-transformed values of serum IL-6 levels following a predefined analysis plan (Supplementary Material, Methods S4). Association analyses were conducted using linear regression model, or linear mixed effect models to account for familial correlation when warranted, with additive genetic effects, accounting for imputation uncertainty while adjusting for age, sex, population substructure (through study-specific principal components) and/or study-specific site, when necessary. GWAS summary result obtained from each cohort underwent a series of QC checks using the QCGWAS package in R (78) (Supplementary Material, Text S3 and Supplementary Material, Fig. S1). Being aware of the potential false-positive association in the ABO region on chromosome 9 (28,30), while using an R&D systems high-sensitivity assay kit to measure IL6 levels (R&D systems, Minneapolis, MN, USA), four (out of 22) discovery cohorts that observed genome-wide significant results in the ABO locus were asked to rerun the GWAS analysis conditional on the top ABO SNP (i.e. rs8176704) before including them in the final discovery meta-analysis (Supplementary Material, Text S3.3).
Discovery GWAS meta-analyses
Individual GWAS results from 26 European studies were meta-analyzed using the inverse variance weighted, fixed-effects method as implemented in GWAMA while applying the double genomic control (GC) correction for population stratification, i.e. first to each study individually and subsequently also to the pooled results after meta-analysis (79).
Regional association plots for the discovered loci were generated through the LocusZoom (78) tool. We used the SNAP tool (80) to perform the pairwise LD checks (HapMap release 22 data) and to verify low LD with secondary signals. All SNPs selected for the replication stage had to fulfill the following criteria: (1) having an association Pdiscovery ≤ 1 × 10−5 and being in very low LD with the index SNP (r2 < 0.2) and (2) available in at least 50% of study cohorts.
Replication and combined meta-analysis
Study population, phenotyping and QC
The overall study (Supplementary Material, Fig. S1) comprised 15 785 individuals for replication. After removing individuals with missing data, the replication analyses were performed using a combination of in silico and de novo genotyping in 14 774 individuals from 12 cohorts of European ancestry as described in Supplementary Material, Text S2. Similar QC (Supplementary Material, Text S3 and Supplementary Material, Table S11) and statistical checks were made as in the discovery stage.
Venous blood samples (serum or plasma) were collected and stored at −80°C. Serum/plasma IL-6 levels (pg/ml) were measured using various immunoassay methods described in Supplementary Material, Table S10. Each cohort tested the selected SNPs using the same statistical model as for the discovery association analyses (Supplementary Material, Fig. S1). Effect size estimates of all replication SNPs from each replication study were compared with the effect size estimates from the discovery meta-analyses. When effect sizes from individual cohorts did not align, we excluded these cohorts from the replication meta-analyses (ncohorts = 3). To account for the inter-study assay differences insensitivity, we combined results across the replication studies using a fixed effect sample size weighted Z-score meta-analysis as implemented in the METAL package (https://genome.sph.umich.edu/wiki/METAL) (81).
The summary GWAS meta-analyses result from the discovery and replication stages were then used to perform a combined (discovery + replication) GWAS meta-analysis using a sample size weighted Z-score method. Test for heterogeneity was also performed as part of the meta-analysis package using METAL where I2 statistic denoting the percentage of variation across studies was estimated (I2 = 100% × (Q − df)/Q) where Q is the Chi-Square statistic. Significance for heterogeneity was denoted by the heterogeneity (or HetP) P values. Variants that were significant in the replication meta-analysis at P < 0.05 and had an overall Pcombined < 5 × 10−8 in the combined meta-analysis were considered statistically GWS. SNPs within the range of 1 Mb (or 106 bases) on either side of the most significant (i.e. index) SNP (with LD, r2 > 0.25) were considered part of the same locus, whereas those in low LD (r2 < 0.25) were tested if they were independent using conditional analysis.
Conditional analysis
We performed an approximate joint conditional analysis to identify distinct signals in a specific chromosomal region as implemented in GCTA (82) using high-quality genome-wide genotyped/imputed reference data from two studies (NEtherlands Study of Depression and Anxiety (NESDA) from the Netherlands and/or Genetics of Obesity in Young Adults (GOYA) from Denmark) to estimate linkage disequilibrium (LD) (83) between SNPs.
Conditional analysis for identification of independent signals was performed on GWS SNPs ( ±1 Mb to the index SNP and having low LD, r2 < 0.2 with the index SNP) using summary statistics from the discovery GWAS meta-analysis data (—COJO option in GCTA) after confirming the GWS loci from the combined meta-analysis.
Heritability estimates
SNP mapping and functionality
We searched for variants in high LD (r2 > 0.8) within a 1 Mb region on either side of the lead SNPs using 1000 Genomes sequence data (Phase1 Integrated Release, Version 3, 2012.04.30), utilizing tools available in Liftover (85), VCFtools (86) and clumping in PLINK (84). We subsequently annotated these variants using ANNOVAR (87) with the RefSeq (88) database for variant function and genic residence or distance. We used the Combined Annotation-Dependent Depletion (CADD) database to identify the functionality, i.e. deleterious, disease causal, pathogenicity, for the index SNPs.
Associations with other traits and gene expression data
PhenoScanner v2 (89) data mining tool was used (Access date October 2020) to identify existing GWS (at P < 5 × 10−8) associations between IL6 identified SNPs and other traits, and gene expression/eQTLs data.
Power calculation
We used GWAs power estimator (see Web Resources) by assuming a relative risk of 1.10 (or an effect estimate of 0.10), given N = 66 000, alpha (P-value) = 5 × 10−8 (also GCTA power calculator, Supplementary Text S3.5).
Web Resources
QCGWAS, https://cran.r-project.org/web/packages/QCGWAS/index.html
GWAMA, http://www.well.ox.ac.uk/gwama/
METAL, http://csg.sph.umich.edu//abecasis/metal/
GCTA, http://www.complextraitgenomics.com/software/gcta/
LocusZoom, http://csg.sph.umich.edu/locuszoom/
1000 Genomes, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/
PLINK, http://pngu.mgh.harvard.edu/purcell/plink/
VCFtools, http://vcftools.sourceforge.net/
ANNOVAR, http://www.openbioinformatics.org/annovar/
PhenoScanner, http://www.phenoscanner.medschl.cam.ac.uk/phenoscanner
Power calculations: http://csg.sph.umich.edu/abecasis/cats/gas_power_calculator/index.html
The UK Biobank Neale’s lab rapid GWAS: (http://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank).
Authors Contributions
The guarantors of the study are T.S.A., B.P. and B.Z.A., from conception and design to conduct of the study and acquisition of data, data analysis and interpretation of data. T.S.A. and B.P. have written the first draft of the manuscript. All co-authors have provided important intellectual input and contributed considerably to the analyses and interpretation of the data. All authors guarantee that the accuracy and integrity of any part of the work have been appropriately investigated and resolved and all have approved the final version of the manuscript. BP had full access to the data and the corresponding authors T.S.A. and B.Z.A. had the final responsibility for the decision to submit for publication.
Acknowledgments
We acknowledge the participants of this study and the funding resources that have contributed to achieving this effort. Detailed acknowledgments can be found in Supplementary Material, Table S12.
Conflict of Interest. There is no conflict of interest from any of the participating co-authors. T.S.A. is a shareholder with Novo Nordisk A/S and Zealand Pharma A/S.
Funding
Novo Nordisk Foundation (NNF18OC0052457 to T.S.A.). Detailed Funding statements from participating study cohorts can be found in Supplementary Material, Table S12.