Genome-Wide Association Study of Microscopic Colitis in the UK Biobank Confirms Immune-Related Pathogenesis

Abstract
Background and Aims: The causes of microscopic colitis are currently poorly understood. Previous reports have found clinical associations with coeliac disease and genetic associations at the human leukocyte antigen [HLA] locus on the ancestral 8.1 haplotype. We investigated pharmacological and genetic factors associated with microscopic colitis in the UK Biobank.
Methods: In total, 483 European UK Biobank participants were identified by ICD10 coding, and a genome-wide association study was performed using BOLT-LMM, with a sensitivity analysis performed excluding potential confounders. The HLA*IMP:02 algorithm was used to estimate allele frequency at 11 classical HLA genes, and downstream analysis was performed using FUMA. Genetic overlap with inflammatory bowel disease [Crohn’s disease and ulcerative colitis] was investigated using genetic risk scores.
Results: We found significant phenotypic associations with smoking status, coeliac disease and the use of proton-pump inhibitors but not with other commonly reported pharmacological risk factors. Using the largest sample size to date, we confirmed a recently reported association with the MHC Ancestral 8.1 Haplotype. Downstream analysis suggests association with digestive tract morphogenesis. By calculating genetic risk scores, we also report suggestive evidence of shared genetic risk with Crohn’s disease, but not with ulcerative colitis.
Conclusions: This report confirms the role of genetic determinants in the HLA in the pathogenesis of microscopic colitis. The genetic overlap with Crohn’s disease suggests a common underlying mechanism of disease.


Introduction
Microscopic colitis includes two related inflammatory bowel disorders, lymphocytic colitis and collagenous colitis, which have a combined prevalence of 103 cases per 100 000 population. 1 Both disorders cause chronic watery non-bloody diarrhoea and incontinence, and are associated with normal endoscopic appearances and characteristic histological features. The primary histological feature of lymphocytic colitis is patchy lymphocytic infiltration of the epithelium with preserved crypt architecture. Collagenous colitis is characterized by the presence of a thickened subepithelial collagen layer.
The pathogenesis of microscopic colitis is poorly elucidated: it reportedly involves immune responses to luminal factors in genetically predisposed individuals. 2 A recent association study based on Immunochip data reported an association between human leukocyte antigen [HLA] alleles on the 8.1 haplotype and collagenous colitis 3 but not lymphocytic colitis 4 in cohorts that comprised 314 patients with collagenous colitis, 122 patients with lymphocytic colitis and 4299 controls. Furthermore, Westerlind et al. report genetic overlap with inflammatory bowel disease [IBD] by comparing the number of nominally significant single nucleotide polymorphisms [SNPs] in both phenotypes. 3 The most frequently cited environmental risk factors are medications, with non-steroidal anti-inflammatory drugs [NSAIDs], proton-pump inhibitors [PPIs] and selective serotonin reuptake inhibitors [SSRIs] most commonly implicated. [5][6][7] There have been no reported investigations of potential pharmacogenetic risk factors for microscopic colitis.
We sought to identify phenotypic and genetic associations with microscopic colitis in individuals of European ancestry enrolled in the UK Biobank, and report subsequent downstream analysis. Subsequently, we stratified the data by drug use and performed a genome-wide association study [GWAS] to identify pharmacogenetic associations. Finally, we calculated genetic risk scores for IBD to quantify in detail genetic overlap with microscopic colitis.

Participants
The UK Biobank is a population-based prospective study comprising more than 500 000 UK participants aged 40-69 years at time of recruitment between 2006 and 2010. Participants are actively followed, and phenotypic data collected include demographics, medical conditions, medications, lifestyle and anthropometric measurements. SNP genotypes were generated from the Affymetrix Axiom UK Biobank array [∼450 000 individuals] and the UK BiLEVE array [∼50 000 individuals]. More detail on the UK Biobank can be found elsewhere. 8 We defined microscopic colitis by the ICD10 code K52.8.

Statistical Methods
We performed tests for association with clinical characteristics using the Mann-Whitney U test for continuous data and Fisher's exact test for categorical data. Probability (p) values are reported uncorrected for multiple testing, but we used an adjusted p-value threshold of 0.0045 for statistical significance for the clinical data.
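As an illustration of the categorical test named above, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table, compared against the adjusted threshold of 0.0045. The function name and the example counts are hypothetical and are not taken from the study; the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is assumed rather than confirmed by the text.

```python
from math import comb

ALPHA_ADJUSTED = 0.0045  # adjusted significance threshold stated in the text


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed table.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts (NOT study data): rows are cases/controls,
# columns are exposed/unexposed.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(p_value, p_value < ALPHA_ADJUSTED)
```

In practice a statistics package would normally be used for this test; the sketch only makes the calculation explicit.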
Quality control of the genotype data was performed centrally by the UK Biobank. 9 For the GWAS, we used ~12.0 million Haplotype Reference Consortium [HRC] imputed variants with an imputation r² ≥ 0.9, minor allele frequency [MAF] ≥ 0.025 [2.5%] and a Hardy-Weinberg equilibrium p > 1 × 10⁻¹². Furthermore, individuals with IBD or coeliac disease were excluded from all genetic analyses, leaving 423 cases and 445 232 controls.
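The variant inclusion criteria above can be expressed as a simple filter. This is a sketch only: the `Variant` record, its field names and the example variants are hypothetical, and the thresholds are the three stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary record (field names are assumptions)."""
    rsid: str
    info_r2: float  # imputation r^2
    maf: float      # minor allele frequency
    hwe_p: float    # Hardy-Weinberg equilibrium p-value


def passes_qc(v, min_r2=0.9, min_maf=0.025, min_hwe_p=1e-12):
    """Apply the thresholds from the text: r^2 >= 0.9, MAF >= 0.025, HWE p > 1e-12."""
    return v.info_r2 >= min_r2 and v.maf >= min_maf and v.hwe_p > min_hwe_p


# Made-up variants for illustration only.
variants = [
    Variant("rs_example_1", info_r2=0.98, maf=0.24, hwe_p=0.31),
    Variant("rs_example_2", info_r2=0.85, maf=0.10, hwe_p=0.50),   # fails r^2
    Variant("rs_example_3", info_r2=0.95, maf=0.01, hwe_p=0.50),   # fails MAF
    Variant("rs_example_4", info_r2=0.95, maf=0.30, hwe_p=1e-15),  # fails HWE
]
kept = [v.rsid for v in variants if passes_qc(v)]
print(kept)
```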
We performed our main association test using BOLT-LMM v2.3, 10 which applies a linear mixed model [LMM] to adjust for the effects of population structure and individual relatedness. This allowed us to include all related individuals in our white European subset, rather than reducing the sample size to only the unrelated individuals [379 768]. Covariates included were age, sex, recruiting centre and genotyping chip. A more detailed explanation of the GWAS methodology can be found in our recent publication, 11 and a principal components plot is available in Supplementary Figure 1. Odds ratios from BOLT-LMM were calculated as $\mathrm{OR} = \exp\left(\beta / (\mu (1 - \mu))\right)$, where $\mu$ is the case fraction, and standard errors were divided by $\mu (1 - \mu)$ to give confidence intervals. Following the GWAS, FUMA's SNP2GENE analysis was used to convert SNP data into genomic loci, and to perform a gene-set enrichment analysis using MAGMA. 12
As a sensitivity analysis we also performed a secondary GWAS on a more refined phenotype using only white British unrelated individuals, in which we excluded coeliac and IBD participants [defined by ICD10, ICD9 and self-report] and participants using PPIs. Controls were further refined by excluding those with a diagnosis [self-reported or in Hospital Episode Statistics data] before the age of 70 years of coronary heart disease, stroke, diabetes, chronic obstructive pulmonary disease, renal failure or any cancer [excluding non-melanoma skin cancer], and those who had died from any cause before the age of 70 years. To further limit the possibility of type 1 errors, this GWAS was performed using Fisher's exact test. This analysis used 335 cases and 64 300 controls [253 cases and 49 608 controls following filtering of related individuals].
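The conversion from a BOLT-LMM effect size to an odds ratio described above can be sketched as follows. The function name and the example β and standard error are hypothetical; the case and control counts are those given for the main genetic analysis, and the 1.96 multiplier for a 95% interval is an assumption about how the confidence intervals were formed.

```python
from math import exp


def bolt_beta_to_or(beta, se, n_cases, n_controls, z=1.96):
    """Convert a linear-scale beta and SE to an odds ratio with a confidence interval.

    Both beta and SE are divided by mu * (1 - mu), where mu is the case
    fraction, to give an approximate log odds ratio and its standard error.
    """
    mu = n_cases / (n_cases + n_controls)
    scale = mu * (1 - mu)
    log_or = beta / scale
    log_se = se / scale
    return exp(log_or), exp(log_or - z * log_se), exp(log_or + z * log_se)


# Hypothetical beta and SE (NOT values from the study), with the case and
# control counts reported for the main genetic analysis.
odds_ratio, ci_low, ci_high = bolt_beta_to_or(
    beta=-0.0004, se=0.00007, n_cases=423, n_controls=445232
)
print(odds_ratio, ci_low, ci_high)
```

Because the case fraction here is below 0.1%, the scaling factor is very small, so a linear-scale β that looks negligible corresponds to a substantial odds ratio.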
Imputation of HLA alleles was performed using the HLA*IMP:02 algorithm to estimate allele frequency at 11 classical HLA genes: HLA-A, -B, -C, -DRB5, -DRB4, -DRB3, -DRB1, -DQB1, -DQA1, -DPB1 and -DPA1 using reference panels described by Motyer et al. 13 This procedure provides accurate dosages for 362 SNPs in the HLA region, allowing identification of the genetic basis of autoimmune processes with greater precision. 14
Genetic risk scores for ulcerative colitis [UC], Crohn's disease [CD] and IBD were calculated using odds ratios [ORs] for previously published SNPs. 15 From that study, using only SNPs that were genome-wide significant at 5 × 10⁻⁸ and also present in our imputation panel, we have 145 SNPs for CD, 89 for UC and 162 for IBD. To quantify genetic overlap, for each IBD phenotype, we compared the mean genetic risk score [GRS] among the microscopic colitis patients with that among controls [defined as not having UC, CD, IBD or microscopic colitis]. A GRS containing N SNPs was calculated as $\mathrm{GRS} = \sum_{i=1}^{N} \beta_i d_i$, where $\beta_i$ is the β-coefficient [log OR] representing the association between SNP i and the relevant phenotype and $d_i$ is the estimated dosage. Statistical associations were quantified using a two-tailed t-test.
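A minimal sketch of the risk score comparison follows, assuming the GRS is the sum over SNPs of the log odds ratio multiplied by the estimated dosage. The text specifies a two-tailed t-test but not its variant, so the unequal-variance [Welch] statistic and the normal approximation to the p-value used here are assumptions, as are all function names and example values.

```python
from math import log, sqrt
from statistics import NormalDist, mean, variance


def genetic_risk_score(log_odds, dosages):
    """Sum of beta_i * d_i over SNPs (beta_i = log OR, d_i = allele dosage)."""
    if len(log_odds) != len(dosages):
        raise ValueError("log_odds and dosages must have the same length")
    return sum(b * d for b, d in zip(log_odds, dosages))


def welch_t(a, b):
    """Unequal-variance t statistic comparing the means of two samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def two_tailed_p_normal_approx(t):
    """Two-tailed p-value using a normal approximation (large samples only)."""
    return 2 * NormalDist().cdf(-abs(t))


# Hypothetical odds ratios and dosages for three SNPs (NOT study data).
betas = [log(1.20), log(0.90), log(1.10)]
case_scores = [genetic_risk_score(betas, d) for d in ([2, 1, 1], [1, 0, 2], [2, 0, 1])]
control_scores = [genetic_risk_score(betas, d) for d in ([1, 1, 0], [0, 2, 1], [1, 1, 1])]
t_stat = welch_t(case_scores, control_scores)
print(t_stat, two_tailed_p_normal_approx(t_stat))
```

With only three individuals per group the normal approximation is not appropriate; it is shown because the study's group sizes are large enough for the t and normal distributions to be nearly indistinguishable.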

Clinical Associations
Demographic and drug associations are shown in Table 1. Microscopic colitis patients were more likely to be older, female and a current smoker. White European UK Biobank participants with microscopic colitis had an eight times higher risk of coeliac disease and 12 times higher risk of IBD than controls. We found no evidence of association with body mass index [BMI] or socioeconomic status [defined by the Townsend Deprivation Index].
In terms of drug associations, patients with microscopic colitis were twice as likely to be using PPIs [a sub-analysis suggests 1.95 for omeprazole, 2.00 for lansoprazole]. We found no significant associations for SSRIs, NSAIDs or statins when accounting for multiple testing. We also performed a sub-analysis of NSAIDs, stratifying by COX1 and COX2 inhibitors, but found no significant associations.
All clinical associations were robust to adjustment for age and sex in a multivariable logistic regression model, and to testing with or without the related individuals.

GWAS Results
Figure 1 shows a Manhattan plot of all SNPs that passed quality control and had p < 0.01. We found a strong association for microscopic colitis on chromosome 6, in the HLA region, with lead SNP rs2596560 [OR 0.64, 95% confidence interval 0.56-0.72, p = 3 × 10⁻⁸, MAF 0.24 vs 0.33]. The odds ratio remained the same at the lead SNP [OR 0.64, 95% confidence interval 0.53-0.77, p = 3.3 × 10⁻⁶] when we performed our secondary analysis, on the refined phenotype using Fisher's exact test.
Figure 1. Manhattan plot. 16 The horizontal dashed line is the genome-wide significance threshold at p = 5 × 10⁻⁸. The strongly significant signal on chromosome 6 lies in the HLA region, with lead SNP rs2596560, p = 3 × 10⁻⁸.
Summary statistics for all genomic risk loci with p < 10⁻⁵ are available in Supplementary Table 2, and the full GWAS summary statistics can be found at https://www.ebi.ac.uk/gwas/home. In Supplementary Figure 2 we present a QQ plot. The genomic inflation factor λ was 1.0000. We also include the top ten results of a MAGMA Gene-set Analysis in Supplementary Table 3, in which we find a Bonferroni-significant association for digestive tract morphogenesis. 17 HLA imputation demonstrated that the SNP rs2596560 associates with the class I and II alleles that comprise the ancestral major histocompatibility [MHC] 8.1 haplotype previously linked to microscopic colitis, with the lead allele, B_0801, passing genome-wide significance. Results from BOLT-LMM runs on the HLA imputed alleles are given in Table 2, with alleles on the MHC 8.1 haplotype highlighted in bold.
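For readers unfamiliar with the genomic inflation factor, one common definition is the median observed χ² statistic [1 degree of freedom] divided by its expected median under the null, approximately 0.455. The sketch below assumes that definition; the text does not state how λ was computed, and the function name and input p-values are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic over its null expectation.

    Each two-sided p-value (0 < p < 1) is mapped back to a chi-square
    statistic as the squared standard-normal quantile at p / 2.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated test, so lambda is about 1.
uniform_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(genomic_inflation(uniform_p))
```

Values well above 1 would indicate inflation of test statistics, for example from uncorrected population structure.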
To follow up on the phenotypic association with PPIs, we sought to identify pharmacogenetic associations by using PPIs as an inclusion criterion [comparing PPI users with microscopic colitis against PPI users without microscopic colitis]. However, this GWAS yielded no genome-wide significant associations.

Genetic Overlap with IBD
Table 3 shows the mean CD, UC and IBD genetic risk scores and 95% confidence intervals for microscopic colitis patients and controls. Microscopic colitis patients had a higher mean genetic risk score for all three phenotypes, but only the differences for CD [p = 0.035] and IBD [p = 0.019] were significant at the 5% significance level, suggesting some shared genetic pathway behind microscopic colitis and CD/IBD. These results were robust to applying the same tests on only the unrelated individuals in the UK Biobank as a sensitivity analysis. In Table 4 we show which of the known risk loci replicate at p < 0.05 for microscopic colitis, although none of these are significant after Bonferroni correction of the p value threshold.

Discussion
We have conducted a clinical and genetic case-control study of microscopic colitis in the UK Biobank. We confirm previously reported phenotypic associations with age, sex, coeliac disease, smoking status and PPIs. In a GWAS, we have confirmed recent reports of association with SNPs on the MHC 8.1 haplotype, indicating an immune component to the pathogenesis of microscopic colitis. Using genetic risk scores, we obtain results consistent with overlap in genetic risk factors for CD and IBD but not UC. This may suggest shared genetic pathways between these phenotypes. This is the largest GWAS of microscopic colitis to date, and the first to look at genetic overlap between microscopic colitis and IBD by using a genetic risk score.
The main limitation of the study is that the ICD10 coding in the UK Biobank only covers the first decimal place: K52.8. We acknowledge two limitations as a consequence: this code also includes eosinophilic gastritis and colitis, and we are unable to distinguish lymphocytic from collagenous colitis. However, eosinophilic gastritis and colitis are of very low prevalence [5.1 and 2.1/100 000 persons respectively] 18 compared to microscopic colitis [103.0/100 000 persons]. 1 The UK Biobank ICD10 data rely on hospital coding data, which has limitations in terms of whether patients have had a coded diagnosis at hospital, and rely on the accuracy of hospital coding; however, their use is standard practice in literature performing UK Biobank GWASs. This study was performed using only white Europeans, and further studies are required to determine if the results are consistent across other ethnic groups.
Table 4. Summary statistics of the known risk variants for Crohn's disease and IBD in our microscopic colitis genome-wide association study for those that pass the nominal p value threshold of 0.05. Abbreviations: IBD, inflammatory bowel disease; GRS, genetic risk score.
The UK Biobank also only includes patients between the ages of 40 and 69 years at recruitment, although this covers the peak age of onset for microscopic colitis. We are confident our main GWAS result is not a false positive, because of the high MAF of the lead SNP and its close alignment with previous work. However, we acknowledge that with fewer than 500 cases, there may be many SNPs we were unable to detect due to low OR or MAF. A meta-analysis combining the results of this study and others may help to identify further associations.
To follow up on the strong association with PPIs, we performed a GWAS to find possible underlying pharmacogenetic associations, but there were no genome-wide significant SNPs. A follow-up study with a dedicated drug-exposed cohort would have greater power to detect such associations, as has been demonstrated in our recent study of thiopurine-induced myelosuppression. 19