Abstract

Background and Aims

Genome-wide association studies [GWAS] of inflammatory bowel disease [IBD] in multiple populations have identified over 240 susceptibility loci. We previously performed the largest Asian-specific IBD GWAS to date, identifying two new IBD risk loci and confirming associations with 28 established loci. To identify additional susceptibility loci in Asians, we expanded our previous study design by doubling the number of cases with an additional dataset of 1726 cases and 378 controls.

Methods

An inverse-variance fixed-effects meta-analysis was performed between the previous and the new GWAS dataset, comprising a total of 3195 cases and 4419 controls, followed by replication in an additional 1088 cases and 845 controls.

Results

The meta-analysis of Korean GWAS identified one novel locus for ulcerative colitis at rs76227733 on 10q24 [pcombined = 6.56 × 10–9] and two novel loci for Crohn’s disease [CD] at rs2240751 on 19p13 [pcombined = 3.03 × 10–8] and rs6936629 on 6q22 [pcombined = 3.63 × 10–8]. Pathway-based analysis of GWAS data using MAGMA showed that the MHC and antigenic stimulus-related pathways were more significant in Korean CD, whereas cytokine and transcription factor-related pathways were more significant in European CD. Polygenic risk scores derived from Korean data explained up to 14% of the variance in CD, whereas those derived from European data explained 10%, emphasizing the need for large-scale genetic studies in this population.

Conclusions

The identification of novel loci not previously associated with IBD suggests the importance of studying IBD genetics in diverse populations.

1. Introduction

Inflammatory bowel disease [IBD], a chronic inflammatory disorder of the gastrointestinal tract, is thought to develop due to dysregulated mucosal immune responses to gut flora in genetically susceptible individuals.1 Crohn’s disease [CD] and ulcerative colitis [UC] are the two major subtypes of IBD.

The prevalence of IBD is lower in Asia than in the West; however, its incidence is rapidly increasing throughout Asia.2–5 There are differences in the phenotype and disease course of IBD between Asians and Europeans; CD has a male predominance in Asia with a male-to-female ratio ranging from 1.67:1 to 2.9:1,2,6,7 and ileocolonic disease is predominant in Asia, accounting for around two-thirds of CD cases, whereas colonic and ileal disease account for 4–12% and 20–23%, respectively.3,6,7 It is unclear whether these Asian-specific clinical characteristics of IBD are solely attributed to differences in the environment between Asia and the West, which highlights the need for genetic studies of IBD in Asian populations.

Recent large-scale studies on populations of European ancestry have greatly advanced our understanding of IBD-related genetics.8 A meta-analysis by the International IBD Genetics Consortium, which combined genome-wide association studies [GWAS] and Immunochip data from 96 486 individuals with multiple ancestries [including Asian samples], identified over 200 susceptibility loci for IBD and reported an overlap in the directionality of the odds ratios [ORs] between European and Asian cohorts.9 The latest genome-wide meta-analysis performed on populations of European ancestry reported 241 susceptibility loci for IBD.10 However, further studies are needed to enhance our understanding of the genetic architecture of IBD.

Despite the differences in the clinical characteristics of IBD among different ethnicities, the number of studies on non-European populations is limited.11–17 In our recent Asian-specific GWAS of IBD, the largest to date, we identified two new susceptibility loci for IBD and confirmed the associations with 28 established IBD loci in Koreans.18 Here, we performed an extended GWAS by doubling the number of cases compared with that in our previous study. We newly included 1726 IBD cases [725 CD and 1001 UC] and 378 unrelated healthy controls genotyped using a population-specific array [Illumina Infinium Asian Screening Array-24 v1.0]. We then conducted a GWAS meta-analysis of the two datasets, comprising a total of 3195 cases and 4419 controls of Korean ancestry. We used 1088 additional cases [582 CD and 506 UC] and 845 additional healthy controls as a replication cohort. We aimed to identify novel genetic variants associated with IBD in Asians through an extended GWAS. Our meta-analyses identified three novel susceptibility loci at the genome-wide significance level.

2. Methods

2.1. Study subjects

For discovery, we combined samples from 1469 IBD patients [896 CD and 573 UC] and 4041 controls genotyped using the OmniExpress and Omni1-Quad from our previously published GWAS18 with 1726 additional IBD patients [725 CD and 1001 UC] and 378 additional controls genotyped using the Asian Screening Array [ASA] [Supplementary Table 1]. The clinical characteristics of the patients are summarized in Supplementary Table 2. The replication cohort consisted of 1088 individuals with IBD [582 CD and 506 UC] and 845 healthy controls. In total, 9547 samples including 4283 IBD patients [2203 CD and 2080 UC cases] and 5264 controls were used for meta-analysis in the Korean population. All IBD patients were recruited from the IBD Clinic of Asan Medical Center.

2.2. Genotyping and quality control of the ASA

Quality control [QC] was conducted for each dataset separately and for the combined set of samples using a common approach. Standard QC procedures were performed using PLINK v1.9 [https://www.cog-genomics.org/plink2] and R 3.5.0, as described previously for the Korean IBD dataset.18 The genotyping of 1726 IBD cases [725 CD and 1001 UC] and 378 unrelated healthy controls was carried out using the Infinium Asian Screening Array-24 v1.0 BeadChip, and detailed QC measures are described in Supplementary Table 3. All single nucleotide polymorphisms [SNPs] on the X, Y and mitochondrial chromosomes were excluded. SNPs with call rate < 98%, Hardy–Weinberg equilibrium [HWE] test p < 1.0 × 10–5 for controls, or minor allele frequency [MAF] < 0.01 were excluded. Five samples with a high proportion of missing genotypes [>4%] [one CD, one UC and three controls] were removed. Nine samples [four CD and five UC] were removed due to close genetic relatedness [PI_HAT > 0.25, IBS > 0.8]. Subsequently, principal-component analysis [PCA] was performed to detect population outliers and stratification by calculating the first ten PCs per individual using PLINK v1.9 after merging with 194 reference samples including European [CEU], Asian [CHB + JPT] and African [YRI] samples from the International HapMap Project. One CD case was removed following PCA [Supplementary Figure 1A and B]. After SNP and sample QC analyses of Infinium Asian Screening Array-24 v1.0 data, 457 272 SNPs in 1726 cases and 378 controls in discovery cohort I [average call rate of 99.99%] remained for further analyses [Supplementary Table 3]. We also applied the same QC procedures to discovery cohort II18 in this study [Supplementary Table 4]. Overlapping samples [13 CD and nine UC cases] between the two discovery cohorts were removed from cohort II.
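As an illustration only, the SNP-level exclusion criteria above can be written as a simple predicate. The study applied these filters with PLINK v1.9; the class and function names below are hypothetical, and the sketch assumes that the call rate, MAF and control HWE p value have already been computed for each SNP.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the names are illustrative, not PLINK options.
MIN_CALL_RATE = 0.98
MIN_MAF = 0.01
MIN_HWE_P_CONTROLS = 1.0e-5
EXCLUDED_CHROMS = {"X", "Y", "MT"}


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container)."""
    snp_id: str
    chrom: str            # "1".."22", "X", "Y" or "MT"
    call_rate: float      # fraction of samples with a genotype call
    maf: float            # minor allele frequency
    hwe_p_controls: float  # HWE test p value in controls


def passes_snp_qc(snp: SnpStats) -> bool:
    """Return True if the SNP survives all SNP-level filters."""
    if snp.chrom in EXCLUDED_CHROMS:
        return False
    if snp.call_rate < MIN_CALL_RATE:
        return False
    if snp.maf < MIN_MAF:
        return False
    if snp.hwe_p_controls < MIN_HWE_P_CONTROLS:
        return False
    return True
```

Sample-level filters [missingness, relatedness and PCA outliers] would be applied separately and are not covered by this sketch.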

2.3. Imputation

Untyped SNPs in each dataset were imputed using IMPUTE v2.3.219 following pre-phasing with SHAPEIT v2,20 based on the genotyped SNPs that passed the QC criteria [call rate > 98%, MAF > 1%, HWE p > 1.0 × 10–5 for controls]. Imputation was performed using the multi-ethnic 1000 Genomes Project reference panel release v5 [https://www.internationalgenome.org/]. For the QC of imputed SNPs, we removed all imputed SNPs with info score < 0.8, missing rate > 10%, MAF < 1%, HWE test p < 1 × 10–5 for controls or p < 5 × 10–8 for cases, or posterior probability score < 0.8. For the ASA dataset, a total of 6 139 980 imputed SNPs passed the QC criteria and were combined with 457 272 genotyped SNPs for association analysis. For the previously published GWAS dataset, a total of 6 088 678 imputed SNPs passed the QC criteria and were combined with 522 285 genotyped SNPs for association analysis. After imputation of each dataset, a total of 6 193 769 SNPs were shared between the two datasets.

2.4. Statistical analysis

Association tests were carried out on the Korean dataset of IBD, CD and UC using the additive model of frequentist association test of SNPTEST v2.5.2 [https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html].21 A quantile–quantile plot was generated using R 3.5.0 [http://www.r-project.org/] to evaluate the overall significance of the genome-wide associations and the potential impact of population stratification. The impact of population stratification was also evaluated by calculating the genomic control inflation factor [λ GC]. As the polygenic architecture and linkage disequilibrium [LD] with true causal variants can influence λ GC, we also evaluated λ GC after stringent LD pruning [r2 < 0.1]. In addition, we used the recently developed LD score regression [LDSC] approach,22 which gives an equivalent correction factor to λ GC after accounting for the polygenic architecture. A Manhattan plot was generated with −log10p values using R [3.5.0]. Conditional analysis was performed to assess whether candidate novel signals were due to long-range LD with variants in previously reported loci. For all variants in candidate loci that were less than 3 Mb away from a known locus, conditional analysis was performed on each of the three datasets separately followed by a meta-analysis or on the combined dataset. Secondary SNPs with conditional p < 5 × 10–8 were assumed to be independent from the reported lead SNP in the region.
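For readers unfamiliar with λ GC, the following is a minimal sketch of the standard definition: the median of the observed 1-df chi-square statistics divided by the median expected under the null [about 0.455]. It is not the authors' R code; the function name is hypothetical, and the sketch assumes two-sided p values strictly greater than 0 and at most 1.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Estimate lambda_GC from two-sided association p values."""
    std_normal = NormalDist()
    # Convert each p value to a 1-df chi-square statistic: chi2 = z**2,
    # where z is the standard normal quantile at p / 2.
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p values mimic the null distribution, so lambda_GC is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_like)
```

Values noticeably above 1 indicate inflation of the test statistics, which can arise from population stratification or, as discussed above, from a polygenic architecture.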

2.5. Fixed-effects meta-analysis in the Korean population

To identify novel susceptibility loci for IBD, CD and UC, the association results of the ASA and previously published GWAS were combined using the inverse-variance method based on a fixed-effects model as implemented in meta v1.7.23 SNPs with pmeta < 1 × 10–6 were selected for replication in an independent cohort consisting of 1088 individuals with IBD [582 CD, 506 UC] and 845 healthy controls. Genotyping of the replication cohort was performed using the TaqMan genotyping assay with the Applied Biosystems 7900HT Fast Real-Time PCR System according to the manufacturer’s instructions. Between-study heterogeneity was quantified using the I2 heterogeneity score, and the statistical significance was assessed using the Q test statistic. All SNPs with heterogeneity p < 0.05 were excluded to account for possible heterogeneity across studies. For the fixed-effects model, significance was defined as pmeta < 5 × 10–8. Associated loci were classified as CD, UC or IBD loci following the approach applied in Jostins et al.8 as described in our previous GWAS.18
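The inverse-variance fixed-effects calculation, Cochran's Q and I2 can be sketched as follows. This is an illustrative re-implementation, not the meta v1.7 software used in the study. As an assumption, standard errors are back-calculated from the rounded 95% confidence intervals reported for rs76227733 in Table 1, so the result only approximates the published combined estimate. All function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.959964):
    """Approximate the SE of log(OR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def fixed_effects_meta(odds_ratios, std_errors):
    """Inverse-variance fixed-effects meta-analysis on the log-OR scale."""
    betas = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"or": math.exp(beta), "se": se, "p": p, "q": q, "df": df, "i2": i2}


def chi2_sf(q, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch only handles two or three studies")


# rs76227733 [UC]: per-study OR and 95% CI for ASA, GWAS and replication.
ors = [1.47, 1.35, 1.19]
ses = [se_from_ci(1.22, 1.76), se_from_ci(1.16, 1.56), se_from_ci(1.01, 1.41)]
result = fixed_effects_meta(ors, ses)
p_het = chi2_sf(result["q"], result["df"])
```

Under these assumptions the combined OR lands close to the reported value of 1.32, the combined p value falls below 5 × 10–8, and the heterogeneity p value stays above 0.05; small differences from the published figures are expected because the inputs are rounded.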

2.6. Fine-mapping

To define a set of likely causal SNPs, fine-mapping analysis was carried out using ‘FM-summary’ [https://github.com/hailianghuang/FM-summary/blob/master/getCredible.r] based on summary statistics from the meta-analysis of cohorts I and II and the LD reference of East Asians [JPT + CHB] in the 1000 Genomes Project reference panel. The 95% credible set in each locus was defined as the smallest set of SNPs whose cumulative posterior probability [PP] exceeded 95% in the fine-mapping analysis.
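Reading the credible-set definition as a cumulative posterior probability threshold [our interpretation], the construction can be sketched as below. The sketch takes per-SNP posterior probabilities as given and does not reproduce how FM-summary derives them. The function name and the toy values are hypothetical and are not study data.

```python
def credible_set(posterior_probs, coverage=0.95):
    """Return the smallest list of SNPs whose cumulative PP exceeds `coverage`.

    posterior_probs: dict mapping SNP id -> posterior probability,
    assumed to sum to approximately 1 within the locus.
    """
    # Rank SNPs from most to least probable, then accumulate.
    ranked = sorted(posterior_probs.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for snp_id, pp in ranked:
        selected.append(snp_id)
        cumulative += pp
        if cumulative > coverage:
            break
    return selected


# Toy posterior probabilities (hypothetical, not study data).
toy_pp = {"snp_a": 0.60, "snp_b": 0.25, "snp_c": 0.12, "snp_d": 0.03}
toy_set = credible_set(toy_pp)
```

A locus with one dominant variant yields a small credible set, whereas a locus where probability is spread over many SNPs in tight LD yields a large one.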

2.7. Gene annotation

We performed gene analysis using Multi-marker Analysis of GenoMic Annotation [MAGMA] v1.07b [http://ctg.cncr.nl/software/magma]24 to prioritize causal genes at susceptibility loci for IBD, CD and UC. By using the summary statistics from the meta-analysis of cohorts I and II and LD information of the East Asian population as input, all SNPs located between the transcription start and end sites were aggregated to that gene to calculate the gene p value based on a multiple regression model. Of 19 257 reference genes, 17 371 genes for IBD, 17 361 genes for CD and 17 396 genes for UC were included in the gene analysis. By applying the threshold of Bonferroni correction, we annotated 29 genes with p < 2.88 × 10–6 [0.05/17,371] for IBD, 58 genes with p < 2.88 × 10–6 [0.05/17,361] for CD, and 39 genes with p < 2.87 × 10–6 [0.05/17,396] for UC [Supplementary Table 5].

2.8. eQTL and bioinformatics analysis

To gain insight into the potential functional roles of the novel loci, we performed an extensive cis-eQTL analysis by searching publicly available data from the eQTL Blood Browser,25 Genotype-Tissue Expression [GTEx] database,26 and whole blood cis-eQTL databases for Japanese27 and Koreans.28 Whole blood, small intestine, transverse colon and sigmoid colon data were selected in the GTEx browser for the analysis. To explore the epigenetic profiles of susceptibility loci, ENCODE histone modification data, HaploReg v4.1 and RegulomeDB were used to examine whether any of the SNPs or their proxies [r2 ≥ 0.8 in the JPT + CHB reference panel of the 1000 Genomes Project] were annotated as regulatory variants. Identified loci were examined for previous implications in other autoimmune or immune-related phenotypes using the Ensembl, UCSC Genome Bioinformatics, GeneCards and GWAS Catalog databases. When the SNP was not directly typed, a proxy SNP was used [r2 ≥ 0.8].

2.9. Pathway analysis

To identify biological pathways associated with annotated genes for IBD, CD and UC, we performed gene-set analysis using MAGMA v1.07b.24 The analysis results were used as input data. We used the gene sets of 9976 Gene Ontology pathways from MSigDB v7.029 to calculate the p value of each pathway. We set the statistical significance at Bonferroni-corrected p < 5.01 × 10–6 [0.05/9976]. We also performed pathway analysis using the previously published summary statistics for a European IBD dataset.10 The European dataset comprised 12 194 cases and 28 072 controls for CD and 12 366 cases and 33 609 controls for UC.
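The Bonferroni thresholds quoted for the gene-level and pathway-level analyses follow directly from dividing 0.05 by the number of tests. The short check below reproduces that arithmetic using the test counts stated in the text; the variable names are illustrative.

```python
ALPHA = 0.05

# Numbers of genes and gene sets tested, as reported in the text.
n_tests = {
    "IBD genes": 17371,
    "CD genes": 17361,
    "UC genes": 17396,
    "GO pathways": 9976,
}

# Bonferroni-corrected significance threshold for each analysis.
thresholds = {name: ALPHA / n for name, n in n_tests.items()}

for name, threshold in thresholds.items():
    print(f"{name}: p < {threshold:.2e}")
```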

2.10. Polygenic risk scores

We performed polygenic risk score [PRS] analysis using PRSice-2.30 A PRS estimates an individual’s genetic liability to disease based on genotype profile and relevant GWAS data. PRSs are calculated by summing risk alleles, which are weighted by effect sizes derived from GWAS results.30 To avoid overfitting, we used one of the two cohorts as the base data for estimating effect sizes and the other as the target data for evaluating PRS. Specifically, we calculated the PRSs in cohort I, newly genotyped using the ASA, based on the effect sizes estimated from the Korean GWAS [PRSKOR]18 of cohort II or the European ancestry IBD GWAS [PRSEUR].10 We used a total of 5 601 568 shared SNPs between cohort I and the Korean GWAS to calculate the PRSKOR, and a total of 4 391 300 shared SNPs between cohort I and the European ancestry IBD GWAS to calculate the PRSEUR. We treated the MHC region [chromosome 6: 25–34 Mb] with additional caution to minimize overfitting due to tight LD by selecting only the most significant variant in this region. Both PRSKOR and PRSEUR showed a normal distribution [data not shown]. After LD clumping [--clump-kb 250, --clump-p 1.00, and --clump-r2 0.10] using the East Asian [CHB + JPT] or European [CEU + FIN + GBR + IBS + TSI] 1000 Genomes data as a reference panel, 151 164 SNPs in the Korean GWAS and 131 117 SNPs in the European GWAS remained. We selected LD-clumped SNPs based on thresholds of p values [5 × 10–8, 5 × 10–6, 5 × 10–4, 5 × 10–3, 5 × 10–2, 0.1, 0.2, 0.5 and 1] from the Korean or European GWAS for the PRS analysis. We then compared the full model [including the PRS] with the null model [with the PRS variable excluded] and estimated the variance explained using Nagelkerke’s pseudo-R2.
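The two quantities described above, a weighted sum of risk alleles and Nagelkerke's pseudo-R2, can be sketched as follows. This is a simplified illustration rather than the PRSice-2 implementation: the function names are hypothetical, the score is a plain weighted sum without any averaging, and the log-likelihoods are assumed to come from logistic models fitted elsewhere. One common convention, which may or may not match the authors' exact computation, is to evaluate the pseudo-R2 for the models with and without the PRS and report the difference.

```python
import math


def polygenic_score(dosages, weights):
    """Sum of risk-allele dosages [0 to 2] weighted by GWAS effect sizes [log OR].

    Only SNPs present in both dictionaries contribute to the score.
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[snp] * weights[snp] for snp in shared)


def nagelkerke_r2(loglik_null, loglik_full, n_samples):
    """Nagelkerke's pseudo-R2 from the log-likelihoods of two nested models.

    loglik_null: log-likelihood of the intercept-only model.
    loglik_full: log-likelihood of the model being evaluated.
    """
    # Cox-Snell R2, then rescaled by its maximum attainable value.
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_full) / n_samples)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n_samples)
    return cox_snell / max_cox_snell


# Toy example (hypothetical SNP ids and effect sizes, not study data).
toy_score = polygenic_score(
    {"rs_1": 2, "rs_2": 1, "rs_3": 0},
    {"rs_1": 0.10, "rs_2": 0.30, "rs_4": 0.50},
)
```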

2.11. Ethics

This study was approved by the Institutional Review Board of Asan Medical Center [2017-0456]. All samples were obtained with written informed consent under Institutional Review Board-approved protocols.

3. Results

3.1. Identification of susceptibility loci for UC and CD by fixed-effects meta-analysis of Korean GWAS

To identify additional susceptibility loci for IBD, CD and UC, we carried out a two-stage GWAS analysis of the Korean population. The discovery stage involved a new GWAS dataset of 1726 IBD cases [725 CD and 1001 UC] and 378 unrelated healthy controls genotyped using the Infinium Asian Screening Array-24 v1.0 BeadChips [cohort I] and one published GWAS dataset of 1469 IBD patients [896 CD and 573 UC] and 4041 controls genotyped using the OmniExpress and Omni1-Quad [cohort II]18 [Supplementary Table 1]. Following the genome-wide imputation of the two datasets separately, association tests were carried out on each of the Korean IBD, CD and UC datasets using the additive model of frequentist association test of SNPTEST v2.5.2 [https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html].21 The quantile–quantile plots for IBD, CD and UC in cohort I appeared normal, as shown in Supplementary Figure 2A–C. The genomic inflation factor [λ GC] of IBD [1.036], CD [1.033] and UC [1.044] was decreased to 1.029, 1.026, and 1.029, respectively, following LD pruning [Supplementary Figure 2]. These results suggest that a slight inflation in λ GC might reflect the polygenic architecture of the disease, rather than population stratification. As shown in the Manhattan plot for IBD, CD, and UC [Supplementary Figure 3], the association test of ASA data identified two loci [TNFSF15-TNFSF8, MHC] previously established for CD and one locus [MHC] previously established for UC with genome-wide significance [p < 5 × 10–8]. To maximize the statistical power for identification of novel susceptibility loci for IBD, CD and UC in the Korean population, we performed fixed-effects meta-analyses of cohorts I and II18 using meta v1.7.23 The meta-analysis dataset consisted of 7614 individuals including 3195 IBD patients [1621 CD and 1574 UC] and 4419 healthy controls. 
The λ GC of both cohorts I and II was lower than 1.03 following LD score regression22 accounting for the polygenic architecture [Supplementary Table 1]. In the IBD meta-analysis, the λ GC of 1.11 was reduced to 1.03 after LD score regression. A total of 10 loci were identified with genome-wide significance [pmeta < 5 × 10–8] including TNFSF15-TNFSF8, MHC, TNFRSF6B, TBC1D1-KLF3, GPR35, PYGO2-SHC1, STAT3-STAT5B-STAT5A, NCF4-CSF2RB, DUSP5-SMNDC1 and ZNF365 in the meta-analysis for IBD [Supplementary Figure 4A]. These are well-established IBD susceptibility loci. Following the meta-analyses for CD and UC separately, one novel locus [LOC731275] and 12 established loci for CD and one novel locus [LCOR-SLIT1] and four established loci for UC exceeded the genome-wide significance level [Supplementary Figure 4B and C]. By applying a threshold of pmeta < 1 × 10–6, we selected eight additional novel candidate loci [two loci for IBD, five loci for CD and one locus for UC] for the replication study [Supplementary Table 6]. We genotyped these ten lead SNPs from two novel and eight suggestive loci in an independent replication sample consisting of 1088 individuals with IBD [582 CD and 506 UC] and 845 healthy controls using TaqMan technology. By combining association results from the meta-analysis and replication study, three novel susceptibility loci were identified including one UC-specific locus and two CD-specific loci: rs76227733 in the LCOR-SLIT1 region at 10q24 [pcombined = 6.56 × 10–9, OR = 1.32] for UC, rs2240751 in the MFSD12-C19orf71-FZR1-DOHH region at 19p13 [pcombined = 3.03 × 10–8, OR = 1.25] for CD, and rs6936629 in the RFX6-GPRC6A-FAM162B region at 6q22 [pcombined = 3.63 × 10–8, OR = 1.25] for CD [Table 1 and Figure 1]. These three SNPs showed consistent association across the three independent samples without any indication of genetic heterogeneity [p > 0.05].
The three loci did not show additional independent genome-wide significant signals following conditional analyses [Supplementary Figure 5].

Table 1.

Three novel susceptibility loci for ulcerative colitis or Crohn’s disease in the Korean population

| Phenotype | Locus | SNP | Position [hg19] | Candidate gene[s] | Risk allele | Study | Cases [n] | Controls [n] | RAF, cases | RAF, controls | OR [95% CI] | p | phet [a] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UC | 10q24 | rs76227733 | 98,556,649 | LCOR, SLIT1 | C | ASA | 1001 | 378 | 0.344 | 0.264 | 1.47 [1.22–1.76] | 3.91 × 10–5 [b] | |
| | | | | | | GWAS | 573 | 4041 | 0.348 | 0.296 | 1.35 [1.16–1.56] | 8.22 × 10–5 [b] | |
| | | | | | | Replication | 491 | 837 | 0.349 | 0.311 | 1.19 [1.01–1.41] | 4.35 × 10–2 [b] | |
| | | | | | | Combined | 2065 | 5256 | 0.346 | 0.296 | 1.32 [1.20–1.46] | 6.56 × 10–9 [c] | 2.42 × 10–1 |
| CD | 19p13 | rs2240751 | 3,548,231 | MFSD12, C19orf71, FZR1, DOHH | G | ASA | 723 | 378 | 0.387 | 0.319 | 1.34 [1.11–1.60] | 1.78 × 10–3 [b] | |
| | | | | | | GWAS | 896 | 4041 | 0.384 | 0.334 | 1.24 [1.12–1.38] | 6.09 × 10–5 [b] | |
| | | | | | | Replication | 461 | 826 | 0.377 | 0.331 | 1.22 [1.03–1.44] | 1.93 × 10–2 [b] | |
| | | | | | | Combined | 2080 | 5245 | 0.384 | 0.332 | 1.25 [1.16–1.36] | 3.03 × 10–8 [c] | 7.32 × 10–1 |
| | 6q22 | rs6936629 | 117,239,141 | RFX6, GPRC6A, FAM162B | C | ASA | 723 | 375 | 0.399 | 0.343 | 1.27 [1.06–1.52] | 9.94 × 10–3 [b] | |
| | | | | | | GWAS | 896 | 4041 | 0.404 | 0.351 | 1.26 [1.13–1.40] | 2.08 × 10–5 [b] | |
| | | | | | | Replication | 486 | 826 | 0.387 | 0.338 | 1.21 [1.04–1.42] | 1.64 × 10–2 [b] | |
| | | | | | | Combined | 2105 | 5242 | 0.398 | 0.348 | 1.25 [1.15–1.35] | 3.63 × 10–8 [c] | 9.14 × 10–1 |

CD, Crohn’s disease; CI, confidence interval; hg19, human genome version 19; OR, odds ratio; p, p value; Position, chromosome position; RAF, risk allele frequency; SNP, single nucleotide polymorphism; UC, ulcerative colitis.

[a] p value for heterogeneity.

[b] Association p value of SNPTEST v2.5.2.

[c] p value of fixed-effects meta-analysis.

Figure 1.

Regional association plots for three novel IBD loci: [A] rs76227733 at 10q24; [B] rs2240751 at 19p13; [C] rs6936629 at 6q22. SNPs are plotted according to their chromosomal positions [NCBI Build 37] with −log10p values from the meta-analysis in the region flanking 750 kb on either side of the marker SNP. Circles indicate genotyped SNPs and squares indicate imputed SNPs. The most strongly associated SNP in the discovery stage is shown as a small purple circle. Linkage disequilibrium [LD; r2 values] between the lead SNP and other SNPs is indicated using colours. The relative location of the annotated genes and the direction of transcription are shown in the lower portion of the figure. The estimated recombination rates of Asian samples from the 1000 Genomes Project [November 2014] are plotted to reflect the local LD structure. Plots are generated using LocusZoom.

3.2. Gene prioritization of novel associations

3.2.1. Locus 10q24 for UC

The most significant association at rs76227733 on 10q24 [pcombined = 6.56 × 10–9, OR = 1.32] was located 35.4 kb away from the 5′-end of LCOR [ligand dependent nuclear receptor corepressor], and 201.1 kb away from the 3′-end of SLIT1 [slit guidance ligand 1] [Table 1 and Figure 1A]. rs76227733 was within an LD region of 250.6 kb, which included LCOR and SLIT1. The 95% credible set at the 10q24 locus consisted of 28 SNPs including rs76227733 [PP = 18.7%] in the fine-mapping analysis [Supplementary Table 7]. Gene analysis using MAGMA v1.07b24 identified a significant association with LCOR [p = 9.58 × 10–7] [Supplementary Table 5]. LCOR is a transcriptional corepressor that can attenuate agonist-activated nuclear receptor signalling through multiple mechanisms.31 SLIT1 belongs to the Slit family of proteins implicated in guiding the migration of neurons and leukocytes.32 rs76227733 did not show cis-eQTL effects in the Korean CD eQTL,28 Japanese eQTL,27 GTEx,26 or eQTL Blood Browser.25 However, rs57807455, which was in high LD with rs76227733 [r2 = 0.99], showed enhancer histone marks and DNase I hypersensitivity sites in gastrointestinal tissues, altered TATA box motifs in HaploReg v4.1, and a high RegulomeDB score of 2b [Supplementary Table 8]. Analysis of the ENCODE project data showed that rs57807455 was located in the candidate cis-regulatory element [cCRE] EH38E1491359 [chromosome 10: 96 798 560–96 798 898 bp], a binding site of POLR2A in the transverse colon and Peyer’s patch cells [Supplementary Table 9].

3.2.2. Locus 19p13 for CD

At the 19p13 locus identified in the CD meta-analysis, the lead SNP was rs2240751 [pcombined = 3.03 × 10–8, OR = 1.25] within an LD region of 66.5 kb, which included MFSD12 [major facilitator superfamily domain containing 12], C19orf71 [chromosome 19 open reading frame 71], FZR1 [fizzy and cell division cycle 20 related 1] and DOHH [deoxyhypusine hydroxylase] [Table 1 and Figure 1B]. In the fine-mapping analysis, rs2240751 was the only variant with a PP greater than 50% [PP = 83.4%] within the 95% credible set of three SNPs [Supplementary Table 7]. rs2240751 did not show any cis-eQTL effects in disease-relevant tissues. rs2240751 was in exon 3 of MFSD12, resulting in an amino acid substitution from a polar uncharged tyrosine to a basic histidine at position 182 of the transmembrane helical domain. In silico evaluation of rs2240751 based on sequence homology and physico-chemical similarity predicted the substitution to be deleterious with a SIFT33 score of 0 and probably damaging with a PolyPhen-234 score of 1. MFSD12 belongs to the major facilitator superfamily [MFS] of membrane proteins, the largest family of secondary transporters. MFS proteins catalyse the transport of a wide range of substrates in both directions across the membrane.35 MFSD12 mRNA levels are low in the depigmented skin of vitiligo patients, probably due to the autoimmune-related destruction of melanocytes.36 MFSD12 has also been found to be associated with skin pigmentation in Africans.37 The minor allele G of rs2240751 showed active regulatory features in 11 immune cell types, including CD4 T cells, macrophages and granulocytes in the Ensembl Genome Browser38 [Supplementary Table 10].

3.2.3. Locus 6q22 for CD

rs6936629 at the 6q22 locus identified in the CD meta-analysis [pcombined = 3.63 × 10–8, OR = 1.25] was in an LD region of 446.5 kb, which included RFX6 [regulatory factor X6], GPRC6A [G protein-coupled receptor class C group 6 member A] and FAM162B [family with sequence similarity 162 member B] [Table 1 and Figure 1C]. The 95% credible set consisted of 277 SNPs including rs6936629 with a PP of 4.6% [Supplementary Table 7]. Gene analysis using MAGMA24 identified RFX6 as an annotated gene with a significant p value of 1.26 × 10–6 [Supplementary Table 5]. rs6936629, located in intron 9 of RFX6, did not have eQTL effects on the genes at this locus. RFX6 belongs to the regulatory factor X [RFX] family of transcription factors, which can bind to X-box motifs highly conserved in the promoter regions of various MHC class II genes.39 Among 37 SNPs in high LD [r2 ≥ 0.8] with rs6936629 in HaploReg v4.1, rs339331 [r2 = 0.97] with enhancer histone marks in gastrointestinal tissues had the highest RegulomeDB score [2b] [Supplementary Table 8]. Based on previous reports indicating that the prostate cancer risk-associated SNP rs339331 lies within a functional HOXB13-binding site and that the T risk allele increases the transcription of RFX6 by promoting the binding of HOXB13 to a transcriptional enhancer,40,41 the C risk allele for CD appeared to be associated with the decreased expression of RFX6 [pmeta = 3.53 × 10–6]. Recently, RFX6 was reported to be an essential transcriptional regulator of enteroendocrine cell specification in mice, which sheds light on the molecular mechanisms of intestinal failure in human RFX6 deficiencies such as Mitchell–Riley syndrome.42,43

We further examined why these three novel loci were not identified as significant in the European population [Supplementary Table 11], even though previous European studies had much larger sample sizes. rs76227733 at 10q24 identified in the UC meta-analysis did not show a significant association in the European population [p = 9.82 × 10–2]. Its allele frequency differed substantially between the two populations [MAF = 3.8% in the European population vs 28.6% in the East Asian population] [Supplementary Table 11]. However, rs57807455, ~1.9 kb away from rs76227733 with low LD [r2 < 0.2 in Europeans], showed a suggestive association with UC in Europeans [p = 3.78 × 10–3, OR = 1.08]. rs2240751 at 19p13 identified in the CD meta-analysis showed effects in the same direction in the Korean and European data; however, the p value [4.22 × 10–1] was not significant [Supplementary Table 11]. The risk allele frequency of rs2240751 was around 1% in the European population, whereas it was 26.9% in the East Asian population. Among the three novel loci, only rs6936629 at the 6q22 locus including RFX6-GPRC6A-FAM162B showed a significant association in the European population [p = 1.88 × 10–4, OR = 1.07] [Supplementary Table 11].

3.3. Previously reported loci

With our Korean data, we first examined 245 IBD-associated loci [276 independent SNPs] that were previously established in European studies [Supplementary Table 12].9,10 A total of 39 independent SNPs in 36 loci were monomorphic in the East Asian population, and 13 additional independent SNPs in ten loci did not have proxy SNPs [r2 > 0.8]. Of the remaining 224 SNPs in 199 available loci, 29 independent SNPs in 27 loci were significant after Bonferroni correction [0.05/276, p < 1.81 × 10–4]. Although the top SNPs were different with low LD [r2 < 0.2] between the Korean and European data, three additional loci including GPR35, TNFRSF6B and NCF4 showed a genome-wide significant association in the discovery phase [Supplementary Figure 4A]. Of the seven loci first identified in Asians, three including the PYGO2-SHC1 locus at 1q21,18 CDYL2 at 16q23,18 and CDKN2A-AS1-CDKN2A-CDKN2B-AS1-CDKN2B at 9p2144 were not present in the list of 245 IBD loci. These three Asian-specific loci had Bonferroni-corrected significant pmeta < 1.67 × 10–2 [0.05/3] in the discovery phase; thus, we considered them replicated as well [Supplementary Table 13]. In total, 35 SNPs from 33 loci, including the 27, three and three loci described above, were replicated in the Korean population. An additional 77 independent SNPs in 73 loci did not reach the Bonferroni threshold but showed nominal p < 0.05 in Koreans. Of those, five loci had an opposite direction of effects with Europeans [Supplementary Table 12].
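The counts and thresholds in this paragraph follow from simple arithmetic, which the short Python sketch below reproduces using only the numbers stated in the text. The variable names are ours and are purely illustrative.

```python
# Established European loci and SNPs, as stated in the text.
established_snps, established_loci = 276, 245
monomorphic_snps, monomorphic_loci = 39, 36   # monomorphic in East Asians
no_proxy_snps, no_proxy_loci = 13, 10         # no proxy SNP with r2 > 0.8

# SNPs and loci that could actually be tested in the Korean data.
testable_snps = established_snps - monomorphic_snps - no_proxy_snps
testable_loci = established_loci - monomorphic_loci - no_proxy_loci

# Bonferroni thresholds: 0.05 divided by the number of tests.
threshold_established = 0.05 / established_snps
threshold_asian = 0.05 / 3

# Replicated loci: 27 Bonferroni-significant, 3 with different top SNPs,
# and 3 first identified in Asians.
replicated_loci = 27 + 3 + 3

print(testable_snps, testable_loci)        # 224 199
print(f"{threshold_established:.2e}")      # 1.81e-04
print(f"{threshold_asian:.2e}")            # 1.67e-02
print(replicated_loci)                     # 33
```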

The replication of only a small fraction of the established loci in our discovery samples may be attributed to the limited power of our study. An estimation of the statistical power of our GWAS samples for detecting the 276 European susceptibility SNPs based on the reported OR and allele frequency in the Korean population [Supplementary Table 12] showed that our samples had limited power: nine SNPs in eight loci and 48 SNPs in 39 loci had >80% power at p < 1.81 × 10–4 and p < 0.05, respectively.
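To illustrate how power depends on the reported OR, the allele frequency and the significance threshold, the following Python sketch uses a normal approximation to a two-sided test of allele frequencies in cases versus controls. This is a generic textbook approximation and an assumption on our part, not necessarily the power calculation used for Supplementary Table 12. The sample sizes are the combined IBD discovery counts from the Abstract [CD- and UC-specific counts would be smaller], and the OR and allele frequency in the example are hypothetical.

```python
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def allelic_test_power(odds_ratio, control_freq, n_cases, n_controls, alpha):
    """Approximate two-sided power of a two-proportion z-test on allele counts."""
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    # Each individual contributes two alleles.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_effect = (p1 - p0) / se
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    return std.cdf(z_effect - z_crit) + std.cdf(-z_effect - z_crit)


N_CASES, N_CONTROLS = 3195, 4419  # combined IBD discovery sample (Abstract)
EXAMPLE_OR, EXAMPLE_FREQ = 1.10, 0.30  # hypothetical SNP, for illustration only

for alpha in (0.05 / 276, 0.05):
    power = allelic_test_power(EXAMPLE_OR, EXAMPLE_FREQ, N_CASES, N_CONTROLS, alpha)
    print(f"alpha = {alpha:.2e}: power = {power:.2f}")
```

The same SNP has lower power at the Bonferroni threshold than at nominal p < 0.05, which matches the pattern reported above: fewer SNPs reach 80% power at p < 1.81 × 10–4 than at p < 0.05.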

We also compared the effect sizes of 224 SNPs from the established loci between Asians and Europeans using Pearson’s correlation coefficient. The comparison showed positive correlations in the direction of effects for CD and UC between the European and Korean populations with significant p values [Supplementary Figure 6] [r = 0.60 and p = 7.30 × 10–22 for CD; r = 0.62 and p = 8.25 × 10–24 for UC], which are consistent with the findings of previous studies.9,18
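For readers who wish to reproduce this type of comparison, the sketch below computes Pearson's correlation coefficient between two vectors of per-SNP effect sizes and the corresponding t statistic for testing r = 0. The toy effect sizes are hypothetical log odds ratios, not values from Supplementary Table 12, and the use of n = 224 with r = 0.60 simply plugs in the numbers quoted above.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing a correlation of zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical per-SNP log odds ratios in two populations (illustration only).
toy_korean = [0.10, 0.25, -0.05, 0.30, 0.15]
toy_european = [0.08, 0.20, 0.02, 0.22, 0.18]
print(round(pearson_r(toy_korean, toy_european), 3))

# Plugging in the reported r for CD and the 224 compared SNPs.
print(round(t_statistic(0.60, 224), 2))
```

With several hundred SNPs, even a moderate correlation yields a large t statistic, which is why the reported p values are so small.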

3.4. Pathway analysis

To identify biological processes associated with candidate genes for CD and UC, we conducted pathway analyses using the summary statistics obtained through the meta-analyses of cohorts I and II as input. Pathway analysis using MAGMA v1.07b.24 for 9976 Gene Ontology gene sets from MSigDB v7.029 identified 30 and five pathways for CD and UC, respectively, with Bonferroni-corrected significance [0.05/9976, p < 5.01 × 10–6] [Supplementary Table 14]. The MHC class II protein complex was the most significant pathway for both phenotypes [p = 1.56 × 10–9 for CD and 5.65 × 10–10 for UC]. Pathways including T cell differentiation [p = 2.02 × 10–7 for CD and 1.48 × 10–1 for UC] and T helper 17 type immune response [p = 3.29 × 10–6 for CD and 2.58 × 10–2 for UC] were specifically significant for CD only.
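The classification of pathways as significant for both phenotypes or for CD only can be checked directly against the Bonferroni threshold. The Python sketch below does so for the three pathways and p values quoted in this paragraph; the helper function name is ours.

```python
N_GENE_SETS = 9976
threshold = 0.05 / N_GENE_SETS  # Bonferroni threshold for 9976 gene sets

# p values as reported in the text for CD and UC.
reported = {
    "MHC class II protein complex": {"CD": 1.56e-9, "UC": 5.65e-10},
    "T cell differentiation": {"CD": 2.02e-7, "UC": 1.48e-1},
    "T helper 17 type immune response": {"CD": 3.29e-6, "UC": 2.58e-2},
}


def significant_in(pvals, cutoff):
    """Return the phenotypes whose p value passes the cutoff."""
    return sorted(pheno for pheno, p in pvals.items() if p < cutoff)


print(f"threshold = {threshold:.2e}")  # threshold = 5.01e-06
for pathway, pvals in reported.items():
    print(pathway, significant_in(pvals, threshold))
```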

We also performed additional pathway analyses using the summary statistics of the largest meta-analysis in the European population10 and identified 157 and 29 pathways for CD and UC, respectively [Supplementary Table 14]. We then compared the list of the top ten pathways for CD and UC between the Korean and European data [Figure 2]. In the case of CD, T cell differentiation-related pathways were significant in both populations. The MHC class II protein complex, antigen binding and response to antigenic stimulus-related pathways were more significant in the Korean population, whereas cytokine and transcription factor-related pathways were more significant in the European population [Figure 2A and B]. In the case of UC, MHC and antigen binding-related pathways identified in the Korean population were also significant in the European population [Figure 2C and D].

Figure 2. Comparison of biological pathways associated with Crohn’s disease [CD] and ulcerative colitis [UC] between the Korean and European data. Top ten pathways with Bonferroni-corrected significant p < 5.01 × 10–6 [0.05/9976] [vertical dashed line] for CD in the [A] Korean and [B] European populations and for UC in the [C] Korean and [D] European populations. Each bar denotes the significance of each biological process [p value on a −log10 scale] in pathway analysis using MAGMA v1.07b for 9976 Gene Ontology [GO] sets. *Bonferroni-corrected significant pathways in Koreans only, and #in Europeans only.

The Gene Ontology pathways for prioritized genes in the three novel loci at 10q24, 19p13 and 6q22 failed to show Bonferroni-corrected significant p values. However, transcription factor binding [p = 6.68 × 10–4] including LCOR at the 10q24 locus, whole membrane [p = 3.16 × 10–2] including MFSD12 at the 19p13 locus, and regulation of peptide secretion [p = 9.47 × 10–3] including RFX6 at the 6q22 locus showed p < 0.05.

3.5. Polygenic risk scores

We also calculated the variance explained by the PRSs for genome-wide significant variants using Korean vs European effect sizes. Although the European studies had much larger sample sizes, PRSs derived from Korean data explained up to 14.3% of the phenotype variance of CD, whereas those derived from European data explained 9.9% [Supplementary Table 15]. However, for UC, the variance explained by PRSEUR was considerably higher than that explained by PRSKOR [11.8% vs 7.3%].
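The idea of comparing PRSs built from two sets of effect sizes can be sketched as follows. A PRS is computed here as a weighted sum of risk-allele dosages, and the variance explained is approximated by the squared correlation between the score and case status. This squared correlation is a simple stand-in that we assume for illustration; it is not necessarily the statistic reported in Supplementary Table 15. All SNP names, weights and genotypes below are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0 to 2) using per-SNP effect sizes."""
    return sum(beta * dosages.get(snp, 0.0) for snp, beta in weights.items())


def squared_correlation(scores, status):
    """Squared Pearson correlation between scores and 0/1 case status."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(status) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, status))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in status)
    return cov * cov / (var_s * var_y)


# Hypothetical effect sizes (log OR) from two training populations.
weights_a = {"snp1": 0.30, "snp2": 0.10, "snp3": -0.05}
weights_b = {"snp1": 0.10, "snp2": 0.25, "snp3": -0.02}

# Hypothetical target individuals: (dosages, case status).
individuals = [
    ({"snp1": 2, "snp2": 1, "snp3": 0}, 1),
    ({"snp1": 1, "snp2": 2, "snp3": 1}, 1),
    ({"snp1": 0, "snp2": 1, "snp3": 2}, 0),
    ({"snp1": 1, "snp2": 0, "snp3": 1}, 0),
    ({"snp1": 2, "snp2": 2, "snp3": 1}, 1),
    ({"snp1": 0, "snp2": 0, "snp3": 2}, 0),
]
status = [y for _, y in individuals]

for label, weights in (("A", weights_a), ("B", weights_b)):
    scores = [polygenic_score(d, weights) for d, _ in individuals]
    print(label, round(squared_correlation(scores, status), 3))
```

Because the same target genotypes are scored with both weight sets, any difference in the resulting values reflects only the effect sizes used, which is the comparison made between PRSKOR and PRSEUR.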

4. Discussion

In this extended GWAS of IBD in the East Asian population, we successfully identified one novel locus for UC and two novel loci for CD and replicated 35 SNPs from 33 previously reported loci in the Korean population, indicating distinct as well as common pathways associated with IBD in Europeans and Asians. Of the three novel loci, one novel CD locus was not replicated in the European data, suggesting the presence of population-specific IBD susceptibility loci.

Comparisons of the top ten pathways for CD and UC between the Korean and European data [Figure 2] showed that T cell differentiation-related pathways were significant for CD in both populations. However, MHC class II protein complex, antigen binding and response to antigenic stimulus-related pathways were more significant in the Korean population, whereas cytokine and transcription factor-related pathways were more significant in the European population. In the case of UC, MHC and antigen binding-related pathways identified in the Korean population were also significant in the European population. Our findings offer new insights into the genetic architecture of IBD.

Genetic correlation analyses of CD and UC between European and East Asian cohorts showed that the genetic architectures of CD and UC were broadly similar [CD, r = 0.60; UC, r = 0.62] between these populations as reported previously.9 However, variants for some individual loci showed significant differences in effect size and/or allele frequency. Recently, PRSs have attracted increasing interest from the clinical community for their predictive value for multiple common diseases.45–47 One of the major challenges associated with the clinical utility of PRSs is that their accuracy is highly dependent on the study population represented in the training GWAS. PRSs developed using data from Europeans can be less predictive in non-Europeans. Indeed, even with our modest sample sizes, the variance of CD explained by PRSKOR was higher than that by PRSEUR [Supplementary Table 15]. Of note, this large variance explained by PRSKOR was not driven by overfitting, because we ensured that the training data [cohort II] and target data [cohort I] for PRSs were independent [Methods]. To further validate this phenomenon, we used the summary statistics from the Immunochip IBD GWAS,9 which published summary statistics for both East Asians and Europeans. Although the East Asians included Koreans, these individuals were not included in our current study. Thus, we could use these summary statistics to calculate the PRS of our target data [cohort I]. In this analysis, we similarly observed that PRSEAS explained more of the variance than PRSEUR [12.9% vs 8.0%] for CD, whereas the variance of UC explained by PRSEUR was higher than that explained by PRSEAS [11.2% vs 2.5%] [Supplementary Table 16]. We wanted to further examine this phenomenon of a larger variance being explained by PRSKOR than by PRSEUR for CD.
If the genetic structures had been similar between the two populations, the variance explained by PRSEUR would have been larger than the variance explained by PRSKOR, as shown in our UC data, because PRSEUR was calculated based on a larger amount of base data. Our observation in CD, however, suggested that the effect sizes might be population-specific in CD. We focused on the observation that this tendency was consistent regardless of the p-value threshold for the PRS calculation [Supplementary Table 15]. This suggested that the SNPs with large effects [small p-values], which contributed to the PRS regardless of the p-value threshold, might have population-specific effects. To test this hypothesis, we re-calculated the variance explained by PRSKOR and PRSEUR after removing the largest-effect loci, TNFSF15 and HLA. When we removed either of the two, the variance explained by PRSKOR became similar to the variance explained by PRSEUR [Supplementary Table 17]. Furthermore, when we removed both of them, the variance explained by PRSKOR became smaller than the variance explained by PRSEUR. This showed that the large variance explained by PRSKOR might have been driven by these two loci with population-specific effects. This was in line with our previous report that in HLA, the effects for CD were more population-specific than for UC.48 In the future, to achieve an equitable benefit of PRSs for IBD patients, we will need to perform large genetic analyses in diverse populations and develop tools that account for population genetic admixture.

Acknowledgments

We would like to thank all the participating patients and healthy donors who provided the DNA and clinical information for this study. This study was supported by the PLSI supercomputing resources of the Korea Institute of Science and Technology Information.

Funding

This study was supported by a Mid-career Researcher Program grant through the National Research Foundation of Korea to K. Song [2017R1A2A1A05001119, 2020R1A2C2003275] funded by the Ministry of Science and ICT, the Republic of Korea.

Conflict of Interest

The authors disclose the following: B.D.Y.: grant/research support from CELLTRION, Inc.; lecture fees from Abbvie Korea, CELLTRION, Inc., Janssen Korea, Shire Korea, Takeda Korea and IQVIA; consultant fees from Abbvie Korea, Ferring Korea, Janssen Korea, Kangstem Biotech, Kuhnil Pharm., Shire Korea, Takeda Korea, IQVIA, Cornerstones Health and Robarts Clinical Trials Inc. S.-K.Y.: grant/research support from Janssen Korea Ltd. The remaining authors have no conflicts of interest.

Author Contributions

K.S. obtained financial support. K.S. and J.L. conceived and designed the study and supervised the data analysis and interpretation. S.-K.Y., B.D.Y., S.H.P. and H.S.L. recruited subjects and participated in the diagnostic evaluation. S.J., D.P., G.K. and J.B. performed data analyses. S.J. and K.S. drafted the manuscript. B.H. supervised the data analysis and interpretation. K.S., B.H. and H.S.L. revised the manuscript.

Web Resources

The URLs for data presented herein are as follows:

METAL, http://www.sph.umich.edu/csg/abecasis/metal/

Online Mendelian Inheritance in Man [OMIM], http://www.omim.org

The 1000 Genomes Project, http://www.1000genomes.org/

UCSC Genome Browser, http://genome.ucsc.edu/

GCTA, cnsgenomics.com/software/gcta/

IIBDGC, www.ibdgenetics.org

HaploReg v4.1, http://www.broadinstitute.org/mammals/haploreg/haploreg.php

Genotype-Tissue Expression [GTEx] project, http://www.gtexportal.org/home

eQTL Blood Browser, http://www.genenetwork.nl/bloodeqtlbrowser/

Geuvadis/1000 Genomes resources, http://www.ebi.ac.uk/Tools/geuvadis-das/

Data Availability

The summary data from our genome-wide meta-analyses on IBD, CD and UC are available from the European Genome-phenome Archive under accession EGAS00001005026.

References

1. Khor B, Gardet A, Xavier RJ. Genetics and pathogenesis of inflammatory bowel disease. Nature 2011;474:307–17.

2. Thia KT, Loftus EV Jr, Sandborn WJ, Yang SK. An update on the epidemiology of inflammatory bowel disease in Asia. Am J Gastroenterol 2008;103:3167–82.

3. Yang SK, Yun S, Kim JH, et al. Epidemiology of inflammatory bowel disease in the Songpa-Kangdong district, Seoul, Korea, 1986–2005: a KASID study. Inflamm Bowel Dis 2008;14:542–9.

4. Ng SC, Tang W, Ching JY, et al. Incidence and phenotype of inflammatory bowel disease based on results from the Asia-Pacific Crohn’s and colitis epidemiology study. Gastroenterology 2013;145:158–62.e2.

5. Park SJ, Kim WH, Cheon JH. Clinical characteristics and treatment of inflammatory bowel disease: a comparison of Eastern and Western perspectives. World J Gastroenterol 2014;20:11525–37.

6. Burisch J, Pedersen N, Čuković-Čavka S, et al.; EpiCom-group. East–West gradient in the incidence of inflammatory bowel disease in Europe: the ECCO-EpiCom inception cohort. Gut 2014;63:588–97.

7. Cosnes J, Gower-Rousseau C, Seksik P, Cortot A. Epidemiology and natural history of inflammatory bowel diseases. Gastroenterology 2011;140:1785–94.

8. Jostins L, Ripke S, Weersma RK, et al.; International IBD Genetics Consortium [IIBDGC]. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 2012;491:119–24.

9. Liu JZ, van Sommeren S, Huang H, et al.; International Multiple Sclerosis Genetics Consortium; International IBD Genetics Consortium. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet 2015;47:979–86.

10. de Lange KM, Moutsianas L, Lee JC, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet 2017;49:256–61.

11. Yang SK, Hong M, Zhao W, et al. Genome-wide association study of ulcerative colitis in Koreans suggests extensive overlapping of genetic susceptibility with Caucasians. Inflamm Bowel Dis 2013;19:954–66.

12. Yang SK, Hong M, Zhao W, et al. Genome-wide association study of Crohn’s disease in Koreans revealed three new susceptibility loci and common attributes of genetic susceptibility across ethnic populations. Gut 2014;63:80–7.

13. Yang SK, Hong M, Choi H, et al. Immunochip analysis identification of 6 additional susceptibility loci for Crohn’s disease in Koreans. Inflamm Bowel Dis 2015;21:1–7.

14. Ye BD, Choi H, Hong M, et al. Identification of ten additional susceptibility loci for ulcerative colitis through immunochip analysis in Koreans. Inflamm Bowel Dis 2016;22:13–9.

15. Hong M, Ye BD, Yang SK, et al. Immunochip meta-analysis of inflammatory bowel disease identifies three novel loci and four novel associations in previously reported loci. J Crohns Colitis 2018;12:730–41.

16. Asano K, Matsushita T, Umeno J, et al. A genome-wide association study identifies three new susceptibility loci for ulcerative colitis in the Japanese population. Nat Genet 2009;41:1325–9.

17. Yamazaki K, Umeno J, Takahashi A, et al. A genome-wide association study identifies 2 susceptibility loci for Crohn’s disease in a Japanese population. Gastroenterology 2013;144:781–8.

18. Yang SK, Hong M, Oh H, et al. Identification of loci at 1q21 and 16q23 that affect susceptibility to inflammatory bowel disease in Koreans. Gastroenterology 2016;151:1096–9.e4.

19. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 2009;5:e1000529.

20. Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods 2011;9:179–81.

21. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007;39:906–13.

22. Bulik-Sullivan BK, Loh PR, Finucane HK, et al.; Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015;47:291–5.

23. Liu JZ, Tozzi F, Waterworth DM, et al.; Wellcome Trust Case Control Consortium. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet 2010;42:436–40.

24. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol 2015;11:e1004219.

25. Westra HJ, Peters MJ, Esko T, et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat Genet 2013;45:1238–43.

26. GTEx consortium. Genetic effects on gene expression across human tissues. Nature 2017;550:204–13.

27. Ishigaki K, Kochi Y, Suzuki A, et al. Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat Genet 2017;49:1120–5.

28. Jung S, Liu W, Baek J, et al. Expression quantitative trait loci (eQTL) mapping in Korean patients with Crohn’s disease and identification of potential causal genes through integration with disease associations. Front Genet 2020;11:486.

29. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.

30. Choi SW, O’Reilly PF. PRSice-2: polygenic risk score software for biobank-scale data. Gigascience 2019;8:1–6.

31. Fernandes I, Bastien Y, Wai T, et al. Ligand-dependent nuclear receptor corepressor LCoR functions by histone deacetylase-dependent and -independent mechanisms. Mol Cell 2003;11:139–50.

32. Wong K, Park HT, Wu JY, Rao Y. Slit proteins: molecular guidance cues for cells ranging from neurons to leukocytes. Curr Opin Genet Dev 2002;12:583–91.

33. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 2012;40:W452–7.

34. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248–9.

35. Madej MG, Dang S, Yan N, Kaback HR. Evolutionary mix-and-match with MFS transporters. Proc Natl Acad Sci U S A 2013;110:5870–4.

36. Yu R, Broady R, Huang Y, et al. Transcriptome analysis reveals markers of aberrantly activated innate immunity in vitiligo lesional and non-lesional skin. PLoS One 2012;7:e51040.

37. Crawford NG, Kelly DE, Hansen MEB, et al. Loci associated with skin pigmentation identified in African populations. Science 2017;358:eaan8433.

38. Cunningham F, Achuthan P, Akanni W, et al. Ensembl 2019. Nucleic Acids Res 2019;47:D745–51.

39. Aftab S, Semenec L, Chu JS, Chen N. Identification and characterization of novel human tissue-specific RFX transcription factors. BMC Evol Biol 2008;8:226.

40. Huang Q, Whitington T, Gao P, et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nat Genet 2014;46:126–35.

41. Spisák S, Lawrenson K, Fu Y, et al. CAUSEL: an epigenome- and genome-editing pipeline for establishing function of noncoding GWAS variants. Nat Med 2015;21:1357–63.

42. Gehart H, van Es JH, Hamer K, et al. Identification of enteroendocrine regulators by real-time single-cell differentiation mapping. Cell 2019;176:1158–73.e16.

43. Piccand J, Vagne C, Blot F, et al. Rfx6 promotes the differentiation of peptide-secreting enteroendocrine cells while repressing genetic programs controlling serotonin production. Mol Metab 2019;29:24–39.

44. Lee HS, Lee SB, Kim BM, et al. Association of CDKN2A/CDKN2B with inflammatory bowel disease in Koreans. J Gastroenterol Hepatol 2018;33:887–93.

45. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51:584–91.

46. Lambert SA, Abraham G, Inouye M. Towards clinical utility of polygenic risk scores. Hum Mol Genet 2019;28:R133–42.

47. Chen MH, Raffield LM, Mousas A, et al.; VA Million Veteran Program. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 2020;182:1198–213.e14.

48. Han B, Akiyama M, Kim KK, et al. Amino acid position 37 of HLA-DRβ1 affects susceptibility to Crohn’s disease in Asians. Hum Mol Genet 2018;27:3901–10.

Author notes

These authors contributed equally to this study.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)