A multi-ancestry genome-wide association study in type 1 diabetes

Abstract Type 1 diabetes (T1D) is an autoimmune disease caused by destruction of the pancreatic β-cells. Genome-wide association (GWAS) and fine mapping studies have been conducted mainly in European ancestry (EUR) populations. We performed a multi-ancestry GWAS to identify SNPs and HLA alleles associated with T1D risk and age at onset. EUR families (N = 3223), and unrelated individuals of African (AFR, N = 891) and admixed (Hispanic/Latino) ancestry (AMR, N = 308) were genotyped using the Illumina HumanCoreExome BeadArray, with imputation to the TOPMed reference panel. The Multi-Ethnic HLA reference panel was utilized to impute HLA alleles and amino acid residues. Logistic mixed models (T1D risk) and frailty models (age at onset) were used for analysis. In GWAS meta-analysis, seven loci were associated with T1D risk at genome-wide significance: PTPN22, HLA-DQA1, IL2RA, RNLS, INS, IKZF4-RPS26-ERBB3, and SH2B3, with four associated with T1D age at onset (PTPN22, HLA-DQB1, INS, and ERBB3). AFR and AMR meta-analysis revealed NRP1 as associated with T1D risk and age at onset, although NRP1 variants were not associated in EUR ancestry. In contrast, the PTPN22 variant was significantly associated with risk only in EUR ancestry. HLA alleles and haplotypes most significantly associated with T1D risk in AFR and AMR ancestry differed from that seen in EUR ancestry; in addition, the HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 haplotype was ‘protective’ in AMR while HLA-DRB1*08:01-DQA1*04:01-DQB1*04:02 haplotype was ‘risk’ in EUR ancestry, differing only at HLA-DRB1*08. These results suggest that much larger sample sizes in non-EUR populations are required to capture novel loci associated with T1D risk.


Introduction
Type 1 diabetes (T1D) is a common autoimmune disease in which the destruction of pancreatic β-cells results in the eventual inability of the body to produce insulin [1][2][3][4].Without insulin, there is accumulation of glucose in the bloodstream and an inability for glucose to enter cells for production of energy.Progression of glucose accumulation leads to blood vessel and organ damage from dehydration, conversion of tissue to ketones for alternative energy sources, and life-threatening diabetic ketoacidosis [5][6][7][8].Thus, external sources of insulin are necessary for survival.
There are many risk factors associated with the development of T1D, including both genetic and generally unknown environmental factors [9][10][11].As a disease of autoimmunity, T1D has been characterized by three specific stages [12]: Stage 1 represents the transition from normal glucose homeostasis in an individual with variable genetic and other risk factors to production of multiple islet autoantibodies but with glucose levels in the normal range; Stage 2 includes individuals who have multiple islet autoantibodies but with glucose levels exceeding normal range (e.g.fasting plasma glucose ≥ 100 mg/dl or ≥ 5.6 mmol/l); with Stage 3 representing clinically diagnosed T1D.Each of these stages of T1D diabetes may have overlapping, as well as distinct, genetic, and non-genetic risk factors.T1D has historically been thought to be a disease of childhood (known previously as juvenile-onset diabetes mellitus) and restricted to people of European ancestry; however, T1D occurs in individuals of all ages and ancestry groups [13][14][15][16][17].
The genetic basis of T1D is well-established.Twin and family studies have estimated the genetic contribution to T1D in childhood as roughly 50%.Early studies focused on the HLA region [18][19][20] and identified the contribution of the genes, alleles and haplotypes of the human Major Histocompatibility Complex (MHC) that have large effects on risk, specifically the HLA class I genes (-A, -B, and -C) and the HLA class II genes (-DRB1, -DQA1, -DQB1, -DPA1, -DPB1) [19][20][21][22].Subsequent studies utilized candidate gene approaches (related to the immune response) with few (typically functional) genetic variants in small numbers of cases, controls, and families.The insulin gene (INS) variable number of tandem repeat (VNTR) polymorphism was identified through a candidate gene study focused on INS due to its direct impact on insulin metabolism and implication in risk of T1D [23].Several additional loci were discovered containing variants with large effect that were in coding or promoter regions in candidate genes, such as PTPN22 [24] and CTLA4 [25].
With the development of rapid genotyping technology and ability to assemble large numbers of cases with T1D and controls, genome-wide association studies (GWAS) became a common tool to detect genetic variants associated with disease and risk factors [26].Initial GWAS studies identified loci that had statistical power for the detection of common variants (minor allele frequency (MAF) > 0.05) with large effect (OR > 1.5) [26], there was recognition that much of the genetic impact on common human disease would have much smaller effects, requiring significantly increased sample size.The Type 1 Diabetes Genetics Consortium (T1DGC) was formed to conduct genome-wide analyses in affected sib-pair families (linkage) and case-control collections (association) to individually increase sample size and to conduct meta-analyses in T1D [27].Following GWAS, additional genotyping for fine mapping required development of a custom array (ImmunoChip) to better interrogate regions of interest across the genome [28,29].Currently, GWAS meta-analyses and fine mapping studies [29,30] have identified over 100 loci associated with risk of T1D.
A major limitation in most human genetic studies has been a focus on populations of European (EUR) ancestry [31][32][33].Despite increasing recognition of the benefit of including diversity in discovery, scientific equity, and reducing health disparities from applications of genetic medicine, there has yet to be a GWAS in T1D that includes large sample sizes in non-EUR ancestry populations.This absence of genetic diversity represents a major gap, particularly with the increasing global incidence and prevalence of T1D [34][35][36].In this report, we conduct a multi-ancestry GWAS meta-analysis in cases, controls, and families with T1D for discovery of genetic variants, detection of novel ancestry-specific HLA variants and amino acid residues that are associated with risk of T1D as well as the age at onset of disease.

Results
A total of 13 412 individuals were included in this study, with 6648 having T1D and 52% female.After genotype quality control, 430 930 variants were included for imputation.Each individual's genetic ancestry was inferred by using multi-dimensional scaling (MDS), where T1DGC samples were projected to 1000 Genome phase-3 reference panel (see Methods/Population stratification for details).Each participant was assigned to one of three genetic ancestry groups-African (AFR), Admixed (AMR) and European (EUR).Genotypes were imputed using the TOPMed multi-ancestry reference panel.After additional SNP quality control, 13 777 800 (AFR), 8 952 895 (AMR), and 8 500 361 (EUR pseudo case-control) variants (MAF > 0.01) remained for association analyses.Association analyses were conducted separately on individual ancestry groups (409 AFR cases, 482 AFR controls; 153 AMR cases, 155 AMR controls; and 3428 pseudo-cases and 3428 pseudo-controls generated from affected sibpair families).Across ancestry groups, the average age at onset of T1D ranges from 8.3-11.0years, with more females than males (67% in AFR, 58% in AMR and 51% in EUR) (Supplementary Table 1, Supplementary Fig. 1).

Genome-wide association analysis of T1D age at onset across diverse ancestry
AFR (N = 891), and AMR (N = 308) individuals self-reported age at onset of T1D in cases and age at enrollment for controls.In families that generated pseudo case-controls, the family members (N = 6840) self-reported their age at onset of T1D or age at enrollment.Censored time-to-event models were used for analysis of each ancestry group.There was no evidence of systematic bias (λ GC = 1.01) after excluding SNPs in the MHC region.Meta-analysis of the three ancestry groups for T1D age at onset revealed four regions that attained genome-wide significance, all established T1D risk loci: 1p13.2 (PTPN22), 6p21.32 (HLA-DQB1), 11p15.5 (INS), and 12q13.2(ERBB3) (Table 2; Supplementary Fig. 2C and D, Supplementary Fig. 3G-I).Among the non-HLA region SNPs, rs689 in the INS locus had the strongest association with age at onset (HR = 1.45,P = 2.81 × 10 −28 ), like its association with T1D risk (OR = 1.81,P = 2.34 × 10 −45 ).Although the association with T1D age at onset was weaker, the trend was consistent with disease risk.Meta-analysis revealed that rs11171747, near ERBB3 in the 12q13.2region, was the most significantly associated SNP with T1D age at onset (HR = 1.17,P = 1.20 × 10 −8 ).This finding suggests that different (statistically The number of cases and controls used in the analyses and the ancestry-specific results are provided in Supplementary Tables 1 and 2  independent) SNPs in the IKZF4-RPS26-ERBB3 locus may be associated with T1D risk and age at onset.Two regions associated with T1D risk also reached suggestive evidence of association with T1D age at onset: 10q23.31(RNLS) and 12q24.12(SH2B3).The SNP in RNLS (rs2018705, HR = 1.18,P = 1.11 × 10 −7 ) had similar direction of effect for T1D risk.The SNP with T1D in SH2B3 (rs1265564, HR = 1.16,P = 1.38 × 10 −7 ) is in moderate LD (r 2 ∼ 0.60) with rs597808, the variant identified as associated with T1D risk.The 18p11.21 (PTPN2) locus reached suggestive significance for association with T1D age at onset but not T1D risk in these data, although the PTPN2 locus has been established previously as a T1D risk locus [39,40].In the NRP1 locus, the same variant associated with T1D risk and with T1D age at onset only reached genome-wide significance in the meta-analysis of AFR and AMR ancestry subjects (rs722988, HR AFR_AMR = 1.41,P AFR_AMR = 2.41 × 10 −8 ) (Supplementary Fig. 4B).

HLA region class II gene and haplotype analysis in T1D
The association analyses of T1D with the HLA region included AFR ancestry (409 cases and 482 controls), AMR ancestry (153 cases, 155 controls), and EUR ancestry (2970 pseudo-cases, 2970 pseudocontrols).After imputation, the HLA region contained 20 329 variants for AFR, 20 376 variants for AMR, and 20 279 variants for EUR).Classical HLA alleles and HLA gene amino acid sequences were imputed from SNP data (Methods).In AFR and AMR ancestry groups, the most significantly associated allele for both T1D risk and age at onset was HLA-DQA1 * 03:01.In the EUR ancestry group, however, the HLA-DQB1 * 03:02 allele was most associated with both T1D risk and age at onset.When we evaluated amino acid residues in HLA genes, the established HLA-DQB1 amino acid position 57 was associated most strongly with both T1D risk (Fig. 1) and age at onset (Fig. 2) across the three ancestry groups.

Enrichment of T1D genes across autoimmune diseases
The coexistence of T1D with other immune-mediated diseases was documented extensively through clustering in families in part due to similarity of HLA associations with disease.Recent evidence of genetic similarity of autoimmune diseases has been shown in analysis of targeted array data and not genome-wide analysis.We utilized GWAS data to compare enrichment of T1D-annotated genes against GWAS-catalog reported genes in autoimmune diseases (Fig. 3) using FUMA software [43].We identified 12 autoimmune diseases, including T1D that shared annotated genes.The most significant overlap of identified T1D genes in autoimmune diseases was with alopecia areata (AA; P = 3.62 × 10 −16 ).The overlap between T1D and AA loci was driven, in part, by common susceptibility variants within PTPN22, IL2RA, IKZF4, ERBB3 and SH2B3.In EUR ancestry populations, previous gene set enrichment analysis (using the fine-mapping ImmunoChip array) with T1D showed a similar overlap with diseases that have characteristic tissue autoantibodies; e.g.AA, juvenile idiopathic arthritis (JIA), and rheumatoid arthritis (RA) [28].

Discussion
This is the first genome-wide association scan of T1D in populations of diverse ancestry.While previous studies utilized targeted genotyping arrays (ImmunoChip), we demonstrated that no additional loci could be identified genome-wide in EUR and other ancestry populations.Although, the study contains the largest sample sizes in AFR and AMR populations to date, much larger sample sizes will be required to fully characterize the genome with respect to T1D risk.We identified seven regions at genome-wide significance associated with risk and four regions at genome-wide significance associated with T1D age at onset, with some regions exhibiting ancestry-specific effects.In the HLA region, we determined specific HLA class I and class II alleles and amino acids associated with T1D risk and age at onset within and across ancestry, as well as ancestry-specific HLA haplotypes.We identified seven associated HLA haplotypes in the AFR ancestry group, two in the AMR (Hispanic/Latino) ancestry group, and nineteen HLA haplotypes significantly associated with risk in the EUR ancestry group.
As expected, the strongest non-HLA regions associated with T1D risk and age at onset were seen at the known INS and PTPN22 loci.The most associated variant (rs2476601, C1858T, R620W, in complete LD with rs6679677) in the PTPN22 locus was only supported by EUR ancestry.The association between PTPN22 rs2476601 and T1D was first documented in a case-control study of non-Hispanic white individuals from North America and Sardinia [24].Subsequent family and case-control studies in numerous European populations [37,[44][45][46][47] confirmed its association with T1D in this locus, showing high frequencies of rs2476601 in northern European populations and decreased frequencies in southern European and Sardinian populations [48].The PTPN22 rs2476601 variant has been shown to be rare in African and Asian populations [48][49][50], supporting our results of reduced association in non-EUR ancestry individuals.In addition, it has been shown that the PTPN22 rs2476601 SNP is associated with earlier age at onset of T1D in the European ancestry population [51].We identified an association of the NRP1 locus with T1D risk and age at onset in non-EUR ancestry (AFR and AMR groups) but not in the EUR ancestry group.This locus has been identified in a large fine-mapping study involving diverse ancestry individuals [29] and GWAS of a large European ancestry cohort [30].Together, the NRP1 locus may represent a region more associated with T1D risk in non-European ancestry populations, as the effect sizes are stronger in our AFR and AMR groups; however, the direction of effect is the same in all ancestry groups.In addition, the strongest associated variant in the NRP1 locus was associated with earlier T1D age at onset.Previously, two SNPs located in intron 9 of NRP1 were shown to be associated with T1D [52], with the strength of association stronger in children with onset before age 10 years and/or in children who had a parent with T1D.In addition, an NRP1 isoform in pancreatic islets has been shown to be associated with a very young age at onset of T1D [53].Cells containing the truncated version of neuropilin-1 protein (encoded by NRP1) are devoid of insulin, resulting in the development of T1D at a very early age.In those individuals with onset of T1D before age 4 years, the frequency of the minor allele (T) of the NRP1 intron 9 variant (rs2070303) is increased when compared with those having an older age at onset.
This study has several strengths, including use of underrepresented populations (African and admixed ancestry), the genome-wide coverage of variants, and imputation to increase SNP density.In addition, the detailed interrogation of the HLA region provides novel information on risk associations not only of HLA alleles but also amino acid residues and HLA haplotypes.There are, however, some limitations of the study, including the small number of samples compared with other genomic studies in European ancestry (despite having the largest non-EUR ancestry T1D genome-wide data to date).The relatively small number of samples in total may explain the number of T1D-associated loci and failure to identify new regions that would likely have small effect sizes.Despite these limitations, it is important to recognize potential ancestry-specific effects on the risk of T1D, thereby better defining the genetic landscape of T1D risk in non-European ancestry populations, particularly in the HLA region as it represents a major risk locus for T1D.
Our multi-ancestry GWAS analysis enabled the discovery of genetic variants, novel ancestry-specific HLA variants and amino acid residues associated with risk of T1D and age at T1D onset.Including subjects with diverse genetic ancestry revealed that the NRP1 region exhibits a stronger effect on T1D risk and age at onset in non-European ancestry populations.The association of NRP1 locus and specific HLA haplotypes, in addition to PTPN22, differentiate the risk of T1D between individuals of European and non-European ancestries.These results suggest that further increasing sample diversity can provide better understanding of genetic risk factors contributing to T1D in global populations.This genetic diversity could help improve population-specific genetic risk scores application to identify individuals at high genetic risk of T1D and provide opportunities for islet autoantibody screen for eligibility into early intervention trials (e.g.teplizumab) to delay or prevent disease onset.

Participants
We obtained DNA samples from 13 412 subjects recruited by the T1DGC (Supplementary Table 1).The study population consisted of 12 213 individuals from affected sib-pair and trio families.The majority of individuals from families were of European (EUR) ancestry (10501).The case-control series consisted of 891 unrelated individuals of AFR ancestry (409 T1D cases, 482 controls) and 308 individuals of AMR ancestry (153 T1D cases, 155 controls).

Genotyping and quality control
All samples were genotyped according to the manufacturer's protocol using the Illumina Infinium CoreExome BeadChip in the Genome Sciences Laboratory at the University of Virginia.Raw genotyped data were subjected to SNP-level and samplelevel quality control (Supplementary Fig. 1) using KING software (version 2.2.8 [55]).For sample level quality control, we utilized chromosome X heterozygosity and chromosome Y absence to identify DNA samples that had discordant results between genetic sex and self-reported sex, these samples were treated as errors in processing and were removed.Additionally, we removed samples with a genotype call rate < 95% and evidence of Mendelian inconsistences (MI, for families).Pedigree sample relationships were updated using KING software [55].From samples passing initial quality control metrics, the following filters were applied for variants: removal of monomorphic SNPs, removal of SNPs with call rates < 95%, removal of SNPs with Mendelian inconsistencies in > 1% of the parent-offspring pairs and trios, and removal of SNPs significantly deviating from Hardy-Weinberg Equilibrium (P < 1.0 × 10 −6 [P < 1.0 × 10 −20 for MHC region]).SNPs with MAF < 0.01 were not included in the analysis.Variant quality control included removal of SNPs that were not mapped uniquely to the genome (e.g.exm244817).Additional sample quality control included removal of family members with T1D or case samples with age at onset of zero, and removal of family samples with onset of diabetes greater than 32 years and with at least one affected parent (suggestive of misdiagnosis of T1D with maturityonset diabetes of the young, MODY).All DNA samples were collected after approval from relevant institutional research ethics committees and appropriate informed consent was obtained from all subjects and families.

Generation of a pseudo case-control sample
In the T1DGC, the family collection was ascertained specifically to include affected sib-pairs [56] followed by the collection of unrelated individuals with T1D and controls.As the analytic method (logistic mixed models) is not robust to the targeted ascertainment of affected sib-pair families, a series of pseudocases and pseudo-controls was generated [57] for application of the logistic mixed and frailty models (SAIGE and GATE).Within each family and for each SNP, the alleles transmitted from each parent to an affected child constituted the "pseudo case"; similarly, alleles not transmitted from each parent to an affected

Table 3 .
Association of MHC class II haplotypes with T1D in pseudo case-control European, African and admixed ancestry individuals.

.
The extended results for ancestry-specific HLA haplotype analyses results are provided in Supplementary Tables3-5.Control AF-haplotype frequency in controls.Case AF-haplotype frequency in cases.OR-odds ratio.P-P-value.a OR adjusted.

Figure 1 .
Figure 1.HLA alleles and amino acids associated with T1D risk in AFR (A), AMR (B), EUR (C) and meta-analysis (D).HLA class I and HLA class II genes are labeled on the x-axis.The y-axis represents −log 10 (P-value).The horizontal dashed line represents the threshold for genome-wide significance.SNPs are represented by grey diamonds, HLA alleles by yellow diamonds, and HLA amino acids by red diamonds.(A) T1D risk associations within the HLA region in AFR ancestry individuals.(B) T1D risk associations within the HLA region in AMR ancestry individuals.(C) T1D risk associations within the HLA region in EUR ancestry individuals.(D) T1D risk associations within the HLA region in meta-analysis of AFR, AMR and EUR ancestry individuals.

Figure 2 .
Figure 2. HLA alleles and amino acids associated with T1D age at onset in AFR (A), AMR (B), EUR (C) and meta-analysis (D).HLA class I and HLA class II genes are labeled on the x-axis.The y-axis represents −log 10 (P-value).The horizontal dashed line represents the threshold for genome-wide significance.SNPs are represented by grey diamonds, HLA alleles by yellow diamonds, and HLA amino acids by red diamonds.(A) T1D age at onset associations within the HLA region in AFR ancestry individuals.(B) T1D age at onset associations within the HLA region in AMR ancestry individuals.(C) T1D age at onset associations within the HLA region in EUR ancestry individuals.(D) T1D age at onset associations within the HLA region in meta-analysis of AFR, AMR and EUR ancestry individuals.

Table 2 .
Lead variants associated with T1D age at onset (GATE).
The number of cases used in the analyses of age at onset is provided in Supplementary Table1.Genome Build: GRCh38.A1-effect allele.A2-alternative allele.AF_A1-allele frequency of the effect allele.HR-hazard ratio.BETA-effect size.SE-standard Error of BETA.P-P-value for meta-analysis of 3 ancestries.Direction-indication of risk (+), protection (−) or missing SNP (?) for AFR, AMR and EUR.χ 2 -heterogeneity statistic from Cochran's test.P-value-heterogeneity P-value.Gene-the nearest or candidate gene.