-
PDF
- Split View
-
Views
-
Cite
Cite
Genevieve H L Roberts, Stephanie A Santorico, Richard A Spritz, Deep genotype imputation captures virtually all heritability of autoimmune vitiligo, Human Molecular Genetics, Volume 29, Issue 5, 1 March 2020, Pages 859–863, https://doi.org/10.1093/hmg/ddaa005
- Share Icon Share
Abstract
Autoimmune vitiligo is a complex disease involving polygenic risk from at least 50 loci previously identified by genome-wide association studies. The objectives of this study were to estimate and compare vitiligo heritability in European-derived patients using both family-based and ‘deep imputation’ genotype-based approaches. We estimated family-based heritability (h2FAM) by vitiligo recurrence among a total 8034 first-degree relatives (3776 siblings, 4258 parents or offspring) of 2122 unrelated vitiligo probands. We estimated genotype-based heritability (h2SNP) by deep imputation to Haplotype Reference Consortium and the 1000 Genomes Project data in unrelated 2812 vitiligo cases and 37 079 controls genotyped genome wide, achieving high-quality imputation from markers with minor allele frequency (MAF) as low as 0.0001. Heritability estimated by both approaches was exceedingly high; h2FAM = 0.75–0.83 and h2SNP = 0.78. These estimates are statistically identical, indicating there is essentially no remaining ‘missing heritability’ for vitiligo. Overall, ~70% of h2SNP is represented by common variants (MAF > 0.01) and 30% by rare variants. These results demonstrate that essentially all vitiligo heritable risk is captured by array-based genotyping and deep imputation. These findings suggest that vitiligo may provide a particularly tractable model for investigation of complex disease genetic architecture and predictive aspects of personalized medicine.
Introduction
Vitiligo is an autoimmune disease in which patches of depigmented skin and hair result from progressive destruction of skin melanocyte. Vitiligo affects ~0.4% of the European population (1) and often presents with other concomitant autoimmune diseases. Vitiligo is genetically complex, involving polygenic risk from at least 50 susceptibility loci identified by genome-wide association studies (GWAS) of European-derived white subjects, as well as environmental triggers that remain unknown (2).
The concept of predictive personalized medicine for complex diseases such as vitiligo rests on two premises: (1) that disease heritability (h2) is relatively high, and (2) that a large fraction of genomic variants that contribute to disease h2 can be captured and identified. However, for most complex diseases, neither of these premises has yet been met. Previous classical pedigree-based analyses of vitiligo heritability (h2FAM) have yielded heritability estimates of 0.46 (3), 0.52 (4), 0.72 (5) and 0.77 (6) in Indian, Han Chinese, Egyptian and Paisa Colombian populations, respectively; however, no such analysis has been conducted in Europeans. Here, we have estimated vitiligo heritability in families of European-derived white descent. We then performed ‘deep’ imputation in our existing European vitiligo GWAS cohort and we used the resulting set of variants to estimate SNP-based heritability (h2SNP). We find that estimates of h2FAM and h2SNP are in remarkably close agreement, approximately 0.78–0.79, indicating that vitiligo h2 is relatively high in this population and that most vitiligo h2 can be captured by array-based genotyping with deep imputation, with little ‘missing heritability’. We thus suggest that vitiligo may represent an ideal model disease for further developing the predictive personalized medicine paradigm.
Materials and Methods
Subjects
All subjects were from North America and Europe and were of self-described European-derived white ancestry. All cases met diagnostic criteria for generalized vitiligo (7). We obtained written informed consent from all participants, and the study was approved by the institutional review board at each participating center with oversight from the Colorado Multiple Institutional Review Board (COMIRB).
Vitiligo family recurrence data were derived from epidemiological and family structure information provided by 2122 unrelated European-derived vitiligo probands (8). The 8034 reported first-degree relatives included 3776 siblings and 4258 parents or offspring.
Genotyped subjects consisted of 2812 unrelated vitiligo cases and 37 079 controls from our three previous vitiligo GWAS (9–11) (referred to in sum as GWAS123). Some individuals in GWAS123 were excluded based on pairwise identity-by-descent sharing (|$\hat\pi$|) ≥ 0.0884, with cases retained over controls and individuals with the highest SNP call rate retained otherwise; identity-by-descent analysis was performed using KING v1.4 (http://people.virginia.edu/∼wc9c/KING/manual.html).
Genotype imputation
Pre-imputation variant quality control (QC) procedures have been described previously (11). Each of the three GWAS cohorts was imputed separately using the Sanger Imputation Server (https://imputation.sanger.ac.uk/). Variants were imputed genome-wide both to the Haplotype Reference Consortium (12) release 1.1 (HRC; http://www.haplotype-reference-consortium.org/) and to the 1000 Genomes Project (13) phase 3 release (1KGP; http://www.internationalgenome.org/) using the same procedure: input genotypes were phased with EAGLE (14) v2.4 and were subsequently imputed with the Positional Burrows–Wheeler Transform (15) (PBWT). In total, each post-imputation GWAS dataset contained ~39 million variants from HRC imputation and ~82 million variants from 1KGP imputation. There were more variants from 1KGP because the 1KGP output included SNPs and indels, whereas the HRC output only included SNPs. Only the 22 autosomes were included.
Variant QC
From the combined set of HRC- and 1KGP-imputed variants, we selected variants with imputation INFO score >0.7 and reference panel MAF > 0.0001. We then combined the selected variants from all three GWAS cohorts, retaining only the intersection of variants present in all three cohorts. If a variant was present in both the HRC and 1KGP imputation sets, the HRC variant was retained. Within the combined cohorts, we performed a second round of QC, removing variants with individual missingness rate >0.05, Hardy–Weinberg equilibrium test P-value <1 × 10−5 and GWAS123 minor allele frequency (MAF) <0.0001. In total, 12 034 711 variants remained after filtering. Variant filtering procedures were carried out with PLINK (16) version 2.0 (https://www.cog-genomics.org/plink/2.0/).
For h2SNP estimation using GCTA-LDMS-I (17), variants are required to be stratified by MAF and LD-score as a measure to reduce bias in the estimate (18). We stratified variants into 15 different LD/MAF bins (Supplementary Material, Table S1). We first calculated individual SNP LD-scores with GCTA (19) version 1.91.6 (https://cnsgenomics.com/software/gcta/) on each chromosome separately. The following GCTA command was used to calculate the individual variant LD scores:
gcta64 —bfile [PLINK File by Chromosome] —ld-score-region 200 —thread-num 12 —chr [Chromosome Number] —out [Output Filename].
We subsequently sorted variants into one of three LD terciles: low, medium or high. Variants were then further stratified into one of five MAF bins ([0.0001–0.001), [0.001, 0.01), [0.01, 0.05), [0.05, 0.25), [0.25, 0.50]) with MAF estimated from GWAS123 cases and controls. Variants from all chromosomes were combined. The resulting 15 LD/MAF bins were used to calculate 15 genetic relationship matrices with GCTA version 1.91.6. The following GCTA command was used to create each GRM:
gcta64 —extract [Variant List by Bin] —thread-num 12 —make-grm —out [Output Filename].
Heritability analyses
Falconer’s formulas (20, see Box 3) were used to calculate h2FAM based on population-specific vitiligo recurrence rates from our previous large-scale study of 2122 European-derived vitiligo probands and their families (8). Falconer’s approach accounts for ascertainment of probands.
To calculate h2SNP, we used GCTA-LDMS-I (17). We used the 15 genetic relationship matrices described in the previous section as input variance components. Fourteen ancestry-based principal components described previously (21) were included as fixed effects. The following GCTA version 1.91.6 command was used to fit the model:
gcta64 —reml —mgrm [List of 15 GRM Filenames] —prevalence 0.004 —qcovar [14 Principal Components] —threads 12 —pheno [Phenotype File] —out [Output Filename].
Comparing h2FAM to h2SNP

Distribution of SNPs, h2SNP and average h2SNP explained for vitiligo. (A) Distribution of genotyped and imputed variants included in h2SNP estimation. All included variants were either genotyped or had imputation INFO > 0.70 in all three GWAS cohorts. The 15 variant bins were created by stratifying first by LD-score tercile, then by MAF group. (B) h2SNP explained by each LD/MAF variant bin. (C) Average heritability explained per variant in each LD/MAF bin. For each bin, the h2SNP estimate was divided by the total number of variants in the MAF bin. For (A) and (C), the color key is the same as in (A).
Results
Estimation of family-based vitiligo heritability (h2FAM)
To estimate family-based vitiligo heritability (h2FAM), we used family recurrence risks derived from our previous large-scale analysis of first-degree relatives of 2122 European-derived white vitiligo probands (7). These family recurrence risks were 0.061 in siblings (nsiblings = 3776), 0.078 in parent-offspring pairs (nparent–offspring = 4258) and 0.070 in all first-degree relatives (nall first-degree = 8034) (7). We calculated h2FAM based on these family recurrence risks using Falconer’s formula (19), yielding h2FAM estimates of 0.75 (SE = 0.039) based on siblings, 0.83 (SE = 0.029) based on parent–offspring pairs and 0.79 (SE = 0.023) based on all first-degree relatives. These three estimates are in reasonably close agreement.
Estimation of SNP-based vitiligo heritability (h2SNP)
To estimate h2SNP captured by dense genotyping arrays with deep imputation, we used European-derived vitiligo cases (n = 2812) and controls (n = 37 079) who had been genotyped in one of our three previous vitiligo GWAS (9–11). We performed genome-wide imputation to both the HRC and 1KGP and combined all variants that were genotyped or accurately imputed, yielding a total ~12 million variants with MAF ≥ 0.0001. We stratified variants into three LD-score terciles (low, medium and high) and five MAF bins, resulting in a total of 15 bins. The total numbers of accurately imputed variants included in each LD/MAF bin are shown in Figure 1A. We jointly estimated the h2SNP explained by each of the 15 LD/MAF variant bins with GCTA-LDMS-I (17) (Fig. 1B). Summing the h2SNP estimates from all 15 variant bins, total estimated vitiligo h2SNP is 0.78 (SE = 0.040). We tested whether this estimate of h2SNP differs significantly from any of the three h2FAM estimates described above and found no significant differences (Psiblings = 0.59; Pparent–offspring = 0.31; Pall first-degree = 0.83), indicating that most or all of h2FAM is captured by h2SNP with deep imputation (Fig. 2).

Comparison of three h2FAM estimates and h2SNP. Each bar represents a total narrow-sense heritability estimate for sibling-based h2FAM (green), parent–offspring-based h2FAM (orange), all first-degree relative-based h2FAM (blue) and GCTA-LDMS-i h2SNP (pink). The errors represent a 95% confidence interval for each estimate.
Discussion
We estimate that in European-derived white vitiligo h2FAM ranges from 0.75 to 0.83 and that h2SNP is 0.78 based on array-based genotyping and deep genotype imputation. These estimates are quite high among complex diseases and furthermore are remarkably similar. Indeed, there is no significant difference between the h2SNP estimate and any of the h2FAM estimates, suggesting that most or all vitiligo h2 is captured by array-based genotyping and deep imputation.
In a previous, more limited study (22), we estimated that vitiligo h2SNP was ~0.55 based on array-based genotyping and imputation only of variants with MAF > 0.01. In that analysis, imputed variants with MAF < 0.01 were excluded due to the generally low quality of imputation for rarer variants using earlier smaller reference panels. As a result, in that analysis, almost one-third of total expected vitiligo heritability based on h2FAM thus appeared to be ‘missing’. Here, after deep imputation using far larger reference panels, we now find that common variants (MAF ≥ 0.01) contribute ~0.55 and rare (MAF < 0.01) variants contribute ~0.23 to the total h2SNP of 0.78. Thus, newly imputable rare variants explain nearly all previously ‘missing’ vitiligo heritability.
Rare variants in aggregate thus appear to contribute ~30% to overall h2SNP. However, the average contribution of an individual rare variant (i.e. the h2SNP attributable to all rare variants divided by the total number of rare variants) is 5.3 × 10−8, roughly 25% smaller than the average contribution of any individual common variant (7.2 × 10−8h2SNP/variant) (Fig. 1C). Practically, this may mean that, while rare variants that contribute to vitiligo risk are now imputable, their individual effects on risk may not be sufficiently large to detect by rare-variant association tests at current vitiligo GWAS sample sizes. To identify individual rare variants that contribute to vitiligo susceptibility, it may thus be necessary to substantially increase GWAS sample size or to use alternative study designs such as family studies.
Recently, Wainschtein and coworkers (23) reported that h2FAM for height and body mass index is fully recovered by whole-genome sequencing in a large cohort. However, array genotyping coupled with deep imputation did not achieve full recovery of h2FAM for these traits (17). Together, the facts that vitiligo heritability is quite high, that virtually all vitiligo heritability estimated from family studies (h2FAM) is captured by array-based genotyping and deep imputation (h2SNP), that about 70% of h2SNP is represented by common variation detected with a relatively modest-sized sample and that the predictive accuracy of a vitiligo composed of largely common variants is relatively high (21) suggest that vitiligo may provide a particularly tractable model for investigations of predictive aspects of personalized medicine.
Data Availability
Case genotype and phenotype data for GWAS1, GWAS2 and GWAS3 subjects have been deposited in the Database of Genotypes and Phenotypes (dbGaP) under accession numbers phs000224.v1.p1, phs000224.v2.p1, phs000224.v3.p2 and phs000224.v4.p2 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000224.v3.p2). Genome-wide association study summary statistics have been deposited in the NHGRI-EBI Catalog of published GWAS (https://www.ebi.ac.uk/ega/studies/phs000224.v1.p1).
Funding
National Institutes of Health (grants R01AR056292, R01AR057212, R01AR065951, P30AR057212); National Institutes of Health (training grant T32 AR007411-31A1 to G.H.L.R.).
Conflict of Interest statement. None declared.