-
PDF
- Split View
-
Views
-
Cite
Cite
Julian Hecker, Sung Chun, Ahmad Samiei, Cuining Liu, Cecelia Laurie, Priyadarshini Kachroo, Sharon M Lutz, Sanghun Lee, Albert V Smith, Jessica Lasky-Su, Michael H Cho, Sunita Sharma, Manuel Enrique Soto Quirós, Lydiana Avila, Juan C Celedón, Benjamin Raby, Xiaobo Zhou, Edwin K Silverman, Dawn L DeMeo, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Christoph Lange, Scott T Weiss, FGF20 and PGM2 variants are associated with childhood asthma in family-based whole-genome sequencing studies, Human Molecular Genetics, Volume 32, Issue 4, 15 February 2023, Pages 696–707, https://doi.org/10.1093/hmg/ddac258
- Share Icon Share
Abstract
Asthma is a heterogeneous common respiratory disease that remains poorly understood. The established genetic associations fail to explain the high estimated heritability, and the prevalence of asthma differs between populations and geographic regions. Robust association analyses incorporating different genetic ancestries and whole-genome sequencing data may identify novel genetic associations.
We performed family-based genome-wide association analyses of childhood-onset asthma based on whole-genome sequencing (WGS) data for the ‘The Genetic Epidemiology of Asthma in Costa Rica’ study (GACRS) and the Childhood Asthma Management Program (CAMP). Based on parent–child trios with children diagnosed with asthma, we performed a single variant analysis using an additive and a recessive genetic model and a region-based association analysis of low-frequency and rare variants.
Based on 1180 asthmatic trios (894 GACRS trios and 286 CAMP trios, a total of 3540 samples with WGS data), we identified three novel genetic loci associated with childhood-onset asthma: rs4832738 on 4p14 (|$P=1.72\ast{10}^{-9}$|, recessive model), rs1581479 on 8p22 (|$P=1.47\ast{10}^{-8}$|, additive model) and rs73367537 on 10q26 (|$P=1.21\ast{10}^{-8}$|, additive model in GACRS only). Integrative analyses suggested potential novel candidate genes underlying these associations: PGM2 on 4p14 and FGF20 on 8p22.
Our family-based whole-genome sequencing analysis identified three novel genetic loci for childhood-onset asthma. Gene expression data and integrative analyses point to PGM2 on 4p14 and FGF20 on 8p22 as linked genes. Furthermore, region-based analyses suggest independent potential low-frequency/rare variant associations on 8p22. Follow-up analyses are needed to understand the functional mechanisms and generalizability of these associations.
Introduction
Asthma is a common chronic respiratory disease (1) characterized by substantial clinical heterogeneity. Patients differ across a range of clinical manifestations and, despite strong evidence for genetic predisposition, asthma etiology remains poorly understood (2). Recent large-scale genetic association analyses showed that the genetic architectures of childhood-onset and adult-onset asthma are partly distinct, likely due to age-of-onset-dependent disease mechanisms (3,4). Furthermore, the prevalence of childhood-onset asthma differs between populations and geographic regions (2,5). Genetic association studies of asthmatic children that include multiple genetic ancestries could potentially facilitate the identification of the underlying biological mechanisms. Moreover, given the underlying differences between childhood-onset and adult-onset asthma as well as recall biases, targeted studies based on physician-diagnosed asthmatic children are required; biobank analyses are usually based on self-reported medical history (3,4).
In this work, we utilized whole-genome sequencing (WGS) data and performed genome-wide association analyses of childhood asthma based on two family-based studies: ‘The Genetic Epidemiology of Asthma in Costa Rica’ study (GACRS) (6) and the Childhood Asthma Management Program (CAMP) (7). For both studies, WGS data were generated as part of the NHLBI Trans-Omics for Precision Medicine (TOPMed) program (8). Given the family-based designs consisting of asthmatic trios (two parents and a child with asthma), we applied robust family-based association tests (FBATs) (9,10). We incorporated both the commonly used additive and a recessive model for the single variant analysis of common genetic variants. The latter was motivated by recent findings using recessive models for other complex diseases (11,12). Furthermore, WGS data allowed us to analyze region-based associations based on low-frequency and rare variants using a recently proposed FBAT framework (13,14). Our analysis incorporated a total of 3540 samples, representing 1180 asthmatic trios (894 GACRS trios and 286 CAMP trios). To the best of our knowledge, this is the largest WGS family-based association analysis of childhood-onset asthma to date.
Results
Baseline characteristics
After quality control, we kept WGS data for 894 GACRS and 286 CAMP trios. In Table 1, we report the baseline characteristics of the asthmatic offspring in both studies. For CAMP, these data reflect the randomization timepoint of the study. The two study populations possess overall comparable characteristics regarding the sex ratio, age distribution, parental asthma history and height. Differences are observed regarding lung function measurements and reported race. As the FBAT approach is robust against any confounding due to population substructure and admixture, the FBAT statistics can be computed for the combined sample of trios from GACRS and CAMP.
. | GACRS . | CAMPa . | Comparison test . |
---|---|---|---|
N | 894 | 286 | |
% Female | 40.9 | 32.5 | P = 0.013 |
Age (mean (SD) in years) | 9.28 (1.86) | 8.90 (2.16) | P = 0.008 |
Height (mean (SD) in meter) | 1.33 (0.12) | 1.34 (0.14) | P = 0.60 |
FEV1 (mean (SD) in liter) | 1.80 (0.52) | 1.65 (0.47) | P = 1.03e−05 |
FEV1% predicted (mean (SD)) | 98.87 (16.84) | 92.82 (13.40) | P = 1.17e−09 |
Maternal asthma history % yes | 30.0 | 25.4 | P = 0.16 |
Paternal asthma history % yes | 25.4 | 22.6 | P = 0.34 |
Race | Hispanic | 75% White/10% African American/7% Hispanic/8% other | - |
. | GACRS . | CAMPa . | Comparison test . |
---|---|---|---|
N | 894 | 286 | |
% Female | 40.9 | 32.5 | P = 0.013 |
Age (mean (SD) in years) | 9.28 (1.86) | 8.90 (2.16) | P = 0.008 |
Height (mean (SD) in meter) | 1.33 (0.12) | 1.34 (0.14) | P = 0.60 |
FEV1 (mean (SD) in liter) | 1.80 (0.52) | 1.65 (0.47) | P = 1.03e−05 |
FEV1% predicted (mean (SD)) | 98.87 (16.84) | 92.82 (13.40) | P = 1.17e−09 |
Maternal asthma history % yes | 30.0 | 25.4 | P = 0.16 |
Paternal asthma history % yes | 25.4 | 22.6 | P = 0.34 |
Race | Hispanic | 75% White/10% African American/7% Hispanic/8% other | - |
aCAMP data at randomization.
Comparison tests between GACRS and CAMP based on chi-squared or t-test, depending on the type of measurement.
. | GACRS . | CAMPa . | Comparison test . |
---|---|---|---|
N | 894 | 286 | |
% Female | 40.9 | 32.5 | P = 0.013 |
Age (mean (SD) in years) | 9.28 (1.86) | 8.90 (2.16) | P = 0.008 |
Height (mean (SD) in meter) | 1.33 (0.12) | 1.34 (0.14) | P = 0.60 |
FEV1 (mean (SD) in liter) | 1.80 (0.52) | 1.65 (0.47) | P = 1.03e−05 |
FEV1% predicted (mean (SD)) | 98.87 (16.84) | 92.82 (13.40) | P = 1.17e−09 |
Maternal asthma history % yes | 30.0 | 25.4 | P = 0.16 |
Paternal asthma history % yes | 25.4 | 22.6 | P = 0.34 |
Race | Hispanic | 75% White/10% African American/7% Hispanic/8% other | - |
. | GACRS . | CAMPa . | Comparison test . |
---|---|---|---|
N | 894 | 286 | |
% Female | 40.9 | 32.5 | P = 0.013 |
Age (mean (SD) in years) | 9.28 (1.86) | 8.90 (2.16) | P = 0.008 |
Height (mean (SD) in meter) | 1.33 (0.12) | 1.34 (0.14) | P = 0.60 |
FEV1 (mean (SD) in liter) | 1.80 (0.52) | 1.65 (0.47) | P = 1.03e−05 |
FEV1% predicted (mean (SD)) | 98.87 (16.84) | 92.82 (13.40) | P = 1.17e−09 |
Maternal asthma history % yes | 30.0 | 25.4 | P = 0.16 |
Paternal asthma history % yes | 25.4 | 22.6 | P = 0.34 |
Race | Hispanic | 75% White/10% African American/7% Hispanic/8% other | - |
aCAMP data at randomization.
Comparison tests between GACRS and CAMP based on chi-squared or t-test, depending on the type of measurement.
Genome-wide single variant association analysis
We performed power calculations for the single variant analysis based on an additive (multiplicative relative risk) and a recessive model (for minor and major allele). In Supplementary Material, Figure S1, we plotted the corresponding power curves based on a sample size of |$n=1180$| affected trios and established power formulas (15). These computations demonstrated that we have adequate power for common variants with substantial effect sizes.
In the combined GACRS + CAMP FBAT single variant analysis, we evaluated the additive association P-value for 9 330 567 variants. For the recessive model, we evaluated 7 444 202 association P-values (including a substantial number of variants with two recessive P-values, one for each allele, see section Materials and Methods). In Figure 1, we visualized the Miami plot for the additive and recessive analysis. In Supplementary Material, Figures S2 and S3, we plotted quantile-quantile-plots for the additive and recessive analyses. These plots demonstrate the validity of the analyses and show that the exclusion of the lead SNP in the well-established 17q21 locus (16), and a 500 kb flanking region, removes a substantial part of the inflation of the test statistics.

Miami plot for the GACRS + CAMP single variant analyses based on the additive (top) and recessive (bottom) model.
Considering both the results for the additive and recessive models, we observed four genetic loci with variants that reached a genome-wide significant association P-value |$P<5\ast{10}^{-8}$|. The well-established 17q21 locus was the most significant association in our additive GACRS + CAMP analysis (16). The additive GACRS + CAMP association P-value of the corresponding lead variant rs8076131 was |$P=5.35\ast{10}^{-11}$|. The corresponding association P-values in GACRS and CAMP were |$P=3.89\ast{10}^{-10}$| and |$P=0.0239$|, respectively (Supplementary Material, Table S4). Describing novel associations, variants in the 8p22 locus reached genome-wide significance in the additive GACRS + CAMP analysis. The lead variant rs1581479 obtained an association P-value of |$P=1.47\ast{10}^{-8}$| (Fig. 2, Table 2). The corresponding association P-values in GACRS and CAMP were |$P=1.83\ast{10}^{-4}$| and |$P=2.43\ast{10}^{-6}$|, respectively.

Region association plots for 8p22 based on the additive GACRS + CAMP analysis (LocusZoom, EUR LD). Reference variant rs1581479 (A) and reference variant rs17515041 (B). The LD between rs1581479 and rs17515041 is |${r}^2=0.1$| and |${r}^2=0.14$| in GACRS and CAMP, respectively (estimated based on parental genotypes). The LD pair tool estimated the LD between both variants to be |${r}^2=0.14$| and |${r}^2=0.11$| in EUR (European) and AMR (Ad Mixed American), respectively.
Novel genetic associations for childhood asthma in either GACRS + CAMP or GACRS/CAMP only, based on an additive or recessive model
SNP . | CHR . | BP (hg38) . | A1 . | A2 . | TEST ALLELE . | MAF GACRS + CAMP . | Z GACRS + CAMP . | P GACRS + CAMP . | Z GACRS . | P GACRS . | Z CAMP . | P CAMP . | cluster . | model . | colocalization . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs4832738a | 4 | 37 813 371 | G | A | A | 0.4964 | 6.03 | 1.72e−09 | 5.65 | 1.62e−08 | 2.22 | 2.62e−02 | 4p14 | recessive | eQTL: PGM2 in GTEx lung, fetal lung, immune cell types in eQTL Catalogue. Cluster contains meQTLs for cg20524452 and cg04038500 (blood). |
rs1581479 | 8 | 16 950 316 | C | T | C | 0.1801 | 5.67 | 1.47e−08 | 3.74 | 1.83e−04 | 4.71 | 2.43e−06 | 8p22–1 | additive | - |
rs17515041 | 8 | 16 997 985 | C | CTCTTG | C | 0.4267 | -5.43 | 5.50e−08 | -4.12 | 3.70e−05 | -3.71 | 2.11e−04 | 8p22–2 | additive | eQTL: FGF20 in GTEx spleen. Cluster contains meQTLs for cg04451175 and cg06872257 (blood). |
rs73367537 | 10 | 120 390 728 | T | C | T | 0.0261 | 4.09 | 4.30e−05 | 5.70 | 1.21e−08 | -1.89 | 5.88e−02 | 10q26 | additive | - |
SNP . | CHR . | BP (hg38) . | A1 . | A2 . | TEST ALLELE . | MAF GACRS + CAMP . | Z GACRS + CAMP . | P GACRS + CAMP . | Z GACRS . | P GACRS . | Z CAMP . | P CAMP . | cluster . | model . | colocalization . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs4832738a | 4 | 37 813 371 | G | A | A | 0.4964 | 6.03 | 1.72e−09 | 5.65 | 1.62e−08 | 2.22 | 2.62e−02 | 4p14 | recessive | eQTL: PGM2 in GTEx lung, fetal lung, immune cell types in eQTL Catalogue. Cluster contains meQTLs for cg20524452 and cg04038500 (blood). |
rs1581479 | 8 | 16 950 316 | C | T | C | 0.1801 | 5.67 | 1.47e−08 | 3.74 | 1.83e−04 | 4.71 | 2.43e−06 | 8p22–1 | additive | - |
rs17515041 | 8 | 16 997 985 | C | CTCTTG | C | 0.4267 | -5.43 | 5.50e−08 | -4.12 | 3.70e−05 | -3.71 | 2.11e−04 | 8p22–2 | additive | eQTL: FGF20 in GTEx spleen. Cluster contains meQTLs for cg04451175 and cg06872257 (blood). |
rs73367537 | 10 | 120 390 728 | T | C | T | 0.0261 | 4.09 | 4.30e−05 | 5.70 | 1.21e−08 | -1.89 | 5.88e−02 | 10q26 | additive | - |
aIn complete LD with rs7691795 and rs4832956.
The column ‘cluster’ refers to the fine mapping results. The column ‘colocalization’ summarizes the results of the eQTL colocalization analysis and the overlap analysis between the fine mapped signal cluster and external meQTL results. eQTL: expression quantitative trait locus; meQTL: methylation quantitative trait locus.
Novel genetic associations for childhood asthma in either GACRS + CAMP or GACRS/CAMP only, based on an additive or recessive model
SNP . | CHR . | BP (hg38) . | A1 . | A2 . | TEST ALLELE . | MAF GACRS + CAMP . | Z GACRS + CAMP . | P GACRS + CAMP . | Z GACRS . | P GACRS . | Z CAMP . | P CAMP . | cluster . | model . | colocalization . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs4832738a | 4 | 37 813 371 | G | A | A | 0.4964 | 6.03 | 1.72e−09 | 5.65 | 1.62e−08 | 2.22 | 2.62e−02 | 4p14 | recessive | eQTL: PGM2 in GTEx lung, fetal lung, immune cell types in eQTL Catalogue. Cluster contains meQTLs for cg20524452 and cg04038500 (blood). |
rs1581479 | 8 | 16 950 316 | C | T | C | 0.1801 | 5.67 | 1.47e−08 | 3.74 | 1.83e−04 | 4.71 | 2.43e−06 | 8p22–1 | additive | - |
rs17515041 | 8 | 16 997 985 | C | CTCTTG | C | 0.4267 | -5.43 | 5.50e−08 | -4.12 | 3.70e−05 | -3.71 | 2.11e−04 | 8p22–2 | additive | eQTL: FGF20 in GTEx spleen. Cluster contains meQTLs for cg04451175 and cg06872257 (blood). |
rs73367537 | 10 | 120 390 728 | T | C | T | 0.0261 | 4.09 | 4.30e−05 | 5.70 | 1.21e−08 | -1.89 | 5.88e−02 | 10q26 | additive | - |
SNP . | CHR . | BP (hg38) . | A1 . | A2 . | TEST ALLELE . | MAF GACRS + CAMP . | Z GACRS + CAMP . | P GACRS + CAMP . | Z GACRS . | P GACRS . | Z CAMP . | P CAMP . | cluster . | model . | colocalization . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs4832738a | 4 | 37 813 371 | G | A | A | 0.4964 | 6.03 | 1.72e−09 | 5.65 | 1.62e−08 | 2.22 | 2.62e−02 | 4p14 | recessive | eQTL: PGM2 in GTEx lung, fetal lung, immune cell types in eQTL Catalogue. Cluster contains meQTLs for cg20524452 and cg04038500 (blood). |
rs1581479 | 8 | 16 950 316 | C | T | C | 0.1801 | 5.67 | 1.47e−08 | 3.74 | 1.83e−04 | 4.71 | 2.43e−06 | 8p22–1 | additive | - |
rs17515041 | 8 | 16 997 985 | C | CTCTTG | C | 0.4267 | -5.43 | 5.50e−08 | -4.12 | 3.70e−05 | -3.71 | 2.11e−04 | 8p22–2 | additive | eQTL: FGF20 in GTEx spleen. Cluster contains meQTLs for cg04451175 and cg06872257 (blood). |
rs73367537 | 10 | 120 390 728 | T | C | T | 0.0261 | 4.09 | 4.30e−05 | 5.70 | 1.21e−08 | -1.89 | 5.88e−02 | 10q26 | additive | - |
aIn complete LD with rs7691795 and rs4832956.
The column ‘cluster’ refers to the fine mapping results. The column ‘colocalization’ summarizes the results of the eQTL colocalization analysis and the overlap analysis between the fine mapped signal cluster and external meQTL results. eQTL: expression quantitative trait locus; meQTL: methylation quantitative trait locus.
Based on the recessive model, we observed genome-wide significant associations in the 4p14 locus. The lead variants rs4832738, rs7691795, and rs4832956 (in complete LD) reached an association P-value of |$P=1.72\ast{10}^{-9}$| (Table 2). All three P-values were based on the recessive model test for the major allele with an estimated allele frequency of 50.4% in GACRS + CAMP (Table 2 and Supplementary Material, Table S4). Supplementary Material, Figures S4–S6 visualize region association plots for 4p14. Supplementary Material, Figure S4 is based on standard LD correlation (LocusZoom, EUR LD), Supplementary Material, Figure S5 is based on a recessive genotype coding in 1000 Genomes phase 3 EUR, and Supplementary Material, Figure S6 is based on a recessive genotype coding in GACRS + CAMP (see section Materials and Methods). We added the two region plots in Supplementary Material, Figures S5 and S6 since LD does not correspond to the correlation between test statistics for the recessive model.
Considering GACRS separately, we observed highly significant low-frequency variants with minor allele frequencies (MAFs) of approximately 3% on 10q26 in the additive analysis. The lead variant rs73367537 had an additive association P-value of |$P=1.21\ast{10}^{-8}$| in GACRS and all significant variants were in strong LD (Supplementary Material, Fig. S7). However, the corresponding additive association P-value in CAMP was nearly significant (|$P=0.059$|) but with an opposite direction of effect (Table 2, Supplementary Material, Table S4). Aside the established 17q21 associations, the identified SNPs on 4p14, 8p22, and 10q26 were not previously described in the childhood-onset asthma context (GWAS catalog (17), June 2022) and therefore represent novel genetic associations for childhood asthma.
Single variant results replicate previously identified genetic associations for childhood asthma
Among the 87 genetic variants in our additive GACRS + CAMP analysis that were previously reported as childhood-onset asthma associations in the Ferreira et al. analysis (3) (see section Materials and Methods), 13 (15%) had a nominal significant additive GACRS + CAMP association P-value |$(P\le 0.05)$|. A binomial test for enrichment |$(\mathrm{null}\ \mathrm{hypothesis}:\mathrm{success}\ \mathrm{probability}\le 0.05)$| resulted in a significant P-value of |$P=3.85\ast{10}^{-4}$|, assuming independent genetic variants and thus neglecting residual LD in this pruned set of variants. All of the 13 nominal significant associations had a consistent direction of effect. Considering directions of effects only, 70 out of 87 (80%) had a consistent direction in the additive GACRS + CAMP analysis. The corresponding binomial test for enrichment (null hypothesis: success probability |$\le 0.5)$| resulted in a p-value of |$P=4.18\ast{10}^{-9}$|. The list of the 13 nominal significant associations can be found in Supplementary Material, Table S1.
To get a quantitative understanding of the replication results, we performed a replication power analysis for our analysis based on 1180 affected trios using established power formulas for the TDT/FBAT (15). We extracted the GARCS/CAMP minor allele frequencies (estimated based on parental genetic data) and the reported odds ratios from the Ferreira et al. analysis for all 87 available associations/variants. We applied the empirical_bayes function (18) in the winnerscurse R package (see section code and data availability) to the log odds ratios to correct for the Winner’s curse (19). Next, we converted the adjusted odds ratios to relative risks (assuming a prevalence of 0.27) and computed the power for each variant to reach nominal significance (|$P\le 0.05$|) in our GACRS + CAMP analysis. The expected number of nominally significant associations based on these computations was approximately 19.78. The probability of observing 13 or fewer nominally significant associations based on these power parameters was approximately |$P=0.04$| (based on 100 000 simulations). Although the number of observed associations is less than expected, we conclude that the deviations from the theoretical expectations are modest. In general, replication analyses for asthma risk loci are challenging, given the heterogeneity of asthma and asthma diagnoses.
Statistical fine mapping of single variant associations
We performed statistical fine mapping for all identified novel loci using DAP-G (see section Materials and Methods) to pinpoint genetic variants that most likely explain the observed associations. Supplementary Material, Table S4 contains all variants in each identified cluster, where variants within a cluster are in strong LD with each other. This table also contains the posterior inclusion probabilities for each genetic variant as computed by DAP-G. In line with the regional association plots for the 8p22 association (Fig. 2), DAP-G found evidence for an independent signal with lead variant rs17515041 in the 8p22 locus. The corresponding association P-value was |$P=5.5\ast{10}^{-8}$| in the GACRS + CAMP analysis, whereas the cohort-specific association P-values for GACRS and CAMP were P|$=3.7\ast{10}^{-5}$| and |$P=2.11\ast{10}^{-4}$|, respectively (Table 2 and Supplementary Material, Table S4). The LD between rs1581479 and rs17515041 is |${r}^2=0.1$| and |${r}^2=0.14$| in GACRS and CAMP, respectively, estimated based on parental genotypes. These two clusters on 8p22 will be denoted by 8p22–1 and 8p22–2 in the following. For the other loci, only one signal cluster was identified. Supplementary Material, Table S4 contains additional information including annotations for all genetic variants located in the identified clusters (Material and Methods).
Expression quantitative trait loci colocalization analysis
We tested the single variant associations on 4p14, 8p22–1, 8p22–2 and 10q26 for colocalization with expression quantitative trait loci (eQTLs). Since there are two independent associations in the 8p22 locus, approximately 50 kb apart, we used a radius of 25 kb between the two associations when extracting the FBAT statistics for the colocalization analysis. For the other loci, we used a radius of 50 kb (see section Materials and Methods).
The lead variant rs4832738 on 4p14 colocalized with an eQTL signal for PGM2 (Phosphoglucomutase 2) in fetal lung (false discovery rate (FDR) |$5\ast{10}^{-5}$|), GTEx lung tissue (FDR |$<{10}^{-5}$|) and several immune cell types in the eQTL Catalogue data. According to the GTEx eQTL results, the asthma risk increasing allele increases PGM2 expression in the lung.
For rs1989754 on 8p22–2, in strong LD with the lead variant rs17515041 on 8p22–2 (|$r^2>0.98$| in AMR (Ad Mixed American) and EUR (European) populations in LDpair tool, see section code and data availability), we observed colocalization with an eQTL signal for FGF20 (Fibroblast Growth Factor 20) in spleen tissue in GTEx (FDR 0.0064). Here, the asthma risk decreasing allele increases expression in the spleen tissue. We tested rs1989754 since rs17515041 is an Indel and was removed in the eQTL studies. For the GWAS lead variants rs1581479 on 8p22–1 and rs73367537 on 10q26, no significant colocalizations with gene expression data were observed.
Overlap with meQTL results
Investigating the overlap between the identified signal clusters and recent methylation quantitative trait loci (meQTLs) results in blood (20) showed that rs3862103 and rs1721100 on 8p22–2 are clumped meQTLs for cg04451175 and cg06872257, respectively. On 4p14, we observed that rs34738612 and rs11721535 are clumped meQTLs for cg20524452 and cg04038500, respectively. A clumped meQTL represents an approximately fine mapped meQTL signal using LD clumping (see section Materials and Methods).
Replication of novel associations in UK Biobank and additional support for 10q26 by GACRS findings
Based on the publicly available summary statistics (see section code and data availability) for the analysis of childhood asthma in the UK Biobank (3), we investigated the association P-values of our identified variants on 4p14, 8p22 and 10q26. The identified variants failed to replicate (rs4832738 P = 0.003829 with opposing direction of effect, rs1581479 P = 0.6833 and rs73367537 P = 0.1962). Also, rs1989754, in strong LD with the second association rs17515041 on 8p22–2, was not significant (P = 0.3078). Association data for rs17515041 was not available. We additionally sought evidence for replication of our signal on 10q26. Since we observed heterogeneous effects for 10q26 in GACRS and CAMP, we performed a similar analysis as described by Ferreira et al. in the UK Biobank (see section Materials and Methods) but partitioned the samples according to the UK Biobank derived variable ‘White British’ ancestry (combination of self-report and genetic principal component analysis). Overall, after quality control, we obtained 369 010 samples, including 11 738 asthmatics with childhood-onset and 357 272 controls. To avoid extreme imbalances, we restricted the set of controls to randomly chosen 50 000 controls. Then, we performed the association analysis of rs73367537 in two distinct groups defined by the British European ancestry variable. In this stratified analysis, we observed a significant risk effect of rs73367537-T, consistent with the effect in GACRS, in samples with British European ancestry (9702 cases, 41 437 controls, P = 0.0014) and a suggestive significant protective effect of rs73367537-T in the remaining participants (2036 cases, 8563 controls, P = 0.064). Accordingly, a standard test for equality of effects in both subsamples had a significant two-sided P-value of P = 0.0012.
Furthermore, we also found additional support for the 10q26 association in independent Costa Rica data. In 2007, Celedón et al. performed a linkage analysis of asthma and airway responsiveness in eight large pedigrees from Costa Rica (21). These pedigrees partly overlap with GACRS, but the 894 GACRS trios in our analysis were not included in the linkage analysis. Celedón et al. found evidence for linkage signals for asthma and airway responsiveness. One shared signal was located on Chromosome 10 with genetic coordinates between 145 and 151 cm (21). The lead variant on 10q26 in GACRS, rs73367537, is located in this genetic region. We ran an FBAT analysis of rs73367537 in all members of the corresponding extended pedigrees with physician’s diagnosed asthma and WGS data available. The resulting P-value was |$P=0.045$| (13 informative FBAT families) with consistent risk effect direction as in the GACRS main analysis.
In summary, the UK Biobank analysis replicated both the association and observed heterogeneity of the 10q26 signal. Furthermore, the association was replicated in an independent part of GACRS.
Region-based association analysis
The sliding window approach constructed a total of 1 331 478 unique regions. As described in the Material and Methods section, we evaluated the regions using a two-stage procedure.
In the first stage, we considered all regions that are located within a radius of 20 kb around one of the associations identified in the single variant analysis. This included the lead variants in the signal clusters on 4p14, 8p22–1, 8p22–2 and 10q26, as well as the lead variant on 17q21. We obtained a total number of 98 regions, with an average number of 27 variants (minimum number 2 and maximum number 50). Among these regions, 16 showed a significant FBAT ACAT association P-value after Bonferroni correction for |$n=98$| tests and a significance level of |$\alpha =0.05$|, i.e. |$P<\frac{0.05}{98}=0.00051$|. The 16 regions were located in the 8p22 and 10q26 loci. Since the single variant associations on 10q26 are low-frequency variants with MAF of around 3% themselves, we focused on the six regions on 8p22 (Supplementary Material, Table S2). We further analyzed the conditional significance of these regions (Table 3), conditioning on the lead variants in the signal clusters 8p22–1 and 8p22–2 identified in the single variant analysis. Most of the regions remained nominally significant, even though, in general, the conditional approach reduces power substantially due to robustness of the model-free FBAT approach. Table 3 also reports the FBAT ACAT P-values for the 6 regions, based on GACRS and CAMP separately. In Supplementary Material, Table S6, we report the results of FBAT haplotype analysis based on the two common genetic variants and all rare variants that had a nominal significant single variant FBAT additive association P-value in GACRS + CAMP, GACRS or CAMP. Overall, these results suggest that both common and low-frequency/rare variants play a role in the architecture of the 8p22 association with childhood-onset asthma and that the observed signals appear to be highly consistent between GACRS and CAMP. Therefore, our first stage analysis identified potential additional, independent rare/low-frequency variant associations on 8p22.
Genetic regions in the 8p22 locus with significant FBAT ACAT association P-value in the first stage of testing, P-values shown are uncorrected for multiple testing (see Supplementary Material, Table S2)
CHR . | Start BP (hg38) . | End BP (hg38) . | Number of variants . | ACAT . | ACAT conditional on rs1581479 . | ACAT conditional on rs17515041 . | GACRS ACAT . | CAMP ACAT . |
---|---|---|---|---|---|---|---|---|
8 | 16 948 949 | 16 951 656 | 30 | 2.77e−04 | 9.31e−03 | 4.68e−03 | 2.43e−03 | 1.08e−01 |
8 | 16 949 866 | 16 953 451 | 39 | 2.03e−05 | 7.15e−03 | 1.21e−03 | 3.86e−04 | 2.12e−02 |
8 | 16 995 849 | 16 999 748 | 35 | 2.89e−04 | 9.53e−02 | 3.46e−02 | 1.42e−04 | 2.85e−02 |
8 | 17 003 908 | 17 007 616 | 29 | 5.27e−05 | 4.14e−02 | 3.00e−02 | 2.76e−03 | 1.17e−02 |
8 | 17 005 897 | 17 009 766 | 28 | 7.33e−05 | 1.01e−01 | 1.98e−02 | 1.18e−02 | 1.48e−02 |
8 | 17 012 081 | 17 015 804 | 36 | 3.13e−04 | 3.91e−02 | 4.33e−04 | 8.53e−04 | 4.76e−01 |
CHR . | Start BP (hg38) . | End BP (hg38) . | Number of variants . | ACAT . | ACAT conditional on rs1581479 . | ACAT conditional on rs17515041 . | GACRS ACAT . | CAMP ACAT . |
---|---|---|---|---|---|---|---|---|
8 | 16 948 949 | 16 951 656 | 30 | 2.77e−04 | 9.31e−03 | 4.68e−03 | 2.43e−03 | 1.08e−01 |
8 | 16 949 866 | 16 953 451 | 39 | 2.03e−05 | 7.15e−03 | 1.21e−03 | 3.86e−04 | 2.12e−02 |
8 | 16 995 849 | 16 999 748 | 35 | 2.89e−04 | 9.53e−02 | 3.46e−02 | 1.42e−04 | 2.85e−02 |
8 | 17 003 908 | 17 007 616 | 29 | 5.27e−05 | 4.14e−02 | 3.00e−02 | 2.76e−03 | 1.17e−02 |
8 | 17 005 897 | 17 009 766 | 28 | 7.33e−05 | 1.01e−01 | 1.98e−02 | 1.18e−02 | 1.48e−02 |
8 | 17 012 081 | 17 015 804 | 36 | 3.13e−04 | 3.91e−02 | 4.33e−04 | 8.53e−04 | 4.76e−01 |
The table also reports the (uncorrected) conditional FBAT ACAT association P-values when conditioning on one of the common variants identified in the single variant analysis (Table 2) (P-values based on 1 000 000 simulations). The last two columns contain the FBAT ACAT (uncorrected) P-values for the regions using only GACRS and CAMP data, respectively (P-values based on 10 000 000 simulations). FBAT: Family-Based Association Test; ACAT: Aggregated Cauchy Association Test.
Genetic regions in the 8p22 locus with significant FBAT ACAT association P-value in the first stage of testing, P-values shown are uncorrected for multiple testing (see Supplementary Material, Table S2)
CHR . | Start BP (hg38) . | End BP (hg38) . | Number of variants . | ACAT . | ACAT conditional on rs1581479 . | ACAT conditional on rs17515041 . | GACRS ACAT . | CAMP ACAT . |
---|---|---|---|---|---|---|---|---|
8 | 16 948 949 | 16 951 656 | 30 | 2.77e−04 | 9.31e−03 | 4.68e−03 | 2.43e−03 | 1.08e−01 |
8 | 16 949 866 | 16 953 451 | 39 | 2.03e−05 | 7.15e−03 | 1.21e−03 | 3.86e−04 | 2.12e−02 |
8 | 16 995 849 | 16 999 748 | 35 | 2.89e−04 | 9.53e−02 | 3.46e−02 | 1.42e−04 | 2.85e−02 |
8 | 17 003 908 | 17 007 616 | 29 | 5.27e−05 | 4.14e−02 | 3.00e−02 | 2.76e−03 | 1.17e−02 |
8 | 17 005 897 | 17 009 766 | 28 | 7.33e−05 | 1.01e−01 | 1.98e−02 | 1.18e−02 | 1.48e−02 |
8 | 17 012 081 | 17 015 804 | 36 | 3.13e−04 | 3.91e−02 | 4.33e−04 | 8.53e−04 | 4.76e−01 |
CHR . | Start BP (hg38) . | End BP (hg38) . | Number of variants . | ACAT . | ACAT conditional on rs1581479 . | ACAT conditional on rs17515041 . | GACRS ACAT . | CAMP ACAT . |
---|---|---|---|---|---|---|---|---|
8 | 16 948 949 | 16 951 656 | 30 | 2.77e−04 | 9.31e−03 | 4.68e−03 | 2.43e−03 | 1.08e−01 |
8 | 16 949 866 | 16 953 451 | 39 | 2.03e−05 | 7.15e−03 | 1.21e−03 | 3.86e−04 | 2.12e−02 |
8 | 16 995 849 | 16 999 748 | 35 | 2.89e−04 | 9.53e−02 | 3.46e−02 | 1.42e−04 | 2.85e−02 |
8 | 17 003 908 | 17 007 616 | 29 | 5.27e−05 | 4.14e−02 | 3.00e−02 | 2.76e−03 | 1.17e−02 |
8 | 17 005 897 | 17 009 766 | 28 | 7.33e−05 | 1.01e−01 | 1.98e−02 | 1.18e−02 | 1.48e−02 |
8 | 17 012 081 | 17 015 804 | 36 | 3.13e−04 | 3.91e−02 | 4.33e−04 | 8.53e−04 | 4.76e−01 |
The table also reports the (uncorrected) conditional FBAT ACAT association P-values when conditioning on one of the common variants identified in the single variant analysis (Table 2) (P-values based on 1 000 000 simulations). The last two columns contain the FBAT ACAT (uncorrected) P-values for the regions using only GACRS and CAMP data, respectively (P-values based on 10 000 000 simulations). FBAT: Family-Based Association Test; ACAT: Aggregated Cauchy Association Test.
In the second stage, we considered the ACAT P-values for all 1 331 478 regions. The quantile-quantile plots for the ACAT and the four sub-statistics is displayed in Supplementary Material, Figure S8. None of these regions were significant after correction for multiple testing. We considered all suggestively significant regions with an FBAT ACAT P-value |$P<{10}^{-5}.$| The corresponding regions were located on Chromosomes 1, 3, 16 and 17 and are described in Supplementary Material, Table S3. Supplementary Material, Table S5 contains additional information including annotations for all low-frequency/rare genetic variants with a P-value below 0.05 in either GACRS + CAMP or GACRS and CAMP individually, that are located within one of the described regions. Overall, the second stage analysis did not reveal genome-wide significant associations.
Discussion
We performed a WGS-based association analysis based on two family-based studies and identified three novel genetic associations for childhood asthma.
Our family-based analysis has several advantages compared to recent large-scale case–control studies. First, the trio design provides robustness against population stratification, allowing a joint analysis of the combined data from GACRS and CAMP. Second, asthma status in our analysis is based on comparable definitions of asthma in both study populations, including a physician’s diagnosis. This more rigorous definition was used instead of self-report or a simple questionnaire-based diagnosis of asthma used in other asthma GWAS. Third, the incorporation of different ethnicities and genetic ancestries provides potential insights into the generalizability of the results. Fourth, our association analysis is based on whole-genome sequencing data that allowed us to robustly identify low-frequency variant associations in the GACRS cohort, as well as potential low-frequency/rare variant signals on 8p22. The main limitation of our analysis is the relatively small sample size compared to large-scale association analyses based on unrelated samples, which limits the power, especially for the genome-wide region-based analysis of low-frequency and rare variants. Another limitation of the analysis is that our eQTL colocalization testing using the JLIM approach does not take into account functional annotations or experimental validation. The resolution level of the colocalization approach depends on the underlying LD structure, and the relatively small overall sample size makes it difficult to distinguish causal variants from tagged variants. Furthermore, eQTL studies and GWAS tend to be biased towards different types of variants, emphasizing careful interpretation of colocalization results (22). Therefore, future studies are needed to confirm the role of the proposed genes.
Besides the three novel genetic loci associated with childhood asthma, we confirmed 13 (15%) of previously identified asthma GWAS associations. Moreover, 70 out of 87 previously identified loci (80%) had a consistent direction of effect.
The study design and analysis strategy played a crucial role in the identification of the three novel loci. For the signal on 8p22, both studies had similar association signals for two independent common variants and together these signals reached genome-wide significance in the combined analysis. Furthermore, our region-based analysis suggested that there may be additional, independent low-frequency/rare variant associations in this region. The 4p14 association reached genome-wide significance in the recessive model. Finally, the association signal on 10q26 consists of low-frequency variants, enabled by the WGS data, and was observed in GACRS only and may have ancestry- or environment-specific effects.
The 4p14 locus was the most significant locus using the recessive association model. SNP rs13122762, part of the signal cluster, is located at the 5′ upstream of PGM2. The signal cluster also contained clumped meQTL for cg20524452 and cg04038500, but no previous epigenome-wide association was reported for these CpG sites in the EWAS atlas (23). Furthermore, we found evidence for colocalization with eQTL for PGM2 in several relevant tissues and datasets, including both lung tissues and immune cell types. PGM2 encodes an enzyme implicated in glucose metabolism and catalyzes the conversion of the nucleoside breakdown products ribose-1-phosphate and deoxyribose-1-phosphate to the corresponding 5-phosphopentoses and the interconversion of glucose-1-phosphate to glucose-6-phosphate.
Recently, early embryonic lethality was reported in Pgm2 knockout mice before E9.5 when lung development starts, suggesting important roles of Pgm2 in embryonic development (24). Future studies in inducible and conditional knockout mice of Pgm2 in specific lung cell types, especially in Ovalbumin (OVA)- or house dust mite-induced asthmatic models may pinpoint important cell types and possible molecular mechanisms of how PGM2 may contribute to asthma susceptibility. Glucose metabolism was shown to be dysregulated in asthma development (25–27). It is conceivable that the expression levels of PGM2, influenced by genetic variants on 4p14, may impact asthma susceptibility possibly through fine-tuning the glucose metabolic status in immune cells, thereby priming epigenetic features of the genes and impacting the inflammatory response in airways in response to allergen or viral infection.
On 8p22, we found two common variant associations (i.e. two signal clusters), that were in weak LD. In addition, we observed potentially associated low-frequency/rare variants in close proximity to both common variants. However, the exact relationship of these rare variants to the two separate common variants is hard to explore because of the LD structure of the region and the limited sample size. The association was also supported by clumped meQTL associations for cg04451175 and cg06872257, which were reported to be associated with aging and gestational age (23). For the first signal cluster 8p22–1, we found no colocalization with eQTLs. For the second signal cluster, 8p22–2, gene expression colocalization analysis suggested FGF20 as the gene of interest. FGF20 belongs to the important fibroblast growth factor family of genes that have essential roles during embryonic lung development. We previously reported that the genetic variants nearby FGF20 are linked with allergic asthma (28). FGF20 binds to its membrane receptor FGFR1, which activates the MAPK pathway (29) and the beta-catenin/Wnt pathway (30) to modulate cell proliferation and differentiation. Furthermore, previous findings (31) in Fgf20 knockout mice demonstrated that the potential genetic interaction between FGF20 and SOX2 may determine the differentiation of progenitor cells in cortical hair cells during embryonic development. Given that SOX2 is an important transcription factor that determines the differentiation of proximal airway epithelial progenitor cells during embryonic lung development, FGF20 may also modulate the function of airway epithelial cells during fetal lung development or post-natally. Our study suggests that future characterization of the function of FGF20 in both murine and cellular models relevant to asthma is clearly warranted.
The results for 10q26 are more difficult to interpret. This locus was statistically significant in GACRS, and was independently implicated in a linkage study based on independent samples from extended pedigrees of Costa Rican children with asthma. FBAT testing in this independent set also resulted in a significant association (P = 0.045). It was also replicated in the UK Biobank in British Europeans. The signal did not replicate in CAMP or the remaining UK Biobank participants but demonstrated heterogeneity of the effect. While there were no eQTL analyzed in this region, fine-mapped variants overlap with a distal, non-promoter DNase I hypersensitive site (DHS) that is correlated with a promoter DHS for WDR11 (32,33). The low-frequency variants on 10q26 associated with childhood asthma in GACRS are also present with similar allele frequencies in CAMP and the UK Biobank, but the analyses indicate that the genetic effects on asthma differ. We hypothesize that the effects of the identified variants on 10q26 are modified by either a genetic or environmental factor.
The replication analysis provided mixed results. The associations on 4p14 and 8p22 did not replicate in the UK Biobank childhood asthma analysis by Ferreira et al. (3). We also investigated reported associations between the lead variants in both loci and other phenotypes, including asthma-related outcomes, in the GWAS catalog, FinnGen database, and publicly available UK Biobank summary statistics via open target genetics (34,35). No significant result was observed. Further analyses are needed to analyze the exact reasons for the lack of replication. Potential explanations include differences in phenotype definitions or modification of genetic risk by population-specific factors. Asthma status in GACRS and CAMP was directly assessed by a physician’s diagnosis and not self-reported. We also note that the association on 10q26 did not replicate in the overall UK Biobank analysis but demonstrated significance and effect heterogeneity in the stratified analysis according to the ‘White British’ variable.
Our genome-wide region-based rare variant analysis did not identify any significant signals. Suggestive signals include a region on 3p26 that consists of intronic variants for the gene SUMF1, which has been hypothesized to be associated with COPD (36). A second region on 16q12 contains an intronic splice site variant for ABCC11 with a Combined Annotation Dependent Depletion (CADD) score (37) above 24.
Our study design has many strengths, but the major weakness is that we lack a large enough sample size to identify rare variant associations for asthma in genome-wide analyses. Also, as mentioned above, the eQTL colocalization results are not functionally validated. This will be subject of future studies. However, our disease definition and trio design set a high standard for asthma genetic association studies, and we have adequate power to detect common variants.
Materials and Methods
Study populations
Our analysis is based on data from two studies with asthmatic children: ‘The Genetic Epidemiology of Asthma in Costa Rica’ study (GACRS) and the Childhood Asthma Management Program (CAMP). GACRS is a cohort of children between 6 and 14 years of age with physician-diagnosed asthma and at least two episodes of respiratory symptoms (wheezing, cough or dyspnea) or a history of asthma attacks in the previous year (6). Additional criteria were that the children have at least six great-grandparents born in the Central Valley of Costa Rica. The population of the Central Valley of Costa Rica is a genetic isolate with one of the highest prevalences of asthma worldwide (approximately 27% in children) (38). For further details about enrollment and phenotyping protocols, we refer to previous publications (6).
CAMP was a multicenter clinical trial designed to determine the long-term effects of three inhaled treatments for childhood asthma (7). This cohort consists mainly of trios and a similar age distribution as GACRS. CAMP only included children with mild to moderate persistent asthma, as defined by the presence of symptoms or by the use of an inhaled bronchodilator at least twice weekly or the use of daily medication for asthma (39). Children with too many symptoms during the run-in period, such that they could not be taken off medication, were excluded. Additional exclusion criteria included severe chronic sinusitis or nasal polyposis, use of more than four sprays of nasal steroids daily (only beclomethasone allowed), current use of metoclopramide, ranitidine or cimetidine treatment for gastroesophageal reflux or evidence of severe asthma. GACRS and CAMP used similar protocols for phenotyping subjects.
Ethics
Written parental consent and participating child’s assent were obtained. For GACRS data, the study was approved by the Mass General Brigham Human Research Committee at Brigham and Women’s Hospital (Boston, MA; protocol No. 2000P001130) and the Hospital Nacional de Niños (San José, Costa Rica).
For CAMP data, all study procedures were approved by the Mass General Brigham Human Research Committee at Brigham and Women’s Hospital (Boston, MA; protocol No. 2011P000710).
Whole-genome sequencing and quality control
Whole-genome sequencing data for GACRS and CAMP were generated as part of the National Heart, Lung and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program (8). Details on DNA sample handling, quality control, library construction, clustering and sequencing, read processing and sequence data quality control are described on the TOPMed website (https://topmed.nhlbi.nih.gov/topmed-whole-genome-sequencing-methods-freeze-8). Variant calls were obtained from TOPMed data freeze 8 with variant call format files aligned to the GRCh38 genome reference. We extracted biallelic SNPs and insertions/deletions that passed quality control, only considering individual variant information with at least 10x sequencing depth for the analyses. Based on genome-wide identity-by-descent estimates generated by kinship-based inference for GWAS (KING) (40) and additional information from the TOPMed Data Coordinating Center, we identified and removed genetic duplicates, pedigree discrepancies and sex mismatches. After this step, we kept complete trios for the family-based association analysis.
Single variant association analysis
Based on the trio data, we performed single variant family-based association tests (FBAT v204) using additive and recessive genetic models (9,10). Our primary analysis was based on the combined GACRS and CAMP dataset (GACRS + CAMP), but we also considered GACRS and CAMP separately. The advantage of the FBAT approach is that the association test is robust to population stratification and admixture (41), allowing the combination of both studies. For the single variant analysis, we excluded genetic variants with a minor allele frequency (MAF) below 1%, more than four Mendelian errors, genotype missing rate > 2%, TOPMed freeze 8 reported batch effects or deviations from Hardy–Weinberg proportions |$(P<{10}^{-8}$|). Furthermore, we only considered association tests with at least 10 informative families to ensure valid asymptotic inference. Testing based on an additive genetic model provides one association P-value per variant, whereas testing based on the recessive genetic model can be performed based on the minor allele as well as based on the major allele. We performed recessive tests on the minor allele for all variants where this test had at least 10 informative families. For the recessive tests on the major alleles, however, we restricted testing to variants with a major allele frequency of at most 60%. The corresponding quality control computations were performed using PLINK (version v1.90b6.12 and v2.00a2.3LM). The MAFs and Hardy–Weinberg deviations were assessed based on parental genotype data. We considered only autosomal genetic data in our analyses. We declared association tests with P-values |$P<5\ast{10}^{-8}$| as genome-wide significant. Since the FBAT statistic equals the TDT for asthmatic trios, we used existing power formulas for the TDT to investigate the expected power of our combined GACRS + CAMP analysis based on 1180 trios (15).
Replication of previously reported associations for childhood asthma
We downloaded all genetic associations from the Ferreira et al. analysis of childhood-onset asthma from the NHGRI-EBI GWAS catalog (3,17) (Study accession GCST007800, June 18, 2021). Among the 105 reported genetic variants, 103 were autosomal and therefore potentially overlapping with our analysis. Further, 87 out of the 103 variants (84%) were available in the GACRS + CAMP after the described quality control procedure. We considered the corresponding additive association P-values and direction of effect for comparison.
Statistical fine mapping
Due to the high variant density in the WGS data, signal clusters contain a large number of variants in strong linkage disequilibrium (LD). To identify potentially causal genetic variants, we performed statistical fine-mapping for all regions with significant single variant associations with an association P-value |$P<5\ast{10}^{-8}$|. For the fine mapping, we included all variants within a window of ±100 kb around the lead association. Since the FBAT tests are score test statistics, we obtain association z-scores but no effect estimates with corresponding standard errors. This motivated using the DAP-G tool (42) for fine mapping since only z-scores and the corresponding LD/correlation matrices are required as input. The LD matrices were estimated based on parental WGS data. For associations discovered using the recessive model, we replaced the standard LD matrix with an empirical correlation matrix based on the parental genotype data and a recessive coding. The ld_control parameter was set to 0.25. We considered all signal clusters, groups of genetic variants in strong LD, with a DAP-G posterior inclusion probability of at least 50% for further analyses.
eQTL colocalization analysis
We tested for colocalization between the identified genetic associations with asthma (‘primary trait’) and eQTLs (‘secondary trait’) using the joint likelihood mapping (JLIM) approach (43). We used the FBAT association P-values within a region of ±50 kb around the lead variant for the primary trait (in the presence of two associations within the same loci, we used a smaller radius). For the secondary trait, we used the eQTL association P-values of the corresponding intervals from the following eQTL datasets: all immune cell types in the eQTL Catalogue (44), all tissues in the Genotype Tissue Expression project (GTEx; version 8) (45) and a study of cis-eQTL in developing human fetal lung (sample superset of dataset previously described (46)). For GTEx tissues, we used the eQTL association statistics calculated only with subjects of European ancestry, available for download from the GTEx consortium. In the case of fetal lung eQTLs, we used the full multi-ethnic eQTL data since the underlying individual-level genotype data were available. In addition to the association statistics, JLIM requires the LD matrices of primary and secondary traits to account for the correlation of association statistics between SNPs. For the additive model with the primary trait (asthma), LD matrices were derived directly from parental WGS genotype. Considering the secondary trait (eQTL), LD matrices were calculated from the genotypes of non-Finnish European ancestry (n = 404; 1000 Genomes Project phase 3 (47)) for the eQTL Catalogue and GTEx; and from the genotypes of all post-QC multi-ethnic subjects (n = 386) for fetal lung eQTLs. For the recessive model, we treated the alleles of each SNP as if they were distinct SNPs. Specifically, for the primary trait, the empirical correlation of parental genotype data based on a recessive coding was used. For the secondary trait, the LD between alleles of the same variants was set to 1, and the LD between alleles of different variants was set to the underlying LD between the SNPs.
For all colocalization analyses, we examined all genes of which transcription start sites are within 1 Mbps from the FBAT lead variant. Due to low imputation accuracies, especially in datasets with non-European genetic ancestry, Indels, genetic variants with MAF < 5%, and tri-allelic SNPs were filtered out from the analyses. JLIM was applied in the default setting with a maximum adaptive permutation of 100 000 iterations and outputs a colocalization P-value for each gene. The FDR was calculated separately for each tissue/cell type and locus using the Benjamini–Hochberg procedure (48). All colocalizations with FDR below 5% were considered to be significant.
Overlap with methylation quantitative trait loci results in recent large-scale analysis
We used results from a recent large-scale DNA methylation quantitative trait loci (meQTLs) analysis to investigate the overlap between the identified signal clusters in the DAP-G fine mapping analysis and meQTL associations in blood (20). We extracted the clumped meQTL results from this analysis, using the reference SNP ID (rsID) as the identifier. Clumping corresponds to linkage disequilibrium (LD) clumping, aiming to extract the most significant meQTL while excluding other meQTL associations in LD.
Region-based association analysis
To test for an association between low-frequency/rare genetic variants and childhood asthma, we applied the recently published framework for region-based association analysis in family-based designs (14). This framework evaluates different association test statistics (Burden, SKAT (49,50), Higher Criticism (51) and maximum single variant (51)) and computes an overall association P-value using the Aggregated Cauchy Association Test (ACAT) statistic (52). The evaluation of the individual sub-statistics is based on simulations and does not require asymptotic theory. For the region-based analysis, we considered genetic variants with MAF below 5% and at least five informative families (single variant FBAT, additive model). Also, we excluded variants with more than two Mendelian errors, variant missing rate > 2%, TOPMed freeze 8 reported batch effects or deviations from Hardy–Weinberg proportions (|$P<{10}^{-8}$|). All region-based test statistics were computed based on an additive genetic model in GACRS+CAMP. The number of simulations for the P-value computation is chosen adaptively and the maximum number of simulations was set to 10 000 000.
We partitioned all genetic variants into regions by using a sliding window approach with a size of 4 kb (but at most 50 genetic variants) and with a shift of 2 kb. For the evaluation of significance, we considered a two-stage approach. First, we evaluated all regions in a ±20 kb radius around a significant association identified in the single variant analysis. Here, we used a Bonferroni correction corresponding to the number of regions in this first stage. In the second stage, we considered all remaining regions. Additionally, for significant regions in the first stage, we also performed a conditional test, where we conditioned on the corresponding single variant association to confirm the independence of the potential observed low-frequency/rare variant signal. All region-based computations were performed using FBAT v204 and v208 (see section code and data availability).
Annotation of genetic variants, region association plots and Miami plot
We used two different resources to annotate potential causal variants to genes and functional categories: SNPnexus (53–57) and the combined SNP-to-gene (cS2G) strategy (58). SNPnexus combines different external resources to assist in selecting functionally relevant genetic variants and is accessible using a web interface (see section code and data availability). The cS2G strategy includes seven constituent SNP-to-gene strategies (Exon, Promoter, two fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC) and Cicero), and scores can be downloaded (see section code and data availability). Region association plots were created using the web application of LocusZoom (59) (see section code and data availability). The EUR LD information for the LocusZoom plots was used. Region association plots for the recessive association data were created using internal R code (see section code and data availability). The Miami plot for the single variant additive and recessive analysis was generated using the R package miamiplot (see section code and data availability).
Replication analysis: UK Biobank
We used the summary statistics from the UK Biobank analysis of childhood asthma (3) as a replication dataset (see section code and data availability). In addition, we used the individual genotype data in the UK Biobank (60) to perform more specific stratified analyses using a similar approach as conducted by Ferreira et al.
In particular, we kept only individuals with no deviations between self-reported and genetically inferred sex, no putative sex chromosome aneuploidies, no excess relatives, that are no outliers for heterozygosity or missing rate, and that have available genetic principal component data. Based on this subset of individuals, we identified individuals as asthmatics with childhood-onset using self-reported asthma onset before 16 years of age (field 3786, ‘What was your age when the asthma was first diagnosed?’) and no difference of more than 10 years between the self-reported onset and the age where asthma was diagnosed by a physician (field 22 147, ‘age at which doctor diagnosed asthma’), if information was available. We defined individuals as controls that had no self-reported asthma, no physician-diagnosed asthma, and no asthma indicated by field 6152 or 20 002. Asthma status was regressed on the imputed genotype data using logistic regressions adjusting for the first 10 principal components, genotyping array and sex. For the stratified analysis, we extracted the UK Biobank variable ‘in.white.British.ancestry.subset’ (data field 22 006), which represents a combination of self-reported information and genetic principal component analysis.
Code and data availability
Summary statistics for the single variant analyses: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001974. The FBAT software (v204 and v208) is available at https://sites.google.com/view/fbatwebpage. PLINK (version v1.90b6.12 and v2.00a2.3LM) was downloaded from https://www.cog-genomics.org/plink/. DAP-G (v1.0.0) was downloaded from https://github.com/xqwen/dap. JLIM (version 2.5) is available at https://github.com/cotsapaslab/jlim. C2G scores are available at https://alkesgroup.broadinstitute.org/cS2G. SNPnexus is available at https://www.snp-nexus.org/v4/. Region association plots were created using https://my.locuszoom.org/ (v0.14.0). Region association plots for recessive association data was performed using an adaption of the R code available at https://github.com/Geeketics/LocusZooms. UK Biobank website (https://www.ukbiobank.ac.uk/). Data processing and plots were created using R (version 4.1.0) (https://www.r-project.org/). Summary statistics for the childhood-onset analysis by Ferreira et al. (3) are available at https://genepi.qimr.edu.au/staff/manuelF/gwas_results/main.html. The LDpair tool is available at https://ldlink.nci.nih.gov/?tab=ldpair. The meQTL analysis results were assessed using http://api.godmc.org.uk/v0.1 (20). Miami plot was generated using the R package miamiplot (https://github.com/juliedwhite/miamiplot). The winnerscurse R package is available at https://rdrr.io/github/amandaforde/winnerscurse/.
Acknowledgements
This research was conducted by using the UK Biobank resource under application number 20915. We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. A full list of authors for the NHLBI TOPMed Consortium is provided at https://www.nhlbiwgs.org/topmed-banner-authorship.
Conflict of Interest statement: Dr Weiss reports personal fees from UpToDate, outside the submitted work, and Dr Weiss is on the Scientific Board of Histolix; Dr Silverman received grant support from GlaxoSmithKline and Bayer, outside the submitted work; Dr Cho has received grant funding from GSK and Bayer and speaking or consulting fees from AstraZeneca, Illumina and Genentech, outside the submitted work; Dr Celedón has received research materials from GSK and Merck (inhaled steroids) and Pharmavite (vitamin D and placebo capsules) in order to provide medications free of cost to participants in NIH-funded studies, unrelated to this manuscript. Dr DeMeo has received honoraria from Novartis and grant support from Bayer, outside the submitted work. Dr Lasky-Su is a Scientific Advisor to Precion.
Funding
National Heart, Lung, and Blood Institute [U01HL089856, U01HL089897, P01HL120839, P01HL132825, K99HL159234, R01HL123915, R01HL141826, R01HL155742, R01HL125734], National Institute of Mental Health [R01MH129337] and the National Human Genome Research Institute [R01HG008976, 2U01HG008685]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute, the National Human Genome Research Institute, the National Institute of Mental Health, or the National Institutes of Health.
Molecular data for the Trans-Omics for Precision Medicine (TOPMed) program was supported by the National Heart, Lung, and Blood Institute (NHLBI). Genome Sequencing for ‘NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica’ (phs000988.v4.p1) was performed at Northwest Genomics Center (HHSN268201600032I, 3R37HL066289-13S1). Genome Sequencing for ‘NHLBI TOPMed: Childhood Asthma Management Program (CAMP)’ (phs001726.v1.p1) was performed at the Northwest Genomics Center (HHSN268201600032I). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I).
References
(
The Childhood Asthma Management Program Research Group (