Lerato E Magosi, Anuj Goel, Jemma C Hopewell, Martin Farrall, on behalf of the CARDIoGRAMplusC4D Consortium, Identifying small-effect genetic associations overlooked by the conventional fixed-effect model in a large-scale meta-analysis of coronary artery disease, Bioinformatics, Volume 36, Issue 2, 15 January 2020, Pages 552–557, https://doi.org/10.1093/bioinformatics/btz590
Abstract
Common small-effect genetic variants that contribute to human complex traits and disease are typically identified using traditional fixed-effect (FE) meta-analysis methods. However, the power to detect genetic associations under FE models deteriorates with increasing heterogeneity, so that some small-effect heterogeneous loci might go undetected. A modified random-effects meta-analysis approach (RE2) was previously developed that is more powerful than traditional fixed and random-effects methods at detecting small-effect heterogeneous genetic associations; the method was later updated (RE2C) to identify small-effect heterogeneous variants overlooked by traditional fixed-effect meta-analysis. Here, we re-appraise a large-scale meta-analysis of coronary disease with RE2C to search for small-effect genetic signals potentially masked by heterogeneity in an FE meta-analysis.
Our application of RE2C suggests a high sensitivity but low specificity of this approach for discovering small-effect heterogeneous genetic associations. We recommend that reports of small-effect heterogeneous loci discovered with RE2C are accompanied by forest plots and standardized predicted random-effects statistics to reveal the distribution of genetic effect estimates across component studies of meta-analyses, highlighting overly influential outlier studies with the potential to inflate genetic signals.
Scripts to calculate standardized predicted random-effects statistics and generate forest plots are available in the getspres R package from https://magosil86.github.io/getspres/.
Supplementary data are available at Bioinformatics online.
1 Introduction
The conservative nature of the traditional random-effects model (RE), which assumes the presence of heterogeneity under the null, has contributed to the dominance of fixed-effect (FE) meta-analysis methods in the discovery of small-effect variants (per-allele disease odds ratios <1.2 or trait variance <0.2%) (Bush and Moore, 2012; Yang et al., 2011), even at heterogeneous loci. A modification of the traditional random-effects method, RE2, was designed to detect genetic associations both in the presence and absence of heterogeneity, providing an opportunity to identify small-effect heterogeneous variants that might go unnoticed in an FE meta-analysis (Han and Eskin, 2011). Most users of the RE2 random-effects method have employed it to refine associations at significant and suggestive genetic signals identified in FE meta-analyses (Sapkota et al., 2015, 2017; Wyss et al., 2018). In its latest iteration, RE2C (Lee et al., 2017), the method reports a subset of the variants detected by RE2. The RE2C update is intended to have a broad application, extending beyond the augmentation of summary association P-values of variants identified in FE meta-analysis to the discovery of additional and potentially novel loci. The power advantage of RE2, and by extension RE2C, over traditional fixed and random-effects meta-analysis models is partly attributable to a relaxed null hypothesis, which assumes homogeneity of genetic effects under the null and thereby provides a greater contrast between the null and alternative hypotheses.
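The contrast between the two null hypotheses can be written out explicitly. The block below is a sketch in standard random-effects notation, following the description in Han and Eskin (2011); the symbols (study effect estimate \hat{\beta}_i, within-study variance \sigma_i^2, average effect \mu and between-study variance \tau^2) are introduced here for illustration and are not defined in the text above.

```latex
% Random-effects model for k studies (notation assumed for illustration)
\hat{\beta}_i \sim N(\mu,\ \sigma_i^2 + \tau^2), \qquad i = 1, \dots, k

% Traditional RE: heterogeneity is permitted under the null
H_0:\ \mu = 0 \ \ (\tau^2 \ge 0) \qquad \text{vs.} \qquad H_1:\ \mu \ne 0 \ \ (\tau^2 \ge 0)

% RE2 and RE2C: no effect implies no heterogeneity under the null
H_0:\ \mu = 0,\ \tau^2 = 0 \qquad \text{vs.} \qquad H_1:\ \mu \ne 0 \ \ \text{or} \ \ \tau^2 > 0
```

Because the RE2 null fixes both parameters, a significant result can reflect a non-zero average effect, heterogeneity, or both.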
Heterogeneity of genetic effects might arise from biologically relevant differences among contributing studies in a meta-analysis, such as diverse ancestries, linkage disequilibrium patterns, sub-phenotypes, ages of disease onset, family history of disease or gender. Alternatively, differences in the direction and/or size of genetic effect estimates among participating studies in a meta-analysis could reflect genotyping error or population structure (i.e. local admixture), where, for example, the average genetic effect estimate at a variant of interest is inflated by a few outlier studies showing outsized effects while the majority of study effects are marginal. Heterogeneity at individual variants can be explored through forest plots and the calculation of standardized predicted random-effects (SPREs), while heterogeneity patterns across multiple variants can be conveniently inspected through the calculation of M statistics (Magosi et al., 2017). Notably, SPREs are precision-weighted residuals that indicate the direction and extent to which individual studies in a meta-analysis deviate from the average genetic effect (Harbord and Higgins, 2008; Magosi et al., 2017), and can be a useful quantitative indicator of whether the average genetic effect at a variant of interest might be unduly influenced by outlier studies showing extreme effects.
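To make the idea of a precision-weighted residual concrete, the following is a minimal sketch of one common way to compute SPREs for a single variant. It assumes a DerSimonian-Laird estimate of the between-study variance and standardizes each study's deviation from the random-effects mean by the standard error of that deviation; the function name `spre` and these estimator choices are illustrative assumptions, and the getspres package may differ in detail.

```python
import math

def spre(betas, ses):
    """Standardized predicted random effects for one variant (illustrative sketch).

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors; at least two studies are required
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]                      # fixed-effect weights
    mu_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # DerSimonian-Laird moment estimate of tau^2 (assumed estimator)
    q = sum(wi * (b - mu_fe) ** 2 for wi, b in zip(w, betas))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects mean and its variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    var_mu = 1.0 / sum(w_re)
    # Residual of each study divided by the standard error of that residual
    return [(b - mu_re) / math.sqrt(se ** 2 + tau2 - var_mu)
            for b, se in zip(betas, ses)]

# Four studies with marginal effects and one study with an outsized effect
values = spre([0.02, 0.01, 0.03, 0.02, 0.60], [0.05] * 5)
```

In this toy input the fifth study receives the largest positive SPRE, which is the pattern that flags a potentially influential outlier.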
In this report, we revisit the CARDIoGRAMplusC4D meta-analysis (60 801 cases and 123 504 controls) of coronary artery disease (CAD) with the RE2C random-effects method, to search for additional CAD loci potentially masked by heterogeneity in the primary FE meta-analysis.
2 Materials and methods
2.1 GWAS datasets
2.1.1 CARDIoGRAMplusC4D
Summary data (i.e. logistic regression coefficients and their corresponding standard errors) were collated from 48 genome-wide association studies of coronary disease risk that comprised individuals from 6 different ancestry groups: African (n = 1) and Hispanic American (n = 1), East (China and Korea, n = 3) and South (India and Pakistan, n = 4) Asian, Middle Eastern (Lebanese, n = 1) and European (n = 38). Meta-analysis was conducted for a set of ~9 million variants with minor allele frequencies >0.005 (CARDIoGRAMplusC4D Consortium, 2015). Design details of each participating CARDIoGRAMplusC4D study are summarized in Supplementary Table S1; the coronary disease phenotype included patients with an inclusive CAD diagnosis (e.g. myocardial infarction, acute coronary syndrome, chronic stable angina or coronary stenosis >50%). Study-level genomic correction (Devlin and Roeder, 1999) was applied to each study to minimize false positives induced by inflated association test statistics. Variant effect-size estimates (β coefficients scaled as log_e(odds ratios) from an additive-effects-only association model) in each study were aligned such that the same risk allele was compared across the studies assembled in the meta-analysis. The studies contributing to the CARDIoGRAMplusC4D study obtained ethical approval from the ethics committees of the respective medical faculties, and informed consent was obtained from all participants; summary genetic association data were anonymously meta-analysed and are reported here. Membership of the CARDIoGRAMplusC4D Consortium is provided in the Supplementary Text S1. Requests for access to the summary statistics are coordinated by the CARDIoGRAMplusC4D Steering Committee (www.cardiogramplusc4d.org).
2.1.2 UK Biobank
The UK Biobank study (UKBB) is a large-scale prospective study of over half a million participants commissioned to assemble comprehensive data on genotypic, socio-demographic, lifestyle and environmental factors with the aim of developing better strategies for the prevention, diagnosis and treatment of common diseases (Sudlow et al., 2015) such as cardiovascular disease (Littlejohns et al., 2019). Data from an interim release of GWAS genotypes for 296 525 participants were previously merged and analysed with clinical phenotype data that identified 34 541 cases of coronary heart disease and 261 984 controls from England, Scotland and Wales aged 45–69 years (van der Harst and Verweij, 2018). Coronary disease case status was assigned to prevalent and incident cases of myocardial infarction, acute coronary syndromes and associated therapeutic interventions (e.g. revascularization). Association summary statistics (β coefficients scaled as log_e(odds ratios) and associated standard errors from an additive-effects-only logistic regression association model) from this analysis were downloaded from the www.cardiomics.net server. Design details of the UK Biobank participants, for comparison with the CARDIoGRAMplusC4D cohorts, are included in Supplementary Table S1.
2.2 RE2 and RE2C meta-analysis
2.2.1 Traditional RE
The traditional RE tests the null hypothesis that the average genetic effect, $\mu$, is zero, that is, $H_0: \mu = 0$, and its summary association test statistic under the null is given by $Z_{RE} = \hat{\mu}_{RE} / \mathrm{SE}(\hat{\mu}_{RE}) \sim N(0, 1)$ (asymptotically) (Neupane et al., 2012).
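The traditional RE test can be illustrated alongside the FE test for a single variant. The sketch below assumes inverse-variance weighting and the DerSimonian-Laird moment estimator of the between-study variance, which is a common choice but is not confirmed by the text as the estimator used in this analysis; the function name `fe_and_re_tests` and the returned keys are hypothetical.

```python
import math

def fe_and_re_tests(betas, ses):
    """Fixed-effect and traditional random-effects Z tests for one variant.

    Assumes inverse-variance weights and a DerSimonian-Laird tau^2;
    requires at least two studies.
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]
    mu_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se_fe = math.sqrt(1.0 / sum(w))
    # Between-study variance from Cochran's Q (moment estimator)
    q = sum(wi * (b - mu_fe) ** 2 for wi, b in zip(w, betas))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to every study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))

    def two_sided_p(z):
        # Two-sided P-value from the standard normal distribution
        return math.erfc(abs(z) / math.sqrt(2.0))

    z_fe, z_re = mu_fe / se_fe, mu_re / se_re
    return {"tau2": tau2,
            "z_fe": z_fe, "p_fe": two_sided_p(z_fe),
            "z_re": z_re, "p_re": two_sided_p(z_re)}

# Heterogeneous toy data: tau^2 > 0 widens the RE standard error
result = fe_and_re_tests([0.5, -0.3, 0.1, 0.4], [0.1, 0.1, 0.1, 0.1])
```

When the estimated between-study variance is zero the two tests coincide; when it is positive the RE standard error grows, which is the conservative behaviour described in the Introduction.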
2.2.2 Contemporary random-effects model (RE2)
2.2.3 Updated RE2 model (RE2C)
2.3 Evaluation of heterogeneity for individual variants and M statistics
2.3.1 Calculation of SPRE statistics
2.3.2 Calculation of M statistics: aggregation of SPREs
2.3.3 Q-statistic and heterogeneity index
Heterogeneity was also assessed using the Q-statistic (Cochran, 1954) and the heterogeneity index (I²) measure (Higgins and Thompson, 2002); I² was further used to quantify heterogeneity in M statistics.
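As a brief illustration of these two measures, the sketch below computes Cochran's Q and I² for one variant from per-study effect estimates and standard errors. The function name `cochran_q_i2` is hypothetical, and I² is returned as a proportion rather than a percentage.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q, its degrees of freedom, and I^2 (as a proportion)."""
    w = [1.0 / se ** 2 for se in ses]                      # inverse-variance weights
    mu = sum(wi * b for wi, b in zip(w, betas)) / sum(w)   # fixed-effect mean
    q = sum(wi * (b - mu) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    # I^2 = (Q - df) / Q, truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

q, df, i2 = cochran_q_i2([0.5, -0.3, 0.1, 0.4], [0.1, 0.1, 0.1, 0.1])
```

Under homogeneity Q is referred to a chi-squared distribution with k − 1 degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(q, df)` gives the corresponding P-value.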
3 Results
3.1 RE2C association analysis
Of 9 455 778 variants in an RE2C meta-analysis of 48 CARDIoGRAMplusC4D studies, 4645 showed genome-wide significant associations with coronary disease (P_RE2C < 5×10⁻⁸), yielding 382 loci where lead variants were centered on a genetic distance window of ±0.5 cM (Table 1).
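The grouping of significant variants into loci can be sketched as a greedy procedure: take the most significant unassigned variant as a lead, assign every unassigned significant variant on the same chromosome within ±0.5 cM of it to that locus, and repeat. This is an assumed reading of the grouping step, not a description of the authors' actual pipeline; the function name `group_into_loci` and the record fields (`id`, `chrom`, `cm`, `p`) are hypothetical.

```python
def group_into_loci(variants, p_threshold=5e-8, window_cm=0.5):
    """Greedily group significant variants into loci around lead variants.

    variants: iterable of dicts with keys 'id', 'chrom', 'cm' (genetic
    position in centiMorgans) and 'p' (association P-value).
    """
    # Keep genome-wide significant variants, most significant first
    hits = sorted((v for v in variants if v["p"] < p_threshold),
                  key=lambda v: v["p"])
    assigned = set()
    loci = []
    for lead in hits:
        if lead["id"] in assigned:
            continue  # already absorbed into a stronger locus
        members = [v["id"] for v in hits
                   if v["id"] not in assigned
                   and v["chrom"] == lead["chrom"]
                   and abs(v["cm"] - lead["cm"]) <= window_cm]
        assigned.update(members)
        loci.append({"lead": lead["id"], "members": members})
    return loci

toy = [
    {"id": "rsA", "chrom": 1, "cm": 10.0, "p": 1e-12},
    {"id": "rsB", "chrom": 1, "cm": 10.3, "p": 1e-9},
    {"id": "rsC", "chrom": 1, "cm": 12.0, "p": 1e-10},
    {"id": "rsD", "chrom": 2, "cm": 10.1, "p": 1e-8},
    {"id": "rsE", "chrom": 1, "cm": 10.1, "p": 1e-3},
]
loci = group_into_loci(toy)
```

The quadratic scan is adequate for a few thousand significant variants; a production pipeline would more likely sort by position within each chromosome.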
Table 1. A summary of RE2C association results from the CARDIoGRAMplusC4D meta-analysis of coronary disease
| Description | Count |
|---|---|
| Number of variants examined in the CARDIoGRAMplusC4D meta-analysis of coronary disease | 9 455 778 |
| Number of variants significantly associated with coronary disease under the RE2C method (P_RE2C < 5×10⁻⁸) | 4645 |
| Number of loci obtained after grouping the 4645 significantly associated variants by a genetic distance window of ±0.5 cM around each lead variant | 382 |
| Number of lead variants that replicated in the UK Biobank (UKBB) prospective study (P_UKBB < 5×10⁻⁵) | 24 |
cM, centiMorgans.
This compares with the conventional FE meta-analysis, which revealed 2213 GWAS-significant (P_FE < 5×10⁻⁸) variants in 46 loci, and an RE2 analysis that yielded 5942 GWAS-significant (P_RE2 < 5×10⁻⁸) variants in 406 loci (Fig. 1).
Fig. 1. A flowchart summarizing meta-analysis genetic association results under the RE2 and RE2C random-effects models and the traditional fixed-effect (FE) method
3.2 Single-variant heterogeneity analysis of 382 novel RE2C loci
Most (85.6%) of the lead variants showed marked heterogeneity (Q-statistic P < 1×10⁻⁷), with at least half of the lead variants showing relatively high levels of heterogeneity (I² > 72.1%) (Supplementary Table S2). Next, we calculated SPREs and generated forest plots to inspect heterogeneity patterns at lead variants of the 382 RE2C loci. Most of the RE2C lead variants had one or more outlier studies where genetic effect-size estimates deviated substantially from the average genetic effect. This empirical threshold to flag overly influential outliers was informed by rs2891168 (chromosome 9p21) in the primary CARDIoGRAMplusC4D meta-analysis, where this well-established locus had max (CARDIoGRAMplusC4D Consortium, 2015). An inspection of forest plots for the 382 RE2C lead variants revealed heterogeneity patterns that were grouped into three categories (Supplementary Fig. S1). Most (n = 323) of the lead variants fell in the first category, where at least one study showed outsized effects while the majority of the studies showed minimal effects (Supplementary Fig. S2 and Table S3). Lead variants (n = 28) in the second category generally showed heterogeneity patterns with outlier studies showing contrasting effects. In particular, the forest plots showed both positive outlier studies with the potential to inflate the average genetic effect and negative outliers that might lower or change the direction of the mean genetic effect, a scenario where dropping either type of outlier would likely induce a false positive or negative signal (Supplementary Fig. S3 and Table S3). The final category comprised 31 lead variants where there was little evidence of overly influential outlier studies, consistent with heterogeneity patterns plausibly induced by biologically relevant differences (Supplementary Fig. S4 and Table S3). A general trend that emerged from inspecting heterogeneity patterns at the individual RE2C lead variants was that RE2C P-values became more extreme (i.e. smaller) with increasing levels of heterogeneity (Supplementary Table S2).
3.3 M Statistic, multi-variant heterogeneity analysis
A multi-variant heterogeneity analysis across the 382 RE2C lead variants revealed five significant outlier studies (14, 15, 16, 17 and 18) that systematically showed stronger than average effects (Bonferroni-corrected M statistic P-values <0.05) (Supplementary Fig. S5). A meta-regression of the M statistics found no evidence of systematic heterogeneity patterns due to differences in ancestry, age of CAD onset or CAD family history (Supplementary Table S4), design factors that were prominent in our previous analysis of the CARDIoGRAMplusC4D data (Magosi et al., 2017) using lead variants for 46 published loci (CARDIoGRAMplusC4D Consortium, 2015). We note that studies 15, 16, 17 and 18 showed relatively high genomic inflation () prior to study-level genomic correction, and a meta-regression of the M statistics confirmed varying levels of genomic inflation among contributing studies in the CARDIoGRAMplusC4D meta-analysis as a significant explanatory factor () (Supplementary Table S4).
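The aggregation behind an M statistic can be sketched as the mean of one study's SPREs across the variants examined. In the example below, the significance test is a simplifying assumption made for illustration only: it treats SPREs as independent standard normal deviates, so that M multiplied by the square root of the number of variants is approximately standard normal, and then applies a Bonferroni correction over studies. The published procedure (Magosi et al., 2017) may differ, and the function name `m_statistics` is hypothetical.

```python
import math

def m_statistics(spre_by_study):
    """Mean SPRE per study with an assumed normal-approximation test.

    spre_by_study: dict mapping a study label to its list of SPREs,
    one per variant.
    """
    n_studies = len(spre_by_study)
    results = {}
    for study, spres in spre_by_study.items():
        n = len(spres)
        m = sum(spres) / n                       # M statistic: mean SPRE
        z = m * math.sqrt(n)                     # assumes independent N(0, 1) SPREs
        p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided P-value
        results[study] = {"M": m,
                          "p_bonferroni": min(1.0, p * n_studies)}
    return results

toy_spres = {
    "study_A": [2.0, 1.8, 2.2, 2.1],    # systematically stronger than average
    "study_B": [0.1, -0.2, 0.05, 0.0],
    "study_C": [-0.4, 0.3, -0.1, 0.2],
}
m_results = m_statistics(toy_spres)
```

A study such as `study_A`, whose effects sit above the average at every variant, yields a large positive M and a small corrected P-value, the pattern reported for the outlier studies above.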
3.4 Replication in the UK Biobank study
We next explored whether genetic associations between lead variants at the novel RE2C loci and CAD risk could be replicated in a large-scale prospective study based on 296 525 participants (including 34 541 cases of coronary heart disease) from England, Scotland and Wales aged 45–69 years (van der Harst and Verweij, 2018). Only 24 of the 323 RE2C lead variants available in the UK Biobank GWAS were replicated (P_UKBB < 5×10⁻⁵, Supplementary Table S5). All but 3 of the replicated genetic signals had traditional FE meta-analysis P-values that were significant at genome-wide levels (P_FE < 5×10⁻⁸), and just 2 of the 24 showed marked heterogeneity (I² > 0.5) (Supplementary Table S5). Furthermore, 3 replicated variants included an influential outlier study in the CARDIoGRAMplusC4D meta-analysis; these 3 variants were also GWAS-significant in the FE meta-analysis (Supplementary Tables S2 and S5). These findings are consistent with Han and Eskin’s (2011) observation that the power of RE2 only exceeded that of FE meta-analysis for markedly heterogeneous variants.
Finally, a meta-regression model of M statistics for 323 RE2C lead variants in a combined CARDIoGRAMplusC4D and UK Biobank meta-analysis confirmed genomic control inflation as a potential source of systematic heterogeneity in genetic meta-analyses (Supplementary Table S6 and Fig. S6).
4 Discussion
Our application of the RE2C method to the CARDIoGRAMplusC4D meta-analysis dataset highlights the high sensitivity but low specificity of the method as a discovery tool for small-effect heterogeneous genetic associations. Consequently, the practical advantage afforded by the improved power of the RE2C method will likely lie in augmenting P-values for putative loci highlighted by traditional fixed and random-effects meta-analyses.
Beyond variants that would have otherwise been detected through a traditional FE meta-analysis approach, 21 lead variants that were associated with CAD under the RE2C method (P_RE2C < 5×10⁻⁸) were suggestively associated under the traditional FE method (5×10⁻⁸ < P_FE < 5×10⁻⁵); 2 of these (rs12509595, rs62181365) were part of the group of RE2C lead variants that replicated in the UKBB analysis, while the remaining 19 did not meet the replication threshold (P_UKBB < 5×10⁻⁵) (Supplementary Tables S2 and S5). Of the 24 significant RE2C variants that replicated in the UKBB analysis, a single lead variant (rs662799) on chromosome 11 showed neither significant nor suggestive association with CAD under the traditional FE method (Q-statistic P = 2.4×10⁻⁴, I² = 47%, P_FE = 1.28×10⁻⁴) (Supplementary Table S4 and Fig. S7). Notably, rs662799 maps to the APOA1-C3-A4-A5 locus, immediately upstream of APOA5, a locus that is strongly associated with higher triglyceride levels (TG) and lower HDL cholesterol (HDL-C) in individuals of East Asian and European ancestry () (Lu et al., 2016; Spracklen et al., 2017). APOA5 is a ‘well-known’ CAD-associated locus (e.g. rs964184; CARDIoGRAM Consortium et al., 2011); thus the rs662799 CAD association detected in this RE2C analysis represents a confident positive assignment that can guide future functional genomic experiments to identify the underlying causal variant(s).
Altogether, the majority (n = 331) of lead variants discovered in the CARDIoGRAMplusC4D meta-analysis by the RE2C random-effects method fell outside the scope of tentatively associated CAD risk variants (P_FE > 5×10⁻⁵) (Supplementary Table S2). Significant P-values under the RE2 and RE2C models can represent a non-null average genetic effect and/or considerable heterogeneity (Neupane et al., 2012). Therefore, the genome-wide significant RE2C P-values at the 277 lead variants where genetic associations with CAD were irreproducible in the UKBB dataset (P_UKBB > 5×10⁻⁵) and where P_FE > 5×10⁻⁵ likely signify substantial heterogeneity of genetic effects at the individual variants rather than novel CAD signals.
Small-effect genetic associations at variants with relatively high heterogeneity might elicit skepticism regarding the potential reproducibility of such associations. However, there are notable exceptions within the coronary disease landscape, such as rs2891168, the lead variant for the chromosome 9p21 CAD risk locus in the CARDIoGRAMplusC4D data (2015), which shows substantial heterogeneity (Q-statistic P 4.2×10⁻⁷; ) but no exceptional outlier studies (i.e. ), a heterogeneity pattern typified in Supplementary Figure S4. rs2891168 tags one of the strongest associated loci in CARDIoGRAMplusC4D (odds ratio = 1.2, P < 2×10⁻⁹⁸), a meta-analysis dataset heavily weighted by European (69%), South Asian (20%) and East Asian (7%) data (Supplementary Table S1). Other tagging variants for this locus in strong linkage disequilibrium have been convincingly validated to show associations of comparable strength with CAD risk in some non-European populations (e.g. India and Pakistan, Coronary Artery Disease (C4D) Genetics Consortium, 2011; Han Chinese, Lu et al., 2012; multi-ethnic cohorts from East Asia, Han et al., 2017) but not, for instance and to our knowledge, in populations of African ancestry. The latter are poorly represented in CARDIoGRAMplusC4D (African Americans form ~1% of the total data), limiting opportunities to judge the informativity or otherwise of individual loci in this meta-analysis dataset.
Based on our experience of applying RE2C to the CARDIoGRAMplusC4D dataset, we recommend as best practice that reports of small-effect heterogeneous loci discovered with this method be accompanied by forest plots and SPRE statistics to explore the distribution of genetic effect estimates across participating studies. This can highlight overly influential outlier studies with the potential to inflate genetic signals, prompting researchers to reflect upon the underlying data that gave rise to novel heterogeneous associations.
Acknowledgements
We are grateful to the CARDIoGRAMplusC4D collaborators (http://www.cardiogramplusc4d.org) for their support during this work.
Funding
This research was supported by a Wellcome Trust core award (090532/Z/09/Z and 203141/Z/16/Z, M.F.), The British Heart Foundation (FS/14/55/30806, J.C.H.), the BHF Centre of Research Excellence, Oxford (RE/13/1/30181, M.F. and J.C.H.), the Government of Botswana (L.E.M.), the European Union Seventh Framework programme (HEALTH-F2-2013-60145, A.G.) and the Wellcome Trust Institutional strategic support fund (M.F.). A.G. participates in the TriPartite Immunometabolism Consortium (TrIC) supported by the Novo Nordisk Foundation (NNF15CC0018486). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the article.
Conflict of Interest: none declared.
References
CARDIoGRAM Consortium et al. (
CARDIoGRAMplusC4D Consortium. (
Coronary Artery Disease (C4D) Genetics Consortium. (
Author notes
The authors wish it to be known that, in their opinion, Jemma C. Hopewell and Martin Farrall should be regarded as Joint Last Authors.

