Haixiang Zhang, Yinan Zheng, Zhou Zhang, Tao Gao, Brian Joyce, Grace Yoon, Wei Zhang, Joel Schwartz, Allan Just, Elena Colicino, Pantel Vokonas, Lihui Zhao, Jinchi Lv, Andrea Baccarelli, Lifang Hou, Lei Liu, Estimating and testing high-dimensional mediation effects in epigenetic studies, Bioinformatics, Volume 32, Issue 20, 15 October 2016, Pages 3150–3154, https://doi.org/10.1093/bioinformatics/btw351
Abstract
Motivation: High-dimensional DNA methylation markers may mediate pathways linking environmental exposures with health outcomes. However, there is a lack of analytical methods to identify significant mediators for high-dimensional mediation analysis.
Results: Based on sure independence screening and minimax concave penalty techniques, we use a joint significance test for the mediation effect. We demonstrate its practical performance using Monte Carlo simulation studies and apply the method to investigate the extent to which DNA methylation markers mediate the causal pathway from smoking to reduced lung function in the Normative Aging Study. We identify 2 CpGs with significant mediation effects.
Availability and implementation: R package, source code, and simulation study are available at https://github.com/YinanZheng/HIMA.
Contact: lei.liu@northwestern.edu
1 Introduction
Mediation analysis plays an important role in biomedical, behavioral, and psychosocial research studies, typically to understand the mechanism whereby change in one variable causes change in another (MacKinnon, 2008). Analytical methods for mediation analysis have been published extensively in the literature. For example, MacKinnon et al. (2002) compared several methods to test the statistical significance of the mediation effect via a Monte Carlo study; Wang and Zhang (2011) considered estimating and testing mediation effects in censored data; Taylor and MacKinnon (2012) investigated four applications of permutation tests to the single-mediator model; Pearl (2012) presented the causal mediation formula based on the counterfactual approach; Zhang and Wang (2013) introduced and compared four approaches to dealing with missing data in mediation analysis; Boca et al. (2014) developed a permutation approach for testing multiple mediators. For more details about mediation analysis, we refer to the review papers by Ten Have and Joffe (2010) and Preacher (2015). A comprehensive list of literature on mediation analysis is given in a webpage maintained by David Kenny (http://davidakenny.net/cm/mediate.htm).
Most of the above results are concerned with a single mediator or with multiple but low-dimensional mediators. To the best of our knowledge, there is very limited research on high-dimensional mediation effects. However, with the development of advanced data collection techniques, high-dimensional data have become increasingly common in many areas of scientific research. Our motivating example is an epigenome-wide DNA methylation study. In the methylation process, methyl groups are added to DNA at binding sites typically referred to as cytosine-phosphate-guanine (CpG) islands, which results in changes (typically down-regulation) to the expression of that DNA. The Illumina Infinium HumanMethylation450 BeadChip array is a widely used platform that measures DNA methylation levels at roughly 480K probes, resulting in high-dimensional data.
Specifically, our clinical interest lies in the effect of smoking (measured in pack-years) on lung function, and the extent to which this effect may be mediated by methylation changes. Prior studies have identified CpG sites associated with cigarette smoking in both epigenome-wide and gene-specific analyses, e.g. Gao et al. (2015), Harlid et al. (2014), Zeilinger et al. (2013). Identifying which markers mediate the effect of smoking on lung function is highly desirable from a public health perspective, as it can lead to improved techniques for early disease detection and prevention. However, there are currently no appropriate statistical methods for high-dimensional mediation analysis.
In this article, we adopt the multiple mediator model framework (Preacher and Hayes, 2008) and extend it to the high-dimensional setting. We then propose a method to estimate and test mediation effects in high-dimensional epigenetic studies. Our key ideas are: first, reduce the pool of potential mediators from a very large to a moderate number (i.e. less than the sample size); second, conduct variable selection with the minimax concave penalty (MCP; Zhang, 2010); third, carry out joint significance testing for mediation effects.
The structure of the article is given as follows. In Section 2, we introduce the high-dimensional mediation regression model and propose the estimation and inference procedures. In Section 3, we illustrate the performance of our proposed procedure via extensive simulation studies. In Section 4, we apply our method to study the mediating effect of high-dimensional DNA methylation markers on the causal effect of smoking on lung function in the Normative Aging Study. Section 5 presents some concluding remarks and discusses further research topics.
2 Model and methodology
A scenario with a single mediator between exposure and outcome (plotted similarly to Boca et al., 2014)
A scenario with high-dimensional mediators between exposure and outcome (plotted similarly to Boca et al., 2014)
Since the number of mediators p is much larger than the sample size n, traditional regression analysis fails to work in the third equation of (1). To tackle this problem, we first employ sure independence screening (SIS; Fan and Lv, 2008) to identify those Mk with large absolute effects on the response, which form an index set of retained mediators. We then perform variable selection using MCP. Details of the proposed procedure are as follows:
Step 1. (Screening). Use SIS (Fan and Lv, 2008) to identify the subset of mediators Mk that rank among the largest effects for the response Y. Of note, the methylation markers are standardized to ensure that the coefficients are on the same scale.
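The screening idea in Step 1 can be illustrated with a minimal sketch. It ranks standardized mediators by their absolute marginal correlation with the outcome and keeps the top `d`. This is a simplification for illustration: the helper names, the toy data and the choice `d = 5` are assumptions, and the procedure in the article may additionally adjust for the exposure and covariates.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def sis_screen(mediators, y, d):
    """Return the indices of the d mediators with the largest
    absolute marginal correlation with y.

    mediators: list of p columns, each a list of n values.
    d: number of mediators to retain (hypothetical tuning choice).
    """
    n = len(y)
    y_std = standardize(y)
    scores = []
    for k, col in enumerate(mediators):
        m_std = standardize(col)
        corr = sum(a * b for a, b in zip(m_std, y_std)) / n
        scores.append((abs(corr), k))
    scores.sort(reverse=True)
    return sorted(k for _, k in scores[:d])

# Toy data: 50 candidate mediators, only mediators 0 and 1 affect y.
rng = random.Random(0)
n, p = 200, 50
mediators = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * mediators[0][i] - 1.5 * mediators[1][i] + rng.gauss(0, 1)
     for i in range(n)]

kept = sis_screen(mediators, y, d=5)
print(kept)  # indices of the retained mediators
```

Because the ranking uses only marginal associations, its cost grows linearly in p, which is what makes it usable as a first pass before a penalized regression.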
The MCP has two tuning parameters: a regularization parameter and a parameter that determines the concavity of the penalty. The MCP procedure is implemented in the R package ncvreg (Breheny and Huang, 2011). We prefer MCP over other penalty functions, e.g. the elastic net (Zou and Hastie, 2005), since MCP can select the correct model with probability tending to 1 (Zhang, 2010). Further, in Step 1 we retain more mediators than the number suggested in Fan and Lv (2008) to increase the chance of identifying important mediators, since we need to consider both α and β simultaneously.
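To make the role of the two tuning parameters concrete, the following sketch evaluates the MCP of Zhang (2010) for a single coefficient and compares it with the lasso penalty. The parameter names `lam` (regularization) and `delta` (concavity) are labels chosen here for illustration, not necessarily the symbols used in the article.

```python
def mcp_penalty(t, lam, delta):
    """MCP value for a coefficient t (Zhang, 2010).

    lam   : regularization parameter (> 0)
    delta : concavity parameter (> 1); larger values behave more like the lasso
    """
    a = abs(t)
    if a <= delta * lam:
        # Quadratic region: the penalty grows more slowly than the lasso.
        return lam * a - a * a / (2.0 * delta)
    # Flat region: large coefficients receive a constant penalty.
    return 0.5 * delta * lam * lam

def lasso_penalty(t, lam):
    """Lasso (L1) penalty, shown for comparison."""
    return lam * abs(t)

for t in (0.0, 0.5, 1.0, 3.0, 10.0):
    print(t, round(mcp_penalty(t, lam=1.0, delta=3.0), 4), lasso_penalty(t, lam=1.0))
```

The flat region is the relevant design feature: once a coefficient exceeds `delta * lam` in absolute value it is no longer shrunk further, which reduces the bias that the lasso imposes on large effects.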
In the last equation of Model (6), Y depends on only one mediator Mk. However, as shown in Figure 2, multiple mediators contribute to the outcome Y. Preacher and Hayes (2008) described several advantages of the single multiple mediation Model (1) over the separate simple mediation Model (6). First, failure to adjust for the other p − 1 mediators could lead to either inefficiency (if mediators are independent of each other) or even bias (if mediators are correlated with each other). The latter issue may be more troublesome, as the correlation among probes close to one another can be as high as 0.6 in cell lines (Moen et al., 2013) and even stronger in our Normative Aging Study (NAS) data, which were collected directly from blood samples. Furthermore, including multiple mediators in one model allows us to determine to what extent the specific indirect effects are associated with mediators, as shown in our Application. Finally, it is not feasible to predict Y using only one mediator.
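For orientation, the contrast between the two models can be written out under the usual linear multiple-mediator formulation. This is a sketch under stated assumptions: the symbol γ for the direct effect and the exact form of the equations are assumed here, chosen to be consistent with the symbols c, ck, ek, ϵ2, α and β used in the text.

```latex
% Assumed form of the mediator and outcome equations of Model (1)
\begin{align*}
M_k &= c_k + \alpha_k X + e_k, \qquad k = 1, \dots, p, \\
Y   &= c + \gamma X + \sum_{k=1}^{p} \beta_k M_k + \epsilon_2 .
\end{align*}
% Substituting the mediator equations into the outcome equation
\begin{align*}
Y = \Bigl(c + \sum_{k=1}^{p} \beta_k c_k\Bigr)
  + \Bigl(\gamma + \sum_{k=1}^{p} \alpha_k \beta_k\Bigr) X
  + \sum_{k=1}^{p} \beta_k e_k + \epsilon_2 .
\end{align*}
% Assumed outcome equation of the separate simple mediation Model (6)
\begin{align*}
Y = c + \gamma X + \beta_k M_k + \epsilon .
\end{align*}
```

Under these assumptions the coefficient of X after substitution, γ + Σk αkβk, is the total effect, and each product αkβk is the indirect effect through mediator k. In the one-mediator outcome equation the omitted mediators are absorbed into the error term, which is the source of the inefficiency or bias described above.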
3 Simulation studies
In this section, we conduct simulation studies to assess the proposed procedure. We generate data from model (1), where X is generated from […]. The first 8 elements of α and β are the (αk, βk) pairs listed in the first eight rows of Table 1; the remaining elements of α and β are all 0. Let c = 1 and let the direct effect of X on Y be 0.5. ck is chosen as a random number from U(0, 2). ek and ϵ2 are generated from […] and N(0, 1), respectively. The sum of the indirect effects αkβk is 0.395 and the total effect is 0.5 + 0.395 = 0.895, so the percentage of the total effect mediated by methylation markers is 0.395/0.895 = 44%.
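The 44% figure can be checked directly from the simulation design. The short sketch below assumes that each pair in the first column of Table 1 is (αk, βk); since only the products enter, the order within each pair does not affect the result.

```python
# (alpha_k, beta_k) pairs for the first eight mediators, read from Table 1.
pairs = [
    (0.25, 0.20), (0.15, 0.25), (0.25, 0.35), (0.55, 0.40),
    (0.0, 0.50), (0.0, 0.50), (0.55, 0.0), (0.55, 0.0),
]

# Sum of the mediator-specific indirect effects alpha_k * beta_k.
indirect = sum(a * b for a, b in pairs)

total = 0.895              # total effect reported in the text
direct = total - indirect  # implied direct effect
prop = indirect / total    # proportion of the total effect that is mediated

print(round(indirect, 3), round(direct, 3), round(100 * prop))  # 0.395 0.5 44
```

Only the first four mediators contribute to the indirect effect; the remaining four have either αk = 0 or βk = 0 and serve as partial-association nulls.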
We fit the model by the proposed method and use the joint significance test to derive the P-values. We also compare our procedure with the naive joint significance test based on model (6), in which Bonferroni's adjustment uses the total number of methylation markers p. In the above setting, consider the index set of mediators declared significant. Following Dezeure et al. (2015) and using the notation of (5), we define the FWER as the probability that this set contains at least one mediator with no true mediation effect. Similarly, we define the power as the average probability that a true mediator is included in this set. Table 1 presents the estimator and mean square error (MSE) for the indirect effect αkβk. The FWER and power are reported in Tables 2 and 3, respectively. All simulations are based on 500 replications, with sample sizes n = 100, 200 and 300.
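A joint significance test declares a mediator significant only if both of its paths (exposure to mediator, and mediator to outcome) are significant. The sketch below is one plausible reading, assuming normal z statistics for the two path coefficients and a Bonferroni adjustment by the number of mediators tested; the function names and the example statistics are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided normal P-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def joint_significance(z_alpha, z_beta, n_tests):
    """Bonferroni-adjusted joint significance P-value for one mediator.

    z_alpha : z statistic for the exposure -> mediator path
    z_beta  : z statistic for the mediator -> outcome path
    n_tests : number of mediators tested (assumed Bonferroni factor)
    """
    p_alpha = min(1.0, two_sided_p(z_alpha) * n_tests)
    p_beta = min(1.0, two_sided_p(z_beta) * n_tests)
    # The mediator is only as significant as its weaker path.
    return max(p_alpha, p_beta)

# Hypothetical z statistics for three mediators retained after selection.
candidates = {"M1": (4.5, 5.0), "M2": (0.3, 6.0), "M3": (3.9, 1.0)}
for name, (za, zb) in candidates.items():
    p_joint = joint_significance(za, zb, n_tests=len(candidates))
    print(name, p_joint < 0.05)
```

The choice of `n_tests` is where the two procedures compared in this section differ: adjusting by the small number of mediators that survive screening and selection is far less severe than adjusting by all p markers.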
Table 1. Estimator and MSE (in parentheses) for the indirect mediation effect αkβk
| (αk, βk) | p = 1000, n = 100 | p = 1000, n = 200 | p = 1000, n = 300 | p = 10 000, n = 100 | p = 10 000, n = 200 | p = 10 000, n = 300 |
|---|---|---|---|---|---|---|
| (0.25,0.20) | 0.0256 | 0.0415 | 0.0475 | 0.0154 | 0.0222 | 0.0298 |
| | (0.0226) | (0.0206) | (0.0159) | (0.0160) | (0.0146) | (0.0156) |
| (0.15,0.25) | 0.0191 | 0.0313 | 0.0365 | 0.0135 | 0.0187 | 0.0249 |
| | (0.0174) | (0.0177) | (0.0139) | (0.0136) | (0.0126) | (0.0121) |
| (0.25,0.35) | 0.0511 | 0.0764 | 0.0866 | 0.0311 | 0.0454 | 0.0577 |
| | (0.0331) | (0.0253) | (0.0217) | (0.0247) | (0.0197) | (0.0229) |
| (0.55,0.40) | 0.1376 | 0.1876 | 0.2138 | 0.0844 | 0.1187 | 0.1461 |
| | (0.0622) | (0.0492) | (0.0351) | (0.0552) | (0.0390) | (0.0448) |
| (0,0.50) | −0.0010 | 0.0008 | −0.0026 | 0.0010 | −0.0009 | 0.0004 |
| | (0.0282) | (0.0270) | (0.0224) | (0.0196) | (0.0163) | (0.0156) |
| (0,0.50) | 0.0005 | −0.0006 | 0.0005 | −0.0003 | −0.0009 | −0.0010 |
| | (0.0269) | (0.0248) | (0.0231) | (0.0173) | (0.0167) | (0.0153) |
| (0.55,0) | −0.0005 | −0.0001 | 0.0001 | −0.0002 | 0.0000 | 0.0000 |
| | (0.0077) | (0.0094) | (0.0034) | (0.0054) | (0.0000) | (0.0000) |
| (0.55,0) | −0.0001 | 0.0008 | 0.0000 | −0.0001 | 0.0000 | 0.0001 |
| | (0.0109) | (0.0081) | (0.0000) | (0.0061) | (0.0034) | (0.0035) |
| (0,0) | −0.0001 | −0.0000 | 0.0000 | 0.0000 | 0.0000 | −0.0000 |
| | (0.0021) | (0.0004) | (0.0002) | (0.0015) | (0.0004) | (0.0002) |
Table 2. FWER at significance level 0.05
| Method | p = 1000, n = 100 | p = 1000, n = 200 | p = 1000, n = 300 | p = 10 000, n = 100 | p = 10 000, n = 200 | p = 10 000, n = 300 |
|---|---|---|---|---|---|---|
| Proposed | 0.0380 | 0.0360 | 0.0240 | 0.0240 | 0.0140 | 0.0200 |
| Naive | 0 | 0 | 0 | 0 | 0 | 0 |
Table 3. Power at significance level 0.05
| Method | p = 1000, n = 100 | p = 1000, n = 200 | p = 1000, n = 300 | p = 10 000, n = 100 | p = 10 000, n = 200 | p = 10 000, n = 300 |
|---|---|---|---|---|---|---|
| Proposed | 0.2635 | 0.6735 | 0.8845 | 0.1325 | 0.4445 | 0.6990 |
| Naive | 0.0595 | 0.2770 | 0.4630 | 0.0325 | 0.1770 | 0.3770 |
From the results in Table 1, we can see that the estimators are close to the true values of the indirect effect and that the MSE decreases as the sample size n increases. Table 2 indicates that the proposed joint significance test controls the type I error reasonably well, although it is slightly conservative. In contrast, the naive procedure is overly conservative, with an empirical FWER of 0 in every setting. In Table 3, our method has better power than the naive method. Therefore, our method is preferred in practice.
4 An application
Methylation markers are often considered potential mediators between exposures and health outcomes. For example, Bind et al. (2014) found that the effect of air pollution on coagulation and inflammation was significantly mediated by several methylation markers in the Normative Aging Study. However, they only considered 5 specific methylation markers. An epigenome-wide mediation analysis will allow for more thorough and systematic identification of all the possible mediation effects due to DNA methylation.
Our data come from the US Department of Veterans Affairs Normative Aging Study, an ongoing longitudinal cohort of elderly, predominantly white American veterans. In 1963, 2280 men aged 21–80 years and free of hypertension or other chronic conditions were enrolled. Between January 1, 1999 and December 31, 2013, 686 were randomly selected and had blood samples profiled using the Illumina Infinium 450K BeadChip DNA methylation array. A total of 500 ng of DNA was used to perform bisulfite conversion. The DNA methylation level was calculated as M values (logit of methylated probe intensity), which approximate a normal distribution (Du et al., 2010). Batch effects and potential confounding effects of blood cell subtype were estimated by Houseman's method (Houseman et al., 2012) and corrected for using ComBat (Johnson et al., 2007). We include 484 548 probes in the analysis.
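As a small illustration of the M-value transformation mentioned above, the sketch below follows the definition in Du et al. (2010): the base-2 log ratio of methylated to unmethylated intensity, which is equivalent to a base-2 logit of the Beta value. The function names and the example intensities are hypothetical, and the offset of 1 is the small stabilizing constant commonly used with that definition.

```python
import math

def m_value(methylated, unmethylated, offset=1.0):
    """M value from probe intensities: log2 ratio of methylated to
    unmethylated signal, with a small offset to stabilize low intensities."""
    num = max(methylated, 0.0) + offset
    den = max(unmethylated, 0.0) + offset
    return math.log2(num / den)

def beta_to_m(beta):
    """Convert a Beta value in (0, 1) to an M value via the base-2 logit."""
    return math.log2(beta / (1.0 - beta))

print(m_value(1023, 255))  # methylated signal about 4x unmethylated
print(beta_to_m(0.5))      # half-methylated site maps to M = 0
print(beta_to_m(0.8))
```

Beta values are bounded between 0 and 1 and are compressed near the boundaries, whereas M values are unbounded and closer to homoscedastic, which is why they suit the linear models used here.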
We are interested in how these methylation markers mediate the relationship between smoking and lung function. Lung function is measured by four outcomes: FEV1 (forced expiratory volume in 1 second), FVC (forced expiratory vital capacity), FEV1/FVC, and MMEF (maximum mid expiratory flow). We conduct separate mediation analysis for each measure. We exclude subjects with lung-related diseases, e.g. asthma, emphysema and COPD, resulting in a sample size of 290. Smoking status and frequency were assessed via questionnaire between 1999 and 2006, defined as ‘baseline’ for our analyses. Methylation was measured at baseline, and outcomes were measured between 2001 and 2006 (e.g. 2+ years post-baseline for each subject), allowing us to ensure the proper temporal relationship (exposure → methylation → lung function). Our analysis also adjusts for age, height, and weight in each equation of model (1).
Of note, in the Normative Aging Study, the correlations between the M's and Y are much stronger than those between X and the M's. Therefore, in Step 1 we also add the top CpGs in the path from X to M to increase the chance of identifying significant mediators. In the second step we run variable selection on the screened CpGs. In Step 3 we use the joint significance test to derive the P-values. Since smoking reduces lung function, we filter out mediators with a positive indirect effect (αkβk > 0).
In Table 4 we list the summary results for each of the four outcomes. We identify 2 CpGs as mediators, each associated with at least one lung function outcome. Specifically, cg05575921 (in the gene region of AHRR), whose methylation has been shown to be a sensitive marker of smoking history (Gao et al., 2015; Harlid et al., 2014), is associated with three measures of lung function. Another CpG, cg24859433, in the intergenic region 6p21.33, is associated with MMEF (Ambatipudi et al., 2016; Zeilinger et al., 2013). Therefore, our Epigenome-Wide Association Study (EWAS) results are supported by the current literature on the potential roles of these CpGs in smoking and lung function, demonstrating the validity of our approach.
Table 4. Estimators and corrected P-values for significant mediation effects
| Outcome | CpG | CHR | Gene name | Estimate of αk | Estimate of βk | P-value | % TE |
|---|---|---|---|---|---|---|---|
| FEV1 | cg05575921 | 5 | AHRR | −0.0231 | 0.1141 | 0.0003 | 50.5 |
| FVC | cg05575921 | 5 | AHRR | −0.0231 | 0.1327 | 0.0017 | 57.5 |
| FEV1/FVC | cg05575921 | 5 | AHRR | −0.0231 | 0.6065 | 0.0453 | 38.9 |
| MMEF | cg24859433 | 6 | * | −0.0117 | 13.366 | 0.0324 | 15.9 |
‘*’ denotes CpGs in the intergenic region; ‘% TE’ denotes the percentage of the total effect mediated by the CpG.
We are also interested in the relative magnitude of the total effect mediated through methylation markers, defined as the ratio of the indirect effect αkβk to the total effect for each methylation marker. The results are listed in the last column of Table 4. About 50% of the total effect of smoking on FEV1 (or FVC) and 40% of that on FEV1/FVC is mediated through cg05575921, and 16% of the total effect of smoking on MMEF is mediated through cg24859433. We note that the percentage of the total effect mediated by methylation markers for FEV1 is close to that in the simulation setting (44%), demonstrating the applicability of our method to real scenarios. Interventions on these CpGs could be explored to modify lung function among smokers. Finally, we apply the naive joint significance test to the NAS data. However, it fails to identify any significant mediators.
5 Conclusion and remarks
We developed a new method to estimate mediation effects with high-dimensional mediators. We used sure independence screening and the MCP, followed by the joint significance test for mediation effects. We illustrated the proposed method via simulation studies and a real data example. We identified 2 CpGs which could mediate the effect of smoking on lung function. Our method can be widely used in high-dimensional DNA methylation analysis from population studies.
Several other issues may complicate the testing of high-dimensional mediation effects and will be studied in future research, e.g. confounders (Li et al., 2007), non-linearity (Albert, 2012) and measurement error (Valeri et al., 2014; Zhao and Prentice, 2014). In particular, for measurement error, two classical correction approaches, the method of moments and regression calibration (Valeri et al., 2014), may be employed in the case of high-dimensional mediators.
In reality, many exposures or risk factors may act simultaneously on DNA methylation. For example, smoking and physical activity can both affect lung function through DNA methylation. If these exposures are independent of each other, we can simply add the other risk factor to Model (1). For example, the second equation of Model (1) would then have a total of 2p parameters. The estimation and inference can be carried out similarly. However, complications arise when these factors are correlated or have interaction effects. It is of further interest to incorporate multiple exposures into the mediation analysis of high-dimensional methylation markers.
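As a sketch of the two-exposure extension described above, and assuming the mediator equation of Model (1) has the usual linear form, the second equation with two independent exposures X1 and X2 could be written as follows. The subscripted symbols are illustrative assumptions.

```latex
\[
M_k = c_k + \alpha_{k1} X_1 + \alpha_{k2} X_2 + e_k, \qquad k = 1, \dots, p .
\]
```

Each mediator then carries two exposure coefficients, αk1 and αk2, which gives the 2p parameters mentioned in the text, and the indirect effects of the two exposures through mediator k would be αk1βk and αk2βk, respectively.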
Acknowledgements
We would like to thank the Editor, the Associate Editor and three anonymous reviewers for their helpful comments and suggestions, which helped us improve the article substantially.
Funding
This work was supported by AHA 14SFRN20480260, 12GRNT12070254 and National Institute of Environmental Health Sciences (R01ES021357, R01ES021733 and R01ES015172), National Natural Science Foundation of China (11301212, 11401146), and China Postdoctoral Science Foundation (2014M550861). The VA Normative Aging Study is supported by the Cooperative Studies Program/Epidemiology Research and Information Center of the US Department of Veterans Affairs.
Conflict of Interest: none declared.
Author notes
Associate Editor: Oliver Stegle


