Cohort Profile: Pregnancy And Childhood Epigenetics (PACE) Consortium

Janine F. Felix, University Medical Center Rotterdam Bonnie R. Joubert, National Institute of Environmental Health Sciences Andrea A. Baccarelli, Columbia University Gemma C. Sharp, University of Bristol Catarina Almqvist, Karolinska Institutet Isabella Annesi-Maesano, Institut Pierre Louis d'Epidémiologie et de Santé Publique Hasan Arshad, University of Southampton Nour Baiz, Institut Pierre Louis d'Epidémiologie et de Santé Publique Marian J. Bakermans-Kranenburg, Leiden University Kelly M. Bakulski, University of Michigan

Why was the Consortium set up?
Epigenetics refers to mitotically heritable changes to the DNA that do not affect the DNA sequence but can influence its function. Currently, DNA methylation is the most studied epigenetic phenomenon in large populations. It entails the binding of a methyl group, mainly to positions in genomic DNA where a cytosine is located next to a guanine, a cytosine-phosphate-guanine (CpG) site (Figure 1). DNA methylation at CpG sites can influence gene expression by altering the DNA's three-dimensional structure and interacting with methyl-binding proteins, consequently affecting the binding of the gene transcription and chromatin-modifying machinery. There are approximately 28 million CpG sites in the human genome. DNA methylation is a dynamic process that can be influenced by genetic factors, as well as by environmental factors such as diet, air pollution, toxicants or smoking. [1][2][3][4] Hence, DNA methylation may be seen as linking the genome to the environment with respect to health and disease. Early development is a period of profound changes in DNA methylation and may, as such, be a critical period for environmentally-induced DNA methylation changes. 4 This period is therefore of specific interest for DNA methylation studies in relation to specific exposures and long-term health outcomes. 1,4-6
DNA methylation modifications in early life represent an important potential mechanism for studies on the developmental origins of health and disease (DOHaD). The DOHaD hypothesis suggests that exposure to an adverse environment in fetal life or early childhood leads to permanent changes in organ structure or function, which may have effects on later life health. 7,8 Many associations of early life adverse exposures, such as maternal obesity, smoking, air pollution and suboptimal diet, with common diseases throughout the life course have been described. [9][10][11][12] Long-lasting DNA methylation modifications may be an important mechanism linking early life exposures with outcomes in later life. 13 Besides having a potential mechanistic role, DNA methylation may also serve as a biomarker of exposures or outcomes, even without it having a direct causal role in the process. 3,14,15 For example, an environmental factor may cause both a change in phenotype and a change in DNA methylation, without a causal relation between the two. Also, a disease could cause a change in DNA methylation, rather than the other way around. 15 The ability of methylation signals to serve as strong biomarkers of some exposures, such as maternal smoking in pregnancy, may complicate inference about their role in mediating health outcomes; measurement error correction may help in this regard. 16
Various pregnancy, birth and childhood studies have recently initiated research on the role of DNA methylation in the response to environmental exposures and development of health outcomes. Individual studies usually have sample sizes too small to address this issue, but it can be studied in joint efforts of prospective cohort studies starting from early life onwards. 1,17 The potential of collaborative efforts between large-scale prospective cohort studies has been demonstrated by the success of recent genome-wide association studies (GWAS), which have shed light on the genetic background of common diseases as well as their risk factors. These GWAS are characterized by state-of-the-art genome-wide agnostic approaches in which millions of genetic variants are related to a particular health outcome, usually in the setting of large consortia combining the results of multiple studies, using meta-analysis. Common genetic variants have been identified that are related to birthweight, childhood obesity, respiratory phenotypes, atopic dermatitis and behavioural outcomes among others. [18][19][20][21][22][23][24][25]
In line with these approaches, recent developments enable analysis of hundreds of thousands of DNA methylation markers across the genome on a single array. 26,27 The high-throughput and cost-effective nature of these arrays has made it possible for studies to measure DNA methylation across the genome ('epigenome-wide DNA methylation') in relatively large sample sizes. These data can be used in epigenome-wide association studies (EWAS) to evaluate associations of DNA methylation at specific sites or regions of the genome with determinants and outcomes of health and disease. EWAS in pregnancy, birth or child cohorts specifically enable exploration of associations of early life exposures with DNA methylation levels in children, and of DNA methylation levels with specific growth, development and health outcomes. Recent study-specific EWAS have shown associations of DNA methylation levels in offspring with birthweight, maternal body mass index and maternal smoking. [28][29][30][31] Large sample sizes are required to achieve optimal power in analyses of so many genomic sites, especially if the prevalence of the exposure or outcome under study is low. Collaboration between studies and combined meta-analysis of the available data are needed to optimize the use of resources and to increase the likelihood of detecting DNA methylation differences underlying the associations of early life exposures and health outcomes.
This paper describes the global Pregnancy And Childhood Epigenetics (PACE) Consortium which, to date, brings together 39 studies with over 29 000 samples and DNA methylation data in pregnant women, newborns and/or children. Besides strongly increased power to detect associations, bringing studies together in the PACE Consortium for meta-analysis greatly decreases the risk of false-positive associations. The larger power also enables more detailed studies into potential causal roles of methylation, using a mendelian randomization approach for which large sample sizes are typically needed. In addition, a number of studies have measured DNA methylation at multiple time points from birth through childhood and/or in adolescence, which enables investigation into the persistence of differential DNA methylation signals over time. Also, the availability of information from studies with participants from various backgrounds in terms of ethnicity, location and living environment enables testing of identified associations across different settings and evaluation of heterogeneity of effects across study populations.
The primary aim of the PACE Consortium is to identify differences in DNA methylation in relation to a wide range of exposures and outcomes pertinent to health in pregnancy and childhood through joint analysis of DNA methylation data. Secondary aims of the Consortium are to perform further functional annotation-based analyses, to attempt to assess causality of DNA methylation differences for child health phenotypes, to contribute to methodological development and to exchange knowledge and skills.
Who is in the Consortium?
In June 2013, an international group of studies focused on maternal and child health met at the U.S. National Institute of Environmental Health Sciences to organize an EWAS meta-analysis on maternal smoking in pregnancy and DNA methylation in newborns and children. 32 This marked the start of the PACE Consortium. The success of this initial effort led to the expansion of the Consortium, with additional research groups joining and further exposures and outcomes being studied. The PACE Consortium is modelled after successful GWAS consortia, in which many PACE investigators already participated, including the Early Growth Genetics (EGG) Consortium, the Early Genetics and Lifecourse Epidemiology (EAGLE) Consortium and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. 33 Currently, the PACE Consortium includes 39 studies with genome-wide DNA methylation data from pregnancy, newborn or childhood samples and information on at least one of the exposures or outcomes of interest. A list of studies currently involved in the PACE Consortium with basic study information is shown in Table 1.
The Consortium structure is purposefully kept simple. The work in the Consortium is strongly researcher-driven. Any member can propose an analysis. Projects are often co-led by two or more researchers from different studies. This supports collaboration and exchange of knowledge and skills for both junior and senior researchers. On most projects, junior researchers, often PhD students or postdoctoral students, take the lead under the supervision of a more experienced, senior researcher from their own or another participating institution. The lead group operates as the meta-analysis centre for a specific project. For each project, a working group is formed and studies can opt into or opt out of that specific project.
Analyses are performed according to a predefined analysis plan, which contains inclusion and exclusion criteria, phenotype definitions, covariates and statistical models, usually logistic or robust linear regression models. Each cohort performs its own quality control and normalization of the EWAS data. We have shown a very limited influence of different normalization methods between cohorts on the results of EWAS meta-analyses. 32 Each cohort analyses its own data according to the analysis plan, after which the summary results are shared with the meta-analysis centre. These summary results include the effect estimate, standard error, P-value and included sample size for each CpG analysed. Data exchange is organized for each project separately, usually through secure university-based upload servers. In general, meta-analysis of summary results is the preferred approach and no individual-level data are shared between the centres. Integrated data approaches may be considered, conditional on ethical and legal agreements, which may differ for each individual study, but such approaches have not been used so far. Subsequently, the meta-analysis centre performs quality control of the summary results files and meta-analyses all datasets, with specific 'omics' meta-analysis software, such as METAL. 34 Standard quality controls include inspection of the distribution of effect estimates and standard errors across cohorts, and Manhattan plots of individual cohort and meta-analysis results. The full process of quality control and meta-analysis is independently repeated by an analyst from one of the other participating studies (the 'second centre') as a quality control measure.
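To make the summary-statistics workflow described above concrete, the following is a minimal sketch of how per-cohort effect estimates and standard errors for each CpG could be pooled. It assumes a fixed-effects, inverse-variance weighted model, which is one common choice and is not stated in the text as the scheme used in any particular PACE project. The function name, the CpG identifiers and the numbers are hypothetical and serve only as illustration.

```python
import math
from collections import defaultdict


def meta_analyse(cohort_results):
    """Pool per-cohort summary statistics with inverse-variance weights.

    cohort_results: list of dicts, one per cohort, mapping a CpG identifier
    to a tuple (effect_estimate, standard_error).
    Assumption: fixed-effects model; this is an illustrative choice.
    """
    # Per CpG: [sum of weight * effect, sum of weights, number of cohorts]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for results in cohort_results:
        for cpg, (beta, se) in results.items():
            weight = 1.0 / se ** 2
            acc = sums[cpg]
            acc[0] += weight * beta
            acc[1] += weight
            acc[2] += 1

    pooled = {}
    for cpg, (weighted_beta, total_weight, n_cohorts) in sums.items():
        beta = weighted_beta / total_weight
        se = math.sqrt(1.0 / total_weight)
        z = beta / se
        # Two-sided P-value from the standard normal distribution.
        p = math.erfc(abs(z) / math.sqrt(2.0))
        pooled[cpg] = {"beta": beta, "se": se, "z": z, "p": p,
                       "n_cohorts": n_cohorts}
    return pooled


# Hypothetical summary results from two cohorts (not real data).
cohort_a = {"cg_hypothetical_1": (-0.02, 0.005),
            "cg_hypothetical_2": (0.01, 0.02)}
cohort_b = {"cg_hypothetical_1": (-0.03, 0.01)}

pooled = meta_analyse([cohort_a, cohort_b])
for cpg, stats in sorted(pooled.items()):
    print(cpg, round(stats["beta"], 4), round(stats["se"], 4), stats["n_cohorts"])
```

A CpG reported by only one cohort simply keeps that cohort's estimate, which mirrors the fact that cohorts may contribute different sets of probes after their own quality control.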
As a general rule, as many studies as possible are included in the discovery meta-analysis to increase power to discover new associated DNA methylation sites. Replication of findings is then pursued in further studies that were unable to participate in the discovery meta-analysis, if available. After the discovery meta-analysis is finished, further work is done in terms of validation and interpretation of the results, including enrichment/pathway/functional network analyses using publicly available resources, and methylation-expression analyses (Figure 2). Often, such follow-up work involves a look-up of the main findings in children of different ages than in the main analysis. For example, after a discovery analysis in cord blood samples, a look-up of the findings in childhood and adolescent samples may be done to study persistence of the identified signals.
Analyses in the PACE Consortium are performed collaboratively by the participating centres. Logistics are organized by the National Institute of Environmental Health Sciences in Research Triangle Park, NC, USA. All ongoing and proposed analyses are discussed in bi-weekly conference calls, during which project leaders give updates. In addition, individual analysis groups may have separate conference calls if needed.
How often have they been followed up?
The PACE Consortium brings together a large number of cohorts, each of them with cohort-specific protocols (Table 1, Supplementary Table 1). Most studies have ongoing data collection and follow-up. Many of the cohorts have multiple follow-up time points from fetal life into childhood, and several have follow-up into adolescence or early adulthood. Most have information on maternal exposures during pregnancy, including maternal smoking and body mass index. 31,32 A number of studies also collected information on more specific exposures, such as air pollution. 35 All cohorts have collected information on child physical and/or mental development. Some studies have a particular focus, such as cleft lip and palate (NCL) or autism (SEED I), but most are population-based cohorts collecting a vast amount of data on many domains. These include anthropometric, cardiometabolic, neurodevelopmental and respiratory measurements, as well as childhood diseases. Further details of data collection waves, follow-up and biological sample collection in all studies can be found in Supplementary Table 1.
The PACE Consortium is focused around the common methylation platform. Studies commit to the PACE Consortium on a project-by-project basis. They are not necessarily involved in PACE with all their data, but rather decide per project whether or not they will participate. It is therefore possible that a particular study is not involved in a PACE project on a specific topic, for example because they decide to pursue a single-study project or because they are involved in another collaboration on that topic. In such cases, studies opt out of the project and are not involved until the work is published. With ongoing sample collection and data expansion in each study, more DNA methylation and phenotype measures will become available in the future.

What has been measured?
All studies involved in the PACE Consortium have common measures of DNA methylation. Currently the platform used by the group is the Illumina 450 K HumanMethylation array, the most widely used array in large-scale human studies (Illumina Inc., San Diego, USA). 26 Recently a larger, compatible array (850 K EPIC) was developed. 27 New studies using this array can be included in the Consortium in the future. The 450 K array includes around 485 000 DNA methylation sites, covering less than 2% of all sites across the genome. It is targeted at genes and CpG islands, and sites were chosen based on advice of an international group of DNA methylation experts. 26
Across studies, a vast number of exposures and outcomes are available and studies usually participate in multiple analyses. The main exposures that the PACE Consortium currently focuses on are those occurring during pregnancy; the main outcomes are childhood health parameters and diseases. An overview is given in Figure 3.
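As a quick check of the array coverage quoted in this section, the short calculation below divides the approximate number of sites on each array by the approximately 28 million CpG sites in the human genome mentioned earlier. The value used for the EPIC array is the nominal 850 000 implied by its '850 K' name, which is an assumption rather than an exact probe count.

```python
# Approximate figures taken from the text; the EPIC count is nominal.
TOTAL_CPG_SITES = 28_000_000
ARRAY_450K_SITES = 485_000
ARRAY_EPIC_SITES_NOMINAL = 850_000  # assumption based on the "850 K" label

coverage_450k = ARRAY_450K_SITES / TOTAL_CPG_SITES
coverage_epic = ARRAY_EPIC_SITES_NOMINAL / TOTAL_CPG_SITES

print(f"450 K array: {coverage_450k:.1%} of CpG sites")   # 1.7%
print(f"EPIC array (nominal): {coverage_epic:.1%} of CpG sites")  # 3.0%
```

Both values are consistent with the statement that current arrays interrogate only a small percentage of all CpG sites.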
Recently, working groups have formed around methodological issues, such as blood cell composition adjustment and evaluation of methods for identifying differentially methylated regions. Many of the studies involved in the PACE Consortium also have GWAS data and other types of 'omics' data, including transcriptomics and metabolomics if available, creating the possibility for integrative omics analyses. The availability of GWAS data enables analyses of associations of genetic variants with DNA methylation, as well as analyses to assess the potential influence of genetic variation on methylation variance, the possible causal role of DNA methylation differences using a two-step mendelian randomization approach, and adjustment for genetic markers of ancestry. [36][37][38]
What has it found?
A number of the cohorts involved in the PACE Consortium have published cohort-specific EWAS on various phenotypes, including maternal smoking, maternal body mass index, maternal stress and child birthweight and sex, predating PACE projects on these topics. 28-31,39-43 Some studies have involved collaborations between a few of the PACE cohorts. [44][45][46][47] In addition, members of the PACE Consortium have contributed to methodological developments in the field, such as evaluation of normalization methods, aspects of study design, and analysis software development. [48][49][50][51][52][53][54]
Multiple consortium projects are currently being analysed or prepared. Here, we highlight the first three published reports. The first large PACE Consortium meta-analysis examined maternal smoking in relation to cord blood DNA methylation. 32 This meta-analysis of EWAS was on sustained maternal smoking during pregnancy in 13 cohorts, with a total of 6685 newborns. There were 6073 differentially methylated CpG sites in relation to maternal smoking during pregnancy, after multiple testing correction using a false discovery rate of 5%, of which half had not previously been identified for their association with either maternal smoking during pregnancy or smoking in adults. This analysis showed the increased power leveraged by large consortium analysis. Analyses of older children (five cohorts, N = 3187) indicated that most of these DNA methylation signals observed at birth persist into childhood, but are attenuated. A number of the differentially methylated CpG sites were in or near genes with known roles in diseases associated with maternal smoking, such as orofacial clefts and asthma. We also found enrichment in developmental processes.
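The multiple testing correction at a false discovery rate of 5% mentioned above can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, a widely used way to control the false discovery rate; the text does not state which procedure was applied, so this is an assumption. The function name and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where a test is declared significant.

    Assumption: Benjamini-Hochberg step-up procedure, shown for illustration.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical P-values for four CpG sites (not real data).
example_p = [0.02, 0.021, 0.022, 0.9]
example_flags = benjamini_hochberg(example_p, fdr=0.05)
print(example_flags)
```

In this example the smallest P-value (0.02) exceeds its own rank-specific threshold of 0.0125, but it is still declared significant because a larger P-value passes its threshold; this is the step-up property that distinguishes the procedure from a per-test cut-off.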
The second report was a meta-analysis of the association of maternal plasma folate levels during pregnancy with DNA methylation among 1988 newborns from two cohorts. Differential methylation of 443 CpG sites related to 320 genes was found, with most of these genes having no known function in folate biology. 44 The third, most recent meta-analysis assessed the association of prenatal air pollution exposure with cord blood DNA methylation in four cohorts, spanning 1508 participants. 55 It showed that exposure to nitrogen dioxide during pregnancy was associated with differential offspring DNA methylation in mitochondria-related genes, as well as in several genes involved in antioxidant defence pathways. Some of these associations also persisted to older ages. 55
What are the main strengths and weaknesses?

Main strengths
Although individual-cohort analyses can reveal associated DNA methylation sites, joining forces in meta-analyses within a consortium brings significant benefits. First, it substantially increases sample size, facilitating the discovery of novel loci and optimizing the use of resources. Second, it offers the potential for analyses of DNA methylation signals at various ages throughout infancy, childhood and adolescence. Third, this setting makes it possible to compare effects between different populations and ethnicities. Fourth, a consortium setting allows replication of findings across studies, thus decreasing the publication of false-positive results from individual studies. Fifth, EWAS analyses in pregnancy, birth and child cohort studies offer an enormous potential to shed light on mechanisms underlying the associations of early, fetal and childhood exposures with later life health and disease, and on a potential role of DNA methylation as a biomarker of exposures or outcomes. The longitudinal data collection from early life onwards enables us to study the role of DNA methylation in life course health trajectories. Sixth, the experience and diverse backgrounds of the PACE investigators, including epidemiologists, statisticians, geneticists, clinicians, bioinformaticians and biologists, enable sharing of methods and analytical code, quicker solutions to methodological issues and easier exchange of knowledge and skills. The experience of many PACE investigators in existing consortia, often with the same partner studies, was of great benefit at the start of the PACE Consortium. Issues that may have posed challenges to earlier consortia, such as communication between studies, harmonizing analytical methods, and authorship strategies, were hence part of the 'basic skill set' of this Consortium. 33 Seventh, the Consortium also offers outstanding networking opportunities for students, postdocs and junior investigators in their career development.
Based on recent experience in GWAS consortia, we expect that the PACE structure can be a springboard for both junior and senior investigators to apply for funding for new projects, including those that require additional analyses of samples, exposures or outcomes. Similar to many other consortia, the PACE Consortium has no structural or central funding other than the modest administrative support from the National Institute of Environmental Health Sciences for conference calls, the website and the three in-person meetings held to date.

Main weaknesses
Analyses of epigenome-wide DNA methylation face particular methodological challenges. First, the analyses in the PACE Consortium are mainly performed on DNA extracted from blood samples, which are easily collected in population-based settings. However, each cell type may have its own unique methylation profile. Thus, DNA methylation in leukocytes does not necessarily represent DNA methylation in other tissues that may be more relevant for certain phenotypes, for example lung tissue when studying the association of DNA methylation and asthma. This feature of DNA methylation studies in blood poses a challenge in the interpretation of the findings. As cohort studies involving young children will generally not be able to collect more specific tissue samples, with the exception of buccal cells, collaborations will be sought with other partners in the future to be able to address tissue specificity. A subset of PACE cohorts have DNA methylation measured in placenta. Second, the distribution of blood cell subtypes in blood samples varies in response to a range of internal and external factors, such as infection, diseases and smoking. As DNA methylation is cell-type specific, an observed association of an exposure or an outcome with DNA methylation may be the result of changes in blood cell composition, rather than a representation of a true association. Adjustment for blood cell composition in studies using cord blood data is a challenge. So far, we have used the regression calibration method of Houseman and colleagues, which until recently has been constrained to the first available reference panel of 450 K data in white blood cell subtypes of six adult males. This panel has been shown to be suboptimal in estimating blood cell proportions in DNA from newborns. 49,56,57 Recently, PACE Consortium investigators reported on cord blood-specific methods for blood cell composition correction. 49,58,59
Third, as in any epidemiological study, but less problematic in GWAS, confounding factors need to be taken into account in the analyses. In addition, confounding by technical covariates, or batch effects, which has minimal effect on genotype calling in GWAS, needs to be addressed in EWAS and may require extensive adjustment. Given the size of the Consortium and the number of studies that may be involved in a meta-analysis, it can also be a challenge in terms of logistics and time to ask individual studies to go back and re-run analyses with additional covariates or stratified on a particular factor such as sex to study associations in more detail.
Fourth, as certain outcomes or disease states may also influence DNA methylation, the potential for reverse causality needs to be taken into account, especially in cross-sectional analyses. Yet, even if a disease causes differences in DNA methylation, these may still serve a clinical purpose as a biomarker of the disease or its progression. 15 Such epigenetic biomarkers may be used in disease prediction, as a diagnostic test, in determining specific disease subtypes or in informing on prognosis. 3 Fifth, the currently used DNA methylation arrays only cover 2-3% of the total number of DNA methylation sites, with a focus on genes and CpG islands. 26,27 Even though the newer EPIC array increases coverage of enhancer regions, the coverage will still be relatively limited. 27 Sixth, the integration of DNA methylation data with other 'omics' data to gain insight into their interrelations will also pose challenges, both in terms of methodology and in terms of bioinformatics approaches. An in-depth discussion of these methodological challenges is beyond the scope of this article, but these are topics of ongoing work within and outside the PACE Consortium. 50,[60][61][62][63] Seventh, the studies currently involved in the PACE Consortium are located in industrialized countries. Studying environmental exposures in low- and middle-income settings would be relevant for a more complete understanding of epigenetic mechanisms. As PACE is an open consortium, we hope to be able to include studies from developing countries in the future.
There is much to learn in the field of EWAS. The efforts by this Consortium and many other researchers represent the first steps in the discovery of the role of DNA methylation in health and disease. Results from EWAS meta-analyses do not stand on their own. Discovery results from EWAS need to be followed by investigation of the relationships between DNA methylation and gene expression, of the roles of biological pathways on outcomes and of causality between exposures and DNA methylation. Conversely, results from laboratory scientists may inspire new analyses of DNA methylation in human studies. Many methodological issues need to be resolved. The PACE Consortium offers a strong platform to address these points and to contribute to the field of population epigenetics in the future.
Can I get hold of the data? Where can I find out more?
The PACE Consortium is an open consortium and studies interested in participating in one or more analyses are welcome to join. Each individual cohort analyses its own data locally and only summary statistics, including cohort-specific effect estimates, standard errors and P-values for each CpG site, are shared for the meta-analysis. Therefore, for access to data from individual cohorts in the PACE Consortium, researchers should contact studies directly. Study-specific protocols can be found through the study websites (Supplementary material and Supplementary Table 1) or through contact with study investigators. Researchers interested in participating in the PACE Consortium can contact the corresponding authors of this paper. Meta-analysis summary statistics will be made publicly available, in accordance with journal requirements. For more information, please see: [http://www.niehs.nih.gov/research/atniehs/labs/epi/pi/genetics/pace/index.cfm].

Supplementary Data
Supplementary data are available at IJE online.

Center for Health Assessment of Mothers and Children of Salinas (CHAMACOS)
The CHAMACOS study was supported by the NIH grants P01 ES009605 and R01 ES021369, R01ES023067 and EPA grants RD 82670901 and RD 83451301.

Childhood Obesity Project (CHOP)
The CHOP study and research reported herein were partially sup-

Early Autism Risk Longitudinal Investigation cohort (EARLI)
Funding for this work was provided by R01ES017646, R01ES01900, R01ES16443, and Autism Speaks grant #260377.

Exploring Perinatal Outcomes in Children (EPOCH)
EPOCH is funded by the following NIH grants: R01DK068001; R01 DK100340.

Flemish Environment and Health Study I (FLEHSI) birth cohort
The FLEHS study was commissioned, financed and steered by the

Study to Explore Early Development, Phase I (SEED I)
The SEED study is funded by the Centers for Disease Control and Prevention (grant nos. U10DD000180, U10DD000181, U10DD000182, U10DD000183, U10DD000184, U10DD000498) and the methylation assays were funded by Autism Speaks (grant no. 7659).

Children's Health Study (CHS)
We would like to express our sincere gratitude to Steve Graham and Robin Cooley at the California Biobank Program and Genetic Disease Screening Program within the California Department of Public Health for their assistance and advice regarding newborn bloodspots. The biospecimens and/or data used in this study were obtained from the California Biobank Program, (SIS request number(s) 479), Section 6555(b), 17 CCR. The California Department of Public Health is not responsible for the results or conclusions drawn by the authors of this publication. We are indebted to the school principals, teachers, students and parents in each of the study communities for their cooperation and especially to the members of the health testing field team for their efforts.

Early Autism Risk Longitudinal Investigation cohort (EARLI)
We thank the families, clinicians, and study staff who participated in EARLI. We thank J.H.B.R. for sample processing and the JHU SNP Center for performing the methylation assays.

Etudes des Déterminants pré et postnatals précoces du développement et de la santé de l'Enfant (EDEN)
The analysis for EDEN is the result of a 'Collaboration INSERM et CEA-IG-CNG Epigenetique'. On behalf of the EDEN Mother-Child Cohort Study Group, we thank the study participants and staff for their participation in this cohort.

Healthy Start
We thank all women and children who have taken part in the Healthy Start study. We also thank Mrs Mercedes Martinez, the Healthy Start Study Project Coordinator, Colorado School of Public Health, University of Colorado Denver, and the Healthy Start team for their hard work and dedication.

Infancia y Medio Ambiente (INMA)
INMA researchers would like to thank all the participants for their generous collaboration. A full roster of the INMA Project Investigators can be found at [http://www.proyectoinma.org/presentacioninma/listado-investigadores/en_listado-investigadores.html].

Inner City Asthma Consortium (ICAC) EPIGEN Cohort
We would like to extend our gratitude to the investigators, participants, and their families.

Isle of Wight (IoW)
The IoW cohort acknowledges the great help provided by the nurses at the David Hide Asthma and Allergy Research Centre led by Professor Hasan Arshad, Stephen Potter for data management, Faisal Rezwan and Cory White from the University of Southampton for DNA methylation data pre-processing and Nikki Graham for sample processing. We greatly appreciate the support of the participating families in the 1989 birth cohort and the 3rd Generation study.