Consortium-based genome-wide meta-analysis for childhood dental caries traits

Abstract Prior studies suggest dental caries traits in children and adolescents are partially heritable, but there has been no large-scale consortium genome-wide association study (GWAS) to date. We therefore performed GWAS for caries in participants aged 2.5–18.0 years from nine contributing centres. Phenotype definitions were created for the presence or absence of treated or untreated caries, stratified by primary and permanent dentition. All studies tested for association between caries and genotype dosage and the results were combined using fixed-effects meta-analysis. Analysis included up to 19 003 individuals (7530 affected) for primary teeth and 13 353 individuals (5875 affected) for permanent teeth. Evidence for association with caries status was observed at rs1594318-C for primary teeth [intronic within ALLC, odds ratio (OR) 0.85, effect allele frequency (EAF) 0.60, P 4.13e-8] and rs7738851-A (intronic within NEDD9, OR 1.28, EAF 0.85, P 1.63e-8) for permanent teeth. Consortium-wide estimated heritability of caries was low [h2 of 1% (95% CI: 0%: 7%) and 6% (95% CI 0%: 13%) for primary and permanent dentitions, respectively] compared with corresponding within-study estimates [h2 of 28% (95% CI: 9%: 48%) and 17% (95% CI: 2%: 31%)] or previously published estimates. This study was designed to identify common genetic variants with modest effects which are consistent across different populations. We found few single variants associated with caries status under these assumptions. Phenotypic heterogeneity between cohorts and limited statistical power will have contributed; these findings could also reflect complexity not captured by our study design, such as genetic effects which are conditional on environmental exposure.


Introduction
Dental caries remains a prevalent public health problem in both children and adults. Untreated dental caries was estimated to affect 621 million children worldwide in 2010, with little change in prevalence or incidence between 1990 and 2010 (1). This problem is not unique to lower income countries; around 50% of children have evidence of caries by age 5 in industrialized nations (2)(3)(4). Dental caries results from reduced mineral saturation of fluids surrounding teeth, driven by ecological shifts in the oral microbiome (5). Many different factors predispose toward dental caries, of which high sugar consumption, poor oral hygiene and low socio-economic status are the most notorious (6)(7)(8). Over the last decades there has been increasing appreciation for the role of genetic influences in dental caries. The importance of genetic susceptibility for dental caries experience was demonstrated in an animal model over 50 years ago, a finding since substantiated in twin studies in humans (9)(10)(11). Of particular relevance to caries traits in children and adolescents, Bretz et al. (10) analysed longitudinal rates of change in caries status in children, and found that caries progression and severity were highly heritable in the primary and permanent dentition. It has also been suggested that heritability for dental caries does not depend entirely on genetic predisposition to sweet food consumption (12). Despite evidence of a genetic contribution to caries susceptibility, few specific genetic loci have been identified.
Shaffer et al. (13) performed the first GWAS for dental caries in 2011, studying the primary dentition of 1305 children. They found evidence for association at novel and previously studied candidate genes (ACTN2, MTR, EDARADD, MPPED2 and LPO), but no individual single-nucleotide polymorphisms (SNPs) exceeded the genome-wide significance threshold (P 5.0e-08), possibly as a consequence of the modest sample size (13). The first GWAS for dental caries in the permanent dentition in adults was performed at a similar time by Wang et al. (14). They included 7443 adults from five different cohorts and identified several suggestive loci (P-value 10e-05) for dental caries (RPS6KA2, PTK2B, RHOU, FZD1, ADMTS3 and ISL1), different loci from those mentioned above for the primary dentition and again with no single variants reaching genome-wide significance.
The next wave of GWAS of caries suggested association at a range of different loci. Two GWAS used separate phenotype definitions for pit-and-fissure and smooth tooth surfaces and identified different loci associated with dental caries susceptibility in both primary and permanent dentition (15,16). The GWAS in primary dentition used a sample of approximately 1000 children and found evidence for association at loci reported in previous studies, including MPPED2, RPS6KA2 and AJAP1 (13)(14)(15)(16). The largest GWAS for dental caries in permanent dentition was performed in a Hispanic and Latino sample of 11 754 adults (17). This study identified unique genetic loci (NAMPT and BMP7) compared with previous GWAS in individuals of European ancestry. To date, it is unclear whether the variability in nominated loci reflects true variability in the genetic architecture of dental caries across different populations, age periods and sub-phenotypic definitions, or merely represent chance differences between studies given the modest power in the studies performed to date.
Dental caries is a complex and multifactorial disease, caused by a complex interplay between environmental, behavioural and genetic factors. Until now there has been a lack of largescale studies of dental caries traits in children and the genetic basis of these traits remains poorly characterized. This investigation set out to examine the hypothesis that common genetic variants influence dental caries with modest effects on susceptibility. We anticipated that (a) caries in both primary and permanent teeth would be heritable in children and adolescents aged 2.5-18 years and (b) common genetic variants are likely to only have small effects on the susceptibility of a complex disease such as dental caries. Therefore, the aim of this largescale, consortium-based GWAS is to examine novel genetic loci associated with dental caries in primary and permanent dentition in children and adolescents.  Fig. S3).

Meta-analysis of caries in primary teeth in individuals of
The strongest evidence for association with caries in primary teeth was seen at rs1594318 [odds ratio (OR) 0.85 for C allele, EAF 0.60, P ¼ 4.13e-08] in the European ancestry metaanalysis (Figs 1, 2 and 3, Table 1). This variant is intronic within ALLC on 2p25, a locus which has not previously been reported for dental caries traits. In the meta-analysis combining individuals of all ancestries this variant no longer reached genomewide significance, although suggestive evidence persisted at rs1594318 (OR 0.868 for C allele EAF 0.60, P ¼ 3.78e-07) and other intronic variants within ALLC in high linkage disequilibrium (LD) (Fig. 3). For the permanent dentition the strongest statistical evidence for association was seen between caries status and rs7738851 (OR 1.28 for A allele, EAF 0.85, P ¼ 1.63e-08) (Figs 1, 2 and 4, Table 1). This variant is intronic within NEDD9 on 6p24.

Estimated heritability
Using participant level data in ALSPAC heritability was estimated at 0.28 (95% CI 0.09: 0.48) and 0.17 (95% CI 0.02: 0.31) for primary and permanent teeth, respectively. Using summary statistics at the meta-analysis level produced point estimates near zero heritability, with wide confidence intervals (Table 2).

Cross-phenotype comparisons
Genome-wide mean chi-squared was too low to undertake genome-wide genetic correlation using the linkage disequilibrium score regression (LDSR) method for caries in either primary or permanent teeth. Hypothesis-free phenome-wide lookup for rs1594318 included 885 GWAS where either rs1594318 or a proxy with r 2 > 0.8 was present. None of these traits showed evidence of association with rs1594318 at a Bonferroni-corrected alpha of 0.05. Lookup of rs7738851 and its proxies was performed against 662 traits, where similarly no traits reached a Bonferronicorrected threshold. Hypothesis-driven lookup in adult caries traits revealed no strong evidence for persistent genetic effects into adulthood (Table 3).
Gene prioritization, gene set enrichment and association with predicted gene transcription Gene-based tests identified association between caries status in the primary dentition and a region of 7q35 containing TCAF1, OR2F2 and OR2F1 (P ¼ 1.91e-06, 1.58e-06 and 1.29e-06, respectively). There were insufficient independently associated loci to perform gene set enrichment analysis using DEPICT for either of the principal meta-analyses. Association with predicted gene transcription was tested but no genes met the threshold for association after accounting for multiple testing. The single greatest evidence for association was seen between increased predicted transcription of CDK5RAP3 and increased liability for permanent caries (P ¼ 3.94e-05). CDK5RAP3 is known to interact with PAK4 and p14 ARF , with a potential role in oncogenesis (18,19).

Discussion
Dental caries in children and adolescents has not been studied to date using a large-scale, consortium-based genome-wide meta-analysis approach. Based on previous knowledge of the heritability of caries in young populations and from our understanding of other complex diseases, we anticipated that common genetic variants would be associated with dental caries risk with consistent effects across different cohorts. We found evidence for association between rs1594318 and caries in primary teeth. This variant showed weaker evidence for association in the multi-ethnic meta-analysis, potentially relating to ALLC (Allantoicase) codes the enzyme allantoicase, which is involved in purine metabolism and whose enzymatic activity is believed to have been lost during vertebrate evolution. Mouse studies suggest that this loss of activity relates to low expression levels and low substrate affinity rather than total non-functionality (20). Although there is some evidence that ALLC polymorphisms are associated with response to asthma treatment (21), there is limited understanding of the implications of variation in ALLC for human health, and it is possible that rs1594318 tags functionality elsewhere in the same locus.
For permanent teeth, we found evidence for association between caries status and rs7738851, an intronic variant with NEDD9 (neural precursor cell-expressed, developmentally down-regulard gene 9). NEDD9 is reported to mediate integrininitiated signal transduction pathways and is conserved from gnathostomes into mammals (22,23). NEDD9 appears to play a number of functional roles in disease and normal development, including regulation of neuronal differentiation, development and migration (22,(24)(25)(26)(27)(28). One such function involves regulation of neural crest cell migration (26). Disruption of neural crest signalling is known to lead to enamel and dentin defects in animal models (29,30) and might provide a mechanism for variation at rs7738851 to influence dental caries susceptibility.
Traditionally, risk assessment for dental caries in childhood has concentrated on dietary behaviours and other modifiable risk factors (31), with little focus on tooth quality. Although our understanding of the genetic risk factors for dental caries is incomplete, authors have noted that the evidence from previous genetic association studies tends to support a role for innate tooth structure and quality in risk of caries (32,33). If validated by future studies, the association with rs7738851 would provide further evidence for this argument, and may in the future enhance risk assessment in clinical practice. 2. Regional association plots. (A) Regional association plot for rs1594318 and caries in primary teeth (European ancestry meta-analysis). (B) Regional association plot for rs7738851 and caries in permanent teeth.
The lookup of lead associated variants against adult caries traits provided no strong evidence for persistent association in adulthood. This might imply genetic effects which are specific to the near-eruption timepoint. An alternative explanation is that the variants identified in the present study represent false positive signals as the statistical evidence presented is not irrefutable and there is no formal replication stage in our study; yet, we see good consistency of effects across studies.
The meta-analysis heritability estimates were lower than anticipated from either previous within-study heritability estimates (34) or the new within-study heritability estimates obtained for this analysis. There are several possible explanations for this phenomenon. First, the methods used in the present analysis are SNP based which consistently underestimate heritability of complex traits relative to twin and family studies (35). Second, meta-analysis heritability represents the heritability of genetic effects which are consistent across populations. In the event of genuine differences in genetic architecture of dental caries across strata of age, geography, environmental exposure or subtly different phenotypic meanings, the metaheritability estimate is not the same conceptually as the weighted average of heritability within each study.
More intuitively, genetic influences might be important within populations with relatively similar environments but not determine much of the overall differences in risk when comparing groups of people in markedly different environments. This view is consistent with existing literature from family based and candidate gene association studies suggesting the genetic architecture of dental caries is complex with multiple interactions. For example, gene-sex interactions are reported which change in magnitude between the primary and permanent dentition (36), genetic variants may have heterogeneous effects on the primary and permanent dentition (37) and environmental exposures such as fluoride may interact with genetic effects (38). Finally, the aetiological relevance of specific microbiome groups appears to vary between different populations (39), suggesting genetic effects acting through the oral microbiome might also vary between populations. Unfortunately, this study lacks statistical power to perform meta-analyses stratified on these exposures, so does not resolve this particular question.
In line with any consortium-based approach, the need to harmonize analysis across different collections led to some compromises. The phenotypic definitions used in this study do not contain information on disease extent or severity. Loss of information in creating these definitions may have contributed to the low statistical power of analysis. Our motivation for using simple definitions was based on the facts that (a) case-control status simply represents a threshold level of an underlying continuum of disease risk, (b) simple binary classifications facilitate comparison of studies with different assessment protocols and population risks and (c) simple classifications have been used successfully in a range of complex phenotypes.
Between participating centres there are differences in characteristics such as age at participation, phenotypic assessment and differences in the environment (such as nutrition, oral hygiene and the oral microbiome) which might influence dental caries or its treatment, as reflected in the wide range of caries prevalence between different study centres. Varying phenotypic characteristics do not necessarily result in heterogeneous genetic effects, as this variability may be uncorrelated with genetic effects. There was little evidence for heterogeneity in the top associated loci reported, however, the test for heterogeneity in genetic effects (I 2 ) is limited by the small number of participating studies in meta-analysis (40) and wide confidence intervals for within-study genetic effect estimates. Given these limitations, it is possible that heterogeneity contributed to low study power and prevented more comprehensive single variant findings.
In the ALSPAC study we made extensive use of questionnaire derived data. This will systematically under-report true   caries exposure compared with other studies as children or their parents are unlikely to be aware of untreated dental caries which would be evident to a trained assessor. We have explored some of these issues previously and shown that self-report measures at scale can be used to make meaningful inference about dental health in childhood (41). We believe that misclassification and under-reporting in questionnaire data would tend to bias genetic effect estimates and heritability toward the null. Despite this we show evidence for heritability using these definitions and effect sizes at lead variants are comparable with effect sizes obtained using clinically assessed data (Figs 3 and 4). As our power calculations showed, the sample size was sufficient to detect the identified variants associated at a genome wide significant level with caries in the primary teeth (rs1594318) and in permanent teeth (rs872877), where we observed relatively large effect sizes. For smaller effect sizes we were underpowered to identify association, and did not detect any variants with effect sizes (expressed as per-allele increased odds) smaller than 15% or 17% in the primary and permanent teeth, respectively. Caries is highly influenced by environmental factors and it is likely that its susceptibility is polygenic in nature (32) with individual genetic variants conferring small effect sizes, as seen in other comparable complex traits (42). Furthermore, some of the included studies had major differences in their caries prevalence, likely acting as a proxy for features affecting risk of caries. This may have introduced heterogeneity and reduced power to detect association, as discussed further below.
One area of interest in the literature is the ability of genetics to guide personalized decisions on risk screening or identifying treatment modalities, and this is also true in dentistry. The genetic variants identified in this study are unlikely to be useful on their own in this context, given the modest effect sizes and low total heritability observed in our meta-analysis. We would suggest clinicians should continue to consider environment and aggregate genetic effects (e.g. knowledge of disease patterns of close relatives) rather than specific genetic variants at this moment in time. Nevertheless, the findings of our study contribute to a better understanding of the genetic and biological mechanisms underlying caries susceptibility.

Study samples
We performed genome-wide association (GWA) analysis for dental caries case/control status in a consortium including nine coordinating centres. Study procedures differed between these centres. We use the term 'clinical dental assessment' to mean that a child was examined in person, whether this was in a dental clinic or a study centre. We use the term 'examiner' to refer to a dental professional, and use the term 'assessor' to refer to an individual with training who is not a dental professional, for example a trained research nurse.
The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal birth cohort which recruited pregnant women living near Bristol, UK with an estimated delivery date between 1991 and 1992. Follow-up has included clinical assessment and questionnaires and is ongoing (43). A subset of children attended clinics including clinical dental assessment by a trained assessor at age 31, 43 and 61 months of age. Parents were asked to complete questionnaires about their children's health regularly, including comprehensive questions at a mean age of 5.4 and 6.4 years. Parents and children were asked to complete questionnaires about oral health at a mean age of 7.5, 10.7 and 17.8 years. Please note that the study website contains details of all the data that are available through a fully searchable data dictionary (www.bristol.ac.uk/alspac/researchers/access; date last accessed June 2018). Both clinical and questionnaire derived data were included in this analysis, with priority given to clinical data were available (Supplementary Material, Table S3).
The Copenhagen Prospective Studies on Asthma in Childhood includes two population-based longitudinal birth cohorts in Eastern Denmark. COPSAC2000 recruited pregnant women with a history of asthma between 1998 and 2001 (44). Children who developed wheeze in early life were considered for enrolment in a nested randomized trial for asthma prevention. COPSAC2010 recruited pregnant women between 2008 and 2010 and was not selected on asthma status. Both COPSAC2000 and COPSAC2010 studies included regular clinical follow-up. Within Denmark clinical dental assessment is routinely offered to children and adolescents until the age of 18 years and summary data from these examinations are stored in a national register. These data were obtained via index linkage for participants of COPSAC2000 and COPSAC2010 and used to perform joint analysis across both cohorts.
The Danish National Birth Cohort (DNBC) is a longitudinal birth cohort which recruited women in mid-pregnancy from 1996 onwards (45). For this analysis, index linkage was performed to obtain childhood dental records for mothers participating in DNBC. As with the COPSAC studies, these data were originally obtained by a qualified dentist and included surface level dental charting.
The Generation R study (GENR) recruited women in early pregnancy with expected delivery dates between 2002 and 2006 living in the city of Rotterdam, the Netherlands. The cohort is multi-ethnic with representation from several non-European ethnic groups. Follow-up has included clinical assessment visits and questionnaires and is ongoing (46). Intra-oral photography was performed as a part of their study protocol, with surface level charting produced by a dental examiner (a specialist in paediatric dentistry) (47). Analysis in GENR included (a) a multiethnic association study including all individuals with genetic and phenotypic data (48)

and (b) analysis including only individuals of European ancestry.
The GENEVA consortium is a group of studies which undertake coordinated analysis across several phenotypes (49). Within GENEVA, the Center for Oral Health Research in rural Appalachia, West Virginia and Pennsylvania, USA (COHRA), the Iowa Fluoride Study in Iowa, USA (IFS) and the Iowa Head Start (IHS) study participated in analysis of dental traits in children (15). COHRA recruited families with at least one child aged between 1 and 18 years of age, with dental examination performed at baseline (50). IFS recruited mothers and new-born infants in Iowa between 1992 and 1995 with a focus on longitudinal fluoride exposures and dental and bone health outcomes. Clinical dental examination in IFS was performed by trained assessors aged 5, 9, 13 and 17 years (51). IHS recruited children participating in an early childhood education program which included a one-time clinical dental examination (13).
The 'German Infant study on the influence of Nutrition Intervention plus air pollution and genetics on allergy development' (GINIplus) is a multi-centre prospective birth cohort study which has an observational and interventional arm which conducted a nutritional intervention during the first 4 months of life. The study recruited new born infants with and without family history of allergy in the Munich and Wesel areas, Germany between 1995 and 1998 (52,53) .The 'Lifestyle-related factors, Immune System and the development of Allergies in East and West Germany' study (LISA) is a longitudinal birth cohort which recruited between 1997 and 1999 across four sites in Germany (52,54). For participants living in the Munich area, follow-up used similar protocols in both GINIplus and LISA, with questionnaire and clinic data including clinical dental examination by trained examiners at age 10 and 15 years. Analysis for caries in GINIplus and LISA was therefore performed across both studies for participants at the Munich study centre.
The Physical Activity and Nutrition in Children (PANIC) Study is an ongoing controlled physical activity and dietary intervention study in a population of children followed retrospectively since pregnancy and prospectively until adolescence. Altogether 512 children 6-8 years of age were recruited in 2008-2009 (55). The main aims of the study are to investigate risk factors and pathophysiological mechanisms for overweight, type 2 diabetes, atherosclerotic cardiovascular diseases, musculoskeletal diseases, psychiatric disorders, dementia and oral health problems and the effects of a long-term physical activity and dietary intervention on these risk factors and pathophysiological mechanisms. Clinical dental examinations were performed by a qualified dentist with tooth level charting.
The Cardiovascular Risk in Young Finns Study (YFS) is a multi-centre investigation which aimed to understand the determinants of cardiovascular risk factors in young people in Finland. The study recruited participants who were aged 3, 6, 9, 12, 15 and 18 years old in 1980. Eligible participants living in specific regions of Finland were identified at random from a national population register and were invited to participate. Regular follow-up has been performed through physical examination and questionnaires (56). Clinical dental examination was performed by a qualified dentist with tooth level charting.
The Western Australian Pregnancy Cohort (RAINE) study is a birth cohort which recruited women between 16th and 20th week of pregnancy living in the Perth area, Western Australia. Recruitment occurred between 1989 and 1991 with regular follow-up of mothers and their children through research clinics and questionnaires (57). The presence or absence of dental caries was recorded by a trained assessor following clinical dental examination at the year 3 clinic follow-up.
Further details of study samples are provided in Supplementary Material, Table S1.

Medical Ethics
Within each participating study written informed consent was obtained from the parents of participating children after receiving a full explanation of the study. Children were invited to give assent where appropriate. All studies were conducted in accordance with the Declaration of Helsinki.
Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committee. Full details of ethical approval policies and supporting documentation are available online (http://www.bris tol.ac.uk/alspac/researchers/research-ethics/; date last accessed June 2018). Approval to undertake analysis of caries traits was granted by the ALSPAC executive committee (B2356).
The COPSAC2000 cohort was approved by the Regional Scientific Ethical Committee for Copenhagen and Frederiksberg The DNBC study of caries was approved by the Scientific Ethics Committee for the Capital City Region (Copenhagen), the Danish Data Protection Agency and the DNBC steering committee.
Each participating site in the GENEVA consortium caries analysis received approval from the local university institutional review board (federal wide assurance number for GENEVA caries project: FWA00006790). Within the COHRA arm local approval was provided by the University of Pittsburgh (020703/0506048) and West Virginia University (15620B), whilst the IFS and IHS arms received local approval from the University of Iowa's Institutional Review Board.
The GENR study design and specific data acquisition were approved by the Medical Ethical Committee of the Erasmus University Medical Center, Rotterdam, The Netherlands (MEC-2007-413).
The GINIplus and LISA studies were approved by the ethics committee of the Bavarian Board of Physicians (10 year followup: 05100 for GINIplus and 07098 for LISA, 15 year follow-up 10090 for GINIplus, 12067 for LISA).
The PANIC study protocol was approved by the Research Ethics Committee of the Hospital District of Northern Savo. All participating children and their parents gave informed written consent.
The YFS study protocol was approved by local ethics committees for contributing sites.
The RAINE study was approved by the University of Western Australia Human Research Ethics Committee.

Phenotypes
Primary teeth exfoliate and are replaced by permanent teeth between 6 and 12 years of age. We aimed to separate caries status in primary and permanent teeth wherever possible using clinical information or age criteria, in line with our expectation that the genetic risk factors for dental caries might differ between primary and permanent dentition. For children in the mixed dentition we created two parallel case definitions, whilst in younger or older children a single case definition was sufficient.
All study samples included a mixture of children with dental caries and children who were caries-free, with varying degrees of within-mouth or within-tooth resolution. To facilitate comparison across these differing degrees of resolution all analysis compared children who were caries-free (unaffected) or had dental caries (affected). Missing teeth could represent exfoliation or delayed eruption rather than the endpoint of dental caries and therefore missing teeth were not included in classifying children as caries-free or caries affected.
In children aged 2.50 years to 5.99 years, any individual with 1 or more decayed or filled tooth was classified as caries affected, with all remaining individuals classified as unaffected. In children aged 6.00 years to 11.99 years of age, parallel definitions were determined for the primary dentition and permanent dentition, respectively. Any individual with at least 1 decayed or filled primary tooth was classified as caries affected for primary teeth, while all remaining participants were classified as unaffected. In parallel, any individual with at least 1 decayed or filled permanent tooth was classified as caries affected for permanent teeth, while all remaining individuals were classified as unaffected. In children and adolescents aged 12.00 to 17.99 years of age, any individual with 1 or more decayed or filled tooth or tooth surface (excluding third molar teeth) was classified as caries affected, with remaining individuals classified as unaffected.
Analysis was conducted in cross-section, meaning a single participant could only be represented in a single phenotype definition once. Where multiple sources of dental data were available for a single participant within a single phenotype definition window, the first source of data was selected (reflecting the youngest age at participation), in line with our expectation that caries status would be most heritable in the near-eruption period.
The sources of data used to create these phenotypic definitions are given in Supplementary Material, Table S3. Within ALSPAC only, questionnaire responses were used to supplement data from clinical examination. The questions asked did not distinguish between primary and permanent teeth. Based on the age at questionnaire response we derived variables which prioritized responses from questionnaires before 6.00 years of age (thought to predominantly represent caries in primary teeth), and responses after 10.00 years of age (which might predominantly represent caries in permanent teeth). The final data sweep considered in this analysis targeted adolescents at age 17.50 years. Some participants responded to this after their 18th birthday. Data derived from this final questionnaire sweep were not included in the principal meta-analyses but were included in the GCTA heritability analysis.
Each study performed routine QC measures during genotyping, imputation and association testing (Supplementary Material, Table S2). Further pre-meta-analysis QC was performed centrally using the EasyQC R package and accompanying 1KG phase1 v3 reference data (58). Minor allele count (MAC) was derived as the product of minor allele frequency (MAF) and site-specific number of alleles (twice the site-specific sample size). Variants were dropped which had a per-file MAC of 6 or lower, a site-specific sample size of 30 or lower, or an impute INFO score of less than 0.4. Sites which reported effect and noneffect alleles other than those reported in 1KG phase 1 v3 reference data were dropped. Following meta-analysis, sites with a weighted MAF of less than 0.005were dropped, along with variants present in less than 50% of the total sample.

Statistical analysis
Association testing Each cohort preformed GWA analysis using an additive genetic model. Caries status was modelled against genotype dosage whilst accounting for age at phenotypic assessment, age squared, sex and cryptic relatedness. Sex was accounted for by deriving phenotypic definitions and performing analysis separately within male and female participants, or by including sex as a covariate in association testing. Each study adopted approaches to account for cryptic relatedness and population stratification, as described in Supplementary Material, Table S2. In the GENR study parallel analyses were conducted for participants of European ancestry (using the approach described in Supplementary Material, Table S2) and the entire study population, using a previously published method (48). The software and exact approach used by each study is shown in Supplementary Material, Table S2.

Meta-analysis
Results of GWA analysis within each study were combined in two principal meta-analyses, representing caries status in primary teeth and caries status in permanent teeth. For primary teeth, parallel meta-analyses were performed, one using results of multi-ethnic analysis in the GENR study and the other using results of European ancestry analysis in the GENR study. The GENR study did not have phenotypic data for permanent teeth, therefore the analysis of permanent teeth contained only individuals of European ancestry. Fixed-effects meta-analyses was performed using METAL (59), with genomic control of input summary statistics enabled and I 2 test for heterogeneity. Meta-analysis was run in parallel in two centres and results compared. All available studies with genotype and phenotypic information were included in a one-stage design, therefore there was no separate replication stage.

Meta-analysis heritability estimates
For each principal meta-analysis population stratification and heritability were assessed using LDSR (60). Reference LD scores were taken from HapMap3 reference data accompanying the LDSR package.
Within-sample heritability estimates For comparison, heritability within the ALSPAC study was assessed using the GREML method (61), implemented in the GCTA software package (62), using participant level phenotype data and a genetic relatedness matrix estimated from common genetic variants (with MAF> 5.0%) present in HapMap3.

Hypothesis-free cross-trait lookup
We used PLINK 2.0 (63) to clump meta-analysis summary statistics based on LD structure in reference data from the UK10K project. We then performed hypothesis-free cross-trait lookup of independently associated loci using the SNP lookup function in the MRBase catalogue (64). Proxies with an r 2 of 0.8 or higher were included where the given variant was not present in an outcome of interest. We considered performing hypothesis-free cross-trait genetic correlation analysis using bivariate LD score regression implemented in LDhub (65).
Lookup in previously published paediatric caries GWAS Previously published caries GWAS was performed within the GENEVA consortium, which is also represented in our metaanalysis. We therefore did not feel it would be informative to undertake lookup of associated variants in previously published results.

Lookup in GWAS for adult caries traits
This analysis was planned and conducted in parallel with analysis of quantitative traits measuring lifetime caries exposure in adults (manuscript in draft).The principal trait studied in the adult analysis was an index of decayed, missing and filled tooth surfaces (DMFS). This index was calculated from results of clinical dental examination, excluding third molar teeth. The DMFS index was age-and-sex standardized within each participating adult study before GWAS analysis was undertaken. Study-specific results files were then combined in a fixedeffects meta-analysis. In addition to DMFS, two secondary caries traits were studied in adults, namely number of teeth (a count of remaining natural teeth at time of study participation) and standardized DFS (derived as the number of decayed and filled surfaces divided by the number of natural tooth surfaces remaining at time of study participation). After age-and-sex standardization these secondary traits had markedly nonnormal distribution and were therefore underwent rank-based inverse normal transformation before GWAS analysis and meta-analysis. We performed cross-trait lookup of lead associated variants in the paediatric caries meta-analysis against these three adult caries traits. As the unpublished analysis also contains samples which contributed to previously published GWAS, we did not feel it would be informative to undertake additional lookup in published data.
Gene prioritization, gene set enrichment and association with predicted gene transcription Gene-based testing of summary statistics was performed using MAGMA (66) with reference data for LD correction taken from the UK10K project and gene definitions based on a 50 kb window either side of canonical gene start: stop positions. Gene set enrichment analysis was considered using the software package DEPICT (67). Tests for association between phenotype and predicted gene transcription were performed using S-PrediXcan (68), which is a summary-statistic implementation of the PrediXcan method. This method aims to assess the effects of tissue-specific gene transcription on phenotypes. Gene transcription models are trained in datasets with transcriptomic data, then used to predict gene expression in datasets with phenotypic data. This method was applied using the MetaXcan standalone software (https:// github.com/hakyimlab/MetaXcan; date last accessed June 2018) and a transcription prediction model trained in whole blood (obtained from the PedictDB data repository at http://predictdb. org/; date last accessed June 2018). Bonferroni correction was applied on the basis of approximately 7000 independent gene-based tests for two caries traits, giving an experiment-wide significance level of approximately P < 3.6e-06.

Power calculations
Post-hoc power calculations were performed using the free, web-based tool Genetic Association Study (GAS) Power Calculator and the software utility Quanto (v1.2.4) (https://csg. sph.umich.edu/abecasis/gas_power_calculator/index.html, http://biostats.usc.edu/Quanto.html; date last accessed June 2018) (69). Using the sample size and caries prevalence of the final meta-analysis samples, we calculated the minimum effect size required to have 80% discovery power at a significance level of 5.0e-08 for variants with MAF between 0.05 and 0.50. For primary teeth (17 037 individuals, 6922 caries affected, prevalence 40.6%) we were able to detect variants with a minimal effect size (OR) between 1. 13

Supplementary Material
Supplementary Material is available at HMG online.
We are very grateful to the children and families who agreed to participate in the contributing studies, without whom this research would not be possible. We would like to acknowledge the role of Mark McCarthy and the Early Growth Genetics consortium in recruiting studies which contributed to this analysis. For ALSPAC, we are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The authors are grateful to the Raine Study participants and their families, and to the Raine Study research staff for cohort coordination and data collection. The authors gratefully acknowledge the assistance of the Western Australian DNA Bank (National Health and Medical Research Council of Australia National Enabling Facility). We would also like to acknowledge the Raine Study participants for their ongoing participation in the study, and the Raine Study Team for study coordination and data collection.

Funding
Funding to pay the Open Access publication charges for this article was provided by the Medical Research Council Integrative Epidemiology Unit at the University of Bristol which is supported by the Medical Research Council and University of Bristol.