Consortium genome-wide meta-analysis for childhood dental caries traits

Prior studies suggest dental caries traits in children and adolescents are partially heritable, but there has been no large-scale consortium genome-wide association study (GWAS) to date. We therefore performed GWAS for caries in participants aged 2.5-18.0 years from 9 contributing centers. Phenotype definitions were created for the presence or absence of treated or untreated caries, stratified by primary and permanent dentition. All studies tested for association between caries and genotype dosage (imputed to Haplotype Reference Consortium or 1000 Genomes phase 1 version 3 panels) accounting for population stratification. Fixed-effects meta-analysis was performed weighted by inverse standard error. Analysis included up to 19,003 individuals (7,530 affected) for primary teeth and 13,353 individuals (5,875 affected) for permanent teeth. Evidence for association with caries status was observed at rs1594318-C for primary teeth (intronic within ALLC, Odds Ratio (OR) 0.85, Effect Allele Frequency (EAF) 0.60, p 4.13e-8) and rs7738851-A for permanent teeth (intronic within NEDD9, OR 1.28, EAF 0.85, p 1.63e-8). Consortium-wide estimated heritability of caries was low (h2 of 1% [95% CI: 0%:7%] and 6% [95% CI: 0%:13%] for primary and permanent dentitions, respectively) compared to corresponding within-study estimates (h2 of 28% [95% CI: 9%:48%] and 17% [95% CI: 2%:31%]) or previously published estimates. This study was designed to identify common genetic variants with modest effects which are consistent across different populations. We found few single variants associated with caries status under these assumptions. Phenotypic heterogeneity between cohorts and limited statistical power will have contributed; these findings could also reflect complexity not captured by our study design, such as genetic effects which are conditional on environmental exposure.

Author summary
Dental caries (tooth decay) is a common disease in children. Previous studies suggest genetic factors alter caries risk, but to date the specific genetic variants responsible have not been identified. We undertook analysis in a consortium including around 19,000 children and investigated whether any of 8 million common genetic variants were associated with risk of caries in primary (milk) or permanent teeth. Once identified, such variants can be used as ‘tags’ to highlight genes which may be involved in a disease. We identified variants at two loci associated with caries status, one in the primary dentition (rs1594318) and one in the permanent dentition (rs7738851). The former is intronic in ALLC, a gene with poorly understood function. The latter is an intronic variant within NEDD9, a gene which has several known functions including a role in development of craniofacial structures. To gain a more comprehensive understanding of the genetic effects which influence caries, larger studies and a better understanding of environmental modifiers or interactions with genetic effects are required.

Dental caries is a multifactorial disease caused by a complex interplay between environmental, behavioral and genetic factors. Until now there has been a lack of large-scale studies of dental caries traits in children, and the genetic basis of these traits remains poorly characterized. This investigation set out to examine the hypothesis that common genetic variants influence dental caries with modest effects on susceptibility. We anticipated that a) caries in both primary and permanent teeth would be heritable in children and adolescents aged 2.5 to 18 years and b) common genetic variants would have only small effects on susceptibility to a complex disease such as dental caries.
Therefore, the aim of this large-scale, consortium-based GWAS was to identify novel genetic loci associated with dental caries in the primary and permanent dentition in children and adolescents.

Study samples
We performed genome-wide association (GWA) analysis for dental caries case/control status in a consortium including 9 coordinating centers. Study procedures differed between these centers. We use the term 'clinical dental assessment' to mean that a child was examined in person, whether this was in a dental clinic or a study center. We use the term 'examiner' to refer to a dental professional, and use the term 'assessor' to refer to an individual with training who is not a dental professional, for example a trained research nurse. included regular clinical follow up. Within Denmark clinical dental assessment is routinely offered to children and adolescents until the age of 18 years and summary data from these examinations are stored in a national register. These data were obtained via index linkage for participants of COPSAC2000 and COPSAC2010 and used to perform joint analysis across both cohorts.
The Danish National Birth Cohort (DNBC) is a longitudinal birth cohort which recruited women in mid-pregnancy from 1996 onwards [20]. For this analysis, index linkage was performed to obtain childhood dental records for mothers participating in DNBC. As with the COPSAC studies, these data were originally obtained by a qualified dentist and included surface-level dental charting.
The Generation R study (GENR) recruited women in early pregnancy with expected delivery dates
The "German Infant study on the influence of Nutrition Intervention plus air pollution and genetics on allergy development" (GINIplus) is a multi-center prospective birth cohort study with an observational arm and an interventional arm, the latter of which conducted a nutritional intervention during the first four months of life. The study recruited newborn infants with and without family history of allergy in the Munich and Wesel areas, Germany, between 1995 and 1998 [28, 29]. The "Lifestyle-related factors, Immune System and the development of Allergies in East and West Germany" study (LISA) is a longitudinal birth cohort which recruited between 1997 and 1999 across four sites in Germany [28, 30].
For participants living in the Munich area, follow up used similar protocols in both GINIplus and LISA, with questionnaire and clinic data including clinical dental examination by trained examiners at age 10 and 15 years. Analysis for caries in GINIplus and LISA was therefore performed across both studies for participants at the Munich study center.
The Physical Activity and Nutrition in Children (PANIC) Study is an ongoing controlled physical activity and dietary intervention study in a population of children followed retrospectively since pregnancy and prospectively until adolescence. Altogether 512 children 6-8 years of age were recruited in 2008-2009 [31]. The main aims of the study are to investigate risk factors and pathophysiological mechanisms for overweight, type 2 diabetes, atherosclerotic cardiovascular diseases, musculoskeletal diseases, psychiatric disorders, dementia and oral health problems and the effects of a long-term physical activity and dietary intervention on these risk factors and pathophysiological mechanisms. Clinical dental examinations were performed by a qualified dentist with tooth level charting.
The Cardiovascular Risk in Young Finns Study (YFS) is a multi-center investigation which aimed to understand the determinants of cardiovascular risk factors in young people in Finland. The study recruited participants who were aged 3, 6, 9, 12, 15 and 18 years old in 1980. Eligible participants living in specific regions of Finland were identified at random from a national population register and were invited to participate. Regular follow-up has been performed through physical examination and questionnaires [32].
Clinical dental examination was performed by a qualified dentist with tooth level charting.
The Western Australian Pregnancy Cohort (RAINE) study is a birth cohort which recruited women between the 16th and 20th weeks of pregnancy living in the Perth area, Western Australia. Recruitment occurred between 1989 and 1991 with regular follow up of mothers and their children through research clinics and questionnaires [33]. The presence or absence of dental caries was recorded by a trained assessor following clinical dental examination at the year 3 clinic follow up. Further details of study samples are provided in S1 Table.

Medical Ethics
Within each participating study written informed consent was obtained from the parents of participating children after receiving a full explanation of the study. Children were invited to give assent where appropriate. All studies were conducted in accordance with the Declaration of Helsinki.
Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and

Phenotypes
Primary teeth exfoliate and are replaced by permanent teeth between 6 and 12 years of age. We aimed to separate caries status in primary and permanent teeth wherever possible using clinical information or age criteria, in line with our expectation that the genetic risk factors for dental caries might differ between primary and permanent dentition. For children in the mixed dentition we created two parallel case definitions, whilst in younger or older children a single case definition was sufficient.
All study samples included a mixture of children with dental caries and children who were caries-free, with varying degrees of within-mouth or within-tooth resolution. To facilitate comparison across these differing degrees of resolution all analysis compared children who were caries-free (unaffected) or had dental caries (affected). Missing teeth could represent exfoliation or delayed eruption rather than the endpoint of dental caries and therefore missing teeth were not included in classifying children as caries-free or caries affected.
In children aged 2.50 years to 5.99 years any individual with 1 or more decayed or filled tooth was classified as caries affected, with all remaining individuals classified as unaffected. In children aged 6.00 years to 11.99 years of age parallel definitions were determined for the primary dentition and permanent dentition respectively. Any individual with at least 1 decayed or filled primary tooth was classified as caries affected for primary teeth, while all remaining participants were classified as unaffected. In parallel, any individual with at least 1 decayed or filled permanent tooth was classified as caries affected for permanent teeth, while all remaining individuals were classified as unaffected. In children and adolescents aged 12.00 to 17.99 years of age any individual with 1 or more decayed or filled tooth or tooth surface (excluding third molar teeth) was classified as caries affected, with remaining individuals classified as unaffected.
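The age-window case definitions above can be sketched in code. This is an illustration only, not the consortium's pipeline: the `DentalRecord` type, its field names and the `classify` function are hypothetical, and the sketch assumes that the youngest window feeds the primary-teeth phenotype and the oldest window feeds the permanent-teeth phenotype. Counts of decayed or filled teeth are assumed to exclude missing teeth and third molars already.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DentalRecord:
    """Hypothetical per-visit summary; missing teeth are not counted."""
    age_years: float
    primary_df: int    # decayed or filled primary teeth
    permanent_df: int  # decayed or filled permanent teeth (no third molars)


def classify(rec: DentalRecord) -> Tuple[Optional[bool], Optional[bool]]:
    """Return (primary_affected, permanent_affected).

    None means the phenotype is not defined in this age window.
    """
    any_df = (rec.primary_df + rec.permanent_df) >= 1
    if 2.5 <= rec.age_years < 6.0:
        # Single definition: any decayed or filled tooth.
        return (any_df, None)
    if 6.0 <= rec.age_years < 12.0:
        # Mixed dentition: two parallel definitions.
        return (rec.primary_df >= 1, rec.permanent_df >= 1)
    if 12.0 <= rec.age_years < 18.0:
        # Single definition: any decayed or filled tooth or surface.
        return (None, any_df)
    return (None, None)  # outside the studied age range
```

A child in the mixed dentition therefore contributes to both phenotypes, while younger and older children contribute to one each.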
Analysis was conducted in cross-section, meaning a single participant could only be represented in a single phenotype definition once. Where multiple sources of dental data were available for a single participant within a single phenotype definition window, the first source of data was selected (reflecting the youngest age at participation), in line with our expectation that caries status would be most heritable in the near-eruption period.
The sources of data used to create these phenotypic definitions are given in S3 Table. Within ALSPAC only, questionnaire responses were used to supplement data from clinical examination. The questions asked did not distinguish between primary and permanent teeth. Based on the age at questionnaire response we derived variables which prioritized responses from questionnaires before 6.00 years of age (thought to predominantly represent caries in primary teeth), and responses after 10.00 years of age (which might predominantly represent caries in permanent teeth). The final data sweep considered in this analysis targeted adolescents at age 17.50 years. Some participants responded to this after their eighteenth birthday. Data derived from this final questionnaire sweep were not included in the principal meta-analyses but were included in the GCTA heritability analysis.

Genotypes and imputation
All participating studies used genetic data imputed to a comprehensive imputation panel. The 1000 Genomes phase 1 version 3 panel (1KG phase 1 v3) was used as a common basis across 6 centers (GINIplus/LISA, GENR, GENEVA, YFS, PANIC and RAINE; S1 Table). In ALSPAC, DNBC, COPSAC2000 and COPSAC2010 the Haplotype Reference Consortium (HRC v1.0 and v1.1) imputation panels were used (S1 Table). Each study performed routine quality control measures during genotyping, imputation and association testing (S2 Table). Further pre-meta-analysis quality control was performed centrally using the EasyQC R package and accompanying 1KG phase 1 v3 reference data [34]. Minor allele count (MAC) was derived as the product of minor allele frequency and site-specific number of alleles (twice the site-specific sample size). Variants were dropped if they had a per-file MAC of 6 or lower, a site-specific sample size of 30 or lower, or an imputation INFO score of less than 0.4. Sites which reported effect and non-effect alleles other than those reported in 1KG phase 1 v3 reference data were dropped. Following meta-analysis, sites with a weighted minor allele frequency (MAF) of less than 0.5% were dropped, along with variants present in less than 50% of the total sample.
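As a worked illustration of the filters described above, the following minimal sketch applies the stated thresholds to one variant at a time. It is not the EasyQC configuration used in the study; the `VariantStats` type and the function names are hypothetical, and the allele-matching check against the reference panel is omitted.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-study summary for one variant."""
    maf: float   # minor allele frequency
    n: int       # site-specific sample size
    info: float  # imputation INFO score


def minor_allele_count(maf: float, n: int) -> float:
    # MAC = MAF x number of alleles, where number of alleles = 2 x n.
    return maf * 2 * n


def passes_study_qc(v: VariantStats) -> bool:
    # Drop if MAC <= 6, sample size <= 30, or INFO < 0.4.
    return minor_allele_count(v.maf, v.n) > 6 and v.n > 30 and v.info >= 0.4


def passes_post_meta_qc(weighted_maf: float, n_variant: int, n_total: int) -> bool:
    # Drop if weighted MAF < 0.5% or variant present in < 50% of the sample.
    return weighted_maf >= 0.005 and n_variant / n_total >= 0.5
```

For example, a variant with MAF 0.01 in a study of 200 participants has a MAC of 4 and would be dropped from that study's file before meta-analysis.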

Statistical analysis
Association testing. Each cohort performed GWA analysis using an additive genetic model. Caries status was modelled against genotype dosage whilst accounting for age at phenotypic assessment, age squared, sex and cryptic relatedness. Sex was accounted for by deriving phenotypic definitions and performing analysis separately within male and female participants, or by including sex as a covariate in association testing. Studies applied standard exclusions based on cryptic relatedness and ancestry, as described in S2 Table. In the GENR study association analysis included genetic principal components derived from the entire study population to account for ancestry in the multi-ethnic analysis, whilst the analysis of individuals of European ancestry included European population-specific genetic principal components [35]. The software and exact approach used by each study are shown in S2 Table.
Meta-analysis. Results of GWA analysis within each study were combined in two principal meta-analyses, representing caries status in primary teeth and caries status in permanent teeth. For primary teeth, parallel meta-analyses were performed, one using results of multi-ethnic analysis in the GENR study and the other using results of European ancestry analysis in the GENR study. The GENR study did not have phenotypic data for permanent teeth, therefore the analysis of permanent teeth contained only individuals of European ancestry. Fixed-effects meta-analysis was performed using METAL [36], with genomic control of input summary statistics enabled and an I2 test for heterogeneity. Meta-analysis was run in parallel in two centers and results compared. All available studies with genotype and phenotypic information were included in a one-stage design, therefore there was no separate replication stage.
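The arithmetic behind a fixed-effects meta-analysis with genomic control and an I2 statistic can be sketched for a single variant as follows. This is a minimal illustration under stated assumptions, not a reproduction of METAL: it assumes conventional inverse-variance weights (1/SE^2) on the log odds ratio scale, and it assumes genomic control is applied by inflating each study's standard error by the square root of its lambda when lambda exceeds 1. The function names are hypothetical.

```python
import math
from statistics import NormalDist, median


def genomic_control_lambda(z_scores):
    """Median observed chi-square over the expected 1-df median (about 0.455)."""
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(z * z for z in z_scores) / expected_median


def fixed_effects_meta(betas, ses, lambdas=None):
    """Combine per-study log odds ratios for one variant.

    betas, ses: per-study effect estimates and standard errors.
    lambdas: optional per-study genomic control factors (assumed input).
    """
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    # Assumed genomic control step: inflate SE only when lambda > 1.
    adj_ses = [se * math.sqrt(max(lam, 1.0)) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / se ** 2 for se in adj_ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q and I2 (percentage of variation due to heterogeneity).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "odds_ratio": math.exp(beta),
            "p": p, "q": q, "i2": i2}
```

Two studies with equal standard errors receive equal weight, so the pooled estimate is the simple mean of their effects; larger studies with smaller standard errors dominate otherwise.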

Meta-analysis heritability estimates. For each principal meta-analysis population stratification and heritability were assessed using linkage disequilibrium score regression (LDSR) [37]. Reference linkage disequilibrium (LD) scores were taken from HapMap3 reference data accompanying the LDSR package.
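For orientation, the regression underlying LDSR is usually written as below. The symbols are introduced here for illustration and are not defined elsewhere in this manuscript: N is the sample size, M the number of variants, h^2 the heritability, \ell_j the LD score of variant j, and a the contribution of confounding such as population stratification.

```latex
\mathbb{E}\left[\chi^2_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
```

Under this model the slope on LD score carries the heritability estimate, while an intercept above 1 points to stratification or other confounding. When the mean chi-squared statistic is close to 1 the slope is estimated imprecisely.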
Within-sample heritability estimates. For
Hypothesis-free cross-trait lookup. We used PLINK 2.0 [40] to clump meta-analysis summary statistics based on LD structure in reference data from the UK10K project. We then performed hypothesis-free cross-trait lookup of independently associated loci using the SNP lookup function in the MRBase catalog [41]. Proxies with an r2 of 0.8 or higher were included where the given variant was not present in an outcome of interest. We considered performing hypothesis-free cross-trait genetic correlation analysis using bivariate LD score regression implemented in LDhub [42].

Lookup in previously published pediatric caries GWAS. Previously published caries GWAS was performed within the GENEVA consortium, which is also represented in our meta-analysis. We therefore did not feel it would be informative to undertake lookup of associated variants in previously published results.

Lookup in GWAS for adult caries traits. This analysis was planned and conducted in parallel with analysis of quantitative traits measuring lifetime caries exposure in adults (manuscript in draft). The principal trait studied in the adult analysis was an index of decayed, missing and filled tooth surfaces (DMFS). This index was calculated from results of clinical dental examination, excluding third molar teeth. The DMFS index was age-and-sex standardized within each participating adult study before GWAS analysis was undertaken. Study-specific results files were then combined in a fixed-effects meta-analysis [43]. In addition to DMFS, two secondary caries traits were studied in adults, namely number of teeth (a count of remaining natural teeth at time of study participation) and standardized DFS (derived as the number of decayed and filled surfaces divided by the number of natural tooth surfaces remaining at time of study participation). After age-and-sex standardization these secondary traits had markedly non-normal distributions and therefore underwent rank-based inverse normal transformation before GWAS analysis and meta-analysis. We performed cross-trait lookup of lead associated variants in the pediatric caries meta-analysis against these three adult caries traits. As the unpublished analysis also contains samples which contributed to previously published GWAS, we did not feel it would be informative to undertake additional lookup in published data.

The strongest evidence for association with caries in primary teeth was seen at rs1594318 (OR 0.85 for C allele, EAF 0.60, p = 4.13e-08) in the European ancestry meta-analysis (Figs 1, 2 and 3, Table 1). This variant is intronic within ALLC on 2p25, a locus which has not previously been reported for dental caries traits. In the meta-analysis combining individuals of all ancestries this variant no longer reached genome-wide significance, although suggestive evidence persisted at rs1594318 (OR 0.868 for C allele, EAF 0.60, p = 3.78e-07) and other intronic variants within ALLC in high linkage disequilibrium (Figure 3).
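The rank-based inverse normal transformation applied to the secondary adult traits in the methods above can be sketched as follows. This is a minimal illustration rather than the code used in the adult analysis: the function name is hypothetical, and the Blom offset (c = 3/8) and the use of average ranks for ties are assumptions, since the manuscript does not state which variant of the transformation was used.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles via ranks (c = 3/8 is an assumed offset)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

The output depends only on the ordering of the input, which is why the transformation is suited to markedly skewed traits such as tooth counts.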

For the permanent dentition the strongest statistical evidence for association was seen between caries status and rs7738851 (OR 1.28 for A allele, EAF 0.85, p = 1.63e-08
Cross-phenotype comparisons. Genome-wide mean chi squared was too low to undertake genome-wide genetic correlation using the LDSR method for caries in either primary or permanent teeth.
Hypothesis-free phenome-wide lookup for rs1594318 included 885 GWAS where either rs1594318 or a proxy with r2 > 0.8 was present. None of these traits showed evidence of association with rs1594318 at a Bonferroni-corrected alpha of 0.05. Lookup of rs7738851 and its proxies was performed against 662 traits, and similarly no traits reached the Bonferroni-corrected threshold. Hypothesis-driven lookup in adult caries traits revealed no strong evidence for persistent genetic effects into adulthood (Table 3).
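To make the multiple-testing thresholds concrete, the short sketch below computes a Bonferroni-corrected per-test threshold. It assumes that the number of tests equals the number of GWAS traits looked up for each variant (885 and 662), which the text implies but does not state explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold keeping the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed number of tests: one per GWAS trait in each lookup.
threshold_rs1594318 = bonferroni_threshold(0.05, 885)
threshold_rs7738851 = bonferroni_threshold(0.05, 662)
```

Under that assumption a trait would need a p-value below roughly 5.6e-05 (885 traits) or 7.6e-05 (662 traits) to count as associated.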

Gene prioritization, gene set enrichment and association with gene transcription.
Gene-based tests identified association between caries status in the primary dentition and a region of 7q35 containing TCAF1, OR2F2 and OR2F1 (p=1.91e-06, 1.58e-06 and 1.29e-06, respectively). There were insufficient independently associated loci to perform gene set enrichment analysis using DEPICT for either of the principal meta-analyses. Association with gene transcription was tested but no genes met the threshold for association after accounting for multiple testing. The single greatest evidence for association was seen between increased transcription of CDK5RAP3 and increased liability for permanent caries (p=3.94e-05). CDK5RAP3 is known to interact with PAK4 and p14ARF, with a potential role in oncogenesis [49,50].

Discussion
Dental caries in children and adolescents has not been studied to date using a large-scale, consortium-based genome-wide meta-analysis approach. Based on previous knowledge of the heritability of caries in young populations and from our understanding of other complex diseases, we anticipated that common genetic variants would be associated with dental caries risk with consistent effects across different cohorts. We found evidence for association between rs1594318 and caries in primary teeth. This variant showed weaker evidence for association in the multi-ethnic meta-analysis, potentially relating to different allele frequencies across the different ethnic groups included in analysis. Frequency of the G allele is reported to vary from 0.24 in Asian populations to 0.42 in populations of European ancestry based on 1KGP allele frequencies. ALLC (allantoicase) encodes the enzyme allantoicase, which is involved in purine metabolism and whose enzymatic activity is believed to have been lost during vertebrate evolution. Mouse studies suggest this loss of activity relates to low expression levels and low substrate affinity rather than total non-functionality [51]. Although there is some evidence that ALLC polymorphisms are associated with response to asthma treatment [52], there is limited understanding of the implications of variation in ALLC for human health, and it is possible that rs1594318 tags functionality elsewhere in the same locus.
For permanent teeth we found evidence for association between caries status and rs7738851, an intronic variant within NEDD9 (neural precursor cell-expressed, developmentally down-regulated gene 9 genetic association studies tends to support a role for innate tooth structure and quality in risk of caries [63,64]. If validated by future studies, the association with rs7738851 would provide further evidence for this argument, and may in the future enhance risk assessment in clinical practice.
The lookup of lead associated variants against adult caries traits provided no strong evidence for persistent association in adulthood. This might imply genetic effects which are specific to the near-eruption timepoint. An alternative explanation is that the variants identified in the present study represent false positive signals; although we see good consistency of effects across studies, the statistical evidence presented is not irrefutable and there is no formal replication stage in our study.
The meta-analysis heritability estimates were lower than anticipated from either previous within-study heritability estimates [65] or the new within-study heritability estimates obtained for this analysis. phenotypes. Between participating centers there are differences in characteristics such as age at participation, phenotypic assessment and the environment (such as nutrition, oral hygiene and the oral microbiome) which might influence dental caries or its treatment, as reflected in the wide range of caries prevalence between different study centers. Potentially, this might lead to heterogeneity in our meta-analyses. Although we see little evidence for heterogeneity in the top associated loci reported, it is possible that heterogeneity at other loci contributed to low study power and prevented more comprehensive single variant findings.
In the ALSPAC study we made extensive use of questionnaire-derived data. This will systematically under-report true caries exposure compared to other studies, as children or their parents are unlikely to be aware of untreated dental caries which would be evident to a trained assessor. We have explored some of these issues previously and shown that self-report measures at scale can be used to make meaningful inference about dental health in childhood [71]. We believe that misclassification and under-reporting in questionnaire data would tend to bias genetic effect estimates and heritability towards the null. Despite this we show evidence for heritability using these definitions, and effect sizes at lead variants are comparable with effect sizes obtained using clinically assessed data (Figs 3, 4).
As our power calculations showed, the sample size was sufficient to detect the variants identified at a genome-wide significant level for caries in primary teeth (rs1594318) and in permanent teeth (rs7738851), where we observed relatively large effect sizes. For smaller effect sizes we were underpowered to identify association, and did not detect any variants with effect sizes (expressed as per-allele increased odds) smaller than 15% or 17% in the primary and permanent teeth, respectively. One area of interest in the literature is the ability of genetics to guide personalized decisions on risk screening or identifying treatment modalities, and this is also true in dentistry. The genetic variants identified in this study are unlikely to be useful on their own in this context, given the modest effect sizes and low total heritability observed in our meta-analysis. We would suggest clinicians should continue to consider environment and aggregate genetic effects (for example, knowledge of disease patterns of close relatives) rather than specific genetic variants at this moment in time. Nevertheless, the findings of our study contribute to a better understanding of the genetic and biological mechanisms underlying caries susceptibility.

to detect association at a range of minor allele frequencies and effect sizes for caries in the primary dentition (European ancestry analysis) (a) and caries in the permanent dentition (b). Significance level is fixed at 5e-08.

Straker L, Mountain J, Jacques A, White S, Smith A, Landau L, et al. Cohort Profile: The Western Australian Pregnancy Cohort (Raine) Study-Generation 2. International journal of epidemiology. 2017. doi: 10.1093/ije/dyw308. PubMed PMID: MEDLINE:28064197.
34. Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, Mägi R, et al. Quality control and conduct of genome-wide association me
of group 2 of the joint EFP/ORCA workshop on the boundaries between caries and periodontal diseases. Journal of Clinical Periodontology. 2017;44:S39-S51. doi: 10.1111/jcpe.12685.
64. Nibali L, Di Iorio A, Tu Y-K, Vieira AR. Host genetics role in the pathogenesis of periodontal disease and caries. Journal of Clinical Periodontology. 201

Figure 2. Regional association plots. 2a: Regional association plot for rs1594318 and caries in primary teeth (European ancestry meta-analysis). 2b: Regional association plot for rs7738851 and caries in permanent teeth.