Proxy gene-by-environment Mendelian randomization study confirms a causal effect of maternal smoking on offspring birthweight, but little evidence of long-term influences on offspring health

Abstract
Background
A lack of genetic data across generations makes transgenerational Mendelian randomization (MR) difficult. We used UK Biobank and a novel proxy gene-by-environment (G×E) MR approach to investigate the effects of maternal smoking heaviness in pregnancy on offspring health, using participants' (generation one: G1) genotype (rs16969968 in CHRNA5) as a proxy for their mothers' (G0) genotype.
Methods
We validated this approach by replicating an established effect of maternal smoking heaviness on offspring birthweight. We then applied it to explore the effects of maternal (G0) smoking heaviness on offspring (G1) later-life outcomes and on the birthweight of G1 women's children (G2).
Results
Each additional smoking-increasing allele in offspring (G1) was associated with lower G1 birthweight in the maternal (G0) smoking stratum (-0.018 kg per allele; 95% confidence interval [CI]: -0.026, -0.009), but there was no meaningful association in the maternal non-smoking stratum (-0.002 kg per allele; 95% CI: -0.008, 0.003; interaction P-value = 0.004). For grandchildren's (G2) birthweight, the difference in the association of rs16969968 between grandmothers (G0) who did and did not smoke depended on whether mothers (G1) smoked in pregnancy (interaction P-value = 0.042): the difference was -0.020 kg/allele (95% CI: -0.044, 0.003) among mothers (G1) who smoked in pregnancy, versus 0.007 kg/allele (95% CI: -0.005, 0.020) among those who did not.
Conclusions
Our study demonstrated how offspring genotype can be used as a proxy for the mother's genotype in G×E MR. We confirmed the causal effect of maternal (G0) smoking on offspring (G1) birthweight, but found little evidence of effects on G1 longer-term health outcomes. For grandchildren's (G2) birthweight, the effect of grandmothers' (G0) smoking heaviness in pregnancy may be modified by maternal (G1) smoking status in pregnancy.


Qian
Associations of G1 rs16969968 with potential confounders by strata of smoking status in participants (G1) and their mothers (G0), adjusted for the first ten principal components
Supplementary Table 3. Differences in the associations of rs16969968 with 12 outcomes in participants (G1) across maternal (G0) smoking status in pregnancy

Potential confounders in G1
UK Biobank derived participants' age at baseline from their date of birth and the date they attended the initial assessment. We used the Townsend deprivation index and household income as measures of socio-economic position at baseline. The Townsend deprivation index measures material deprivation in the area where a participant lived; UK Biobank calculated it from participants' postcodes. Participants (except those living in sheltered accommodation or a care home) were asked to report their average total household income before tax, in five categories ranging from less than £18 000 to greater than £100 000. Participants' sex was obtained from the central registry and, in some cases, updated by the participant.
Supplementary Table 5 summarises the UK Biobank fields we used to derive our smoking phenotypes, outcomes and confounders.
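The age and income variables described above might be coded for analysis roughly as follows. This is a minimal Python sketch, not the UK Biobank derivation: it assumes full dates of birth and assessment are available, and the names `age_at_baseline`, `income_code` and `INCOME_CODES` are hypothetical. The income codes follow the five-level ordinal coding listed in the footnote to the confounder table.

```python
from datetime import date

# Ordinal coding of household income (1 = lowest, 5 = highest category),
# as used for the ordinal logistic regression of income on rs16969968.
INCOME_CODES = {
    "Less than £18 000": 1,
    "£18 000 to £30 999": 2,
    "£31 000 to £51 999": 3,
    "£52 000 to £100 000": 4,
    "Greater than £100 000": 5,
}


def age_at_baseline(date_of_birth: date, assessment_date: date) -> int:
    """Age in completed years on the date of the initial assessment.

    Hypothetical helper: assumes full dates are available.
    """
    had_birthday = (assessment_date.month, assessment_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return assessment_date.year - date_of_birth.year - (0 if had_birthday else 1)


def income_code(category: str):
    """Ordinal income code, or None for any answer outside the five categories."""
    return INCOME_CODES.get(category)


print(age_at_baseline(date(1950, 6, 15), date(2008, 6, 14)))
print(income_code("£31 000 to £51 999"))
```

Returning `None` for other answers (for example, a refusal) keeps them out of the ordinal scale; how such answers were actually handled is not stated here.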

Simulation
We performed two simulations to compare the statistical power of proxy G×E MR with that of conventional G×E MR, in which the maternal (G0) genotype is available.
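Why the proxy analysis should have less power can be seen from a short derivation. It assumes Hardy-Weinberg equilibrium, the same effect allele frequency p in mothers and fathers (as assumed for the simulations), and an outcome Y that depends on genotype only through G0, and it ignores the small change in genotype distribution induced by stratifying on smoking status. Here T_m and T_f denote the effect alleles transmitted by the mother and father.

```latex
\begin{aligned}
G_1 &= T_m + T_f, \qquad T_m \mid G_0 \sim \mathrm{Bernoulli}(G_0/2), \qquad T_f \sim \mathrm{Bernoulli}(p)\\
\mathbb{E}[G_1 \mid G_0] &= \tfrac{1}{2} G_0 + p\\
\operatorname{Cov}(G_0, G_1) &= \operatorname{Cov}\!\left(G_0, \tfrac{1}{2} G_0\right) = \tfrac{1}{2}\operatorname{Var}(G_0), \qquad \operatorname{Var}(G_0) = \operatorname{Var}(G_1) = 2p(1-p)\\
Y &= \alpha + \beta G_0 + \varepsilon, \qquad \varepsilon \perp (G_0, T_m, T_f)\\
\beta_{\text{proxy}} &= \frac{\operatorname{Cov}(Y, G_1)}{\operatorname{Var}(G_1)} = \frac{\beta \operatorname{Cov}(G_0, G_1)}{\operatorname{Var}(G_1)} = \frac{\beta}{2}
\end{aligned}
```

Because the genotype variance is the same in both generations and the residual variance barely changes when genetic effects are small, the standard error is roughly unchanged while the effect is halved, so under these assumptions the proxy analysis needs roughly four times the sample size for similar power.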
Simulation A was for offspring (G1) early-life outcomes (e.g. birthweight) that should not be affected by G1 smoking, so we did not stratify on G1 smoking status. We generated simulated data according to the directed acyclic graphs shown in Supplementary Figure 6A, using the following steps:
(1) We generated G0 rs16969968 according to the allelic dosage distribution in UK Biobank.
(2) We generated G1 rs16969968 according to both the allelic dosage distribution in UK Biobank and G0 rs16969968. For example, mothers with dosage 1 could have offspring with dosage 0, 1 or 2, with probabilities (1 - effect allele frequency [EAF])/2, 1/2 and EAF/2, respectively, because the mother transmits the effect allele with probability 1/2 and the father with probability EAF. The proportions of G0 and G1 in each dosage are shown in Supplementary Figure 6B.
(3) We generated G0 smoking status in pregnancy according to the proportions of smokers and non-smokers in UK Biobank.
(5) We generated a standardised continuous outcome in the offspring (G1). Among G0 non-smokers, this was generated randomly from a normal distribution. Among G0 smokers, we generated the outcome as a linear function of G0 smoking heaviness plus normally distributed random error (as used for non-smokers).
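The data-generation steps above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' simulation code: G0 dosage is drawn from Hardy-Weinberg proportions at the 33% EAF rather than from the observed UK Biobank dosage distribution, `p_smoke` is a placeholder for the UK Biobank proportion of G0 smokers, and step (4), G0 smoking heaviness, is replaced by a hypothetical model (10 cigarettes/day plus 1 per effect allele plus noise, floored at 1).

```python
import numpy as np

EAF = 0.33  # effect allele frequency in UK Biobank, assumed equal in mothers and fathers


def simulate_a(n, effect_sd_per_cig, p_smoke=0.3, rng=None):
    """One replicate of simulation A (hypothetical settings, see lead-in)."""
    rng = np.random.default_rng() if rng is None else rng
    # (1) maternal (G0) dosage, Hardy-Weinberg approximation
    g0 = rng.binomial(2, EAF, size=n)
    # (2) offspring (G1) dosage: the mother transmits the effect allele with
    # probability g0/2 and the father with probability EAF
    g1 = rng.binomial(1, g0 / 2) + rng.binomial(1, EAF, size=n)
    # (3) G0 smoking status in pregnancy, independent of genotype
    smoker = rng.random(n) < p_smoke
    # (4) hypothetical G0 smoking heaviness (cigarettes/day), smokers only
    heaviness = np.where(
        smoker, np.maximum(1.0, 10.0 + 1.0 * g0 + rng.normal(0, 5, n)), 0.0
    )
    # (5) G1 outcome: pure noise for non-smokers, plus a linear effect of
    # G0 heaviness for smokers
    y = effect_sd_per_cig * heaviness + rng.normal(0, 1, n)
    return g0, g1, smoker, y


rng = np.random.default_rng(1)
g0, g1, smoker, y = simulate_a(100_000, 0.05, rng=rng)
```

Drawing the offspring genotype allele by allele reproduces the transmission probabilities in step (2), for example (1 - EAF)/2, 1/2 and EAF/2 for mothers with dosage 1.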
We repeated simulation A with different effect sizes of G0 smoking heaviness in step (5) (0.01, 0.025, 0.05, 0.075 and 0.1 SD/cigarette) and different total sample sizes (100 000, 500 000, 1 000 000 and 5 000 000). For each combination of effect size and sample size, we ran 1000 replicates. In each replicate, we stratified on G0 smoking status, estimated the associations of G0 rs16969968 with the G1 outcome (for G×E MR) and of G1 rs16969968 with the G1 outcome (for proxy G×E MR) in each stratum, and then tested for interaction between G0 strata using Cochran's Q test for heterogeneity.
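The per-replicate analysis might look like the following Python sketch. It fits a simple, unadjusted linear regression of the outcome on genotype dosage within each G0 stratum, compares the two slopes with Cochran's Q (one degree of freedom for two strata, so P = erfc(sqrt(Q/2))), and computes power as the proportion of replicates with an interaction P-value below 0.05. The function names are hypothetical.

```python
import math

import numpy as np


def slope_and_se(x, y):
    """Least-squares slope of y on genotype dosage x and its standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc**2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    resid = y - y.mean() - slope * xc
    sigma2 = np.sum(resid**2) / (len(x) - 2)
    return slope, math.sqrt(sigma2 / sxx)


def cochran_q_two_strata(b1, se1, b2, se2):
    """Cochran's Q for heterogeneity between two stratum-specific estimates."""
    w1, w2 = 1 / se1**2, 1 / se2**2  # inverse-variance weights
    pooled = (w1 * b1 + w2 * b2) / (w1 + w2)
    q = w1 * (b1 - pooled) ** 2 + w2 * (b2 - pooled) ** 2
    return q, math.erfc(math.sqrt(q / 2))  # chi-square upper tail, 1 df


def interaction_p(genotype, g0_smoker, y):
    """Interaction P-value between G0 smokers and non-smokers.

    Pass G0 dosage for G×E MR or G1 dosage for proxy G×E MR.
    """
    s = np.asarray(g0_smoker, dtype=bool)
    genotype, y = np.asarray(genotype), np.asarray(y)
    b1, se1 = slope_and_se(genotype[s], y[s])
    b0, se0 = slope_and_se(genotype[~s], y[~s])
    return cochran_q_two_strata(b1, se1, b0, se0)[1]


def power(p_values, alpha=0.05):
    """Proportion of replicates with interaction P-value below alpha."""
    return float(np.mean(np.asarray(p_values) < alpha))
```

With two strata, Q reduces to (b1 - b2)^2 / (se1^2 + se2^2), the square of the usual z-statistic for a difference between independent estimates.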
Simulation B was for G1 later-life outcomes (e.g. adult body mass index [BMI]) that would be affected by G1 smoking, so we further stratified on G1 smoking status. We generated simulated data following steps (1) to (4), and generated G1 smoking status and G1 smoking heaviness following the same rules as steps (3) and (4). Finally, we generated a standardised continuous outcome in G1. Among G1 non-smokers, this was generated in the same way as step (5). Among G1 smokers, we additionally included a linear effect of G1 smoking heaviness (0.1 SD/cigarette). We repeated simulation B in the same way as simulation A. In each replicate, we stratified on both G0 and G1 smoking status, estimated the same genetic associations as in simulation A, and then, within each G1 stratum, tested for interaction between G0 strata (e.g. G0 smokers & G1 smokers versus G0 non-smokers & G1 smokers).
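Simulation B adds a G1 smoking status and heaviness and compares G0 strata within each G1 stratum. A minimal Python sketch follows, under hypothetical settings: Hardy-Weinberg genotype frequencies at 33% EAF, a placeholder smoking proportion `p_smoke` for both generations, and an assumed heaviness model of 10 cigarettes/day plus 1 per effect allele plus noise, floored at 1. The function names are illustrative only.

```python
import numpy as np

EAF = 0.33  # effect allele frequency of rs16969968 in UK Biobank


def simulate_b(n, effect_g0, effect_g1=0.1, p_smoke=0.3, rng=None):
    """One replicate of simulation B (hypothetical settings, see lead-in)."""
    rng = np.random.default_rng() if rng is None else rng
    g0 = rng.binomial(2, EAF, size=n)
    # offspring dosage: maternal allele with probability g0/2, paternal with EAF
    g1 = rng.binomial(1, g0 / 2) + rng.binomial(1, EAF, size=n)

    def smoking(dosage):
        # status independent of genotype; heaviness depends on dosage (assumed model)
        smoker = rng.random(n) < p_smoke
        heavy = np.where(smoker, np.maximum(1.0, 10.0 + dosage + rng.normal(0, 5, n)), 0.0)
        return smoker, heavy

    g0_smoker, g0_heavy = smoking(g0)
    g1_smoker, g1_heavy = smoking(g1)
    # G1 outcome: effect of G0 heaviness plus 0.1 SD/cigarette of G1 heaviness
    y = effect_g0 * g0_heavy + effect_g1 * g1_heavy + rng.normal(0, 1, n)
    return g0, g1, g0_smoker, g1_smoker, y


def g0_contrasts_within_g1(g0_smoker, g1_smoker):
    """Masks for G0 smokers versus G0 non-smokers within each G1 stratum."""
    return {
        label: (g0_smoker & mask, ~g0_smoker & mask)
        for label, mask in (("G1 smokers", g1_smoker), ("G1 non-smokers", ~g1_smoker))
    }


rng = np.random.default_rng(2)
g0, g1, s0, s1, y = simulate_b(50_000, 0.05, rng=rng)
contrasts = g0_contrasts_within_g1(s0, s1)
```

Within each pair of masks, the stratum-specific genotype slopes would then be compared with Cochran's Q, as in simulation A.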
For each combination of effect size and sample size, statistical power was the proportion of the 1000 replicates in which the interaction P-value was smaller than 0.05. We plotted the power in Supplementary Figure 5.
... Figure 2B&C), it was unknown whether this was before or during their pregnancy. If she stopped smoking at 23 years old, she did not smoke during her pregnancy.
Boxes around a phenotype denote that this phenotype is conditioned upon. Blue solid arrows denote known associations, blue dashed arrows denote the hypothesis we are testing, and red dashed-dotted arrows denote that conditioning on the collider induces an association between the parents of the collider, i.e. an association between genotype and the confounders.
(A) The smoking heaviness variant (rs16969968) has been shown to influence smoking status in pregnancy. (3) In this DAG, G1 birthweight is related to G0 smoking status in pregnancy because they share a common cause (e.g. G0 socioeconomic status [SES]). G0 smoking status in pregnancy is therefore a collider, and conditioning on it induces an alternative pathway between rs16969968 and G1 birthweight that does not act via smoking heaviness (shown as red dashed-dotted arrows). This may bias the association of rs16969968 with G1 birthweight. For example, G0 individuals with more smoking-increasing alleles would be more likely to smoke in pregnancy, and G0 individuals with lower SES would be more likely both to smoke in pregnancy and to have offspring (G1) with lower birthweight. The observed association of G0 rs16969968 with G1 birthweight would then include not only the true adverse effect via G0 smoking heaviness but also an adverse effect induced by conditioning on G0 smoking status in pregnancy, biasing the estimate away from the null.
(B) In addition to the potential collider bias described in (A), our proxy (G1 rs16969968) is weakly associated with G1 smoking status. In this DAG, G1 later-life outcomes are also related to G1 smoking status via a common cause (e.g. G1 SES). G1 smoking status is therefore a collider, and conditioning on it induces an alternative pathway between rs16969968 and G1 later-life outcomes that does not act via smoking heaviness (shown as red dashed-dotted arrows). This may bias the association of rs16969968 with G1 later-life outcomes.
(C) In addition to the potential collider bias described in (A), our proxy (G1 rs16969968) is associated with smoking status in pregnancy among female G1 participants. In this DAG, G2 birthweight is related to female G1 smoking status in pregnancy because they share common causes (e.g. G1 SES). Female G1 smoking status in pregnancy is therefore a collider, and conditioning on it induces an alternative pathway between rs16969968 and G2 birthweight that does not act via smoking heaviness (shown as red dashed-dotted arrows). This may bias the association of rs16969968 with G2 birthweight.
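The collider mechanism in these DAGs can be checked with a small simulation. In this hypothetical Python sketch, genotype and SES are generated independently and both influence smoking status through an assumed logistic model; within strata of smoking status the two become correlated. The effect sizes are arbitrary, and the sign and size of the induced association depend on them.

```python
import numpy as np


def collider_demo(n=200_000, seed=0):
    """Genotype-SES correlation overall and within smoking strata."""
    rng = np.random.default_rng(seed)
    genotype = rng.binomial(2, 0.33, size=n)  # rs16969968 dosage
    ses = rng.normal(size=n)  # confounder, independent of genotype by construction
    # hypothetical model: more alleles and lower SES both raise the odds of smoking
    logit = -1.0 + 0.5 * genotype - 1.0 * ses
    smoker = rng.random(n) < 1 / (1 + np.exp(-logit))
    overall = np.corrcoef(genotype, ses)[0, 1]
    among_smokers = np.corrcoef(genotype[smoker], ses[smoker])[0, 1]
    among_non_smokers = np.corrcoef(genotype[~smoker], ses[~smoker])[0, 1]
    return overall, among_smokers, among_non_smokers


overall, among_smokers, among_non_smokers = collider_demo()
```

The overall correlation is close to zero, whereas the within-stratum correlations are not: this is the genotype-confounder association that the red dashed-dotted arrows represent.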
Other sources of bias arising from conditioning on a downstream effect of exposures or outcomes (e.g. missingness) have been discussed by Hughes et al. (4)

Supplementary Figure 4. The associations of G1 rs16969968 with G1 height and age at menarche in sensitivity analyses
Generation (G)0: UK Biobank participants' mothers; G1: UK Biobank participants themselves.
Estimates are the mean difference in the G1 outcome per smoking-heaviness-increasing allele of rs16969968. G1 participants were grouped according to whether they had ever smoked before achieving their adult height or before their age at menarche. G1 participants who started smoking at the same age as they achieved their adult height or reached menarche were removed from analyses because the order of the two events was uncertain.
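The grouping rule for this sensitivity analysis can be written as a small function. This is a sketch assuming ages are recorded in whole years; the function name and group labels are hypothetical, and how the age at achieving adult height was defined is not stated here.

```python
def growth_exposure_group(age_started_smoking, age_at_milestone):
    """Group a G1 participant for the adult height / age at menarche analysis.

    age_started_smoking: whole years, or None for never smokers.
    age_at_milestone: whole-year age at achieving adult height or at menarche.
    Returns None for participants removed because the order is uncertain.
    """
    if age_started_smoking is None:
        return "did not smoke before"
    if age_started_smoking < age_at_milestone:
        return "smoked before"
    if age_started_smoking == age_at_milestone:
        return None  # same whole-year age: cannot tell which came first
    return "did not smoke before"


print(growth_exposure_group(12, 13))
print(growth_exposure_group(13, 13))
```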

Supplementary Figure 5. Comparison of statistical power between gene-by-environment (G×E) Mendelian randomization (MR) and proxy G×E MR
*For example, if the outcome is G1 age at menarche and its SD is 1.6 years in UK Biobank, 0.01 SD/cigarette is equivalent to 5.84 days/cigarette.

Supplementary Figure 6. Data generation mechanism for the simulation
*The effect allele frequency is 33% in UK Biobank participants, and we assume this frequency remains the same in their mothers and fathers.

2 Participants' smoking in pregnancy was derived from G1 age at first live birth and the ages they reported starting and stopping smoking. As these ages were recorded as whole numbers of years, it was not always possible to determine whether a woman smoked in pregnancy (see the Methods section in the main paper and Supplementary Figure 2 for details).
3 Controls were participants who did not indicate having asthma diagnosed by a doctor.
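The point about whole-year ages can be made concrete with a classification function. This is a hypothetical rule for illustration only, not the paper's derivation (which is described in the main Methods and Supplementary Figure 2): it assumes a pregnancy may overlap the whole-year ages at, and one year before, the first live birth, and it returns None when the recorded ages cannot resolve the timing.

```python
def smoked_in_pregnancy(age_first_birth, age_started, age_stopped):
    """Hypothetical rule: did a G1 woman smoke during her first pregnancy?

    All ages are whole years. age_started is None for never smokers;
    age_stopped is None for women still smoking. Returns True, False, or
    None when the whole-year ages cannot determine the timing.
    """
    if age_started is None:
        return False  # never smoked
    # assumed whole-year ages that the pregnancy could overlap
    first, last = age_first_birth - 1, age_first_birth
    if age_started > last:
        return False  # started after the birth
    if age_stopped is not None and age_stopped < first:
        return False  # stopped before the pregnancy could have begun
    if age_started < first and (age_stopped is None or age_stopped > last):
        return True  # smoking spanned both whole-year ages of the pregnancy
    return None  # start or stop falls within the pregnancy years


print(smoked_in_pregnancy(25, 16, 30))
print(smoked_in_pregnancy(25, 16, 24))
```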

Columns: Confounder; By G1 smoking status; Overall; By G0 smoking status in pregnancy (Yes, No)
Age (years)¹, All participants: -0.018 (...
1 Results were from linear regression. Estimates are the mean difference in the confounder per smoking-heaviness-increasing allele of rs16969968.
2 Results were from ordinal logistic regression for household income (less than £18 000 = 1, £18 000 to £30 999 = 2, £31 000 to £51 999 = 3, £52 000 to £100 000 = 4, greater than £100 000 = 5). Estimates are the change in the odds of being in a higher household income category per smoking-heaviness-increasing allele of rs16969968.
3 Results were from logistic regression for sex. Estimates are the change in the odds of being male rather than female per smoking-heaviness-increasing allele of rs16969968.

Supplementary Table 3. Differences in the associations of rs16969968 with 12 outcomes in participants (G1) across maternal (G0) smoking status in pregnancy
1 We combined current and former smokers into ever smokers for some outcomes, given that smoking cessation may not influence them rapidly.
2 The interaction P-value was obtained using Cochran's Q statistic for heterogeneity in the association of rs16969968 with each outcome between participants whose mothers did versus did not smoke.
G1 outcome (associations shown in Figure 2