Vitamin D deficiency and C-reactive protein: a bidirectional Mendelian randomization study

Abstract Background Low vitamin D status is often associated with systemic low-grade inflammation as reflected by elevated C-reactive protein (CRP) levels. We investigated the causality and direction of the association between vitamin D status and CRP using linear and non-linear Mendelian randomization (MR) analyses. Methods MR analyses were conducted using data from 294 970 unrelated participants of White-British ancestry from the UK Biobank. Serum 25-hydroxyvitamin D [25(OH)D] and CRP concentrations were instrumented using 35 and 46 genome-wide significant variants, respectively. Results In non-linear MR analysis, genetically predicted serum 25(OH)D had an L-shaped association with serum CRP, where CRP levels decreased sharply with increasing 25(OH)D concentration for participants within the deficiency range (<25 nmol/L) and levelled off at ∼50 nmol/L of 25(OH)D (Pnon-linear = 1.49E-4). Analyses using several pleiotropy-robust methods provided consistent results in stratified MR analyses, confirming the inverse association between 25(OH)D and CRP in the deficiency range (P = 1.10E-05) but not with higher concentrations. Neither linear or non-linear MR analysis supported a causal effect of serum CRP level on 25(OH)D concentration (Plinear = 0.32 and Pnon-linear = 0.76). Conclusion The observed association between 25(OH)D and CRP is likely to be caused by vitamin D deficiency. Correction of low vitamin D status may reduce chronic inflammation.


Introduction
Systemic low-grade inflammation, characterized by prolonged release of inflammatory mediators and activation of harmful signal-transduction pathways, is associated with various complex somatic and neuropsychiatric diseases and disorders. 1,2 It is considered that nutritional factors can influence many aspects of inflammation. 3 Vitamin D is a pro-hormone and an essential micronutrient, and although its classical roles are related to the regulation of calcium homeostasis, various types of immune cells express both the vitamin D receptor and metabolizing enzymes, 4 suggesting that hormonal vitamin D could also play a role in modulating inflammatory responses. 5 This is supported by an inverse association between serum 25-hydroxyvitamin D [25(OH)D] concentrations and C-reactive protein (CRP), which is frequently reported in observational studies. 6,7 Serum 25(OH)D is the best indicator for vitamin D status whereas CRP is one of the most widely used inflammatory biomarkers in clinical practice. However, there is an ongoing debate about the causal nature of the association between 25(OH)D and CRP, and the observational association has not been supported by randomized trials. 8 Indeed, it has been suggested that the association between serum 25(OH)D and CRP simply reflects reverse causality or confounding, where a low 25(OH)D concentration is either a consequence of chronic inflammation or results from behaviours such as less time outdoors in people who are unwell. 8 Sitting at the interface between observational studies and randomized-controlled trials (RCTs), Mendelian randomization (MR) has been increasingly used to strengthen causal evidence in observational studies. 9 It uses genetic variants associated with the exposure of interest to approximate the exposure and, conditional on the key method assumptions being met, MR has the benefit of reducing bias due to confounding and reverse causation. 9 The association of 25(OH)D and CRP has previously been investigated using the MR approach, with no evidence to support a causal effect. 10,11 However, all previous studies have only used the standard linear MR method, which cannot rule out the possibility of a threshold effect restricted to vitamin D deficiency. 12 Indeed, it is logical to expect that improving vitamin D status would be relevant only in the presence of vitamin D deficiency, whereas any further additions may be redundant and, in the high extreme of supplementation, might become toxic. 13 These types of non-linear dose-response relationships can be tested by the non-linear MR approach, which allows us to interrogate the shape of the association. 14 This method has been recently used to provide evidence for the adverse effect of vitamin D deficiency on cardiovascular disease (CVD) risk and mortality, which is not visible using the standard linear MR approach. 15,16 In this study we set out to examine evidence for the direction and causality of the association between serum 25(OH)D and CRP, also allowing for possible threshold effects. We performed these analyses using data from 294 970 participants in UK Biobank, representing the largest cohort to date with measured serum 25(OH)D concentrations.

Study participants
UK Biobank is a large prospective cohort study with >500 000 participants aged 37-73 years recruited from 22 assessment centres across the UK between 13 March 2006 and 1 October 2009 with a goal to improve the prevention, diagnosis and treatment of diseases of middle and old age. 17 Participants filled in questionnaires to provide broad information on health and lifestyles at baseline survey and provided blood samples for biomarker and genetic assays. We restricted the analyses to unrelated individuals who were identified as White-British ancestry based on selfreport and genetic profiling 18

Serum 25(OH)D and CRP concentrations
Blood samples of participants were collected at the time of recruitment. Serum 25(OH)D concentration (nmol/L) was measured using the LIAISON XL 25(OH)D assay (DiaSorin, Stillwater, USA) and serum CRP concentration (mg/L) was measured using high-sensitivity immunoturbidimetric assay on a Beckman Coulter AU5800 19,20 (see Supplementary methods, available as Supplementary data at IJE online, for details). Since the distribution of serum CRP concentration is highly skewed, we natural-log-transformed serum CRP concentrations to facilitate analyses.

Genetic instrument for serum 25(OH)D concentration
We constructed a weighted genetic score (vitaminD-GS) consisting of 35 single-nucleotide polymorphisms (SNPs) to instrument serum 25(OH)D concentration (Supplementary  Table S1, available as Supplementary data at IJE online). All 35 SNPs are common genome-wide significant variants (minor allele frequency > 5%), discovered in a recent genomewide association analysis (GWAS) for serum 25(OH)D concentration in UK Biobank 21 and were replicated with a consistent direction and a P-value < 0.05 in the earlier GWAS by the SUNLIGHT consortium 22 (Supplementary Figure S2, available as Supplementary data at IJE online). Benefits of replication in the SUNLIGHT consortium are 2-fold. It ensures the robustness of the GWAS signals and also allows us to take weights for vitaminD-GS from an independent sample, avoiding bias arising from using internal weights. 23 VitaminD-GS was constructed by first computing the weighted average of the number of 25(OH)D-increasing alleles for an individual and then multiplying it by the number of available variants. The weight for each SNP was the effect estimate of the association of the SNP with serum 25(OH)D in the SUNLIGHT consortium. 22 As a sensitivity analysis, we constructed an alternative instrument using a broader set of SNPs consisting of 122 autosomal variants (vitaminD-GS-122) (Supplementary methods, Supplementary Figure S2 and Supplementary Table S1, available as Supplementary data at IJE online). Further, to minimize any potential influence by horizontal pleiotropy (via variants with no clear biological links to vitamin D metabolism), in particular by metabolic traits (including lipids traits 24 ), we included two additional instruments in the sensitivity analysis, which were constructed using only variants near/in genes related to vitamin D metabolism. A focused score 16 25 As an attempt to better represent the level of 'effective' 25(OH)D, we repeated analyses using a synthesis score 26,27 that only included variants from DHCR7 and CYP2R1, which directly contribute to substrate availability and synthesis of 25(OH)D (Supplementary methods, available as Supplementary data at IJE online).

Genetic instrument for serum CRP concentration
The genetic score for serum CRP concentration (CRP-gwasGS) was constructed using 46 genome-wide significant variants associated with serum CRP concentration, which were discovered in a recent GWAS meta-analysis with no overlap with UK Biobank. 28 Information on these 46 variants can be found in Supplementary Table S2 (available as Supplementary data at IJE online). CRP-gwasGS was constructed in the same way as vitaminD-GS with the weight for each SNP being the effect estimate of the association of the SNP with CRP in the aforementioned metaanalysis. 28 As a sensitivity analysis, we constructed an alternative instrument using four cis-acting variants 29 (Supplementary Table S2, available as Supplementary data at IJE online).

Statistical analysis
We performed linear and non-linear MR analyses to examine the genetic evidence for the bidirectional association between serum 25(OH)D and CRP concentrations. If a non-linear association was evident in the non-linear MR analysis, we also performed stratified MR analyses as a sensitivity analysis to gauge the robustness of the detected non-linearity. For the full analytical strategy, please see Supplementary Figure S3 (available as Supplementary data at IJE online). MR estimates of the linear, non-linear and stratified MR analyses were calculated using the genetic score-based approach (GS-based one-sample approach). For the linear and stratified analyses, where estimation via the SNP-based two-sample approach is also possible, we repeated the analyses applying five SNP-based two-sample methods as sensitivity analyses to gauge the robustness of the results to horizontal pleiotropy. Both GS-based onesample and SNP-based two-sample approaches are described in detail below in the context of linear, stratified and non-linear MR analyses.

GS-based one-sample analysis for linear and stratified MR
In the GS-based one-sample analysis, the MR estimate is computed using the ratio-of-coefficients method, 30 in which in the same sample (i.e. one-sample) GS-exposure and GS-outcome association estimates are computed and then taken as inputs to estimate the causal effect. For the linear MR analysis, the MR estimate is computed in the full sample, whereas in the stratified analysis, stratumspecific MR estimates are computed in the strata of residual exposure, which is defined as the exposure level minus the variation induced by the genetic score. Stratification is performed using residual exposure (rather than the raw exposure level) to avoid collider bias 31,32 that could potentially induce spurious associations between GS and outcome, and bias stratum-specific MR estimates.

Non-linear MR analysis
The non-linear MR is also a GS-based one-sample approach and for our analysis we used the fractional polynomial method to capture the non-linearity of the exposureoutcome association. 14 Briefly, the full UK Biobank sample was stratified into 100 strata using the residuals of exposure after regressing on the corresponding GS. Within each stratum, the localized average causal effect (LACE) was computed using the ratio-of-coefficients method, which is the ratio of the coefficient of the GS-outcome association estimate to that of the GS-exposure association estimate. Meta-regression of LACE against the stratum-specific mean exposure was then performed by fitting a range of fractional polynomial exposure-outcome models of Degrees 1 and 2, and the best-fitting model selected based on the likelihood ratio test. We report the fractional polynomial test for non-linearity in which the best-fitting fractional polynomial model of Degree 1 is compared against the linear model. 14 Non-linear MR analyses assume that the GS-exposure association is constant over the entire distribution of exposure. To test this assumption, we examined the heterogeneity of GS-exposure associations across 100 strata using the Cochran's Q test and trend test. 14 SNP-based two-sample analysis for linear and stratified MR We included five SNP-based two-sample methods in the linear and stratified MR analyses, including inverse-variance weighted (IVW), MR-Egger, weighted median, weighted mode and MR-Presso (Supplementary Figure  S3, available as Supplementary data at IJE online). All these methods use individual SNPs (instead of an aggregate genetic score as in the GS-based approach) and take SNP-exposure and SNP-outcome association estimates as inputs. It is important to note that SNP-exposure and SNP-outcome association estimates need to be taken from independent samples (i.e. two-sample) otherwise bias can be introduced. 33 The five SNP-based two-sample methods have largely independent assumptions on horizontal pleiotropy and a good agreement across these methods suggests a credible causal estimate (see Supplementary methods, available as Supplementary data at IJE online, for details). In the linear MR analysis, in which SNP-25(OH)D and SNP-CRP association estimates were all taken from UK Biobank, we applied the split-sample strategy to avoid bias due to overlapping samples. 33 More specifically, the full UK Biobank sample is randomly split into two subsamples of equal size (Samples A and B); the MR estimate is calculated for each sample (MR A and MR B ) and then combined to obtain the overall estimate (MR meta ) (Supplementary Figure S3, available as Supplementary data at IJE online). MR A is computed using SNP-outcome association estimates from Sample A and SNP-exposure association estimates from Sample B, whereas SNP-outcome association estimates from Sample B and SNP-exposure association estimates from Sample A are used to compute MR B . MR A and MR B are then combined in a fixed-effects metaanalysis to compute MR meta . In the stratified analyses, the stratum-specific MR estimate is computed using SNPoutcome association estimates from the stratum under consideration with SNP-exposure association estimates taken from the full sample leaving out the stratum under consideration (Supplementary Figure S3, available as Supplementary data at IJE online).
Given that serum 25(OH)D and CRP concentrations are continuous, all GS/SNP-exposure/outcome association estimates required for the linear, stratified and nonlinear analyses were computed by fitting linear regression models. All models were adjusted for age, sex, assessment centre, birth location, SNP array, top 40 genetic principal components and nuisance factors related to the measurement of serum 25(OH)D and/or CRP concentrations, including month in which blood sample was taken, fasting time before blood sample was taken and sample aliquots for measurement. 20 Adjustment of birth location and 40 genetic components is recommended to account for latent population structure in UK Biobank. 34 GS-based onesample linear and stratified analyses were performed using STATA, version 17.0 (StataCorp LP, College Station, Texas, USA). SNP-based two-sample analyses and non-linear MR analysis were conducted in R (version 4.0.2) using the TwoSampleMR (version 0.5.6) 35 and nlmr (version 2.0) 14 package, respectively.

Results
In total, 294 970 participants were included in the analysis (Supplementary Figure S1, available as Supplementary data at IJE online). The average 25(OH)D concentration was 50.0 nmol/L (range 10-340 nmol/L) and 11.7% (n ¼ 34 403) of the participants had concentrations of <25 nmol/L (Table 1). There were notable variations in serum 25(OH)D and CRP concentrations with respect to distributions of demographics, lifestyle, general health and socio-economic factors (Table 1). Serum 25(OH)D and CRP concentrations are inversely related in a dosedependent manner (P < 1.0E-300) ( Table 1).

Instrument validation for vitaminD-GS
VitaminD-GS was robustly associated with serum 25(OH)D concentration in UK Biobank, explaining 2.8% of variation (F-statistic ¼ 8646, P < 1.0E-300) (Supplementary Figure  S4, available as Supplementary data at IJE online). We examined the association of vitaminD-GS with serum 25(OH)D across 100 strata of residuals of serum 25(OH)D. We detected evidence for heterogeneity (P Cochran's Q < 1.0E-300; P trend ¼ 0.23), where vitaminD-GS-25(OH)D association in the 1st and 100th strata appeared to be outliers (Supplementary Figure S5, available as Supplementary data at IJE online). We found an association between the genetic instrument with serum low-density lipoprotein (LDL) and triglyceride concentrations (P < 0.001 for both). In contrast there was no evidence that vitaminD-GS was associated with other potential confounders in UK Biobank, including body mass index (BMI), smoking, alcohol intake, physical activity, education and Townsend deprivation index (uncorrected P 0.051 for all) (Supplementary Table S3, available as Supplementary data at IJE online). We also examined vitaminD-GS-confounder associations across 100 strata of residuals of serum 25(OH)D and found no evidence of association after accounting for multiple testing (Supplementary Figure S6, available as Supplementary data at IJE online).

Instrument validation for CRP-gwasGS
CRP-gwasGS explained 6.0% of variation in the serum log CRP concentration with a F-statistic of 18 687 (P < 1.0E-300) (Supplementary Figure S7, available as Supplementary data at IJE online). There was evidence of heterogeneity in the CRP-gwasGS-CRP association across 100 strata of residuals of serum CRP, with the 1st and 100th strata appearing to be the outliers (P Cochran's Q ¼ 2.66E-13, P trend ¼ 0.28) (Supplementary Figure S8,    data excluding local causal estimates from the outlying strata of residuals of serum 25(OH)D (i.e. the 1st and 100th strata, after exclusion P Cochran's Q ¼ 0.30; P trend ¼ 0.28), and this did not affect our findings (Supplementary Figure S10, available as Supplementary data at IJE online). Given the association between the vitamin-GS and blood lipids, we reanalysed the association adjusting for LDL and triglycerides, and restricting the instrument to variants that were not associated with any metabolic traits (a non-metabolic score) (Supplementary methods, available as Supplementary data at IJE online). Both approaches replicated the L-shaped association between 25(OH)D and CRP (Supplementary Figure S11, available as Supplementary data at IJE online) (P non-linear ¼ 1.4E-03 for LDL adjustment, P non-linear ¼ 0.057 for triglyceride adjustment and P non-linear ¼ 1.6E-03 for the non-metabolic score). We also conducted reanalyses of nonlinear MR using alternative instruments including vitaminD-GS-122 (P non-linear ¼ 7.7E-07) (Supplementary Figure S12, available as Supplementary data at IJE online), focused score (P non-linear ¼ 6.8E-10) (Supplementary Figure   S13, available as Supplementary data at IJE online) and synthesis score (P non-linear ¼ 3.3E-07) (Supplementary Figure  S14, available as Supplementary data at IJE online) and found similar results. We performed several sensitivity analyses to gauge whether the null association in the linear and non-linear MR analyses could be related to the study design. First, as there is some evidence that CRP-gwasGS is associated with BMI and physical activity in UK Biobank (Supplementary  Table S3, available as Supplementary data at IJE online), we re-performed linear and non-linear analyses adjusting for BMI, physical activity or both BMI and physical A-levels, Advanced levels; SD, standard deviation; Gmean: geometric mean; GSD: geometric standard deviation; Q, quartiles; cig, cigarette; TDI, Townsend deprivation index. a P-values have been adjusted for age, sex, assessment centre and nuisance factors that could affect serum 25(OH)D measurements, including month in which blood sample was taken, fasting time before blood sample was taken and sample aliquots for measurement. b Current smokers without information on types of tobacco that they smoke. activity, and found similar results (Supplementary Table  S4, available as Supplementary data at IJE online). Further, for the non-linear analysis, exclusion of local causal estimates from the outlying strata of residuals of serum CRP concentrations (i.e. the 1st and 100th strata, after exclusion P Cochran's Q ¼ 0.51; P trend ¼ 0.28) did not affect our findings (P non-linear ¼ 0.79) (Supplementary  Table S4, available as Supplementary data at IJE online). Reanalyses using instrument with four cis-acting variants also provided similar results (Supplementary Table S4, available as Supplementary data at IJE online).

Discussion
In this large-scale genetic analysis, we observed evidence for a causal effect of vitamin D status on CRP with no support for CRP as a determinant of 25(OH)D concentrations. The association between 25(OH)D and CRP was largely restricted to the deficiency range, where only individuals with low serum 25(OH)D concentrations have elevated serum CRP. The shape of the observed association supports the previously proposed threshold effect, 12 suggesting that correction of vitamin D deficiency in the affected individuals is likely to reduce systemic low-grade inflammation and potentially mitigate the risk or severity of chronic illnesses with inflammatory components. The earlier RCTs or MR studies have failed to provide evidence for an effect of vitamin D on CRP, 8,10,11 which seemingly contradicts our finding. However, if the causal effect of vitamin D is truly L-shaped and restricted to concentrations within the deficiency range as seen in our study, it would have been overlooked both by the existing supplementation trials and linear MR studies. Severe deficiency is relatively rare 36 and as it is unethical to subject participants to undue harm, supplementation trials often can only include individuals who are already vitamin D replete, 12 rendering the health effect of supplementation in the deficiency range largely unassessed. Previous linear MR studies 10,11 would have had limited statistical power to capture the threshold effect restricted to the deficiency range, which is indeed what was reflected in our linear MR analysis. We also found no evidence for an effect by genetically instrumented CRP on serum 25(OH)D concentration. This provides evidence against the notion that both molecules serve as acute phase reactants and that their association arises merely from confounding by inflammation. 37 Indeed, if inflammation truly drives low serum 25(OH)D concentrations, we would have expected genetically instrumented CRP values to be associated with serum 25(OH)D concentration. Overall, our bidirectional MR analyses suggest that rather than vitamin D acting as a bystander [where 25(OH)D-CRP association arises merely from confounding by inflammation 37 ], increasing 25(OH)D concentrations to alleviate a state of severe deficiency may help to mitigate the severity of inflammation. This said, it is important to note that these findings provide no support for a need to use high-dose vitamin D supplementation, as the observed benefits appeared to become largely saturated by the time 25(OH)D concentrations reach 50 nmol/L. It should also be noted that a higher vitamin D status may benefit some subpopulations or with respect to other disease outcomes, 38 which is an area warranting further investigation.
Vitamin D is a pro-hormone. Its anti-inflammatory property captured by our analysis could be mediated through its hormonal effect on vitamin D receptorexpressing immune cells, such as monocytes, B cells, T cells and antigen-presenting cells. 4 Indeed, cell experiments have shown that active vitamin D can inhibit the production of pro-inflammatory cytokines, including TNF-a, IL-1b, IL-6, IL-8 and IL-12, and promote the production of IL-10, an anti-inflammatory cytokine. 4,5 Further, the antiinflammatory effect also raises the possibility that having adequate vitamin D concentrations may mitigate complications arising from obesity and reduce the risk or severity of chronic illnesses with an inflammatory component, such as CVDs, diabetes, autoimmune diseases and neurodegenerative conditions, among others. 1 If the related effects are indeed true, given the high prevalence of serum 25(OH)D levels of <50 nmol/L across the world (40% in some European countries), 36,39-42 population-wide correction of low vitamin D status (e.g. by food fortification) could potentially be a cost-effective measure to reduce the burden of chronic disease. In fact, in linear MR analyses higher 25(OH)D concentrations have been associated with a lower risk of type 2 diabetes 43 and multiple sclerosis (a chronic inflammatory disease of the central nervous system), 44 with recent non-linear MR analyses providing evidence that correction of vitamin D deficiency can decrease the risk for CVDs 15  To the best of our knowledge, this is the first non-linear MR study to explore the bidirectional association between serum 25(OH)D and CRP concentrations. We used several strategies to ensure that our MR analyses are not affected by horizontal pleiotropy, where variants may influence the outcome through pathways other than through the exposure of interest. 9 First, we restricted our vitaminD-GS to 35 variants with robust replicated evidence for an association with serum 25(OH)D concentrations. Second, we confirmed that the L-shaped association of serum 25(OH)D and CRP was consistent between the non-linear and stratified MR analyses, with the inverse association of serum 25(OH)D and CRP in the deficiency range consistently observed across pleiotropy-robust methods with largely independent assumptions on the pattern of pleiotropy. Third, the L-shaped association of serum 25(OH)D and CRP was confirmed to be robust using a spectrum of alternative instruments, including when using a broader set of variants, when excluding variants related to lipids and other metabolic traits and when restricting the set of variants to those directly related to vitamin D metabolism. Despite these strengths, our study also has some limitations. Although CRP is a widely used inflammatory biomarker, it certainly cannot capture the full complexity of the immune system and hence investigation of more specific biomarkers (such as TNF-a and IL-6) is required to provide a more detailed understanding on the anti-inflammatory effects of hormonal vitamin D. We restricted our analysis to participants of White-British descent to minimize bias due to population stratification; however, this may limit the transferability of our findings to other ethnic groups. As with all MR studies, genetic instruments approximate the average effects over the life course and the true biological association between serum 25(OH)D and CRP may be more complex than that indexed in our study. With only a 5% response rate at the recruitment, UK Biobank is not representative of the general public in the UK 45 despite its large sample size. It is uncertain to what extent this selection could affect the non-linear MR analysis. However, given that risk factor-disease associations show close agreement between UK Biobank and nationally representative studies 46 and that an earlier publication from UK Biobank using the same non-linear MR approach has replicated the expected J-shaped association between BMI and mortality, 47 this lack of representativeness may not be affecting our findings.

Conclusions
In conclusion, using a large population-based cohort, we provide genetic evidence for an L-shaped association of serum 25(OH)D with CRP, suggesting that the benefit of increasing 25(OH)D is restricted to individuals with low vitamin D status. Our finding suggests that improving vitamin D status in the deficiency range could reduce systemic low-grade inflammation and potentially mitigate the risk or severity of chronic illnesses with an inflammatory component.

Ethics approval
The present study was conducted under UK Biobank application number 20175. The UK Biobank study was approved by the National Information Governance Board for Health and Social Care and North West Multicentre Research Ethics Committee (11/NW/ 0382).

Data availability
Data are available from UK Biobank for researchers who meet the criteria and gain approvals to access the research database from the UK Biobank access management committee at the University of Oxford.

Supplementary data
Supplementary data are available at IJE online.

Author contributions
E.H. conceived of the study and designed the research question; A.Z. analysed the data and wrote the first draft. Both authors interpreted the results, drafted and revised the manuscript critically for important intellectual content, and read and approved the final manuscript.

Funding
This study was financially supported by the National Health and Medical Research Council, Australia (grant number 1123603). The funder had no role in the design, implementation, analysis, interpretation of the data, approval of the manuscript and decision to submit the manuscript for publication.