Assessing the impact of alcohol consumption on the genetic contribution to mean corpuscular volume

Abstract
The relationship between the genetic loci that influence mean corpuscular volume (MCV) and those associated with excess alcohol drinking is unknown. We used white British participants from the UK Biobank (n = 362 595) to assess the association between alcohol consumption and MCV, and whether this was modulated by genetic factors. Multivariable regression was applied to identify predictors of MCV. GWAS, with and without stratification for alcohol consumption, determined how genetic variants influence MCV. SNPs in ADH1B, ADH1C and ALDH1B were used to construct a genetic score to test the assumption that acetaldehyde formation is an important determinant of MCV. Additional investigations using Mendelian randomization and phenome-wide association analysis were conducted. Increasing alcohol consumption by 40 g/week resulted in a 0.30% [95% confidence interval (CI): 0.30–0.31%] increase in MCV (P < 1.0 × 10⁻³²⁰). Unstratified (irrespective of alcohol intake) GWAS identified 212 loci associated with MCV, of which 108 were novel. There was no heterogeneity of allelic effects by drinking status. No association was found between MCV and the genetic score generated from alcohol metabolizing genes. Mendelian randomization demonstrated a causal effect for alcohol on MCV. Seventy-one SNP-outcome pairs reached statistical significance in phenome-wide association analysis, with evidence of shared genetic architecture for MCV and thyroid dysfunction, and mineral metabolism disorders. MCV increases linearly with alcohol intake in a causal manner. Many genetic loci influence MCV, with new loci identified in this analysis that provide novel biological insights. However, there was no interaction between alcohol consumption and the allelic variants associated with MCV.


Introduction
Alcohol misuse and abuse is a leading cause of morbidity and mortality (1). In 2016, global statistics suggested that 5.1% (∼3 million) of deaths and 5.3% (∼133 million) of disability-adjusted life years were caused by the harmful use of alcohol (2). Early identification of individuals who are misusing alcohol is critical for interventions to stop progression towards alcohol dependence and alcohol-related end-organ damage. Unfortunately, a history from a patient is not always reliable, and laboratory tests vary in their diagnostic accuracy, availability and usage. In clinical practice, therefore, it is usual to use a combination of history (including alcohol intake and symptoms consistent with organ damage), physical examination (to look for features of organ damage) and laboratory markers that support alcohol misuse as the underlying aetiology. The most widely used laboratory tests are (a) liver function tests (in particular, gamma-glutamyl transferase), which indicate liver damage, and (b) the mean corpuscular volume (MCV), a measure of the mean size and volume of erythrocytes, which is a non-specific marker of alcohol misuse (3).
The molecular basis for the increase in MCV that occurs with alcohol misuse is incompletely understood. A study of 105 alcohol-dependent individuals, 62 moderate drinkers and 24 abstainers was able to show that the increase in MCV was dose-dependent (4). Alcohol may have direct haematotoxic effects by interfering with cell structure and erythrocyte stability (5). Interestingly, the levels of acetaldehyde, a metabolite of alcohol, show a significant increase inside erythrocytes of alcohol-dependent individuals (6). Since acetaldehyde is a toxic metabolite, and can bind to proteins, this may lead to erythrocyte damage directly or through an immune-mediated mechanism via the development of anti-acetaldehyde adduct antibodies (4). Folate deficiency also occurs with alcohol misuse, particularly in those patients with liver disease (7), and therefore may be implicated in increasing the MCV (8).
It is important to note, however, that MCV is a non-specific biomarker in that other factors such as age, smoking, malnutrition and underlying diseases, including hypothyroidism, liver disease and pernicious anaemia, are also known to affect MCV (7,9,10). There is also a genetic contribution to the MCV, in addition to the genetic factors that have been identified for other haematological indices (11,12). Genetic factors have also been implicated in high alcohol consumption: our recent study showed genome-wide significant effects across six loci following meta-analysis in two large independent cohorts (13). The relationship between the genetic loci that determine the MCV and those associated with excess alcohol drinking is, however, unknown.
In this study, we have used genome-wide association study (GWAS) data from the UK Biobank (UKB) to understand if genetic variants and level of alcohol consumption interact to influence MCV. The specific aims were as follows: (1) determine how genetic loci influence MCV, with and without stratification for alcohol consumption; (2) confirm the causal effect of alcohol consumption on MCV using Mendelian randomization and (3) explore the association between acetaldehyde accumulation and MCV using genotype data from alcohol metabolizing genes.

Alcohol consumption and MCV
Alcohol consumption was associated with higher MCV (P < 1.0 × 10⁻³²⁰). Increasing alcohol consumption by 5 units (40 g) per week resulted in a 0.3% increase in MCV. Variation by drinking status was evident; compared with light drinkers (reference group), zero drinkers had 0.9% lower mean values, while moderate and heavy drinkers had 1.1% and 2.8% higher mean values, respectively (all P < 1.0 × 10⁻³²⁰; Table 1). Results from multivariate analysis for MCV were consistent in terms of direction and magnitude when those classified as teetotal were removed (Supplementary Material, Table S1).

GWAS of MCV
An unstratified (i.e. irrespective of alcohol intake) GWAS in white British individuals identified 212 loci associated with MCV at P < 5 × 10⁻⁸ (Fig. 1). Presented P-values are corrected based on an LD score regression intercept of 1.20. The large sample size (n = 362 595) resulted in identification of variants with small effect sizes, equivalent to a change in MCV of 0.057% (Table 2). There was evidence to suggest that lower minor allele frequency was associated with larger effect sizes (Fig. 2). The largest effect size was observed with rs144861591 [effect allele frequency (EAF) 0.076; P = 3.4 × 10⁻⁶⁴⁰], where the minor allele (T) was associated with an increase in MCV of 1.11%. This variant is located ∼13.5 bp downstream of LOC108783645, an HFE antisense RNA. HFE itself is involved with iron regulation and has been associated with haemochromatosis (14). Strong associations were also reported in loci mapping to HBS1L-MYB, TMPRSS6, CCND3, CARMIL1, ODF3B and CCDC162P (11,12,15–18). We compared (using SNP ID and reported gene symbol) our findings to those of other equivalently sized genome-wide studies of MCV in UKB (11,19) and found that 58.0% (n = 123) of our loci were unique, likely because we studied MCV specifically with trait-specific covariate control, whereas the cited studies explored multiple traits. Investigation in the GWAS catalog (https://www.ebi.ac.uk/gwas/) found replication with a further 15 mapped genes, leaving 108 new findings (Table 2). The SNP-based heritability of MCV was estimated to be 24.2% through LD score regression.
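As a rough illustration of the intercept-based correction mentioned above, the sketch below divides a 1-degree-of-freedom chi-square statistic by the LD score regression intercept (1.20) and recomputes the P-value. This is the conventional form of such a correction and is assumed here rather than taken from the analysis code; the function names and the example z-score are hypothetical.

```python
import math

LDSC_INTERCEPT = 1.20  # value reported in the text


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def corrected_pvalue(z: float, intercept: float = LDSC_INTERCEPT) -> float:
    """Deflate the squared z-score by the intercept, then recompute P.

    Assumed form of the correction: chi2_corrected = z**2 / intercept.
    """
    return chi2_1df_pvalue(z * z / intercept)


if __name__ == "__main__":
    z = 6.0  # hypothetical association z-score, not a value from the study
    print(f"uncorrected P = {chi2_1df_pvalue(z * z):.3g}")
    print(f"corrected   P = {corrected_pvalue(z):.3g}")
```

Because the intercept is greater than 1, the corrected P-value is always larger (less significant) than the uncorrected one, so a variant close to the 5 × 10⁻⁸ threshold before correction may no longer pass it afterwards.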

GWAS of MCV stratified by alcohol intake
Analysis of the heterogeneous effects between individuals with different alcohol intakes found that no variants reached the threshold for statistical significance (P < 2.4 × 10⁻⁴). SNP rs218264 was the closest to this threshold at P = 5.2 × 10⁻⁴, although both the low and heavy drinking groups showed significant associations with this variant (Table 3). No variants reached genome-wide significance (P < 5 × 10⁻⁸) when exploring the heterogeneity of allelic effects between the different drinking groups. Specific assessment of the alcohol metabolizing pathway found no evidence of an alcohol-related association between MCV and either the ADH or ALDH SNPs (Supplementary Material, Table S2).

Allele score for alcohol metabolism pathway
The genetic score, used as a proxy for acetaldehyde accumulation rate/speed of clearance in drinkers only, was independent of confounding factors (i.e. the covariates included in the multivariable model).
The frequencies of the effect alleles contributing to the allele score were as follows: rs1229984_T = 0.021; rs698_T = 0.588 and rs2228093_T = 0.121. We found no evidence for an association between MCV and the allele score (P = 0.53). There was, however, evidence that the allele score was associated with alcohol intake (P < 2 × 10⁻¹⁶). Categorization of the allele score demonstrated that this relationship with alcohol consumption was dose-dependent (negative direction), and thus, the score can be considered valid given current knowledge of alcohol metabolism and its relationship with intake (Supplementary Material, Table S3).

Phenome-wide analysis
We performed phenome-wide association analysis (PheWAS) to detect whether the variants implicated in MCV might impact other diseases or clinically relevant phenotypes. This showed that the SNPs contribute to a range of different diseases, with 71 SNP-outcome pairs reaching P < 4.8 × 10⁻⁷ (Supplementary Material, Table S4). The most consistent outcomes were observed for ICD-10 chapter IV codes, including disorders of mineral metabolism and disorders of lipoprotein metabolism and other lipidaemias. There was also strong evidence from three SNPs for a shared risk with neoplasms of the skin. Thyroid-related disorders were also found in two SNPs (rs2134814, rs592229), with evidence for both under- and overactive thyroid diagnoses. The G allele in rs2134814 was associated with increased MCV and hypothyroidism, while the T allele in rs592229 was associated with decreased MCV and hyperthyroidism. Other outcomes included diabetes, multiple sclerosis, hypertension, varicose veins and rheumatoid arthritis.

Mendelian randomization
Mendelian randomization analysis demonstrated a significant causal effect of alcohol consumption on MCV. Each copy of the effect allele at rs1229984 in ADH1B was associated with a 0.19 decrease in drinks per week in the work by Jorgenson et al. (20) and was also found to reduce MCV by 0.18 femtoliters (fL) (SE = 0.002; P = 0.002). However, the addition of rs7686419 (KLB) returned a null outcome with evidence of effect heterogeneity, although the effect size of rs7686419 for drinks per week was approximately 6-fold smaller than rs1229984 (20).

Discussion
In the largest study undertaken to date, we have shown, as would be expected, that alcohol was clearly associated with an increase in MCV in a dose-dependent manner. However, the effects of alcohol on MCV were largely independent of genetic architecture, despite the association of MCV with genetic variation at 212 autosomal loci. Our analysis using Mendelian randomization provides evidence of a causal relationship between alcohol intake and MCV. However, we demonstrated a lack of association between alcohol metabolizing genes and MCV using a genetic score approach. Taken together, these findings support MCV as a marker of alcohol use disorder, although lack of specificity remains a substantial barrier to predictive accuracy and therefore clinical utility (3). The strengths of this study are as follows: (1) the large sample size for GWAS analyses, (2) post-GWAS analysis including fixed effect inverse-variance weighted meta-analysis to generate heterogeneity statistics, (3) the use of a mixed-model approach in GWAS to account for relatedness and maximize sample size and (4) use of allele scores to explore the functional consequences of alcohol metabolizing gene variants as a proxy for acetaldehyde accumulation. There are, however, several limitations. First, the alcohol measures were based on self-report. The accuracy of self-reported alcohol consumption has been questioned due to under-coverage compared with sales data (21). Second, we restricted our analysis to those of white British ancestry to limit the influence of population structure on the outcomes. This limits generalizability of our findings to other ethnic groups. Third, we did not undertake formal replication of findings, but our top GWAS outcomes are consistent with those reported elsewhere (11,12,15–18). Finally, we considered including folate in our models. However, folate levels were not measured in the UKB and the prevalence of folate deficiency anaemia was low (<0.002%).
The large sample size of the UKB enabled the detection of genetic variants with small effect sizes. The replication of findings in loci such as HBS1L-MYB, TMPRSS6 and CCND3, which have been identified in previous GWAS for MCV (11,12,15–18), supports the validity of our outcomes. Indeed, many of the low-frequency variants with smaller effect sizes were reported in an analysis of 36 blood cell traits (11). However, we also identified 108 new loci associated with MCV providing new biological insights. We observed associations between MCV and several loci involved in DNA modification through binding and/or processing alterations (e.g. ZNF165, TAF6, ZBTB38, ZKSCAN5, SPIDR). It is known that impaired DNA synthesis delays cell division resulting in macrocytosis (22). Of the new loci identified, rs13191659 (VN1R12P/LINC00240) has been associated with total iron binding capacity in Hispanics (23); DPP8 has been suggested as a candidate gene for mean corpuscular haemoglobin (MCH) in Europeans (24) and was identified as part of an LD block at 15q22.3 containing IGDCC4-DPP8-PTPLAD1-C15orf44-SLC24A1-DENND4A for MCH in Japanese (25); OBFC1 and MEGF11 have been associated with MCH but not MCV (26,27); PAK2 has been reported to have a role in eryptosis of erythrocytes, and therefore the effect of PAK2 on red blood cell indices might be greater than previously recognized (28); LDB1 influences erythrocyte development by the protein product acting as a cofactor for transcription factor complexes with, for example, Gata1, Tal1, E2A and Lmo2 (29). Indeed, the critical requirement for LDB1 during early-stage erythropoiesis has been demonstrated in rodent models (30). Furthermore, several of our lead SNPs were missense variants, including rs1047891 (EAF 0.684; P = 1.1 × 10⁻²⁰) (CPS1) alongside more well-described MCV-associated SNPs such as rs855791 (EAF 0.440; P = 3.0 × 10⁻⁶¹⁰) (TMPRSS6) and rs3811444 (EAF 0.667; P = 1.1 × 10⁻⁸¹) (TRIM58).
rs1047891 is in the 3′ untranslated section of CPS1, a region reported to play a key role in glycine and serum homocysteine metabolism. Allelic variation in rs1047891 has been associated with various cardiometabolic traits (31,32) and lower platelet count (33). The substitution at this SNP (T→N; p.Thr1412Asn) increases enzymatic activity and influences nitric oxide production (34), an important mediator of vascular function. MCV has been reported to be an independent predictor for cardiovascular events (35) and rs1047891 variation is therefore a potential pathway for this relationship.
Stratification of participants by drinking status did not identify any loci that determined the effect of alcohol intake on MCV. This suggests that the pathways through which alcohol influences MCV are not mediated by genetic variation. This was supported by the causal inference for alcohol on MCV levels when using rs1229984 as a proxy for alcohol consumption in the Mendelian randomization analysis. However, the discriminatory power of MCV in identifying heavy alcohol use is modest given that alcohol accounts for only ∼65% of MCV values above 100 fL (36). In addition, the turnover of erythrocytes is around 120 days meaning that recently abstinent individuals will present with evidence of alcohol consumption for several months.
Using a genetic score to define alcohol metabolism, we did not find evidence that acetaldehyde accumulation is important in determining MCV levels. This is contrary to the findings in Asians for MCV (37) and for other alcohol-related liver function measures in Europeans (38). The lack of association with MCV is likely to be due to the rarity of rs1229984 (ADH1B) in Europeans and the ubiquitous presence of active ALDH2, the enzyme primarily involved in the rapid metabolism of acetaldehyde to acetate (39). Similar results to our own for ALDH gene polymorphisms were reported in a study of 510 white alcohol-dependent patients (40).
The PheWAS analysis showed SNP-level pleiotropy for variants involved in MCV, suggesting a shared genetic risk with a number of conditions. Many of these combinations have strong physiological connections with one another (e.g. mineral metabolism disorders and liver disease). The association between MCV and thyroid dysfunction is well described, with thyroid hormones being essential for erythropoiesis (41). Indeed, we found evidence to support the relationship between hypothyroidism and increased MCV (42) alongside hyperthyroidism and decreased MCV (43). Our findings suggest that some pathways, as mediated by rs2134814 (BACH2) and rs592229 (SKIV2L), convey shared genetic architecture for MCV and thyroid dysfunction. Other findings offer additional insight in areas of ongoing investigation such as the association between psoriasis and red blood cell deformability (44).
In summary, we have demonstrated that the impact of alcohol consumption on MCV is independent of allelic variation and provided new biological insights into the genetic loci determining MCV itself. The role of acetaldehyde, although likely important in determining MCV, is difficult to measure in Europeans due to rare variation in alcohol metabolizing genes. Interindividual variability in MCV in the setting of moderate to heavy alcohol consumption is likely to be due to a complex (and at present incompletely understood) interaction between genetic factors, underlying medical conditions and lifestyle factors.

Materials and Methods
A complete description of the methods can be found in the Supplementary Material.

UKB
The UKB is a large population cohort of ∼502 000 individuals from the United Kingdom aged 40-69 years at recruitment. Only white British participants were included in this study. Ethical approval for the UKB was gained from the Research Ethics Service (reference: 17/NW/0274), and written informed consent was obtained from all participants. Analyses were conducted under approved application 15110.

Alcohol consumption
Questions from the UKB baseline assessment were used to estimate alcohol consumption. We applied a standardized number of UK alcohol units to each drink to enable estimation of the number of units per week, as described previously (13).

MCV measurement
Components of full blood counts were measured in UKB participants using clinical haematology analysers at the centralized processing laboratory of the UK Biocentre (Stockport, UK). Full information on the protocol can be found elsewhere (45).

Multivariable analyses for predictors of MCV
MCV was natural log-transformed to normalize the distribution of residuals. Multivariable linear regression was applied to identify predictors of MCV. Analyses examined alcohol consumption as both a continuous and categorical predictor of MCV. All multivariable analyses were adjusted for age, sex, smoking status, history of hypothyroidism and vitamin B12 deficiency, and individuals with liver disease were removed due to the interaction between alcoholic liver disease risk and macrocytosis (7). Models were rerun with those reporting zero alcohol consumption removed.
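Because MCV was modelled on the natural log scale, a regression coefficient translates into a percentage change in MCV rather than an absolute change. The sketch below shows that conversion in both directions; the function names are hypothetical and the arithmetic is the standard back-transformation for a log-linear model, assumed here rather than taken from the analysis code.

```python
import math


def percent_change(beta_per_unit: float, delta_units: float) -> float:
    """Percent change in MCV for a `delta_units` change in a predictor,
    given a coefficient estimated on the ln(MCV) scale."""
    return 100.0 * (math.exp(beta_per_unit * delta_units) - 1.0)


def beta_from_percent(pct: float, delta_units: float) -> float:
    """Inverse conversion: ln-scale coefficient per unit implied by a
    percent change over `delta_units`."""
    return math.log(1.0 + pct / 100.0) / delta_units


if __name__ == "__main__":
    # The 0.3% increase per 5 units/week comes from the Results; the implied
    # per-unit coefficient is derived here, not reported in the paper.
    beta = beta_from_percent(0.3, 5.0)
    print(f"implied ln-scale coefficient per unit: {beta:.6f}")
    print(f"back-converted change per 5 units: {percent_change(beta, 5.0):.2f}%")
```

For effects this small, the percent change is close to 100 × beta × delta_units, which is why log-scale coefficients are often read directly as proportional changes.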

Genetic analyses
In July 2017, UKB released genetic information (directly typed and imputed genotypes) for 487 406 individuals to approved collaborators. Genotyping, quality control and imputation were performed centrally by UKB and have been described previously (46).

GWAS analysis.
Autosomal genetic association analysis was conducted for ln(MCV) using a linear mixed model in BOLT-LMM v2.3.4 (47), adjusted for genotyping array and covariates outlined in multivariable analyses plus alcohol consumption in units/week as a continuous variable. Distance-based clumping was used for defining loci. Genomic control adjustments were applied for standard errors and P-values.
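Distance-based clumping can be sketched as a greedy procedure: take the most significant variant as a lead SNP, discard everything within a fixed distance of it on the same chromosome, and repeat. The window size is not stated in this section, so the 500 kb default below is an assumption, as are the `Variant` type, the function name and the example variants.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: int
    pos: int    # base-pair position
    p: float    # association P-value


def distance_clump(variants, p_threshold=5e-8, window_bp=500_000):
    """Greedy distance-based clumping; returns lead variants, strongest first.

    window_bp is an assumed value, not one taken from the paper.
    """
    hits = sorted((v for v in variants if v.p < p_threshold), key=lambda v: v.p)
    leads = []
    for v in hits:
        # Keep v only if no stronger lead lies within the window
        # on the same chromosome.
        if all(v.chrom != lead.chrom or abs(v.pos - lead.pos) > window_bp
               for lead in leads):
            leads.append(v)
    return leads


if __name__ == "__main__":
    # Hypothetical variants for illustration only.
    example = [
        Variant("snpA", 6, 26_000_000, 1e-50),
        Variant("snpB", 6, 26_200_000, 1e-20),   # within 500 kb of snpA
        Variant("snpC", 6, 135_400_000, 1e-30),
        Variant("snpD", 22, 37_400_000, 1e-9),
        Variant("snpE", 1, 1_000_000, 1e-3),     # not genome-wide significant
    ]
    print([v.rsid for v in distance_clump(example)])
```

Each surviving lead variant then represents one locus, which is how a count such as the 212 loci reported in the Results would be obtained under this kind of scheme.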

Heterogeneity of allelic effects by drinking group.
Variants reaching P < 5 × 10⁻⁸ and surviving distance-based clumping (i.e. lead SNPs) were explored for heterogeneous outcomes based on drinking category. GWAMA was used to run a fixed effect inverse-variance weighted meta-analysis on outcomes and generate heterogeneity statistics for allelic effects between groups, which is equivalent to fitting an interaction term (48). Any variant reaching the Bonferroni-corrected threshold (P < 0.05/'number of lead SNPs from unstratified GWAS') was considered statistically significant.
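The heterogeneity test described above can be illustrated with Cochran's Q, the usual heterogeneity statistic for a fixed effect inverse-variance weighted meta-analysis. The sketch below is not GWAMA's implementation; the function names and the example effect estimates are hypothetical. The Bonferroni threshold uses the 212 lead SNPs reported in the Results.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    while a < df / 2.0:
        # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def fixed_effect_heterogeneity(betas, ses):
    """Pooled inverse-variance weighted effect, Cochran's Q and its P-value."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, q, chi2_sf(q, len(betas) - 1)


N_LEAD_SNPS = 212                   # lead SNPs from the unstratified GWAS
BONFERRONI = 0.05 / N_LEAD_SNPS     # about 2.4e-4, as used in the Results

if __name__ == "__main__":
    # Hypothetical per-group effects of one SNP on ln(MCV).
    betas = [0.0040, 0.0043, 0.0052]
    ses = [0.0003, 0.0004, 0.0005]
    pooled, q, p_het = fixed_effect_heterogeneity(betas, ses)
    print(f"pooled = {pooled:.5f}, Q = {q:.2f}, P_het = {p_het:.3g}")
    print("heterogeneous" if p_het < BONFERRONI else "no evidence of heterogeneity")
```

A small heterogeneity P-value would indicate that the allelic effect differs between drinking groups, which is the sense in which the test stands in for a SNP-by-alcohol interaction term.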

Impact of genetic score for acetaldehyde on MCV.
To test the assumption that acetaldehyde is important in MCV, we used genotype data for SNPs in ADH1B, ADH1C and ALDH1B to construct a genetic score. The SNPs rs1229984 (ADH1B), rs698 (ADH1C) and rs2228093 (ALDH1B) were used to generate an unweighted allele score based on number of ADH alleles increasing the metabolism of ethanol to acetaldehyde and the number of ALDH alleles slowing the metabolism of acetaldehyde to acetate. This score (0-6) was used as a continuous predictor alongside covariates previously outlined in multivariable analyses. The selected variants were independent (r² < 0.01 for all SNP pairs).
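An unweighted allele score of this kind is simply the count of effect alleles across the three SNPs, giving a value between 0 and 6. The sketch below assumes the effect allele is T at each SNP, as listed with the allele frequencies in the Results; the genotype encoding (two-letter strings), the alternative allele shown in the example and the function name are illustrative assumptions.

```python
# Effect alleles as listed in the Results (rs1229984_T, rs698_T, rs2228093_T).
EFFECT_ALLELES = {
    "rs1229984": "T",  # ADH1B
    "rs698": "T",      # ADH1C
    "rs2228093": "T",  # ALDH1B
}


def allele_score(genotypes: dict) -> int:
    """Unweighted allele score: number of effect alleles carried (0 to 6).

    `genotypes` maps each rsID to a two-letter genotype such as "CT"
    (an assumed encoding for this sketch).
    """
    score = 0
    for rsid, effect_allele in EFFECT_ALLELES.items():
        genotype = genotypes[rsid]
        if len(genotype) != 2:
            raise ValueError(f"expected a diploid genotype for {rsid}")
        score += genotype.count(effect_allele)
    return score


if __name__ == "__main__":
    # Hypothetical participant for illustration only.
    person = {"rs1229984": "CT", "rs698": "TT", "rs2228093": "CC"}
    print(allele_score(person))
```

Because the score is unweighted, each effect allele contributes equally regardless of how strongly its SNP affects enzyme activity; a weighted score would instead multiply each count by a per-SNP effect estimate.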
Mendelian randomization.
MR-Base v0.4.21 was used to perform Mendelian randomization to explore the causal relationship between alcohol consumption and MCV (51). The causal estimates between exposure and outcome were obtained using the two-sample Mendelian randomization inverse variance-weighted method.
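The inverse variance-weighted estimate combines per-SNP Wald ratios (SNP-outcome effect divided by SNP-exposure effect), weighting each by the precision of its outcome association. The sketch below uses the common first-order weights, which ignore uncertainty in the exposure estimates; it is not MR-Base's implementation, and the function names and the standard error in the example are hypothetical. The two effect estimates for rs1229984 are those quoted in the Results.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal estimate from a single instrument."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed effect inverse variance-weighted estimate and standard error.

    Uses first-order weights bx**2 / se_y**2 (an assumption of this sketch).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


if __name__ == "__main__":
    # rs1229984: -0.19 drinks/week (exposure) and -0.18 fL (outcome),
    # as quoted in the Results.
    print(f"Wald ratio: {wald_ratio(-0.19, -0.18):.3f} fL per drink per week")
    # The outcome standard error below is a placeholder for illustration.
    est, se = ivw_estimate([-0.19], [-0.18], [0.05])
    print(f"IVW with one instrument: {est:.3f} (SE {se:.3f})")
```

With a single instrument the inverse variance-weighted estimate reduces to the Wald ratio, here about 0.95 fL per additional drink per week. Adding a second instrument whose ratio differs markedly pulls the pooled estimate away from this value and can produce the kind of effect heterogeneity noted for rs7686419.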
Results are reported using STROBE guidelines. A checklist can be found in the Supplementary Material.