Molecular genetic contributions to self-rated health

Abstract
Background: Poorer self-rated health (SRH) predicts worse health outcomes, even when adjusted for objective measures of disease at time of rating. Twin studies indicate SRH has a heritability of up to 60% and that its genetic architecture may overlap with that of personality and cognition.
Methods: We carried out a genome-wide association study (GWAS) of SRH on 111 749 members of the UK Biobank sample. Univariate genome-wide complex trait analysis (GCTA)-GREML analyses were used to estimate the proportion of variance explained by all common autosomal single nucleotide polymorphisms (SNPs) for SRH. Linkage disequilibrium (LD) score regression and polygenic risk scoring, two complementary methods, were used to investigate pleiotropy between SRH in the UK Biobank and up to 21 health-related and personality and cognitive traits from published GWAS consortia.
Results: The GWAS identified 13 independent signals associated with SRH, including several in regions previously associated with diseases or disease-related traits. The strongest signal was on chromosome 2 (rs2360675, P = 1.77 × 10^-10) close to KLF7. A second strong peak was identified on chromosome 6 in the major histocompatibility region (rs76380179, P = 6.15 × 10^-10). The proportion of variance in SRH that was explained by all common genetic variants was 13%. Polygenic scores for the following traits and disorders were associated with SRH: cognitive ability, education, neuroticism, body mass index (BMI), longevity, attention-deficit hyperactivity disorder (ADHD), major depressive disorder, schizophrenia, lung function, blood pressure, coronary artery disease, large vessel disease stroke and type 2 diabetes.
Conclusions: Individual differences in how people respond to a single item on SRH are partly explained by their genetic propensity to many common psychiatric and physical disorders and psychological traits.

The CHARGE Aging and Longevity working group analysis of the longevity phenotype was funded through the individual contributing studies. The working group thanks all study participants and study staff.

CHARGE-Cognitive working group
General cognitive function data were obtained from the CHARGE Cognitive working group.

CHIC
Childhood cognitive ability data were obtained from the CHIC consortium.

DIAGRAM
Type 2 diabetes data were obtained from the DIAGRAM consortium.

Genetic Consortium for Anorexia nervosa
Anorexia nervosa data were obtained from the Genetic Consortium for Anorexia nervosa.

GIANT
BMI data were obtained from the GIANT consortium.

International Consortium for Blood Pressure (ICBP)
Blood pressure data were provided by ICBP.

International Genomics of Alzheimer's Project (IGAP)
Alzheimer's disease data were obtained from IGAP.

Material and methods
International Genomics of Alzheimer's Project (IGAP) is a large two-stage study based upon genome-wide association studies (GWAS) on individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7 055 881 single nucleotide polymorphisms (SNPs) to meta-analyse four previously published GWAS datasets consisting of 17 008 Alzheimer's disease cases and 37 154 controls (the European Alzheimer's disease Initiative, EADI; the Alzheimer Disease Genetics Consortium, ADGC; the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, CHARGE; and the Genetic and Environmental Risk in AD consortium, GERAD). In stage 2, 11 632 SNPs were genotyped and tested for association in an independent set of 8572 Alzheimer's disease cases and 11 312 controls. Finally, a meta-analysis was performed combining results from stages 1 and 2. Only stage 1 data were used for LD score regression and polygenic risk score analyses.

Acknowledgments
We thank the International Genomics of Alzheimer's Project (IGAP) for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the control subjects, the patients, and their families. The i-Select chips were funded by the French National Foundation on Alzheimer's disease and related disorders. EADI was supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University Hospital. GERAD was supported by the Medical Research Council (Grant n° 503480), Alzheimer's Research UK (Grant n° 503176), the Wellcome Trust (Grant n° 082604/2/07/Z) and German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant n° 01GI0102, 01GI0711, 01GI0420. CHARGE was partly supported by the NIH/NIA grant R01 AG033193 and the NIA AG081220 and AGES contract N01-AG-12100, the NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by the NIH/NIA grants: U01 AG032984, U24 AG021886, U01 AG016976, and the Alzheimer's Association grant ADGC-10-196728.

METASTROKE
Ischaemic stroke (IS) data were obtained from the METASTROKE consortium. The METASTROKE consortium is supported by NINDS (NS017950). We thank all study participants, volunteers, and study personnel who made this consortium possible. The METASTROKE study consists of combined data from 15 GWAS of IS (12 389 cases vs 62 004 controls). We used TOAST criteria [17] to classify IS as large artery stroke (LAS) (2167 cases/49 159 controls from 11 studies), cardioembolic stroke (CE) (2365 cases/56 140 controls from 13 studies), and small vessel disease (SVD) (1894 cases/51 976 controls from 12 studies). METASTROKE studies consisted of independently performed genome-wide single nucleotide polymorphism (SNP) genotyping using standard technologies and imputation to HapMap release 21 or 22 CEU phased genotype [18] or 1000 Genomes reference panels. Investigators contributed summary statistical data from association analyses using frequentist additive models for meta-analysis after application of appropriate quality control measures. Polygenic scores reveal the combined effects of multiple non-significant variants; the variants are selected in a derivation sample and the resulting score is tested in an independent replication sample. We derived polygenic scores for multiple p value cutoffs (0.5, 0.25, 0.1, 0.05, 0.01, 0.001, and 0.0001) in derivation samples.
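
As an illustration of scoring at several p value cutoffs, the following is a minimal Python sketch and not the consortium's pipeline. It assumes a score is the sum of risk-allele counts weighted by the derivation-sample effect size, and that a SNP is included when its derivation p value is strictly below the cutoff. The function name, the data layout and the example values are hypothetical, and steps such as LD pruning are omitted.

```python
# Cutoffs taken from the text above.
P_CUTOFFS = (0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001)


def polygenic_scores(derivation, dosages, cutoffs=P_CUTOFFS):
    """Return {cutoff: score} for one individual.

    derivation: {snp_id: (effect_size, p_value)} from the derivation sample.
    dosages:    {snp_id: risk-allele count (0, 1 or 2)} for the individual.
    Both layouts are assumptions made for this sketch.
    """
    scores = {}
    for cutoff in cutoffs:
        total = 0.0
        for snp_id, (effect, p_value) in derivation.items():
            # Keep only SNPs passing the cutoff and genotyped in this person.
            if p_value < cutoff and snp_id in dosages:
                total += effect * dosages[snp_id]
        scores[cutoff] = total
    return scores


if __name__ == "__main__":
    # Made-up values, for illustration only.
    derivation = {"rsA": (0.2, 0.00005), "rsB": (-0.1, 0.03), "rsC": (0.05, 0.4)}
    dosages = {"rsA": 2, "rsB": 1, "rsC": 0}
    for cutoff, score in polygenic_scores(derivation, dosages).items():
        print(cutoff, round(score, 3))
```

Stricter cutoffs retain fewer SNPs, so comparing the scores across cutoffs shows how much of the signal is carried by variants that individually fall short of significance.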

Psychiatric Genetics Consortium
Schizophrenia, bipolar disorder, major depressive disorder and ADHD data were obtained from the Psychiatric Genetics Consortium.

Social Science Genetic Association Consortium
Years of education data were obtained from the Social Science Genetic Association Consortium.

SpiroMeta/CHARGE Pulmonary
Lung function data were obtained from the SpiroMeta and CHARGE Pulmonary group.

The Genetics of Personality Consortium
Neuroticism data were obtained from the Genetics of Personality consortium.

Polygenic profiling procedure
The genetic data files (.map and .ped files) supplied from Biota (the UK Biobank online repository) were unsuitable for use in the polygenic profile analyses because the .ped files coded alleles numerically (1, 2) rather than in the standard ACGT format. To enable the analysis, the .ped files were recoded to the standard format. To achieve this, a bespoke programme was developed to create new files using a lookup-substitution method. A fast in-memory string hash table was created to hold each SNP-ID along with the allele identifiers for that SNP. A simple loop then performed serialised lookups based on string position to create an associated string with the correct ACGT encoding. This string was then appended to the six mandatory data fields extracted from the initial string. To maximise performance and enable timely completion of the lookup-substitution, these loops were run in parallel threads in a standard multiprocessor environment.
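
The lookup-substitution step can be sketched in Python as follows. This is an illustrative reconstruction, not the bespoke programme itself. It assumes that numeric code 1 maps to the first listed allele and 2 to the second, that 0 marks a missing genotype and is passed through, and that the SNP order comes from the .map file. All function and variable names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

N_MANDATORY = 6  # family ID, individual ID, father ID, mother ID, sex, phenotype
MISSING = "0"    # assumed missing-genotype code, passed through unchanged


def recode_line(line, snp_order, allele_table):
    """Recode one .ped line from numeric (1/2) to letter (ACGT) alleles."""
    fields = line.split()
    head, genotypes = fields[:N_MANDATORY], fields[N_MANDATORY:]
    if len(genotypes) != 2 * len(snp_order):
        raise ValueError("allele count does not match the SNP list")
    recoded = []
    for pos, code in enumerate(genotypes):
        snp_id = snp_order[pos // 2]          # two allele columns per SNP
        first, second = allele_table[snp_id]  # hash-table lookup by SNP-ID
        recoded.append({"1": first, "2": second, MISSING: MISSING}[code])
    # Append the recoded alleles to the six mandatory fields.
    return " ".join(head + recoded)


def recode_ped(lines, snp_order, allele_table, workers=4):
    """Recode many .ped lines in parallel threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda ln: recode_line(ln, snp_order, allele_table), lines))


if __name__ == "__main__":
    # Made-up SNPs and individuals, for illustration only.
    table = {"rs1": ("A", "G"), "rs2": ("C", "T")}
    order = ["rs1", "rs2"]
    ped = ["F1 I1 0 0 1 -9 1 2 2 2", "F2 I2 0 0 2 -9 0 0 1 1"]
    for row in recode_ped(ped, order, table):
        print(row)
```

In CPython, threads do not speed up CPU-bound loops such as this one, so a process pool would be the closer analogue of the parallel threads described above. Threads are used here only to mirror the description.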

Sensitivity analysis
To test whether FDR-significant associations between polygenic risk for coronary artery disease and both verbal-numerical reasoning and educational attainment were confounded by individuals diagnosed with cardiovascular disease, 2779 individuals who had had a heart attack and 2521 individuals with angina were removed from the regression analysis. Similarly, 5800 individuals with diabetes (type 1 or type 2) were removed from the regression investigating an association between polygenic risk for type 2 diabetes and educational attainment. Finally, 26 912 individuals with hypertension were removed from the regression analysis investigating the association between polygenic risk for systolic blood pressure and educational attainment.

Supplementary Figure 1. Regional plots showing the conditional analyses of the genome-wide significant regions with evidence of multiple signals. Analyses were conditioned on the most significant SNP in each region. The circles represent individual SNPs, with the colour indicating pairwise linkage disequilibrium (LD) to the most significant SNP from the original analysis (calculated from 1000 Genomes Nov 2014 EUR). The solid blue line indicates the recombination rate and -log10 P-values are shown on the y-axis.

Figure 2. Enrichment analysis for general cognitive function using the 52 functional categories in the baseline model. The enrichment statistic is the proportion of heritability found in each functional group divided by the proportion of SNPs in each group (Pr(h^2)/Pr(SNPs)). The dashed line indicates no enrichment, which occurs when Pr(h^2)/Pr(SNPs) = 1. Statistical significance is indicated by an asterisk. FDR correction indicates significance at 0.00638.

Figure 3. Enrichment analysis for general cognitive function using the 10 cell-type specific categories in the baseline model. The enrichment statistic is the proportion of heritability found in each functional group divided by the proportion of SNPs in each group (Pr(h^2)/Pr(SNPs)). The dashed line indicates no enrichment, which occurs when Pr(h^2)/Pr(SNPs) = 1. Statistical significance is indicated by an asterisk. FDR correction indicates significance at 1.68 × 10^-7.
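
The enrichment statistic defined in the captions above can be written out as a small worked example. This is a minimal sketch: the function name, argument names and the numbers are hypothetical and are not taken from the analyses reported here.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Pr(h^2)/Pr(SNPs): share of heritability divided by share of SNPs."""
    pr_h2 = h2_category / h2_total        # proportion of heritability in the category
    pr_snps = snps_category / snps_total  # proportion of SNPs in the category
    return pr_h2 / pr_snps


if __name__ == "__main__":
    # Made-up values: a category holding a quarter of the SNPs but half of
    # the heritability is enriched; a value of 1 would mean no enrichment.
    print(enrichment(h2_category=0.25, h2_total=0.5,
                     snps_category=250_000, snps_total=1_000_000))
```

Values above 1 indicate that a category carries more heritability than its share of SNPs would suggest, and values below 1 indicate depletion.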
