UK Biobank: opportunities for cardiovascular research

Figure 1. Absolute risk of ischaemic heart disease mortality by usual systolic blood pressure and age at risk in 5000, 50 000, and 500 000 participants. Unpublished figure containing data from the Prospective Studies Collaboration, obtained through personal communication. CI, confidence interval; IHD, ischaemic heart disease.


Introduction
Cardiovascular diseases are a major cause of morbidity and mortality, accounting for 45% of all deaths in European countries in 2016 [1] and almost a third of deaths worldwide in 2013 [2]. A similar pattern is observed in the UK, where cardiovascular diseases were responsible for 27% of deaths in 2014, with coronary heart disease resulting in the largest number of deaths attributable to a single cause (n = ∼69 000) whilst stroke is the third biggest cause (n = ∼39 000) [3]. Although age-standardized cardiovascular disease mortality rates are decreasing worldwide, the total number of deaths and the burden of cardiovascular diseases, as measured through disability-adjusted life years, are increasing [4,5]. Furthermore, in the UK, cardiovascular risk factors such as high blood pressure and high cholesterol are among the leading causes of disease burden [6].

Epidemiological studies have historically played an essential role in identifying the causes and consequences of cardiovascular disease and have resulted in improvements in prevention and treatment. The seminal US-based Framingham Heart Study, which recruited 5200 participants between 1948 and 1952, was integral in identifying a range of important risk factors for cardiovascular disease, such as high blood pressure, a high cholesterol level, cigarette smoking, obesity and physical inactivity, and consequently shifted the focus from management to preventative strategies for cardiovascular disease [7]. This, together with findings from other epidemiological studies, such as the Seven Countries Study and the MONICA project [8], has been influential in leading to treatments for the primary and secondary prevention of cardiovascular events, most notably statins (which act to lower cholesterol levels) and antihypertensives [9,10].

Epidemiological studies with moderate sample sizes, such as the Framingham Heart Study, are useful in detecting risk factors with large effects on common outcomes; however, they lack the statistical power to reliably identify risk factors that have small to moderate effects or to assess associations with disease across subgroups of the population. The need for large sample sizes has led to collaborative efforts, such as the Prospective Studies Collaboration (an individual participant meta-analysis of data from 61 studies and more than a million participants [11]), which has demonstrated conclusively that a continuous increase in blood pressure corresponds with an increased risk of vascular death across all age groups (see Figure 1, which illustrates the importance of a large sample size (about 500 000 participants) for detecting this association) [12]. Sample size is also of particular importance in the current era of genome-wide association studies, where many investigations are aiming to detect either small effects from common variants or large effects from rare variants [13].

The causes of cardiovascular disease involve a complex interplay between predisposing genetic factors and lifestyle, environmental, and health-related exposures. Furthermore, cardiovascular risk factors are likely involved in the development of non-cardiovascular diseases, such as Alzheimer's disease [14]. Large prospective studies that collect an extensive range of exposures before the subsequent development of disease are essential in order to gain novel insights into the causes and consequences of cardiovascular (and non-cardiovascular) diseases. Although there are many cohort studies with large sample sizes and biological samples, their data collection is generally less comprehensive (see Table 1 for an overview of major studies).

In order to address this, UK Biobank was established as a prospective cohort study that combines a large sample size with a very wide range of data on exposures and outcomes in order to improve the prevention, diagnosis and treatment of diseases of middle and old age, including cardiovascular diseases such as heart disease and stroke [15][16][17].
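The sample-size argument made earlier in this section can be illustrated with a rough power calculation. The following is a minimal sketch, not an analysis from the article: it assumes a simple two-sided z-test comparing event risks between two equal-sized groups, and the baseline risk (2%) and relative risk (1.1) are hypothetical values chosen only to represent a "small to moderate" effect.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n_total, p_unexposed, relative_risk, alpha=0.05):
    """Approximate power of a two-sided z-test comparing event risks
    between two equal-sized groups (normal approximation)."""
    n_per_group = n_total / 2
    p0 = p_unexposed
    p1 = p_unexposed * relative_risk
    # Standard error of the difference in proportions.
    se = sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p0) / se
    # The contribution of the opposite tail is negligible and ignored here.
    return NormalDist().cdf(z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical inputs: 2% risk in the unexposed, relative risk of 1.1.
    for n in (5_000, 50_000, 500_000):
        print(n, round(power_two_proportions(n, 0.02, 1.1), 3))
```

Under these assumed inputs, power is low at 5000 participants and approaches 1 at 500 000, which mirrors the qualitative point made by Figure 1.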

UK Biobank
Between 2006 and 2010, half a million participants aged 40-69 years who lived within 25 miles of one of the 22 assessment centres located throughout England, Wales and Scotland were recruited into UK Biobank. At the assessment centres, participants provided electronic signed consent, answered touchscreen and verbal interview questions on sociodemographic, lifestyle, environmental, and health-related factors, completed a range of physical measures and provided blood, urine, and saliva samples (see Table 2 for further detail on measures collected at baseline). Once recruitment was fully underway, further enhancements were introduced to the assessment visit, with large subsets of the cohort undergoing a range of eye measures, heel bone ultrasound, an electrocardiograph test, pulse wave velocity, and a hearing test. A large amount of the data collected at baseline has direct relevance to cardiovascular disease and health, including, but not limited to, self-reported information on medications and health conditions, family history of cardiovascular disease, measures of arterial stiffness, blood pressure, cardiorespiratory fitness, body size, and body fat.
The cohort is not representative of the general population (e.g. participants are more likely to live in less socio-economically deprived areas and have lower death rates than the general population), so it is unsuitable for estimating disease prevalence and incidence rates. However, it is well-designed to reliably detect generalizable associations between most baseline characteristics and health outcomes due to the sufficiently large numbers of participants across the full distribution of exposures. For example, whilst the proportion of current smokers in UK Biobank is low compared with the general population, there are sufficiently large numbers of smokers to detect the association with various diseases.

Enhancements to data collection
The UK Biobank resource is continuously being enhanced through additional phenotyping (see Table 3 for further detail on ongoing data collection). The samples from all 500 000 participants are currently being assayed for a range of selected biomarkers, many of which have been implicated in the development of cardiovascular disease (e.g. cholesterol, direct low-density lipoprotein, high-density lipoprotein, lipoprotein (a), triglyceride, apolipoprotein A, apolipoprotein B, and C-reactive protein).
Genotyping of 820 000 single nucleotide polymorphisms and insertion-deletion markers has been performed using a bespoke genome-wide array, designed collaboratively by a group of leading academics and Affymetrix, with centralized quality control and imputation to >70 million variants. The genotyping array includes thousands of markers that are involved in cardiometabolic processes and blood pressure regulation, rare variants associated with cardiac disease and those involved in the absorption, distribution, metabolism, and excretion of drugs (Figure 2) [18]. These data will enable researchers to explore the genetic determinants of cardiovascular disease, conduct Mendelian randomization analyses to identify potentially causal effects and investigate gene-environment interactions.
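To make the Mendelian randomization idea mentioned above more concrete, the simplest textbook estimator is the Wald ratio: the variant-outcome association divided by the variant-exposure association. The sketch below is a generic illustration under the usual instrumental-variable assumptions, not a method described in the article; the function name and the numerical inputs are hypothetical.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Wald ratio estimate of the effect of an exposure on an outcome,
    using a single genetic variant as the instrument.

    The standard error uses the first-order approximation, which ignores
    uncertainty in the variant-exposure association.
    """
    if beta_variant_exposure == 0:
        raise ValueError("The variant must be associated with the exposure.")
    estimate = beta_variant_outcome / beta_variant_exposure
    standard_error = se_variant_outcome / abs(beta_variant_exposure)
    return estimate, standard_error


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    estimate, se = wald_ratio(0.5, 0.1, 0.02)
    print(f"causal estimate = {estimate:.3f}, SE = {se:.3f}")
```

In practice many variants are usually combined, and the validity of the estimate depends on assumptions (for example, no pleiotropic pathways) that this sketch does not check.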
Subsets of the cohort are invited to have a repeat assessment every few years (the first of which was performed during 2012-13 on 20 000 participants) to allow for correction for regression dilution bias caused by measurement error or intra-individual changes in exposures and biomarkers [19].
UK Biobank is undertaking multimodal imaging in 100 000 participants [20]. Imaging measures relevant for cardiovascular research include cardiac magnetic resonance imaging (MRI), which measures the left and right ventricles and atrium, aorta and aortic valve (see Figure 3 for examples of cardiac MRI images performed in UK Biobank) [21], an ultrasound of the carotid arteries [22] and a resting 12-lead electrocardiogram (ECG) [23]. The other imaging modalities include an MRI of the brain and body and a whole body dual-energy X-ray absorptiometry (DXA) scan of the bones and joints. These modalities also capture information of relevance to cardiovascular health, e.g. white matter lesions on T2-weighted brain MRI scans as well as fat distribution from liver and pancreatic MRI scans. Analytical pipelines are being set up so that derived phenotypes, such as detailed measures of brain structure and function, body fat distribution and cardiac function, can be made available to researchers. The imaging assessment began in 2014 with a pilot study of 6000 participants and is now being expanded to three assessment centres over the next few years. Imaging data will be released in tranches every 6-12 months as they become available so that researchers can continuously refresh their analyses throughout the course of their project without having to wait until the end of the imaging study (2020 or later).
Data are also being collected from web-based questionnaires that focus on the collection of more detailed information on exposure (such as dietary habits and occupational history) and of outcomes that are difficult to ascertain through electronic health records (such as cognitive function, mental health, irritable bowel syndrome, pain, and quality of life).
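The correction for regression dilution bias that the repeat assessments make possible can be sketched as follows. This is a simplified illustration of one common approach (estimating a regression dilution ratio as the slope of the repeat measurement on the baseline measurement, then dividing the observed association by it); it is an assumption that this form would be used, and the function names and all values are hypothetical.

```python
from statistics import mean


def regression_dilution_ratio(baseline, repeat):
    """Slope of the repeat measurement regressed on the baseline measurement,
    estimated in participants who have both."""
    mean_b, mean_r = mean(baseline), mean(repeat)
    covariance = sum((b - mean_b) * (r - mean_r) for b, r in zip(baseline, repeat))
    variance = sum((b - mean_b) ** 2 for b in baseline)
    return covariance / variance


def correct_for_dilution(observed_beta, ratio):
    """Scale an association estimated from baseline values so that it
    refers to the 'usual' (long-term average) level of the exposure."""
    return observed_beta / ratio


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg) at two visits.
    baseline_sbp = [120, 130, 140, 150, 160]
    repeat_sbp = [125, 130, 140, 145, 160]
    ratio = regression_dilution_ratio(baseline_sbp, repeat_sbp)
    print(ratio, correct_for_dilution(0.17, ratio))
```

A ratio below 1 means that associations based on a single baseline measurement understate the association with usual levels, so the corrected coefficient is larger in magnitude.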

Capturing health outcomes through data linkage
A major advantage of UK Biobank is that all participants at recruitment were registered with a general practitioner in the National Health Service (NHS) and consented to linkage to their health-related records. The NHS provides nationwide healthcare in the UK and keeps detailed records of health-related information. As a result, UK Biobank can follow up all participants' health outcomes through linkage to a range of national datasets. Currently, data from national death and cancer registries and hospital inpatient records are available whilst efforts are underway to integrate data from primary care, screening programmes, and disease-specific registries (see Table 4 for more detail on health records).
Linking to a wide range of health record datasets will provide a rich amount of data. However, it also introduces the daunting challenge of harmonizing information across multiple sources to produce reliable and valid outcomes. In order to ensure effective use of the resource, UK Biobank aims to develop scalable approaches not only for the ascertainment of the many thousands of different health-related outcomes that will occur during prolonged follow-up (as well as those that occurred prior to recruitment) but also for their sub-classification and detailed characterization. This will be achieved through a staged approach of initially ascertaining cases using linked data, then increasing the accuracy of diagnoses by cross-referencing multiple sources of information, starting with the ascertainment of health outcomes using lower-cost linked health-related data sources [24,25], and proceeding to more intensive methods (e.g. retrieval of imaging or laboratory data) for validation and sub-classification.
Expert-led health outcome subgroups are guiding the development and testing of approaches for a wide range of conditions (including cardiac diseases, stroke, diabetes, musculoskeletal, neurodegenerative, and mental health disorders, renal and eye diseases). These adjudicated outcomes will provide an invaluable resource for researchers interested in investigating cardiovascular disease, or non-cardiovascular diseases in the context of cardiovascular research, but who do not have the time or expertise to derive these outcomes themselves. Alternatively, researchers who are interested in developing their own algorithms to enhance the classification of diseases or extract useful information from the medical records have access to an increasing diversity of linked data.
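The first stage of the approach described above, ascertaining cases by combining several linked sources, might look roughly like the sketch below. This is not UK Biobank's adjudicated algorithm: the record structure, the source labels, and the choice of ICD-10 prefixes I21 and I22 to flag myocardial infarction are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative ICD-10 prefixes for myocardial infarction (an assumption,
# not the definition used by UK Biobank).
MI_ICD10_PREFIXES = ("I21", "I22")


@dataclass
class Record:
    source: str       # "self_report", "hospital", or "death"
    code: str         # ICD-10 code, or the label "MI" for self-report
    event_date: date


def is_mi(record):
    """Return True if the record indicates a myocardial infarction."""
    if record.source == "self_report":
        return record.code == "MI"
    return record.code.startswith(MI_ICD10_PREFIXES)


def ascertain_mi(records, baseline):
    """Combine all sources for one participant.

    Returns (status, earliest_date, sources), where status is 'prevalent'
    if the earliest event is on or before baseline and 'incident'
    otherwise, or None if no source indicates an event.
    """
    hits = [r for r in records if is_mi(r)]
    if not hits:
        return None
    first = min(hits, key=lambda r: r.event_date)
    status = "prevalent" if first.event_date <= baseline else "incident"
    sources = sorted({r.source for r in hits})
    return status, first.event_date, sources


if __name__ == "__main__":
    records = [
        Record("hospital", "I21.9", date(2012, 5, 1)),
        Record("death", "I21.9", date(2012, 5, 3)),
        Record("hospital", "J18.9", date(2011, 1, 10)),
    ]
    print(ascertain_mi(records, baseline=date(2008, 1, 1)))
```

Recording which sources agree on an event is one simple way to support the later cross-referencing and validation stages.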

UK Biobank and implications for cardiovascular health and disease research
Observational studies tend to focus on either collecting a diverse range of measures from a small number of participants or less detailed information on a large number of participants. In contrast, UK Biobank combines a large sample size with extensive phenotypic and genotypic data as well as ongoing follow-up of participants' health through linkage to electronic medical records. The obvious advantage of conducting research on a single large well-phenotyped population is that the assays and measurements have been performed in a standardized way, as opposed to integrating data from multiple studies with widely varying participant characteristics and different methodologies. The unprecedented depth and breadth of data available in UK Biobank offer an unparalleled opportunity to address a wide range of research questions related to cardiovascular health outcomes. Classification and sub-classification of diseases can be enhanced through combining the diverse phenotypic and genotypic data with medical records. The large sample size enables researchers to perform risk stratification on well-defined phenotypes to focus on high- and low-risk populations for cardiovascular disease, e.g. those with the lowest and highest circulating lipid levels. Additionally, mechanistic pathways between risk factors and outcomes can be explored using the genetic, biomarker and imaging data. UK Biobank is already the largest-ever multimodal imaging study; previous studies that have incorporated cardiovascular imaging have usually included a few thousand participants, which has limited the potential research opportunities available [20]. UK Biobank will provide sufficient statistical power to investigate imaging-derived phenotypes in association with a range of incident health outcomes, as well as the interplay between the heart, body and brain in determining disease risk.
UK Biobank has also included a range of physical measures to complement self-reported information, which is prone to various biases, e.g. the collection of objective physical activity data on 100 000 participants using accelerometers, allowing the quantification of the type and amount of physical activity in association with cardiovascular health, as well as the relationship with other factors such as sedentary behaviour, obesity, and body fat as measured through imaging.
The prospective nature of UK Biobank as well as the large sample size has enabled a large number of incident events to be captured through cohort-wide follow-up, including 40 000 incident cancers, 14 000 deaths and 1.3 million hospitalizations, a substantial number of which are attributable to cardiovascular disease. By the end of March 2015, 5800 incident cases of myocardial infarction and 3600 incident cases of stroke had been identified using an adjudicated algorithm that incorporates self-report, death and hospital inpatient data. When primary care data are made available for the full cohort, it is anticipated that the number of cases will increase by 10-15% for myocardial infarction and 50% for stroke. This linkage will not only aid the ascertainment of certain conditions underdiagnosed in a hospital inpatient setting, but will also provide information on laboratory and physical measurements, referrals, and prescriptions. The large number of events allows the exploration of well-powered exposure-outcome associations as well as the development and/or validation of risk prediction models [26].

Table 3 (excerpt)
Accelerometry (2013-15): 100 000 participants wore an Axivity AX3 tri-axial wrist accelerometer for a 7-day period. Derived summary data on duration and intensity of activity available.
Multi-modal imaging (2015): MRI of brain, heart and body, carotid ultrasound and whole body DXA scan of bones and joints for 100 000 participants.

Accessing UK Biobank data
UK Biobank is an open-access resource which any bona fide researcher can apply to use (without the need for collaboration with UK Biobank scientists), to conduct health-related research that is in the public good. Applications to use the UK Biobank resource can be made by researchers from the academic, commercial, charity, or public sector, from either the UK or internationally, with access being granted on a non-preferential and non-exclusive basis. Once researchers have registered with UK Biobank (www.ukbiobank.ac.uk), they can submit an application, which involves a brief description of the scientific rationale, aims, methodology and expected value of the project. Researchers select the required data-fields through the data showcase (www.ukbiobank.ac.uk/data-showcase/), which provides complete information on each variable available, how it was collected and the univariate distribution of participants across categories. Following approval by the Access Sub-Committee, researchers are required to sign a Material Transfer Agreement before downloading the data. Applications can be for data only, sample requests or proposals to re-contact participants.
The main requirement for data-only applications is that the scope of the application can be clearly defined. Applications can be hypothesis-driven or hypothesis-generating, involve a range of phenotypic/genotypic data or be focused on developing novel methods. Applications that request biological samples are subject to higher levels of scrutiny due to the depletable nature of the resource; researchers therefore need to provide a strong scientific justification together with assay details and sample requirements. UK Biobank also welcomes proposals to recontact participants for participation in other research studies, although these are also carefully scrutinized and their implementation needs careful management to avoid over-burdening participants.
Researchers are required to publish their results in an academic journal or an open source publication site (e.g. bioRxiv) and to return their findings (i.e. the underlying code used to generate the findings and any derived variables that were generated as part of the research) to UK Biobank so that these can be made available to share with other researchers.

Research interest and output
Between April 2012, when UK Biobank was opened for research use, and April 2017, 4600 researchers had registered to use the resource, >880 applications had been submitted and 430 projects were ongoing. Since 2013, there has been a three-fold increase in applications, with a particular increase from international researchers (9% in 2013, 23% in 2014, 44% in 2015 and 59% in 2016), predominantly from the USA (16% of total applications) and mainland Europe (14% of total applications), reflecting increasing global awareness of UK Biobank as a major resource for health-related research. All approved research projects, including a short description of their objectives, can be found in the following searchable database: http://www.ukbiobank.ac.uk/approved-research/. More than 100 applications are focused on 'cardiovascular disease', which is, to date, the most common health outcome of research interest in UK Biobank. The vast majority of applications have been 'data only' requests (>95%), although projects are now underway that have requested biological samples (e.g. for exome sequencing or copy number variant measurement) and, to a lesser extent, permission to recontact participants to invite them to join other research studies.
The resource is still in its relatively early stages as regards research output, but the number of publications and conference abstracts based upon UK Biobank data is steadily increasing (http://www.ukbiobank.ac.uk/published-papers/). By January 2017, more than 130 peer-reviewed journal articles had been published, including several within the area of cardiovascular research. These have mainly involved cross-sectional associations between traditional risk factors and cardiovascular disease [27][28][29]. However, as more incident cases accrue, research will begin to take advantage of the prospective nature of the cohort.
Genotyping data for the first 150 000 participants became available in November 2015, and results using these data are beginning to emerge. For example, a recently published study found that a subset of 'favourable adiposity' alleles associated with higher likelihood of adiposity were in turn associated with a lower risk of hypertension and heart disease [30]. Several genome-wide association studies have also identified novel genetic variants linked with blood pressure phenotypes [31,32]. The genotyping data for all 500 000 participants will be released during 2017, and a corresponding increase in interest from the research community in using the genetic data is expected.

Conclusion
Observational cohort studies have been essential in informing the prevention and treatment of cardiovascular diseases and identifying the role of cardiovascular risk factors in disease development. However, previous cohorts have either been too small to investigate less common diseases or lacked the depth of data to explore the complex interplay between different factors and cardiovascular disease risk. UK Biobank combines a large sample size of half a million participants with an unprecedented amount of phenotypic and genotypic data as well as ongoing linkage to health records. This open-access resource provides researchers worldwide with the opportunity to address a wide variety of novel research questions with the aim of improving the prevention, treatment and diagnosis of cardiovascular disease.