Cohort Profile Cohort Profile : Genetics of Diabetes Audit and Research in Tayside Scotland ( GoDARTS )

Cohort Profile: Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) Harry L Hébert, Bridget Shepherd, Keith Milburn, Abirami Veluchamy, Weihua Meng, Fiona Carr, Louise A Donnelly, Roger Tavendale, Graham Leese, Helen M Colhoun, Ellie Dow, Andrew D Morris, Alexander S Doney, Chim C Lang, Ewan R Pearson, Blair H Smith and Colin NA Palmer* Division of Population Health Sciences, Pat Macpherson Centre for Pharmacogenetics and Pharmacogenomics, Health Informatics Centre Services, Ninewells Hospital & Medical School, University of Dundee, Dundee, UK, Institute of Genetics & Molecular Medicine and Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK

The prevalence of diabetes worldwide has been steadily increasing over the past 20 years. In 1997 it was estimated to be 124 million, 1 in 2015 it was estimated to be 415 million among 20-70 year olds, and this is expected to rise to 642 million by 2040. 2 In the UK, an estimated 4 million people have diabetes either diagnosed or undiagnosed. 2 This represents a significant burden on health care resources, 3 particularly given that type 2 diabetes (T2D) is associated with comorbidities including obesity, 4 cardiovascular disease, 5 chronic kidney disease 6 and neuropathy. 7 T2D is a complex disorder, caused by a combination of environmental and genetic factors. 8 Before the first genomewide association study (GWAS) was conducted for T2D in 2007, 9 very few genetic loci were known to be involved with T2D. However, linkage and candidate-gene association studies have often failed to replicate findings through lack of power and inadequate knowledge of the underlying biological pathways. 10,11 Diabetes Audit and Research in Tayside Scotland (DARTS) started in 1996 as a joint collaboration between the University of Dundee's Department of Medicine and Medicines Monitoring Unit (MEMO), three Tayside Health Care Trusts (at Ninewells Hospital and Medical School, Perth Royal Infirmary and Stracathro Hospital) and a group of Tayside general practitioners (GPs) with a special interest in diabetes care. 12 Initially supported by the Scottish Home and Health Department, the Wellcome Trust, the Robertson Trust and Tenovus Tayside, the aim of the study was to identify all patients with diabetes within the wider Tayside region, through electronic record linkage, in order to improve health care over and above that which was practical through existing general practice lists alone. In 1998, consenting patients within this electronic database were recruited to the Genetics of DARTS (GoDARTS) study and invited to provide a blood sample for DNA extraction, for research purposes. At the same time, they were invited to provide phenotypic data (clinical and lifestyle factors), through questionnaires and clinical examination. This resource was intended to help identify the relative contribution of specific genetic and environmental factors that are associated with disease onset, progression and response to treatment. 10,11 Who is in the cohort?
Patients from the Tayside region of Scotland (n ¼ 391 274 on 1 January 1996 12 ) were added to the DARTS register, for clinical purposes, through electronic record linkage on the basis of having diabetes mellitus according to primary and/or secondary care data sources. These included hospital diabetes clinics, mobile diabetes eye units, diabetes prescription databases, the Tayside regional biochemistry database and all diabetes-related hospital discharge records. This electronic record linkage technique has a sensitivity of 97% and is continually being updated, creating a longitudinal dataset of clinical data which is manually validated by a dedicated team of clinicians. 12 Patients with T2D, which comprises around 90% of all diabetes cases, were invited to participate in the GoDARTS study either at diabetes or eye screening clinics or through their GP.
For the pilot phase of the study (GoDARTS1), 2763 patients with T2D were recruited from December 1998 to October 2004. This phase was used to test recruitment processes and the ability to anonymously link patient clinical data from electronic records to the study, and was funded by Tenovus Tayside. As this was the primary aim of the pilot phase, only blood samples were taken at the point of recruitment and no baseline data were recorded.
From October 2004 to May 2009, a second collection (GoDARTS2) was undertaken as part of the Wellcome Trust United Kingdom Type 2 Diabetes Case-Control Collection (WTCCC). A total of 16 146 people were recruited in this phase, including 7989 patients with T2D and 8157 matched healthy controls. We initially invited five matched nondiabetic controls per case from the corresponding GP practice; however, after initial success, this was reduced to two controls per case and on average one of the invited controls accepted. This incidentally included 1292 patients with T2D who had already been recruited in the GoDARTS1 phase. Baseline clinical and lifestyle measurements (Table 1) were recorded for all patients recruited in GoDARTS2. From October 2009 until 2015, an extension to the WTCCC project was granted (GoDARTS3), with 1342 patients with T2D being recruited during this time. Some of these participants had also been recruited to GoDARTS1 (n ¼ 20), GoDARTS2 (n ¼ 513) or both (n ¼ 120), where baseline data did not exist or original DNA quality was poor ( Figure 1). This gives a current total GoDARTS cohort of 18 306 participants, 10 149 of whom have T2D ( 44.8% of the DARTS study, representing the diabetic population in Tayside) and 8157 of whom were healthy controls at baseline.
Currently the cohort is in the early stages of GoDARTS4, the fourth phase of the study. In this phase, recruitment is continuing through a number of initiatives including the Scottish Health Research Register (SHARE)/Scottish Diabetes Research Network (SDRN), the Genetics of SHARE (GoSHARE) and GoDARTS-Scotland. SHARE/SDRN is a register of patients in Scotland who want to participate in medical research and have provided consent for their electronic medical records to be used for research purposes [http://www.share-sdrn.org]. GoSHARE is a parallel project which additionally aims to get permission to collect spare blood from people attending for routine clinical tests at hospital or GP clinics, that would otherwise to go to waste after the necessary tests had been performed [http://www.goshare. org.uk/]. Since the aim is to involve everyone who is resident in the Tayside area, this will inevitably include people with T2D, and they will contribute to the GoDARTS study. GoDARTS Scotland is a sub-study specifically recruiting people who have been diagnosed with T2D in the past 2 years in order to study response to therapies, including metformin.
At the point of recruitment, all participants in GoDARTS provide, by invitation, informed consent for their data to be used for research purposes and explicit consent for use in collaboration with industry. This includes allowing their baseline data to be linked anonymously to individual patient medical records including laboratory data, hospital admissions and Scottish Care Information -Diabetes (SCI-Diabetes) data. SCI-Diabetes is a shared electronic patient record which can be accessed by health professionals and researchers to aid the treatment of diabetes patients in Scotland. In this way, longitudinal data can be accessed relating to routine diabetes management, for example glycosylated haemoglobin (HbA1c), fasting insulin and fasting glucose, as well as previous patient diagnoses including diabetic complications. Furthermore, 95% of patients have consented to being contacted for future studies, aiding research beyond T2D.
How often have they been followed up?
As patients attend a baseline clinic at recruitment, initial measurements are cross-sectional. However, the use of electronic record linkage, which automatically updates patient details and grants access to NHS data as far back as 1987, makes GoDARTS a longitudinal cohort. This is made possible through the use of the community health index (CHI) number, which is a unique numerical identifier issued to each patient on first registration with a GP or admission to a hospital in Scotland. Around 96% of the UK population are estimated to be registered with a GP. 13 The CHI is a 10-digit number consisting of six digits corresponding to the patient's date of birth (DDMMYY), two digits randomly generated, one digit corresponding to the patients gender (odd for males, even for females) and one check digit. The CHI number links to live databases which are constantly being updated, such as the Scottish Morbidity Record (SMR) providing data on primary and secondary diagnoses for patients discharged from hospital since 1980, the Tayside echocardiography database providing data on all echocardiograms performed at Ninewells Hospital since 1994, the General Registrar's Office providing mortality data since 1998, the biochemistry database listing all assays performed since 1981 and a database containing all prescriptions dispensed since 1989. This allows identification of an up-to-date record of every individual's health care processes and outcomes and linkage of corresponding datasets. An anonymization process converts the CHI into a study pro-CHI, to protect the identities and confidentiality of individuals while retaining the ability to link patient data across multiple datasets.

What has been measured?
For GoDARTS1, only blood samples were taken for DNA extraction as this was a pilot phase used to test the ability to link electronic health records anonymously to genetic data. For GoDARTS2 and GoDARTS3, participants completed a lifestyle questionnaire and consented to baseline measurements being recorded at recruitment (Table 1). In addition, during GoDARTS3 urine samples ( 80% of recruits) were taken for proteomic and metabolomics analysis and RNA ( 30% of recruits) was extracted from blood samples. The lifestyle questionnaire contains items relating to smoking history (present and past status, along with amount and age started where applicable), as well as level of physical activities in three common locations (work/education, travel and home life) over three different time periods in life (recently, past 10 years and youth). In addition, women were asked about their menopausal history. Baseline observations were recorded and included height, weight and waist measurements, as well as heart rate and blood pressure. The patient's recruitment information was recorded including ethnicity, screening location, confirmation of T2D and medication history, as well as family history of diabetes and whether the patient had previously participated in GoDARTS. As baseline data were only recorded for participants recruited in GoDARTS2 and GoDARTS3, there are 1451 participants who were only involved in GoDARTS1 ( Figure 1) and do not have these data. Furthermore, there are 17 healthy control participants from GoDARTS2 who are missing baseline data, meaning that a total of 16 838 patients have these available, including 8698 cases and 8140 controls. As well as phenotypic data, genetic data are available for 8564 T2D cases ( Figure 2) and 4586 controls ( Figure 3) after quality control. Samples have been genotyped across five platforms. GWAS data have been obtained for 7857 T2D cases and 1108 controls, using the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina HumanOmni Express. The Affymetrix GWAS chip contains 932 979 single nucleotide polymorphisms (SNPs), and the Illumina GWAS chip contains 731 296. This has allowed for imputation of additional and missing genotypes by SHAPEIT 14 and IMPUTE2 15 using the 1000 Genomes reference panel. 16 In addition, 707 T2D cases and 3478 controls have been genotyped using custom genotyping arrays from Illumina. These include the Immunochip, Cardio-Metabochip (Metabochip) and Human Exome array. The Immunochip contains   196 524 genetic markers from loci that have previously been associated with at least one of 13 autoimmune diseases, including T1D, 17 and the Metabochip contains 196 725 SNPs from loci that have prior evidence of association with T2D, coronary artery disease/myocardial infarction and 21 related traits. 18 The specific criteria by which markers on the Cardio-Metabochip and the Immunochip have been chosen makes these platforms a cost-effective means of replicating and fine-mapping known loci and discovering novel loci by virtue of overlapping biological mechanisms between the related traits. The Human Exome Array contains 247 870 genetic markers from across the exome, allowing for studies to focus on identifying protein-altering variants. 19 What has it found?
Baseline and follow-up epidemiology Baseline clinical and demographic statistics are summarized in Table 2. Overall, 53.33% of the cohort are male, which is similar to the proportion represented in DARTS (52.83%), with the proportion being higher in cases (56.38%) compared with controls (50.08%). The majority of the cohort are Caucasian (99.70%) and the median age at recruitment was higher in cases (67 years) compared with controls (60 years). This observation is also apparent when the cases and controls are further dichotomized into males (66 vs 62 years) and females (68 vs 58 years). The cohort contains data on a number of continuous traits known to be associated with T2D. For example, median body mass index (BMI) (30.6 vs 26.6 kg/m 2 ), resting heart rate (1st ¼ 73 vs 68 bpm), creatinine (89 vs 87 mmol/l) and triglyceride (1.880 vs 1.315 mmol/l) levels were all higher in cases compared with controls. Furthermore, there was a higher proportion of past smokers among those with T2D (63.14% vs 53.56%).
As of 2014, mortality data have shown that the number of deaths at 9 years after recruitment was 2587 out of 10 149among the cases and the Kaplan-Meier survival probability is 70.0%, whereas among the controls the number of deaths was 851 out of 8157 and the Kaplan-Meier survival probability is 88.2% (Figure 4). Control group mortality data do not go beyond this, as recruitment of controls did not begin until GoDARTS2 (approximately 7 years after the start of GoDARTS1); however, the number of deaths after 16 years among the cases was 2941 (out of 10 149) with a Kaplan-Meier survival probability of 53.5%.
According to SCI-Diabetes data, the number of people initially recruited as controls at baseline, but who went on to develop diabetes, is 650 (out of 8157) and the Kaplan-Meier cumulative incidence probability is 8.3% ( Figure 5). Also captured were self-reported physical activity data, and these can be seen to successfully stratify the effect of the fat mass and obesity-associated protein gene (FTO) risk allele, rs9939609, where the genetic association with BMI is largely attenuated in active individuals, as has been observed in large meta-analyses ( Figure 6).

Research output
Over 100 studies have been published using GoDARTS either as the primary study cohort or as part of a larger meta-analysis or replication. The following is a summary of important studies, in all of which GoDARTS has been involved. A full and up-to-date list of studies can be found at [www.researcherid.com/rid/K-9448-2016] (Researcher ID: K-9448-2016).
GoDARTS began in the pre-GWAS era, with its first study being published in 2002. 20 At this time candidategene studies were conducted, and these were particularly successful in replicating associations at the peroxisome proliferator-activated receptor (PPAR) transcription factor family, which had previously been associated with a range of phenotypes including T2D. In particular, two variants (rs1801282 and rs3856806) in PPARG were shown to have opposing effects on body weight, with the nonsynonymous rs1801282 (Pro12Ala) associated with lower BMI and the synonymous rs3856806 (C1431T) associated with higher BMI. 20 This provided an explanation for previous inconsistencies found between this gene and BMI. Similar findings were reported with respect to T2D susceptibility 21 and myocardial infarction, 22 with haplotype analysis confirming the protective effect of rs1801282 with these phenotypes in contrast to the risk effect of rs3856806. In addition, rs1801282 was shown to have an opposing effect to another polymorphism, rs10865710 (C681G), with respect to height and weight in pre-pubertal children. 23 Further studies identified similar attenuations in the related genes PPARA with myocardial infarction, 24 and replicated an association at PPARD with reduced adult height. 25 In 2007, the GoDARTS study became part of the UK T2D genetics consortium collection (UKT2DGCC), which formed the main replication cohort for the WTCCC. This involved large-scale studies investigating quantitative traits including height and obesity, in addition to T2D. One of the first published studies using the UKT2DGCC found an association in the FTO gene with obesity. This effect was observed from age 7 years and older. 26 Another related study identified associations at the MC4R gene with BMI and childhood obesity among 7-11-year-olds. 27 GoDARTS was involved in the seminal publications describing the identification of many genes for type 2 diabetes, including the first descriptions of CDKAL1 and CDKN2A/CDKN2B variants and risk for diabetes [28][29][30][31][32] and also the original report of the association of the HMGA2 gene with height. 33,34 Smaller-scale candidategene studies have also been conducted to good effect in T2D, as demonstrated by the identification of the WFS1 gene. 35 T2D-related traits have also been studied with 10 novel loci associated with fasting glucose levels, 29,36 IGF1 influencing fasting insulin 29 and GIPR associated with 2-h glucose levels after oral glucose challenge. 32 In addition to GWAS of anthropometric traits and T2D susceptibility, GoDARTS has pioneered the use of electronic medical records for the study of drug efficacy. This was initially used to study response to sulphonylureas 37 and statin lipid-lowering agents. 38 As part of the WTCCC2 consortium, GoDARTS served as a discovery cohort for the GWAS of response to statins and metformin. 39 The metformin analysis found a novel association of glycaemic response to metformin at a locus including the ataxia telangiectasia mutated (ATM) gene, which provided new clues to the mechanism of action of this mysterious drug. 40,41 More recently, the GoDARTS cohort was the main discovery cohort for a large Metformin Genetics Consortium study (MetGen) that showed a variant in SLC2A2 to be consistently associated with altered glycaemic response to metformin. 42 Other investigated  phenotypes in GoDARTS include response to thiazolidinediones 43 and sulphonylureas, 37,44 and adverse reactions to statins, metformin and angiotensin-converting enzyme (ACE) inhibitors. 40,45 Linkage of the GoDARTS study to the Tayside echocardiography database has allowed the identification of genetic variants associated with left ventricular hypertrophy, 46 as well as the association of both high ( > 10%) and low ( < 6%) HbA1c levels with risk of incident heart failure. 47 From around 2012, the focus of studies has switched from genome-wide analysis to more targeted approaches. One of these methods involves genotyping with the Metabochip. This has been influential in the discovery and fine-mapping of loci in diabetes-and cardiovascular disease-related phenotypes. In particular, 17 novel T2D loci 48,49 have been discovered and a further 39 genetic regions have undergone variant localization and genomic annotation to identify the causative mutations. 50 Similar success has been seen in coronary artery disease, with 25 loci being identified. 51,52 Glycaemic and anthropometric traits have also been studied successfully, with loci being identified with fasting insulin, fasting glucose, height and obesity. [53][54][55][56][57] Another targeted approach that has been increasingly used is exome sequencing and follow-up studies using the Illumina Exome chip. This has been used to capture rare disease-associated variants lying within the protein coding region of the genome, which are hypothesized to make up much of the missing heritability in common diseases. Due to their protein-altering nature, disease-associated polymorphisms discovered in these regions are likely to be causative, and the biological pathways can be more easily elucidated than other study methods. Exome chip genotyping in GoDARTS contributed to the demonstration that, loss-of-function variants in APOC3 58 and ANGPTL4 59 are simultaneously associated with lower triglyceride levels and reduced risk of coronary artery disease (CAD). Similarly, inactivating mutations in NPC1L1 have a  protective effect on CAD risk and reduced low-density lipoprotein (LDL) cholesterol level. 60 Whereas other phenotypes related to glycaemic traits, 61,62 lipids 63 and adiponectin 64 have successfully identified loci, a large exome sequencing/exome chip study has revealed that coding variation does not play a large role in T2D susceptibility. 65 In more recent years, GoDARTS has been used in Mendelian randomzsation studies to establish the causal relationship between a variable and disease. This has been used to demonstrate a role for sex hormone-binding globulin 66 and to rule out a causal effect between circulating triglycerides and adiponectin in T2D. 67,68 In addition, adiposity and adiponectin have been implicated in moderating CVD risk. 69,70 Other studies have been conducted, and a more comprehensive list of disease phenotypes can be found in Table 3.

What are the strengths and weaknesses?
The main strengths of GoDARTS are: its large size [including 10 149 participants with T2D ( 51% of people with T2D in Tayside) and 8157 controls]; the availability of rich genetic and phenotypic data; the ability to link patient genetic and baseline data to routine electronic medical records; and the existing consent for use of these for research and for future contact for possible research participation, especially the potential for recruitment by genotype studies. Consent for future contact has allowed further studies to take place. One of these is DOLORisk, an EU Horizon 2020-funded project that will be re-phenotyping participants for neuropathic pain and related traits, to identify possible risk factors [http://dolorisk.eu/]. As a result, the GoDARTS cohort is rich in longitudinal phenotypic data, such as biochemistry, prescribing, morbidity and demography. In addition the linkage of the study to participants' individual electronic medical records, which are constantly updated, means the cohort is not limited by loss-to-follow up bias that can beset other longitudinal studies. This linkage is made possible through the use of the CHI number which allows patient data to remain anonymous. The availability of a large range of clinical and demographic data allows a large range of diabetes-related phenotypes to be investigated, on both genome-wide and more targeted scales, and also provides a means to control for common confounders such as BMI, smoking, age and blood pressure. The large number of samples recruited at baseline (8697 T2D cases and 8141 controls) provides the  ACEi, angiotensin-converting enzyme inhibitor; ARMD, age-related macular degeneration; CAD, coronary artery disease; LDLc, low-density lipoprotein cholesterol; T2D, type 2 diabetes. statistical power with which to identify genetic variants conferring susceptibility to disease, both as a stand-alone cohort and as part of meta-analyses. Furthermore, RNA has been collected from patient blood samples, which will enable gene expression studies to be conducted in the future.
A weakness of the GoDARTS cohort is the missing baseline data of GoDARTS1 patients who were not subsequently recruited again in GoDARTS1 or 2. This is due to GoDARTS1 being a pilot phase of the study, and consequently baseline data were not collected at this stage. However, linkage is still possible for these samples. Patients were also not necessarily recruited at the point of diagnosis, which creates heterogeneity in the effects of disease duration on the serum samples obtained. Another weakness concerns the lifestyle questionnaire that was administered to patients. As this was self-completed by the patients, physical activity, smoking and female menopausal history are subject to recall bias.
Where can I find out more?
Further information about this cohort can be found at [http://diabetesgenetics.dundee.ac.uk/]. Access to the dataset is available to researchers worldwide and access requests, together with the general management of the resource, are handled by the Access Group. More details on the application and collaboration process can be found at [http://diabetesgenetics.dundee.ac.uk/Community.aspx].

Profile in a nutshell
• GoDARTS was set up in 1998 in order study the genetics underpinning type 2 diabetes (T2D) susceptibility, diabetes complications and patient response to therapy.
• The study is a branch of the pre-existing Diabetes Audit and Research in Tayside Scotland study, which was set up to identify all patients within the Tayside area with diabetes, through electronic record linkage, to provide better care over and above existing registries.