Genetic and environmental risk factors for rheumatoid arthritis in a UK African ancestry population: the GENRA case–control study

Abstract Objectives. To evaluate whether genetic and environmental factors associated with RA in European and Asian ancestry populations are also associated with RA in African ancestry individuals. Methods. A case–control study was undertaken in 197 RA cases and 868 controls of African ancestry (Black African, Black Caribbean or Black British ethnicity) from South London. Smoking and alcohol consumption data at RA diagnosis were captured. Genotyping was undertaken (Multi-Ethnic Genotyping Array) and human leukocyte antigen (HLA) alleles imputed. The following European/Asian RA susceptibility factors were tested: 99 genome-wide loci combined into a genetic risk score; HLA region [20 haplotypes; shared epitope (SE)]; smoking; and alcohol consumption. The SE was tested for its association with radiological erosions. Logistic regression models were used, including ancestry-informative principal components, to control for admixture. Results. European/Asian susceptibility loci were associated with RA in African ancestry individuals. The genetic risk score provided an odds ratio (OR) for RA of 1.53 (95% CI: 1.31, 1.79; P = 1.3 × 10⁻⁷). HLA haplotype ORs in European and African ancestry individuals were highly correlated (r = 0.83, 95% CI: 0.56, 0.94; P = 1.1 × 10⁻⁴). Ever-smoking increased (OR = 2.36, 95% CI: 1.46, 3.82; P = 4.6 × 10⁻⁴) and drinking alcohol reduced (OR = 0.34, 95% CI: 0.20, 0.56; P = 2.7 × 10⁻⁵) RA risk in African ancestry individuals. The SE was associated with erosions (OR = 2.61, 95% CI: 1.36, 5.01; P = 3.9 × 10⁻³). Conclusion. Gene–environment RA risk factors identified in European/Asian ancestry populations are relevant in African ancestry individuals. As modern statistical methods facilitate analysing ancestrally diverse populations, future genetic studies should incorporate African ancestry individuals to ensure their implications for precision medicine are universally applicable.


Introduction
RA is a complex disease resulting from environmental exposures in genetically predisposed individuals [1]. Many genetic and environmental RA risk factors have been identified in European and Asian ancestry individuals [2–5]. Their generalizability to other ancestral populations is uncertain. The benefits of establishing RA risk factors include facilitating the identification of novel therapeutic targets [5] and risk prediction modelling [6].
Although RA causes significant health-care burdens in Africa [7], few studies have assessed RA risk factors in Africa or African ancestry groups. Relevant Africa-based research comprises several small case–control studies (including <60 cases) [8–10]. Two Cameroonian studies showed the shared epitope (SE) was three times commoner in RA cases [9], but a genetic risk score (GRS) of 28 European RA risk single nucleotide polymorphisms (SNPs) was not associated with RA [8]. A Senegalese study reported increased RA risk in HLA-DR3 and HLA-DR10 carriers [10]. RA susceptibility factors in African Americans are better characterized, with smoking and the SE being established risk factors [11]. To date, no studies have examined RA risks in UK-based African ancestry individuals. As UK, USA and Africa-based African ancestry populations are ethnically different (using different self-reported ethnicity classifications), and are likely to differ genetically (with varying degrees of genetic admixture), it is crucial to evaluate RA risks in African ancestry individuals living in the UK.
We evaluated whether gene–environment RA risk factors identified in European and Asian ancestry populations are relevant to African ancestry UK individuals. Our novel case–control study (comprising 197 RA cases and 868 controls of Black African, Black Caribbean and Black British ethnicity) tested whether the HLA locus, 99 genome-wide loci, smoking and alcohol consumption were associated with RA in this population. We also compared HLA region contributions to RA risk in African and European ancestry individuals, and evaluated associations between the SE and radiological erosions.

Guidelines
The STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines have been devised to strengthen transparency in the analysis and reporting of observational studies. We adhered to the STROBE checklist for study reporting [12].

African ancestry cohort
African ancestry cases were from the GENetics of RA in individuals of African ancestry (GENRA) study [13]. GENRA recruited 212 patients (January 2011 to February 2015) from rheumatology clinics in four South London hospitals (Guy's, King's College, St George's and Lewisham). Inclusion criteria comprised an RA diagnosis fulfilling the 1987/2010 ACR criteria, and Black African, Black Caribbean or Black British self-reported ethnicity [14]. Patients were assessed once, recording ethnicity, disease characteristics (year diagnosed, medications, serology, extra-articular features) and outcomes (disease activity, disability, quality of life). Patients were asked about current and previous smoking, including time-points for starting/stopping smoking, cigarettes/day (or alternatives) and breaks from smoking. Alcohol intake was assessed by asking if they were current, previous or never drinkers; alcohol intake (units/day) at diagnosis was collected. Hand and foot radiographs were evaluated for erosions; if unavailable, documentation of erosions in the clinical notes was captured. Serum and DNA were extracted from whole blood.
African ancestry controls (n = 887) were included from the South London Ethnicity and Stroke Study (SLESS) [15]. Inclusion criteria comprised being of Black Caribbean or Black African ethnicity, and being free of clinical cerebrovascular disease. SLESS control recruitment was by random selection from primary care lists in St George's, Guy's and St Thomas', and King's College Hospital catchment areas between 1999 and 2012; emailing St George's University of London/St George's Hospital staff; and affixing posters publicizing the study in local leisure centres, primary care surgeries, churches and community centres. Using population controls from the same catchment area as cases reduced selection bias risk. Lifestyle habits were captured by self-completed questionnaire, checked and completed, if necessary, by a clinician interviewing participants. Smoking was assessed by asking: 'Do you smoke?' and 'Are you an ex-smoker?' Alcohol intake was assessed by asking: 'How many alcohol units do you drink per week?'

European ancestry cohort
We included European ancestry cases from the Combination Anti-Rheumatic Drugs in Early RA (CARDERA) genetics cohort and controls from the Wellcome Trust Case Control Consortium 2 (WTCCC2) to compare the contribution of the SE to RA risk between African and European ancestry groups. Both cohorts have been described previously [16,17]. CARDERA comprises 524 patients with early, active RA enrolled in two clinical trials. WTCCC2 controls are from the 1958 Birth Cohort.

Genotyping
GENRA and SLESS were genotyped together on the Illumina Multi-Ethnic Genotyping Array, a multi-ethnic platform with >1.7 million markers [18]. Quality control (QC) and imputation procedures are described in supplementary Table S1, available at Rheumatology Online. Post-QC 197 cases and 868 controls were available. Principal components (PCs) were calculated, combining GENRA/SLESS with individuals from 1000 Genomes phase 3 populations using smartpca (EIGENSTRAT) on an LD-pruned subset of data [19]. Data were imputed to the 1000 Genomes phase 3 reference set using IMPUTE2 (ver. 2.3.0); 4-digit HLA alleles were imputed using HLA*IMP:02 (multi-ethnic reference panel, imputing 4-digit HLA alleles with 84% accuracy in Africans) [20].
CARDERA cases and WTCCC2 controls were genotyped separately on the Illumina Immunochip [21]. QC was initially performed on each dataset separately and subsequently after merging (using the same procedures as for GENRA/SLESS). PC analysis (PCA) was performed using smartpca (EIGENSTRAT) on an LD-pruned subset [19]. Individuals were removed that were >8 S.D.s from the mean of the first five PCs. Post-QC 520 cases and 2648 controls were available. Imputation of HLA alleles was performed with HLA*IMP:02 (European reference panel, imputing 4-digit HLA alleles with 94% accuracy in Europeans) [20].

HLA associations
Raychaudhuri et al. [22] showed that most HLA-derived risk for ACPA-positive RA in Europeans is from polymorphisms in five amino acids. These define 16 haplotypes in HLA-DRβ1, 2 haplotypes in HLA-B and 2 haplotypes in HLA-DPβ1. We tested their association with RA in our African and European ancestry cohorts using logistic regression. In GENRA/SLESS, the first 10 ancestry-informative PCs were included as covariates, to adjust for admixture. Pearson's correlation coefficient tested correlations between the log [odds ratios (ORs)] for association of each HLA haplotype in the meta-analysis, and GENRA/SLESS. As Raychaudhuri et al. determined these haplotypes by using omnibus tests to define critical HLA-DRβ1 molecule amino acid positions for RA susceptibility, we additionally used omnibus tests to evaluate the associations between the critical amino acid positions 11, 13, 71 and 74 in HLA-DRβ1 and RA in our African and European ancestry cohorts. We also tested associations between the SE and RA using logistic regression. Nagelkerke's [23] measure of proportion of trait variance explained compared the degree of RA risk explained by the SE in European and African ancestry individuals.
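The haplotype comparison described above can be illustrated with a minimal sketch that computes Pearson's correlation on the log scale. The function name and the example ORs are hypothetical and are not taken from the study; the sketch only shows why ORs are log-transformed before correlating (so that protective and risk effects are treated symmetrically).

```python
import math


def pearson_log_or(ors_a, ors_b):
    """Pearson correlation between log(OR) values from two cohorts.

    ors_a, ors_b: per-haplotype odds ratios, in the same haplotype order.
    """
    if len(ors_a) != len(ors_b) or len(ors_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    xs = [math.log(o) for o in ors_a]
    ys = [math.log(o) for o in ors_b]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ORs for four haplotypes in two cohorts (illustrative only).
cohort_a_ors = [0.5, 1.0, 2.0, 4.0]
cohort_b_ors = [0.6, 0.9, 1.8, 3.0]
r = pearson_log_or(cohort_a_ors, cohort_b_ors)
```

A confidence interval for r is not computed here; the method used for the interval reported in the abstract is not stated in this section.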

Genome-wide associations
A trans-ethnic RA genome-wide association study (GWAS) meta-analysis by Okada et al. [5] identified 102 risk SNPs in European and Asian ancestry individuals. We constructed a genetic risk score (GRS) including 99 of these variants [omitting two X-chromosome markers, and one SNP (rs147622113) not present in Africans] and tested its association with RA in our African ancestry cohort using logistic regression, including the first 10 ancestry-informative PCs. All SNPs included in the GRS had INFO scores >0.55. The GRS was created by summing the number of risk alleles carried at each SNP, weighted by its log (OR). To ensure the GRS association with RA in GENRA/SLESS was not due to European admixture, we divided GENRA cases into quartiles based on PC 1 (separating European and African ancestry individuals) and repeated the GRS analysis for cases in each quartile against all controls. It was not possible to test the association between the GRS and RA in our European ancestry cohort because CARDERA was genotyped on the ImmunoChip, which is missing a substantial proportion of the SNPs identified by Okada et al.
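As an illustration of the weighted score described above, the following minimal sketch sums risk-allele counts weighted by log(OR). The function name and the three-SNP example are hypothetical; in practice, imputed dosages (fractional allele counts) and the published per-SNP ORs would be used, and the study's exact software is not specified here.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted GRS: sum over SNPs of (risk-allele count) x log(OR).

    risk_allele_counts: 0, 1 or 2 per SNP (or an imputed dosage in [0, 2]).
    odds_ratios: published per-allele OR for each SNP, in the same order.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("counts and odds ratios must have the same length")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )


# Hypothetical individual genotyped at three SNPs (values are illustrative).
example_counts = [2, 1, 0]
example_ors = [1.10, 1.25, 1.50]
example_score = genetic_risk_score(example_counts, example_ors)
```

Weighting by log(OR) rather than OR keeps the score additive on the same scale as the logistic regression coefficients it is later tested with.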

Smoking and alcohol associations
Smoking and alcohol abstinence have consistent associations with RA in European/North American studies [2,3]. We tested the association between (i) being an ever-smoker vs never-smoker and drinker vs non-drinker at RA diagnosis and (ii) case–control status in GENRA/SLESS. GENRA cases were classified as non-drinkers if they reported being a never drinker or if a current/previous drinker reported consuming 0 U/week at RA diagnosis. SLESS controls were classified as non-drinkers if they reported drinking 0 U/week at assessment. Propensity score matching (1:2 ratio) matched GENRA cases with SLESS controls for age, sex and ethnicity. As Black British ethnicity individuals were not recruited to SLESS, 19 Black British GENRA cases were excluded. Eight GENRA cases and 190 SLESS controls with missing smoking/alcohol data were also omitted. Associations between smoking and drinking and RA were tested using multivariate logistic regression models, including age, sex and ethnicity as covariates. Due to ethnic differences in lifestyle habits, a secondary analysis stratifying by ethnicity was performed.
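For orientation only, the sketch below shows how an unadjusted OR and a Wald 95% CI are obtained from a 2 × 2 exposure table. This is a deliberate simplification: the analysis described above used propensity score matching and covariate-adjusted logistic regression, which this sketch does not reproduce. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, unexposed_cases,
                    exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Returns (or, lower, upper). All four cell counts must be > 0.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): ever-smokers among cases and controls.
or_est, ci_low, ci_high = odds_ratio_wald(20, 80, 10, 90)
```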

Smoking–SE interaction
An additive interaction between the SE and smoking on RA risk is well established in Europeans [24]. We tested this in GENRA/SLESS using logistic regression models incorporating smoking (ever vs never-smoking) and the SE (0 vs any copies), alongside the first 10 ancestry-informative PCs, age and sex as covariates. The interaction between the SE and smoking was evaluated as the relative excess risk due to interaction, using the approach defined by Rothman and Greenland [25].
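The relative excess risk due to interaction (RERI) has a simple closed form once ORs are expressed relative to the doubly unexposed group (never-smokers carrying no SE copies). The sketch below shows that arithmetic; the function name and the example ORs are hypothetical, and in the analysis described above the ORs would come from the covariate-adjusted logistic model rather than being supplied by hand.

```python
def relative_excess_risk(or_both, or_se_only, or_smoking_only):
    """RERI = OR(SE and smoking) - OR(SE only) - OR(smoking only) + 1.

    All ORs are relative to the reference group with neither exposure.
    RERI = 0 indicates no departure from additivity; RERI > 0 indicates
    a joint effect larger than the sum of the separate effects.
    """
    return or_both - or_se_only - or_smoking_only + 1.0


# Hypothetical ORs (illustrative only, not study results).
example_reri = relative_excess_risk(
    or_both=6.0, or_se_only=2.5, or_smoking_only=2.0
)
```

A confidence interval for the RERI needs the covariance of the model coefficients (for example via the delta method or bootstrapping), which is outside the scope of this sketch.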

SE and erosive status
The SE associates with erosions in most RA populations [26]. We tested this in GENRA cases using logistic regression including erosions (present vs absent) as the response variable, and the first 10 ancestry-informative PCs, age, sex, ACPA and disease duration as covariates.

All cases vs ACPA-positive cases
Gene–environment risk factors for RA have stronger associations with ACPA-positive disease. We therefore analysed all RA and ACPA-positive RA cases separately.

Sample size
We had 80% power to detect a common variant [minor allele frequency (MAF) = 30%] with OR >2.10 at genome-wide significance (assuming an additive genetic model and RA prevalence in Africans of 0.75%) [7,27].
PCA showed strong segregation of GENRA/SLESS samples with African populations from 1000 Genomes phase 3 (Fig. 1). While European admixture in GENRA and SLESS was observed, this was not significantly different between them. The major PC dividing European from African ancestry (PC1) was not significantly associated with RA status under logistic regression (P = 0.27).

HLA imputation
HLA alleles were imputed with a high degree of certainty (supplementary Figs S1–4, available at Rheumatology Online). In GENRA/SLESS, 232 alleles were imputed at 4-digit resolution; 127 alleles had a frequency of >1%. The median posterior probability (Q) of allele allocation was 0.974; 88.1% of alleles were imputed with Q > 0.9; 96.3% were imputed with Q > 0.8. In CARDERA/WTCCC2, 183 alleles were imputed at 4-digit resolution; 111 of these had a frequency of >1%. The median Q of allele allocation was 0.975; 87.3% of alleles were imputed with Q > 0.9; 96.8% were imputed with Q > 0.8. Lower frequency alleles were imputed with less certainty, as expected due to their rarity in reference panels. Imputation posterior probabilities for the HLA region were in general similar between RA cases and controls, with no significant differences between them as assessed by linear regression. The exception was for HLA-DPB1, which showed slightly lower imputation quality in controls compared with cases (P = 0.024).
Omnibus tests showed highly significant associations for amino acid positions 11 and 13 in HLA-DRβ1 (P = 8.1 × 10⁻⁹ and P = 7.8 × 10⁻⁹, respectively) and case–control status in GENRA/SLESS, but not positions 74 (P = 0.017) and 71 (P = 0.20). In CARDERA/WTCCC2, highly significant associations were observed for amino acid positions 11, 13 and 71 in HLA-DRβ1 (P = 1.6 × 10⁻¹⁰, P = 1.2 × 10⁻¹⁰ and P = 5.0 × 10⁻⁹, respectively) with case–control status. The association at position 74 was not significant (P = 0.058).
Dividing GENRA cases into quartiles based on their proportion of European ancestry (captured by the first PC) and repeating the GRS analysis for each quartile against all controls ensured the association was not driven by European admixture (supplementary Fig. S5, available at Rheumatology Online).
Plotting the ORs for RA associated with each SNP in the Okada et al. meta-analysis and our GENRA/SLESS African ancestry cohort and scaling each point by their minor allele frequency (supplementary Fig. S6, available at Rheumatology Online) showed that the majority of
Alcohol consumption had a significant inverse relationship with RA in GENRA/SLESS. The OR for RA in drinkers vs non-drinkers was 0.34 (95% CI: 0.20, 0.56; P = 2.7 × 10⁻⁵). The association was marginally stronger for ACPA-positive RA (OR = 0.30, 95% CI: 0.17, 0.53; P = 3.9 × 10⁻⁵; supplementary Table 3, available at Rheumatology Online). While smoking and drinking rates were lower in Black African individuals compared with Black Caribbean individuals, ORs for RA associated with smoking and alcohol were similar in these ethnic groups (Table 4).

Discussion
We studied whether gene–environment RA risk factors identified in European and Asian ancestry populations are relevant in UK-based African ancestry individuals. Our study has three key findings. First, we showed that European–Asian RA susceptibility loci (a 99 SNP GRS and 20 HLA haplotypes) are associated with RA in African ancestry individuals. Second, we found that smoking and alcohol (dominant European/North American environmental RA risks [2,3]) are associated with RA in African ancestry individuals. Third, we demonstrated that the SE, which predicts radiological damage in a range of ancestral groups [26], also predicts erosive status in African ancestry RA patients. Overall, our findings provide strong evidence for a shared genetic–environmental architecture for RA across European, Asian and African ancestry populations.
We used three approaches to examine the association between the HLA locus and RA in GENRA/SLESS. First, we used the 20 haplotype model proposed by Raychaudhuri et al. [22]; second, we used omnibus tests to evaluate the critical amino acid positions in HLA-DRβ1 identified by Raychaudhuri et al.; and third, we used the SE. In all three instances, the HLA region had a highly significant association with RA. While amino acid position 71 was not associated with RA in GENRA/SLESS, the relevance of this is uncertain, as the SE (which spans positions 71–74) had a highly significant association with RA status. Although SE alleles were rarer in Africans than Europeans, the difference in the proportion of SE allele carriers between cases and controls was similar in both GENRA/SLESS and
Our study lacked the power to detect individual SNP associations with RA in GENRA/SLESS. We therefore tested a GRS combining validated RA susceptibility SNPs. This approach is widely used to replicate genetic risks in polygenic disorders, whose genetic architecture comprises hundreds to thousands of very small effect common alleles [28,29]. Excluding the HLA-tagging SNP from our GRS reduced the significance of the association, suggesting that as in Europeans, most of RA's heritability is from the HLA locus. It is improbable that all validated susceptibility SNPs reported by Okada et al. would replicate in a similarly sized African ancestry cohort, but our analysis supports the concept of an overall shared burden of genetic RA risk loci across ancestral groups.
Smoking and abstinence from alcohol increased RA risk in African ancestry individuals, replicating the effects observed in Europeans. While we observed larger ORs for RA associated with these factors compared with the pooled study risks observed in published meta-analyses, namely a meta-analysis OR for RA in drinkers vs non-drinkers of 0.78 (95% CI: 0.63, 0.96) [2] and in ever- vs never-smokers of 1.40 (95% CI: 1.25, 1.58) [3], our modest sample size limited our precision in estimating risk. The ethnic differences we observed in alcohol and smoking habits (with smoking and drinking being commoner in Black Caribbean than in Black African individuals) highlight the importance of considering ethnicity when evaluating lifestyle factors, although the small sample sizes of these ethnic subgroups mean this finding requires interpreting with caution.
This is the first analysis of associations between erosions and the SE in UK-based African ancestry individuals. The OR of 2.60 in GENRA was similar to a meta-analysis of eight studies containing 532 Northern European RA patients, reporting an OR for RA with one SE allele of 2.4 [26]. In GENRA the association appeared independent of ACPA, which was included as a modelling covariate. More recently we, along with other research groups, have demonstrated a significant association between the presence of valine at position 11 (external to the SE) in HLA-DRβ1 and radiological damage in European ancestry RA patients [17,30]. Testing this position in GENRA also revealed a significant association with erosions (OR = 2.15, 95% CI: 1.02, 4.53; P = 0.047), although we could not confirm this was independent of the SE owing to our limited sample size.
Our study has several strengths. First, it is the first analysis of RA risk factors in Black British and Black Caribbean individuals. Second, we evaluated a broad range of genetic–environmental factors. Third, pooling African ancestry ethnic groups and using computational techniques to account for population stratification optimized study power. Fourth, population controls were used from the same catchment area as cases. It also has limitations. First, the absence of HLA allele genotyping prevented internal validation of the imputation; however, posterior-probability scores suggested common HLA variants were accurately imputed: HLA*IMP:02 has documented accuracy at imputing HLA alleles [20], and the SE prevalence in GENRA (24%) was similar to that observed in Cameroonian RA cases (30%) [9]. Second, as RA patients retrospectively recalled lifestyle habits at diagnosis, recall bias was possible. In the context of alcohol, this could inflate the effect seen (RA patients often abstain from alcohol with DMARD use, which could make them more likely to classify themselves as non-drinkers at diagnosis). Third, different methods were used to evaluate lifestyle habits in cases and controls. Finally, controls were not screened for RA; however, its low prevalence (0.6–0.9% in Africans [7]) suggests few would have this disease, and it is not routine practice to screen controls for disease in genetic studies of low prevalence disorders.
Despite increasing UK ethnic diversity (26–31% of our local South London community are of Black ethnicity [31,32]), few ethnic minority patients participate in research. This issue is particularly pressing in genetic studies, as GWAS traditionally exclude non-Europeans to minimize population stratification. This calls into question the generalizability of research findings to ethnic minority groups. As statistical advances facilitate genomic analyses of ancestrally diverse populations [33,34], we undertook a proof-of-concept GWAS (including the first 10 ancestry-informative PCs) in our ancestrally heterogeneous GENRA/SLESS cohort (supplementary Figs S7 and S8, available at Rheumatology Online). The genomic-inflation factor was 0.99, suggesting appropriate control for population stratification. We therefore propose that future GWAS meta-analyses include African ancestry samples.
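The genomic-inflation factor mentioned above is conventionally the median observed association statistic divided by the median expected under the null. The sketch below assumes 1-degree-of-freedom chi-square tests and two-sided P-values as input; the function name is hypothetical and this is not necessarily how the value of 0.99 was computed in the study. Values near 1 suggest little residual stratification.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.4549), obtained as the
# square of the 75th percentile of the standard normal distribution.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values):
    """Lambda = median(observed chi-square) / median(expected chi-square).

    p_values: two-sided P-values from 1-df association tests, each in (0, 1].
    Very small P-values lose precision here, which is harmless because only
    the median statistic is used.
    """
    std_normal = NormalDist()
    chi2_stats = [std_normal.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a well-calibrated null distribution.
null_like_p = [i / 1000 for i in range(1, 1000)]
lambda_null = genomic_inflation_factor(null_like_p)
```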
In conclusion, genetic and environmental factors for RA susceptibility and severity in European and Asian ancestry populations are generalizable to African ancestry individuals. Greater efforts are needed to include ethnic minorities in research. This will ensure research findings and their potential for clinical benefit (which in the context of susceptibility loci and prognostic markers includes prevention and precision medicine) are universally translatable.
Clinical Research Network (NIHR CRN). Funding for the Wellcome Trust Case Control project was provided by the Wellcome Trust under awards 076113, 085475 and 090355.
Funding: This article presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. The funders had no role in the study design, data collection and analysis, data interpretation, the writing of the manuscript or the decision to submit the manuscript for publication. This study also represents independent re-

Supplementary data
Supplementary data are available at Rheumatology Online.