Genetic fine mapping of systemic lupus erythematosus MHC associations in Europeans and African Americans

Abstract
Genetic variation within the major histocompatibility complex (MHC) contributes substantial risk for systemic lupus erythematosus, but high gene density, extreme polymorphism and extensive linkage disequilibrium (LD) have made fine mapping challenging. To address the problem, we compared two association techniques in two ancestrally diverse populations, African Americans (AAs) and Europeans (EURs). We observed a greater number of Human Leucocyte Antigen (HLA) alleles in AA consistent with the elevated level of recombination in this population. In EUR we observed 50 different A-C-B-DRB1-DQA-DQB multilocus haplotype sequences per hundred individuals; in the AA sample, these multilocus haplotypes were twice as common compared to Europeans. We also observed a strong narrow class II signal in AA as opposed to the long-range LD observed in EUR that includes class I alleles. We performed a Bayesian model choice of the classical HLA alleles and a frequentist analysis that combined both single nucleotide polymorphisms (SNPs) and classical HLA alleles. Both analyses converged on a similar subset of risk HLA alleles: in EUR HLA-B*08:01 + B*18:01 + (DRB1*15:01 frequentist only) + DQA*01:02 + DQB*02:01 + DRB3*02 and in AA HLA-C*17:01 + B*08:01 + DRB1*15:03 + (DQA*01:02 frequentist only) + DQA*02:01 + DQA*05:01 + DQA*05:05 + DQB*03:19 + DQB*02:02. We observed two additional independent SNP associations in both populations: EUR rs146903072 and rs501480; AA rs389883 and rs114118665. The DR2 serotype was best explained by DRB1*15:03 + DQA*01:02 in AA and by DRB1*15:01 + DQA*01:02 in EUR. The DR3 serotype was best explained by DQA*05:01 in AA and by DQB*02:01 in EUR. Despite some differences in underlying HLA allele risk models in EUR and AA, SNP signals across the extended MHC showed remarkable similarity and significant concordance in direction of effect for risk-associated variants.


Introduction
Systemic lupus erythematosus (SLE) is a highly complex disease, with occurrence heavily influenced by genetics (heritability = 44%; (1)). SLE incidence varies markedly across populations, with Europeans (EURs) showing 3-4-fold lower prevalence compared with individuals of African or Asian ancestry (2,3). Genomewide association studies (GWAS) indicate a strong genetic signal arising from the major histocompatibility complex (MHC) in all populations studied (4)(5)(6). The association signals in the MHC have been studied in EURs (7) and East Asians (8)(9)(10). In EURs, the strength of the MHC signal seen in GWAS is driven by multiple separate genetic factors. Unravelling these different effects is hampered by extensive linkage disequilibrium (LD). Two SLE-associated haplotypes that exhibit extended LD have been described in EURs: the haplotypes include the HLA-DRB1 alleles, HLA-DRB1 * 03:01 and HLA-DRB1 * 15:01. These two haplotypes are also associated with other autoimmune diseases (11,12) and are often referred to by their tagging HLA-DRB1 alleles, with haplotypes containing DRB1 * 03 alleles being the 'DR3' serotype; haplotypes containing DRB1 * 15 or DRB1 * 16 alleles comprise the 'DR2' serotype. The actual causal alleles at the MHC in EURs are unknown, a somewhat surprising situation given the comparatively, in complex trait terms, large relative risk of at least two conveyed by MHC alleles. The limitation has principally been the extended LD at the MHC. In East Asian SLE the MHC risk is also strong, but may be slightly simpler than in EURs, the predominant risk arising from the extended haplotypes including HLA-DRB1 * 15:02 in LD with DQA1 * 01 and DQB1 * 05 or * 06 alleles (9,10). Investigation of the MHC associations in African Americans (AAs) has only previously been studied intensively in small cohorts and using limited genotyping (13) or as part of a larger scan of immune-related loci using the Immunochip (14) with limited information on HLA alleles. 
Small studies have implicated HLA-DRB1 * 15:03-DQA1 * 01:02-DQB1 * 06:02 (13), and a modest SNP-based study suggested that multiple MHC association signals were present (13). Population admixture is a complicating factor in the genetic analysis in AAs.
The greater prevalence of SLE in non-EUR populations motivates a trans-ancestral approach to fine mapping genetic association signals. We have previously employed this strategy at a genome-wide level (15), and we have fine-mapped individual loci, identifying a single polymorphism, likely to be causal, close to the transcription start of the SLE susceptibility gene TNFSF4 (16). In a small SNP-based study, we examined the pattern of association with SLE at the MHC in northern and southern EUR cohorts and in a Filipino population (10). Aligning the patterns of association suggested some similarity but revealed differences in LD around these association signals. These results suggest that a trans-ancestral fine-mapping strategy at the MHC is of value. A recent trans-ancestral study using the Immunochip (14) did look at HLA and SNP associations in the MHC, but it was not focused on the MHC and the analysis used a simple stepwise approach with a generous level of statistical significance for inclusion. The Immunochip study was also limited by a small number of AA ancestry samples in the reference data used for HLA imputation.
We have genotyped 1494 SLE cases and 5908 controls of AA ancestry for genetic markers within the MHC, as part of a GWAS. A total of 308 AA subjects were also genotyped for classical class II HLA alleles and included in the reference data for HLA imputation. These data were compared to an equivalent analysis of MHC data from a recent GWAS in a EUR population (4). We performed two parallel analyses to determine the model of association for HLA alleles: 1) an analysis guided by the a priori view of causality in the class II region and 2) a fully Bayesian model choice. The classical approach started from an assumption of association at class II loci and was motivated by the observed association signal in this area combined with the relatively short-range LD in the AA population. The Bayesian approach used Reversible Jump Markov Chain Monte Carlo (RJMCMC) simulation to search over all possible HLA models of association, with defined priors (see Materials and Methods) for genetic risk effects (odds ratios) and model size (the number of causal variants). We found that our two analysis strategies converged to very similar results for association in the HLA region.

Results
We analysed genetic data across the MHC in AA and EUR for association with SLE. The EUR data were taken from a previously published GWAS (4) comprising 4036 cases and 6959 controls. Post quality control (QC) (see Materials and Methods) there were 6079 SNPs in the MHC (Chr6, 26-34 Mb). 1494 cases and 5908 controls of AA ancestry, genotyped as part of a GWAS (unpublished), passed QC as did 4222 SNPs within the MHC.
We generated a new reference panel of HLA-typed individuals in a subset of the AA data. A total of 308 subjects were genotyped for classical class II HLA alleles (HLA * DQA, HLA * DQB and HLA * DRB1) by targeted sequencing of exons 2 and 3 (HLA-DQA and HLA-DQB) and exon 2 (HLA-DRB1) (17). These were added to the database of reference HLA genotypes for HLA imputation with the software HLA * IMP-V2 (18). We imputed HLA alleles in each population's data (see Materials and Methods) using HLA * IMP-V2 and also imputed amino acid data (see Materials and Methods).

Fine mapping the class II signal
We were interested in determining the most likely HLA alleles that explained the class II signal in the AA and EUR data in Figure 1. Therefore, we conducted a haplotype analysis followed by a model selection analysis (see Materials and Methods and Supplementary Material 1) in both populations. This approach began with a focus on the two most associated class II DR-DQ haplotypes in each population, representing DR2 and DR3 (Fig. 2B). We found that DR2 was best explained by DRB1 * 15:03 + DQA * 01:02 in AA and by DRB1 * 15:01 + DQA * 01:02 in EUR, while DR3 was best explained by DQA * 05:01 in AA and by DQB * 02:01 in EUR. These alleles are noted in Figure 2B-ii.

Stepwise regression on HLA alleles
Having determined the most likely explanation for the class II association peak in each population, we then conditioned on these models to find additional independently associated HLA alleles. We ran a forward stepwise regression on all HLA alleles starting from the class II HLA alleles just discussed (see Supplementary Material 2). This biased approach to stepwise regression, reassuringly, resulted in a set of HLA alleles very similar to that from the Bayesian model choice (Table 1). The colour codes in Figure 2 highlight which HLA alleles lay on the DR2 and DR3 risk haplotypes discussed above. Other alleles, such as class I B * 18:01 in EUR and C * 17:01 in AA, are associated in addition to and independently of the risk haplotypes.

Associations conditional on the HLA alleles
To search for SNP associations in addition to and independent of HLA alleles, and to understand the independent regional HLA associations, we ran stepwise regression conditional on various sets of HLA alleles. Figure 3 displays association results in a sequential fashion conditional on various sets of associated HLA alleles. Figure 3A and B show the results after conditioning on the best model of association at class II; Figure 3C and D show the results after conditioning on the best model of association for class II including the extended ancestral MHC DR3 haplotype (see Supplementary Material 4), which is effectively the class I signal from HLA-B8. Figure 3E and F show residual association after removing the signals from the best model of all HLA alleles. After conditioning on the top HLA class II association signals in each cohort, it is apparent that both cohorts show evidence of additional association signals close to the junction of MHC classes I and III regions. Class I HLA-B8 (or variants highly correlated with it) makes a major contribution to both of these association signals, as the association spike is markedly diminished when conditional on HLA-B * 08:01. Interestingly, when conditioning on the best overall model for HLA association there is limited evidence for further signals in the EUR cohort; however, there remains clear evidence for further association in the AA cohort in the class III region (Fig. 3F).
The stepwise regression on SNPs only, using each population's data and conditioning on the respective HLA alleles in Figure 2iii, returned two SNPs in the EUR data (rs146903072: P-value = 3.93 × 10^−06, OR = 1.82 95% CI 1.39-2.37, 31,847,180 bp, intergenic SLC44A4-EHMT2; rs501480: P-value = 9.84 × 10^−06, OR = 1.15 95% CI 1.08-1.22, 33,563,946 bp, intergenic GGNBP1-LINC00336) and two SNPs in the AA data (rs389883: P-value = 4.37 × 10^−08, OR = 1.76 95% CI 1.31-1.76, 31,947,460 bp, intron STK19; rs114118665: P-value = 5.76 × 10^−06, OR = 2.37 95% CI 1.56-3.60, 31,342,005 bp, intergenic HLA-B-MICA). The two associated SNPs in the AA data are not in LD with the two associated SNPs in the EUR data (R^2 < 0.01 in all pairings, in both populations). We found no evidence of association for the AA SNPs in the EUR data (as single markers or conditional on the HLA alleles) and vice versa.

The HLA-DQ heterodimer risk profile
As the cell surface HLA-DQ molecule is a heterodimer with variation in both its alpha (coded DQA) and beta (coded DQB) chains, we explored the hypothesis that a combination of DQA and DQB alleles would be a better model fit than including the alleles as independently associated. We found no evidence (see Materials and Methods) in favour of an interaction model between any pair of DQA and DQB alleles. Furthermore, we found no specific combination of DQA and DQB alleles that fit the data better than simple additive models. This suggests that the effects of DQA and DQB alleles are independent.

Two-digit DRB1 * 15 association and amino acid data
We looked closely at the association signals for HLA alleles nested within the two-digit HLA-DRB1 * 15 group, as these alleles are consistently associated with SLE across major populations yet differ in frequency and in the most associated allele. (P-value = 1.86 × 10^−01, OR = 0.81 95% C.I. = 0.59-1.12) for association. DRB1 * 15:02 has been found to be associated in East Asians (9); DRB1 * 15:01 has also been found to be associated in this population (19). We tested a one-parameter two-digit DRB1 * 15 allele model against a three-parameter model (a separate odds ratio for each allele: DRB1 * 15:01 + DRB1 * 15:02 + DRB1 * 15:03) in the AA data. We did find weak evidence (P-value = 0.02) to reject the two-digit model using a likelihood ratio test; however, the Bayesian Information Criterion (BIC) favoured the two-digit model (difference in BIC = 10.37). This has some biological significance, as the three HLA alleles share the same amino acid residue at position 71 (A) and no other HLA-DRB1 allele amongst those imputed in the AA dataset codes for this residue at this position. The two-digit model of association is therefore equivalent to an amino acid residue association.
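The apparent tension between the likelihood ratio test and the BIC can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that the two models differ by two parameters, that the sample size entering the BIC is the 1494 cases plus 5908 controls reported for the AA data, and that the P-value is the rounded 0.02 quoted above. The function names are hypothetical.

```python
import math

def lrt_statistic_from_p(p_value: float) -> float:
    """Invert the chi-square survival function for 2 degrees of freedom.

    For 2 df the survival function is exp(-x / 2), so x = -2 * ln(p).
    """
    return -2.0 * math.log(p_value)

def bic_difference(lrt_stat: float, extra_params: int, n: int) -> float:
    """BIC(larger model) - BIC(smaller model).

    BIC = k * ln(n) - 2 * logLik, so the difference is the extra penalty
    minus the gain in fit. A positive value favours the smaller model.
    """
    return extra_params * math.log(n) - lrt_stat

# Assumed sample size: AA cases + AA controls as reported in the text.
n_subjects = 1494 + 5908
stat = lrt_statistic_from_p(0.02)  # rounded P-value from the text
delta_bic = bic_difference(stat, extra_params=2, n=n_subjects)
print(f"LRT statistic = {stat:.2f}, BIC difference = {delta_bic:.2f}")
```

Under these assumptions the penalty of 2 ln(n) outweighs the improvement in fit, giving a BIC difference of roughly 10 in favour of the one-parameter model, which is in line with the value reported above.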

Comparison of HLA, amino acid and SNP models of association
An important question is whether the association signal across the MHC can be best explained by SNPs, HLA alleles or amino acid residues. To explore this we compared our results for HLA association to those obtained by stepwise regression analyses on amino acid and SNP data (Table 3). In both populations we found that the amino acid models were a poorer fit than HLA alleles, as judged by the Akaike Information Criterion (AIC) or BIC. In the AA data, the HLA model was the best overall fit. In the EUR data, the SNP model was the best fit. The SNP model in the AA data is likely not tagging all the SLE-associated variation; in support of this interpretation, we found two further independent HLA associations, namely HLA * DQA * 05:05 and HLA * DRB1 * 13:04, conditional on the four SNPs noted in Table 4. The HLA alleles tagged by the SNP models can be seen in Supplementary Material, Fig. S3, and for reference the full set of HLA frequencies and associations can be seen in Supplementary Material, Fig. S4.

Autoantibody sub-phenotypes
We had data available on autoantibody levels in both populations, so we exploited this and present here novel cross-population genetic association analyses of these phenotypes.

Discussion
Our analyses of SNP, HLA and amino acid data in the MHC in an AA and a EUR population have identified the key HLA alleles that are associated with SLE, together with two SNPs independently associated in each population. We found that models using HLA alleles were a better fit to the data than amino acid models in both the AA and EUR data. There is a similar landscape of association, with two independent class II associations in both populations.
Our results for HLA associations are not the result of a single analysis using stepwise regression, as is common in analysis of a single region such as the MHC. We used two approaches: a frequentist approach to decomposing class II-associated haplotypes followed by conditional analyses, and a Bayesian model choice that searches over the full model space of HLA alleles. The two approaches resulted in largely the same set of HLA alleles, while the Bayesian approach was more parsimonious by only including DQA * 01:02 as associated in the EUR data, rather than both DRB1 * 15:01 and DQA * 01:02. In addition, the Bayesian approach included only DRB1 * 15:03 as associated in the AA data, rather than both DRB1 * 15:03 and DQA * 01:02. In both cases the pair of alleles is in LD (r^2 = 0.61 and r^2 = 0.37 in each population, respectively), and this discrepancy between the approaches demonstrates that some uncertainty remains on this particular haplotype. There is some suggestion that the DRB1 * 15 two-digit allele could be the best explanation in both populations for one of the main class II haplotypes associated, and this could be further explained by a specific amino acid coding at position 71 (A) for DRB1 * 15:01, DRB1 * 15:02 and DRB1 * 15:03.
The class II DR3 haplotype harbouring the commonly observed SLE-associated DRB1 * 03:01 allele was best explained by DQB * 02:01 in the EUR data and DQA * 05:01 in the AA data. The LD between these two alleles is much lower in the AA than EUR data (r^2 = 0.33 versus r^2 = 0.92); thus there is more power to resolve the DR3 class II associations in AAs. Our results suggest that DQA * 05:01 is the most likely causal HLA class II allele on this haplotype. This and the lack of extended LD, as illustrated in Figure 1, suggest that the AA data have been very useful here in fine mapping both the HLA alleles and independently associated SNPs. Both populations have evidence of additional independent associations in class I with B * 08:01 being a consistent associated allele in the two populations.
Our findings of SNP associations independent of HLA alleles do show some consistency in the identification of two class II/III SNPs independently associated in both populations, but they also highlight some uncertainty and hence the need for more extensive sequencing at the MHC including accurate HLA typing.
We find novel HLA-DQ associations in the AA data (DQA * 02:01, DQA * 05:05 and DQB * 02:02). There is no difference in the peptide-binding groove when replacing DQA * 05:05 with DQA * 05:01, which captures the DR3 signal in the AA represented by DQB * 02:01 in the EURs. The only difference between the two products is in the 11th codon in the leader sequence [position −13; DQA * 05:01 has GCC (alanine, non-polar and hydrophobic); DQA * 05:05 has ACC (threonine, polar and hydrophilic)]. Therefore, the primary amino acid sequences of the two mature proteins are identical and should exhibit identical disease susceptibility. However, we did not sequence exon 1 of DQA; hence the genotyping is dependent on imputation and this, together with DQA * 05:05 being rare in AA, leads to some uncertainty in this allele's association.
The associations of the DQA * 02:01 and DQB * 02:02 alleles seem complex, as these two HLA alleles are in LD with one another (R^2 = 0.87 in the AA data); they show conditional association with a likely dominant effect for DQA * 02:01 (OR = 0.67; 95% C.I. = 0.60-0.76; P-value = 1.31 × 10^−11). It seems that DQB * 02:02 only has a significant risk effect when conditioned on the protective (possibly dominant) effect of DQA * 02:01. We find no evidence of interaction between HLA-DQA * 02:01 and HLA-DQB * 02:02. Because the two alleles are in strong LD, this result could be due to omitted variable bias, which would result in each allele's effect being shrunk towards zero when both correlated variables are not included in a model of association.
We found a significant association between B * 08:01 and anti-Ro antibodies in a case-only analysis of the EUR data (OR = 2.03 95% CI 1.74-2.36; P-value = 4.02 × 10^−19). While a class I SNP was more associated than the HLA allele, due to imputation uncertainty we cannot rule out this HLA allele as more likely causal, which would be an interesting finding in the light of the suspected role of Epstein-Barr virus (EBV) in SLE pathogenesis. B8 binds an immune-dominant peptide from EBV EBNA antigen (20,21). This association was also seen in the AA data, but it was less significant (OR = 1.67 95% CI 1.16-2.41; P-value = 6.13 × 10^−03).
In summary this study substantially extends our understanding of MHC association in SLE with the inclusion of a large-scale study of AA samples and combining with a new analysis of a large EUR dataset. We have novel HLA typing included in a subset of the AA dataset, which greatly improves imputation. We find similarity between the AA and EURs in their pattern of association across the MHC using novel and coherent fully Bayesian analyses to determine the best model of association with HLA. The AA data highlight strong evidence for association at class II independent of other loci. This has shown that comparing the results of the MHC associations in EURs and AAs assists in fine mapping these signals.

Materials and Methods
Post QC there were 4222 SNPs within the MHC. SNPs were removed if they had greater than 2% missing data across all samples, a P-value <0.05 for a test of differential missing data between cases and controls, a Hardy-Weinberg equilibrium test in cases with P-value <10^−04 or a Hardy-Weinberg equilibrium test in controls with P-value <10^−02.
Samples were removed if their call rate was <90% across good-quality SNPs, if they had excess autosomal heterozygosity or if their genetically determined sex differed from their reported sex.
Additionally, duplicate samples and first-degree relatives were removed.
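The SNP-level exclusion rules described above can be expressed as a single predicate. The following is a minimal sketch, not the pipeline used in the study: the `SnpQc` record and its field names are hypothetical, and only the four thresholds are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class SnpQc:
    """Per-SNP QC summary (hypothetical field names)."""
    missing_rate: float     # fraction of samples with a missing call
    p_diff_missing: float   # P-value, differential missingness cases vs controls
    p_hwe_cases: float      # P-value, Hardy-Weinberg test in cases
    p_hwe_controls: float   # P-value, Hardy-Weinberg test in controls

def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all four filters stated in the text."""
    return not (
        snp.missing_rate > 0.02       # more than 2% missing data
        or snp.p_diff_missing < 0.05  # differential missingness
        or snp.p_hwe_cases < 1e-4     # HWE departure in cases
        or snp.p_hwe_controls < 1e-2  # HWE departure in controls
    )

# Illustrative values only.
good = SnpQc(missing_rate=0.01, p_diff_missing=0.50, p_hwe_cases=0.30, p_hwe_controls=0.20)
bad = SnpQc(missing_rate=0.01, p_diff_missing=0.50, p_hwe_cases=0.30, p_hwe_controls=0.005)
print(passes_qc(good), passes_qc(bad))
```

Note that the Hardy-Weinberg threshold is stricter in controls than in cases, as stated in the text; a genuine disease association can itself produce a departure from equilibrium among cases.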
A total of 308 subjects were also genotyped for classical class II HLA alleles (HLA * DQA, HLA * DQB and HLA * DRB1) by targeted sequencing of exons 2 and 3 (HLA-DQA and HLA-DQB) and exon 2 (HLA-DRB1) (17). This set included the 'HLA reference set' used for HLA imputation into the rest of the AA study. These were added to the database of reference HLA genotypes for HLA imputation with the software HLA * IMP-V2 (18).

SNP imputation
All AA and EUR subjects were imputed up to the 1000 Genomes (Phase I integrated set V3 March 2012) density using post-QC typed SNPs using IMPUTE (22). All populations' reference data were used for imputation in the AA and EUR data as advised by the authors of IMPUTE. We set a quality threshold of 0.7 for IMPUTE INFO score and only analysed SNPs with scores above this level.

HLA allele imputation
HLA genotypes for HLA-A, HLA-B, HLA-C, HLA-DQA, HLA-DQB and HLA-DRB1 were imputed into the AA data using HLA * IMP-V2 (18). The same procedure was used to impute HLA alleles in the EUR data for the classical HLA genes: HLA-A, HLA-B, HLA-C, HLA-DQA, HLA-DQB, HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5 and HLA-DPB1. While the same reference data were used to impute both the AA and EUR data, the additional HLA alleles imputed in the EUR data were not supported for multi-ethnic samples in the HLA * IMP algorithm and so were not imputed in the AA data. HLA * IMP-V2 uses multi-ethnic samples as reference data, including data from the 1958 British Birth Cohort, 1000 Genomes subjects and additional, mainly EUR, data provided by GlaxoSmithKline. Full details of these samples can be seen in the publication paired with this software (18). The AA samples that we contributed to the reference data increased the size of the AA/African background set, which was 28, 34 and 28 for HLA-DQA1, HLA-DQB1 and HLA-DRB1, respectively, by over 10-fold.
For regression analyses we took the probabilistic genotypes (rather than best guess) output and converted to dosage (expected allele counts). For phasing and haplotype analyses we took the best guess genotypes.

HLA imputation assessment
HLA * IMP-V2 (18) performs cross validation on all reference samples (two-thirds are used for reference and one-third, for validation) as an indicative evaluation of imputation performance. The results of this can be seen in Supplementary Material, Table S1 for the AA data on subjects in the 'African' HLA-IMP-V2 reference data combined with our contributed AA samples. This table also contains HLA-A, HLA-B and HLA-C; however these analyses were performed on reference samples outside of our study.
We also performed our own imputation accuracy assessment on the 308 HLA-typed subjects that were also included in our association study. These results can be seen in Supplementary Material, Tables S2-S4. This assessment is biased upwards for accuracy estimation, as the samples tested were also in the reference panel. However, the results are comparable with those returned by HLA * IMP-V2, which performed leave-one-third-out cross-validation on data that included our samples, with HLA-DRB1 performing slightly worse than HLA-DQA and HLA-DQB.

Amino acid translation
Amino acid sequences for each HLA allele were extracted from the European Bioinformatics Institute HLA database (http://www.ebi.ac.uk/ipd/imgt/hla/). HLA allele dosages were converted to amino acid dosages at each position; the dosage for a particular amino acid 'A' at position 'p' is the sum of the dosages of the HLA alleles that code for amino acid 'A' at position 'p'. The total dosage for each position is therefore equal to 2, and this total is split among the amino acids possible at that position.
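The conversion from allele dosages to amino acid dosages can be sketched as a grouped sum. This is an illustrative example rather than the code used in the study: the function name, the subject's dosages and the residues assigned to the non-DRB1 * 15 alleles are placeholders, while the shared residue A at position 71 for the DRB1 * 15 alleles follows the text.

```python
from collections import defaultdict

def amino_acid_dosages(allele_dosage, residue_at_position):
    """Sum HLA allele dosages by the residue each allele encodes at one position."""
    totals = defaultdict(float)
    for allele, dosage in allele_dosage.items():
        totals[residue_at_position[allele]] += dosage
    return dict(totals)

# Residue lookup at one position. The 'A' entries follow the text; the other
# residues are placeholders for illustration, not taken from the database.
residue_at_71 = {
    "DRB1*15:01": "A",
    "DRB1*15:03": "A",
    "DRB1*03:01": "K",
    "DRB1*07:01": "R",
}

# Imputed allele dosages for one hypothetical subject; they sum to 2.
subject = {
    "DRB1*15:01": 0.10,
    "DRB1*15:03": 0.90,
    "DRB1*03:01": 0.95,
    "DRB1*07:01": 0.05,
}

dosages = amino_acid_dosages(subject, residue_at_71)
print(dosages)
```

Because every allele maps to exactly one residue at a given position, the amino acid dosages at that position always sum to the same total as the allele dosages, which is 2 for a diploid subject.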

Phasing
The HLA data were phased together with the SNP data using BEAGLE (23) to aid the classical statistical analysis of the SLE HLA risk haplotypes.

AA admixture analysis
The AA data were subject to an analysis for admixture using ADMIXTURE (24) on an LD-pruned dataset containing the AA samples as well as Hapmap3 (CEU, CHB and YRI) samples as anchoring populations. The resulting admixture estimates were used to remove genetic outliers. We also used this analysis to infer a set of subjects with a lower content of non-African derived haplotypes. This analysis was performed on genomewide SNP data and on MHC-wide SNP data; results can be seen in Supplementary Material, Figure S4. The set of subjects chosen for HLA typing was all within the African cluster in the MHC-wide admixture analysis. We created a 'more African' subset of the AA data by removing AA subjects that were in the top 25th percentile of the non-African derived haplotypes estimate, which would have retained all Africans in the HapMap data; the data consisted of 1375 cases and 5414 controls. We refer to these data as AA sub.

Statistical analysis
Study design. We began with parallel frequentist and Bayesian association tests to determine the best underlying HLA risk model for SLE. After determining the best model of association at the HLA, we conditioned on this model, using classical stepwise regression, and tested for further association with SNPs. A workflow can be seen in Figure 4; we expand on each step in the description below. We also tested for association with SLE subphenotypes using classical stepwise regression.

Association analysis.
Association analyses were performed in R (25) using logistic regression. SLE status was coded as 0 (healthy controls) and 1 (cases). The SNP and HLA data were coded as minor allele counts (0 ≤ g ≤ 2), with imputed SNPs and HLA alleles coded as expected allele counts where the expectation was taken from the imputation probabilities: Expectation = 0 × P(G = 0) + 1 × P(G = 1) + 2 × P(G = 2), where P(G = j), for j = 0, 1, 2, is the probability of 0, 1 or 2 copies of the HLA or SNP reference allele. These probabilities were taken from the output of HLA * IMP-V2. Covariates derived from an admixture analysis using ADMIXTURE (24) were used to account for population structure in the AA data. Our AA data were combined with HapMap European (CEU), African (YRI) and Asian (CHB + JPT) populations and we used the admixture proportions of CEU and YRI as covariates (the third proportion, assumed to be of Asian ancestry, being redundant as all sum to 1).
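The expected allele count defined above is a probability-weighted sum, and it can be contrasted with the best-guess genotype that the text uses for phasing and haplotype analyses. The sketch below uses hypothetical function names and made-up probabilities for a single subject.

```python
def expected_dosage(p0: float, p1: float, p2: float) -> float:
    """Expected allele count: 0*P(G=0) + 1*P(G=1) + 2*P(G=2)."""
    if abs((p0 + p1 + p2) - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return 0.0 * p0 + 1.0 * p1 + 2.0 * p2

def best_guess(p0: float, p1: float, p2: float) -> int:
    """Most probable genotype, as a count of 0, 1 or 2 copies."""
    probs = (p0, p1, p2)
    return probs.index(max(probs))

# Illustrative imputation probabilities for one subject at one allele.
probs = (0.1, 0.7, 0.2)
print(expected_dosage(*probs), best_guess(*probs))
```

The dosage keeps the imputation uncertainty (here a value between 1 and 2), whereas the best-guess call discards it; this is one reason to prefer dosages as the regressor in an association test.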

Analysis of extended MHC haplotypes.
We used likelihood ratio testing between nested models of association with each of the SLE-associated class II haplotypes to find the best set of alleles that explained the association. This was complemented by checking the AIC and BIC for each model.
For example, in Supplementary Material,

Bayesian association analysis.
A Bayesian model selection was performed on the HLA data using the association studies toolkit for WinBUGS, employing a reversible jump algorithm on the model space, in the Markov Chain Monte Carlo (MCMC) framework (28). This approach used a probit link (rather than the logit link commonly used for case-control association studies). The advantage is that the MCMC algorithm samples from an underlying normally distributed variable (z_i), where the probability of disease for subject i is defined as p(z_i > 0 | M_i) and the mean parameter M_i depends on a regression on the genotype values: M_i = beta × G_i, with G_i the genotype (the number of minor alleles for individual i) and beta the regression parameter. We made simple prior assumptions: first, that the magnitude of genetic effect (odds ratio) could with non-negligible probability be in the range 0.25-4, and second, that the genetic model would be most likely to have 3-5 genetic effects but much less likely to have more than 10 effects. For the prior on model size we therefore used a Poisson distribution with mean parameter equal to 4; however, we tested the robustness of our approach by re-running the analyses with Poisson(3) and Poisson(5). For the prior on the effect sizes we used a normal distribution with mean = 0 and variance = 0.25. This reflects the belief that the beta parameter is relatively unlikely to be larger than 1 (two standard deviations in our prior). A value of 1 on the probit scale, with sample sizes similar to the ones in our study, translates to a relative risk of ∼1.7, and so most of our prior belief in the relative risk is between 0.5 and 2, while values below 0.5 and above 2 are allowed but with less belief. It is important to have informative priors in Bayesian model choice, as vague priors can overly favour the null model (zero effect size or, equivalently, no explanatory variables in the chosen model). Our priors are informative but not overly so, reflecting the commonly observed risk effects in GWAS.
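The prior statements above can be made concrete by computing the probabilities they imply. The sketch below is a numerical illustration only; it is not part of the WinBUGS analysis, and the function names are hypothetical. It evaluates the Poisson prior with mean 4 on model size and the normal prior with variance 0.25 on the effect size.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k effects under a Poisson prior."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def normal_two_sided_tail(x: float, sd: float) -> float:
    """P(|beta| > x) for beta ~ Normal(0, sd^2)."""
    return math.erfc(x / (sd * math.sqrt(2.0)))

# Prior on model size: Poisson with mean 4, as stated in the text.
p_size_3_to_5 = sum(poisson_pmf(k, 4.0) for k in range(3, 6))
p_size_over_10 = 1.0 - sum(poisson_pmf(k, 4.0) for k in range(0, 11))

# Prior on effect size: Normal(0, variance 0.25), so the sd is 0.5
# and beta = 1 lies two standard deviations from the mean.
p_beta_over_1 = normal_two_sided_tail(1.0, sd=math.sqrt(0.25))

print(f"P(3 <= size <= 5) = {p_size_3_to_5:.3f}")
print(f"P(size > 10)      = {p_size_over_10:.4f}")
print(f"P(|beta| > 1)     = {p_beta_over_1:.3f}")
```

These values show a little over half of the prior mass on models with 3-5 effects, well under 1% on models with more than 10, and roughly 5% on an absolute effect larger than 1 on the probit scale.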
The MCMC model fitting in WinBUGS is a computationally expensive exercise; however, it was feasible to get results within a period of 2 days. The MCMC framework is a sampling-based technique that requires convergence. With the current AA and EUR data we found that running six chains in parallel, each of 80 000 samples with a burn-in period (where samples are discarded) of 20 000, was sufficient. This required a 12-core desktop PC with two 2.4 GHz Xeon processors and utilized 10 GB of RAM.
The HLA-DQ heterodimer risk profile. We tested for interaction between all DQA-DQB pairs noted in Figure 2. For example, in the case of EUR we tested for interaction between DQA * 01:02 and DQB * 02:01.
We also created a variable from the product of the two DQA and DQB pairs and tested this as a sole variable in the regression; we then compared the AIC and BIC for this single-variable model to the two-parameter models (independent additive effect for the DQA and DQB alleles). This single-parameter model captures risk attributable to the specific DQ molecules created by the pairing. For example, the variable created from the product of DQA * 01:02 × DQB * 02:01 gives the expected number of DQA * 01:02/DQB * 02:01 molecules that could be expressed by an individual: an individual with one copy of DQA * 01:02 and two copies of DQB * 02:01 can make two molecules consisting of DQA * 01:02/DQB * 02:01, while an individual with two copies of DQA * 01:02 and two copies of DQB * 02:01 can make four molecules consisting of DQA * 01:02/DQB * 02:01.
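The product variable described above can be written out directly. This minimal sketch uses a hypothetical function name and reproduces the two worked cases from the text; it also lists the additive coding for comparison.

```python
def dq_heterodimer_count(dqa_copies: float, dqb_copies: float) -> float:
    """Number of distinct alpha/beta pairings for one DQA allele and one DQB allele.

    Each copy of the DQA allele can pair with each copy of the DQB allele,
    so the count is the product of the two allele counts (or dosages).
    """
    return dqa_copies * dqb_copies

# Compare the product coding with the additive coding over all genotype pairs.
for dqa in (0, 1, 2):
    for dqb in (0, 1, 2):
        product = dq_heterodimer_count(dqa, dqb)
        print(f"DQA={dqa} DQB={dqb} product={product} additive=({dqa}, {dqb})")
```

The product is zero whenever either allele is absent, so the single-variable model assigns risk only to individuals who can form the specific heterodimer, whereas the additive model assigns risk to carriers of either allele alone.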

Ethics
Ethical approval was obtained from the institutional review committee of King's College London (Study Ref: 07/H0718/049). All patients with SLE and healthy controls were given information sheets and verbal explanations of what the research entailed. Informed written consent was obtained from all subjects.

Supplementary Materials
Supplementary Materials are available at HMG online.

Web resources
Summary association data for this study are available at http://insidegen.com/insidegen-LUPUS-data.html