Local genetic covariance between serum urate and kidney function estimated with Bayesian multitrait models

Abstract Hyperuricemia (serum urate >6.8 mg/dl) is associated with several cardiometabolic and renal diseases, such as gout and chronic kidney disease. Previous studies have examined the shared genetic basis of chronic kidney disease and hyperuricemia in humans either using single-variant tests or estimating whole-genome genetic correlations between the traits. Individual variants typically explain a small fraction of the genetic correlation between traits, thus the ability to map pleiotropic loci is lacking power for available sample sizes. Alternatively, whole-genome estimates of genetic correlation indicate a moderate correlation between these traits. While useful to explain the comorbidity of these traits, whole-genome genetic correlation estimates do not shed light on what regions may be implicated in the shared genetic basis of traits. Therefore, to fill the gap between these two approaches, we used local Bayesian multitrait models to estimate the genetic covariance between a marker for chronic kidney disease (estimated glomerular filtration rate) and serum urate in specific genomic regions. We identified 134 overlapping linkage disequilibrium windows with statistically significant covariance estimates, 49 of which had positive directionalities, and 85 negative directionalities, the latter being consistent with that of the overall genetic covariance. The 134 significant windows condensed to 64 genetically distinct shared loci which validate 17 previously identified shared loci with consistent directionality and revealed 22 novel pleiotropic genes. Finally, to examine potential biological mechanisms for these shared loci, we have identified a subset of the genomic windows that are associated with gene expression using colocalization analyses. The regions identified by our local Bayesian multitrait model approach may help explain the association between chronic kidney disease and hyperuricemia.


Introduction
Chronic kidney disease (CKD) carries significant global health and economic burden (Hill et al. 2016;Bikbov et al. 2020). CKD stages 3-5 manifest as decreased renal function and are defined by elevated serum creatinine (sCr) or estimated glomerular filtration rate (eGFR) <60 ml/min/1.73 m 2 . Hyperuricemia is defined by serum urate (sU) concentration >6.8 mg/dl and is contributed to by deteriorating renal function (Sun et al. 2018). Hyperuricemia has several comorbidities associated with it, including CKD and gout (Clarson et al. 2015;Sun et al. 2018;Singh et al. 2019). Among people with hyperuricemia, there is a higher prevalence of CKD, and among patients with CKD, sU concentrations are higher (Zhu et al. 2012;Jing et al. 2018).
Genome-wide analyses have demonstrated that the association observed between eGFR and sU has a genetic basis. Tin et al. (2019) carried out a large-sample trans-ethnic genomewide association study (GWAS) of sU and, through cross-trait linkage disequilibrium (LD) score regression, obtained an estimate of overall genetic correlation between eGFR and sU of À0.26 (SE of 0.04). This was one of the largest negative correlations with sU out of 748 traits analyzed (Tin et al. 2019). Reynolds et al., using 2 large family-based datasets and Bayesian whole-genome regressions, obtained global genetic correlations between sCr (which has a direct inverse relationship to eGFR, hence the directionality difference between the estimates) and sU of 0.20 [95% credibility region (CR): 0.07, 0.33] in one dataset and 0.25 (95% CR: 0.07, 0.41) in the other (Reynolds et al. 2021). While these estimates contribute to dissecting biological causes of the observed comorbidities, the shared pleiotropic genomic regions and underlying biological mechanisms are only reliably discovered by estimating local genetic covariances (Shi et al. 2017).
GWAS of sU and eGFR have identified numerous loci associated with each phenotype separately. A recent study comparing large GWAS of these traits identified 36 shared loci (Leask et al. 2020). However, the GWAS methods used to detect the shared signals are based on the marginal association of individual single-nucleotide polymorphisms (SNPs) with phenotypes, thus not accounting for LD between SNPs. Our method improves over postanalysis of GWAS summary statistics by estimating neighboring SNP effects concomitantly. Incorporating local LD to estimate genetic effects in a tightly segregating chromosomal segment has been previously suggested to account for the correlation between SNPs (Vilhjá lmsson et al. 2015;Fernando et al. 2017;Funkhouser et al. 2020). Additionally, our methodology implements a multitrait model so we obtain direct genetic covariance estimates.
In this study, we aimed to characterize the common genetic basis for CKD (eGFR) and hyperuricemia (sU levels) by identifying pleiotropic genomic regions. To achieve this goal, we identified the local regions contributing to genetic variances and covariances across the whole genome (Funkhouser et al. 2020). We used Bayesian multitrait models to estimate the genetic (co)variances. SNP effects were estimated in large DNA regions and genetic variances and covariances were calculated from the posterior means per LD window. We identified 64 unique local genetic regions with significant local genetic covariance, including previously implicated and novel shared loci.

Participants
This study was based on 333,542 Caucasian participants from the UK Biobank. Participants missing sU or sCr for both of their 2 visits were excluded from the analysis. We excluded close relatives with relatedness !0.1, estimated using the R package BGData (Grueneberg and de los Campos 2019) (see details in the Supplementary Methods).

Genotypes and phenotypes
The UK Biobank used the custom UK Biobank Axiom Array by Affymetrix to genotype study participants (Affymetrix 2021). Quality control involved removing SNPs that had a minor allele frequency <1% or a missing call rate >5%, resulting in 607,490 autosomal chromosomes (1-22) SNPs (Kim et al. 2017).
Serum urate and sCr data were obtained from the first visit. For the small number of participants (0.28%) that did not have phenotype data of interest collected at the first visit, we retrieved data from the second visit. sCr was used to define eGFR and details on this can be found in the Supplementary Methods. For both eGFR and sU, we took a log transformation to normalize their distributions and preadjusted by age, sex, and the first 5 SNP-derived principal components using ordinary least squares.

Local Bayesian multitrait models
We estimated local (co)variances by fitting Bayesian models to chromosomal segments with a nonoverlapping core of 1,000 contiguous SNPs (between 3 and 4 Mbp depending on the region). We included 2 overlapping flanking regions each consisting of 250 SNPs to each side of the core. The SNPs in the flanking regions were included to account for the effects of SNPs that were outside of the core region but possibly in LD with SNPs in the core segment. Whole-genome regressions have been used to fit several markers concomitantly [e.g. Vazquez et al. (2012)]. However, biobank data impose computational restrictions due to its large dimensions. In the context of a single trait, local Bayesian conditional regressions have been employed to deal with the computational burden (Funkhouser et al. 2020). In their study, the authors indagated sex differences in genetic effects in single-trait models. Here, we utilized the idea of conditional regressions in large chunks of DNA with flanking regions in the context of a multitrait Bayesian model. This provides posterior estimates of variances and covariances between traits to find pleiotropic regions. The linear model used had the form Y¼ 1l 0 þ Xb þ E, where Y nÂ2 is a matrix containing the preadjusted phenotypes, l 2Â1 is a vector of trait-specific intercepts, X nÂ1;500 is an SNP-genotype matrix (1,000 core SNPs plus 250 flanking SNPs to each side), b 1,500Â2 is a matrix of SNP effects, and E nÂ2 is a matrix of error terms. The error terms were assumed to be IID multivariate normal with a mean of zero and covariance Var e i ð Þ¼R2Â2, where e i is the ith row of E. We used IID priors with a point of mass at zero and a bivariate Gaussian slab with a mean of zero and (co)variance matrix R 2Â2 . The extent of shrinkage and variable selection was influenced by 3 groups of parameters: R, R, and the prior proportion of nonzero effects, p. For a 2-trait model, p¼fp 1 , p 2 g and represents the prior probability of nonzero effects for traits 1 and 2 (sU and eGFR), respectively. We treated the fR, R, pg parameters as unknown and we assigned Inverse-Wishart priors for the (co)variance matrices and Beta priors for the prior probability of nonzero effects.
We used the multitrait function from the BGLR R package available in the R CRAN (Pé rez and de los Campos 2014) to generate 5,000 samples from the posterior distribution for each chromosomal segment. We filtered the samples of the SNP effects collected using a burn-in of 250 SNPs and a thinning interval of 10, thus retaining 475 samples for further inference.

Defining local LD-based windows
After we obtained the model estimates, for each core segment SNP we defined an LD window that contained correlated, neighboring SNPs with an overlapping sliding technique (Fernando et al. 2017;Funkhouser et al. 2020). Within each LD window, we collected the corresponding estimated effects and computed (co)variance estimates (described below). For each seed SNP x ij (i ¼ 1,. . .,n individuals and j ¼ 1,. . .,p core segment SNPs) coming from the core segment of SNPs, we sequentially identified SNPs in both directions (x ij* ) surrounding the seed SNP and included them in window j if Corr(x ij , x ij* ) ! 0.1. In a simplified example, if SNP x ij had an adequate pairwise correlation with 2 SNPs to the left, and 1 SNP to the right, the window for that SNP would be defined as the set of SNPs: Our definition of an LD sliding window also involved an allowance for 1 SNP in the sequential process to not meet this correlation criterion, to allow for a brief loss of LD or minor mapping errors, and the SNP was still included in the LD window. In the previous example, if Corr(x ij , x ijÀ1 ) < 0.1, and Corr(x ij , x ijÀ2 ) ! 0.1, then the set would still include both x ijÀ2 and x ijÀ1 . The LD window ends when 2 SNPs sequentially did not meet the criteria described above. The LD windows could include flanking buffer SNPs, but buffer SNPs were never used to define an LD window.

Local (co)variances
For each LD window, we computed the local variances for traits 1 and 2 and the local and covariances using Here, X w is the matrix containing the genotypes of the SNPs in the wth window and b w1s and b w2s are the samples of effects of those SNPs for traits 1 and 2 collected at the sth iteration of the sampler. This generated samples from the posterior distribution of the local (co)variances, which we used to produce posterior mean estimates (by averaging across the samples from the posterior distribution), estimate posterior SDs, and obtain 95% posterior CRs. As discussed in Lehermeier et al. (2017), this approach accounts for the contribution of local LD to genetic (co)variances and, by averaging over samples from the posterior distribution, for uncertainty about SNP effects.
Gene expression/eQTL analysis A colocalization analysis was performed between GWAS significant markers for sU and sCr and the publicly available eQTL data from Genotype Tissue Expression (GTEx) V8 (Giambartolomei et al. 2014). The R package COLOC was used, which implements a Bayesian test that analyses a single genomic region and identifies LD patterns in that locus using SNP summary statistics and the associated minor allele frequencies. The lead variant for both sCr and sU was used at each significant covariance window with a surrounding 500 kb buffer in the GTEx database. The contextualizing developmental SNPs using 3D Information algorithm (Fadason et al. 2018; Genome3d/Codes3d-V2 [2019] 2021) was modified to identify long-distance regulatory relationships for the lead sU and sCr variants at each significant covariance window within a 500-kb region. eQTL data for variants 6500 kb of the lead variant were also extracted from GTEx and then COLOC was used to assess if the significant cis-and trans-eQTL identified were colocalized with sCr and sU signals. An eQTL was determined to be colocalized if the COLOC H4 [posterior probability of colocalization (PPC)] was at least 0.5 for both traits and at least 0.8 for one of the 2 traits, according to Giambartolomei et al. (2014).

Validation
We performed a validation analysis with the related Caucasian UK Biobank cohort, consisting of 57,370 subjects not missing sU or eGFR phenotypes. The genotyping array used for this cohort is the same as that used for the discovery analysis cohort. The validation analysis repeated the estimation procedures described above and the sliding LD windows used were identical to those used in the discovery set.

Results
This study was based on 333,542 distantly related white participants, of whom 53.7% were female with an average age of 56.9 6 8.0 years old. The average sCr level was 0.8 6 0.2 mg/dl (the average 6 SE), average eGFR was 144.2 6 56.0 ml/min/1.73 m 2 , and the average sU level was 5.2 6 1.3 mg/dl. Two (2.0) percent of the individuals had an ICD10 diagnosis or self-diagnosis of gout, 12.4% had hyperuricemia, 0.5% had CKD, and 0.3% had hyperuricemia and CKD. We analyzed the markers (sU and eGFR) using a sequence of Bayesian multitrait models where the markers were regressed on contiguous SNPs in a large chromosomal segment (core) plus overlapping flanking buffers. We collected the samples from the posterior distribution of effects for each core segment and used these samples to estimate the local variances for each marker (Fig. 1) and the local covariances between the markers (Fig. 2). The (co)variances were estimated within 511,828 overlapping LD windows (small, nonindependent contiguous chromosomal regions).
We found 134 LD windows with covariance estimates that had a 95% CR excluding zero ( Fig. 2; Supplementary Table 1). The number of SNPs in the significant LD windows ranged from 1 to 56, and the median SNPs per window was 6.0 (22 kb on average, excluding 12 single-SNP windows). Interestingly, although the global correlation between sU and eGFR is negative (Tin et al. 2019;Reynolds et al. 2021), 49 of the 134 significant windows showed positive genetic covariance directionality, and the remaining 85 were negative.
The 134 significant LD windows often included the same variants and mapped to identical GWAS loci, so we collapsed the 134 windows to 64 unique loci that possessed genetic covariance signal between eGFR and sU (Supplementary Table 2 and Supplementary Methods). The top 25 distinct loci implicated by the significant windows in terms of covariance magnitude are listed in Table 1. A graphical representation of the top significant loci is presented in Fig. 3.

Gene expression/eQTL analysis
We used COLOC (Giambartolomei et al. 2014) and expression data from the GTEx project (v8) (Carithers and Moore 2015) to identify candidate causal genes at significant local genetic covariance windows between sU and eGFR. Twenty-six of the 64 distinct significant shared loci (41.6%) were shown to modify the expression of candidate causal genes colocalized with the covariance signals (Supplementary Table 3). Of note are TRIM6 and L3MBTL3 in cis, which are genes that have a significant covariance signal and a colocalized eQTL that is expressed in the kidney.

Validation
In the related white UK Biobank validation cohort 12 LD windows were significant for genetic covariance between sU and eGFR (Supplementary Table 1). All of the 12 significant windows were also significant in the main analysis with consistent directionality. The 12 windows condensed to 5 distinct loci (Supplementary  Table 2), meaning 5 out the 64 significant distinct loci from the main analysis were also significant in this validation. The sample size of the related cohort is 82.8% smaller (n ¼ 57,370) than the unrelated cohort used in the discovery set (n ¼ 333,542), so our validation analysis was comparatively underpowered to the main analysis.

Discussion
The goal of this study was to infer the shared genetic architecture of sU (causal for gout), and eGFR (a marker for CKD). Our results highlight genes that may be involved in the observed relationship between the traits. In this study, we estimated local genetic (co)variances between sU and eGFR and identified regions with pleiotropy. This study was based on the large-scale UK Biobank and formal statistical inference from local Bayesian multitrait models. Our results demonstrated that genetic covariance between eGFR and sU was widespread across the genome. Our method identified 64 distinct LD windows with shared genetic effects between eGFR and sU, the majority of which had negative genetic covariance estimates. We identified 22 distinct novel shared loci, to our knowledge, with significant local genetic covariance for sU and eGFR, including MMP11/SMARCB1, ADH1B, MIP/GLS2, ENG/AK1, EPB41L5, KIAA1199, CELSR2, SOS2, KCNS3, TET2, SMLR1/EPB41L2, GLIS1, KIAA1683/JUND, and METTL10/ FAM175B. Furthermore, 14 distinct loci identified were previously only known to be associated with only one of the 2 traits, demonstrating that the set of loci contributing to both traits is  Windows that contained SNPs in loci associated with known eGFR genes are highlighted in dark green, windows that contained SNPs in genes associated with sU are highlighted in blue, and windows that contained SNPs in genes associated with both sU and eGFR [from comparing GWAS, Leask et al. (2020)] are highlighted in bright green. Windows significant for genetic covariance are highlighted in red. The covariance estimates were multiplied by 1E4.
substantially larger than previously thought. These loci are partially responsible for the comorbidity between hyperuricemia/ gout and CKD.
One advantage of the local method that we present here is that it facilitates the identification of genomic windows with opposite signs to the overall negative genetic correlation between eGFR and sU. Out of the significant shared loci, about two-thirds showed negative local genetic covariance estimates. This is consistent with the overall genetic covariance directionality (Tin et al. 2019;Reynolds et al. 2021), indicating that they either contribute to worsening kidney function (decreasing eGFR or increasing sCr) and increasing sU, or vice versa. Interestingly, there were 21 distinct significant shared loci with positive local genetic covariance estimates (about one-third). Positive covariance indicates that the genomic region either contributes to increasing sU and improved kidney function or decreasing sU and worsening kidney function. Two of the loci with a significant positive signal, GCKR and CPS1, are mainly expressed in the liver and one, LRP2, is mainly expressed in the kidney (Carithers and Moore 2015). One novel shared locus identified in this study consisted of the genes SLC17A1, SLC17A3, and SLC17A2. This large window in chromosome 6 (56 SNPs, Table 1) had a strong, positive significant covariance signal and SLC17A1 and SLC17A3 are urate transporters both linked to gout (Reimer 2013). The opposite signs of locus- specific genetic covariances are indicative of distinct physiological processes governing the phenotypic expression of urate and eGFR. The loci with positive covariance in particular are excellent candidates for discovering functional mechanisms that simultaneously increase sU and improve kidney function. Urate transporters SLC2A9 and ABCG2 have the largest GWAS effect sizes for sU, accounting for 4-5% of the variance in sU (Yang et al. 2010;Hughes et al. 2014;Johnson et al. 2018;Major et al. 2018;Tin et al. 2019). However, no windows in SLC2A9 or ABCG2 had a 95% CR for local genetic covariance that did not include zero. Our results demonstrate that windows in both SLC2A9 and ABCG2 loci are associated with just sU levels but are not pleiotropic regions for sU and eGFR. A similar phenomenon is observed with the eGFR gene SHROOM3. That is, none of the windows containing SNPs in SHROOM3 were significant for local genetic covariance. This exemplifies that the loci driving the genetic correlation between these 2 traits are not necessarily the leading GWAS hits.
Previous research investigating pleiotropic genetic loci between sU and eGFR has implicated loci as shared if signals of association obtained from marginal single-marker regressions (e.g. GWAS) for both traits are colocalized (Leask et al. 2020). Leask et al. (2020) recently compared overlapping loci between 2 large GWAS, one of sU and the other kidney function (Wuttke and Kö ttgen 2016;Tin et al. 2019), and found 36 independent colocalized loci. Our results validate 20 of these 36 loci, and all but 3 loci (DACH1, CPS1, and INS-IGF2) had covariance directionality that matched the directionality of effects found by Leask et al. (2020).
Our covariance approach may have direct implications for assessing causal relationships between exposures using Mendelian randomization (MR). Pleiotropic genetic variants violate assumptions of univariate MR, however, they are useful in multivariable MR that can simultaneously assess the causal effects of multiple risk factors on an outcome (Burgess and Thompson 2015). For example, genetic variants from SLC2A9 and ABCG2 may be valid instrumental variables to use in MR to test for a causal effect of sU on CKD, however, the loci listed in Supplementary Table 1 would not. In fact, SLC22A11 has previously been identified as a pleiotropic variant that may improve kidney function through its activity in raising urate levels (Hughes et al. 2014). MR has previously been used to show that sU is not causal of CKD (Jordan et al. 2019), however, Jordan et al. noted significant pleiotropy in the genetic variants used in their study, which they attempted to counter using MR techniques robust to pleiotropy. Of the 26 SNPs used by Jordan et al., rs1260326 (GCKR) and rs17050272 (LINC01101) were identified by us as shared, and rs1165151 and rs3741414 were located within one of our significant pleiotropic regions but were not in our genotyping platform.
Our eQTL analysis of the windows significant for local genetic covariance uncovered numerous genes of interest, such as SLC7A9, which encodes a solute transporter largely expressed in the small intestine, A1CF, which encodes a protein involved in apolipoprotein B synthesis in the liver, and TRIM6, which encodes an E3 ubiquitin ligase involved in interferon gamma signaling and innate immune response with high expression levels in the kidney (Carithers and Moore 2015). The genes uncovered from the eQTL analysis will be particularly interesting for future study, as they will likely aid our understanding of the relationship between kidney function and sU.
Through our approach of obtaining local genetic (co)variance estimates from Bayesian multitrait models in very large datasets, we have uncovered 22 novel shared genetic regions for sU and eGFR. The approach presented in this paper was applied in the context of sU and eGFR, but it could be applied to any pair of traits. While our discovery set sample size is excellent, we lack a dataset of a similar size for the validation. Some regions were validated but not all.
The local shared genomic regions we have uncovered in this study can provide insight into the relationship between hyperuricemia/gout and CKD, elucidating the biological mechanisms underlying the traits. This will help further understanding of the genetic basis of hyperuricemia/gout and CKD.

Data availability
All data used are secondary and are held in public repositories. This study utilized deidentified data from the UK Biobank where genotype and phenotype data are available to researchers upon registration. The protocol and consent were approved by the UK Biobank's Research Ethics Committee and were conducted under the application number "15326." For eQTL analysis, cis-and trans-eQTL data were downloaded from the GTEx V8 portal (Carithers and Moore 2015).
Supplemental material is available at G3 online.