Deciphering Sex-Specific Genetic Architectures Using Local Bayesian Regressions

Many complex human traits exhibit differences between sexes. While numerous factors likely contribute to this phenomenon, growing evidence from genome-wide studies suggest a partial explanation: that males and females from the same population possess differing genetic architectures. Despite this, mapping gene-by-sex (G×S) interactions remains a challenge likely because the magnitude of such an interaction is typically and exceedingly small; traditional genome-wide association techniques may be underpowered to detect such events, due partly to the burden of multiple test correction. Here, we developed a local Bayesian regression (LBR) method to estimate sex-specific SNP marker effects after fully accounting for local linkage-disequilibrium (LD) patterns. This enabled us to infer sex-specific effects and G×S interactions either at the single SNP level, or by aggregating the effects of multiple SNPs to make inferences at the level of small LD-based regions. Using simulations in which there was imperfect LD between SNPs and causal variants, we showed that aggregating sex-specific marker effects with LBR provides improved power and resolution to detect G×S interactions over traditional single-SNP-based tests. When using LBR to analyze traits from the UK Biobank, we detected a relatively large G×S interaction impacting bone mineral density within ABO, and replicated many previously detected large-magnitude G×S interactions impacting waist-to-hip ratio. We also discovered many new G×S interactions impacting such traits as height and body mass index (BMI) within regions of the genome where both male- and female-specific effects explain a small proportion of phenotypic variance (R2 < 1 × 10−4), but are enriched in known expression quantitative trait loci.

effects are disproportional between sexes (Lynch 1998). Although many traits seem to have a between-sex genetic correlation that is evidentially ,1, genome-wide association (GWA) studies intended to map G3S interactions have struggled to pinpoint such loci (Liu et al. 2012;Hoffmann et al. 2017). Based on this dichotomy, G3S interactions presumably exist for many traits, but the magnitude of a typical G3S interaction is suspected to be exceedingly small, explaining why such events commonly elude detection, particularly after multiple test correction. However, just as numerous small effect causal loci accumulate to affect phenotypic variance, small G3S interactions may accumulate to influence both sex differences and phenotypic variance.
Most GWA studies utilize single-marker regression (SMR), in which the phenotype is regressed upon allele content one SNP at a time, thereby obtaining marginal SNP effect size estimates that do not fully account for LD patterns. In contrast, whole-genome regression methods, in which the phenotype is regressed upon all SNPs across the genome concurrently, fully account for multi-locus LD. These methods are increasingly being used as a one-stop solution to estimate conditional (with respect to other SNPs) effect sizes of SNP markers and to provide genome-wide estimates including genomic heritability (Meuwissen et al. 2001;Yang et al. 2010;de Los Campos et al. 2015a) and between-sex genetic correlations (Zillikens et al. 2008;Yang et al. 2015;Rawlik et al. 2016). By estimating conditional SNP effect sizes, the goal across many studies is to select SNPs with nonzero effects and to build a model for predicting polygenic scores (de los Campos et al. 2010(de los Campos et al. , 2013Lello et al. 2018). Other works have directly illustrated the use of whole-genome regression methods for GWAS (Yi et al. 2003;Ayers and Cordell 2010;Guan and Stephens 2011;Zeng et al. 2012;Fernando et al. 2017). Whole-genome regressions are computationally challenging to use with biobank-level data; however, recent work suggests relatively accurate genomic prediction and SNP effect estimation can be achieved simply by accounting for local (as opposed to global) LD patterns (Vilhjálmsson et al. 2015).
Building on the idea of utilizing conditional SNP marker effects, here we developed local Bayesian regressions (LBR) in which the phenotype is regressed upon multiple SNPs spanning multiple LD blocks (thereby accounting for local LD patterns) to study sex differences in complex traits from the UK Biobank. The LBR model uses random-effect SNP-by-sex interactions (de Los Campos et al. 2015b;Veturi et al. 2019) that decompose conditional SNP effects into three components: (i) one shared across sexes, (ii) a male-specific deviation from the shared component, and (iii) a female-specific deviation from the shared component. Using samples from the posterior distribution of conditional SNP effects, we developed methods to infer sex-specific effects and G3S interactions at the single SNP level and by aggregating SNP effects within small LD-based regions, offering multiple perspectives to study sex-specific genetic architectures.
In this study, we have utilized genotypes for 607,497 autosomal SNPs from 259,000 distantly related Caucasians from the UK Biobank for assessing the performance of LBR in analyzing simulated and real complex traits including height, body mass index (BMI), waist-to-hip ratio (WHR), and heel bone mineral density (BMD). Using simulations, we showed that (i) for inferences of G3S interactions, LBR offers higher power with lower false discovery rate (FDR) than methods based on marginal effects (aka single-marker regression), and (ii) under imperfect LD between SNPs and causal variants (i.e., when causal variants are not genotyped), aggregating SNP effects within small LD-based regions offers higher power than methods based on testing individual SNPs.
The traits analyzed in this study span a range of genomewide metrics and G3S plausibility, from height and BMI, for which previous studies indicate males and females possess very similar genetic architectures , to WHR, a trait with well-documented G3S interactions (Heid et al. 2010;Randall et al. 2013;Shungin et al. 2015;Winkler et al. 2015), and BMD, for which G3S interactions are thought to exist but have eluded detection (Karasik and Ferrari 2008). LBR provided evidence of G3S interactions impacting height, BMI, and BMD at regions of the genome where sex-specific genetic effects are relatively small; however, such regions are enriched in known eQTL. For WHR, LBR replicated many large-magnitude G3S interactions previously discovered using single-marker regression, but also located novel G3S interactions near such genes as the estrogen receptor ESR1.

Materials and Methods
Genotype and phenotype data Genotyped SNPs from the custom UK Biobank Axiom Array (http://www.ukbiobank.ac.uk/scientists-3/uk-biobank-axiomarray/) were used in all analysis. Prior to analysis, all phenotypes were precorrected for sex, age, batch, genotyping center, and the first five genomic principal components. More information on genotypes and phenotypes can be found in the Supplemental Methods.
Overview of the LBR model, inference methods, and implementation Here, we present brief details of the LBR model and implementation, with more details found in the Supplemental Methods. An example of how to fit LBR and perform postprocessing of posterior samples of model parameters is available at: https://github.com/funkhou9/LBR-sex-interactions.
To study sex differences, we regressed male and female phenotypes (y m and y f ) on male and female genotypes (X m and X f ) using a SNP-by-sex interaction model of the form Above, m m and m f are male and female intercepts, b 0 ¼ fb 0j g (j = 1, . . ., p) is a vector of main effects, b m ¼ fb m j g and b f ¼ fb f j g are male and female interactions, respectively, and e m ¼ fe m i g and e f ¼ fe f i g are male and female errors that were assumed to follow normal distributions with zero mean and sex-specific variances. Female-specific and male-specific SNP effects are defined as Prior assumptions: For SNP effects, we adopted priors from the spike-slab family with a point of mass at zero and a Gaussian slab (Habier et al. 2011); specifically, Here, p k and s 2 b k are hyper-parameters representing the proportion of nonzero effects and the variance of the slab; these hyper-parameters were treated as unknown and given their own hyper-priors.
Local-regression: Implementing the above model with whole-genome SNPs (p 600K) and very large sample size (n 300K) is computationally extremely challenging. However, LD in homogeneous unstructured human populations spans over relatively short regions (R 2 between allele dosages typically vanishes within 1-2 Mb; Supplemental Material, Figure S1). Therefore, we applied LBR to long, overlapping chromosome segments ( Figure 1). Specifically, we divided the genome into "core" segments containing 1500 contiguous SNPs (roughly 8 Mb, on average), then applied the regression in Equation 1 to SNPs in the core segment plus 250 SNPs (i.e., roughly 1 Mb) in each flanking region, which were added to account for LD between SNPs at the edge of each core segment with SNPs in neighboring segments.
Inferences: We used the BGLR (Pérez and De Los Campos 2014) software to draw samples from the posterior distribution of the model parameters, and used these samples to make inference about individual SNP effects including: (i) the posterior probability that the j th SNP has a nonzero effect in males ðPPM SNPj Þ and females ðPPF SNPj Þ; and (ii) the posterior probability that the female and male effects are different ðPPDiff SNP j Þ: In regions involving multiple SNPs in strong LD, inferences at the individual-SNP level may be questionable. Therefore, we borrowed upon previous work by Fernando et al. (2017), enabling us to aggregate multiple sex-specific SNP effects within relatively small regions using "window variances." For each SNP j we defined a window j* around the SNP based on local LD patterns. We then defined the male-specific and female-specific window variances as s 2 g m j* ¼ varðX j* b m j* Þ and s 2 g f j* ¼ varðX j* b f j* Þ; respectively. Here, X j* represent genotypes at SNPs within the j* window, and varðÞ is the sample variance operator. Prior to model fitting, the phenotype is scaled across sexes; thus, sex-specific window variances may be interpreted as the proportion of total phenotypic variance explained by sex-specific SNP effects. From samples of sex-specific window variances, we computed the posterior probability of (i) nonzero male-specific window variance ðPPM s 2 g j* Þ; (ii) nonzero female-specific window variance ðPPF s 2 g j* Þ; and (iii) sex difference in window variances (denoted as PPDiff s 2 g j* ).

Overview of simulations
We used simulations to assess the power and FDR of LBR, and to compare them with that of standard single-marker-regression (SMR). Traits were simulated using SNP genotypes from the UK Biobank (119,190 males and 139,738 females, all distantly related Caucasians), thus providing realistic LD patterns. We simulated a highly complex trait with one causal variant per 2 Mb, which, on average, explained a proportion of the phenotypic variance equal to 3.3 3 10 24 . Our simulations used a total of 60,000 genotyped SNPs (onetenth of the genome, consisting of 6000 consecutive SNPs taken from 10 different chromosomes) and 150 causal variants; on the complete human genome "scale" this corresponds to a trait with 1500 causal variants and a heritability of 0.5 (see Supplemental Methods for further details). Of the causal variants, 40% (a total of 60 SNPs) had differing sex-specific effects and the remaining 60% (90 SNPs) had effects that were the same in males and females. Estimates of power and FDR were based on 30 Monte Carlo (MC) replicates.

Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully Figure 1 Strategy for implementing local Bayesian regressions genome-wide. The phenotype is regressed upon multiple sequential SNPs using a sliding window approach. The core region contained 1500 SNPs (roughly 8 Mb, on average), and each buffer region contained 250 SNPs (roughly 1 Mb, on average). Core parameters (posterior samples) are stitched together, then sex-specific effects and G3S interactions are inferred at the level of SNP j and window j*.
within the article. Genotype and phenotype data from the UK Biobank are available to all researchers upon application at http://www.ukbiobank.ac.uk/register-apply/. The script used to simulate phenotypes is available at https://github. com/funkhou9/LBR-sex-interactions. For eQTL enrichment analysis, single-tissue cis-eQTL data (significant variant-gene associations based on permutations) from GTEx V7 was downloaded from https://gtexportal.org/home/datasets. Supplemental Methods, Figure S1 through S5, and Table S1 through S4 are available at Figshare at https://doi.org/ 10.25386/genetics.11900247.

LBRs offer improved power with lower FDRs
Power and FDR when causal variants are genotyped: First, we analyzed the simulated phenotypes using all SNPs (including all 150 causal SNPs). Initially interested in inferring G3S interactions, we ranked SNPs based on the PPDiff SNP metric of LBR and on SMR's P-value for sex difference (P value-diff, see Supplemental Methods) and used the two ranks to estimate power and FDR as a function of the number of SNPs selected ( Figure 2). We observed little variation in power and false discoveries across MC replicates (Table S2); this was expected because each MC replicate involved 60,000 loci, 150 of which had causal effects. LBR showed consistently higher power (achieving a power of 75% when selecting the top 50 SNPs with highest PPDiff SNP ) and lower FDR than SMR. The FDR of LBR was very low when selecting the top 50 SNPs with highest PPDiff SNP and exhibited a very sharp phase-transition with fast increase in FDR thereafter.
We also compared the two methods based on arbitrary, albeit commonly used, mapping thresholds for SMR and LBR. At PPDiff SNP $ 0:95 [a posterior probability threshold used in GWAS to minimize the local false discovery rate (Efron 2008)], LBR selected an average (across simulation replicates) of 38.33 SNPs with an estimated power of 0.634 and estimated FDR of 0.007. Conversely, at P value-diff # 5 3 10 28 [a P-value threshold routinely used in human GWA studies, based on the number of approximately independent SNPs in the human genome (International HapMap Consortium et al. 2007)] SMR selected an average of 50.7 SNPs with an estimated power of 0.436 and estimated FDR of 0.451. Altogether, these results suggest that for G3S discovery, LBR offers higher power and lower FDR than SMR-the method most widely used in GWA studies-at least when G3S interactions are observed.
When trying to map SNPs that had effect in at least one sex, we used PP SNP j ¼ max½PPM SNP j ; PPF SNP j and P-values from an F-test (see Supplemental Methods) as metrics for LBR and SMR methods, respectively. Again, LBR showed higher power with lower FDR than a standard SMR P-value ( Figure S2). At traditional mapping thresholds, LBR and SMR had similar power but LBR achieved that power with much lower FDR; at PP SNP j $ 0:95, the average number of SNPs selected was 120.83 with an estimated power of 0.799 and estimated FDR of 0.009, whereas at P-value # 5 3 10 28 , the number of SNPs selected was 374.56 with an estimated power of 0.794 and FDR of 0.66.
Power and FDR when causal variants are masked: In a second round of analyses, we removed all causal variants from the panel of SNPs used in the analysis to represent a situation where causal variants are not observed, and genotyped SNPs are tagging causal variants at varying degrees. We initially assessed the relative performance of LBR to infer segments harboring G3S interactions. Power and FDR were assessed at several resolutions: 1 Mb, 500 kb, and 250 kb regions around each causal variant. At each resolution, a discovery was considered true if the finding laid within a segment harboring a G3S causal variant. In this scenario, we again observed that LBR had small variability in power and false discoveries between MC replicates (Table S3). Power and FDR were computed at different thresholds (using PPDiff SNP and PPDiff s 2 g for LBR and P value-diff for SMR; Figure 3). When using a 1 Mb target area-such that correct G3S discoveries must be within 500 kb on either side of a true G3S event-PPDiff s 2 g thresholds (LBR's window-based metric) provided highest power within an FDR range of 0-0.3; thereafter, SMR provided slightly higher power. As expected, when removing causal variants, power was estimated to be much lower than when causal variants were observed; at PPDiff s 2 g $ 0:95; the estimated power and FDR were 0.454 and 0.004, respectively, while at P value-diff # 5 3 10 28 , estimated power and FDR were 0.22 and 0.006. As seen in Figure 3, when considering a finer resolution (500 kb and 250 kb) the performance of both LBR-based approaches was more robust than that of SMR. Altogether this indicates that for the discovery and mapping of unobserved G3S interactions, LBR's window-based metric provides higher power with equivalent FDR and finer resolution than single-marker regression methods.
To infer segments containing causal variants that affect at least one sex, we again used LBR to decide whether either sex-specific effect was nonzero at the level of individual SNPs or windows. Using a 1-MB target area, LBR's window-based metrics provided the highest power within an FDR range of 0-0.025. When decreasing the target area, LBR provided the highest power over larger FDR ranges ( Figure S3).
For real human traits, many novel G3S interactions showed relatively small sex-specific effects We analyzed four complex human traits (height, BMI, BMD, and WHR) measured among 259,000 distantly related Caucasians from the UK Biobank (119,000 males and 140,000 females). For each trait, we fit the LBR model (Equation 1) across the entire autosome consisting of 607,497 genotyped SNPs using 417 overlapping segments ( Figure 1) and obtained evidence of G3S interactions at the level of SNP j and window j*.
To compare both the magnitude and sign of sex-specific SNP effects, we plotted eachb fj againstb mj ( Figure 4A). The trait was scaled across sexes prior to model fitting; thus, male-and female-specific effects were not constrained to the same scale. In this way, one might expect male-specific SNP effects to uniformly differ from female-specific SNP effects by a multiplicative factor if the variance of the phenotype is different between sexes (sample statistics within each sex are provided within Table S1). Surprisingly, we did not observe evidence of sex-specific SNP effects uniformly differing due to differences in phenotypic scale; for height, BMD, and BMI, as seen in Figure 4A, most large effect SNPs lie near the blue diagonal line. For WHR, we observed largely consistent results from prior studies (Heid et al. 2010;Randall et al. 2013;Winkler et al. 2015): namely the prevalence of numerous SNPs with relatively large effects in females but little to no effect in males. No traits exhibited evidence of any SNPs Figure 3 Power vs. false discovery rate for discovering genomic regions containing masked G3S interactions. Here, power is defined as the expected proportion of G3S interactions that are being tagged by at least one selected SNP j or window j*. False discovery rate is defined as the expected proportion of selected SNPs or windows that are not tagging any G3S interactions. Each point is an estimate, and error bars for both axes represent 95% confidence intervals. Point estimates and intervals were derived using 30 Monte Carlo replicates. Each facet corresponds to a different "target area," a fixed width around each G3S interaction that defines the set of SNPs effectively tagging it. LBR (SNP): uses the PPDiff SNP metric spanning 1-0. LBR (Window): uses the PPDiff s 2 g metric spanning 1-0. SMR: uses the P value-diff metric spanning (on the -log 10 scale) 8-0. with (i) high confidence male-and female-specific effects (no SNPs with PPM SNP $ 0:9 and PPF SNP $ 0:9) and (ii) differing signs between sexes.
We then aggregated sex-specific SNP effects within small LD-based regions to estimate sex-specific window variances s 2 gm j* and s 2 g f j* and compared the magnitude of each ( Figure  4B). Interestingly, for traits such as height, many large effect regions bear slightly larger window variances for males than for females. This was not observed at the single SNP level, suggesting that many regions bearing numerous small effect SNPs produce aggregate effects that are potentially larger (although not reaching a PPDiff s 2 g $ 0:9 threshold) in males than in females. One example is the GDF5 locus, previously known to strongly associate with adult height (Sanna et al. 2008), where a peak PPDiff s 2 g signal centered on rs143384 had slightly different estimated sex-specific window variances (ŝ 2 g m 5 3:0 3 10 23 andŝ 2 g f 5 2:6 3 10 23 ) but weak evidence of a G3S interaction ðPPDiff s 2 g ¼ 0:544Þ. For BMD, several large effect regions show suggestive evidence of G3S interactions, including the AKAP11 locus and the CCDC170 locus (PPDiff s 2 g ¼ 0:856 and 0.745, respectively), both previously associated with BMD (Mullin et al. 2016(Mullin et al. , 2017Correa-Rodríguez et al. 2018;Zhu et al. 2018).
To make G3S inferences at the level of window variances irrespective of the magnitude of sex-specific effects, we adopted a PPDiff s 2 g threshold of 0.9, which, in simulations (Figure 3), provided optimal power at an estimated FDR of 0.029 when using a 1-MB target area. For height, a total of eight distinct regions possessed a PPDiff s 2 g $ 0:9; two of which possessed a PPDiff s 2 g $ 0:95: For BMI, five distinct regions possessed a PPDiff s 2 g $ 0:9; with none reaching a more stringent PPDiff s 2 g $ 0:95 threshold, and none overlapping with two previously suggested BMI G3S SNPs . As seen in Figure 4C, inferred G3S interactions for height and BMI possess relatively small sex-specific window variances; as an example, for height, the window centered on SNP rs1535515 (near LRRC8C) had a PPDiff s 2 g ¼ 0:96; whereasŝ 2 gm ¼ 2:1 3 10 25 and s 2 g f ¼ 1:1 3 10 24 : For BMD, seven regions reached a 0.9 PPDiff s 2 g threshold, while one higher-confidence G3S interaction ðPPDiff s 2 g $ 0:95Þ was detected within ABO, the gene controlling blood type.
For WHR, roughly 45 distinct genomic regions possessed a PPDiff s 2 g $ 0:9; while 34 of these possessed a PPDiff s 2 g $ 0:95: We found many previously detected G3S interactions known to associate with WHR or a related trait, WHR adjusted for BMI (WHRadjBMI) (Heid et al. 2010;Randall et al. 2013;Shungin et al. 2015;Winkler et al. 2015). These included interactions at LYPLAL1, MAP3K1, COBLL1, RSPO3, and VEGFA, among others. We also detected numerous novel G3S interactions (Table 1) near physiologically intriguing genes such as the estrogen receptor gene ESR1 and the ATP binding cassette transporter A1 gene ABCA1 known to play a role in HDL metabolism ðPPDiff s 2 g $ 0:95Þ: As seen in Table 1, both novel signals possessed a high-confidence female-specific effect with weak evidence for a male-specific effect (PPF s 2 g $ 0:95; PPM s 2 g # 0:6); however, the magnitude of the female-specific effect was relatively small Listed are loci with at least 0.95 posterior probability that sex-specific window variances differ. The table is sorted first by trait, then by magnitude of the female-specific window variance. Results are filtered such that each window listed consisted of a distinct set of SNPs. A full list of all G3S signals at a PPDiff s 2 g $ 0.90 threshold is provided in Table S4. a Focal SNP is defined as the center SNP j in window j*. b The proportion of variance explained by male-specific SNP effects, expressed as a percentage. c The proportion of variance explained by female-specific SNP effects, expressed as a percentage. d Nearest gene and location identified through Axiom UKB WCSG annotations, release 34. The gene/locus is bold if it has been previously detected as a G3S interaction for WHR or WHR adjusted for BMI (Heid et al. 2010;Randall et al. 2013;Shungin et al. 2015;Winkler et al. 2015). e If "yes," the focal SNP is significantly associated with gene expression in at least one tissue, according to GTEx V7 ðŝ 2 g f # 1:4 3 10 24 Þ: As evident from Table 1, most novel WHR G3S interactions detectable with LBR are those with relatively small sex-specific effects.
Additionally, we utilized a traditional SMR approach (see Supplemental Methods) for the discovery of G3S interactions among traits to compare P value-diff signals to PPDiff s 2 g signals ( Figure S4). At P value-diff # 5 3 10 28 , there were no genome-wide significant G3S-interacting SNPs for height, one significant SNP for BMI near a window with PPDiff s 2 g $ 0:9; and one significant peak within ABO for BMD (the same signal detected using PPDiff s 2 g ). Regions with a PPDiff s 2 g $ 0:9 generally coincided with at least nominally significant P value-diff signals; for height and BMD, regions with PPDiff s 2 g $ 0:9 also possessed a peak SNP with P valuediff # 0.01. For BMI, PPDiff s 2 g $ 0:9 signals possessed a peak SNP of P value-diff # 0.1. This, together with the fact that novel G3S interactions found using LBR possess relatively small sex-specific effects, suggests that LBR may be detecting G3S interactions that are otherwise missed due to low power. Lastly, for WHR, most of the high-confidence PPDiff s 2 g $ 0:9 signals coincided with clear and obvious P value-diff peaks.
Inferred G3S interactions were enriched in tissuespecific eQTL As seen previously, many G3S interactions inferred using LBR have exceedingly small sex-specific effects. To further investigate whether G3S detections using the PPDiff s 2 g metric may be functionally relevant, we inferred whether such signals are enriched in eQTL identified from GTEx. Specifically, using a hyper-geometric test we asked whether PPDiff s 2 g -selected focal SNPs (SNP j within window j*) were enriched in eQTL, then compared to eQTL enrichment from P value-diff-selected SNPs as a function of the number of SNPs selected ( Figure S5). PPDiff s 2 g -selected focal SNPs showed consistently higher eQTL enrichment than P value-diff-selected SNPs for all traits except WHR. For instance, at PPDiff s 2 g $ 0:9; the total number of windows (focal SNPs) selected was 36, 264, 34, and 13, for height, WHR, BMD, and BMI, respectively. With these selections, eQTL enrichment P-values were 2.39 3 10 24 , 1.52 3 10 212 , 2.01 3 10 212 , and 8.33 3 10 24 , for height, WHR, BMD, and BMI, respectively. When selecting the same number of SNPs using P value-diff, enrichment P-values were 2.25 3 10 22 , 1.56 3 10 228 , 5.54 3 10 28 , 1.93 3 10 21 , for height, WHR, BMD, and BMI, respectively.
To provide more information about how genetic regions bearing G3S interactions may impact gene expression in specific tissues, we determined whether focal SNPs at PPDiff s 2 g $ 0:9 are enriched in tissue-specific eQTL ( Figure  5). For height, BMD, and WHR, such SNPs showed significant eQTL enrichment in at least one tissue, using a conservative Bonferroni corrected enrichment P-value of 2.6 3 10 24 (correcting for 192 tests in total; 48 tissues and four traits). Interestingly, BMD G3S signals are very strongly enriched in eQTL with associated eGenes (including ABO and CYP3A5) expressed in the adrenal gland, among other tissues. For height, we observed small enrichment P-values across many tissues since G3S focal SNPs are enriched in eQTL with associated eGenes (including LOC101927975 and CNDP2) expressed across many tissues. Lastly, for WHR, we observed G3S detections to be heavily enriched in eQTL with associated eGenes expressed in fibroblast, adipose, and skin tissues.

Discussion
We have investigated the degree to which sex-specific genetic architectures differ at local regions, using large biobank data (N 119,000 males and 140,000 females) and Bayesian multiple regression techniques that estimate sex-specific marker effects accounting for local LD patterns. The flexibility of the Bayesian approach enables multi-resolution inference of sex-specific effects, from individual SNP effects to window-variances that aggregate SNP effects within chromosome segments. These inferences can all be drawn using the results of the same model fit (Equation 1), but different postprocessing of samples of SNP effects from the posterior distribution.
The Bayesian multiple regression technique performed in this study, along with estimation of window variances, was largely inspired by Fernando et al. (2017). In that study, windows were defined using disjoint, fixed intervals. In contrast, for each SNP, we define a window based on local LD patterns, resulting in heavily overlapping, dynamically sized windows. The methods presented here also bear resemblance to those of Vilhjálmsson et al. (2015), which utilized pointnormal priors to estimate human SNP effects after accounting for local LD patterns. In that study, posterior means of SNP effects were estimated for the purposes of prediction, whereas, in this study, we derive the full posterior distribution numerically, allowing for inference of non-null SNP effects and window variances.
Through simulations, we showed that LBR provides superior power and precision to detect causal variants and those specifically bearing G3S interactions. We rationalize improvements in power upon traditional SMR methods by noting that the magnitude of a typical causal variant or G3S interaction is exceedingly small, and can elude hypothesis testing, due partly to the burden of multiple test correction. We also note that the resolution (peak size) in SMR signals is relatively large when using large sample sizes (due to not fully accounting for local LD patterns). To overcome this problem, we provided evidence that LBR methods-either by estimating conditional (accounting for local LD) marker effects or by aggregating conditional marker effects within relatively small regions-can achieve improved resolution when working with large sample sizes such as biobank-level data.
When using LBR to analyze real human traits, we have provided credence to our posterior probability-based discoveries by determining that LBR-detected G3S interactions are Figure 5 Evidence that LBR-identified G3S interactions are enriched in tissue-specific eQTL. Plotted on the x-axis is the P-value obtained from a hypergeometic test providing evidence that focal SNPs selected using PPDiff s 2 g $ 0:9 are enriched in tissue-specific eQTL. The dashed line represents a Bonferroni corrected significance threshold of 2.6 3 10 24 . generally more enriched in eQTL than SMR-detected interactions. For BMD, we provided new evidence that sex-specific effects differ within ABO, and that G3S interactions are highly enriched in adrenal gland-specific eQTL. This encourages the hypothesis that some G3S are eQTL that may modulate gene expression in the adrenal gland, with gene function dependent on the level of sex hormones. This was also an intriguing finding given that ABO blood groups have been known to associate with osteoporosis and osteoporosis severity (Choi and Pai 2004;Lu and Li 2011). For WHR, we detected previously known, large-magnitude G3S interactions that were discovered using WHR or WHRadjBMI (Heid et al. 2010;Randall et al. 2013;Shungin et al. 2015;Winkler et al. 2015), but additionally discovered novel, small magnitude G3S interactions near such genes as ESR1 and ABCA1. In a previous work analyzing WHRadjBMI, ABCA1 showed a significant female-specific genetic effect only; however, the test for G3S interaction failed to reach significance (Shungin et al. 2015).
For traits like height and BMI, large effect loci are estimated to have very similar effects between males and females, and loci with evidence of G3S interactions were those possessing relatively small sex-specific effects. As seen in Figure 4B, many relatively large window variances for height are estimated to be slightly higher for males than for females, albeit not reaching a PPDiff s 2 g $ 0:9 threshold. This is consistent with the fact that the global genomic variance for height was estimated to be higher in males than in females in a previous study using the interim release of the UK Biobank (Rawlik et al. 2016). Similarly, the same prior study estimated the global genomic variance of BMI to be higher in females than in males, and we observe, if anything, evidence of sex-specific window variances leading to the same conclusion. These observations may potentially indicate that relatively large causal variants have slightly different sex-specific effects for traits like height and BMI; however, if that is the case, we are still underpowered to confidently detect such interactions.
It is important to acknowledge that, while the methods presented here appear useful to decipher sex-specific genetic architectures from large human samples, additional work will be required to determine how these techniques may infer heterogeneous genetic effects in other contexts (other types of gene-by-covariate interactions), or when using different sample sizes or samples from different populations. With large sample sizes, the increased power and flexibility of LBR comes at the cost of a significantly larger computational burden than that involved in the traditional SMR approach; however, working with large datasets can be made manageable by adjusting the size of each fitted segment ( Figure 1) and parallel processing the fitting of each segment. Alternatively, LBR may be used as a follow up to traditional SMR tests, using preselected regions of interest. Another limitation inherent to aggregating SNP effects using window variances is that the sign of the effect is lost. In this way, when inferring G3S interactions through window variance differences, we cannot comment on whether sex-specific effects had the same sign or differing signs.
To conclude, we have demonstrated the powerful and flexible use of local Bayesian regressions for GWA to infer sex-specific genetic effects and G3S interactions using the UK Biobank. This was done largely by showing various means to utilize estimates of conditional (accounting for local LD), sex-specific SNP marker effects for GWA even when causal variants are not on the SNP panel for analysis. We anticipate that many more traits will be analyzed with this method to increasingly learn more about what is contributing to differences between males and females in human populations.