Genome-wide association study of body fat distribution traits in Hispanics/Latinos from the HCHS/SOL

Abstract
Central obesity is a leading health concern, with a great burden carried by ethnic minority populations, especially Hispanics/Latinos. Genetic factors contribute to the obesity burden overall and to inter-population differences. Using the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), we aimed to identify loci associated with central adiposity, measured as waist-to-hip ratio (WHR), waist circumference (WC) and hip circumference (HIP) adjusted for body mass index (adjBMI); to determine whether associations differ by background group within HCHS/SOL; and to determine whether previously reported associations generalize to HCHS/SOL. Our analyses included 7472 women and 5200 men of mainland (Mexican, Central and South American) and Caribbean (Puerto Rican, Cuban and Dominican) background residing in the USA. We performed genome-wide association analyses stratified by and combined across sexes using linear mixed-model regression. We identified 16 variants for waist-to-hip ratio adjusted for body mass index (WHRadjBMI), 22 for waist circumference adjusted for body mass index (WCadjBMI) and 28 for hip circumference adjusted for body mass index (HIPadjBMI) that reached suggestive significance (P < 1 × 10−6). Many loci exhibited differences in strength of association by ethnic background and sex. We carried a total of 66 variants forward for validation in cohorts (N = 34 161) with participants of Hispanic/Latino, African and European descent. We confirmed four novel loci (P < 0.05 with a consistent direction of effect, and P < 5 × 10−8 after meta-analysis): two for WHRadjBMI (rs13301996, rs79478137), one for WCadjBMI (rs3168072) and one for HIPadjBMI (rs28692724). We also generalized previously reported associations to HCHS/SOL (8 for WHRadjBMI, 10 for WCadjBMI and 12 for HIPadjBMI).
Our study highlights the importance of large-scale genomic studies in ancestrally diverse Hispanic/Latino populations for identifying and characterizing central obesity susceptibility that may be ancestry-specific.

[…]). The most significant of these associations was with MEGF9 (Multiple Epidermal Growth Factor-Like Domains 9) in whole blood (P = 1.8 × 10−149), a gene that lies 30 kb upstream of rs13301996. This SNP is also significantly associated with expression of MEGF9 in subcutaneous adipose tissue, sun-exposed skin, and T-cells. Additionally, our lead variant in CDK5RAP2 is associated with expression of MEGF9 in whole blood and the testis, and with expression of PSMD5 (proteasome [prosome, macropain] 26S subunit, non-ATPase, 5) and/or PSMD5-AS1 in several relevant tissues, including whole blood, tibial artery, tibial nerve, lung, thyroid, esophagus muscle, skeletal muscle, liver, cerebellum, and subcutaneous adipose tissue, among others. Beyond these expression associations, there is additional support for a regulatory role of rs13301996 and the SNPs in high LD with it (r² > 0.8). For example, our lead SNP lies just outside a DNase hypersensitivity cluster, lies within a region with evidence of histone modification in nine tissues including brain, skin, muscle, and heart, and likely falls in a transcription factor binding site active in skeletal and lung tissue (Supplementary […]) […] RegulomeDB score of 6, indicating little evidence of binding.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted February 25, 2021; doi: https://doi.org/10.1101/2021.02.23.21251958.

Our lead SNP associated with HIPadjBMI in women, rs28692724 (NC_000014.9:g.77027445C>T), is a synonymous variant in an exon of IRF2BPL (interferon regulatory factor 2 binding protein like) that is significantly associated with expression of the same gene in whole blood (Supplementary Table 14) […] surprising that four novel loci (rs13301996, rs79478137, rs28692724, and rs3168072) were mapped. We […]

[…] encodes a regulator of CDK5 (cyclin-dependent kinase 5) activity (60), interacts with CDK5R1 and pericentrin (PCNT) (60), plays a role in centriole engagement and microtubule nucleation (61), and has been linked to primary microcephaly and Alzheimer's disease (62, 63). In addition, we identified a novel association for WHRadjBMI with rs79478137 (P = 3.64 × 10−9) in Hispanic/Latino women. Rs79478137 is intronic to the antisense gene SLC22A18AS, which is highly expressed in the liver and kidney, as well as in the gastrointestinal tract and placenta. Very little is known about the biological role of this gene (64), and SLC22A18AS has no counterpart in mice or other rodents (65). Thus, although its genomic organization is known, its regulation and function are not understood (66).

Lastly, we identified a novel association for HIPadjBMI at rs28692724 following meta-analysis with an independent sample of European women. Rs28692724 is a synonymous variant in IRF2BPL, which encodes a transcription factor that acts within the neuroendocrine system and plays a role in regulating female reproductive function (67).

A limitation of this study was the small sample size within each HCHS/SOL background group. However, our use of genetic-analysis groups accounted for heterogeneity of genetic effects among ethnic groups, which is often ignored in GWAS. Compared with self-identified background groups, genetic-analysis groups are more genetically homogeneous and lack principal component outliers in stratified analyses; such outliers, if ignored, may hinder detection of and adjustment for important population structure (68). In addition, genetic-analysis groups allow every individual to be classified into a specific group, whereas many individuals in HCHS/SOL have a missing or non-specific self-identified background (68). Therefore, by using genetic-analysis groups rather than self-identified background groups, we increased our study's power to detect novel and previously documented associations with central adiposity traits (68). Another limitation, given the diverse background of our discovery population, was the lack of an ideal replication study. We attempted to overcome this limitation by conducting both multiethnic meta-analyses, which would validate associations that generalize across ancestries, and ancestry-stratified meta-analyses, which may allow validation of more population-specific associations. However, the limited Native American ancestry present across our replication cohorts may have hindered replication, and further analyses in more diverse Hispanic/Latino populations are needed to confirm the relevance of the promising central adiposity loci identified in our study.

[…] Americans. We also found that several previously identified central adiposity loci discovered in […]

[…] selection), that have resulted in changes in LD, or allele frequency differences, or due to variation in […] tobacco and alcohol assessed by self-report), and socio-demographic (e.g., socioeconomic status, migration history) assessments. This study was approved by the institutional review boards at each field center, where all participants gave written informed consent.

Participants in HCHS/SOL self-identified their background as Mexican, Central American, South American (mainland), Puerto Rican, Cuban, or Dominican (Caribbean). Some participants chose "more than one" or "other," or chose not to self-identify. We addressed missing or inconsistent self-identified background data by defining "genetic-analysis groups," as described in Conomos et al. (68). To increase power in this analysis, we stratified by the broader mainland and Caribbean categories rather than by more specific groups. In this paper, we use the term "background group" to refer to a […]

All measurements were taken at the baseline visit. Participants wore scrub suits or light, non-constricting clothing, and shoes were removed for weight and height measurements. WC and HIP were measured using Gulick II 150 cm and 250 cm anthropometric tapes and rounded to the nearest centimeter (cm). Height was measured using a wall-mounted stadiometer and rounded to the nearest cm, and weight was measured with a Tanita Body Composition Analyzer (TBF-300A) to the nearest tenth of a kilogram (kg). Height and weight were used to calculate BMI (kg/m²). We applied a log10 transformation to HIP because of its non-normal distribution.
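The derived anthropometric quantities above reduce to simple arithmetic. The following is a minimal Python sketch, assuming heights and circumferences in cm and weight in kg; the function names and the example participant are hypothetical. WHR is computed as WC divided by HIP, the standard definition. How the traits were then adjusted for BMI is not specified in this section and is not shown.

```python
import math


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index (kg/m^2) from weight in kg and height in cm."""
    height_m = height_cm / 100.0  # stadiometer height is recorded in cm
    return weight_kg / height_m ** 2


def whr(waist_cm: float, hip_cm: float) -> float:
    """Waist-to-hip ratio: waist circumference divided by hip circumference."""
    return waist_cm / hip_cm


def log10_hip(hip_cm: float) -> float:
    """log10-transformed hip circumference, used because HIP is non-normal."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return math.log10(hip_cm)


# Hypothetical participant, not study data.
print(round(bmi(70.0, 165.0), 2))  # 25.71
print(whr(90.0, 100.0))            # 0.9
print(log10_hip(100.0))            # 2.0
```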

Genotyping
[…]

We used linear mixed-model regression, assuming an additive genetic model, adjusted for age, age², study center, sample weights, genetic-analysis subgroup (68, 75), principal components to account for ancestry, population structure using kinship coefficients and sample eigenvectors, household, census block group, and, in the combined analysis, sex. Kinship, household, and block group were treated as random effects in each model. Sample weights were incorporated in our models as a fixed effect to account for oversampling of the communities in the 45-74 age group (n = 9,714, 59.2%), which was intended to facilitate the examination of HCHS/SOL target outcomes. HCHS/SOL sampling weights are the product of a "base weight" (the reciprocal of the probability of selection) and three adjustments: 1) non-response adjustments made relative to the sampling frame, 2) trimming to handle extreme values (so that a few extreme weights are not overly influential in the analyses), and 3) calibration of weights to the 2010 U.S. Census according to age, sex, and Hispanic background. We used genetic-analysis groups in our analyses to account for heterogeneity of genetic effects among ethnic groups. Compared with self-identified background groups, genetic-analysis groups are more genetically […]

[…] groups allow all individuals to be classified into a specific group, whereas many individuals in HCHS/SOL have a missing or non-specific self-identified background (68). We also conducted analyses stratified by region (mainland vs. Caribbean) to identify potential heterogeneity in effects by background group. We examined heterogeneity across background groups using I² statistics calculated in METAL (76) and tested for significant interaction (Pdiff < 0.05) by background group using EasyStrata (77).

To reduce the number of spurious associations, we excluded variants with minor allele frequency (MAF) < 0.5%, Hardy-Weinberg equilibrium (HWE) P < 1 × 10−7, or minor allele count (MAC [effective N]) < 30 (68). We defined suggestive loci as those with a variant reaching P < 1 × 10−6 and at least one additional variant within ±500 kb with P < 1 × 10−5. We used regional association plots produced in LocusZoom to visualize association regions, using the 1000 Genomes Admixed American (AMR) reference population for LD (http://locuszoom.sph.umich.edu/).
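The association model described above can be summarized in one equation. The notation below is ours, not the authors', and the Gaussian random-effect structure is the standard form for such mixed models rather than something the text spells out. For participant i, with BMI-adjusted trait y_i and additive genotype dosage G_i:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_G\, G_i + \mathbf{x}_i^{\top}\boldsymbol{\alpha} + u_i + h_i + b_i + \varepsilon_i,
      \qquad G_i \in [0, 2],\\
\mathbf{u} &\sim \mathcal{N}(\mathbf{0}, \sigma_K^2 \mathbf{K}), \quad
\mathbf{h} \sim \mathcal{N}(\mathbf{0}, \sigma_H^2 \mathbf{H}), \quad
\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \sigma_B^2 \mathbf{B}), \quad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma_E^2 \mathbf{I}),\\
w_i &= \frac{1}{\pi_i}\; a_i^{\mathrm{nr}}\; a_i^{\mathrm{trim}}\; a_i^{\mathrm{cal}}.
\end{aligned}
```

Here the fixed-effect vector x_i holds age, age², study center, the sampling weight w_i, genetic-analysis subgroup, principal components and (combined analysis only) sex. K is the kinship matrix, and H and B are indicator matrices for sharing a household or a census block group. β_G is the per-allele genetic effect. In the weight, π_i is the selection probability, and the three factors stand for the non-response, trimming and calibration adjustments. Writing trimming as a multiplicative factor simply mirrors the text's description of the weight as a product; in practice it caps extreme values.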
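The variant filters, the suggestive-locus rule and the I² heterogeneity statistic described above can be sketched in Python as follows. This is illustrative only. The `Variant` record and function names are hypothetical. I² is computed from Cochran's Q with inverse-variance weights and returned as a fraction rather than a percentage, which is the usual definition but not necessarily METAL's exact code path. The sketch flags every variant meeting the suggestive rule and does not merge neighbouring hits into distinct loci.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int      # base-pair position
    p: float      # discovery association P-value
    maf: float    # minor allele frequency
    hwe_p: float  # Hardy-Weinberg equilibrium P-value
    mac: float    # effective minor allele count


def passes_qc(v: Variant) -> bool:
    """Drop variants with MAF < 0.5%, HWE P < 1e-7 or MAC < 30."""
    return v.maf >= 0.005 and v.hwe_p >= 1e-7 and v.mac >= 30


def suggestive_variants(variants: Sequence[Variant], lead_p: float = 1e-6,
                        support_p: float = 1e-5,
                        window_bp: int = 500_000) -> List[Variant]:
    """QC-passing variants with P < lead_p that have at least one other
    QC-passing variant within +/- window_bp with P < support_p."""
    kept = [v for v in variants if passes_qc(v)]
    hits = []
    for v in kept:
        if v.p >= lead_p:
            continue
        supported = any(
            w is not v and w.chrom == v.chrom
            and abs(w.pos - v.pos) <= window_bp and w.p < support_p
            for w in kept
        )
        if supported:
            hits.append(v)
    return hits


def cochran_q_i2(betas: Sequence[float],
                 ses: Sequence[float]) -> Tuple[float, float]:
    """Cochran's Q and I^2 (fraction in [0, 1]) for one variant's
    per-background-group effect estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2
```

For example, mainland and Caribbean estimates of 0.1 and 0.3 with equal standard errors of 0.05 give Q = 8 and I² = 0.875.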

Local Ancestry Estimation
We estimated local ancestry (African, Native American, and European) using RFMix (78), which applies a conditional-random-field-based approach, to inform differences by background […]

[…] where applicable for each SNP that reached suggestive significance (P < 1 × 10−6) in the discovery analysis. We performed fixed-effects meta-analysis using the inverse variance-weighted method for WHRadjBMI and WCadjBMI. For HIPadjBMI, because of trait transformations, we used sample-size-weighted meta-analysis. All meta-analyses were implemented in METAL (82). We conducted meta-analyses stratified by race/ethnicity group and combined across groups. We included SNPs with a study- and stratum-specific imputation quality (Rsq) greater than 0.4, a Hardy-Weinberg equilibrium P-value greater than 1 × 10−7, and a minor allele count (MAC) greater than five. To declare a locus replicated, we required in each replication sample a trait- and stratum-specific P < 0.05 with a direction of effect consistent with discovery, and genome-wide significance (P < 5 × 10−8) when meta-analyzed together with HCHS/SOL.

[…] We also hypothesized that some regions did not generalize because of lack of power (the HCHS/SOL sample size is much smaller than that of GIANT). To test this, we took all tested SNPs from the non-generalized regions and considered their GIANT multi-ethnic GWAS results. In an iterative procedure, we pruned the list by first identifying the SNP with the lowest GIANT P-value, then removing all other SNPs within a 1 Mb region around it from the list. We repeated this until no SNPs remained. All SNPs in the pruned list were therefore selected solely on the basis of their GIANT P-values. Because there were many such variants, we further grouped them according to their P-values. Groups were formed by trait, sex (men, women, combined), and GIANT P-value (between 10−6 and 10−7, between 10−7 and 10−8, and […]

[…] each sex stratum and each group of SNPs, the value of the GRS was the sum of all trait-increasing alleles in that group. We tested the GRS in the appropriate analysis group (men, women, combined). A low P-value implies that some of the SNPs in the group are likely associated with the trait in HCHS/SOL.
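The replication meta-analysis and the replication criterion described in the methods can be illustrated with the following Python sketch. It is a simplified model and does not reproduce METAL's input handling, allele alignment or filters. It assumes per-study effect estimates and standard errors on the same scale for the inverse variance-weighted analysis, and signed per-study Z-scores aligned to the same allele for the sample-size-weighted analysis, using weights √N as in the standard sample-size scheme. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def two_sided_p(z: float) -> float:
    """Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def ivw_fixed(betas: Sequence[float],
              ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse variance-weighted fixed-effects meta-analysis.
    Returns (pooled beta, pooled SE, P)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, two_sided_p(beta / se)


def sample_size_z(zs: Sequence[float],
                  ns: Sequence[float]) -> Tuple[float, float]:
    """Sample-size-weighted Z meta-analysis: sum(sqrt(N_i) z_i) / sqrt(sum N_i).
    Returns (combined Z, P)."""
    num = sum(math.sqrt(n) * z for z, n in zip(zs, ns))
    z = num / math.sqrt(sum(ns))
    return z, two_sided_p(z)


def replicated(discovery_beta: float, rep_betas: Sequence[float],
               rep_ps: Sequence[float], combined_p: float) -> bool:
    """Every replication sample has P < 0.05 and the same effect direction
    as discovery, and the meta-analysis with discovery has P < 5e-8."""
    same_direction = all(b * discovery_beta > 0 for b in rep_betas)
    nominal = all(p < 0.05 for p in rep_ps)
    return same_direction and nominal and combined_p < 5e-8
```

Inverse-variance weighting needs comparable effect scales across studies. This is presumably why the log10-transformed HIPadjBMI was instead combined on the Z-score scale, where only direction and strength of evidence are pooled.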
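The greedy pruning on GIANT P-values and the unweighted genetic risk score (GRS) described above can be sketched as follows. Several points are assumptions here, not details from the text. The "1 Mb region around" each lead SNP is read as ±500 kb. The sign of the GIANT effect estimate defines which allele is trait-increasing. The `GiantSnp` record, function names and dosages are hypothetical. The P-value binning and the association test of the GRS are not shown, because the list of bins is truncated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class GiantSnp:
    snp_id: str
    chrom: str
    pos: int
    p: float     # GIANT multi-ethnic GWAS P-value
    beta: float  # GIANT effect of the coded allele (sign gives direction)


def prune_by_p(snps: Sequence[GiantSnp],
               half_window_bp: int = 500_000) -> List[GiantSnp]:
    """Repeatedly keep the SNP with the lowest P-value and drop every other
    SNP within +/- half_window_bp of it on the same chromosome."""
    remaining = sorted(snps, key=lambda s: s.p)
    kept: List[GiantSnp] = []
    while remaining:
        lead = remaining[0]
        kept.append(lead)
        remaining = [s for s in remaining[1:]
                     if s.chrom != lead.chrom
                     or abs(s.pos - lead.pos) > half_window_bp]
    return kept


def increasing_allele_count(coded_dosage: float, beta: float) -> float:
    """Count (0-2) of trait-increasing alleles from the coded-allele dosage."""
    return coded_dosage if beta > 0 else 2.0 - coded_dosage


def unweighted_grs(dosages: Dict[str, float],
                   group: Iterable[GiantSnp]) -> float:
    """One person's GRS: sum of trait-increasing alleles over a SNP group.
    `dosages` maps snp_id -> coded-allele dosage (KeyError if missing)."""
    return sum(increasing_allele_count(dosages[s.snp_id], s.beta)
               for s in group)
```

Because the score is an unweighted allele count, each SNP in a group contributes equally. That matches the text's use of the GRS as a joint test of whether any SNPs in a group are associated in HCHS/SOL, rather than as a predictor.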