Evidence From Men for Ovary-independent Effects of Genetic Risk Factors for Polycystic Ovary Syndrome

Abstract
Context: Polycystic ovary syndrome (PCOS) is characterized by ovulatory dysfunction and hyperandrogenism and can be associated with cardiometabolic dysfunction, but it remains unclear which of these features are inciting causes and which are secondary consequences.
Objective: To determine whether ovarian function is necessary for genetic risk factors for PCOS to produce nonreproductive phenotypes.
Design, Setting, and Participants: Cohort of 176 360 men in the UK Biobank and replication cohort of 37 348 men in the Estonian Biobank.
Main Outcome Measures: We calculated individual PCOS polygenic risk scores (PRS), tested for association of these PRS with PCOS-related phenotypes using linear and logistic regression, and performed mediation analysis.
Results: For every 1 SD increase in the PCOS PRS, men had increased odds of obesity (odds ratio [OR]: 1.09; 95% CI, 1.08-1.10; P = 1 × 10⁻⁴⁹), type 2 diabetes mellitus (T2DM) (OR: 1.08; 95% CI, 1.05-1.10; P = 3 × 10⁻¹²), coronary artery disease (CAD) (OR: 1.03; 95% CI, 1.01-1.04; P = 0.0029), and marked androgenic alopecia (OR: 1.03; 95% CI, 1.02-1.05; P = 3 × 10⁻⁵). Body mass index (BMI), hemoglobin A1c, triglycerides, and free androgen index increased as the PRS increased, whereas high-density lipoprotein cholesterol and SHBG decreased (all P < .0001). The association between the PRS and CAD appeared to be completely mediated by BMI, whereas the associations with T2DM and marked androgenic alopecia appeared to be partially mediated by BMI.
Conclusions: Genetic risk factors for PCOS have phenotypic consequences in men, indicating that they can act independently of ovarian function. Thus, PCOS in women may not always be a primary disorder of the ovaries.


UK Biobank cohort
Participants underwent genotyping with one of two closely related arrays, and additional genotypes were imputed using the Haplotype Reference Consortium (HRC) data as the main reference panel (1). Analyses were conducted on genetic data release version 3 under UKB application 11898. All work complied with all relevant ethical regulations related to the UK Biobank, and all participants provided informed consent.
Genetic principal components were derived on a subset of unrelated individuals of European ancestry (randomly selected without regard to PCOS case or control status) in the UK Biobank using FlashPCA2 (2). Because the largest PCOS GWAS meta-analysis was primarily conducted in individuals of European ancestry, and genetic risk scores calculated with current methodologies are less valid when applied to populations of different ancestry due to differences in genetic architecture (3,4), the present analysis was restricted to individuals of European ancestry, which was determined by a combination of self-reported ancestry and principal components analysis (1,5). Related individuals were identified by genetic data, and kinship coefficients were reported for all pairs of relatives who were inferred to be third-degree relatives or closer (1). To minimize the possibility of confounding due to genetic relatedness, only one individual was included per group of relatives (third-degree or closer). If any individual in a group had a diagnosis of PCOS, the individual with PCOS was preferentially included in the analysis to optimize the prevalence of PCOS.
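The relatedness filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that a "group of relatives" is a connected component of the reported kinship pairs, and it breaks ties by sample ID, a rule the text does not specify. The function name and inputs are hypothetical.

```python
from collections import defaultdict

def select_unrelated(sample_ids, related_pairs, pcos_cases):
    """Keep one individual per group of relatives, preferring PCOS cases.

    related_pairs: (id_a, id_b) pairs inferred to be third-degree or closer.
    Groups are taken to be connected components of these pairs (assumption).
    """
    parent = {s: s for s in sample_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for s in sample_ids:
        groups[find(s)].append(s)

    kept = []
    for members in groups.values():
        # PCOS cases sort first; remaining ties are broken by ID (assumption).
        members.sort(key=lambda s: (s not in pcos_cases, s))
        kept.append(members[0])
    return sorted(kept)
```

For example, with samples a to e, related pairs (a, b) and (b, c), and c as the only PCOS case, the sketch keeps c, d, and e.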

Estonian Biobank replication cohort
Analyses in the Estonian Biobank (EstBB) were done under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research and data release N05 from the EstBB. The 200K data freeze was used for analyses. Genotyping for all participants was completed at the Core Genotyping Lab of the Institute of Genomics, University of Tartu, using Illumina GSAv1.0, GSAv2.0, and GSAv2.0_EST arrays. PLINK format files were created using Illumina GenomeStudio v2.0.4. Individuals were excluded from the analysis if their call rate was <95% or if sex defined by heterozygosity of the X chromosome did not match the sex in the phenotype data. Before imputation, variants were excluded if they had a call rate <95%, a Hardy-Weinberg equilibrium (HWE) P-value < 1 × 10⁻⁴ (autosomal variants only), or a minor allele frequency <1%. Variant positions were updated to b37, and all variants were changed to the TOP strand orientation using GSAMD-24v1-0_20011747_A1-b37.strand.RefAlt.zip files from https://www.well.ox.ac.uk/~wrayner/strand/. Prephasing was completed using Eagle v2.3 software, with the number of conditioning haplotypes that Eagle2 uses when phasing each sample set to --Kpbwt=20000 (6), and imputation was performed using Beagle v.28Sep18.793 with effective population size ne=20,000 (7).
A population-specific imputation reference panel of 2297 whole-genome sequencing (WGS) samples was used (8). Genetic principal components were derived on the relatedness matrix of Estonian Biobank participants. The cohort was restricted to men of European ancestry, which was determined by a combination of self-reported ancestry and principal components analysis (1,5). Related individuals were identified by genetic data, and kinship coefficients were reported for all pairs of relatives who were inferred to be third-degree relatives or closer.
For ascertainment of phenotypes, ICD codes were obtained via linking with the national Health Insurance Fund and other relevant databases (9).
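The sample and variant exclusion thresholds stated above for the Estonian Biobank can be written as simple predicates. This is only an illustrative sketch: the function and argument names are hypothetical, and in practice such filters are applied with genotype-processing software rather than per-record Python calls.

```python
def passes_sample_qc(call_rate, genetic_sex, reported_sex):
    """Sample-level filter: call rate of at least 95% and concordant sex."""
    return call_rate >= 0.95 and genetic_sex == reported_sex

def passes_variant_qc(call_rate, maf, hwe_p, is_autosomal):
    """Pre-imputation variant filter using the thresholds given in the text."""
    if call_rate < 0.95:   # variant call rate < 95%
        return False
    if maf < 0.01:         # minor allele frequency < 1%
        return False
    if is_autosomal and hwe_p < 1e-4:  # HWE test applied to autosomes only
        return False
    return True
```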

PCOS polygenic risk score calculation
We compared three methods for calculating a polygenic risk score (PRS) for PCOS. Genetic variants with ambiguous strand (A/T or C/G) were removed from all polygenic risk score calculations. PRSice-2 uses a "clumping and thresholding" approach to clump genetic variants in close linkage disequilibrium, such that the remaining variants are independent of each other, and includes only those variants with a GWAS association P-value below a given threshold, with the threshold chosen to maximize the association of the risk score with PCOS (10). The PRSice-2 software also incorporates both directly

OUTCOMES IN MEN
Cardiometabolic phenotypes: Obesity was defined as a BMI ≥ 30 kg/m².
Coronary artery disease was based on a composite of myocardial infarction, ischemic heart disease, and/or coronary revascularization (12). Myocardial infarction was defined by self-report, a reported date of myocardial infarction, and/or ICD-9 codes of 410.9, diagnosis codes from hospitalization records, medication use, and age at diagnosis (13,14). Those with age of diabetes onset less than 40 years were excluded. Controls were individuals aged 55 years or greater with no history of diabetes.
Hyperandrogenic phenotypes: Marked androgenic alopecia was based on self-report of hair/balding pattern on a questionnaire based on the Norwood-Hamilton scale, a classification system of male pattern baldness ranging from stage 1 (no baldness) to stage 7 (complete baldness) (15,16). Because a self-report of stages 2 to 4 may represent variations in hairline and/or hair volume rather than true androgenic alopecia, cases were individuals who self-reported marked baldness (stage 5+), and controls were individuals who self-reported no sign of baldness (stage 1) (15). The free androgen index (FAI), a measure of androgen production, was calculated by dividing the total testosterone level (nmol/L) by the sex hormone-binding globulin (SHBG) level (nmol/L) and multiplying by the constant 100.
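The two hyperandrogenic definitions above reduce to a short calculation and a classification rule, sketched here in Python. Function names are hypothetical, and the sketch assumes that individuals reporting stages 2 to 4 are left out of the case-control comparison, as the case and control definitions imply.

```python
def free_androgen_index(total_testosterone_nmol_l, shbg_nmol_l):
    """FAI = 100 * total testosterone / SHBG, both in nmol/L."""
    return 100.0 * total_testosterone_nmol_l / shbg_nmol_l

def classify_alopecia(norwood_hamilton_stage):
    """Map a self-reported stage (1 to 7) to 'case', 'control', or None."""
    if norwood_hamilton_stage >= 5:
        return "case"      # marked baldness (stage 5+)
    if norwood_hamilton_stage == 1:
        return "control"   # no sign of baldness
    return None            # stages 2 to 4: neither case nor control
```

For instance, a total testosterone of 15 nmol/L with an SHBG of 30 nmol/L gives an FAI of 50.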

Statistical analysis of BMI as a mediator
Causal mediation analysis was used to assess the indirect effect of BMI on the association between the outcome of interest and the PRS. The significance of the indirect effect of BMI was tested using bootstrapping procedures. Indirect effects were computed for 1,000 bootstrapped samples, and the 95% CI was computed. A P-value < 0.05 was considered statistically significant.
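The bootstrap step can be illustrated with a simplified Python sketch. It substitutes a linear product-of-coefficients estimator for the causal mediation framework used in the study, so it applies only to a continuous outcome and is not the authors' implementation. It also uses a percentile interval, which is an assumption, since the text does not state the interval type. Here x stands for the PRS, m for BMI, and y for the outcome; all function names are hypothetical.

```python
import random

def _sxy(u, v):
    """Sum of cross-products of deviations from the mean."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def indirect_effect(x, m, y):
    """Product of coefficients a*b: a from m ~ x, b from y ~ x + m."""
    sxx, smm, sxm = _sxy(x, x), _sxy(m, m), _sxy(x, m)
    sxy, smy = _sxy(x, y), _sxy(m, y)
    a = sxm / sxx                                         # effect of x on m
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)  # m on y, given x
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under this simplification, an interval that excludes zero corresponds to a statistically significant indirect effect at the chosen alpha. Very small samples can yield degenerate resamples with no variation in x, which the sketch does not guard against.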
Polygenic risk scores were calculated in 206,852 unrelated women of European ancestry ("All") and in a subcohort of 50,612 women who were ≤50 years of age with no history of menopause ("Subcohort") in the UK Biobank. The first two scores were calculated using the PRSice-2 software using independent variants (R² < 0.1 with all other variants) with automatically optimized P-value thresholds for the PCOS phenotype. The remaining 15 scores were calculated using the PRS-CS algorithm, a Bayesian approach that weights the effect size of each variant based on the level of statistical significance in the GWAS and a tuning parameter φ based on the underlying genetic architecture. Because the PRS-CS algorithm does not incorporate dosage information on imputed genotypes, additional scores were calculated using a modified algorithm using PRS-CS to generate modified