-
PDF
- Split View
-
Views
-
Cite
Cite
Matthew Dapas, Ryan Sisk, Richard S Legro, Margrit Urbanek, Andrea Dunaif, M Geoffrey Hayes, Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome, The Journal of Clinical Endocrinology & Metabolism, Volume 104, Issue 9, September 2019, Pages 3835–3850, https://doi.org/10.1210/jc.2018-02496
- Share Icon Share
Abstract
Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date.
The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis.
We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis.
We found rare variants in DENND1A (P = 5.31 × 10−5, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families.
Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.
Polycystic ovary syndrome (PCOS) is a common endocrine disorder affecting 5% to 15% of premenopausal women worldwide (1), depending on the diagnostic criteria applied. PCOS is diagnosed by two or more of its reproductive features of hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. It is frequently associated with insulin resistance and pancreatic β-cell dysfunction, making it a leading risk factor for type 2 diabetes in young women (2).
PCOS is a highly heritable complex genetic disorder. Analogous to other complex traits (3), common susceptibility loci identified in genome-wide association studies (GWASs) (4–8) account for only a small proportion of the estimated genetic heritability of PCOS (9). As GWASs were designed to assess common allelic variants, usually with minor allele frequencies (MAFs) ≥2% to 5%, it has been proposed that less frequently occurring variants with greater effect sizes account for the observed deficit in heritability (10). Next-generation sequencing approaches have identified rare variants that contribute to complex disease pathogenesis (11–18).
We tested the hypothesis that rare variants contribute to PCOS by conducting family-based association analyses using whole-genome sequencing data. We filtered and weighted rare variants (MAF ≤2%) according to their predicted levels of deleteriousness and grouped them regionally and by genes. We not only tested for associations with PCOS diagnosis, but we also tested for association with its quantitative reproductive and metabolic trait levels: testosterone (T), dehydroepiandrosterone sulfate (DHEAS), LH, FSH, SHBG, and fasting insulin. The quantitative trait phenotypes were combined using a meta-analysis approach to identify sets of rare variants that associate with altered hormonal levels in PCOS.
Materials and Methods
Subjects
This study included 261 individuals from 62 families with PCOS who were whites of European ancestry. Families were ascertained by an index case who fulfilled the National Institutes of Health criteria for PCOS (19). Each family consisted of at least a proband–parent trio. Brothers were not included in this study. Phenotypic data and some genetic analyses on these subjects have been previously reported (20–22). The study was approved by the Institutional Review Boards of Northwestern University Feinberg School of Medicine, Penn State Health Milton S. Hershey Medical Center, and Brigham and Women’s Hospital. Written informed consent was obtained from all subjects prior to the study.
Women were ages 14 to 63 years, in good health, and had both ovaries and a uterus. They were not taking medications known to alter reproductive or metabolic hormone levels for at least 1 month prior to the study. Exogenous gonadal steroid administration was discontinued at least 3 months prior to the study. Thyroid, pituitary, and adrenal disorders were excluded by appropriate tests (23). Women were considered to be of reproductive age when they were between the ages of at least 2 years postmenarche and 45 years old, and had FSH levels ≤40 mIU/mL. Hyperandrogenemia was defined by elevated levels of T (>58 ng/dL), non-SHBG bound T (>15 ng/dL), and/or DHEAS (>2683 ng/mL), as established in Legro et al. (23). Ovarian dysfunction was defined as six or fewer menses per year. Ovarian morphology was not assessed because it does not correlate with the endocrine phenotype (6, 24).
Phenotypes in reproductive-age women were assigned as previously reported (23). Reproductive-age women with hyperandrogenemia and ovarian dysfunction were assigned a PCOS phenotype. Reproductive-age women with normal androgen levels and regular menses (every 27 to 35 days) were assigned an unaffected phenotype. Reproductive-age women with hyperandrogenemia and regular menses were assigned a hyperandrogenemic (HA) phenotype. Because androgen levels do not decrease during the menopausal transition (25, 26), women between 46 and 63 years of age with hyperandrogenemia were also assigned the HA phenotype, regardless of menstrual cycle pattern. One index case fulfilled the criteria for PCOS when she was 45 years of age but she was 46 years of age when enrolled in the study. She was confirmed to have persistent hyperandrogenemia and ovarian dysfunction. Women with either the PCOS or HA phenotype were considered affected, as we have done in our previous linkage (24) and family-based association testing (27) studies. In the current study, we also included HA women between 46 and 63 years of age as affected. Women with normal androgen levels who were not of reproductive age and all fathers in the study were not assigned a phenotype.
The quantitative trait analysis examined associations between rare variants and T, DHEAS, SHBG, LH, FSH, and insulin levels. In addition to the women included in the dichotomous trait analysis, women were included in the quantitative trait association testing as follows. Women 46 to 72 years old were included in the analyses for T and DHEAS because androgen levels do not change during the menopausal transition (25, 26). These women were also included in the analysis of SHBG and insulin. Women with bilateral oophorectomy (n = 10) not receiving postmenopausal hormone therapy were included in the analysis of the adrenal androgen, DHEAS, and insulin (28, 29). We compared hormonal traits in women receiving postmenopausal hormone therapy (n = 15) to women from the cohort of comparable age who were not receiving hormonal therapy (n = 10). Only SHBG levels differed significantly. Therefore, women receiving postmenopausal hormone therapy were included in the analyses of T and DHEAS. They were not included in the insulin analysis because of the effect of estrogen on circulating insulin levels (30). No additional women were included in the LH or FSH analyses.
Fathers were included in the insulin level association test because there are not sex differences in this parameter (31). Subjects receiving glucocorticoids (men, 0; women, 2) were excluded from all quantitative trait association tests. Subjects receiving antidiabetic medications (men, 6; women, 7) were excluded from the T, SHBG, and insulin analyses but were included in the DHEAS analysis (32). Subjects with type 2 diabetes not receiving medications (men, 3; women, 7) were excluded from the insulin analysis, but the women were included in the T, DHEAS, and SHBG analyses.
Reference ranges for hormonal parameters from concurrently studied reproductively normal control subjects of comparable age, sex, body mass index (BMI), and ancestry (6, 33, 34) are included in Table 1. All control subjects had normal glucose tolerance with a 75 g oral glucose tolerance test (35). Reproductive-age control women were 18 to 45 years old with regular menses, FSH <40 mIU/mL, and normal androgen levels. Older control women were 46 to 65 years of age with a history of regular menses and normal androgen levels. Control men were 46 to 65 years of age. The hormonal data from control subjects in Table 1 were provided to depict normal reference ranges and were not used for any statistical comparisons.
Subject Type . | N . | Age (y)a . | N . | BMI (kg/m2) . | N . | Testosterone (ng/dL) . | N . | SHBG (nmol/L) . | N . | DHEAS (ng/mL) . | N . | Insulin (μU/mL) . | N . | LH (mIU/mL) . | N . | FSH (mIU/mL) . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index cases | 62 | 29 (19–48) | 62 | 35.6 (27.9–42.7) | 62 | 74 (65–90) | 59 | 56 (36–92) | 61 | 2095 (1638–2710) | 62 | 21 (14–29) | 59 | 14 (7–20) | 59 | 9 (8–11) |
Fathers | 62 | 58 (43–85) | 61 | 28.9 (27.0–32.2) | — | — | — | 49 | 17 (11–26) | — | — | |||||
Mothers | ||||||||||||||||
HA | 6 | 51 (40–63) | 6 | 34.5 (25.2–38.2) | 5 | 51 (49–72) | 5 | 117 (105–135) | 6 | 2630 (1264–2694) | 3 | 19 (15–22) | — | — | ||
Unaffected | 2 | 44 (42–45) | 2 | 24.0 (20.6–27.4) | 2 | 32 (27–37) | 2 | 280 (218–342) | 2 | 1081 (901–1260) | 2 | 12 (8–15) | 2 | 7 (6–7) | 2 | 8 (6–10) |
Age >45 y | 54 | 57 (46–72) | 53 | 28.6 (26.6–35.5) | 36 | 26 (16–34) | 25 | 93 (74–166) | 50 | 645 (501–993) | 28 | 19 (11–25) | — | — | ||
Sisters | ||||||||||||||||
PCOS | 10 | 29 (19–36) | 10 | 32.3 (29.0–37.3) | 10 | 63 (49–74) | 10 | 55 (45–104) | 10 | 1665 (807–2774) | 10 | 28 (23–32) | 10 | 10 (5–15) | 10 | 11 (10–12) |
HA | 5 | 32 (15–40) | 5 | 22.3 (22.3–28.5) | 5 | 67 (54–68) | 5 | 126 (69–177) | 5 | 2047 (1509–3775) | 5 | 15 (11–17) | 4 | 4 (3–18) | 4 | 10 (7–14) |
Unaffected | 57 | 32 (14–45) | 57 | 25.1 (22.2–29.3) | 57 | 30 (23–35) | 55 | 128 (90–181) | 57 | 1408 (1028–1814) | 55 | 12 (9–15) | 55 | 5 (3–10) | 55 | 10 (7–12) |
Age >45 y | 3 | 47 (46–49) | 3 | 26.4 (24.5–32.1) | 3 | 30 (20–48) | 3 | 137 (88–150) | 3 | 772 (765–1588) | 3 | 13 (12–15) | — | — | ||
Reference ranges | ||||||||||||||||
Reproductive–aged women | 346 | 30 (25–35) | 346 | 28.5 (23.0–35.4) | 227 | 29 (21–37) | 188 | 100 (70–144) | 226 | 1357 (1018–1756) | 185 | 12 (10–16) | 173 | 4 (3–8) | 173 | 9 (7–12) |
Men | 55 | 53 (50–57) | 55 | 28.0 (25.9–31.0) | — | — | — | 46 | 14 (10–17) | — | — | |||||
Older women (46 to 65 y) | 69 | 56 (52–60) | 69 | 26.4 (23.2–29.6) | 69 | 23 (15–30) | 60 | 102 (75–156) | 66 | 645 (487–1016) | 69 | 13 (10–17) | 48 | 50 (35–69) | 49 | 80 (57–100) |
Subject Type . | N . | Age (y)a . | N . | BMI (kg/m2) . | N . | Testosterone (ng/dL) . | N . | SHBG (nmol/L) . | N . | DHEAS (ng/mL) . | N . | Insulin (μU/mL) . | N . | LH (mIU/mL) . | N . | FSH (mIU/mL) . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index cases | 62 | 29 (19–48) | 62 | 35.6 (27.9–42.7) | 62 | 74 (65–90) | 59 | 56 (36–92) | 61 | 2095 (1638–2710) | 62 | 21 (14–29) | 59 | 14 (7–20) | 59 | 9 (8–11) |
Fathers | 62 | 58 (43–85) | 61 | 28.9 (27.0–32.2) | — | — | — | 49 | 17 (11–26) | — | — | |||||
Mothers | ||||||||||||||||
HA | 6 | 51 (40–63) | 6 | 34.5 (25.2–38.2) | 5 | 51 (49–72) | 5 | 117 (105–135) | 6 | 2630 (1264–2694) | 3 | 19 (15–22) | — | — | ||
Unaffected | 2 | 44 (42–45) | 2 | 24.0 (20.6–27.4) | 2 | 32 (27–37) | 2 | 280 (218–342) | 2 | 1081 (901–1260) | 2 | 12 (8–15) | 2 | 7 (6–7) | 2 | 8 (6–10) |
Age >45 y | 54 | 57 (46–72) | 53 | 28.6 (26.6–35.5) | 36 | 26 (16–34) | 25 | 93 (74–166) | 50 | 645 (501–993) | 28 | 19 (11–25) | — | — | ||
Sisters | ||||||||||||||||
PCOS | 10 | 29 (19–36) | 10 | 32.3 (29.0–37.3) | 10 | 63 (49–74) | 10 | 55 (45–104) | 10 | 1665 (807–2774) | 10 | 28 (23–32) | 10 | 10 (5–15) | 10 | 11 (10–12) |
HA | 5 | 32 (15–40) | 5 | 22.3 (22.3–28.5) | 5 | 67 (54–68) | 5 | 126 (69–177) | 5 | 2047 (1509–3775) | 5 | 15 (11–17) | 4 | 4 (3–18) | 4 | 10 (7–14) |
Unaffected | 57 | 32 (14–45) | 57 | 25.1 (22.2–29.3) | 57 | 30 (23–35) | 55 | 128 (90–181) | 57 | 1408 (1028–1814) | 55 | 12 (9–15) | 55 | 5 (3–10) | 55 | 10 (7–12) |
Age >45 y | 3 | 47 (46–49) | 3 | 26.4 (24.5–32.1) | 3 | 30 (20–48) | 3 | 137 (88–150) | 3 | 772 (765–1588) | 3 | 13 (12–15) | — | — | ||
Reference ranges | ||||||||||||||||
Reproductive–aged women | 346 | 30 (25–35) | 346 | 28.5 (23.0–35.4) | 227 | 29 (21–37) | 188 | 100 (70–144) | 226 | 1357 (1018–1756) | 185 | 12 (10–16) | 173 | 4 (3–8) | 173 | 9 (7–12) |
Men | 55 | 53 (50–57) | 55 | 28.0 (25.9–31.0) | — | — | — | 46 | 14 (10–17) | — | — | |||||
Older women (46 to 65 y) | 69 | 56 (52–60) | 69 | 26.4 (23.2–29.6) | 69 | 23 (15–30) | 60 | 102 (75–156) | 66 | 645 (487–1016) | 69 | 13 (10–17) | 48 | 50 (35–69) | 49 | 80 (57–100) |
Traits reported as count, median (25th to 75th percentiles). Values reported here are unadjusted. Trait values were adjusted and normalized for variant association testing.
Age reported as count, median (minimum to maximum).
Subject Type . | N . | Age (y)a . | N . | BMI (kg/m2) . | N . | Testosterone (ng/dL) . | N . | SHBG (nmol/L) . | N . | DHEAS (ng/mL) . | N . | Insulin (μU/mL) . | N . | LH (mIU/mL) . | N . | FSH (mIU/mL) . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index cases | 62 | 29 (19–48) | 62 | 35.6 (27.9–42.7) | 62 | 74 (65–90) | 59 | 56 (36–92) | 61 | 2095 (1638–2710) | 62 | 21 (14–29) | 59 | 14 (7–20) | 59 | 9 (8–11) |
Fathers | 62 | 58 (43–85) | 61 | 28.9 (27.0–32.2) | — | — | — | 49 | 17 (11–26) | — | — | |||||
Mothers | ||||||||||||||||
HA | 6 | 51 (40–63) | 6 | 34.5 (25.2–38.2) | 5 | 51 (49–72) | 5 | 117 (105–135) | 6 | 2630 (1264–2694) | 3 | 19 (15–22) | — | — | ||
Unaffected | 2 | 44 (42–45) | 2 | 24.0 (20.6–27.4) | 2 | 32 (27–37) | 2 | 280 (218–342) | 2 | 1081 (901–1260) | 2 | 12 (8–15) | 2 | 7 (6–7) | 2 | 8 (6–10) |
Age >45 y | 54 | 57 (46–72) | 53 | 28.6 (26.6–35.5) | 36 | 26 (16–34) | 25 | 93 (74–166) | 50 | 645 (501–993) | 28 | 19 (11–25) | — | — | ||
Sisters | ||||||||||||||||
PCOS | 10 | 29 (19–36) | 10 | 32.3 (29.0–37.3) | 10 | 63 (49–74) | 10 | 55 (45–104) | 10 | 1665 (807–2774) | 10 | 28 (23–32) | 10 | 10 (5–15) | 10 | 11 (10–12) |
HA | 5 | 32 (15–40) | 5 | 22.3 (22.3–28.5) | 5 | 67 (54–68) | 5 | 126 (69–177) | 5 | 2047 (1509–3775) | 5 | 15 (11–17) | 4 | 4 (3–18) | 4 | 10 (7–14) |
Unaffected | 57 | 32 (14–45) | 57 | 25.1 (22.2–29.3) | 57 | 30 (23–35) | 55 | 128 (90–181) | 57 | 1408 (1028–1814) | 55 | 12 (9–15) | 55 | 5 (3–10) | 55 | 10 (7–12) |
Age >45 y | 3 | 47 (46–49) | 3 | 26.4 (24.5–32.1) | 3 | 30 (20–48) | 3 | 137 (88–150) | 3 | 772 (765–1588) | 3 | 13 (12–15) | — | — | ||
Reference ranges | ||||||||||||||||
Reproductive–aged women | 346 | 30 (25–35) | 346 | 28.5 (23.0–35.4) | 227 | 29 (21–37) | 188 | 100 (70–144) | 226 | 1357 (1018–1756) | 185 | 12 (10–16) | 173 | 4 (3–8) | 173 | 9 (7–12) |
Men | 55 | 53 (50–57) | 55 | 28.0 (25.9–31.0) | — | — | — | 46 | 14 (10–17) | — | — | |||||
Older women (46 to 65 y) | 69 | 56 (52–60) | 69 | 26.4 (23.2–29.6) | 69 | 23 (15–30) | 60 | 102 (75–156) | 66 | 645 (487–1016) | 69 | 13 (10–17) | 48 | 50 (35–69) | 49 | 80 (57–100) |
Subject Type . | N . | Age (y)a . | N . | BMI (kg/m2) . | N . | Testosterone (ng/dL) . | N . | SHBG (nmol/L) . | N . | DHEAS (ng/mL) . | N . | Insulin (μU/mL) . | N . | LH (mIU/mL) . | N . | FSH (mIU/mL) . |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index cases | 62 | 29 (19–48) | 62 | 35.6 (27.9–42.7) | 62 | 74 (65–90) | 59 | 56 (36–92) | 61 | 2095 (1638–2710) | 62 | 21 (14–29) | 59 | 14 (7–20) | 59 | 9 (8–11) |
Fathers | 62 | 58 (43–85) | 61 | 28.9 (27.0–32.2) | — | — | — | 49 | 17 (11–26) | — | — | |||||
Mothers | ||||||||||||||||
HA | 6 | 51 (40–63) | 6 | 34.5 (25.2–38.2) | 5 | 51 (49–72) | 5 | 117 (105–135) | 6 | 2630 (1264–2694) | 3 | 19 (15–22) | — | — | ||
Unaffected | 2 | 44 (42–45) | 2 | 24.0 (20.6–27.4) | 2 | 32 (27–37) | 2 | 280 (218–342) | 2 | 1081 (901–1260) | 2 | 12 (8–15) | 2 | 7 (6–7) | 2 | 8 (6–10) |
Age >45 y | 54 | 57 (46–72) | 53 | 28.6 (26.6–35.5) | 36 | 26 (16–34) | 25 | 93 (74–166) | 50 | 645 (501–993) | 28 | 19 (11–25) | — | — | ||
Sisters | ||||||||||||||||
PCOS | 10 | 29 (19–36) | 10 | 32.3 (29.0–37.3) | 10 | 63 (49–74) | 10 | 55 (45–104) | 10 | 1665 (807–2774) | 10 | 28 (23–32) | 10 | 10 (5–15) | 10 | 11 (10–12) |
HA | 5 | 32 (15–40) | 5 | 22.3 (22.3–28.5) | 5 | 67 (54–68) | 5 | 126 (69–177) | 5 | 2047 (1509–3775) | 5 | 15 (11–17) | 4 | 4 (3–18) | 4 | 10 (7–14) |
Unaffected | 57 | 32 (14–45) | 57 | 25.1 (22.2–29.3) | 57 | 30 (23–35) | 55 | 128 (90–181) | 57 | 1408 (1028–1814) | 55 | 12 (9–15) | 55 | 5 (3–10) | 55 | 10 (7–12) |
Age >45 y | 3 | 47 (46–49) | 3 | 26.4 (24.5–32.1) | 3 | 30 (20–48) | 3 | 137 (88–150) | 3 | 772 (765–1588) | 3 | 13 (12–15) | — | — | ||
Reference ranges | ||||||||||||||||
Reproductive–aged women | 346 | 30 (25–35) | 346 | 28.5 (23.0–35.4) | 227 | 29 (21–37) | 188 | 100 (70–144) | 226 | 1357 (1018–1756) | 185 | 12 (10–16) | 173 | 4 (3–8) | 173 | 9 (7–12) |
Men | 55 | 53 (50–57) | 55 | 28.0 (25.9–31.0) | — | — | — | 46 | 14 (10–17) | — | — | |||||
Older women (46 to 65 y) | 69 | 56 (52–60) | 69 | 26.4 (23.2–29.6) | 69 | 23 (15–30) | 60 | 102 (75–156) | 66 | 645 (487–1016) | 69 | 13 (10–17) | 48 | 50 (35–69) | 49 | 80 (57–100) |
Traits reported as count, median (25th to 75th percentiles). Values reported here are unadjusted. Trait values were adjusted and normalized for variant association testing.
Age reported as count, median (minimum to maximum).
Hormone assays
T, DHEAS, SHBG, LH, FSH, and insulin levels were measured as previously reported (6).
Whole-genome sequencing
Genomic DNA was isolated from whole blood samples using the Gentra Puregene Blood Kit (Qiagen, Valencia, CA). Whole-genome sequencing was performed by Complete Genomics, Inc. (CGI) using their proprietary sequencing technology. Their sequencing platform used high-density nanoarrays populated with amplified DNA clusters called DNA nanoballs. DNA was read using a novel, iterative hybridization and ligation approach, which produces paired-end reads up to 70 bases in length (36). CGI’s sequencing service included sample quality control, library construction, whole-genome DNA sequencing, and variant calling. Reads were mapped to the National Center for Biotechnology Information Build 37.2 reference genome (GRCh37).
Variant calling
Called variant filtering
Variants were considered rare when they appeared with a frequency ≤2% in the Scripps Wellderly Genome Resource, which consisted of 597 unrelated participants of European ancestry from the Scripps Wellderly Study (39). The Wellderly study population is composed entirely of elderly individuals ≥80 years of age with no history of chronic disease. Within each sequenced family, reported variants that were inconsistent with Mendelian patterns of inheritance were removed from consideration. A variant was considered consistent with Mendelian inheritance when it was called (VQHIGH) in one or more of the offspring and in at least one parent. The vast majority of DNA sequencing errors can be eliminated using Mendelian inheritance analysis (40). As previously described (41–44), an additional set of filters was applied to called variants for each sample to minimize the number of false-positive calls: (i) variants with VQLOW allele tags were removed; (ii) variants in microsatellite regions were removed; (iii) variants within simple tandem repeat regions were removed; (iv) three or more single-nucleotide polymorphisms (SNPs) clustered within a distance of 10 bp were removed; (v) SNPs located within 10 bp of an insertion or deletion (indel) were removed; (vi) calls located within known regions of segmental duplication were removed; and (vii) calls with an observed read depth greater than threefold the average read depth (>168) were removed.
Selection of optimal read depth and quality score thresholds
Assessing deleteriousness
After filtering for rare variants called with high confidence, as described above, variants were further characterized according to their predicted effects. Only variants that were predicted to have a deleterious effect were included in the association testing. Deleteriousness was primarily assessed using the Combined Annotation Dependent Depletion (CADD) tool (46), which is trained on output from numerous annotation programs to predict deleteriousness based on conservation relative to our ancestral genome. Variants were retained when they produced a CADD score greater than the gene-specific mutation significance cutoff (MSC; 95% CI) suggested by Itan et al. (47). For variants outside of gene regions or in inconsistently annotated genes, the mean MSC (MSCμ = 12.889) was used as the cutoff. In a further effort to reduce false positives, only coding variants classified as at least possibly damaging by the PolyPhen2 (48) or SIFT (49) variant effect prediction tools and noncoding variants with LINSIGHT (50) scores ≥0.8 were included in our analysis, as integrating individual methods can improve variant effect prediction (47, 51–53). CADD and LINSIGHT are both primarily based on evolutionary conservation, and they therefore carry certain limitations (54). However, the applicability of prediction tools based on functional annotations from specific cell types (55, 56) is extremely limited for PCOS because its pathophysiology involves numerous cell types across multiple organs (57), and annotations for relevant cell types are largely unavailable (58–61).
Genome-wide rare variant association testing
Variants were then grouped for association testing using both gene-based and sliding window approaches. Gene regions included all coding and noncoding DNA from the 3′ untranslated region to 7.5 kb upstream of the 5′ transcriptional start site for all RefSeq annotated genes. The sliding window approach included all noncoding variation contained within sequential windows across the genome using three different window sizes: 10-kb windows with no overlap, 25-kb windows with 12.5-kb overlap, and 100-kb windows with 75-kb overlap.
In rare variant association testing, genes are often filtered according to a minimum number of rare variants detected per gene (62–65) or a minimum cumulative variant frequency (CVF) per gene (66–69) to increase power to detect disease associations. In this study, genes and windows were removed from consideration when they did not harbor deleterious variants in at least 10% of affected subjects (70, 71). For the gene-based test, to reduce the resultant bias toward larger genes, the observed CVFs were adjusted for gene length. Coding and noncoding CVFs were modeled separately using linear regressions against the coding sequence length (72, 73) and the square root of the noncoding sequence length (74, 75), respectively. The models also accounted for gene-level guanine-cytosine content. The CVFs observed in affected subjects for each gene were then adjusted accordingly prior to applying the 10% threshold. Adjusting for gene length in this manner reduced the risk for erroneous associations, as genes with longer open reading frames are more likely to be reported falsely as disease-associated (76).
Six quantitative traits, that is, T, DHEAS, LH, FSH, SHBG, and insulin levels, were tested for association. The trait distributions were each positively skewed. Testing against skewed trait distributions can result in heavily inflated type I error rates in rare variant association testing (78). Therefore, each trait was modeled against a gamma distribution when adjusting for age and BMI, and nonnormal residuals (Shapiro–Wilk <0.05) were then further normalized using a rank-based inverse normal transformation (INT). The INT has been found to be the optimal method for maintaining type I error control without sacrificing power in rare variant association testing on nonnormally distributed traits (79).
Quantitative trait meta-analysis
For complex diseases with multivariate phenotypes, combining multiple related phenotypic traits into one analysis can increase power in finding disease associations (80, 81). We combined the six aforementioned quantitative trait associations into one meta-statistic using Fisher’s combination function modified to account for correlated traits (82). Intertrait correlations were determined using the Spearman correlation coefficient (45). P values were adjusted using a conservative Bonferroni correction according to the number of variant groupings that were tested that contained at least one variant across both the dichotomous and quantitative trait associations, as well as by the genomic inflation factor, λ. Nominal P values are reported herein as Pn, and corrected P values are reported as Pc. Correlation between meta-analysis association results and dichotomous trait association results was calculated using a Spearman coefficient and tested using a two-tailed t test.
The characteristic disturbance of gonadotropin secretion associated with PCOS is increased LH relative to FSH release (83). For genes with significant meta-analysis associations, LH/FSH ratios were compared between variant carriers and noncarriers using a Wilcoxon rank sum test, adjusted for multiple testing (Bonferroni). Differences in LH/FSH ratios between variant carriers and noncarriers would indicate that the gene variants alter gonadotropin signaling.
Linkage disequilibrium between rare variants and GWAS risk alleles
To evaluate whether rare variants identified in the association tests could account for GWAS association signals in the same gene, we calculated linkage disequilibrium (LD) between each of the rare variants and each of the corresponding GWAS risk alleles. Alleles were phased utilizing pedigree information. Instances in which the allelic phasing could not be resolved between the rare variants and GWAS variants were removed from consideration.
In silico binding effect prediction
To assess the potential functional effects of noncoding variants identified in the quantitative trait meta-analysis, we predicted the corresponding impacts to transcription factor (TF) and RNA-binding protein (RBP) binding in silico. For each noncoding SNV identified in the quantitative trait meta-analysis, TF binding affinities were calculated for all subsequences overlapping the SNV position on each strand within a ±20-bp window. Binding affinity scores were calculated using position weight matrices derived from ENCODE chromatin immunoprecipitation sequencing experiments (76). Scores were determined by summing the logged frequencies for a given sequence across a motif position weight matrix. Binding P values were defined as the probability that a sequence sampled from a genomic background distribution had an affinity score greater than or equal to the largest affinity score produced from one of the tested subsequences. Genomic background sequences were generated using a first-order Markov model (84). The significance of a given change in binding affinity scores between reference and SNV alleles was assessed by determining whether the differences in relative binding affinity rank between the two alleles was significantly different than what would be expected by chance (85, 86). P values were conservatively adjusted to account for multiple testing using the Benjamini–Hochberg procedure (87).
Once binding affinities were calculated for each TF at each SNV, filters were applied to identify the most likely candidates for TF binding site disruption. Instances in which the predicted TF binding affinity score was <80% of the maximum affinity score for the given motif were excluded. SNVs in which the reference allele was not predicted to bind a particular TF with statistical significance (Benjamini–Hochberg P < 0.05) were also removed from consideration, as were variants in which both the reference and SNV alleles were predicted to bind a TF with statistical significance. Only TFs expressed in the ovary were analyzed. Tissue-specific gene expression was determined using GTEx data (88) (median reads per kilobase of transcript per million mapped reads ≥0.1).
The identified SNVs were likewise analyzed for potential alteration to RBP sites following the same procedure but for a few modifications. RBPs and their binding affinity scores were determined using the ATtTRACT database (89). Only sequences on the coding strand were evaluated as to reflect the mRNA sequences. Additionally, significant changes in RBP binding were considered regardless of the direction of effect, such that instances in which the alternate allele was predicted to induce RBP binding were also included.
Supplemental methods
For additional details regarding methodological considerations and rationale, please refer to the online repository (45).
Results
Characteristics of study population
The characteristics of the study population, including counts and trait distributions by familial relationship and the numbers of subjects included in each association test, are summarized in Table 1.
Whole-genome sequencing and variant calling
Genome sequencing yielded average genome coverage of 96.2% per sample and an overall mean sequencing depth of 56× (45). On average, 90.4% and 66.4% of the genome, including 95.2% and 78.8% of the exome, was covered with at least 20× and 40× sequencing depth, respectively. Approximately 4.04 million high-confidence small variant calls were reported per genome. By applying optimal read depth and quality thresholds as well as a series of genomic filters (41), we reduced the discrepancy rate between replicate samples from 0.23% to 0.04% for rare variants (45).
Association testing and quantitative trait meta-analysis
We found 339 genes that had rare, deleterious variants in at least 10% of cases after adjusting for gene length and percentage guanine-cytosine content. No set of rare variants reached genome-wide significance for association with PCOS/HA disease status in the dichotomous trait analysis. We found 32 rare variants (2 coding, 30 noncoding) in the DENND1A gene that were collectively significantly associated with quantitative trait levels (Pn = 5.31×10−5, Pc = 0.039; Table 2), after adjusting for multiple testing and for observed genomic inflation (45). Women with one or more of these DENND1A variants had significantly higher LH/FSH ratios (Pc = 0.0036). PCOS/HA phenotype women with one or more DENND1A variants had significantly higher LH/FSH ratios than did PCOS/HA phenotype women without DENND1A variants (Pc = 0.0180). Unaffected women with one or more DENND1A variants also trended toward higher LH/FSH ratios than unaffected women without DENND1A variants (Pc = 0.1758; Fig. 1). No other gene-based set of rare variants reached genome-wide significance for association with quantitative trait levels. The correlation between the meta-analysis gene associations and the dichotomous trait gene associations was 0.24 (Spearman; P = 5.12 × 10−6).
Chromosome . | Gene . | Length . | Variants . | Familiesa . | ORb . | Pn . | Pc . |
---|---|---|---|---|---|---|---|
9 | DENND1A | 550 kb | 32 | 50% | 1.20 (0.69–2.14) | 5.31 × 10−5 | 0.039 |
11 | PKNOX2 | 269 kb | 18 | 39% | 1.55 (0.87–2.94) | 7.28 × 10−4 | 0.53 |
6 | BMP6 | 155 kb | 11 | 35% | 1.41 (0.65–3.27) | 3.98 × 10−3 | 1.00 |
9 | C9orf3 | 421 kb | 10 | 21% | 2.79 (1.08–8.24) | 6.14 × 10−3 | 1.00 |
1 | PRDM2 | 125 kb | 8 | 19% | 3.26 (0.77–22.51) | 6.92 × 10−3 | 1.00 |
Chromosome . | Gene . | Length . | Variants . | Familiesa . | ORb . | Pn . | Pc . |
---|---|---|---|---|---|---|---|
9 | DENND1A | 550 kb | 32 | 50% | 1.20 (0.69–2.14) | 5.31 × 10−5 | 0.039 |
11 | PKNOX2 | 269 kb | 18 | 39% | 1.55 (0.87–2.94) | 7.28 × 10−4 | 0.53 |
6 | BMP6 | 155 kb | 11 | 35% | 1.41 (0.65–3.27) | 3.98 × 10−3 | 1.00 |
9 | C9orf3 | 421 kb | 10 | 21% | 2.79 (1.08–8.24) | 6.14 × 10−3 | 1.00 |
1 | PRDM2 | 125 kb | 8 | 19% | 3.26 (0.77–22.51) | 6.92 × 10−3 | 1.00 |
Rare variants exceed thresholds of predicted deleteriousness.
Families include all that have daughters with at least one of the rare variants.
ORs shown with 95% CIs, estimated from logistic regression of gene variant count against disease status, with equal variant weighting, adjusted for BMI, not considering relatedness (individual variants may have larger effects).
Chromosome . | Gene . | Length . | Variants . | Familiesa . | ORb . | Pn . | Pc . |
---|---|---|---|---|---|---|---|
9 | DENND1A | 550 kb | 32 | 50% | 1.20 (0.69–2.14) | 5.31 × 10−5 | 0.039 |
11 | PKNOX2 | 269 kb | 18 | 39% | 1.55 (0.87–2.94) | 7.28 × 10−4 | 0.53 |
6 | BMP6 | 155 kb | 11 | 35% | 1.41 (0.65–3.27) | 3.98 × 10−3 | 1.00 |
9 | C9orf3 | 421 kb | 10 | 21% | 2.79 (1.08–8.24) | 6.14 × 10−3 | 1.00 |
1 | PRDM2 | 125 kb | 8 | 19% | 3.26 (0.77–22.51) | 6.92 × 10−3 | 1.00 |
Chromosome . | Gene . | Length . | Variants . | Familiesa . | ORb . | Pn . | Pc . |
---|---|---|---|---|---|---|---|
9 | DENND1A | 550 kb | 32 | 50% | 1.20 (0.69–2.14) | 5.31 × 10−5 | 0.039 |
11 | PKNOX2 | 269 kb | 18 | 39% | 1.55 (0.87–2.94) | 7.28 × 10−4 | 0.53 |
6 | BMP6 | 155 kb | 11 | 35% | 1.41 (0.65–3.27) | 3.98 × 10−3 | 1.00 |
9 | C9orf3 | 421 kb | 10 | 21% | 2.79 (1.08–8.24) | 6.14 × 10−3 | 1.00 |
1 | PRDM2 | 125 kb | 8 | 19% | 3.26 (0.77–22.51) | 6.92 × 10−3 | 1.00 |
Rare variants exceed thresholds of predicted deleteriousness.
Families include all that have daughters with at least one of the rare variants.
ORs shown with 95% CIs, estimated from logistic regression of gene variant count against disease status, with equal variant weighting, adjusted for BMI, not considering relatedness (individual variants may have larger effects).

LH/FSH ratios in DENND1A variant carriers. LH/FSH ratios in DENND1A rare variant carriers (+) and noncarriers (−) in unaffected women and in women with PCOS/HA. Differences in group means were analyzed using a Wilcoxon rank sum test (*Pc < 0.05). The significance bars illustrate significant differences in LH/FSH ratios between (i) all DENND1A variant carriers and all noncarriers and (ii) DENND1A variant carriers and noncarriers in affected subjects only. Note that PCOS includes HA subjects. Unaff, unaffected.
Using the sliding windows approach, we found a subset of noncoding variants within a 25-kb region of DENND1A (chr9:126,537,500 to 126,562,500) that were significantly associated with altered quantitative trait levels (Pn = 1.92 × 10−5, Pc = 0.019) (45). The region included three noncoding variants that were collectively present in eight subjects with PCOS/HA but in no unaffected subjects. Notably, one of these rare variants, rs117893097 (MAFWellderly = 0.013), was homozygous in one of the subjects. This 25-kb region encompasses one of the DENND1A GWAS risk variants [rs10986105; MAFWellderly = 0.034; ORMeta=1.39 (90)], although all carriers of the rare variants in the region lacked the rs10986105 risk allele. The relative positions of all of the rare variants found in DENND1A are shown in Fig. 2 (91–93). No other windows across the genome were found to have significant associations with disease state or hormonal levels.

Rare variants in DENND1A. The locations of deleterious rare variants and previously reported GWAS SNPs within the DENND1A gene, including the two primary isoforms DENND1A.V1 and DENND1A.V2, are shown. The 25-kb region significantly associated with altered hormone levels is highlighted in light blue. The conservation track was measured on multiple alignments of 100 vertebrate species by phyloP (91). The H3K4Me1 track shows enrichment of monomethylation of lysine 4 of the H3 histone protein, which is associated with enhancers and DNA regions downstream of transcription starts, as determined by chromatin immunoprecipitation sequencing assay and layered by different cell types (58). The DNase clusters track shows regions of DNase hypersensitivity, an indicator of regulatory activity, with darkness proportional to maximum signal strength (58). The GTEx eQTL track displays gene expression quantitative trait loci for DENND1A, as identified from GTEx RNA sequencing and genotype data, with red and blue indicating positive and negative effects on gene expression, respectively (92). The linkage disequilibrium heat map was generated using phase 1 CEU data from the 1000 Genomes Project (93).
Several other PCOS GWAS candidate genes appeared in our filtered set of genes, including C9orf3 (5, 6, 8) (Pn = 6.14 × 10−3), HMGA2 (5) (Pn = 0.062), ZBTB16 (8) (Pn = 0.20), TOX3 (5, 8) (Pn = 0.22), and THADA (4, 5, 7, 8) (Pn = 0.74). C9orf3 had the fourth strongest association overall (45), but it failed to reach genome-wide significance after correction for multiple testing. The relative quantitative trait associations for these genes are illustrated in Fig. 3.

Trait associations for PCOS GWAS genes. The relative quantitative trait associations are shown for PCOS GWAS susceptibility loci included in meta-analysis results, with meta-analysis association ranking. I, fasting insulin.
Linkage disequilibrium with GWAS variants
On average, the 32 rare variants identified in DENND1A were in low LD with each of the previously identified DENND1A GWAS risk alleles: rs2479106 (MAF = 0.25, D′μ = 0.28, r2μ = 0.0077), rs10818854 (MAF = 0.06, D′μ = 0.035, r2μ = 0.0024), and rs10986105 (MAF = 0.05, D′μ = 0.031, r2μ = 0.0026), although individually, some were in high LD with rs2479106 (D′ = 1). Pairwise LD values are provided in Table 3.
Position . | ID . | Variant . | Allele Frequency . | CADD . | LINSIGHT . | TF − (Disrupted) . | RBP − (Disrupted) . | RBP + (Enhanced) . | GWAS Allelic LD (D′) . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Affa . | Unrelb . | Popc . | rs2479106 . | rs10818854 . | rs10986105 . | ||||||||
9:126,144,390d | rs189947178 | G → T | 0.018 | 0.020 | 0.003 | 11.91 | N/A | N/A | — | — | <0.01 | 0.13 | <0.01 |
9:126,154,100 | rs147370674 | C → T | 0.006 | 0.004 | 0.004 | 21.10 | 0.97 | — | XPO5 | — | <0.01 | <0.01 | <0.01 |
9:126,154,582 | rs529224231 | T → G | 0.006 | 0.004 | 0.000 | 21.30 | 0.97 | MAFK, MAFB, NRL | — | IFIH1, XPO5 | N/A | N/A | N/A |
9:126,169,258 | rs561100869 | C → T | 0.000 | 0.004 | 0.000 | 16.94 | 0.97 | IRF1 | NOVA1 | — | <0.01 | <0.01 | <0.01 |
9:126,180,758 | rs750425892 | G → C | 0.000 | 0.004 | 0.000 | 19.31 | 0.96 | STAT6, MZF1, SPI1 | PCBP2 | — | <0.01 | <0.01 | <0.01 |
9:126,203,546 | — | T → C | 0.006 | 0.004 | 0.000 | 20.50 | 0.93 | FOXM1 | RBMX | — | <0.01 | <0.01 | <0.01 |
9:126,225,586 | rs538451690 | T → A | 0.006 | 0.004 | 0.002 | 16.07 | 0.94 | — | — | RC3H1, XPO5 | 1 | <0.01 | <0.01 |
9:126,231,902 | — | C → T | 0.006 | 0.004 | 0.000 | 20.20 | 0.97 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,247,645 | rs543947590 | C → A | 0.006 | 0.004 | 0.003 | 21.00 | 0.97 | STAT5A | — | OAS1 | 1 | <0.01 | <0.01 |
9:126,267,980 | rs558809288 | C → T | 0.006 | 0.004 | 0.000 | 19.97 | 0.84 | ARNT | — | — | <0.01 | <0.01 | <0.01 |
9:126,284,213 | rs149244424 | C → T | 0.012 | 0.004 | 0.008 | 12.44 | 0.92 | ESRRA, ELF1 | — | — | <0.01 | <0.01 | <0.01 |
9:126,312,990 | rs748274474 | A → G | 0.006 | 0.004 | 0.003 | 21.10 | 0.96 | — | — | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,313,234 | — | C → A | 0.006 | 0.004 | 0.000 | 17.84 | 0.97 | — | SRSF2 | — | <0.01 | <0.01 | <0.01 |
9:126,326,081 | rs138249397 | C → T | 0.018 | 0.016 | 0.002 | 9.32 | 0.88 | — | OAS1 | SRP68 | 0.73 | <0.01 | <0.01 |
9:126,331,427 | rs184609118 | A → C | 0.000 | 0.004 | 0.002 | 18.48 | 0.97 | — | ELAVL1, SSB | CELF2, NOVA1 | 1 | <0.01 | <0.01 |
9:126,370,689 | rs564042790 | C → T | 0.006 | 0.004 | 0.003 | 16.84 | 0.89 | SMAD3 | IFIH1 | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,402,348 | rs182167487 | C → T | 0.000 | 0.004 | 0.000 | 14.18 | 0.93 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,404,312 | — | G → A | 0.006 | 0.004 | 0.000 | 15.72 | 0.91 | — | SRP54, SRP68 | PTBP1 | 1 | <0.01 | <0.01 |
9:126,414,365d | rs141759269 | T → G | 0.000 | 0.004 | 0.001 | 26.00 | N/A | N/A | — | — | N/A | <0.01 | <0.01 |
9:126,440,857 | rs147844210 | T → C | 0.000 | 0.004 | 0.008 | 21.60 | 0.95 | GATA6 | — | CMTR1, FUS, SRSF3, YBX1 | <0.01 | <0.01 | <0.01 |
9:126,441,904 | rs75342773 | T → C | 0.006 | 0.004 | 0.008 | 16.64 | 0.95 | — | HNRNPH1–2 | TRA2B | <0.01 | <0.01 | <0.01 |
9:126,447,680 | rs117984673 | T → A | 0.018 | 0.016 | 0.008 | 17.70 | 0.94 | SOX10, SOX17 | NOVA2 | — | 0.73 | <0.01 | <0.01 |
9:126,458,124 | rs112188193 | G → C | 0.012 | 0.008 | 0.013 | 14.43 | 0.92 | STAT4, STAT5A, TEAD1 | RC3H1 | NUDT21 | <0.01 | <0.01 | <0.01 |
9:126,480,236 | rs543924878 | A → C | 0.006 | 0.004 | 0.004 | 18.31 | 0.81 | SREBF1 | — | HNRNPH1–3, HNRNPF, KHSRP, SRP14 | <0.01 | <0.01 | <0.01 |
9:126,513,154 | rs147058034 | A → G | 0.006 | 0.004 | 0.003 | 19.81 | 0.97 | FOXG1, FOXP3, FOXO4, FOXO6, FOXC1, FOXK1 | CELF1, XPO5 | NOVA1 | 1 | 1 | 1 |
9:126,549,983 | — | T → C | 0.018 | 0.004 | 0.000 | 19.45 | 0.81 | — | — | — | 1 | <0.01 | <0.01 |
9:126,555,003 | rs117893097 | C → G | 0.030 | 0.020 | 0.013 | 21.40 | 0.99 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,557,679 | rs552299287 | A → T | 0.006 | 0.004 | 0.000 | 18.00 | 0.84 | STAT5A, STAT6 | CELF1–2, RC3H1, XPO5 | — | 1 | 0.84 | 1 |
9:126,597,096 | — | T → C | 0.006 | 0.004 | 0.000 | 16.64 | 0.97 | SOX5 | HNRNPA0, HNRNPA1, HNRNPD, ELAVL1, ZFP36 | XPO5 | <0.01 | <0.01 | <0.01 |
9:126,603,402 | rs78012023 | TTA → ATG | 0.018 | 0.012 | 0.000 | 8.99 | 0.91 | — | — | — | N/A | <0.01 | <0.01 |
9:126,625,643 | rs116887221 | C → A | 0.006 | 0.004 | 0.008 | 16.24 | 0.8 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,691,321 | rs79740971 | G → A | 0.006 | 0.008 | 0.010 | 13.52 | 0.84 | ESRRA | — | — | <0.01 | <0.01 | <0.01 |
Position . | ID . | Variant . | Allele Frequency . | CADD . | LINSIGHT . | TF − (Disrupted) . | RBP − (Disrupted) . | RBP + (Enhanced) . | GWAS Allelic LD (D′) . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Affa . | Unrelb . | Popc . | rs2479106 . | rs10818854 . | rs10986105 . | ||||||||
9:126,144,390d | rs189947178 | G → T | 0.018 | 0.020 | 0.003 | 11.91 | N/A | N/A | — | — | <0.01 | 0.13 | <0.01 |
9:126,154,100 | rs147370674 | C → T | 0.006 | 0.004 | 0.004 | 21.10 | 0.97 | — | XPO5 | — | <0.01 | <0.01 | <0.01 |
9:126,154,582 | rs529224231 | T → G | 0.006 | 0.004 | 0.000 | 21.30 | 0.97 | MAFK, MAFB, NRL | — | IFIH1, XPO5 | N/A | N/A | N/A |
9:126,169,258 | rs561100869 | C → T | 0.000 | 0.004 | 0.000 | 16.94 | 0.97 | IRF1 | NOVA1 | — | <0.01 | <0.01 | <0.01 |
9:126,180,758 | rs750425892 | G → C | 0.000 | 0.004 | 0.000 | 19.31 | 0.96 | STAT6, MZF1, SPI1 | PCBP2 | — | <0.01 | <0.01 | <0.01 |
9:126,203,546 | — | T → C | 0.006 | 0.004 | 0.000 | 20.50 | 0.93 | FOXM1 | RBMX | — | <0.01 | <0.01 | <0.01 |
9:126,225,586 | rs538451690 | T → A | 0.006 | 0.004 | 0.002 | 16.07 | 0.94 | — | — | RC3H1, XPO5 | 1 | <0.01 | <0.01 |
9:126,231,902 | — | C → T | 0.006 | 0.004 | 0.000 | 20.20 | 0.97 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,247,645 | rs543947590 | C → A | 0.006 | 0.004 | 0.003 | 21.00 | 0.97 | STAT5A | — | OAS1 | 1 | <0.01 | <0.01 |
9:126,267,980 | rs558809288 | C → T | 0.006 | 0.004 | 0.000 | 19.97 | 0.84 | ARNT | — | — | <0.01 | <0.01 | <0.01 |
9:126,284,213 | rs149244424 | C → T | 0.012 | 0.004 | 0.008 | 12.44 | 0.92 | ESRRA, ELF1 | — | — | <0.01 | <0.01 | <0.01 |
9:126,312,990 | rs748274474 | A → G | 0.006 | 0.004 | 0.003 | 21.10 | 0.96 | — | — | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,313,234 | — | C → A | 0.006 | 0.004 | 0.000 | 17.84 | 0.97 | — | SRSF2 | — | <0.01 | <0.01 | <0.01 |
9:126,326,081 | rs138249397 | C → T | 0.018 | 0.016 | 0.002 | 9.32 | 0.88 | — | OAS1 | SRP68 | 0.73 | <0.01 | <0.01 |
9:126,331,427 | rs184609118 | A → C | 0.000 | 0.004 | 0.002 | 18.48 | 0.97 | — | ELAVL1, SSB | CELF2, NOVA1 | 1 | <0.01 | <0.01 |
9:126,370,689 | rs564042790 | C → T | 0.006 | 0.004 | 0.003 | 16.84 | 0.89 | SMAD3 | IFIH1 | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,402,348 | rs182167487 | C → T | 0.000 | 0.004 | 0.000 | 14.18 | 0.93 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,404,312 | — | G → A | 0.006 | 0.004 | 0.000 | 15.72 | 0.91 | — | SRP54, SRP68 | PTBP1 | 1 | <0.01 | <0.01 |
9:126,414,365d | rs141759269 | T → G | 0.000 | 0.004 | 0.001 | 26.00 | N/A | N/A | — | — | N/A | <0.01 | <0.01 |
9:126,440,857 | rs147844210 | T → C | 0.000 | 0.004 | 0.008 | 21.60 | 0.95 | GATA6 | — | CMTR1, FUS, SRSF3, YBX1 | <0.01 | <0.01 | <0.01 |
9:126,441,904 | rs75342773 | T → C | 0.006 | 0.004 | 0.008 | 16.64 | 0.95 | — | HNRNPH1–2 | TRA2B | <0.01 | <0.01 | <0.01 |
9:126,447,680 | rs117984673 | T → A | 0.018 | 0.016 | 0.008 | 17.70 | 0.94 | SOX10, SOX17 | NOVA2 | — | 0.73 | <0.01 | <0.01 |
9:126,458,124 | rs112188193 | G → C | 0.012 | 0.008 | 0.013 | 14.43 | 0.92 | STAT4, STAT5A, TEAD1 | RC3H1 | NUDT21 | <0.01 | <0.01 | <0.01 |
9:126,480,236 | rs543924878 | A → C | 0.006 | 0.004 | 0.004 | 18.31 | 0.81 | SREBF1 | — | HNRNPH1–3, HNRNPF, KHSRP, SRP14 | <0.01 | <0.01 | <0.01 |
9:126,513,154 | rs147058034 | A → G | 0.006 | 0.004 | 0.003 | 19.81 | 0.97 | FOXG1, FOXP3, FOXO4, FOXO6, FOXC1, FOXK1 | CELF1, XPO5 | NOVA1 | 1 | 1 | 1 |
9:126,549,983 | — | T → C | 0.018 | 0.004 | 0.000 | 19.45 | 0.81 | — | — | — | 1 | <0.01 | <0.01 |
9:126,555,003 | rs117893097 | C → G | 0.030 | 0.020 | 0.013 | 21.40 | 0.99 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,557,679 | rs552299287 | A → T | 0.006 | 0.004 | 0.000 | 18.00 | 0.84 | STAT5A, STAT6 | CELF1–2, RC3H1, XPO5 | — | 1 | 0.84 | 1 |
9:126,597,096 | — | T → C | 0.006 | 0.004 | 0.000 | 16.64 | 0.97 | SOX5 | HNRNPA0, HNRNPA1, HNRNPD, ELAVL1, ZFP36 | XPO5 | <0.01 | <0.01 | <0.01 |
9:126,603,402 | rs78012023 | TTA → ATG | 0.018 | 0.012 | 0.000 | 8.99 | 0.91 | — | — | — | N/A | <0.01 | <0.01 |
9:126,625,643 | rs116887221 | C → A | 0.006 | 0.004 | 0.008 | 16.24 | 0.8 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,691,321 | rs79740971 | G → A | 0.006 | 0.008 | 0.010 | 13.52 | 0.84 | ESRRA | — | — | <0.01 | <0.01 | <0.01 |
Positions correspond to GRCh37. Variant queries were facilitated by Kaviar (94). CADD and LINSIGHT columns show CADD and LINSIGHT deleteriousness prediction scores, accordingly. TF− shows TFs for which binding is predicted to be negatively impacted. RBP− and RBP+ show RBPs for which binding is predicted to be negatively impacted or enhanced, respectively. GWAS allelic LDs show linkage disequilibrium D′ values between the rare variants and each of the previously identified DENND1A GWAS risk alleles.
Abbreviation: N/A, not applicable for coding variant.
Affected (aff) cohort allele frequencies represent proportion of alleles with variant in PCOS/HA subjects.
Unrelated (unrel) cohort allele frequencies represent proportion of variant alleles in parents.
Population (pop) allele frequencies correspond to Wellderly cohort.
Coding variant (rs189947178 and rs141759269).
Position . | ID . | Variant . | Allele Frequency . | CADD . | LINSIGHT . | TF − (Disrupted) . | RBP − (Disrupted) . | RBP + (Enhanced) . | GWAS Allelic LD (D′) . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Affa . | Unrelb . | Popc . | rs2479106 . | rs10818854 . | rs10986105 . | ||||||||
9:126,144,390d | rs189947178 | G → T | 0.018 | 0.020 | 0.003 | 11.91 | N/A | N/A | — | — | <0.01 | 0.13 | <0.01 |
9:126,154,100 | rs147370674 | C → T | 0.006 | 0.004 | 0.004 | 21.10 | 0.97 | — | XPO5 | — | <0.01 | <0.01 | <0.01 |
9:126,154,582 | rs529224231 | T → G | 0.006 | 0.004 | 0.000 | 21.30 | 0.97 | MAFK, MAFB, NRL | — | IFIH1, XPO5 | N/A | N/A | N/A |
9:126,169,258 | rs561100869 | C → T | 0.000 | 0.004 | 0.000 | 16.94 | 0.97 | IRF1 | NOVA1 | — | <0.01 | <0.01 | <0.01 |
9:126,180,758 | rs750425892 | G → C | 0.000 | 0.004 | 0.000 | 19.31 | 0.96 | STAT6, MZF1, SPI1 | PCBP2 | — | <0.01 | <0.01 | <0.01 |
9:126,203,546 | — | T → C | 0.006 | 0.004 | 0.000 | 20.50 | 0.93 | FOXM1 | RBMX | — | <0.01 | <0.01 | <0.01 |
9:126,225,586 | rs538451690 | T → A | 0.006 | 0.004 | 0.002 | 16.07 | 0.94 | — | — | RC3H1, XPO5 | 1 | <0.01 | <0.01 |
9:126,231,902 | — | C → T | 0.006 | 0.004 | 0.000 | 20.20 | 0.97 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,247,645 | rs543947590 | C → A | 0.006 | 0.004 | 0.003 | 21.00 | 0.97 | STAT5A | — | OAS1 | 1 | <0.01 | <0.01 |
9:126,267,980 | rs558809288 | C → T | 0.006 | 0.004 | 0.000 | 19.97 | 0.84 | ARNT | — | — | <0.01 | <0.01 | <0.01 |
9:126,284,213 | rs149244424 | C → T | 0.012 | 0.004 | 0.008 | 12.44 | 0.92 | ESRRA, ELF1 | — | — | <0.01 | <0.01 | <0.01 |
9:126,312,990 | rs748274474 | A → G | 0.006 | 0.004 | 0.003 | 21.10 | 0.96 | — | — | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,313,234 | — | C → A | 0.006 | 0.004 | 0.000 | 17.84 | 0.97 | — | SRSF2 | — | <0.01 | <0.01 | <0.01 |
9:126,326,081 | rs138249397 | C → T | 0.018 | 0.016 | 0.002 | 9.32 | 0.88 | — | OAS1 | SRP68 | 0.73 | <0.01 | <0.01 |
9:126,331,427 | rs184609118 | A → C | 0.000 | 0.004 | 0.002 | 18.48 | 0.97 | — | ELAVL1, SSB | CELF2, NOVA1 | 1 | <0.01 | <0.01 |
9:126,370,689 | rs564042790 | C → T | 0.006 | 0.004 | 0.003 | 16.84 | 0.89 | SMAD3 | IFIH1 | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,402,348 | rs182167487 | C → T | 0.000 | 0.004 | 0.000 | 14.18 | 0.93 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,404,312 | — | G → A | 0.006 | 0.004 | 0.000 | 15.72 | 0.91 | — | SRP54, SRP68 | PTBP1 | 1 | <0.01 | <0.01 |
9:126,414,365d | rs141759269 | T → G | 0.000 | 0.004 | 0.001 | 26.00 | N/A | N/A | — | — | N/A | <0.01 | <0.01 |
9:126,440,857 | rs147844210 | T → C | 0.000 | 0.004 | 0.008 | 21.60 | 0.95 | GATA6 | — | CMTR1, FUS, SRSF3, YBX1 | <0.01 | <0.01 | <0.01 |
9:126,441,904 | rs75342773 | T → C | 0.006 | 0.004 | 0.008 | 16.64 | 0.95 | — | HNRNPH1–2 | TRA2B | <0.01 | <0.01 | <0.01 |
9:126,447,680 | rs117984673 | T → A | 0.018 | 0.016 | 0.008 | 17.70 | 0.94 | SOX10, SOX17 | NOVA2 | — | 0.73 | <0.01 | <0.01 |
9:126,458,124 | rs112188193 | G → C | 0.012 | 0.008 | 0.013 | 14.43 | 0.92 | STAT4, STAT5A, TEAD1 | RC3H1 | NUDT21 | <0.01 | <0.01 | <0.01 |
9:126,480,236 | rs543924878 | A → C | 0.006 | 0.004 | 0.004 | 18.31 | 0.81 | SREBF1 | — | HNRNPH1–3, HNRNPF, KHSRP, SRP14 | <0.01 | <0.01 | <0.01 |
9:126,513,154 | rs147058034 | A → G | 0.006 | 0.004 | 0.003 | 19.81 | 0.97 | FOXG1, FOXP3, FOXO4, FOXO6, FOXC1, FOXK1 | CELF1, XPO5 | NOVA1 | 1 | 1 | 1 |
9:126,549,983 | — | T → C | 0.018 | 0.004 | 0.000 | 19.45 | 0.81 | — | — | — | 1 | <0.01 | <0.01 |
9:126,555,003 | rs117893097 | C → G | 0.030 | 0.020 | 0.013 | 21.40 | 0.99 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,557,679 | rs552299287 | A → T | 0.006 | 0.004 | 0.000 | 18.00 | 0.84 | STAT5A, STAT6 | CELF1–2, RC3H1, XPO5 | — | 1 | 0.84 | 1 |
9:126,597,096 | — | T → C | 0.006 | 0.004 | 0.000 | 16.64 | 0.97 | SOX5 | HNRNPA0, HNRNPA1, HNRNPD, ELAVL1, ZFP36 | XPO5 | <0.01 | <0.01 | <0.01 |
9:126,603,402 | rs78012023 | TTA → ATG | 0.018 | 0.012 | 0.000 | 8.99 | 0.91 | — | — | — | N/A | <0.01 | <0.01 |
9:126,625,643 | rs116887221 | C → A | 0.006 | 0.004 | 0.008 | 16.24 | 0.8 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,691,321 | rs79740971 | G → A | 0.006 | 0.008 | 0.010 | 13.52 | 0.84 | ESRRA | — | — | <0.01 | <0.01 | <0.01 |
Position . | ID . | Variant . | Allele Frequency . | CADD . | LINSIGHT . | TF − (Disrupted) . | RBP − (Disrupted) . | RBP + (Enhanced) . | GWAS Allelic LD (D′) . | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Affa . | Unrelb . | Popc . | rs2479106 . | rs10818854 . | rs10986105 . | ||||||||
9:126,144,390d | rs189947178 | G → T | 0.018 | 0.020 | 0.003 | 11.91 | N/A | N/A | — | — | <0.01 | 0.13 | <0.01 |
9:126,154,100 | rs147370674 | C → T | 0.006 | 0.004 | 0.004 | 21.10 | 0.97 | — | XPO5 | — | <0.01 | <0.01 | <0.01 |
9:126,154,582 | rs529224231 | T → G | 0.006 | 0.004 | 0.000 | 21.30 | 0.97 | MAFK, MAFB, NRL | — | IFIH1, XPO5 | N/A | N/A | N/A |
9:126,169,258 | rs561100869 | C → T | 0.000 | 0.004 | 0.000 | 16.94 | 0.97 | IRF1 | NOVA1 | — | <0.01 | <0.01 | <0.01 |
9:126,180,758 | rs750425892 | G → C | 0.000 | 0.004 | 0.000 | 19.31 | 0.96 | STAT6, MZF1, SPI1 | PCBP2 | — | <0.01 | <0.01 | <0.01 |
9:126,203,546 | — | T → C | 0.006 | 0.004 | 0.000 | 20.50 | 0.93 | FOXM1 | RBMX | — | <0.01 | <0.01 | <0.01 |
9:126,225,586 | rs538451690 | T → A | 0.006 | 0.004 | 0.002 | 16.07 | 0.94 | — | — | RC3H1, XPO5 | 1 | <0.01 | <0.01 |
9:126,231,902 | — | C → T | 0.006 | 0.004 | 0.000 | 20.20 | 0.97 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,247,645 | rs543947590 | C → A | 0.006 | 0.004 | 0.003 | 21.00 | 0.97 | STAT5A | — | OAS1 | 1 | <0.01 | <0.01 |
9:126,267,980 | rs558809288 | C → T | 0.006 | 0.004 | 0.000 | 19.97 | 0.84 | ARNT | — | — | <0.01 | <0.01 | <0.01 |
9:126,284,213 | rs149244424 | C → T | 0.012 | 0.004 | 0.008 | 12.44 | 0.92 | ESRRA, ELF1 | — | — | <0.01 | <0.01 | <0.01 |
9:126,312,990 | rs748274474 | A → G | 0.006 | 0.004 | 0.003 | 21.10 | 0.96 | — | — | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,313,234 | — | C → A | 0.006 | 0.004 | 0.000 | 17.84 | 0.97 | — | SRSF2 | — | <0.01 | <0.01 | <0.01 |
9:126,326,081 | rs138249397 | C → T | 0.018 | 0.016 | 0.002 | 9.32 | 0.88 | — | OAS1 | SRP68 | 0.73 | <0.01 | <0.01 |
9:126,331,427 | rs184609118 | A → C | 0.000 | 0.004 | 0.002 | 18.48 | 0.97 | — | ELAVL1, SSB | CELF2, NOVA1 | 1 | <0.01 | <0.01 |
9:126,370,689 | rs564042790 | C → T | 0.006 | 0.004 | 0.003 | 16.84 | 0.89 | SMAD3 | IFIH1 | YBX1 | <0.01 | <0.01 | <0.01 |
9:126,402,348 | rs182167487 | C → T | 0.000 | 0.004 | 0.000 | 14.18 | 0.93 | — | — | — | <0.01 | <0.01 | <0.01 |
9:126,404,312 | — | G → A | 0.006 | 0.004 | 0.000 | 15.72 | 0.91 | — | SRP54, SRP68 | PTBP1 | 1 | <0.01 | <0.01 |
9:126,414,365d | rs141759269 | T → G | 0.000 | 0.004 | 0.001 | 26.00 | N/A | N/A | — | — | N/A | <0.01 | <0.01 |
9:126,440,857 | rs147844210 | T → C | 0.000 | 0.004 | 0.008 | 21.60 | 0.95 | GATA6 | — | CMTR1, FUS, SRSF3, YBX1 | <0.01 | <0.01 | <0.01 |
9:126,441,904 | rs75342773 | T → C | 0.006 | 0.004 | 0.008 | 16.64 | 0.95 | — | HNRNPH1–2 | TRA2B | <0.01 | <0.01 | <0.01 |
9:126,447,680 | rs117984673 | T → A | 0.018 | 0.016 | 0.008 | 17.70 | 0.94 | SOX10, SOX17 | NOVA2 | — | 0.73 | <0.01 | <0.01 |
9:126,458,124 | rs112188193 | G → C | 0.012 | 0.008 | 0.013 | 14.43 | 0.92 | STAT4, STAT5A, TEAD1 | RC3H1 | NUDT21 | <0.01 | <0.01 | <0.01 |
9:126,480,236 | rs543924878 | A → C | 0.006 | 0.004 | 0.004 | 18.31 | 0.81 | SREBF1 | — | HNRNPH1–3, HNRNPF, KHSRP, SRP14 | <0.01 | <0.01 | <0.01 |
9:126,513,154 | rs147058034 | A → G | 0.006 | 0.004 | 0.003 | 19.81 | 0.97 | FOXG1, FOXP3, FOXO4, FOXO6, FOXC1, FOXK1 | CELF1, XPO5 | NOVA1 | 1 | 1 | 1 |
9:126,549,983 | — | T → C | 0.018 | 0.004 | 0.000 | 19.45 | 0.81 | — | — | — | 1 | <0.01 | <0.01 |
9:126,555,003 | rs117893097 | C → G | 0.030 | 0.020 | 0.013 | 21.40 | 0.99 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,557,679 | rs552299287 | A → T | 0.006 | 0.004 | 0.000 | 18.00 | 0.84 | STAT5A, STAT6 | CELF1–2, RC3H1, XPO5 | — | 1 | 0.84 | 1 |
9:126,597,096 | — | T → C | 0.006 | 0.004 | 0.000 | 16.64 | 0.97 | SOX5 | HNRNPA0, HNRNPA1, HNRNPD, ELAVL1, ZFP36 | XPO5 | <0.01 | <0.01 | <0.01 |
9:126,603,402 | rs78012023 | TTA → ATG | 0.018 | 0.012 | 0.000 | 8.99 | 0.91 | — | — | — | N/A | <0.01 | <0.01 |
9:126,625,643 | rs116887221 | C → A | 0.006 | 0.004 | 0.008 | 16.24 | 0.8 | — | HNRNPH1–3, HNRNPF, KHSRP | — | <0.01 | <0.01 | <0.01 |
9:126,691,321 | rs79740971 | G → A | 0.006 | 0.008 | 0.010 | 13.52 | 0.84 | ESRRA | — | — | <0.01 | <0.01 | <0.01 |
Positions correspond to GRCh37. Variant queries were facilitated by Kaviar (94). CADD and LINSIGHT columns show CADD and LINSIGHT deleteriousness prediction scores, accordingly. TF− shows TFs for which binding is predicted to be negatively impacted. RBP− and RBP+ show RBPs for which binding is predicted to be negatively impacted or enhanced, respectively. GWAS allelic LDs show linkage disequilibrium D′ values between the rare variants and each of the previously identified DENND1A GWAS risk alleles.
Abbreviation: N/A, not applicable for coding variant.
Affected (aff) cohort allele frequencies represent proportion of alleles with variant in PCOS/HA subjects.
Unrelated (unrel) cohort allele frequencies represent proportion of variant alleles in parents.
Population (pop) allele frequencies correspond to Wellderly cohort.
Coding variant (rs189947178 and rs141759269).
Protein binding effect prediction
Nine of the DENND1A variants were predicted to significantly impact TF binding motifs, whereas most variants were predicted to significantly alter RBP binding motifs (17 disrupted, 14 induced). The specific TF and RBP motifs associated with each noncoding variant are listed in Table 3. Binding by the heterogeneous nuclear ribonucleoprotein family of RBPs appeared to be the most commonly impacted.
Discussion
We identified rare variants in DENND1A that were significantly associated with altered reproductive and metabolic hormone levels in PCOS. Fifty percent of the PCOS families included in this study carried at least one of these rare DENND1A variants. These findings are of considerable interest because common variants in DENND1A were associated with PCOS diagnosis in a GWAS of Han Chinese women (4); these associations were replicated in women of European ancestry (95). Subsequently, McAllister and colleagues (61, 96) discovered that DENND1A plays a key role in androgen biosynthesis in human theca cells and is upregulated in PCOS theca cells. Our study, using an independent family-based WGS analytical approach, further implicates DENND1A in the pathogenesis of PCOS. Additionally, our results provide genetic evidence to support our hypothesis that hyperandrogenemia is a core biologic pathway in PCOS (22, 97). The increased LH/FSH ratios that we observed in rare variant carriers (Fig. 1) also suggest that DENND1A contributes to the disruption of gonadotropin secretion that is a characteristic feature of PCOS (98).
DENND1A encodes two transcripts as the result of alternative splicing, DENND1A.V1 and DENND1A.V2 (61). The encoded V2 protein was found in ovarian theca cells and its abundance was correlated with increased androgen production (96). Forced expression of the V2 transcript produced a PCOS phenotype in normal theca cells, whereas knockdown of V2 in PCOS theca cells reduced thecal androgen biosynthesis (96). The three DENND1A risk variants identified by GWASs (rs2479106, rs10818854, rs10986105) were located in introns, and the functional consequences of these variants are unknown (61). There have been no reported large-scale sequencing studies to fine-map causal variants in DENND1A (90). Targeted (61, 96, 99) and whole-exome sequencing (100) in small PCOS case-control cohorts have failed to identify any variants in DENND1A that were associated with PCOS or with V2 isoform expression. The GWAS risk variants and most of the rare DENND1A variants in the current study lie well upstream (100 to 400 kb) of this splicing region (Fig. 2). Nevertheless, based on the LD we observed between the rare variants and the GWAS risk variants, some of the rare variants (particularly those with D′ = 1) may contribute to the GWAS signal at rs2479106, but a larger cohort would be needed to conduct conditional association analyses.
A number of our rare DENND1A variants were predicted to disrupt conserved TF binding motifs, which could affect gene expression, but most of the variants were predicted to alter affinities of RBPs to the mRNA transcript (Table 3). It is plausible, therefore, that these rare variants were selectively driving V2 expression via posttranscriptional regulation. Collectively, the rare DENND1A variants were found in 50% of families, but each individual variant was typically found in only one or two families. Our findings, therefore, support a model of PCOS in which causal variants are individually uncommon but collectively occur in key genes regulating core biologic pathways contributing to disease pathogenesis, such as androgen biosynthesis (23). Our recent findings (18, 38) of multiple rare variants in the AMH and AMHR2 genes that reduce their biologic activity in ∼7% of women with PCOS is consistent with this model.
Several PCOS GWAS candidate genes besides DENND1A appeared among the top gene associations, but they failed to reach genome-wide significance. These genes included C9orf3 (5, 6, 8), HMGA2 (5), ZBTB16 (8), TOX3 (5, 8), and THADA (4, 5, 7, 8) (Fig. 3). Two additional genes with strong, but not genome-wide significant, associations with PCOS quantitative traits were highly plausible PCOS candidate genes. BMP6 had the third strongest association in our meta-analysis (P = 4.00 × 10−3, Table 2). It is a member of the bone morphogenetic protein family, which are growth factors involved in folliculogenesis. BMP6 expression was previously found to be significantly higher in granulosa cells from women with PCOS compared with reproductively normal control women (101). Moreover, BMP6 was found to increase expression of the FSH receptor, inhibin/activin β subunits, and AMH genes in human granulosa cells (102). PRDM2 had the fifth strongest association in our meta-analysis (P = 6.95×10−3, Table 2). PRDM2 is an estrogen receptor coactivator (103) that is highly expressed in the ovary and pituitary gland (88). Ligand-bound estrogen receptor α binds with PRDM2 to open chromatin at estrogen receptor α target genes (103, 104). PRDM2 also binds with the retinoblastoma protein (105), which has been shown to play an important role in follicular development in granulosa cells (106, 107).
Our results align with the emerging evidence that rare coding variants with large effect sizes do not play a major role in complex disease (108). The average estimated effect size of the rare DENND1A variants—assuming additive, equal variant weighting—was relatively modest (OR, 1.20; 95% CI, 0.69 to 2.14; Table 2), although individual variants may have larger effects. Of the 32 rare variants we identified in DENND1A, 30 were noncoding. Indeed, it appears that complex traits are primarily driven by noncoding variation (109, 110), both common and rare (110, 111). Paired WGS and transcriptome sequencing analysis from one extensive kindred demonstrated that rare noncoding variants had strong effects on individual gene-expression profiles (113). In a study designed similarly to ours, Ament et al. (114) found rare variants, the vast majority of which were noncoding, associated with increased risk for bipolar disorder. Owing to the relative cost-effectiveness of whole-exome sequencing (109), the limited availability of computational tools designed to predict the effects of noncoding variants on phenotypes (115), and our relatively poor understanding of regulatory mechanisms in the genome (116), noncoding variants have been noticeably understudied in complex trait genetics. As larger WGS data sets are accumulated and the focus of complex trait studies shifts more toward understanding regulatory mechanisms, the contribution of rare noncoding variants in various complex diseases will become clearer.
The central statistical challenge in studying rare variants is achieving adequate power to detect significant associations while controlling for type I error. The family-based structure of our cohort provided an enrichment of individual rare variants and enabled modeling of familial segregation (117). To mitigate variant calling errors, we used replicate samples from one family to determine optimal read depth and quality thresholds. To remove irrelevant variants from consideration, we applied a LINSIGHT score threshold and gene-specific CADD score thresholds (77) to filter for deleteriousness. We further prioritized variants by weighting them by their relative CADD scores. To group rare variants effectively, we applied several windows-based binning methods, in addition to the gene-based approach, to ensure that different kinds of functionally correlated genomic regions were tested, both of fixed length and of variable length. To limit our search to genes that were more likely to have specific roles in PCOS etiology, we only considered genes with rare deleterious variants in at least 10% of cases, as causal rare variants are more likely to accumulate in core disease genes (109). We greatly increased our power to detect relevant disease genes and account for pleiotropic effects by consolidating quantitative trait association results into a meta-analysis (80, 81). To reduce type I error, in addition to the variant calling quality control measures, we modeled the quantitative traits against skewed distributions and further normalized trait residuals using an INT.
Given the size of our cohort, it was necessary to apply relatively strict filters based on predicted variant effects and cumulative allele frequencies to detect rare variant associations. Very large sample sizes are otherwise required for WGS studies of rare variants (118, 119). By only including genes with rare, likely-deleterious variants in at least 10% of cases, we greatly reduced the multiple-testing burden of our analysis, thereby increasing our power to detect core PCOS genes (120, 121). Applying a priori hypotheses regarding which variants and genes may be relevant to disease, however, is analogous to a candidate gene approach (118). Any sets of rare variants that contribute to PCOS in smaller subpopulations of women with PCOS, as we found for AMH (38), were likely removed from consideration. Likewise, any causal rare variants that were not predicted bioinformatically to be deleterious based on existing annotations and evolutionary conservation (46, 50) would not have been detected. Furthermore, because variants were filtered for consistency with Mendelian inheritance, de novo mutations were not considered.
Because the number of genetic variants is directly correlated with the size of the gene (76), the CVF threshold introduced a bias toward larger genes in some of the analyses. However, this bias was mitigated by adjusting for gene length. Furthermore, any such bias was not applicable to our fixed-length windows-based approach, which replicated our DENND1A findings. It is also possible that causal rare variants with larger effect sizes were omitted from the meta-analysis because we tested against normalized trait residuals in an effort to reduce type I errors. Using normalized trait residuals may have excluded variants with large effects that produced outliers. However, as mentioned above, recent evidence has demonstrated that complex traits are primarily driven by noncoding variation with modest effect sizes (109, 110).
Despite the numerous steps taken to increase power, our study ultimately remained underpowered to detect rare variant associations with PCOS diagnosis in the dichotomous trait analysis. Although an association with PCOS quantitative traits implicates genetic variants in disease pathogenesis, it does not necessarily mean that a gene is associated with PCOS itself. The noncoding variants identified in this study, despite being rare and predicted to be deleterious, may be in LD with the actual pathogenic variant on the same alleles that were removed by the applied allele frequency or predicted effect thresholds. These noncoding variants may also affect the expression of one or more distant genes outside of the gene in which the variants are located. Importantly, the results of this study have not been replicated in an independent cohort and therefore, despite the established association of DENND1A variation with PCOS, should be regarded as hypothesis generating. Replication and functional studies are needed to confirm individual variant functionality and disease associations. Finally, because the women with PCOS studied herein were diagnosed according to National Institutes of Health criteria, our results may not be generalizable to other PCOS phenotypes defined by different sets of diagnostic criteria.
In summary, by applying family-based sequence kernel association tests on filtered whole-genome variant call data from a cohort of PCOS families, we were able to identify rare variants in the DENND1A gene that were associated with quantitative hormonal traits of PCOS. Because DENND1A is an important regulator of androgen biosynthesis, these findings provide additional genetic evidence to support our hypothesis that hyperandrogenemia is a core biologic pathway in PCOS (23, 97). Our results also suggest that rare noncoding variants contribute to the distinctive hormonal profile of PCOS. This study further demonstrates that using a quantitative trait meta-analysis can be a powerful approach in rare variant association testing, particularly for complex diseases with pleiotropic etiologies.
Acknowledgments
We thank Drs. Terry Farrah and Gustavo Glusman, from the Institute for Systems Biology, for facilitating variant queries via Kaviar (http://db.systemsbiology.net/kaviar). We thank the Northwestern University Research Computing Services team for supporting the computational needs of this research. We are very grateful to all of the families who took part in the study.
Financial Support: This study was supported by National Institutes of Health Grants P50 HD044405 (to A.D.) and R01 HD085227 (to A.D.). M.D. was supported by National Institutes of Health National Research Service Award T32 DK007169. This study uses data from the Scripps Wellderly Genome Resource, which is funded under National Institutes of Health Grant UL1 TR001114 and a Scripps Translational Science Institute Clinical and Translational Science Award.
Disclosure Summary: The authors have nothing to disclose.
Abbreviations:
- BMI
body mass index
- CADD
Combined Annotation Dependent Depletion
- CGI
Complete Genomics, Inc.
- CVF
cumulative variant frequency
- DHEAS
dehydroepiandrosterone sulfate
- GWAS
genome-wide association study
- HA
hyperandrogenemic
- INT
inverse normal transformation
- LD
linkage disequilibrium
- MAF
minor allele frequency
- MSC
mutation significance cutoff
- Pc
corrected P value
- Pn
nominal P value
- PCOS
polycystic ovary syndrome
- RBP
RNA-binding protein
- SNP
single-nucleotide polymorphism
- SNV
single nucleotide variant
- T
testosterone
- TF
transcription factor
- WGS
whole-genome sequencing
- μ (as subscript)
mean
References and Notes
Author notes
A.D. and M.G.H. jointly supervised this work.