Matthew Dapas, Andrea Dunaif, Deconstructing a Syndrome: Genomic Insights Into PCOS Causal Mechanisms and Classification, Endocrine Reviews, Volume 43, Issue 6, December 2022, Pages 927–965, https://doi.org/10.1210/endrev/bnac001
Abstract
Polycystic ovary syndrome (PCOS) is among the most common disorders in women of reproductive age, affecting up to 15% worldwide, depending on the diagnostic criteria. PCOS is characterized by a constellation of interrelated reproductive abnormalities, including disordered gonadotropin secretion, increased androgen production, chronic anovulation, and polycystic ovarian morphology. It is frequently associated with insulin resistance and obesity. These reproductive and metabolic derangements cause major morbidities across the lifespan, including anovulatory infertility and type 2 diabetes (T2D).
Despite decades of investigative effort, the etiology of PCOS remains unknown. Familial clustering of PCOS cases has indicated a genetic contribution to PCOS. There are rare Mendelian forms of PCOS associated with extreme phenotypes, but PCOS typically follows a non-Mendelian pattern of inheritance consistent with a complex genetic architecture, analogous to T2D and obesity, that reflects the interaction of susceptibility genes and environmental factors. Genomic studies of PCOS have provided important insights into disease pathways and have indicated that current diagnostic criteria do not capture underlying differences in biology associated with different forms of PCOS.
We provide a state-of-the-science review of genetic analyses of PCOS, including an overview of genomic methodologies aimed at a general audience of non-geneticists and clinicians. Applications in PCOS will be discussed, including strengths and limitations of each study. The contributions of environmental factors, including developmental origins, will be reviewed. Insights into the pathogenesis and genetic architecture of PCOS will be summarized. Future directions for PCOS genetic studies will be outlined.

PCOS is a highly heritable complex trait with ~20 identified common susceptibility variants from genome-wide association studies and rare variants in DENND1A, AMH, and AMHR2 discovered by next-generation sequencing
The genetic architecture of NIH PCOS, non-NIH Rotterdam PCOS, and self-reported PCOS is generally similar, suggesting that these diagnostic criteria do not identify biologically distinct PCOS phenotypes
Unsupervised clustering analysis has identified discrete and reproducible reproductive and metabolic PCOS subtypes with preliminary evidence for distinct genetic architectures
Environmental factors, such as intrauterine androgen excess, may act through epigenetic mechanisms in concert with susceptibility variants to produce PCOS phenotypes
Ongoing genetic analyses promise to elucidate the distinct etiologies of PCOS, enabling the transition toward precision medicine for PCOS
Polycystic ovary syndrome (PCOS) was originally described as a reproductive disorder characterized by enlarged, smooth polycystic ovaries, menstrual irregularity, infertility, and hirsutism (1, 2). Stein and Leventhal (3) are credited with the first report that the clinical features of menstrual irregularity and infertility could be improved by removal of portions of the enlarged ovaries in a procedure known as bilateral ovarian wedge resection.
In 1935, Stein and Leventhal called attention to a syndrome characterized by amenorrhea, occasionally menometrorrhagia, sterility, and hirsutism. The pathological findings were large, pale, polycystic ovaries with thickened capsules. Therapeutically, it was found that with adequate wedge resection of both ovaries, fertility and regular menstrual periods could be achieved in many instances (4).
As a result, this constellation of findings became known as the Stein-Leventhal syndrome (2, 5). Over the ensuing decades, PCOS has replaced Stein-Leventhal syndrome as the preferred terminology (2, 6). With the advent of hormone measurements in the 1950s, the biochemical features of PCOS began to be characterized.
Despite derangements in gonadotropin secretion and androgen production, PCOS remained an enigmatic reproductive disorder apparently amenable to surgical cure. The discovery in the 1970s of the monogenic syndromes of extreme insulin resistance, acanthosis nigricans, and hyperandrogenism, due to mutations in the insulin receptor gene, suggested a possible causal link between hyperinsulinemia and hyperandrogenism (7). In 1980, Burghen and colleagues (8) reported that women with PCOS had increased insulin responses during oral glucose tolerance testing independent of obesity, which suggested for the first time that PCOS might be related to these insulin resistance–hyperandrogenism syndromes. Subsequent studies reported that acanthosis nigricans was a common finding in women with typical PCOS (9, 10). Studies of glucose homeostasis indicated that affected women were substantially insulin resistant, independent of obesity, but phenotypically distinct from women with the monogenic disorders of extreme insulin resistance (reviewed in (7, 11)). Nevertheless, these similarities suggested that variation in the insulin receptor gene might also contribute to typical PCOS.
Pathophysiology of PCOS
Reproductive phenotype
PCOS is characterized by a constellation of interrelated reproductive hormone alterations (12) (Fig. 1) designated as a “vicious cycle” by Rebar and colleagues in 1976 (13). There is enhanced luteinizing hormone (LH) relative to follicle-stimulating hormone (FSH) release. However, circulating FSH levels are in the low-normal range compared to normally cycling women in the early follicular phase of the menstrual cycle and do not exhibit cyclic increases in anovulatory PCOS (13, 14). Further, the characteristic increased LH:FSH ratio may escape detection in a single random blood sample due to pulsatile LH release (15). The most frequent androgen abnormality in PCOS is increased free testosterone levels because sex hormone–binding globulin (SHBG) levels are reduced (7, 16-19). In anovulatory PCOS, estradiol levels are tonically in the mid-follicular range and progesterone levels are low (20).

Figure 1. Pathophysiology of PCOS. The “vicious cycle” of PCOS (13), a self-sustaining constellation of reproductive abnormalities. The association with insulin resistance and hyperinsulinemia was discovered in the 1980s (reviewed in (7, 21)). Abbreviations: AMH, anti-Müllerian hormone; E2, estradiol; FSH, follicle-stimulating hormone; GnRH, gonadotropin-releasing hormone; LH, luteinizing hormone; PCOS, polycystic ovary syndrome; SHBG, sex hormone–binding globulin; T, testosterone.
Anti-Müllerian hormone (AMH) is secreted by the granulosa cells of preantral and small antral follicles and is a regulator of ovarian folliculogenesis (22). Circulating AMH levels are proportional to the number of growing ovarian follicles (23). Accordingly, AMH levels are increased in PCOS and have been considered to be a marker for the syndrome’s distinctive disorder of folliculogenesis (22, 23). However, the receptor for AMH, AMHR2, is also expressed on a subset of hypothalamic gonadotropin-releasing hormone (GnRH) neurons in mice and humans (24). In mice, AMH activates GnRH neuron firing (24, 25). Thus, it has been proposed that AMH may contribute to PCOS through extragonadal actions (24, 25).
The frequency and amplitude of pulsatile GnRH secretion are increased (26, 27). This derangement selectively increases LH while simultaneously suppressing FSH release (28). Testosterone decreases the sensitivity of the hypothalamic GnRH pulse generator to the normal feedback actions of estradiol and progesterone to slow pulse frequency (29). In addition, enhanced gonadotrope sensitivity to GnRH resulting from tonic estrogen feedback contributes to increased LH pulses (30). LH stimulates ovarian theca cell testosterone production; there is decreased granulosa aromatization of testosterone to estradiol due to relative FSH deficiency (31, 32). Intraovarian factors, such as increased AMH (33) and, perhaps, alternative splicing of the androgen receptor (34), may contribute to the inhibition of granulosa cell aromatase activity. There are also constitutive increases across multiple theca cell steroidogenic enzyme pathways regulating androgen biosynthesis (35). These steroidogenic enzymes are shared by the adrenal gland so that this alteration likely contributes to increased adrenal androgen production that is a common feature of PCOS (18, 19, 36). There is also evidence for enhanced adrenal sensitivity to ACTH in PCOS (37).
Polycystic ovaries (PCO) are characterized by an increase in antral follicles and ovarian stroma as well as by theca cell hyperplasia and ovarian cortical thickening (21, 38). Theca cells from PCO secrete more androgens, basally and in response to LH (39), consistent with the constitutive activation of steroidogenic enzymes (40). Increases in androgen production are found in theca cells isolated from ovulatory as well as anovulatory women with PCOS (39). PCO have an excess of growing follicles, with a reciprocal decrease in the proportion of primordial follicles compared with normal ovaries (38, 41). The excess of follicles could result from accelerated follicle growth and/or prolonged survival of small follicles in comparison to follicles from normal ovaries (41-43). Thus, the gonadotropin-independent development of preantral follicles appears to be disordered in PCO, but this process remains poorly understood (43). Aberrant responsiveness of granulosa cells to LH (44, 45), altered expression of luteinizing hormone/human chorionic gonadotrophin receptor (LHCGR) and follicle-stimulating hormone receptor (FSHR) (46), androgens (47) and intraovarian actions of AMH and other members of the transforming growth factor beta (TGF-β) signaling family (22, 46) may contribute to arrested follicular development characteristic of PCO. However, FSH administration can produce normal follicular maturation and ovulation (43, 48, 49).
Polycystic ovarian morphology (PCOM) detected by ovarian ultrasound examination is a very common finding in adult women with regular menstrual cycles, with prevalence rates as high as 32% in a large population-based cohort (50). The prevalence of PCOM is age-related and it decreases with increasing age (50). Indeed, PCOM prevalence rates are so high in adolescent girls that this finding is not recommended as a diagnostic criterion until 8 years after menarche (51-53). Women with PCOM and regular menstrual cycles may have reproductive hormone alterations (54, 55), most commonly higher testosterone levels, compared to reproductively normal women without PCOM (56, 57). Some women with PCOM have normal baseline reproductive hormone levels (56, 58) but increased androgen (59) and 17-hydroxyprogesterone responses (59, 60) to GnRH analog stimulation. Further, otherwise reproductively normal women with PCOM are at increased risk for ovarian hyperstimulation during ovulation induction with exogenous gonadotropins, analogous to women with PCOS (61-63). Family studies suggest that PCOM is a heritable trait (64, 65). Taken together, these observations suggest that PCOM reflects intrinsic ovarian abnormalities. Nevertheless, one study suggests that women with PCOM do not develop PCOS during follow-up and that ~50% of women with PCOM at baseline no longer met criteria for PCOM at follow-up (66).
Metabolic phenotype
In the 1980s, comprehensive studies of whole-body insulin action using the gold standard glucose clamp technique demonstrated that insulin-mediated glucose disposal, which reflects primarily skeletal muscle, was significantly and substantially decreased (~35%-40%) in women with PCOS, independent of obesity and body fat topography (67, 68). The decrease was of a similar magnitude to that reported in type 2 diabetes (T2D) (69). This finding has been replicated in multiple studies, although there are some studies in which nonobese women with PCOS have had normal insulin sensitivity, an observation that may reflect ethnic/racial differences (70). Hepatic insulin resistance is found only in obese women with PCOS, reflecting a synergistic deleterious effect of adiposity on hepatic insulin action.
The predominant cellular defect in the classic insulin target tissues, adipocytes and skeletal muscle, is a post-binding defect in insulin-mediated signaling (7). Most studies have also found less striking, but significant, decreases in maximal rates of insulin-stimulated glucose transport (68, 71), suggesting a decrease in post-receptor events (68, 72). Significant decreases in the abundance of GLUT4 glucose transporters in subcutaneous adipocytes accounted for this defect in some (73, 74) but not all studies (75).
The insulin receptor-signaling defect is due to decreased tyrosine autophosphorylation of the receptor and downstream signaling molecules. There is increased constitutive serine phosphorylation of the insulin receptor and insulin receptor substrates due to serine kinases; serine phosphorylation inhibits tyrosine phosphorylation (7). One serine kinase responsible for this abnormal phosphorylation is in the MEK-ERK1/2 pathway, which is constitutively activated in PCOS skeletal muscle (76). In at least some tissues, such as skin fibroblasts (77), skeletal muscle (76), and ovarian granulosa lutein cells (see below and (78)), insulin resistance in PCOS is selective, affecting metabolic but not other actions of insulin. However, another study suggested that both metabolic and mitogenic pathways were compromised in PCOS skeletal muscle (79).
Under normal circumstances, insulin secretion increases to compensate for peripheral insulin resistance (80). Glucose tolerance decompensates when the β-cell is no longer able to secrete sufficient amounts of insulin to meet the increased requirements (81, 82). Despite fasting hyperinsulinemia, which reflects both insulin secretion and clearance, women with PCOS have evidence for β-cell dysfunction when it is appropriately assessed (83-85). These defects are independent of obesity and dysglycemia but are more pronounced in women with PCOS who have a first-degree relative with T2D (85, 86). Evidence for β-cell dysfunction is present in daughters of women with PCOS prior to menarche (87). However, increased basal secretion rates of insulin (84) contribute to the fasting hyperinsulinemia that is characteristic of obese women with PCOS (8, 88). Studies that have reported increased insulin secretion in PCOS have been constrained by failing to examine insulin secretion in the context of insulin sensitivity (89-93).
Diagnosis of PCOS
Given the lack of information on the cause(s) of PCOS, its diagnosis has been based on the phenotypic features of the syndrome (94), as with the diagnosis of many medical disorders.
National Institutes of Health criteria
The first widely accepted diagnostic criteria for PCOS came from a survey of a small group of experts at the 1990 National Institute of Child Health and Human Development (NICHD) Conference on PCOS (6), a scientific meeting that reviewed various features of the syndrome. These participants were asked to vote on potential diagnostic features, and those receiving the most votes, hyperandrogenism (clinical and/or biochemical) and chronic anovulation, with the exclusion of secondary causes, became what are known as the NICHD or National Institutes of Health (NIH) criteria (95). The NIH criteria did not include ovarian morphology because of the lack of specificity of this finding (95). It was clear at that time that 20% to 30% of women with regular menses and no androgenic symptoms had PCOM on ultrasound examination (54). An increased LH:FSH ratio had been a common diagnostic criterion for PCOS, but it was not included in the NIH criteria because it was thought that it could escape reliable detection in a single blood sample due to the pulsatility of LH release (15, 95). The widespread adoption of the NIH criteria standardized the diagnosis of PCOS. Importantly, these criteria identified a single phenotype of women with chronic anovulation, also known as ovulatory dysfunction (OD), and hyperandrogenism (HA). In doing so, these criteria enabled research in the field by providing a common definition of PCOS.
Rotterdam criteria
In 2003, a conference specifically addressing the PCOS diagnostic criteria was convened in Rotterdam, the Netherlands (96, 97). Although designated as a consensus conference, the conference did not follow any formal consensus development process, nor were the recommended criteria evidence-based. The result of the conference was that PCOM on ultrasound examination was added to the NIH diagnostic criteria (96, 97). Accordingly, the Rotterdam criteria (96, 97) for the diagnosis of PCOS required the presence of 2 of the 3 following findings, after the exclusion of disorders of the pituitary, ovary, or adrenals that could present in a manner similar to PCOS: 1) HA; 2) chronic anovulation or OD; and 3) PCOM. These criteria included the NIH criteria but extended the diagnosis to include 2 new groups of affected women: 1) PCOM+HA without OD; and 2) PCOM+OD without HA (Table 1).
Table 1.

Phenotype | NIH (1990) | Rotterdam (2003) | AE-PCOS (2006) |
---|---|---|---|
HA + OD ± PCOM | + | + | + |
HA + PCOM | | + | + |
OD + PCOM | | + | |
Sets of diagnostic criteria for PCOS are shown relative to their corresponding applicable phenotypes. Abbreviations: AE-PCOS, Society of Androgen Excess and Polycystic Ovary Syndrome; HA, hyperandrogenism (clinical and/or biochemical); NIH, National Institutes of Health; OD, ovulatory dysfunction; PCOM, polycystic ovarian morphology; ±, with or without.
Androgen Excess Society
In 2006, the Androgen Excess Society (AES) charged a task force of experts in the field with the development of an evidence-based definition of PCOS (18, 19). The task force recommended that hyperandrogenism be considered as an essential component of PCOS. However, robust human data supporting an essential role for HA in the pathogenesis of PCOS were not provided, although such data were beginning to emerge (98, 99). According to the AES criteria, the diagnosis of PCOS required biochemical or clinical HA with 1) OD ± PCOM; or with 2) PCOM without OD (18, 19). In addition, the task force proposed further stratifying these AES phenotypes by the presence or absence of hirsutism. The AES criteria eliminated the Rotterdam phenotype of OD+PCOM (18, 19) (Table 1). The AES criteria have not been widely adopted (100).
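The three sets of criteria described above differ only in which combinations of HA, OD, and PCOM they accept, which can be made concrete with a small sketch. The code below is illustrative only: the `Findings` class and the function names are hypothetical, and it assumes that disorders mimicking PCOS have already been excluded, as each set of criteria requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Findings:
    """Hypothetical container for the three phenotypic findings."""
    ha: bool    # hyperandrogenism (clinical and/or biochemical)
    od: bool    # ovulatory dysfunction (chronic anovulation)
    pcom: bool  # polycystic ovarian morphology on ultrasound


def meets_nih(f: Findings) -> bool:
    # NIH (1990): HA and OD are both required; PCOM is not considered.
    return f.ha and f.od


def meets_rotterdam(f: Findings) -> bool:
    # Rotterdam (2003): any 2 of the 3 findings.
    return sum([f.ha, f.od, f.pcom]) >= 2


def meets_ae_pcos(f: Findings) -> bool:
    # AES (2006): HA is essential, together with OD and/or PCOM.
    return f.ha and (f.od or f.pcom)


def classify(f: Findings) -> dict:
    """Report which sets of criteria a given phenotype satisfies."""
    return {
        "NIH (1990)": meets_nih(f),
        "Rotterdam (2003)": meets_rotterdam(f),
        "AE-PCOS (2006)": meets_ae_pcos(f),
    }


if __name__ == "__main__":
    for f in (Findings(True, True, False),
              Findings(True, False, True),
              Findings(False, True, True)):
        print(f, classify(f))
```

Running the sketch on the three phenotypes reproduces the pattern of Table 1: HA+OD satisfies all three sets of criteria, HA+PCOM satisfies the Rotterdam and AES criteria, and OD+PCOM satisfies the Rotterdam criteria only.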
NIH-sponsored evidence-based methodology workshop on PCOS
In 2012, the NIH sponsored an evidence-based methodology workshop on PCOS, in which a panel of experts reviewed the current state-of-the-science (101). Although it was not an official NIH Consensus Development Conference (https://consensus.nih.gov/), the meeting followed the same “court” model where the evidence was presented to a panel that functioned as a jury. The panel members were experts in relevant topic areas, such as Gynecology, Diabetes and Metabolism, Cardiology, and Primary Care, but were not engaged in PCOS research. As noted in the panel’s report (102), “invited experts discussed the body of evidence and attendees had opportunities to provide comments during open discussion periods.” The panel’s final report noted that “the name ‘PCOS’ was a distraction and an impediment to progress,” and that the emphasis on PCOM created confusion because it was neither necessary nor sufficient for the diagnosis of PCOS. The panel recommended using the Rotterdam criteria with precise specification of the phenotype in research studies. Further, they proposed a comprehensive research agenda that included assessment of the epidemiology and long-term health outcomes of the PCOS phenotypes.
Evidence-based guidelines
There are 2 major evidence-based guidelines for the diagnosis and management of PCOS: the Endocrine Society’s 2013 Clinical Practice Guideline (103) and the 2018 International Evidence-Based Guideline (51-53). However, the quality of the evidence upon which these guidelines are based is predominantly low, due to a paucity of randomized clinical trials (RCTs) in the field. Of the 34 recommendations in the Endocrine Society Clinical Practice Guideline, the evidence supporting 24 of these was rated as low or very low. There were almost 175 recommendations in the International Guideline, of which only 31 were ranked as evidence-based; the remainder were clinical consensus recommendations or clinical practice points. Both guidelines recommended the use of the Rotterdam criteria for the diagnosis of PCOS, which, as outlined above, are based on expert opinion rather than on RCTs or high-quality observational studies.
Assessment of PCOS phenotypes
There have been numerous cross-sectional studies comparing the PCOS phenotypes created by application of the PCOS diagnostic criteria (Table 1) (reviewed in (7, 104)). There is agreement that the NIH phenotype is associated with substantially more insulin resistance and related cardiometabolic abnormalities than the non-NIH Rotterdam phenotypes of HA+PCOM or OD+PCOM (7, 50, 59, 101, 104-111). Women with HA+PCOM and with OD+PCOM tend to be leaner than those with HA+OD (105, 107, 110, 112-114). Women with HA+PCOM and with OD+PCOM are similar with regard to body mass index (BMI) and metabolic abnormalities (105, 107), although some studies have failed to detect metabolic abnormalities in women with OD+PCOM compared to reproductively normal control women (109). There are no reproductive or metabolic differences in NIH phenotype PCOS (HA+OD) with PCOM as compared to those without (115). In fact, studies report that PCOM is present in > 90% of women with HA+OD, that is, the NIH phenotype (105, 116). The strong correlation between HA+OD and PCOM is supported by genetic analyses suggesting that these traits are associated with many of the same genome-wide association studies (GWAS) loci (117) (see “Genetic Architecture of PCOS NIH and Rotterdam Phenotypes”). Stratification of PCOS according to Rotterdam phenotypes is also affected by the accuracy of the androgen assays utilized, whether ovulation is formally assessed, the technical capacity of the ultrasound equipment as well as the approach (transvaginal vs transabdominal), and operator expertise in performing pelvic ultrasound (18, 19, 118-122).
Epidemiology
The prevalence of the NIH PCOS phenotype is 5% to 8% of premenopausal women (123-127). These prevalence estimates are remarkably consistent across racial and ethnic groups (123, 124, 126, 128, 129). Nevertheless, racial and ethnic variation in the phenotypic features of PCOS has been reported in Latinas (130, 131), African Americans (132, 133), Icelanders (112), Sri Lankans (113), Koreans (134), and Chinese (110). PCOS is the most common cause of normogonadotropic anovulation, accounting for 55% to 91% of the entire World Health Organization-II (WHO-II) cohort (135). The prevalence of PCOS is higher, 8% to 13%, using the 2003 Rotterdam criteria since these criteria include additional phenotypes (127, 135) (Table 1). PCOS is commonly associated with obesity, and Mendelian randomization studies suggest that BMI is causally related to PCOS (117, 136, 137) (see “Mendelian Randomization with PCOS”). However, one study failed to detect a significant increase in the prevalence of PCOS with the increasing prevalence of obesity (138). Additional adequately powered analyses are needed to robustly investigate this question. It is noteworthy that the prevalence of PCOS is similar in diverse populations with markedly different prevalence rates of obesity (123, 124, 126, 128, 129).
Evidence for a Genetic Contribution to PCOS
Phenotypic variance, or the observable characteristic differences between individuals, results from the combined variation of genetic, environmental, and random factors in any given population. Quantifying the relative extent to which these factors account for differences in disease prevalence is an important first step in understanding the nature of a disease. Such analyses can in turn inform the prediction of clinical outcomes. For example, in conditions that are more heavily influenced by inherited genetic factors, family history and/or genetic testing may be more predictive of disease risk and treatment outcomes than environmental or behavioral measures. Examining trait correlations within families can indicate to what extent different PCOS phenotypes are caused by inherited genetic factors.
Familial Clustering
A possible genetic susceptibility to PCOS was first suggested in the 1960s, when families with multiple affected women were reported (139-143). The phenotypic similarity between PCOS and the rare syndromes of extreme insulin resistance and hyperandrogenism (144, 145) suggested that insulin receptor mutations might also be present in PCOS (130, 146). Further, defects in insulin action persisted in cultured cells from women with PCOS, suggesting that they were genetically programmed (40, 147).
Systematic studies of relatives of women with PCOS indicated that there was familial clustering of reproductive features of the syndrome (64, 65, 98, 99, 148, 149) consistent with a genetic susceptibility to these traits. When premature balding was used to assign male affected status (64, 149), a dominant mode of inheritance was suggested but the studies were constrained by small sample size and a failure to examine all relatives (150). Nevertheless, elevated androgen levels were highly prevalent (64, 65, 98), affecting up to ~40% of reproductive-age sisters. Non-SHBG-bound testosterone levels had a bimodal distribution in sisters of women with PCOS compared to a unimodal distribution observed in control women (98, 151), suggesting that elevated testosterone levels in PCOS were a monogenic trait within families (152) and/or self-reinforcing, as in a positive feedback loop, beyond a particular threshold (153). Similarly, bimodality of insulin levels was observed in sisters of women with PCOS (151). Brothers (154), mothers (155), and premenarchal daughters (87) had hyperandrogenemia, suggesting it was a consistent reproductive endophenotype. Elevated AMH levels were also present in male (156) as well as female relatives, including children (157-159). Metabolic features of the syndrome, including hyperinsulinemia (98, 151, 155), T2D (160, 161), and obesity (162), were present in both male and female relatives.
Heritability
The proportion of trait variation in a population that can be explained by genetic differences between individuals is referred to as a trait’s heritability (163). The higher the heritability is for a given trait or disease, the more informative genetic studies can be for understanding its underlying causes. Twins are especially informative for estimating trait heritability, because they share the same environmental influences (164). Heritability can be derived by simply comparing trait correlations between pairs of monozygotic (MZ) twins, who share identical DNA, to those between pairs of dizygotic (DZ) twins, who share half of their DNA. In a large study of Dutch twins, Vink and colleagues estimated the heritability of PCOS (165). They defined PCOS as having fewer than 9 menstrual cycles annually, combined with hirsutism or acne. They found that PCOS defined by these criteria co-occurred in MZ twins with a rate of 0.71, compared to 0.38 in DZ twins (165). These correlations suggest an additive heritability of 0.66 for PCOS, but by using a common pathway model that accounted for individual genetic effects from oligomenorrhea, acne, and hirsutism, they estimated the overall heritability of PCOS to be 0.79 (165).
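The additive heritability of 0.66 quoted above is consistent with the classical twin comparison often called Falconer's formula, in which heritability is estimated as twice the difference between the MZ and DZ correlations. The sketch below is a minimal illustration of that arithmetic, not a reproduction of the common pathway model used by Vink and colleagues; the function names are hypothetical, and the shared-environment and unique-environment estimates are the standard companions of Falconer's formula rather than values reported in the text.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Additive heritability: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


def shared_environment(r_mz: float, r_dz: float) -> float:
    """Shared-environment component: c^2 = 2 * r_DZ - r_MZ."""
    return 2.0 * r_dz - r_mz


def unique_environment(r_mz: float) -> float:
    """Unique-environment (plus error) component: e^2 = 1 - r_MZ."""
    return 1.0 - r_mz


if __name__ == "__main__":
    r_mz, r_dz = 0.71, 0.38  # twin correlations quoted in the text (165)
    h2 = falconer_heritability(r_mz, r_dz)
    c2 = shared_environment(r_mz, r_dz)
    e2 = unique_environment(r_mz)
    print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

With the quoted correlations, the first function returns approximately 0.66, matching the additive estimate in the text. The higher figure of 0.79 comes from the authors' multivariate model and cannot be derived from these two correlations alone.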
Other studies have shown that individual hormonal components of PCOS are also highly heritable. Family studies of testosterone heritability in women have yielded estimates of 0.26 to 0.50 (65, 166-168). Correlations of dehydroepiandrosterone sulfate (DHEAS) levels between PCOS probands and their sisters indicated a heritability of 0.44 for BMI-adjusted DHEAS (169). Heritability estimates for SHBG in women have ranged from 0.56 to 0.63 (65, 168). Metabolic factors such as BMI and insulin resistance have also been found to be highly heritable in sisters of women with PCOS (65). New computational methods have recently been developed that can use large, unrelated genomic datasets to estimate the heritability of traits due to common genetic differences between individuals (170). Such methods have not yet been applied with adequate power to PCOS itself, but have produced common-variant heritability estimates of 0.13 to 0.20 for testosterone and SHBG levels in women (171).
Genetics for Non-Geneticists
Detecting Genetic Variation
The familial clustering and estimated heritability of PCOS and its component traits provide compelling evidence for genetic susceptibility to PCOS and provide a rationale for studies investigating associated genetic variation. Genetic variation arises from naturally occurring random mutations in germ cells that are passed to future generations, ultimately becoming distributed among populations via random drift, migration, and selection (172-174). By producing phenotypic changes, genetic variants are selected for or against depending on the survival advantages or disadvantages they may confer under different environmental pressures (175). Importantly, the phenotypic effects from individual genetic variants vary widely and depend on many interacting factors, such as age (176), environmental influences (177), and the co-occurrence of other genetic variants (178). Natural selection generally acts to prevent the proliferation of deleterious alleles within populations (175, 179, 180). Deleterious variants with large phenotypic effects (the variants that can most directly point to specific disease mechanisms) therefore tend to be rare (175, 181, 182). Consequently, there is generally an inverse relationship between variant frequency and effect size. Rare diseases are typically caused by de novo mutations or by the co-occurrence of rare, recessive genetic variants within a single gene that significantly impair important biological functions. Such monogenic diseases follow classic Mendelian patterns of inheritance and are referred to as Mendelian disorders. Most of the genetic risk for common diseases, however, results from the cumulative influence of many variants with individually tiny effect sizes that interact in complex ways (183, 184). Such conditions, including PCOS, are therefore much more difficult to understand from a genetics perspective (185, 186) and are collectively referred to as complex traits.
There are a number of different methods for detecting genetic variation and many different analytical approaches for testing genetic associations with a given trait or disease. We will briefly review some of the primary methods prior to discussing their specific applications in studying PCOS.
Linkage Analysis
Humans are diploid organisms with 2 copies of every chromosome, having inherited 1 set of chromosomes from each parent. During meiosis, the single set of chromosomes carried in each gamete (sperm or egg cell) is produced in such a way that crossovers may occur between the 2 chromosomes of each pair, resulting in single chromosomes composed of alternating segments from each of the parental chromosomes. DNA is, therefore, inherited in blocks, and genes that lie nearer to each other on a chromosome tend to be inherited together on the same haplotypes (ie, are “linked”) more often than distant genes are (Fig. 2). Linkage is therefore a function of the expected rate of recombination between loci. Prior to the sequencing of the human genome, genetic distances were measured in terms of recombination frequencies.

Figure 2. Linkage. Alleles that are inherited together are “linked”. The probability that any 2 markers on the same chromosome are linked is inversely proportional to the distance between the 2 markers. Figure reproduced from Bush & Moore, 2012 (187).
Linkage analysis can be applied to family data to identify disease loci. By measuring polymorphic sequences in large families (using polymerase chain reaction [PCR], for example), co-segregation patterns can be studied, and specific alleles can be mapped to different phenotypes (188-190). Whether by specifying an inheritance model or by using a nonparametric approach, linkage analysis essentially measures the relative frequencies with which alleles at different loci are shared between relatives. In theory, haplotypes carrying disease variants are shared between concordantly affected relatives more often than between discordantly affected ones. If certain disease variants consistently occur in particular regions of the genome, then linkage analyses can be used to identify those regions (191). Linkage analyses are ideal for studying Mendelian disorders that run strongly in families, but these analyses are less well suited for studying complex traits, which feature greater polygenicity, smaller allelic effect sizes, and more genetic heterogeneity (192).
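The conventional statistic for parametric linkage analysis, which the text above does not name, is the LOD score: the log10 ratio of the likelihood of the observed recombination data at a given recombination fraction θ to the likelihood under free recombination (θ = 0.5). The Python sketch below assumes the simplest possible case, in which each informative meiosis can be scored unambiguously as recombinant or nonrecombinant between a marker and a disease locus. The function names and example counts are illustrative assumptions; real pedigree analyses rely on dedicated software.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    Compares the likelihood at `theta` with the likelihood under
    free recombination (theta = 0.5), on a log10 scale.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_lik_theta = (recombinants * math.log10(theta)
                     + non_recombinants * math.log10(1.0 - theta))
    log_lik_null = meioses * math.log10(0.5)
    return log_lik_theta - log_lik_null


def best_theta(recombinants: int, meioses: int, grid_steps: int = 50) -> float:
    """Grid search over theta in (0, 0.5] for the maximum LOD score."""
    thetas = [0.5 * (i + 1) / grid_steps for i in range(grid_steps)]
    return max(thetas, key=lambda t: lod_score(recombinants, meioses, t))


# Hypothetical data: 2 recombinants observed among 20 informative meioses.
theta_hat = best_theta(2, 20)
print(theta_hat, round(lod_score(2, 20, theta_hat), 2))
```

By long-standing convention, a LOD score above 3 is taken as evidence of linkage. In this hypothetical example the maximum LOD of about 3.2 occurs at θ = 0.1, the observed proportion of recombinants.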
Association Testing
An alternative method for identifying disease genes is the genetic association study. Rather than tracing haplotypes through families and calculating allele-sharing probabilities, genetic association studies simply compare allele counts between affected and unaffected individuals (ie, cases and controls). Association studies can therefore be performed using case-control cohorts of unrelated subjects. Association studies likewise rely on the concept of linkage to identify candidate genes, as measured genotype markers tend to be co-inherited with nearby alleles. The extent to which alleles at different loci are correlated in a population is referred to as linkage disequilibrium (LD). Because the average number of meioses between unrelated individuals is much greater than between relatives, the sizes of the nonrecombinant regions, or LD blocks, measured by markers in association studies, are much smaller than those measured in linkage analyses (Fig. 3). Accordingly, the candidate regions tagged in association studies are more precise, but many more markers are required to cover the full extent of the genome.
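The comparison of allele counts between cases and controls can be sketched as a Pearson chi-square test on a 2 × 2 table. The Python example below is a minimal illustration under stated assumptions: a single biallelic marker, unrelated subjects, and hypothetical counts. The function name `allelic_association` is invented for this sketch and does not correspond to any particular software package.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi2, p_value, odds_ratio) for the alternate allele.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    denominator = case_ref * control_alt
    odds_ratio = (case_alt * control_ref) / denominator if denominator else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical cohort: 1000 cases and 1000 controls, ie, 2000 alleles each.
chi2, p_value, odds_ratio = allelic_association(600, 1400, 500, 1500)
print(round(chi2, 2), p_value, round(odds_ratio, 2))
```

In this invented example the alternate allele frequency is 30% in cases and 25% in controls, which yields an odds ratio of about 1.29. The test treats each allele as an independent observation, so it is only a sketch of the basic idea and ignores covariates, relatedness, and ancestry.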

Figure 3. Linkage disequilibrium. Over many generations, linked chromosomal segments undergo successive recombination until all nonadjacent allele pairs reach equilibrium. The extent to which any 2 alleles coexist nonrandomly in a population is referred to as “linkage disequilibrium”. Figure reproduced from Bush & Moore, 2012 (187).
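Linkage disequilibrium between 2 biallelic loci is commonly summarized by the coefficient D, its normalized form D′, and the squared correlation r². The Python sketch below computes these from haplotype and allele frequencies and also shows the standard expectation that D decays by a factor of (1 − c) per generation, where c is the recombination fraction between the loci. The function names and the example frequencies are illustrative assumptions and are not taken from the studies reviewed here.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


def ld_after_generations(d0: float, recomb_fraction: float, generations: int) -> float:
    """Expected D after random mating: D_t = D_0 * (1 - c) ** t."""
    return d0 * (1.0 - recomb_fraction) ** generations


# Hypothetical frequencies: alleles A and B always travel together.
print(ld_stats(0.5, 0.5, 0.5))
# The same D eroded over 100 generations with 1% recombination per generation.
print(ld_after_generations(0.25, 0.01, 100))
```

An r² near 1 means a genotyped marker is a near-perfect proxy for the other allele, which is the property that tag SNP panels exploit.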
Prior to the advent of technologies that could efficiently genotype hundreds of thousands of markers simultaneously, association studies were typically limited to candidate genes with putative involvement in a particular disease pathway. Given large enough sample sizes, association tests can detect any number of genetic associations with more modest effect sizes. However, association tests are extremely sensitive to genetic ancestry differences between cases and controls, an effect known as population stratification, as allele frequencies and LD structures vary greatly between ancestral populations (Fig. 4). These differences have arisen over time due to random genetic drift and local selection and are reflective of the migration history of modern humans. Failure to adequately control for population structure, which typically requires accurate genetic ancestry information, can lead to false positive associations (193, 194). Confounding can also occur from unaccounted-for relatedness between study participants, so-called cryptic relatedness, which violates the assumptions of independence underlying basic association test statistics. Consequently, results from many early genetic association tests were never successfully replicated (195).

Figure 4. Population stratification. Allele frequencies often vary by ancestry. Failing to control for unequal distributions of ancestry between cases and controls can result in false positive associations.
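A small numerical illustration may make the problem concrete. In the Python sketch below, all counts are invented: 2 hypothetical ancestry groups differ in allele frequency (20% vs 60%), and cases are drawn mostly from the second group. Within each group, cases and controls have identical allele frequencies, yet pooling the groups produces a large chi-square statistic.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical allele counts as (alternate, reference) per group.
STRATA = {
    "ancestry_A": {"case": (80, 320), "control": (320, 1280)},   # 20% alt in both
    "ancestry_B": {"case": (960, 640), "control": (240, 160)},   # 60% alt in both
}


def pooled_table(strata):
    """Sum allele counts over ancestry groups, ignoring ancestry."""
    case_alt = sum(s["case"][0] for s in strata.values())
    case_ref = sum(s["case"][1] for s in strata.values())
    control_alt = sum(s["control"][0] for s in strata.values())
    control_ref = sum(s["control"][1] for s in strata.values())
    return case_alt, case_ref, control_alt, control_ref


for name, counts in STRATA.items():
    within = chi_square_2x2(*counts["case"], *counts["control"])
    print(name, "chi-square within group:", within)

print("chi-square after pooling:", chi_square_2x2(*pooled_table(STRATA)))
```

The pooled association is entirely an artifact of the unequal ancestry mix between cases and controls, which is why ancestry is adjusted for or matched in practice.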
Family-Based Association Testing
Under the null hypothesis in population-based association tests, variant allele frequencies are the same between the cases and controls (196). With family-based data, however, association test statistics can be computed conditionally on observed parental genotypes, thereby eliminating bias due to population stratification and/or admixture (197-200). In other words, the null hypotheses in family-based association tests are a function of which alleles each set of parents carries. The most basic such test, known as the transmission disequilibrium test (TDT), simply compares the relative transmission frequencies of different marker alleles between parents and affected offspring using a χ² statistic, with excess transmission of an allele suggesting a disease association (201). The family-based association test (FBAT) is a generalization of the TDT method, allowing for much greater flexibility in modeling family genotype and phenotype distributions (202, 203). The FBAT can be configured for different genetic models (eg, dominant, recessive, or additive), more complex pedigree structures with affected and unaffected offspring, pedigrees with missing parental genotypes, and multi-allelic markers for both binary and quantitative traits. Large pedigrees are ideal for tracing causal variants in rare diseases (204), but family-based designs can also increase the power to detect genetic associations in common diseases by inherently enriching the frequencies of causal rare variants in the study population (205, 206).
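For a biallelic marker, the TDT reduces to a McNemar-type statistic, χ² = (b − c)² / (b + c), where b is the number of heterozygous parents who transmitted the allele of interest to an affected child and c is the number who transmitted the other allele. The Python sketch below assumes the transmissions have already been extracted from genotyped trios. The input format, function name, and counts are hypothetical.

```python
import math


def tdt(transmissions, allele):
    """Transmission disequilibrium test for one allele.

    transmissions: iterable of (transmitted, untransmitted) allele pairs,
                   one per heterozygous parent of an affected child.
    Returns (b, c, chi2, p_value).
    """
    b = sum(1 for t, u in transmissions if t == allele and u != allele)
    c = sum(1 for t, u in transmissions if u == allele and t != allele)
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b, c, chi2, p_value


# Hypothetical data: allele "A" transmitted 60 times and not transmitted 40 times.
observed = [("A", "a")] * 60 + [("a", "A")] * 40
print(tdt(observed, "A"))
```

Because only heterozygous parents are informative and each parent serves as its own control, the statistic is not affected by differences in allele frequency between ancestry groups.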
Genome-Wide Association Studies
Following the sequencing of the human genome (207, 208), the development of commercial genotyping arrays that could simultaneously assay hundreds of thousands of common DNA variants established a new era of genetic discovery (209, 210). Association testing could now be performed economically across the entire genome, thereby negating the need for a priori hypotheses about candidate disease genes. These analyses came to be known as genome-wide association studies (GWAS). The new technology held particular promise for studying the genetics of common diseases, which were hypothesized to be a function of common alleles with relatively small individual effect sizes (211, 212). By cataloging genetic variation from numerous individuals from multiple populations (213), panels of single nucleotide polymorphisms (SNPs) were developed that could most efficiently tag different haplotypes in each LD block across the entire genome. In this way, many variants can be inferred, or “imputed,” from a more limited set of genotyped SNPs. As a consequence of this design, accurate imputation of tagged variants in GWAS is predominantly limited to common variants, as LD mapping for lower frequency variants is limited by haplotype reference panel sizes (214). Furthermore, trait-associated SNPs in GWAS are rarely causal variants themselves; rather, they are linked to causal variants within the same haplotype (Figs. 2, 3).
The identification of causal variants within an associated haplotype requires more intensive follow-up study of the implicated genomic region, a process known as fine mapping (215). Nevertheless, the efficient identification of candidate gene regions by GWAS has greatly facilitated disease gene discovery. To date, thousands of GWAS have been published and hundreds of thousands of variant-trait associations have been identified (216). Furthermore, GWAS have enabled individual genetic risk prediction for complex diseases. By considering the combined effects from trait-associated variants across the genome, a polygenic risk score can be derived for a given individual. Genetic risk scores facilitate disease risk stratification and can inform disease screening and lifestyle modification (217). However, the predictive power of polygenic risk scores is inherently limited by a condition’s heritability and the quantity of underlying genetic risk that has been identified via GWAS (218), which is a limitation for PCOS given the limited sample sizes studied to date. Furthermore, risk score models do not transfer well across populations due to differences in LD and allele frequencies (219).
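In its simplest additive form, a polygenic risk score is a weighted sum of risk-allele dosages, with weights taken from GWAS effect estimates (for a binary trait, log odds ratios). The Python sketch below is a minimal illustration under that assumption. The variant identifiers, weights, and dosages are invented, and practical pipelines add steps such as LD-aware variant selection that are omitted here.

```python
import statistics


def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies, or imputed values).

    dosages: mapping of variant ID to effect-allele dosage for one person.
    weights: mapping of variant ID to per-allele effect size from a GWAS.
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"dosages missing for variants: {missing}")
    return sum(beta * dosages[variant] for variant, beta in weights.items())


def standardize(scores):
    """Express raw scores as z-scores relative to a reference cohort."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(score - mean) / sd for score in scores]


# Hypothetical per-allele log odds ratios and one person's dosages.
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(polygenic_risk_score(person, weights))
```

Raw scores have no absolute meaning, so they are usually interpreted relative to the score distribution in a reference cohort of matching ancestry, which is what `standardize` sketches.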
Collectively, GWAS have also revealed a number of important characteristics of the genetic architecture of complex traits and common diseases in general, namely that complex traits are highly polygenic, that single gene variants often confer multiple phenotypic effects (a phenomenon called pleiotropy), and that most of the heritability of common diseases can be explained by the cumulative effects of common genetic variation (184, 220).
Phenome-Wide Association Studies
More recently, the integration of genetic data with electronic health records (EHRs) has enabled a different kind of genetic association study, the phenome-wide association study (PheWAS). Instead of systematically testing SNP associations with a given phenotype, as in a GWAS, the paradigm is reversed in PheWAS such that a catalog of phenotypes is systematically tested for association with a given SNP. Phenotypes included in a PheWAS could be International Classification of Diseases (ICD) codes, for example, or more carefully defined algorithmic-based conditions. PheWAS need not be limited to individual SNP associations, either; genetic risk scores can also be tested against multiple phenotypes. PheWAS are ideal for identifying pleiotropic gene effects. Identifying genetic variants with multiple disease associations using PheWAS can be informative for understanding common disease pathways, including potentially revealing new drug indications or off-target drug effects (221).
Mendelian Randomization and LD Score Regression
Genetics can also be used to assess causality between correlated phenotypes. Observational epidemiologic studies, which test for associations between diseases and various clinical, behavioral, and/or environmental factors, are susceptible to confounding and to reverse causation, in which a disease influences the apparent exposure. Alleles, however, are randomly allocated during gamete formation and are not influenced by subsequent environmental exposures or reverse causation, akin to study arm assignment in a randomized controlled trial (222). Genetic variants can therefore serve as proxies for an exposure when inferring the extent to which a disease is caused by that exposure, in an approach called Mendelian randomization (223, 224). Mendelian randomization can likewise be used to determine whether disease-associated variants influence disease risk independently or whether their effects are mediated through certain intermediary risk factors (eg, BMI) (Fig. 5).

Figure 5. Mendelian randomization. In Mendelian randomization analyses, genetic variants that are predictive of an intermediate risk factor but are otherwise independent of the outcome of interest can be used to determine to what extent the intermediate factor is causal for the outcome in question. Example variables are shown in parentheses: to determine whether BMI is causal for PCOS, the genetic variants that predict BMI should not be independently associated with PCOS or any confounding factors like insulin resistance.
Mendelian randomization requires that the instrumental genetic variants that predict an intermediate exposure variable do not affect the outcome directly or through some other confounding relationship. For example, to determine whether BMI is causal for PCOS, the genetic variants that predict BMI should not be independently associated with PCOS or any confounding factors like insulin resistance (Fig. 5). Violation of this assumption, known as horizontal pleiotropy, is common and can produce severely biased results in Mendelian randomization studies, if not properly controlled (225). Additionally, the ability of Mendelian randomization to determine causality is limited by the extent to which the exposure variable is genetically determined by the instrument variants.
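One common way to carry out the estimation, using only summary statistics, is to compute a Wald ratio for each instrument (the variant's effect on the outcome divided by its effect on the exposure) and then to combine the ratios by inverse-variance weighting (IVW). The Python sketch below assumes independent variants, valid instruments, and a first-order approximation for the standard error of each ratio. All numbers are invented, and the IVW estimator does not correct for horizontal pleiotropy.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one instrument, with a first-order standard error."""
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted combination of Wald ratios.

    variants: iterable of (beta_exposure, beta_outcome, se_outcome) tuples.
    Returns (estimate, standard_error).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_exposure, beta_outcome, se_outcome in variants:
        ratio, ratio_se = wald_ratio(beta_exposure, beta_outcome, se_outcome)
        weight = 1.0 / ratio_se ** 2
        weighted_sum += weight * ratio
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Hypothetical instruments: (effect on BMI, effect on PCOS log odds, SE of the latter).
instruments = [(0.10, 0.050, 0.01), (0.20, 0.100, 0.02), (0.05, 0.025, 0.01)]
print(ivw_estimate(instruments))
```

Here every invented instrument implies the same ratio of 0.5, so the combined estimate is 0.5. In real data, heterogeneity among the per-variant ratios is one signal that some instruments may violate the assumptions described above.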
GWAS remain susceptible to confounding from population stratification (226) and cryptic relatedness (194) among samples. The net confounding bias from these effects can be estimated, however, using LD score regression (227). The basic principle behind LD score regression is that the distribution of test statistics in a GWAS is a function of LD: SNPs tagging a causal variant have elevated association statistics proportional to their relative LD with the causal variant (226). Therefore, by regressing the association test statistics against the amount of genetic variation tagged by each variant (its LD score), inflation of test statistics due to LD is accounted for, and a more accurate estimate of confounding due to population stratification and/or cryptic relatedness can be obtained (227). Further, LD score regression can be extended to study the genetic overlap between traits. By regressing the product of normalized effect sizes (z-scores) derived from each of 2 GWAS against variant LD scores, the resultant slope is an estimate of genetic covariance, from which the genetic correlation between the corresponding traits can be derived (228). The genetic correlation between 2 traits is essentially the extent to which the set of genetic variants that contribute to one trait also contribute to the other, the net result of pleiotropy and LD between the 2 traits. Unlike Mendelian randomization, which can probe overlapping contributions between traits for a limited number of SNPs, cross-trait LD score regression provides genome-wide estimates of genetic correlation. Quantifying the shared additive genetic effects between traits can yield broader insights into shared biological pathways and causal relationships.
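The regression itself can be sketched in a few lines. Under the LD score regression model, the expected χ² statistic of variant j is approximately 1 + Na + (N h² / M) ℓ_j, where N is the sample size, M the number of variants, h² the SNP heritability, ℓ_j the LD score, and a the contribution of confounding. The Python example below fits this line by ordinary least squares on invented data. It is a simplification: the published method uses weighted regression and a block jackknife for standard errors, neither of which is shown here.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ldsc_estimates(ld_scores, chi2_stats, n_samples, n_variants):
    """Unweighted LD score regression sketch.

    The intercept estimates 1 + N*a (values above 1 suggest confounding);
    the slope estimates N * h2 / M, from which h2 is recovered.
    """
    intercept, slope = ols_fit(ld_scores, chi2_stats)
    h2 = slope * n_variants / n_samples
    return {"intercept": intercept, "slope": slope, "h2": h2}


# Hypothetical, noise-free data generated with intercept 1.05 and slope 0.004.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2_stats = [1.05 + 0.004 * score for score in ld_scores]
print(ldsc_estimates(ld_scores, chi2_stats, n_samples=10_000, n_variants=1_000_000))
```

In this toy example the intercept of 1.05 represents inflation that does not scale with LD and is therefore attributed to confounding, whereas the slope reflects polygenic signal.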
Next-Generation Sequencing
The advent of next-generation sequencing (NGS) technologies, in which millions of sequence reads can be processed simultaneously, led to a dramatic reduction in the time and cost associated with performing genomic sequencing (229). Unlike other genotyping methods that measure a limited set of SNPs and rely on LD patterns to fill in the blanks, as it were, NGS can directly detect nearly all positions in the genome, thereby enabling researchers to genotype rare variants and/or private mutations. Accordingly, NGS is especially useful in identifying pathogenic variants in rare diseases (230), in fine mapping of candidate gene regions (215), or in measuring the expression of novel gene variants (231). NGS has also enabled researchers to study to what extent rare alleles contribute to common disease (212). Many different experimental approaches have been developed that utilize NGS, including measuring genetic variation across the genome with whole genome sequencing (WGS), within protein-coding regions using whole exome sequencing (WES) (232), or within specific regions with targeted sequencing; quantifying gene expression by measuring RNA transcripts using RNA sequencing (RNA-seq) (233); identifying histone modifications using chromatin immunoprecipitation followed by NGS (ChIP-seq) (234); measuring DNA methylation using whole-genome bisulfite sequencing (WGBS) (235); and studying chromatin structure with transposase accessibility profiling (ATAC-seq) (236), among others (237) (Table 2). Recent efforts to amass extremely large databases of whole genome sequencing data are facilitating great advances in our collective understanding of the genetics of human populations and of complex traits (238, 239).
System | Method | Information | References |
---|---|---|---|
Genome | WGS (whole genome sequencing) | Genetic variation across the genome | (240) |
Genome | WES (whole exome sequencing) | Genetic variation in gene-coding regions | (232) |
Genome | Targeted sequencing | Genetic variation in specific regions of interest | (241, 242) |
Epigenome | ChIP-seq (chromatin immunoprecipitation sequencing) | Protein-DNA interactions | (234) |
Epigenome | WGBS (whole-genome bisulfite sequencing) | DNA methylation sites | (235) |
Epigenome | DNase-seq (DNase I hypersensitivity sequencing) | Open chromatin sites | (243, 244) |
Epigenome | FAIRE-seq (formaldehyde-assisted isolation of regulatory elements sequencing) | Open chromatin sites | (245, 246) |
Epigenome | ATAC-seq (assay for transposase-accessible chromatin using sequencing) | Open chromatin sites | (236) |
Epigenome | Hi-C | Chromatin interactions | (247) |
Epigenome | ChIA-PET (chromatin immunoprecipitation and paired-end tag sequencing) | Protein-mediated chromatin interactions | (248) |
Transcriptome | RNA-seq (RNA sequencing) | Gene expression | (233) |
Transcriptome | smRNA-seq (small RNA sequencing) | Expression of microRNAs and other small RNAs | (249) |
Transcriptome | CLIP-seq (cross-linking immunoprecipitation sequencing) | Protein-RNA interactions | (250) |
The additional genomic resolution afforded by WGS does come at a cost, however, beyond that associated with the sequencing itself. Due to the extensive repetitive sequences found throughout the human genome and the sheer volume associated with WGS output in general, analyzing WGS data is extremely computationally arduous. Efficient processing of NGS data of any kind typically requires the use of high-performance computing clusters. A typical WGS study will require many terabytes (1 terabyte = 1024 gigabytes) of storage space (251). There are numerous open-source software tools for processing sequencing data and performing various statistical analyses, but advanced bioinformatics training is required to ensure such analyses are performed correctly with the application of appropriate quality control measures.
Genetic Analyses of PCOS
PCOS Genetics Before GWAS
Historically, most genetic studies of PCOS used a candidate gene approach, in which polymorphic sites near specific genes of interest (chosen based on the gene’s hypothesized role in disease-related pathways) were tested for associations with PCOS. Such studies are inherently limited by a priori assumptions about gene functions. Based on the primary characteristics of the disease, early PCOS candidate genes included those involved with steroidogenesis (eg, CYP11A1, CYP17A1, STAR), androgen and gonadotropin action (eg, FSHR, LHCGR, SHBG, AR), and insulin resistance (eg, INSR, INS-VNTR, IGF1, IGF1R, IRS1, PPARG), among others (252). Collectively, these candidate gene studies identified hundreds of associations, but the findings were largely inconsistent, and for a variety of reasons few associations were replicated in subsequent studies (16, 253). Many failed to adequately control for population stratification between cases and controls (210), which could have produced false associations due to ethnic/racial differences in allele frequencies (254). Likewise, failure to control for comorbid phenotypes, such as obesity, could produce misleading association signals (16). These confounding effects were exacerbated by the small sample sizes in most of these studies, which also resulted in limited statistical power to detect associations with modest effect sizes (255). Most studies also failed to adequately correct for multiple hypothesis testing (256): when enough statistical tests are performed, some nominally significant associations will inevitably be observed by chance even when the null hypothesis is true. Finally, because PCOS phenotypes vary according to the diagnostic criteria applied (254, 257, 258), candidate gene studies were doubtlessly affected by the populations chosen for each study.
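The multiple testing problem can be quantified with simple arithmetic. The Python sketch below assumes independent tests and uses the Bonferroni correction, which is one of several possible approaches and is not necessarily the one that any particular study above should have applied. The commonly used genome-wide significance threshold of 5 × 10⁻⁸ corresponds to a Bonferroni correction of α = 0.05 for roughly 1 million independent tests.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null tests with P < alpha."""
    return alpha * n_tests


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical candidate gene study testing 100 independent markers at alpha = 0.05.
print(expected_false_positives(0.05, 100))
print(family_wise_error_rate(0.05, 100))
# Approximate GWAS setting: about 1 million independent common-variant tests.
print(bonferroni_threshold(0.05, 1_000_000))
```

With 100 uncorrected tests at α = 0.05, about 5 false positives are expected and at least one is almost certain, which illustrates why uncorrected candidate gene results so often failed to replicate.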
In an effort to assess the relative reliability of different PCOS candidate gene study findings, Hiam and colleagues recently performed a systematic review of candidate gene meta-analyses in PCOS (259). Of 21 qualified candidate gene meta-analyses, only 5 were deemed “high quality.” Most of the meta-analyses did not describe inclusion criteria for control groups or exclude studies with statistically implausible distributions of alleles (260). Moreover, the quality assessment used to systematically evaluate the meta-analyses (261) did not include specific considerations for genetic association studies, such as those detailed above, so even candidate gene meta-analyses that were deemed to be of relatively high quality could still suffer from underlying issues that affect candidate gene studies in general.
The first replicated candidate risk locus for PCOS was in a region on chromosome 19 linked to the insulin receptor gene, INSR. In linkage and TDT analyses of 150 families with at least one affected case, we (252) identified an allele located near INSR that was significantly associated with NIH PCOS or hyperandrogenemia. The NIH PCOS or hyperandrogenemia phenotype was used based on our study suggesting that hyperandrogenemia was a major endophenotype in the sisters of women with PCOS (98). This finding was then replicated by us in several independent family-based cohorts (262-265), as well as in subsequent case-control analyses (266, 267). Although INSR was the intended target in the initial study, the associated allele was actually located in an intron of the fibrillin 3 gene, FBN3, about 800 kb from INSR. The risk marker could conceivably have been tagging an allele in FBN3 or any one of several other genes in the genomic vicinity, despite the physiologic evidence in support of INSR. However, by measuring a set of SNPs across the entire INSR gene in a relatively large case-control association study (799 cases, 3758 controls), Goodarzi and colleagues did identify an intronic variant within INSR that was significantly associated with PCOS, independent of BMI, which lent additional evidence in support of INSR being the functional gene in the region (268), a finding subsequently supported by GWAS (269). Nevertheless, until one or more functional INSR variants are experimentally validated, it remains unknown which gene(s) in this gene-rich region of chromosome 19 are actually contributing to PCOS.
GWAS of PCOS
The development of cost-efficient, high-throughput genotyping platforms marked a turning point in the study of complex trait genetics, including PCOS. Genetic variation could now be measured across the genome using a hypothesis-free approach in thousands of individuals in order to identify alleles associated with the disease or trait of interest. The first 2 PCOS GWAS were completed in Han Chinese cohorts with PCOS cases defined according to the Rotterdam criteria (269, 270). These GWAS collectively identified 11 significant association signals with PCOS (Table 3 (117, 269-273)). Two subsequent GWAS were performed in women of European ancestry, with the first using the NIH definition for PCOS (271) and the second using self-reported PCOS cases validated by replication in additional cohorts of Rotterdam and NIH phenotype cases (272). These European GWAS identified 5 novel associations, including a locus encompassing the FSHB gene, which encodes the beta subunit of FSH, in addition to replicating 2 Han Chinese association signals. Two GWAS have also been performed in Korean women using the Rotterdam criteria (274, 275), but neither identified any loci significantly associated with PCOS, likely due to limited sample sizes and the prevalence of hyperandrogenemia in the controls (16). A more recent, large-scale meta-analysis of PCOS in women of European ancestry identified 3 novel loci and replicated 11 of the previously reported loci (117). Finally, a meta-analysis in mixed ancestries from multiple biobanks identified a novel locus associated with algorithmically defined Rotterdam PCOS cases from EHRs (273) (Table 3), but the accuracy of the EHR-based algorithm was not validated by manual chart review or patient recall. Further, none of the previously identified PCOS GWAS loci were replicated with genome-wide significance in that meta-analysis, which is surprising considering that multiple loci have otherwise been replicated in PCOS cohorts of different ancestries (Table 3).
Recent studies have validated EHR-based algorithms for effective identification of PCOS with positive-predictive values over 90% in populations enriched for PCOS (276, 277), which should facilitate more biobank-based genetic analyses. However, the best performing algorithms have low sensitivity (~50%) (276). Therefore, EHR-predicted case cohorts may represent a distinct subset of women with PCOS compared to traditionally defined PCOS case cohorts.
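The distinction between positive predictive value and sensitivity can be made concrete with a confusion matrix. The Python sketch below uses invented counts chosen only so that the positive predictive value exceeds 90% while sensitivity is 50%. They are not taken from the cited validation studies.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of algorithm-flagged cases that truly have the condition."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true cases that the algorithm manages to flag."""
    return true_pos / (true_pos + false_neg)


# Hypothetical chart-review results for an EHR-based PCOS algorithm.
TRUE_POS = 450   # flagged by the algorithm, PCOS confirmed on review
FALSE_POS = 40   # flagged by the algorithm, PCOS not confirmed
FALSE_NEG = 450  # PCOS confirmed on review but missed by the algorithm

print(round(positive_predictive_value(TRUE_POS, FALSE_POS), 3))
print(sensitivity(TRUE_POS, FALSE_NEG))
```

In this invented scenario nearly all flagged women have PCOS, but half of the women with PCOS are never flagged. If the missed cases differ systematically from the flagged cases, for example in severity or in how the diagnosis was recorded, the resulting case cohort will not be representative.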
Locus | Implicated genes | Han Chinese Rotterdam; Chen, 2011 (270) | Han Chinese Rotterdam; Shi, 2012 (269) | European NIH; Hayes & Urbanek, 2015 (271) | European self-report, Rotterdam, NIH; Day, 2015 (272) | European self-report, Rotterdam, NIH; Day, 2018 (117) | Mixed ancestry EHR-based Rotterdam; Zhang & Ho, 2020 (273) |
---|---|---|---|---|---|---|---|
2p16.3 | LHCGR | x | x |  |  |  |  |
2p16.3 | FSHR |  | x |  |  |  |  |
2p21 | THADA | x | x |  | x | x |  |
2q34 | ERBB4 |  |  |  | x | x |  |
5q13.1 | IRF1/RAD50 |  |  |  | x | x |  |
6q25.3 | SOD2 |  |  |  |  |  | x |
8p32.1 | GATA4/NEIL2 |  |  | x |  | x |  |
9p24.1 | PLGRKT |  |  |  |  | x |  |
9q22.32 | C9orf3 |  | x | x |  | x |  |
9q33.3 | DENND1A | x | x |  |  | x |  |
11p14.1 | FSHB |  |  | x | x | x |  |
11q22.1 | YAP1 |  | x |  | x | x |  |
11q23.1 | ZBTB16 |  |  |  |  | x |  |
12q13.2 | RAB5B/SUOX/ERBB3 |  | x |  |  | x |  |
12q14.3 | HMGA2 |  | x |  |  |  |  |
12q21.2 | KRR1 |  |  |  | x | x |  |
16q12.1 | TOX3 |  | x |  |  | x |  |
19q13.3 | INSR |  | x |  |  |  |  |
20q11.21 | SUMO1P1 |  | x |  |  |  |  |
20q13.2 | MAPRE1 |  |  |  |  | x |  |
Table 3. Loci with reported associations with PCOS at genome-wide significance in GWAS. Loci identified in multiple studies are shown in bold. Loci identified in multiple ancestries are highlighted.
In total, 19 loci significantly associated with PCOS risk have been identified via GWAS in Chinese and European cohorts, with one additional locus identified in a mixed cohort. Twelve loci have been replicated with genome-wide significance in more than one GWAS, with 6 across different ancestries (Table 3). Further, most loci have now been replicated across ancestries in smaller replication studies (278). The fact that many of the top PCOS risk loci are shared between European and Chinese populations suggests that PCOS is an ancient trait and that aspects of PCOS were selected for prior to human migration out of Africa (279).
A number of risk loci identified by these PCOS GWAS contain genes that had already been recognized as candidate genes for PCOS, namely the receptors for LH (LHCGR), FSH (FSHR), and insulin (INSR). Granulosa cell expression of FSHR was previously found to be higher in women with PCOS (280), and certain FSHR haplotypes had been identified that were associated with PCOS or its cardinal features in smaller, candidate gene studies (281-283). The gene encoding the beta subunit of FSH, FSHB, is also at a well-replicated PCOS GWAS risk locus (117, 271, 272, 284). The primary risk variant in the FSHB locus resides in a highly conserved regulatory region upstream of FSHB (285, 286) and was found to upregulate FSHB transcription via enhancer activity in mouse gonadotropes (285, 287). FSHB variants were significantly associated with LH and FSH levels in women of European ancestry (271, 272), and with LH levels in women of Han Chinese ancestry (284). LHCGR was identified as a candidate gene due to its role in gonadotropin action (288). Its expression was higher in theca and granulosa cells from polycystic ovaries (289), and the gene was significantly demethylated, thereby facilitating expression (290, 291), in an animal model of PCOS (292). Importantly, the region of LD for the GWAS locus over INSR did not extend beyond the bounds of the INSR gene itself (269). Therefore, it remains possible that the previously identified risk allele in FBN3 (264) marks an independent signal. Nonetheless, the INSR GWAS signal provided the strongest evidence yet for INSR’s role in PCOS pathogenesis. Although each of these genes had already been implicated in PCOS, such findings were not always consistent, for the reasons noted above (256, 282, 293-295). The identification of these loci in PCOS GWAS served to further corroborate their putative involvement in PCOS pathogenesis.
Other GWAS loci include genes that had not yet been considered in PCOS but have since been connected to specific disease pathways. Risk alleles for T2D had already been identified at the 2 loci containing the THADA and HMGA2 genes (296, 297). The T2D risk alleles in THADA are independent of PCOS risk alleles (298), but THADA is now known to regulate thermogenesis (299, 300). HMGA2 has been found to promote adipogenesis (301, 302) and granulosa cell proliferation (303). The DENND1A gene was discovered to be an important regulator of theca cell androgen biosynthesis, with upregulated expression in PCOS (304). The YAP1 gene was also found to play a key role in ovarian follicle development (305, 306). The GWAS locus spanning the RAB5B, SUOX, and ERBB3 genes was previously identified as containing a risk allele for type 1 diabetes (T1D) (307, 308). ERBB3 has since been implicated in β-cell apoptosis (309). ERBB4, which is also at a risk locus for BMI (310, 311), encodes a receptor in the epidermal growth factor receptor (EGFR) superfamily, like ERBB3. ERBB4 has been found to regulate the oocyte environment during folliculogenesis (312). Both ERBB3 and ERBB4 bind neuregulins 1 (NRG1) and 2 (NRG2) (313). NRG1 is expressed and secreted by granulosa cells in response to the ovulatory LH surge and regulates luteinization and oocyte maturation (314, 315). NRG1 has also been found to affect glucose metabolism in rodents (316, 317). These results suggest that the ERBB3 and ERBB4 GWAS loci may contribute to PCOS risk via reproductive and metabolic pathways. The lead variant at the chr5q13.1 risk locus in Europeans is associated with alternative splicing of the nearby RAD50 gene in the ovary, according to estimates from the Genotype-Tissue Expression (GTEx) project database (318, 319). RAD50 is involved in DNA damage response signaling (320). DNA repair pathways have been implicated in age of menopause GWAS (321, 322). Therefore, alleles at the RAD50 locus may reflect a mechanism shared by PCOS and ovarian aging, although this connection is speculative.
The potential mechanisms through which other PCOS GWAS loci affect PCOS risk remain to be determined, but colocalization studies are beginning to connect these GWAS loci to different pathways (323, 324). Collectively, PCOS GWAS have substantially advanced our understanding of PCOS pathophysiology by identifying new candidate genes and by implicating various causal pathways, including gonadotropin secretion (FSHB) and action (LHCGR, FSHR), androgen biosynthesis (DENND1A), metabolic regulation (THADA, INSR, HMGA2), follicle development (HMGA2, YAP1), and age of menopause (FSHB, RAD50). Additional fine mapping and functional studies are needed to confirm the roles of these putative disease genes and uncover the specific molecular mechanisms through which variation in these GWAS loci confers risk for PCOS.
Case-control GWAS of PCOS have been informative in identifying disease loci, but there have also been GWAS performed for PCOS-related quantitative traits. Our European ancestry GWAS (271) included genome-wide association tests with testosterone, LH, FSH, DHEAS, and SHBG. There was a significant association between the PCOS risk allele at the FSHB locus and LH levels. This finding was replicated in Han Chinese ancestry PCOS with a variant in FSHB that was the lead SNP in this ancestry (284). A GWAS using data from the Twins UK cohort, featuring ~2600 women and ~300 men, replicated this finding in a non-PCOS population by finding that a highly correlated variant at the same locus was associated with higher LH and lower FSH levels (325). In a GWAS limited to mothers of dizygotic (DZ) twins, the FSHB PCOS variant was also associated with higher FSH levels and DZ twinning, as well as with earlier ages at menarche, menopause, and first child (326). These findings suggest that this FSHB locus is a general regulator of fertility in women rather than a PCOS-specific variant. A quantitative trait analysis in Han Chinese PCOS cases and control women identified variants in the FSHR gene that were associated with FSH levels, independent of PCOS status (327).
Using data from the UK Biobank, a large-scale biomedical database containing genetic and health information from half a million participants (328), Ruth, Day, Tyrrell, Thompson, and colleagues performed GWAS of testosterone and SHBG levels in more than 188 000 women (171). They found 254 significant genetic associations with total testosterone and 359 with SHBG, which demonstrates that these traits are highly polygenic. In a GWAS of steroid hormones in Germans, a low-frequency variant in THADA was significantly associated with DHEAS levels in women (329). The DHEAS-increasing allele is found exclusively on the same haplotype as the European PCOS-risk allele in THADA (117, 272, 330, 331). Large-scale quantitative trait studies such as these may ultimately help in mapping the biological pathways surrounding PCOS disease genes. These results also demonstrate that PCOS risk alleles may affect general regulators of circulating hormone levels and that such variants are not necessarily specific to PCOS.
Mendelian Randomization With PCOS
The reproductive and metabolic abnormalities of PCOS are interrelated, and the primary defect(s) remain unknown. Mendelian randomization studies have enabled the investigation of putative causal pathways and complement the insights provided by GWAS (Fig. 6). Genetically determined BMI is a significant risk factor for PCOS, with reported odds ratios of 1.90 to 4.89 per standard deviation increase in BMI (136, 137, 272). Conversely, genetically determined PCOS is not associated with increased BMI (137). A Mendelian randomization analysis conducted in Korean women did not find a causal relationship between BMI and PCOS, but the genetic instrument for BMI consisted of only 3 SNPs, and the PCOS GWAS sample size was small, with only 1000 cases and 1000 controls (332). Further, Mendelian randomization suggests that higher insulin and lower SHBG levels have causal effects on PCOS (117, 272, 333). These studies provide robust genetic evidence that cardiometabolic abnormalities contribute to the development of PCOS.

Significant causal factors for PCOS. Odds ratios for PCOS risk per standard deviation increase are shown for significant causal factors according to Mendelian randomization analyses (117, 136, 137, 171, 272, 334). Abbreviations: BMI, body mass index; EPIA-S, epiandrosterone sulfate; PCOS, polycystic ovary syndrome; SHBG, sex hormone–binding globulin; T, testosterone. *Insulin resistance was the tested causal factor. ¶Bioavailable testosterone was the tested causal factor.
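As a rough illustration of how an odds ratio "per standard deviation increase" can be derived from GWAS summary statistics, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate, one commonly used two-sample Mendelian randomization estimator. It is not necessarily the estimator used in the studies cited above. The function name and all numbers are hypothetical, and the toy outcome effects are constructed as exactly 0.8 times the exposure effects.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) estimate from summary statistics.

    Equivalent to a weighted regression through the origin of SNP-outcome
    effects on SNP-exposure effects, with weights 1 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical instruments (not real data): per-allele effects on BMI in SD units
# and on PCOS in log-odds units. Outcome effects are built as 0.8 * exposure effect.
snp_bmi_beta = [0.08, 0.05, 0.10, 0.04]
snp_pcos_log_or = [0.064, 0.040, 0.080, 0.032]
snp_pcos_se = [0.020, 0.020, 0.030, 0.015]

causal_log_or, causal_se = ivw_estimate(snp_bmi_beta, snp_pcos_log_or, snp_pcos_se)
odds_ratio_per_sd = math.exp(causal_log_or)  # exp(0.8), about 2.23, for this toy input
print(f"OR per SD increase in BMI: {odds_ratio_per_sd:.2f}")
```

The estimate is only interpretable as causal if the instruments affect the outcome solely through the exposure, which is the assumption that horizontal pleiotropy violates.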
Similarly, Mendelian randomization studies have elucidated reproductive causal pathways. Both genetically determined total and non-SHBG-bound testosterone contribute to the development of PCOS (171), providing support for the hypothesis that testosterone plays a primary role in PCOS pathogenesis (18, 19, 98, 335). In addition, a Mendelian randomization analysis of genetically determined serum metabolites found that higher levels of epiandrosterone sulfate (EPIA-S) were significantly associated with risk of developing PCOS (334). Genetically determined later age of menopause was also causally associated with PCOS (117, 272), which may be attributable to the PCOS risk variants at the FSHB locus (271, 284, 321, 325, 326, 336, 337) and/or perhaps the DNA repair gene RAD50 (272). Male pattern balding has been proposed to be a PCOS male phenotype (142, 149); this hypothesis is supported by Mendelian randomization showing a causal association between male pattern balding and PCOS (117). Of course, menopause and male pattern balding do not precede PCOS; therefore, their causal associations must represent overlapping biological pathways, as opposed to other traits that may directly influence PCOS risk, such as testosterone or BMI.
There have been no adequately powered prospective studies to determine the long-term morbidities of PCOS, such as T2D, cardiovascular events, or cancer (338, 339). Many cross-sectional and patient registry studies have shown that PCOS is associated with an increased risk of T2D (338-344), independent of obesity. However, obesity substantially increases this risk (338-342). Although PCOS is associated with multiple risk factors for atherosclerosis, including metabolic syndrome (345), endothelial dysfunction (346), and increased carotid intima thickness (347), no study has demonstrated an increase in cardiovascular events (348). A recent Mendelian randomization study found that genetically determined PCOS did not increase risk for coronary heart disease or stroke in Europeans, nor for T2D in either European or East Asian ancestry cohorts (349). However, since increased BMI, low SHBG, and higher testosterone levels in women increase risk for both T2D and PCOS (117, 137, 171, 272, 350), these common features may account for the association of PCOS with T2D (349). Because Mendelian randomization can be confounded by shared genetic etiology, also known as horizontal pleiotropy, it may be difficult to eliminate all potentially confounding associations when assessing causality between 2 highly correlated complex traits like T2D and PCOS (351). Indeed, LD score regression suggests substantial shared genetic architecture between PCOS and T2D (117). Recently, several methods have been developed to account for horizontal pleiotropy in Mendelian randomization analyses (352-354). These methods may help improve our understanding of causal relationships between PCOS and other complex traits, but to our knowledge none have yet been applied in studying PCOS.
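To make the idea of a pleiotropy-aware analysis concrete, the sketch below implements MR-Egger regression, a well-known approach of this general kind. It is used here purely as an illustration and is not necessarily among the methods cited above. MR-Egger fits a weighted regression of SNP-outcome effects on SNP-exposure effects with an intercept. A nonzero intercept suggests directional pleiotropy, and the slope is the pleiotropy-adjusted causal estimate. All names and numbers are hypothetical.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted linear regression with intercept (weights 1 / se_outcome**2).

    SNPs are first oriented so that every exposure effect is positive.
    Returns (intercept, slope).
    """
    x = [abs(bx) for bx in beta_exposure]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_exposure, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]

    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical, noise-free inputs: after orientation, every SNP satisfies
# outcome = 0.02 + 0.5 * exposure, i.e. a pleiotropic offset of 0.02.
exposure_beta = [0.04, -0.06, 0.08, 0.10]
outcome_beta = [0.04, -0.05, 0.06, 0.07]
outcome_se = [0.010, 0.020, 0.015, 0.010]

egger_intercept, egger_slope = mr_egger(exposure_beta, outcome_beta, outcome_se)
```

With these inputs the intercept recovers the built-in offset and the slope recovers the built-in effect of 0.5. With real data, both would carry standard errors that this sketch omits.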
Mendelian randomization studies have investigated whether PCOS is a risk factor for certain cancers. There have been conflicting data regarding the association between PCOS and ovarian cancer (355), but a recent Mendelian randomization study found that PCOS had a modest protective effect against ovarian cancer (356). Further, genetically determined PCOS was causally associated with estrogen receptor-positive, but not estrogen receptor-negative, breast cancer (357).
LD Score Regression With PCOS
In their large-scale meta-analysis of PCOS, Day and colleagues (117) performed LD score regression to investigate the genetic correlations between PCOS and other traits (Fig. 7). The analysis revealed strong genetic correlations between PCOS and BMI, childhood obesity, fasting insulin, T2D, triglyceride levels, high-density lipoprotein levels, and cardiovascular disease. These correlations indicate there is substantial shared genetic architecture between PCOS and cardiometabolic disorders. Additionally, there was a significant genetic correlation between PCOS and menarche timing. A previous study also found that PCOS was genetically correlated with earlier onset of puberty, demonstrating significant genetic correlations with age at menarche in women and age at voice breaking in men (358), consistent with epidemiological findings (359-361). There was also shared genetic architecture between PCOS and depression according to Mendelian randomization and LD score regression estimates (117), which supports observational studies suggesting an increased risk of depression and anxiety in women with PCOS (362). The connection between PCOS and depression is likely mediated to some extent through obesity, as BMI is causally related to both conditions (363). Indeed, a recent study investigating shared genetic effects between PCOS and psychiatric disorders found that the causal association between PCOS and depression was nullified after controlling for BMI (364). Male pattern baldness was associated with PCOS in a Mendelian randomization analysis, but its overall genetic correlation with PCOS was not significant according to LD score regression (117). These contrasting results suggest that the shared etiology between the 2 conditions may be limited to a few key pathways. In their UK Biobank study, Ruth, Day, Tyrrell, Thompson, and colleagues calculated genetic correlations between different sex hormone levels in women and reported correlations between total testosterone and SHBG (r = −0.06), between testosterone and estradiol (r = −0.25), and between estradiol and SHBG (r = 0.45), indicating that these traits are functions of overlapping biological pathways (171).

Genetic correlations with PCOS. Genetic correlations between PCOS risk and various hormonal and metabolic traits, according to Day and colleagues (117). Positive and negative correlations correspond to the direction of effect. Correlation estimates are shown with standard error bars. Menarche and Menopause correspond to age of onset. Other binary traits correspond to risk of occurrence. Abbreviations: BMI, body mass index; PCOS, polycystic ovary syndrome.
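For readers unfamiliar with how a genetic correlation is obtained from summary statistics, the following is a deliberately simplified sketch of the LD score regression idea. Per-SNP chi-square statistics, or products of z-scores for two traits, are regressed on LD scores, and the slopes are rescaled into heritability and genetic covariance. Production implementations use weighted regression and block-jackknife standard errors, which are omitted here. All function names and numbers are hypothetical, and the inputs are noise-free by construction.

```python
import math


def ols(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ldsc_parameter(statistic, ld_scores, n_effective, n_snps):
    """Rescale the slope on LD score into heritability or genetic covariance.

    statistic: per-SNP chi-square (heritability) or z1 * z2 (covariance).
    n_effective: N for one trait, or sqrt(N1 * N2) for a pair of traits.
    """
    slope, _intercept = ols(ld_scores, statistic)
    return slope * n_snps / n_effective


# Hypothetical toy inputs, exactly linear in LD score.
ld_scores = [10.0, 20.0, 40.0, 80.0, 160.0]
n1 = n2 = 50_000
n_snps = 1_000_000
chi2_trait1 = [1.0 + 0.005 * score for score in ld_scores]
chi2_trait2 = [1.0 + 0.020 * score for score in ld_scores]
z_products = [0.1 + 0.005 * score for score in ld_scores]

h2_trait1 = ldsc_parameter(chi2_trait1, ld_scores, n1, n_snps)
h2_trait2 = ldsc_parameter(chi2_trait2, ld_scores, n2, n_snps)
genetic_cov = ldsc_parameter(z_products, ld_scores, math.sqrt(n1 * n2), n_snps)
genetic_correlation = genetic_cov / math.sqrt(h2_trait1 * h2_trait2)
```

In this toy example the heritabilities are 0.1 and 0.4 and the genetic correlation is 0.5. The nonzero intercept of the z-score product regression (0.1 here) is what absorbs sample overlap in the full method.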
PheWAS of PCOS
PheWAS offer yet another way of investigating genetic overlap between conditions. In a large-scale PheWAS (discovery n = 49 343; replication n = 18 096) that utilized genotype data linked with electronic health record (EHR) data from the eMERGE network (365), Joo and colleagues (366) tested for associations of 1711 EHR phenotypes, classified according to ICD codes, against a PCOS polygenic risk score generated from the recent PCOS meta-GWAS summary statistics (117). They identified 13 replicated phenome-wide signals with consistent directions of effect. Some of these signals represent the same disorders, such as “obesity” and “overweight” or “sleep apnea” and “obstructive sleep apnea,” but all of the significant associations appear to be obesity-related (Table 4). These findings reinforce the comorbid relationship of BMI with PCOS. PheWAS of individual GWAS SNPs have likewise identified correlated traits, including menstrual cycle length, obesity, and age at menopause (324).
Phenotype code | Discovery (n = 49 343): Prevalence | Discovery: OR (95% CI) | Discovery: P | Replication (n = 18 096): Prevalence | Replication: OR (95% CI) | Replication: P |
---|---|---|---|---|---|---|
Morbid obesity | 0.18 | 1.010 (1.008–1.013) | 9.74 × 10⁻¹⁸ | 0.11 | 1.116 (1.054–1.182) | 1.64 × 10⁻⁴ |
Obesity | 0.32 | 1.008 (1.006–1.009) | 4.14 × 10⁻¹⁷ | 0.20 | 1.087 (1.042–1.134) | 1.29 × 10⁻⁴ |
Overweight | 0.37 | 1.007 (1.005–1.009) | 2.20 × 10⁻¹⁶ | 0.25 | 1.077 (1.037–1.120) | 1.44 × 10⁻⁴ |
T2D | 0.25 | 1.007 (1.005–1.009) | 8.18 × 10⁻¹³ | 0.22 | 1.081 (1.036–1.128) | 3.70 × 10⁻⁴ |
Sleep apnea | 0.16 | 1.008 (1.006–1.010) | 4.71 × 10⁻¹² | 0.12 | 1.096 (1.036–1.158) | 1.33 × 10⁻³ |
Diabetes mellitus | 0.26 | 1.007 (1.005–1.009) | 5.39 × 10⁻¹² | 0.23 | 1.079 (1.035–1.125) | 3.56 × 10⁻⁴ |
Chronic liver disease and cirrhosis | 0.11 | 1.008 (1.005–1.011) | 4.17 × 10⁻⁹ | 0.10 | 1.093 (1.028–1.163) | 4.64 × 10⁻³ |
Bariatric surgery | 0.04 | 1.012 (1.008–1.016) | 7.59 × 10⁻⁹ | 0.02 | 1.202 (1.079–1.339) | 8.00 × 10⁻⁴ |
Obstructive sleep apnea | 0.13 | 1.007 (1.005–1.010) | 1.16 × 10⁻⁸ | 0.09 | 1.098 (1.030–1.170) | 3.98 × 10⁻³ |
Other chronic nonalcoholic liver disease | 0.11 | 1.008 (1.005–1.011) | 2.13 × 10⁻⁸ | 0.09 | 1.112 (1.042–1.187) | 1.38 × 10⁻³ |
Polycystic ovaries | 0.02 | 1.015 (1.009–1.020) | 3.16 × 10⁻⁷ | 0.02 | 1.174 (1.026–1.343) | 1.93 × 10⁻² |
Insulin pump user | 0.09 | 1.008 (1.005–1.011) | 2.25 × 10⁻⁶ | 0.08 | 1.136 (1.060–1.219) | 3.42 × 10⁻⁴ |
T2D with ophthalmic manifestations | 0.05 | 1.010 (1.005–1.014) | 9.20 × 10⁻⁶ | 0.03 | 1.221 (1.082–1.377) | 1.20 × 10⁻³ |
Replicated phenotype associations demonstrating a consistent direction of effect with PCOS polygenic risk score, according to Joo and colleagues (366). Patients without the phenotype code of interest but with clinically related phenotypes were excluded from consideration. Abbreviations: OR, odds ratio; PCOS, polycystic ovary syndrome; T2D, type 2 diabetes.
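The polygenic risk score used as the predictor in such a PheWAS can be sketched as a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes. The example below is a minimal illustration under that assumption. The SNP identifiers, weights, and genotypes are hypothetical, and the standardization step is a common optional convention rather than a description of the cited study's pipeline.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Sum of allele dosages (0, 1, or 2) weighted by per-allele log odds ratios."""
    return sum(weight * dosages.get(snp, 0) for snp, weight in weights.items())


# Hypothetical SNP identifiers and weights (not taken from any real GWAS).
gwas_weights = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": -0.02}

cohort = {
    "person_1": {"snp_a": 2, "snp_b": 1, "snp_c": 0},
    "person_2": {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    "person_3": {"snp_a": 1, "snp_b": 0, "snp_c": 1},
}

raw_scores = {pid: polygenic_risk_score(d, gwas_weights) for pid, d in cohort.items()}

# Standardize across the cohort so scores are expressed in standard deviation units.
score_values = list(raw_scores.values())
mu, sigma = mean(score_values), pstdev(score_values)
z_scores = {pid: (score - mu) / sigma for pid, score in raw_scores.items()}
```

Each EHR phenotype would then be tested against the score, typically in a regression model with covariates, which is beyond the scope of this sketch.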
Next-Generation Sequencing in PCOS
In recent years, NGS technologies have become more common in PCOS research and have been applied in numerous ways, yielding new insights into the disease’s genetic origins. We used genomic sequencing to study the role that rare variants play in PCOS, using both candidate gene and genome-wide approaches. The common PCOS-associated alleles identified in GWAS could only explain a relatively small fraction of overall PCOS heritability (271, 367); therefore, we hypothesized that rare variants, which are not reliably tagged by GWAS arrays, could account for a significant proportion of the unexplained heritability (186).
Although next-generation sequencing can be used to identify rare variants, the inherently low population frequencies of rare variants necessitate extremely large cohorts to detect variant effects via standard case vs control genetic association tests (220). For example, in their large WES study of T2D, Flannick and colleagues estimated that at least 150 000 to 370 000 sequenced exomes would be required to detect individual T2D-associated rare variants at exome-wide significance with 80% power (368). Populations can be enriched for causal rare variants by studying extreme phenotypes (369) or by studying families with multiple affected individuals (205, 370). Unlike with Mendelian disorders, however, pathogenic alleles in complex traits tend to have small phenotypic effects (371, 372). Even in enriched cohorts, most rare alleles remain too scarce for effective variant association testing (206). Therefore, rare variant association studies are often either limited to specific candidate genes or use methods that aggregate sets of rare variants together for association testing, such as with gene-based burden tests (373) or sequence kernel association tests (374). To study rare variants in PCOS, we used a candidate gene approach to study individual rare variants and a family-based analysis to study rare variants across the genome at the gene level.
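The aggregation idea behind a gene-based burden test can be sketched as follows: rare qualifying variants in a gene are collapsed into a single carrier indicator per person, and carrier counts in cases and controls are then compared. This is a minimal sketch that assumes a simple carrier-collapsing scheme and a one-sided Fisher exact test. Published burden tests differ in weighting and covariate handling, and the variant identifiers and genotypes below are hypothetical.

```python
from math import comb


def count_carriers(genotypes, qualifying_variants):
    """Number of people carrying at least one qualifying rare variant in the gene."""
    return sum(1 for carried in genotypes.values() if carried & qualifying_variants)


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact P value for carrier enrichment among cases.

    Probability, under the hypergeometric null, of observing at least
    `case_carriers` carriers among cases given the total number of carriers.
    """
    total_carriers = case_carriers + control_carriers
    n_total = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    favorable = sum(comb(n_cases, k) * comb(n_controls, total_carriers - k)
                    for k in range(case_carriers, upper + 1))
    return favorable / comb(n_total, total_carriers)


# Hypothetical toy data: each person maps to the set of variants they carry.
qualifying = {"v1", "v2", "v3"}            # rare, predicted-deleterious variants
cases = {"case_1": {"v1"}, "case_2": {"v2"}, "case_3": {"v1", "v3"}, "case_4": set()}
controls = {"ctrl_1": set(), "ctrl_2": {"v9"}, "ctrl_3": set(), "ctrl_4": set()}

case_carriers = count_carriers(cases, qualifying)        # 3
control_carriers = count_carriers(controls, qualifying)  # 0 ("v9" is not qualifying)
p_value = fisher_one_sided(case_carriers, len(cases), control_carriers, len(controls))
```

With only 8 people the smallest attainable P value is far from exome-wide significance, which illustrates why even aggregated tests need large or enriched cohorts.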
In 2 targeted sequencing studies of the PCOS candidate genes AMH and its type 2 receptor, AMHR2, we measured the functional impact of rare variants (minor allele frequency [MAF] ≤ 1%) that were present in PCOS cases using AMH-mediated luciferase assays (375, 376). AMH plays a central role in folliculogenesis and is typically overexpressed in women with PCOS (22, 377, 378). First, we sequenced the AMH gene in a cohort of 643 PCOS cases and 153 controls and identified 18 rare coding variants that were present in PCOS cases but not in controls, 4 that were present in both cases and controls, and 2 found in controls only (375). Seventeen of the 18 PCOS-specific variants decreased AMH-mediated signaling capacity in transfected COS7 cells. None of the rare variants present in controls impacted AMH signaling. Several computational tools were applied to assess variant deleteriousness (379-382), but their scores were poorly correlated with the assay results, emphasizing the importance of using functional assays to confirm the biologic relevance of genetic variants. We subsequently investigated rare variation in AMH regulatory regions and AMHR2 (376). We identified 20 additional PCOS-specific variants in or near AMH and AMHR2 that significantly reduced AMH signaling activity. These variants included 3 noncoding variants upstream of AMH, 16 noncoding or splicing variants in or upstream of AMHR2, and 1 missense variant in AMHR2. Interestingly, women with PCOS and 1 or more functional AMH/AMHR2 variants had significantly lower AMH levels than other women with PCOS.
In total, we identified 37 PCOS-specific rare variants that significantly impaired AMH signaling activity. The variants were also collectively associated with PCOS at the population level (375, 376). All PCOS cases with functional AMH/AMHR2 variants were heterozygous carriers. Five of the functional AMH variants have previously been identified in men with persistent Müllerian duct syndrome (PMDS), which is a rare disorder where men retain internal Müllerian duct structures (383). Men with PMDS caused by AMH mutations typically have low or undetectable AMH levels (384). Loss of CYP17 inhibition by AMH, however, could contribute to PCOS, as CYP17 is a key enzyme in androgen biosynthesis (385). Indeed, we found that AMH variants with reduced AMH signaling showed a significant reduction in CYP17α1 expression inhibition compared with wild-type AMH (376). Testosterone levels did not differ significantly between carriers of AMH variants and other women with PCOS. Collectively, about 6.7% of PCOS cases from these cohorts had one or more of the AMH/AMHR2 rare variants (375, 376), which represents a substantial minority of cases, but the PCOS risk effect sizes and heritability attributable to these variants have not been estimated. Although AMH overexpression is a more consistent feature of PCOS (22, 377, 378), these rare variant studies indicate that the role of AMH in PCOS is more complex than previously thought and may vary between affected women. The specific mechanism through which impaired AMH signaling may lead to PCOS requires further study.
In a WGS study of 261 individuals from 62 families with one or more daughters with PCOS, we tested for gene-level associations of deleterious rare variants (MAF ≤ 2%) with PCOS (386). In order to increase the study’s power to detect PCOS-associated variants, we first tested each set of rare variants against PCOS-related quantitative traits (testosterone, DHEAS, insulin, glucose, LH, FSH, SHBG) and then combined the trait association results into a meta-statistic. The meta-statistic served to combine information from multiple association tests and thereby reduce the penalty for multiple hypothesis testing. We identified a collection of rare variants in the DENND1A gene that were significantly associated with altered levels of these reproductive and metabolic traits within the families. DENND1A was first implicated as a PCOS candidate gene in GWAS (269). Subsequently, it was shown to influence androgen production in ovarian theca cells according to the relative abundance of a particular splicing isoform, DENND1A.V2 (304). Most of the rare variants identified in our family study were not in LD with the DENND1A GWAS variants (117, 269, 270), which have no impact on splicing or apparent functional significance (304, 387), but the rare variants were predicted to significantly disrupt transcription factor binding and/or RNA-binding protein motifs. Thus far, targeted sequencing (304, 387, 388) and whole exome sequencing (389) have failed to identify any variants in DENND1A associated with PCOS or with DENND1A.V2 expression, but these studies only examined limited regions of the gene and were in small case-control cohorts. Replication and functional studies are needed to confirm whether any of the rare DENND1A variants identified in the family study drive DENND1A.V2 expression and/or PCOS risk. Notably, subjects from 50% of families in the family study had at least one of the rare DENND1A variants (386). These results support a model in which causal variants may be individually rare, but they are collectively common in certain disease genes.
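The general idea of folding several trait-level association tests into one gene-level statistic can be illustrated with Fisher's combined probability test. This is an assumption made for illustration only: the text above does not specify how the meta-statistic was constructed. Fisher's method also presumes independent tests, which correlated hormone and metabolic traits would violate without further adjustment. The P values below are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null.

    For even degrees of freedom the chi-square survival function has a closed
    form, exp(-x/2) * sum_{i<k} (x/2)**i / i!, so no external library is needed.
    Returns (statistic, combined_p).
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return statistic, math.exp(-half) * series


# Hypothetical gene-level P values from tests against seven quantitative traits.
trait_p_values = [0.03, 0.20, 0.008, 0.45, 0.04, 0.15, 0.02]
meta_statistic, gene_p_value = fisher_combined_p(trait_p_values)
```

Testing each gene once with the combined statistic, rather than once per trait, is what reduces the multiple-testing burden in this kind of design.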
No other gene associations reached genome-wide significance in the family-based meta-analysis, but 2 additional genes among the top 5 associations are established PCOS candidate genes: C9orf3, which is a PCOS GWAS gene (117, 269-271), and BMP6, which regulates folliculogenesis in granulosa cells (390) and is overexpressed in women with PCOS (391). Taken together, these genomic sequencing studies reinforce the putative role these genes play in PCOS pathogenesis and suggest that family-specific variants affecting key genes may be largely responsible for PCOS risk.
Numerous studies of PCOS-related pathways have measured gene expression in various tissues under different conditions using RNA-seq. These studies have identified different gene networks and pathways that are disrupted in PCOS, including MAPK signaling (392, 393), androgen receptor signaling (34), metabolic processes (394), and inflammatory and immune responses (395, 396). In gene expression studies, however, it is difficult to discern what changes are adaptive effects of the disease as opposed to causal aberrations (397). Investigators have also begun to map DNA methylation changes to gene expression changes in PCOS, in an effort to connect environmental exposures to transcriptional changes (398, 399).
Genetic Architecture of PCOS
Frequency and Function of Disease Variants
There is emerging consensus that complex traits are primarily driven by common noncoding variation (184, 400, 401). Around 90% of disease-associated GWAS variants are in noncoding regions of the genome (402, 403), and although lead GWAS SNPs are seldom causal (404), fine-mapping studies have confirmed that putatively causal SNPs in GWAS loci are significantly enriched in regulatory regions (405). Furthermore, most trait-associated variants either directly influence nearby gene expression or co-localize with other variants that influence expression of one or more genes (318).
As mentioned previously, common SNPs in GWAS loci typically account for only a small fraction of the estimated heritability for a given complex disease, an observation that came to be known as the problem of “missing heritability” (186). This mystery has since been largely resolved, as common SNPs collectively (not just those with significant association signals in GWAS) account for most of the remaining heritability in complex diseases (406-408). Accordingly, as the sample sizes of GWAS have increased, the number of significant associations discovered per study has increased proportionally (409). This widespread distribution of heritability, apart from the inherent complexity of biological networks, likely results from the fact that pathogenic variants in disease pathways are more strongly selected against (183, 401). All of the most significant PCOS GWAS index variants have been located in noncoding regions, primarily intronic, and are common in their respective populations (117, 269-272). Based on these observations and findings across other complex traits, it is likely that the causal variants tagged by PCOS GWAS SNPs influence the expression of one or more nearby gene transcripts in one or more specific tissues, although some may affect more distant genes through “trans” effects (401).
Although rare variants may contribute less to the phenotypic variance of complex traits overall, the identification of disease-associated rare variants can point more directly to causal mechanisms and key disease pathways (410). In fact, gene expression variation attributable to nearby genetic variation is driven most strongly by rare variants (411). In our family-based WGS study of rare variants in PCOS, 30 of the 32 identified rare variants in DENND1A were noncoding, most of which were predicted to impact transcription factor and/or RNA-binding protein binding sites (386). Among the other top gene associations in that study were several other GWAS candidate genes. These results suggest that rare noncoding variants within GWAS loci likely contribute to PCOS risk, in addition to the common alleles tagged from GWAS. GWAS loci have been found to harbor independent trait-associated rare variants in other complex traits as well (412-414). Therefore, future studies in PCOS would likely benefit from considering both common and rare variants in their statistical models.
Nonsynonymous protein-coding variants, by altering the encoding of amino acids, can directly and substantially alter protein function. Coding variants, therefore, are more often deleterious and, consequently, are relatively rare in the genome (415). Disruptive mutations in protein-coding regions are typical of rare monogenic diseases, so-called Mendelian disorders. Such variants, which individually cause significant phenotypic consequences, are said to be highly penetrant. While common diseases are characterized by highly polygenic architectures driven by noncoding variants with small individual variant effects, protein-coding variants are nonetheless significantly enriched for contributing to complex traits (405). Indeed, coding variant associations have been validated in a number of complex phenotypes, including LDL-cholesterol levels (416-418), Alzheimer’s disease (419, 420), and even height (413, 421). Recent large-scale whole exome association studies have identified rare alleles with large effect sizes that are associated with BMI (422) and T2D (368).
Our sequencing studies of rare variants in the AMH and AMHR2 genes identified 18 coding variants that impaired AMH-mediated signaling capacity and were collectively associated with PCOS in an independent cohort (375, 376). Missense mutations in other candidate genes have been reported in PCOS (388, 423-425), but functional and/or replication studies are needed to confirm their disease associations. As more candidate genes are sequenced, it is likely that additional rare coding variants contributing to PCOS will be identified. Although such variants contribute modestly to complex trait heritability (368, 426), they still may play a significant role in efforts to better understand PCOS disease etiology, because their biological consequences are more substantial and easier to interpret.
Similarly, identifying causal variants in extreme phenotypes can be an effective way of identifying candidate genes for corresponding complex traits. Genes linked to monogenic forms of complex traits are significantly enriched in corresponding complex trait GWAS gene sets (371). Genetic risk in Mendelian compared to complex diseases may simply be a function of penetrance determined by the nature of individual risk variants and the relative essentiality of their affected genes (427). For example, common variants in maturity-onset diabetes of the young (MODY) genes are associated with T2D (428-430). Therefore, studying familial genetics of extreme PCOS phenotypes could help identify genes and pathways commonly disrupted in PCOS more generally.
The genetic studies of PCOS indicate that common and rare variants, both coding and noncoding, contribute to PCOS pathogenesis. Although their relative contributions to PCOS heritability have yet to be precisely quantified, based on findings in other complex diseases, the majority of heritability likely comes from a highly polygenic network of common variants with small effect sizes, followed by the contribution of less frequent variants found in more central disease genes (184, 401). Collectively, 50% of families in our family-based WGS study had one or more of the rare variants identified in DENND1A (386). The high prevalence of hormone-associated rare variants in DENND1A, together with our earlier family studies in PCOS that observed a bimodal distribution of testosterone levels in sisters of women with PCOS compared to a unimodal distribution observed in control women (98), suggests that ovarian androgen biosynthesis is a central pathway in PCOS pathogenesis, with DENND1A as a key mediator. The discovery of functional rare variants that decrease AMH signaling implicates this pathway in PCOS pathogenesis, as well (375, 376). The findings suggest that AMH is more than a biomarker for antral follicle number (431). Conceivably then, there is a spectrum of underlying genetic risk for PCOS among any given population driven by common variation, but heritability within individual families or certain subpopulations of women with PCOS may be driven more by rare alleles with larger effects.
Genetic Architecture of PCOS NIH and Rotterdam Phenotypes
The most appropriate diagnostic criteria for PCOS have always been controversial since they are all based on subjective assessment of which clinical features of the syndrome reflect core biologic derangements (102) (Table 1). The PCOS GWAS meta-analysis (117) had adequate statistical power to objectively assess for the first time whether there were genetic differences among the PCOS phenotypes defined by the NIH compared to the Rotterdam criteria. Specifically, cases defined by the NIH criteria, HA+OD ± PCOM, were compared to the non-NIH Rotterdam cases, HA+PCOM and OD+PCOM, as well as to self-reported cases. The 14 GWAS loci that were significantly associated with PCOS according to any case definition were tested for heterogeneity of effect sizes across the cases stratified by phenotype. Only one locus, near GATA4/NEIL2, showed significant evidence for heterogeneity between the different diagnostic criteria groups, being most strongly associated with the NIH phenotype (Fig. 8 [(117)]). Since the NIH phenotype is substantially more insulin resistant than the non-NIH Rotterdam phenotypes (7), this locus may be involved in pathways regulating insulin sensitivity. However, the absence of heterogeneity among the phenotypes for the other 13 loci suggests that the genetic architecture of these phenotypes is generally similar. These findings imply that the current diagnostic criteria do not identify biologically distinct phenotypes.

Figure 8. Odds ratio of PCOS as a function of diagnostic criteria applied. The odds ratios (OR) and 95% CI are shown for each significant GWAS locus from the PCOS meta-analysis, stratified by case definitions according to different diagnostic criteria. NIH: groups recruiting only NIH diagnostic criteria; Non-NIH_Rotterdam: Rotterdam diagnostic criteria excluding the subset fulfilling NIH diagnostic criteria; Rotterdam+NIH: all groups except self-reported; self-reported: 23andMe. rs804279 at the GATA4/NEIL2 locus demonstrated significant heterogeneity (Het P = 2.6 × 10⁻⁵). The * indicates statistically significant associations for PCOS. Abbreviations: NIH, National Institutes of Health; PCOS, polycystic ovary syndrome. Figure reproduced from Day et al, 2018 (117).
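A test for heterogeneity of effect sizes across case definitions can be sketched with Cochran's Q statistic. This is an illustration under stated assumptions, not the procedure documented for the meta-analysis: the function name, the three strata, and the odds ratios and standard errors are all hypothetical.

```python
import math

def cochran_q(log_ors, ses):
    """Return (Q, df, I2) for per-stratum log odds ratios and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of total variation attributable to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Hypothetical ORs for one locus in three strata (illustrative, not study data)
odds_ratios = {"NIH": 1.30, "Non-NIH Rotterdam": 1.05, "self-reported": 1.02}
log_ors = [math.log(v) for v in odds_ratios.values()]
ses = [0.05, 0.05, 0.05]

q, df, i2 = cochran_q(log_ors, ses)
# With exactly 3 strata, df = 2 and the chi-square tail has this closed form
p_het = math.exp(-q / 2.0)
```

Under the null hypothesis of a common effect, Q follows a chi-square distribution with one fewer degree of freedom than the number of strata. The closed-form p-value in the last line holds only for df = 2.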
The PCOS GWAS meta-analysis (117) also examined associations for each of the lead PCOS susceptibility variants with the individual diagnostic features, PCOM, OD, HA, as well as with PCOS-related reproductive traits including ovarian volume and testosterone, LH, and FSH levels. For the 3 PCOS diagnostic features, 4 SNPs were associated with HA, 9 with OD, and 8 with PCOM. The FSHB locus contained the only SNP associated with LH and FSH levels, consistent with findings in previous European ancestry PCOS GWAS (271, 272). There was only 1 SNP, at the IRF1/RAD50 locus, that was associated with testosterone levels. SNPs from 9 loci were associated with OD; 7 of these were also associated with PCOM. There was only 1 locus, near THADA, that was associated with PCOM and no other PCOS-related traits. There were no SNPs associated with ovarian volume. These findings suggest that the genetic architectures of PCOM and OD are similar. Taken together with the shared genetic architecture of NIH, non-NIH Rotterdam, and self-reported PCOS, these findings support the use of OD as a proxy for ultrasound assessment of ovarian morphology for the diagnosis of PCOS. Limitations of the PCOS-related trait analysis include the relatively small sample size with available data (n = ~2000 for PCOM and n = ~3000 for HA, OD, and quantitative hormonal traits) and the overlap between groups (117). Thus, it remains possible that additional unique association signals will be discovered as sample sizes increase.
Novel PCOS Subtypes
Genetic heterogeneity in a disease is the notion that different sets of biological aberrations in different individuals can lead to convergent phenotypes under the same clinical diagnosis. The inclusion of genetically discrete disorders within one disease cohort would significantly compromise the power of disease association studies and could account for missing heritability by masking true allele effects (432-434). More precise disease phenotyping resulting in more homogeneous study populations would not only empower the discovery of causal mechanisms (434) but would also lead to more effective care for patients along the path toward precision medicine (435).
PCOS is frequently referred to as a heterogeneous disorder (436, 437). It has been suggested that there are subtypes of PCOS, a leaner type with increased LH:FSH ratios and a heavier type with insulin resistance (438). However, there have been limited objective assessments of putative PCOS subtypes. Cluster analysis has been previously performed on PCOS quantitative traits (120, 439) but there has been no validation that the clusters thus identified were biologically relevant. In an effort to identify subpopulations of women with PCOS with distinct underlying genetic risk factors, we applied an unsupervised clustering approach (440). Using anthropometric, reproductive, and metabolic data from multiple, independent PCOS cohorts, we identified 2 reproducible subtypes that had distinct phenotypic characteristics: a “reproductive” group (23%) characterized by higher LH and SHBG levels with relatively low BMI and insulin levels, and a “metabolic” group (37%) characterized by high BMI, glucose, and insulin levels with lower LH and SHBG levels. The remaining cases were designated as “indeterminate” (40%) (Fig. 9). These clusters were additionally validated using bootstrap resampling (441).

Figure 9. PCA plot of novel PCOS clusters. Clustered PCOS cases are plotted on the first 2 PCs of adjusted quantitative trait data, colored according to their identified subtype with 95% concentration ellipses. The relative magnitude and direction of trait correlations with the PCs are shown with black arrows. Abbreviations: BMI, body mass index; DHEAS, dehydroepiandrosterone sulfate; FSH, follicle-stimulating hormone; Glu0, fasting glucose; Ins0, fasting insulin; LH, luteinizing hormone; PC, principal component; PCA, principal component analysis; PCOS, polycystic ovary syndrome; SHBG, sex hormone–binding globulin; T, testosterone. Figure reproduced from Dapas et al, 2020 (440).
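Validating clusters by bootstrap resampling amounts to asking whether the same groups reappear when the cohort is resampled. The sketch below shows one common way to score this, the Jaccard similarity between each original cluster and its best match in each resample. It is a toy illustration: the 1-D two-cluster routine, the function names, and the data are assumptions, and the subtyping study's actual clustering and validation procedures are not described in this section.

```python
import random

def two_means_1d(values, n_iter=20):
    """Toy 1-D k-means with k=2; returns 0/1 labels (0 = lower cluster)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not group1:  # all values collapsed into one cluster
            break
        lo, hi = sum(group0) / len(group0), sum(group1) / len(group1)
    return labels

def jaccard(a, b):
    """Jaccard similarity between two sets of sample indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def bootstrap_stability(values, n_boot=200, seed=0):
    """Mean best-match Jaccard similarity per original cluster."""
    rng = random.Random(seed)
    n = len(values)
    base = two_means_1d(values)
    scores = {0: [], 1: []}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot = two_means_1d([values[i] for i in idx])
        drawn = set(idx)
        for c in (0, 1):
            original = {i for i in drawn if base[i] == c}
            scores[c].append(max(
                jaccard(original, {i for i, lab in zip(idx, boot) if lab == k})
                for k in (0, 1)
            ))
    return {c: sum(s) / len(s) for c, s in scores.items()}

# Hypothetical, well-separated trait values (illustrative, not study data)
values = [1.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
stability = bootstrap_stability(values)
```

Scores near 1 indicate that a cluster is recovered in almost every resample, whereas scores near 0.5 or below suggest that the grouping depends on the particular sample drawn.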
Subsequent GWAS, using a data subset from our European PCOS GWAS (243), revealed novel loci significantly associated with each subtype; these loci included genes with putative functions in pathways relevant to PCOS (Fig. 10). One locus significantly associated with the reproductive subtype was located in the type-I AMH receptor BMPR1B (bone morphogenetic protein receptor type 1B). The BMPR1B receptor, which is highly expressed in granulosa cells and GnRH neurons (24), forms a heterodimer with transforming growth factor beta (TGF-β) type-II receptors, including AMHR2, and binds AMH and other BMP ligands to initialize TGF-β signaling (442). BMPR1B regulates follicular development and mediates granulosa AMH response in sheep (443), and BMPR1B deficiency led to infertility and functional ovarian defects in mice, including lower aromatase production in granulosa cells (444). BMPR1B was also identified as a key driver in an androstenedione-related lncRNA-mRNA network in PCOS (445). Interestingly, in our family-based rare variant study (386), one of the BMPR1B ligand genes, BMP6, had the third-strongest gene-level association with altered hormone levels, out of 339 genes. BMP6 expression was significantly higher in granulosa cells from women with PCOS compared to those from healthy controls and significantly inhibited FSH-induced estradiol production (391). Similarly, the strongest association with the reproductive subtype was just upstream of and within the same topologically associated domain (TAD) as PRDM2, the gene with the fifth-strongest association signal in the family-based rare variant study. PRDM2 is an estrogen receptor coactivator (446) highly expressed in the pituitary gland (447) and the ovary (448), and its protein also binds with the retinoblastoma protein (449), which is known to play a role in ovarian granulosa cell development (450, 451).
The one locus that was significantly associated with the metabolic subtype included a number of potential disease genes in the surrounding region, such as GRB14, FIGN, and KCNH7 (440). The case sample sizes of the subtype GWAS were relatively small, however (n = 207 reproductive, 329 metabolic), so until these association results have been replicated, they should be considered preliminary.

Figure 10. Novel PCOS cluster GWAS results. Manhattan plots for (a) reproductive, (b) metabolic, and (c) indeterminate PCOS subtypes. The red horizontal line indicates genome-wide significance (P ≤ 1.67 × 10⁻⁸). Variants proximal to genome-wide significant loci (±200 kb) are colored in green and labeled according to nearby gene(s). Quantile–quantile plots with genomic inflation factor, λGC, are shown adjacent to corresponding Manhattan plots. Abbreviations: GWAS, genome-wide association studies; PCOS, polycystic ovary syndrome. Figure reproduced from Dapas et al, 2020 (440).
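The significance threshold in the Manhattan plots is stricter than the conventional genome-wide level of 5 × 10⁻⁸. Its value is consistent with a Bonferroni adjustment for the 3 subtype analyses. That reading is an assumption made here for illustration; the caption itself does not state how the threshold was derived.

```python
# Assumption: the plotted threshold is the conventional genome-wide level
# divided by the number of subtype GWAS (Bonferroni correction).
CONVENTIONAL_ALPHA = 5e-8
N_SUBTYPE_GWAS = 3  # reproductive, metabolic, indeterminate

threshold = CONVENTIONAL_ALPHA / N_SUBTYPE_GWAS
print(f"{threshold:.2e}")
```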
Association testing on the indeterminate subtype replicated the FSHB locus association from the original PCOS GWAS (271). This result suggests that the indeterminate group captures relatively generic PCOS cases. However, the association signal with indeterminate PCOS cases was actually stronger than in the larger, original GWAS (271), suggesting that the indeterminate group was more genetically homogeneous after the reproductive and metabolic subtypes were removed. It is possible that the indeterminate subtype or some other undefined subset is associated with characteristics that were not included in the clustering analysis, such as AMH levels and/or follicle numbers (55). Regardless, the association between the FSHB locus and the indeterminate cluster indicates that the locus is less likely to drive the forms of PCOS associated with the reproductive or metabolic subtypes.
The reproductive and metabolic clusters identified in the subtyping study appear to capture the opposite ends of the PCOS phenotypic spectrum. The largest dimension of phenotypic variation between women with PCOS was an axis with high SHBG levels on one end and high BMI and fasting insulin levels on the other, just as in a previous analysis of PCOS phenotypic variation by Dewailly and colleagues (55). The reproductive and metabolic subtypes clustered on either end of this axis, respectively (Fig. 9). Overlaying these results with the pathophysiology of PCOS, as illustrated in Fig. 1, the reproductive subtype aligns with LH-dependent androgen production resulting from aberrations in the hypothalamic-pituitary-gonadal axis, whereas the metabolic subtype is consistent with hyperandrogenemia driven primarily by insulin resistance. This concept of distinct causal pathways underlying the subtypes is supported by the fact that PCOS GWAS loci appear to confer risk for PCOS through either reproductive or metabolic mechanisms (452). Indeed, Dewailly and colleagues found that metabolic-, androgen-, and follicle-related variables were all independently correlated with high total testosterone levels in women with PCOS (55). Moreover, affected sisters tended to be concordant for either reproductive or metabolic subtypes. Furthermore, the carriers of the rare variants reported in DENND1A (386), a gene that regulates androgen biosynthesis, were significantly more likely to be classified with the reproductive subtype of PCOS (440). In contrast to the PCOS diagnostic criteria that do not result in genetically discrete phenotypes (117), these reproductive and metabolic subtypes appear to identify distinct genetic architectures and therefore promise to be more biologically relevant to PCOS.
Environmental Contributors to PCOS
Obesity
Obesity is an obvious environmentally determined, putative contributor to PCOS. It is a common feature of PCOS with prevalence rates as high as 80% to 90% in affected women in the United States (453, 454). However, there is clearly a large population of women with PCOS who are not obese. Furthermore, the prevalence rates of PCOS are similar in diverse populations with varying prevalence rates of obesity (124, 127, 138, 455). Additionally, despite increasing prevalence rates of obesity worldwide, the prevalence of PCOS has remained relatively stable (138). Obesity exacerbates androgenic symptoms (456, 457), chronic anovulation (458), insulin resistance (67, 68) and dysglycemia (340, 341). Accordingly, women who present for medical evaluation of PCOS are more likely to be obese (459).
The strongest obesity susceptibility gene, FTO (460), is associated with PCOS (461, 462), although it has not appeared as a genome-wide significant PCOS susceptibility locus in GWAS conducted to date (117, 269). Its association with PCOS appears to be mediated through BMI. FTO and other robustly replicated BMI-increasing alleles are not associated with PCOS independent of BMI (463), but recent Mendelian randomization analyses indicate that genetically determined BMI is significantly associated with PCOS (117, 136, 137, 358). Conversely, genetically determined PCOS is not associated with BMI (136, 137). Taken together, the genetic data support the hypothesis that obesity does contribute to the development of PCOS.
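The logic of Mendelian randomization can be made concrete with the fixed-effect inverse-variance weighted (IVW) estimator, which pools per-variant ratios of the variant-outcome effect to the variant-exposure effect. This is a minimal sketch with hypothetical summary statistics. It is not the analysis pipeline of the cited studies, which would typically also assess pleiotropy and instrument strength.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: per-SNP effects on the exposure (eg, BMI)
    beta_outcome:  per-SNP effects on the outcome (eg, log odds of PCOS)
    se_outcome:    standard errors of the outcome effects
    """
    rows = list(zip(beta_exposure, beta_outcome, se_outcome))
    num = sum(bx * by / se ** 2 for bx, by, se in rows)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in rows)
    return num / den, (1.0 / den) ** 0.5

# Hypothetical summary statistics for 3 BMI-associated SNPs (not study data)
bx = [0.10, 0.05, 0.08]
by = [0.05, 0.025, 0.04]
se = [0.01, 0.01, 0.02]

estimate, std_err = ivw_estimate(bx, by, se)
```

In this toy example every SNP has an outcome effect equal to half its exposure effect, so the pooled estimate is 0.5. Swapping the roles of exposure and outcome, with PCOS-associated variants as instruments, gives the reverse-direction test described above.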
Developmental Origins of PCOS
It is now widely recognized that environmental factors acting in early life can contribute to the development of a number of adult chronic diseases (464-466), a field known as the developmental origins of health and disease (DOHaD). There is compelling evidence that suggests early-life exposures contribute to PCOS risk, as well. The co-occurrence of PCOS in DZ twins is greater than half the rate observed in MZ twins (165), which is indicative of shared environmental influences. Furthermore, maternal parent-of-origin effects on PCOS phenotype were found to be much stronger than paternal effects (467), suggesting that the intrauterine environment may play a role in PCOS.
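The twin argument can be illustrated with Falconer's classical decomposition, in which a DZ twin correlation greater than half the MZ correlation implies a positive shared-environment component. The sketch below uses hypothetical correlations, not values from the cited twin study. Strictly, the formulas apply to twin correlations in liability rather than to raw co-occurrence rates.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer estimates of additive genetic (A), shared environment (C),
    and unique environment (E) variance shares from twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz  # positive exactly when r_dz > r_mz / 2
    e2 = 1.0 - r_mz
    return a2, c2, e2

# Hypothetical twin correlations (illustrative only)
a2, c2, e2 = falconer_ace(r_mz=0.7, r_dz=0.4)
```

Here the DZ correlation (0.4) exceeds half the MZ correlation (0.35), so the shared-environment share is positive (about 0.1), while the additive genetic share is about 0.6.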
Based on epidemiologic studies, Barker first proposed the thrifty phenotype hypothesis that fetal undernutrition results in adult cardiometabolic disorders (466, 468). Low birth weight is a consequence of poor fetal nutrition and is highly correlated with adult cardiovascular disease (469). Ibáñez and colleagues proposed that low birth weight was associated with the later development of premature pubarche and PCOS (470, 471). However, studies examining the association between birth weight and PCOS have yielded conflicting results. There was no association between low birth weight and the development of PCOS in a number of studies (472-477), whereas others found such an association (478, 479). The reasons for these discrepant findings are unclear but ethnic/racial differences could account for some of the heterogeneity in study outcomes (480-482).
There is considerably more evidence to support the role of another extrinsic factor, intrauterine androgens, in the pathogenesis of PCOS. Abbott, Dumesic, and colleagues reported that female rhesus monkeys who had been exposed to testosterone in utero developed both reproductive and metabolic features of PCOS after puberty (483-488). Similar phenocopies of PCOS could be created in other animal species, including sheep (489-491) and rodents (492-494), by prenatal androgen exposure. These observations have led to the hypothesis that intrauterine exposure to androgens is a final common path for the development of PCOS (495). Maternal circulating androgen levels are increased in pregnant women with PCOS (496). Maternal PCOS was associated with greater risk for developing androgen-associated neuropsychiatric disorders, after accounting for genetic relatedness (497). However, given the tremendous capacity of the placenta to aromatize androgens (498-500), the source of intrauterine androgens in PCOS could be fetal rather than maternal. The fetal adrenal is steroidogenically active and an important source of fetal androgens (501). The fetal ovary is minimally steroidogenically active under normal circumstances (501). Nevertheless, it has the steroidogenic enzyme capacity for androgen biosynthesis (502, 503). Therefore, it is biologically plausible that there is ovarian as well as adrenal androgen secretion by the female fetus.
Based on our findings that hyperandrogenemia was a consistent PCOS endophenotype (87, 98, 154, 155), we proposed that variation in a gene regulating androgen biosynthesis could contribute to PCOS (98), perhaps by causing intrauterine androgen excess (504). The discovery that the PCOS GWAS candidate gene, DENND1A, is a key regulator of androgen biosynthesis (304) and that rare variants in it are present in ~50% of families with PCOS (386) supports this hypothesis. Moreover, a recent Mendelian randomization study found that genetically determined total and bioavailable testosterone levels were causally associated with PCOS (171).
Tata and colleagues (25) have proposed an alternative mechanism for prenatal androgen excess. They demonstrated that AMH administration to pregnant mice resulted in increased maternal LH-dependent testosterone secretion and a PCOS phenotype in adult female offspring. These offspring have hyperactivated GnRH neurons, an established extragonadal action of AMH (24). The GnRH hyperactivation was normalized with intermittent GnRH antagonist treatment. They found that AMH levels were increased during pregnancy in women with PCOS, supporting the potential physiologic relevance of AMH-mediated increased maternal LH secretion. However, our findings that PCOS-specific rare genetic variants in AMH (375) or in AMHR2 (376) reduce signaling through this pathway in a substantial minority of cases suggest that there may be different AMH-mediated pathways that contribute to PCOS (505).
Nevertheless, human studies attempting to find evidence for prenatal androgen exposure have provided conflicting results. Cord blood androgen levels from female infants of women with PCOS have been reported to be increased (506, 507), decreased (476, 501, 508), or unchanged (509). Differences in steroid assay methodology may have contributed to these discrepant results, with studies employing the gold standard method, liquid chromatography with mass spectrometry, failing to find increased androgen levels (476, 509). Further, cord blood androgen levels did not predict the development of PCOS in a prospective cohort study (510). However, cord blood steroid levels at birth may not reflect those levels in the second trimester during the putative period for androgen programming of PCOS (495). A lower ratio of the index (digit 2) to ring (digit 4) finger (511, 512) or a longer anogenital distance (513) provides anatomical evidence for prenatal androgen exposure. Digit ratio studies in PCOS have been conflicting, with reports of lower ratios (514) and unchanged ratios (515). Similarly, there have been conflicting reports regarding anogenital distance in PCOS, with several studies finding increased anogenital distance in adult women with PCOS (516-518) as well as in their daughters (518). However, another study in infant daughters of women with PCOS failed to find differences in anogenital distance compared to control girls (519).
Epigenetic Inheritance
Gene expression is controlled by mechanisms that alter the local structure of DNA folding within chromosomes and/or recruit certain proteins that promote or repress transcription. Such modifications that persist through mitosis are referred to as epigenetic and contribute to trait variance beyond genetic differences. Two of the most common epigenetic mechanisms are DNA methylation, in which methyl groups are added directly to certain DNA nucleotides leading to gene silencing (520), and histone modifications, where chemical modifications to histone tails can distort local chromatin structure or recruit remodeling enzymes (521). Epigenetic modifications are the mechanistic link between nature and nurture, because epigenetic states are shaped by both intrinsic genetic variants and extrinsic environmental exposures (522). Epigenetic modifications are how early-life experiences can lead to specific phenotypic changes in adulthood (522). Furthermore, epigenetic programming is often disrupted in disease (520, 523). Epigenetic markers are globally erased in primordial germ cells and reset following fertilization to allow for organism development (524). However, an accumulation of studies, particularly in rodents, has demonstrated that epigenetic modifications can persist across multiple generations (525), whether through incomplete developmental reprogramming or through transgenerational fetal programming.
Evidence for transgenerational epigenetic, non-DNA-sequence-based inheritance in PCOS, such as persistent DNA methylation patterns, could connect numerous findings concerning environmental influences, developmental origins, and unexplained heritability in PCOS pathogenesis (526, 527). It is possible that transgenerational epigenetic changes contribute to some of the unexplained heritability of PCOS (526, 528). Correlation of DNA methylation between MZ twins is greater than between DZ twins, but most of the difference can be attributed to additive genetic influences (529-531). Epigenetic modifications caused by genetic variation would inherently be accounted for in heritability estimates from both twin studies and GWAS, and therefore would not contribute to unexplained heritability. However, epigenetic effects divorced from additive genetic variation at the population level would not be quantified in GWAS-based heritability estimates, and therefore could contribute to unexplained heritability (532).
Several studies have provided evidence for physiologically relevant epigenetic changes in women with PCOS by demonstrating alterations in gene expression associated with methylation changes in granulosa cells (395, 533), adipose tissue (534), and skeletal muscle (535). There were also differences in DNA methylation patterns in a pilot study of cord blood from infants of women with PCOS compared to that from control women (536). Particularly compelling evidence for an epigenetic contribution to PCOS heritability mediated by in utero exposures has been provided recently by complementary studies examining transgenerational effects of prenatally exposed murine PCOS models (528, 537). These studies identified persistent PCOS-like transgenerational effects in the third generation of offspring from mice either prenatally androgenized by dihydrotestosterone (528) or exposed to AMH (537). These changes included persistent alterations in gene expression across generations. Parallel human studies supported these findings by demonstrating that some of the same genes were differentially methylated (538) or expressed (528) in cross-sectional PCOS case-control cohorts and in daughters of women with PCOS. It is noteworthy that the genes with altered expression, as well as their associated Gene Ontology terms (539), differed between the 2 studies, although elevated prenatal androgen levels were a common feature presumably mediating expression changes in both models (25, 538).
These findings should be considered in the context of the robustly replicated evidence for substantial genetic contributions to PCOS. The increased risk of diagnosis in daughters of mothers with PCOS observed by Risal and colleagues (528) is consistent with genetic inheritance. Sisters of women with PCOS are more likely to be affected than their premenopausal mothers (540), MZ twins are much more likely to both have PCOS than DZ twins (165), and the rate of disease concordance between DZ twins and non-twin sisters is similar (165, 540), all of which indicates that intrauterine exposure does not drive PCOS inheritance in humans. Family studies have also shown that there are paternal effects on PCOS risk and phenotype (541, 542), which is inconsistent with an intrauterine exposure model. It is possible that paternal effects can be transmitted via epigenetic inheritance (543), but it is likely that perceived inheritance of DNA methylation in mammals is primarily driven by genetics and intrauterine exposure rather than incomplete erasure of epigenetic signatures during developmental reprogramming (544). Therefore, rather than being mutually exclusive mechanisms, it is more plausible that genetically determined core pathways, eg, testosterone biosynthesis (98, 171, 386, 401), act in concert with epigenetic actions of these androgens to produce PCOS phenotypes. This hypothesis aligns with the current understanding of complex traits, in which heritability is predominantly driven by numerous common variants with small effects (401). Collectively, these findings emphasize the need to better understand how both genetic and environmental factors lead to specific epigenetic changes that can increase risk for developing PCOS.
Summary and Future Directions
In less than 10 years, modern genetic analyses have confirmed an important contribution of genetic variation to the pathogenesis of PCOS. GWAS have identified numerous common risk alleles, implicating several plausible etiologic pathways related to neuroendocrine, reproductive, and metabolic function. Mendelian randomization studies have used genetic data to support epidemiologic findings linking BMI, insulin, age at menopause, depression, SHBG, and male pattern balding to PCOS risk. LD score regression analyses have indicated that PCOS shares genetic architecture with T2D, coronary artery disease, BMI, insulin levels, HDL levels, triglyceride levels, depression, and age at menarche. Candidate gene sequencing and functional studies have identified rare coding and noncoding variants in AMH and AMHR2 in a substantial minority of European ancestry PCOS cases. Whole genome sequencing studies have found rare, mainly noncoding, hormone-associated variants in DENND1A in ~50% of families with PCOS. Genetic subtyping has indicated there are reproductive and metabolic subtypes of PCOS that appear to have distinct genetic architectures. Meanwhile, animal studies have demonstrated that epigenetic changes incurred by environmental exposures in utero can perpetuate PCOS phenotypes across multiple generations, offering a possible mechanism for genetic and/or non-genetic familial PCOS risk.
Taken together with prevailing models in complex trait genetics (183, 184, 545), results from recent PCOS genetics studies have yielded important insights that should inform future directions of research. PCOS GWAS have identified many risk loci (Table 3), but it is important to emphasize that GWAS SNPs are rarely the disease-causing variants themselves but rather are tagging regions of the genome containing the actual causal variants (220). Extensive fine-mapping studies with subsequent functional analyses will be needed to determine how genetic variation within these loci contributes to the development of PCOS. Indeed, determining the biological significance of genetic variation remains a central challenge in elucidating complex traits such as PCOS (220).
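The notion that a GWAS SNP "tags" a nearby causal variant is usually quantified by LD between the two sites. As a minimal sketch, assuming phased haplotypes coded 0/1 at two biallelic sites, the squared correlation r² can be computed as follows. The function name and the toy haplotypes are illustrative assumptions and are not taken from any study cited here.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Return r^2 between two biallelic sites.

    Each haplotype is a pair (a, b) holding the 0/1 allele at the tag SNP
    and at the putative causal variant (hypothetical toy encoding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # allele frequency at site A
    p_b = sum(b for _, b in haplotypes) / n   # allele frequency at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


# Toy data: alleles always co-occur, so the tag SNP is a perfect proxy.
perfect_tag = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy data: alleles are independent, so the tag SNP carries no information.
no_tag = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(perfect_tag), ld_r_squared(no_tag))
```

An r² near 1 means the genotyped SNP and the untyped variant are statistically indistinguishable in that population, which is why association at a tag SNP cannot by itself identify the causal variant.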
There is a pressing need to expand genetic analyses to PCOS cohorts of diverse racial and ethnic backgrounds. Because genetic variation is population-specific, risk loci identified in genetic association studies do not necessarily apply to individuals with a different ancestry (Table 3). Moreover, for shared risk loci, individual risk variants are likely to vary across populations. For example, the top risk SNPs differ between European and Han Chinese PCOS susceptibility loci (117, 546, 547). In general, the proportion of shared variants between 2 populations is a function of time since their divergence, and common variants tend to be older (548). Accordingly, there is common genetic variation shared between the 2 major PCOS ancestry groups studied to date, European and Han Chinese (117, 271, 549, 550), but rare genetic variation appears to be population-specific (551). Multiethnic studies can utilize population information for fine mapping causal variants (552-554). Ancestral differences in LD structure in shared risk loci can also help to pinpoint the location of causal variants (555). For example, LD blocks among individuals with African ancestry tend to be much smaller, as human migration across the rest of the world was associated with a corresponding population bottleneck and reduction in genetic diversity (556). Importantly, performing genetic studies in minority populations would not only help researchers to better understand PCOS pathogenesis, but it would also help ensure that these historically understudied populations will be more likely to share in the potential benefits of PCOS genetics research in the future.
Results from whole genome sequencing studies indicate that set-based rare variant association tests in larger PCOS cohorts could be used to discover more functional variants. Family-based cohorts can be more effective for identifying rare variants relative to population-based cohorts (205), because families are enriched for the same variants and Mendelian inheritance filters can further refine the selection of candidate variants (206, 220, 386). However, targeted sequencing of candidate genes may be a more cost-effective and reliable approach for identifying pathogenic rare variants (557), as disease risk genes harbor both common and rare risk variants (183, 545).
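To illustrate what a set-based test does, the sketch below implements one simple form, a collapsing burden test: each person is scored as a carrier if they hold any rare variant in a predefined set (for example, a gene), and carrier counts in cases and controls are compared with a one-sided Fisher exact test. This is an assumed case-control toy design with hypothetical variant labels; it is not the family-based method used in the studies cited above, and real analyses typically adjust for covariates and relatedness.

```python
from math import comb
from typing import Dict, Iterable, Sequence


def count_carriers(genotypes: Sequence[Dict[str, int]],
                   rare_variants: Iterable[str]) -> int:
    """Count people carrying at least one alternate allele in the variant set."""
    variant_set = set(rare_variants)
    return sum(
        1 for person in genotypes
        if any(person.get(v, 0) > 0 for v in variant_set)
    )


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """P(at least `case_carriers` carriers among cases) under the
    hypergeometric null of no association."""
    total_carriers = case_carriers + control_carriers
    n = n_cases + n_controls
    upper = min(total_carriers, n_cases)
    numerator = sum(
        comb(total_carriers, k) * comb(n - total_carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(n, n_cases)


# Hypothetical genotypes: variant label -> alternate allele count.
gene_set = ["v1", "v2", "v3"]
cases = [{"v1": 1}, {"v2": 1}, {"v1": 0, "v3": 2}]
controls = [{}, {"v1": 0}, {"v4": 1}]  # v4 lies outside the tested set

p_value = fisher_one_sided(count_carriers(cases, gene_set), len(cases),
                           count_carriers(controls, gene_set), len(controls))
print(p_value)
```

Collapsing variants into a single carrier indicator gains power when most variants in the set act in the same direction, at the cost of sensitivity when risk and protective variants are mixed.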
Methods for both fine mapping and rare variant association testing typically utilize information from large databases that combine association study results and functional data to predict the relative effects of different variants (318, 558). For example, by aggregating data from numerous ChIP-seq experiments in various tissues (559), variant effects on transcription factor binding can subsequently be predicted “in silico” (560). Regulatory variant effects are tissue- and cell-type-specific (561), however, so the relevance of available genome annotation data for PCOS studies may depend on the particular gene(s) under investigation. Moreover, specific cell populations within tissues are often of pathophysiologic importance to PCOS, eg, theca cells in the ovary, gonadotropes in the pituitary, or GnRH neurons in the arcuate nucleus of the hypothalamus; whole-tissue gene expression may not be sufficiently sensitive to detect changes at the cellular level. Single-cell RNA sequencing could be used to distinguish cell-type-specific effects (562) but is likewise dependent on the ability to procure relevant human tissue samples.
Genetic analyses have also enabled the first objective assessment of the PCOS diagnostic criteria. The criterion of PCOM was added to PCOS diagnosis with the Rotterdam criteria (96, 97). However, in the PCOS meta-analysis of more than 10 000 cases (117), the genetic architecture was similar among nonoverlapping PCOS cases defined by NIH criteria, non-NIH Rotterdam criteria, or by self-report for 13 of 14 susceptibility loci. In the PCOS-related trait analysis, PCOM was genetically similar to OD for 7 of the 8 loci with which it was associated; there was only 1 locus associated solely with PCOM (117). Taken together, these findings suggest that assessment of PCOM is not needed for the diagnosis of PCOS. In contrast, unsupervised clustering identified reproductive and metabolic subtypes that appeared to have distinct genetic architectures (440). Although these findings require replication, they provide an example of modern disease classification based on objective biologic differences (94).
Current genetic risk scores for PCOS derived from genome-wide significant PCOS susceptibility loci (117) have limited predictive power given the small effect sizes of common genetic variants and the fact that variants below the genome-wide significance threshold collectively contribute greatly to disease risk (184, 563). It remains to be determined whether expanded polygenic risk scores, or subtype-specific risk scores, or rare variants contributing to PCOS pathogenesis (375, 376, 386) have greater predictive power. Integrated risk models that consider genetic and clinical factors may prove most effective for risk stratification (564-566).
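For readers unfamiliar with how such scores are built, a genetic risk score is commonly computed as a weighted sum of risk-allele counts, with each weight taken as the per-allele log odds ratio from an association study. The sketch below assumes that additive form; the SNP labels and odds ratios are hypothetical placeholders, not values from the PCOS meta-analysis.

```python
import math
from typing import Mapping


def genetic_risk_score(dosages: Mapping[str, float],
                       log_odds_weights: Mapping[str, float]) -> float:
    """Additive risk score: sum over SNPs of weight * risk-allele dosage.

    `dosages` maps each SNP to the number of risk alleles carried (0, 1, or 2).
    A KeyError is raised if a weighted SNP has no genotype.
    """
    return sum(weight * dosages[snp] for snp, weight in log_odds_weights.items())


# Hypothetical per-allele odds ratios, converted to log odds weights.
weights = {"snp_a": math.log(1.2), "snp_b": math.log(1.1)}
# Hypothetical individual: homozygous at snp_a, heterozygous at snp_b.
person = {"snp_a": 2, "snp_b": 1}

score = genetic_risk_score(person, weights)
print(score, math.exp(score))  # log-scale score and the implied relative odds
```

Because each common variant contributes only a small weight, a score restricted to genome-wide significant loci spans a narrow range across individuals, which is one way to see why its predictive power is limited.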
The extensive body of animal data provides compelling support for the role of intrauterine androgen programming in PCOS pathogenesis with recent murine models demonstrating evidence for transgenerational androgen effects (528, 537). These mechanisms likely act in concert with genetic variation, leading to intrauterine androgen excess that contributes to the development of PCOS through epigenetic effects in utero. Numerous studies have reported differentially methylated and/or expressed genes in PCOS, but few have tied these changes to specific genetic variants. Jones and colleagues (398) found variants at PCOS loci that were associated with adjacent, differentially methylated regions in adipose tissue. Another study by Makrinou and colleagues (567) identified sites at PCOS risk loci that were differentially methylated in granulosa lutein cells of women with PCOS. Neither of these studies, however, could attribute any epigenetic differences to PCOS risk variants. Pau and colleagues (452) identified differentially expressed genes in adipose tissue associated with specific PCOS risk genotypes, including BLK and NEIL2 at the NEIL2/GATA4 locus, and GLIPR1 and PHLDA1 at the KRR1 locus (Table 3), but the intermediary mechanisms remain unknown. Therefore, in an effort to connect the parallel avenues of research exploring genetic and epigenetic risk factors, future studies should strive to identify risk variants that associate with the epigenetic changes observed in PCOS.
While many opportunities remain for future research in PCOS genetics, we can now propose an updated model of PCOS pathogenesis that incorporates findings from the genetic studies reviewed herein. Because the characteristic hormonal disruptions in PCOS self-propagate in a feedback loop along the hypothalamic-pituitary-gonadal axis (Fig. 1), the underlying causes of this syndrome can vary in their tissues and pathways of origin and nonetheless result in the same PCOS phenotype. This notion has been supported empirically by PCOS GWAS, which have implicated genes in neuroendocrine, metabolic, and reproductive pathways (117, 269-272, 452). The different phenotypes described by PCOS diagnostic criteria do not appear to capture this genetic heterogeneity (117), but by characterizing women by their relative hormonal profiles using machine learning methods (440), we can identify subsets of women with PCOS rooted in one or another of these core pathways. These subsets include reproductive and metabolic subtypes with unique genetic associations. Rare genetic variation in DENND1A, AMH, and AMHR2 implicates androgen biosynthesis and AMH signaling as 2 of the core reproductive pathways in PCOS pathogenesis. Similarly, certain environmental risk factors are likely to be more or less relevant to these different forms of PCOS, as evidenced by the specific phenotypes that result from different in utero exposures (25, 495, 538). Ultimately, these relative genetic and environmental contributions occur along a spectrum. It is likely that many women with PCOS have genetic risk alleles from multiple core pathways. There are also likely to be genetic risk factors that transcend subtypes.
PCOS can no longer be dismissed as a perplexing vicious cycle of hormonal disturbances (16). Its phenotypic and genetic heterogeneity clearly indicates the need to shift away from PCOS diagnosis based on expert opinion to criteria based on biologic mechanisms. By deconstructing the syndrome into its core components, genomics is leading the transition toward precision medicine for PCOS. Ongoing genetic analyses promise to elucidate the distinct etiologies of PCOS, enabling the development of targeted therapies to reverse and ultimately prevent the development of the syndrome. Affected women and their families, who have been woefully underserved by the medical community (568, 569), may at last look forward to reaping the benefits of precision medicine in predicting, treating, and preventing PCOS.
Abbreviations
- AES: Androgen Excess Society
- AMH: anti-Müllerian hormone
- BMI: body mass index
- ChIP-seq: chromatin immunoprecipitation followed by NGS
- DHEAS: dehydroepiandrosterone sulfate
- DZ: dizygotic
- EHR: electronic health record
- FSH: follicle-stimulating hormone
- FSHR: follicle-stimulating hormone receptor
- GnRH: gonadotropin-releasing hormone
- GWAS: genome-wide association studies
- HA: hyperandrogenism
- LD: linkage disequilibrium
- LH: luteinizing hormone
- LHCGR: luteinizing hormone/human chorionic gonadotrophin receptor
- MAF: minor allele frequency
- MZ: monozygotic
- NGS: next-generation sequencing
- NICHD: National Institute of Child Health and Human Development
- NIH: National Institutes of Health
- OD: ovulatory dysfunction
- PCO: polycystic ovaries
- PCOM: polycystic ovary morphology
- PCOS: polycystic ovary syndrome
- PheWAS: phenome-wide association study
- RCT: randomized controlled trial
- SHBG: sex hormone–binding globulin
- SNP: single-nucleotide polymorphism
- T2D: type 2 diabetes
- TDT: transmission disequilibrium test
- WES: whole exome sequencing
- WGS: whole genome sequencing
Financial Support
National Institutes of Health funding sources: P50 HD044405, R01 HD085227, and R01 HD100812 (to A.D.); 5TL1TR002388 (to M.D.)
Disclosures
The authors have nothing to disclose.