Large-scale meta-analysis highlights the hypothalamic–pituitary–gonadal axis in the genetic regulation of menstrual cycle length

Abstract The normal menstrual cycle requires a delicate interplay between the hypothalamus, pituitary and ovary. Therefore, its length is an important indicator of female reproductive health. Menstrual cycle length has been shown to be partially controlled by genetic factors, especially in the follicle-stimulating hormone beta-subunit (FSHB) locus. A genome-wide association study meta-analysis of menstrual cycle length in 44 871 women of European ancestry confirmed the previously observed association with the FSHB locus and identified four additional novel signals in, or near, the GNRH1, PGR, NR5A2 and INS-IGF2 genes. These findings not only confirm the role of the hypothalamic–pituitary–gonadal axis in the genetic regulation of menstrual cycle length but also highlight potential novel local regulatory mechanisms, such as those mediated by IGF2.


Introduction
A menstrual cycle is crucial for human reproduction as it is required for oocyte selection, maturation and ovulation in preparation for its fertilization and subsequent pregnancy (1). The median menstrual cycle length is 27-30 days, depending on age (2) and can be divided into two distinct ovarian phasesthe follicular and luteal phases separated by ovulation. During the follicular phase the emerging follicle secretes estrogen that causes proliferation of the endometrium, the uterine lining, and in the subsequent luteal phase progesterone secretion from the corpus luteum of the ruptured follicle causes endometrium to cease proliferating and change both phenotypically and functionally in preparation for implantation of the embryo (3). The menstrual cycle and its length are under the control of reproductive hormones secreted via the integration of the hypothalamic-pituitary-gonadal axis (HPG axis), where the gonadotropin-releasing hormone (GnRH) secreted from the hypothalamus stimulates the release of the gonadotropins, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), from the anterior pituitary (3,4). FSH and LH in turn stimulate follicular growth and secretion of estrogens to prepare for ovulation and progesterone from ovarian follicular cells (3,4). The length of menstrual cycle reflects fertility status and has been associated with a range of reproductive traits, such as time to pregnancy, risk of spontaneous abortion and success rates in assisted reproduction (5)(6)(7). Moreover, shorter cycles have been associated with an increased risk of a gynecological condition known as endometriosis (8). Although a small twin study suggested no significant heritability for menstrual cycle length (9), it was recently demonstrated that a genetic variant in the promoter of follicle-stimulating hormone beta subunit gene (FSHB) is associated with longer menstrual cycles, nulliparity and lower endometriosis risk (10). However, only variants in, or near, the FSHB gene reached genome-wide significance among 9534 women (10), leaving the possibility that additional loci regulating menstrual cycle length could be revealed in larger studies.
Here, we present the results of a genome-wide association study (GWAS) meta-analysis of 44 871 women of European ancestry. We confirm the previous association with the FSHB locus (10) and also identify four additional novel association signals, contributing to an increase in our knowledge on the underlying genetics of menstrual cycle length control along the hypothalamus-pituitary-ovarian axis and also providing a genetic basis for the observed epidemiological correlations with gynecological pathologies.

SNP-based heritability of menstrual cycle length
We evaluated single nucleotide polymorphism (SNP)-based heritability (phenotypic variance explained by SNPs in the GWAS meta-analysis) using LD-score regression (LDSC) (13). The overall SNP-based heritability of menstrual cycle length was estimated at 6.1% (s.e. = 1.2). After filtering out all variants within ± 500 kb of the lead SNPs, the heritability estimate for menstrual cycle length decreased to 5.4% (s.e. = 1.1), indicating that common SNPs explain a small but significant part of menstrual Figure 1. Regional plots for five genome-wide significant loci. Regional plot depicts SNPs plotted by their position and GWAS meta-analysis -log10 (P-value) for association with menstrual cycle length. Nearby genes are shown on the lower panel. cycle length variability, and moreover, the majority of the SNPheritability still remains to be discovered.

Genetic associations between menstrual cycle length and other traits
To evaluate the potential shared genetic architecture between menstrual cycle length and other traits, we performed a lookup in the GWAS catalogue (https://www.ebi.ac.uk/gwas/; Supplementary Material, Table 2) for menstrual cycle length associated variants and candidate SNPs identified by the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) tool. Several significant associations were found for the FSHB locus, including gonadotropin (FSH and LH) levels, age at menarche and menopause, spontaneous dizygotic twinning, endometriosis and polycystic ovary syndrome (PCOS) (P <= 3 × 10 -8 ). Additionally, the NR5A2 locus was associated with menarche timing (P = 5 × 10 -8 ) and showed some evidence for association with age at voice drop (P = 6 × 10 -7 ) and pancreatic cancer (P = 1 × 10 -11 ).
Finally, we carried out a genetic correlation analysis with the LDSC method implemented in LD-Hub (20). Comparison with cardiometabolic, anthropometric, autoimmune, hormone, cancer and reproductive traits [for example lowest P-values were observed for age of first birth (r g = 0.12, s.e. = 0.07, P = 0.055) and age at menopause (r g = 0.15, s.e. = 0.08, P = 0.058)] revealed no significant correlations (Supplementary Material, Table 5).

Functional annotation of associated variants and candidate gene mapping
Functional mapping and annotation of genetic associations for menstrual cycle length was carried out using FUMA (21), and a total of 600 candidate SNPs (defined as being in LD with the lead SNPs with a r 2 >= 0.6) were identified. The majority of these (∼90%; Supplementary Material, Fig. 2A and Table 6) were located in intergenic or intronic regions, and >75% of the variants overlapped chromatin state annotations (Supplementary Material, Fig. 2C and Table 6), suggesting that they affect gene regulation.
To identify the potential effector transcripts for the five significant loci for menstrual cycle length, genes within the loci were prioritized if there was evidence for both expression quantitative trait loci (eQTL) and chromatin interaction (21).
The INS-IGF2 locus (lead signal rs11042596) included a total of 34 candidate SNPs, with the lowest Regulome DataBase (RDB) score (1d-likely to affect binding and linked to expression of a gene target) for rs6578986. eQTL mapping and chromatin interactions highlighted IGF2 and INS-IGF2 as likely effector transcripts at this locus (Supplementary Material, Table 7 and Fig.  3).

Tissue specificity and gene set enrichment analysis
Using the list of genes that were highlighted either in gene-based analysis and/or had both eQTL and chromatin interaction data supporting their candidacy, we performed a tissue specificity and pathway enrichment analysis with the GENE2FUNC option implemented in FUMA (21). Enrichment test of differentially expressed genes (DEGs) across GTEx v7 30 tissue types (see Materials and methods) showed significantly higher expression of prioritized genes in female reproductive tissues: uterus (Bonferroni corrected P-value; P Bon = 0.047), cervix uteri (P Bon = 0.048) and ovary (P Bon = 0.050; Supplementary Material, Fig. 4 and Table 8). Prioritized genes were also overrepresented in hormone activity-related pathways [for example, GO hormone activity FDR = 7.6 × 10 -7 , KEGG GnRH signaling pathway FDR = 1.5 × 10 -3 , WikiPathways (24) ovarian infertility genes FDR = 7.5 × 10 -5 (Supplementary Material, Table 9)]. Tissue and cell-type enrichment analysis with DEPICT (25) revealed no significant enrichments.
Using GREAT (26) we found that genes within the five significant menstrual cycle length GWAS loci are enriched for uterus and circulating hormone level-related mouse phenotypes (Supplementary Material, Table 10) and further highlighted an enrichment at these loci for 'genes involved in hormone ligandbinding receptors' (P FDR = 1.3 × 10 -2 ; Supplementary Material, Table 11). Reviewing the MGI mouse phenotype database (27) showed that mouse knockouts of Fshb, Nr5a2, Gnrh1 and Pgr all present with female reproductive phenotypes (Supplementary Material, Table 12), including altered estrous cycle length or abnormal ovulation for Fshb, Gnrh1 and Pgr (progesterone receptor) (Supplementary Material, Table 13). Nr5a2 (nuclear receptor subfamily 5, group A, member 2) is linked to reduced fertility, primarily by reduced circulating progesterone levels in Nr5a2+/-female mice (28).
The presence of female reproductive phenotypes in mice with altered expression of Fshb, Nr5a2, Gnrh1 and Pgr provides evidence that these genes may be causal and could explain, at least in part, the mediating mechanisms underlying four of the five significant loci associated with menstrual cycle length. Further experimental validation will be necessary to fully unravel the mechanism of these non-coding associations.

Discussion
This large-scale GWAS meta-analysis reveals several novel insights into the genetic control of menstrual cycle length and provides evidence of the genetic underpinnings of the epidemiological associations between menstrual cycle length and other traits. Understanding the genetics regulating normal menstrual cycle variation is vital for figuring out the mechanisms leading to different menstrual cycle-related pathologies. Moreover, genetic control of menstrual cycle and folliculogenesis is of importance for in vitro fertilization treatment, where markers allowing for individualization of treatment protocols are still extensively sought (29).
While some of the results confirm what is already known about the biology of the menstrual cycle (such as the regulatory role of GnRH and FSH in the HPG axis), others point to potentially novel modulators and the role of local control of folliculogenesis. For example, IGF2 has been proposed to be an important local regulator of folliculogenesis (30) as it stimulates estrogen production (31) and modulates the action of FSH and LH, whereas IGF2 expression in turn is regulated by FSH (32). However, to our knowledge no direct link between genetic variation in the INS-IGF2 region and menstrual cycle length had been previously demonstrated. Similarly, while it is known that progesterone is the dominant hormone in the second half of the menstrual cycle, the evidence linking genetic variation in the progesterone signaling pathway with menstrual cycle length was scarce (33,34). SMAD3, highlighted in gene-based analysis, is shown to modulate the proliferation of follicular granulosa cells and also ovarian steroidogenesis (35) and is an essential regulator of FSH signaling in the mouse (36). Recently, genetic variation in SMAD3 was associated with dizygotic twinning (37). However, the obvious candidacy and support for one gene in most of these loci does not exclude the possibility that there might be additional genes and/or functional sequence in these loci that contribute to menstrual cycle length.
Analysis of pleiotropy between menstrual cycle lengthassociated variants and GWAS signals of other traits confirmed the central role of the FSHB locus, which is involved in regulating the reproductive lifespan from menarche to menopause and is also associated with gynecological diseases such as PCOS and endometriosis and with menstrual cycle disturbances. Additionally, we found nominally significant associations with some of the reported PCOS susceptibility loci (15)(16)(17)(18)(19), which might help understand how these loci are involved in PCOS pathogenesis. However, it should be emphasized that women with self-reported irregular menstrual cycles (a hallmark characteristic of PCOS) were excluded from the analyses, potentially limiting the overlap.
While there is epidemiological evidence that shorter menstrual cycles are associated with earlier age at menopause (38), we did not observe a significant overlap on a genetic level, as these traits did not show a significant genetic correlation. At the same time, the FSHB locus is significantly associated with both menstrual cycle length and age at menopause (10,39), indicating that this locus is probably largely driving the observed phenotypic correlation between menstrual cycle length and age at menopause in the literature.
Our study has a number of limitations. First, only selfreported data were available for menstrual cycle length, which might be difficult to accurately recall. Second, the UKBB includes women >40 years, some of whom are approaching menopause and might therefore have more irregular and shorter cycles (40), characteristic to the perimenopause. Therefore, a certain effect of the perimenopause on the effect sizes observed in UKBB cannot be ruled out, especially for the FSHB locus, where we observed significant heterogeneity in the effect estimates for the two cohorts. Also, participants in the UKBB were asked about their current cycle length, whereas EGCUT participants were asked to report their cycle length at the age of 25-35 years, where it is believed to be most regular (40). Although it is possible that the effect estimates from these two cohorts may not be directly comparable, we observe consistency in effect direction and magnitude. Third, we cannot rule out the possibility that some women in our sample have reported their cycle length during use of hormonal contraceptives or others hormones, which affect menstrual cycle length. Finally, while our sample size is the largest to date, it may still be underpowered to detect further associations.
In conclusion, the largest menstrual cycle length GWAS meta-analysis to date not only confirms the role of key players in the HPG axis in the genetic regulation of menstrual cycle length (GNRH1, FSHB and PGR) but also pinpoints novel genes with a potential local regulatory role (such as IGF2/INS-IGF2 and NR5A2). Our analysis also highlights the central role of the FSHB locus in female reproductive health and provides evidence that the systemic determinants of normal menstrual cycle length (FSHB) are also associated with menstrual cyclerelated pathologies, such as excessive, frequent and irregular menstruation. However, the loci identified as significant in our analysis represent a small fraction of the SNP-heritability for menstrual cycle length, warranting additional larger metaanalysis efforts to further uncover the remaining genetic underpinnings of menstrual cycle length. Additionally, we believe the current exploratory analysis forms a good basis for further similar studies with more refined research questions, such as the role of the identified variants in regulating cycle length at different stages of a woman's life.

Data availability
Summary statistics of single-marker analyses are available at http://www.geenivaramu.ee/tools/Cycle length Laisk et al 2018.gz.

Study cohorts
The current meta-analysis included a total of 44 871 women of European ancestry from two cohorts. We used the data of the UKBB, a population-based biobank comprising 502 637 people (aged 37-73 years) recruited from across the UK during 2006-2010, who have filled out detailed medical history questionnaires (41). Menstrual cycle length information was derived from data field 3710 'Length of menstrual cycle'. Participants were asked 'How many days is your usual menstrual cycle? (The number of days between each menstrual period)'. This question was asked of women who had indicated they were not menopausal and still had menstrual periods in their answer to data field 2724 ['Have you had your menopause (periods stopped)?']. The phenotype was transformed according to the default PHESANT pipeline (42), whereby the integer phenotype is split into three ordered bins if a single value represents >20% of all respondents answers. As a result, length of menstrual cycle was split into <26, 26-28 and ≥28 days. All answers corresponding to 'Irregular cycle', 'Do not know' and 'Prefer not to answer' were coded as NA. As a result, each bin included We also included data from the Estonian Biobank (EGCUT), a population-based biobank with 51 515 participants of European ancestry (43). In EGCUT, women >35 years were asked about their menstrual cycle length using the question 'Approximately how long was your menstrual cycle when you were between 25 and 35 years old?', with the following choices: 'I don't know', 'I have not had any menstrual cycles', 'Irregular', '20 days or less', '21-24 days', '25-29 days', '30-135 days' or 'more than 35 days'. To follow a similar structure as with the UKBB data, the answers were regrouped into three bins: <25, 25-29 and

GWAS and meta-analysis
In the UKBB data set, quality control and association testing were carried out as described in https://github.com/Nealelab/UK Biobank GWAS. In brief, samples were filtered for white British genetic ancestry, related individuals, individuals with sex chromosome aneuploidies and individuals who had withdrawn their participation in the UKBB. The analysis included SNPs imputed to the Haplotype Reference Consortium (HRC) reference panel, and additional filters included minor allele frequency (MAF) > 0.1%, Hardy-Weinberg equilibrium (HWE) P > 1 × 10 -10 and imputation INFO score > 0.8. Association testing was carried out using linear regression implemented in HAIL (https:// github.com/hail-is/hail), adjusting for the first 10 principal components (PCs).
Before meta-analysis, results from individual cohorts underwent central quality control with EasyQC (44), checking for allele frequency against the HRC reference and filtering out variants with a MAF < 1% and INFO score < 0.4. The results from individual cohorts were meta-analyzed with METAL (45) using samplesize weighted P-value-based meta-analysis with genomic control correction. The meta-analysis included 9 344 826 markers, and those with a P < 5 × 10 -8 were considered genome-wide significant.
To convert the effects obtained from the linear regression of binned trait to a standardized scale, we calculated the mean and variance of the 0, 1 and 2 binned menstrual cycle length phenotype and divided the effect estimates from linear regression with calculated standard deviation of the binned phenotype.

Gene-based testing
Gene-based genome-wide association analysis was carried out with MAGMA 1.6 (14) with default settings implemented in FUMA (21). Briefly, variants were assigned to protein-coding genes (n = 18 297; Ensembl build 85) if they are located in the gene body, and the resulting SNP P-values are combined into a gene test-statistic using the SNP-wise mean model (14).
According to the number of tested genes, the level of genomewide significance was set at 0.05/18 297 = 2.7 × 10 -6 .

Heritability estimate
The menstrual cycle length GWAS meta-analysis summary statistics and LDSC method (13) were used for heritability estimation. The LD estimates from European ancestry samples in the 1000 Genomes Project were used as a reference.

Functional mapping
Functional annotation was performed using the FUMA platform designed for prioritization, annotation and interpretation of GWAS results (21). As the first step, independent significant SNPs in the GWAS meta-analysis summary statistics were identified based on their P-values (P < 5 × 10 -8 ) and independence from each other (r 2 < 0.6 in the 1000G phase 3 reference) within a 1Mb window. Thereafter, lead SNPs were identified from independent significant SNPs, which are independent of each other (r 2 < 0.1). SNPs that were in LD with the identified independent SNPs (r 2 ≥ 0.6) within a 1Mb window, have a MAF of ≥ 1% and GWAS meta-analysis P-value of >0.05 were selected as candidate SNPs and taken forward for further annotation.
FUMA annotates candidate SNPs in genomic risk loci based on functional consequences on genes Annotate Variation (ANNOVAR) (46), CADD (a continuous score showing how deleterious the SNP is to protein structure/function; scores >12.37 indicate potential pathogenicity) (47) and RegulomeDB scores (ranging from 1 to 7, where lower score indicates greater evidence for having regulatory function) (48), 15 chromatin states from the Roadmap Epigenomics Project (49,50), eQTL data (GTEx v6 and v7) (22), blood eQTL browser (51), BIOS QTL browser (52), BRAINEAC (53), MuTHER (54), xQTLServer (55) and the CommonMind Consortium (23) and 3D chromatin interactions from HI-C experiments of 21 tissues/cell types (56), also embedded in the FUMA platform. Next, genes were mapped using positional mapping, which is based on ANNOVAR annotations and maximum distance between SNPs (default 10 kb) and genes, eQTL mapping and chromatin interaction mapping. Chromatin interaction mapping was performed with significant chromatin interactions (defined as FDR < 1 × 10 -6 ). The two ends of significant chromatin interactions were defined as follows: region 1, a region overlapping with one of the candidate SNPs; and region 2, another end of the significant interaction, used to map to genes based on overlap with a promoter region (250 bp upstream and 50 bp downstream of the transcription start site).

Genetic associations between menstrual cycle length and other traits
The Oxford BIG Server (v2.0; http://big.stats.ox.ac.uk/) was used to query the sentinel variants in each locus against an array of UKBB phenotypes (Supplementary Material, Table 3). Additionally, during the FUMA functional mapping, sentinel SNPs and proximal SNPs in tight LD (r 2 = 0.6) were linked with the GWAS catalog (https://www.ebi.ac.uk/gwas/). Full results of the GWAS catalog query are shown in Supplementary Material, Table 2.
We analyzed genome-wide genetic correlation analyses applying the LDSC method (13) using the LD-Hub resource and 50 selected traits (cardiometabolic, anthropometric, autoimmune, hormone, reproductive, cancer and aging categories). Full results of the LDSC genetic correlation analysis are reported in Supplementary Material, Table 5.

Tissue specificity and gene set enrichment analyses
Tissue and gene set enrichment analyses were carried out with GENE2FUNC implemented in FUMA (21). Genes that were highlighted in MAGMA gene-based analysis or which had functional annotation support from eQTL and chromatin interaction data were used as an input (a total of 14 genes). Using all genes as a background gene set, 2 × 2 enrichment tests were carried out. The GTEx v7 30 general tissue types data set was used for tissue specificity analyses. DEG sets are pre-calculated in the GENE2FUNC by performing two-sided t-test for any one of tissues against all others. For this, expression values were normalized (zero-mean) following a log 2 transformation of expression values (transcripts per million). Genes with P ≤ 0.05 after Bonferroni correction and absolute log fold change ≥0.58 were defined as DEGs in a given tissue compared to others. In addition to general DEG, upregulated and downregulated DEG sets were also precalculated by taking sign of t-statistics into account. Our set of prioritized input genes was tested against each of the DEG sets using a hypergeometric test, where background genes are genes that have average expression value > 1 in at least one of the tissues. Significant enrichment at Bonferroni corrected P ≤ 0.05 are colored in red in Supplementary Material, Figure 4.

Tools used in this paper.
HAIL

Supplementary Material
Supplementary Material is available at HMG online.