E M Schoeman, S Bringans, K Peters, T Casey, C Andronis, L Chen, M Duong, J E Girling, M Healey, B A Boughton, D Ismail, J Ito, C Laming, H Lim, M Mead, M Raju, P Tan, R Lipscombe, S Holdsworth-Carson, P A W Rogers, Identification of plasma protein biomarkers for endometriosis and the development of statistical models for disease diagnosis, Human Reproduction, Volume 40, Issue 2, February 2025, Pages 270–279, https://doi.org/10.1093/humrep/deae278
Abstract
Can a panel of plasma protein biomarkers be identified to accurately and specifically diagnose endometriosis?
A novel panel of 10 plasma protein biomarkers was identified and validated, demonstrating strong predictive accuracy for the diagnosis of endometriosis.
Endometriosis poses intricate medical challenges for affected individuals and their physicians, yet diagnosis currently takes an average of 7 years and normally requires invasive laparoscopy. Consequently, the need for a simple, accurate non-invasive diagnostic tool is paramount.
This study compared 805 participants across two independent clinical populations, with the status of all endometriosis and symptomatic control samples confirmed by laparoscopy. A proteomics workflow was used to identify and validate plasma protein biomarkers for the diagnosis of endometriosis.
A proteomics discovery experiment identified candidate biomarkers before a targeted mass spectrometry assay was developed and used to compare plasma samples from 464 endometriosis cases, 153 general population controls, and 132 symptomatic controls. Three multivariate models were developed: Model 1 (logistic regression) for endometriosis cases versus general population controls, Model 2 (logistic regression) for rASRM stage II to IV (mild to severe) endometriosis cases versus symptomatic controls, and Model 3 (random forest) for stage IV (severe) endometriosis cases versus symptomatic controls.
A panel of 10 protein biomarkers was identified across the three models, and these biomarkers added significant value to clinical factors. Model 3 (severe endometriosis vs symptomatic controls) performed best, with an area under the receiver operating characteristic curve (AUC) of 0.997 (95% CI 0.994–1.000). When applied to the remaining dataset, this model also accurately distinguished symptomatic controls from earlier-stage endometriosis (AUCs ≥0.85 for stage I to III endometriosis). Model 1 also demonstrated strong predictive performance, with an AUC of 0.993 (95% CI 0.988–0.998), while Model 2 achieved an AUC of 0.729 (95% CI 0.676–0.783).
The study participants were mostly of European ethnicity, and the results may be biased by undiagnosed endometriosis in controls. Further analysis is required to establish whether the findings generalize to other populations and settings.
In combination, these plasma protein biomarkers and resulting diagnostic models represent a potential new tool for the non-invasive diagnosis of endometriosis.
Subject recruitment at The Royal Women’s Hospital, Melbourne, was supported in part by funding from the Australian National Health and Medical Research Council (NHMRC) project grants GNT1105321 and GNT1026033 and Australian Medical Research Future Fund grant no. MRF1199715 (P.A.W.R., S.H.-C., and M.H.). Proteomics International has filed patent WO 2021/184060 A1 that relates to endometriosis biomarkers described in this manuscript; S.B., R.L., and T.C. declare an interest in this patent. J.I., S.B., C.L., D.I., H.L., K.P., M.D., M.M., M.R., P.T., R.L., and T.C. are shareholders in Proteomics International. Otherwise, the authors have no conflicts of interest.
N/A.
Introduction
Endometriosis is a chronic and progressive inflammatory disease characterized by the presence of estrogen-dependent endometrial-like tissue (or lesions) outside the uterus. Its symptoms include persistent pelvic pain and infertility. Endometriosis occurs in ∼11% of women and girls of reproductive age (Rowlands et al., 2021) and has been observed in 35% of women using assisted reproductive technology procedures (Moss et al., 2021). Disease severity does not always correlate with symptom severity, leading to diagnostic challenges and limited treatment options. In addition to the acute and chronic symptoms associated with the condition, endometriosis has been linked to long-term negative health consequences, including a higher risk of cardiovascular disease, ovarian cancer, and autoimmune diseases (Eisenberg et al., 2022; Okoli et al., 2023).
Based on the location and depth of lesions, the main types of endometriosis are superficial (peritoneal or other sites), deep endometriosis (DE), and ovarian (endometrioma). Disease stage is most commonly classified using the revised American Society for Reproductive Medicine (rASRM) system, which considers the location, extent, and depth of lesions, as well as adhesions, all visualized at surgery (American Society for Reproductive Medicine, 1997). Despite its recognition for over a century, the exact cause of endometriosis remains elusive, resulting in delays in diagnosis and treatment. Diagnosis takes an average of 7 years from symptom onset, with negative impacts on physical, mental, and social well-being (Ellis et al., 2022). Endometriosis also imposes a substantial economic burden due to productivity losses and healthcare costs (Ellis et al., 2022).
Diagnosis involves medical history, physical examination, imaging, laparoscopy, and histopathology. The reliability of current diagnostic tools varies with factors such as the location and severity of lesions and the experience of the healthcare provider (Zondervan et al., 2020). The gold standard for diagnosing endometriosis is laparoscopy, but the procedure is invasive and costly and carries risks of adverse events such as nerve damage, damage to pelvic organs or major blood vessels, and formation of post-surgical adhesions (Nisenblat et al., 2016a). Imaging techniques such as transvaginal ultrasound and magnetic resonance imaging can identify ovarian endometriosis and DE; nonetheless, non-invasive identification of endometriosis, particularly superficial disease, continues to pose a challenge (Nisenblat et al., 2016a).
Non-invasive diagnostic biomarkers would significantly improve early detection and management of endometriosis. Several potential blood biomarkers have been proposed; however, studies to date have been limited by small cohorts or a lack of validation studies (Nisenblat et al., 2016b; Lipscombe et al., 2021). This study aimed to identify and validate plasma protein biomarkers specific to endometriosis using a proteomics-based approach comprising discovery, analytical validation, and clinical validation phases. The study hypothesized that people with endometriosis would have significantly different plasma concentrations of select proteins compared with the general population, or with those who have similar pelvic symptoms but no endometriosis, and that such plasma biomarkers could be used for early diagnostic screening.
Materials and methods
Ethical approval
Recruitment and collection protocols were approved by the appropriate ethical review boards (Bellberry Human Research Ethics Committee (ref: 2016-05-383); Royal Women’s Hospital Human Research Ethics Committee (Project No. 10-43, No. 11-24, and No. 16-43)), and all participants provided informed written consent.
Clinical and reference samples
Discovery phase
The discovery phase included samples from 22 endometriosis cases, 15 symptomatic controls, and 19 general population controls obtained from the Wesley Medical Research Institute Biobank (Brisbane, Australia). Samples were pooled within each clinical group, and protein expression was compared between groups to identify differentially expressed proteins. All endometriosis and symptomatic control samples had their status confirmed by laparoscopy.
Analytical validation phase
Pooled reference plasma (from three donors; Australian Red Cross Lifeblood) was used to design targeted assays for each biomarker peptide identified in the discovery phase and to test the robustness of those measurements, providing analytical validation of the reproducibility of the identified biomarkers.
Clinical validation phase
To clinically validate the candidate protein biomarkers identified in the discovery phase, each protein was measured in individual clinical samples from a separate cohort. Samples comprised those of endometriosis cases (n = 464, diagnosed via laparoscopy and confirmed with histopathology) and symptomatic controls without endometriosis (n = 132, confirmed with laparoscopy), obtained from the Royal Women’s Hospital (RWH) (Melbourne, Australia). In addition, general population control samples (n = 153) were obtained from healthy volunteers in the Perth metropolitan area.
All RWH participants attended the Endometriosis and Pelvic Pain Clinic with pelvic, menstrual, and/or intercourse pain and underwent laparoscopy for treatment of endometriosis or suspected endometriosis with histopathology to confirm the presence or absence of endometriosis. Endometriosis severity was classified by the rASRM score including stage I/minimal (1–5), stage II/mild (6–15), stage III/moderate (16–40), and stage IV/severe (>40) (American Society for Reproductive Medicine, 1997). Exclusion criteria included menopause, positive pregnancy test or unknown pregnancy status, and malignancy. Comprehensive demographic and clinical information including age, BMI, age at menarche, gravidity, live births, problems conceiving, type of pelvic pain (menstrual/pelvic/intercourse), menstrual cycle length, smoking status, exogenous hormone medication use, family history of endometriosis, and ethnicity was available. All general population controls answered a comprehensive survey to exclude possible symptomatic endometriosis and other gynecological pathologies.
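The rASRM cut-offs listed above translate directly into a staging rule. The following minimal sketch shows the mapping; the function name `rasrm_stage` is hypothetical, and treating scores below 1 as outside the staging scheme is our assumption.

```python
def rasrm_stage(score: int) -> str:
    """Map a revised ASRM point score to a disease stage (hypothetical helper).

    Bands follow the text: I/minimal 1-5, II/mild 6-15,
    III/moderate 16-40, IV/severe >40.
    """
    if score < 1:
        # Assumption: scores below 1 are not staged.
        raise ValueError("rASRM staging applies to scores of 1 or more")
    if score <= 5:
        return "I (minimal)"
    if score <= 15:
        return "II (mild)"
    if score <= 40:
        return "III (moderate)"
    return "IV (severe)"


for s in (3, 15, 16, 41):
    print(s, rasrm_stage(s))
```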
Sample collection
In all cases, whole blood was collected in EDTA-treated vacutainers (Becton Dickinson, USA), and plasma was prepared by centrifugation at 1500 × g for 10 min at 4°C. Plasma samples were stored at −80°C until biomarker analysis. For endometriosis cases and symptomatic controls (Wesley and RWH participants), plasma was collected on the day of admission for surgery. For all cohorts, the median time from collection to plasma processing was within one day, with samples stored at 4°C between collection and centrifugation. The Wesley samples were collected between 2010 and 2017, the RWH samples between 2012 and 2022, and the general population controls between 2021 and 2022.
Participant characteristics
The clinical and demographic characteristics of participants in the discovery and clinical validation cohorts are presented in Table 1. Associations between clinical variables and outcome (endometriosis vs symptomatic controls or general population) were tested using the chi-square test of independence for categorical clinical variables and the point-biserial correlation test for continuous clinical variables.
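Both tests named above can be sketched in a few lines. This is a minimal illustration with hypothetical function names that computes only the test statistics. The first is the Pearson chi-square statistic, computed without a continuity correction (an assumption, since the text does not say). The second is the point-biserial correlation, which equals the Pearson correlation between a continuous variable and a 0/1 outcome. P-values would then come from a chi-square distribution with (rows − 1)(columns − 1) degrees of freedom and, for the correlation, from t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom.

```python
import math
from statistics import mean, pstdev


def chi_square_independence(table: list) -> tuple:
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def point_biserial(values: list, groups: list) -> float:
    """Point-biserial correlation between a continuous variable and a 0/1 outcome."""
    ones = [v for v, g in zip(values, groups) if g == 1]
    zeros = [v for v, g in zip(values, groups) if g == 0]
    p = len(ones) / len(values)
    # (mean difference / population SD) scaled by sqrt(p * q)
    return (mean(ones) - mean(zeros)) / pstdev(values) * math.sqrt(p * (1 - p))


print(chi_square_independence([[10, 20], [20, 10]]))
print(point_biserial([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```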
Discovery cohort (N = 56)

| | Endometriosis (N = 22) | Symptomatic controls (N = 15) | General population (N = 19) |
|---|---|---|---|
| Age (years) (mean ± SD) | 35 ± 6 | 35 ± 7 | 36 ± 8 |
Clinical validation cohort (N = 749)

| | Endometriosis (N = 464) | Symptomatic controls (N = 132) | P-value (a) | General population (N = 153) | P-value (b) |
|---|---|---|---|---|---|
| Age (years) (mean ± SD) | 30 ± 7 | 29 ± 8 | 0.051 | 28 ± 9 | 0.006 |
| BMI (kg/m²) (mean ± SD) | 25 ± 5 | 26 ± 6 | 0.006 | 25 ± 6 | 0.594 |
| Smoking status (% current or past) | 41 | 37 | 0.771 | 30 | 0.001 |
| Age at menarche (years) (mean ± SD) | 13 ± 2 | 13 ± 2 | 0.421 | 13 ± 2 | 0.401 |
| Family history of endometriosis (%) | 27 | 25 | 0.571 | 3 | <0.001 |
| Pain (% menstrual/pelvic/intercourse) | 93/80/75 | 94/85/82 | 0.608 | 0/0/0 | <0.001 |
| Cycle length (% 14–20/21–27/28/29+ days/other*) | 33/8/29/21/9 | 45/11/20/16/8 | 0.071 | 0/19/50/31/0 | <0.001 |
| Cycle stage (% menses/prolif/secret/unknown) | 4/25/23/48 | 2/22/23/53 | 0.447 | 9/44/39/8† | <0.001 |
| Exogenous hormone medication (% oral/IUD/depo inj) | 32/8/2 | 36/12/2 | 0.376 | 24/7/1 | 0.006 |
| Gravidity (% 0/1/2/3+) | 68/16/10/6 | 63/10/9/18 | <0.001 | 73/9/8/10 | 0.709 |
| Live births (% 0/1/2/3+) | 81/10/7/2 | 71/11/9/9 | 0.001 | 81/5/12/2 | 0.270 |
| Problems conceiving (% yes/no/not tried/unknown) | 18/22/50/10 | 12/23/52/13 | 0.440 | N/A | |
| Ethnicity (% AS/AMR/AFR/EUR/Other/Unknown) | 11/1/1/77/4/6 | 5/1/0/85/8/1 | 0.148 | 17/5/1/62/6/9 | <0.001 |
*Cycle length other: unknown, unsure, or not cycling. †Calculated from self-reported data. Menses, menstruation; prolif, proliferative phase; secret, secretory phase; IUD, intrauterine device; depo inj, depot injection; AS, Asian; AMR, American; AFR, African; EUR, European; Other, mixed.
P-value (a), endometriosis versus symptomatic controls; P-value (b), endometriosis versus general population. N/A, data on pregnancy problems were not available for the general population cohort.
Proteomics workflow
Discovery phase
This study analyzed plasma protein biomarkers using a proteomics workflow as previously described (Bringans et al., 2017). In brief, quantitative biomarker discovery (iTRAQ labeling) was performed in quadruplicate experiments on pooled samples from the three groups: endometriosis cases, symptomatic controls, and general population. Each experiment involved immunodepletion of the pooled plasma sample to remove the 14 most abundant proteins. The immunodepleted fraction was then diafiltrated before reduction, alkylation, and enzymatic digestion with trypsin. The resulting peptide solutions were labeled with iTRAQ reagents (Sciex, USA) and mixed 1:1:1 for the three groups of pooled plasma. Desalted samples were then fractionated on a high-pH HPLC system, and the resulting 12 fractions were injected onto an LC-MS system for analysis on a QE-HF Orbitrap (Thermo Fisher Scientific, USA) mass spectrometer.
Proteins that were differentially expressed (P ≤ 0.05 and a relative ratio change of >10%) between the endometriosis group and either the symptomatic control or general population group were designated as candidate biomarkers if the difference was significant across the experiments. Twelve putative biomarkers previously reported in the literature as associated with endometriosis were added to this list (see Fig. 1 and Supplementary Table S1).
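The candidate-selection rule can be made concrete as a filter. This sketch rests on two assumptions the text leaves open. First, a ">10% relative ratio change" is taken to mean a ratio above 1.10 or below 1/1.10, so that up- and downregulation are treated symmetrically. Second, "significant across the experiments" is taken to mean that every replicate experiment passes with a consistent direction of change. The `Measurement` structure and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One protein's result in one iTRAQ experiment (hypothetical structure)."""
    ratio: float    # endometriosis / control relative abundance
    p_value: float


def passes(m: Measurement, alpha: float = 0.05, min_change: float = 0.10) -> bool:
    """P <= alpha and a >10% change, treating up- and downregulation symmetrically."""
    return m.p_value <= alpha and (m.ratio > 1 + min_change or m.ratio < 1 / (1 + min_change))


def is_candidate(per_comparison: dict) -> bool:
    """Candidate if, for at least one control group, every replicate experiment
    passes and all replicates change in the same direction (assumed rule).

    per_comparison maps a control-group name to a list of Measurement objects.
    """
    for replicates in per_comparison.values():
        if replicates and all(passes(m) for m in replicates):
            if len({m.ratio > 1 for m in replicates}) == 1:
                return True
    return False


up = [Measurement(1.2, 0.01)] * 4
print(is_candidate({"symptomatic": up, "general": []}))
```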

Flow diagram of biomarker identification proteomics workflow. MRM, multiple reaction monitoring.
Analytical validation phase
For analytical validation, targeted mass spectrometry assays using multiple reaction monitoring (MRM) were defined for each candidate protein biomarker as described in Bringans et al. (2017). Each assay measured relative peptide abundances in individual plasma samples against an ¹⁸O-labeled reference plasma to calculate peak area ratios for each candidate biomarker, and these ratios were normalized to the median value for each peptide. In brief, each plasma sample was immunodepleted (removal of the 14 most abundant proteins) before diafiltration, reduction, alkylation, and digestion of the plasma proteins. The reversed-phase desalted sample was then injected, together with a fixed amount of the ¹⁸O-labeled reference plasma digest as an internal standard, onto a microflow (5 µl/min) HPLC system and analyzed on a Sciex 6500 Triple Quad mass spectrometer (Sciex, USA). Assays were assessed for robustness, and analytical validation was considered successful if the MRM signal for each peptide was individually verified to be unique and the signal-to-noise ratio (S/N) was >3.
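The quantification steps above are sketched below: the ratio of the unlabeled (light) peak area to the co-injected ¹⁸O-labeled (heavy) peak area, per-peptide median normalization, and the S/N > 3 acceptance check. The function names are hypothetical. Normalizing across all samples of a run at once is our assumption, because the text does not say whether normalization was done per plate or per batch.

```python
from statistics import median


def peak_area_ratios(light: list, heavy: list) -> list:
    """Unlabeled (light) peak area over the 18O-labeled (heavy) reference
    peak area, one value per sample, for a single peptide."""
    return [l / h for l, h in zip(light, heavy)]


def median_normalize(ratios: list) -> list:
    """Scale one peptide's ratios so that their median across samples is 1."""
    m = median(ratios)
    return [r / m for r in ratios]


def passes_signal_to_noise(signal: float, noise: float, threshold: float = 3.0) -> bool:
    """Acceptance criterion from the text: S/N must exceed 3."""
    return signal / noise > threshold


ratios = peak_area_ratios([2.0, 4.0, 6.0], [2.0, 2.0, 2.0])
print(median_normalize(ratios))
```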
Clinical validation phase
In clinical validation, a new cohort of individual samples (n = 464 endometriosis cases, n = 132 symptomatic controls, and n = 153 general population controls) was measured using the analytically validated targeted MRM mass spectrometry assay. Samples were randomized across plates before analysis to minimize batch effects and ensure consistency. Mass spectrometry data were analyzed in Skyline software (University of Washington, USA): both unlabeled and ¹⁸O-labeled peptide peaks were integrated, and the peak areas were exported to calculate the relative peak area ratios.
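The text does not describe the randomization scheme. One common approach is stratified (blocked) assignment, shown here as an assumption rather than as the authors' procedure. Samples are shuffled within each clinical group and dealt round-robin across plates, so that each plate carries a similar group mix, and the run order within each plate is then shuffled. Function and argument names are hypothetical.

```python
import random
from collections import defaultdict


def randomize_to_plates(samples: dict, n_plates: int, seed: int = 1) -> dict:
    """Assign sample IDs to plates with a balanced mix of clinical groups.

    `samples` maps sample ID -> clinical group (hypothetical input format).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in sorted(samples.items()):
        by_group[group].append(sample_id)
    plates = {p: [] for p in range(n_plates)}
    slot = 0  # running counter so plate sizes stay balanced across groups
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for sample_id in ids:
            plates[slot % n_plates].append(sample_id)
            slot += 1
    for ids in plates.values():
        rng.shuffle(ids)  # randomize run order within each plate
    return plates


demo = {**{f"E{i}": "endo" for i in range(10)}, **{f"S{i}": "symp" for i in range(6)}}
print(randomize_to_plates(demo, n_plates=4))
```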
Statistical and data analyses
The peptide data presented reflect the relative concentration of a protein biomarker between samples. To maximize the likelihood of identifying biomarkers for the disease, changes in protein concentration were initially assessed at the extremes of the disease spectrum, for example, symptomatic controls versus severe endometriosis or general population controls versus endometriosis. To improve the normality of the data, a natural logarithmic transformation was applied to all measurements. Candidate biomarkers were confirmed in bivariate analysis by two-way comparisons of medians using a Mann–Whitney U-test.
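The bivariate step can be sketched from scratch with the standard library, using the normal approximation with tie and continuity corrections. Library implementations may instead use exact p-values for small samples. Because the Mann–Whitney U-test depends only on ranks, the natural log transform does not change its result; the transform matters for the downstream regression models. Function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x: list, y: list) -> tuple:
    """Two-sided Mann-Whitney U test (normal approximation with tie and
    continuity corrections). Returns (U for x, p-value)."""
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # pairs with x > y (ties count 1/2)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (abs(u1 - mu) - 0.5) / sigma
    p = 2 * (1 - NormalDist().cdf(z))
    return u1, min(p, 1.0)


endo = [1.8, 2.1, 2.5, 3.0]   # hypothetical normalized ratios
ctrl = [0.9, 1.1, 1.2, 1.6]
print(mann_whitney_u([math.log(v) for v in endo], [math.log(v) for v in ctrl]))
```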
To evaluate the diagnostic relationship between clinical characteristics, biomarker concentration, and clinical groups, elastic-net logistic regression modeling was employed (R Statistical Software, v4.2.2; R Core Team, 2021). Clinical variables for inclusion in the models were restricted to age and BMI for reasons of practical usability and accessibility. Repeated or nested cross-validation was performed (glmnet package v4.1-6; caret v6.0-93 (Kuhn, 2008); nestedcv.glmnet package v0.7.4). During nested cross-validation, variables were filtered using a Wilcoxon rank-sum test with a significance threshold of 0.2.
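The elastic-net model minimizes the mean logistic loss plus a penalty λ[(1 − α)/2·‖β‖² + α‖β‖₁], where α = 1 gives the lasso and α = 0 gives ridge regression. glmnet fits this by cyclic coordinate descent along a path of λ values. For brevity, the sketch below swaps in plain proximal gradient descent and fits a single (λ, α) pair, which in the study would be chosen by (nested) cross-validation. Features are assumed to be standardized, as glmnet does by default. All names and the example data are hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |z|, the lasso part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net_logistic(X, y, lam, alpha=0.5, step=0.1, iters=5000):
    """Minimize mean logistic loss + lam * ((1 - alpha)/2 * ||beta||^2 + alpha * ||beta||_1)
    by proximal gradient descent; the intercept b0 is not penalized.

    X is a list of rows of standardized features; y is a list of 0/1 labels.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        g0, grad = 0.0, [0.0] * p
        for row, target in zip(X, y):
            resid = sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - target
            g0 += resid / n
            for j in range(p):
                grad[j] += resid * row[j] / n
        b0 -= step * g0
        # Gradient step on loss + ridge term, then soft-threshold for the lasso term.
        beta = [
            soft_threshold(b - step * (g + lam * (1 - alpha) * b), step * lam * alpha)
            for b, g in zip(beta, grad)
        ]
    return b0, beta


X = [[-1.5, 0.2], [-1.0, -0.4], [-0.5, 0.5], [0.5, -0.5], [1.0, 0.3], [1.5, -0.1]]
y = [0, 0, 1, 0, 1, 1]
print(fit_elastic_net_logistic(X, y, lam=0.05))
```

With a large λ every coefficient is shrunk exactly to zero, which is how the lasso component performs variable selection.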
A series of multivariate logistic regression models containing both clinical factors and biomarker concentrations were developed to distinguish: (i) endometriosis cases from general population controls and (ii) endometriosis cases (stages II–IV) from symptomatic controls. To further evaluate the complex interactions and non-linear relationships between predictors, a random forest classifier was employed using the predictors identified during elastic-net logistic regression modeling. This third model was constructed by comparing stage IV endometriosis and symptomatic controls. The performance of Model 3 was then tested across the stages of endometriosis (stages I–IV, i.e. minimal to severe) to assess its effectiveness in diagnosing endometriosis at different disease levels. The randomForest package v4.6-14 (Liaw and Wiener, 2002) was used with 5-fold cross-validation and hyper-parameter tuning (mtry = 2, 3, 4, ntree = 100). Only participants with complete data were included in each model.
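The random forest itself was fitted with the R randomForest package. The Python sketch below shows only the tuning logic described in the text, namely 5-fold cross-validation over mtry ∈ {2, 3, 4}. Stratified folds (so that each fold keeps a similar case/control ratio) and AUC as the tuning metric are assumptions. `fit_and_score` is a hypothetical callback that stands in for training a 100-tree forest on the training indices and scoring it on the held-out fold.

```python
import random
from typing import Callable, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> list:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0  # keeps running across classes so fold sizes stay balanced
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[slot % k].append(i)
            slot += 1
    return folds


def tune_mtry(labels, fit_and_score: Callable, grid=(2, 3, 4), k=5, seed=0):
    """Return (best mtry, {mtry: mean cross-validated score})."""
    folds = stratified_folds(labels, k, seed)
    everyone = set(range(len(labels)))
    scores = {}
    for mtry in grid:
        fold_scores = []
        for test_idx in folds:
            train_idx = sorted(everyone - set(test_idx))  # never overlaps the test fold
            fold_scores.append(fit_and_score(train_idx, test_idx, mtry))
        scores[mtry] = sum(fold_scores) / len(folds)
    best = max(scores, key=scores.get)
    return best, scores
```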
To assess the discriminative performance of each model, the area under the receiver operating characteristic curve (AUC) was calculated. DeLong’s test was used to compare the AUCs of biomarker models with and without clinical variables. The optimal predicted probability threshold was determined at the maximum Youden index, and diagnostic performance metrics were computed at this threshold: sensitivity (Sn), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV).
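AUC, the Youden-optimal threshold, and the four metrics can be computed directly from predicted probabilities. This sketch uses the rank (Mann–Whitney) form of the AUC and classifies a participant as a case when the predicted probability is at or above the threshold; the function names and that tie-handling rule are assumptions, and DeLong's test is not shown. PPV and NPV depend on the case/control mix in the sample, so values computed in a case-control design may not transfer to a clinical population with a different prevalence.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case scores higher than a random
    control, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(scores, labels, t):
    """Sn, Sp, PPV, and NPV when 'case' means score >= t."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    return {
        "Sn": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
    }


def youden_threshold(scores, labels):
    """Threshold maximizing Youden's J = Sn + Sp - 1 over observed scores."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        m = metrics_at(scores, labels, t)
        j = m["Sn"] + m["Sp"] - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


probs, truth = [0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]
t, j = youden_threshold(probs, truth)
print(roc_auc(probs, truth), t, j, metrics_at(probs, truth, t))
```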
A power analysis was conducted to assess the study’s power for subgroup analysis across the stages of endometriosis, using the pwr package version 1.3-0 (Champely, 2020) in R. The analysis used the subgroup sample sizes (stage I: n = 241, stage II: n = 65, stage III: n = 58, and stage IV: n = 89; only participants with complete data were included), an effect size of 0.5 (Cohen’s d), a significance level of 0.05, and a target statistical power of 0.8.
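The reported power values appear consistent with a two-sided two-sample t-test power calculation of the kind performed by `pwr.t.test` in the pwr package, if each subgroup's size is used as the size of both groups. That interpretation is our assumption; the text does not name the test. The sketch below computes noncentral-t power with the standard library only, writing T = (Z + δ)/S with S = √(V/df) and integrating numerically over the chi-square distribution of V. With SciPy available, `scipy.stats.nct` would be the usual shortcut. Function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def chi_grid(df: int, steps: int = 2000) -> list:
    """Simpson's-rule nodes (s, weight) such that sum(w * g(s)) approximates
    E[g(S)] for S = sqrt(V / df), V ~ chi-square(df). `steps` must be even."""
    k = df / 2.0
    log_norm = k * math.log(2.0) + math.lgamma(k)
    lo, hi = 1e-9, df + 20.0 * math.sqrt(2.0 * df)
    h = (hi - lo) / steps
    grid = []
    for i in range(steps + 1):
        v = lo + i * h
        simpson = 1 if i in (0, steps) else (4 if i % 2 else 2)
        density = math.exp((k - 1.0) * math.log(v) - v / 2.0 - log_norm)
        grid.append((math.sqrt(v / df), simpson * density * h / 3.0))
    return grid


def t_critical(upper_tail: float, grid: list) -> float:
    """Central-t critical value c with P(T > c) = upper_tail, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if sum(w * PHI(-mid * s) for s, w in grid) > upper_tail:
            lo = mid  # tail still too heavy: move the cut-off right
        else:
            hi = mid
    return (lo + hi) / 2.0


def two_sample_t_power(n_per_group: int, d: float = 0.5, alpha: float = 0.05) -> float:
    """Power of a two-sided two-sample t-test with n_per_group in each arm."""
    df = 2 * (n_per_group - 1)
    ncp = d * math.sqrt(n_per_group / 2.0)  # noncentrality parameter
    grid = chi_grid(df)
    crit = t_critical(alpha / 2.0, grid)
    # P(T > c) = E[Phi(ncp - c*S)] and P(T < -c) = E[Phi(-c*S - ncp)]
    upper = sum(w * PHI(ncp - crit * s) for s, w in grid)
    lower = sum(w * PHI(-crit * s - ncp) for s, w in grid)
    return upper + lower


for stage, n in {"I": 241, "II": 65, "III": 58, "IV": 89}.items():
    print(stage, round(two_sample_t_power(n), 3))
```

Under this assumption the four subgroup sizes give values close to the reported 100%, 80.8%, 76.1%, and 91.3%.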
The interaction pathways of the proteins identified in the diagnostic models were examined (STRING database, v12.0; Szklarczyk et al., 2019) to provide insights into the biological processes and molecular functions associated with these proteins. Only interactions with an interaction score above 0.4 (medium confidence) were included in the predicted network.
Results
Participant demographics and clinical characteristics
Table 1 presents the demographics and clinical characteristics of the participants (n = 805) in both the discovery and clinical validation cohorts. Age was the only characteristic available for the discovery cohort, and no significant difference in age was observed across the clinical groups. In the clinical validation cohort, BMI, gravidity, and live births were significantly different between endometriosis patients and symptomatic controls. Additionally, age, smoking status, family history of endometriosis, pain characteristics, cycle length, cycle stage, exogenous hormone medication use, and ethnicity were significantly different between endometriosis patients and the general population.
The significant difference in cycle stage between endometriosis patients and general population women may be largely explained by the higher proportion of endometriosis patients in the ‘unknown or on hormones’ group. Hormone therapy aimed at inducing amenorrhea is a common management option for symptomatic endometriosis, and its effects on histology do not permit classification into menstrual, proliferative, or secretory phases. It should also be noted that the menstrual phase of general population controls was calculated from self-reported data, whereas that of endometriosis participants was determined by histological assessment by a pathologist.
Biomarker identification
The proteomics discovery experiment identified 48 candidate plasma protein biomarkers that were differentially expressed between endometriosis cases and both symptomatic controls and general population controls (Fig. 1 and Supplementary Table S1).
Targeted mass spectrometry assays were then built for all candidates, and well-defined assays were successfully developed for 39 of these, as well as for the 12 putative biomarkers taken from the literature. Analytical validation was considered successful if acceptable levels of reproducibility and signal-to-noise were achieved.
For the clinical validation phase, 51 protein biomarkers were analyzed. During two-way comparisons using a Mann–Whitney U-test, significant (P ≤ 0.05) differences were observed for 41 of the 51 candidate proteins across one or both clinical group comparisons. Ten protein biomarkers (Table 2) were found to be independently associated with endometriosis after adjusting for age and BMI. These biomarkers were assessed for any correlation with the other available clinical information (e.g. menstrual cycle length), and no significant strong or moderate correlations were observed (maximum correlation coefficient of 0.26).
Protein associations: bivariate analysis (Mann–Whitney U-test) versus multivariate modeling (logistic regression).
| Protein name | Accession number | Bivariate analysis: P-value | Bivariate analysis: median difference (endometriosis vs control) | Multivariate modeling: model | Multivariate modeling: association direction (endometriosis) |
|---|---|---|---|---|---|
| Vitamin K-dependent protein S | P07225 | <0.001 | Decrease | 1 | Negative |
| Hemoglobin subunit beta | P68871 | <0.001 | Increase | 1 | Positive |
| Serum paraoxonase/arylesterase 1 | P27169 | <0.001 | Increase | 1 | Positive |
| Afamin | P43652 | <0.05 | Decrease | 2/3 | Negative |
| Coagulation factor XII | P00748 | <0.05 | Decrease | 2/3 | Negative |
| Complement component C9 | P02748 | 0.417 | Increase | 2/3 | Positive |
| Neuropilin-1 | O14786 | <0.001 | Decrease | 2/3 | Negative |
| Inter-alpha-trypsin inhibitor light chain | P02760 | 0.123 | Decrease | 2/3 | Negative |
| Selenoprotein P | P49908 | <0.05 | Decrease | 2/3 | Positive |
| Proteoglycan 4 | Q92954 | 0.066 | Decrease | 2/3 | Negative |
Model 1, endometriosis versus general population; Model 2, endometriosis (stages II–IV) versus symptomatic controls; Model 3, endometriosis (stage IV) versus symptomatic controls.
Model development and validation
Regression models were developed to discriminate endometriosis cases from the general population (Model 1) or from symptomatic controls (Model 2), as shown in Table 2. A random forest model (Model 3) was subsequently built on the same biomarkers as Model 2 by comparing severe (stage IV) endometriosis with symptomatic controls, and was then applied to all stages of endometriosis. No protein contributed to both Model 1 and Models 2/3. For each model, the predicted probabilities of an endometriosis diagnosis were significantly higher (P < 0.0001) in the endometriosis group than in the general population and symptomatic control groups.
Table 3 and the receiver operating characteristic (ROC) curves in Fig. 2 compare the outcomes predicted by the models against the observed diagnosis of endometriosis and report the performance metrics (AUC, Sn, Sp, PPV, NPV) for each model. In Model 1, three of the 10 protein biomarkers distinguished the two clinical groups with excellent accuracy (AUC = 0.993, 95% CI 0.988–0.998), a significant improvement over age and BMI alone (P < 0.001). In Model 2, age and BMI were significantly and independently associated with endometriosis (stages II–IV) (AUC = 0.649, 95% CI 0.589–0.709). After adjusting for age and BMI, the remaining seven biomarkers provided significant incremental value to Model 2 (AUC = 0.729, 95% CI 0.676–0.783, P < 0.01). The same seven biomarkers demonstrated high diagnostic accuracy in Model 3, with an AUC of 0.997 (95% CI 0.994–1.000) for discriminating stage IV endometriosis from symptomatic controls. Critically for clinical use, Model 3 also performed strongly when applied to all stages of endometriosis (AUC for stage I: 0.852, 95% CI 0.811–0.893; stage II: 0.903, 95% CI 0.853–0.953; stage III: 0.908, 95% CI 0.852–0.965; stage IV: 0.997, 95% CI 0.994–1.000) (Fig. 3). Power analysis indicates that the study is well-powered for subgroup analysis in the stage I, II, and IV endometriosis groups, with power of 100%, 80.8%, and 91.3%, respectively; however, power for the stage III subgroup was below the desired threshold, at 76.1%.
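The AUC values above summarize how well each model ranks endometriosis cases above controls. As a minimal Python sketch, the empirical AUC can be computed as the Mann–Whitney probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, with ties counted as one half. The function name `auc_mann_whitney` and the scores below are hypothetical illustrations, not study data; this section does not state which software or confidence-interval method the authors used.

```python
from itertools import product


def auc_mann_whitney(case_scores, control_scores):
    """Empirical AUC: probability that a random case outscores a random control.

    Every case/control pair is compared; a higher case score counts 1,
    a tie counts 0.5, and the total is divided by the number of pairs.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities (illustration only, not study data).
cases = [0.91, 0.80, 0.62, 0.55]
controls = [0.70, 0.40, 0.30, 0.10]
print(auc_mann_whitney(cases, controls))  # 14 of 16 pairs ranked correctly -> 0.875
```

The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently for large cohorts.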

Fig. 2. Receiver operating characteristic curves for Models 1, 2, and 3. Model 1: all endometriosis n = 443 versus general population controls n = 147; Model 2: endometriosis (stages II–IV) n = 212 versus symptomatic controls n = 130; and Model 3: endometriosis (stage IV) n = 89 versus symptomatic controls n = 130. Only participants with complete data were included in each model.

Fig. 3. Receiver operating characteristic curves for Model 3 (endometriosis (stage IV) versus symptomatic controls) applied to all rASRM stages. Stage I: n = 241, stage II: n = 65, stage III: n = 58, and stage IV: n = 89. Only participants with complete data were included in this modeling.
Table 3. Predicted versus observed diagnosis of endometriosis and performance metrics.
Observed group | Endometriosis (predicted) | No endometriosis (predicted) | Total | AUC | Sn (%) | Sp (%) | PPV (%) | NPV (%)
---|---|---|---|---|---|---|---|---
Model 1 at max Youden Index cut-off | | | | | | | |
General population controls (observed) | 3 | 144 | 147 | 0.993 | 96 | 98 | 99 | 89
Endometriosis (observed) | 426 | 17 | 443 | | | | |
Total | 429 | 161 | 590 | | | | |
Model 2 at max Youden Index cut-off | | | | | | | |
Symptomatic controls (observed) | 43 | 87 | 130 | 0.729 | 73 | 67 | 78 | 60
Stage II–IV endometriosis (observed) | 154 | 58 | 212 | | | | |
Total | 197 | 145 | 342 | | | | |
Model 3 at max Youden Index cut-off | | | | | | | |
Symptomatic controls (observed) | 5 | 125 | 130 | 0.997 | 98 | 96 | 95 | 98
Stage IV endometriosis (observed) | 87 | 2 | 89 | | | | |
Total | 92 | 127 | 219 | | | | |
Model 3 applied to other endometriosis stages using max Youden Index cut-off* | | | | | | | |
Symptomatic controls (observed) | 37 | 93 | 130 | 0.852 | 87 | 72 | 85 | 75
Stage I endometriosis (observed) | 210 | 31 | 241 | | | | |
Total | 247 | 124 | 371 | | | | |
Symptomatic controls (observed) | 21 | 109 | 130 | 0.903 | 82 | 84 | 72 | 90
Stage II endometriosis (observed) | 53 | 12 | 65 | | | | |
Total | 74 | 121 | 195 | | | | |
Symptomatic controls (observed) | 21 | 109 | 130 | 0.908 | 91 | 84 | 72 | 96
Stage III endometriosis (observed) | 53 | 5 | 58 | | | | |
Total | 74 | 114 | 188 | | | | |
Sn, sensitivity; Sp, specificity; PPV, positive predictive value; NPV, negative predictive value at the maximum Youden index. rASRM stage I=minimal, stage II=mild, stage III=moderate, and stage IV=severe endometriosis. Only participants with complete data were included in each model.
*Max Youden Index as defined for each specific group comparison.
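Each performance metric in Table 3 follows directly from the 2×2 counts in the same block. The sketch below, assuming endometriosis is the positive class, recomputes Sn, Sp, PPV, and NPV from the transcribed counts and rounds them to whole percentages as the table does; the `TwoByTwo` class is a hypothetical helper introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # endometriosis observed, endometriosis predicted
    fn: int  # endometriosis observed, no endometriosis predicted
    fp: int  # control observed, endometriosis predicted
    tn: int  # control observed, no endometriosis predicted

    def metrics(self):
        """Return Sn, Sp, PPV and NPV as percentages rounded to whole numbers."""
        return {
            "Sn": round(100 * self.tp / (self.tp + self.fn)),
            "Sp": round(100 * self.tn / (self.tn + self.fp)),
            "PPV": round(100 * self.tp / (self.tp + self.fp)),
            "NPV": round(100 * self.tn / (self.tn + self.fn)),
        }


# Counts transcribed from Table 3.
table3 = {
    "Model 1": TwoByTwo(tp=426, fn=17, fp=3, tn=144),
    "Model 2": TwoByTwo(tp=154, fn=58, fp=43, tn=87),
    "Model 3 (stage IV)": TwoByTwo(tp=87, fn=2, fp=5, tn=125),
    "Model 3 (stage I)": TwoByTwo(tp=210, fn=31, fp=37, tn=93),
    "Model 3 (stage II)": TwoByTwo(tp=53, fn=12, fp=21, tn=109),
    "Model 3 (stage III)": TwoByTwo(tp=53, fn=5, fp=21, tn=109),
}

for name, counts in table3.items():
    print(name, counts.metrics())
```

The recomputed values match the percentages reported in Table 3. Because PPV and NPV are computed from these tables, they reflect the ratio of cases to controls in each comparison rather than disease prevalence in a clinical population.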
Functional enrichment in the network of protein biomarkers for endometriosis
A network analysis of the 10 protein biomarkers associated with endometriosis revealed that most can be broadly categorized into three groups: coagulation cascade, complement system, and protein–lipid complex. Specific associations include: Coagulation factor XII, Complement component C9, and Vitamin K-dependent protein S with the complement and coagulation cascades (P < 0.01); and Afamin and Serum paraoxonase/arylesterase 1 with protein–lipid complex (P < 0.001).
Discussion
This study sought to develop a diagnostic blood test for endometriosis. A proteomics discovery workflow was used to identify and validate a novel panel of plasma protein biomarkers for the disease. Use of a large, clinically well-defined, independent cohort (n = 749) led to the development of three multivariate models, which demonstrated good to excellent performance for distinguishing endometriosis from both the general population and symptomatic controls. The models contained a panel of 10 protein biomarkers, which added significant value to clinical factors. This research contributes to the development of non-invasive diagnostic tools for endometriosis, which could have significant implications for reducing diagnostic delay and for providing screening tools for surveillance of disease recurrence.
The primary objective in developing a diagnostic test for endometriosis was to distinguish endometriosis patients from symptomatic controls. To support this, general population controls were included to allow investigation of the extremes of disease, thereby identifying potential protein biomarkers for the disease. A model distinguishing healthy women from those with endometriosis also has both biological and clinical relevance. Biologically, it supports an understanding of disease pathophysiology; clinically, it is relevant to fertility care, where a 3-fold increased incidence of endometriosis is observed in women undergoing fertility treatment (Moss et al., 2021). Infertility is a common consequence of endometriosis, and a test to rule in or rule out endometriosis could help guide clinical decisions about assisted reproductive treatment. Model 1 (endometriosis vs the general population) demonstrates the biological association of these protein biomarkers with the disease state. Model 2 (stage II–IV endometriosis vs symptomatic controls) extends the utility of the test into a real-world scenario: differentiating the presence of endometriosis (lesions) from symptomatic pelvic pain in the absence of lesions. The inferior performance of Model 2 may reflect symptoms shared between the groups, and the marginal differences between patients with stage II endometriosis and symptomatic patients in whom no endometriosis is observed. Individuals with stage I endometriosis were specifically excluded from Model 2 to reduce this overlap, and further work is required to examine this.
Nonetheless, Model 3 used an alternative statistical approach (random forest) that accommodates complex interactions and non-linear relationships among the Model 2 predictors, and it was built by comparing the extremes of disease (stage IV endometriosis vs symptomatic controls). It demonstrated strong performance in discriminating disease across all stages of endometriosis, suggesting a clear association of the biomarkers with disease state.
Laparoscopy is the gold standard for diagnosing endometriosis, but it is invasive and costly, carries risks, and is not readily accessible to all patients. Of known plasma biomarkers, CA-125 is sometimes used as a single biomarker for endometriosis. However, CA-125 has limited Sn and Sp, and elevated CA-125 levels can occur in multiple conditions, such as ovarian cancer and pelvic inflammatory disease, as well as during menstruation. A recent multicenter study showed that CA-125 differentiated endometriosis from symptomatic controls with Sn 61% at a pre-defined Sp of 60% (Burghaus et al., 2024), with better performance for stage III/IV endometriosis than for stage I/II (AUC 0.795 vs 0.583, respectively). A 2016 systematic review and meta-analysis reported a pooled Sn of 52% and Sp of 93% for CA-125, with significantly higher Sn for stage III/IV than for stage I/II endometriosis (Sn 63% vs 25%, respectively) (Hirsch et al., 2016). CA-125 can be effective for diagnosing stage IV endometriosis, such as cases with dense pelvic adhesions or ovarian endometriomas, but it is less reliable for other forms of endometriosis, and its use may lead to false positives due to the presence of other conditions. Consequently, it is not widely recommended as a diagnostic or screening tool by major guidelines, such as that of ESHRE (Becker et al., 2022).
The diagnostic models distinguishing endometriosis patients from symptomatic controls are particularly relevant for clinical practice. The multivariate biomarker models developed in the present study for this purpose have sensitivities of 73% and 98% and specificities of 67% and 96% for Models 2 and 3, respectively. Importantly, because earlier and more accurate diagnosis is key to improving patient outcomes, the results indicate that Model 3 has potential utility across earlier stages of the disease, with AUCs of ≥0.85 for stage I–III endometriosis. These AUCs compare favorably with those reported for known biomarkers such as CA-125. In the present study, model cut-offs for assessing performance metrics such as Sn and Sp were arbitrarily set at the maximum Youden Index, and further optimization should be considered before use in a clinical setting. By providing a non-invasive method to differentiate endometriosis from other causes of pelvic pain, such tools could help clinicians decide which patients should undergo invasive procedures like laparoscopy and facilitate more targeted and effective treatment plans, enhancing overall patient care.
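The maximum Youden Index cut-off is the threshold that maximizes J = Sn + Sp − 1 across candidate thresholds. A minimal sketch, assuming a sample is called positive when its predicted probability is at or above the threshold and that every observed score is tried as a candidate; `youden_cutoff` and the example scores are hypothetical, not the authors' code or data:

```python
def youden_cutoff(scores, labels):
    """Return (threshold, J, Sn, Sp) maximizing Youden's J = Sn + Sp - 1.

    A sample is called positive when score >= threshold. Labels are 1 for
    endometriosis and 0 for control. Every distinct observed score is tried.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both cases and controls")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sn, sp = tp / n_pos, tn / n_neg
        j = sn + sp - 1
        # Strict ">" keeps the lowest threshold when J is tied.
        if best is None or j > best[1]:
            best = (t, j, sn, sp)
    return best


# Hypothetical scores: four cases followed by four controls.
scores = [0.91, 0.80, 0.62, 0.55, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(youden_cutoff(scores, labels))
```

Here the best threshold is 0.55 (Sn 1.0, Sp 0.75, J 0.75). The chosen cut-off depends on the data and on the tie-breaking rule, which is one reason a threshold selected this way warrants further optimization before clinical use.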
The discrepancies observed between bivariate and multivariate results in Table 2 (Complement component C9, Inter-alpha-trypsin inhibitor light chain, Selenoprotein P, and Proteoglycan 4) may be attributed to two key factors: unmeasured confounders and the suppressor effect. Bivariate analysis assesses the association between two variables without adjusting for other factors. A suppressor variable, unlike a typical confounder, does not directly affect the outcome but is correlated with the predictor, altering the strength or direction of the predictor's association with the outcome. Including a suppressor variable in a multivariate model can therefore reveal the true relationship between the predictor and the outcome (Woolley, 1997).
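To make the suppressor effect concrete, the simulation below uses ordinary least squares on a continuous stand-in outcome rather than the logistic or random forest models of this study, and every variable is hypothetical. Marker `x2` is only weakly and positively related to the outcome on its own, but because it shares nuisance variation `s` with marker `x1`, adjusting for `x1` reverses its sign and greatly increases its magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical latent quantities (simulated, not study data).
t = rng.normal(size=n)              # disease-related signal
s = rng.normal(size=n)              # nuisance variation unrelated to the outcome
y = t + 0.5 * rng.normal(size=n)    # continuous stand-in for the outcome

x1 = t + s          # marker carrying both signal and nuisance
x2 = s + 0.2 * t    # marker dominated by the nuisance term, weakly tied to the signal


def ols(outcome, *columns):
    """Least-squares coefficients (intercept first) of outcome on the given columns."""
    X = np.column_stack([np.ones_like(outcome), *columns])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef


bivariate_slope = ols(y, x2)[1]         # x2 alone
multivariate_slope = ols(y, x1, x2)[2]  # x2 adjusted for x1
print(f"x2 alone: {bivariate_slope:+.2f}; x2 adjusted for x1: {multivariate_slope:+.2f}")
```

Analytically, the bivariate slope for `x2` is Cov(x2, y)/Var(x2) = 0.2/1.04 ≈ 0.19, whereas the adjusted coefficient is −1.25 because t = 1.25(x1 − x2); the simulated estimates should land close to these values. In simplified form, this shows how the direction of a protein's association can differ between bivariate and multivariate analyses, as seen for Selenoprotein P in Table 2.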
Biologically, each of the 10 protein biomarkers identified in the diagnostic models for endometriosis plays a role relevant to disease pathophysiology, including in the coagulation and complement cascades, lipid metabolism, oxidative defense, immune regulation, and tissue homeostasis and morphogenesis. Of the 10 proteins listed in Table 2, only three (Selenoprotein P, Neuropilin-1, and Serum paraoxonase/arylesterase 1) have previously been directly linked with endometriosis, as discussed below.
In the complement cascade, Complement component C9 is required for target cell lysis during complement activation (Noris and Remuzzi, 2013). In Model 2, Complement component C9 has a positive association with endometriosis. Complement dysregulation has been implicated in the pathophysiology of endometriosis (Rahal et al., 2021).
For the coagulation cascade, Coagulation factor XII is crucial for fibrin clot formation (Renne et al., 2012). Similarly, Vitamin K-dependent protein S has a role in regulating coagulation (Weizmann Institute of Science, 2024). In Model 1, Vitamin K-dependent protein S showed a negative association with endometriosis, and Coagulation factor XII likewise showed a negative association in Model 2. A previous small study found no statistically significant difference in blood Vitamin K-dependent protein S levels between endometriosis patients and controls (Pretta et al., 2007). Hemoglobin subunit beta, a component of hemoglobin, contributes indirectly to coagulation through its role in overall blood function (Ross et al., 2024). In Model 1, Hemoglobin subunit beta showed a positive association with endometriosis.
Afamin, Selenoprotein P, and Serum paraoxonase/arylesterase 1 play roles in lipid metabolism and oxidative defense (Dieplinger and Dieplinger, 2015; Shunmoogam et al., 2018; Saito, 2021). Although earlier work found no significant difference in mean serum Afamin concentrations between endometriosis and control groups using ELISA (Seeber et al., 2010), Afamin was negatively associated with endometriosis in Model 2. Selenoprotein P showed a positive association in Model 2, while its bivariate results (a decrease in endometriosis) are consistent with previous findings from a small study (n = 8) that reported downregulated gene expression in tissue samples from patients with endometriosis (Aghajanova et al., 2011). Additionally, Serum paraoxonase/arylesterase 1 was positively associated with endometriosis in Model 1. Interestingly, another study reported reduced Serum paraoxonase/arylesterase 1 activity (not concentration) in women with endometriosis compared with controls (Verit et al., 2008). This apparent discrepancy may arise from assessing different aspects of Serum paraoxonase/arylesterase 1, within distinct biological contexts, or with different measurement methods.
Neuropilin-1, Inter-alpha-trypsin inhibitor light chain, and Proteoglycan 4 each contribute significantly to immune regulation (Das et al., 2019; Lord et al., 2020; Chikh and Raimondi, 2024). Neuropilin-1 also plays important roles in vascular development (Chikh and Raimondi, 2024), while Inter-alpha-trypsin inhibitor light chain contributes to extracellular matrix organization (Lord et al., 2020) and Proteoglycan 4 acts as a boundary lubricant (Das et al., 2019). The negative association observed for Neuropilin-1 in Model 2 contrasts with previous findings, by ELISA, of elevated serum Neuropilin-1 in endometriosis patients compared with healthy controls (Barberic et al., 2020). Both Inter-alpha-trypsin inhibitor light chain and Proteoglycan 4 also showed negative associations with endometriosis in Model 2.
Further investigations are warranted to unravel the precise mechanisms underlying the associations of these protein biomarkers with endometriosis and their potential therapeutic implications. Where the results presented here contrast with the literature, they highlight the challenges of biomarker analysis in small cohorts and across different laboratories. Differences between published studies and the results in this manuscript may also be due to pre-analytical and analytical factors. Varying gene expression profiles across different tissues affected by endometriosis, along with post-transcriptional regulation and translation efficiency, may contribute to the divergent results.
The strengths of this study lie in its robust sample size, independent cohorts, a well-defined clinical validation cohort, and high-performing models using simple components. The clinical variables in the models were deliberately limited to age and BMI because this information can be easily and precisely determined. Correspondingly, excluding other clinical information, such as menstrual cycle stage, exogenous hormone use, or family history of endometriosis, from the models avoids potentially imprecise variables while also ensuring the test can be widely used. The robust sample size enhances statistical power, leading to more reliable conclusions. The use of independent cohorts strengthens the findings by validating the biomarkers across different populations, thereby minimizing bias and increasing generalizability. This study also benefits from a diagnosis grounded in laparoscopy and histopathology, ensuring a more reliable assessment of disease presence or absence. The use of both elastic-net logistic regression and random forest algorithms underscores the robustness and versatility of the models, providing a comprehensive evaluation of the diagnostic potential of the identified biomarkers. The random forest approach confirms the importance of the biomarkers, but may be dataset-specific, and further validation is warranted.
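For reference, elastic-net logistic regression is commonly written as the penalized negative log-likelihood below. This uses the glmnet-style parameterization as an assumption; this section does not report the mixing parameter α, the penalty strength λ, or how they were tuned.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\,y_i\,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta})
  - \log\!\big(1 + e^{\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}}\big)\Big]
+ \lambda\Big[\alpha\,\lVert\boldsymbol{\beta}\rVert_1
  + \tfrac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2\Big]
```

Here y_i = 1 for endometriosis and 0 for control, x_i would hold participant i's predictors (for example, protein measurements alongside age and BMI), λ ≥ 0 sets the overall penalty strength, and α ∈ [0, 1] mixes the lasso (ℓ1) and ridge (ℓ2) terms; the intercept β_0 is not penalized. The ℓ1 term can shrink coefficients exactly to zero, which is one way a compact panel can emerge from a larger candidate list, while the ℓ2 term stabilizes estimates when predictors are correlated.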
The study also has potential limitations. The participants were mostly of European ethnicity, and the study was not powered to detect differences across ethnic groups. The use of minimal clinical variables in the models may reduce diagnostic performance; however, information on more complex clinical variables may not always be available or consistently measured. It is also possible that some general population controls had undiagnosed endometriosis, which could skew the results. Given the nature of the condition, the prevalence of asymptomatic endometriosis in the general population is difficult to ascertain, but it has been reported to be as high as 11% (Buck Louis et al., 2011). Although the study was not specifically designed to stratify patients by stage of endometriosis, it is well-powered for subgroup analysis of stage I, II, and IV endometriosis. The experimental design used matched samples within each cohort; however, the delay between sample collection and processing and differences in sample storage could have affected the biomarker concentrations observed in this study. Further analysis is required to establish the generalizability of the findings to other populations or settings, including stratification of patients by type or stage of endometriosis.
This study represents an advancement toward precise non-invasive endometriosis diagnosis and personalized care, achieved through the integration of proteomics and clinical expertise. A panel of novel plasma protein biomarkers was identified that enabled the development of diagnostic models demonstrating strong discriminatory capabilities. The reported functions of these protein biomarkers offer potential insights into endometriosis pathogenesis. Further validation of these biomarkers will fortify the robustness and reliability of this diagnostic tool and enable its integration into clinical practice, benefiting individuals affected by endometriosis and paving the way for improved patient care.
Supplementary data
Supplementary data are available at Human Reproduction online.
Data availability
The data underlying this article cannot be shared publicly due to the privacy of individuals who participated in the study. The data will be shared on reasonable request to the corresponding author.
Acknowledgements
The authors thank the Wesley Research Institute (Queensland, Australia), the Royal Women’s Hospital (Melbourne, Australia) for clinical samples, Linear Clinical Research (Perth, Australia) for the collection of healthy controls, and all of the participants who donated their blood samples. The authors also thank The Proteomics International laboratory staff for their precise contributions to sample preparation and analysis, and Dr Roop Judge, Sue Wong, and Dr Kerryn Garrett for their work in securing access to the clinical samples used in this study. Sections of this work have been presented previously at the 18th Human Proteome Organization World Congress in 2019 (Ito et al., 2019), Fertility Society of AU & NZ Annual Conference 2023, 70th Annual Meeting of the Society for Reproductive Investigation in 2023, and 15th World Congress on Endometriosis in 2023.
Authors’ roles
The study was designed by R.L., S.B., K.P., P.T., S.H.-C., and P.A.W.R., who were responsible for the overall direction and planning of the research. Sample collection and processing were conducted by J.E.G., M.H., and B.A.B. The laboratory work was carried out by C.A., T.C., L.C., M.D., D.I., J.I., C.L., H.L., M.M., and M.R. E.M.S. and K.P. were responsible for the statistical analysis and E.M.S., S.B., K.P., and R.L. were responsible for the interpretation of the results. The manuscript was written, reviewed, and prepared by E.M.S., K.P., S.B., and R.L., with S.H.-C. and P.A.W.R. Each author has approved the final version of the manuscript.
Funding
The subject recruitment at The Royal Women’s Hospital, Melbourne, was supported in part by funding from the Australian National Health and Medical Research Council (NHMRC) project (GNT1105321, GNT1026033) and Australian Medical Research Future Fund (MRF1199715) (P.A.W.R., S.H.-C., and M.H.).
Conflict of interest
Proteomics International has filed patent WO 2021/184060 A1 that relates to endometriosis biomarkers described in this manuscript; S.B., R.L., and T.C. declare an interest in this patent. J.I., S.B., C.L., D.I., H.L., K.P., M.D., M.M., M.R., P.T., R.L., and T.C. are shareholders in Proteomics International. Otherwise, the authors have no conflicts of interest.