Abstract

Context

Circulating proteomes may provide intervention targets for type 2 diabetes (T2D).

Objective

We aimed to identify proteomic biomarkers associated with incident T2D and assess their joint effect with dietary or lifestyle factors on T2D risk.

Methods

We established 2 nested case-control studies for incident T2D: a discovery cohort (median 6.5 years of follow-up, 285 case-control pairs) and a validation cohort (median 2.8 years of follow-up, 38 case-control pairs). We integrated untargeted mass spectrometry-based proteomics and interpretable machine learning to identify T2D-related proteomic biomarkers. We constructed a protein risk score (PRS) from the identified proteomic biomarkers and used a generalized estimating equation to evaluate the PRS-T2D relationship with repeatedly profiled proteomes. We evaluated the association of the PRS with trajectories of glycemic traits in another non-T2D cohort (n = 376). Multiplicative interactions of dietary or lifestyle factors with the PRS were evaluated using logistic regression.

Results

Seven proteins (SHBG, CAND1, APOF, SELL, MIA3, CFH, IGHV1-2) were retained as the proteomic biomarkers for incident T2D. The PRS (per SD change) was positively associated with incident T2D across the 2 cohorts, with odds ratios of 1.29 (95% CI, 1.08-1.54) and 1.84 (95% CI, 1.19-2.84), respectively. Participants with a higher PRS had a higher probability of showing an unfavored glycemic trait trajectory in the non-T2D cohort. Red meat intake and the PRS showed a multiplicative interaction on T2D risk in the discovery (P = 0.003) and validation (P = 0.017) cohorts.

Conclusion

This study identified proteomic biomarkers for incident T2D in Chinese populations. Higher red meat intake may interact synergistically with the proteomic biomarkers to exacerbate T2D risk.

Globally, about 463 million adults were living with diabetes in 2019, with >90% having type 2 diabetes (T2D) (1). The persistent increase in the burden of T2D worldwide and complex heterogeneity of the disease necessitate further investigation into the pathophysiology of T2D (2). Identification of protein biomarkers would be highly informative to help understand the disease predisposition and etiology and may provide new targeted methods for T2D screening, diagnosis, or treatment (3, 4). Many hypothesis-driven studies have investigated the relationship between a specific protein or a group of protein biomarkers and T2D risk (2, 5-8).

Recently, mass spectrometry (MS)-based proteomics technologies, in particular data-independent acquisition MS, have improved dramatically, allowing effective and high-throughput measurement of large-scale epidemiological samples (9). MS-based proteomics is a multiplexed technology for protein detection with a high degree of sensitivity and specificity (9). However, only 1 small study has investigated the association of circulating proteins (14 proteins) with incident T2D using a targeted MS-based method (10), whereas several other studies used only affinity-based technologies (11-14). Human proteome profiles change over time (15), and therefore measurements at a single time point may not capture the variability of proteome profiles during follow-up. Of note, all these prior studies measured proteome data at a single time point, and none of them examined the relationship of repeated measures of the circulating proteome with incident T2D. In addition, none of the previous studies has evaluated the joint effect of the proteome and dietary or lifestyle factors on T2D risk.

Methodologically, high-dimensional and internally correlated proteomic data challenge traditional statistical strategies, and more sophisticated analytical methods are needed for feature selection and dimensionality reduction. Recently, machine learning integrated with interpretable algorithms has shown unique strength in disease prediction and risk factor identification (16, 17). The combination of machine learning and interpretable algorithms with untargeted MS-based proteomics has the potential to unveil protein biomarkers associated with incident T2D.

Therefore, using MS-based repeated measures of the circulating proteome in 2 nested case-control studies, we aimed to identify proteomic biomarkers for incident T2D with a machine learning model. As a secondary aim, we evaluated the joint effect of the circulating proteome and dietary or lifestyle factors on T2D risk.

Materials and Methods

Study and Participants

The overview of the study workflow is shown in Fig. 1. Incident T2D cases were ascertained based on fasting blood glucose ≥ 7.0 mmol/L or glycated hemoglobin (HbA1c) ≥ 6.5% or currently under medical treatment for diabetes at either of the follow-up visits, according to American Diabetes Association criteria for the diagnosis of diabetes (18).
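As an illustration only, the ascertainment rule above can be written as a small function. The sketch below is in Python; the function and parameter names are hypothetical and this is not the code used in the study. A missing measurement is represented by None and simply does not trigger the corresponding criterion.

```python
def meets_t2d_criteria(fasting_glucose_mmol_l=None, hba1c_percent=None,
                       on_diabetes_treatment=False):
    """Return True if one visit meets any criterion stated in the text.

    Names are hypothetical; None means the measurement is unavailable.
    """
    glucose_hit = fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0
    hba1c_hit = hba1c_percent is not None and hba1c_percent >= 6.5
    return bool(glucose_hit or hba1c_hit or on_diabetes_treatment)


def is_incident_case(follow_up_visits):
    """A participant counts as a case if any follow-up visit meets the criteria."""
    return any(meets_t2d_criteria(**visit) for visit in follow_up_visits)


# Hypothetical participant with two follow-up visits.
visits = [
    {"fasting_glucose_mmol_l": 5.8, "hba1c_percent": 6.1},
    {"fasting_glucose_mmol_l": 6.4, "hba1c_percent": 6.6},
]
print(is_incident_case(visits))  # True: HbA1c reaches 6.5% at the second visit
```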

Figure 1. Study design and analysis pipeline. (A) Identification and validation of the type 2 diabetes (T2D) proteomic biomarkers. Within the Guangzhou Nutrition and Health Study (GNHS) and the external validation cohort, we established 2 prospective nested case-control studies for incident T2D: 1 discovery cohort (285 case-control pairs) and 1 external validation cohort (38 case-control pairs). A combination of an interpretable machine learning framework and conditional logistic regression was used to determine the circulating proteomic biomarkers associated with T2D in the discovery cohort. The identified proteomic biomarkers for T2D were further validated in the independent cohort and confirmed by the repeatedly measured proteomic data sets. (B) Examination of the relationship between the T2D proteomic biomarkers and longitudinal glycemic trait (fasting glucose, HbA1c, and Homeostasis Model Assessment of Insulin Resistance) trajectories in another 376 independent participants from the GNHS who were free of T2D across all visits and had available proteomic data. (C) Examination of the interaction of dietary and lifestyle factors with the protein risk score for incident T2D risk.

Within the Guangzhou Nutrition and Health Study (GNHS), we designed a nested case-control study (Supplementary Figure 1) (19) for incident T2D as a discovery cohort for the serum proteomic research. Detailed study designs of the GNHS have been reported previously (20). Briefly, the GNHS was performed between 2008 and 2013 and all participants were followed up every 3 years. According to our prespecified inclusion and exclusion criteria (Supplementary Figure 1) (19), we excluded participants who had self-reported cancers or were without blood samples at baseline. In total, 285 participants newly developed T2D during follow-up (median 6.5 years’ follow-up). Each incident T2D case was matched to a non-T2D control according to baseline age (±1 year) and sex. Among the 285 pairs of selected participants, 197 case-control pairs had a repeated collection of blood samples during a follow-up visit.
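The matching rule described above (same sex, baseline age within ±1 year) can be sketched as follows. The text does not state which matching algorithm was used, so the greedy nearest-age procedure without replacement shown here is an assumption, as are the record fields and identifiers.

```python
def match_controls(cases, controls, age_tolerance=1.0):
    """Greedy 1:1 matching without replacement on sex and baseline age.

    Assumption: each case takes the closest-aged eligible control that is
    still available; the study's actual procedure may differ.
    """
    available = list(controls)
    pairs = []
    for case in cases:
        candidates = [
            c for c in available
            if c["sex"] == case["sex"]
            and abs(c["age"] - case["age"]) <= age_tolerance
        ]
        if not candidates:
            continue  # this case stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        available.remove(best)
        pairs.append((case["id"], best["id"]))
    return pairs


# Hypothetical records.
cases = [
    {"id": "T1", "sex": "F", "age": 59.0},
    {"id": "T2", "sex": "M", "age": 61.5},
]
controls = [
    {"id": "C1", "sex": "F", "age": 59.8},
    {"id": "C2", "sex": "M", "age": 63.0},
    {"id": "C3", "sex": "M", "age": 61.0},
    {"id": "C4", "sex": "F", "age": 58.9},
]
print(match_controls(cases, controls))  # [('T1', 'C4'), ('T2', 'C3')]
```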

As an independent external validation cohort, we then performed another nested case-control study for incident T2D using a small cohort study (n = 245). The validation cohort was performed between 2015 and 2017 and followed up once by 2019 (21). There were 38 case-control pairs for incident T2D in the independent validation cohort, whose blood samples at baseline and follow-up were collected (with a median 2.8 years of follow-up; Supplementary Figure 1) (19).

We further selected another 376 independent participants within the GNHS who were free of T2D across all visits (median 6.3 years’ follow-up). This subcohort was used to explore the longitudinal association between the identified proteomic biomarkers and trajectories of change in blood glycemic traits over time.

Details of the serum sample preprocessing, proteomic profiling, quality control, covariates, and laboratory measures are described in supplemental materials (19).

Bioinformatics and Statistical Analysis

Statistical analyses were performed using Stata 15 (StataCorp, College Station, TX, USA). The classifier was based on code adapted from sklearn 0.15.2 (22). Missing serum protein values were imputed with 50% of the lowest value observed in all analyzed samples.
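A minimal sketch of this half-minimum imputation is given below. It assumes that the rule is applied separately to each protein and that missing values are encoded as None; neither detail is specified in the text.

```python
def impute_half_minimum(values):
    """Replace missing entries (None) with 50% of the lowest observed value.

    Assumption: `values` holds one protein's intensities across all samples.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = 0.5 * min(observed)
    return [fill if v is None else v for v in values]


print(impute_half_minimum([4.0, None, 2.0, 8.0]))  # [4.0, 1.0, 2.0, 8.0]
```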

We used a 2-step analysis to identify the proteomic biomarkers for incident T2D. First, considering the interconnection between proteins, we devised a classifier based on a gradient-boosting predictor trained with LightGBM (23) to identify the core predictive proteins for incident T2D. In this step, all detected proteins were simultaneously included as input to train the machine learning model. We limited the number of T2D protein biomarkers to no more than 10 so that they could be practically measured by targeted proteomics or antibodies in the clinic. Based on these criteria, the top 10 ranked predictive proteins in the discovery cohort were ascertained as candidate proteomic biomarkers for incident T2D. Shapley Additive exPlanations (24) was used to estimate the contribution of each protein to the overall classifier prediction. Unlike traditional statistical methods, which rely heavily on distributional assumptions, LightGBM does not require the distribution of a dependent or independent variable to be specified.
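To make the ranking idea concrete, the sketch below computes exact Shapley values by brute force for a toy model and ranks features by their mean absolute contribution. It is not the SHAP library and not the LightGBM classifier used in the study; the toy linear model, the all-zero baseline, and the feature names are assumptions chosen so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only usable for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def rank_features(model, samples, baseline, names, top_k):
    """Rank features by mean absolute Shapley value across samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    mean_abs = [t / len(samples) for t in totals]
    order = sorted(range(len(names)), key=lambda i: mean_abs[i], reverse=True)
    return [names[i] for i in order[:top_k]]


# Toy stand-in for a trained classifier's score (hypothetical).
def toy_model(features):
    return 2.0 * features[0] - 1.0 * features[1] + 0.0 * features[2]


names = ["protein_a", "protein_b", "protein_c"]  # hypothetical names
samples = [[1.0, 1.0, 1.0], [2.0, 0.0, 5.0]]
print(rank_features(toy_model, samples, [0.0, 0.0, 0.0], names, top_k=2))
```

For a linear model with a zero baseline, each Shapley value equals the coefficient times the feature value, which gives a quick sanity check on the implementation.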

Then, we used a conditional logistic regression model to further determine the statistical association of each candidate proteomic biomarker (top 10 ranked predictive proteins) with incident T2D, adjusting for potential confounders, including age, sex, body mass index (BMI), waist circumference, total energy intake, alcohol drinking, smoking, household income, marital status, self-reported educational level, and technical confounding factors at baseline. In the discovery cohort, the candidate proteins with P < 0.1 were retained as the final T2D proteomic biomarkers. As a replication, we also evaluated the association of each identified T2D proteomic biomarker with incident T2D in the external validation cohort and combined the effect estimates from the 2 cohorts using random-effects meta-analysis.
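For 1:1 matched pairs, the conditional likelihood contribution of a pair reduces to 1 / (1 + exp(-β d)), where d is the case-minus-control difference in the exposure. The sketch below fits this model for a single unadjusted exposure by Newton-Raphson. It is a simplified illustration with hypothetical data; the analysis described above additionally adjusted for covariates and was run in Stata.

```python
from math import exp, sqrt


def conditional_logit_1to1(case_values, control_values, iterations=25):
    """Fit a one-covariate conditional logistic model for 1:1 matched pairs.

    Returns the log odds ratio per unit of exposure and its standard error.
    Assumes the pair differences are not all of the same sign (otherwise
    the estimate diverges).
    """
    diffs = [case - control for case, control in zip(case_values, control_values)]

    def score_and_information(beta):
        probs = [1.0 / (1.0 + exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        information = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        return score, information

    beta = 0.0
    for _ in range(iterations):
        score, information = score_and_information(beta)
        beta += score / information  # Newton-Raphson update
    _, information = score_and_information(beta)
    return beta, 1.0 / sqrt(information)


# Hypothetical standardized protein levels for 6 matched pairs.
case_levels = [1.2, 0.4, 0.9, -0.3, 1.5, 0.2]
control_levels = [0.5, 0.6, 0.1, 0.1, 0.7, 0.9]
beta, se = conditional_logit_1to1(case_levels, control_levels)
print(f"OR per unit: {exp(beta):.2f}")
```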

To address the concern that circulating proteomes may change over time and that measurement at a single time point may not be sufficient, we then examined the relationship between the dynamic changes in each serum proteomic biomarker over time and incident T2D using a generalized estimating equation (GEE) model adjusted for the same covariates as the conditional logistic regression analysis described above. Here, an independent correlation structure was used as the GEE model’s working correlation matrix, and robust estimates of the standard errors were used for the GEE modeling. We combined the effect estimates from the 2 cohorts using random-effects meta-analysis.
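With an independent working correlation, the GEE point estimates coincide with those of an ordinary logistic regression, and the repeated measures enter only through the cluster-robust (sandwich) standard errors. The NumPy sketch below illustrates this for a binary outcome. The data are hypothetical, no covariates beyond a single exposure are included, and no small-sample correction is applied, so results from statistical packages may differ slightly.

```python
import numpy as np


def gee_independence_logit(X, y, groups, iterations=25):
    """Logistic GEE with independence working correlation and robust SEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):  # Newton-Raphson on the logistic score
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        bread = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(bread, X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    bread_inv = np.linalg.inv(X.T @ (X * w[:, None]))
    meat = np.zeros_like(bread_inv)
    for g in np.unique(groups):  # sum score outer products per participant
        rows = groups == g
        score = X[rows].T @ (y[rows] - mu[rows])
        meat += np.outer(score, score)
    cov = bread_inv @ meat @ bread_inv
    return beta, np.sqrt(np.diag(cov))


# Hypothetical data: 6 participants, 2 visits each; columns = intercept, PRS.
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6])
prs = np.array([-1.0, -0.5, 0.2, 0.4, 1.0, 1.3, -0.3, 0.1, 0.8, 0.6, -1.2, -0.9])
y = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
X = np.column_stack([np.ones_like(prs), prs])
beta, robust_se = gee_independence_logit(X, y, groups)
print("log OR per unit PRS:", beta[1], "robust SE:", robust_se[1])
```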

We constructed classifiers (LightGBM) based on the identified proteomic biomarkers and on all measured proteins, respectively. The classifier’s performance in the discovery cohort was checked by 10-fold cross-validation. The classifier trained in the discovery cohort was directly tested in the validation cohort. We also compared the classifiers’ predictive performance for T2D using the identified proteomic biomarkers alone, traditional risk factors alone, and their combination. Here, the traditional risk factors included age, sex, BMI, fasting glucose, systolic blood pressure, diastolic blood pressure, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol, and triglycerides. The area under the receiver operating characteristic curve was used to assess the discriminating power of the classifiers. The R package pROC (25) was used for receiver operating characteristic curve analyses, and the method of DeLong was used to assess differences between the classifiers’ predictive performance.
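The area under the receiver operating characteristic curve equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The short sketch below computes it directly from that definition using hypothetical scores; it does not implement the DeLong test, for which the study used pROC.

```python
def auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical predicted probabilities and true labels (1 = incident T2D).
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 0.833 (5 of 6 pairs ranked correctly)
```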

To assess the combined impact of the identified proteomic biomarkers on T2D risk, we used them to construct a protein risk score (PRS; supplemental materials). In both the discovery and external validation cohorts, we used conditional logistic regression and GEE models to evaluate the association of the PRS with incident T2D, adjusted for the same covariates as described for the individual proteomic biomarker analysis.
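The exact PRS formula is given in the supplemental materials and is not reproduced here. Purely to illustrate how a score of this kind can be assembled, the sketch below assumes an unweighted sum of standardized protein levels, each signed by its direction of association with T2D. This weighting scheme, the function name, and the example values are assumptions and may differ from the authors' definition.

```python
from statistics import mean, stdev


def protein_risk_scores(levels, direction):
    """Sum of signed z-scores per participant (hypothetical scoring rule).

    levels: dict mapping protein name to a list of values, one per participant.
    direction: dict mapping protein name to +1 (risk-increasing) or -1 (inverse).
    """
    n = len(next(iter(levels.values())))
    scores = [0.0] * n
    for protein, sign in direction.items():
        values = levels[protein]
        mu, sd = mean(values), stdev(values)
        for i, value in enumerate(values):
            scores[i] += sign * (value - mu) / sd
    return scores


# Two of the identified proteins with made-up levels for 3 participants.
levels = {"SHBG": [10.0, 20.0, 30.0], "CFH": [5.0, 5.0, 8.0]}
direction = {"SHBG": -1, "CFH": 1}  # signs follow the associations in Results
print([round(s, 2) for s in protein_risk_scores(levels, direction)])
```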

In another independent subcohort of individuals within the GNHS who remained T2D-free throughout follow-up, we used a group-based trajectory modeling approach (26) to create trajectory groups based on glycemic trait data (fasting glucose, HbA1c, and Homeostasis Model Assessment of Insulin Resistance) across all study visits. Group-based trajectory modeling is a common explanatory modeling technique that allows researchers to identify groups of people who have similar characteristics; it has been widely applied in clinical research (27, 28). We calculated the posterior predicted probability of each trajectory for each participant and assigned the participant to a trajectory group according to the probability. For each of the glycemic traits, the identified trajectory groups were further defined as a favored or unfavored group according to the glycemic status across the follow-up visits. Logistic regression was used to examine the association of the baseline PRS (per SD) with the trajectory groups of each glycemic trait, adjusted for the same covariates as described previously for the conditional logistic regression analysis.
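The assignment and dichotomization steps can be sketched as follows, taking the posterior probabilities from a fitted trajectory model as given. The sketch assumes that a participant is assigned to the group with the highest posterior probability and that each group is summarized by a single mean trait value; both function names and the example numbers are hypothetical.

```python
def assign_trajectory(posteriors):
    """Index of the trajectory group with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda g: posteriors[g])


def label_groups(group_means):
    """Label the group with the highest mean trait 'unfavored', others 'favored'."""
    worst = max(range(len(group_means)), key=lambda g: group_means[g])
    return ["unfavored" if g == worst else "favored"
            for g in range(len(group_means))]


# Hypothetical posterior probabilities for one participant (3 groups) and
# hypothetical mean fasting glucose (mmol/L) per group.
group = assign_trajectory([0.1, 0.7, 0.2])
labels_by_group = label_groups([4.6, 5.0, 5.6])
print(group, labels_by_group[group])  # 1 favored
```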

Multiplicative interactions of dietary and lifestyle factors with the PRS were evaluated using a logistic regression model, with adjustment for age, sex, BMI, waist circumference, hip circumference, waist-hip ratio, household income, marital status, self-reported educational level, total energy intake, and mutual adjustment for the other tested dietary or lifestyle factors. The tested dietary or lifestyle factors included vegetable, fruit, fish, dairy, red meat, processed meat, poultry, eggs, nuts, whole grain, physical activity, alcohol drinking, and smoking status. Dietary or lifestyle factors with a P for interaction < 0.05 in the discovery cohort were selected for replication in the validation cohort. For the dietary or lifestyle factors that showed a significant interaction with the PRS for T2D risk, we performed a subgroup analysis stratified by the median level of the factor.

Results

Participant Characteristics

Baseline characteristics of the incident T2D cases and controls are shown in Table 1. There were no demographic differences between the participants without and with repeated measurement of the serum proteome in the discovery cohort (Supplementary Table 1) (19).

Table 1.

Baseline characteristics of the participants in the discovery cohort and external validation cohort^a

| Characteristic | Discovery cohort: healthy controls | Discovery cohort: incident T2D cases | External validation cohort: healthy controls | External validation cohort: incident T2D cases | Non-T2D subcohort |
|---|---|---|---|---|---|
| No. of participants | 285 | 285 | 38 | 38 | 376 |
| Age, y | 59.2 (5.3) | 59.1 (5.5) | 70.5 (6.3) | 69.7 (6.2) | 58.2 (6.0) |
| Women, n (%) | 176 (61.8%) | 183 (64.4%) | 30 (78.9%) | 31 (81.6%) | 280 (74.5%) |
| Married, n (%) | 268 (94%) | 269 (94.4%) | 28 (73.7%) | 28 (73.7%) | 334 (88.8%) |
| Education, n (%) | | | | | |
|   Middle school or lower | 79 (27.7%) | 91 (31.9%) | 4 (10.5%) | 4 (10.5%) | 113 (30.1%) |
|   High school or professional college | 126 (44.2%) | 123 (43.2%) | 6 (15.8%) | 10 (26.3%) | 160 (42.6%) |
|   University | 80 (28.1%) | 71 (24.9%) | 28 (73.7%) | 24 (63.2%) | 103 (27.4%) |
| Income (Yuan/month/person), n (%) | | | | | |
|   ≤500 | 7 (2.5%) | 10 (3.5%) | 0 | 0 | 3 (0.8%) |
|   501-1500 | 78 (27.4%) | 83 (29.1%) | 5 (13.2%) | 3 (7.9%) | 91 (24.2%) |
|   1501-3000 | 140 (49.1%) | 154 (54.0%) | 8 (21%) | 11 (28.9%) | 239 (63.6%) |
|   >3000 | 60 (21.1%) | 38 (13.3%) | 25 (65.8%) | 24 (63.2%) | 43 (11.4%) |
| BMI, kg/m² | 22.9 (3.1) | 24.9 (3.5) | 22.8 (3.0) | 24.1 (3.6) | 23.0 (2.8) |
| Waist circumference, cm | 81.4 (8.8) | 87.2 (9.3) | 85.7 (9.4) | 91.6 (6.9) | 82.9 (8.3) |
| Fasting glucose, mmol/L | 4.7 (0.6) | 5.3 (0.8) | 4.7 (0.6) | 5.4 (0.8) | 4.7 (0.6) |
| HDL-C, mmol/L | 1.4 (0.4) | 1.3 (0.3) | 1.4 (0.5) | 1.2 (0.3) | 1.5 (0.3) |
| LDL-C, mmol/L | 3.6 (0.9) | 3.6 (0.9) | 2.9 (0.8) | 2.8 (0.9) | 3.5 (0.8) |
| TC, mmol/L | 5.3 (1.0) | 5.4 (1.0) | 4.8 (1.1) | 5.1 (1.0) | 5.5 (0.9) |
| TG, mmol/L | 1.5 (0.9) | 1.9 (1.4) | 1.5 (0.8) | 1.5 (0.6) | 1.5 (1.3) |
| Sibling or parent having diabetes, n (%) | 28 (9.8%) | 43 (15.1%) | 5 (13.2%) | 12 (31.6%) | 36 (9.6%) |
| Smoking status (yes), n (%) | 51 (17.9%) | 62 (21.8%) | 5 (13.5%) | 4 (10.5%) | 45 (12.0%) |
| Alcohol drinking (yes), n (%) | 22 (7.7%) | 17 (6.0%) | 1 (2.6%) | 4 (10.5%) | 22 (5.9%) |
| Physical activity, MET-h/day | 43.5 (16.3) | 42.1 (15.8) | 103.6 (60.9) | 95.4 (47.4) | 40.7 (13.8) |

Abbreviations: HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; MET, metabolic equivalent; T2D, type 2 diabetes; TC, total cholesterol; TG, triglycerides.

^a Data are presented as number of participants (%) or as mean (SD).

Serum Proteomic Biomarkers Associated With Incident T2D

After filtering the protein groups, 1368 proteins were detected in our proteomics data sets. Of these, 392 MS-identified proteins contributed to the overall model predictions. The distributions of missing rates for these 1368 and 392 proteins are shown in Supplementary Figure 2A (19). The top 10 ranked predictive proteins were sex hormone-binding globulin (SHBG), Cullin-associated NEDD8-dissociated protein 1 (CAND1), apolipoprotein F (APOF), alpha-1-acid glycoprotein 1, L-selectin (SELL), vitamin K-dependent protein C, transport and Golgi organization protein 1 homolog (MIA3), focal adhesion kinase 1, complement factor H (CFH), and Ig heavy variable 1-2 (IGHV1-2); these were selected as the candidate biomarkers by our machine learning model. Conditional logistic regression analysis found that 7 (SHBG, CAND1, APOF, SELL, MIA3, CFH, IGHV1-2) of the top 10 ranked predictive proteins were associated with T2D risk in the discovery cohort (Fig. 2, P < 0.1). In the validation cohort, only CAND1 (P = 0.06) and SELL (P = 0.085) were associated with T2D. Nevertheless, in the meta-analysis of the discovery and validation cohorts, all 7 proteins were significantly associated with T2D (Fig. 2). In our cohorts, most of the identified T2D proteomic biomarkers had a relatively low missing rate (Supplementary Figure 2B) (19). The numbers of peptides assigned to these proteomic biomarkers were 10, 20, 3, 5, 7, 112, and 3, respectively. We manually checked the peak groups for each of these 7 proteins in our wiff data using Skyline (Supplementary Figure 3) (19) to support their identification, and the 7 proteins were retained as the final T2D biomarkers. SHBG, CAND1, APOF, and SELL showed an inverse association with T2D risk, whereas MIA3, CFH, and IGHV1-2 were positively associated (Fig. 2). Overall, SHBG, SELL, and APOF levels were higher in the healthy controls than in the T2D groups throughout follow-up (Supplementary Figure 4) (19).
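The pooling of per-cohort estimates can be illustrated with a random-effects meta-analysis on the log odds ratio scale. The text does not name the between-study variance estimator, so the DerSimonian-Laird estimator used below is an assumption, and the input odds ratios are made-up values rather than results from this study.

```python
from math import exp, log, sqrt


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (log(upper) - log(lower)) / (2.0 * z)


def random_effects_pool(log_ors, ses):
    """DerSimonian-Laird pooled log OR, its standard error, and tau^2."""
    w = [1.0 / s ** 2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    return pooled, sqrt(1.0 / sum(w_star)), tau2


# Hypothetical per-cohort odds ratios with 95% CIs (not study results).
cohorts = [(0.80, 0.65, 0.98), (0.60, 0.35, 1.03)]
log_ors = [log(or_) for or_, _, _ in cohorts]
ses = [se_from_ci(lo, hi) for _, lo, hi in cohorts]
pooled, pooled_se, tau2 = random_effects_pool(log_ors, ses)
print(f"pooled OR {exp(pooled):.2f}, tau^2 {tau2:.3f}")
```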

Figure 2. Association of the identified circulating proteomic biomarkers with incident type 2 diabetes (T2D). For each of the discovery and external validation cohorts, conditional logistic regression was used to estimate the odds ratio (OR) and 95% CI of T2D risk per SD change in each selected proteomic biomarker, adjusting for age, sex, BMI, total energy intake, physical activity, alcohol drinking, smoking, household income, marital status, self-reported educational level, and technical confounding factors at baseline.

For the T2D prediction analysis (Fig. 3), the classifier trained on the identified proteomic biomarkers achieved predictive performance similar to that of the classifier trained on all measured proteins (Fig. 3A, P = 0.70 in the discovery cohort; P = 0.90 in the validation cohort; DeLong test). Among the identified proteomic biomarkers, SHBG showed the strongest predictive power for T2D risk in the discovery cohort (Fig. 3B). However, because of its high missing rate (>90%) in the external validation cohort, we did not include the SHBG data in the validation cohort analysis. Adding the 7 proteomic biomarkers to the classical T2D risk factors increased the area under the curve from 0.75 to 0.80 (P = 0.0033) in the discovery cohort and from 0.76 to 0.77 (P = 0.59) in the validation cohort.

Figure 3. Predictive performance analysis of the identified circulating proteomic biomarkers. (A) Comparison of the classifier predictive performance for incident type 2 diabetes (T2D) using the 7 identified T2D proteomic biomarkers and all measured proteins. The area under the receiver operating characteristic curve (AUC) was used to evaluate the classifier’s performance. Classifier performance was checked by 10-fold cross-validation within the discovery cohort (n = 570) and was further directly tested in the external validation cohort (n = 76). The DeLong method was used to assess differences between the classifiers’ predictive performance. (B) T2D proteomic biomarkers’ relative contribution to T2D prediction. The Shapley Additive exPlanations method was used to estimate the relative contribution of each T2D proteomic biomarker to the overall predictions (the highest ranked protein was used as a reference). Bar colors indicate the direction of influence. Positive impact, protein up-regulated in incident T2D patients; otherwise, inverse impact.

Association of PRS With Incident T2D and Trajectory of Glycemic Traits

Compared with the healthy controls, incident T2D cases had a higher PRS level across different visits in the 2 cohorts (Supplementary Figure 5) (19). Conditional logistic regression indicated that the baseline PRS (per SD increment) was positively associated with incident T2D in the discovery cohort (odds ratio [OR] 1.89; 95% CI, 1.49-2.41) and the external validation cohort (OR 4.42; 95% CI, 1.04-18.9) (Fig. 4A). After considering the repeated measures of the circulating proteome in the GEE model, the PRS remained associated with T2D risk, with an OR per SD change in PRS of 1.29 (95% CI, 1.08-1.54) and 1.84 (95% CI, 1.19-2.84) in the 2 cohorts, respectively (Fig. 4A).

Figure 4. Circulating protein risk score is associated with incident type 2 diabetes and longitudinal trajectories of glycemic traits. (A) Association of the PRS with T2D risk in the discovery cohort (n = 570) and external validation cohort (n = 76). In the prospective analysis, conditional logistic regression was used to estimate the odds ratio and 95% CI of T2D per SD change in the baseline PRS, adjusting for potential confounders at baseline. In the repeated measurement analysis, generalized estimating equation methodology was used to explore whether changes in the PRS (per SD) across all visits were associated with T2D, adjusting for potential confounders. (B) The longitudinal association between the baseline PRS and trajectories of change in blood glycemic traits over time (n = 376). Logistic regression was used to examine the association of the baseline PRS (per SD) with the trajectory groups of each glycemic trait, adjusting for potential confounders at baseline. HbA1c, glycated hemoglobin; HOMA-IR, Homeostasis Model Assessment of Insulin Resistance; PRS, protein risk score; T2D, type 2 diabetes.

Three trajectory groups were identified for each of the glycemic traits (Supplementary Figure 6) (19). The group with the highest mean glycemic trait at each follow-up was defined as the unfavored group; the other groups were combined as the favored group. Overall, participants with a higher baseline PRS had a higher probability of showing an unfavored glycemic trait trajectory, with an OR per SD change in PRS of 1.51 (95% CI, 1.07-2.11), 1.19 (95% CI, 0.94-1.52), and 1.76 (95% CI, 1.1-2.85) for fasting glucose, HbA1c, and the Homeostasis Model Assessment of Insulin Resistance, respectively (Fig. 4B).

Joint Effect of Red Meat and PRS on T2D Risk

We found that red meat intake, but not other dietary or lifestyle factors, showed a significant interaction with the PRS on T2D risk (P for interaction = 0.003 and 0.017 in the discovery and validation cohorts, respectively; Supplementary Table 2) (19). In the discovery cohort, participants with both a higher PRS and a higher red meat intake had a 106% higher risk of T2D (OR 2.06; 95% CI, 1.21-3.51) than those with a lower PRS and a lower red meat intake (Table 2). A similar trend was observed in the external validation cohort (Table 2).

Table 2.

Joint effect of red meat intake and protein risk score on type 2 diabetes^a

| Cohort | Red meat intake | Protein risk score | No. of cases/total no. | OR (95% CI) |
| --- | --- | --- | --- | --- |
| Discovery | Lower | Lower | 59/141 | 1 (Reference) |
| Discovery | Lower | Higher | 90/144 | 1.67 (1.01-2.77) |
| Discovery | Higher | Lower | 46/144 | 0.58 (0.35-1.01) |
| Discovery | Higher | Higher | 90/141 | 2.06 (1.21-3.51) |
| Validation | Lower | Lower | 9/19 | 1 (Reference) |
| Validation | Lower | Higher | 10/19 | 1.56 (0.34-7.16) |
| Validation | Higher | Lower | 11/19 | 2.57 (0.46-14.2) |
| Validation | Higher | Higher | 8/18 | 1.2 (0.26-5.52) |

Abbreviations: BMI, body mass index; OR, odds ratio; PRS, protein risk score.

^a Logistic regression was used to evaluate the joint effect of red meat intake and PRS on type 2 diabetes, with adjustment for age, sex, BMI, waist circumference, hip circumference, waist-hip ratio, household income, marital status, self-reported educational level, and mutual adjustment for the other tested dietary and lifestyle factors. Total red meat intake and PRS were categorized into 2 groups (higher and lower) by their median level, respectively.
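
To make the structure of Table 2 concrete, the following sketch computes crude (unadjusted) ORs directly from the case and total counts in the table, with Woolf-type 95% CIs, and a crude ratio of ORs as a rough analogue of a multiplicative interaction term. It is an illustration only and not the authors' analysis: the ORs in Table 2 come from covariate-adjusted logistic regression, so the crude values will differ from the published ones. The names crude_or, table2, crude, and interaction_ratio are hypothetical.

```python
import math


def crude_or(cases_exp, total_exp, cases_ref, total_ref, z_crit=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale Wald) 95% CI."""
    a, b = cases_exp, total_exp - cases_exp  # cases / non-cases in the comparison cell
    c, d = cases_ref, total_ref - cases_ref  # cases / non-cases in the reference cell
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z_crit * se_log)
    high = math.exp(math.log(odds_ratio) + z_crit * se_log)
    return odds_ratio, low, high


# (PRS level, red meat level) -> (cases, total), copied from Table 2
table2 = {
    "discovery": {
        ("lower", "lower"): (59, 141),
        ("higher", "lower"): (90, 144),
        ("lower", "higher"): (46, 144),
        ("higher", "higher"): (90, 141),
    },
    "validation": {
        ("lower", "lower"): (9, 19),
        ("higher", "lower"): (10, 19),
        ("lower", "higher"): (11, 19),
        ("higher", "higher"): (8, 18),
    },
}

REFERENCE = ("lower", "lower")
crude = {}
interaction_ratio = {}
for cohort, cells in table2.items():
    ref = cells[REFERENCE]
    crude[cohort] = {
        key: crude_or(*cell, *ref) for key, cell in cells.items() if key != REFERENCE
    }
    both = crude[cohort][("higher", "higher")][0]
    prs_only = crude[cohort][("higher", "lower")][0]
    meat_only = crude[cohort][("lower", "higher")][0]
    # Crude multiplicative interaction: joint OR relative to the product of the single-factor ORs
    interaction_ratio[cohort] = both / (prs_only * meat_only)

for cohort, cells in crude.items():
    for (prs, meat), (or_, low, high) in cells.items():
        print(f"{cohort}: PRS {prs}, red meat {meat}: crude OR {or_:.2f} ({low:.2f}-{high:.2f})")
    print(f"{cohort}: crude interaction ratio {interaction_ratio[cohort]:.2f}")
```

The crude intervals for the validation cohort are much wider than those for the discovery cohort, simply because each cell contains fewer than 20 participants. Any comparison with the published estimates should keep in mind that the published models adjust for the covariates listed in the table footnote.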

Discussion

In the present study, we identified 7 proteomic biomarkers (SHBG, CAND1, APOF, SELL, MIA3, CFH, IGHV1-2) associated with the risk of incident T2D. We derived and validated the PRS for T2D based on the 7 proteomic biomarkers. In another independent subcohort of non-T2D participants, we further demonstrated that participants with a higher PRS had a higher probability of showing the unfavored glycemic trait trajectory. Finally, our results revealed that red meat intake and PRS showed a synergistic interaction on T2D risk.

Among the 7 identified T2D proteomic biomarkers, 4 (CAND1, SELL, MIA3, IGHV1-2) are reported here for the first time as related to T2D risk. CAND1 is the key assembly factor of the ubiquitin-proteasome system (29). The ubiquitin-proteasome system is increasingly recognized as a potential therapeutic target because of emerging evidence linking it with various human diseases, including cancer, diabetes, and inflammation (30, 31). SELL is a cell adhesion molecule that plays a key role in the initiation of leukocyte migration from blood vessels to sites of local inflammation and has been demonstrated to be a useful biomarker of type 1 diabetes (32). MIA3 plays pivotal roles in the secretory pathway (33). A recent study discovered a correlation between MIA3 phosphorylation and proinsulin trafficking in mouse pancreatic β cells (34). IGHV1-2 is the V region of the variable domain of immunoglobulin heavy chains that participate in antigen recognition (35). These identified proteins may suggest new targets for T2D prevention or treatment, although the detailed mechanisms and causality have yet to be investigated.

We confirmed several T2D-related proteins (SHBG, APOF, and CFH) that have been reported previously (7, 12, 14, 36). Our results for SHBG were consistent with results from several epidemiologic studies (7, 12, 36) and a Mendelian randomization study (37), which reported an inverse association between circulating SHBG and T2D. APOF plays an important role in high-density lipoprotein metabolism and reverse cholesterol transport (38). A previous human study demonstrated that a reduction in the abundance of APOF was associated with a higher T2D risk (12). As one of the complement factors, CFH has been observed in the vitreous of patients with diabetic retinopathy (39). A recent study reported that CFH was positively associated with incident T2D (14).

Several prior cohort studies have reported prospective associations of proteomic biomarkers with incident T2D using high-throughput proteomic technologies (10-14). Of note, all these prior studies used proteome data at a single time point (ie, baseline), although it is well recognized that repeated measurement at multiple time points is important for biomarker discovery and validation. Therefore, the present study provides a unique resource for investigating the relationship between the blood proteome and diabetes risk.

We demonstrated the longitudinal association of baseline proteomic biomarkers with changing trajectories of several glycemic traits, showing the influence of the proteomic biomarkers on future glycemic traits among non-T2D individuals, which supported the results from the present 2 nested case-control studies. Previous studies have found that dietary intake of red meat and its major components, including saturated fat, cholesterol, animal protein, and heme iron, was associated with higher insulin resistance and T2D risk (40, 41). Our study further found that a higher intake of red meat may synergistically interact with the proteomic biomarkers to exaggerate the risk of T2D. Taken together, our present results identified a panel of proteomic biomarkers associated with incident T2D and suggest a synergistic interaction of these proteins with red meat intake on T2D risk.

This study has several strengths. First, this is the first untargeted MS-based proteomics biomarker study in the field of incident T2D research. Second, repeated measures of serum proteomics data reduce the potential bias derived from changes in the proteins over time and allow the assessment of cumulative effects in relation to the T2D risk, which has rarely been achieved in prior studies (10-14). Third, we validated the PRS-T2D association in an independent nested case-control study and further validated the association of the PRS with longitudinal change of the glycemic traits in another cohort of non-T2D individuals. These validations confirm the robustness of our present findings. Finally, we found that red meat intake and PRS showed a synergistic interaction on T2D risk. From a public health point of view, our results add further evidence that limiting red meat intake is beneficial for the prevention of T2D.

A major limitation of the present study is that the sample size of the validation cohort was relatively small. Although we could replicate the results for the PRS, results for individual proteins (eg, SHBG) and the interaction analysis could not be perfectly replicated, probably because of the small sample size and heterogeneity of the population. Specifically, the levels of SHBG are determined by many factors, such as genetics, physical activity, and demographic factors (42, 43). In our study, compared with the discovery cohort, the majority of the participants in the validation cohort were women, older, and had a larger waist circumference. These heterogeneities may lead to differences in the distribution of SHBG between the discovery and validation cohorts. Another limitation is that we did not perform an oral glucose tolerance test on participants, and so may have underestimated the number of T2D cases. In addition, all participants included in the present study are Han Chinese; therefore, caution should be taken in extrapolating our findings to other ethnic groups. Finally, residual confounding in our statistical models may still exist because of the observational nature of the present study, and the causality of the protein–T2D association could not be established at this stage.

Conclusions

In conclusion, we discovered and validated 7 proteomic biomarkers associated with incident T2D, providing complementary information to improve understanding of the pathophysiology of T2D. A higher intake of red meat may synergistically interact with the proteomic biomarkers to amplify the T2D risk. More investigations are needed to further replicate our present findings and reveal the detailed mechanism.

Abbreviations

  • APOF, apolipoprotein F
  • BMI, body mass index
  • CAND1, Cullin-associated NEDD8-dissociated protein 1
  • CFH, complement factor H
  • GEE, generalized estimating equation
  • GNHS, Guangzhou Nutrition and Health Study
  • HbA1c, glycated hemoglobin
  • IGHV1-2, immunoglobulin heavy variable 1-2
  • MIA3, Golgi organization protein 1 homolog
  • MS, mass spectrometry
  • OR, odds ratio
  • PRS, protein risk score
  • SELL, L-selectin
  • T2D, type 2 diabetes

Acknowledgements

We thank all the participants of the cohorts for contributing stool samples and phenotypes. We thank the Westlake University High-Performance Computing Center for data storage and computation and the Westlake Education Foundation for its support.

Funding

This study was funded by the National Natural Science Foundation of China (82073529, 81903316, 81773416), Westlake Multidisciplinary Research Initiative Center (MRIC20200301), Zhejiang Ten-thousand Talents Program (2019R52039), and the 5010 Program for Clinical Research (2007032) of the Sun Yat-sen University. The funders had no role in study design, data collection and analysis, decision to publish, or writing of the manuscript.

Author Contributions

J.S.Z., Y.M.C., and T.N.G. designed research; W.L.G., L.Y., and X.C. conducted research; W.L.G. analyzed data; Z.W., Z.L.M., Y.Q.F., X.Y.T., Y.Y.W., M.L.S., Z.L.J., J.L.W., Y.Y.T., C.M.X., H.C., and N.X. provided essential reagents or materials; W.L.G. and J.S.Z. wrote the manuscript; all authors were involved in revising the manuscript; and all authors read and approved the final version.

Conflict of Interest

The authors declare no competing financial interests.

Ethics Approval

This study was approved by the Ethics Committee of the School of Public Health at Sun Yat-sen University and the Ethics Committee of Westlake University. All participants gave written informed consent.

Data Availability

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the iProX partner repository with the dataset identifier PXD019675. All data supporting the conclusions of the article are presented in the main text and Supplementary Material (19).

References

1. Saeedi P, Petersohn I, Salpea P, et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract. 2019;157:107843.

2. Chen Z-Z, Gerszten RE. Metabolomics and proteomics in type 2 diabetes. Circ Res. 2020;126(11):1613-1627.

3. Williams SA, Kivimaki M, Langenberg C, et al. Plasma protein patterns as comprehensive indicators of health. Nat Med. 2019;25(12):1851-1857.

4. Santos R, Ursu O, Gaulton A, et al. A comprehensive map of molecular drug targets. Nat Rev Drug Discovery. 2017;16(1):19-34.

5. Kollerits B, Lamina C, Huth C, et al. Plasma concentrations of afamin are associated with prevalent and incident type 2 diabetes: a pooled analysis in more than 20,000 individuals. Diabetes Care. 2017;40(10):1386-1393.

6. Sun Q, van Dam RM, Meigs JB, Franco OH, Mantzoros CS, Hu FB. Leptin and soluble leptin receptor levels in plasma and risk of type 2 diabetes in U.S. women: a prospective study. Diabetes. 2010;59(3):611-618.

7. Ding EL, Song Y, Manson JE, et al. Sex hormone-binding globulin and risk of type 2 diabetes in women and men. N Engl J Med. 2009;361(12):1152-1163.

8. Spranger J, Kroke A, Möhlig M, et al. Adiponectin and protection against type 2 diabetes mellitus. Lancet. 2003;361(9353):226-228.

9. Aebersold R, Mann M. Mass-spectrometric exploration of proteome structure and function. Nature. 2016;537(7620):347-355.

10. Huth C, von Toerne C, Schederecker F, et al. Protein markers and risk of type 2 diabetes and prediabetes: a targeted proteomics approach in the KORA F4/FF4 study. Eur J Epidemiol. 2019;34(4):409-422.

11. Molvin J, Pareek M, Jujic A, et al. Using a targeted proteomics chip to explore pathophysiological pathways for incident diabetes- the Malmö preventive project. Sci Rep. 2019;9(1):272.

12. Gudmundsdottir V, Emilsson V, Aspelund T, et al. Circulating protein signatures and causal candidates for type 2 diabetes. Diabetes. 2020;69(8):1843-1853.

13. Nowak C, Sundström J, Gustafsson S, et al. Protein biomarkers for insulin resistance and type 2 diabetes risk in two large community cohorts. Diabetes. 2016;65(1):276-284.

14. Ngo D, Benson MD, Long JZ, et al. Proteomic profiling reveals biomarkers and pathways in type 2 diabetes risk. JCI Insight. 2021;6(5):e144392.

15. Lehallier B, Gate D, Schaum N, et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat Med. 2019;25(12):1843-1850.

16. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2(10):749-760.

17. Artzi NS, Shilo S, Hadar E, et al. Prediction of gestational diabetes based on nationwide electronic health records. Nat Med. 2020;26(1):71-76.

18. American Diabetes Association. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2013;36(Supplement 1):S67.

19. Gou WL, Yue L, Tang XY, et al. Data from: Supplementary materials for J Clin Endocrinol Metab about circulating proteome and progression of type 2 diabetes. figshare 2021. Deposited 10 November 2021. http://doi.org/10.6084/m9.figshare.13120295

20. Zhang ZQ, He LP, Liu YH, Liu J, Su YX, Chen YM. Association between dietary intake of flavonoid and bone mineral density in middle aged and elderly Chinese women and men. Osteoporos Int. 2014;25(10):2417-2425.

21. Fan F, Xue W-Q, Wu B-H, et al. Higher fish intake is associated with a lower risk of hip fractures in Chinese men and women: a matched case-control study. PLoS One. 2013;8(2):e56849.

22. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.

23. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. NIPS. 2017;30:3146-3154.

24. Lundberg S, Lee S-I. A unified approach to interpreting model predictions. NIPS. 2017:4768-4777. https://papers.nips.cc/paper/7062-a-unifiedapproachto-interpretingmodel-predictions

25. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 2011;12(1):77.

26. Andruff H, Carraro N, Thompson A, Gaudreau P. Latent class growth modelling: a tutorial. Tutor Quant Methods Psychol. 2009;5(1):11-24.

27. Nagin DS, Odgers CL. Group-based trajectory modeling in clinical research. Annu Rev Clin Psychol. 2010;6:109-138.

28. Mori M, Krumholz HM, Allore HG. Using latent class analysis to identify hidden clinical phenotypes. JAMA. 2020;324(7):700-701.

29. Bulatov E, Ciulli A. Targeting Cullin-RING E3 ubiquitin ligases for drug discovery: structure, assembly and small-molecule modulation. Biochem J. 2015;467(3):365-386.

30. Bedford L, Lowe J, Dick LR, Mayer RJ, Brownell JE. Ubiquitin-like protein conjugation and the ubiquitin-proteasome system as drug targets. Nat Rev Drug Discovery. 2011;10(1):29-46.

31. Nalepa G, Rolfe M, Harper JW. Drug discovery in the ubiquitin-proteasome system. Nat Rev Drug Discovery. 2006;5(7):596-613.

32. Kretowski A, Gillespie KM, Bingley PJ, Kinalska I. Soluble L-selectin levels in type I diabetes mellitus: a surrogate marker for disease activity? Immunology. 2000;99(2):320-325.

33. Saito K, Chen M, Bard F, et al. TANGO1 facilitates cargo loading at endoplasmic reticulum exit sites. Cell. 2009;136(5):891-902.

34. Kang T, Boland BB, Alarcon C, Grimsby JS, Rhodes CJ, Larsen MR. Proteomic analysis of restored insulin production and trafficking in obese diabetic mouse pancreatic islets following euglycemia. J Proteome Res. 2019;18(9):3245-3258.

35. Lefranc M-P. Immunoglobulin and T cell receptor genes: IMGT® and the birth and rise of immunoinformatics. Front Immunol. 2014;5:22.

36. Le TN, Nestler JE, Strauss JF 3rd, Wickham EP 3rd. Sex hormone-binding globulin and type 2 diabetes mellitus. Trends Endocrinol Metab. 2012;23(1):32-40.

37. Sinnott-Armstrong N, Tanigawa Y, Amar D, et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat Genet. 2021;53(2):185-194.

38. Lagor WR, Brown RJ, Toh S-A, et al. Overexpression of apolipoprotein F reduces HDL cholesterol levels in vivo. Arterioscler Thromb Vasc Biol. 2009;29(1):40-46.

39. Wang J, Yang MM, Li YB, Liu GD, Teng Y, Liu XM. Association of CFH and CFB gene polymorphisms with retinopathy in type 2 diabetic patients. Mediators Inflamm. 2013;2013:748435.

40. Pan A, Sun Q, Bernstein AM, Manson JE, Willett WC, Hu FB. Changes in red meat consumption and subsequent risk of type 2 diabetes mellitus: three cohorts of US men and women. JAMA Int Med. 2013;173(14):1328-1335.

41. Aune D, Ursin G, Veierød MB. Meat consumption and the risk of type 2 diabetes: a systematic review and meta-analysis of cohort studies. Diabetologia. 2009;52(11):2277-2287.

42. Tin Tin S, Reeves GK, Key TJ. Body size and composition, physical activity and sedentary time in relation to endogenous hormones in premenopausal and postmenopausal women: findings from the UK Biobank. Int J Cancer. 2020;147(8):2101-2115.

43. Haiman CA, Riley SE, Freedman ML, Setiawan VW, Conti DV, Le Marchand L. Common genetic variation in the sex steroid hormone-binding globulin (SHBG) gene and circulating SHBG levels among postmenopausal women: the Multiethnic Cohort. J Clin Endocrinol Metab. 2005;90(4):2198-2204.

Author notes

These authors contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)