Screening for obstructive sleep apnea in patients with cancer — a machine learning approach

Abstract
Background. Obstructive sleep apnea (OSA) is a highly prevalent sleep disorder associated with daytime sleepiness, fatigue, and increased all-cause mortality risk in patients with cancer. Existing screening tools for OSA do not account for the interaction of cancer-related features that may increase OSA risk.
Study Design and Methods. This is a retrospective study of patients with cancer at a single tertiary cancer institution who underwent a home sleep apnea test (HSAT) to evaluate for OSA. Unsupervised machine learning (ML) was used to reduce the dimensions and extract significant features associated with OSA. ML classifiers were applied to principal components and model hyperparameters were optimized using k-fold cross-validation. Trained models for OSA were subsequently tested and compared with the STOP-Bang questionnaire on a prospective unseen test set of patients who underwent an HSAT.
Results. From a training dataset of 249 patients, kernel principal component analysis (PCA) extracted eight components through dimension reduction to explain the maximum variance with OSA at 98%. Predictors of OSA were smoking, asthma, chronic kidney disease, STOP-Bang score, race, diabetes, radiation to head/neck/thorax (RT-HNT), type of cancer, and cancer metastases. Of the ML models, PCA + random forest (RF) had the highest sensitivity (96.8%), specificity (92.3%), negative predictive value (92%), F1 score (0.93), and ROC-AUC score (0.88). The PCA + RF screening algorithm also performed better than the STOP-Bang questionnaire alone when tested on a prospective unseen test set.
Conclusions. The PCA + RF ML model had the highest accuracy in screening for OSA in patients with cancer. History of RT-HNT, cancer metastases, and type of cancer were identified as cancer-related risk factors for OSA.


Introduction
Obstructive sleep apnea (OSA) is a highly prevalent medical condition with nearly one billion people affected globally [1]. In spite of its rising prevalence [2,3], up to 80% of patients with moderate-to-severe OSA remain undiagnosed [4]. OSA confers deleterious quality-of-life symptoms (e.g. daytime sleepiness and fatigue) [5,6] and is associated with numerous medical conditions [7,8], including cancer [9,10]. OSA is implicated in oncogenesis [11,12], tumor progression [13], and all-cause cancer mortality [14-16]. Preclinical [17,18] and epidemiological studies [14] have proposed that chronic intermittent hypoxia (IH), a pathophysiologic feature of OSA, contributes to tumor growth and metastasis. To date, clinical studies have assessed the prevalence of OSA in specific types of cancers [19,20], yet there is no established screening tool for OSA in patients with cancer.
OSA is characterized by recurrent upper airway (UA) collapse during sleep, resulting in reduction or cessation of airflow despite breathing efforts. Certain cancer treatments can affect the pharyngeal airway and contribute to the development of OSA. Radiation therapy to the head and neck can exacerbate UA narrowing through fibrosis, pharyngeal dilator dysfunction, and reduction in posterior genioglossus muscle tone [20-22]. Hormonal therapy may also increase risk for OSA. Female hormones (e.g. estrogen and progesterone) increase UA muscle tonicity and stimulate ventilation [23,24]. Aromatase inhibitors are a class of anticancer drugs that block estrogen production, thereby reducing the availability of estrogen to protect against UA collapse during sleep [24].
Current screening tools for OSA do not incorporate cancer-specific features that can potentially distort or worsen UA anatomy [21,24]. While the STOP-Bang questionnaire [25] is a commonly used screening questionnaire validated for the preoperative setting and sleep clinics, its reliability in patients with cancer may differ from that in the general population [26,27]. For instance, weight and age have variable prevalence in patients with cancer. Thus, the accuracy of the STOP-Bang questionnaire in the population with cancer is unknown, and the questionnaire may be less relevant to this population.
Screening tools and predictors for OSA in patients with cancer could be studied using conventional statistics based on p-values and Gaussian distributions. However, in this era of precision medicine, real-life populations often do not fit a linear structure. Machine learning (ML) algorithms enable analysis of heterogeneous conditions without imposing restrictions on the study population [28]. ML is highly effective in disease classification and determining predictors of disease [29]. The objectives of this study were to develop an accurate screening tool for OSA among patients with cancer using ML and to identify the most significant features predicting OSA in this population.

Study participants and design
This was a retrospective study of adult patients at Memorial Sloan Kettering Cancer Center who completed a home sleep apnea test (HSAT) from February 25, 2019 to February 15, 2021. All HSAT referrals were reviewed for appropriateness according to the American Academy of Sleep Medicine practice guidelines [30] by a board-certified sleep physician. This study was approved by the Memorial Sloan Kettering Cancer Center Institutional Review Board and complies with the Health Insurance Portability and Accountability Act.
Baseline demographics and clinical characteristics from the date of HSAT were electronically extracted from the institutional database. Types of cancer and comorbid conditions were obtained using International Classification of Diseases (ICD)-9 and ICD-10 codes. Chart review was performed by a physician to verify the type of cancer, history of radiation to the head/neck/thorax (RT-HNT), and aromatase inhibitor administration.
The Alice Night One (Philips, USA) is a portable, type 3 sleep study device used for OSA diagnosis. A sleep technologist or sleep physician demonstrated how to use the device in person. Patients were subsequently issued the same HSAT device for overnight use in their homes.
All portable sleep studies were scored by a board-certified sleep physician using the Centers for Medicare and Medicaid Services (CMS) scoring criteria [32]. The CMS scoring criteria were used because they are universally accepted by insurance payors. An apnea was defined as ≥90% reduction in airflow lasting > 10 seconds. A hypopnea was scored if there was a partial (≥30%) reduction in airflow for > 10 seconds accompanied by a 4% arterial oxygen desaturation from baseline. OSA severity was calculated using the respiratory event index (REI), defined as the total number of hypopneas and apneas scored divided by monitoring time (MT). MT was defined as total recording time minus periods of artifacts, as determined by lack of signal from plethysmography. Post-study questionnaires were completed by patients to record their perception of estimated total sleep time (TST), which was compared with the HSAT's estimated total MT. Patients were asked to repeat the HSAT or pursue an in-laboratory polysomnogram if the discrepancy between estimated TST and estimated total MT was greater than 1 hour. The REI correlates well with the apnea-hypopnea index (AHI) but may be lower since the denominator (i.e. MT) is larger than TST [33]. For simplicity and standard convention, "REI" was used as a surrogate for AHI and will be referred to as "AHI" in this paper.
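The REI definition above can be illustrated with a short sketch. The function names, argument names, and example counts below are hypothetical and are not taken from the study; the sketch assumes that MT is expressed in hours, so that the index is reported in events per hour, and it mirrors the 1-hour discrepancy rule described for repeat testing.

```python
def respiratory_event_index(apneas, hypopneas, recording_minutes, artifact_minutes):
    """Return the REI in events per hour of monitoring time (MT).

    MT is total recording time minus artifact time, as defined in the text.
    """
    monitoring_minutes = recording_minutes - artifact_minutes
    if monitoring_minutes <= 0:
        raise ValueError("monitoring time must be positive")
    return (apneas + hypopneas) / (monitoring_minutes / 60.0)


def needs_repeat_study(estimated_tst_hours, monitoring_hours, max_gap_hours=1.0):
    """Flag a study when patient-estimated TST and MT differ by more than 1 hour."""
    return abs(estimated_tst_hours - monitoring_hours) > max_gap_hours


# Illustrative numbers only: 72 scored events over 7 hours of clean signal.
rei = respiratory_event_index(apneas=30, hypopneas=42,
                              recording_minutes=450, artifact_minutes=30)
print(round(rei, 2))
print(needs_repeat_study(estimated_tst_hours=5.5, monitoring_hours=7.0))
```

Because MT is at least as long as true sleep time, the same event count divided by MT gives a value no higher than it would divided by TST, which is why the REI may be lower than the AHI.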

Statistical analysis
The chi-squared and Fisher's exact tests were used to determine statistical significance of the categorical variables. The Shapiro-Wilk test was performed to check for the normality in data distribution. The ANOVA and Kruskal-Wallis tests were conducted to examine the statistical significance between the continuous variables and categorical variables. The Spearman correlation test was conducted on the continuous variables. A correlation heatmap was plotted to color code the correlation coefficients and examine the positive and negative correlation between the continuous variables in our dataset. The p-values and population proportion with a 95% confidence interval (CI) were computed to evaluate the statistical association between OSA and other variables. p < 0.05 was considered statistically significant.
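As an illustration of the population proportion with a 95% CI, the sketch below uses the normal-approximation (Wald) interval. The choice of the Wald method is an assumption, since the text does not state which interval was used, and the function name `proportion_ci` is hypothetical. The counts in the example (205 patients with OSA out of 249) are those reported for the training dataset.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Wald (normal-approximation) interval for a population proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near the edges.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# 205 of the 249 training patients had OSA on HSAT (counts from the text).
p, lower, upper = proportion_ci(205, 249)
print(f"{p:.3f} ({lower:.3f}, {upper:.3f})")
```

For small subgroups or proportions close to 0 or 1, a Wilson or exact interval is generally a safer choice than the Wald interval shown here.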

ML approach
ML has been utilized in medicine to predict two categories of clinical outcomes using binary classifiers [34,35]. In this study, we conducted the following steps to predict "OSA" or "no OSA" in patients with cancer. Programming for statistical testing and ML was performed in Python (version 3.4, with the scikit-learn package).

Assessing linearity of features.
Baseline demographics and clinical characteristics (Table 1) were used as features for analysis. The linearity of the features was tested using linear regression and visually plotted to confirm the presence of nonlinear relationships.
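One way to read a linearity test of this kind is through the coefficient of determination. The sketch below is illustrative only and does not reproduce the study's analysis; the function name and the toy labels are hypothetical. It shows that R-squared is 0 when predictions equal the mean of the outcome and drops below 0 when predictions are worse than that constant baseline.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy binary outcome (hypothetical, not study data).
y_true = [0, 1, 1, 0, 1]

# Predicting the mean everywhere gives R-squared = 0 (the "horizontal" baseline).
baseline = [0.6] * len(y_true)
# Predictions that are systematically wrong give a negative R-squared.
poor_fit = [1, 0, 0, 1, 0]

print(r_squared(y_true, baseline))
print(r_squared(y_true, poor_fit))
```

A negative value therefore signals that a linear model explains the outcome less well than a constant prediction, which motivates non-linear methods.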

Feature extraction using dimensionality reduction of variables.
In clinical datasets, multiple features and a high number of categorical variables can give rise to multicollinearity and the "curse of dimensionality" [36,37]: as more features are added and the data occupy higher dimensions, the number of samples in each dimension decreases. Thus, more samples are required in each dimension for numerical analysis and ML. Statistical testing is challenging when evaluating sparse samples in higher dimensions.
In regression and classification experiments, feature selection and elimination is the first step [38]. Kernel principal component analysis (PCA), an advanced unsupervised ML method, was applied to extract the most significant features from the data and reduce the dimensions of the feature space into a non-linear subspace [39]. This technique eliminates insignificant features and determines the significant features ("principal components") that explain the maximum variance with our target variable, OSA.

Classification of OSA.
Linear (logistic regression, LR) and non-linear (k-nearest neighbors, KNN; random forest, RF) ML classifiers were applied to the principal components following kernel PCA. The classifiers operated on the principal components to predict whether a patient is at risk of OSA.
Training and hyperparameter optimization.
The ML models, tuned by hyperparameter optimization to perform best, were trained on our dataset obtained from February 25, 2019 to February 15, 2021. We used 80% of the dataset to train the algorithms and the remaining 20% as the "validation set" based on standard convention [40]. The model hyperparameters were optimized using a nested cross-validation technique (k-fold, k = 5) with a grid search to fine-tune model performance and estimate performance on five different folds. This technique guards against overfitting and biased outcomes by selecting the model with the best-performing parameters using a series of training, validation, and testing splits. The number of components and kernel type were optimized for PCA. The "linear" kernel was selected for the PCA + LR model; the "sigmoid" kernel was selected for the PCA + KNN and PCA + RF models. The optimum parameters were subsequently chosen based on the performance accuracy of OSA risk classification.
The diagnostic performance of the ML models and the STOP-Bang questionnaire (as the only feature) were compared in the validation set. STOP-Bang analysis was performed using LR.
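The workflow described above (80/20 split, kernel PCA followed by a classifier, and a 5-fold grid search over the number of components and kernel type) could be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (generated with `make_classification`), the scaling step, the grid values, and the ROC-AUC scoring choice are assumptions, and the grid is limited to the linear and RBF kernels because the sigmoid kernel is not positive semidefinite in general and can cause numerical trouble on arbitrary synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 249-patient feature table (NOT study data).
X, y = make_classification(n_samples=249, n_features=20, n_informative=8,
                           weights=[0.2, 0.8], random_state=0)

# 80% for training, 20% held out as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Kernel PCA for dimension reduction, then a random forest on the components.
pipeline = Pipeline([
    ("scale", StandardScaler()),          # assumption: features are standardized
    ("kpca", KernelPCA()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: number of components, kernel type, and forest size.
param_grid = {
    "kpca__n_components": [4, 8],
    "kpca__kernel": ["linear", "rbf"],
    "rf__n_estimators": [50, 100],
}

# 5-fold cross-validated grid search on the training portion only.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# With scoring="roc_auc", score() reports ROC-AUC on the held-out 20%.
val_auc = search.score(X_val, y_val)
print(search.best_params_)
print(round(val_auc, 3))
```

Keeping the kernel PCA step inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the components. A nested estimate could be obtained by passing `search` itself to `cross_val_score` with an outer 5-fold split.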

Prospective testing.
We tested our ML models on a prospective unseen test set of new patients from February 16, 2021 to June 30, 2021. Performance metrics of prediction accuracy and receiver operating characteristic-area under the curve (ROC-AUC) scores were obtained for each ML model and compared against the STOP-Bang questionnaire to evaluate the ability to screen for OSA.
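The screening metrics reported in this paper (sensitivity, specificity, predictive values, and F1 score) all derive from the confusion matrix of predicted versus HSAT-confirmed labels. The sketch below shows the standard definitions; the function name and the example labels are hypothetical and unrelated to the study data, and the code assumes that both classes and both predicted labels occur so that no denominator is zero.

```python
def screening_metrics(y_true, y_pred):
    """Compute screening metrics from binary labels (1 = OSA, 0 = no OSA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of PPV and sensitivity
    }


# Hypothetical labels for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

metrics = screening_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a screening tool, sensitivity and negative predictive value are usually the most consequential of these, since a missed case is not referred for testing.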

Patient population
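We evaluated a total of 340 patients who completed a sleep study at our institution from February 25, 2019 to February 15, 2021; 249 participants fulfilled all inclusion and exclusion criteria (Supplementary eFigure S1). Table 1 summarizes the baseline characteristics for those found to have OSA (n = 205) and no OSA (n = 44) on HSAT in our training dataset. No statistical significance was found between the OSA group and the group without OSA in terms of variability of gender or race; 73% of the OSA group were male whereas 59% of the patients without OSA were male (Table 1).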

Statistical analysis
AHI was found to have the highest positive correlation (correlation coefficient 0.46) with the STOP-Bang score and a negative correlation with O2 nadir (Supplementary eFigure S2). Distinct positive correlations were observed between STOP-Bang score and neck size and between BMI and neck size (Supplementary eFigure S2). No significant difference in the population proportion (95% CI) for risk of OSA was found between the subgroups with BMI < 30 kg/m² and BMI ≥ 30 kg/m², or across Epworth Sleepiness Scale subgroups (Table 1).

ML approach
Non-linearity of data.
We calculated the coefficient of determination (R-squared) to estimate the goodness of fit of a linear model with the dataset. A negative R-squared value (−2.42) was obtained from the linearity test on the dataset using linear regression, which indicated that the linear model could not fit the data optimally and fit worse than a horizontal hyperplane [41]. Visual plotting confirmed the non-linear relationships between features (Figure 1A-D subplots).
The characteristics listed in Table 1 were the features evaluated for the ML model. The application of kernel PCA extracted eight principal components through dimension reduction into the non-linear subspace (Figure 1E, F subplots). These components explain the maximum expected variance with OSA at 98%. PCA analysis revealed that history of RT-HNT (0.66, PC5), ever-smoker (0.62, PC7), asthma (0.61, PC6), CKD (0.61, PC8), cancer metastases (0.58, PC5), STOP-Bang score (0.57, PC3), race (0.481, PC1), type of cancer (0.459, PC2), and diabetes (0.49, PC6) established heavy loading in the 8 principal components. The eigenvector coefficients in parentheses describe the maximum loading of the feature for the specific principal component.

Classification of OSA.
PCA followed by classification with non-linear ML classifiers outperformed LR, with the ROC-AUC being highest for the PCA + RF algorithm (0.88), followed by PCA + KNN (0.82) and PCA + LR (0.80).
Varying sample sizes from the validation set were used to evaluate the performance of the ML classifiers using true positive rate versus false positive rate (Figure 2). LR without kernel PCA performed better for smaller test samples only (ROC-AUC scores of 0.81 and 0.88 for 10 and 20 samples, respectively); its ROC-AUC declined as the sample size increased (0.77 for 30 samples; 0.71 for 50 samples). LR performance improved when combined with kernel PCA; ROC-AUC scores for 10, 20, 30, and 50 samples were 0.95, 0.96, 0.86, and 0.81, respectively.
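The ROC-AUC scores quoted above can be interpreted as the probability that a randomly chosen patient with OSA receives a higher predicted score than a randomly chosen patient without OSA. The sketch below computes the ROC-AUC directly from that pairwise definition; it is an illustrative stand-in for a library routine, and the function name and example scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical predicted probabilities for five patients.
y_true = [1, 0, 1, 1, 0]
scores = [0.90, 0.40, 0.35, 0.80, 0.10]

auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

With only 10 or 20 test samples the number of positive-negative pairs is small, so the ROC-AUC can take only a limited set of values and a single reordered pair shifts it noticeably. This is one reason to read the small-sample scores with caution.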

Prospective unseen test set.
In assessing the performance of the ML models on our prospective unseen test set (n = 15), the PCA + RF model obtained the highest F1 score (0.82), sensitivity (81.7%), and specificity (81.3%) (Table 3). This was followed in performance by the PCA + KNN model, PCA + LR, and LR, respectively. The PCA + RF and PCA + KNN models outperformed the STOP-Bang questionnaire for all performance metrics (Table 3).

Discussion
This novel study utilized ML to develop a predictive screening tool for OSA in patients with cancer and identify predictors for OSA in this population. Characteristics unique to patients with cancer may contribute to OSA risk, including history of RT-HNT, type of cancer, and cancer metastases. Our ML screening tool incorporates these features to improve precision screening for OSA in patients with cancer. The PCA + RF ML model performed best at screening for OSA in patients with cancer compared to the STOP-Bang questionnaire, which is typically used for screening in the general population.
ML can facilitate improved clinical decision-making and precision medicine. Traditional statistical analyses operate on a best-fit model; however, real-life populations, especially cancer patients, are heterogeneous and non-linear. We were able to visualize evidence for this in our dataset (Figure 1A-D subplots), which showed patient characteristics scattered in multiple dimensions. The different test samples (Figure 2) demonstrated that traditional LR sensitivity and specificity performance were improved with the addition of the unsupervised ML technique, PCA, for dimension reduction and feature extraction prior to classification. Consequently, we achieved a higher prediction accuracy for OSA using an ML model with PCA and non-linear classifiers compared to the linear classifier, LR.
Non-linear ML models perform better in real-life datasets because they make no a priori statistical assumptions about variables, can train through continuous supervised or unsupervised learning mechanisms on intrinsic dimensions, and accommodate non-linear characteristics of variables, which leads to a reduction in biased outcomes [36]. The high sensitivity, specificity, positive and negative predictive values, F1, and ROC-AUC scores of our ML algorithms, cross-validated on the validation set and tested on the prospective unseen test set, further confirm the reliability of our PCA + RF model as a robust tool for screening OSA in patients with cancer.

Figure 1. Non-linearity of data. Subplots A-D show the features scattered in 3D space, indicating non-linear relationships. (E) is a projection of the eight principal components that contribute to the maximum variance (98%) of the features with the target variable "OSA." (F) is a projection of the components on a non-linear subspace with reduced dimensionality that appears like clusters, upon which the ML classifiers were then applied for pattern recognition.

Predictors of OSA in patients with cancer.
Patient characteristics were used as input features to build our ML framework to screen for OSA in patients with cancer. The PCA + RF ML algorithm revealed the following features to be the most significant predictors of OSA: history of RT-HNT, ever-smoker, asthma, CKD, cancer metastases, STOP-Bang score, race, diabetes, and type of cancer. Although the p-values for history of RT-HNT, smoking history, CKD, race, and type of cancer did not reach statistical significance in our statistical analysis, the heavy loading of these variables with individual principal components in our kernel PCA signifies a strong relationship between the interaction of these features with the principal components. This is insightful because a p-value above the 0.05 significance threshold does not always reflect the actual impact of a feature when dealing with smaller sample sizes. Moreover, the p-value is not a definitive marker of a feature's "significance" or of the probability that the null hypothesis is true, but rather a measure of how compatible the data are with the null hypothesis.
Cancer-related predictors of OSA.
We observed that history of RT-HNT, cancer metastases, and type of cancer were significant in the prediction of OSA in patients with cancer. It is well-established that craniofacial anatomy plays a large role in the development and severity of OSA; restriction of the UA by altered skeletal morphology or excessive soft tissue endangers airway patency and predisposes to airway collapse [42]. Both early (i.e. edema) and sustained (i.e. deconditioned airway musculature, fibrosis, and stenosis) airway complications from RT-HNT are plausible mechanisms for increased susceptibility to OSA (Figure 3). IH, one of the physiologic consequences of OSA, may enhance metastatic potential, as supported by animal model studies [43,44]. IH increases the expression of hypoxia-vascular endothelial growth factor-A; subsequent angiogenesis can support tumor proliferation and metastasis [44]. IH also upregulates hypoxia-inducible factor-1-α, activating the transcription of genes involved in angiogenesis, invasion, and metastasis [45].
The association between cancer type and OSA identified by our ML algorithm, namely for lung, prostate, and hematologic cancers, is consistent with previous clinical studies [46,47]. The increased incidence of hematologic malignancies with OSA was recently described in a South Korean population [48,49]. One of the proposed theories for the relationship is the shared risk factor of obesity for OSA and hematologic malignancies [50].

Limitations
Our study consisted of 249 patients with cancer, which is, to the best of our knowledge, the largest sample size to be screened for OSA using ML in patients with cancer. This study, however, has limitations. First, there was a lower number of patients with certain cancer-relevant features such as history of RT-HNT and specific subtypes of cancer, which may affect model performance. To address this limitation, our study used the 95% CI of population proportion to verify the similar prevalence of features across subgroups. All cancer-relevant features were also chart reviewed to ensure integrity of the data. Our dataset had no missing values, which increases the strength of our ML algorithm and reduces inaccuracy. Furthermore, our ML model was tested on a prospective unseen test set, which confirmed exceptional performance accuracy. Future studies should incorporate a database with a more uniform distribution of cancer-relevant features to enhance model performance.
Second, OSA was established using HSATs, which can underestimate OSA severity and potentially mislabel patients with OSA as not having OSA. This can, in turn, affect our ML training. To address the inaccuracies in HSAT reporting, HSAT post-study questionnaires with estimated time slept were corroborated with total recording time to improve HSAT reliability. We also used a binary classification of OSA (with or without) to train our screening model. Subsequent studies aiming to predict OSA severity can use larger sample sizes, including samples across all severity groups, and integration of polysomnogram data to train ML classifiers.
Third, we did not validate our ML model with an external test set. Although we used a prospective unseen test set for validation, external validation sets from different institutions would be warranted in future studies to ensure a similar discriminative ability of our model.

Conclusions
To the best of our knowledge, this is the first time that an ML algorithm has been designed to screen for OSA in patients with cancer. ML identified features unique to patients with cancer, which are currently not included in traditional screening tools for OSA. Our ML algorithm was based on 249 patients, devoid of null values, and reproducible in performance on unseen data. It incorporated features related to cancer and patient demographics (e.g. race) to increase precision for this population and reduce the probability of biased outcomes. Given the potential impact of OSA on morbidity and mortality in patients with cancer, it is important to screen for OSA in this group. Future work will focus on expanding the sample size. If the ML algorithm performance remains robust, this ML model could serve as an efficient and cost-effective tool to accurately screen for OSA and improve access to care. Integration of this ML algorithm into an electronic health record system may facilitate screening of OSA and reduce health inequities.

Figure 2. The receiver operating characteristic curves for ML models. The receiver operating characteristic curves for prediction of OSA using the different model techniques on test sets with (A) 10 samples, (B) 20 samples, (C) 30 samples, and (D) 50 samples. The legend displays the ROC-AUC score for each model in the respective subplots.

Figure 3. Airflow limitation in OSA. (A) In normal sleep, UA patency is maintained. (B) OSA occurs when there is a narrowing of the UA space with airflow limitation during sleep due to interactions between unfavorable anatomic UA susceptibility and sleep-related changes in UA function. (C) Patients who receive radiation therapy to the head and neck may have subsequent anatomic UA modification (e.g. stenosis), which can also cause airflow limitation during sleep and result in nascent or worsening existing OSA.

Table 1. Participant Characteristics of Dataset Used for Statistical Analysis, Training, and Validating the ML Model. p-values Represent Single Chi-squared Tests Examining for a Relationship With OSA. 95% Confidence Interval Pertains to the Number of People With OSA (Population Proportion) With Each Respective Participant Characteristic

25, 2019 to February 15, 2021; 249 participants fulfilled all inclusion and exclusion criteria (Supplementary eFigure S1). Table 1 summarizes the baseline characteristics for those found to have OSA (n = 205) and no OSA (n = 44) on HSAT in our training dataset. No statistical significance was found between the OSA group and the group without OSA in terms of variability of gender or race; 73% of the OSA group were male whereas 59% of the patients without OSA were male (Table

Table 2. Diagnostic Performance of ML Models and the STOP-Bang Questionnaire for Screening of Obstructive Sleep Apnea in Patients With Cancer From the Validation Set