Using machine learning to extract cognitive status from the sleep EEG in progressing stages of dementia: defining interpretable and age-related features

Emerging evidence supports a bidirectional relationship between sleep and Alzheimer's disease (AD): sleep disturbances in cognitively normal older individuals may accelerate the accumulation of AD pathology, and AD pathology can in turn change underlying features of sleep [1, 2]. Much of this work has distilled the polysomnogram (PSG) to a few variables in order to identify relationships between sleep macro- and microarchitecture features and clinical diagnoses and/or biomarkers of AD. An alternative approach is to mine the PSG data for a multitude of features and train machine learning algorithms to automatically classify disease states such as mild cognitive impairment (MCI) and dementia.
In this issue of SLEEP, Ye et al. [3] train and characterize algorithms that use EEG-derived sleep macro- and microarchitecture features from the PSG to predict clinical diagnoses of MCI and dementia. From close to 11 000 clinical PSGs to which the authors had access, approximately 1000 features were computed and used to train machine learning models such as random forest, support vector machine, and simple logistic regression on clinical diagnoses of dementia or MCI as subjectively defined from the electronic medical record. These features comprised standard sleep architecture measures (e.g. total sleep time and sleep onset latency), spectral features in specific frequency bands for each stage, and statistics of detected spindles, slow oscillations, and anatomical (across-lead) coherence. Overall performance of dementia classification was similar across the 3 model types, with a maximum area under the ROC curve of 0.78, suggesting moderate discrimination. The concept of model interpretability goes beyond performance: it includes an understanding of why the model arrived at the features that led to the output discrimination [4].
A strength of this article is that the authors investigated the relevance of the input sleep features to dementia discrimination using an odds ratio analysis with false discovery rate control, akin to genomic feature selection in large-scale omics studies. These model interpretability analyses revealed spectral-specific differences, such as abnormal theta (4–8 Hz) and delta (0.5–4 Hz) activity in W/N1 stages or impaired spindles/slow oscillations in N2 sleep, that were important for classifying dementia and MCI compared to cognitively normal. Features from the model interpretation by Ye et al. that were consistent with non-machine learning studies of sleep include increased theta/alpha (8–12 Hz) power in W/N1 sleep in an AD case/control study [5].
Similarly, with regard to REM sleep, features of theta, sigma (11–15 Hz), and alpha spectral bands were top discriminative features for cognitively normal versus MCI and dementia, with REM sleep reduction and EEG slowing having been previously identified in patients with AD [6].
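To make the idea of false discovery rate control across roughly 1000 features concrete, the following is a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the commentary does not name as the method used by Ye et al.; the function name `benjamini_hochberg` and the p-values are hypothetical and for illustration only. In practice each p-value would come from a per-feature test, such as a test of whether an odds ratio differs from 1.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure (not confirmed
    as the method used in the study under discussion).
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)              # indices that sort p ascending
    ranked = p[order]
    # BH threshold for the i-th smallest p-value is alpha * i / m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject everything up to the largest passing rank
        k = np.max(np.nonzero(below)[0])
        rejected[order[:k + 1]] = True
    return rejected

# Hypothetical per-feature p-values (made up for illustration)
p_vals = [0.001, 0.008, 0.039, 0.041, 0.2, 0.6]
mask = benjamini_hochberg(p_vals, alpha=0.05)
print(mask)
```

Only features whose mask entry is true would be carried forward as discriminative. Note that the threshold tightens as the number of tested features grows, which is the point of the correction when screening on the order of 1000 features.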
Another strength of this paper is a clever approach to defining the age relationship of the top discriminative sleep features: each feature is modeled as a linear combination of disease class, age, and a class-by-age interaction term. The intuition is that a sleep feature can change with age differently in cognitively normal and diseased classes in multiple ways. For example, N2 kurtosis, a statistical measure of distribution shape, can have similar slopes across disease classes but different intercepts, while the N1 theta/alpha ratio minimum can have crossing slopes, so that the direction of the difference between classes depends on age.
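The interaction model described above can be sketched as an ordinary least-squares fit of feature = b0 + b1·class + b2·age + b3·(class × age). This is an illustrative sketch only: it assumes a single binary class indicator (0 = cognitively normal, 1 = diseased), whereas the study distinguishes cognitively normal, MCI, and dementia. The function name `fit_interaction_model` and the data are hypothetical and synthetic.

```python
import numpy as np

def fit_interaction_model(feature, disease, age):
    """Least-squares fit of feature ~ disease + age + disease * age.

    Assumption: `disease` is a 0/1 indicator (0 = cognitively normal).
    """
    feature = np.asarray(feature, dtype=float)
    disease = np.asarray(disease, dtype=float)
    age = np.asarray(age, dtype=float)
    # Design matrix columns: intercept, class, age, class-by-age interaction
    X = np.column_stack([np.ones_like(age), disease, age, disease * age])
    coef, *_ = np.linalg.lstsq(X, feature, rcond=None)
    names = ["intercept", "disease", "age", "disease_x_age"]
    return dict(zip(names, coef))

# Synthetic, noiseless data (all numbers made up for illustration)
age = np.array([60, 65, 70, 75, 80, 60, 65, 70, 75, 80], dtype=float)
disease = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
feature = 2.0 + 0.5 * disease - 0.01 * age + 0.03 * disease * age

fit = fit_interaction_model(feature, disease, age)
print(fit)
```

In this parameterization, a nonzero `disease` coefficient with an interaction coefficient near zero corresponds to parallel age slopes with different intercepts (the pattern described for N2 kurtosis), while a nonzero `disease_x_age` coefficient means the age slope itself differs between classes, which is what allows the lines to cross.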
While the authors' machine learning approach to predicting dementia stage is automated and scalable, and could in theory be applied to sleep study data acquired globally, it is important to recognize that sleep EEG features that predict dementia status may or may not translate to future "dementia risk" in cognitively normal older individuals. That said, it is reassuring that some features that distinguish dementia in their model, such as reduced spindle density, also predict tau load measured by fluid biomarkers, reflecting dementia risk, in cognitively normal older individuals [7].
The authors' findings should also be viewed with certain methodological limitations in mind. They analyzed clinical PSGs, which are performed for heterogeneous indications, most of which are likely sleep apnea evaluations. Sleep apnea is highly prevalent in older adults [8] and may serve as a risk factor for dementia [9] through mechanisms such as hypoxemic burden [10] that may not have a clear EEG correlate. Additionally, the authors utilized both diagnostic and positive airway pressure (PAP) titration studies, potentially from the same individuals, in which marked changes to the EEG can exist as a function of sleep apnea treatment [11, 12]. The authors acknowledge that the "dementia" and "MCI" clinical labels were determined through chart review and could in fact reflect disparate underlying neuropathology, for example synucleinopathies versus tauopathies, that could have a distinct impact on sleep.
Another methodological limitation is that the authors did not use an independently acquired PSG dataset to test the validity of their dementia classification models. In addition, the use of over 1000 features may introduce collinearity, particularly when the machine learning model is restricted to a linear kernel. Relatedly, additional interpretability analyses of models that might one day be used clinically are of utmost importance and should precede their use in clinical trials or diagnosis. For example, machine learning models whose predictions are evaluated for feature reduction, rule extraction, or local explanation provide a reasonable tradeoff between "black box" decisions and clinical reasoning that includes types of patient interaction not easily stored in the electronic medical record [13]. Additionally, the use of probability values for cognitive status based on multiple tests of dementia risk assessment, rather than labels dichotomized as 0/1 as in most machine learning models, would perhaps increase the utility of the proposed approach.
In summary, the study by Ye et al. takes steps toward characterizing interpretability that move our field closer to employing machine learning models in clinical sleep research, for example to identify those who may respond to early amyloid or tau therapy. To progress toward this lofty goal, additional sleep data, such as respiratory, cardiovascular, and limb movement signals, will likely contribute both to improving dementia discrimination and to reinforcing existing evidence for the role of sleep and its disorders in AD research.

Conflict of interest
AWV has served as a consultant for Jazz, Eisai, and Merck Pharmaceuticals in the last 3 years and receives grant support from the Merck Investigator Studies Program (MISP). AP receives grant support from Itamar Medical Ltd. All other authors declare no conflicts of interest.