Abstract

Sleep is a critical component of health and well-being, but collecting and analyzing accurate longitudinal sleep data can be challenging, especially outside of laboratory settings. We propose a simple neural network model titled SOMNI (Sleep data restOration using Machine learning and Non-negative matrix factorIzation [NMF]) for imputing missing rest-activity data from actigraphy, which can enable clinicians to better handle missing data and monitor sleep–wake cycles of individuals with highly irregular sleep–wake patterns. The model consists of two hidden layers and uses NMF to capture hidden longitudinal sleep–wake patterns of individuals with disturbed sleep–wake cycles. Based on this, we develop two approaches: the individual approach imputes missing data based on the data from only one participant, while the global approach imputes missing data based on the data across multiple participants. Our models are tested with shift and non-shift workers' data from three independent hospitals. For long datasets (>50 days), both approaches can accurately impute up to 24 hours of missing data, even for shift workers with extremely irregular sleep–wake patterns (AUC > 0.86). For short datasets (~15 days), on the other hand, only the global model is accurate (AUC > 0.77). Our approach can be used to help clinicians monitor sleep–wake cycles of patients with sleep disorders outside of laboratory settings without relying on sleep diaries, ultimately improving sleep health outcomes.

Statement of Significance

Wearable devices, such as actigraphy, have been considered by clinicians as an alternative to sleep diaries for monitoring the sleep–wake patterns of patients over longitudinal periods. However, missing data have limited the use of wearable devices in real-world clinical settings. Here, we propose a simple machine-learning model based on non-negative matrix factorization for the imputation of missing sleep data from wearables. We compared our model predictions against ground-truth sleep diaries of shift-working and non-shift-working populations recruited from three independent hospitals and accurately imputed missing sleep data in different real-world scenarios. This further facilitates the use of wearable devices in real-world clinical settings to improve sleep health outcomes.

Introduction

Sleep deprivation and inadequate sleep are leading threats to global public health, as more than 80% of the population live shiftwork-like lifestyles today [1, 2]. Importantly, an irregular lifestyle affects more than 70 million individuals in the United States, is a leading cause of insomnia or excessive daytime sleepiness, and may aggravate the cardiometabolic impact of obstructive sleep apnea [3–7]. To better treat patients with insomnia or excessive daytime sleepiness, knowledge of one’s circadian rhythms and long-term sleep–wake patterns is necessary. Indeed, the third edition of the International Classification of Sleep Disorders requires monitoring a patient’s sleep–wake pattern for at least 1 week, preferably 2 weeks, to diagnose circadian rhythm sleep–wake disorders [8]. In particular, it is essential to accurately observe the sleep–wake cycle over a longitudinal period to diagnose and provide the best treatment, especially for patients suspected of having non-24-hour sleep–wake rhythm disorder (e.g. blind people with free-running patterns) or shift workers with circadian-misaligned sleep-work schedules. Moreover, measuring the circadian rhythm and sleep–wake cycle can open a new window to understand the impact of various factors such as environmental conditions, stress, social factors, and endogenous factors like aging and neurodegeneration on the human body [2, 9].

To enable monitoring of the circadian sleep–wake cycle over longitudinal periods, sleep diaries have been traditionally used by clinicians. However, sleep diaries are subjective in nature and may not always be reliable [10, 11]. In addition, sleep diaries are not feasible for cognitively impaired or vision-impaired individuals who cannot fill out diaries. Alternatively, wearable devices have been proposed as a growing platform because they enable noninvasive, real-time, and personalized monitoring of physiological measurements [12, 13]. Among different wearable devices, actigraphy is a validated tool to measure rest-activity patterns, objectively assessing a person’s sleep–wake pattern [14, 15]. Actigraphy shows sleep–wake state estimated from rest-activity measured using an accelerometer, although it may not always reflect a person’s actual sleep [16, 17]. Nevertheless, the noninvasive nature of wearable devices makes recording of the sleep–wake cycles over a longitudinal period less cumbersome for patients than filling out sleep diaries.

Despite their usefulness, wearable devices have several limitations. One of the major drawbacks is their dependence on the adherence of patients or participants to wearing the device. In both clinical and research settings, it is not uncommon to encounter missing data from wearable devices, as individuals may forget to put the device back on their wrist after removing it for showering or other reasons. Missing data can pose a significant challenge to accurately measuring the sleep–wake patterns of individuals, particularly for shift workers with rotating schedules and vastly different sleep–wake patterns depending on their shifts and off-duty periods [18–20].

To handle missing data in clinical settings, researchers have proposed various approaches. Some early works used probabilistic and statistical frameworks to impute missing data arising in clinical settings. Specifically, multiple imputation (MI), which assumes an underlying distribution of the missing variable and performs multiple random draws to calculate the imputation uncertainty, has been widely used by researchers to analyze missing data [21–23]. MI has been used to successfully impute missing data in various clinical settings; for instance, it has been used to accurately diagnose pulmonary embolism and deep venous thrombosis [24, 25].

Alternatively, various deep-learning approaches have been recently proposed due to an unprecedented advancement in artificial intelligence and machine learning [26–33]. For instance, Dong et al. reformulated the missing data imputation problem as a denoising problem and used Autoencoders (AE) to identify cardiovascular diseases among individuals with regular 24-hour lifestyles [28]. Jang et al. further used the AE framework to impute missing activity data from the actigraphy [30]. Other flavors of machine learning, such as convolutional neural networks, gradient boosting, and generative adversarial networks have also been proposed and were successfully applied to analyze missing 24 hours of sleep or activity data from wearables [27, 29, 31–33].

Despite these advances, there still exist limitations in adopting previous approaches to impute missing actigraphy sleep data from individuals with irregular sleep–wake schedules over a longitudinal period. For instance, MI computes the confidence interval for the imputation results, which does not naturally apply to the binary sleep–wake label collected from wearable devices. Moreover, other deep-learning approaches (e.g. denoising AE, convolutional neural networks, and generative adversarial networks) may add unnecessary complexity and are therefore too complicated to be readily used by clinicians. Importantly, these approaches have not been validated on populations with rapidly rotating irregular shifts, whose sleep–wake patterns are far more unpredictable and difficult to impute. These limitations restrict the practical applicability of these algorithms in a clinical setting for monitoring the sleep–wake cycle of patients with irregular sleep–wake schedules or shift workers.

To address these limitations, we propose a simple neural network model titled SOMNI (Sleep data restOration using Machine learning and non-negative matrIx factorization [NMF]) for imputing missing actigraphy sleep data from individuals with highly irregular sleep–wake patterns. Specifically, our model consists of only two hidden layers and uses NMF to capture a hidden longitudinal sleep–wake pattern. We validated our model on actigraphy data collected from shift-working nurses. We similarly tested our algorithm on a shorter and independently collected actigraphy dataset of shift-working nurses. Finally, we demonstrate that our algorithm can be generalized to a population of clinically diagnosed sleep-disordered patients with relatively regular sleep–wake patterns. Our model can potentially help clinicians to better handle missing data from actigraphy and enable monitoring of the sleep–wake cycle outside the laboratory settings without relying on sleep diaries.

Methods

In this section, we describe the specific procedure and experimental protocol used for data collection. We also explain the structure of our neural network model, the input features used to construct the model, and the calculation of the optimal threshold from receiver operating characteristic (ROC) curves.

Dataset description and experimental protocol

Participants.

Nurses with rotating shift schedules were recruited as a part of the Shift-Work Sleep Intervention study (SWSI). Nurses working three rapidly rotating shifts at the tertiary hospital were enrolled. Exclusion criteria were neurologic or psychiatric disorders, sleep disorders other than shift work-related sleep problems, and current or planned pregnancy. In addition, none of the participants were using hypnotics, antidepressants, other CNS-affecting or recreational drugs, or excessive amounts of alcohol. They were instructed not to drink alcohol during the study period. The three-shift schedule consists of early morning (7 am–3 pm), evening (3 pm–11 pm), and night (11 pm–7 am) shifts. Each shift usually lasts 2 to 4 days before the next shift, with no more than 3 consecutive night shifts. Thirty shift-work participants were finally considered for analysis (mean age: 28.66 ± 7.97 years; range: 23–52; sex: female 100%). To demonstrate the algorithm’s ability to generalize to non-shift working populations, data from 16 non-shiftwork patients (mean age: 31.31 ± 14.98 years; range: 15–60; 4 male) were also included in the analysis. Out of the 16 patients, 14 were assessed for insomnia, and two were assessed for hypersomnolence. Among the patients with insomnia, nine showed delayed sleep–wake phase disorder patterns. One patient with hypersomnolence was diagnosed with narcolepsy and the other with idiopathic hypersomnia. Moreover, a dataset consisting of nurses on rotating shifts at Samsung Medical Center (SMC) was also considered for this study. Nurses on rotating shifts at SMC (mean age: 32.19 ± 4.37 years; range: 25–41) who did not have a history of psychiatric illness or systemic disease and did not take CNS-affecting drugs, including hypnotics, were recruited. The recruited participants were on the same fast-rotating 8-hour three-shift schedule as the SWSI participants. Seventeen participants whose actigraphy device did not malfunction and whose missing periods in the collected data were no longer than 3 consecutive days were included in the dataset.

Datasets and procedure.

The datasets were collected from 2019 to 2020 at Dankook University Hospital in Cheonan and from 2020 to 2021 at Ewha Womans University Seoul Hospital in Seoul. The research was approved by the Internal Committee of ethical research of Ewha Womans University Seoul Hospital (EWH) and Dankook University Hospital (DKUH) and conducted in accordance with the Declaration of Helsinki (approval numbers SEUMC 2020-06-002 for EWH and 2017-05-012 for DKUH). Before participation, all participants gave informed written consent. The participants were instructed to wear the actigraphy device for the whole 2-month study period, except while showering or for brief periods at work if necessary, preferably on the non-dominant wrist, with the sensor facing the skin on the dorsal side of the wrist. In addition, they were instructed to fill out a daily sleep diary, including their sleep–wake schedule, estimated sleep onset, total sleep time (TST), sleep latency, and wake time during sleep and naps. In the first month, they kept their habitual schedules according to their shift schedules, and in the second month, a forced sleep intervention was applied following night shifts. Data from the first 30 days of the pre-intervention period were used for this research. An actigraphy device with a solid-state piezoelectric accelerometer (Actiwatch2, Philips Respironics, Murrysville, PA) was used for 19 participants. For the rest of the participants (11 participants) and the 16 non-shift worker patients, an actigraphy device with MEMS-type accelerometers (Actiwatch Spectrum Pro, Philips Respironics, Murrysville, PA) was used. Raw data were exported by the proprietary software Actiware (Philips Respironics, Murrysville, PA) to ASCII text files that can be opened directly in Microsoft Excel.

Similarly, light exposure and activity measurements from nurses on rotating shifts at the SMC were collected at 2-minute intervals over 13 days, using actigraphy devices with either a solid-state piezoelectric accelerometer (Actiwatch2, Philips Respironics, Murrysville, PA, eight participants) or MEMS-type accelerometers (Actiwatch Spectrum Pro, Philips Respironics, Murrysville, PA, nine participants). The data were collected from May 24 to September 27, 2017. All participants from the SMC were instructed to wear the actigraphy device throughout the study, except in unavoidable circumstances such as showering or swimming. The participants were also instructed to fill out a daily sleep diary to record the estimated timing of their sleep onset, sleep offset, TST, and the times at which the watch was removed.

Inclusion and exclusion criteria.

In this study, two actigraphy datasets from shift-working nurses and one actigraphy dataset from non-shift workers were used. Because inconsistencies between the reported sleep diary and the actigraphy measurements pose a potential risk, we checked whether the sleep diary records and actigraphy measurements of the recruited participants were well matched. If discrepancies greater than 30 minutes accounted for more than 5% of the non-missing period, we excluded the data. The first shift worker dataset was from 30 nurses on rotating shifts recruited as part of the SWSI study and is referred to as the SWSI dataset. We excluded two participants because the matching rate between their sleep diary and actigraphy data was less than 95%. The second shift worker dataset was from 17 nurses on rotating shifts at the SMC and is therefore named the SMC dataset throughout this study. Among the 17 participants, the data of eight were collected using Actiwatch 2 (Philips Respironics, Murrysville, PA, USA), which does not have the ability to measure off-wrist periods. Therefore, the activity data for these eight participants were continuously marked as 0 even though they were not marked as sleep. This improper labeling could lead to inaccurate imputation results; thus, we excluded these eight participants. The actigraphy data from 16 non-shift working patients recruited from EWH were also considered and are named the EWH dataset throughout this study. All 16 participants in the EWH dataset were used for the analysis.

Missing sleep data from wearables in real-world settings.

The actigraphy data from all participants in the SWSI dataset contained missing intervals, and the ground-truth sleep–wake information of the participants during these intervals can be recovered using sleep diaries (Figure 1A). Moreover, the distribution of missing data interval length roughly followed a gamma distribution (Figure 1B), as illustrated by our best fit using the method of moments. This suggests that missing data occur frequently in wearable measurements in clinical settings, significantly affecting the clinical practicality of wearable devices. To overcome these difficulties, we constructed a simple neural network model that imputes minute-by-minute missing wearable sleep data based on the participants’ non-missing activity and sleep data (Figure 1C).
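The method-of-moments fit mentioned above can be illustrated with a short sketch. For a gamma distribution with shape α and scale β, the mean is αβ and the variance is αβ², so α = mean²/variance and β = variance/mean. The function names below are hypothetical and the code is only an illustration of the estimator, not the authors' implementation; the input is assumed to be a list of missing-interval lengths in minutes.

```python
import numpy as np

def gamma_params_from_moments(mean, variance):
    """Method-of-moments estimates for a gamma distribution.

    For shape alpha and scale beta: mean = alpha * beta and
    variance = alpha * beta**2, which gives the two formulas below.
    """
    alpha = mean ** 2 / variance  # shape
    beta = variance / mean        # scale
    return alpha, beta

def fit_gamma_moments(lengths_min):
    """Fit a gamma distribution to interval lengths (minutes, assumed input)."""
    x = np.asarray(lengths_min, dtype=float)
    return gamma_params_from_moments(x.mean(), x.var())

# Consistency check using the summary statistics quoted in the Figure 1
# caption (mean 34.3 minutes, standard deviation 32.6 minutes).
alpha_hat, beta_hat = gamma_params_from_moments(34.3, 32.6 ** 2)
print(round(alpha_hat, 2), round(beta_hat, 1))
```

Up to rounding, the values recovered from the quoted mean and standard deviation are close to the reported α = 1.1 and β = 31.1.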

Missing sleep data from wrist-worn actigraphy was imputed using a neural network model. (A) Twenty-eight shift working nurses from Ewha Womans University Seoul Hospital and Dankook University Hospital recorded a sleep diary and simultaneously wore a wrist-worn actigraphy device collecting sleep labels and activity data. Missing data limits the use of actigraphy in clinical settings, where keeping accurate sleep diaries over longitudinal periods is undesirable. Missing intervals from the actigraphy can be classified into three categories (wake, sleep, wake + sleep) based on their ground-truth label from sleep diaries. (B) Distribution of missing intervals based on their three categories, collected across all participants. The dataset is highly unbalanced, as missing intervals with ground-truth label wake outnumber missing intervals with ground-truth label sleep. Here, the distribution roughly follows a gamma distribution with a shape parameter α = 1.1 and scale parameter β = 31.1, which gives a mean of 34.3 minutes and a standard deviation of 32.6 minutes. The parameters were obtained using the method of moments. (C) The objective of this study is to use a neural network model to accurately impute missing sleep labels based on non-missing activity data and sleep labels.
Figure 1.


Feature generation and model construction.

Three features were used as input to the model for each minute-by-minute timepoint: activity, label, and NMF feature. Activity and label features were obtained by preprocessing and normalizing the activity and sleep–wake label data collected from actigraphy (Figure 2A). For the NMF feature, we exploited the known sleep–wake labels to obtain a preliminary prediction of the participants’ sleep status during the missing intervals by capturing the hidden longitudinal sleep–wake patterns, even for individuals with irregular sleep–wake cycles (see Supplementary Materials S1 for details) (Figure 2B). Using these features, the model predictions were then validated against the ground-truth sleep information obtained from sleep diaries (Figure 2C), and the model was developed into a user-friendly computational package called SOMNI. A detailed manual for the implementation of SOMNI can be found in Supplementary Materials S2.
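As a minimal sketch of the activity and label features, the snippet below follows the conventions given in the Figure 2 caption: activity is normalized by the participant's 90th percentile activity value, missing activity is set to −1, and missing labels are set to NaN. Several details are assumptions made only for illustration: the function name, the boolean `missing` mask as input, computing the percentile over non-missing samples only, and clipping values above the 90th percentile to 1 so that the feature stays within 0 to 1.

```python
import numpy as np

def make_activity_and_label_features(activity, label, missing):
    """Build per-minute activity and label features for one participant.

    activity : raw activity counts per minute
    label    : sleep-wake label per minute (0 = sleep, 1 = wake)
    missing  : boolean mask, True where the device recorded no data
    """
    activity = np.asarray(activity, dtype=float)
    label = np.asarray(label, dtype=float)
    missing = np.asarray(missing, dtype=bool)

    # Assumption: the 90th percentile is taken over non-missing minutes.
    p90 = np.percentile(activity[~missing], 90)
    if p90 > 0:
        # Assumption: values above the 90th percentile are clipped to 1.
        scaled = np.clip(activity / p90, 0.0, 1.0)
    else:
        scaled = np.zeros_like(activity)

    activity_feature = np.where(missing, -1.0, scaled)  # -1 marks missing
    label_feature = np.where(missing, np.nan, label)    # NaN marks missing
    return activity_feature, label_feature

# Toy example: six minutes of data, the fifth minute is missing.
act = [0.0, 10.0, 20.0, 40.0, 0.0, 30.0]
lab = [0, 0, 1, 1, 0, 1]
mis = [False, False, False, False, True, False]
activity_feature, label_feature = make_activity_and_label_features(act, lab, mis)
```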

Raw data from wrist-worn actigraphy are used to create a feature vector. Raw activity data and sleep labels from the actigraphy are preprocessed to generate the activity feature and label feature (A). The preprocessed activity data of each participant range from 0 to 1 after normalization by the 90th percentile activity value of the individual's data. The sleep label is either 0 or 1, where 0 represents sleep and 1 represents wake. When the data are missing, the activity feature is set to −1, and the label feature is NaN. Raw sleep labels are used to generate the NMF feature, which can effectively capture the global structural information of the sleep labels (B). NMF feature generation is performed using both non-missing sleep labels (0 or 1 for sleep and wake, respectively, in matrix V) and missing sleep labels (NaN in matrix V). Matrix decomposition returns the NMF feature, which predicts the sleep status as a probabilistic value between 0 (sleep) and 1 (wake). Note that the NMF feature predicts sleep status even when the data are missing (boxes in matrix WH that have the same position as NaN in matrix V). The activity, label, and NMF features are combined to construct the feature vector, which is then used as an input to the neural network model (C).
Figure 2.
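The idea of factorizing a label matrix V that contains NaN entries can be sketched with a masked variant of the standard multiplicative-update NMF, in which only observed entries contribute to the updates and the product WH supplies values at the missing positions. This is an illustrative sketch and not the procedure of Supplementary Materials S1: the layout of V (here, rows are days and columns are time slots within a day), the rank, the number of iterations, and the function name `masked_nmf` are all assumptions.

```python
import numpy as np

def masked_nmf(V, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Approximate V by W @ H using only the observed (non-NaN) entries.

    Returns W @ H clipped to [0, 1], which also contains values at the
    positions where V is NaN.
    """
    V = np.asarray(V, dtype=float)
    M = ~np.isnan(V)             # True where a label was observed
    V0 = np.where(M, V, 0.0)     # zero-fill so missing entries drop out
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    for _ in range(n_iter):
        WH = (W @ H) * M         # reconstruction restricted to observed entries
        W *= (V0 @ H.T) / (WH @ H.T + eps)
        WH = (W @ H) * M
        H *= (W.T @ V0) / (W.T @ WH + eps)
    return np.clip(W @ H, 0.0, 1.0)

# Toy example: six "days" with the same pattern over eight time slots
# (0 = sleep, 1 = wake); one wake slot on the third day is missing.
pattern = np.array([0, 0, 0, 1, 1, 1, 1, 0], dtype=float)
V = np.tile(pattern, (6, 1))
V[2, 4] = np.nan
nmf_feature = masked_nmf(V, rank=1)
```

In this toy case the missing slot is filled with a value close to 1 (wake) because the remaining days share the same pattern; real shift-work data would need a higher rank to represent several distinct daily patterns.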


Neural network and learning strategy.

Our model architecture includes two hidden dense layers with four units and ReLU activation functions and an output layer with a sigmoid activation function. The hidden layers provide a flexible model to learn the non-linear relationships between the sleep–wake status and our features. We used 20% of all training data as validation data. This ratio was chosen specifically to maximize the area under the curve (AUC) value of the imputation results in the SWSI dataset. The model’s number of layers, activation units, and other hyper-parameters are chosen to optimize the overall imputation performance. We summarize the hyper-parameters in Supplementary Table S1.
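To make the size of this architecture concrete, the following sketch implements only the forward pass in NumPy with randomly initialized weights. It assumes that each of the two hidden layers has four units and that the input is one value per feature (three inputs in total); the actual input dimension, the training procedure, and the handling of NaN labels are not described in this section and are not reproduced here. All function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_inputs=3, n_hidden=4, seed=0):
    """Random weights for input -> 4 -> 4 -> 1 (layer sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, n_hidden, n_hidden, 1]
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Two ReLU hidden layers followed by a sigmoid output unit."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3).ravel()  # value in (0, 1) per timepoint

params = init_params()
n_params = sum(W.size + b.size for W, b in params)

# Five timepoints with finite (activity, label, NMF) values for illustration.
X = np.array([[0.2, 1.0, 0.9],
              [0.0, 0.0, 0.1],
              [0.5, 1.0, 0.7],
              [0.1, 0.0, 0.3],
              [0.9, 1.0, 1.0]])
p_wake = forward(params, X)
```

Under these assumptions the network has only 41 trainable parameters, which illustrates how small the model is compared with typical deep-learning imputers.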

Generating artificial missing intervals to train and test neural network models.

One of the main limitations to constructing the neural network models is the imbalance between the missing intervals with the ground-truth label sleep and those labeled wake. The distribution of missing intervals in the SWSI dataset is heavily skewed towards wake because nurses on rotating shifts are more likely to take off their actigraphy device during their shifts than during sleep (Figure 1B). This imbalance can underrepresent missing intervals labeled sleep during the models’ training processes while overrepresenting missing intervals labeled wake. Thus, the models may not accurately impute missing intervals that are sleep, which leads to high sensitivity (i.e. % missing wake imputed correctly) but low specificity (i.e. % missing sleep imputed correctly). Moreover, missing data from actigraphy were generally shorter and less frequent than those from other consumer-grade wearables, as actigraphy devices tend to have a longer battery life [34, 35]. This suggests that further data processing is needed to ensure accurate imputation results and a realistic representation of missing sleep–wake data from wearable devices beyond actigraphy.

To account for this type of missing data, we created artificial missing intervals based on our prior knowledge of the statistical properties of the existing missing sleep data (Figure 1B). Specifically, we randomly selected a collection of non-missing intervals and deliberately masked them so that the selected intervals were treated as missing while training and testing the model. First, we randomly selected a time t between the start and end of the participant’s data collection. Next, to generate missing patterns as close to the original missing intervals as possible, we estimated the distribution of the existing missing intervals’ lengths. We fitted the distribution of existing missing intervals to a gamma distribution, whose shape and scale parameters were obtained using the method of moments [36, 37]. With the method of moments, we obtained a shape parameter α = 1.1 and a scale parameter β = 31.1 that can reasonably describe how the length of missing intervals in the SWSI dataset is distributed, as illustrated by the black curve in Figure 1B. From this gamma distribution, we randomly drew the missing interval length l and treated any non-missing data points between time t and t + l as missing. In cases where an artificial missing interval overlapped with an existing missing interval, the two intervals were merged (Figure 3A). Using this systematic process, we generated artificial missing data that represent the real-world missing sleep data of participants whose data are more challenging to impute.
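The masking procedure described above can be sketched as follows, under a few stated assumptions: data are stored as one sample per minute, the start time t is drawn uniformly over the recording, and the gamma-distributed length l is rounded up to a whole number of minutes. The function name and the boolean-mask representation are hypothetical choices made for this illustration.

```python
import numpy as np

def add_artificial_missing(missing, n_intervals, shape=1.1, scale=31.1, seed=0):
    """Mask randomly placed intervals with gamma-distributed lengths.

    missing : boolean array, True where data are already missing
    Returns (combined_mask, artificial_mask); artificial_mask marks only the
    previously non-missing minutes that were newly masked.
    """
    rng = np.random.default_rng(seed)
    missing = np.asarray(missing, dtype=bool)
    n = missing.size
    artificial = np.zeros(n, dtype=bool)
    for _ in range(n_intervals):
        t = int(rng.integers(0, n))                      # random start minute
        length = int(np.ceil(rng.gamma(shape, scale)))   # length l in minutes
        artificial[t:t + length] = True                  # truncated at the end
    # Only non-missing points are newly masked; overlaps with existing
    # missing intervals merge automatically in the combined mask.
    artificial &= ~missing
    return missing | artificial, artificial

# Toy example: 5000 minutes with one existing 100-minute missing interval.
existing = np.zeros(5000, dtype=bool)
existing[100:200] = True
combined, artificial = add_artificial_missing(existing, n_intervals=20)
```

Because the masked minutes still have known labels, the `artificial` mask identifies positions where imputed values can be compared against the recorded sleep–wake labels.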

Individual neural network models accurately impute missing sleep data for longer datasets. (A) Individual models were generated solely based on each participant’s actigraphy measurements. Artificial missing data were generated based on the fitted gamma distribution (Figure 1B) for both the testing and training processes (in steps 1 and 2, respectively). (B) Individual models were constructed for all participants in the SWSI dataset (56.9 ± 8.9 days). (C) For missing intervals shorter than 24 hours, the ROC curve of all individual models from the SWSI dataset (AUC = 0.829) and the participant-by-participant AUC values (0.86 ± 0.09, inset) suggest that the individual models successfully impute missing sleep information. (D) ROC curves were generated for missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours (AUC = 0.915, 0.888, 0.645, respectively). (E) The same process was applied to construct individual models for the SMC data, which is significantly shorter (15.0 ± 1.4 days). (F–G) The constructed individual models from the SMC dataset were tested. The overall ROC curve (AUC = 0.694, F), participant-by-participant variability (AUC = 0.701 ± 0.05, F inset), and ROC curves for intervals of different lengths (AUC = 0.736, 0.703, and 0.603, G) indicate that the individual models’ performance decreased for shorter datasets. (H) To demonstrate our method’s applicability to non-shift workers, the same process was repeated on the EWH dataset (15.5.8 ± 1.9 days). (I–J) Not surprisingly, the individual model’s performance was better for non-shift workers (overall AUC = 0.87, I, and AUC = 0.932, 0.922, and 0.700 for shorter intervals, J) than for shift workers. Note that for missing intervals of 3~24 hours, participant-by-participant AUC values were much higher (0.913 ± 0.041, Table 2). Each three participants has high AUC values (0.913 ± 0.041, I inset).
Figure 3.


Calculating performance metrics using an ROC curve.

The primary metric to evaluate imputation performance was the area under curve (AUC) value of the ROC curve. We also used the ROC curve to calculate the optimal threshold, which was used to compute other metrics: the overall accuracy, sensitivity, specificity, and Cohen’s κ coefficient. See Supplementary Materials S3 for details.
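The metrics named above can be computed from predicted wake probabilities as in the sketch below. Wake (label 1) is treated as the positive class, matching the definitions of sensitivity and specificity used in this paper. The criterion for the optimal threshold is described in Supplementary Materials S3 and is not reproduced here; the sketch assumes Youden's J statistic (sensitivity + specificity − 1) purely for illustration, and all function names are hypothetical.

```python
import numpy as np

def metrics_at(y_true, score, threshold):
    """Accuracy, sensitivity, specificity, and Cohen's kappa at a threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(score) >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # wake imputed as wake
    tn = np.sum((y_true == 0) & (y_pred == 0))   # sleep imputed as sleep
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Chance agreement from the marginal totals of truth and prediction.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - p_e) / (1 - p_e),
    }

def roc_auc_and_threshold(y_true, score):
    """AUC by the trapezoidal rule and the threshold maximizing Youden's J."""
    score = np.asarray(score, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(score)[::-1]))
    tpr, fpr = [], []
    for thr in thresholds:
        m = metrics_at(y_true, score, thr)
        tpr.append(m["sensitivity"])
        fpr.append(1.0 - m["specificity"])
    tpr, fpr = np.array(tpr), np.array(fpr)
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    best = thresholds[np.argmax(tpr - fpr)]      # assumption: Youden's J
    return auc, best

# Toy example with eight imputed timepoints (1 = wake, 0 = sleep).
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])
auc, best_threshold = roc_auc_and_threshold(y_true, score)
summary = metrics_at(y_true, score, best_threshold)
```

Both classes must be present in `y_true` for sensitivity, specificity, and κ to be defined.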

Results

For validation of SOMNI, we consider two realistic scenarios for data availability: in the first scenario, we consider a situation where a dataset from only one participant is available, and the model is validated within individual participants (“individual model” henceforth). In the second scenario, we consider a situation where datasets from multiple participants are available, and the model is trained on a group of participants (“global model” henceforth) and applied to another group of participants whose data were not used in the training process.

Validation of individual models on the SWSI dataset

To create training and testing datasets for individual models, we generated artificial missing data and considered them for imputation along with existing missing data to ensure robustness in the imputation performance. The lengths of artificial missing intervals were randomly determined based on the distribution of missing intervals’ lengths in the SWSI dataset (Figure 1B). First, an artificial missing interval was generated from the non-missing data for testing (Figure 3A, step 1). Next, another artificial missing interval was generated for training (Figure 3A, step 2). This procedure was repeated so that 100 artificial intervals were generated for both training and testing data. This process of individual model training and testing data generation was repeated 30 times for each participant in the SWSI dataset (n = 28, average length = 56.9 ± 8.9 days, Figure 3B) to account for the randomness in generating artificial missing data. Individual models in each repetition were then validated with their respective testing datasets.

Across all individual models, the AUC was 0.829 (Figure 3C) for missing intervals shorter than 24 hours. Missing intervals longer than 24 hours were excluded from the analysis because they were extremely rare (Figure 1B) and would be extremely challenging to impute. The optimal threshold (Figure 3C, red dot) resulted in an overall accuracy of 79.9%, with a sensitivity (i.e. % missing wake imputed correctly) of 80.8% and a specificity (i.e. % missing sleep imputed correctly) of 75.5% (Table 1) for missing data up to 24 hours. The corresponding Cohen’s κ value was 0.454, suggesting that the imputed sleep information generally agreed with the ground-truth sleep information from sleep diaries.

Table 1.

Imputation Results of Missing Sleep Data for Individual Models Across the Entire Dataset

Length of missing intervals (hours)

| Dataset | Metric | 0~24 h | 0~1 h | 1~3 h | 3~24 h |
| --- | --- | --- | --- | --- | --- |
| SWSI | AUC | 0.829 | 0.915 | 0.888 | 0.645 |
| SWSI | Accuracy | 79.9% | 85.0% | 82.5% | 68.1% |
| SWSI | Sensitivity | 80.8% | 85.0% | 82.7% | 70.8% |
| SWSI | Specificity | 75.5% | 85.2% | 81.6% | 58.6% |
| SWSI | Cohen’s κ | 0.454 | 0.542 | 0.537 | 0.241 |
| SMC | AUC | 0.694 | 0.736 | 0.703 | 0.603 |
| SMC | Accuracy | 66.5% | 70.1% | 67.2% | 61.3% |
| SMC | Sensitivity | 66.9% | 70.5% | 68.2% | 61.7% |
| SMC | Specificity | 64.4% | 67.2% | 64.5% | 56.1% |
| SMC | Cohen’s κ | 0.183 | 0.229 | 0.276 | 0.052 |
| EWH | AUC | 0.870 | 0.932 | 0.922 | 0.700 |
| EWH | Accuracy | 81.2% | 87.9% | 87.0% | 60.3% |
| EWH | Sensitivity | 80.9% | 88.6% | 88.2% | 60.0% |
| EWH | Specificity | 83.2% | 84.1% | 82.6% | 76.1% |
| EWH | Cohen’s κ | 0.442 | 0.598 | 0.646 | 0.038 |

The reported performance metrics are calculated based on the imputation results from all individual models constructed across each dataset. Here, SWSI (Shift-Work Sleep Intervention study) denotes a longer dataset from shift-working nurses; SMC (Samsung Medical Center) denotes a shorter dataset from shift-working nurses; and EWH (Ewha Womans University Seoul Hospital) denotes a dataset from non-shift working patients experiencing different sleep disorders.

Next, we created ROC curves for each participant. The individual models imputed missing intervals up to 24 hours with an AUC value of 0.862 ± 0.086 (mean ± std) across different participants (Figure 3C inset). The participant-by-participant accuracy computed with individual optimal thresholds was 81.5 ± 7.5% (Table 2). Moreover, the individual models achieved a sensitivity of 82.0 ± 7.7%, a specificity of 80.0 ± 8.4%, and Cohen’s κ of 0.490 ± 0.148.

Table 2.

Participant-by-Participant Imputation Results of Missing Sleep Data for Individual Models

Length of missing intervals (hours)

| Dataset | Metric | 0~24 h | 0~1 h | 1~3 h | 3~24 h |
| --- | --- | --- | --- | --- | --- |
| SWSI | AUC | 0.862 ± 0.086 | 0.912 ± 0.029 | 0.890 ± 0.044 | 0.772 ± 0.175 |
| SWSI | Accuracy | 81.5% ± 7.5% | 84.8% ± 3.9% | 83.6% ± 5.4% | 78.8% ± 16.2% |
| SWSI | Sensitivity | 82.0% ± 7.7% | 84.7% ± 4.7% | 84.3% ± 6.1% | 79.2% ± 18.2% |
| SWSI | Specificity | 80.0% ± 8.4% | 85.3% ± 5.0% | 81.5% ± 6.1% | 68.1% ± 25.0% |
| SWSI | Cohen’s κ | 0.490 ± 0.148 | 0.547 ± 0.090 | 0.571 ± 0.152 | 0.328 ± 0.276 |
| SMC | AUC | 0.701 ± 0.049 | 0.726 ± 0.052 | 0.703 ± 0.070 | 0.656 ± 0.192 |
| SMC | Accuracy | 67.2% ± 3.5% | 69.7% ± 4.1% | 67.5% ± 4.7% | 61.0% ± 14.3% |
| SMC | Sensitivity | 67.6% ± 3.3% | 70.1% ± 3.8% | 68.8% ± 4.3% | 57.5% ± 17.5% |
| SMC | Specificity | 65.2% ± 6.6% | 66.8% ± 6.2% | 63.9% ± 8.6% | 71.3% ± 17.9% |
| SMC | Cohen’s κ | 0.201 ± 0.072 | 0.229 ± 0.082 | 0.278 ± 0.124 | 0.163 ± 0.217 |
| EWH | AUC | 0.913 ± 0.041 | 0.963 ± 0.032 | 0.916 ± 0.051 | 0.953 ± 0.057 |
| EWH | Accuracy | 86.0% ± 4.8% | 88.1% ± 5.8% | 86.4% ± 6.4% | 74.3% ± 18.0% |
| EWH | Sensitivity | 86.3% ± 5.0% | 88.5% ± 6.5% | 86.6% ± 7.2% | 86.1% ± 17.1% |
| EWH | Specificity | 85.0% ± 5.1% | 85.5% ± 4.6% | 85.5% ± 6.6% | 76.5% ± 21.2% |
| EWH | Cohen’s κ | 0.587 ± 0.166 | 0.623 ± 0.141 | 0.666 ± 0.137 | 0.270 ± 0.343 |

Each performance metric is reported as “Mean ± SD,” where the mean and standard deviation (SD) are calculated based on the imputation results from all individual models constructed for each participant. Here, SWSI (Shift-Work Sleep Intervention study) denotes a longer dataset from shift-working nurses; SMC (Samsung Medical Center) denotes a shorter dataset from shift-working nurses; and EWH (Ewha Womans University Seoul Hospital) denotes a dataset from non-shift working patients experiencing different sleep disorders.

To further analyze how the imputation performance changes with respect to the length of missing intervals, we classified the missing intervals into three categories based on their lengths: 0~1 hours, 1~3 hours, and 3~24 hours. For each category, an average ROC curve was generated across the entire dataset (Figure 3D), and its respective AUC value was computed. As expected, the imputation performance decreased as the missing interval became longer; the average ROC curve yielded AUC values of 0.915, 0.888, and 0.645 for 0~1 hours, 1~3 hours, and 3~24 hours, respectively. The same trend was observed among other performance metrics (Table 1). For instance, the overall accuracies were 85.0%, 82.5%, and 68.1%, and Cohen’s κ values were 0.542, 0.537, and 0.241, respectively. We observed the same tendency in the participant-by-participant variability (Table 2). Notably, there was greater interindividual variability in the imputation results of longer intervals.
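The stratification by interval length can be expressed as a small grouping step applied before computing per-category ROC curves. The sketch below is illustrative only: the function names are hypothetical, and the handling of the boundaries (an interval of exactly 1 hour is placed in 0~1 hours, exactly 3 hours in 1~3 hours) is an assumption, since the text does not say which side is inclusive.

```python
def length_category(length_hours):
    """Return the category a missing interval falls into, or None if excluded."""
    if length_hours <= 0 or length_hours > 24:
        return None                 # intervals longer than 24 hours are excluded
    if length_hours <= 1:
        return "0~1 h"
    if length_hours <= 3:
        return "1~3 h"
    return "3~24 h"

def group_by_length(interval_lengths):
    """Map each category to the indices of the intervals that belong to it."""
    groups = {"0~1 h": [], "1~3 h": [], "3~24 h": []}
    for idx, hours in enumerate(interval_lengths):
        category = length_category(hours)
        if category is not None:
            groups[category].append(idx)
    return groups

# Invented interval lengths in hours; the 30-hour interval is dropped.
groups = group_by_length([0.5, 2.0, 1.0, 7.5, 30.0, 3.0])
```

Each group of indices can then be scored separately, which yields one ROC curve and one set of metrics per category.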

Validation of individual models on a short shift worker dataset

To further validate our approach for constructing individual models, we similarly constructed individual models and validated them using the SMC dataset (n = 9, average length = 15.0 ± 1.4 days, Figure 3E). Because the SMC dataset has fewer participants and a shorter data collection period than the SWSI dataset, we increased the number of repetitions of individual model training and testing for each participant from 30 to 100.

A ROC curve for all individual models across all participants yielded an AUC value of 0.694 for missing data up to 24 hours, indicating that the individual models’ overall performance significantly decreased compared to the SWSI dataset (Figure 3F). Similarly, the participant-by-participant AUC values (0.701 ± 0.049, Figure 3F inset) showed decreased performance of individual models on the SMC dataset compared to the SWSI dataset. Overall accuracy, sensitivity, specificity, and Cohen’s κ all worsened compared to the imputation results on the SWSI dataset (Tables 1 and 2). Furthermore, the AUC values for missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours all decreased compared to the SWSI dataset (Figure 3G). These results indicate that the individual model performance decreases as the length of the dataset decreases.

Validation of individual models on a non-shift worker dataset

Finally, we validated individual models using actigraphy data collected from 16 non-shift working patients from EWH, who were experiencing insomnia or hypersomnolence. The data length (15.1 ± 1.9 days, Figure 3H) for the EWH dataset was similar to that of the SMC dataset (15.0 ± 1.4 days, Figure 3E). Not surprisingly, the imputation results on the non-shift worker data were significantly better than those on both shift worker datasets because non-shift workers have much more regular sleep patterns than shift workers. Specifically, the AUC of the ROC curve of all missing intervals shorter than 24 hours across the entire dataset was 0.870 (Figure 3I). With an optimal threshold (Figure 3I red dot), the overall accuracy was 81.2%, with a sensitivity of 80.9% and a specificity of 83.2%, and the Cohen’s κ was 0.442. Figure 3J shows the ROC curves for missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours, whose AUC values are 0.932, 0.922, and 0.700, respectively (Table 1). Notably, the AUC values were higher in the participant-by-participant analysis (e.g. AUC = 0.913 ± 0.041 and 0.953 ± 0.057 for missing data of 0~24 hours and 3~24 hours, respectively, Figure 3I inset, Table 2). This emphasizes the significance of accounting for interindividual variability in imputation results. We also observed higher values for other metrics in the participant-by-participant analysis (see Table 2). Such higher accuracy for the non-shift worker dataset compared to the shift worker datasets highlights the importance of testing imputation algorithms on shift worker datasets, which has not been done in previous studies.

Construction of global models using LOSOCV

Despite the promising performance of individual models, the performance was unsatisfactory for the short SMC dataset (15.0 ± 1.4 days, Figure 3E). To circumvent this limitation, we constructed the global neural network model by training the neural network model across multiple rather than individual participants.

We first used leave-one-subject-out cross-validation (LOSOCV), holding out one participant at a time, to train global models within the SWSI dataset. Specifically, the data from a single participant in the SWSI dataset were left out, while the data from all remaining participants were used as a training dataset of a global model (Figure 4A). To test the constructed global model, we used the data from the single left-out participant to generate artificial missing intervals. The generated artificial missing intervals and existing non-artificial missing data from the left-out participant were used as a testing dataset to validate the global model trained from the remaining participants. To account for randomness in the generation of artificial missing data, this process was repeated 30 times for each global model. Hence, using LOSOCV, 840 validations (30 repetitions for each of the 28 global models) were performed within the SWSI dataset. In all the repetitions, no data point was ever used for both training and testing.
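The bookkeeping of this cross-validation scheme can be sketched as follows. The code only enumerates which participants train and which participant tests in each run; model training itself is omitted, and the participant identifiers and function names are hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (training_ids, held_out_id) once for every participant."""
    for held_out in participant_ids:
        training = [p for p in participant_ids if p != held_out]
        yield training, held_out

def validation_runs(participant_ids, n_repeats):
    """List every (training_ids, test_id, repetition) combination."""
    runs = []
    for training, held_out in leave_one_participant_out(participant_ids):
        for repetition in range(n_repeats):
            # Each repetition would regenerate the artificial missing intervals
            # of the held-out participant before testing.
            runs.append((tuple(training), held_out, repetition))
    return runs

participant_ids = [f"P{i:02d}" for i in range(1, 29)]   # 28 hypothetical identifiers
runs = validation_runs(participant_ids, n_repeats=30)
```

With 28 participants and 30 repetitions this enumerates the 840 validations mentioned above, and the held-out participant never appears in its own training set.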

Figure 4.

Global neural network model accurately imputes missing sleep data even for short data. (A) Global models were constructed and validated for the SWSI dataset (56.9 ± 8.9 days) using LOSOCV. (B) For all missing intervals shorter than 24 hours, an ROC curve across all participants (AUC = 0.87) and participant-by-participant AUC values (0.89 ± 0.04, inset) show that the global model performs well. (C) ROC curves for missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours (AUC = 0.922, 0.875, and 0.797, respectively) suggest that both individual and global models accurately impute missing sleep information for a longer dataset. (D) A global model was constructed on data from all participants in the SWSI dataset. (E–F) The global model trained with the SWSI dataset was tested on the SMC data (15.0 ± 1.4 days) and showed significant performance improvements (AUC = 0.772, E) compared to the individual models (AUC = 0.694, Figure 3F). The individual AUCs (0.84 ± 0.05, E inset) and AUCs for intervals of different lengths (0.824, 0.782, and 0.677, respectively, F) further show performance improvements compared to individual models, indicating that the global model accurately imputes missing sleep information, even for a shorter dataset. (G–H) The global model trained with the SWSI data was tested with the EWH dataset (15.5 ± 1.9 days) and achieved high overall AUC (0.945, G), individual AUCs (0.958 ± 0.019, G inset) and AUCs for intervals of different lengths (0.969, 0.960, and 0.860, respectively, H). These results indicate the global model’s applicability for imputing missing sleep information of non-shift workers.

The global models successfully imputed missing data shorter than 24 hours across the entire dataset (AUC = 0.870, Figure 4B). The overall accuracy, sensitivity, specificity, and Cohen’s κ based on the calculated optimal threshold (Figure 4B red dot) were 82.4%, 83.3%, 80.5%, and 0.614, respectively (Table 3). Furthermore, the global models successfully imputed missing intervals across different participants (AUC = 0.886 ± 0.040, Figure 4B inset). With the optimal thresholds, the participant-by-participant accuracy was 82.4 ± 5.7%, and the corresponding sensitivity, specificity, and Cohen’s κ were 83.5 ± 10.2%, 80.5 ± 8.0%, and 0.617 ± 0.100, respectively (Table 4). This suggests that the global models successfully imputed the missing sleep data across the entire SWSI dataset and different participants.

Table 3.

Imputation Results of Missing Sleep Data for Global Models Across the Entire Dataset

Length of missing intervals (hours)

| Dataset | Metric | 0~24 h | 0~1 h | 1~3 h | 3~24 h |
| --- | --- | --- | --- | --- | --- |
| SWSI | AUC | 0.870 | 0.922 | 0.875 | 0.797 |
| SWSI | Accuracy | 82.4% | 87.9% | 83.1% | 75.2% |
| SWSI | Sensitivity | 83.3% | 88.8% | 84.4% | 75.5% |
| SWSI | Specificity | 80.5% | 86.2% | 80.4% | 74.4% |
| SWSI | Cohen’s κ | 0.614 | 0.733 | 0.627 | 0.461 |
| SMC | AUC | 0.772 | 0.824 | 0.782 | 0.677 |
| SMC | Accuracy | 78.2% | 83.9% | 76.3% | 71.9% |
| SMC | Sensitivity | 80.8% | 86.4% | 81.7% | 73.3% |
| SMC | Specificity | 62.3% | 68.1% | 60.5% | 50.9% |
| SMC | Cohen’s κ | 0.322 | 0.445 | 0.405 | 0.094 |
| EWH | AUC | 0.944 | 0.969 | 0.960 | 0.860 |
| EWH | Accuracy | 90.9% | 94.2% | 92.3% | 82.2% |
| EWH | Sensitivity | 92.0% | 95.7% | 95.1% | 82.2% |
| EWH | Specificity | 83.6% | 85.4% | 82.0% | 79.8% |
| EWH | Cohen’s κ | 0.658 | 0.776 | 0.770 | 0.131 |

The reported performance metrics are calculated based on the imputation results from all global models constructed across each dataset. Here, SWSI (Shift-Work Sleep Intervention study) denotes a longer dataset from shift-working nurses; SMC (Samsung Medical Center) denotes a shorter dataset from shift-working nurses; and EWH (Ewha Womans University Seoul Hospital) denotes a dataset from non-shift working patients experiencing different sleep disorders.

Table 4.

Participant-by-Participant Imputation Results of Missing Sleep Data for Global Models

Length of missing intervals (hours)

| Dataset | Metric | 0~24 h | 0~1 h | 1~3 h | 3~24 h |
| --- | --- | --- | --- | --- | --- |
| SWSI | AUC | 0.886 ± 0.040 | 0.938 ± 0.020 | 0.895 ± 0.033 | 0.812 ± 0.064 |
| SWSI | Accuracy | 82.4% ± 5.7% | 88.0% ± 4.1% | 83.0% ± 5.4% | 75.6% ± 7.3% |
| SWSI | Sensitivity | 83.5% ± 10.2% | 89.0% ± 8.0% | 84.5% ± 9.9% | 76.1% ± 12.6% |
| SWSI | Specificity | 80.5% ± 8.0% | 86.0% ± 6.2% | 80.6% ± 8.5% | 74.3% ± 11.5% |
| SWSI | Cohen’s κ | 0.617 ± 0.100 | 0.735 ± 0.075 | 0.630 ± 0.095 | 0.470 ± 0.116 |
| SMC | AUC | 0.783 ± 0.045 | 0.812 ± 0.037 | 0.785 ± 0.065 | 0.746 ± 0.185 |
| SMC | Accuracy | 79.7% ± 4.5% | 83.2% ± 4.3% | 77.0% ± 5.7% | 74.3% ± 11.7% |
| SMC | Sensitivity | 82.6% ± 5.1% | 85.6% ± 4.7% | 83.2% ± 6.8% | 75.4% ± 13.3% |
| SMC | Specificity | 63.6% ± 8.7% | 67.6% ± 5.4% | 62.0% ± 11.5% | 65.3% ± 19.6% |
| SMC | Cohen’s κ | 0.366 ± 0.119 | 0.440 ± 0.086 | 0.426 ± 0.139 | 0.266 ± 0.270 |
| EWH | AUC | 0.958 ± 0.019 | 0.969 ± 0.017 | 0.958 ± 0.027 | 0.939 ± 0.102 |
| EWH | Accuracy | 93.1% ± 2.1% | 94.2% ± 3.7% | 91.6% ± 5.4% | 79.4% ± 20.5% |
| EWH | Sensitivity | 95.5% ± 2.7% | 96.0% ± 4.3% | 95.3% ± 5.6% | 92.4% ± 11.1% |
| EWH | Specificity | 81.7% ± 7.8% | 82.8% ± 7.1% | 81.2% ± 9.5% | 77.3% ± 26.5% |
| EWH | Cohen’s κ | 0.738 ± 0.144 | 0.784 ± 0.095 | 0.773 ± 0.128 | 0.338 ± 0.402 |

Each performance metric is reported as “Mean ± SD,” where the mean and standard deviation (SD) are calculated based on the imputation results from all global models constructed for each participant. Here, SWSI (Shift-Work Sleep Intervention study) denotes a longer dataset from shift-working nurses; SMC (Samsung Medical Center) denotes a shorter dataset from shift-working nurses; and EWH denotes a dataset from non-shift working patients experiencing different sleep disorders.

For each category of missing intervals (0~1 hours, 1~3 hours, and 3~24 hours), a ROC curve and its respective AUC value were computed across the entire dataset (AUC = 0.922, 0.875, and 0.797, respectively, Figure 4C and Table 3). Likewise, other metrics (Table 3) and participant-by-participant variations (Table 4) showed a decreasing pattern as the missing intervals became longer, similar to the individual models. While the imputation performance of the individual and global models was similar for missing intervals shorter than 3 hours (Figures 3D and 4C), the global model (AUC = 0.797, 0.812 ± 0.064, Figure 4C and Table 4) performed better than the individual models (AUC = 0.645, 0.772 ± 0.175, Figure 3D and Table 2) for the missing intervals longer than 3 hours. Nonetheless, because most missing intervals are shorter than 3 hours, as illustrated by their distribution in Figure 1B, these results indicate that the overall performance of the individual and global models was similar.

Validation of the global model on a short shift worker dataset

Next, we constructed a single global model by training a neural network model on data from all 28 shift working nurses in the SWSI dataset. The trained global model was then validated on the SMC dataset (Figure 4D). To compare the performance of the global and individual models, we tested our global model on the same testing dataset used to validate individual models within the SMC dataset.

The global model showed significant improvements compared to the individual models. Specifically, the AUC value of the ROC curve from all imputed missing intervals up to 24 hours across the SMC dataset was 0.772 (Figure 4E), significantly higher than that of the individual models (AUC = 0.694, Figure 3F). Moreover, the global model’s performance across different participants (AUC = 0.783 ± 0.045, Figure 4E inset) was superior to the individual models’ performance (AUC = 0.701 ± 0.049, Figure 3F inset). Likewise, the global model imputed missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours (AUC = 0.824, 0.782, and 0.677, respectively, Figure 4F) significantly better than the individual models (AUC = 0.736, 0.703, and 0.603, Figure 3G). Remarkably, the global model improved the sensitivity by ~16 percentage points for missing intervals shorter than 1 hour, as it achieved a sensitivity of 86.4%, while the individual models’ sensitivity was 70.5% (Tables 1 and 3). Similarly, the average sensitivity of the global model was 85.6% ± 4.7% across different participants, showing a significant improvement from the individual models’ sensitivity of 70.1% ± 3.8% (Tables 2 and 4). The same trend was observed with missing intervals of 1~3 hours, as the global model improved the sensitivity by ~13 percentage points (Tables 1–4). These results strongly indicate that the global model outperformed the individual models on a short dataset.

Validation of the global model on a non-shift worker dataset

We used a global model trained on the SWSI dataset to impute missing sleep data from 16 non-shift workers in the EWH dataset (Figure 4D). Again, to compare the performance of the global model with individual models, we validated our global model on the same testing dataset used to validate individual models within the EWH dataset.

The imputation performance of the global model was comparable to the performance of the individual models. Specifically, the global model imputed all missing intervals shorter than 24 hours with an AUC value of 0.944 (Figure 4G) across the EWH dataset and AUC values of 0.958 ± 0.019 (Figure 4G inset) across different participants. These results were significantly better than the global model’s performance on the SWSI dataset (AUC = 0.870, 0.886 ± 0.040, Figure 4B) and the SMC dataset (AUC = 0.772, 0.783 ± 0.045, Figure 4E). Likewise, the global model’s performance on missing intervals of 0~1 hours, 1~3 hours, and 3~24 hours (AUC = 0.969, 0.960, and 0.860, respectively, Figure 4H) was superior to its performance on the SWSI and SMC datasets. The same trend was observed across all other metrics, as summarized in Tables 3 and 4. These results demonstrate that our algorithm can accurately impute missing sleep data from non-shift working populations, even if the imputation model is trained from shift workers’ data.

Individual and global models can accurately predict the timing of sleep onset and offset and 24-hour TST

To further evaluate the imputation performance beyond AUC values and accuracies, we considered all the missing intervals that overlap with the transition in the sleep status (i.e. part-wake–part-sleep or part-sleep–part-wake) and estimated the timing of sleep onset and offset. Both individual and global models estimated the timing of sleep onset and offset reasonably well for missing intervals of up to 24 hours across all datasets (Figure 5). We quantified the performance using the relative error, defined as the ratio of the error in sleep onset or offset timing to the length of the baseline missing gap (e.g. if the error in timing = 90 minutes and the length of missing gap = 180 minutes, then the relative error = 50%). The relative error was then averaged over all missing intervals up to 24 hours. For the SWSI dataset, both the individual and global models estimated the timing of sleep onset and offset with similar relative error (relative error ≈ 27% for both, Supplementary Figure S1A and B). For the SMC dataset, individual models estimated the sleep onset and offset timing with a relative error of 32.4% and 34.4%, respectively (Figure 5C), while the error for the global model was 30.3% and 30.4%, respectively (Figure 5D). This indicates that the global model performed better than the individual models on the shorter shift worker dataset, consistent with their AUC values. Moreover, both the individual and global models on the EWH dataset estimated the sleep onset and offset timing more accurately (relative error < 20%, Figure 5, E and F) than on both shift worker datasets. This further demonstrates that our algorithm based on shift worker data can be generalized to impute missing sleep data from non-shift workers.
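The relative error defined above amounts to a one-line ratio followed by an average over gaps. The sketch below reproduces the 90-minute / 180-minute example from the text; the function names and the remaining input values are invented for illustration and are not taken from the datasets.

```python
def relative_error(timing_error_min, gap_length_min):
    """Error in estimated onset/offset timing as a fraction of the missing gap length."""
    if gap_length_min <= 0:
        raise ValueError("gap length must be positive")
    return abs(timing_error_min) / gap_length_min

def mean_relative_error(cases):
    """Average relative error over (timing_error_min, gap_length_min) pairs."""
    errors = [relative_error(err, gap) for err, gap in cases]
    return sum(errors) / len(errors)

# The example from the text: a 90-minute error inside a 180-minute gap.
example = relative_error(90, 180)

# Invented gaps, used only to show the averaging step.
average = mean_relative_error([(90, 180), (15, 60), (30, 120)])
```

Normalizing by gap length makes errors comparable across short and long gaps, since a 30-minute error matters far more within a 1-hour gap than within a 12-hour gap.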

Estimation of sleep onset and offset timing using individual and global models. (A–B) The timing of sleep onset (i) and offset (ii) overlapping with missing gaps was estimated using individual (a) and global models (b) within the SWSI dataset. Both models demonstrated similar relative error, defined as the ratio between the error and the baseline length of missing data. Missing data up to 4 hours were binned every 1 hour, and missing data longer than 4 hours were binned every 5 hours. The average and SEM (standard error of the mean) of the error of all missing data within each bin were plotted. (C–D) Estimation of sleep onset (i) and offset (ii) timing using the individual (c) and global models (d) for the SMC dataset. The global model estimated both sleep onset and offset timing more accurately than the individual models. Missing data were binned the same way as in the SWSI dataset. (E–F) Estimation of sleep onset (i) and offset (ii) timing using individual (e) and global models (f) for the EWH dataset. Both individual and global models estimated sleep onset and offset timing more accurately than for either shift-worker dataset.
Figure 5.


Finally, we evaluated the performance of our algorithm on a daily basis by estimating the 24-hour TST and comparing the estimates against the ground-truth 24-hour TST based on sleep diaries. Specifically, we compared the ground-truth 24-hour TST with the 24-hour TST estimated after filling in all missing data using the individual or global models, and similarly evaluated the relative error of the TST estimation. Only the 24-hour intervals with at least 30 minutes of missing data during sleep were considered in the analysis to avoid trivial cases. Both individual and global models estimated the 24-hour TST reasonably well across all datasets (Figure 6). Similar to our previous results, individual and global models for the SWSI dataset estimated the 24-hour TST similarly well after filling in all missing data up to 24 hours (relative error < 15%, Figure 6, A and B). Moreover, for the SMC dataset, the global model (relative error = 17.3%, Figure 6D) performed better than the individual models (relative error = 23.6%, Figure 6C). The TST estimate was most accurate for the EWH dataset (average error ≈ 10%, Figure 6, E and F), which further validates the generalizability of our algorithm to non-shift-working populations. In all cases, the ground-truth and estimated 24-hour TST were positively correlated (Pearson’s ρ > 0.6, R² > 0.6, Figure 6 gray lines).
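The 24-hour TST comparison can be sketched as follows. This is an illustrative reconstruction, not the SOMNI code: the 1-minute epoch length, the function names, and the toy day are assumptions. The relative error follows the definition used for Figure 6 (TST error divided by the length of the missing data), and windows with less than 30 minutes of missing data during diary-reported sleep are skipped.

```python
import math


def total_sleep_minutes(labels, epoch_min=1):
    """Total sleep time of one 24-hour window of binary labels (1 = sleep, 0 = wake)."""
    return sum(labels) * epoch_min


def tst_relative_error(true_labels, imputed_labels, missing_mask,
                       epoch_min=1, min_missing_sleep_min=30):
    """Return |TST error| / missing length, or None if the window is a trivial case."""
    missing_sleep = epoch_min * sum(
        1 for y, m in zip(true_labels, missing_mask) if m and y == 1
    )
    if missing_sleep < min_missing_sleep_min:
        return None  # too little missing data during sleep to be informative
    missing_total = epoch_min * sum(1 for m in missing_mask if m)
    tst_error = abs(
        total_sleep_minutes(imputed_labels, epoch_min)
        - total_sleep_minutes(true_labels, epoch_min)
    )
    return tst_error / missing_total


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy day (hypothetical): diary sleep in minutes 0-479, device data missing in minutes 420-539.
true_day = [1] * 480 + [0] * 960
missing_mask = [0] * 420 + [1] * 120 + [0] * 900
# The imputation places sleep offset at minute 450 instead of 480.
imputed_day = [1] * 450 + [0] * 990
toy_error = tst_relative_error(true_day, imputed_day, missing_mask)  # 30 min / 120 min
```

Across many days, `pearson_r` applied to the lists of ground-truth and estimated TST values gives the correlation reported alongside the line of best fit.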

Estimation of 24-hour TST using individual and global models across different datasets. (A–B) The 24-hour TST (total sleep time) was estimated after filling in missing data using individual (a) and global models (b) within the SWSI dataset for missing gaps of 0~24 hours. Predicted TST and diary-based ground-truth TST were compared (i). Here, the black dotted line represents perfect estimation, and the gray line (left panel) represents the line of best fit between the estimated and ground-truth TST. The error in TST estimation relative to the length of missing data was also calculated (ii). Here, the relative error was defined as the ratio between the TST error and the baseline length of the missing data. Missing gaps up to 4 hours were binned every 1 hour, and missing gaps longer than 4 hours were binned every 5 hours. The average and SEM (standard error of the mean) of the error of all missing data within each bin were plotted. (C–D) Similarly, the 24-hour TST was estimated after filling in missing data using individual (c) and global models (d) within the SMC dataset for missing gaps up to 24 hours. The global model performed significantly better than the individual models. (E–F) 24-hour TST estimates after filling in missing data using individual (e) and global models (f) for the EWH dataset for missing gaps up to 24 hours. In all cases, the predicted TST and ground-truth TST were positively correlated with a high R² value, as shown by their lines of best fit (gray lines, left panels).
Figure 6.


Discussion

In this study, we developed a new machine learning-based algorithm for imputing missing wearable sleep data collected from shift-working and non-shift-working populations. In particular, we utilized NMF to uncover important latent longitudinal sleep–wake patterns, a key feature in the algorithm’s effectiveness. We then proposed two different approaches based on data availability in real-world settings. An individual approach can be used when data from only one participant are available, while a global approach can be used when data from multiple participants are available. Their performance differs depending on data length. Specifically, for short data (~15 days) of shift-working nurses, the global model outperformed the individual model (Figures 3, F and G, 4, E and F, 5, and 6). On the other hand, for data collected from shift-working nurses over ~50 days (Figure 3, C and D, 4, B and C, 5, and 6) or non-shift-working patients regardless of the length of the data (Figure 3, I and J, 4, G and H, 5, and 6), both approaches can impute missing sleep data with similar accuracy. Because gathering long data from multiple participants is often resource-intensive and time-consuming in real-world clinical settings, these results suggest using an individual model for imputation on longer datasets (e.g. >15 days) and a global model for imputation on shorter datasets (e.g. <15 days) (Figure 7). Furthermore, because the imputation performance deteriorates as the missing intervals get longer, we recommend imputing data only if the consecutive missing interval is shorter than 24 hours.
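The recommendation above can be summarized as a small decision rule. The sketch below is only an illustration of that rule: the names (`Recommendation`, `recommend_model`, `gaps_to_impute`) are hypothetical, and treating a recording of exactly 15 days as short is an assumption, since the text only distinguishes datasets longer and shorter than about 15 days.

```python
from typing import List, NamedTuple


class Recommendation(NamedTuple):
    model: str   # "individual" or "global"
    caveat: str  # empty when the choice matches the recommendation in the text


def recommend_model(recording_days: float,
                    multi_participant_data_available: bool,
                    short_threshold_days: float = 15) -> Recommendation:
    """Choose an imputation approach from recording length and data availability."""
    if recording_days > short_threshold_days:
        # Long recordings capture the participant's own sleep-wake pattern.
        return Recommendation("individual", "")
    if multi_participant_data_available:
        # Short recordings benefit from patterns learned across participants.
        return Recommendation("global", "")
    # Only one short recording exists, so the individual approach is the sole option.
    return Recommendation("individual", "short recording: reduced accuracy expected")


def gaps_to_impute(gap_lengths_hours: List[float],
                   max_gap_hours: float = 24) -> List[float]:
    """Keep only consecutive missing intervals shorter than max_gap_hours."""
    return [g for g in gap_lengths_hours if g < max_gap_hours]


long_case = recommend_model(50, True)
short_case = recommend_model(15, True)
short_single_case = recommend_model(15, False)
kept_gaps = gaps_to_impute([0.5, 3, 23.9, 24, 30])
```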

Individual or global models can be used to accurately impute missing sleep data, depending on the length of data. (A) Sufficiently long data (greater than 15 days, e.g. the SWSI dataset) successfully capture the participant’s sleep pattern over a longitudinal period. This allows individual models to be used for the accurate imputation of missing sleep data. (B) Missing sleep data from a shorter dataset (less than 15 days, e.g. the SMC dataset) can be accurately imputed with the global model trained on other participants’ actigraphy and sleep diary data.
Figure 7.


Missing data can pose various challenges in clinical settings, especially in the field of chronobiology, where actigraphy is commonly used to record longitudinal sleep data [18–20]. Despite recent advances in statistical and machine learning-based imputation approaches, existing methods are too complicated to be readily used by clinicians. Our neural network model is considerably less complex than those in previous deep-learning-based studies [26–33] and performed significantly better than previous state-of-the-art missing data imputation methods (Supplementary Materials S5 and Figure S1). Importantly, to the best of our knowledge, this is the first study to impute missing actigraphy sleep data from individuals with disturbed sleep–wake cycles, such as shift workers. This greatly extends the applicability of our algorithm, given the high prevalence of irregular lifestyles with a potential risk of sleep disorders [2–7]. Furthermore, we provide a user-friendly computational package, SOMNI, to facilitate the potential use of our algorithm in real-world clinical settings. The applicability of SOMNI, together with its simplicity and accurate imputation performance, can make it highly beneficial for clinicians and for patients with circadian sleep–wake disorders, including shift workers, in monitoring the sleep–wake cycle over extended periods without relying on sleep diaries. Our algorithm can ultimately allow clinicians to more easily interpret puzzling gaps in patients’ sleep–wake records that result from non-wear or improper contact with the wearable device, and to better treat patients with various sleep disorders, such as circadian sleep–wake disorders or insufficient sleep syndrome.

Our model also has applicability to other problems in chronobiology, such as circadian phase estimation. Specifically, it has been reported that identifying the sleep–wake status of an individual can provide more accurate circadian phase estimation of peripheral circadian clocks (e.g. heart rate clock) using data from wearable devices [38–40] due to differences in heart rate dynamics during sleep and wake. Using our imputation model to fill in the gaps in sleep–wake measurements can improve existing circadian phase estimation methods. The more accurate estimation of the circadian phase may further advance personalized monitoring of different diseases, including viral infection and neurodegenerative diseases [41, 42], that exploit circadian information. Additionally, the imputed sleep–wake data can be used to estimate two dynamic processes affecting sleep and alertness: homeostatic sleep pressure and the circadian sleep threshold. Such estimations can be used to alleviate various issues that arise from irregular sleep patterns, particularly for shift workers, such as fatigue and low alertness [43–46]. Likewise, the imputed sleep–wake data can also facilitate future research exploring the effect of shift work on sleep regularity, sleep debt, and social jetlag over longitudinal periods [47, 48].

Although our method was developed using data collected with actigraphy, it can be easily generalized to other ubiquitous consumer-grade wearables, such as the Galaxy Watch, Fitbit, and Apple Watch. This generalization can be particularly useful in future real-world sleep research, considering that approximately 75% of the adult population in the United States has access to these devices [49]. Moreover, consumer-grade devices tend to have a much shorter battery life than research-grade wearables [35] and cannot be worn with disposable wristbands to ensure proper attachment [50], which makes them more prone to missing data. Indeed, missing data from consumer-grade wearables caused significant limitations in previous studies that utilized these devices for long-term analysis of the sleep–wake cycle [51, 52]. Since our imputation model only requires the measured step counts and known sleep labels as inputs, and because our algorithm was validated with artificial missing data that can accurately and realistically represent missing sleep data from consumer-grade devices, it can be easily generalized to impute missing sleep data arising from other consumer-grade wearable technologies.

Despite the strengths of our study, some limitations exist. The imputation performance decreases as the length of the missing interval increases (Figure 3, D and G and 4, C and F). Even though longer missing intervals (i.e. > 3 hours) are rare, as illustrated by their distributions in Figure 1B, this limitation could pose problems in real-world settings where patients may need to remove the wearable device for extended periods, but continuous monitoring of their sleep status during those times might be crucial for treatment. Because longer missing intervals are rare and likely to be underrepresented in our current model, constructing a global model across multiple datasets with missing data could address this problem. This would expose the new model to longer missing intervals during training, thereby enhancing its ability to handle them. In addition, our study is based entirely on individuals in Korea, although the datasets were independently collected from three different hospitals. We believe the proposed model satisfies external validity, as we showed that our model trained on shift-worker data (SWSI) can work on heterogeneous datasets of shift workers (SMC) as well as non-shift workers (EWH). Nonetheless, additional validation on datasets from individuals of various races and demographics is necessary to ensure the universal applicability of our model, particularly for potential use in clinical settings. Furthermore, despite the potential unreliability of sleep diaries due to their subjectivity, we still utilized sleep diaries as the ground truth for the imputation of missing sleep data. We therefore excluded from all analyses the participants who exhibited large discrepancies between their reported sleep diary and non-missing actigraphy measurements, after manually and meticulously identifying them. Nonetheless, it would be important to validate our algorithm using a dataset with a more dependable ground truth to establish its clinical applicability.

Finally, while our study focuses on imputing binary sleep–wake status, it is important to note that sleep is a complex physiological process that can be further divided into different stages, such as rapid eye movement sleep and non-rapid eye movement sleep. Thus, future research could explore expanding our model to accurately determine these detailed sleep stages using data from polysomnography or other sleep stage classification algorithms from wearables [53–56]. This would enable us to investigate the generalizability of our algorithms to more challenging problems and deepen our understanding of sleep physiology.

Funding

This study was funded by the Institute for Basic Science (IBS-R029-C3) (J.K.K.), the National Research Foundation of Korea funded by the Korean government (MSIT) (NRF-2019R1A2C1090643) (J.H.K.), 2018 Research award grants from the Korean Sleep Research Society (J.H.K.), University of Cincinnati Taft Research Center (M80941) (W.C.), and “Regional Innovation Strategy (RIS)” through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2022RIS-005) (S.P.). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Author Contribution

J.K.K. designed the study. J.H.K. collected and processed data. M.L., K.H., S.P., W.C., and J.K.K. developed algorithms and analyzed the data. M.L. and K.H. developed a computational package. M.L. and J.K.K. wrote the draft of the manuscript, and all authors revised the manuscript.

Disclosure Statement

All authors declare no financial or nonfinancial disclosure.

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code Availability

The computer codes used in this study are available online (https://github.com/Mathbiomed/SOMNI).

Simulations

All data processing and simulations were performed using Python 3.9. The feature extraction processes were performed using the Surprise package for Python, and model training and testing were performed using the PyTorch library.
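To give a sense of how NMF can extract latent sleep–wake patterns from a matrix with missing entries, the following is a minimal pure-Python sketch. It is not the SOMNI implementation and does not use the Surprise package; it applies generic multiplicative updates restricted to observed entries, which may differ from the optimizer used in the study. The function names, the rank, the days × time-bins layout, and the toy values are all assumptions for illustration.

```python
import random


def masked_nmf(V, mask, rank=2, n_iter=200, seed=0, eps=1e-9):
    """Factorize V ≈ W·H with non-negative W, H, fitting only entries where mask is 1."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = reconstruct(W, H)
        for i in range(n):  # multiplicative update of W using observed entries only
            for k in range(rank):
                num = sum(mask[i][j] * V[i][j] * H[k][j] for j in range(m))
                den = sum(mask[i][j] * WH[i][j] * H[k][j] for j in range(m))
                W[i][k] *= num / (den + eps)
        WH = reconstruct(W, H)
        for k in range(rank):  # multiplicative update of H using observed entries only
            for j in range(m):
                num = sum(mask[i][j] * V[i][j] * W[i][k] for i in range(n))
                den = sum(mask[i][j] * WH[i][j] * W[i][k] for i in range(n))
                H[k][j] *= num / (den + eps)
    return W, H


def reconstruct(W, H):
    """Matrix product W·H as nested lists."""
    rank, m = len(H), len(H[0])
    return [[sum(row[k] * H[k][j] for k in range(rank)) for j in range(m)] for row in W]


def observed_error(V, mask, W, H):
    """Sum of squared errors over observed entries."""
    WH = reconstruct(W, H)
    return sum(mask[i][j] * (V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


# Toy matrix (hypothetical): rows are days, columns are 4-hour bins,
# values are the fraction of each bin spent asleep.
night = [1.0, 0.8, 0.0, 0.0, 0.0, 0.6]
day = [0.0, 0.2, 0.9, 1.0, 0.3, 0.0]
V = [night, night, day, night, day, day]
mask = [[1] * 6 for _ in V]
mask[1][1] = 0  # pretend these two entries were not recorded
mask[4][3] = 0

W, H = masked_nmf(V, mask)
imputed = reconstruct(W, H)  # imputed[1][1] and imputed[4][3] are the filled-in values
```

Each row of `H` plays the role of a latent daily pattern and each row of `W` gives the weights of those patterns on a given day; a row or column with no observed entries cannot be recovered by this scheme and collapses to zero.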

References

1. Irish LA, Kline CE, Gunn HE, Buysse DJ, Hall MH. The role of sleep hygiene in promoting public health: a review of empirical evidence. Sleep Med Rev. 2015;22:23–36. doi: 10.1016/j.smrv.2014.10.001

2. Sulli G, Manoogian EN, Taub PR, Panda S. Training the circadian clock, clocking the drugs, and drugging the clock to prevent, manage, and treat chronic diseases. Trends Pharmacol Sci. 2018;39(9):812–827. doi: 10.1016/j.tips.2018.07.003

3. Härmä M, Tenkanen L, Sjöblom T, Alikoski T, Heinsalmi P. Combined effects of shift work and life-style on the prevalence of insomnia, sleep deprivation and daytime sleepiness. Scand J Work Environ Health. 1998;24(4):300–307. doi: 10.5271/sjweh.324

4. Laudencka A, Klawe J, Tafil-Klawe M, Zlomanczuk P. Does night-shift work induce apnea events in obstructive sleep apnea patients? J Physiol Pharmacol. 2007;58(5):345–347.

5. Vallières A, Azaiez A, Moreau V, LeBlanc M, Morin CM. Insomnia in shift work. Sleep Med. 2014;15(12):1440–1448. doi: 10.1016/j.sleep.2014.06.021

6. Booker LA, Magee M, Rajaratnam SM, Sletten TL, Howard ME. Individual vulnerability to insomnia, excessive sleepiness and shift work disorder amongst healthcare shift workers. A systematic review. Sleep Med Rev. 2018;41:220–233. doi: 10.1016/j.smrv.2018.03.005

7. Santos I, Rocha I, Gozal D, e Cruz MM. Obstructive sleep apnea, shift work and cardiometabolic risk. Sleep Med. 2020;74:132–140.

8. Medicine AAoS. International classification of sleep disorders – third edition (ICSD-3). AASM Resour Libr. 2014;281:2313.

9. Kim M, Vu T-H, Maas MB, et al. Light at night in older age is associated with obesity, diabetes, and hypertension. Sleep. 2023;46(3). doi: 10.1093/sleep/zsac130

10. Rogers AE, Caruso CC, Aldrich MS. Reliability of sleep diaries for assessment of sleep/wake patterns. Nurs Res. 1993;42(6):368–372.

11. Morin CM. Measuring outcomes in randomized clinical trials of insomnia treatments. Sleep Med Rev. 2003;7(3):263–279. doi: 10.1053/smrv.2002.0274

12. Kim DW, Zavala E, Kim JK. Wearable technology and systems modeling for personalized chronotherapy. Curr Opin Syst Biol. 2020;21:9–15. doi: 10.1016/j.coisb.2020.07.007

13. Smuck M, Odonkor CA, Wilt JK, Schmidt N, Swiernik MA. The emerging clinical role of wearables: factors for successful implementation in healthcare. npj Digital Med. 2021;4(1):45. doi: 10.1038/s41746-021-00418-3

14. Fekedulegn D, Andrew ME, Shi M, Violanti JM, Knox S, Innes KE. Actigraphy-based assessment of sleep parameters. Ann Work Expo Health. 2020;64(4):350–367. doi: 10.1093/annweh/wxaa007

15. Smith MT, McCrae CS, Cheung J, et al. Use of actigraphy for the evaluation of sleep disorders and circadian rhythm sleep-wake disorders: an American Academy of Sleep Medicine systematic review, meta-analysis, and GRADE assessment. J Clin Sleep Med. 2018;14(7):1209–1230. doi: 10.5664/jcsm.7228

16. Delaney L, Litton E, Melehan K, Huang H-CC, Lopez V, Van Haren F. The feasibility and reliability of actigraphy to monitor sleep in intensive care patients: an observational study. Crit Care. 2021;25:1–12.

17. Marino M, Li Y, Rueschman MN, et al. Measuring sleep: accuracy, sensitivity, and specificity of wrist actigraphy compared to polysomnography. Sleep. 2013;36(11):1747–1755. doi: 10.5665/sleep.3142

18. Gold DR, Rogacz S, Bock N, et al. Rotating shift work, sleep, and accidents related to sleepiness in hospital nurses. Am J Public Health. 1992;82(7):1011–1014. doi: 10.2105/ajph.82.7.1011

19. Hulsegge G, Loef B, van Kerkhof LW, Roenneberg T, van der Beek AJ, Proper KI. Shift work, sleep disturbances and social jetlag in healthcare workers. J Sleep Res. 2019;28(4):e12802. doi: 10.1111/jsr.12802

20. Tepas D, Carvalhais A. Sleep patterns of shiftworkers. J Occup Med. 1990;5(2):199–208.

21. Rubin DB. Multiple imputation after 18+ years. J Am Stat Assoc. 1996;91(434):473–489. doi: 10.1080/01621459.1996.10476908

22. Mackinnon A. The use and reporting of multiple imputation in medical research–a review. J Intern Med. 2010;268(6):586–593. doi: 10.1111/j.1365-2796.2010.02274.x

23. Hayati Rezvan P, Lee KJ, Simpson JA. The rise of multiple imputation: a review of the reporting and implementation of the method in medical research. BMC Med Res Methodol. 2015;15:1–14.

24. Van der Heijden GJ, Donders ART, Stijnen T, Moons KG. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example. J Clin Epidemiol. 2006;59(10):1102–1109. doi: 10.1016/j.jclinepi.2006.01.015

25. Janssen KJ, Donders ART, Harrell FE Jr, et al. Missing covariate data in medical research: to impute is better than to ignore. J Clin Epidemiol. 2010;63(7):721–727. doi: 10.1016/j.jclinepi.2009.12.008

26. Catellier DJ, Hannan PJ, Murray DM, et al. Imputation of missing data when measuring physical activity by accelerometry. Med Sci Sports Exerc. 2005;37(11 suppl):S555–S562. doi: 10.1249/01.mss.0000185651.59486.4e. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435061/

27. Yoon J, Jordon J, Schaar M. Gain: Missing data imputation using generative adversarial nets. Cambridge, MA, USA: PMLR; 2018:5689–5698.

28. Dong X, Zhang J, Wang G, Xia Y. DAEimp: denoising autoencoder-based imputation of sleep heart health study for identification of cardiovascular diseases. New York, NY, USA: Springer; 2019:517–527.

29. Lin S, Wu X, Martinez G, Chawla NV. Filling missing values on wearable-sensory time series data. Philadelphia, PA, USA: SIAM; 2020:46–54.

30. Jang J-H, Choi J, Roh HW, et al. Deep learning approach for imputation of missing values in actigraphy data: algorithm development study. JMIR MHealth UHealth. 2020;8(7):e16113. doi: 10.2196/16113

31. Weed L, Lok R, Chawra D, Zeitzer J. The impact of missing data and imputation methods on the analysis of 24-hour activity patterns. Clocks Sleep. 2022;4(4):497–507. doi: 10.3390/clockssleep4040039

32. Gashi S, Alecci L, Gjoreski M, et al. Handling Missing Data For Sleep Monitoring Systems. New York, NY, USA: IEEE; 2022:1–8.

33. Silva RF, Pinho BR, Monteiro NM, Santos MM, Oliveira JM. Automated analysis of activity, sleep, and rhythmic behaviour in various animal species with the Rtivity software. Sci Rep. 2022;12(1):4179.

34. Jovanov E. Preliminary analysis of the use of smartwatches for longitudinal health monitoring. Annu Int Conf IEEE Eng Med Biol Soc. 2015;2015:865–868. doi: 10.1109/EMBC.2015.7318499

35. Liang J, Xian D, Liu X, et al. Usability study of mainstream wearable fitness devices: feature analysis and system usability scale evaluation. JMIR MHealth UHealth. 2018;6(11):e11066. doi: 10.2196/11066

36. Thom HC. A note on the gamma distribution. Mon Weather Rev. 1958;86(4):117–122.

37. Gomès O, Combes C, Dussauchoy A. Parameter estimation of the generalized gamma distribution. Math Comput Simul. 2008;79(4):955–963. doi: 10.1016/j.matcom.2008.02.006

38. Bowman C, Huang Y, Walch OJ, et al. A method for characterizing daily physiology from widely used wearables. Cell Rep Methods. 2021;1(4):100058. doi: 10.1016/j.crmeth.2021.100058

39. Kim DW, Lee MP, Forger DB. A level set Kalman filter approach to estimate the circadian phase and its uncertainty from wearable data. arXiv preprint arXiv:2207.09406. 2022.

40. Kim DW, Mayer C, Lee MP, Choi SW, Tewari M, Forger DB. Efficient assessment of real-world dynamics of circadian rhythms in heart rate and body temperature from wearable data. J R Soc Interface. 2023;20(205):20230030. doi: 10.1098/rsif.2023.0030

41. Li P, Yu L, Lim AS, et al. Fractal regulation and incident Alzheimer’s disease in elderly individuals. Alzheimers Dement. 2018;14(9):1114–1125. doi: 10.1016/j.jalz.2018.03.010

42. Mayer C, Tyler J, Fang Y, et al. Consumer-grade wearables identify changes in multiple physiological systems during COVID-19 disease progression. Cell Rep Med. 2022;3(4):100601. doi: 10.1016/j.xcrm.2022.100601

43. Hong J, Choi SJ, Park SH, et al. Personalized sleep-wake patterns aligned with circadian rhythm relieve daytime sleepiness. iScience. 2021;24(10):103129. doi: 10.1016/j.isci.2021.103129

44. Klerman EB, Hilaire MS. On mathematical modeling of circadian rhythms, performance, and alertness. J Biol Rhythms. 2007;22(2):91–102. doi: 10.1177/0748730407299200

45. Vital-Lopez FG, Doty TJ, Reifman J. Optimal sleep and work schedules to maximize alertness. Sleep. 2021;44(11). doi: 10.1093/sleep/zsab144

46. Knock SA, Magee M, Stone JE, et al. Prediction of shiftworker alertness, sleep, and circadian phase using a model of arousal dynamics constrained by shift schedules and light exposure. Sleep. 2021;44(11). doi: 10.1093/sleep/zsab146

47. Phillips AJK, Clerx WM, O’Brien CS, et al. Irregular sleep/wake patterns are associated with poorer academic performance and delayed circadian and sleep/wake timing. Sci Rep. 2017;7(1):3216. doi: 10.1038/s41598-017-03171-4

48. Fischer D, Roenneberg T, Vetter C. Chronotype-specific sleep in two versus four consecutive shifts. J Biol Rhythms. 2021;36(4):395–409. doi: 10.1177/07487304211006073

49. Baron KG, Duffecy J, Berendsen MA, Mason IC, Lattie EG, Manalo NC. Feeling validated yet? A scoping review of the use of consumer-targeted wearable and mobile technology to measure and improve sleep. Sleep Med Rev. 2018;40:151–159.

50. Ancoli-Israel S, Martin JL, Blackwell T, et al. The SBSM guide to actigraphy monitoring: clinical and research applications. Behav Sleep Med. 2015;13(suppl 1):S4–S38. doi: 10.1080/15402002.2015.1046356

51. Burgdorf A, Güthe I, Jovanović M, et al. The mobile sleep lab app: An open-source framework for mobile sleep assessment based on consumer-grade wearable devices. Comput Biol Med. 2018;103:8–16. doi: 10.1016/j.compbiomed.2018.09.025

52. Roberts DM, Schade MM, Mathew GM, Gartenberg D, Buxton OM. Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables, relative to wrist actigraphy and polysomnography. Sleep. 2020;43(7). doi: 10.1093/sleep/zsaa045

53. Walch O, Huang Y, Forger D, Goldstein C. Sleep stage prediction with raw acceleration and photoplethysmography heart rate data derived from a consumer wearable device. Sleep. 2019;42(12). doi: 10.1093/sleep/zsz180

54. Boe AJ, McGee Koch LL, O’Brien MK, et al. Automating sleep stage classification using wireless, wearable sensors. npj Digital Med. 2019;2(1):131. doi: 10.1038/s41746-019-0210-1

55. Sridhar N, Shoeb A, Stephens P, et al. Deep learning for automated sleep staging using instantaneous heart rate. npj Digital Med. 2020;3(1):106. doi: 10.1038/s41746-020-0291-x

56. Radha M, Fonseca P, Moreau A, et al. A deep transfer learning approach for wearable sleep stage classification with photoplethysmography. npj Digital Med. 2021;4(1):135. doi: 10.1038/s41746-021-00510-8

Author notes

Minki P Lee and Kien Hoang contributed equally.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
