Comparing polysomnography, actigraphy, and sleep diary in the home environment: The Study of Women’s Health Across the Nation (SWAN) Sleep Study

Abstract

Study Objectives
Polysomnography (PSG) is considered the “gold standard” for assessing sleep, but cost and burden limit its use. Although wrist actigraphy and self-report diaries are feasible alternatives to PSG, few studies have compared all three modalities concurrently across multiple nights in the home to assess their relative validity across multiple sleep outcomes. This study compared sleep duration and continuity measured by PSG, actigraphy, and sleep diaries and examined moderation by race/ethnicity.

Methods
Participants from the Study of Women’s Health Across the Nation (SWAN) Sleep Study included 323 White (n = 147), African American (n = 120), and Chinese (n = 56) middle-aged community-dwelling women (mean age: 51 years, range: 48–57). PSG, wrist actigraphy (AW-64; Philips Respironics, McMurray, PA), and sleep diaries were collected concurrently in participants’ homes over three consecutive nights. Multivariable repeated-measures linear models compared time in bed (TIB), total sleep time (TST), sleep efficiency (SE), sleep latency (SL), and wake after sleep onset (WASO) across modalities.

Results
Actigraphy and PSG produced similar estimates of sleep duration and efficiency. Diaries yielded higher estimates of TIB, TST, and SE versus PSG and actigraphy, and lower estimates of SL and WASO versus PSG. Diary SL was shorter than PSG SL only among White women, and diary WASO was lower than PSG and actigraphy WASO among African American versus White women.

Conclusions
Given concordance with PSG, actigraphy may be preferred as an alternative to PSG for measuring sleep in the home. Future research should consider racial/ethnic differences in diary-reported sleep continuity.


Introduction
Sleep measurement methods may influence the results and interpretation of epidemiological, experimental, and clinical sleep studies, emphasizing the importance of understanding how various sleep assessment modalities compare to one another. In addition to practical and logistical factors, such as cost and participant burden, the selection of measurement modality is often dictated by the outcome of interest. For example, polysomnography (PSG) may be used to quantify physiological characteristics of sleep (e.g. sleep architecture) and nocturnal physiology (e.g. sleep-disordered breathing [SDB], autonomic activity during sleep) [1], whereas self-report may be used to assess qualitative dimensions of sleep (e.g. how rested one feels upon awakening) [2]. Wrist actigraphy, in which sleep is inferred from lack of movement, is useful for measuring naturalistic rest-activity patterns and habitual sleep, as data are collected continuously and noninvasively over many days [3,4]. While some outcomes are unique to a specific measurement modality, indices of sleep duration and continuity can be measured by multiple modalities including objective (e.g. physiological [PSG] and behavioral [actigraphy]) and subjective (e.g. self-report) assessments. PSG, actigraphy, and self-report sleep diaries are three primary modalities by which sleep is measured.
Because PSG directly measures brain electrophysiology, it is considered the "gold standard" measure for many sleep outcomes against which actigraphy and self-report are compared. Despite its status as the benchmark sleep measure, PSG has several limitations, including high cost (equipment, signal processing, expert personnel) and participant burden [5], even when performed in-home and unattended. These shortcomings are compounded when data are collected across multiple nights, which is desirable due to the potential impact of study procedures on sleep (e.g. "first night effect") and the natural night-to-night variability in many sleep outcomes [6][7][8][9]. Habitual aspects of sleep and variability in sleep patterns both inform our understanding of normative and disordered sleep and their influence on health, functioning, and mortality [10][11][12], so actigraphy and self-report sleep diaries may be preferred over PSG.
Using participant self-report, daily sleep diaries ascertain habitual sleep characteristics including time in and out of bed, timing of sleep and wake, and the number, duration, and reasons for awakening after sleep onset [13]. However, diaries may suffer from recall bias and incur more participant burden than wrist actigraphy. Actigraphy, while having lower burden and being more objective than diaries, exhibits poor specificity for discriminating wake from sleep when activity is low [14] and may mis-score off-wrist activity as sleep [15]. Actigraphy and sleep diaries have unique clinical utility, as they are recommended for in-home assessment of sleep disorders, including insomnia and circadian rhythm sleep-wake disorders (CRSWDs) [16,17]. Given the practicality and clinical relevance of actigraphy and sleep diaries, it is necessary to understand how well these modalities compare to PSG in the home setting where they are often used. Furthermore, because the cost of actigraphy (equipment, data processing, and cleaning) may hinder implementation, it is important to consider how diary estimates of sleep compare to actigraphy.
The few studies that have compared PSG, actigraphy, and sleep diaries concurrently have reported that actigraphy yields comparable estimates of sleep duration to PSG [18], but mixed evidence for diary compared to PSG and actigraphy [18,19]. Actigraphy produces estimates similar to PSG on most other sleep parameters but yields consistently lower sleep latency (SL) estimates compared to PSG [18]. These studies are characterized by various limitations to ecological validity and generalizability, including small sample sizes [18,19], a single night of assessed sleep [19], study of individuals with sleep or mental health disorders [18,19], and administration in laboratory settings [18,19]. A meta-analysis of studies comparing actigraphy and PSG in non-laboratory settings found that actigraphy largely exhibited high agreement with PSG, yet also estimated longer sleep duration and greater sleep continuity than PSG. Agreement between modalities decreased with worsening sleep quality [20]. To our knowledge, only one previous study has compared sleep across all three modalities in the home [21], finding that self-report diaries yielded longer estimates of sleep duration (the only sleep outcome measured) compared to actigraphy and PSG. These results suggest that other sleep outcomes, such as indices of sleep continuity and clinically relevant sleep disturbances, may also differ by measurement modality, but these questions have not been previously tested.
Aging affects sleep, and sleep problems in women are especially prevalent during the late reproductive (perimenopausal) stages and across the menopausal transition, which may be a key inflection point when sleep patterns are altered negatively. Previous studies have shown that subjective sleep complaints persist during peri- and post-menopause [22]. Because there are potential differences between objectively and subjectively measured sleep in women [23,24], in this analysis we compared sleep duration and continuity measured both subjectively and objectively. Therefore, the present study compared measures of sleep duration (time in bed [TIB], total sleep time [TST]), and continuity (sleep efficiency [SE], SL, wakefulness after sleep onset [WASO]) assessed by PSG, wrist actigraphy, and sleep diaries across up to three nights in a community sample of 323 midlife women. Clinically relevant sleep disturbances (e.g. short sleep duration, difficulties maintaining sleep) were also compared between modalities. All data were collected in participants' homes over three consecutive nights, which permitted a direct comparison of measures for the same nights across all three modalities. Each of the five sleep indices (TIB, TST, SE, SL, and WASO) can be measured by all three modalities and has been variously related to health, functioning, and mortality [25][26][27][28][29][30][31].
Given that associations between diary- and actigraphy-assessed sleep [32] and diary- and PSG-assessed sleep duration [21] have been shown to differ between African American and White adults, race/ethnicity was explored as a potential effect modifier. Several other factors may affect agreement between sleep measurement modalities. Vasomotor symptoms (VMS) have been associated with greater motor restlessness in bed [33], which may affect actigraphy more than diaries and PSG. Individuals who were obese self-reported shorter sleep at similar levels of actigraphy-measured sleep compared to those who were not obese [32]. The use of medications that affect sleep [34] and depressive symptoms [21] have both been associated with greater discrepancy between diary- and actigraphy-measured TST, resulting in shorter diary- versus actigraphy-assessed TST [21,34]. These factors were examined as covariates in the present analyses.

Study participants
The multi-modal Study of Women's Health Across the Nation (SWAN) Sleep Study was an ancillary study, conducted in a subset of the multi-racial/ethnic cohort of midlife women of SWAN [35]. SWAN is a community-based, longitudinal study of the menopausal transition and its relationships with health and aging, originally enrolling 3302 women. The following exclusion criteria were applied to SWAN participants to determine eligibility for the SWAN Sleep Study: hysterectomy or bilateral oophorectomy (<1% of the cohort), hormone therapy use (23%), nonadherence with core SWAN procedures (missed more than half of annual visits), and biobehavioral factors known to affect sleep, including regular shift/night work, oral corticosteroid use, active treatment for cancer, or alcohol consumption exceeding four drinks per day (1%-3% for each). All eligible participants were approached regarding participation. Of these, 30% declined, with the most cited reasons including "protocol burden," "too busy," and "family obligations." The SWAN Sleep Study enrolled 370 White, African American, and Chinese participants from four of the seven core SWAN study sites: Chicago, IL; Detroit, MI; Oakland, CA; and Pittsburgh, PA.
The present analyses excluded 47 (13%) Sleep Study participants who lacked at least one night of concurrent PSG, actigraphy, and sleep diary data, resulting in an analytic sample of 323. No other inclusion/exclusion criteria were applied. Included participants did not differ from excluded Sleep

Study protocol
The SWAN Sleep Study protocol [37] was conducted across an entire menstrual cycle or 35 days, whichever was shorter. Unattended PSG sleep studies were conducted in participants' homes on the first three nights of the protocol. Study staff arrived at participants' homes approximately 3 h before the participants' bedtime to apply electrodes and calibrate monitors. Participants slept in their own beds and went to bed and awoke according to their habitual sleep and wake times, which were determined by self-report. Participants turned off the PSG recorder and removed study equipment themselves upon awakening in the morning. Wrist actigraphy and sleep diary data were collected throughout the protocol. Other measures pertinent to the current analyses were collected in conjunction with the Sleep Study or core SWAN protocol, as described below.

Sleep
Each participant contributed one to three nights of concurrent PSG, wrist actigraphy, and sleep diary data. Sleep outcomes included in the present study were variables that could be measured by all three measurement modalities: indices of sleep duration (TIB, TST) and sleep continuity (SE, SL, WASO).

PSG
PSG sleep data were collected with Vitaport-3 (Temec; Kerkrade, Netherlands) ambulatory recorders. Signals collected on each study night included bilateral central referential electroencephalogram (EEG) channels (C3 and C4, referenced to A1-A2), electro-oculogram (EOG), submentalis electromyogram (EMG), and electrocardiogram (EKG). Additional signals were collected on the first night of sleep studies for the assessment of SDB (nasal pressure and oral-nasal thermistors, fingertip oximeter, and abdominal and thoracic excursion, as measured by inductance plethysmography to reflect respiratory effort) and leg movements. Quality assurance assessments, scoring, and processing of all PSG records were performed at the University of Pittsburgh Neuroscience-Clinical and Translational Research Center (N-CTRC) as previously described [37].
Sleep stage scoring was performed by trained PSG technologists with established inter-rater reliability (i.e. intraclass correlation coefficients for wake, non-rapid eye movement, and rapid eye movement each > 0.90) in a sample largely overlapping this study. PSG-assessed TIB was calculated as time from reported lights out ("got into bed with the intention to go to sleep") to time of reported awakening from sleep ("awoke in the morning"). Sleep technologists examined PSG records for signs of movement artifact in EEG, EMG, and EOG channels as an indicator of active wakefulness. A persistent reduction in movement artifact across channels was taken as evidence of "settling" that corresponds with lights off and/or attempting to sleep. PSG-assessed TST was calculated as total minutes of any sleep stage after sleep onset. PSG-assessed sleep continuity measures included SL (time from beginning of the recording period to the first of 10 consecutive minutes of Stage 2 or Stage 3-4 sleep interrupted by no more than two minutes of Stage 1 or wakefulness), WASO (total minutes of wakefulness between sleep onset and good morning time [GMT]), and SE (time spent asleep/TIB × 100).

Actigraphy
Participants wore the Mini-Mitter Actiwatch (AW-64; Philips Respironics, McMurray, PA) on their nondominant wrist throughout the duration of the protocol. This device has been validated against PSG [38]. Data were uploaded for later processing and scoring in 1-minute epochs using Actiware version 5.04 software standard procedures and the medium sensitivity threshold (40 activity counts per epoch). Actigraphy-assessed TIB was defined by study staff as each day's suspected nocturnal sleep period: the difference between good night time (GNT), the time at which participants "got into bed with the intention to go to sleep," and good morning time (GMT), the time at which participants "awoke in the morning." Actigraphy GNT and GMT were informed by GNT and GMT reported in sleep diaries. Within TIB, sleep onset was identified as the first epoch of 10 consecutive minutes of sleep, in which less than one epoch was scored as wake. Actigraphy-assessed TST was calculated as the total number of epochs within TIB scored as sleep after sleep onset. Actigraphy-assessed SL and WASO were calculated as the number of epochs from GNT to sleep onset and the total number of epochs scored as "awake" following sleep onset to GMT, respectively. Actigraphy-assessed SE was calculated as TST/TIB × 100.
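The derivation of the actigraphy summary measures from scored epochs can be sketched as follows. This is a minimal illustration, not the Actiware implementation: it assumes each 1-minute epoch between GNT and GMT has already been scored as sleep or wake, and it reads "less than one epoch scored as wake" as a 10-epoch window containing no wake epochs. The function names and the example night are hypothetical.

```python
from typing import Dict, List, Optional


def find_sleep_onset(epochs: List[bool], window: int = 10,
                     max_wake: int = 0) -> Optional[int]:
    """Return the index of the first epoch that starts `window` consecutive
    epochs with at most `max_wake` wake epochs (True = sleep, False = wake).
    The default max_wake=0 is an assumed reading of the scoring rule."""
    for start in range(len(epochs) - window + 1):
        win = epochs[start:start + window]
        if epochs[start] and win.count(False) <= max_wake:
            return start
    return None


def summarize_night(epochs: List[bool]) -> Dict[str, float]:
    """Compute TIB, SL, TST, WASO (minutes) and SE (%) for one night of
    1-minute epochs spanning GNT to GMT."""
    tib = len(epochs)
    onset = find_sleep_onset(epochs)
    if onset is None:
        # No qualifying sleep onset: the whole period counts as latency.
        return {"TIB": tib, "SL": tib, "TST": 0, "WASO": 0, "SE": 0.0}
    after_onset = epochs[onset:]
    tst = sum(after_onset)              # epochs scored as sleep after onset
    waso = len(after_onset) - tst       # epochs scored as wake after onset
    return {"TIB": tib, "SL": onset, "TST": tst, "WASO": waso,
            "SE": 100.0 * tst / tib}


# Hypothetical night: 15 min awake, 200 asleep, 20 awake, 180 asleep, 5 awake.
night = [False] * 15 + [True] * 200 + [False] * 20 + [True] * 180 + [False] * 5
print(summarize_night(night))  # TIB 420, SL 15, TST 380, WASO 25, SE about 90.5
```

Because WASO is counted from sleep onset to GMT, wake epochs at the very end of the period (before the reported final awakening) are included in WASO rather than dropped.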

Sleep diaries
Each morning upon awakening, participants recorded information about the previous night's sleep using a sample-specific version of the Pittsburgh Sleep Diary [39]. Diary variables relevant to the current analyses included GNT, GMT, SL ("last night it took me ___ minutes to fall asleep"), and WASO ("last night I spent ___ minutes awake after falling asleep"). Diary-assessed TIB was calculated as the total number of minutes between GNT and GMT, while TST was calculated as TIB minus SL and WASO. SE was calculated as TST/TIB × 100.
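As a worked illustration of the diary calculations above, the following sketch derives TIB, TST, and SE from one diary entry. It assumes GNT and GMT are recorded as 24-hour "HH:MM" strings and that a GMT earlier than or equal to GNT falls on the following calendar day; the function name and the example entry are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def diary_summary(gnt: str, gmt: str, sl_min: float,
                  waso_min: float) -> Dict[str, float]:
    """Compute diary TIB, TST (minutes) and SE (%) from one morning entry."""
    fmt = "%H:%M"
    bed = datetime.strptime(gnt, fmt)
    rise = datetime.strptime(gmt, fmt)
    if rise <= bed:
        # Assumption: the participant woke on the next calendar day.
        rise += timedelta(days=1)
    tib = (rise - bed).total_seconds() / 60.0
    tst = tib - sl_min - waso_min       # TST = TIB minus SL and WASO
    se = 100.0 * tst / tib              # SE = TST / TIB x 100
    return {"TIB": tib, "TST": tst, "SE": se}


# Hypothetical entry: in bed 23:00, awake 06:30, 15 min to fall asleep,
# 20 min awake during the night.
print(diary_summary("23:00", "06:30", sl_min=15, waso_min=20))
# TIB 450 min, TST 415 min, SE about 92.2%
```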

Covariates
Covariates were measures demonstrated in previous SWAN studies to be strongly related to sleep and included race/ethnicity, VMS, BMI, use of medications that affect sleep, and symptoms of depression [37,40]. Race/ethnicity (non-Hispanic White, African American, or Chinese) was ascertained by self-report. Each morning upon awakening, participants recorded the total number of hot flashes, cold sweats, and night sweats experienced during the previous night. Due to the distributional properties of VMS in this sample, number of symptoms was averaged across PSG nights and dichotomized as "none" or "at least one" reported. BMI was calculated as weight in kilograms divided by the square of height in meters (kg/m^2), as measured by study staff. Self-reported symptoms of depression were assessed on the final PSG night using the 16-item Quick Inventory of Depressive Symptomatology (QIDS) [41]. The QIDS was calculated as a continuous variable (Cronbach's α for reliability = 0.67, 95% CI [0.61 to 0.72]) without the four-item sleep disturbance subscale to reduce collinearity with sleep outcome variables. Use of medications that affect sleep was operationalized as present or absent.

Statistical analysis
Analyses were performed in SAS version 9.2. Descriptive statistics were used to characterize the study sample and evaluate data distributions. Prior to analyses, non-normally distributed variables (SE, SL, and WASO) were transformed by natural logarithm or square root. Participants could contribute a maximum of three nights of data for each of the three measurement modalities; contributing all nine possible data points was considered complete data. A total of 262 (81%) participants provided complete data, 53 (16%) provided eight data points, 7 (2%) provided seven data points, and 1 (<1%) provided six data points.
A series of multivariable linear regression models with correlated errors over repeated measures, a class of linear mixed effects models, were performed for each of the five sleep variables, adjusting for race/ethnicity, BMI, VMS, symptoms of depression, and medications that affect sleep. Models were fit with maximum likelihood estimation using SAS Proc MIXED. Time within participant and modality within participant were included as random effects and a categorical temporal fixed effect was included to allow sleep measures to vary across the three nights. A first-order autoregressive error structure was used to model the within-participant correlation over time, while an unstructured correlation structure was used to model the correlation of sleep as measured by different modalities for a given participant on a given night.
To allow covariates to interact with different modalities while offering parsimonious models, a step-down model selection procedure was implemented for each sleep variable. This procedure started with an initial model that included all main effects and two-way interactions between covariates, modality, and night. The reference group, used to compare specific values across measurement modalities, was White women of average BMI, low depressive symptoms, no use of medications that affect sleep, and no VMS. Race/ethnicity was the only covariate that interacted significantly with modality and was, therefore, the only covariate retained as an interaction term. Wald tests and confidence intervals were used for performing inference, and residual-based diagnostics were used to assess model fit; p-values were not corrected for multiple comparisons.
For each sleep variable, the Bland-Altman approach [42] was used to evaluate whether the observed values assessed by any pair of measurement modalities (e.g. actigraphy and PSG) differed as a function of the size of measurement across modalities. Plots of the mean difference and 95% limits of agreement (LoAs) were generated using recent guidelines [43]. In addition, McNemar's Test [44] was used to evaluate whether identification of clinically significant sleep disturbances differed as a function of modality. Clinically significant sleep disturbances were defined as follows: TST <6 h, SL >30 min, WASO >30 min, and SE <85% [16,45]. Long sleep duration (i.e. TST > 9 h) was not considered due to the paucity of long sleepers in our sample (n = 3).
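The classification and paired comparison described above can be sketched as follows. The thresholds are those stated in the text; the choice of the exact (binomial) form of McNemar's test is an assumption, since the text does not state which variant was used, and the function names and WASO values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def disturbance_flags(night: Dict[str, float]) -> Dict[str, bool]:
    """Apply the stated thresholds (TST, SL, WASO in minutes; SE in percent)."""
    return {
        "short_TST": night["TST"] < 360,   # TST < 6 h
        "long_SL": night["SL"] > 30,
        "high_WASO": night["WASO"] > 30,
        "low_SE": night["SE"] < 85,
    }


def discordant_counts(a: List[bool], b: List[bool]) -> Tuple[int, int]:
    """Count pairs flagged by modality A only and by modality B only."""
    a_only = sum(1 for x, y in zip(a, b) if x and not y)
    b_only = sum(1 for x, y in zip(a, b) if y and not x)
    return a_only, b_only


def mcnemar_exact(a_only: int, b_only: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts
    (assumed variant; a chi-square form is a common alternative)."""
    n = a_only + b_only
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, k) for k in range(min(a_only, b_only) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical paired WASO values (minutes) for eight participants.
diary_waso = [10, 20, 35, 15, 25, 5, 40, 12]
psg_waso = [45, 50, 40, 33, 60, 20, 38, 55]
diary_flag = [w > 30 for w in diary_waso]
psg_flag = [w > 30 for w in psg_waso]
d_only, p_only = discordant_counts(diary_flag, psg_flag)
print(d_only, p_only, mcnemar_exact(d_only, p_only))  # 0 5 0.0625
```

Only the discordant pairs (participants classified differently by the two modalities) contribute to the test, which is why it suits paired classifications of the same participants.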

Results
Participants were midlife women between 48 and 57 years of age (mean = 51.2 ± 2.2 years). Self-identified race/ethnicity was: White (n = 147), African American (n = 120), and Chinese (n = 56). Average BMI in the sample was 29.7 (± 7.7), and one quarter of the sample endorsed use of medications that affect sleep (25.7%). Scores for depressive symptoms were low (mean QIDS score = 4.8 ± 3.0; clinical cutoff for QIDS is 13). BMI differed between groups (F[2, 309] = 44.47, p < .001) such that Chinese women had lower BMI than White and African American women (ps < .001) and White women had lower BMI than African American women (p < .001). VMS differed between groups

Main effects of modality
Descriptive means and mean differences for each sleep outcome across each of the three measurement modalities in the full sample are presented in Table 1. Model fit was acceptable for all models (see residual-based model fit statistics in Supplementary Figure 1A-E). Results from the repeated-measures linear models showed that diary-assessed indices of sleep duration (TIB, TST) and SE were significantly higher than values obtained by PSG and by actigraphy. On average, diary-assessed TIB for the reference group was 20.4 (± 3.4) and 18.1 (± 2.3) minutes longer than PSG- and actigraphy-assessed values, respectively. Similarly, diary-assessed TST for the reference group was 12.6 (± 4.9) and 21.2 (± 4.9) minutes longer on average than values derived from PSG and actigraphy, respectively. Diary-assessed SE was 7.2% (± 1.1) and 7.0% (± 1.1) higher on average than PSG- and actigraphy-assessed values, respectively. Actigraphy-assessed indices of sleep duration (TIB, TST) and SE did not significantly differ from those assessed by PSG (ps > .05).

Interactions of race and night by modality
We next examined whether modality differences for indices of sleep duration and continuity differed as a function of race/ethnicity or night of study [...] duration (TIB, TST) or SE. None of the modality-by-night interactions was significant, suggesting that modality effects were consistent across recording nights.

Modality effects across the spectrum of measurement
Bland-Altman plots were used to evaluate potential biases and LoAs between all three modalities (i.e. diary vs. PSG, actigraphy vs. PSG, and diary vs. actigraphy) for each sleep outcome (Figure 1A-E). A mean difference near zero indicates no systematic bias between two modalities. Systematic biases depicted in the figures are consistent with results of mixed model analyses. The slope of the mean difference indicated that diaries yielded higher estimates of TIB, TST, and WASO versus PSG as the size of measurement increased. Mean difference slopes also showed that as the size of measurement increased, actigraphy produced higher estimates of all five sleep outcomes versus PSG and diaries yielded lower SE, SL, and WASO estimates versus actigraphy. Heteroscedasticity, representing increasing or decreasing variability with size of measurement, is indicated by 95% LoAs. Heteroscedasticity was observed for all sleep outcomes and modalities: variability increased with longer TIB and shorter TST and increased substantially with poorer values of sleep continuity (i.e. lower SE, higher SL, and WASO).
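The quantities read from a Bland-Altman plot can be sketched as follows. This is the classic constant-limits form (mean difference ± 1.96 SD) plus a least-squares slope of the difference on the pairwise mean; it is an illustration only, because the guideline-based plots used here [43] may model the limits differently when variability changes with the size of measurement. The function name and the TST values are hypothetical.

```python
import statistics
from typing import Dict, List


def bland_altman(a: List[float], b: List[float]) -> Dict[str, float]:
    """Return the mean difference (bias), classic 95% limits of agreement,
    and the slope of the difference (a - b) on the pairwise mean."""
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2.0 for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    mean_of_means = statistics.fmean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - bias) for m, d in zip(means, diffs))
    slope = sxy / sxx if sxx else 0.0
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "slope": slope,
    }


# Hypothetical TST values (minutes) for four nights: diary vs. PSG.
diary_tst = [400, 420, 380, 450]
psg_tst = [390, 400, 385, 420]
result = bland_altman(diary_tst, psg_tst)
print(result["bias"])  # 13.75
```

A bias near zero with a nonzero slope would indicate that the two modalities agree on average but diverge as the size of measurement grows, which is the pattern the mean difference slopes above describe.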

Discussion
To our knowledge, this is the largest study to date to directly compare indices of sleep duration and continuity assessed concurrently by PSG, wrist actigraphy, and sleep diaries. We found that mean estimates of sleep duration and SE were similar in actigraphy and PSG but higher in sleep diaries. Both diaries and actigraphy yielded lower estimates of SL and WASO compared to PSG, although differences in diary-assessed SL and WASO varied by race/ethnicity. All modalities showed less agreement with each other at values of poorer sleep: longer TIB, shorter TST, lower SE, and higher SL and WASO. Compared to PSG, sleep diaries identified a lower prevalence of clinically meaningful short sleep and poor sleep continuity, while diary and actigraphy estimated lower and higher prevalence of prolonged SL, respectively. These findings suggest that actigraphy measures many important sleep parameters comparably to in-home PSG, but diaries consistently differ from both PSG and actigraphy. Actigraphy and PSG produced similar estimates of TIB and TST, but diaries yielded longer estimates of sleep duration compared to both actigraphy and PSG. These results are mostly in line with previous studies comparing TST and TIB across modalities [21][46][47][48][49][50][51][52], with the few exceptions being in patients with insomnia in which diary-assessed TST was shorter than TST measured by actigraphy [53] and PSG [19]. Our results show that actigraphy and PSG perform comparably on indices of sleep duration in the home, but diaries estimate longer sleep duration than actigraphy and PSG.
Estimates of SE were also comparable between actigraphy and PSG, while diaries measured higher SE relative to both modalities. This actigraphy-PSG agreement is consistent with past research [46][47][48][49], as are the higher SE values estimated by diaries versus actigraphy [50]. Differences between diaries and actigraphy were likely explained by diaries yielding increasingly higher SE values than actigraphy at lower SE. Although the mean SE difference between actigraphy and PSG in our study was small, variability between measures increased as SE decreased, consistent with prior studies [46,49,54]. Our findings suggest that modalities may not be reliably comparable in individuals with poor SE (e.g. insomnia).
Modality differences between other indices of sleep continuity (SL and WASO) were complex. Diaries estimated lower SL and WASO values compared to PSG, the opposite of the findings of another study, which reported that diaries estimated higher SL and WASO versus PSG in individuals with clinical depression and insomnia [19]. However, as higher subjective vs. objective sleep complaints are a defining feature of insomnia [55], our findings are not necessarily in conflict with previous research, given that our participants were not a clinical sample. Differences between the present study and previous findings may also be related to poorer correspondence among measurement modalities in individuals with poorer sleep continuity, which is observed in individuals with clinical depression and insomnia [19]. Our finding of actigraphy estimating lower SL values relative to PSG is consistent with a recent systematic review that determined that actigraphy generally yields SL estimates up to 10 minutes shorter than PSG, although differences were not often statistically significant, due in part to high inter-individual variability between modalities [56]. Our data suggest that midlife women self-report significantly shorter times falling asleep and waking during sleep relative to actigraphy and PSG. However, it should be noted that sleep onset is associated with a small amount of retrograde amnesia [57], which limits the amount of recalled time spent falling asleep and may contribute to lower reported SL and WASO compared to actigraphy and PSG. The sleep onset process is compromised in insomnia [57], which may explain differences between present study findings and those in individuals with insomnia [19].
Observed differences in sleep continuity were not uniform across race/ethnicities. Racial/ethnic differences in sleep are well documented [37,58,59], but few studies have examined racial/ethnic differences across sleep measurement modalities. Previous research indicates that actigraphy- and diary-assessed sleep duration correlate less strongly among African American compared to White adults [21,32]. Similarly, in a nationwide sample of adults, African Americans were less likely to report problems falling asleep than Whites despite being more likely to report SL greater than 30 min [60]. These differences may reflect racial/ethnic differences in beliefs about sleep (e.g. the role of sleep in health and functioning), such as were observed in a qualitative study of African American and White older women [61]. Given both the known racial/ethnic group differences in sleep [37,58,59] and the importance of sleep to health and functioning [26][27][28][29][30][31], more research is needed to understand the impact of measurement modality on sleep in diverse groups, including the impact of measurement modality on replication across race/ethnicity.
In addition to race/ethnicity and included covariates, other factors may have influenced agreement between measurement modalities. Self-reported sleep duration has been more strongly correlated with wrist actigraphy among individuals with a college degree than those without a college degree [32], suggesting that agreement could differ by participants' educational attainment. However, education was not associated with sleep in the SWAN sample [37]. Movements by a bed partner may alter the inactivity inferred as sleep by actigraphy. Walters et al. [62] observed similar diary- and actigraphy-assessed SL but much higher actigraphy-assessed WASO compared to sleep diary among individuals with bed partners, possibly reflecting a scenario in which awakenings were sufficiently short that participants did not remember them the next day. Finally, noise from road traffic has been associated with more reported awakenings and worse sleep quality, and effects on sleep were observed by actigraphy [63]. In summary, education, presence/absence of a bed partner, and noise/neighborhood environment should be considered as potential moderators of modality agreement in future studies.
Our results also highlight inconsistencies across sleep measurement modalities in identifying clinically relevant sleep disturbances. Multiple measurement modalities are often used in conjunction with one another to improve identification and diagnosis of sleep disorders. For example, while self-report is largely recommended to evaluate insomnia and CRSWDs, actigraphy is also used to both characterize sleep disturbances in these conditions and, in the case of CRSWDs, assess response to treatment [16,17]. Our results indicate that diaries and actigraphy may classify short sleep duration and difficulty falling asleep similarly, but these modalities yield conflicting classifications of poor sleep continuity. Furthermore, differences will likely be exacerbated among individuals with poor sleep continuity. Although classification of clinically relevant sleep disturbances differed widely across measurement modalities, it must be noted that each modality characterizes unique aspects of sleep and may therefore provide clinically valid information depending on the outcome of interest.
Although midlife women generally report a high prevalence of sleep complaints [64][65][66], particularly in the context of physiological changes associated with the menopausal transition [67][68][69], our findings suggest that self-reported sleep was endorsed as more favorable (i.e. longer TST, lower WASO, higher SE) compared to actigraphy and PSG. Furthermore, Bland-Altman plots indicated that differences between subjectively and objectively measured sleep continuity may be significantly greater in midlife women with more sleep disturbances, which is consistent with a model [57] in which individuals with good sleep underestimate SL and WASO, while individuals with insomnia overestimate relative to these objective measures. Our results highlight the need for better assessments of sleep disturbances in midlife women.
Several limitations and strengths should be considered when evaluating the present results and their implications. Although our study is unique in measuring sleep with objective (i.e. physiological [PSG] and behavioral [actigraphy]) and subjective (i.e. diaries) modalities in the home across three nights in a large and diverse sample of midlife women, results may not be generalizable to other populations. Characteristics of the menopausal transition, including its known effects on nocturnal physiology, may limit the degree to which these findings can be extended to women at other points in developmental or reproductive stages. In addition, results cannot be generalized to men, other age groups, or other racial/ethnic groups. More research should evaluate the impact of measurement modality on sleep given known changes in sleep across the lifespan [70] and differences in sleep by sex [71] and across racial/ethnic groups [72]. Finally, the exclusive use of the AW-64 medium sensitivity threshold limits generalization of findings to other sensitivity thresholds for this device. Low and high sensitivity thresholds can better detect wakefulness and sleep, respectively [73], so using alternate thresholds may have impacted the magnitude, but not the overall pattern, of observed modality differences. Despite these limitations, the present study has numerous strengths, including a rigorous design, a large and racially/ethnically diverse sample, consideration of numerous potential covariates, data collection using standardized protocols across all clinical sites, and high ecological validity via in-home assessment where participants adhered to their natural sleep-wake schedules.
In summary, we found that self-report sleep diaries yielded longer estimates of sleep duration and more favorable estimates of sleep continuity (i.e. lower WASO and higher SE) in comparison to objectively assessed actigraphy and PSG in midlife women. Differences were seen across up to three nights of sleep and, overall, were similar for White, African American, and Chinese women. Actigraphy and PSG produced similar estimates of sleep duration and efficiency. Our findings suggest that actigraphy may be recommended as a lower-cost alternative to PSG to assess sleep among midlife women in home settings. Observed differences between diaries and actigraphy and PSG should be considered when interpreting results from actigraphy and in-home PSG in the context of sleep diaries, specifically because self-report diaries are likely to estimate longer sleep duration and greater sleep continuity than these objective modalities. However, we emphasize that each modality captures unique aspects of sleep, and modality differences should not be interpreted as measurement error per se. Results of the present study may not be generalizable to patients with clinical sleep disorders, such as insomnia or sleep apnea, in which large differences between self-report and actigraphy- or PSG-assessed sleep are common, or to men or to other age groups. Continued efforts to better understand differences in sleep outcomes vis-à-vis measurement modality, and factors that influence these differences, remain critical to our understanding of sleep, the diagnosis and treatment of sleep disorders, and the importance of sleep to health and functioning.

Supplementary Material
Supplementary material is available at SLEEP Advances online.