The Use of Survey Measures to Assess Circadian Variations in Alertness

Summary: Current models of sleep regulation postulate both a homeostatic and circadian component and promise an understanding of disturbed and displaced sleep. Estimates of these components have traditionally required relatively cumbersome and costly measures, including sleep electroencephalograms and continuously recorded rectal temperature, but it has recently been demonstrated that they may successfully be based on frequent (e.g. 2-hourly) concurrent ratings of alertness. This paper examines whether similar results might be obtained using retrospective survey measures of alertness obtained from shiftworking nurses at a single sitting. These retrospective measures are shown to be sensitive to both time of day and shift, to have a high level of reliability even for relatively small sample sizes (e.g. 10) and to be valid predictors of more traditional concurrent measures of alertness. It is concluded that retrospective alertness ratings may prove to be an extremely cost-effective method for examining the trends in alertness in various groups, including those suffering from specific types of sleep disorder.

the individual's circadian phase. Such assessment has, traditionally, been based on continuously recorded rectal temperature, a technique that is relatively time-consuming, costly and cumbersome and requires considerable cooperation on behalf of the individuals concerned.
An alternative method for assessing an individual's circadian phase is to obtain frequent (e.g. 2-hourly) subjective ratings of alertness and/or sleepiness. Despite being entirely subjective, such ratings are known to cross-correlate with various psychophysiological measures of arousal rather better than they do with one another (3) and to exhibit a pronounced circadian rhythmicity that is, at least partially, independent of the sleep-wake cycle (4,5). Further, a mathematical model of the homeostatic and circadian components of variations in these ratings has been developed that is essentially based on that of Borbely (6-10). Indeed, the primary difference between this alertness model and that of Borbely (2) relates to the period immediately following awakening when sleep need is obviously low despite the ability to return to sleep being high. This is reflected in ratings of alertness being low and has been attributed to a "sleep inertia" or "wake up" effect (process "W"). This mathematical alertness model makes substantially similar predictions to Borbely's and has been shown to account for both variations in alertness on abnormal sleep-wake schedules (6-8) and various objective measures of sleep (9,10) with considerable accuracy.
This suggests that alertness ratings may provide a convenient method for interpreting the effects of displaced or disturbed sleeps. Such ratings are, however, subject to considerable random variation (11) and typically require data to be averaged from a number of individuals and/or days in order to provide reliable estimates (5). The researcher is thus dependent on individuals not only volunteering to provide frequent ratings, but also on them actually doing so over a relatively long period. Further, when "paper and pencil" ratings of alertness are used, there is no way of being certain that the ratings were actually made at the required time, rather than, for example, being filled in en masse at the end of a day or even at the end of the study! These problems can be reduced by the use of handheld computers that automatically "time and date stamp" each rating and can be programmed to ring an alarm to remind individuals to make their ratings (12). However, the use of these increases the cost of obtaining such ratings and still requires a considerable commitment on behalf of the volunteers.
An alternative to these "concurrent" ratings might be to ask individuals to retrospectively rate how fatigued or alert they "normally" feel at various stipulated times. Such a survey technique would allow ratings to be obtained from large numbers of individuals and may even prove less prone to "random error" than more traditional concurrent measures because they should be less affected by random variation and inconsistent external influences. The present paper is concerned with establishing the sensitivity of such ratings to time-of-day effects, their reliability and their validity in comparison to more traditional ratings. We have chosen to do this on groups of experienced shiftworking nurses whose sleeps are commonly displaced because, at least in the case of rotating nurses, this allows an examination of whether the trends obtained vary according to the timing of sleep.

METHODS
The nurses were all drawn from a larger sample working in larger (400+ bedded) general hospitals in England and Wales who had previously completed a "Standard Shiftwork Index" (13). The present questionnaire was sent by post to a total of 904 nurses who had volunteered to complete it. A total of 648 nurses returned questionnaires and the results of 545 of these are included in this paper. The majority (89) of the 103 nurses whose results were excluded never worked at night, whereas the remaining 14 nurses had failed to provide complete data on one or more of the scales considered in this paper. The present analyses were thus based on the results from 320 full-time rotating nurses who worked early, late and night shifts and 225 permanent night nurses. Only about 40% of the permanent night nurses worked full time (i.e. an average of 37.5 hours per week), the remainder being part-time nurses who worked, on average, 24 hours per week. The biographical details for these samples are shown in Table 1.
It is clear from this table that the samples differed from one another in a number of ways. Thus, for example, the full-time permanent night nurses were older (t = 9.601, df = 408, p < 0.001) and had greater experience of shift work (t = 9.813, df = 408, p < 0.001) than their rotating counterparts, whereas a greater proportion of the part-time permanent night nurses were living with a partner (χ² = 21.030, df = 1, p < 0.001), and they had more dependents (t = 5.075, df = 223, p < 0.001), than the full-time permanent night nurses. Thus any difference found between these groups in response to their different shift systems should be interpreted with caution.
The retrospective alertness rating scales formed part of a larger questionnaire that the nurses completed in a single session. They were given written instructions to use the rating scales to indicate how alert or sleepy they normally felt at 2-hourly intervals before, during and after the different shifts that they worked. Because these ratings were all made at the same time, they were totally reliant on the nurses' memory for the manner in which their alertness normally varied across their shifts. In the case of the night shift they were asked to consider only their second and subsequent successive night shifts rather than their first. This was in order to avoid any potential difference on the first night shift that might result from the typically longer period of prior wakefulness. The nurses were also asked to indicate the times at which they normally started, finished, went to sleep after and woke up before each type of shift. A series of scales for each shift, one page per shift, were then presented with 2-hourly (even hour) times given in the left-hand column. These times were designed to cover all possible times of normally waking prior to, and going to sleep following, the shift concerned. The nurses were instructed to make the ratings only for those times that they would normally be awake. For each time, a nine-point rating scale (1 to 9) was used, with verbal labels at the top of each set of scales reading: Very alert [1], Alert [3], Neither alert nor sleepy [5], Sleepy (but not fighting sleep) [7] and Very sleepy (fighting sleep) [9]. This is essentially the Karolinska Sleepiness Scale (14) but modified slightly in an attempt to ensure that it is linearly related to visual analogue scales (VAS).
A subsample of 61 of the nurses, 32 rotating and 29 permanent night, were subsequently recruited to an intensive, longitudinal study that took place at least 6 months after they had returned the questionnaire. This study involved the nurses making 2-hourly, concurrent, alertness ratings while on duty over the course of 28 days. The ratings were made on a VAS displayed on a handheld computer (12) and formed part of a larger battery of items. The VAS scale comprised 20 discrete points, and thus the ratings could vary from 1 to 20. The nurses made these ratings by moving a cursor to the appropriate point on the scale and then pressing the "Execute" key. Once this had been done a new item was displayed and the rating could be neither changed nor displayed in any manner. Thus, in contrast to the retrospective ratings, each rating was made in the absence of any indication of the previous ones. Between them, the 61 nurses made a total of 3,363 such ratings spread over 768 shifts, giving an average of 4.4 ratings per shift. Following these ratings, the nurses performed a serial reaction-time task in which an asterisk appeared in one of four, equiprobable, positions on the screen and had to be responded to by pressing the corresponding key. As soon as each response was made, a new asterisk appeared until the nurse had made 160 responses. In the present paper we consider only the mean correct response time, i.e. omitting incorrect responses and abnormally long ones [i.e. > 1 second; see Totterdell and Folkard (12) for further details].
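The response-time summary described above can be illustrated with a short sketch. It assumes each trial is recorded as a (response time in seconds, correct) pair; the function name and data layout are hypothetical, and only the two exclusion rules (incorrect responses, responses longer than 1 second) come from the text.

```python
from statistics import mean

MAX_RT_S = 1.0  # "abnormally long" cutoff stated in the text (> 1 second)


def mean_correct_rt(trials, max_rt=MAX_RT_S):
    """Mean response time over correct trials no longer than max_rt.

    trials: iterable of (rt_seconds, correct) pairs (hypothetical layout).
    Returns None when no trial survives the exclusions.
    """
    kept = [rt for rt, correct in trials if correct and rt <= max_rt]
    if not kept:
        return None
    return mean(kept)


# Illustrative data only: one error and one over-long response are dropped.
example = [(0.4, True), (0.6, True), (0.5, False), (1.3, True)]
print(mean_correct_rt(example))

# The reported average number of ratings per shift follows from the totals.
print(round(3363 / 768, 1))
```

The last line reproduces the figure of 4.4 ratings per shift from the reported totals of 3,363 ratings and 768 shifts.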

Work and sleep times
The mean times at which the two groups of nurses started and finished their various shifts are shown in Table 2. This table also shows the times at which the nurses reported that they normally went to sleep and woke up between two successive shifts of the same type. The use of t tests indicated that the rotating nurses started their night shifts rather later (11 minutes) than the permanent night staff (t = 4.705, df = 506, p < 0.001), but there was no reliable difference in the time at which they finished them (t = 0.386, df = 506, p > 0.25). More interestingly, the rotating nurses went to sleep earlier (t = 5.054, df = 501, p < 0.001) and woke up later (t = 3.028, df = 496, p < 0.01) between successive night shifts than the permanent night nurses. The net result of this was that, in contrast to previous findings (15), the reported day sleeps of the rotating nurses were an average of 48 minutes longer than those of the permanent night staff.
[Figure caption, start missing: retrospective alertness ratings on the shifts for the rotating (n = 320) and permanent night (n = 225) nurses. Note that the alertness scale has been inverted such that higher ratings indicate higher alertness.]

Sensitivity
The mean retrospective alertness ratings before, during and after the various shifts and their standard errors are shown in Fig. 1, separately for the rotating and permanent night nurses. This figure is based only on the data from the 219 rotating nurses and 188 permanent night nurses for whom ratings were available at all of the times shown, but the trends shown are substantially similar to those obtained for the entire groups. This figure also shows the average timing of the early (E), late (L) and night (N) shifts as horizontal bars. In the case of the night shift, the timing was fairly similar for the rotating and permanent night staff (see Table 2).
It is clear from this figure that the nurses perceived their alertness as varying substantially both within and across the different shifts. Thus, for example, the rotating nurses rated their alertness as maximal at 1200 hours on the early shift, at 1600 hours on the late shift and at 0000 hours on the night shift. Further, the direction of change in alertness between, for example 1200 and 1600 hours, or 1800 and 2200 hours, depended crucially on which shift was being considered. Thus the general impression is that the trends in alertness were not only highly reliable for each of the three shifts, but also showed considerable adjustment to the shift in question. It is, however, unclear as to whether this simply reflected the changed sleep timing associated with the three shifts, or whether it might also have indicated some adjustment of the endogenous body clock.
These ratings were analyzed by means of repeated measures analyses of variance with the Greenhouse-Geisser correction for sphericity. These confirmed that these ratings varied over the course of each separate shift [rotators: early shift F(6, 1,308) = 95.3, p < 0.001; late shift F(5, 1,090) = 120.1, p < 0.001; night shift F(6, 1,308) = 182.2, p < 0.001; permanent night F(6, 1,122) = 134.9, p < 0.001]. Further, when the last readings from the early and night shifts were omitted to equate the number of ratings on each shift, the rotating nurses' ratings showed both a main effect of shift [F(2, 436) = 188.3, p < 0.001] and an interaction between shift and time point within shift [F(10, 2,180) = 29.1, p < 0.001]. Overall, the rotating nurses rated their alertness as highest on the late shift (mean = 7.28) and lowest on the night shift (mean = 5.92), but this was at least partially due to a more substantial decrease in alertness over the course of the night shift.
Sleep, Vol. 18, No. 5, 1995
A final analysis, comparing the trends over the night shift of the rotating and permanent night nurses, indicated that there was a highly reliable main effect of group [F(1, 405) = 117.4, p < 0.001] and a modest interaction between group and time point within shift [F(6, 2,430) = 2.6, p < 0.05]. The permanent night nurses rated themselves as considerably more alert before, during and after the night shift (mean = 6.79) than did their rotating counterparts (mean = 5.63). In view of the various demographic differences between these groups of nurses (Table 1), it is not possible to interpret this unexpected main effect. Nevertheless, it is of interest to determine whether it is reflected in the more traditional, concurrent alertness ratings (see below).
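As a reading aid for the analyses of variance reported in this section, the following sketch shows how the uncorrected degrees of freedom follow from the number of nurses and time points, and how a Greenhouse-Geisser epsilon can be estimated from a subjects-by-conditions matrix. The function names are hypothetical, the text does not report the epsilon values obtained, and the data passed to the epsilon function here are random numbers for illustration only.

```python
import numpy as np


def rm_anova_df(n_subjects, n_levels, n_groups=1):
    """Uncorrected df for a within-subject effect with n_levels levels.

    With n_groups > 1 the error df is pooled across groups, which also
    gives the df of the group-by-time interaction in a mixed design.
    """
    df_effect = n_levels - 1
    df_error = df_effect * (n_subjects - n_groups)
    return df_effect, df_error


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon for a subjects x conditions array.

    Uses the double-centred covariance matrix of the conditions; epsilon
    lies between 1 / (k - 1) and 1, where 1 means sphericity holds.
    """
    x = np.asarray(data, dtype=float)
    k = x.shape[1]
    cov = np.cov(x, rowvar=False)
    centre = np.eye(k) - np.ones((k, k)) / k
    cov_c = centre @ cov @ centre
    return np.trace(cov_c) ** 2 / ((k - 1) * np.trace(cov_c @ cov_c))


# df reported for the rotating nurses' early shift: 219 nurses, 7 time points.
print(rm_anova_df(219, 7))
# Group-by-time interaction on the night shift: 219 + 188 nurses, 2 groups.
print(rm_anova_df(219 + 188, 7, n_groups=2))

# Illustration with random data; the corrected df are epsilon times the above.
rng = np.random.default_rng(0)
print(greenhouse_geisser_epsilon(rng.normal(size=(30, 5))))
```

The first two calls reproduce the reported df of (6, 1,308) and (6, 2,430), which is consistent with the analyses having used the 219 rotating and 188 permanent night nurses with complete ratings.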

Reliability
Most reliability coefficients, such as Cronbach's alpha, are essentially based on the average correlation between individual items and the overall score on a psychometric test and are clearly inappropriate for use with these retrospective alertness ratings. However, Ebel (16) describes a parametric reliability coefficient that, like Kendall's nonparametric coefficient of concordance, is specifically designed for use with subjective ratings and is based on the average concordance between any pair of raters over the range of items (i.e. times) being rated. Its use on the present data yielded extremely high coefficients of 0.993 and 0.994 for the groups of permanent and rotating nurses, respectively. Further, the coefficients for the average individual within each group were 0.416 and 0.425, respectively, which compares favorably with the loading of individual items on the overall scores of many standard psychometric tests.
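A coefficient of this kind can be sketched as an intraclass correlation computed from a two-way analysis of variance, with time points as the rated items and nurses as the raters. The sketch below is one common formulation and assumes that differences between raters in their overall level are removed from the error term; the text does not state which variant was used, and the function name and example data are hypothetical.

```python
import numpy as np


def ebel_reliability(ratings):
    """Intraclass reliability of a single rater and of the mean of all raters.

    ratings: 2-D array, rows = rated items (time points), columns = raters.
    Rater main effects are excluded from the error term (an assumption).
    """
    x = np.asarray(ratings, dtype=float)
    n_items, n_raters = x.shape
    grand = x.mean()
    ss_items = n_raters * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n_items * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_items - ss_raters
    ms_items = ss_items / (n_items - 1)
    ms_error = ss_error / ((n_items - 1) * (n_raters - 1))
    r_single = (ms_items - ms_error) / (ms_items + (n_raters - 1) * ms_error)
    r_mean = (ms_items - ms_error) / ms_items
    return r_single, r_mean


# Illustrative data: a shared 5-point trend plus independent noise per rater.
rng = np.random.default_rng(1)
trend = np.array([3.0, 5.0, 7.0, 6.0, 4.0])
ratings = trend[:, None] + rng.normal(size=(5, 12))
print(ebel_reliability(ratings))
```

In this formulation the reliability of the group mean is always at least as high as that of a single rater, which is the pattern seen in the reported coefficients (about 0.99 for the groups against about 0.42 for the average individual).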
Thus, on conventional criteria, these retrospective alertness ratings would appear to have an acceptably high level of reliability. Nevertheless, it is clear that the trend obtained from any given subject may be relatively unrepresentative, raising the question as to the number of subjects that might be required to obtain a representative trend for the population from which they are drawn. The first 160 nurses in each of the present groups were thus arbitrarily assigned to subgroups of 5 nurses each, and these were then combined into groups of 10, 20, 40 and 80, with the reliability coefficients being recalculated at each stage. The results are shown in Table 3, from which it is clear that respectable levels of reliability were achieved with relatively small samples of subjects. Thus it would appear that these retrospective alertness ratings show considerable consistency across individuals. Such consistency in these ratings could, however, reflect a number of factors, such as systematic rater bias, and need not imply that they are valid measures of alertness.
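The dependence of reliability on group size can be illustrated with the Spearman-Brown relation, which links the reliability of a single rater to that of the mean of n raters. This is a sketch under the assumption that this relation applies to the coefficients reported above; the values it prints are projections, not the entries of Table 3, and the function name is hypothetical.

```python
def spearman_brown(r_single, n_raters):
    """Projected reliability of the mean of n_raters ratings."""
    return n_raters * r_single / (1 + (n_raters - 1) * r_single)


# Projections from the reported single-rater coefficient for the
# permanent night nurses (0.416) at the subgroup sizes used in the text.
for n in (5, 10, 20, 40, 80):
    print(n, round(spearman_brown(0.416, n), 3))

# With the 188 permanent night and 219 rotating nurses who had complete
# ratings (an assumed choice of n), the projection matches the reported
# group coefficients of 0.993 and 0.994.
print(round(spearman_brown(0.416, 188), 3))
print(round(spearman_brown(0.425, 219), 3))
```

On this projection a group of 10 nurses would already give a coefficient of about 0.88, in keeping with the statement that small samples achieve respectable reliability.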

Validity
Our primary concern here was whether these retrospective alertness trends were a valid indicator of those obtained using more traditional concurrent ratings. A secondary concern was whether they might also relate to a more objective measure, namely response speed on a serial choice reaction time task. For the purposes of the present paper, the 2-hourly data points available from the concurrent, longitudinal study at each time on each shift for each nurse were averaged to give a mean trend across each shift for each nurse. In the case of the night shift, the data from the first night of each span of successive night shifts were omitted to ensure comparability with the retrospective ratings. These trends comprised four time points on the early (0800-1400 hours) and late (1400-2000 hours) shifts, and five on the night (2200-0600 hours) shift, that were coincident with subsets of those used in the retrospective study.
Of the 32 rotating nurses who took part in this study, complete trends were available from 22 nurses on the early shift, from 23 nurses on the late shift, but from only 15 on the night shift, reflecting the fact that 11 of the nurses did not work a night shift during the course of the 28-day period of the study. Further, of the 15 nurses for whom concurrent trends were available on the night shift, 5 had either failed to provide retrospective ratings for one of the shifts or lacked complete concurrent trends across all three shifts. Thus, complete retrospective and concurrent ratings across all three shifts were only available from 10 rotating nurses. Similarly, complete night shift trends were available for both concurrent and retrospective ratings for only 23 of the 30 permanent night nurses who took part in the concurrent, longitudinal study. There was no evidence that these nurses for whom complete records were available differed from the samples from which they were drawn on any of their biographical details.
There was a highly reliable average cross-correlation between the individual retrospective and concurrent trends over the three shifts in the rotating nurses (r_av = +0.585, df = 100, p < 0.001). Of the 10 individual cross-correlations, 5 were reliable (p < 0.05), whereas all ten were positive (binomial test, p < 0.001). Similarly, the average cross-correlation for the permanent night nurses was also highly reliable (r_av = +0.611, df = 46, p < 0.001) despite the fact that each correlation was based on only five pairs of ratings. Of the 23 individual correlations, 6 were reliable (and positive, p < 0.05), whereas only four were negative (binomial test, p < 0.001). Nevertheless, it should be emphasized that the average cross-correlations accounted for <40% of the common variance, suggesting that the results from any single individual should be interpreted with extreme caution.
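Two of the figures in this paragraph can be checked with simple arithmetic, sketched below. The shared variance is the square of the correlation, and the binomial (sign) test asks how likely so many positive correlations would be if positive and negative values were equally probable. The text does not say whether the binomial test was exact or approximate, or one- or two-tailed; the one-sided exact form here is an assumption, and the function names are hypothetical.

```python
from math import comb


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2


def sign_test_p(n_positive, n_total):
    """One-sided exact probability of at least n_positive positive results
    out of n_total when positive and negative are equally likely."""
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


# Average cross-correlations reported for the two groups.
print(round(shared_variance(0.585), 3), round(shared_variance(0.611), 3))

# All 10 individual cross-correlations positive in the rotating nurses.
print(sign_test_p(10, 10))
```

Both squared correlations fall below 0.40, as stated, and the probability of ten positive values out of ten is 1/1024, just under 0.001.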
Of more concern, however, were the high cross-correlations between the averaged retrospective and concurrent trends (shown in Fig. 3) for the rotating (r = +0.951, df = 11, p < 0.001) and permanent night nurses (r = +0.947, df = 3, p < 0.01). These imply that even when based on fairly small numbers of subjects (e.g. 10), mean retrospective trends may give a relatively valid estimate of the mean trend that would be obtained from the same subjects using more time-consuming, concurrent rating techniques. As a further check on this, analyses of variance with the Greenhouse-Geisser correction for sphericity were performed to see whether the same conclusions would have resulted from the sole use of either the concurrent or retrospective ratings. The first analysis was confined to the rotating nurses and examined the within-subject factors of type of rating (two levels), type of shift (three levels) and time within shift (four levels). For this purpose the 0800-hour reading following the end of the night shift was excluded to equate the number of times on each shift. This analysis yielded no reliable main effect of type of shift (F = 2.841, df = 2, 18, p > 0.05), a highly reliable effect of time within shift (F = 11.877, df = 3, 27, p < 0.001) and a substantial interaction between these two factors (F = 7.150, df = 6, 54, p < 0.001). More importantly in the present context, there was no evidence of any interaction involving the type of rating (F < 1, p > 0.25 in all three cases). Separate analyses of the ratings on each type of scale confirmed that both scales showed a reliable main effect of time within shift and an interaction of this factor with type of shift, but no main effect of this latter factor.
A further analysis of variance compared the trends over the night shift obtained from the rotating and permanent nurses. This examined the between-subjects factor of group (two levels), and the within-subjects factors of type of rating (two levels) and time within shift (five levels). There were reliable main effects of group (F = 7.254, df = 1, 31, p < 0.05), type of rating (F = 341.189, df = 1, 31, p < 0.001) and time within shift (F = 45.152, df = 4, 124, p < 0.001) but no evidence for an interaction between group and time within shift (F = 1.892, df = 4, 124, p > 0.10). Further, there was no evidence of any interaction involving type of scale (p > 0.10 in all three cases). Separate analyses based on each scale confirmed that both scales showed reliable main effects of group and time within shift, but no evidence for a group by time within shift interaction. It was noteworthy that, across the two sets of analyses, all four reliable F-ratios based on the retrospective ratings were larger than the corresponding ones based on the concurrent ratings.
Finally, the mean trends in response speed (number of responses per second) across the shifts are shown in Fig. 3. These trends are based on all 10 rotating nurses who completed the concurrent alertness ratings, but on only 19 of the 23 permanent night nurses who did so due to missing readings. The cross-correlations between these average trends in response speed and those for the retrospective alertness ratings from the same nurses were +0.813 for the rotating nurses (df = 10, p < 0.01) and +0.632 for the permanent night nurses (df = 3, p > 0.05). These correlations did not differ reliably from the cross-correlations of +0.880 and +0.861, respectively, that were obtained between these trends in response speed and those in concurrently rated alertness. Thus it would appear that these retrospective ratings of alertness were not substantially worse than concurrent ones in "predicting" within-group variations in response speed. However, it is clear from Fig. 3 that the absolute difference between the permanent night and rotating nurses in their alertness ratings on the night shift was reversed in these response speed measures. It seems possible that this may reflect the age difference between these groups (see Table 1).

DISCUSSION
These results clearly suggest that retrospective survey ratings may indeed provide sensitive, reliable and valid estimates of circadian variations in alertness from relatively small groups of subjects. Indeed there was no evidence that they were less sensitive or reliable than more traditional concurrent ones. The fact that the trends obtained from the rotating nurses clearly differed considerably across the three shifts suggests that, at least with sufficient experience, people are capable of distinguishing between days on which they sleep (and work) at different times. This suggests that even non-shiftworkers might be able to distinguish between, for example, rest and work days, or, given sufficient experience, between days on which they have slept at a substantially different time to normal, in making these ratings. Further research is, however, clearly needed to confirm this suggestion.
Perhaps the most important, and surprising, finding was that the retrospective ratings were sensitive to differences between relatively small groups of subjects. Thus the ratings of the permanent night nurses were, in line with their concurrent ratings, consistently higher than those of the rotating nurses when working on the night shift. In view of the biographical differences between these groups, and the reversal of this effect in the response speed measures, it would be wrong to attribute this to a beneficial effect of permanent night shifts. However, it does suggest that these retrospective survey ratings of alertness might usefully be used for a variety of purposes. These might include comparisons of the trends obtained from groups differing in their "morningness", age and gender, as well as comparisons between groups of people suffering from specific types of sleep disorder that are thought to relate to circadian abnormalities. Thus, and especially if used in conjunction with the alertness model (8), they offer the possibility of a survey technique for examining differences in the amplitude or phase of circadian rhythms in alertness.
In conclusion, although we would not advocate abandoning the use of the costlier, more time-consuming, concurrent measurements of fatigue, it would appear that retrospective ratings might profitably be used on a large scale, if only in pilot studies, to complement them.