Validation of the Bluebelle Wound Healing Questionnaire for assessment of surgical‐site infection in closed primary wounds after hospital discharge

Background Accurate assessment of surgical‐site infection (SSI) is crucial for surveillance and research. Self‐reporting patient measures are needed because current SSI tools are limited for assessing patients after leaving hospital. The Bluebelle Wound Healing Questionnaire (WHQ) was developed for patient or observer completion; this study tested its acceptability, scale structure, reliability and validity in patients with closed primary wounds after abdominal surgery. Methods Patients completed the WHQ (self‐assessment) within 30 days after leaving hospital and returned it by post. Healthcare professionals completed the WHQ (observer assessment) by telephone or face‐to‐face. Questionnaire response rates and patient acceptability were assessed. Factor analysis and Cronbach's α examined scale structure and internal consistency. Test–retest and self‐ versus observer reliability assessments were performed. Sensitivity and specificity for SSI discrimination against a face‐to‐face reference diagnosis (using Centers for Disease Control and Prevention criteria) were examined. Results Some 561 of 792 self‐assessments (70·8 per cent) and 597 of 791 observer assessments (75·5 per cent) were completed, with few missing data or problems reported. Data supported a single‐scale structure with strong internal consistency (α greater than 0·8). Reliability between test–retest and self‐ versus observer assessments was good (κ 0·6 or above for the majority of items). Sensitivity and specificity for SSI discrimination was high (area under the receiver operating characteristic (ROC) curve 0·91). Conclusion The Bluebelle WHQ is acceptable, reliable and valid with a single‐scale structure for postdischarge patient or observer assessment of SSI in closed primary wounds.


Introduction
Surgical-site infection (SSI) is the third most common healthcare-associated infection in the UK 1 influencing patient outcomes, quality of life and healthcare resources 2 . Rates of SSI vary considerably, depending on the type of surgery performed (for instance, clean or contaminated) and individual patient risk factors. Many SSIs take time to become apparent, often developing or becoming symptomatic after the patient has left hospital 3,4 . Rate estimates are influenced by methods and timing of data collection, particularly the robustness of postdischarge follow-up 2,5,6 . Accurate assessment after discharge is therefore key to SSI surveillance and research is needed to minimize this important healthcare issue 7 .
Assessing wounds for SSI after hospital discharge can be done by patient self-reporting, by asking patients to return for an outpatient appointment, or by conducting home visits. The latter two methods are resource-intensive 8 . Patient self-reporting can reduce these burdens, although accurate tools are needed. Existing postdischarge self-reporting questionnaires for patients 9 -11 have methodological weaknesses. They have been adapted from tools intended for completion by a professional, lack patient input in their development, have not been validated for use in a postdischarge setting, and have been criticized because they do not account for symptom severity -an important aspect in SSI diagnosis 12 . The Bluebelle Wound Healing Questionnaire (WHQ) was developed with input from patients and multidisciplinary healthcare professionals to address these limitations. It assesses signs, symptoms and wound care interventions relevant for the diagnosis of SSI in closed primary wounds, specifically after the patient has left hospital 13 . Early work 13 has demonstrated that the WHQ is comprehensive, easily understood, and can be completed by patients and/or observers (healthcare professionals). The present study examined the acceptability, scale structure, reliability and validity of the WHQ in a large sample of patients undergoing surgery with closed primary abdominal wounds.

The Bluebelle Wound Healing Questionnaire
The WHQ was developed as part of the Bluebelle study 14 , a feasibility study that included a pilot RCT to examine whether an RCT of different wound dressing strategies for reducing SSI was possible 14,15 . Initial development of the WHQ has been reported previously 13 . The WHQ was designed as a single questionnaire for patient and/or observer completion.
The version of the questionnaire undergoing validation in this study consisted of 16 items: eight relating to signs and symptoms of SSI, and eight relating to wound care interventions. Two of these items included additional components, collecting more detail on signs and symptoms, if applicable. Early versions of the questionnaire also included questions on resource use (for the wider Bluebelle feasibility study) that were not relevant to the diagnosis of SSI, and therefore are not included in these analyses.

Study design
Two data sets from the Bluebelle study were used in this analysis: data from a cohort recruited specifically to validate this new measure; and data from the pilot RCT. Research ethical approval was granted from the National Health Service (NHS) Health Research Authority National Research Ethics Service (NRES) Committee London -Camden and Kings Cross (reference 14/LO/0640) and the South West -Frenchay Research Ethics Committee (reference 15/SW/0008). Eligible participants were aged over 16 years, undergoing elective or unplanned abdominal general surgery or caesarean section. Participants who lacked capacity, ability to read or understand English, and prisoners were excluded. Further inclusion and exclusion criteria were relevant to the wider requirements of the Bluebelle study and have been reported previously 14 . Studies ran between August 2015 and January 2016, and between March 2016 and November 2016 (cohort study and pilot RCT respectively) from four UK NHS hospital trusts. Participants were recruited by research nurses, surgical trainees or other trained members of the study team on hospital wards before or after surgery. Potential participants were given an information leaflet, and were provided with sufficient time to consider involvement and discuss the study before being approached again to take part. All participants were asked to give written informed consent. Fig. 1 illustrates the study design and data collection.

Wound Healing Questionnaire self-assessment
The WHQ was distributed by post for participants to complete and return (by stamped addressed envelope, included) 30 days after surgery. Instructions were to complete the WHQ in relation to events since hospital discharge. A subset of 50 cohort participants (sampled during 1 month of the study) were posted an additional WHQ within 1 week of completing the first WHQ (for test-retest assessment). In a series of debriefing questions included with the WHQ, data were collected from the cohort participants on the time needed for WHQ completion, whether help was required, and whether items were confusing or difficult to answer. Reminders for non-responders were sent only to participants of the pilot RCT.

Wound Healing Questionnaire observer assessment
In the cohort study, the WHQ was completed by a clinical member of the study team via a telephone call with participants 4-5 weeks after surgery. In the pilot RCT, the WHQ was completed by a clinical member of the study team during the participant's face-to-face follow-up appointment between 4 and 8 weeks after surgery.
Reference diagnoses of whether SSI had occurred since the time of surgery were made in face-to-face study follow-up appointments between 4 and 8 weeks after the operation using Centers for Disease Control and Prevention (CDC) criteria and classification of no SSI, superficial, Total participants included in analysis n = 792 Available data: WHQ self-assessment n = 561 WHQ observer assessment n = 597 Self-and observer assessment n = 470 Reference SSI diagnosis n = 417  16 . Diagnoses were made by an independent member of the study team, blinded to the WHQ self-and observer assessment, using any available sources of information from the participant and hospital records. All pilot RCT participants and a convenience sample of cohort participants (sampled by availability due to limited study resources) underwent a face-to-face reference wound assessment.

Analyses
All 16 items were included in the initial analysis. Missing responses to the items with multiple components (collecting more detail on signs and symptoms, if applicable) were imputed with values of zero if no response was expected (for example, if the sign or symptom had not occurred). Summation of item scores was performed as suggested by the data. Reference SSI diagnoses were dichotomized to create a binary variable with 0 = no SSI and 1 = SSI of any type (combining CDC classifications of superficial, deep and organ/space SSI due to low numbers of reported deep and organ/space SSI).

Acceptability
Acceptability of the WHQ was explored in three ways: first, by examining response rates (the proportion of completed WHQ self-and observer assessments); second, by exploring missing responses to individual items (indicating possible issues such as not understanding the item); and third, by examining answers to the debriefing questions.

Scale structure
Exploratory factor analyses examined the underlying structure and constructs of the questionnaire. Analyses were conducted separately for self-and observer data. First, all iterations of item pairs were explored using Pearson's correlation coefficients. Pairs with very high correlations (r = 0⋅9 or above) were examined for similarity and considered for redundancy and exclusion before conducting factor analyses 17 . Next, three separate factor analysis models were run, specifying the maximum number of factors to be retained as one, two and three factors (maximum-likelihood method of estimation). Models were initially explored with data from the cohort study, and the best-fitting model was applied to data from the pilot RCT as a method of independent validation of the scale structure. The best-fitting model was applied finally to the combined cohort and pilot RCT data. A sensitivity factor analysis was performed using a polychoric matrix because of the ordinal, categorical nature of the WHQ data 18 . Multitrait scaling analyses were also applied as a comparative statistical approach 19 . Internal consistency (internal reliability) of the scales identified from the factor analyses was examined using Cronbach's α coefficient 17 . Values greater than 0⋅7 were considered to have good internal consistency 17 .

Reliability
Test-retest reliability 20 was assessed by comparing selfassessment responses to the WHQ completed twice over a period of anticipated stable health. Stable health was assumed if responders reported that they had not been back into hospital for treatment with a problem with the wound (item 11) in the retest assessment. Cross-tabulations of responses and weighted κ statistics were calculated.
Equal weights between response categories for ordinal items (items 1-8) were assumed, with weighted values of 0, 0⋅333, 0⋅667 and 1 between categories. κ values below 0⋅4 were considered to indicate poor agreement. Values between 0⋅4 and 0⋅75 were considered to indicate fair to good agreement 17 .
Inter-rater reliability (agreement between self-and observer assessments, where data from both assessments were available) was explored, to examine the reliability of the self-assessment for collecting outcome data in a future large-scale trial. Cross-tabulations of item responses and weighted κ statistics were calculated as described above. Percentages of agreement and discordance were examined.

Validity
Criterion validity was examined against the reference SSI diagnosis to demonstrate how well the WHQ performed in discriminating between individuals with and those without SSI. Cross-tabulations of the reference CDC diagnosis ('no SSI' or 'SSI of any type') and a binary variable of the self-assessment WHQ total score (created by a cut-off score; for instance, a WHQ total score of less than or equal to x) were compared. Sensitivity and 1 − specificity values of the WHQ for different cut-off scores were used to plot a receiver operating characteristic (ROC) curve, representing the trade-off between sensitivity and specificity 21 .
The overall ability of the WHQ to discriminate between individuals with and those without SSI was measured by the area under the ROC curve (AUC) and 95 per cent confidence intervals. An AUC value approaching 1⋅0 was interpreted to indicate good discrimination with high sensitivity and specificity, whereas a value of 0⋅5 was interpreted as the measure not being able to discriminate at all 21 .

Modifications for the final questionnaire
Findings from the above were used to inform modifications to the final version of the WHQ, considering rates of missing data for individual items, answers to the debriefing questions and overlap between items (if correlations of r greater than 0⋅9 were observed).

Response rates
Self-and observer WHQ assessments were completed for 561 of 792 (70⋅8 per cent) and 597 of 791 (75⋅5 per cent) participants respectively, with 470 of 791 (59⋅4 per cent) of these participants having both sets of data completed.
In total, 104 of 792 participants (13⋅1 per cent) did not have any WHQ self-or observer assessments available (complete non-responders).

Missing responses to items
Less than 3 per cent of responses were missing for most items (10 of 16) in the self-assessments and no items had more than 4 per cent of responses missing (Table S1, supporting information). For observer assessments, nearly all items (15 of 16) had less than 2 per cent of responses missing. Missing responses to the additional components of the two items for which further information on signs and symptoms was intended to be collected (if applicable) were, however, high, with up to 43 per cent of self-assessments missing a response when one would have been expected. Missing responses to these additional components were lower in the observer assessments, although levels were still notable and ranged between 8 and 17 per cent (Table S1, supporting information).

Responses to debriefing questions
Most participants (276 of 302, 91⋅4 per cent) reported that the questionnaire took fewer than 10 min to complete. Less than 6 per cent reported needing help or finding items difficult or confusing to answer.

Scale structure
A high correlation (r = 0⋅95) was observed between item 4 ('Have the edges of any part of the wound separated/gaped open of their own accord? (spontaneous dehiscence)') and its additional component collecting further information: 4a ('Did the skin separate?'). Study team agreement of similarity in the underlying concept of these questions deemed item 4a to be redundant, and it was therefore excluded from factor analyses. Factor analyses of the cohort and pilot RCT data separately supported a single-scale structure. Results from the combined data set are shown in Table 2. Item factor loadings ranged between 0⋅32 and 0⋅87 in data from participant self-assessments, and between 0⋅33 and 0⋅85 in data from observer assessments (with the exception of one item with a factor loading of 0⋅03). Examination of eigenvalues and factor loadings provided little Data suggested it was sensible to calculate a WHQ total score by summing the raw scores for each item without any weightings.
Cronbach's α for a single scale was high, with coefficients of 0⋅86 in participant data and 0⋅88 in observer data.

Test-retest reliability
A total of 44 of 50 participants (88 per cent) included in the test-retest sample (who all reported stable health) completed and returned a second WHQ. The median time between test-retest assessments was 5 (i.q.r. 4-7) days. Agreement in responses for test-retest assessments was high, with levels of observed agreement greater than 86⋅2 per cent for all items (Table S2, supporting information). Where it was possible to calculate a reliable κ statistic, the majority of values were greater than 0⋅59.

Inter-rater reliability
Self-and observer assessments were available for 59⋅4 per cent of participants, with a median of 8 (i.q.r. 2-16) days between assessments. Agreement was generally high (observed agreement for any item greater than 84⋅3 per cent), although participants showed a trend to report levels of signs and symptoms to be slightly more severe than observers; an example from one item is shown in Fig. 2 (for data from all items, see Fig. S1, supporting information). Where it was possible to calculate a reliable κ statistic, values were between 0⋅40 and 0⋅74 for the majority of items (Table S3, supporting information). Some minor discrepancy was shown between participant and observer responses to wound-care intervention items, and whether these interventions had occurred.

Validity
Reference SSI diagnoses (face-to-face, using CDC criteria) were available for 417 of 791 participants (52⋅7 per cent). Sensitivity and specificity values of the WHQ self-assessment for discriminating between SSI and no SSI were high, with an area under the ROC curve of 0⋅91 (95 per cent c.i. 0⋅83 to 0⋅98) (Fig. 3). Cross-tabulation of the self-assessment WHQ total score (excluding item 4a) with the reference SSI diagnosis is provided in Table S4 (supporting information). Sensitivity and specificity values for selected WHQ cut-off scores are shown in Table S5 (supporting information). From the present data set, a cut-off score of 6-8 appeared to be a reasonable threshold for suggesting no SSI/SSI compared with the reference diagnosis, with relatively few misclassifications.

Modifications for the final questionnaire
Evidence supported the need for minor revisions to the WHQ format to improve its efficiency and minimize missing data. Item 3 and its additional components collecting more information (3a-c) were restructured into three stand-alone items. Item 4a was removed. Items were renumbered to accommodate these changes. The response option of 'don't know' was removed. Questions collecting resource use purely for the purposes of the economic analysis of the Bluebelle pilot RCT were no longer included. The final WHQ items, after these revisions, are shown in Table 3.

Discussion
This study examined the acceptability, scale structure, reliability and validity of the WHQ for use as a patient-or observer-completed tool for the assessment of SSI in closed primary surgical wounds after abdominal surgery. The WHQ was found to be acceptable to patients and demonstrated good response rates, with low levels of missing data. Analyses supported a single-scale structure to assess SSI that made clinical and practical sense. Test-retest reliability was high, and agreement between participants and observers was good. The WHQ demonstrated high sensitivity and specificity for SSI discrimination compared with a face-to-face reference CDC diagnosis. It is therefore suggested that the WHQ is an acceptable, reliable and valid patient-reported or observer-completed questionnaire for assessing SSI in closed primary surgical wounds.
Existing self-reported questionnaires for patients have been adapted mostly from the CDC criteria and ASEPSIS tools 11,16,22 . They are limited because of the lack of user involvement in development. Criticisms include that they are complicated and difficult to complete 11,23 . These self-reporting measures are also limited in their design, such as asking for yes/no responses to questions without the option to report the amount or severity of the sign/symptom. This is important when assessing a wound, as demonstrated, for example, in a recent study 12 that found the amount of exudate was more strongly associated with SSI than with the type of exudate. Existing patient measures, however, do not provide an opportunity for the amount of exudate to be captured. The same study also highlighted that bright red skin was observed in patients who had SSI, but also in patients who did not, providing another example where capturing the amount or severity of a sign/symptom rather than just its presence or absence is important. The WHQ has addressed these limitations by involving a multidisciplinary team (including patients, surgeons, nurses, microbiologists and health service researchers) in its development and by using a combination of qualitative and quantitative methods; it also underwent rigorous pretesting during development to ensure face and content validity 13 . The result is a reliable, valid, comprehensive and uncomplicated questionnaire that includes an ordinal response scale to capture symptom severity.
The study has some limitations. First, a true standard for the diagnosis of SSI without subjective perceptions or opinions is lacking, with the result that tests for criterion validity are limited. The CDC classification of SSI diagnosis was chosen as the best available reference standard for comparing the WHQ as it is the most commonly used and widely regarded tool available. Second, reports of the more major wound care interventions (such as debridement and drainage) were rare in this data set; this may have an impact, for example, on factor analyses. In addition, some discrepancy was observed between participant and observer reports of these major interventions, suggesting possible low fidelity of participant responses. Although these more major interventions were rare in this data set and the number of discordant reports between self-and observer assessments were few, this discrepancy may be important to consider and warrants further investigation, as it may have implications for studies relying solely on patient self-assessment for collecting outcome data. Missing data in responses to the additional component parts of items collecting further information on signs and symptoms (if applicable) were relatively high, although this may be explained by the layout of the questionnaire; modifications in the revised version aim to address this. Although the wide range of abdominal operations is a strength of this study, it is recognized that the proportion of participants undergoing caesarean section (12⋅2 per cent) is likely to have affected the representative age of the rest of the sample presenting for general abdominal surgery and may have affected the findings. Finally, other limitations of this work relate to its testing and use after abdominal surgery alone, and for wounds healing by primary intention.
Further use and validation of the final version of the WHQ in other types of wound and surgical specialty is underway. Cut-off scores for SSI diagnosis will be explored.
In addition, members of the research group are exploring the feasibility of collecting digital images of the wound taken by patients as a tool to use in conjunction with the WHQ for improving remote and blinded SSI assessment. Advances in digital technology, including the use of smart phones and other tablet devices with cameras, mean that obtaining data from patients after discharge is becoming increasing possible 24,25 . These moves towards using digital technologies to obtain patient-reported data, including images of wounds, have great potential for improving SSI assessment and ultimately patient care.