Adaptation of the Wound Healing Questionnaire universal-reporter outcome measure for use in global surgery trials (TALON-1 study): mixed-methods study and Rasch analysis

Abstract Background The Bluebelle Wound Healing Questionnaire (WHQ) is a universal-reporter outcome measure developed in the UK for remote detection of surgical-site infection after abdominal surgery. This study aimed to explore cross-cultural equivalence, acceptability, and content validity of the WHQ for use across low- and middle-income countries, and to make recommendations for its adaptation. Methods This was a mixed-methods study within a trial (SWAT) embedded in an international randomized trial, conducted according to best practice guidelines, and co-produced with community and patient partners (TALON-1). Structured interviews and focus groups were used to gather data regarding cross-cultural, cross-contextual equivalence of the individual items and scale, and conduct a translatability assessment. Translation was completed into five languages in accordance with Mapi recommendations. Next, data from a prospective cohort (SWAT) were interpreted using Rasch analysis to explore scaling and measurement properties of the WHQ. Finally, qualitative and quantitative data were triangulated using a modified, exploratory, instrumental design model. Results In the qualitative phase, 10 structured interviews and six focus groups took place with a total of 47 investigators across six countries. Themes related to comprehension, response mapping, retrieval, and judgement were identified with rich cross-cultural insights. In the quantitative phase, an exploratory Rasch model was fitted to data from 537 patients (369 excluding extremes). Owing to the number of extreme (floor) values, the overall level of power was low. The single WHQ scale satisfied tests of unidimensionality, indicating validity of the ordinal total WHQ score. There was significant overall model misfit of five items (5, 9, 14, 15, 16) and local dependency in 11 item pairs. The person separation index was estimated as 0.48 suggesting weak discrimination between classes, whereas Cronbach’s α was high at 0.86. 
Triangulation of qualitative data with the Rasch analysis supported recommendations for cross-cultural adaptation of the WHQ items 1 (redness), 3 (clear fluid), 7 (deep wound opening), 10 (pain), 11 (fever), 15 (antibiotics), 16 (debridement), 18 (drainage), and 19 (reoperation). Changes to three item response categories (1, not at all; 2, a little; 3, a lot) were adopted for symptom items 1 to 10, and two categories (0, no; 1, yes) for item 11 (fever). Conclusion This study made recommendations for cross-cultural adaptation of the WHQ for use in global surgical research and practice, using co-produced mixed-methods data from three continents. Translations are now available for implementation into remote wound assessment pathways.


Introduction
Surgical-site infection (SSI) is the most common complication of abdominal surgery, and has a cross-societal, global impact on patients and their families [1][2][3][4][5] . Delayed return to work, readmission or reoperation leads to substantial effects on quality of life during recovery, and has spill-over effects on mental, economic, and social well-being for patients 6 . This is particularly relevant in low-resource settings, where patients are more likely to suffer catastrophic expenditure around the time of surgery 7 . Consequently, research in SSI prevention has been prioritized by patients, researchers, and clinicians in low- and middle-income countries (LMICs) 8 .
Timely identification of SSI is essential in maintaining patient safety after hospital discharge. Missed SSI diagnoses or misclassification of SSI can directly and indirectly affect patient safety 9 : directly, through delayed intervention for patients with active infection, and indirectly, by introducing bias to randomized studies that feed into best practice guidelines 3,10 . Postdischarge surveillance is therefore considered to be a key quality marker in SSI research and is an important component of postoperative care pathways 10 .
The Bluebelle Wound Healing Questionnaire (WHQ) was developed and validated in the UK in the English language to support postdischarge surveillance for SSI after abdominal surgery 11,12 . This instrument has, however, not yet been adapted for cross-cultural and cross-language implementation in LMICs. High-quality, contextually relevant tools for remote wound evaluation are urgently needed to build resilient and sustainable surgical systems and support safe upscaling of capacity during pandemic recovery 13,14 . They are also needed to reduce loss to follow-up and risk of attrition bias in randomized trials by developing contextually relevant pathways for remote assessment 9 .
The aims of this mixed-methods study (TALON-1) were: to explore cross-cultural and cross-language equivalence, acceptability, and content validity of the WHQ across several LMICs; to assess the scaling and psychometric properties of the WHQ when used across different patient populations and subgroups using Rasch analysis; and to consolidate recommendations for adaptation of the WHQ for use in global surgical research by triangulating qualitative and quantitative data.

Methods
TALON-1 was a mixed-methods study embedded in an international randomized trial, conducted according to best practice guidelines, and co-produced with community and patient partners [15][16][17] . The study used qualitative and quantitative data to explore the extent to which the WHQ measured SSI as a concept, and the parameters of the latent trait (that is, an underlying outcome of interest) in the target (low-resource context) and source (the UK, a high-resource universal healthcare system) cultures. It then aimed to assess how accurately items could transfer meaning across languages 18 . Some adaptation of standard methodology was required to enable the qualitative phase to progress during the SARS-CoV-2 pandemic (Appendix S2). An overview of the study methodology is shown in Fig. 1 and detailed in Table 1.

Reporting and registration
This study was reported with reference to recommendations from the Global Health Network for qualitative research in LMICs, the COREQ framework 15,21 , and PCORI recommendations 16 for best practice in mixed-methods adaptation of outcome measures (PCORI checklist is available in Appendix S3). Primary data from FALCON were published in The Lancet in 2021 22 . The protocol for TALON-1 was preregistered on the MRC Hubs for Trial Methodology Research database 23 (Queen's University Belfast) (SWAT ID126) and published in Trials 20 .

Ethics and ethical approvals
This study within a trial (SWAT) was first approved within the pragmatic multicentre factorial RCT testing measures to reduce SSI in LMICs (FALCON trial) protocol by a University of Birmingham Research Ethics Committee (v1_0_substudies_v1_0. Reference: ERN_18-0230A). Additional approvals were then obtained from national, regional, and/or hospital-level ethics committees for selected centres in all participating countries, in accordance with local protocols. Written (or fingerprint) informed consent to participate was obtained from all participants. In the qualitative phase, an information sheet was provided to all participants. Verbal consent was taken and recorded. Participant data were pseudonymized for storage securely within a password-protected NVivo® V12 data management system. In the quantitative phase, written (or fingerprint) informed consent to participate was obtained from all participants. Quantitative data were stored in a secure REDCap server 24 .

Fig. 1 Overview of the study methodology
First, an expert review of the Wound Healing Questionnaire (WHQ) was conducted: structured interviews and focus groups with surgeons and site researchers involved in wound evaluation were used to gather rich data regarding cross-cultural, cross-contextual equivalence of the individual items and scale, and to conduct a baseline translatability assessment. Second, data from a prospective cohort study were interpreted using a Rasch unidimensional measurement modelling approach to explore scaling and measurement properties of the questionnaire, including cross-cultural differential item functioning. Next, qualitative and quantitative data were triangulated using a modified, exploratory, instrumental design model to recommend adaptations for use of the WHQ in global surgery research and practice 19 . Finally, translation into five languages was completed in accordance with Mapi recommendations. CEI, community engagement and involvement; qual., qualitative; quant., quantitative.
Adapted from Oxford University Innovation outcomes centre checklist, and Mapi process for cross-cultural and cross-language adaptation. SWAT, study within a trial; WHQ, Wound Healing Questionnaire.

Host trial
FALCON was a stratified, pragmatic, multicentre, 2 × 2 factorial trial testing two measures (skin preparation and antimicrobial sutures) to reduce superficial or deep skin infection after abdominal surgery in seven LMICs (NCT03700749) 1 . FALCON provided a platform for this study both to identify eligible site investigators for interviews and focus groups, and for co-recruitment of patients to the embedded prospective cohort study.

Study instrument
The WHQ was developed with the aim of detecting postdischarge SSI after abdominal surgery, and validated in a large feasibility study within a pilot RCT (Bluebelle) in the UK, as summarized in Appendix S4 12,25,26 . The WHQ includes 19 items (18 items and 1 subitem) related to the construct of surgical wound healing, with 11 items (10 items and 1 conditional subitem) related to symptoms of SSI, and 8 items related to interaction with the treatment pathway for SSI. It was designed so that it could either be administered by a healthcare professional, or self-reported by patients 27 (a universal-reporter outcome measure 28 ). Two developers of the WHQ were collaborating members of the Study Management Group.

Cross-cultural and cross-contextual adaptation
Owing to the number of target languages for the questionnaire in the host trial, cross-cultural adaptation was initially performed in the English language. Structured interviews were conducted with two to three research staff in each country, according to a template from the Social Research Association based on Willis 29 . Participants were purposefully sampled from sites participating in the FALCON trial (research nurses, or doctors directly involved in postoperative wound assessment), with a view to including an information-rich mix of participants by sex, country, patient population (urban/rural home location), and experience in face-to-face and telephone follow-up assessments. These interviews aimed to explore the universality of the construct of SSI, cross-cultural relevance of concepts, and construct validity of the questionnaire 18 . The topic guide was structured around four predefined categories (Appendix S5): item comprehension (patients' understanding of the idea and item), response mapping (relating a patient's internally generated answer to the response categories provided), retrieval (patients' ability to remember and recall their response), and judgement (patients' overall ability to respond to the item and how they came to this answer) 29 . Unstructured interview notes and a reflexive diary were also maintained as an additional data source. Coding was performed using a pragmatic qualitative approach informed by cognitive theory, by a clinician with training in relevant qualitative research methods and with 10 years' experience of working in international multicentre trials (Appendix S6). The reflexive diary supported interpretation of the interviewer's role as a questionnaire developer and the potential impact on data collection. To ensure credibility, member checking was undertaken with the final summary themes with representative participants and in-country consultants to ensure that meaning was correctly interpreted and maintained 30 .
To check trustworthiness, one or two focus groups were then held with investigators from each country to review and discuss the thematic coding. The focus groups were held after the interviews had been completed to explore consensus and contrasting opinions between different stakeholders around themes emerging in the semistructured interviews. The overall objective was to obtain a single cross-culturally adapted questionnaire to move into cross-language adaptation 31,32 . They were conducted in the English language and co-led by the lead researcher, with one or more in-country consultant co-leads, and sampled 8-12 participants, adopting purposive sampling criteria similar to those of the structured interviews (based on sex, country, patient population, and research experience). A new sample of participants (separate from those participating in interviews) was approached for the focus group phase. Where required, iterative adaptation of the WHQ was made until a point of saturation was reached according to accepted best practice principles for adaptation of instruments 16,33,34 . Recommendations from the qualitative phase were either made overall, specific to an individual item, or related to questionnaire administration. The focus group also included several investigators who were fluent in both the source and target language to serve as a baseline translatability assessment. Together, the process produced an English language questionnaire that had been adapted to broadly ensure cross-cultural equivalence across the participating countries, was acceptable to all national principal investigators, and highlighted potential translatability issues for cross-language adaptation. The procedures for remote, telephone administration of the WHQ were also explored using targeted questions based on investigators' experience within the FALCON trial.

Cross-language adaptation
In some countries, English was a primary or prevalent secondary language among the host trial participants. In these countries, the feasibility of single-language administration of the questionnaire was tested at sites during the cohort study. Where translation of the WHQ was required, this was performed according to the Mapi process for standard linguistic validation to verify conceptual equivalence across languages [34][35][36] . This involved a seven-step process alongside clinicians directly involved in wound assessment (Appendix S7).

Quantitative phase
Data for the quantitative phase were collected during a prospective, international cohort SWAT. Consecutive adult patients (aged over 18 years) recruited to the FALCON trial were eligible. These included a broad range of abdominal operations with a predicted clean-contaminated, contaminated or dirty operating field, and a planned skin incision of greater than 5 cm. Operations could be performed for benign, malignant, trauma, or obstetric indications. Consent for an additional telephone follow-up call to administer the WHQ was taken at the same time as trial consent, using a targeted Informed Consent Form and Patient Information Sheet. Patient and community partners supported co-production of these resources to ensure culturally attuned language and delivery.
Telephone administration of the translated WHQ was undertaken 28-30 days after surgery (in the 72 h before in-person follow-up) integrated into the host trial pathway. The telephone WHQ was administered by a researcher, doctor, or research nurse (non-consultant or attending grade), who was independent of the assessment for the trial primary outcome at 30 days after surgery. Optimization and quality assurance of WHQ administration is described in Appendix S8. No minimum

Psychometric testing using Rasch analysis
A simple summary of Rasch methodology for the general reader is provided in Appendix S9.
The Rasch unidimensional measurement model was fitted to examine the psychometric properties of the WHQ, identify anomalies in the data, and evaluate the extent to which the WHQ items are measuring the latent trait of wound infection 38,39 . Individual items were assessed for excessive misfit (that is, not measuring the trait in question) and response dependency (where items are related by more than just the underlying trait). Additionally, appropriate use of item response categories was checked using category probability curves and threshold mapping. Where probability curves were disordered, response categories were rescored and item fit was then re-examined. Where residual correlations between items were high, subtesting was carried out with re-evaluation of item and model fit. Differential item functioning (DIF) was examined for each item by country, language, and patient home location (urban/rural). Exploration of DIF was undertaken only where a subgroup included at least 50 complete WHQ responses.
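The check for disordered thresholds described above can be illustrated with a minimal sketch. It assumes a partial credit parametrization of the polytomous Rasch model, which the text does not specify, and the function names and threshold values are hypothetical illustrations rather than estimates from this study.

```python
import math

def category_probabilities(theta, thresholds):
    """Probability of each response category (0..m) for a person at
    location theta, under an assumed partial credit Rasch model."""
    # Category k has logit sum_{j<=k} (theta - tau_j); category 0 has logit 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_are_ordered(thresholds):
    """True when each threshold is higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Hypothetical thresholds, for illustration only.
ordered = [-1.0, 1.0]     # the middle category is the most likely near theta = 0
disordered = [1.0, -1.0]  # the middle category is never the most likely

for taus in (ordered, disordered):
    probs = category_probabilities(0.0, taus)
    print(thresholds_are_ordered(taus), [round(p, 3) for p in probs])
```

When thresholds are disordered, at least one category is never the most probable response at any point on the scale, which is the pattern that prompts rescoring of response categories.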

Triangulation
Qualitative and quantitative data were triangulated using data (between countries) and methodological (between qualitative interviews and psychometric analysis of quantitative data) triangulation, adopting a modified, exploratory, instrumental design model. Triangulation was performed item by item to enable the instrument to be finalized and consolidated in both source (English) and target languages 16,[40][41][42][43] . Finally, there was a phase of proofreading, before completion of a final report of the adapted WHQ, and adoption of this version for further prospective validation. Data were also triangulated regarding measurement procedures to optimize future implementation of remote follow-up pathways.

Community engagement and involvement
Patients and community members from LMICs were engaged in all phases of the design and delivery of this study. The interview topic guide was co-designed with input from a representative global surgery patient forum. Practicable methods for conducting interviews, and patient compensation for time in participation, were determined with the support of local community leaders. The Guidance for Reporting Involvement of Patients and the Public (GRIPP-2) short form was used to track and report the impact of CEI 44 .

Results

Qualitative phase
In total, 10 structured interviews and six focus groups were arranged with a total of 47 investigators across six countries. They included 34 surgeons, five anaesthetists, and eight research staff caring for patients in both urban and rural populations, and across a range of abdominal surgery disciplines. Interview duration ranged from 34 to 112 min, and focus groups lasted from 92 to 126 min. There was a median of 11 (range 6-16) participants involved in the focus groups. Interview and focus group data from site investigators confirmed that the assumption of a universalist approach to SSI was acceptable, and that symptomology and treatment paradigms were shared across settings. No divergence from this was identified during thematic analysis. This was also explored with the CEI partners; together, they confirmed content validity across settings. No new domains or concepts related to symptoms or treatment of SSI arose, further supporting content validity across contexts. A summary of qualitative data is presented for symptom items in Table S1 and treatment items in Table S2.
Themes emerged relating to comprehension, response mapping, retrieval, judgement, and novel cross-cultural insights. Translation was successfully completed in five target languages after the qualitative phase: French (Benin), Hindi (India), Kinyarwanda (Rwanda), Punjabi (India), and Tamil (India). For some potential languages of delivery, there was no written version of the dialect (for example, Goun in Benin, Fante in Ghana), and, on rare occasions, patients would travel a very long distance for treatment and spoke a language that was uncommon to the local area (for example, Malayalam in Northern India). Here, the questionnaire was translated ad hoc from English (source language) by the assessor in the cohort study.

Quantitative phase
An attempt was made to contact 655 patients in the cohort study across five countries, of whom five had died by 30 days (15 missing status). Of the 635 confirmed to be alive, 537 (84.5 per cent) were contactable for WHQ completion. Features of included patients are summarized in Table 2.

Unidimensionality of scale
The exploratory Rasch model was fitted using these data from 537 patients (369 excluding extremes) across five class intervals (Table S3). Both analysis of principal components between positively and negatively loading items (1.9 per cent, n = 10 independent t tests less than 5 per cent) and symptom and pathway items (0.6 per cent, n = 8) suggested unidimensionality of the WHQ instrument in detection of SSI.

Model fit and targeting
Overall, the model did not fit well, with a high probability of item-trait interaction (χ² 209.2, 76 d.f., P < 0.001) and a poor person separation index (0.48, low power of analysis). Conversely, Cronbach's α (with missing data excluded) demonstrated acceptable internal consistency, with a value of 0.86. There was a strong positive skew of person location values, with a mean(s.d.) person location of −2.91(1.05), demonstrating some mistargeting of the WHQ, as may be expected in a diagnostic or screening tool (Fig. 2). The item location map reflected clinical severity, with 168 of 537 participants (31.3 per cent) at the floor of the scale (no signs or symptoms of SSI), and item locations reflecting degrees of infection at the ceiling (Fig. 3).
Examination of individual-person fit did not reveal any significant misfit (s.d. of fit residual greater than +2.5 or less than −2.5). There was a high degree of correlation and dependence between items with local dependency in 11 item pairs (Table S5).
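The two reliability statistics reported above are computed from different inputs, which is one reason they can diverge. The following is a minimal sketch using the standard textbook definitions; the function names and toy data are hypothetical, Rasch software may differ in detail, and the sketch does not reproduce the values reported in this study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha from raw item scores.
    rows: one list of item scores per respondent, with no missing values."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def person_separation_index(locations, standard_errors):
    """Share of the variance in estimated person locations (logits)
    that is not attributable to measurement error."""
    observed = variance(locations)
    error = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    return (observed - error) / observed

# Toy data, for illustration only.
scores = [[0, 0], [1, 1], [2, 2]]  # two perfectly correlated items
print(cronbach_alpha(scores))
print(person_separation_index([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))
```

Cronbach's α uses the raw scores directly, whereas the person separation index depends on estimated person locations and their standard errors, so it is sensitive to how precisely respondents can be located on the scale.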

Triangulation
Triangulation of qualitative and quantitative data was performed item by item for the 11 symptom items (10 items and 1 subitem) and eight pathway items (Appendix S10). Where deductive cognitive themes or inductive cross-cultural themes arose, they were explored against individual item fit, dependency, and DIF in the Rasch model (Figs S1-S4). Recommendations were made for cross-cultural adaptation for WHQ items 1 (redness), 3 (clear fluid), 7 (deep wound opening), 10 (pain), 11 (fever), 15 (antibiotics), 16 (debridement), 18 (drainage), and 19 (reoperation). When triangulating disordered threshold probabilities (Figs 4 and 5) with corroborating or conflicting qualitative data, a recommendation was made to move to three item response categories (1, not at all; 2, a little; 3, a lot) for symptom items 1 to 10, and to two categories (0, no; 1, yes) for item 11 (fever). A summary of recommendations is displayed in Table 3, and the final adapted questionnaire in Appendix S11.
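The recommended change in response categories can be pictured as a simple rescoring step. The sketch below is illustrative only: it assumes that the original symptom items were coded 0 to 3 across four ordered categories and that the two highest categories are merged, neither of which is stated in the text, and the function names are hypothetical.

```python
# Assumed original coding (hypothetical): 0, 1, 2, 3 in order of increasing severity.
# Adapted coding from the text: 1, not at all; 2, a little; 3, a lot.
SYMPTOM_RESCORE = {0: 1, 1: 2, 2: 3, 3: 3}  # assumed merge of the two highest categories

def rescore_symptom(original_code):
    """Map an assumed four-category response to the adapted three-category scale."""
    return SYMPTOM_RESCORE[original_code]

def rescore_fever(original_code):
    """Map an assumed graded fever response to the adapted scale (0, no; 1, yes)."""
    return 0 if original_code == 0 else 1

print([rescore_symptom(code) for code in (0, 1, 2, 3)])
print([rescore_fever(code) for code in (0, 1, 2, 3)])
```

The final adapted questionnaire remains the authoritative source for the response categories.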

Fig. 5 Threshold map for Wound Healing Questionnaire
The higher the 'threshold' of transition in each item from a low scoring value to a higher scoring value, the more indicative ('difficult' in Rasch terminology) that item was in detecting the 'latent trait' (surgical-site infection, SSI) in the Rasch model. For example, item 19 (reoperated) and item 17 (wound scraping) had the highest threshold so were most likely to indicate SSI, whereas item 10 (tenderness) and item 13 (dressing) had the lowest threshold so were least likely to indicate SSI (in isolation). *Items with disordered thresholds (overlapping category probability curves seen in Fig. 4).
Translated versions of the adapted WHQ are provided in Appendix S12.

Measurement procedures
A summary of measurement procedures is shown in Often the telephone owner was a friend or relative (who was then able to connect the researcher directly to the patient) rather than the patient themselves (189 of 537, 35.2 per cent), and commonly this was a mobile phone (534 of 537, 99.5 per cent). In total, 154 of 537 (28.7 per cent) had a mobile phone with video capability. Feedback from CEI partners alongside interview data supported optimization of the telephone follow-up pathway for future implementation; this is presented in a toolkit available in Appendix S13.

Discussion
Pathways for remote assessment of common complications after surgery in low-resource settings are essential in improving the safety and resilience of surgical care systems. This mixed-methods study made recommendations for cross-cultural and cross-language adaptation of the WHQ for use in LMICs, and improved its relevance across cultures and for patients with lower levels of health literacy. Conceptual equivalence, and content and construct validity, were confirmed across languages using qualitative and translation methods. Unidimensionality, measurement properties, and use of the total WHQ score were seen to be valid within the Rasch framework, although the overall power of fit was low. The telephone pathway was demonstrated to be feasible and highly acceptable. Working with CEI partners, recommendations were made for optimization of telephone follow-up in research and postoperative surveillance programmes. This study provides a large, international, high-quality proof of concept for rapid adaptation and implementation of patient-reported measures in emerging global health arenas such as surgery.
The use of mixed methods here added strength and depth. The qualitative data were used primarily to inform cross-cultural adaptation ahead of translation. Although this was based on cognitive theory, data were collected indirectly about patient experience from frontline clinicians involved in wound assessment. The Rasch analysis supplemented this, and allowed patient-level data to enrich and inform final recommendations for adaptation. In a majority of instances, the qualitative and quantitative data were supportive of one another, demonstrating coherence during triangulation. Where conflict arose, qualitative findings were softened and/or caveated (that is, changes were recommended where there was coherence on triangulation, and further exploration recommended where there was conflict between the qualitative and quantitative data). Rasch analysis is an established method for instrument development and cross-cultural refinement 39,45,46 . Here, its principal value was in confirming the validity of use of the total WHQ score as an ordinal scale and in enhancing understanding of the response structure and local dependency. Properties of the WHQ, however, make it a rather unusual application of the Rasch model. First, it is principally a diagnostic tool for SSI rather than an interval-level tool measuring a spectrum of severity of a latent trait. This was best seen in mistargeting of the WHQ to the study population, with many patients at the 'floor' adding low information value to the model, as would be expected in a screening tool (where many patients are asymptomatic). This reduced the overall power of fit as many participants contributed little information about item locations. Second, as expected in a diagnostic test, many items had high levels of local dependency, which may have contributed to the overall model misfit. Third, several items misfit the Rasch model and the person separation index was poor, with a conversely high Cronbach's α value.
Again, this is highly likely to be due to the extreme 'floor' of respondents in the setting of a diagnostic tool. It was not the overall aim to fit this diagnostic tool closely to the Rasch model, and close fit would not be required for the tool to be valid for use, provided that it demonstrated a satisfactory psychometric structure, unidimensionality, and sufficient sensitivity and specificity upon clinical application. This highlights the importance of further work to validate the tool externally in a diagnostic test accuracy study.
Exploring complex relationships between items and optimizing the measurement properties using subtesting and adjusting for DIF was not the aim here, but warrants further investigation. It is feasible that the instrument could be simplified, or its diagnostic accuracy could be improved using Rasch by better accounting for differences in the symptomology and health-seeking behaviours of patients with SSI across countries. DIF by country observed for several items here supports methods to ensure balance in randomized trials, such as stratification or minimization of randomization by country.
This study has several limitations. Owing to safety and ethical concerns during the SARS-CoV-2 pandemic, cognitive interviewing could not be undertaken directly with patients. Instead, aggregate perspectives of frontline clinicians involved in the care of surgical patients were explored. This meant that the data represented clinicians' impressions of patients' responses, and challenges in retrieval and judgement, rather than direct exploration with patients in typical cognitive interviewing 29 . Sampling of researchers directly involved in the same portfolio of trials was a pragmatic decision, but may have reduced the transferability of themes across other hospital types (for example, remote rural hospitals), resource settings (such as hospitals with less research infrastructure) or differing populations (for example, less literate populations, with poorer access to healthcare). Thematic saturation overall was aimed for when ending recruitment to the qualitative phase, but this is unlikely to have been reached at an individual-country level 47 . It is, therefore, possible that important insights were missed during adaptation, although recommendations were strengthened by triangulation with quantitative data to reduce over-reliance on qualitative data alone 40 . Second, related to analysis, as the WHQ did not meet all the Rasch assumptions for model fit, a logit-adjusted scale was not developed. Further development could improve the measurement properties of the questionnaire to allow direct patient-to-patient comparisons in future research. Complex patterns of DIF in measurement that could lead to differences in point score equivalence across different patients with differing characteristics when applied clinically were not taken into account. Finally, related to interpretation, the most important metric of clinical utility in a screening tool such as this would be diagnostic test accuracy. 
A formal external validation study comparing the WHQ to a standard reference test for SSI is now required 20,48 . A choice of cut-off score for the adapted WHQ is likely to favour sensitivity to triage all patients with a likelihood of SSI to seek medical care.
The use of patient-reported outcome measures (PROMs) in low-income settings is complex; many instruments have not yet undergone cross-cultural and cross-language adaptation, and there is uncertainty about the feasibility of remote, digital methods. Although examples exist from established global health fields, such as cardiovascular disease, few studies in global surgery have adopted PROMs to date [49][50][51] . Health technology assessments thus neglect important insights into quality of recovery and health utility that could affect policy decisions 52 . This study provides a proof of concept for rapid, pragmatic adaptation of instruments in the surgical setting that can be used across other measures and emerging contexts. Developing culturally attuned, remote follow-up pathways is particularly important during pandemic recovery in building resilience in resource-poor health systems 53,54 . The co-produced pathway for telephone follow-up in LMICs described here is ready for wider adoption. Recommendations from this mixed-methods study can now be used for further exploration of the diagnostic accuracy of the adapted WHQ in low-resource contexts.