Standardization of Clinical Assessment and Sample Collection Across All PERCH Study Sites

Abstract Background. Variable adherence to standardized case definitions, clinical procedures, specimen collection techniques, and laboratory methods has complicated the interpretation of previous multicenter pneumonia etiology studies. To circumvent these problems, a program of clinical standardization was embedded in the Pneumonia Etiology Research for Child Health (PERCH) study. Methods. Between March 2011 and August 2013, standardized training on the PERCH case definition, clinical procedures, and collection of laboratory specimens was delivered to 331 clinical staff at 9 study sites in 7 countries (The Gambia, Kenya, Mali, South Africa, Zambia, Thailand, and Bangladesh), through 32 on-site courses and a training website. Staff competency was assessed throughout 24 months of enrollment with multiple-choice question (MCQ) examinations, a video quiz, and checklist evaluations of practical skills. Results. MCQ evaluation was confined to 158 clinical staff members who enrolled PERCH cases and controls, with scores obtained for >86% of eligible staff at each time-point. Median scores after baseline training were ≥80%, and improved by 10 percentage points with refresher training, with no significant intersite differences. Percentage agreement with the clinical trainer on the presence or absence of clinical signs on video clips was high (≥89%), with interobserver concordance being substantial to high (AC1 statistic, 0.62–0.82) for 5 of 6 signs assessed. Staff attained median scores of >90% in checklist evaluations of practical skills. Conclusions. Satisfactory clinical standardization was achieved within and across all PERCH sites, providing reassurance that any etiological or clinical differences observed across the study sites are true differences, and not attributable to differences in application of the clinical case definition, interpretation of clinical signs, or in techniques used for clinical measurements or specimen collection.

Current pneumonia treatment and prevention strategies are based mainly on data obtained from large clinical studies carried out in the 1980s. One such study, sponsored by the Board of Science and Technology for International Development (BOSTID), National Academy of Sciences, yielded valuable information on the pathogens present during acute respiratory infections (ARIs) in children <5 years old from resource-limited countries [1]. However, interpretation of the wide range of reported ARI incidence rates was complicated in part by the lack of a standardized case definition at the 10 participating study sites [2]. A subsequent literature review of pneumonia etiology studies, conducted between 2000 and 2010 on children aged <5 years, revealed wide disparity in case definitions, specimen collection techniques, and laboratory methods, which increased the complexity of data collation and analysis [3]. Other studies have demonstrated substantial interclinician variation in the interpretation of clinical signs of severe disease in children and young infants [4][5][6][7]. Standardization of the clinical [8], radiological [9], laboratory [10], and data management methods [11] at all PERCH sites has been prioritized since inception, as we wished to ensure that any observed variation in pneumonia etiology between sites was not attributable to methodological differences. The objectives of the clinical standardization program were to ensure that study staff (1) adhered strictly to the clinical case definitions; (2) were consistent in their assessment, recognition, and interpretation of clinical signs; (3) used standardized equipment and techniques for obtaining clinical measurements; and (4) used standardized methods for obtaining key clinical samples for laboratory testing. This paper describes the PERCH clinical standardization program of clinical training, retraining, and staff assessment that ran throughout the study.

Study Sites
At all sites (Table 1), clinical assessment and enrollment of PERCH cases and controls were carried out by doctors, nurses, and clinical officers (health workers with at least 3 years of formal clinical training). Nurses and field workers or research assistants took anthropometric measurements, assisted clinical staff with procedures, and identified and located PERCH community controls.

Preparatory Phase
The PERCH case definition (Table 2) was based on the 2005 World Health Organization (WHO) clinical definition of severe and very severe pneumonia [8]. The definition relies on the presence of prespecified clinical signs, without information from chest radiograph (CXR) or pulse oximetry. The PERCH enrollment period predated the 2013 reclassification of severe and very severe pneumonia by the WHO [12].
Through a series of teleconferences and 2 face-to-face meetings between all PERCH principal investigators (PIs), consensus was achieved on how to elicit, recognize, and interpret each of the signs and symptoms comprising the PERCH clinical case definition. Training materials and advice were sought from a wide variety of sources (see Acknowledgments). Many of the clinical video clips, audio recordings, and photographs were recorded at PERCH sites by the principal trainer (J. C.), with written informed consent from the patients' parents or guardians. Initial clinical standardization training occurred at all sites immediately prior to a period of pilot enrollment. All sites enrolled to the main study for 24 months, with refresher training carried out in the first and second years.
The initial 3-day training and subsequent 2-day refresher training courses were conducted at all sites by the principal trainer, with support from site project leaders. All cadres of PERCH staff (doctors, nurses, clinical officers, research assistants, and field workers) were trained together; interested local non-PERCH clinicians were invited to participate.
Training courses comprised lectures, discussion of case scenarios in small groups, practical sessions, and ward-based clinical teaching. The initial training lectures covered the background to the PERCH study, rationale for clinical standardization, recognition of the critically ill child, clinical assessment of the child with cough or difficulty breathing, vital signs, pulse oximetry, techniques for collection of nasopharyngeal/oropharyngeal (NP/OP) swabs, and anthropometry. Discussion of PERCH case scenarios, designed to test the trainees' ability to identify signs and symptoms that constitute study inclusion and exclusion criteria, took place in groups of 8-10 people, each group being led by the principal trainer and/or a local facilitator. Trainees were divided into groups of 5-8 for hands-on instruction in clinical assessment.
Staff members were asked to conduct clinical assessments on children, and were assessed on their ability to elicit and correctly interpret clinical signs.
Practical skills were taught through training videos, demonstrations, and hands-on practice in small groups, with key points highlighted in summary lectures. Clinically stable children acted as subjects for the anthropometry training. Staff learned NP/OP swab collection by practicing on each other. Clinicians from sites where induced sputum (IS) samples were routinely collected from children (Kenya, The Gambia, South Africa) trained staff from the other 4 sites. The clinical team in Kenya reviewed video recordings of the collection procedures in The Gambia and South Africa, to ensure that they were consistent with procedures in Kenya. IS training was included in the refresher courses, as was guidance on reducing blood culture contamination rates through improved phlebotomy technique. Ethical approval to perform diagnostic percutaneous needle lung aspiration among PERCH cases was only obtained in The Gambia, Mali, South Africa, and Bangladesh. Clinicians from The Gambia (where lung aspiration is performed frequently on children with focal consolidation on CXR [13]) trained PERCH staff from the other 3 countries. Pleural aspirates and gastric aspirates were not included in the training as they were not designated PERCH procedures, but were carried out as routine hospital procedures if clinically indicated.
Table footnote c. AVPU score: (1) clinician first assesses whether the child is alert; A = alert (child takes an age-appropriate interest in their environment); if child not alert, clinician tests, in sequence, V, P, and U, stopping when the child gives a positive response; (2) clinician calls the child's name without simultaneously touching him or her; V = response to voice (any consistent visual, verbal, or motor response to voice); (3) clinician presses on the base of the child's fingernail using a pencil or pen; P = response to pain (child withdraws digit); (4) U = unresponsive or unconscious (no response to pain).
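The sequential logic of the AVPU score described above (test alertness first, then voice, then pain, stopping at the first positive response) can be sketched in a few lines. This is an illustrative sketch only, not PERCH study software; the function name `avpu_score` and its three boolean inputs are hypothetical names introduced here to stand for the clinician's observations.

```python
def avpu_score(is_alert: bool, responds_to_voice: bool, responds_to_pain: bool) -> str:
    """Return 'A', 'V', 'P', or 'U' for one child.

    The inputs are hypothetical flags for the clinician's findings:
    - is_alert: child takes an age-appropriate interest in the environment
    - responds_to_voice: any consistent visual, verbal, or motor response
      when the name is called without touching the child
    - responds_to_pain: child withdraws the digit when the base of the
      fingernail is pressed
    Testing stops at the first positive response, so later flags are
    ignored once an earlier one is true.
    """
    if is_alert:
        return "A"
    if responds_to_voice:
        return "V"
    if responds_to_pain:
        return "P"
    return "U"  # unresponsive or unconscious: no response to pain
```

For example, a child who is not alert and does not respond to voice, but withdraws the finger on nail-bed pressure, is scored `avpu_score(False, False, True)`, that is, "P".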
All courses finished with a multiple-choice question (MCQ) examination, presentation of certificates, prizes for those achieving top scores, and a group photograph. All participants were invited to provide feedback, using a Likert scale to grade the quality of different course components.

Clinical Standardization Guidelines
Guidelines summarizing key information from the training program were distributed to all staff at the time of refresher training, with an electronic version made available on the internal PERCH study website.

Training Website
A training website (www.perchtraining.org), developed in association with a company specializing in digital healthcare (see Acknowledgments), had the following objectives: (1) to act as a repository for the clinical standardization training materials, thereby supporting the training of any new staff who had missed their initial site training course; (2) to provide continuing training of all PERCH staff throughout study enrollment; and (3) to facilitate regular evaluation of all PERCH clinical staff, and comparison of staff performance within and across sites.
The website contained all lectures from the initial training course, which could be streamed or downloaded as lectures with recorded voice-over, or as PowerPoint presentations. When internet speeds were slow, staff accessed training materials from DVDs, which had been distributed to all sites at the start of the study. At several sites, limited access to personal computers meant that project leaders downloaded the MCQs and organized the evaluations as classroom sessions. The website baseline training was supplemented by on-site training in practical skills and ward-based clinical teaching, both coordinated by the local PERCH study leader. On completion of the online course, trainees were required to take the same MCQ examination as those who had participated in face-to-face training. Trainees achieving a score of 80% or more were able to download a certificate from the website. The website also contained 2 additional MCQ examinations and a video quiz (see Evaluation).

Evaluation
MCQ examinations were conducted after initial baseline training, immediately before and after each refresher course, and online after 10 months and 20 months of enrollment. An online video quiz was used to assess interobserver variation in interpretation of clinical signs at 20 months. Checklist evaluation of practical skills was performed at the end of the first year.
MCQs were designed to test knowledge and understanding of the screening, consent and enrollment process, and the recognition and correct interpretation of key clinical signs, particularly those included in the WHO definitions of severe and very severe pneumonia. Each of the 10-20 MCQs contained a typical PERCH case scenario, plus, in most cases, a photograph or short video of a clinical sign. Answers to each question were provided at the end of the quiz, once all of the questions had been answered, with explanatory notes highlighting key learning points. Staff scoring <80% in the MCQ administered after baseline training were required to repeat selected lectures and the quiz, while staff scoring <80% after refresher training received additional training from their site-specific trainer.
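The follow-up rule for staff scoring below 80% can be expressed as a small decision function. The sketch below is a hedged illustration of that rule, not the logic of the PERCH training website; the names `mcq_percentage`, `mcq_follow_up`, and the stage labels `"baseline"` and `"refresher"` are assumptions made for this example.

```python
PASS_MARK = 80.0  # percentage threshold stated in the text


def mcq_percentage(correct: int, total: int) -> float:
    """Convert a raw MCQ result into a percentage score."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    return 100.0 * correct / total


def mcq_follow_up(score_percent: float, stage: str) -> str:
    """Return the follow-up action for one staff member.

    stage is a hypothetical label: 'baseline' for the MCQ taken after
    baseline training, 'refresher' for the MCQ taken after a refresher course.
    """
    if stage not in ("baseline", "refresher"):
        raise ValueError("stage must be 'baseline' or 'refresher'")
    if score_percent >= PASS_MARK:
        return "no further action"
    if stage == "baseline":
        return "repeat selected lectures and the quiz"
    return "additional training from site-specific trainer"
```

A staff member answering 15 of 20 questions correctly after baseline training scores 75% and would be asked to repeat selected lectures and the quiz.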
The video quiz assessed the ability of clinical staff to identify 6 clinical signs (lower chest wall indrawing [LCWI], head nodding, deep breathing, central cyanosis, nasal flaring, alert child). Clinical staff were shown 35 video clips (10 videos of LCWI, the defining clinical feature of WHO severe pneumonia, and 5 videos of each of the other clinical signs). Each video lasted approximately 10 seconds, and clinicians had to decide whether a specific clinical sign was present or not.
Local clinical standardization trainers observed PERCH nurses and field workers carrying out anthropometry, IS, and NP/OP swab collection. Scored checklists (Supplementary Tables 1-3) were used to award points for key predefined procedural steps, the resulting percentage score providing a measure of procedural competence.
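The arithmetic behind a checklist percentage score is simply points awarded divided by points available. The sketch below assumes each checklist step carries a fixed number of available points; the step names and point values shown are hypothetical and are not taken from the Supplementary Tables.

```python
from typing import Mapping, Tuple


def checklist_percentage(steps: Mapping[str, Tuple[int, int]]) -> float:
    """Return the percentage score for one observed procedure.

    steps maps a step name to (points_awarded, points_available).
    """
    awarded = 0
    available = 0
    for name, (got, possible) in steps.items():
        if possible <= 0 or not 0 <= got <= possible:
            raise ValueError(f"invalid points for step {name!r}")
        awarded += got
        available += possible
    if available == 0:
        raise ValueError("checklist has no steps")
    return 100.0 * awarded / available


# Hypothetical checklist for one observed procedure (illustrative values only)
example = {
    "explains procedure to caregiver": (1, 1),
    "positions child correctly": (1, 1),
    "labels specimen": (0, 1),
    "completes collection technique": (2, 2),
}
score = checklist_percentage(example)  # 4 of 5 points
```

In this made-up example the observer awards 4 of 5 available points, giving a score of 80%.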

Statistical Analysis
Median percentage scores and interquartile range (IQR) were calculated for MCQ tests and checklists. Median MCQ scores before and after refresher training were compared using the Wilcoxon signed-rank test. The distribution of results across participants was compared within and across study sites. Results were stratified by professional cadre and by whether staff assessed both cases and controls, or controls only. Differences between groups were examined with the Kruskal-Wallis test.
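As a rough illustration of these summaries, the sketch below computes a median with quartiles and the Kruskal-Wallis H statistic using only the Python standard library. It is not the analysis code used in the study: the quartile convention (`method="inclusive"`) is an assumption, the function names are hypothetical, and no P value is computed (in practice a statistics package such as SciPy, which provides `scipy.stats.kruskal` and `scipy.stats.wilcoxon`, would be used).

```python
from collections import Counter
from statistics import median, quantiles
from typing import Sequence, Tuple


def median_iqr(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, lower quartile, upper quartile); needs >= 2 scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Correction factor for tied values.
    ties = Counter(pooled)
    correction = 1 - sum(t**3 - t for t in ties.values()) / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("H is undefined when all values are identical")
    return h / correction
```

For the scores 60, 70, 80, 90, and 100, `median_iqr` returns a median of 80 with quartiles 70 and 90. For three fully separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the rank sums are 6, 15, and 24, giving H = 7.2.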
For each of the 6 clinical signs in the video quiz (35 videos in total), individual responses were used to assess the percentage agreement between the clinical staff and the principal trainer, who was the designated "gold standard." Fleiss' κ and the Gwet AC1 statistic, which is less affected by low prevalence than the κ statistic, were used to measure the degree of interobserver variability [14][15][16].
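To make the three agreement measures concrete, the sketch below computes percentage agreement with a reference rater, Fleiss' κ, and Gwet's AC1 for present/absent ratings. It is a minimal illustration under stated assumptions, not the study's analysis code: every clip is assumed to be rated by the same number of raters (at least 2) with no missing answers, the function names are hypothetical, and the ratings in the usage note are invented.

```python
from typing import List, Sequence


def percent_agreement(ratings: Sequence[Sequence[int]], reference: Sequence[int]) -> float:
    """ratings[i] holds every rater's answer for clip i; reference[i] is the trainer's."""
    matches = sum(a == reference[i] for i, row in enumerate(ratings) for a in row)
    total = sum(len(row) for row in ratings)
    return 100.0 * matches / total


def _counts(ratings: Sequence[Sequence[int]], categories: Sequence[int]) -> List[List[int]]:
    # Number of raters choosing each category, per clip.
    return [[list(row).count(c) for c in categories] for row in ratings]


def _observed(counts: List[List[int]]) -> float:
    # Mean over clips of the proportion of agreeing rater pairs.
    per_item = []
    for row in counts:
        r = sum(row)
        per_item.append(sum(c * (c - 1) for c in row) / (r * (r - 1)))
    return sum(per_item) / len(per_item)


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories: Sequence[int] = (0, 1)) -> float:
    counts = _counts(ratings, categories)
    total = sum(sum(row) for row in counts)
    pi = [sum(row[k] for row in counts) / total for k in range(len(categories))]
    p_e = sum(p * p for p in pi)  # chance agreement from squared prevalences
    return (_observed(counts) - p_e) / (1 - p_e)


def gwet_ac1(ratings: Sequence[Sequence[int]], categories: Sequence[int] = (0, 1)) -> float:
    counts = _counts(ratings, categories)
    q = len(categories)
    pi = [sum(row[k] / sum(row) for row in counts) / len(counts) for k in range(q)]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)  # Gwet's chance-agreement term
    return (_observed(counts) - p_e) / (1 - p_e)
```

With invented ratings from 3 raters on 4 clips, `[[1, 1, 1], [1, 1, 1], [1, 1, 0], [0, 0, 0]]`, and a trainer answer of `[1, 1, 1, 0]`, percentage agreement is 11/12 (about 91.7%), Fleiss' κ is 0.625, and AC1 is 0.7. The two chance-corrected statistics share the same observed agreement and differ only in how the chance-agreement term is built from the prevalence of each answer, which is why AC1 is less sensitive to signs that are rarely present.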
Kaplan-Meier curves were constructed to illustrate the proportion of PERCH clinical staff remaining in the study, from the time of baseline clinical standardization training. Curves were censored when staff members left the study, or on completion of PERCH enrollment.

Training Courses
Between March 2011 and August 2013, a total of 32 training courses were conducted at 9 study sites in 7 countries. Of 331 staff attending 1 or more courses, 45 (14%) were interested local clinical staff, not directly involved in the study. Feedback from course participants was positive, with 90% of all course components being graded as "very good" (4/5) or "excellent" (5/5). Initial (baseline) clinical standardization training took place over a 6-month period between March and September 2011. At each site, training occurred immediately prior to a period of pilot enrollment, and a median of 5 months (range, 4-9 months) before the start of the study. Seventy staff members joined PERCH after the initial training course at their site, and received baseline training from their site project leader and/or the training website. In South Africa, baseline training was repeated 6 months after the start of enrollment, due to extensive staff turnover during the pilot period.
The first round of refresher training took place a median of 7 months (range, 5-11 months) and the second round a median of 18 months (range, 14-21 months) after the start of the study. Median MCQ scores were ≥80% at each point of testing (Table 3), and improved with refresher training by a median of 10 percentage points. There was significant heterogeneity (P < .001) in the range of baseline training scores between sites, with South Africa and Mali having the greatest range of scores and Thailand the least variability (Figure 1A). Refresher training scores are shown in Figure 1B and 1C. The proportion of staff attaining a score of ≥80% rose from 54.7% and 60.4% before refresher training 1 and 2, respectively, to 84.9% and 82.8% after training (Table 3). The difference between pre- and postcourse scores (excluding those attaining 100% in the precourse MCQ) did not vary significantly (P > .8) between sites. Median precourse MCQ scores were significantly lower among nurses and clinical officers than among doctors (Table 4); nurses who assessed controls only scored lower than those who assessed both cases and controls, though (with the exception of prerefresher training 1) this difference failed to reach statistical significance.

DISCUSSION
Although staff training is an important component of all clinical trials, most studies fail to document its content or evaluate and report on its effectiveness [17]. By means of MCQs, a video quiz, and checklist evaluation of practical skills, we assessed key knowledge and clinical skills of PERCH staff throughout the duration of the study. Despite considerable challenges posed by staff turnover, language differences, intersite variation in the number and cadre of staff performing clinical assessments, and political and geographic factors beyond our control, a satisfactory level of clinical standardization was achieved within and across all study sites. Because of clinical standardization, we consider that the variable proportion of very severe pneumonia cases at different PERCH sites, from 10% in Bangladesh, where screening took place in an outpatient clinic, to approximately 50% among hospitalized children in Mali and Kenya, is a true reflection of intersite differences in case severity.
MCQs were administered at the end of baseline training and at regular intervals throughout the study. To answer questions correctly, staff needed thorough knowledge of the PERCH case definition and inclusion and exclusion criteria, and the ability to recognize and interpret key clinical signs from the accompanying video clips. The lower MCQ scores attained in Mali following baseline training may have been because the 3-day course concluded 1 day early due to extenuating circumstances, and was delivered in French by a nonnative speaker. In Thailand and Bangladesh, courses were delivered in both English and Thai or Bangla, and MCQ scores at these sites were comparable to the scores from countries where English is more widely spoken. At all sites and time points, doctors attained significantly higher MCQ scores than clinical officers and nurses, who generally spend a shorter period of time in professional clinical training. The nurses who only assessed healthy controls scored worse than nurses assessing both cases and controls, probably because they were exposed to fewer children with clinical signs.
Clinical video has been shown previously to be an effective way of testing agreement between clinicians on the presence or absence of clinical signs [6], despite the obvious difference from the "real-life" clinical situation, in which a clinician's judgement is affected by information other than an isolated clinical sign. The same study showed that health workers of different cadres and varying levels of clinical experience could correctly identify clinical signs from video recordings for which there was high proportionate agreement between experts [6]. Clinical signs are not, however, always clear-cut in real life. For this reason, the PERCH video quiz included a random selection (approximately 20%) of "gray" cases, namely, those in which a clinical sign (eg, LCWI) was present but subtle, making it genuinely difficult to decide on its presence or absence. Despite this, percentage agreement between staff and the trainer was ≥89% for all 6 clinical signs in the quiz, while interobserver agreement (agreement between participants) varied from "moderate" for central cyanosis, a clinical sign which is easily missed in African children and which is difficult to photograph or film successfully, to "substantial" or "excellent" for the other clinical signs. Good-quality clinical video clips are a valuable and scarce resource, and we hope that the video clips available on the PERCH clinical standardization training website (www.perchtraining.org) will be useful for other clinical researchers.
Although the PERCH clinical standardization program successfully attained its objectives, a number of useful lessons have been learned. It would have been informative to evaluate staff knowledge and skills prior to the initial training course, as this would have provided a useful baseline comparator for the subsequent MCQ scores. The improvement in MCQ scores with refresher training suggests that it would have been valuable to have had more regular refresher training courses at each site, coordinated by local site trainers. Limited availability of personal computers and slow internet speeds reduced the utility of the training website at several of the study sites. These shortcomings are not shared by mobile phone technology, which could provide a useful alternative platform for training and evaluation. It took time to obtain a sufficient number of good-quality video clips of relevant clinical signs, and consequently the video quiz took place during the final 4 months of enrollment, by which time many of the original PERCH staff had left the study. It would have been preferable to organize the quiz at the start of enrollment, and repeat it during the second year. Although the checklist evaluations of practical skills were useful training and evaluation tools, they were time-consuming and were consequently performed on approximately 60% of the relevant study staff. High rates of staff turnover emphasized the importance of establishing a robust system for training new staff outside of the regular training schedule. Turnover was lowest among clinical officers, which may reflect their longer-term clinical attachments.
There is increasing recognition that public health policy should be based on data that are globally representative. Enhanced connectivity, the widespread availability of powerful computing, statistical and data management tools, and the advent of funders willing to pay for large networked studies have increased the feasibility of conducting large, multicountry research studies. Ensuring that the clinical and laboratory data obtained during the course of such studies are robust, standardized, and comparable is of paramount importance. The results of the PERCH clinical standardization program give us confidence that any etiological or clinical differences observed across the study sites are true differences, and not attributable to differences in application of the clinical case definition or differences in techniques used for clinical measurements or specimen collection. We hope that the methods, results, and lessons learned from the PERCH clinical standardization program will usefully inform other researchers embarking on large-scale clinical or epidemiological studies of pneumonia or other major causes of childhood morbidity and mortality.