Recommendations for the conduct of efficacy trials of treatment devices for osteoarthritis: a report from a working group of the Arthritis Research UK Osteoarthritis and Crystal Diseases Clinical Studies Group

Objective: There are unique challenges to designing and carrying out high-quality trials testing therapeutic devices in osteoarthritis and other rheumatic diseases. Such challenges include determining the mechanism of action of the device and the appropriate sham. Designing device trials is more challenging than designing placebo-controlled drug trials. This study reports recommendations for designing device trials.
Methods: An Arthritis Research UK study group comprising 30 people, including rheumatologists, physiotherapists, podiatrists, engineers, orthopedists, trialists and patients, many of whom have carried out device trials, met and, using a Delphi-style approach, reached consensus on recommendations for device trials.
Results: Challenges unique to device trials include defining the mechanism of action of the device and, therefore, the appropriate sham, which should provide a placebo effect without duplicating the action of the active device. Where there is no clear-cut mechanism of action, a three-arm trial including a 'no treatment' arm and an arm with presumed sham action was recommended. For individualised devices, generalisable indications and standardisation of the devices are needed so that results can be generalised.
Conclusion: A consensus set of recommendations for device trials was developed, providing a basis for improved trial design and, it is hoped, an increase in the number of effective therapeutic devices for rheumatic diseases.


INTRODUCTION
From hand splints to knee braces to surgical implants, devices are an important element in the clinical management of osteoarthritis (OA) and of other rheumatic diseases [1][2][3]. The U.S. Food and Drug Administration (FDA) defines a device as an instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent or other similar article intended for use in the cure, treatment or prevention of disease, which does not achieve its primary intended purposes through chemical action and which is not dependent on being metabolised for the achievement of its primary intended purposes (if it acts through chemical action or metabolism, it is classified as a drug) [4]. In this paper, we focus on devices intended to treat or prevent OA, but given the interest in devices as treatments for localised joint problems in other rheumatic diseases [5], our recommendations are relevant to device testing in rheumatic diseases in general.
Osteoarthritis, especially of the knee and hip, is considered to be mechanically driven, and patients may respond to treatment with a device that alters the loading response of the joint, thus redistributing stresses across joint tissues [6][7][8]. A wide range of devices is in use in clinical practice, but few have been tested robustly and the evidence supporting their use is generally poor [6]. Pharmaceutical agents have improved the treatment of OA, and guidelines for the design of efficacy studies have facilitated the success of trials testing these agents [9]. No guidelines have been proposed on the optimal design and conduct of device trials, although the literature has repeatedly highlighted the challenges of developing and undertaking these trials [10][11][12]. These challenges include determining a sham or placebo comparator, standardising the dose of the experimental intervention through participant adherence, and controlling variables such as activity levels, footwear or clothing that may confound outcomes [10][11][12][13].
Consequently, there is limited consensus on study design, which causes confusion for researchers planning trials and for journal and funding-body peer reviewers appraising research in this area. The overriding consequence, however, is that trials involving devices are at a significant disadvantage compared with pharmaceutical and biologic treatment trials, both in securing competitive grant funding and in peer review before publication.
These recommendations are not intended to replace existing guidelines for the conduct of trials but rather to complement them. They focus on trials that rigorously test the efficacy of a device, not on pragmatic or effectiveness trials. The panel meeting started with a pre-defined objective presented by the two chairs (AR/DF).

METHODS
This objective was to determine which design features should be included in future device trials. The chairs began by presenting a review of the literature summarising the issues raised in studies of physical devices over the previous 4-5 years, setting the scene for areas of common concern. The panel was then prompted to identify the features they considered relevant. These were compiled on flipcharts, and a preliminary list of approximately 20 challenges to the successful conduct of device trials was identified. Each design feature was then discussed and refined, overlapping areas were pooled, and the list was reduced to a suitable number of items for inclusion in formal guidance. Consensus on consolidation, inclusion or exclusion was defined as 100% verbal agreement from the panel. Items reaching consensus were included in a list of provisional design features. The provisional list was transcribed and circulated 1 week after the consensus meeting. It was then reviewed, and the wording of the recommendations was finalised following approval by all panel members.

Carefully define the phenotype and ensure its relevance to the intervention being tested
Devices used in OA management are often closely matched to specific indications or customised to individual requirements. As a consequence, case ascertainment of OA and careful phenotyping are recommended to ensure that the results of any efficacy trial can be generalised to others with the same clinical presentation (see table 1). For instance, patients with patellofemoral joint OA are likely to need different devices from those with predominantly medial compartment OA [14,15]. Persons with knee instability may need a stabilising device for their knee, whereas those without instability may not.
Devices targeting only one knee compartment, for example, may not be effective for individuals with knee OA affecting multiple compartments. Similarly, devices that immobilise the wrist and fingers might not be appropriate for persons with isolated OA at the base of the thumb. Identifying location-specific pathology, such as compartment involvement in the knee or the specific joints affected in the hand or wrist, requires a reproducible examination to isolate the affected region and/or specific validated questions to characterise the problem.

Randomisation is critical
As with all clinical efficacy trials, randomisation in physical device trials is essential to minimise bias and to ensure a fair comparison between interventions. There is strong evidence that physical device studies showing the largest and possibly spurious treatment effects tend to lack randomisation and adequate comparators [2,16].
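Allocation sequences are commonly generated in advance and concealed from recruiting staff. As an illustration only, the Python sketch below builds a permuted-block allocation list for a hypothetical three-arm device trial (active device, sham device, no treatment). The function name `permuted_block_schedule`, the arm labels, the block composition and the seed are assumptions chosen for the example, not recommendations of the working group; a real trial would typically use a validated, concealed randomisation system.

```python
import random


def permuted_block_schedule(n_participants, arms, copies_per_block=2, seed=None):
    """Return an allocation list built from randomly permuted blocks.

    Each block contains every arm `copies_per_block` times, so group sizes
    stay balanced as recruitment proceeds. If n_participants is not a
    multiple of the block size, the final block is truncated.
    """
    rng = random.Random(seed)  # seeded generator so the list is reproducible
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * copies_per_block
        rng.shuffle(block)  # random order of allocations within the block
        schedule.extend(block)
    return schedule[:n_participants]


if __name__ == "__main__":
    arms = ("active device", "sham device", "no treatment")  # hypothetical arms
    allocation = permuted_block_schedule(12, arms, copies_per_block=2, seed=2024)
    for participant_number, arm in enumerate(allocation, start=1):
        print(f"P{participant_number:03d}: {arm}")
```

With three arms and two copies per block, each block of six contains two allocations to each arm. Small fixed blocks can make the next allocation predictable, which is one reason trials often vary block sizes.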

Where mechanisms of action of the intervention are understood, the comparator (or sham) intervention should be defined in terms of action in order to control for these mechanisms as much as possible
The optimal device trial is one in which the mechanism of action of the device is understood or hypothesised and a comparator can be chosen that provides a placebo effect without replicating the mechanism of action of the active device. One example is acupuncture, which the FDA classifies as a device (e.g. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?ID=3509; http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpcd/classification.cfm?id=2495), and in which the location of needle placement is thought to be one important mechanism of action.
For sham acupuncture, needles may be positioned in non-therapeutic locations, and there are even moxibustion controls in which needles are placed in the skin regardless of location. Needles placed in non-therapeutic locations may trigger the release of neurotransmitters such as serotonin, producing some pain relief, and this raises questions about the real mechanism of action of acupuncture [17]. Another example is lateral wedges for the treatment of medial knee OA. Here the mechanism of action is thought to be a reduction in medial load across the knee, and the control device has consisted of a neutral insert placed inside the shoe, which may have no effect on this load.
In many device trials the control treatment or device is less easily defined because the mechanism of action of the device itself is not clear-cut. In practice, isolating all possible mechanisms through which a device could provide a therapeutic effect can be extremely difficult. Considerable effort has been invested in the development of so-called 'sham' devices in controlled trials of physical interventions. Sham devices are control devices which purport to have none of the desired characteristics of the active intervention. For example, to test the efficacy of a realigning brace for medial knee OA, the control comparison might be a brace which does not realign. However, a brace which does not realign may have therapeutic value by altering proprioceptive input, muscle strength or muscle recruitment, and a non-realigning brace may limit injurious joint motion. Similarly, functional foot orthoses are in widespread clinical use and have been shown to affect walking mechanics [18][19][20].
However, formal evidence of their efficacy remains uncertain: several parallel-group randomised controlled trials that employed sham control arms have demonstrated clinically and statistically significant improvements in the control groups, with no difference between the foot orthoses and controls, particularly in patient-reported outcomes [21,22]. While these findings may indicate that the experimental intervention is indeed not effective, there is concern that some sham devices may provide elements of active treatment and so do not constitute pure placebos.
A major challenge in testing a device such as a brace or foot orthosis is ensuring that the sham comparator does not provide therapeutic effects through the same mechanisms of action as the active device. If it does, there is a good chance that the trial will fail to detect the efficacy of the active device.

Where mechanisms are not understood or a comparator is not feasible, a non-intervention arm should be considered
As noted earlier, the mechanism of action being tested depends on the comparator selected. If the goal of a trial is to test the overall effect of the device, regardless of its mechanism of action, then a comparator group that receives no exposure to any of the potential therapeutic mechanisms of action is needed. This creates an interpretive challenge: are the effects of the device simply due to the placebo effect of placing something on the limb or inside a shoe, or are they due to mechanisms of action attributable to the device? Even a limited number of three-armed trials, in which one arm is an untreated comparator and a second arm is a comparator that provides some but not all of the potential mechanisms of action of the active treatment, would help resolve this conundrum.
Kirkley et al. designed such a trial of valgus knee braces with three treatment groups: an active valgus brace, no treatment, and a sham neoprene sleeve that provided proprioceptive input and a sense of stability but did not realign the knee [23]. In this design, the valgus brace was statistically superior to both of the other trial arms.

Should the intervention need to be customised, the investigator is encouraged to define a standard operating procedure for the customisation
Some physical devices are provided off the shelf without local modification, so their effects can be considered standardised and generalisable. Other devices require local modification to address specific patient characteristics, while still others are custom made to individual patient or practitioner specifications. In the latter two cases, it may be challenging to generalise study findings to broader patient populations, or to practitioners who do not modify devices in the same manner or according to the same principles. For example, foot orthoses or hand splints may be prescribed and manufactured using a variety of materials and manufacturing methods, based on different underlying theories. Further, the assessment of the disease phenotype that indicates use of the orthosis may differ. Lastly, treatment may be modified during the treatment course [24]. The result is considerable variation in how well the final product matches the original patient presentation [25].
To test the efficacy of such devices, the active treatment needs to be patient specific but sufficiently standardised in delivery and design (i.e. through standard operating procedures for delivery) that the efficacy of the device in one setting can be generalised to other settings. A principal mechanism of action needs to be posited, and the consensus features of the device should be stated explicitly. For surgical device studies, there is often a steep learning curve for operators, so experienced operators are necessary to evaluate the efficacy of the device [26]. If the treatment is affected by operator experience or training requirements, these should be standardised. These are particularly important considerations when identifying study sites and planning educational support for multi-centre device trials.

Other design considerations include a run-in phase, within-patient controls, cross-over designs and increased patient and public involvement
A run-in phase has been used in some trials in an attempt to overcome early failure to tolerate physical devices. Run-in periods allow patients, who are not necessarily naïve to prior physical interventions, to acclimatise to the altered function or adapted behaviours associated with the intervention.
Within-patient controls are a further approach to defining a control group. In the SPLINTOA trial, people with interphalangeal joint OA were randomised to splinting of the deviated distal interphalangeal (DIP) joint that had been most painful in the week leading up to enrolment, whilst up to three other 'affected' DIP joints on either hand were not splinted but were monitored as 'control' joints [27]. This provides a control that limits variability due to population characteristics and behavioural and environmental differences, but it assumes that the disease state and treatment responsiveness are the same in treated and untreated joints. Within-person designs (either cross-over designs or treating one affected joint but not others) are especially efficient because they test treatment effects against within-person rather than between-person variability, and so they usually require fewer participants. To carry out a rigorous cross-over trial, the effect of the device being tested must wash out when the device is removed.
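The efficiency argument can be illustrated with the usual normal-approximation sample-size formulas for a continuous outcome such as pain, under assumed (hypothetical) values: outcome standard deviation σ, target difference δ, and correlation ρ between the paired measurements (two joints, or two periods, in the same person). A parallel-group trial needs about n = 2(z_{1-α/2} + z_{1-β})²σ²/δ² participants per arm, whereas a within-person comparison needs about n = (z_{1-α/2} + z_{1-β})²σ_d²/δ² participants in total, where σ_d² = 2σ²(1 - ρ) is the variance of the within-person difference. This is a sketch only; it ignores carry-over and period effects, which a real cross-over analysis would need to address, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_total(alpha, power):
    """z_{1-alpha/2} + z_{1-beta} for a two-sided test at the given power."""
    standard_normal = NormalDist()
    return standard_normal.inv_cdf(1 - alpha / 2) + standard_normal.inv_cdf(power)


def n_parallel_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Participants per arm for a two-arm parallel-group comparison of means."""
    return math.ceil(2 * (_z_total(alpha, power) * sigma / delta) ** 2)


def n_within_person(sigma, delta, rho, alpha=0.05, power=0.80):
    """Participants for a paired (within-person) comparison of means.

    rho is the assumed correlation between the two measurements in a person,
    so the variance of the within-person difference is 2 * sigma**2 * (1 - rho).
    """
    variance_of_difference = 2 * sigma ** 2 * (1 - rho)
    return math.ceil(_z_total(alpha, power) ** 2 * variance_of_difference / delta ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: SD 2.5 points on a 0-10 pain scale, target difference
    # of 1 point, correlation 0.5 between paired measurements.
    per_arm = n_parallel_per_arm(2.5, 1.0)
    paired = n_within_person(2.5, 1.0, 0.5)
    print(f"parallel: {per_arm} per arm ({2 * per_arm} total); within-person: {paired}")
    # parallel: 99 per arm (198 total); within-person: 50
```

Under these assumed inputs the within-person design needs roughly a quarter of the total participants of the parallel design, and the advantage grows as ρ increases.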
Some clinicians may find it challenging to knowingly provide a sham intervention to their patients [28]. The consistency of the information, interaction and communication that clinicians provide to participants when providing the device is critical and may significantly affect participants' attitudes and subsequent adoption of the device. Expectations of benefit during the trial also require standardisation so that patients expect the same benefit from the sham and active devices, a feature of the trial design that requires close coordination with, and acceptance by, the local ethics committee. Clinicians should be supported and educated to understand the specific role of the sham within the trial.
Consulting patient and public involvement (PPI) groups during development of the interventions and the trial design can help establish sham interventions that are more convincing from the participant's perspective, as demonstrated in the OTTER trial [13].
Other considerations generic to the design and conduct of clinical trials are also relevant to device trials, including the need for pilot work before embarking on a full trial, the study of dose-response effects, and blinding (although our recommendations on sham design bear directly on blinding). Further, for OA trials, an extensive description of participants' disease, including inflammatory components and disease stage, may reveal subpopulations in whom devices may be effective.

In efficacy studies, the duration of the trial may be short (6-12 weeks) in order to maximise adherence and minimise loss to follow-up
For most devices, the desired physical effects are achieved within a relatively short period (6-12 weeks). In determining efficacy, therefore, there is merit in separating technical efficacy in the short to medium term from long-term adherence to the treatment. This matters because technical efficacy and longer-term adherence represent different dimensions and should be assessed and addressed separately. Long-term effectiveness is critical for the treatment of chronic illness but may be examined after short-term efficacy is established.
Consideration should be given to the use of implementation diaries, activity monitoring, goal setting or other behaviour change strategies to maximise adherence to both experimental and sham interventions.

Trials should include assessments of the use of the device and measures of adherence
This is a direct extension of the previous recommendation and highlights that use of the device and adherence to treatment protocols can be distinguished from the technical efficacy of the device, yet are vitally important components of its overall effectiveness in clinical settings. Use and adherence should be considered and measured explicitly.
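As one possible way of quantifying use separately from efficacy, the sketch below summarises hypothetical daily wear-time records (for example from a diary or a wear sensor) against a hypothetical prescribed dose in hours per day. The `WearLog` structure, the capping rule and the 80% threshold for classifying a participant as adherent are illustrative assumptions, not definitions proposed by the working group.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WearLog:
    participant_id: str
    daily_hours: list[float]  # hours the device was worn on each day of follow-up


def adherence_summary(log: WearLog, prescribed_hours_per_day: float, threshold: float = 0.8) -> dict:
    """Summarise device use relative to a prescribed daily dose."""
    days = len(log.daily_hours)
    if days == 0:
        raise ValueError("no wear data recorded")
    # Cap each day at the prescribed dose so extra wear cannot offset missed days
    achieved = sum(min(hours, prescribed_hours_per_day) for hours in log.daily_hours)
    proportion = achieved / (prescribed_hours_per_day * days)
    return {
        "participant_id": log.participant_id,
        "proportion_of_prescribed_dose": proportion,
        "days_with_any_use": sum(1 for hours in log.daily_hours if hours > 0),
        "adherent": proportion >= threshold,
    }


if __name__ == "__main__":
    # Hypothetical week for one participant prescribed 8 hours of wear per day
    week = WearLog("P001", [8, 8, 4, 0, 10, 8, 6])
    print(adherence_summary(week, prescribed_hours_per_day=8))
```

For this hypothetical week, the capped total is 42 of 56 prescribed hours (75%), with some use on six of seven days, so the participant falls below the assumed threshold.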
Measures of activity levels of patients in the trial may also help in the assessment of the device.
Device trials can present challenges not encountered in other efficacy trial designs such as pharmacological studies. For instance, in foot orthosis efficacy trials there is currently no standard on whether footwear should be controlled across participants, even though the orthosis and footwear interact. Similarly, the terrain and the level of physical activity undertaken while wearing the device may be confounding factors. In splinting trials for hand OA, there can be variability in whether splints are worn during the day or at night, which may make the devices more or less tolerable and may also result in different clinical outcomes [24,29].

Trials should include a blinded assessor and should use one or more objective measures to assess the primary effect
In many device studies, a clinician needs to fit and evaluate both the intervention and the related control. Unlike in pharmacological trials, it may be impossible to blind this treating clinician to treatment allocation, so a second researcher must be employed to obtain outcome measurements (metrology) or to supervise patient reporting of subjective measures. This duplication of human resources often raises concerns among funders that device trials are financially inefficient. Methodologically, however, it is highly desirable to maintain this separation between care and metrology.
The potential for bias from contamination between clinical and metrological roles can be partly minimised through the use of objective measures of the primary effect, i.e. outcome measures likely to be unaffected by a placebo effect, such as imaging to assess modification of structural findings in a targeted joint. Whenever possible, therefore, an objective measure such as biomechanical or imaging findings should be employed as a counterpoint to subjective measures such as pain and self-reported function, with direct comparisons made in the statistical analysis. Functional measures, including those derived from observed performance, may also be valuable complements to patient-reported outcomes. Including biomechanical measures or imaging as outcome measures may add substantial resource requirements and logistical challenges to the study. Nonetheless, researchers should not be deterred from proposing all the measures needed to best determine efficacy in device trials.

Trial design and analysis should take into account adverse events, including pain in other joints
Some trials of physical devices such as foot orthoses, which must be accommodated within existing footwear, have reported a greater incidence of adverse events in the experimental intervention group than in the sham group [30,31]. If adverse events are not taken into account in assessing the overall effectiveness of an intervention, an important additional indicator of whether the intervention is truly effective is missed. Adverse events can include difficulties with compliance, direct consequences of device use such as skin irritation or blisters, and secondary consequences such as new-onset pain in other joints. Careful capture and analysis of such adverse events, and comparison across active and sham interventions, are essential in determining the overall efficacy of a physical device.
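As a sketch of the comparison across arms, the Python function below computes the risk difference and risk ratio for participants with at least one adverse event in the active and sham arms, with Wald-type confidence intervals (the ratio's interval built on the log scale). The function name and the counts in the example are hypothetical and are not data from the cited trials; with few events, exact or score-based intervals would be more appropriate, and zero counts are rejected here rather than corrected.

```python
import math
from statistics import NormalDist


def compare_adverse_events(events_active, n_active, events_sham, n_sham, level=0.95):
    """Risk difference and risk ratio (active vs sham) with Wald-type CIs."""
    if min(events_active, events_sham) == 0:
        raise ValueError("zero events in an arm: use an exact or corrected method")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    risk_active = events_active / n_active
    risk_sham = events_sham / n_sham

    # Risk difference with its normal-approximation standard error
    rd = risk_active - risk_sham
    se_rd = math.sqrt(risk_active * (1 - risk_active) / n_active
                      + risk_sham * (1 - risk_sham) / n_sham)

    # Risk ratio: the interval is computed on the log scale, then exponentiated
    rr = risk_active / risk_sham
    se_log_rr = math.sqrt(1 / events_active - 1 / n_active
                          + 1 / events_sham - 1 / n_sham)
    return {
        "risk_active": risk_active,
        "risk_sham": risk_sham,
        "risk_difference": rd,
        "rd_ci": (rd - z * se_rd, rd + z * se_rd),
        "risk_ratio": rr,
        "rr_ci": (math.exp(math.log(rr) - z * se_log_rr),
                  math.exp(math.log(rr) + z * se_log_rr)),
    }


if __name__ == "__main__":
    # Hypothetical counts: 18 of 60 active-arm and 9 of 60 sham-arm participants
    # reported at least one adverse event (e.g. blisters, new pain in another joint).
    for key, value in compare_adverse_events(18, 60, 9, 60).items():
        print(key, value)
```

Reporting both an absolute (risk difference) and a relative (risk ratio) measure makes it easier to weigh device-related harms against any benefit on the primary outcome.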

CONCLUSIONS
This paper has identified key methodological features that should be considered when designing device efficacy trials for people with OA. As noted earlier, although these recommendations focus on OA trials, they are relevant to device trials in rheumatic diseases in general. As highlighted, this can be a challenging area of research, with numerous design limitations that make identifying a treatment effect more difficult than in other areas such as placebo-controlled pharmacological trials.
Beyond these challenges in study design and methods, we suggest that future research should involve multidisciplinary teams in which engineers and scientists partner with clinicians, who provide input into the clinical applicability of devices. This should lead to devices that are durable and efficacious and that have clear mechanisms of action.
Both device design and measurement of the technical effects of devices require skills that extend beyond the expertise of most clinicians.
Since compliance is such a critical aspect of the long-term effectiveness of physical devices, it is essential that the OA community also collaborates more effectively with patients on how to optimise the comfort and convenience of devices found to be technically efficacious.

For clinical efficacy studies:
1. Carefully define the phenotype and ensure its relevance to the intervention being tested
2. Randomisation is critical
3. Where mechanisms of action of the intervention are understood, the comparator intervention should be defined in terms of action in order to control for these mechanisms as much as possible
4. Where mechanisms are not understood or a comparator is not feasible, a non-intervention arm should be considered
5. Should the intervention need to be customised, the investigator is encouraged to define a standard operating procedure for the customisation
6. Other design considerations include a run-in phase, within-patient controls, cross-over designs and increased patient and public involvement
7. In efficacy studies, the duration of the trial may be short (6-12 weeks) in order to maximise adherence and minimise loss to follow-up
8. Trials should include assessments of the use of the device and measures of adherence
9. Trials should include a blinded assessor and should use one or more objective measures to assess the primary effect
10. Trial design and analysis should take into account adverse events, including pain in other joints