Objectives. The requirement in Northern Ireland to prescribe biological agents according to National Institute for Clinical Excellence/British Society for Rheumatology (NICE/BSR) guidelines and within a fixed budget has created a waiting list for treatment that has no parallel in the Republic of Ireland. The study investigated the bearing this situation may have had on consultants’ judgements in the respective areas.
Methods. Seventy-eight case vignettes created from the data on real patients with RA treated with biologicals in the north and south of Ireland were appraised by nine southern and eight northern consultants, who judged the clinical benefit and significance of the patients’ condition after a trial of therapy. Quantitative (clinical judgement analysis) and qualitative (focus groups) techniques were used.
Results. Northern consultants perceived a slightly greater degree of clinical benefit after a trial of therapy than southern consultants. Judgement models of northern and southern consultants were broadly comparable. The northern consultants tended to be more uniform in their judgements than the southern group. Focus group discussions with consultants largely validated the findings of the quantitative analysis but revealed how clinical judgement analysis might be misled by gaming strategies.
Conclusions. Despite the absence of overt rationing in the south of Ireland, as far as the judgement of therapeutic benefit from biologicals was concerned, the clinical judgement policies of practitioners were very similar to those in the north. The adoption of NICE/BSR guidelines in the north may have improved the uniformity of clinical practice in Northern Ireland.
In 2002, the National Institute for Clinical Excellence (NICE) in the UK issued advice on the use of infliximab and etanercept in rheumatoid arthritis (RA). Essentially, their use was endorsed according to the British Society for Rheumatology (BSR) guidelines. In addition, longer-term data on efficacy and safety would be sought through the creation of a National Biologics Registry that would hold data on all patients given a trial of therapy. This advice was welcomed, even though no central allocation of funds was made to ensure that NICE guidance could be implemented uniformly across the country.
Over the last few years the emerging budgetary pressures facing primary care trusts from novel pharmaceutical agents and new health technologies have become more intense, yet despite the efforts of NICE to clarify the scientific justification for new treatments, its remit does not embrace prioritization. Consequently, anecdotally at least, there is some evidence of variation in levels of funding made available to support the introduction of the newer treatments for rheumatoid arthritis. Funding availability, of course, is not the only explanation for different levels of prescribing of the biologicals across the country. Rheumatologists may differ in how they perceive and weigh up the possible benefits of treatment and risks of harm from the potentially life-threatening immune suppression and the risk of systemic infection.
There is also a concern that the way the criteria for use are applied varies among hospitals, consultants and patients. But whether arising from the challenge of reliable measurement or from frank ‘gaming’, there can be important differences between guideline intentions and guidelines in routine practice [4, 5].
In its Priorities for Action (2004/05), the Department of Health and Social Services and Public Safety for Northern Ireland (DHSSPSNI) recently confirmed that it expected Health Boards to fund biological treatment for an additional 100 patients within the local health service. Even so, funding strictures have resulted in the creation of a waiting list for this therapy, one which is now formally reported to the DHSSPSNI and is similar in format to any surgical waiting list. Whether or not this is a first nationally, this state of affairs was previously unprecedented in Northern Ireland and is difficult to reconcile with the absence of similar arrangements for other high-cost pharmaceuticals. Furthermore, from the perspective of the patient waiting for treatment (and quite different from patients on routine surgical waiting lists), treatment delay in RA results in unacceptable levels of patient discomfort and contributes to long-term joint damage and disability.
While lobbying for increased expenditure, local rheumatologists from all four Health Board areas have sought to apply the BSR criteria uniformly and fairly but within the resources available. A recent regional audit showed that fewer than 300 patients in total were on treatment. On the other hand, their colleagues from another jurisdiction, the Republic of Ireland, have so far experienced no funding barriers to the prescribing of biologicals. The total number of patients on this treatment in the Republic of Ireland is estimated to be 2000, and though guidelines are being introduced that may curtail the growth, the number of new patients that will be offered treatment annually is projected to be approximately 660 (personal communication, O. FitzGerald). This will represent approximately two to three times the availability of treatment in Northern Ireland. (The resident populations of the north and south are 1.7 m and 4 m, respectively.) Though all southern consultants are encouraged to use an evidence-based approach, ultimately they have the freedom to prescribe according to clinical judgement. In practice, this results in many consultants considering a biological agent when their RA patient has ongoing active disease despite maximum (<25 mg orally per week) tolerated doses of methotrexate. Given the subtle variations in ways that the ostensibly objective BSR criteria may be interpreted, a comparison of northern and southern practice might provide valuable insights into which clinical cues are most affected by any conscious or unconscious need to ration access to treatment.
By learning more about which clinical signs or symptoms affect judgements about treatment eligibility, it may be possible to focus educational interventions aimed at achieving greater equity of access. However, decision-makers frequently overestimate or misjudge the information they can or actually do use in making decisions. The study of clinical judgement can be approached in many ways but has been advanced substantially in the last three decades by the application of social judgement theory to the problem of combining observations as a basis for action. Brunswik's analogy of rays of light passing through a convex lens to describe the relationship between the interpretation of information (cues) and the actual relationship of those cues to the real world has had fairly wide application in clinical settings [11, 12]. Cues typically take the form of symptoms, signs or laboratory results. The lens model paradigm allows differences in the doctors’ judgements to be displayed in terms of the differences in the weights attached to the various cues and in the differences in the combination rule used to arrive at the final judgement. In fact, amongst the earliest clinical applications of the clinical judgement analysis (CJA) approach were those in rheumatology by Kirwan et al. [13, 14], who sought to clarify the cues employed by rheumatologists (in a prebiologicals era) to judge response to treatment. Others have subsequently considered the potential advantages of CJA in minimizing type II errors when designing multicentre trials in which multiple observers have to make efficacy judgements.
Our objectives, therefore, were: (i) to use CJA to determine which, if any, clinical variables distinguish the judgements of rheumatology consultants in Northern Ireland and the Republic of Ireland, on the efficacy of biological disease-modifying therapy in RA; and (ii) to triangulate the findings using a qualitative study of focus groups of consultants in each jurisdiction.
Patients and methods
A research nurse reviewed the medical notes of all RA patients who, prior to spring 2003, had received biological therapy at Musgrave Park Hospital, Belfast and St Vincent's University Hospital, Dublin. A random sample of 78 vignettes or ‘paper cases’, as illustrated in Fig. 1, was drawn from among those with complete information with respect to BSR prescribing criteria. In order to subsequently evaluate consistency, 20 duplicate cases were created and added to the original 78 in a folder that was given to each participating consultant. Half of these 20 were exact duplicates but were not identifiable as such to the consultants. The remaining 10 had a slightly different preamble (as described and explained below). The details of each case included the patient's age, employment status, history of side-effects on previous DMARDs, whether there were any allergic or intercurrent infection side-effects while on infliximab, and the change in the patient's global assessment, Health Assessment Questionnaire (HAQ) score, the number of swollen and tender joints, the erythrocyte sedimentation rate (ESR) and the Disease Activity Score (DAS).
Consultants were recruited to the study, as volunteers, at the 2003 spring meeting of the Irish Rheumatology Society. Each was asked to make certain judgements (and record these on the vignette) about prescribing biologicals for the paper cases. For each case, the consultant was to indicate on a visual analogue scale (VAS) the extent of the change in the patient's condition after a treatment trial, whether they deemed it clinically important and whether they recommended continuing with infliximab biological therapy.
In the second batch of 10 duplicates, the preamble stated that the patients had not previously been on biologicals but had recently increased their DMARD dose to the maximum permitted. The first two judgements to be made on these cases were the same as before, but for the third they were to indicate whether they were likely to recommend a switch to biological treatment. The reason for the inclusion of these duplicates was to study whether, for exactly similar clinical scenarios, the class of drug affected the perceived benefit.
A sessional payment was made to consultants for each completed folder returned and for their subsequent participation in the focus group approximately 6 months later.
Participating consultants from the two regions had comparable experience, the average number of years since they had first taken up a consultant post being 10.5 in the south and 12 in the north. Of the 17 participating consultants, four held academic positions, two from the south and two from the north.
In deriving judgement analysis models for each consultant, multiple linear regression was used, in which the judgement is the dependent variable and the clinical cues are the independent variables. Some of the categorical cues were collapsed into two categories, while the change in the continuous cues after the treatment trial (such as the number of swollen joints or the ESR) was converted into a percentage change [(before treatment score – after treatment score)/before treatment score]. However, a subsidiary analysis was also conducted in which the change for these variables was entered into the model as an absolute value rather than as a percentage.
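As an illustration only (it is not part of the original analysis), the conversion of a continuous cue into a percentage change might be sketched as follows. The function name and the example values are hypothetical, and multiplying by 100 to express the ratio as a percentage is an assumption about scaling that the bracketed formula above does not state.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change relative to baseline; positive means the score fell."""
    if before == 0:
        # The ratio is undefined when the baseline score is zero.
        raise ValueError("baseline score is zero; percentage change is undefined")
    return 100.0 * (before - after) / before


# Hypothetical cue values for one paper case (not taken from the study data).
swollen_pct = percent_change(before=10, after=8)   # swollen joint count
esr_pct = percent_change(before=40, after=30)      # ESR in mm/h
print(swollen_pct, esr_pct)
```

Expressing each cue relative to its own baseline puts cues measured on different scales on a common footing before they enter a regression model; the subsidiary analysis with absolute changes would simply use `before - after`.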
The differences between northern and southern consultants in the regression coefficients (β-weights) associated with each clinical cue were assessed using the Mann–Whitney test.
Before building the regression models, in keeping with good practice, we checked for non-independence of residuals by computing Durbin–Watson statistics for each consultant's judgements. In no case was there statistical violation of the independence assumption.
To study the effects of their location (north or south) on the consultants’ VAS judgements, a three-way analysis of variance (ANOVA) was conducted. In respect of their binary judgements, the differences between northern and southern consultants were assessed using Fisher's exact test on a 2 × 2 contingency table for each of the 78 cases. The number of these tests expected to show a statistical difference between the consultant groups was then computed using tables from the binomial distribution.
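The per-case comparison can be sketched with a small, self-contained implementation of the two-sided Fisher's exact test. This is a minimal sketch, not the authors' code: the function name is hypothetical, and the counts in the usage example (eight northern and nine southern consultants split into yes/no answers) are invented for illustration, because the paper does not report the split for any individual case.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Unnormalised integer weights avoid floating-point ties in the comparison.
    weights = {
        x: comb(row1, x) * comb(row2, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(row1 + row2, col1)


# Hypothetical split: 7 of 8 northern and 3 of 9 southern consultants say "yes".
p_value = fisher_exact_two_sided(7, 1, 3, 6)
print(round(p_value, 4))
```

With only 17 raters the set of attainable P values is coarse, which is one reason an exact test is preferred here over a chi-square approximation.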
Intra-rater consistency was assessed by deriving the intraclass correlation coefficient for the judgements made on the 10 exact duplicates.
Inter-rater consistency was assessed to determine the extent of agreement among consultants from the same location. It was assessed using Cochran's Q test, which investigates the differences between the choices made by consultants within each location, and is appropriate for dichotomous data. In this situation, Cochran's Q tests the null hypothesis that the probability of a ‘yes’ response is the same across consultants within each location.
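For readers unfamiliar with the statistic, a minimal sketch of Cochran's Q for a table of yes/no judgements (rows are cases, columns are consultants) is given below. The function name and the toy ratings are hypothetical and are not drawn from the study data.

```python
from typing import Sequence, Tuple


def cochrans_q(ratings: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Return (Q, degrees of freedom) for a cases x raters table of 0/1 answers."""
    k = len(ratings[0])  # number of raters (consultants)
    col_totals = [sum(row[j] for row in ratings) for j in range(k)]
    row_totals = [sum(row) for row in ratings]
    n_yes = sum(row_totals)
    denominator = k * n_yes - sum(r * r for r in row_totals)
    if denominator == 0:
        # No case shows any disagreement, so the statistic is undefined.
        raise ValueError("every case was rated unanimously; Q is undefined")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n_yes ** 2) / denominator
    return q, k - 1


# Toy data: four cases judged by three raters (1 = "yes", 0 = "no").
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
q_stat, dof = cochrans_q(toy)
print(q_stat, dof)
```

Under the null hypothesis Q is referred to a chi-square distribution with one fewer degree of freedom than the number of raters, so a large Q indicates that the consultants differ in their propensity to answer ‘yes’.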
Qualitative analysis of focus groups
The analysis was undertaken using a grounded theory approach, a method that uses a systematic set of procedures to develop an inductively derived description of clinical judgements. The intent was to use this approach to identify the major constructs and their relationships in a clinical context. The full qualitative analysis is not reported here. Rather, the analysis was used to complement the quantitative statistical analysis by providing a richer descriptive account of clinical decision-making in context.
The consultants in each location, at separate group meetings, were provided with the 10 most discordant cases from the quantitative analysis. They were asked to discuss each case openly and make judgements on the 10 cases, similarly to what had been required in the earlier quantitative study.
The study was approved by the Queen's University Medical Research Ethics Committee.
Was the change in the patient's condition deemed clinically important?
For each of the 78 cases, 2 × 2 contingency tables were assembled to compare the number of consultants in each jurisdiction rating the change in the patient's condition as clinically important. The case shown (Fig. 1) was the only one to disclose a significant difference between the consultant groups (two-sided Fisher's exact test, P = 0.0497). One case in 78 is no more than would be expected by chance.
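The chance expectation can be made concrete with a short binomial calculation. This is a sketch under two stated assumptions that the text does not spell out: a significance threshold of 0.05 for each test, and independence between the 78 tests (which is only approximate, since the same consultants rated every case). The variable names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n_tests = 78   # one Fisher's exact test per case
alpha = 0.05   # assumed per-test significance threshold

# Number of "significant" results expected if there is no true difference.
expected_false_positives = n_tests * alpha

# Probability of seeing at least one significant result by chance alone.
p_at_least_one = 1 - binom_pmf(0, n_tests, alpha)

print(round(expected_false_positives, 1), round(p_at_least_one, 3))
```

On these assumptions roughly four nominally significant cases would be expected by chance, so a single one is unremarkable. Because Fisher's exact test is conservative with small samples, the true per-test false-positive rate is likely to be below 0.05, which lowers the expected count.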
To what degree did the patient's condition change?
On inspection of the raw data it appeared that northern consultants, on average, perceived there to be greater changes in the patients’ condition. The summary results of a three-way ANOVA are displayed in Table 1. Thus, apportioning the variation between consultants and between locations, it can be seen that there was a significant effect of consultant location on judgements about the extent of change, the northern consultants perceiving there to be a change of slightly greater clinical significance than the southern consultants (equivalent to only 2.5 points on the VAS).
|Sum of squares||Degrees of freedom||Mean square||F||P||Mean VAS score|
The contribution of each clinical cue to the judgements made on the extent to which the patient's condition had changed is indicated by the magnitude of the standardized β coefficients in the (individual consultant) regression models. These average coefficients (across consultants) are plotted on the histogram shown in Fig. 2. Broadly speaking, while some differences may have been evident at an individual level, as can be seen from the figure, the models for northern consultants appear quite similar to those of the southern consultants. In the Mann–Whitney test, only one cue distinguished the models of the two groups of consultants: ESR affected the judgements of the northern group more than the southern group (P = 0.01).
The judgement models generally had reasonable explanatory power, the average R² being 0.613, and, with respect to intra-rater consistency, the correlation between judgements made on duplicate cases was on average r = 0.74 (range 0.25–0.90).
Does the class of the drug affect the doctor's perception of the extent of change in the patient's condition?
A second three-way ANOVA was conducted using the judgements made on the second batch of duplicates. Because one case had, in error, been included in the folder in triplicate, our 17 clinicians were making judgements on nine cases.
The results (Table 2) indicate that the southern consultants were more likely to recommend a switch to biologicals (average southern VAS rating 5.27 compared with 3.65 in the northern group).
|Sum of squares||Degrees of freedom||Mean square||F||P||Mean score|
Are the northern or southern consultants more uniform in their yes/no judgements?
The Cochran's Q-test analysis of inter-rater consistency indicated that northern consultants were considerably more uniform in their judgements of the clinical importance of the change in the patient's condition than the southern consultants [Cochran's Q in the north, 9.33 (d.f. = 7), P = 0.230 vs 32.15 (d.f. = 8), P<0.001 in the south].
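The P values quoted above can be checked from the reported Q statistics alone, because Q is referred to a chi-square distribution. The sketch below uses the standard closed-form series for the chi-square upper tail with integer degrees of freedom; the function name is hypothetical, and only the Python standard library is assumed.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2)) + 2 * exp(-x / 2) / sqrt(2 * pi) * total


p_north = chi2_sf(9.33, 7)    # reported: P = 0.230
p_south = chi2_sf(32.15, 8)   # reported: P < 0.001
print(round(p_north, 3), p_south < 0.001)
```

The degrees of freedom (7 and 8) are one fewer than the number of consultants in each group (eight northern, nine southern), as expected for Cochran's Q.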
Focus group analysis
A major theme emerging from the dialogue among northern consultants was the ambiguity and tension that affected their judgements when subjective (patient-reported symptoms) and objective (clinically assessed) cues appeared to be at odds.
The DAS was the most explicitly discussed cue in the northern group, but much less so in the south. In fact, the DAS score was mentioned 57 times in the northern discussion and 27 times in the south. The number of swollen joints (assessed routinely by the consultant or a nurse practitioner) was the next most commonly mentioned cue in both jurisdictions. Both sets of consultants mentioned the importance of it being assessed by the same person before and after the treatment in order to enhance objectivity. This cue was regarded as being more reliable than the number of tender joints.
There was some discussion in the northern group (but not in the southern group) about the possibility of ‘gaming’ affecting the application of the BSR NICE response criteria. Northern consultants openly discussed pressure to abide by the BSR NICE guidelines. In fact, a number of the consultants cited gaming strategies that might allow patients to meet BSR NICE criteria. For example, some consultants reported that they were inclined to avoid increasing a patient's steroid dose if it meant that they might thereby not meet BSR NICE eligibility criteria. In addition, there were less than explicit ways of implying to patients that their global assessment response needed to shift to ensure treatment continuity.
The discussion among both groups indicated that the HAQ was not greatly influential in their judgements on treatment efficacy. In the northern group, and not mirrored in the south, there was considerable discussion about how ESR changes affected their efficacy judgements.
Both groups discussed how difficult it was, in terms of the emotional cost to the patient, to withdraw a patient from biologicals. However, the northern consultants found some easement in respect of this dilemma by making explicit reference to the BSR NICE guidelines.
The establishment of NICE was seen by some as a way of helping to reduce ‘postcode prescribing’ by demonstrating how the evidence of clinical and cost effectiveness could support best practice. Others hold the view that it was always bound to fail because it could take no account of the real budgetary pressures faced in different areas of the country or of factors other than evidence of a treatment's efficacy or effectiveness affecting prescribing behaviour.
This study, in two regions where the use of biologicals differs nearly two-fold, was undertaken to shed further light on factors affecting the clinical decision-making of rheumatologists using these treatments.
Using a series of anonymized ‘paper patients’, based on real cases treated in both regions with a trial of biological therapy, we found that consultants from the north and the south of Ireland were equally likely to rate the change in a patient's condition after treatment as clinically important, even though the perceived change was rated as very slightly greater by the northern consultants. Even though for some southern consultants (in the subsidiary analysis of hidden duplicates) a given therapeutic effect was deemed to be of greater clinical significance if the drug was believed to be a biological rather than a DMARD, on the whole it would appear that doctors from either constituency would maintain similar groups of patients on biological treatment, and there do not appear to be important differences in their judgements of treatment efficacy. Such differences, at least at the level of individual consultant rheumatologists, have been apparent in the past when a CJA approach has been used to study their treatment decisions. Although our findings in this regard are at a group (rather than at an individual doctor) level, it may be that the perceived benefit of modern biological treatments is of such a different order of magnitude that differences between doctors in the judgement of efficacy, in 2004, are of comparatively little consequence, which may not have been the case when CJA emerged first in rheumatology 20 yr ago.
Our intention from the outset was to use real rather than fictionalized patient data for our vignettes, as in some past CJA studies. Because of the side-effect profile of the new drugs, patients require careful monitoring and it was possible much more readily to retrieve the information from medical charts (such as DAS and HAQ scores) to devise the vignettes than it would have been for patients who had not been treated and whom we excluded from our study. Even though our northern and southern consultants had broadly comparable judgement models overall, it would still be possible for different clinical cues to affect the decisions of northern and southern consultants in different ways in patient populations with a different case-mix (and a different prevalence of these clinical cues). The judgement of eligibility for treatment (of a naive patient population) could not be directly modelled in this clinical judgement analysis but was discussed in the focus groups. The analysis of the hidden duplicates (which to our participants represented patients previously untreated with biologicals) may nevertheless suggest that southern consultants may have a lower overall eligibility threshold (for offering this treatment). Interestingly, when we drew a sample of cases from St Vincent's University Hospital (Dublin) for this study, it proved far harder to obtain a set with all the necessary information that is required to determine BSR eligibility. (The establishment of a treatment register made the task much easier in the north.) We determined that, prior to initiation of biological therapy, 12 of the 40 cases from the south included in the study would not have met the BSR criteria. This was the case for only one of the Belfast cases. However, because there were only nine of these duplicates, the specific contribution of particular cues to individual judgement models cannot be derived from these data.
However, the clinical judgement analysis showed that there were few major differences (between the two groups of consultants) in the way the cues (largely those mentioned in the BSR criteria) bore upon their decisions to maintain patients on treatment and on their perceptions of the benefit from treatment, so it would be surprising if their judgement models for selecting naive patients for treatment were significantly different. This merits further study.
The particular cue that did distinguish their judgements about perceived therapeutic benefit was ESR, which tended to affect the judgements of the northern group a little more than the southern group. In fact, though CJA may offer what has been called a ‘paramorphic’ representation of judgement, the models emerging are largely descriptive and are not intended to reveal the underlying cognitive processes. This was one of the reasons we thought it important to triangulate the CJA findings with the qualitative analysis of our two focus groups. Indeed, the northern group was concerned (following the focus group discussions) that ESR and DAS could make disproportionate contributions to the evaluation of outcome (using the BSR criteria). There was much more explicit discussion on these two factors in the northern group than among the southern consultants. The context for these comments concerned the tension they felt when objective and subjective response criteria had to be combined to reflect a single response threshold. They could foresee circumstances when a low ESR might be given disproportionate weight. For example (one such case discussed in the focus group), a 53-yr-old patient at baseline had a patient global assessment of 24; he had 10 swollen and five tender joints and his ESR was 13 mm/h, resulting in a DAS28 score of 5.25. At his 3-month review, the patient global assessment was 25; there were eight swollen and two tender joints, and the ESR was 4 mm/h. The second DAS28 score was 3.8, an impressive improvement of 1.45. Thus, even though there had been little or no discernible improvement, the DAS28 response exceeded the cut-off of 1.2 stipulated in the BSR guidelines. In this case, the guidelines advise continuation of treatment. As a result of the log transformation of ESR values used to calculate the DAS28 score, changes in ESR below 20 mm/h may have an inordinately large effect on the change in DAS28 score.
Patients with a low initial ESR are thus more likely to show greater improvements in DAS28 score than those with a high initial ESR. The contribution of an ESR change from 17 to 2 mm/h to the change in DAS28 score is 1.5, whereas a change in ESR from 90 to 60 mm/h would have contributed just 0.27. It is therefore not surprising that gaming came to be discussed in the northern group as a possible factor affecting their decisions, whereby a consultant, in some circumstances, might tacitly influence the patient's global assessment in order to ensure that the 1.2 threshold was crossed. Such discussions did not arise in the southern focus group.
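The effect of the log transformation can be illustrated numerically. The sketch below assumes the ESR term of the commonly published DAS28 formula, 0.70 × ln(ESR); that coefficient is not stated in the text above and is introduced here as an assumption, as are the function and constant names.

```python
from math import log

# Assumed coefficient on ln(ESR) in the commonly published DAS28 (ESR) formula.
ESR_WEIGHT = 0.70


def esr_contribution_to_das28_change(esr_before: float, esr_after: float) -> float:
    """Part of the fall in DAS28 attributable to the change in ESR (mm/h)."""
    return ESR_WEIGHT * (log(esr_before) - log(esr_after))


low_start = esr_contribution_to_das28_change(17, 2)    # low initial ESR
high_start = esr_contribution_to_das28_change(90, 60)  # high initial ESR
print(round(low_start, 2), round(high_start, 2))
```

Under this assumption a 15 mm/h fall from a low baseline contributes about 1.5 to the DAS28 change, enough on its own to cross the 1.2 threshold, whereas a 30 mm/h fall from a high baseline contributes under 0.3. Both values are close to the figures quoted above, with a small difference in the second that may reflect rounding.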
That northern consultants were affected by the explicit obligation (placed on them by the commissioning health authorities) to subscribe to the BSR criteria may be reflected in the slightly higher inter-rater consistency that emerged from the Cochran's Q analysis, when compared with their southern counterparts. Interestingly, neither age nor the employment status of patients had much bearing on decisions made by doctors from either constituency. In the north this was because, from the outset of the waiting list (for these drugs), a firm decision was made that eligibility for treatment at any instant would be dictated only by the BSR guidance and the date of assessment.
Among its advocates, there are several supposed strengths of a CJA approach. It is claimed that while doctors may not explicitly recognize all the factors bearing upon their prescribing judgements, providing cognitive feedback on their judgement policy model may help to reduce inconsistencies between doctors. While this was not an intervention study, we feel it is unlikely that the results from the clinical judgement analysis alone, without triangulation against a qualitative study, would have been as enlightening to the researchers or to the participants, who afterwards felt moved to record their feelings in correspondence.
These results show that, while judgement models of southern and northern consultants concerning therapeutic efficacy bear many similarities, there is some suggestion that southern consultants may be more inclined (than northern consultants) to commence biological therapy even for patients who would not meet strict BSR eligibility criteria. Nevertheless, the application of the NICE criteria and BSR guidelines in the north, without adequate funding provision, has resulted in considerable treatment delays for northern patients with RA.
The authors would like to thank all the participating consultants and the research nurses Anne Madigan and Kim Brown, who gathered the vignette information. F.K. and N.S. acted as guarantors for the study. The study was funded by a North–South Cooperative Grant from the Research and Development Office of DHSSPSNI (Belfast) and the Health Research Board (Dublin).
O.F. has received honoraria, is a member of a speakers bureau and has received grants/research support from Wyeth. He has also received grants/research support from Schering-Plough and Abbott Pharm. The other authors have declared no conflicts of interest.
Department of Epidemiology and Public Health, 1Department of Psychology and 2Department of Rheumatology, Queen's University of Belfast, Belfast, UK and 3St Vincent's University Hospital, Dublin, Ireland.