Validity of a two-component imaging-derived disease activity score for improved assessment of synovitis in early rheumatoid arthritis

Abstract
Objectives. Imaging of joint inflammation provides a standard against which to derive an updated DAS for RA. Our objectives were to develop and validate a DAS based on reweighting the DAS28 components to maximize association with US-assessed synovitis.
Methods. Early RA patients from two observational cohorts (n = 434 and n = 117) and a clinical trial (n = 59) were assessed at intervals up to 104 weeks from baseline; all US scans were within 1 week of clinical exam. There were 899, 163 and 183 visits in each cohort. Associations of combined US grey scale and power Doppler scores (GSPD) with 28 tender joint count and 28 swollen joint count (SJC28), CRP, ESR and general health visual analogue scale were examined in linear mixed model regressions. Cross-validation evaluated model predictive ability. Coefficients learned from training data defined a re-weighted DAS28 that was validated against radiographic progression in independent data (3037 observations; 717 patients).
Results. Of the conventional DAS28 components only SJC28 and CRP were associated with GSPD in all three development cohorts. A two-component model including SJC28 and CRP outperformed a four-component model (R² = 0.235, 0.392, 0.380 vs 0.232, 0.380, 0.375, respectively). The re-weighted two-component DAS28CRP outperformed conventional DAS28 definitions in predicting GSPD (Δtest log-likelihood <−2.6, P < 0.01), Larsen score and presence of erosions.
Conclusion. A score based on SJC28 and CRP alone demonstrated stronger associations with synovitis and radiographic progression than the original DAS28 and should be considered in research on pathophysiological manifestations of early RA. Implications for clinical management of RA remain to be established.


Introduction
The DAS, introduced in 1990 [1], has had a major impact on the management of patients with RA. It is now widely used in research and in many countries DAS-based thresholds determine which patients can access biologic therapy [2]. However, the DAS has limitations. Despite recommendation by the European Medicines Agency that 28-joint DAS (DAS28) remission (DAS28 < 2.6) [3] is the preferred primary outcome for trials of agents other than NSAIDs [4], there is evidence of continued radiographic progression in patients achieving this goal [5]. There is also recognition that two patients with the same DAS score can have different phenotypes [6].
The DAS was developed using rheumatologists' treatment decisions to define periods of high and low disease activity. Of the DAS components, Ritchie articular index and 44 swollen joint count (SJC44) made the largest contribution to discriminating between these states, suggesting that the rheumatologists placed more importance on joint assessments than blood markers of inflammation (ESR) or patient subjective effects (general health visual analogue scale; GHVAS). Updated DAS definitions have since been introduced; to reduce assessment time, 28-joint counts [7] may be used. Other definitions substitute CRP for ESR [8], and omit the GHVAS [7], although these definitions are not interchangeable [9-11].
All existing DAS variants were developed using the same outcome (treatment decision) prior to the widespread use of modern imaging, which has since demonstrated disparity between clinical assessments and imaging-detected synovitis, a possible explanation for continued joint destruction in patients considered to be in clinical remission [12-17]. Baker et al. [18] found that of the original DAS28 components only SJC28 and acute phase reactant (CRP or ESR) were independently associated with MRI-detected synovitis, despite the DAS28 being weighted most heavily for joint tenderness. Furthermore, recent findings from the Norfolk Arthritis Register (NOAR) and Early RA Study cohorts demonstrated that HLA-DRB1 amino acids associated with RA susceptibility and radiographic joint damage were associated with SJC28 and CRP, but not 28 tender joint count (TJC28) (GHVAS was not assessed) [19]. Thus, there are biologically plausible reasons for differential associations between core DAS components and RA patient phenotypes.
In addition to SJC28 and CRP, Baker et al. found evaluator's assessment of global disease activity VAS to be associated with MRI-detected synovitis. However, evaluator's assessment of global disease activity VAS is difficult to standardize and is frequently unavailable in large-scale (e.g. genome-wide association) studies incorporating existing datasets. There is therefore a need for a short-form DAS that allows evaluation of synovitis activity in existing large international RA cohorts.
The objectives of the current study were firstly to confirm whether TJC28, SJC28, CRP, ESR and GHVAS are independently associated with imaging-detected synovitis, using US combined grey scale and power Doppler (GSPD) score as the outcome; and secondly to define a novel re-weighted DAS28 using appropriate components, and validate it against radiographic progression in an external cohort.

Methods
All patients selected for this analysis satisfied a diagnosis of RA by 1987 ACR [20] and/or 2010 ACR/EULAR [21] criteria.
In the development phase we selected RA patient data from three sources: the Leeds Inflammatory Arthritis Continuum (IACON), a single-centre observational cohort of patients with early inflammatory arthritis recruited 2010-2014; the Pathobiology of Early Arthritis Cohort (PEAC), a multicentre observational cohort of patients with early RA recruited 2009-2015; and a clinical trial (Infliximab as Induction Therapy in Early Rheumatoid Arthritis; IDEA [22]), which was largely recruited in Leeds and satellite centres 2006-2009; only Leeds patients are included here. Validation against radiographic progression was conducted in RA patients selected from the Norfolk Arthritis Register (NOAR), a primary care-based inception cohort of patients recruited 1989-2008, presenting with recent-onset inflammatory polyarthritis, defined as ≥2 swollen joints for >4 weeks [23].
All patients provided written informed consent for inclusion in each of the studies, and ethical approval was granted for each study. CRP was recorded in mg/L; for censored values (CRP < 5 mg/L), we imputed CRP = 2 prior to analysis (see 'Statistical methods' and 'Results' for more details). ESR was measured in mm/h. Observations were available in IACON at 0, 26, 52 and 104 weeks, in IDEA at 0, 52 and 78 weeks and in PEAC at 0 and 26 weeks. Visits were eligible if the US scan occurred within 1 week of the clinical assessment.
In NOAR, CRP was measured in stored serum samples collected at 0, 5, 10 and 15 years; consequently, DAS28CRP was only calculated at these time points. Tender and swollen joint counts were carried out at 0, 3, 5, 10, 15 and 20 years. A three-component (3C) DAS28 score was calculated for comparison with the re-weighted DAS28, as patient GHVAS was not available.

Ultrasound
Full details of US scanning procedures are provided in online Supplementary material, section 'Methods', available at Rheumatology online. In IACON and IDEA grey scale (GS) and power Doppler (PD) synovitis were each scored semi-quantitatively 0-3 [12]; GS scores included both synovial hyperplasia and joint effusion. Sagittal plane views were scored. In IACON, scoring was performed by several trained sonographers and rheumatologists. In IDEA the majority of scoring was performed by one expert rheumatologist. In IDEA and IACON the following joints were scanned: wrists, MCPs 2-3, PIPs 2-3, knees, MTPs 1-5. In PEAC GS and PD were scored from 0-4 against a standardized image atlas [24], using transverse plane views; bilateral MCPs 1-5 were scanned.

Radiography
In NOAR, radiographs of the hands and feet were performed during the first 10 years of follow-up and were scored using the Larsen method [25] as previously described [26-28]. Joint erosion was defined as a cortical break of ≥2 mm and was assigned a score of ≥2. Radiographs were read independently, blind to sequence, by two medically qualified observers (intra- and inter-observer agreement for erosion presence 90% and 91%, respectively), who underwent specific training. Disagreements on erosion status were arbitrated by a third investigator.

Statistical methods
In IACON and IDEA, OMERACT-EULAR composite PDUS scores that combined GS and PD at the joint level [29] were summated across 22 joints to give GSPD.
In PEAC, GS and PD scores were simply summated across 10 joints, to give GSPD.
In the largest dataset (IACON), within observed data, robust regression on order statistics was used to impute left-censored values of CRP < 5 mg/L; summary statistics calculated for the imputed values indicated which single value should be imputed in the main analysis. Single imputation was chosen over model-based imputation to provide clarity for clinicians and researchers wishing to use the new score. The DAS28-CRP was originally derived using high sensitivity (hs)CRP; however, it is often calculated at centres where the reporting limit is 5, leading to variation in the values imputed for CRP < 5 mg/L.
Multiple imputation by chained equations was used to address missing covariate data; GSPD was not included in covariate imputation models and only patients with observed GSPD were retained. Imputation models included DAS28 components, transformed where appropriate to maintain consistency with analysis models [√SJC28, √TJC28, ln(ESR), ln(CRP + 1), GHVAS], Health Assessment Questionnaire Disability Index (HAQ-DI) score, age and sex. Predictive mean matching was used to impute all variables; results from 20 imputed datasets were combined according to Rubin's rules.
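Rubin's rules pool a coefficient across imputed datasets by averaging the per-dataset estimates and adding the between-imputation variance to the average within-imputation variance. The following is a minimal Python sketch of that pooling step only, assuming one estimate and one squared standard error per imputed dataset; the function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-dataset point estimates of the coefficient
    variances: per-dataset squared standard errors of that coefficient
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputed datasets")
    pooled = statistics.fmean(estimates)       # average point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled, math.sqrt(total)


# Hypothetical values from three imputed datasets (the study combined 20).
estimate, std_error = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(round(estimate, 3), round(std_error, 3))
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations.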
Conventional DAS28 scores were calculated according to their published definitions (CRP mg/L, GHVAS mm). In addition we calculated partial simplified disease activity index (SDAI) and clinical disease activity index (CDAI) scores, which excluded physician VAS, as follows (CRP mg/dL, GHVAS cm):
Partial SDAI = TJC28 + SJC28 + CRP + GHVAS
Partial CDAI = TJC28 + SJC28 + GHVAS
We modelled GSPD as a function of individual DAS28 components in linear mixed models, using 20-fold cross-validation to evaluate predictive performance on unseen data; multiple imputation was not nested within training folds. Predictor performance was evaluated as the squared Pearson correlation (R²) between predicted and observed US scores, concatenating all test folds into a single dataset. The strength of evidence favouring one model over another was evaluated as the difference in test log-likelihoods. For each individual in the test dataset, the log of the multivariate Gaussian likelihood of the model given the residuals was calculated, and this test log-likelihood was summed over all individuals in all test folds. The difference in test log-likelihood can be interpreted directly as a measure of the strength of evidence favouring one model over another. The asymptotic equivalence of model choice by leave-one-out cross-validation and Akaike's information criterion (AIC) [30] implies that a difference in test log-likelihood of 2.6 natural log units is equivalent (for comparison of linear nested models differing by two extra parameters) to P = 0.01.
To account for differences in scaling of the GSPD outcome, ratios between coefficients obtained from each cohort were weighted by the number of cases with non-missing acute phase measurements to produce the final definition for the re-weighted DAS28. For example, the ratio of the coefficient for SJC28 to the coefficient for CRP from each cohort was multiplied by the number of observations, then the values from the three cohorts were added together and divided by the total number of observations to give the final ratio to be used in the score. We did not attempt to scale the final score to maintain compatibility with the conventional DAS28 definition(s) because they were derived using entirely different methods and existing cut-offs for remission or severity of disease activity would not be valid for use with the new score.
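The weighting described above is an observation-weighted mean of per-cohort coefficient ratios. A minimal Python sketch follows; the `CohortFit` container, the function name and all coefficient values and counts are hypothetical placeholders chosen for easy arithmetic, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class CohortFit:
    name: str
    coef_sjc: float  # fitted coefficient for the SJC28 term in this cohort
    coef_crp: float  # fitted coefficient for the CRP term in this cohort
    n_obs: int       # observations with non-missing acute phase measurements


def pooled_ratio(fits):
    """Observation-weighted mean of the SJC28:CRP coefficient ratio."""
    total_n = sum(f.n_obs for f in fits)
    weighted_sum = sum((f.coef_sjc / f.coef_crp) * f.n_obs for f in fits)
    return weighted_sum / total_n


# Hypothetical cohorts: ratios of 2.0 and 3.0 weighted 100:300.
fits = [
    CohortFit("cohort_a", coef_sjc=2.0, coef_crp=1.0, n_obs=100),
    CohortFit("cohort_b", coef_sjc=3.0, coef_crp=1.0, n_obs=300),
]
ratio = pooled_ratio(fits)
print(ratio)
```

Because only the ratio within each cohort is used, a cohort whose GSPD is measured on a larger scale (which multiplies both coefficients by the same factor) contributes the same ratio as it would on a smaller scale.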
As previously described [19,28,31], the Larsen score was modelled as a longitudinal continuous non-normally distributed outcome variable using a generalized linear latent and mixed model [32] that included the covariates age and disease duration in addition to disease activity variables. Multivariable models including individual DAS28 components [√TJC28, √SJC28, ln(CRP + 1)] were constructed first, followed by models that included each DAS28 score (re-weighted 2C or conventional 3C) individually. Effect sizes are given as a β-coefficient with 95% CIs. AIC and Bayesian information criteria were used to determine model fit; lower values indicate better fit. The presence of erosions of the hands and feet was treated as a longitudinal binary variable and modelled using a generalized estimating equation model with logit-link function and an exchangeable within-subject correlation structure. The quasi-AIC, the equivalent of AIC in a generalized estimating equation, was used to determine model fit [33].
Analyses that included US were conducted in R version 3.3.1 (R Foundation for Statistical Computing, Vienna, Austria); analyses of radiographic outcomes were carried out in Stata v14 (StataCorp, College Station, TX, USA).

Results
Baseline characteristics of patients are provided in Table 1. We included 889 observations in 434 IACON patients, 163 in 59 IDEA patients, and 183 in 117 PEAC patients who had at least one US measurement. For validation against radiographic progression there were 3037 observations in 717 NOAR patients.

Imputation of CRP < 5
The median of values imputed in IACON using robust regression on order statistics for values of CRP < 5 was 1.88. We therefore imputed 2 for observations of CRP < 5. See Table 2 for numbers of cases in which this was necessary.

Associations between DAS28 components and GSPD
Associations between DAS28 components and GSPD were only investigated in IACON, IDEA and PEAC; NOAR was excluded to provide external validation of the new scale against radiographic progression.
In four-component (4C) models that included CRP, in all three cohorts only SJC28 and CRP were independently associated with GSPD. Results for models with two or four individual components, using either CRP or ESR, are presented in Table 3.
Repeating this analysis with complete case data, within each cohort, did not alter the conclusions (data not shown).
Combining the coefficient ratios across the three cohorts, as described in 'Statistical methods', gave the re-weighted score:
2C-DAS28CRP = √SJC28 + 0.6 × ln(CRP + 1)
We then calculated DAS28 according to conventional definitions (3C-DAS28CRP, 4C-DAS28CRP and 4C-DAS28ESR), and according to the new 2C-DAS28CRP defined above, in addition to partial SDAI and partial CDAI. R² values for the association with GSPD were considerably higher for 2C-DAS28CRP than for the conventional scores (Table 4), and large differences in test log-likelihood confirmed that the new score outperformed the existing definitions.
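For readers wishing to compute the scores, the sketch below implements 2C-DAS28CRP = √SJC28 + 0.6 × ln(CRP + 1) together with the partial SDAI and partial CDAI as defined in 'Statistical methods'. It assumes CRP is supplied in mg/L and GHVAS in mm, and that CRP values below 5 mg/L are replaced by 2 before any score is calculated, as was done in this study. The function and constant names are hypothetical, and applying the imputation inside the partial SDAI is an assumption of this sketch.

```python
import math

CRP_REPORTING_LIMIT = 5.0  # mg/L; values below this were treated as censored
CRP_IMPUTED_VALUE = 2.0    # mg/L; single value imputed for censored CRP


def impute_crp(crp_mg_l):
    """Replace censored CRP (< 5 mg/L) with the single imputed value."""
    return CRP_IMPUTED_VALUE if crp_mg_l < CRP_REPORTING_LIMIT else crp_mg_l


def two_component_das28crp(sjc28, crp_mg_l):
    """2C-DAS28CRP = sqrt(SJC28) + 0.6 * ln(CRP + 1), CRP in mg/L."""
    return math.sqrt(sjc28) + 0.6 * math.log(impute_crp(crp_mg_l) + 1)


def partial_sdai(tjc28, sjc28, crp_mg_l, ghvas_mm):
    """Partial SDAI (no physician VAS): CRP converted to mg/dL, GHVAS to cm."""
    return tjc28 + sjc28 + impute_crp(crp_mg_l) / 10 + ghvas_mm / 10


def partial_cdai(tjc28, sjc28, ghvas_mm):
    """Partial CDAI (no physician VAS): GHVAS converted to cm."""
    return tjc28 + sjc28 + ghvas_mm / 10


# Hypothetical patient: 4 swollen and 2 tender joints, CRP 9 mg/L, GHVAS 50 mm.
print(round(two_component_das28crp(sjc28=4, crp_mg_l=9.0), 2))
print(partial_sdai(tjc28=2, sjc28=4, crp_mg_l=10.0, ghvas_mm=50.0))
print(partial_cdai(tjc28=2, sjc28=4, ghvas_mm=50.0))
```

The 2C score is not rescaled to the conventional DAS28 range, so existing remission and severity cut-offs do not apply to its output.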
While we consider 2C-DAS28CRP the most appropriate measure of global synovitis, to permit analysis of historical datasets with only ESR available we constructed 2C-DAS28ESR, using the same method to combine SJC28:ESR coefficient ratios across the three cohorts.

Associations between DAS28 components and radiographic outcome
Results of analyses testing the association of DAS28 components with radiographic outcomes are presented in Table 5. TJC28 was not associated with Larsen score, while SJC28 and CRP were both positively associated. Higher TJC28 was associated with lower odds of erosion, while SJC28 and CRP were both positively associated with erosion.

Comparison between conventional and re-weighted DAS28 scores
Conventional 3C-DAS28CRP was significantly associated with Larsen score; however, the association was stronger with re-weighted 2C-DAS28CRP (Table 6). Furthermore, model fit was improved with the re-weighted 2C-DAS28CRP score. Conventional 3C-DAS28CRP was not associated with the presence of erosions. However, re-weighted 2C-DAS28CRP was significantly associated with the presence of erosions. Model fit was again better with the re-weighted 2C-DAS28CRP score.

Discussion
This is the first study to produce a 2C DAS weighted against US-detected inflammation and demonstrate that it outperforms the conventional definitions in the strength of association both with synovitis and radiographic progression. The resulting 2C-DAS28CRP is a potential tool for assessing one pathophysiological manifestation of RA, i.e. synovitis, consistent with the OMERACT core set definition [34]. RA DASs were intended to capture the severity of the patient's inflammatory symptoms. The 2C-DAS28CRP was developed to better reflect the pathophysiological manifestation of synovial inflammation and not a more global construct of disease activity, as reflected in the breadth of the OMERACT core set. Consequently, we would not recommend replacing the conventional DAS28 with the 2C score in clinical trials or clinical practice without due consideration of how to measure the other core set areas. The 2C-DAS28CRP score will therefore be useful in studies that focus on the pathophysiological manifestations of RA: these include (pharmaco)genetic/genomic studies. It could be argued that to identify biological factors associated with disease activity or response to therapy, US inflammation should be used directly as the outcome. This would have the advantage of allowing us to separate synovitis from tenosynovitis, although it is likely that these both fall into a broader construct of 'RA inflammation' which would remain an appropriate target for such research. However, it is not currently possible to acquire and score images in routine clinical settings, from which most of the large (pharmaco)genetic/genomic datasets have been compiled to date and are likely to be compiled in the near future. The 2C-DAS28CRP will therefore aid the (re)analysis of (existing) large-scale datasets and may yield larger effect sizes for any putative associations in comparison with existing DAS28CRP.
Our findings show that if TJC28 and patient GHVAS are removed from the DAS28, the association with the pathophysiological mechanism targeted by DMARDs, namely synovial inflammation, is improved. The association was stronger for CRP-based models than for ESR-based models, consistent with CRP being a more sensitive and responsive measure of inflammation than ESR, which is an indirect measure of multiple plasma proteins and influenced by age, gender, haemoglobin and serum immunoglobulin levels, among other confounders [35].
In previous work Baker et al. [18] found that SJC28, acute phase reactants and evaluator's global assessment were associated with MRI-detected synovitis, which broadly agrees with our findings. However, we were unable to investigate the association with physician's global assessment as this was not available in our cohorts; indeed, this partly motivated the development of the new score. For the same reason we were unable to calculate full SDAI and CDAI scores; however, the equivalent partial scores performed poorly in comparison with 2C-DAS28CRP, because they gave equal weight to tender and swollen joints and included patient VAS, when neither tenderness nor patient VAS were independently associated with GSPD.
In a previous attempt to update the DAS28 using US-detected synovitis [36], TJC28 and SJC28 were replaced with PD score from 22 joints (including MTPs and excluding elbows, shoulders and PIPs), and a count of GS synovitis presence in the standard 28 joints. However, coefficients for each term were not amended, and replacing the clinical counts with US restricts the clinical utility of the score because it requires US assessment in all cases. Nevertheless, the authors reported stronger associations between their US-derived DAS28 scores and radiographic and MRI measures of structural progression compared with the original DAS28.
There are a number of limitations to this study. A limited amount of missing covariate data was addressed using multiple imputation, which has been shown to reduce imprecision and bias in the estimation of coefficients in comparison to ad hoc approaches such as complete case analysis [37]. We did not collect hsCRP and therefore opted to impute a single value for observations of CRP < 5 to standardize practice when using the new scale. It is possible that for patients with no swollen joints, hsCRP would provide valuable additional information, and a DAS using alternative weights may be more appropriate in such circumstances. However, hsCRP is not routinely available in the clinic, limiting application. US assessments of different joint sets were made for IACON/IDEA and PEAC, and the methods of calculating GSPD differed between the cohorts. Coupled with potential case mix differences, this explains why intercepts and coefficients differed between the cohorts, which we addressed by using ratios of the coefficients rather than the coefficients themselves. The inclusion of cohorts representing a spectrum of disease activity and treatment is desirable in a validation study. Despite the differences in case mix evident in Table 1, and the differences in US methodology, patterns of association between DAS28 components and the US outcome were the same across the development cohorts. The validation cohort was recruited prior to the introduction of rapid DMARD escalation for management of RA; therefore, the level of radiographic progression may have been higher than we would expect in modern cohorts. However, greater range in the outcome is desirable when studying associations and, importantly, we identified the same patterns of association, this time replacing US inflammation with its posited consequence, radiographic progression, as the outcome. We chose not to rely on observed coefficients so our results would be widely generalizable across varied cohorts. 
Furthermore, any differences between cohorts could not have affected the finding that 2C-DAS28CRP was more strongly associated with the outcome than original DAS28 within each cohort. Study design is an important consideration when US methods are not standardized; it would not be appropriate to directly compare levels of inflammation between cohorts such as the ones included in this study. However, our results show that with the use of carefully selected methods (such as using ratios rather than measured coefficients) it is possible to derive clinically meaningful results by combining data from a range of US scoring systems. In future, advances in machine learning for feature extraction and scoring may help improve standardization. GS scoring may reflect fibrosis in addition to inflammation; we have only included patients with early RA in this development study, which should limit the impact of this issue, but may also limit the applicability of the scale for use in patients with more established disease. Future areas of validation work for the new scale will include reassessment of treatment effects in historical trials of both early and established RA, in addition to prospective data collection.
In summary, a re-weighted DAS28 equation including only SJC28 and CRP was more closely associated with US-detected synovitis than definitions that also included TJC28 and GHVAS. CRP-based DASs were more closely associated with synovitis than ESR-based counterparts. The 2C-DAS28CRP showed stronger association with burden of radiographic damage than the conventional 3C-DAS28 including TJC28. The improved association with both synovitis and erosion demonstrates that the novel 2C-DAS28 is a more appropriate measure of pathophysiology in early RA compared with conventional DAS28.