Nomogram to predict primary non-response to infliximab in patients with Crohn’s disease: a multicenter study

Abstract
Background: Infliximab (IFX) is effective at inducing and maintaining clinical remission and mucosal healing in patients with Crohn’s disease (CD); however, 9%–40% of patients do not respond to primary IFX treatment. This study aimed to construct and validate nomograms to predict IFX response in CD patients.
Methods: A total of 343 patients diagnosed with CD who had received IFX induction from four tertiary centers between September 2008 and September 2019 were enrolled in this study and randomly classified into a training cohort (n = 240) and a validation cohort (n = 103). The primary outcome was primary non-response (PNR) and the secondary outcome was mucosal healing (MH). Nomograms were constructed from the training cohort using multivariate logistic regression. Performance of nomograms was evaluated by area under the receiver-operating characteristic curve (AUC) and calibration curve. The clinical usefulness of nomograms was evaluated by decision-curve analysis.
Results: The nomogram for PNR was developed based on four independent predictors: age, C-reactive protein (CRP) at week 2, body mass index, and non-stricturing, non-penetrating behavior (B1). AUC was 0.77 in the training cohort and 0.76 in the validation cohort. The nomogram for MH included four independent factors: baseline Crohn’s Disease Endoscopic Index of Severity, CRP at week 2, B1, and disease duration. AUC was 0.79 and 0.72 in the training and validation cohorts, respectively. The two nomograms showed good calibration in both cohorts and were superior to single factors and an existing matrix model. The decision curve indicated the clinical usefulness of the PNR nomogram.
Conclusions: We established and validated nomograms for the prediction of PNR to IFX and MH in CD patients. This graphical tool is easy to use and will assist physicians in therapeutic decision-making.


Introduction
Crohn's disease (CD) is a chronic and refractory inflammatory bowel disease that can affect any portion of the gastrointestinal tract from the mouth to the perianal area. CD is mainly characterized by prolonged diarrhea with crampy abdominal pain, weight loss, and fever, with or without gross bleeding [1]. A substantial portion of patients with CD may develop a series of complications, such as fistula, abdominal abscess, and bowel obstruction [2][3][4], which decrease quality of life, cause physical and psychological morbidity, and increase mortality [5][6][7]. CD has been reported to be associated with significantly increased healthcare and economic burdens [8]. As the incidence of CD is increasing rapidly in Asian countries, especially China [9], efficient and cost-effective therapies are urgently being sought.
Infliximab (IFX), a chimeric anti-human tumor necrosis factor-α monoclonal antibody, is the most cost-effective biologic for CD patients who fail to respond to standard therapy [10]. It is also well known that IFX is effective for inducing and maintaining remission in CD patients [11]. The therapeutic goal for CD patients has since evolved from remission to mucosal healing (MH), a more ambitious goal. As the target of a 'treat-to-target' strategy, MH decreases the risks of surgery, new penetrating events, and new stenosis, thus reducing disability caused by CD [12].
However, 9%-40% of CD patients exhibit primary non-response (PNR) to IFX [13]. There is no consensus on the definition of PNR, but it is generally characterized by little improvement after IFX induction therapy. Since IFX is expensive, especially for patients in Asia, where the majority of the populace are not covered by insurance [14], precision medicine has been suggested to optimize treatment strategies. Furthermore, IFX is associated with several adverse events, such as infusion reactions, serious infection, and malignancy [15][16][17], and treating non-responders with IFX increases exposure to adverse events and delays initiation of other effective CD treatments.
Several risk factors associated with PNR have been explored, such as age, sex, body mass index (BMI), smoking, disease duration, small-bowel involvement, stenosing and/or penetrating phenotype, FAS-L, and caspase-9 in apoptosis-related genes [11,13]. The effects of these risk factors can be partly explained by pharmacokinetics, pharmacodynamics, and pharmacogenetics of monoclonal antibodies [11,18]. However, the exact mechanism of PNR to IFX has not been elucidated.
By combining these risk factors, Billiet et al. [19] constructed a matrix model that makes predictions based on age, BMI, and prior surgery history, but the model was not validated in external or multicenter cohorts. Tang et al. [20] and Jung et al. [21] developed effective prediction models by combining clinical data and genetic factors; however, since genetic analyses are not currently applied in routine examinations, these models are not practical.
Thus, we aimed to use routine clinical data to construct and validate nomograms of IFX response in CD patients from a multicenter cohort; these nomograms could aid in therapeutic decision-making.

Study population
We collected data from patients with CD treated in four inflammatory bowel disease (IBD) centers in China between September 2008 and September 2019: the First Affiliated Hospital of Sun Yat-sen University (Guangzhou, China), Sir Run Run Shaw Hospital of Zhejiang University School of Medicine (Hangzhou, China), the Second Affiliated Hospital of Zhejiang University School of Medicine (Hangzhou, China), and the General Hospital of Tianjin Medical University (Tianjin, China). Inclusion criteria were as follows: (i) a diagnosis of CD based on clinical, endoscopic, radiographic, and histological evidence in the center; (ii) endoscopically active disease with a Crohn's Disease Endoscopic Index of Severity (CDEIS) of > 3 before IFX treatment; and (iii) completion of IFX induction at 5 mg/kg at weeks 0, 2, and 6 in the center. Exclusion criteria were (i) previous ileal, colonic, or ileocolonic resection; (ii) lack of endoscopic evaluation before induction of IFX or at week 14 after the first IFX infusion; and (iii) incomplete data. Patients with previous ileal, colonic, or ileocolonic resection were excluded because the standard evaluation of post-surgical recurrence is the Rutgeerts score, rather than CDEIS. We assigned 70% of enrolled patients to derive the model (training cohort) and 30% to validate it (validation cohort). Because patients were recruited from four hospitals, the randomization was stratified by center.
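The 70%/30% allocation stratified by center can be sketched as follows. The authors report using the R package 'caTools'; this is an independent, minimal Python illustration, and the function name, the record layout, the seed, and the toy center sizes are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(patients, train_fraction=0.7, seed=0):
    """Randomly split records into training/validation sets within each center.

    `patients` is a list of dicts with a "center" key (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_center = defaultdict(list)
    for patient in patients:
        by_center[patient["center"]].append(patient)

    train, validation = [], []
    for center in sorted(by_center):
        group = by_center[center][:]
        rng.shuffle(group)  # random order within the center
        n_train = round(train_fraction * len(group))
        train.extend(group[:n_train])
        validation.extend(group[n_train:])
    return train, validation


# Toy data: four centers with made-up sizes (not the study's real counts).
patients = [
    {"id": f"{center}-{i}", "center": center}
    for center, size in [("A", 40), ("B", 30), ("C", 20), ("D", 10)]
    for i in range(size)
]
train, validation = stratified_split(patients)
```

Splitting within each center keeps the center mix of the two cohorts close to that of the full sample, which is the purpose of stratification.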

Data collection
A predetermined data sheet was used to collect information including age, sex, weight, height, smoking habits, surgical history, disease duration, disease localization, disease behavior (as defined by the Montreal Classification [22]), presence of extraintestinal manifestations, concomitant therapy, C-reactive protein (CRP), serum albumin concentration, and CDEIS at the initiation of IFX treatment. Follow-up data included CRP at every IFX infusion and CDEIS at week 14. CDEIS was calculated by experienced IBD clinicians.

Outcomes and definitions
The primary outcome was PNR, which was defined as a decrease in CDEIS of < 50% from baseline at week 14 after the first IFX infusion. Conversely, response was defined as a decrease in CDEIS of ≥ 50% from baseline at week 14. The secondary outcome was MH, defined as CDEIS < 1.5 at week 14. Prior surgery was defined as resection of a part of the gut, strictureplasty for stenosing complications, or a fistulectomy/fistulotomy for complicated perianal disease [19].
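The outcome definitions above translate directly into a small rule. The following Python sketch is illustrative only; the function name and the example CDEIS values are hypothetical.

```python
def classify_outcomes(cdeis_baseline, cdeis_week14):
    """Apply the study's endoscopic definitions of PNR and MH.

    PNR: CDEIS decreased by less than 50% from baseline at week 14.
    MH:  CDEIS below 1.5 at week 14.
    """
    if cdeis_baseline <= 0:
        raise ValueError("baseline CDEIS must be positive")
    relative_decrease = (cdeis_baseline - cdeis_week14) / cdeis_baseline
    return {
        "relative_decrease": relative_decrease,
        "pnr": relative_decrease < 0.5,
        "mh": cdeis_week14 < 1.5,
    }


# Hypothetical patients (values invented for illustration).
non_responder = classify_outcomes(10.0, 6.0)   # 40% decrease
healed = classify_outcomes(8.0, 1.0)           # 87.5% decrease, CDEIS < 1.5
borderline = classify_outcomes(10.0, 5.0)      # exactly 50% counts as response
```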

Construction of nomograms
First, Spearman's correlation analyses were performed to detect multicollinearity. Correlation coefficients > 0.7 were considered to indicate collinearity, and collinear factors were to be excluded from the analysis to decrease bias. Univariate logistic regression was then used to screen candidate risk factors, and significant variables were included in a multivariate logistic-regression analysis. The final multivariate model was selected by backward step-down analysis based on Akaike's Information Criterion. A nomogram was developed based on the multivariate logistic-regression model. A nomogram is an intuitive and quantitative tool to predict the probability of outcomes.
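The collinearity screen can be illustrated with a short Python sketch. It assumes that the 0.7 threshold is applied to the absolute value of Spearman's coefficient (the text does not say whether the sign is considered). The helper names and the toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.7):
    """List variable pairs whose |rho| exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(spearman(columns[a], columns[b])) > threshold
    ]


# Toy data (invented): weight and BMI rise together, age does not.
columns = {
    "weight": [50, 55, 60, 65, 70, 75],
    "bmi": [17.0, 18.1, 19.5, 20.2, 22.0, 23.4],
    "age": [30, 22, 41, 19, 35, 27],
}
flagged = collinear_pairs(columns)
```

In this toy example only the weight and BMI pair is flagged, so one of the two would be dropped before regression.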

Assessment of nomogram performance
Discriminative ability was assessed by the area under the receiver-operating characteristic curve (AUC). AUC values range between 0.5 and 1.0, with 0.5 corresponding to a model with no discriminatory ability and 1.0 corresponding to perfect discrimination. AUCs were compared using DeLong's test for single factors and 2,000 bootstrap resamples for the matrix model [19]. Calibration was tested by a Hosmer-Lemeshow goodness-of-fit test after splitting the sample into quintiles. This test assessed how well the model fits observed data, with P > 0.05 indicating no evidence of poor fit. Calibration curves were presented to depict the relationships between predicted probabilities and observed frequencies. A curve that overlaps the reference line indicates perfect agreement between predicted and observed probabilities.
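To make the AUC concrete: it equals the probability that a randomly chosen patient with the event receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The authors computed ROC curves with the R package 'pROC'; the Python sketch below only illustrates this pairwise definition, and the scores and labels are invented.

```python
def auc(scores, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented predicted probabilities and observed outcomes (1 = event).
example_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
uninformative_auc = auc([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1])
```

Here three of the four event/non-event pairs are ordered correctly, so the AUC is 0.75; a score that is identical for everyone gives 0.5.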

Clinical utility of nomograms
Decision-curve analysis (DCA) was conducted to calculate the net benefits at different threshold probabilities in the combined training and validation cohorts. The optimal cut-off value was selected in the training cohort by maximizing the Youden index (sensitivity + specificity − 1).
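Both quantities can be written in a few lines. The sketch below uses the standard net-benefit definition from the decision-curve literature, true positives per patient minus false positives per patient weighted by the odds of the threshold probability; the source does not spell out the exact computation performed by the 'rmda' package, so this is an assumption. The function names and the toy data are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of acting on every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def youden_cutoff(probs, labels):
    """Cut-off maximizing sensitivity + specificity - 1 (first one on ties)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cut and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Invented predictions and outcomes (1 = event), for illustration only.
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j_index = youden_cutoff(probs, labels)
nb_model = net_benefit(probs, labels, threshold=0.30)
nb_all = net_benefit_treat_all(labels, threshold=0.30)
```

The 'treat-none' strategy has a net benefit of zero by definition, so a model is useful at a given threshold when its net benefit exceeds both zero and the 'treat-all' value.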

Statistical analysis
Continuous variables were presented as medians and interquartile range (IQR), and were compared using Wilcoxon rank-sum tests. Categorical variables were presented as counts and percentages of the cohort, and were compared by using Chi-squared tests or Fisher's exact tests, as appropriate. All statistical analyses were performed in R software (version 3.6.1). Randomization was conducted using the 'caTools' package. Receiver-operating characteristic (ROC) curves were plotted using the 'pROC' package. Nomograms and calibration curves were generated using the 'rms' package. The Hosmer-Lemeshow test was performed using the 'ResourceSelection' package. DCA was generated using the 'rmda' package. P values < 0.05 were considered statistically significant.

Patient characteristics
A total of 343 patients diagnosed with CD who had received IFX induction were enrolled in this study. After applying randomization stratified by centers, these patients were randomly divided into a training cohort (n = 240) and a validation cohort (n = 103). The baseline characteristics of the two cohorts are displayed in Table 1. There was no difference in the PNR rate between the two cohorts (25.8% and 25.2% in the training and validation cohorts, respectively; P = 1.000). The MH rate was also not significantly different between the two cohorts (39.6% and 35.9%, respectively; P = 0.605). No significant differences between the two cohorts were found in any variables. No correlation coefficients were > 0.40 in Spearman's correlation analyses.
To use the nomogram, draw a vertical line upward from each predictor value to the points axis, add up the points for all predictors, and draw a vertical line downward from the total-points axis to read off the probability of PNR. For example, a 20-year-old male was diagnosed with CD of B1 phenotype; his BMI was 16.5 kg/m²; CRP at week 2 after the first IFX infusion was 1.0 mg/L. Applying the nomogram to this case: age 20 = 27 points, CRP at week 2 lower than 5 mg/L = 0 points, BMI lower than 18.5 kg/m² = 54 points, B1 phenotype = 0 points, total = 81 points. The corresponding probability of PNR was 0.16.
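The arithmetic of this worked example can be reproduced in a few lines of Python. Only the point values quoted in the example are used, together with the total-points cut-off of 121 reported with the decision-curve results; points for other predictor values would have to be read from the nomogram itself. Treating a total at or above the cut-off as high risk is an assumption, since the text does not state how a tie is handled.

```python
# Points read from the nomogram for the example patient in the text.
example_points = {
    "age 20 years": 27,
    "CRP at week 2 < 5 mg/L": 0,
    "BMI < 18.5 kg/m^2": 54,
    "B1 phenotype": 0,
}

# Reported cut-off: 121 total points, equal to a PNR probability of 0.296.
CUTOFF_POINTS = 121

total_points = sum(example_points.values())
# Assumption: totals at or above the cut-off are classified as high risk.
risk_group = "high" if total_points >= CUTOFF_POINTS else "low"
```

With 81 total points the example patient falls below the cut-off and is placed in the low-risk group, in line with the predicted PNR probability of 0.16.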

Performance of the nomogram
The AUCs were 0.77 (95% CI: 0.70-0.84) and 0.76 (95% CI: 0.64-0.88) in the training and validation cohorts, respectively (Figure 1B and C). P-values of the Hosmer-Lemeshow goodness-of-fit test were > 0.05 in both the training cohort and the validation cohort. Calibration curves showed excellent agreement between the nomogram prediction and actual PNR rate in the training and validation cohorts (Figure 1D and E).
Comparison of the nomogram with single factors and the matrix model
As shown in Figure 2A, the AUC of the nomogram in the combined training and validation cohorts was 0.77, indicating significantly higher predictive accuracy for PNR than age at first IFX (AUC = 0.53, P < 0.001), CRP at week 2 (AUC = 0.67, P < 0.001), BMI (AUC = 0.61, P < 0.001), or B1 phenotype (AUC = 0.64, P < 0.001) alone. We also compared the discrimination of the nomogram with that of an existing matrix model [19]. The AUC of the nomogram for PNR was significantly higher than that of the matrix model (AUC = 0.47, P < 0.001).

Clinical utility of the nomogram
The DCA for the nomogram of PNR was plotted (Figure 3A). The net benefit was positive when the threshold probability for response, which is equal to 1 − probability of PNR, was within a range of 0.15-0.90. In other words, when the threshold probability of PNR was between 0.10 and 0.85, the nomogram added more net benefit than 'treat-all' or 'treat-none' strategies. Based on the Youden index in the training cohort, all patients were divided into low- and high-risk groups by a cut-off value of 0.296 of PNR probability (equal to 121 of the total points in the nomogram). High-risk patients had a greater probability of PNR in the overall population (47.1% vs 14.3%, P < 0.001; Figure 3B). Since > 60% of patients had perianal disease, we also evaluated the utility of the nomogram in these patients and found that it had good discriminatory ability for PNR of their luminal disease (11.0% vs 50.0%, P < 0.001; Figure 3C).

Performance of the nomogram
AUCs were 0.79 (95% CI, 0.73-0.85) and 0.72 (95% CI, 0.62-0.82) in the training cohort and validation cohort, respectively (Figure 4B and C). The P-values of the Hosmer-Lemeshow goodness-of-fit test were 0.241 and 0.346 in the training and validation cohorts, respectively. The calibration curves showed notable agreement between predicted MH probability and observed MH rate (Figure 4D and E).

Comparison of the nomogram with single factors
The discriminatory ability of the nomogram was significantly higher than that of disease duration, B1 phenotype, CRP at week 2, or baseline CDEIS in the combined training and validation cohorts (all P < 0.05; Figure 2B).

Discussion
The differential responses of CD patients to IFX treatment present an avenue by which precision medicine, in the form of individualized treatment strategies, can be used to optimize CD therapies [13]. Treatment with vedolizumab, another biologic CD therapeutic, is already indicated by a scoring system derived from clinical-trial data, which has been validated on real-world data [23]. However, IFX, as the first biologic approved for CD, still lacks a convincing and practical clinical prediction tool. IFX remains the predominant biologic treatment for CD in China, so a reliable tool for predicting IFX response is urgently needed. Previously, we reported that serum interleukin 9 levels were predictive of IFX clinical efficacy in CD patients at our center [24]. To obtain a more widely applicable predictive model, we performed this multicenter study and constructed two nomograms to predict PNR and MH in response to IFX treatment, which represented the worst and best outcomes, respectively. Both nomograms had a notable discriminatory ability in our multicenter cohort. Our study showed that PNR occurred in 25% of cases, which is consistent with previous studies [13], and the MH rate was approximately 38%, which is similar to that in other studies (29%-45%) [25].
As a biomarker for inflammation, CRP has been shown to be associated with disease activity in CD patients [26] and played an important role in our nomograms. In summary, CRP at week 2 was negatively associated with response to IFX in our study.
While several studies have illustrated that CRP at week 14 is a biomarker of clinical response to IFX in CD patients [27][28][29], we found that early normalization of CRP at week 2 was also indicative of IFX response. A similar result for the predictive capacity of CRP at week 2 has been previously reported in ulcerative colitis patients [30]. As for baseline CRP, there is no consensus on whether elevated CRP is related to response to IFX or how it affects the outcome [31]. In this study, we found that higher baseline CRP levels were negatively associated with response in univariate analysis, which is consistent with one previous report [27] but contrary to other studies [28,29,32]. Despite significance in univariate analysis, baseline CRP was not influential enough to enter the final model. Thus, physicians should not make decisions according to baseline CRP.
Low BMI may also predict disease-course severity [33]. We found that low BMI (BMI < 18.5 kg/m²) was negatively associated with IFX response, similar to previously reported findings [19]. There have been numerous studies about the association between obesity and loss of response [34,35], but the relationship between underweight and PNR merits further investigation. Since our patients had a lower BMI (median, 18.3 kg/m²; IQR, 15.9-20.0 kg/m²) than Western patients with CD [34,35], we only explored the influence of underweight.
Other clinical factors that we identified have also been linked to response to IFX in previous studies. It has been reported that younger patients are more likely to respond to IFX than older patients [19,36]. We also found that the probability of PNR increased with age, although the mechanism for this has not been fully elucidated. Some reports have suggested that patients with shorter disease duration have a higher chance of responding to IFX [37,38]. We found a similar relationship between disease duration and response in this study. Consistent with previous studies [39,40] and generally held beliefs, stenosing or fistulizing phenotypes were associated with worse clinical outcomes in our study. Additionally, studies about the relationship between disease severity and response to IFX are insufficient and remain controversial [31]. We found that more severe disease was not related to PNR, but patients with more severe disease were less likely to achieve MH.
There are several advantages to the methodology and outcomes of our study compared to those of previous studies. First, our study made explicit comparisons between predictions of IFX response made by nomograms, by single factors included in the nomograms, and by a previously published matrix model [19]. The nomograms developed in our study provided more accurate predictions than any single factor or the existing matrix model. Second, as our multicenter study included data from four IBD centers in three geographically distinct regions of China (southern, eastern, and northern), our results can be considered as representative of the patient population in China. Third, all data included in our study were routine clinical data, requiring no extra physical examinations or genetic characterization of patients, making the nomograms that we have developed both practical and economical for physicians, especially those in developing countries.
However, there are some limitations to our study. First, some clinical data, such as IFX levels and anti-IFX antibodies, were incomplete and not explored, while only half of the patients had combination therapy of azathioprine or 6-mercaptopurine. Second, > 60% of patients had perianal disease but only luminal disease outcomes were assessed for endpoints due to lack of a detailed record of perianal disease. Third, as patients were included with baseline CDEIS > 3, those with disease limited to the upper gastrointestinal tract and a substantial number of patients with lesions limited to the terminal ileum were excluded from this study, potentially biasing our cohort and limiting the scope of patients to which our nomograms are applicable. However, 70% of CD patients have disease located in the ileocolon or colon [41], so the nomograms we developed are still useful for the majority of CD patients. Moreover, since CDEIS was used as the definition of outcome, lesions in the upper gastrointestinal tract were not evaluated. Furthermore, the nomograms developed require prospective and external validation before they can be widely adopted.
In conclusion, our proposed nomograms provide accurate predictions of IFX-related PNR and MH in CD patients. To the best of our knowledge, this is the first nomogram to be developed from a multicenter cohort to predict response to IFX in CD patients. Using our nomograms to predict IFX response could reduce the time needed to identify effective therapeutic approaches for CD patients, saving costs and reducing patient harm.