Abstract

Objective

The purpose of this systematic review was to identify and appraise externally validated prognostic models to predict a patient’s health outcomes relevant to physical rehabilitation of musculoskeletal (MSK) conditions.

Methods

We systematically reviewed 8 databases and reported our findings according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020. An information specialist designed a search strategy to identify externally validated prognostic models for MSK conditions. Paired reviewers independently screened the title, abstract, and full text and conducted data extraction. We extracted characteristics of included studies (eg, country and study design), prognostic models (eg, performance measures and type of model) and predicted clinical outcomes (eg, pain and disability). We assessed the risk of bias and concerns of applicability using the prediction model risk of bias assessment tool. We proposed and used a 5-step method to determine which prognostic models were clinically valuable.

Results

We found 4896 citations, read 300 full-text articles, and included 46 papers (37 distinct models). Prognostic models were externally validated for the spine, upper limb, lower limb conditions, and MSK trauma, injuries, and pain. All studies presented a high risk of bias. Half of the models showed low concerns for applicability. Reporting of calibration and discrimination performance measures was often lacking. We found 6 externally validated models with adequate measures, which could be deemed clinically valuable [ie, (1) STarT Back Screening Tool, (2) Wallis Occupational Rehabilitation RisK model, (3) Da Silva model, (4) PICKUP model, (5) Schellingerhout rule, and (6) Keene model]. Despite having a high risk of bias, which is mostly explained by the very conservative properties of the PROBAST tool, the 6 models remain clinically relevant.

Conclusion

We found 6 externally validated prognostic models developed to predict patients’ health outcomes that were clinically relevant to the physical rehabilitation of MSK conditions.

Impact

Our results provide clinicians with externally validated prognostic models to help them better predict patients’ clinical outcomes and facilitate personalized treatment plans. Incorporating clinically valuable prognostic models could inherently improve the value of care provided by physical therapists.

Introduction

Affecting 1.71 billion people worldwide in 2019, musculoskeletal (MSK) conditions are the most prevalent type of condition requiring rehabilitation.1 Evidence from meta-analyses on MSK conditions reveals that most interventions in rehabilitation have small to moderate effects.2–9 Stratified medicine has been touted as a promising avenue to improve clinical outcomes following rehabilitation and to provide better personalized care.10 Stratified medicine refers to splitting heterogeneous conditions with oversimplified labels into homogeneous subgroups that share similar biological or risk characteristics.10 Prognosis-related findings represent a fundamental component of stratified medicine.11

Prognosis refers to the risk of future health outcomes or treatment response in people with a health condition.12 Prognosis goes beyond diagnosis, as it predicts the patient’s trajectory and outcomes, whether poor or positive.12–14 Prognosis-related findings can be used to determine a person’s specific treatment needs.14, 15 Prognosis research investigates the specific biological, psychological, and social factors that are associated with a defined outcome trajectory.14, 15 Using prognostic factors in isolation usually results in poor prediction and may lead to inappropriate interventions.15, 16 To improve predictions of outcome or treatment allocation, it is essential to combine multiple prognostic factors within a prognostic model.16 Accordingly, based on the PROGRESS framework, treatment allocation may be best informed by prognosis research involving prognostic models.16

A prognostic model is a formal combination of multiple prognostic factors that allows estimation of an individual’s risk of an outcome at a specific endpoint.16 In clinical settings, these prognostic models are very useful.17 In addition to refining the prediction, the modifiable factors included in a model represent therapeutic targets to be considered in the care plan.13 This clinical value has led to the development of prognostic tools in rehabilitation, such as clinical prediction rules (ie, rules that estimate the probability of future outcomes) and clinical decision rules (ie, also called prescriptive clinical prediction rules because they suggest a course of action).18–20 However, most clinical prediction rules were developed with low methodological quality, resulting in poor predictive ability.18–20 Standards for the methodology of developing these models have only recently been proposed,14, 21 and 3 basic steps must be followed to develop a prognostic model: (1) model development, (2) model validation (internal and external), and (3) clinical impact assessment.14, 16

Model validation can be determined through the process of internal and external validation.16 External validation is the most relevant step to obtain a first indication of the clinical value of a prognostic model.16 External validation consists of testing the developed and “internally validated” model on a new sample of participants to determine its generalizability and performance.16, 22 Two important predictive performance measures in the external validation step are discrimination (accuracy) and calibration (reliability).14, 23–25 Discrimination refers to the model’s ability to correctly distinguish between the absence and presence of the outcome.24, 25 Calibration is the agreement between predicted probabilities of occurrence and observed proportions of the outcome.24, 25 Unlike discrimination, which cannot be improved by updating, calibration can be: an external validation step can lead to an updated version of the model through recalibration.22, 24 Despite its importance, external validation is very often overlooked in most developed prognostic models.14, 18, 19
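To illustrate what these 2 measures capture, the following is a minimal sketch (Python with NumPy and scikit-learn, using simulated data rather than any cohort from the included studies) of how discrimination and weak calibration are commonly quantified at external validation; the exact procedures used by the included studies varied.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated "new sample": probabilities produced by a previously developed
# prognostic model and the outcomes actually observed at the endpoint.
rng = np.random.default_rng(42)
predicted_risk = rng.uniform(0.05, 0.95, size=500)
observed = rng.binomial(1, predicted_risk)

# Discrimination: C-statistic (area under the ROC curve).
c_statistic = roc_auc_score(observed, predicted_risk)

# Calibration (weak): regress the observed outcome on the logit of the predicted
# risk. A slope near 1 and an intercept near 0 suggest good calibration;
# departures can be corrected by recalibrating (updating) the model.
logit_risk = np.log(predicted_risk / (1 - predicted_risk)).reshape(-1, 1)
recalibration = LogisticRegression(C=1e6).fit(logit_risk, observed)  # ~unpenalized fit
calibration_slope = recalibration.coef_[0, 0]
calibration_intercept = recalibration.intercept_[0]

print(f"C-statistic: {c_statistic:.2f}")
print(f"Calibration slope: {calibration_slope:.2f}, intercept: {calibration_intercept:.2f}")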

As prognosis can serve as a way to personalize the care of MSK patients,11 it is essential to provide a critical synthesis of the externally validated prognostic models to help clinicians incorporate high-quality prognostic data into their MSK practice. The main objective of this review was to identify and appraise externally validated prognostic models that aim to predict a patient’s health outcomes that are relevant to physical rehabilitation of MSK conditions. Additionally, we aimed to identify and describe externally validated prognostic models with the greatest value to physical rehabilitation clinicians.

Methods

We conducted a systematic review following the JBI guidelines26 and reported our findings according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines.27 This systematic review was registered in the PROSPERO database: CRD42020181959.

Search Strategy

Using an iterative process, the search strategies were developed and tested by an experienced medical information specialist in consultation with the review team. The MEDLINE strategy was peer reviewed by another senior information specialist, prior to execution, using the PRESS Checklist.28 Strategies utilized a combination of controlled vocabulary (eg, “prognosis,” “models, statistical,” and “physical therapy modalities”) and keywords (eg, “prediction tool,” “rehabilitation,” and “validity”). Vocabulary and syntax were adjusted across databases. There were no date restrictions on any of the searches but, when possible, animal-only records were removed from the results. Specific details regarding the strategies appear in Supplementary Appendix 1.

Information Sources

The systematic search was undertaken using multiple sources, including the following OVID databases: Ovid MEDLINE®, including Epub Ahead of Print and In-Process & Other Non-Indexed Citations, Embase Classic, Embase, PsycINFO, and the Cochrane Library databases included in EBM Reviews. We also searched CINAHL (Ebsco platform), Web of Science, and PEDro. All searches were performed on January 27, 2022.

Eligibility Criteria

For the title and abstract screening phase, the potential studies had to meet 5 criteria:

1. Participants: Adults or children who live with a MSK condition affecting physical functioning and requiring physical rehabilitation (ie, “a set of interventions designed to optimize functioning and reduce disability in individuals with health conditions in interaction with their environment”).29

2. Intervention: Prognostic models for physical rehabilitation practice.

3. Outcomes: Prognostic models that predict outcomes relevant to a patient’s health, as defined by the International Classification of Functioning, Disability and Health: body function impairments, activity limitations, and/or participation restrictions.

4. Published either in English or in French.

5. Reported a study design to validate the prognostic model.

For the full-text screening phase, we applied one more criterion:

1. Design: The study had to pertain to the external validation phase.

Selection Process

After removing duplicates, independent reviewers (AB, AP, CH, FN, MD, SD, and YTL) screened the study titles and abstracts. A calibration exercise on the first 25 citations was performed by the reviewers, and if inter-rater agreement (κ statistic) was below κ = 0.60 or if many selected references were irrelevant to the research question, the eligibility criteria were clarified.30 Potential studies were full-text screened by 3 independent reviewers (AP, CH, and FN) following another calibration exercise on the first 25 citations.30 In case of disagreement during screening, 5 reviewers (AP, CH, FN, SD, and YTL) reached a consensus.

Data Collection Process

Two independent reviewers (CH and FN) extracted the data from the retained studies. In case of disagreement, a 3rd evaluator (SD or YTL) was brought in to reach consensus. We used the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies checklist for prognostic model studies31 and the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines32 to guide data extraction based on likely variables reported in external validation studies. The 2 evaluators reviewed the data extraction form after having extracted the first 10 studies to ensure relevance and to standardize the extraction.

Data Items

We extracted the following study characteristics: authors, year of publication, country where the study was conducted, study design, settings in which prognostic models were validated (primary care, inpatient, and community), sample size and patients’ characteristics, and the MSK conditions requiring physical rehabilitation (low back pain, acute ankle sprain, etc).

We classified the study design as prospective cohort design, randomized interventional trial, or retrospective design. If the data used in a study were not collected for the primary purpose of externally validating a prognostic model (the principal objective of the study), we categorized it as a retrospective design. We made this choice because of the inherent limitations of this design.33

From these prognostic models, we extracted data that included: predictors, type of prognostic model assessed (ie, clinical prediction rule,14 clinical decision rule,14 regression formula,17 and other17), name of the model, intervals of follow-up (endpoint), performance measures (ie, calibration, discrimination, sensitivity, specificity, and likelihood ratios),23, 25 other relevant performance measures (ie, R2, Brier score, and net benefit),34 update of the model, and methods used for external validation (ie, temporal, geographical, domain/setting, and other).22, 32 We further extracted the outcome label (name) and measurement tool(s) as predicted clinical outcomes.

Study Risk of Bias and Applicability Assessment

We conducted the risk of bias assessment using the Prediction model Risk of Bias Assessment Tool (PROBAST),21 as advised by the Cochrane Prognosis Methods Group for prognostic reviews. The PROBAST examines 4 domains (participants, predictors, outcome, and analysis) and explores risk of bias as well as applicability (ie, “Concern that the included participants and setting do not match the review question”21). Two independent evaluators (CH and FN) completed the assessment. In case of disagreement, a 3rd evaluator (SD or YTL) was involved to reach consensus. The risk of bias of each study was rated as low, high, or unclear according to the 4 domains of the PROBAST, and applicability to the review question was rated as low, high, or unclear concern. Because inter-rater reliability and agreement are not stable across the PROBAST domains,35 we performed a calibration training with 5 studies. We applied the 2 following criteria: (1) inter-rater reliability (κ ≥ 0.6)30 and (2) inter-rater agreement (≥80%).36 In case of uncertainty, or if a criterion was not met, a consensus meeting between evaluators was held to render the PROBAST items more explicit.
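For readers unfamiliar with these 2 criteria, the sketch below (Python with scikit-learn, using hypothetical ratings rather than our actual calibration data) shows how the inter-rater reliability and agreement thresholds can be computed.

import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical item-level PROBAST judgments from 2 reviewers during the
# calibration exercise (1 = "yes/probably yes", 0 = "no/probably no").
reviewer_1 = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
reviewer_2 = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])
kappa = cohen_kappa_score(reviewer_1, reviewer_2)     # inter-rater reliability
agreement = 100 * np.mean(reviewer_1 == reviewer_2)   # inter-rater agreement (%)
print(f"kappa = {kappa:.2f}, agreement = {agreement:.0f}%")  # compare with 0.60 and 80%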

Synthesis Methods

We conducted a narrative synthesis centered on the description of model characteristics to guide clinicians in their decision to use these prognostic models according to their practice context. The extracted data are not easy to interpret clinically: the PROBAST assesses the risk of bias and applicability but is of limited value for selecting impactful tools for clinical practice, and the lack of methods or standards (benchmarks) for interpreting performance measures also makes it difficult for clinicians to determine which prognostic models should be used. To address this limitation, we proposed a structured, 5-step method to determine which prognostic models were deemed clinically valuable to physical rehabilitation clinicians. Based on the current literature on prognostic model development, we identified 5 criteria that a model must fulfill to show clinical value. The model must show:

1. Low concerns about applicability.

2. Complete report of performance measures.

3. Calibration performance measure must be acceptable or good (Hosmer-Lemeshow test: P > .05,38 calibration slope = 1,39 calibration intercept = 0,39 and/or calibration plot considered as good or acceptable39).

4. Discrimination performance measure must be between 0.61 and 0.75 to be possibly helpful and >0.75 to be clearly helpful for clinicians23 and

5. Risk of bias must have only a minor impact on the model.35 For example, an inappropriate data source is more detrimental than predictors that are part of the outcome definition.

The predictive validity measures were not considered in this decision rule because they add little information on the discriminatory ability of models.40, 41

The models that met these 5 criteria are presented in the Results section.
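As a reading aid, the 5 criteria can be expressed as a sequential filter. The sketch below (Python; the field names and example values are hypothetical and do not reproduce our extraction data) mirrors the decision rule described above.

from dataclasses import dataclass

@dataclass
class ValidatedModel:
    # Illustrative summary of one externally validated prognostic model.
    name: str
    low_applicability_concern: bool    # criterion 1
    complete_performance_report: bool  # criterion 2
    acceptable_calibration: bool       # criterion 3 (slope ~1, intercept ~0, acceptable plot)
    discrimination: float              # criterion 4 (C-statistic)
    minor_bias_impact: bool            # criterion 5

def clinically_valuable(model: ValidatedModel) -> bool:
    # Apply the 5 criteria sequentially; discrimination > 0.75 would be "clearly helpful".
    return (model.low_applicability_concern
            and model.complete_performance_report
            and model.acceptable_calibration
            and model.discrimination >= 0.61
            and model.minor_bias_impact)

# Hypothetical example only; the values do not describe any model in this review.
print(clinically_valuable(ValidatedModel("example", True, True, True, 0.66, True)))  # True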

Role of Funding Source

The funders played no role in the design, conduct, or reporting of this study.

Results

Study Selection

We retrieved 4886 citations from our systematic search. Ten more articles were included from other sources (handsearching). Of the 4896 citations, 300 were retained for the full-text screening phase. We found 46 eligible studies, reporting on 37 models for spinal conditions, extremity conditions, and MSK injuries and trauma, that met all the inclusion criteria and were selected for data extraction and for risk of bias and applicability assessment (see the Figure for the flow diagram).

Figure. Flow chart. MSK = musculoskeletal.

Study Characteristics

Less than half (n = 20, 43.5%) of the retained studies were published before the TRIPOD reporting guidelines (2015), and 67.4% (31/46) were published before the PROBAST (2019). The data sources included prospective designs (n = 19, 41.3%), retrospective designs (n = 15, 32.6%), randomized controlled trials (n = 11, 23.9%), and mixed designs (RCT and prospective) (n = 1, 2.2%). Temporal validation (n = 24, 52.1%) and geographical validation (n = 17, 37%) were the 2 main methods of external validation. The sample size in each study ranged from 91 to 28,919 participants, and the mean age of participants ranged from 34 to 75 years. We found 17 (37%) studies pertaining to spinal conditions, 19 (41.3%) studies pertaining to upper or lower extremity conditions, and 10 (21.7%) studies on MSK trauma, injuries, and pain (undefined region). The United States of America (n = 13) and Australia (n = 7) were the most represented countries. All extracted characteristics are available in the Table.

Table. Characteristics of the Selected Studies (N = 46)

Year of publication, n (%)
  <2015: 20 (43.5%)
  Between 2015 and 2018: 11 (23.9%)
  ≥2019: 15 (32.6%)

Country, n
  United States of America: 13
  Australia: 7
  Sweden: 5
  Denmark, Netherlands, United Kingdom: 3 each
  Canada, Singapore, Switzerland, Multicentric: 2 each
  France, Germany, Norway, Spain: 1 each

Settings, n (%)
  Primary care: 25 (54.3%)
  Inpatient: 13 (28.3%)
  Community: 3 (6.5%)
  Two or more settings: 4 (8.7%)
  Unclear: 1 (2.2%)

Study design, n (%)
  Prospective cohort design: 19 (41.3%)
  Retrospective design: 15 (32.6%)
  Randomized interventional design: 11 (23.9%)
  Randomized interventional and prospective cohort design: 1 (2.2%)

Methods of external validation, n (%)
  Temporal validation: 24 (52.1%)
  Geographical validation: 17 (37%)
  Domain/setting validation: 2 (4.3%)
  Other: 1 (2.2%)
  Unclear: 1 (2.2%)
  Mixed method: 1 (2.2%)

Model update, n (%)
  Yes: 4 (8.7%)
  No: 42 (91.3%)

Risk of Bias and Applicability in Studies

The overall judgment of risk of bias was high for all studies, with 97.8% of studies presenting a high risk of bias in the analysis domain. The overall judgment of concerns about applicability was low for approximately two-thirds of the studies (69.6%). Supplementary Table 1 presents the risk of bias and applicability assessment for each study according to the PROBAST.

Results of Individual Studies

Overall Information of Selected Prognostic Models

We found 46 studies reporting on 37 unique models. Nineteen studies (41.3%) reported information on calibration through 6 different methods (calibration slope and intercept, Hosmer-Lemeshow test, calibration plot, expected/observed ratio, comparison with previous data, and scatter plot). Twenty-eight studies (60.9%) reported information on discrimination through one method (the area under the receiver operating characteristic curve [AUC], also reported as the C-statistic or C-index). The endpoint ranged from 4 days to 10 years, with the main endpoints being 3 months (n = 7), 6 months (n = 9), and 12 months (n = 6).

External Validation of Prognostic Models for Spinal Conditions

We found 17 studies, of which 11 were for low back conditions and 6 focused on neck conditions. For neck pain, the Sterling clinical prediction rule (n = 2/6),42, 42 Schellingerhout clinical prediction rule (n = 2/6),42, 43 and the Ritchie clinical prediction rule (n = 2/6)42, 44 were the most studied models. The sample size for each study ranged from 101 to 1193 participants and disability was the main outcome (n = 3/6). For low back pain, the STart Back Screening Tool (n = 4/11)45–48 and the Flynn clinical decision rule (n = 2/11)49, 50 were the most studied models. The sample size for each study ranged from 105 to 1528 participants and disability was the main outcome (n = 3/11). Supplementary Table 2 presents all the extracted characteristics of studies on spine conditions.

External Validation of Prognostic Models for Extremity Conditions

We found 19 studies, of which 15 were on lower extremity and 4 studies on upper extremity conditions. For the lower extremity, the main studied model was the Risk Assessment and Prediction Tool (n = 3/15).51–53 The sample size for each study ranged from 52 to 2863 participants. Discharge destination was the main outcome (n = 6/15). For the upper extremity, the sample size for each study ranged from 120 to 3637. Supplementary Table 3 presents all the extracted characteristics of studies on extremity conditions.

External Validation of Prognostic Models for Musculoskeletal Trauma, Injuries and Pain (Undefined Region)

We found 9 studies, of which 3 were on orthopedic trauma, 3 studies on MSK pain, and 3 studies on MSK injuries. The Wallis Occupational Rehabilitation RisK (WORRK) model (n = 2/9)55, 56 and the Örebro MSK Pain Questionnaire (n = 2/9)57, 58 were the most studied. The sample size for each study ranged from 107 to 28,919 participants. Return-to-work was the main outcome (n = 4/9). Supplementary Table 4 presents all the extracted characteristics of studies on MSK conditions in general.

Clinically Valuable Prognostic Models

We submitted the 46 retained studies to our 5-step process (described in the Synthesis Methods section). Of the 46 studies, 16 showed high concerns about applicability; of the remaining 30, 17 provided no or an incomplete report of performance measures; of the remaining 13, 6 showed poor calibration. Of the 7 studies that showed possibly helpful discrimination, 1 showed risk of bias with a major impact, which left 6 studies with possibly helpful discrimination measures (Suppl. Material S1).

The 6 models with the greatest clinical value identified were (for an interactive version of this result, with additional information and resources, please visit https://view.genial.ly/62190374dcdc9300111d6d28/interactive-content-prognostic-models-in-musculoskeletal-rehabilitation or see Suppl. Material S2):

1. Forsbrand et al for their prediction of health-related quality of life (discrimination = 0.73) and work ability (discrimination = 0.68) at an endpoint between 11 and 27 months for people with acute/subacute low back or neck pain.46

2. Luthi et al for their prediction of return-to-work (discrimination = 0.73) at 24 months for people with orthopedic trauma.54

3. Da Silva et al for their prediction of number of days to pain recovery (discrimination = 0.71) at 1 month for people with acute low back pain.58

4. Traeger et al for their prediction of chronicity (discrimination = 0.66) at 3 months for people with low back pain.59

5. Schellingerhout et al for their prediction of global perceived recovery (discrimination = 0.66) at 6 months for people with neck pain.43

6. Keene et al for their prediction of poor outcome (ie, severe persistent pain and/or severe functional difficulty and/or significant lack of confidence in the ankle and/or recurrent sprain) (discrimination = 0.64) at 9 months for people with acute ankle sprain.60

Discussion

The objective of this study was to identify and appraise externally validated prognostic models that aim to predict a patient’s health outcomes that are relevant to physical rehabilitation of MSK conditions. We found 46 studies reporting on 37 unique models for spine, lower limbs, and non-specific MSK injuries. Although the risks of bias were high, 6 were deemed clinically valuable. These findings led us to 3 main considerations.

Few prognostic models are highly clinically valuable. To be considered clinically valuable, a model must at least show “adequate” calibration and discrimination performance measures at the end of the external validation phase.16, 23, 61 A more conservative approach is to add a clinical applicability criterion, which we did using the results of the applicability assessment of the PROBAST,21 as well as risk of bias criteria. All models that presented adequate performance measures (calibration/discrimination) and low concern about applicability were considered at high risk of bias according to the PROBAST. However, we must take into consideration that the structure of the PROBAST is extremely conservative.35 For example, if 1 of the 18 items is judged as “absent,” the overall judgment is automatically considered at “high risk of bias.”21 Nevertheless, certain items have a major impact on the performance measures (eg, a small dataset with an inadequate number of events per variable), whereas others have a minor impact (eg, the outcome is part of the predictors), an important distinction that is not taken into consideration by the PROBAST.35 We therefore considered only the elements that had a major impact on the risk of bias. In the absence of specific standards for using the PROBAST, we chose this route to determine which models with a high risk of bias could still be clinically valuable.

Among the 6 models with the greatest value, the study by Forsbrand et al reported calibration performance through the Hosmer-Lemeshow test without other information.46 This test should be complemented by a calibration plot or a table comparing predicted versus observed outcome frequencies to provide useful information on calibration performance.21 Clinicians must remain aware of this calibration limitation when using the STarT Back Screening Tool to predict health-related quality of life and/or work ability at 6 months in people with low back pain and/or neck pain. Thus, clinicians should keep in mind that the results predicted by the STarT Back Screening Tool could deviate slightly from the observed outcomes at the endpoint.17

Even with our structured 5-step method, it remains complex for clinicians to determine whether a given model is suitable for their practice. Further consideration needs to be given to the population (ie, selection/recruitment of participants) and clinical settings in which the model was developed and validated. There is often heterogeneity, which makes comparisons between models difficult.33, 62 Clinicians must also acknowledge the specific eligibility criteria of each model to guide their decision to use it (or not) for a specific patient. The predicted outcomes of the selected models were generally in line with the scope of rehabilitation and patients’ needs.63 Some models used “recovery” as a predicted outcome; however, the definition of recovery varies between these models. This inconsistency represents a major barrier to model comparison and transferability.33 Finally, clinical applicability is important to facilitate the integration of prognosis into clinical practice.

Of the 46 studies retained in this review, only a few showed potential value for clinical utilization. This is disappointing and highlights the difficulty of summarizing the trajectory of MSK conditions with few clinical variables. Indeed, prognostic models are designed to be pragmatic (ie, brief) tools that can be easily applied in any clinical setting.16 For example, in the context of low back pain, experts in pain management have validated a diagnostic framework incorporating 51 modifiable factors that develop or maintain low back pain.64 Yet, it is not realistic to explore all potential modifiable and non-modifiable factors contributing to an individual’s trajectory via a simple and clinically usable prognostic model.65 To address this limitation, a stratified approach with prognostic models was introduced and showed promising impact.66–69

Two systematic reviews on prognostic models were recently published. Walsh et al focused on the identification of development and validation studies of clinical decision rules (CDR; ie, response to physical therapist interventions).20 For their part, Silva et al focused on the development and validation of prognostic models for acute low back pain.70 Our findings on CDR converge with Walsh et al’s conclusion that the current literature does not support the use of the externally validated CDR.20 However, Walsh et al reported good overall risk of bias for validation studies, which is inconsistent with our review.20 This discrepancy is most likely explained by the use of different tools to assess risk of bias. Walsh et al used a non-prognosis risk of bias tool (ie, the Cochrane Effective Practice and Organization of Care group criteria),20 whereas we followed the Cochrane Prognosis Methods Group for prognostic reviews and used the PROBAST. Nevertheless, our findings of predominantly high risk of bias are consistent with those of Silva et al, who used the PROBAST for methodological quality assessment.70 Silva et al also highlighted the poor reporting of performance measures, which is an important limitation for determining potential clinical value.70 According to Silva et al’s review, the Da Silva prognostic model was the most valuable one.70 This result is consistent with the findings of our structured 5-step method to determine potentially clinically valuable models. However, a discrepancy appears regarding the PICKUP model. With our 5-step method, we found that this model could be possibly helpful, whereas Silva et al did not.70 This could be explained by the use of different discrimination thresholds. We used discrimination thresholds based on clinical considerations,23 whereas Silva et al used higher thresholds based on mathematical considerations.37 As reported by Traeger et al, the PICKUP model showed higher discrimination performance than clinical judgment, which supports our conclusion on its possible helpfulness.59 We therefore believe that the discrimination threshold for the clinical value of a prognostic model should be determined by comparing the prognostic model with clinical judgment. Compared with the previous literature, we operationalized a synthesis method for clinical value that allows us to propose impactful models for clinicians.

Methodological and statistical concerns may limit the integration of these prognostic models into clinical practice. From the PROBAST, we can conclude that the methods used for the external validation of prognostic models in the MSK field carry high risks of bias. This is consistent with previous systematic reviews that reported this limitation.20, 62 These methodological flaws can lead to over- or under-fitting of the included models.

The clinical integration of these prognostic models may also be limited by 5 main aspects:

1. The lack of information on calibration and discrimination measures. Of the 46 studies included in our review, 19 reported calibration information and 28 reported discrimination information. Internal validation techniques to correct overfitting and optimism are insufficient to preserve the accuracy of the model in new patients.22 It is essential that publications on the external validation of prognostic models include calibration and discrimination measures.23, 61, 71 These measures can inform clinicians about the relevance of a model: if calibration is poor in a new sample, the model is not useful in its present form,25 and if discrimination is poor in the new sample, the model is not clinically useful.23, 24

2. Few studies used domain/setting validation. Of the 46 studies included in our review, only 2 used domain/setting validation as the method of external validation. Differences between participants included in the internal and external validation steps are potentially greater in domain/setting validation than in temporal validation.22 Models with temporal validation provide the weakest evidence of generalizability.22 Inspired by the proposal of Jenkins et al, a continual process including the different methods of external validation could provide information for model updating and improve the generalizability of the model.16, 22, 72

3. Small sample size. Numerous studies included in our review presented a small sample size that did not meet the accepted rule of thumb of at least 100 events and 100 non-events (a minimal illustrative check follows this list).73, 74 When this rule of thumb is not met, studies can report inaccurate estimates of predictive performance measures that can be misleading for clinicians.71, 75 Even when this rule is met, it is not specific to the model and the validation setting.76 As a result, recent methodological studies on sample-size calculation should be integrated to improve the estimation of important performance measures.71, 76

4. Predictor(s) included in the outcome. In some studies included in our review, the predictors were included in the outcome. This may lead to an overestimation of the association between the predictor and the outcome and thus to an optimistic estimation of performance measures.21

5. Design. Of the 46 studies included in our review, 19 used the current gold standard design (ie, a prospective cohort developed to externally validate a specific prognostic model).21 Thus, inappropriate designs are still frequently used in the external validation of prognostic models.33 Randomized controlled trials (RCTs) often have more restrictive eligibility criteria that lead to more homogeneous participants; this narrower case-mix tends to yield lower discriminative ability.21 The inclusion and exclusion of potential participants are also problematic in retrospective designs based on routine care registries.77, 78
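To make the events rule of thumb from point 3 concrete, here is a minimal check (Python, with hypothetical numbers); the formal sample-size calculations cited above require additional inputs and are not reproduced here.

def meets_events_rule_of_thumb(n_participants: int, outcome_prevalence: float) -> bool:
    # Conventional heuristic for external validation samples:
    # at least 100 events and 100 non-events (not a formal sample-size calculation).
    events = n_participants * outcome_prevalence
    non_events = n_participants - events
    return events >= 100 and non_events >= 100

print(meets_events_rule_of_thumb(250, 0.30))  # 75 events -> False
print(meets_events_rule_of_thumb(500, 0.30))  # 150 events, 350 non-events -> True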

There is a clear need for standardization. The weaknesses observed in external validation methodology can be explained in part by the only recent publication of guidelines; less than half of the studies were published before the TRIPOD publication. With the publication of guidelines and tools such as the PROBAST, there is now an opportunity to standardize and improve methods and, eventually, to make meta-analysis possible: the data extracted for this review revealed a lack of standardization that currently makes a meta-analysis impossible. From our perspective, some important key aspects could be added to the PROBAST tool to make it more comprehensive and usable. Because of the absence of precise cut-offs for calibration and discrimination,79 it can be difficult for clinicians to determine whether a model is informative. Thus, the use of the R2 and/or Brier score could also be relevant to obtain easily interpretable information on the overall performance of models.34 Moreover, from a clinical perspective, information from a decision curve analysis allows the clinician to determine whether a model is likely to be useful for decision-making.79, 80 Thus, information on the net benefit (the trade-off between benefit and harm) could be helpful to determine the best model for clinical integration.41 The development of computerized tools could also facilitate the use of the models in clinical settings: after entering a patient’s data, clinicians could directly obtain the patient’s prognosis (see an example at http://myback.neura.edu.au/). Finally, the current external validation literature is complex and heterogeneous because of the number of different terms used to describe the external validation step (eg, external validity, prognostic validity, predictive validity/ability/capacity, and discriminative validity/ability). Most of these terms are relevant but not specific to the external validation step. This review was designed to identify external validation studies based on search terms from the prognostic research literature and may have missed citations from the rehabilitation literature using terminology that differs from external validation.
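To show what these additional measures would look like in practice, here is a minimal sketch (Python with NumPy, using hypothetical predictions and outcomes) of the Brier score and of the net benefit at a single risk threshold; a full decision curve would repeat the net-benefit calculation over a range of thresholds.

import numpy as np

def brier_score(observed, predicted_risk):
    # Overall performance: mean squared difference between predicted risk and outcome (0/1).
    observed = np.asarray(observed, dtype=float)
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    return float(np.mean((predicted_risk - observed) ** 2))

def net_benefit(observed, predicted_risk, threshold):
    # Net benefit at a chosen risk threshold, as used in decision curve analysis:
    # (true positives - false positives * odds of the threshold) / n.
    observed = np.asarray(observed, dtype=bool)
    treat = np.asarray(predicted_risk) >= threshold
    n = observed.size
    tp = np.sum(treat & observed)
    fp = np.sum(treat & ~observed)
    return tp / n - (fp / n) * (threshold / (1 - threshold))

# Hypothetical values, for illustration only.
obs = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [0.8, 0.3, 0.2, 0.6, 0.4, 0.7, 0.1, 0.5]
print(round(brier_score(obs, pred), 3), round(net_benefit(obs, pred, 0.5), 3))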

Recalibration or updating (ie, the addition of new predictors) was uncommon among the prognostic models included in our review: only 4 studies (8.7%) performed an update of the model. Our results suggest that when the external validation of a model is poor, researchers are tempted to develop a new model rather than to improve the existing one.22, 81 Adjusting the model to the characteristics of the validation sample should be part of the validation study,14, 72, 81 and such updating is likely to improve the model’s generalizability.22 With this perspective in mind, Jenkins et al argue for continual updating and monitoring of prognostic models to reflect constantly evolving knowledge and practices.72

Limitations

Our review has some limitations. The main one is a probable overestimation of the risk of bias because of the PROBAST: the structure of this tool, the expertise required to complete it, and its complicated guidelines can lead to rating errors.35 In addition, our decision rule to determine clinical value is based on criteria from the scientific literature and on the authors’ opinion regarding the PROBAST items. This strategy is not without potential bias, and experts’ opinion could refine and validate the content of our proposal. Because the PROBAST is very recent (2019) and has high standards for assessing the risk of bias in prognostic studies, we expected that most of the included studies would be categorized at high risk of bias. Yet, considering the very conservative properties of the PROBAST, our findings are still impactful.35 Heterogeneity in the terminology used could lead to indexing or interpretation errors that could affect the results of our search strategy and/or selection process. There appears to be no standardized terminology in the rehabilitation literature concerning prognostic models. This standardization will be essential to improve the prognostic literature and to eventually conduct meta-analyses of prognostic models in rehabilitation.

Conclusion

Our systematic review found 46 studies on 37 unique prognostic models spanning spinal conditions, lower extremity conditions, upper extremity conditions, and MSK injuries, trauma, and pain (undefined region). Performance measures were not systematically reported in the included studies. According to the PROBAST, two-thirds of the studies presented low concerns about applicability, but only one study presented a low risk of bias. We developed and applied a structured 5-step method that allowed us to identify 6 prognostic models that could be deemed clinically valuable: (1) the STarT Back Screening Tool, (2) the WORRK model, (3) the Da Silva model, (4) the PICKUP model, (5) the Schellingerhout CPR, and (6) the Keene model. Researchers must consider the PROBAST and the methodological issues reported in our review to propose high-quality external validation studies. We also recommend that researchers standardize the terminology used to report studies on the external validation of prognostic models.

Authors’ Contributions

Concept/idea/research design: F. Naye, Y. Tousignant-Laflamme, C. Houle, C. Cook, A. LeBlanc, S. Décary

Writing: F. Naye, Y. Tousignant-Laflamme, C. Houle, C. Cook, M. Dugas, B. Skidmore, A. LeBlanc, S. Décary

Data analysis: C. Houle, F. Naye, Y. Tousignant-Laflamme, M. Dugas, S. Décary

Project management: Y. Tousignant-Laflamme, S. Décary

Fund procurement: Y. Tousignant-Laflamme, S. Décary

Providing facilities/equipment: Y. Tousignant-Laflamme, S. Décary

Providing institutional liaisons: Y. Tousignant-Laflamme

Consultation (including review of manuscript before submitting): A. LeBlanc

Funding

This work was supported by the Ordre Professionnel de la Physiothérapie du Québec and the Strategy for Patient-Oriented Research.

Data Availability Statement

Data are available upon request.

Disclosures

The authors completed the ICMJE Form for Disclosure of Potential Conflicts of Interest and reported no conflict of interest.

References

1. Cieza A, Causey K, Kamenov K, Hanson SW, Chatterji S, Vos T. Global estimates of the need for rehabilitation based on the Global Burden of Disease study 2019: a systematic analysis for the Global Burden of Disease study 2019. Lancet. 2020;396:2006–2017. https://doi.org/10.1016/S0140-6736(20)32340-0.

2. Østerås N, Kjeken I, Smedslund G, et al. Exercise for hand osteoarthritis: a Cochrane systematic review. J Rheumatol. 2017;44:1850–1858. https://doi.org/10.3899/jrheum.170424.

3. Steuri R, Sattelmayer M, Elsig S, et al. Effectiveness of conservative interventions including exercise, manual therapy and medical management in adults with shoulder impingement: a systematic review and meta-analysis of RCTs. Br J Sports Med. 2017;51:1340–1347. https://doi.org/10.1136/bjsports-2016-096515.

4. Challoumas D, Biddle M, McLean M, Millar NL. Comparison of treatments for frozen shoulder: a systematic review and meta-analysis. JAMA Netw Open. 2020;3:e2029581. https://doi.org/10.1001/jamanetworkopen.2020.29581.

5. Paige NM, Miake-Lye IM, Booth MS, et al. Association of spinal manipulative therapy with clinical benefit and harm for acute low back pain: systematic review and meta-analysis. JAMA. 2017;317:1451–1460. https://doi.org/10.1001/jama.2017.3086.

6. Masaracchio M, Kirker K, States R, Hanney WJ, Liu X, Kolber M. Thoracic spine manipulation for the management of mechanical neck pain: a systematic review and meta-analysis. PLoS One. 2019;14:e0211877. https://doi.org/10.1371/journal.pone.0211877.

7. Naunton J, Street G, Littlewood C, Haines T, Malliaras P. Effectiveness of progressive and resisted and non-progressive or non-resisted exercise in rotator cuff related shoulder pain: a systematic review and meta-analysis of randomized controlled trials. Clin Rehabil. 2020;34:1198–1216. https://doi.org/10.1177/0269215520934147.

8. Nascimento P, Costa LOP, Araujo AC, Poitras S, Bilodeau M. Effectiveness of interventions for non-specific low back pain in older adults. A systematic review and meta-analysis. Physiotherapy. 2019;105:147–162. https://doi.org/10.1016/j.physio.2018.11.004.

9. Luan L, Adams R, Witchalls J, Ganderton C, Han J. Does strength training for chronic ankle instability improve balance and patient-reported outcomes and by clinically detectable amounts? A systematic review and meta-analysis. Phys Ther. 2021;101:pzab046. https://doi.org/10.1093/ptj/pzab046.

10. Hingorani AD, Windt DA, Riley RD, et al. Prognosis research strategy (PROGRESS) 4: stratified medicine research. BMJ. 2013;346:e5793. https://doi.org/10.1136/bmj.e5793.

11. Lentz TA, Goode AP, Thigpen CA, George SZ. Value-based care for musculoskeletal pain: are physical therapists ready to deliver? Phys Ther. 2020;100:621–632. https://doi.org/10.1093/ptj/pzz171.

12. Hemingway H, Croft P, Perel P, et al. Prognosis research strategy (PROGRESS) 1: a framework for researching clinical outcomes. BMJ. 2013;346:e5595. https://doi.org/10.1136/bmj.e5595.

13. Croft P, Altman DG, Deeks JJ, et al. The science of clinical practice: disease diagnosis or patient prognosis? Evidence about "what is likely to happen" should shape clinical practice. BMC Med. 2015;13:20. https://doi.org/10.1186/s12916-014-0265-4.

14. Kent P, Cancelliere C, Boyle E, Cassidy JD, Kongsted A. A conceptual framework for prognostic research. BMC Med Res Methodol. 2020;20:172. https://doi.org/10.1186/s12874-020-01050-7.

15. Riley RD, Hayden JA, Steyerberg EW, et al. Prognosis research strategy (PROGRESS) 2: prognostic factor research. PLoS Med. 2013;10:e1001380. https://doi.org/10.1371/journal.pmed.1001380.

16. Steyerberg EW, Moons KG, van der Windt DA, et al. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10:e1001381. https://doi.org/10.1371/journal.pmed.1001381.

17. Tousignant-Laflamme Y, Houle C, Cook C, Naye F, LeBlanc A, Decary S. Mastering prognostic tools: an opportunity to enhance personalized care and to optimize clinical outcomes in physical therapy. Phys Ther. 2022;102:pzac023. https://doi.org/10.1093/ptj/pzac023.

18. Kelly J, Ritchie C, Sterling M. Clinical prediction rules for prognosis and treatment prescription in neck pain: a systematic review. Musculoskelet Sci Pract. 2017;27:155–164. https://doi.org/10.1016/j.math.2016.10.066.

19. Haskins R, Osmotherly PG, Rivett DA. Validation and impact analysis of prognostic clinical prediction rules for low back pain is needed: a systematic review. J Clin Epidemiol. 2015;68:821–832. https://doi.org/10.1016/j.jclinepi.2015.02.003.

20. Walsh ME, French HP, Wallace E, et al. Existing validated clinical prediction rules for predicting response to physiotherapy interventions for musculoskeletal conditions have limited clinical value: a systematic review. J Clin Epidemiol. 2021;135:90–102. https://doi.org/10.1016/j.jclinepi.2021.02.005.

21. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170:W1–W33. https://doi.org/10.7326/M18-1377.

22. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008;61:1085–1094. https://doi.org/10.1016/j.jclinepi.2008.04.008.

23. Alba AC, Agoritsas T, Walsh M, et al. Discrimination and calibration of clinical prediction models: users' guides to the medical literature. JAMA. 2017;318:1377–1384. https://doi.org/10.1001/jama.2017.12126.

24. Dimitrov BD, Motterlini N, Fahey T. A simplified approach to the pooled analysis of calibration of clinical prediction rules for systematic reviews of validation studies. Clin Epidemiol. 2015;7:267–280. https://doi.org/10.2147/CLEP.S67632.

25. Vach W. Calibration of clinical prediction rules does not just assess bias. J Clin Epidemiol. 2013;66:1296–1301. https://doi.org/10.1016/j.jclinepi.2013.06.003.

26. Aromataris E, Munn Z. Systematic reviews of effectiveness. JBI Manual for Evidence Synthesis; 2020:71–87. https://doi.org/10.46658/JBIRM-17-01.

27. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71.

28. McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C. PRESS peer review of electronic search strategies: 2015 guideline statement. J Clin Epidemiol. 2016;75:40–46. https://doi.org/10.1016/j.jclinepi.2016.01.021.

29. World Health Organization. Rehabilitation. 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/rehabilitation. Accessed April 11, 2023.

30. Sim J, Wright CC. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys Ther. 2005;85:257–268. https://doi.org/10.1093/ptj/85.3.257.

31. Moons KG, de Groot JA, Bouwmeester W, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11:e1001744. https://doi.org/10.1371/journal.pmed.1001744.

32. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. J Clin Epidemiol. 2015;68:134–143. https://doi.org/10.1016/j.jclinepi.2014.11.010.

33. Wingbermühle RW, Chiarotto A, Koes B, Heymans MW, van Trijffel E. Challenges and solutions in prognostic prediction models in spinal disorders. J Clin Epidemiol. 2021;132:125–130. https://doi.org/10.1016/j.jclinepi.2020.12.017.

34. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–138. https://doi.org/10.1097/EDE.0b013e3181c30fb2.

35. Venema E, Wessler BS, Paulus JK, et al. Large-scale validation of the prediction model risk of bias assessment tool (PROBAST) using a short form: high risk of bias models show poorer discrimination. J Clin Epidemiol. 2021;138:32–39. https://doi.org/10.1016/j.jclinepi.2021.06.017.

36. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008. https://doi.org/10.1136/bmj.j4008.

37. Hosmer DWJ, Lemeshow S, Sturdivant RX. Applied Logistic Regression. 3rd ed. New York, NY: Wiley; 2013.

38. Austin PC, van Klaveren D, Vergouwe Y, Nieboer D, Lee DS, Steyerberg EW. Geographic and temporal validity of prediction models: different approaches were useful to examine model performance. J Clin Epidemiol. 2016;79:76–85. https://doi.org/10.1016/j.jclinepi.2016.05.007.

39. Crowson CS, Atkinson EJ, Therneau TM. Assessing calibration of prognostic risk scores. Stat Methods Med Res. 2016;25:1692–1706. https://doi.org/10.1177/0962280213497434.

40. Verbakel JY, Steyerberg EW, Uno H, et al. ROC curves for clinical prediction models part 1. ROC plots showed no added value above the AUC when evaluating the performance of clinical prediction models. J Clin Epidemiol. 2020;126:207–216. https://doi.org/10.1016/j.jclinepi.2020.01.028.

41. Van Calster B, Wynants L, Collins GS, Verbakel JY, Steyerberg EW. ROC curves for clinical prediction models part 3. The ROC plot: a picture that needs a 1000 words. J Clin Epidemiol. 2020;126:220–223. https://doi.org/10.1016/j.jclinepi.2020.05.037.

42. Wingbermühle RW, Heymans MW, van Trijffel E, Chiarotto A, Koes B, Verhagen AP. External validation of prognostic models for recovery in patients with neck pain. Braz J Phys Ther. 2021;25:775–784. https://doi.org/10.1016/j.bjpt.2021.06.001.

43. Schellingerhout JM, Heymans MW, Verhagen AP, Lewis M, de Vet HC, Koes BW. Prognosis of patients with nonspecific neck pain: development and external validation of a prediction rule for persistence of complaints. Spine (Phila Pa 1976). 2010;35:E827–E835. https://doi.org/10.1097/BRS.0b013e3181d85ad5.

44. Ritchie C, Hendrikz J, Jull G, Elliott J, Sterling M. External validation of a clinical prediction rule to predict full recovery and ongoing moderate/severe disability following acute whiplash injury. J Orthop Sports Phys Ther. 2015;45:242–250. https://doi.org/10.2519/jospt.2015.5642.

45. Hill JC, Dunn KM, Lewis M, et al. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum. 2008;59:632–641. https://doi.org/10.1002/art.23563.

46. Forsbrand MH, Grahn B, Hill JC, Petersson IF, Post Sennehed C, Stigmar K. Can the STarT back tool predict health-related quality of life and work ability after an acute/subacute episode with back or neck pain? A psychometric validation study in primary care. BMJ Open. 2018;8:e021748. https://doi.org/10.1136/bmjopen-2018-021748.

47. Beneciuk JM, Bishop MD, Fritz JM, et al. The STarT back screening tool and individual psychological measures: evaluation of prognostic capabilities for low back pain clinical outcomes in outpatient physical therapy settings. Phys Ther. 2013;93:321–333. https://doi.org/10.2522/ptj.20120207.

48. Kneeman J, Battalio SL, Korpak A, et al. Predicting persistent disabling low back pain in veterans affairs primary care using the STarT back tool. PMR. 2021;13:241–249. https://doi.org/10.1002/pmrj.12488.

49. Childs JD, Fritz JM, Flynn TW, et al. A clinical prediction rule to identify patients with low back pain most likely to benefit from spinal manipulation: a validation study. Ann Intern Med. 2004;141:920–928. https://doi.org/10.7326/0003-4819-141-12-200412210-00008.

50. Hancock MJ, Maher CG, Latimer J, Herbert RD, McAuley JH. Independent evaluation of a clinical prediction rule for spinal manipulative therapy: a randomised controlled trial. Eur Spine J. 2008;17:936–943. https://doi.org/10.1007/s00586-008-0679-9.

51. Tan C, Loo G, Pua YH, et al. Predicting discharge outcomes after total knee replacement using the risk assessment and predictor tool. Physiotherapy. 2014;100:176–181. https://doi.org/10.1016/j.physio.2013.02.003.

52. Hansen VJ, Gromov K, Lebrun LM, Rubash HE, Malchau H, Freiberg AA. Does the risk assessment and prediction tool predict discharge disposition after joint replacement? Clin Orthop Relat Res. 2015;473:597–601. https://doi.org/10.1007/s11999-014-3851-z.

53. Coudeyre E, Eschalier B, Descamps S, et al. Transcultural validation of the risk assessment and predictor tool (RAPT) to predict discharge outcomes after total hip replacement. Ann Phys Rehabil Med. 2014;57:169–184. https://doi.org/10.1016/j.rehab.2014.02.002.

54. Luthi F, Deriaz O, Vuistiner P, Burrus C, Hilfiker R. Predicting non return to work after orthopaedic trauma: the Wallis occupational rehabilitation risk (WORRK) model. PLoS One. 2014;9:e94268. https://doi.org/10.1371/journal.pone.0094268.

55. Plomb-Holmes C, Lüthi F, Vuistiner P, Leger B, Hilfiker R. A return-to-work prognostic model for orthopaedic trauma patients (WORRK) updated for use at 3, 12 and 24 months. J Occup Rehabil. 2017;27:568–575. https://doi.org/10.1007/s10926-016-9688-4.

56. Margison DA, French DJ. Predicting treatment failure in the subacute injury phase using the Orebro musculoskeletal pain questionnaire: an observational prospective study in a workers' compensation system. J Occup Environ Med. 2007;49:59–67. https://doi.org/10.1097/JOM.0b013e31802db51e.

57. Westman A, Linton SJ, Ohrvik J, Wahlén P, Leppert J. Do psychosocial factors predict disability and health at a 3-year follow-up for patients with non-acute musculoskeletal pain? A validation of the Orebro musculoskeletal pain screening questionnaire. Eur J Pain. 2008;12:641–649. https://doi.org/10.1016/j.ejpain.2007.10.007.

58. da Silva T, Macaskill P, Kongsted A, Mills K, Maher CG, Hancock MJ. Predicting pain recovery in patients with acute low back pain: updating and validation of a clinical prediction model. Eur J Pain. 2019;23:341–353. https://doi.org/10.1002/ejp.1308.

59. Traeger AC, Henschke N, Hübscher M, et al. Estimating the risk of chronic pain: development and validation of a prognostic model (PICKUP) for patients with acute low back pain. PLoS Med. 2016;13:e1002019. https://doi.org/10.1371/journal.pmed.1002019.

60. Keene DJ, Schlüssel MM, Thompson J, et al. Prognostic models for identifying risk of poor outcome in people with acute ankle sprains: the SPRAINED development and external validation study. Health Technol Assess. 2018;22:1–112. https://doi.org/10.3310/hta22640.

61. Dijkland SA, Retel Helmrich IRA, Steyerberg EW. Validation of prognostic models: challenges and opportunities. J Emerg Crit Care Med. 2018;2:91. https://doi.org/10.21037/jeccm.2018.10.10.

62. McIntosh G, Steenstra I, Hogg-Johnson S, Carter T, Hall H. Lack of prognostic model validation in low back pain prediction studies: a systematic review. Clin J Pain. 2018;34:748–754. https://doi.org/10.1097/AJP.0000000000000591.

63. Lim YZ, Chou L, Au RT, et al. People with low back pain want clear, consistent and personalised information on prognosis, treatment options and self-management strategies: a systematic review. J Physiother. 2019;65:124–135. https://doi.org/10.1016/j.jphys.2019.05.010.

64. Tousignant-Laflamme Y, Cook CE, Mathieu A, et al. Operationalization of the new pain and disability drivers management model: a modified Delphi survey of multidisciplinary pain management experts. J Eval Clin Pract. 2020;26:316–325. https://doi.org/10.1111/jep.13190.

65. Linton SJ, Nicholas M, Shaw W. Why wait to address high-risk cases of acute low back pain? A comparison of stepped, stratified, and matched care. Pain. 2018;159:2437–2441. https://doi.org/10.1097/j.pain.0000000000001308.

66. Hill JC, Whitehurst DG, Lewis M, et al. Comparison of stratified primary care management for low back pain with current best practice (STarT back): a randomised controlled trial. Lancet. 2011;378:1560–1571. https://doi.org/10.1016/S0140-6736(11)60937-9.

67. Hall JA, Jowett S, Lewis M, Oppong R, Konstantinou K. The STarT back stratified care model for nonspecific low back pain: a model-based evaluation of long-term cost-effectiveness. Pain. 2021;162:702–710. https://doi.org/10.1097/j.pain.0000000000002057.

68. Bamford A, Nation A, Durrell S, Andronis L, Rule E, McLeod H. Implementing the Keele stratified care model for patients with low back pain: an observational impact study. BMC Musculoskelet Disord. 2017;18:66. https://doi.org/10.1186/s12891-017-1412-9.

69. Beneciuk JM, George SZ. Pragmatic implementation of a stratified primary care model for low back pain management in outpatient physical therapy settings: two-phase, sequential preliminary study. Phys Ther. 2015;95:1120–1134. https://doi.org/10.2522/ptj.20140418.

70. Silva FG, Costa LOP, Hancock MJ, Palomo GA, Costa LCM, da Silva T. No prognostic model for people with recent-onset low back pain has yet been demonstrated to be suitable for use in clinical practice: a systematic review. J Physiother. 2022;68:99–109. https://doi.org/10.1016/j.jphys.2022.03.009.

71. Archer L, Snell KIE, Ensor J, Hudda MT, Collins GS, Riley RD. Minimum sample size for external validation of a clinical prediction model with a continuous outcome. Stat Med. 2021;40:133–146. https://doi.org/10.1002/sim.8766.

72. Jenkins DA, Martin GP, Sperrin M, et al. Continual updating and monitoring of clinical prediction models: time for dynamic prediction systems? Diagn Progn Res. 2021;5:1. https://doi.org/10.1186/s41512-020-00090-3.

73. Vergouwe Y, Steyerberg EW, Eijkemans MJC, Habbema JDF. Substantial effective sample sizes were required for external validation studies of predictive logistic regression models. J Clin Epidemiol. 2005;58:475–483. https://doi.org/10.1016/j.jclinepi.2004.06.017.

74. Collins SD, Peek N, Riley RD, Martin GP. Sample sizes of prediction model studies in prostate cancer were rarely justified and often insufficient. J Clin Epidemiol. 2021;133:53–60. https://doi.org/10.1016/j.jclinepi.2020.12.011.

75. Snell KIE, Archer L, Ensor J, et al. External validation of clinical prediction models: simulation-based sample size calculations were more reliable than rules-of-thumb. J Clin Epidemiol. 2021;135:79–89. https://doi.org/10.1016/j.jclinepi.2021.02.011.

76. Riley RD, Debray TPA, Collins GS, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med. 2021;40:4230–4251. https://doi.org/10.1002/sim.9025.

77. Riley RD, Ensor J, Snell KI, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140. https://doi.org/10.1136/bmj.i3140.

78. Herrett E, Gallagher AM, Bhaskaran K, et al. Data resource profile: clinical practice research datalink (CPRD). Int J Epidemiol. 2015;44:827–836. https://doi.org/10.1093/ije/dyv098.

79. Traeger AC, Hübscher M, McAuley JH. Understanding the usefulness of prognostic models in clinical decision-making. J Physiother. 2017;63:121–125. https://doi.org/10.1016/j.jphys.2017.01.003.

80. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak. 2006;26:565–574. https://doi.org/10.1177/0272989X06295361.

81. Janssen KJM, Moons KGM, Kalkman CJ, Grobbee DE, Vergouwe Y. Updating methods improved the performance of a clinical prediction model in new patients. J Clin Epidemiol. 2008;61:76–86. https://doi.org/10.1016/j.jclinepi.2007.04.018.

