Identifying factors associated with user retention and outcomes of a digital intervention for substance use disorder: a retrospective analysis of real-world data

Abstract
Objectives Successful delivery of digital health interventions is affected by multiple real-world factors. These factors may be identified in routinely collected, ecologically valid data from these interventions. We propose ideas for exploring these data, focusing on interventions targeting complex, comorbid conditions.
Materials and Methods This study retrospectively explores pre-post data collected between 2016 and 2019 from users of digital cognitive behavioral therapy (CBT), containing psychoeducation and practical exercises, for substance use disorder (SUD) at UK addiction services. To identify factors associated with heterogeneous user responses to the technology, we employed multivariable and multivariate regressions and random forest models of user-reported questionnaire data.
Results The dataset contained information from 14 078 individuals, of whom 12 529 reported complete data at baseline and 2925 did so again after engagement with the CBT. Ninety-three percent screened positive for dependence on 1 of 43 substances at baseline, and 73% screened positive for anxiety or depression. Despite pre-post improvements independent of user sociodemographics, women reported more frequent and persistent symptoms of SUD, anxiety, and depression. Retention (minimum 2 use events recorded) was associated more with deployment environment than user characteristics. Prediction accuracy of post-engagement outcomes was acceptable (Area Under Curve [AUC]: 0.74–0.79), depending non-trivially on user characteristics.
Discussion Traditionally, performance of digital health interventions is determined in controlled trials. Our analysis showcases multivariate models with which real-world data from these interventions can be explored and sources of user heterogeneity in retention and symptom reduction uncovered.
Conclusion Real-world data from digital health interventions contain information on natural user-technology interactions which could enrich results from controlled trials.

sensations", "difficult situations", "negative thoughts", and "lifestyle". BFO visualizes a user's degree of functioning in each of these domains based on their answers to the assessment battery included in the program. After the initial pre-engagement assessment, these answers must be updated at least bi-weekly for users to continue accessing the clinical content in BFO. Users who have updated their assessment at least once are considered retained in treatment; all others are considered dropouts.
Each domain is associated with slide-series-based psychoeducation on the impact of this domain on functioning ("Information Strategy") and an interactive, skills-building exercise ("Action Strategy"). Domain modules can be accessed in any order and at any pace. The interactive exercises make use of a range of evidence-based behavioral change techniques, including refusal and assertiveness skills, emotional regulation, coping strategy enhancement, mindfulness-based cognitive therapy, motivational enhancement, cognitive restructuring, reward and reinforcement, harm reduction, and crisis management. [19] All BFO pages are supported by audio or video content. Learnings from psychoeducation and interactive exercises, including user input, can be downloaded. They can also be sent by email to the user and to their BFO recovery supporters, who can be nominated in the program by entering up to three email addresses.
The mobile Companion app complements the BFO web app. Specifically, it uses geolocation technology to provide alerts for user-inputted locations that bear individual risk of substance use, as well as calendar and time alerts for planned activities of achievement and enjoyment and for planned steps towards a life goal.
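The retention rule described above (a user counts as retained once the initial assessment has been updated at least once) can be illustrated with a minimal sketch. The record layout, the field name `assessment_count`, and the example users are hypothetical and are not taken from the BFO data model; the sketch assumes that the initial pre-engagement assessment is counted as the first assessment.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical per-user record; not the actual BFO data schema."""
    user_id: str
    # Number of completed assessments, counting the initial
    # pre-engagement assessment as the first one (assumption).
    assessment_count: int


def is_retained(record: UserRecord) -> bool:
    """Retained = the initial assessment was updated at least once,
    i.e. at least two assessments are on record."""
    return record.assessment_count >= 2


# Made-up example users, for illustration only.
users = [
    UserRecord("a", 1),  # initial assessment only: dropout
    UserRecord("b", 3),  # two updates: retained
    UserRecord("c", 2),  # one update: retained
]

retained_ids = [u.user_id for u in users if is_retained(u)]
retention_rate = len(retained_ids) / len(users)
```

Under this reading, retention is a binary flag per user, so a retention rate is simply the share of users with at least two recorded assessments.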

Details of random forest prediction
Pre-engagement answers on items associated with the SDS were used as predictors only when post-engagement answers on these same items were being predicted. Each random forest comprised 500 trees. The number of predictors randomly sampled at each node split corresponded to the square root of the number of predictors included to predict an outcome. Random forest performance was assessed by calculating ROC curves and the areas under them, which were averaged across folds. Accumulated local effects, used to illustrate the effects of drivers of prediction on post-engagement anxiety, describe the main effect of a given answer on one of these drivers on the predicted probability of anxiety, relative to the average prediction.
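Two of the settings above lend themselves to a short illustration: the square-root rule for the number of predictors sampled per split, and the averaging of AUC across folds. The following is a minimal sketch in plain Python, not the analysis code used in the study. The function names, the rounding of the square root (floor), and the toy fold data are assumptions made for illustration; AUC is computed here through its pairwise-ranking interpretation rather than from an explicit ROC curve, which gives the same value.

```python
import math

N_TREES = 500  # trees per forest, as stated in the text


def predictors_per_split(n_predictors: int) -> int:
    """Square-root rule; flooring is an assumption, the text does not
    state how non-integer square roots were rounded."""
    return max(1, math.isqrt(n_predictors))


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc_across_folds(folds):
    """folds: list of (y_true, scores) pairs, one per held-out fold."""
    aucs = [roc_auc(y, s) for y, s in folds]
    return sum(aucs) / len(aucs)


# Toy held-out predictions for two folds (made-up numbers).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),  # all positives ranked first
    ([1, 0, 1, 0], [0.8, 0.7, 0.3, 0.1]),  # one positive ranked below a negative
]
mean_auc = mean_auc_across_folds(folds)
```

In this toy example the two folds have AUCs of 1.0 and 0.75, so the fold-averaged AUC is 0.875. With 16 predictors, the square-root rule would sample 4 predictors at each split.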

Figure S1: Correlations between PHQ-4 items. Correlation matrices as in (a) are estimated alongside other parameters of our regression model inspired by [20]. This allows accounting for the high correlation between items of a scale (range of ρ for PHQ-4 items: 0.69–0.86) when modelling single items. High correlation is also suggested by a high rate of co-occurrence of the same answer category on two items of the same scale in the data, as shown for the PHQ-4 in (b).

Figure S2: Participant flow through BFO and eligibility. Completion of the pre-engagement assessment battery is defined as providing an answer for at least one item per psychometric (sub-)scale. Information about missing data in the dataset used for statistical analysis is provided in Table 2.